Knowledge discovery in digital articles in the biomedical sciences

Carlos Henrique Marcondes; Leonardo Cruz da Costa; Sergio de Castro Martins

doi:10.5433/1981-8920.2016v21n2p170

Authors

Carlos Henrique Marcondes, Universidade Federal Fluminense
Leonardo Cruz da Costa, Universidade Federal Fluminense
Sergio de Castro Martins, Universidade Federal Fluminense

Keywords:

Literature-Based Knowledge Discovery, Text Mining, Enhanced Publications, Semantic Publications, Biomedical Sciences

Abstract

Introduction: The emergence of the Semantic Web has been affecting the environment of digital scientific publishing. The growth of the biomedical literature in digital format raises the possibility that new discoveries may originate not only in laboratories but also in bibliographic and factual databases. The linear textual format of scientific articles, designed for human reading, is ill-suited to processing by computers. Objectives: To contextualize the research problem of semantic models for digital articles in the biomedical sciences, identifying related areas and producing a literature review. Methodology: The research is descriptive and exploratory, centered on the research problem, and seeks to identify interfaces between the areas reviewed; the approach is qualitative, and the methods were bibliographic and documentary research. Results: A presentation of the state of the art and the main issues in each subarea, namely biomedical text mining and enhanced publications/semantic models of publications; these areas are complementary but not integrated. Conclusions: We are in transition from a publication model centered on the linear textual article to a model in which the digital article, in a new format, is one link in a network of interlinked and simultaneously accessible resources that can be processed by programs, taking advantage of Semantic Web technologies.

Author Biographies

Carlos Henrique Marcondes, Universidade Federal Fluminense

Professor in the Graduate Program in Information Science at Universidade Federal Fluminense.

Leonardo Cruz da Costa, Universidade Federal Fluminense

Professor in the Graduate Program in Information Science at Universidade Federal Fluminense.

Sergio de Castro Martins, Universidade Federal Fluminense

Doctoral student in the Graduate Program in Information Science at Universidade Federal Fluminense.
