Biomedical Text Mining Applied To Document Retrieval and Semantic Indexing

Anália Lourenço1, Sónia Carneiro1, Eugénio C. Ferreira1, Rafael Carreira1,2, Luis M. Rocha3,6, Daniel Glez-Peña4, José R. Méndez4, Florentino Fdez-Riverola4, Fernando Diaz5, Isabel Rocha1 and Miguel Rocha2

1IBB/CEB, University of Minho, Campus Gualtar, Braga, Portugal
2CCTC, University of Minho, Campus Gualtar, Braga, Portugal
3School of Informatics, Indiana University, 1900 East Tenth Street, Bloomington IN 47408, USA
4Computer Science Dept., Univ. Vigo, Campus As Lagoas, Ourense, Spain
5Computer Science Department, University of Valladolid, Segóvia, Spain
6FLAD Computational Biology Collaboratorium, Instituto Gulbenkian de Ciencia, Portugal

Citation: A. Lourenço, S. Carneiro, E.C. Ferreira, R. Carreira, L.M. Rocha, D. Glez-Peña, J.R. Méndez, F. Fdez-Riverola, F. Diaz, I. Rocha and M. Rocha [2009]. "Biomedical Text Mining Applied To Document Retrieval and Semantic Indexing." (ACM Portal). In: Proc. of the 3rd International Workshop on Practical Applications of Computational Biology & Bioinformatics (IWPACBB'09). Lecture Notes in Computer Science. Springer-Verlag, 5518: 954-963. doi:10.1007/978-3-642-02481-8

The pdf re-print is available.


In biomedical research, the ability to retrieve adequate information from the ever-growing literature is an extremely important asset. This work provides an enhanced, general-purpose approach to document retrieval that enables the filtering of PubMed query results. The system is based on semantic indexing: for each set of retrieved documents, it provides a network that links documents and relevant terms obtained by annotating biological entities (e.g. genes or proteins). This network provides distinct user perspectives, allows navigation over documents with similar terms, and is also used to assess document relevance. A network learning procedure, based on previous work on e-mail spam filtering, is proposed; it receives as input a training set of manually classified documents.
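As a rough illustration of the idea of a network linking documents and annotated terms, the following minimal Python sketch builds a two-way document-term index and ranks documents by the number of terms they share. It is only a sketch under stated assumptions: the function names, document identifiers and term labels are hypothetical, the term annotations are assumed to be already available, and it does not reproduce the paper's enhanced instance retrieval network or its learning procedure.

```python
from collections import defaultdict


def build_network(annotations):
    """Build a document-term network from {doc_id: iterable of terms}.

    Returns two mappings: document -> set of terms, and term -> set of documents.
    """
    doc_terms = {doc: set(terms) for doc, terms in annotations.items()}
    term_docs = defaultdict(set)
    for doc, terms in doc_terms.items():
        for term in terms:
            term_docs[term].add(doc)
    return doc_terms, dict(term_docs)


def related_documents(doc_id, doc_terms, term_docs):
    """Rank the other documents by how many terms they share with doc_id."""
    shared = defaultdict(int)
    for term in doc_terms[doc_id]:
        for other in term_docs[term]:
            if other != doc_id:
                shared[other] += 1
    # Most shared terms first; ties broken by document identifier.
    return sorted(shared.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical annotations (placeholder identifiers, not real data).
annotations = {
    "doc1": ["geneA", "proteinX"],
    "doc2": ["geneA", "proteinX", "geneB"],
    "doc3": ["geneB"],
    "doc4": ["proteinY"],
}
doc_terms, term_docs = build_network(annotations)
print(related_documents("doc2", doc_terms, term_docs))
```

Keeping both directions of the mapping is what makes navigation cheap in this sketch: moving from a document to its terms, and from a term to every document that mentions it, is a single dictionary lookup each way.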

Keywords: Biomedical Document Retrieval, Document Relevance, Enhanced Instance Retrieval Network, Named Entity Recognition, Semantic Indexing Document Network.

For more information, contact Luis Rocha.
Last Modified: October 27, 2009