BioDR: Semantic Indexing Networks for Biomedical Document Retrieval

Anália Lourenço1, Rafael Carreira1,2, Daniel Glez-Peña4, José R. Méndez4, Sónia Carneiro1, Luis M. Rocha3,6, Fernando Díaz5, Eugénio C. Ferreira1, Isabel Rocha1, Florentino Fdez-Riverola4, and Miguel Rocha2

1IBB/CEB, University of Minho, Campus Gualtar, Braga, Portugal
2CCTC, University of Minho, Campus Gualtar, Braga, Portugal
3School of Informatics, Indiana University, 1900 East Tenth Street, Bloomington IN 47408, USA
4Computer Science Dept., Univ. Vigo, Campus As Lagoas, Ourense, Spain
5Computer Science Department, University of Valladolid, Segovia, Spain
6FLAD Computational Biology Collaboratorium, Instituto Gulbenkian de Ciência, Portugal

Citation: A. Lourenço; R.C. Carreira; D. Glez-Peña; J.R. Méndez; S.A. Carneiro; L.M. Rocha; F. Díaz; E.C. Ferreira; I.P. Rocha; F. Fdez-Riverola; M. Rocha [2009]. "BioDR: Semantic Indexing Networks for Biomedical Document Retrieval." Expert Systems with Applications, 37, 3444–3453. doi:10.1016/j.eswa.2009.10.044

The full text and PDF reprint will be available soon from the Expert Systems with Applications site. Because of the mathematical notation and graphics, only the abstract is presented here.


In biomedical research, retrieving documents that match a query of interest is a frequent task. Typically, the result set is extensive, contains many irrelevant documents, and is presented as a flat list, i.e. not organized or indexed in any way. This work proposes BioDR, a novel approach that semantically indexes the results of a query by identifying relevant terms in the documents. These terms emerge from a Named Entity Recognition process that annotates occurrences of biological terms (e.g. genes or proteins) in abstracts or full texts. The system is based on a learning process that builds an Enhanced Instance Retrieval Network (EIRN) from a set of documents manually classified according to their relevance to a given problem. The resulting EIRN implements the semantic indexing of documents and terms, enabling enhanced navigation and visualization tools as well as the assessment of the relevance of new documents.

Keywords: Biomedical Document Retrieval; Document Relevance; Enhanced Instance Retrieval Network; Named Entity Recognition; Semantic Indexing Document Network.
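
The idea of indexing documents by their annotated terms and using manually classified examples to judge new documents can be illustrated with a small sketch. This is not the EIRN described in the paper, whose details are not given in the abstract; it is a simplified stand-in that assumes a plain two-layer term/document index and a relevance weight per term equal to the fraction of labeled documents containing that term that were marked relevant. The toy documents, the names build_index and score, and the weighting scheme are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data: (document id, annotated terms, manually judged relevant?).
# In the paper the terms come from Named Entity Recognition; here they are
# supplied directly as sets of strings.
LABELED_DOCS = [
    ("d1", {"recA", "lexA", "SOS response"}, True),
    ("d2", {"recA", "DNA repair"}, True),
    ("d3", {"lexA", "ribosome"}, False),
    ("d4", {"ribosome", "tRNA"}, False),
]


def build_index(docs):
    """Build a term -> documents index plus a relevance weight per term."""
    term_to_docs = defaultdict(set)
    relevant_counts = defaultdict(int)
    for doc_id, terms, is_relevant in docs:
        for term in terms:
            term_to_docs[term].add(doc_id)
            if is_relevant:
                relevant_counts[term] += 1
    # Assumed weighting: share of the documents containing the term
    # that were labeled relevant.
    weights = {t: relevant_counts[t] / len(ids) for t, ids in term_to_docs.items()}
    return dict(term_to_docs), weights


def score(terms, weights):
    """Mean weight of the indexed terms found in a new document.

    Returns None when the document shares no term with the index.
    """
    known = [weights[t] for t in terms if t in weights]
    return sum(known) / len(known) if known else None


term_to_docs, weights = build_index(LABELED_DOCS)
print(sorted(term_to_docs["recA"]))       # documents indexed under one term
print(score({"recA", "tRNA"}, weights))   # relevance estimate for a new document
```

The term-to-documents mapping is what supports navigation (from a term to every document that mentions it), while the weights give a rough relevance estimate for unseen documents. A real system would need a more robust weighting than a raw fraction, particularly for terms that occur in only one or two labeled documents.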

For more information, contact Luis Rocha.
Last Modified: January 11, 2010