LaBaer

MedGene and BioGene

MedGene

MedGene is a text-mining tool that identifies associations between human genes and diseases in the literature (J Proteome Res. 2003, 2:405-12; Nat Biotechnol. 2003 21(9):9767; Surgery. 2004 136(3):504). It searches the titles and abstracts of over 16,000,000 Medline records to identify genes co-cited with human diseases. Normalization is applied to assess the strength of each association, so that for any given human disease MedGene can return a list of associated genes in rank order. Although it will eventually be feasible to study the entire human proteome in high-throughput studies, at present practicality often demands a focus on relevant subsets of genes. In the breast cancer studies, for example, it was important to identify a set of 1000 genes with a high likelihood of yielding results in the screening experiments (Figure 1).
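The co-citation ranking described above can be illustrated with a small sketch. The text does not say which normalization MedGene applies, so the score used here (co-citations squared, divided by the product of the gene's and the disease's record counts) is only a stand-in chosen for illustration. The toy `records` list and the `rank_genes` function are hypothetical and are not part of MedGene.

```python
from collections import Counter

# Hypothetical toy corpus: each record lists the gene and disease terms
# found in its title and abstract (the term-matching step is not shown).
records = [
    {"genes": {"BRCA1", "TP53"}, "diseases": {"breast cancer"}},
    {"genes": {"BRCA1"}, "diseases": {"breast cancer", "ovarian cancer"}},
    {"genes": {"TP53"}, "diseases": {"lung cancer"}},
    {"genes": {"TP53"}, "diseases": {"breast cancer"}},
    {"genes": {"TP53"}, "diseases": {"lung cancer"}},
    {"genes": {"ESR1"}, "diseases": {"breast cancer"}},
]


def rank_genes(records, disease):
    """Rank genes co-cited with `disease` by a normalized score.

    Assumption: the score co^2 / (gene_count * disease_count) stands in
    for whatever normalization the real tool uses.
    """
    gene_hits = Counter()  # records mentioning each gene
    co_hits = Counter()    # records mentioning the gene and the disease
    disease_hits = 0       # records mentioning the disease

    for rec in records:
        has_disease = disease in rec["diseases"]
        if has_disease:
            disease_hits += 1
        for gene in rec["genes"]:
            gene_hits[gene] += 1
            if has_disease:
                co_hits[gene] += 1

    scores = {
        gene: co * co / (gene_hits[gene] * disease_hits)
        for gene, co in co_hits.items()
    }
    # Highest score first; ties are broken alphabetically by gene symbol.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


ranking = rank_genes(records, "breast cancer")
for gene, score in ranking:
    print(f"{gene}\t{score:.2f}")
```

In this toy corpus, TP53 is co-cited with breast cancer as often as BRCA1 but scores lower, because it also appears in many records about other diseases. Penalizing that kind of broad, non-specific citation is the purpose of a normalization step.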

In addition, high-throughput technologies, such as proteomic screening and DNA microarrays, produce vast amounts of data that require comprehensive analytical methods to decipher the biologically relevant results. A global understanding of gene-disease relationships enables comprehensive comparisons between large experimental data sets and existing knowledge in the medical literature.

Figure 1.

Estimation of the false negative rate by comparison with hand-curated databases. The breast cancer-related genes identified by MedGene were compared with those listed in several other databases, including the Tumor Gene Database (TGD), the Breast Cancer Gene Database (BCG), GeneCards (GC) and Swiss-Prot. Genes were considered false negatives if they were represented in at least one of these other databases but not in MedGene, and their link to breast cancer was supported by at least one literature reference. All literature references were verified by manual review to confirm their validity. The number of genes in each database or shared by more than one database is indicated. The false negative rate was calculated as the number of genes missed by MedGene (26) divided by the total number of nonoverlapping genes in the other databases (285), or approximately 9%.
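The false negative calculation in the caption can be written as a simple set comparison. The sketch below is illustrative only: the gene sets are made-up toy data, `false_negative_rate` is a hypothetical helper, and the manual literature review that the caption requires for each missed gene is not modeled. The last lines reproduce the arithmetic for the reported counts (26 missed out of 285).

```python
def false_negative_rate(reference_sets, tool_genes):
    """Return (missed, total, rate) for a tool against curated gene sets.

    `total` counts each gene once, even if several databases list it.
    Assumption: every reference gene is treated as a valid association;
    the manual literature check described in the caption is skipped.
    """
    reference_union = set().union(*reference_sets)
    missed = reference_union - set(tool_genes)
    return len(missed), len(reference_union), len(missed) / len(reference_union)


# Toy stand-ins for the curated databases and for the tool's gene list.
tgd = {"BRCA1", "TP53", "ERBB2"}
bcg = {"BRCA1", "ESR1"}
gc = {"TP53", "PTEN"}
tool = {"BRCA1", "TP53", "ERBB2", "ESR1", "CCND1"}

missed, total, rate = false_negative_rate([tgd, bcg, gc], tool)
print(f"toy example: {missed}/{total} = {rate:.1%}")

# Counts reported in the caption: 26 genes missed out of 285.
reported_rate = 26 / 285
print(f"reported: 26/285 = {reported_rate:.1%}")
```

Genes found only by the tool (CCND1 in the toy data) do not affect this rate, which measures only what the tool failed to find.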

BioGene

BioGene is based on a concept similar to that of MedGene. Instead of disease terms, BioGene searches for associations between human genes and biological themes in the literature (PLoS ONE. 2008; 3(1): e1528). It allows broader searches using any biological or chemical MeSH term, such as “cell cycle”, “lipids” and “tetrahydrofolates”.