More about the science
Bioinformatics software tools are traditionally difficult to use, often taking the form of command-line utilities with cryptic file formats.
Our research seeks to mitigate this problem with several projects:
- Develop MEGA, a sophisticated, user-friendly tool for comparative analysis of DNA and protein sequences.
- Create a database of functionally important differences in paralogous genes in vertebrates.
- Develop a Web resource containing molecular and fossil timescales of vertebrate evolution.
Current software and database development
| MEGA: Molecular Evolutionary Genetics Analysis | www.megasoftware.net |
| FlyExpress: Drosophila melanogaster Expression Pattern Search Engine | www.flyexpress.net |
| TimeTree: A Public Knowledge-Base of Divergence Times Among Organisms | www.timetree.org |
| GRASP: Genomic Resource Access for Stoichioproteomics | www.graspdb.net |
Comparing sequences
To counter the difficulty of traditional bioinformatics tools and to provide biologists with an easy-to-use tool for comparative sequence analysis, Dr. Sudhir Kumar developed Molecular Evolutionary Genetics Analysis (MEGA) in 1993. Now in its fourth major release, MEGA has become one of the most popular and most highly cited bioinformatics tools available. Designed for exploring and analyzing aligned DNA or protein sequences from an evolutionary perspective, the software package has had more than 25,000 unique downloads to date.

MEGA 4 is integrated software that facilitates multiple aspects of large-scale comparative sequence analysis through an intuitive graphical user interface (GUI). It makes useful methods of comparative sequence analysis easily accessible to the scientific community for research and education. Its advanced visualization modules support visual management of sequence data, web-based data mining, phylogenetic tree construction and analysis, and distance matrix visualization, and it includes a rich integrated help viewer.
In addition to advanced visualization, MEGA 4 provides a comprehensive repertoire of computational and statistical modules for the analysis of nucleotide and amino acid sequence data. This repertoire includes methods for evolutionary distance estimation that allow for the relaxation of the homogeneity assumption; methods for phylogenetic tree construction such as Unweighted Pair Group Method with Arithmetic Mean (UPGMA), Neighbour-Joining, Minimum Evolution, and Maximum Parsimony; methods for testing the molecular clock hypothesis; methods for studying positive selection and conducting tests of neutrality; and a robust sequence alignment method based on the ClustalW algorithm.
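To make two of the methods named above concrete, here is a minimal Python sketch of distance estimation followed by UPGMA clustering. It is an illustration only, not MEGA's implementation: the Jukes-Cantor correction stands in for the more general distance methods MEGA offers, and the three short sequences, the function names, and the nested-tuple tree format are all assumptions made for the example.

```python
import math
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of aligned sites that differ (columns with a gap are skipped)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor(p):
    """Jukes-Cantor correction for multiple hits: d = -(3/4) * ln(1 - 4p/3)."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def upgma(names, dist):
    """Cluster taxa by UPGMA.

    dist maps frozenset({a, b}) to the distance between taxa a and b.
    Returns the tree topology as nested tuples (branch lengths are omitted).
    """
    clusters = {name: 1 for name in names}  # cluster label -> number of leaves
    d = dict(dist)
    while len(clusters) > 1:
        # Join the two clusters that are currently closest.
        a, b = min(combinations(clusters, 2), key=lambda pair: d[frozenset(pair)])
        merged = (a, b)
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        # Distance to the new cluster is the leaf-count-weighted average,
        # which equals the unweighted mean over all leaf pairs.
        for other in clusters:
            d[frozenset((merged, other))] = (
                size_a * d[frozenset((a, other))]
                + size_b * d[frozenset((b, other))]
            ) / (size_a + size_b)
        clusters[merged] = size_a + size_b
    return next(iter(clusters))


# Made-up toy alignment, used only to exercise the functions.
seqs = {"A": "ACGTACGTAC", "B": "ACGTACGTAT", "C": "TCGAACTTAT"}
dist = {
    frozenset((x, y)): jukes_cantor(p_distance(seqs[x], seqs[y]))
    for x, y in combinations(seqs, 2)
}
tree = upgma(list(seqs), dist)
print(tree)
```

In this toy data A and B differ at a single site, so they are joined first and C attaches to that pair. UPGMA assumes a constant rate of change across lineages, which is one reason a package would offer it alongside methods such as Neighbour-Joining that do not.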
MEGA 4 is written by researchers for researchers and is designed to reduce the amount of time needed for mundane non-technical tasks in data analysis. MEGA 4 comes with online help showing how to use different aspects of its user interface. Extensive details of the statistical and computational methods available in MEGA are presented in the book "Molecular Evolution and Phylogenetics" (Nei and Kumar, Oxford University Press, 2000).
MEGA 4 is provided to the research community free of charge, and the latest version can be downloaded from the MEGA website (http://www.megasoftware.net).
A Buzz in Developmental Biology
FlyExpress (www.flyexpress.net) is a discovery platform in developmental biology. It contains a digital library of standardized images capturing the spatial expression of thousands of genes at different developmental stages in the fruit fly Drosophila melanogaster. An image-matching search engine and genome-scale summaries of spatiotemporal expression patterns (GEMs) reveal genes with similar patterns of expression in the developing embryo.
Telling Evolutionary Time
Researchers at our institute have also recently built a resource for dating evolutionary events with molecular timescales: the Timescale Website, which provides an intuitive query interface to the Timescale Database, a comprehensive resource for retrieving information on the divergence times of species.
The query interface allows a person to search for molecular divergence time estimates for a specified pair of taxa. The system handles both common and scientific taxon names and can account for simple spelling mistakes. Upon receiving a valid query for a pair of taxa, the Timescale system determines the most inclusive taxonomic groups for the supplied taxa and displays all available molecular records that detail time estimates between members of these groups. The Timescale database may also be queried by author name; the results contain all molecular time estimates associated with that author.
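The two query steps described above, tolerating misspelled names and finding the most inclusive groups for a pair of taxa, can be sketched as follows. This is not the Timescale system's code: the name index, the heavily abbreviated lineages, the function names, and the 0.8 similarity cutoff are hypothetical, and the fuzzy matching uses Python's standard `difflib` module as a stand-in for whatever the site actually does.

```python
import difflib

# Hypothetical index of common and scientific names -> canonical taxon.
NAME_INDEX = {
    "human": "Homo sapiens",
    "homo sapiens": "Homo sapiens",
    "house mouse": "Mus musculus",
    "mus musculus": "Mus musculus",
    "chicken": "Gallus gallus",
    "gallus gallus": "Gallus gallus",
}

# Hypothetical lineages, listed root first and heavily abbreviated.
LINEAGE = {
    "Homo sapiens": ["Amniota", "Mammalia", "Primates", "Homo sapiens"],
    "Mus musculus": ["Amniota", "Mammalia", "Rodentia", "Mus musculus"],
    "Gallus gallus": ["Amniota", "Sauropsida", "Aves", "Gallus gallus"],
}


def resolve_name(query, cutoff=0.8):
    """Map a possibly misspelled name to a canonical taxon, or None."""
    key = query.strip().lower()
    if key in NAME_INDEX:
        return NAME_INDEX[key]
    # Fall back to the single closest known name above the similarity cutoff.
    close = difflib.get_close_matches(key, list(NAME_INDEX), n=1, cutoff=cutoff)
    return NAME_INDEX[close[0]] if close else None


def most_inclusive_groups(taxon_a, taxon_b):
    """Largest groups that contain one taxon of the pair but not the other."""
    for group_a, group_b in zip(LINEAGE[taxon_a], LINEAGE[taxon_b]):
        if group_a != group_b:
            return group_a, group_b
    raise ValueError("taxa are identical or one lineage contains the other")


first = resolve_name("Homo sapeins")   # transposed letters
second = resolve_name("chiken")        # missing letter
groups = most_inclusive_groups(first, second)
print(first, second, groups)
```

Walking both lineages from the root and stopping at the first difference yields the two groups just below the common ancestor. Any time estimate between members of those two groups dates the same split, which is why widening a species-level query in this way can return more records.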
The evolutionary history of life includes two primary components: phylogeny and timescale. Phylogeny refers to the branching order (relationships) of species or other taxa within a group and is crucial for understanding the inheritance of traits and for erecting classifications. However, a timescale is equally important because it provides a way to compare phylogeny directly with the evolution of other organisms and with planetary history such as geology, climate, extraterrestrial impacts, and other features.
The Timetree of Life is the first reference book to synthesize the wealth of information relating to the temporal component of phylogenetic trees. In the past, biologists have relied exclusively upon the fossil record to infer an evolutionary timescale. However, recent revolutionary advances in molecular biology have made it possible to not only estimate the relationships of many groups of organisms, but also to estimate their times of divergence with molecular clocks. The routine estimation and utilization of these so-called 'time-trees' could add exciting new dimensions to biology including enhanced opportunities to integrate large molecular data sets with fossil and biogeographic evidence (and thereby foster greater communication between molecular and traditional systematists). They could help estimate not only ancestral character states but also evolutionary rates in numerous categories of organismal phenotype; establish more reliable associations between causal historical processes and biological outcomes; develop a universally standardized scheme for biological classifications; and generally promote novel avenues of thought in many arenas of comparative evolutionary biology.
This authoritative reference work brings together, for the first time, experts on all major groups of organisms to assemble a timetree of life. The result is a comprehensive resource on evolutionary history which will be an indispensable reference for scientists, educators, and students in the life sciences, earth sciences, and molecular biology. For each major group of organisms, a representative is illustrated and a timetree of families and higher taxonomic groups is shown. Basic aspects of the evolutionary history of the group, the fossil record, and competing hypotheses of relationships are discussed. Details of the divergence times are presented for each node in the timetree, and primary literature references are included. The book is complemented by an online database (www.timetree.org) which allows researchers to both deposit and retrieve data.
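As a rough illustration of how a molecular clock turns sequence divergence into time, the sketch below assumes a strict clock, meaning one constant substitution rate on every lineage. Under that assumption two lineages separated for time T accumulate a pairwise distance d = 2rT, so a split of known age calibrates the rate r, which can then date another split. The numbers and function names are invented for the example, and published timetrees generally rely on more elaborate methods than this.

```python
def calibrate_rate(distance, divergence_time_my):
    """Substitutions per site per million years along a single lineage.

    The factor 2 reflects that both lineages have been accumulating
    changes since they split.
    """
    return distance / (2.0 * divergence_time_my)


def divergence_time(distance, rate):
    """Divergence time in millions of years under a strict molecular clock."""
    return distance / (2.0 * rate)


# Invented calibration: a pair with distance 0.30 that split 300 million years ago.
rate = calibrate_rate(distance=0.30, divergence_time_my=300.0)

# Invented query pair with distance 0.09; under the same rate this
# corresponds to a split roughly 90 million years ago.
estimate = divergence_time(distance=0.09, rate=rate)
print(rate, estimate)
```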