What's new in PhylomeDB 3

Visualization of alignments and phylogenetic trees

The current version of PhylomeDB uses ETE v2.0 to visualize phylogenetic trees, and Jalview Lite v2.4 and trimAl v1.2 to visualize raw and processed alignments. The visualization of trees and alignments is interactive, and users can choose among various display options, such as collapsing parts of a tree or enabling/disabling the display of alternative IDs. Moreover, text strings (e.g., a species or a protein name) can be searched within phylogenetic trees. In addition, further information for each sequence, including hyperlinks to other databases, can be accessed by clicking on the corresponding node.

PhylomeDB unique sequence ID system

A new unique ID system has been developed for this version of PhylomeDB. It solves previous issues related to the inclusion of newer versions of existing proteomes and makes the species source of each sequence more intuitive. In addition, it ensures that the same gene will receive the same ID in subsequent genome versions, unless the sequence has been updated. All PhylomeDB IDs (e.g., Phy0008C1X_HUMAN) start with the code “Phy”, followed by an alphanumeric string of length 7, an underscore symbol “_”, and an alphanumeric species code. This species code corresponds to that assigned by UniprotKB in the “controlled vocabulary of species” or, when no code is present in Uniprot, to the NCBI taxonomic ID (http://www.ncbi.nlm.nih.gov/Taxonomy/). For consistency, older PhylomeDB IDs are still associated with their corresponding sequences and remain searchable. Finally, PhylomeDB IDs are regularly mapped to IDs from other databases such as Ensembl and Uniprot, and the corresponding conversion tables are provided in the download section.
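The ID layout described above can be checked mechanically. The following is a minimal sketch in Python, not part of PhylomeDB itself: the function names are hypothetical, and the restriction to uppercase letters and digits is an assumption based on the single example ID given in the text.

```python
import re

# Pattern derived from the textual description: "Phy", 7 alphanumeric
# characters, an underscore, and an alphanumeric species code.
# Assumption: only uppercase letters and digits occur, as in Phy0008C1X_HUMAN.
PHYLOME_ID = re.compile(r"Phy([0-9A-Z]{7})_([0-9A-Z]+)")


def parse_phylomedb_id(identifier):
    """Return (sequence_code, species_code), or None if the format does not match."""
    match = PHYLOME_ID.fullmatch(identifier)
    if match is None:
        return None
    return match.group(1), match.group(2)


def species_code_is_taxid(species_code):
    """True when the species code is purely numeric, i.e. looks like an NCBI taxonomic ID."""
    return species_code.isdigit()
```

For the example ID from the text, `parse_phylomedb_id("Phy0008C1X_HUMAN")` splits the identifier into the sequence code `0008C1X` and the species code `HUMAN`; a purely numeric species code would indicate the fallback to an NCBI taxonomic ID.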

Cross-links to other databases

External linking to PhylomeDB has been improved, and PhylomeDB is now linked from many sequence, process, and organism reference databases, including Uniprot, EnsemblCompara, Saccharomyces Genome Database (SGD), AphidBase, DeathBase, and PeroxisomeDB. Links to specific entries in PhylomeDB are easily customizable, and detailed instructions are provided in the help section of PhylomeDB. Thus, PhylomeDB can also be regarded as a complementary resource that provides evolutionary information for sequences maintained in other databases, and we encourage administrators of other databases to consider this possibility.

Seed and Collateral trees

A sequence entry is now associated not only with the trees in which that sequence is used as a seed but also with all other trees that contain that sequence. These so-called “collateral” trees may include phylogenies from the same phylome (e.g., trees in which a paralogous protein was used as a seed), as well as trees from other phylomes that contain that sequence. This provides users with additional information on the evolution of the sequence of interest and may serve to evaluate whether a given scenario (e.g., an orthology relationship) is also supported by alternative trees. Indeed, partially overlapping phylogenetic trees from PhylomeDB and other phylogenetic databases are explored by the MetaPhOrs database (http://orthology.phylomedb.org) to provide consistency-based confidence scores for phylogeny-based orthology and paralogy predictions.
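The distinction between seed and collateral trees can be illustrated with a small index. This is a sketch under stated assumptions, not the database's actual schema: each tree is represented here as a hypothetical tuple of phylome ID, seed sequence ID, and the IDs of all sequences in the tree.

```python
from collections import defaultdict


def index_trees(trees):
    """Map each sequence ID to its seed trees and its collateral trees.

    `trees` is an iterable of (phylome_id, seed_id, member_ids) tuples;
    this layout is an illustrative assumption. A tree is identified by
    the pair (phylome_id, seed_id).
    """
    index = defaultdict(lambda: {"seed": [], "collateral": []})
    for phylome_id, seed_id, member_ids in trees:
        tree_key = (phylome_id, seed_id)
        # The tree is a seed tree for the sequence it was built around ...
        index[seed_id]["seed"].append(tree_key)
        # ... and a collateral tree for every other sequence it contains.
        for seq_id in set(member_ids) - {seed_id}:
            index[seq_id]["collateral"].append(tree_key)
    return index
```

With such an index, the collateral trees of a sequence cover both cases named in the text: trees seeded by another protein in the same phylome and trees from other phylomes that happen to include the sequence.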

Data download and programmatic access

An FTP-based download section has been developed to provide easy access to files containing all alignments, trees, and orthology and paralogy predictions associated with every public phylome. In addition, ID conversion tables to Uniprot, Ensembl and other major sequence repositories are provided. Alternatively, for each PhylomeDB entry, a compressed folder containing all information associated with the corresponding sequence can be downloaded. Finally, an Application Programming Interface (API) for accessing PhylomeDB is available through the ETE software. With this API, users can connect to PhylomeDB and search for pre-computed gene phylogenies, download complete phylomes, or obtain the orthology and paralogy predictions provided by the database. This allows programmatic access to PhylomeDB, as well as the automation of any downstream analysis.

Pre-release section

A login-protected private section has been created to store pre-release versions of phylomes, so that they can be used online before publication. This option is currently used mainly by genome sequencing consortia that generate phylomes within their annotation pipelines. PhylomeDB currently has 13 private phylomes that will be released in the coming months.