Database searches: For each protein, a Smith-Waterman search was performed against the proteome database to retrieve a set of proteins with significant similarity (e-value < 1e-05). Only sequences that aligned over a continuous region longer than 50% of the query sequence were selected, and at most 150 sequences were retained per query.
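The filtering step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Hit` record, its field names and the `select_homologs` function are hypothetical, and ranking the surviving hits by e-value before applying the 150-sequence cap is an assumption, since the text does not say how the retained sequences are chosen.

```python
from dataclasses import dataclass

EVALUE_CUTOFF = 1e-5      # from the text: e-value < 1e-05
MIN_QUERY_COVERAGE = 0.5  # from the text: aligned region longer than 50% of the query
MAX_HITS = 150            # from the text: at most 150 sequences


@dataclass
class Hit:
    """Hypothetical record for one search hit; field names are illustrative."""
    subject_id: str
    evalue: float
    aligned_length: int  # length of the continuous aligned region on the query


def select_homologs(hits, query_length):
    """Return hits passing the e-value and coverage filters, capped at MAX_HITS."""
    passing = [
        h for h in hits
        if h.evalue < EVALUE_CUTOFF
        and h.aligned_length > MIN_QUERY_COVERAGE * query_length
    ]
    # Assumption: when more than MAX_HITS pass, keep those with the lowest e-values.
    passing.sort(key=lambda h: h.evalue)
    return passing[:MAX_HITS]
```

Both thresholds are applied as strict inequalities, matching the wording "< 1e-05" and "longer than 50%".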
Multiple sequence alignment: Sets of homologous protein sequences were aligned using three different programs: MUSCLE v3.8.31, MAFFT v6.814b and DIALIGN-TX. Alignments were performed in forward and reverse direction, and the six resulting alignments were combined using M-COFFEE (T-Coffee v8.80). The combined alignment was then trimmed with trimAl v1.3, applying a consistency cutoff of 0.1667 and a gap score cutoff of 0.1.
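As a rough sketch of the forward and reverse alignment idea, assuming that "reverse direction" means aligning the reversed input sequences and then flipping the resulting alignment back so its columns are comparable with the forward one. The function names are hypothetical, and the call to the aligner itself is deliberately left out.

```python
def reverse_sequences(seqs):
    """Reverse every unaligned sequence (input for the 'reverse' alignment run)."""
    return {name: seq[::-1] for name, seq in seqs.items()}


def restore_orientation(alignment):
    """Flip each aligned row back so columns read in the original direction."""
    return {name: row[::-1] for name, row in alignment.items()}
```

Running each of the three aligners once on the original sequences and once on the output of `reverse_sequences`, then passing the latter through `restore_orientation`, would give the six alignments mentioned above.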
Phylogenetic reconstructions: Phylogenetic trees were reconstructed using a Neighbour Joining approach as implemented in BioNJ. The likelihood of this topology was computed, allowing branch-length optimisation, using eight different models (JTT, WAG, MtREV, VT, LG, Blosum62, CpREV and DCMut), as implemented in PhyML 3.0. The two evolutionary models best fitting the data were determined by comparing the likelihoods of the tested models according to the Akaike Information Criterion (AIC). Maximum likelihood trees were then derived using the two selected models. In all cases a discrete gamma-distribution model with four rate categories plus invariant positions was used; the gamma parameter and the fraction of invariant positions were estimated from the data.
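The model selection step can be illustrated with the standard AIC formula, AIC = 2k - 2 ln L, where k is the number of free parameters and ln L the log-likelihood. The sketch below is an assumption-laden illustration: the function names are hypothetical, and a single parameter count is passed for all models on the assumption that they share the same topology and rate-heterogeneity settings, in which case ranking by AIC is equivalent to ranking by likelihood.

```python
def aic(log_likelihood, n_free_params):
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_free_params - 2 * log_likelihood


def best_models(log_likelihoods, n_free_params, n_best=2):
    """Return the names of the n_best models with the lowest AIC.

    log_likelihoods maps model name -> log-likelihood of the fixed topology.
    Assumption: every model has the same number of free parameters.
    """
    ranked = sorted(
        log_likelihoods,
        key=lambda model: aic(log_likelihoods[model], n_free_params),
    )
    return ranked[:n_best]
```

With `n_best=2`, the returned names correspond to the two models that would then be used for the maximum likelihood tree searches.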
Seed species: Arxula adeninivorans