Database searches: For each protein, a Smith-Waterman search was performed against the proteome database to retrieve a set of proteins with significant similarity (e-value < 1e-03). Only sequences that aligned over a continuous region longer than 30% of the query sequence were selected, and at most 150 sequences were retained per query. An artificial database size of 1,000,000 sequences was set so that the results are comparable with those of other phylomes in the database.
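The filtering step above can be sketched as follows. This is a minimal illustration rather than the pipeline's actual code: the Hit record, its field names and the select_homologs function are hypothetical, and ranking hits by e-value before applying the 150-sequence cap is an assumption, since the text does not say how hits are prioritised.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One search hit (hypothetical record; field names are assumptions)."""
    target_id: str
    evalue: float
    aligned_query_length: int  # length of the continuous aligned region on the query


def select_homologs(hits, query_length, max_evalue=1e-3, min_coverage=0.30, max_hits=150):
    """Keep hits passing the e-value and coverage thresholds, capped at max_hits."""
    kept = [
        h for h in hits
        if h.evalue < max_evalue
        and h.aligned_query_length / query_length > min_coverage
    ]
    # Assumption: the most significant hits are retained first.
    kept.sort(key=lambda h: h.evalue)
    return kept[:max_hits]
```

Called with the hits for one query and that query's length, the function returns the retained homologs, best e-value first.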
Multiple sequence alignment: Sets of homologous protein sequences were aligned using three different programs: MUSCLE v3.8, MAFFT v6.712b and kAlign v2.04. Each program was run in the forward and reverse directions, and the six resulting alignments were combined using M-COFFEE. The combined alignment was then trimmed with trimAl v1.4, using a consistency cutoff of 0.1667 and a gap score cutoff of 0.1.
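One common way to obtain a reverse-direction alignment is to reverse every input sequence, run the aligner unchanged, and then reverse each aligned row so that columns read in the original orientation again. The sketch below assumes that approach; the function names are hypothetical, the aligner call itself is left out, and the text does not confirm that this is how the reversal was implemented here.

```python
def reverse_sequences(seqs):
    """Reverse each unaligned sequence before passing it to an aligner."""
    return {name: seq[::-1] for name, seq in seqs.items()}


def restore_orientation(aligned):
    """Reverse each aligned row (gaps included) back to the original direction."""
    return {name: row[::-1] for name, row in aligned.items()}
```

The forward and restored reverse alignments from each of the three programs would then be the six inputs handed to the combining step.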
Phylogenetic reconstructions: Phylogenetic trees were reconstructed using a Neighbour Joining approach as implemented in BioNJ. The likelihood of this topology was computed, allowing branch-length optimisation, under eight different evolutionary models (JTT, WAG, MtREV, VT, LG, Blosum62, Dayhoff and DCMut), as implemented in PhyML 3.0. The two models best fitting the data were determined by comparing the likelihoods of the tested models according to the Akaike Information Criterion (AIC). Maximum likelihood trees were then derived using the two selected models. In all cases a discrete gamma-distribution model with four rate categories plus invariant positions was used; the gamma parameter and the fraction of invariant positions were estimated from the data.
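The model-selection step can be illustrated with the standard AIC formula, AIC = 2k - 2 ln L, where k is the number of free parameters and ln L the log-likelihood; lower values indicate a better fit. The sketch below is an assumption-laden illustration: the function names and input dictionaries are hypothetical, and the log-likelihoods would have to be taken from the PhyML runs.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def best_models(log_likelihoods, n_params, top=2):
    """Return the names of the `top` models with the lowest AIC.

    log_likelihoods: dict of model name -> ln L on the fixed NJ topology
    n_params: dict of model name -> number of free parameters
    """
    scores = {m: aic(lnl, n_params[m]) for m, lnl in log_likelihoods.items()}
    return sorted(scores, key=scores.get)[:top]
```

When every model has the same number of free parameters, ranking by AIC is equivalent to ranking by likelihood, so the two selected models are simply the two with the highest log-likelihoods.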
Seed species: Capsaspora owczarzaki