Universitätsbibliothek Wien

Goodness of fit and robustness of phylogenetic methods in the light of intermittent evolution

Nguyen, Thi Minh Anh (2011) Goodness of fit and robustness of phylogenetic methods in the light of intermittent evolution.
Dissertation, University of Vienna. Fakultät für Lebenswissenschaften
Supervisor: von Haeseler, Arndt

All rights reserved

URN: urn:nbn:at:at-ubw:1-30250.63614.638266-5


Abstract in English

Charles Darwin's theory of 'The Origin of Species' (1859) states that species have evolved from common ancestors. Reconstructing phylogenetic trees to elucidate the evolutionary relationships among species has since become one of the main objectives in biology. In recent years, more and more phylogenetic studies have been published thanks to the advent of massive sequence data and the development of efficient software packages. However, before drawing biological implications from the inferred evolutionary relationships, several issues should be taken into account. This thesis investigates two of these issues in more detail.
First, how can one know that the model used describes the data adequately? We present MISFITS, a novel approach to evaluate the goodness of fit between a phylogenetic model and an alignment, which at the same time pinpoints alignment site patterns that do not fit. MISFITS introduces a minimum number of extra substitutions on the inferred tree to provide a biologically motivated explanation for the deviation between the observed site pattern frequencies and their expectations under the model. The extra substitutions plus the evolutionary model then fully explain the alignment. Moreover, the significance of the required number of extra substitutions can be determined by a parametric bootstrap analysis. MISFITS can therefore reject models that do not fit the data adequately. We demonstrate MISFITS on several examples and present a survey of the goodness of fit of the best-fit models (suggested by model selection) to thousands of alignments in the PANDIT database.
Second, insights into the performance of tree inference methods are essential because they may help to avoid wrong conclusions from the inferred phylogenies due to reconstruction artefacts such as long branch attraction. Among the criteria to evaluate the performance of a phylogenetic method, robustness to model violation is of particular practical importance, as complete a priori knowledge of evolutionary processes is typically unavailable. We first develop ImOSM, a convenient tool to embed intermittent evolution as model violation into an alignment. Intermittent evolution refers to extra substitutions occurring randomly on branches of a tree and thus changing alignment site patterns. We then study the robustness of three widely used phylogenetic methods, maximum likelihood (ML), maximum parsimony (MP) and a distance-based method (BIONJ), to various scenarios of model violation. We show that violation of rates across sites (RaS) heterogeneity, and simultaneous violation of RaS and the transition/transversion ratio along two nonadjacent external branches, hinder all methods' recovery of the true topology for a four-taxon tree. For an eight-taxon balanced tree, these violations cause each of the three methods to infer a different topology: both ML and MP fail, whilst BIONJ reconstructs the true tree. Furthermore, we report that several tests, including the MISFITS test, have enough power to detect such model violations. Thus, for analyses of real data, such reconstruction results require further investigation, and these tests are recommended as a first check.
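
The two ingredients the abstract names, site pattern frequencies and a parametric bootstrap, can be illustrated with a minimal Python sketch. It is not the MISFITS implementation: the function names, the toy alignment and the list of simulated statistics are hypothetical, and the p-value uses the common empirical estimate with a +1 correction, assumed here for illustration.

```python
from collections import Counter


def site_pattern_counts(alignment):
    """Count how often each column (site pattern) occurs in an alignment.

    `alignment` is a list of equal-length sequences, one per taxon.
    """
    lengths = {len(seq) for seq in alignment}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same length")
    # zip(*alignment) walks the alignment column by column.
    return Counter("".join(column) for column in zip(*alignment))


def bootstrap_p_value(observed, simulated):
    """Empirical one-sided p-value: share of simulated statistics that are
    at least as large as the observed one, with a +1 correction."""
    exceed = sum(1 for s in simulated if s >= observed)
    return (exceed + 1) / (len(simulated) + 1)


# Hypothetical four-taxon toy alignment (not taken from the thesis).
toy_alignment = ["ACGTAA", "ACGTAA", "ACTTAA", "GCTTAG"]
patterns = site_pattern_counts(toy_alignment)

# Hypothetical statistics, e.g. a number of extra substitutions, from
# alignments simulated under the fitted model.
simulated_stats = [3, 5, 7, 9, 2, 4, 6, 1, 8]
p_value = bootstrap_p_value(7, simulated_stats)
```

In an actual analysis the simulated statistics would come from alignments generated under the inferred tree and model; here they are placeholder numbers.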

Keywords in English

sequence evolution, phylogeny inference, model test, model adequacy, model violation, maximum likelihood, maximum parsimony, neighbor joining.

Abstract in German

not provided

Keywords in German

not provided

Item Type: Thesis (Dissertation)
Author: Nguyen, Thi Minh Anh
Title: Goodness of fit and robustness of phylogenetic methods in the light of intermittent evolution
Extent: XIV, 113 pp. : graphical illustrations
Institution: University of Vienna
Faculty: Fakultät für Lebenswissenschaften
Publication year: 2011
Language: English (eng)
Supervisor: von Haeseler, Arndt
Assessor: Metzler, Dirk
2nd Assessor: Whelan, Simon
Classification: 54 Computer science > 54.80 Applied computer science
42 Biology > 42.10 Theoretical biology
AC Number: AC08960046
Item ID: 16620
(The PDF layout is identical to the print edition of the thesis.)

Copyright notice: Documents offered in electronic form over data networks are subject without restriction to Austrian copyright law (Urheberrechtsgesetz); in particular, under § 42 UrhG, copies and reproductions are permitted only for personal and private use. For details, see the text of the law.
