An evaluation of alternative methods for constructing phylogenies from whole genome sequence data: a case study with Salmonella

James B. Pettengill; Yan Luo; Steven Davis; Yi Chen; Narjol Gonzalez-Escalona; Andrea Ottesen; Hugh Rand; Marc W. Allard; Errol Strain

doi:10.7717/peerj.620

2014

Abstract

Comparative genomics based on whole genome sequencing (WGS) is increasingly being applied to investigate questions within evolutionary and molecular biology, as well as questions concerning public health (e.g., pathogen outbreaks). Given the impact that conclusions derived from such analyses may have, we have evaluated the robustness of clustering individuals based on WGS data to three key factors: (1) next-generation sequencing (NGS) platform (HiSeq, MiSeq, IonTorrent, 454, and SOLiD), (2) algorithms used to construct a SNP (single nucleotide polymorphism) matrix (reference-based and reference-free), and (3) phylogenetic inference method (FastTreeMP, GARLI, and RAxML). We carried out these analyses on 194 whole genome sequences representing 107 unique Salmonella enterica subsp. enterica ser. Montevideo strains. Reference-based approaches for identifying SNPs produced trees that were significantly more similar to one another than those produced under the reference-free approach. Topologies inferred using a core matrix (i.e., no missing data) were significantly more discordant than those inferred using a non-core matrix that allows for some missing data. However, allowing for too much missing data likely results in a high false discovery rate of SNPs. When analyzing the same SNP matrix, we observed that the more thorough inference methods implemented in GARLI and RAxML produced more similar topologies than FastTreeMP. Our results also confirm that reproducibility varies among NGS platforms, with the MiSeq showing the lowest number of pairwise differences among replicate runs. Our investigation into the robustness of clustering patterns illustrates the importance of carefully considering how data from different platforms are combined and analyzed. We found clear differences in the topologies inferred, and certain methods performed significantly better than others for discriminating between the highly clonal organisms investigated here. The methods supported by our results represent a preliminary set of guidelines and a step towards developing validated standards for clustering based on whole genome sequence data.
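
The distinction the abstract draws between a core matrix (no missing data) and a non-core matrix (some missing data tolerated) can be illustrated with a small sketch. This is not the authors' pipeline: the function names `filter_snp_columns` and `pairwise_differences`, the `max_missing_fraction` parameter, the missing-data symbols, and the toy alignment are all assumptions made for illustration.

```python
from itertools import combinations

# Assumption: '-' and 'N' mark missing or uncalled bases in the toy data.
MISSING = {"-", "N"}


def filter_snp_columns(alignment, max_missing_fraction=0.0):
    """Keep variable columns whose missing fraction is within the threshold.

    A threshold of 0.0 gives a "core" matrix (no missing data);
    a larger threshold gives a "non-core" matrix.
    """
    names = list(alignment)
    seqs = [alignment[name] for name in names]
    n_samples = len(seqs)
    kept = []
    for column in zip(*seqs):
        called = [base for base in column if base not in MISSING]
        missing_fraction = (n_samples - len(called)) / n_samples
        # A column is a SNP only if the called bases disagree.
        if missing_fraction <= max_missing_fraction and len(set(called)) > 1:
            kept.append(column)
    return {name: "".join(col[i] for col in kept) for i, name in enumerate(names)}


def pairwise_differences(matrix):
    """Count differing sites per sample pair, skipping sites missing in either."""
    counts = {}
    for a, b in combinations(matrix, 2):
        counts[(a, b)] = sum(
            1
            for x, y in zip(matrix[a], matrix[b])
            if x not in MISSING and y not in MISSING and x != y
        )
    return counts


# Hypothetical toy alignment: four samples, five aligned sites.
toy = {
    "s1": "ACGTA",
    "s2": "ACGTC",
    "s3": "AT-TC",
    "s4": "ATANC",
}

core = filter_snp_columns(toy, max_missing_fraction=0.0)
non_core = filter_snp_columns(toy, max_missing_fraction=0.25)
```

In this toy case the non-core matrix retains one SNP column that the core matrix drops, which changes the pairwise counts for some sample pairs. That is the kind of trade-off the abstract describes: more sites to discriminate closely related isolates, at the risk of admitting false SNPs if the threshold is too permissive.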

Reference Key	pettengill2014peerjan
Authors	James B. Pettengill; Yan Luo; Steven Davis; Yi Chen; Narjol Gonzalez-Escalona; Andrea Ottesen; Hugh Rand; Marc W. Allard; Errol Strain
Journal	PeerJ
Year	2014
DOI	10.7717/peerj.620
URL	https://peerj.com/articles/620/ https://doi.org/10.7717/peerj.620
Keywords	outbreak; next generation sequencing; phylogenetics
