TADA: phylogenetic augmentation of microbiome samples enhances phenotype classification.

Erfan Sayyari, Ban Kawas, Siavash Mirarab

doi:10.1093/bioinformatics/btz394

2019

Abstract

Learning associations of traits with the microbial composition of a set of samples is a fundamental goal in microbiome studies. Recently, machine learning methods have been explored for this goal, with some promise. However, in comparison to other fields, microbiome data are high-dimensional and not abundant, leading to a high-dimensional, low-sample-size, under-determined system. Moreover, microbiome data are often unbalanced and biased. Given such training data, machine learning methods often fail to perform a classification task with sufficient accuracy. Lack of signal is especially problematic when classes are represented in an unbalanced way in the training data, with some classes under-represented. The presence of inter-correlations among subsets of observations further compounds these issues. As a result, machine learning methods have had only limited success in predicting many traits from the microbiome. Data augmentation, which consists of building synthetic samples and adding them to the training data, is a technique that has proved helpful for many machine learning tasks.

In this paper, we propose a new data augmentation technique for classifying phenotypes based on the microbiome. Our algorithm, called TADA, uses available data and a statistical generative model to create new samples augmenting existing ones, addressing issues of low sample size. In generating new samples, TADA takes into account phylogenetic relationships between microbial species. On two real datasets, we show that adding these synthetic samples to the training set improves the accuracy of downstream classification, especially when the training data have an unbalanced representation of classes.

TADA is available at https://github.com/tada-alg/TADA. Supplementary data are available at Bioinformatics online.
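To make the idea of augmenting an unbalanced training set concrete, here is a minimal Python sketch. It is not the TADA algorithm: it ignores the phylogeny entirely and uses a plain multinomial redraw of each real sample as a stand-in generative model. The function name `augment_minority`, the toy count vectors, and the class labels are all hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def augment_minority(samples, labels, seed=0):
    """Balance classes by appending synthetic count vectors.

    Each synthetic sample is a multinomial redraw of a randomly chosen
    real sample from the same class: same total read count, with feature
    probabilities proportional to the real sample's counts.
    Assumption: this is a generic stand-in, NOT TADA's phylogeny-aware
    generative model.
    """
    rng = random.Random(seed)
    class_sizes = Counter(labels)
    target = max(class_sizes.values())  # size of the largest class

    new_samples = [list(s) for s in samples]
    new_labels = list(labels)

    for cls, size in class_sizes.items():
        members = [s for s, y in zip(samples, labels) if y == cls]
        for _ in range(target - size):
            template = rng.choice(members)
            depth = sum(template)
            if depth == 0:
                # Nothing to resample from an all-zero sample; copy it.
                new_samples.append(list(template))
            else:
                # Draw `depth` reads, each landing on a feature with
                # probability proportional to the template's counts.
                draws = rng.choices(range(len(template)),
                                    weights=template, k=depth)
                counts = Counter(draws)
                new_samples.append([counts.get(i, 0)
                                    for i in range(len(template))])
            new_labels.append(cls)

    return new_samples, new_labels


# Toy example: three "healthy" samples and one "disease" sample,
# each a vector of read counts over three hypothetical taxa.
X = [[10, 0, 5], [8, 1, 6], [7, 2, 4], [0, 12, 3]]
y = ["healthy", "healthy", "healthy", "disease"]
X_aug, y_aug = augment_minority(X, y, seed=1)
print(Counter(y_aug))
```

The augmented set would then be passed to any downstream classifier in place of the original training data. A phylogeny-aware method such as the one the abstract describes would replace the flat multinomial redraw with a model that uses the tree relating the taxa.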

Reference Key	sayyari2019tadabioinformatics
Authors	Sayyari, Erfan; Kawas, Ban; Mirarab, Siavash
Journal	Bioinformatics
Year	2019
DOI	10.1093/bioinformatics/btz394
URL	https://doi.org/10.1093/bioinformatics/btz394
