Jaccard/Tanimoto similarity test and estimation methods for biological presence-absence data.

Clicks: 493
ID: 72295
2019
Article Quality & Performance Metrics
Overall Quality Improving Quality
0.0 /100
Combines engagement data with AI-assessed academic quality
AI Quality Assessment
Not analyzed
Abstract
A survey of presences and absences of specific species across multiple biogeographic units (or bioregions) are used in a broad area of biological studies from ecology to microbiology. Using binary presence-absence data, we evaluate species co-occurrences that help elucidate relationships among organisms and environments. To summarize similarity between occurrences of species, we routinely use the Jaccard/Tanimoto coefficient, which is the ratio of their intersection to their union. It is natural, then, to identify statistically significant Jaccard/Tanimoto coefficients, which suggest non-random co-occurrences of species. However, statistical hypothesis testing using this similarity coefficient has been seldom used or studied.We introduce a hypothesis test for similarity for biological presence-absence data, using the Jaccard/Tanimoto coefficient. Several key improvements are presented including unbiased estimation of expectation and centered Jaccard/Tanimoto coefficients, that account for occurrence probabilities. The exact and asymptotic solutions are derived. To overcome a computational burden due to high-dimensionality, we propose the bootstrap and measurement concentration algorithms to efficiently estimate statistical significance of binary similarity. Comprehensive simulation studies demonstrate that our proposed methods produce accurate p-values and false discovery rates. The proposed estimation methods are orders of magnitude faster than the exact solution, particularly with an increasing dimensionality. We showcase their applications in evaluating co-occurrences of bird species in 28 islands of Vanuatu and fish species in 3347 freshwater habitats in France. The proposed methods are implemented in an open source R package called jaccard (https://cran.r-project.org/package=jaccard).We introduce a suite of statistical methods for the Jaccard/Tanimoto similarity coefficient for binary data, that enable straightforward incorporation of probabilistic measures in analysis for species co-occurrences. Due to their generality, the proposed methods and implementations are applicable to a wide range of binary data arising from genomics, biochemistry, and other areas of science.
Reference Key
chung2019jaccardtanimotobmc Use this key to autocite in the manuscript while using SciMatic Manuscript Manager or Thesis Manager
Authors Chung, Neo Christopher;Miasojedow, BłaŻej;Startek, Michał;Gambin, Anna;
Journal BMC Bioinformatics
Year 2019
DOI
10.1186/s12859-019-3118-5
URL
Keywords

Citations

No citations found. To add a citation, contact the admin at info@scimatic.org

No comments yet. Be the first to comment on this article.