Methods for determining the statistical significance of enrichment or depletion of gene ontology classifications under weighted membership

2012

Abstract

High-throughput molecular biology studies, such as microarray assays of gene expression, two-hybrid experiments for detecting protein interactions, or ChIP-Seq experiments for transcription factor binding, often result in an interesting set of genes, for example genes that are co-expressed or bound by the same factor. One way of understanding the biological meaning of such a set is to consider which processes or functions, as defined in an ontology, are over-represented (enriched) or under-represented (depleted) among genes in the set. Usually, the significance of enrichment or depletion scores is based on simple statistical models and on the membership of genes in different classifications. We consider the more general problem of computing p-values for arbitrary integer additive statistics, or weighted membership functions. Such membership functions can be used to represent, for example, prior knowledge on the role of certain genes or classifications, differential importance of different classifications or genes to the experimenter, hierarchical relationships between classifications, or different degrees of interestingness or evidence for specific genes. We describe a generic dynamic programming algorithm that can compute exact p-values for arbitrary integer additive statistics. We also describe several optimizations for important special cases, which can provide orders-of-magnitude speedups in the computations. We apply our methods to datasets describing oxidative phosphorylation and parturition and compare p-values based on computations of several different statistics for measuring enrichment. We find major differences between the p-values resulting from these statistics, and find that some statistics recover gold-standard annotations of the data better than others. Our work establishes a theoretical and algorithmic basis for far richer notions of enrichment or depletion of gene sets with respect to gene ontologies than have previously been available.
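To make the idea of an exact p-value for an integer additive statistic concrete, here is a minimal sketch of one way such a dynamic program could look. It is an illustration under stated assumptions, not the authors' algorithm: the abstract does not spell out the null model, so the sketch assumes the gene set is drawn uniformly without replacement from a background list, and that the statistic is the sum of integer per-gene weights. The function name `enrichment_p_value` and its parameters are hypothetical.

```python
from fractions import Fraction
from math import comb


def enrichment_p_value(weights, set_size, observed):
    """Exact P(S >= observed) under an ASSUMED null model: S is the total
    integer weight of `set_size` genes drawn uniformly at random, without
    replacement, from the background genes whose weights are in `weights`.
    """
    if not 0 <= set_size <= len(weights):
        raise ValueError("set_size must be between 0 and the number of genes")

    # counts[k][s] = number of k-gene subsets of the genes processed so far
    # whose weights sum to s (dictionaries allow negative weights too).
    counts = [dict() for _ in range(set_size + 1)]
    counts[0][0] = 1

    for w in weights:
        # Go through subset sizes in decreasing order so that each gene is
        # added to any given subset at most once (0/1 knapsack style).
        for k in range(set_size - 1, -1, -1):
            for s, c in list(counts[k].items()):
                counts[k + 1][s + w] = counts[k + 1].get(s + w, 0) + c

    total = comb(len(weights), set_size)  # number of equally likely gene sets
    tail = sum(c for s, c in counts[set_size].items() if s >= observed)
    return Fraction(tail, total)  # exact rational p-value
```

With 0/1 weights (plain membership) the computation reduces to the upper tail of a hypergeometric distribution: for 3 annotated genes in a background of 10 and a gene set of size 4, `enrichment_p_value([1, 1, 1, 0, 0, 0, 0, 0, 0, 0], 4, 2)` returns `Fraction(1, 3)`. Depletion can be handled in the same way by summing the lower tail instead. The running time of this sketch grows with the number of genes, the set size, and the range of attainable sums.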

Reference Key	eiacucci2012frontiersmethods
Authors	Ernesto Iacucci; Hans H. Zingg; Theodore J. Perkins
Journal	Frontiers in Genetics
Year	2012
DOI	10.3389/fgene.2012.00024
URL	http://journal.frontiersin.org/Journal/10.3389/fgene.2012.00024/full https://doi.org/10.3389/fgene.2012.00024
Keywords	gene ontology; dynamic programming; enrichment; depletion; weighted membership; genetics
