Solr-Plant: efficient extraction of plant names from text.

Abstract
The retrieval of plant-related information is a challenging task due to variations in species name mentions as well as spelling or typographical errors across data sources. Scalable solutions are needed for identifying plant name mentions from text and resolving them to accepted taxonomic names.

An Apache Solr-based fuzzy matching system enhanced with the Smith-Waterman alignment algorithm ("Solr-Plant") was developed for mapping and resolution to a plant name and synonym thesaurus. Evaluation of Solr-Plant suggests promising results in terms of both accuracy and processing efficiency on misspelled species names from two benchmark datasets: (1) SALVIAS and (2) National Center for Biotechnology Information (NCBI) Taxonomy. Additional evaluation using the S800 text corpus also reflects high precision and recall. The latest version of the source code is available at https://github.com/bcbi/SolrPlantAPI. A REST-compliant web interface and service for Solr-Plant is hosted at http://bcbi.brown.edu/solrplant.

Automated techniques are needed for efficient and accurate identification of knowledge linked with biological scientific names. Solr-Plant complements the current state-of-the-art in terms of both efficiency and accuracy in identification of names restricted at species level. The approach can be extended to identify broader groups of organisms at different taxonomic levels. The results reflect potential utility of Solr-Plant as a data mining tool for extracting and correcting plant species names.
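As an illustration of the alignment step named in the abstract, the following is a minimal sketch of Smith-Waterman local alignment scoring used to rank candidate names against a misspelled query. It is not the Solr-Plant implementation: the function names, the scoring parameters (match, mismatch, gap), and the small candidate list are assumptions chosen for the example, and the candidates stand in for whatever a fuzzy search index would return.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between strings a and b.

    Scoring values are illustrative assumptions, not taken from Solr-Plant.
    """
    a, b = a.lower(), b.lower()
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Score for aligning a[i-1] with b[j-1]
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def best_match(query, candidates):
    """Pick the candidate with the highest alignment score (hypothetical helper)."""
    return max(candidates, key=lambda name: smith_waterman_score(query, name))


# Hypothetical candidate list, e.g. as returned by a fuzzy index lookup
candidates = ["Quercus alba", "Quercus rubra", "Salvia alba"]
print(best_match("Quercus albaa", candidates))
```

In a pipeline of the kind the abstract describes, a fuzzy search would first narrow the thesaurus to a handful of candidates, and an alignment score like this one would then re-rank them.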
Authors Sharma, Vivekanand; Restrepo, Maria Isabel; Sarkar, Indra Neil
Journal BMC Bioinformatics
Year 2019
DOI 10.1186/s12859-019-2874-6