Acumenta Gene Thesaurus™ Reaches New Milestone, Enabling Higher Quality Searches of Biomedical Literature
Boston, MA – March 25, 2008: Acumenta Corporation
(www.acumenta.com) announces a major new release of its Gene Thesaurus™ of Gene and Protein Nomenclature.
The Gene Thesaurus™ is produced through a combination of Acumenta technology and human curation.
The new release being announced today contains nearly 12,000 aliases (synonyms) that are not included in public
gene nomenclature repositories. When used, these aliases greatly improve the quality of results of searches for
biomedical literature that mentions genes or proteins.
Acumenta technology interrogates the world's most important genomic databases on a regular basis to build
and maintain a comprehensive repository of human gene and protein nomenclature. These databases
include NCBI's Entrez Gene, UniProt and HUGO as well as other sources.
Acumenta has identified and curated “stealth aliases”, which are synonyms that do not appear in these
public nomenclature repositories, yet are widely used to reference many genes and proteins. If these
synonyms are omitted from a search, recall is seriously compromised. The new release of the Acumenta
Gene Thesaurus contains 11,856 stealth aliases for over 4,274 genes. Collectively, these aliases identify
thousands of scientific abstracts, in the NCBI PubMed database for example, that would be missed if only
the standard gene synonyms were used.
It is well understood in academic research, biotechnology and pharmaceutical R&D organizations that there
is a serious quality problem in text searches on gene and protein topics. Low-quality search results
(low precision and recall) are caused by the following issues:
- Highly ambiguous gene names result in poor relevancy (precision).
- The abundance of gene name aliases (synonyms) results in poor recall if they aren't used in a search.
- Continuous flux in the nomenclature results in poor precision and poor recall if changes and new synonyms aren't incorporated.
The purpose of the Acumenta Gene Thesaurus™ is to provide an economical means for achieving comprehensive
recall and high precision on text searches of biomedical literature for references to genes and proteins.
“Through use of proprietary technology, extensive human curation and a continuous update process,
Acumenta has created a best-of-breed solution for incorporation in corporate taxonomies, search engines and
dictionary systems,” says Stephen C. Taylor, Acumenta's CEO. “In addition to the stealth
aliases which improve recall, we have identified 1,860 ambiguous terms in the public nomenclature for 1,532 genes
that damage precision of searches.”
“The Gene Thesaurus™ enables us to maintain a very timely and accurate definition of gene and protein
references in the biomedical literature. It is a key component of our literature mining-based gene set enrichment
product, Literature Lab™,” said Paul R. Martinez, Acumenta's Vice President of Sales and Marketing.
“Our customers use it to provide high quality archived research and current awareness for their
scientists, marketers and managers.”
Founded in 2001, Acumenta Corporation is based in Boston, Massachusetts. Acumenta is a software and information
company serving biotechnology and pharmaceutical firms and non-profit research organizations. Acumenta builds
applications on the Acumenta platform that automate information gathering, analysis and management tasks
associated with Web-based and proprietary external and internal databases. Acumenta's curated Thesaurus of Gene
and Protein Nomenclature and software products improve both the efficiency and quality of research in the life
sciences R&D function.
Contact: Paul Martinez
Title: Vice President Sales & Marketing
Phone: 617-379-0691
Email: pmartinez@Acumenta.com
URL: www.Acumenta.com