Acumenta Gene Thesaurus™ Reaches New Milestone,
Enabling Higher Quality Searches of Biomedical Literature

Boston, MA – March 25, 2008: Acumenta Corporation (www.acumenta.com) announces a major new release of its Gene Thesaurus™ of Gene and Protein Nomenclature.

The Gene Thesaurus™ is produced through a combination of Acumenta technology and human curation.   The new release being announced today contains nearly 12,000 aliases (synonyms) that are not included in public gene nomenclature repositories.  When used, these aliases greatly improve the quality of results of searches for biomedical literature that mentions genes or proteins.

Acumenta technology interrogates the world's most important genomic databases on a regular basis to build and maintain a comprehensive repository of human gene and protein nomenclature.  These databases include NCBI's Entrez Gene, UniProt and HUGO as well as other sources.

Acumenta has identified and curated “stealth aliases”, which are synonyms that do not appear in these public nomenclature repositories, yet are widely used to reference many genes and proteins.  If these synonyms are omitted from a search, recall is seriously compromised.  The new release of the Acumenta Gene Thesaurus contains 11,856 stealth aliases for over 4,274 genes.  Collectively, these aliases identify thousands of scientific abstracts (in the NCBI PubMed database, for example) that would be missed if only the standard gene synonyms were used.
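The effect of omitted synonyms on recall can be illustrated with a minimal Python sketch.  It is not Acumenta's implementation: the gene name GENEX, its aliases, the four toy abstracts and all function names are hypothetical, chosen only to show how expanding a query with one extra alias changes which documents are retrieved.

```python
import re
from typing import Dict, List, Set

def expand_query(term: str, thesaurus: Dict[str, List[str]]) -> List[str]:
    """Return the search term plus every alias the thesaurus lists for it."""
    return [term] + thesaurus.get(term, [])

def search(docs: Dict[int, str], terms: List[str]) -> Set[int]:
    """Return ids of documents mentioning any term (whole words, case-insensitive)."""
    patterns = [re.compile(r"\b" + re.escape(t) + r"\b", re.IGNORECASE) for t in terms]
    return {doc_id for doc_id, text in docs.items()
            if any(p.search(text) for p in patterns)}

def recall(retrieved: Set[int], relevant: Set[int]) -> float:
    """Fraction of the relevant documents that the search retrieved."""
    return len(retrieved & relevant) / len(relevant)

# Hypothetical toy corpus: documents 1-3 all concern the same (made-up) gene.
docs = {
    1: "GENEX expression is elevated in tumors.",
    2: "We characterise GNX1 knockout mice.",
    3: "Protein X kinase phosphorylates its substrate.",
    4: "Unrelated study of cell migration.",
}
relevant = {1, 2, 3}

# Hypothetical thesauri: one with standard synonyms only, one with an added alias.
standard = {"GENEX": ["GNX1"]}
with_stealth = {"GENEX": ["GNX1", "protein X kinase"]}

recall_standard = recall(search(docs, expand_query("GENEX", standard)), relevant)
recall_stealth = recall(search(docs, expand_query("GENEX", with_stealth)), relevant)
print(f"standard synonyms only: {recall_standard:.2f}")
print(f"with stealth alias:     {recall_stealth:.2f}")
```

In this toy corpus the third abstract is retrieved only when the additional alias is part of the query, which is the kind of gap in recall the paragraph above describes.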

It is well understood in academic research, biotechnology and pharmaceutical R&D organizations that there is a serious quality problem in text searches on gene and protein topics.  Low-quality search results (low precision and recall) are caused by the following issues:

  • Highly ambiguous gene names result in poor relevance (precision).
  • The abundance of gene name aliases (synonyms) results in poor recall if the aliases aren't included in a search.
  • Continuous flux in the nomenclature results in poor precision and poor recall if changes and new synonyms aren't incorporated.

The purpose of the Acumenta Gene Thesaurus™ is to provide an economical means for achieving comprehensive recall and high precision on text searches of biomedical literature for references to genes and proteins.

“Through use of proprietary technology, extensive human curation and a continuous update process, Acumenta has created a best-of-breed solution for incorporation in corporate taxonomies, search engines and dictionary systems,” says Stephen C. Taylor, Acumenta's CEO.  “In addition to the stealth aliases which improve recall, we have identified 1,860 ambiguous terms in the public nomenclature for 1,532 genes that damage precision of searches.”

“The Gene Thesaurus™ enables us to maintain a very timely and accurate definition of gene and protein references in the biomedical literature.  It is a key component of our literature-mining-based gene set enrichment product, Literature Lab™,” said Paul R. Martinez, Acumenta's Vice President of Sales and Marketing.  “Our customers use it to provide high-quality archived research and current awareness for their scientists, marketers and managers.”

Founded in 2001, Acumenta Corporation is based in Boston, Massachusetts.  Acumenta is a software and information company serving biotechnology and pharmaceutical firms and non-profit research organizations.  Acumenta builds applications on the Acumenta platform that automate information gathering, analysis and management tasks associated with Web-based and proprietary external and internal databases.  Acumenta's curated Thesaurus of Gene and Protein Nomenclature and software products improve both the efficiency and quality of research in the life sciences R&D function.


Contact: Paul Martinez
Title: Vice President Sales & Marketing
Phone: 617-379-0691
Email: pmartinez@Acumenta.com
URL: www.Acumenta.com

 
