A lab that isn't able to quickly make use of the literature is at a competitive disadvantage.
Find out what’s missing from the analyses of your transcriptomic datasets.
The biomedical literature record within PubMed is a rich source of associative information, but it is too vast to permit comprehensive manual review. Since 1990, more than 20 million new publications have appeared: 12.5 million mention one or more human genes, 8.5 million mention one or more pathways, 2.2 million discuss neoplastic diseases, 1.5 million discuss cardiovascular disease (CVD), and so on. Meanwhile, modern high-content genomic technologies are producing data at rates that outpace meaningful interpretation by legacy analysis platforms and bioinformatics techniques.
We have created Literature Lab™, the only data mining platform that identifies statistically significant associations between gene lists and key concepts in the literature.
WHY IS LITERATURE LAB’S APPROACH TO GENE DATASET ENRICHMENT SO IMPORTANT?
Virtually all annotation engines, whatever their algorithm, work with finite groups of genes. Some take gene lists and work on those; others, like GSEA, take the entire data set of change values.
They simply map those genes onto your data, whether it is a gene list or a full data set, and ask: “Is this gene set doing something non-random?” This approach does have value. Consider a good tool like DAVID: it has a good search algorithm, a ranking system, and large databases. It is fast, and it can evaluate a gene list for enrichment (non-random patterns) by scanning through every list in its database. However, this approach is completely dependent on the nature of the constituent gene lists. So where do gene lists come from? They come from different types of information, for example the Gene Ontology (GO). The GO categories simply try to place genes into bins of functional significance or relatedness. There's nothing wrong with that, but those lists are all finite. Once you are in that TNF-alpha pathway, that's it: they have 35 genes in that pathway, job done.
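The fixed-list approach described above can be illustrated with a minimal Python sketch of an over-representation test. This is a generic hypergeometric test written for illustration only; it is not the algorithm used by DAVID, GSEA, or Literature Lab™, and the function names (hypergeom_enrichment_p, enrich), the gene identifiers, and the set name "toy_pathway" are all hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(universe_size, set_size, list_size, overlap):
    """Probability of seeing at least `overlap` annotated genes when
    `list_size` genes are drawn at random from a universe that contains
    `set_size` annotated genes (upper tail of the hypergeometric distribution)."""
    max_overlap = min(set_size, list_size)
    total = comb(universe_size, list_size)
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, list_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


def enrich(gene_list, gene_sets, universe):
    """Score a gene list against every predefined (finite) gene set.
    Returns (set name, overlapping genes, p-value) tuples, smallest p first."""
    universe = set(universe)
    genes = set(gene_list) & universe
    results = []
    for name, members in gene_sets.items():
        members = set(members) & universe
        hits = genes & members
        p = hypergeom_enrichment_p(len(universe), len(members), len(genes), len(hits))
        results.append((name, sorted(hits), p))
    return sorted(results, key=lambda r: r[2])


# Toy data: a 20-gene universe and one 5-gene "pathway" (all names are made up).
universe = [f"G{i}" for i in range(20)]
gene_sets = {"toy_pathway": ["G0", "G1", "G2", "G3", "G4"]}
my_list = ["G0", "G1", "G2", "G10"]

for name, hits, p in enrich(my_list, gene_sets, universe):
    print(name, hits, round(p, 4))
```

Note that a gene can only contribute to the score if it already belongs to one of the predefined sets. In this toy example, G10 is ignored no matter how relevant it might be, which is the limitation of finite gene lists discussed above.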
Literature Lab™ finds enrichment in real time between the literature and the gene data set, and the analysis is totally unique to that gene list. It is not dependent on any a priori knowledge. Every one of your gene lists has unique qualities, and is accordingly viewed as a unique entity in the Literature Lab™ PLUS analysis. For example, in the case study below, which uses data from two different days, Literature Lab™ was able to tease out statistically significant associations with the nitric oxide (NO) pathway (p-values of 0.0005 and 0.0006). Although there were some overlapping genes, there was plenty of non-overlap, as shown in the report, and each list mapped separately to the nitric oxide pathway. This happened because the genes in each list were associated with terms that connect to the NO pathway in the literature. Each list was unique, and each one in its own right was related to NO. This is very important because Literature Lab™ PLUS didn't interrogate the data with a preset list of genes to say “this is the NO group, how is the data doing against this list?” The analysis is dynamic, depending on the list and the identified associations.
A Literature Lab™ Use Case
Situation: A bioinformatics expert at a major research university was asked to analyze data from a time course experiment on 7 patients. The scientist needed to “push” the data in order to see differential gene expression activity. The data proved to be uniform, and many small changes were observed to be operating in a coordinated fashion. However, standard enrichment analysis tools were unable to provide useful information about the functional roles of the gene sets.
Functional Analysis: Literature Lab™ was used to compare the data from days 3 and 7. The Nitric Oxide Signaling pathway was immediately shown to be strongly associated with both data sets, and Inflammatory Response was strongly associated with day 7. These findings were highly relevant to the biology under analysis.
Results: The view below is a composite of several Literature Lab screens showing the NOS pathway associations on both days 3 and 7. The insets list the genes driving the association at each time point, and indicate that the gene sets are not highly overlapping.
There is some overlap, but the gene sets are distinctly different.
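The degree of overlap between two driver gene lists can be quantified with a simple set comparison. The Python sketch below is illustrative only: the function name overlap_summary and the gene identifiers (GENE_A, GENE_B, and so on) are hypothetical placeholders, not the genes from this case study.

```python
def overlap_summary(list_a, list_b):
    """Compare two gene lists: shared genes, genes unique to each list,
    and the Jaccard index (shared genes / all distinct genes)."""
    a, b = set(list_a), set(list_b)
    shared = a & b
    union = a | b
    jaccard = len(shared) / len(union) if union else 0.0
    return {
        "shared": sorted(shared),
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
        "jaccard": jaccard,
    }


# Placeholder driver lists for two time points (names are made up).
day3 = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
day7 = ["GENE_C", "GENE_D", "GENE_E", "GENE_F", "GENE_G"]

summary = overlap_summary(day3, day7)
print(summary["shared"], round(summary["jaccard"], 2))
```

A Jaccard index near 0 means the two lists share almost no genes, and a value near 1 means they are nearly identical. Lists that both map to the same pathway while having a low index would match the pattern described here.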
Plotting the expression of the genes elevated in patients versus controls between days 3 and 7 clearly illustrates the differential regulation between the two phenotypes and shows that the drivers are the NOS signaling genes.
In this case study, the data sets from the two days had significant non-overlap, but each one had its own statistically significant association with the Nitric Oxide pathway, along with the Inflammatory Response pathway on day 7. Each list had genes that were associated with these pathways in the literature, but there were not enough canonical genes or other data in these lists to enable enrichment by conventional gene list- and ontology-based tools.
Analysis of the metabolites in the patients confirmed the associations revealed by Literature Lab™.
Note that the Use Case employed Literature Lab’s ability to show trends and identify similarities and differences in the results of multiple dataset analyses.