Mastermind’s Genomic Search Engine Uncovers 594% More Articles per Variant and 63% More Variants per Gene in a Head-to-Head Test Against the COSMIC Variant Database

The Mastermind Genomic Search Engine has a number of meaningful advantages over the Catalogue of Somatic Mutations in Cancer (COSMIC) as a tool for searching genomic evidence: it reduces the time required to curate variants and results in fewer missed articles containing evidence for a variant.

In the test described below, Mastermind found a greater number of articles and variants while matching the sensitivity and specificity of COSMIC at both the article and variant level. Using Mastermind as a primary source for variant search, supplemented by COSMIC, can ensure the most comprehensive and accurate results.

The Challenge of Variant Interpretation with COSMIC

Advances in DNA sequencing technology and precision medicine have not been accompanied by improvements in analytical techniques. Specifically, legacy tools such as HGMD and COSMIC used in genomic interpretation lack the ability to quickly and accurately interpret genomic data from the most recent scientific publications, which limits the extent to which the legacy databases can be used in making informed, up-to-date diagnostic, prognostic, and therapeutic decisions.

The practical limitations of legacy tools are due in large part to their lack of automation.

Specifically, legacy variant interpretation tools like the Human Gene Mutation Database (HGMD) and the Catalogue of Somatic Mutations in Cancer (COSMIC) have been built through manual curation over the past 15-20 years. Because of this reliance on manual effort, those tools are unable to keep pace with the rapid influx of new genomic research and are not readily scalable for use in personalized diagnostic decisions. To stay up to date with the latest peer-reviewed publications, variant curation needs a more automated approach to interpreting DNA sequencing data.

The Mastermind (MM) Genomic Search Engine is an automated index of the genomic literature that avoids the pitfalls of manually curated genomic databases. To determine the validity of an automated approach to assembling such a genomic search engine, we compared the results in MM to a popular legacy database used in genetic analysis of cancer-associated variants: the Catalogue of Somatic Mutations in Cancer (COSMIC). We were interested in assessing both tools across two parameters:

  • Sensitivity: Is Mastermind able to find the same articles and variants identified in COSMIC?
  • Specificity: Is Mastermind able to identify the same quantity or more relevant articles and variants as COSMIC?

This blog post presents the results of this study.

Editor’s note: This study was performed in November of 2016 with a pre-release version of the Mastermind Genomic Search Engine when it had indexed only 2.7 million full text articles. Since that time, Mastermind has indexed nearly 6 million full text genomic articles, resulting in twice the content coverage that was available at the time of this study. The Mastermind content corpus continues to grow as it indexes all new genomic literature each week. The test was only performed on variants found in primary publications and did not include supplemental data.

Methodology

Mastermind organizes all variant data from the medical literature by disease and gene. This method gives the user easy access to all published variant information, even if the variant is mentioned only once among millions of scientific articles. We compared the results of Mastermind’s automated process against COSMIC’s manually curated database for both a comprehensive collection of variants and a random selection of genes.

To assess the comprehensiveness of the Mastermind results, more than 9,000 protein-coding variants in 407 cancer-associated genes were randomly selected from the COSMIC v76 database extracted from 22,844 articles. Each variant was searched for in Mastermind to find the COSMIC-associated references. To assess the extent to which additional variants found in Mastermind were not present in COSMIC, 50 of the 407 genes were randomly selected, and for each gene, all variants and associated references present in the COSMIC database were compared with those found in MM. The additional variants found in MM but not in COSMIC were assessed for evidence of the association with the disease using ACMG variant classification criteria.
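The gene-level comparison described above can be sketched as a random draw of genes followed by a set comparison of each gene's variants. This is a minimal illustration, not the study's actual pipeline: the function names, the seed, and the toy gene and variant labels are all assumptions made for the example.

```python
import random

def sample_genes(genes, k, seed=0):
    """Randomly pick k genes without replacement (seeded so the draw is repeatable)."""
    rng = random.Random(seed)
    return sorted(rng.sample(sorted(genes), k))

def compare_gene(reference_variants, candidate_variants):
    """Split one gene's variants into shared, reference-only, and candidate-only sets."""
    reference, candidate = set(reference_variants), set(candidate_variants)
    return {
        "shared": reference & candidate,
        "reference_only": reference - candidate,
        "candidate_only": candidate - reference,
    }

# Toy, made-up data for illustration only; none of it comes from the study.
cosmic = {
    "GENE_A": {"V600E", "G469A"},
    "GENE_B": {"R175H"},
    "GENE_C": {"G12D"},
}
mastermind = {
    "GENE_A": {"V600E", "G469A", "K601E"},
    "GENE_B": {"R175H", "R248Q"},
    "GENE_C": {"G12D"},
}

for gene in sample_genes(cosmic.keys(), k=2):
    result = compare_gene(cosmic[gene], mastermind.get(gene, set()))
    print(gene, {name: sorted(values) for name, values in result.items()})
```

In this sketch, the "candidate_only" set corresponds to the additional variants that would then go on to manual review.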

Search Accuracy Test: COSMIC vs. Mastermind

Sensitivity

Mastermind matched the sensitivity of COSMIC at both the article and variant levels.

In total, 2,975 articles were identified by COSMIC, of which 96.5% were also identified and fully processed by MM, demonstrating an effective automated strategy for identifying high-yield content. Of the articles that were not found in MM, 38 (1.3%) were foreign-language and 23 (0.8%) were no longer in the PubMed index, which indicates out-of-date or retracted articles that are still referenced in COSMIC.

Overall, 9,329 COSMIC variants were examined for concordance in MM. Of these, MM found 96.5%, with 66.3% of those found only in the supplementary material. The 3.5% that were missed included variants described in figures with no accompanying text description (1.0%), in tables that split the variant description across multiple columns (0.7%), or in prose-like text descriptions that were ambiguous or difficult to parse automatically (0.8%).
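As a quick arithmetic check, the article-level shares quoted above can be recomputed from the reported counts. The sketch below uses only numbers stated in the text; the variable and function names are illustrative.

```python
COSMIC_ARTICLES = 2975      # articles identified by COSMIC (from the text)
FOREIGN_LANGUAGE = 38       # missed articles that were foreign-language
NOT_IN_PUBMED = 23          # missed articles no longer in the PubMed index
ARTICLE_MATCH_PCT = 96.5    # share of COSMIC articles also found by MM

def share_pct(count, total):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)

foreign_pct = share_pct(FOREIGN_LANGUAGE, COSMIC_ARTICLES)
pubmed_pct = share_pct(NOT_IN_PUBMED, COSMIC_ARTICLES)
missed_pct = round(100 - ARTICLE_MATCH_PCT, 1)

print(foreign_pct, pubmed_pct, missed_pct)  # prints: 1.3 0.8 3.5
```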

Specificity

Mastermind matched the specificity of COSMIC at both the article and variant levels and was also able to identify a greater number of relevant articles and variants.

Whereas COSMIC identified 2,975 articles for the randomly selected set of variants, MM found a total of 20,645 articles for the same variant data set representing a 6.9-fold increase in total references.

For the 50 randomly selected genes, a total of 6,743 variants were identified in COSMIC. For the same genes, MM identified 11,014 variants, which were then manually reviewed and confirmed to be accurate, representing a 63% increase in variants.
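The fold increase and the percentage figures follow directly from the counts reported above. A short sketch of that arithmetic (the helper names are illustrative, not part of any Mastermind or COSMIC tooling):

```python
def fold_increase(baseline, observed):
    """How many times larger observed is than baseline."""
    return observed / baseline

def percent_more(baseline, observed):
    """Relative increase of observed over baseline, in percent."""
    return 100 * (observed - baseline) / baseline

# Counts quoted in the text.
articles_fold = fold_increase(2975, 20645)
articles_pct = percent_more(2975, 20645)
variants_pct = percent_more(6743, 11014)

print(f"{articles_fold:.1f}-fold, {articles_pct:.0f}% more articles, "
      f"{variants_pct:.0f}% more variants")
# prints: 6.9-fold, 594% more articles, 63% more variants
```

Because both article counts refer to the same set of variants, the ratio of total articles is also the ratio of articles per variant, which is how the 6.9-fold figure corresponds to "594% more articles per variant".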

These results demonstrate that an automated approach to indexing the genomic literature can find more articles and variants than manual curation. This matters clinically: missing information at either the article or variant level, or missing the most up-to-date studies, could result in a different and less informed prognostic or treatment decision and a less favorable outcome for the patient. Further, MM’s automated approach makes it possible to scale the kind of diagnostic information found in COSMIC to all diseases and to make efficient use of the vast quantity of newly published research.

Conclusion: COSMIC vs. Mastermind

The Mastermind Genomic Search Engine addresses the crucial problem of interpreting patient DNA sequencing information in a way that accounts for exponential increases in newly published research. Legacy databases such as COSMIC and HGMD rely on manual curation that is unsustainable and can’t scale with the rapidly increasing demand for personalized medicine.

When compared to COSMIC, a database that has been manually curated for more than a decade, Mastermind’s automated approach was able to identify 96.5% and 96.7% of the articles and variants identified in COSMIC, respectively, while simultaneously keeping pace with new literature published on a weekly basis. In addition, Mastermind identified 594% more articles per variant and 63% more variants per gene than COSMIC. These results illustrate the power of automated article indexing techniques to identify clinically meaningful content and the superior search capability of Mastermind over legacy manually curated databases.

In summary, manually curated variant databases such as HGMD and COSMIC can’t scale to meet the massive influx of new genomic articles published each year. Mastermind’s automated approach delivers high sensitivity compared to these manually curated databases with a much higher level of specificity in article and variant identification from the published literature.

Learn more and try Mastermind for yourself. Register for the Free version and get an upgrade to Mastermind Professional for your first 30 days.

See the results of our comparison to Google Scholar