Continuing our discussion of CNV search in Mastermind, we’ll look at the practical advantages of automated semantic search and explore how Mastermind’s algorithm resolves the sensitivity and specificity issues associated with multiple CNV nomenclatures.

As we’ve previously described, Mastermind uses our Genomic Language Processing engine to find CNV information in the primary evidence of the medical literature, however that information is described. For example:

Mastermind finds CNV information in the primary evidence of the medical literature by many different nomenclatures, including cytogenetic band, ISCN karyotype notation, genomic coordinates, and genes and exons.

Mastermind recognizes these nomenclature variations and parses them into their constituent components, which allows us to normalize those CNV citations to genomic coordinates.

Schematic image depicting how Mastermind recognizes many CNV nomenclatures and parses, resolves, and normalizes them to genomic coordinates.
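
To make the recognize, parse, and normalize steps concrete, here is a minimal sketch in Python. It is not Mastermind’s implementation: the regular expressions, the `Cnv` type, the `normalize` function, and the `TOY_BANDS` lookup table are all hypothetical, and the band coordinates are placeholders rather than values from any genome build. It only illustrates how three differently written descriptions can resolve to the same region.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Placeholder band table (hypothetical). The coordinates are illustrative
# only and are NOT taken from any genome build.
TOY_BANDS = {
    ("22", "q11.2"): (1_000_000, 2_000_000),
}


@dataclass(frozen=True)
class Cnv:
    chrom: str
    start: int
    end: int
    kind: Optional[str] = None  # "del", "dup", or None when not stated


# Simplified patterns for three of the nomenclatures named above.
COORD_RE = re.compile(r"^(?:chr)?(\w+):([\d,]+)-([\d,]+)$", re.IGNORECASE)
ISCN_RE = re.compile(r"^(del|dup)\((\w+)\)\(([pq][\d.]+)\)$")
BAND_RE = re.compile(r"^(\d+|X|Y)([pq][\d.]+)$")


def normalize(text: str) -> Optional[Cnv]:
    """Parse one CNV description and resolve it to coordinates, if possible."""
    text = text.strip()

    # Genomic coordinates, e.g. "chr22:1,000,000-2,000,000"
    m = COORD_RE.match(text)
    if m:
        chrom, start, end = m.groups()
        return Cnv(chrom, int(start.replace(",", "")), int(end.replace(",", "")))

    # Simple ISCN-style notation, e.g. "del(22)(q11.2)"
    m = ISCN_RE.match(text)
    if m:
        kind, chrom, band = m.groups()
        span = TOY_BANDS.get((chrom, band))
        return Cnv(chrom, *span, kind) if span else None

    # Cytogenetic band, e.g. "22q11.2"
    m = BAND_RE.match(text)
    if m:
        chrom, band = m.groups()
        span = TOY_BANDS.get((chrom, band))
        return Cnv(chrom, *span) if span else None

    return None  # unrecognized description
```

With the placeholder table above, `normalize("22q11.2")`, `normalize("del(22)(q11.2)")`, and `normalize("chr22:1,000,000-2,000,000")` all yield the same chromosome, start, and end, which is what lets a single search match citations written in any of the three styles.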

This means that when you do a search for a CNV in Mastermind, you will see all relevant results from the medical literature, irrespective of how you or the authors described those CNVs or specific breakpoints. This automated, semantic understanding of CNVs has several benefits compared to plain text-based search approaches, like Google Scholar, or manually curated databases, like ClinVar:

Text-based, automated searches do not understand CNV descriptions or genomics (chromosome and gene coordinate systems), while semantic, manual searches cannot keep up with the volume of medical literature.

Automated, text-based searches often provide good sensitivity for the specific nomenclature searched, but poor sensitivity for overlapping or similar CNVs, poor sensitivity for alternate descriptions of the same CNV, and poor specificity due to a lack of understanding of genomic language.

Image of results for an automated, text-based search using Google Scholar.

Manual, semantic searches, which understand the positions described by certain CNV nomenclatures, can help alleviate the issues with nomenclature-based sensitivity and specificity. However, they require users to search in specific, often hard-to-remember formats, and they often lag behind even the automated, text-based search engines in sensitivity, because the manual curation used to find CNV positions and normalize them to genomic coordinates does not scale.

Image of results for a manual, semantic search using ClinVar.
Image of results for a manual, semantic search using ClinVar displaying entries with no evidence.

Mastermind’s automated, semantic search uses Genomic Language Processing to identify named CNV entities in the literature. It maximizes sensitivity through scalable search indexing, which can keep pace with the volume of existing and newly published medical literature. Mastermind also recognizes user searches in whichever nomenclature the user prefers, normalizing search terms in real time to the canonical description of the CNV in Mastermind.

Image of how Mastermind recognizes and semantically parses a variety of CNV nomenclatures.

Mastermind’s semantic CNV search also maximizes sensitivity by showing all overlapping and similar CNVs cited in the medical literature, and it provides further search filters to refine the results.

Image of how Mastermind’s semantic CNV search also maximizes sensitivity by showing all overlapping and similar CNVs cited in the medical literature.
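
Once CNVs are expressed as genomic coordinates, finding overlapping and similar CNVs becomes an interval comparison. The sketch below is one way that idea could look in Python and is not a description of how Mastermind does it. The `Region` type, the function names, the half-open interval convention, and the `min_reciprocal` filter are all assumptions made for illustration.

```python
from typing import List, NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int  # inclusive
    end: int    # exclusive (half-open interval, an assumption of this sketch)


def overlap_bp(a: Region, b: Region) -> int:
    """Number of bases shared by two regions (0 if on different chromosomes)."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def reciprocal_overlap(a: Region, b: Region) -> float:
    """Shared bases as a fraction of the longer region.

    This equals the smaller of the two per-region overlap fractions.
    """
    longest = max(a.end - a.start, b.end - b.start)
    return overlap_bp(a, b) / longest if longest else 0.0


def find_overlapping(
    query: Region, cited: List[Region], min_reciprocal: float = 0.0
) -> List[Region]:
    """Return cited regions that overlap the query.

    min_reciprocal acts as an optional refinement filter: 0.0 keeps every
    overlapping region, values near 1.0 keep only near-identical ones.
    """
    return [
        r
        for r in cited
        if overlap_bp(query, r) > 0
        and reciprocal_overlap(query, r) >= min_reciprocal
    ]
```

Calling `find_overlapping` with the default threshold favors sensitivity by returning every cited region that touches the query, while raising `min_reciprocal` narrows the list to closer matches, loosely mirroring the idea of refining results with filters.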

Concomitantly, Mastermind optimizes specificity by understanding the genomic language used to describe structural variants. This minimizes confusion and misinterpretation of unrelated labels and descriptions in papers, even when multiple CNV descriptions are intertwined.

Image of how Mastermind understands distinctions between multiple CNV nomenclatures, maximizing both sensitivity and specificity.

These advances in CNV search capabilities represent our ongoing dedication to the continuous improvement of our user experience – and demonstrate why variant scientists can count on Mastermind as the first resource for variant literature interpretation. We welcome your thoughts at support@genomenon.com.

See the latest updates for yourself. Sign up for Mastermind here.