Mastermind Masterclass: ACMG/AMP Classification

Thursday, June 25th, 2020

The Mastermind Genomic Search Engine has the essential ability to filter the genomic literature by ACMG/AMP criteria. 

The American College of Medical Genetics and Genomics (ACMG), in collaboration with the Association for Molecular Pathology (AMP) and the College of American Pathologists (CAP), sets the standards for the interpretation of genomic variants. These standards include classifying genetic variants into five categories based on specific scientific evidence.

While some of the evidence can be based on population data or computational data, genomic variants cannot be classified as pathogenic without citing evidence from peer-reviewed scientific literature. Mastermind is the only genomic search engine that provides an extensive search of all the scientific literature according to ACMG classification guidelines. Users can search Mastermind by disease, gene, variant, and ACMG/AMP criteria to find clinically prioritized scientific evidence that can be cited in patient reports.

Co-founders Dr. Mark Kiel and Steve Schwartz led a live discussion and demonstration of the ACMG/AMP filtering process in the Mastermind Genomic Search Engine.

Topics Discussed: 

  • How to more efficiently identify and prioritize publications by ACMG/AMP variant classification guidelines
  • How increased specificity and immediate access to annotated search results accelerate the variant interpretation workflow
  • How increased sensitivity in literature search results in fewer false negatives

Q&A

Hello, thanks for this Mastermind Masterclass on ACMG/AMP. I have a question about ACMG interpretation. On which databases are the ACMG criteria based? For example, for a mutation to be PM1, it needs to be found in a mutational hotspot based on the literature. How do you determine that? Thanks in advance.

There are two ways to use Mastermind to determine whether a variant is in a hotspot region. The first is to look at the number of variants found in Mastermind in the vicinity of the variant being interpreted and assess the literature evidence supporting the pathogenicity of each of these. The other is to search Mastermind for papers that describe defined hotspot regions (using either the category keyword hotspot or a similar free-text keyword search) and have your curation results incorporate the consensus from those references.

Does Mastermind use the HGMD database? If yes, is it the public version?

No. Mastermind does not depend on other databases to build its genomic associations, nor does it provide canned interpretations of variant pathogenicity. Instead, Mastermind contains information for many more variants and much more information per variant to ensure curators have the most sensitive search results to determine the most accurate and up-to-date interpretation possible.

What about transcript differences leading to variant nomenclature differences?

Mastermind includes results for all transcripts of each gene, which may result in differing nomenclatures or variant numbering, as a core competency of its Genomic Language Processing (GLP) technology. As you know, most authors do not identify the transcript they are using in their publications, and the results in Mastermind are purposely designed to maximize the sensitivity of search results.

Can we search for functional evidence for SNPs? Can we query by RSID number?

Yes, Mastermind normalizes this type of variant search and can handily recognize RSIDs. Additionally, Mastermind recognizes “c.” and “p.” nomenclatures among many others.
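To make the idea of recognizing different query styles concrete, here is a minimal sketch of how a search box might tell an RSID from a “c.” or “p.” description. The function name `classify_variant_query` and the patterns are hypothetical and deliberately coarse; this is not Mastermind’s actual parser, which handles many more formats.

```python
import re

# Hypothetical, deliberately coarse patterns for illustration only.
# They are assumptions, not Mastermind's real recognition rules.
PATTERNS = [
    ("rsid", re.compile(r"^rs\d+$", re.IGNORECASE)),  # e.g. rs113488022
    ("cdna", re.compile(r"^c\.\S+$")),                # e.g. c.1799T>A
    ("protein", re.compile(r"^p\.\S+$")),             # e.g. p.V600E
]


def classify_variant_query(text: str) -> str:
    """Return a coarse label for the nomenclature style of a variant query."""
    cleaned = text.strip()
    for label, pattern in PATTERNS:
        if pattern.match(cleaned):
            return label
    return "unknown"


if __name__ == "__main__":
    for query in ["rs113488022", "c.1799T>A", "p.V600E", "V600E"]:
        print(query, "->", classify_variant_query(query))
```

A query without a recognized prefix, such as a bare “V600E”, falls through to “unknown” in this sketch; a real system would need further rules for such cases.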

Are you using machine learning or some sort of “AI” to scour the professional articles, or is this done by “hand”?

Mastermind uses a proprietary process we call Genomic Language Processing (GLP), which has its foundation in Natural Language Processing (NLP). We have spent 6 years perfecting GLP, resulting in a command of the “language” of genetics: recognizing the myriad ways genes and variants can be described by authors, then reconciling, disambiguating, and organizing the results. GLP is how we use automation to index the content of the literature so the needed information is readily available with a simple, single search in Mastermind.

Can searches be saved to compare them?

We do allow users to set up Alerts for variants to stay apprised of newly published information. The user’s Alert dashboard can be used in this way to keep track of the changes in literature citations for each saved variant.

Can other nomenclatures, like BIC nomenclature, be captured by Mastermind for the same mutation?

BIC nomenclatures are not explicitly included in the Mastermind indexing process. However, we strive to continue building upon our best-in-industry sensitivity for recognizing all variants cited in the medical evidence, regardless of nomenclature or formatting. If you know of any articles which cite variants exclusively by BIC nomenclature, please reach out to us, and we can add them to our roadmap. Although the BIC database is no longer actively maintained or curated, any older papers which use this nomenclature exclusively could be useful to add to the Mastermind data.

Which reference transcript (RefSeq, MANE, etc.) is used for searching the variants? Can other transcripts be used as well?

RefSeq is currently used in the indexing process for both the canonical and legacy transcripts.

PM7 is not defined in ACMG (I believe this was in the filter). Is that what you are associating with case studies?

PM7 was a one-time ACMG category, not widely adopted, that relied on a previous pathogenicity interpretation from a reputable source that had identified the variant in a clinical setting. The presence of PM7 in the Mastermind user interface reflects this designation.

If I put in an intronic splice site variant, will it also find synonymous variants near the end of the exon (that may also affect splicing)? Intronic variants seem not too relevant unless there are functional studies.

Mastermind aggregates variants that influence splicing into splice donor and splice acceptor variants if the change is directly within the intronic nucleotides one or two positions from the exon-intron or intron-exon boundary. These variant groupings will be defined by the protein-coding position they are closest to and labeled with either an “sa” or “sd”.

Additionally, variants deeper within the intron but still close to the exon-intron and intron-exon boundary as well as coding variants in the exon that are near the exon-intron or intron-exon boundary are aggregated as “srd” and “sra”, respectively.

These splice region variants can be searched directly for results that would include any synonymous exonic variants within the splice region. To narrow down to only the synonymous exonic variants within the splice region, you can do a boolean search as well for the synonymous variant.

We have written a couple of blog posts that describe this feature of Mastermind and provide more in-depth information.

https://www.genomenon.com/blog/non-coding-variants/

https://www.genomenon.com/blog/sensitivity-versus-specificity-non-coding-variants/
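The intronic part of the grouping described in this answer can be sketched in a few lines. This is an illustration only: the function name `splice_group` is hypothetical, and the `intron_window` cutoff of 8 bases is an assumption, since the answer does not say how deep into the intron the “srd”/“sra” region extends. Exonic variants near the boundary, which Mastermind also places in the splice region, are left out because assigning them requires exon coordinates.

```python
import re
from typing import Optional

# Matches simple intronic cDNA positions such as c.123+1 or c.124-2.
# UTR-anchored positions (c.-15+1, c.*12+1) are not handled in this sketch.
INTRONIC = re.compile(r"^c\.(\d+)([+-])(\d+)")


def splice_group(cdna: str, intron_window: int = 8) -> Optional[str]:
    """Map an intronic c. position to 'sd', 'sa', 'srd', 'sra', or None.

    intron_window is a hypothetical cutoff for the wider splice region;
    the real boundary used by Mastermind is not stated in the source.
    """
    match = INTRONIC.match(cdna)
    if match is None:
        return None  # exonic or unparsed; not covered by this sketch
    sign, offset = match.group(2), int(match.group(3))
    if offset < 1:
        return None
    # '+' counts into the intron from the exon end (donor side);
    # '-' counts back from the next exon start (acceptor side).
    side = "d" if sign == "+" else "a"
    if offset <= 2:
        return "s" + side   # canonical splice donor / acceptor
    if offset <= intron_window:
        return "sr" + side  # splice region donor / acceptor
    return None             # deep intronic


if __name__ == "__main__":
    for variant in ["c.123+1G>A", "c.124-2A>G", "c.123+5G>A", "c.123+50G>A"]:
        print(variant, "->", splice_group(variant))
```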

Do you highlight functional (meaning RNA splicing) results for such variants?

Functional studies that detail effects on splicing can be searched for using category keywords under “Genetic Mechanism – Variants”.

Do I have to filter by a certain ACMG guideline? Can Mastermind just tell me which ACMG criteria a given paper gives info about?

At present, we require the user to specify which aspect of ACMG they are searching for. In the future, we plan to have this information (which ACMG category is addressed in any one reference) automatically appear for each result that Mastermind returns.

Sometimes the same variant is named differently in different publications, mainly across time (especially in older publications). How does Mastermind deal with this issue?

Mark: There are a couple of ways to think about answering that question. The first is pretty generic, and that’s to say that authors can use any kind of nomenclature they want. RSID-level searching was mentioned earlier: if the variant is mentioned in a paper as an RSID, that’s one way an author can describe it. Another way is cDNA versus protein. A more nuanced way to answer the generic version of the question is whether the author follows HGVS standard nomenclature guidelines or not, which is almost never the case, and handling that is part of the genius of what Steve and his team have been able to build in Mastermind’s indexing.

The other way to think about answering that question is with legacy nomenclature, which I think is what you were alluding to when you talked about changes over time. Mastermind is genomically literate and it’s aware of those legacy issues. One example would be differing variant nomenclature across different transcripts of a gene. Another example would be legacy nomenclature where you omit the initiation codon, which obviously changes all the numbers, and a third example would be a signal peptide that’s either included or not included depending on the author’s fancy. So Mastermind is aware of all of those things that I talked about and, just like Steve suggested, maximizes the sensitivity and then optimizes for specificity. Steve, I dare you to build on that!
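As a rough illustration of the numbering shifts Mark describes, the sketch below lists the alternate protein positions an author might have used for one residue. The function name `legacy_position_candidates` and the `signal_peptide_length` parameter are hypothetical, and this is not a description of how Mastermind implements it. A well-known real case is the sickle cell variant in HBB, written p.Glu7Val under HGVS but Glu6Val in legacy numbering that omits the initiator methionine.

```python
from typing import Dict


def legacy_position_candidates(
    position: int, signal_peptide_length: int = 0
) -> Dict[str, int]:
    """List protein positions an author might have used for one residue.

    position is the residue number counted from the initiator methionine.
    signal_peptide_length is a hypothetical, caller-supplied value; it is
    only applied when the residue lies beyond the signal peptide.
    """
    candidates = {"current": position}
    if position > 1:
        # Legacy numbering that does not count the initiator methionine.
        candidates["without_initiator_met"] = position - 1
    if signal_peptide_length and position > signal_peptide_length:
        # Numbering of the mature protein, with the signal peptide removed.
        candidates["mature_protein"] = position - signal_peptide_length
    return candidates


if __name__ == "__main__":
    # HBB sickle cell variant: HGVS position 7, legacy position 6.
    print(legacy_position_candidates(7))
    # → {'current': 7, 'without_initiator_met': 6}
```

A search that aims for sensitivity would look for every candidate position and then rely on context, such as the reference amino acid, to filter out false matches.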

Steve: Everything that you mentioned, I would say, is just scratching the surface in one respect, because those are all of the things that make sense! There’s an entire class of variant nomenclatures that just plainly don’t make sense. For example, there are what we consider the colloquial nicknames. In CFTR you’ve got the F508 deletion, which authors love referring to as Delta F508, with a delta symbol, and there are about 50 different ways to represent a delta symbol in different character maps. So we’ve addressed things like that as well. Then there are others that make even less sense. For example, I can’t recall which variant it was specifically, but there’s one variant we encountered where the seminal paper on that variant referenced it by RSID, but the author transposed the first and last number of the RSID, so the RSID used by the author is a completely different variant than the one the author was writing a paper on. Because that paper was so early and so important for the study of that variant, the next ten papers published on that variant over the following years use the same transposed RSID number as the initial author. You literally end up with 10 different papers describing the same variant with a typo throughout the entire paper. That’s going below sea level on the iceberg that is alternate variant nomenclatures. Then there are the ones that make sense, like legacy transcripts, which we are fully aware of when we’re indexing and searching for these variants.

We’re looking not just at the current transcript for the variant; we’re looking for those nomenclatures across different transcripts and recognizing the alternate ways to describe a variant across different nomenclatures and different numbering or position systems. That takes into account things like shifts that arise from whether the numbering starts from one or zero, as well as things that are purely colloquial happenstance and part of the history of publishing on that variant. We try to address all of those. Because the genome is huge, one of the ways those often come to our attention is through customers and users who specialize in those genes and let us know about those issues. Sometimes they know exactly what the issue is and they can send us a list of articles that illustrate it. Other times they say, ‘You didn’t find this variant that I know this paper exists for,’ and then we have to dig into why that was. We always try to ensure that we’re continually improving our pipeline, our Genomic Language Processing, to take those kinds of things into account.
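The delta-symbol problem Steve mentions for the CFTR nickname can be illustrated with a short sketch that folds several Unicode “delta” characters into one ASCII form. The function name `normalize_delta`, the choice of “del” as the replacement, and the list of look-alike characters are assumptions made for this example; the actual character handling in Mastermind is not described in the discussion above.

```python
import unicodedata

# Look-alike characters sometimes used for delta that do not have
# "DELTA" in their Unicode name (an assumed, non-exhaustive list).
DELTA_LOOKALIKES = {
    "\u2206",  # INCREMENT
    "\u25b3",  # WHITE UP-POINTING TRIANGLE
}


def normalize_delta(text: str) -> str:
    """Replace Unicode delta-like characters with the ASCII string 'del'."""
    result = []
    # NFKC folds styled variants (e.g. mathematical bold delta) to plain Greek.
    for char in unicodedata.normalize("NFKC", text):
        name = unicodedata.name(char, "")
        if "DELTA" in name or char in DELTA_LOOKALIKES:
            result.append("del")
        else:
            result.append(char)
    return "".join(result)


if __name__ == "__main__":
    for raw in ["\u0394F508", "\u2206F508", "\U0001D6ABF508"]:
        print(repr(raw), "->", normalize_delta(raw))
```

All three inputs above come out as “delF508” in this sketch, so a single index entry can collect papers that used different characters for the same nickname.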