Ask the Masterminds: Leveraging Literature for Effective Variant Curation

The ever-increasing volume of genomic data and the corresponding literature can be hard to keep pace with, making variant interpretation a bottleneck in the clinical reporting pipeline. As larger panels, WES, and WGS are ordered more regularly, the number of variants that may require human assessment keeps growing. A comprehensive database of published variants and accompanying evidence annotations can significantly expedite interpretation, standardize calls across curations and laboratories, and bolster accuracy while adhering to our highest curation standards.

Join us for our next live webinar where we’ll welcome two of our variant curation experts and our Customer Success Manager to explain how Mastermind helps clinical labs handle this increased volume and complexity of genomic information.

You Will Learn:

Methods for literature review/tools for integration of AI and machine learning
Challenges and solutions in published variant descriptions
Real-life ACMG criteria applications from the literature

Speakers

Denice Belandres

Director, Customer Success

Denice provides support and training for all products across our clinical diagnostic portfolio and for users at all levels. With a background in germline variant analysis and preimplantation genetics in clinical NGS labs, she turns feedback into function, enabling implementation of Genomenon solutions for a variety of clinical use-cases.

Jessica Farmer Bugarin

Curation Scientist II

Jessica is a highly accomplished Curation Scientist II at Genomenon, bringing a wealth of expertise in the field of genetics and genomics. With a deep-seated commitment to advancing genomic research, Jessica has dedicated her career to making meaningful contributions in the areas of rare diseases, oncology, and genomics as a whole. Her work at Genomenon showcases her proficiency in curating and analyzing genetic data with precision and insight.

KT Curry, MS, CGC

Field Application Scientist, Genomenon

KT Curry, MS, CGC, is a genetic counselor whose career has spanned the full breadth of the rare disease landscape - from general and metabolic genetics clinic to laboratory genetic counseling, quality assurance of germline variant interpretation, and her current role as a Field Application Scientist at Genomenon. With deep roots in both clinical and molecular genetics, KT brings a uniquely integrated perspective to the challenges of rare disease diagnosis. She is driven by the belief that pharmaceutical solutions should reflect the full spectrum of human disease. No condition is too rare, too complex, or too overlooked to deserve scientific pursuit and meaningful therapeutics.

Genomenon

The Real-World Evidence needed to validate a drug target, identify trial-eligible patients, or change a diagnosis already exists. It is buried in 39 million biomedical articles, locked behind paywalls and supplemental files most researchers never find.

Genomenon closes that gap. Fit-for-purpose AI-powered search reads 11.2 million full-text papers and 3.7 million supplemental datasets. Eighty expert scientific curators validate every finding. The result is structured, traceable, regulatory-grade Real-World Evidence at the genetic variant and patient level. Loxo@Lilly used Genomenon to add 73 variants to the RET label. In a head-to-head, Genomenon identified 83% more rare disease patients than ChatGPT plus OpenEvidence.

250+ diagnostic labs and 75 biopharma programs rely on Genomenon as the evidence layer behind precision medicine.

WEBINAR TRANSCRIPT

KATE: Hi everyone, and welcome to today’s webinar, where we will hear from our Masterminds on ways to leverage literature for effective variant interpretation. Thank you all for joining us! My name is Kate Oesterle, and I’m a member of the Genomenon team. Before we get started, I wanted to cover some housekeeping items as everyone starts to join today’s event. Some of the features that our speakers will be covering are only available in Mastermind Professional Edition. If, by some chance, you don’t have a Mastermind account yet, you can create one today by using the bit.ly link that you’re seeing here on the screen, and that will also be dropped in the chat window, which will start you with a complimentary trial of Mastermind Pro. We also welcome you to submit questions and share your thoughts and feedback about Mastermind in the chat. If we have time, we’ll have a Q&A session at the end of the presentations. If we don’t get to your question today, a member of our team will be in contact after the webinar to follow up. There’s also a handout about Mastermind that you can download from the resources section in the Zoom window panel below. Also, today’s event is being recorded, and we will be emailing that to you soon after today’s presentation. We’ll also have this posted to our Genomenon YouTube channel, so make sure you’re subscribed there, and it will also be posted to our website. You may have noticed that Mastermind is now the genomic intelligence platform. That’s because it’s now much more than just a genomic search engine. Mastermind provides a growing number of pre-curated variants, information on clinical trials and treatments. We have an associations page where you can see connections between diseases and genes, variants, phenotypes, therapies, CNVs, and the updated gene information page. 
I’m looking forward to Denice telling you more about that, and KT and Jessica will be showing you more about Mastermind’s capabilities to ensure that your team is getting the most out of your access to the platform. With that, now I’d like to welcome our speakers and join us for some introductions. Hello, welcome. First, I’d like to welcome Denice Belandres. Hello, Denice. Thank you for being here. Denice is Genomenon’s customer success manager. She provides technical support and training to Mastermind users at all levels, with a background in germline variant analysis and pre-implantation genetics in clinical NGS labs. Denice turns feedback into function, enabling implementation of Mastermind for a variety of our clinical use cases. Thanks again for being here, Denice.

DENICE: Excited to be here.

KATE: Next, we have Jessica Bugarin. Hi, Jessica. Jessica is a highly accomplished curation scientist at Genomenon. She brings a wealth of expertise in the field of genetics and genomics. With a deep-seated commitment to advancing genomic research, Jessica has dedicated her career to making meaningful contributions in the areas of rare diseases, oncology, and genomics as a whole. Her work at Genomenon showcases her proficiency in curating and analyzing genetic data with precision and insight. Again, welcome, Jessica. We’re so happy you’re here today.

JESSICA: Thank you, happy to be here.

KATE: Last but not least, we’re also joined by KT Curry. Hello. KT is our QA team lead for the variant curation team here. She has a clinical genetic counseling background in genetics and metabolic patient care, and specializes in rare disease, kidney disease, and cardiogenetics. KT leads a team conducting quality control and assurance for curation projects for our Mastermind and pharma customers, and creates variant curation training materials. Welcome, everyone. We’re so happy you’re all here. I’m going to hand off to Denice, and then we will get started with today’s presentations.

DENICE: Awesome, thanks so much, Kate, for the introduction. I’m going to start with an introduction of Mastermind and highlight some of the features that Jess will be going over in greater detail a bit later. Most of you are probably familiar to some degree with Mastermind as a tool for searching the genomic literature, but as Kate mentioned, it’s a lot more than that. It’s a genomic intelligence platform, and really functions as an associations engine that can be used to find and connect genomic concepts. What you see here on the slide are examples of the kinds of things you can search in the system and how you can combine different concepts and uncover associations between them by reviewing the literature. It’s really supporting genomic scientists in the literature review step of the interpretation workflow. That’s been at the core of what Genomenon has been doing for years. Today, we’re excited to discuss the work that we’ve been doing internally around variant curation and how we’re displaying that curated evidence in Mastermind within the interface. Then, we’ll also talk a bit about the services we offer around curation as well. If you saw our last Mastermind masterclass webinar, you’ve seen the new gene information page, which was released as part of Mastermind 3.0 a few months ago. On this page, you’ll find information like a breakdown of variants by type and classification, summary-level information about your gene and its function, as well as manually curated gene-disease relationships. There are over 9,000 of these GDRs available in Mastermind today, and that number continues to grow. The gene page is also a nice jumping-off point to dive into individual variants, and Jess will talk more about the individual components on this page, especially from the perspective of a variant scientist. How the information displayed here can be helpful for an analyst, especially when encountering things like novel genes and variants. 
Here, we’re looking at the evidence page for a particular variant in our search bar. You can see the articles mentioning this gene and variant are listed on this page. We see our ClinVar integration and the related variants, which we can explore using the blue and green tabs here under variant info. We know that we can apply filters to the search to prioritize specific types of articles. Importantly, we see this variant has been curated by Genomenon and has a provisional call available. That means the literature here has been pre-read and reviewed by our curators, and in this case, the evidence was such that we were able to classify this as pathogenic. All of the supporting evidence for this curation can be accessed by clicking “View Interpretation,” which is what we’re looking at now. Once you click that button, you’ll land here on the interpretation page. I’m not going to steal Jess’ thunder by giving away too much at this point, but it’s on this page that you’re going to see all the information that the curators reviewed and used to arrive at the provisional call that we see in the top left corner. Our expert team is using ACMG guidelines in their work, so the data here is displayed and organized in a way that makes it really easy and clear to see what category was applied, why it was applied, and what literature there might be to support this. Genomenon is on a mission to curate the genome. That means that if you encounter a variant in Mastermind that hasn’t been curated yet, we also offer services where we curate your variants on demand. We call this service Curate Pro. Our curators, like the lovely ones on our webinar today, are highly trained experts and they’re the people behind this work. It’s really our team’s expertise, paired with Genomenon’s AI curation engine, that allows us to turn around variant curations really quickly. The variants are of course curated to ACMG guidelines and are independently reviewed by our QA team before final delivery. 
If this is interesting to you, please send us an email to hello@genomenon.com, and we’d be happy to schedule a time to chat about curation on demand. With that, I will pass the reins over to KT, who will kick off our presentation content.

KT: Thank you, Denice. Yes, so we’re going to be talking about leveraging the literature for effective variant curation. These are the points we’re going to cover today: some basics of why you need the literature for variant curation; methods for the review of the literature, and integration of AI and ML; once you identify the literature, how do you evaluate it and select an appropriate paper? We’ll go through some challenges and solutions for how variants are actually published in these papers. Then, of course, Jess will walk us through some of the real-life scenarios. In this visual, we’re looking at a representation of a variant calling pipeline, from data production to the end product of a clinical report. You’ll see that the bottleneck here is at the interpretation and report writing step. This is really where we’re going to focus today. This step takes a lot of human input to interpret what evidence is out there in the literature, and then combining that with all our computational tools, allele frequencies, etc. Published scientific literature contains our evidence for applying gene-, variant-, or allele-level evidence. Such data is really invaluable. However, we know the sheer volume of genetic data and corresponding literature can be overwhelming. The pace at which new research and discoveries are made means that keeping up with the literature can be daunting, yet it’s essential to our role as curators. We’re going to talk about methods for the literature review. What we’re looking at on this slide are some examples of search methods and platforms. This table is in the curation SOP for ClinGen bio-curators when they’re looking for the first steps you need to take to perform a literature search. Mastermind is down here, we’ll of course talk about that a little bit, but there are some other search tools that we’ll touch on today. 
The main lesson in this is always to know the strengths and limitations of the tools you’re using, like everything along this step in a curation process. These are some considerations for different tools depending on what you’re using. If you’re not using a genomic language processing system like Mastermind, it’s recommended to consider all the different ways your variant and gene might be referenced in a paper. You’ll see, this is again from the same training deck, let’s use quotation marks, parentheses, Boolean search terms to have all the different ways a variant can be seen. Spacing issues, the three-letter amino acid abbreviation versus one, is a nonsense variant going to be “Ter,” “X,” star, “Stop”, there’s so many different ways you’ll find these variants in papers. RSID numbers, and of course the historic IVS nomenclature. That’s something that is somewhat predictable, you can find that for some papers. We’re going to go through some examples to show that there are many formats for historic publications. It’s hard to know what to search for; it’s hard to predict how they would be published. In Mastermind, we have, built into the genomic language processing system, conversions of some of these variants that happen automatically. IVS nomenclature is normalized to the intronic HGVS standards, full amino acid names to the one-letter or three-letter abbreviations, also alternative gene names. Those can change over time, and that’s something that the system is taught to do. There is a Mastermind publication that came out comparing some of these search tools, and we reported on our findings. One important item was that PubMed only searches the title and abstract of a paper for the variant that you might be searching. When I was clinically practicing, this was my go-to search tool, so if your variant is not listed in those sections, it will not populate for you. We have found that this can miss up to 94% of variant citations. 
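As a rough illustration of why manual Boolean searching gets unwieldy, the sketch below expands a single protein-level variant into the spellings mentioned above (one- versus three-letter amino acids, and "Ter", "X", "*", "Stop" for a nonsense change) and joins them into a quoted OR query. The function names and the example variant are hypothetical and chosen only for illustration; this is not how Mastermind normalizes variants, and it does not cover rsIDs, IVS notation, spacing differences, or gene aliases.

```python
import re

# One-letter to three-letter amino acid codes (standard abbreviations).
AA3 = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}
# Spellings of a stop codon mentioned in the talk.
STOP_FORMS = ["Ter", "X", "*", "Stop"]


def protein_variant_forms(variant: str) -> list[str]:
    """Expand a one-letter substitution such as 'R553X' into common spellings."""
    match = re.fullmatch(r"(?:p\.)?([A-Z])(\d+)([A-Z*])", variant)
    if match is None or match.group(1) not in AA3:
        raise ValueError(f"unsupported variant format: {variant!r}")
    ref, pos, alt = match.groups()
    refs = [ref, AA3[ref]]
    if alt in ("X", "*"):
        alts = STOP_FORMS
    elif alt in AA3:
        alts = [alt, AA3[alt]]
    else:
        raise ValueError(f"unsupported amino acid code: {alt!r}")
    forms = []
    for r in refs:
        for a in alts:
            for prefix in ("", "p."):
                forms.append(f"{prefix}{r}{pos}{a}")
    return list(dict.fromkeys(forms))  # drop duplicates, keep order


def boolean_query(gene: str, variant: str) -> str:
    """Build a quoted OR query that also requires the gene name (hypothetical helper)."""
    terms = " OR ".join(f'"{form}"' for form in protein_variant_forms(variant))
    return f'"{gene}" AND ({terms})'


if __name__ == "__main__":
    print(boolean_query("CFTR", "R553X"))
```

Even this simplified case yields sixteen spellings for one nonsense variant, which is the practical argument for letting a search tool handle the normalization.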
Another alternative search tool would be Google Scholar. It is definitely broader in application in terms of a search, but there are again some limitations. One of them is that it might pick up your variant, but it won’t know the relationship: is it in the gene you actually searched for? So, it’s having a hard time connecting multiple pieces of information. If you get a lot of false positives returned, owing to that inability, that’s when you start having a decrease in the specificity of your searches. We’re looking at a different way to view what Denice was showing before, but again, it is the integration of our technology that increases the specificity and sensitivity of literature searches. The literature is indexed, and using natural language processing techniques that have been tailored to genomic data, it has the ability to identify a genomic paper by disease, phenotype, therapy, gene name, and variants in their different forms, and organize it in a way that really helps make your variant curation more effective and efficient. We know time matters in turning around a test report. There’s still a very important component of human review of these papers and variants, and we’ll cover some of those examples coming up here next. I do want to point out, the perspective of this talk is not only for variant curators but for those that might be on the clinical end of receiving test reports. You see a paper cited in a clinical report, your variant is contained within this PMID, and you’re trying to navigate there. One of the first things to ask when you’re looking at these publications is, when was it published? We know that over the span of a disease being described, the authors will have different information available on the disease mechanism and associations, so they can reach different conclusions over a two-decade period, and you have to take that into account.
I know this slide’s a little bit of a throwback to our first undergraduate courses on how to read a scientific paper, but through experience, we know that your variant under evaluation might be located in any of these sections of a paper, and it does take digging. Ideally, you’re not the one doing that digging; your search tool is doing that for you. So, I’ll highlight two of my favorite sections of these papers for variant information, one of them being the methods section. Of course, we know it’s going to tell me my experimental and computational methods used, but really importantly for variant curation, it might be the key to understanding, are these human variants being described? Are they germline, somatic, or both? How are the participants being described? Is this a healthy cohort or an affected proband cohort, and how was that determined? Do I agree that they’re affected? Are they using the same criteria that we are using to count a proband? And of course, when they’re giving us the information on details like what sequencing technology was used, data analysis pipelines, and variant calling process, this can help a curator know what level of ACMG criteria can be applied. The second one is the supplemental material. So, additional data such as tables and figures, methodology, extended analyses like splicing and functional studies, and this is where they hide a lot of those big pedigrees, which, of course, we need to know in some cases to know, am I going to apply a PM6 or a PS2? That will be hidden maybe within the supplemental. When our group had done that audit of reviewing the frequency of locations of variants within a publication, it was found that the variant was only located in the supplemental material in 22% of publications, which is huge. To minimize false negatives, you want to make sure you’re doing an adequate search that’s performing not only the full-text matches but also supplemental data as well. 
Alright, now we’ll get into some of the fun stuff of nomenclature. So, now once you have found your paper and you read the paper, you kind of understand the general ideas and conclusions of the authors, our next step is really confirming and normalizing our variant nomenclature. Unfortunately, HGVS standards are relatively recent in the span of genetic publications, so there still can be a lack of adherence, even for new publications. Human error, as we’re all prone to, evolution of these rules over time can make for a lot of nuances that you have to consider when you’re looking through the literature, and we’ll highlight some examples of those. When our group curated the CFTR gene, one example of many ways a variant can be published, we found that there were 172 ways the most common variant was described in publications. So, this is an aspect of variant curation that I honestly did not fully appreciate before moving into variant curation when I was in my clinical role. A lab would infer, my patient variant is located in a certain PMID, and when I went to review that publication, I couldn’t find my variant in there, and why was that? We’ll be walking through what some of those examples may be. First, I do want to share that we have created this intronic variant quick reference guide for anyone. Anyone can download this for free from our website. It’s very helpful, I think, when you’re reviewing publications that are presented in a non-traditional or an alternative nomenclature format in publications, and you’re trying to convert them to HGVS standards. Because they might be written out in this alternative way, having a visual to kind of pare that down, we have found to be very helpful. On our next slide, we’re going to be highlighting some alternative ways that these splice acceptor sites might be described, so keep that in mind. So, this is an NF2 paper showing one example of unstandardized format. 
In some cases, like the one above, these variants are resolvable and able to be curated once you get used to the nuances of the paper. If we go back to the recommendations of doing those Boolean search terms and trying to predict every single way your variant might appear, this is just one example of, “how could we predict this,” right? When variants are presented in tables like this as well, that’s a really hard format for some software tools to deal with. So if we take this family group, patient 20, and we’re trying to figure out what variant they are describing here, they’re telling us that we have an acceptor splice site that’s being affected. The next exon would be exon five, and then we have the AG with the lowercase letters going to the capitalized TA (agTA), and then that’s getting switched to atTA. What we have here is really at the -1 acceptor splice site, a G is being switched to a T. Then we’d want to compare, okay, are we in alignment that I think this is the -1 location, the dinucleotide before exon 5, are those the first two base pairs of exon 5? So we’ll check it for this one, yes, that seems correct, let’s check a few more within the paper and make sure it’s all fitting into that pattern. Once that’s done, we can then curate this on our canonical transcript, which we always aim to be the MANE Select transcript, and in this case, it ends up being c.448-1G>T. That’s where the human input does come in to help normalize some of these. This second example is showing a paper for MECP2, associated with Rett syndrome, and it highlights a couple of things. One is that we have a mix of original and referenced data: if it says “this study,” the authors are reporting that patient themselves, versus, if they have a referenced paper, we actually wouldn’t want to curate this patient and variant under the current PMID. We want to go track down that original one.
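The agTA to atTA conversion described above can be sketched in a few lines. This is a minimal illustration, assuming the paper's convention that lowercase letters are intronic and uppercase letters are exonic, and assuming the first base of exon 5 sits at c.448 on the curated transcript (which is what the resulting c.448-1G>T implies). The function name and its parameters are hypothetical, and it handles only single-base substitutions at an acceptor site.

```python
def acceptor_site_to_hgvs(ref: str, alt: str, exon_start: int) -> str:
    """Convert a legacy acceptor-site notation to an HGVS-style c. description.

    ref, alt:   boundary sequence with intronic bases in lowercase followed by
                exonic bases in uppercase, e.g. 'agTA' and 'atTA'.
    exon_start: cDNA position of the first exonic base (assumed known).
    """
    if len(ref) != len(alt):
        raise ValueError("only single-base substitutions are handled")
    n_intron = sum(1 for base in ref if base.islower())
    # The intron/exon layout must be lowercase first, then uppercase.
    if ref != ref[:n_intron].lower() + ref[n_intron:].upper():
        raise ValueError("expected lowercase intron followed by uppercase exon")
    diffs = [i for i, (r, a) in enumerate(zip(ref, alt)) if r.upper() != a.upper()]
    if len(diffs) != 1:
        raise ValueError("expected exactly one changed base")
    i = diffs[0]
    if i < n_intron:
        # Intronic base: count backwards from the exon start (-1 is the last
        # intronic base, -2 the one before it).
        position = f"{exon_start}{i - n_intron}"
    else:
        # Exonic base: count forwards from the exon start.
        position = str(exon_start + (i - n_intron))
    return f"c.{position}{ref[i].upper()}>{alt[i].upper()}"


if __name__ == "__main__":
    print(acceptor_site_to_hgvs("agTA", "atTA", exon_start=448))
```

The conversion itself is mechanical; the part that still needs a curator is confirming the exon numbering and the transcript, which is why checking a few more variants from the same paper against the pattern matters.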
The other thing to note is, of course, for nomenclature, you want to make sure they’re reporting on a variant in the same transcript as you are. In this case, we see that the title is letting us know that all these variants are in exon one, and they’re giving us a review of what some of these variants are. They start us off with a start codon variant, M1del. When we are curating, we notice, hey, we’re actually using the MANE Plus Clinical transcript in this case, and my exon 1 is actually in the 5′ UTR. So, this seems to be hinting that they are not using the same transcript as the canonical transcript for our project at this point. We can go back and check some of these reference papers, do some other searches within the paper to say, alright, look, they give us the sequence they’re using, they tell us the isoform. All of this is aligning to say we need to do a nomenclature switch to put it into the 5′ UTR, and Jess will show a little bit how we show this in our curated evidence. We will put in, the variant was published as X, and it is now Y on our transcript. We try to make that clear when these types of changes are made because that’s helpful for everyone. I will say, a lot of publications, unfortunately, don’t give you the transcript they’re on. You have to use all these clues. If they always told us a transcript, that would be beautiful, but they don’t, unfortunately, and you have to do some of this digging to find the right answer. We have some internal software tools that help us flip between transcripts and see what the nomenclature may be, but there are public tools available. You see here, even ClinVar, for this example, is showing it nicely, saying, alright, on the MANE Select transcript, this start codon variant is equivalent on the MANE Plus Clinical transcript to something in the UTR. So, if you’re using some of those databases as well, it can be helpful to look there.
Overall, as curators going through the literature and trying to normalize some of this nomenclature, honestly, I really relate to being a sleuth. I know this is something we commonly hear in our line of work, but you really have to look for clues and read between the lines. I think we all have gained lessons on, when we’re a publishing author, what’s all the information we’ll want to give about the transcript and the variant, just to make everyone’s lives easier. Also, the other point that can be difficult is really, do we know if this is an original cohort of patients? Again, not every author says, this patient’s been previously published in X paper. When you’re curating variants in high volume, you start noticing, hey, I have four patients, they look very similar. Now I’m seeing a repeat in the author group, and it looks like they’re all from the same country and the same institutions, and we’ll start down-weighting some of those that we think are suspicious for a repeat cohort. We know that when counting probands, and especially segregation, overcounting is going to bloat your call, and we don’t want that to happen. From here, I am going to pass it on to Jess to then switch us into the mode of, what does this evidence actually look like in Mastermind? I will have her take it away.

JESSICA: Awesome, thank you so much, KT. Let me just go ahead and share my screen here. Okay, so as KT mentioned, I’m going to go ahead and do a walkthrough of our Mastermind Genomic Intelligence platform, and so, this is the homepage of the website, and the gene I’d like to demo today is actually CFTR. I picked that gene, as I’m sure most of us are already pretty familiar with CFTR. Here, you can begin a search just by typing in the gene name. I’ll go ahead and type in CFTR here, and then I’ll go ahead and click this search button. Once we search for our gene, it pulls up our gene page. This is new with Mastermind 3.0, and I’m going to walk through some of the features on this page. Over here on the left-hand side of the screen, we have this circle plot here that shows us a breakdown of all the variants in the literature. These are variants that have been classified by us at Genomenon. Alternatively, if you’d like, you can click on this little ClinVar button, and it’ll show you how many ClinVar entries there are for this gene. So, we can kind of do a comparison here by the different variants that are in Genomenon’s database, Mastermind, versus ClinVar here. We could switch back and forth if you’re interested to make a comparison. If I scroll down a little bit here on this page, we have some summary information about our gene. If we click this little box right here, it pulls up information about the gene. Here, I’d like to point out that it does show the canonical transcript. This is the transcript that we used for curating all of our variants. Over here, we also have some intrinsic guidelines. These are just recommendations that Genomenon has made, if you were wanting to know if you could apply some of these intrinsic ACMG criteria to a variant of interest. Over here on the right-hand side of the screen, we have our gene-disease relationships. 
What I’m mostly going to show you today are curated variants, but I did want to note that curators here at Genomenon like myself do curate gene-disease relationships. For CFTR here, we have seven different gene-disease relationships curated for this gene. Up at the top, we have this diagram here. This is a plot that shows us the distribution of all the variants along the protein. Here on the x-axis, these are the amino acid locations. If you want, you can zoom in and out, you can scroll left to right. Something I did want to point out is if you zoom all the way out, here down at the bottom, we have these little bars here. These bars are actually the different domains of the protein. If you click on one of these, like this one here, down here in the variant list, it actually filters our list for variants that are in that domain. So, this is the amino acid range of that protein domain. If you were interested in finding variants in a specific protein domain, you could go ahead and click through these different bars here, and then it would show you a list of the variants that are found in that region. Okay, moving to our variant list, I wanted to start off by showcasing this variant right here, p.F508del. In our variant list, we can see it’s a deletion; it’ll show you the type of variant that we’re looking at. We have the cDNA position of this variant, and if we click this little caret right here, we can see all the transcripts that this variant falls on, and we can see that our canonical transcript here is included in this list, so we know it’s on the canonical transcript. Over here, we have a few different buttons. This little G here indicates that there is curated content in Mastermind for this variant. This NIH symbol indicates that there is a ClinVar entry for this variant. If you click that, it would take you to the ClinVar entry. Then, this number here is the total number of articles that we were able to find that contain this exact variant.
If you go ahead and click on this number, it will bring you to the variant evidence page. On this page, if the variant has been curated by Mastermind, we’ll see this little window. We can click on this handy-dandy “View Interpretation” button, and this will bring us to all the information that we utilized to curate the variant. In this window, we can see that there was some functional evidence curated for this variant. There’s also some clinical information, so we observed cases and the variant segregating with disease in multiple families. We also were able to apply some intrinsic ACMG criteria to this variant. If we move down here again, it shows us our variant. It’s a deletion. It will indicate whether or not this is a coding variant, and because we have this check mark, we know it’s a coding variant. Here lists our provisional ACMG call, based on the evidence that we were able to gather. Here, this variant appears in over 10,000 articles. I wanted to point out the ACMG categories here that we were able to apply. This, on the left-hand side of this box, is the ACMG category, and then if there’s a number on the right-hand side here, that’s the number of times we actually applied that curation. For example, there are four papers that we were able to curate PS3 evidence from. You may be wondering, “wow, these numbers are pretty small compared to this article evidence count here, over 10,000.” I just wanted to point out that our teams do review, comprehensively, all the evidence in favor of pathogenicity and in favor of the variant being benign. For really well-published variants like this, the evidence shown and viewed may not be all of the publications. I just wanted to point out that discrepancy here. If we take a look further down on the page here, we have some more information. Again, we have our canonical transcript. Here, we also pull population data from gnomAD. This is really important if you’re wanting to apply a PM2 if your variant is really rare. 
You can view this information here. Further down here, we have our intrinsic information. Here we applied a PM4, and this is because, obviously, this variant is an in-frame deletion, and it is not in a repeat region. We were able to apply that PM4 criteria. Moving along to information that came from the literature, we have clinical curations here, and then we also have some functional curations. Here, we can click the “See All” button to expand all the clinical curations that we did. We were able to apply PP1. We went ahead and bumped it up to “moderate,” because we found it segregating in multiple families, and you can view the curation here. This is just a little snippet from the PubMed ID. If you were interested in seeing the PMID where this curation came from, you could actually go ahead and click this hyperlinked PMID button here, and that would pull up the PMID for where this curation came from. Again, we have more clinical curations. Here, we have what’s called a PS4M, and that’s just kind of our own modification of PS4, downgrading it to moderate just because we observed it in multiple cases throughout the literature. All three of these curations here had multiple patients that had the variant and CFTR, and so that’s why we curated that. Similarly, over here, we have our functional information. We were able to apply a PS3. Again, we curated it in four papers. If we expand the “See All” button, we can see all the different PMIDs where these curations came from. Again, if you were interested, you could go ahead and click on this hyperlink right here, and it would take you directly to the PMID where we pulled this curation from. Something nice about clicking the variant from the gene page is that you can actually ex out our variant, as it opens it in a new tab when you click it from this list, so we still have our gene page pulled up. 
The next variant I would like to showcase is a t1220 frameshift, and I just wanted to pause right here, we see that I have only typed in t1220. The variant I’m going to showcase is t1220 frameshift, but if you wanted to see other variants at that same amino acid position, it will pull up all of the different variants that have been found at that same position. Just a reminder, this little G means that we have curated content in Mastermind for this variant. The little NIH symbol indicates that there is a ClinVar entry for this variant. Then, this is the total article count, which is also hyperlinked to the evidence page. If we click that, again, it’s very similar to the last variant we looked at, and we can go ahead and click this “View Interpretation” button because we know that there is curated content in Mastermind. If we click that button, we see clinical information. This one actually does have a PM2 applied, as it was rare when we looked at the frequency in gnomAD. Then again, here we also have some intrinsic criteria, because this is a frameshift variant. Similarly here, we have the information about our variant. It tells us it’s a frameshift. This is a coding variant, so again, we have our little checkmark here, and the provisional call for this variant is pathogenic. Similar to our last variant, we also have a lot of papers with this variant published. There are 418. Again, we have one PP1 curation and one case curation here, but just a reminder that we do comprehensively review all the either pathogenic or benign variant information. And so, very similar to our last variant, we have our canonical transcript over here. That’s the transcript where our variant is found. For our population information, it tells us that the allele frequencies are below this threshold in gnomAD, so that’s why we went ahead and applied a PM2 here. We also went ahead and applied a PVS1 to this variant. 
This variant is loss of function, and loss of function variants in CFTR are known to cause cystic fibrosis. This is why we went ahead and applied that PVS1. Moving along over here, for this variant, we have clinical information. I wanted to point out specifically in these curations that most of the time, the nomenclature was appearing on a legacy transcript. Part of our job as curators is to identify when variants may not appear correctly in publications. It’s on us to correct that nomenclature. That’s kind of a big part of our job here. KT mentioned this earlier, and so I wanted to show an example of it, but if there is a variant that requires some nomenclature correction because it’s on an older transcript, we will go ahead and leave a note here that we did correct the nomenclature and put it on the canonical transcript. In this publication here, the variant was referred to with some legacy nomenclature, and if we translate that to the canonical transcript, this is actually the correct nomenclature. We did that for another publication as well. We have this curation right here. You might be wondering what this type of evidence means. This is just kind of our own spin on PP4. And so, here, the PPC HOMO just indicates that there was a homozygous case in the literature. In this publication, the variant was referred to with this older kind of legacy nomenclature, and so the curator here was able to correct that nomenclature to be on the canonical transcript, and provide it for you here in a note. If there is any nomenclature correction, you will see a comment here. Okay, so next, I wanted to showcase an intronic variant. And so, what’s nice here, say if you didn’t want to go back to the gene page and you already have a very specific variant in mind, and you wanted to search it, you could scroll back up to the top here to this search bar. We can click out of this variant, and I already know the intronic variant I want to search, and that is c.3140-26A>G. 
If there’s a variant for it, it will show up in our search bar here, so we can go ahead and click on it and then click our little search button over here. Like I mentioned, this is an intronic variant, and we have kind of different ways that you would be able to search for an intronic variant if you were interested. Usually, when you search an intronic variant, a little note like this will pop up at the top. We have our specific variant here, and this number in parentheses indicates that there were 200 publications with our specific intronic variant of interest. If you were more interested in, say, maybe the intron as a whole, you could search for this type of nomenclature here, or you can click on it. This little “int” just indicates all the variants that are in the corresponding intron. This “G1047inta” here indicates that it was in the site closest to the splice acceptor site. If you were confused about what these acronyms mean, you could actually go over here to “Help” and click on our “FAQ” button. I’ve already pulled up the FAQ page, but I just wanted to show you how you could get to it. There’s other useful information in here as well. This is our FAQ page. I mentioned that the “int” means it is this intronic region here. This encompasses the entire intron. The other abbreviation we saw was the “inta.” These essentially are variants that are going to occur in the splice acceptor half of the intron. Here, it would appear kind of in this half region of the intron that’s closest to the splice acceptor site here. You can visit our FAQ page, and we have a ton of more useful information on abbreviations and all things Mastermind. Moving back to our variant again, this has been curated, so this is in Mastermind. We could go ahead and click this “View Interpretation” button. The “View Interpretation” button will pop up just like all the other variants, and we can view the interpretation. 
Here, we have information about our variant, kind of those abbreviations or keys in Mastermind that we could also search. The variant here is intronic, so we know it’s not a coding variant, so there’s no little check mark here. The provisional ACMG call is pathogenic, based on the information that we have found in the literature. There are 324 papers for this variant in this intron, but again, we comprehensively review all the evidence and select a few publications to curate from there. We were actually able to apply a PVS1 two times from the literature. We saw publications that contained multiple patients with the variant. Again, we have a PM2, and I wanted to point out this “FX.” This is actually kind of an internal criterion we use at Genomenon. If you see this “FX” here, this just means that there was a publication that was functional in nature, but did not have sufficient evidence to reach a PS3 classification. But we still want to record that there has been functional information out there. Very similar to our last couple of variants, we have the canonical transcript here. We have our gnomAD information. Because it was below this threshold here in gnomAD, we went ahead and applied a PM2. Something I wanted to point out that was kind of unique about this variant is that we actually had to find PVS1 evidence in the literature. In most cases, we will automatically apply PVS1 if a variant is truncating, so a frameshift or a nonsense or at a +1,2 splice site or -1,2 splice site, but here, because this variant is kind of deeper into the intron, we actually had to find this information in the literature and manually apply this PVS1 twice, because we found studies showing that this variant led to truncation. I will note that we do also understand the nuances of PVS1. It isn’t applied to every truncating variant; there are instances where it shouldn’t be applied, so we will remove an automatic curation if necessary.
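
The automatic PVS1 rule just described (frameshift, nonsense, or a +1,2 / -1,2 splice site) can be written as a simple check. Here is a minimal sketch in Python. The function names and variant-type labels are assumptions, not Genomenon's code, and it does not model the cases where an automatic PVS1 is later removed by a curator.

```python
import re
from typing import Optional

# Variant types treated as truncating in this sketch (labels are assumptions).
TRUNCATING_TYPES = {"frameshift", "nonsense"}

# Matches intronic c. positions such as c.3140-26A>G or c.100+1G>A.
INTRON_OFFSET = re.compile(r"^c\.\d+([+-])(\d+)")


def intronic_offset(cdna: str) -> Optional[int]:
    """Return the signed intronic offset of a c. description, or None."""
    match = INTRON_OFFSET.match(cdna)
    if match is None:
        return None
    sign = 1 if match.group(1) == "+" else -1
    return sign * int(match.group(2))


def pvs1_auto_eligible(variant_type: str, cdna: str) -> bool:
    """True if the variant would qualify for an automatic PVS1 in this sketch."""
    if variant_type in TRUNCATING_TYPES:
        return True
    offset = intronic_offset(cdna)
    # Only the canonical +1,2 and -1,2 splice positions qualify automatically.
    return offset is not None and abs(offset) in (1, 2)


# The deeper intronic variant from the demo does not qualify automatically,
# so PVS1 has to come from evidence found in the literature.
print(pvs1_auto_eligible("intronic", "c.3140-26A>G"))
# A hypothetical canonical splice-site variant does qualify.
print(pvs1_auto_eligible("splice", "c.100+1G>A"))
```

The offset of -26 in c.3140-26A>G is what places this variant outside the automatic rule and makes manual curation necessary.
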
Finally, here, we have all of our case curations. Again, it shows that there are six different case curations here. If we click this button, we can go ahead and expand it and see all of our curations. Very similar to the last variant, this variant was usually referred to on a legacy or older transcript, and so we went ahead and had to correct this nomenclature here to our correct variant. Commonly in the literature, it was referred to as this variant right here, but we had to identify it and then correct it and put it on the correct canonical transcript. Like I mentioned, all of these are just multiple cases. These were cohorts that contained multiple patients that had our variant, so we curated that information. Then we had a couple of publications where there were individual cases. Again, if the zygosity is provided, we will usually note the zygosity. This individual was compound heterozygous, and so you would know that just from this curation information up here. A couple of other things I just wanted to quickly point out on this page here is that we have this button over here on the right-hand side in this blue box. If we click this “See Options” button, this will pull up all of the therapies or clinical trial options. This kind of information would be really useful for clinicians. If you were interested in finding treatment options for your patients or clinical trials, you could click out to these links here. I also wanted to note that all of the pages are live. If you went ahead and came up here to share the link or if you even just copied and pasted the link from your search bar up here, if you share this link, it is live. If you save it and you come back a few months later, the information may have been updated since then. If we have found new publications in the meantime, we’ll add them, and then that will display here. You don’t have to do anything. The link is live. All right, so that concludes my Mastermind demo here. 
Right now, I would like to go ahead and invite all of our speakers back on and transition into Q&A. Thank you so much!


DENICE: Awesome. Thanks so much, Jess. That was awesome. Welcome back, KT. Let’s dive right into the Q&A, because we got some great questions that came in. Let’s see. This first one, I think, KT, you can take this. “Do ClinVar and Mastermind utilize the same publications to classify variants?”

KT: Good question. In some cases, we would say the answer is yes. It really depends on the number of publications for the variant in that gene. If this is a newly described gene and only a hundred papers have been published in total with clinical or functional information, it’s likely Mastermind and ClinVar will be pulling from the same references. However, Mastermind is highly sensitive, and different clinical laboratories may be using different search tools to pull in that literature. If they’re not using Mastermind, could they have missed some papers? That’s possible. We also have to keep in mind that ClinVar entries are batched. Clinical labs and researchers usually release data sets and submit to ClinVar over time, so an entry is not live at the moment a variant is reported. It’s always possible that we have new publications in Mastermind, possibly already curated, that are not in ClinVar for that variant yet but may be at some point.


DENICE: Yeah, totally makes sense. I think a follow-up to this one is, “are there ever discrepant variant classifications between ClinVar and Mastermind, and if yes, how are those handled?”

KT: Yes, there are discrepant calls, for some of the reasons that we just covered. It is nice that we have our curated content, and then you can flip to the ClinVar tab right within the same page. They each have some things going for them and are unique in what they’re pulling from. We know that the amount and quality of information in ClinVar submissions can vary based on the submitter. We all know those submitters whose reviews we love reading: they give you every publication and what they used for each criterion, and you can feel pretty confident. Others might make a call without any details. So it’s definitely possible for the calls to be discrepant. Everyone has their own style and maybe some of their own internal rules, for example their PM2 thresholds. We publish on the website the threshold we used to make the PM2 call, so if someone else has modified theirs, that can also make the calls differ. Hope that answers the question.


DENICE: Yeah, definitely. Thanks. The next question: “Are all these features only available in the professional version?” I can take this bit. On the curated content, the vast majority of the curated data for variants and genes is available for Pro users only. That said, there is a limited amount of information and data available to basic users using the free version of Mastermind. In those instances, the situation is that a partner has paid to make that information available to all users to help increase diagnostic rates for that disease area. It is mostly for pro, but if you are using basic, you may stumble upon some of those as well. This next one, I will toss over to Jess. “How are repeat cohorts addressed in publications and in the curated data?”

JESSICA: Yeah, great question. We carefully assess and choose which publications we use for variant calls. We look at several different factors to identify repeat cohorts. As we’re curating, we track down the original sources of patients. We examine the author list, the location of the authors or the cohort, and the cohort size. If we see this information being repeated, that’s an indication that the cohort may have been published previously. We also assess all variants in a gene rather than going a single variant at a time. This makes it easier to identify those repeat cohorts, as the list of variants will also overlap. If there has been a repeat patient or cohort, you will see what’s called an ad report flag on the curated content page.
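
The signals listed in this answer (author list, location, cohort size, overlapping variants) could be combined into a simple screening heuristic. Here is a minimal sketch in Python. The `Cohort` fields, the 0.5 overlap threshold, and the "three of four signals" rule are all assumptions made for illustration. In practice this is a curator's judgment call, and a check like this could only suggest candidates for review.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Cohort:
    pmid: str
    authors: FrozenSet[str]
    location: str
    size: int
    variants: FrozenSet[str]


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Share of items common to both sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def possible_repeat(a: Cohort, b: Cohort, min_overlap: float = 0.5) -> bool:
    """Flag two publications as a possible repeat cohort for manual review."""
    signals = [
        jaccard(a.authors, b.authors) >= min_overlap,    # shared authors
        a.location == b.location,                        # same site
        a.size == b.size,                                # same cohort size
        jaccard(a.variants, b.variants) >= min_overlap,  # overlapping variants
    ]
    return sum(signals) >= 3


# Placeholder publications; identifiers and names are not real.
first = Cohort("PMID-A", frozenset({"Author1", "Author2", "Author3"}),
               "Site X", 40, frozenset({"p.F508del", "p.T1220fs"}))
second = Cohort("PMID-B", frozenset({"Author1", "Author2", "Author4"}),
                "Site X", 40,
                frozenset({"p.F508del", "p.T1220fs", "c.3140-26A>G"}))
third = Cohort("PMID-C", frozenset({"Author9"}),
               "Site Y", 12, frozenset({"p.F508del"}))

print(possible_repeat(first, second))
print(possible_repeat(first, third))
```

Comparing variant lists across a whole gene, rather than one variant at a time, is what makes the last signal usable, which matches the point made above about assessing all variants in a gene together.
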


DENICE: Perfect. Yeah, and just so folks know what we’re talking about, it’s those little icons where you see all the different other ACMG categories. You might see one of those badges that says “ad report.” So, thanks so much, Jess. That just goes to show how much sleuthing is involved when we’re doing this curation work. Awesome. This next one: “There are various transcripts in a gene where the amino acid numbers can be different,” and then they gave an example. “Will Mastermind give variant information for all the transcripts in a gene?” I can take this one. Yes. When we’ve curated that variant, we will have normalized the transcripts, and you’ll see us make notes on the interpretation page. I think Jess showed one where it was like, “They described it as this; we’ve corrected it as this.” For general searching in Mastermind, the answer is also yes. When you search by c dot nomenclature, that’s how you can see all of the different possible protein outcomes of that change. The answer is yes on both fronts, for curated content and for just regular old Mastermind searching. All right. “Are the c dot positions for CNVs in genes where CNVs are known also curated?” So, are we curating CNVs, I think, is the main question, and I’ll let KT take that one.

KT: Yes, sure. We are starting with SNVs for the clinical exome, but at some point, yes, we’ll get there. As of now, no; most of what you’ll see in the curated content are variants of 50 base pairs or less.


DENICE: Perfect. This is a bit more of a general one, but “how can I submit feedback about a curation?” I can take this one. Up in the top right corner of your screen when you’re in Mastermind, there’s a button that says “Contact Us.” That is how you get in touch with our team. That can be feedback about curated content; it could be a question about curated content or really any kind of Mastermind search. That’s the best way to get in touch with us. We definitely want to hear your feedback and comments and questions and help you out with those. Don’t be shy to use that. And that’s right within the application. If you’re in your email and not in Mastermind, and just think of a question for us, you can also send us a message to support@genomenon.com. All right, probably have time for a couple more here. Jess, for you: “Does Genomenon make any modifications to the ACMG guidelines when curating variants?”

JESSICA: Yep. I touched on this a little bit when I was showing Mastermind, but yes, we have reviewed and made modifications to certain ACMG categories, similar to how many other institutions have also made modifications to these guidelines. We actually have more information about these modifications in a document on the Mastermind platform if you were interested.


DENICE: Thanks so much. All right. Next one we have here is, “how often does Mastermind data get updated?” I can take this one. When you’re doing searches in Mastermind, the articles that you see there are updated on a weekly basis. New articles are getting incorporated into our system every week. The ClinVar integration is also updated weekly. If information changes in ClinVar, such as new submitters, that is also updated weekly. On the curated content side, we will definitely revisit these classifications. They’re not set-it-and-forget-it. We will prioritize classifications that are likely to be substantially changed, thinking of VUSs, before really well-known variants. We’re looking to do that at least annually for the curated variants. All right. I think we’ll just do one more, and I’ll take this one. “Can we submit our variant data for interpretation, especially for those that are very rare or novel?” Absolutely! In terms of Curate Pro and our services around curation, it can be any type of variant, rare or not. We’d be happy to chat more with you about that. Do send us a note at hello@genomenon.com, and we can chat a little bit more about our curation services. But it looks like we are just about out of time here. I want to thank everyone for attending. Thanks to Jess and KT for sharing your expertise with us all today. If we didn’t get to your question live, our team will get in touch via email to get those answered for you. We hope to see you at our next webinar or, better yet, come see us in person at ACMG in Toronto next month. Thanks again, everyone. Take care. Thank you.

