Clinical exome sequencing can generate a large number of genetic variants that require interpretation. For a single patient, assessing these variants can take between 6 and 8 hours (Machini et al., 2019). Prioritizing, curating evidence for, annotating, and documenting these variants is extremely time-consuming and represents a major bottleneck in the exome analysis workflow. The challenge is compounded by the complexity and heterogeneity of the nomenclature used to describe diseases, genes, and genetic variants when searching for relevant information to inform clinical decision-making.
In this webinar, Dr. Dan Bellissimo, Director of the Clinical Genomics Laboratory at the University of Pittsburgh Medical Center (UPMC), presents a variety of clinical cases and workflows that illustrate how the Mastermind Genomic Search Engine has helped address these variant interpretation challenges, reducing turnaround time, increasing diagnostic yield, and accelerating throughput.
You will learn about:
- How UPMC is using Mastermind every day, with examples from real-world clinical cases
- How to quickly and efficiently find relevant literature needed to interpret patient variants and CNVs
- How to continuously monitor VUSs and Likely Pathogenic variants to stay up to date on the latest research
Dr. Dan Bellissimo
Director of the Clinical Genomics Laboratory, UPMC
Dan received his B.S. in biochemistry from the University of Wisconsin-Madison and his Ph.D. in biochemistry from Duke University. In 1999, he completed an ABMGG fellowship in clinical molecular genetics at the University of Wisconsin. He spent 23 years at the Blood Center of Wisconsin, where he led product development and molecular diagnostic testing. He is currently at the University of Pittsburgh as Director of the Clinical Genomics Laboratory and Associate Professor of Obstetrics, Gynecology, and Reproductive Sciences. Dan has over 20 years of experience as a clinical lab director and is responsible for the development, review, and reporting of clinical genetic test results.
Dr. Mark Kiel
Co-founder and Chief Science Officer, Genomenon
GARRETT: Good morning, everyone, and welcome to today’s webinar — Variant Interpretation Challenges and Use Cases: Leveraging Mastermind in the Clinical Lab. My name is Garrett Sheets, and today we’re going to explore a variety of real-life clinical cases and workflows that will illustrate how the Mastermind genomic search engine serves as a streamlined variant interpretation tool in the clinical lab. Our speakers today have a lot of great information to share, so I’ll get right into housekeeping.
For those watching, feel free to put your questions into the Q&A, and if we have time, we’ll get to those at the end of the presentation. This webinar is being recorded, and we will email you the recording within the next day or so. Let’s get started!
I will briefly introduce our speakers. We have Dr. Mark Kiel, Genomenon’s co-founder and chief science officer. Hi, Mark!
We also have Dr. Dan Bellissimo, director of the Clinical Genomics Laboratory at the University of Pittsburgh Medical Center. Mark and Dan, thank you both so much for joining us today! Dan, would you like to get us started?
DAN: Okay! So, my goal today is to show you some examples of how we use Mastermind to help with variant interpretation in the clinical lab. First of all, I want to thank my genome analysts for helping me put together some of these examples. I’m going to start with a brief introduction to our clinical genomics lab at UPMC. Although we offer testing in a number of areas, the thrust of our work right now is next generation sequencing assays. A few years ago, we started offering hereditary cancer panels for a variety of cancer types. We have built out a clinical exome of about 4,700 genes, and we designed this content so that we can carve disease-specific gene panels out of it. We recently started offering whole exome sequencing, and we use a common wet chemistry platform to perform all of these tests. The sequencing data go into our informatics pipeline, where the reads are aligned and variants are called. After that, and I think we all know this, is where the work really begins: variant annotation and ACMG classification.
I think what’s really different with the advent of next generation sequencing in the lab is that, in the past, most laboratories tended to be experts in a particular set of genes. For example, in my former life, I worked in a lab that tested for platelet and bleeding disorders, so I knew those genes very well. I knew where to find information on them, and I knew which literature sources to track to stay up to date. Now, the problem ahead of us is much bigger. When we’re interpreting whole exomes, we’re talking about lots and lots of genes, many of which we have no familiarity with at all. We really need to tap into data sources to help us interpret these variants. I list many of these databases below: gnomAD for allele frequencies, ClinVar to see whether anyone has reported a variant, HGMD, in silico prediction tools, and UniProt, since we are often looking for protein domain information. The literature is still a very important source for much of the information we need. Because we’re usually looking for very specific information, though, many literature search tools can be problematic.
So these are some of the challenges we have with variant interpretation: a large number of variants come up in exome analysis, we need algorithms to prioritize which variants to evaluate, and we have to document the information we find on these variants to help us with future cases. All of this takes skilled analysts; in my laboratory, I have three PhD-level people working on this all the time. Some publications have shown that analyzing an exome can take quite a bit of time, somewhere between six and eight and a half hours. Certainly, the more we do this, the better we get at it, but anything that decreases the time we spend analyzing cases is helpful. Cases that we find to be positive right away are the most straightforward to report, but cases that look negative can take hours and hours, just to make sure nothing is being missed. Variant interpretation is a major bottleneck in the exome analysis workflow, so tools that help there have a big impact. Literature searching is a crucial element of this: we’re looking for specific information relevant to the ACMG guidelines, and in our lab, Mastermind plays a key role in advanced literature searching.
Here is some of the evidence we’re typically looking for. There are lots of ACMG criteria, but I find these are the ones that really push a variant out of the Variant of Uncertain Significance space and into Likely Pathogenic, or move it into Likely Benign. Is the variant frequency consistent with the frequency of the disorder? That’s a really important question. It’s simple to answer for single-gene disorders and can be answered for multi-gene disorders, but for really complex disorders it can be a problem, and reduced penetrance and variable expressivity often complicate it further. Is the prevalence of the variant in affected individuals significantly increased compared with controls? In other words, is there any case-control data? What phenotypes have been reported with variants in the gene? Do they match, or are they related to, the patient I’m looking at? Does the variant co-segregate with disease in the family? Is the variant in a critical, well-established functional domain or a mutation hotspot? Are there well-established in vitro or in vivo functional assays that support a damaging effect on the gene product? And does the variant follow the expected inheritance pattern? Much of the time, this is the key information we’re after when we search the literature.
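The first question, whether a variant is too common to cause the disorder, can be made concrete with a back-of-the-envelope bound. The sketch below assumes a dominant (monoallelic) disorder and folds allelic and genetic heterogeneity into a single “contribution” input for simplicity; the inputs are hypothetical and the function names are illustrative, not part of Mastermind or any particular tool.

```python
def max_credible_allele_frequency(prevalence: float,
                                  max_allelic_contribution: float,
                                  penetrance: float) -> float:
    """Upper bound on the population allele frequency of one causal variant
    for a dominant disorder (a simplified sketch).

    prevalence: fraction of the population that is affected
    max_allelic_contribution: largest fraction of cases that any single
        variant could plausibly explain
    penetrance: probability that a heterozygous carrier is affected
    """
    for name, value in (("prevalence", prevalence),
                        ("max_allelic_contribution", max_allelic_contribution),
                        ("penetrance", penetrance)):
        if not 0 < value <= 1:
            raise ValueError(f"{name} must be in (0, 1], got {value}")
    # Fraction of the population that is affected AND carries this variant.
    affected_carriers = prevalence * max_allelic_contribution
    # Scale up to all carriers, including unaffected (non-penetrant) ones.
    carrier_frequency = affected_carriers / penetrance
    # A heterozygous carrier has one variant allele out of two.
    return carrier_frequency / 2


def too_common(observed_af: float, threshold_af: float) -> bool:
    """True if the observed population frequency (e.g. from gnomAD) exceeds the bound."""
    return observed_af > threshold_af


# Hypothetical disorder: prevalence 1 in 500, no single variant explains
# more than 2% of cases, and penetrance is 50%.
threshold = max_credible_allele_frequency(1 / 500, 0.02, 0.5)
print(f"max credible AF ~ {threshold:.1e}")
print(too_common(1e-3, threshold))
```

Raising the contribution or lowering the penetrance loosens the bound, which is why uncertainty about penetrance makes this question harder for complex disorders.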
I’m going to use some examples to show how we’ve used this in the laboratory. The first one is from our hereditary cancer panel. A patient came in with a history of breast/ovarian cancer, and the hereditary cancer panel was ordered. The assay was performed, and a variant was detected in PALB2; you can see the frameshift variant here.
If you’re not familiar with Mastermind, you go to the search screen and simply enter this PALB2 variant. As you start typing, it appears in the list below, and you can select it to start your search. What you get back is a list of publications related to the terms you entered. In this top section, we see the publications with their publication dates, journal impact factors, and relevance to the search term. This gives a visual summary of which references may be most useful to you.
Below that, you have the option to sort the references by different categories; here, they’re sorted by relevance. You can also export the references. For each paper, we see how many times the gene is mentioned (79 times in this case) and how many times the specific variant is mentioned. That helps us judge how relevant the paper is.
Below that, we see the PubMed ID linking to the paper, and we can also open the PDF. Further down, we get some of the context around this variant within the paper: you can see some of the matched terms, you can see that this variant is specifically named, and you can see patient information (this is a Polish woman who was diagnosed with ovarian cancer), along with other useful details. Below that, we even get the protein nomenclature and rsIDs. A lot of information comes up that helps us sort through these variants.
One of the big benefits of this list is that we have various ways of sorting and refining the references, which you can see across the top bar. One of the things we most commonly do is refine the references based on the information we need for ACMG interpretation. If we go into that tab, you can see the different choices on the left, covering the ACMG criteria we can use to refine the search. In this example, we go into pedigrees and case-control data, and on the right side we specifically select case-control information. We add this to our search, and once again this reference comes to the top, where we see the variant mentioned, along with the PubMed ID and some context below.
Now, we can jump into this paper directly and take a look. We can see right away that this mutation appears to be a founder mutation in the Polish population. It is part of a study of 807 patients and 1,690 controls, which gives us some idea of whether the variant is enriched in patients. The results also give the frequency of this variant in this population. That’s a lot of information to help us classify this variant.
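With case-control counts like these, the enrichment question usually comes down to an odds ratio. A minimal sketch follows, using Woolf’s log-based 95% confidence interval and adding 0.5 to every cell when any cell is zero (the Haldane-Anscombe correction). The carrier counts in the example are placeholders, not the study’s data; only the group sizes come from the case above.

```python
from __future__ import annotations

import math


def odds_ratio_with_ci(case_carriers: int, n_cases: int,
                       control_carriers: int, n_controls: int,
                       z: float = 1.96) -> tuple[float, float, float]:
    """Odds ratio for carrying a variant in cases vs controls, with a Woolf CI."""
    a = case_carriers                    # cases with the variant
    b = n_cases - case_carriers          # cases without the variant
    c = control_carriers                 # controls with the variant
    d = n_controls - control_carriers    # controls without the variant
    if min(a, b, c, d) < 0:
        raise ValueError("carrier counts must be between 0 and the group size")
    if 0 in (a, b, c, d):
        # Haldane-Anscombe correction so the log odds ratio is defined.
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Placeholder carrier counts for illustration only (group sizes from the study).
or_, lo, hi = odds_ratio_with_ci(case_carriers=10, n_cases=807,
                                 control_carriers=2, n_controls=1690)
print(f"OR = {or_:.1f} (95% CI {lo:.1f}-{hi:.1f})")
```

A confidence interval whose lower bound stays well above 1 is what supports enrichment; with rare variants the interval is often wide, which is why pooled or founder-population studies are so valuable.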
The next case is another one from our hereditary cancer panel, and I like this example because it makes us more careful about how we interpret splicing variants. Again, we have a patient with a history of hereditary breast/ovarian cancer, and the panel was performed. This time, we detected a variant in BRCA1 at a consensus splice site. Our first instinct is to say, “That’s certainly a Likely Pathogenic variant, because it’s in a consensus splice site and is expected to cause skipping of exon 10, resulting in an out-of-frame transcript.” So we go into Mastermind and enter this variant. Once again, as we start typing the name, it appears below, and we can select it and see the papers. One thing you notice right away is that a lot of articles are identified, but in one of them the gene and the variant are named many times, which suggests it will be an important paper for this variant. We can jump straight to that paper, and the title tells you something more complicated is going on: a combined genetic and splicing analysis of BRCA1 in which our variant sits on a haplotype with another variant, highlighting the relevance of naturally occurring in-frame transcripts for developing variant classification algorithms.
I don’t expect you to read the whole abstract, so I’ll summarize it. The overall message is that, based on the case-control study they did, the odds in favor of causality are extremely low; you can see the number there. It’s extremely unlikely that this is a pathogenic variant. This splicing variant, c.594-2A>C, is always in cis with another variant, c.641A>G. The haplotype causes exon 10 skipping mainly because of the c.641A>G variant, not the consensus splice site variant. The study also found that 20 to 30 percent of transcripts were an in-frame delta 9,10 transcript, which is also present in normal controls and is sufficient for tumor suppression. So this variant should not be considered a high-risk pathogenic allele.
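The contrast between exon 10 skipping (out of frame) and the delta 9,10 transcript (in frame) is reading-frame arithmetic: removing internal coding exons preserves the downstream frame only if the total number of skipped nucleotides is a multiple of three. The exon lengths below (46 nt for exon 9, 77 nt for exon 10) are assumptions for illustration; check them against the reference transcript your lab reports on before relying on them.

```python
# Exon lengths in nucleotides. ASSUMED values for illustration only;
# take real lengths from the reference transcript you report against.
EXON_LENGTHS = {9: 46, 10: 77}


def skipped_nucleotides(exon_lengths: dict, skipped_exons: list) -> int:
    """Total coding nucleotides removed when the given exons are skipped."""
    missing = [e for e in skipped_exons if e not in exon_lengths]
    if missing:
        raise KeyError(f"no length recorded for exon(s) {missing}")
    return sum(exon_lengths[e] for e in skipped_exons)


def keeps_reading_frame(exon_lengths: dict, skipped_exons: list) -> bool:
    """True if skipping these internal, fully coding exons preserves the frame."""
    return skipped_nucleotides(exon_lengths, skipped_exons) % 3 == 0


for skipped in ([10], [9, 10]):
    n = skipped_nucleotides(EXON_LENGTHS, skipped)
    status = "in frame" if keeps_reading_frame(EXON_LENGTHS, skipped) else "frameshift"
    print(f"skip exon(s) {skipped}: {n} nt removed -> {status}")
```

An in-frame product is not automatically functional; as the case shows, the real question is how much functional transcript remains.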
To give you a picture of what that looks like: in the normal, wild-type case, 66 percent of the transcripts were full length, but a considerable fraction was the delta 9,10 splice variant. In the mutant case, where the two variants are present on one haplotype, half of the full-length transcript goes away, but the amount of the delta 9,10 transcript doesn’t change. It turns out this transcript is important for preserving tumor suppression.
To me, this points to something I’m always concerned about with splice variants. First, do we understand what normal splicing looks like? Sometimes not all transcripts are full length, and some alternative splicing is already occurring. Second, how much mis-splicing is pathogenic? I see lots of data in the literature showing that a variant causes mis-splicing, but the real question is how much mis-splicing you need before you lose the tumor suppressor effect. That’s a difficult question to answer. This was a really good example of how something that looked like a Likely Pathogenic variant turned out to be benign.
These are just some examples of how we looked at citations in Mastermind for different variants as we studied them. You can go through the list; most of these are from our cancer panel. You can also see that the number of citations found in Mastermind is consistently higher than in HGMD. I think part of this is because Mastermind picks up articles as soon as they appear in the literature. The last two examples in this table are really important (I’ll come back to PALB2 a bit later): these PALB2 variants are in ClinVar as Variants of Uncertain Significance, but Mastermind identified a new article with functional data on many different PALB2 variants, which allowed us to reclassify them as Likely Benign. That article was really important in determining how we classified these variants.
Variant reclassification is also a big process that occurs in the labs nowadays. Not only do we have to classify variants that we see in current cases, but we have to keep up with the stuff that we’ve already reported. It’s another workflow issue. I’m really talking about updating classifications based on new information. If your customers are like my customers, they’re always asking us, “what’s your process for variant classification? How do I know your laboratory has an ongoing process to provide updated interpretations? Are you going to send me an updated report when the interpretation changes?” So it’s really important to have some kind of process to be able to do this without it being too onerous in the laboratory.
Right now, we offer variant reclassification in several ways. We may perform it at the physician’s request: often a patient is coming back for a yearly appointment, and the physician wants to know whether anything has changed since the last visit. We may also get new data if and when they are published. I just gave you the PALB2 example, where new data came out and a whole set of variants was reclassified; that can lead to a reclassification event and reissued reports. Finally, we may reclassify a variant as part of a new case: if we haven’t looked at a variant recently and it comes up in a new case, the analysts go through it again, and any new information they find can lead to reclassification. The main tool we use right now for finding new variant information is automated literature searching. I’ll show you what that looks like.
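Reissuing reports after a reclassification is essentially a lookup from a variant to every case it was previously reported in. A minimal sketch, assuming a hypothetical in-house record of what each case was reported with; the record fields, case IDs, and variant strings are invented for illustration.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportedVariant:
    case_id: str
    variant: str         # e.g. an HGVS description
    classification: str  # e.g. "VUS", "Likely Benign"


def index_by_variant(reports: list[ReportedVariant]) -> dict[str, list[ReportedVariant]]:
    """Group reported variants so every case carrying a variant can be found quickly."""
    index: dict[str, list[ReportedVariant]] = defaultdict(list)
    for report in reports:
        index[report.variant].append(report)
    return index


def cases_to_reissue(index: dict[str, list[ReportedVariant]],
                     variant: str, new_classification: str) -> list[str]:
    """Case IDs whose reported classification differs from the new one."""
    return sorted(r.case_id for r in index.get(variant, [])
                  if r.classification != new_classification)


reports = [
    ReportedVariant("CASE-001", "GENE1:c.100A>G", "VUS"),
    ReportedVariant("CASE-002", "GENE1:c.100A>G", "VUS"),
    ReportedVariant("CASE-003", "GENE2:c.200C>T", "Likely Pathogenic"),
]
index = index_by_variant(reports)
print(cases_to_reissue(index, "GENE1:c.100A>G", "Likely Benign"))
```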
In Mastermind, this is done by setting up a Mastermind Alert. We can set up alerts based on disease and list specific variants in the genes of interest. We can choose who receives email about these alerts and how frequently they are sent; a lot of that comes down to how much email you want to get. Variant information doesn’t change that fast, but these alerts are extremely useful for monitoring the literature for publications that may require us to reclassify variants. We can set any notification frequency we want, as well as how we want to receive the information. It allows us to assure our customers that Variants of Uncertain Significance and Likely Pathogenic variants are being continually monitored for new information.
In the second half of this slide, I’m showing an example of what one of these alerts looks like. A number of variants are tagged as having no new articles, but you can see one for PALB2 that says “one new article,” and the article is listed: “Functional characterization of 84 PALB2 variants of uncertain significance.” You can jump directly to the article or view it in Mastermind. This was the paper that helped us reclassify the PALB2 variants I just mentioned. This is extremely useful: you get alerted when something new comes out, instead of someone constantly chasing around looking for new information.
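An alert like this boils down to two checks: whether the alert is due given its frequency, and which watched variants appear in articles published since the last digest. The sketch below is a hypothetical model of that logic for an in-house tracker, not Mastermind’s alert API; the class names, fields, and example data are invented.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Alert:
    watched_variants: set[str]
    recipients: list[str]
    every_days: int      # notification frequency
    last_sent: date


@dataclass
class Article:
    title: str
    published: date
    variants: set[str]   # variants mentioned anywhere, including supplements


def is_due(alert: Alert, today: date) -> bool:
    return today - alert.last_sent >= timedelta(days=alert.every_days)


def build_digest(alert: Alert, articles: list[Article]) -> dict[str, list[str]]:
    """Map each watched variant to titles of articles published since the last digest."""
    digest: dict[str, list[str]] = {v: [] for v in sorted(alert.watched_variants)}
    for article in articles:
        if article.published > alert.last_sent:
            for variant in article.variants & alert.watched_variants:
                digest[variant].append(article.title)
    return digest


def render(digest: dict[str, list[str]]) -> list[str]:
    lines = []
    for variant, titles in digest.items():
        n = len(titles)
        label = "no new articles" if n == 0 else f"{n} new article{'s' if n > 1 else ''}"
        lines.append(f"{variant}: {label}")
        lines.extend(f"  - {title}" for title in titles)
    return lines


alert = Alert({"GENE1:c.100A>G", "GENE2:c.200C>T"}, ["lab@example.org"], 7, date(2024, 1, 1))
articles = [
    Article("Hypothetical functional study", date(2024, 1, 5), {"GENE2:c.200C>T"}),
    Article("Older paper", date(2023, 12, 20), {"GENE1:c.100A>G"}),
]
if is_due(alert, date(2024, 1, 8)):
    print("\n".join(render(build_digest(alert, articles))))
```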
I mentioned that the other way we get new information is when someone requests an update on a case. Here is an example: we had reported a BRCA2 Variant of Uncertain Significance, proline 655 leucine (p.Pro655Leu). A genetic counselor called the lab to ask whether there was additional information on this variant that could change its classification, because the patient was coming into clinic. Right now, there are two entries in ClinVar, including one from our lab; if we look up this variant in ClinVar, we see two labs listed below, and both of us have reported it as a Variant of Uncertain Significance. So the analyst went into Mastermind and set up an alert for this variant, and among the publications that came up was a new Genetics in Medicine paper that lists this variant. It was actually found in the supplemental data for the article. That’s a real strength of Mastermind: it can find data in supplemental tables, which isn’t always possible with other platforms. The analyst pulled up the paper and checked the supplemental table; it turned out that no functional analysis had been performed, so the variant stayed a VUS. But this was a quick way to go into the literature for very specific information and determine whether there was anything new on this variant.
Now I’m going to give you some examples from our exome work. This case is a 24-year-old, and you can see the phenotypes we were given as HPO terms: hypoplastic posterior communicating artery, mixed hearing impairment, joint hypermobility, hydrocephalus, and delayed speech and language development. We use these HPO terms to help sort our variants and find the ones most likely to be involved. One of the variants that came up was a loss-of-function variant in FOXC1, which looked like a Likely Pathogenic variant. FOXC1 is associated with Axenfeld-Rieger syndrome, a dominant condition, and if you look through its phenotypes, many of them match up quite well with the patient phenotypes I just gave you. Part of the genome analyst’s work at this point is to say, “Okay, I have a loss-of-function variant, but it’s in the C-terminal part of the protein.” There is often concern about whether C-terminal loss-of-function variants are pathogenic, so he went looking for more information on this variant.
The variant is listed at the top, and he starts by simply entering FOXC1. You get back about 3,100 publications, but once again we can go into the ACMG criteria and add a way to refine the literature: “I want information on loss-of-function variants.” The final step is to say, “and I really want information on the C-terminal domain,” so we add a free-text term, “C-terminal” in this case. Here is the output of the search with all of these filters applied. One of the publications that comes up is a Journal of Biological Chemistry article. That’s important, because I don’t think JBC is a journal many geneticists read, certainly not back in 2002. We get the PubMed link and the title: “Transcriptional Regulation Is Mediated by N- and C-Terminal Activation Domains.” This is highly relevant to the information we’re searching for. FOXC1 is obviously a well-studied gene, since this paper goes back quite a way, and much of the functional analysis of domain structure happens on the biochemical side, which is why it appeared in the Journal of Biological Chemistry.
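Conceptually, each refinement step narrows the result set (gene, then an ACMG evidence category, then a free-text term) before ranking what remains. The sketch below models that with invented article records and placeholder identifiers; it illustrates the filtering idea only and is not how Mastermind is implemented.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ArticleRecord:
    pmid: str                    # placeholder identifiers below, not real PMIDs
    genes: set[str]
    evidence_categories: set[str]
    text: str
    gene_mentions: int
    variant_mentions: int


def refine(articles: list[ArticleRecord], gene: str,
           category: str | None = None,
           free_text: str | None = None) -> list[ArticleRecord]:
    """Keep articles matching every supplied filter, most variant/gene mentions first."""
    hits = []
    for art in articles:
        if gene not in art.genes:
            continue
        if category is not None and category not in art.evidence_categories:
            continue
        if free_text is not None and free_text.lower() not in art.text.lower():
            continue
        hits.append(art)
    # Rank by variant mentions, then gene mentions, both descending.
    return sorted(hits, key=lambda a: (-a.variant_mentions, -a.gene_mentions))


articles = [
    ArticleRecord("A1", {"FOXC1"}, {"loss of function"}, "...C-terminal activation domain...", 40, 0),
    ArticleRecord("A2", {"FOXC1"}, {"loss of function"}, "...forkhead domain missense...", 25, 3),
    ArticleRecord("A3", {"FOXC1"}, {"loss of function"}, "...truncation in the C-terminal region...", 12, 2),
    ArticleRecord("A4", {"PITX2"}, {"loss of function"}, "...C-terminal...", 30, 5),
]
for art in refine(articles, "FOXC1", "loss of function", "c-terminal"):
    print(art.pmid)
```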
So here’s the paper and the abstract. What you learn from it is that the C-terminal region contains really important activation domains, and this variant would be predicted to disrupt them. That was the information we needed to support classifying our C-terminal loss-of-function variant as pathogenic: we could point to the specific functional domains that would be lost and the probable effect of that loss. It was a really important finding.
Here’s another example: an exome case for a two-year-old with a congenital posterior urethral valve and abnormality of the urinary system. This is one of those cases that takes a lot of work, because it goes through the analysis pipeline and you really can’t find anything. What I want to illustrate is how you can use Mastermind to build a candidate gene list, to make sure we weren’t missing anything. What are the most common genes involved in this disorder? Is it possible there’s an unusual variant that just isn’t being picked up? That was the goal of this investigation.
In Mastermind, you aren’t limited to searching by variants; you can also add phenotypes to the search. Here, as we start typing, you can see the term is available in the search criteria. We select it and jump right into the associations page, which is new functionality in Mastermind. It takes the term and all of its associated articles and lets you filter them by different characteristics: across these tabs, you can filter by genes, variants, diseases, phenotypes, therapies, or CNVs. We proceeded by choosing the disease, a congenital abnormality. Then we clicked the “genes” tab, which shows the genes most often associated with this disorder in publications. At the top is SALL1, so we selected it and looked at the references.
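One way to think about the genes tab is as a ranking of genes by how many articles mention them together with the selected phenotype or disease. A minimal co-occurrence count under that assumption, with made-up article data and placeholder gene names (this is an illustration of the idea, not Mastermind’s ranking method):

```python
from collections import Counter


def rank_genes(articles: list, term: str, top_n: int = 5) -> list:
    """Rank genes by the number of articles that mention them alongside `term`.

    Each article is a dict with 'terms' and 'genes' collections. Ties are
    broken alphabetically so the output is deterministic.
    """
    counts = Counter()
    for article in articles:
        if term in article["terms"]:
            counts.update(set(article["genes"]))  # count each gene once per article
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


articles = [
    {"terms": {"posterior urethral valve"}, "genes": ["GENE_A", "GENE_B"]},
    {"terms": {"posterior urethral valve"}, "genes": ["GENE_A"]},
    {"terms": {"posterior urethral valve", "hydronephrosis"}, "genes": ["GENE_A", "GENE_C", "GENE_C"]},
    {"terms": {"hydronephrosis"}, "genes": ["GENE_B"]},
]
print(rank_genes(articles, "posterior urethral valve"))
```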
In this case, we identified a really good article on this disorder, titled “Novel Insights into the Pathogenesis of Monogenic Congenital Anomalies of the Kidney and Urinary Tract.” It’s a nice review of the genetics, which helped us identify possible candidate genes and confirm there was nothing to be found in them. Nothing was, but this was a really quick way of digging into the literature for very specific information.
The last example is about searching for CNVs. This functionality was added to Mastermind in the last year, and I find it really useful. I help sign out microarrays for our cytogenetics lab, and one of the things you find out right away is that searching the literature for CNVs is much harder. Sequence variants mostly have standardized names and nomenclature that help you find information, but CNV coordinates depend on the genome build: they change between build 37 and build 38. The coordinates can also differ by platform, because each platform has probes in different positions, so breakpoints are mapped differently. If you search by breakpoints, you may not find what you’re looking for. You could search by chromosome band, but a band covers a really large region, which makes it even harder to find what you want. Gene content varies, since genes may be fully included in or only overlap different CNVs, and CNVs are often reported in a variety of disease states. In fact, many of the CNVs we see in the population are known because they were found in a patient whose disorder was explained by another CNV; the first CNV is assumed to be a population variant because a pathogenic variant had already been found in that patient. So it can be much harder to work out whether a CNV is associated with a specific disease. Finally, many CNVs are associated with phenotypic variability and reduced penetrance, which makes it harder both to find information on them and to know whether they’re pathogenic.
Here’s an example of a search from the laboratory. The reason for referral was developmental delay. The microarray identified a 206 kb loss on the short (p) arm of chromosome 16 that deleted nine genes. I had to find out whether similar deletions had been reported before, whether phenotypes were associated with this deletion, and whether it was present in the population. I needed to know whether anything was known about its penetrance and expressivity. Was it usually de novo, or were there carrier parents? Those are the kinds of things I’m looking for.
So you go into Mastermind and enter the search term using the required nomenclature: I put “del” because it’s a deletion, then chromosome 16 and the chromosome positions. Because I’m using hg19 coordinates, I indicate that in the search term, and Mastermind automatically converts the coordinates to hg38 to run the search. In this case, the results screen identified 75 overlapping CNVs, listed below. You can sort the results by start and end breakpoints or by size, and you can look at the type of overlap: exact matches, contained, intersecting, or surrounding. You also see the number of genes in each CNV. As I mentioned, the CNV we reported was a 206 kb deletion, and there are a number of deletions in that size range here, all containing nine genes. If we click on one, we see these are exactly the same nine genes, so you can look for CNVs that include the specific genes you’re interested in.
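The overlap categories can be pinned down with simple interval logic once both CNVs are on the same genome build (the hg19-to-hg38 liftover is assumed to have been done already). In this sketch, “contained” means the reported CNV lies inside the patient’s CNV and “surrounding” means it spans the whole patient CNV; that reading of the labels, the coordinates, and the gene names are assumptions for illustration. Reciprocal overlap and shared gene content are included because breakpoints differ by platform, so exact coordinate matches are rare.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CNV:
    chrom: str
    start: int                   # 1-based, inclusive
    end: int                     # 1-based, inclusive
    kind: str                    # "del" or "dup"
    genes: frozenset = frozenset()

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def overlap_bp(a: CNV, b: CNV) -> int:
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def classify(query: CNV, hit: CNV) -> str | None:
    """Relationship of a reported CNV (`hit`) to the patient's CNV (`query`)."""
    if query.kind != hit.kind or overlap_bp(query, hit) == 0:
        return None
    if (hit.start, hit.end) == (query.start, query.end):
        return "exact"
    if query.start <= hit.start and hit.end <= query.end:
        return "contained"
    if hit.start <= query.start and query.end <= hit.end:
        return "surrounding"
    return "intersecting"


def reciprocal_overlap(a: CNV, b: CNV) -> float:
    """Smaller of the two overlap fractions; 1.0 means identical intervals."""
    ov = overlap_bp(a, b)
    return min(ov / a.length, ov / b.length)


def shared_genes(a: CNV, b: CNV) -> set:
    return set(a.genes & b.genes)


# Hypothetical 206 kb deletion and reported CNVs (coordinates and genes invented).
genes = frozenset(f"GENE{i}" for i in range(1, 10))
query = CNV("16", 1_000_000, 1_205_999, "del", genes)
reported = [
    CNV("16", 1_000_000, 1_205_999, "del", genes),
    CNV("16", 1_050_000, 1_100_000, "del", frozenset({"GENE3"})),
    CNV("16", 900_000, 1_300_000, "del", genes | {"GENE10"}),
    CNV("16", 1_200_000, 1_400_000, "del", frozenset({"GENE9", "GENE11"})),
    CNV("16", 2_000_000, 2_100_000, "del"),
]
for hit in reported:
    print(classify(query, hit), round(reciprocal_overlap(query, hit), 2),
          len(shared_genes(query, hit)))
```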
You can then pick one of these and go into the references. I selected one, and there’s a publication in The Journal of Molecular Diagnostics. When you go into this paper, you can see that they also reported this deletion, along with the patient phenotypes and what was known about the patient. We were able to use that information to write our interpretation, that this “has been reported in association with a variable incompletely penetrant phenotype that may include developmental delay, obesity, behavioral problems, schizophrenia, and craniofacial dysmorphisms… These deletions can be inherited and de novo events.” All of this information either came from that article or from other reviews of the same deletion that it led me to, which enabled us to collect this information on this specific CNV.
So I’ve tried to use these examples to show how Mastermind enables the genome analyst to quickly find and prioritize literature on gene variants and CNVs. The search criteria are really flexible: they include ACMG criteria, free-text terms, and phenotype terms. You can set up gene and variant alerts to notify the lab when updated information is available on variants. The new Genetic Associations section expands the criteria we can use to search the literature, including disease, phenotypes, genes, variants, and therapies. Overall, we’ve found this to be an extremely useful tool for literature searching.
I want to thank my laboratory; this is the staff of my lab. I always say it takes a village, especially for next generation sequencing work. Everyone on this page does a lot of work: the lab directors, our manager, our administrative head, the genetic counselors, my genome analysts, my IT and informatics support team, and the clinical lab staff. I especially want to mention the help I got from Greg, Kevin, and Patrick in assembling these examples for you. They are the main users of Mastermind, they use it daily in their work, and it’s been really helpful to them. So that is my presentation. Thank you for your attention! Now I’m going to pass the presentation on to Mark, who can tell you a little more about Mastermind.
MARK: Hey, Dan, thank you so much! That was a fantastic set of cases showing the value of Mastermind in clinical workflows. While Garrett gets my very short slide deck up, I’ll describe my role in this webinar: to complement the detailed cases Dan went through with an overview of Mastermind and how it applies to the examples he highlighted, as well as to many others.
I’ll begin by emphasizing that, as Dan alluded to, one of Mastermind’s main value propositions is the depth and breadth of its content. One way to quantitate that is by demonstrating how many titles and abstracts comprise the data in Mastermind. It’s the totality of the published literature, 30 million titles and abstracts, but obviously, as Dan went through in his examples, we go much deeper and index the full text of those prioritized references that have genetic or genomic or human disease related content. We also keep that up-to-date on a weekly basis. We have more than eight million such articles that look for those components that Dan had described: the diseases, the genes, their variants, and a handful of other aspects that are useful for interpreting the meaningfulness of this content.
The indexing of the full-text articles includes figures and tables, which can be particularly challenging, given the complexities of the data display and the way genetic variants are described in those references. Also, as was highlighted in the previous slides, Mastermind makes a great effort to get at the very useful supplemental content. To date, though it updates weekly, we have nearly three million supplemental data sets, that is, three million references with their associated supplemental data sets, each of which may be one excel file, or it may be 15. When we index that content, we look for — no matter how they’re described — the diseases, phenotypes, and genes with their synonyms and aliases and acronyms, and all manner of genetic variants, whether they’re described at the cDNA level, at the protein level, the RSID level, or even copy number variants. You can read more about this in a detailed study of how it’s performed and the value and the comprehensiveness that it provides by following this link to our research paper.
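To make the idea of matching variants “no matter how they’re described” concrete, here is a minimal Python sketch that maps a few common spellings of a simple protein substitution (three-letter HGVS such as p.Val600Glu, one-letter such as V600E, and X or Ter for a stop) onto one canonical key. It is a hypothetical helper for illustration only, not Mastermind’s indexing method, and it deliberately ignores cDNA, rsID, indel, and CNV notation, each of which would need its own handling.

```python
import re
from typing import Optional

# Standard three-letter to one-letter amino acid codes; "Ter" (stop) becomes "*".
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Ter": "*",
}
VALID_ONE_LETTER = set(THREE_TO_ONE.values())

# Simple substitutions only: optional "p." prefix, optional parentheses,
# reference residue, position, alternate residue (e.g. "p.(Arg175His)", "V600E").
SUBSTITUTION = re.compile(
    r"^(?:p\.)?\(?([A-Z][a-z]{2}|[A-Z])(\d+)([A-Z][a-z]{2}|[A-Z*])\)?$"
)


def _to_one_letter(aa: str) -> Optional[str]:
    """Convert one amino acid token to its one-letter code, or None if unknown."""
    if len(aa) == 3:
        return THREE_TO_ONE.get(aa)
    if aa == "X":  # legacy notation for a stop codon
        return "*"
    return aa if aa in VALID_ONE_LETTER else None


def normalize_protein_change(text: str) -> Optional[str]:
    """Return a canonical one-letter key such as "V600E", or None if unparsed."""
    match = SUBSTITUTION.match(text.strip())
    if match is None:
        return None
    ref = _to_one_letter(match.group(1))
    alt = _to_one_letter(match.group(3))
    if ref is None or alt is None:
        return None
    return f"{ref}{match.group(2)}{alt}"


for spelling in ["p.Val600Glu", "V600E", "p.(Arg175His)", "R213X", "c.1799T>A"]:
    print(spelling, "->", normalize_protein_change(spelling))
```

Keying every spelling to the same string is what lets differently written mentions of one variant be grouped; anything the pattern does not recognize returns None rather than a guess.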
In addition to the depth and breadth of our penetration into the medical literature, the way that we index is extremely exhaustive. The examples that were shown focused largely on cancer, but I’d like to pause here to underscore that we look across the whole spectrum of human disease, including both hereditary and somatic cancer as well as constitutional disease, with all of their attendant complexities. Mastermind is truly maximally sensitive, not just in terms of the content that is indexed, but in the way that we index that content, and we don’t stop there. We put those data points together in what we refer to as “genomic associations.” Some of the fruits of that effort were highlighted previously, but let me add a little more detail.
If I could draw your attention to the heptagon on the left, it shows which components we index for: the genes; the variants, including structural variants like copy number variants; the diseases; the phenotypes and their ontologies, meaning parent and daughter terms; therapies; and categorized keywords that help variant scientists identify which papers are meaningful, for example papers describing clinical cases and segregation with disease, or empirical/functional studies, whether in vivo or in vitro. When you put all that information together, across all of the citing references and the depth to which the authors cite each of those variants and related components in each individual paper, what emerges is an estimate of the strength of the evidence, highlighted in a couple of different examples on the right. This is how we prioritize which references are most meaningful, and we draw your attention to that information in our prioritized list of references.
To sum up this slide, there’s a very healthy balance between the sensitivity that clinical variant interpretation demands and the specificity required to ease and streamline workflows, minimizing the burden on the people in Dan’s lab and others who have to go through this information. Mastermind not only organizes and annotates this information, but does so in a prioritized way, where we show our work and demonstrate why content was prioritized, allowing you to immediately verify and act on that information.
So this is the landing page for the Mastermind search interface. It’s really intended to be easy to use. You can perform any manner of search, whether a gene and a variant, which is a very typical use case in the settings we’ve described, but you’re also free to add any of those other categorical terms, diseases, phenotypes, or therapies, and to combine them. You can also input free-text search terms, which is a very powerful capability, but I would recommend invoking those ancillary search terms only to enhance the specificity of your results once you’ve gotten the lay of the land for the specific genetic variant or gene you’re searching for. That’s very faithful to the ethos of Mastermind’s indexing and user experience: we prioritize sensitivity and enable the user to modify their search to enhance specificity. Sensitivity first, and then specificity. The feature set in Mastermind allows users to do that very quickly in their workflow.
I’ll pause here and underscore something that Dan brought up. This slide is meant to indicate that Mastermind’s user interface makes things easy for you. Dan brought up the alert capability, which is a way to stay abreast of newly published information about a variant for which insufficient evidence was found on the first search, simply because it hadn’t been published yet, or, in other situations, where the variant has no references at all and you’d like to be notified of any references as they’re published. The Mastermind alert capability allows users to automate that process. I also want to emphasize, for labs that work at high volume and have bioinformatics capabilities on their own team, that we offer automation to such users through our API, as well as through various integrations with third-party software that some of you in the audience may already have in your workflow.
So on this next slide: there was already a fantastic depiction of the different feature sets in Mastermind, and I just wanted to showcase how they all fit together. In the upper right portion of the screen, you see the literature landscape for your searched-for terms, in this case, and in most cases, a genetic variant. Each circle represents a reference, and the prioritized references are the larger circles. The x-axis indicates the publication date and the y-axis indicates the quality of the journal, while the size of each circle reflects the relevance of that content to your search terms. Dan highlighted the same information as numbers in the list of references below, off to the right.
The literature landscape is also depicted in list form. Similarly, on the left, the variant diagram for the searched-for gene depicts all of the variants that Mastermind has indexed across the full breadth of the medical literature, so that you can see your variant in context. This is particularly useful for cancer and constitutional variant interpretation when there is no literature for your variant. In that case, there may very well be literature on a different variant at the same residue, at a neighboring residue, or in the same domain as your searched-for variant, and that provides some measure of evidence for interpreting the meaningfulness of your variant. The variant landscape is depicted visually by that Manhattan plot, and in list form below, which is searchable and sortable and allows users to get to that information very quickly.
At the bottom is the content summary. On the left are the title and abstract, with relevant terms highlighted. On the right is the context for the content matching your search. By default it shows genes and variants, but if you enhance the specificity of your search by adding phenotypes, drugs, or diseases, you can toggle the sentence display to see sentence fragments in the context of those search terms as well. It’s a very powerful shortcut for judging whether the content displayed from the cited references is relevant to your search.
At the top is where users can modify their search by adding diseases, phenotypes, therapies, and free-text search terms, as well as, in that highlighted blue drop-down in the middle, any of the canned or categorized key terms, such as clinical significance or functional significance, plus a handful of additional search categories, such as prognostic import or diagnostic relevance.
With that, I think that’s the last slide. I’ll call Dan back to see if there are any questions from the audience that he and I can both field. I also invite any audience members who have not yet experienced Mastermind to follow the link on the current slide to see the power Mastermind can provide and how it can streamline and enhance your clinical workflow. The contact information is provided below. We love talking to users, so don’t be shy: if you ever want to talk to customer success, sales, or support, we’re happy to be at your service! I’ll turn on my camera and invite Dan to do the same. Garrett, I’ll have you field any questions we got from the audience as Dan and I went through our content.
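The “variant in context” idea can be sketched as a lookup of other indexed variants at the same residue or within a small window of neighboring residues. The snippet below is a hypothetical Python illustration, not how Mastermind builds its variant diagram: it assumes variants are already normalized to one-letter protein notation (such as V600E), uses a made-up list in place of an indexed variant set, and does not model protein domains, which would need domain coordinates from an external source.

```python
import re
from typing import Dict, List

# One-letter protein substitution, e.g. "V600E" or "R213*".
ONE_LETTER_CHANGE = re.compile(r"^([A-Z])(\d+)([A-Z*])$")


def residue_of(change: str) -> int:
    """Extract the residue position from a normalized protein change."""
    match = ONE_LETTER_CHANGE.match(change)
    if match is None:
        raise ValueError(f"unsupported notation: {change!r}")
    return int(match.group(2))


def variants_near(query: str, indexed: List[str], window: int = 1) -> Dict[str, List[str]]:
    """Split indexed variants into same-residue and neighboring-residue hits."""
    position = residue_of(query)
    same_residue: List[str] = []
    neighboring: List[str] = []
    for change in indexed:
        if change == query:
            continue  # the searched-for variant itself is not "context"
        distance = abs(residue_of(change) - position)
        if distance == 0:
            same_residue.append(change)
        elif distance <= window:
            neighboring.append(change)
    return {"same_residue": same_residue, "neighboring": neighboring}


# Hypothetical example: a searched-for variant with no literature of its own.
indexed_variants = ["V600E", "V600K", "K601E", "G469A", "T599I"]
print(variants_near("V600M", indexed_variants, window=1))
```

The window size is a judgment call; widening it trades specificity for sensitivity, in the same spirit as the search advice above.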
GARRETT: Sure, absolutely! First of all, that was a great presentation. Thank you so much to Mark and Dan for your time and your expertise. We did have a few questions come in from the audience, and for those listening, please know that if we do not get to your question, one of our team members will get back to you after the webinar. So, let’s see… One of our questions: What is the input file format for variant data in Mastermind?
MARK: Dan, we’ll just decide who takes the ball. This one seems pretty obviously mine, but I’d like to hear if you have any clean-up or enhancement to my answer. The user interface allows you to search flexibly for any variant, and we also enable a Boolean, or multi-parametric, capability to search on multiple variants at once, though I would recommend that, in your workflow, you reserve that function for compound heterozygous cases, whether the variants are in the same gene or in different genes. If you’ve got a case with multiple variants, that’s how I would recommend executing your search in the user interface, though your alert setup can comprise multiple variants. The API lends itself to routine automation, up to and including an entire VCF: batch upload through the API allows users to upload an entire patient’s VCF file. Dan, I wonder if you could touch on how your lab uses Mastermind with respect to file format?
DAN: I don’t have too much to add; we do it pretty simply. But one of the examples you just brought up, using multiple variants in a search only when looking for compound heterozygotes, made me think of another place you might see that: overlapping or adjacent variants. We get a lot of multi-nucleotide variants that, depending on your pipeline, are actually called as two variants right next to each other. Some pipelines split them out, even though you can see they’re together on ERE. You need ways to put them back together, or look at them separately, or whatever. That may be another place you’d use something like that.
MARK: Yeah, great point. Those are tricky. I remember those were great teaching points when I was going through my training.
DAN: Like I said, some pipelines split them. Sometimes even in gnomAD they’re split, even though ERE tells you they’re together.
MARK: Yeah. Biology is complicated. It doesn’t make it easy for us.
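Dan’s point about multi-nucleotide variants being split into adjacent calls can be illustrated with a small Python sketch that reads minimal VCF data lines and rejoins substitutions at consecutive positions into a single multi-nucleotide record. This is a hypothetical helper, not part of Mastermind or of any particular pipeline, and it rests on a loud assumption: the adjacent calls are already known to be in cis (for example, confirmed from the reads or from phasing information). Merging adjacent calls that actually sit on different haplotypes would be wrong.

```python
from typing import Iterable, List, NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int  # 1-based VCF position
    ref: str
    alt: str


def parse_vcf_lines(lines: Iterable[str]) -> List[Variant]:
    """Read CHROM, POS, REF, ALT from VCF data lines, one record per ALT allele."""
    variants: List[Variant] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip meta-information and header lines
        chrom, pos, _id, ref, alts = line.split("\t")[:5]
        for alt in alts.split(","):
            if alt in (".", "*"):
                continue  # no alternate allele, or a spanning deletion
            variants.append(Variant(chrom, int(pos), ref, alt))
    return variants


def merge_adjacent_substitutions(variants: List[Variant]) -> List[Variant]:
    """Join substitutions at consecutive positions into one MNV.

    Assumes every adjacent pair is already known to be on the same haplotype.
    """
    merged: List[Variant] = []
    for variant in sorted(variants):
        is_snv = len(variant.ref) == 1 and len(variant.alt) == 1
        if merged and is_snv:
            last = merged[-1]
            last_is_substitution = len(last.ref) == len(last.alt)
            if (last_is_substitution and last.chrom == variant.chrom
                    and last.pos + len(last.ref) == variant.pos):
                merged[-1] = Variant(last.chrom, last.pos,
                                     last.ref + variant.ref, last.alt + variant.alt)
                continue
        merged.append(variant)
    return merged


vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr2\t100\t.\tG\tA\t50\tPASS\t.",
    "chr2\t101\t.\tC\tT\t50\tPASS\t.",
    "chr2\t105\t.\tT\tC\t50\tPASS\t.",
]
print(merge_adjacent_substitutions(parse_vcf_lines(vcf)))
```

The two calls at positions 100 and 101 become one GC>AT record, while the isolated call at 105 is left alone; keeping the split records as well lets you look at them either way, as Dan suggests.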
GARRETT: Awesome. So, another question we had, and I know we touched on this a little bit: as we continue to develop and improve Mastermind, are there plans for new features, for example new CNV visualizations? Mark, I’m not sure if you can speak to that a little bit more, or maybe Dan as well?
MARK: That’s a great opportunity for me to point out that we’re releasing updates and improvements to the Mastermind interface, new features and new visualizations, all the time, in large part in response to insight and feedback from our customers. We launched the CNV function fairly recently, so I’m happy to see in Dan’s examples how they’re finding value there. We’re seeking to deepen that value by enhancing the visualization. We lovingly refer to standard visualizations of CNVs as “stacked pancakes,” where you see the extent of each CNV and how they stack up and relate to each other. A very talented developer on our team has come up with what I think is a fantastic visualization, which allows you to quickly toggle through the CNV data set comprising a search result in Mastermind. Getting you to the lower right, where the sentence fragments are shown, is the really challenging part: indexing, finding, and verifying those. Together, the visualization in the upper left and the sentence-fragment verification in the lower right will, we feel, redouble the value that we already provide in CNV search.
DAN: Yeah. For CNVs, a lot of times you’re used to the view you see in the genome browser, where all the different CNVs are lined up on top of each other against the gene map. That’s an incredibly useful view. The only other issue I’d mention is converting coordinates to hg38: there are some regions where the builds don’t match up well and the conversion may not work. There, you might just have to go in by chromosome band, or take a more gene-based approach, to get to the CNVs you’re looking at. Once you’re in there, you’ll see the CNVs are different sizes and include different gene content, and then you can whittle it down. So there are other ways to get around it if you need to.
GARRETT: Great, thank you. Let’s see: is there an option for batch search or batch query without a VCF?
MARK: I’ll just say that it’s up to the user’s tolerance, and that’s why I would suggest keeping it restricted to a case, where there’s at least some interrelatedness among the variants you want to search for. Let’s take an example, since Dan brought up some exome cases. Obviously, filtering happens before you, as a variant interpreter, have a set of variants to look at. That set will be smaller for a panel than for an exome, and depending on how you filter, it can be a couple of variants, a dozen, a couple dozen, or hundreds. If you don’t want to use a file and you don’t want to use automation, you’ll be working in the user interface. If, as a user, you want to batch those searches and you only have a dozen or so, you can. I would imagine that beyond a dozen, you’d be thinking about ways to automate, although it may be a simpler workflow to search on those variants one at a time and adjudicate the meaningfulness of that information, particularly to uncover what Dan alluded to in a couple of slides.
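One common way to judge whether a published CNV resembles a patient’s CNV, once both are on the same genome build (for example, after converting coordinates to hg38, with the caveats Dan mentions), is reciprocal overlap: the shared length divided by each CNV’s length, taking the smaller fraction. The Python sketch below is an illustration only, not how Mastermind compares CNVs; the half-open coordinates, the example intervals, and the 50% threshold are all assumptions (50% reciprocal overlap is an often-used convention, but labs set their own).

```python
from typing import List, NamedTuple


class CNV(NamedTuple):
    chrom: str
    start: int  # half-open interval [start, end) on a single genome build
    end: int


def reciprocal_overlap(a: CNV, b: CNV) -> float:
    """Smaller of the two overlap fractions; 0.0 if the CNVs do not overlap."""
    if a.chrom != b.chrom:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 0.0
    return min(overlap / (a.end - a.start), overlap / (b.end - b.start))


def similar_cnvs(patient: CNV, published: List[CNV], threshold: float = 0.5) -> List[CNV]:
    """Published CNVs whose reciprocal overlap with the patient CNV meets the threshold."""
    return [cnv for cnv in published if reciprocal_overlap(patient, cnv) >= threshold]


# Hypothetical coordinates, for illustration only.
patient_cnv = CNV("chr16", 100_000, 200_000)
literature = [
    CNV("chr16", 110_000, 210_000),
    CNV("chr16", 150_000, 350_000),
    CNV("chr1", 100_000, 200_000),
]
print(similar_cnvs(patient_cnv, literature))
```

Taking the smaller fraction keeps a small CNV from “matching” a much larger one just because it falls inside it; gene content, as Dan notes, is a separate and equally important comparison.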
I love this because it’s so tempting to focus on the affirmative and say, “Aha, I found this one reference and it unlocked the case,” and a classification went from a VUS to Likely Pathogenic, or what have you. Dan, you did a great job of highlighting the power of a null result. Coupling that with the sensitivity of Mastermind gives you confidence that, if Mastermind didn’t find it, there’s no value in executing additional searches. So, Garrett, my answer to that question is this: when you have a case and you want to search for multiple variants, ticking off which of the variants in the exome result have null results is valuable. I wouldn’t want to spoil that value by bucketing all the variants together, rolling the dice, and seeing what comes out at the top. I think users would benefit from a more methodical approach, and they can do so very quickly: a 10-second search, another 10-second search, another 10-second search, adjudicating the evidence as you go.
GARRETT: Very good.
DAN: I would say that, by the time we get to that stage in our lab, the variant list has already been pared down. At that point it’s about in-depth information on a smaller set of variants, usually fewer than 10. That’s where we really employ Mastermind. It’s done case by case, by the individual working that case.
GARRETT: Great. Let’s see — here’s a good one: is there a limit on how many variants you can search for at one time?
MARK: There is not in the professional edition, which is what was showcased in the examples and in my screenshots. The professional version of Mastermind is intended for clinical purposes, where you don’t want to be hamstrung by limits. We make Mastermind available to researchers and academics in the basic edition, where there are limits on searches, but they refresh every week because we really want it to get used. In a clinical setting, we and our users feel strongly that you don’t want to restrict the applicability of Mastermind. You don’t want to relegate your searches to “only if this is true and this is true do you search in Mastermind.” As was highlighted in these examples, you don’t know unless you search. One of the many benefits of the professional edition is this unlimited search capability: it covers any gene, with unlimited searches every week for the duration of your license, plus all of the features we talked about, the categorized keywords, the associations page, etc. So that’s the distinct difference between the academic basic version and the professional, or clinical, version.
GARRETT: Cool. How do you track the articles that have already been reviewed for a particular variant?
MARK: Well, I’ll be upfront, we don’t make that available in Mastermind. Dan, I’ll send that over to you, and have you say whether that’s a challenge for you, especially if you’ve got a very focused post-filter variant set. How do you get around that, if there is that need in your use of Mastermind?
DAN: Well, I’d say it’s not an issue that comes up too much. When we’re doing our variant annotation, if certain articles were important in classifying a variant, they would be noted inside the file for that variant. Especially when analysts go back to do a reclassification, they’re well aware of the date it was last done and the date of the latest article. They wouldn’t be so worried about whether they had reviewed something before; they’re really looking for something new. You know, is there something after that? Because I already have that reference. I’d say most of the time that hasn’t been a concern. Usually the PubMed ID gets recorded for anything used, so we have that information there.
MARK: Yeah. Just a small plug again for the API: we have a great “since” parameter so that, as Dan suggested, if your workflow merits it, you don’t need to look at something that’s 5, 10, or 15 years old. The since parameter allows you to focus your results on just the content that’s been published since a certain date. Also, just as I said, search for sensitivity first and then use the features for specificity. If one of your aspects of specificity is the date of publication, you can change the prioritization according to that date so that you see just the most recent content. As I emphasized, the content is updated every week, so the alerts will keep you apprised in the background; you don’t even need to think about it. But in real time in the user interface, when you’re doing your search, if you just want to cut to the chase and see any new information, there’s a very easy way to prioritize by publication date.
DAN: Yeah. Unfortunately, for a lot of the variants you look at, there really isn’t much literature. I showed some examples of things that had quite a bit of information on them, but you will hit variants where you get nothing, and that’s important to know too, because the next time you go through it, if there’s something there, you know it’s something new. It really does save you from having to go digging deep when nothing comes up at all. You know there’s really nothing there, not that you’re just stuck.
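Dan’s practice of recording PubMed IDs and review dates, combined with the idea of focusing on content published since a certain date, can be sketched on the lab side as a simple filter. This Python example is hypothetical: it does not call the Mastermind API, the Reference record and the example PMIDs are made up, and it only shows the logic of keeping references that are on or after the last review date and not already recorded in the variant file, newest first.

```python
from datetime import date
from typing import Iterable, List, NamedTuple, Set


class Reference(NamedTuple):
    pmid: str
    published: date


def references_to_review(results: Iterable[Reference], last_review: date,
                         recorded_pmids: Set[str]) -> List[Reference]:
    """Keep references on or after the last review date that were not already recorded.

    Sorted newest first, mirroring prioritization by publication date.
    """
    fresh = [ref for ref in results
             if ref.published >= last_review and ref.pmid not in recorded_pmids]
    return sorted(fresh, key=lambda ref: ref.published, reverse=True)


# Hypothetical search results and variant-file contents.
results = [
    Reference("10000001", date(2012, 5, 1)),
    Reference("10000002", date(2021, 3, 1)),
    Reference("10000003", date(2019, 7, 15)),
    Reference("10000004", date(2020, 1, 10)),
]
recorded = {"10000001", "10000004"}
for ref in references_to_review(results, last_review=date(2019, 1, 1), recorded_pmids=recorded):
    print(ref.pmid, ref.published.isoformat())
```

An empty result is itself informative here: as Dan notes, a null result recorded today means anything that shows up next time is genuinely new.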
MARK: Yeah, I’ve changed my philosophy now. It’s human nature to be disappointed when you get a null result, but I’ve changed the way I think about it. I say, “yes, I got a null result,” and I just make it affirmative, right? Make it a positive, because it is, as you highlighted.
GARRETT: Very nice! I like that. We’ve got about two minutes left, so, Mark or Dan, do you have any closing thoughts before we end our webinar?
MARK: I’ll say, Dan, that was really great! I hadn’t seen your slides before. We’ve talked before at conferences and through support and with your people, but I have to say, that was a really expert presentation. Thank you so much.
DAN: Thank you, thank you. Well, your team looked at my slides, and they told me I did a good job.
MARK: Oh. Good!
DAN: I’m really glad you enjoyed it! I enjoyed putting it together. Like I said, my analysts helped a lot. I just told them to keep an eye out for good examples of when Mastermind was really helpful, and those are the ones they sent me. They really stuck out as cases where it was really, really useful to get something you can’t find elsewhere. You can do Google searches and everything, and a lot of times you find the main things and you find a paper, but you can’t dig deep into specific content that way. When you’re looking for something specific, it’s just really helpful.
MARK: Yeah. Well, kudos all around.
GARRETT: Yes, that was a really, really great presentation! I guess we’ll close this out. Thank you, everyone, for tuning in. Again, this webinar is being recorded, and we’ll send you the recording very soon. Mark and Dan, thank you so much for taking the time to offer your expertise. If you have any questions, please feel free to send us an email at firstname.lastname@example.org, and if you haven’t signed up for Mastermind yet, there’s a bitly link on the slide we showed earlier where you can create your free account and start a trial of Mastermind professional edition. We’ll send you all of this information in the email after we wrap up. Also, at the conclusion of this webinar, you’ll receive a short post-event questionnaire. Please let us know your thoughts about today’s discussion. So, with that, thank you so much, everyone! I think we had a really great conversation.
MARK: See you all.