KATE: Hello! Welcome to today’s Mastermind User Group. Thank you so much for joining us. My name is Kate Oesterle, I’m a member of the marketing team here at Genomenon, and I’ll be moderating today’s training and the Q&A with our speakers. We have a lot of great information to share with you today, so I’m going to get right to housekeeping and the introductions.
Today, we’re going to be showcasing some of the Professional Edition features, based on questions from you, our valued users. If you don’t already have a Mastermind account, you can create one today with this bit.ly link that you’re seeing here on the screen, and that will get you started with a complimentary trial of the Professional edition of Mastermind. Mastermind can help you quickly identify papers for patient diagnosis and treatment decisions. You can scale your caseload with advanced features, such as protein-centric searching, genomic associations, ACMG and AMP filtering, alerts, and so much more. We’re going to be showing you some of those functionalities today.
As a reminder, this webinar will be recorded and will be shared with you later on today, so please keep an eye out for that in your inbox. If you are joining us live, feel free to put your questions in the Q&A box here in the go-to module. If we don’t get to your question during today’s presentation, a member of the team will be following up with you. So, let’s get started! I’d like to introduce our speakers for today’s user group.
We have Brittnee Jones, Genomenon’s vice president of product management and customer success. Hello. We also have Dan O’Hara, who is our Mastermind product manager. Hello! We also have joining us Denise Belandris, our field application scientist. Welcome, everybody. Hello, hello. I want to get started with some questions that came in before the webinar, which Denise will be covering with some search examples. We will hand it over to Denise.
DENISE: Perfect. All right, everyone should be seeing my screen now, which is just our Mastermind home page here. One of the most common questions that we hear is, how do I launch a search? What’s the formatting, what can I include? So we’re going to start with just that. Here on the home page of Mastermind, we have our search bar front and center, which is where you’ll begin typing in your query. Now, there’s a lot of flexibility around how to start. Mastermind indexes many types of genomic concepts, which is actually all the text that you see here in gray. We index things like genes, CNVs, therapies, etc. You can mix and match a gene and a variant, which is probably the most common way that users are approaching the system.
As you’re typing, you’re going to see a drop down list of suggestions, like we’re used to seeing with other search engines. Here, I’ve typed in a gene symbol. That’s how most of us are going to be entering our genes, but you can absolutely enter the full gene name. You can type that in, and Mastermind is going to recognize that as another way to describe this gene. That’s that part of our core technology, which we call genomic language processing (GLP). It’s this recognition of all the different ways that you can describe a different genomic concept.
Now, GLP is especially powerful when it comes to variant nomenclature, because there are so many ways an author might describe a variant. Behind the scenes, Mastermind is normalizing and disambiguating the literature so that you can type in your variant as, for example, a protein change. So I’ve typed it in here using single letter amino acids, and you could use your three letter amino acids as well. You can type in your variant as a cDNA change, like this. That is also recognized. Another way to add a variant is to give us the genomic position, which I’m just going to paste in here. So there’s my position. I specified the build using this contig. One other way that you can provide a variant is as an rsID.
All of these different ways to describe a variant are all valid and recognized in Mastermind. What you’re getting back are the articles that mention that variant in all conceivable iterations. You don’t have to worry about the way that you come into the system with your variant, because, again, Mastermind is doing that work behind the scenes to look up all of the synonymous ways to talk about that variant.
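To make the idea of synonymous nomenclature concrete, here is a minimal sketch of just one such equivalence: converting a single-letter protein substitution into its three-letter form. It is an illustration only, not how Mastermind's genomic language processing works internally; the function name `to_three_letter` is made up for this example, and it handles only simple substitutions.

```python
import re

# Standard one-letter to three-letter amino acid codes.
# "X" and "*" are both treated as a stop here, matching usage such as "K52X".
ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "X": "Ter", "*": "Ter",
}

# Reference residue, position, then the new residue (or a stop symbol).
SUBSTITUTION = re.compile(
    r"(?:p\.)?([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWYX*])"
)


def to_three_letter(protein_change: str) -> str:
    """Convert a simple substitution such as 'L792P' to 'p.Leu792Pro'."""
    match = SUBSTITUTION.fullmatch(protein_change)
    if match is None:
        raise ValueError(f"not a simple one-letter substitution: {protein_change!r}")
    ref, position, alt = match.groups()
    return f"p.{ONE_TO_THREE[ref]}{position}{ONE_TO_THREE[alt]}"


print(to_three_letter("L792P"))
print(to_three_letter("K52X"))
```

Both spellings describe the same change, which is why a search tool that normalizes nomenclature can accept either one.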
CNVs can be especially tough when you’re doing a literature search. We get a lot of questions about this. We’ve built in a lot of flexibility for the input so that you can paste in things like array nomenclature, which is what I’m going to drop right here. You can see it’s being recognized as a CNV. You can also search things like a karyotype, which, I can show another example like this, is also recognized as a CNV. You can type in cytobands. I’m going to type in del 2q24.3, also recognized as a CNV. Of course, you can come in with coordinates, so you can do that. Here’s an example of some coordinates I’m just dropping in. These are all recognized.
We have questions about the genomic build and which one we are using. We are centric to GRCh38, which is how you’ll see CNVs being displayed, but that doesn’t mean you need to come into our system with coordinates in that build. Let’s say you want to come into our system with coordinates in build hg19. All you have to do is tack that on to the end of your search, and you can see that the liftover is happening. Your input is in this small font, but then the main display is those coordinates, lifted over into build 38. No need to do liftovers before you come into Mastermind. You can have that work done for you right here in the search bar, and it works for build hg18 as well.
Another question that we had come in was around intragenic copy number events, so deletions and duplications within a gene. There is a bit of a formula to follow when it comes to crafting the search. The first thing that you want to type in is the word “del,” or the word “amp,” so you’re just establishing whether you’re looking for a loss or a gain of DNA. I can type in just the word “del” to start, and then you can type in your gene name. Let’s say I’m interested in a deletion of this entire gene, which is how this search is crafted. You can also specify an exon by typing literally just “exon” and then a number, or even a range of exons, which you can just tack on with a dash and then complete the range. This is being recognized as a CNV, and I kind of pointed that out earlier as I was dropping in the other examples, like the array and the karyotype.
This is really important. If you’ve entered some nomenclature that the system doesn’t recognize, for example, if you forget the all-important first word, what it’s going to suggest to you is a text search. Really, you don’t want this. You want to be engaging with that GLP, you kind of want to speak the same language as the system, so you want to make sure that your input is being recognized the way that you intend. By adding in that word here, I’m back to being recognized, indeed, as a CNV. If you follow that formula, where it’s del or amp, and then your gene, and then your exon or range of exons, then you should be well on your way.
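The formula above (del or amp, then the gene, then an optional exon or exon range) can be sketched as a small helper that assembles the query text. This is a hypothetical illustration: the function name, the gene symbol used in the example, and the exact spacing between “exon” and the number are assumptions, so check the suggestion drop-down to confirm that the result is recognized as a CNV.

```python
from typing import Optional


def build_intragenic_cnv_query(
    event: str,
    gene: str,
    first_exon: Optional[int] = None,
    last_exon: Optional[int] = None,
) -> str:
    """Assemble 'del|amp GENE [exon N[-M]]' following the spoken formula."""
    if event not in ("del", "amp"):
        # Without this first word the input falls back to a plain text search.
        raise ValueError("event must be 'del' (loss) or 'amp' (gain)")
    if last_exon is not None and first_exon is None:
        raise ValueError("last_exon requires first_exon")

    parts = [event, gene]
    if first_exon is not None:
        exon = f"exon {first_exon}"  # spacing is an assumption
        if last_exon is not None:
            if last_exon < first_exon:
                raise ValueError("exon range must be ascending")
            exon += f"-{last_exon}"
        parts.append(exon)
    return " ".join(parts)


# "BRCA1" is only an illustrative gene symbol, not one from the demo.
print(build_intragenic_cnv_query("del", "BRCA1"))
print(build_intragenic_cnv_query("amp", "BRCA1", 2, 5))
```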
That said, we’re here to help if you’re getting stuck on how you’re entering your CNV or your variant. There’s a button right up here, at the top little menu in the top right here. This is a place for you to get in touch with us with questions, concerns, or feedback. This just opens up a little pop-up here, where you can type in your message and say, “help me with…” and then whatever that thing is. This message goes to our support team, and we can get back to you ASAP. We want you to reach out if you feel like you’ve hit a wall. Often, we’ll send back things like links or screenshots to help you see, step by step, how to properly craft that search, or otherwise, just get the info that you’re looking for.
There’s a question about phenotypes and diseases: What is the source of the information within our database? We’re using HPO for phenotypes and MeSH IDs for diseases. To enter those in the system, you can just start typing the word and you’ll see whether it’s being recognized as a phenotype, like in this example here, for seizure. You can also drop in the HPO identifier. If I drop that in, that’s also being recognized for seizure here. Similarly, for diseases, you can type in your disease name, and that is being recognized, indeed, as a disease, or you could drop in the D0 number, which, for Fabry disease, is this right here.
One general thing about the search bar is that each term is kind of like a discrete piece of information. You can absolutely do things like search a CNV and a phenotype, you just want to make sure you’re hitting enter in between those terms, so that they’re getting added to the search bar. Instead of typing one long run-on, you want to hit enter as you’re going. For example, if I want to search this CNV right here, I’m just going to hit enter so it gets added to the search bar rather than continuing with my query and adding phenotypes or something like that. I’m just going to hit enter, and now it’s been added to the search bar. Now I can mix and match, so I can add in a phenotype at this point, and I can just hit enter to add it to the search.
I might have multiple phenotypes, so this is another really common question: Can you add multiple phenotypes? Absolutely. I can start typing that second phenotype here. I can see it in the drop down now. Instead of just hitting enter, what I’m going to do is press shift on my keyboard (which you can’t see me do, but I am doing) and now I’m going to click on it, or hit enter. Because I hit shift, I was able to add it to the search bar rather than replace the phenotype that was already in there. That’s the trick for adding multiples of anything, you just want to hit shift and then enter. I can even add another phenotype here, so I can go ahead and add this third one, hit enter, and now I’ve got everything in my search bar that I care about. You can toggle the operator between and/or by clicking on it, so it’s highlighted here in yellow. I’m going to click on it to change it to or; I click on it again to change it back to and. It’s really up to you, and how specific or restrictive you want that search to be.
We can go ahead and launch this search, and check out all of the information that we’re getting. Lots on this page. What I want to call out is the section for full-text matches, which we’re going to get to by clicking any one of these articles. Here, with the full-text matches, this is where you get a peek into the article. Regardless of whether this article is behind a paywall, we’re going to show you these text fragments. Each of the terms in your search bar is getting pulled out from this article. You can see the context behind how those terms are being talked about, and really, the more terms that you add to your search, the more sentence fragments you get out, because we’re pulling out every mention. The more things that you declare interest in, the more of these fragments you’ll be able to scan through.
While most folks are using Mastermind for things like gene and variant searching, or CNV searching, you can actually take those out of the equation. I can remove this CNV, and what’s left in my search bar is this list of phenotypes. A question we get is, can you launch a search without genomic information? You absolutely can. I’m going to hit search right now. What happens when you launch a search without any genomic information is that you automatically get taken to what we call the genomic associations interface. We also call it the “explore associations” page. Here on this page, remember that Mastermind has indexed every word in nine million plus full-text articles. Mastermind can tell you information like what genes are being mentioned in the articles that also mention these three phenotypes.
I clicked over to the genes tab to look at that list. You can also ask questions like, what CNVs are being mentioned in papers that mentioned those three phenotypes that I have in my search bar? That’s another tab here. So each of the tabs is named for the information that you’re getting out. I can browse through these CNVs, I can see the number of articles that mention this CNV plus my phenotypes. If you remember, this was the one that I had actually searched before I removed it from the search bar. What you can do — I’m going to hop on back to the genes tab here — is you can start adding things to your search with just a click. If I’m interested in this gene, now I can add it to the search, and it gets popped in right over here. I can submit the search now. Now that I have a gene in my search bar, I can actually look at the articles at this point. To do that, I would go up here to view evidence. Right now, I’m on the associations page. To go back to the view where you can see those full-text matches, you would go to view evidence.
Now, this information will get loaded, slowly but surely, and I’d be able to browse those articles. Looks like my internet’s getting a little choppy here, but you’re able to toggle back and forth between looking at the full-text matches, browsing the articles, and going back and forth to the associations page, because the button is roughly in the same place. So this takes me back to that explore associations view, and if I go back to this page, the view evidence button is roughly in the same spot. You can really easily switch between your associations mode and evidence mode. With that, I’m going to hand it over to Dan, who’s going to show off some filtering capabilities and somatic search examples. Take it away, Dan!
DAN: Thank you, Denise! We receive a lot of questions about Mastermind’s ability to find evidence for somatic variants. Fortunately, for those asking, Mastermind is extremely useful for finding clinically actionable evidence, for all types of somatic variants. I’m going to demonstrate two examples that capture only a small portion of Mastermind use cases in the somatic world.
For the first example, I’m going to show how to find therapeutic information for a novel fusion. Searching for fusion evidence is cumbersome and a bit arduous when using traditional search methods, but Mastermind’s sensitivity and tools for specificity make it really easy to find the information you’re looking for. I’m going to start with two genes in the search. The first gene is going to be ANKRD26. I’m going to add that gene to the current search. The second gene is RET, and again, I’m going to hit the shift button here to add two genes to the search. Before I launch the search, I just want to explain what we see right here: Mastermind is going to be searching for articles that mention both ANKRD26 and RET. If you want to look for papers with either of these genes, you can click the Boolean operator here to switch between and/or. For this case, particularly, for fusions, we want to keep that and.
I’m going to launch the search. As we can see, Mastermind is looking for articles with both ANKRD26 and RET. We have 232 articles. I mentioned this is a fusion example, and we see over here in the article section, a few papers that mention fusion, but we have 232 articles. I think I can speak for everybody when I say no one wants to read 232 articles, so I’m going to navigate over to this filter categories button here. I’m going to click into genetic mechanism. On the left hand side, we see fusion events right here. I’m going to click that, and these are categories. Category keywords can be added to any search for Pro users. What we do is add these search terms in, and for each of these search terms, for example, “fusion gene,” the number to the left represents the number of articles that mention that. I’m going to go ahead and add “fusion event,” “fusion gene,” and “fusion.” Then, I’m going to click submit with chosen filters. Now, the search is looking for ANKRD26/RET, plus our fusion category keywords, and we see 88 articles.
Our final step for this is to add a therapy. We want to look at therapeutic options for this novel fusion. We have 88 papers. That’s still a little too many, so I’m going to go ahead and add in a RET inhibitor, selpercatinib. I’m going to add that to the search. As you can see, we’ve filtered down, starting at 232 articles, then taking the next step to 88, and then the final step to 5 articles. Looking at the article list, we can see that the papers are talking about RET fusions specifically related to selpercatinib and other RET inhibitors. We start with that sensitivity-first approach and then add in specificity. We went from 232 to 5 articles, specifically looking for a novel RET fusion related to treatment information.
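The narrowing in this example, where each added term shrinks the article list, can be pictured as set intersection over an index of which articles mention which term. The sketch below uses a tiny made-up index with invented article numbers; it is not Mastermind's data or implementation, only an illustration of how the and/or operator changes the result.

```python
# Toy index: term -> set of article IDs that mention it (all values invented).
INDEX = {
    "ANKRD26": {1, 2, 3, 4, 5, 6},
    "RET": {2, 3, 4, 5, 6, 7},
    "fusion": {3, 4, 5, 8},
    "selpercatinib": {4, 5, 9},
}


def search(terms, operator="and"):
    """Return the article IDs matching all terms ('and') or any term ('or')."""
    sets = [INDEX.get(term, set()) for term in terms]
    if not sets:
        return set()
    if operator == "and":
        return set.intersection(*sets)
    if operator == "or":
        return set.union(*sets)
    raise ValueError("operator must be 'and' or 'or'")


# Sensitivity first, then specificity: every added term narrows the 'and' search.
step1 = search(["ANKRD26", "RET"])
step2 = search(["ANKRD26", "RET", "fusion"])
step3 = search(["ANKRD26", "RET", "fusion", "selpercatinib"])
print(len(step1), len(step2), len(step3))
```

Switching the operator to “or” for the two genes would instead return every article that mentions either partner, which is why “and” is the right choice for a fusion.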
I’m going to go ahead and navigate back to the original home search page. I’m going to do another example, but this time, I’m going to show how to find therapy resistance information for a missense variant. I’m going to add in EGFR as our gene, and our variant of interest is L792P. I’m gonna go ahead and launch that search. Fortunately, we don’t have 240 articles to start with, but we do have 26 right here. Using a similar strategy as the fusion example, I’m going to fold in some categories to increase the specificity of the search. I’m going to navigate back over to the filter categories tab, go over to clinical significance, go to the therapy tab here, and I’m going to add in “drug resistance” and “resistance.” I’m going to submit with chosen filters. Just to cover what we’re doing in Mastermind, Mastermind is searching for the gene and variant L792P, and the two filter categories which address resistance. We go from 26 to 16 articles, and we do see some acquired resistance.
We see EGFR inhibitors, that makes sense, and we also see a drug a few times in these articles: osimertinib, which is a third generation EGFR inhibitor. I’m going to go ahead and add that to the search. Now, Mastermind is looking for the gene and variant, the therapy, as well as the resistance keywords. We have 10 articles that talk about resistance related to L792P and osimertinib. I’m now going to pass it over to Britt.
BRITTNEE: Yeah, thanks! So we had actually gotten a question that came in from the audience, asking about annotating entire variant files, so doing some of what we’ve been discussing, but in a much higher throughput manner. Yes, we have the ability to do that. I wanted to go ahead and start here from the Mastermind home page, so that folks know how to navigate to this information. Right up here in the toolbar, you’ll see this little API link. What that’s going to do is take us across to the API site that we have. This contains information about how to automate. Here in the top, obviously, there’s just some documentation, but I’m going to direct us all the way down to “file annotations counts,” because that’s where you can find the information about uploading a VCF file. This is something that can be done by engaging with our sales team, and we’ll get you a key. That key will allow you to upload files to us. What you’re going to do is push us an entire VCF. It can already be annotated with information. The system is going to drop two additional tags in there. One contains the number of articles found for any single variant. For the instructions, again, we’d go back to the documentation. When you upload that file, you’re going to tell us what build that is. We will need that information from you. Then, we’re going to give you back the number of articles found for any single variant, as well as a URL to go view those.
That allows you to click through. You can put that somewhere in your system, and that can enable a click through. That way, you don’t have to do all the typing that Denise and Dan have covered into the search bar. That’ll already be pre-loaded. I’ll just jump back over to Mastermind here. Anytime that you’re actually clicking on variants or doing any of the types of searches that we’ve been talking about today, what I’m doing is making a URL. I know Denise kind of covered this too, with the help section. This URL is specific to what I’ve done. You can see here, I actually clicked on the example from the home page. That’s going to have all of those different filters. It’s going to have any of that therapy information that Dan covered. All of that’s going to be here in this search bar, so what we’re doing is just pre-configuring that URL.
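As a rough sketch of what consuming such an annotated file could look like, the snippet below reads the INFO column of one VCF data line and pulls out an article count and a link. The tag names `ARTICLE_COUNT` and `ARTICLE_URL`, the example URL, and the sample line are all placeholders invented for illustration; the real tag names are given in the API documentation.

```python
def parse_info(info_field: str) -> dict:
    """Parse a VCF INFO column ('KEY=VALUE;FLAG;...') into a dict."""
    info = {}
    for entry in info_field.split(";"):
        if not entry or entry == ".":
            continue
        # Split on the first '=' only, so URLs containing '=' stay intact.
        key, sep, value = entry.partition("=")
        info[key] = value if sep else True
    return info


def article_summary(vcf_line, count_tag="ARTICLE_COUNT", url_tag="ARTICLE_URL"):
    """Summarize one tab-separated VCF data line (tag names are hypothetical)."""
    fields = vcf_line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt = fields[:5]
    info = parse_info(fields[7])
    return {
        "variant": f"{chrom}:{pos}{ref}>{alt}",
        "articles": int(info.get(count_tag, 0)),
        "url": info.get(url_tag),
    }


# Invented example line: CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO.
line = "7\t12345\t.\tC\tT\t.\t.\tDP=100;ARTICLE_COUNT=12;ARTICLE_URL=https://example.org/articles?id=abc"
print(article_summary(line))
```

A pipeline could store the URL next to each variant so that a reviewer clicks straight through instead of retyping the search.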
This covers the entire files. We have individualized endpoints for some of that higher-level information. Dan was asking about therapies, so we have a therapies endpoint. If I want therapies back for inserting my two fusion partners, we can automate anything that’s been done, but the question came in about the file annotations endpoint, or our VCF upload. I went ahead and started with that one, but again, if this is something that anybody wants to try, then please just engage with our sales organization. We’ll work to get you a key for this. You can even pilot that, so send some examples! We’ll walk you through exactly how to make this work for your workflow, because again, there’s a lot of information, and we just want to make sure we tailor it right for you.
With that, I think I’m gonna turn it back over, and we’re actually going to start the Q&A here. I’ll stop sharing my screen, as I know that we’ve had some questions coming in, I’ve been able to see as we’re going, and so we’ll keep going there.
KATE: Awesome, thank you, Britt! I’d like to welcome back our speakers so that we can start with some questions that we’ve received. Just as a quick reminder, if you haven’t created a Mastermind account yet, you can do so by using this bit.ly link that you’re seeing here on the screen. Also, feel free to reach out to us at any time with any questions, or if you need a demo for you and the team, we’re always available at email@example.com. Great information, everybody. Let’s get to some questions! Let’s see. “When I search for an intronic variant, why is the ClinVar integration displaying a different variant?” Who wants to take that one?
DENISE: I got this one. I’m going to share my screen here so I can actually show what’s going on. I’m going to drop in a search example here of an intronic variant to show what it is that’s happening right now. I’m gonna paste this in, launch my search. Here, up in the ClinVar section, we have the display here for the variant. This one, actually, is matching my change. That’s kind of answering the question already. You want to look at how you have entered the variant in the search bar, because if you enter it as a c dot, like I’ve done, there should be a match there, but if you notice, and I’m going to retype the variant so that you can see this one more time, when I type in that variant, it actually recognizes it two different ways: one, exactly how I’ve entered it in the system, and then this other way, where we’ve designated it as F244sd.
What we’re doing here is actually grouping all of the variants that are in the splice donor in this example. That’s what SD stands for. The reason we’re doing this is that our philosophy around here is sensitivity first. We don’t want you to miss anything. We want to show you all of the possible information that can help inform your decision. By grouping types of variants together, like variants in this splice donor, you have a really holistic view of all the variants that are possibly at those positions, which would be +1 and +2. If you’ve launched your search this way, where now I have SD in the search bar, you’re actually launching the search for a group of variants. It can be really easy for Mastermind to choose that route for you. Even if you have a cDNA change in mind, if you’ve launched the search this way, you’re essentially not declaring a specific change. You’re saying, I want to look at everything in that group. I might have in my head, +1 del, but what I see here in ClinVar is +1 G>A. You might be thinking, that’s not what I want.
What you want to do is click this drop down. Because you’ve searched for a group of variants, this drop down will show you every variant with a ClinVar record that’s in that splice donor. You’ll see things like +1 G>A, this is the one that I had previously searched. It is, in fact, there. Then, of course, other variants that are in ClinVar as well, within that splice donor. If you want to change that display, you just have to click the drop down, click the one that you want, and then this link gets updated so that it actually will take you to that ClinVar page for the +1 del variant. Just be cognizant of what’s in your search bar, and if it is the group or the bucket of that variant, just know that that probably explains why you’re seeing something different up here. There is the drop down, and if there’s a ClinVar record, you’ll see your specific variant there in the drop down.
This reminds me of a similar question that I’ll just cover right now, because I’m here and sharing. I’m going to go back and add my variant in as my c dot variant. I’m going to launch that, and the question that we get around intronic variants is, why am I seeing variants that don’t match the exact one that I searched for when I’m looking at the full-text matches? I searched +1 del, why am I seeing other things? It has to do, again, with that bucketing system that we use, that sensitivity first approach. If you are truly only interested in the exact specific nucleotide that you searched for, search for it as a c dot like I’ve done here, and then we’ll actually designate papers that have an exact match with this icon here. This crosshairs or target symbol indicates that, in this article, there is a nucleotide-specific match for the variant that’s in the search bar. If I click on this, in the full-text matches, I can expect to find my variant somewhere. If I’m scrolling through, I can see, so this one’s not a match. That’s not what I’m looking for, but this is. Sure enough, my variant is there, but we’re also pulling in matches for other variants at that same SD splice donor site.
We aim to be maximally sensitive, so that’s why you’re seeing it there, but we will always prioritize the articles that have this target crosshair symbol. Those will always float to the top. If you’re really only interested in that specific change, and don’t want to see anything else, you wouldn’t need to look at these other articles that don’t have that crosshair symbol, because you won’t expect to find the exact variant in there. What you will find in these other papers that don’t have that crosshair symbol are other variants in this splice donor. There’s that variant again, that +1 G>A, so that’s what you’ll expect to find when there’s not a target symbol. If you really are hyper-focused on your specific nucleotide, then look out for this target symbol.
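The ordering described here, with exact nucleotide matches floated to the top and the rest of the splice donor group kept below, can be sketched with a stable sort on a boolean flag. The field names and article labels are invented for this example and do not reflect Mastermind's actual data model or ranking.

```python
# Invented records: each article either has a nucleotide-specific match
# (the crosshair symbol) or only matches other variants in the same group.
articles = [
    {"id": "article-1", "exact_match": False},
    {"id": "article-2", "exact_match": True},
    {"id": "article-3", "exact_match": False},
    {"id": "article-4", "exact_match": True},
]

# sorted() is stable, so articles keep their original relative order
# within the exact-match group and within the group-only matches.
ranked = sorted(articles, key=lambda article: not article["exact_match"])

# Keep everything for sensitivity, or keep only exact matches for specificity.
exact_only = [article for article in ranked if article["exact_match"]]

print([article["id"] for article in ranked])
print([article["id"] for article in exact_only])
```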
KATE: All right, thanks so much, Denise! I like how you combined those two things, we were able to knock both of those out. Let’s see here. A question about supplemental data, got a feeling Dan would like to answer this one: “Can you help me find my variant in the supplemental data?”
DAN: Oh, I absolutely can! I am very happy to chat about this. I’m going to share my screen for a moment. In May of this year, we actually released some enhancements to the way we display supplemental matches in Mastermind. Before I get into how to find your supplemental match, and where those enhancements are, I just want to give a big shout out to our development team. These new enhancements and these new features would not be possible if it wasn’t for the incredible work that they do on a daily basis.
With that being said, I’m going to start off with a gene and variant search: NAA15, which is the gene of interest, and then K52X, which is the variant of interest. I’m gonna go ahead and launch this search. If we navigate over to the articles section here, on the left hand side, we see a few different symbols. These symbols represent different things. This white paper represents matches in the PubMed data, which is the abstract. This black paper represents matches in the full-text, and this paper clip is matches in the supplemental materials. If you get a paper with a paper clip, you know that we have matches in the supplemental materials. I’m gonna go ahead and click this paper here.
I’m going to scroll down, and it says right here NAA15 p.K52X has been found four times in the supplemental data for this article. Prior to our update in May, we only displayed this line right here when we found supplemental matches. Now what we do is we show the exact match in the supplemental file, the exact nomenclature that’s matched. Even though we searched this at the protein level, we still get the c dot nomenclature. We show how many times that nomenclature was matched. We show the supplemental file where that match was found, as well as the tab in the supplemental file.
Let’s use this first example. I’m going to copy this nomenclature, because that’s what was found. I’m going to click this little hyperlink here to launch the PMID. Once that loads, I’m going to click the full-text free and scroll down to supplemental materials. Just as a reminder, we show the exact supplemental file name. Just going to double check. Okay, this is in data set five, so I’m going to click supplemental materials, scroll down to data set five, and then in the search bar, I’m going to paste in the nomenclature, launch the search, and we see, on line 1796, we found our match. Now, without that navigation tool, you would have had to go through multiple supplemental files, as seen here… oh wow, about 20 of them, and then look for the c dot nomenclature. Luckily, we show where all of that information lives.
KATE: Awesome. Yeah, it’s a great new feature that we just launched, as you mentioned, and I want to echo that shout out to our development team. It was a lot of work, and we’re really excited to have this feature in Mastermind now. Another question that we’d like to address: “How should I decide if I should search by protein, cDNA, or position?”
BRITTNEE: Yeah, I can jump on that one. Our system treats each of those like a filter, so if you’re coming in, it really depends on the information you want. If you come into the system with genomic coordinates, we’re going to give you any transcript and any gene at that location. Should you specifically want that in one gene, you can also just add that gene. You can come in with a genomic coordinate, add a gene, and that will then give you any transcript that overlaps. You can then specify down, so this is, again, how we do sensitivity first. Then, you can apply specificity filters. It’s the exact same principle here, if I choose one cDNA nomenclature, I have then said I am only interested in transcripts where this is true. At c.127, it’s a C>A.
I can go one step even further, because that’s still depending on where that is in the protein, in different transcripts, depending on where that is for splice sites, that could still result in several protein changes, so several transcripts where that could still be true. If you’re really interested in just the one, you can then use the protein. So that’s really the most specific. Each of these is acting, basically, as a filter on the other. With genomic coordinates, it’s any gene, any transcript. With cDNA, any transcript where that cDNA is true. With protein, that one change. That can still be a couple of transcripts, but you can see how that’s a funnel, getting more and more specific. So the answer to that question is, what do you want? If you want the most specific, you’re going to go all the way down the funnel. If you want something more general, don’t. As Denise showed, the one time to take this into consideration also is intronic variants. Obviously, you’re coming in with cDNA, but again, you can then choose, do I want anything in that splice donor, or do I just want my one specific change? You can do one specific exact change.
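The funnel described above can be sketched as successive filters over every annotation at one genomic position. The gene names, transcript IDs, and changes below are invented placeholders, and the `funnel` helper is hypothetical; the point is only that each added level of nomenclature keeps a subset of what the previous level returned.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    gene: str
    transcript: str
    cdna: str
    protein: str


# Invented annotations for a single genomic coordinate.
AT_POSITION = [
    Annotation("GENE_A", "TX1", "c.127C>A", "p.Leu43Met"),
    Annotation("GENE_A", "TX2", "c.127C>A", "p.Leu43Ile"),
    Annotation("GENE_A", "TX3", "c.250C>A", "p.Gln84Lys"),
    Annotation("GENE_B", "TX4", "c.88C>A", "p.Pro30Thr"),
]


def funnel(annotations, gene=None, cdna=None, protein=None):
    """Apply each provided level as a filter; omitted levels stay unrestricted."""
    result = list(annotations)
    if gene is not None:
        result = [a for a in result if a.gene == gene]
    if cdna is not None:
        result = [a for a in result if a.cdna == cdna]
    if protein is not None:
        result = [a for a in result if a.protein == protein]
    return result


print(len(funnel(AT_POSITION)))                                   # coordinate only
print(len(funnel(AT_POSITION, gene="GENE_A")))                    # plus gene
print(len(funnel(AT_POSITION, gene="GENE_A", cdna="c.127C>A")))   # plus cDNA
print(len(funnel(AT_POSITION, gene="GENE_A", cdna="c.127C>A",
                 protein="p.Leu43Met")))                          # plus protein
```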
KATE: Okay, awesome. Britt, you can probably answer this one too, since I’ve got you. “I’ve seen links into Mastermind from other software. How are these created?” And this person says, “I use GeneCards as part of my normal workflow.”
BRITT: Yeah, so you’ll see this across several different tertiary analysis partners that we have, through other partnerships, things like GeneCards. What they’re doing is, actually, we provide them a file that they use to annotate in their system, depending on the partner. What they’re doing is saying, either at the variant level, or with GeneCards, at the gene level, they’re providing a link directly to Mastermind. They’re saying, there are x number of articles for a specific gene within Mastermind. Those files enable a very quick link across. It makes it super easy, so that you can come to us. The one caveat I’ll put in: for some of those, where we are providing that flat file, it’s virtually impossible to annotate every single potential variant that could equate to a frameshift. You can do a one base insertion, a two, a four, and all of those are going to result in a frameshift. If the author just says, I have a frameshift at position 123, unless they specify the cDNA change, we don’t know what that means. It’s impossible for us to get all of the variants into those files.
I will tell you, if you have a frameshift-type variant or copy number variant, something of this nature, come search us anyway. That’s my one thing. But those links are just provided to make it easier to get to Mastermind. You don’t have to fill in the nomenclature when you come across it. Often, the gene and variant are then included.
KATE: Okay, great. Next question. I love that we’re getting lots of questions coming, and thank you everybody for sending these in! As I mentioned before, if we don’t get to your question during today’s live talk, someone will be following up with you. I also dropped in our chat a web page that we have that lists out our integration partners, so you can see the other platforms that you can get Mastermind results through. Another question we have: “I have a nonsense variant in a gene and I’m wondering about other nonsense variants in the gene. How can I find that?”
DENISE: I can take this one. I’ll show that in the application here so that you can see how to get there. We should be seeing my Mastermind home screen here, so I’m going to search a nonsense variant that comes to mind. Here, I launch the search, and I’m just going to minimize some panes so we can focus on this section here. This is the variants table/list. It is searchable, and it’s sortable. When I search a variant, it’s getting highlighted within the table here. Here’s my R43X. What I can do, actually, if I was interested in all the other nonsense variants, I can just type in the letter X, because that’s how we would designate a nonsense variant. Now, this whole list is filtered to show only the nonsense variants.
I can take it one step further and click on this header right here, for sort by cDNA position. If I was interested in all the other nonsense variants, I’d probably want to start with the ones that are close to mine. By sorting it by cDNA position, I can go check out, okay, this one’s two residues away; this is the number of articles that we have for this variant. One additional article over here, and at position 52, we have two additional articles. This is a great way of navigating. This table is a great way to look for specific types of variants. I just showed how to look for the nonsense variants, but you can also search things like “int” for intronic variants, “sd” for splice donor, or “sa” for splice acceptor. You can also search “fs” for frameshift. We have a lot of these shorthands, like those buckets that I talked about before, where we’re grouping these types of variants. It’s not necessarily one change, but it could be multiple changes, multiple variants that fall into that category. You can look at the articles for all of those with one click.
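The filter-then-sort pattern Denise demonstrates in the variants table can be mimicked on a local list, for example one you have exported or typed up yourself. The sketch below is only an illustration under that assumption. The `ROWS` data, the article counts, and both helper functions are hypothetical and are not part of Mastermind.

```python
# Hypothetical rows: a variant label, its cDNA position, and an article count.
ROWS = [
    {"variant": "W52X", "cdna_pos": 154, "articles": 2},
    {"variant": "R43Q", "cdna_pos": 128, "articles": 5},
    {"variant": "R43X", "cdna_pos": 127, "articles": 7},
    {"variant": "L50fs", "cdna_pos": 148, "articles": 3},
    {"variant": "Q45X", "cdna_pos": 133, "articles": 1},
]


def filter_table(rows, query):
    """Keep rows whose variant label contains the query, ignoring case."""
    needle = query.lower()
    return [row for row in rows if needle in row["variant"].lower()]


def sort_by_cdna(rows):
    """Order rows by cDNA position so that nearby variants sit together."""
    return sorted(rows, key=lambda row: row["cdna_pos"])


# Typing "x" keeps the nonsense variants; sorting puts neighbours side by side.
nonsense = sort_by_cdna(filter_table(ROWS, "x"))
frameshifts = filter_table(ROWS, "fs")
```

A plain substring match is enough here because, in this toy labelling, only nonsense variants contain an "X" and only frameshifts contain "fs".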
BRITT: Denise, for follow up on that one, where can they find all those? Those are pretty specific to us, so if they don’t remember “sd” or whatever, how can they find that information?
DENISE: Yeah, great, so right here, “sra”: I haven’t talked about that yet, so what does that mean? You can go over here to help/FAQs. It’s going to take me to our FAQ page, and from here, I’ll be able to search. You can search within this document, specifically “sra”, because I don’t know what that is. It’ll take you to the variant nomenclature section, which is exactly where we explain what all of these things mean. SRA stands for “splice region acceptor.” There is a splice region on both the donor side and the acceptor side, and we designate what each one means. Then, here’s a helpful little graphic to explain how we would define these different regions. Splice acceptor, obviously, is going to be these two bases, and the intron is anything in between, but then the splice region is going to be three into the exon, and then positions three through eight. That is described here. One to three into the exon, three to eight into the intron. Those are all defined here in our FAQs.
I find this graphic to be especially helpful for wrapping your head around where a variant is. Because of this layout, a variant might belong to multiple groups. A variant right here, for example, would belong to the intronic bucket. It’s also intronic on the acceptor side, and it’s also in the splice region on the acceptor side. So just be aware that a variant in one position might classify under multiple buckets at the same time.
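To make the overlapping buckets concrete, here is a minimal Python sketch of the rules as they are described above: the two intronic bases next to the exon are the splice donor or acceptor, intronic positions three through eight plus the first three exonic bases are the splice region, and deeper intronic positions are plain intronic. The function name, its arguments, and the "srd" label (assumed by symmetry with "sra") are assumptions for illustration. The sketch also assumes the two canonical bases are not counted in the "int" bucket, which the FAQ graphic should be checked against.

```python
def splice_buckets(side, location, distance):
    """Return the set of shorthand buckets for a position near an exon boundary.

    side:     "donor" or "acceptor"
    location: "intron" or "exon"
    distance: number of bases from the exon/intron boundary, starting at 1
    """
    if side not in ("donor", "acceptor"):
        raise ValueError("side must be 'donor' or 'acceptor'")
    if location not in ("intron", "exon"):
        raise ValueError("location must be 'intron' or 'exon'")
    if distance < 1:
        raise ValueError("distance starts at 1")

    suffix = "d" if side == "donor" else "a"
    buckets = set()
    if location == "intron":
        if distance <= 2:
            # The two canonical splice-site bases.
            buckets.add("s" + suffix)
        else:
            # Assumption: "int" covers everything between the two-base sites.
            buckets.add("int")
            if distance <= 8:
                # Intronic part of the splice region: positions 3 through 8.
                buckets.add("sr" + suffix)
    elif distance <= 3:
        # Exonic part of the splice region: first three bases of the exon.
        buckets.add("sr" + suffix)
    return buckets


# A variant five bases into the intron on the acceptor side lands in two buckets.
example = splice_buckets("acceptor", "intron", 5)
```

The returned set makes the point from the graphic explicit: one position can belong to more than one bucket at the same time.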
KATE: Thanks for explaining that! Moving on to some questions from our audience. I think this would be a good one for Britt: “What database are you using to search for the articles? Is it PubMed, Google Scholar, or others?”
BRITT: PubMed. Our source of truth is PubMed. Once a PubMed identifier comes out, sometimes that is advanced publication articles. I say that because people are always shocked when they see a publication date in the system that is in the future. That is a thing. The minute that that information is indexed in PubMed, we’ll start pulling that through, so that does include a lot of articles from all sources. That usually yields somewhere around 10,000 new articles a week. Obviously, that’s in flux. Around Christmas, people publish less, so you’ll see some numbers go up and down, but that information comes out on a weekly cadence. Those new articles are indexed through PubMed, and then we push those out to the system once a week.
KATE: Okay, great. We have a question from Natalie. Hi, Natalie! We’re happy you’re here. “Can you search for fusions containing one specified gene, and any gene partner? For example, searching for fusions containing PIK3CA as a C-terminal partner with any gene as the N-terminal partner?”
DAN: I can go ahead and take that. You can use the same technique that I showed for the fusion example: input one gene into the search bar and then use the filter categories to select gene. If you’re looking for specific information, such as C-terminal partner or N-terminal partner, we don’t have those categories in Mastermind, but you can go ahead and use our text function search to look for that specific information. You can do pretty much any fusion search, for either one gene, or a gene and a gene partner, using those filter categories.
KATE: Okay, great. Next question: “Can you search by genomic coordinate?” I’m not sure if we covered that yet.
BRITT: Yeah, I think that there’s a couple ways to take in that question and answer it. Denise covered, can you search for genomic coordinate, for, say, deletion CNVs? Yes, absolutely. I spoke to the fact that, yes, you can search by genomic coordinate if you want to search for a singular change based on genomic information. For both of those, as Denise had highlighted, you can find more information in the FAQ. That will give you the exact formatting of any of that type of information. Now, the other one that you can search by (and I think the answer was covered in sort of a roundabout way, so we can just sum it up here) is, can I search by genomic coordinate for any change? The answer is yes. What you can do is insert a singular genomic coordinate change, and as Denise showed you, you can search that table. What we’re going to show you is every other variant at that same position. She covered how to search that table by variant type. You can also just put in a number. If I searched by c.127C>T, I can go down into that search bar and say, hey, I’m looking for any other changes at c.127, and that will get you to basically search by location.
There’s also a way, I will say, to do this at scale. We have scripts to do this with variant files in the API. User interfaces are more of a one shot, one at a time. I’m looking for one position, and then I want to see everything else in that position. With the API, you can give us a file and say, okay, I’m looking for genomic changes, and that does come in as coordinates, because, again, VCF files are genomic coordinates. You can give us that file, and there is a script that then says, now give me counts and links for any change at this location. That one will be a bit more expansive, and can also be automated.
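As a rough idea of what the file-driven workflow involves on the user's side, the Python sketch below reads VCF text, collects each distinct chromosome and position, and hands them to a lookup function. It does not call the Mastermind API. The `lookup` argument is a stand-in for whatever client or script you actually use, and the sample records are invented.

```python
import io


def read_vcf_positions(handle):
    """Return the distinct (chromosome, position) pairs in a VCF, in file order."""
    seen = set()
    ordered = []
    for line in handle:
        # Skip blank lines, "##" meta lines, and the "#CHROM" header line.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        key = (fields[0], int(fields[1]))  # CHROM and POS columns
        if key not in seen:
            seen.add(key)
            ordered.append(key)
    return ordered


def annotate(positions, lookup):
    """Run a caller-supplied lookup once per location."""
    return {f"{chrom}:{pos}": lookup(chrom, pos) for chrom, pos in positions}


def placeholder_lookup(chrom, pos):
    """Stand-in only: a real workflow would query a literature service here."""
    return f"any change at {chrom}:{pos}"


# Invented records: two different changes at one position, plus an insertion.
SAMPLE_VCF = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr1\t1000\t.\tC\tA\t.\tPASS\t.\n"
    "chr1\t1000\t.\tC\tT\t.\tPASS\t.\n"
    "chr7\t2500\t.\tG\tGA\t.\tPASS\t.\n"
)

positions = read_vcf_positions(io.StringIO(SAMPLE_VCF))
results = annotate(positions, placeholder_lookup)
```

Deduplicating by position rather than by exact change matches the "any change at this location" question, so two alleles at the same site trigger one lookup.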
KATE: Great. Someone had just asked what other automation potential exists in the API, so I’m glad we covered that. Good timing. Is there any other automation potential we would like to speak to?
BRITT: Just to expand on that, there’s a lot of other potential there! Please engage with us on that one. We do all kinds of work through our API, gene-disease relationships, adding phenotypes to VCFs, etc. There’s a lot of things that we aren’t covering. General answer is, if it’s doable with the user interface, it’s doable in a high throughput manner in the API.
KATE: Another question is, “Do you have performance characteristics for the API variant annotation function?”
BRITT: That’s hard. Yes, in that there are multiple ways to query everything, and we can talk through a specific workflow. Yes, we have performance characteristics around things like uptime, how quickly you can query, how quickly we can get you back information, things of that nature. That’s more the performance side, but it really comes down to how you ask a question. That same specificity, where I said you can come in with genomic coordinates and you’re going to get everything, or you can come in with cDNA, all of that’s true in the API. We’d need to walk through a specific workflow to really understand, and to give the correct answer back for that.
KATE: Okay, great. Again, contact us if you have any questions about our API, because there is a lot we can do for you. Michael wants to know, “Is there a way to search for small indels in a gene?”
DENISE: I can take this one. Yes, you absolutely can. One example that just pops into my mind is the CFTR example. You could search that with the c-dot nomenclature, so that’s just HGVS nomenclature: c.1521_1523del. That kind of structure is absolutely acceptable and recognized in Mastermind.
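For readers less familiar with how a c-dot deletion is put together, the short Python sketch below picks apart the start position, end position, and length of such a string. It handles only simple coding-sequence deletions written as `c.START_ENDdel` or `c.POSdel`, optionally followed by the deleted bases. The function name and regular expression are illustrative assumptions and say nothing about how Mastermind parses queries.

```python
import re

# Simple coding-sequence deletions only, e.g. c.1521_1523del or c.35delG.
_DELETION = re.compile(r"^c\.(\d+)(?:_(\d+))?del([ACGT]*)$")


def parse_cdna_deletion(hgvs):
    """Return (start, end, length) for a simple c-dot deletion string."""
    match = _DELETION.match(hgvs.replace(" ", ""))  # tolerate stray spaces
    if match is None:
        raise ValueError(f"not a simple c-dot deletion: {hgvs!r}")
    start = int(match.group(1))
    end = int(match.group(2)) if match.group(2) else start
    if end < start:
        raise ValueError("end position is before start position")
    return start, end, end - start + 1


cftr_example = parse_cdna_deletion("c.1521_1523del")
```

A three-base length, as in this example, is an in-frame deletion, whereas a length that is not a multiple of three would shift the reading frame.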
KATE: All right, thanks! “Why would a paper not show up in a search, or be missed for any alterations within that paper?”
BRITT: I can go ahead and try to cover that. It, again, comes down to how you searched, but if you have a specific example, please let us know. I would absolutely recommend that you contact us at the top of the Mastermind page, or just email us at firstname.lastname@example.org, and we can look into it. I will say that one of the most common reasons for this is that the nomenclature is slightly off. Oftentimes, there is a typo in a PDF, or something else happened that kept us from picking it up. We do have mechanisms in our system to go ahead and update that type of information, so we can work with you to make sure that we index it.
I will expand on this. Another common question we get is, “do you take in historical nomenclature?” The answer is yes, but in case you think we have missed something: people were leaving out “Exon 1” 10 years ago, and that means there’s a 40 base shift. If you think we don’t have that 40 base shift in our system because you’re not seeing it, please let us know. We have the ability to do that in a gene-specific manner, so we can say, okay, we know in one gene, there’s a 40 base shift, or an author left out the first nucleotide or didn’t count it. We can absolutely take into account that type of information. So, “it depends” is the answer, and we’d love to hear or see your examples, because we’ll make sure to correct that in our pipeline.
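The gene-specific shift can be pictured as a small lookup of offsets that is applied before matching. The Python sketch below is a simplified illustration, not Genomenon's pipeline. The gene name, the 40-base offset table, and the function are hypothetical, and only simple substitutions such as `c.87C>T` are handled.

```python
import re

# Hypothetical table: bases to add to legacy positions, per gene.
LEGACY_OFFSETS = {"GENE1": 40}

# Simple substitutions only, e.g. c.87C>T.
_SUBSTITUTION = re.compile(r"^c\.(\d+)([ACGT]>[ACGT])$")


def to_current_numbering(gene, legacy_cdna):
    """Shift a legacy cDNA position by the gene's known offset, if any."""
    match = _SUBSTITUTION.match(legacy_cdna)
    if match is None:
        raise ValueError(f"unsupported variant format: {legacy_cdna!r}")
    offset = LEGACY_OFFSETS.get(gene, 0)  # genes without an entry are unchanged
    return f"c.{int(match.group(1)) + offset}{match.group(2)}"


# A paper that skipped the first 40 coding bases reports c.87C>T for GENE1.
shifted = to_current_numbering("GENE1", "c.87C>T")
unchanged = to_current_numbering("GENE2", "c.87C>T")
```

Keeping the offsets in a per-gene table reflects the point made above: the correction is known gene by gene, so genes without an entry pass through untouched.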
KATE: Okay. Here is a question: “Can we use wild characters to search for a related variant, like c.1234?a^g ?” We might not know the answer to that one, so we may have to follow up on it.
BRITT: Well, it’s the same idea as what I’d said before: you can search one variant and then use the table to look at any other variant.
KATE: This is a question we do get quite often: “Is the curation done by deep learning/NLP methods, or is there a human touch professional validating and signing off on the variants from the papers?”
BRITT: I can go ahead and cover that. Yes. Both. In the system, what we’ve been highlighting today is really the search capabilities around our genomics language processing. That genomics language processing understands genetics, and then searches all articles. Like I said, we understand gene-wide shifts. We understand that type of information. As our sales team likes to say, it’s basically like Google Scholar went and got a PhD in genetics. That type of automation is an automated return. It’s like an AI-type return.
Now, we also have curated data. We didn’t highlight it in this one, but we’ve done other webinars. If you have any questions about this, please feel free to ask. We have curated data. Those are collected by trained professionals, clinical genomic scientists, that are going through. Mastermind returns to them, saying, this article contains your variant, and talks about functional changes, or talks about fusions, or whatever it is that Dan and Denise had shown. Then, that person goes through, reads those sentence fragments, reads that part of the article, and says, yes, I agree. This is in fact a good functional article that had all the proper controls. They will then tag that article as a functional change article for the criteria for ACMG.
If you’re seeing curated data, where there is a provisional call in the system and it comes up in that top header toolbar, if you see that, those are put out by trained professionals. We have an amazing curation team who does this at a rate that I have never seen. Their speed is incredible, that they are turning this around. If you see that, those are people, those are trained professionals. If you have any questions about those, or want to give us feedback, there’s a way to do that, but if it’s just the system return you’re seeing, sentence fragments, that is the AI, the genomics language processing.
KATE: Okay, I have one more question about the API since I’ve got you here. “Is identifying where the matched term is, identifying which supplementary file it’s in, available for the API integration?”
BRITT: Fantastic question. Not quite yet! That is something on our roadmap. The way we tend to work is, we make things available in the UI, test it around a little bit, and then make it available in the API. The API can act a little bit like floodgates. We want to make sure that everything is functioning before we make that available in the API, but that is on our roadmap, and a fantastic idea. We totally agree.
KATE: Okay, great. I knew you’d have the right answer for that one. We’re just going to wrap up some things here today. I want to thank Britt and Dan and Denise for all of your help and answering all the questions. Thank you, everyone, for asking questions and joining us and being here. Just as a reminder, this will be recorded, and I’ll be sending this in an email to all of you later on today. We’d like you to save the date for our next in-person user group meeting, which will be at ASHG this coming November in Washington, DC. Also next month, our next webinar will be July 13th, with two very special guests from Intermountain Health. Stay tuned for more information about that webinar. Again, thank you so much for being here! If you have any questions, please feel free to reach out to us. After this webinar ends, you’ll receive a prompt to fill out a short survey. We really appreciate all the feedback that you share with us, so again, enjoy the rest of your day. Thanks for being here!