SEARCH COMPANION FOR CHROME | THURSDAY, MARCH 31, 2022
Mastermind Search Companion is a Google Chrome extension that seamlessly integrates Mastermind with 16 traditional online resources, including Google Scholar, ClinVar, HGMD, and COSMIC. By automatically displaying results in Mastermind in response to search queries on these platforms, Search Companion delivers all the data you need—so you don’t miss a thing.
In this Masterclass, Genomenon’s chief technology officer Steve Schwartz and director of customer success Dr. Brittnee Jones will demonstrate the functionality of Search Companion and the benefits it provides for anyone wishing to take genomic variant research and interpretation to the next level.
You will learn how to:
- View the most comprehensive results in Mastermind on any of 16 platforms
- Quickly explore articles in greater depth with one-click access to Mastermind
- Access Disease-Specific Curated Content to inform patient diagnosis and treatment
Steve Schwartz
Founder & Chief Technology Officer
Brittnee Jones, PhD
Director of Customer Success, Genomenon
With over a decade building and leading customer success teams across the NGS space, Brittnee ensures rapid product adoption and maximal value for Mastermind users.
GARRETT: Hello everyone, and welcome to the Mastermind masterclass on using Search Companion for Chrome! My name is Garrett Sheets and I’ll be your host. Search Companion is a Google Chrome extension that integrates Mastermind with 16 traditional online variant research tools, including Google Scholar, ClinVar, HGMD, and COSMIC. Today, Genomenon’s Chief Technology Officer and our Director of Customer Success will demonstrate the unique functionality of Search Companion and the benefits it provides for anyone wishing to accelerate their workflow. They’ll also show you how to find and install this extension through the Chrome web store. Keep in mind that today’s webinar will cover Professional edition features. If you don’t already have a Mastermind account, you can create one today at the bit.ly link you see on the screen and start with a free trial of Mastermind Pro.
We have a lot of great information to share, so I’ll get right to housekeeping and then into our introductions. If you’re joining us live today, feel free to drop your questions into the Q&A, and if we have time we’ll get to those at the end of the presentation. Also, today’s discussion is being recorded and will be emailed to you after the event. You’ll also see in the go-to-webinar interface an area for handouts containing two documents. One is an informational user guide and the other is a Mastermind data sheet. If you’re watching this after the event on our Webinars page, these resources are right below this recording. Now, without further ado, I’ll introduce our speakers. Today, we’re joined by Genomenon’s Chief Technology Officer, Steve Schwartz. Hi, Steve!
STEVE: Hi, how’s it going?
GARRETT: Good. We also have Dr. Brittnee Jones, who is Genomenon’s director of customer success. Hi, Britt!
BRITTNEE: Hi, Garrett, thanks for the intro!
GARRETT: Thank you both for being here to talk about Search Companion and to share your expertise. Steve is going to get us started with a general overview of Search Companion, and then Britt will join the conversation. Let’s get started!
STEVE: Alright. Mastermind Search Companion is a browser extension for the Google Chrome browser. If you’re unaware, a browser extension is essentially a plugin that you install once into your browser, and then it stays there doing whatever the extension does. In our case, the Search Companion extension works with Mastermind data to augment your searches for genes or variants in other websites. You install the Mastermind Search Companion extension into your Chrome browser, and then when you search on other websites, it will augment your search with useful gene and variant information in a small sliding drawer, a tab that shows up on the right hand side of your screen. You can click it to slide it out to show more information.
The purpose of Search Companion is to speed up your existing workflow without having to change it, meaning you can continue searching the sites that you already search for gene and variant information without having to intentionally go to yet another website to add more information to your search. It brings the information to you. You can also quickly compare Mastermind’s search results, drawn from the primary evidence of the medical literature, with the results from wherever you’re searching on other sites or platforms.
Right now, the Search Companion extension works across 16 different sites which you can see here, a few of which Brittnee will demonstrate for us in a moment. Like I said, it shows a tab on the right hand side of your screen, which you can see in the screenshot here on the bottom left, and you’ll also see in the demos. You can click it and it brings out more information in a sliding drawer, which you can see on the next slide. We bring useful information about the gene and variant. What you’ll see in the sliding drawer is a protein diagram for the gene that you search for, showing the prevalence of specific variant citations across the protein domain from the medical literature.
In addition to doing that for all of the sites which Search Companion currently supports, when you specifically search in Google Scholar for literature evidence, the Search Companion also augments the individual article results in the Google Scholar results page with a Mastermind badge for that article in Mastermind. If you click the badge, it will actually open that article in Mastermind or that article information in Mastermind with the gene and variant that you searched highlighted in the results for that article. This can be especially helpful. Again, it uses the power of the medical literature that Mastermind has found for these genes and variants in high sensitivity mode using our artificial intelligence engine and our genomics language processing to find those genes and variants in the literature, which can be especially helpful for rare variants, where you’re often looking for a needle in a haystack. That one article in Mastermind could end up being the difference between being able to make a diagnosis or not on the clinical side, or find useful evidence for other use cases as well. With that, I will turn it over to Britt to show some demonstrations of Search Companion with some real world use cases.
BRITTNEE: Alright. First, I wanted to go through how to get Search Companion. You should be seeing here, I just logged in and went to our website, Genomenon.com. I find the easiest way to grab this is the second little drop down here under Clinical. It says Mastermind Search Companion. When I click on that, you’ll see a little video, a walk through of the functionality of this. Most importantly for what I’m talking about now, you’ll see a big “download now” button right here. When I go ahead and go to that “download now” it takes me over to the Chrome web store. Obviously, I already have this, so it’s asking me to remove from Chrome, but you’re going to get a prompt to add to Chrome here. When you do that, you’ll actually get this addition here up in the toolbar. If I click on that, this is what Steve was talking about: we have indexed 16 other sites, you can see those here. Once that’s installed, that’s when you can start using that, as Steve mentioned, to augment your searches through your normal stream of workflow, if you’re on any of the other sites that we index.
I do have some examples that we’re going to go through today. First, I just wanted to start with going to Google Scholar. We find that’s a very common search tool that people use. What you’ll notice here when I went to Google Scholar is the pop-up of this little side tool bar. Obviously, I haven’t launched a search, so it’s just sitting gray right now. When I do finally launch a search, and you can see I’ve done this before, I just simply type my gene and my variant into that search, and then I get these articles returned. So, Google Scholar, for this singular gene and variant, returned six results. Mastermind itself actually returned 13. I can see both here with the little 13, as well as here in the toolbar. When I click on this, I actually get more information. This is the variant diagram that, if you’re familiar with Mastermind (and we’ll go through a walkthrough of that in just a second) you’ll actually see in Mastermind, as well as a very quick link to say, “hey, let me see that evidence.”
One thing to note here: What Steve mentioned earlier is that you’ll actually see the tiny little Mastermind logo next to all of the articles that are returned, telling you we are also indexing this article. With Google Scholar, you do get a limited view, a little bit of the sentence or a little bit of the article where we’re finding a match. With Mastermind, if I were to click this, it will take me directly to the article. I can go ahead and do that in a second. It’ll take me directly to review this article and the additional sentence fragments that I get through Mastermind. So again, there’s several ways to jump across. It’s this view evidence button. I can actually just click on that little thing, or if I want one specific article, I can go there from here.
I’m going to jump to Mastermind with this little view evidence button. What that’s going to do is launch that search, returning those 13 articles that it showed me in the sidebar in Search Companion. Brief overview here for those folks that may not be as familiar with Mastermind, or haven’t used it as much: what we’ve done is pre-launched a search using my gene and my variant that I had searched on Google Scholar. This was that same variant diagram you saw in Search Companion, and this is returning information about every single variant in the gene that I searched, which was MPL here. This really highlights hot spot areas of publications. This is where there are lots of articles that have been published about this 505 or 515 here. That’s not surprising for MPL, but this gives you information about all variants within this gene. Down here at the bottom, we also show some domain information. I forgot to say, position is on the x-axis here, and citations per variant, on a log scale, are on the y-axis. This little light bar that you can see, and I’ll highlight it right there, is actually the variant that we searched.
All of that information is then available in a table format down here. This is sortable; I just clicked on that little sort button there. This is where, let’s say there was very little information about a specific variant you searched. You’re actually able to look at other changes at the same position, and that can help inform your classification of this variant, even if there’s not necessarily a ton of information about the variant itself. This is also searchable. Just to show you a quick demo of that, so I can say, again, maybe there wasn’t a lot of information about my variant, so I want to say, are there other nonsense-type or termination codons near the variant that I searched?
On the top right now, this is a Mastermind-specific relevance score diagram. The size of the bubble here speaks to the predicted relevance, or how likely is this article relevant to your search. Every bubble there is one of those 13 articles that was returned. The bigger the size, the bigger the bubble, the higher likelihood that it’s relevant to your search based on your gene and variant, or any information you put into this search. In this case, it’s looking for my gene, my variant, how often are they mentioned, where in the article are they mentioned, if they’re in the title and the abstract, they’re likely more relevant to my search. The axes themselves on the diagram are just the year of publication on the x-axis and then the impact factor here on the y-axis. When I click on one of these articles, and you’ll notice our first return here is a very recent 2021 September article, the first likely relevant article here. This is because this is sorted by relevance for Google Scholar. The most likely relevant article here is actually an article from, I remember, it was, oh, there it is, 2006. So Mastermind was able to find a much more recent article in 2021. When we return an article, we actually give you these sentence fragments down here. You can see there’s a fair amount of the article there that’s being displayed to give me information about that. In this case, the authors are very clearly here stating that they believe this is a disease-causing variant, so this may be something interesting for me to look at more. If I wanted to I could jump out and look at the full article there via that PMID link.
So, Steve, I just wanted to bring this here. I mean, you helped design the relevancy score, so what else would you like to add about that? I did a little brief synopsis but I know there’s a lot more components to that score.
STEVE: Yeah, absolutely. We’ve done a masterclass, I believe, where we’ve talked a bit more in depth about relevancy. We also have a blog post on the Genomenon blog if anyone’s interested in looking more into it. From a high level, I’ll give a brief description: essentially, we score articles based on relevancy across two primary components, which I often refer to as intrinsic versus extrinsic relevancy. The first component, intrinsic relevancy, what I mean is intrinsic to the article. Intrinsic properties of the article meaning looking at things like, how recently was the article published? If it’s more recent, then it tends to be more relevant to your use case, of having less likelihood that you might know about the article already, a higher likelihood of providing you new information about whatever it is you’re searching. We look at things like how recently the article was published, the impact factor of the journal in which the article was published, things like how often the article is cited and other things like that, that again are just intrinsic to the article, irrespective of the search that you’ve done. Then, the second component, by extrinsic relevancy, I mean extrinsic to the article, or really how closely that article matches what you search. In that case, we’re looking at things like, are the gene and variant that you searched for in the title of the article, or the abstract, or the full text, or just the supplemental data? Are they mentioned closely together with diseases or other types of clinically relevant keywords? Things along those lines. It’s really, how relevant is the article to the search terms that you used? 
Then, we take those two components of relevancy and combine them into a singular relevancy score that we use to prioritize the articles with the most relevant articles at the top, the idea being that if we have the most relevant articles at the top, then that translates to less time you need to spend really going through each and every article. If you do a search and there’s only two articles that have ever cited the variant that you search for, then the relative relevancy between the two articles is less important. But if there’s 20 articles or more, then it becomes really beneficial to have those sorted by relevancy. That’s specific to genomics. So again, this is a difference that you would get between Mastermind and something that’s more general purpose, like Google Scholar, is Mastermind understands the genomic concepts and components of what the articles are talking about, and uses that in its relevancy calculation.
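As a rough illustration of the two-component idea Steve describes, the sketch below combines an intrinsic score (recency, journal impact factor, citation count) with an extrinsic score (where the match occurs and whether it sits near a disease term) into one number used for sorting. Every weight, cap, and field name here is a hypothetical placeholder chosen for the example; the transcript does not state how Mastermind actually weights or combines these signals.

```python
from dataclasses import dataclass


@dataclass
class Article:
    year: int
    impact_factor: float
    citation_count: int
    match_location: str       # "title", "abstract", "full_text", or "supplement"
    near_disease_term: bool   # searched variant appears close to a disease keyword


# Hypothetical weights: a match in the title counts more than one in the supplement.
LOCATION_WEIGHT = {"title": 1.0, "abstract": 0.8, "full_text": 0.5, "supplement": 0.2}


def intrinsic_score(a: Article, current_year: int = 2022) -> float:
    """Properties of the article itself, independent of the search."""
    recency = max(0.0, 1.0 - (current_year - a.year) / 30)
    impact = min(a.impact_factor / 20, 1.0)
    cited = min(a.citation_count / 100, 1.0)
    return (recency + impact + cited) / 3


def extrinsic_score(a: Article) -> float:
    """How closely the article matches the searched gene and variant."""
    bonus = 0.2 if a.near_disease_term else 0.0
    return min(LOCATION_WEIGHT[a.match_location] + bonus, 1.0)


def relevance(a: Article) -> float:
    # Assumed 40/60 blend of the two components, for illustration only.
    return 0.4 * intrinsic_score(a) + 0.6 * extrinsic_score(a)


def rank(articles: list[Article]) -> list[Article]:
    """Most relevant article first."""
    return sorted(articles, key=relevance, reverse=True)
```

With only two hits the ordering hardly matters, as Steve notes; the sort starts to pay off once a variant has twenty or more citing articles.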
BRITTNEE: Great, thank you, Steve! Okay, so that’s just sort of a brief overview, but it also shows you the power. For Google Scholar, we only had six of the results. Mastermind was actually able to pick up not just 13 articles, but based on that relevancy, prioritize a new article where they were even talking about that being a disease-causing variant. Google Scholar is one of the sites we index. The next one, I’m actually jumping over here to the next tab, where it’s a dbSNP. Again, you can tell we do this. It’s got that gray thing right there. In this case, I’m going to be launching, because I have an rsID from one of my assays and some of my analysis.
Here, I’ve launched that via the rsID. I don’t see any types of information here. With literature, often there’s a LitVar link if I do have any there. However, Mastermind was able to pick up a single article about this variant. When I jump across and I again click into Mastermind, what I’m going to see here is that single article. It’s pretty new, it’s from 2020 here, and I’m picking up that rsID. I’m also picking up another nomenclature for this variant, which is what I’m going to have Steve go through, some of the nomenclatures that we talk about. What you’re seeing here is that we search by rsID, but I’m actually returning information at the gene plus the protein level that results from that rsID. This is what Steve was just mentioning, and I’ll have him expand on: not only do we recognize all these different nomenclatures, but we actually pull all of those into one sensitive comprehensive search. Steve, can you talk a little bit more about all of the different types of nomenclature and the different ways in which we recognize variants, and how we are genomics-smart, if you will?
STEVE: Yes, absolutely! I mentioned earlier, our genomics language processing engine that we use to find genes, variants, and many other genomic concepts in the literature, including copy number variants, diseases, phenotypes, therapies, or drugs, and then other genomics-related categorical keywords and categorizations of articles. With variants specifically, as you know, there are many different ways to describe a variant. One of the hardest things about finding variant evidence in the medical literature is when you do a search, especially on something general purpose like Google Scholar, it can actually be even more of a challenge for other sites that do manual curation of literature. Being able to find the articles in the first place that are relevant to a given variant is difficult, because you’re trying to guess, when you do a search, whether it’s through Google Scholar or whether it’s manual curators doing the search, you have to guess all of the different ways that an author could have described that variant in a paper, if they did describe the variant in the paper.
There’s the ones that we often think about off the top of our heads, like the C dot (c.) nomenclature for the cDNA transcript description of a variant, or P dot (p.) nomenclature for the protein change or amino acid change that occurred with the variant. There’s the genomic coordinate description, using either GRCh37 or GRCh38, which obviously can differ. There’s also the rsID, which is just an identifier that was assigned from dbSNP if the variant was ever added to dbSNP. There’s more historical nomenclatures that aren’t used so much anymore, but again, when you’re looking for a rare variant, you need to know that older papers could have described the variant using something like IVS nomenclature, naming the variant according to the intron. If it was a non-coding or intronic variant, they could have used IVS nomenclature.
Even more important than knowing all of these different nomenclatures is also understanding all of the ways that authors can mess them up. You can get authors that describe variants as cDNA variants, but using the protein format, meaning instead of something like c.946 g to a, they say g946a, and they don’t say C dot. Even worse is that g946a could also be a protein description, since those are both single letter abbreviations for nucleotides as well as for amino acids. So that’s one of the things that Mastermind does and that our genomic language processing does, it analyzes something like this: c127t without any sort of C dot or P dot prefix or any mention of the transcript being used or anything like that.
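To make the ambiguity concrete, here is a minimal sketch that labels a variant string by nomenclature and reports every reading that remains possible when the prefix is missing. The regular expressions are simplified assumptions that cover only single-base and single-residue substitutions, and genomic coordinates are left out; this is not Mastermind's parser.

```python
import re

NUCLEOTIDES = set("ACGT")
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")  # one-letter amino acid codes

# Simplified patterns for explicitly prefixed forms (illustrative only).
PATTERNS = [
    ("cdna", re.compile(r"^c\.\d+[ACGT]>[ACGT]$", re.I)),
    ("protein", re.compile(r"^p\.[A-Z][a-z]{0,2}\d+[A-Z][a-z]{0,2}$")),
    ("rsid", re.compile(r"^rs\d+$", re.I)),
    ("ivs", re.compile(r"^IVS\d+[+-]\d+[ACGT]>[ACGT]$", re.I)),
]

# Bare "letter, number, letter" form with no c. or p. prefix, e.g. g946a.
BARE = re.compile(r"^([A-Za-z])(\d+)([A-Za-z])$")


def possible_readings(text: str) -> list[str]:
    """Return every nomenclature the string could plausibly belong to."""
    for name, pattern in PATTERNS:
        if pattern.match(text):
            return [name]
    m = BARE.match(text)
    if not m:
        return []
    ref, alt = m.group(1).upper(), m.group(3).upper()
    readings = []
    if ref in NUCLEOTIDES and alt in NUCLEOTIDES:
        readings.append("cdna")
    if ref in AMINO_ACIDS and alt in AMINO_ACIDS:
        readings.append("protein")
    return readings
```

Because A, C, G, and T are also one-letter codes for alanine, cysteine, glycine, and threonine, `possible_readings("g946a")` and `possible_readings("c127t")` both come back with two readings, which is exactly the situation that needs further evidence to resolve.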
One of the things that Mastermind does is it analyzes all of these variant nomenclatures and all of the different ways that a variant can be messed up. It will actually look these up across all of the transcripts for a gene, it looks at what genes the article is talking about, which gene the variant is likely in reference to, and then it looks across the transcripts for that gene and sees if that variant is valid in any of them. It starts to use this process of elimination, then it will end up with some candidate variants that this paper is describing. Then, it figures out which variant is the likely variant that the author is actually trying to describe. This is very difficult to do when you’re manually searching, whether it’s the end user manually searching or curator manually searching, or the end user being a curator who is manually searching.
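The process of elimination Steve outlines can be sketched as a check of the stated reference base against each transcript of the candidate gene. The transcript names and sequences below are invented toy data; a real check would use reference transcripts for the gene the article is discussing, and the transcript does not describe how Mastermind implements this step.

```python
# Toy cDNA sequences keyed by made-up transcript names (illustrative only).
TRANSCRIPTS = {
    "TX_A": "ATGGCCTTGA",
    "TX_B": "ATGGACTTGA",
}


def valid_transcripts(position: int, ref_base: str, transcripts=TRANSCRIPTS) -> list[str]:
    """Return the transcripts whose base at the 1-based position equals ref_base.

    A variant written as, say, C5T is only plausible as a cDNA change on a
    transcript that actually has a C at position 5.
    """
    hits = []
    for name, sequence in transcripts.items():
        in_range = 1 <= position <= len(sequence)
        if in_range and sequence[position - 1] == ref_base.upper():
            hits.append(name)
    return hits
```

An empty result for every transcript rules out the cDNA reading, leaving the other interpretations (for example, a protein change) as the remaining candidates.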
These are all very difficult to do, and even when you have a really simple, let’s say, missense or substitution variant, let’s say r43c or something like that (I’m just making that up, I don’t know if that’s even a likely substitution here), but even when you have something, let’s go with r43l, something really simple like that can actually end up being described in papers and articles in over a hundred different ways by the time you figure out all of the different ways that each of these nomenclatures can be messed up or mixed up or spun around. We’ve seen it all. We’ve even seen things like an rsID, where the first and last digit of the rsID were transposed in a seminal first published paper for that variant. Then there were ten more articles that cited that paper and propagated that typo of the rsID for that variant, so you ended up with ten different papers all referencing a variant by the wrong rsID. Those are the kinds of things that we then add to and build into our genomics language processing. To bring this back around, Search Companion is bringing the power of this genomic language processing into the searches that you’re already doing on other platforms or other websites to ensure that you don’t miss anything and to help speed up your search process.
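To give a feel for how quickly the spellings multiply, the sketch below enumerates a few written forms of a single protein substitution such as R43L. The prefixes, residue styles, and templates are a small hypothetical sample picked for the example; it makes no attempt to cover cDNA forms, genomic coordinates, rsIDs, or author typos, which is where the count climbs past a hundred.

```python
from itertools import product

# Three-letter codes for the residues used in this example only.
THREE_LETTER = {"R": "Arg", "L": "Leu", "C": "Cys"}


def protein_aliases(ref: str, pos: int, alt: str) -> list[str]:
    """Enumerate some ways an author might write one protein substitution."""
    ref3, alt3 = THREE_LETTER[ref], THREE_LETTER[alt]
    prefixes = ["", "p."]
    residue_styles = [(ref, alt), (ref3, alt3), (ref3.upper(), alt3.upper())]
    templates = ["{r}{pos}{a}", "{r}{pos}>{a}", "{r}-{pos}-{a}"]

    aliases = set()
    for prefix, (r, a), template in product(prefixes, residue_styles, templates):
        aliases.add(prefix + template.format(r=r, pos=pos, a=a))
    return sorted(aliases)
```

Even this narrow sample yields 18 distinct strings for one variant, each of which would need its own query in a general-purpose search engine.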
BRITTNEE: Thank you, Steve, that’s exactly right. The reason I knew that one article was there, as I was looking earlier this morning, was when I started designing the searches and I realized that was one of the 13 that was missed, or outside of this six or the seven that were missed. I think that’s exactly what you were talking about, this author only actually mentioned this variant according to that shorthand of the cDNA that looks like a protein type nomenclature, and as such, Google Scholar wasn’t able to pick that up. We’re also used to, and I just want to bring up, at this point, using Mastermind makes it much easier. You don’t have to do that long string that everyone has designed into Google Scholar to make sure you’re trying to get all these types of nomenclature. It’s as simple as just installing the Search Companion, and then when you search any of those different types of nomenclature in Google Scholar or in dbSNP, that’s really what we’re trying to push today: you can then use Search Companion to look at every other nomenclature that could have been possible.
That goes through just a quick dbSNP, again, that’s then applicable to Search Companion over here. The last one that I wanted to go through, and I guess I don’t have that up on my screen, is searching ClinVar. I think we’re all used to coming in here. One of the complications when you’re searching ClinVar is that you likely need to have either the cDNA level nomenclature or you need to know the gene and variant that you’re actually looking for. So let’s try this. I’m going to go ahead and launch that search. I’ll bring up this one specifically. Obviously, if I’m searching ClinVar, I’m probably looking for something about a pathogenicity calculation. I’m probably looking for a clinical significance or interpretation. This one, this singular variant here, is marked uncertain in ClinVar by a single submitter, there’s a single star here. What we’ve also introduced and are introducing more coming through Mastermind is actually curated information. You’ll notice that this Search Companion looks a lot different than the last ones we’ve seen. What we have is actually a provisional call of likely pathogenic for this variant. Not only are we showing ten articles here, where, if you look down, there’s actually less within this ClinVar submission. What we’re also telling you is that we have a team of interpreters, people behind the scenes, that actually looked at these ten articles that were found in Mastermind.
Based on several pieces of evidence, so this is just a quick summary here, they gave this a provisional classification of likely pathogenic. If I want to go see that information, it’s as simple as clicking this view interpretation button. What that’s going to take me to is a slightly different version of Mastermind than maybe what we’ve historically seen. What we’ll see here at the top is that summary-level information. It’s likely pathogenic provisional class. Obviously, we do not have any patient information, we don’t have any of the information that the curator will need in order to truly give this a classification, but based on these four pieces of information, our team has decided that this was a provisional call of likely pathogenic.
I’ll go through this screen a little bit, because it’s much different than what you saw before. Again, we have a team of interpreters using this SOP, if you want to see it here, that have looked at those ten articles, and said these four that they’re summarizing here support invoking these criteria that led to this provisional call, that likely pathogenic call. I can also tell that this was done very very recently here on the far right. It says that this was last updated January 14th, looking at those articles. For each of the articles, here’s the PMID there. This is the information, this is a snippet from the article itself that our curator has determined, that we believe supports this criteria with multiple cases in this instance.
Down here on the left, you will also see some of the additional databases we brought in, why we invoked PP3 (Predicted Damaging tag) down here. These are the different computational algorithms we used, but the main part of this classification here is this evidence. We’re hoping this dramatically speeds up any type of classification that a clinical user is doing, given that we look through those ten articles and are saying, these are the four you should focus on, and this is the area in the article that you’ll probably want to go look at and read. I can jump out to that simple click right there and jump across.
We are currently working on, or we have several of these genes online, so I think we’re at 17 at the current moment, where we have classified every variant within that gene. We’re going in a gene and disease-specific manner. If there’s several genes within a disease, we’ll release those all together supporting that disease classification and to give you provisional classifications to speed this up in your lab. Is there anything else, Steve, that you can think to add there?
STEVE: No, just, you showed the intrinsic and calculated portions on the left. I did just want to emphasize that when we create these provisional calls, it’s not solely based upon the literature evidence, which obviously we have, but we are also including those calculated characteristics and categories of ACMG classification as well.
BRITTNEE: Yeah, that’s a great point. For this, I also wanted to say, what we’re also doing (I’m just going to search another variant where I know some information like this is going to come up) is offering, should these exist, treatment options. In this case, this is a variant within ENPP1. There’s actually a clinical trial that’s ongoing, and so we’re pulling up that information here with this treatment options slide. For ATP7B, there doesn’t happen to be anything, or Wilson’s disease, which is the first one I searched. I wanted to go ahead and search this variant too, to show you that we’re looking through all of these different registries and saying, okay, there’s a clinical trial ongoing. You may want to include that information for your patient. In this way, we’re also trying to facilitate ensuring that patients are getting the best treatment. Not only do we now know this as a pathogenic variant for this disease, but we’re being presented with treatment options so that we can give our patient the fastest turnaround to get that diagnosis, and hopefully get some therapeutic intervention.
STEVE: One other thing I guess I would add, as well, is that this is a fairly recent addition that we’ve made for Mastermind, including these disease-specific content curations for several disease/gene pairs. That’s one of the things that’s so powerful, I think, about incorporating this into Search Companion. It makes it really easy to ensure that you don’t miss this kind of data for the variant that you’re searching, so you don’t have to necessarily remember to go out to Mastermind and search it, and see not just what evidence exists for the variant, but if there are any curations available for that variant or interpretations available for that variant. You don’t have to remember to do that. You install the extension once and it will come up wherever you search for that variant.
BRITTNEE: Yeah. Then, the final button I just wanted to show here, to ease or to facilitate putting this onto a report very quickly, it’s simple. Obviously, I can download it, print it, copy/paste it out, but I can get this information directly onto my report with all of those PMIDs supporting that provisional classification, including the clinical trial down here, and to add that therapeutic intervention.
Okay, the last one I wanted to talk through, I think I’ll just jump back over to Google Scholar really quickly. Steve, I wanted you to talk through, because we’ve talked about nomenclature and some other information, the power of a zero. It’s something that we talk about a lot with Mastermind, or the power of even a one. I’ve searched the singular gene and variant within Google Scholar, and what I’ll see when I jump over to the Search Companion here is there is in fact one article within Mastermind. Now, this one article, it is older, but the reason that it wasn’t recognized in that Google Scholar search that I launched just a second ago is actually that the gene name here is ARTEMIS. This speaks to the fact that we’re recognizing all of these different types of nomenclature, but because we’re validating these variants as valid references off known transcripts for this gene, and the nomenclature for that gene, we’ve launched an incredibly sensitive search here.
We’ve looked at every gene nomenclature that’s been known, legacy, all of the different variant nomenclatures. In this case, we’re returning, say, a one, or even sometimes we return a zero. There’s power in that zero, in time savings. That’s another reason to really want this Search Companion. Even if you’re not finding something necessarily in some of these other sites that you’re searching, you can look at Mastermind and say, okay, that launched the most sensitive search. That used all of those nomenclatures Steve talked about earlier. I’m only getting a one or a zero return, so I have confidence in that return and can say, okay, I’m not going to spend any more time searching all these other sites trying for other nomenclatures, only to again come back to that singular return. Steve, any additions there?
STEVE: No, I think you described it great. That’s a great point. This is especially important, as you know, in the rare disease area, or on the clinical side of rare diseases, where that one paper could be the difference between being able to diagnose and treat a patient or not. We’ve actually got a couple of case studies up on the Genomenon website that describe instances when this actually was the case, when someone found a single paper through Mastermind that was not found anywhere else that led them to a diagnosis and a treatment plan.
BRITTNEE: Great point, thank you. Our site does have a lot of these different types of case studies, again, we would advise going there. There’s a quick link out to actually getting to install this extension, the browser extension. With that, I think that was all the examples I wanted to go through right now. I think we can start fielding some questions.
GARRETT: We do have some questions that have come in from the audience that we can start to go through and transition to our Q&A. Our first question that came in from the audience, Britt touched on this a little, but a good question we received: If we want to have the extension only work on specific sites to prevent it from reading web pages with PHI, how do we set it up so that only searches on Google and Google Scholar are read by the extension?
STEVE: That’s a great question. I would actually have two parts to that answer. The first is, one of the things that Britt showed at the beginning, when you first install the extension it will automatically pop this up, but you can also bring it back up at any time by clicking the extension button in the toolbar. There is a preferences panel that shows all of the sites that Search Companion currently integrates with, and you can disable or enable any specific sites that you would like. For example, one thing you’ll notice when you install it is that we also work with Google. If you don’t go to Google Scholar, or maybe you do, but you also will search for a variant just in Google itself, Search Companion can recognize variants that you’re searching in Google and pop up this information as well, but we disable that by default because we don’t assume that that’s normally what you’re using Google for. We have it disabled by default, and you can opt into it if you would like. You can disable and enable whichever set of sites that you would like.
I would also clarify from the technical perspective as well that even if you are using it on sites that have PHI, the Search Companion extension does not work by reading the entire page or sending that data to our servers. When we build an integration with a site, we know specifically where the site displays the variant or gene name itself, and we grab that. That’s what Search Companion is actually looking at rather than looking at the whole page. On Google Scholar, it’s also looking at the article results but other than that, every other site, it’s just looking for the gene and the variant name in the page. It knows where to find those. That’s how we build the integration.
GARRETT: Nice. Britt, do you have anything to add to that?
BRITTNEE: No, just, when you bring that up, if you click in the browser extension as Steve mentioned, it just brings up your preferences. You could see, when I clicked on that, originally, there was an x next to Google to engage or disengage any of those. It’s as simple as clicking on the site that you wanted. You just simply click on HGMD as an example there, or SNPedia. I can just click on that little blue box with the check mark. It’ll then have an x on that, and we’ll no longer be reading that site.
GARRETT: Thank you. Another question, so, referring back to the relevancy score that Brittnee and Steve spoke to, one question is: Does intrinsic relevancy take into account experimental design and methods used to report results?
STEVE: That’s a great question. No, it does not, but maybe it should. I like that question. One of the things we do look at, we will look at things that the article is talking about. Another thing that’s useful to look at, for example, if you search a variant, we might see that an article mentions your variant once along with 300 other variants, or we might see that the article mentions your variant 20 times along with five other variants, once each. We can start to discern, and this is more of the extrinsic relevancy calculation, but we can start to discern how much a paper is about the thing you search versus just sort of incidentally mentioning the things that you searched. We can do the same for — one way that we start to get toward things like experimental design or methods is that we also have an ontology that we’ve created, of categorical keywords that we use to categorize the types of information that an article is describing or talking about. For example, we have a lot of categories and keywords pertaining to each of the different ACMG categories. You can filter articles, for example, that talk about in vitro studies or in vivo studies, or articles that don’t talk about those things, along with a lot of other, not just ACMG category oriented keywords, but other things like genetic mechanism or inheritance pattern or things of that nature that you can also filter down to. So we are getting there. I think we’re getting to the point where we’re looking specifically at descriptions of study design or experimental design or confidence intervals, or the types of analyses that are being done in the papers. Great question.
BRITTNEE: Yeah, I think that was one of the functions we didn’t quite go into today. I know we’ve done that in the past webinar that we’ve done or that I did, where we really dove into some of the utility behind those keywords. What Steve is talking about, that ontology of keywords we’ve built, assay types is one of them. So if I want an in-vitro assay, and I want it to mention all of these different assay types, or if I really care mostly about GWAS studies, and I really want it to be talking about GWAS studies in the return. I would check out the last masterclass we did to get a little bit more information in depth about some of those. Then, any of that information that you actually add to your search then is part of the calculation for the relevancy. We’re using that information that you add to ensure we’re bringing up the likely relevant information that’s talking about a patient, or whatever it is that you added.
STEVE: Another categorization that comes up a lot, for example, is germline versus somatic papers. Especially helpful if you’re on the rare disease side, for example, or really anywhere, but one of the use cases that’s really interesting to me is when you’re on the rare disease side and you’re searching a variant that’s studied a lot for cancer, but not for germline/inherited disease. It can often be difficult finding the one paper among hundreds that talks about it from a germline perspective versus a somatic perspective. Those are other things that we use the categorization of papers to help with, and that we’re looking to continue building out and improving upon that functionality.
GARRETT: Awesome, thank you both. These are great questions! The third one is: I am interested in novel gene-to-disease associations, not clinically, but for research. Can I do a deep search for gene and potential association with disease that may not yet be reported?
STEVE: Yes, absolutely! You want me to take it?
BRITTNEE: Either way, yeah, go for it.
STEVE: Sure. Yes, absolutely. So there’s a couple different ways to do this. In the user interface, when you do a search for a gene or a variant, we show the literature results. There’s a button at the top that says explore associations, and when you click that, it will show you the other types of genomic associations that are cited within those literature results, sorted by relevancy, what’s most often associated with the terms that you’re searching, the gene and the variant that you’re searching. One of the things you can do in the user interface is go to explore associations, go to the diseases tab, and see a list of diseases that are being described in the literature, from the most commonly described at the top maybe all the way down to a disease that’s only ever been discussed in correlation with that gene and variant in a single paper. Then, you can click over and see what that paper is.
One of my favorite solutions to that is actually using our advanced API. With our advanced API, you can programmatically look through. You can start with a given gene or variant as your input, and you can output all of the articles that cite that gene and variant or all the diseases, and then you can actually go through each article and see, what are all the genomic concepts that were described in the article? Not just the diseases that each article describes, but the phenotypes, the therapies, the mechanism of action, or other types of categorical keywords that each article cited. You can use that information to filter or re-prioritize the articles based upon what’s interesting to you, what sort of questions you’re trying to answer, what problems you’re trying to solve. You can actually use the API with custom scoring. I’ve built a few scripts like that myself in doing research, and I will often build in what I call the “interestingness score,” which we’ve also discussed in another masterclass before, where I will have custom criteria that grades how interesting a given article would be to me. It does things like looking at the ratio of my variant to other variants, or my gene to other genes, or I might have a set of genes in a given pathway and I want to see the ratio of variants within those genes in a paper versus variants within other genes in the paper.
That’s really just scratching the surface. There’s a lot of different things you can quantitatively score once you’ve got the information from the unstructured text of the literature structured, once you’ve structured the information, there’s a lot of interesting things that you can do with that data. Then, what I would do is often use that to score the interestingness of the articles, and then for each disease association for a given gene or variant, I would look at the aggregate interestingness scores of the articles and how many articles tied that gene or variant to that disease. Then I could prioritize my disease list by that. So, lots of different ways to solve that problem. We’ve got a few of those ways, both in the user interface and the API.
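The scoring idea Steve describes can be sketched roughly as follows. This is a minimal illustration, not Genomenon’s implementation and not the Mastermind API: the `Article` record, its field names, and the `interestingness` and `rank_diseases` functions are all hypothetical, and the score used here is only the simplest of the ratios mentioned (the searched variant’s share of all variant mentions in a paper). Diseases are then ranked by the summed score of the articles that cite them, with the article count as a tie-breaker.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Article:
    """Hypothetical structured record for one article (not the Mastermind schema)."""
    pmid: str
    variant_mentions: dict[str, int]  # variant name -> number of mentions
    diseases: set[str] = field(default_factory=set)


def interestingness(article: Article, variant: str) -> float:
    """Share of all variant mentions in the article that belong to `variant`."""
    total = sum(article.variant_mentions.values())
    if total == 0:
        return 0.0
    return article.variant_mentions.get(variant, 0) / total


def rank_diseases(articles: list[Article], variant: str) -> list[tuple[str, float, int]]:
    """Return (disease, aggregate score, article count), highest aggregate first."""
    totals: dict[str, float] = defaultdict(float)
    counts: dict[str, int] = defaultdict(int)
    for article in articles:
        score = interestingness(article, variant)
        if score == 0.0:
            continue  # the article never mentions the searched variant
        for disease in article.diseases:
            totals[disease] += score
            counts[disease] += 1
    return sorted(
        ((d, totals[d], counts[d]) for d in totals),
        key=lambda row: (-row[1], -row[2], row[0]),
    )


if __name__ == "__main__":
    # One paper lists the variant once among 300 others; another mentions it
    # 20 times alongside five other variants, once each.
    incidental = Article(
        "paper-1",
        {"my_variant": 1, **{f"other_{i}": 1 for i in range(300)}},
        {"disease X"},
    )
    focused = Article(
        "paper-2",
        {"my_variant": 20, **{f"other_{i}": 1 for i in range(5)}},
        {"disease Y"},
    )
    for row in rank_diseases([incidental, focused], "my_variant"):
        print(row)
```

In practice the per-article score could combine several such ratios (for example, variants in a pathway’s genes versus variants in other genes); the aggregation step stays the same.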
BRITTNEE: Yeah, and to that end, the associations interface is part of our professional licensing, but as Garrett has put down here on the screen, there’s a link to sign up for Mastermind where we give you an introduction period so you can test all of that functionality. Or, from there, you can actually request a full demo time or pilot time, where we can walk you through the software and you can see some of that. If you have those types, if that’s the scenario you’re looking for and you want more of that information, please reach out, because we can go through that with you.
STEVE: And to bring it back around to Search Companion, I’d point out that if you’ve got the Search Companion installed and you search a gene or variant on another platform or site, you’re two clicks away, essentially, from that disease list. You just click into Mastermind and then click that explore associations button. You’ve got the disease list right there.
GARRETT: Awesome, awesome. Our next question is: What about a CNV search? How does it work with Search Companion?
STEVE: Great question! Currently it doesn’t. That is on our roadmap, to integrate copy number variants into Search Companion. We don’t currently. I think one of the reasons it’s on our roadmap, and probably the reason the question was asked, is that it’s going to be really powerful when we do that integration. When you think about the problems that plague variants with the various nomenclatures that they have that we discussed, it’s ten or 100 fold for copy number variants, because they’re so large. They’re structural variations that are often difficult to describe. A lot of times, the ways that they’re described is either using karyotype notation, which can be difficult to parse, all the way to just plain language descriptions of the variants, which are sometimes easy for humans to parse and difficult for machines to parse. Every site or platform has a different way of putting or searching by those CNVs.
The genomic language processing engine in Mastermind understands them all. It even understands the plain language descriptions of CNVs that authors often use: “We studied a deletion of exons three to four in gene Y.” Mastermind understands that and will actually normalize that information into a standard genomic coordinate, a start position and an end position and an effect, if it was a deletion or a duplication or whatever. We normalize all that data so that no matter how you search for the CNV, whether it’s using just plain language or karyotype notation or genomic coordinates, you will see the results in the literature for all of those.
Another thing we can do with that normalized information is find very similar overlaps. For example, if you search for this CNV that starts here at this nucleotide and ends here, maybe that specifically has never been cited in the literature, but another one has that started 50 nucleotides later and ended 20 nucleotides later. It’s by and large the same CNV, but for whatever reason had different break points in the study that was done, or just in the description that was used for the variant. Maybe it was more convenient to describe it slightly differently, given whatever they used to find the CNV. That I think is really powerful, and will especially be powerful when we bring it to Search Companion, where you’re searching other sites for CNVs, and we can show you all of the information for that CNV, regardless of the way that that site decided that you should search for that CNV.
BRITTNEE: Yeah, that’s really the powerful part I find with the CNV searching, is that we’re looking at overlaps, we’re looking at the nearest CNVs we’re finding to what you searched. We normalize those, but let’s just say that, as Steve mentioned, that puts you in at base 50 and a deletion does 7,000. Whatever that is. Because of the different callers that people use, sometimes you’ll get repeats on the end, and somebody will put that one on this end, you’re going to get CNVs that are one base off, you’re going to get CNVs that are just a few hundreds of bases off, or thousands if they’re huge. We bring all of that information into Mastermind and display all of those CNVs, so even if you get a zero return on your CNV because somebody else did array technology or whatever and their probes were in different places, we’re going to give you back CNVs that map to the highest percent overlap of the CNV you searched. I think that’s a great point. Once we can get that into Search Companion, if you’re searching Google Scholar for your CNV, but there’s nothing that matches the exact ends of that CNV, we’re going to show you, but hey, there was one that actually had a 90% overlap because of those bases at the end, or their caller worked slightly differently.
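The overlap matching described here can be illustrated with a short sketch. It assumes CNVs have already been normalized to a chromosome, start, end, and effect; the `CNV` class, the `percent_overlap` and `nearest_cnvs` functions, the 50% cutoff, and the coordinates are all made up for illustration and do not reflect how Mastermind computes or thresholds overlap. Overlap is measured here as the share of the searched CNV that another CNV covers; a reciprocal overlap measure would be an equally reasonable choice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CNV:
    """A normalized CNV: chromosome, 1-based inclusive start/end, and effect."""
    chrom: str
    start: int
    end: int
    effect: str  # e.g. "deletion" or "duplication"

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def percent_overlap(query: CNV, other: CNV) -> float:
    """Percent of the query CNV covered by `other`; 0 if chromosome or effect differ."""
    if query.chrom != other.chrom or query.effect != other.effect:
        return 0.0
    shared = min(query.end, other.end) - max(query.start, other.start) + 1
    return 100.0 * max(shared, 0) / query.length


def nearest_cnvs(query: CNV, catalog: list[CNV], min_percent: float = 50.0):
    """Return (percent, CNV) pairs at or above the cutoff, best overlap first."""
    scored = [(percent_overlap(query, cnv), cnv) for cnv in catalog]
    hits = [(pct, cnv) for pct, cnv in scored if pct >= min_percent]
    return sorted(hits, key=lambda hit: hit[0], reverse=True)


if __name__ == "__main__":
    searched = CNV("chr1", 1000, 8000, "deletion")
    catalog = [
        CNV("chr1", 1050, 8020, "deletion"),     # starts 50 later, ends 20 later
        CNV("chr1", 20000, 30000, "deletion"),   # no overlap at all
        CNV("chr1", 1000, 8000, "duplication"),  # same span, different effect
    ]
    for pct, cnv in nearest_cnvs(searched, catalog):
        print(f"{pct:.1f}% {cnv}")
```

An exact-match search on the coordinates 1000 to 8000 would return nothing from this catalog, while the overlap search still surfaces the deletion whose breakpoints are shifted by a few dozen bases.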
STEVE: That problem also plagues even smaller CNVs, like single exon deletions within a gene. Britt mentioned searching that CNV in Google Scholar, and that’s an example where text searching can really fail you for CNVs. Let’s say you’re looking at a patient or you’re looking at data where you have a deletion of exon three. You might search Google Scholar for deletion of exon three in gene whatever. Maybe there’s no article that cites that, but maybe there is an article that cites a deletion of exons two through four. The article never even says exon three, it says two through four, a range which we know includes exon three, and they might have found a causal link to the type of disease that you’re looking for. That’s the thing that’s really easy to miss if you’re not using a platform that fundamentally understands the structures of CNVs, and in the context of evidence, really.
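The exon three example can be made concrete with a toy sketch. The regular expression and the `exon_events` and `mentions_exon` functions below are hypothetical and handle only a few phrasings; they are a simple stand-in to show why range-aware matching finds what a literal text search misses, not a description of Mastermind’s genomic language processing.

```python
import re

# Spelled-out numbers the toy parser recognizes (digits are handled too).
_WORDS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
}
_NUM = r"(\d+|" + "|".join(_WORDS) + r")"
_PATTERN = re.compile(
    r"(deletion|duplication) of exons?\s+" + _NUM
    + r"(?:\s*(?:-|to|through)\s*" + _NUM + r")?",
    re.IGNORECASE,
)


def _to_int(token: str) -> int:
    return int(token) if token.isdigit() else _WORDS[token.lower()]


def exon_events(text: str) -> list[tuple[str, int, int]]:
    """Return (effect, first_exon, last_exon) for each exon-level CNV phrase."""
    events = []
    for effect, first, last in _PATTERN.findall(text):
        lo = _to_int(first)
        hi = _to_int(last) if last else lo  # a single exon is a range of one
        events.append((effect.lower(), min(lo, hi), max(lo, hi)))
    return events


def mentions_exon(text: str, effect: str, exon: int) -> bool:
    """True if any parsed event of that effect spans the given exon."""
    return any(e == effect and lo <= exon <= hi for e, lo, hi in exon_events(text))


if __name__ == "__main__":
    sentence = "We identified a deletion of exons two through four in gene Y."
    print("exon three" in sentence)                # literal text search
    print(mentions_exon(sentence, "deletion", 3))  # range-aware match
```

The literal search fails because the words “exon three” never appear, while the range-aware check succeeds because three falls between two and four.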
GARRETT: Lots of good info. Thank you both! I think we have about six minutes left, so I think we have time for one more question: Is it possible to do a query across two variants for different genes to see if there are articles which address gene variant clusters?
BRITTNEE: Steve, you want it?
STEVE: Yes is the answer. So another capability that Mastermind has is the ability to do boolean searches, meaning combinatorial searches across genes or variants. So, yes, you can search for any evidence of multivariant or even multigenic causes of disease by using that boolean search functionality. I don’t know, do you have any more to add to that?
BRITTNEE: I was just saying, and then it expands. You’re looking for those two variants in the context of a disease. We’re talking about the associations that Steve was talking about earlier. We’re going to be able to say, I want these two variants and I want articles related to my disease, or I want these two variants and I want specific phenotypes that my patient has, because I suspect something about those variants. The other way in which we can actually do this, and to take that boolean one step further is, because we actually structure every single word in these articles, you can search specific words that you also wanted. We could layer in, let’s say, assay types, because I care if they’re in a GWAS study. You can come in and say, I need them to be talking about the genomics of COVID. That’s not a recognized disease phenotype or any term currently, obviously not. You can come in and say, but I really am interested in that, the context of COVID, and add that as a free text keyword. You can say, I’m only looking for articles that talk about COVID. We really have the capability to use not just the boolean ‘and’, you can do ‘or’, so you can give us five phenotypes and say, I want articles that mention any of them and the variant. We really have the expansions around that.
STEVE: Yeah, and with Mastermind understanding several different genomic concepts in our associations graph, you can do boolean searches, as Britt alluded to, you can do searches across association types. So not just variant clusters, a cluster of two or three variants that work in concert, but also combinations. You might say anything talking about gene A with this specific variant in gene B, or you might say this variant with this CNV, or this gene fusion with this variant, or things like that. You can look across association types with that boolean functionality as well.
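The boolean combinations discussed in this answer can be sketched as composable filters over structured article records. Everything in this example is assumed for illustration: the record layout (association type mapped to a set of terms), the `has`, `all_of`, and `any_of` helpers, and the placeholder gene, variant, and phenotype names. It is not Mastermind’s query syntax or API.

```python
from typing import Callable, Dict, Set

# Hypothetical structured article: association type -> terms found in the text.
Article = Dict[str, Set[str]]
Query = Callable[[Article], bool]


def has(kind: str, term: str) -> Query:
    """Match articles that contain `term` under the given association type."""
    return lambda article: term in article.get(kind, set())


def all_of(*queries: Query) -> Query:
    """Boolean AND across sub-queries."""
    return lambda article: all(q(article) for q in queries)


def any_of(*queries: Query) -> Query:
    """Boolean OR across sub-queries."""
    return lambda article: any(q(article) for q in queries)


# Two variants in different genes, AND any of several phenotypes,
# AND a free text keyword.
query = all_of(
    has("variant", "GENE_A:variant_1"),
    has("variant", "GENE_B:variant_2"),
    any_of(has("phenotype", "phenotype_1"), has("phenotype", "phenotype_2")),
    has("keyword", "COVID"),
)

articles = {
    "paper-1": {
        "variant": {"GENE_A:variant_1", "GENE_B:variant_2"},
        "phenotype": {"phenotype_2"},
        "keyword": {"COVID", "GWAS"},
    },
    "paper-2": {
        "variant": {"GENE_A:variant_1"},
        "phenotype": {"phenotype_1"},
        "keyword": {"COVID"},
    },
}

matches = [pmid for pmid, article in articles.items() if query(article)]

if __name__ == "__main__":
    print(matches)
```

Because each association type is just another key in the record, the same helpers cover mixed queries such as a gene with a CNV, or a gene fusion with a variant.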
GARRETT: Okay, I think we’re at a wrap. Awesome conversation, great information, Steve and Britt! Thank you again for sharing your expertise and your insights for Search Companion, and for everyone watching and tuning in, thank you very much. As a reminder, we will send you a recording of this presentation later on today. Also as a reminder, if you don’t yet have a Mastermind account, do sign up at the link on the screen to start with a free trial of the professional edition. If you’re using basic edition, you’ll want to talk to us about upgrading to get the most out of your subscription and also access to the most features. We’re always here to support you. If you have any questions, do reach out to us at firstname.lastname@example.org to speak with a Mastermind specialist. With that, thank you so much, everyone! Awesome conversation, and have a great rest of your day.