Mastermind Masterclass:
Genomic Associations

Wednesday, July 22, 2020

When it comes to the herculean task of making the entirety of genetic knowledge accessible for clinical care and drug discovery, indexing and making the evidence easily searchable isn’t enough. Mastermind transcends simply indexing the evidence by identifying Genomic Associations contained within the literature to make this work easier, faster, and more complete.

Mastermind leverages Genomic Language Processing (GLP) to annotate and organize genetic data into Genomic Associations that draw informative connections between Genes, Variants, Diseases, Phenotypes, Therapies, and Categories.

In this Masterclass, co-founders Mark Kiel and Steve Schwartz describe how to harness the power of this comprehensive evidence-based data for use in clinical workflows to answer specific questions about genes and variants, as well as in drug discovery to answer more open-ended questions and uncover associations that were not previously known to the searcher.

Watch Mark and Steve as they lead a discussion and demonstration of the power of Genomic Associations. 

Topics discussed:

  • What a Genomic Association is and how it can be used to inform clinical care and drug discovery
  • How to query the Mastermind Genomic Search Engine to answer specific questions with genomic associations (how genetic variants are associated with disease, what specific therapies can be used to treat patients, etc.)
  • How to query the Mastermind API with open-ended questions to discover novel genomic associations (identify all therapies associated with specific gene defects, all phenotypes associated with specific diseases, which variants are most likely associated with a specific clinical presentation, etc.)


Using the API, are you saying that variants can be prioritized based on disease or phenotype association?

Steve: Yeah, absolutely! Yes, they can. A lot of the power of the associations in the API comes from the article info endpoint, which Mark showed during his demonstration, where you can query a given PMID and see every gene, variant, disease, phenotype, and therapy that was mentioned in the article. For example, you can query a given gene and, if you know what your disease is ahead of time, filter by that. If you don’t, you can query by gene, get a list of the PMIDs, and then pass those PMIDs into the article info endpoint. In the code, in my case the Python code, you can keep track of all of the associations that were mentioned in every article and then prioritize them however you want. That’s really what I did when I was scoring the articles: I was counting the number of variants in an article, counting the number of diseases in an article, and then I could combine those counts into a score and sort based on that.

Mark: I’ll add to what Steve described: we have a good working knowledge of how generic scores can be put to use for generic use cases, but those scoring criteria are fully modular. So Steve, when he writes those scripts, uses his knowledge and his awareness of what typical use cases would benefit from, but every one of those aspects in the article info endpoint can be parameterized and reconfigured to develop a novel score that prioritizes to a specific effect. That’s an understated power of the API that Steve talked about when he put the scripting around those endpoints.
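
The counting-and-sorting approach Steve and Mark describe can be sketched roughly as follows. This is a minimal illustration, not the scripts shown in the Masterclass: the `ArticleInfo` class, its field names, the weights, and the placeholder PMIDs are all assumptions, and the data is hard-coded rather than fetched from the article info endpoint.

```python
from dataclasses import dataclass, field


@dataclass
class ArticleInfo:
    """Hypothetical stand-in for what was mentioned in one article."""
    pmid: str
    variants: list = field(default_factory=list)
    diseases: list = field(default_factory=list)
    therapies: list = field(default_factory=list)


# Assumed generic weights: one point per variant and per disease mentioned.
DEFAULT_WEIGHTS = {"variants": 1.0, "diseases": 1.0}


def score_article(article, weights=DEFAULT_WEIGHTS):
    """Combine per-category mention counts into a single weighted score."""
    return sum(w * len(getattr(article, name)) for name, w in weights.items())


def prioritize(articles, weights=DEFAULT_WEIGHTS):
    """Return the articles sorted from highest to lowest score."""
    return sorted(articles, key=lambda a: score_article(a, weights), reverse=True)


# Placeholder records; none of these identifiers refer to real articles.
articles = [
    ArticleInfo("PMID-A", variants=["v1", "v2", "v3"], diseases=["d1"]),
    ArticleInfo("PMID-B", variants=["v1"], diseases=["d1", "d2"]),
    ArticleInfo("PMID-C", therapies=["t1", "t2", "t3", "t4"]),
]

generic_order = [a.pmid for a in prioritize(articles)]
therapy_order = [a.pmid for a in prioritize(articles, {"therapies": 2.0})]
```

Passing a different `weights` dictionary is one simple way to reconfigure the score for a specific use case, as in the therapy-focused ordering at the end.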

How would I use the API and Mastermind together? Or is it one or the other?

Steve: That’s a great question! Typically, the most effective workflows that I’ve seen that integrate Mastermind tend to use both. The API is really good at automating the process of what I consider to be asking Mastermind questions and getting answers. As you saw in the example scripts that I showed, you can automate the process of querying, filtering, and prioritizing a list of articles, or evidence, or information that a curator would then investigate for whatever the use case is. Whether it’s evaluating a case, staying abreast of the latest knowledge or information about a particular gene, or any number of other use cases, often what happens is you have this automated workflow that creates a queue of tasks, or things for a person to look at. With each one of the results in the Mastermind API, we provide a URL into the user interface for exactly that reason: you can add that URL into the queued task, and it gives a jumping-off point for the person who’s going to be investigating that particular task in the queue.
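
A queue like the one Steve describes might look something like the sketch below. It assumes each result is a dictionary with `pmid`, `score`, and `url` keys; those key names, the threshold, and the placeholder links are illustrative assumptions rather than the API's actual response format.

```python
from collections import deque


def build_review_queue(results, min_score=0.0):
    """Turn scored results into an ordered queue of tasks for a curator.

    Each task keeps the link into the user interface so the reviewer
    has a starting point for the investigation.
    """
    kept = [r for r in results if r["score"] >= min_score]
    ranked = sorted(kept, key=lambda r: r["score"], reverse=True)
    return deque(
        {"pmid": r["pmid"], "link": r["url"], "status": "pending"}
        for r in ranked
    )


# Placeholder results; the links are not real Mastermind URLs.
results = [
    {"pmid": "PMID-A", "score": 4.0, "url": "https://example.invalid/a"},
    {"pmid": "PMID-B", "score": 1.0, "url": "https://example.invalid/b"},
    {"pmid": "PMID-C", "score": 7.5, "url": "https://example.invalid/c"},
]

queue = build_review_queue(results, min_score=2.0)
next_task = queue.popleft()  # the highest-scoring article is reviewed first
```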

Could you explain the publication history panel?

Mark: Just for reference while you might be pulling it up, it’s the bubble plot in the upper right quadrant. It is highly dynamic and changes as you add new filter terms, so that’s important to remember. The way to explain it is that the x-axis is the publication date, the y-axis is the impact score for that journal, and the icons that fill that plot – the circles – are individual references. I think the most potentially confusing thing is what the size means. The size is meant to draw your attention to the most relevant content, so the bigger the icon, the more powerful – the more cogent – those Genomic Associations, based on your defined search, are in that reference. Steve, you can follow up if you want to expound on what I said.

Generally, it’s the upper right corner there and, again, it’s dynamic: it really dramatically reprioritizes the content when you add additional key terms to focus your attention on the specific articles that are of relevance to you. Writ large, the aspects that we use to qualify the relevance of the content have to do with where in the paper your search terms appear – title, abstract, full text, supplemental data, etc. – and how often those search terms appear. For example, in the case of a variant, is the variant mentioned 12 times? Or 50 times? Or just once? And to build those two together, is it mentioned once in the supplemental data? Or is it mentioned a couple of times in the abstract? There’s a differential specificity and relevance when you take that into account. Then, when you start to add more terms, how close together the terms are mentioned in the body of that article is another parameter. Steve, I don’t know if you want to build on any of those things that I said, but those are three of the main ways that the content is prioritized and reflected in the size of each of the icons in that bubble plot.
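
To make those three factors concrete, here is a toy relevance function that combines where a term appears, how often it appears, and how close together different terms are. It is only an illustration of the idea; the section weights, the proximity formula, and the input format are all assumptions and do not represent how Mastermind actually computes relevance.

```python
from itertools import combinations

# Assumed weights: a mention in the title counts for more than one
# buried in the supplemental data.
SECTION_WEIGHTS = {
    "title": 5.0,
    "abstract": 3.0,
    "full_text": 1.0,
    "supplemental": 0.5,
}


def relevance(mentions, proximity_weight=2.0):
    """Score one article for a set of search terms.

    `mentions` maps each search term to a list of (section, position)
    pairs, where position is a word offset within the article.
    """
    # Where and how often: every mention adds the weight of its section.
    placement = sum(
        SECTION_WEIGHTS[section]
        for hits in mentions.values()
        for section, _ in hits
    )
    # How close together: each pair of terms earns a bonus that shrinks
    # as the smallest gap between their mentions grows.
    closeness = 0.0
    found = [hits for hits in mentions.values() if hits]
    for hits_a, hits_b in combinations(found, 2):
        gap = min(abs(p - q) for _, p in hits_a for _, q in hits_b)
        closeness += 1.0 / (1.0 + gap)
    return placement + proximity_weight * closeness


once_in_supplement = relevance({"variant": [("supplemental", 900)]})
twice_in_abstract = relevance({"variant": [("abstract", 10), ("abstract", 40)]})
near = relevance({"variant": [("full_text", 100)], "disease": [("full_text", 102)]})
far = relevance({"variant": [("full_text", 100)], "disease": [("full_text", 300)]})
```

With these assumed weights, two mentions in the abstract outscore a single mention in the supplemental data, and two terms mentioned close together outscore the same terms mentioned far apart.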

Steve: Sure! If you think about it, the publication timeline is the x-axis, the impact factor of the journal is the y-axis, and then the size of the dot is really the relevancy of the article to your query. What’s interesting about that is it really depends on what kind of information you’re looking for. This plot can be a very useful tool in doing your investigation because the most interesting papers are often in the upper right corner of the plot, so we have a lot of users who love using the plot as their primary navigation of the articles they’re investigating, where they’ll look at the right side of the plot, or the upper right side of the plot. In particular, the largest circles that are closest to the upper right-hand side of the plot are going to be the most relevant and recent articles, likely to be new findings that would be interesting to you. Another use case might be a variant that you’re not very familiar with: you might be looking for the larger circles further to the left-hand side of the plot as the seminal papers that discuss that variant. It’s kind of interesting how you get this sort of regional navigation structure, where your use case determines where on that plot you might be looking.
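
The two ways of reading the plot that Steve describes can be sketched as simple filters over the plotted points. The `PlotPoint` fields, the thresholds, and the sample values below are made up for illustration; the real plot is navigated visually in the user interface.

```python
from typing import NamedTuple


class PlotPoint(NamedTuple):
    """Hypothetical record for one circle on the plot."""
    pmid: str
    year: int         # x-axis: publication date
    impact: float     # y-axis: journal impact
    relevance: float  # circle size: relevance to the query


def upper_right(points, since_year, min_impact):
    """Recent articles in high-impact journals, largest circles first."""
    hits = [p for p in points if p.year >= since_year and p.impact >= min_impact]
    return sorted(hits, key=lambda p: p.relevance, reverse=True)


def seminal(points, before_year, min_relevance):
    """Large circles toward the left of the plot, oldest first."""
    hits = [p for p in points if p.year < before_year and p.relevance >= min_relevance]
    return sorted(hits, key=lambda p: p.year)


# Made-up points; the identifiers do not refer to real articles.
points = [
    PlotPoint("PMID-A", 2019, 12.0, 0.9),
    PlotPoint("PMID-B", 2020, 15.0, 0.4),
    PlotPoint("PMID-C", 2003, 8.0, 0.8),
    PlotPoint("PMID-D", 2018, 2.0, 0.7),
    PlotPoint("PMID-E", 1998, 5.0, 0.2),
]

new_findings = [p.pmid for p in upper_right(points, since_year=2018, min_impact=10.0)]
early_papers = [p.pmid for p in seminal(points, before_year=2010, min_relevance=0.5)]
```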