Mastermind Masterclass: Therapies

The Mastermind Genomic Search Engine now includes the ability to search the genomic literature by therapy.

Searching by therapy provides significant value for oncologists making decisions on third-line therapies, where approved drugs or clinical trials are no longer effective for a cancer patient. Finding a comprehensive list of all therapies in the medical research tied to a patient’s genetic make-up is invaluable for treating these late-stage cancer patients.

Therapeutic search is also highly useful for clinicians making diagnostic and treatment decisions for patients with rare diseases. There are currently fewer than 800 FDA-approved therapies for over 7,000 rare diseases, leaving many patients without approved treatment options. Mastermind eliminates the time clinicians spend scouring the medical literature to find therapies associated with the patient’s genetic profile and puts the answers at their fingertips.

Join co-founders Dr. Mark Kiel and Steve Schwartz as they lead a concise guided tour of the new Therapies search feature.

You will learn:

  • How to annotate your genetic variant search results with known therapies
  • The best way to prioritize search results based on all possible treatment strategies
  • Techniques to determine which off-label therapies have been associated with any indication


How can you prioritize references if you aren’t yet sure what therapy you are looking for?

Mark: I’ll take the UI, Steve, but you can clean up the answer with the API, which I think offers a lot more flexibility. I like the coining of the phrase ‘interestingness’; I think you should trademark that! I’ll emphasize that the script Steve described is extremely flexible and modular. Steve settled on some parameters for his interestingness score that satisfied the question he was seeking to answer, but they can be permuted in any way that you please. On the UI side, the article list that Lauren showcased is organized by relevance by default. If you remember the icons in the publication plot in the upper right, the size of each icon reflects that relevance: what Lauren referred to as a ‘big article’ is one that has the most relevant and interesting content. That relevance takes into account how much of the content focuses on the therapies, and particularly how much of that content focuses on the therapies with respect to treating that genetic causation. So relevance is a parameter that’s built into the user interface as the default sort, and it showcases the most relevant articles for your search up front.

An additional way that you can prioritize the articles would be by publication date. Depending on your concern, you may already be aware of therapies that are useful for treating a genetic disease that’s been known for 30 years, but you’re interested in the latest review article, or in the latest things like the gene therapies that Lauren highlighted in her examples. Those can be brought to bear in the prioritized article list by changing the sort parameter from relevance to publication date.

The last thing I’ll say is forward-looking, to a user interface feature that we’re building out now in Professional, which is to showcase some of the association data that Steve alluded to in his diagram and that Lauren highlighted in her demo of the API. Some of those associations will be put on a page in the user interface for Professional, so that you’ll be able to see some of that itemized content for specific therapies that are really associated with different genes and diseases. Steve, I’m sure there’s more territory that you can cover there with respect to the API, but that covers a broad swath of the UI capability.

Steve: Yeah, absolutely. From a high level, if you think of any given investigation or any given search, the way that the data we have is presented is basically a web of interconnected associations between the six different concepts that I had described earlier: the gene, variant, disease, categorical keywords, phenotypes, and therapies. With each search, it’s really about figuring out what your constraints or your input values are, and then deciding what you’re looking for as the output values. The user interface currently really helps you get to the articles as the output value so that you can then look at each article and decide what you’re trying to get from it, and we try to pull that information through in the interface to make it easy to get to.

What’s really interesting with the API is that we have different endpoints for each different set of outputs. Whatever your input values are, the genes endpoint will list the associated genes; for that same set of inputs, the variants endpoint gives you a list of variants; and again, from the same set of inputs, you can get a list of therapies, or phenotypes, or articles. That’s one of the ways that the API makes it much easier to constrain your initial search. Even if you have only one input parameter, like a gene, you can use that gene to list out the associated therapies, and then loop through each therapy and use it as an additional input. For example, take gene and therapy 1 as your inputs, and then as your output maybe hit the articles endpoint, or perhaps the variants endpoint to see if there’s a specific region of the gene or types of variants that you’re interested in effective therapies for. It’s really about figuring out how you can constrain your input in order to get a list of options in your output, and then you can use each of those options as an additional input to further constrain your search. I think the first part of it is figuring out how you can pivot your query to filter and refine. Once you’ve gotten the refinement that you were after, with the set of filters that gets you to the information that you’re looking for, only then does it become a question of, ‘how do I now prioritize the information that I’m looking at?’ That prioritization is something that both the user interface and the API do.
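The pivot described here (list the therapies for a gene, then reuse each therapy as an extra input) can be sketched in Python. This is only an illustration under stated assumptions: `fetch` is a hypothetical stand-in for one API request, the endpoint names are taken from the discussion rather than from the API documentation, and the toy data are placeholders, not real results.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for one API request: (endpoint name, input constraints) -> results.
# This is NOT the real Mastermind client; the names and shapes are assumptions.
Fetch = Callable[[str, Dict[str, str]], List[str]]


def pivot_by_therapy(fetch: Fetch, gene: str, output: str = "articles") -> Dict[str, List[str]]:
    """List the therapies associated with a gene, then reuse each one as an extra input."""
    results: Dict[str, List[str]] = {}
    for therapy in fetch("therapies", {"gene": gene}):
        # Same gene, now further constrained by one therapy at a time.
        results[therapy] = fetch(output, {"gene": gene, "therapy": therapy})
    return results


# Toy in-memory data so the sketch runs without network access (all values are placeholders).
_TOY_DATA = {
    ("therapies", (("gene", "GENE1"),)): ["therapy 1", "therapy 2"],
    ("articles", (("gene", "GENE1"), ("therapy", "therapy 1"))): ["article A", "article B"],
    ("articles", (("gene", "GENE1"), ("therapy", "therapy 2"))): ["article C"],
}


def toy_fetch(endpoint: str, inputs: Dict[str, str]) -> List[str]:
    """Look up canned results for an endpoint and a set of input constraints."""
    key = (endpoint, tuple(sorted(inputs.items())))
    return _TOY_DATA.get(key, [])


if __name__ == "__main__":
    for therapy, articles in pivot_by_therapy(toy_fetch, "GENE1").items():
        print(therapy, "->", articles)
```

Passing `output="variants"` instead would ask the same question of a different output type, which is the "pivot" in the answer above.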

We do a few things on the prioritization that are irrespective of your search, things like looking at the recency of an open article, or the impact factor, or how much genetic content is in the article. Then there’s a second, dynamic piece of the prioritization that we do on the fly based on your particular query, where we try to ‘weight’ to the top the articles that talk more about the specific things in your query and how those things are related to each other. That’s why, when Lauren showed in her demo using the categorical keywords to filter for therapies, it’s not only filtering, it’s also reprioritizing based on articles with more therapeutic content. I think those two things together help to first refine and then prioritize the results, to help you find what you’re looking for even when you don’t necessarily know what it is yet.
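One way to picture the two pieces of prioritization is a score with a query-independent part and a query-dependent part. The sketch below is an assumption-laden illustration, not Mastermind’s actual ranking: the field names, the weights, and the way the two parts are added together are all invented for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Article:
    title: str
    year: int
    impact_factor: float
    genetic_mentions: int     # proxy for "how much genetic content is in the article"
    query_term_mentions: int  # proxy for how much the article discusses the query terms


def static_score(article: Article, current_year: int) -> float:
    """Query-independent part: recency, impact factor, genetic content (weights are made up)."""
    recency = 1.0 / (1 + max(0, current_year - article.year))
    return recency + 0.1 * article.impact_factor + 0.05 * article.genetic_mentions


def dynamic_score(article: Article) -> float:
    """Query-dependent part: articles that talk more about the query terms score higher."""
    return 0.5 * article.query_term_mentions


def prioritize(articles: List[Article], current_year: int = 2020) -> List[Article]:
    """Sort articles so that the highest combined score comes first."""
    return sorted(
        articles,
        key=lambda a: static_score(a, current_year) + dynamic_score(a),
        reverse=True,
    )


if __name__ == "__main__":
    articles = [
        Article("recent, off-topic", 2019, 5.0, 10, 0),
        Article("older, on-topic", 2010, 2.0, 4, 6),
    ]
    for article in prioritize(articles):
        print(article.title)
```

With these made-up weights, the older article that discusses the query terms heavily outranks the newer one that does not, which mirrors the "filter, then reprioritize" behavior described above.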

Are nutritional therapies included in the drug ontology?

Steve: Yes, they should be. We use the FDA UNII, the Unique Ingredient Identifier database from the FDA. It includes both pharmaceutical therapies and nutritional therapies, I believe. I know that we’ve searched for a few of those and found all the ones that we searched for, so I think it would be a matter of trying it and seeing if it’s there. If you do find something that’s not in there, we do have the ability to modify and add to it, so let us know if you find anything. If you type it into Mastermind and it doesn’t come up as a therapy option, please reach out to us through the contact form and let us know.

Candace: I know there’s over 180,000 therapies in there, so lots and lots of wonderful stuff to play with!

Is it possible and much easier to search for newly matched articles which show functional studies associated to a particular variant?

Steve: It is easy. I don’t know what we’re saying it’s easier than, but it is easy through the API.

Candace: I’m assuming they mean now with therapies is it easier?

Mark: We’re emphasizing searching for a given keyword, or ontologic term in our therapies index, in combination with any of the functional categorical key terms. Steve highlighted the ability to perform these searches as a multi-parameter search, or Boolean search, and that capability is also available in the UI. As for the question of whether it ‘is easier now that we have therapies’, you are definitely able to combine a therapeutic term from the UNII ontology with a functional category, and then have all of those functional, empirical, in vivo, and in vitro studies that address the efficacy of that therapy. I’d encourage you to combine the new therapeutic ontologic capability with some of the pre-existing categorical search capability that we have. If you stick with the relevance sort order, those references should rise up to the top, and you should find in the top 10 articles the most relevant content for that functional, empirical evidence around a specific therapeutic.
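The combination described here amounts to a Boolean AND across search terms: keep only the articles tagged with both the therapy term and the functional category. A minimal sketch, assuming a hypothetical in-memory index that maps each article to its set of tagged terms (the index and the term names are placeholders, not Mastermind data):

```python
from typing import Dict, Iterable, List, Set


def boolean_and_search(index: Dict[str, Set[str]], required: Iterable[str]) -> List[str]:
    """Return the articles whose tagged terms include every required term."""
    required_terms = set(required)
    return [article for article, terms in index.items() if required_terms <= terms]


if __name__ == "__main__":
    # Placeholder index: article id -> terms the article is tagged with.
    toy_index = {
        "article A": {"therapy 1", "in vitro"},
        "article B": {"therapy 1"},
        "article C": {"therapy 2", "in vitro"},
    }
    print(boolean_and_search(toy_index, ["therapy 1", "in vitro"]))
```

Adding a term can only shrink the result set, which is why combining a therapy with a functional category narrows the list to studies that address both.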

Can you filter by biological pathway?

Steve: Not directly, not yet. One way that I’ve done that in a couple of the scripts that I’ve written is by starting with a set of genes that I already know are in the pathway and passing those in as a group of genes into the API. Or, if I don’t know exactly which genes I want first, what I’ll sometimes do is start with one known gene, or even start with the disease, and query the genes endpoint of the API. If I query the genes endpoint, it’s going to give me a list of genes associated with whatever my input is, so my input could be a disease or it could be a known gene. Very often the top genes that come back in those associations are the other genes in the pathway, because those are the ones that tend to be most highly associated in the medical evidence. I usually put that as a sort of pre-function in my scripts: start with a disease or a gene, hit the genes endpoint first to get the associated genes, and then do the rest of my analysis with the set of genes in that pathway. That’s one way to do it.
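The "pre-function" idea can be sketched as a small helper that expands a seed (a disease or a known gene) into a working gene set. Everything here is hypothetical: `fetch` stands in for a call to the genes endpoint, the sketch assumes results come back ordered by association strength, and the gene and disease names are placeholders.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for one API request; not the real Mastermind client.
Fetch = Callable[[str, Dict[str, str]], List[str]]


def candidate_pathway_genes(fetch: Fetch, seed: Dict[str, str], top_n: int = 5) -> List[str]:
    """Expand a seed such as {"disease": ...} or {"gene": ...} into a candidate gene set."""
    genes: List[str] = []
    if "gene" in seed:
        genes.append(seed["gene"])  # keep the known gene in the set
    # Assumption: the genes endpoint returns genes ordered by association strength.
    for gene in fetch("genes", seed)[:top_n]:
        if gene not in genes:
            genes.append(gene)
    return genes


def toy_fetch(endpoint: str, inputs: Dict[str, str]) -> List[str]:
    """Canned responses so the sketch runs offline (placeholder values only)."""
    if endpoint == "genes" and inputs == {"disease": "disease X"}:
        return ["GENE1", "GENE2", "GENE3"]
    if endpoint == "genes" and inputs == {"gene": "GENE1"}:
        return ["GENE2", "GENE1", "GENE4"]
    return []


if __name__ == "__main__":
    print(candidate_pathway_genes(toy_fetch, {"disease": "disease X"}, top_n=2))
    print(candidate_pathway_genes(toy_fetch, {"gene": "GENE1"}))
```

The returned list is only a set of candidates. As the answer notes, the top associated genes are often, but not necessarily, members of the same pathway, so the set is worth reviewing before the rest of the analysis.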

Mark: One thing to add on to that, which Steve showcased with the interestingness parameter that he devised (it’s still in that script and can be repurposed, modified, or used as is), is the idea that not only do we find these associations, but we afford the ability to quantitate the strength of those associations based on that evidence. It can be as simple as enumerating the number of papers citing that association, or you could go into much more specific depth by qualifying the nature of that evidence using the articles info endpoint. It’s really hard to overstate how flexibly the API allows you to answer specific questions, and then do so iteratively. So a question like, ‘is it possible to search on pathways?’, can be answered just as Steve described, and in any other permutation that you’d like to address. Steve talked about knowing the pathway a priori, but also about discovering the pathway, whether it’s direct and canonical to a pathway of interest or indirect. Mastermind affords the ability to recover that evidence, as well as discover new evidence, particularly using the API, as it allows you to do things in very high throughput.
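The simplest version of quantifying association strength, counting the papers that cite each association, might look like the sketch below. It is not Steve’s interestingness score, whose parameters are not given here; the input shape (article id mapped to the therapies it mentions) and all values are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def association_strength(article_therapies: Dict[str, Iterable[str]]) -> List[Tuple[str, int]]:
    """Rank therapies by the number of distinct articles that mention them."""
    counts: Counter = Counter()
    for therapies in article_therapies.values():
        counts.update(set(therapies))  # count each article at most once per therapy
    # Highest count first; ties broken alphabetically so the output is deterministic.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Placeholder data: article id -> therapies mentioned in that article.
    toy = {
        "article A": ["therapy 1", "therapy 2"],
        "article B": ["therapy 1", "therapy 1"],
        "article C": ["therapy 1"],
    }
    for therapy, n_articles in association_strength(toy):
        print(therapy, n_articles)
```

A richer score could weight each article by the nature of its evidence instead of counting every article equally, which is the "more specific depth" mentioned above.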

Because it’s just us, the Mastermind community, I’m gonna throw in a bonus question of my own. We’ve been talking about some exciting stuff around what Mastermind could possibly contribute to our current situation with COVID-19. There’s gonna be some stuff coming out about it and we’re really excited, but do you want to just give us a little taste of what Mastermind can contribute?

Mark: Sure! Our focus is not directly on infectious disease. When people think of viral genetics, or bacteriology and parasitology, as it has bearing on human disease, it’s facile to think, ‘well, let’s look at the genetics of those different organisms’, but we don’t cover that content. Mastermind does play a role in helping understand the host response to, in this case, a viral infection. One of the most obvious things is the viral receptors on human cells, which are now widely known to be ACE2 and TMPRSS2. They are widely expressed throughout the body, which is probably an explanation for the manifestations of COVID infection that are not just respiratory. One thing that piqued my interest as soon as information was coming out about COVID, as I’m sure it did for any geneticist or physician for that matter, was the idea that there are some people who are asymptomatic, and so not properly patients but rather carriers. There are clearly risk factors associated with morbidity and mortality from infection, which isn’t surprising given compromises to underlying health, but the thing that was most intriguing was the idea that there can be asymptomatic carriers. My first thought was, what genetically sets those carriers apart? Based on what I described as the host viral receptors, ACE2 and TMPRSS2, it’s logical to think that there may be polymorphisms in those two human genes, whose proteins are the viral receptors, that confer a resistance to infection or diminish the consequences of that infection. Otherwise, outside of the direct virus-host interaction, any aspect of human immune biology is susceptible to being highly polymorphic across different individuals. In the case of COVID infection, it seems to be the case that the exuberant immune response is actually what leads to patient demise, and it’s quite reasonable to presume that there’s some genetic risk or polymorphic differential response in association with COVID infection.

As Steve and Lauren highlighted, and as I touched on in some of my answers here, Mastermind has a superabundance of this association data, not just for ACE2 and TMPRSS2 but for any other genes associated with the host viral response. Things are preliminary at this point, but we’re working in this vein with a number of the sequencing efforts that are trying to divine what may be responsible for this differential response. Very long-winded answer, but yeah, we’ve got a lot of stuff we’re doing.

Steve: I would add that, especially with the latest information on therapies and the therapeutic associations that we’re pulling from the literature, it’s now much easier to stay up to date with the absolute latest evidence for any given therapy. I think that’s another thing that will come to the top: in addition to looking at the genomic predispositions of patients or carriers and how they react to the virus, starting to look at therapies that could be effective as well.

Mark: ‘Known’ or ‘not yet known’, as Steve highlighted with his script: pulling out the stuff that’s known and asking questions about those things, but also discovering things that may not be on the radar of pharma researchers.