Genomenon Founder & Chief Science Officer, Dr. Mark Kiel, was a guest speaker at the 2019 Cutting Edge Drug Discovery & Development Symposium in Ann Arbor, Michigan on ‘Curating the Genome to Drive Drug Discovery’. Below are his slides and a transcript of his talk.

The goal of this presentation is to show how Genomenon is curating the human genome and to explain in more detail how our clients are using that information to drive drug discovery and inform clinical trial enrollment.

“Today’s talk will begin with explaining what the Mastermind Genomic Search Engine is, the data it comprises, and how it’s used to inform decision support for clinical labs. The lion’s share of the rest of the presentation will be describing what comprehensive genomic landscapes are, how they’re used to inform clinical trial design and enrollment strategies, as well as how they’re used to provide the empirical evidence necessary to expedite companion diagnostics regulatory submissions.”


“The problem that Genomenon seeks to solve is known as the Bioinformatic Bottleneck, and is particularly a problem as it pertains to interpreting genomic information. We can now sequence genome data very efficiently and very cost-effectively, and it has essentially become commoditized. What remains a challenge is interpreting the vast amount of information, both from the clinical perspective and from the drug discovery/research perspective. This challenge has been referred to as the ‘Million-Dollar Interpretation’.”

“This is a problem because extracting, organizing, and interpreting that genomic information, whether from actual sequence data for a patient or patient cohort or from otherwise available empirical information in databases and the medical literature, requires human intervention since we’re talking about computational methods to solve big data problems.”

“An example of failure that I’d be remiss in not mentioning that has cast a pall on bringing AI to bear in solving genomic problems is IBM Watson and their challenges in penetrating and meeting success with genomic markers. The project they undertook with MD Anderson didn’t bear fruit. In contradistinction, Genomenon is solving that problem not from a platform perspective with a black box approach to AI, but rather by taking a more granular level detailed approach to leveraging subject matter expertise, and focusing at the level of evidence and having that information aggregated as opposed to having it ‘rain down’ from above.”

“I’ll speak more specifically to Mastermind’s genomic database: It is an index of the medical literature/empirical information from decades of scientific research totaling 30 million titles and abstracts, over 7 million full text articles fully indexed with genetic and disease related content, the full spectrum of human diseases with more than 10,000 individual disease syndromes and their phenotypes. Mastermind includes the whole spectrum of the human genome; many tens of thousands of genes, all of their variants and genetic mutations to the tune of more than 5 million that are extracted from figures, tables, full text, and supplemental information from the empirical literature.”

“This is Mastermind’s search interface on the Mastermind home page. The way it’s used in clinical circles is through our cloud-based genomic search engine software, which can interrogate the empirical evidence at the disease, gene, variant, and/or phenotype levels. Mastermind users search this information by what we refer to as ‘tri-level’, where they’re looking for one variant at a time from the evidence to ask and answer whether that variant in their patient has been seen before. If so, where? How many times and in what context? And what can I do for diagnostic and treatment decisions?”
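The questions above (has this variant been seen before, where, how many times, and in what context) can be pictured as a lookup against an index keyed by gene and variant. What follows is a minimal editorial sketch of that idea, not Mastermind's implementation: the `VariantIndex` and `Mention` names, the fields, and every data value are hypothetical placeholders.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """One place where a variant was seen (all fields are illustrative)."""
    article_id: str  # placeholder identifier, not a real citation
    section: str     # e.g. "table", "figure", "full text"
    sentence: str    # surrounding text that supplies the context


class VariantIndex:
    """Toy in-memory index keyed by (gene, variant)."""

    def __init__(self):
        self._mentions = defaultdict(list)

    def add(self, gene, variant, mention):
        self._mentions[(gene.upper(), variant)].append(mention)

    def lookup(self, gene, variant):
        # .get() avoids creating an empty entry for an unseen variant.
        hits = self._mentions.get((gene.upper(), variant), [])
        return {
            "seen_before": bool(hits),
            "article_count": len({m.article_id for m in hits}),
            "mentions": list(hits),
        }


# Placeholder data, invented purely for illustration.
index = VariantIndex()
index.add("GENE1", "A123T", Mention("ARTICLE-1", "table", "placeholder context"))
index.add("GENE1", "A123T", Mention("ARTICLE-1", "full text", "placeholder context"))
index.add("GENE1", "A123T", Mention("ARTICLE-2", "figure", "placeholder context"))

result = index.lookup("gene1", "A123T")
unseen = index.lookup("GENE1", "G999W")
```

Here `result` answers "seen before?" and "how many times?" by counting distinct articles, while the `mentions` list carries the "where and in what context" part; `unseen` shows the shape of the answer for a variant with no evidence.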

“When you perform a search in Mastermind, the content displayed on the left exemplifies the variant landscape for any given gene, and the content on the right exemplifies the evidence landscape that results from that search. This example is fibrillin, which is mutated in a myriad of different ways because it’s a loss-of-function mechanism. What you can see here is all of the reference text from the medical literature where your variants, gene, drug, disease or any combination of those input parameters appear, and that information can be used to inform your clinical decision-making.”

“Now I’d like to move from the clinical ‘tree view’ and take it up a level to the ‘forest level’, particularly as it pertains to drug discovery. Mastermind assembles Comprehensive Genomic Landscapes, or datasets, and leverages that information and our computational curation process to auto-assemble and then manually curate that information.

Mastermind’s Genomic Landscapes inform precision medicine research and development, identify genomic biomarkers for clinical trial design, and provide empirical evidence for CDx regulatory submissions.”

“So what’s needed to assemble that information to make it useful to inform Precision Medicine studies? We subdivide the challenge into three different camps. This triad of information is what Mastermind is able to assemble to produce Comprehensive Genomic Landscapes, and together they provide a comprehensive understanding of genomic biomarkers:

Diagnostic: Clinical-grade industry-standard interpretation frameworks that Genomenon leverages to auto-organize and then manually review all of the information from the medical literature to say definitively that the information is useful for diagnostic purposes.

Function: Information from empirical studies in silico, in vivo, or in vitro allows us to understand what the functional consequences of some of these genetic variants are.

Treatment: Information both at the clinical level and when we’re talking about pharma, or drugs that are undergoing clinical trial investigation as well as investigational compounds – the precursors to these drugs that are being used in the clinic.”

“To clearly convey what a genomic landscape looks like, I’ll show two examples. One example is for constitutional disease: We were tasked with understanding the full landscape of genetic variation in the RET gene, which is a complicated disease mechanism. RET has a gain-of-function (GOF) and loss-of-function (LOF) mechanism in two different clinical scenarios.

Our challenge was to organize all of this evidence in the empirical literature into a readily accessible, digestible, evidence-based, comprehensive picture of all of the variants. This example is one of the deliverables where each row represents one of the variants in that gene.”

“There’s a preliminary call based on the assembled curated evidence that indicates the clinical saliency of each of those data points. The evidence is categorized and can be downloaded and intercalated into downstream pipelines as well as searched in-app so that you can answer questions on the fly. This slide shows the groundswell of evidence that we have expertly curated and assembled to back up our claims about each of those points of evidence for each of those variants.”

“The previous example was focused in the constitutional space – hereditary cancer or monogenic disease. Mastermind also has the capability of adjudicating disease-gene associations at the variant level for oncology.

This is an example of ATM, which operates in a loss-of-function mechanism. The particular challenge that we solved in this very large gene, for which there are many hundreds to thousands of variants, was to identify those variants that have cogent functional evidence indicating loss-of-function.”

“The end result was that more than five thousand variants were assembled out of many hundreds of thousands of articles that describe the ATM gene. Then we characterized them by using our combination of computational techniques and manual curation.”

“We’ve coined the term ‘Genomic Language Processing’ (GLP) to describe our ability to extract meaningful information from the corpus of information – the medical literature. We refer to it as Genomic Language Processing because we like to say that there’s nothing ‘natural’ about scientific language, so ‘off-the-shelf’ natural language processing (NLP) techniques, or those that are retrofitted, don’t suffice and often meet failure.

Our GLP approach encapsulates our bottom-up granular evidence-based expertise approach. Then we take all of that information, those named entities that we recognize in the complexity of the variant nomenclature, and our ability to disambiguate that information, and then we pass it downstream to our semantic machine learning capability to understand that information and how it informs disease, both from a clinical perspective and from a functional perspective.”
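To make the point about variant nomenclature concrete: the same protein change can be written several ways in the literature (three-letter or one-letter amino acid codes, with or without a `p.` prefix or parentheses), and all of them have to resolve to one entity. The sketch below is an editorial illustration of that disambiguation step only. It is not Genomenon's GLP method; the `normalize_variant` function and the regular expressions are assumptions made for this example, and it handles only simple single amino acid substitutions.

```python
import re

# Standard three-letter to one-letter amino acid codes; "Ter" marks a stop.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Ter": "*",
}

_AA3 = "|".join(THREE_TO_ONE)
_AA1 = "ACDEFGHIKLMNPQRSTVWY"

# e.g. "p.Cys634Arg", "Cys634Arg", "p.(Cys634Arg)"
_THREE_LETTER = re.compile(
    rf"(?:p\.)?\(?({_AA3})(\d+)({_AA3})\)?", re.IGNORECASE
)
# e.g. "C634R", "p.C634R"; "X" is an older notation for a stop codon.
_ONE_LETTER = re.compile(rf"(?:p\.)?([{_AA1}])(\d+)([{_AA1}*X])")


def normalize_variant(mention):
    """Return a canonical one-letter form such as 'C634R', or None."""
    text = mention.strip()
    m = _THREE_LETTER.fullmatch(text)
    if m:
        ref, pos, alt = m.groups()
        ref = THREE_TO_ONE[ref.capitalize()]
        alt = THREE_TO_ONE[alt.capitalize()]
        return f"{ref}{int(pos)}{alt}"
    m = _ONE_LETTER.fullmatch(text)
    if m:
        ref, pos, alt = m.groups()
        alt = "*" if alt == "X" else alt
        return f"{ref}{int(pos)}{alt}"
    return None


# Different spellings of the same change collapse to one key.
spellings = ["p.Cys634Arg", "Cys634Arg", "p.(Cys634Arg)", "C634R", "p.C634R"]
canonical = {normalize_variant(s) for s in spellings}
```

After this runs, `canonical` contains a single entry, which is what lets mentions from different articles be counted against the same variant. Real-world nomenclature (deletions, frameshifts, cDNA-level descriptions, legacy numbering) is far messier than this sketch covers.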

“Genomenon’s approach when we’re tasked with a new challenge is Genomic Landscapes, and what makes us unique is our 2-phase approach. The first step is to auto-assemble the aggregated and thoroughly annotated information using computational techniques. This is followed up with expert manual curation. This, coupled with computational intelligence, is how our unique offering has succeeded where others have failed.”

“What are the benefits of Comprehensive Genomic Landscapes? I like to say that we ‘bookend’ the drug discovery and development process. Early on in R&D, we help our clients understand the genetic underpinnings of their disease to inform hypothesis testing or target identification, as well as leverage any of their existing genomic information to layer on top with an orthogonal dataset from the empirical literature to help debulk their data and refine their candidate discovery approach.

The other end of the spectrum where it is useful is in the context of clinical trials. This allows a better understanding of who’s likely to benefit from your compound once you have a cogent understanding of the molecular course of disease, and similarly support FDA submissions and other regulatory documentation. This genetic information that’s thoroughly annotated and well-organized can expedite that FDA approval process.”

“A Mastermind case study that encapsulates all three of the value propositions that I talked about, across multiple different stages in the drug discovery paradigm, is Rhythm Pharmaceuticals. They had a compound in phase 3 clinical trials for monogenic obesity, a disease where children who are born with genetic defects have abnormal growth and weight gain. Prior to working with Genomenon, they understood the molecular landscape of their disease to the tune of having three genes in their purview and about a dozen variants that they knew they could diagnose patients with to inform treatment trials. When they approached us they said it took about two years to produce this data, and they knew this was a disease that’s hiding in plain sight. They knew that they needed to scale up their operations and improve their ability to enroll patients and encourage doctors to test patients.

In addition to gaining better background knowledge of the genes and variants that contribute to the development of monogenic forms of obesity, they wanted a larger dataset of genetic variants that cause these forms of monogenic obesity, so they could use those variants diagnostically to inform enrollment of patients in their clinical trials. Similarly, at the end, they were looking to have this evidence assembled and ready-made for their FDA regulatory submission.

Faced with a ‘big data’ challenge, we refined the offering and put a much finer point on which genetic variants in those 120 genes could actually stand the test of clinical certainty and be used to enroll patients – new kids who would benefit from this drug based on the evidence from the literature and databases that Genomenon had assembled.”

“The Vice President of Translational Research and Development at Rhythm, Alastair Garfield, said that our offering was comprehensive unlike any other, and that it saved them many, many man years of manual effort in a way they couldn’t have reproduced even if they had endeavored to. I think most importantly that it fundamentally changed their patient enrollment strategy and allowed them to provide benefit to more and more patients.”

“As I end today’s presentation, I want to repeat that Genomenon is curating the entire genome. Beyond the example I gave of the 120 genes that inform the development of monogenic obesity, we have also curated many hundreds of genes on a bespoke basis for a number of different clients. We have ambitious goals of curating every gene in the entire genome, beginning by fractionating the genes into different clinical subcategories: solid tumor genes (or solid tumor disease cancers), hereditary cancers, hereditary disease, and the full clinical exome, which comprises more than 2,000 different genes. Genomenon has the capability, with our automated and manual approach, to curate every one of those genes exhaustively in the course of a year – that’s our ambitious goal.”

Want to know more? Contact us to discuss your specific needs.
