Webinar: How to Leverage the Genomic Landscape to Accelerate Precision Medicine

In our latest webinar, our Founder & CSO Dr. Mark Kiel discussed the vital role that genomics plays in the pharmaceutical development process. 

Key takeaways:

  • Why a deep understanding of genomic evidence is critical at each stage of the drug development process
  • How genomic evidence can be used in each stage using real-world examples
  • How Mastermind Curated Genomic Datasets are being used to increase the success rate of drug candidates, broaden patient populations, and identify new disease indications

About the Webinar

Realizing the promise of Precision Medicine requires both a deep understanding of the landscape of genomic evidence and an understanding of the molecular drivers that influence all aspects of the drug development pipeline.

Advances in genomic sequencing technology provide incredible opportunities for the use of genetics to drive drug discovery, but the lack of organized empirical data and the challenge of manually interpreting and annotating genomic evidence has limited its enormous potential.

Using real-world examples, Dr. Kiel will demonstrate how Genomenon has overcome these obstacles and is being used by pharmaceutical companies to provide invaluable insight into the genomic drivers of disease.

Drawing from its exhaustive database of empirical evidence and a unique combination of AI and expert curation, Genomenon’s Mastermind® produces Comprehensive Genomic Landscapes to validate candidate targets, inform downstream research and discovery, and segregate patients for enrollment in clinical trials.

Dr. Kiel will also discuss how Genomenon is pursuing its vision of curating the entire human genome to advance the drug discovery process and maximize the efficiency of clinical diagnosis.


Do you have any plans to curate the entire genome?

We definitely do have plans to proceed with curation of the genome in stages, so at present we’re taking requests from pharma who have high-yield needs. This is our approach to reaching full curation of the entire genome: we’re going to take it indication by indication, where we’re going to first begin with high-yield gene targets that cause solid tumors, as well as hematopoietic tumors, then focus attention on hereditary cancer and work our way up toward hereditary disease, all the way up to and including the entire clinical exome. And as I alluded to, in the meantime when we have pharma engagement we’re able to pick and choose according to their needs which genes in that roadmap they’d like to see prioritized initially, and so this is just a reflection of our product development and company maturation strategy where we do expect to be able to curate the entire genome in some years’ time.

On average, how fast can you deliver one of these genomic landscape datasets?

That’s a great question. There is no such thing as average. Unfortunately, it really just depends not even on the number of genes but on the nature of the genes. Many of the genes that I highlighted here were very thoroughly studied for many years, and that introduced its own challenge of aggregating and organizing the data. Other challenges include genes that are less well studied, where the information is just hard to find, and the genomic landscape process that we undertake facilitates improvements in yield and efficiency for both of those aspects. And so I would say a typical genomic landscape can be produced within four to eight weeks, and by that I mean there’s maybe two to four fairly well studied genes in that pathway with many hundreds up to thousands of variants in each. There are other works that we can expedite if there are critical time-sensitive needs; we’ve curated entire gene landscapes in a matter of days, actually, and those were also for very well characterized genes with a thousand to two thousand variants apiece. At the other end is a longer project arc where we’re talking about many dozens of genes in the landscape that all coalesce around a given disease indication. That obviously takes much longer but is still measured in months, and definitely not over the course of a year or more.

Can these landscape datasets be integrated within existing datasets in my company and how easy is this?

That’s a great question. I’ll answer that question and then I’ll flip it on its head. All of the information that I showcased in the form of the deliverable is viewable in the user interface, which is called Mastermind Reporter. All of this evidence is downloadable as a CSV that can then be integrated into and used in your downstream activities. If you have more high-throughput needs, or the data that comprises your genomic landscape is larger, we can talk about other ways that we can take that information and place it in your workflow, but that information is definitely available to you for any of your clinical or R&D purposes once you’ve subscribed to the genomic landscapes. The flip side of that question would be taking your own data and integrating it into the genomic landscape itself. That too has been done with great success, where this Mastermind Reporter view of the genomic landscape becomes the central repository of all of the collected evidence, both what you have collected from your own empirical studies or previous investigations and the very full-bodied evidence from the Mastermind genomic landscape proper. So those data from your own internal work can also be added into the genomic landscape.
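As a rough illustration of folding a downloaded CSV into an existing dataset, the sketch below joins two small tables on gene and variant. The column names (`gene`, `variant`, `classification`, `internal_assay`) and the sample rows are hypothetical placeholders, not the actual export schema, which was not described in the talk.

```python
import csv
import io

# Hypothetical stand-in for a downloaded landscape CSV; real column names may differ.
LANDSCAPE_CSV = """gene,variant,classification
GENE1,p.A10T,pathogenic
GENE1,p.R20W,uncertain
"""

# Hypothetical internal dataset keyed on the same gene/variant pair.
INTERNAL_CSV = """gene,variant,internal_assay
GENE1,p.A10T,reduced activity
GENE1,p.G30S,normal activity
"""


def read_rows(text):
    """Parse CSV text into a dict keyed by (gene, variant)."""
    reader = csv.DictReader(io.StringIO(text))
    return {(row["gene"], row["variant"]): row for row in reader}


def merge_on_variant(landscape_text, internal_text):
    """Outer-join two CSV tables on (gene, variant); missing fields become ''."""
    landscape = read_rows(landscape_text)
    internal = read_rows(internal_text)
    merged = []
    for key in sorted(set(landscape) | set(internal)):
        merged.append({
            "gene": key[0],
            "variant": key[1],
            "classification": landscape.get(key, {}).get("classification", ""),
            "internal_assay": internal.get(key, {}).get("internal_assay", ""),
        })
    return merged


merged_rows = merge_on_variant(LANDSCAPE_CSV, INTERNAL_CSV)
for row in merged_rows:
    print(row)
```

An outer join is used here so that variants present in only one source are kept rather than silently dropped, which seems the safer default when the two datasets were curated independently.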

Are data pulls for specific variants able to also pull information on alternate variant nomenclature/legacy variant nomenclature that may not be known at the time of ask?

Depending on what genes you’re looking at, that can be a real thorny challenge, and I’ll say that in our genomic language processing (GLP) we have a great awareness of legacy changes, legacy nomenclature issues, that can really challenge the manual curation efforts that may or may not have been undertaken before we are engaged to produce a genomic landscape. Things like the nomenclature including or not including, say, a signal peptide, or there being a cryptic exon that was either included or not included, or otherwise whether the first methionine is counted or the variants are enumerated based on a gap or an amino acid residue at the beginning of the nomenclature. So those various changes in nomenclature, as well as more complicated colloquial nomenclature changes, are all addressed in the process both from a computational perspective and through that expert curation. When we perform these genomic landscape curations, our curation team develops a great understanding of the legacy issues attendant to each of these different genes.
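To make one of those legacy issues concrete, here is a minimal sketch of renumbering a protein substitution when an older convention left a signal peptide (or the first methionine) out of the residue count. The function name, the 25-residue offset, and the example variant are hypothetical, and this is not a description of how Mastermind handles the problem; real nomenclature reconciliation also has to deal with cryptic exons, transcript changes, and colloquial names that a fixed offset cannot capture.

```python
import re

# Matches simple protein substitutions such as "p.C609Y" or "C609Y".
SUBSTITUTION = re.compile(r"^(?:p\.)?([A-Z])(\d+)([A-Z])$")


def renumber_legacy_variant(variant, offset):
    """Shift the residue position of a legacy substitution by `offset`.

    `offset` is the number of residues the legacy numbering skipped,
    e.g. the length of a signal peptide left out of the old count.
    """
    match = SUBSTITUTION.match(variant)
    if match is None:
        raise ValueError(f"unsupported variant format: {variant!r}")
    ref, position, alt = match.groups()
    return f"p.{ref}{int(position) + offset}{alt}"


# Hypothetical example: legacy numbering omitted a 25-residue signal peptide.
print(renumber_legacy_variant("R100W", offset=25))  # p.R125W
```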

Can you speak a little bit to your approach to cohort analysis?

Sure! I don’t have any slides to showcase for that, so I’ll talk about it first by describing what the needs typically are, then what the hoped-for outcome is, and then how we have a unique strategy to meet those needs. With cohort analysis, there’s increasingly the recognition that genomic information is valuable, but there’s not as much awareness of how to leverage that information in many of the ways that I described. Particularly when you have a cohort of patients who are united by some disease or phenotype or drug response, the hope is that sequencing information will unlock the cause of either the unification of that cohort around a specific disease or the segregation of that cohort according to drug response. The goal of cohort analysis is to pull out a variant or a bouquet of variants that unite or segregate the patients within that population. And Mastermind’s approach is to understand that a single unique variant is very infrequently the sole cause of the unification, segregation, or parsing of that cohort. It’s actually usually on a spectrum, and that spectrum can be that there are variants in a gene, not a single variant but a number of variants, that cluster around a domain, say, and those are the variants that you’re trying to pull out to unlock the molecular cause for the cohort parsing at the biological level. Other times it’s a gene family. Still other times it may be a pathway of proteins that are interacting together to unite these patients who are behaving similarly within the cohort.

That’s the goal of cohort analysis, and our approach is to be sensitive to that heterogeneous presentation of the patients within the cohort and their molecular causation, and to leverage all of the evidence that I described in the form of the genomic landscape in answering those questions about the cohort and how it breaks down. All of the information in Mastermind that’s assembled and curated according to these genomic landscapes can be used to decorate the variants from within the cohort, all wrapped around an understanding of the prevalence of those variants within the cohort and how those variants interact with each other, and that evidence in aggregate can be used to unlock the biological meaningfulness of the segregation within the cohort. So again, the starting point is that you have sequencing data on a cohort and you want to understand the molecular cause of that collection of patients. Whether that cause is a single variant or a myriad of different variants that are all related to each other, Genomenon can unlock that answer by leveraging its unique approach to cohort analysis, where we layer on top the empirical evidence from these genomic landscapes. So that was a very long-winded way to say we can do that.
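The point that a cohort is rarely united by one variant can be sketched with a toy tally of carriers at the variant, domain, and gene level. Everything below (patient IDs, gene and domain names, variants) is made up for illustration, and the grouping logic is only an assumption about what such an analysis might start from, not Genomenon’s actual method.

```python
from collections import defaultdict

# Hypothetical cohort: patient ID -> list of (gene, domain, variant) observations.
COHORT = {
    "patient_1": [("GENE1", "kinase", "p.A10T")],
    "patient_2": [("GENE1", "kinase", "p.R20W")],
    "patient_3": [("GENE1", "kinase", "p.A10T"), ("GENE2", "binding", "p.G30S")],
    "patient_4": [("GENE2", "binding", "p.L40P")],
}


def carriers_by_level(cohort):
    """Count distinct patients carrying a variant at each level of grouping."""
    by_variant = defaultdict(set)
    by_domain = defaultdict(set)
    by_gene = defaultdict(set)
    for patient, observations in cohort.items():
        for gene, domain, variant in observations:
            by_variant[(gene, variant)].add(patient)
            by_domain[(gene, domain)].add(patient)
            by_gene[gene].add(patient)

    def count(groups):
        return {key: len(patients) for key, patients in groups.items()}

    return count(by_variant), count(by_domain), count(by_gene)


variant_counts, domain_counts, gene_counts = carriers_by_level(COHORT)
print("most common single variant:", max(variant_counts.values()), "patients")
print("GENE1 kinase domain:", domain_counts[("GENE1", "kinase")], "patients")
```

In this toy cohort no single variant is carried by more than two of the four patients, while the variants clustered in one domain account for three of them, which is the kind of pattern that only shows up when variants are grouped above the single-variant level.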

Which groups do you typically interact with? Are they translational, genomics, data science, bioinformatics or are they users or consumers of the datasets?

That’s a great question, and it depends on the structure of the organization of the client that we have an engagement with. We often will interact with the bioinformaticians when it comes to data sharing and discussions of technical aspects of the data. Sometimes those bioinformatics groups are the ones who actually derive the value, particularly when we’re talking about using the genomic landscape in the earlier phases of the drug pipeline, but we also quite frequently engage with the clinical teams. We have ways of effectively interacting with both and ensuring that the value that they are each seeking, or that they’re seeking together when they’re working in concert, is actually realized through the delivery of the genomic landscape, whether it’s through the user interface, which is the more typical way that our clinical users will interact with the data, or through the download, which is more typical of our bioinformaticians. They don’t want to mess around with a UI because they’ve got more high-throughput needs. So it just depends: we work with both groups, and sometimes both of the groups together in the form of one client.

Is PubMed your only source of information?

No. In producing Genomic Landscapes, Mastermind aggregates data from multiple disparate sources, including the titles and abstracts in PubMed. In addition to PubMed, it draws genetic insight from 7.5M full-text articles and 700K supplemental datasets, as well as population frequency data and predictive models of pathogenicity. External databases such as ClinVar are also included, as are internal databases provided by clients for their exclusive use.


Hello and welcome to a webinar. I am Kate Oesterle, Marketing Specialist for Genomenon, and I’ll be your host today. Mark Kiel will be presenting ‘How to Leverage the Genomic Landscape to Accelerate Precision Medicine’, and he has a lot of great information to share today, so I’ll get right to housekeeping and our introductions. As you’re watching, please feel free to submit questions through the chat window on the right, and Mark will answer the questions after the presentation.

Without further ado, I’ll introduce our speaker: Mark Kiel completed his MD, PhD, and molecular genetic pathology fellowship at the University of Michigan, where his research focused on stem cell biology, genomic profiling of hematopoietic malignancies, and clinical bioinformatics. He is the founder and CSO of Genomenon, where he supervises the scientific direction of the Mastermind suite of software tools. Mark, take it away.

Thanks, Kate, and welcome, attendees! As Kate mentioned, I’m going to be discussing today how pharma can leverage genomics for drug discovery, and I like to keep things pretty simple and easy to follow, so the outline of the talk will proceed in three phases. First, we’ll talk about why using genomics provides added benefit to the drug discovery process by focusing on those core benefits and applications of genomics. Then we’ll talk about how one goes about leveraging genomics to improve the drug discovery process overall, and obviously we will be focusing on leveraging Mastermind for that purpose. I’ll spend almost all of the time talking about what we call genomic landscapes, but we don’t have time in this webinar to talk about cohort analysis, so if there’s anybody interested in cohort analysis we’d be available during the Q&A or afterwards to discuss that topic. Then we’ll end, and I think most importantly, with some relevant use case examples from work that we’ve done internally or that we’ve done with some of our pharma partners that will really reify what the first two bullet points were talking about.

Let’s start out by talking about why genomics can provide benefit to the drug discovery process. This is the main takeaway from the talk, and it’s supported by industry insight and thought leaders in this space: genetics truly, materially improves success rates in drug development. This is from a slightly older review in Nature Genetics, but the assertion holds true increasingly to this day, and the main take-home message is that genetically supported targets can double the success rate of drug development. So that’s a really critical figure that I want you to keep in mind as we go through the talk.

Here are some of the targets that can be attacked using genomics. Those are diseases that are either complex or heterogeneous or otherwise ill-defined, where genomics can be helpful in finding both activating or gain-of-function mutations and loss-of-function mutations, both of which, depending on biological circumstance, can be targetable by precision therapies. It also helps across multiple related disease types by converging the pathogenic mechanisms across those different disease types into unified treatment strategies, and for obvious reasons oncology is one of the most impactful places where you can understand a mechanism in one oncology scenario and be able to leverage that understanding in another. And then lastly, conditions with genetic heterogeneity: genomics can be used to resolve that heterogeneity to better understand what different subtypes of disease at a molecular level may be driving patient presentation or response to therapy, or, conversely, genomics can uncover latent homogeneity in diseases that are otherwise not understood to have a unified mechanism of action.

This will be the substance of the benefits that we’re going to be talking about. I’ll be walking through most of these with the use case examples but I wanted to lay them out here pretty cleanly to say that genomics can empower pharma at all phases of the drug discovery pipeline from optimizing preclinical therapeutic targets, to reducing research and development costs and then maximizing the success of clinical trials either at their inception or on a rolling basis at their conclusion, as well as expediting regulatory approval processes and then at the post market or commercialization phases can help decrease time to market by promoting some of those activities.

This is the challenge and opportunity slide; I’m sure the audience is well aware of the thousand-dollar genome that’s by now almost old news. The sequencing costs have been driven down to the point where getting the data has become a commodity and is increasingly less and less expensive, but there’s been a relative stasis, or a plateau, in the decreasing cost to understand that data. And that’s been referred to as the bioinformatics bottleneck. So I’m drawing a distinction here between bioinformatics used to understand the biologic meaning of a nucleotide variant at the protein level, and actually understanding at the biological and clinical level what that means – what that means for disease development, what that means for different patient presentations, and increasingly what that means when physicians are trying to target those diseases by specific precision therapies predicated on the molecular lesions underpinning that disease development. And so in contrast to the thousand-dollar genome, which is relatively straightforward in producing the data, we have this million-dollar interpretation, where by that I mean the challenge is really understanding what the data means and what we can do about it.

To put a finer point on what that means, we can break that understanding down into three main buckets:

The first one is diagnosing patients, and for that we require clinical-grade, industry-standard variant interpretation guidelines like the ACMG (American College of Medical Genetics and Genomics) or the AMP (Association for Molecular Pathology) or related guidelines for rare disease and cancer, respectively. So that’s diagnosing patients with all of this genetic or genomic evidence and information. Another is to define the function of those variants, which is helpful particularly in the early phases of drug development in better allowing pharma researchers to characterize functional and thereby actionable genetic variants, to understand how changes at the nucleotide level can confer defects at the protein level that can then be targeted. That leads to the third bullet point, which is treatment. What’s needed for precision medicine studies is the documentary evidence that supports a physician’s decision to provide this therapy versus that therapy, either in the context of clinical trials or in the context of routine clinical practice.

Why is that a problem? This is a summary slide that exemplifies the fact that extracting and interpreting genetic information still very much requires a lot of human intervention. These people who are assessing this information, interpreting this information are highly skilled and they’re few in number because of the newness of this bioinformatic and particularly bio interpretation discipline. So the real challenge to scale is that unlocking the latent potential of genomic information requires a great deal of manual effort by expert human curators.

The solution that Genomenon has brought to bear in the market is the Mastermind genomic database, and there are a couple of ways that Genomenon realizes the value in Mastermind. Many of you are familiar with the search engine, the user interface that’s used in the clinic to search for one variant at a time, but we have more high-throughput ways of distributing our data and realizing value from genomic information, either through APIs or, as will be the focus of this webinar, in the form of genomic landscapes. So what comprises the Mastermind genomic database? On the left side are the three critical components of information required to understand the meaningfulness, both biological and clinical, of a variant. That is to say the population frequency information, how rare or common a variant may be, as well as in silico predictive models of damage to protein structure or function. Those two points of evidence – population and prediction – are infrequently sufficient on their own to pass final judgment, promoting a variant from being of unknown or uncertain significance to pathogenic or benign. The third, and what I like to say is the apex of that triad of evidence, comes from empirical literature: empirical studies in the form of publications that have actually done the testing variant by variant to understand where those variants are seen in patient populations or pedigrees, as well as what the in silico, in vitro, or in vivo studies were that define the function of those variants at the protein level.

And that’s really a core value of Mastermind: aggregating those first two together with the harder-to-secure information from the publications. What Mastermind does with all that information is index, organize and annotate it around all genes and all variants in the entire human genome and draw associations between those genetic elements and every disease or phenotype in the whole spectrum of human health disorders, as well as bringing to bear some of those drug associations that are relevant in this circumstance or that circumstance across the whole pharmacopoeia of all of medicine. So both for cancer and constitutional diseases, genes, variants, those phenotypes and diseases, as well as the drugs that treat them, are all aggregated, annotated and organized within the Mastermind genomic database. Just a couple of numbers below to indicate that our database is the most comprehensive available. We have at present 7.5 million fully indexed full-text genomic articles from which our indexing process extracts all of this meaningful information, and the fruits that that bears, at least at a high level, are 5.7 million unique genetic variants that influence any one of the genes in the human genome. Those are all in the Mastermind database as we speak.

How do we unlock that potential that’s latent within the Mastermind genomic database, which clinical users of the user interface look at one variant at a time? How do we in bulk answer questions that are driving some of the pharma R&D activities? As I mentioned before, I’ll focus most of my attention on the concept of genomic landscapes and leave the cohort analysis aspects of Genomenon’s activity for a later date.

But just like I had mentioned, there are three ways to conceive of Genomenon’s value proposition: the user interface for looking at one variant at a time; the genomic landscapes for looking at the entire set of variants for any given gene, gene family, or gene pathway, or any and all genes that influence any indication or disease, where any one of those comprises a single genomic landscape; and then the cohort analysis, which would be understanding at a population level what your population sequencing information means in the way of recognizing novel mechanisms of disease.

This is the user interface. It’s a very clean search engine that allows you to search by gene and variant to see where that variant has been published, if at all, and if it has been published, in association with what disease or diseases and with what drug affinities. It then shows you that evidence, what the papers are, including a context for where those variants are mentioned, so you can make the decision very quickly – is this a relevant paper to my clinical question or not?

Just to emphasize, this is a very comprehensive search engine. Here’s one gene that operates by loss of function to cause disease. This is a reflection of all of the myriad different variants in that gene on the left that Mastermind has indexed from many millions of full-text articles and supplemental datasets. And then on the right we show you that information at a granular level to convince you why that information may be relevant to your clinical concerns.

When we’re operating in bulk mode, when we’re trying to pull down and annotate and understand that data in bulk, the upper left here talks about our data content: 30 million titles and abstracts from PubMed indexed on a nightly basis; the seven and a half million full-text genomic articles, including figures, tables, discussion and results sections, etc.; and accompanying supplemental datasets from seven hundred thousand of those highly prioritized articles. What we do with that information involves our proprietary genomic language processing, or GLP, which looks for every disease, phenotype, gene, variant and drug, and then we take that information and pass it downstream into our machine learning or computational intelligence algorithms that organize, annotate and extract the meaningful information for clinical purposes as well as for functional pharma R&D purposes.

Then there’s a little bit of a preview into what a genomic landscape looks like but I’ll emphasize here that we’re not just computational intelligence but we have what I think is a unique two-phased approach to understanding the genomic information at our disposal. The first involves the machine learning and computational intelligence that I discussed in automatically organizing that information and then it’s followed up in very high throughput with our internal workflows by manual expert curation according to those industry standard guidelines, and so that is unique in the industry to have such a robust computational intelligence as well as expert curation invoked in assembling these landscapes.

This is what we’re talking about when we say a genomic landscape. This is an example for the RET gene, which many of you know has both somatic disease pathogenesis as well as heritable mechanisms of pathogenesis, so both loss of function and gain of function, and the challenge here was in uncovering all of the pathogenic activating germline variants that drive the cancers that RET is associated with. And so this is the way that we show our clients what that information looks like. Every row is a variant in the RET gene with provisional ACMG calls as well as the full reference citation set for that variant so you can understand how well supported in the empirical literature that variant is and then critically, the categorized and enumerated evidence that underpins those provisional calls.

When you click on any one of those variants you can see the extent to which we’ve exhaustively annotated that information from those many hundreds or sometimes thousands of articles that talk about that gene all categorized according to the ACMG industry-standard criteria.

This is not just true of constitutional disease but also of somatic disease, particularly in oncology where the focus is less so on say segregation patterns or population distributions but more on the functional aspects of the variants in the gene. This is an example for one such variant where the information was then organized in different tiers according to the AMP criteria for loss of function, complete, partial, minimal or otherwise gain-of-function in those rare circumstances that this gene has gain-of-function mechanisms, and then again the evidence is then organized but then thoroughly expounded on with reference citations for each assertion.

To top-line that, that’s what you see, what you get in the form of a deliverable, but what that’s useful for is to drive research and discovery: drug discovery proper, by identifying the genomic candidates that may be targetable and linking that information with the actionable genomic real-world evidence at your disposal; drug repurposing, by understanding, when you have a pathway that’s targetable by a compound, what other diseases may also be targetable because that same pathway is abrogated in the new disease; and, on the other side of the drug discovery pipeline, informing the conduct, efficiency and likely success of clinical trials.

So now that we’ve talked about how to proceed let’s conclude with a number of representative case studies and get into the ‘what’.

What exactly are we talking about? How can I illuminate for the audience the impact of these genomic landscapes and what their data content provides in the drug discovery pipeline?

This is a schematic of that drug discovery pipeline, again from preclinical work and discovery all the way to the clinical phase, as well as the application in the clinic and any post-market activities that are undertaken by pharma. Below the chevrons here, which itemize each of those phases, we’re talking about the benefit – the value proposition of leveraging these genomic landscapes – and I’m going to focus our attention on these four in the middle: new target assessment; divining the mechanism of action for a particular gene as it causes development of a specific disease; the candidacy of variants, and therefore patients, in the conduct of clinical trials; and regulatory submission support in the form of packaged evidence to justify your companion diagnostic decisions.

There are four of these, so let’s talk about each one in some detail, beginning with new target assessment, and the example gene here is LRRK2. The benefit of seeking out new targets and providing an assessment of those targets is that it helps pharma understand the core molecular basis of disease in a very organized but also detailed fashion, all informed by the evidence from the empirical literature. It allows pharma to identify new pathways in otherwise very complex diseases. It also provides a molecular starting point for a targeted therapy where one doesn’t yet exist in certain circumstances, and then it affords discovery of new biomarkers and disease populations, whether they be disease-causing, response-modifying, or useful for response monitoring.

This particular work involved uncovering actionable variants in the LRRK2 gene in the context of neurodegenerative disease. And just to orient you to the plot here, in the top figure what we have is the case that existed before Genomenon was consulted for this work, and that comprised 22 variants that were known or presumed to be informing development of Parkinson’s disease. The group that we were working with was well aware that that was only scratching the surface, and our task was to uncover the full complement of all genetic variants in the LRRK2 gene that cause Parkinson’s. We went from 22 at the top in gray to 138 pathogenic variants at the bottom, again according to industry standards, and they’re arrayed there along the linear axis of the protein so you can start to see what functionality of the LRRK2 gene those variants may be influencing. I don’t want to just stop there and talk about that six-fold increase in known pathogenic driver variants that are useful for better understanding the disease mechanism of this new target; I also want to underscore that there are fully 391 variants of uncertain significance, where the evidence wasn’t sufficient for them to be deemed benign, but they have this sort of latent potential for turning into likely pathogenic or even pathogenic variants should a new study demonstrate their negative effect on the protein. So this genomic evidence is a living dataset, and it keeps getting updated on a weekly basis at an ever-increasing pace. At present we’ve identified six-fold more variants, which are useful for, say, understanding disease or recruiting patients, but that is not the end – there is still a great deal more potential that may be unlocked in the future as Mastermind keeps abreast of the newly published literature. The takeaway from this slide is that Genomenon can definitely and substantially increase the number of candidate variants.
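For readers who want to check the arithmetic behind the six-fold figure, a quick calculation using only the three counts quoted in the talk is shown below. The variable names are made up for illustration.

```python
known_before = 22        # variants known or presumed before the landscape (from the talk)
pathogenic_after = 138   # pathogenic variants after curation (from the talk)
uncertain = 391          # variants of uncertain significance (from the talk)

fold_increase = pathogenic_after / known_before
newly_identified = pathogenic_after - known_before

print(f"fold increase in pathogenic variants: {fold_increase:.1f}")
print(f"newly identified pathogenic variants: {newly_identified}")
print(f"variants of uncertain significance still in play: {uncertain}")
```

The ratio works out to roughly 6.3, consistent with the six-fold increase described.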

But more fully, especially when we’re talking about candidate target assessment, let’s understand what these variants are doing. This is another plot, where we’re now breaking down the granular evidence that was reviewed and curated at the paper level to understand what aspects of this gene’s function are being impaired by each of those variants. Again, the pre-existing state when we were contacted for this work was that there was little or no insight into the functional consequence of each of these variants. In our two-phased approach of organizing the data with machine learning and related algorithms, followed by expert curation, we uncovered 96 variants that had demonstrated functional evidence in the literature, and all of those variants were annotated succinctly with that insight in the final deliverable. What you’re seeing here is a clear clustering of functional variants within activation domains of the LRRK2 gene, which provides novel insight into protein effects at the hands of these different variants, and thereby into the protein’s targetability in the context of this disease. The takeaway from this slide is that Genomenon can reveal novel functional insight into a new target candidate protein.

The next case study will be in divining a mechanism of action for a gene or for its component variants.

The benefit of insight into mechanism of action is that it allows you to focus on high-yield candidates, because you better understand what the mechanism of disease causation is. Focusing on high-yield candidates in turn allows you to decrease the failure rate by preventing you from going off on a tangent and chasing leads that were never destined to be successful. That harkens back to the take-home message that I talked about, where solid genetic evidence of disease causation at the hands of any one of these variants dramatically increases the likely success of that targeted therapy. It also saves on opportunity costs, because you can divert some of that otherwise wasted energy into other aspects of that drug pipeline or into other pipelines for different indications. The real sort of meta takeaway here is that it allows you to out-compete your competitors by being more efficient in your product development.

This was work that we did looking at the ATM gene, which has known loss-of-function mechanisms, as it’s a well-characterized tumour suppressor. The real challenge here was in understanding what missense variants were doing. With nonsense variants that lead to protein truncation, or frameshift variants that do the same, it’s pretty straightforward to understand how loss-of-function mechanisms operate at the hands of those variants. The consequence of a missense variant on the protein is obviously much more challenging to define a priori, and so we were tasked with uncovering the evidence for all of the missense variants that have been discussed in the context of ATM and organizing all of that empirical evidence for each of those variants. The starting point was that there was little functional insight into the missense variants that actually cause disease, and what we determined in this gene was that there were some 280 variants with documented functional evidence of protein consequence, from which we can infer disease causation. The takeaway here is that Genomenon can uncover evidence to help determine mechanism of action, even when that process of discovery requires very granular attention to detail and very manual curation activities.

The third example that I want to talk about is in promoting the clinical-trial candidacy of given genes or their attendant variants and the example protein that I’m going to focus on is this LEPR or leptin receptor protein.

Why would pharma care about improving the candidacy of variants for their clinical trials? Increasingly, genomic markers are being used as inclusion or exclusion criteria. It’s also increasingly common to use molecular companion diagnostics for enrolling patients into your clinical trials, and this ensures a more homogeneous patient cohort. Once you’ve got a handle on the mechanism of action for the genes that are mutated to cause the disease, which is the indication for collecting your patient cohort, having a more homogeneous cohort allows you to increase your drug response rate, because you’ll have fewer patients who don’t have a mechanism of action that’s amenable to treatment by your compound. In this particular example of the LEPR gene, the challenge was to uncover the full, comprehensive variant landscape in this gene as it leads to monogenic obesity. What you can see here in this plot are all of the variants that were uncovered in the process of producing this genomic landscape for this gene: the pathogenic and likely pathogenic variants, shown in red as they’re distributed across the protein, alongside those variants that may someday be promoted to likely pathogenic but are currently of uncertain significance. This is just the LEPR gene, but the assignment that we undertook in this case was actually much larger. The group that we were working with started out with three genes of focus and twelve variants that they had assessed to be appropriate as inclusion criteria for their clinical trials. Working with Genomenon in stages, our efforts allowed them to increase their enrollment criteria from three genes and 12 variants to 48 genes and more than 600 variants that have, again, documentary evidence according to clinical standards to support the idea that if this variant is seen in this patient, there’s justification for enrollment into the clinical trial and treatment with the pharma company’s compound.
This resulted in a many-dozens-fold increase in the clinical trial enrollment potential on a per-variant basis, can obviously very dramatically increase patient enrollment, and has significantly changed the way this pharmaceutical company is conducting its clinical trials.

The last case that I’ll spend a significant amount of time on is in the context of buttressing or supporting regulatory submissions to the FDA and other regulatory bodies, and the specific example that I’ll highlight here is the RET gene, which I discussed a bit earlier when introducing what the deliverable looked like. Why is it important to have support for your regulatory submissions and applications? Having genomic landscape data can establish objectivity, in the form of that empirical genetic evidence, for any candidate that you’re seeking to include in your submission. It can also provide the full complement of that data, so it’s not just that you can say you have evidence, but that the evidence is already documented and presented to you in a very tight and ready-to-use package. A sort of understood but important-to-point-out benefit is that if you proactively strengthen an initial regulatory submission with best-in-class evidence supporting the candidacy of the variants in your companion diagnostic, it can cut precious months or even years out of the regulatory approval process.

So again, to orient you to the plot, the top in gray is what was known when we were engaged, and the bottom is what resulted after assembling the genomic landscape for the RET gene. Those variants are arrayed along the linear axis of the protein, and these are now just the pathogenic or likely pathogenic variants, which are all activating because that is the target of this compound. (I realize there’s a typo here; ignore the Parkinson’s variants. This is in the context of hereditary cancer.) At the beginning of the project, when we were contacted, there were 38 activating variants for which the evidence was not thoroughly organized or detailed, and after our engagement we returned with a full list of 130 activating variants that were all thoroughly documented and had evidence to justify their candidacy and inclusion on this companion diagnostic. That’s a three-and-a-half-fold increase in the number of variants that can promote increased enrollment of patients into clinical trials, and when we’re talking about the full maturation of this drug pipeline in the form of use in routine clinical practice, that expands the market size for this compound.
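As a quick check on the fold increases quoted across the case studies, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the counts are the ones stated in the talk, while the function and dictionary names are hypothetical, and the trial-variant figure uses 600 as a lower bound because the talk says "more than 600".

```python
# Fold increase = count after curation / count before curation.
# Counts come from the talk; all names here are illustrative assumptions
# and are not part of any Genomenon product or API.

def fold_increase(before: int, after: int) -> float:
    """Return how many times larger `after` is than `before`."""
    if before <= 0:
        raise ValueError("`before` must be a positive count")
    return after / before

# (before, after) pairs as stated in the case studies.
CASES = {
    "LRRK2 pathogenic variants": (22, 138),
    "RET activating variants": (38, 130),
    "Trial genes": (3, 48),
    "Trial variants": (12, 600),  # lower bound: talk says "more than 600"
}

def summarize(cases):
    """Map each case name to its fold increase, rounded to one decimal."""
    return {name: round(fold_increase(b, a), 1) for name, (b, a) in cases.items()}

if __name__ == "__main__":
    for name, fold in summarize(CASES).items():
        print(f"{name}: {fold}-fold")
```

The ratios come out at roughly 6.3 for LRRK2 and 3.4 for RET, consistent with the "six-fold" and "three-and-a-half-fold" figures, and at least 50 for the trial variants, consistent with "many dozens fold".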

I’ll just remind you what we talked about at the beginning: what information comprises these genomic landscapes, in the form of the actual deliverable that is returned. We have summary information, much like I showed you but obviously with much more detail and many more specifics. That summary information allows you to walk away with the key take-home messages from the investigation that we perform to produce these genomic landscapes, but you’ll also have all of the evidence necessary to roll up your sleeves and dig into that information. I say here that the evidence is tiered. The first benefit that you receive from working with us to produce a genomic landscape is the full list of variants, which as I showcased is typically three- to many-dozens-fold larger than what was known at the outset of the project; that entire list of all variants in a gene is provided to you in the genomic landscape. Then there are the provisional calls for pathogenicity in the context of ACMG criteria, or the tiered guidelines in the context of AMP and oncology, as well as any functional insights that come out of the curation work, whether they were specifically requested in a custom curation or uncovered internally as we went through each of these curations exhaustively. So you have the entire variant landscape enumerated, and you have each one of those variants with a provisional call based on industry-standard guidelines. You have categorizations of evidence, whether it’s segregation studies, case studies, cohort analyses, or larger-scale population studies, and functional studies, whether they be in silico, in vitro, or in vivo. Those functional studies are actually broken down according to which aspects of the protein’s function those variants are impairing.
That evidence is all categorized and can be searched, sorted, and filtered, and for each of those categories you have detailed information from the empirical literature that supports it. You can see some of that here; it comes in the form of direct citations from each of those references, which are themselves enumerated in a bibliography to justify the strength of the evidence from the empirical literature for each one of those variants. So: a list of variants, provisional calls, categorization of evidence, more detailed information in the form of direct quotes from each of these references, and then a full bibliographic reflection of each reference that talks about the variant. That’s the tier of value that is provided by these genomic landscapes, and it can be used by our pharma clients in all four of those ways, and in ways that I didn’t describe for want of time, depending on their needs and their stage of progress through the pipeline.
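To make the tiers concrete, here is a minimal sketch of how such a deliverable might be modeled and filtered. It is an assumption for illustration only: the class names, field names, category labels, and placeholder records are hypothetical and do not reflect Genomenon’s actual data format or any real curated result.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical model of the tiers described in the talk: variant list,
# provisional call, categorized evidence, direct quotes, and bibliography.
# Nothing here mirrors a real Genomenon schema.

@dataclass
class Evidence:
    category: str      # e.g. "segregation", "case", "cohort", "population", "functional"
    quote: str         # direct quote from the supporting reference
    reference_id: str  # key into the bibliography

@dataclass
class VariantRecord:
    gene: str
    variant: str
    provisional_call: str  # e.g. "pathogenic", "likely pathogenic", "uncertain"
    evidence: List[Evidence] = field(default_factory=list)

def filter_records(records: List[VariantRecord],
                   call: Optional[str] = None,
                   category: Optional[str] = None) -> List[VariantRecord]:
    """Keep records matching the call and having evidence in the category."""
    kept = []
    for rec in records:
        if call is not None and rec.provisional_call != call:
            continue
        if category is not None and not any(e.category == category for e in rec.evidence):
            continue
        kept.append(rec)
    return kept

# Placeholder data only; these are not real genes, variants, or citations.
BIBLIOGRAPHY: Dict[str, str] = {"ref1": "Placeholder citation 1",
                                "ref2": "Placeholder citation 2"}
RECORDS = [
    VariantRecord("GENE1", "variant_A", "pathogenic",
                  [Evidence("functional", "placeholder quote", "ref1")]),
    VariantRecord("GENE1", "variant_B", "uncertain",
                  [Evidence("case", "placeholder quote", "ref2")]),
]

if __name__ == "__main__":
    for rec in filter_records(RECORDS, call="pathogenic", category="functional"):
        refs = [BIBLIOGRAPHY[e.reference_id] for e in rec.evidence]
        print(rec.gene, rec.variant, rec.provisional_call, refs)
```

Keeping each quote tied to a bibliography key is one way to preserve the link from a provisional call back to its supporting literature, which is the property the talk emphasizes.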

I ended a little early because I’m trying to stay focused on the core value proposition for the genomic landscapes; there are other use cases that I didn’t allude to. I’ll end by saying that we have great ambitions at Genomenon and that we’re working on unlocking a lot of the genomic potential for multiple genes, gene sets, and drug pathways. If anybody is interested in engaging with us and talking about your specific needs, we’d be happy to have a conversation.