In this webinar, Dr. Mark Kiel used real-world examples to demonstrate how a Comprehensive Genomic Landscape for a disease, pathway or gene set has empowered Pharmaceutical researchers and translational teams to understand genetic and rare diseases at the molecular level.
- How a Comprehensive Genomic Landscape delivered a 6-fold increase in identifying pathogenic drivers in just one Parkinson’s gene.
- How Genomic Landscapes have been used to segregate clinical trials using a comprehensive list of genes and pathogenic variants as genomic biomarkers.
- How to accelerate the cumbersome process of identifying genomic biomarkers for Companion Diagnostic (CDx) development, backed up with clinical evidence from the scientific literature.
About the Webinar
Drug targets with human genetic evidence of disease association are twice as likely to lead to approval (King et al. 2019). But navigating the millions of genetic data points to comprehensively identify genomic drivers of a target indication or drug pathway is daunting.
Understanding the molecular drivers of disease accelerates drug development at each stage of the process. It…
- Informs downstream research and discovery,
- Guides biomarker selection for clinical trial segregation criteria, and
- Provides documented evidence for CDx validation.
There is a proven process to assemble essential genomic insight into Neurodegenerative and other rare and inherited diseases to drive better target selection and biomarker identification.
For missense mutations, how are these being reported out in the clinical report – e.g., TARDBP?
There’s a couple of ways I can imagine that question being answered. A technical way to answer that would be in the variant nomenclature – given that this is a highly technical process up front, the variant nomenclature can be modified to fit any downstream needs that the client that we’re working with has so that the data that we produce doesn’t need to be transformed in any way but can be immediately actionable and intercalated into the downstream workflow.
So, that’s one way. The standard way that we produce the variant nomenclature is according to industry-standard HGVS nomenclature, but we’re flexible and able to modify the nomenclature as befits any given solution. The other way to answer that question would be: how do you deal with missense variants in particular, which by their nature don’t necessarily lend themselves to a clear understanding of what they’re doing to the protein? And so I’ll fall back on my slide where I discussed the production of Comprehensive Genomic Landscapes: we use the ACMG and AMP criteria and assemble the clinical and functional evidence necessary to properly interpret the pathogenicity of variants, whether they’re nonsense, frameshift or deletion mutations, point mutations that lead to protein coding consequence such as missense variants, or otherwise noncoding variants such as splice or even intronic variants. We follow the industry standard approach to interpreting all of those different types of variants.
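As a rough illustration of what adapting variant nomenclature for a downstream workflow could look like, the sketch below converts a protein-level missense description from three-letter HGVS style into a one-letter shorthand. The helper name `to_short_form` and the example strings are illustrative assumptions, not Genomenon's tooling, and the pattern deliberately covers only simple missense substitutions.

```python
import re

# Standard three-letter to one-letter amino acid codes.
AA3_TO_1 = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Matches simple protein-level substitutions such as "p.Gly2019Ser".
MISSENSE = re.compile(r"^p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})$")


def to_short_form(hgvs_p):
    """Hypothetical helper: 'p.Gly2019Ser' -> 'G2019S'.

    Returns None for anything that is not a simple missense substitution
    between two of the twenty standard amino acids.
    """
    match = MISSENSE.match(hgvs_p)
    if match is None:
        return None
    ref, pos, alt = match.groups()
    if ref not in AA3_TO_1 or alt not in AA3_TO_1:
        return None
    return f"{AA3_TO_1[ref]}{pos}{AA3_TO_1[alt]}"


print(to_short_form("p.Gly2019Ser"))
```

Nonsense, frameshift, splice and intronic descriptions fall outside this pattern and come back as `None`, so a real pipeline would need a fuller parser for those variant types.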
Can you suggest genetic biomarkers (also including mRNA) based on drug used and target indication?
This is a great question and allows me to clarify two things. I’ve been focusing here on the genetic variants for protein coding consequence, but in the armamentarium of Genomenon to produce these data sets we can take a step back and assess information at the gene level. That necessitates looking at the different mechanisms of disease causation at the hand of the different genes, and that includes differential expression. So if you’re talking about differential expression of different genes and how that differential expression can lead to disease, that is something that can be produced in a Comprehensive Genomic Landscape at a gene level, yes.
The other thing to say to build on that is where I answered the first question by saying that we follow industry guidelines for interpretation of variants, ACMG or AMP, as well as genes. In the case of constitutional disease that would be ClinGen guidelines as a standard framework.
Nevertheless, we layer on top of that curation and annotation any custom insight that the client is particularly interested in. So I didn’t go into too much detail, but hopefully in those examples that I showcased there were particular points of interest for the clients that they were specifically looking for based on their previous research and their preexisting knowledge of the gene and the disease. Things such as the clinical frameworks used to assess severity of disease – that is a curation and annotation activity that we can provide in the production of genomic landscapes.
Another example would be assembling a patient-specific laboratory value database if the disease is such that there are clinical laboratory biomarkers that allow you to track the level of severity of the disease. We’ve had engagements where we’ve produced such a database by exhaustively curating all such case studies and case series that report out on those studies.
A final example to really drive home the fact that we can do custom curation is exhaustively reviewing all of the literature for a given disease to pull out prevalence studies, as different variants and different genes are found at different prevalence in different subpopulations of disease based on ethnicity or disease subtype. In conversation with a client, it really depends on what the focus is, and what the nature of the disease, the gene, or the gene pathway is, as to what custom annotations we layer on top of those standard frameworks.
What about new papers? How can we be sure this data is up to date?
That is a great question, particularly in the context of neurodegenerative disease, where research is expanding exponentially. I mentioned the commoditization of genomic sequencing and how much more straightforward it is to perform these sequencing studies on a large scale. Research is not slowing, it’s only getting faster, and all of that information becomes less and less tractable to extract and understand manually, and we appreciate that. With all of these genomic landscapes that I’ve described, the work is evergreen and continually updated, both for previously curated variants for which there are new papers and for any new variants that weren’t previously described and are newly published in the papers that come out from quarter to quarter. The data in these genomic landscapes is kept up to date and the curation component is preserved across those updates so that you’re assured a sensitive, comprehensive and highly valuable up-to-date data set.
In the example of ATP7B, how many of the 145 variants with functional evidence were in the 248 ClinVar reported variants?
The short answer is I don’t know, I don’t remember that exactly. I do know though that we did look at that quite specifically, I just don’t know the result. I will say most of the ClinVar variants do not have reference citations, and even if they do, they do not have a characterization of those functional studies. So I don’t want the audience members to perseverate just on the number of variants that are accumulated, which is uniformly more, and in most cases much more, than what is available in ClinVar: Genomenon’s Comprehensive Genomic Landscapes can double or multiply by ten the number of variants that are seen in ClinVar. Beyond that sheer number there’s exhaustive annotation of all of those variants as well, and clarity, which is not provided by any of the references or any of the information in ClinVar at all.
What do you think of the cases with not much known genetics related to diseases? Do you think your platform or genetic biomarkers would still help target ID and patient stratification for such cases?
That is a great question. I’ll say that Genomenon’s data is at heart an association database. A lot of the focus of my talk, and certainly of our software in the hands of clinical users, is on variant information for known disease entities and clarity on published evidence for a given variant to make clinical grade calls or support some of the activities in Pharma that I described. But a necessary preamble to assembling any of that data in Mastermind is an understanding of all of these associations at multiple levels. To answer the question directly: if it exists in the literature directly or indirectly, no matter how rare, if there’s one reference that talks about this association, we have it, we understand it, and we have tools that can automate the assembly of that information, followed up by efficient and custom targeted review of that information.
As a specific example, one group that we’re working with doesn’t know how their drug works. They know that it works and they know that it’s extremely safe, but they don’t know how it works, and they don’t know because it’s not directly published in a single reference. The work that we’re engaged in with them is in unweaving that understanding from disparate publications that have information that touches on what the mechanism may be, but where the specific action of their drug will only become clear with a comprehensive view of all of those pieces of evidence brought together and assessed through the production of a Comprehensive Genomic Landscape. So that’s a fairly long way to say, ‘yes, we have that information’.
What about looking at gene x gene interactions?
The gene level information that we have comes with gene/gene associations, as I mentioned Mastermind being an association database. Gene/gene networks at the protein level, the protein interactome, the pathway participants if they’re not directly interacting with each other but participate in a common pathway, how proteins may affect the expression of other genes: all of that information is latent in our association database and can be uncovered simply by asking the question in conversation with a client and, as I said, patterning to look for things to uncover when we organize and manually curate all of that information.
What does the data look like? How is it delivered?
That’s a great question, and I have a slide that I was loath to put in the main presentation because it’s pretty busy, but I want to clarify that this data is fully downloadable and integrable into your systems downstream, and the format is modifiable so that you can fit any one of those downstream use cases. This is an example of one of the proteins that I talked about for Parkinson’s, the GBA gene. Each row represents one of those dots, or one of those variants, and each column is a specific datum or piece of information, where I want to emphasize we have a very tiered approach to presenting this evidence. One of those columns that you see in the middle there is the provisional call, so it’s the top-line information from our curation so you can start to sort at a high level what this information means. That is the provisional ACMG call, where you can see here they’re pathogenic variants. Then the next columns over are the numbers of articles that we investigated, technically and manually, to come to that conclusion, in many cases many hundreds of papers. A further level of summarization of the evidence is which of those variants have functional studies, in this case for ACMG the PS3 category of evidence, and which of those variants have clinical grade evidence; one example would be PS4, or strong evidence of pathogenicity because that variant segregates with disease in a number of different patients, as well as the population frequency data and the in silico predictive models.
A further tier of evidence is if you want to drill down into the specific information from any one of those assertions, categorized and then top-lined: you can see the reference from which those assertions came within their category of evidence. In the case that I described, the functional studies are on the far right and the clinical studies at the middle-bottom, with their PMID indicating references and a sentence quote from the author that makes the case for each of those different assertions. On the left are any of the custom annotations that were requested, be they specific assays that were used in the functional studies, other biomarkers that are associated with the protein consequence, different diseases where that variant is mentioned in that context, different ethnicities, or really anything that’s of interest to the group that we’re talking to based on their needs and based on the disease. That is in addition to any of those baseline data points, such as population frequency at the upper right there, the different ways a variant is described in the literature, or the SIFT and PolyPhen type of in silico predictive models.
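To make the tiered layout described above a little more concrete, here is a minimal sketch of how one row of such an export might be modeled and filtered downstream. The class name, field names and placeholder variants are hypothetical, chosen only for illustration; the actual delivered columns may differ.

```python
from dataclasses import dataclass, field


@dataclass
class VariantRow:
    """One row of a hypothetical landscape export (field names are assumed)."""
    variant: str
    provisional_call: str      # top-line ACMG call, e.g. "Pathogenic"
    articles_reviewed: int     # number of papers examined for this call
    acmg_criteria: set = field(default_factory=set)  # e.g. {"PS3", "PS4"}


def with_functional_evidence(rows):
    """Keep pathogenic rows whose evidence includes PS3 (functional studies)."""
    return [
        row for row in rows
        if row.provisional_call == "Pathogenic" and "PS3" in row.acmg_criteria
    ]


# Placeholder rows, not real curated data.
rows = [
    VariantRow("variant_A", "Pathogenic", 120, {"PS3", "PS4"}),
    VariantRow("variant_B", "Pathogenic", 8, {"PS4"}),
    VariantRow("variant_C", "Uncertain significance", 3, {"PS3"}),
]
selected = [row.variant for row in with_functional_evidence(rows)]
print(selected)
```

Sorting first on the top-line call and only then drilling into evidence categories mirrors the tiered reading order described in the answer.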
Could you please clarify if there is the possibility of Genomenon to resolve pathogenicity in case of well known BRCA1/2 mutations?
Yes – Genomenon has data for BRCA1 and BRCA2 mutations and can resolve pathogenicity based on published evidence and information in relevant databases.
Hello everyone. Welcome to the webinar. My name is Kate Oesterle and I’m a part of the marketing team here at Genomenon. On behalf of our whole team we thank you for joining us. We have a lot of great information to share with you today so I will get right to housekeeping and introductions. You can submit questions via the chat on the lower right hand side of the window. Please send us your questions and I will field those to Dr. Kiel as we go along and we’ll also take time to answer more at the end. We will be recording today’s webinar and we’ll share links to the recording as well as the Q&A transcription with all of you via email. Today’s webinar, ‘Using genomics to accelerate drug development for neurodegenerative diseases’, will be hosted by Dr. Mark Kiel and I’ll introduce him now. Mark completed his MD PhD and molecular genetic pathology fellowship at the University of Michigan, where his research focused on stem cell biology, genomic profiling of hematopoietic malignancies, and clinical bioinformatics. He is the founder and Chief Science Officer of Genomenon where he supervises the scientific direction of the Mastermind suite of software tools. Mark, take it away!
Thank you, Kate, and hello and welcome to all the attendees. We’ve got a pretty diverse collection of people so my intention with this webinar is to keep things pretty high-level, but as Kate mentioned, I welcome questions throughout the presentation as well as after through email or various ways that you can contact us through the website. Again, I’m very happy to have everybody here even under the present circumstances. It’s actually very encouraging to see how this enforced distance is bringing a lot of people together, particularly my colleagues in the medical field and our colleagues in pharma uniting together for a common cause. During these challenging times, when much of the laboratory work is shut down and manufacturing has come to a halt, there are ways to maintain the momentum that the data science teams at these institutions have been making by looking at more of the in silico data, which is what is going to be the substance of the webinar today.
I’m gonna turn off the video here so that you can focus on the slides and I’ll begin with an outline of the presentation for today. The first component of the webinar is to discuss why genomics is important in drug discovery and drug delivery. I’ll talk about the core benefits and applications of genomics generically, and then more specifically in the second part of the webinar I’ll talk about how you should go about leveraging genetics and genomics through the use of Mastermind’s Comprehensive Genomic Landscapes. I’ll describe what genomic landscapes are as well as the many places where they’re applicable, again at a very high level, but then the last part, and the most substantial part of the conversation, will revolve around some specific use case examples – representative case studies – that we’ve developed in concert with a number of our pharma clients, all revolving around neurodegenerative disease. Before I begin with the core of the webinar let me emphasize that what I describe here is only representative and is not limited to just the neurodegenerative diseases that I talk about, or to neurodegenerative disease at all. We’ve done a number of projects in oncology and rare disease. The focus of today’s webinar is just a coalescence of the information that we’ve accumulated around a number of neurodegenerative projects that we’ve put together with our clients.
Let’s start by talking about the core benefits and applications of genomics in drug discovery. This is a major takeaway from the talk here. We’ll get into some more detail – this is not just true obviously of the work that Genomenon is doing, but it’s known throughout Pharma and has been for some years that having genetic information supporting the candidacy of your targeted molecules, biomarkers and the like can double the likelihood of successfully bringing a drug to bear in the clinic. This is one of the more seminal papers that speaks to this, from Nature Genetics some years ago, but there have been a number of additional papers, both review articles and primary studies, that have corroborated this notion, and it is why we’re gathered here today.
The promise of genomics in drug discovery I’ve broken down into three subcategories. One is that it’s useful for reconciling information about the pathogenicity of complex or heterogeneous diseases, both in interrogating the mechanism of action, whether activating or loss of function, and in resolving clinical heterogeneity. It’s also true that it can bring together multiple seemingly disparate disease types which may have a unified mechanism of pathogenicity and which therefore could be considered to have a unified treatment approach. And then lastly, many conditions that are genetic have either known or unknown underlying genetic heterogeneity. Different pathways that arrive at seemingly identical clinical circumstances, if they take different paths genetically to get there, would obviously have a different drug response profile, and genomics can resolve that heterogeneity. The opposite side of that coin is that pathway homogeneity can be uncovered by genomics, linking patient subtypes within a defined clinical entity that was otherwise unknown or unknowable absent genetic information that can predictably determine which patients are going to have which course, either in the natural history of the disease or in response to a targeted therapy.
That’s broad strokes where the promise of genomics lies in drug discovery. More specifically, as we have seen in our work with a number of our Pharma partners, where genomics can empower Pharma is in optimizing preclinical therapeutic targets at the beginning phase of triage and understanding which targets are druggable. Further, it can reduce R&D costs by streamlining those candidate predictions as well as by devising experiments that are maximally efficient to lead to a successfully targeted biomarker with a molecule or a library of molecules. Likewise, on the other side of the drug discovery process, understanding the genomics of a disease can maximize the success of clinical trials by better parsing patient populations with known genetic drivers of disease, which are known to be targeted by the therapy that’s the central focus of the clinical trial. Then similarly on that end of the drug discovery pipeline, expediting FDA approval: having this great burden of genetic evidence with all of its evidence citations, as I’ll describe when I characterize what a Comprehensive Genomic Landscape is, can materially speed the time to market by oftentimes eliminating the need for a resubmission of an initial FDA application, because of the strength of the initial application and all of the data and evidence and ballast underpinning that first submission. We’ve seen that play out with a number of our pharma partners.
What’s the problem? What’s the problem that we’re discussing the solution for? It’s that assembling these comprehensive genomic landscapes, this full understanding of the genomic underpinnings of a disease and how treatment may be brought to bear to treat patients with those different mutations, is extremely challenging to aggregate and annotate and understand using only a manual approach. Particularly valuable is the information that’s otherwise locked away in the scientific research. A manual approach to extracting this information is incomplete and intractable due to the time it takes to collect all that information. At Genomenon we’ve solved that problem for our clinical users with the Mastermind Genomic Search Engine, which has all of the collected evidence from the medical literature pre-aggregated and readily searchable. For our pharma clients we’ve leveraged the intrinsic association data within Mastermind linking diseases and phenotypes and drugs with genes and variants, no matter how any one of those entities is described in any of the published literature or supporting databases necessary to fully understand the genomic landscape of a disease. In keeping with the way that the clinical world interprets the meaningfulness of genetic variants, we also incorporate predictive models of protein consequence, as well as population frequency, and layer that information on top of all the evidence from the empirical literature that’s contained within the Mastermind database, and bring that to bear to produce what we’ve called a Comprehensive Genomic Landscape.
What is a Comprehensive Genomic Landscape? It’s a complete data set of all genetic variants for any given gene or gene pathway or drug target or any disease indication or phenotype or biological phenomenon, extracted from the Mastermind database, which is, as I mentioned, extremely exhaustive and up-to-date. That information is expertly curated by Genomenon’s team of variant scientists and scientific researchers, using all of that evidence from the Mastermind database to drive the scientific conclusions and assertions that are made. More specifically, the variants are interpreted according to clinically accepted standards, ACMG in the case of constitutional disease and AMP in the case of oncology, and all of the information, all of the assertions that we make in promulgating these Comprehensive Genomic Landscapes, are fully referenced. That includes clinical studies and findings, empirical functional studies that get at mechanism of action – which will be a major focus of the examples that I show – as well as obviously treatment information and clinical trial information, and critically all of that is supported by scientific evidence in the form of references to the primary text from which those findings were extracted. That information, as vast as it is, is all summarized very efficiently so that a researcher in Pharma on one end of the drug discovery pipeline, or a group running a clinical trial, a CRO, or an internal team on the other end, can have immediately actionable insight into what this Comprehensive Genomic Landscape means, holistically for the gene or the gene pathway that was its subject, or specifically as you’re investigating any one of those individual variants or their associated data.
It’s updated quarterly with any new findings that are newly published because the research world doesn’t slow and in fact in genomics, the pace of acceleration has quickened since the commoditization of these sequencing platforms and their use in research studies both clinical and basic biological studies.
I’ve alluded to the drug discovery pipeline – this is likely to be very familiar to my attendees. Mastermind’s Comprehensive Genomic Landscapes that are produced by Genomenon’s team are helpful and critical in many aspects across multiple stages of the pharma lifecycle, from early discovery and divining what molecular drivers there may be causing this disease, all the way to patient segregation and developing a market for a new compound, up to and including commercial and post-commercial activities such as I talked about with development of a companion diagnostic and appropriate regulatory support, or, as we’re increasingly finding in challenging rare disease cases, helping pharma companies to identify the patient populations that would benefit from their drug, of which by definition there are few patients. Mastermind, with its reach into the clinical realm and to molecular diagnosticians and treating physicians, can bridge that gap and help streamline the process and maximize the yield of patient identification in rare disease.
What makes Genomenon unique? Our combination of very sophisticated and genomically aware computational intelligence that exhaustively and accurately indexes the vast corpus of information that we’ve accumulated over the past seven years of operation. Layered on top of that is our expert curation. As I alluded to before, every one of the assertions that we make in a Comprehensive Genomic Landscape is vetted manually by one of our variant scientists and ultimately approved by an MD or an MD PhD level review. It’s not the situation that it’s all produced from an AI or machine learning approach, where you have reason to be suspicious of the quality of that information. But moreover it’s not only produced manually, which has the challenge of being insensitive and very liable to missing information, as I’ll showcase in some of the use cases, as well as the challenges associated with revisiting that data if you would need to go back and do that investigation again from the bottom up manually. Genomenon’s unique approach is the integrated combination of computational intelligence and final expert manual curation.
The latter half of the talk, as promised, I’m going to focus on neurodegenerative disease with I think four or five case examples that are born out of work that we have done with Pharma or biotech companies interested in developing therapeutic compounds for neurodegenerative disease who were aware that they would greatly benefit from understanding the Comprehensive Genomic Landscape of that disease for particular genes.
The first example I’m going to go through is Parkinson’s disease and there are two genes that I wanted to talk about although there’s about a dozen or more genes that are known to be associated with Parkinson’s. Some of the more prevalent genes that are widely known to lead to Parkinson’s disease are GBA and LRRK2.
Parkinson’s disease, as we all know, affects nearly a million people in the United States and leads to motor symptoms such as tremor and bradykinesia, as well as non-motor symptoms such as depression and sleep disturbance. It results from loss of dopaminergic neurons in the substantia nigra and an accumulation of SNCA aggregates, either through direct mutation of that gene or upstream mutations, as we’ll talk about. Importantly, and why I think there’s been a focus on Parkinson’s from a number of our Pharma colleagues, is that there are no really efficacious therapeutic options for Parkinson’s disease, at least not that are widely disseminated, and currently the patients are managed with symptomatic relief only. 90-95% of patients are sporadic, although caused by genetic mutation, and many familial cases exist, such as in genes like SNCA. In addition there are many risk alleles that are associated with Parkinson’s, but the genes that I want to talk about have a high genotype to phenotype correlation.
The one I want to begin with is GBA, a gene whose product is better known as beta-glucocerebrosidase, which is active in lysosomes, where it cleaves a component of cell membranes. It is known to cause both Parkinson’s disease and a metabolic disorder called Gaucher’s disease, which comprises three distinct subtypes depending on the presence or absence of neurological complications. A particular area of focus for the group that we were working with on this project was reconciling those different subtypes of Gaucher’s disease, and thinking about how the genetic mechanism of development of disease through mutations in GBA is related to those patients who develop Parkinson’s because of mutations in GBA. They were particularly focused on two aspects of this gene. One is obviously understanding the full breadth of genetic mutations that contribute to disease, but secondly understanding both the functional mechanism of disease causation and the disease speciation at the hand of different types of variants that may lead to the genetic heterogeneity that leads to different clinical findings in different patients.
To give you a takeaway from the work that we did: the group that we were working with began with an awareness of about 30-some mutations, or variants, that they had accumulated through a year or more of their manual review of the scientific literature. They were aware that they were likely to be missing information, which is why they initially came to Genomenon. In addition to producing our Comprehensive Genomic Landscape, we also revisited the three dozen variants that they began with, thinking they were associated with disease, and brought clarity to those variants and to the group’s understanding of what those variants did by having a deeper and broader reach into all of the evidence supporting or detracting from the candidacy of those variants. In total at the end of this we produced a dataset that comprised about 300 pathogenic variants in the GBA gene, out of a total of I think some six or seven hundred variants that were described at all in the medical literature. So three hundred with clinical grade evidence to support their pathogenicity, including about a third which had functional studies, either in vitro or in vivo, divining the protein consequences of those individual variants, which is again what half of the goal of the work that we undertook was designed around. In total that led to about an eight-and-a-half-fold increase in the universe of known driver variants in the context of Parkinson’s, with ancillary information about Gaucher’s disease, in addition to providing much deeper clarity into the functional drivers and their associated mechanisms causing disease.
This is a graphical representation of that yield. Just to orient everybody, this is known as a swarm plot, where each dot represents a variant from an article, or many articles, always brought back to a patient or several patients who were initially found to have those variants. There’s a great deal of evidence underlying this simplified plot, but you can nevertheless see at a high level some patterns emerge where there’s more information, more prevalence of variants, along the linear axis of that protein subjacent to these functional domains in the GBA gene. The waviness of the plot is just so that you can see where the patterns emerge; if you drop a perpendicular to the x-axis from any one of those dots, that will reflect the position on the protein of that particular variant. The way that they snake up is just a reflection of where they’re starting to aggregate. You can see many aggregations toward the right-hand side of this Lysosomal Acid Galactosylceramidase domain of the GBA gene. There’s a great deal of information and detail that I’m not discussing here, but suffice it to say that in the production of this genomic landscape we evinced a great deal of these patterns, with all of their associated detailed evidence supporting the presence of those patterns.
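The idea behind reading such a plot, that variants pile up at particular positions along the protein, can be sketched without any plotting library by binning variant positions into domains and printing a crude text histogram. The domain names, residue ranges and positions below are made-up placeholders for illustration, not GBA data.

```python
from collections import Counter

# Hypothetical domain boundaries as (name, first residue, last residue).
DOMAINS = [
    ("domain_1", 1, 100),
    ("domain_2", 101, 300),
    ("domain_3", 301, 500),
]


def domain_of(position):
    """Return the name of the domain containing a residue position."""
    for name, start, end in DOMAINS:
        if start <= position <= end:
            return name
    return "outside_annotated_domains"


def count_by_domain(positions):
    """Count how many variant positions fall into each domain."""
    return Counter(domain_of(p) for p in positions)


# Placeholder variant positions along the protein.
positions = [12, 150, 160, 175, 310, 320, 330, 340, 620]
counts = count_by_domain(positions)
for name, n in counts.most_common():
    print(f"{name:28s} {'*' * n}")
```

With only a handful of positions the bars are nearly flat, which is the same point made in the talk: clusters only become visible once the variant set is reasonably complete.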
That’s the functional aspect that I just wanted to lightly touch on. One of the more interesting things about the work that we did for this gene was the second aspect that the group was interested in which was to say: how does GBA mutation sub species each different patient populations within Parkinson’s patients and within Gaucher’s? In particular within patients who have Gaucher’s with different subtypes. The type one is the one that has a neurological affinity and a prevalence or predilection for relatedness to Parkinson’s disease, and what I’m laying out here in different colors are different tiers of the plot are along the right side spine of the graph are the different diseases with their associated swarm plots. Looking at the two bottom plots – the gray and the dark blue – you can see which variants are related to Gaucher’s and Parkinson’s generally, but then looking at the orange, light blue, and and medium blue plots there for shows type 1, type 2, and type 3, you can start to see how turns emerge in the presence of these different variants across the GBA gene that specifically targets Type 1 Gaucher’s but don’t target Types 2 and Types 3. Without giving away the ghost and getting into any detail, there’s detailed information that our genomic landscape has unlocked that’s provided some insight into why that might be and how it has bearing on the protein coding consequence of the GBA target that this group is designing a molecule to treat.
Shifting gears to the second use case, still within Parkinson’s, this is the LRRK2 gene. It encodes a large, multi-domain and multifunctional protein with both GTPase and kinase activity, which has obviously lent itself to some heterogeneity in its targetability, and that is what this other group was interested in helping divine in the context of Parkinson’s disease.
We undertook to produce a Comprehensive Genomic Landscape that did more than take their universe of known variants from 13 to a hundred and fifty (again, clinical-grade, evidence-backed pathogenic variants), which amounts to about a tenfold enrichment of their resolution of the universe of LRRK2 variants that cause Parkinson’s. Fully a hundred of those pathogenic variants also had associated in vitro and/or in vivo functional evidence that allowed us to better clarify what those variants were doing in the context of LRRK2 protein function.
The gray there is the previous universe, and the dark blue at the bottom is the swarm plot for the new high-resolution, comprehensive universe of LRRK2 variants in the context of Parkinson’s disease. Each dot is a variant; when you drop a perpendicular, that’s where on the protein those variants are present. They’re presented here along the protein domain structure so that you can see how the variants are clustering and how they’re likely to be leading to protein coding consequence. I’ll leave this slide up here while I clarify that if you just had the benefit of the 13 variants that were the baseline level of understanding of what was going on in this protein, no pattern would emerge. Whereas when you benefit from a comprehensive and exhaustive view, you can see some pretty salient patterns emerge. Many of them are, perhaps not surprisingly, focused on the GTPase and kinase domains, but some ancillary patterns emerge too, such as variant peaks in the armadillo domain and a cluster of variants in the ankyrin domain as well as in the WD40 domain, which were not present at all based on the previous research that was undertaken. This is a clear demonstration of the benefit of having a comprehensive, evidence-backed understanding of all of the genetic drivers that are leading to disease in this one protein.
Without going into too much detail (I’m happy to have a private conversation with any of the attendees who are particularly interested), in addition to broadening the universe of understanding of causative variants, we clarified the nature of those protein coding consequences wherever there were empirical studies to support those conclusions. In addition to being multifunctional based on those multiple domains, LRRK2 is interesting in that, when mutated, it can be causative of disease through either a gain-of-function or a loss-of-function mechanism. In the process of putting together this Comprehensive Genomic Landscape, our curation team was specifically looking for and annotating those variants from studies that clarify whether the mutated protein showed a gain of function or a loss of function, and which function was affected. On the top there, I think, is the GTPase and on the bottom is the kinase. Again, you’re starting to see some interesting patterns emerge, some of which may be surprising, or may not, depending on how much you knew beforehand and how much detail you had about any of those previously known variants.
The latter two examples that I want to go through are on two separate diseases. The next disease that I’m going to touch on is Amyotrophic Lateral Sclerosis, better known as ALS, which can be caused by mutations in a number of different genes. TARDBP is one of them, and the one that I’d like to focus on.
Given its prevalence in the United States of fewer than 200,000, ALS is properly defined as a rare disease, but it obviously leads to neurodegenerative phenotypes in patients, including a progressive loss of motor control and eventually death due to respiratory failure. A relatively common sequela of the various ways that the genetics can lead to this phenotype is the aggregation of these TARDBP proteins. There are some treatments that have been brought to bear for ALS, but they only slow progression, and so there’s a great deal of interest in more curative and more impactful treatments. As I mentioned, there are many different genetic paths that lead to this ALS phenotype. FUS, SOD1, and TARDBP are among the most common, and TARDBP is the focus of this study, although obviously we have information about the other gene targets as well.
TARDBP is a nucleic acid binding protein that’s involved in regulation of transcription and various other RNA processing and stability activities. In ALS those aggregates of TARDBP are common, and functional studies involving TARDBP will often find mislocalization or abnormal fragmentation by protease in those functional assays, as well as a whole panoply of other heterogeneous mechanisms of disease causation at the protein level. That is why the work that we did here was targeted on disambiguating those different heterogeneous protein consequences and reconciling what those consequences look like along the axis of the functional domains of the TARDBP protein itself. Just to highlight: the known universe of TARDBP pathogenic or likely pathogenic variants in ClinVar, which is a common go-to for Pharma and clinical users alike, pales in comparison to the full scope and breadth of understanding that a genomic landscape from Genomenon can provide, namely a five-fold increase in known variants. Many of those, in this case I think most of them, have associated functional evidence hiding away in the decades of genetic and basic science research about this protein, which only our approach of computational intelligence followed by expert curation can bring to bear in developing these comprehensive landscapes.
In the swarm plot for TARDBP along the axis of the functional domains of that protein, you can see some clear patterns emerge where there’s a focus of variants in the domain that’s responsible for interaction with the UBQLN2 protein. The suggestion to me when I look at this landscape, as I have looked at many dozens of these landscapes, is that there may be a three-dimensional conformation within that domain where the more prevalent variants, and those well characterized through experimental study, seem to be converging in three-dimensional space. The other thing to note here is that none of the variants in that UBQLN2-interacting domain are nonsense or frameshift; they’re all missense variants. They’re clustering in a distinct way not just in that blue domain but also in the nuclear export signal, as you can see above the gray bar there, and in some of the other domains, the RRM domains 1 and 2, which also seems to suggest a three-dimensional conformation. The fact that they’re clustering in such a distinct way, and the fact that they’re almost exclusively missense mutations, suggests to me strongly that there’s a gain-of-function mechanism. In fact, when you look into the details, as we obviously did to produce this, the vast majority of the studies for these variants corroborate a gain-of-function mechanism of action, and only when you’ve aggregated all that information into this comprehensive landscape does the full complexion of the consequence of mutations in this protein become clear.
The last example that I want to talk about is a slightly different disease that also leads to neurodegenerative symptoms: Wilson’s disease, caused by mutations in ATP7B.
It is a rare disease, much rarer than the other diseases that I’ve talked about, affecting two to three thousand individuals in the United States. It is the result of abnormal accumulations of copper in liver and brain tissue, leading to involuntary movements and psychiatric symptoms as well as, obviously, long-term damage to those sensitive regions. There are therapies for Wilson’s disease, but there are also very active studies looking to develop more effective ones. The engagement here was to produce a Comprehensive Genomic Landscape that would better uncover the full breadth of mutation in Wilson’s disease at the hands of this gene, as well as better clarify what those different mutations are doing at the protein level.
Additionally, as a comparator here we have ClinVar, which has some 250 known pathogenic variants associated with ATP7B. Even in a well-studied gene like ATP7B, Mastermind’s ability to produce a Comprehensive Genomic Landscape led to a nearly three-fold increase in the number of pathogenic variants, and brought clarity to the functional studies and their consequence for a hundred and fifty of those variants.
These are arrayed along the ATP7B protein against its functional domains. Just crudely, on the left are the metal-binding domains one through six, and on the right are the ATPase domains, the effector domains of the ATP7B protein. This landscape to me suggests a loss-of-function mechanism, given how distributed the mutations are, and indeed when you look into the details of all the assembled evidence, the empirical studies do demonstrate that this is a loss-of-function mechanism. Nevertheless, there are still some salient patterns that emerge. One, at the far left, is that snake of mutations that seem to concentrate in one of those metal-binding domains, all of which are missense, suggesting that there may be a critical portion of that one metal-binding domain. The details are all apparent in the Comprehensive Genomic Landscape. The other, in between the dark blue and the orange domains there, is the seeming concentration of variants in an inter-domain region of ATP7B that doesn’t have a functional domain, or at least not a well-characterized one. The details of those emerge when you examine them through hypothesis generation and testing with the Comprehensive Genomic Landscape.
I don’t have time to speak to it, but I appreciate that many on the call who are interested in neurodegenerative disease are particularly interested in Alzheimer’s disease. I don’t want to leave you with the impression that the three diseases that I talked about are the only neurodegenerative diseases for which we have data. Indeed, we have information about APP, PSEN1 and PSEN2, as well as the disease risk alleles in APOE, covering the Alzheimer’s disease genomic landscape. I just won’t showcase any of that information here, but suffice it to say that the value provided by this Alzheimer’s-focused Comprehensive Genomic Landscape is similar to the value that’s provided in the context of those other diseases.
Some of the benefits of these genomic landscapes can be broken down, at least for neurodegenerative disease, into two rough camps. The first is providing insight into the mechanism of action, as I described, which can be utilized in improving target identification. I top-line it here to say that novel insight into five to ten more functional drivers can result from engagement with Genomenon to produce a genomic landscape for your target of interest. Both gain-of-function and loss-of-function mechanisms can be elucidated, and this altogether can lead to more focused R&D efforts, so that you don’t waste time and resources on candidate compounds and targets for which there’s less supporting genetic evidence. The other side of the value proposition is increasing diagnostic yield: better defining a more comprehensive market; better defining how to segregate patients into those who are likely, based on evidence, to respond to your therapy, whether before beginning your clinical trial or afterward in a post hoc analysis; and better clarifying the results of your clinical trial based on the genetic and genomic evidence that a Comprehensive Genomic Landscape can provide. I say there in that sub-bullet point that in some of the work that we’ve done with Pharma partners, we’ve increased the number of genetic biomarkers that they understood when they first engaged with us, in at least one circumstance, by 50-fold. Depending on how well characterized your gene is or how much effort you’ve expended, the production of a Comprehensive Genomic Landscape can dramatically increase your result in every circumstance, but sometimes to an almost unbelievable extent, especially if the genes are less well characterized or the disease mechanisms are just being worked out.
Lastly, and as I alluded to at the beginning, producing a Comprehensive Genomic Landscape is useful in the development of a companion diagnostic. Not only can we increase the diagnostic yield (in one example from a different disease that I didn’t showcase, by threefold in the number of genetic biomarkers), but we can also expedite and maximize the likelihood of initial approval when documents are submitted to the FDA.
I’ll circle back to the top-line message that I wanted to convey: understanding the genetics that support the candidacy of your targets and the pathogenesis of your targeted disease can double the success rate of bringing a clinically viable drug to market, and Genomenon is uniquely suited to solve that challenge in concert with our Pharma clients.