Drug Discovery in the Age of Genomics

Currently, only 10% of drug therapies actually reach patients because the genetic mechanisms of disease and their consequences for drug action are not well understood. This incomplete understanding can result in R&D failures in the lab, diminished likelihood of success in clinical trials, and/or inappropriate diagnosis and treatment of patients in the clinic. With the advent of Next Generation Sequencing (NGS), pharma and biopharma companies can begin to understand the biological processes underlying disease and change this devastating statistic.

Genomics Empowers Pharma and BioPharma to:

  • Choose better starting points for therapies
  • Reduce R&D costs by focusing on validated targets
  • Increase the chance of FDA approval with genetic evidence support
  • Increase success of clinical trials through optimized patient selection
  • Decrease time to market

In this webinar, genomics experts Drs. Mark Kiel and Alex Joyner describe the deep and invaluable connection between therapeutics development and genomics, and how to obtain and use NGS data to improve outcomes. 

Is your organization making full use of NGS technology & genomic data to power your drug discovery?

Find out as you join us to learn:

  • How genomics can be used to facilitate drug discovery, e.g. prioritizing variants from among patient cohorts
  • How genomics can be used to improve outcomes for clinical trials, e.g. maximizing patient recruitment and ensuring optimal patient response
  • How genomics helps to decrease time to market for therapies, e.g. expediting FDA approval with documented genetic evidence

Watch the Recording Below


Q: Can you comment on the applicability of genomic data in diseases beyond Oncology especially Rare Disease or Neuroscience?  How would you typically get started w/ a Pharma or BioPharmaceutical company?

A: Genomenon has worked with a number of pharma clients in both Oncology and Rare Disease. As most rare and orphan diseases have a genomic lesion as the cause, genomic analysis can definitely lead to improvements in biomarker candidate selection or clinical trial design and operation. Genomenon typically begins with a proof-of-concept trial of data collection and analysis to showcase our capabilities, followed by a much broader application of our data and analysis. In particular, I think there is a great deal we will learn in the coming decade about how genomics influences mental health and response to psychiatric medications with judicious application of sequencing data in clinical trial design.

Q: Research is moving quickly, so do you update regularly? If so, how often to account for the latest published literature?

A: We update our database weekly. Mastermind also provides alerts so that users can receive new articles on a variant or gene as they come out. 

Q: What is the current level of adoption of genetics/genomics in pharma/biopharma? 

A: Genomics is a well-established tool in research. Its use in a clinical setting continues to grow as methods become more reliable, and the application of knowledge of specific variants continues to grow. It is safe to say that if a trial is being designed now, it will have at least some genetic or genomic component, with this proportion of trials projected to see a dramatic increase in the foreseeable future.

Q: Can better measures of treatment response make it easier to identify genetic predictors of differential response? 

A: Current measures of treatment response center around overall population-level outcomes such as Event-Free Survival or Overall Survival. More precise measures of the molecular effect of the drug might allow for enhanced insight into the candidacy of the drug, minimizing the risk that an early trial (which may not be as statistically powered) fails to show an overall survival benefit when in fact the intended molecular response is occurring in the cohort. These fine-tuned measurements can be facilitated by genomic analyses of the cohort according to the quaternary analysis described in the webinar. However, caution should be exercised in over-interpreting laboratory values when survival benefit is the ultimate goal.

Q: Have you looked at variants that promote longevity in Mastermind?

A:  We have not specifically looked at longevity for any client but do indeed have data in Mastermind that may address this phenomenon, and we would certainly be interested to learn more about your specific requirements.

Q: What should we do when there are many drug targets in the cohort samples? 

A: In the webinar, I discussed the concept of molecular recurrence across a cohort dataset. The concept of recurrence is not restricted to a single variant target or even a single gene target, but instead can be extended to include multiple genes comprising a pathway or family of interrelated genes. Similarly, there needn’t be a requirement that these different genes be in the same pathway. In fact, in any given cohort, there may be complementarity among the responders – some who have one pathway activated and a fraction of the remaining patients who have another (related or unrelated) pathway activated. Broadening your approach to cohort analysis by modifying your concept of recurrence in this way will capture all such drug targets even when they operate in different pathways if you search for and identify complementarity across the cohort.
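As a rough illustration of the answer above, the sketch below counts recurrence at the gene level and then at the pathway level, and reports how much of a cohort is covered when complementary pathways are considered together. The patient IDs, gene names, and pathway groupings are all hypothetical placeholders, and this is a minimal toy under those assumptions rather than Genomenon's method.

```python
from collections import defaultdict

# Hypothetical toy cohort: patient ID -> set of genes carrying a candidate variant.
cohort = {
    "P1": {"GENE_A"},
    "P2": {"GENE_B"},
    "P3": {"GENE_C"},
    "P4": {"GENE_D"},
    "P5": set(),
}

# Hypothetical pathway membership; a real analysis would use a curated resource.
pathways = {
    "PATHWAY_1": {"GENE_A", "GENE_B"},
    "PATHWAY_2": {"GENE_C", "GENE_D"},
}


def gene_recurrence(cohort):
    """Count how many patients carry a hit in each gene."""
    counts = defaultdict(int)
    for genes in cohort.values():
        for gene in genes:
            counts[gene] += 1
    return dict(counts)


def pathway_recurrence(cohort, pathways):
    """Map each pathway to the set of patients with a hit in any member gene."""
    return {
        name: {pid for pid, genes in cohort.items() if genes & members}
        for name, members in pathways.items()
    }


def complementary_coverage(cohort, pathways):
    """Fraction of the cohort with a hit in at least one of the pathways."""
    covered = set().union(*pathway_recurrence(cohort, pathways).values())
    return len(covered) / len(cohort)
```

In this toy cohort no single gene recurs in more than one patient, yet each pathway recurs in two patients and the two pathways together cover four of the five patients, which is the kind of complementarity the answer describes.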



Candace: Good afternoon and welcome to the webinar. I am Candace Chapman, VP of Marketing for Genomenon and I’ll be your host today. Today, Dr. Mark Kiel and Dr. Alex Joyner will be discussing Drug Discovery in the Age of Genomics. They have a lot of great information to share, so I want to get right to the housekeeping and introductions so they can get started. As you’re watching, please feel free to submit questions through the chat window on the right of your screen. Mark and Alex will answer questions after the presentation, so without further ado I’ll introduce our speakers:

Mark completed his MD, PhD, and molecular genetic pathology fellowship at the University of Michigan, where his research focused on stem cell biology, genomic profiling of hematopoietic malignancies, and clinical bioinformatics. He’s the founder and CSO of Genomenon, where he supervises the scientific direction of the Mastermind suite of software tools.

Alex completed his PhD at the University of California San Diego in biomedical sciences with a focus on bioinformatics. He worked on genomic profiling of autism patients and phenotypic association with brain imaging data. In industry he’s worked on optimization of variant calling algorithms and interpretation of NGS results in a clinical setting. He’s new here on the team and we’re really happy to have this senior field application scientist. Hello to Alex and Mark.

Alex: Thank you so much. Hi Candace, it’s great to be here with you and Mark today. I’m gonna switch over here and have Mark take it away.

Mark: Thank you, Candace, for the introduction. Hello, everyone, and thank you all for coming together for this webinar. This is the first in a series in which we’re going to discuss drug discovery, particularly as it pertains to the use of genomics and genomic data in that process. Alex and I will be doing a bit of a tag-team in this process. What we’re going to go through will be the interrogatives of drug discovery using genomics, beginning with the why: why use genomics? We’ll go through the core benefits and the applications of genomics in this drug discovery process. We’ll talk about how we should go about that, drawing on our own experience here at Genomenon and going through, at a high level, some practical considerations for the use of genomic data. Then we’ll end with some specific examples, representative case studies of actual drug discovery from concept to sequencing to findings.

Let’s begin by talking about the core benefits and the applications of genomics, and I’ll start with a summary slide. This slide is really the main takeaway from this talk. It’s from a really great review article in Nature Genetics by Matt Nelson at GSK. In it he says at the conclusion that genetically supported targets can likely double the success rate of clinically viable drugs in this drug discovery process, so I’ll commend that review article to you for the details. As you’re listening to the webinar and we’re going through these slides, this should be the main takeaway: the idea that leveraging genomic data can double the success rate of drug development.

Let’s start out by talking about what genomics allows pharma and biotech companies to do. We’ll go from preclinical identification and optimization of therapeutic targets, to reducing research and development costs by refining those candidates, to maximizing the success of clinical trials, to expediting the FDA approval process, and then overall decreasing time-to-market by leveraging genomic data.

We’ll start by talking about optimizing preclinical targets. The first and probably most obvious aspect of bringing genomics to bear in identifying preclinical targets is gaining a fundamental understanding of the biomolecular basis of disease. The next would be in the case of complex disease, where the cause of disease is not necessarily obvious, or where the disease is understood in some patients but not in others. Genomics can be used to understand novel or otherwise unknown pathways in that complex disease, and put together this can provide a molecular starting point for devising targeted therapeutics. Overall the goal is to identify biomarkers for the disease indications around which you’re developing your drug discovery process, and biomarkers can mean a lot of different things in different contexts.

It’s a very broad term. In this case there are three uses that I’ll be touching on. One is the most obvious, where the biomarker is associated with disease by virtue of being pathogenic for or causative of that disease. Another aspect of a biomarker that is less well appreciated is whether the biomarker indicates a differential response to a therapeutic. Similarly, the biomarker, particularly at the genetic level, may be used to monitor response to that drug. Typically these biomarkers are understood through interrogating case-control studies to identify those genetic variants, from mutations all the way up to large structural changes, that are enriched in a particular disease. Throughout I’ll be citing a number of really key references. This is a very recent reference from Dugger, which I think really nails the history of precision medicine in drug discovery.

I’ll start at the top in the 50s to 70s where those decades were dominated by the idea of phenotypic screening approaches to understanding druggable targets using animal studies or studying human tissue or otherwise cell culture models where there was less penetrating and precise insight into what was actually causing the disease, but rather it was more of a holistic understanding of what was going on as a consequence of the first cause of the disease. In most of these scenarios it was the genetic lesion.

In the 70s to 90s, Dugger states that the approach was to latch on to a putative protein target and have that target in mind as you’re devising chemical compounds to target the action of that putative protein. Then in the 90s, up until the first publication from the Human Genome Project and its widespread use, there was a focus on what at that time was a novel assay, EST or expressed sequence tag studies, where the focus moved away from just the protein function and more toward the palette of differentially expressed genes in that diseased state. As we go through this in time, there’s an increasing precision with which we understand each of these diseases.

During the Human Genome Project era, the sort of afterglow of that initial publication, GWAS studies were very easy to perform while sequencing costs were still prohibitive. Those genome-wide association studies, or GWAS studies, allowed pharma researchers to get close to what was associated with disease, but without yet fully understanding what the biological mechanism of disease was. That understanding is what we’re experiencing now in the next-generation sequencing or NGS era, which arguably came into widespread use in pharma research in 2013 and continues up until the present.

There are a couple of examples that I don’t have on the slide that I’ll provide to you, from the beginning all the way up to some present examples. One is BCR-ABL fusions, which were among the first molecular lesions to be recognized as associated with and causative of a disease. In this case that’s chronic myelogenous leukemia, and obviously Imatinib (Gleevec) was the drug that was used to precisely target that fusion event. Another example is HER2/neu amplifications in breast cancer and Trastuzumab, which is an effective therapy in patients who are positive for that copy number change or amplification of a gene. A third example would be BRAF mutations, point mutations in melanoma, for which a targeted BRAF inhibitor, Vemurafenib (Zelboraf), can be brought to bear.

Alex: Mark, these are great examples from cancer. Can you provide us with a couple of examples from hereditary disease?

Mark: Oh yeah sure, great question Alex. Although I will focus on cancer, I don’t want to convey the message that these techniques and these philosophies are only applicable in oncology. There are a number of examples from hereditary or non-malignant disease. Among the first that Dugger actually brings up is PCSK9 mutations, identified, I think, first in a small cohort of French patients with hypercholesterolemia, and the later development of monoclonal antibodies to target the effects of those mutations. Evolocumab (if I’m pronouncing that correctly) is one of the targeted compounds that was brought to bear to treat that hereditary disease, and there are myriad other examples, particularly in the case of, say, replacement therapies for inborn errors of metabolism, where we’re not talking about malignancy but where these aspects of leveraging genomic data still apply. So great question Alex, thanks.

The next thing that I want to talk about is leveraging genomics in reducing R&D costs. This may not be self-evident, but having the underpinnings of genetic evidence to support the candidacy of a biomarker, where you have a targeted therapeutic that addresses the gain-of-function or loss-of-function mechanism of that lesion, allows the pharma R&D process to be much more focused on high-yield candidates. Obviously, this has the sequela of decreasing the failure rate of the drug, and it should be apparent that decreased failure rates mean more efficient use of R&D budgets.

There’s another ramification of decreased R&D costs when you focus on high-yield candidates based on genetic and genomic data, and that is to save on what are known as opportunity costs. The sooner you can rule out a low-yield candidate, the sooner your R&D efforts downstream of the initial discovery process can focus on the more high-yield candidates. In the context of drug discovery this has been referred to as failing fast and failing cheap, and it is exemplified in a quote from this Journal of Translational Medicine review article, where “the cost to develop new therapeutics has increased significantly in the past three or four decades, but without the widespread application of genetics and genomics the success rate has remained largely unchanged.” Clearly that is an area that can be addressed to maximize the efficiency of drug discovery pipelines, particularly where many therapeutic failures occur after large capital investments.

Alex: Mark, my understanding is that drug discovery has largely been empirical, so basically companies would follow up on a target if it worked, without really knowing the genetic mechanism or the actual chemical mechanism behind it. Could you also provide some more quantitative numbers about the failure rate?

Mark: Yeah sure, so this may be more viscerally apparent to those in attendance here. We know about it from some of our client engagements and from reference materials, but a shockingly high number of drug candidates fail FDA approval, which again exemplifies the benefit of getting to failure faster, so you can rule out the candidacy of one of those nine drug compounds and increase your success rate. That is especially true when the average cost of FDA approval across the whole spectrum of drug discovery is on the order of billions of dollars; two and a half billion, I think, is the typical number that’s cited. In particular, Alex, as you were alluding to, this is worse for complex or otherwise poorly understood conditions. Alzheimer’s is a notable, newsworthy recent example: in the recent decade more than ninety-nine and a half percent of Alzheimer’s candidate compounds at various stages of development have resulted in failures, very costly failures in certain circumstances. So having published data, particularly data supported by genetic and genomic evidence, is one of the ways to mitigate these risks of costly failure.

We’re gonna move now into talking about the next phase, once you’ve got a candidate from the R&D process: maximizing the success of clinical trials. There are a couple of ways that genetics and genomics can be used in this process, and some of those are ways we’ve helped our clients with, using the data that Genomenon has at its disposal. The first is using genomic markers as inclusion or exclusion criteria. In the early days of clinical trials, most of the inclusion and exclusion criteria were based on demographic, histologic, or diagnostic information. Increasingly, with a view toward creating a more homogeneous initial patient cohort to maximize the control that’s exerted over the conduct of the clinical trial, molecular inclusion and exclusion criteria are being brought to bear. This is exemplified most fully in the development of a molecular companion diagnostic, which really maximizes the likelihood that you can have control over the conduct and the results of a clinical trial, at least to the extent that you can a priori.

Having genetic and genomic data also increases the likelihood of drug response by ruling out patients who do not have the molecular stigmata of the disease that you are treating with your drug. In other words, you can rule those patients out initially and dramatically shrink or limit the size of your cohort in order to create that homogeneous patient cohort. The opposite is also true: you can broaden the patient population that comprises your cohort by maximizing the different types of molecular lesions you accept, examples of which I’ll allude to in the latter part of the webinar. That broadens the population pool from which you can draw patients for your cohort and, when the drug is finally approved, maximizes the market potential across those patients. Lastly, I’ll say that this adds statistical power to the study when you’re able to broaden that patient population size.

Alex: Mark, you mentioned that inclusion/exclusion criteria can be used to both expand and contract cohort size. Can you provide a little more detail on how that works? It might be slightly counterintuitive.

Mark: Yeah, it does seem paradoxical, and so I’ll say that it’s situation dependent or context dependent. In the case of contracting a cohort by ruling out patients who don’t have a particular molecular lesion or set of lesions, that process maximizes the likelihood of success of the clinical trial by increasing the drug response rate. So culling patients who don’t have molecular lesions that are targetable by the drug increases the likelihood of success in the cohort. The converse of that would be enrolling additional patients in the trial after you’ve learned of novel molecular lesions that can also be targeted by your drug compound. Obviously, both of those scenarios, both sides of that coin, require genomic sequencing to get to that level of precision. The latter example of broadening the trial is one that we’ve talked about in the context of a client of ours who was doing just that: iteratively or serially broadening the molecular criteria by which they enrolled patients into their trial, almost on a rolling basis.
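To make the two directions concrete, here is a minimal sketch of molecular inclusion and exclusion criteria applied to a small cohort. The `Patient` class, the `enroll` helper, and the lesion labels are all hypothetical names invented for illustration; real trial criteria would be far richer.

```python
from dataclasses import dataclass, field


@dataclass
class Patient:
    patient_id: str
    lesions: set = field(default_factory=set)  # molecular lesions found by sequencing


def enroll(patients, inclusion, exclusion=frozenset()):
    """Keep patients with at least one inclusion lesion and no exclusion lesion."""
    return [
        p for p in patients
        if p.lesions & inclusion and not (p.lesions & exclusion)
    ]


# Hypothetical lesion labels, for illustration only.
patients = [
    Patient("P1", {"LESION_X"}),
    Patient("P2", {"LESION_Y"}),
    Patient("P3", {"LESION_X", "RESISTANCE_Z"}),
    Patient("P4", set()),
]

# Contracting: require one targetable lesion and exclude a known resistance marker.
narrow = enroll(patients, inclusion={"LESION_X"}, exclusion={"RESISTANCE_Z"})

# Broadening: a newly learned targetable lesion is added to the inclusion set.
broad = enroll(patients, inclusion={"LESION_X", "LESION_Y"}, exclusion={"RESISTANCE_Z"})
```

The same filter shrinks the cohort when the criteria are strict and grows it when a newly discovered lesion is added to the inclusion set, which is why the effect depends on context.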

I spoke about companion diagnostics. Here’s a slide from our friends at Q2. This diagram depicts the drug development process from preclinical to launch and pigeonholes where companion diagnostic development typically takes place. You start later in Phase I with a prototype assay, and then before beginning Phase III you promulgate a final assay, a fully vetted and highly reproducible platform to produce the data by which you will enroll patients into your trial. It is also the data by which you will examine the results of that trial, including, say, determining why patients may have been responders or non-responders based on the molecular lesions that you find, whether those were part of the companion diagnostic or ancillary to it.

Another component here, toward the end of the pipeline, is expediting FDA approval. It’s probably known to attendees that a great burden of supporting data is required to justify the candidacy of a biomarker in the FDA approval process. I’ll stress here, and I think it’s plain to everyone who’s done any genetic or genomic research, that there’s a great deal of objective evidence and objective support that comes with genetic and genomic findings.

Further, having a full landscape of genetic lesions associated with your entire patient cohort can promote a deeper understanding of pharmacogenomics: the way that individual patients metabolize or otherwise respond to your drug, whether as faster or slower metabolizers, or other aspects of pharmacogenomics. Another aspect that may not be fully appreciated, beyond what I think was obvious in those three bullet points, is that strengthening an initial FDA submission can save time in the long run by reducing the likelihood that the FDA will come back and ask for a protracted revision process. This very simple, straightforward schematic is from a blog coming out of Harvard. I think it really emphasizes this point about the last mile of FDA approval sometimes taking six months and sometimes taking 24 months. While there are multiple variables at play in that process, mitigating risk in any of the variables that you have under your control, such as strengthening your initial submission with a great deal of objective genomic data, will curtail that 6-to-24-month FDA approval period.

So putting all these things together, what we’re really talking about is decreasing the time to market for an effective drug therapy, obviously not taking any shortcuts, but maximizing the efficiency with which you go through the paces of R&D, clinical trials through all their phases, and the FDA approval process. Put together, this leads to a more efficient product development process, and using genomics can also lead to innovative ways to conduct clinical trials. In the precision medicine era the conduct of clinical trials typically falls into two buckets. One is a basket trial, where you have one drug, one biomarker, and many diseases that fit that biomarker, to which you apply the drug in the conduct of the clinical trial. The opposite version of that is an umbrella trial, where you have one disease but many biomarker-drug combinations as appropriate. A permutation of that would be an n-of-1 trial, where you have a closer marriage of the genetic, genomic, and biomolecular findings for a given patient and have the drug that’s being used to treat that patient fine-tuned to that molecular identity.

This is perhaps obvious, but it’s important to state outright: when you’re talking about decreasing time to market, there’s this aspect of competition in later stages of drug development, particularly when disease indications that are likely targets for drug development tend to come to consciousness in the pharma community at the same time as a result of some seminal studies or research findings. As a pharmaceutical company, curtailing the time it takes to develop a drug fully, or otherwise getting onto the most potentially lucrative candidates in the early days, allows you to out-compete your competitors and can lead to success in what could otherwise be a zero-sum game. Decreasing your time to market by leveraging genetic and genomic data will allow you to be a more effective competitor in this market space.

The second phase of the webinar, as I said, covers high-level practical considerations for the use of genomic data. Obviously, I don’t want to go into too much detail because it won’t be relevant to each attendee, but I do want to talk about the nature of these genomic studies and bringing them to bear for drug discovery. This begins with selecting the most appropriate omic data. I suggested earlier that most of the conversation was about next-generation sequencing and identifying variants, but there are multiple different types of genetic lesions that still fall under the rubric of genetics and genomics and still have a place in precision medicine and drug discovery. For next-generation DNA sequencing, those are SNVs, or single nucleotide variants, and otherwise multi-nucleotide variants that involve indels. Those are discoverable through next-generation DNA sequencing, but you still have to select the most appropriate version of NGS. That can take multiple forms, from whole genome sequencing, to whole exome sequencing where you’re focusing on the coding regions of genes, to large gene panels that are comprehensive for clinical effect or comprehensive across known oncogenic drivers, all the way down to a focused gene panel which, depending on your needs, may include even one gene.

You have that palette of options to choose from, and it really depends on what is already known about your disease indication and its pathogenic mechanism and on what you’re trying to do with the omics data. Are you trying to validate a presupposition? Are you trying to provide gated entry to your clinical trial based on molecular findings? Or are you trying to cast a wide net and uncover new pathways that were otherwise not known? Depending on your place in that spectrum, you might choose anything from a small gene panel of known entities all the way to more of a screening study with whole genome sequencing.

The other four types that I’ve summarized here include structural alterations like copy number variants and fusion genes. In the case of copy number variants, amplifications or deletions, you can choose to do an array or otherwise leverage the whole genome and whole exome sequencing data with some bioinformatics techniques to pull out copy number change. If you’re working with chromosomal microarray, I would say that understanding what resolution the assay affords is pretty important, because there are multiple examples of fairly small CNV lesions that would otherwise be missed in a low-resolution study. Similarly, with whole genome or whole exome sequencing, while the algorithms to detect copy number changes from those data have improved, you’ll want to be sure that your pipeline has been validated to recognize the CNVs that you hope to uncover with that assay.

For fusions you can obviously use whole genome sequencing, which will detect aberrant chromosomal juxtapositions in the form of translocations or inversions. Otherwise, what I’m more familiar with is using RNA-Seq to identify these fusion events, and increasingly there are large fusion gene capture panels that can target Gene A partners in a Gene A to Gene B fusion event and will allow you, at least for those known Gene A entities, to uncover the Gene B fusions that may be present in your patients. Lastly, there are transcriptome studies through gene expression, which I alluded to earlier, and epigenetic changes identified through methyl marks.

Alex, at this point you have more experience in this vein, dealing with the data and data analysis, than I do, at least writ large. I wonder if you can add to what I’ve just described here?

Alex: Sure, Mark. It’s always good to keep in mind that these technologies are still relatively new and are essentially still screening experiments, so all the considerations for laboratory experiments should be taken into account. What that really means is that results should not be trusted blindly. The potential for both false positives and false negatives certainly still exists; they can come from artifacts in the actual sequencing or from the subsequent informatic analysis. So when analyzing next-gen sequencing data, validation of a result with an orthogonal technology such as Sanger sequencing is always recommended, especially in a clinical setting where it’s possible that a life-or-death decision could come from a genetic result.

Mark: Yeah, actually, now that you bring that up, I recall in my postdoc days, when I was doing some of these studies, getting married to one of the candidates that we identified because it had all the hallmarks of being a valid, viable, high-yield biomarker. But when we went through the appropriate paces of validating its candidacy and the accuracy of the result, even though it was in one of these curated datasets, it turned out to be an artifact. Getting to that realization much more quickly will save you months, sometimes costly months, of R&D. That’s a great point: recognize that these are just experiments and that the data should be qualified.

Alex: And the good news is that these technologies are rapidly increasing in quality, so that is a definite positive. In the past five or even ten years we’ve really improved in terms of the false negative and false positive rates from sequencing.

Mark: This slide is meant, again at a high level, to provide an overview of the genetic workflow and bring all of this together into context. It begins with determining your study parameters in light of the indication that you’re targeting and what is known about your candidate drug compound. We talked a fair bit about understanding, based on those parameters, how to design your patient cohort and by what inclusion or exclusion criteria you compose that cohort. Then comes performing the sequencing and/or array experiments, and then analyzing the data.

Again there is a summary here, but it’s important to pigeonhole where in the process each of these aspects comes into play, particularly with respect to number four. For NGS data there are four phases of analysis that I want to walk through here. The first is primary analysis, which is simply the proper production of next-generation sequencing data from a sample. Secondary analysis is read mapping and variant calling. Tertiary analysis is individual variant interpretation: looking at individual data points in isolation and determining their significance. Then a term that we at Genomenon have coined is quaternary analysis, which is putting all of that together when you’re performing biomarker identification as a result of cohort sequencing studies. So primary, secondary, tertiary, and quaternary. I’ll walk through those at a high level in the subsequent slides.

Primary and secondary analysis is, as I say, DNA to data. This may be familiar to you in the audience, but it begins with a DNA sample, and off the sequencer you produce what’s known as a FASTQ file, which comprises the raw DNA sequencing reads along with quality metrics for the accuracy of each of the nucleotide calls. That data, through bioinformatics processes, gets assembled into what are known as BAM files (the B is for binary; SAM is the Sequence Alignment/Map format). The “CHR” there is to indicate that you take those individual reads of FASTQ data and map them to loci on each chromosome, so that you can pile up the reads at a locus and detect any variants there. At the patient level, all of those individually mapped reads and called variants get assembled into what’s known as a VCF file, which is at the gene and mutation level and can tell you, for a given patient or, in the case of multi-patient VCFs, an entire cohort, what variants you have found in that individual or those individuals. So FASTQ, BAM, and VCF comprise the secondary analytic trio. For CNVs the file format is BED files, which give chromosome, start, and end, along with a call of zygosity and of deletion or amplification. Fusion pipelines from whole genome sequencing or from RNA-Seq have their own bioinformatics idiosyncrasies, but ultimately the file culminates in, and again I’ll reiterate, putative fusion events. Just because you see an event in your fusion calling output does not mean it doesn’t require validation. There are a number of artifacts from fusion identification that you’ll want to be wary of.
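
As a rough illustration of the file formats described above, the following Python sketch reads one toy record of each kind. The records, coordinates, and function names are invented for illustration, and the Phred+33 quality encoding is an assumption about the sequencer output; a real pipeline would rely on established tools rather than hand-written parsers.

```python
# Illustrative only: toy records with made-up coordinates, not real pipeline output.

def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string (Phred+33 assumed) into integer scores."""
    return [ord(ch) - offset for ch in quality_line]

def parse_fastq_record(lines):
    """Split one four-line FASTQ record into read id, bases, and qualities."""
    header, bases, _plus, quals = lines
    return {"id": header[1:], "bases": bases, "quals": phred_scores(quals)}

def parse_vcf_line(line):
    """Pull the first five fixed columns out of one tab-separated VCF data line."""
    chrom, pos, _id, ref, alt = line.split("\t")[:5]
    return {"chrom": chrom, "pos": int(pos), "ref": ref, "alt": alt}

def parse_bed_line(line):
    """Read chromosome, start, and end from a BED line (0-based, half-open)."""
    chrom, start, end = line.split("\t")[:3]
    return {"chrom": chrom, "start": int(start), "end": int(end)}

read = parse_fastq_record(["@read1", "ACGT", "+", "II5!"])
variant = parse_vcf_line("chr1\t12345\t.\tA\tT\t50\tPASS\t.")
segment = parse_bed_line("chr1\t1000\t5000\tDEL")
```

The point of the sketch is only the shape of the data: per-base quality in FASTQ, one variant per line in VCF, and one interval per line in BED.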

Tertiary analysis is, as I said, looking at individual variant-level data and interpreting the clinical or biological functional significance of that data. We at Genomenon coined the phrase Evidence Triad, which goes along with the ACMG/AMP guidelines, the clinical gold standard for interpreting the meaningfulness of variants. That triad includes the predictive models of pathogenicity, examples being SIFT and PolyPhen-2, and the population frequency data, examples being gnomAD or ExAC. Those two put together in this Evidence Triad ACMG process are seldom sufficient to call a variant pathogenic. The real tie-breaker, and what Genomenon has focused its efforts on, is unlocking the information and evidence from the published literature, and there are a couple of additional resources that can help you determine in tertiary analysis the meaningfulness of an individual variant. I have listed a couple up here. ClinVar is one common example. I’ll say for ClinVar that it’s very broad in that it covers a multitude of different disease types and genes, but it is fairly shallow: there’s not a surfeit of genetic evidence for any given gene in ClinVar, and for any given data point the evidence that’s promulgated in ClinVar is certainly not comprehensive. So there are some blind spots in ClinVar data, again being very broad but shallow.
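
To make the idea of three lines of evidence concrete, here is a minimal Python sketch. It is not the ACMG/AMP rule set and not Genomenon’s method: the class, the field names, the rarity cutoff, and the status labels are all hypothetical, chosen only to show why predictive and population data alone leave a variant unresolved until literature evidence is added.

```python
from dataclasses import dataclass

@dataclass
class VariantEvidence:
    name: str
    predicted_damaging: bool   # summary of an in-silico predictor (assumed input)
    population_af: float       # allele frequency from a population database
    supporting_papers: int     # publications with functional or segregation data

RARE_AF = 0.001  # hypothetical rarity cutoff, for illustration only

def triad_summary(ev):
    """Report which of the three evidence lines support pathogenicity."""
    lines = {
        "computational": ev.predicted_damaging,
        "population": ev.population_af < RARE_AF,
        "literature": ev.supporting_papers > 0,
    }
    if all(lines.values()):
        status = "all three lines present"
    elif lines["computational"] and lines["population"]:
        status = "needs literature evidence"
    else:
        status = "insufficient support"
    return lines, status

unresolved = triad_summary(VariantEvidence("toy_variant_1", True, 0.0001, 0))
supported = triad_summary(VariantEvidence("toy_variant_2", True, 0.0001, 3))
```

In this toy model the first variant stays unresolved despite a damaging prediction and a low population frequency, which mirrors the point that the literature acts as the tie-breaker.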

On the flip side there’s a database known as OncoKB, which might be known to attendees. I’ll say that it is narrow in that it’s obviously focusing on cancer, and even within cancer it focuses on a handful of genes and, within those genes, a handful of variants. But it does go into great depth in understanding each of those variants in each of those genes and how known drugs are brought to bear for treating each of those lesions. So ClinVar is broad but shallow; OncoKB is deep but narrow. Then what we at Genomenon have produced is the Mastermind Genomic Search Engine, which we think comprises the whole ocean of information about genetics and genomics, as it indexes the published literature looking for disease-gene-variant associations. Each of these three tools, and multiple others, can be brought to bear to facilitate tertiary analysis. Then, to put a finer point on what I described earlier as quaternary analysis, another way to understand it is looking at the cohort at large: putting all of this data together at the population level, taking the individual variants from tertiary analysis and understanding their interpretation, or lack of interpretation if there’s no evidence for them, and putting all that together to understand what’s going on in your cohort.

I’ll begin by talking about how to aggregate the data. I talked about the external sources of data, including Mastermind from Genomenon and others. There are also internal sources of data, such as the electronic medical or health record, which you likely have as part of your data set for each of the patients within a cohort, and then obviously the sequencing data. Those three sets of data need to be homogenized, especially if they’re coming from different sources, and then reconciled so that you can put them together to conduct the annotation, which is the next phase. The annotated data can then be filtered based on quality control thresholds, as Alex alluded to, or, somewhat more challenging, on biological parameters based on what your hoped-for findings might be. An example of that would be if you’re looking for gain-of-function mechanisms. There are dominant negative changes, and there are changes that ostensibly look like loss of function but may lead to gain of function. If you’re focused on gain of function, you might put a priority on, say, substitution variants and deprioritize loss-of-function nonsense variants, and there are other aspects of the biological understanding that you have of the disease and the target that you can bring to bear to annotate the data. To round that conversation out, you’ll want to prioritize the components of the data that you think are likely to be meritorious, so that when you complete the assessment, the third phase of quaternary analysis, you’ll be able to most efficiently ask and answer the questions that will get you to the biomarker that you’re seeking through this cohort discovery process.
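
The filter-then-prioritize step described above can be sketched in a few lines of Python. The quality threshold, the consequence labels, and the ranking are all hypothetical values picked for illustration. Note that nonsense variants are ranked lower rather than discarded, since changes that look like loss of function can still lead to gain of function.

```python
QUAL_MIN = 30.0  # hypothetical quality-control threshold

# Hypothetical ranking for a gain-of-function search: lower rank is reviewed first.
GOF_PRIORITY = {"missense": 0, "inframe_indel": 1, "nonsense": 2, "frameshift": 2}

def filter_and_prioritize(variants, qual_min=QUAL_MIN):
    """Drop low-quality calls, then order the rest by biological priority."""
    passed = [v for v in variants if v["qual"] >= qual_min]
    return sorted(
        passed,
        key=lambda v: (GOF_PRIORITY.get(v["consequence"], 3), -v["qual"]),
    )

annotated = [
    {"id": "v1", "consequence": "nonsense", "qual": 60.0},
    {"id": "v2", "consequence": "missense", "qual": 45.0},
    {"id": "v3", "consequence": "missense", "qual": 12.0},  # fails QC
    {"id": "v4", "consequence": "missense", "qual": 90.0},
]
ranked = [v["id"] for v in filter_and_prioritize(annotated)]
```

Swapping the priority table is how a different biological hypothesis, for example a loss-of-function search, would be expressed in this sketch.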

A cohort is, particularly in a clinical trial, a group of patients receiving the same treatment. It should be emphasized here, and I’ll harken back to that first slide, that phenotypic homogeneity is obviously necessary, but genotypic homogeneity is very beneficial. When I say genotypic homogeneity I don’t mean the selfsame molecular lesion, but I do mean an awareness that the biological consequence of each individual genotype is the same. So having that idea in mind, seeking genotypic or biological homogeneity is obviously beneficial. This, as I alluded to, can be assessed by inclusion or exclusion criteria, looking for the presence or absence, respectively, of a genetic mutation. That’s true of individual clinical trial conduct through cohort analysis, but it’s also increasingly true of very large multinational, multi-corporation cohort studies at a country level, a very large population level. Alex, I know that you’re much closer to these studies than I have ever been. I wonder if you can add some color to what I’ve just described there?

Alex: Absolutely, Mark. These large-scale population studies really started with the deCODE project in the early 2000s, with an attempt to genotype the entire Icelandic population, which is very homogeneous and so provides for a good cohort. Since then deCODE has been purchased, and I’m sure they’re now doing next-gen sequencing on the Icelandic population.

Mark: For the sake of time, I’m going to jump right to the last section. If there are any questions, Alex does have a great deal of understanding of how all those trials are conducted and the value of those. If we have time for questions we can get to those at the end, or otherwise we can respond to them afterwards. In the interest of time, I’ll jump to the last component of the talk, and that is to go through some representative case studies, many of which have come out of the work of Genomenon and Genomenon’s founders. This one happens to have stimulated my interest personally in precision medicine. This is a gain-of-function lesion, the BRAF V600E lesion in Hairy Cell Leukemia, that was understood by sequencing a large cohort, I think 50 or 75 patients who had Hairy Cell Leukemia, using whole genome and whole exome sequencing. This is from a New England Journal of Medicine study from 2011 that found that BRAF V600E lesions are highly penetrant: upwards of 95% of Hairy Cell Leukemia patients have this single BRAF mutation. It should be known to you that that’s a gain-of-function lesion, it’s a known oncogenic driver, it’s highly penetrant across all patients who have this disease, and there’s a series of drugs that treat its effects. This is what I would call an exemplar of the best outcome of one of these cohort studies: gain of function, a known oncogenic driver, an already existing drug available, and a single variant that is highly recurrent.

I want to take a step back now and talk about a more likely scenario: not finding the same variant recurring across your cohort, but rather revising your understanding of what recurrence means to say not the same variant, but similar variants. In this case, a study that Genomenon team members performed and that I spearheaded, in splenic marginal zone lymphoma there were highly recurrent NOTCH2 mutations. Some were individually recurrent, but in aggregate they were highly recurrent together across the cohort, and not only did they occur in the same gene, they occurred in the same functional domain, as you see there. They were loss-of-function mutations, or ostensibly loss-of-function mutations, that removed a negative regulator in the PEST domain from NOTCH2 and led to a gain of function, so this was a novel finding in the context of splenic marginal zone lymphoma. The data came together in Genomenon’s quaternary analysis by revising our understanding of what recurrence meant after having performed the next-generation DNA sequencing studies, and we benefited greatly in refining the candidacy of these variants by recognizing the importance of NOTCH and NOTCH signaling in the native development of hematopoietic cells, as well as the role of, say, NOTCH1 in producing T cell leukemias. So BRAF V600E is a single variant in the context of Hairy Cell Leukemia; in the context of NOTCH2 you have highly recurrent types of mutations in the same gene that are focused on a particular domain.

Taking another step back, again with a study from Genomenon in our early days, the next level is identifying recurrent lesions across a family of genes and the pathway participants around that family. In this case, for T-Prolymphocytic Leukemia, various members of the JAK-STAT signaling pathway were affected, from very upstream protein products like the interleukin-2 receptor all the way down to the end effector STAT5B. In aggregate there were highly recurrent mutations that impacted this pathway, again through a gain-of-function mechanism, and again there was a drug on the shelf that was known to be able to target the effects of these lesions culminating in STAT5B gain of function. So again: Hairy Cell Leukemia, BRAF V600E, one variant; splenic marginal zone lymphoma, NOTCH2, one gene; T-Prolymphocytic Leukemia, the JAK-STAT pathway, one gene pathway. All are gain of function, all with extant therapies that could be brought to bear to treat these patients, making for a very effective, efficient, and highly likely to be successful clinical trial.

I’m taking another step back and moving away from the variant, gene, or family and pathway level to talk about biological phenomena, so again revising your understanding of what recurrence means. This is now a loss-of-function mechanism that we uncovered at Genomenon in the context of another leukemia called Sezary Syndrome, where we found recurrent loss-of-function lesions causing abrogation of normal epigenetic modification in these now leukemic cells. This was a multi-omic study; perhaps you can appreciate that from the figure that I’m showing here. I’ll highlight ARID1A, which was the most highly recurrent lesion across all epigenetic modifiers. The top colored images there showcase recurrent copy number change: deletions that pointed to very focal deletions of the ARID1A gene in some patients, and then in other patients a series of mutations that presumably also lead to loss of function of ARID1A, and that in downstream studies were confirmed to do so. Those two data types complement each other: loss-of-function mutations in ARID1A and copy number deletions in ARID1A. Taking a step outside that gene and looking at, say, TET1 and TET2 or the MLL gene family members, all of these, in different patients and at different levels, were recurrently mutated or otherwise deleted. In aggregate, the complexion of the lesions and the mechanism of disease in Sezary Syndrome pointed to the epigenetic modification pathway being disrupted as a result of these different genetic lesions.
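
The idea of revising what recurrence means, from the same variant to the same gene to the same pathway, can be sketched as counting carriers at different levels of aggregation. The patients, gene names, and pathway labels below are placeholders, not data from any of the studies discussed, and the function name is hypothetical.

```python
from collections import defaultdict

def recurrence(calls, level, n_patients):
    """Fraction of patients carrying at least one lesion in each group at `level`."""
    carriers = defaultdict(set)
    for call in calls:
        carriers[call[level]].add(call["patient"])
    return {group: len(patients) / n_patients for group, patients in carriers.items()}

# Placeholder cohort of four patients, each with a different variant.
calls = [
    {"patient": "P1", "variant": "GENE_A:v1", "gene": "GENE_A", "pathway": "PATH_X"},
    {"patient": "P2", "variant": "GENE_A:v2", "gene": "GENE_A", "pathway": "PATH_X"},
    {"patient": "P3", "variant": "GENE_B:v3", "gene": "GENE_B", "pathway": "PATH_X"},
    {"patient": "P4", "variant": "GENE_C:v4", "gene": "GENE_C", "pathway": "PATH_Y"},
]

by_variant = recurrence(calls, "variant", n_patients=4)
by_gene = recurrence(calls, "gene", n_patients=4)
by_pathway = recurrence(calls, "pathway", n_patients=4)
```

In this toy cohort no single variant is seen in more than one patient, yet the same calls grouped by pathway show one pathway hit in most of the cohort, which is the pattern described for heterogeneous lesions converging on a shared mechanism.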

This is my last content slide. Again I’ll refer the viewers back to the Dugger article from Nature Reviews Drug Discovery. In it, Sarah talks about the promise of genomics in drug discovery, and she says that, particularly for complex and heterogeneous disease, genomics is able to provide a key window into the pathogenic mechanism of those diseases by identifying, even in infrequent cases, very strong, cogent evidence of activating mutations as a window into the larger landscape of lesions that lead to the same effect as those strongly activating mutations. Likewise, in conditions with genetic heterogeneity, as I alluded to, there’s convergence on a pathway or a gene family or otherwise a biological phenomenon that can only be uncovered using these genomic techniques. Lastly, she highlights the idea that genomics is allowing for the convergence of treatment strategies across multiple genetically related disease types, which wouldn’t otherwise be uncovered absent genomics. In particular, in the case of T-Prolymphocytic Leukemia, the drug whose effect on these cells we tested was pimozide, which affected STAT5B signaling and which had been on the shelf and in use in the context of psychiatric disease for many decades. Uncovering the genetic lesion that drives T-Prolymphocytic Leukemia unlocked the door and allowed us to know that this drug would be appropriate in attempting to treat the effects of those lesions, and so that’s a real-world example of what Sarah was alluding to in her review article.

I’ll end here with a description of what I alluded to earlier, which is to say Mastermind is what Genomenon is building as a result of the early work that I described here, and because we encountered the need for it in our own clinical work. It’s a genomic database, the most comprehensive of its kind, predicated on empirical evidence from the medical literature, comprising many millions of full-text, fully indexed genomic articles and many hundreds of thousands of supplemental datasets, looking very comprehensively for mention of disease-gene-variant associations. We’re making it available to clinicians through our web-based software, as well as to pharmaceutical companies through our data licensing of very thoroughly curated genomic landscapes of all lesions within a given gene or a given gene pathway. So with that I’ll adjourn, and I’ll pass it over to Candace to see if she has any closing remarks.

Candace: Hey, thanks so much Mark, and thank you Alex. I do want you to know that we still have ten minutes and we’re gonna use that for some Q&A, so if you have time to stick with us till the top of the hour please do. I just wanted to let you know briefly about Mastermind. If you’re interested in checking it out, we have a free version, so if you have not already requested a Mastermind account you can do so at bit.ly/mm-pharma. This will give you a license for the Mastermind Basic Edition. It will start you out with 14 days of the Mastermind Professional Edition to try. It’s something I recommend; even as a non-scientist, it’s pretty amazing. Let’s get to the questions.

One question is can better measures of treatment response make it easier to identify genetic predictors of differential response?

Mark: I can take that. The way I’ve conceptualized this is that in the slides I talked about setting up your initial patient cohort based on molecular criteria, say a companion diagnostic that you come to the table knowing. When you’re looking at the response data, you may have heterogeneous response across your patient population even though they have what are otherwise homogeneous molecular inclusion criteria. If there are modifying genetic lesions, having sequenced the entire exome or the entire genome will allow you to segregate the responders from the non-responders and repeat the quaternary analysis to ask what unites the responders molecularly and makes them distinct, with that genomic data in mind, from those who don’t respond. It’s another sort of analytic filter that you pass your patient population and response data through, and I’ve actually seen that happen in some of the work that we’ve done with our clients, where there was a cogent, at least putative, reason at the genetic level for why the patients who responded may have responded. I would say quaternary analysis is a technique that can be used upfront in establishing cohorts, but also downstream in parsing the response data from those cohorts. So that was a great question.
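
One simple way to ask what unites the responders is to compare how often a candidate lesion appears in responders versus non-responders. The sketch below uses a one-sided Fisher’s exact test written with the standard library. The counts are invented, the function name is hypothetical, and the choice of test is an assumption made for illustration rather than the method used in the client work mentioned above.

```python
from math import comb

def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact p-value for enrichment of carriers in responders.

    2x2 table: a = responders with the lesion,     b = responders without,
               c = non-responders with the lesion, d = non-responders without.
    Returns P(X >= a) under the hypergeometric null with fixed margins.
    """
    responders = a + b
    carriers = a + c
    total = a + b + c + d
    upper = min(responders, carriers)
    numerator = sum(
        comb(carriers, k) * comb(total - carriers, responders - k)
        for k in range(a, upper + 1)
    )
    return numerator / comb(total, responders)

# Invented counts: 8 of 10 responders carry the lesion, 1 of 10 non-responders does.
p_value = fisher_one_sided(8, 2, 1, 9)
```

A small p-value here would only flag the lesion as a candidate modifier of response. With many lesions tested across an exome or genome, correction for multiple comparisons and independent validation would still be needed.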

Candace: Thanks Mark. Related to what you just said, someone asked: can you elaborate on the variant curation process and how it fits into quaternary analysis?

Mark: Sure, I’ll speak a little bit to the industry standard approach to tertiary analysis, and then I’ll talk about how Genomenon performs the tertiary analysis that bleeds into the quaternary analysis that we perform. The ACMG criteria rely on the predictive data from SIFT and PolyPhen, the population data from gnomAD, and the publication data from PubMed or Mastermind. There’s a set of guidelines, the ACMG/AMP guidelines for hereditary disease as well as for cancer, and those guidelines have broken down each of those lines of evidence, and within the publication data they further refine the type of evidence that can come out of those studies. The most salient distinctions are functional studies, empirical studies that have shown this variant to have an impact in some way, gain of function or loss of function, as well as segregation studies, which find that the variant is most frequently associated with patients and is not found in healthy normal controls. That information almost always has to come out of the published literature, and so in the Evidence Triad we try to put that at the top, so it’s the keystone that unlocks these pathogenicity interpretations. That’s tertiary analysis, and for each individual variant that you see from patients in your cohort, or the cohort of patients at large, that information about known pathogenic or likely pathogenic variants can be used to annotate the data in the cohort. Then, as I suggested, in the quaternary analysis you prioritize what you’d like to focus on and what data of merit you’d like to put first.

An example would be in Hairy Cell Leukemia: if we didn’t know that BRAF V600E was an oncogenic driver, it would have taken longer to uncover that variant as the likely causative agent of Hairy Cell Leukemia. Even though it was highly prevalent, it could have been presumed to be an artifact or a polymorphism. Or, as I suggested, suppose it were at a lower frequency or the lesions were more heterogeneous, and you didn’t know that the variants are typically associated with disease, just not the disease that you’re looking at, or that the gene is typically associated with cancer, but not the particular cancer that you’re looking at. Having that tertiary analytic data to annotate your cohort information and drive prioritization in the quaternary analysis is what makes for really effective use of both tertiary and quaternary analysis.

I’ll say just about Genomenon’s process that when we’ve identified a pathway of interest, we review all of the variants known from these databases and from the published literature and produce, very efficiently, a comprehensive collection of data that culminates in industry-standard, clinically significant interpretation and annotation that can then drive downstream quaternary studies. That means annotating your patient population data, or decorating the data from the sequencing study, with the curated variant data that we produce using our machine learning technique coupled with our manual review, and then allowing the data to speak to you and tell you, when you do your quaternary analysis, what the most important component is that you should pay attention to moving downstream in R&D or otherwise in your cohort selection process.

Candace: Thank you so much Mark and thank you Alex! We had a ton of questions just roll in right at the end there so we don’t have time to answer all of them, but I promise that I will have Mark and Alex look at the questions and answer them in text form. I’ll be sending a follow-up email in the next few days with a link to the recording and slides, and we’ll put up the questions and answers on the webpage. I’m just so thankful for all of you and your attendance, and there were a few comments in there “great presentation” and I happen to agree! I hope that you got a lot out of it and I hope that you continue to stay connected to us. Follow us on all social media @Genomenon and stay in touch on Genomenon.com and we’ll look for you at our next webinar.

Thank you!
