Rhythm Pharmaceuticals discuss their work with Genomenon to create the landscape of genomic mutations associated with rare genetic disorders of obesity. They describe how the genomic landscape was optimized to facilitate a deep understanding of the variant landscape of melanocortin-4 receptor (MC4R)-pathway genes. The end result of their work may help identify MC4R-pathway deficient individuals who might benefit from future precision therapies.
Genomenon indexed over 6 million full-text genomic articles to identify 120 genes and over 10,000 variants associated with obesity in the medical literature. Each individual genomic variant was interpreted using the evidence assembled through a machine-learning based technical process. This novel Artificial Intelligence (A.I.) approach vetted and annotated each variant using American College of Medical Genetics and Genomics (ACMG) guidelines.
Watch Alastair Garfield, PhD, Vice President, Translational Research & Development (TRAD) at Rhythm Pharmaceuticals and Dr. Mark Kiel, Founder and Chief Science Officer at Genomenon, as they share how a database of genes and variants associated with obesity was developed in less than 60 days, including scientific evidence complete with literature citations and ACMG interpretations for each mutation. The machine-learning driven process replaced years of manual research of the scientific literature to find and interpret these obesity-related mutations.
You will learn:
- The importance of having the entire published landscape of genetic evidence at your fingertips in the precision drug development process
- How Genomenon rapidly assembled a comprehensive biomarker database of the genetic evidence tied to obesity
- How A.I. and Machine Learning can be used to interpret the genomic variants by ACMG guidelines in a fraction of the time of manual curation processes
Q: Can you share the slides?
A: Yes! Use this link to view the slides from the presentation: Rhythm Webinar Slides.
Q: Are you able to produce a list of diseases (or phenotypes) associated with a list of variants just as you are able to produce a list of variants for any given disease indication? Which disease and phenotype ontology do you use?
A: Yes! For diseases, Mastermind uses the Medical Subject Headings (MeSH) ontology. For phenotypes, we have relied on the Human Phenotype Ontology (HPO). The latter is not yet in the user interface, but has been used for custom projects and data analysis.
Q: What about the gene expression studies?
A: Gene expression studies are included in the Mastermind indexing process, and can be organized and annotated in our service work. Genome-Wide Association Studies (GWAS) are also included in the Mastermind index.
Q: Can you please expand on the curation process. Do you prioritize specific resources? How do you solve contradicting data?
A: After establishing the diseases or phenotypes of interest, as well as understanding client-specific areas of focus (e.g. drugs of interest or functional assays of merit), a prioritized list of genetic variants is produced using the Mastermind Genomic Search Engine.
Each of the variants is then manually reviewed by referring to each individual citing reference and examining descriptions of functional or clinical studies pertinent to the variant interpretation schema required of the project. For instance, for constitutional disease, the ACMG framework is useful to determine the pathogenicity of a variant, whereas for oncology, the AMP/ASCO guidelines are more relevant.
Resources are prioritized in the same way that human reviewers would prioritize references – by journal and study quality and by the cogency of the data described. Conflicting data when encountered is annotated and discussed with clients to resolve any ambiguities in the data deliverable.
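As a rough illustration of the prioritization and conflict annotation described in this answer, the Python sketch below groups citations by variant, ranks them by a reviewer-assigned quality score, and flags variants whose citations disagree. It is a minimal sketch under stated assumptions, not Genomenon's actual process: the `Evidence` record, the `quality` score, the assertion labels, and the `summarize` function are all hypothetical names introduced here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    """One citation's statement about one variant (hypothetical record layout)."""
    variant: str    # placeholder variant label, e.g. "GENE_A:var1"
    citation: str   # placeholder citation identifier
    assertion: str  # assumed labels: "pathogenic", "benign", or "uncertain"
    quality: int    # assumed reviewer-assigned score; higher means stronger study


def summarize(evidence):
    """Group evidence by variant, rank citations by quality, and flag conflicts."""
    by_variant = defaultdict(list)
    for item in evidence:
        by_variant[item.variant].append(item)

    summary = {}
    for variant, items in by_variant.items():
        ranked = sorted(items, key=lambda e: e.quality, reverse=True)
        assertions = {e.assertion for e in ranked}
        summary[variant] = {
            "ranked_citations": [e.citation for e in ranked],
            # A conflict here means both a pathogenic and a benign claim exist.
            "conflict": {"pathogenic", "benign"} <= assertions,
        }
    return summary
```

In this sketch a flagged variant is only marked for discussion, in line with the answer above: conflicts are annotated and raised with the client rather than resolved automatically.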
Q: I can understand the improvement in diagnosis with gene identification. But how does it impact treatment?
A: We show the evidence in the medical literature for genes and variants, including studies that show which genes and variants interact with which targeted treatments and that reveal related biological pathways. This evidence enables more informed treatment decisions.
Q: I can imagine that epigenetic modifications could contribute to dysfunction in the MC4R pathway and other pathways that contribute to obesity. Can Genomenon’s capabilities be applied to epigenetic as well as genetic information?
A: Yes. Epigenetic modifications are identifiable using Mastermind technology.
Q: Do you capture the specific symptoms (as well as the disease name) that the variant is associated with, and do you use a controlled vocabulary of symptoms?
A: From Dr. Kiel. We are able to capture the phenotypes as described above according to the Human Phenotype Ontology (HPO). We are able to capture any information that may be helpful to variant interpretation according to client-specified needs provided the ontology is available for use.
Q: Do you foresee connecting the Mastermind database to other databases that aggregate patient sequencing data from clinical or research-focused testing?
A: Yes. We have disclosed partnerships with VarSome and GenomeNext – both of whom aggregate population and prediction data relevant to the pathogenicity determination of genetic variants. More partnerships are in the works! In our project work, we are able to integrate dozens of additional data sources per client needs.
Q: What’s your strategy to resolve issues when there is conflicting information in the literature?
A: First and foremost, we present the evidence, which is to say we try to surface all information. For custom projects and data analysis, we annotate conflicting information and present it to the customer. Over time, we’d like to enable these kinds of annotations directly in Mastermind as well.
Q: Can you please describe your strategy for assuring semantic veracity within and across inbound content streams over time?
A: We’ve spent years refining our process, making sure it’s repeatable and accurate, erring on the side of sensitivity, and then using our own data and feedback from customers to optimize specificity as well. We have a blog post that expands upon this approach as well: Read the Blog.
Good morning or good afternoon, everyone. I’m Julia Karow, Managing Editor at GenomeWeb, and I’ll be your moderator today. The title of today’s webinar is: Mapping the disease variant landscape to accelerate precision drug development. The sponsor of our webinar is Genomenon. Our speakers today are Dr. Alastair Garfield, Vice President of Translational Research and Development at Rhythm Pharmaceuticals, and Dr. Mark Kiel, Founder and Chief Science Officer at Genomenon.
You may type in a question at any time during the webinar. You can do this through the control panel, which usually appears on the right side of your screen. Just click on the Q&A box in the control panel to ask a question, and please make sure that the drop-down menu says “all panelists.” If you do not see the Q&A box in the control panel, just move your cursor over to the left, to the main panel, until you see the control panel icons on the bottom, and then go to the options menu and select Q&A. We will ask the speakers your questions once their presentations have concluded. So let me turn it over to Dr. Kiel. Please go ahead with your presentation.
Thank you, Julia, and welcome, all attendees. I’m very pleased to be hosting this webinar with, by now, a very good friend of mine, Alastair Garfield from Rhythm Pharmaceuticals. I’m going to begin and spend about five minutes just talking about what we’re doing at Genomenon, how it’s applicable to the data and science that Rhythm was collecting and performing, and how Mastermind, which is Genomenon’s variant database, was helpful in bringing some of this information to light for Rhythm. I’ll begin by saying that the data is freely available. For those of you who are not aware of Mastermind, it’s cloud-based software that’s freely accessible, and you can sign up at the link that’s provided on the screen below and that will be provided in subsequent emails to all attendees.
Let’s start with what Genomenon is. To put it briefly, Genomenon is organizing the world’s genomic information, and we put a strong emphasis on evidence, the evidence from the source of truth, which in our case means a very strong focus on the empirical medical literature. So the Mastermind database that forms the core of the Genomenon technology comprises an understanding of all of the 30 million titles and abstracts in the medical literature, including 6 million plus full-text articles: their text content, the information in the figures and in the tables, as well as, within the past three months, the supplemental data for many, many thousands of individual studies. What we’re doing with that information is organizing that genomic knowledge over the whole spectrum of human disease, including 10,000 plus individual diseases or syndromes or phenotypes, and all of the genes in the human genome, no matter how they may be described by an author. Collectively, when we index these source materials, we are able to collect all of the mentions of any one of millions of individual genetic variants, no matter how the author may be describing those, and annotate all that information with a rich collection of, at this point, up to a thousand clinically and functionally relevant keywords.
What we’re talking about today is how we make that data available to either the users of our software or the customers that we engage with for some of our data services. In this case we’re talking about an evidence-based genomic landscape. I want to start here at the end so that everybody can have this in their mind as Al talks about what Rhythm is working on, and Al and I, toward the end of the webinar, will have a bit of a conversation about it. An evidence-based genomic landscape, produced using the Mastermind database through Genomenon’s data services, covers any given indication or set of indications: diseases, phenotypes, phenomena.
Mastermind is used to produce a list of annotated and evidence-supported variants for any gene, or any one of dozens or hundreds of genes, associated with that indication. Why that matters is that understanding the genomic landscape typically requires capturing and reconciling information from three different sources. It’s helpful that these three different sources alliterate, so that it’s easy to remember; we’ve depicted them there in that triad. The first is population frequency data, which is particularly important if you’re talking about constitutional diseases that you’re born with, but is also relevant for hereditary variants that may be associated with cancer risk. The second arm there is predictive models of pathogenicity, so things like the in silico models SIFT and PolyPhen, which use mathematical models to predict the functional consequence of an individual variant on a protein. The third, and what we think is the most important, you could say the keystone for this triad, is information from the empirical published literature, and throughout Al’s talk and in our conversation we’ll illustrate why this information from the published literature is so important.
In the context specifically of how we provide data that’s useful for pharma and biotech, these are three different ways that we provide that value. The first, which is what we’re going to be highlighting here, is that genomic landscape for drug discovery. This information is also useful for clinical trial target identification: organizing patients, and identifying patients who have variants that are known to be consequential or may be targetable with a given compound being investigated. And thirdly is looking for new indications, or drug repurposing as it’s commonly referred to.
Mastermind has in its database a comprehensive list of disease-gene and disease-gene-variant associations, and if you have a pathway that’s targetable by a given compound and you’re interested in understanding the patient populations that are affected by those variants, the services that Genomenon provides using Mastermind are able to make predictions about the best path forward for drug repurposing activities.
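The triad of evidence sources described earlier in the talk (population frequency data, predictive models of pathogenicity, and the published literature) can be pictured as one record per variant. The Python sketch below is illustrative only: the `VariantEvidence` class, its field names, and the `missing_sources` helper are assumptions made for this example and do not describe Mastermind’s data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class VariantEvidence:
    """Hypothetical per-variant record holding the three evidence sources."""
    variant: str
    # Allele frequency taken from some population database (assumed field).
    population_frequency: Optional[float] = None
    # Tool name -> predicted label, e.g. from SIFT or PolyPhen (assumed field).
    predictive_models: Dict[str, str] = field(default_factory=dict)
    # Citation identifiers from the published literature (assumed field).
    published_literature: List[str] = field(default_factory=list)

    def missing_sources(self) -> List[str]:
        """Return the names of evidence sources that are still empty."""
        missing = []
        if self.population_frequency is None:
            missing.append("population_frequency")
        if not self.predictive_models:
            missing.append("predictive_models")
        if not self.published_literature:
            missing.append("published_literature")
        return missing
```

A record with only a population frequency filled in would report the other two sources as missing, which is one simple way to see where literature evidence still has to be gathered.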
Very quickly, I think I just have three more slides before I turn the ball back over to Julia to introduce Al. There are a couple of use cases that I want to focus on. One is that we can present this information to you at a very high level, at the gene level. I’ve got a couple of representative examples here. One is collecting all of the genes that are known to be associated with DNA damage repair pathways, again coming from the medical literature in a very comprehensive approach, all of which is justifiable from the medical literature and prioritized according to custom client requirements. Other examples include the variants associated with a variety of metabolic malignancies, which happens to be my area of research. And thirdly, just to expand the scope of what Genomenon is able to offer, would be all cancer-related fusions. There’s a gap in the data from publicly available data sources, or even commercially available data sources, for really understanding at a comprehensive level all of the fusion events, their breakpoints, and how they behave in disease and through treatment in the context of cancer. Mastermind has been used to meet that end for clients.
The second case, and what we’re going to focus on here today, is a deeper level of information at the individual variant level, and that’s where we start to talk about its utility in diagnostic circumstances. It’s also helpful in the context of understanding what your drug is doing and having a real solid grounding in an awareness of your drug pathway and your disease pathway from a genetic perspective. So a trivial example there is a comprehensive landscape of all of the genetic variants in dozens of different genes that have been linked to acute myeloid leukemia, or otherwise, much more broadly, understanding the functional variant landscape for any one of the hundreds of genes that are known to be associated with causing cancer: oncogenes, tumor suppressors, and the like. This is how we’re able to do it. It’s a lot of information on the slide; what I’ll do is summarize it fairly quickly and then pass the ball over.
We have this data on hand. We can understand what the focus of the study is and tailor a query of our database to that effect, then annotate the data, filtering and prioritizing by the information that is provided to us in conversation about the nature of the work and the results that the client is looking for. We have what I think are impressive tools that help us auto-curate and auto-organize the information, facilitated in many ways by some of our machine learning and computational intelligence activities, and critically, all of that information is manually reviewed. We have a mind toward maximizing the sensitivity of our database assembly. We like to say we leave no stone unturned, but we don’t want to show you a rock pile; we want to make sure that you’re getting out of that process all of the gold nuggets that you were looking for when you began the project. So this is a landscaping trajectory for identifying either disease-gene associations or disease-variant associations that in many cases can be enacted in a matter of weeks. Such was the case when I had the good fortune of meeting with Al Garfield at Rhythm and his colleague Ida; that’s exactly what we did for them in the context of understanding rare genetic disorders of obesity. So with that, Julia, I’ll pass the ball over to you and we can jump right into the content of the webinar.
Thank you, Mark, for this nice introduction. I just want to remind everybody: if you have a question, please type it into the Q&A box, and again, if you do not see the Q&A box in the control panel, just move your cursor over to the left, to the main panel, until you see the control icons on the bottom, and then go to the options menu and select Q&A. All right, our next speaker is Dr. Alastair Garfield from Rhythm. Please go ahead.
Good afternoon, thank you very much. Hi Mark, how are you doing? Thank you for the opportunity to talk a little bit about what Rhythm does and to flesh out some examples of how Genomenon has been helpful for us. Just to give a bit of context: we’re a small pharmaceutical biotech company located on the East Coast in Boston, Massachusetts. We were founded in 2008 and we’ve been growing from strength to strength since then; we’re about 50 or so employees as of now. Our raison d’être is really to try and deepen an understanding of the genetics that underlie a very specific and much underserved subpopulation of the obese community. We’re taking a leaf out of the world of oncology in choosing to no longer view obesity as a singular disease state. Much like in cancer, where we now know there is no one cancer (they’re typed by tissue and by underlying somatic mutation), Rhythm takes the view that obesity is a constellation of different disorders that could be stratified on the basis of the underlying germline genetics. But we don’t know nearly as much about the genetics of obesity as we do about cancer, and as a result there’s an awful lot of work to do to try and understand the underlying genetics of these disorders, in the hope that we may be able to improve the clinical care of those people in a way that considers more than just their BMI. At the end of the day, BMI is essentially just a number, and none of us like to be boiled down to numbers. It’s a number that belies a huge amount of very nuanced and very important biological information, not only for potentially improving clinical care but also in the way that we view people with different forms of obesity, because obesity is very much a stigmatized and somewhat marginalized disease state, and there is a conception that all obesity arises due to the failings or the lifestyle choices of the individual.
While undoubtedly environment, as an extrinsic factor, does play a role in the manifestation of obesity, what lies below the waterline is a very large impact coming from one’s inherited genetics. We know that body weight is the second-most heritable trait in humans: about fifty to ninety percent of an individual’s body weight can be predicted or defined on the basis of their genetics, so genetics is having a huge impact in defining one’s body weight. Now, genetics does play a role in all manner of different types of obesity. At Rhythm, our specialty is to focus on rare genetic disorders of obesity. The name is on the side of the can: it’s rare, it involves genetics, and it results in obesity. We believe this to be a distinct subset from what is termed polygenic obesity, or more colloquially general or common obesity, and this is the kind of disorder that you hear talked about in the press in terms of the obesity epidemic. Genetics plays a role in both situations, but to a varying extent. In polygenic obesity, the large part of this pie chart, what we find is that common mutations or common variants each produce a small effect on body weight and, in an additive way, come together to define a predisposition that, within the context of a permissive, obesogenic environment, can lead to the manifestation of obesity. About a hundred or so genes have been implicated in polygenic obesity.
That’s distinct from the kind of disorders that are Rhythm’s expertise and where we focus most of our attention, which is these rare genetic disorders of obesity. These can be broken apart on the basis of the kind of impact that the genetics has. In this situation, rather than having common variants applying a small amount of effect on an individual’s body weight, we have very high-impact rare mutations in one of about 70 or so genes, so it’s a kind of monogenic condition, and these end up having a very significant impact on that individual’s BMI. That’s really where Rhythm’s focus lies. We recognize that even within the rare genetic disorders of obesity there are subsets, and the more we can learn about the genetics, the more we’ll be able to stratify them to try and improve clinical care.
Now, one of the things that sets rare genetic disorders of obesity apart from common or polygenic obesity is not only the genetics underlying it but also the way in which these conditions present. Rare genetic disorders of obesity have some hallmark characteristics that can stand out to physicians when they’re seeing this. In the first instance it’s the timing of the onset of the obesity: children as young as two can present as being very obviously obese, and they basically continue to gain weight unabated through the majority of their lives, just becoming larger and larger. So the onset, and therefore the severity, is one component of rare genetic disorders of obesity. The other is the notion of hyperphagia, an insatiable, unrelenting hunger, a drive to consume food. We hear anecdotal stories about children and parents who have to lock refrigerators and hide food away because the drive to consume food in these individuals is so overwhelming, and obviously you can see how these two things are going to be connected.
Now, when we think about how body weight is regulated, this is where Rhythm really brings its expertise. We have a deep mechanistic understanding of the physiological pathways in the human body that ultimately try to dictate or control body weight, and it’s there that we find the errors arising that ultimately lead to pathogenic states like obesity. It’s a very simple energetic equation that defines body weight: it’s energy in, in the form of calories in the food and the drink that we consume, and it’s energy out, in terms of the way that we utilize that energy just to go about our everyday lives. So on one hand there’s your basal metabolic rate; there’s thermogenesis, how you are keeping your body temperature at the appropriate 37 degrees Celsius; and then there’s also volitional movement. I’m pacing up and down now, and that is going to be using energy that I have consumed. Now, our bodies tolerate acute fluctuations in that balance very well. We’re coming up to the holiday period: we’re all going to over-consume, we’re all going to be spending too much time on the sofa, and some of us may find our trousers don’t fit particularly well come the first of January. But the system will find a way of addressing that. There will be a slight elevation in your body temperature, your metabolic rate may increase, and you may choose as a New Year’s resolution that you’re now going to go to the gym.
So as a result you’ll be able to bring that weight back again, and that acute fluctuation is well tolerated. It’s the chronic unbalancing of the equation in either direction that ultimately leads to a pathological state. If expenditure exceeds intake, then you’re going to have insufficient energy to run the engine, you start to find fuel by burning off proteins in your muscle, and that’s essentially a starvation response. Rhythm’s focus is the other side of that coin, which is when intake exceeds expenditure. In this situation, unfortunately, due to the first law of thermodynamics, that energy can’t just be destroyed, so it ends up getting turned into a longer-term storage molecule in the form of fats, and that’s the expansion of one’s adipose tissue that ultimately is going to lead to obesity. So intake and expenditure is the very delicate balance that our bodies have to strike, and it’s the brain that ultimately regulates both sides of that equation, in particular an area of the brain called the hypothalamus, which is a very ancient structure; almost all vertebrate species have a hypothalamus of one form or another. The hypothalamus’s job here is to integrate, to assimilate, to coalesce a huge amount of information coming from the periphery that is there to try and tell the body what it should do. Within the context of consuming a meal, we know that you have hormones released from your gut, from your stomach, from your fat; your stomach is going to stretch, and that’s going to activate stretch receptors on your stomach to indicate that it’s full. All of that information ultimately finds its way into the central nervous system, whose job it is to then engender the appropriate response. So when we have a recognition of fullness, that’s when we’re going to put the knife and fork down and step away from the plate.
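The energy equation described above can be written compactly. This is only a sketch in notation chosen for this transcript; the symbols are assumptions and do not come from the speaker’s slides, and the three expenditure terms are simply the components named in the talk.

```latex
\[
\Delta E_{\text{stored}} = E_{\text{in}} - E_{\text{out}},
\qquad
E_{\text{out}} = E_{\text{basal}} + E_{\text{thermogenesis}} + E_{\text{movement}}
\]
```

A sustained positive $\Delta E_{\text{stored}}$ corresponds to the case where intake exceeds expenditure and the surplus is stored as fat; a sustained negative value corresponds to the starvation response.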
Now, one system within the hypothalamus that is critically important, and where Rhythm’s really deep understanding lies, is a system called the melanocortin-4 receptor (MC4R) pathway. At its most simplistic level it’s defined by two distinct populations of cells. On one hand we have this sensory population of cells that we call pro-opiomelanocortin (POMC) neurons, these blue neurons here, and it’s their job to act as the assimilator of all of that peripheral information. Once they’ve gained enough information, they’re going to release a neuropeptide which activates a second-order effector cell, which in this case is the melanocortin-4 receptor expressing neuron, the neuron in green on your screens. The activation of that cell is then going to set off a cascade of neurological events that brings about an awareness of fullness and therefore brings you to the conclusion that you should be putting your knife and fork down and stopping consuming food. Now, nothing in biology really proves just how important something is as well as what happens when you take it away. What we know from both preclinical and clinical work is that very rare loss-of-function mutations in any of the genes that comprise the melanocortin-4 receptor pathway (the POMC gene itself, the melanocortin-4 receptor gene itself, the leptin receptor gene) give rise to a very severe early-onset obesity that is characterized by hyperphagia, an unrelenting drive to consume food. So Rhythm’s focus is on trying to understand the genes within the pathway and ultimately how the mutations within those genes, which are identified in obese individuals, may be impacting their body weights. In deepening that understanding we may be able to bring some sort of improvement to the way that those people are perceived and ultimately how they are cared for by the community and by the industry. Rhythm didn’t pioneer this in its entirety; the academic world has laid a very strong foundation of identifying some of these patients and beginning to understand the genetics that underlies that disorder. So Rhythm started by trying to accumulate and amass all of the published literature that related to genetic variants in the context of obesity identified in MC4 receptor pathway genes, and that’s really where our relationship with Genomenon began about a year ago or so. Actually, happy anniversary, Mark; this is a one-year anniversary. So I can turn it back over to Julia, or Mark, did you have some questions?
Thank you, Al, that was fantastic. Yeah, I have a couple of questions; we can describe in our conversation how this relationship began. Happy anniversary; I wasn’t affected if you forgot. My word, I’ll be sure to tell my wife it’s not just her. So you talked about how Rhythm took it upon itself, mostly through your own and Ida’s hands, to gather all of this information from the medical literature. Can you go into a little bit more detail about the pains that you took and what that process of aggregating the data looked like in the early days, the sort of pre-Genomenon days?
Absolutely, I think pain is probably the right word there. We all know what it’s like as scientists to have to trawl through PubMed to try and identify papers that may be relevant, download them, and start to read them. When you’re looking for very specific biological information, you want to make sure that you’re looking at every aspect of the paper, whether it’s the figure legend or the supplemental information; you want to make sure you get into all of that, but obviously that takes a painstaking and very long time to actually achieve. So we began by doing the search as one would expect and starting to read the papers to collect the information and make a list of the variants we were identifying. We were very successful at doing that; we ourselves bring a lot of understanding about the pathway, which probably helps us more than it would somebody who isn’t familiar with it. But at the same time that’s a huge amount of information to glean, and, you’ll know this, when we’re talking about papers that focus on genetics, your particular variant of interest, or a variant in your particular gene, can be buried amongst a thousand other variants across ten other genes in the same paper. So it’s very important for us to create a compendium that is as replete and as accurate as possible for all of the variants in MC4 receptor pathway-related genes that have been identified in the literature within the context of obesity. While we had laid a pretty good foundation, we then needed to come to you to say, okay, what is it that we have left behind? On top of that, there was a point that you made earlier which I just want to echo: the semantic nuances and inconsistencies in reporting on variants also create a lot of heartache, because you can report a variant by its genomic location, by a cDNA position, or by its protein consequence, and unless you are familiar with every gene, every variant, and every context in which it can be described, it’s very easy to leave something behind. That was one of the best things that you really helped us with: being able to pull out something that was still the variant we were interested in, just under a different name.
Yeah, so in my experience, having worked in the clinic mostly on the somatic cancer side but also on the constitutional side, to what you just said, variant scientists begin to cultivate a real expertise not just on the pathway and the genes of interest but also on that literature. Some of the things that we unearthed in the process of developing this database were archival, sort of foundational studies where the genetic nomenclature was more colloquial, maybe less precise, or certainly didn’t conform to the most up-to-date nomenclature. In my own experience I recognize that as a pretty big struggle: the disambiguation and reconciliation of older data with something that you can really put to use in ascribing this variant and this empirical evidence to the information coming out of the patient sequencing data. Another thing that you and I have talked about quite a bit (again, I’m coming from the cancer world, so about two years ago it was new to me) is the challenge of not just aggregating this data but understanding its importance. So can you talk to what challenges were occasioned by navigating the ACMG guidelines, the framework for interpreting the meaningfulness of these variants?
Yeah, absolutely. Just to provide the listeners with some context or chronology here: our relationship with Genomenon started by recognizing that we needed a far more replete and a far quicker way of being able to review the published literature and make sure that we had captured all of the variants that were listed there, regardless of whether or not Rhythm believed that they were necessarily going to be disease-causing or whether they were just reported in a paper. So we started by creating that compendium, and we did this across a number of genes. While Rhythm’s expertise and focus is very much on the MC4 receptor pathway-related disorders, we have a much more holistic view in trying to understand rare genetic disorders of obesity in their entirety. So when we approached Genomenon we were asking them to achieve this across about 120 known obesity-related genes. It wasn’t a small undertaking; you can just imagine how long that would take if I had to sit down and read every paper that was associated with that. At the end of the day, what they ended up delivering was about 10,000 variants across those 120 genes in relatively short order, certainly shorter order than it would take me. But to Mark’s point, amassing or identifying genetic information is one thing. We can sequence a person’s genome; we can generate terabytes of genomic information within days. But our ability to interpret it, our ability to understand what it means for the phenotype of that individual, the clinical presentation, that’s a lot more challenging, and I think it’s something that in general the community is struggling with. There have been efforts made to try and build conceptual frameworks in which all the scientific and all the clinical information relevant to a given variant can be collated and, in some sort of quasi-quantitative way, amassed to then bring out a probabilistic determination of whether that variant is causing or related to whatever the clinical presentation is. That’s what Mark is alluding to when he talks about the ACMG guidelines: the American College of Medical Genetics has a framework that tries to help people build an understanding of the relationship between the genotype and the phenotype.
There are a lot of things that play into ACMG and drawing that line, the correlation between the variant and the phenotype. Some of that can be derived from public databases, in terms of the population frequency of seeing that variant. Some of that information can be derived from computational algorithms; Mark mentioned SIFT and PolyPhen earlier, which give you predictions of deleteriousness, and we’re able to do that on the Rhythm side as well: we can build pipelines that pull in that sort of information for interpreting our variants. But the real challenge lies in finding the information in the published literature that relates to a specific variant, and again this is where one has to spend a painstaking amount of time going through a figure to observe a Western blot, or going into a supplemental table of 1,500 variants to find the one that you’re particularly interested in. So having worked through our relationship with Genomenon to the point where we had this compendium of variants, it was now a question of whether they could surface for us the biological information within the literature that was relevant to the role of that variant in a particular disease. What they did here for us was to use, I presume (and I apologize if I get any technical words wrong), natural language processing and the like to pull out fragments that are relevant to interpreting the context in which that variant is being discussed: a statement that a variant is pathogenic, or a statement that the variant had a loss of function in an in vitro assay. That’s the sort of really important functional and biological information that we need within Rhythm in trying to build an association between the genotype and the phenotype, and you can imagine how hard that must be when you’re talking about tens of thousands of variants across tens of thousands of papers to find all of that information. So that is what I have to say thank you for, because you saved me a huge number of man-hours on that one.
Yeah, that’s right, and it was, as I say, born out of my own experience using the framework and navigating it. It’s beautiful that it’s codified and that we now have a common framework that the community can agree on. You hit the nail on the head there: some of this data is easy to import and interpolate into your data to be used for variant interpretation, but the really needful information for many of these calls, to upgrade them from variant of uncertain significance to pathogenic, only comes from the literature. I’ll build on what you just said. It was really a reciprocal relationship, where we understood what you were looking for as you were invoking this manual process of collecting data to make determinations according to the ACMG classification guidelines. That feedback was very helpful from our perspective as well in understanding what your approach was and how that data needed to be annotated and presented to you. And that’s in response to a question here: we didn’t have that framework in place before we were engaged with Rhythm, so it’s a happy outcome for us as well, because we’ve got this new capability, as of nine months ago, of being able to interpret these variants and understand the deep contextual information necessary to come up with these pathogenicity designations. So if you can rewind in your mind, Al, to when you first learned about Mastermind, and you can be honest about what your initial take on the data was: I recall you having expressed surprise that we had the data, and I wonder if you can speak to how you’ve overcome your initial skepticism about us having amassed all of this variant-level information.
Yeah, so obviously it’s critically important for us to make sure that the information we are using to build these associations internally between the genetics and the clinical presentation is accurate, and that we feel confident in what we’re being given. Because we had already amassed a lot of this information through our manual process, we were able to look at what you provided us and cross-reference it with our own observations, and it was remarkable how well aligned it came out. I think the Mastermind database is hugely useful for us. We routinely use it as a way of very quickly diving into an individual variant, even just to link out to the publication in which that variant was mentioned, or to see whether it’s ever been mentioned in the published literature before, because that’s also an important consideration. Rhythm is undertaking a very in-depth and intensive genotype screening of individuals with suspected rare genetic disorders of obesity, so we’re identifying new variants at any given moment, and it’s a real use for us to be able to very quickly ascertain whether something is novel or has been seen before. And I think being able to put variants to the left or the right of that, as we’re working in the clinical community, is hugely helpful.
You had talked about the central components of the MC4R pathway, and then there are indirect or more peripheral players. Can you speak to the value of having amassed all of this information, many thousands of variants? How does understanding more holistically your pathway and the variant landscape in each of those genes help inform some of the data and some of your thought processes as you’re developing within Rhythm?
Yeah, so we’ve established a very deep focus on the melanocortin-4 receptor pathway, and the canonical components of that pathway were defined by academia well before Rhythm came along. What we’re doing is building on that foundational work to try to identify other genes that may be implicated in MC4 receptor pathway deficiencies. It’s possible, probable, that a large number of these other highly impactful genes, which have been observed to give rise to a rare genetic disorder when mutations are present, could actually be feeding into the same pathway. Our sequencing initiatives, as I mentioned earlier, go beyond just the genes that are known to comprise the MC4 receptor pathway, so we’re amassing genetic information on lots of different genes. It’s going to throw up a lot of new variants in those genes, or even variants we’ve seen before. As we work with that data epidemiologically and scientifically to understand it, we’re not only expanding the pathway; we’re helping to stratify and break apart the obese population, especially those individuals who are most severely affected and most in need clinically. The more we understand about the genes and the variants, the better we’re going to be at bringing those people under our wing (and I say “our” collectively, as an industry) and improving the way in which they’re treated.
So one of the most exciting things that I’ve found, in merging my experience in the clinic and in my research, is seeing that NGS is in the service of both of those fields, where almost immediately we can complete the circuit, given the expense and the speed with which we can produce this data and, with no small contribution from Genomenon, the speed with which we can understand that data.
Now it’s a virtuous circle, where the NGS data from the research labs can feed forward into diagnosing and treating patients in the clinic, and some of that clinical outcome data can be fed back to determine new lines of inquiry. With that perspective, I wonder if you can speak to the benefit of staying abreast of the newly published literature. You talk about a deeper and evolving knowledge of the genetic pathways affecting these different syndromes, but there are also new studies routinely published, month over month, on variants that you otherwise thought you had understood completely.
Yeah, we have to stay completely on top of this. It’s a constantly shifting landscape, and new data is being generated every day. In alignment with the ACMG guidelines, every time there is new data it has the ability to affect the way a variant is going to be interpreted, and ultimately how that may lead to an understanding of the phenotype of somebody who carries that variant. So we have to stay constantly abreast of any additional data that’s coming through. It happened only just yesterday, actually: a new paper popped up with novel information about a variant that we had observed and that had never been published on before. Staying on top of that paper, knowing that it’s appeared, being able to search for that sort of information on a routine basis: that changes the way we now view that variant. So it’s critical to stay on top of the evolving published literature. And obviously, the more awareness that Rhythm brings to rare genetic disorders of obesity, which is part of why we’re working with you and part of why we’re doing this webinar, the more we hope we’re going to engender interest in the community for people who want to go out and sequence individuals to find these variants. The information that we can publish and get out there is going to help everybody understand what they are looking at.
Yeah, I feel like full genomic sequencing as applied to precision or personalized medicine is the perfect marriage, where we can start to individually tailor patient treatment and therapy. If you can, enlarge the conversation beyond your disease focus and speak to how next-generation sequencing might be applicable to multiple different diseases in the practice of medicine, perhaps using your specific example as a window into that larger question.
Yeah, we’re at the dawn of the genomic era. As I mentioned earlier, we’re doing a very good job of being able to generate data. Interpreting it is proving a little bit trickier, but we’re playing catch-up, and it’s not that as a community or as an industry we won’t get there. To your point, the more people are sequenced, the more accessible that becomes, and the more the world accepts that sequencing is a good thing. We’ve seen the advent of things like 23andMe and those sorts of enterprises; they’re bringing the genomic era very much to the forefront of the general population’s thinking, but there’s still an inherent fear of what it might throw up. At the end of the day, genetics is going to play a huge role in a whole host of diseases, conditions, or presentations that we haven’t even begun to think about. It’s the amassing of the genetic information and the building of associations between specific genotypes and phenotypes, or genes and phenotypes, regardless of whether you understand the variants, that’s really going to start to shed light on the value of genetics for both understanding and ultimately treating individuals with genetically based disorders.
Yeah, I couldn’t agree more. I saw the signs. I was actually working in a lab doing mouse work, and I saw over my shoulder some of the work that was being output in genomic sequencing labs: the pace, the quality of the data that was being produced, and how immediately impactful that information was. I saw the signs, I got hooked, and I’m really cheered that I got into genomics when I did. Do you have anything to add, Al, before we turn it over to questions from our attendees?
I’ll turn it over to Julia to take questions.
Great. Julia, if you want to start to field some questions from here.
Yeah, thank you both, Alastair and Mark. I just want to remind everybody that if you have a question, please type it into the Q&A box. Also, just before we start with the Q&A, I would like our attendees to take a brief moment after the webinar has ended to take our exit survey and provide us with your feedback.
But for now let’s go into the questions. There was one for Mark, on whether you also consider gene expression studies and not only variant information.
Sure, that’s a great question. I’ll try to answer briefly. As I suggested, we understand this information not just at the variant level but also at the gene level. Similar to the work that needed to be undertaken to help Rhythm with the ACMG classification schema, where we looked at the variant level for the specific information that would guide us to which aspect of ACMG was relevant (functional studies, case studies, segregation patterns), at the gene level we can understand which references are talking about this gene being upregulated, this gene being downregulated, or a whole host of genes within a single study. So it’s all a matter of prioritizing the information, focusing the questions that you’re trying to ask, amassing that data with the relevant annotations, and then sorting, prioritizing, and reviewing that data.
Okay, thank you. Another one for Mark: which disease and phenotype ontology do you use? They would also like a little more information about the annotation, and whether it can go the other way, from variant to disease or phenotype and not just from disease or phenotype to gene and variant. Can you explain a little more about that?
Sure. The short answer is we’re able to use any ontology that’s available to us. We had already baked into the cake the Medical Subject Headings (MeSH) disease ontology, and we didn’t need to stray far from that for the work we were doing with Rhythm, because the diseases they were interested in, the syndromes comprising monogenic forms of obesity, were already MeSH entities, so we were able to aggregate the data right out of the box. For other groups, we’ve used the Human Phenotype Ontology, or HPO, collection of clinical phenotypes. To build on that, and to call back to the work that we did with Rhythm: we didn’t have the population frequency data or the in silico predictive models of pathogenicity until they were required of us through the ACMG curation work that we were doing with Rhythm. We were able to pull some of those resources together for that specific project, and now we’ve got a very robust pipeline to produce that data and then, obviously, manually review it. Just to follow up on the last part of your question, this database can be played in reverse as well. What we’re talking about here is going from a collection of diseases or a collection of genes, or both, and determining what the variant information is. We can also play it backward and go up from the variant level or the gene level and look at the whole panoply of disease circumstances with which that single variant or single gene, or collection of variants and genes, is associated in the medical literature. So it really depends on the type of questions that the client is looking to answer.
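The two directions of lookup described in this answer, from disease to variants and from variant back to diseases, can be sketched as a pair of indexes built from the same association records. This is only an illustration in Python, not a description of how Mastermind is implemented; the record layout, the `build_indexes` helper, and the placeholder disease, gene, and variant names are all assumptions made for the example.

```python
from collections import defaultdict

# Placeholder (disease, gene, variant) records, invented for illustration only.
ASSOCIATIONS = [
    ("Disease A", "GENE1", "GENE1:p.Ala10Val"),
    ("Disease A", "GENE2", "GENE2:p.Gly20Ser"),
    ("Disease B", "GENE1", "GENE1:p.Ala10Val"),
]


def build_indexes(associations):
    """Build forward (disease -> variants) and reverse (variant -> diseases) maps."""
    disease_to_variants = defaultdict(set)
    variant_to_diseases = defaultdict(set)
    for disease, _gene, variant in associations:
        disease_to_variants[disease].add(variant)
        variant_to_diseases[variant].add(disease)
    return disease_to_variants, variant_to_diseases


forward, reverse = build_indexes(ASSOCIATIONS)
print(sorted(forward["Disease A"]))         # variants reported for one disease
print(sorted(reverse["GENE1:p.Ala10Val"]))  # diseases reported for one variant
```

Because both indexes are filled in one pass over the same records, a forward query and a reverse query always agree with each other.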
Okay, thank you. Here is a question for Alastair. Someone would like to know: they understand what improvement you can get in diagnosis by identifying a gene that’s involved in a disease, but how might that impact treatment? Can you comment on that?
Yeah, that’s a good question, and I think there are a couple of ways in which our understanding can improve general clinical care. In the first instance, we know there are people for whom the standard lifestyle interventions recommended for obesity just don’t work, and that is particularly true of individuals with rare genetic disorders of obesity. I heard a quote from one of our KOLs, one of our key opinion leaders, who said that asking an individual with hyperphagia just to resist food is like telling somebody with depression to cheer up. So I think that understanding and shedding light on these disorders, bringing them to the fore, and trying to demonstrate that these things are not all the same and therefore shouldn’t be considered or treated the same (again, to come back to the way that we deal with cancer these days) is one way in which we can really try to impact the lives of these people. The second level, obviously, is the pharmaceutical side of things. The more we understand about the specific aberrations that underlie a person’s clinical presentation, the more likely it is that we’ll be able to generate targeted therapies that somehow compensate for, supplement, or overcome that particular deficiency. So it’s both sides of the coin: the pastoral side and also the pharmaceutical side.
All right, thank you. Another one for Mark: can you expand a little on your curation process? There have also been a couple of questions on how you resolve contradicting information in the literature.
As I suggested, we didn’t have a curation process prior to this engagement, and now we’ve been able to utilize it in many different circumstances. We start by assembling all of the variant-level data. We organize the data into article-variant associations: a variant, as Al has suggested, may be mentioned in dozens or hundreds of papers, and we treat each of those paper-variant pairs as an individual curation unit. We look at each of those; we aggregate the types of studies and the contexts in which that variant appears; we look at every one of those individual contexts across every one of those papers; and we make a collective determination about what the authors are attempting to demonstrate through their studies, either clinical or functional. There are two aspects to talk about regarding ambiguity in the literature. There are often frank contradictions, frequently in the context of association studies, where one study will show an association and another won’t. We aggregate and annotate all that information, we look at whether each study is highly powered, and in the end, if there is a discrepancy, it is made manifest in the end result. That typically connotes a variant of uncertain significance, where the data cannot agree. Another aspect of ambiguity or uncertainty that needs to be reconciled is what I think is properly referred to as legacy nomenclature for variants, where the authors of a study, typically in the ’80s or ’90s, used a different numbering schema for their variants. That can make it very challenging for our automated techniques to recognize these variants. We’ve uncovered some of those issues, in part through the work that we’ve done with Rhythm, and have been able to automate a technical solution to reconciling some of those legacy nomenclatures. That’s not yet in the user interface, but it is definitely something that we’ve been actively working on for several weeks now, and we’re optimistic that we’ll be able to launch it into the database soon.
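The idea of treating each paper-variant pair as its own curation unit, and letting disagreement between papers show up in the result, can be sketched roughly as follows. This is a simplified Python illustration and not Genomenon’s actual pipeline: the `CurationUnit` fields, the `summarize` helper, the three summary labels, and the placeholder data are all assumptions, and the sketch deliberately leaves out study power and the individual ACMG criteria that a real classification would weigh.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CurationUnit:
    """One article-variant pair, reviewed on its own (hypothetical layout)."""
    variant: str
    article_id: str
    study_type: str     # e.g. "functional", "clinical", "association"
    shows_effect: bool  # does this paper report an effect or association?


def summarize(units):
    """Group units by variant and flag variants whose papers disagree."""
    by_variant = defaultdict(list)
    for unit in units:
        by_variant[unit.variant].append(unit)

    summary = {}
    for variant, items in by_variant.items():
        findings = {item.shows_effect for item in items}
        if len(findings) > 1:
            # Contradictory papers: surface the discrepancy for manual review.
            summary[variant] = "conflicting"
        elif True in findings:
            summary[variant] = "consistent_effect"
        else:
            summary[variant] = "consistent_no_effect"
    return summary


# Placeholder data, invented for illustration only.
UNITS = [
    CurationUnit("VAR1", "paper-1", "association", True),
    CurationUnit("VAR1", "paper-2", "association", False),
    CurationUnit("VAR2", "paper-3", "functional", True),
]

print(summarize(UNITS))
```

In this sketch a "conflicting" label plays the role of the discrepancy that, in the answer above, typically ends up as a variant of uncertain significance after human review.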
Okay, thanks. I think there’s a slightly related one that asked about your content lifecycle management strategy. I guess that’s a technical term for keeping up to date with current discoveries. Do you want to comment on that in addition?
Sure. I was not familiar with that term; I might use it in the next webinar that I host. The top line here is that we update this data on a weekly basis, and we include many tens of thousands of new references each week. As Al has pointed out, it’s amazing to see the exponentiation in this literature. It’s beautiful to see as a scientist, and it’s cheering to see it being put to use as a clinician. Mastermind’s goal is to stay on top of that, to be sure, and so on a weekly basis we add all that new information into our data.
Okay, thank you. A question for Alastair: was Rhythm able to incorporate its curated findings into Mastermind?
So, taking what we had collated from the literature and feeding it back to Mark to integrate? No, we didn’t do that, but we have integrated our data with the data that Genomenon has provided us. As I said, that is part of the validation step that we do in trying to make sure that we are working from the most complete and most accurate data set. So our own foundational work has formed part of our pipeline, but I don’t believe we have fed our information back to Genomenon for them to integrate. I don’t even know if that’s feasible.
It’s a great idea. As a company we’ve had that hope for quite a while. The data that we’ve collected and annotated for Rhythm is Rhythm’s annotations. But seeing as we believe that Mastermind is the most effective variant curation tool, and we have it available for use for free to many of the labs that are doing next-generation sequencing, many of whom are doing it in very high throughput, we’d like to coalesce a community of variant scientists around the use of Mastermind. The data that they’re collecting in real time, the interpretations, the evidence, all of the judgment calls that they’re making: we’d like to capture that information from the expert variant scientist users, with all the provenance, the date, the time, and the reference citations that they were using for that conclusion, all facilitated by the Mastermind platform. That’s a hope of mine, and has been for some time, indeed even since Genomenon’s original founding. It’s a lofty goal, but we are indeed making concrete strides in that direction that I’ll be excited to talk about in some months.
So Mark, is that to bring together databases such as ClinVar, which essentially provides the sort of assimilation of genetic data that you’re talking about? Is there a hope to partner, not business-wise, but just to bring those two different sets of data together?
Exactly. That triad that I had shown, Al, that we’ve helped bring together in the internal tools that we use to curate the Rhythm data: we would like to externalize that data into Mastermind and make it possible for all of that information to be assessed in one place, the Mastermind software. But further, we’d like to have the human-level curation, the approval, the acceptance, the recognition that this reference, this statement from the author of that reference, really makes the case for that apex or keystone: the evidence from the published literature that is necessary to make the conclusion that this variant is actually pathogenic. We’d like to have our variant scientists be able to capture that knowledge in the Mastermind software itself and then make that available to other users of the Mastermind software, in a similar way to what’s available now in ClinVar, where ClinVar is a sort of repository of that information. I see in my mind’s eye, in some months’ time, Mastermind being a living version of that, a very active, live, participating component of aggregating that data.
Alright, maybe a little related to that, there’s a question of whether you foresee connecting the Mastermind database to other databases that aggregate patient sequencing data from clinical research.
Yeah, so we’re in very active communication with a number of potential partners, both tertiary analytics software providers, some of whom we’ve announced and some of whom we’re working with behind the scenes to launch very soon, and also partners to aggregate some data. We’ve got a couple of partnerships that have already been publicly disclosed and a couple of others that we’re working toward. I think what we’re talking about here is empirically collected data from population sequencing studies, and adding that data into Mastermind. Particularly where some of that population data has phenotypes or clinical outcomes associated with it, that would be a great facet to add to Mastermind to create what I envision in the long term: a universal hub of all of this genomic information. Because I feel like, and Al, you can back me up on this, this problem won’t go away; it’s only going to get bigger. More sequencing data, more knowledge that we have to extract and aggregate from the literature. ACMG is a good framework, but I think it’s only going to get more Byzantine. There were recently some updates to the ACMG guidelines, and a lot of them were disease-specific, which is great because you get the best outcome for that particular disease, but it exponentiates the complexity of the already fairly complicated ACMG schema. As I said, I’m all about it, but I think that we’re at a breaking point here, where it’s just not possible to stay on top of this with strictly manual labor. It’s got to come through automated aggregation techniques, and as I suggested, much or most of that data, that information, that knowledge has got to come from the empirical literature. That’s why Mastermind and Genomenon are at that nexus.
All right, here’s a question that you may have answered earlier, when we had the question about expression, but I’m not sure. It’s about epigenetic modifications: do you also consider those, or do you not have data for them?
Great question. We don’t have that data in the database. It’s a more challenging question to address because the nomenclature for some of these epigenetic modifications isn’t as well codified or standardized as it is for variants, although variants have their own attendant challenges. Things like eQTLs for determining long-range or non-coding effects, as well as changes in methyl marks that affect gene expression, are references and contexts that are identifiable, but it would take a more manual curation approach than for the data that is aggregated automatically. It’s certainly feasible, but we don’t have that in the software at present, nor in our armamentarium for some of our projects. Whoever asked that, if there’s interest, I’m very open to a conversation, because I know that epigenetics is playing an increasingly recognized role in certain disease contexts.
Okay another one for Mark, do you capture specific symptoms as well as disease names that a variant is associated with and if you do, do you use a controlled vocabulary of symptoms?
For that, we don’t currently. I didn’t build this, so I’ll sing the praises of our Chief Technical Officer, who is really the mastermind behind the infrastructure of Mastermind. We have such a modular infrastructure that it’s very straightforward for us to insert into it, as an input, any controlled vocabulary or ontology such as the one described in the question. That’s a very straightforward implementation detail, as he likes to say. We don’t have one of those at present, but it’s certainly something that we could then easily add to augment the data or to produce a custom data set for a client.
All right, thank you. This is actually all the time we have for questions today, so let me thank our speakers, Alastair Garfield and Mark Kiel, and our sponsor, Genomenon. If we didn’t have time to get to your question today, we will have somebody follow up with you after the webinar. Again, as a reminder, please look out for the pop-up survey after you log out to provide us with your feedback. Finally, if you missed any part of this webinar or would like to listen to parts again, we will send you a link to an archived version by email. With that, thanks very much again for joining us for this genome webinar.