Webinar: Characterizing Targeted Cancer Therapies via a Comprehensive Gene Fusion Database

Genomenon has developed the most comprehensive database of gene fusion pairs found within the scientific literature. This complete literature catalogue for the 507 genes from the Illumina TruSight RNA Fusion Panel contains all known fusion pairs, with reference citations for every scholarly paper citing the chimeric genes. In this webinar, Genomenon’s CSO and co-founder Dr. Mark Kiel discusses how a Gene Fusion Database can be used to facilitate drug discovery and development.

Webinar Summary

Gene fusion events are a leading cause of cancer, play a significant role in diagnosis, and can drive treatment decisions in an increasing number of cases. Possessing a comprehensive view of the research related to fusion genes can significantly accelerate the drug discovery and development process. This information allows researchers to find fusion pairs for any disease of interest, identify targeted treatment strategies, and refine fusion gene breakpoint analysis.

Unfortunately, the diversity of gene pairs involved in fusion events can be overwhelming, and until now, there has been no comprehensive database of research citing gene fusions.

In response to this need, the team at Genomenon has developed the most comprehensive database of gene fusion pairs found within the scientific literature. This complete literature catalogue for the 507 genes from the Illumina TruSight RNA Fusion Panel contains all known fusion pairs, with reference citations for every scholarly paper citing the chimeric genes.

In this webinar, Genomenon’s CSO and co-founder Dr. Mark Kiel discusses how a Gene Fusion Database can be used to facilitate drug discovery and development, including:

  1. Understanding the pathophysiology of any chosen pathway or disease
  2. Identifying fusion partners for the development of targeted drugs
  3. Selecting patient cohorts for clinical trials


Question: Does Mastermind have all of the fusions observed in the TCGA dataset (or only fusions for driver genes)?

Answer: The Mastermind Fusion Database comprises fusion events that are described in the published literature. Because the TCGA dataset contains many read-through fusion events, artifactual read mis-mappings, and a number of passenger mutations, there are certainly some fusion events in TCGA that are not contained in the Mastermind Fusion Database. However, all of the fusion events that comprise the Mastermind Fusion Database are characterized by the authors of each respective reference and are therefore not subject to the same caveats as the fusions found in TCGA. Furthermore, one use case of the Mastermind Fusion Database is to annotate the TCGA dataset to determine which of the TCGA fusion events have previously been characterized in the literature, focusing attention on the most meaningful fusions.
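The annotation use case in this answer can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Genomenon's method: the helper names (`fusion_key`, `annotate`), the record layout, and the toy gene lists are all hypothetical, and `GENE_X`/`GENE_Y` is a placeholder for an uncharacterized call rather than a real TCGA result.

```python
# Illustrative sketch only: the gene pairs and field names below are
# placeholders, not an export of the Mastermind Fusion Database or TCGA.

def fusion_key(gene_a: str, gene_b: str) -> frozenset:
    """Order-insensitive, case-insensitive key for a fusion pair."""
    return frozenset({gene_a.strip().upper(), gene_b.strip().upper()})

# Fusions assumed to be described in the literature (toy list).
literature_fusions = {
    fusion_key("BCR", "ABL1"),
    fusion_key("EML4", "ALK"),
    fusion_key("NPM1", "ALK"),
}

# Fusion calls from a sequencing cohort (toy list); GENE_X/GENE_Y stands in
# for a read-through or mis-mapping artifact that no paper has characterized.
cohort_calls = [
    ("ALK", "EML4"),
    ("npm1", "ALK"),
    ("GENE_X", "GENE_Y"),
]

def annotate(calls, known):
    """Flag each call by whether the pair appears in the literature set."""
    return [
        {
            "pair": f"{a.upper()}-{b.upper()}",
            "in_literature": fusion_key(a, b) in known,
        }
        for a, b in calls
    ]

annotated = annotate(cohort_calls, literature_fusions)
prioritized = [row["pair"] for row in annotated if row["in_literature"]]
print(prioritized)
```

The key deliberately ignores which gene is the 5' or 3' partner so that "ALK-EML4" and "EML4-ALK" match. A real workflow may need to keep orientation and breakpoint information, since those can matter for interpretation.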

Question: Are the fusions in your database annotated with the bioinformatics software used to call the fusion (in the case of RNAseq experiments and other genomic data analyses)?

Answer: This annotation is not part of the first version of the Mastermind Fusion Database, but your question will be taken into consideration for future versions.

Question: How are therapeutic agents cross referenced in the database? Repurposed pharmaceuticals?

Answer: This data is not in the first release, but will be available in future releases. Please contact us if you’re interested in therapeutic cross-references.

Question: How often is the database updated?

Answer: The Fusion Database is updated on a monthly basis.

Question: Can you query complex associations, such as variant-drug-outcome? It is important we can capture beneficial vs adverse effects of a drug on a variant.

Answer: Future versions of the Fusion Database will be annotated by therapeutic compounds to identify which fusions are associated with which therapeutic.

Question: To what level are the breakpoints characterized for the fusion partners?

Answer: Fusion breakpoints are characterized at the exon/intron level.

Question: Do you normalize variant representations across transcripts? Exon numbering depends on the transcript, so how do you ensure that you capture the proper transcript when parsing the papers? Is that something you have control over?

Answer: If the transcript is mentioned in the article text, we use the transcript mentioned in the article; otherwise we use the transcript in which the variant is valid as described.

Question: Slide 9 referenced a scientific publication dated 2013, so presumably the actual observations date back 7 to 10 years. How likely is it that new discoveries since then have changed the conclusions?

Answer: More than 400 novel fusion events have been described in each of the last 3 years. Furthermore, there are new publications each year that further characterize previously described fusion events, including outcomes and response to therapy, whether beneficial or adverse. The Fusion Database captures all of these references as they are published.


Candace: Good afternoon and welcome to the webinar. I’m Candace Chapman, Director of Marketing for Genomenon, and I’ll be your host today. Today Dr. Mark Kiel will discuss Characterizing Targeted Cancer Therapies via a Comprehensive Gene Fusion Database. Possessing a comprehensive view of the research related to fusion genes can significantly accelerate the drug discovery and development process. This information allows researchers to find fusion pairs for any disease of interest, identify targeted treatment strategies, and refine fusion gene breakpoint analysis. Unfortunately, the diversity of gene pairs involved in fusion events can be overwhelming, and until now there’s been no comprehensive database of research citing gene fusions.

In response to this need, we here at Genomenon have developed the most comprehensive database of gene fusion pairs found within the scientific literature. This is a very new development and we’re very excited to share it with you today. As you are watching, please feel free to submit questions to Dr. Kiel through the chat window on the right. Mark will answer questions after the presentation. So without further ado, I will introduce our speaker, Dr. Kiel. He is a Founder and Chief Science Officer here at Genomenon, where he oversees the company’s scientific direction and product development. Prior to starting Genomenon, he completed his residency in clinical pathology at the University of Michigan. While at Michigan he completed a fellowship in Molecular Diagnostics and devised the informatics framework for clinical next-generation sequencing in the molecular diagnostics laboratory. During his doctoral studies he made groundbreaking contributions to the study of hematopoietic stem cells, for which he was awarded the Weintraub International Graduate Student Award and the ProQuest Distinguished Dissertation Award. While a postdoctoral researcher, he made significant contributions to the field of hematopathology, for which he was awarded the Benjamin Castleman Award. And now I’m going to turn it over to Mark. Take it away.

Mark: Thank you, Candace. Let me know if I’m not coming in clear, and thank you especially to all the attendees. As Candace said, and I will echo how excited we are, we are discussing with you for the first time, at least publicly, a new development that Genomenon has undertaken using the Mastermind database: expanding our focus beyond mutations, genetic data, and disease associations to include structural variants, namely gene fusion events.

So this is an outline for my talk. As an attendee of webinars I like it when the speaker gets right to the point, so I’ll spend as little time as I think is necessary on background, just to lay the groundwork for why this is important. Then I’ll talk directly and immediately about what it is we’re discussing so that you have a sense for that in your mind, and I’ll actually show you some example data. I’ll walk through a couple of use cases in outline form, and then I’ll summarize the data and turn it back over to Candace to field any questions that we have. If we don’t have enough time for questions, we’ll be sure to answer them after the webinar.

Before I start, I’ll emphasize to anybody who’s new to Genomenon that Mastermind is software that’s available to you through the link below. You can sign up here for 14 days of the Professional Edition, which is for clinical users, large reference laboratories, or academic clinical labs alike, and thereafter the database is available to you in its free edition.

So this is one of, I think, four background slides that I have. This is from a recent Nature Reviews publication, and it’s a little bit busy, but it’s effectively a timeline of the discovery of the meaningfulness of structural variants in the form of gene fusions in cancer. In 1960, in the pink there, is the first structural variant that was discovered, associated with CML, and then it took a decade or more for it to be more precisely characterized. Thereafter you can see, augmented by advanced technologies including next-generation sequencing, this great proliferation in our understanding and characterization of fusion events that are oncogenic drivers in cancer. The way this is laid out is colored by those different techniques. The review was published not a year and a half ago, and I think they left off the last five years because it was just too crowded; there was just too much information across all of these different strata of different disease types.

Novel fusion events were discovered both in hematopoietic malignancies and, increasingly, in solid tumor cancers. I apologize for the wordy slide. This is a review from a couple of years ago now from a former colleague of mine, but I thought it captured very effectively many of the points that I want to convey in this introductory section of the webinar. First, what I think is the headline: gene fusions are an integral component of our understanding of somatic aberrations in cancer. We as clinicians, or in pharma or R&D, would be remiss if we didn’t turn our attention to the recognition that fusions can drive the development of multiple different types of cancer. Then, skipping down to the fourth element there, NGS and high-throughput sequencing enable systematic discovery of these gene fusions with great fidelity, with high sensitivity, and with great precision.

One of the challenges that this occasions is that often multiple gene fusions are identified even within a single individual sample, which can present challenges in distinguishing between oncogenic driver lesions and other unimportant passenger aberrations, things like incidental fusion events that arise in chromosomally unstable cancers. There are also bioinformatic mis-mapping challenges between similar regions in the genome that tend to mis-map and lead to artifactual fusion events, and things like read-through fusion events, where adjacent genes seem to look like fusions, with chromosomal material in the bioinformatics analysis that doesn’t belong together but is in fact a reflection of a bioinformatic artifact. That makes it challenging to fully recognize which of the fusion events identified in any one of these individual patients, or in an entire cohort of samples, are the actual meaningful driver lesions. So too are chromosomal rearrangements frequently observed in cancer and in benign tissues, again complicating the analysis just as I described right above. But the benefit of this is that increasingly these fusions can be used to subclassify these cancers to provide diagnostic, prognostic, and increasingly therapeutic information, so that clinicians can best afford their patients a high likelihood of cure with some of these precision therapeutics that target these different types of lesions.

This next slide is from a different group, from a more recent review article from Gao et al. in 2018, and I apologize for how busy this is, but I’ll describe what is being conveyed here. In the panel off to the left, each of those rows represents an individual cancer type. Don’t ask me what the acronyms are, but you may recognize some of them. The colored bars are a reflection of how the patients within that disease cohort stratify in terms of the frequency of those patients who have fusion events as a driver mutation. Then over on the right is a box-and-whisker plot breaking out those different categorizations of patients with different types of molecular lesions, and the arrows that I’m drawing your attention to are those that include fusions. You can see how fusions contribute very significantly across a whole patient population to the meaningfulness of their molecular genotype.

I apologize for how busy this slide is. I won’t go through all of the details, but this is meant to indicate how very heterogeneous these different disease populations can be, both in terms of the number of fusion events that occur in these different diseases, as showcased by the size of each of these bubble plots, where each of the columns now reflects those different disease categories (that’s in Column A), and then over in Column B is a reflection of the great distribution of different types of genes that are involved, and at what level, based on the size of those individual icons in that sort of heat map. So again, the details aren’t important. The takeaway message is that fusion events have myriad driver genes and come in multiple different flavors in the form of the different partners that they take on to cause disease.

And then I think this is my last introductory slide. It’s just to drive the point home that increasingly we’re seeing multiple different disease types with some significant fraction of patients with those diseases having meaningful driver fusion events. Over on the right, the good news is that increasingly there are targeted therapies that can focus their attention on alleviating the molecular lesion that’s the result of those fusion events. That’s the same paper that I described earlier from Nature Reviews. On the right now, on the y-axis we have the driver genes, and on the x-axis are the different precision therapeutics that have shown efficacy in targeting these molecular lesions that result from these chimeric gene fusion events. This gridwork is going to be increasingly broad and deep, with a wider array of targeted therapies that clinicians can bring to their armamentarium in treating their patients who happen to have one of these fusion events. So that’s the background.

Let’s now talk about why Genomenon and the field are moving in this direction of better characterizing gene fusion events. This is not just true of structural variants, but it is also true of structural variants, and I want to make sure that that doesn’t get lost in the conversation here. When performing molecular characterizations of patients, these are the questions for which answers are being sought by the clinician or the team that’s putting together the data.

So, some of the questions I’ll list here. Which patient variants are activating, and what is the supporting evidence? Another question that can be asked more broadly is: what is the landscape of functional variants for a given gene or gene pathway? That can be useful in clinical circumstances in building out a pre-configured genetic lesion database to streamline clinical workflows, or otherwise in pharma R&D to better understand at a very broad level the molecular pathogenesis of a disease at the hands of lesions in a gene or a gene family. What are the evidence-predicated genes that should be included on a diagnostic panel to ensure that you’re not missing some of these patients whose driver is from one of the genes or the gene variants that I described in points one and two above? Which variants, then, should I include on that disease-specific diagnostic assay, if it happens to be a more focused assay that is at the variant level and not at the gene level as in point three? And then, increasingly, which variants are targetable using those precision therapies? All of these questions can be addressed from empirical evidence, the lion’s share of which is found in the peer-reviewed, empirically determined medical literature.

So this gets again to where we are presently, or where we were as of last week. These are a couple of the main resources that both clinicians and researchers could avail themselves of to understand the fusion gene landscape driving their different cancers. These may be familiar to you. If you’ve used any of these three or similar databases, then I think your interest will be piqued by what I’m about to show you in the way of the gene fusion database that Genomenon has recently produced. COSMIC, the Catalogue of Somatic Mutations in Cancer, has a listing of all of the fusion events that they’ve characterized, which amounts to about 300 at present. OncoKB is also a commonly used tool, mostly in research, and it has a characterization of variants and fusion events in a number of driver genes in oncogenesis. I’ll talk to how deeply, how thoroughly, and how comprehensively these databases have characterized the fusion landscape in a moment. The last would be the Jackson Laboratory (JAX) Clinical Knowledgebase (CKB), which was launched I think a year or so ago.

These three databases collect this evidence by and large from the medical literature and present that evidence through their web interface in the form of a database. We had heard from our users, and I understood personally from my own work, that these databases are a shallow reflection of the full complement of fusion events that have been published in the literature, and I’ll speak to that more specifically here in a moment.

The answer to that challenge, the need for better-characterized, holistic landscapes of gene fusion events in cancer and the paucity and lack of comprehensiveness of the databases that have characterized gene fusion events to date, is what Mastermind is coming to the rescue for in the form of this gene fusion database. As Candace mentioned, and as I indicated too, this is a new development that arose out of a need that our clinical users and some of our pharma partners have mentioned to us. With our great data backbone in Mastermind, predicated on the medical literature, which I’ll talk to toward the end of the discussion in terms of how we produce this data, we asked a question of the Illumina TruSight RNA Fusion Panel genes, which amount to about 500 genes in total: what are the gene fusion partners for those genes, and in what clinical circumstances, including diagnostic or disease import as well as therapeutic significance, do those gene fusion events find themselves in the medical literature? That was the sole input that we asked questions of Mastermind for, for relevance to the gene fusion database.

On the right, at a very high level, is a summary of the results. For each of those 500 or so genes we found between 1 and 95 gene partner fusions. These are in-frame, characterized fusion events that have been found in patients. This is not a database dump or a sequencing assay of a large cohort of patients where we took all comers. These are thoroughly characterized fusion events that have some merit if you were to see them in your clinical workflow, or if you were to be afforded access to this data to inform some of your research and development activities as a researcher or a pharmaceutical company. In total, across all 507 genes, we found approximately 2,500 gene fusion events. That is not to say that every time we found an EML4-ALK fusion we counted it as a new fusion event. No, these are unique fusion events found across all of those 507 genes. That is a great richness of evidence and information for any one of those genes on the list. As I indicated earlier, the list was chosen to be somewhat of a companion diagnostic for clinical users or researchers who are using the TruSight fusion panel, but we can repeat the process with any and every gene upon request. For those 500 genes we found on average 5 fusion partners, amounting to 2,500 total unique characterized gene fusion events.

Just to give you a sense for the scale with which this project was undertaken: Mastermind, as you may know, has a great corpus of evidence from the empirical literature, and the identification of these gene fusion events came from a subset of the six million plus full-text articles and supplemental data sets that Mastermind has from the literature, amounting to about 30,000 articles. Again, those other databases that are currently in use by clinical labs and research groups are predicated by and large on the medical literature. You can imagine the challenge that would be occasioned in having to sift through the six million plus references that may contain fusions, but even once you are aware that these 30,000 references are those that cite these fusion events, you’d still have the massive undertaking of pulling that data out and having it be annotated and characterized. That’s effectively what the Mastermind gene fusion database has done for those 2,500 gene fusion events. This is a pretty key takeaway from the webinar, which is why I put it here in the middle of the discussion.

This is a more detailed reflection of some of the top genes that were most recurrently associated with fusion events. Many of these won’t surprise you, though perhaps the scale of the promiscuity of these gene fusion partners will. In this case, on the far right, ALK was found to have the highest number of gene partners out of the list of 507, at 95.

Then it goes on from there. There’s a sort of shallow slope, where many of these genes, I think more than a hundred of the genes on the 507 list, had more than 10 fusion partners. I want to put that into perspective in the context of what some of the attendees may already be familiar with. The COSMIC database has about 20 gene partners for ALK that have been characterized, again coming from the literature, so in that situation there’s a 5x increase in the extent to which Mastermind’s gene fusion database has characterized the literature for these functional fusion events. I don’t have the data reflected here, but there are similar numbers for the other databases. There are seven ALK fusion events in OncoKB, which for OncoKB was the largest set of data for fusions, and for CKB there were only four fusion events characterized for ALK. So there are between 4x and 20x more fusion events in the Mastermind gene fusion database than had previously been available. As we say, that’s why we’re so excited to be discussing this new release of data.

A little bit of technical detail to talk to how we do this. I suggested earlier that we began somewhat arbitrarily with the 507 genes that comprise the Illumina TruSight gene fusion panel. We first determined all of the gene fusions in the medical literature. That’s a one-liner that kind of trivializes the challenge that we faced, but we had a leg up because Mastermind has been extracting gene information from the literature for the better part of four or five years, and so we used that as a skeleton upon which to frame the identification of these gene fusion events. We assessed the context of the gene-gene fusion co-mentions at the article level (the quality of the article, the specificity with which the fusion event was described in that reference, and the diseases that were described in the context of patients who had those fusion events) as well as all the way down to the sentence level where these gene fusion events were mentioned. Collectively, the fusion events were all prioritized by the strength of that evidence using some proprietary computational techniques. They were annotated for the diseases that I described and thoroughly annotated for therapies, any one of those therapies and a much broader array of therapies than I described in the intro slides. That all comprises the annotations for the Mastermind gene fusion database, as well as the breakpoints at the exon or intron level.
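As a quick check on the ALK comparison quoted in this passage, the ratios can be worked out directly. This is a rough sketch that takes the spoken figures at face value (95 partners in Mastermind, about 20 in COSMIC, 7 in OncoKB, 4 in CKB, and roughly 2,500 fusions over 507 genes); the variable names are illustrative only.

```python
# Figures as quoted in the talk for ALK fusion partners; treat them as
# approximate, spoken numbers rather than audited counts.
mastermind_alk_partners = 95
other_databases = {"COSMIC": 20, "OncoKB": 7, "CKB": 4}

# Fold increase of Mastermind's ALK partner count over each other database.
fold_increase = {
    name: mastermind_alk_partners / count
    for name, count in other_databases.items()
}

for name, fold in fold_increase.items():
    print(f"{name}: {fold:.1f}x")

# Average partners per gene from the headline totals (about 2,500 / 507).
average_partners = 2500 / 507
print(f"average partners per gene: {average_partners:.1f}")
```

On these inputs the ratios come out at roughly 4.75x, 13.6x, and 23.75x, in line with the rounded "5x" and "between 4x and 20x" figures, and the mean is just under 5 partners per gene.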

So where authors have described which exons from gene A and gene B are juxtaposed, Mastermind, in its process both technical and manual, has extracted all of that information and reviewed it for accuracy. And so I’ll end with that: we review this data to identify and more thoroughly characterize those fusion events that are functionally or clinically significant. I’m happy to announce that we’re soon to be releasing some of this fusion data to our professional users. If you’re not familiar with the Mastermind interface, this is a screenshot from an internet visit to the Mastermind database. Under your account, in the My Reports section, the gene fusion data will become available to our professional users in some months’ time. So if you encounter a gene fusion event in any of your patients and you’d like to assess whether it’s been published in the literature, that will be made available to you through this My Reports feature.

I apologize that this is so small. There’s clearly a lot of data that we’re providing. This is a reflection of the information that comprises the gene fusion database in its entirety. The slides will be made publicly available after the webinar; I think you’ll receive an email from Candace, so you’ll be able to look at this in a little bit more detail. But for the purposes of the webinar, let me describe these columns of information in the Mastermind reporter that showcases this manually reviewed gene fusion dataset. This is a search for one gene, the NTRK1 gene. On the left are all of the gene fusion partners, listed in descending order of how extensively they’ve been characterized in the literature. The number of articles that cite each of those fusion events is listed in the second column, and each of the PubMed IDs that correspond to the mention of these different fusion events is listed in the PMID list. The prioritized references that are most likely to be useful for clinical or research purposes are presented right in the reporter in that bibliographic section. Then, in the ancillary columns there, the fourth and fifth columns describe the summary of the breakpoint evidence that was extracted from the data as well as the associated diseases. Not shown is a reflection of those targeted therapies that are described in the context of each of those fusion events.

So this is an example for NTRK1, which is one of the 507 genes. This is another example search, on NPM1, which has somewhat fewer results, but nevertheless you can see there perhaps 1,300 reference mentions of the NPM1-ALK fusion event, again a bibliographic reflection of those that are most highly prioritized, and an estimation of the frequency with which different diseases are co-mentioned or co-associated with those fusion events as they’re driving oncogenesis.
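The report layout described here (partner, article count, PMID list, breakpoint summary, associated diseases, sorted by how extensively each fusion is cited) maps naturally onto a small record type. The sketch below is an assumption about how such rows could be modeled, not the actual Mastermind export format: `FusionReportRow`, its field names, and `build_report` are hypothetical, and only the approximate article counts mentioned in the talk for NPM1 are filled in.

```python
from dataclasses import dataclass, field

@dataclass
class FusionReportRow:
    """One row of a fusion report; field names are hypothetical."""
    partner: str
    article_count: int
    pmids: list = field(default_factory=list)      # all citing PubMed IDs
    breakpoints: str = ""                          # exon/intron summary
    diseases: list = field(default_factory=list)   # co-mentioned diseases

def build_report(rows):
    """Sort rows so the most extensively cited partners come first."""
    return sorted(rows, key=lambda r: r.article_count, reverse=True)

# Article counts are the approximate figures quoted in the talk for NPM1;
# PMIDs, breakpoints, and diseases are left empty rather than invented.
npm1_rows = [
    FusionReportRow(partner="TYK2", article_count=9),
    FusionReportRow(partner="ALK", article_count=1300),
]

report = build_report(npm1_rows)
for row in report:
    print(f"NPM1-{row.partner}: {row.article_count} articles")
```

Sorting by citation count reproduces the descending order shown in the reporter, while keeping the full PMID list on each row separate from any prioritized subset.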

I draw your attention here not to the very well-known NPM1-ALK, which was found in these other databases, but rather to some of the also-ran fusion events that are nevertheless clinically very significant and informative for driving different therapeutic or diagnostic decisions. If you can’t see it, this is NPM1-TYK2, which has been published in the medical literature nine times. You can click on that link through the Mastermind reporter and see those references, again prioritized based on the cogency of the evidence described in those references. This is the first paper that characterized that NPM1-TYK2 association, from 2014. It’s a paper that I’m very familiar with, and the reason I chose this example is not because of my ego but rather because this fusion event, which as I suggested is very well characterized in the literature and known to be associated with cutaneous lymphoproliferative disorders, is not present in any of those three databases that I described: COSMIC, OncoKB, or CKB. As I said, I was looking for an example and I thought I would find one close to home, and it just so happens that it’s not in these other databases. That’s a reflection of how shallow those other database resources are and, more positively, how rich and comprehensive the Mastermind database is for gene fusion events. So we’re rounding the corner here.

I want to spend a little bit of time talking through some representative use cases that some of our clinical users and some of our pharma partners have described to us as potential applications of the gene fusion database as we’ve developed it to date.

The first, for the clinical users, is very straightforward: to facilitate clinical reporting for patient information from RNA-seq or genome sequencing data where fusion events have been identified. In which references has this been seen before? What diseases am I looking at? What are the typical breakpoints? What can I recommend as a therapy? All of that information can come right out of the Mastermind gene fusion database very efficiently.

Another use case that we’re exploring in more detail with some of our partners is annotation of The Cancer Genome Atlas dataset, or any other clinical database of genomic profiling data for patient sets, to basically have an orthogonal method of annotating that data to better prioritize which of those molecular lesions are likely to be causative, having been, as they would have been in the gene fusion database, characterized in the literature previously.

On a related note, patient selection for clinical trials. As I mentioned at the outset, we would be remiss if we did not include the identification of fusion events in profiling for our cancer patients. This can be useful for providing evidence and justification for patient enrollment onto clinical trials and can positively alter the likelihood of success, because you will have better genetic data to indicate that this fusion is likely to be causative versus an otherwise uncharacterized fusion.

And then finally, to uncover novel fusion mechanisms that might inform pharma R&D. I suggested that the Mastermind gene fusion database comprises those 507 known gene drivers in fusion events. We can take a step outside of that gene set and answer the question for any gene, an entire gene pathway, or otherwise very comprehensively across the whole human genome, and thereby uncover fusion mechanisms that are otherwise lying dormant or undiscovered in the medical literature, certainly not discovered in that aggregated, organized, and prioritized dataset that only Mastermind’s gene fusion database can provide.

I’ve talked a lot about Mastermind. Again, I was trying to leave the details for last so that I get the main message out there, but Mastermind, for those of you who are unfamiliar, is a comprehensive database predicated on the genomic medical literature. We have a deep understanding of the full breadth of the medical literature: 30 million titles and abstracts are indexed on a daily basis. Of those, we’ve prioritized the full-text content for those titles and abstracts that contain any genetic or disease-related information. So this is the most comprehensive dataset driving precision medicine and genomics that is available on the market, and we characterize that information for any one of tens of thousands of different diseases, the entire spectrum of human disease, and any gene in the human genome.

I’m listing here the protein-coding genes because we’re talking about gene fusion events. We have 4.1 million point mutations and indels characterized in the Mastermind database, to which we are now adding those 2,500 gene fusion events that I’ve just described, and we’ll continue to do so as we expand the genes that comprise the Mastermind gene fusion database. I believe this is, if not my final slide, the penultimate slide. There have been a series of review articles in the most recent issue of Cell. This is one that was particularly interesting to us, from Jay Shendure and colleagues.

The PubMed ID is down there for those of you who are interested, but this is Table 4 from that review, on the grand challenges in genome science and genomic medicine. Two of them, which just so happened to be right next to each other, which was convenient for boxing, were really resonant with Genomenon’s outlook for the future of genomic medicine. One describes the generation of catalogs of clinically meaningful, functionally annotated SNVs in all clinically actionable genes, and Mastermind would like to expand that to include all possible SNPs and indels as well as copy number variants. What we’re describing here are fusion events, so that the full complement of genomic or genetic lesions in all diseases across all genes, not just those that have previously been deemed clinically actionable, is covered. That is really what Genomenon’s driving mission is, so that we can have better-informed, more efficient routine use of exome and genome sequencing to guide cancer therapies, or otherwise diagnosis and prognostic predictions in non-oncology diseases, as well as increasingly some therapeutics for those constitutional diseases. And so with that I will thank you for your attention again.

I will turn it back over to Candace to see if we have any questions, if we have time left over, but I’ll leave this slide up for any of you who are interested in accessing Mastermind now for the variant content that it contains in the Professional Edition. I encourage anybody whose interest was piqued by what I described for gene fusions or other structural variants to reach out to us at hello@genomenon.com or kiel@genomenon.com, to reach us generally or me specifically. And I thank you.

Candace: Thank you so much for listening, and thank you, Mark, great job. I’m very excited about this new data now.
