The Human Genome Project, begun in 1990 and completed in 2003, was a major breakthrough for clinical genetics – the sequence of the human genome was no longer a mystery.

With the entire sequence available, researchers could reliably detect genetic variants in any of the more than 20,000 genes in the human genome and begin to understand their relationship with disease. In 2014, the industry reached another milestone: the $1000 genome. However, the drastically reduced cost of sequencing was offset by the cost of understanding the results, which some have called “the $1 million interpretation”.

Interpreting sequencing results involves determining the clinical significance of the genetic variants found in a patient’s genome, which requires consulting a variety of data sources, most importantly the medical and scientific literature. Aggregating and assessing this evidence is difficult because it relies on search engines that were not designed to comprehensively identify genetic information. As a result,

Finding the references necessary to accurately diagnose a given patient can be like finding a needle in a haystack, which ultimately extends the time required to reach a diagnosis.

Typically, the time from DNA sample preparation to diagnosis is days or weeks, time that some patients simply don’t have. Several groups have taken major steps toward reducing this time by combining new, faster sequencing techniques with streamlined pipelines for analyzing and interpreting the data. In a recent study, this approach allowed one patient to be diagnosed in a little over 7 hours, setting a new record for genetic diagnosis.

In this study, 12 patients in a critical care setting underwent whole-genome sequencing (WGS) to establish a molecular diagnosis and initiate appropriate treatment; a diagnosis was reached for 5 of the 12 patients. A significant share of the time required was spent prioritizing and reviewing the identified variants, including consulting the medical and scientific literature. On average, this step took approximately 2 hours, or 13% of the total time, meaning that even with a record-setting sequencing and analysis approach, there is substantial room for improvement in both turnaround time and the number of successful diagnoses.
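As a rough consistency check, and assuming the ~2 hours and the 13% both describe the same per-patient average, the implied average end-to-end time can be worked out directly:

```latex
T_{\text{total}} \approx \frac{T_{\text{review}}}{f_{\text{review}}}
  = \frac{2\ \text{h}}{0.13} \approx 15.4\ \text{h}
```

That is roughly twice the record time, which fits the record being the fastest single case rather than the typical one.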

As the use of WGS in clinical practice continues to increase, improving the speed and sensitivity of variant interpretation will be crucial for ensuring that as many patients as possible receive an accurate and timely diagnosis and treatment.

The most impactful way of achieving this aim would be to interpret the entire human genome; however, this would be a massive undertaking on the scale of the Human Genome Project, requiring decades of work using current manual processes. Automation that can supplement manual efforts will be necessary to make this a reality within our lifetimes.

Genomenon’s mission is to interpret every known, published variant in the human genome and distribute those interpretations and their supporting evidence to the wider clinical genetics community through Mastermind, which is currently used by nearly 20,000 clinical laboratories, genetic counselors, and clinicians. We will accomplish this through a novel combination of AI-based indexing and expert manual review that ensures maximal sensitivity while vastly increasing the speed and accuracy of interpretation.

The sensitivity of AI-based indexing, the foundational technology of Mastermind, has already led to rapid diagnoses for patients, including two children with rare diseases diagnosed at Rady Children’s and the Rare Genomics Institute, each made possible by a single paper that only Mastermind identified. Thus,

Curating every published variant in the human genome can dramatically reduce the time required to receive a diagnosis and maximize the number of accurate diagnoses, especially for rare diseases.


Our Methodology


AI-Based Techniques to Increase Speed & Sensitivity

Curating the human genome will require automation, both to reduce the time to completion and to increase the sensitivity of the results. Genomenon uses a technology known as Genomic Language Processing (GLP) to automatically extract and standardize clinical and genetic information from the literature. Currently, the Genomenon database includes over 15 million variants, 8.5 million full-text articles, and 3 million supplemental datasets.
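To illustrate what “standardizing” a variant mention can involve, here is a minimal, hypothetical Python sketch. It is not GLP and says nothing about how GLP works internally; it only handles protein-level single amino acid substitutions written in a few common styles (for example `V600E`, `p.V600E`, or `Val600Glu`) and maps them to one HGVS-style form. The function name `normalize_protein_change` and the regular expression are illustrative assumptions.

```python
import re
from typing import Optional

# Hypothetical illustration only, NOT Genomenon's GLP implementation.
# Standard one-letter to three-letter amino-acid codes (20 standard residues).
ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}
# Maps any capitalization of a three-letter code ("VAL") to its canonical form ("Val").
THREE_LETTER_CANONICAL = {code.upper(): code for code in ONE_TO_THREE.values()}

# Optional "p." prefix, optional parentheses, then reference residue,
# position, and alternate residue in either three-letter or one-letter form.
PATTERN = re.compile(
    r"^(?:p\.)?\(?"
    r"(?P<ref>[A-Za-z]{3}|[A-Z])"
    r"(?P<pos>\d+)"
    r"(?P<alt>[A-Za-z]{3}|[A-Z])"
    r"\)?$"
)


def _to_three_letter(code: str) -> str:
    """Convert a one- or three-letter residue code to canonical three-letter form."""
    if len(code) == 1:
        return ONE_TO_THREE[code.upper()]
    return THREE_LETTER_CANONICAL[code.upper()]


def normalize_protein_change(mention: str) -> Optional[str]:
    """Return a form like 'p.Val600Glu', or None if the mention isn't recognized."""
    match = PATTERN.match(mention.strip())
    if match is None:
        return None
    try:
        ref = _to_three_letter(match["ref"])
        alt = _to_three_letter(match["alt"])
    except KeyError:
        return None  # letters matched, but not a valid amino-acid code
    return f"p.{ref}{match['pos']}{alt}"


for text in ["V600E", "p.V600E", "Val600Glu", "p.(val600glu)"]:
    print(text, "->", normalize_protein_change(text))
```

All four spellings collapse to the same `p.Val600Glu` key, which is the point of normalization: once mentions share a key, articles that use different notations can be retrieved together.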

Because GLP is specifically designed to recognize the many ways in which genetic information is referenced, it vastly increases overall sensitivity, as well as speed, compared with traditional search methods and existing variant databases. For 108 variants encountered in clinical practice, the sensitivity of the articles returned was 100% for Mastermind, compared with 76.9% for Google Scholar, 63.9% for ClinVar, and 22.2% for PubMed.
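Sensitivity here means the share of relevant articles that a search actually returns. The source does not describe exactly how the 108-variant comparison was scored, so the following is only a minimal sketch of the metric itself. It assumes a hand-curated gold-standard set of relevant articles per variant; the article IDs are placeholders.

```python
from typing import Set


def sensitivity(relevant: Set[str], returned: Set[str]) -> float:
    """Share of the relevant (gold-standard) articles that a search returned."""
    if not relevant:
        raise ValueError("the gold-standard set of relevant articles is empty")
    found = relevant & returned
    return len(found) / len(relevant)


# Hypothetical gold standard: four articles known to discuss one variant.
relevant = {"article_1", "article_2", "article_3", "article_4"}
# Hypothetical search results. "article_9" is a false positive: it would
# lower precision, but it does not change sensitivity.
returned = {"article_1", "article_2", "article_3", "article_9"}

print(f"{sensitivity(relevant, returned):.1%}")  # 75.0%
```

Sensitivity alone ignores irrelevant hits, which is why the next step in the workflow, manual review, is needed to control specificity.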

The increased sensitivity provided by GLP maximizes the number of variants and articles that are identified and removes the need to manually search for the appropriate subset of literature for a given variant, saving a significant amount of time overall.

Manual, Expert Review to Ensure Specificity

While GLP dramatically improves the sensitivity of variant interpretation, manual review is still required to ensure the specificity and accuracy of every variant classification, as well as to maximize the clinical utility of the resulting data. As such, all evidence is manually reviewed, annotated, and interpreted according to the clinical-standard ACMG/AMP guidelines by trained variant scientists. This interpretation process is also assisted by an automated curation framework which increases the overall speed of variant interpretation without sacrificing the accuracy provided by manual review. Ultimately,
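For readers unfamiliar with how ACMG/AMP evidence codes turn into a classification, the sketch below encodes the rules for combining criteria from the 2015 ACMG/AMP guideline (Richards et al.). It is an illustrative assumption, not Genomenon’s automated curation framework: it takes only counts of criteria met at each strength level and ignores later refinements such as ClinGen’s criterion-specific modifications. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CriteriaCounts:
    """How many ACMG/AMP criteria were met at each evidence strength."""
    pvs: int = 0  # pathogenic very strong (PVS1)
    ps: int = 0   # pathogenic strong (PS1-PS4)
    pm: int = 0   # pathogenic moderate (PM1-PM6)
    pp: int = 0   # pathogenic supporting (PP1-PP5)
    ba: int = 0   # benign stand-alone (BA1)
    bs: int = 0   # benign strong (BS1-BS4)
    bp: int = 0   # benign supporting (BP1-BP7)


def _pathogenic_call(c: CriteriaCounts) -> Optional[str]:
    # Pathogenic combinations, checked before likely pathogenic ones.
    if (
        (c.pvs >= 1 and (c.ps >= 1 or c.pm >= 2 or (c.pm >= 1 and c.pp >= 1) or c.pp >= 2))
        or c.ps >= 2
        or (c.ps >= 1 and (c.pm >= 3 or (c.pm >= 2 and c.pp >= 2) or (c.pm >= 1 and c.pp >= 4)))
    ):
        return "Pathogenic"
    if (
        (c.pvs >= 1 and c.pm >= 1)
        or (c.ps >= 1 and c.pm >= 1)
        or (c.ps >= 1 and c.pp >= 2)
        or c.pm >= 3
        or (c.pm >= 2 and c.pp >= 2)
        or (c.pm >= 1 and c.pp >= 4)
    ):
        return "Likely pathogenic"
    return None


def _benign_call(c: CriteriaCounts) -> Optional[str]:
    if c.ba >= 1 or c.bs >= 2:
        return "Benign"
    if (c.bs >= 1 and c.bp >= 1) or c.bp >= 2:
        return "Likely benign"
    return None


def classify(c: CriteriaCounts) -> str:
    pathogenic = _pathogenic_call(c)
    benign = _benign_call(c)
    if pathogenic and benign:
        return "Uncertain significance"  # pathogenic and benign evidence conflict
    return pathogenic or benign or "Uncertain significance"


print(classify(CriteriaCounts(pvs=1, ps=1)))  # Pathogenic
print(classify(CriteriaCounts(pm=2, pp=1)))   # Uncertain significance
```

The rules themselves are mechanical; the expert work lies in deciding which criteria a given piece of evidence actually meets, which is where manual review remains essential.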

The sensitivity of GLP combined with the specificity of a manual, expert review process results in a larger set of clinically actionable variants.

For example, across 27 representative genes, Mastermind identified on average 4.8 times as many pathogenic variants as ClinVar. Considered across the entire human genome, this expansion of clinically actionable variants can significantly increase diagnostic rates, particularly for rare diseases, where information about variants is scarce.

Prioritization of Genes

To provide the most immediate value to the clinical genetics industry, certain genes and groups of genes will be prioritized based on their associated disease(s) and on whether any therapies are approved or in clinical trials.
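The prioritization criteria above can be expressed as a simple sort order. This is a hypothetical sketch: the source names only the criteria (associated disease and therapy availability), so the field names, the ordering (approved therapy, then therapy in trials, then disease association), and the placeholder gene symbols are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GeneCandidate:
    symbol: str               # placeholder gene symbols in the example below
    disease_associated: bool  # gene has at least one associated disease
    approved_therapy: bool    # an approved therapy exists for that disease
    therapy_in_trials: bool   # a therapy is currently in clinical trials


def prioritize(genes: List[GeneCandidate]) -> List[GeneCandidate]:
    """Order genes so the most clinically actionable come first.

    Hypothetical ordering: approved therapy > therapy in trials >
    disease association only > everything else. Python's sort is
    stable (also with reverse=True), so ties keep their input order.
    """
    return sorted(
        genes,
        key=lambda g: (g.approved_therapy, g.therapy_in_trials, g.disease_associated),
        reverse=True,
    )


genes = [
    GeneCandidate("GENE_A", disease_associated=True, approved_therapy=False, therapy_in_trials=False),
    GeneCandidate("GENE_B", disease_associated=True, approved_therapy=True, therapy_in_trials=False),
    GeneCandidate("GENE_C", disease_associated=True, approved_therapy=False, therapy_in_trials=True),
    GeneCandidate("GENE_D", disease_associated=False, approved_therapy=False, therapy_in_trials=False),
]
print([g.symbol for g in prioritize(genes)])  # ['GENE_B', 'GENE_C', 'GENE_A', 'GENE_D']
```

A tuple key avoids inventing numeric weights; a real program would likely weigh additional factors such as disease prevalence or severity.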

Prioritizing the interpretation of genes based on clinical utility can provide immediate value for patients seeking diagnosis and treatment, even before interpretation of the entire human genome is completed.


Integration of Variant Interpretations into Mastermind

As interpretation of each gene is completed, all of its variants, classifications, and associated evidence are integrated into Mastermind. When a user searches for a variant that has such evidence, a ribbon is displayed showing the classification, the associated disease(s), a summary of the evidence supporting the classification, and a list of resources and available treatments for the disease.

In a standard variant interpretation workflow, professional users can click “View Evidence” to quickly review quotes from the articles that support the variant’s ACMG/AMP classification, along with data from population frequency databases and computational prediction algorithms, and verify the provided classification. They can then click “Export Report” to download that information in a format suitable for clinical reporting.
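The contents and format of Mastermind’s exported report are not described here. As a hypothetical sketch, the following shows one way a variant’s classification, disease, and supporting evidence could be held in a simple record and written out as tab-separated text for a clinical report. All class names, field names, and example values are assumptions; the gene, variant, and disease are placeholders.

```python
import csv
import io
from dataclasses import dataclass, field
from typing import List


@dataclass
class EvidenceItem:
    criterion: str  # ACMG/AMP criterion code, e.g. "PM2"
    source: str     # where the evidence came from (article, database, tool)
    summary: str    # short quote or description of the evidence


@dataclass
class VariantInterpretation:
    gene: str
    variant: str
    classification: str
    disease: str
    evidence: List[EvidenceItem] = field(default_factory=list)


def export_tsv(interp: VariantInterpretation) -> str:
    """Return tab-separated text with a header and one row per evidence item."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["gene", "variant", "classification", "disease",
                     "criterion", "source", "summary"])
    for item in interp.evidence:
        writer.writerow([interp.gene, interp.variant, interp.classification,
                         interp.disease, item.criterion, item.source, item.summary])
    return buffer.getvalue()


# Placeholder example: PM2 + PP3 alone does not meet any ACMG/AMP
# combination rule, so the classification is Uncertain significance.
example = VariantInterpretation(
    gene="GENE_X",
    variant="c.100A>G",
    classification="Uncertain significance",
    disease="Disease Y",
    evidence=[
        EvidenceItem("PM2", "population frequency database", "Absent from population controls"),
        EvidenceItem("PP3", "computational predictors", "Multiple in silico tools predict a deleterious effect"),
    ],
)
print(export_tsv(example))
```

Using the standard `csv` module rather than joining strings by hand means any field that happens to contain a tab or quote is escaped correctly.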

A number of genes have already been integrated into Mastermind, and work on additional genes is ongoing.


Impact for Clinical Genetics

Completely and accurately curating every known variant in the human genome, with comprehensive, annotated evidence for each classification, will have wide-reaching implications for the diagnosis of genetic disease. Not only will diagnosis be faster, because the time required to interpret sequencing results will be significantly reduced, but diagnostic rates will also increase, because no variant or article will be left unexamined. Ultimately,

Curating the genome will revolutionize the field of clinical genetics, allowing genetic testing to become a rapid and sensitive first line of defense in diagnosing disease, especially rare disease.

Due to the decreased effort required for interpretation, genetic testing will also become more cost-effective and scalable, especially for smaller clinical laboratories. As a result,

More clinical laboratories will be empowered to perform large-scale sequencing, which can further increase diagnostic rates.

In addition, the ability of all users to view and independently review the evidence for every variant classification can increase consistency among clinical genetics professionals, because everyone starts from the same comprehensive foundation of evidence. Increasing the consistency of classifications is key to ensuring that patients are diagnosed appropriately.

Finally, displaying available therapies and clinical trials alongside variant classifications can increase awareness among clinicians and maximize the number of patients who are successfully treated or enrolled in trials.


Impact for Drug Development

Developing therapies for genetic disease, especially targeted therapies, requires a deep understanding of the landscape of causative variants, both to choose the correct targets and to ensure that the appropriate patients are enrolled in trials or treated with an approved drug. As such,

Interpretation of the entire human genome would provide the necessary evidence to accelerate drug development, from early discovery through to commercialization.

This will be especially important as highly targeted therapies become even more of a focus for drug development. The damaging effects and prevalence of individual variants must be well understood for a therapy to succeed in trials. With the detailed annotations of clinical and functional studies provided in Mastermind, pharmaceutical companies can confidently choose the right variant or variants for their programs.


Summary

In recent years, major advances in technology have brought rapid, low-cost genetic sequencing to the clinic, revolutionizing the way diagnosis is approached. However, interpreting sequencing results remains a challenge because data aggregation, including consultation of the medical and scientific literature, is still largely manual. With these manual processes, interpretation proceeds slowly, one variant or gene at a time, putting interpretation of the entire human genome out of reach without decades of effort.

Using a novel combination of AI-based indexing of the literature and manual review, we aim to complete interpretation of the entire human genome in just a few years.

Having every known variant interpreted according to clinical standards, with comprehensive, annotated evidence, will revolutionize both clinical genetics and the development of therapies for genetic disease. Not only will diagnosis be faster, but it will be more accurate, and development of therapies will be predicated on a deep foundation of evidence to ensure the highest probability of success.

At Genomenon, our aim is to ensure that no patient goes undiagnosed as a result of a lack of access to quality evidence; interpretation of the human genome is the solution.


ABOUT THE AUTHOR
Lauren Chunn is a medical science liaison with expertise in applied genomics in both clinical and pharmaceutical settings. She engages with clients to understand their program needs and collaborates on precise genomic solutions.