CNV Search Guide

Which nomenclature is used to search for a CNV?

A user can search by cytogenetic band, ISCN karyotype notation, genomic coordinates, or genes and exons. Examples of each type of search are below.

  1. By cytogenetic band – e.g. “Duplication of chromosome band 11q23”
  2. ISCN karyotype notation – e.g. “46,XX,del(5)(q13)” or “dup(1)(p36p11)”
  3. Genomic coordinates – e.g. “deletion (chr11:1918222-1977026 (hg18))”
  4. Genes and exons – e.g. “duplication of MLL exons 2-6”

Can I search for both losses and gains in one search?

Yes. The Boolean search feature lets a user expand a search to include both loss and gain of chromosomal material. For example, to search for both an amplification and a deletion of chromosome 7 material, type “amp:chr7:53900000-58100000” and then “del:chr7:53900000-58100000” in the search bar, and change the comparison operator from “and” to “or”. This returns results that cite either CNV type, rather than only results that cite both.
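The difference between the two operators can be pictured as set intersection versus set union. The sketch below is illustrative only: the article identifiers are made up, and `build_term` is a hypothetical helper that merely reproduces the search-bar syntax quoted above. It is not part of Mastermind.

```python
def build_term(kind: str, chrom: str, start: int, end: int) -> str:
    """Format a search term such as 'amp:chr7:53900000-58100000'."""
    return f"{kind}:{chrom}:{start}-{end}"


amp_term = build_term("amp", "chr7", 53900000, 58100000)
del_term = build_term("del", "chr7", 53900000, 58100000)

# Hypothetical article IDs citing each CNV type (not real search results).
articles = {
    amp_term: {"article_1", "article_2", "article_3"},
    del_term: {"article_3", "article_4"},
}

# "and": only articles that cite both the amplification and the deletion.
both = articles[amp_term] & articles[del_term]

# "or": articles that cite either the amplification or the deletion.
either = articles[amp_term] | articles[del_term]

print(sorted(both))
print(sorted(either))
```

With “or”, the result set can only stay the same size or grow, which is why it is the operator to use when either CNV type is of interest.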

What is visualized in the CNV Diagram?

The genes within the genomic coordinates will be displayed. The Y-axis shows the CNV citations per gene. The X-axis shows the number of articles citing each CNV that overlaps each gene. For example, if 4 CNVs cited in the literature overlap a given gene, and those CNVs have 4, 15, 3, and 100 citations, respectively, that gene will have a Y-axis value of 122 CNV citations.
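The arithmetic in this example is a plain sum of the citation counts of every CNV overlapping the gene. A minimal sketch, using the counts from the example above (the variable names are illustrative):

```python
# Citation counts of the 4 CNVs that overlap the gene in the example.
cnv_citation_counts = [4, 15, 3, 100]

# The gene's Y-axis value is the total across all overlapping CNVs.
gene_citation_total = sum(cnv_citation_counts)
print(gene_citation_total)  # 122
```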

How are corresponding CNVs in the literature displayed?

Each CNV from the literature is listed with its genomic start point, end point, length, reciprocal overlap with the searched-for CNV, matching cytoband, overlapping genes, and article counts. Initial ranking is by total article counts if searching by a CNV, by filtered article counts if additional filters (e.g. phenotypes or genes) are searched with the CNV, or by co-occurring article counts if more than one CNV is searched. The CNVs in the CNV list pane can be re-sorted as the user prefers and filtered by specific genes.

How is CNV overlap defined?

We use specific terms to describe overlap with the searched-for CNV.

These include:

  • Exact: The listed CNV exactly matches the genomic start and end coordinates of the CNV search.
  • Contained: The listed CNV is contained within the CNV being searched. The CNV is smaller in size than the CNV searched.
  • Intersecting: The start or end position of the listed CNV is included within the searched CNV, but the other is not. This can be understood further by examining start or end point, length, or gene/cytoband match.
  • Surrounding: The listed CNV starts before or at the start coordinate of the searched CNV and ends at or after the end coordinate. The CNV is larger in size than the CNV searched.
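The four terms above can be sketched as a small classification function. This is an illustration of the definitions, not Mastermind's implementation. The `Interval` type and `classify_overlap` are hypothetical names, and the handling of CNVs that share only one boundary with the searched CNV is an assumption, since the definitions above do not spell it out.

```python
from typing import NamedTuple, Optional


class Interval(NamedTuple):
    start: int
    end: int


def classify_overlap(listed: Interval, searched: Interval) -> Optional[str]:
    """Return the overlap term for a listed CNV, or None if there is no overlap."""
    if listed.start == searched.start and listed.end == searched.end:
        return "Exact"
    # Assumption: a listed CNV sharing one boundary but lying inside
    # the searched CNV counts as Contained.
    if listed.start >= searched.start and listed.end <= searched.end:
        return "Contained"
    if listed.start <= searched.start and listed.end >= searched.end:
        return "Surrounding"
    # Only one end of the listed CNV falls within the searched CNV.
    if listed.start <= searched.end and listed.end >= searched.start:
        return "Intersecting"
    return None


searched = Interval(100, 200)
for listed in [Interval(100, 200), Interval(120, 180),
               Interval(50, 250), Interval(150, 250)]:
    print(listed, classify_overlap(listed, searched))
```

The checks run from most to least specific, so an exact match is never reported as Contained or Surrounding.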

For each listed CNV, hovering over the “Overlap” type will show the percentage of reciprocal overlap (<1-100%) along with the number of overlapping base pairs. The percentage is calculated as the number of overlapping base pairs divided by the combined span of the two CNVs (the larger of the two end positions minus the smaller of the two start positions).

For example, the exact match CNV in the list (if any) will have 100% overlap. All other overlap types will be less than 100%. A bullseye next to a publication in the Articles list pane is indicative of an exact match.
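The calculation described above can be written out as follows. This is a minimal sketch of the stated formula, not Mastermind's code. `reciprocal_overlap` is a hypothetical name, lengths are taken as end minus start as in the description, and the second CNV in the usage example is invented for illustration.

```python
from typing import Tuple


def reciprocal_overlap(a_start: int, a_end: int,
                       b_start: int, b_end: int) -> Tuple[int, float]:
    """Return (overlapping base pairs, percent overlap) for two CNVs."""
    # Overlap runs from the later start to the earlier end (0 if disjoint).
    overlap_bp = max(0, min(a_end, b_end) - max(a_start, b_start))
    # Combined span: larger end position minus smaller start position.
    span_bp = max(a_end, b_end) - min(a_start, b_start)
    if span_bp <= 0:
        raise ValueError("CNVs must have a positive combined span")
    return overlap_bp, 100.0 * overlap_bp / span_bp


# Searched CNV from the chr7 example versus a hypothetical listed CNV.
bp, pct = reciprocal_overlap(53900000, 58100000, 55000000, 60000000)
print(bp, round(pct, 1))  # 3100000 50.8

# An exact match has 100% overlap.
print(reciprocal_overlap(53900000, 58100000, 53900000, 58100000)[1])  # 100.0
```

Because the denominator is the full span covered by both CNVs, any mismatch at either end lowers the percentage, which is why only an exact match reaches 100%.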

Can you filter CNV search results by phenotype?

Yes. A benefit of using Mastermind is the ability to filter results based on the phenotype, disease or keyword of interest.

Can you filter out cancer-related CNVs?

Yes. All of our advanced filters can be beneficial. In this case, applying the germline and germ-line filters under “ACMG interpretation”, “pedigrees and case studies” is helpful.

How does the gene diagram work when there are 50-100,000 genes?

Although the diagram will appear crowded, one can easily zoom in and out by scrolling the mouse. You can also click and drag the mouse pointer left and right to move along the diagram.

What genome build is this based on?

Our CNV search is based on genome build GRCh38 for mapping purposes. However, you may search using a different build by adding the build name in parentheses after the coordinates. For example, chr11:1918222-1977026 (hg18).
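For users who prepare coordinate searches programmatically, the notation can be split into its parts as sketched below. The `Region` type, `parse_region`, and the regular expression are hypothetical and only mirror the example format shown here. They say nothing about how Mastermind itself parses input.

```python
import re
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int
    end: int
    build: str


# chr<name>:<start>-<end>, optionally followed by "(<build>)".
_REGION = re.compile(r"^\s*(chr[0-9A-Za-z]+):(\d+)-(\d+)(?:\s*\(([^)]+)\))?\s*$")


def parse_region(text: str, default_build: str = "GRCh38") -> Region:
    """Parse 'chr11:1918222-1977026 (hg18)' into its components."""
    match = _REGION.match(text)
    if match is None:
        raise ValueError(f"Unrecognized region: {text!r}")
    chrom, start, end, build = match.groups()
    if int(start) > int(end):
        raise ValueError("Start position must not exceed end position")
    # Fall back to the default build when none is given in parentheses.
    return Region(chrom, int(start), int(end), build or default_build)


print(parse_region("chr11:1918222-1977026 (hg18)"))
print(parse_region("chr7:53900000-58100000"))
```

The default of GRCh38 in the sketch follows the statement above that searches are mapped to GRCh38 unless another build is given.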

How do you define a CNV?

We define a CNV as any structural variant described by authors using any accepted CNV nomenclature or description, including cytogenetic band, ISCN karyotype notation, genomic coordinates, or genes and exons (for examples of each, see “Which nomenclature is used to search for a CNV?”).

Can you search other types of structural variations?

At this time, we are not optimized for trinucleotide repeats or some other types of structural variation.

How are various transcripts accounted for in CNV search?

Transcripts matter primarily when normalizing citations that use gene-and-exon nomenclature. For these, we use the longest transcript for the gene, mapped to GRCh38, to normalize the citation to genomic coordinates.

How did you solve the problem which is caused by changes in the reference genome (different genome coordinates)?

We use GRCh38 by default for normalizing CNV references across different nomenclatures, but lift over coordinates from GRCh37 (hg19) when that build is specified by either an author or a user within their nomenclature.

Do you index articles based on legacy exon numbering, as well as systematic versus custom exon numbering? This is an issue I have struggled with quite a bit in performing literature searches for CNVs.

We are aware of issues with legacy exon numbering, as we have handled similar issues with variants, for example those described using IVS nomenclature with legacy intron numbering. For now, CNVs are normalized by matching the exon numbers to the longest transcript and mapping that to the GRCh38 reference build.

You can easily see legacy citations using different exon numbers by searching for deletions or amplifications of the entire gene, then sorting the CNV list by start position to see references using alternate exon numbers. You could then click the legacy exon numbers and sort the article list by publication date to focus on older articles, which are more likely to use legacy numbering.

If you are aware of specific genes and examples of this issue, please reach out to support@genomenon.com and let us know so that we can consider incorporating corrections for the genes into our Genomic Language Processing pipeline.

If I search for a deletion in a certain region, would it also show any duplication associated with that region?

Currently, the default search functionality shows overlapping CNVs with the same effect (either deletion or duplication/amplification). If you want to see either/or, you can do an Advanced (Boolean) search for both the deletion and amplification and change the comparison operator between the two to “or”, which will show results citing either one or the other. The CNV list will also then show all overlapping CNVs of either deletion or amplification types.

Is there a way to filter literature results by somatic vs germline?

Mastermind has keywords for “somatic” and “germline” that help filter less relevant content and prioritize more relevant content. Additionally, if there is a specific disease of interest, any specific disease or phenotype can be used as a filter as well.

Is it possible to perform a batch query for a list of CNVs?

This will be possible with the API, coming soon.

In addition to the CNV feature, is it possible to search according to the Tier classification for somatic variants?

While Mastermind Genomic Search Engine does not itself have pre-categorized Tiers for CNV entries, there are a variety of search features that allow you to focus on identifying, for instance, functional studies that may have been performed to assess the consequences of a searched-for CNV.

It looks as though the engine is optimized for CNV, but can it also work for fusions, inversions and translocations?

Mastermind allows you to search for evidence pertaining to known fusion pairs (NPM1 and ALK, for instance) using multi-parameter gene searching with the “and” operator in addition to the “fusion” category of keywords.

We have an API Cookbook example script that allows searching for gene-fusion evidence when neither gene is known (by starting with a disease), when one gene partner is known, and when both gene partners are known. This script automates the process of identifying gene-fusion associations based on the evidence in the medical literature by utilizing the Mastermind Advanced API.

Later versions of Mastermind will allow for more flexible karyotype searching to uncover translocations and inversions leading to gene fusion events in a similar way as was illustrated for CNVs.

Many users will be used to viewing a histogram for these SVs. However, the user interface seems to mirror searches for SNVs. Is it possible to restructure the viewing panel as a log-scale histogram, with bars linking to relevant papers, as analysts may feel more comfortable with this format?

Yes. A conventional type of data visualization is planned for a future Mastermind release that will be more similar to the type of plot you describe.

Can you search for HLA alleles, including the HLA nomenclature for high-resolution HLA NGS data, along with specific phenotypes?

We do not currently index the HLA nomenclatures for use in the context of tissue typing.

Does the Mastermind CNV tool capture those CNVs from the large scale cancer genomics projects?

If the results of these CNV studies are described in the full text, the answer is yes, they are captured in Mastermind. We are in the process of adding the supplemental data to our CNV indexing process as well. Please contact us at support@genomenon.com to bring to our attention any that are not captured so that we can address them.

Does Mastermind take HPO terms as input for phenotype search?

Yes. Phenotypes including HPO terms can be used as search parameters for filtering and prioritization.