Sample of All FAQs (Helpie FAQ)

  • Mastermind Overview
  • What is the source of evidence for Mastermind?

    PubMed – Full Text and Supplemental data. We index the titles, abstracts, and other PubMed meta-data for all articles, along with the full text and supplemental data of articles relevant to genomics and Mendelian disease.

    We have also integrated ClinVar as an additional source of evidence. Variants that have been submitted to ClinVar are available in Mastermind. This includes ClinVar variants where Mastermind has not identified evidence in the literature (zero articles returned).

  • How often is the Mastermind database updated?

    Weekly. Mastermind performs weekly updates to its database by identifying the new content that has been published in the preceding week through PubMed and prioritizing this content for indexing.

  • Can results change from day to day on the same search in Mastermind?

    Yes. Because Mastermind data is updated on a weekly cadence and genomic data indexing is ongoing, new data can be added to the search results as new articles are indexed.

  • Are genes and variants cited in the tables and figures of full text searches included in the Mastermind database?

    Yes. Mastermind indexes the entirety of the full text in its search, including tables and figure captions. If data is contained directly in images, Mastermind does not index it. These instances tend to be rare and occur more often with much older articles.

  • What browsers are currently supported by Mastermind?

    Google Chrome is the preferred browser. We also offer a Chrome plugin called Mastermind Search Companion, which can be installed from the Chrome Web Store:

    If you do not have Google Chrome installed, you can download it by following the instructions here:

    Mastermind is also accessible via Firefox, Safari, Internet Explorer, and Microsoft Edge – though some features may be limited.

  • What happens if an article is behind a paywall?

    Sentence fragments are displayed regardless of paywall status. Mastermind provides a link to the publisher’s site to view or purchase the article.

  • What is Genomic Language Processing?

    Genomic language processing, or GLP, is the core technology behind Mastermind that normalizes and disambiguates the clinical and genomic information from the entirety of medical literature. Powered by GLP, Mastermind identifies every reference in every article in every way that an author could describe it, analyzes the genomic associations between each concept, and presents the data in an easily understandable interface. More information on GLP can be found in our blog posts about Genomic Associations and Curated Content.

  • Variant Nomenclature
  • Which types of variants can be searched in Mastermind?

    Mastermind can be used to search for coding variants including: missense variants; insertion, deletion, and indel variants; nonsense variants; frameshift variants; and copy number variations (CNVs). Mastermind will also search for non-coding variants affecting 5’- and 3’-untranslated regions (UTRs), splice donor/acceptor sites, splice regions, introns, as well as intergenic variants up- and down-stream of neighboring genes.

  • What variant formats and nomenclatures are supported in Mastermind?

    Mastermind recognizes variant information provided as cDNA, protein, genomic coordinates, rsID, legacy, and IVS nomenclature. Mastermind uses genomic language processing (GLP) to search the literature for all nomenclatures – standardized or not – and provides a highly sensitive return of references regardless of how individual authors might describe a variant in an article. For data display, variants in the Variants Table are shown as protein changes (ex – p.G1355D) to make it easier to find and interact with relevant articles.

    Mastermind also supports copy number variation (CNV) searching by karyotype, array nomenclature, genomic coordinates, and more. The system assumes coordinates are provided in GRCh38; however, coordinates in older builds are accepted if you specify the build (ex – hg18, hg19) at the end of the search.

    Deletion events are displayed in the search bar and CNV table as “del” and can be searched as:

    • deletion of __________
    • del __________
    • loss of __________

    Amplification events are displayed in the search bar and CNV table as “amp” and can be searched as:

    • amplification of __________
    • amp __________
    • duplication of __________
    • dup __________
    • gain of __________

    Examples of CNV searches:

    • ISCN karyotype – “46,XX,del(5)(q13)” or “dup(1)(p36p11)”
    • Array – “arr[hg19] 7q36.3(158,583,829-159,119,707)x3”
    • Genomic coordinates – “deletion chr11:1918222-1977026 hg18”
    • Cytogenetic band – “del 11q23” or “loss of 11q23”
    • Intragenic CNVs – “dup KMT2A exons 2-6” or “amp KMT2A exons 2-6” or “gain of KMT2A exons 2-6”
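
    The synonyms and examples above can be sketched as a small parser. This is an illustration only, not Mastermind's own parser: the function name parse_cnv_query, the returned dictionary keys, and the bare "deletion" prefix (taken from the genomic coordinates example) are assumptions made for this sketch. ISCN karyotype and array strings are not handled.

```python
import re

# Synonyms listed in the FAQ, mapped to the shorthand shown in the search bar.
# The bare "deletion" form comes from the genomic-coordinates example.
CNV_SYNONYMS = {
    "deletion of": "del",
    "deletion": "del",
    "del": "del",
    "loss of": "del",
    "amplification of": "amp",
    "amp": "amp",
    "duplication of": "amp",
    "dup": "amp",
    "gain of": "amp",
}

# Try longer prefixes first so "deletion of" wins over "del".
_PREFIX = "|".join(
    sorted((re.escape(k) for k in CNV_SYNONYMS), key=len, reverse=True)
)
_QUERY = re.compile(
    rf"^(?P<event>{_PREFIX})\s+(?P<target>.+?)(?:\s+(?P<build>hg\d+|GRCh\d+))?$",
    re.IGNORECASE,
)


def parse_cnv_query(query, default_build="GRCh38"):
    """Split a free-text CNV query into event, target, and genome build."""
    match = _QUERY.match(query.strip())
    if match is None:
        raise ValueError(f"unrecognized CNV query: {query!r}")
    return {
        "event": CNV_SYNONYMS[match.group("event").lower()],
        "target": match.group("target"),
        # GRCh38 is assumed unless a build is given at the end of the query.
        "build": match.group("build") or default_build,
    }


print(parse_cnv_query("deletion chr11:1918222-1977026 hg18"))
print(parse_cnv_query("gain of KMT2A exons 2-6"))
```
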
  • How do I search for SNVs/indels by genomic positions?

    Mastermind supports GRCh37 and GRCh38 searches for SNVs/indels. Variants can be entered directly in the search box as shown below in GRCh37 and GRCh38, respectively:

    To search for genomic coordinates in Mastermind, enter them in the search bar with the appropriate sequence identifier or modify the URL directly as in the link below:


    where can be taken from the list below and substituted into the URL.

    For example, a search on chromosome 1 in build GRCh37/hg19 would look like this:

    Searching by genomic coordinates will list references for variants mapped to alternate transcripts. We recommend adding the gene of interest as another search term to reduce off-target matches.

  • How are insertions and deletions, nonsense, frameshift, and non-coding variants displayed in Mastermind?

    Mastermind uses shorthand identifiers to represent different types of variants, which are displayed and searchable within the Variants table. The shorthands used by Mastermind are as follows:

    • Insertions: “ins” — e.g. V600ins
    • Deletions: “del” — e.g. V600del
    • Indels: “delins” — e.g. V600delins
    • Nonsense: “X” — e.g. V600X
    • Frameshift: “fs” — e.g. V600fs
    • Untranslated regions: “UTR” — e.g. 5’UTR or 3’UTR. Some genes contain introns within the untranslated regions; therefore a variant might belong to both the “UTR” and “int” groups simultaneously – e.g. 5’UTRint.
    • Splice donor: “sd” — e.g. V168sd; these are variants affecting the 2-base region at the 5′ side of the intron. In the protein space, these are mapped to the nearest amino acid in the nearest coding neighbor at the 5′ side of the intron.
    • Splice acceptor: “sa” — e.g. N581sa; these are variants affecting the 2-base region at the 3′ side of the intron. In the protein space, these are mapped to the nearest amino acid in the nearest coding neighbor at the 3′ side of the intron.
    • Intronic: “int” — e.g. E46int; these are variants affecting any of the bases within the intron between the splice acceptor and splice donor sites. In the protein space, these are mapped to the nearest amino acid in the nearest coding neighbor.
    • Intronic donor and acceptor sides: “intd” and “inta” — e.g. N581intd or N581inta; these are variants that occur in either the donor half or the acceptor half of the Intronic “int” variant region between the splice donor and splice acceptor sites. These are more specific sub-divisions of the “int” category, and so variants in either the “intd” or “inta” categories will appear in the “int” category as well.
    • Splice regions: “srd” and “sra” — e.g. N581srd or N581sra; these are variants surrounding the splice sites, from 1 to 3 bases into the exon and from 3 to 8 bases into the intron. The intronic part of the splice regions overlap the intronic “int” classification as well, so splice region variants within the intron will also appear in the “int” and either the “intd” or “inta” categories as well.
    • Upstream genetic variant: “ugv” — these are variants affecting the region of 5,000 bases upstream of the 5′ side of the gene.
    • Downstream genetic variant: “dgv” — these are variants affecting the region of 5,000 bases downstream of the 3′ side of the gene.
    • Extensions: “ext” — e.g. A55ext; these are variants which span multiple exons within the gene.
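
    For readers who process exported labels, the shorthands above can be split into their parts with a short helper. This is a sketch under stated assumptions: the names parse_shorthand and display_groups are hypothetical, only labels of the form amino acid + position + shorthand are handled, and the UTR, “ugv”, and “dgv” groups are left out.

```python
import re

# Shorthand suffixes from the list above (UTR, ugv, and dgv are omitted).
SHORTHAND = {
    "ins": "insertion",
    "del": "deletion",
    "delins": "indel",
    "X": "nonsense",
    "fs": "frameshift",
    "sd": "splice donor",
    "sa": "splice acceptor",
    "int": "intronic",
    "intd": "intronic, donor side",
    "inta": "intronic, acceptor side",
    "srd": "splice region, donor side",
    "sra": "splice region, acceptor side",
    "ext": "extension",
}

# Longest suffixes first so "delins" is tried before "del".
_SUFFIXES = "|".join(sorted(SHORTHAND, key=len, reverse=True))
_LABEL = re.compile(rf"^(?P<ref>[A-Z])(?P<pos>\d+)(?P<kind>{_SUFFIXES})$")


def parse_shorthand(label):
    """Return (reference amino acid, position, shorthand) or None."""
    match = _LABEL.match(label)
    if match is None:
        return None
    return match.group("ref"), int(match.group("pos")), match.group("kind")


def display_groups(kind):
    """Groups in which a variant of this kind appears.

    "intd" and "inta" are sub-divisions of "int", so they also appear there.
    """
    if kind in ("intd", "inta"):
        return [kind, "int"]
    return [kind]


print(parse_shorthand("V600delins"))
print(display_groups("intd"))
```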

    The graphic below shows the different regions for these groups of variants:

  • How and why does Mastermind group non-coding variants?

    Mastermind groups types of non-coding variants together to provide a highly sensitive return of evidence. Articles with an exact nucleotide-level match are prioritized on the list and designated with a crosshairs/target symbol, shown below:

    In a search for MYH7:F244sd, for example, there are multiple nucleotide-level changes that fall into this “sd” (splice donor) bucket, including: c.732+1G>A, c.732+1del, and c.732+2T>G. When a search is performed for MYH7:c.732+1del, articles are returned for the entire F244sd bucket, but exact matches are prioritized and designated with the crosshairs symbol. You can read more about non-coding variants in our blog post here.
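
    The bucketing in this example can be sketched from the intronic offset alone. This is a simplified illustration, not Mastermind's implementation: the function name noncoding_bucket is hypothetical, only cDNA positions of the form c.N+K or c.N-K are handled, the exonic part of the splice regions is ignored, and mapping to the amino acid label (such as F244) is not attempted. It relies on the HGVS convention that "+" offsets number the donor half of an intron and "-" offsets the acceptor half.

```python
import re

# Matches the start of an intronic cDNA position such as c.732+1 or c.733-5.
_INTRONIC = re.compile(r"^c\.(?P<anchor>\d+)(?P<sign>[+-])(?P<offset>\d+)")


def noncoding_bucket(cdna):
    """Return the most specific intronic bucket for a cDNA variant string."""
    match = _INTRONIC.match(cdna)
    if match is None:
        raise ValueError(f"not an intronic cDNA position: {cdna!r}")
    offset = int(match.group("offset"))
    if offset < 1:
        raise ValueError("intronic offsets start at 1")
    donor_side = match.group("sign") == "+"
    if offset <= 2:
        # The 2-base splice donor/acceptor site.
        return "sd" if donor_side else "sa"
    if offset <= 8:
        # Intronic part of the splice region (3 to 8 bases into the intron).
        return "srd" if donor_side else "sra"
    # Deeper intronic positions: donor half or acceptor half of the intron.
    return "intd" if donor_side else "inta"


for variant in ("c.732+1G>A", "c.732+1del", "c.732+2T>G"):
    print(variant, noncoding_bucket(variant))
```

    All three nucleotide-level changes from the example fall into the same “sd” bucket, which is why a search for any one of them returns articles for the whole group.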

  • Can I search for mitochondrial variants or RNA variants in Mastermind?

    Mastermind indexes and returns articles for mitochondrial and RNA gene searches. We recommend searching for the gene and then adding the variant as free text. This can be done using the Boolean search with the operator OR between multiple variants or between different nomenclatures for a single variant.

  • Mastermind Details
  • What are the Filter Categories in Mastermind and how can they be used?

    Filter Categories are available with Mastermind Professional Edition. The purpose of the Filter Categories is to add specificity to searches by filtering in articles that mention keywords of interest.
    The main keyword categories include: ACMG Interpretation, Clinical Significance, Genetic Mechanism and Significant Terms in Abstract.
    Users have the ability to Enable All keywords within one or multiple categories, or select keywords individually. The number next to each term indicates the number of articles with a match for that term. Each main category also contains several sub-categories. For example, under ACMG Interpretation, there is a sub-category for Functional keywords, which in turn are divided between in vivo and in vitro categories.
    The Significant Terms in Abstract category identifies keywords that are specifically associated with the search. Mastermind produces this list of custom key terms by aggregating the content of each of the articles returned, performing a word frequency calculation, normalizing this list against the rest of scientific literature, and then ordering the terms by their frequency of occurrence in the content of interest. This set of keywords is dynamic (it changes based on your search), whereas the keywords in the other categories (ACMG Interpretation, Clinical Significance, Genetic Mechanism) are static.
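
    The general idea of a normalized word frequency ranking can be sketched as follows. This is not Mastermind's algorithm: the function name significant_terms, the whitespace tokenization, the ratio used for normalization, and the floor for unseen background terms are all assumptions chosen to keep the example small.

```python
from collections import Counter


def significant_terms(result_docs, background_freq, top_n=5, floor=1e-6):
    """Rank terms by frequency in the results relative to a background.

    result_docs: texts of the articles returned by a search.
    background_freq: relative frequency of each term in a reference corpus.
    """
    counts = Counter(
        word for doc in result_docs for word in doc.lower().split()
    )
    total = sum(counts.values())
    scores = {}
    for term, count in counts.items():
        relative = count / total
        # Terms missing from the background get a small floor frequency.
        scores[term] = relative / background_freq.get(term, floor)
    # Highest score first; ties are broken alphabetically.
    return sorted(scores, key=lambda term: (-scores[term], term))[:top_n]


docs = ["splice donor variant", "splice variant reported"]
background = {"variant": 0.5, "reported": 0.5, "splice": 0.01, "donor": 0.01}
print(significant_terms(docs, background, top_n=2))
```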

  • Can I save my search?

    Yes. Saving the URL is the best way to save a search in Mastermind. Each time you visit the URL, you get the most up-to-date information available for that search. For example, a search that initially returns 10 articles may return 12 when you revisit the same URL two months later.

  • Can I configure URLs so I don’t need to type?

    Yes. URLs in Mastermind incorporate the search terms and Filter Categories, making it possible to pre-configure links by plugging in the values for gene, mutation, cnv, disease, hpo (phenotype), unii (therapy), cats (Filter Categories), and keyword (free-text terms).
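
    A pre-configured link can be assembled with standard URL encoding. The sketch below is illustrative only: the parameter names come from the answer above, but the base URL is a placeholder, the function name build_search_url is hypothetical, and the value format shown for mutation is an assumption. Copy the real base URL and value formats from a search performed in the application.

```python
from urllib.parse import urlencode

# Parameter names listed in the FAQ answer above.
SEARCH_PARAMS = (
    "gene", "mutation", "cnv", "disease", "hpo", "unii", "cats", "keyword",
)


def build_search_url(base_url, **terms):
    """Build a search link from the supported parameters."""
    unknown = set(terms) - set(SEARCH_PARAMS)
    if unknown:
        raise ValueError(f"unsupported parameters: {sorted(unknown)}")
    # Keep a stable parameter order and skip empty values.
    query = [(name, terms[name]) for name in SEARCH_PARAMS if terms.get(name)]
    return f"{base_url}?{urlencode(query)}"


# Placeholder base URL; the mutation value format is an assumption.
print(build_search_url(
    "https://example.org/search", gene="MYH7", mutation="MYH7:F244sd"
))
```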

  • Can I batch annotate variants?

    Yes. The Mastermind API can integrate with your VCF pipeline. To learn more visit:

  • Does Mastermind include population frequency data or computational model predictions of pathogenicity?

    Currently, population frequency data and computational model predictions are only available for our Disease-Specific Curated Content.

  • Is Mastermind available through API access?

    Yes. The Mastermind API provides programmatic access to the data. You will need a separate license key to use the Mastermind API. In addition to the published API, custom APIs are available upon request. Contact our Sales team to start a conversation.

  • How do I report a missing article or variant to Mastermind?

    If you come across an article that you would like to see indexed, or notice information from an article that has been misinterpreted by Mastermind, feel free to reach out to us through the “Contact Us” in the toolbar within the application or email us at

  • Does Mastermind differentiate between positive and negative associations for diseases and genes or diseases and variants?

    No. Mastermind searches for all mentions of a disease, gene, or variant, but does not draw conclusions about the nature of the association between the disease and variant or gene. We leave that to the experienced clinical genomic scientists who utilize the application to obtain a comprehensive source of evidence. The Prognosis keywords found under the Clinical Significance category are often useful for prioritizing papers related to the impact of the gene and/or variant in the context of disease.

  • Does Mastermind provide variant interpretations or reports?

    Provisional classifications are available for our Disease-Specific Curated Content. In general, however, the Mastermind genomic search engine does not draw its own conclusions about the clinical significance of individual variants. Rather, it provides the user with the evidence needed to reach these conclusions on their own. Mastermind is therefore more properly considered a decision support tool.

  • Can I filter out certain types of articles or keywords?

    We use a filter-in approach to ensure maximal sensitivity so that no articles are missed. Where a filter-out approach is prone to discarding potentially relevant articles, utilizing the Filter Categories in conjunction with phenotypes/diseases/therapies/text leverages the power of GLP and Mastermind’s Relevancy algorithm. The filter-in approach enables prioritization of the literature and presents the information most relevant to your search at the top of the Articles list.

  • Is there a way to filter literature results by somatic vs germline?

    Mastermind has categorical keywords for “somatic” and “germline” found under ACMG interpretation>Inheritance pattern that help prioritize more relevant content. In addition, including diseases or phenotypes in the search is a particularly useful tool for filtering articles.

  • Can Mastermind be used to identify mutational hotspots?

    Yes. Mastermind is ideal for identifying evidence of hotspots in genes. The Variant Diagram plot is a visual representation of variants published for the gene in question, as well as the relative number of associated articles. Using this feature, you can determine at a glance which regions are highly variable. This plot updates whenever your search terms change, or you can select a variant by clicking on its bar within the Diagram.

  • CNV Searching
  • Can you search for HLA alleles, including the HLA nomenclature used for high-resolution HLA NGS data and specific phenotypes?

    Searching for HLA alleles is possible using free-text search. Searching the gene of interest along with the allele as free-text is the recommended workflow. We do not currently index the HLA nomenclatures for use in the context of tissue typing.

  • Does Mastermind capture CNVs from large-scale cancer genomics projects?

    If the results of these CNV studies are described in the full text, they are captured in Mastermind. We are in the process of adding the supplemental data to our CNV indexing process as well. Please contact us to bring to our attention any CNVs that are not captured so that we can address them.

  • Does Mastermind also work for fusions, inversions, and translocations?

    Mastermind allows you to search for evidence pertaining to known fusion pairs (ex – NPM1 and ALK) using multi-parameter gene searching with the “and” operator, in addition to categorical keywords found under Genetic Mechanism>Fusion Events.

  • In addition to the CNV feature, is it possible to search according to the Tier classification for somatic variants?

    While Mastermind Genomic Search Engine does not itself have pre-categorized Tiers for CNV entries, there are a variety of search features that enable classification. For example – enabling filters under the ACMG Interpretation>Functional category will prioritize papers that discuss consequences of a searched-for CNV.

  • If I search for a deletion in a certain region, would it also show any duplication associated with that region?

    The default search functionality shows overlapping CNVs with the same effect (if you search a deletion, only overlapping deletions are displayed). If you want to see articles for either, you can perform a Boolean search for both the deletion and amplification, then set the operator between the two to “or”, which will show references citing either the deletion or amplification. The CNV list will also then show all overlapping CNVs of either deletion or amplification types.

  • Do you index articles based on legacy exon numbering, as well as systematic versus custom exon numbering?

    We are aware of issues with legacy exon numbering, as we have handled similar issues with variants (such as IVS nomenclature with legacy intron numbering). For now, CNVs are normalized by matching the exon numbers to the longest transcript and mapping that to the GRCh38 reference build.
    You can easily see legacy citations using different exon numbers by searching for deletions or amplifications of the entire gene, then sorting the CNV list based on start position to see references using alternate exon numbers. You could then click the legacy exon numbers and sort the article list by publication date to focus on older articles more likely using legacy numbering.
    If you are aware of specific genes and examples of this issue, please reach out and let us know so that we can consider incorporating corrections for the genes into our Genomic Language Processing pipeline.

  • How do you handle changes in the reference genome (different genome coordinates)?

    We use GRCh38 by default for normalizing CNV references across different nomenclatures, but lift over coordinates from GRCh37 (hg19) when specified by either an author or user within their nomenclature.

  • How are various transcripts accounted for in CNV search?

    Transcripts matter primarily when normalizing citations that use gene-and-exon nomenclatures. For these, we use the longest transcript for the gene, mapped to GRCh38, to convert the citation to genomic coordinates.

  • How do you define a CNV?

    Within Mastermind, a CNV is any structural variant described by authors using any accepted CNV nomenclature or description, including by cytogenetic band, ISCN karyotype notation, genomic coordinates, or genes and exons.

  • Can you search for other types of structural variations?

    At this time, we are not optimized for trinucleotide repeats or some other types of structural variation. Contact us if you have questions about CNV nomenclature/searching.

  • What transcripts does Mastermind match to?

    We use RefSeq for transcript information. We have indexed millions of articles and have seen that RefSeq is used almost exclusively to reference transcripts. When Ensembl or another reference is used, RefSeq is typically also mentioned, allowing the information to be detected and incorporated into the database.

  • What genome build is this based on?

    Our CNV search is based on genome build GRCh38 for mapping purposes. However, you may search on different build types without prior lift over by specifying the build at the end of the search. For example: “deletion chr11:1918222-1977026 hg18”. Mastermind will perform the lift over to GRCh38 automatically and return articles for matching/overlapping CNVs.

  • How does the CNV diagram work when there are 50-1000+ genes?

    Although the diagram will appear crowded, one can easily zoom in and out by scrolling the mouse. You can also click and drag the mouse pointer left and right to move along the diagram. Click “Reset” in the upper left corner to zoom out to the widest view of the diagram.

  • Can you filter out cancer-related CNV?

    No. We take a filter-in approach to ensure maximal sensitivity of the return. Specificity can be added to your CNV search using diseases, phenotypes, genes, and categorical keywords. In this case, applying the germline and germ-line filters under ACMG interpretation>Pedigrees and case studies is helpful.

  • Can you filter CNV search results by phenotype?

    Yes. A benefit of using Mastermind is the ability to filter results based on the phenotype, disease or keyword of interest. When phenotypes are added to the search, sentence fragments mentioning those phenotypes are displayed in the Full Text Matches.

  • How is CNV overlap defined?

    We use specific terms to describe overlap with the searched-for CNV.
    These include:

    • Exact: The listed CNV result exactly matches the genomic start and end coordinates of the searched-for CNV.
    • Contained: The listed CNV is contained within the CNV being searched. The CNV is smaller than the searched-for CNV.
    • Intersecting: The start or end position of the reported CNV is included within the searched-for CNV, but the other end is not. This can be understood further by examining start or end point, length, or gene/cytoband match.
    • Surrounding: The listed CNV starts before or at the start coordinate of the searched-for CNV and ends at or after the end coordinate. The CNV is larger than the searched-for CNV.
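
    The four overlap terms can be expressed as coordinate comparisons. A minimal sketch, assuming each CNV is a (start, end) pair on the same chromosome and build; the function name overlap_type is hypothetical and this is not Mastermind's own code.

```python
def overlap_type(searched, listed):
    """Classify how a listed CNV overlaps the searched-for CNV.

    Both arguments are (start, end) pairs on the same chromosome and build.
    Returns None when the two CNVs do not overlap at all.
    """
    s_start, s_end = searched
    l_start, l_end = listed
    if l_end < s_start or l_start > s_end:
        return None
    if (l_start, l_end) == (s_start, s_end):
        return "Exact"
    if l_start >= s_start and l_end <= s_end:
        # Listed CNV lies inside the searched-for CNV and is smaller.
        return "Contained"
    if l_start <= s_start and l_end >= s_end:
        # Listed CNV covers the searched-for CNV and is larger.
        return "Surrounding"
    # Only one end of the listed CNV falls inside the searched-for CNV.
    return "Intersecting"


searched = (100, 200)
for listed in [(100, 200), (120, 180), (50, 150), (50, 250)]:
    print(listed, overlap_type(searched, listed))
```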

    For each listed CNV, hovering over the “Overlap” type will display the percentage of reciprocal overlap (from <1% to 100%) and the number of overlapping base pairs. The percentage is calculated as the number of overlapping base pairs divided by the combined length spanned by the two CNVs (the smaller of the two start positions subtracted from the larger of the two end positions).

    For example, all exact-match CNVs in the list (if any) will have 100% overlap. All other overlap types will be less than 100%. Intersecting and surrounding CNVs may vary significantly in percentage overlap. A crosshairs/target symbol next to a publication in the Articles list indicates an exact match.
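
    The percentage calculation described above can be worked through in a few lines. A minimal sketch, assuming each CNV is a (start, end) pair and that lengths are taken as end minus start, as in the description; the function name reciprocal_overlap is hypothetical.

```python
def reciprocal_overlap(searched, listed):
    """Return (overlapping base pairs, percentage overlap) for two CNVs.

    Percentage = overlapping length / combined length, where the combined
    length is the larger end position minus the smaller start position.
    """
    s_start, s_end = searched
    l_start, l_end = listed
    overlap_bp = min(s_end, l_end) - max(s_start, l_start)
    if overlap_bp <= 0:
        return 0, 0.0
    combined = max(s_end, l_end) - min(s_start, l_start)
    return overlap_bp, 100.0 * overlap_bp / combined


# Exact match: 100 overlapping bases over a combined span of 100.
print(reciprocal_overlap((100, 200), (100, 200)))
# Intersecting: 50 overlapping bases over a combined span of 150.
print(reciprocal_overlap((100, 200), (150, 250)))
```

    Because the combined span is never shorter than the overlap, the percentage reaches 100% only when both start and end positions coincide.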

  • How are published CNVs displayed?

    The data is displayed in the CNVs table under the CNV diagram. In the table, there are headers for: chromosome, start position, end position, length, overlap, matching cytoband, overlapping genes, and article matches. Default sorting of this table (without filters applied) is by total article counts. When phenotypes, genes, or filters are applied, the table is default sorted by article matches with filters. Data can be re-sorted based on the user’s preferences by clicking the headers. It is often helpful to sort by Overlap to quickly identify exact matches and/or CNVs with high percentage of overlap. Users can also type gene names into the “Filter by gene” search box to filter the table, or add a gene to the main search bar to require a gene match in the articles that are returned.

  • What is visualized in the CNV Diagram?

    All unique CNVs overlapping with the searched CNV are displayed as individual blue tracks, with genomic position along the X axis. Hover over each blue track to see start and stop coordinates of the CNV, its size, and the number of articles that will be returned for that CNV.
    Below the CNV tracks, Articles are displayed in a Manhattan plot in blue, where the Y axis is citations per CNV. Hover over blue blocks to see total article counts for that region, its size, and the start and stop positions.
    Below Articles, the genes mapping to the genomic coordinates along the X axis are displayed in gray. Hover over the gray blocks to see the gene name, start and stop positions, and article count.

  • Can I search for both losses and gains in one search?

    Yes. The Boolean search feature allows a user to include both loss and/or gain of chromosomal material. For example, start by entering the amplification in the search bar (ex: “amp:chr7:53900000-58100000”). With this in the search bar, enter the deletion (ex: “del:chr7:53900000-58100000”) and identify the desired CNV from the drop down list of suggestions. To add this to the search, press shift on the keyboard, then click on the additional CNV from the drop down list. The operator can be changed from “and” to “or” by clicking the operator. This allows for more/less restrictive searching for both/either CNV events in a single search.

  • ClinVar Integration
  • Does Mastermind match all cDNA changes for a specific protein change?

    Yes. In the ClinVar tab, use the drop down next to Select ClinVar Record to see all cDNA changes that map to the protein change in your search.

  • Can I link out to ClinVar from Mastermind?

    Yes. Click the View in ClinVar button within the ClinVar tab to access the ClinVar record corresponding to the cDNA change of the searched variant.

  • How do I find ClinVar variants in Mastermind?

    ClinVar variants can be found in Mastermind by searching the cDNA, protein change, genomic coordinates, or rsID nomenclature. They are displayed in the Variant Info section, and designated with a blue tab when available.

  • How often will ClinVar information be updated in Mastermind?


  • Does Mastermind index the current live version of the ClinVar database or a downloaded copy?

    The ClinVar information is from a downloaded copy of the database.

  • What ClinVar variants are included in the new Mastermind integration?

    All variants in the most recent ClinVar download are available in Mastermind. This includes ClinVar variants where Mastermind has not identified evidence in the literature (zero articles returned).

  • Disease-Specific Curated Content
  • How should I cite Mastermind in my paper?

    When referring to the use of Mastermind within a sentence, please use the following text: “Mastermind Genomic Search Engine (”

  • Where does your population data come from?

    Population data for curated variants is based on data from gnomAD v2.1.1.

  • Does Mastermind have curated content for every variant for a curated gene?

    Curated content is available for published variants in the canonical transcript of a curated gene. For variants that are unpublished, or for which the published evidence is not useful for classification, curated content may not be available.

  • Which variants have curated content?

    Currently, Disease-Specific Curated Content is available for over 230 unique disorders in Mastermind Professional Edition. A small subset of the curated content is available in Mastermind Basic Edition.

  • How do I access curated content?

    For variants with curated content, this content can be accessed by searching for a gene and variant. A provisional classification will be displayed for the variant on the Evidence page, and the curated content on the Interpretation page can be viewed by clicking “View Interpretation.”

  • What part of the content is curated by humans?

    Before curated content becomes available in Mastermind, clinical genomic scientists review each article for evidence, ensure that the variant nomenclature is accurate and based on the canonical transcript, and apply ACMG interpretation criteria to the summarized evidence. The curated data then undergoes an extensive secondary QA review process before it is added to the Mastermind database. Our SOP for curated content can be found within the application (Genomenon Sequence Variant Interpretation Standards) or here.

  • What is Disease-Specific Curated Content?

    Disease-Specific Curated Content is a comprehensive and expertly curated set of variants for a gene. The variant data includes a summary of the published evidence, ACMG-based criteria applied to the evidence, and an ACMG-based provisional classification based on the evidence. The curated content also includes gene/transcript information, population data, in silico prediction models, and data intrinsic to the gene.