Over the past several years, Genomenon has worked alongside pharma teams across rare disease and precision oncology to turn the published literature into decision-ready, literature-derived real-world evidence.
In this interview, Genomenon Founder and Chief Scientific Officer Mark Kiel, MD, PhD, takes time to answer key questions about literature-derived RWE: where it adds the most value, the common pitfalls he sees in rare and ultra-rare indications, and how leading teams are using the literature to design smarter trials, make stronger regulatory arguments, and move forward with more confidence.
Q: When pharma teams work with real-world evidence, where do you see the most room to get more value out of it?
Mark: One of the biggest opportunities is recognizing how much real-world evidence is already sitting in the published literature. Many teams naturally start with EHRs, claims, or registries, and the literature comes in later, if at all. But for rare and ultra-rare indications in particular, decades of global clinical experience are already captured in papers, case series, and natural history studies - and that’s often where the most detailed, patient-level insights live.
Beyond that, there are a few themes I see repeatedly. When data from multiple providers or sources are brought together without a clear harmonization plan, it can be hard to separate signal from noise. It also helps to define a very specific decision goal up front - what you’re trying to refine, de-risk, or validate - so the analyses are tightly aligned with that outcome.
Q: How many pharma projects has Genomenon delivered using literature-derived RWE?
Mark: We’re at roughly 75 pharma projects now using literature-derived RWE, across many dozens of pharma teams.
What matters more than the number is the range. We’ve done ultra-rare disease work where only a few hundred patients exist in the literature, and we’ve also supported larger precision oncology programs with tens of thousands of patients across studies.
So it’s not one narrow use case - it’s rare disease, precision oncology, small programs, large programs - all using the literature as a real, working source of evidence for development decisions.
Q: How do you define “literature-derived real-world evidence,” and how does it differ from EHRs, claims, and registries?
Mark: When I say “literature-derived real-world evidence,” I mean clinical, genomic, and outcome data systematically extracted from the peer-reviewed record - case reports, case series, observational cohorts, even guidelines and consensus statements.
What makes it different from EHRs, claims, and registries is where it comes from and who it captures. EHRs and claims are bounded by a health system, by coding practices, and by capture that is often incomplete or inconsistent. Their denominator is huge, but it’s often bloated with mostly healthy or minimally characterized patients who touch the system once and disappear.
The literature, on the other hand, has a very rich numerator for rare disease and precision oncology. These are the patients clinicians felt were important enough to write up - often with detailed genotype, phenotype, treatment, and outcome information - aggregated across countries and care settings over decades. So you get both breadth (global experience) and depth (rich detail per patient).
When you curate that with rigor and full provenance, it doesn’t just “decorate” traditional RWD. It can sit upstream, shaping eligibility, endpoints, and feasibility before you ever touch an EHR extract; it can sit side by side with other data as a complementary source; or it can be used after the fact to explain what you’re seeing. But it’s often most impactful when you start with it upstream.
Q: Why is literature especially valuable for rare indications?
Mark: In rare disease, the literature is where the patients actually are. By definition, these patients are few, so every case is precious - and when they do show up in the peer-reviewed record, they’re usually described in great detail.
Those reports are often written by true specialists and key opinion leaders who know exactly what to look for: the nuanced phenotypes, the genotype patterns, the treatment course, the outcomes. You just don’t get that level of exhaustive description from a typical EHR pulled from a general health system.
So for rare indications, the numerator in the literature is enriched for the very patients you care about, and they’re documented with far more care and depth. That’s why literature-derived RWE is so powerful in this space - it’s where the rare disease signal is strongest.
Q: What does a credible workflow look like - from ingestion to auditable evidence?
Mark: At a high level, there are a few things you absolutely need in place.
First, you need access to the right body of literature. If you’re not comprehensively pulling in the peer-reviewed record for your indication, everything downstream is already compromised.
Second, you need a way to extract the clinical, genomic, and outcome data at scale. That’s where fit-for-purpose AI comes in - structured extraction, normalization, de-duplication. You can’t do this manually once you’re into thousands of papers.
Third, you have to be confident you’re doing it well. That means validation, QA, and expert curation. If you’re going to publish the results, submit them in a regulatory package, or make a billion-dollar decision on a program, you don’t want a black box. You want curated evidence you can actually stand behind.
Finally, the output has to be transparent and auditable: data with full provenance, that’s inspectable, reusable, and genuinely fit for regulatory dialogue.
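As a rough illustration of the normalization, de-duplication, and provenance steps described above, here is a minimal Python sketch. It is not Genomenon's pipeline: the `ExtractedCase` and `CuratedPatient` schemas, the `normalize` and `deduplicate` helpers, and the exact-match rule are all assumptions made up for this example. A real workflow would likely rely on fuzzier matching plus expert review.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ExtractedCase:
    """One patient description pulled from one paper (hypothetical schema)."""
    source_id: str   # identifier of the paper the case came from
    gene: str
    variant: str
    sex: str
    onset_age: int


@dataclass
class CuratedPatient:
    """A de-duplicated patient plus every source that reported them."""
    gene: str
    variant: str
    sex: str
    onset_age: int
    sources: list = field(default_factory=list)  # provenance trail


def normalize(case):
    """Build a comparison key from lightly cleaned fields.

    Gene and sex are upper-cased; whitespace is stripped from the variant,
    whose letter case is kept because variant notation is case-sensitive.
    """
    return (
        case.gene.strip().upper(),
        "".join(case.variant.split()),
        case.sex.strip().upper(),
        case.onset_age,
    )


def deduplicate(cases):
    """Merge cases that share a normalized key, keeping all source IDs.

    Assumption: an exact key match means the same patient. That is a
    deliberately naive rule chosen to keep the sketch short.
    """
    patients = {}
    for case in cases:
        key = normalize(case)
        if key not in patients:
            patients[key] = CuratedPatient(*key)
        if case.source_id not in patients[key].sources:
            patients[key].sources.append(case.source_id)
    return list(patients.values())


if __name__ == "__main__":
    # Made-up records; the variant strings and paper IDs are placeholders.
    demo = [
        ExtractedCase("paper-A", "PRKAG2", "c.100A>G", "F", 12),
        ExtractedCase("paper-B", " prkag2", "c.100A>G ", "f", 12),
        ExtractedCase("paper-B", "PRKAG2", "c.200C>T", "M", 30),
    ]
    for patient in deduplicate(demo):
        print(patient)
```

Keeping the list of source identifiers on each merged record is what makes the output auditable: any patient in the final dataset can be traced back to the papers that described them.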
Q: How fast can you generate a decision-grade patient or variant landscape?
Mark: With a mature pipeline like we’ve built at Genomenon, you’re talking weeks, not years.
We’ve already done the hard work of wiring up fit-for-purpose AI with expert curators, so when a pharma team comes to us with a clear question, we can very quickly turn the literature into a decision-grade patient dataset.
That means instead of spending months trying to chase down scattered evidence, you get a structured, auditable view of patients, variants, phenotypes, treatments, and outcomes quickly enough to actually influence trial design, indication strategy, and other high-stakes decisions.
Q: Is it possible to expand eligibility without sacrificing precision in who responds?
Mark: Yes - and literature is one of the best ways to do it.
When you systematically synthesize published cases, you start to see patterns that aren’t obvious in a narrow trial protocol or a single data source. On the genomic side, you can often move beyond a tiny set of “canonical” variants to include VUS and non-classic variants that clearly behave like the known pathogenic ones. On the clinical side, you see the full phenotypic spectrum - atypical presentations, comorbidities, age ranges - which lets you broaden inclusion criteria without just guessing.
And it doesn’t stop at trial eligibility. The same evidence can point to new subgroups or adjacent indications where a drug might work - supporting label expansion, drug repurposing, or off-label use in a much more disciplined way. In that sense, the literature is a distilled summary of everything we know up to this point, and when you aggregate it carefully, it lets you expand thoughtfully, not indiscriminately.
Q: Can you share a few examples where literature-derived RWE has made a meaningful difference for a program?
Mark: A few good examples come to mind.
For a genetically defined AML subset, an EHR dataset with linked genomics initially surfaced only a couple of dozen relevant patients - far too few for the sponsor’s planned analyses. Using literature-derived RWE, we identified >1,000 unique patients meeting pre-specified criteria, with detailed characterization of genotype, disease status, prognosis, and treatment response. That depth and scale directly informed how they thought about stratification and which subgroups to prioritize.
In PRKAG2 Syndrome, prior estimates suggested ~200 published patients. A systematic literature-derived RWE approach instead identified 548 unique patients, including demographics, phenotypes, genotype, interventions, adverse events, and - importantly - longitudinal follow-up for about 18%. That gave the team a much clearer picture of natural history and realistic endpoints and follow-up windows.
For pediatric ABCC6 Deficiency, there hadn’t been a focused effort to describe the pediatric population, despite its severity. By curating the literature, we identified 76 pediatric patients (plus 19 from prospective natural history cohorts), revealing a heterogeneous phenotype with a high burden of calcification and multi-organ complications, often in the first year of life. That evidence helped sharpen thinking around pediatric inclusion, disease burden, and how to frame the indication in regulatory dialogue.
We actually walk through several of these use cases in a recent white paper on literature-derived RWE.
Bringing literature-derived RWE into your next program
At Genomenon, we’ve built our platform and workflows around making literature-derived RWE practical: comprehensive ingestion of the peer-reviewed record, fit-for-purpose AI to extract and normalize the data, and expert curation to ensure the output is transparent, auditable, and ready for high-stakes decisions.
If you’re looking at how to bring the literature upstream in your own programs - whether for indication sizing, natural history, eligibility, endpoints, or regulatory strategy - we’d be happy to talk through where literature-derived RWE could fit in your workflow.
Contact us to connect with our team or to learn more about how we support pharma partners with literature-derived RWE.





