
The way most digital therapeutics teams think about evidence is fundamentally backward.
They start with “We need an RCT for FDA and payers” and only later realize they have no ongoing evidence pipeline, no real-world data strategy, and no scalable way to answer the dozen next questions that come after approval. That is how you burn cash and still fail to convince skeptical clinicians and health plans.
Let me walk you through how to design an actual evidence pipeline for digital therapeutics (DTx) – not a one-off trial, not a slide deck, but a system.
The context: you are post‑residency, probably in or adjacent to a medical startup, looking at job market realities. You speak both clinic and product. Good. You are exactly the person who should own this.
1. Start With the Destination: What Evidence Do You Actually Need?
Digital therapeutics live or die on three types of evidence:
- Regulatory sufficiency
- Clinical credibility
- Economic defensibility
If you do not design for all three from day one, you will pay for it later.
| Phase | Relative evidence burden (illustrative) |
|---|---|
| Early development | 40 |
| Pre-market | 80 |
| Post-market | 100 |
Regulatory: what the FDA and similar bodies actually care about
For DTx, especially under the Software as a Medical Device (SaMD) umbrella, regulators want:
- Clear intended use and claims
- Defined clinical endpoints (not vanity metrics)
- Risk assessment and benefit–risk profile
- Software quality, cybersecurity, human factors
What they do not need: 10 different exploratory endpoints and machine-learning “insights” with no prespecified analysis plan.
So before you touch “real-world evidence” (RWE), write down:
- The exact core claim you want to make (e.g., “reduces A1c by ≥1.0% at 6 months vs usual care in adults with type 2 diabetes using insulin”).
- The primary endpoint that supports it.
- The patient population that matches your label.
Everything else in the pipeline either serves that, or it is research tourism.
Clinical credibility: will clinicians actually believe and use this?
Most DTx founders underestimate how conservative real clinicians are.
From the clinician’s perspective, they are asking:
- Does this work in my patients, not just cherry-picked trial participants?
- Does it integrate with my workflow, or does it become another inbox?
- What happens when the patient stops using it after 3 weeks?
- Is this one more shiny app or something with serious, durable data?
Clinicians trust:
- Well-powered RCTs in recognized journals
- Pragmatic trials embedded in real clinics
- Registry or health system data showing sustained effect
- Transparent safety signals and dropout patterns
They do not trust:
- Single-arm case series with 30 highly motivated volunteers
- Uncontrolled “before–after” comparisons without context
- Internal company slide decks that never see peer review
Economic defensibility: the payer/health system view
Your DTx is competing for budget against:
- GLP-1 agonists
- Care management programs
- Telehealth bundles
- Other digital point solutions
Payers and large employers will ask:
- Does this reduce total cost of care or hard clinical events?
- Can you risk-share or do outcomes-based contracts?
- What is the time to ROI? Year 1? Year 3? Never?
This is where RWE moves from “nice academic add-on” to “non-negotiable”. You need:
- Comparative data vs current standard (not placebo).
- Cost offsets: admissions, ED visits, high-cost meds, procedures.
- Subgroup analyses: which members generate ROI and which do not.
So the destination is not “have an RCT and some nice utilization data”. The destination is:
- One or two pivotal datasets for your core claims.
- An ongoing real-world evidence engine that feeds clinical and economic stories to regulators, clinicians, and payers continuously.
2. Build an Evidence Pipeline, Not a Single Trial
Most DTx teams run one inaugural RCT like it is a biotech Phase 3, then scramble later. Wrong model.
For software, your evidence strategy must look like a pipeline:
- Continuous
- Iterative
- Integrated with product and deployment
Think in stages.
| Pipeline stage | Focus |
|---|---|
| 1 | Prototype & feasibility |
| 2 | Single-site pilot |
| 3 | Pragmatic RCT or hybrid effectiveness |
| 4 | Post-market registry & RWD |
| 5 | Continuous learning & label expansion |
Stage 0: Measurement architecture (before any “study”)
If you are at “post-residency, joining a startup” stage and this is not done yet, this is your first battle.
You need:
A data schema that captures:
- Patient demographics and risk factors
- Clinical endpoints (e.g., A1c, BP, PHQ-9)
- Usage metrics (daily active users, feature engagement)
- Adherence and dropout timestamps
- Adverse events and safety signals
Clear data provenance: EHR, claims, PROs, device sensors, app logs.
Infrastructure for linkage: patient-level linking between app data and clinical/claims data (with proper HIPAA / GDPR compliance, PHI/PII separation, etc.).
If you cannot reconstruct a clean cohort with exposure, outcome, covariates, and time, you do not have an evidence pipeline; you have a marketing analytics setup.
| Domain | Examples |
|---|---|
| Clinical | A1c, BP, FEV1, PHQ-9, NYHA class |
| Utilization | Admissions, ED visits, office visits |
| Economic | Allowed amounts, PMPM cost, drug spend |
| Behavioral | Logins, module completion, sensor data |
| Safety | AEs, SAEs, symptom worsening |
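To make the bar concrete, here is a minimal sketch, in Python with hypothetical field names, of the patient-level, time-stamped shape Stage 0 should produce. It is illustrative only; the real schema depends on your indication and data sources.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

# Hypothetical minimal schema; field names are illustrative, not a standard.

@dataclass
class Patient:
    patient_id: str                    # pseudonymized ID used for EHR/claims linkage
    birth_year: int
    sex: str
    index_date: date                   # onboarding / exposure start
    baseline_a1c: Optional[float] = None
    baseline_phq9: Optional[int] = None
    comorbidities: list = field(default_factory=list)   # e.g., ICD-10 codes

@dataclass
class Event:
    patient_id: str
    event_date: date
    event_type: str                    # "module_completed", "a1c_result", "admission", "adverse_event", ...
    value: Optional[float] = None      # lab value, PRO score, cost, where applicable
    source: str = "app"                # "app", "ehr", "claims", "pro", "device"

# If exposure, outcomes, covariates, and time all live in structures like these,
# every later cohort or registry analysis is a query, not a re-engineering project.
```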
Stage 1: Feasibility and single-site pilots
Objective: de-risk usability and basic signal, not prove everything.
Design:
- Small single-arm or simple controlled pilots.
- Focus on:
- Engagement curves
- Dropout reasons
- Data quality and completeness
- Basic clinical signal (direction, magnitude)
This is where you fix:
- Onboarding sequences that lose 40% of patients.
- Clinician workflows that generate resentment.
- Data gaps (e.g., no follow-up A1c in the EHR for half the cohort).
Do not over-interpret clinical effect here. Use it to refine the real trial.
Stage 2: Pragmatic RCT or hybrid effectiveness trials
This is your anchor evidence. You need:
- Clear inclusion / exclusion aligned to intended use.
- Hybrid designs when reasonable: part efficacy, part pragmatic.
- Real-world comparators: usual care, care management, etc.
- Semi-automated data capture (EHR integration, PROs via app).
If you design it “too clean” (single academic medical center, hyper-selected patients, heavily supported), you will get:
- A beautiful NEJM paper
- And a real-world effect size half as large and twice as messy
So push for:
- Multi-site, community-inclusive recruiting
- Minimal extra clinic workflow burden
- Outcomes that are routinely captured in EHRs or claims
Stage 3: Post-market RWE engine
After first approvals or first major contracts, the question shifts:
- Does the effect persist outside trial conditions?
- Does adherence decay kill value?
- What is the real ROI across plans and geographies?
This is where you need:
- Prospective registries of all deployed patients (or pragmatic cohorts by site/plan).
- Observational comparative effectiveness studies using external or internal controls.
- Pre-specified safety and performance monitoring rules.
This is not “maybe we will do a registry if we have time”. This is the core of a serious DTx company.
3. Real-World Evidence: Choosing the Right Design for Digital Therapeutics
Real-world evidence is not one thing. For DTx, specific designs make sense; others are traps.
| RWE design | Relative strength of evidence (illustrative) |
|---|---|
| Single-arm pre/post | 40 |
| Matched cohort | 80 |
| Pragmatic cluster trial | 90 |
| Registry with external controls | 85 |
Single-arm pre/post: the seductive but weak option
Everyone starts here: “We’ll look at PHQ-9 before and after 3 months of app use.”
It is quick. It is cheap. And it is riddled with:
- Regression to the mean
- Secular trends
- Hawthorne effect
- Selection bias (only motivated users stick around for follow-up)
Use it:
- For early signal.
- For product iteration.
Do not base reimbursement or big clinical claims on this alone.
Matched cohort studies: the RWE workhorse
Better: compare users vs non-users or high-users vs low-users, matched on key characteristics.
Sources:
- EHR: patients referred to DTx vs similar patients not offered it.
- Claims: members enrolled in DTx benefit vs similar members without.
Methods:
- Propensity score matching
- Inverse probability weighting
- Regression adjustment
Key pitfalls I have seen:
- Matching only on demographics, not disease severity or prior utilization.
- Index date problems (e.g., starting follow-up at different clinical stages in the two groups).
- “Immortal time bias” where one group has to survive event-free to qualify for exposure.
If your data science team cannot explain how they handled these biases in plain English, your RWE will not survive payer scrutiny.
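To make the workhorse concrete, here is a minimal sketch of inverse probability weighting in Python (pandas plus scikit-learn), over a hypothetical analytic table where exposure, covariates, and outcome are already assembled against a common index date. It shows the shape of the analysis, not a validated template; cohort construction still has to handle the pitfalls above.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical analytic table: one row per patient, covariates measured before
# a common index date, outcome measured after it. Numbers are toy values.
df = pd.DataFrame({
    "dtx_user":     [1, 1, 0, 0, 1, 0, 0, 1],
    "age":          [54, 61, 58, 47, 66, 52, 59, 63],
    "baseline_a1c": [8.9, 9.4, 8.7, 9.1, 10.2, 8.5, 9.0, 9.8],
    "prior_admits": [0, 1, 0, 2, 1, 0, 1, 0],
    "a1c_at_6mo":   [8.1, 8.3, 8.6, 9.0, 9.1, 8.4, 8.8, 8.7],
})
covariates = ["age", "baseline_a1c", "prior_admits"]

# 1. Propensity model: probability of DTx exposure given baseline covariates.
ps_model = LogisticRegression(max_iter=1000).fit(df[covariates], df["dtx_user"])
df["ps"] = ps_model.predict_proba(df[covariates])[:, 1]

# 2. Inverse probability of treatment weights.
df["iptw"] = df["dtx_user"] / df["ps"] + (1 - df["dtx_user"]) / (1 - df["ps"])

# 3. Weighted outcome comparison between exposed and unexposed (toy estimate).
weighted_means = (
    df.assign(weighted_outcome=df["a1c_at_6mo"] * df["iptw"])
      .groupby("dtx_user")
      .apply(lambda g: g["weighted_outcome"].sum() / g["iptw"].sum())
)
print("IPTW-weighted mean A1c at 6 months, by exposure:\n", weighted_means)
```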
Pragmatic and cluster-randomized trials in the wild
For DTx, cluster trials make a lot of sense:
- Randomize by clinic, system, or payer region.
- Some clusters get usual care; others get usual care plus DTx.
Advantages:
- Reduces contamination (patients in same clinic all get similar offer).
- Fits more easily with the reality of digital deployments.
Disadvantages:
- Need more clusters to avoid baseline imbalance.
- Requires buy-in from leadership and IT at the cluster level.
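On the "more clusters" point, the design-effect arithmetic is worth having at your fingertips. A quick sketch with purely illustrative assumptions:

```python
# Design effect for a cluster-randomized trial: DE = 1 + (m - 1) * ICC,
# where m is the average cluster size and ICC the intracluster correlation.
# All numbers below are illustrative assumptions, not from any specific trial.

n_individual = 600    # patients needed if you could randomize individuals
m = 50                # average patients enrolled per clinic
icc = 0.03            # assumed intracluster correlation for the outcome

design_effect = 1 + (m - 1) * icc            # 2.47
n_cluster_trial = n_individual * design_effect
n_clinics = n_cluster_trial / m

print(f"Design effect: {design_effect:.2f}")
print(f"Patients needed under clustering: {n_cluster_trial:.0f}")   # ~1482
print(f"Clinics needed across both arms: {n_clinics:.0f}")          # ~30
```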
Hybrid designs (effectiveness plus implementation) are gold here. You answer:
- Does it work?
- Can we implement it across dozens of clinics?
- What are the real adoption barriers?
Registries and continuous RWD
A good DTx registry is not a dusty REDCap project. It is:
- Automatically populated via app + EHR/claims feeds.
- Designed with a minimal core dataset so completeness stays high.
- Governed with clear rules on analysis, transparency, and publication.
Common mistake: 200 data elements that no one can fill consistently. You want 15–25 core variables that are always there, and maybe 10–20 optional ones.
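A small sketch of what "a minimal core dataset so completeness stays high" looks like operationally: monitor the core variables on every registry refresh and treat a shortfall as an operational incident, not a statistics footnote. Variable names below are hypothetical.

```python
import pandas as pd

# Hypothetical core registry variables; optional elements deliberately excluded.
CORE_VARIABLES = [
    "patient_id", "index_date", "age", "sex",
    "baseline_severity", "exposure_weeks", "primary_outcome",
]

def core_completeness(registry: pd.DataFrame, threshold: float = 0.95) -> pd.Series:
    """Fraction of non-missing values per core variable, flagging any shortfall."""
    completeness = registry.reindex(columns=CORE_VARIABLES).notna().mean()
    for variable, fraction in completeness.items():
        if fraction < threshold:
            print(f"WARNING: {variable} is only {fraction:.0%} complete "
                  f"(target {threshold:.0%})")
    return completeness
```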
4. Designing a Data Architecture That Serves RWE (Not Just Product Analytics)
Most engineering teams will happily build Mixpanel dashboards of clicks and sessions. That is not an evidence pipeline.
You, as the medically trained person, must push for a clinically oriented data architecture.

Core requirements:
Patient-level longitudinal data
- Unique identifiers with safe, compliant linkage across sources.
- Time-stamped events: onboarding, exposure, outcomes, AEs.
Clear exposure definitions
- What counts as “treated”?
- Thresholds for adherence (e.g., modules completed, minutes per week).
- Time windows (e.g., which exposure window counts toward which follow-up period).
Outcome standardization
- Use standard coding where possible: ICD-10, SNOMED, LOINC.
- Map all site-specific quirks into a coherent layer.
Cohort and feature store
- Pre-built cohorts for main indications (e.g., T2D, MDD).
- Reusable feature definitions (e.g., “prior 12-month total cost”, “baseline PHQ-9 severity”).
If you build this right, each new RWE question becomes:
- “Define cohort, run pre-specified analysis template, interpret.”
Not:
- “Three-month project to clean and stitch together a new dataset, hope the analyst didn’t leave.”
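To show what "define cohort, run pre-specified analysis template" can look like in code, here is a minimal sketch in Python with hypothetical table and column names. The specific criteria are illustrative; what matters is that cohort, exposure, and feature definitions are written once, named, and reused.

```python
import pandas as pd

# Hypothetical patient-level, event, and claims tables; column names are illustrative.

def cohort_t2d_on_insulin(patients: pd.DataFrame) -> pd.DataFrame:
    """Adults with type 2 diabetes on insulin at index (illustrative criteria)."""
    return patients[
        (patients["age"] >= 18)
        & patients["dx_codes"].apply(lambda codes: "E11" in codes)  # ICD-10 T2D
        & patients["on_insulin_at_index"]
    ]

def exposed_at_threshold(events: pd.DataFrame, min_modules: int = 4) -> pd.Series:
    """Patient IDs meeting a pre-specified adherence threshold (illustrative)."""
    modules = events[events["event_type"] == "module_completed"]
    counts = modules.groupby("patient_id").size()
    return counts[counts >= min_modules].index.to_series()

def feature_prior_12mo_cost(patients: pd.DataFrame, claims: pd.DataFrame) -> pd.Series:
    """Total allowed amount in the 12 months before each patient's index date."""
    merged = claims.merge(patients[["patient_id", "index_date"]], on="patient_id")
    in_window = merged[
        (merged["service_date"] < merged["index_date"])
        & (merged["service_date"] >= merged["index_date"] - pd.DateOffset(months=12))
    ]
    return in_window.groupby("patient_id")["allowed_amount"].sum()
```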
5. Regulatory Strategy: Where RWE Fits for DTx
Regulators have become much more open to real-world evidence, but they are not naive. You must separate:
- Evidence for initial market authorization
- Evidence for label expansion or post-market commitments
| Product phase | Relative reliance on RCT evidence (illustrative) |
|---|---|
| Pre-market | 80 |
| Early post-market | 50 |
| Mature product | 30 |
Initial authorization: RCTs plus supportive RWE
For first-in-class or higher risk DTx, you are unlikely to get away without at least one solid RCT. RWE here usually:
- Supports safety profile in broader populations.
- Demonstrates consistency of use or feasibility.
- Provides contextual data on standard of care outcomes.
Smart play:
- Run an RCT designed so that key endpoints mirror what you can later collect from RWD (e.g., same definitions of hospitalization, same lab thresholds).
- In parallel, start an early registry or observational cohort, so you are not starting from zero post‑approval.
Label expansion and iterative updates
DTx changes. Software versions update, indications expand. Here, RWE can be pivotal:
- New populations (e.g., adolescents, different comorbidities).
- Adjustments to dosing/usage recommendations.
- Additional clinical claims (e.g., impact on comorbidity X).
You need pre-planned:
- Versioning: which software build is associated with which dataset?
- Bridging analyses: are outcomes comparable across versions?
- Risk management plans: how do you monitor new failure modes?
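A minimal sketch of what "tag users by build, then bridge" can mean analytically (version labels, outcome names, and numbers below are all hypothetical):

```python
import pandas as pd

# Hypothetical outcomes table where every record carries the software build
# the patient was actually exposed to.
outcomes = pd.DataFrame({
    "patient_id":  ["p1", "p2", "p3", "p4", "p5", "p6", "p7", "p8"],
    "app_version": ["2.1", "2.1", "2.1", "2.1", "3.0", "3.0", "3.0", "3.0"],
    "responder":   [1, 0, 1, 1, 1, 1, 0, 1],   # e.g., met the primary endpoint
})

# Simplest form of a bridging analysis: compare outcome rates across versions,
# with the definition of "comparable" pre-specified before anyone looks.
by_version = outcomes.groupby("app_version")["responder"].agg(["mean", "count"])
print(by_version)
```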
FDA and other regulators will listen if:
- Your RWE data sources are stable and well-described.
- Your analytic plans are registered / pre-specified.
- You show sensitivity analyses and do not cherry-pick.
6. Payer-Facing RWE: Turning Data into Contracts
I have seen more good DTx products fail in payer conversations than in FDA conversations. Why? Because they bring the wrong evidence.
Payers care less about:
- Fancy engagement metrics
- App store ratings
- Even some intermediate clinical endpoints
They care more about:
- Admissions avoided
- ED visits avoided
- Cost per member per month (PMPM)
- High-cost drug utilization
- Return-on-investment timelines

You need a payer-grade RWE package that has:
- At least one robust comparative analysis vs standard of care.
- Subgroup results showing where the effect is strongest (so they can target).
- Economic modeling with transparent assumptions.
- A clear framing for an outcomes-based contract.
Typical studies:
- Retrospective cohort in claims data: enrollees using DTx vs matched controls.
- Prospective implementation with pre-agreed metrics: 12-month follow-up, shared analytics.
| Domain | Example Metric |
|---|---|
| Utilization | All-cause admissions |
| Utilization | ED visits per 1000 members |
| Economic | PMPM total cost |
| Clinical | A1c reduction ≥1%, PHQ-9 remission |
| Engagement | Active users at 90 days |
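As a sketch of how these metrics fall out of a claims extract (hypothetical columns, toy numbers), annualized utilization and PMPM cost can be computed along these lines:

```python
import pandas as pd

# Hypothetical member-level summary over a 12-month measurement period.
members = pd.DataFrame({
    "member_id":     ["m1", "m2", "m3", "m4", "m5", "m6"],
    "dtx_enrolled":  [1, 1, 1, 0, 0, 0],
    "admissions":    [0, 1, 0, 1, 0, 2],
    "ed_visits":     [1, 0, 1, 2, 1, 3],
    "allowed_cost":  [4200.0, 9800.0, 5100.0, 11200.0, 3900.0, 18700.0],
    "member_months": [12, 12, 10, 12, 11, 12],
})

summary = members.groupby("dtx_enrolled").apply(
    lambda g: pd.Series({
        # utilization annualized per 1,000 member-years
        "admits_per_1000":    12_000 * g["admissions"].sum() / g["member_months"].sum(),
        "ed_visits_per_1000": 12_000 * g["ed_visits"].sum() / g["member_months"].sum(),
        # total allowed cost per member per month
        "pmpm_cost":          g["allowed_cost"].sum() / g["member_months"].sum(),
    })
)
print(summary.round(1))
```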
If you walk into a payer meeting with only trial data and no real-world utilization or cost story, you will get “We’ll monitor and see in a few years.” Translation: no.
7. Operationalizing the Evidence Pipeline: Roles, Governance, and Culture
Evidence pipelines fail less from bad statistics and more from organizational dysfunction.
You need:
- A clear owner: usually a Head of Clinical Evidence / VP Medical Affairs with enough clout.
- Cross-functional team: clinical, biostatistics, data engineering, product, regulatory, payer strategy.
- Governance: how you decide what to study, how to publish, how to handle negative results.
A typical governance chain, from sponsorship to execution: CEO → Medical Evidence Committee → CPO and CMO → Clinical Research, Data Science, Regulatory, and Payer Strategy.
Culture rules that matter:
- Pre-specify major analyses. No p-hacking your way to significance after the fact.
- Commit to publishing or sharing key results, even when they are not flattering.
- Avoid “zombie studies” – analyses that get started and never finished because they lack a decision owner.
For someone post-residency working in a DTx startup, your leverage is here:
- You can translate between clinician expectations and data realities.
- You can smell clinically meaningless endpoints.
- You can push back when a product manager wants to measure “daily opens” instead of symptom change.
8. Common Failure Modes – And How to Avoid Them
I will be blunt. These are patterns I keep seeing, and they will kill your DTx evidence strategy.
| Failure mode | Rough share of failures (%) |
|---|---|
| Poor data architecture | 35 |
| Weak study design | 30 |
| No payer focus | 20 |
| Regulatory misalignment | 15 |
No clean denominator
- You cannot say “X% improved” if you do not know how many patients were eligible and started.
- Fix: rigorous onboarding tracking and cohort definition.
Overreliance on self-reported outcomes with no objective anchor
- PHQ-9 and GAD-7 are valuable. But by themselves, with no utilization or independent clinical confirmation, they are fragile for payers.
- Fix: combine PROs with EHR/claims outcomes where possible.
Disappearing denominator over time
- Only the most engaged patients provide follow-up data, which biases your effect estimate upward.
- Fix: analyze intent-to-treat-style cohorts, handle missingness explicitly, and report dropout patterns.
Version chaos
- You change product features weekly and then claim “clinical effect over 12 months” without version control.
- Fix: lock critical therapeutic elements per major version and tag users by build.
Engineering-driven metrics
- Measuring what is easy (clicks, screens) instead of what is clinically or economically meaningful.
- Fix: align measurement with endpoints clinicians and payers care about. Yes, even if harder.
Evidence divorced from go-to-market
- Clinical team produces beautiful RWE that sales and payer teams never use effectively.
- Fix: build payer- and provider-ready evidence playbooks with curated, interpretable numbers and stories.
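The denominator problems in particular have a mechanical fix worth sketching: track every patient from eligibility onward and report the funnel alongside any effect estimate. Counts below are purely illustrative.

```python
# Hypothetical patient funnel for one deployment; every downstream claim about
# "% improved" should be read against these denominators.
funnel = {
    "eligible":           1000,
    "offered_dtx":         720,
    "onboarded":           540,
    "active_at_90_days":   310,
    "outcome_available":   260,
}

eligible = funnel["eligible"]
previous = None
for stage, count in funnel.items():
    of_eligible = 100 * count / eligible
    of_prior = "" if previous is None else f", {100 * count / previous:.0f}% of prior stage"
    print(f"{stage:<20} {count:>5}  ({of_eligible:.0f}% of eligible{of_prior})")
    previous = count
```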
9. How This Looks in Practice: A Concrete Example
Let me sketch a realistic evidence pipeline for a hypothetical DTx for COPD self-management:
Core claim: Reduces COPD-related hospitalizations over 12 months when added to usual care.
Stepwise plan:
Stage 0 – Architecture
- Decide to link: app data + EHR (exacerbations, spirometry) + claims (admissions, ED).
- Standardize COPD severity, smoking status, comorbidities.
Stage 1 – Single-site pilot (n≈100)
- Outcomes: engagement, feasibility, self-reported exacerbation management, data capture completeness.
- Fix onboarding flows, education content, alert fatigue.
Stage 2 – Pragmatic cluster RCT (10 clinics vs 10 clinics)
- Clinics randomized to DTx+usual care vs usual care.
- Primary endpoint: COPD-related admissions at 12 months from index.
- Data via EHR + claims, minimal extra clinic work.
- Pre-specified subgroups: GOLD stage, prior-year admissions.
Stage 3 – Post-market registry
- All deployments automatically enrolled into a registry layer (opt-in / consent as required).
- Quarterly safety and effect monitoring.
- Yearly payer-partnered RWE analysis, matching non-users at health system or plan level.
Payer package
- One major RCT result, plus:
  - 18-month real-world reduction in admissions (matched analysis).
  - PMPM savings, stratified by baseline risk.
  - Proposed outcomes-based arrangement: fees tied to an admissions-reduction threshold.
This is a pipeline. Not a one-off trial. It keeps producing evidence as product and context evolve.
FAQs
1. Do all digital therapeutics need a randomized controlled trial, or can RWE alone be enough?
No, not all require an RCT, but for most higher-risk or first-in-class DTx claiming meaningful clinical benefit, one well-designed RCT is strategically smart. RWE alone may be adequate for lower-risk adjunctive products with modest claims, or for incremental indications on top of an existing evidence base. The real question is: what level of uncertainty will regulators, clinicians, and payers tolerate for your specific indication and risk profile? For anything involving hard outcomes (hospitalizations, disease progression), expect to need at least one solid randomized dataset.
2. How early in product development should I start building an RWE strategy?
Before you sign your first health system or payer contract. Ideally earlier. The data architecture that enables RWE – patient identifiers, time stamps, linkage to EHR/claims, PRO capture – must be baked into the product from the beginning. Retrofitting it later is painful and often incomplete. At minimum, during your first real-world pilot you should already be capturing data in a way that can scale into a registry.
3. What is the minimum dataset I need to support serious RWE analyses?
You need four pillars: (1) reliable patient identifiers for linkage; (2) exposure data (who used what, when, and how much); (3) outcome data that matter clinically or economically (labs, PROs, utilization, costs); and (4) covariates for confounding control (age, sex, comorbidities, baseline severity, prior utilization). If one of these is missing or severely incomplete, your ability to generate credible RWE drops sharply.
4. How do I handle constant software updates when trying to build long-term evidence?
Version control and “therapeutic core” definitions. Freeze the elements that are directly tied to your clinical claims (e.g., algorithm logic, core therapeutic content) per major version, and annotate all user data with version IDs. Then design analyses to either (a) focus on a specific version window, or (b) use bridging analyses to show that outcomes under different versions are comparable or improved. Regulators and payers are fine with iteration as long as you treat it transparently and systematically.
5. What is the biggest mistake clinicians make when they move into DTx evidence roles?
They import academic habits without adjusting for speed and business constraints. Endless exploratory analyses, over-complex protocols, or publications that are clinically impressive but commercially irrelevant. In a startup, every study must answer a decision-critical question: regulatory, payer, or product. If you cannot state that decision in one sentence, you do not need the study yet.
6. How do I convince leadership to invest in a real evidence pipeline instead of just a single flagship trial?
You frame it as risk and revenue protection. A one-off trial may get you through first authorization or the first few contracts, but it will not defend your pricing, support label expansion, or survive comparative scrutiny when competitors arrive. Show them a simple map: initial RCT → early real-world deployments → payer evaluations → larger contracts tied to outcomes. Without continuous RWE, you stall at the second arrow. With it, you build compounding proof and a moat. Then attach rough numbers: what a 10–20% improvement in payer win rate or contract size means in revenue terms.
Key takeaways:
- Stop thinking “trial” and start thinking evidence pipeline: continuous, data-architected, tied to product and commercial decisions.
- Build your measurement and data linkage capabilities early; without them, RWE is marketing fluff, not clinical evidence.
- Design RWE that speaks to regulators, clinicians, and payers differently, but from the same underlying, robust data backbone.