HypoGrid

Guide: Hypothesis-driven thinking, end-to-end

A startup is a stack of hypotheses.
Hypothesis-driven thinking is the shared language between founders and investors.

A business plan is not a declaration of correctness. It is a stack of testable claims. Founders retire weak assumptions cheaply; investors evaluate the quality of the learning loop. This guide connects Lean Startup, Customer Development, design thinking, A/B testing, PLG, and VC diligence into a single uncertainty stack: different layers of the same problem, not competing schools.

00 — TL;DR

Three operating principles

  1. Run hypotheses as a layered ledger

    Market → customer → value prop → solution → acquisition → unit economics → retention → expansion → execution. Decompose the top claim into testable sub-claims, then into observable predictions.

  2. Move every hypothesis through the full loop

    Abduction → deduction → observation → induction → Bayesian update → resource reallocation. If you thought it, log it; if you logged it, predict it; if you predicted it, test it; if you tested it, update.

  3. VCs read the update history, not the pitch

    Which hypotheses moved, on what evidence, with which mistakes avoided, and which milestone the next round buys. Sequoia, a16z, NfX, OpenView, YC, First Round — every public playbook lands on this same view.

01 — Theory

Hypothesis testing is one loop with four epistemic moves

Hypothesis testing isn't "ship it and see what happens." It's abduction (build the best partial explanation from incomplete information), deduction (turn that explanation into observable predictions), induction (generalize from multiple observations), and Bayesian update (let new evidence raise or lower your conviction). A hypothesis only becomes testable when it's been converted from a frame into a prediction: if this is true, here is what we should observe.

01 — Abduction

From fragments, build the most plausible "why is this happening?" explanation.

02 — Deduction

Convert the explanation into "if true, then we should observe X."

03 — Observation

Collect quantitative and qualitative evidence — experiments, interviews, logs, signed contracts.

04 — Induction

Pull patterns from multiple observations; weigh how far the generalization travels.

05 — Bayesian update

Raise or lower conviction with new evidence. Move budget and headcount with it (a minimal update sketch follows this loop).

06 — Decision

Pick one explicitly: continue, pivot, or kill. Vagueness here is where startups die slowly.
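
To make step 05 concrete, here is a minimal sketch of a single conviction update via Bayes' rule. The function is textbook; the prior and likelihood numbers are illustrative assumptions, not calibrated values.

```python
def bayes_update(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Posterior P(H | E) via Bayes' rule, from the prior P(H) and the
    likelihood of the observed evidence under H and under not-H."""
    numerator = p_e_given_h * prior
    evidence = numerator + p_e_given_not_h * (1 - prior)
    return numerator / evidence

# Illustrative: start 40% convinced of an ICP hypothesis.
conviction = 0.40
# Evidence: 2 of 5 demos advanced to a paid PoC. Assume (our guess) this
# outcome is three times likelier if the hypothesis is true.
conviction = bayes_update(conviction, p_e_given_h=0.60, p_e_given_not_h=0.20)
print(f"posterior conviction: {conviction:.2f}")  # 0.67
```

The point is not the arithmetic; it is that writing the likelihoods down forces you to say, before the evidence arrives, how surprising each outcome would be.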

Human judgment is biased by default. Tversky & Kahneman's representativeness heuristic and Nickerson's confirmation bias do not pause for startups. Always write the falsification criteria next to the hypothesis itself: what would change your mind, before you go looking for evidence. This is the most load-bearing discipline in hypothesis-driven thinking.

The science of deciding (RCT evidence)

Camuffo et al.'s 2019 RCT (116 Italian startups) and the 2024 follow-up across four RCTs and 759 startups showed: founders who explicitly wrote out hypotheses, required data, and kill criteria up front got better outcomes — and, importantly, better quits. The scientific approach doesn't make founders give up on ideas. It helps them give up at the right moment, and biases pivots toward "fewer, better changes" instead of reckless reinvention.

02 — Hypothesis layers

What VCs actually evaluate: six layers, not one company

"Will this company grow?" is too coarse to act on. In practice, the uncertainty splits into stackable layers — each with its own evidence, its own kill criteria, and its own place in the financing structure.

| Hypothesis layer | Founder-side evidence | VC-side question | Tranche / structure | Trigger clause example |
| --- | --- | --- | --- | --- |
| Market & problem | High-frequency pain, observable workarounds, willingness to budget | Is this problem worth paying for, today? | Initial check; follow-on after validation | Repeat validation across 10 target accounts |
| ICP & GTM | Best-customer profile, loss reasons, repeatability of founder sales | Can you say who you sell to, in concrete terms? | Tied to founder sales continuing | 3+ paid customers in the same segment |
| Solution & retention | Continued usage, cohort curves, churn reasons | Do they keep using it? | Pre-scale spend held back | Retained-cohort improvement releases next tranche |
| Unit economics | CAC, payback, gross margin, churn | Does growth make the loss worse? | Hiring budget gated on payback | Gross margin / payback hits threshold |
| Execution | Speed of testing, learning logs, pivot quality | Does this team learn faster than the market changes? | Heavier board / advisor support | Quarterly learning review delivered on time |
| Capital efficiency | Mapping of milestones to spend | Is the next round's bar clearly defined? | Capital-to-milestones written into the docs | Key KPI hit unlocks next tranche |

03 — Methods compared

Six methods, one stack — they're not rival schools

Design thinking owns problem discovery. Customer Development owns hypothesis exploration. Lean Startup owns minimum-cost learning. Discovery / Validation experiments close the gap between what people say and what they do. A/B testing tightens causal inference. PLG owns repeatable acquisition, activation, retention, and referral. Same uncertainty stack — different layers.

| Method | What it's good at | When to use it | Skills required | Time cost |
| --- | --- | --- | --- | --- |
| Design thinking (IDEO) | Quality of problem discovery, surfacing latent needs | Problem exploration, 0→1 ideation, UX rework | Field observation, interviews, structuring | Medium |
| Customer Development (Steve Blank) | Killing customer / market hypotheses fast | Pre-PMF, deep problem-side validation | Customer dialogue, hypothesis writing, logging | Medium-high |
| Lean Startup / MVP | Lowest-cost learning per question | Early solution, pricing, and onboarding tests | MVP design, instrumentation, iteration | Low-medium |
| Discovery / Validation (Strategyzer) | Closing the say–do gap, staged confidence | When you need to manage hypothesis confidence in tiers | Experiment design, threshold-setting | Medium |
| A/B testing | Tight causal inference at scale | Live activation, retention, and pricing flows | Stats, instrumentation quality, SRM checks (sketched below) | Medium-high |
| PLG / growth loops | Repeatable acquisition, activation, expansion | Self-serve SaaS, AI tools, B2B-light | Funnel design, onboarding, analytics | Medium |
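
The SRM check called out in the A/B row is mechanical enough to sketch. This is the standard chi-square goodness-of-fit test of observed assignment counts against the intended split; the counts and the strict alpha are illustrative, and the helper is ours, not a HypoGrid API.

```python
from scipy.stats import chisquare

def srm_check(observed_counts, expected_ratios, alpha=0.001):
    """Sample Ratio Mismatch check: compare observed assignment counts to the
    intended split. A tiny p-value means randomization or logging is broken,
    so pause the experiment instead of reading the metrics."""
    total = sum(observed_counts)
    expected = [ratio * total for ratio in expected_ratios]
    _, p_value = chisquare(observed_counts, f_exp=expected)
    return p_value, p_value < alpha

# Illustrative: a 50/50 test that logged 50,000 vs. 48,500 users.
p, mismatch = srm_check([50_000, 48_500], [0.5, 0.5])
print(f"p = {p:.2g}, SRM detected: {mismatch}")  # p ≈ 2e-06, True
```

A 1.5% imbalance looks harmless, which is exactly why the check has to be run rather than eyeballed.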

04 — VC lens

VCs don't ask "will it grow." They ask "in what order does the uncertainty resolve."

When you read the public playbooks side by side, top-tier VCs aren't looking at the same hypothesis as the founder — they're each emphasizing a different layer of the stack. The composite below is built from public writing, not internal IC memos.

Sequoia

Customer pull as the strongest PMF signal; sharpness of the founder's articulation of the problem; quality of the team and its obsession with the details. PMF treated as a graduated state, not a binary.

Andreessen Horowitz (a16z)

Retention as the dominant PMF read. LTV, CAC, churn, paid CAC, and cohort retention as the diagnostic stack. ICP is defined narrowly: company size, industry, geography, role, tech stack, problem solved, and reasons for loss / churn.

NfX

Founder-market fit decomposed into obsession, founder story, personality, and experience. At Series A, looks for traction, PMF, minimum scale, and unit-economics evidence — in that order.

OpenView

CAC payback as the central read on GTM efficiency. Argues against worshipping LTV:CAC alone. Pairs payback with the magic number and gross retention as the real SaaS lens.

Y Combinator

Problem-first, no-hype, founder honesty. Distinguishes "real PMF" from "fake PMF." Wary of locking down ICP too early, before founder sales has surfaced who actually pulls.

First Round

Treats the PMF survey ("very disappointed if you couldn't use this," 40% threshold) as a leading indicator — explicitly a supporting metric to be read alongside retention and cohort behavior, not in place of them.

05 — Templates

Put the hypothesis ledger at the center

The fastest way for a founder and an investor to look at the same map is a single hypothesis ledger: claim, rationale, kill criteria, test plan, update history, owner, decision, and investment implication — all on one row. HypoGrid is a runtime built directly on top of that ledger model.
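
As a sketch of what a single ledger row might carry, here is an illustrative Python dict built from the worked example later in this section. The field names and the H-ID format are our assumptions for illustration, not HypoGrid's published schema.

```python
# One illustrative hypothesis-ledger row. Every field name here is an
# assumption for the sketch, not HypoGrid's actual schema.
ledger_row = {
    "h_id": "H-007",
    "layer": "ICP & GTM",
    "claim": ("CS leaders at B2B SaaS companies (200-1,000 employees) feel "
              "response-quality consistency as a stronger pain than AI-summary speed."),
    "rationale": "Loss reasons in founder sales cluster on consistency, not speed.",
    "kill_criteria": ("Calls go great, but nothing moves through data integration, "
                      "internal approval, or paid conversion."),
    "test_plan": {
        "experiment": "10 founder calls, 3 PoC proposals, 3 price points tested",
        "threshold": ">=2 of 5 demos advance to a paid PoC within two weeks",
    },
    "update_history": [
        # date and conviction values are invented for the example
        {"date": "2025-01-15", "evidence": "2 paid PoCs signed",
         "direction": "support", "conviction": 0.67},
    ],
    "owner": "founder-ceo",
    "decision": "continue",
    "investment_implication": "Tranche trigger: 2 paid PoCs + 8-week retention",
}
```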

| Template | Required fields | How to use |
| --- | --- | --- |
| Hypothesis tree | Top hypothesis → sub-hypotheses → observable indicators → kill criteria | Don't say "there's a market." Decompose down to who, in what behavior, demonstrates it. |
| MECE hypothesis map | Market / customer / problem / value / solution / GTM / pricing / retention / org | Surface what's untested; keep coverage from drifting into the parts you already believe. |
| Hypothesis test plan | Hypothesis, prediction, experiment, sample, threshold, deadline, cost, next action | Decide go / pivot / kill before the experiment runs, not after. |
| Experiment design sheet | Primary metric, guardrails, MDE, SRM check, data source, stopping rules | Use for A/B, onboarding rework, anything where instrumentation quality matters (sample-size sketch below). |
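
The MDE field in the experiment design sheet is what pins sample size down before the test runs. Here is a minimal sketch of the standard two-proportion, normal-approximation calculation; the helper and the numbers are illustrative.

```python
import math
from scipy.stats import norm

def sample_size_per_arm(baseline: float, mde: float,
                        alpha: float = 0.05, power: float = 0.8) -> int:
    """Users needed per arm to detect an absolute lift of `mde` on a
    conversion rate, with a two-sided z-test (normal approximation)."""
    p1, p2 = baseline, baseline + mde
    z_alpha = norm.ppf(1 - alpha / 2)   # 1.96 for alpha = 0.05
    z_beta = norm.ppf(power)            # 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / mde ** 2)

# Illustrative: 5% baseline activation, detect a 1-point absolute lift.
print(sample_size_per_arm(baseline=0.05, mde=0.01))  # ~8,156 per arm
```

Running this before the test is what makes the "small sample, claimed win" trap in the pitfalls table visible in advance.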

Hypothesis test plan — worked example

Hypothesis. CS leaders at B2B SaaS companies (200–1,000 employees) feel response-quality consistency as a stronger pain than AI-summary speed.

Prediction. Within two weeks of demo, ≥2 of 5 advance to a paid PoC.

Kill criteria. Calls go great, but nothing moves through data integration, internal approval, or paid conversion.

Experiment. Founder sales: 10 calls, 3 PoC proposals, 3 price points tested.

Metrics. PoC conversion rate, time-to-first-value, weekly active retention, expansion across departments.

Decision rule. If ≥2 paid PoCs land and 8-week retention holds, this becomes a tranche-trigger condition.
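
A decision rule this explicit can be pre-registered as code. A sketch, where the 80% retention floor and the pivot/kill branches are illustrative assumptions rather than part of the worked example:

```python
def tranche_trigger_decision(paid_pocs: int, week8_retention: float,
                             retention_floor: float = 0.80) -> str:
    """Pre-registered go / pivot / kill rule for the test plan above.
    The retention floor and branch logic are illustrative assumptions."""
    if paid_pocs >= 2 and week8_retention >= retention_floor:
        return "continue: tranche-trigger condition met"
    if paid_pocs == 0:
        return "kill: no willingness to pay observed"
    return "pivot: revisit ICP or pricing before re-running the test"

print(tranche_trigger_decision(paid_pocs=2, week8_retention=0.85))
# -> continue: tranche-trigger condition met
```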

06 — Pitfalls & checklist

Five traps that show up in almost every diligence

| Common mistake | Why it happens | Counter |
| --- | --- | --- |
| Jumping to solution | Believing "users say they want it" | Run problem interviews first; write kill criteria before building |
| Confirmation bias | Five friendly calls, all said yes | Tag every signal as support / contradict / unclear |
| Misreading A/B | Small sample, claimed win, no SRM check | Pre-define primary metric, MDE, stopping rules, SRM verification |
| Premature scaling | Hiring and ad spend before retention proves out | Hold fixed costs flat until the PMF gate is cleared |
| Fake ICP | Chasing big logos that don't retain | Compare best customers vs. churned customers monthly |

Joint diligence checklist

| Item | Founder check | VC check |
| --- | --- | --- |
| Depth of problem | Is there an existing alternative and a workaround? | Is the pain budgeted, not just felt? |
| ICP | Can you describe the best customer in concrete terms? | Are loss / churn reasons cleanly categorized? |
| MVP | Is it the smallest thing that produces the learning? | Has feature work outrun the learning goal? |
| Test quality | Were thresholds and kill criteria written before the test? | Is the evidence closer to do than say? |
| Retention | Are cohorts improving? | Are new cohorts deteriorating? |
| Unit economics | Are you watching CAC, payback, and gross margin? | Does growth make the loss structure worse? |
| Execution | Is the learning cycle running on a weekly cadence? | Does this team out-learn the market? |
| Use of capital | Is the bar for the next round explicit? | Is capital-to-milestones written into the round? |

07 — Worked cases

Two short worked examples — built from public playbooks

B2B AI customer-support SaaS

Setup. "AI handles inbound faster" demos brilliantly, but no one can say which KPI it moves, for whom.

Worked answer. Narrow ICP to "CS leaders at SaaS companies with 20k+ inbound tickets / month." Reframe the lead hypothesis from AI accuracy to response-quality consistency and budgeted first-contact resolution. Run 10 founder calls → 3 design partners → 2 paid PoCs. Track weekly active use, FRT, CSAT, and department-level expansion. From the VC side, the round can be structured so that two paid PoCs and 8-week retention unlock the next tranche.

Local-services marketplace

Setup. Demand-side ad spend keeps growing, but supply-side retention is weak.

Worked answer. Lead with supply-side GMV retention. Don't read first-transaction value — read month-1 and month-3 (m1, m3) supplier re-listing and revenue expansion. Demand-side paid acquisition only scales once supplier retention improves and demand-side repeat lifts. From the VC side, hold paid acquisition flat until the cohort plateau is observed.
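
The m1/m3 read above is a small computation. A dependency-free sketch over (supplier, month) listing events; the event data is made up for illustration.

```python
from collections import defaultdict

def relisting_retention(listings, months_out):
    """Share of each monthly supplier cohort still listing `months_out`
    months after their first listing. `listings` is (supplier_id, month)."""
    first_month = {}
    active = defaultdict(set)
    for supplier, month in listings:
        first_month[supplier] = min(month, first_month.get(supplier, month))
        active[month].add(supplier)
    cohorts = defaultdict(list)
    for supplier, m0 in first_month.items():
        cohorts[m0].append(supplier in active[m0 + months_out])
    return {m0: sum(flags) / len(flags) for m0, flags in sorted(cohorts.items())}

# Made-up events: suppliers "a" and "b" join month 0; only "a" re-lists.
events = [("a", 0), ("a", 1), ("b", 0), ("c", 1), ("c", 2)]
print(relisting_retention(events, months_out=1))  # {0: 0.5, 1: 1.0}
```

A flat or falling curve here is the "hold paid acquisition flat" signal; a rising one is what releases demand-side spend.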

Make the hypothesis ledger your runtime.

Everything in this guide — the hypothesis ledger, the test plan, the kill criteria, the update history, the trigger clauses — lives in HypoGrid as JSON ledgers and the Hypothesis Briefs that render from them. Founders run validation; investors run diligence; both can work from the same H-IDs.