Guide — Hypothesis-driven thinking, end-to-end
A business plan is not a declaration of correctness. It is a stack of testable claims. Founders retire weak assumptions cheaply; investors evaluate the quality of the learning loop. This guide connects Lean Startup, Customer Development, design thinking, A/B testing, PLG, and VC diligence into a single uncertainty stack: different layers of the same problem, not competing schools.
00 — TL;DR
The stack: market → customer → value prop → solution → acquisition → unit economics → retention → expansion → execution. Decompose the top claim into testable sub-claims, then into observable predictions.
The loop: abduction → deduction → observation → induction → Bayesian update → resource reallocation. Don't just think it, log it; don't just log it, predict from it; don't just predict it, test it; don't just test it, update.
The narrative: which hypotheses moved, on what evidence, with which mistakes avoided, and which milestone the next round buys. Sequoia, a16z, NfX, OpenView, YC, First Round — every public playbook lands on this same view.
01 — Theory
Hypothesis testing isn't "ship it and see what happens." It's abduction (build the best partial explanation from incomplete information), deduction (turn that explanation into observable predictions), induction (generalize from multiple observations), and Bayesian update (let new evidence raise or lower your conviction). A hypothesis only becomes testable when it's been converted from a frame into a prediction: if this is true, here is what we should observe.
Abduction. From fragments, build the most plausible "why is this happening?" explanation.
Deduction. Convert the explanation into "if true, then we should observe X."
Observation. Collect quantitative and qualitative evidence — experiments, interviews, logs, signed contracts.
Induction. Pull patterns from multiple observations; weigh how far the generalization travels.
Bayesian update. Raise or lower conviction with new evidence. Move budget and headcount with it (a numeric sketch follows this list).
Decision. Pick one explicitly: continue, pivot, or kill. Vagueness here is where startups die slowly.
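To make the update step concrete, here is a minimal sketch in Python, with all numbers illustrative: write down the prior and the two evidence likelihoods before the data arrives, then let Bayes' rule move conviction.

```python
def bayes_update(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Posterior P(H | E) from prior P(H) and the two evidence likelihoods."""
    numerator = p_e_given_h * prior
    return numerator / (numerator + p_e_given_not_h * (1.0 - prior))

# Hypothesis H: "CS leaders budget for first-contact resolution."
# Evidence E: 4 of 10 target accounts already pay for a workaround.
conviction = 0.30                       # prior, set before the calls
conviction = bayes_update(conviction,
                          p_e_given_h=0.70,      # likely if H is true
                          p_e_given_not_h=0.15)  # unlikely otherwise
print(f"posterior conviction: {conviction:.2f}")  # ~0.67
```

The decimals matter less than the ordering: because the likelihoods were fixed before the calls happened, the posterior cannot be quietly re-fit to the outcome.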
Human judgment is biased by default. Tversky & Kahneman's representativeness heuristic and Nickerson's confirmation bias do not pause for startups. Always write the falsification criteria next to the hypothesis itself: what would change your mind, before you go looking for evidence. This is the most load-bearing discipline in hypothesis-driven thinking.
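One way to make that discipline structural, as a sketch (the record shape here is ours, not any particular tool's): a hypothesis object that refuses to exist without its falsification criterion.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Hypothesis:
    claim: str           # what you believe
    prediction: str      # what you should observe if it's true
    kill_criterion: str  # what would change your mind, written first

    def __post_init__(self) -> None:
        # Refuse to register a hypothesis with no way to be wrong.
        if not self.kill_criterion.strip():
            raise ValueError("A hypothesis without a kill criterion is a belief.")

h = Hypothesis(
    claim="SMB CS teams will pay for consistent response quality",
    prediction=">=3 of 10 target accounts budget a paid pilot this quarter",
    kill_criterion="<3 budgeted pilots after 10 qualified conversations",
)
```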
Camuffo et al.'s 2019 RCT (116 Italian startups) and the 2024 follow-up across four RCTs and 759 startups showed: founders who explicitly wrote out hypotheses, required data, and kill criteria up front got better outcomes — and, importantly, better quits. The scientific approach doesn't make founders give up on ideas. It helps them give up at the right moment, and biases pivots toward "fewer, better changes" instead of reckless reinvention.
02 — Hypothesis layers
"Will this company grow?" is too coarse to act on. In practice, the uncertainty splits into stackable layers — each with its own evidence, its own kill criteria, and its own place in the financing structure.
| Hypothesis layer | Founder-side evidence | VC-side question | Tranche / structure | Trigger clause example |
|---|---|---|---|---|
| Market & problem | High-frequency pain, observable workarounds, willingness to budget | Is this problem worth paying for, today? | Initial check; follow-on after validation | Repeat validation across 10 target accounts |
| ICP & GTM | Best-customer profile, loss reasons, repeatability of founder sales | Can you say who you sell to, in concrete terms? | Tied to founder sales continuing | 3+ paid customers in the same segment |
| Solution & retention | Continued usage, cohort curves, churn reasons | Do they keep using it? | Pre-scale spend held back | Retained-cohort improvement releases next tranche |
| Unit economics | CAC, payback, gross margin, churn | Does growth make the loss worse? | Hiring budget gated on payback | Gross margin / payback hits threshold |
| Execution | Speed of testing, learning logs, pivot quality | Does this team learn faster than the market changes? | Heavier board / advisor support | Quarterly learning review delivered on time |
| Capital efficiency | Mapping of milestones to spend | Is the next round's bar clearly defined? | Capital-to-milestones written into the docs | Key KPI hit unlocks next tranche |
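The trigger clauses in the right-hand column are deliberately mechanical: each tranche reduces to a predicate over observable metrics. A sketch, with thresholds borrowed from the example clauses above (illustrative, not a recommendation):

```python
from typing import Callable

# Map each tranche to the predicate that unlocks it. Thresholds mirror
# the example clauses in the table and are illustrative only.
TRIGGERS: dict[str, Callable[[dict], bool]] = {
    "follow_on":   lambda m: m["validated_target_accounts"] >= 10,
    "founder_gtm": lambda m: m["paid_customers_same_segment"] >= 3,
    "scale_spend": lambda m: m["m3_cohort_retention"] > m["m3_cohort_retention_prev"],
    "hiring":      lambda m: m["cac_payback_months"] <= 18,  # assumed bar
}

metrics = {
    "validated_target_accounts": 11,
    "paid_customers_same_segment": 3,
    "m3_cohort_retention": 0.62,
    "m3_cohort_retention_prev": 0.55,
    "cac_payback_months": 21,
}

for tranche, unlocked in TRIGGERS.items():
    print(tranche, "unlocked" if unlocked(metrics) else "gated")
```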
03 — Methods compared
Design thinking owns problem discovery. Customer Development owns hypothesis exploration. Lean Startup owns minimum-cost learning. Discovery / Validation experiments close the gap between what people say and what they do. A/B testing tightens causal inference. PLG owns repeatable acquisition, activation, retention, and referral. Same uncertainty stack — different layers.
| Method | What it's good at | When to use it | Skills required | Time cost |
|---|---|---|---|---|
| Design thinking (IDEO) | Quality of problem discovery, surfacing latent needs | Problem exploration, 0→1 ideation, UX rework | Field observation, interviews, structuring | Medium |
| Customer Development (Steve Blank) | Killing customer / market hypotheses fast | Pre-PMF, deep problem-side validation | Customer dialogue, hypothesis writing, logging | Medium-high |
| Lean Startup / MVP | Lowest-cost learning per question | Early solution, pricing, and onboarding tests | MVP design, instrumentation, iteration | Low-medium |
| Discovery / Validation (Strategyzer) | Closing the say–do gap, staged confidence | When you need to manage hypothesis confidence in tiers | Experiment design, threshold-setting | Medium |
| A/B testing | Tight causal inference at scale | Live activation, retention, and pricing flows | Stats, instrumentation quality, SRM checks | Medium-high |
| PLG / growth loops | Repeatable acquisition, activation, expansion | Self-serve SaaS, AI tools, B2B-light | Funnel design, onboarding, analytics | Medium |
04 — VC lens
When you read the public playbooks side by side, top-tier VCs aren't looking at the same hypothesis as the founder — they're each emphasizing a different layer of the stack. The composite below is built from public writing, not internal IC memos.
Sequoia. Customer pull as the strongest PMF signal; sharpness of the founder's articulation of the problem; quality of the team and its obsession with the details. PMF treated as a graduated state, not a binary.
a16z. Retention as the dominant PMF read. LTV, CAC, churn, paid CAC, and cohort retention as the diagnostic stack. ICP is defined narrowly: company size, industry, geography, role, tech stack, problem solved, and reasons for loss / churn.
NfX. Founder-market fit decomposed into obsession, founder story, personality, and experience. At Series A, looks for traction, PMF, minimum scale, and unit-economics evidence — in that order.
OpenView. CAC payback as the central read on GTM efficiency. Argues against worshipping LTV:CAC alone. Pairs payback with magic number and gross retention as the real SaaS lens.
YC. Problem-first, low-tone, founder honesty. Distinguishes "real PMF" from "fake PMF." Wary of locking down ICP too early, before founder sales has surfaced who actually pulls.
First Round. Treats the PMF survey ("very disappointed if you couldn't use this," 40% threshold) as a leading indicator — explicitly a supporting metric to be read alongside retention and cohort behavior, not in place of them.
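Behind the OpenView-style lens and the PMF survey sits simple arithmetic. A worked sketch with illustrative inputs (the magic-number formula below is one common formulation, not the only one):

```python
# All inputs illustrative. CAC payback: months of gross profit needed
# to recover the fully loaded cost of acquiring one customer.
cac = 9_000                 # fully loaded cost per new customer ($)
monthly_arpa = 1_000        # average revenue per account per month ($)
gross_margin = 0.75

payback_months = cac / (monthly_arpa * gross_margin)   # 12.0

# SaaS "magic number": annualized net-new revenue per dollar of
# prior-quarter sales & marketing spend.
net_new_quarterly_revenue = 120_000
prior_quarter_sm_spend = 400_000
magic_number = (net_new_quarterly_revenue * 4) / prior_quarter_sm_spend  # 1.2

# Sean Ellis PMF survey: share answering "very disappointed".
responses = {"very": 84, "somewhat": 90, "not": 26}
pmf_score = responses["very"] / sum(responses.values())  # 0.42 vs the 0.40 bar
```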
05 — Templates
The fastest way for a founder and an investor to look at the same map is a single hypothesis ledger: claim, rationale, kill criteria, test plan, update history, owner, decision, and investment implication — all on one row. HypoGrid is a runtime built directly on top of that ledger model.
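As a minimal sketch, one row of such a ledger could serialize like this; the field names are illustrative, not HypoGrid's actual schema:

```python
import json

ledger_row = {
    "h_id": "H-014",
    "layer": "solution_retention",
    "claim": "CS teams keep using the triage flow after week 2",
    "rationale": "Design partners reopened the tool unprompted in weeks 1-2",
    "kill_criteria": "weekly active usage < 40% of seats by week 8",
    "test_plan": {"experiment": "2 paid PoCs", "deadline": "2025-Q3"},
    "updates": [
        {"date": "2025-06-02", "evidence": "PoC #1 week-4 WAU 61%", "direction": "support"}
    ],
    "owner": "founder-ceo",
    "decision": "continue",
    "investment_implication": "8-week retention gate on next tranche",
}
print(json.dumps(ledger_row, indent=2))
```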
| Template | Required fields | How to use |
|---|---|---|
| Hypothesis tree | Top hypothesis → sub-hypotheses → observable indicators → kill criteria | Don't say "there's a market." Decompose down to who, in what behavior, demonstrates it (tree sketch after this table). |
| MECE hypothesis map | Market / customer / problem / value / solution / GTM / pricing / retention / org | Surface what's untested; keep coverage from drifting into the parts you already believe. |
| Hypothesis test plan | Hypothesis, prediction, experiment, sample, threshold, deadline, cost, next action | Decide go / pivot / kill before the experiment runs, not after. |
| Experiment design sheet | Primary metric, guardrails, MDE, SRM check, data source, stopping rules | Use for A/B, onboarding rework, anything where instrumentation quality matters (SRM sketch after this table). |
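To make the hypothesis-tree template concrete, here is a sketch of the tree as plain data, with illustrative contents: every leaf pairs a claim with the observable indicator and kill criterion it must carry.

```python
tree = {
    "claim": "Mid-market SaaS CS teams will pay for AI triage",
    "children": [
        {
            "claim": "The pain is budgeted, not just felt",
            "indicator": "existing paid workaround or headcount line item",
            "kill": "0 of 10 target accounts show budget after discovery calls",
        },
        {
            "claim": "Usage persists past the novelty window",
            "indicator": "week-8 weekly active usage per seat",
            "kill": "WAU < 40% of seats in both paid PoCs",
        },
    ],
}

def untested_leaves(node: dict) -> list[str]:
    """Walk the tree and list every leaf claim that still lacks evidence."""
    if "children" not in node:
        return [node["claim"]] if "evidence" not in node else []
    return [c for child in node["children"] for c in untested_leaves(child)]

print(untested_leaves(tree))  # both leaves are still open here
```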
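For the experiment design sheet, the SRM (sample ratio mismatch) check deserves its own sketch: if traffic was meant to split 50/50 but the observed counts are improbably lopsided, assignment is broken and the test readout is void, whatever the metric says. A dependency-free version:

```python
import math

def srm_p_value(n_a: int, n_b: int) -> float:
    """Chi-square (1 df) p-value that an intended 50/50 split produced these counts."""
    expected = (n_a + n_b) / 2
    chi2 = (n_a - expected) ** 2 / expected + (n_b - expected) ** 2 / expected
    return math.erfc(math.sqrt(chi2 / 2))  # survival function for chi2, 1 df

# 50/50 split intended; counts drifted. A p-value this small means:
# fix the assignment pipeline before reading any metric from this test.
p = srm_p_value(50_412, 49_102)
print(f"SRM p-value: {p:.5f}")
```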
06 — Pitfalls & checklist
| Common mistake | Why it happens | Counter |
|---|---|---|
| Jumping to solution | Believing "users say they want it" | Run problem interviews first; write kill criteria before building |
| Confirmation bias | Five friendly calls, all said yes | Tag every signal as support / contradict / unclear (tally sketch after this table) |
| Misreading A/B | Small sample, claimed win, no SRM check | Pre-define primary metric, MDE, stopping rules, SRM verification |
| Premature scaling | Hiring and ad spend before retention proves out | Hold fixed costs flat until the PMF gate is cleared |
| Fake ICP | Chasing big logos that don't retain | Compare best customers vs. churned customers monthly |
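The confirmation-bias counter is mechanical enough to sketch in a few lines: tag each signal when it's logged, and let the tally, not memory, report how one-sided the evidence is.

```python
from collections import Counter

signals = [
    ("'We'd definitely use this' on a friendly call", "support"),
    ("No budget line for the problem at 2 of 5 accounts", "contradict"),
    ("Asked for pricing, then went silent", "unclear"),
    ("Champion built a spreadsheet workaround", "support"),
]

tally = Counter(tag for _, tag in signals)
print(dict(tally))  # {'support': 2, 'contradict': 1, 'unclear': 1}

# A hypothesis with zero logged contradictions usually means the
# search was one-sided, not that the hypothesis is true.
if tally["contradict"] == 0:
    print("Warning: no contradicting evidence logged. Look harder.")
```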
The same discipline, as a standing checklist with a founder column and a VC column:

| Item | Founder check | VC check |
|---|---|---|
| Depth of problem | Is there an existing alternative and a workaround? | Is the pain budgeted, not just felt? |
| ICP | Can you describe the best customer in concrete terms? | Are loss / churn reasons cleanly categorized? |
| MVP | Is it the smallest thing that produces the learning? | Has feature work outrun the learning goal? |
| Test quality | Were thresholds and kill criteria written before the test? | Is the evidence closer to do than say? |
| Retention | Are cohorts improving? | Are new cohorts deteriorating? |
| Unit economics | Are you watching CAC, payback, and gross margin? | Does growth make the loss structure worse? |
| Execution | Is the learning cycle running on a weekly cadence? | Does this team out-learn the market? |
| Use of capital | Is the bar for the next round explicit? | Is capital-to-milestones written into the round? |
07 — Worked cases
Setup. "AI handles inbound faster" demos brilliantly, but no one can say which KPI it moves, for whom.
Worked answer. Narrow the ICP to "CS leaders at SaaS companies with 20k+ inbound tickets / month." Reframe the lead hypothesis from AI accuracy to response-quality consistency and budgeted first-contact resolution. Run 10 founder calls → 3 design partners → 2 paid PoCs. Track weekly active use, first-response time (FRT), CSAT, and department-level expansion. From the VC side, the round can be structured so that two paid PoCs and 8-week retention unlock the next tranche.
Case 2: two-sided marketplace. Setup. Demand-side ad spend keeps growing, but supply-side retention is weak.
Worked answer. Lead with supply-side GMV retention. Don't read first transaction value — read m1 and m3 supplier re-listing and revenue expansion. Demand-side paid acquisition only scales once supplier retention improves and demand-side repeat lifts. From the VC side, hold paid acquisition flat until the cohort plateau is observed.
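The supplier-retention read in this case is a cohort computation. A sketch with an illustrative data shape: normalize each supplier's activity to months since first listing, then measure what share of the cohort is still active at m1 and m3.

```python
from collections import defaultdict

# (supplier_id, months_since_first_listing) pairs from activity logs.
activity = [
    ("s1", 0), ("s1", 1), ("s1", 3),
    ("s2", 0), ("s2", 1),
    ("s3", 0),
]

months_active = defaultdict(set)
for supplier, month in activity:
    months_active[supplier].add(month)

cohort = list(months_active)

def retention(m: int) -> float:
    """Share of the cohort still active in month m."""
    return sum(m in months_active[s] for s in cohort) / len(cohort)

print(f"m1 retention: {retention(1):.0%}")  # 67%: s1 and s2
print(f"m3 retention: {retention(3):.0%}")  # 33%: s1 only
```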
Everything in this guide — the hypothesis ledger, the test plan, the kill criteria, the update history, the trigger clauses — lives in HypoGrid as JSON ledgers and the Hypothesis Briefs that render from them. Founders run validation; investors run diligence; both can work from the same H-IDs.