Deciding what’s worth building among everything you could ship.
what we mean by this
Worth is not a prioritization exercise. It's a judgment call — and it happens before the spreadsheet opens.
The failure mode most teams carry is treating worth as something you figure out at sprint planning, when the ticket is already written and the designer has already mocked it up. By then, the social cost of killing the idea is real, and the work has acquired a kind of momentum that feels like evidence. It isn't. That momentum is sunk cost wearing a roadmap badge.
Worth is the call you make in the quiet before any of that — when the opportunity is still a hypothesis, the capacity is still theoretical, and the only thing you're working with is judgment. The question isn't "how do we score this against the backlog?" It's: is this worth any of our time at all?
Three inputs matter: opportunity size (what's the realistic ceiling if this works?), strategic fit (does this pull in the direction we've decided to go, or is it an interesting detour?), and team capacity at the margin (can we actually prosecute this at the quality required, or are we spreading too thin?). But here's the thing most frameworks miss: the real skill in worth isn't the calculation. It's the ignore list. A team that ships three things in a year that genuinely matter has exercised worth better than the team that shipped eighteen things and couldn't tell you which ones moved anything.
citation
PL Standard v3.1 · Map · Worth (https://pragmaticleaders.io/framework/competencies/worth)
The 50 requests you didn't build — that's your worth score. The criterion for making that ignore list sharp is what separates L1 from L4.
the four levels
Anchored at every rung.
L1 Developing
builds whatever was requested most recently, without checking whether it moves a metric or fills a real gap.
you'll see this when…
A PM explains a new feature by leading with who asked for it — a customer, the CEO, a sales rep — rather than what need it addresses and whether that need is actually large enough to matter.
The team's roadmap is a nearly-complete list of every request received in the last quarter, with light re-ranking but no real kills.
A PM can articulate reasons for every item on the backlog but struggles to say plainly why an item isn't on it. The absence of reasoning about non-bets is the tell.
common failure mode: building because someone asked — a customer, a C-suite exec, anyone with the social weight to make "no" feel risky — without an honest read on whether the underlying opportunity justifies the spend.
L2 Competent
ranks options by user impact and feasibility on familiar problems, but stalls or seeks input when tradeoffs involve unfamiliar constraints or high business stakes.
you'll see this when…
A PM applies a real prioritization framework (RICE, opportunity scoring, value-effort grid) and can explain the mechanics of the ranking.
They push back on at least some requests with structured rationale: "the effort is high and the addressable problem is smaller than it looks."
Scope occasionally gets killed — but usually only after the team has already invested meaningful discovery time, not at the front door.
common failure mode: confusing scoring with judgment — running every idea through a framework and trusting the output, even when the inputs are guesses and the model is giving false precision. The framework becomes a way to defer the hard call rather than make it.
L3 Proficient
spots the trap in a promising idea before the team commits — flags why a high-demand feature solves the wrong problem, and redirects scoping without waiting to be asked.
you'll see this when…
A PM can articulate the opportunity ceiling for a proposed bet before any discovery work starts — not a precise number, but a defensible order of magnitude, and a clear read on whether the ceiling justifies the cost of "knowing more."
They distinguish between "worth discovering" and "worth building" — and treat discovery investment itself as a bet that requires worth judgment.
Their roadmap has real negative space: explicit holes where things didn't make the cut, with clear reasoning. They can name the best ideas they said no to.
common failure mode: over-indexing on strategic fit at the expense of opportunity size — saying yes to things that "make sense" for the product direction but are genuinely too small to matter, because fit is easier to argue than ceiling.
L4 Expert
reframes how the org defines worth: replaces inherited criteria with ones built from evidence, so future prioritization decisions change, not just the current one.
you'll see this when…
A PM's worth call is felt before it's explained — a one-sentence read on an opportunity that turns out to be right in ways that only become visible months later. They saw the strategic asymmetry before the data confirmed it.
They hold worth as a live judgment throughout a product's lifecycle, not just at intake — re-running the call when market conditions shift, when a competitor moves, when early signal suggests the ceiling was wrong.
Their teams have a shared vocabulary for worth that doesn't require the PM in every room. The standard has transferred.
common failure mode: over-systematizing worth — building a scoring system so rigorous that it crushes the intuition that made the judgment sharp in the first place. L4 PMs who've been burned by gut calls sometimes over-correct into a process that turns worth into a committee exercise. The system starts filtering out the asymmetric bets that don't score well on any dimension — which is exactly the kind of thing expert worth judgment is supposed to catch.
how to develop it
The most leveraged move here is deceptively simple: start keeping an explicit ignore list. Not a "someday/maybe" backlog — an actual record of ideas you evaluated and chose not to pursue, with a sentence on why. Revisiting this list in six months will teach you more about your own worth calibration than any framework.
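One way to keep such a list, as a minimal sketch rather than a prescribed format: each entry records the idea, the one-sentence reason, and the date, and a six-month review fills in whether the call still looks right. The class name, field names, 182-day review window, and sample entries below are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class IgnoredIdea:
    """One idea that was evaluated and deliberately not pursued."""
    title: str
    reason: str                         # the one-sentence "why not"
    decided_on: date
    still_right: Optional[bool] = None  # filled in when the entry is reviewed

    def review_due(self, today: date, after_days: int = 182) -> bool:
        """True once roughly six months have passed and no review is recorded."""
        return (
            self.still_right is None
            and today >= self.decided_on + timedelta(days=after_days)
        )


def calibration(ideas: list[IgnoredIdea]) -> Optional[float]:
    """Share of reviewed ignore calls that still look right; None if none reviewed."""
    reviewed = [idea for idea in ideas if idea.still_right is not None]
    if not reviewed:
        return None
    return sum(idea.still_right for idea in reviewed) / len(reviewed)


# Hypothetical entries, for illustration only.
ignore_list = [
    IgnoredIdea("Custom report builder",
                "Requested by two accounts; ceiling too small.",
                date(2024, 1, 15)),
    IgnoredIdea("Dark mode",
                "Fits the roadmap, moves no metric we track.",
                date(2024, 2, 1)),
]
```

The point of `still_right` is that the review is recorded next to the original reasoning, so a pattern in the misses (for example, repeatedly underestimating ceilings) becomes visible over time.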
Read. /manual/core-skills/strategy/prioritization — the mechanics of scoring frameworks, with honest coverage of where each breaks down. /manual/core-skills/strategy/roadmap-prioritization — how to translate worth calls into a roadmap that communicates the reasoning, not just the conclusion. /manual/b2b/intake-judgment — specifically on the front-door evaluation of incoming requests, where worth is most commonly flubbed.
Practice. Run a scenario where you're given a list of ten plausible feature requests and asked to produce an ignore list with reasoning — not a ranked list, an ignore list. The instinct to rank rather than kill is exactly what to notice and push against. Scenarios where a high-status stakeholder is behind the weakest idea are the most instructive.
Write. The canonical Brief tests worth judgment directly: when evaluating a product decision, write out the explicit opportunity ceiling before any other analysis. Force yourself to name the number. Then name the assumptions behind it. Then name the conditions under which you'd be wrong.
Coach yourself. After every major roadmap decision, ask: what's the best idea I said no to this quarter, and do I still think that was right? If you can't name a good idea you killed, you probably didn't exercise worth this cycle.
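The Write exercise above (name the number, then the assumptions, then the conditions under which you'd be wrong) can be sketched as a small record that refuses to count as complete until all three parts exist. This is an illustrative sketch only: the multiplication used for the ceiling is one simple sizing model among many, and every name and figure below is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CeilingEstimate:
    """An opportunity ceiling with its assumptions and failure conditions spelled out."""
    total_users: int
    addressable_share: float   # fraction of users with the problem (optimistic read)
    value_per_user: float      # assumed yearly value per affected user
    assumptions: list[str] = field(default_factory=list)
    wrong_if: list[str] = field(default_factory=list)

    def ceiling(self) -> float:
        """Upper bound on yearly value if everything goes right."""
        return self.total_users * self.addressable_share * self.value_per_user

    def is_complete(self) -> bool:
        """The exercise is only done once assumptions and 'wrong if' are both named."""
        return bool(self.assumptions) and bool(self.wrong_if)


# Hypothetical figures, for illustration only.
estimate = CeilingEstimate(
    total_users=50_000,
    addressable_share=0.12,
    value_per_user=40.0,
    assumptions=["Affected users would adopt the fix without a migration."],
    wrong_if=["Fewer than half of affected users hit the problem monthly."],
)
```

Keeping the number, the assumptions, and the failure conditions in one place makes it harder to quote the ceiling later without the caveats that produced it.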
how to spot it in others
In a planning meeting, they volunteer the ceiling of an opportunity before anyone asks — "the addressable problem here is maybe 12% of our user base, and that's the optimistic read." They've already done the sizing.
When a high-status person brings a request, they engage the idea on its merit rather than on its source. They don't defer, and they don't reflexively resist — they read the opportunity.
Their past roadmaps have real kills, not just re-rankings. You can ask "what's the best thing you said no to last year?" and they'll have a crisp answer.
They treat the discovery budget itself as a resource subject to worth judgment. "We could spend two weeks figuring out if this is real — is that the best two weeks we have?" is a question they ask before commissioning research.
They're uncomfortable with roadmaps that look like a list of every request sanitized into feature language. They push for the negative space to be made explicit.
three failure modes we see often
The request-driven roadmap. This is L1's defining failure, but it shows up at every level. The underlying mechanism is social: saying no to someone who asked creates friction, and friction is uncomfortable. So the team finds ways to say yes — or, more often, ways to say "not yet" that function as yeses with a longer fuse. The cost isn't just building the wrong thing; it's the opportunity cost of not building the right one, which is invisible until it's too late.
False precision from scoring models. RICE scores, ICE scores, opportunity-scoring matrices — all useful as thinking tools, all dangerous as decision-making authorities. The failure mode is treating the output of the model as the decision, particularly when the inputs are estimates dressed up as data. A PM who has run the numbers feels like they've done worth judgment. They've done arithmetic. The judgment is in choosing the inputs and knowing when the model's output should be overridden by what you actually know.
Strategic fit as a substitute for opportunity size. "This fits our platform strategy" is a true thing that can be said about dozens of ideas that are too small to matter. Strategic fit is a filter, not a justification. The L3-to-L4 transition often requires a PM to kill several things that fit well but don't have a real ceiling — and to make peace with the fact that the right answer sometimes is: this makes sense and we still shouldn't build it.
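The false-precision failure described above can be made concrete with a small sketch. It uses the commonly cited RICE formula (reach × impact × confidence ÷ effort) but feeds it low and high guesses instead of single numbers, then checks whether two ideas' score ranges overlap. The two ideas and all of their input ranges are hypothetical.

```python
from itertools import product


def rice(reach: float, impact: float, confidence: float, effort: float) -> float:
    """RICE score: (reach * impact * confidence) / effort."""
    return reach * impact * confidence / effort


def rice_range(reach, impact, confidence, effort):
    """Lowest and highest score over every combination of (low, high) inputs."""
    scores = [
        rice(r, i, c, e)
        for r, i, c, e in product(reach, impact, confidence, effort)
    ]
    return min(scores), max(scores)


# Hypothetical ideas; each input is a (low, high) guess, not a measurement.
idea_a = dict(reach=(800, 1200), impact=(1, 2), confidence=(0.5, 0.8), effort=(2, 4))
idea_b = dict(reach=(400, 600), impact=(2, 3), confidence=(0.5, 0.8), effort=(2, 3))

a_low, a_high = rice_range(**idea_a)
b_low, b_high = rice_range(**idea_b)

# When the ranges overlap, a ranking based on point estimates is mostly noise.
ranges_overlap = a_low <= b_high and b_low <= a_high
```

With these made-up inputs the two ranges overlap heavily, so the model cannot separate the ideas. That leaves the decision with the person choosing the inputs.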
Worth is the judgment about whether an opportunity deserves to move at all; kill is what you do when an idea that already has momentum needs to stop. They sit on the same axis of the ignore/kill decision, but worth operates before the work begins and kill operates when stopping is costly. Worth is the front door; kill is the emergency exit. Confusing them leads to either never starting the hard conversation (worth failure) or never pulling the plug once started (kill failure).
Halt is about deciding not to build at all in a given domain or moment: a category-level no, often strategic or timing-based. Worth is the per-opportunity call: among the things we could build, which ones have a real ceiling and fit? Halt is the higher-order call: is this the right time, market, or team to be building in this space? A team can make good worth calls on individual bets while still needing to exercise halt on the whole program.
Bet is the nearest neighbor from outside Map. It is about sizing investment under uncertainty: how much to commit, what the reversibility looks like, how to stage the spend. Worth is the prior question: is this worth betting on at all? You can't exercise bet without first having exercised worth. The difference matters when scoring: a PM who calibrates bet sizing beautifully on the wrong opportunities is sharp at Acuity and weak at Map.
what good looks like in the wild
A founder, call her R, ran product at a Series B infrastructure startup. In her first year, the team had forty-two items in active consideration. She spent her first six weeks doing nothing but worth calls: talking to customers, sizing ceilings, mapping what the team could actually prosecute at quality. By the end of month six, she had eleven items she was willing to call worth building. The other thirty-one went on an explicit ignore list, with a sentence on each.
Her engineering lead thought this was aggressive. Her investors thought she was being conservative. Her CEO, who had spent the previous year watching the team ship features no one used, thought it was the best thing that had happened to the product org since founding.
They shipped four of the eleven that year. One of them became the product's first genuine retention driver — not because it was the highest-scored item on any matrix, but because R had seen, six months earlier, that the ceiling was real and the strategic asymmetry existed before anyone else had named it.
The ignore list is what made this visible. Twelve months later, R reviewed it and found three items she had second-guessed: two deserved to stay on the list, and one did not. One miss in thirty-one is a good ratio. Calibrating it is the work.
The thirty-one things she didn't build in year one? Nobody misses them.