Judging quality you didn’t make (incl. AI output).
what we mean by this
Taste is the post-AI judgment skill — the one that got *more* valuable when generation became free.
When anyone can produce a first draft in thirty seconds, the differentiator shifts from the ability to produce to the ability to judge. Who can look at the draft and know, precisely, what's wrong with it? Who can say not just "this feels off" but "the third paragraph claims we have strong retention data, and we don't — this copy will create expectations we can't support"? Who catches the AI-generated product brief that opens with the right framing but quietly inflates confidence intervals throughout? That person is doing taste. The person who waves it through because something exists is not.
What taste specifically tests — and what no other MARK competency captures — is the quality of your read on outputs you didn't author. Room measures how you read people. User measures how you read customers. Taste is how you read the work itself: a design, a brief, a campaign, a piece of code, a model response, a slide. The act is evaluative. The standard is "good enough to ship, or not" — and the test of the competency is whether you can articulate the reason clearly enough that the person holding the work understands exactly what to fix.
Two things confuse beginners here. The first is conflating taste with preference. "I don't like this color" is a preference — it's about you, not the work, and it can't be acted on without a reference standard. "This color has a 3.2:1 contrast ratio against the background text and we have accessibility commitments" is taste — it's about the work against a bar, and it's specific enough to close. The second confusion is conflating taste with authority. Taste doesn't require seniority. An IC with high taste who can name exactly what's wrong and why earns more trust per critique than a director who just says "it needs more polish."
citation
PL Standard v3.1 · Ken · Taste
Markdown form: [PL Standard v3.1 · Ken · Taste](https://pragmaticleaders.io/framework/competencies/taste)
the four levels
Anchored at every rung. The blockquote at each level is panel-authored and pulled live from the rubric — edit anchors via the panel tooling and they appear here.
L1Developing
ships the first thing that works; can't tell whether the output is good or just done, so waits for a teammate to point it out.
you'll see this when…
they can identify that something feels wrong but can't name what — feedback lands as "it doesn't feel right" or "I think we can do better," with nothing specific enough to act on
they rate AI output higher than human output for the same content because something existing is a relief — the delight that the AI produced anything at all overrides their critical read
they confuse personal preference for quality signal: feedback is about their taste, not about whether the work clears the bar
common failure mode: The PM treats vague discomfort as a quality signal worth sharing, and specificity as optional — the person receiving the feedback can't tell what to fix.
L2Competent
catches obvious misses against the brief, but freezes on calls like 'is this copy on-brand?' or 'is this visual confusing?' when there's no rubric to lean on.
you'll see this when…
they can name one or two specific problems with a piece of work — the structure is wrong, the headline contradicts the body — but tend to stop at the most visible surface problem and miss compounding issues underneath
they give better feedback on formats they're fluent in (copy, decks) and slide into preference-language on formats they're less confident about (visual design, engineering artifacts, data models)
they apply a quality bar but it's calibrated to "what the team will accept" rather than to an external standard — taste at this level is peer-relative, not craft-relative
common failure mode: feedback that solves the stated problem but doesn't surface the one that matters more — the PM catches the typos and misses the false claim.
L3Proficient
names what makes a piece of work fail before it ships — including ai-generated output — and explains the criteria so others can apply the judgment independently.
you'll see this when…
they can evaluate work against an external standard — not just "what does our team usually do" but "what does the best version of this look like" — and they can articulate the gap specifically
they read AI output with the same skepticism they'd apply to any first draft: the fact that it was generated quickly is not evidence that it's right; they look for what the model elided, inflated, or got confidently wrong
they separate "good for now" from "good" — they can ship something that isn't perfect because they understand what the imperfection costs and they've made that tradeoff deliberately, not by overlooking it
common failure mode: the PM holds the work to the right standard but can't always translate the gap into actionable direction — they can diagnose accurately but struggle to instruct specifically.
L4Expert
sets the criteria others use to judge quality — names what makes a piece of work succeed or fail before the team can articulate it, and those calls prove right.
you'll see this when…
they can spot a quality failure that nobody else in the room has named — and they can name it at the level of craft, not just feeling: the campaign's third headline contradicts a brand commitment shipped six months ago; the AI brief's confidence framing doesn't match the evidence level of the data cited
they apply their taste as a gate that raises quality without becoming a permanent obstruction — they know when to kill something and when to fix it and when to ship it imperfect and why
they develop others' taste: they don't just evaluate work themselves, they run reviews in a way that builds the team's collective standard — the rubric becomes distributed, not personal
common failure mode: taste so refined that nothing ships — the quality gate becomes a no-machine, and the expert's role mutates from "raises the floor" to "blocks the door." The L4 trap is perfectionism dressed as standards, where the unspoken bar keeps moving and the team stops bringing work forward because the review feels less like development and more like an audit. The other L4 trap is the inverse: taste calibrated to what's *defensible* rather than what's right — the expert learns to articulate quality in ways that survive a review, but the bar itself is brand-safe rather than genuinely high.
how to develop it
The most leveraged move is building a habit of *naming the failure* rather than just feeling it. Most PMs stop at the feeling. The practice is one more step: when something feels wrong, write down the specific claim or element that's failing, the standard it's failing against, and what fixing it would require. Do this enough times and the vocabulary builds. Taste without vocabulary is just discomfort — it doesn't travel, can't be taught, and doesn't scale.
Read. /manual/product-thinking is the structural foundation — it builds the habits of reasoning about quality in products that taste eventually relies on. /manual/ai-product covers how to evaluate AI outputs specifically, which is where most PMs' taste is underdeveloped right now. If you work on consumer products, /manual/growth/activation-optimization is worth reading as a secondary source — it's full of examples where the quality bar on copy and flow is either held or abandoned under shipping pressure.
Practice. Find scenarios where you're handed a piece of completed work — a Brief, a mock PRD, a feature spec — and asked to evaluate it, not improve it. The distinction matters: improvement is generative, evaluation is critical, and taste lives in evaluation. If no specific Taste scenario exists in your track, the shape you want is: *here's what someone shipped; tell me what's wrong and why*. Run it until "I need to see more context" stops being your first move.
Write. Draft a Brief review comment on a real piece of work — something in your current roadmap or a recent design — where you're not allowed to use the word "feels." Force the feedback into specificity. "The hero copy implies we've solved the problem but the supporting copy immediately hedges" is a taste note. "The hero copy doesn't feel strong" is not.
Coach yourself. After any review where you gave feedback, ask: *could the person who received this feedback tell exactly what to change?* If the answer is no — if your feedback was interpretable but not actionable — that's the competency gap. One sentence, after each review, for a month.
how to spot it in others
In design reviews: do they name a specific problem with a specific artifact — "the empty state copies the error state and that's confusing for users who haven't failed yet" — or do they say "it needs work"?
In planning: when reviewing a roadmap or strategy doc, do they catch internal contradictions — claims that disagree with each other, or with earlier commitments — or do they evaluate each section in isolation and miss the seam?
After an AI-assisted draft: do they treat the output as a starting point to be evaluated critically, or as a deliverable to be polished? The tell is in how they present it: "here's a draft, I've flagged where the AI got the framing wrong" is taste in action; "here's what it generated" is not.
In post-mortems: when a product shipped with a quality failure that someone could have caught in review, do they ask "what was our quality bar and where did it fail us" or do they absorb it as "we were moving fast"?
In hiring: when they're evaluating candidates or contractors, can they distinguish between "good at this craft" and "good at something adjacent" — or do they conflate fluency with quality?
three failure modes we see often
The AI delight bias. The PM is so relieved that the model produced something coherent that they rate it too highly. This is especially common early in an AI-augmented workflow — the bar shifts from "is this good" to "is this better than nothing," and since AI output usually clears that low bar, it gets waved through. The cost compounds: decisions get made on briefs that elide the hard questions, campaigns ship with copy that contradicts previous commitments, and the team slowly loses its reference point for what "good" means because everything that ships is "AI-assisted." The fix is to evaluate AI output against the same bar you'd hold a senior team member to — not "did it produce something" but "is what it produced right."
The consensus taste. The PM calibrates their quality bar to whatever the team agreed on this week. This is a subtler failure than it sounds — it looks like collaboration and deference. The tell is that the bar moves when the team moves: work that would have been returned last quarter now ships unchanged, because the team's informal standard dropped and the PM's taste followed it down. Taste that's purely relative to the room is no taste at all — it's a mood measurement. The competency requires a reference outside the room: what does the best version of this look like, independent of what this team usually ships?
The brand-safe ceiling. The PM has developed taste, but they've calibrated it to what's defensible in a review, not to what's genuinely right. They can articulate quality failures clearly — their feedback is specific, their vocabulary is good — but the bar they're applying is "this will survive scrutiny" rather than "this is actually good." The output this produces is technically correct, professionally presented, and forgettable. The sign that you're in this failure mode: you can explain why everything you shipped was fine, but you can't point to anything that was great.
Room is reading people: what they mean versus what they say, what the subtext of a conversation is. Taste is reading outputs: what a piece of work is versus what it claims to be. They share the same evaluative posture — skeptical, specific, evidence-anchored — but the object is different. Someone can be strong on Room (excellent at reading a meeting) and weak on Taste (misses quality failures in work artifacts, especially outside their primary domain). In practice they often co-develop, but they're tested by different situations.
User is reading customer truth: who you're actually building for, what they actually need, what the research actually says. Taste is reading whether a completed output serves that truth well — whether the design, the copy, the model response, the brief actually lands for the person it's meant to land for. User is the input-side read; Taste is the output-side judge. A PM strong on User but weak on Taste can diagnose the customer need clearly and then approve work that doesn't meet it.
Signal is about reading conflicting evidence to a decision — navigating ambiguity in data, research, or argument to reach a sound call. Taste is about evaluating a completed artifact: not "what does the evidence say" but "does this thing clear the bar." They overlap when the artifact being judged is an analysis or a brief — in that case, Taste becomes the meta-read on whether the Signal work is sound. But they're scored separately because the skill of reading evidence differs from the skill of judging work quality.
what good looks like in the wild
A head of product at a mid-stage company is ninety percent of the way through a campaign review. The brand and marketing leads have been on this for six weeks — research, messaging frameworks, three rounds of creative, two rounds of copy. The campaign is polished. It's also the best work this team has shipped in two years, by their own account. The review is going smoothly.
She's reading through the creative deck when she stops at the third headline variant. It's well-written. The art direction is strong. But the headline makes an implicit claim — "built to grow with you" — that the company had quietly walked back eight months earlier when they sunset a set of enterprise scaling features. There was an internal memo. There were customer conversations. The claim isn't technically false, but it contradicts a specific commitment the company had made to a cohort of customers who'd been told those features weren't coming back.
She flags it in the room. Not dramatically, not with a long speech about brand integrity. She names the contradiction specifically: this headline implies the feature set we sunset in Q3, and we told the accounts in that cohort we weren't rebuilding it. The marketing lead hadn't been in those customer conversations. The brand lead had left the company before the memo. Nobody in the room had made the connection. The campaign had been reviewed by four people before it reached her, and none of them had caught it.
“
The campaign held. The headline changed. Two accounts in that cohort later mentioned, in QBR conversations, that they'd seen the campaign and it hadn't triggered the anxiety they'd braced for. That's a small thing. It's also exactly what the competency is for — the ability to catch what slips through when everyone else is evaluating in isolation, because taste holds a standard that's larger than any single review.