MARK for team calibration

who this is for

Managers, Eng VPs, and heads of product who are trying to level their IC PMs honestly — not by title earned, but by judgment demonstrated.

the problem this solves

Traditional PM leveling breaks quietly. It breaks because every major ladder — Reforge, FAANG levels, Lenny’s template — measures scope: how many products did you own, how big was the team you influenced, how wide was your surface area. Scope is real. It’s also gameable, title-inflated, and structurally incapable of catching the thing that actually matters: whether this person’s judgment holds under pressure.

Here’s the failure mode we see most often. A manager inherits a senior IC from a previous org or a prior regime. The title says “Senior PM.” The track record shows ownership of a flagship product. The performance reviews say “strong.” And yet every hard call lands on the manager’s desk. Every stakeholder conflict escalates. Every ambiguous strategic moment gets met with “should I socialize this more?” The senior label was earned against a scope rubric. A judgment rubric was never applied.

MARK is the second measurement. It is not a replacement for the scope read: scope matters, breadth matters, delivery credibility matters. But without a judgment read alongside it, you’re running one instrument in the cockpit and calling it a complete read.

how to use MARK for this, end-to-end

Step 1 — get your own read first.

Before you calibrate ICs, get your own MARK fingerprint via the Brief. This isn’t optional ceremony — it’s credibility. When you sit down with a PM to discuss their Hold score, you want to be able to say “here’s where I’m strong and here’s where I’m working on it” without that feeling like deflection. A manager who treats MARK as something that happens to ICs, not to themselves, poisons the well immediately.

Step 2 — run each IC through the Skill Scan.

The Skill Scan at /skill-scan takes about fifteen minutes. Have each IC on your team complete it before you begin calibration conversations. The scan gives you an initial fingerprint across all twelve competencies — not a definitive read, but enough signal to know where to start the conversation and where you already have evidence from their work.

Step 3 — map the team shape, not just the individuals.

Before any 1:1, build the team shape view. List all twelve competencies down one axis. List your ICs across the other. Fill in the scan-derived levels per cell. Don’t aggregate to a single score per person — that collapses the shape information you need. What you’re looking for is the pattern across the team: which competencies are represented at L3–L4 across the board, and which are thin or missing?

Most teams we’ve seen land like this: a strong Map cluster (PMs who are good at picking what to build: Worth, Halt), a surprisingly strong Signal read within Acuity (reading data), and a noticeably thin Resolve cluster. Kill is often lower than managers expect, because sunk-cost attachment is pervasive. Hold and Power are where teams most frequently fall apart. The team can articulate a good call; they can’t hold it when the room pushes back.

Naming this shape at a leadership meeting — not as a performance problem, but as a team composition gap — changes how you think about hiring, coaching plans, and who runs which initiatives.
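The team shape view from Step 3 can be sketched as a small pivot. This is a minimal illustration, not part of MARK itself: the IC labels, the levels, and the helper names (team_shape, covered, thin) are all hypothetical, and only four of the twelve competencies appear. The L3 cutoff for "represented" follows the L3–L4 framing above and is an assumption you can change.

```python
# Hypothetical scan-derived levels (1-4). IC labels and numbers are invented
# for illustration; only four of the twelve competencies are shown.
team_levels = {
    "IC-A": {"Worth": 3, "Signal": 4, "Hold": 2, "Power": 1},
    "IC-B": {"Worth": 4, "Signal": 3, "Hold": 1, "Power": 2},
    "IC-C": {"Worth": 3, "Signal": 3, "Hold": 2, "Power": 1},
}

STRONG = 3  # assumption: treat L3 and above as "represented"


def team_shape(levels):
    """Pivot {ic: {competency: level}} into {competency: {ic: level}}."""
    shape = {}
    for ic, scores in levels.items():
        for competency, level in scores.items():
            shape.setdefault(competency, {})[ic] = level
    return shape


def covered(shape):
    """Competencies where every IC reads at L3 or above."""
    return [c for c, row in shape.items() if all(lvl >= STRONG for lvl in row.values())]


def thin(shape):
    """Competencies where no IC reads at L3 or above."""
    return [c for c, row in shape.items() if all(lvl < STRONG for lvl in row.values())]


shape = team_shape(team_levels)
for competency, row in shape.items():
    cells = "  ".join(f"{ic}:L{lvl}" for ic, lvl in row.items())
    print(f"{competency:<7} {cells}")
print("covered:", covered(shape))
print("thin:", thin(shape))
```

The sketch deliberately never computes a per-person total. Every summary runs along a competency row, which is the direction the team shape view is meant to be read.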

Step 4 — the 1:1 ritual: one competency per week, over twelve weeks.

The most effective calibration ritual is simple. Pick one competency per 1:1 and walk through it together, using the L1–L4 behavioral anchors from /framework as a shared language. Not a performance review. Not a rating. A conversation.

The structure looks like this:

1. Open the anchor for the chosen competency at the relevant level (say, Hold at L2).
2. Read the anchor aloud, or have the IC read it.
3. Ask: “In the last two months, what’s a call you made that you’d put in this territory?”
4. Bring two recent calls you’ve observed as evidence. Specific moments: a planning meeting, a stakeholder alignment, a sprint decision. Instances, not impressions.
5. Let the IC self-assess first. Then share what you observed. Note where the reads agree and where they diverge; the gap is often more instructive than the level.
6. Close with one question: “What would L3 have looked like in that specific moment?”

The twelve-week cycle covers all twelve competencies once. By week twelve, you have a calibrated fingerprint built from evidence, not from annual-review impressions. The fingerprint doesn’t go into a personnel file — it goes into the development plan.

Step 5 — separate the calibration from the rating.

This is the step most managers collapse, and it’s the step that kills trust fastest. The MARK fingerprint is evidence. It’s not the rating. When it comes time to write a performance review or make a promo case, you use the fingerprint as one input into your overall judgment — the same way you’d use a 360 or a project post-mortem.

The language distinction matters: “I’ve observed that your Hold anchors to L2 — you frame the call well but you tend to solicit one more round of input before locking — and I want to put that on a development plan” is a coaching conversation. “Your MARK score is 2.1 and that’s why you’re not getting promoted” is a misuse of the framework. MARK is for development. Promotion criteria are a separate conversation that MARK may inform.

Step 6 — update the team shape view at the end of each cycle.

After twelve weeks, run the Skill Scan again — or use the Brief results from that cycle if your ICs have been doing practice work. Update the team shape view. Ask: which cells changed? Which stayed flat? The cells that stayed flat despite active coaching are the ones where you need to decide whether this is a gap to develop or a gap to hire for.
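One way to answer "which cells changed, which stayed flat" is to diff two cycles cell by cell. The sketch below assumes you keep each cycle as a plain mapping of IC to competency levels and a set of the cells you actively coached. The data, the shape_diff helper, and the coached set are hypothetical and used only for illustration.

```python
# Hypothetical reads from two consecutive cycles (levels 1-4).
cycle_1 = {
    "IC-A": {"Hold": 1, "Signal": 3},
    "IC-B": {"Hold": 2, "Signal": 3},
}
cycle_2 = {
    "IC-A": {"Hold": 2, "Signal": 3},
    "IC-B": {"Hold": 2, "Signal": 4},
}

# Cells that were an explicit coaching target during the cycle (assumed input).
coached = {("IC-A", "Hold"), ("IC-B", "Hold")}


def shape_diff(before, after):
    """Return (moved, flat): moved maps a cell to (old, new); flat lists unchanged cells."""
    moved, flat = {}, []
    for ic, scores in before.items():
        for competency, old in scores.items():
            new = after.get(ic, {}).get(competency)
            if new is None:
                continue  # no second read for this cell, so nothing to compare
            if new != old:
                moved[(ic, competency)] = (old, new)
            else:
                flat.append((ic, competency))
    return moved, flat


moved, flat = shape_diff(cycle_1, cycle_2)
flat_despite_coaching = [cell for cell in flat if cell in coached]

print("moved:", moved)
print("flat despite coaching:", flat_despite_coaching)
```

The cells in flat_despite_coaching are the ones the develop-or-hire question applies to. The code only surfaces them; the decision stays with the manager.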

Step 7 — feed the team shape into your next hire profile.

Before writing a job description, look at the team shape view and ask: if I hire a clone of my current L3–L4 strengths, what’s the compounding failure mode? Most hiring processes already try to complement the skill weaknesses on the current team. MARK adds a second axis: which judgment competencies are thin at the team level?

what to do in week 1

Run the Skill Scan yourself. Get a fingerprint before you calibrate anyone else.
Have your ICs complete the Skill Scan before your next 1:1 cycle.
Build the team shape view — even a rough one, even from impressions — before you have scan data. The act of mapping forces useful questions.
Pick the competency for week 1’s 1:1 based on the gap you already know exists. Don’t start with Worth (it’s the most contested and ego-loaded). Start with Signal or Bet — they’re easier to anchor to specific calls without triggering defensiveness.
Block thirty minutes after each 1:1 to update the team shape view before the conversation fades.

what to expect by week 4

By week four, you’ve covered four competencies with each IC. You start to see which conversations land — where the IC has language for what they’re doing — and which ones produce blankness. The blankness is data. When an IC can’t recall a recent moment for a given competency, one of two things is true: either they’re not being asked to exercise it, or they’re exercising it unconsciously and can’t surface it yet. Both tell you something.

At the team level, you can start to say something like: “This team is strong on the front end of judgment — picking well, reading data accurately — and brittle on the back end. We make good calls; we don’t hold them.” That’s a sentence you couldn’t say before MARK gave you shared language. Now you can say it at a planning meeting without it becoming a personal criticism of anyone. The gap belongs to the team, not to an individual.

the five common pitfalls

1. Using MARK as a ranking system. The moment you aggregate fingerprints to “highest MARK” and “lowest MARK” across the team, you’ve broken the tool. It’s not a ranking instrument — it measures shape, not magnitude. A PM who is L4 Worth and L1 Hold is not “better” or “worse” than one who is L2 Worth and L3 Hold. They’re differently configured. Treating shape as a single scalar produces defensive ICs and gaming.

2. Sharing fingerprints without consent. Do not display an IC’s MARK fingerprint in team reviews, planning docs, or shared slides without their explicit agreement. The fingerprint belongs to the IC. You can reference patterns you’ve observed in calibration language (“this team is thin on Resolve”) without exposing individual reads.

3. Running the calibration ritual without evidence. The 1:1 ritual is only valuable if both the manager and the IC bring instances — specific calls, specific moments. “I generally feel like you’re pretty good at reading signals” is not a MARK calibration. It’s an impression. Impressions exist, they have value, but they’re not what the anchors are designed for.

4. Treating the scan as a performance review proxy. The Skill Scan and Brief results are self-assessments and evidence-based reads. They’re not official performance records. If they start to be treated as such — if ICs believe their scan score affects their comp or promo — they’ll optimize for scan performance rather than for actual judgment development.

5. Skipping the team shape view. The most common failure mode is calibrating each IC in isolation and never stepping back to ask what the aggregate pattern means. Individual development plans are useful. But if every IC’s plan says “develop Resolve,” that’s a structural signal about your team composition, your culture, or the initiatives you’re running — and fixing it one IC at a time is much slower than naming it as a team-level gap and addressing it in hiring and initiative design.
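Pitfall 1 can be made concrete with the two fingerprints it describes. The sketch below uses only the two competencies named there. The averaging function is the collapse the pitfall warns against, shown here purely to demonstrate what it hides; the helper names are hypothetical.

```python
from statistics import mean

# The two fingerprints from pitfall 1, reduced to the two competencies named there.
pm_one = {"Worth": 4, "Hold": 1}
pm_two = {"Worth": 2, "Hold": 3}


def scalar(fingerprint):
    """The collapsed single score the pitfall warns against."""
    return mean(fingerprint.values())


def shape_gap(a, b):
    """Per-competency difference (a minus b), which keeps the shape visible."""
    return {c: a[c] - b[c] for c in a if c in b}


print(scalar(pm_one), scalar(pm_two))
print(shape_gap(pm_one, pm_two))
```

Both averages come out to 2.5, so a ranking would call these two PMs identical. The per-competency gap shows they are opposites: one is two levels stronger on Worth, the other two levels stronger on Hold.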

a worked example

An Eng VP at a mid-stage B2B SaaS company — roughly seventy engineers, eight PMs, a product org that had scaled fast through a Series B — came to MARK because she couldn’t explain what felt off about her IC ladder. On paper, it looked right. Five senior PMs, three mid-level, all with solid delivery track records. Promotions felt earned. But she kept noticing the same thing: the senior ICs were the best people in the room at articulating a product position. Smart, crisp, well-prepared. Genuinely excellent at Signal and Reframe — reading conflicting evidence, reframing the problem before the team ran at the wrong thing. The Acuity cluster was legitimately strong.

When she ran the team shape view after the first Skill Scan cycle, she saw what she’d been sensing but couldn’t name. Five of the eight PMs anchored at L2 or lower on Hold. Three of the eight anchored at L1 on Power. The team was, as a unit, incapable of holding a hard call. Every decision that faced meaningful pushback eventually landed on her desk. Not because the ICs were lazy or conflict-averse, but because the team had been selected and rewarded for Acuity, not for Resolve. They were brilliant at arriving at a call. They weren’t trained, coached, or expected to hold it.

She restructured three development plans to explicitly target Hold and Power over a six-month horizon. The 1:1 ritual for those ICs started with the Resolve anchors, not the Acuity ones. She also changed which ICs she put into high-friction stakeholder moments — deliberately routing those toward the ICs who needed Hold development, rather than toward her naturally high-Resolve reports who’d always “handled it” before.

Six months later, one of her three targeted ICs had moved from Hold L1 to Hold L3 in the calibration read, evidenced by two specific planning moments where they’d held their call under exec pushback without escalating. The other two had moved to L2. Not transformed, but developed. That’s the correct horizon for judgment work.

The more durable outcome: she had language for a hiring rubric she’d never had before. Every subsequent PM hire got evaluated explicitly against Resolve. Not just “are they confident?” — the interview circuit included specific Hold and Power probe questions anchored to the L3 behavioral descriptor. The next three hires all landed at L3 Hold or above. The escalation pattern to her desk dropped measurably over the following year.

That’s what MARK does for team calibration. Not a rank. Not a grade. A fingerprint — multiplied across the team to reveal the shape gap you couldn’t name before you had the language.

citation

PL Standard v3.1 · using MARK for team calibration