
9-Box Talent Review: A Guide for Small Business


A practical guide for small business, with honest advice on when to skip it

The first time I sat through a 9-box talent review, it was at a 200-person company with three layers of HR. The calibration session ran for four hours, involved seven managers, and produced a wall-sized grid full of color-coded names. By the end, I was convinced the framework was indispensable. The second time was at a 25-person startup where the founder ran calibration alone in a coffee shop with a notebook. The grid he sketched was identical in structure but took 45 minutes and produced equally good decisions. The difference was not the framework; it was the scale at which the framework actually adds value.

Most 9-box articles are written for HR managers at mid-market companies of 200-2000 employees. They assume calibration committees, performance management software, and annual review cycles. None of that scales down cleanly to a 12-person business or a 30-person startup. The advice gets less useful as your team gets smaller, and at some point the framework starts subtracting value instead of adding it.

This guide is different. It is written for small business owners and operators running 5-50 person companies, with honest advice about when 9-box helps and when to use something simpler. I will explain the framework completely, show you when it actually fits at SMB scale, give you alternatives for when it does not, and walk through a calibration process you can actually run with two managers and a Friday afternoon. I built FirstHR for this audience because most performance management content assumes a sophistication small businesses neither have nor need.

TL;DR
9-box talent review is a 3x3 grid that plots employees on performance and potential to inform talent decisions. It works best at 25-50+ employees with multiple managers. Under 15 employees, skip the grid and run a simpler three-bucket review. At 15-30 employees, consider a performance-only review. The framework adds the most value when calibration sessions force managers to align on standards across teams. The labels matter less than the conversations the framework produces.
Why Talent Decisions Matter at SMB Scale
Disengagement and weak talent decisions cost the world economy trillions of dollars annually (Gallup). For small businesses, every wrong promotion, missed development opportunity, or delayed exit conversation is amplified because the team is small. Talent reviews matter not because the framework is special, but because the discipline of regular evaluation prevents drift.

What 9-Box Talent Review Actually Is

Definition
9-Box Talent Review
A 9-box talent review is a 3x3 grid used to evaluate employees on two axes: current performance (low, moderate, high) and future potential (limited, moderate, high). The grid produces nine cells, each representing a different talent profile. Managers use it during calibration sessions to align on talent decisions about promotions, development, and exits. The framework is most commonly used in mid-market and enterprise HR, with applications becoming more selective at small business scale.

The simple working description: 9-box is a forced ranking exercise dressed up as a talent review. The 3x3 grid forces managers to place each person in one of nine cells, which forces conversations that managers would otherwise avoid. The value is not in the grid itself; it is in the discussions the grid produces during calibration. Without those discussions, the grid is just a colorful dashboard.

Three things are true about every successful 9-box implementation. First, multiple managers calibrate together. The framework loses most of its value if only one person rates everyone. Second, the placements drive specific actions: development plans, promotions, exits, retention bonuses. A grid that produces no actions is decoration. Third, the framework is part of a broader performance management system, not a substitute for it. 9-box without ongoing feedback and goal-setting is theatrical.

Most small businesses that adopt 9-box fail one of those three tests. They have one manager (the founder) doing the rating, no follow-through on placements, and no broader performance system. The grid becomes a Friday afternoon exercise that everyone forgets by Monday. The framework gets blamed for being outdated when the actual problem is that it was implemented in the wrong context.

Where 9-Box Came From and Why That Matters

The 9-box was developed by McKinsey for General Electric in the 1970s, originally as a strategic planning tool for evaluating business units. GE used it to decide which divisions to invest in, divest, or hold. Years later, the same structure was adapted for talent: replace business units with people, replace strategic position with performance and potential, and you have the modern 9-box talent review.

This origin matters for two reasons. First, the framework was designed for organizations with significant scale and complexity. The original GE was a 400,000-employee conglomerate with dozens of business units; the talent version inherited the same scale assumptions. Second, the framework was designed for hierarchical environments where managers have clear authority over their direct reports and calibration committees can produce binding decisions. Modern flat organizations and small businesses fit this model less cleanly than 1970s industrial conglomerates.

The framework has aged unevenly. The structure (3x3 grid, two axes, calibration process) remains valid. The cultural assumptions (annual ratings, top-down talent decisions, ranking employees publicly) have been undermined by the broader shift toward continuous performance management. Many large companies that used 9-box for decades, including GE itself, have moved away from annual rating systems entirely. Adobe abandoned annual reviews. Accenture eliminated rankings. The trend is real and ongoing.

For small businesses, this history suggests caution. The framework was not designed for you. It was designed for organizations with very different structures. Some elements transfer well, others do not. The honest evaluation is which parts of the framework solve real problems at your scale and which parts add overhead designed for a different context.

The 9-Box Grid Explained

The grid is straightforward. Two axes, three levels each, nine cells. Performance runs left to right (low, moderate, high). Potential runs bottom to top (limited, moderate, high). Each cell has a label that suggests the appropriate management response. Below is the standard layout, though specific labels vary by company.

The 9-box grid: performance × potential. Rows are potential (top to bottom: high, moderate, limited); columns are performance (left to right: low, moderate, high).

High potential
Inconsistent Star (low performance): high potential, low performance. Coach or reassign.
Future Star (moderate performance): high potential, solid performance. Stretch and develop.
Star (high performance): high potential, high performance. Promote, retain, succession plan.

Moderate potential
Question Mark (low performance): moderate potential, low performance. Performance plan or exit.
Core Player (moderate performance): solid in current role. The backbone of the team.
High Performer (high performance): strong contributor. Reward, but limited stretch capacity.

Limited potential
Risk (low performance): limited potential, low performance. Often an exit conversation.
Reliable Performer (moderate performance): limited potential, solid in current role. Retain, do not promote.
Specialist (high performance): limited potential, high performance in current role. Pay well, do not move.

The labels are useful as shorthand for management discussions but should never be used in employee-facing conversations. Telling someone they are a "Question Mark" or a "Risk" is demoralizing and rarely actionable. The labels exist to help managers align on talent decisions, not to give feedback to employees. The translation from grid placement to individual conversation is what separates effective 9-box implementations from harmful ones.

Three patterns from this layout worth noticing. First, the top-right cells (Star, Future Star, High Performer) are where performance and potential are both strong. These are the people you actively retain and develop. Second, the bottom-left (Risk) is the only cell where exit conversations are the default response. Third, the diagonal of the grid (Specialist, Core Player, Inconsistent Star) covers the largest population of employees and requires the most thoughtful management; these are the people whose trajectory depends most on what you do next.
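For readers who want the structure in executable form, the grid is just a lookup from a (performance, potential) pair to a label and a default management response. This is an illustrative sketch, not part of any standard implementation; the labels follow the layout described above, and the "Retain and support" action for Core Player is a reasonable paraphrase rather than a canonical phrase.

```python
# Minimal sketch of the 9-box as a lookup table: a (performance, potential)
# pair maps to a cell label and a default management action. Labels follow
# the layout in this guide; specific wording varies by company.

GRID = {
    ("low", "high"): ("Inconsistent Star", "Coach or reassign"),
    ("moderate", "high"): ("Future Star", "Stretch and develop"),
    ("high", "high"): ("Star", "Promote, retain, succession plan"),
    ("low", "moderate"): ("Question Mark", "Performance plan or exit"),
    ("moderate", "moderate"): ("Core Player", "Retain and support"),
    ("high", "moderate"): ("High Performer", "Reward; limited stretch"),
    ("low", "limited"): ("Risk", "Exit conversation"),
    ("moderate", "limited"): ("Reliable Performer", "Retain, do not promote"),
    ("high", "limited"): ("Specialist", "Pay well, do not move"),
}


def place(performance: str, potential: str) -> tuple[str, str]:
    """Return the (label, default action) for one employee's two ratings."""
    return GRID[(performance, potential)]
```

The point the sketch makes: the grid itself is trivial. Everything difficult about 9-box lives in how the two input ratings get produced and calibrated, not in the lookup.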

The Performance Axis: What to Actually Rate

Performance is the easier of the two axes to rate because it is observable. The challenge is keeping the rating focused on outcomes rather than personality, effort, or proximity to leadership. Most performance rating errors come from confusing "I like working with this person" with "this person produces results." The performance metrics guide covers measurement frameworks that produce evidence usable in calibration.

Low performance
What it looks like: consistently misses goals, requires significant manager time, deliverables fall behind schedule, customer or team complaints.
Common mistake: confusing low performance with personality conflicts, rating someone low because they are difficult rather than because they fail to deliver.

Moderate performance
What it looks like: meets most goals, occasionally exceeds, requires normal manager support, delivers on time, no major issues.
Common mistake: inflating moderate to high because the person is well-liked. The default rating should be moderate; high requires evidence.

High performance
What it looks like: consistently exceeds goals, needs little manager support, delivers ahead of schedule, drives outcomes others cannot.
Common mistake: rating long-tenure employees as high by default. Tenure is not performance; the same evidence standard applies regardless of how long someone has been there.

The single most useful test for performance ratings: can you name three specific outcomes from the last six months that justify the rating? If yes, the rating is grounded. If you have to think for two minutes, the rating is probably based on impressions, not evidence. SHRM's performance management toolkit covers the broader principles that apply to all performance evaluation, not just 9-box.

For small businesses, the performance ratings should come directly from the regular performance review process. The performance review guide covers how to structure those reviews to produce evidence usable in calibration. Without solid performance reviews underneath, 9-box ratings become subjective and political.

The Potential Axis: Where Most Calibrations Go Wrong

Potential is the harder axis because it is a judgment about the future, not a description of the past. Performance you can measure; potential you have to predict. This is where calibration sessions get political, biases enter, and managers disagree most.

Definition
Potential (in 9-box context)
Potential is a judgment about whether an employee can grow into a significantly larger role given the right opportunities, support, and time. It is not personality, not likeability, not how well they fit the current team. It is the capacity to take on more scope, complexity, or responsibility than their current role demands.

The most common potential rating mistake at small businesses: confusing "has not been given a stretch" with "limited potential." Many people rated as low potential are actually in roles that do not stretch them, with managers who have not given them larger work. They look low potential because the evidence is missing, not because the capacity is missing. Honest calibration forces this distinction.

Limited potential
What it looks like: strong in current role, has not shown ability to handle larger scope, may have explicitly chosen depth over breadth; often a specialist or technical expert.
Common mistake: rating someone limited just because they have stayed in the same role for years. Some people have potential but no opportunity to display it.

Moderate potential
What it looks like: could grow into a slightly larger role with development, may eventually run a small team or take on more complex projects, comfortable with measured stretch.
Common mistake: defaulting most of the team to moderate to avoid hard conversations. Be specific about what 'moderate' actually means for your business.

High potential
What it looks like: demonstrably capable of running 2-3 levels above the current role, takes on stretch work without prompting, others naturally follow them, learns rapidly in new domains.
Common mistake: confusing high potential with high visibility. The loudest person in the room is not necessarily high potential; sometimes they are just loud.

For small businesses, an additional complication: the higher levels often do not exist yet. At a 15-person company, "running 2-3 levels above current role" might mean running a 3-person team that does not exist yet. Rating someone "high potential" means betting they could fill a role you have not yet created. This is appropriate forward-looking thinking, but it requires honesty about what those future roles actually look like.

The Bias Tax on Potential Ratings
Potential ratings are where bias enters most easily. Studies consistently show that managers rate employees who look like them or share their background as higher potential, all else equal. Counteract this with structured criteria, multiple-rater calibration, and explicit checks for demographic patterns in your high-potential pool. The EEOC small business resources cover anti-discrimination requirements that apply to all employment decisions, including talent ratings.

When 9-Box Actually Fits at Small Business Scale

The honest answer: 9-box adds value when you have enough complexity to justify the framework. Below that threshold, it adds overhead. Above it, it adds clarity. Knowing where your business sits on this spectrum is more important than running the framework correctly.

When 9-box fits
25+ employees, multiple managers
Active succession planning need
Performance and potential are both visible
Managers calibrate together regularly
Promotion and development decisions are pending
The framework adds structure to a real decision
When 9-box does not fit
Under 15 employees, founder-led
No succession planning need yet
Potential is hard to assess at small scale
One manager (the founder) for everyone
The framework adds overhead without clarity
A 1:1 conversation does the same job

The threshold is not just headcount; it is structural. A 40-person company with one founder making all talent decisions does not need 9-box because there is no calibration to do. A 25-person company with three managers calibrating across teams does, because the calibration prevents the inconsistencies that emerge when managers have different standards. The right question is not "how big are we?" but "how many people are making talent decisions, and how aligned are they?"

Other signals that 9-box might fit your business: you are about to make several promotion decisions and want them grounded in shared criteria, you are noticing inconsistencies between how different managers rate similar employees, succession planning has become a real issue (a key person might leave, and you have not identified their backup), or you have started conversations about layoffs and want to ensure the criteria are documented and consistent.

When to Skip 9-Box Entirely

The harder advice: most small businesses should not run 9-box. Below 15 employees, the framework adds no information the founder does not already have. Between 15 and 25 employees, simpler alternatives usually serve better. The 9-box becomes genuinely valuable only when the conditions above (multiple managers, real succession needs, calibration culture) are present.

Three honest tests to determine if you should skip 9-box. First: when you imagine running calibration, who is in the room? If the answer is "just me," calibration cannot happen because there is nothing to calibrate against. Skip the framework. Second: when was the last time you made a talent decision you regretted because it was inconsistent with another decision? If you cannot remember one, the framework is solving a problem you do not have. Third: do you have time, six months from now, to actually act on the placements? If not, the calibration becomes an exercise without consequences, which produces cynicism, not improvement.

The Overhead Problem
At 12 employees, running a 9-box calibration with one manager (the founder) takes 8-10 hours total: rating preparation, the calibration session that has nothing to calibrate, documentation, individual conversations. That is a full work day for output the founder could have produced in 30 minutes of thinking and a notebook. The framework adds overhead without adding clarity. Recognizing this is harder than running the framework anyway because skipping feels less professional. It is also the right call.
What worked for me
At one of my early companies, around 18 employees, I tried to run 9-box because it seemed like the "real HR" thing to do. Spent a Friday afternoon mapping everyone, assigning labels, and writing development plans. Two months later, I could not remember which box anyone was in. The grid had not informed any actual decision; the actual decisions came from 1:1 conversations and instinct. What I needed was not a grid; I needed a more disciplined cadence of 1:1s and clearer performance expectations. We ditched the grid, formalized monthly 1:1s, and the team performed better in the next 6 months than in the previous 12. The framework was not the lever.

Simpler Alternatives That Often Work Better

Five alternatives to 9-box, each calibrated for different business sizes and situations. Most small businesses are better served by one of these than by a full 9-box implementation. The right choice depends on your size, your succession needs, and the maturity of your performance management practice.

Three-bucket talent review (best for under 15 employees): sort the team into thriving, steady, struggling. Develop accordingly. No grid required.
Performance-only review (best for 15-30 employees with no succession need yet): rate performance only (high/mid/low). Skip potential. Decide on raises, development, and exits based on performance alone.
Full 9-box grid (best for 30-50+ employees with multiple managers): standard 3x3 calibration with all managers. Use for succession and stretch opportunities.
Skills-based assessment (any size, technical roles): replace the performance/potential axes with a skill matrix. Identify gaps and learning paths individually.
1:1 development conversation (any size, a complement to the other tools): quarterly conversation between manager and employee about goals, growth, and obstacles. The atomic unit.

The progression that works for most small businesses: start with the three-bucket talent review at very small scale, graduate to performance-only review as the team grows past 15 people, and consider 9-box only if you reach 25-30 employees with multiple managers and real succession decisions. Skipping ahead is the most common mistake; running a 9-box at 12 employees because the standard HR article recommends it is putting structure on a problem you do not have.

For more on the underlying performance management practice that makes any of these alternatives work, the performance management guide covers the broader system.

Goals and objectives feed the performance axis directly. The OKR guide covers a goal-setting framework that produces the kind of measurable outcomes calibration needs.

The 3-Bucket Talent Review for Under 25 Employees

If you are under 25 employees, here is the talent review process I actually recommend. It takes 2-3 hours, requires no software, and produces decisions you will use. The point is not to skip the discipline of evaluating talent; it is to skip the overhead of a framework designed for a different context.

Step 1: List your team and recent outcomes. For each person, write 2-3 lines: what they delivered in the last 6 months, what they did not deliver, and one thing that surprised you. This is the raw material for the conversation. 30-45 minutes total.

Step 2: Sort into three groups, not nine. At small scale, three groups (thriving, steady, struggling) capture what nine groups try to capture. Skip the grid. Sort honestly. The act of sorting is more valuable than where any individual ends up.

Step 3: Identify development moves for the thriving group. For each thriving person, name one stretch opportunity in the next 90 days. It could be a project lead, a new responsibility, or a promotion conversation. The goal is to keep them growing before they look elsewhere.

Step 4: Identify support needs for the steady group. For each steady person, name what they need to keep performing well. Often it is recognition, a small raise, or a 1:1 cadence change. Steady performers are the backbone; do not take them for granted.

Step 5: Decide on the struggling group. For each struggling person, ask: what specific change in the next 90 days would put them in the steady group? If you have an answer, that is a performance plan. If you do not, it is probably an exit conversation. Either way, decide and move.

Three patterns from this process worth noticing. First, the sorting is honest, not exhaustive. Three buckets force you to make calls that nine cells let you avoid. The discomfort of putting someone in "struggling" is exactly the discomfort the framework is supposed to surface. Second, every bucket produces a specific action. Thriving group gets stretch opportunities, steady group gets retention investment, struggling group gets a 90-day decision. Without actions, the sorting is theater. Third, the cadence is annual or semi-annual; quarterly is overkill at small scale and produces noise instead of signal.

The single biggest advantage of three buckets over nine: at small scale, you can hold three categories in your head and act on them. You cannot hold nine. The framework is only valuable if managers can actually use it in real decisions, and at SMB scale, three is the right level of compression.
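The three-bucket process above can be sketched as a tiny program. This is illustrative only; the bucket names come from this guide, while the function and field names are invented for the example. The sketch encodes the one rule that matters: every bucket must map to a concrete follow-up, and an unrecognized bucket is an error rather than a silent default.

```python
# Sketch of the three-bucket review. Each bucket carries the required
# follow-up from the guide, so sorting someone is inseparable from deciding
# what happens next. Names and structure are illustrative.

ACTIONS = {
    "thriving": "name one stretch opportunity for the next 90 days",
    "steady": "name the retention investment (recognition, raise, 1:1 cadence)",
    "struggling": "write a 90-day performance plan or start an exit conversation",
}


def review(team: dict[str, str]) -> dict[str, list[str]]:
    """Group people by bucket; reject any bucket outside the three allowed."""
    groups: dict[str, list[str]] = {bucket: [] for bucket in ACTIONS}
    for person, bucket in team.items():
        if bucket not in ACTIONS:
            raise ValueError(f"{person}: bucket must be one of {sorted(ACTIONS)}")
        groups[bucket].append(person)
    return groups
```

Usage looks like `review({"Ana": "thriving", "Ben": "steady"})`, after which each non-empty group gets the action listed in `ACTIONS`. Three categories, three actions: that is the whole system.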

Running an Actual 9-Box Calibration (If You Decide To)

If you are at 25-50+ employees with multiple managers and have decided to run 9-box, here is the practical process. This assumes you have at least 2 managers calibrating together and a senior leader (founder or COO) facilitating.

Step 1: Each manager rates their own team independently. Before the calibration meeting, every manager places their direct reports in the 9-box grid and writes 2-3 sentences justifying each placement based on observable outcomes. No discussion yet.

Step 2: The senior team meets to compare ratings. All managers meet for a 90-120 minute calibration session. Going person by person, each manager presents their rating, and the other managers push back with specific evidence. The goal is consistency across managers, not consensus.

Step 3: Adjust ratings based on cross-manager input. When two managers rate similar people differently, surface the inconsistency. Often one manager is being too generous or too strict. Adjust ratings until the grid reflects shared standards across the team.

Step 4: Document final placements and decisions. Lock in the final grid. For each person, document the box they are in, the reasoning, the development action for the next 6 months, and who is accountable for it. Without documentation, the calibration becomes a meeting nobody acts on.

Step 5: Translate placements into individual conversations. Managers go back to direct reports with specific feedback: behaviors observed, outcomes expected, development plans. The conversations should NOT mention the grid or the box label. They should sound like normal performance conversations, informed by the calibration but not framed by it.

Three failure modes to avoid during calibration. First, do not let the loudest voice dominate. The point of calibration is comparing across managers, not deferring to whoever speaks most. Use structured rounds where each manager presents their team in order, and others must produce specific evidence to challenge ratings. Second, do not skip Step 5 (the individual conversations). Calibration produces the framework; individual conversations produce the actual change. Skipping the conversations turns calibration into an exercise. Third, do not promise specific outcomes during calibration. The placements suggest direction, not guarantees. Telling a manager "your person is going to be promoted" based on a Star placement creates obligations that may not materialize when the actual promotion conversation happens.

Gallup research on managers consistently finds that the manager-employee relationship is the strongest predictor of engagement. Calibration sessions amplify this: well-run calibration produces consistent management quality across the company, while poorly-run calibration produces visible inequities that employees notice within months. The investment in doing calibration well pays off through retention.

What Happens After Calibration Matters Most

Most 9-box implementations fail not in the calibration session but in the weeks after. The grid gets locked into a presentation deck, the senior team feels accomplished, and nothing actually changes. The framework only adds value if calibration outcomes drive specific actions over the following 90 days.

Star (high performance, high potential)
What should happen in 90 days: stretch project assignment, succession discussion, retention check-in.
What usually happens: praise but no concrete next step. The person leaves 6-12 months later.

Future Star (moderate performance, high potential)
What should happen in 90 days: specific development plan, increased scope, manager mentorship.
What usually happens: vague encouragement. Performance does not improve.

Inconsistent Star (low performance, high potential)
What should happen in 90 days: targeted coaching, a role fit conversation, possibly a role change.
What usually happens: indefinite tolerance. The manager hopes performance improves on its own.

Question Mark (low performance, moderate potential)
What should happen in 90 days: a performance improvement plan with clear 90-day metrics.
What usually happens: a soft conversation. The plan never gets written. Same review next year.

Risk (low performance, limited potential)
What should happen in 90 days: a performance plan or exit conversation is initiated.
What usually happens: avoidance. The person stays, drains team morale, and eventually quits or is laid off.

The pattern across these outcomes: the right action is harder than the wrong action in every case. That is why follow-through is the hardest part of 9-box and why most implementations fail there. The grid produces clarity about what should happen; what actually happens depends on whether the senior team has the discipline to act on the clarity.

For implementing performance improvement plans specifically, the PIP guide covers when and how to use them. For cases where the calibration suggests an exit conversation, the discipline of doing it well rather than avoiding it is covered in broader people management resources.

Why Follow-Through Matters
Organizations with strong performance management discipline see significantly higher retention and productivity than those with weak follow-through (SHRM). For small businesses, this matters more, not less, because every retained high performer represents a larger fraction of the team than at enterprise scale.

9-Box vs Performance Improvement Plan: Different Tools

One source of confusion: 9-box and performance improvement plans (PIPs) are sometimes treated as alternatives. They are not. They serve different purposes and operate at different scales.

Scope: 9-box covers the whole team or company; a PIP covers one individual.
Purpose: 9-box aligns managers on relative talent positions; a PIP documents underperformance and required improvement.
Cadence: 9-box runs annually or semi-annually; a PIP is triggered by sustained underperformance.
Audience: 9-box is management only; a PIP involves the manager and employee, with HR involvement.
Outcome: 9-box drives talent decisions across the company; a PIP ends in improvement or termination of one person.
Documentation: 9-box produces internal management notes; a PIP is a formal HR document, often legally significant.

The relationship: 9-box can identify who needs a PIP (typically people in the Question Mark or Risk boxes), but the PIP itself is a separate process with its own structure and legal implications. Running 9-box does not eliminate the need for proper PIPs when underperformance becomes an exit-track issue. Conflating the two creates documentation problems and exposes the business to wrongful termination risk.

For the full PIP process and when to use it, see the PIP guide.

9-Box vs 360 Feedback: Complementary, Not Competing

Another common confusion: 9-box and 360 feedback are sometimes seen as alternatives. They serve different purposes and work better together than either alone.

What it measures: 9-box produces performance and potential ratings; 360 gathers behavioral feedback from multiple perspectives.
Information source: 9-box uses manager(s) only; 360 draws on peers, manager, direct reports, sometimes external contacts.
Output: 9-box yields grid placements and talent decisions; 360 surfaces behavioral patterns, blind spots, and development priorities.
Best for: 9-box suits talent management decisions; 360 suits individual development conversations.
Cadence: 9-box runs annually or semi-annually; 360 runs annually or quarterly.
Audience: 9-box goes to senior management; 360 goes to the individual employee and their manager.

The relationship: 360 feedback feeds the 9-box. The behavioral data from 360 reviews informs both performance and potential ratings, especially for the potential axis where managers have less direct evidence. A well-run performance management system uses 360 data as input to calibration, then translates calibration outcomes into individual development plans informed by the same 360 data. The 360 feedback guide covers the practice in depth.

For the structure of the actual review session itself, including question design and synthesis of feedback, the 360 review guide walks through the mechanics.

For small businesses, this combination is often more powerful than 9-box alone. 360 feedback is feasible at any size and produces information that matters regardless of whether you formally calibrate. 9-box adds value once calibration becomes a real exercise, which only happens at certain scales.

Common Mistakes That Make 9-Box Backfire

The mistakes below appear consistently across small businesses implementing 9-box for the first time. All are avoidable once you understand the underlying patterns.

Running 9-box on a 12-person team. At small scale, the founder already knows who is performing well and who has potential. Adding a grid does not surface new information; it just makes the existing knowledge feel more formal. Skip the framework, run a 30-minute conversation per person, and document the outcomes.

Confusing performance with personality. 9-box performance ratings should be based on observable outcomes (goals met, customers retained, deliverables shipped). Likeability, communication style, and meeting attendance are not performance. Start every calibration with: 'What did this person actually do or produce?'

Treating potential as a fixed trait. Potential is not a personality test. It is a judgment about whether someone can grow into a larger role given the right opportunities. Many 'low potential' ratings are actually 'we have not given them the stretch' ratings. Be honest about which is which before placing someone in a box.

Letting the loudest manager dominate calibration. 9-box calibration sessions turn political when one manager has more credibility or volume than the others. The point of calibration is to compare across managers and surface inconsistencies. If one voice dominates, the calibration is fake. Force structured rounds where each manager justifies each rating.

Showing employees their box. Employees should not see their position on the grid. The grid is a management tool for talent decisions, not a feedback document. Telling someone they are a 'Question Mark' is demoralizing and rarely actionable. Translate grid placement into normal feedback: 'Here are the 3 behaviors we need to see for promotion.'

Running it once and never again. 9-box is only useful if calibration happens regularly (annually at minimum, ideally semi-annually). A one-time placement becomes outdated within months. Either commit to the cadence or do not start. A stale grid is worse than no grid because it gives the team false confidence in old information.

Using the labels in conversation. Words like 'Star,' 'Risk,' and 'Question Mark' are useful for management discussions but toxic when leaked. Treat the labels as internal shorthand. In actual conversations with employees, talk about specific behaviors, outcomes, and growth opportunities, never about box labels.

Overcomplicating the grid with sub-categories. Some companies extend 9-box into 16-box or 25-box variants with sub-ratings within each cell. This is overengineering at any scale, and especially at SMB scale. The 3x3 grid was already a simplification of more complex frameworks; adding granularity back does not fix its limitations.

The pattern across these mistakes: treating 9-box as a deliverable rather than as a discipline. A deliverable gets produced once and filed away. A discipline gets practiced regularly and improves over time. 9-box that produces a grid but no consistent calibration practice, no follow-through, and no annual cadence is theater. The work is in the process, not the artifact.

Honest Criticisms of the 9-Box Framework

The 9-box has real flaws that any honest treatment should acknowledge. Glossing over them produces implementations that fail in the same ways every framework critic has predicted for decades.

Forced ranking creates artificial scarcity. The 9-box pushes managers to distribute employees across cells, even when the actual distribution would be more clustered. If everyone on your team is genuinely a Star, the framework forces you to demote some of them to fit the grid. This is bias by design, not a feature.

Potential is partly a self-fulfilling prophecy. Once someone is placed in "high potential," they get stretch opportunities, executive visibility, and development investment. The investment makes them more capable, which confirms the high potential rating. Conversely, "limited potential" placements often become permanent because the placement removes the opportunities that would prove them wrong. The framework can ossify talent assessments rather than illuminate them.

The bottom-row labels are dehumanizing. "Risk," "Reliable Performer," "Specialist" sound clinical but reduce people to instrumental categories. The framework does not require you to think of employees as boxes, but the language pushes in that direction over time. Managers who use 9-box for years often start describing colleagues by their box label instead of by their actual contributions.

Bias has many entry points. Performance ratings can be influenced by likeability, proximity to leadership, or demographic patterns. Potential ratings are even more vulnerable. Without explicit anti-bias procedures, the calibration session can amplify existing biases rather than correct them. The framework offers no built-in protection against this.

The annual cadence has been rendered obsolete by continuous performance management. The same companies that pioneered 9-box (GE, McKinsey clients) have largely abandoned annual rating cycles in favor of ongoing feedback. The 9-box assumes a cadence that the organizations using it have outgrown. Small businesses adopting 9-box now are picking up a framework already past its peak. Work Institute research on retention consistently shows that ongoing manager-employee feedback predicts retention better than periodic ratings, regardless of which framework produces the ratings.

None of these criticisms means 9-box is useless. They mean it should be implemented with eyes open and attention to known failure modes. The framework that ignores its own weaknesses produces predictable problems.

When to Stop Using 9-Box

The framework should evolve as the business evolves. Three legitimate signals to stop using 9-box:

  1. The calibration sessions stop producing new information. If three years of calibration produces the same placements with no surprises, the framework is no longer doing the work. Replace with continuous performance management.
  2. The team is small enough that calibration is theater. If the company shrinks or restructures back below 15 employees, 9-box adds overhead without adding clarity. Switch to a three-bucket review.
  3. The cadence has become decorative. If calibration happens but placements never drive action, the framework has become ritual. Either restore the discipline or replace with something simpler that you will actually use.

Three illegitimate reasons to stop:

  1. The placements are uncomfortable. Real calibration produces uncomfortable conclusions. Abandoning the framework to avoid the discomfort is fraud, not strategy.
  2. A manager wants to. Single-manager preference is not enough. Validate against business needs first.
  3. It feels old-fashioned. Frameworks do not become useless because they have been around for decades. Some elements of 9-box remain valuable; abandoning the whole framework because of fashion is not better thinking, it is just different thinking.

The healthy lifecycle: adopt 9-box when conditions justify it, run it consistently for 3-5 years, then evaluate whether it still serves the business. Most small businesses that genuinely need 9-box reach a stage where they no longer need it within 5-7 years, either because they have grown into something more sophisticated or because they have refined their continuous performance practice enough to make annual ratings redundant. The HR strategy guide covers how performance management practices fit into broader people operations strategy.

How FirstHR Fits

The honest disclosure: FirstHR is not a performance management or talent review platform. We do not currently have a 9-box module, calibration software, or talent management features. The platform handles onboarding, employee profiles, document management, org charts, and the operational HR foundations that most small businesses need. 9-box and talent reviews, when you adopt them, will live in your spreadsheet, your Notion page, or eventually in dedicated performance management software.

That said, talent reviews work better when the underlying people operations are working. A team running 9-box on top of broken onboarding will struggle no matter how perfect the calibration. A team running talent reviews with consistent onboarding, clear roles, and structured feedback will produce reviews that actually inform decisions. FirstHR exists to handle the operational HR foundation at flat-fee pricing ($98/month for up to 10 employees, $198/month for up to 50), so that owners and operators can focus on the higher-impact work of running good talent reviews and acting on the outcomes.

For the practice that sits underneath good talent management, the onboarding best practices guide covers the foundation that determines who shows up to be evaluated.

Whether 9-box outcomes translate into actual changes depends heavily on manager skill. The leadership development guide covers the manager skills that make or break any talent framework.

For the broader management foundation that performance reviews and calibration sit on top of, the people management guide covers running a small team without enterprise overhead.

Key Takeaways
9-box talent review is a 3x3 grid plotting employees on performance (low/moderate/high) and potential (limited/moderate/high) to inform talent decisions.
The framework was designed by McKinsey for GE in the 1970s. Most assumptions transfer poorly to small business; the structure (grid, calibration) transfers, the cadence and culture often do not.
9-box adds value at 25-50+ employees with multiple managers and real succession needs. Below 15 employees, simpler alternatives (three-bucket review) work better.
Performance ratings should be based on observable outcomes (goals met, deliverables shipped). Potential ratings are predictions about future capacity, and they are where most calibration disagreements happen.
Multiple managers must calibrate together, or the framework loses most of its value. Single-manager 9-box at SMB scale is theater.
Never share box labels with employees. The grid is a management tool; conversations should focus on specific behaviors and outcomes, not labels like 'Star' or 'Risk.'
Most failures happen in the 90 days after calibration, when placements should drive action but do not. Follow-through is the hardest part.
Skip 9-box entirely if you are under 15 employees or have only one manager making talent decisions. The framework adds overhead without clarity in those contexts.

Frequently Asked Questions

What is the 9-box talent review?

The 9-box talent review is a 3x3 grid that plots employees on two axes: current performance (low, moderate, high) and future potential (limited, moderate, high). The grid produces nine cells, each representing a different type of talent (Star, Future Star, Core Player, Specialist, Risk, etc.). Managers use it during calibration sessions to align on talent decisions: who to promote, who to develop, who to manage out. Originally developed by McKinsey for General Electric in the 1970s, it remains common in mid-market and enterprise companies.

Should small businesses use the 9-box?

Sometimes. The framework adds value when you have 25+ employees, multiple managers, an active succession planning need, and a culture of regular calibration. Below 15 employees, the framework usually adds overhead without surfacing new information; the founder already knows who is performing well and who has potential. The honest answer for businesses under 15 employees is to skip the grid and run a simpler three-bucket review instead. For 15-50 employees, it depends on whether you have multiple managers and real succession decisions to make.

What are the 9 boxes?

The 9 boxes combine three performance levels with three potential levels. Common labels: Star (high performance, high potential), Future Star (moderate performance, high potential), High Performer (high performance, moderate potential), Core Player (moderate on both), Specialist (high performance, limited potential), Reliable Performer (moderate performance, limited potential), Question Mark (low performance, moderate potential), Inconsistent Star (low performance, high potential), and Risk (low on both). Different companies use different labels; the underlying structure is the same.
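If you track placements in a spreadsheet export or a small script rather than dedicated software, the cell structure above maps naturally to a lookup table. A minimal Python sketch, using this article's labels (rename to match your own conventions):

```python
# Hypothetical lookup table for the nine cells described above.
# Keys are (performance, potential) pairs; labels follow this article.
NINE_BOX = {
    ("high", "high"): "Star",
    ("moderate", "high"): "Future Star",
    ("low", "high"): "Inconsistent Star",
    ("high", "moderate"): "High Performer",
    ("moderate", "moderate"): "Core Player",
    ("low", "moderate"): "Question Mark",
    ("high", "limited"): "Specialist",
    ("moderate", "limited"): "Reliable Performer",
    ("low", "limited"): "Risk",
}

def box_label(performance: str, potential: str) -> str:
    """Return the cell label for a performance/potential pair."""
    return NINE_BOX[(performance, potential)]
```

For example, `box_label("high", "limited")` returns "Specialist". The point of the table is not automation; it is that three performance levels times three potential levels exhaustively cover the grid, so every person gets exactly one cell.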

How do you do a 9-box talent review?

The standard process: each manager rates their direct reports independently, then all managers meet for a calibration session to compare ratings and adjust for inconsistency. Final placements are documented along with development actions for each person. Managers then translate the calibration into individual conversations, which should never mention the grid or box labels directly. The full session for a team of 30-40 takes 90-120 minutes. Done annually or semi-annually.
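One practical pre-meeting check, assuming you collect each manager's independent ratings in a simple file: compare rating distributions across managers before calibration to spot obviously lenient or harsh graders. A hedged sketch (function names and the 0.7 threshold are illustrative, not from any particular tool):

```python
from collections import Counter

def rating_distribution(ratings):
    """ratings: list of (employee, performance) pairs from one manager.
    Returns the share of that manager's reports at each performance level."""
    counts = Counter(perf for _, perf in ratings)
    total = len(ratings)
    return {level: counts.get(level, 0) / total
            for level in ("low", "moderate", "high")}

def flag_lenient_managers(by_manager, threshold=0.7):
    """Flag managers who rate more than `threshold` of reports 'high'.
    by_manager: dict mapping manager name -> list of (employee, performance)."""
    return [name for name, ratings in by_manager.items()
            if rating_distribution(ratings)["high"] > threshold]
```

A flagged manager is not necessarily wrong; the check just tells you where the calibration conversation should spend its time.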

What is the difference between performance and potential in 9-box?

Performance is what someone has actually delivered in their current role: goals met, outcomes produced, customers retained. It is observable and measurable. Potential is a judgment about whether someone could grow into a larger role with the right opportunities. Potential is harder to assess and more subjective, which is why it is the axis where most calibration disagreements happen. The two are independent: a high performer can have limited potential (they are great at their current job, not built for the next one), and a low performer can have high potential (they are in the wrong role).

Should employees know their 9-box rating?

No. Employees should not see their position on the grid. The grid is a management tool for talent decisions, not a feedback document. Telling someone they are a 'Question Mark' or 'Risk' is demoralizing and rarely actionable. Translate grid placement into normal performance feedback: specific behaviors observed, outcomes expected, development plans. The grid informs the conversation but should not frame it. Managers who tell employees their box label undermine the credibility of the calibration process.

Is the 9-box still relevant in 2026?

Partially. The framework remains widely used in mid-market and enterprise HR, but it has lost ground to alternatives that focus on continuous performance management rather than annual ratings. The trend among large companies (Accenture, Adobe, GE itself) has been to abandon annual ratings entirely in favor of ongoing feedback and goal-setting. For small businesses, the 9-box is often outdated before they ever need it; modern alternatives like skills-based assessments and continuous 1:1 development conversations may serve better. The 9-box is not wrong, but it is not the only or best tool available.

How often should you run a 9-box talent review?

Annually at minimum, semi-annually if your team is changing fast. Quarterly is overkill for most small businesses; the placements do not change enough quarter-over-quarter to justify the calibration overhead. The cadence matters less than the consistency: a 9-box reviewed once and abandoned creates more cynicism than no 9-box at all. Either commit to annual calibration or do not start. Stale placements are worse than no placements because they give the team false confidence in outdated information.

What is the difference between 9-box and a performance review?

A performance review evaluates an individual against their goals and expectations for the period. 9-box compares people across the team and ranks them against each other on performance and potential. Performance reviews are individual; 9-box is comparative. Both have a place: performance reviews give individuals specific feedback on their work, 9-box helps the company decide who to promote, develop, or manage out. They work together, not in competition. Most 9-box calibrations use performance review data as input.

What are alternatives to the 9-box for small business?

Three practical alternatives. First, three-bucket talent review: sort the team into thriving, steady, and struggling, then develop accordingly. Works well for under 15 employees. Second, performance-only review: rate performance, skip the potential axis, and make decisions on performance alone. Works for 15-30 employees without active succession needs. Third, skills-based assessment: replace performance/potential axes with a skills matrix specific to your roles. Works for any size, especially technical teams. Most small businesses are better served by these than by a full 9-box implementation.

Can the 9-box be used for layoff or termination decisions?

Yes, but carefully. Using the bottom-row boxes (Risk, Reliable Performer, Specialist) as the basis for layoffs creates legal risk if the placements were not based on documented, consistent, performance-based criteria. The grid must reflect actual performance evidence, not personality or subjective fit. Document the calibration process, the criteria used, and the specific evidence for each placement. Do not surprise people: those in low-performance boxes should have already received feedback and development plans before any termination decision. Skipping these steps turns the grid from a management tool into a documentation problem.

How long does a 9-box calibration session take?

For 30-40 employees with 4-5 managers, plan 90-120 minutes for the full calibration meeting. Add 30-45 minutes per manager for pre-meeting individual rating preparation, plus 15-30 minutes per employee for the manager to translate calibration into individual feedback conversations afterward. Summed across all managers, a single calibration cycle consumes roughly 15-35 hours of management time. An annual cadence makes this a meaningful but bounded commitment; a quarterly cadence makes it unsustainable for most small businesses.
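Summing the per-activity figures for the 30-40 employee, 4-5 manager case is straightforward arithmetic; a quick sketch, with all inputs taken from the ranges in this section:

```python
def cycle_hours(managers, employees, prep_min, meeting_min, feedback_min):
    """Total management time (hours) for one calibration cycle.
    Prep and meeting attendance are per manager; feedback is per employee."""
    total_min = (managers * prep_min           # individual rating prep
                 + managers * meeting_min      # every manager attends calibration
                 + employees * feedback_min)   # follow-up 1:1 conversations
    return total_min / 60

low = cycle_hours(managers=4, employees=30, prep_min=30, meeting_min=90, feedback_min=15)
high = cycle_hours(managers=5, employees=40, prep_min=45, meeting_min=120, feedback_min=30)
# low = 15.5 hours, high = 33.75 hours
```

The follow-up conversations dominate the budget, which is worth remembering: the meeting itself is the cheapest part of the cycle.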
