Talent Systems — Science Team

Practical Assessments

How practical assessments work — disciplines, problems, scoring, and integrity

Practical assessments evaluate candidates through hands-on work — writing code, fixing bugs, reviewing code, drafting documents, or analyzing data. This is a separate evaluation phase from the conversational interview.

Three Assessment Modes

Each posting can be configured as one of:

  • Conversational — voice/text interview only (existing behavior)
  • Practical — hands-on assessment only, no Retell, zero voice API cost
  • Both — conversational interview first, then practical assessment (sequential)
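A minimal sketch of how these three modes might gate the evaluation pipeline. `AssessmentMode` and `phasesFor` are illustrative names invented here, not identifiers from the codebase:

```typescript
type AssessmentMode = "conversational" | "practical" | "both";

// Hypothetical helper: which evaluation phases a posting runs, in order.
function phasesFor(mode: AssessmentMode): string[] {
  switch (mode) {
    case "conversational":
      return ["interview"]; // voice/text interview only
    case "practical":
      return ["assessment"]; // no voice call, zero voice API cost
    case "both":
      return ["interview", "assessment"]; // sequential: interview first
  }
}
```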

Disciplines

Disciplines organize problems by domain. Each defines a workspace type:

Discipline              Workspace                     Example Problems
Software Engineering    Code Editor (Monaco)          FizzBuzz, bug fixes, code review
Technical Writing       Rich Text Editor              README writing, API docs, editing
Data Science            Code Editor (Python/SQL/R)    SQL queries, statistical analysis

Disciplines are admin-managed at /admin/practical-assessments.

Problems

Each problem belongs to a discipline and has:

  • Title and description — what the candidate sees
  • Difficulty tier — Foundational, Intermediate, or Advanced
  • Available languages — which programming languages the candidate can use (from the programming_languages table — all Monaco-supported languages)
  • Starter content (optional) — pre-loaded code or text for bug-fix / review problems
  • Starter language (optional) — locks the editor to this language when starter content exists
  • Scorer hints (optional) — freeform text giving the AI specific things to look for beyond what the competency anchors already cover
  • Time limit (optional) — per-problem time limit in minutes

Problems are stored as editable JSON files in data/practical-problems/*.json and seeded via scripts/seed-practical-problems.ts, which makes them easy for SMEs to review and edit.

Practical Scorer

The practical scorer (lib/practical-scorer-prompt.ts) evaluates candidate submissions against the same competency anchors used for conversational interviews. It receives:

  • The candidate's submitted work (code or text)
  • The problem description and optional scorer hints
  • Starter content (for diff context on bug-fix problems)
  • Integrity flags (paste events, tab switches)
  • Time taken and whether the candidate timed out

The scorer outputs the same scorecard format as the conversational scorer, but with source: "practical" and evidence (from the work product) instead of quote (from a transcript).
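The source/evidence distinction might look like this in a single scorecard entry. `AnchorScore` and its fields are hypothetical, since the real output format is engineering-managed and not documented here:

```typescript
// Hypothetical per-competency scorecard entry.
interface AnchorScore {
  competency: string;
  score: number; // rated against the competency anchors
  source: "practical" | "conversational";
  evidence?: string; // practical: drawn from the work product
  quote?: string; // conversational: drawn from the transcript
}

// A practical entry carries evidence, not a transcript quote.
const practicalEntry: AnchorScore = {
  competency: "Debugging",
  score: 4,
  source: "practical",
  evidence: "Fixed the off-by-one in the loop bound and handled empty input.",
};
```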

Configuring the Practical Scorer

The practical scorer rubric is fully configurable at Admin > Practical Scorer (/admin/practical-scorer). This page follows the same versioned pattern as the conversational scorer:

  • Scoring Philosophy — what a 3 vs 5 means, how to handle partial work
  • Code Evaluation — how to assess correctness, quality, edge cases
  • Text Evaluation — how to assess clarity, accuracy, structure
  • Integrity Weighting — how paste events and tab switches affect scoring
  • Scoring Rules — ordered bullet-point rules the AI follows
  • Thresholds — Advance/Hold/Decline cutoffs

All changes are versioned with change type, name, and description. The output format is engineering-managed and not shown in the UI.
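A toy sketch of how versioned thresholds could drive the Advance/Hold/Decline decision. The names, fields, and cutoff values are invented for illustration; the actual rubric storage and decision logic are not specified in this doc:

```typescript
// Hypothetical versioned rubric record (change type, name, description per the doc).
interface RubricVersion {
  version: number;
  changeType: string;
  name: string;
  description: string;
  thresholds: { advance: number; hold: number }; // below hold => Decline
}

function decide(
  score: number,
  t: RubricVersion["thresholds"],
): "Advance" | "Hold" | "Decline" {
  if (score >= t.advance) return "Advance";
  if (score >= t.hold) return "Hold";
  return "Decline";
}

const v3: RubricVersion = {
  version: 3,
  changeType: "threshold-update",
  name: "Tighten advance cutoff",
  description: "Raise the advance bar for senior postings.",
  thresholds: { advance: 4, hold: 3 },
};
```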

Unified Scoring

When a posting uses "Both" mode:

  • One scorecard is created per practical assessment
  • One scorecard is created for the conversational interview
  • A unified scorecard merges all of them

Evidence in practical scorecards comes from the code/text artifacts, not transcript quotes.
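The merge step above can be sketched as collecting each phase's score per competency into one unified view. This is an assumption about how the merge could work, not the actual implementation:

```typescript
// Hypothetical per-phase scorecard.
interface Scorecard {
  source: "conversational" | "practical";
  scores: Record<string, number>;
}

// Collect every phase's score per competency so the unified card keeps both views.
function unify(cards: Scorecard[]): Record<string, number[]> {
  const merged: Record<string, number[]> = {};
  for (const card of cards) {
    for (const [competency, score] of Object.entries(card.scores)) {
      (merged[competency] ??= []).push(score);
    }
  }
  return merged;
}
```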

Posting Configuration

Postings no longer have a discipline_id — employers select problems directly from any discipline via the Problem Browser. Assessment config is stored in the posting_assessments table (one row per posting with the problem IDs). Each problem carries its own time limit.
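One way to model the posting_assessments row and per-problem time limits described above. The column names are guesses based on this description, not the real schema:

```typescript
// Hypothetical posting_assessments row: one per posting, holding selected problem IDs.
interface PostingAssessmentRow {
  postingId: string;
  problemIds: string[]; // chosen via the Problem Browser, from any discipline
}

// Since each problem carries its own time limit, the total assessment time
// is simply the sum of the per-problem limits (untimed problems contribute 0).
function totalTimeMinutes(problems: { timeLimitMinutes?: number }[]): number {
  return problems.reduce((sum, p) => sum + (p.timeLimitMinutes ?? 0), 0);
}
```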

Status

Sprint 2 complete — full candidate-facing UI (Monaco workspace, timer, auto-save, problem sidebar), PostingForm with assessment type selector and expandable problem cards, admin practical scorer page with versioning. 895 unit tests.
