Talent Systems — Science Team

Practical Assessments

How practical assessments work — disciplines, problems, scoring, and integrity

Practical assessments evaluate candidates through hands-on work — writing code, fixing bugs, reviewing code, drafting documents, or analyzing data. This is a separate evaluation phase from the conversational interview.

Three Assessment Modes

Each posting can be configured as one of:

  • Conversational — voice/text interview only (existing behavior)
  • Practical — hands-on assessment only, no Retell, zero voice API cost
  • Both — conversational interview first, then practical assessment (sequential)
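A minimal sketch of how these three modes might gate the evaluation pipeline. `AssessmentMode` and `phasesFor` are illustrative names invented here, not identifiers from the codebase:

```typescript
type AssessmentMode = "conversational" | "practical" | "both";

// Hypothetical helper: which evaluation phases a posting runs, in order.
function phasesFor(mode: AssessmentMode): string[] {
  switch (mode) {
    case "conversational":
      return ["interview"]; // voice/text interview only
    case "practical":
      return ["assessment"]; // no voice call, zero voice API cost
    case "both":
      return ["interview", "assessment"]; // sequential: interview first
  }
}
```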

Disciplines

Disciplines organize problems by domain. Each defines a workspace type:

Discipline              Workspace                     Example Problems
Software Engineering    Code Editor (Monaco)          FizzBuzz, bug fixes, code review
Technical Writing       Rich Text Editor              README writing, API docs, editing
Data Science            Code Editor (Python/SQL/R)    SQL queries, statistical analysis

Disciplines are admin-managed at /admin/practical-assessments.

Problems

Each problem belongs to a discipline and has:

  • Title and description — what the candidate sees
  • Difficulty tier — Foundational, Intermediate, or Advanced
  • Available languages — which programming languages the candidate can use (from the programming_languages table — all Monaco-supported languages)
  • Starter content (optional) — pre-loaded code or text for bug-fix / review problems
  • Starter language (optional) — locks the editor to this language when starter content exists
  • Scorer hints (optional) — freeform text giving the AI specific things to look for beyond what the competency anchors already cover
  • Time limit (optional) — per-problem time limit in minutes

Problems are stored as editable JSON files in data/practical-problems/*.json and seeded via scripts/seed-practical-problems.ts, which makes them easy for SMEs to review and edit.

Practical Scorer

The practical scorer (lib/practical-scorer-prompt.ts) evaluates candidate submissions against the same competency anchors used for conversational interviews. It receives:

  • The candidate's submitted work (code or text)
  • The problem description and optional scorer hints
  • Starter content (for diff context on bug-fix problems)
  • Integrity flags (paste events, tab switches)
  • Time taken and whether the candidate timed out

The scorer outputs the same scorecard format as the conversational scorer, but with source: "practical" and evidence (from the work product) instead of quote (from a transcript).
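The source/evidence distinction might look like this in a single scorecard entry. `AnchorScore` and its fields are hypothetical, since the real output format is engineering-managed and not documented here:

```typescript
// Hypothetical per-competency scorecard entry.
interface AnchorScore {
  competency: string;
  score: number; // rated against the competency anchors
  source: "practical" | "conversational";
  evidence?: string; // practical: drawn from the work product
  quote?: string; // conversational: drawn from the transcript
}

// A practical entry carries evidence, not a transcript quote.
const practicalEntry: AnchorScore = {
  competency: "Debugging",
  score: 4,
  source: "practical",
  evidence: "Fixed the off-by-one in the loop bound and handled empty input.",
};
```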

Configuring the Practical Scorer

The practical scorer rubric is fully configurable at Admin > Practical Scorer (/admin/practical-scorer). This page follows the same versioned pattern as the conversational scorer:

  • Scoring Philosophy — what a 3 vs 5 means, how to handle partial work
  • Code Evaluation — how to assess correctness, quality, edge cases
  • Text Evaluation — how to assess clarity, accuracy, structure
  • Integrity Weighting — how paste events and tab switches affect scoring
  • Scoring Rules — ordered bullet-point rules the AI follows
  • Thresholds — Advance/Hold/Decline cutoffs

All changes are versioned with change type, name, and description. The output format is engineering-managed and not shown in the UI.
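A toy sketch of how versioned thresholds could drive the Advance/Hold/Decline decision. The names, fields, and cutoff values are invented for illustration; the actual rubric storage and decision logic are not specified in this doc:

```typescript
// Hypothetical versioned rubric record (change type, name, description per the doc).
interface RubricVersion {
  version: number;
  changeType: string;
  name: string;
  description: string;
  thresholds: { advance: number; hold: number }; // below hold => Decline
}

function decide(
  score: number,
  t: RubricVersion["thresholds"],
): "Advance" | "Hold" | "Decline" {
  if (score >= t.advance) return "Advance";
  if (score >= t.hold) return "Hold";
  return "Decline";
}

const v3: RubricVersion = {
  version: 3,
  changeType: "threshold-update",
  name: "Tighten advance cutoff",
  description: "Raise the advance bar for senior postings.",
  thresholds: { advance: 4, hold: 3 },
};
```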

Unified Scoring

When a posting uses "Both" mode:

  • One scorecard is created per practical assessment
  • One scorecard is created for the conversational interview
  • A unified scorecard merges all of them

Evidence in practical scorecards comes from the code/text artifacts, not transcript quotes.
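The merge step above can be sketched as collecting each phase's score per competency into one unified view. This is an assumption about how the merge could work, not the actual implementation:

```typescript
// Hypothetical per-phase scorecard.
interface Scorecard {
  source: "conversational" | "practical";
  scores: Record<string, number>;
}

// Collect every phase's score per competency so the unified card keeps both views.
function unify(cards: Scorecard[]): Record<string, number[]> {
  const merged: Record<string, number[]> = {};
  for (const card of cards) {
    for (const [competency, score] of Object.entries(card.scores)) {
      (merged[competency] ??= []).push(score);
    }
  }
  return merged;
}
```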

Posting Configuration

Postings no longer have a discipline_id — employers select problems directly from any discipline via the Problem Browser. Assessment config is stored in the posting_assessments table (one row per posting with the problem IDs). Each problem carries its own time limit.
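One way to model the posting_assessments row and per-problem time limits described above. The column names are guesses based on this description, not the real schema:

```typescript
// Hypothetical posting_assessments row: one per posting, holding selected problem IDs.
interface PostingAssessmentRow {
  postingId: string;
  problemIds: string[]; // chosen via the Problem Browser, from any discipline
}

// Since each problem carries its own time limit, the total assessment time
// is simply the sum of the per-problem limits (untimed problems contribute 0).
function totalTimeMinutes(problems: { timeLimitMinutes?: number }[]): number {
  return problems.reduce((sum, p) => sum + (p.timeLimitMinutes ?? 0), 0);
}
```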

Status

Sprint 2 complete — full candidate-facing UI (Monaco workspace, timer, auto-save, problem sidebar), PostingForm with assessment type selector and expandable problem cards, admin practical scorer page with versioning. 895 unit tests.
