Talent Systems — Science Team
Phases Awaiting Your Input

Phase 13: Validity Data Collection

Schema review questions — 4 items, ~3 hours of your time.

Time estimate: ~3 hours
Impact: BLOCKS validity data collection infrastructure
Full design doc: docs/PHASE_13_VALIDITY_DATA_COLLECTION.md in the repo

Context

We're building infrastructure to collect data for future validity studies. Every interview from this point forward will contribute to the evidence base, so the data model must capture everything required for criterion-related validity, content validity documentation, and reliability analysis.

4 Questions for You

1. Outcome Categories

Are the following outcome categories correct and complete?

  • Hired
  • Rejected
  • Withdrew
  • Terminated
  • Performing well
  • Performing poorly

Should we add:

  • Promoted?
  • Still employed at X months?
  • Performance rating (if employer uses a scale)?
  • Manager satisfaction?
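
Since the answers become enum values, here is a minimal sketch of how the six categories listed above might look as an enum. The names Outcome and parse_outcome, and the lowercase string values, are illustrative assumptions, not the implemented schema. The proposed additions are left out on purpose until they are confirmed.

```python
from enum import Enum


class Outcome(str, Enum):
    """Outcome categories as listed in question 1 (pending review)."""

    HIRED = "hired"
    REJECTED = "rejected"
    WITHDREW = "withdrew"
    TERMINATED = "terminated"
    PERFORMING_WELL = "performing_well"
    PERFORMING_POORLY = "performing_poorly"


def parse_outcome(raw: str) -> Outcome:
    """Normalize a label such as 'Performing well' to an Outcome member."""
    key = raw.strip().lower().replace(" ", "_")
    return Outcome(key)  # raises ValueError for labels outside the enum
```

In this sketch, a label that is not in the enum (for example "Promoted") is rejected with a ValueError, which makes any gap in the category list visible at write time.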

2. Follow-Up Intervals

Are these follow-up intervals right for validity data?

  • 30 days — initial hire/reject outcome
  • 90 days — early performance signal
  • 6 months — medium-term performance
  • 12 months — long-term validation

Should any be added, removed, or adjusted?
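
To make the intervals concrete, the sketch below computes the four due dates from a single anchor date. It assumes the anchor is the interview date (the doc does not say whether the clock starts at interview or at hire) and that "6 months" and "12 months" mean calendar months rather than fixed day counts. The function names are hypothetical.

```python
import calendar
from datetime import date, timedelta


def add_months(start: date, months: int) -> date:
    """Shift a date by whole calendar months, clamping to the month's last day."""
    index = start.month - 1 + months
    year = start.year + index // 12
    month = index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def follow_up_dates(anchor: date) -> dict[str, date]:
    """Due dates for the four proposed follow-ups, counted from `anchor`."""
    return {
        "30d": anchor + timedelta(days=30),
        "90d": anchor + timedelta(days=90),
        "6mo": add_months(anchor, 6),
        "12mo": add_months(anchor, 12),
    }


# Example: an interview on 31 Aug 2024; the 6-month date clamps to 28 Feb 2025.
schedule = follow_up_dates(date(2024, 8, 31))
```

Whether month-based intervals should clamp to the end of a shorter month, as done here, is a design choice that would need to be confirmed.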

3. Additional Interview Metadata

What interview metadata should we capture NOW that's hard to reconstruct later?

Currently captured:

  • Prompt hash (exact prompt used)
  • Score vector (all competency scores)
  • Scorer version and interviewer version
  • Timestamp and duration
  • Interview mode, style, depth
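
For reference while reviewing, the currently captured fields could be represented roughly as follows. This is a sketch only: the field names, the types, the use of SHA-256 for the prompt hash, and the example values are all assumptions and may not match the actual implementation.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


def hash_prompt(prompt_text: str) -> str:
    """SHA-256 hex digest of the exact prompt text (hash choice is an assumption)."""
    return hashlib.sha256(prompt_text.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class InterviewRecord:
    prompt_hash: str            # identifies the exact prompt used
    scores: dict[str, float]    # score vector: competency name -> score
    scorer_version: str
    interviewer_version: str
    started_at: datetime        # timestamp (UTC assumed)
    duration_seconds: int
    mode: str
    style: str
    depth: str


# Hypothetical values, for illustration only.
record = InterviewRecord(
    prompt_hash=hash_prompt("Tell me about a recent project."),
    scores={"communication": 3.5, "problem_solving": 4.0},
    scorer_version="scorer-1.0",
    interviewer_version="interviewer-1.0",
    started_at=datetime(2024, 8, 31, 14, 0, tzinfo=timezone.utc),
    duration_seconds=1800,
    mode="text",
    style="structured",
    depth="standard",
)
```

Any field added in response to this question would become one more attribute on a record like this, so it is cheapest to name it now.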

Is anything missing that we would need for:

  • Reliability analysis?
  • Content validity documentation?
  • Criterion-related validity studies?

What data points needed for criterion-related validity are not listed above?

Consider:

  • Job family classification
  • Level/seniority
  • Location (for group comparisons)
  • Interviewer language and model version (the AI interviewer is consistent, but the model version is worth tracking)
  • Time of day (candidate alertness patterns)

How to Submit

Provide answers directly — we'll implement the schema and seed the tables. The answers become column definitions and enum values in the database.

What's Already Decided

  • In-app notification bell — employers get reminders to report outcomes
  • Automated follow-up notifications at 30d/90d/6mo/12mo
  • notifications table with type, message, read/unread, link, recipient
  • Email reminder pipeline deferred to follow-up sprint
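
As an illustration of the notifications table described above, the sketch below creates it in an in-memory SQLite database. The column types, the id column, the is_read flag encoding, and the example row are assumptions; only the five field names (type, message, read/unread, link, recipient) come from the decision list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE notifications (
        id        INTEGER PRIMARY KEY,
        type      TEXT NOT NULL,
        message   TEXT NOT NULL,
        is_read   INTEGER NOT NULL DEFAULT 0,  -- 0 = unread, 1 = read
        link      TEXT,
        recipient TEXT NOT NULL
    )
    """
)

# Hypothetical 30-day follow-up reminder for one employer.
conn.execute(
    "INSERT INTO notifications (type, message, link, recipient) "
    "VALUES (?, ?, ?, ?)",
    (
        "outcome_followup_30d",
        "Please report the outcome for this candidate.",
        "/outcomes/report",
        "employer-123",
    ),
)

# What the notification bell would query: unread items.
unread = conn.execute(
    "SELECT type, recipient FROM notifications WHERE is_read = 0"
).fetchall()
```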
