
Reliable AI in the Browser: Designing a Real-Time Interview Platform

How I built a WebRTC + LLM system that simulates live interviews with streaming feedback.

realtime · ai · webrtc

Problem Framing

Mock interviews often feel fake because latency and feedback lag break immersion. I wanted a browser-based system that felt like a real interviewer: live voice, timely cues, and structured scoring.

Architecture and Protocols

Browser (WebRTC + Web Audio) --> Edge Router (Cloudflare Workers)
                                 |--> Session Manager (Next.js API)
                                 |--> Audio Pipeline (Node + WebSocket)
                                 |--> Scoring Service (Python + FastAPI)
                                 |--> LLM Evaluator (Azure OpenAI)
  • WebRTC handles bidirectional audio between candidate and AI voice.
  • WebSockets stream transcripts, scoring events, and UI hints.
  • Edge Router terminates connections close to users and injects auth claims.
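The WebSocket side of this split can be sketched as a small event protocol. This is a minimal illustration, not the production code: the event types mirror the three streams listed above (transcripts, scoring events, UI hints), but the field names are assumptions.

```python
import json

# Hypothetical event envelope for the WebSocket channel. The three event
# kinds mirror the streams described above; field names are illustrative.
EVENT_KINDS = {"transcript", "score", "ui_hint"}

def make_event(kind: str, payload: dict) -> str:
    """Serialize a pipeline event for the socket."""
    if kind not in EVENT_KINDS:
        raise ValueError(f"unknown event kind: {kind}")
    return json.dumps({"type": kind, "payload": payload})

def dispatch(raw: str, handlers: dict) -> None:
    """Route an incoming event to the handler registered for its type."""
    event = json.loads(raw)
    handler = handlers.get(event["type"])
    if handler is not None:
        handler(event["payload"])
```

Keeping transcripts, scores, and hints on one multiplexed channel (rather than three sockets) simplifies reconnect logic on flaky networks.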

Transcript and Scoring Pipeline

  1. Browser captures audio, encodes to Opus, and sends over WebRTC.
  2. Audio pipeline fans out to:
    • Azure Speech for transcription.
    • Real-time sentiment model (ONNX) for tone.
    • Keyword extractor for domain-specific coverage.
  3. FastAPI scoring service aggregates features and calls GPT-4 for contextual feedback.
  4. Final scores are normalized (content, structure, clarity, confidence) with rubrics we co-designed with hiring managers.
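The normalization step can be sketched like this. The four dimension names come from the rubric above; the weights and the 0–100 scale are assumptions for illustration, not the co-designed rubric itself.

```python
# Assumed weights across the four rubric dimensions named above.
RUBRIC_WEIGHTS = {"content": 0.4, "structure": 0.25, "clarity": 0.2, "confidence": 0.15}

def normalize_scores(raw: dict, max_raw: float = 5.0) -> dict:
    """Map raw rubric scores (0..max_raw) onto a 0-100 scale per dimension."""
    return {dim: round(100 * raw[dim] / max_raw, 1) for dim in RUBRIC_WEIGHTS}

def overall(normalized: dict) -> float:
    """Weighted aggregate across the four rubric dimensions."""
    return round(sum(normalized[d] * w for d, w in RUBRIC_WEIGHTS.items()), 1)
```

Normalizing per dimension before aggregating keeps individual scores comparable across questions, which matters for the per-score confidence indicators mentioned later.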

Handling Network Issues Gracefully

  • Connection watchdog pings every five seconds using WebRTC data channels.
  • Automatically degrade to audio-only hints when bandwidth drops.
  • UI surfaces "Poor connection" state and continues collecting transcript for later review.
  • Partial results are cached so interviews resume without losing context.
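The watchdog logic above reduces to a small state machine. The five-second ping interval is from the post; the miss threshold and state names are assumptions in this sketch.

```python
# Minimal connection-watchdog state machine. The 5-second interval comes
# from the design above; MISS_THRESHOLD and the state names are assumptions.
PING_INTERVAL_S = 5.0
MISS_THRESHOLD = 2  # consecutive missed pings before degrading

class ConnectionWatchdog:
    def __init__(self):
        self.missed = 0
        self.state = "healthy"

    def on_ping_result(self, acked: bool) -> str:
        """Update state after each data-channel ping; return the current state."""
        if acked:
            self.missed = 0
            self.state = "healthy"
        else:
            self.missed += 1
            if self.missed >= MISS_THRESHOLD:
                # UI surfaces "Poor connection"; hints drop to audio-only.
                self.state = "degraded"
        return self.state
```

Requiring consecutive misses before degrading avoids flapping the UI on a single dropped ping, while a single successful ack restores the healthy state immediately.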

Product Layer

Feedback appears in three tiers:

  • Instant cues ("Wrap up this example") inline above the prompt.
  • Segment summaries after each question.
  • Final analytics with recommendations and example phrasing.
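The three tiers above map naturally onto a tagged feedback store that the UI filters per surface. A minimal sketch, with illustrative class and field names:

```python
from dataclasses import dataclass, field

# Illustrative model of the three feedback tiers described above.
TIERS = ("instant", "segment", "final")

@dataclass
class FeedbackItem:
    tier: str  # "instant" | "segment" | "final"
    text: str

@dataclass
class InterviewFeedback:
    items: list = field(default_factory=list)

    def add(self, tier: str, text: str) -> None:
        if tier not in TIERS:
            raise ValueError(f"unknown tier: {tier}")
        self.items.append(FeedbackItem(tier, text))

    def by_tier(self, tier: str) -> list:
        return [i.text for i in self.items if i.tier == tier]
```

Tagging each item with its tier lets the inline-cue, per-question, and final-report views share one stream instead of three separate pipelines.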

A/B tests showed that summarizing in bullets instead of paragraphs increased completion rates by 17%.

Lessons Learned

  • Aim for < 400 ms round trip on transcripts; anything higher feels laggy.
  • Users trust the system more when they see confidence indicators per score.
  • WebRTC debugging is hard—invest in end-to-end tracing and WebRTC internals logging early.

The result is a browser-native experience that feels surprisingly human while staying reliable under flaky networks.