Reliable AI in the Browser: Designing a Real-Time Interview Platform
How I built a WebRTC + LLM system that simulates live interviews with streaming feedback.
Tags: realtime, ai, webrtc
Problem Framing
Mock interviews often feel fake because latency and feedback lag break immersion. I wanted a browser-based system that felt like a real interviewer: live voice, timely cues, and structured scoring.
Architecture and Protocols
Browser (WebRTC + Web Audio) --> Edge Router (Cloudflare Workers)
                                   |--> Session Manager (Next.js API)
                                   |--> Audio Pipeline (Node + WebSocket)
                                   |--> Scoring Service (Python + FastAPI)
                                   |--> LLM Evaluator (Azure OpenAI)
- WebRTC carries bidirectional audio between the candidate and the AI interviewer's voice.
- WebSockets stream transcripts, scoring events, and UI hints.
- Edge Router terminates connections close to users and injects auth claims.
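Because one WebSocket carries transcripts, scoring events, and UI hints, frames need a discriminated envelope. A minimal sketch of that idea (the field names and event kinds here are assumptions for illustration, not the production wire format):

```typescript
// Illustrative envelope for the transcript/scoring WebSocket. The field
// names and kinds are assumptions for this sketch, not the real wire format.
type WsEvent =
  | { kind: "transcript"; seq: number; text: string; final: boolean }
  | { kind: "score"; seq: number; dimension: string; value: number }
  | { kind: "hint"; seq: number; message: string };

type Handler = (e: WsEvent) => void;

// Route one raw JSON frame to its per-kind handler. Malformed or unknown
// frames are dropped (return false) instead of killing the session.
function dispatch(
  raw: string,
  handlers: Partial<Record<WsEvent["kind"], Handler>>
): boolean {
  let event: WsEvent;
  try {
    event = JSON.parse(raw) as WsEvent;
  } catch {
    return false;
  }
  const handler = handlers[event.kind];
  if (!handler) return false;
  handler(event);
  return true;
}
```

Dropping unknown kinds (rather than throwing) lets old clients keep working when the server adds new event types.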
Transcript and Scoring Pipeline
- Browser captures audio, encodes to Opus, and sends over WebRTC.
- The audio pipeline fans out to:
  - Azure Speech for transcription.
  - A real-time sentiment model (ONNX) for tone.
  - A keyword extractor for domain-specific coverage.
- FastAPI scoring service aggregates features and calls GPT-4 for contextual feedback.
- Final scores are normalized (content, structure, clarity, confidence) with rubrics we co-designed with hiring managers.
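The normalization step boils down to clamping each rubric dimension and taking a weighted average. A sketch of that aggregation, where the weights and the 0–100 scale are assumptions for illustration (the real rubric weights came out of the sessions with hiring managers):

```typescript
// Illustrative rubric aggregation. The dimension names match the article;
// the weights and the 0-100 scale are assumptions for this sketch.
const RUBRIC_WEIGHTS: Record<string, number> = {
  content: 0.4,
  structure: 0.25,
  clarity: 0.2,
  confidence: 0.15,
};

// Clamp each raw dimension score into [0, 100], then take the weighted
// average; missing dimensions count as zero.
function overallScore(raw: Record<string, number>): number {
  let total = 0;
  for (const [dim, weight] of Object.entries(RUBRIC_WEIGHTS)) {
    const v = Math.min(100, Math.max(0, raw[dim] ?? 0));
    total += weight * v;
  }
  return Math.round(total);
}
```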
Handling Network Issues Gracefully
- Connection watchdog pings every five seconds using WebRTC data channels.
- Automatic degradation to audio-only hints when bandwidth drops.
- UI surfaces "Poor connection" state and continues collecting transcript for later review.
- Partial results are cached so interviews resume without losing context.
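The degradation decision itself is a small state machine over two signals: how stale the last watchdog pong is, and the measured bitrate. A sketch of that transition logic, with thresholds that are assumptions for illustration (the five-second ping interval is the one described above):

```typescript
// Sketch of the watchdog's decision logic. The real system pings every five
// seconds over a WebRTC data channel; this models only the state transition.
// The 15 s and 64 kbps thresholds are assumptions for the sketch.
type ConnState = "ok" | "audio-only" | "poor";

function connectionState(msSinceLastPong: number, kbps: number): ConnState {
  if (msSinceLastPong > 15_000) return "poor"; // roughly three missed pings
  if (kbps < 64) return "audio-only";          // keep hints, drop rich media
  return "ok";
}
```

Keeping this as a pure function of observed metrics makes the "Poor connection" UI state trivial to unit-test, independent of any real network.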
Product Layer
Feedback appears in three tiers:
- Instant cues ("Wrap up this example") inline above the prompt.
- Segment summaries after each question.
- Final analytics with recommendations and example phrasing.
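The three tiers above map naturally to a single feedback stream that the UI buckets at render time. A minimal sketch of that grouping (the `FeedbackItem` shape is an assumption for illustration):

```typescript
// Illustrative routing of feedback items into the three UI tiers described
// above. The FeedbackItem shape is an assumption for this sketch.
type Tier = "instant" | "segment" | "final";

interface FeedbackItem {
  tier: Tier;
  text: string;
}

// Bucket feedback so the UI can render instant cues inline, segment
// summaries after each question, and final analytics at the end.
function groupByTier(items: FeedbackItem[]): Record<Tier, string[]> {
  const grouped: Record<Tier, string[]> = { instant: [], segment: [], final: [] };
  for (const item of items) grouped[item.tier].push(item.text);
  return grouped;
}
```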
A/B tests showed that summarizing in bullets instead of paragraphs increased completion rates by 17%.
Lessons Learned
- Aim for < 400 ms round trip on transcripts; anything higher feels laggy.
- Users trust the system more when they see confidence indicators per score.
- WebRTC debugging is hard; invest in end-to-end tracing and WebRTC internals logging early.
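Enforcing the 400 ms transcript budget means tracking a latency percentile, not an average, since a few slow outliers are what users actually feel. A sketch of how such a check could look (the p95 choice and function names are assumptions for illustration):

```typescript
// Sketch of a transcript round-trip latency check against the 400 ms budget.
// Using p95 rather than the mean is an assumption for this sketch.
function percentile(samplesMs: number[], p: number): number {
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const idx = Math.min(sorted.length - 1, Math.ceil((p / 100) * sorted.length) - 1);
  return sorted[idx];
}

function withinBudget(samplesMs: number[], budgetMs = 400): boolean {
  return percentile(samplesMs, 95) <= budgetMs;
}
```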
The result is a browser-native experience that feels surprisingly human while staying reliable under flaky networks.