Conversational AI — Experience Center | Praveen K C — Lead Unity Developer & XR Specialist

Overview

Designed and developed a production-grade conversational AI backend powering two distinct avatar personas — Nora (DevLearn 2024, Las Vegas) and Puja (Sify Experience Center, Navi Mumbai). The system orchestrates a Unity 3D avatar client and a separate camera feed application over a single Socket.IO server, handling real-time speech transcription, LLM response streaming, text-to-speech synthesis, computer vision, face recognition, and retrieval-augmented generation — all wired together into a seamless, sentence-by-sentence avatar response pipeline.

Key Responsibilities

•
Lead Backend Architect — Socket.IO Server: Designed the full server architecture using Python and Socket.IO, handling two simultaneous client connections (Unity avatar app and camera feed app) correlated by session ID, with per-session state management, async queues, and clean event-driven separation across the events/ and modules/ layers.
•
Multi-Modal Pipeline Integration: Engineered the end-to-end audio-visual pipeline — Unity microphone audio → Whisper transcription → LLM streaming → Edge TTS sentence synthesis → avatar response emission — alongside a parallel video pipeline feeding YOLO person detection and HOG face recognition into the same session context.
•
Dual Avatar Persona System (Nora & Puja): Built a configurable avatar persona framework supporting independent scenario definitions, voice IDs, chat history collections, system prompt contexts, and predefined audio paths per avatar — ingested and managed via MongoDB with a dedicated scenario ingestion script.
•
Streaming LLM Response Processing: Implemented a custom SSE stream parser consuming structured LLM output (emotion tags, sentence delimiters, end markers) to yield per-sentence (emotion, message) tuples, enabling real-time sentence-by-sentence TTS and avatar animation sequencing with a drain queue driven by avatar playback state events from the Unity client.
•
RAG & Vision Integration: Integrated ChromaDB-backed retrieval-augmented generation with keyword-triggered context injection, and built a vision LLM path that encodes the latest camera frame alongside the user transcript for image-aware responses — with automatic fallback to text-only when no valid frame is available.
•
Idle & Interrupt Handling: Developed proactive idle-state logic that triggers vision-based conversation starters when a person is detected but no speech occurs (~40s threshold), and an LLM-driven interrupt decision system that determines whether mid-response user speech should override the current avatar output or be ignored.
•
Unity Client Integration & Cross-Team Coordination: Defined and maintained the Socket.IO event contract (get_sid, audio_data, nora_response, avatar_state, image_data, etc.) between the backend and the Unity development team, ensuring reliable session correlation, queue drain behavior, and avatar state synchronization across both clients.

Conversational AI — Experience Center

Overview

Key Responsibilities

Gallery

Tech Stack

Continue Exploring

UTI Simulation — Ultrasonic Immersion Testing