# Constella

Polaris-style voice constellation with native English-Spanish code-switching.
## The problem
Bilingual patients, especially in U.S. Latino populations, regularly mix English and Spanish mid-sentence (“¿Tomo el inhaler dos veces al día, but only when I feel short of breath?”). Current voice AI systems don’t handle this well. They either pick a language per utterance (bad) or drift between voices mid-span (jarring). Intra-utterance code-switching is a real engineering problem, and most off-the-shelf TTS stacks treat it as an afterthought.
Around the same time I was reading about this, Microsoft released VibeVoice, the first open-source TTS model that natively handles code-switching across 50+ languages with no language parameter required.
I wanted to know: what happens if you build a safety-verifier constellation entirely on VibeVoice?
## The approach
Constella is the answer, built in two days. The architecture is one stateful primary conversational agent (Llama 3.1 70B as “Nurse Constella”) flanked by four parallel specialist verifiers that fan out after every primary turn:
| Specialist | Role | Model |
|---|---|---|
| Language | Per-utterance language ID + intra-utterance code-switch segmentation. Routes downstream prompts to ES, EN, or MIX | Llama 3.1 8B (Groq) |
| Medication | Verifies dosage, timing, contraindications against the patient’s med list | Llama 3.1 8B (Groq) |
| Labs & Vitals | Extracts numeric values from utterances (“My sugar was 240 this morning”) and flags out-of-range | Llama 3.1 8B (Groq) |
| Escalation | Detects red-flag symptoms (chest pain, severe hypoglycemia, DKA, suicidal ideation) and triggers human handoff | Llama 3.1 8B (Groq) |
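The Language specialist's core task, intra-utterance code-switch segmentation, amounts to grouping consecutive same-language words into labeled spans. In Constella that classification is done by an LLM; the tiny word-list classifier below is a hypothetical stand-in, shown only to make the span data structure concrete:

```python
from dataclasses import dataclass

@dataclass
class Span:
    lang: str   # "es" or "en"
    text: str

# Toy closed-class Spanish vocabulary, standing in for the LLM classifier.
SPANISH_HINTS = {"tomo", "el", "dos", "veces", "al", "día", "pero", "cuando"}

def segment(utterance: str) -> list[Span]:
    """Split an utterance into maximal same-language spans."""
    spans: list[Span] = []
    for word in utterance.split():
        lang = "es" if word.lower().strip("¿?,.!") in SPANISH_HINTS else "en"
        if spans and spans[-1].lang == lang:
            spans[-1].text += " " + word   # extend the current span
        else:
            spans.append(Span(lang, word))  # language switch: new span
    return spans
```

Running this on the example from the intro yields alternating es/en/es/en spans, which is exactly the shape the TTS router needs downstream.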
The orchestrator merges specialist verdicts before the primary’s turn is emitted. If escalation fires, the call is hard-transferred. If a med error is detected, the turn is rewritten. Otherwise the response flows to VibeVoice TTS (~300ms first audible latency) and back to the patient.
## Use case
Post-discharge follow-up calls for a Type 2 diabetic patient who code-switches between English and Spanish. Narrow on purpose — the goal is to prove the architecture works on a clinically meaningful scenario, not to ship a general-purpose agent.
## Why this matters
Bilingual patients code-switch, and healthcare voice AI that pretends otherwise fails the most linguistically diverse populations. Constella is a runnable, evaluable, open-source attempt to tackle the safety architecture and the linguistic edge cases at the same time.
## Try it live
Type a sentence that code-switches between English and Spanish. Constella detects each language span on the edge; your browser then plays the audio with per-span native voices.
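Per-span playback comes down to mapping each detected language span to a voice and batching adjacent spans that share one, so a single switch doesn't cost two TTS round trips. A minimal sketch, with hypothetical voice IDs (the real names depend on the VibeVoice deployment):

```python
# Hypothetical voice IDs; the actual mapping depends on the deployed voices.
VOICES = {"en": "en-US-nurse", "es": "es-MX-nurse"}

def route(spans: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Turn (lang, text) spans into (voice_id, text) TTS requests,
    merging adjacent spans that resolve to the same voice."""
    requests: list[tuple[str, str]] = []
    for lang, text in spans:
        voice = VOICES.get(lang, VOICES["en"])  # fall back to English voice
        if requests and requests[-1][0] == voice:
            requests[-1] = (voice, requests[-1][1] + " " + text)
        else:
            requests.append((voice, text))
    return requests
```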
## Run it locally
The project is open source. Clone the repo, install dependencies with `uv sync`, drop your API keys into `.env`, and run the CLI. The README walks you through each step.

```shell
git clone https://github.com/deepmind11/constella.git
cd constella
uv sync
cp .env.example .env  # add your keys
uv run python -m constella --help
```

## Domain
- Healthcare voice AI
- Bilingual / code-switching
- Patient safety
- Multi-agent verification