# Constella

Polaris-style voice constellation with native English-Spanish code-switching.
## The problem
Bilingual patients, especially in U.S. Latino populations, regularly mix English and Spanish mid-sentence (“¿Tomo el inhaler dos veces al día, but only when I feel short of breath?”). Current voice AI systems don’t handle this well. They either pick a language per utterance (bad) or drift between voices mid-span (jarring). Intra-utterance code-switching is a real engineering problem, and most off-the-shelf TTS stacks treat it as an afterthought.
Around the same time I was reading about this, Microsoft released VibeVoice, the first open-source TTS model that natively handles code-switching across 50+ languages with no language parameter required.
I wanted to know: what happens if you build a safety-verifier constellation entirely on VibeVoice?
## The approach
Constella is the answer, built in two days. The architecture is one stateful primary conversational agent (Llama 3.1 70B as “Nurse Constella”) flanked by four parallel specialist verifiers that fan out after every primary turn:
| Specialist | Role | Model |
|---|---|---|
| Language | Per-utterance language ID + intra-utterance code-switch segmentation. Routes downstream prompts to ES, EN, or MIX | Llama 3.1 8B (Groq) |
| Medication | Verifies dosage, timing, contraindications against the patient’s med list | Llama 3.1 8B (Groq) |
| Labs & Vitals | Extracts numeric values from utterances (“My sugar was 240 this morning”) and flags out-of-range | Llama 3.1 8B (Groq) |
| Escalation | Detects red-flag symptoms (chest pain, severe hypoglycemia, DKA, suicidal ideation) and triggers human handoff | Llama 3.1 8B (Groq) |
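The Language specialist's core task, intra-utterance code-switch segmentation, amounts to grouping consecutive same-language words into labeled spans. In Constella that classification is done by an LLM; the tiny word-list classifier below is a hypothetical stand-in, shown only to make the span data structure concrete:

```python
from dataclasses import dataclass

@dataclass
class Span:
    lang: str   # "es" or "en"
    text: str

# Toy closed-class Spanish vocabulary, standing in for the LLM classifier.
SPANISH_HINTS = {"tomo", "el", "dos", "veces", "al", "día", "pero", "cuando"}

def segment(utterance: str) -> list[Span]:
    """Split an utterance into maximal same-language spans."""
    spans: list[Span] = []
    for word in utterance.split():
        lang = "es" if word.lower().strip("¿?,.!") in SPANISH_HINTS else "en"
        if spans and spans[-1].lang == lang:
            spans[-1].text += " " + word   # extend the current span
        else:
            spans.append(Span(lang, word))  # language switch: new span
    return spans
```

Running this on the example from the intro yields alternating es/en/es/en spans, which is exactly the shape the TTS router needs downstream.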
The orchestrator merges specialist verdicts before the primary’s turn is emitted. If escalation fires, the call is hard-transferred. If a med error is detected, the turn is rewritten. Otherwise the response flows to VibeVoice TTS (~300ms first audible latency) and back to the patient.
## Use case
Post-discharge follow-up calls for a Type 2 diabetic patient who code-switches between English and Spanish. Narrow on purpose — the goal is to prove the architecture works on a clinically meaningful scenario, not to ship a general-purpose agent.
## Why this matters
Bilingual patients code-switch, and healthcare voice AI that pretends otherwise fails the most linguistically diverse populations. Constella is a runnable, evaluable, open-source attempt to tackle the safety architecture and the linguistic edge cases at the same time.
## Try it live
Type a sentence that code-switches between English and Spanish. Constella detects each language span on the edge; your browser then plays the audio with per-span native voices.
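Per-span playback comes down to mapping each detected language span to a voice and batching adjacent spans that share one, so a single switch doesn't cost two TTS round trips. A minimal sketch, with hypothetical voice IDs (the real names depend on the VibeVoice deployment):

```python
# Hypothetical voice IDs; the actual mapping depends on the deployed voices.
VOICES = {"en": "en-US-nurse", "es": "es-MX-nurse"}

def route(spans: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Turn (lang, text) spans into (voice_id, text) TTS requests,
    merging adjacent spans that resolve to the same voice."""
    requests: list[tuple[str, str]] = []
    for lang, text in spans:
        voice = VOICES.get(lang, VOICES["en"])  # fall back to English voice
        if requests and requests[-1][0] == voice:
            requests[-1] = (voice, requests[-1][1] + " " + text)
        else:
            requests.append((voice, text))
    return requests
```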
## Run it locally
The project is open source. Clone the repo, install dependencies with `uv sync`, drop your API keys into `.env`, and run the CLI. The README walks you through each step.

```shell
git clone https://github.com/deepmind11/constella.git
cd constella
uv sync
cp .env.example .env  # add your keys
uv run python -m constella --help
```

## Domain
- Healthcare voice AI
- Bilingual / code-switching
- Patient safety
- Multi-agent verification