Status: active · Started: 2026-04-07

Constella

Polaris-style voice constellation with native English-Spanish code-switching.

Microsoft VibeVoice (ASR + TTS) · Llama 3.1 70B + 8B (Groq) · LangGraph · Polaris-style constellation · Python 3.11 · uv

The problem

Bilingual patients, especially in U.S. Latino populations, regularly mix English and Spanish mid-sentence (“¿Tomo el inhaler dos veces al día, but only when I feel short of breath?”). Current voice AI systems don’t handle this well. They either pick a language per utterance (bad) or drift between voices mid-span (jarring). Intra-utterance code-switching is a real engineering problem, and most off-the-shelf TTS stacks treat it as an afterthought.

Around the same time I was reading about this, Microsoft released VibeVoice, the first open-source ASR model that natively handles code-switching across 50+ languages with no language parameter required.

I wanted to know: what happens if you build a safety-verifier constellation entirely on VibeVoice?

The approach

Constella is that answer, built in two days. The architecture is one stateful primary conversational agent (Llama 3.1 70B as "Nurse Constella") flanked by four parallel specialist verifiers that fan out after every primary turn:

| Specialist | Role | Model |
|---|---|---|
| Language | Per-utterance language ID + intra-utterance code-switch segmentation; routes downstream prompts to ES, EN, or MIX | Llama 3.1 8B (Groq) |
| Medication | Verifies dosage, timing, and contraindications against the patient's med list | Llama 3.1 8B (Groq) |
| Labs & Vitals | Extracts numeric values from utterances ("My sugar was 240 this morning") and flags out-of-range readings | Llama 3.1 8B (Groq) |
| Escalation | Detects red-flag symptoms (chest pain, severe hypoglycemia, DKA, suicidal ideation) and triggers human handoff | Llama 3.1 8B (Groq) |
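The project's actual verdict schema isn't shown on this page; as a sketch, each specialist could return a small structured verdict for the orchestrator to merge. Every name below (`Verdict`, `Action`, the field names) is an assumption for illustration, not Constella's API:

```python
from dataclasses import dataclass, field
from enum import Enum


class Action(Enum):
    """What the orchestrator should do with the primary's draft turn."""
    PASS = "pass"          # emit the turn unchanged
    REWRITE = "rewrite"    # replace the turn (e.g. a medication error was caught)
    ESCALATE = "escalate"  # hard-transfer the call to a human


@dataclass
class Verdict:
    specialist: str            # "language" | "medication" | "labs" | "escalation"
    action: Action = Action.PASS
    reason: str = ""           # human-readable justification for the action
    # (lang, text) pairs; only the language specialist would populate this
    spans: list[tuple[str, str]] = field(default_factory=list)
```

A fixed, typed verdict shape makes the merge step trivial: the orchestrator only needs to look at the most severe `Action` across the four verdicts.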

The orchestrator merges specialist verdicts before the primary’s turn is emitted. If escalation fires, the call is hard-transferred. If a med error is detected, the turn is rewritten. Otherwise the response flows to VibeVoice TTS (~300ms first audible latency) and back to the patient.
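The fan-out/merge gate described above can be sketched in plain asyncio. The real project uses LangGraph and Groq-backed Llama 3.1 8B calls; the stub specialists and function names here are hypothetical stand-ins:

```python
import asyncio

# Hypothetical stand-ins for the four Groq-backed specialist verifiers.
async def language_id(turn: str) -> dict:
    return {"action": "pass"}

async def med_check(turn: str) -> dict:
    return {"action": "pass"}

async def labs_check(turn: str) -> dict:
    return {"action": "pass"}

async def escalation_check(turn: str) -> dict:
    # Toy red-flag matcher; the real specialist is an LLM call.
    red_flags = ("chest pain", "dolor de pecho")
    hit = any(flag in turn.lower() for flag in red_flags)
    return {"action": "escalate" if hit else "pass"}

async def gate_turn(draft: str) -> str:
    # Fan out: all four verifiers run in parallel on the primary's draft turn.
    verdicts = await asyncio.gather(
        language_id(draft), med_check(draft),
        labs_check(draft), escalation_check(draft),
    )
    # Merge: the most severe verdict wins.
    actions = {v["action"] for v in verdicts}
    if "escalate" in actions:
        return "ESCALATE"   # hard-transfer the call
    if "rewrite" in actions:
        return "REWRITE"    # orchestrator rewrites the turn
    return "PASS"           # turn flows on to VibeVoice TTS

print(asyncio.run(gate_turn("Me duele el pecho, chest pain since this morning")))
# → ESCALATE
```

Running the verifiers concurrently matters here: four sequential 8B calls would blow past a conversational latency budget, while a parallel fan-out costs roughly one call's worth of wall-clock time.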

Use case

Post-discharge follow-up calls for a Type 2 diabetic patient who code-switches between English and Spanish. Narrow on purpose — the goal is to prove the architecture works on a clinically meaningful scenario, not to ship a general-purpose agent.

Why this matters

Bilingual patients code-switch, and healthcare voice AI that pretends otherwise fails the most linguistically diverse populations. Constella is a runnable, evaluable, open-source attempt to tackle both the safety architecture and the linguistic edge cases at once.

Try it live

Try Constella

Type a sentence that code-switches between English and Spanish. Constella detects each language span on the edge; your browser then plays the audio with per-span native voices.
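One plausible shape for the per-span payload the edge could hand to the browser — the field names and segmentation below are illustrative assumptions, not the demo's actual API:

```python
# Hypothetical edge response for a code-switched utterance: each span carries
# the detected language so the browser can pick the matching native voice.
segments = [
    {"lang": "es", "text": "¿Tomo el"},
    {"lang": "en", "text": "inhaler"},
    {"lang": "es", "text": "dos veces al día,"},
    {"lang": "en", "text": "but only when I feel short of breath?"},
]

for seg in segments:
    print(f'[{seg["lang"]}] {seg["text"]}')
```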


Run it locally

Every project is open source. Clone the repo, install dependencies with uv sync, drop your API keys into .env, and run the CLI. The README walks you through each step.

git clone https://github.com/deepmind11/constella.git
cd constella
uv sync
cp .env.example .env  # add your keys
uv run python -m constella --help

Domain

  • Healthcare voice AI
  • Bilingual / code-switching
  • Patient safety
  • Multi-agent verification

Tech stack

Microsoft VibeVoice (ASR + TTS) · Llama 3.1 70B + 8B (Groq) · LangGraph · Polaris-style constellation · Python 3.11 · uv