# ClinicOps Copilot
Agentic operations layer for a synthetic clinic.
## The problem
Most healthcare AI demos on GitHub show what a LangChain tutorial does in a notebook. They don’t show what shipping an agent into a clinical operations workflow actually requires:
- Integrating with messy real-world systems (EHR, scheduling, billing, coverage)
- Instrumenting every tool call so an ops team can audit what the AI did
- Shipping a CLI someone’s IT team can actually run on their laptop
- Proving correctness with an eval harness that runs on every PR
- Handing over a one-command deploy to a cloud account
ClinicOps Copilot is an attempt at the full end-to-end version of that.
## The approach
Three Claude agents operate over a synthetic FHIR R4 PostgreSQL database seeded by Synthea. A FastAPI gateway routes requests to the right agent. Every tool call streams to a SQLite events store. A Streamlit dashboard reads from the events store in real time so the ops team can see exactly what the AI is doing.
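As a rough illustration of the events-store write path, here is a minimal sketch. The table name, columns, and agent/tool names are illustrative assumptions, not the project's actual schema:

```python
import json
import sqlite3
import time

def init_events_store(path: str) -> sqlite3.Connection:
    # Illustrative schema: one row per tool call, tailed by the dashboard.
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS events (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               ts REAL NOT NULL,
               agent TEXT NOT NULL,
               tool TEXT NOT NULL,
               args TEXT NOT NULL,
               result TEXT
           )"""
    )
    return conn

def log_tool_call(conn, agent: str, tool: str, args: dict, result) -> None:
    # Serialize args/result as JSON so the ops team can audit exact payloads.
    conn.execute(
        "INSERT INTO events (ts, agent, tool, args, result) VALUES (?, ?, ?, ?, ?)",
        (time.time(), agent, tool, json.dumps(args), json.dumps(result)),
    )
    conn.commit()

conn = init_events_store(":memory:")
log_tool_call(conn, "scheduler", "find_open_slots",
              {"provider_id": "prov-1", "date": "2025-03-14"},
              [{"slot": "09:00"}, {"slot": "10:30"}])
rows = conn.execute("SELECT agent, tool FROM events").fetchall()
```

Because every row is plain SQLite, the dashboard needs nothing more exotic than a `SELECT` in a polling loop.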
| Agent | Status | Role | Tool calls |
|---|---|---|---|
| Scheduler | shipped | Books, reschedules, cancels appointments. Handles double-bookings, slot conflicts, provider availability. | find_open_slots, book_appointment, cancel_appointment, lookup_patient |
| Eligibility | shipped | Checks insurance coverage status from FHIR Coverage resource. Flags expired plans, missing prior auth. | lookup_coverage, check_active_period, get_payor_rules |
| Triage | shipped | Routes new patient intents to the right downstream agent or human. Handles Spanish code-switching. | classify_intent, route_to_agent, escalate_to_human |
A fourth Billing/RCM agent is planned for Phase 2.
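To make the tool surface concrete, here is what one Scheduler tool might look like declared in OpenAI function-calling format. The parameter names (`patient_id`, `provider_id`, `slot_start`) are hypothetical, not the project's actual signatures:

```python
# Hypothetical declaration of the book_appointment tool; real parameters may differ.
BOOK_APPOINTMENT_TOOL = {
    "type": "function",
    "function": {
        "name": "book_appointment",
        "description": "Book an appointment slot for a patient with a provider.",
        "parameters": {
            "type": "object",
            "properties": {
                "patient_id": {
                    "type": "string",
                    "description": "FHIR Patient resource id",
                },
                "provider_id": {
                    "type": "string",
                    "description": "FHIR Practitioner resource id",
                },
                "slot_start": {
                    "type": "string",
                    "format": "date-time",
                    "description": "Start of the slot to book",
                },
            },
            "required": ["patient_id", "provider_id", "slot_start"],
        },
    },
}
```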
## Architecture decisions worth defending
- No LangChain, no LlamaIndex. Custom tool-use loop on the OpenAI Python SDK pointed at OpenRouter. Single provider, no provider-switching code paths, no abstraction tax.
- FHIR R4 over a custom schema. Any real clinic already has FHIR. Meeting reality where it is matters more than schema cleverness.
- SQLite events store, not a vendor observability platform. Local-first observability survives environments where Datadog and friends don’t reach.
- Terraform module, not a Helm chart. Infra teams in clinical settings are more comfortable with Terraform. Optimize for their on-call rotation, not for engineering taste.
- Eval harness on every PR. 20 golden test cases (booking conflicts, coverage edge cases, Spanish code-switching) are the contract. If an eval fails, the build fails. No clever LLM-as-judge, just deterministic pass/fail on tool call sequences.
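The deterministic pass/fail check can be sketched in a few lines: compare the agent's recorded tool-call sequence against a golden sequence, exact match, in order. The golden case below is a made-up example, not one of the project's 20 cases:

```python
def eval_case(expected: list[dict], actual: list[dict]) -> bool:
    """Deterministic pass/fail: the agent must call exactly the
    expected tools, in order, with the expected arguments."""
    return expected == actual

# Hypothetical golden case for a booking flow.
golden = [
    {"tool": "lookup_patient", "args": {"name": "Maria Gomez"}},
    {"tool": "find_open_slots", "args": {"provider_id": "prov-1"}},
    {"tool": "book_appointment", "args": {"slot": "09:00"}},
]

trace_ok = list(golden)       # agent did exactly the right thing
trace_bad = golden[:2]        # agent stopped before booking: fail

results = [eval_case(golden, trace_ok), eval_case(golden, trace_bad)]
```

Exact sequence equality is deliberately strict: any regression in tool choice, ordering, or arguments fails the build, with no judge model in the loop.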
## Why this matters
The interesting work in clinical AI isn’t the model. It’s the infrastructure around the model: tool-use loops that work under failure, observability that survives air-gapped environments, evals that catch silent degradations. ClinicOps Copilot is my attempt at that whole stack, end to end, for a realistic clinical workflow.
## Try it live
Ask a natural-language question about a synthetic clinical operations dataset. The copilot classifies intent, generates SQL, runs it against a seeded SQLite database, and summarizes the result.
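One detail worth sketching is the "runs it" step: model-generated SQL should only ever touch the seeded database read-only. A minimal sketch, assuming a file-backed SQLite database (the `appointments` table and seed rows below are stand-ins, not the demo dataset):

```python
import sqlite3
import tempfile

def run_readonly_sql(db_path: str, sql: str, limit: int = 50):
    """Execute model-generated SQL against the seeded database in
    read-only mode (file: URI with mode=ro) so it cannot mutate data."""
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        cur = conn.execute(sql)
        cols = [d[0] for d in cur.description]
        return cols, cur.fetchmany(limit)
    finally:
        conn.close()

# Seed a throwaway database standing in for the demo dataset.
db = tempfile.NamedTemporaryFile(suffix=".db", delete=False).name
seed = sqlite3.connect(db)
seed.execute("CREATE TABLE appointments (id INTEGER, status TEXT)")
seed.executemany("INSERT INTO appointments VALUES (?, ?)",
                 [(1, "booked"), (2, "cancelled"), (3, "booked")])
seed.commit()
seed.close()

cols, rows = run_readonly_sql(
    db, "SELECT status, COUNT(*) AS n FROM appointments GROUP BY status")
```

Opening with `mode=ro` means even a hostile `DROP TABLE` from the model is rejected by SQLite itself, rather than by prompt-level guardrails.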
## Run it locally
The project is open source. Clone the repo, install dependencies with `uv sync`, drop your API keys into `.env`, and run the CLI. The README walks you through each step.
```shell
git clone https://github.com/deepmind11/clinic-ops-copilot.git
cd clinic-ops-copilot
uv sync
cp .env.example .env  # add your keys
uv run python -m clinic_ops_copilot --help
```

## Domain
- Healthcare operations
- FHIR R4
- Clinical AI deployment