# ClinicOps Copilot
Agentic operations layer for a synthetic clinic.
## The problem
Most healthcare AI demos on GitHub show what a LangChain tutorial does in a notebook. They don’t show what shipping an agent into a clinical operations workflow actually requires:
- Integrating with messy real-world systems (EHR, scheduling, billing, coverage)
- Instrumenting every tool call so an ops team can audit what the AI did
- Shipping a CLI someone’s IT team can actually run on their laptop
- Proving correctness with an eval harness that runs on every PR
- Handing over a one-command deploy to a cloud account
ClinicOps Copilot is an attempt at the full end-to-end version of that.
## The approach
Three Claude agents operate over a synthetic FHIR R4 PostgreSQL database seeded by Synthea. A FastAPI gateway routes requests to the right agent. Every tool call streams to a SQLite events store. A Streamlit dashboard reads from the events store in real time so the ops team can see exactly what the AI is doing.
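As a rough illustration of the events-store write path, here is a minimal sketch. The table name, columns, and agent/tool names are illustrative assumptions, not the project's actual schema:

```python
import json
import sqlite3
import time

def init_events_store(path: str) -> sqlite3.Connection:
    # Illustrative schema: one row per tool call, tailed by the dashboard.
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS events (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               ts REAL NOT NULL,
               agent TEXT NOT NULL,
               tool TEXT NOT NULL,
               args TEXT NOT NULL,
               result TEXT
           )"""
    )
    return conn

def log_tool_call(conn, agent: str, tool: str, args: dict, result) -> None:
    # Serialize args/result as JSON so the ops team can audit exact payloads.
    conn.execute(
        "INSERT INTO events (ts, agent, tool, args, result) VALUES (?, ?, ?, ?, ?)",
        (time.time(), agent, tool, json.dumps(args), json.dumps(result)),
    )
    conn.commit()

conn = init_events_store(":memory:")
log_tool_call(conn, "scheduler", "find_open_slots",
              {"provider_id": "prov-1", "date": "2025-03-14"},
              [{"slot": "09:00"}, {"slot": "10:30"}])
rows = conn.execute("SELECT agent, tool FROM events").fetchall()
```

Because every row is plain SQLite, the dashboard needs nothing more exotic than a `SELECT` in a polling loop.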
| Agent | Status | Role | Tool calls |
|---|---|---|---|
| Scheduler | shipped | Books, reschedules, cancels appointments. Handles double-bookings, slot conflicts, provider availability. | find_open_slots, book_appointment, cancel_appointment, lookup_patient |
| Eligibility | shipped | Checks insurance coverage status from FHIR Coverage resource. Flags expired plans, missing prior auth. | lookup_coverage, check_active_period, get_payor_rules |
| Triage | shipped | Routes new patient intents to the right downstream agent or human. Handles Spanish code-switching. | classify_intent, route_to_agent, escalate_to_human |
A fourth Billing/RCM agent is planned for Phase 2.
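To make the tool surface concrete, here is what one Scheduler tool might look like declared in OpenAI function-calling format. The parameter names (`patient_id`, `provider_id`, `slot_start`) are hypothetical, not the project's actual signatures:

```python
# Hypothetical declaration of the book_appointment tool; real parameters may differ.
BOOK_APPOINTMENT_TOOL = {
    "type": "function",
    "function": {
        "name": "book_appointment",
        "description": "Book an appointment slot for a patient with a provider.",
        "parameters": {
            "type": "object",
            "properties": {
                "patient_id": {
                    "type": "string",
                    "description": "FHIR Patient resource id",
                },
                "provider_id": {
                    "type": "string",
                    "description": "FHIR Practitioner resource id",
                },
                "slot_start": {
                    "type": "string",
                    "format": "date-time",
                    "description": "Start of the slot to book",
                },
            },
            "required": ["patient_id", "provider_id", "slot_start"],
        },
    },
}
```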
## Architecture decisions worth defending
- No LangChain, no LlamaIndex. Custom tool-use loop on the OpenAI Python SDK pointed at OpenRouter. Single provider, no provider-switching code paths, no abstraction tax.
- FHIR R4 over a custom schema. Any real clinic already has FHIR. Meeting reality where it is matters more than schema cleverness.
- SQLite events store, not a vendor observability platform. Local-first observability survives environments where Datadog and friends don’t reach.
- Terraform module, not a Helm chart. Infra teams in clinical settings are more comfortable with Terraform. Optimize for their on-call rotation, not for engineering taste.
- Eval harness on every PR. 20 golden test cases (booking conflicts, coverage edge cases, Spanish code-switching) are the contract. If an eval fails, the build fails. No clever LLM-as-judge, just deterministic pass/fail on tool call sequences.
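The deterministic pass/fail check can be sketched in a few lines: compare the agent's recorded tool-call sequence against a golden sequence, exact match, in order. The golden case below is a made-up example, not one of the project's 20 cases:

```python
def eval_case(expected: list[dict], actual: list[dict]) -> bool:
    """Deterministic pass/fail: the agent must call exactly the
    expected tools, in order, with the expected arguments."""
    return expected == actual

# Hypothetical golden case for a booking flow.
golden = [
    {"tool": "lookup_patient", "args": {"name": "Maria Gomez"}},
    {"tool": "find_open_slots", "args": {"provider_id": "prov-1"}},
    {"tool": "book_appointment", "args": {"slot": "09:00"}},
]

trace_ok = list(golden)       # agent did exactly the right thing
trace_bad = golden[:2]        # agent stopped before booking: fail

results = [eval_case(golden, trace_ok), eval_case(golden, trace_bad)]
```

Exact sequence equality is deliberately strict: any regression in tool choice, ordering, or arguments fails the build, with no judge model in the loop.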
## Why this matters
The interesting work in clinical AI isn’t the model. It’s the infrastructure around the model: tool-use loops that work under failure, observability that survives air-gapped environments, evals that catch silent degradations. ClinicOps Copilot is my attempt at that whole stack, end to end, for a realistic clinical workflow.
## Try it live
Ask a natural-language question about a synthetic clinical operations dataset. The copilot classifies intent, generates SQL, runs it against a seeded SQLite database, and summarizes the result.
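One detail worth sketching is the "runs it" step: model-generated SQL should only ever touch the seeded database read-only. A minimal sketch, assuming a file-backed SQLite database (the `appointments` table and seed rows below are stand-ins, not the demo dataset):

```python
import sqlite3
import tempfile

def run_readonly_sql(db_path: str, sql: str, limit: int = 50):
    """Execute model-generated SQL against the seeded database in
    read-only mode (file: URI with mode=ro) so it cannot mutate data."""
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        cur = conn.execute(sql)
        cols = [d[0] for d in cur.description]
        return cols, cur.fetchmany(limit)
    finally:
        conn.close()

# Seed a throwaway database standing in for the demo dataset.
db = tempfile.NamedTemporaryFile(suffix=".db", delete=False).name
seed = sqlite3.connect(db)
seed.execute("CREATE TABLE appointments (id INTEGER, status TEXT)")
seed.executemany("INSERT INTO appointments VALUES (?, ?)",
                 [(1, "booked"), (2, "cancelled"), (3, "booked")])
seed.commit()
seed.close()

cols, rows = run_readonly_sql(
    db, "SELECT status, COUNT(*) AS n FROM appointments GROUP BY status")
```

Opening with `mode=ro` means even a hostile `DROP TABLE` from the model is rejected by SQLite itself, rather than by prompt-level guardrails.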
## Run it locally
The project is open source. Clone the repo, install dependencies with `uv sync`, drop your API keys into `.env`, and run the CLI. The README walks you through each step.
```shell
git clone https://github.com/deepmind11/clinic-ops-copilot.git
cd clinic-ops-copilot
uv sync
cp .env.example .env  # add your keys
uv run python -m clinic_ops_copilot --help
```

## Domain
- Healthcare operations
- FHIR R4
- Clinical AI deployment