Quick Start

MedHELM is a public Python library with few dependencies and a straightforward installation. Install it from PyPI and run benchmarks with the uv workflow shown here, or use pip and helm-run if you prefer.

Standard tier scenarios: PubMedQA, MedCalc-Bench, MedicationQA, MedHallu.

uv pip install medhelm

Run a benchmark:

uv run medhelm-run \
  --run-entries "pubmed_qa:model=huggingface/qwen2.5-7b" \
  --suite my_med_test \
  --max-eval-instances 10
uv run helm-summarize --suite my_med_test
uv run helm-server --suite my_med_test

Then open http://localhost:8000/ in your browser.
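If you run the same three steps for several suites, it can help to generate the commands from one place. The following is a minimal sketch, not part of MedHELM: build_commands and render are hypothetical helper names, and the only flags used are the ones shown in the commands above.

```python
import shlex
from typing import List


def build_commands(run_entry: str, suite: str, max_eval_instances: int) -> List[List[str]]:
    """Return the run, summarize, and serve commands for one suite."""
    return [
        ["uv", "run", "medhelm-run",
         "--run-entries", run_entry,
         "--suite", suite,
         "--max-eval-instances", str(max_eval_instances)],
        ["uv", "run", "helm-summarize", "--suite", suite],
        ["uv", "run", "helm-server", "--suite", suite],
    ]


def render(commands: List[List[str]]) -> str:
    """Format the commands as copy-pasteable shell lines."""
    return "\n".join(shlex.join(cmd) for cmd in commands)


# Print the commands for the quick-start suite instead of executing them.
print(render(build_commands("pubmed_qa:model=huggingface/qwen2.5-7b", "my_med_test", 10)))
```

To execute the steps rather than print them, each list can be passed to subprocess.run(cmd, check=True). Keep in mind that the last command starts a server and is expected to keep running until you stop it.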

Clinical NLP tier (summarization)

Adds heavier libraries (bert-score, rouge-score, nltk). Install may take 2–3 minutes.

Scenarios: DischargeMe (hospital course summaries), ACI-Bench (clinical transcripts), Patient-Edu (simplifying medical jargon).

uv pip install "medhelm[summarization]"

Example:

uv run medhelm-run \
  --run-entries "discharge_summaries:model=huggingface/qwen2.5-7b" \
  --suite med_summaries \
  --max-eval-instances 5
uv run helm-summarize --suite med_summaries
uv run helm-server --suite med_summaries

Gated / licensing tier (Drive scenarios)

Adds gdown so that MedHELM can download scenario data from Google Drive. This install can also take longer than the standard one.

Scenarios: MedQA (USMLE/Board exams), MedMCQA (AIIMS/NEET exams).

uv pip install "medhelm[gated]"
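Before launching a summarization or gated run, you may want to confirm that the extra libraries are actually present in the active environment. This is a rough sketch using only the standard library. The import names (bert_score, rouge_score, nltk, gdown) are assumed from the package names listed above, and TIER_MODULES and missing_modules are hypothetical names, not MedHELM APIs.

```python
from importlib.util import find_spec
from typing import Dict, List

# Assumed import names for the libraries each optional tier adds.
TIER_MODULES: Dict[str, List[str]] = {
    "summarization": ["bert_score", "rouge_score", "nltk"],
    "gated": ["gdown"],
}


def missing_modules(tier: str) -> List[str]:
    """Return the tier's modules that cannot be found in this environment."""
    return [name for name in TIER_MODULES[tier] if find_spec(name) is None]


# Report which optional tiers look usable without importing the heavy libraries.
for tier in TIER_MODULES:
    missing = missing_modules(tier)
    status = "ready" if not missing else "missing " + ", ".join(missing)
    print(f"{tier}: {status}")
```

find_spec only locates a module without importing it, so the check stays fast even for the heavier summarization libraries.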

Example:

uv run medhelm-run \
  --run-entries "med_qa:model=huggingface/qwen2.5-7b" \
  --suite board_exams \
  --max-eval-instances 10
uv run helm-summarize --suite board_exams
uv run helm-server --suite board_exams

Summary

| Tier | Install | Scenarios |
| --- | --- | --- |
| Standard | uv pip install medhelm | PubMedQA, MedCalc-Bench, MedicationQA, MedHallu |
| Summarization | uv pip install "medhelm[summarization]" | DischargeMe, ACI-Bench, Patient-Edu (2–3 min install) |
| Gated | uv pip install "medhelm[gated]" | MedQA, MedMCQA (Drive) |

If you prefer pip, use pip install medhelm (or pip install "medhelm[summarization]" / pip install "medhelm[gated]") instead of uv pip install, then run medhelm-run (or helm-run), helm-summarize, and helm-server directly.
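The tier summary can also be expressed as a small lookup that tells you which install target covers a set of scenarios. This is an illustrative sketch only: TIER_SCENARIOS and install_spec are hypothetical names, scenarios are keyed by the display names used in this guide rather than by run-entry names, and combining extras as medhelm[gated,summarization] relies on standard pip extras syntax rather than anything stated here.

```python
from typing import Dict, List

# Scenario display names per tier, as listed in the summary.
TIER_SCENARIOS: Dict[str, List[str]] = {
    "standard": ["PubMedQA", "MedCalc-Bench", "MedicationQA", "MedHallu"],
    "summarization": ["DischargeMe", "ACI-Bench", "Patient-Edu"],
    "gated": ["MedQA", "MedMCQA"],
}


def install_spec(scenarios: List[str]) -> str:
    """Return the pip requirement that covers the requested scenarios."""
    extras = set()
    for scenario in scenarios:
        tier = next((t for t, names in TIER_SCENARIOS.items() if scenario in names), None)
        if tier is None:
            raise KeyError(f"unknown scenario: {scenario}")
        if tier != "standard":
            extras.add(tier)  # the standard tier needs no extra
    return "medhelm" + (f"[{','.join(sorted(extras))}]" if extras else "")


# Example: a mix of gated and summarization scenarios.
print(install_spec(["MedQA", "DischargeMe"]))
```

Pass the returned string, quoted, to uv pip install or pip install.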