Med Veda
A private medical assistant that runs entirely on your phone — MedGemma 1.5 4B reading X-rays and answering questions over patient records, with zero data leaving the device.

Built for the Kaggle Google MedGemma Challenge — a fully offline, multimodal clinical assistant on Android, with longitudinal patient records and English / Hindi / Telugu output, powered by a custom llama.cpp backend.
The problem
Clinical AI almost always means sending patient data to a server. In a hospital that is two problems at once: privacy — protected health information (PHI) leaving the device — and latency, when you need a patient's history or a read on an X-ray now, not after a round trip to the cloud.
I wanted to find out how far a small, specialised model could go if it never left the phone.
What I built
Med Veda is an Android-first medical assistant that runs the MedGemma 1.5 4B multimodal model completely on-device:
- Chat with patient records — ask natural-language questions over a patient's longitudinal history, stored locally.
- Read X-rays — show a chest X-ray and ask for a structured read; the model responds across heart size, lung fields, bones, and mediastinum.
- Stays private — 100% local execution, so PHI never leaves the device.
- Speaks the patient's language — generates output in English, Hindi, and Telugu for accessibility.
- Voice input — dictate symptoms instead of typing.
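
To make the "chat with patient records" idea concrete, here is a minimal Python sketch of how a locally stored longitudinal history could be turned into a prompt. It is not the app's actual code: the SQLite schema, the function names (`open_store`, `add_record`, `build_prompt`), and the prompt wording are all assumptions made for illustration.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a local record store. The schema is hypothetical."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS records ("
        "patient_id TEXT NOT NULL, "
        "visit_date TEXT NOT NULL, "  # ISO 8601 dates sort correctly as text
        "note TEXT NOT NULL)"
    )
    return db


def add_record(db, patient_id, visit_date, note):
    """Append one dated note to a patient's history."""
    db.execute(
        "INSERT INTO records VALUES (?, ?, ?)",
        (patient_id, visit_date, note),
    )
    db.commit()


def build_prompt(db, patient_id, question, language="English"):
    """Collect one patient's notes, oldest first, and wrap them in a prompt."""
    rows = db.execute(
        "SELECT visit_date, note FROM records "
        "WHERE patient_id = ? ORDER BY visit_date",
        (patient_id,),
    ).fetchall()
    history = "\n".join(f"[{date}] {note}" for date, note in rows)
    return (
        f"Patient history (oldest first):\n{history}\n\n"
        f"Question: {question}\n"
        f"Answer in {language}."
    )
```

Because both the store and the model live on the device, the prompt built here never has to cross a network boundary.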
Running a 4B multimodal model on a phone
The hard part was inference. I moved off MediaPipe onto a custom llama.cpp
backend that runs Q4_K_M GGUF weights directly on Android, with multimodal
support through the medically tuned SigLIP vision encoder. The ~2.8 GB model
downloads on first launch via a background service, then everything runs
offline. Tested on a Qualcomm Innovator Development Kit (Snapdragon 8 Elite
Gen 5) and a Pixel 7 Pro.
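
The "download once, then run offline" behaviour can be sketched as below. This is a language-neutral illustration in Python, not the Android background service itself. `ensure_model` and the `fetch` callable are hypothetical names. The sketch assumes the download is written to a temporary file and renamed into place, so that an interrupted download is never mistaken for a complete model.

```python
import os
import tempfile


def ensure_model(model_path, fetch):
    """Return model_path, downloading only if the file is not already there.

    `fetch` is a hypothetical callable that writes the model bytes into the
    open binary file it receives (for example, a wrapper around an HTTP GET).
    """
    if os.path.exists(model_path):
        return model_path  # later launches need no network access

    directory = os.path.dirname(os.path.abspath(model_path))
    os.makedirs(directory, exist_ok=True)

    # Write to a temporary file in the same directory, then rename it.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as tmp:
            fetch(tmp)
        os.replace(tmp_path, model_path)  # atomic within one filesystem
    except BaseException:
        # Clean up the partial file so the next launch retries the download.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return model_path
```

A real service would likely also verify a checksum and support resuming, which this sketch leaves out.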
Fine-tuning for the clinic
To get consistent, clinically shaped answers I fine-tuned with QLoRA
(rank 32 on the q_proj / v_proj attention matrices) to teach SOAP-style
structure and vernacular translation, then converted the Hugging Face weights to
Q4_K_M GGUF for the edge.
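
To give a feel for how small a rank-32 adapter on q_proj and v_proj is, the arithmetic can be sketched in Python. LoRA adds two low-rank factors per adapted matrix, so a matrix mapping `d_in` to `d_out` gains `rank * (d_in + d_out)` trainable parameters. The layer shapes below are assumptions chosen for illustration and count only language-model decoder layers; the real values should be read from the model's configuration.

```python
def lora_params(d_in, d_out, rank):
    """Trainable parameters LoRA adds to one d_in x d_out weight matrix.

    LoRA learns two factors: A with shape (rank, d_in) and
    B with shape (d_out, rank), so the total is rank * (d_in + d_out).
    """
    return rank * (d_in + d_out)


# ASSUMED shapes, for illustration only; check the actual model config.
HIDDEN = 2560   # input width of the attention projections
Q_OUT = 2048    # output width of q_proj
V_OUT = 1024    # output width of v_proj
LAYERS = 34     # number of decoder layers adapted
RANK = 32       # LoRA rank stated in the write-up

per_layer = lora_params(HIDDEN, Q_OUT, RANK) + lora_params(HIDDEN, V_OUT, RANK)
total = per_layer * LAYERS
print(f"{total:,} trainable adapter parameters")
```

Under these assumed shapes the adapter comes to a few million parameters, a small fraction of a 4B-parameter base model, which is what makes this kind of fine-tuning practical on modest hardware.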
What I learned
- On-device multimodal is genuinely viable now. The ceiling is memory and careful quantization, not raw capability.
- Privacy can be an architectural property, not a policy. "No egress" is something you get for free once the model runs locally — and that changes what's possible in regulated settings.