Project - Priyanka Nidadavolu

01 - The Problem

Why one database isn't enough for drug safety

When a clinician prescribes medication they need to simultaneously answer three different questions: Is this drug appropriate for this patient's history? (structured EHR data), Does it interact with their other medications? (graph relationships), and What did real patients experience? (adverse event narratives). No single database paradigm handles all three well.

Traditional systems force clinicians to query multiple disconnected tools and mentally synthesise the results - a cognitive burden that contributes to prescribing errors. MediDB collapses this into one query, one report.

Data sources: Synthea synthetic EHR (PostgreSQL) · DrugBank + SIDER drug interactions (Neo4j) · openFDA FAERS adverse events (MongoDB + Qdrant vectors) · BioLORD-2023 clinical embeddings for semantic search.

02 - Architecture

Four databases, one orchestration layer

Each database is chosen for its paradigm strength. The orchestration layer sends parallel queries, collects results, and merges them into a structured safety report with a risk level, interaction insights, and alternative drug suggestions.

# Multi-step retrieval pipeline
def evaluate_drug_safety(patient_id, drug_name):
    # 1. Patient context from PostgreSQL
    patient = pg_client.get_patient_history(patient_id)
    current_meds = patient["medications"]

    # 2. Interaction graph from Neo4j
    interactions = neo4j_client.find_interactions(drug_name, current_meds)

    # 3. Similar adverse event cases from Qdrant
    embedding = bioLORD.encode(f"{drug_name} {patient['conditions']}")
    similar_cases = qdrant.search(embedding, top_k=10,
                                  filters={"severity": "serious"})

    # 4. FAERS raw reports from MongoDB for audit trail
    faers_reports = mongo.get_reports(drug_name)

    return build_safety_report(patient, interactions, similar_cases, faers_reports)

Multi-DB Architecture Design

Designed the paradigm selection: PostgreSQL for relational patient queries, Neo4j for drug interaction traversal, Qdrant for semantic case retrieval, MongoDB for FAERS raw storage + audit logs.

Graph Modeling in Neo4j

Modelled DrugBank interaction data as a property graph - drugs as nodes, interactions as typed edges with severity weights. Cypher traversal detects polypharmacy clusters and direct drug-drug conflicts.

BioLORD Vector Search

Used BioLORD-2023 (a biomedical sentence transformer) to embed clinical narratives from FAERS. Qdrant's multi-filter search retrieves top-k real-world adverse event cases matching drug, severity, and patient demographics.

ETL Pipeline

Three separate ETL jobs: Synthea EHR → PostgreSQL normalised schema; FAERS CSV → MongoDB raw + aggregated; SIDER side-effect data → Neo4j drug-symptom edges. All containerised with Docker Compose.

Streamlit Clinical UI

Four-tab interface: Safety Check (main report), Patient Data, Drug Knowledge Graph, Evidence Audit. The safety report shows risk level (Low/Medium/High), interaction details, and alternative drug suggestions.

Traceability & Audit

Every safety report is logged to MongoDB with timestamps, source citations, and the specific evidence used - making every recommendation fully traceable and auditable for clinical accountability.

03 - Challenges

Hard problems we solved

Challenge

FAERS data quality

The FDA adverse event dataset has severe inconsistencies - drug name variations, missing fields, duplicate reports. We built a normalisation layer that standardises drug names via RxNorm lookups before storage.

Challenge

Cross-DB result merging

Merging results from four different data models (relational, graph, vector, document) into one coherent report required careful schema design for the orchestration layer's output format.

Challenge

Embedding quality

Generic sentence embeddings performed poorly on clinical text. BioLORD-2023 (fine-tuned on biomedical literature) gave substantially better semantic similarity results for adverse event retrieval.

Challenge

Docker orchestration

Getting four databases + the Python app to initialise in the correct order with health checks and retry logic required careful Docker Compose dependency management.

04 - Stack

Technologies Used

Component	Tool & Purpose
Relational DB	PostgreSQL - normalised EHR data with Synthea synthetic patient records
Graph DB	Neo4j + Cypher - drug interaction network, polypharmacy detection
Vector DB	Qdrant - semantic similarity search over BioLORD clinical embeddings
Document DB	MongoDB - FAERS adverse event reports + audit log storage
Embeddings	BioLORD-2023 (Sentence Transformers) - biomedical clinical text encoding
UI	Streamlit - interactive clinical decision-support interface
Infrastructure	Docker Compose - containerised multi-database deployment
Data Sources	Synthea · DrugBank · SIDER · openFDA FAERS · RxNorm

05 - Outcomes

Results

database paradigms integrated into one query

Full

audit traceability for every safety report

unified report replacing 4 separate tool queries

Eliminated the need for clinicians to manually cross-reference multiple systems
BioLORD embeddings substantially outperformed generic embeddings for adverse event retrieval
Full Docker Compose deployment - any institution can spin up the system in minutes
Audit log provides complete provenance for every recommendation made by the system

06 - Learnings

What I took away

Choosing the right database paradigm for each data type is more impactful than optimising within a single paradigm - graph traversal for interactions is orders of magnitude more natural than SQL self-joins.
Domain-specific embeddings (BioLORD) dramatically outperform general-purpose ones for clinical text retrieval - generic models miss clinical synonyms and abbreviations.
Multi-database systems need an explicit orchestration contract defining exactly how results from each source are merged and weighted.
Docker Compose health checks and dependency ordering are non-trivial to get right when databases have complex initialisation sequences.

All Projects Jan – Mar 2026 · UC San Diego

MediDB: Drug Safety &Recommendation System

Why one database isn't enough for drug safety

Four databases, one orchestration layer

Multi-DB Architecture Design

Graph Modeling in Neo4j

BioLORD Vector Search

ETL Pipeline

Streamlit Clinical UI

Traceability & Audit

Hard problems we solved

FAERS data quality

Cross-DB result merging

Embedding quality

Docker orchestration

Technologies Used

Results

What I took away

MediDB: Drug Safety &
Recommendation System