ML training has hidden, variable climate costs
The same training job can emit 2× the CO₂ depending on when and where it runs; model choice alone can determine whether a job consumes 0.5 kWh or 15 kWh. Existing tools like CodeCarbon only measure emissions after execution - by then the decision is already made.
GridGreen flips this: it analyzes ML training code before execution and gives engineers the information they need to make carbon-aware decisions at development time.
Key insight: Carbon efficiency in ML is a decision problem, not just a measurement problem. GridGreen shifts awareness from after execution to before execution.
System Architecture
User pastes training code into a Monaco editor. FastAPI backend runs three analyses in parallel and returns results in under 20ms.
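A minimal sketch of that fan-out, assuming a single /analyze endpoint (the endpoint name and the stubbed engines are illustrative, not GridGreen's actual internals):

```python
import asyncio

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class AnalyzeRequest(BaseModel):
    code: str

# Stubbed engines; the real ones are described in the sections below.
async def estimate_carbon(code: str) -> dict:
    return {"kwh": 0.0, "co2_kg": 0.0}

async def suggest_alternatives(code: str) -> list[dict]:
    return []

async def forecast_grid_window() -> dict:
    return {"start": None, "g_co2_per_kwh": None}

@app.post("/analyze")
async def analyze(req: AnalyzeRequest) -> dict:
    # The three engines are independent, so run them concurrently:
    # total latency is the slowest engine, not the sum of all three.
    carbon, swaps, window = await asyncio.gather(
        estimate_carbon(req.code),
        suggest_alternatives(req.code),
        forecast_grid_window(),
    )
    return {"carbon": carbon, "suggestions": swaps, "schedule": window}
```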
Carbon Estimation Engine
Python AST + regex parsing extracts model type, batch size, and training loops from the script. Scaling laws (Kaplan 2020, Patterson 2022, Strubell 2019) then map Code → FLOPs → Energy → CO₂. Every estimate ships with methodology citations and explicit limitations.
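A hedged sketch of the FLOPs → Energy → CO₂ chain (the AST/regex extraction step is omitted; the throughput, power, PUE, and grid-intensity constants are illustrative placeholders, not GridGreen's calibrated values):

```python
def training_flops(n_params: float, n_tokens: float) -> float:
    # Kaplan et al. 2020 rule of thumb: compute ≈ 6 · N · D
    return 6.0 * n_params * n_tokens

def energy_kwh(flops: float, flops_per_sec: float = 3.12e14,
               gpu_power_w: float = 400.0, pue: float = 1.1) -> float:
    # Wall-clock seconds at sustained throughput, then joules → kWh,
    # scaled by datacenter PUE (Patterson et al. 2022).
    seconds = flops / flops_per_sec
    return seconds * gpu_power_w / 3.6e6 * pue

def co2_kg(kwh: float, grid_g_per_kwh: float = 400.0) -> float:
    # Grid carbon intensity turns energy into emissions (Strubell et al. 2019).
    return kwh * grid_g_per_kwh / 1000.0

# Example: a 124M-parameter model trained on 1B tokens.
flops = training_flops(124e6, 1e9)
print(f"{co2_kg(energy_kwh(flops)):.3f} kg CO2")
```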
RAG Model Suggestions
Sentence Transformers (MiniLM) over a curated dataset of 58 model-swap pairs, with a TF-IDF fallback to improve recall on the small corpus. Returns alternatives with compute reduction (%), benchmark retention (MMLU), and citations. Example: GPT-2 Large → DistilGPT-2 (-77.6% compute, 94.2% MMLU retention).
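Roughly how the dense-first, sparse-fallback retrieval could look (the similarity threshold and corpus handling are assumptions):

```python
from sentence_transformers import SentenceTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "GPT-2 Large -> DistilGPT-2: -77.6% compute, 94.2% MMLU retention",
    # ... the remaining curated swap pairs
]
encoder = SentenceTransformer("all-MiniLM-L6-v2")
doc_emb = encoder.encode(docs, normalize_embeddings=True)
tfidf = TfidfVectorizer().fit(docs)
doc_tfidf = tfidf.transform(docs)

def retrieve(query: str, min_dense_score: float = 0.35) -> str:
    q_emb = encoder.encode([query], normalize_embeddings=True)
    dense = (doc_emb @ q_emb.T).ravel()  # cosine via normalized dot product
    if dense.max() >= min_dense_score:
        return docs[int(dense.argmax())]
    # Fallback: lexical overlap catches queries the embeddings place poorly.
    sparse = cosine_similarity(tfidf.transform([query]), doc_tfidf).ravel()
    return docs[int(sparse.argmax())]

print(retrieve("smaller alternative to GPT-2 Large"))
```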
Grid-Aware Scheduling
Ingests US EIA hourly grid carbon-intensity data into SQLite. A Prophet model (with a Seasonal Naive fallback) forecasts intensity over the next 48 hours and surfaces the cleanest execution window. Same job, half the CO₂, just by timing it right.
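A sketch of the window search using Prophet's standard ds/y interface on a toy intensity series (the 4-hour job length and the series itself are illustrative):

```python
import pandas as pd
from prophet import Prophet

# Toy hourly intensity series with a dirtier evening peak (gCO2/kWh).
history = pd.DataFrame({
    "ds": pd.date_range("2024-01-01", periods=24 * 14, freq="h"),
    "y": [400 + 80 * (17 <= h % 24 < 22) for h in range(24 * 14)],
})

model = Prophet(daily_seasonality=True)
model.fit(history)
future = model.make_future_dataframe(periods=48, freq="h")
forecast = model.predict(future).tail(48)

# Cleanest contiguous window for a job of a given length (4 hours here):
end = forecast["yhat"].rolling(4).mean().idxmin()
start = forecast.loc[end, "ds"] - pd.Timedelta(hours=3)
print(f"Lowest-carbon 4-hour window starts at {start}")
```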
Session Scorecard
Tracks cumulative CO₂ saved and decisions made across a session - gamifying sustainable ML decisions and encouraging long-term behavior change.
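A tiny sketch of the scorecard state (field names are assumptions, not GridGreen's schema):

```python
from dataclasses import dataclass

@dataclass
class Scorecard:
    co2_saved_kg: float = 0.0
    decisions: int = 0

    def record(self, baseline_kg: float, chosen_kg: float) -> None:
        # A "save" is the gap between the original plan and the greener choice.
        self.co2_saved_kg += max(baseline_kg - chosen_kg, 0.0)
        self.decisions += 1
```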
MCP Agent Integration
Wraps all backend APIs as Model Context Protocol tools. GridGreen works inside Claude Desktop and Cursor with zero UI changes - call estimate_carbon() and suggest_alternatives() directly from any agent workflow.
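Conceptually, the wrapping can be as thin as the official MCP Python SDK's FastMCP over HTTP calls to the backend (the routes below are assumptions):

```python
import httpx
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("gridgreen")
BACKEND = "http://localhost:8000"  # the FastAPI analysis server

@mcp.tool()
def estimate_carbon(code: str) -> dict:
    """Estimate the energy and CO2 footprint of an ML training script."""
    return httpx.post(f"{BACKEND}/estimate", json={"code": code}).json()

@mcp.tool()
def suggest_alternatives(code: str) -> dict:
    """Suggest lower-carbon model swaps with benchmark retention."""
    return httpx.post(f"{BACKEND}/suggest", json={"code": code}).json()

if __name__ == "__main__":
    mcp.run()  # stdio transport; register this script in Claude Desktop/Cursor
```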
Cloud & Infra Pipeline
Snowflake Cortex for vector storage, Databricks DLT for data ingestion, AWS SageMaker for batch processing, NVIDIA Brev for GPU workloads, Weights & Biases for experiment tracking, and Google Gemini for natural-language reasoning.
Technologies Used
| Component | Tool & Purpose |
|---|---|
| Frontend | Next.js 15 + Monaco Editor + Tailwind CSS + Recharts + Framer Motion |
| Backend | FastAPI - <20ms latency, all analysis endpoints |
| Code Analysis | Python AST + Regex - extract model type, training loops, batch size |
| Carbon Engine | Scaling laws (Kaplan 2020, Patterson 2022, Strubell 2019) - FLOPs → CO₂ |
| RAG System | Sentence Transformers (MiniLM) + TF-IDF fallback + 58 model-swap pairs |
| Grid Forecasting | US EIA hourly data + SQLite + Prophet (+ Seasonal Naive fallback) |
| MCP Integration | Model Context Protocol - Claude Desktop, Cursor, agent pipelines |
| Cloud / Infra | Snowflake Cortex · Databricks DLT · AWS SageMaker · NVIDIA Brev · W&B · Gemini |
Evaluation
| Metric | Value |
|---|---|
| Success Rate | 100% (12/12 workloads) |
| Mean Analysis Latency | <20ms |
| Suggestion Coverage | 66.7% |
| CO₂ Reduction (LLMs) | 54.9% |
| CO₂ Reduction (Vision/Audio) | 57.1% |
| Avg Compute Reduction | 77.6% |
Impact
- Pre-run analysis shifts carbon decisions from measurement to prevention
- RAG with a TF-IDF fallback handles small-corpus retrieval better than dense retrieval alone
- MCP integration means GridGreen works in Claude Desktop and Cursor with no UI changes required
- Grid scheduling finds windows where the same job emits half the CO₂
What made it hard
Carbon estimate credibility
Early estimates lacked grounding. Added citations to published scaling laws and included explicit methodology limitations in every output.
RAG on 58 pairs
Dense retrieval alone failed at this scale. A TF-IDF fallback significantly improved recall for edge-case queries the embedding space covered poorly.
EIA API instability
Rate limits caused failures in grid data ingestion. Added mock mode and a diagnostics endpoint so the system degrades gracefully without breaking the core analysis flow.
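The degradation pattern, sketched (the EIA route, response parsing, and mock values are illustrative):

```python
import httpx

EIA_URL = "https://api.eia.gov/v2/electricity/rto/fuel-type-data/data/"
MOCK_INTENSITY = [420.0] * 48  # flat gCO2/kWh placeholder series

def parse_intensity(payload: dict) -> list[float]:
    # Stub: real parsing depends on the EIA v2 response schema.
    return [float(row["value"]) for row in payload["response"]["data"]]

def fetch_grid_data(api_key: str) -> tuple[list[float], str]:
    try:
        resp = httpx.get(EIA_URL, params={"api_key": api_key}, timeout=10)
        resp.raise_for_status()
        return parse_intensity(resp.json()), "live"
    except httpx.HTTPError:
        # Rate limit or outage: serve mock data and report the mode so a
        # diagnostics endpoint can expose the degraded status.
        return MOCK_INTENSITY, "mock"
```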
Multi-system integration
MCP + FastAPI + Next.js had conflicting dev environments. Required careful process coordination and explicit port management across three separate servers.
What I took away
- Pre-run analysis is fundamentally more useful than post-hoc measurement - you cannot un-run a training job.
- Hybrid retrieval (dense + sparse TF-IDF) outperforms pure embedding search on small, curated datasets.
- MCP is a powerful abstraction - wrapping a FastAPI service as MCP tools makes it instantly usable in any agent workflow without a single additional line of UI code.
- Prophet handles seasonality cleanly for grid forecasting, but always build a Seasonal Naive fallback - EIA data gaps will break pure ML forecasters.