~/metaguard ai
AI data governance copilot for OpenMetadata
// overview
A governance copilot built on top of OpenMetadata that auto-classifies PII, scores metadata quality, and lets you query your data catalog in plain English. Built for the OpenMetadata Hackathon.
// problem
Data governance in large organizations is manual and inconsistent. Teams tag PII by hand, health metrics live in spreadsheets, and finding the right dataset still means asking someone who's been there long enough to know. I wanted to see how much of that could be automated with a small AI layer on top of an existing catalog.
// approach
Used OpenMetadata as the data catalog backbone, since it already exposes an API for nearly every catalog operation. On top of it, I wired Gemini 2.5 Flash to scan column names and sample metadata for PII signals, then wrote the resulting tags back through the OpenMetadata REST API. The health score aggregates completeness signals (descriptions filled, owners set, tags present) into a single 0–100 number. The NL query agent routes each question to the right OpenMetadata endpoint rather than trying to answer from model memory.
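To make the health score concrete, here is a minimal sketch of how completeness signals could be folded into one 0 to 100 number. The `TableMetadata` class, the `WEIGHTS` table, and the equal weighting are assumptions for illustration; the actual scoring in the project may weight or combine signals differently.

```python
from dataclasses import dataclass


@dataclass
class TableMetadata:
    """Hypothetical, simplified view of one catalog table."""
    has_description: bool
    has_owner: bool
    has_tags: bool


# Assumed equal weights; a real score might weight signals differently.
WEIGHTS = {"has_description": 1.0, "has_owner": 1.0, "has_tags": 1.0}


def health_score(tables: list[TableMetadata]) -> float:
    """Return a 0-100 score: the weighted share of completeness signals present."""
    if not tables:
        return 0.0
    total_weight = sum(WEIGHTS.values()) * len(tables)
    earned = sum(
        weight
        for table in tables
        for signal, weight in WEIGHTS.items()
        if getattr(table, signal)
    )
    return round(100 * earned / total_weight, 1)
```

With one fully documented table and one that only has a description, four of the six signals are present, so the score comes out at roughly two thirds of the maximum.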
// highlights
- PII auto-classification using Gemini 2.5 Flash; tags pushed back via OpenMetadata APIs
- Governance health score (0–100) based on metadata completeness and tagging coverage
- Natural language querying of metadata via AI agent with dynamic API routing
- Built end-to-end during a hackathon sprint
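As a rough illustration of the tag write-back step, the sketch below builds a JSON Patch body that appends a PII tag to one column. The function name `build_pii_tag_patch` is hypothetical, and the tag label fields (`tagFQN`, `source`, `labelType`, `state`) reflect my understanding of OpenMetadata's tag label shape; check them against the API docs for the version in use. It also assumes the column already has a `tags` array.

```python
import json


def build_pii_tag_patch(column_index: int, tag_fqn: str = "PII.Sensitive") -> list[dict]:
    """Build a JSON Patch that appends one tag label to a column's tag list."""
    return [
        {
            "op": "add",
            # "-" is the JSON Pointer token for "end of the array" (RFC 6901/6902).
            "path": f"/columns/{column_index}/tags/-",
            "value": {
                # Field names are assumptions about the tag label schema.
                "tagFQN": tag_fqn,
                "source": "Classification",
                "labelType": "Automated",
                "state": "Suggested",
            },
        }
    ]


if __name__ == "__main__":
    # Print the payload that would be sent for the third column (index 2).
    print(json.dumps(build_pii_tag_patch(2), indent=2))
```

The payload would then go out as a PATCH request against the table entity with a JSON Patch content type. Marking the label as automated and suggested, rather than confirmed, leaves room for a human reviewer to accept or reject the model's classification.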
// learnings
Dynamic API routing via an LLM agent is surprisingly reliable when you give it a strict schema of available endpoints. Hallucination rate dropped dramatically once I switched from free-form descriptions to typed tool definitions.
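A minimal sketch of what "typed tool definitions" can look like in practice: the model may only name a tool from a fixed registry, and its arguments are validated before any request path is built. The `Tool` class, the `TOOLS` registry, the `route` function, and the two endpoint paths are illustrative assumptions, not the project's actual tool list.

```python
from dataclasses import dataclass
from urllib.parse import quote


@dataclass(frozen=True)
class Tool:
    """One endpoint the agent is allowed to call."""
    name: str
    path_template: str
    required_params: tuple[str, ...]


# Hypothetical registry; the paths are illustrative, not a verified endpoint list.
TOOLS = {
    tool.name: tool
    for tool in (
        Tool("search_tables", "/api/v1/search/query?q={query}", ("query",)),
        Tool("get_table", "/api/v1/tables/name/{fqn}", ("fqn",)),
    )
}


def route(tool_call: dict) -> str:
    """Validate a model-proposed tool call and return the request path."""
    tool = TOOLS.get(tool_call.get("name"))
    if tool is None:
        # The model named an endpoint that is not in the registry.
        raise ValueError(f"unknown tool: {tool_call.get('name')!r}")
    args = tool_call.get("arguments", {})
    if set(args) != set(tool.required_params):
        raise ValueError(
            f"{tool.name} expects exactly {tool.required_params}, got {sorted(args)}"
        )
    # URL-encode every argument before substituting it into the path.
    encoded = {key: quote(str(value), safe="") for key, value in args.items()}
    return tool.path_template.format(**encoded)
```

Because an unknown tool name or an unexpected argument raises an error instead of producing a request, a hallucinated endpoint fails loudly at the routing layer rather than reaching the catalog.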