How Google’s DS-STAR turns natural language questions into verified answers over raw data files.
You’ve got a folder of CSVs, a JSON dump, or a mix of semi-structured files. Someone asks: “What was our top-selling product last quarter?” or “Which regions had the highest churn?” Today, that usually means writing a script, debugging it, and only then getting an answer. What if an agent could do the exploration, coding, and verification for you?
Google’s DS-STAR (Data Science Agent via Iterative Planning and Verification) does exactly that. I’ve open-sourced a Python implementation so you can run it on your own data with Gemini or OpenAI.
What DS-STAR Does
DS-STAR answers factoid questions over structured or unstructured data. You give it:
A question (e.g. “What’s the average order value by region?”)
A data directory (CSV, JSON, Markdown, etc.)
Optional guidelines for how you want the final answer formatted
It returns a concrete answer plus the code and reasoning it used to get there; no hand-written ETL required.
How It Works Under the Hood
The pipeline is a multi-agent loop:
Analyzer: Explores your raw data files and builds summaries so downstream agents know what’s available.
Planner → Coder → Execute → Debugger → Verifier: Plans a solution, generates code, runs it, fixes errors, and checks that the result actually answers the question. If verification fails, the Router decides the next step and the loop continues.
Finalizer: Turns the verified result into a clear, formatted answer (e.g. markdown, bullet points).
So you get planning, code execution, and verification in one system, closer to how a data scientist would work than to a one-shot LLM answer.
Selling Points For Data Scientists
Point and ask: Point at a data directory and ask in plain language; exploration of raw data files is built-in.
Auditable: You get the code that produced the answer and logs of the plan and verification steps.
Flexible: Works with mixed file types and supports both Gemini and OpenAI backends.
Try It Yourself
Install with Poetry, point it at your data directory and output directory, and call run_analysis() with your question. The repo includes a minimal example and a DABstep-based example for harder, multi-step tasks.
Implementation (Python, Gemini + OpenAI):
https://github.com/jonathan-hsiao/DS-STAR
Paper:
DS-STAR: Data Science Agent via Iterative Planning and Verification (arXiv)
If you use it on your own datasets or extend it, I’d love to hear how it goes. Drop a note in the repo discussions or reach out.