I sat down last week to prepare a live demo of a full ML pipeline running on Saturn Cloud: data ingestion through model deployment, with lineage stitched between every stage. I built it in a single Claude Code session using the saturn-cloud plugin. This post is about what that workflow looks like in practice, and how you can use the same plugin to drive Saturn Cloud from inside Claude Code.
The demo itself is on the demo/saturn-data-pipeline branch of saturncloud/examples. The interesting part isn’t the pipeline (NYC taxi data, a tip-prediction model, nothing exotic). The interesting part is that every Saturn resource was created, applied, and debugged through the plugin’s MCP tools, and every Saturn-specific gotcha I hit got captured in the plugin’s skill file so the next session won’t hit it.
What the plugin gives Claude Code
The saturn-cloud Claude Code plugin exposes Saturn’s API as MCP tools and ships a skill file with Saturn-specific operational knowledge. Concretely, that means Claude can:
- List, read, create, update, and apply Saturn recipes (the YAML resource definitions for Jobs, Workspaces, and Deployments)
- Inspect live resources to see how working examples are actually shaped
- Look up valid instance types, images, and other API enums instead of guessing
- Reference a SKILL.md of accumulated Saturn-specific patterns and pitfalls
In practice this collapses the loop between “describe what you want” and “have it running on Saturn.” You don’t context-switch to the dashboard to apply a recipe, then back to the editor to fix the YAML, then to the logs tab to read the failure. Claude does all of that and reports back.
The pipeline we built
The brief was three sections, each with a “show, don’t tell” moment:
- Ingestion. External source to object storage to training-ready dataset, triggered by an API call.
- Dataset versioning. v1 to v2 to rollback. Prove that a model trained months ago can pull byte-identical training data today.
- Feature store. Deploy Feast, materialize features, serve them to both training and inference. Tie model versions to feature versions.
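One way to make the "byte-identical" claim checkable is to hash two files and compare the digests: the file a run trained on, and the file a later dvc pull restores. Below is a minimal standard-library sketch. The function names are hypothetical, and this is an independent check layered on top of DVC, not something DVC requires.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file.

    Reads in chunks so large parquet files never have to fit in memory.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_byte_identical(original, restored):
    """True if two files have the same size and the same SHA-256 digest."""
    original, restored = Path(original), Path(restored)
    # A size mismatch settles it without reading either file.
    if original.stat().st_size != restored.stat().st_size:
        return False
    return file_sha256(original) == file_sha256(restored)
```

A digest recorded at training time can later be compared against file_sha256 of the file that dvc pull restores.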
The architecture, in one paragraph: NYC TLC yellow taxi parquet as the through-line dataset; DVC for dataset versioning, with both the repo and the remote on Saturn’s shared NFS; Feast in no-server mode against the same shared mount; file-based MLflow, with a separate Saturn Deployment running mlflow ui against the mlruns/ directory for a stable browse URL. Two ingest Jobs (v1 and v2), one Feast materialization Job, one Workspace for the training notebook, one Deployment for the inference endpoint, and one Deployment for the MLflow UI.
The load-bearing piece is the lineage chain. Every MLflow run is tagged with:
- dvc_commit: the DVC commit that produced its training data
- feast_project, feast_feature_view, and feast_registry_mtime: the exact feature definitions
- source_parquet: the file path the run actually read
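Here is a minimal sketch of assembling those tags at training time. It rests on two assumptions. First, the DVC repo is a git checkout, so dvc_commit can be read with git rev-parse HEAD. Second, the Feast registry is a file on the shared mount, and its modification time serves as feast_registry_mtime. The function and argument names are hypothetical and not taken from the demo's code.

```python
import os
import subprocess
from datetime import datetime, timezone


def git_head(repo_dir):
    """Commit hash of HEAD in the git/DVC repo that holds the dataset."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "rev-parse", "HEAD"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def build_lineage_tags(dvc_commit, feast_project, feast_feature_view,
                       feast_registry_path, source_parquet):
    """Return the lineage tags as a flat dict of strings."""
    mtime = os.path.getmtime(feast_registry_path)
    return {
        "dvc_commit": dvc_commit,
        "feast_project": feast_project,
        "feast_feature_view": feast_feature_view,
        # ISO-8601 UTC timestamp of the registry file's last modification.
        "feast_registry_mtime": datetime.fromtimestamp(mtime, tz=timezone.utc).isoformat(),
        # Absolute path, so the tag is unambiguous regardless of the notebook's cwd.
        "source_parquet": os.path.abspath(source_parquet),
    }
```

Inside an active MLflow run, the result could be passed to mlflow.set_tags(...), which takes a dict of tag names to values.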
The inference endpoint echoes those tags on every prediction. Given just an inference response, you can git checkout {dvc_commit} && dvc pull and rebuild byte-identical training data six months later.
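The echo-and-rebuild step can be sketched as two small functions. The first wraps a prediction with the model run's lineage tags. The second turns a response back into the rebuild command. The response field names (prediction, lineage) and both function names are hypothetical, and the demo's endpoint may shape its JSON differently.

```python
import shlex


def respond(prediction, lineage_tags):
    """Wrap a model prediction with the lineage tags of the run that trained the model."""
    return {"prediction": prediction, "lineage": dict(lineage_tags)}


def rebuild_command(response):
    """Shell command that restores the training data referenced by a response."""
    commit = response["lineage"]["dvc_commit"]
    # Quote the commit so the command is safe to paste into a shell.
    return f"git checkout {shlex.quote(commit)} && dvc pull"
```

In a real service, the lineage tags would probably be loaded once at startup, for example from mlflow.get_run(run_id).data.tags, rather than on every request.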
How a section actually got built
Here’s the loop, using the ingest job as the worked example.
I gave Claude a paragraph: “API-triggerable pipeline that downloads a month of taxi data, cleans it, produces a new DVC version on every run.” Claude wrote ingest_taxi_dvc.py end-to-end on the first pass: download, clean, dvc add, git commit, dvc push. We tested it locally against ~/shared/demo/test-data/ first, because Saturn job iteration takes ~2 minutes per cycle (image pull, pip install, run), and a local test catches the mistakes that don’t need a cluster to find.
One bug surfaced locally: DVC creates a per-directory .gitignore when it tracks files, and the script wasn’t staging those. Fix was a one-liner, dvc config core.autostage true at init time, so DVC stages its own metadata. Caught in seconds because we ran the script directly before wrapping it as a Job.
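The DVC side of an ingest script like this reduces to two steps. There is a one-time init, which includes the core.autostage fix, and a per-run add, commit, and push. This is a minimal sketch, not the demo's ingest_taxi_dvc.py. The repo path, remote name, and remote path are hypothetical. The run parameter exists only so the command sequence can be inspected without DVC installed.

```python
import subprocess


def init_repo(repo, remote_name, remote_path, run=subprocess.run):
    """One-time setup: git + DVC repo with a default remote and autostage on."""
    commands = [
        ["git", "init"],
        ["dvc", "init"],
        # Let DVC `git add` the metadata it creates (.dvc files, .gitignore entries).
        ["dvc", "config", "core.autostage", "true"],
        # A plain directory path works as a DVC remote, e.g. on a shared mount.
        ["dvc", "remote", "add", "-d", remote_name, remote_path],
        ["git", "add", ".dvc", ".dvcignore"],
        ["git", "commit", "-m", "Initialize DVC"],
    ]
    for cmd in commands:
        run(cmd, cwd=repo, check=True)
    return commands


def version_dataset(repo, data_path, message, run=subprocess.run):
    """Per run: track the cleaned file, commit its .dvc pointer, push the data."""
    commands = [
        ["dvc", "add", data_path],          # writes <data_path>.dvc, updates .gitignore
        ["git", "commit", "-m", message],   # autostage has already staged both files
        ["dvc", "push"],                    # copies the data to the default remote
    ]
    for cmd in commands:
        run(cmd, cwd=repo, check=True)
    return commands
```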
Then Claude drafted a Saturn Job recipe and applied it through the plugin’s MCP tools, and the first apply hit a small recipe-shape mismatch. Rather than guess from documentation, Claude called saturn_list_resources to read a working recipe from the live API and copied its shape. This is the part of the plugin that earns its keep: the MCP tools are wired to the live API, so when something doesn’t match a static doc, the agent can read ground truth and fix itself in seconds. The recipe applied cleanly on the next pass.
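Copying the shape of a working recipe amounts to comparing key structure. Below is a minimal sketch of that comparison on recipes that have already been parsed into dicts, for example YAML loaded with a YAML parser. The function is hypothetical and is not a plugin tool. The recipe keys in the usage example are illustrative, not Saturn's actual schema.

```python
def shape_diff(reference, draft, path=""):
    """Compare the key structure of two nested dicts.

    Returns (missing, unexpected) as sorted lists of dotted key paths:
    keys the draft lacks, and keys the draft has that the reference does not.
    Only dict keys are compared; list contents and scalar values are ignored.
    """
    missing = [f"{path}{key}" for key in reference.keys() - draft.keys()]
    unexpected = [f"{path}{key}" for key in draft.keys() - reference.keys()]
    for key in reference.keys() & draft.keys():
        ref_val, draft_val = reference[key], draft[key]
        if isinstance(ref_val, dict) and isinstance(draft_val, dict):
            sub_missing, sub_unexpected = shape_diff(ref_val, draft_val, f"{path}{key}.")
            missing += sub_missing
            unexpected += sub_unexpected
    return sorted(missing), sorted(unexpected)
```

For example, comparing a draft that used cmd against a live recipe that uses command would report spec.command as missing and spec.cmd as unexpected.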
A small detail caught the next attempt: Saturn’s deployment runner executes the recipe’s command directly via exec rather than through a shell. That is faster and more predictable, but it means shell builtins like cd won’t work inline. The recipe schema has a dedicated working_directory field for exactly this, and setting it and dropping the cd fixed the problem. Second run: 50 MB downloaded in 0.5s, 2.96M rows cleaned, DVC commit logged. v1 done.
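The exec-vs-shell difference is easy to reproduce locally with Python's subprocess module. Treat this purely as an analogy for Saturn's runner. Without a shell, a command string is split into plain arguments, so cd and && are never interpreted, and the working directory has to be set separately. Here that is done with cwd=, which plays the role working_directory plays in the recipe. The /app path and train.py are hypothetical.

```python
import shlex
import subprocess

# Without a shell, "&&" is just another argument, and "cd" would have to be an
# executable on PATH. cd is a shell builtin, so exec-style runners can't use it.
SHELL_STYLE = "cd /app && python train.py"
print(shlex.split(SHELL_STYLE))  # ['cd', '/app', '&&', 'python', 'train.py']


def run_in(directory, argv):
    """Exec argv directly (no shell), with its working directory set up front."""
    return subprocess.run(argv, cwd=directory, capture_output=True, text=True, check=True)
```

Here run_in("/app", ["python", "train.py"]) is the exec-friendly equivalent of the shell one-liner: the directory change moves out of the command and into a separate setting.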
The interesting thing is the cadence. From “describe the job” to “running on Saturn” was a few minutes including two course corrections, none of which required me to leave the conversation. Claude wrote the code, applied the recipe, read the logs, fixed the recipe, and reported back when it worked.
What the plugin already knows for you
Every Saturn-specific detail Claude encountered during this session went into the plugin’s SKILL.md. The next session starts with those already known, which is the whole point of the skill file: it’s where operational knowledge accumulates so each user doesn’t rediscover it.
A few examples of what’s in there now:
- The exec-vs-shell distinction above, with the working_directory field as the fix.
- Deployment binding conventions: Saturn’s proxy targets container port 8000, so services should bind 0.0.0.0:8000 to work the first time.
- MLflow 3.x flags for running behind the proxy (--allowed-hosts '*' --cors-allowed-origins '*').
- The token scope rule for resources that call other Saturn services (token_scope: null for an unscoped user-level token).
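Below is a minimal standard-library sketch of a service that follows the binding convention; a real deployment would more likely use an ASGI or WSGI server. It binds 0.0.0.0 rather than 127.0.0.1, so connections arriving from the proxy over the container's network interface are accepted, not just loopback traffic. The argv list shows the MLflow UI invocation with the flags above in exec form. Because no shell is involved, '*' reaches MLflow literally and needs no quoting. The mlruns path and the health-check response are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

HOST, PORT = "0.0.0.0", 8000  # Saturn's proxy targets container port 8000

# Exec-form command for an MLflow UI deployment. No shell means no glob
# expansion, so "*" is passed through literally without quotes.
MLFLOW_UI_ARGV = [
    "mlflow", "ui",
    "--backend-store-uri", "mlruns",
    "--host", HOST, "--port", str(PORT),
    "--allowed-hosts", "*",
    "--cors-allowed-origins", "*",
]


class HealthHandler(BaseHTTPRequestHandler):
    """Answers every GET with a small JSON health payload."""

    def do_GET(self):
        body = json.dumps({"status": "ok"}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(host=HOST, port=PORT):
    """Create (but don't start) an HTTP server bound to host:port."""
    return ThreadingHTTPServer((host, port), HealthHandler)
```

In the container, make_server().serve_forever() would start it on 0.0.0.0:8000.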
If you’re using the plugin, you don’t need to memorize any of this. The agent will apply the right pattern because it’s already in context.
Patterns that worked across the session
A few collaboration habits emerged that were worth keeping:
Test the script before wiring it as a Job. Saturn job iteration is ~2 minutes per cycle. Catching bugs locally first saves multiples of that. The DVC core.autostage config, the Feast/ingest path-convention alignment, and a leakage guard in the training notebook all got sorted out locally before any Saturn cycle was spent.
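The post doesn't describe the notebook's leakage guard, so what follows is only a hypothetical example of a check that is cheap to get right locally. It fails fast if any feature column is the target or is derived from it. In the TLC yellow-taxi data dictionary, total_amount includes credit-card tips, so a tip model that sees total_amount is partly reading the answer. Any column names beyond tip_amount and total_amount are illustrative.

```python
TARGET = "tip_amount"
# Columns that contain or are computed from the target. In the TLC yellow-taxi
# data, total_amount includes card tips, so it leaks tip_amount.
DERIVED_FROM_TARGET = {"total_amount"}


def check_no_leakage(feature_columns, target=TARGET, derived=DERIVED_FROM_TARGET):
    """Raise ValueError if the target or a target-derived column is used as a feature."""
    leaked = sorted(set(feature_columns) & ({target} | set(derived)))
    if leaked:
        raise ValueError(f"target leakage: {leaked} must not be used as features")
    return list(feature_columns)
```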
MCP introspection over guessing. When the plugin’s docs were wrong, the MCP could query the live API. Treating the live API as ground truth and working back from there was worth more than any doc, every time. This is a good rule for any MCP-fronted system: the documentation is a hint, the API is the source of truth.
Skill content as muscle memory. Every Saturn-specific gotcha went into SKILL.md, so the next session doesn’t rediscover them. Over time the skill becomes more accurate than the docs because it’s edited by the agent that just hit the failure mode.
Auto-generation for the parts that drift. Saturn’s API is defined by Pydantic models in saturn-api. Docs drift; generated docs don’t. We added a scripts/generate_recipe_reference.py that regenerates skills/saturn/recipe-reference.md from the canonical schema. Drift is now a one-command refresh.
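The generator idea is to walk the schema classes and emit one markdown table per model. The real script reads Pydantic models from saturn-api. To stay dependency-free, this sketch uses standard-library dataclasses as a stand-in. The JobRecipe model and its fields are hypothetical. With Pydantic v2, the same loop would iterate over Model.model_fields instead of dataclasses.fields.

```python
import dataclasses
import typing
from typing import Optional


@dataclasses.dataclass
class JobRecipe:  # hypothetical stand-in for a saturn-api model
    """A job that can be triggered from the API."""
    name: str
    command: str
    working_directory: Optional[str] = None
    token_scope: Optional[str] = None


def type_name(tp):
    """Readable type name: 'str' for plain classes, typing repr for generics."""
    return tp.__name__ if isinstance(tp, type) else repr(tp).replace("typing.", "")


def render_model(model):
    """Render one dataclass as a markdown heading, its docstring, and a field table."""
    hints = typing.get_type_hints(model)
    lines = [f"## {model.__name__}", "", (model.__doc__ or "").strip(), "",
             "| Field | Type | Required | Default |",
             "|---|---|---|---|"]
    for field in dataclasses.fields(model):
        has_default = field.default is not dataclasses.MISSING
        has_factory = field.default_factory is not dataclasses.MISSING
        default = repr(field.default) if has_default else "(factory)" if has_factory else ""
        required = "no" if (has_default or has_factory) else "yes"
        lines.append(f"| {field.name} | {type_name(hints[field.name])} | {required} | {default} |")
    return "\n".join(lines) + "\n"


def render_reference(models):
    """Concatenate the rendered sections for every model."""
    return "\n".join(render_model(m) for m in models)
```

Writing render_reference([...]) to skills/saturn/recipe-reference.md with Path.write_text would then be the one-command refresh.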
Patches across working copies. Claude was running in a working copy without push permissions. Rather than copy-pasting code, we used git format-patch and git am to move commits between trees. This preserves the commit message, authorship, and Co-Authored-By trailer.
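The patch hand-off comes down to two git commands. git format-patch --stdout writes the commits as an mbox stream, and git am applies that stream in the other tree, keeping each commit's author and full message, trailers included. This is a minimal sketch. The repository paths and revision range are hypothetical, and run is injectable only so the commands can be checked without real repositories.

```python
import subprocess


def move_commits(src_repo, dst_repo, rev_range, run=subprocess.run):
    """Replay the commits in rev_range from src_repo onto dst_repo's current branch."""
    # Serialize the commits as an mbox: one email-formatted patch per commit.
    patches = run(
        ["git", "-C", src_repo, "format-patch", "--stdout", rev_range],
        capture_output=True, check=True,
    ).stdout
    # With no mailbox argument, git am reads the mbox from stdin and recreates each commit.
    run(["git", "-C", dst_repo, "am"], input=patches, check=True)
    return patches
```

For example, move_commits("/work/agent-copy", "/work/my-clone", "origin/main..HEAD") would move every commit the agent made on top of origin/main.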
What we contributed back
The work fed back into the plugin itself:
- A new MCP tool, saturn_list_instance_types, a thin wrapper around /api/info/servers. Future sessions can enumerate valid instance types directly.
- An auto-generated skills/saturn/recipe-reference.md, produced from the Pydantic models in saturn-api. The schema doc now refreshes from the canonical source with a single command, so it stays in lockstep with the API.
- Several new entries in SKILL.md capturing the operational details of running services behind Saturn’s proxy (port conventions, MLflow flags, token scopes), so the agent applies them by default.
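saturn_list_instance_types is described as a thin wrapper around /api/info/servers. A rough standard-library equivalent for calling that endpoint outside Claude Code might look like the sketch below. Three things here are unverified assumptions: the base URL, the "token" authorization scheme, and that the response is JSON. Check them against Saturn's API documentation before relying on this.

```python
import json
import urllib.request


def build_servers_request(base_url, api_token):
    """Build a GET request for the /api/info/servers endpoint.

    ASSUMPTION: the endpoint accepts an "Authorization: token <api_token>"
    header. Verify the scheme against Saturn's API docs.
    """
    url = base_url.rstrip("/") + "/api/info/servers"
    return urllib.request.Request(
        url, headers={"Authorization": f"token {api_token}"}, method="GET"
    )


def list_instance_types(base_url, api_token, timeout=30):
    """Fetch and decode the endpoint's response (ASSUMPTION: it returns JSON)."""
    request = build_servers_request(base_url, api_token)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```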
This is the part that compounds. Each session leaves the plugin a little smarter, and the next user gets the benefit without learning the same things.
How to try it yourself
The Saturn Cloud Claude Code plugin lets Claude read and write Saturn resources directly. To use it:
- Sign up for Saturn Cloud. A free tier is available, and it’s enough to run this demo end-to-end.
- Install Claude Code and the saturn-cloud plugin. If you don’t have Claude Code yet, the Run Claude Code on a Cloud GPU in 10 Minutes guide walks through getting an autonomous-mode setup running. Add the saturn-cloud plugin per its README.
- Point Claude at a recipe directory and describe what you want. The plugin will create, apply, and debug Saturn resources in-loop with you. The lineage-stitching pattern from this post is a useful starting point if you’re building reproducible ML pipelines.
The full demo is on the demo/saturn-data-pipeline branch of saturncloud/examples. Six recipes, four Python files, one notebook. Reproducible from a clean Saturn account: apply each recipe, run the ingest jobs, run the materialize job, open the workspace and run the training notebook, start the two deployments. No infrastructure outside Saturn beyond GitHub for the source.
The thing I came away with after this session is that the Claude Code plus saturn-cloud plugin combination changes what “build a pipeline on Saturn” looks like. It becomes a single conversation where the agent writes the code, applies the recipe, watches the logs, and tells you when it works. Each session leaves the plugin a little smarter, so the next user gets the same workflow with a little less friction. That’s the loop worth showing up for.


