As AI teams juggle an ever-expanding patchwork of clouds, Kubernetes clusters, and on-prem Slurm deployments, an open-source project called SkyPilot is positioning itself as the single control plane that ties them all together. The latest addition to its arsenal is GPU Compass, a public dashboard that lets engineers compare GPU pricing and availability across 24 providers in one place.
One interface, every cloud
SkyPilot is an open-source system, written in Python, designed to run, manage, and scale AI workloads on any infrastructure. The pitch is straightforward: give AI teams a simple interface to launch jobs anywhere, while giving infrastructure teams a unified control plane with advanced scheduling, scaling, and orchestration baked in.
Under the hood, the project supports a sprawling list of backends: Kubernetes, Slurm, AWS, GCP, Azure, OCI, CoreWeave, Nebius, Lambda Cloud, RunPod, Fluidstack, Cudo, DigitalOcean, Paperspace, Cloudflare, IBM, Vast.ai, VMware vSphere, Crusoe, and more than a dozen others. Workloads are described once in a portable YAML or Python spec (resource requirements, setup commands, run commands), and SkyPilot handles the rest: finding the cheapest available infrastructure, provisioning GPUs with automatic failover when a region runs dry, syncing the local working directory, and streaming logs back to the user.
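To make the "cheapest first, fail over when a region runs dry" idea concrete, here is a minimal Python sketch. It is not SkyPilot's code or API: the `Offer` type, the `provision` stand-in, the provider and region names, and all prices are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    """One hypothetical way to rent a GPU node (all values are made up)."""
    provider: str
    region: str
    gpu: str
    gpu_count: int
    hourly_usd: float


class CapacityError(Exception):
    """Raised by the stand-in provisioner when a region has no capacity."""


def provision(offer, exhausted):
    # Stand-in for a real provisioning call: it fails if the
    # (provider, region) pair is listed as out of capacity.
    if (offer.provider, offer.region) in exhausted:
        raise CapacityError(f"{offer.provider}/{offer.region} has no capacity")
    return f"{offer.provider}/{offer.region}"


def launch_cheapest(offers, gpu, gpu_count, exhausted=frozenset()):
    """Try matching offers from cheapest to most expensive.

    Returns (location, hourly_usd, skipped), where `skipped` lists the
    cheaper offers that were tried and failed for lack of capacity.
    """
    candidates = sorted(
        (o for o in offers if o.gpu == gpu and o.gpu_count == gpu_count),
        key=lambda o: o.hourly_usd,
    )
    skipped = []
    for offer in candidates:
        try:
            location = provision(offer, exhausted)
        except CapacityError:
            skipped.append(offer)  # fail over to the next-cheapest offer
            continue
        return location, offer.hourly_usd, skipped
    raise RuntimeError(f"no capacity for {gpu_count}x {gpu} in any candidate")


# Hypothetical catalog; prices are illustrative, not real quotes.
SAMPLE_OFFERS = [
    Offer("cloud-b", "region-2", "H100", 8, 22.0),
    Offer("cloud-a", "region-1", "H100", 8, 20.0),
    Offer("cloud-c", "region-3", "H100", 8, 30.0),
    Offer("cloud-a", "region-1", "A100", 8, 10.0),
]

if __name__ == "__main__":
    location, price, skipped = launch_cheapest(
        SAMPLE_OFFERS, "H100", 8, exhausted={("cloud-a", "region-1")}
    )
    print(location, price, [o.provider for o in skipped])
```

In this toy run the cheapest H100 offer is marked as out of capacity, so the loop falls through to the next-cheapest one. A real system would also have to handle quotas, partial failures, and prices that change between the lookup and the launch.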
Crucially, everything is "bring your own cloud." Jobs launch inside the customer's own accounts, VPCs, and clusters, which has made SkyPilot palatable to enterprises wary of routing sensitive training data through a third-party SaaS layer.
Berkeley origins
The project comes out of UC Berkeley's Sky Computing Lab, which sits in a fairly remarkable lineage. The lab is the successor to AMPLab (which produced Apache Spark and Apache Mesos) and RISELab (which produced Ray, the distributed-computing framework now commercialised through Anyscale). SkyPilot was first released in November 2022 in a launch post co-authored by Zongheng Yang and Ion Stoica, the latter of whom is also a co-founder of Databricks and Anyscale. The current lead developer is Zhanghao Wu (known on GitHub as Michaelvll), a Ph.D. student in the lab.
The original framing was as an "intercloud broker": a unified interface that could automatically pick the cheapest cloud, region, and zone for a given ML or data-science workload, manage spot-instance recovery from preemptions, and auto-stop idle clusters. The launch post claimed roughly 3x cost savings from cross-region/cross-cloud selection alone, and 3 to 6x from managed spot. Early adopters included ML groups at Berkeley AI Research and Stanford, and the Salk Institute for Biological Studies, which reported 6.5x cost savings running recurring bioinformatics workloads on CPU spot instances.
The codebase is Python end to end, which lowers the contribution barrier for a research community already operating in Python and gives the project a faster iteration cycle than systems written closer to the metal. The "Sky Computing" naming connects to a Berkeley white paper that argues compute should be abstracted across providers in the same way the internet abstracts networks. SkyPilot is the most concrete artifact of that thesis.
GPU Compass: a price comparison engine for the GPU economy
The newly launched GPU Compass gives the broader community a window into the fragmented GPU market without requiring a SkyPilot install. As of its most recent catalog update on May 10, 2026, the dashboard tracks 24 providers, 2,901 distinct offerings, and 52 GPU models, with filters for GPU type and count.
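The kind of comparison those filters enable can be sketched in a few lines of Python. This is only an illustration of filtering by GPU type and count and ranking by price per GPU; the `Listing` type, the provider names, and the prices are hypothetical, and nothing here reflects how GPU Compass itself is implemented or what its catalog contains.

```python
from typing import NamedTuple, Optional


class Listing(NamedTuple):
    """One hypothetical catalog row (values are invented for the example)."""
    provider: str
    gpu: str
    gpu_count: int
    hourly_usd: float  # price for the whole instance, per hour


def per_gpu_price(listing):
    # Normalise so an 8-GPU node and a 1-GPU node are comparable.
    return listing.hourly_usd / listing.gpu_count


def filter_listings(listings, gpu: Optional[str] = None,
                    gpu_count: Optional[int] = None):
    """Keep listings matching the GPU type and count, cheapest per GPU first.

    A filter left as None matches everything.
    """
    matches = [
        row for row in listings
        if (gpu is None or row.gpu == gpu)
        and (gpu_count is None or row.gpu_count == gpu_count)
    ]
    return sorted(matches, key=per_gpu_price)


# Hypothetical rows; not real providers or real prices.
CATALOG = [
    Listing("provider-a", "H100", 8, 32.00),
    Listing("provider-b", "H100", 8, 24.00),
    Listing("provider-c", "H100", 1, 2.50),
    Listing("provider-d", "A100", 8, 12.00),
]

if __name__ == "__main__":
    for row in filter_listings(CATALOG, gpu="H100", gpu_count=8):
        print(f"{row.provider}: ${per_gpu_price(row):.2f} per GPU-hour")
```

Ranking by price per GPU rather than per instance is a design choice made for this sketch; it keeps single-GPU and multi-GPU listings on the same scale when the count filter is left open.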
For practitioners trying to figure out whether an H100 is cheaper this week on a hyperscaler or a neocloud, or where they can actually get eight of them on short notice, it is a notable consolidation of information that has historically required tab-hopping across half a dozen pricing pages.
Traction with serious AI shops
The project has been picking up high-profile adopters. A January case study revealed that Shopify runs all of its AI training workloads on SkyPilot, and a March writeup detailed how H Company used the platform to unlock online reinforcement learning and unify its AI platform.
Recent engineering blog posts have leaned into the agentic-AI moment as well. In April, the team described "research-driven agents" that read arXiv papers before writing code, claiming five landed kernel fusions in llama.cpp and a 15% speedup on flash attention in roughly three hours of compute for around $29. A March post on scaling Andrej Karpathy's "autoresearch" concept showed what happens when an autonomous research loop, normally run one experiment at a time, is given 16 GPUs and allowed to parallelise.
v0.12 leans into agents and batch inference
SkyPilot v0.12, released in March, added Slurm support, job groups for RL workloads, an "Agent Skill" for AI coding assistants, and pool autoscaling for batch inference, alongside what the team describes as 7x faster data mounting. The Agent Skill in particular reflects where the project sees the puck heading: developers can now point Claude Code, Codex, or other coding agents at a single install command and have them provision GPUs and manage jobs directly through SkyPilot.
Why it matters
The AI infrastructure market in 2026 is defined by two competing pressures: explosive demand for accelerators and a deeply fragmented supply, with new neoclouds appearing almost monthly. Tools that abstract over that fragmentation, surface real pricing, and let teams move workloads without rewrites are increasingly load-bearing for anyone training or serving models at scale.
SkyPilot is betting that the winning abstraction is not a new cloud, but a thin, open-source layer that sits on top of all of them. With GPU Compass, it is now also betting that transparency about what compute actually costs will pull more teams into that abstraction.