KELOS: THE KUBERNETES-NATIVE FRAMEWORK FOR AUTONOMOUS AI CODING AGENTS
You’re tired of manually invoking Claude Code every time you need to refactor a messy module. You’ve tried chaining together shell scripts and GitHub Actions workflows, but it feels fragile—there’s no visibility into what’s running, no clean way to retry failures, and your “automation” is really just a pile of glue code held together by duct tape.
What if you could treat AI coding agents like any other Kubernetes resource? Define them in YAML, apply them with kubectl, and let your cluster handle the orchestration, scaling, and observability?
That’s exactly what Kelos does.
Who Is This Guide For?
This article is for platform engineers, DevOps teams, and technical leads who are running Kubernetes clusters and want to automate repetitive development tasks. You should be comfortable with kubectl, understand basic Kubernetes concepts (Pods, Jobs, Custom Resources), and have used AI coding assistants like Claude Code or GitHub Copilot.
If you’re already running AI agents manually and want to scale that automation—read on.
By the End of This, You’ll Know
- How Kelos transforms AI coding agents into Kubernetes-native workloads
- The four core primitives that drive Kelos orchestration
- How to set up your first TaskSpawner to react to GitHub issues automatically
- Security considerations for running autonomous agents in your cluster
- How Kelos uses itself to develop itself—a working example of autonomous software development
The Problem: AI Agents Are Still Manual
Here’s what’s happened: AI coding agents have gotten incredibly capable. Claude Code can refactor entire modules, OpenAI Codex can write tests from scratch, and Google Gemini reasoning models can plan complex implementations. But we’re still invoking them manually—starting a session, pasting in context, waiting for output.
This works fine for one-off tasks. But when you want to:
- Automatically pick up GitHub issues and create PRs
- Run code review on every pull request
- Trigger documentation updates on code changes
- Continuously improve your codebase without human intervention
…you’re stuck building fragile automation on top of shell scripts and GitHub Actions. There’s no unified way to:
- Track agent runs as first-class resources
- Chain multiple agents together with dependencies
- Manage concurrency and cost controls
- Observe what’s happening across all your automations
Kelos solves this by treating AI agents as Kubernetes resources. Your agent workflows become declarative, version-controlled, and observable—exactly like the infrastructure you already manage.
How Kelos Works: Four Core Primitives
Kelos introduces four Custom Resource Definitions that model your agent workflows:
1. Tasks — Units of Work
A Task wraps a single agent execution. Think of it like a Kubernetes Job, but for AI agents:
```yaml
apiVersion: kelos.dev/v1alpha1
kind: Task
metadata:
  name: fix-auth-bug-123
spec:
  agentConfigRef:
    name: claude-code-default
  workspaceRef:
    name: main-repo
  command: "Fix the authentication token refresh bug in auth.go"
```
Every Task records structured outputs in its status: branch name, commit SHA, PR URL, and token usage. You can watch progress with kubectl get tasks and stream logs with kelos logs.
2. Workspaces — Git Repository Environments
A Workspace defines where the agent operates: a git repo cloned into a volume that is either recreated for each run or kept between runs:
```yaml
apiVersion: kelos.dev/v1alpha1
kind: Workspace
metadata:
  name: main-repo
spec:
  git:
    url: https://github.com/yourorg/main-repo
    credentialsRef:
      name: github-token
  persistent: true # Keep between runs, or false for fresh clone each time
```
The workspace is mounted into the agent’s Pod. Multiple Tasks can share a Workspace for multi-step workflows.
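As a sketch of that sharing, here are two Tasks that use only the fields from the Task example above and point at the same main-repo Workspace. The Task names and commands are hypothetical. This shows the shared reference only; how Kelos orders the second Task after the first is not covered here, so check the Kelos documentation for its dependency mechanism before relying on sequencing.

```yaml
# Two hypothetical Tasks sharing one Workspace (main-repo).
apiVersion: kelos.dev/v1alpha1
kind: Task
metadata:
  name: add-auth-tests          # hypothetical name
spec:
  agentConfigRef:
    name: claude-code-default
  workspaceRef:
    name: main-repo             # same Workspace as the Task below
  command: "Add unit tests covering token refresh in auth.go"
---
apiVersion: kelos.dev/v1alpha1
kind: Task
metadata:
  name: refactor-auth           # hypothetical name
spec:
  agentConfigRef:
    name: claude-code-default
  workspaceRef:
    name: main-repo
  command: "Refactor auth.go for readability without changing behavior; keep the new tests passing"
```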
3. AgentConfigs — Instructions and Tools
An AgentConfig bundles the prompt, plugins, skills, and MCP servers an agent can use:
```yaml
apiVersion: kelos.dev/v1alpha1
kind: AgentConfig
metadata:
  name: claude-code-default
spec:
  agent:
    type: claude-code
    model: opus
  instructions: |
    You are a senior backend engineer. Focus on:
    - Writing testable code with clear abstractions
    - Adding comprehensive test coverage
    - Updating documentation when APIs change
  mcpServers:
    - name: github
      config:
        repository: yourorg/main-repo
  env:
    # Claude Code reads its API key from ANTHROPIC_API_KEY
    - name: ANTHROPIC_API_KEY
      valueFrom:
        secretKeyRef:
          name: claude-credentials
          key: api-key
```
This is where you encode your team’s conventions, coding standards, and available tools. AgentConfigs are reusable across Tasks.
4. TaskSpawners — Event-Driven Orchestration
This is where the magic happens. A TaskSpawner watches external triggers and automatically creates Tasks:
```yaml
apiVersion: kelos.dev/v1alpha1
kind: TaskSpawner
metadata:
  name: issue-worker
spec:
  source:
    githubIssues:
      repository: yourorg/main-repo
      actor: kelos # Only issues labeled by kelos
  template:
    agentConfigRef:
      name: claude-code-default
    workspaceRef:
      name: main-repo
    command: "Implement the feature described in {{.issue.title}}"
```
TaskSpawners support multiple trigger sources:
- GitHub Issues (filtered by labels, actors, or comments)
- GitHub Pull Requests (on open, update, or review comments)
- GitHub Webhooks (generic HTTP triggers)
- Cron schedules (time-based triggering)
- Generic webhooks (any HTTP POST)
This is the infrastructure-as-code approach to AI automation. You’re not writing scripts—you’re declaring workflows that run continuously.
Setting Up Your First Kelos Installation
Here’s how to get started on a Kubernetes cluster (or a local kind cluster). Install the kelos CLI first, since the CLI-based install below and the kelos logs command both depend on it:
```bash
# macOS (Homebrew)
brew install kelos-dev/kelos/kelos

# Or download a release binary. This asset is the Apple Silicon macOS build;
# pick the asset for your OS and architecture from the releases page.
curl -LO https://github.com/kelos-dev/kelos/releases/latest/download/kelos-darwin-arm64
chmod +x kelos-darwin-arm64
sudo mv kelos-darwin-arm64 /usr/local/bin/kelos
```
Then install the Kelos operator and CRDs, either with the CLI or with Helm:
```bash
# Option 1: the CLI
kelos install

# Option 2: Helm
helm upgrade --install kelos oci://ghcr.io/kelos-dev/charts/kelos \
  -n kelos-system --create-namespace
```
Verify your installation:
```bash
kelos version
# v0.31.0 (latest as of May 2026)

kubectl get crd | grep kelos
# agentconfigs.kelos.dev
# tasks.kelos.dev
# taskspawners.kelos.dev
# workspaces.kelos.dev
```
A Practical Example: Auto-Response to GitHub Issues
Let’s build a TaskSpawner that picks up issues labeled “needs-implementation” and creates a PR with the fix.
First, create a Workspace and the Secret that holds its GitHub token:
```yaml
# workspace.yaml
apiVersion: kelos.dev/v1alpha1
kind: Workspace
metadata:
  name: my-app
spec:
  git:
    url: https://github.com/yourorg/my-app
    credentialsRef:
      name: github-credentials
  persistent: true
---
apiVersion: v1
kind: Secret
metadata:
  name: github-credentials
type: Opaque
stringData:
  token: github_pat_your_fine_grained_pat_here # fine-grained PATs start with github_pat_
```
Apply it:
```bash
kubectl apply -f workspace.yaml
```
Now create an AgentConfig with your instructions:
```yaml
# agentconfig.yaml
apiVersion: kelos.dev/v1alpha1
kind: AgentConfig
metadata:
  name: claude-code-fix
spec:
  agent:
    type: claude-code
    model: sonnet # Use sonnet for cost efficiency on routine fixes
  instructions: |
    You are an autonomous coding agent. Your task is to implement the feature
    described in the GitHub issue.

    Guidelines:
    1. Read the issue description carefully
    2. Explore the codebase to understand the relevant components
    3. Implement the feature with test coverage
    4. Ensure CI passes before marking complete
    5. Create a PR with clear description and summary of changes

    If you encounter blockers, create a PR with WIP status and describe
    what needs manual intervention.
  env:
    - name: ANTHROPIC_API_KEY
      valueFrom:
        secretKeyRef:
          name: anthropic-credentials
          key: api-key
```
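The AgentConfig references a Secret named anthropic-credentials that the walkthrough never creates. A minimal manifest for it, reusing the name and the api-key key from the secretKeyRef, could look like this; the value is a placeholder for your own key:

```yaml
# anthropic-secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: anthropic-credentials          # must match secretKeyRef.name in the AgentConfig
type: Opaque
stringData:
  api-key: your_anthropic_api_key_here # must match secretKeyRef.key
```

Apply it with kubectl apply -f anthropic-secret.yaml in the same namespace where the agent Pods run, before the first Task starts; a secretKeyRef can only read Secrets from the Pod’s own namespace.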
And the TaskSpawner:
```yaml
# taskspawner.yaml
apiVersion: kelos.dev/v1alpha1
kind: TaskSpawner
metadata:
  name: issue-implementer
spec:
  source:
    githubIssues:
      repository: yourorg/my-app
      labels:
        - needs-implementation
      excludeAuthors:
        - kelos # Don't pick up issues created by Kelos itself
  template:
    agentConfigRef:
      name: claude-code-fix
    workspaceRef:
      name: my-app
    command: |
      Implement the feature described in issue #{{.issue.number}}.
      Title: {{.issue.title}}
      Body: {{.issue.body}}
  maxConcurrency: 3 # Limit parallel agent runs
  maxTotalTasks: 50 # Cap the total number of Tasks this spawner creates
  branchSerialization: serialize # Prevent conflicts on same branch
```
Apply everything:
```bash
kubectl apply -f agentconfig.yaml
kubectl apply -f taskspawner.yaml
```
Now watch it work:
```bash
# List tasks
kubectl get tasks

# Stream logs from a specific task (copy a real Task name from kubectl get tasks)
kelos logs fix-issue-123 -f

# Inspect the TaskSpawner status
kubectl get taskspawner issue-implementer -o yaml
```
When someone labels an issue needs-implementation, Kelos automatically:
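- Prepares the repo in the my-app workspace (with persistent: true, the checkout is kept between runs instead of being cloned fresh each time)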
- Spins up a Pod with Claude Code
- Runs the implementation based on your instructions
- Creates a branch, commits the changes, and opens a PR
- Updates the task status with outputs (branch name, PR URL)
All without you typing a single command.
Security Considerations
This is the question everyone asks first: “You’re putting AI agents in my cluster with access to my repos?”
Here’s how Kelos addresses the concern:
Host Isolation
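Every Task runs in its own ephemeral Pod. Provided that Pod is unprivileged and mounts no hostPath volumes, the agent has no access to:
- The host node
- Your local machine
- The Kubernetes API, beyond the RBAC permissions of its service account (minimal unless you explicitly grant more)
Other Pods are the exception worth knowing: Kubernetes allows all Pod-to-Pod network traffic until a NetworkPolicy restricts it. Treat the Pod as an isolation boundary rather than a complete sandbox; the controls below fill in the rest.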
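Enforcing the “no access to other Pods” property at the network layer takes a NetworkPolicy, because Kubernetes allows all Pod-to-Pod traffic until a policy selects a Pod. The sketch below assumes agent Pods run in a dedicated namespace (kelos-agents is a hypothetical name; use whichever namespace your Tasks actually run in). It blocks all inbound traffic, allows outbound traffic only for DNS and for HTTPS to addresses outside the private ranges (which covers GitHub and model API calls), and turns off automatic API token mounting for the namespace’s default ServiceAccount, assuming the agent Pods use it. NetworkPolicies only take effect if your cluster’s CNI plugin enforces them.

```yaml
# Hypothetical namespace; substitute the one your Tasks run in.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: restrict-agent-traffic
  namespace: kelos-agents
spec:
  podSelector: {}        # applies to every Pod in the namespace
  policyTypes:
    - Ingress
    - Egress
  ingress: []            # no ingress rules: all inbound traffic is denied
  egress:
    # DNS lookups (port 53 to any destination, including cluster DNS)
    - ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
    # HTTPS only to non-private addresses; adjust the exceptions so they
    # cover your cluster's Pod and Service CIDRs
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
            except:
              - 10.0.0.0/8
              - 172.16.0.0/12
              - 192.168.0.0/16
      ports:
        - protocol: TCP
          port: 443
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: default
  namespace: kelos-agents
automountServiceAccountToken: false  # Pods using this SA get no API token unless their spec overrides it
```

Excluding the private ranges from the HTTPS rule also blocks in-cluster Service IPs on port 443, including the API server’s ClusterIP in most setups, which complements the disabled token.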
Scoped Credentials
You control exactly what the agent can access:
- Fine-grained GitHub PATs with limited repo scope
- Branch protection rules that prevent direct main branch commits
- Read-only access by default; explicitly enable write access per workspace
No Persistent Credentials in Pods
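Kelos injects credentials at runtime from Kubernetes Secrets, so they are never baked into container images. The agent process can still read them from its own environment, so scope each token as if the agent will see it, and keep prompts and tools from echoing environment variables into logs.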
Resource Limits
Set maxConcurrency and maxTotalTasks to prevent runaway agent behavior:
```yaml
spec:
  maxConcurrency: 5
  maxTotalTasks: 100
  activeDeadlineSeconds: 3600 # Kill tasks running over 1 hour
```
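maxConcurrency and maxTotalTasks bound how many Tasks Kelos creates; a standard Kubernetes ResourceQuota adds a cluster-side ceiling on what those Pods can consume, independent of Kelos. This sketch again assumes a dedicated agent namespace (the hypothetical kelos-agents), and every number is a placeholder to size for your cluster. Once a quota covers cpu or memory, Kubernetes rejects Pods that do not declare requests and limits for them, so the LimitRange supplies defaults in case the agent Pods omit them.

```yaml
# Hypothetical namespace and placeholder numbers.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: agent-quota
  namespace: kelos-agents
spec:
  hard:
    pods: "5"            # counts every non-terminated Pod in the namespace
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
---
# Defaults for containers that omit requests/limits,
# so the quota above does not reject them outright.
apiVersion: v1
kind: LimitRange
metadata:
  name: agent-defaults
  namespace: kelos-agents
spec:
  limits:
    - type: Container
      defaultRequest:
        cpu: "1"
        memory: 2Gi
      default:
        cpu: "2"
        memory: 4Gi
```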
Audit Trail
Every Task is a Kubernetes resource, so while it exists you can see:
- Who triggered it (GitHub issue, cron, manual)
- What command ran
- What outputs were generated
- Timing and duration
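A Task object shows its current state and disappears once it is deleted. For a durable record of who created, changed, or deleted Kelos resources, Kubernetes API audit logging is the standard tool. The policy below is a sketch that uses the resource names from the CRD list in the installation section. It only takes effect where you control the API server’s audit flags (for example --audit-policy-file on a self-managed or kind control plane); managed Kubernetes services expose audit logs through their own settings instead.

```yaml
# Sketch of an API server audit policy.
# Rules are evaluated in order; the first matching rule wins.
apiVersion: audit.k8s.io/v1
kind: Policy
omitStages:
  - RequestReceived
rules:
  # Full request and response bodies for writes to Kelos resources.
  - level: RequestResponse
    verbs: ["create", "update", "patch", "delete"]
    resources:
      - group: kelos.dev
        resources: ["tasks", "taskspawners", "agentconfigs", "workspaces"]
  # Secrets: record who touched them, never the payload.
  - level: Metadata
    resources:
      - group: ""
        resources: ["secrets"]
  # Ignore everything else.
  - level: None
```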
This isn’t security through obscurity—it’s security through architecture.
The Meta-Example: Kelos Developing Kelos
The most compelling demo of Kelos’s capabilities is Kelos itself.
The Kelos team runs seven TaskSpawners 24/7 that autonomously develop the project:
| TaskSpawner | Trigger | Model | Description |
|---|---|---|---|
| kelos-workers | GitHub Issues (actor/kelos) | Opus | Picks up issues, creates PRs, self-reviews |
| kelos-pr-responder | PR review feedback | Opus | Re-engages on changes_requested |
| kelos-planner | /kelos plan comment | Opus | Investigates issues, posts implementation plans |
| kelos-triage | needs-triage label | Opus | Classifies issues by kind/priority |
| kelos-fake-user | Cron (daily 09:00 UTC) | Sonnet | Tests DX as a new user, files issues for problems |
| kelos-fake-strategist | Cron (every 12h) | Opus | Explores new use cases and improvements |
| kelos-self-update | Cron (daily 06:00 UTC) | Opus | Reviews and tunes prompts/configs |
Seven autonomous agents, each handling a different part of the development lifecycle. They triage issues, write code, respond to review feedback, test the developer experience, and even improve their own prompts.
This is a working example of the autonomous development loop. Not theoretical—already running in production.
What Makes This Different
Compare Kelos to alternatives:
| Feature | Kelos | GitHub Actions + AI | Agent Pipelines |
|---|---|---|---|
| First-class K8s resources | ✅ | ❌ | ❌ |
| Declarative YAML workflows | ✅ | Partial | ❌ |
| Task chaining with dependencies | ✅ | ❌ | ✅ |
| Event-driven (Issues, PR, Cron) | ✅ | ✅ | Partial |
| Ephemeral isolated pods | ✅ | ❌ | ❌ |
| Observable via kubectl | ✅ | ❌ | Partial |
| Multi-agent support | ✅ | Limited | ✅ |
| Self-development loop | ✅ | ❌ | ❌ |
The key insight: Kelos applies infrastructure-as-code principles to AI workflows. Your agent automation becomes declarative, version-controlled, and GitOps-ready—exactly like your Kubernetes configs.
Getting Started
Ready to try it? Here’s your path forward:
1. Spin up a local cluster: kind create cluster gives you a single-node Kubernetes cluster to experiment with
2. Install Kelos: kubectl apply -f https://get.kelos.dev
3. Start small: Create a Workspace and AgentConfig, then run a manual Task to see how it works
4. Add automation: Build a TaskSpawner that watches GitHub issues
5. Scale up: Add more TaskSpawners, chain agents together, enable self-development
Kelos is already usable for real work (v0.31.0, with 27 releases), but the project itself describes it as early stage, so expect rough edges even though the core concept works.
The future of development isn’t AI replacing humans. It’s AI as infrastructure—autonomous, orchestrated, and managed like the reliable systems we already run on Kubernetes.