What is Kelos?

Kelos is an open-source Kubernetes-native framework for orchestrating autonomous AI coding agents. It treats AI agents like infrastructure: they are defined as declarative Kubernetes resources and managed through GitOps workflows.

What agents does Kelos support?

Kelos supports Claude Code, OpenAI Codex, Google Gemini, OpenCode, Cursor, and custom agent images through a standardized container interface.

Is it safe to run AI agents on Kubernetes?

Yes. Kelos runs each agent task in an isolated, ephemeral Pod with a freshly cloned git workspace. Agents have no access to the host machine. Use fine-grained PATs and branch protection to control repository access.

How does Kelos develop itself?

Kelos runs seven TaskSpawners 24/7 that autonomously handle issue triage, PR creation, code review, self-improvement, and documentation updates—demonstrating the framework's capability for autonomous development.

KELOS: THE KUBERNETES-NATIVE FRAMEWORK FOR AUTONOMOUS AI CODING AGENTS

2/5/2026

You’re tired of manually invoking Claude Code every time you need to refactor a messy module. You’ve tried chaining together shell scripts and GitHub Actions workflows, but it feels fragile—there’s no visibility into what’s running, no clean way to retry failures, and your “automation” is really just a pile of glue code held together by duct tape.

What if you could treat AI coding agents like any other Kubernetes resource? Define them in YAML, apply them with kubectl, and let your cluster handle the orchestration, scaling, and observability?

That’s exactly what Kelos does.

Who Is This Guide For?

This article is for platform engineers, DevOps teams, and technical leads who are running Kubernetes clusters and want to automate repetitive development tasks. You should be comfortable with kubectl, understand basic Kubernetes concepts (Pods, Jobs, Custom Resources), and have used AI coding assistants like Claude Code or GitHub Copilot.

If you’re already running AI agents manually and want to scale that automation—read on.

By the End of This, You’ll Know

How Kelos transforms AI coding agents into Kubernetes-native workloads
The four core primitives that drive Kelos orchestration
How to set up your first TaskSpawner to react to GitHub issues automatically
Security considerations for running autonomous agents in your cluster
How Kelos uses itself to develop itself—a working example of autonomous software development

The Problem: AI Agents Are Still Manual

Here’s what’s happened: AI coding agents have gotten incredibly capable. Claude Code can refactor entire modules, OpenAI Codex can write tests from scratch, and Google Gemini reasoning models can plan complex implementations. But we’re still invoking them manually—starting a session, pasting in context, waiting for output.

This works fine for one-off tasks. But when you want to:

Automatically pick up GitHub issues and create PRs
Run code review on every pull request
Trigger documentation updates on code changes
Continuously improve your codebase without human intervention

…you’re stuck building fragile automation on top of shell scripts and GitHub Actions. There’s no unified way to:

Track agent runs as first-class resources
Chain multiple agents together with dependencies
Manage concurrency and cost controls
Observe what’s happening across all your automations

Kelos solves this by treating AI agents as Kubernetes resources. Your agent workflows become declarative, version-controlled, and observable—exactly like the infrastructure you already manage.

How Kelos Works: Four Core Primitives

Kelos introduces four Custom Resource Definitions that model your agent workflows:

1. Tasks — Units of Work

A Task wraps a single agent execution. Think of it like a Kubernetes Job, but for AI agents:

apiVersion: kelos.dev/v1alpha1
kind: Task
metadata:
  name: fix-auth-bug-123
spec:
  agentConfigRef:
    name: claude-code-default
  workspaceRef:
    name: main-repo
  command: "Fix the authentication token refresh bug in auth.go"

Every Task records structured outputs in its status: branch name, commit SHA, PR URL, and token usage. You can watch progress with kubectl get tasks and stream logs with kelos logs.
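
As a rough illustration of consuming the status outputs listed above, the Python sketch below summarizes a Task object as it might come back from kubectl get task <name> -o json. The status field names (branch, commitSHA, prURL, tokenUsage) and the sample values are hypothetical stand-ins, not the confirmed Kelos schema, so check the installed CRD before relying on them.

```python
import json

# Hypothetical Task JSON: field names and values are assumptions for illustration.
SAMPLE_TASK = """
{
  "metadata": {"name": "fix-auth-bug-123"},
  "status": {
    "branch": "kelos/fix-auth-bug-123",
    "commitSHA": "0a1b2c3",
    "prURL": "https://github.com/yourorg/main-repo/pull/1",
    "tokenUsage": {"input": 1200, "output": 300}
  }
}
"""


def summarize_task(task: dict) -> dict:
    """Flatten the outputs of interest, tolerating fields that are not set yet."""
    status = task.get("status", {})
    usage = status.get("tokenUsage", {})
    return {
        "name": task["metadata"]["name"],
        "branch": status.get("branch"),
        "commit": status.get("commitSHA"),
        "pr_url": status.get("prURL"),
        "total_tokens": sum(usage.values()),
    }


summary = summarize_task(json.loads(SAMPLE_TASK))
print(summary["name"], summary["pr_url"], summary["total_tokens"])
```

A Task that has not finished simply yields None for the missing outputs and a token total of 0, which makes the helper safe to call while a run is still in progress.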

2. Workspaces — Git Repository Environments

A Workspace defines where the agent operates: a git repo cloned into a volume that is either recreated for each run or kept between runs, depending on the persistent flag:

apiVersion: kelos.dev/v1alpha1
kind: Workspace
metadata:
  name: main-repo
spec:
  git:
    url: https://github.com/yourorg/main-repo
    credentialsRef:
      name: github-token
  persistent: true  # Keep between runs, or false for fresh clone each time

The workspace is mounted into the agent’s Pod. Multiple Tasks can share a Workspace for multi-step workflows.

3. AgentConfigs — Instructions and Tools

An AgentConfig bundles the prompt, plugins, skills, and MCP servers an agent can use:

apiVersion: kelos.dev/v1alpha1
kind: AgentConfig
metadata:
  name: claude-code-default
spec:
  agent:
    type: claude-code
    model: opus
  instructions: |
    You are a senior backend engineer. Focus on:
    - Writing testable code with clear abstractions
    - Adding comprehensive test coverage
    - Updating documentation when APIs change
  mcpServers:
    - name: github
      config:
        repository: yourorg/main-repo
  env:
    - name: ANTHROPIC_API_KEY  # the variable Claude Code reads its API key from
      valueFrom:
        secretKeyRef:
          name: claude-credentials
          key: api-key

This is where you encode your team’s conventions, coding standards, and available tools. AgentConfigs are reusable across Tasks.

4. TaskSpawners — Event-Driven Orchestration

This is where the magic happens. A TaskSpawner watches external triggers and automatically creates Tasks:

apiVersion: kelos.dev/v1alpha1
kind: TaskSpawner
metadata:
  name: issue-worker
spec:
  source:
    githubIssues:
      repository: yourorg/main-repo
      actor: kelos  # Only issues labeled by kelos
  template:
    agentConfigRef:
      name: claude-code-default
    workspaceRef:
      name: main-repo
    command: "Implement the feature described in {{.issue.title}}"

TaskSpawners support multiple trigger sources:

GitHub Issues (filtered by labels, actors, or comments)
GitHub Pull Requests (on open, update, or review comments)
GitHub Webhooks (event payloads delivered by GitHub)
Cron schedules (time-based triggering)
Generic webhooks (any HTTP POST)

This is the infrastructure-as-code approach to AI automation. You’re not writing scripts—you’re declaring workflows that run continuously.
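
The {{.issue.title}} placeholder in the TaskSpawner template above looks like Go template syntax. How Kelos actually renders it is not shown here, so the following Python sketch only approximates the idea: dotted placeholders are looked up in a nested context built from the triggering event. The render function and the sample issue are hypothetical.

```python
import re

# Matches Go-template-style field references such as {{.issue.title}}.
PLACEHOLDER = re.compile(r"\{\{\s*\.([A-Za-z_][\w.]*)\s*\}\}")


def render(template: str, context: dict) -> str:
    """Replace each dotted placeholder with the matching value from context."""

    def lookup(match: re.Match) -> str:
        value = context
        for key in match.group(1).split("."):
            value = value[key]  # raises KeyError for an unknown field
        return str(value)

    return PLACEHOLDER.sub(lookup, template)


# Hypothetical event payload, shaped to match the placeholders used above.
issue = {"issue": {"number": 42, "title": "Add rate limiting"}}
print(render("Implement the feature described in {{.issue.title}}", issue))
```

Failing loudly on an unknown field is a deliberate choice in this sketch: a prompt with a silently empty placeholder would send the agent off with incomplete instructions.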

Setting Up Your First Kelos Installation

Here’s how to get started on a Kubernetes cluster (or kind for local development):

# Install Kelos operator and CRDs using the CLI
kelos install

# Or install via Helm
helm upgrade --install kelos oci://ghcr.io/kelos-dev/charts/kelos \
  -n kelos-system --create-namespace

The kelos install command above needs the kelos CLI. If you don’t have it yet, install it with Homebrew or from a release binary:

# macOS
brew install kelos-dev/kelos/kelos

# Or download from releases
curl -LO https://github.com/kelos-dev/kelos/releases/latest/download/kelos-darwin-arm64
chmod +x kelos-darwin-arm64
sudo mv kelos-darwin-arm64 /usr/local/bin/kelos

Verify your installation:

kelos version
# v0.31.0 (latest as of May 2026)

kubectl get crd | grep kelos
# agentconfigs.kelos.dev
# tasks.kelos.dev
# taskspawners.kelos.dev
# workspaces.kelos.dev

A Practical Example: Auto-Response to GitHub Issues

Let’s build a TaskSpawner that picks up issues labeled “needs-implementation” and creates a PR with the fix.

First, create a Workspace:

# workspace.yaml
apiVersion: kelos.dev/v1alpha1
kind: Workspace
metadata:
  name: my-app
spec:
  git:
    url: https://github.com/yourorg/my-app
    credentialsRef:
      name: github-credentials
  persistent: true
---
apiVersion: v1
kind: Secret
metadata:
  name: github-credentials
type: Opaque
stringData:
  token: github_pat_your_fine_grained_pat_here  # placeholder; never commit a real token

Apply it:

kubectl apply -f workspace.yaml

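Now create an AgentConfig with your instructions. It reads the API key from a Secret named anthropic-credentials (key: api-key), which you need to create alongside it: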

# agentconfig.yaml
apiVersion: kelos.dev/v1alpha1
kind: AgentConfig
metadata:
  name: claude-code-fix
spec:
  agent:
    type: claude-code
    model: sonnet  # Use sonnet for cost efficiency on routine fixes
  instructions: |
    You are an autonomous coding agent. Your task is to implement the feature
    described in the GitHub issue.

    Guidelines:
    1. Read the issue description carefully
    2. Explore the codebase to understand the relevant components
    3. Implement the feature with test coverage
    4. Ensure CI passes before marking complete
    5. Create a PR with clear description and summary of changes

    If you encounter blockers, create a PR with WIP status and describe
    what needs manual intervention.
  env:
    - name: ANTHROPIC_API_KEY
      valueFrom:
        secretKeyRef:
          name: anthropic-credentials
          key: api-key

And the TaskSpawner:

# taskspawner.yaml
apiVersion: kelos.dev/v1alpha1
kind: TaskSpawner
metadata:
  name: issue-implementer
spec:
  source:
    githubIssues:
      repository: yourorg/my-app
      labels:
        - needs-implementation
      excludeAuthors:
        - kelos  # Don't pick up issues created by Kelos itself
  template:
    agentConfigRef:
      name: claude-code-fix
    workspaceRef:
      name: my-app
    command: |
      Implement the feature described in issue #{{.issue.number}}.
      Title: {{.issue.title}}
      Body: {{.issue.body}}
  maxConcurrency: 3  # Limit parallel agent runs
  maxTotalTasks: 50  # Cap the total number of Tasks created, for budget control
  branchSerialization: serialize  # Prevent conflicts on same branch
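
To make the filters and limits in this TaskSpawner concrete, here is a minimal Python sketch of the decision a spawner has to make for each incoming issue. It is a model under stated assumptions, not Kelos's implementation: the names Issue, SpawnerLimits, and should_spawn are hypothetical, and it assumes every listed label must be present, which Kelos may or may not require.

```python
from dataclasses import dataclass, field


@dataclass
class Issue:
    number: int
    author: str
    labels: set = field(default_factory=set)


@dataclass
class SpawnerLimits:
    max_concurrency: int = 3   # mirrors maxConcurrency
    max_total_tasks: int = 50  # mirrors maxTotalTasks


def should_spawn(issue, required_labels, exclude_authors, limits,
                 active_tasks, total_created):
    """Return True only if the issue passes the filters and both caps allow it."""
    if not required_labels <= issue.labels:       # assumption: all labels required
        return False
    if issue.author in exclude_authors:           # skip issues filed by excluded users
        return False
    if total_created >= limits.max_total_tasks:   # lifetime cap already reached
        return False
    return active_tasks < limits.max_concurrency  # room for one more parallel run


issue = Issue(7, "alice", {"needs-implementation", "bug"})
limits = SpawnerLimits()
print(should_spawn(issue, {"needs-implementation"}, {"kelos"}, limits, 2, 10))  # True
```

Checking the cheap filters before the counters keeps the caps from being consumed by issues that would never have qualified.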

Apply everything:

kubectl apply -f agentconfig.yaml
kubectl apply -f taskspawner.yaml

Now watch it work:

# List active tasks
kubectl get tasks

# Stream logs from a specific task
kelos logs <task-name> -f  # substitute a name from kubectl get tasks

# Watch the TaskSpawner status
kubectl get taskspawner issue-implementer -o yaml

When someone labels an issue needs-implementation, Kelos automatically:

Prepares the workspace (a fresh clone, or the retained checkout when persistent: true is set)
Spins up a Pod with Claude Code
Runs the implementation based on your instructions
Creates a branch, commits the changes, and opens a PR
Updates the task status with outputs (branch name, PR URL)

All without you typing a single command.

Security Considerations

This is the question everyone asks first: “You’re putting AI agents in my cluster with access to my repos?”

Here’s how Kelos addresses the concern:

Host Isolation

Every Task runs in its own ephemeral Pod. The agent has no access to:

The host node
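Other Pods’ filesystems and processes (Pod-to-Pod network traffic is allowed by default in Kubernetes, so add NetworkPolicies if you need to restrict it)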
The Kubernetes API (unless you explicitly grant it)
Your local machine

It’s running in a sandbox.

Scoped Credentials

You control exactly what the agent can access:

Fine-grained GitHub PATs with limited repo scope
Branch protection rules that prevent direct main branch commits
Read-only access by default; explicitly enable write access per workspace

No Persistent Credentials in Pods

Kelos injects credentials at runtime from Kubernetes Secrets. They’re never stored in container images or logs.

Resource Limits

Set maxConcurrency, maxTotalTasks, and activeDeadlineSeconds to bound how many agents run, how many Tasks are created in total, and how long each one may run:

spec:
  maxConcurrency: 5
  maxTotalTasks: 100
  activeDeadlineSeconds: 3600  # Kill tasks running over 1 hour
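
The three limits above combine into a simple worst-case bound. The Python sketch below works through the arithmetic, assuming each Task is stopped exactly at its deadline and the spawner stays at full concurrency. The function name worst_case_budget is hypothetical, and the result says nothing about token cost, which depends on the model and workload.

```python
import math


def worst_case_budget(max_concurrency: int, max_total_tasks: int,
                      active_deadline_seconds: int):
    """Return (agent_hours, wall_clock_hours) if every Task runs to its deadline."""
    # Total compute time across all Tasks the spawner may ever create.
    agent_hours = max_total_tasks * active_deadline_seconds / 3600
    # Tasks run in batches of at most max_concurrency at a time.
    waves = math.ceil(max_total_tasks / max_concurrency)
    wall_clock_hours = waves * active_deadline_seconds / 3600
    return agent_hours, wall_clock_hours


agent_hours, wall_clock_hours = worst_case_budget(5, 100, 3600)
print(agent_hours, wall_clock_hours)  # 100.0 20.0
```

With the values shown, the spawner can consume at most 100 agent-hours, and it would take 20 hours of continuous full-concurrency running to get there.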

Audit Trail

Every Task is a Kubernetes resource with full history. You can see:

Who triggered it (GitHub issue, cron, manual)
What command ran
What outputs were generated
Timing and duration

This isn’t security through obscurity—it’s security through architecture.

The Meta-Example: Kelos Developing Kelos

The most compelling demo of Kelos’s capabilities is Kelos itself.

The Kelos team runs seven TaskSpawners 24/7 that autonomously develop the project:

TaskSpawner	Trigger	Model	Description
kelos-workers	GitHub Issues (`actor/kelos`)	Opus	Picks up issues, creates PRs, self-reviews
kelos-pr-responder	PR review feedback	Opus	Re-engages on changes_requested
kelos-planner	`/kelos plan` comment	Opus	Investigates issues, posts implementation plans
kelos-triage	`needs-triage` label	Opus	Classifies issues by kind/priority
kelos-fake-user	Cron (daily 09:00 UTC)	Sonnet	Tests DX as a new user, files issues for problems
kelos-fake-strategist	Cron (every 12h)	Opus	Explores new use cases and improvements
kelos-self-update	Cron (daily 06:00 UTC)	Opus	Reviews and tunes prompts/configs

Seven autonomous agents, each handling a different part of the development lifecycle. They triage issues, write code, respond to review feedback, test the developer experience, and even improve their own prompts.

This is a working example of the autonomous development loop. Not theoretical—already running in production.
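
The cron triggers in the table can be written as standard five-field cron expressions. The mapping below is an assumption based only on the schedule descriptions (the project's actual expressions are not shown here), and the small matcher is a sketch that handles just the minute and hour fields, enough to check which spawner would fire at a given UTC time.

```python
from datetime import datetime, timezone

# Assumed cron expressions for the schedules described in the table (UTC).
SCHEDULES = {
    "kelos-fake-user": "0 9 * * *",           # daily at 09:00
    "kelos-fake-strategist": "0 */12 * * *",  # every 12 hours (00:00 and 12:00)
    "kelos-self-update": "0 6 * * *",         # daily at 06:00
}


def _field_matches(expr: str, value: int) -> bool:
    """Match one cron field supporting '*', '*/n', and a plain number."""
    if expr == "*":
        return True
    if expr.startswith("*/"):
        return value % int(expr[2:]) == 0
    return value == int(expr)


def fires_at(cron: str, when: datetime) -> bool:
    """Return True if the expression fires at the given minute (UTC assumed)."""
    minute, hour, day, month, weekday = cron.split()
    if (day, month, weekday) != ("*", "*", "*"):
        raise ValueError("this sketch only handles the minute and hour fields")
    return _field_matches(minute, when.minute) and _field_matches(hour, when.hour)


noon = datetime(2026, 5, 2, 12, 0, tzinfo=timezone.utc)
due = [name for name, cron in SCHEDULES.items() if fires_at(cron, noon)]
print(due)  # ['kelos-fake-strategist']
```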

What Makes This Different

Compare Kelos to alternatives:

Feature	Kelos	GitHub Actions + AI	Agent Pipelines
First-class K8s resources	✅	❌	❌
Declarative YAML workflows	✅	Partial	❌
Task chaining with dependencies	✅	❌	✅
Event-driven (Issues, PR, Cron)	✅	✅	Partial
Ephemeral isolated pods	✅	❌	❌
Observable via kubectl	✅	❌	Partial
Multi-agent support	✅	Limited	✅
Self-development loop	✅	❌	❌

The key insight: Kelos applies infrastructure-as-code principles to AI workflows. Your agent automation becomes declarative, version-controlled, and GitOps-ready—exactly like your Kubernetes configs.

Getting Started

Ready to try it? Here’s your path forward:

Spin up a local cluster: kind create cluster gives you a single-node Kubernetes to experiment with
Install Kelos: run kelos install, or use the Helm chart from the setup section above
Start small: Create a Workspace and AgentConfig, then run a manual Task to see how it works
Add automation: Build a TaskSpawner that watches GitHub issues
Scale up: Add more TaskSpawners, chain agents together, enable self-development

The framework is usable today (v0.31.0 with 27 releases), though the project acknowledges it’s early stage, so treat it as pre-1.0 software. Expect rough edges, but the core concept works.

The future of development isn’t AI replacing humans. It’s AI as infrastructure—autonomous, orchestrated, and managed like the reliable systems we already run on Kubernetes.
