KELOS: THE KUBERNETES-NATIVE FRAMEWORK FOR AUTONOMOUS AI CODING AGENTS

You’re tired of manually invoking Claude Code every time you need to refactor a messy module. You’ve tried chaining together shell scripts and GitHub Actions workflows, but it feels fragile—there’s no visibility into what’s running, no clean way to retry failures, and your “automation” is really just a pile of glue code held together by duct tape.

What if you could treat AI coding agents like any other Kubernetes resource? Define them in YAML, apply them with kubectl, and let your cluster handle the orchestration, scaling, and observability?

That’s exactly what Kelos does.

Who Is This Guide For?

This article is for platform engineers, DevOps teams, and technical leads who are running Kubernetes clusters and want to automate repetitive development tasks. You should be comfortable with kubectl, understand basic Kubernetes concepts (Pods, Jobs, Custom Resources), and have used AI coding assistants like Claude Code or GitHub Copilot.

If you’re already running AI agents manually and want to scale that automation—read on.

By the End of This, You’ll Know

  • How Kelos transforms AI coding agents into Kubernetes-native workloads
  • The four core primitives that drive Kelos orchestration
  • How to set up your first TaskSpawner to react to GitHub issues automatically
  • Security considerations for running autonomous agents in your cluster
  • How Kelos uses itself to develop itself—a working example of autonomous software development

The Problem: AI Agents Are Still Manual

Here’s what’s happened: AI coding agents have gotten incredibly capable. Claude Code can refactor entire modules, OpenAI Codex can write tests from scratch, and Google Gemini reasoning models can plan complex implementations. But we’re still invoking them manually—starting a session, pasting in context, waiting for output.

This works fine for one-off tasks. But when you want to:

  • Automatically pick up GitHub issues and create PRs
  • Run code review on every pull request
  • Trigger documentation updates on code changes
  • Continuously improve your codebase without human intervention

…you’re stuck building fragile automation on top of shell scripts and GitHub Actions. There’s no unified way to:

  • Track agent runs as first-class resources
  • Chain multiple agents together with dependencies
  • Manage concurrency and cost controls
  • Observe what’s happening across all your automations

Kelos solves this by treating AI agents as Kubernetes resources. Your agent workflows become declarative, version-controlled, and observable—exactly like the infrastructure you already manage.

How Kelos Works: Four Core Primitives

Kelos introduces four Custom Resource Definitions that model your agent workflows:

1. Tasks — Units of Work

A Task wraps a single agent execution. Think of it like a Kubernetes Job, but for AI agents:

apiVersion: kelos.dev/v1alpha1
kind: Task
metadata:
  name: fix-auth-bug-123
spec:
  agentConfigRef:
    name: claude-code-default
  workspaceRef:
    name: main-repo
  command: "Fix the authentication token refresh bug in auth.go"

Every Task records structured outputs in its status: branch name, commit SHA, PR URL, and token usage. You can watch progress with kubectl get tasks and stream logs with kelos logs.

2. Workspaces — Git Repository Environments

A Workspace defines where the agent operates. It is a git repo cloned into a volume, which can be kept between runs or recreated as a fresh clone each time:

apiVersion: kelos.dev/v1alpha1
kind: Workspace
metadata:
  name: main-repo
spec:
  git:
    url: https://github.com/yourorg/main-repo
    credentialsRef:
      name: github-token
  persistent: true  # Keep between runs, or false for fresh clone each time

The workspace is mounted into the agent’s Pod. Multiple Tasks can share a Workspace for multi-step workflows.
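
As a sketch of that sharing, the two Tasks below point at the same Workspace and reuse the AgentConfig referenced in the Task example. The Task names and commands are hypothetical, and only fields already shown in this article are used. How Kelos orders or chains the two runs is not covered by this sketch.

```yaml
# Two Tasks sharing one Workspace (names and commands are illustrative)
apiVersion: kelos.dev/v1alpha1
kind: Task
metadata:
  name: add-refresh-tests        # hypothetical name
spec:
  agentConfigRef:
    name: claude-code-default
  workspaceRef:
    name: main-repo              # same Workspace as the Task below
  command: "Add unit tests for the token refresh path in auth.go"
---
apiVersion: kelos.dev/v1alpha1
kind: Task
metadata:
  name: document-refresh-flow    # hypothetical name
spec:
  agentConfigRef:
    name: claude-code-default
  workspaceRef:
    name: main-repo              # same Workspace as the Task above
  command: "Document the token refresh flow in the README"
```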

3. AgentConfigs — Instructions and Tools

An AgentConfig bundles the prompt, plugins, skills, and MCP servers an agent can use:

apiVersion: kelos.dev/v1alpha1
kind: AgentConfig
metadata:
  name: claude-code-default
spec:
  agent:
    type: claude-code
    model: opus
  instructions: |
    You are a senior backend engineer. Focus on:
    - Writing testable code with clear abstractions
    - Adding comprehensive test coverage
    - Updating documentation when APIs change
  mcpServers:
    - name: github
      config:
        repository: yourorg/main-repo
  env:
    - name: ANTHROPIC_API_KEY
      valueFrom:
        secretKeyRef:
          name: claude-credentials
          key: api-key

This is where you encode your team’s conventions, coding standards, and available tools. AgentConfigs are reusable across Tasks.

4. TaskSpawners — Event-Driven Orchestration

This is where the magic happens. A TaskSpawner watches external triggers and automatically creates Tasks:

apiVersion: kelos.dev/v1alpha1
kind: TaskSpawner
metadata:
  name: issue-worker
spec:
  source:
    githubIssues:
      repository: yourorg/main-repo
      actor: kelos  # Only issues labeled by kelos
  template:
    agentConfigRef:
      name: claude-code-default
    workspaceRef:
      name: main-repo
    command: "Implement the feature described in {{.issue.title}}"

TaskSpawners support multiple trigger sources:

  • GitHub Issues (filtered by labels, actors, or comments)
  • GitHub Pull Requests (on open, update, or review comments)
  • GitHub Webhooks (repository events pushed by GitHub)
  • Cron schedules (time-based triggering)
  • Generic webhooks (any HTTP POST)

This is the infrastructure-as-code approach to AI automation. You’re not writing scripts—you’re declaring workflows that run continuously.

Setting Up Your First Kelos Installation

Here’s how to get started on a Kubernetes cluster (or kind for local development):

# Install the Kelos operator and CRDs using the CLI (CLI installation is covered below)
kelos install

# Or install via Helm
helm upgrade --install kelos oci://ghcr.io/kelos-dev/charts/kelos \
  -n kelos-system --create-namespace

Install the CLI for convenience:

# macOS
brew install kelos-dev/kelos/kelos

# Or download from releases
curl -LO https://github.com/kelos-dev/kelos/releases/latest/download/kelos-darwin-arm64
chmod +x kelos-darwin-arm64
sudo mv kelos-darwin-arm64 /usr/local/bin/kelos

Verify your installation:

kelos version
# v0.31.0 (latest as of May 2026)

kubectl get crd | grep kelos
# agentconfigs.kelos.dev
# tasks.kelos.dev
# taskspawners.kelos.dev
# workspaces.kelos.dev

A Practical Example: Auto-Response to GitHub Issues

Let’s build a TaskSpawner that picks up issues labeled “needs-implementation” and creates a PR with the fix.

First, create a Workspace:

# workspace.yaml
apiVersion: kelos.dev/v1alpha1
kind: Workspace
metadata:
  name: my-app
spec:
  git:
    url: https://github.com/yourorg/my-app
    credentialsRef:
      name: github-credentials
  persistent: true
---
apiVersion: v1
kind: Secret
metadata:
  name: github-credentials
type: Opaque
stringData:
  token: ghp_your_fine_grained_pat_here

Apply it:

kubectl apply -f workspace.yaml

Now create an AgentConfig with your instructions:

# agentconfig.yaml
apiVersion: kelos.dev/v1alpha1
kind: AgentConfig
metadata:
  name: claude-code-fix
spec:
  agent:
    type: claude-code
    model: sonnet  # Use sonnet for cost efficiency on routine fixes
  instructions: |
    You are an autonomous coding agent. Your task is to implement the feature
    described in the GitHub issue.

    Guidelines:
    1. Read the issue description carefully
    2. Explore the codebase to understand the relevant components
    3. Implement the feature with test coverage
    4. Ensure CI passes before marking complete
    5. Create a PR with clear description and summary of changes

    If you encounter blockers, create a PR with WIP status and describe
    what needs manual intervention.
  env:
    - name: ANTHROPIC_API_KEY
      valueFrom:
        secretKeyRef:
          name: anthropic-credentials
          key: api-key
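
The AgentConfig above reads its API key from a Secret named anthropic-credentials, which this walkthrough does not otherwise create. A minimal sketch of that Secret, modeled on the github-credentials Secret earlier: the filename and the key value are placeholders, and it is assumed the Secret must live in the same namespace as the resources that reference it.

```yaml
# anthropic-secret.yaml (hypothetical filename)
apiVersion: v1
kind: Secret
metadata:
  name: anthropic-credentials    # matches secretKeyRef.name in the AgentConfig
type: Opaque
stringData:
  api-key: your-anthropic-api-key-here   # placeholder; matches secretKeyRef.key
```

Apply it with kubectl apply -f before any Task runs. For anything beyond a local experiment, consider creating the Secret out of band rather than committing the key to a file.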

And the TaskSpawner:

# taskspawner.yaml
apiVersion: kelos.dev/v1alpha1
kind: TaskSpawner
metadata:
  name: issue-implementer
spec:
  source:
    githubIssues:
      repository: yourorg/my-app
      labels:
        - needs-implementation
      excludeAuthors:
        - kelos  # Don't pick up issues created by Kelos itself
  template:
    agentConfigRef:
      name: claude-code-fix
    workspaceRef:
      name: my-app
    command: |
      Implement the feature described in issue #{{.issue.number}}.
      Title: {{.issue.title}}
      Body: {{.issue.body}}
  maxConcurrency: 3  # Limit parallel agent runs
  maxTotalTasks: 50  # Cap the total number of runs for budget control
  branchSerialization: serialize  # Prevent conflicts on same branch

Apply everything:

kubectl apply -f agentconfig.yaml
kubectl apply -f taskspawner.yaml

Now watch it work:

# List active tasks
kubectl get tasks

# Stream logs from a specific task (fix-issue-123 is a placeholder; use a name from the list above)
kelos logs fix-issue-123 -f

# Watch the TaskSpawner status
kubectl get taskspawner issue-implementer -o yaml

When someone labels an issue needs-implementation, Kelos automatically:

  1. Prepares the workspace (a fresh clone, or the existing checkout when persistent: true)
  2. Spins up a Pod with Claude Code
  3. Runs the implementation based on your instructions
  4. Creates a branch, commits the changes, and opens a PR
  5. Updates the task status with outputs (branch name, PR URL)

All without you typing a single command.

Security Considerations

This is the question everyone asks first: “You’re putting AI agents in my cluster with access to my repos?”

Here’s how Kelos addresses the concern:

Host Isolation

Every Task runs in its own ephemeral Pod. The agent has no access to:

  • The host node
  • Other Pods
  • The Kubernetes API (unless you explicitly grant it)
  • Your local machine

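It runs inside a container sandbox. Isolation from other Pods at the network level also depends on the NetworkPolicies in your cluster, since Kubernetes allows Pod-to-Pod traffic by default.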
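
One way to enforce the "no access to other Pods" boundary with standard Kubernetes tooling is a NetworkPolicy. The sketch below is not part of Kelos. It assumes agent Pods run in a dedicated namespace (kelos-agents is a hypothetical name), that cluster-internal addresses fall in the RFC 1918 ranges, and that DNS is served from kube-system. It blocks all inbound traffic and limits outbound traffic to DNS and public HTTPS endpoints such as GitHub and the model API.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: agent-isolation          # hypothetical name
  namespace: kelos-agents        # hypothetical namespace for agent Pods
spec:
  podSelector: {}                # applies to every Pod in the namespace
  policyTypes:
    - Ingress                    # no ingress rules listed, so all inbound traffic is denied
    - Egress
  egress:
    # Allow DNS lookups against the cluster DNS in kube-system
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
    # Allow HTTPS to public addresses only (assumes cluster CIDRs are RFC 1918)
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
            except:
              - 10.0.0.0/8
              - 172.16.0.0/12
              - 192.168.0.0/16
      ports:
        - protocol: TCP
          port: 443
```

The policy only takes effect if the cluster's CNI plugin enforces NetworkPolicy.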

Scoped Credentials

You control exactly what the agent can access:

  • Fine-grained GitHub PATs with limited repo scope
  • Branch protection rules that prevent direct main branch commits
  • Read-only access by default; explicitly enable write access per workspace

No Persistent Credentials in Pods

Kelos injects credentials at runtime from Kubernetes Secrets. They’re never stored in container images or logs.

Resource Limits

Set maxConcurrency, maxTotalTasks, and activeDeadlineSeconds to prevent runaway agent behavior:

spec:
  maxConcurrency: 5
  maxTotalTasks: 100
  activeDeadlineSeconds: 3600  # Kill tasks running over 1 hour
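
Kubernetes itself offers a complementary guard at the namespace level. A minimal sketch, assuming agent Pods run in a dedicated namespace (kelos-agents is a hypothetical name, not something Kelos is known to create): a ResourceQuota that caps how many Pods can exist there at once, independent of the limits above.

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: agent-pod-cap            # hypothetical name
  namespace: kelos-agents        # hypothetical namespace for agent Pods
spec:
  hard:
    pods: "5"                    # at most 5 non-terminal Pods in this namespace
```

Only the Pod count is capped here. Adding CPU or memory quotas would also require every Pod in the namespace to declare matching requests or limits.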

Audit Trail

Every Task is a Kubernetes resource with full history. You can see:

  • Who triggered it (GitHub issue, cron, manual)
  • What command ran
  • What outputs were generated
  • Timing and duration

This isn’t security through obscurity—it’s security through architecture.

The Meta-Example: Kelos Developing Kelos

The most compelling demo of Kelos’s capabilities is Kelos itself.

The Kelos team runs seven TaskSpawners 24/7 that autonomously develop the project:

TaskSpawner | Trigger | Model | Description
kelos-workers | GitHub Issues (actor/kelos) | Opus | Picks up issues, creates PRs, self-reviews
kelos-pr-responder | PR review feedback | Opus | Re-engages on changes_requested
kelos-planner | /kelos plan comment | Opus | Investigates issues, posts implementation plans
kelos-triage | needs-triage label | Opus | Classifies issues by kind/priority
kelos-fake-user | Cron (daily 09:00 UTC) | Sonnet | Tests DX as a new user, files issues for problems
kelos-fake-strategist | Cron (every 12h) | Opus | Explores new use cases and improvements
kelos-self-update | Cron (daily 06:00 UTC) | Opus | Reviews and tunes prompts/configs

Seven autonomous agents, each handling a different part of the development lifecycle. They triage issues, write code, respond to review feedback, test the developer experience, and even improve their own prompts.

This is a working example of the autonomous development loop. Not theoretical—already running in production.

What Makes This Different

Compare Kelos to alternatives:

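The comparison covers Kelos, GitHub Actions + AI, and Agent Pipelines across these features (entries marked Partial or Limited apply to one of the alternatives):

  • First-class K8s resources
  • Declarative YAML workflows (Partial)
  • Task chaining with dependencies
  • Event-driven (Issues, PR, Cron) (Partial)
  • Ephemeral isolated pods
  • Observable via kubectl (Partial)
  • Multi-agent support (Limited)
  • Self-development loop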

The key insight: Kelos applies infrastructure-as-code principles to AI workflows. Your agent automation becomes declarative, version-controlled, and GitOps-ready—exactly like your Kubernetes configs.

Getting Started

Ready to try it? Here’s your path forward:

  1. Spin up a local cluster: kind create cluster gives you a single-node Kubernetes to experiment with

  2. Install Kelos: run kelos install, or use the Helm chart from the installation section above

  3. Start small: Create a Workspace and AgentConfig, then run a manual Task to see how it works

  4. Add automation: Build a TaskSpawner that watches GitHub issues

  5. Scale up: Add more TaskSpawners, chain agents together, enable self-development
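
For step 3, a manual Task can be as small as the sketch below. It reuses the my-app Workspace and claude-code-fix AgentConfig from the practical example. The filename, Task name, and command are hypothetical.

```yaml
# manual-task.yaml (hypothetical filename)
apiVersion: kelos.dev/v1alpha1
kind: Task
metadata:
  name: first-manual-task        # hypothetical name
spec:
  agentConfigRef:
    name: claude-code-fix        # AgentConfig from the practical example
  workspaceRef:
    name: my-app                 # Workspace from the practical example
  command: "Add a CONTRIBUTING.md that explains how to run the test suite"
```

After kubectl apply -f manual-task.yaml, the run should appear in kubectl get tasks, and kelos logs first-manual-task -f streams its output.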

The framework is usable today (v0.31.0, with 27 releases), though the project acknowledges it’s early stage and its API is still v1alpha1. Expect rough edges, but the core concept works.

The future of development isn’t AI replacing humans. It’s AI as infrastructure—autonomous, orchestrated, and managed like the reliable systems we already run on Kubernetes.

