PROMETHEUS STORAGE SCALING 2026: THANOS VS CORTEX VS MIMIR — REAL COSTS
If you’re running Prometheus beyond a handful of clusters, you’ve hit the wall. The built-in TSDB tops out around 10 million active series before things get uncomfortable, and local storage means your retention window is whatever disk you can afford. The moment you need months of historical data or a global query view across regions, you’re shopping for an external storage layer.
That’s where Thanos, Cortex, and Mimir come in. All three extend Prometheus with long-term retention, horizontal scaling, and high availability. But they take fundamentally different approaches, and the cost difference between them is not academic: in the 100-service scenario modeled below, it’s the difference between a roughly $21,000 annual bill and a $47,000 one.
I’ve spent time deploying and operating all three, and the gap between marketing claims and production reality is wide enough to drive a truck through. Let me walk you through what each one actually costs, where the hidden expenses live, and how to pick the right one without learning it the hard way.
Who Is This Guide For?
This is for platform engineers and SREs who are scaling Prometheus beyond a single cluster and need to make a storage decision with real numbers. You’re probably evaluating Thanos because it’s the best known, or Mimir because Grafana is pushing it hard, or you inherited a Cortex deployment and wonder if it’s time to migrate.
By the End of This, You’ll Know
You’ll understand the real cost structure of each solution — not just infrastructure bills but the engineering time, compaction overhead, and operational tax that turn “free open source” into expensive overhead. You’ll have a decision framework you can apply to your own workload, and you’ll know which pitfalls to avoid before they cost you a weekend.
The Architecture Tax
The first thing to understand is that these three solutions don’t just differ in features. They differ in how many moving parts you’re signing up to operate at 2 a.m.
Thanos extends your existing Prometheus deployment with a sidecar pattern. You keep your Prometheus instances exactly as they are, add a Thanos sidecar container that uploads TSDB blocks to object storage, and deploy a handful of additional components — a Querier for unified PromQL access, a Store Gateway for historical data, and a Compactor for downsampling and retention. The sidecar adds roughly 10 percent CPU overhead to each Prometheus instance, but you don’t need to rewrite your scrape configs or change your collection model. It’s the gentlest migration path available.
Cortex takes the opposite approach. Instead of extending Prometheus, it replaces the storage layer entirely. Prometheus remote-writes metrics into Cortex, which ingests them through a distributed microservice mesh — distributors route data, ingesters buffer and store it, queriers execute PromQL, and a compactor merges blocks in object storage. The appeal is that you no longer need Prometheus servers serving recent data; Cortex handles everything. The catch is that you’ve just adopted a distributed system with seven component types, each with its own scaling profile and failure modes.
Mimir is Cortex’s successor. Grafana forked Cortex under the AGPLv3 license and has been iterating aggressively ever since. The latest Mimir 3.0 release, from November 2025, introduced a decoupled architecture that uses Apache Kafka as an asynchronous buffer between ingestion and query paths. Before this change, the ingester handled both reads and writes, meaning heavy query loads could starve ingestion. The Kafka layer lets each path scale independently. If you’re starting fresh in 2026 and the Cortex architecture appeals to you, Mimir is the one to deploy.
VictoriaMetrics deserves a mention here even though it’s not part of the original question. It’s a from-scratch time series database that accepts Prometheus remote-write, uses its own columnar storage format, and achieves compression ratios that make the block-based solutions look wasteful. In benchmarks, it uses five times less memory and 1.7 times less CPU than Mimir for the same workload. The trade-off is that it stores data on block storage rather than object storage, which changes your cost model and your disaster recovery strategy.
Current Versions and Ecosystem Health
Before we talk money, let’s establish where each project stands today.
Thanos is at version 0.40.1 as of late 2025, releasing on a six-week cadence. It’s a CNCF Incubating project with an active maintainer team and a large contributor base. The project is healthy and shows no signs of slowing down.
Grafana Mimir’s latest stable line is 3.0.x, with the 3.0 release in November 2025 introducing the Kafka-based decoupled architecture. The 2.17.x LTS line continues to receive bug fixes. Mimir is backed by Grafana Labs and powers Grafana Cloud Metrics, so it has institutional support that most open source projects can only dream about.
Cortex is on version 1.18.x but development has effectively stalled. The Cortex maintainers themselves have migrated to Mimir. The project is in maintenance mode, and I would not recommend it for any new deployment. If you’re running Cortex today, plan a migration to Mimir — Grafana provides a migration path that lets Thanos query Mimir during the transition, so you can move incrementally.
VictoriaMetrics is at version 1.136.x as of March 2026, with an LTS line at 1.136.x supported for 12 months. The project is commercially backed and has a growing enterprise customer base.
Retention and Compaction: Where Costs Actually Live
This is the part most articles gloss over, and it’s where budgets blow up.
Prometheus stores data in TSDB blocks — two-hour chunks of time-series data. Left alone, Prometheus keeps these blocks locally until its retention window expires, then deletes them. When you add Thanos, Cortex, or Mimir, you’re adding a compaction layer that does three things: it merges small blocks into larger ones for query efficiency, it creates downsampled copies of your data at lower resolutions, and it enforces retention policies by deleting old blocks.
The compactor is where things get interesting — and expensive.
Thanos Compaction Model
Thanos compacts data in object storage. It reads blocks from S3 or GCS, merges them, writes the result back, and optionally creates downsampled versions at 5-minute and 1-hour resolutions. The compactor needs local scratch space — about 100 GB is recommended — because it downloads blocks, processes them, and re-uploads. If your compactor runs out of scratch space, it crash-loops, and the first thing to try is deleting the local data directory.
Here’s the retention configuration that most teams use:
# Thanos compactor retention configuration
args:
- compact
- --objstore.config-file=/etc/thanos/objstore.yml
- --retention.resolution-raw=30d
- --retention.resolution-5m=90d
- --retention.resolution-1h=365d
- --downsampling.disable=false
- --wait
This keeps raw data (30-second samples, assuming a 30-second scrape interval) for 30 days, 5-minute downsampled data for 90 days, and 1-hour downsampled data for a year. After 30 days, raw data is deleted and only the 5-minute and 1-hour resolutions remain. After 90 days, only the 1-hour resolution remains.
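To make the tiering concrete, here is a small sketch of which resolutions still hold data of a given age under the flags above. It is a simplification: the function name is hypothetical, and it ignores the delay before the compactor has actually produced the downsampled blocks.

```python
# Retention per resolution in days, mirroring the compactor flags above.
RETENTION_DAYS = {"raw": 30, "5m": 90, "1h": 365}


def available_resolutions(age_days, retention=RETENTION_DAYS):
    """Return the resolutions that still hold data of the given age.

    Assumes downsampled blocks already exist for that age.
    """
    return [res for res, days in retention.items() if age_days <= days]


for age in (7, 45, 200, 400):
    print(f"{age:>3} days old -> {available_resolutions(age)}")
```

A query over 45-day-old data can only be served from the 5-minute and 1-hour blocks, and anything older than a year is gone entirely.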
There’s a critical trap here that’s cost people their data. The Thanos documentation recommends aligning all three retention durations, but it presents this as a recommendation rather than a requirement. If your raw retention is set to one year but your 5-minute downsampling retention is set to six months, the compactor may delete raw blocks before a downsampled version has been successfully created. The downsampling process can “succeed” with warnings — logging empty chunks happened, skip series — and produce no output. Your raw data is gone, and the downsampled copy was never created.
Always set retention.resolution-raw, retention.resolution-5m, and retention.resolution-1h to the same value, or ensure that each level’s retention is strictly longer than the previous. Treat this as a hard requirement, not a suggestion.
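One way to enforce that rule is a pre-deploy check in CI. The sketch below is a hypothetical helper, not part of Thanos: it only understands durations written in days, weeks, or years, and it assumes a year is 365 days.

```python
import re

# Only a small subset of duration units is handled here (an assumption).
_UNITS_IN_DAYS = {"d": 1, "w": 7, "y": 365}


def to_days(duration):
    """Convert a duration such as '30d', '2w' or '1y' to days."""
    match = re.fullmatch(r"(\d+)([dwy])", duration)
    if match is None:
        raise ValueError(f"unsupported duration: {duration!r}")
    return int(match.group(1)) * _UNITS_IN_DAYS[match.group(2)]


def retention_is_safe(raw, five_min, one_hour):
    """True when each coarser resolution is kept at least as long as the finer one."""
    days = [to_days(raw), to_days(five_min), to_days(one_hour)]
    return days == sorted(days)


print(retention_is_safe("30d", "90d", "365d"))   # tiered, each level longer
print(retention_is_safe("365d", "180d", "365d"))  # 5m expires before raw
```

Failing the pipeline when this returns False is cheaper than discovering the gap after the compactor has already deleted blocks.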
Mimir Compaction Model
Mimir’s compactor works similarly but operates on a per-tenant basis. It merges blocks, keeps the bucket index updated for queriers and store-gateways, and enforces retention. The retention is configured differently:
# Mimir retention configuration
limits:
  compactor_blocks_retention_period: 1y
By default, Mimir never deletes data from object storage. You must explicitly configure the retention period, or your storage costs grow without bound. Mimir stores approximately 1.3 bytes per sample after compression, against 16 bytes for an uncompressed sample, which is competitive with Thanos but not as aggressive as VictoriaMetrics.
Mimir’s compactor also runs compaction jobs at multiple intervals: 2 hours, 12 hours, and 24 hours. This means you need to run benchmarks for multiple days to see the full compression picture, since not all compaction cycles complete in a short test window.
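The bytes-per-sample figure turns into a daily storage estimate with a little arithmetic. The sketch below assumes 500k active series and a 30-second scrape interval, both illustrative inputs, and it ignores index and metadata overhead.

```python
SECONDS_PER_DAY = 86_400


def daily_storage_gb(active_series, scrape_interval_s, bytes_per_sample):
    """Estimate sample bytes written per day, in decimal GB (index overhead ignored)."""
    samples_per_day = active_series * (SECONDS_PER_DAY / scrape_interval_s)
    return samples_per_day * bytes_per_sample / 1e9


compressed = daily_storage_gb(500_000, 30, 1.3)
uncompressed = daily_storage_gb(500_000, 30, 16)
print(f"compressed: {compressed:.2f} GB/day, uncompressed: {uncompressed:.2f} GB/day")
```

Plug in your own series count and scrape interval before trusting any per-GB budget line, since both inputs move the result linearly.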
VictoriaMetrics Compression
VictoriaMetrics takes a different approach entirely. Instead of block-based compaction, it uses its own columnar format with aggressive compression that achieves up to 10x better storage efficiency than vanilla Prometheus. It doesn’t downsample in the traditional Thanos sense; it stores all data at full resolution and relies on compression to keep costs manageable. For teams that need full-resolution historical data without the complexity of managing multiple retention tiers, this is a significant advantage.
The Real Cost Numbers
Let’s put actual numbers on these solutions. I’ll model two scenarios: a modest 100-service deployment ingesting roughly 100 GB of metrics per day, and a larger deployment at 500 services and 500 GB per day. Each includes a managed-service comparison.
Scenario 1: 100 Services, 100 GB/Day Ingestion
Thanos (self-hosted):
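Object Storage (S3): 100 GB/day × 365 × $0.023/GB = $840/year (simplified: $0.023/GB is the S3 Standard monthly rate, so data retained for several months costs proportionally more)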
Compute (3x m6i.xlarge for compactor/store-gateway/querier): $0.192/hr × 8,760 hrs × 3 = $5,046/year
Prometheus servers (existing): already budgeted
Networking (data transfer out of S3 for queries): ~$500/year
Engineering time (0.1 FTE for operations): $15,000/year
Total: ~$21,400/year
The compute cost is lower than you might expect because Thanos reuses your existing Prometheus instances for real-time data. You only need additional compute for the query layer and compactor. The engineering time is the real cost — someone needs to manage compactor health, handle block upload failures, and troubleshoot query performance when object storage latency spikes.
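If you want to rerun this breakdown with your own inputs, a sketch like the following reproduces the Thanos line items. The $150,000 fully loaded FTE cost is inferred from the 0.1 FTE = $15,000 line, and the storage line copies the article’s simplification of charging each ingested GB once.

```python
HOURS_PER_YEAR = 8_760
FTE_ANNUAL_COST = 150_000  # inferred from "0.1 FTE = $15,000"

line_items = {
    # Same simplification as the breakdown above: each ingested GB charged $0.023 once.
    "object_storage": 100 * 365 * 0.023,
    "compute": 0.192 * HOURS_PER_YEAR * 3,  # 3x m6i.xlarge, on-demand
    "networking": 500,
    "engineering": 0.1 * FTE_ANNUAL_COST,
}

total = sum(line_items.values())
engineering_share = line_items["engineering"] / total

for name, cost in line_items.items():
    print(f"{name:>15}: ${cost:>10,.2f}")
print(f"{'total':>15}: ${total:>10,.2f}")
print(f"engineering share of total: {engineering_share:.0%}")
```

Engineering time comes out at roughly 70 percent of the total, which is why the FTE estimate deserves more scrutiny than the instance pricing.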
Mimir (self-hosted):
Object Storage (S3): similar to Thanos = $840/year
Compute (5 ingesters + 3 distributors + 2 queriers + compactor + caches): ~15 vCPUs, 60 GB RAM across multiple pods = $8,000-10,000/year
Kafka (Mimir 3.0 requirement): 3 brokers, minimal = $2,000/year
Engineering time (0.15 FTE for microservice operations): $22,500/year
Total: ~$33,000-35,000/year
Mimir costs more in compute because of its microservice architecture. The Kafka layer in Mimir 3.0 adds another component to operate. The engineering overhead is higher because you’re managing more moving parts, but the payoff is better multi-tenancy, query sharding, and the ability to scale ingestion and query paths independently.
Cortex (self-hosted):
Object Storage (S3): $840/year
Compute (5+ microservices, similar to Mimir pre-Kafka): $7,000-9,000/year
Engineering time (0.25 FTE — Cortex is harder to operate): $37,500/year
Total: ~$45,000-47,000/year
Cortex is the most expensive option when you factor in engineering time. The microservice mesh is harder to tune than Mimir, the documentation is thinner, and the project is no longer actively developed. There’s no reason to choose Cortex over Mimir in 2026.
Grafana Cloud Mimir (managed):
Platform fee: $19/month × 12 = $228/year
Metrics: $6.50 per 1k active series/month
Estimate: 500k active series = $3,250/month = $39,000/year
Retention: 13 months included in Pro tier
Total: ~$39,200/year
Grafana Cloud’s pricing has shifted to a per-active-series model. At 500k active series, you’re looking at roughly $39,000 per year on the Pro tier. The Enterprise tier starts at $25,000/year minimum commit and offers volume discounts that can bring the per-series cost down to $3 per 1k series for large deployments.
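Per-series pricing is easy to model, which makes it simple to see how sensitive the bill is to your active series count. This sketch uses the Pro tier figures quoted above as defaults; the function name is hypothetical, and it does not model included usage allowances or negotiated discounts.

```python
def grafana_cloud_annual_cost(active_series, price_per_1k=6.50, platform_fee_monthly=19):
    """Annual cost in dollars under a flat per-active-series pricing model."""
    monthly_metrics = active_series / 1_000 * price_per_1k
    return 12 * (monthly_metrics + platform_fee_monthly)


for series in (250_000, 500_000, 1_000_000):
    print(f"{series:>9,} series -> ${grafana_cloud_annual_cost(series):,.0f}/year")
```

Because the cost is linear in active series, trimming cardinality before you sign up pays back immediately in a way it does not with self-hosted compute.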
Scenario 2: 500 Services, 500 GB/Day Ingestion
At this scale, the differences become dramatic.
Thanos: Object storage jumps to $4,200/year. Compute scales to about $12,000/year for the query layer. Engineering time increases to 0.2 FTE ($30,000/year) because compaction takes longer, block counts grow, and query performance tuning becomes a regular task. Total: approximately $46,000/year.
Mimir (self-hosted): Object storage is the same $4,200/year. Compute scales to $20,000-25,000/year because you need more ingesters and distributors to handle the ingestion rate. Kafka costs rise to $4,000/year. Engineering time is 0.2 FTE ($30,000/year). Total: approximately $58,000-63,000/year.
Grafana Cloud Mimir: At 2.5 million active series, you’re in Enterprise territory. With volume discounts bringing the per-series cost to around $4 per 1k series, you’re looking at $10,000/month for metrics alone, or $120,000/year. The $25,000 minimum commit is a spending floor rather than an extra fee, and this usage clears it easily. Total: approximately $120,000/year.
VictoriaMetrics (self-hosted): This is where VictoriaMetrics shines. With 10x compression, your storage costs are a fraction of the block-based solutions. For 500 GB/day ingestion, you’re looking at roughly $1,500/year in storage (HDD-based, which VictoriaMetrics is optimized for), $8,000-10,000/year in compute, and 0.1 FTE engineering time ($15,000/year) because the single-binary or three-component architecture is simpler to operate. Total: approximately $25,000-27,000/year.
Query Performance at Scale
Cost is only half the equation. If your queries take ten seconds to return, nobody uses the dashboards, and you’ve wasted the money.
For recent data queries under two hours, Thanos, Mimir, and VictoriaMetrics all perform adequately. VictoriaMetrics leads with 20-50 millisecond median latency. Mimir sits at 30-80 milliseconds with proper caching. Thanos ranges from 50-100 milliseconds for sidecar queries.
For historical data queries beyond 24 hours, the differences widen. VictoriaMetrics maintains 100-500 millisecond latency thanks to its efficient compression and block storage design. Mimir achieves 150 milliseconds to 1 second with store-gateway caching. Thanos ranges from 200 milliseconds to 2 seconds, heavily dependent on object storage latency — if your S3 bucket is in a different region than your querier, expect the upper end of that range.
The 99th percentile tells a more interesting story. In benchmarks running 5.5 million active series with 360k samples per second ingestion, Mimir’s 99th percentile latency hit 47 seconds at maximum, while VictoriaMetrics peaked at 20 seconds. Mimir’s latency spikes correlate with ingester flush cycles — every two hours, ingesters write their in-memory TSDB blocks to disk and upload to object storage, creating a periodic performance dip.
The Hidden Costs Nobody Talks About
Cardinality Explosions
The fastest way to blow your Prometheus storage budget is an unbounded label value. If you add pod_name or request_id as a label, your active series count grows with every pod restart and every request. Thanos and Mimir both store every unique series in object storage, so cardinality explosions translate directly into storage costs.
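To get a feel for how quickly this compounds, here is a rough sketch. The label names and value counts are made up for illustration, and the product is an upper bound, since not every label combination occurs in practice.

```python
from math import prod


def max_series(label_cardinalities):
    """Upper bound on series for one metric: product of distinct values per label."""
    return prod(label_cardinalities.values())


bounded = {"method": 5, "status": 6, "service": 100}
unbounded = {**bounded, "request_id": 10_000}

print(f"bounded labels:  {max_series(bounded):,} series")
print(f"with request_id: {max_series(unbounded):,} series")
```

One extra label with ten thousand values multiplies the ceiling by ten thousand, and that is for a single metric name.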
The fix is relabeling at the Prometheus level, before data ever reaches your storage layer:
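# Drop high-cardinality labels before they hit storage.
# These labels arrive on scraped samples, so use metric_relabel_configs;
# relabel_configs only rewrites target labels before the scrape.
metric_relabel_configs:
  - action: labeldrop
    regex: "pod_name|request_id|instance_id"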
Mimir offers per-tenant rate limiting and series quotas, which can prevent a single noisy tenant from consuming all your storage. Thanos relies on careful configuration of your Prometheus external labels and query-time filtering — it’s more manual but gives you finer control.
Compactor Failures
The Thanos compactor is a single point of failure. Only one compactor should run at a time, and if it crashes mid-compaction, you need to clean up its scratch directory before restarting. At scale, compaction windows grow — merging terabytes of blocks isn’t instant — and the compactor can fall behind, causing query performance to degrade as the querier has to scan more, smaller blocks.
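Mimir’s compactor is stateless and can run as multiple instances that shard the work between them, which makes recovery faster and avoids the single-instance bottleneck. The trade-off is that compaction runs per tenant, so a deployment with many tenants has more compaction jobs to schedule and takes longer overall.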
Data Egress Costs
This one catches teams off guard. If your Prometheus instances are in us-east-1 but your S3 bucket for Thanos blocks is in us-west-2, you’re paying data transfer costs on every block upload. At the $0.09/GB internet egress rate and 100 GB/day, that’s $0.09 × 100 × 365 = $3,285/year in transfer fees alone. AWS bills region-to-region traffic at a lower rate (about $0.02/GB between most US regions), but that still comes to roughly $730/year for nothing. Always colocate your object storage with your Prometheus instances.
Similarly, when queriers read historical data from object storage, they pull data across the network. If your Grafana instance is in a different region than your S3 bucket, every dashboard refresh incurs egress charges. Keep your entire observability stack in the same region.
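The transfer arithmetic generalizes to a one-line function. The rates below are inputs you should replace with current pricing for your provider and region pair; the $0.02/GB inter-region figure in particular is an assumption for illustration.

```python
def annual_transfer_cost(gb_per_day, rate_per_gb):
    """Yearly data transfer cost in dollars for a steady daily volume."""
    return gb_per_day * 365 * rate_per_gb


scenarios = {
    "same region": 0.00,
    "inter-region (assumed $0.02/GB)": 0.02,
    "internet egress ($0.09/GB)": 0.09,
}
for label, rate in scenarios.items():
    print(f"{label:<32} ${annual_transfer_cost(100, rate):,.0f}/year")
```

Run it against your query read volume as well as your upload volume, since store gateways pulling blocks across regions are billed the same way.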
Decision Framework
Here’s how I’d approach the decision in 2026.
Choose Thanos if you already have a fleet of Prometheus instances and want to add long-term storage with minimal disruption. The sidecar pattern means you don’t need to change your collection model, and the operational complexity is manageable for teams with Prometheus experience. It’s the cheapest self-hosted option if you’re willing to invest the engineering time.
Choose Mimir if you need multi-tenancy, strict tenant isolation, or are already in the Grafana ecosystem. The Kafka-based architecture in Mimir 3.0 is genuinely better than Cortex ever was, and the Grafana Cloud managed option is compelling if you’d rather pay than operate. It costs more than Thanos but delivers better operational characteristics at scale.
Choose VictoriaMetrics if cost efficiency and operational simplicity are your top priorities. The compression ratios are real, the resource usage is lower, and the architecture is simpler. The trade-off is that you’re storing data on block storage rather than object storage, which affects your disaster recovery strategy and means you can’t use the same cheap S3 Glacier archival tiers.
Don’t choose Cortex. It’s been superseded by Mimir, and there’s no advantage to starting with a project that’s no longer actively developed.
Migration Paths
If you’re moving from Cortex to Mimir, Grafana provides a migration guide that lets you run Thanos as a query layer over both Cortex and Mimir during the transition. Point your Prometheus remote-write to Mimir, keep Thanos querying both backends, and decommission Cortex once you’ve validated that Mimir has all your historical data.
If you’re moving from Thanos to Mimir, you can use the Thanos sidecar to feed Mimir while keeping your existing Thanos querier for historical data. This gives you a gradual migration path without a hard cutover.
If you’re moving from any of these to VictoriaMetrics, you’ll need to reconfigure your Prometheus instances to remote-write to VictoriaMetrics instead of (or in addition to) your current storage. VictoriaMetrics accepts the standard Prometheus remote_write protocol, so the configuration change is minimal.
Validation and Success Criteria
However you decide, validate your choice with these measurable criteria:
Your ingestion pipeline should handle peak load without dropping samples. Test with at least 2x your expected peak to build in headroom. Monitor the prometheus_remote_storage_samples_failed_total metric (named prometheus_remote_storage_failed_samples_total in older Prometheus releases). It should stay at zero.
Query latency for your most common dashboards should stay under 500 milliseconds at the 95th percentile. If historical queries regularly exceed one second, your compaction strategy or storage tier needs adjustment.
Storage costs should be predictable and grow linearly with ingestion volume, not exponentially. If your monthly storage bill is growing faster than your metric count, you have a cardinality problem or a compaction backlog.
Compactor lag should stay under one hour. If the compactor is consistently behind, you need more compute resources or a more aggressive compaction concurrency setting.
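These thresholds are simple enough to encode as an automated check after a load test. The sketch below is one possible shape, with hypothetical measurement names; how you collect the values (PromQL queries, a load-test report) is up to you.

```python
# Upper limits taken from the criteria above; the key names are hypothetical.
THRESHOLDS = {
    "failed_samples": 0,         # remote-write failures must stay at zero
    "p95_query_latency_s": 0.5,  # 500 milliseconds
    "compactor_lag_s": 3600,     # one hour
}


def failed_criteria(measurements, thresholds=THRESHOLDS):
    """Return the names of criteria whose measured value exceeds its limit."""
    return [name for name, limit in thresholds.items() if measurements[name] > limit]


healthy = {"failed_samples": 0, "p95_query_latency_s": 0.42, "compactor_lag_s": 900}
degraded = {"failed_samples": 0, "p95_query_latency_s": 1.2, "compactor_lag_s": 7200}

print(failed_criteria(healthy))
print(failed_criteria(degraded))
```

An empty list means the candidate passed; anything else names the criteria to investigate before you commit to that backend.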
The right choice depends on your team’s expertise, your compliance requirements, and your growth trajectory. All three solutions have proven themselves in production. The key is matching the tool to your actual constraints, not your aspirational architecture.
Further Reading
- Thanos Documentation
- Grafana Mimir Documentation
- VictoriaMetrics Documentation
- Prometheus Remote Write Specification
- Grafana Cloud Pricing