If you recommend private cloud, you should live on private cloud yourself
CCsolutions advises companies evaluating a migration from public cloud to private infrastructure. Yet for over a year our own monitoring stack ran on DigitalOcean. The inconsistency was obvious: we preached the benefits of private cloud while paying a premium for shared virtual machines.
This article documents exactly how we moved Grafana, Loki, and Mimir from DigitalOcean to Hetzner Baremetal with Syself-managed Kubernetes. No marketing filter. All the numbers.
The problem with the old setup
Our DigitalOcean stack consisted of 6 virtualized nodes, each with 8 vCPU and 32 GB of RAM. Monthly cost: around 2,000 EUR. On paper, enough for a small team with moderate log and metric volume. In practice, it produced a series of operational problems.
First headache: OOM errors on Loki and Mimir every few days. Compaction of historical Mimir blocks consumed more than 25 GB of RAM at peak, enough to trigger the OOM killer. Every restart meant a temporary gap in monitoring data, exactly when you need it most.
Second: query latency. Grafana dashboards with 30 days of data loaded in 8 to 12 seconds. With 90 days, 30 seconds or more. The reason is simple. Block storage on shared VMs does not deliver the I/O throughput Loki needs for index scanning.
Third, and most relevant for our strategy: zero control over the underlying hardware. On virtual machines you do not know who you share the hypervisor with. Your noisy neighbor can consume disk bandwidth at the worst moment. For a critical monitoring system, that unpredictability is unacceptable.
Why Hetzner Baremetal with Syself
We evaluated four options. Upgrade the DigitalOcean plan. Migrate to AWS or GCP with dedicated instances. Rent baremetal at OVH or Hetzner. Build our own on-premises cluster.
We dropped the first because it scaled the economic problem without solving the root cause. AWS and GCP were ruled out on cost: a dedicated EC2 instance with comparable specs runs at roughly double the price. On-premises requires datacenter staff we do not have.
Hetzner Baremetal won through a rare combination: low price, modern hardware, European datacenters with low latency toward DACH and LATAM, and acceptable APIs for automation.
The remaining problem was the Kubernetes control plane. Manual bootstrap with kubeadm is feasible, but control plane upgrades and daily operations eat team time. This is where Syself came in: a provider that manages the Kubernetes control plane on third-party hardware. You pay for control plane management and keep full control over workers and data.
Hardware and final architecture
Four baremetal nodes at Hetzner. Specs per node:
- AMD Ryzen, 16 physical cores
- 64 GB DDR4 ECC RAM
- 1 TB NVMe SSD
Total cluster capacity: 64 physical cores, 256 GB RAM, 4 TB local NVMe storage.
Syself manages the Kubernetes control plane on its own infrastructure. The 4 nodes are workers where the entire stack runs: Grafana for visualization, Loki for log aggregation, Mimir for long-term metrics, and the agents for both (Promtail, Grafana Agent).
Storage is the fundamental change. Previously we depended on DigitalOcean block volumes. Now we have local NVMe on every worker. For Loki and Mimir, which are I/O-sensitive, the effect is direct. Queries that previously took seconds now run in milliseconds.
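To bind a Loki or Mimir ingester to a worker's NVMe, Kubernetes local PersistentVolumes are one option. A minimal sketch follows; the name, mount path, and hostname are hypothetical, not our actual configuration:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: mimir-ingester-nvme-0      # hypothetical name
spec:
  capacity:
    storage: 500Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-nvme
  local:
    path: /mnt/nvme/mimir          # hypothetical mount point on the worker
  nodeAffinity:                    # local volumes must be pinned to a node
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - worker-0         # hypothetical node name
```

The trade-off of local volumes: a pod bound to this volume can only be scheduled on that node, so node failure means re-ingesting from object storage rather than reattaching a network volume.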
The migration process
The migration took three weeks, without monitoring downtime. The key steps:
Week 1: Provisioning and validation. We provisioned the 4 nodes at Hetzner, connected them to the Syself control plane, configured networking and storage. We validated that the cluster passed basic Kubernetes tests (e2e tests) before touching productive workloads.
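One common way to run the Kubernetes e2e smoke tests mentioned above is Sonobuoy. A sketch of a quick validation pass, assuming a working kubeconfig against the new cluster:

```shell
# Run a quick smoke subset of the Kubernetes e2e conformance suite
sonobuoy run --mode quick --wait

# Inspect pass/fail counts and fetch the result tarball
sonobuoy status
sonobuoy retrieve ./results

# Clean up the test namespace afterwards
sonobuoy delete --wait
```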
Week 2: Data replication. We configured Mimir and Loki in the new cluster pointing to the same S3 bucket as the old cluster. Both clusters wrote to the same data source for several days to guarantee consistency. Grafana was configured with parallel data sources.
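The dual-write setup boils down to both clusters sharing the same object storage configuration. An illustrative Mimir fragment is below; the endpoint and bucket name are placeholders, and Loki is configured analogously (its S3 storage block lives under `common.storage.s3` with a `bucketnames` key):

```yaml
# Mimir (illustrative fragment): long-term blocks go to the shared bucket,
# so old and new cluster see the same data during the overlap window.
blocks_storage:
  backend: s3
  s3:
    endpoint: s3.eu-central-1.amazonaws.com   # placeholder endpoint
    bucket_name: monitoring-blocks            # placeholder bucket
```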
Week 3: Traffic switch. We pointed the agents in productive environments at the new cluster. We verified that dashboards worked identically. We kept the old cluster for 5 additional days as a safety net. After that, termination.
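On the agent side, the switch is a one-line change per agent. For Promtail, for example, the push endpoint lives in the `clients` list; the hostnames below are hypothetical:

```yaml
# Promtail (illustrative fragment): repoint the push endpoint at the new cluster
clients:
  - url: http://loki.new-cluster.internal/loki/api/v1/push
    # During the overlap window you can list both endpoints and ship to each:
    # - url: http://loki.old-cluster.internal/loki/api/v1/push
```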
Zero data loss, zero maintenance windows, zero 3 AM phone calls.
Results with real numbers
After three months in productive operation:
| Metric | Before (DigitalOcean) | After (Hetzner) | Change |
|---|---|---|---|
| Monthly cost | ~2,000 EUR | ~1,000 to 1,100 EUR | 40 to 50% lower |
| Total CPU cores | 48 vCPU | 64 physical cores | 33% more |
| Total RAM | 192 GB | 256 GB | 33% more |
| OOM errors | Every 2 to 3 days | Zero in 90 days | Eliminated |
| Query latency, 30 days | 8 to 12 s | < 1 s | 10x faster |
| Query latency, 90 days | 30 s+ | 2 to 3 s | 10x faster |
The real saving is larger than the nominal 40 percent because we are not comparing equivalent capacity. We pay less for more resources. The difference between virtualized vCPU with shared tenancy and dedicated physical cores also matters. Mimir spikes no longer compete with unknown workloads.
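A back-of-envelope calculation makes the per-core point concrete, using the figures from the table above (taking 1,050 EUR as the midpoint of the reported range):

```python
# Cost per core before and after, from the figures in the table above.
before_eur, before_cores = 2000, 48    # DigitalOcean: shared vCPUs
after_eur, after_cores = 1050, 64      # Hetzner: dedicated physical cores

cost_before = before_eur / before_cores
cost_after = after_eur / after_cores

print(f"before: {cost_before:.2f} EUR per vCPU")    # ~41.67
print(f"after:  {cost_after:.2f} EUR per core")     # ~16.41
print(f"ratio:  {cost_before / cost_after:.1f}x cheaper per core")  # ~2.5x
```

And this still understates the gap, since a dedicated physical core delivers more than a shared vCPU.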
What we learned
Three takeaways we carry into client projects.
Cost savings are real, but operational consistency matters more. Monitoring is the first piece of infrastructure an SRE team needs to be stable. If your Grafana goes down during an incident, you have multiplied the problem. Zero OOM errors in 90 days is worth more than 40 percent savings.
I/O is the hidden bottleneck. In most stacks we evaluate, the bottleneck is not CPU or RAM but storage throughput. Local NVMe eliminates that problem. If your workloads are disk-sensitive (databases, log systems, metric systems), consider baremetal before VMs.
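If you want to check this on your own infrastructure, fio is the standard tool. A sketch of a random-read benchmark, which roughly matches the access pattern of Loki index scans; run it on both the block volume and the local NVMe and compare IOPS:

```shell
# 4k random reads, 4 workers, queue depth 32, bypassing the page cache
fio --name=randread --rw=randread --bs=4k --size=1G \
    --numjobs=4 --iodepth=32 --direct=1 \
    --runtime=60 --time_based --group_reporting
```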
Managed Kubernetes on your own hardware is an under-explored sweet spot. Most teams choose between fully managed (EKS, AKS, GKE) or fully self-hosted. The middle option, managed control plane with your own workers, combines the best of both. Operational simplicity without losing data sovereignty.
Conclusion
Migrating our own monitoring was not a client project. It was a validation of the thesis we sell. If we recommend private cloud to clients with cost-sensitive or compliance-sensitive workloads, we want to have made that journey ourselves first, with all the mistakes and lessons.
If you are evaluating a similar migration and want to discuss details that did not fit into a blog post (exact Mimir configuration, data transfer costs, provider options in DACH or LATAM), book a free 45-minute call. We share what we know.
Co-founder of CCsolutions. Over a decade building infrastructure for regulated industries (finance, healthcare, energy). Specialized in Kubernetes, FinOps, and private AI architectures. Writes about what works and what does not, with real numbers.