⚡ Optimizing Docker on ARM64: From Emulation Hell to Native Performance
How I discovered my portfolio was running through QEMU emulation on Oracle Cloud ARM64 and achieved 98% faster cold starts by switching to native builds.
🚨 The Discovery
After migrating from DigitalOcean to Oracle Cloud Infrastructure (OCI), I was excited about the powerful Ampere ARM64 processors. Everything seemed to work perfectly—my portfolio loaded fast, containers were healthy, and monitoring showed no issues. But then I decided to dig deeper into performance metrics, and what I found surprised me.
My Docker images were built for AMD64, running on ARM64 hardware through QEMU emulation.
While QEMU is impressive technology that makes cross-architecture execution possible, it's essentially a translator—converting x86 instructions to ARM instructions in real-time. This works, but at a cost. This post documents my journey to native ARM64 builds and the dramatic performance improvements I achieved.
If you're interested in my OCI migration journey, check out:
- 🚀 Methodical Migration: DigitalOcean Droplet to Oracle Cloud
- 🧱 Home Server Chronicles: My Docker-Powered Ecosystem — Part 1
🤔 How Did This Happen?
During my migration from DigitalOcean (AMD64) to Oracle Cloud (ARM64), I focused on getting everything working quickly. My docker-compose.yml had this line:
```yaml
services:
  portfolio:
    platform: linux/amd64 # ← Forcing AMD64 on ARM64 = QEMU emulation
    image: jayakrishnakonda/portfolio:latest
```
This configuration was intentional on DigitalOcean—the GitHub Actions runner builds AMD64 images by default. When I moved to OCI's ARM64 Ampere instances, I kept this config because "it worked." Docker pulled the AMD64 image and transparently used QEMU to make it run.
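Docker can only start an AMD64 image on an ARM64 host when the kernel's `binfmt_misc` mechanism hands x86-64 binaries to a QEMU user-mode emulator. Packages such as qemu-user-static, or the `tonistiigi/binfmt` image, register those handlers. The sketch below checks whether one is active. It assumes the usual `/proc/sys/fs/binfmt_misc` mount point and a handler named like `qemu-x86_64`, though names can vary by installer.

```shell
#!/bin/sh
# Sketch: report whether an x86-64 QEMU handler is registered with binfmt_misc.
# The directory defaults to the usual mount point; pass another path to override it.

has_x86_64_emulation() {
    dir="${1:-/proc/sys/fs/binfmt_misc}"
    # Handler names vary by installer; qemu-x86_64 is a common one.
    for entry in "$dir"/qemu-x86_64*; do
        # An unmatched glob stays literal, so check that the entry exists.
        [ -e "$entry" ] || continue
        # The first line of a handler entry reads "enabled" or "disabled".
        if head -n 1 "$entry" | grep -q '^enabled'; then
            return 0
        fi
    done
    return 1
}

if has_x86_64_emulation "$@"; then
    echo "x86-64 handler registered: AMD64 images run through QEMU"
else
    echo "no x86-64 handler: AMD64 images would typically fail with 'exec format error'"
fi
```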
The problem? Performance was being left on the table.
📊 Understanding the Infrastructure
OCI Migration Context
When I migrated to Oracle Cloud, I faced several challenges:
- Architecture mismatch: DigitalOcean used AMD64, OCI offered ARM64 Ampere
- Image compatibility: My GitHub Actions workflow only built AMD64 images
- Time pressure: I wanted everything up quickly during migration
- "It just works" syndrome: QEMU made everything appear fine
I documented the full migration in my DigitalOcean-to-OCI migration post, but one thing I didn't address right away was architecture optimization.
Disk Performance Baseline
Before tackling the CPU architecture, I benchmarked my storage:
Boot Drive (/dev/sda1):
- 📝 Write: 54.3 MB/s
- 📖 Read: 26.8 MB/s
Attached Block Storage (/dev/sdb - Docker location):
- 📝 Write: 74.3 MB/s (37% faster)
- 📖 Read: 37.4 MB/s (40% faster)
Good news: Docker runs on faster block storage, so I/O wasn't the bottleneck. The real issue was CPU architecture.
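The post doesn't say how these numbers were collected. One common way to get comparable sequential figures is a direct-I/O `dd` run on each mount. The sketch below assumes GNU `dd` (for `oflag=direct` and `iflag=direct`) and writes a 1 GiB scratch file in whatever directory you pass. The `pct_faster` helper reproduces the percentage comparison above.

```shell
#!/bin/sh
# Sketch: rough sequential write/read throughput with GNU dd, bypassing the page cache.
# Pass the directory to test (e.g. the Docker data mount); 1 GiB is written, then removed.

disk_bench() {
    target="$1/ddtest.$$"
    echo "write:"
    dd if=/dev/zero of="$target" bs=1M count=1024 oflag=direct 2>&1 | tail -n 1
    echo "read:"
    dd if="$target" of=/dev/null bs=1M iflag=direct 2>&1 | tail -n 1
    rm -f "$target"
}

# Percentage by which throughput "new" exceeds "old", rounded to a whole number.
pct_faster() {
    awk -v new="$1" -v old="$2" 'BEGIN { printf "%.0f\n", (new - old) / old * 100 }'
}

if [ "$#" -gt 0 ]; then
    disk_bench "$1"
fi
```

For example, `pct_faster 74.3 54.3` prints `37` and `pct_faster 37.4 26.8` prints `40`, matching the block-storage percentages listed.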
🔍 The Problem: QEMU Emulation Overhead
Why QEMU emulation matters:
Running AMD64 containers on ARM64 requires real-time instruction translation:
- 🐌 CPU overhead from x86 → ARM translation
- 🕐 Slower container startup times
- 💾 Increased memory usage
- ⚠️ Potential compatibility issues
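To see the translation cost directly, you can run the same CPU-bound loop natively and under emulation on the ARM64 host. This sketch assumes the public multi-arch `node:18-alpine` image and a registered x86-64 QEMU handler. The loop and its size are arbitrary illustrations, not the benchmark behind this post's numbers.

```shell
#!/bin/sh
# Sketch: time the same CPU-bound Node.js loop natively and under QEMU on an ARM64 host.
# The loop is an arbitrary illustration, not the benchmark behind this post's numbers.

JS='const start = process.hrtime.bigint();
let x = 0;
for (let i = 0; i < 5e7; i++) x += i % 7;
const ms = Number(process.hrtime.bigint() - start) / 1e6;
console.log(process.arch, ms.toFixed(1) + " ms", "(checksum " + x + ")");'

# Run the loop in a throwaway node:18-alpine container for one platform.
run_loop() {
    docker run --rm --platform "$1" node:18-alpine node -e "$JS"
}

if [ "${1:-}" = "run" ]; then
    for platform in linux/arm64 linux/amd64; do
        run_loop "$platform"
    done
fi
```

Run it with `sh qemu-loop.sh run`. The emulated run reports `process.arch` as `x64`, and the native run reports `arm64`.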
Verifying the Problem
```console
# System architecture
$ uname -m
aarch64 # ← ARM64 system

# Docker image architecture
$ docker inspect jayakrishnakonda/portfolio:latest | grep Architecture
"Architecture": "amd64" # ← AMD64 image!

# Container runtime
$ docker exec portfolio uname -m
aarch64 # ← Running through QEMU translation
```
Everything worked, but I was paying a performance tax on every request.
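The most direct check is to compare the architecture of the image a container was created from with the host's architecture. This minimal sketch uses only `docker inspect` template fields: `.Image` on the container and `.Architecture` on the image. Running `uname -m` inside a container is a weaker signal, because QEMU's user-mode emulation usually reports the emulated machine (`x86_64`) rather than the host's.

```shell
#!/bin/sh
# Sketch: compare a container's image architecture with the host's.
# Uses only `docker inspect` Go-template fields (.Image, .Architecture).

# Map kernel names from `uname -m` to Docker/OCI architecture names.
normalize_arch() {
    case "$1" in
        aarch64|arm64) echo arm64 ;;
        x86_64|amd64)  echo amd64 ;;
        *)             echo "$1" ;;
    esac
}

# Returns 0 when the container's image matches the host, 1 when it doesn't.
check_container() {
    host=$(normalize_arch "$(uname -m)")
    image_id=$(docker inspect --format '{{.Image}}' "$1") || return 2
    image_arch=$(docker image inspect --format '{{.Architecture}}' "$image_id") || return 2
    if [ "$host" = "$image_arch" ]; then
        echo "$1: native ($image_arch)"
    else
        echo "$1: $image_arch image on $host host (emulated)"
        return 1
    fi
}

if [ "$#" -gt 0 ]; then
    check_container "$1"
fi
```

`sh check-arch.sh portfolio` exits non-zero on a mismatch, so it could also gate a deploy script.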
📈 Benchmarking: The Baseline
Before making changes, I collected baseline metrics with AMD64+QEMU:
Container Performance (AMD64 + QEMU)
| Metric | Value |
|--------|-------|
| 🚀 Container startup | 0.236s |
| ❄️ Cold start (first request) | 278ms |
| 🔥 Warm response time | ~10ms (9.97ms avg) |
| 💾 Memory usage (idle) | 82.5 MiB |
| 🖥️ CPU usage (idle) | 0.00% |
| ⏱️ CI/CD build time | ~4 minutes |
Not terrible numbers! But I suspected native ARM64 could do better.
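The post doesn't specify its benchmarking tooling, so the following is only one way to collect similar numbers. The sketch restarts the container and times the first successful request as the cold start. It then averages a series of warm requests using curl's `%{time_total}` and reads memory and CPU from `docker stats`. The URL, container name and run count are placeholders.

```shell
#!/bin/sh
# Sketch: cold and warm response times with curl, then memory/CPU from docker stats.
# URL, container name and run count are placeholders for your own setup.

# Average a column of seconds (curl's %{time_total}) and print milliseconds.
avg_ms() {
    awk '{ sum += $1; n++ } END { if (n) printf "%.2f\n", sum / n * 1000 }'
}

bench() {
    url="$1"; name="$2"; runs="${3:-20}"
    docker restart "$name" >/dev/null || return 1
    # Retry until the app answers; the first successful request is the cold one.
    tries=0
    until cold=$(curl -fs -o /dev/null -w '%{time_total}' "$url"); do
        tries=$((tries + 1))
        [ "$tries" -lt 60 ] || return 1
        sleep 1
    done
    echo "cold start: $(echo "$cold" | avg_ms) ms"
    i=0
    while [ "$i" -lt "$runs" ]; do
        curl -fs -o /dev/null -w '%{time_total}\n' "$url"
        i=$((i + 1))
    done | avg_ms | sed 's/^/warm average: /; s/$/ ms/'
    docker stats --no-stream --format '{{.Name}}: {{.MemUsage}} memory, {{.CPUPerc}} CPU' "$name"
}

if [ "$#" -ge 2 ]; then
    bench "$@"
fi
```

For example, run `sh bench.sh http://localhost:3000/ portfolio 50`. Port 3000 is an assumption here, since it is the Next.js default.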
🛠️ Solution Options
I had three paths forward:
Option 1: Native ARM64 Build ✅
Pros:
- ✅ Simplest implementation
- ✅ Fastest builds (no multi-platform overhead)
- ✅ Best performance for ARM64
- ✅ Smaller registry footprint
Cons:
- ❌ Only works on ARM64
- ❌ Can't deploy to AMD64 without rebuilding
- ❌ Less flexible for multi-cloud
Best for: Single-architecture deployments
Option 2: Multi-Architecture Build 🌟 (My Choice)
Pros:
- ✅ Supports both AMD64 and ARM64
- ✅ Docker auto-selects correct architecture
- ✅ Future-proof and portable
- ✅ Industry standard approach
- ✅ Works across cloud providers
Cons:
- ⚠️ Longer build times (5-10x)
- ⚠️ More complex CI/CD setup
- ⚠️ Requires Docker Buildx
Best for: Production apps that may deploy anywhere
Option 3: Self-Hosted ARM64 Runner
Pros:
- ✅ Zero emulation during build
- ✅ Uses local VPS resources
- ✅ Complete control
- ✅ Fastest native builds
Cons:
- ❌ Must manage runner infrastructure
- ❌ VPS must be online during builds
- ❌ Security considerations
- ❌ More maintenance overhead
Best for: Dedicated build infrastructure
I chose Option 2 for maximum flexibility—build once, deploy anywhere.
🔧 Implementation: Multi-Architecture Builds
Step 1: Verify Dockerfile Compatibility
First, I verified all base images and dependencies support ARM64:
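```dockerfile
# Base image check: ✅ node:18-alpine supports both amd64 and arm64
FROM node:18-alpine

# System dependencies: ✅ vips-dev, build-base, python3 and git all have ARM64 builds.
# Comments stay on their own lines: text after a trailing backslash breaks the continuation.
RUN apk add --no-cache \
    vips-dev \
    build-base \
    python3 \
    git
```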
Compatibility verification:
- ✅ Node.js 18: Full ARM64 support
- ✅ Next.js 14: Full ARM64 support
- ✅ sharp (native module): Builds for ARM64 automatically
- ✅ All Alpine packages: Multi-arch by default
Result: No Dockerfile changes needed!
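You can script the check that a base image is published for ARM64 against the registry manifest. This sketch assumes `docker manifest inspect` is available; it prints the manifest list as JSON. Single-platform images carry no platform list in that output, so the sketch reports them as missing.

```shell
#!/bin/sh
# Sketch: confirm that an image's manifest list includes a given architecture.
# Relies on `docker manifest inspect`, which prints the registry manifest as JSON.

# Read manifest JSON on stdin; succeed if any entry names the architecture.
manifest_lists_arch() {
    grep -Eq "\"architecture\": *\"$1\""
}

supports_arch() {
    docker manifest inspect "$1" | manifest_lists_arch "$2"
}

if [ "$#" -gt 0 ]; then
    for image in "$@"; do
        if supports_arch "$image" arm64; then
            echo "ok       $image"
        else
            echo "MISSING  $image (no arm64 entry)"
        fi
    done
fi
```

For example: `sh check-arm64.sh node:18-alpine`.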
Step 2: Update GitHub Actions Workflow
Modified .github/workflows/build-and-deploy.yml:
```yaml
# Multi-platform builds expect QEMU and Buildx to be set up earlier in the job
# (e.g. docker/setup-qemu-action and docker/setup-buildx-action).
- name: 🚀 Build and push Docker image (Multi-Architecture)
  uses: docker/build-push-action@v5
  with:
    context: .
    file: ./Dockerfile
    push: true
    platforms: linux/amd64,linux/arm64 # ← Multi-arch magic
    tags: ${{ steps.meta.outputs.tags }}
    cache-from: type=gha # ← Speed up subsequent builds
    cache-to: type=gha,mode=max
    build-args: |
      NODE_OPTIONS=--max-old-space-size=4096
      NEXT_PUBLIC_GA_MEASUREMENT_ID=${{ vars.NEXT_PUBLIC_GA_MEASUREMENT_ID }}
    provenance: false
  timeout-minutes: 40 # ← Increased for multi-arch
```
Key changes:
- Added `platforms: linux/amd64,linux/arm64`
- Added GitHub Actions cache for faster builds
- Increased timeout from 25 to 40 minutes
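You can also reproduce the workflow step by hand with Buildx, which helps when debugging slow or failing builds. This sketch assumes a Docker Engine with the Buildx plugin, a prior `docker login`, and the `tonistiigi/binfmt` image for registering QEMU handlers. The builder name `multiarch` is arbitrary. On an x86-64 machine such as a GitHub-hosted runner, the `linux/arm64` half of the build runs under QEMU. That is likely a large part of why multi-arch builds take so much longer.

```shell
#!/bin/sh
# Sketch: build and push a multi-arch image by hand with Docker Buildx.
# Assumes the Buildx plugin, a prior `docker login`, and a Dockerfile in the current directory.

# Join architecture names into a Buildx platform list, e.g. linux/amd64,linux/arm64.
to_platforms() {
    printf 'linux/%s\n' "$@" | paste -s -d, -
}

build_multiarch() {
    image="$1"; shift
    # Register QEMU handlers so the non-native platform can be built.
    docker run --privileged --rm tonistiigi/binfmt --install all || return 1
    # The default "docker" driver usually can't build several platforms at once,
    # so create (or reuse) a docker-container builder.
    docker buildx create --name multiarch --driver docker-container --use 2>/dev/null \
        || docker buildx use multiarch || return 1
    docker buildx build --platform "$(to_platforms "$@")" -t "$image" --push .
}

if [ "$#" -gt 0 ]; then
    build_multiarch "$1" amd64 arm64
fi
```

For example, `sh build-multiarch.sh jayakrishnakonda/portfolio:latest` builds and pushes both variants from the current directory.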
Step 3: Remove Platform Override
Modified docker-compose.yml:
```yaml
services:
  portfolio:
    # Removed: platform: linux/amd64
    # Docker now auto-selects native ARM64 image
    image: jayakrishnakonda/portfolio:latest
```
That's it! Just remove the platform override.
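After removing the override, a quick search catches any `platform:` keys left behind in Compose files. This is a plain text match rather than a YAML parse, so it assumes one key per line. Lines that start with `#`, like the note in the snippet above, are ignored.

```shell
#!/bin/sh
# Sketch: list `platform:` keys left in Compose files, with line numbers.
# A text search, not a YAML parser: commented-out lines (starting with #) don't match.

find_platform_pins() {
    grep -nE '^[[:space:]]*platform:' "$@"
}

if [ "$#" -gt 0 ]; then
    if find_platform_pins "$@"; then
        echo "platform override(s) still present" >&2
        exit 1
    fi
    echo "no platform overrides found"
fi
```

`sh find-pins.sh docker-compose.yml` exits with status 1 if an override is still present.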
Step 4: The Build Challenge
First build attempt? Failed. The build timed out after 25 minutes—it was still running (TypeScript type checking) when the timeout hit.
The fix: Increased timeouts:
- Job timeout: 30 → 45 minutes
- Build step: 25 → 40 minutes
Second attempt? Success! Build completed in ~38 minutes.
📊 Results: The Performance Transformation
Native ARM64 Performance
After deploying the multi-arch image, I re-ran all benchmarks:
| Metric | AMD64 + QEMU | Native ARM64 | Improvement |
|--------|--------------|--------------|-------------|
| 🚀 Container Startup | 0.236s | 0.183s | 22% faster ⚡ |
| ❄️ Cold Start | 278ms | 5.6ms | 98% faster 🚀 |
| 🔥 Warm Response | 9.97ms | 6.6ms | 34% faster ⚡ |
| 💾 Memory (Idle) | 82.5 MiB | 40.45 MiB | 51% less 💾 |
| 🖥️ CPU (Idle) | 0.00% | 0.00% | Same |
| 📦 Image Size | 527 MB | 527 MB | Same |
| ⏱️ Build Time | ~4 min | ~38 min | 9.5x longer |
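The improvement column can be recomputed from the raw values. In this small `awk` sketch, the latency and memory rows are expressed as the percentage reduction from the AMD64 + QEMU value. Build time is expressed as a ratio.

```shell
#!/bin/sh
# Sketch: recompute the Improvement column from the raw before/after values.

# Percentage reduction from "before" to "after", rounded to a whole number.
pct_reduction() {
    awk -v before="$1" -v after="$2" 'BEGIN { printf "%.0f\n", (before - after) / before * 100 }'
}

# How many times larger "after" is than "before", to one decimal place.
ratio() {
    awk -v before="$1" -v after="$2" 'BEGIN { printf "%.1f\n", after / before }'
}

while read -r metric before after; do
    printf '%-8s %s%% less\n' "$metric" "$(pct_reduction "$before" "$after")"
done <<'EOF'
startup 0.236 0.183
cold 278 5.6
warm 9.97 6.6
memory 82.5 40.45
EOF
printf '%-8s %sx longer\n' build "$(ratio 4 38)"
```

It prints 22, 98, 34 and 51 percent for the four runtime rows and 9.5x for the build, matching the table.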
The Dramatic Improvements
🚀 98% Faster Cold Starts
The most shocking improvement: cold start dropped from 278ms to 5.6ms. This is a game-changer for user experience—the first page load is now nearly instantaneous.
This massive improvement most likely comes from eliminating QEMU's instruction translation overhead on the first request.
⚡ 34% Faster Response Times
Even warm requests improved significantly: 9.97ms → 6.6ms. While both are fast, this compounds across thousands of requests daily.
💾 51% Memory Reduction
Memory usage dropped from 82.5 MiB to 40.45 MiB—over 50% reduction. This means better resource utilization and the ability to run more containers on the same hardware.
🏃 22% Faster Container Startup
Container startup improved from 0.236s to 0.183s. This matters for deployments, scaling events, and recovery from failures.
⏱️ Trade-off: Longer Build Times
The only downside: CI/CD builds increased from 4 minutes to 38 minutes. However, this is a one-time cost per deployment, while runtime performance benefits every single request.
Deployments happen occasionally. The app runs 24/7. The math clearly favors native builds.
💡 Lessons Learned
1. QEMU Works, But Native is Better
QEMU is impressive—my site worked fine with it. Most users probably wouldn't notice. But "works fine" isn't the same as "works optimally."
The benchmarks don't lie:
- 98% faster cold starts = better user experience
- 51% less memory = better resource utilization
- 34% faster responses = compounds over millions of requests
2. The Cold Start Improvement is Surprising
I expected improvements, but 278ms → 5.6ms (98% reduction) was stunning. This suggests QEMU pays a heavy cost the first time it translates a given piece of code. Once the translated code is cached (warm requests), the gap narrows to 34%, but that first impression matters.
3. Multi-Arch is Worth the Complexity
While single-arch builds are simpler, multi-arch provides:
- Testing on local AMD64 development machines
- Deploying to different cloud providers
- Future infrastructure flexibility
The 9.5x longer build time is a small price to pay.
4. Build Time Trade-off is Acceptable
Yes, builds take 9.5x longer (38 min vs 4 min). But consider:
- Deployments: Few times per week
- Runtime: 24/7, millions of requests
- Performance: Every request is 34% faster
If you deploy 10 times per week, that's 340 extra minutes of build time. But your app serves millions of faster requests. The math favors native builds.
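The trade-off can be made concrete with the numbers from this post. In this sketch, the deploy count defaults to the 10-per-week example above, while the request count is a hypothetical placeholder. The two results measure different things: CI minutes versus cumulative response time seen by visitors. They are therefore illustrative rather than directly interchangeable.

```shell
#!/bin/sh
# Sketch: weekly CI cost versus cumulative response-time savings.
# Build durations and latencies come from this post; the request count is hypothetical.

# Extra build minutes per week: deploys * (new build time - old build time).
extra_build_minutes() {  # deploys_per_week old_min new_min
    echo $(( $1 * ($3 - $2) ))
}

# Seconds of response time saved across N requests: N * (old_ms - new_ms) / 1000.
saved_seconds() {  # requests old_ms new_ms
    awk -v n="$1" -v old="$2" -v new="$3" 'BEGIN { printf "%.0f\n", n * (old - new) / 1000 }'
}

deploys="${1:-10}"
requests="${2:-1000000}"  # hypothetical request volume
echo "extra CI minutes per week: $(extra_build_minutes "$deploys" 4 38)"
echo "response time saved over $requests warm requests: $(saved_seconds "$requests" 9.97 6.6) s"
```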
5. GitHub Actions Cache is Essential
Multi-platform builds take longer. Using cache-from: type=gha reduces subsequent build times significantly. Without caching, builds would be even slower.
6. Always Verify Base Image Support
Check Docker Hub for multi-architecture support before choosing base images. Official images (like node:18-alpine) usually support both, but always verify.
7. Don't Skip the Benchmarks
I almost didn't benchmark because "everything worked." The 98% cold start improvement would have remained hidden. Always measure!
🎯 Best Practices for ARM64 Deployments
1. Use Multi-Architecture Builds by Default
```yaml
platforms: linux/amd64,linux/arm64
```
2. Never Hardcode Platform in Docker Compose
```yaml
# ❌ Don't do this (unless you have a specific reason)
platform: linux/amd64

# ✅ Let Docker choose the right architecture
image: myapp:latest
```
3. Increase Your Timeouts
Multi-arch builds take 5-10x longer:
```yaml
timeout-minutes: 40 # Not 25!
```
4. Use Build Caching Aggressively
```yaml
cache-from: type=gha
cache-to: type=gha,mode=max
```
5. Test on Target Architecture
Even with multi-arch builds, test on actual deployment architecture before production.
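One way to test each variant is to run the published image once per platform and ask Node which architecture it sees. This sketch assumes the image still lets you override the command, as the stock `node` entrypoint does. It also assumes an x86-64 QEMU handler is registered, so that the `linux/amd64` run can start on the ARM64 host.

```shell
#!/bin/sh
# Sketch: run the image once per platform and confirm what architecture Node reports.
# Node's process.arch prints "arm64" on ARM64 and "x64" on AMD64.

# Expected process.arch for a Buildx platform string such as linux/arm64.
expected_node_arch() {
    case "$1" in
        */arm64*) echo arm64 ;;
        */amd64*) echo x64 ;;
        *) return 1 ;;
    esac
}

smoke_test() {
    image="$1"; shift
    status=0
    for platform in "$@"; do
        want=$(expected_node_arch "$platform") || { echo "unknown platform: $platform"; status=1; continue; }
        got=$(docker run --rm --platform "$platform" "$image" node -e 'console.log(process.arch)')
        if [ "$got" = "$want" ]; then
            echo "ok   $platform ($got)"
        else
            echo "FAIL $platform: expected $want, got $got"
            status=1
        fi
    done
    return "$status"
}

if [ "$#" -gt 0 ]; then
    smoke_test "$1" linux/amd64 linux/arm64
fi
```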
6. Document Architecture Decisions
Add comments explaining why certain architectures are chosen:
```yaml
# Multi-arch image: Docker auto-selects ARM64 on this VPS
image: portfolio:latest
```
💰 Cost Implications
Oracle Cloud Infrastructure (Ampere A1)
- Free Tier: 4 ARM cores, 24GB RAM
- Performance: Excellent for ARM64 workloads
- Cost: $0/month (within free tier)
GitHub Actions
- Multi-arch builds: ~9.5x build time vs single-arch in my case (~38 min vs ~4 min)
- Free Tier: 2000 minutes/month
- Cost: $0/month for public repos
Performance vs Cost Trade-off
Native ARM64 builds provide dramatically better performance at zero additional cost when using OCI's free tier and GitHub Actions for public repos.
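Build time also matters for GitHub Actions quotas on private repositories, where the 2,000-minute allowance applies. Standard runners on public repositories are free. Here is a quick sketch using this post's build durations:

```shell
#!/bin/sh
# Sketch: whole builds that fit in a monthly GitHub Actions minute allowance.

builds_per_quota() {  # quota_minutes minutes_per_build
    echo $(( $1 / $2 ))
}

quota="${1:-2000}"
echo "single-arch (~4 min):  $(builds_per_quota "$quota" 4) builds/month"
echo "multi-arch (~38 min): $(builds_per_quota "$quota" 38) builds/month"
```

That works out to 500 single-arch builds versus 52 multi-arch builds per month.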
🚀 Conclusion
Migrating to native ARM64 Docker builds was straightforward and delivered game-changing performance improvements:
- ✅ 98% faster cold starts (278ms → 5.6ms)
- ✅ 34% faster warm responses (9.97ms → 6.6ms)
- ✅ 51% less memory (82.5 MiB → 40.45 MiB)
- ✅ 22% faster container startup (0.236s → 0.183s)
- ✅ Eliminated QEMU emulation overhead
- ✅ Future-proof multi-architecture support
Key Takeaways
- Check your architecture - Don't assume your images match your hardware. Run `docker inspect` to verify.
- Multi-arch is the way - Build once, deploy anywhere. The flexibility is worth the longer build times.
- Benchmarking matters - I measured 98% improvement in cold starts—you might see similar gains.
- It's easier than you think - Docker Buildx makes multi-arch simple. The changes fit in a single commit.
- Performance compounds - A 34% improvement per request adds up to massive savings at scale.
Recommendations
If you're running containers on ARM64 infrastructure (AWS Graviton, Oracle Ampere, Apple Silicon, Raspberry Pi, etc.), verify your images are built for the correct architecture. The performance gains are worth the minimal effort:
- For new projects: Start with multi-arch from day one
- For existing projects: Add `platforms: linux/amd64,linux/arm64` to your build
- Increase your timeouts: Multi-arch builds take 5-10x longer
- Use build caching: GitHub Actions cache dramatically speeds up subsequent builds
The cold start improvement alone (278ms → 5.6ms) justifies the migration. When you factor in the memory savings and response time improvements, it's a no-brainer for production ARM64 deployments.
📚 Resources
- Docker Buildx Documentation
- Multi-platform Images Guide
- Oracle Cloud Ampere A1 Documentation
- GitHub Actions - Docker Builds
- My OCI Migration Journey
- My Docker Home Server Series
Running ARM64 infrastructure? Have questions about multi-arch builds? Feel free to reach out via email or check out my other infrastructure articles.