General
April 3, 2026
11 min read

⚡ Optimizing Docker on ARM64: From Emulation Hell to Native Performance

How I discovered my portfolio was running through QEMU emulation on Oracle Cloud ARM64 and achieved 98% faster cold starts by switching to native builds.

#Docker #ARM64 #Performance #DevOps #CI/CD #Oracle Cloud +1

🚨 The Discovery

After migrating from DigitalOcean to Oracle Cloud Infrastructure (OCI), I was excited about the powerful Ampere ARM64 processors. Everything seemed to work perfectly—my portfolio loaded fast, containers were healthy, and monitoring showed no issues. But then I decided to dig deeper into performance metrics, and what I found surprised me.

My Docker images were built for AMD64, running on ARM64 hardware through QEMU emulation.

While QEMU is impressive technology that makes cross-architecture execution possible, it is essentially a translator that converts x86 instructions to ARM instructions at run time. That translation works, but it has a cost. This post documents my move to native ARM64 builds and the performance improvements that followed.


🤔 How Did This Happen?

During my migration from DigitalOcean (AMD64) to Oracle Cloud (ARM64), I focused on getting everything working quickly. My docker-compose.yml had this line:

services:
  portfolio:
    platform: linux/amd64  # ← Forcing AMD64 on ARM64 = QEMU emulation
    image: jayakrishnakonda/portfolio:latest

This configuration was intentional on DigitalOcean—the GitHub Actions runner builds AMD64 images by default. When I moved to OCI's ARM64 Ampere instances, I kept this config because "it worked." Docker pulled the AMD64 image and transparently used QEMU to make it run.

The problem? Performance was being left on the table.


📊 Understanding the Infrastructure

OCI Migration Context

When I migrated to Oracle Cloud, I faced several challenges:

  1. Architecture mismatch: DigitalOcean used AMD64, OCI offered ARM64 Ampere
  2. Image compatibility: My GitHub Actions workflow only built AMD64 images
  3. Time pressure: I wanted everything up quickly during migration
  4. "It just works" syndrome: QEMU made everything appear fine

I documented the full migration in a separate post, but one thing I didn't address at the time was architecture optimization.

Disk Performance Baseline

Before tackling the CPU architecture, I benchmarked my storage:

Boot Drive (/dev/sda1):

  • 📝 Write: 54.3 MB/s
  • 📖 Read: 26.8 MB/s

Attached Block Storage (/dev/sdb - Docker location):

  • 📝 Write: 74.3 MB/s (37% faster)
  • 📖 Read: 37.4 MB/s (40% faster)

Good news: Docker's data lives on the faster block volume, so disk I/O was unlikely to be the bottleneck. That left CPU architecture as the thing to investigate.


🔍 The Problem: QEMU Emulation Overhead

Why QEMU emulation matters:

Running AMD64 containers on ARM64 requires real-time instruction translation:

  • 🐌 CPU overhead from x86 → ARM translation
  • 🕐 Slower container startup times
  • 💾 Increased memory usage
  • ⚠️ Potential compatibility issues

Verifying the Problem

# System architecture
$ uname -m
aarch64  # ← ARM64 system

# Docker image architecture
$ docker inspect jayakrishnakonda/portfolio:latest | grep Architecture
"Architecture": "amd64"  # ← AMD64 image!

# Container runtime
$ docker exec portfolio uname -m
aarch64  # ← Running through QEMU translation

Everything worked, but I was paying a performance tax on every request.
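The manual checks above can be folded into a single comparison. Here's a minimal sketch, assuming the usual mapping from `uname -m` values to Docker architecture names; `docker_arch` and `check_image_arch` are hypothetical helper names, not Docker commands.

```shell
# Sketch: flag an image whose architecture differs from the host's.
# docker_arch and check_image_arch are illustrative names.

# Map `uname -m` output to the architecture names Docker uses.
docker_arch() {
  case "$1" in
    x86_64)        echo "amd64" ;;
    aarch64|arm64) echo "arm64" ;;
    *)             echo "$1" ;;
  esac
}

# $1 = host machine type (from uname -m), $2 = image architecture.
# Returns non-zero when the image would need emulation.
check_image_arch() {
  host_arch="$(docker_arch "$1")"
  image_arch="$2"
  if [ "$host_arch" = "$image_arch" ]; then
    echo "native: host and image are both $host_arch"
  else
    echo "emulated: $image_arch image on $host_arch host"
    return 1
  fi
}

# Typical call on the server (requires Docker and a pulled image):
#   check_image_arch "$(uname -m)" \
#     "$(docker image inspect --format '{{.Architecture}}' jayakrishnakonda/portfolio:latest)"
```

Because the function returns a non-zero status on a mismatch, it could also serve as a guard in a deploy script.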


📈 Benchmarking: The Baseline

Before making changes, I collected baseline metrics with AMD64+QEMU:

Container Performance (AMD64 + QEMU)

| Metric | Value |
|--------|-------|
| 🚀 Container startup | 0.236s |
| ❄️ Cold start (first request) | 278ms |
| 🔥 Warm response time | ~10ms (9.97ms avg) |
| 💾 Memory usage (idle) | 82.5 MiB |
| 🖥️ CPU usage (idle) | 0.00% |
| ⏱️ CI/CD build time | ~4 minutes |

Not terrible numbers! But I suspected native ARM64 could do better.
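If you want to collect similar request timings yourself, here's one possible approach using curl's built-in timers. This is a sketch, not necessarily how the numbers above were gathered: `time_request` and `avg_ms` are hypothetical helper names, and the `http://localhost:3000/` URL is an assumption about where the app listens.

```shell
# Sketch: time HTTP requests with curl and summarise them in milliseconds.
# time_request and avg_ms are illustrative helper names.

# Print the total time of one request in seconds (curl's time_total timer).
time_request() {
  curl -o /dev/null -s -w '%{time_total}\n' "$1"
}

# Read one duration in seconds per line on stdin; print the mean in ms.
avg_ms() {
  awk '{ sum += $1; n++ } END { if (n > 0) printf "%.2f\n", sum / n * 1000 }'
}

# Example (URL is an assumption; adjust to your deployment):
#   time_request http://localhost:3000/        # first request after a restart
#   for i in $(seq 1 20); do time_request http://localhost:3000/; done | avg_ms
```

Measuring the first request separately from the loop keeps the cold start from skewing the warm average.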


🛠️ Solution Options

I had three paths forward:

Option 1: Native ARM64 Build ✅

Pros:

  • ✅ Simplest implementation
  • ✅ Fastest builds (no multi-platform overhead)
  • ✅ Best performance for ARM64
  • ✅ Smaller registry footprint

Cons:

  • ❌ Only works on ARM64
  • ❌ Can't deploy to AMD64 without rebuilding
  • ❌ Less flexible for multi-cloud

Best for: Single-architecture deployments

Option 2: Multi-Architecture Build 🌟 (My Choice)

Pros:

  • ✅ Supports both AMD64 and ARM64
  • ✅ Docker auto-selects correct architecture
  • ✅ Future-proof and portable
  • ✅ Industry standard approach
  • ✅ Works across cloud providers

Cons:

  • ⚠️ Longer build times (5-10x)
  • ⚠️ More complex CI/CD setup
  • ⚠️ Requires Docker Buildx

Best for: Production apps that may deploy anywhere

Option 3: Self-Hosted ARM64 Runner

Pros:

  • ✅ Zero emulation during build
  • ✅ Uses local VPS resources
  • ✅ Complete control
  • ✅ Fastest native builds

Cons:

  • ❌ Must manage runner infrastructure
  • ❌ VPS must be online during builds
  • ❌ Security considerations
  • ❌ More maintenance overhead

Best for: Dedicated build infrastructure

I chose Option 2 for maximum flexibility—build once, deploy anywhere.


🔧 Implementation: Multi-Architecture Builds

Step 1: Verify Dockerfile Compatibility

First, I verified all base images and dependencies support ARM64:

# Base image check: node:18-alpine is published for both amd64 and arm64
FROM node:18-alpine

# System dependencies: each package below has an ARM64 build on Alpine
RUN apk add --no-cache \
    vips-dev \
    build-base \
    python3 \
    git

Compatibility verification:

  • ✅ Node.js 18: Full ARM64 support
  • ✅ Next.js 14: Full ARM64 support
  • ✅ sharp (native module): Builds for ARM64 automatically
  • ✅ Alpine packages listed above: All available for ARM64

Result: No Dockerfile changes needed!
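One way to check a base image before committing to it is to look at the platforms in its manifest list. The sketch below assumes the `Platform:` lines that `docker buildx imagetools inspect` prints in its manifest listing; `has_platform` is a hypothetical helper name.

```shell
# Sketch: confirm that an image publishes the platform you plan to run on.
# has_platform is an illustrative helper. It reads a manifest listing on
# stdin and looks for a matching "Platform:" line, also accepting variant
# suffixes such as linux/arm64/v8.
has_platform() {
  # $1 = platform such as linux/arm64
  grep -Eq "Platform:[[:space:]]+$1(/|[[:space:]]|\$)"
}

# Example (requires Docker with Buildx and registry access):
#   docker buildx imagetools inspect node:18-alpine | has_platform linux/arm64 \
#     && echo "arm64 variant available"
```

The same check works for any other base image or for your own pushed tag after a multi-arch build.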

Step 2: Update GitHub Actions Workflow

Modified .github/workflows/build-and-deploy.yml:

- name: 🚀 Build and push Docker image (Multi-Architecture)
  uses: docker/build-push-action@v5
  with:
    context: .
    file: ./Dockerfile
    push: true
    platforms: linux/amd64,linux/arm64  # ← Multi-arch magic
    tags: ${{ steps.meta.outputs.tags }}
    cache-from: type=gha  # ← Speed up subsequent builds
    cache-to: type=gha,mode=max
    build-args: |
      NODE_OPTIONS=--max-old-space-size=4096
      NEXT_PUBLIC_GA_MEASUREMENT_ID=${{ vars.NEXT_PUBLIC_GA_MEASUREMENT_ID }}
    provenance: false
  timeout-minutes: 40  # ← Increased for multi-arch

Key changes:

  • Added platforms: linux/amd64,linux/arm64
  • Added GitHub Actions cache for faster builds
  • Increased timeout from 25 to 40 minutes
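For reference, roughly the same build can be run from a terminal with Docker Buildx. This is a sketch under assumptions: `build_multiarch` and the `DRY_RUN` switch are hypothetical, `example/app:latest` is a placeholder tag, and the machine needs a Buildx builder that supports multi-platform output (plus QEMU emulation for the non-native platform).

```shell
# Sketch: local equivalent of a multi-arch CI build step.
# build_multiarch and DRY_RUN are illustrative, not part of Docker.
#
# One-time setup on the build machine (creates and selects a builder):
#   docker buildx create --name multiarch --use

build_multiarch() {
  # $1 = image tag, $2 = optional comma-separated platform list
  tag="$1"
  platforms="${2:-linux/amd64,linux/arm64}"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the command instead of running it
    echo "docker buildx build --platform $platforms --tag $tag --push ."
  else
    docker buildx build --platform "$platforms" --tag "$tag" --push .
  fi
}

# Example:
#   DRY_RUN=1 build_multiarch example/app:latest
```

On an AMD64 build machine the ARM64 half of such a build runs under emulation, which is the main reason multi-arch builds take so much longer than single-arch ones.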

Step 3: Remove Platform Override

Modified docker-compose.yml:

services:
  portfolio:
    # Removed: platform: linux/amd64
    # Docker now auto-selects native ARM64 image
    image: jayakrishnakonda/portfolio:latest

That's it! Just remove the platform override.
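If you have several Compose files, a quick scan can find leftover platform pins. A minimal sketch, assuming the override is written as a plain `platform:` key; `find_platform_pins` is a hypothetical helper name.

```shell
# Sketch: list active platform overrides in a Compose file.
# find_platform_pins is an illustrative helper name.
find_platform_pins() {
  # Prints "line-number:content" for every uncommented platform: key.
  # Exits non-zero when the file has none.
  grep -nE '^[[:space:]]*platform:' "$1"
}

# Example:
#   find_platform_pins docker-compose.yml
# After removing the override, pull and restart so the native image is used:
#   docker compose pull && docker compose up -d
```

Commented-out lines such as `# Removed: platform: linux/amd64` are ignored, because the pattern only matches lines that start with the key.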

Step 4: The Build Challenge

First build attempt? Failed. The build timed out after 25 minutes—it was still running (TypeScript type checking) when the timeout hit.

The fix was to raise both timeouts:

  • Job timeout: 30 → 45 minutes
  • Build step: 25 → 40 minutes

Second attempt? Success! Build completed in ~38 minutes.


📊 Results: The Performance Transformation

Native ARM64 Performance

After deploying the multi-arch image, I re-ran all benchmarks:

| Metric | AMD64 + QEMU | Native ARM64 | Improvement |
|--------|--------------|--------------|-------------|
| 🚀 Container Startup | 0.236s | 0.183s | 22% faster ⚡ |
| ❄️ Cold Start | 278ms | 5.6ms | 98% faster 🚀 |
| 🔥 Warm Response | 9.97ms | 6.6ms | 34% faster ⚡ |
| 💾 Memory (Idle) | 82.5 MiB | 40.45 MiB | 51% less 💾 |
| 🖥️ CPU (Idle) | 0.00% | 0.00% | Same |
| 📦 Image Size | 527 MB | 527 MB | Same |
| ⏱️ Build Time | ~4 min | ~38 min | 9.5x longer |
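The percentages in the Improvement column are the reduction relative to the AMD64 + QEMU value. A small sketch that reproduces them from the measured numbers; `pct_less` is a hypothetical helper name.

```shell
# Sketch: reduction from "before" to "after" as a percentage of "before".
# pct_less is an illustrative helper name.
pct_less() {
  awk -v before="$1" -v after="$2" \
    'BEGIN { printf "%.1f\n", (before - after) / before * 100 }'
}

pct_less 0.236 0.183   # container startup: prints 22.5
pct_less 278 5.6       # cold start:        prints 98.0
pct_less 9.97 6.6      # warm response:     prints 33.8
pct_less 82.5 40.45    # idle memory:       prints 51.0
```

The table rounds these to whole percentages.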

The Dramatic Improvements

🚀 98% Faster Cold Starts

The most surprising improvement: cold start dropped from 278ms to 5.6ms, so the first page load after a restart is now nearly instantaneous.

The likely cause is QEMU's translation overhead. Under emulation, each block of x86 code has to be translated the first time it executes, and the first request runs a lot of code that has not been translated yet.

⚡ 34% Faster Response Times

Even warm requests improved significantly: 9.97ms → 6.6ms. While both are fast, this compounds across thousands of requests daily.

💾 51% Memory Reduction

Memory usage dropped from 82.5 MiB to 40.45 MiB—over 50% reduction. This means better resource utilization and the ability to run more containers on the same hardware.

🏃 22% Faster Container Startup

Container startup improved from 0.236s to 0.183s. This matters for deployments, scaling events, and recovery from failures.

⏱️ Trade-off: Longer Build Times

The only downside: CI/CD builds increased from 4 minutes to 38 minutes. However, this is a one-time cost per deployment, while runtime performance benefits every single request.

Deployments happen occasionally. The app runs 24/7. The math clearly favors native builds.


💡 Lessons Learned

1. QEMU Works, But Native is Better

QEMU is impressive—my site worked fine with it. Most users probably wouldn't notice. But "works fine" isn't the same as "works optimally."

The benchmarks don't lie:

  • 98% faster cold starts = better user experience
  • 51% less memory = better resource utilization
  • 34% faster responses = compounds over millions of requests

2. The Cold Start Improvement is Surprising

I expected improvements, but 278ms → 5.6ms (98% reduction) was stunning. This suggests QEMU's overhead is concentrated in translating code the first time it runs. Once the translated code is cached (warm requests), the gap narrows to 34%, but that first impression matters.

3. Multi-Arch is Worth the Complexity

While single-arch builds are simpler, multi-arch provides:

  • Testing on local AMD64 development machines
  • Deploying to different cloud providers
  • Future infrastructure flexibility

The 9.5x longer build time is a small price to pay.

4. Build Time Trade-off is Acceptable

Yes, builds take 9.5x longer (38 min vs 4 min). But consider:

  • Deployments: Few times per week
  • Runtime: 24/7, millions of requests
  • Performance: Every request is 34% faster

If you deploy 10 times per week, that's 340 extra minutes of build time. But your app serves millions of faster requests. The math favors native builds.
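The build-time arithmetic is easy to rerun for your own deployment frequency. A minimal sketch; the helper names are hypothetical, and the monthly figure assumes 52 weeks spread over 12 months with integer rounding down.

```shell
# Sketch: build-minute budgeting with shell integer arithmetic.
# Both helper names are illustrative.

# $1 = old build minutes, $2 = new build minutes, $3 = deployments per week
extra_build_minutes() {
  echo $(( ($2 - $1) * $3 ))
}

# $1 = build minutes, $2 = deployments per week
# Approximate total per month, assuming 52 weeks / 12 months.
monthly_build_minutes() {
  echo $(( $1 * $2 * 52 / 12 ))
}

extra_build_minutes 4 38 10     # prints 340
monthly_build_minutes 38 10     # prints 1646
```

The monthly total is the number to compare against any CI minute quota that applies to your repository.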

5. GitHub Actions Cache is Essential

Multi-platform builds take longer. Using cache-from: type=gha reduces subsequent build times significantly. Without caching, builds would be even slower.

6. Always Verify Base Image Support

Check Docker Hub for multi-architecture support before choosing base images. Official images (like node:18-alpine) usually support both, but always verify.

7. Don't Skip the Benchmarks

I almost didn't benchmark because "everything worked." The 98% cold start improvement would have remained hidden. Always measure!


🎯 Best Practices for ARM64 Deployments

1. Use Multi-Architecture Builds by Default

platforms: linux/amd64,linux/arm64

2. Never Hardcode Platform in Docker Compose

# ❌ Don't do this (unless you have a specific reason)
platform: linux/amd64

# ✅ Let Docker choose the right architecture
image: myapp:latest

3. Increase Your Timeouts

Multi-arch builds take 5-10x longer:

timeout-minutes: 40  # Not 25!

4. Use Build Caching Aggressively

cache-from: type=gha
cache-to: type=gha,mode=max

5. Test on Target Architecture

Even with multi-arch builds, test on actual deployment architecture before production.

6. Document Architecture Decisions

Add comments explaining why certain architectures are chosen:

# Multi-arch image: Docker auto-selects ARM64 on this VPS
image: portfolio:latest

💰 Cost Implications

Oracle Cloud Infrastructure (Ampere A1)

  • Free Tier: 4 ARM cores, 24GB RAM
  • Performance: Excellent for ARM64 workloads
  • Cost: $0/month (within free tier)

GitHub Actions

  • Multi-arch builds: ~9.5x build time vs single-arch in my case (~38 min vs ~4 min)
  • Free Tier: 2000 minutes/month for private repos
  • Cost: $0/month for public repos (standard GitHub-hosted runners are free)

Performance vs Cost Trade-off

Native ARM64 builds provide dramatically better performance at zero additional cost when using OCI's free tier and GitHub Actions for public repos.


🚀 Conclusion

Migrating to native ARM64 Docker builds was straightforward and delivered game-changing performance improvements:

  • 98% faster cold starts (278ms → 5.6ms)
  • 34% faster warm responses (9.97ms → 6.6ms)
  • 51% less memory (82.5 MiB → 40.45 MiB)
  • 22% faster container startup (0.236s → 0.183s)
  • ✅ Eliminated QEMU emulation overhead
  • ✅ Future-proof multi-architecture support

Key Takeaways

  1. Check your architecture - Don't assume your images match your hardware. Run docker inspect to verify.
  2. Multi-arch is the way - Build once, deploy anywhere. The flexibility is worth the longer build times.
  3. Benchmarking matters - I measured 98% improvement in cold starts—you might see similar gains.
  4. It's easier than you think - Docker Buildx makes multi-arch simple. The changes fit in a single commit.
  5. Performance compounds - A 34% improvement per request adds up to massive savings at scale.

Recommendations

If you're running containers on ARM64 infrastructure (AWS Graviton, Oracle Ampere, Apple Silicon, Raspberry Pi, etc.), verify your images are built for the correct architecture. The performance gains are worth the minimal effort:

  • For new projects: Start with multi-arch from day one
  • For existing projects: Add platforms: linux/amd64,linux/arm64 to your build
  • Increase your timeouts: Multi-arch builds take 5-10x longer
  • Use build caching: GitHub Actions cache dramatically speeds up subsequent builds

The cold start improvement alone (278ms → 5.6ms) justifies the migration. When you factor in the memory savings and response time improvements, it's a no-brainer for production ARM64 deployments.


Running ARM64 infrastructure? Have questions about multi-arch builds? Feel free to reach out via email or check out my other infrastructure articles.
