Benchmarking Swift vs Rust on Apple Silicon: When Emulation Outpaces Native
A quick performance deep-dive with matrix-multiply and neural-net benchmarks
Introduction
Apple’s transition to ARM-based silicon has brought native Swift performance to the forefront for many macOS developers. But how does Swift on ARM64 really stack up against another “fast” language like Rust? Some developers still rely on Rust code compiled for x86_64 and run under Rosetta 2 emulation. Swift, of course, compiles natively to ARM64. I’ve had bad experiences with performance issues using Swift, so I was curious as to which was faster.
I set out to run two compute-intensive workloads: naïve matrix multiplication and a tiny 2-layer neural-network backprop pass. I benchmarked these across all three configurations using hyperfine. Here’s what we found.
Benchmarking Setup
To ensure an apples-to-apples comparison:
Swift (ARM64 Native)
Rust (ARM64 Native)
Rust x86_64 under Rosetta
All benchmarks were compiled with full optimizations (-O, Release builds, and Rust LTO enabled). We used hyperfine for 5 warm-up runs and 20 timed runs, flushing filesystem caches between each. This gave us mean ± σ timings with outlier detection.
Matrix Multiply Results
We multiplied two random 600 × 600 double-precision matrices using a straightforward triple-loop implementation.
Implementation Mean Time ± σ
Swift ARM64: 207.9 ms ± 0.7 ms
Rust x86_64 (Rosetta): 205.0 ms ± 4.1 ms
Rust ARM64 Native: 189.6 ms ± 7.6 ms
Native Rust outpaced Swift by ~10 %.
Rust under emulation still beat Swift by ~1 %.
Neural Net Backpropagation Results
Next, a small 2-layer dense neural net (1024→512→10) ran 50 iterations of forward + backprop.
Implementation Mean Time ± σ
Swift ARM64: 99.7 ms ± 0.6 ms
Rust x86_64 (Rosetta): 65.8 ms ± 0.7 ms
Rust ARM64 Native: 52.4 ms ± 0.5 ms
Swift was nearly 2× slower than native Rust, and 1.5× slower than Rust + Rosetta.
Why Is Swift Slower Than Emulated Rust?
A few factors help explain these counterintuitive results:
Bounds-check Overhead
Swift’s safe arrays emit runtime checks on every index access unless you use
unsafebuffers.Rust can often prove index safety in release mode and eliminate most or all bounds checks.
Emulation Hot-Path Optimization
Apple’s Rosetta JIT identifies hot loops in the Rust x86_64 binary and recompiles them into optimized ARM64 on the fly.
This dynamic translation can sometimes out-optimize a statically compiled Swift binary, especially in tight inner loops.
LLVM Frontend Differences
Rust’s LLVM backend pipeline may auto-vectorize and unroll loops more aggressively than Swift’s default front-end optimizations.
Link-Time Optimization (LTO) in Rust further fuses and inlines hot code paths across modules.
Memory Allocation Patterns
In our neural-net test, we pre-allocated error buffers in Rust to avoid per-iteration heap allocations.
Swift’s memory model and ARC can still incur minor retain/release costs in tight loops.
Takeaways
Swift needs some optimization
Don’t assume “native always wins”; in some workloads, Rosetta can surprise you.

