Why some teams don’t always rush to upgrade their cloud infrastructure
We’ve spent the last year talking to dozens of engineering leaders at companies ranging from gaming giants to financial services firms, and we’ve noticed a commonality: the smartest teams aren’t always running on the latest VM generations. In fact, some of the most sophisticated organizations we work with are choosing to stay on what many would call “legacy” infrastructure. A senior systems engineer at a major gaming company put it perfectly: “As long as instances are meeting our performance requirements, we stay on older ones.” As we explored further, we realized that the reasons for delaying modernization run deeper than simple inconvenience.
Operational friction, workload incompatibility, or simply bad timing were the most commonly cited reasons for delaying upgrades. A head of infrastructure at a large education technology company told us they need to coordinate with teams before any Compute Engine VM upgrades can happen. A team lead at a global fintech firm explained that “the latest generation resources don’t have capacity at scale” for their workload size, making the newest options unsuitable for their use case.
These teams have valid reasons for postponing upgrades. Often, the anticipated gains don’t justify the potential risks. This can be especially true for organizations with steady growth: the last thing they want to do is jeopardize their momentum with an operational change that isn’t urgent. However, we worry about what happens when the upgrade is pushed off for too long.
The opportunity costs of delaying modernization
Most teams recognize the trade-offs when delaying modernization; however, the scale of those consequences can be underestimated. Beyond the obvious risks of inopportune End-of-Life (EOL) announcements, there is a “hidden tax” to maintaining legacy infrastructure. Even when current VMs are still meeting workload requirements, the business risks of price-performance inefficiencies can silently compound over time as a direct result of:
- Performance drag: Older VMs can technically meet workload performance requirements but often mask a growing efficiency tax as usage scales.
- Wasted capacity: You may need to scale out an older fleet extensively to meet growing vCPU requirements, while your attached storage pool grows far beyond what’s necessary.
- Discount uncertainty: You can’t confidently cover usage with long-term resource-based discounts because you don’t want to commit to infrastructure that will likely change in the near term.
Several enterprise customers have expressed legitimate concerns about signing three-year commitments on current-generation VMs, worried they’ll be locked into soon-to-be-obsolete infrastructure while Google Cloud releases the next generation. One director of DevOps put it bluntly: they didn’t want to be “stuck on old instance types for three years” just to capture some savings.
All three of these risks share a root cause: the infrastructure you’re paying for is no longer delivering proportional value for the work it does. It’s not always about running on the fastest chips. It’s about getting the best price on the useful work your system does.
That’s where the new N4 VM family comes in. Google designed N4 specifically for better price-performance, using the latest 5th gen Intel Xeon chips and Google’s own Titanium infrastructure. The N4 series delivers up to 18% better price-performance than N2 and up to 70% better than N1.¹ N4 posts even stronger results for specific workloads like running MySQL or Java.
A Tom’s Hardware piece on AMD’s Athlon hitting 1 GHz is a good reminder of how this works. AMD didn’t win by hitting 1 GHz first. It won because the Athlon did more useful work per cycle than anything Intel had. N4 follows the same playbook: better instructions-per-clock on 5th gen Xeon, plus Google’s Titanium system taking storage and network overhead off the CPU entirely. The clock isn’t faster; you just get more out of every cycle you’re already paying for. And if you prefer a modern AMD option, the N4D uses 5th gen AMD EPYC chips with Titanium, delivering broadly the same benefits as N4 for applications running on AMD hardware.
Translating hourly spend into cloud unit economics
Understanding the true business risks of delaying upgrades starts by understanding the unit economics of your current operations. What your organization is spending per hour on any VM matters less than the actual ROI of your spend.
Start by defining your fully loaded cost per unit of work. Most teams make the mistake of only counting the VM in the numerator, but VM cost is just one input. Attached storage and network costs move with your compute choice too, and as you’ll see in a follow-on blog in this series, your VM shape directly determines your storage performance ceiling.
Cost per unit of work = (Hourly VM cost + Hourly storage cost + Hourly network cost) ÷ Throughput per hour
“Throughput per hour” can be any metric meaningful to your system, for example:
- Requests your API handles per hour
- Transactions processed per hour
- Batch jobs completed per hour
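The formula above can be sketched as a small helper. All the hourly rates and throughput figures below are hypothetical placeholders, not published Google Cloud pricing; the point is the shape of the calculation, not the numbers:

```python
# Illustrative only: rates and throughput are made-up placeholders,
# not real Google Cloud pricing or benchmark results.

def cost_per_unit_of_work(vm_hourly: float,
                          storage_hourly: float,
                          network_hourly: float,
                          throughput_per_hour: float) -> float:
    """Fully loaded cost per unit of work (e.g., per request or transaction)."""
    return (vm_hourly + storage_hourly + network_hourly) / throughput_per_hour

# Hypothetical comparison of an older fleet vs. a newer one. Even with a
# higher hourly VM rate, higher throughput can yield a lower cost per unit.
old_fleet = cost_per_unit_of_work(0.20, 0.05, 0.02, 90_000)
new_fleet = cost_per_unit_of_work(0.23, 0.05, 0.02, 130_000)

print(f"old fleet: ${old_fleet:.8f} per request")
print(f"new fleet: ${new_fleet:.8f} per request")
print(f"newer fleet is cheaper per unit: {new_fleet < old_fleet}")
```

Notice that the numerator moves only slightly while the denominator moves a lot; that asymmetry is exactly what an hourly-rate comparison hides.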
This is a resource efficiency unit metric. It’s the right starting point for engineering teams evaluating a migration. Once established, it maps naturally to a more business-oriented view: cost per customer transaction, cost per tenant, or cost per revenue dollar. That second layer is where FinOps and product teams converge, and it’s where modernization stops being a technical decision and starts being a business case.
For the purposes of benchmarking N4 against its predecessors, the resource efficiency metric is your baseline. Even if N4 carries a slightly higher hourly VM cost, the unit economics will reflect the true performance and cost efficiency gains of modernization.
Benchmarking the price performance delta
Grasping unit economics is just the beginning. Turning a business case into reality requires empirical evidence that these advantages apply to your particular application stack. Modernization goes beyond simple hardware upgrades; it is a calculated initiative to prove that advanced architectures, like the N4 series, can successfully reduce the cost of meaningful work while sustaining or enhancing performance.
In this article, we offer a detailed, hands-on guide for benchmarking your environment. Learn how to apply the “cost-per-unit-of-work” equation to calculate the actual performance gap between aging N1/N2 machine series and the latest-generation N4 family. This methodology enables organizations to measure the efficiency gains delivered by innovations such as 5th gen Intel Xeon processors and Google’s Titanium infrastructure. By defining these resource efficiency benchmarks, teams can confidently address technical debt and align engineering performance with overarching business objectives.
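The empirical half of the methodology is measuring throughput per hour on each machine series you’re comparing. Here is a minimal sketch of such a measurement harness; `do_one_unit_of_work` is a stand-in placeholder for your real workload (an API request, a database transaction, a batch job), not part of any Google tooling:

```python
import time

def do_one_unit_of_work() -> None:
    # Placeholder workload: substitute your real unit of work here.
    sum(i * i for i in range(1000))

def measure_throughput_per_hour(workload, duration_seconds: float = 5.0) -> float:
    """Run `workload` repeatedly for ~duration_seconds, extrapolate to units/hour."""
    completed = 0
    start = time.monotonic()
    while time.monotonic() - start < duration_seconds:
        workload()
        completed += 1
    elapsed = time.monotonic() - start
    return completed / elapsed * 3600

# Run this on each VM generation under test, then feed the result into the
# cost-per-unit-of-work denominator alongside that generation's hourly costs.
throughput = measure_throughput_per_hour(do_one_unit_of_work, duration_seconds=1.0)
print(f"measured throughput: {throughput:,.0f} units/hour")
```

A short timed loop like this is only a rough proxy; for a real business case you’d run your production workload at realistic concurrency and sustained duration on each candidate machine series.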
¹ https://cloud.google.com/blog/products/compute/a-closer-look-at-compute-engine-c4-and-n4-machine-series
Federico Iezzi is a Customer Engineer at Google Cloud, where he acts as a trusted advisor to digital-native companies running large-scale infrastructure. He has worked in cloud infrastructure for over 15 years, with a focus on performance benchmarking, high-performance networking, and AI inference. He publishes his work and results on Medium and has spoken at FOSDEM.