In my previous role, I was a platform engineer.
When people talk about platform engineering ROI, the conversation can get fuzzy fast. Everyone agrees the platform should create value, improve consistency, and help teams move faster. But once you ask how to measure that value in a way that actually changes decisions, the answers tend to evaporate.
That was never enough for me.
I didn’t want to talk about platform value in abstract terms. I wanted metrics that helped me evaluate whether we were making smarter engineering decisions, whether we were modernizing in the right direction, and whether the cloud spend tied to our platform was actually yielding a meaningful return for the organization.
I spent a lot of time in modernization work, including helping teams think through paths from legacy systems into cloud environments. In one case, that meant evaluating options for moving applications out of a mainframe-oriented world and into AWS. The question was never just, “Can we migrate this?” It was, “What’s the most efficient way to do it, and how do we prove it?”
Over time, I found myself coming back to three ROI metrics again and again. These weren’t the only numbers that mattered, but they were the three I lived by because they gave me a practical view into platform efficiency, business value, and cost discipline.
1. Cost Per Transaction
If I had to pick the most foundational platform ROI metric, it would be cost per transaction.
This mattered so much to me because it gave us a way to compare architecture decisions in real terms. During modernization efforts, we were often evaluating multiple paths for the same application. Should we preserve more of the legacy application logic and move it into a cloud environment where we had more control? Or should we refactor the application into something more cloud-native? Those are big design questions, but eventually they need to cash out into measurable outcomes.
That’s where cost per transaction helped.
We used a pretty sophisticated tagging strategy to get there. Some infrastructure was shared across all modernization paths, while other resources were specific to a given path or team. By tagging that environment carefully, we could allocate shared costs, isolate path-specific costs, and calculate the average cost per transaction across different approaches. That gave us a clearer picture of which option was actually the most efficient, not just the most technically elegant on paper.
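As a sketch of those allocation mechanics, the calculation looked roughly like this. The tags, paths, and numbers here are all hypothetical, not our actual billing data, and proportional allocation by transaction volume is just one reasonable choice for splitting shared costs:

```python
from collections import defaultdict

# Hypothetical billing line items, as if exported from a tagged cost report.
# Path tags isolate modernization-path-specific spend; "shared" rows are
# platform costs split across all paths. Numbers are invented.
line_items = [
    {"tag_path": "rehost",   "cost": 1200.0},
    {"tag_path": "refactor", "cost": 750.0},
    {"tag_path": "shared",   "cost": 600.0},
]

# Monthly transaction counts per path, from application metrics.
transactions = {"rehost": 2_000_000, "refactor": 1_500_000}

def cost_per_transaction(line_items, transactions):
    direct = defaultdict(float)
    shared = 0.0
    for item in line_items:
        if item["tag_path"] == "shared":
            shared += item["cost"]
        else:
            direct[item["tag_path"]] += item["cost"]

    total_txns = sum(transactions.values())
    result = {}
    for path, txns in transactions.items():
        # Allocate shared cost in proportion to transaction volume,
        # then add the path's directly tagged spend.
        allocated = direct[path] + shared * (txns / total_txns)
        result[path] = allocated / txns
    return result

print(cost_per_transaction(line_items, transactions))
```

Other allocation schemes (even splits, usage-weighted splits) work too; what matters is that the scheme is consistent so the per-path numbers stay comparable month to month.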
What I liked about this KPI is that it cut through a lot of noise. Instead of debating architecture in the abstract, we could ask a sharper question: for the work this application actually does, what does each transaction cost us?
That created a much more honest way to compare approaches. It also gave us a durable metric we could use across teams, not just in one-off modernization projects. If cost per transaction is improving, there’s a good chance your platform is helping teams operate more efficiently. If it’s getting worse, that’s a signal worth digging into.
To me, cost per transaction was never just a finance metric. It was an engineering efficiency metric. It told us whether the systems we were building were becoming more effective at delivering business outcomes per unit of spend.
2. Total Incremental Value of the IDP
The second metric I cared about was what I’d call the total incremental value of the IDP.
For me, this was always about resisting a bad instinct that shows up in a lot of cloud conversations: the idea that success means having the lowest possible cloud bill. I never saw that as the goal. Some things are expensive to build in the cloud because they create real value. The point isn’t to spend as little as possible. The point is to maximize the delta between what the platform costs and the value it creates.
I’d much rather spend $30 an hour and generate $100 an hour in value than spend $5 an hour and only generate $7 an hour. Those are intentionally simple numbers, but they capture the mindset. So the question goes from, “How low can we go on our spend?” to “Are we getting the highest returns from our cloud spend?”
To measure that, I would look at the total cost of the platform and divide that cost across teams based on usage. From there, we could connect platform consumption to transaction patterns, modernization paths, and downstream outcomes such as revenue tied to a particular product. It wasn’t a perfect science, but it was enough to move the conversation from “platform as overhead” to “platform as a product” that drives measurable value.
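A toy version of that allocation might look like the following. The team names, usage shares, and value figures are invented for illustration, and real value attribution was far messier than a single number per team:

```python
# Hypothetical monthly figures. Platform cost is allocated to teams by
# their share of platform usage, then compared against the value
# attributed to their products downstream. All names and numbers invented.
platform_cost = 50_000.0  # total monthly platform cost

usage_share = {"payments": 0.5, "search": 0.3, "catalog": 0.2}
attributed_value = {"payments": 180_000.0, "search": 40_000.0, "catalog": 30_000.0}

def incremental_value(platform_cost, usage_share, attributed_value):
    report = {}
    for team, share in usage_share.items():
        cost = platform_cost * share
        value = attributed_value[team]
        report[team] = {
            "cost": cost,
            "value": value,
            "delta": value - cost,  # the number worth maximizing
            "roi": value / cost,
        }
    return report

print(incremental_value(platform_cost, usage_share, attributed_value))
```

Note that the report tracks the delta, not just the cost: a team with high allocated cost and a much higher delta is a platform success story, not a problem.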
There was also another layer to this that was harder to quantify but impossible to ignore: the opportunity cost the platform was removing for the application developers.
If a centralized platform team can provide reusable modules, proven infrastructure patterns, and verified pathways to production, then application teams don’t have to spend that same time reinventing those things themselves. They don’t have to answer as many questions in security reviews. They don’t have to navigate every infrastructure decision from scratch. They can build along a paved road, spending little to no mindshare on infrastructure.
I saw this firsthand. Teams using verified golden modules and pre-approved patterns generally had a much shorter path to production. I didn’t always have access to a clean top-down productivity metric that captured that effect, but the impact was obvious in practice. Fewer bottlenecks. Less repeated work. Less friction with compliance and security. More time spent on product and customer value.
That, to me, is part of the incremental value of an IDP. Not just the infrastructure it provisions, but the engineering time it gives back.
3. MoM Waste Reduction
The third metric, and in some ways the one I tracked most closely, was month-over-month waste reduction.
This was the clearest lever for action because it translated directly into behavior. Whenever we ran optimization efforts, I wanted to see what happened to spend over the following months. Did we actually remove waste? Did those changes stick? Were teams getting better at making cost-aware decisions?
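The metric itself is simple arithmetic over a spend series. A minimal sketch, using made-up monthly totals for one team after an optimization push:

```python
# Made-up monthly spend totals for one team after an optimization push.
monthly_spend = {"2024-01": 120_000, "2024-02": 98_000, "2024-03": 91_000}

def mom_reduction(spend):
    """Month-over-month drop in spend; positive means waste removed."""
    months = sorted(spend)
    return {m2: spend[m1] - spend[m2] for m1, m2 in zip(months, months[1:])}

print(mom_reduction(monthly_spend))  # {'2024-02': 22000, '2024-03': 7000}
```

The interesting signal isn’t any single month; it’s whether the reductions persist instead of creeping back once attention moves elsewhere.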
Waste reduction covered a wide range of fixes. Sometimes it meant identifying truly orphaned resources, like unattached EBS volumes that no longer served any purpose. Sometimes it meant removing over-provisioned infrastructure, like a Redis cluster that had been oversized from the start and later gone dormant. Other times it meant tuning EFS lifecycle policies to shorten how long data sat in the fast-access tier, or resizing EBS volumes in non-production environments so they better matched actual testing needs.
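For the orphaned-volume case specifically, detection can start from a periodic inventory sweep. This sketch works over hypothetical inventory records shaped like EC2 volume descriptions; the volume IDs, sizes, and the per-GiB rate are all made up for illustration:

```python
# Hypothetical inventory records, shaped like the output of a nightly
# volume-listing sweep dumped to JSON. IDs, sizes, and the per-GiB
# monthly rate are illustrative, not real pricing.
volumes = [
    {"VolumeId": "vol-aaa", "State": "available", "SizeGiB": 500},
    {"VolumeId": "vol-bbb", "State": "in-use",    "SizeGiB": 100},
    {"VolumeId": "vol-ccc", "State": "available", "SizeGiB": 2000},
]

GB_MONTH_PRICE = 0.08  # assumed flat rate per GiB-month

def orphaned_volume_waste(volumes):
    """Unattached ('available') volumes accrue cost with no workload."""
    orphans = [v for v in volumes if v["State"] == "available"]
    monthly_waste = sum(v["SizeGiB"] for v in orphans) * GB_MONTH_PRICE
    return orphans, monthly_waste

orphans, waste = orphaned_volume_waste(volumes)
print(len(orphans), waste)  # 2 200.0
```

The same scan-filter-price pattern generalizes to other orphan classes: idle load balancers, stale snapshots, dormant clusters.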
These weren’t theoretical optimizations. In some cases, we saw team spend drop by hundreds of thousands of dollars over a three-month window. Sometimes more, sometimes less. But the pattern was real: once you start systematically attacking waste, meaningful savings show up.
This metric mattered to me because it sharpened the other two. Cost per transaction gets more truthful when waste is low. The total incremental value of the IDP gets stronger when every dollar you spend is doing useful work.
I also liked MoM waste reduction because it reinforced a mindset I tried hard to spread across teams: every decision in the cloud is a buying decision.
That was a powerful shift. A bad cloud decision isn’t just technical debt you clean up later. It’s also margin erosion. It’s money lost to inattention. It’s waste that chips away at the value your platform is supposed to create.
One of my most memorable discoveries of waste started with a security review, not FinOps reporting. During a review, an excessive number of unattached EBS volumes got flagged. That triggered a deeper dive, and the deeper dive exposed a much larger pocket of waste in the environment. After that, we put stronger guardrails in place, including budget alerts and tighter review patterns. But the broader lesson stuck with me: if you build the right operational rigor, waste has a way of revealing itself.
That’s why I always treated waste reduction not as “maintenance” but as a platform capability.
At the end of the day, these three performance indicators worked together for me. Cost per transaction told me whether our engineering choices were efficient. Total incremental value of the IDP told me whether the platform was worth what we were investing in it. MoM waste reduction told me whether we were disciplined enough to protect that value over time.
That combination gave me a practical way to quantify platform ROI beyond just anecdotes. And as a Platform Engineer, that’s what I wanted most: metrics that didn’t just describe the platform, but helped us build a better one.