This is the final article on Eugene, the Enterprise Architecture platform we built at Octo Telematics.
Cloud Governance isn’t just about monitoring costs — it’s about understanding them, optimizing them, and making informed decisions. In our latest work on Eugene, we’ve taken Cloud Governance at Octo Telematics to the next level, integrating multi-cloud cost tracking, service attribution, and real-time insights into a unified platform.
- How can forecasting help us predict cloud expenses before they happen?
- How do we map cloud costs back to software and business value?
- How do we handle cost attribution for shared environments like Kubernetes clusters?
These are some of the challenges we addressed, and we’re now working on predictive cost modeling – so that cloud expenses don’t just get tracked, but anticipated before deployment.
Purpose of Cloud Governance
Cloud Governance is a broad topic that spans across multiple domains – from technical aspects like Performance Monitoring to formal concerns such as Risk Management, from design principles like Conventions and Standardization to security areas like IAM and Compliance, and from Budgeting Cloud Expenses to Cost Analysis for reviewing past spending.
In the previous articles, we explored how Eugene has helped us organize knowledge and track the correct SOA for our platforms and Cloud Environment. However, one critical aspect we haven’t covered yet is cost management. In this area, we needed to focus on three major aspects:
Budget Compliance in a Dynamic Cloud
Octo Telematics isn’t a startup; it’s a large company that requires thoughtful budgeting at the beginning of each year. However, the cloud paradigm sometimes clashes with traditional budgeting, since cloud costs can fluctuate significantly as projects grow in complexity or scale. While this variability isn’t necessarily a problem, it can become one without proper governance to track ongoing costs, provide visibility, and enable prompt adjustments.
Through continuous monitoring, Cloud Governance ensures that spending stays aligned with financial expectations – whether by identifying cost inefficiencies, adjusting usage, or providing early warnings when budget revisions are needed. Instead of reacting to cost overruns too late, teams can proactively optimize spending while keeping financial plans on track.
Maximizing Value for Cloud Spend
The second key aspect of Cloud Governance is ensuring that cloud services provide the right value for their cost. Different cloud providers offer similar services, and even within the same provider, there are often multiple tiers or variations of the same service. Without regular evaluation, it’s easy to overpay for features or capacity that aren’t fully utilized.
Proper Cloud Governance periodically reviews the services in use to ensure that the selected tier aligns with actual business and technical requirements. This ongoing assessment helps organizations avoid unnecessary costs while maintaining the right balance of performance, scalability, and efficiency.
Bridging Cost Analysis and Architecture Decisions
Cloud Governance can also highlight architectural inefficiencies in platform design. Even if cloud costs are correctly allocated and proportional to the services being used, there may be more efficient or simpler ways to achieve the same goals – often at a lower cost.
This is why Cloud Governance is a key responsibility of Enterprise Architecture teams: insights from cost analysis should serve as a feedback loop to improve early design decisions, ensuring that platforms are both technically well designed and financially optimized.
Integrating with Cloud Providers
Our journey to implementing Cloud Governance in Eugene started by integrating with the cloud providers that host Octo Telematics’ telematics platform.
Since our platform is designed to be cloud-agnostic, it integrates services from multiple providers. The core infrastructure is currently hosted on IBM Cloud, while additional services run on AWS and Azure. Managing costs and services across different providers presented a challenge, as each platform offers different pricing models and service structures.
Our first step was to integrate with Cloud Provider APIs to continuously track which services we were using, how much they cost, and how they compared across providers. The tricky part was building a correlation table – a structured way to normalize and compare services and pricing across different cloud vendors. This gave us a unified view of our cloud investments, ensuring greater transparency and control over spending.
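As a sketch, such a correlation table can be as simple as a lookup from provider-specific service names to an internal canonical category. All the entries and category names below are illustrative, not Eugene’s actual mapping:

```python
# Illustrative correlation table: maps provider-specific service names to a
# normalized internal category so costs can be compared across vendors.
NORMALIZED_SERVICES = {
    ("ibm", "virtual-server"): "compute",
    ("ibm", "cloud-object-storage"): "object-storage",
    ("aws", "AmazonEC2"): "compute",
    ("aws", "AmazonS3"): "object-storage",
    ("azure", "Virtual Machines"): "compute",
    ("azure", "Storage Accounts"): "object-storage",
}

def normalize(provider, service):
    """Return the canonical category for a provider-specific service name."""
    return NORMALIZED_SERVICES.get((provider, service), "uncategorized")

def unified_view(line_items):
    """Aggregate (provider, service, cost) line items by normalized category."""
    totals = {}
    for provider, service, cost in line_items:
        key = normalize(provider, service)
        totals[key] = totals.get(key, 0.0) + cost
    return totals
```

With a table like this in place, a single report can sum “compute” spend across all three providers regardless of what each vendor calls the service.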
Integration with IBM Cloud
IBM Cloud was the first cloud provider we integrated with. Its offering is essentially divided into two distinct parts, each requiring a different API model.
The first, referred to in IBM terms as “Classic Infrastructure”, is the older offering and a direct evolution of the dedicated hosting and bare-metal services IBM acquired from SoftLayer in 2013. The second is IBM’s newer Virtual Private Cloud (VPC) infrastructure, which also implements modern IaaS and PaaS offerings. These two environments operate independently, requiring separate integration approaches to manage workloads and resources efficiently.
SoftLayer Model
The integration with SoftLayer was the first one we implemented in Eugene. Although SoftLayer provides a native SDK, we chose to interact directly with its REST API for several internal reasons – I’d probably choose differently today.
There’s not much to say about these APIs – except that they are simple, clear, and incredibly effective. In fact, implementing all the integrations we needed took just a few days. Of course, there are quirks here and there, and some parts of the API clearly reflect its age. But how refreshing is it to use an API where a server is simply called “server,” storage is “storage,” and the network stack is based on VLANs, subnets, and IPs? No unnecessary complexity, just straightforward functionality.
SoftLayer’s API structure divides resources into Items (representing physical servers, virtual servers, storage, networks, etc.) and Billing Items (which directly map to line items on your invoice). Yes, you read that right – this API is so well-designed that billing and infrastructure are seamlessly integrated into the same system.
Pure gold.
Last but not least, the same endpoint doesn’t just provide information – it also allows direct interaction with the infrastructure, enabling users to create new items, turn machines on and off, and manage resources dynamically.
A special mention goes to the fact that SoftLayer’s APIs provide a complete history of all purchased resources over time. This allowed us to easily import and analyze cloud costs dating back to 2014!
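To give a flavor of that interaction style, here is a minimal sketch of calling the SoftLayer REST API with HTTP basic auth (username plus API key). The `SoftLayer_Account` accessors named in the comments are real API methods, but the helper itself is illustrative, not Eugene’s actual client:

```python
import base64
import json
import urllib.request

SL_BASE = "https://api.softlayer.com/rest/v3"

def sl_url(service, method):
    """Build a SoftLayer REST URL for a service/method pair."""
    return f"{SL_BASE}/{service}/{method}.json"

def fetch(username, api_key, service, method):
    """Call the SoftLayer REST API using HTTP basic auth (username + API key)."""
    req = urllib.request.Request(sl_url(service, method))
    token = base64.b64encode(f"{username}:{api_key}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)

# Items: the virtual servers on the account.
#   fetch(user, key, "SoftLayer_Account", "getVirtualGuests")
# Billing Items: line items that map directly to the invoice.
#   fetch(user, key, "SoftLayer_Account", "getAllBillingItems")
```

One endpoint, one auth scheme, and the same account object exposes both the infrastructure and its billing – which is exactly why the integration took only a few days.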
IBM Model
Integration with IBM’s newer cloud solutions came a bit later, as we began incorporating more modern cloud services into our Telematics Platform. Our usage evolved gradually – our first integration focused on Storage-as-a-Service and Kubernetes solutions. Later, we expanded to include compute resource management and, finally, the Usage APIs, which allow us to track both costs and resource consumption.
IBM provides SDKs for interacting with its APIs, but as with our SoftLayer integration, we opted for a pure REST approach since it gave us greater flexibility in extracting the data we needed. Unlike SoftLayer, IBM Cloud’s API model is service-specific, meaning each cloud service (compute, storage, networking, billing) has its own dedicated endpoint, all tied together by a two-step token authentication process via IBM Cloud IAM.
Behind the scenes, IBM Cloud services follow a consistent internal naming convention with namespaces, allowing seamless cross-referencing between APIs. This standardization makes it easier to navigate dependencies between resources while maintaining a unified integration approach.
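A sketch of the two-step flow described above: first exchange an API key for a short-lived IAM bearer token, then call a service-specific endpoint such as the Usage Reports API. The account ID and billing month are placeholders, and the exact response shapes should be checked against IBM’s API reference:

```python
import json
import urllib.parse
import urllib.request

IAM_URL = "https://iam.cloud.ibm.com/identity/token"

def iam_token(api_key):
    """Step 1: exchange an IBM Cloud API key for a short-lived bearer token."""
    body = urllib.parse.urlencode({
        "grant_type": "urn:ibm:params:oauth:grant-type:apikey",
        "apikey": api_key,
    }).encode()
    req = urllib.request.Request(IAM_URL, data=body, method="POST")
    req.add_header("Content-Type", "application/x-www-form-urlencoded")
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["access_token"]

def usage_url(account_id, month):
    """Step 2: build the Usage Reports URL for a billing month ('YYYY-MM')."""
    return f"https://billing.cloud.ibm.com/v4/accounts/{account_id}/usage/{month}"

def account_usage(api_key, account_id, month):
    """Fetch per-service usage and cost for one billing month."""
    req = urllib.request.Request(usage_url(account_id, month))
    req.add_header("Authorization", f"Bearer {iam_token(api_key)}")
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```

The same bearer token works across the compute, storage, networking, and billing endpoints, which is what makes the service-specific model manageable in practice.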
Integration with Amazon AWS
AWS provides an SDK to interact with its cloud services, and we decided to use it because trying to work directly with their fragmented native APIs would have been impractical. The complexity of AWS services makes it extremely challenging to build a tool that scrapes resources and tracks costs in real-time.
The first challenge we faced was that every individual service (EC2, EKS, S3, RDS, Lambda, Glue, etc.) implements similar but slightly different interfaces. While they mostly share common interaction patterns, each has its own quirks, making even a simple task – like identifying purchased resources – a hurdle.
When we finally tackled the infamous “Cost Explorer” API, which is supposed to provide spending insights, we realized it was nearly impossible to retrieve cost details for individual items. To solve this, we implemented our own tagging system: Eugene applies an internal tag to every AWS resource, allowing us to track costs by tag – and thus, by item.
However, AWS imposes several limitations on historical data. Cost Explorer only provides up to one year of billing data, and if a resource is deleted, its details disappear completely – the service-specific API will simply return a “resource not found” error. To compensate for this, Eugene scans AWS resources hourly, saving all available metadata in real time, ensuring we have a historical record beyond AWS’s retention limits.
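The tag-based cost query can be sketched with Cost Explorer’s `get_cost_and_usage` call, grouping results by a cost-allocation tag. The `ce` argument is a boto3 Cost Explorer client (`boto3.client("ce")`), and the tag key `eugene-id` used in the example is illustrative, not Eugene’s actual tag name:

```python
def costs_by_tag(ce, tag_key, start, end):
    """Return {tag_value: unblended cost} from Cost Explorer, grouped by a
    cost-allocation tag. `ce` is a boto3 Cost Explorer client."""
    result = ce.get_cost_and_usage(
        TimePeriod={"Start": start, "End": end},
        Granularity="MONTHLY",
        Metrics=["UnblendedCost"],
        GroupBy=[{"Type": "TAG", "Key": tag_key}],
    )
    totals = {}
    for period in result["ResultsByTime"]:
        for group in period["Groups"]:
            tag_value = group["Keys"][0]  # e.g. "eugene-id$web-frontend"
            amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
            totals[tag_value] = totals.get(tag_value, 0.0) + amount
    return totals

# Usage (requires AWS credentials and boto3):
#   import boto3
#   print(costs_by_tag(boto3.client("ce"), "eugene-id", "2024-01-01", "2024-02-01"))
```

Because the tag is applied by Eugene itself at provisioning time, every grouped line maps straight back to a CMDB item.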
Two unexpected quirks we encountered:
- Past-month costs sometimes get recalculated. Occasionally, AWS adjusts previous billing data, and we find that the initially imported costs are a few dollars off.
- Tagging for cost tracking isn’t retroactive. If you add a tag today, it only applies to future data in Cost Explorer – there’s no way to track past spending on untagged resources.
These limitations required us to build several workarounds in Eugene, ensuring better tracking, visibility, and historical cost analysis within AWS.
Integration with Microsoft Azure
Azure provides an extensive SDK, similar to AWS in terms of complexity. However, since we found that nearly all the data we needed could be retrieved from a single API – the Azure Resource Manager (ARM) API – we opted to use the native REST API for simplicity when integrating Azure reports into Eugene.
In our experience, Azure’s API is much more consistent and easier to work with than AWS’s. The biggest challenge was understanding the deeply nested structure of commercial offerings – from Accounts to Subscriptions to Resource Groups – which results in very long, REST-style unique references. At first, these paths felt tricky, but once we understood their construction, everything worked smoothly.
One major advantage of Azure Cost Management APIs is that they have no limitation on historical data retrieval. We were able to fetch usage data going back as far as 2018, though we chose not to retrieve earlier records.
A final interesting item is Azure’s unique filtering syntax, which uses OData-like expressions with spaces in logical conditions (e.g., “location eq ‘westus’ and usageType eq ‘ComputeHours’”). It looked odd at first, but in practice, it worked well for querying cost and usage data efficiently.
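A sketch of how such a filtered request against the ARM Consumption endpoint can be built – the subscription ID and filter values are illustrative, and the spaces inside the filter expression are part of the OData syntax, so they must be URL-encoded:

```python
import urllib.parse

ARM = "https://management.azure.com"

def usage_details_url(subscription_id, odata_filter, api_version="2019-10-01"):
    """Build an ARM usage-details request with an OData-style $filter.
    The spaces in the filter are significant and get URL-encoded here."""
    query = urllib.parse.urlencode({
        "api-version": api_version,
        "$filter": odata_filter,
    })
    return (f"{ARM}/subscriptions/{subscription_id}"
            f"/providers/Microsoft.Consumption/usageDetails?{query}")

# Illustrative filter, mirroring the OData-like syntax described above:
url = usage_details_url("0000-sub-id", "properties/resourceLocation eq 'westus'")
```

The request itself is then a plain GET with a bearer token from Azure AD, which is why a single ARM integration covered nearly everything we needed.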
Tracking and Forecasting Cloud Costs
Now that we’ve covered how Eugene interacts with cloud providers, let’s dive into how it monitors and manages cloud costs. The process is divided into two main steps: first, importing usage and billing data, and then reconciling these costs with platform components to provide an accurate financial view.
Let’s take a closer look at how this works.
Fetching and Consolidating Cloud Costs
Every day, Eugene polls cloud provider APIs to fetch updated data about cloud usage and billing.
Real-Time vs. Usage-Based Cost Tracking
When cloud resources are purchased under a reservation model, some providers – such as SoftLayer (IBM Classic) – generate real-time invoices, projecting costs through the end of the billing cycle. In such cases, Eugene imports these invoices as soon as they are available, providing an immediate view of ongoing costs.
Other providers – AWS, Azure, and IBM VPC – follow a usage-based billing model, reporting ongoing summaries rather than immediate invoices. For these providers, Eugene only imports definitions of newly created resources until costs are finalized. When a new month begins, the final cost usage data for the previous month is imported into Eugene’s internal database. This triggers an automated email report for management, summarizing costs across all providers in a clear, unified report.
Visualizing Cost Trends and Invoice Comparisons
Through Eugene’s web interface, users can access expense trend graphs for individual accounts (we manage multiple accounts per provider) or view aggregated spending grouped by cloud provider. The dashboard also includes a three-month average trend, which helps filter out temporary cost spikes and provides a more accurate view of real spending patterns.
A particularly powerful feature is Invoice Comparison, which – again in a unified, cross-provider approach – allows us to compare monthly invoices from the same provider. This helps detect changes in pricing for the same services, new purchases, or unexpected billing fluctuations.
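In spirit, the comparison reduces to a diff between two monthly line-item maps. A minimal sketch, with illustrative data shapes rather than Eugene’s actual invoice model:

```python
def compare_invoices(prev, curr, tolerance=0.01):
    """Compare two monthly invoices ({line_item: cost}) from the same provider.
    Returns new items, removed items, and items whose price changed beyond
    a small tolerance (to ignore rounding noise)."""
    new = {k: v for k, v in curr.items() if k not in prev}
    removed = {k: v for k, v in prev.items() if k not in curr}
    changed = {k: (prev[k], curr[k]) for k in prev.keys() & curr.keys()
               if abs(curr[k] - prev[k]) > tolerance}
    return new, removed, changed
```

Running this month over month surfaces exactly the three events we care about: new purchases, decommissioned resources, and price changes for the same service.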
CMDB Integration & Cost Correlation
All cost data is also integrated into Eugene’s CMDB, allowing direct resource-to-cost correlation. For example, a user can search for a specific server in the CMDB and access a dedicated cost tab, summarizing two years of historical expenses, complete with graphical trends and detailed breakdowns.
This is where Eugene’s smart cost association comes into play. In cloud environments, a virtual machine’s cost isn’t just the instance price – it often includes attached storage, extra IPs, or additional features. Eugene scrapes cloud provider APIs to identify relationships between resources, ensuring that total server costs include not just the base instance price but all associated expenses as well.
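Conceptually, the association step reduces to summing a server’s own cost with the costs of everything linked to it. A simplified sketch, with hypothetical resource IDs:

```python
def server_total_cost(server_id, costs, attachments):
    """Total cost of a server: its own cost plus every associated resource
    (attached volumes, extra IPs, add-on features).
    `costs` maps resource ID -> monthly cost; `attachments` maps a server ID
    to the IDs of the resources linked to it."""
    total = costs.get(server_id, 0.0)
    for resource_id in attachments.get(server_id, []):
        total += costs.get(resource_id, 0.0)
    return total
```

The hard part in practice is building the `attachments` map, which is what Eugene’s scraping of provider APIs produces.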
Mapping Cloud Costs to Software Components
The second phase involves correlating the cloud costs stored in the CMDB with the software deployments tracked by Eugene as part of the Software Lifecycle Management (SLM) process. This step is crucial because it allows us to directly link infrastructure expenses to the software components consuming those resources.
By establishing this correlation, we can identify which software applications, business requirements, and even customers are driving the highest OPEX costs within our infrastructure.
With this analysis in hand, we can make informed decisions about whether a part of the platform needs optimization – whether in terms of infrastructure, software design, or architectural changes – because its cost is too high compared to its business value. Additionally, we can provide feedback to the Product Business Unit when a service is costing significantly more than expected, ensuring better cost awareness and planning across the organization.
Consolidate or distribute?
One of the biggest challenges in cloud cost allocation comes from the long-standing IT trend of consolidating multiple applications onto shared infrastructure – which makes it difficult to associate specific costs with individual software components.
A clear example of this is Virtualization Clusters, which are often billed per cluster rather than per Virtual Machine (VM), making individual VM costs invisible. Similarly, Kubernetes clusters are billed per node, with no native breakdown of how much each Pod contributes to the total cost.
To solve this, Eugene implements a cost distribution model that aggregates all cluster-related costs and proportionally distributes them across individual workloads. The system analyzes the total cluster cost (e.g., €10K per month) and breaks it down into CPU, memory, and storage components. Each VM or Pod is then assigned a proportional share of the total cost based on its actual consumption of these resources.
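A simplified sketch of that distribution step. The 50/30/20 split of cluster cost across CPU, memory, and storage is illustrative, not Eugene’s actual ratio:

```python
def distribute_cluster_cost(cluster_cost, workloads, weights=(0.5, 0.3, 0.2)):
    """Proportionally distribute a cluster's monthly cost across workloads.
    `workloads` maps name -> (cpu, memory_gb, storage_gb) actually consumed;
    `weights` splits the cluster cost into CPU/memory/storage components."""
    totals = [sum(w[i] for w in workloads.values()) for i in range(3)]
    shares = {}
    for name, usage in workloads.items():
        share = 0.0
        for i, weight in enumerate(weights):
            if totals[i] > 0:
                share += cluster_cost * weight * usage[i] / totals[i]
        shares[name] = round(share, 2)
    return shares

# Two identical Pods on a €10K/month cluster each get €5K attributed.
example = distribute_cluster_cost(10000.0, {"pod-a": (4, 16, 100),
                                            "pod-b": (4, 16, 100)})
```

A workload consuming a disproportionate share of one dimension (say, memory) immediately stands out in the resulting breakdown.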
This is an empirical approach that we continue to refine, but it has already provided valuable insights. It helps us identify inefficient workloads, such as “resource hog” VMs or Pods consuming disproportionate amounts of CPU or memory – sometimes without delivering meaningful business value.
Building Smarter Cost Predictions
Cost forecasting is the next step in Eugene’s Cloud Governance roadmap. We are working on two key functionalities that will enable Octo Telematics to accurately predict upcoming cloud costs:
- Predicting costs at provisioning and deployment time – Estimating expected cloud expenses before a service is launched, helping teams make informed decisions during planning.
- Forecasting costs based on current usage patterns – Analyzing real-time consumption trends to anticipate future spending and optimize resource allocation.
Pre-Deployment Cost Forecasting
We plan to add cost prediction at provisioning time to Eugene’s internal “Deployment Requests” process – used whenever a user requests a deployment in a cloud environment.
Our goal is to provide users with a rough estimate of the expected cloud costs before they proceed with their deployment. This estimate will be:
- Based on the cost of newly provisioned resources when deploying a component for the first time.
- Calculated as a delta in resource usage (CPU, RAM, storage) when the new deployment modifies an existing setup by increasing resource requirements.
By implementing pre-deployment cost estimation, we aim to increase cost transparency and allow users to make more informed decisions before consuming cloud resources.
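A minimal sketch of the estimation logic behind both cases, with hypothetical unit prices standing in for the provider price catalogs Eugene already imports:

```python
# Hypothetical monthly unit prices; real values would come from the
# provider price catalogs, not hard-coded constants.
UNIT_PRICE = {"cpu": 20.0, "ram_gb": 4.0, "storage_gb": 0.10}

def deployment_estimate(requested, existing=None):
    """Rough monthly cost estimate for a deployment request.
    First deployment: price the full resource set (existing is None).
    Update: price only the positive delta versus the existing setup."""
    baseline = existing or {}
    estimate = 0.0
    for resource, amount in requested.items():
        delta = amount - baseline.get(resource, 0)
        if delta > 0:
            estimate += delta * UNIT_PRICE.get(resource, 0.0)
    return round(estimate, 2)
```

Shown at deployment-request time, even a rough figure like this changes the conversation from “deploy it and see” to an informed trade-off.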
Cost Forecasting Based on Usage Trends
Another forecasting approach we plan to implement is a static model to project current cost trends into the future, assuming no changes to the deployed platform.
This type of analysis is particularly useful for usage-based resources (such as storage) where consumption follows a relatively stable growth pattern over time. By simulating future costs under current conditions, we can identify long-term cost trends and anticipate budget needs in advance.
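A minimal sketch of such a static projection: an ordinary least-squares line fitted to the monthly cost history and extended forward, under the stated assumption that nothing about the platform changes:

```python
def project_costs(history, months_ahead):
    """Fit a straight line to monthly costs (least squares) and project it
    forward, assuming no changes to the deployed platform."""
    n = len(history)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(history) / n
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, history)) / var
    intercept = mean_y - slope * mean_x
    return [round(intercept + slope * (n + i), 2) for i in range(months_ahead)]
```

For steadily growing resources like storage, even this naive linear model gives a usable budget signal months in advance.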
This analysis complements the pre-deployment cost estimation model, so we need to determine how to merge both predictions into a clear and actionable metric for management.
Final words
Cloud Governance is not just about tracking costs – it’s about understanding them, optimizing them, and making informed decisions that align with business objectives. Throughout this journey, we’ve seen how Eugene has evolved to bring transparency, accountability, and actionable insights into Octo Telematics’ cloud spending.
From integrating with cloud provider APIs to mapping costs back to software components, Eugene has given us a clearer picture of how cloud resources are consumed and who is responsible for what. By addressing challenges such as cost attribution in shared environments and multi-cloud cost reconciliation, we’ve built a system that empowers us to identify inefficiencies, prevent unnecessary expenses, and optimize the platform.
Looking ahead, our next challenge is forecasting – predicting how much a new deployment will cost before it even goes live and anticipating future costs based on current usage trends. These capabilities will further enhance Eugene’s role in Cloud Governance, giving us the tools to proactively manage budgets, prevent overspending, and ensure cloud investments are aligned with business needs.
As we saw with SLM, Cloud Governance isn’t just a process – it’s a mindset shift. Just like SLM transformed the way we manage software lifecycles, Cloud Governance is reshaping how we handle cloud costs – moving from reactive tracking to proactive optimization. With Eugene, we’ve taken major steps toward transforming how we manage cloud investments – and we’re just getting started.
