Cloud Cost Optimization – Best Practices to Reduce Cloud Costs

Tony

SURPRISE! At least, that is the reaction of many people who recently moved their workloads to the Cloud and then opened their monthly Cloud bill. I have referenced a 2017 Business Insider article showing typical Cloud spend, and our practical experience shows similar results. Many organizations assume costs will drop once they migrate, but in reality the Cloud gives a sense of liberation: people feel free to create resources as they see fit, but never clean up! Surprised? Keep on reading!

Cloud cost optimization is a hot topic as enterprises gain more in-depth experience and skills and face mandates to reduce infrastructure and application spend, especially with revenues decreasing because of COVID-19.

There are several avenues to maintain or reduce costs on any of the three major public Clouds (AWS, Azure, and GCP), which all compete in the same realm. In addition to these common-sense factors, there are tools available to help with cost reduction, but of course they take a percentage of your savings. Gartner has a comparison tool for such vendors, but this article is free, apart from the time you invest in reading it.

In our experience, roughly 70% of Cloud spend goes to Compute and Storage resources. When we engage a client for cost optimization, the areas that deliver the best immediate ROI are:

Compute
Storage
Contract Commitment
Utilization
Bandwidth

There are others, such as utilizing PaaS and SaaS strategies, but we won’t cover them here. In the next segments, I will briefly explain some strategies for addressing each area.

1. Appropriate compute family resources – Clients often have servers that cost twice as much as they should, or more. Let’s use some Azure instances as an example:

a. D2 v3 instance with Windows license

Costs about $0.21/hour or $153/month
2 vCPU
8 GB RAM

b. B2MS instance with Windows license

Costs about $0.11/hour or $80/month
2 vCPU
8 GB RAM

As you can see, horsepower is similar, yet B2MS yields savings of approximately 48%. Some features differ, however, such as the extra-large temp drives that increase costs, so due diligence is required to choose the right instance family for the compute requirements.
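The arithmetic behind this comparison can be sketched in a few lines of Python. This is only an illustration: the hourly prices are the ones quoted above, while the 730-hour billing month and the function names are assumptions made for the example.

```python
# Assumption: a month is billed as 730 hours (8,760 hours per year / 12).
HOURS_PER_MONTH = 730

# Hourly prices quoted in the article (Windows license included).
D2_V3_HOURLY = 0.21
B2MS_HOURLY = 0.11


def monthly_cost(hourly_rate: float) -> float:
    """Convert an hourly price into an approximate monthly price."""
    return hourly_rate * HOURS_PER_MONTH


def savings_percent(current_hourly: float, candidate_hourly: float) -> float:
    """Percentage saved by moving from the current rate to the candidate rate."""
    return (current_hourly - candidate_hourly) / current_hourly * 100


if __name__ == "__main__":
    print(f"D2 v3: ${monthly_cost(D2_V3_HOURLY):.0f}/month")
    print(f"B2MS:  ${monthly_cost(B2MS_HOURLY):.0f}/month")
    print(f"Savings: {savings_percent(D2_V3_HOURLY, B2MS_HOURLY):.0f}%")
```

The same two functions can be pointed at any pair of instance prices from a provider's price list before committing to a family.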

2. Downsizing under-utilized instances – At every engagement we have seen over-provisioned servers, in CPU, RAM, or both. When resources are on-premises, utilization metrics are typically not measured. In the Cloud, under-utilization costs you actual cash!
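One way to spot downsizing candidates is to look at a high percentile of CPU and memory utilization rather than the average, so that occasional spikes are not ignored. The sketch below is a minimal illustration under stated assumptions: the `UtilizationSamples` type, the 95th percentile, and the 40% threshold are all hypothetical choices, and the samples would come from whatever monitoring tool is already in place.

```python
import math
from dataclasses import dataclass


@dataclass
class UtilizationSamples:
    """Hypothetical container for utilization readings of one server."""
    name: str
    cpu_percent: list[float]
    mem_percent: list[float]


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list of readings."""
    if not values:
        raise ValueError("at least one sample is required")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def downsizing_candidates(
    servers: list[UtilizationSamples],
    pct: float = 95,          # assumed percentile
    threshold: float = 40.0,  # assumed utilization ceiling, in percent
) -> list[str]:
    """Names of servers whose CPU and memory both stay under the threshold."""
    return [
        s.name
        for s in servers
        if percentile(s.cpu_percent, pct) < threshold
        and percentile(s.mem_percent, pct) < threshold
    ]
```

A server that shows up here is only a candidate; the spikes still need a human look before the instance size is changed.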

As an example, if an application only needs 4 servers with 2 vCPUs and 4 GB of RAM each, but we have provisioned instances with 8 vCPUs and 16 GB of RAM in the Cloud, our performance monitoring might look like this:

There might be spikes at times, but overall, the above performance monitoring shows under-utilization. The table below depicts that in hard-earned cash terms:

3. Turn off unused instances – Who hasn’t left an outside or garage light on for what seems like days? For those of us who have but don’t want to admit it, this analogy is perfect! We won’t know the impact until we see the bill, but in the Cloud, we’re not talking tens of dollars, but sometimes thousands, depending on the Virtual Machine (VM) horsepower or the number of unused VMs.

A simple approach is to set up policies and procedures around utilization, logins, and bandwidth, plus an overall policy for unused instances. Whether it’s Dev/Test/QA or Production systems, a variety of strategies, including Auto Scaling, can be implemented (see #9 below). No figures or tables on this one, just common sense.
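Such a policy can be expressed as a small rule over activity data. The sketch below is illustrative only: the `InstanceActivity` fields and all three thresholds are hypothetical, and in practice the numbers would be pulled from the provider's monitoring and sign-in logs.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class InstanceActivity:
    """Hypothetical activity summary for one VM."""
    name: str
    last_login: datetime
    avg_cpu_percent: float
    network_mb_per_day: float


def unused_instances(
    instances: list[InstanceActivity],
    now: datetime,
    max_idle_days: int = 14,         # assumed policy: no login for two weeks
    cpu_floor: float = 2.0,          # assumed policy: near-zero CPU
    network_floor_mb: float = 50.0,  # assumed policy: near-zero traffic
) -> list[str]:
    """Names of VMs that fail all three activity checks."""
    cutoff = now - timedelta(days=max_idle_days)
    return [
        i.name
        for i in instances
        if i.last_login < cutoff
        and i.avg_cpu_percent < cpu_floor
        and i.network_mb_per_day < network_floor_mb
    ]
```

Requiring all three signals keeps a quiet but still-needed server (for example, one nobody logs in to but that serves steady traffic) off the shutdown list.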

4. Use reserved instances – “I’m just afraid of commitment” is one of the most frequently heard statements. Of course, we’re not talking about relationships, but rather commitment to Cloud instances. Cloud Service Providers (CSPs) like predictable income, so they charge less when companies commit to a 1- or 3-year deal. It’s great news for folks who know they will be in the Cloud for the long run, because savings could be around 50% depending on the commitment terms, operating system, and instance family, among other factors.
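Whether a commitment pays off depends on how many hours the instance would actually run. The sketch below compares the two models under two loudly stated assumptions: the reservation is billed for every hour of the term whether or not the instance is running, and PAYG is billed only for the hours used. The hourly rate and discount passed in are hypothetical inputs, not quoted prices.

```python
HOURS_PER_YEAR = 8760


def payg_cost(hourly_rate: float, years: int, utilization: float) -> float:
    """PAYG cost when the instance runs for `utilization` (0..1) of the term."""
    return hourly_rate * HOURS_PER_YEAR * years * utilization


def reserved_cost(hourly_rate: float, years: int, discount: float) -> float:
    """Reserved cost: every hour of the term is paid at the discounted rate."""
    return hourly_rate * (1 - discount) * HOURS_PER_YEAR * years


def break_even_utilization(discount: float) -> float:
    """Fraction of the term the instance must run for the reservation to win."""
    return 1 - discount


if __name__ == "__main__":
    rate, discount = 0.20, 0.50  # hypothetical hourly rate and discount
    print(f"PAYG, always on:  ${payg_cost(rate, 1, 1.0):,.0f}/year")
    print(f"Reserved, 1 year: ${reserved_cost(rate, 1, discount):,.0f}/year")
    print(f"Break-even at {break_even_utilization(discount):.0%} of hours")
```

Under these assumptions, a 50% discount only helps if the instance would otherwise run more than half of the time, which is why commitments suit steady workloads and not Dev/Test machines that are switched off at night.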

A very recent and real example is a client whose Cloud bill would be . . . well, you can see it in the table below. Pay-As-You-Go (PAYG) is the same as hourly pricing:

5. Use spot instances – Are you thrifty when it comes to using the Cloud? Hopefully not with security, but when you have workloads that can sustain interruption, great bargains can be had through Spot purchases. This is unused capacity that can be purchased at super deep discounts (up to 90% on Azure compared to PAYG), with similar discounts on AWS and GCP. There is also a maximum cap for the price, so you know the bargain is real.

They are perfect for low-priority workloads that need compute but can withstand interruption, such as testing and application QA, or large-scale stateless applications.

6. Data transfer costs – Although this may seem like peanuts compared to the other savings discussed above, for high-traffic workloads, especially with transfers between regions and availability zones, it can add up to a significant number. For example, if the web servers are in one zone, the databases in another, and storage in a third, some or all of this data transfer could very well be billable for every transaction.

Storage analysis requires an in-depth look, as it can be complicated and depends on the architecture, performance and redundancy requirements, application availability, and a slew of other points. There are also caching services that could help move the data for a flat fee, but this will also require a cost-benefit analysis.
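A rough first pass at the web/database/storage example is to list the flows between components and price only those that cross a zone boundary. The sketch below is an illustration under assumptions: the `Flow` type, the zone names, the traffic volumes, and the per-GB rate are all hypothetical, and real rates differ by provider and by whether traffic crosses zones or regions.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    """Hypothetical monthly traffic between two components."""
    source_zone: str
    dest_zone: str
    gb_per_month: float


def cross_zone_transfer_cost(flows: list[Flow], rate_per_gb: float) -> float:
    """Monthly cost of flows whose source and destination zones differ."""
    return sum(
        f.gb_per_month * rate_per_gb
        for f in flows
        if f.source_zone != f.dest_zone
    )


if __name__ == "__main__":
    flows = [
        Flow("zone-1", "zone-2", 500.0),   # web servers -> databases
        Flow("zone-2", "zone-3", 200.0),   # databases -> storage
        Flow("zone-1", "zone-1", 1000.0),  # web servers -> local cache
    ]
    # 0.01 per GB is a placeholder rate, not a quoted price.
    print(f"${cross_zone_transfer_cost(flows, 0.01):.2f}/month")
```

Rerunning the estimate with components moved into the same zone shows how much of the bill is purely a placement decision, which then has to be weighed against the redundancy requirements.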

7. Storage – I’m going to combine multiple strategies into this section. Keeping all those log files, backups, PDFs, and other unstructured data may be necessary, but we need to understand what must be kept, based on legal or compliance needs. If you are married to the idea of keeping everything, data can be compressed and moved to cold storage, where costs are less than a couple of pennies per gigabyte. Data can also be rotated out and into the recycle bin. A good example of this is web server, application, or system logs: after a certain period (as defined by your company policies), that data can be whacked to recover space.
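The retention rules described above boil down to an age-based decision per object. Here is a minimal sketch, assuming hypothetical thresholds (30 days before moving to cold storage, 365 days before deletion) and a hypothetical `legal_hold` flag for data that compliance says must be kept; the real values come from company policy.

```python
def lifecycle_action(
    age_days: int,
    cold_after_days: int = 30,     # assumed policy value
    delete_after_days: int = 365,  # assumed policy value
    legal_hold: bool = False,      # True when compliance forbids deletion
) -> str:
    """Return 'hot', 'cold', or 'delete' for an object of the given age."""
    if age_days >= delete_after_days and not legal_hold:
        return "delete"
    if age_days >= cold_after_days:
        return "cold"
    return "hot"


if __name__ == "__main__":
    for age in (5, 90, 400):
        print(age, lifecycle_action(age))
    print(400, lifecycle_action(400, legal_hold=True))
```

The major providers offer built-in lifecycle rules for object storage that can apply this kind of policy automatically, so the function is mainly useful for modelling the effect of different thresholds before configuring them.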

8. Scaling workloads – Every application should be architected with scale in mind. That means applications either run on smaller instances and scale up (move to larger instances) as the load increases, or scale out, meaning more instances are brought up to serve the requests.

Apps could also be written so that they don’t require dedicated infrastructure, but rather run as functions on services like Azure Functions or AWS Lambda. Containers could also be utilized to scale core services.

9. Auto-scaling – Of course, this will vary with every application profile: some applications are used constantly 24×7, while others have a more predictable peak usage time. We can architect the resources so that they are shut down or reduced based on traffic. A real-life example is a client who has customers in the US, UK, and Australia; to serve them best, we implemented redundant instances in all regions. Cloud scripts were written to shut down all but one at night and bring the rest back online in the morning, with reduced capacity on the weekends.

Overall, the client’s bill for that Cloud datacenter was reduced by more than 50% in the following months.
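The schedule in that example can be sketched as a function that maps a region's local time to a desired instance count. This is not the client's actual script: the instance counts and the 07:00 to 19:00 daytime window are assumptions chosen for illustration, and a real implementation would hand the result to the provider's scaling or start/stop API.

```python
from datetime import datetime

# All values below are hypothetical and would be tuned per region.
FULL_CAPACITY = 3     # weekday daytime instance count
WEEKEND_CAPACITY = 2  # reduced capacity on weekends
NIGHT_CAPACITY = 1    # "all but one" shut down at night
DAY_START_HOUR = 7
DAY_END_HOUR = 19


def desired_instances(local_time: datetime) -> int:
    """Instance count for a region, given that region's local time."""
    is_daytime = DAY_START_HOUR <= local_time.hour < DAY_END_HOUR
    if not is_daytime:
        return NIGHT_CAPACITY
    if local_time.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        return WEEKEND_CAPACITY
    return FULL_CAPACITY


if __name__ == "__main__":
    print(desired_instances(datetime(2024, 1, 8, 10, 0)))  # a Monday morning
    print(desired_instances(datetime(2024, 1, 8, 23, 0)))  # the same night
```

Because the function works on local time, each region follows its own day and night cycle, which is where the savings come from when customers are spread across the US, UK, and Australia.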

Can anyone do this, you ask? Maybe, but Cloud expertise and a holistic view of the entire picture are key to managing cost optimization successfully. We ensure all these ups and downs of resources, applications, and infrastructure do not impact the end consumer, who is ultimately paying the bills. Furthermore, resources need to be monitored closely, with proper alerts and notifications and knowledgeable staff to handle them.

There are also other strategies, such as API management and Platform-as-a-Service database instances, among many others, that can reduce not only overall costs (including staffing) but also the upgrade and maintenance costs that go into the overall Cloud ROI calculations.

WinWire specializes in cloud cost optimization (CCO) with several expert-level resources that have been there and done that and got the t-shirt! We assess the entire ecosystem before making recommendations and work as a partner with just as much skin in the game as you. Contact us to learn how we can help you optimize your cloud spending and help you focus on what matters most to your business.
