Understanding Instance Recommendations and Outliers

FAQ: Why has x instance been recommended? Wouldn’t y instance be better suited? Often, a cloud instance can appear to be over-provisioned at first glance, and the reports may not explain this in an obvious way without added context or a practiced eye. Understanding how and why Cloudamize makes its recommendations is valuable when reading results.

Note that this article applies mainly to recommendations for usage-based designs. Hardware-based designs follow a different, simpler logic that is geared more towards a lift-and-shift approach.

Key Concepts

At its core, Cloudamize will follow a simple flow of logic to produce a cloud instance recommendation.

First and foremost, a recommendation must be fit for purpose:

  • All performance requirements must be met (CPU performance, memory utilization, storage size/throughput/IOPS)

  • Any special requirements must be met based on Designer parameters (e.g., special licensing, tenancy, price plan, region, etc.)

If our database of the IaaS providers' offerings cannot produce a recommendation that is fit for purpose, the recommendation will default back to a hardware-based recommendation. This will show in the results as the entry in the Constraints column having the appendage “(Hardware Mapping)”, e.g. “CPU (Hardware Mapping)” or “Memory (Hardware Mapping)”.
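If you are reviewing exported results and want to flag these fallbacks, the check reduces to a suffix test on the Constraints value. The following is a minimal sketch in Python. The function name and the idea of handling Constraints entries as plain strings are assumptions made for illustration; this is not a Cloudamize API.

```python
# Suffix that the article says is appended to the Constraints entry
# when the engine falls back to a hardware-based recommendation.
HARDWARE_MAPPING_SUFFIX = "(Hardware Mapping)"


def parse_constraint(value: str) -> tuple[str, bool]:
    """Split a Constraints entry into (constraint, is_hardware_mapping)."""
    text = value.strip()
    if text.endswith(HARDWARE_MAPPING_SUFFIX):
        # Drop the suffix and any whitespace that preceded it.
        return text[: -len(HARDWARE_MAPPING_SUFFIX)].strip(), True
    return text, False


# Hypothetical Constraints values, as they might appear in a results export.
examples = ["CPU (Hardware Mapping)", "Memory (Hardware Mapping)", "Disk and Network IO"]
for entry in examples:
    print(parse_constraint(entry))
```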

The recommendations engine will consider all instance/storage/licensing combinations that are fit for purpose and will then make a simple secondary choice: Which fit-for-purpose combination is cheapest overall?

When calculating the cost of an instance, all relevant costs are combined: compute cost (base instance price multiplied by on-time percentage), storage cost, network cost, license cost, and so on. This total cost is used when comparing instances, so occasionally a more expensive base instance might be cheaper overall due to, for example, a favorable storage configuration. The fit-for-purpose instance/storage/licensing combination with the lowest total cost is considered the cheapest overall.

Fit for purpose and cheapest overall combine to produce the key logic behind a Cloudamize recommendation:

Cloudamize will recommend the cheapest overall instance/storage/licensing combination that is fit for purpose given the Design requirements and the IaaS providers' offerings available to it.
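The two-step logic above (filter to fit for purpose, then pick the lowest total cost) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Cloudamize engine: the Candidate fields, the 730-hours-per-month convention, and all prices are hypothetical, and the fit-for-purpose check is reduced to a precomputed flag.

```python
from dataclasses import dataclass
from typing import Optional

HOURS_PER_MONTH = 730  # assumed convention for turning an hourly price into a monthly one


@dataclass(frozen=True)
class Candidate:
    """One hypothetical instance/storage/licensing combination (monthly costs)."""
    name: str
    hourly_price: float    # base instance price per hour
    storage_cost: float
    network_cost: float
    license_cost: float
    fit_for_purpose: bool  # stands in for the performance and Designer checks


def total_cost(c: Candidate, on_time: float) -> float:
    """Compute cost scaled by on-time fraction, plus all other costs."""
    compute = c.hourly_price * HOURS_PER_MONTH * on_time
    return compute + c.storage_cost + c.network_cost + c.license_cost


def recommend(candidates: list[Candidate], on_time: float) -> Optional[Candidate]:
    """Return the cheapest fit-for-purpose candidate, or None if nothing fits."""
    fit = [c for c in candidates if c.fit_for_purpose]
    if not fit:
        return None  # per the article, the engine would fall back to hardware mapping
    return min(fit, key=lambda c: total_cost(c, on_time))


# Made-up numbers: "large" has the pricier base instance but cheaper storage.
candidates = [
    Candidate("small", 0.10, 150.0, 10.0, 0.0, False),
    Candidate("medium", 0.20, 300.0, 10.0, 0.0, True),
    Candidate("large", 0.25, 200.0, 10.0, 0.0, True),
]
choice = recommend(candidates, on_time=1.0)
print(choice.name, round(total_cost(choice, 1.0), 2))
```

With these made-up numbers, "small" is excluded because it is not fit for purpose, and "large" is chosen over "medium" because its lower storage cost outweighs its higher base price.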

Outliers: How to Find the Relevant Current Performance Indicator

You may question a particular recommendation when it appears over-provisioned in some obvious respect. For example, an instance with a far more powerful CPU than necessary may be recommended, to the degree that its predicted CPU usage is as low as 3% against a Service Level Target of 80%.

Over-provisioned instance CPU is by far the most common source of this type of query and usually the most complex to comprehend.

Understanding the reason such an outlier instance has been chosen involves looking at some key data points, beginning with the Constraints column in the recommendation output.

The Constraints column lists the performance requirement that carries the most weight in deciding whether the instance recommendation is fit for purpose (e.g. it may list “CPU”, in which case CPU performance is the primary factor in determining that this instance meets the design Service Level Targets and usage requirements).

Usually, when an instance appears over-provisioned, the value in the Constraints column is a key indicator of why the over-provisioning has occurred. Crucially, the instance will usually be over-provisioned in a manner not related to the Constraints column entry (e.g. an instance recommendation where the CPU is over-provisioned will usually not list “CPU” in the Constraints column, because another factor forced the over-provisioning in order to meet fit-for-purpose requirements).

Reading the Constraints column entry is the starting point for understanding an outlier recommendation.

A common Constraints value when the instance CPU is over-provisioned is “Disk and Network IO”, which points to the Storage and Network tabs in the recommendation for further clues.

Alternatively, the value may read “Memory”, in which case the Compute tab’s memory data should be consulted.

Once you know the constraint on the recommendation, you can examine the relevant tab and data points. What you are looking for is a current performance indicator that is large and/or very close to the predicted performance indicator (e.g. a particular disk has a very high throughput value in its current performance, which is common on highly active databases).
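One way to make this search systematic is to rank the indicators by how close the current value is to the predicted value. The sketch below assumes you have copied the relevant figures out of the report by hand. The indicator names, the numbers, and the use of a simple current-to-predicted ratio are all assumptions for illustration, not how Cloudamize itself weighs indicators.

```python
def rank_indicators(indicators: dict[str, tuple[float, float]]) -> list[tuple[str, float]]:
    """Rank indicators by current/predicted ratio, highest (closest to the limit) first.

    `indicators` maps a name to (current_value, predicted_value) in the same unit.
    """
    ranked = []
    for name, (current, predicted) in indicators.items():
        if predicted <= 0:
            continue  # skip entries with no usable predicted value
        ranked.append((name, current / predicted))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Hypothetical figures copied from the Compute, Storage, and Network tabs.
observed = {
    "cpu_units": (40.0, 1000.0),
    "memory_gib": (20.0, 64.0),
    "disk_throughput_mibps": (950.0, 1000.0),
}

for name, ratio in rank_indicators(observed):
    print(f"{name}: {ratio:.0%} of predicted")
```

In this made-up data, disk throughput sits at 95% of its predicted value while CPU is at 4%, which matches the pattern described above: the CPU looks over-provisioned because a storage requirement is driving the instance choice.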

When you have identified the current performance indicator that is driving the recommendation, you can then begin some research with the relevant IaaS provider.

IaaS Provider Calculators: Pricing and Availability Considerations

Once we know the relevant current performance indicator, we need to consult both Cloudamize Designer and IaaS Provider information to find the reasoning behind the outlier recommendation.

It is important to understand Designer restrictions. First, identify which design contains the outliers (or is being queried), then check any Designer values that affect this particular performance indicator. For example, if we have identified Network Throughput as the relevant data point, we can check any Design parameters that scale the SLT for this value, such as the Network Scaling slider.

The design may also restrict instance types directly, set a particular SLT for CPU, use a different method of hosting (shared tenancy, etc.), impose licensing requirements (such as BYOL), or be tied to a particular Pricing Plan. All these factors can affect availability depending on the IaaS provider’s offerings.

To see what effect the relevant current performance indicator has, we then consult with the providers themselves, using their offerings calculators:

Amazon Web Services: AWS Pricing Calculator

Microsoft Azure: Pricing Calculator | Microsoft Azure

Google Cloud: Google Cloud Pricing Calculator

Additionally, the providers have documentation that can be consulted to see the capabilities of instance ranges, etc., which can also provide clues.

Within the calculators and/or documentation for the providers, we take the approach of “how do I fit any instance with this requirement (the relevant current performance indicator) onto the cloud platform?”

An example: an AWS recommendation features an over-provisioned CPU on a large and expensive instance, and our investigation has shown us that the key current performance indicator is Disk Throughput.

We would start in this case by checking what the Disk Throughput requirement is in the recommendation (based on its current/observed performance) - both throughput rate and IOPS count should be considered.

This would likely involve one disk being set up as a RAID0 array with multiple volumes to meet requirements, due to the way AWS storage is set up (e.g. each GP3 volume has a ceiling on the IOPS and throughput it can deliver). It is important that we also check the Design restrictions and the recommendation here to see which types of storage can be recommended, and which type is being recommended.

The throughput requirement will likely drive a particular type of disk, or a particular number of volumes in one or more RAID0 arrays. Additionally, all drives for a particular node should be accounted for, so nodes with multiple drives need to take all their recommended drives into account when fitting these on a particular instance.

With this information in mind, we can check the recommended instance family’s performance metrics for some key details: for example, does a smaller size in the same instance family fail to support the required number of drives?

We can also check similar, lower-performance instance ranges: for example, does a cheaper instance family not support the required storage type? Maybe GP3 drives are required, but the cheaper instance family only supports GP2.

Another thing to consider is Regional Availability - perhaps there is a suitable instance available on the platform that is cheaper and smaller but is unavailable in the region the Design is set to.

In this example, we might discover that the recommendation requires a total of 12 GP3 volumes to cover all its drives and their throughput needs. When we examine the AWS offerings, we see that the recommended instance supports up to 16 GP3 volumes, but the instance immediately below it in the same family only supports up to 8 GP3 volumes. Thus, the lower-performance instance is not fit for purpose, while the higher-performance instance is.
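The arithmetic behind this example can be sketched as follows. All figures are illustrative assumptions chosen to reproduce the 12-volume case: the per-volume limits, the per-drive requirements, and the instance size names are hypothetical, and real limits should be taken from the provider's documentation.

```python
import math


def volumes_needed(req_mibps: float, req_iops: float,
                   vol_mibps: float, vol_iops: float) -> int:
    """Volumes required in a RAID0 array to meet both throughput and IOPS."""
    by_throughput = math.ceil(req_mibps / vol_mibps)
    by_iops = math.ceil(req_iops / vol_iops)
    return max(by_throughput, by_iops, 1)


def smallest_fitting_size(sizes: list[tuple[str, int]], total_volumes: int):
    """Return the first size (ordered smallest to largest) that supports the volume count."""
    for name, max_volumes in sizes:
        if max_volumes >= total_volumes:
            return name
    return None


# Assumed per-volume ceilings (NOT actual AWS figures).
PER_VOLUME_MIBPS = 250
PER_VOLUME_IOPS = 4000

# Hypothetical (throughput MiB/s, IOPS) requirement for each drive on the node.
drives = [(1000, 12000), (1000, 8000), (900, 16000)]

total = sum(volumes_needed(t, i, PER_VOLUME_MIBPS, PER_VOLUME_IOPS) for t, i in drives)

# Hypothetical sizes within one instance family, smallest first.
sizes = [("smaller-size", 8), ("recommended-size", 16)]

print(total, smallest_fitting_size(sizes, total))
```

Each drive needs four volumes under these assumptions, so the node needs 12 in total. The smaller size tops out at 8 and is rejected, which leaves the larger size as the first one that is fit for purpose, regardless of how little CPU the workload uses.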

Full details on IaaS offerings should be queried with the providers themselves, either directly or via their calculator links listed above. They are the experts on their own products.

It is occasionally possible that a particular recommendation is constrained by more than one factor. Currently, the Constraints column only lists the single largest factor in deciding the recommendation, so if there still appears to be an inconsistency after researching the first factor, you may need to investigate a secondary factor. Use your best judgement in this case: examine all key performance indicators to see if any others look like they could be driving an outlier recommendation (e.g. when Throughput is the primary constraint, perhaps an unusually large Memory Usage is a secondary one) and use these secondary data points to further refine your research.

Further Information

If you are unsure at any point about a particular recommendation and have checked the above, reach out to your Technical Account Manager or the Cloudamize Technical Helpdesk team for advice and assistance.