Azure Monitoring Costs and Complexity
So, like most companies, we are spinning up new workloads in the Azure cloud, and I have been tasked with looking at Azure monitoring. As I have looked into the capabilities that Azure monitoring brings, I have also been struck by the relative complexity of its cost model.
Of course, we had the option of extending our legacy on-prem monitoring into the cloud, but it made more sense to leverage cloud-native tools. That holds both from an architectural perspective (natively integrated platform components reduce complexity) and from a lock-in perspective (it is easier to change Service Integrator when you are not using their tools). And yes, I am aware of cloud lock-in and the hybrid/aggregation models, but that could be a post for another day.
As a newcomer to Azure, albeit with some cloud experience on AWS and Rackspace, I thought it made sense to start with infrastructure-layer monitoring and work up.
Of course, the hardware itself is not really under consideration here, as it is delivered as an IaaS/PaaS service, but we still want to look at metrics from the guest VM operating system.
Standard metrics are included with Azure Monitor. But hang on: read the warning, and you will see that standard metrics do not include the guest VM, only the metrics collected from the Hyper-V session on the host. To get CPU utilisation and similar counters from inside the VM, the recommended practice is to configure Azure Diagnostics to send guest OS performance counters to the Azure Monitor metrics database. There they are counted as custom metrics. Each custom metric sample is counted as 8 bytes, and there is a free allowance (depending on hosting location and currency; for me it was 150 MB) above which additional ingestion becomes chargeable.
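To get a feel for how quickly that allowance goes, here is a rough back-of-the-envelope sketch. The 8 bytes per sample and the 150 MB allowance come from the paragraph above; the VM count, counters per VM, one-minute sampling interval, 30-day month and decimal megabytes are my own hypothetical assumptions, so plug in your own figures and check the current pricing page.

```python
# Rough monthly estimate of custom-metric ingestion for guest OS counters.
BYTES_PER_SAMPLE = 8                  # from the text: each sample counts as 8 bytes
FREE_ALLOWANCE_BYTES = 150 * 10**6    # 150 MB in my case; assumes decimal MB
MINUTES_PER_MONTH = 30 * 24 * 60      # assumption: a 30-day month


def monthly_custom_metric_bytes(vms: int, counters_per_vm: int,
                                interval_minutes: int = 1) -> int:
    """Bytes ingested when every VM sends every counter once per interval."""
    samples_per_series = MINUTES_PER_MONTH // interval_minutes
    return vms * counters_per_vm * samples_per_series * BYTES_PER_SAMPLE


def chargeable_bytes(total_bytes: int) -> int:
    """Bytes above the free allowance, i.e. the part that becomes chargeable."""
    return max(0, total_bytes - FREE_ALLOWANCE_BYTES)


if __name__ == "__main__":
    # Hypothetical estate: 50 VMs, 10 guest OS counters each, sampled every minute.
    total = monthly_custom_metric_bytes(vms=50, counters_per_vm=10)
    print(total, chargeable_bytes(total))  # 172800000 22800000
```

With those made-up numbers, 50 VMs already overshoot the allowance by about 22.8 MB a month, while 10 VMs (about 34.6 MB) stay comfortably inside it.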
Metrics can also be queried via API calls. A large number of these calls (1 million at the time of writing) are included with Azure Monitor.
Of course, as well as counters, I also want to query logs to look for certain application and platform warning and error messages. A small amount of log ingestion is included; the rest, you guessed it, is chargeable.
Now I have my metrics and logs. I can visualise them to some degree in Azure Monitor, but to create custom dashboards and drill-through reports I wanted to use Power BI. Fortunately, an E3 licence gave me the free version of Power BI. Had we bought an E5 licence, Power BI Pro could have been used (greater data storage and scheduled data refresh enhancements).
As well as analysing the metrics and logs, I also wanted to generate alerts (event alerting). Alert rules are created against what Azure calls 'metric time series'. A metric time series is, for example, CPU utilisation or memory used. Each metric time series monitored beyond the 10 included is charged, and there is an additional cost for using a dynamic threshold.
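Because billing is per time series rather than per rule, a single rule applied across many VMs can use up the included 10 very quickly. The sketch below counts billable units only, not money, since rates vary; the AlertRule class and the example rules are hypothetical, and I am assuming that the 10 included series are shared across all rules and that the dynamic threshold charge also applies per time series.

```python
from dataclasses import dataclass
from typing import Dict, List

INCLUDED_TIME_SERIES = 10  # from the text: the first 10 time series are included


@dataclass
class AlertRule:
    name: str
    time_series: int       # e.g. a CPU rule scoped to 20 VMs monitors 20 series
    dynamic: bool = False  # True if the rule uses a dynamic threshold


def billable_units(rules: List[AlertRule]) -> Dict[str, int]:
    """Count monitored series, those beyond the allowance, and dynamic ones."""
    total = sum(r.time_series for r in rules)
    dynamic = sum(r.time_series for r in rules if r.dynamic)
    return {
        "total_time_series": total,
        "chargeable_time_series": max(0, total - INCLUDED_TIME_SERIES),
        "dynamic_threshold_series": dynamic,  # assumption: surcharge is per series
    }


if __name__ == "__main__":
    # Hypothetical rules: static CPU alert and dynamic memory alert on 20 VMs each.
    rules = [
        AlertRule("cpu-high", time_series=20),
        AlertRule("memory-anomaly", time_series=20, dynamic=True),
    ]
    print(billable_units(rules))
    # {'total_time_series': 40, 'chargeable_time_series': 30, 'dynamic_threshold_series': 20}
```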
In our old on-prem monitoring we generally used static thresholds: for example, raise a warning when CPU usage exceeds 80% and a critical alert when it exceeds 95%. Dynamic Thresholds in Azure Monitor use machine learning to learn an asset's past performance and flag deviations from it. Each dynamic threshold attracts a small cost.
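To make the difference concrete, here is a sketch contrasting the two approaches. The static function mirrors the 80%/95% example above. The "dynamic" function is only a crude stand-in that flags values more than k standard deviations from the mean of recent history (k = 3 is a hypothetical choice); Azure's Dynamic Thresholds use their own machine learning model, which this does not reproduce.

```python
import statistics
from typing import Sequence


def static_severity(cpu_percent: float, warning: float = 80.0,
                    critical: float = 95.0) -> str:
    """Classic fixed thresholds, as in the on-prem example."""
    if cpu_percent > critical:
        return "critical"
    if cpu_percent > warning:
        return "warning"
    return "ok"


def outside_dynamic_band(history: Sequence[float], value: float,
                         k: float = 3.0) -> bool:
    """Simplified stand-in for a dynamic threshold: is value more than
    k population standard deviations away from the historical mean?"""
    mean = statistics.fmean(history)
    stdev = statistics.pstdev(history)
    return abs(value - mean) > k * stdev


if __name__ == "__main__":
    # Hypothetical CPU history for a normally quiet VM (percent).
    history = [20, 22, 25, 21, 23, 24, 22]
    print(static_severity(60), outside_dynamic_band(history, 60))  # ok True
```

The point of the example: 60% CPU never trips the static rule, but for a VM that normally idles around 22% it is well outside its usual behaviour, which is exactly the case dynamic thresholds are meant to catch.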
So now I have my metrics and logs collected and my alerts being generated. But I need to pipe those alerts out for event correlation or incident ticketing in an ITSM tool. I could have configured email or SMS alerts, but I wanted to integrate with my SaaS-based ITSM service desk tool. Azure Monitor has a connector for my ITSM tool of choice, ServiceNow, but it includes only 1,000 alerts per month; anything beyond that is charged in further blocks of 1,000. And don't forget that some events fire at least twice: once when they occur and again when they clear.
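A quick way to sanity-check ITSM connector volumes is to count deliveries, including the "clear" notifications, and then work out how many extra blocks of 1,000 you would need. This is a hypothetical sketch: clear_ratio (the share of events that also send a clear) is my own parameter, and I am assuming that clear notifications count against the allowance and that a partly used block is charged as a whole block.

```python
import math

INCLUDED_ALERTS = 1_000  # from the text: 1,000 alerts per month included
BLOCK_SIZE = 1_000       # from the text: further alerts charged per 1,000


def itsm_deliveries(events: int, clear_ratio: float = 1.0) -> int:
    """Alerts sent to the ITSM tool: one per event, plus one per event that clears."""
    return events + round(events * clear_ratio)


def chargeable_blocks(deliveries: int) -> int:
    """Extra 1,000-alert blocks needed beyond the included allowance
    (assumes a partly used block is charged in full)."""
    over = max(0, deliveries - INCLUDED_ALERTS)
    return math.ceil(over / BLOCK_SIZE)


if __name__ == "__main__":
    # 700 events looks safely under 1,000, until the clear notifications double it.
    deliveries = itsm_deliveries(700)
    print(deliveries, chargeable_blocks(deliveries))  # 1400 1
```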
Finally, I also wanted some basic application-layer monitoring: URL ping tests and multi-step web tests. For example, check that the site is up and returning HTTP 200, then start a customer journey and check that each page loads correctly.
Azure Monitor has a component called Application Insights for this kind of application testing. URL ping tests are free, but multi-step tests are billed per test per month. Web tests can check for slow performance (timeout), availability (up/down) or content matching.
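In Azure these tests run inside Application Insights, but the logic of a multi-step test is easy to sketch. The Python below is not the Application Insights API; it is a hypothetical illustration of the three checks above (availability via HTTP 200, slow performance via a timeout, and content matching) run over the steps of a customer journey. The Step class, run_multistep_test, the 5-second default timeout and the shop.example.com URLs are all made up, and http_fetch keeps no cookies between steps, so a real journey that needs a login or basket session would need more than this.

```python
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# A fetch returns (HTTP status, body text, elapsed seconds); status 0 = no response.
Fetch = Callable[[str], Tuple[int, str, float]]


@dataclass
class Step:
    url: str
    expect_text: Optional[str] = None  # content match; None skips the check
    timeout_s: float = 5.0             # responses slower than this count as failures


def http_fetch(url: str) -> Tuple[int, str, float]:
    """Plain GET with urllib; keeps no cookies or session state between steps."""
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=30) as resp:
            status = resp.status
            body = resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:  # non-2xx responses, e.g. 404 or 500
        status, body = err.code, ""
    except OSError:                        # DNS failure, refused connection, socket timeout
        status, body = 0, ""
    return status, body, time.perf_counter() - start


def run_multistep_test(steps: List[Step], fetch: Fetch = http_fetch) -> str:
    """Run the steps in order and stop at the first failure."""
    for i, step in enumerate(steps, start=1):
        status, body, elapsed = fetch(step.url)
        if status != 200:
            return f"step {i}: availability failure (HTTP {status})"
        if elapsed > step.timeout_s:
            return f"step {i}: slow response ({elapsed:.1f}s)"
        if step.expect_text is not None and step.expect_text not in body:
            return f"step {i}: content match failed"
    return "pass"


if __name__ == "__main__":
    # Hypothetical customer journey against a stubbed site, so no network is needed.
    pages = {
        "https://shop.example.com/": (200, "Welcome", 0.4),
        "https://shop.example.com/basket": (200, "Your basket", 0.6),
    }
    stub = lambda url: pages.get(url, (404, "", 0.1))
    journey = [Step("https://shop.example.com/", "Welcome"),
               Step("https://shop.example.com/basket", "Your basket")]
    print(run_multistep_test(journey, stub))  # pass
```

Injecting the fetch function keeps the check logic testable without a live site; swap in http_fetch (the default) to point it at a real URL.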
So you can see that whilst Azure Monitor is capable, its charging model is complex. Fortunately, Microsoft publishes the rates on its website, but for large-scale cloud deployments it takes some effort to calculate the costs.
After the event, the Azure Monitor platform lets you look back at the costs incurred over the previous 30 days, so that you can fine-tune a cost-effective monitoring solution.