Traditionally a process-driven discipline, IT service management (ITSM) now has more opportunities to use automation, thanks to the increased popularity of cloud services. Let’s call it “ITSM as code.”
In ITIL Practitioner Guidance from AXELOS, there’s rightly a large focus on the soft skills required to manage modern IT services. That’s because we often focus too much on ITSM processes and technology, and then we underestimate how much the people are the real ITSM glue.
While the Information Technology Infrastructure Library (ITIL) provides best practices to ITSM professionals on how to improve service via policies, processes, metrics and controls, cloud service providers (CSPs), such as Amazon Web Services (AWS), Microsoft Azure and Google Cloud Platform (GCP), now offer capabilities that in effect enable this ITIL guidance to be programmatically applied to cloud services. Let’s call it “service management as code.”
In the AWS “Well-Architected Framework” white paper, one of the five recommended pillars of the framework is “operational excellence.” This pillar aligns nicely with ITIL Practitioner Guidance, and all the design principles and best practices contained within it are actionable in code.
While the above is an AWS paper, its practices are applicable to other CSPs, and in this article I outline five ways in which CSPs enable service management as code.
1. Perform Operations with Code
When code is used to translate policies and controls into application programming interface (API) commands, this code can be version controlled or access controlled. It’s also 100 percent clear what the interpretation of a policy should be. Code can also be replayed for investigation or test purposes.
As such, configuration management and responses to operational events are excellent candidates for codifying procedures.
Example: One of the most eye-opening examples of this is Chaos Monkey, the fault-injection approach from Netflix. This is code that programmatically creates a series of failures in production to verify that resilience measures actually work. The same programmatic approach can be used to monitor and correct configurations (AWS Config), and AWS Lambda now offers “function-as-a-service,” where code runs in response to operational (and other) events.
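To make the idea of a codified response to an operational event concrete, here is a minimal sketch in Python. It is not AWS Config or AWS Lambda code: the event shape (`resource_id`, `configuration`), the `REQUIRED_SETTINGS` policy and the helper names are all assumptions made up for illustration. The handler only reports what it would correct; a real function would go on to call the provider's API.

```python
"""Sketch: a codified operational response to a configuration-change event."""

# Hypothetical policy: the settings every storage resource must have.
REQUIRED_SETTINGS = {"encryption": True, "public_access": False}


def find_drift(actual: dict, required: dict = REQUIRED_SETTINGS) -> dict:
    """Return the settings whose actual value differs from the required value."""
    return {
        key: {"expected": expected, "actual": actual.get(key)}
        for key, expected in required.items()
        if actual.get(key) != expected
    }


def handler(event: dict, context: object = None) -> dict:
    """Entry point written in the style of a function-as-a-service handler.

    The event shape is an assumption for this sketch, not the format
    used by any specific cloud provider.
    """
    drift = find_drift(event["configuration"])
    return {
        "resource_id": event["resource_id"],
        "compliant": not drift,
        # A real handler would call the provider's API here to correct the
        # drift; this sketch only reports what it would change.
        "remediation": {key: item["expected"] for key, item in drift.items()},
    }


if __name__ == "__main__":
    sample_event = {
        "resource_id": "bucket-123",
        "configuration": {"encryption": True, "public_access": True},
    }
    print(handler(sample_event))
```

Because the policy lives in one dictionary, it can be version controlled and reviewed like any other code, and the same sample event can be replayed when investigating an incident.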
2. Replace Human Checks with an “Automated Trusted Advisor”
Use programmable AWS Trusted Advisor services to continually employ the most cost-effective resources.
Think of those old-school, manual, consultant-led health checks that used to assess your IT environment against “best practices.” This is now automated in AWS with the AWS Trusted Advisor service, which will check your services against best practices for cost optimization, performance, security and fault tolerance.
Example: Human error is a common cause of system outages. AWS Trusted Advisor checks identity and access management configurations to ensure principles such as least privilege are in place, reducing the risk and consequences of human error.
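As a rough illustration of what an automated least-privilege check can look like, the sketch below scans a policy document for statements that allow every action on every resource. It assumes the JSON layout used by IAM policy documents (`Statement`, `Effect`, `Action`, `Resource`). It is not how AWS Trusted Advisor is implemented, and the function names are hypothetical.

```python
"""Sketch: flag policy statements that grant every action on every resource."""


def as_list(value) -> list:
    """Policy fields may hold a single value or a list; normalize to a list."""
    return value if isinstance(value, list) else [value]


def overly_permissive_statements(policy: dict) -> list:
    """Return the Allow statements that use "*" for both Action and Resource."""
    findings = []
    for statement in as_list(policy.get("Statement", [])):
        if statement.get("Effect") != "Allow":
            continue
        actions = as_list(statement.get("Action", []))
        resources = as_list(statement.get("Resource", []))
        if "*" in actions and "*" in resources:
            findings.append(statement)
    return findings


if __name__ == "__main__":
    # Illustrative policy document; the bucket name is a placeholder.
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": "*", "Resource": "*"},
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": "arn:aws:s3:::example-bucket/*",
            },
        ],
    }
    for finding in overly_permissive_statements(policy):
        print("Violates least privilege:", finding)
```

A check like this can run on every policy change rather than once a year during a consultant-led review.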
3. API-First Practice
With CSPs, the metrics and controls for all services are available via a programmatic API. Thus, instead of a human using the familiar graphical user interface (GUI) console via a web browser, clicking and typing to control services, the human can now use a command line interface (CLI) to program the web services. This can be contained in a script or program and used repeatedly.
So whenever you select a technology today, make sure it has a good set of API capabilities, such as the ability to integrate with other cloud services and to be consumed programmatically itself.
Example: The creation of a virtual machine in AWS EC2 can be done via multi-page screens in the browser or via a one-line script on the command line.
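The one-line form can itself be produced by a script. The sketch below builds an `aws ec2 run-instances` command from parameters and prints it without executing anything. The image ID `ami-EXAMPLE` is a placeholder rather than a real image, and the function name is hypothetical.

```python
"""Sketch: build the one-line AWS CLI command that launches a virtual machine."""
import shlex


def run_instances_command(
    image_id: str, instance_type: str = "t2.micro", count: int = 1
) -> list:
    """Return the argument list for `aws ec2 run-instances`."""
    return [
        "aws", "ec2", "run-instances",
        "--image-id", image_id,
        "--instance-type", instance_type,
        "--count", str(count),
    ]


if __name__ == "__main__":
    # "ami-EXAMPLE" is a placeholder; substitute an image ID from your account.
    command = run_instances_command("ami-EXAMPLE")
    # Only print the command. Actually running it would require the AWS CLI
    # and valid credentials.
    print(shlex.join(command))
```

Keeping the parameters in a function means the same launch can be repeated, reviewed and changed in one place instead of being re-entered across several console screens.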
4. Business Focus to Improve the Signal-to-Noise Ratio
Align the programmable operations to business objectives, for example by improving the signal-to-noise ratio in metrics.
There are many monitoring services in AWS covering API calls, logging, security access and more. These should be turned on programmatically and incrementally, and only if they align to clear business goals. The rule of thumb is: If you don’t know how to act on an alert, there shouldn’t be an alert. How do you know that it’s important? It must be aligned to a business outcome.
Example: An important metric for online services is the response or wait time for clients: impatient customers will give up on a website if it’s too slow. Using monitoring and response services such as Amazon CloudWatch, you can monitor across the entire application and identify bottlenecks and slowdowns that affect website response times.
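The rule of thumb about alerts can be written down as code. In the sketch below, an alert rule is only allowed to fire if it names a business outcome and a runbook, and it is evaluated against the 95th-percentile response time. The `AlertRule` fields, the 1,000 ms threshold and the sample data are assumptions for illustration; this is not the CloudWatch API.

```python
"""Sketch: only alert on metrics that map to a business outcome and an action."""
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertRule:
    metric: str
    threshold_ms: float
    business_outcome: str  # why the business cares about this metric
    runbook: str           # what a responder should do when it fires


def is_justified(rule: AlertRule) -> bool:
    """A rule is kept only if it has both a business outcome and an action."""
    return bool(rule.business_outcome.strip()) and bool(rule.runbook.strip())


def percentile(samples: list, pct: int) -> float:
    """Nearest-rank percentile of a non-empty list of samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(len(ordered) * pct / 100))
    return ordered[rank - 1]


def should_alert(rule: AlertRule, samples_ms: list) -> bool:
    """Fire only for justified rules whose 95th percentile breaches the threshold."""
    return is_justified(rule) and percentile(samples_ms, 95) > rule.threshold_ms


if __name__ == "__main__":
    rule = AlertRule(
        metric="checkout_response_time",
        threshold_ms=1000,
        business_outcome="Impatient customers abandon slow checkouts",
        runbook="Scale out the web tier",
    )
    samples = [120, 130, 110, 140, 125, 135, 115, 150, 145, 2400]
    print(should_alert(rule, samples))
```

A rule with an empty `business_outcome` or `runbook` never fires, which encodes "if you don't know how to act on an alert, there shouldn't be an alert" directly in the monitoring configuration.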
5. Automated Configuration and Release
Where systems administrators once ran scripts to configure servers and application stacks consistently, AWS has taken this one step further with configuration-as-a-service in AWS CloudFormation. It can control many AWS cloud resources, giving you version control for your AWS cloud services just as you have for software.
In ITSM terms, services like AWS CloudFormation allow you to programmatically define a business service as a collection of integrated cloud services that can be repeatedly and reliably reproduced in testing or investigation scenarios. This type of service can also be driven by all the familiar enterprise configuration and release tools, such as PowerShell DSC, Chef Server, Puppet, Ansible Tower, Red Hat OpenShift, Docker Datacenter and Spinnaker.
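As a small illustration of defining a service as code, the sketch below generates a CloudFormation-style template as JSON from a Python function. It declares a single `AWS::EC2::Instance` resource. The logical name `WebServer`, the description and the placeholder image ID `ami-EXAMPLE` are assumptions, and a real business service would declare many more resources.

```python
"""Sketch: generate a minimal CloudFormation-style template as JSON."""
import json


def build_template(image_id: str, instance_type: str = "t2.micro") -> dict:
    """Describe a one-server service as data that can be version controlled."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Sketch of a business service defined as code",
        "Resources": {
            # "WebServer" is a hypothetical logical name for this sketch.
            "WebServer": {
                "Type": "AWS::EC2::Instance",
                "Properties": {
                    "ImageId": image_id,
                    "InstanceType": instance_type,
                },
            }
        },
    }


if __name__ == "__main__":
    # "ami-EXAMPLE" is a placeholder; use an image ID from your own account.
    template = build_template("ami-EXAMPLE")
    # Sorted keys give stable output, so changes show up cleanly in diffs.
    print(json.dumps(template, indent=2, sort_keys=True))
```

The same inputs always produce the same template, which is what makes the environment reproducible for testing or investigation.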
The choice is yours. You could manage cloud services like on-premises services, with humans interpreting documented procedures and clicking and typing into many GUIs. But you shouldn’t, unless you are happy with insufficient speed and the risk of adverse service impact.
Cloud services are programmatic and, as such, you can – and should – use code, scripts, and cloud services to codify your ITSM practices.
As the company’s first employee, Sarah Lahav has remained the vital link between SysAid Technologies and its customers since 2003. She is the current CEO and former Vice President of Customer Relations at SysAid, two positions that have fueled her passion for customer service.