Sustainable IT through observability
IT is essential for businesses today and is intertwined with every business process in any organization. However, it is responsible for a significant share of carbon emissions. That share grows as new technologies such as generative AI and IoT devices mature and are adopted in business practice.
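The GHG Protocol classifies emissions into three scopes. Scope 1 covers direct emissions from sources a company owns or controls. Scope 2 covers indirect emissions from the generation of purchased electricity, steam, heating and cooling. Scope 3 covers all other indirect emissions across the company's value chain.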
Many countries have now introduced regulations on ESG reporting. Non-manufacturing businesses such as banks or e-commerce platforms are major contributors to greenhouse gas emissions, especially emissions classified under Scope 2 and Scope 3. Reporting on this side of the economy still needs to improve.
Let us now see how observability helps in practicing sustainable IT and reducing the carbon footprint of IT operations.
What are sustainable IT and observability?
At its heart, sustainability is about people working together to ensure lasting harmony on our planet. The concept has been understood differently depending on its origins, context and historical period, which has sparked ongoing debate. Matthew Bradley, sustainability director at IT services provider Capgemini, describes sustainable IT as “an umbrella term that describes an environment-focused approach to the design, use, and disposal of computer hardware and software applications and the design of accompanying business processes.”
Observability is a measure of how well the internal state of a system can be inferred from its external outputs. In control theory, observability and controllability of a linear system are mathematical duals, two sides of the same coin.
In recent years, climate change has led to severe floods in some parts of the world and severe droughts in others. Many of these negative effects are caused by the way we treat our planet: emitting CO2 and other harmful gases, and using water and minerals irresponsibly. We need to improve our practices to reduce these effects.
IT is one of the areas where practices must improve. Eliminating its negative effects requires a holistic vision, and that is impossible to achieve without complete visibility of the entire IT landscape.
Today, IT is business. Virtually every business process runs on IT, and IT accounts for nearly 4% of the world’s carbon footprint, with some estimates projecting this share to rise to 13% by 2030. Generative AI and the growing use of IoT devices are among the technologies driving this increase.
To reduce the carbon footprint of IT, we need clear visibility of the entire value stream, including the entire CI/CD pipeline. Because today's CI/CD pipeline extends into production, that visibility must reach production as well.
Observability must therefore cover the whole pipeline, up to and including production. Developers need to understand early on how their code will affect production, and operations practitioners need to understand the impact of an upcoming release so they can be ready for it in production.
Observability helps us understand the inner state of a system through MELT: Metrics, Events, Logs and Traces. This gives us visibility into how the system is working. From a sustainability point of view, we can start measuring how much energy or water each piece of code, or each request served, consumes. Once we have a clear picture of how each resource is used within the system, we can estimate the carbon footprint of every customer transaction.
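As a minimal sketch of the per-transaction idea, the Python below converts the energy attributed to a service over a time window into emissions using a grid carbon-intensity factor, then divides by the number of requests served in that window. The `ServiceWindow` type, its field names and all numbers are hypothetical placeholders rather than measurements, and it assumes the hard part, attributing energy to a single service from power or utilization metrics, has already been done.

```python
from dataclasses import dataclass


@dataclass
class ServiceWindow:
    """Hypothetical per-service aggregate for one measurement window."""
    energy_kwh: float                # energy attributed to the service
    requests: int                    # requests served in the same window
    grid_intensity_g_per_kwh: float  # carbon intensity of the electricity used


def carbon_per_request_g(window: ServiceWindow) -> float:
    """Estimated grams of CO2e emitted per request served."""
    if window.requests <= 0:
        raise ValueError("requests must be positive")
    total_g = window.energy_kwh * window.grid_intensity_g_per_kwh
    return total_g / window.requests


if __name__ == "__main__":
    # Placeholder values for illustration only, not measurements.
    checkout = ServiceWindow(energy_kwh=12.0, requests=400_000,
                             grid_intensity_g_per_kwh=300.0)
    print(f"{carbon_per_request_g(checkout):.4f} gCO2e per request")
```

Under the same caveats, water usage could be estimated the same way by replacing the carbon-intensity factor with a water-per-kWh factor.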
Observability also lets us see the carbon footprint and water usage at every step of the process, from development and testing to the release and deployment of every new feature. Every incident, planned or unplanned, can be analyzed, giving us a holistic view for an overall lifecycle assessment.
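One important set of metrics we follow in site reliability engineering (SRE) is Google’s four golden signals, which will prove useful throughout the rest of this article. They are latency (the time taken to serve a request), traffic (the demand placed on the system, such as requests per second), errors (the rate of failed requests) and saturation (how full the system's most constrained resources are).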
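To make latency, traffic, errors and saturation concrete, here is an illustrative Python sketch that derives all four from a hypothetical list of request records collected over one window. The `Request` record, the choice of the 95th percentile for latency, and the use of CPU busy time divided by CPU capacity as the saturation measure are assumptions for illustration. Real systems usually read these values from their metrics backend.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    latency_ms: float  # time taken to serve the request
    ok: bool           # False if the request failed


def nearest_rank_percentile(values, p):
    """Nearest-rank percentile (p in 0..100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def golden_signals(requests, window_s, cpu_busy_s, cpu_capacity_s):
    """Summarize one observation window as the four golden signals."""
    return {
        "latency_p95_ms": nearest_rank_percentile([r.latency_ms for r in requests], 95),
        "traffic_rps": len(requests) / window_s,
        "error_rate": sum(not r.ok for r in requests) / len(requests),
        "saturation": cpu_busy_s / cpu_capacity_s,
    }


if __name__ == "__main__":
    # Hypothetical window: 20 requests over 10 s, two of them failed.
    reqs = [Request(latency_ms=10.0 * i, ok=i not in (7, 13)) for i in range(1, 21)]
    print(golden_signals(reqs, window_s=10, cpu_busy_s=30, cpu_capacity_s=60))
```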
How observability contributes to sustainable IT practices
Proactive issue detection
Observability tools provide real-time insights into system behavior, allowing teams to identify issues before they escalate and minimizing downtime and resource wastage. According to Gartner, 70% of the total cost of projects is rework. If a project's carbon footprint scales with the effort spent on it, a comparable share of that footprint comes from rework. Proactively identifying and addressing issues helps reduce such rework.
Faster incident resolution
With detailed observability, we can quickly pinpoint the root cause of incidents, leading to faster resolution and reduced impact on users. This reduces the carbon footprint, because less time and fewer resources are spent detecting and rectifying each incident.
Resource optimization
By monitoring golden-signal metrics such as latency and error rates, we can identify resource bottlenecks and optimize infrastructure so that resources are used efficiently.
Using fewer resources for the same work further reduces the carbon footprint.
Capacity planning
Observability reveals the system's resource usage patterns, enabling informed capacity planning that prevents both over-provisioning and under-provisioning.
Over-provisioned infrastructure draws power without doing useful work and enlarges the carbon footprint, so avoiding it directly reduces emissions.
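A simple way to turn usage patterns into a provisioning decision is to size for a high percentile of observed demand plus headroom. The sketch below assumes vCPU usage samples exported from monitoring, a 95th-percentile sizing rule and a 70% target utilization. These are illustrative choices rather than a standard, and the function names are hypothetical. It compares the recommendation with what is currently provisioned.

```python
import math


def nearest_rank_percentile(samples, p):
    """Nearest-rank percentile (p in 0..100) of a non-empty list."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def recommended_vcpus(used_vcpu_samples, target_utilization=0.7, p=95):
    """Size for the p-th percentile of observed demand so that, at that demand,
    the allocation runs at roughly target_utilization (the rest is headroom)."""
    demand = nearest_rank_percentile(used_vcpu_samples, p)
    return max(1, math.ceil(demand / target_utilization))


if __name__ == "__main__":
    # Hypothetical vCPU usage samples from monitoring (one per interval).
    samples = [2.1, 2.4, 3.0, 2.8, 3.5, 2.2, 2.9, 3.1, 2.6, 4.0]
    provisioned = 16
    needed = recommended_vcpus(samples)
    print(f"recommended {needed} vCPUs, provisioned {provisioned}, "
          f"surplus {provisioned - needed}")
```

With only ten samples the 95th percentile is simply the maximum. In practice you would use weeks of data and check memory and other resources before downsizing.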
Improved user experience
Monitoring user-facing metrics such as latency and traffic can help ensure a positive user experience, fostering user satisfaction and loyalty.
Lower latency generally goes hand in hand with a smaller carbon footprint. Optimizing latency and other UX aspects, such as webpage loading time and the size of page contents, reduces the resources consumed per visit.
Continuous improvement
Observability provides data for continuous feedback and improvement cycles, facilitating iterative enhancements to the system's performance and reliability.
Proactively reducing outages and continually improving the system leads to a smaller carbon footprint.
Predictive analysis
Historical data collected through observability tools can be used to predict potential performance issues and plan mitigation strategies, ensuring sustained system reliability.
According to IBM, fixing a problem in production costs 100 times more than fixing it in the early stages, because much more work is needed. Predictive analysis and chaos engineering help detect problems early, which means less work and a smaller carbon footprint.
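Predictive analysis can start very simply. The sketch below fits an ordinary least-squares line to hypothetical daily disk-usage readings and extrapolates when the volume will be full, leaving time to plan mitigation. The straight-line trend is an assumption. Real workloads often need seasonal or more robust models, and the function names are illustrative.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept (needs 2+ distinct xs)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until_full(daily_used_gb, capacity_gb):
    """Days from the last reading until the fitted trend reaches capacity,
    or None if usage is flat or shrinking."""
    days = list(range(len(daily_used_gb)))
    slope, intercept = fit_line(days, daily_used_gb)
    if slope <= 0:
        return None
    full_on_day = (capacity_gb - intercept) / slope
    return max(0.0, full_on_day - days[-1])


if __name__ == "__main__":
    usage = [500, 510, 520, 530, 540]  # hypothetical daily readings, GB
    print(days_until_full(usage, capacity_gb=1000))
```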
Automated scaling
Observability-driven insights can trigger automatic scaling based on workload fluctuations, enhancing agility and adapting to changing demands. Because resources scale up and down as required, there is less need to keep capacity provisioned for peak load at all times, which reduces the carbon footprint.
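Many autoscalers follow a proportional rule. Kubernetes' Horizontal Pod Autoscaler, for example, documents its core calculation as desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue) and skips scaling when the ratio is within a tolerance (0.1 by default). The Python below is a simplified sketch of that rule, not the HPA implementation. The min/max bounds and the CPU-utilization inputs are illustrative assumptions.

```python
import math


def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas=1, max_replicas=20, tolerance=0.1):
    """Proportional scaling rule in the style of the Kubernetes HPA."""
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: avoid flapping
    desired = math.ceil(current_replicas * ratio)
    # Keep the result inside the configured bounds.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    print(desired_replicas(10, 30, 60))  # quiet period: scale in
    print(desired_replicas(4, 90, 60))   # busy period: scale out
    print(desired_replicas(5, 63, 60))   # within tolerance: unchanged
```

The energy saving comes from the first case: scaling in during quiet periods instead of holding peak capacity all day.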
Service level objectives
Golden signals guide the definition of meaningful service level objectives (SLOs) by focusing on metrics that directly affect user experience and business outcomes.
Practical, customer-driven SLOs keep our services at a level that suits the customer. There is no need to target an unnecessarily high level, so resource utilization is optimized and the carbon footprint is reduced.
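The cost of an unnecessarily strict target is easy to quantify with an error budget, the amount of unreliability an SLO allows over a window. The sketch below assumes an availability SLO measured in minutes over a 30-day window, with illustrative function names. It shows that moving from 99.9% to 99.99% shrinks the monthly budget from about 43 minutes to about 4. Meeting that tighter budget typically requires extra redundancy and therefore extra resources.

```python
def error_budget_minutes(slo_target, window_days=30):
    """Minutes of unavailability allowed by an availability SLO over the window."""
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be a fraction between 0 and 1")
    return (1 - slo_target) * window_days * 24 * 60


def budget_remaining(slo_target, downtime_minutes, window_days=30):
    """Fraction of the error budget still unspent (negative once exhausted)."""
    budget = error_budget_minutes(slo_target, window_days)
    return (budget - downtime_minutes) / budget


if __name__ == "__main__":
    for target in (0.999, 0.9999):
        print(f"{target:.2%}: {error_budget_minutes(target):.1f} min per 30 days")
    print(f"{budget_remaining(0.999, downtime_minutes=10):.0%} of budget left")
```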
Risk mitigation
By closely monitoring error rates and saturation levels, we can mitigate risks associated with system failures and data breaches, safeguarding the organization's reputation.
This again helps optimize resources. We can also track sustainability-related metrics and manage sustainability-related risks, which further supports sustainable practice.
Energy efficiency
Observability helps identify energy-intensive processes and optimize them, contributing to more environmentally sustainable IT operations.
This lowers energy consumption and therefore the carbon footprint.
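Identifying energy-intensive processes is often a Pareto exercise, because a few workloads account for most of the consumption. The sketch below assumes per-process energy figures in kWh are already available from observability data, which is itself an assumption. It returns the smallest set of top consumers that together reach a chosen share of the total. The workload names and numbers are hypothetical.

```python
def energy_hotspots(energy_kwh_by_process, share=0.8):
    """Smallest set of top consumers whose combined energy reaches `share` of the total."""
    total = sum(energy_kwh_by_process.values())
    ranked = sorted(energy_kwh_by_process.items(), key=lambda kv: kv[1], reverse=True)
    hotspots, cumulative = [], 0.0
    for name, kwh in ranked:
        if cumulative >= share * total:
            break  # target share already covered
        hotspots.append(name)
        cumulative += kwh
    return hotspots


if __name__ == "__main__":
    # Hypothetical weekly energy per workload, in kWh.
    usage = {"batch-reporting": 120.0, "search-index": 60.0,
             "api-gateway": 15.0, "auth": 5.0}
    print(energy_hotspots(usage))
```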
Cascading failure prevention
Observability enables the identification of dependencies and potential points of failure, allowing us to implement safeguards against cascading failures.
This helps in understanding the overall impact and enables organizations to optimize holistically, resulting in more efficient and more sustainable systems.
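One common safeguard against cascading failures is a circuit breaker. After repeated failures of a dependency, callers fail fast instead of piling up retries that waste resources and spread the outage. The class below is a minimal, single-threaded sketch of the pattern. `CircuitBreaker`, `CircuitOpenError` and the thresholds are illustrative names and values, and the injectable `clock` parameter exists only to make the sketch testable. Production systems would typically rely on a library or a service mesh for this.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker is open and the call is rejected without trying."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout_s:
                raise CircuitOpenError("circuit open: failing fast")
            # Timeout elapsed: half-open, let this one trial call through.
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            # Open (or re-open after a failed trial) once the threshold is hit.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result


if __name__ == "__main__":
    breaker = CircuitBreaker(failure_threshold=2, reset_timeout_s=30.0)

    def flaky_dependency():
        raise ConnectionError("downstream service unavailable")  # simulated outage

    for attempt in range(4):
        try:
            breaker.call(flaky_dependency)
        except ConnectionError:
            print(f"attempt {attempt}: dependency failed")
        except CircuitOpenError as exc:
            print(f"attempt {attempt}: {exc}")
```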
Collaborative problem solving
Observability tools provide a common data source for cross-functional teams, enabling collaborative problem-solving and alignment between development and operations.
This reduces duplicated work and data, avoids unnecessary rework and helps reduce technical debt.
Documentation and knowledge sharing
Observability data can be used to create detailed documentation about system behavior and troubleshooting steps, promoting knowledge sharing within the team.
Feeding this data into AIOps further improves our ability to identify and rectify problems quickly, making both the system and its management more efficient and the IT practice more sustainable.
Regulatory compliance
Observability helps ensure compliance with regulatory requirements related to system performance, security and user data protection.
It also supports the reporting of Scope 2 and Scope 3 emissions as part of ESG reporting, which is becoming mandatory in many countries.
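For Scope 2, the GHG Protocol's location-based method multiplies the electricity consumed at each site by the average emission factor of the local grid, and observability data can supply the consumption side. The sketch below assumes per-site kWh totals are already available. The site names and emission factors are placeholders, not published values. Market-based accounting and Scope 3, which needs supplier and value-chain data, are outside what it covers.

```python
def scope2_location_based_tco2e(kwh_by_site, grid_factor_kg_per_kwh):
    """Location-based Scope 2 emissions in tonnes CO2e:
    sum over sites of electricity used (kWh) * grid emission factor (kg CO2e/kWh)."""
    missing = set(kwh_by_site) - set(grid_factor_kg_per_kwh)
    if missing:
        raise KeyError(f"no emission factor for: {sorted(missing)}")
    total_kg = sum(kwh * grid_factor_kg_per_kwh[site]
                   for site, kwh in kwh_by_site.items())
    return total_kg / 1000.0


if __name__ == "__main__":
    usage = {"site-a": 1_200_000, "site-b": 800_000}  # hypothetical annual kWh
    factors = {"site-a": 0.35, "site-b": 0.30}        # placeholder kg CO2e/kWh
    print(f"{scope2_location_based_tco2e(usage, factors):.1f} t CO2e")
```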
Business continuity
Observability strengthens business continuity by reducing downtime, maintaining a positive user experience and supporting critical operations even in challenging circumstances. Recently, a Microsoft data center suffered an outage because the heat exceeded what its cooling system could handle. Observability therefore matters not only within IT systems but also for monitoring external factors that affect business continuity.
Challenges
There are many limitations in how observability is practiced today. Existing challenges include a lack of end-to-end coverage, limited adoption of SLOs, and heavy use of SLAs without a focus on outcomes. Systems are distributed, so their data is scattered rather than consolidated. Observability is also seen as an Ops concern, although it is just as essential on the Dev side.
Conclusion
Incorporating observability not only enhances the reliability of IT systems but also aligns with sustainable IT practices by optimizing resource utilization and preventing unnecessary downtime or disruptions.
Since IT is now indispensable to organizations across domains and countries, the industry needs to gear up and start practicing sustainability.
illuminem Voices is a democratic space presenting the thoughts and opinions of leading Sustainability & Energy writers; their opinions do not necessarily represent those of illuminem.