Posts

Showing posts from December, 2020

What You Don't Know Can Hurt Your Uptime

Image Source: Buckminster Fuller Institute

Distributed systems are fascinating things. Like organisms, which are composed of many different and dependent parts, they fail in ways that take some interesting math to understand. No component is perfect, nor is any full system, but based on the number of dependencies and redundancies, we can estimate a system's expected uptime. Here is the math:

Given a set $M$ of components for a system, $M = \{n_0, n_1, \dots, n_t\}$, where each $n_i$ is that component's estimated uptime, the system uptime is

$$P = \prod_{i=0}^{t} n_i$$

That is, for dependent systems, we need to take the product of their estimated component uptimes. For instance:

$$P = 99.9\% \times 99.9\% = 99.8001\%$$

This is because the total system will fail if ANY of the parts fail. The conventional wisdom, "a chain is only as strong as its weakest link", is actually an overestimation! And this isn't even all the bad news. Lurking underneath our assumptions is some potentially nasty...
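The product rule in the excerpt above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `serial_uptime` is hypothetical (it does not come from the post), uptimes are expressed as fractions between 0 and 1, and component failures are assumed to be independent.

```python
from math import prod


def serial_uptime(component_uptimes):
    """Estimate uptime for a system that fails if ANY component fails.

    Hypothetical helper: multiplies the per-component uptimes together,
    assuming each is a fraction in [0, 1] and failures are independent.
    """
    uptimes = list(component_uptimes)
    for u in uptimes:
        if not 0.0 <= u <= 1.0:
            raise ValueError(f"uptime must be between 0 and 1, got {u}")
    return prod(uptimes)


# The post's own example: two dependent components at 99.9% each.
two_components = serial_uptime([0.999, 0.999])
print(f"{two_components:.4%}")  # 99.8001%

# The combined figure is lower than the weakest single component.
print(two_components < min([0.999, 0.999]))  # True
```

With more dependencies the product keeps shrinking, which is why the "weakest link" figure overestimates the uptime of the whole chain.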

Site Reliability Engineering Maturity Model

After a lot of study of the Google SRE books, I came up with this SRE framework. The idea is that you can judge a development team's reliability maturity against it. As teams adopt more of the framework's characteristics, we would expect their reliability to go up. I hope this is useful to you as you start your journey in applying these principles!

1. Do a self-assessment of the current status of your Product Team for each of the identified capabilities.
2. Define the desired end-point for the end of the next improvement cycle. A cycle can be a month, a quarter, or a semester. Every team can define its own improvement cycles, although quarterly targets are a good start because they leave enough time to define meaningful actions.
3. Identify the actions you will need to take to reach the desired end-point.

Characteristic | Crawl | Walk | Run
Risk | Observe it/Measure it (SLIs) | Set a target (SL...

Application Maturity Mental Model

TL;DR - Applications are like people*, and go through maturity phases (gestation, newborn, weaning, school, adulthood, maturity, retirement). We should encourage a strong "family" structure around our applications. While it does "take a village", a strong family structure has typically been shown to be the best indicator of maturity in people. An architect I worked with once talked about how applications are like people. They go through phases of life and maturity, from conception to heartbeat, to a screaming and messy entry into the world. Then on to feeding, walking and weaning, potty training, going off to school, adolescence, moving out of the home, college, first job, etc. This metaphor is very powerful, and with that architect's license, I would like to extend it further into how we support our applications. If we set some similes in this model, it will be helpful. Stages Anthropomorphic Equivalent Application Equivalent Notes ...

Revolutionary Ideas Evolved over Time

I’ve been listening to an audiobook about the American Revolution of the late 18th century, and it got me thinking about what we have come to call DevOps. If you are in the midst of fomenting a DevOps Revolution at your company, your patience is wavering, or you are struggling with the half-measures and compromises (“That’s not revolutionary, that is evolutionary”), indulge me for a moment while I address an older idea in a new light: Revolutionary ideas evolved over time. Let me explain. When Americans look back at the American Revolution, we generally think of a few very principled, very intelligent men deciding to go against a stodgy monarchy that was inadequate to the needs of a growing empire and a modernized world. (At least that is what I think of.) (Sound familiar? Keep reading.) We then revel in the glorious revolution of freedom and fireworks and Francis Scott Key’s anthem “that the flag was still there”. An impossible victor...