Since the early days of the internet, we have had two broad categories of software people: application developers, those entrusted to build new products and add features to existing ones; and operations, those tasked with keeping the live system running and healthy.
The reason for this division is simple. Organizations that develop and manage software need to do two things simultaneously: a) maintain a working system that customers or internal users depend on, and b) change that system in novel and unpredictable ways.
Right away, we can see a conflict. These two mandates are completely at odds because making any change to a running system introduces risk.
To demonstrate this basic paradox, I present the following hypothetical scenarios:
1. If we were to implement a permanent code freeze, we could ensure that our system would be stable — or at least, it would fail within known parameters. We could sleep soundly at night, knowing that no changes would be made. Of course, we’d still need to upgrade for security patches, but the range of potential changes would be quite limited.
2. If we were to do away with our testing, release, and deploy processes entirely, we could effortlessly push changes out to production and get features in front of users at will. Sometimes this is called “cowboy coding.” Product teams would be amazed at how quickly features make it into production, but we would inevitably have quality problems due to the breakneck pace of change in our system.
There are problems with both of these hypotheticals. In scenario 1, productivity would grind to a halt, and the company would quietly lose market share as it failed to deliver new value to customers. In scenario 2, the company would gradually lose customers or even find itself in breach of contract due to system stability problems.
The DevOps Solution
To be glib, DevOps = development + operations. We combine these two roles into one. We are no longer allowed to do one but not the other.
DevOps asks us to step back and re-evaluate the way we have framed our mandates. Instead of two mandates, paradoxically opposed, what if we incorporated both sides of the paradox into a single mandate? If our one single mandate is to produce a stable system that can be rapidly evolved to meet changing requirements, we will quickly come to a different conclusion about how to define roles within the team.
Under DevOps, we combine the ‘developer’ mentality and the ‘operations’ mentality into one. We no longer think in terms of “how can I get this into production without being blocked by operations” or “how can I prevent this team from pushing their risky feature into production on Friday afternoon.” These attitudes both speak of misaligned objectives.
DevOps leads us to questions like, “how can I create the tools and processes to allow myself and others to deploy to production as soon as I have verified my changes?” There is a lot implied in this shift of attitude:
- Every developer should be authorized to release code to production.
- Waterfall and micro-waterfall processes are out the window. None of this works if we can only make very coarse changes to our system.
- Our SCM branching model must be set up to accommodate a CI/CD process. The `master` branch (or some equivalent) must be deployable at all times. Other branches should be merged into `master` as soon as possible, ideally within a day or two.
- We must make small, granular changes to code. No gigantic 1000+ line commits.
- We must allow every developer to have visibility into the running production system. So we need solid monitoring and logging. Today, we might look into observability solutions to ask complex questions of the running system.
- Developers must have access to production-like testing environments to verify changes before they hit production. If we have no means of testing changes in a pre-prod environment that closely resembles production, we cannot have confidence that our code will work in production.
- Automated tests are a must, not only for small-scale application changes, but also system-level components, like feature tests that verify API contracts. Even infrastructure changes should be tested.
- Every developer should be able to deploy to production and support their own running production system. There is no more “throwing it over the fence” to an operations team. We may have Site Reliability Engineers (SREs) to provide front line production support, but developers should have the skills and knowledge to provide support if needed.
- On the other hand, time spent waiting for another team to “sign off” on a deploy is wasted time. We don’t want to bottleneck the team’s productivity.
In short, under the DevOps model, everyone is responsible for the full end-to-end process of software development. Engineers work on verticals, not on isolated components that must be updated one at a time by completely different groups of humans. We still have processes to verify code before it hits production, but these processes are largely automated, and each engineer has control of that process.
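To make the list above a little more concrete, here is a minimal sketch of what an automated deploy gate might look like. Everything in it is an assumption for illustration: the `ChangeSet` fields, the function names, and the exact thresholds are hypothetical and loosely borrowed from the bullets above, not taken from any real CI/CD tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Thresholds borrowed loosely from the list above; the exact values are assumptions.
MAX_CHANGED_LINES = 1000            # "No gigantic 1000+ line commits"
MAX_BRANCH_AGE = timedelta(days=2)  # merge "ideally within a day or two"


@dataclass
class ChangeSet:
    """Hypothetical summary of a branch waiting to be merged and deployed."""
    author: str
    lines_changed: int
    branch_created: datetime
    tests_passed: bool
    verified_in_preprod: bool


def gate_violations(change: ChangeSet, now: datetime) -> list[str]:
    """Return the reasons a change should not ship yet (an empty list means go)."""
    problems = []
    if not change.tests_passed:
        problems.append("automated tests are failing")
    if not change.verified_in_preprod:
        problems.append("change has not been verified in a production-like environment")
    if change.lines_changed >= MAX_CHANGED_LINES:
        problems.append(f"change touches {change.lines_changed} lines; split it up")
    if now - change.branch_created > MAX_BRANCH_AGE:
        problems.append("branch is older than two days; merge sooner")
    return problems


def can_deploy(change: ChangeSet, now: datetime) -> bool:
    # Note what is absent: there is no check for a sign-off from another team.
    return not gate_violations(change, now)
```

The point of the sketch is that every check is something the author of the change can satisfy on their own. The gate is strict, but nobody has to wait on a different group of humans to get through it.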
DevOps In Reality
In theory, we’ve all accepted DevOps into our hearts and minds at this point. There are probably not many developers who have not heard the arguments against waterfall development. There are entire conferences devoted to DevOps as a practice, and dozens of books written on the subject.
Yet the reality is different from the ideal. Think for a moment about the way that your team operates today. I’m willing to bet that most teams come up far short of this utopian vision. Ask yourself the following questions:
- Do you have a separate “DevOps engineer” role that manages all the tooling and deploy pipelines for your development teams? Does this role look suspiciously like a dedicated ops role?
- Do the developers who write application code have any idea how the production system actually works? How far would they get debugging an issue in production on their own? Do they have access to the logs and monitoring, and if so, do they even know what to look for?
- If your system ran into a critical error at 2 AM on a Saturday, would there be a developer on-call to receive the alert and deploy a fix to production without assistance? Or would that be someone else’s job… someone whose actual responsibilities bear a curious resemblance to those of a sysadmin?
Why The Revolution Stalled
I think there are a few practical challenges to making the dream of DevOps a reality.
- Muscle memory. Developers have been trained to fear production. Operations engineers have learned to avoid application code lest they become unwitting owners of a stale codebase. The old patterns die hard.
- People like to specialize. Humans are not interchangeable components that can be deployed to any task at all. Many experienced developers have spent the bulk of their careers working in one area of focus, and we naturally tend to continue doing what we like doing. When we need to do something in another area of the system, our instinct is naturally to reach out to someone who is already an expert rather than learn it ourselves. And every hour that we spend doing something we’re not already good at detracts from our main skill set where we are much more effective.
- Tooling churn. In the last decade or so, many teams have gone from running software in on-prem or leased data centers, to the cloud, and now to cloud native platforms like Kubernetes. Each of these transitions adds another layer of abstraction, tooling, and skills that must be acquired. Even technologies that now feel like stable ground may be superseded by yet another management layer in a few years.
However, I think the biggest reason that DevOps has been much harder to achieve in reality is that it asks us to do more, not less. The widening of responsibilities carries implications for how organizations hire and train technology workers. We may simplify the process of developing software by co-locating ops and development within a single brain, but this simplicity in process comes at the cost of increasing complexity of work.
In other words, a developer in a true DevOps world has a harder job to do than a developer in a typical technology organization, where responsibilities are horizontally delineated. Corporate hiring practices typically try to make jobs as simple as possible, to the point that the role can be defined with a set of bullet points. The simpler a job is, the easier it is to staff; the easier it is to staff, the less we must pay for competitive talent.
So the economics, unfortunately, lean toward horizontal job specialization, rather than hyper-productive verticals.
Promise for the Future
The cloud native movement is the most promising trend I have seen toward reviving the stalled DevOps revolution. Cloud platforms provide APIs and tooling that let developers create infrastructure with little knowledge of how servers and databases are configured. As cloud native solutions like Kubernetes become more widespread, we may approach a point of convergence where typical developers can be reasonably assumed to have enough tooling to be productive “end to end.”
In the meantime, we can help bring about the ‘revolution’ by taking on the DevOps mentality within our own teams and organizations. Take to heart the single mandate of software development: to produce a stable system that can be rapidly evolved to meet changing requirements.