Someone once remarked about iMatix that "no matter how big the problem, they can solve it." We sometimes joke that if a project is not terrifying, it is not worth doing. And since we began operating in 1996, iMatix has never failed to complete a project, no matter how enormous and terrifying. So we must be doing something right. This paper aims to explain how we reduce "terrifying" to merely "tedious".
We don't create consumer software, so reliability is a prerequisite for every project we work on. Naturally, clients also want their projects completed efficiently, accurately, and on time, but reliability is always the most important and the most challenging criterion.
In this white paper I'll describe how iMatix creates fully dependable software systems. We've been doing this for years, and the fundamentals hold true across all areas of software design, development, and deployment. Designing for reliability also decreases the likelihood of functional failures and of cost and schedule overruns, and it makes the project easier to manage overall.
In our experience, a system's reliability is driven by two interrelated factors: the system's overall complexity and the reliability of each individual component. In general, more complex systems have less reliable components, and we believe this is direct cause and effect: complex architectures make it harder for developers to build trustworthy components.
Mastering complexity is therefore the primary challenge that every software architect must overcome, yet only the good architects realise this. Any purely technical problem can be solved by devoting enough time, energy, and resources. But a badly architected project can crumble under its own weight, no matter how well funded it is. Contrary to popular belief, complexity is not a sign of value, and simplicity is not a sign of naivety. Simple systems are hard to build, and complex systems are easy to build.
I'll go over the key principles that guide our work on large systems. Where I can, I'll include counterexamples to show how breaking a principle leads to problems.
Buildings Do Not Just Happen
Although it ought to go without saying, people frequently overlook this: a successful building is the work of a talented architect who puts real effort into the design. Just as a lively, livable city is the result of generations of expert planning, and a mega-slum the outcome of unregulated growth, software systems built without architects inevitably turn into slums.
There are many well-known cases in which no architect was assigned to oversee the overall design.
Consider, as one example, the AMQP project of 2006–2007. Numerous businesses contributed pro bono to this workgroup. The workgroup received an original version of the protocol from iMatix, which we believe was well designed (we were, after all, its architects). By all accounts this version of AMQP is straightforward to understand and use, reliable, and overall a product we are proud of.
Since then, the workgroup has been characterised by a lack of an architect and a lack of architecture. The design has grown incredibly intricate, and the process itself is now bogged down by the protocol's own complexity.
The question "do we need an architecture?" generated heated debate in this workgroup. With vendors competing to dominate and to contribute to the protocol, no one person could be placed in charge of architecture. Detailed design and architecture became entangled. And so on.
Just as the AMQP/0.8 release is an example of a well-architected protocol, the AMQP/0.9 workgroup release is an example of what happens when one permits a slum to grow in the middle of a city.
Architecture in a software system does not mean micromanagement. On the contrary, it creates considerably more room for delegation and local ownership. The wider software community offers many examples of well-architected systems that stay stable and simple (relative to the problem they tackle) even under unforeseen and unpredictable change: Debian GNU/Linux package management, Wikipedia, and Unix.
The term "software engineer" implies that software is a substance, much like carbon fibre or steel. In reality, creating software is far more akin to cooking meals, designing clothes, or building traditional homes and workplaces. We certainly deal with technical problems, but we do so in a human context. People have limitations, make mistakes, and occasionally need help. This is true both for the people who build the software and for those who use it.
Above all, people are terrible at recalling arbitrary complexity. We like routines, things we can understand and anticipate, and rules we can rely on.
At iMatix, we assume that every developer, especially a highly skilled one who works quickly and intuitively, makes mistakes at a rate of one or two errors per ten lines of code. The goal of the development process is to find these mistakes as early as possible, ideally catching all of them before the code is committed. We know from experience that writing flawless code is impossible, so we never assume it. Instead, we design our tools and development environment to account for this.
A systems architect faces the same challenge: presume from the start that every component is flawed. If you want to find those mistakes as early as possible, give developers all the help they need to identify and correct them. How can we support developers in making sure their work is accurate?
These are rhetorical questions, and I answer them in the pages that follow. I hope you will come to agree that our real work involves addressing human deficiencies rather than technical ones.
As a counterexample from a field other than software, consider city architecture. The 20th century offers many examples of aesthetically striking yet cruel residential and office architecture: apartment towers that turned into ghettos; historic city centres demolished for the sake of modernity; business districts that shut down every nightfall as the workers left for home; roads described as "long-term parking lots".
Like a diamond cutter dividing a large stone into small pieces, a skilled architect begins by cutting large problems into smaller ones. The knife is guided by both individual and collective experience. An old joke about the kind of problem computer scientists tackled in the 20th century asks, "How many people does it take to write an N-pass compiler?" The answer: "N + 1, one person per pass plus a project leader."
It took compiler writers a long time to figure out how to divide the problem into "passes" that different engineers could tackle separately. The best way to divide a complicated problem is rarely obvious.
But there are some broad guidelines that work well:
The pieces must fit the people. Ultimately, architecture is about mapping a problem onto people. A problem is well sized if one developer or a small team can solve it, and badly sized if it can only be solved by many teams working together.
Interfaces define the fracture points. It is best to cut where the interfaces are simplest. Each interface becomes a binding contract between teams or engineers, and the best contract is the simplest one possible.
Every problem can be broken down. A good architect can divide any complicated problem into manageable pieces. Sometimes this takes lateral thinking, but we have never met a large problem that could not be decomposed.
The architecture is a contract. It must be explicit enough to establish boundaries, between teams and between organisational tiers, that are never crossed except through predetermined interfaces.
Encapsulate the process of change. The architecture should cleanly package change so that the overall system can be both dynamic and stable. (A sketch after this list shows one way such a contract can look in code.)
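To make the idea of a contract concrete, here is a small, purely hypothetical sketch in C: the header, the names, and the interface are invented for illustration, not taken from any iMatix system. Consumers depend only on this header; any implementation that fills in the table can change freely behind it, which is one way to encapsulate change:

    /*  store.h: a hypothetical contract between a "storage" team and
     *  its consumers. Callers see only this table of functions; any
     *  implementation that fills it in may change freely underneath.
     */
    #ifndef STORE_H_INCLUDED
    #define STORE_H_INCLUDED

    #include <stddef.h>

    typedef struct {
        /*  Store a value under a key; returns 0 on success, -1 on error  */
        int (*put) (const char *key, const void *data, size_t size);
        /*  Fetch a value; returns bytes copied, or -1 if the key is absent  */
        int (*get) (const char *key, void *buffer, size_t max_size);
    } store_interface_t;

    /*  Each implementation exports exactly one of these tables  */
    const store_interface_t *store_interface (void);

    #endif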
The first thing an architect produces is the overall architecture, which shows the different components and how they interact. If the architecture is overly intricate, with too many lines or too many parts, the job is not done.
One of my rules is that the architecture of any system, no matter how large, can fit on a single page. And I do not mean by using a 3pt typeface.
The Linux kernel is an excellent illustration of a successful architecture. It defines abstractions like "kernel modules" that break a massive problem into manageable pieces. It establishes contracts between layers to ensure the longevity of code. And the process of change is decoupled, so new functionality can be added at any time, even to running kernels.
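To give a concrete taste of that contract, here is a minimal loadable kernel module in C. It is the standard textbook skeleton rather than anything iMatix-specific: every module signs the same agreement with the kernel by declaring an entry point, an exit point, and a licence, and the kernel can then load and unload it without knowing anything else about it.

    /*  hello.c: a minimal Linux kernel module, showing the contract
     *  every module signs with the kernel: an init function called at
     *  load time, an exit function called at unload time, and metadata.
     */
    #include <linux/init.h>
    #include <linux/kernel.h>
    #include <linux/module.h>

    MODULE_LICENSE ("GPL");
    MODULE_DESCRIPTION ("Minimal example of the kernel module contract");

    static int __init hello_init (void)
    {
        printk (KERN_INFO "hello: module loaded\n");
        return 0;   /*  Zero tells the kernel the load succeeded  */
    }

    static void __exit hello_exit (void)
    {
        printk (KERN_INFO "hello: module unloaded\n");
    }

    module_init (hello_init);
    module_exit (hello_exit);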
As a broader illustration, I'll contrast two well-known operating systems, Windows and Unix. Windows is characterised by large monolithic programs (such as SQL Server) that handle everything from user interface to security in one bundle. Unix is characterised by pipes, libraries, and other generic components that can be assembled according to a set of well-defined principles.
The Unix model, adopted by FreeBSD and Linux among others, is probably the most successful operating system design ever.
Windows, by contrast, remains persistently expensive to develop, with poor performance, security, and dependability. As of 2008, the RAM market is even in danger of collapse because Windows Vista failed to generate the anticipated sales.
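To make the Unix model concrete, here is a small illustrative sketch in C of its central idea: two components that know nothing about each other beyond a byte-stream contract, connected by a pipe. (Real pipelines are normally composed in the shell; this just shows the mechanism.)

    /*  pipeline.c: two components connected by a pipe; each knows
     *  nothing about the other beyond the byte-stream contract.
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main (void)
    {
        int fds [2];
        if (pipe (fds) == -1) {
            perror ("pipe");
            exit (EXIT_FAILURE);
        }
        pid_t pid = fork ();
        if (pid == -1) {
            perror ("fork");
            exit (EXIT_FAILURE);
        }
        if (pid == 0) {
            /*  Child is the producer: it only writes to the pipe  */
            close (fds [0]);
            const char *message = "hello through the contract\n";
            write (fds [1], message, strlen (message));
            close (fds [1]);
            exit (EXIT_SUCCESS);
        }
        /*  Parent is the consumer: it only reads from the pipe  */
        close (fds [1]);
        char buffer [128];
        ssize_t size;
        while ((size = read (fds [0], buffer, sizeof (buffer))) > 0)
            fwrite (buffer, 1, (size_t) size, stdout);
        close (fds [0]);
        wait (NULL);
        return 0;
    }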
Formalise the Interfaces
Every interface is a contract between two parties (or rather, two classes of party), so it must be formalised as far as is reasonable. We usually formalise some or all of the following characteristics of an interface:
what information (the semantics) the two parties exchange;
the syntax used to exchange that information;
how we respond to different types of failure;
how we extend the interface, negotiate options, and report status;
how the interface changes over time (a sketch of such a contract in code follows below).
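As a purely hypothetical example of what such formalisation can look like in code (none of these names come from a real product), here is a C header that pins down the syntax as function signatures, the failure behaviour as explicit result codes, and evolution as a version negotiated when the link is opened:

    /*  link.h: hypothetical contract for a point-to-point message link.
     *  The header formalises the syntax (signatures), the failure
     *  behaviour (explicit result codes), and evolution (a version
     *  that both sides negotiate when the link is opened).
     */
    #ifndef LINK_H_INCLUDED
    #define LINK_H_INCLUDED

    #include <stddef.h>

    /*  Failure semantics: every call returns one of these codes  */
    typedef enum {
        LINK_OK        = 0,
        LINK_TIMEOUT   = -1,
        LINK_CLOSED    = -2,
        LINK_TOO_LARGE = -3
    } link_result_t;

    typedef struct link_t link_t;   /*  Opaque to callers  */

    /*  Negotiation: both sides agree on the highest common version;
     *  returns NULL if no common version exists.
     */
    link_t *link_open (const char *endpoint, int max_version);

    /*  Semantics: deliver exactly one message, whole or not at all  */
    link_result_t link_send (link_t *self, const void *data, size_t size);
    link_result_t link_recv (link_t *self, void *buffer, size_t max_size);

    /*  Orderly shutdown is part of the contract too  */
    void link_close (link_t *self);

    #endif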
A good interface is regular, predictable, entirely consistent, and altogether easy to comprehend. Applications with "complex" interfaces tend to be unstable. A good interface is also strict: when anything goes wrong, there are no grey areas and no tolerance. We want a system's problems to be discovered as early as possible.
Sadly, examples of convoluted and unreliable interfaces abound. Simple, effective ones are much harder to find.
Identify and Remove Risk
Reliable software typically requires more time to build but costs significantly less to maintain.
Reduce Dependencies
One of the main problems with complex structures is that changing one component can have unintended consequences elsewhere. You may have heard programmers describe this happening in their code; it is a sign that the code was not properly designed.
By removing dependencies between components, we make architectures simpler to modify. Imagine if one Linux kernel module depended on other kernel modules: any change to one could have unexpected effects on the others.
The crucial distinction here is between "dependency" and "interface". When two parts need to communicate, they do so through a formal interface that can be fully specified and that allows each part to be tested in vitro.
Two applications using the same database tables, by contrast, constitute an undesirable dependency, as the sketch below illustrates.
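The following schematic (with invented names) contrasts the two styles in C. The first fragment couples two components through shared state, the in-code equivalent of two applications sharing database tables; the second routes the same communication through a declared interface that can be specified and faked:

    /*  Two ways for components A and B to share customer data.
     *  All names are invented; this is a schematic, not real code.
     */

    /*  BAD: a hidden dependency. Both components read and write the
     *  same global array (the in-code equivalent of two applications
     *  sharing database tables). Changing its layout breaks both.
     */
    struct customer_record {
        char name [64];
        int  balance;
    } shared_customers [1000];      /*  touched directly by A and B  */

    /*  BETTER: a declared interface. B never sees A's storage; it
     *  calls functions with specified behaviour. A may change its
     *  internals, and a test can substitute a fake implementation.
     */
    int customer_balance (const char *name);              /*  -1 if unknown  */
    int customer_deposit (const char *name, int amount);  /*  0 on success   */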
Ensure Full Testability
People sometimes distinguish between "unit tests" and "integration tests"; in our opinion the two should amount to the same thing. That is, a unit should behave identically whether it runs in a lab harness or in the real architecture.
Once formal interfaces are defined, it is quite simple to create test frameworks that implement each half of them. Taking Linux kernel modules as an example again: I can create a fake module that lets me test the kernel, and a fake kernel that lets me test a module. I would expect a tested kernel and a tested module to cooperate flawlessly. If they do not, either the interface or the tests are flawed; either way I can improve them, so the next module I develop will arrive fully tested.
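Here is a minimal sketch of that pattern in C, using an invented "clock" contract: the consumer depends only on the interface, so the test can substitute a fake for the real implementation and the consumer cannot tell the difference.

    /*  testable.c: sketch of testing each half of a formal interface.
     *  The "clock" contract is hypothetical; the point is that the
     *  consumer (is_expired) never knows whether it is talking to the
     *  real clock or to the fake used by the test.
     */
    #include <assert.h>
    #include <stdio.h>
    #include <time.h>

    /*  The contract: anything that can report the current time  */
    typedef long (*clock_fn) (void);

    /*  The consumer under test depends only on the contract  */
    static int is_expired (long deadline, clock_fn now)
    {
        return now () > deadline;
    }

    /*  Real half of the interface, used in production  */
    static long real_clock (void)
    {
        return (long) time (NULL);
    }

    /*  Fake half of the interface, used in the lab  */
    static long fake_now;
    static long fake_clock (void)
    {
        return fake_now;
    }

    int main (void)
    {
        /*  In vitro: drive the consumer through the fake clock  */
        fake_now = 99;
        assert (!is_expired (100, fake_clock));
        fake_now = 101;
        assert ( is_expired (100, fake_clock));
        puts ("consumer behaves identically in the lab");

        /*  In vivo it is simply wired to the real clock instead  */
        (void) is_expired (0, real_clock);
        return 0;
    }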
Creating such test frameworks can be costly, but it lets teams deliver nearly error-free code and pursue complete reliability.
In this white paper I have outlined our philosophy at iMatix: reliability is the result of good design, which means paying deliberate, careful attention to a system's overall architecture and to the relationships between its components.
I have also described some of the key techniques we use to build more reliable systems while solving enormous problems.
Please get in touch with me at [email protected] if you want additional details.