What should you do when you have a large, important system with an unknown structure?
This matters for architects because they will not always design a system from scratch; they will often inherit existing systems that may not be well understood.
The iterative SDLC. In a typical software development lifecycle, new systems are designed, implemented, and tested, then go into maintenance mode. For most successful systems, however, this lifecycle is actually “iterative”. That is, the system may be in maintenance mode for a while, but eventually we need to create a replacement or a major upgrade.
How often are major upgrades done? Major upgrades and rewrites are probably more common than new systems. When client/server computing was new, many unique systems were being created. The more common scenario in 2019 is an upgrade or rewrite to take advantage of new technology or to add major new features.
Lost Architecture? In many cases, there is no documented or understood architecture for an existing system. There are several common reasons for this, including lost documentation, the original developers having left the organization, or the original documentation simply being out of sync with the current system.
Effects of “Lost” Architecture. When the architecture of a system is not understood, these bad things start happening:
(1) Maintenance. Maintenance becomes slow and difficult.
(2) Defects. The code gets buggier and buggier at an accelerating rate.
(3) Expensive. Maintaining and supporting the system may not be economically viable.
(4) Fragility. Things often get so bad that the organization may not even want to take the risk of making changes to the code any more – the results are just too unpredictable.
(5) Potentially Dangerous. The existing code may be insecure or even present a danger to others. Anyone who does not believe this is true need look no further than the recent case of Theranos, the now-defunct medical testing firm that put lives at risk with faulty blood testing.
KEY POINT: In real world software engineering, you are often handed a giant intractable mess of code. You have to know how to deal with that situation.
Subject Matter Experts (SMEs)
Much of the work will be technical analysis of technical work products. In most cases, however, there will be SMEs that you will want to consult. The SMEs may not be architects; they may instead be developers, testers, IT personnel, support personnel, security personnel, DBAs, or business analysts.
How important are the SMEs? If the architecture is not already documented and understood, it still exists, but it is hidden in the technical artifacts and in the fading memories of SMEs. Many of these SMEs know a great deal but may not appreciate the value of their organizational knowledge, so you need to take a structured approach to documenting it.
How and when to work with SMEs? You will, of course, want to work with SMEs throughout the reverse engineering effort, but keep in mind that everyone’s time is valuable and their patience is limited. You don’t want to approach SMEs and ask, “Where do I begin?” Rather, once you have taken a pass over a set of technical deliverables and have some organized information, you will have specific questions. Then and only then should you engage the SMEs.
Whiteboard sessions can be effective. You can approach SMEs individually or in small groups and simply start drawing the designs on a whiteboard. As you draw, prompt the SMEs to correct any wrong assumptions and to fill in the gaps. Because you are starting with concrete knowledge, this typically prompts the SMEs to add valuable information you would not have thought to ask about.
Determining the original intent. Perhaps the most valuable thing SMEs can help with is determining the original intent of an architect who is no longer with the organization. There are typically other people, either technical or business, who know a great deal about that original intent.
KEY POINT: Understanding the intent of the original architect is a key success factor in architecture reverse engineering.
The reluctant SME. Some SMEs may be reluctant to work with you. They may feel that they are too busy or that talking with you is not important, or perhaps they are just shy. Those are simply rationalizations. The simple truth is that many IT personnel believe hoarding knowledge gives them better job security. This is, of course, often true in poorly run organizations. The bottom line is that the SMEs need to talk to you – it is part of their job. This is one of the rare occasions when you really should escalate to management if you are not getting the cooperation you need. In fact, you should already have approached their managers to ensure that the SMEs who report to them will cooperate smoothly.
What if you simply don’t get the cooperation you need? SMEs can tell you where to look, and they can tell you where the rules of the architecture have been compromised. You can complete the reverse engineering without SMEs, but you will spend more time backtracking from initial guesses.
KEY POINT: Engaging with SMEs is sufficiently important that you need to secure their cooperation even if it requires exercising influence with their management to do so.
Timing the project request. Reverse engineering an architecture is expensive – the costs have to be justified. If a system is in the maintenance phase and may eventually be phased out, then reverse engineering the architecture is probably not a wise use of time and money. Reverse engineering makes sense when there are major upgrade or replacement plans rather than just an extended maintenance phase.
It is the architecture that is important. In many cases, the system under consideration is an important product that makes a lot of money, has a lot of customers, or will otherwise be around for a long time for some business reason. For any complex system, understanding the design is the foundation of smooth and cost-effective operation.
A farming analogy. A clean architecture as the cornerstone of success is not unique to software development; it matters for all engineered products. In fact, the same principle applies to farming. If you have a farm with milk cows, you may be able to get by for a while just letting the cows eat grass and collecting the milk. Dairy farms that are successful over the long term, however, apply scientific principles, modern best practices, and wise capital investment to build a sustainable business. Good farmers know – as should software architects – that simply “milking the cows” is not a sustainable strategy.
Organizational inertia. In many cases, organizations are reluctant to invest in reverse engineering the architecture of an existing system. There are several reasons for this:
(1) No resources. In many cases, the organization simply does not have the time and money to invest in such a reverse engineering project even though it might otherwise be a profitable idea.
(2) More pressing concerns. Even if the organization has well-qualified people to do the reverse engineering, there is often a perception of more pressing problems. For instance, if a customer is complaining about a bug or there is a security problem, reverse engineering may not be a current priority even if the organization has the resources to do it.
(3) No trust in software development. In many cases, the software development organization may want to take on this important effort, but if the system already has the kinds of problems described above, the business sponsors may have lost confidence in the software development team. Taking on new initiatives without an “immediate” return on investment can then be a hard sell.
(4) No understanding. In many cases, the key decision-makers may not be very technical. They may not even understand the concept of software architecture and how it adds value.
(5) No architect. This is, oddly enough, the most common reason for not documenting the software architecture. Even if you want to reverse engineer the architecture of a system, it is hard to find someone who is qualified to do it.
Reverse Engineering Deliverables
There are two main deliverables of a reverse engineering effort:
(1) Documentation. Document the architecture of the existing system.
(2) Conformance assessment. Ensure that the system that was built is the system that was designed.
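The conformance assessment can be made concrete by comparing the dependencies the design permits against the dependencies actually observed in the implementation. The Python sketch below assumes a hypothetical three-layer design (`ui`, `service`, `data`) and a list of layer-to-layer edges that has already been extracted by some other means; the layer names and the `ALLOWED` rule table are illustrative, not the output of any particular tool.

```python
# Hypothetical design rules: each layer may depend only on the layers listed.
ALLOWED = {
    "ui": {"service"},
    "service": {"data"},
    "data": set(),
}


def find_violations(observed_edges, allowed=ALLOWED):
    """Return the observed (source, target) layer edges the design does not permit.

    Edges within a single layer are treated as allowed. A source layer that is
    missing from the rules is treated as allowed to depend on nothing.
    """
    violations = []
    for source, target in sorted(set(observed_edges)):
        if source == target:
            continue
        if target not in allowed.get(source, set()):
            violations.append((source, target))
    return violations


if __name__ == "__main__":
    # Hypothetical edges extracted from the implemented system.
    observed = [("ui", "service"), ("service", "data"), ("ui", "data"), ("data", "data")]
    for source, target in find_violations(observed):
        print(f"violation: {source} -> {target}")
```

Each reported edge is a question for the SMEs: was it a deliberate exception to the design, or an erosion of it?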
Reverse Engineering Phases
There are typically four phases for reverse engineering an architecture. These phases are iterative. That is, you typically work through the various phases, but often return to the first phase and iterate through the process again as you learn new things.
(1) Raw design view extraction. Collect information about both the static structures – typically code and database – and the runtime behavior.
(2) Organize design views. The main problem with the first phase is not that there isn’t enough information – the more common problem is that there is too much information. The extracted views need to be structured in layers, just as they would be for a new architecture.
(3) Design view fusion. By this phase, you will have organized design views. For instance, you will have lists of source code files, features, database tables, network connections, disk files, and user interface components. None of these information sets has much value on its own. Rather, the views need to be combined to create clean abstract representations of the system.
(4) Architecture analysis. By this phase, you should have a clean set of system diagrams or other abstract system descriptions. You now use that clean description to assess where the implemented software does and does not conform to the expected functionality and architectural qualities.
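To make the first three phases more concrete, here is a minimal sketch that extracts a raw static view (module imports) from Python source using the standard-library `ast` module, then collapses it into layer-to-layer edges. It assumes, purely for illustration, a Python code base whose top-level packages `ui`, `service`, and `data` correspond to architectural layers. Real systems mix languages and rarely map this cleanly, so `top_package_layer` is a hypothetical stand-in for whatever module-to-layer mapping your analysis produces.

```python
import ast
from collections import defaultdict
from pathlib import Path


def extract_imports(source):
    """Phase 1 (raw extraction): return the absolute module names one file imports.

    Relative imports (from . import x) are skipped in this sketch.
    """
    tree = ast.parse(source)
    imported = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                imported.add(alias.name)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            imported.add(node.module)
    return imported


def load_sources(root):
    """Read every .py file under root, keyed by a dotted name derived from its path."""
    root = Path(root)
    sources = {}
    for path in root.rglob("*.py"):
        module = ".".join(path.relative_to(root).with_suffix("").parts)
        sources[module] = path.read_text(encoding="utf-8")
    return sources


def organize_by_layer(sources, layer_of):
    """Phases 2-3: collapse module-level imports into layer-to-layer edges.

    layer_of maps a dotted module name to a layer name, or to None for modules
    (such as the standard library or third-party packages) outside the system.
    """
    edges = defaultdict(set)
    for module, source in sources.items():
        src_layer = layer_of(module)
        if src_layer is None:
            continue
        for imported in extract_imports(source):
            dst_layer = layer_of(imported)
            if dst_layer is not None and dst_layer != src_layer:
                edges[src_layer].add(dst_layer)
    return {layer: sorted(targets) for layer, targets in edges.items()}


def top_package_layer(module):
    """Hypothetical convention: the top-level package name is the layer."""
    top = module.split(".")[0]
    return top if top in {"ui", "service", "data"} else None


if __name__ == "__main__":
    # In practice: sources = load_sources("path/to/legacy/src")
    sources = {
        "ui.views": "from service.orders import place_order\nimport json\n",
        "service.orders": "from data.repo import save\n",
        "data.repo": "import sqlite3\n",
    }
    print(organize_by_layer(sources, top_package_layer))
    # {'ui': ['service'], 'service': ['data']}
```

The resulting layer edges are exactly the kind of input the architecture analysis phase can check against the intended design.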
Tools and Techniques
“Advanced” developer skills. In many of my architecture posts, I emphasize that you are not a legitimate software architect unless you are a developer as well. This is especially true when reverse engineering a software architecture. Naturally, you have to be a very good developer to understand the artifacts. More importantly, however, you need to know enough about software development to make good guesses about the intent of the original authors of the code, including where common mistakes are likely to have been made.
Automated tools vs manual analysis. Manual analysis is both time-consuming and error-prone. There are many automated tools available to help you reverse engineer an architecture. Some of them are very expensive, but most of the expensive tools are not worth the cost. Generally speaking, you should start with the basic reverse engineering features built into IDEs like Visual Studio and common add-ins like ReSharper. Those tools will get you part of the way; then you have to do some additional manual analysis.
KEY POINT: You will always have to do some manual analysis, but you should use automated tools for at least part of the analysis.
Good and bad automated tool analysis diagrams. Automated tools vary widely in quality. It is not unusual for a tool to generate a diagram like the first diagram below. That diagram is comprehensive, but it is not very useful. The second diagram below has limited content, but it communicates helpful information. The second diagram was produced with ReSharper in a few mouse clicks and a matter of seconds.
Analyzing runtime behavior. When analyzing a system, it is typical to think only about its static structures, typically code and database schemas, rather than its runtime behavior. It is also easier to analyze and reason about the static structures than about the runtime behavior. Nonetheless, you cannot fully understand the architecture without analyzing the runtime behavior.
Runtime behavior from a technical perspective. When we speak about runtime analysis of a system, we are not referring to interacting with the system through the user interface as a user would. That can be helpful and perhaps necessary, of course, but it does not need to be done much. An experienced developer often gets a better sense of how the user interface operates from even briefly examining the code than from actually running the application.
KEY POINT: Experienced developers can visualize the user experience simply by examining the underlying source code.
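The difference between a comprehensive diagram and a useful one is largely a matter of abstraction level. As a hedged illustration (not a description of how ReSharper or any other tool produces its diagrams), the Python sketch below takes module-level dependency edges, collapses them to package level, drops edges inside a package, and emits Graphviz DOT text that can be rendered with the `dot` tool. The package names in the example are hypothetical.

```python
def to_package_dot(module_edges, depth=1):
    """Render module-level dependency edges as a package-level Graphviz DOT graph.

    depth is the number of leading dotted-name segments that identify a package.
    Edges between modules of the same package are dropped to keep the diagram small.
    """

    def package(name):
        return ".".join(name.split(".")[:depth])

    package_edges = set()
    for source, target in module_edges:
        src, dst = package(source), package(target)
        if src != dst:
            package_edges.add((src, dst))

    lines = ["digraph dependencies {"]
    for src, dst in sorted(package_edges):
        lines.append(f'    "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical module-level edges, e.g. from an import or call-graph extractor.
    edges = [
        ("billing.invoice", "billing.tax"),
        ("billing.invoice", "customers.model"),
        ("billing.tax", "customers.model"),
        ("web.routes", "billing.invoice"),
    ]
    print(to_package_dot(edges))
```

Four module-level edges reduce to two package-level arrows here; on a real system the reduction is usually far more dramatic, which is what makes the smaller diagram readable.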
Instrumentation is key. Architecture reverse engineering is a good time to add additional logging and monitoring in order to examine the runtime behavior. Logging, monitoring, profiling tools, and traces are the key mechanisms for analyzing the runtime behavior.
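A low-effort way to add instrumentation to Python code is a logging decorator. The sketch below uses only the standard `logging`, `time`, and `functools` modules to record entry, exit, elapsed time, and exceptions for each decorated call. `apply_discount` is a hypothetical stand-in for a legacy function you are trying to understand, and the logger name is an arbitrary choice.

```python
import functools
import logging
import time

logger = logging.getLogger("reverse_engineering.trace")


def traced(func):
    """Log entry, exit, elapsed time, and any exception for each call to func."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logger.debug("enter %s args=%r kwargs=%r", func.__qualname__, args, kwargs)
        start = time.perf_counter()
        try:
            result = func(*args, **kwargs)
        except Exception:
            # Log the traceback, then let the original exception propagate.
            logger.exception("error in %s", func.__qualname__)
            raise
        elapsed_ms = (time.perf_counter() - start) * 1000
        logger.debug("exit %s elapsed=%.2fms", func.__qualname__, elapsed_ms)
        return result

    return wrapper


@traced
def apply_discount(total, rate):
    # Hypothetical legacy function under investigation.
    return round(total * (1 - rate), 2)


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG, format="%(asctime)s %(name)s %(message)s")
    apply_discount(100.0, 0.15)
```

Because the decorator only logs and re-raises, it can be applied to suspect functions temporarily without changing their behavior.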
Existing unit tests. Running the existing unit tests can provide a lot of information on the runtime behavior of the system. You may want to add some additional tests.
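If the system has Python unit tests, one way to extract architectural information from them is to record which functions call which while the tests run. The sketch below uses the standard library's `sys.setprofile` hook and `unittest` runner. `parse_order`, `order_total`, and `OrderTests` are hypothetical stand-ins for legacy code and its existing tests. Profiling hooks slow execution noticeably, so this suits exploratory analysis rather than production monitoring.

```python
import io
import sys
import unittest
from collections import Counter


def record_call_graph(run, functions):
    """Call run() and count caller -> callee edges among the given functions.

    Only Python-level calls are seen; calls into C functions such as sum()
    raise "c_call" events, which this profiler ignores.
    """
    watched = {f.__code__ for f in functions}
    edges = Counter()

    def profiler(frame, event, arg):
        if event != "call" or frame.f_code not in watched:
            return
        caller = frame.f_back
        if caller is not None and caller.f_code in watched:
            edges[(caller.f_code.co_name, frame.f_code.co_name)] += 1

    sys.setprofile(profiler)
    try:
        run()
    finally:
        sys.setprofile(None)
    return edges


# Hypothetical legacy code and one of its existing unit tests.
def parse_order(text):
    return [int(x) for x in text.split(",")]


def order_total(text):
    return sum(parse_order(text))


class OrderTests(unittest.TestCase):
    def test_total(self):
        self.assertEqual(order_total("1,2,3"), 6)


def run_existing_tests():
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(OrderTests)
    return unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)


if __name__ == "__main__":
    graph = record_call_graph(
        run_existing_tests, [OrderTests.test_total, order_total, parse_order]
    )
    for (caller, callee), count in sorted(graph.items()):
        print(f"{caller} -> {callee} ({count} call(s))")
```

The recorded edges form a runtime view that can be fused with the static import view, for example to spot code paths that the tests never exercise.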
Summary
(1) Lost Architecture. The architecture of a system always exists, but it may not be documented or understood for a variety of organizational reasons.
(2) Need to Sell It. This is the type of initiative for which the development team must request resources from the business, so you need to be prepared to sell the business team on the value of the analysis. It is best to time the request to coincide with major upgrade plans.
(3) Subject Matter Experts (SMEs). You will want to spend a lot of time engaging SMEs. A SME can be anyone who has been there long enough to provide some understanding of the original intent of the design.
(4) Deliverables. The deliverables of the reverse engineering effort will be documentation and assessment of the conformance of the implementation with the design intent.
(5) Four Steps of Reverse Engineering. You need to extract the views, organize the extracted views, fuse the organized views, and analyze the fused views.
(6) Expert Developer Needed. This type of analysis is extremely complex, so you need a developer experienced enough to understand how other developers think.
(7) Lots of Tools. Non-trivial systems employ many different programming languages and technologies. This implies that many different tools will be needed to analyze this wide variety of artifact types.
(8) Automated Tools. You will always have to do some manual analysis, but you should try to leverage automated tools as much as possible, at least to get started.
(9) Runtime Behavior. You need to understand runtime behavior as well as static system structures. Logging, instrumentation, profilers, and traces can be very helpful in analyzing runtime behavior.