As an architect, how do you ensure that the SQL Server database function is stable enough to support your ongoing development efforts?
This is important for an architect because an architect is responsible for all of the technology on the initiative, and database support is a key piece.
THIS IS THE SECOND PART OF A TWO-PART BLOG POST – Part I is available at https://softwarearchitectureinpractice.com/blog/f/assessing-the-sql-server-database-function-part-i.
As an architect, you will want to sponsor periodic assessment and review of associated IT functions like database, security, networking, and support.
We will review several categories of things to assess for database support.
Not Covering Best Practices. This post is dedicated to explaining preliminary assessment areas. Acting on the results of such an assessment is a topic for separate posts.
Data Model and Design. The data model is undoubtedly the most complex and sensitive database artifact on the project. It is critical that the architect analyze and understand the data model well. There is too much detail on this to cover it in this post, but there is a helpful link at https://opentextbc.ca/dbdesign01/chapter/chapter-5-data-modelling/.
Data Quality. Protecting the data quality, especially with constraints, is important to prevent scenarios like (000) 000-0000 for phone numbers and 99/99/9999 for birth dates. If the data quality is not certain, the data should be assessed for type compliance.
Typically, if the system has been working for a while, you have to be cautious trying to correct the data quality. In many cases, leaving the data alone may be the best option.
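As a sketch of the kind of constraint-based protection described above, consider the following; the table and column names are hypothetical, and WITH NOCHECK reflects the advice to leave existing data alone while still protecting new rows:

```sql
-- Hypothetical table and column names, for illustration only.
-- A CHECK constraint rejects placeholder phone numbers like (000) 000-0000.
ALTER TABLE dbo.Customers WITH NOCHECK  -- skip validating existing rows
ADD CONSTRAINT CK_Customers_Phone
    CHECK (Phone NOT LIKE '%000-0000%');

-- Storing birth dates as DATE (not VARCHAR) makes 99/99/9999 impossible,
-- and a range check catches implausible values in new rows.
ALTER TABLE dbo.Customers WITH NOCHECK
ADD CONSTRAINT CK_Customers_BirthDate
    CHECK (BirthDate BETWEEN '1900-01-01' AND GETDATE());
```

The exact predicates will depend on your data; the point is that the database itself, not just the application, enforces the rules.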
Code Analysis. This will mainly apply to stored procedures. Code review for dynamic SQL will be part of the code review for the application that is using the dynamic SQL. Generally speaking, you will want to review stored procedure code more carefully than application code. There is a good checklist to follow at http://udayarumilli.com/stored-procedure-code-review-checklist/.
Use of stored procedures vs. dynamic SQL vs. ADO.NET vs. Entity Framework. These are all different technologies for running SQL. Unfortunately, many existing applications are using a combination of these. I have seen many systems that are using all four methods to run queries. This is not necessarily bad, but it is complicated.
You don’t need to rewrite all of the queries to use a single technology. If the code is working and performing adequately, I would typically leave it alone. It is important, however, to inventory the use of the various technologies with a bias towards optimizing performance where necessary and simplifying the logic if possible.
Checkins. Naturally, we want all artifacts that contribute to our production environment to be checked into source control. Unfortunately, it is not unusual for database artifacts not to be checked in or not up to date if they are checked in. You will want to ensure that all database creation, database query, reporting, and system integration artifacts are checked in. Failure to check in database artifacts is a potential business killer, so if you discover that something is not checked in, you will want to correct that as quickly as possible.
What is data warehousing? Data warehousing is part of the “business intelligence” specialty. We don’t expect architects to know a lot about business intelligence. The most important thing about business intelligence that architects need to know is when to get the report generation load off of their OLTP (Online Transaction Processing) systems.
The problem with reporting. Reports often need very complex and time-consuming queries when they are run against the OLTP system. This adversely affects the ability of the OLTP system to serve customers interacting with the system dynamically. In fact, complex reports can and often do end up taking most of the database resources.
The typical solution for reporting performance. Moving appropriate data to a data warehouse allows the reports to be run against a reporting system. While specific details are beyond the scope of this post, the data warehouse has denormalized data, thus allowing for faster performance.
Reporting data typically does not need to be current. Another advantage of moving reporting queries from the OLTP system is that reporting data typically does not have to be current. Data that is a day or perhaps a few hours old is usually OK for reporting. You can update data from the OLTP system to the data warehouse during times when the OLTP system is not under heavy load, perhaps in the middle of the night.
Rudimentary snapshots or replication may also be OK. When we bring up the possibility of data warehousing, this implies a lot of planning, time, and resources. To create a “formal” data warehouse can be time-consuming, but if you are in a hurry, you may want to just create a full copy of the OLTP database for reporting queries to run against. You can update the data periodically or use replication to keep the data up to date. This won’t make the reports run faster, but it will keep pressure off of the OLTP database – the architect’s key concern with reporting. If you are using transactional replication, you could even use the reporting database as an OLTP database if the main OLTP database becomes unavailable.
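A rudimentary reporting copy can be as simple as a backup and restore under a different name. The database names, logical file names, and paths below are all illustrative:

```sql
-- Create a point-in-time reporting copy of the OLTP database.
BACKUP DATABASE SalesOLTP
    TO DISK = N'D:\Backups\SalesOLTP_Reporting.bak'
    WITH COPY_ONLY, INIT;  -- COPY_ONLY avoids disturbing the normal backup chain

-- Restore it under a new name for reporting queries to run against.
-- The MOVE clauses assume the logical file names of the source database.
RESTORE DATABASE SalesReporting
    FROM DISK = N'D:\Backups\SalesOLTP_Reporting.bak'
    WITH MOVE 'SalesOLTP'     TO N'E:\Data\SalesReporting.mdf',
         MOVE 'SalesOLTP_log' TO N'F:\Logs\SalesReporting.ldf',
         REPLACE;  -- overwrite the previous copy
```

Scheduling this nightly gives you the "day-old data" reporting copy described above; replication is the next step up when the business needs fresher data.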
“But I Need Reports to be Current”. This is a common complaint from business sponsors. In my experience, most business sponsors are willing to compromise when they understand how much more it will cost and how much customers will be impacted to keep the reports running against the OLTP system.
What is scalability? A system is “scalable” if it can handle an increased load with a proportionate increase in computing resource usage. Ultimately, the architect will want to focus on the scalability of the entire system rather than just the database. Having said that, the solution to most scalability problems tends to involve the database.
Different approach than for applications. Assessing scalability for a database is different from assessing it for an application. Generally speaking, you can add many application servers to a system, but you are restricted to one database server unless you resort to partitioning the data across multiple servers.
Key scalability actions. Because you only have one database, the main ways to facilitate scalability are simply to use the database as little as possible and to ensure that queries run as quickly as possible. This is a big topic, so additional detail is probably a topic for another post.
What does our monitoring suggest? Your DBA team probably has special tools to do monitoring and can give you detailed reports on the types of queries that have been run and the resources that the queries are consuming. You can generally use this data to make some quick performance fixes.
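Even without a third-party monitoring tool, you can get a feel for what this data looks like by querying the plan cache directly. This is a standard DMV pattern (note that the numbers reset when the server restarts):

```sql
-- Top 10 statements by total CPU, from the plan cache.
SELECT TOP (10)
    qs.total_worker_time / 1000 AS total_cpu_ms,
    qs.execution_count,
    qs.total_elapsed_time / qs.execution_count / 1000 AS avg_elapsed_ms,
    SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
        ((CASE qs.statement_end_offset
              WHEN -1 THEN DATALENGTH(st.text)
              ELSE qs.statement_end_offset
          END - qs.statement_start_offset) / 2) + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_worker_time DESC;
```

Sorting by `total_worker_time`, `total_logical_reads`, or `execution_count` gives different views of where the resources are going.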
Should the DBA team have already looked for performance problems? Not necessarily, especially if they have not perceived performance as a pressing problem. The DBA team may also not understand the significance of the performance data in the tool or that the data suggest some easy performance improvements. This is not a shortcoming of the DBA team, but a reflection of their focus on physical database administration rather than development issues.
Measure, measure, measure. When assessing performance for any system component, it is important to measure performance rather than make assumptions. It is very common for developers to make changes that they think will help performance but that are actually making things worse for reasons that are not easy to anticipate.
Indexing. The most important – and typically easiest – way to improve database performance is to add indexes to tables to allow specific queries to run faster. There are a number of tools that can easily recommend indexes that you can create based on the queries that you are running.
In many cases, physical DBAs will make arguments against indexing based on the indexes taking time to update as changes are made to the data. While in theory that can be a valid argument, it is very, very rare that adding an index creates a performance problem bigger than the one it is solving.
As a general rule, don’t recommend additional indexes that only increase performance by a small percentage; look for gains of at least 10-20%. If you are missing a vital index, the gains are typically much higher than 20% – improvements of close to 100% are not unusual.
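SQL Server itself tracks indexes the optimizer wished it had. The following DMV query is a common starting point; treat the output as candidates to review, not indexes to create blindly:

```sql
-- Candidate missing indexes observed by the optimizer since the last restart.
SELECT
    mid.statement AS table_name,
    mid.equality_columns,
    mid.inequality_columns,
    mid.included_columns,
    migs.user_seeks,
    migs.avg_user_impact   -- estimated % improvement for the affected queries
FROM sys.dm_db_missing_index_details AS mid
JOIN sys.dm_db_missing_index_groups AS mig
    ON mig.index_handle = mid.index_handle
JOIN sys.dm_db_missing_index_group_stats AS migs
    ON migs.group_handle = mig.index_group_handle
ORDER BY migs.avg_user_impact * migs.user_seeks DESC;
```

The `avg_user_impact` column maps directly to the 10-20% threshold discussed above: candidates near the top of this list are usually the ones worth pursuing.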
Profiling with XEvents or SQL Profiler. When you are measuring the performance of either database code or application code, a “profiler” that actually measures the performance of the code is vital. The two key tools are the simply named “SQL Profiler” and the XEvents profiler described at https://docs.microsoft.com/en-us/sql/relational-databases/extended-events/use-the-ssms-xe-profiler?view=sql-server-2017.
The XEvents profiler is preferred because it works with more efficient technology that will interfere less with the operation of the SQL Engine. It works with “Extended Events” described at https://docs.microsoft.com/en-us/sql/relational-databases/extended-events/quick-start-extended-events-in-sql-server?view=sql-server-2017.
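A minimal Extended Events session along these lines might look like the following; the session name and the one-second threshold are illustrative choices:

```sql
-- Capture statements that run longer than one second.
CREATE EVENT SESSION SlowQueries ON SERVER
ADD EVENT sqlserver.sql_statement_completed (
    ACTION (sqlserver.sql_text, sqlserver.database_name)
    WHERE duration > 1000000)   -- duration is measured in microseconds
ADD TARGET package0.event_file (SET filename = N'SlowQueries.xel');

ALTER EVENT SESSION SlowQueries ON SERVER STATE = START;
```

The resulting .xel files can be opened directly in SSMS, and the session adds far less overhead than a classic server-side trace.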
Statistics. “Statistics” are used by SQL query plans to factor in historical query results to help optimize the query plan. Architects don’t have to worry too much about statistics other than to know that they are enabled and being updated on a routine basis. Microsoft has some information on statistics at https://docs.microsoft.com/en-us/sql/relational-databases/statistics/statistics?view=sql-server-2017.
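Checking that statistics are enabled and reasonably fresh takes only a couple of standard catalog queries:

```sql
-- Confirm automatic statistics maintenance is enabled for the current database.
SELECT name, is_auto_create_stats_on, is_auto_update_stats_on
FROM sys.databases
WHERE name = DB_NAME();

-- See when each statistic on user tables was last updated.
SELECT OBJECT_NAME(s.object_id) AS table_name,
       s.name AS stats_name,
       STATS_DATE(s.object_id, s.stats_id) AS last_updated
FROM sys.stats AS s
WHERE OBJECTPROPERTY(s.object_id, 'IsUserTable') = 1
ORDER BY last_updated;
```

Very old `last_updated` dates on busy tables are a hint that the routine maintenance jobs deserve a closer look.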
Execution Plans. To execute queries, SQL Server must analyze the statement to determine the most efficient way to execute it. This analysis is handled by a component called the “query optimizer”. The input to the query optimizer consists of the query, the database schema (table and index definitions), and the database statistics. The output of the query optimizer is a “query execution plan”. These execution plans can be viewed graphically to analyze the potential for optimization and to otherwise analyze the resource consumption. This is typically not necessary very often, but the architect should know the techniques and be prepared to analyze the query plans in detail. Additional information is available at https://docs.microsoft.com/en-us/sql/relational-databases/performance/execution-plans?view=sql-server-2017.
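To see the plan for a specific query, you can ask SQL Server to return the actual execution plan alongside the results; the query itself is a placeholder:

```sql
-- Return the actual execution plan with the results
-- (in SSMS, Ctrl+M toggles the same behavior graphically).
SET STATISTICS XML ON;

SELECT c.CustomerID, COUNT(*) AS order_count   -- illustrative query
FROM dbo.Orders AS o
JOIN dbo.Customers AS c ON c.CustomerID = o.CustomerID
GROUP BY c.CustomerID;

SET STATISTICS XML OFF;
```

Scans where you expected seeks, and large discrepancies between estimated and actual row counts, are the usual things to look for in the plan.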
Not an architect focus area. We left this category for last because the architect will generally not need to spend much time analyzing SQL operations. In fact, if your DBA team has a good reputation, it is likely not worth the time to ask questions about this – most DBA teams simply do an outstanding job with these operational topics.
When to ask such questions. If you are new to the organization, you should ask about the DBA team’s procedures. If you are working on an entirely new project with new databases, that is another good time to ask.
Disaster Recovery. While architects need to be concerned about business continuity for the entire system, it is reasonable to assume that the DBA team has an excellent disaster recovery plan for the database. The architect does, however, need to know how quickly a database can be restored by the DBA team and if the chosen disaster recovery level calls for any potential data loss, but that’s all.
There are important financial considerations as well – disaster recovery options can get terribly expensive. It is not unusual to have one or even two backup systems ready to go for sensitive applications, thus automatically tripling infrastructure costs. You don’t want to pay for more disaster recovery than you actually need. Additional information is available at https://docs.microsoft.com/en-us/sql/database-engine/sql-server-business-continuity-dr?view=sql-server-2017.
Monitoring. SQL Server monitoring tools can give you a wealth of valuable information, but are mainly for the DBA team. The DBA team spends a lot of time with these tools, but generally does not share them with the development team. That is because the tools are generally quite expensive and using them requires access to the production database environment. In fact, many developers have never even seen the kind of expensive professional level tools that production DBAs use. Just as importantly, in most professional software organizations, architects will not be given unrestricted access to the production environment. It is regarded as a “conflict of interest” to allow developers and architects to touch production in an unrestricted way.
Architects should simply know what tools are the standard for the organization and ask for reports from the tools on an as-needed basis. If you have specific needs, most DBAs will happily bring up the tools and show you what you need to know.
Automation. Good DBA teams want to automate almost “everything”. Unlike development, most DBA tasks are quite repetitive unless you automate them. Most DBA teams don’t have enough people to perform such tasks manually, though the tasks do tend to be at least somewhat customized to an organization. This is another area where an architect should know what is going on, but can generally just let the DBA team worry about the details.
Maintenance. Databases should regularly be optimized, backed up, and have data cleaned up. DBA teams can typically be relied upon to have daily, weekly, and other periodic procedures. It is helpful to know what those procedures are, but architects typically never get involved with database maintenance, which is usually heavily automated.
Error Log Analysis. The SQL Server error log file is the most important SQL log file. You can assume that the DBA team is actively checking the SQL error log. The SQL error log typically has more information in production environments than test environments simply because there is typically a much larger volume of activity against a production environment than a development environment. Additional information is available at https://docs.microsoft.com/en-us/sql/relational-databases/performance/view-the-sql-server-error-log-sql-server-management-studio?view=sql-server-2017.
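If you want to spot-check the error log yourself from SSMS, `sp_readerrorlog` will search it; the search string here is illustrative:

```sql
-- Search the current SQL Server error log for entries containing 'error'.
-- Parameters: log number (0 = current), log type (1 = SQL error log),
-- and an optional search string.
EXEC sys.sp_readerrorlog 0, 1, N'error';
```

In practice, though, this is exactly the kind of check you can leave to the DBA team.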
Patching. Any enterprise system has to have patch updates applied regularly. Many of these are critical for security. Patching will usually be done during a regularly scheduled maintenance period. Good organizations will also ensure that affected stakeholders have an opportunity to validate that applications dependent on the patched database are still working correctly. There will typically be a rollback plan to remove the patch safely if necessary.
Database size. Database storage planning is more complicated than planning for operating system disk size. DBAs spend a lot of time on it. There are multiple reasons for the additional complexity:
KEY POINT: When communicating with DBAs, use the term “storage” rather than “disk”. Database storage is way more complicated than single disks – the DBA might even laugh at you if you naively refer to it as a “disk”.
(1) Enormous Sizes. The disks on application servers are typically a few dozen GB in size. After all, we are not storing data on the application server. It is not unusual for DBAs to be managing databases that are several terabytes in size – 1,000 times larger than a typical application server disk.
(2) Expensive. It is not unusual for the disk space for an application to cost more than the salaries of the entire development team. Consequently, the decision to approve a development project often rests on the expense of the disk space.
(3) Transaction Log. In addition to the database itself, you have to have transaction logs for disaster recovery. It is not unusual to need more disk space for the transaction logs than for the main database.
(4) Compression. Compression is a simple decision for application servers and file servers, but is a more complex decision for a database server. More information is available at https://docs.microsoft.com/en-us/sql/relational-databases/data-compression/data-compression?view=sql-server-2017.
(5) Archiving. Archiving for databases serves two vital purposes. First, we want to save some money by moving data that is rarely needed to lower cost storage. Just as importantly, however, moving a lot of data to archival storage allows the remaining data to be queried faster.
(6) Disk Speed. One of the reasons that SQL Server disk space is so expensive is because the hardware typically has various provisions to make disk access much faster. In fact, SQL Server can access data from multiple disks using multiple disk controllers at the same time. Among the various assumptions that we make for disk configuration to optimize speed is that the database and the transaction logs should be on separate disks.
(7) Allocation. Operating system disk space needs to be specifically allocated to the database for the database to use it. While there are generally automated provisions for growth, allocation does need to be considered in the storage planning process.
(8) Fast Growth. Active databases can grow very, very quickly. It is often difficult for architects to appreciate how fast a production database can grow. As a result, the DBA team always needs to plan for some unallocated space, as well as some unused space within the allocated space.
(9) Indexing. Indexes can take up a lot of disk space as well, so they need to be considered in capacity planning.
(10) Temporary backup storage. To back the database up to tape or another backup medium, there typically has to be enough operating system disk space to hold a full database copy, thus doubling the storage requirements.
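Several of the sizing concerns above – allocation, growth settings, and free space within files – can be checked with a standard catalog query against the current database:

```sql
-- Allocated size, free space, and growth setting for each database file.
-- size and growth are stored in 8 KB pages.
SELECT name,
       size * 8 / 1024 AS allocated_mb,
       (size - FILEPROPERTY(name, 'SpaceUsed')) * 8 / 1024 AS free_mb,
       CASE WHEN is_percent_growth = 1
            THEN CAST(growth AS varchar(10)) + '%'
            ELSE CAST(growth * 8 / 1024 AS varchar(10)) + ' MB'
       END AS growth_setting
FROM sys.database_files;
```

This is a quick way to verify that the DBA team’s storage plan leaves the headroom that the fast-growth point above calls for.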