Jephe Wu - http://linuxtechres.blogspot.com
Environment: MS SQL server 2008 R2, Windows server 2008
Objective: SQL server performance monitoring and tuning, database backup and restore procedures
According to - How to determine proper SQL Server configuration settings http://support.microsoft.com/kb/319942
SQL Server can obtain a very high level of performance with relatively little configuration tuning. You can obtain high levels of performance by using good application and database design, and not by extensive configuration tuning. See the "References" section of this article for information about how to troubleshoot various SQL Server performance issues.
When you address a performance problem, the degree of improvement that is available from configuration tuning is typically modest unless you do not currently have the system properly configured. In SQL Server version 7.0 and later, SQL Server uses automatic configuration tuning and it is extremely rare that configuration settings (especially advanced settings) need any changes. Generally, do not make a SQL Server configuration change without overwhelming reason and not without careful methodical testing to verify the need for the configuration change. You must establish a baseline before the configuration change so that you can measure the benefit after the change.
If you do not have SQL Server properly configured, some settings might de-stabilize the server or might make SQL Server behave erratically. Years of support experience with many different environments indicate that non-default configuration settings might have results that range from neutral to highly negative.
If you do make a configuration change, you must perform rigorous methodical performance testing both before and after the change to assess the degree of improvement.
Based on actual support scenarios, SQL Server version 7.0 and later can achieve an extremely high level of performance without any manual configuration tuning.
In SQL Server version 7.0 and later, do not make any configuration changes to user connections, locks, and open objects because, by default, SQL Server dynamically tunes these settings.
Part I: Performance monitoring:
Performance Counter threshold value:
http://msdn.microsoft.com/en-us/library/ms189628.aspx - performance objects explanation
check CPU and memory first, then disk I/O, lastly network.
because memory problem can trigger disk I/O problem.
then check database internal performance counters, server minimum/maximum memory configuration, data/log files, tempdb data/log files, paging files.
2. CPU bottleneck check
a. firstly check task manager to see if any other application is using most of cpu, rather than sql server. If it's sqlserver, then check further
b. run 'perfmon', add performance counter
- processor information - % Processor time the percentage of elapsed time that all of process threads used the processor to execution instructions should <80%. Process is the object created when you run a program; thread is object that spawned by progress to be used to execute instruction.
- Processor - % user time should be > 70%
- Processor - % privileges time - The operating system switches application threads to privileged mode to access operating system services. should be < 20%
- System - Processor Queue Length - Number of threads waiting to be scheduled for CPU time
Some caused for high processor queue length:
1. Unnecessary compilation and recompilation.
- System - Context Switches/sec - if it's greater than 20,000 per second, that's a lot.
sys.dm_os_wait_stats for Signal waits and Total waits, if Signal waits > 25% of total waits indicates cpu bottleneck
Note: to counter the number of cpu in sql server:
select cpu_count from sys.dm_os_sys_info
a 100M ethernet can roughly handle 1000 batch request/sec
sql recompilation/sec : sql compilation/sec should be < 10%.
sql compilation/sec : batch request/sec should be < 10%
The default setting for min server memory is 0, and the default setting for max server memory is 2147483647 MB in MS SQL 2008
run 'perfmon', check the following:
- SQL server: Buffer Manager - Average Page Life Expectancy Counter - should not less than 300 (5mins)
How long can a page stay in the memory before being flushed out to disk. too lower indicates lack of memory or missing index.(full table scan)
- SQL server: Memory Manager - memory grant pending - task waiting for RAM to execute.
- Memory: - paging file % usage - should be as lower as possible, < 70% based on KB 889654
- Memory: - paging file % usage peak - should be as lower as possible, < 70% based on KB 889654
- Memory: - Available MBytes - should not be less than 200
- Memory: - Page fault /s
- Memory: - pages/sec - indicator of the amount of paging that Windows is performing
- Memory: - Page Reads/sec (should <=5)
If Memory: Pages/sec is greater than zero or Memory: Page Reads/sec is greater than five, Windows is using disk to resolve memory references (hard page fault). This costs disk I/O + CPU resources.
a. If you have less memory, the disk I/O will increase, they are related, you should consider both sides if you suspect your system has performance issue.
4. Disk I/O bottleneck
We should concentrate on disk queuing instead of on the actual I/O for each disk as actual disk I/O speed cannot be adjusted.
- PhysicalDisk - avg. Disk Queue Length (if exceeds twice the number of spindles, then you are likely developing a bottleneck, you have to know how many disk behind the system, if it's SAN system, it might be difficult to get that information, even SAN administrator might not know exactly)
Note: Disk queuing might be caused by short of memory, check pages/sec value also.
- Physical disk - avg. disk seconds/read - indicates the average time, in seconds, of a read of data from the disk. (4-8ms is ideal, if > 15ms might indicate disk bottlenecks
refer to http://technet.microsoft.com/en-US/library/cc917690 for IO bottleneck part
According to http://download.microsoft.com/download/F/1/0/F10BC023-9396-4D67-BAC9-7C43AE954BF4/SQLServer2000_SQL_Performance_Tuning_using_Waits_and_Queues.pdf
- Physical disk - avg. disk seconds/write (< 8ms (non cached) and <=1ms for highend SAN) The throughput for high volume OLTP applications is dependent on fast sequential transaction log writes.
Excellent < 08 Msec ( .008 seconds )
Good < 12 Msec ( .012 seconds )
Fair < 20 Msec ( .020 seconds )
Poor > 20 Msec ( .020 seconds )
Cached Writes Only
Excellent < 01 Msec ( .001 seconds )
Good < 02 Msec ( .002 seconds )
Fair < 04 Msec ( .004 seconds )
Poor > 04 Msec ( .004 seconds
- Physical disk: % disk time - should be < 50% - percentage of elapsed time that the selected disk drive was busy servicing read or write requests.
- Physical disk - avg. disk reads/s and Physical disk - avg. disk write/s
Note: this 2 values indicate how busy is your database server, just for measuring purpose to know how busy your system is , compare to 3 month or a year ago
Should maintain above 2 values < 85% disk capacity
Disk Writes/sec 70
Avg. Disk Queue Length 5
Then to maintain a raid configuration below, you need:
Raid 0 - need 75 IO per disk
Raid 1 - need 110 IO per disk
Raid 5 - need 180 IO per disk
Raid 10 - need 110 IO per disk
The method of watching out for disk queuing associated with SQL Server log files is different from SQL Server database files. For log file, we use
SQL Server: database: log flush wait time
SQL server: database: log flush waits/sec
c. raid level - use raid 10 if possible
Each SQL Server write to the mirrorset results in two disk I/O operations
When dynamiclly adding new disk into raid array, The RAID controller will move some existing SQL Server data to these new drives so data is evenly distributed across all drives in the RAID array
d. sequential I/O
sequentional I/O is faster than non-sequential one, most of hard disk perform 2 times better for sequential than non-sequential I/O operations
Note: Logs generally are not a major concern because transaction log data is always written sequentially to the log file in sizes ranging up to 32 KB.
5. SQL Server
See SQL Server, Buffer Manager Object - http://msdn.microsoft.com/en-us/library/ms189628.aspx for various objects explanation
- SQL Server: Buffer Manager: Buffer Cache Hit Ratio, should be at least 90%
Note: a major part of any database server environment is the management of memory buffer cache. less than 90% may indicate memory pressure or missing index.
See SQL Profiler: Stored Procedure: CacheHit, CacheMiss, and CacheInsert to see what stored procedure query plans are already in cache (Hit), vs. those not in cache (Miss,Insert)
- SQL server: cache manager: Cache Hit Ratio - Percentage of time the procedure
plan pages are already in cache e.g. procedure
cache hits. I.e. how often a compiled procedure is found in the procedure cache
(thus avoiding the need to recompile).
- SQL server: Buffer Manager: Page Life expectancy -
Sudden big drop in page life expectancy - DW applications (e.g. big transactions) could experience big drops in page life expectancy. This is due to a cache flush from a big read
average page life expectancy less than 300 which is 5 minutes - OLTP database - indicate memory pressure, missing indexes, or a cache flush.
- SQL server: access method : Forwarded
- SQL Server: access method: Full
The number of unrestricted full scans. These can either be base table or full index scans. -
should be as lower as possible > 1 might indicate problem
- MS SQL Server: Access method: index search/s (ideally index search/s : full scan /s should > 1000)
- MS SQL Server: Access method: Page
Splits/sec - Number of page splits occurring as
the result of index pages overflowing.
Normally associated with leaf pages of clustered indexes and
- MS SQL Server: Buffer Manager: Checkpoint pages/sec - Pages written to disk during the checkpoint process, freeing up SQL cache
- MS SQL Server: Buffer Manager: Lazy
writes/sec - Pages written to disk by the
lazywriter, freeing up SQL cache
- MS SQL Server: Buffer Manager: Readahead pages/sec - indicates the number of pages read per second in anticipation of use.
- SQL Server: database: log flush wait time (waiting for transaction log write request to complete <ms>) and SQL server: database: log flush waits/sec (Tranlog writes per second) and
- SQL Server: log file size and used log file size
- SQL Server: database file size
- SQL server: tempdb file size
- SQL server: database: Transactions /sec - SQL Server transactions per second
- MS SQL server: Transactions - Longest Transaction Running time.
check long transaction records view
sys.dm_tran_database_transactions - columns of particular interest include the time of the first log record (database_transaction_begin_time), the current state of the transaction (database_transaction_state), and the log sequence number (LSN) of the begin record in the transaction log (database_transaction_begin_lsn).
In perfmon, check
- General statistics: login/s (number of logins per second), logout/s
- General statistics: User Connections : not very accurate since some application uses connection pool, there are many users behind connection pool
- Latches : Average Latch Wait
- Latches: Latch
- Lock - Average Wait
Time(ms) Transactions should be
as short as possible to limit the blocking of other users.
- Lock - Lock
- Lock - number of deadlocks/sec
sys.dm_os_wait_stats - check top wait statistics
sys.dm_db_index_operational_stats - check lock and latch wait
8. Network bottlenecks
in perfmon, choose the following:
a. Network Interface - Current Bandwidth.
Note: actual bandwidth is computed as packets/sec * 1500 * 8 /1000000 Mbps.
Also check paging file, separate data file and log file drive, tempdb data/log should be at separate drive.
Move tempdb data and logfile to new location:
alter database tempdb modify file (name='tempdev',filename= 'e:\mssql7\tempnew_location.mDF') alter database tempdb modify file (name='templog',filename= 'c:\temp\tempnew_loglocation.mDF')
Part II - performance tuning
1. CPU/Memory bottlenecks - upgrade server or adding more RAM
AWE: check it on 32-bit servers with >4GB of memory, and unchecked the rest of the time.
2. reduce disk I/O
- buy faster disk (15k rpm)
- using database partition to reduce I/O
- separate data and log file disk
- increase temp tablespace initial size and put to separate disk other than data/log files
- configuring suitable maximum server memory in sql server to leave some for OS
- suitable raid level:raid 0 or raid 0+1(raid 1/0 or raid10) , mirrored stripes (ms) , stripe first, mirror later, raid 0 first. Raid 10 is the bestfiles.
3. MS SQL server
- create and maintain good index
recently used index is in sys.dm_db_index_usage_stats, any defined index not included in this DMV has not been used since the last re-start of SQL Server.
- Too many indexes need frequent updates or unused indexes that need frequently maintenance
(insert/select/update) will create overhead, but without benefiting any users.
- create appropriate indexing to avoid excessive sorting operations
- Big IOs such as table and range scans due to missing indexes
Reduce table join for frequent queries. join overuse will result in long run query and waste resources.
- Partition large data sets and indexes
- Tune applications and queries.
It is very important for application developers to understand SQL Server architectural basics and how to take full advantage of SQL Server indexes to minimize I/O.
- Lack of useful statistics:
the following command calls sp_updatestats to update all statistics for the database.
UPDATE STATISTICS Sales.SalesOrderDetail;
Master database files:
The master database, msdb, and model databases are not used much during production compared to user databases, so it is typically not necessary to consider them in I/O performance tuning considerations. The master database is usually used only for adding new logins, databases, devices, and other system objects.
Part III - References:
2. SQL server performance tuning guide for dataware house - http://technet.microsoft.com/library/Cc966420
5. sql long transaction view - http://msdn.microsoft.com/en-us/library/ms186957(v=sql.105).aspx
6. update statistics - http://msdn.microsoft.com/en-us/library/ms187348(v=sql.105).aspx
7. Performance Monitor Counters - http://technet.microsoft.com/en-us/library/cc768048.aspx
8. performance and tuning howto - http://msdn.microsoft.com/en-us/library/ms191511(v=sql.105).aspx
10. establishing a performance baseline - http://technet.microsoft.com/en-us/library/cc781394(v=WS.10).aspx
11. storage best practise - http://technet.microsoft.com/library/Cc966534
12. Troubleshooting performance problem in sql server 2008 - http://msdn.microsoft.com/en-us/library/dd672789(v=SQL.100).aspx#phrss - excellent doc
13. best practice for sql sever maintenance - http://technet.microsoft.com/en-US/library/cc966447
14. SAN storage best practise for sql server - http://www.brentozar.com/sql/sql-server-san-best-practices/
15. sql server 2008 I/O performance - http://blogs.technet.com/b/josebda/archive/2009/03/31/sql-server-2008-i-o-performance.aspx
16. diagnosing and resolving Latch Content for SQL server 2008 R2 - http://www.microsoft.com/en-us/download/details.aspx?id=26665
17. How to: Configure SQL Server to Use Soft-NUMA－ http://msdn.microsoft.com/en-us/library/ms178144(v=sql.105).aspx
Part IV - useful DMVs
1. DMV list: http://msdn.microsoft.com/en-us/library/ms179984.aspx
select * from sys.dm_os_wait_stats order by wait_time_ms
select * from sys.dm_os_latch_stats;
I/O wait at this moment
top 10 wait type
select top 10 *
order by wait_time_ms desc
Part V - best practise
1. Isolate log from data at the physical disk level -
- separate database data and log file disk, Pre-size data and log files.
According to http://technet.microsoft.com/library/Cc966534, It is recommended to have .25 to 1 data files (per filegroup) for each CPU on the host server.
- tempdb data/logfile should on different disk spindle
create one data file for tempdb per cpu (Dual core counts as 2 CPUs; logical procs (hyperthreading) do not)
depending on tempdb usage, use raid10 for storage if possible
- backup should go to separate disk
- If you have real time anti-virus monitoring, it is recommended that you exclude the SQL Server database files (including data files, transaction log files, tempdb and other system database files) from real time monitoring
- Update statistics on the largest tables weekly or monthly
- Rebuild or defrag the most important indexes
2. storage disk for data file should not be shared with other application etc, dedicated for data file with enough spindle
3. vcpu and memory amount should be same as physical machine baseline if running in virtual environment.
4. If raid 1/0 or raid 1 for logfile disk if possible. RAID 1+0 provides better write performance than any other RAID level providing data protection, including RAID 5.
Part VI - testing tools
1. sql nexus
2. sql diag
3. sqlIO , alternatively , use IOMETER
download sqlio at http://www.microsoft.com/en-us/download/details.aspx?id=20163
SAN performance tuning with sqlio - http://sqlserverpedia.com/wiki/SAN_Performance_Tuning_with_SQLIO
4. perfmon and sql profiler
http://www.simple-talk.com/sql/database-administration/correlating-sql-server-profiler-with-performance-monitor/ good article talking about how to corelate profiler and perfmon data
5. latch wait
For a non-production environment only, clear the sys.dm_os_wait_stats DMV with
the following command:
dbcc SQLPERF ('sys.dm_os_wait_stats', 'CLEAR')
A similar command can be run to clear the sys.dm_os_latch_stats DMV:
dbcc SQLPERF ('sys.dm_os_latch_stats', 'CLEAR')
Part VII - backup and restore
1. restore master datbase in single user mode - http://technet.microsoft.com/en-us/library/ms190679.aspx
2. Performing a Complete Database Restore (Full Recovery Model)
Part VIII - FAQ
1. if performance counter doesn't load or corrupted, you can run this to fix: