Importance of “sar” for Oracle DBA’s

Photo by Volodymyr Dobrovolskyy on Unsplash
It was a fine morning when i received a ping from my monitoring team that a high load alert on an oracle database has appeared 15 times in past 6 hours. Well if the database host was under high load now i would run “top” or “uptime” command . How should i verify when the load has gone high in past 6 hours. Well “sar” or “System Activity Report” is the quickest answer. Sar is not only the answer to finding load on the host but also it can answer cpu , memory , network and I/O consumption .
However an Oracle dba can also use the ash and awr reports to pin point to the queries causing the problem , a quick glance of sar could assist in pinpointing to the section in awr of your concern. If you wish to look for installation instruction you could find here . Before diving into various commands let me tell you “sar” simply displays output for the current day. If you wish to see sar for previous days go to /var/log/sa in Linux . You will see files by name saxx , xx will be the two digits of date.
sar for load: sar -q ( for current day’s statistic ) or sar -q 2 5 ( displaying sar output 5 times with 2 sec interval) or sar -q -f /var/log/sa/sa20 ( -f is to read from a file rather than system followed by file name).Here is a sample output with explanation.
08:40:00 PM runq-sz plist-sz ldavg-1 ldavg-5 ldavg-15 blocked
08:50:00 PM 6 5287 18.98 28.26 32.50 0
09:00:00 PM 6 5487 7.82 18.70 25.57 0
09:10:00 PM 14 6610 349.13 280.44 144.46 0
09:20:00 PM 7 6756 5.29 45.96 82.82 0
09:30:00 PM 24 6745 8.95 15.01 49.22 0
09:40:00 PM 8 5790 7.04 7.91 29.10 0
09:50:00 PM 6 5434 7.33 11.07 22.42 0
10:00:00 PM 16 5339 9.45 8.72 15.35 0
runq-sz (run queue size) : Indicates number of processes waiting to run on CPU . It includes process that are runnable and those currently executing.
plist-sz (process list size) : Its the sum of all process and threads currently existing in the system
ldavg-1/5/15 (Load average ) : Its the load average over interval of 1 / 5 and 15 min . This is significantly high during 11:10 PM .
blocked ( blocked processes) : It indicates number of blocked process on the CPU usually waiting on I/O
In our case we could see there was a sudden increase in number of process (plist-sz) from 5487 at 9:00PM to 6610 at 9:10PM. However its noticeable that load average returned to normal in next 10mins but number of processes didn’t reduce. However the number of processes gradually reduced to 5790 by 9:40PM.
This pattern was visible throughout the day. A sudden burst of process causing huge load for few mins maybe, which suddenly declines. A look at the ash report for this time frame tells me there was an adhoc query running with huge parallel hint . However these parallel threads were later waiting on master thread to finish a write operation. These idle threads contributed to the sum of processes in plist-sz value.
Listing a few more “sar” commands incase high load is not an issue for database admin
sar for CPU usage : sar -u
sar for Memory: sar -r
sar for I/O : sar -b
References:
https://www.geeksforgeeks.org/linux-unix/sar-command-linux-monitor-system-performance