DatabaseOracle Database

Importance of “sar” for Oracle DBA’s

Photo by Volodymyr Dobrovolskyy on Unsplash

It was a fine morning when i received a ping from my monitoring team that a high load alert on an oracle database has appeared 15 times in past 6 hours. Well if the database host was under high load now i would run “top” or “uptime” command . How should i verify when the load has gone high in past 6 hours. Well “sar” or “System Activity Report” is the quickest answer. Sar is not only the answer to finding load on the host but also it can answer cpu , memory , network and I/O consumption .

However an Oracle dba can also use the ash and awr reports to pin point to the queries causing the problem , a quick glance of sar could assist in pinpointing to the section in awr of your concern. If you wish to look for installation instruction you could find here . Before diving into various commands let me tell you “sar” simply displays output for the current day. If you wish to see sar for previous days go to /var/log/sa in Linux . You will see files by name saxx , xx will be the two digits of date.

sar for load: sar -q ( for current day’s statistic ) or sar -q 2 5 ( displaying sar output 5 times with 2 sec interval) or sar -q -f /var/log/sa/sa20 ( -f is to read from a file rather than system followed by file name).Here is a sample output with explanation.

08:40:00 PM   runq-sz  plist-sz   ldavg-1   ldavg-5  ldavg-15   blocked
08:50:00 PM         6      5287     18.98     28.26     32.50         0
09:00:00 PM         6      5487      7.82     18.70     25.57         0
09:10:00 PM        14      6610    349.13    280.44    144.46         0
09:20:00 PM         7      6756      5.29     45.96     82.82         0
09:30:00 PM        24      6745      8.95     15.01     49.22         0
09:40:00 PM         8      5790      7.04      7.91     29.10         0
09:50:00 PM         6      5434      7.33     11.07     22.42         0
10:00:00 PM        16      5339      9.45      8.72     15.35         0

runq-sz (run queue size) : Indicates number of processes waiting to run on CPU . It includes process that are runnable and those currently executing.

plist-sz (process list size) : Its the sum of all process and threads currently existing in the system

ldavg-1/5/15 (Load average ) : Its the load average over interval of 1 / 5 and 15 min . This is significantly high during 11:10 PM .

blocked ( blocked processes) : It indicates number of blocked process on the CPU usually waiting on I/O

In our case we could see there was a sudden increase in number of process (plist-sz) from 5487 at 9:00PM to 6610 at 9:10PM. However its noticeable that load average returned to normal in next 10mins but number of processes didn’t reduce. However the number of processes gradually reduced to 5790 by 9:40PM.

This pattern was visible throughout the day. A sudden burst of process causing huge load for few mins maybe, which suddenly declines. A look at the ash report for this time frame tells me there was an adhoc query running with huge parallel hint . However these parallel threads were later waiting on master thread to finish a write operation. These idle threads contributed to the sum of processes in plist-sz value.

Listing a few more “sar” commands incase high load is not an issue for database admin

sar for CPU usage : sar -u

sar for Memory: sar -r

sar for I/O : sar -b

References:

https://www.geeksforgeeks.org/linux-unix/sar-command-linux-monitor-system-performance

https://linux.die.net/man/1/sar

Leave a Reply