Nowadays, having performance issues in the database has become a kind of cliché. As new data is getting piled up every minute, it is quite normal that the server is going to overload which ultimately affects the database performance. But what if a single command would direct you to the root cause and tackling such problems becomes easy?
This blog will explain more about the following command “onstat -p”. This single command gives all the details about engine statistics, based on its output you can decide where exactly the database performance is lagging. The command displays a performance profile that includes the number of reads and writes, the number of times that a resource was requested but was not available, and other miscellaneous information.
Below is a sample screenshot of the output:
Onstat -p output varies as shared memory access disk memory. For example when we fire a simple SQL query let’s say “select * from abc” and if abc pages are not available in buffers then shared memory will access the disk to fetch “abc” table pages and that’s how to output data will change based on a number of disk access attempts. The first portion of the output describes reads and writes. Reads and writes are tabulated in three categories: from disk, from buffers, and the number of pages (read or written).
Let’s understand the above output line by line:
The status line provides the following information:
- The name and version of the OneDB server product
- The current mode of your server -> (Prim) stands for primary, each server modes ( quiescent, single-user, fast recovery, etc. ) serves a separate purpose.
- The length of time since shared memory was last initialized
- The size of the shared memory -> Detailed information about shared memory distribution can be acquired using “onstat -g seg “.
Whatever transactions are happening, all of them will be stored in buffer pools. Every data of the transaction goes through buffer pools. Let’s say you are inserting a record, all the details about the record will be stored in buffer pages, but before that server check for the required page is available in shared memory or not and if that required page is not available there then the only server will check that page into a disk.
Cache Hit ratio comes into the picture here, like the number of times the server looking for pages, the number of times the server looking for pages in shared memory, pages searched in the disk, the number of times the server actually found the required page, etc are the parameters affects cache hit ratio.
Below are the parameter details:
- Dskreads :- The number of actual reads from disk
- Pagreads :– The number of pages read
- Bufreads :- Is the number of reads from shared memory
- %cached :- The percent of reads cached in the buffer pool.
- Dskwrits :- The actual number of physical writes to disk. This number includes the writes for the physical and logical logs reported in onstat -l .
- Pagwrits :- The number of pages written
- Bufwrits :- The number of writes to shared memory
- %cached :- The percent of writes cached in the buffer pool.
%cached plays important role in determining how frequent disk reads/writes are and it is important because the number of disk reads/writes are inversely proportional to performance ie. If the server is accessing disk more often, it’s definitely going to affect your performance because the higher the number of disk reads/writes the slower the cache ratio (ultimately performance).
Let’s further understand this example:
Suppose we are inserting 10 records in “abc” table then, in this case, the server will fetch all required pages in shared memory only once at the time of the first insertion, further it will not go for another disk reads. Unless the server requires another page to store data. if the server frequently required to access more pages to fill the next records then performance will be hampered. We can imagine this in day to day examples. If we are cooking pizza and we do not have anything in hand then we are required to out for every ingredient then it is for sure that pizza is going to get delayed, but if we have all ingredients near, we can cook pizza faster, can’t we? Exactly, in the same way, the server needs as much information as possible near to it.
The read and write cache rates can vary dramatically depending upon the applications and the type and size of the data being operated on. In general, both the read cache rate and write cache rate should be in the 80 to 90th percentile. If these rates are consistently lower than 80%, you should consider increasing the value of the Buffers in your OneDB configuration file to achieve higher read and write cache rates. Low read and write cache rates indicate OneDB is doing a lot more disk reads and writes than it should, which will slow down the overall database engine performance. Few other parameters that can be tuned to improve write cache rates BUFFERPOOL ( lrus , lru_max_dirty, lru_min_dirty ), CKPTINTVL, PHYSLOG, CLEANERS.
This portion of the -p display tabulates the number of times different ISAM calls were executed. The calls occur at the lowest level of operation and do not necessarily correspond one-to-one with SQL statement execution. A single query might generate multiple ISAM calls. These statistics are gathered across the database server and cannot be used to monitor activity on a single database unless only one database is active or only one database exists.
- Isamtot :- The total number of calls means indicates how much actual work has been performed (total isam operations )
- Open :- Increments when a tblspace is opened
- Start :- Increments the pointer within an index
- Read :- Increments when the read function is called
- Write :- Increments with each write call
- Rewrite :- Increments when an update occurs
- Delete :- Increments when a row is deleted
- Commit :- Increments each time that an iscommit() call is made No one-to-one correspondence exists between this value and the number of explicit COMMIT WORK statements that are executed.
- Rollbk :- Increments when a transaction is rolled back
The Generic Page Manager provides an API for OneDB® to manage on standard pages in the database server buffer pool. The following table describes the Generic Page Manager fields in the onstat -p command output.
- gp_read :- The number of generic page reads
- gp_write :- The number of generic page writes
- gp_rewrt :- The number of generic page updates
- gp_del :- The number of generic page deletes
- gp_alloc :- The number of generic page allocations
- gp_free :- The number of generic pages freed and returned to tblspaces
- gp_curs :- The number of cursors used against generic pages
The next portion is the number of times that a resource was requested when none was available:
- Ovlock :- Number of times that sessions attempted to exceed the maximum number of locks.
- Ovuserthread :- The number of times that a user attempted to exceed the maximum number of user threads. The user cannot connect because of a user table overflow. You can verify user table overflow by looking for ovuserthreads greater than 0 in onstat -p output.
- Ovbuff :-The number of times a changed buffer from this buffer pool was written to disk specifically to create a free buffer to read another requested page. If the ovbuffvalue is high, the buffer pool might not be large enough to hold the working set of pages that are needed by applications. An insufficient buffer pool can lead to performance degradation.
- Usercpu :- Is the total user CPU time that all user threads use, expressed in seconds. This entry is updated every 15 seconds. (refer onstat -g glo for more details )
- Syscpu :- The total system CPU time that all user threads use, expressed in seconds. This entry is updated every 15 seconds. (refer onstat -g glo for more details )
- Numckpts :- The number of checkpoints since the boot time
- Flushes :- The number of times that the buffer pool was flushed to the disk
The next portion of the onstat -p command output contains miscellaneous information, as follows:
- Bufwaits :- Increments each time that a user thread must wait for a buffer. The number of times a thread had to wait for a lock on a buffer in this buffer pool. Higher numbers indicate more contention among multiple threads for mutually incompatible locks on the same pages.
- Lokwaits :- Increments each time that a user thread must wait for a lock
- Lockreqs :- Increments each time that a lock is requested
- Deadlks :- Increments each time that a potential deadlock is detected and prevented
- Dltouts :- Increments each time that the distributed deadlock time-out value is exceeded while a user thread is waiting for a lock
- Ckpwaits :- Is the number of checkpoint waits
- Compress :- Increments each time that a data page is compressed
- Seqscans :- Increments for each sequential scan
xda-RA :- The count of read-aheads that go from index leaves to data pages
idx-RA :- The count of read-aheads that traverse index leaves
da-RA :- The count of data-path-only scans
logrec-RA :- The log records that the database server read ahead
RA-pgsused :- The number of pages used that the database server read ahead default is 128 pages.
Lchwaits :- Stores the number of times that a thread was required to wait for a shared-memory latch. Latch waits typically results from a high volume of processing activity in which the database server is logging most of the transactions.
Monitoring and Tuning – Cached reads and cached writes will usually be above 90%. If they are under 85% in an OLTP system, this may be an indication of a shortage of buffers. Although the %cached writes should be above 85%, the actual value can vary greatly. Low %cached writes may simply indicate that not many actual writes have occurred. This can be determined by examining the number of writes, rewrites, and deletes relative to the number of reads.
Bufwaits show the number of times a thread had to wait to acquire a latch on a buffer. If the readahead values (RA_PAGES and RA_THRESHOLD) are too high, bufwaits may grow excessively. High bufwaits may also be caused by too small a buffer pool relative to the amount of work being performed (see %cached values).
Lock waits should normally be less than 1% of lock requests unless you have applications that hold locks for an inordinate amount of time. If lock waits are excessive, examine the isolation levels used by SQL statements plus the overall design of transaction management.
Checkpoint waits can and do occur in a high volume system. Checkpoints are prevented from occurring if a userthread is in the middle of a critical section (known as a critical write). If a checkpoint must wait, it is usually only for a brief moment.
Read aheads allow the system to perform more efficient I/O when a need arises to scan multiple sequential pages from disks. This capability is highly desired in DSS systems where large table and index scans are normal and in certain types of OLTP processing. In an ideal situation, ixda-RA + idx-RA + da-RA should equal RA-pgsused. However, if the values of RA_PAGES and RA_THRESHOLD are too high, the system may end up reading in more pages for a read ahead than it should. This will result in high bufwaits since the pages being read in will force the existing buffer pool pages to be overwritten or, in the case of dirty pages, flushed out.
For more details kindly visit HCL ONEDB.