Monday, July 20, 2009

Oracle direct IO

By default, the init parameter filesystemio_options is set to none, which means the file system buffer cache is used for data file I/O and the like. Many Oracle professionals think it should be set to "directio" or "setall" (direct I/O plus asynchronous I/O); Metalink note 462072.1, "File System's Buffer Cache versus Direct I/O", and Steve Adams's web site (http://www.ixora.com.au/tips/avoid_buffered_io.htm) are examples. The reasons direct I/O is preferred are listed here, with a sketch of checking and changing the parameter after the list:
  • Caching data in the Oracle buffer cache and again in the file system buffer cache wastes memory through double caching.
  • The DBA can control how data blocks are cached inside Oracle, but has no such control over the file system buffer cache.
  • The file system buffer cache can make Oracle's physical I/O statistics misleading, since a read satisfied from the file system cache is still counted as a physical read from Oracle's point of view.
  • It also complicates database tuning, because the cost of a physical I/O varies depending on whether it is served from the file system buffer cache or from a true disk read.
  • On some operating systems, the file system buffer cache can keep expanding until all physical memory is consumed, which can crash the system.
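A minimal sketch of checking and changing the setting, assuming the instance uses an spfile (filesystemio_options is a static parameter, so the instance must be restarted before the new value takes effect):

SQL> show parameter filesystemio_options
SQL> -- valid values are NONE, ASYNCH, DIRECTIO and SETALL
SQL> alter system set filesystemio_options = 'SETALL' scope = spfile;
SQL> shutdown immediate
SQL> startup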
I did some testing with Oracle 10g on RedHat AS4 update 3 over NFS (network file system) and found that read performance (full table scan) is about the same, roughly 400 MB/second, whether the blocks come from the file system buffer cache or from the Oracle buffer cache. Because both the file system buffer cache and the Oracle buffer cache are read from local memory, the result should be the same no matter what kind of file system the data sits on. Because the Oracle buffer cache and the Linux file system buffer cache age data out differently, and because their sizes differ, it is possible for data to be in one cache but not in the other. For writes, it was a surprise to see that I/O through the file system buffer cache was 3 times faster.
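For reference, the read figure came from timing full table scans; a minimal sketch of such a test (using the same gene table as the write test below, and deriving MB/second from the segment size divided by the elapsed time) is:

SQL> set timing on
SQL> -- time a full table scan; the hint guards against an index fast full scan
SQL> select /*+ full(g) */ count(*) from gene g;
SQL> -- segment size in MB, to convert the elapsed time into MB/second
SQL> select bytes/1024/1024 mb from dba_segments where segment_name = 'GENE';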

filesystemio_options=directio
SQL> create table gene_drop tablespace mm_data_temp nologging as select * from gene;
Table created.
Elapsed: 00:00:50.46


filesystemio_options=none
SQL> create table gene_drop tablespace mm_data_temp nologging as select * from gene;
Table created.
Elapsed: 00:00:12.40

The table gene is about 500 MB in size and is already cached, so no physical I/O is needed to read it. According to the Linux documentation, a write actually sits in the file system buffer cache until the dirty data is more than 30 seconds old or amounts to more than about 10% of memory. Either way, the file system buffer cache can smooth out the disk write load.
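These thresholds map to kernel tunables; a quick way to inspect them on a 2.6 kernel (defaults may differ between distributions and kernel versions):

sysctl vm.dirty_expire_centisecs      # age in centiseconds before dirty data is flushed; 3000 = 30 seconds
sysctl vm.dirty_background_ratio      # percentage of memory of dirty data that triggers background writeback; typically 10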

Whether Oracle uses direct I/O depends not only on the filesystemio_options parameter, but also on the OS, file system type, storage device, mount options, device driver, and so on. The following procedure can be used to check whether direct I/O is actually used on Linux (a condensed shell version follows the list):
  1. Start the instance and mount the database: startup mount.
  2. Find the database writer with "select spid, program from v$process where program like '%DBW%';" and write down the spid number.
  3. Run strace against the database writer process, for example: strace -o /tmp/strace.out -e trace=open -p 2260 (using the spid from step 2).
  4. Open the database: alter database open.
  5. Open the strace output file and look for "O_DIRECT"; a matching line looks something like open("/.../tst01.dbf", O_RDWR|O_SYNC|O_DIRECT) = 22.
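A condensed sketch of steps 3 to 5 from the shell, assuming the spid found in step 2 turns out to be 2260 (it will differ on your system):

strace -o /tmp/strace.out -e trace=open -p 2260 &     # step 3: trace open() calls made by DBWR
# step 4: run "alter database open;" in a sqlplus session, then stop strace (Ctrl-C or kill)
grep O_DIRECT /tmp/strace.out                         # step 5: data files opened with direct I/O show up here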
By default, Linux uses all memory not claimed by processes as file system buffer cache and hands it back to processes as needed. However, it is possible for memory used by Oracle to be paged out to swap while a large portion of physical memory is devoted to the file system buffer cache.
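A quick way to see how much memory the page cache is holding and whether anything has been pushed to swap (field names vary slightly between kernel versions):

free -m                                 # the "cached" column is the file system buffer cache
grep -E 'Cached|Swap' /proc/meminfo     # Cached, SwapCached, SwapTotal, SwapFree
sysctl vm.swappiness                    # 0-100; lower values favour keeping process memory such as the SGA resident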

In conclusion, whether to set filesystemio_options to directio (or setall) depends on your system, hardware, database, application, and load pattern. This is what a DBA is paid for, and why a DBA is needed to make such decisions.
