Friday, December 11, 2009

Oracle rolling upgrade

Oracle rolling upgrade allows a RAC configuration to apply a patch or upgrade to one node at a time, so the RAC cluster stays online during the patch or upgrade. Currently, most new CPU patches and upgrades support rolling upgrade. Rolling upgrade works very well for software-only changes, but many patches and upgrades need to run database scripts in exclusive mode. Because RAC has only one database image, DBAs have to use a logical standby database to do the rolling upgrade, which is complicated and costly in terms of Oracle licenses and hardware. If one day Oracle extends the new Edition-based Redefinition feature in 11g R2 to SYS tables and the data dictionary, rolling upgrade will become a more common practice.

Monday, December 7, 2009

Oracle row level locking

We all know that Oracle provides row level locks for database transactions. I'd like to blog about how it is done in detail.

Let us start with creating a test table:

SQL> create table test (name varchar2(100)) tablespace users;

Table created.

SQL> insert into test values ('Mike');

1 row created.

SQL> insert into test values ('Steve');

1 row created.

SQL> insert into test values ('Tom');

1 row created.

SQL> commit;

Commit complete.

SQL> select rowid, name from test;

ROWID NAME

------------------ --------------------

AAANzsAAEAAAAPHAAA Mike

AAANzsAAEAAAAPHAAB Steve

AAANzsAAEAAAAPHAAC Tom

SQL> select dbms_rowid.rowid_relative_fno('AAANzsAAEAAAAPHAAA') file_no from dual;

FILE_NO

----------

4

SQL> select dbms_rowid.rowid_block_number('AAANzsAAEAAAAPHAAA') block_no from dual;

BLOCK_NO

----------

967

SQL> alter system dump datafile 4 block 967;

System altered.
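
The dump is written to a trace file in the user dump destination; if you are not sure where that directory is, a quick way to check (in 11g, V$DIAG_INFO shows it as well):

SQL> select value from v$parameter where name = 'user_dump_dest';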

Now let us do some updates on the test table

SQL> update test set name = 'Ed' where name = 'Mike';

1 row updated.

The update SQL should start a transaction and place a row level lock on the updated row. To see the locks, we query V$LOCK and V$TRANSACTION:

SQL> select ADDR, sid, type, id1, id2, lmode from v$lock where sid = 149;

ADDR SID TY ID1 ID2 LMODE

---------------- ---------- -- ---------- ---------- ----------

0000000087A8D950 149 TM 56556 0 3

0000000087AAD388 149 TX 655364 1172 6

SQL> select addr, xidusn, xidslot, xidsqn from v$transaction;

ADDR XIDUSN XIDSLOT XIDSQN

---------------- ---------- ---------- ----------

0000000087AAD388 10 4 1172

From V$LOCK, we get a table-level DML lock (TM) with lock mode 3, which is row exclusive. ID1 56556 is the object id of the locked table, which is "TEST". The other lock related to the update session is the transaction lock (TX) with ID1 655364, which is the decimal value of the undo segment number and slot number, and ID2 1172, which is the transaction sequence number. 655364 converted to hex is 0xA0004, which means undo segment 10 (0xA = 10), slot 4.
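
The same arithmetic can be done directly in SQL; this is just a convenience query against the values above (SID 149 is the session used in this example):

select id1,
       trunc(id1/65536) usn,   -- undo segment number (10 here)
       mod(id1, 65536)  slot,  -- slot number (4 here)
       id2              seq    -- transaction sequence number (1172 here)
  from v$lock
 where sid = 149
   and type = 'TX';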

From V$TRANSACTION, for the same ADDR (0000000087AAD388), we get XIDUSN (undo segment number) 10, XIDSLOT (undo segment slot) 4, and XIDSQN (transaction sequence number) 1172. Everything matches V$LOCK.

Now we do another update and examine V$LOCK and V$TRANSACTION again.

SQL> update test set name = 'David' where name = 'Steve';

1 row updated.

SQL> select ADDR, sid, type, id1, id2, lmode from v$lock where sid = 149;

ADDR SID TY ID1 ID2 LMODE

---------------- ---------- -- ---------- ---------- ----------

0000000087A8D950 149 TM 56556 0 3

0000000087AAD388 149 TX 655364 1172 6

SQL> select addr, xidusn, xidslot, xidsqn from v$transaction;

ADDR XIDUSN XIDSLOT XIDSQN

---------------- ---------- ---------- ----------

0000000087AAD388 10 4 1172

Well, we still find only one TM table lock in row exclusive mode for this session, even though we now hold two row level locks, one for each updated row. In fact, a TM lock in row exclusive mode means that there is a shared lock on the table and that some rows in the table are locked exclusively. Oracle does not provide a system view that exposes the individual row level locks. To see them, we have to go deep into the Oracle data block.

Dump the data block again and we find this interesting output:

0067F39D0 00C07F43 0000000B 00000000 022CC506 [C.............,.]

0067F39E0 64450201 0301002C 2C6D6F54 44050102 [..Ed,...Tom,...D]

0067F39F0 64697661 0401002C 656B694D C6900601 [avid,...Mike....]

……

Itl Xid Uba Flag Lck Scn/Fsc

0x01 0x000a.02f.00000493 0x0080033d.0396.36 C--- 0 scn 0x0000.002dba46

0x02 0x000a.004.00000494 0x00800342.0396.16 ---- 2 fsc 0x0000.00000000

-----

tab 0, row 0, @0x1f7a

tl: 6 fb: --H-FL-- lb: 0x2 cc: 1

col 0: [ 2] 45 64

tab 0, row 1, @0x1f87

tl: 9 fb: --H-FL-- lb: 0x2 cc: 1

col 0: [ 5] 44 61 76 69 64

tab 0, row 2, @0x1f80

tl: 7 fb: --H-FL-- lb: 0x0 cc: 1

col 0: [ 3] 54 6f 6d

The second section is the Interested Transaction List (ITL) and has two entries, 01 and 02. The Flag for entry 02 is ----, which means an active transaction. Lck is 2, meaning two row level locks in this block. The Xid (transaction ID) is 0x000a.004.00000494, which means undo segment 0xA (10 in decimal), slot 4 and sequence 0x494, which is 1172 in decimal. So ITL entry 2 is the transaction holding the locks for the two updates. In the third section, with detailed information on each row, you can see "lb: 0x2" for row 0 and row 1. "lb" stands for "lock byte" and the value 0x2 points to ITL entry 2. That is our row level locking, and it actually hangs off the TX lock, not the TM lock.

Oracle actually stores the row level lock in the row header, which can be seen in the raw hex in section 1 of the dump output.

The row 0 header appears as 022C 0201; the block dump is from a Linux system, which is little-endian. Put in the right byte order, the header is 2C 02 01 02, and the second byte, 02, is the row level lock byte. Similarly, the row 1 header is 2C 02 01 05. The header for the third row is 2C 00 01 03, where the lock byte is 00, which means the row is not locked.

So far so good. We do a commit and another data block dump. I expect the lock bytes for row 0 and row 1 to be changed to 0.

SQL> commit;

Commit complete.

And of course the TM lock and TX lock for this session are released. The data block dump after the commit follows:

Itl Xid Uba Flag Lck Scn/Fsc

0x01 0x000a.02f.00000493 0x0080033d.0396.36 C--- 0 scn 0x0000.002dba46

0x02 0x000a.004.00000494 0x00800342.0396.16 --U- 2 fsc 0x0000.002dc9b8

……

tab 0, row 0, @0x1f7a

tl: 6 fb: --H-FL-- lb: 0x2 cc: 1

col 0: [ 2] 45 64

tab 0, row 1, @0x1f87

tl: 9 fb: --H-FL-- lb: 0x2 cc: 1

col 0: [ 5] 44 61 76 69 64

tab 0, row 2, @0x1f80

tl: 7 fb: --H-FL-- lb: 0x0 cc: 1

col 0: [ 3] 54 6f 6d

Well, the lock bytes for row 0 and row 1 do not change at all after the commit. The only things changed in this data block, besides the block SCN and checksum, are the flag and SCN/Fsc for ITL entry 2. Flag "--U-" marks the transaction as no longer active, and all row level locks under this ITL are considered released. This is why Oracle can perform a commit so quickly. The row lock cleanout is piggybacked on the next DML against the block, after which the ITL flag changes to "C---" and the lock bytes change to 0.

SQL> update test set name = 'Adam' where name = 'Ed';

1 row updated.

SQL> rollback;

Rollback complete.

SQL> alter system dump datafile 4 block 967;

System altered.

And the dump file is:

Itl Xid Uba Flag Lck Scn/Fsc

0x01 0x000a.02f.00000493 0x0080033d.0396.36 C--- 0 scn 0x0000.002dba46

0x02 0x000a.004.00000494 0x00800342.0396.16 C--- 0 scn 0x0000.002dc9b8

……

tab 0, row 0, @0x1f6c

tl: 6 fb: --H-FL-- lb: 0x0 cc: 1

col 0: [ 2] 45 64

tab 0, row 1, @0x1f87

tl: 9 fb: --H-FL-- lb: 0x0 cc: 1

col 0: [ 5] 44 61 76 69 64

tab 0, row 2, @0x1f80

tl: 7 fb: --H-FL-- lb: 0x0 cc: 1

col 0: [ 3] 54 6f 6d

Understanding Oracle's row level locking implementation is not just for fun; it has practical use as well. Row locks are kept in the row headers and there is no system-wide structure to track them, so the mechanism is very scalable and makes RAC row level locking relatively simple. Oracle has no latch directly protecting row level locks, so if a process finds a row locked by another session, it waits on that session's transaction (TX) lock until the transaction finishes. In rare cases, this implementation blocks a session unnecessarily.

Here is an example that was demonstrated by Scott Martin at http://www.tlingua.com/new/articles/Chapter1.html.

Let us start with session 1:

SQL> update test set name = 'Adam' where name = 'Ed';

1 row updated.

SQL> savepoint a;

Savepoint created.

SQL> update test set name = 'Mike' where name = 'David';

1 row updated.

Now open session 2 and update row 2:

SQL> update test set name = 'Dave' where name = 'David';

Session 2 is blocked by session 1 and waits on the row lock.

From session 1, roll back the update of row 2:

SQL> rollback to savepoint a;

Rollback complete.

However, session 2 is still waiting on the row lock. Do a block dump and the lock byte for row 2 is 0x0, so there is no row lock on the row session 2 is updating. In fact, session 2 is no longer waiting on a row lock; it is waiting on session 1's transaction.

The block dump shows:

tab 0, row 1, @0x1f53

tl: 9 fb: --H-FL-- lb: 0x0 cc: 1

col 0: [ 5] 44 61 76 69 64

Let us start session 3 to update the same row:

session 3

SQL> update test set name = 'Magic' where name = 'David';

1 row updated.

The row is updated and session 2 is still blocked. Roll back session 1 and session 2 then waits on session 3. Finally, session 3 is rolled back and session 2's update completes.

Monday, November 2, 2009

Tuning SQL for third party applications

DBAs often face the challenge of tuning third party applications whose SQL statements cannot be changed or hinted. Additionally, a SQL plan can change dramatically because of an Oracle optimizer upgrade and/or a change in database statistics, and such changes may have a negative impact on application performance.

One way to tune such an application is to add views between the application and the base tables, so the application queries views created by the DBA instead of the base tables. The DBA can add hints in the views to make the application SQL use the right execution plan. However, this is troublesome because you have to rename all the base tables and create views with the original base table names. Plus, each base table may serve more than one SQL statement, so a hint in a view can be harmful to some of them.

Another way is to change database init parameters. Because this is system level tuning, the DBA has to balance improving the critical SQL statements against hurting other SQL too much. If an upgrade makes performance deteriorate, setting optimizer_features_enable back to the old version is a quick fix until a better solution is found.

What if the DBA could construct an execution plan just for the slow SQL statement generated by the application, without modifying it with hints? There is a way to do so using a stored outline. A stored outline is a set of hints Oracle uses for a matching SQL string. If Oracle finds the SQL text in a stored outline, it adds the outline's hints to the cost based optimizer. The DBA can run the SQL to be tuned in a test environment, change init parameters at the system or session level and add hints at the view level, as discussed previously, to produce the desired execution plan for the SQL, and then generate a stored outline. That stored outline can then be imported into the production database. This way, there is no impact on any SQL except the one you want to tune.
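
A minimal sketch of the stored outline approach (the statement, outline and category names below are made up for illustration; the text in the ON clause must match the application's statement exactly for the outline to be picked up):

-- in the tuning environment, once session parameters and/or view hints
-- have produced the desired plan for this statement:
create or replace outline fix_orders_lookup
  for category third_party_fixes
  on select order_id, status from app_orders where customer_id = 100;

-- tell the optimizer to use outlines from that category
-- (this setting does not persist across instance restarts)
alter system set use_stored_outlines = third_party_fixes;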

As we know, a stored outline cannot adapt to changing database statistics, and if a hint is no longer valid (for example, the index named in an index hint is dropped), the optimizer just ignores the hint and a good execution plan may turn into a bad one. Starting with Oracle 11g, the preferred way is to use a SQL plan baseline. A SQL plan baseline stores the execution plan instead of hints, so it is more robust to optimizer changes. Additionally, a SQL plan baseline can evolve to accept a better execution plan, either manually or automatically.
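
With 11g, a rough sketch of the same idea using DBMS_SPM (the sql_id below is only a placeholder for the statement's real sql_id; the function loads the plan currently in the cursor cache as an accepted baseline):

declare
  n pls_integer;
begin
  -- load the cursor-cache plan for the tuned statement as a baseline
  n := dbms_spm.load_plans_from_cursor_cache(sql_id => '1a2b3c4d5e6f7');
  dbms_output.put_line(n || ' plan(s) loaded');
end;
/

-- baselines are used when this is set (true is the default in 11g)
alter session set optimizer_use_sql_plan_baselines = true;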

Here are links to a mini how-to for stored outlines and a mini how-to for SQL Plan Baselines.

Thursday, October 15, 2009

ORA-00054: resource busy with no wait

Ever find Oracle struggling, try to help with an additional index or a change to a table's parallel degree, but only get an ORA-00054 "resource busy and acquire with NOWAIT specified" error? Well, you have a few options:

1. Go away and have a cup of coffee, then try again. If you are lucky, it will go through this time.
2. Kill the session that holds the locks on the objects you want to modify. You do not have to wait, but users may not be too happy about it.
3. Quiesce the database with "ALTER SYSTEM QUIESCE RESTRICTED" and wait until all active transactions are completed; then you can modify the objects as DBA. However, you may wait a long time for the database to reach the quiesced state if a long transaction is running. Additionally, it blocks all user sessions, not just the one locking your objects.
4. You can brute-force the DDL on the locked objects with a simple PL/SQL loop, which I found on the DBA Tools web site (http://www.dbatools.net/experience/oracle_resource_busy.html):

declare
  resource_busy exception;
  pragma exception_init (resource_busy, -54);
begin
  loop
    begin
      execute immediate 'ddl sql …';  -- the DDL statement you want to run
      exit;                           -- it succeeded, leave the loop
    exception
      when resource_busy then
        dbms_lock.sleep(0.01);        -- ORA-00054: back off briefly and retry
    end;
  end loop;
end;
/

It works pretty well for me. Have a spare CPU and a cup of hot coffee before you run the script.

For an 11g database, you can make the DDL wait by setting ddl_lock_timeout to the number of seconds to wait before getting the ORA-00054 error. During that time, Oracle keeps trying to acquire the object lock it needs until it times out. No PL/SQL coding is necessary.
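
A minimal example (the timeout value is arbitrary and the table name is illustrative):

SQL> alter session set ddl_lock_timeout = 60;

Session altered.

SQL> -- this DDL now waits up to 60 seconds for the lock instead of
SQL> -- failing immediately with ORA-00054
SQL> alter table test add (created_date date);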

Friday, July 24, 2009

IO Measurement

It is not news that disk IO is slow. Compared with the speed at which the CPU processes data, or with memory access, disk IO is usually about 1000 times slower. Because an Oracle database relies on the underlying system, disk IO is commonly the limiting factor for database performance, especially for a large data warehouse. Therefore, measuring disk IO throughput is important for database performance tuning and capacity planning. With the trend toward virtualization of the storage layer, measuring IO throughput is more important than ever for Oracle database tuning and troubleshooting.

First, let us measure disk IO at the server level. I use Linux as the example, and hdparm is a good starting point, for example:

hdparm -tT /dev/sda

You need root privilege to do so.

#oracle >sudo /sbin/hdparm -tT /dev/cciss/c0d0p3

/dev/cciss/c0d0p3:

Timing cached reads: 2264 MB in 2.00 seconds = 1130.48 MB/sec

Timing buffered disk reads: 130 MB in 3.00 seconds = 43.30 MB/sec

#oracle >

If you do not have root privilege to run hdparm, try dd

#oracle>ls -l *

-rw-r--r-- 1 oracle dba 268435456 Jul 24 12:52 testfile

#oracle>time dd if=testfile of=/dev/null bs=16k

16384+0 records in

16384+0 records out

7.70s real 0.01s user 0.39s system

7.7 seconds to read a 256 MB file, so the read rate is 33 MB/s. One catch is that the data may be read from the file system buffer cache. Run the same command again and the reading is totally different.

#oracle>time dd if=testfile of=/dev/null bs=16k

16384+0 records in

16384+0 records out

0.19s real 0.00s user 0.18s system

0.19 seconds to read the 256 MB file, so the read rate is 1347 MB/s. That is the cache access speed. To clean out the file system buffer cache, you can write a file twice the size of the server's physical memory using "dd", or clear the cache as root via "/proc/sys/vm/drop_caches". Newer versions of dd support the option "iflag=direct", which bypasses the file system buffer cache.

To benchmark the server write speed, one can use a dd command like:

#oracle>time dd if=/dev/zero of=testfile bs=16k count=16384

16384+0 records in

16384+0 records out

8.70s real 0.03s user 0.65s system

It is easy to calculate that the write speed is 256 MB / 8.7 s = 29 MB/s. Similarly, you can use the option "oflag=direct" to bypass the file system buffer cache with newer versions of dd.

Next, let us test how Oracle utilizes the disk IO. In order to measure true disk IO, the init parameter filesystemio_options needs to be set to "setall", which uses direct IO, bypassing the file system buffer cache, and asynchronous IO, allowing concurrent IO to disk. Before each test case, run "alter system flush buffer_cache;" to flush the database buffer cache.
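
A sketch of that setup (filesystemio_options is not dynamic, so it takes an instance restart; the flush is repeated before every test case):

SQL> alter system set filesystemio_options = setall scope=spfile;
SQL> -- bounce the instance for it to take effect, then before each test case:
SQL> alter system flush buffer_cache;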

It is easy to test large reads and writes by using "create table as select" with suitable parallelism and nologging. For example:

SQL> create table gene tablespace mm_data parallel (degree 4) nologging as select * from gene2 ;

Table created.

Elapsed: 00:00:25.90

Table gene2 is cached, so reading it does not compete for IO with creating table gene. The size of the gene table is 450 MB, so the write speed is about 17 MB/s (450 MB / 25.9 s).

To test bulk read speed, the gene table is queried with a full table scan and suitable parallelism. Autotrace is set on in SQL*Plus to confirm the full table scan and that physical reads account for the majority of the query time.

SQL> select /*+ parallel (gene, 4) */count(*) from gene;

COUNT(*)

----------

6311685

Elapsed: 00:00:12.40

A simple calculation shows that the read speed is about 36 MB/s (450 MB / 12.4 s). The calculation ignores the in-memory work needed to get the count. It is also hard to time physical IO for other database operations, such as index reads, DML and so on.

To time IO for database operations other than bulk read and write, SQL trace is used. To start a SQL trace, run "alter session set events '10046 trace name context forever, level 12';" in SQL*Plus. The trace file is written under the UDUMP directory and tkprof is used to format it into a report. Below is a sample output from the report.

SELECT PROTEIN_ACCESSION

FROM

GENE WHERE ID = :B1

call count cpu elapsed disk query current rows

------- ------ -------- ---------- ---------- ---------- ---------- ----------

Parse 1 0.00 0.00 0 0 0 0

Execute 100001 8.26 7.87 0 0 0 0

Fetch 100001 5.38 114.56 37085 418332 0 100001

------- ------ -------- ---------- ---------- ---------- ---------- ----------

total 200003 13.64 122.43 37085 418332 0 100001

Misses in library cache during parse: 0

Optimizer mode: ALL_ROWS

Parsing user id: 59 (recursive depth: 1)

Rows Row Source Operation

------- ---------------------------------------------------

100001 TABLE ACCESS BY INDEX ROWID GENE (cr=418332 pr=37085 pw=0 time=114797430 us)

100001 INDEX UNIQUE SCAN GENE_PK (cr=300003 pr=6875 pw=0 time=21453148 us)(object id 142926)

As shown in the execution plan, it is an index read followed by a table row read. We will calculate the small physical read rate for the table. From the execution plan, we know there are 418332 memory reads (cr) and 37085 disk reads (pr), and the total time for the table access is 114797430 microseconds. From another section of the report, we know a memory access takes about 30 microseconds, so disk reads account for 114797430 - 30 * 418332 = 102247470 microseconds, or about 2757 microseconds per read. The data block size is 16K, so the read speed is about 6 MB/s for small reads.
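
For completeness, the trace can be switched on and off around just the statements being measured; a sketch (the tracefile_identifier is arbitrary and only makes the trace file easier to find under UDUMP before running tkprof on it):

SQL> alter session set tracefile_identifier = 'small_read_test';
SQL> alter session set events '10046 trace name context forever, level 12';
SQL> -- run the statements to be measured here
SQL> alter session set events '10046 trace name context off';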

Oracle 11g has a new procedure, DBMS_RESOURCE_MANAGER.CALIBRATE_IO, that measures and calibrates disk IO for the cost based optimizer. It reports max_iops, max_mbps and max_latency.

max_iops: Maximum number of I/O requests per second that can be sustained. The I/O requests are randomly distributed, database-block-sized reads. This is similar to the small read case above.

max_mbps: Maximum throughput of I/O that can be sustained, expressed in megabytes per second. The I/O requests are randomly distributed, 1 megabyte reads. This is like the large read case.

max_latency: Average latency of database-block-sized I/O requests at the max_iops rate, expressed in milliseconds.
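
A minimal sketch of calling it from SQL*Plus (the disk count and latency target below are assumptions to adjust to your storage; the procedure expects timed_statistics and asynchronous IO to be enabled, and is best run when the system is otherwise quiet):

set serveroutput on
declare
  l_max_iops pls_integer;
  l_max_mbps pls_integer;
  l_latency  pls_integer;
begin
  dbms_resource_manager.calibrate_io(
    num_physical_disks => 4,    -- assumption: match your storage
    max_latency        => 10,   -- target latency in milliseconds
    max_iops           => l_max_iops,
    max_mbps           => l_max_mbps,
    actual_latency     => l_latency);
  dbms_output.put_line('max_iops=' || l_max_iops ||
                       '  max_mbps=' || l_max_mbps ||
                       '  latency=' || l_latency);
end;
/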

Because of caching in storage devices such as disks, NAS and SAN, it is very hard to measure disk IO and disk IO only. But from a practical point of view, the result is close to the IO speed an average database system will see. The actual measurement can vary based on the percentage of data read from the storage cache and on other system load if the storage is shared.

Monday, July 20, 2009

Data pump sample parameter

The Data Pump SAMPLE parameter can be used to export a subset of data by table or schema. For example, with sample=10, only 10% of the data is exported for each table or schema. It is great if you want to create a testing or development database from a production database and scale it down for various reasons. However, there is one big catch: referential integrity is not respected by the sampling. So when you import (impdp) the sampled dump, you will get ORA-02298 errors if you have foreign key constraints. Here is an example log file from such a Data Pump import:


;;;

Import: Release 10.2.0.2.0 - 64bit Production on Friday, 17 July, 2009 16:29:45

Copyright (c) 2003, 2005, Oracle. All rights reserved.

;;;

Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.2.0 - 64bit Production

With the Partitioning, OLAP and Data Mining options

Master table "SYSTEM"."SYS_IMPORT_FULL_01" successfully loaded/unloaded

Starting "SYSTEM"."SYS_IMPORT_FULL_01": system/******** directory=WTC_EXP_DIR DUMPFILE=dbpump_sample10_exp.dmp remap_schema=dbpump:dbpump2 logfile=dbpump_sample10_imp.log

Processing object type SCHEMA_EXPORT/USER

ORA-31684: Object type USER:"DBPUMP2" already exists

Processing object type SCHEMA_EXPORT/SYSTEM_GRANT

Processing object type SCHEMA_EXPORT/ROLE_GRANT

Processing object type SCHEMA_EXPORT/DEFAULT_ROLE

Processing object type SCHEMA_EXPORT/PRE_SCHEMA/PROCACT_SCHEMA

Processing object type SCHEMA_EXPORT/SEQUENCE/SEQUENCE

Processing object type SCHEMA_EXPORT/TABLE/TABLE

Processing object type SCHEMA_EXPORT/TABLE/TABLE_DATA

. . imported "DBPUMP2"."CHILD" 2.132 MB 101427 rows

. . imported "DBPUMP2"."PARENT" 20.92 KB 979 rows

Processing object type SCHEMA_EXPORT/TABLE/CONSTRAINT/CONSTRAINT

Processing object type SCHEMA_EXPORT/TABLE/CONSTRAINT/REF_CONSTRAINT

ORA-39083: Object type REF_CONSTRAINT failed to create with error:

ORA-02298: cannot validate (DBPUMP2.SYS_C0052496) - parent keys not found

Failing sql is:

ALTER TABLE "DBPUMP2"."CHILD" ADD FOREIGN KEY ("P_ID") REFERENCES "DBPUMP2"."PARENT" ("P_ID") ENABLE

Job "SYSTEM"."SYS_IMPORT_FULL_01" completed with 2 error(s) at 16:29:52


Looks like I still need my PL/SQL package for dumping a subset of data from a production database.

Oracle direct IO

By default, the init parameter filesystemio_options is set to none, which means the file system buffer cache is used for data file IO and the like. Many Oracle professionals think it should be set to "directio" or "setall" (direct IO plus async IO). Metalink note 462072.1, "File System's Buffer Cache versus Direct I/O", and Steve Adams's web site (http://www.ixora.com.au/tips/avoid_buffered_io.htm) are such examples. The reasons direct IO is preferred are:
  • Caching data in the Oracle buffer cache and again in the file system buffer cache wastes memory through double caching.
  • The DBA can manage how data blocks are cached inside Oracle, but not in the file system buffer cache.
  • The system buffer cache can mislead Oracle's physical IO statistics, since a read may come from the file system cache yet still count as a physical read from Oracle's point of view.
  • It also complicates database tuning because the cost of a physical IO is not uniform: it may be satisfied from the file system buffer cache or require true disk IO.
  • On some operating systems, the system buffer cache can keep expanding until all physical memory is consumed, possibly crashing the system.
I did some testing with Oracle 10g on RedHat AS4 update 3 over NFS and found that reads (full table scans) perform similarly from the file system buffer cache and from the Oracle buffer cache, at a rate of about 400 MB/second. Because both are reads from local memory, the result will be the same no matter what kind of file system the data sits on. Because the Oracle buffer cache and the Linux file system buffer cache age data out differently, and their sizes differ, it is possible for data to be in one cache but not the other. For writes, it was a surprise to see that IO through the file system buffer cache is about 4 times faster (roughly 50 seconds vs. 12 seconds in the test below).

directio=yes
SQL> create table gene_drop tablespace mm_data_temp nologging as select * from gene;
Table created.
Elapsed: 00:00:50.46


directio=none
SQL> create table gene_drop tablespace mm_data_temp nologging as select * from gene;
Table created.
Elapsed: 00:00:12.40

The table gene is about 500 MB in size and is cached, so no physical IO is needed to read it. According to the Linux documentation, a write actually stays in the file system buffer cache until it is more than 30 seconds old or dirty data exceeds about 10% of the cache. Either way, the file system buffer cache can smooth out the disk write load.

Whether Oracle uses direct IO depends not only on the filesystemio_options parameter, but also on the OS, file system type, storage device, mount options, device driver and so on. The following procedure can be used to check whether direct IO is used on Linux:
  1. Start the database in mount state (startup mount).
  2. Find the database writer with "select spid, program from v$process where program like '%DBW%';" and write down the spid.
  3. Run strace against the database writer process, for example strace -o /tmp/strace.out -e trace=open -p 2260.
  4. Open the database with alter database open.
  5. Open the strace output file and look for "O_DIRECT"; a matching line looks something like open("/.../tst01.dbf", O_RDWR|O_SYNC|O_DIRECT) = 22.
By default, Linux uses all memory not used by processes as file system buffer cache and gives it back to processes as needed. However, it is possible for memory used by Oracle to be paged out to swap while a large portion of physical memory is used for the file system buffer cache.

In conclusion, whether to set filesystemio_options to directio depends on your system, hardware, database, application and load pattern. This is what a DBA is paid for and why a DBA is needed to make such decisions.

Tuesday, July 7, 2009

Oracle Lock and Latch comparison

Oracle uses both locks and latches to protect data integrity and provide concurrency for user processes. But there are a number of differences between a lock and a latch.

• A lock protects on-disk data structures; a latch protects in-memory data structures.
• A lock is a complex data structure with shared access modes; a latch is a simple data structure with exclusive mode only.
• Locks are slow because they are complex and can involve disk operations; latches are fast because they are simple and memory-only.
• Lock requests are queued, so deadlock is possible; a latch is hit-and-run, using spin and sleep for waiting.
• A lock is not needed for reads because of Oracle's multiversion concurrency control (MVCC); a latch is needed for all data access in memory, including reads.

Friday, March 27, 2009

TopLink and Hibernate

When talking about Object Relational Mapping (ORM) for Java, Hibernate comes to mind for almost every developer. Recently, I had the opportunity to evaluate another ORM API, TopLink, and want to share my impressions of this product and compare it with Hibernate.


TopLink is an ORM API from Oracle and it is similar to Hibernate in concept. It is not a free product, but it has a branched release called EclipseLink, which is open source and has most of the functionality of its commercial counterpart. When we pick an API for a project, there are always trade-offs that have positive or negative impacts on project quality, timeline and cost. Based on my testing, I would like to explain how the choice of TopLink vs. Hibernate will impact a project from my point of view. It is fair to say that, except for the most simple and basic applications, an ORM such as Hibernate or TopLink is a better choice than plain old JDBC. An ORM provides a Java persistence layer mapped to relational database tables and gives the Java programmer a familiar set of classes to build on. You do not need to deal with SQL if you do not want to; you read and write data at the object level instead of the table, row or cell level.


But which would you pick, Hibernate or TopLink? Well, the majority would start with Hibernate because it is popular. A lot of people use Hibernate and they have good reasons to do so. Hibernate is an open source API you do not have to pay a red cent for, and you get tons of features in return. It works well with other open source software, APIs and frameworks, such as Spring, Eclipse and Google Web Toolkit. There is a huge developer community around Hibernate where you can ask questions and get started on the right track; in my opinion, the support level is better than many paid services. And if you really want somebody to call in panic mode, you can get paid support as well. Because Hibernate is widely used, most bugs in its core functions are fixed, and it is a stable and solid API. On top of that, there are all kinds of API extensions and products based on Hibernate from open source and third party vendors; whatever you wish to have, likely someone had the same pain and released a fix for it already. Plus, Hibernate is good to put on your resume and makes you more marketable. Another feature I like about Hibernate is the Hibernate dialect. With a dialect, the SQL generated is compatible with the SQL engine of the configured database platform, making it possible to deploy an application using Hibernate to multiple database platforms without code changes.


So what can TopLink offer that Hibernate is not so good at? First, if you use an Oracle database, you may want to consider TopLink. Developers can leverage the TopLink Workbench GUI tool to work seamlessly with an Oracle database over the whole ORM life cycle, namely generating the XML descriptor from tables, generating tables from the XML descriptor, and generating Java classes from the XML descriptor. Hibernate has many tools for configuration, but they are not as easy to use as TopLink Workbench against an Oracle database. Secondly, if you are tired of getting LazyInitializationException from Hibernate, you may want to try TopLink. The TopLink session is persistent as well (unlike Hibernate, where you have to create a new session before using it and close it after use). In TopLink, you can use the session without initialization; TopLink manages the database connections and pooling for you, so you do not have to worry about it. That is especially useful for lazy loading of child objects. The third point I like is TopLink's transaction management. TopLink uses a UnitOfWork as the transaction container. When an object is registered with a UnitOfWork, it creates a clone of the object and leaves the original unchanged. Once the transaction completes, the original object is updated from the clone. If the transaction fails, the clone is destroyed and the original object in the cache is untouched. In Hibernate, changes in a transaction are made to the cached object directly, and if the transaction fails it is the developer's responsibility to roll back the state of the modified cached object. Also, if more than one transaction is going on in a session, Hibernate cannot handle it because it cannot isolate the changes within each transaction. Unlike Hibernate, which relies on the application server or JDBC for transaction management, TopLink comes with its own transaction management, including cache-version-based optimistic locking and timestamp-based optimistic locking. The developer can set up the locking method for each persistent Java class, which is a great feature for developing high-performance applications based on the read and write patterns of the application. Finally, if a transaction contains both parent and child objects, TopLink can write them to the database in the right order, avoiding Oracle's parent-record-not-found error on insert and child-record-exists error on delete.


TopLink is a good API for Java-based database applications in general, but newcomers need to pay attention to some areas. One of them is the cache. You can configure the TopLink cache at the session level or the application server level. In general only the newest version of an object is accessible within the caching scope, and it may not match the data in the database. So it is recommended to only read a cached object and never change it unless you register it into a UnitOfWork. Do not assume the cached object is the same as the row in the database; always refresh it if you need it synchronized with the database.

Saturday, March 7, 2009

Oracle Express edition and spfile

I had an interesting experience using Oracle Express Edition a few days ago.

Oracle Express is aimed at developers or new DBAs for personal use as a development platform or playground. It is designed to be self-managed and comes with an Apex application for simple database administration and object browsing.

I installed it a few days ago to test out a product and it is very easy to install. After installation, I logged into the Apex application and found I could set the target SGA memory, so I changed it to 50 MB and bounced the database. Looks like a minor change, right? However, I could not start the database any more: it complained the memory was too small to load the large pool. I do not know what an average developer would do at this point. I knew I needed to change the spfile, but normally to do that I need to start up the instance, and I could not start the instance because the SGA target setting in the spfile was too small. It seemed my only option was to reinstall the database and lose all the data I had added. I ended up opening the spfile with a text editor and making a pfile out of it. So much for the self-management of Oracle Express.
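
For what it is worth, a less hacky recovery path is to generate a pfile from the spfile in SQL*Plus while the instance is down, edit the memory setting there, and rebuild the spfile from it (a sketch; the pfile path is arbitrary and I assume the default XE spfile location):

SQL> -- connect / as sysdba to the idle instance, then:
SQL> create pfile='/tmp/initXE.ora' from spfile;
SQL> -- edit the memory target in /tmp/initXE.ora, then:
SQL> startup pfile='/tmp/initXE.ora';
SQL> create spfile from pfile='/tmp/initXE.ora';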

Thursday, February 26, 2009

Oracle RAC - Recap

What is RAC

RAC High Availability

Transparent Application failover
  • Connection failover is very fast (within a few seconds)
  • Query failover is transparent as well; the query is resumed from its starting point
  • DML (insert, update, delete) and DDL (create table, alter table, …) failover is not supported; application-layer error handling is needed to make it transparent to the user
Rolling upgrade
  • Not for all patches and upgrades
Management Complexity
  • More complicated, more chances to fail (Cluster Ready Services daemon, Cluster Synchronization Services daemon, voting disks, Oracle Cluster Registry, virtual IP addresses, shared file system, to name a few)
  • Requires a database bounce for some init parameter changes that can be made dynamically in a non-RAC configuration
RAC High Scalability

Query scalability
IO-intensive queries, such as full table scans, are limited by disk IO, so adding RAC nodes will not scale well even with multiple query processes doing the job. RAC can scale up well with the number of connections if most reads are logical reads (memory access), but that is not the typical situation for an Oracle database.
OLTP scalability
OLTP scalability is about 30% based on our testing. The limiting factors here are memory latches and inter-node memory data transfer. Even with multiple nodes, the memory latches are shared to protect data integrity. In our testing, the inter-node connections were gigabit network cards, and because of latency, remote memory access (cache fusion) is about 300 times slower than local memory access; it is only a few times faster than disk access.
Load balance
Connections are assigned to nodes randomly, weighted by current node load. This works well for most cases, but a connection will not be redistributed to a lightly loaded node once it has connected to an instance.

RAC Cost of Ownership

A large SMP server always costs more than a cluster of smaller servers with a similar number of CPUs and similar memory. A 32-CPU SMP costs $1.25 million; a UNIX server cluster with the same combined number of CPUs costs about a third of that, and a Linux cluster about a fifth. However, the RAC license for each CPU is $20,000, so unless you build a RAC with more than about 8 combined CPUs, you are not going to save money. Oracle gives away the RAC license for Standard Edition with ASM storage for free for configurations of up to 4 CPUs, which is very attractive if you want a high availability system. For a large system, SMP is always faster than RAC if both have the same amount of CPU and memory. You get what you pay for.


RAC web Links