Friday, December 11, 2009
Oracle rolling upgrade
Oracle rolling upgrade allows a RAC configuration to apply a patch or upgrade one node at a time, so the cluster stays online during the patch or upgrade. Currently, most new releases of CPU patches and upgrades support rolling upgrade. Rolling upgrade works very well for software upgrades, but many patches and upgrades need to run database scripts in exclusive mode. Because RAC has only one database image, DBAs have to use a logical standby database to do a rolling upgrade in those cases, which is complicated and costly in terms of Oracle licenses and hardware. If one day Oracle extends the 11g R2 Edition-Based Redefinition feature to sys tables and the data dictionary, rolling upgrade will become a more common practice.
Monday, December 7, 2009
Oracle row level locking
We all know that Oracle provides row level locks for database transactions. I’d like to blog about how this is done in detail.
Let us start with creating a test table:
SQL> create table test (name varchar2(100)) tablespace users;
Table created.
SQL> insert into test values ('Mike');
1 row created.
SQL> insert into test values ('Steve');
1 row created.
SQL> insert into test values ('Tom');
1 row created.
SQL> commit;
Commit complete.
SQL> select rowid, name from test;
ROWID NAME
------------------ --------------------
AAANzsAAEAAAAPHAAA Mike
AAANzsAAEAAAAPHAAB Steve
AAANzsAAEAAAAPHAAC Tom
SQL> select dbms_rowid.rowid_relative_fno('AAANzsAAEAAAAPHAAA') file_no from dual;
FILE_NO
----------
4
SQL> select dbms_rowid.rowid_block_number('AAANzsAAEAAAAPHAAA') block_no from dual;
BLOCK_NO
----------
967
SQL> alter system dump datafile 4 block 967;
System altered.
Now let us do some updates on the test table:
SQL> update test set name = 'Ed' where name = 'Mike';
1 row updated.
The update SQL should start a transaction and place a row level lock on the updated row. To see the locks, we query V$LOCK and V$TRANSACTION:
SQL> select ADDR, sid, type, id1, id2, lmode from v$lock where sid = 149;
ADDR SID TY ID1 ID2 LMODE
---------------- ---------- -- ---------- ---------- ----------
0000000087A8D950 149 TM 56556 0 3
0000000087AAD388 149 TX 655364 1172 6
SQL> select addr, xidusn, xidslot, xidsqn from v$transaction;
ADDR XIDUSN XIDSLOT XIDSQN
---------------- ---------- ---------- ----------
0000000087AAD388 10 4 1172
From V$LOCK, we see a table level DML lock (TM) with lock mode 3, which is row exclusive. ID1 56556 is the object id of the locked table, “TEST”. The other lock related to the update session is the transaction lock (TX): ID1 655364 is the decimal encoding of the undo segment number and slot number, and ID2 1172 is the transaction sequence number. 655364 in hex is 0xA0004, which means undo segment 10 (0xA = 10), slot 4.
From V$TRANSACTION, at the same ADDR (0000000087AAD388), we get XIDUSN (undo segment number) 10, XIDSLOT (undo segment slot) 4, and XIDSQN (transaction sequence number) 1172. All match V$LOCK well.
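Since ID1 packs the undo segment number into the high-order 16 bits and the slot into the low-order 16 bits, you can decode it with simple arithmetic. This check is pure math, nothing environment specific:
SQL> select trunc(655364/65536) usn, mod(655364,65536) slot from dual;
USN SLOT
---------- ----------
10 4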
Now we do another update and examine V$LOCK and V$TRANSACTION again.
SQL> update test set name = 'David' where name = 'Steve';
1 row updated.
SQL> select ADDR, sid, type, id1, id2, lmode from v$lock where sid = 149;
ADDR SID TY ID1 ID2 LMODE
---------------- ---------- -- ---------- ---------- ----------
0000000087A8D950 149 TM 56556 0 3
0000000087AAD388 149 TX 655364 1172 6
SQL> select addr, xidusn, xidslot, xidsqn from v$transaction;
ADDR XIDUSN XIDSLOT XIDSQN
---------------- ---------- ---------- ----------
0000000087AAD388 10 4 1172
Well, we still find only one TM table lock in row exclusive mode for this session, even though we now hold two row level locks, one per row. In fact, a TM lock in row exclusive mode means there is a shared lock on the table and some rows in the table are exclusively locked. Oracle does not provide a system view that lists row level locks; to see them, we have to go deep into the Oracle data block.
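As an aside, for a session that is blocked waiting on a row, V$SESSION does expose the row being waited for (this shows waiters only, not locks held); the SID here is hypothetical:
SQL> select row_wait_obj#, row_wait_file#, row_wait_block#, row_wait_row# from v$session where sid = 150;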
Do the data block dump again and we find some interesting output:
0067F39D0 00C07F43 0000000B 00000000 022CC506 [C.............,.]
0067F39E0 64450201 0301002C 2C6D6F54 44050102 [..Ed,...Tom,...D]
0067F39F0 64697661 0401002C 656B694D C6900601 [avid,...Mike....]
……
Itl Xid Uba Flag Lck Scn/Fsc
0x01 0x000a.02f.00000493 0x0080033d.0396.36 C--- 0 scn 0x0000.002dba46
0x02 0x000a.004.00000494 0x00800342.0396.16 ---- 2 fsc 0x0000.00000000
-----
tab 0, row 0, @0x1f7a
tl: 6 fb: --H-FL-- lb: 0x2 cc: 1
col 0: [ 2] 45 64
tab 0, row 1, @0x1f87
tl: 9 fb: --H-FL-- lb: 0x2 cc: 1
col 0: [ 5] 44 61 76 69 64
tab 0, row 2, @0x1f80
tl: 7 fb: --H-FL-- lb: 0x0 cc: 1
col 0: [ 3] 54 6f 6d
The second section is the Interested Transaction List (ITL) and has two entries, 01 and 02. The Flag for entry 02 is ----, which means an active transaction. Lck is 2, meaning two row level locks in this block. The Xid (transaction ID) is 0x000a.004.00000494: undo segment 0xa (10 in decimal), slot 4, and sequence 0x494, which is 1172 in decimal. So ITL entry 2 holds the transaction lock for our two updates. In the third section, with detailed information on each row, you can see “lb: 0x2” for row 0 and row 1. “lb” stands for lock byte, and the value 0x2 points to ITL entry 2. That is our row level locking; it actually hangs off the TX lock, not the TM lock.
Oracle actually stores the row level lock in the row header, which appears in section 1 of the dump output. Row 0’s header shows up in the raw dump as 022C 0201; the dump is from a little-endian Linux system, so in logical byte order the header reads 2C 02 01 02, and the second byte, 02, is the row lock byte pointing to ITL entry 2. Similarly, row 1’s header is 2C 02 01 05. The header for the third row, 2C 00 01 03, can also be found in dump section 1; its lock byte is 00, meaning the row is not locked.
So far so good. We do a commit and another data block dump. I expect the lock bytes for row 0 and row 1 to be changed to 0.
SQL> commit;
Commit complete.
And of course the TM and TX locks for this session are released. The data block dump after the commit follows:
Itl Xid Uba Flag Lck Scn/Fsc
0x01 0x000a.02f.00000493 0x0080033d.0396.36 C--- 0 scn 0x0000.002dba46
0x02 0x000a.004.00000494 0x00800342.0396.16 --U- 2 fsc 0x0000.002dc9b8
……
tab 0, row 0, @0x1f7a
tl: 6 fb: --H-FL-- lb: 0x2 cc: 1
col 0: [ 2] 45 64
tab 0, row 1, @0x1f87
tl: 9 fb: --H-FL-- lb: 0x2 cc: 1
col 0: [ 5] 44 61 76 69 64
tab 0, row 2, @0x1f80
tl: 7 fb: --H-FL-- lb: 0x0 cc: 1
col 0: [ 3] 54 6f 6d
Well, the lock bytes for row 0 and row 1 do not change at all after the commit. The only thing changed in this data block, besides the block SCN and checksum, is the flag (and commit SCN) of ITL entry 2. The flag “--U-” marks the transaction inactive, and all row level locks under this ITL are considered released. This is why Oracle can commit very fast: the row lock cleanup is piggybacked onto the next DML against the block, at which point the ITL flag changes to “C---” and the lock bytes change to 0.
SQL> update test set name = 'Adam' where name = 'Ed';
1 row updated.
SQL> rollback;
Rollback complete.
SQL> alter system dump datafile 4 block 967;
System altered.
And the dump file is:
Itl Xid Uba Flag Lck Scn/Fsc
0x01 0x000a.02f.00000493 0x0080033d.0396.36 C--- 0 scn 0x0000.002dba46
0x02 0x000a.004.00000494 0x00800342.0396.16 C--- 0 scn 0x0000.002dc9b8
……
tab 0, row 0, @0x1f6c
tl: 6 fb: --H-FL-- lb: 0x0 cc: 1
col 0: [ 2] 45 64
tab 0, row 1, @0x1f87
tl: 9 fb: --H-FL-- lb: 0x0 cc: 1
col 0: [ 5] 44 61 76 69 64
tab 0, row 2, @0x1f80
tl: 7 fb: --H-FL-- lb: 0x0 cc: 1
col 0: [ 3] 54 6f 6d
Understanding Oracle’s row level locking implementation is not just for fun; it has practical use as well. Oracle keeps row locks in the row header, with no system-wide structure to track them, so the scheme is very scalable and makes RAC row level locking relatively simple. Oracle has no latch directly protecting row level locks; if a process finds a row locked by another session, it escalates the row level wait to the transaction lock and waits for that transaction to finish. In rare cases, this implementation blocks a session unnecessarily.
Here is an example that was demonstrated by Scott Martin at http://www.tlingua.com/new/articles/Chapter1.html.
Let us start with session1:
SQL> update test set name = 'Adam' where name = 'Ed';
1 row updated.
SQL> savepoint a;
Savepoint created.
SQL> update test set name = 'Mike' where name = 'David';
1 row updated.
Now open session 2 and update row 2:
SQL> update test set name = 'Dave' where name = 'David';
Session 2 is blocked by session 1 and waits for the row lock.
From session 1, roll back the update of row 2:
SQL> rollback to savepoint a;
Rollback complete.
However, session 2 is still waiting. Do a block dump, and the lock byte for row 2 is 0x0, so there is no row lock on the row session 2 is updating. In fact, session 2 is not waiting on a row lock any more; it is waiting on session 1’s transaction.
The block dump shows:
tab 0, row 1, @0x1f53
tl: 9 fb: --H-FL-- lb: 0x0 cc: 1
col 0: [ 5] 44 61 76 69 64
Let us start session 3 to update the same row:
session 3
SQL> update test set name = 'Magic' where name = 'David';
1 row updated.
The row is updated and session 2 is still blocked. Roll back session 1, and session 2 now waits on session 3. Finally, session 3 is rolled back and session 2’s update completes.
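To watch the wait chain from SQL, a hedged sketch: a blocked session shows a TX row in V$LOCK with LMODE = 0 and REQUEST = 6 against the blocker’s ID1/ID2:
SQL> select sid, type, id1, id2, lmode, request from v$lock where type = 'TX';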
Monday, November 2, 2009
Tuning SQL for third party application
One way to tune such an application is to add views between the application and the base tables, so the application queries DBA-created views instead of base tables. The DBA can add hints to the views to steer the application SQL to the right execution plan. However, this is troublesome because you have to rename all the base tables and create views under the original table names. Moreover, each base table may serve more than one SQL statement, so a hint in the view can be harmful to some of them.
Another way is to change database init parameters. Because this is system level tuning, the DBA must balance improving the critical SQL statements against negative impact on other SQL. If an upgrade makes performance deteriorate, setting optimizer_features_enable to the old version is a quick fix until a better solution is found.
What if the DBA could construct an execution plan just for the slow SQL statement the application generates, without modifying it with hints? There is a way: stored outlines. A stored outline is a set of hints Oracle applies to a matching SQL string. If Oracle finds a SQL string with a stored outline, it adds that outline’s hints to the cost based optimizer’s input. The DBA can run the SQL to be tuned in a testing environment, with init parameter changes at the system or session level and view level hints as discussed above, produce the desired execution plan, and generate a stored outline from it, as sketched below. That stored outline can then be imported into the production database. This way there is no impact on any SQL except the one you want to tune.
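A minimal sketch of the capture side, with hypothetical outline, category and index names:
SQL> create or replace outline test_outline for category app_tuning on
  select /*+ index(t test_name_idx) */ name from test t where name = 'Ed';
Outline created.
SQL> alter session set use_stored_outlines = app_tuning;
Session altered.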
As we know, a stored outline cannot adapt to changing database statistics, and if a hint is no longer valid (for example, the index named in an index hint is dropped), the optimizer simply ignores the hint and may turn a good execution plan into a bad one. Starting with Oracle 11g, the preferred tool is the SQL plan baseline. A SQL plan baseline stores the execution plan instead of hints, so it is more robust to optimizer changes. Additionally, a SQL plan baseline can evolve to accept a better execution plan, either manually or automatically.
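A hedged 11g sketch: once the tuned plan is in the cursor cache, load it into a baseline (the sql_id and plan_hash_value are hypothetical placeholders):
declare
  n pls_integer;
begin
  -- returns the number of plans loaded into the baseline
  n := dbms_spm.load_plans_from_cursor_cache(
         sql_id => 'abcd1234efgh5', plan_hash_value => 1234567890);
end;
/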
Here are links to my mini how-to for stored outlines and mini how-to for SQL Plan Baselines.
Thursday, October 15, 2009
ORA-00054: resource busy with no wait
When your DDL hits this error, you have a few options:
1. Go away and have a cup of coffee, then try again. If you are lucky, your DDL will go through this time.
2. Kill the session that holds locks on the database objects you want to modify. You do not have to wait, but the users may not be too happy about it.
3. Quiesce the database with “ALTER SYSTEM QUIESCE RESTRICTED” and wait until all active transactions complete; then you can modify objects as DBA. However, you may wait a long time for the database to reach the quiesced state if a long transaction is running. Additionally, it blocks all user sessions, not just the one locking your objects.
4. You can brute force the DDL on locked objects with a simple PL/SQL loop, which I found on the DBA Tools web site (http://www.dbatools.net/experience/oracle_resource_busy.html).
declare
  -- map ORA-00054 to a named exception so it can be caught
  resource_busy exception;
  pragma exception_init (resource_busy, -54);
begin
  loop
    begin
      -- replace with the real DDL statement
      execute immediate 'ddl sql …';
      exit;  -- the DDL succeeded, leave the loop
    exception
      when resource_busy then
        dbms_lock.sleep(0.01);  -- back off briefly, then retry
    end;
  end loop;
end;
/
It works pretty well for me. Have a spare CPU and a cup of hot coffee before you run the script.
For an 11g database, you can make DDL wait by setting ddl_lock_timeout to the number of seconds to wait before ORA-00054 is raised. During that time, Oracle keeps trying to acquire the object lock it needs until it times out. No PL/SQL coding is necessary.
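A hedged sketch of the 11g approach (the table and column are hypothetical):
SQL> alter session set ddl_lock_timeout = 60;
Session altered.
SQL> alter table test add (created_date date);
Table altered.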
Friday, July 24, 2009
IO Measurement
It is not news that disk IO is slow. Compared with the speed at which a CPU processes data, or with memory access, disk IO is usually a thousand times slower. Because Oracle relies on the underlying system for IO, disk IO is commonly the limiting factor for database performance, especially for a large data warehouse. Therefore, measuring disk IO throughput is important for database performance tuning and capacity planning. With the trend toward virtualization of the data storage layer, measuring IO throughput matters more than ever for Oracle tuning and troubleshooting.
First let us measure disk IO at the server level. I use Linux as the example, and hdparm is a good starting point:
hdparm -tT /dev/sda
You need root privilege to do so.
#oracle >sudo /sbin/hdparm -tT /dev/cciss/c0d0p3
/dev/cciss/c0d0p3:
Timing cached reads: 2264 MB in 2.00 seconds = 1130.48 MB/sec
Timing buffered disk reads: 130 MB in 3.00 seconds = 43.30 MB/sec
#oracle >
If you do not have root privilege to run hdparm, try dd:
#oracle>ls -l *
-rw-r--r-- 1 oracle dba 268435456 Jul 24 12:52 testfile
#oracle>time dd if=testfile of=/dev/null bs=16k
16384+0 records in
16384+0 records out
7.70s real 0.01s user 0.39s system
It took 7.7 seconds to read the 256 MB file, so the read speed is 33 MB/s. One catch is that the data may be read from the file system buffer cache. Run the same command again and the reading is totally different.
#oracle>time dd if=testfile of=/dev/null bs=16k
16384+0 records in
16384+0 records out
0.19s real 0.00s user 0.18s system
0.19 seconds to read the 256 MB file works out to 1347 MB/s; that is the cache access speed. To clean out the file system buffer cache, you can write a large file twice the size of the server’s physical memory using dd, or, as root, use /proc/sys/vm/drop_caches to clear the cache. Newer versions of dd support the iflag=direct option, which bypasses the file system buffer.
To benchmark the server’s write speed, one can use a dd command like:
#oracle>time dd if=/dev/zero of=testfile bs=16k count=16384
16384+0 records in
16384+0 records out
8.70s real 0.03s user 0.65s system
It is easy to calculate that the write speed is 256 MB / 8.7 s = 29 MB/s. Similarly, newer versions of dd support oflag=direct to bypass the file system buffer cache.
Next, let us test how Oracle utilizes disk IO. To show true disk IO, the init parameter filesystemio_options needs to be set to “setall”, which enables direct IO (bypassing the file system buffer cache) and asynchronous IO (allowing concurrent IO to disk). Before each test case, run “alter system flush buffer_cache;” to flush the data buffer cache.
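A hedged setup sketch; filesystemio_options is a static parameter, so the change needs an instance bounce:
SQL> alter system set filesystemio_options = setall scope=spfile;
System altered.
SQL> shutdown immediate
SQL> startup
SQL> alter system flush buffer_cache;
System altered.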
It is easy to test large reads and writes by using “create table as select” with optimal parallelism and nologging. For example:
SQL> create table gene tablespace mm_data parallel (degree 4) nologging as select * from gene2 ;
Table created.
Elapsed: 00:00:25.90
Table gene2 is cached, so reading it does not compete for IO with creating table gene. The gene table is 450 MB, so the write speed is 17 MB/s.
To test bulk read speed, the gene table is queried with a full table scan and optimized parallelism. Autotrace is set on in SQL*Plus to verify the full table scan and that physical reads account for the majority of the query time.
SQL> select /*+ parallel (gene, 4) */count(*) from gene;
COUNT(*)
----------
6311685
Elapsed: 00:00:12.40
A simple calculation shows the read speed is 36 MB/s. The calculation ignores the in-memory work to get the count. It is also hard to time physical IO for other database operations, such as index reads, DML and so on.
To time IO for database operations other than bulk reads and writes, SQL trace is used: start it as sketched below, run the workload, and format the resulting trace file from the UDUMP directory with tkprof. A sample section of the report follows.
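A minimal trace session, using the event syntax quoted above (turning the trace off with “context off” is the standard counterpart):
SQL> alter session set events '10046 trace name context forever, level 12';
Session altered.
(run the workload to be measured, then:)
SQL> alter session set events '10046 trace name context off';
Session altered.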
SELECT PROTEIN_ACCESSION
FROM
GENE WHERE ID = :B1
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 100001 8.26 7.87 0 0 0 0
Fetch 100001 5.38 114.56 37085 418332 0 100001
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 200003 13.64 122.43 37085 418332 0 100001
Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 59 (recursive depth: 1)
Rows Row Source Operation
------- ---------------------------------------------------
100001 TABLE ACCESS BY INDEX ROWID GENE (cr=418332 pr=37085 pw=0 time=114797430 us)
100001 INDEX UNIQUE SCAN GENE_PK (cr=300003 pr=6875 pw=0 time=21453148 us)(object id 142926)
As shown in the execution plan, this is an index read followed by a table row read. We will calculate the small physical reads for the table access. From the execution plan, there are 418332 memory reads and 37085 disk reads, and the total time for the table access is 114797430 microseconds. From another section of the report, we know a memory read costs about 30 microseconds, so disk read time is 114797430 - 30 * 418332 = 102247470 microseconds, or 2757 microseconds per read. The data block size is 16K, so the read speed is about 6 MB/s for small reads.
Oracle 11g has a new procedure, DBMS_RESOURCE_MANAGER.CALIBRATE_IO, that measures and calibrates disk IO for the cost based optimizer. It reports max_iops, max_mbps and max_latency.
max_iops: the maximum number of I/O requests per second that can be sustained. The I/O requests are randomly distributed, database-block-sized reads, similar to the small read case above.
max_mbps: the maximum I/O throughput that can be sustained, expressed in megabytes per second. The I/O requests are randomly distributed 1 MB reads, like the large read case.
max_latency: the average latency of database-block-sized I/O requests at the max_iops rate, expressed in milliseconds.
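A hedged sketch of invoking the calibration (the num_physical_disks and max_latency inputs are made-up illustration values; async IO must be enabled for it to run):
set serveroutput on
declare
  l_iops    pls_integer;
  l_mbps    pls_integer;
  l_latency pls_integer;
begin
  -- runs the IO calibration and returns the three measurements
  dbms_resource_manager.calibrate_io(
    num_physical_disks => 4,
    max_latency        => 20,
    max_iops           => l_iops,
    max_mbps           => l_mbps,
    actual_latency     => l_latency);
  dbms_output.put_line(l_iops || ' iops, ' || l_mbps || ' MB/s, ' || l_latency || ' ms');
end;
/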
Because of caching in storage devices (disk arrays, NAS, SAN), it is very hard to measure disk IO and disk IO only. From a practical point of view, though, the result is close to the IO speed an average database system actually sees. The measurement can vary with the percentage of data served from the storage cache and, if the storage is shared, with other system load.
Monday, July 20, 2009
Data pump sample parameter
The Data Pump sample parameter can be used to export a subset of data by table or schema. For example, with sample=10, only 10% of the data is exported per table or schema. This is great if you want to create a testing or development database from a production database and scale it down for various reasons. However, there is one big catch: referential constraints are not honored by sampling. So when you import (impdp) the sampled dump, you will get ORA-02298 errors if you have foreign key constraints. Here is an example log file from the Data Pump import:
;;;
Import: Release 10.2.0.2.0 - 64bit Production on Friday, 17 July, 2009 16:29:45
Copyright (c) 2003, 2005, Oracle. All rights reserved.
;;;
Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.2.0 - 64bit Production
With the Partitioning, OLAP and Data Mining options
Master table "SYSTEM"."SYS_IMPORT_FULL_01" successfully loaded/unloaded
Starting "SYSTEM"."SYS_IMPORT_FULL_01": system/******** directory=WTC_EXP_DIR DUMPFILE=dbpump_sample10_exp.dmp remap_schema=dbpump:dbpump2 logfile=dbpump_sample10_imp.log
Processing object type SCHEMA_EXPORT/USER
ORA-31684: Object type USER:"DBPUMP2" already exists
Processing object type SCHEMA_EXPORT/SYSTEM_GRANT
Processing object type SCHEMA_EXPORT/ROLE_GRANT
Processing object type SCHEMA_EXPORT/DEFAULT_ROLE
Processing object type SCHEMA_EXPORT/PRE_SCHEMA/PROCACT_SCHEMA
Processing object type SCHEMA_EXPORT/SEQUENCE/SEQUENCE
Processing object type SCHEMA_EXPORT/TABLE/TABLE
Processing object type SCHEMA_EXPORT/TABLE/TABLE_DATA
. . imported "DBPUMP2"."CHILD" 2.132 MB 101427 rows
. . imported "DBPUMP2"."PARENT" 20.92 KB 979 rows
Processing object type SCHEMA_EXPORT/TABLE/CONSTRAINT/CONSTRAINT
Processing object type SCHEMA_EXPORT/TABLE/CONSTRAINT/REF_CONSTRAINT
ORA-39083: Object type REF_CONSTRAINT failed to create with error:
ORA-02298: cannot validate (DBPUMP2.SYS_C0052496) - parent keys not found
Failing sql is:
ALTER TABLE "DBPUMP2"."CHILD" ADD FOREIGN KEY ("P_ID") REFERENCES "DBPUMP2"."PARENT" ("P_ID") ENABLE
Job "SYSTEM"."SYS_IMPORT_FULL_01" completed with 2 error(s) at 16:29:52
Looks like I still need my PL/SQL package for dumping a consistent subset of data from the production database.
Oracle direct IO
Direct IO bypasses the file system buffer cache, and there are several reasons a DBA may want that:
- Caching data in the Oracle buffer cache and again in the file system buffer cache wastes memory through double caching.
- The DBA can manage how data blocks are cached inside Oracle, but not in the file system buffer cache.
- The file system buffer cache can mislead Oracle’s system statistics for physical IO, since a read satisfied from the file system cache still counts as a physical read from Oracle’s point of view.
- It also complicates database tuning, because the cost of a physical IO is not uniform: it may come from the file system buffer cache or from true disk IO.
- On some operating systems, the file system buffer cache can keep expanding until all physical memory is consumed, resulting in a system crash.
directio=yes
SQL> create table gene_drop tablespace mm_data_temp nologging as select * from gene;
Table created.
Elapsed: 00:00:50.46
directio=none
SQL> create table gene_drop tablespace mm_data_temp nologging as select * from gene;
Table created.
Elapsed: 00:00:12.40
The gene table is about 500 MB and is cached, so no physical IO is needed to read it. According to the Linux documentation, writes actually sit in the file system buffer cache until the data is more than 30 seconds old or takes more than 10% of the cache. Either way, the file system buffer cache can smooth out the disk write load.
Whether Oracle uses direct IO depends not only on the filesystemio_options parameter but also on the OS, file system type, storage device, mount options, device driver and so on. The following procedure can be used to check whether direct IO is used on Linux:
- Start up the database in mount mode.
- Find the DB writer with “select spid, program from v$process where program like '%DBW%';” and write down the spid number.
- strace the DB writer process, for example: strace -o /tmp/strace.out -e trace=open -p 2260
- Open the database with “alter database open”.
- Search the strace output file for “O_DIRECT”; a matching line looks something like open("/.../tst01.dbf", O_RDWR|O_SYNC|O_DIRECT) = 22.
In conclusion, whether to set filesystemio_options to direct IO depends on your system, hardware, database, application and load pattern. Making that call is what a DBA is paid for.
Tuesday, July 7, 2009
Oracle Lock and Latch comparison
• Locks protect disk-based data structures; latches protect in-memory data structures.
• A lock is a complex data structure with shared access modes; a latch is a simple data structure with an exclusive mode only.
• Locks are slow because they are complex and involve disk operations; latches are fast because they are simple and memory-only.
• Lock requests are queued, so deadlock is possible. A latch is hit and run, using spin and sleep to wait.
• No lock is needed for reads thanks to Oracle’s multiversion concurrency control (MVCC); a latch is needed for all data access in memory, including reads.
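There is no view listing individual row locks (as discussed in the row level locking post), but latch activity is observable; a quick hedged sketch for spotting contended latches:
SQL> select name, gets, misses, sleeps from v$latch order by sleeps desc;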
Friday, March 27, 2009
TopLink and Hibernate
Talking about Object Relational Mapping (ORM) for Java, Hibernate comes to mind for almost every developer. Recently, I had the opportunity to evaluate another ORM API, TopLink, and I want to share my impressions of this product and compare it with Hibernate.
TopLink is an ORM API from Oracle, similar to Hibernate in concept. It is not a free product, but it has a branched release called EclipseLink, which is open source and has most of the functionality of its commercial counterpart. When we pick an API for a project, there are always trade-offs with positive or negative impacts on project quality, timeline and cost. Based on my testing, I would like to explain how the choice of TopLink vs. Hibernate impacts a project from my point of view. It is fair to say that, except for the simplest and most basic applications, an ORM such as Hibernate or TopLink is a better choice than plain old JDBC. An ORM provides a Java persistence layer mapped to relational database tables and gives the Java programmer a familiar set of classes to build on. You do not need to deal with SQL if you do not want to; you read and write data at the object level instead of the table, row or cell level.
Saturday, March 7, 2009
Oracle Express edition and spfile
Oracle Express is aimed at developers and new DBAs, for personal use as a development platform or playground. It is designed to be self-managing and comes with an Apex application for simple database administration and object browsing.
I installed it a few days ago to test out a product, and it was very easy to install. After installation, I logged into the Apex application and found I could set the target SGA memory, so I changed it to 50 MB and bounced the database. Looks like a minor change, right? However, I could not start the database any more: it complained that the memory was too small to load the large pool. I do not know what an average developer would do at this point. I knew I needed to change the spfile, but to do that I needed to start the instance, and I could not start the instance because the SGA target setting in the spfile was too small. It seemed my only option was to reinstall the database and lose all the data I had added. I ended up opening the spfile with a text editor and making a pfile out of it. So much for the self-management of Oracle Express.
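For the record, a cleaner recovery sketch (the path is hypothetical; create pfile from spfile works even with the instance down, connected as sysdba):
SQL> create pfile='/tmp/initXE.ora' from spfile;
File created.
(Edit the SGA parameter in /tmp/initXE.ora, then:)
SQL> startup pfile='/tmp/initXE.ora';
SQL> create spfile from pfile='/tmp/initXE.ora';
File created.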
Thursday, February 26, 2009
Oracle RAC - Recap
Transparent Application failover
- Connection failover is very fast (a few seconds).
- Query failover is transparent as well, and the query is resumed from its starting point.
- DML (insert, update, delete) and DDL (create table, alter table, …) failover is not supported; application layer error handling is needed to make it transparent to users.
- Not available for all patches and upgrades.
- More components mean more chances to fail (the Cluster Ready Services daemon, Cluster Synchronization Services daemon, voting disks, Oracle Cluster Registry, virtual IP addresses, shared file system, to name a few).
- Some database init parameter changes that are dynamic in a non-RAC configuration require a database bounce in RAC.
An IO-intensive query, such as a full table scan, is limited by disk IO. Therefore, adding RAC nodes will not scale well even though there are multiple query processes to do the job. RAC can scale well with the number of connections if most reads are logical reads (memory access), but that is not the typical situation for an Oracle database.
OLTP scalability
OLTP scalability was 30% based on our testing. The limiting factors here are memory latches and inter-node memory data transfer. Even with multiple nodes, the memory latches are shared to protect data integrity. In our testing, the inter-node connections used gigabit network cards, and because of latency, remote memory access (cache fusion) was about 300 times slower than local memory access; that is only a few times faster than disk access.
Load balance
A connection is randomly assigned to a node, weighted by current node load. This works well in most cases. However, a connection will not be redistributed to a lightly loaded node once it is connected to an instance.
RAC Cost of Ownership
A large SMP always costs more than a cluster of smaller servers with a similar number of CPUs and similar memory. A 32-CPU SMP costs $1.25 million; a UNIX server cluster with the same combined number of CPUs costs about a third of that, and a Linux cluster about a fifth. However, the RAC license is $20,000 per CPU. Unless you build a RAC with more than 8 combined CPUs, you are not going to save money. Oracle gives away the RAC license for Standard Edition with ASM storage for free, up to a 4-CPU configuration, which is very attractive if you want a high availability system. For a large system, an SMP is always faster than a RAC with the same amount of CPU and memory. You get what you pay for.
RAC web Links
- Please find my related hands-on RAC blog, Parts 1, 2, 3, 4, 5, 6, 7, 8 and 9.
- The 10g RAC scalability proof point from Oracle is a good read to see what Oracle thinks RAC is.
- Mogens Norgaard's article "You Probably Don't Need RAC".
- Matthew Zito (Chief Scientist at GridApp Systems) wrote the counterpoint article "You Probably Do Need RAC".