Wednesday, February 25, 2009

Oracle RAC handson Part 6 RAC HA

RAC High Availability


RAC transparent fail-over is supported for OCI enabled applications, such as sqllplus, JDBC think client. The connection fail over is very fast and scable. Interreptted query is re-run automatically. DDL and DML will return an Oracle error.

Test case 1:

Name: Query fail-over with sqlplus as client

Purpose: Test query fail-over from a node to another node using OCI application.

Detail:

3.1 Connect to a node with sqlplus and record time to query a large table test.sequences (~ 3.6 GB) with full table scan.

SQL: SELECT DISTINCT species FROM test.sequences.

The execution plan is:

4 SELECT STATEMENT

3 PARTITION LIST [ALL]

2 HASH [UNIQUE]

1 TEST.SEQUENCES TABLE ACCESS [FULL]

3 base line runs have run time 64, 65, and 58 seconds. The average run time is 62 seconds.

3.2 Re-run the query and crash the instance (shutdown abort) when full table scan is half finished (at 38 seconds). The run time is 110, 110, and 101 seconds. The average run time is 107 seconds.

3.3 Re-run the query and crash the instance (shutdown abort) when full table scan is just started (at 0 seconds). The run time is 61, 58, and 67 seconds. The average run time is 62 seconds.

Test result:

Oracle RAC node failover for query is transparent to OCI enabled application, such as sqlplus and JDBC fat client. The failover is fast (in a few seconds) and there is no notable delay on query caused by fail over. (Compare data with base line 3.1 and failover at 0 seconds 3.3) However, the failovers at 38 seconds are 107 seconds on average (test 3.2). That is 69 seconds after failover. That is about the time to re-run the query. This observation is agreed with new Oracle cache fusion, that only Global Resource Directory is synced on all nodes. When the local node crashed, the connection fail over to another node but the data block in the memory of failed node is gone, so Oracle have to re-read it from disk again.

Issues:

1. After crash the rac1 instance, restart got:

ORA-01105: mount is incompatible with mounts by other instances

ORA-38767: flashback retention target parameter mismatch

Reset the flashback_retention_target to 1440 and restart the instance.

Test Case 2:

Name: Query fail-over with JDBC as thin client

Purpose: Test query fail-over from a node to another node with thin client.

Detail:

4.1 Connect to a node with JDBC thin client and record time to query a large table test.sequences (~ 3.6 GB) with full table scan.

SQL: SELECT DISTINCT species FROM test.sequences.

The execution plan is:

4 SELECT STATEMENT

3 PARTITION LIST [ALL]

2 HASH [UNIQUE]

2 TEST.SEQUENCES TABLE ACCESS [FULL]

3 base line runs have run time 73, 75, and 72 seconds. The average run time is 73 seconds.

4.2 Re-run the query and crash the instance (shutdown abort) when full table scan is half finished (at 38 seconds). JDBC exception was received.

Java code is as following:

public class RacQuery {

String host = "10.8.166.91"; //blade1.wyeth.com

String host1 = "10.8.166.91"; //blade1.wyeth.com

String host2 = "10.8.166.92"; //blade2.wyeth.com

String host3 = "10.8.166.93"; //blade3.wyeth.com

String databaseSID="rac1";

String serviceName = "rac.wyeth.com";

String databasePort = "1521";

//String url = "jdbc:oracle:thin:@" + host + ":" + databasePort + ":" + databaseSID;

/* String url = "jdbc:oracle:thin:@(DESCRIPTION=(FAILOVER=ON)(LOAD_BALANCE=on)(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=" +

host1 + ")(PORT=" + databasePort + "))(ADDRESS=(PROTOCOL=TCP)(HOST=" +

host2 + ")(PORT=" + databasePort + "))(ADDRESS=(PROTOCOL=TCP)(HOST=" +

host3 + ")(PORT=" + databasePort + ")))(CONNECT_DATA=(SERVICE_NAME=" + serviceName + ")(SERVER=DEDICATED)))";

*/

//String url = "jdbc:oracle:thin:@10.8.166.91:1521:rac1";

String url = "jdbc:oracle:thin:@(DESCRIPTION=" +

"(LOAD_BALANCE=yes)" +

"(ADDRESS=(PROTOCOL=TCP)(HOST=" + host1 + ")(PORT=" + databasePort + "))" +

"(ADDRESS=(PROTOCOL=TCP)(HOST=" + host2 + ")(PORT=" + databasePort + "))" +

"(ADDRESS=(PROTOCOL=TCP)(HOST=" + host3 + ")(PORT=" + databasePort + "))" +

"(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=" + serviceName + ")))";

String userName = "test";

String password = "password";

Connection conn;

public RacQuery() {

}

public Connection getConnection() throws java.sql.SQLException {

DriverManager.registerDriver (new oracle.jdbc.driver.OracleDriver());

conn = DriverManager.getConnection

(url, userName, password);

return conn;

}

public static void main(String[] args) {

RacQuery racquery = new RacQuery();

try {

String sqlStr1 = "SELECT instance_name FROM v$instance";

String sqlStr2 = "SELECT DISTINCT species FROM test.sequences";

Connection conn = racquery.getConnection();

Statement stmt = conn.createStatement();

ResultSet rs = stmt.executeQuery(sqlStr1);

rs.next();

System.out.println("instance is " + rs.getString(1));

Date startTime = new Date(System.currentTimeMillis());

rs = stmt.executeQuery(sqlStr2);

while (rs.next()) {

System.out.println(rs.getString(1));

}

stmt.close();

conn.close();

Date endTime = new Date(System.currentTimeMillis());

long executionSeconds = (endTime.getTime() - startTime.getTime())/1000;

System.out.print("execution time: " + executionSeconds + " seconds.");

} catch (SQLException sqle) {

System.out.print("JDBC exception: " + sqle.getMessage());

}

}

}

Test result:

Oracle RAC node failover is not supported for JDBC thin client. However, load balance is supported for JDBC thin client. JDBC thick client supports both node failover and load balance.

Test Case 3:

Name: Insert operation fail-over with sqlplus as client

Purpose: Test DML fail-over from a node to another node using OCI application.

Detail:

1. Connect to a node with sqlplus and run “create table sequence_bk as select * from sequences” (~ 3.6 GB). Crash the instance (shutdown abort) when create table operation is half finished. Get ORA-:

SQL> create table sequences_bk as select * from sequences;

create table sequences_bk as select * from sequences

*

ERROR at line 1:

ORA-25408: can not safely replay call

Test result:

Oracle RAC node failover for DML/DDL is not supported. Test case 6 is dropped.

Test Case 4:

Name: Fail-over with large number of connections.

Purpose: Test reliability and time to fail-over connections from a node to another node.

Detail:

1. Connect to rac with 100 JDBC thick connections. Each connection query failed_over column of its session (SELECT failed_over FROM V$SESSION WHERE audsid = SYS_CONTEXT('userenv','sessionid');), every seconds and record the time after complete the query, record time insert fail-over time and session_id into table session_failover, and exit once the fail-over column value changes to “yes”

2. Crash the instance (shutdown abort).

3. Query the table session_failover to see the time needed to fail-over.

Test result:

Failover of 30 connections takes 5 seconds to 6 seconds. It is not much slower than failover of single session that takes 4 – 5 seconds. RAC connection failover is very scalable.

Test Code:

Java code:

import java.sql.*;

import java.util.Calendar;


public class RacQuery {

String host = "10.8.166.91"; //blade1.wyeth.com

String host1 = "10.8.166.91"; //blade1.wyeth.com

String host2 = "10.8.166.92"; //blade2.wyeth.com

String host3 = "10.8.166.93"; //blade3.wyeth.com

String databaseSID="rac1";

String serviceName = "rac.wyeth.com";

String databasePort = "1521";

//String url = "jdbc:oracle:thin:@" + host + ":" + databasePort + ":" + databaseSID;

/*String url = "jdbc:oracle:thin:@(DESCRIPTION=" +

"(LOAD_BALANCE=yes)" +

"(ADDRESS=(PROTOCOL=TCP)(HOST=" + host1 + ")(PORT=" + databasePort + "))" +

"(ADDRESS=(PROTOCOL=TCP)(HOST=" + host2 + ")(PORT=" + databasePort + "))" +

"(ADDRESS=(PROTOCOL=TCP)(HOST=" + host3 + ")(PORT=" + databasePort + "))" +

"(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=" + serviceName + ")))";

*/

String url = "jdbc:oracle:oci:@rac";

String userName = "test";

String password = "tt123";

Connection conn;

public RacQuery() {

}

public Connection getConnection() throws java.sql.SQLException {

DriverManager.registerDriver (new oracle.jdbc.driver.OracleDriver());

conn = DriverManager.getConnection

(url, userName, password);

return conn;

}

public static void main(String[] args) throws InterruptedException {

RacQuery racquery = new RacQuery();

try {

int loopExt = 0;

Date startTime = new Date(System.currentTimeMillis());

Date endTime = new Date(System.currentTimeMillis());

String sqlStr1 = "select instance_name from v$instance";

String mySession = args[0];

String failover;

String sqlStr2 = "insert into session_failover (session_id, time) values (?, ?)";

Connection conn = racquery.getConnection();

Statement stmt = conn.createStatement();

ResultSet rs = stmt.executeQuery(sqlStr1);

rs.next();

System.out.print("instance is " + rs.getString(1));

sqlStr1 = "select failed_over from v$session where audsid = sys_context('userenv', 'sessionid')";

PreparedStatement pstmt;

while (loopExt == 0) {

Thread.sleep(1000);

rs = stmt.executeQuery(sqlStr1);

rs.next();

failover = rs.getString(1);

if (failover.equalsIgnoreCase("yes")) {

loopExt = 1;

stmt.close();

endTime.setTime(System.currentTimeMillis());

pstmt = conn.prepareStatement(sqlStr2);

pstmt.setString(1, mySession);

pstmt.setFloat(2, new Float((endTime.getTime() - startTime.getTime())/1000).floatValue());

pstmt.executeUpdate();

pstmt.close();

} else {

startTime.setTime(System.currentTimeMillis());

}

}

conn.close();

} catch (SQLException sqle) {

System.out.print("JDBC exception: " + sqle.getMessage());

}

}

}

ksh code:

!#/usr/bin/ksh

#LD_LIBRARY_PATH must use 32 bit lib

LD_LIBRARY_PATH=/data01/app/oracle/product/9.2.0/lib32:$LD_LIBRARY_PATH

counter=0

echo `date` > runTesting.log

until [[ $counter = 100 ]] ; do

echo The counter is $counter

let counter=counter+1;

java -classpath /export/home/oracle/racTesting:/data01/app/oracle/product/9.2.0/jdbc/lib/ojdbc14_g.jar RacQuery $counter >

> runTesting.log &

done

Table DDL

CREATE TABLE TEST.SESSION_FAILOVER (SESSION_ID VARCHAR2(4), TIME NUMBER);

1 comment:

  1. This comment has been removed by a blog administrator.

    ReplyDelete