RAC High Scalability
Summary:
Both Oracle oci and thin client support connection load balance. It also indicates that instance load at the connection time is a factor for load balance. If loading on RAC nodes are similar, Oracle randomly decides which node to connect. Besides load balance, we tested the query and OLTP operation performance against 1-node, 2-nodes, and 3-nodes RAC. Under best test case, the performance increase is 35% for adding node from 1 to 2, (Linear increase should be 100%)and 15% for adding node from 2 to 3. (Linear increase should be 50%). CPU intensive operation will benefit the most from RAC scale-up. Inter-node memory access is very slow compare with local memory access (remote memory access is 300 times slower than local memory access and is the twice of the speed of disk access). Other resource constraints such as latch contention, disk IO limitation for data file and log file prevent the near linear scale up for RAC unless everything is partitioned. (For example, Application A uses schema A connects to node A on disk array A. Application B uses schema B connects to node B on disk array B. and so on.)
Test Case 9:
Name: Load balance
Purpose: Test load balance over nodes
Detail:
- Connect to RAC with 100 JDBC thick connections and count the connections on each RAC instance for connections.
Test result:
Each nodes gets about 30 connections. Looks like the connection is randomly made to RAC instance. Separate test suggests the instance loading is a fact with RAC making the connection.
Test Code:
Java code:
import java.sql.*;
import java.util.Calendar;
public class RacQuery {
String host = "10.8.166.91"; //blade1.wyeth.com
String host1 = "10.8.166.91"; //blade1.wyeth.com
String host2 = "10.8.166.92"; //blade2.wyeth.com
String host3 = "10.8.166.93"; //blade3.wyeth.com
String databaseSID="rac1";
String serviceName = "rac.wyeth.com";
String databasePort = "1521";
//String url = "jdbc:oracle:thin:@" + host + ":" + databasePort + ":" + databaseSID;
/*String url = "jdbc:oracle:thin:@(DESCRIPTION=" +
"(LOAD_BALANCE=yes)" +
"(ADDRESS=(PROTOCOL=TCP)(HOST=" + host1 + ")(PORT=" + databasePort + "))" +
"(ADDRESS=(PROTOCOL=TCP)(HOST=" + host2 + ")(PORT=" + databasePort + "))" +
"(ADDRESS=(PROTOCOL=TCP)(HOST=" + host3 + ")(PORT=" + databasePort + "))" +
"(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=" + serviceName + ")))";
*/
String url = "jdbc:oracle:oci:@rac";
String userName = "test";
String password = "tt123";
Connection conn;
public RacQuery() {
}
public Connection getConnection() throws java.sql.SQLException {
DriverManager.registerDriver (new oracle.jdbc.driver.OracleDriver());
conn = DriverManager.getConnection
(url, userName, password);
return conn;
}
public static void main(String[] args) throws InterruptedException {
RacQuery racquery = new RacQuery();
try {
int loopExt = 0;
Date startTime = new Date(System.currentTimeMillis());
Date endTime = new Date(System.currentTimeMillis());
String sqlStr1 = "select instance_name from v$instance";
String mySession = args[0];
String failover;
String sqlStr2 = "insert into session_failover (session_id, time) values (?, ?)";
Connection conn = racquery.getConnection();
Statement stmt = conn.createStatement();
ResultSet rs = stmt.executeQuery(sqlStr1);
rs.next();
System.out.print("instance is " + rs.getString(1));
sqlStr1 = "select failed_over from v$session where audsid = sys_context('userenv', 'sessionid')";
PreparedStatement pstmt;
while (loopExt == 0) {
Thread.sleep(1000);
rs = stmt.executeQuery(sqlStr1);
rs.next();
failover = rs.getString(1);
if (failover.equalsIgnoreCase("yes")) {
loopExt = 1;
stmt.close();
endTime.setTime(System.currentTimeMillis());
pstmt = conn.prepareStatement(sqlStr2);
pstmt.setString(1, mySession);
pstmt.setFloat(2, new Float((endTime.getTime() - startTime.getTime())/1000).floatValue());
pstmt.executeUpdate();
pstmt.close();
} else {
startTime.setTime(System.currentTimeMillis());
}
}
conn.close();
} catch (SQLException sqle) {
System.out.print("JDBC exception: " + sqle.getMessage());
}
}
}
ksh code:
!#/usr/bin/ksh
#LD_LIBRARY_PATH must use 32 bit lib
LD_LIBRARY_PATH=/data01/app/oracle/product/9.2.0/lib32:$LD_LIBRARY_PATH
counter=0
echo `date` > runTesting.log
until [[ $counter = 100 ]] ; do
echo The counter is $counter
let counter=counter+1;
java -classpath /export/home/oracle/racTesting:/data01/app/oracle/product/9.2.0/jdbc/lib/ojdbc14_g.jar RacQuery $counter >
> runTesting.log &
done
Table DDL
CREATE TABLE TEST.SESSION_FAILOVER (SESSION_ID VARCHAR2(4), TIME NUMBER);
Test Case 10:
Name: Query scalability
Purpose: Test scalability of query with full table scan.
Detail:
1. Startup with only 1 instance.
2. Record time to full table scan a large table (3.6 GB) with parallel 6 three times. (
select /*+ parallel (sequences, 6) */ distinct species from test.sequences
UNION
select /*+ parallel (sequences, 6) */ distinct species from test.sequences
UNION
select /*+ parallel (sequences, 6) */ distinct species from test.sequences;)
3. Repeat step 2 against 2-nodes and 3-nodes RAC.
Retested with SQL (select distinct species from test.sequences) and it takes 53 seconds on single node RAC, 53 seconds on two nodes RAC, and three nodes RAC takes 49 seconds. Only 1 session on connected instance does the query. That proves the global cache service is scale well with number of nodes.
Test result:
For single node, it started 10 parallel processes. For two nodes RAC, there are 6 parallel processes on each node. For three nodes RAC, there are 4 parallel processes on each node. Adding nodes to RAC seems not speed up the query even all nodes are working on it. The full table scan is limited by IO of SAN.
One Node: 35 seconds, 35 seconds, 35 seconds. Average 35 seconds
Two Node: 36 seconds, 36 seconds, 36 seconds. Average: 36 seconds
Three Node: 36 seconds, 36 seconds, 37 seconds. Average: 36 seconds
Test Case 11:
Name: Query scalability
Purpose: Test scalability of query with full table scan of hash portioned index organized table with three partitions.
Detail:
1. Startup with only 1 instance.
2. Record time to range-scan and full table scan the large table (3.6 GB) with parallel 6. (select /*+ parallel(cellfeaure, 6) */count(*) from cellfeature where valdbl > 2)
3. Repeat step 2 against 2-nodes and 3-nodes RAC.
4.
Test result:
The query does fast full scan of primary key index. For single node, it started 6 parallel processes. For two nodes RAC, there are 6 parallel processes on remote node and one is on local node. For three nodes RAC, there are 2 parallel processes and 4 parallel processes on remote nodes. Only one process is on local node. Adding nodes to RAC seems not speed up the query even all nodes are working on it. The fast full index scan is limited by IO of SAN. Query takes 96 seconds to complete without parallel execution.
One Node: 12 seconds, 13 seconds, 12 seconds. Average 12 seconds
Two Node: 13 seconds, 13 seconds, 13 seconds. Average: 13 seconds
Three Node: 12 seconds, 13 seconds, 14 seconds. Average: 13 seconds
Test Case 12:
Name: Query remote cache
Purpose: Test buffer get from local and remote buff cache
Detail:
1. Create table sequences_small as select temp_s.nextval as T_ID, SEQUENCE_ID, ACCESSION, AUTHORITY, WYSECAT_AVA, SPECIES, DESCRIPTION from sequences where rownum <>
2. Cache table sequences_smaller with size of 300 MB into data cache buffer.
3. Record time to scan sequences_smaller table (300 MB) that connects to the same instance. (select count(*) from sequences_smaller; ) Query reads from disk takes 7 seconds. Query reads from memory takes 0.01 seconds.
4. Connect to other node and run the same query and record the time. It takes 3.04 seconds for other node to run query first time. Auto trace shows all reads are consistent gets.
Test result:
The query that access remote cache buffer is 1.3 times faster than disk reads (7 seconds vs. 3.04) but 300 times slower than local memory access (0.01 seconds vs. 3.04)
Issues:
1. set sga_target to non zero value enables the sga auto management. But the SGA will not change of full table scan of cached table and oracle does disk read every time.
Test Case 13:
Name: OLTP scalability
Purpose: RAC scalability for online transaction processes (OLTP)
Detail:
1. Set up a table with 1 million rows and has a number field beside primary key. (sequences_smaller)
2. Connect to a node with 100 JDBC connections and each connection updates random row with a sequence number for 5 minutes. Record the sequence number before the test and after test.
3. Connect to 2-node RAC and 3-node RAC with 100 JDBC connections and each connection updates random row with a sequence number for 10 minutes. Record the sequence number before the test and after test.
Test result:
The transaction throughput with one node is 4307 per second. (43 per-connection per second). The transaction throughput with two node is 4550 per second. (46 per-connection per second). The transaction throughput with three node is 5306 per second. (53 per-connection per second). Java server CPU is at 100% and some connections generate shared pool too small error. The throughput is limited by client side CPU and shared_pool_size is too small as well.
Re-run the test with 30 concurrent connections and more powerful client server (cela166052). The transaction throughput with one node is 7813 per second. (260 per-connection per second). The transaction throughput with two node is 9853 per second. (328 per-connection per second). The transaction throughput with three node is 10000 per second. (333 per-connection per second). Looks like both client server and RAC have some room for more load.
Re-run the test with 100 concurrent connections. The transaction throughput with one node is 9658 per second. (96 per-connection per second). The transaction throughput with two node is 13098 per second. (131 per-connection per second). The transaction throughput with three node is 15081 per second. (151 per-connection per second). 20% time on cluster overhead. (interconnect block transfer) 40% on commit (memory latches, log file sync) and 40% on CPU.
Because RAC’s share everything architecture, It is impossible to scale up to near linear unless partition everything. (applications, instance, schema, disks) For our best test cases, the scalability is about 35% from 1-node to 2 – nodes, and 15% from 2-nodes to 3-nodes.
Test Code:
Java code:
public class RacQuery {
String host = "10.8.166.91"; //blade1.wyeth.com
String host1 = "10.8.166.91"; //blade1.wyeth.com
String host2 = "10.8.166.92"; //blade2.wyeth.com
String host3 = "10.8.166.93"; //blade3.wyeth.com
String databaseSID="rac1";
String serviceName = "rac.wyeth.com";
String databasePort = "1521";
//String url = "jdbc:oracle:thin:@" + host + ":" + databasePort + ":" + databaseSID;
/*String url = "jdbc:oracle:thin:@(DESCRIPTION=" +
"(LOAD_BALANCE=yes)" +
"(ADDRESS=(PROTOCOL=TCP)(HOST=" + host1 + ")(PORT=" + databasePort + "))" +
"(ADDRESS=(PROTOCOL=TCP)(HOST=" + host2 + ")(PORT=" + databasePort + "))" +
"(ADDRESS=(PROTOCOL=TCP)(HOST=" + host3 + ")(PORT=" + databasePort + "))" +
"(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=" + serviceName + ")))";
*/
String url = "jdbc:oracle:oci:@rac";
String userName = "test";
String password = "";
Connection conn;
public RacQuery() {
}
public Connection getConnection() throws java.sql.SQLException {
DriverManager.registerDriver (new oracle.jdbc.driver.OracleDriver());
conn = DriverManager.getConnection
(url, userName, password);
return conn;
}
public static void main(String[] args) throws InterruptedException {
RacQuery racquery = new RacQuery();
try {
int loopExt = 0;
Date startTime = new Date(System.currentTimeMillis());
Date endTime = new Date(System.currentTimeMillis());
//String sqlStr1 = "commit";
String mySession = args[0];
String failover;
String sqlStr2 = "update sequences_smaller set t_value = temp_s.nextval where t_id = ?";
Connection conn = racquery.getConnection();
ResultSet rs;
PreparedStatement pstmt;
Integer randomValue = new Integer(0);
pstmt = conn.prepareStatement(sqlStr2);
startTime = new Date(System.currentTimeMillis());
while (loopExt == 0) {
randomValue = new Integer(new Double(Math.random()*1000000).intValue());
//System.out.println("random number = " + randomValue);
endTime.setTime(System.currentTimeMillis());
pstmt.setInt(1, randomValue.intValue());
pstmt.executeUpdate();
conn.commit();
endTime.setTime(System.currentTimeMillis());
if (endTime.getTime() - startTime.getTime() > 300000) { //5 minutes 300000
loopExt = 1;
}
}
System.out.println("Time passed: " + (endTime.getTime() - startTime.getTime())/1000);
pstmt.close();
conn.close();
} catch (SQLException sqle) {
System.out.print("JDBC exception: " + sqle.getMessage());
}
}
}
ksh code:
compileTesting.sh
#!/usr/bin/ksh
LD_LIBRARY_PATH=/opt/app/oracle/product/10.2.0/lib32:.
ORACLE_HOME=/opt/app/oracle/product/10.2.0
/opt/app/oracle/product/10.2.0/jdk/bin/javac -classpath /home/oracle/racTesting:$ORACLE_HOME/jdbc/lib/ojdbc14_g.jar:$ORACLE_HOME/jdk/lib /home/oracle/racTesting/RacQuery.java
runTesting.sh
#!/usr/bin/ksh
ORACLE_HOME=/opt/app/oracle/product/10.2.0
LD_LIBRARY_PATH=$ORACLE_HOME/lib32:$ORACLE_HOME/jdbc/lib:$ORACLE_HOME/jdk/lib:.
counter=0
echo `date` > runTesting.log
until [[ $counter = 100 ]] ; do
echo The counter is $counter
let counter=counter+1;
$ORACLE_HOME/jdk/bin/java -classpath /home/oracle/racTesting:$ORACLE_HOME/jdbc/lib/ojdbc14_g.jar RacQuery $counter >> runTesting.log &
done
Database Objects:
SQL> desc sequences_smaller
Name Null? Type
---------------------------------- --------------- ---------------
T_ID NOT NULL NUMBER
SEQUENCE_ID NOT NULL VARCHAR2(50)
ACCESSION NOT NULL VARCHAR2(30)
AUTHORITY NOT NULL VARCHAR2(30)
WYSECAT_AVA VARCHAR2(60)
SPECIES VARCHAR2(200)
DESCRIPTION VARCHAR2(4000)
T_VALUE NUMBER
create sequence temp_s start with 1 cache 50000;
Test Case 14:
Name: Table join scalability
Purpose: RAC scalability for join operation
Detail:
1. Set up a table with 12 million rows.
2. Connect to a node with sqlplus and do a query with hash join of the table. (
select /*+ parallel (a,4) parallel (b,4) */ b.accession, a.accession from sequences a, (select sequence_id, accession from sequences union select sequence_id, accession from sequences_small) b
where a.sequence_id = b.sequence_id and a.accession like 'NM%' and b.accession like '%_017067.1'
) Record the time to complete the query, which is 30 seconds.
3. Repeat step 2 with 2-nodes RAC and 3-nodes RAC. Since no remote parallel processes are started, add node will not speed up the query.
4. Repeat step 2 and 3 with nested loop join operation. (
select /*+ use_nl(a,b) parallel (a,4) parallel (b,4) */ b.accession, a.accession from sequences a, (select sequence_id, accession from sequences union select sequence_id, accession from sequences_small) b
where a.sequence_id = b.sequence_id and a.accession like 'NM%' and b.accession like '%_017067.1'
). Record the time to complete the query, which is 16 seconds. No remote parallel processes are started.
Test result:
The table join operation will not benefit from multi-nodes RAC directly.