Troubleshooting ORA-03135: connection lost contact issue

Jephe Wu - http://linuxtechres.blogspot.com


Environment: Oracle database 10.2.0.3 RAC 64bit, websphere application server on RHEL4.5 use JDBC driver,Juniper firewalls (failover)
Problem: application server got error "java.sql.SQLException: ORA-03135: connection lost contact" and connection timeout from database
Objective: to fix the connection lost issue


Steps:
1. search the error message to find out the root cause in Oracle support and google:

Troubleshooting ORA-3135 Connection Lost Contact [ID 787354.1]
Resolving Problems with Connection Idle Timeout With Firewall [ID 257650.1]
Achive Log shipments failing in environment with Juniper firewall [ID 1075432.1]
Logs are not shipped to the physical standby database [ID 1130523.1]
ORA-03135: connetion lost contact while shipping from Primary server to standby server [ID 739522.1]

According to Oracle doc ID 787354.1, this ORA-3135 might be caused by one of the following:

(1) Transparent Application Failover (TAF)
(2) Idle Connection Timeout
(3) Firewall Manipulating  Sqlnet Data.

Otherwise, it might need further analysis.
Also, according to ORA-3135 connection lost contact tips: http://www.dba-oracle.com/t_ora_03135_connection_lost_contact.htm, it might be due to customized profile.

sqlplus / as sysdba
sql> set line 32000
sql> select profile, resource_name, limit
from dba_profiles where resource_name IN ('IDLE_TIME', 'CONNECT_TIME');

note: some commands:
sql> create profile myprof limit connect_time 30;


From application side, to avoid this idle connection timeout, refer to -  Putting a firewall between your AppServer and DBMS -  http://www.websphere-world.com/modules.php?name=News&file=article&sid=522to

2. check firewall idle connection timeout and ALG settings
Checked Juniper firewall SQL ALG setting and idle connection timeout

# check alg
# check service ssh
# check service SQL*Net V2        

And some articles:

a. What is Idle timeouts when any service set - J-Net Community - http://forums.juniper.net/t5/ScreenOS-Firewalls-NOT-SRX/What-is-Idle-timeouts-when-Any-service-set/td-p/163

b. Viewing list of ALGs and disabling an ALG differs on screenos versions - http://kb.juniper.net/InfoCenter/index?page=content&id=KB13509
c. Issues with SQL database applications when traffic traverses SRX - http://kb.juniper.net/InfoCenter/index?page=content&id=KB21550
d. Increasing the session idle timeout of a particular service - http://kb.juniper.net/InfoCenter/index?page=content&id=KB4652&actp=LIST
e. How do I verify the NCP settings? (for SQL ALG) - http://kb.juniper.net/InfoCenter/index?page=content&id=KB9283
f. [SRX]what conditions are required for SQL ALG? - http://kb.juniper.net/InfoCenter/index?page=content&id=KB22418&cat=JUNOS&actp=LIST
g. Users losing connections when the IVE Active/Passive Cluster Fails over - http://kb.juniper.net/InfoCenter/index?page=content&id=KB8539&cat=SSL_VPN&actp=LIST

3. We have disabled SQL ALG and application server seemed busy at the time of connection lost.
Our monitoring shows Firewall actually failed over to another one at the time of connection lost and switched back, so the application server got "ORA-03135: connection lost contact" error twice and need to restart application.