各位大牛们好,我有个问题纠结我2个月了,请大家帮忙分析。
问题:
客户现场有2个table, 一个叫stage, 里面有1亿条记录,现在需要insert到另一个表Pattern, 原表没有primary 和foreign key, 也没Index,目的表有primary 和foreign key, 和Index,这2表都只有2列。目前用的merge into 语句,这个功能是定期做merge,把不在pattern的但是在stage的merge, 如果是delta数据,现在没有问题,如果做了clean(就是把pattern truncate)后应用会hang住。
从现场的stacktrace, 应用hang在executeUpdate:
static void doUpdateSql(Connection conn, String sqlStr) throws SQLException {
PreparedStatement stmt = null;
ResultSet rs = null;
try {
stmt = conn.prepareStatement(sqlStr);
stmt.executeUpdate(sqlStr);
} finally {
JDBCUtils.close( stmt, rs );
logger.debug( "smtm closed");
}
}
堆栈为:
"Timer-8" daemon prio=10 tid=0x00007f6798395800 nid=0x3972 runnable [0x00007f67ea8e6000]
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at oracle.net.ns.Packet.receive(Packet.java:282)
at oracle.net.ns.DataPacket.receive(DataPacket.java:103)
at oracle.net.ns.NetInputStream.getNextPacket(NetInputStream.java:230)
at oracle.net.ns.NetInputStream.read(NetInputStream.java:175)
at oracle.net.ns.NetInputStream.read(NetInputStream.java:100)
at oracle.net.ns.NetInputStream.read(NetInputStream.java:85)
at oracle.jdbc.driver.T4CSocketInputStreamWrapper.readNextPacket(T4CSocketInputStreamWrapper.java:122)
at oracle.jdbc.driver.T4CSocketInputStreamWrapper.read(T4CSocketInputStreamWrapper.java:78)
at oracle.jdbc.driver.T4CMAREngine.unmarshalUB1(T4CMAREngine.java:1179)
at oracle.jdbc.driver.T4CMAREngine.unmarshalSB1(T4CMAREngine.java:1155)
at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:279)
at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:186)
at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:521)
at oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:205)
at oracle.jdbc.driver.T4CPreparedStatement.executeForRows(T4CPreparedStatement.java:1008)
at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1307)
at oracle.jdbc.driver.OracleStatement.executeUpdateInternal(OracleStatement.java:1814)
at oracle.jdbc.driver.OracleStatement.executeUpdate(OracleStatement.java:1779)
- locked <0x00007f83b3b03080> (a oracle.jdbc.driver.T4CConnection)
at oracle.jdbc.driver.OracleStatementWrapper.executeUpdate(OracleStatementWrapper.java:277)
at com.mchange.v2.c3p0.impl.NewProxyPreparedStatement.executeUpdate(NewProxyPreparedStatement.java:1044)
at com.cache.OracleSqlldr.doUpdateSql(OracleSqlldr.java:276)
用的c3p0连接池,配置为:
2020-06-26 11:01:00,038 INFO [impl.AbstractPoolBackedDataSource]
Initializing c3p0 pool...
com.mchange.v2.c3p0.ComboPooledDataSource [
acquireIncrement -> 4,
acquireRetryAttempts -> 30,
acquireRetryDelay -> 1000,
autoCommitOnClose -> true,
automaticTestTable -> null,
breakAfterAcquireFailure -> false,
checkoutTimeout -> 0,
connectionTesterClassName -> com.mchange.v2.c3p0.impl.DefaultConnectionTester,
dataSourceName -> 2t2vcfaa1ujvobpl05hg7|30e3c624,
debugUnreturnedConnectionStackTraces -> false,
description -> null,
driverClass -> oracle.jdbc.OracleDriver,
factoryClassLocation -> null,
forceIgnoreUnresolvedTransactions -> false,
idleConnectionTestPeriod -> 1200,
initialPoolSize -> 3,
lastAcquisitionFailureDefaultUser -> null,
maxAdministrativeTaskTime -> 0,
maxConnectionAge -> 0,
maxIdleTime -> 0,
maxIdleTimeExcessConnections -> 0,
maxPoolSize -> 128,
maxStatements -> 0,
maxStatementsPerConnection -> 0,
minPoolSize -> 4,
numHelperThreads -> 3,
numThreadsAwaitingCheckoutDefaultUser -> 0,
propertyCycle -> 0,
testConnectionOnCheckin -> false,
testConnectionOnCheckout -> true,
unreturnedConnectionTimeout -> 0,
usesTraditionalReflectiveProxies -> false ]
本地一直没法重现,本地1亿数据,由于关闭了constraints和index, 大概1小时能merge 完,现场有constaint 和index, 需要2个半小时才看到pattern table有数据,但是应用这边一直都是等待。
查过oracle alert log,里面只有些下面:
Fatal NI connect error 12170.
VERSION INFORMATION:
TNS for Linux: Version 12.1.0.2.0 - Production
Oracle Bequeath NT Protocol Adapter for Linux: Version 12.1.0.2.0 - Production
TCP/IP NT Protocol Adapter for Linux: Version 12.1.0.2.0 - Production
Time: 11-DEC-2019 12:11:41
Tracing not turned on.
Tns error struct:
ns main err code: 12535
TNS-12535: TNS:operation timed out
ns secondary err code: 12560
nt main err code: 505
TNS-00505: Operation timed out
nt secondary err code: 110
nt OS err code: 0
大牛们看看这个我该怎么解决呢。到底是数据库有问题还是应用问题还是网络问题还是防火墙问问题?
还有个问题,也一直没想明白,本地测试时,用executeUpdate 发出merge into 命令后,由于这个merge into要执行1个小时,这是kill 程序,oracle那边一直还是在执行,session也是active 状态,这个对不对,想模拟现场看看是不是防火墙把长连接段了,我在oracle 机器上(solaris)设置了 tcp keepalive 为10分钟,但是没能生效,不过我没重启机器,不知道要不要重启。
ndd -set /dev/tcp tcp_keepalive_interval 600000
这又是什么原因呢