Checking for connection leaks

Druid Connection Leak Detection and Handling: Keeping Connection Pool Resources Safe

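Database connection leaks are a common and hard-to-spot problem in backend development. They show up as an exhausted connection pool, slow responses, or outright timeouts. The Druid connection pool has built-in leak detection that can automatically identify and reclaim connections that have not been returned for a long time. This article walks through how Druid's leak detection works, how to configure it, and how to troubleshoot leaks in practice, so that you can find and fix them early.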

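Impact and causes of connection leaks

What is a connection leak?

A connection leak occurs when the application obtains a connection from the pool (getConnection()) but never closes it properly (close()). The connection then stays checked out of the pool indefinitely and cannot be reused by other requests.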

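Impact of leaks

Pool exhaustion: once enough leaks accumulate, activeCount reaches the maxActive limit and new requests time out waiting for a connection (Druid reports this as GetConnectionTimeoutException);
Performance degradation: database connections are a scarce resource. Leaks force the pool to keep creating new connections, which adds load on both the database and the application server;
Service outage: in severe cases every request blocks and the application stops responding entirely.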

Common causes

Code defects: the connection is not closed in a finally block, or an exception prevents the close() call from running;

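// Bad example: the connection is not closed in a finally block
Connection conn = dataSource.getConnection();
try {
    // Business logic (an unchecked exception thrown here skips conn.close() below)
} catch (SQLException e) {
    e.printStackTrace();
}
conn.close();  // Not reached if anything other than SQLException is thrown above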

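Framework misconfiguration: an ORM framework (such as MyBatis or Hibernate) is configured incorrectly, so connections are not released automatically;
Long transactions: a connection is held by a long-running transaction (such as batch processing) and is not released promptly.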
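
The first cause can be reproduced without a database. The sketch below is a minimal illustration rather than Druid code: TrackedResource and businessLogic are hypothetical stand-ins for a connection and for application logic. It suggests why a close() call placed after a try/catch that only handles SQLException is skipped when an unchecked exception propagates.

```java
import java.sql.SQLException;

/** Illustration only: shows how close() placed after a try/catch can be skipped. */
final class LeakPatternSketch {
    /** Hypothetical stand-in for a pooled connection. */
    static final class TrackedResource {
        boolean closed;

        void close() {
            closed = true;
        }
    }

    /** Hypothetical business logic that may fail with an unchecked exception. */
    static void businessLogic(boolean fail) throws SQLException {
        if (fail) {
            throw new IllegalStateException("simulated unchecked failure");
        }
    }

    /** Returns true if the resource ended up closed. */
    static boolean closedAfter(boolean fail) {
        TrackedResource res = new TrackedResource();
        try {
            try {
                businessLogic(fail);
            } catch (SQLException e) {
                e.printStackTrace(); // only SQLException is handled here
            }
            res.close(); // skipped when an unchecked exception propagates
        } catch (IllegalStateException e) {
            // The caller sees the failure, but the resource was never closed.
        }
        return res.closed;
    }
}
```

Calling closedAfter(false) closes the resource, while closedAfter(true) leaves it open, which is the situation a pool would see as a leaked connection.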

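Druid's leak detection mechanism

Druid provides the removeAbandoned family of settings. A background task periodically scans the pool's active connections, then identifies and reclaims those that have been checked out for longer than a timeout. This limits the damage a leak can do, although the leaking code still has to be fixed.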

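Core configuration parameters

Setting	Default	Purpose and notes
`removeAbandoned`	false	Whether leak detection is enabled (`true` on, `false` off). Best enabled only while troubleshooting, since it has some performance cost.
`removeAbandonedTimeout`	300	Leak timeout in seconds. A connection checked out for longer than this is treated as leaked. A reasonable value is 2 to 3 times the longest expected business operation (for example, 180 seconds).
`logAbandoned`	false	Whether to log when a leak is detected. Recommended: the log includes the stack trace captured when the connection was obtained and the owner thread's current stack trace, which makes the problem easier to locate.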

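How it works

The core logic lives in the removeAbandoned() method and proceeds as follows:

Periodic scan: every timeBetweenEvictionRunsMillis (60 seconds by default), the detection thread scans the pool's active connections;
Timeout check: it computes how long each connection has been checked out (current time minus the time the connection was obtained) and treats the connection as leaked once that reaches removeAbandonedTimeout. Connections that are executing a statement at that moment (isRunning() returns true) are skipped, as the source listing at the end of this article shows;
Forced reclaim: it calls close() on each leaked connection to release its resources and removes it from the active connection list;
Logging: if logAbandoned=true, it logs the stack trace from when the connection was obtained plus the owner thread's current state, to aid troubleshooting.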
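
The scan described above can be modeled in a few lines. This is a simplified sketch under stated assumptions, not Druid's implementation: AbandonedScanSketch and Borrowed are hypothetical names, the current time is passed in explicitly, and a plain map stands in for the pool's active connection list.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Simplified model of an abandoned-connection scan; not Druid's actual classes. */
final class AbandonedScanSketch {
    /** Hypothetical stand-in for a borrowed connection. */
    static final class Borrowed {
        final String name;
        final long borrowedAtMillis;
        final boolean running; // true while a statement is executing
        boolean closed;

        Borrowed(String name, long borrowedAtMillis, boolean running) {
            this.name = name;
            this.borrowedAtMillis = borrowedAtMillis;
            this.running = running;
        }
    }

    private final Map<String, Borrowed> active = new LinkedHashMap<>();
    private final long timeoutMillis;

    AbandonedScanSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    void borrow(Borrowed connection) {
        active.put(connection.name, connection);
    }

    /** One scan pass: reclaims idle connections held for at least the timeout. */
    List<String> scan(long nowMillis) {
        List<String> reclaimed = new ArrayList<>();
        Iterator<Borrowed> it = active.values().iterator();
        while (it.hasNext()) {
            Borrowed c = it.next();
            if (c.running) {
                continue; // busy connections are skipped
            }
            if (nowMillis - c.borrowedAtMillis >= timeoutMillis) {
                it.remove();     // drop from the active list
                c.closed = true; // forced reclaim
                reclaimed.add(c.name);
            }
        }
        return reclaimed;
    }

    int activeCount() {
        return active.size();
    }
}
```

With a 180-second timeout, a scan at t = 200 s reclaims an idle connection borrowed at t = 0 but leaves alone one that is still executing a statement and one borrowed at t = 150 s.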

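Configuration example

# Druid leak detection settings (enable only while troubleshooting).
# Comments sit on their own lines: in a .properties file, text after the value becomes part of the value.
spring.datasource.druid.remove-abandoned=true
# Timeout of 180 seconds (3 minutes)
spring.datasource.druid.remove-abandoned-timeout=180
# Log leaked connections
spring.datasource.druid.log-abandoned=true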

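Reading leak logs and locating the problem

With logAbandoned=true, Druid writes a detailed log entry whenever it detects a leak. The entry contains the stack trace from when the connection was obtained and the owner thread's current stack trace, and it is the key to locating the problem.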

A typical leak log

abandon connection, owner thread: http-nio-8080-exec-1, connected at : 1680000000000, open stackTrace
    at com.alibaba.druid.pool.DruidDataSource.getConnection(DruidDataSource.java:1234)
    at com.example.service.UserService.queryUser(UserService.java:45)  // where the connection was obtained
    at com.example.controller.UserController.getUser(UserController.java:30)
ownerThread current state is RUNNABLE, current stackTrace
    at java.sql.Statement.executeQuery(Statement.java:146)
    at com.example.service.UserService.queryUser(UserService.java:50)  // where the thread is executing now
    at com.example.controller.UserController.getUser(UserController.java:30)

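Reading the log

open stackTrace: the call path at the time the connection was obtained (where getConnection() was called), which points to the code that acquired the connection;
current stackTrace: what the thread that owns the leaked connection is executing now, which helps locate the business logic that has not released it;
owner thread: the name of the thread holding the leaked connection, useful for further analysis in a thread dump.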

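Steps to locate the leak

Use the open stackTrace to find where the connection was obtained (for example, UserService.java:45);
Check whether the connection obtained there is closed in a finally block;
Use the current stackTrace to see whether the thread is stuck on some operation (for example, a SQL statement that has not returned for a long time), which would keep the connection from being released.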
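
When many log entries have to be triaged, the first step can be approximated mechanically. The following is a rough sketch that assumes the log layout shown in the sample above; AbandonLogSketch and firstAppFrame are hypothetical names, and the package prefix is whatever your application uses.

```java
import java.util.Optional;

/** Illustration only: pulls the first application frame out of an abandon log entry. */
final class AbandonLogSketch {
    /**
     * Returns the first frame in the "open stackTrace" section whose class name
     * starts with the given application package prefix, if there is one.
     */
    static Optional<String> firstAppFrame(String logEntry, String appPackage) {
        boolean inOpenTrace = false;
        for (String raw : logEntry.split("\n")) {
            String line = raw.trim();
            if (line.endsWith("open stackTrace")) {
                inOpenTrace = true; // frames of the acquisition path follow
                continue;
            }
            if (line.startsWith("ownerThread current state")) {
                break; // the current stack trace section starts here
            }
            if (inOpenTrace && line.startsWith("at " + appPackage)) {
                return Optional.of(line.substring("at ".length()));
            }
        }
        return Optional.empty();
    }
}
```

Applied to the sample entry with the prefix "com.example.", this would return the UserService.queryUser frame at line 45, which is the place to start checking for a missing close().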

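In practice: fixing connection leaks

Code level: close connections consistently

Always close the connection in finally: make sure the connection is closed whether or not the business logic throws;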

// Good example: close the connection in finally
Connection conn = null;
try {
    conn = dataSource.getConnection();
    // Business logic
} catch (SQLException e) {
    e.printStackTrace();
} finally {
    if (conn != null) {
        try {
            conn.close();  // Always runs, so the connection goes back to the pool
        } catch (SQLException e) {
            e.printStackTrace();
        }
    }
}

Use try-with-resources: Java 7 and later automatically close resources that implement AutoCloseable (such as Connection);

// A more concise form: try-with-resources closes the connection automatically
try (Connection conn = dataSource.getConnection()) {
    // Business logic (no manual close() needed)
} catch (SQLException e) {
    e.printStackTrace();
}
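
To see the guarantee that try-with-resources provides, the sketch below uses a hypothetical TrackedResource in place of a real Connection, so it runs without a database. It is only an illustration of the language feature, under the assumption that a pooled connection behaves like any other AutoCloseable.

```java
/** Illustration only: try-with-resources closes the resource even when the body throws. */
final class TryWithResourcesSketch {
    /** Hypothetical resource that records whether close() was called. */
    static final class TrackedResource implements AutoCloseable {
        boolean closed;

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Runs failing work inside try-with-resources and reports whether close() ran. */
    static boolean closedAfterFailure() {
        TrackedResource res = new TrackedResource();
        try (TrackedResource r = res) {
            throw new IllegalStateException("simulated business failure");
        } catch (IllegalStateException e) {
            // close() has already run by the time this catch block executes
        }
        return res.closed;
    }
}
```

Here closedAfterFailure() returns true: the resource is closed before the exception reaches the catch block, which is why this form is less error-prone than a hand-written finally.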

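Framework level: check the ORM configuration

MyBatis: make sure each SqlSession is closed properly (sqlSession.close()), or use SqlSessionTemplate (managed by Spring);
Hibernate: check that each Session is released, either through session.close() or automatically by the Spring transaction manager;
Spring JDBC: use JdbcTemplate (which manages connections for you) and avoid working with Connection directly.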

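Monitoring level: detect leaks in real time

Druid monitoring page: open /druid/datasource.html and watch ActiveCount (active connections) and RemoveAbandonedCount (connections reclaimed as leaked). If RemoveAbandonedCount keeps growing, there is a leak;
Custom alerts: configure alerts on Druid's JMX metrics (such as removeAbandonedCount) so that developers are notified when the value crosses a threshold.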
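
One way to turn "keeps growing" into an alert condition is to compare successive samples of the counter. The sketch below is an assumption-laden illustration: AbandonedCountAlertSketch is a hypothetical class, the counter is supplied as a LongSupplier (in a real setup it might read the removeAbandonedCount value from JMX), and the threshold is arbitrary.

```java
import java.util.function.LongSupplier;

/** Illustration only: flags a sample when a monotonic counter grew by more than a threshold. */
final class AbandonedCountAlertSketch {
    private final LongSupplier counter; // hypothetical source of the reclaimed-connection count
    private final long threshold;
    private long lastValue;

    AbandonedCountAlertSketch(LongSupplier counter, long threshold) {
        this.counter = counter;
        this.threshold = threshold;
        this.lastValue = counter.getAsLong();
    }

    /** Samples the counter; returns true when growth since the last sample exceeds the threshold. */
    boolean sample() {
        long current = counter.getAsLong();
        long delta = current - lastValue;
        lastValue = current;
        return delta > threshold;
    }
}
```

A scheduler could call sample() once per interval and notify developers when it returns true. Alerting on the delta rather than the absolute value keeps a single old leak from firing forever.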

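Temporary mitigation: adjust configuration

If the leak cannot be fixed at the root right away, the following settings can reduce its impact:

Increase maxActive to raise pool capacity temporarily;
Lower removeAbandonedTimeout (for example, to 60 seconds) so that leaked connections are reclaimed sooner;
Enable logAbandoned to keep collecting leak logs for later investigation.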

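Recommendations for production

Leak detection has some performance cost (the periodic scan and stack trace capture both consume resources), so use it with care in production:

When to enable it

Do not enable by default: keep removeAbandoned off during normal operation to avoid the overhead;
Enable while troubleshooting: when symptoms such as pool exhaustion or timeouts while obtaining a connection appear, turn detection on temporarily to locate the leak;
Gradual rollout: for core services, enable detection on non-critical nodes first to avoid affecting overall performance.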

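Performance tuning

Set a sensible scan interval: keep timeBetweenEvictionRunsMillis at 60 seconds or more to reduce scan frequency;
Tune the timeout: removeAbandonedTimeout should be neither too short (which misidentifies long transactions as leaks) nor too long (which lets leaked connections hold resources for a long time);
Limit log volume: if the logs become too noisy, turn off logAbandoned temporarily and rely on removeAbandonedCount alone to tell whether leaks are occurring.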

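Long-term practices

Code review: focus on code that works with Connection directly and make sure it closes the connection in a finally block;
Unit tests: write unit tests for database operations and use DruidDataSource's getActiveCount() to verify that connections are released;
Load testing: before release, simulate high concurrency with a load-testing tool (such as JMeter) and check that the pool metrics stay stable.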
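
The unit-test idea above comes down to comparing the active count before and after the code under test. The helper below is a minimal sketch with hypothetical names (LeakCheckSketch, releasesAllConnections); the count is passed as an IntSupplier so the example runs without a pool, on the assumption that a real test would supply the data source's getActiveCount() instead.

```java
import java.util.function.IntSupplier;

/** Illustration only: checks that an action returns every connection it borrowed. */
final class LeakCheckSketch {
    /** Runs the action and reports whether the active count returned to its starting value. */
    static boolean releasesAllConnections(IntSupplier activeCount, Runnable action) {
        int before = activeCount.getAsInt();
        action.run();
        return activeCount.getAsInt() == before;
    }
}
```

In a test, an assertion on the returned boolean fails as soon as a code path forgets to close its connection. This only holds when no other thread is borrowing from the same pool during the check.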

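Common problems and solutions

1. Long transactions misidentified as leaks

Symptom: a legitimate long transaction (such as a bulk data import) is reclaimed by removeAbandoned by mistake;
Fix: raise removeAbandonedTimeout to twice the longest transaction duration (for example, if the longest transaction takes 5 minutes, set the timeout to 10 minutes, i.e. 600 seconds).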

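2. Leak logs are not printed

Symptom: logAbandoned=true but nothing is logged;
Checks:
- Confirm that removeAbandoned is enabled;
- Check whether the leaked connection has actually been held past the removeAbandonedTimeout threshold;
- Check the log level (logAbandoned writes at ERROR level, so make sure the logging framework is not filtering out ERROR).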

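3. Performance drops after enabling detection

Symptom: response times get longer and CPU usage rises;
Fix: turn off logAbandoned (stack tracing is the most expensive part) and keep only the removeAbandoned reclaim behavior, then turn detection off once the problem has been located.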

Druid source code for removeAbandoned()

public int removeAbandoned() {
    int removeCount = 0;

    long currrentNanos = System.nanoTime();

    List<DruidPooledConnection> abandonedList = new ArrayList<DruidPooledConnection>();

    activeConnectionLock.lock();
    try {
        Iterator<DruidPooledConnection> iter = activeConnections.keySet().iterator();

        for (; iter.hasNext();) {
            DruidPooledConnection pooledConnection = iter.next();

            if (pooledConnection.isRunning()) {
                continue;
            }

            long timeMillis = (currrentNanos - pooledConnection.getConnectedTimeNano()) / (1000 * 1000);

            if (timeMillis >= removeAbandonedTimeoutMillis) {
                iter.remove();
                pooledConnection.setTraceEnable(false);
                abandonedList.add(pooledConnection);
            }
        }
    } finally {
        activeConnectionLock.unlock();
    }

    if (abandonedList.size() > 0) {
        for (DruidPooledConnection pooledConnection : abandonedList) {
            final ReentrantLock lock = pooledConnection.lock;
            lock.lock();
            try {
                if (pooledConnection.isDisable()) {
                    continue;
                }
            } finally {
                lock.unlock();
            }

            JdbcUtils.close(pooledConnection);
            pooledConnection.abandond();
            removeAbandonedCount++;
            removeCount++;

            if (isLogAbandoned()) {
                StringBuilder buf = new StringBuilder();
                buf.append("abandon connection, owner thread: ");
                buf.append(pooledConnection.getOwnerThread().getName());
                buf.append(", connected at : ");
                buf.append(pooledConnection.getConnectedTimeMillis());
                buf.append(", open stackTrace\n");

                StackTraceElement[] trace = pooledConnection.getConnectStackTrace();
                for (int i = 0; i < trace.length; i++) {
                    buf.append("\tat ");
                    buf.append(trace[i].toString());
                    buf.append("\n");
                }

                buf.append("ownerThread current state is " + pooledConnection.getOwnerThread().getState()
                           + ", current stackTrace\n");
                trace = pooledConnection.getOwnerThread().getStackTrace();
                for (int i = 0; i < trace.length; i++) {
                    buf.append("\tat ");
                    buf.append(trace[i].toString());
                    buf.append("\n");
                }

                LOG.error(buf.toString());
            }
        }
    }

    return removeCount;
}