---
title: Nginx nchan module causes batch SSL certificate renewal failures
createTime: 2026/04/22 08:43:00
tags:
  - nginx
  - ssl
  - certbot
  - nchan
---

## Background

During a routine check of automatic SSL certificate renewal, `certbot renew --dry-run` failed for most domains. Of 13 domains, the first 6 renewed successfully and the remaining 7 all failed with a **504 Gateway Timeout** error.

## Environment

| Component | Version / configuration |
| --------- | ----------------------- |
| Nginx     | 1.24.0                  |
| Certbot   | 2.9.0                   |
| Domains   | 13                      |

## Error message

```text
Certbot failed to authenticate some domains (authenticator: nginx). The Certificate Authority reported these problems:
  Domain: example.a.com
  Type:   unauthorized
  Detail: 12.34.56.78: Invalid response from http://example.a.com/.well-known/acme-challenge/xxx: 504
```

## Initial investigation

### 1. Check Nginx status

The Nginx service reported itself as running, but something was clearly wrong:

```bash
$ pgrep nginx | wc -l
147
```

There were **147 nginx processes**, far more than normal (usually 1 master plus N workers, where N comes from `worker_processes`).

None of the services proxied by nginx were reachable either, which suggested the processes were blocked.

### 2. Check the error log

```bash
$ grep "23:18" /var/log/nginx/error.log
```

This turned up a large number of worker crash records:

```text
2026/04/21 23:18:20 [alert] 149662#149662: worker process 153306 exited on signal 6 (core dumped)
2026/04/21 23:18:20 [alert] 149662#149662: shared memory zone "memstore" was locked by 153306
2026/04/21 23:18:20 [alert] 149662#149662: worker process 153307 exited on signal 6 (core dumped)
2026/04/21 23:18:20 [alert] 149662#149662: shared memory zone "memstore" was locked by 153307
...
```

Count the crashes:

```bash
$ grep -c "exited on signal 6" /var/log/nginx/error.log
2361
```

**2361 worker process crashes** (signal 6 is SIGABRT).

## Analysis

```text
certbot renew → modifies nginx config → nginx reload → nchan module bug → workers crash
→ requests cannot be served → Let's Encrypt waits more than 60 seconds → 504 timeout
```

Certbot processes certificates one after another. For each certificate it:

1. Modifies the nginx configuration to add a temporary validation path
2. Reloads nginx
3. Waits for Let's Encrypt to validate
4. Restores the configuration

By the 7th certificate, the repeated reloads had triggered the nchan bug and worker processes crashed en masse. From then on nginx could not respond to requests, so every remaining validation timed out.

According to [Nginx Ticket #1135](https://trac.nginx.org/nginx/ticket/1135), the nchan module has a known problem around nginx reloads:

> After upgrading from 1.10.1 without ALPN support to 1.10.2 with ALPN support... we've been getting into situations where Nginx completely stops serving connections without any warning.
>
> The nginx error log on the affected hosts gets these odd messages:
>
> ```text
> worker process exited on signal 6 (core dumped)
> shared memory zone "memstore" was locked by xxx
> ```

## Solution

Disable the nchan module:

```bash
# 1. Find the nchan module config
ls -la /etc/nginx/modules-enabled/ | grep nchan
# lrwxrwxrwx 1 root root 49 Apr 20 18:50 50-mod-nchan.conf -> /usr/share/nginx/modules-available/mod-nchan.conf

# 2. Remove the symlink (this disables the module)
sudo rm /etc/nginx/modules-enabled/50-mod-nchan.conf

# 3. Test the configuration
sudo nginx -t
# nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
# nginx: configuration file /etc/nginx/nginx.conf test is successful

# 4. Restart nginx (a full restart, not a reload)
sudo service nginx restart
```

If any site configuration still uses nchan directives (such as `nchan_publisher` or `nchan_subscriber`), step 3 fails with an "unknown directive" error until those locations are removed.

Re-run the certbot renewal test:

```bash
sudo certbot renew --dry-run
```

All 13 certificates now pass the dry-run renewal.

## References

- [Nginx Ticket #1135 - Connections timing out after upgrading to 1.10.2](https://trac.nginx.org/nginx/ticket/1135)
- [nchan documentation](https://nchan.io/)
- [Certbot documentation](https://eff-certbot.readthedocs.io/)
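## Appendix: one-shot check script

The diagnostic steps in this post (counting nginx processes, counting signal-6 crashes and `memstore` lock alerts, and checking whether nchan is enabled) can be bundled into a single shell function. This is a minimal sketch, not part of the original troubleshooting: the function name `check_nchan_crash` is hypothetical, and the default paths assume the Debian/Ubuntu layout shown above (`/var/log/nginx/error.log`, `/etc/nginx/modules-enabled/`).

```bash
#!/usr/bin/env bash
# Sketch only: check_nchan_crash is a hypothetical helper name.
# Default paths assume the Debian/Ubuntu nginx layout used in this post.
check_nchan_crash() {
  local log_file="${1:-/var/log/nginx/error.log}"
  local modules_dir="${2:-/etc/nginx/modules-enabled}"
  local procs crashes locked

  # Total nginx processes (normally 1 master + N workers).
  procs=$(pgrep -c nginx 2>/dev/null || true)
  echo "nginx processes: ${procs:-0}"

  if [ -r "$log_file" ]; then
    # grep -c prints 0 but exits 1 when nothing matches, hence "|| true".
    crashes=$(grep -c "exited on signal 6" "$log_file" || true)
    locked=$(grep -c 'shared memory zone "memstore" was locked' "$log_file" || true)
    echo "worker crashes (signal 6): $crashes"
    echo "memstore lock alerts: $locked"
  else
    echo "cannot read $log_file" >&2
  fi

  # Treat nchan as enabled if any *nchan* entry exists in modules-enabled.
  if compgen -G "$modules_dir/*nchan*" > /dev/null; then
    echo "nchan module: enabled"
  else
    echo "nchan module: not enabled"
  fi
}

# Run with the defaults only when executed directly, not when sourced.
if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  check_nchan_crash "$@"
fi
```

Because `grep -c` counts matches across the whole file, the crash count includes older incidents as well; narrow the log down first (as the `grep "23:18"` step does) if you only care about one event.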