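Background: an application needs to be restarted every day in the early morning. About 99% of these restarts go through without any problem, but on a few days the restart fails: the container ends up showing Exited(137), so it was evidently force-killed, and it is never brought back up. Note: no orchestration tool such as k8s is in use.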
sudo docker-compose restart
Jun 21 00:10:02 kernel: [4951532.380161] traps: dotnet[1187869] general protection fault ip:7fdeb41b8602 sp:7ffd89054080 error:0 in libc-2.31.so[7fdeb41b8000+159000]
Jun 21 00:10:06 promtail[2168718]: level=info ts=2024-06-20T16:10:06.112051714Z caller=filetarget.go:348 msg="failed to tail file" error="file is a directory" filename=/db/websites/servicecloud/20230327_V6_13/public/log/ora_errs.log
Jun 21 00:10:06 promtail[2168718]: level=info ts=2024-06-20T16:10:06.112100314Z caller=filetarget.go:348 msg="failed to tail file" error="file is a directory" filename=/db/websites/servicecloud/20230327_V6_13/public/log/sqlnet.log
Jun 21 00:10:11 dockerd[1719]: time="2024-06-21T00:10:11.566049088+08:00" level=info msg="Container failed to exit within 10s of signal 15 - using the force" container=7937a5b339e25a6cd8ef43703f9a62e81b789d5aa40acf25bf1221d7f0f9f4a4 spanID=49e3f24564d76194 traceID=575aacdbed2a4f0790655e9a5124fee6
Jun 21 00:10:13 dotnet: QueryProvider ElementType 1= Crm.Entity.new_srv_partsapply
Jun 21 00:10:13 dotnet: QueryProvider ElementType 1= Crm.Entity.new_srv_partsapplydetil
Jun 21 00:10:24 promtail[2168718]: level=info ts=2024-06-20T16:10:24.715678816Z caller=filetarget.go:348 msg="failed to tail file" error="file is a directory" filename=/db/websites/servicecloud/20230327_V6_13/public/log/ora_errs.log
Jun 21 00:10:24 promtail[2168718]: level=info ts=2024-06-20T16:10:24.715732316Z caller=filetarget.go:348 msg="failed to tail file" error="file is a directory" filename=/db/websites/servicecloud/20230327_V6_13/public/log/sqlnet.log
Jun 21 00:10:30 promtail[2168718]: level=info ts=2024-06-20T16:10:30.557965189Z caller=filetarget.go:348 msg="failed to tail file" error="file is a directory" filename=/db/websites/servicecloud/20230327_V6_13/public/log/ora_errs.log
Jun 21 00:10:30 promtail[2168718]: level=info ts=2024-06-20T16:10:30.558008889Z caller=filetarget.go:348 msg="failed to tail file" error="file is a directory" filename=/db/websites/servicecloud/20230327_V6_13/public/log/sqlnet.log
Jun 21 00:10:32 dockerd[1719]: time="2024-06-21T00:10:32.706205599+08:00" level=warning msg="Container failed to exit within 10s of kill - trying direct SIGKILL" container=7937a5b339e25a6cd8ef43703f9a62e81b789d5aa40acf25bf1221d7f0f9f4a4 error="context deadline exceeded"
Jun 21 00:10:35 systemd[1]: docker-7937a5b339e25a6cd8ef43703f9a62e81b789d5aa40acf25bf1221d7f0f9f4a4.scope: Succeeded.
Jun 21 00:10:35 systemd[1]: docker-7937a5b339e25a6cd8ef43703f9a62e81b789d5aa40acf25bf1221d7f0f9f4a4.scope: Consumed 25min 1.177s CPU time.
Jun 21 00:10:35 containerd[980]: time="2024-06-21T00:10:35.084327378+08:00" level=info msg="shim disconnected" id=7937a5b339e25a6cd8ef43703f9a62e81b789d5aa40acf25bf1221d7f0f9f4a4
Jun 21 00:10:35 containerd[980]: time="2024-06-21T00:10:35.084401578+08:00" level=warning msg="cleaning up after shim disconnected" id=7937a5b339e25a6cd8ef43703f9a62e81b789d5aa40acf25bf1221d7f0f9f4a4 namespace=moby
Jun 21 00:10:35 containerd[980]: time="2024-06-21T00:10:35.084416978+08:00" level=info msg="cleaning up dead shim"
Jun 21 00:10:35 dockerd[1719]: time="2024-06-21T00:10:35.084337578+08:00" level=info msg="ignoring event" container=7937a5b339e25a6cd8ef43703f9a62e81b789d5aa40acf25bf1221d7f0f9f4a4 module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Jun 21 00:10:35 containerd[980]: time="2024-06-21T00:10:35.095510029+08:00" level=warning msg="cleanup warnings time=\"2024-06-21T00:10:35+08:00\" level=info msg=\"starting signal loop\" namespace=moby pid=1696094 runtime=io.containerd.runc.v2\n"
Jun 21 00:10:35 dockerd[1719]: time="2024-06-21T00:10:35.096187332+08:00" level=warning msg="ShouldRestart failed, container will not be restarted" container=7937a5b339e25a6cd8ef43703f9a62e81b789d5aa40acf25bf1221d7f0f9f4a4 daemonShuttingDown=false error="restart canceled" execDuration=24h0m3.432490054s exitStatus="{137 2024-06-20 16:10:35.065257491 +0000 UTC}" hasBeenManuallyStopped=true restartCount=0
Jun 21 00:10:36 promtail[2168718]: level=info ts=2024-06-20T16:10:36.396917782Z caller=filetarget.go:348 msg="failed to tail file" error="file is a directory" filename=/db/websites/servicecloud/20230327_V6_13/public/log/ora_errs.log
Jun 21 00:10:36 promtail[2168718]: level=info ts=2024-06-20T16:10:36.397003983Z caller=filetarget.go:348 msg="failed to tail file" error="file is a directory" filename=/db/websites/servicecloud/20230327_V6_13/public/log/sqlnet.log
Jun 21 00:10:36 dockerd[1719]: time="2024-06-21T00:10:36.772699401+08:00" level=error msg="error killing container: context deadline exceeded" container=7937a5b339e25a6cd8ef43703f9a62e81b789d5aa40acf25bf1221d7f0f9f4a4 error="tried to kill container, but did not receive an exit event" spanID=49e3f24564d76194 traceID=575aacdbed2a4f0790655e9a5124fee6
Jun 21 00:10:36 dockerd[1719]: time="2024-06-21T00:10:36.866932032+08:00" level=error msg="Handler for POST /v1.45/containers/7937a5b339e25a6cd8ef43703f9a62e81b789d5aa40acf25bf1221d7f0f9f4a4/restart returned error: Cannot restart container 7937a5b339e25a6cd8ef43703f9a62e81b789d5aa40acf25bf1221d7f0f9f4a4: tried to kill container, but did not receive an exit event" spanID=49e3f24564d76194 traceID=575aacdbed2a4f0790655e9a5124fee6
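Judging from the errors, the stop timed out after 10 s. After going through some references, I traced this to the application not exiting gracefully. There is a similar question here on cnblogs: https://q.cnblogs.com/q/113552. I have a few questions:
1. Should the OnShutdown() method be used to shut the application down explicitly?
2. Is calling Environment.Exit(143); inside OnShutdown() to exit directly the right approach?
Has anyone run into a similar problem, or does anyone have other ideas? Thanks~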
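The error log shows that your application did not finish stopping within the allotted time: dockerd sent signal 15 (SIGTERM), the container had not exited 10 s later, and Docker then fell back to a forced kill, which is why the exit status is 137. The likely cause is that the application does not handle the termination signal, or that its shutdown work does not complete in time.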
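The exit status in the log can be reproduced outside Docker. Below is a minimal shell sketch, assuming a POSIX shell that reports a process killed by signal N with status 128 + N. The sleep processes are only stand-ins for the container's main process and are not part of the original setup.

```shell
#!/bin/sh
# Stand-in for a process that is still running when the stop timeout expires.
sleep 30 &
pid=$!
kill -KILL "$pid"                      # the forced kill Docker falls back to
killed_status=0
wait "$pid" || killed_status=$?        # 128 + 9 (SIGKILL)

# Stand-in for a process that dies from the initial SIGTERM without handling it.
sleep 30 &
pid=$!
kill -TERM "$pid"
termed_status=0
wait "$pid" || termed_status=$?        # 128 + 15 (SIGTERM)

echo "SIGKILL -> $killed_status, SIGTERM -> $termed_status"
```

So 137 means the process was ended by SIGKILL, while 143 is the status of a process ended by SIGTERM, which is presumably where the 143 in Environment.Exit(143) comes from.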
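Here are some suggestions and explanations for your questions:
1. Using OnShutdown to shut the application down explicitly: in an ASP.NET Core application you can register a shutdown callback through Host or WebHost (in current ASP.NET Core versions this hook is exposed as IHostApplicationLifetime.ApplicationStopping) so that cleanup runs when the application is stopping, for example releasing resources or saving state. In that handler you can stop the work the application owns, for example by calling the Stop method of your own components or by signalling them to finish. This helps shutdown complete before the stop timeout instead of running into it.
2. Using Environment.Exit(143) to exit: Environment.Exit terminates the application immediately and returns the given exit code. In some situations, for example when the application receives a specific signal, exiting this way may be appropriate. Note, however, that it ends execution abruptly, so some resources may not be released or cleaned up properly.
3. Inside the OnShutdown handler you should avoid calling Environment.Exit. Instead, call the appropriate methods to stop the application normally. This may include shutting down services, ending long-running tasks, and saving state.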
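The principle does not depend on the language, so the following is a minimal shell sketch rather than ASP.NET Core code: a process that traps SIGTERM, does its cleanup, and exits by itself well inside the grace period. The graceful_worker function and its messages are hypothetical stand-ins for the real service.

```shell
#!/bin/sh
# Hypothetical stand-in for the service: it works until SIGTERM arrives,
# then cleans up and exits by itself instead of waiting to be killed.
graceful_worker() {
    trap 'echo "SIGTERM received, cleaning up"; exit 0' TERM
    while :; do
        # Sleep in the background and wait on it, so the shell can run
        # the trap as soon as the signal arrives.
        sleep 1 &
        wait $! || true
    done
}

graceful_worker &
worker_pid=$!

sleep 1                       # give the worker time to install its trap
kill -TERM "$worker_pid"      # the signal Docker sends first on stop/restart
graceful_status=0
wait "$worker_pid" || graceful_status=$?
echo "worker exit status: $graceful_status"
```

Because the worker exits by itself, the parent sees status 0 instead of 137. In the ASP.NET Core case the host's shutdown hooks play the same role, and the cleanup they run has to finish within Docker's stop timeout (10 s in the log above).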