欢迎您访问365答案网,请分享给你的朋友!
生活常识 学习资料

tornado服务某个实例核吃满问题

时间:2023-04-24

tornado底层epoll陷入死循环,cpu100%

该进程所有fd详情信息:其中fd对应信息如下所示
4:tornado的epoll监听树的(epfd)句柄
3:listen监听socket
5:怀疑是gevent启动的epoll监听树的(epfd)句柄            
lrwx------ 1 root root 64 Feb 22 15:42 0 -> /1
lrwx------ 1 root root 64 Feb 22 15:42 1 -> /1
lrwx------ 1 root root 64 Feb 22 15:42 10 -> socket:[3858679084]
lrwx------ 1 root root 64 Feb 22 15:42 11 -> socket:[3858531359]
lrwx------ 1 root root 64 Feb 22 15:42 12 -> socket:[3893062439]
lrwx------ 1 root root 64 Feb 22 15:42 13 -> socket:[4021272049]
lrwx------ 1 root root 64 Feb 22 15:42 14 -> socket:[3736829588]
lrwx------ 1 root root 64 Feb 22 15:42 15 -> socket:[3858821377]
lrwx------ 1 root root 64 Feb 19 13:42 2 -> /1
lrwx------ 1 root root 64 Feb 22 15:42 3 -> socket:[3736762451]
lrwx------ 1 root root 64 Feb 22 15:42 35 -> socket:[3858851998]
lrwx------ 1 root root 64 Feb 22 15:42 36 -> socket:[3858856577]
lrwx------ 1 root root 64 Feb 22 15:42 37 -> socket:[3858548454]
lrwx------ 1 root root 64 Feb 22 15:42 38 -> socket:[3858769612]
lrwx------ 1 root root 64 Feb 22 15:42 39 -> socket:[3858868371]
lrwx------ 1 root root 64 Feb 22 15:42 4 -> anon_inode:[eventpoll]
lrwx------ 1 root root 64 Feb 22 15:42 40 -> socket:[3858870568]
lrwx------ 1 root root 64 Feb 22 15:42 41 -> socket:[3858870569]
lrwx------ 1 root root 64 Feb 22 15:42 5 -> anon_inode:[eventfd]
l-wx------ 1 root root 64 Feb 22 15:42 6 -> /opt/log/api_20220225102901.log
lrwx------ 1 root root 64 Feb 22 15:42 7 -> socket:[3736762452]
lrwx------ 1 root root 64 Feb 22 15:42 8 -> socket:[3736745829]
lrwx------ 1 root root 64 Feb 22


tornado服务,监听fd为:3和7
python   32 root    3u     IPv4         3736762451      0t0        TCP *:http (LISTEN)
python   32 root    7u     IPv6     

堆栈一直在循环打印下面的信息(10,35,36,37,15,39,11,38)
15:24:57.286761 epoll_wait(4, [{EPOLLIN, {u32=10, u64=145376053035018}}, {EPOLLIN, {u32=35, u64=119443040501795}}, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=36, u64=105853763977252}}, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=37, u64=92676804313125}}, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=15, u64=163247411953679}}, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=39, u64=47201690583079}}, {EPOLLIN, {u32=11, u64=156190780686347}}, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=38, u64=66795331387430}}], 64, 140) = 8
15:24:57.286845 epoll_ctl(4, EPOLL_CTL_MOD, 36, {EPOLLIN, {u32=36, u64=105853763977252}}) = 0
15:24:57.286902 epoll_ctl(4, EPOLL_CTL_MOD, 37, {EPOLLIN, {u32=37, u64=92676804313125}}) = 0
15:24:57.286960 epoll_ctl(4, EPOLL_CTL_MOD, 15, {EPOLLIN, {u32=15, u64=163247411953679}}) = 0
15:24:57.287029 epoll_ctl(4, EPOLL_CTL_MOD, 39, {EPOLLIN, {u32=39, u64=47201690583079}}) = 0
15:24:57.287088 epoll_ctl(4, EPOLL_CTL_MOD, 38, {EPOLLIN, {u32=38, u64=66795331387430}}) = 0

疑惑点
如果是陷入tornado epoll的死循环,epoll_wait为啥不能接受新的连接请求(也就是获取到客户端的请求),因为epoll_wait没有看到fd=3的读事件,按理说是可以的。
是fd=3 socket已经重红黑树里面删掉了还是有其他原因。当然也有可能是没看到,如果有的话,应该会多处于CLOSE_WAIT状态的socket


其中fd=10,fd=35,fd=11的都是tcp连接,服务端TCP状态为Close_WAIT(远程客户端调用服务端的请求,信息发送完毕,客户端已经发起并关闭socket,服务端socket接收缓冲区有数据),不知道为啥不会处理?
下面是这两个fd对应的socket信息:netstat -p 查出来的数据
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp     1586      0 zte-rdcloud-testce:http 172.30.0.4:60970        CLOSE_WAIT  32/python
tcp     4088      0 zte-rdcloud-testce:http 172.30.0.4:43076        CLOSE_WAIT  32/python
tcp     4088      0 zte-rdcloud-testce:http 172.30COMMAND PID USER   FD      TYPE     .0.4:43076        CLOSE_WAIT  32/python

fd:10,35,31用lsof查出来的数据:
lsof:
python   32 root   10u     IPv4         3858679084      0t0        TCP zte-rdcloud-testcenter-testlib-41-fwx5f:http->172.30.0.4:60970 (CLOSE_WAIT)
python   32 root   35u     IPv4         3858851998      0t0        TCP zte-rdcloud-testcenter-testlib-41-fwx5f:http->172.30.0.4:43076 (CLOSE_WAIT)
python   32 root   11u     IPv4         3858531359      0t0        TCP zte-rdcloud-testcenter-testlib-41-fwx5f:http->172.30.0.3:40016 (CLOSE_WAIT)


其他fd(15,36,37,38,39):netstat 查不出来,lsof查出来的数据
lsof:
COMMAND PID USER   FD      TYPE             DEVICE SIZE/OFF       NODE NAME
python   32 root   15u     sock                0,7      0t0 3858821377 protocol: TCP
python   32 root   36u     sock                0,7      0t0 3858856577 protocol: TCP
python   32 root   37u     sock                0,7      0t0 3858548454 protocol: TCP
python   32 root   38u     sock                0,7      0t0 3858769612 protocol: TCP
python   32 root   39u     sock                0,7      0t0 3858868371 protocol: TCP

---------------------------------------------------------------------------------------------------------------------------------

新进展:

进程号:41
tornado:死循环调用epoll_wait,又不处理socket里面的数据

strace堆栈信息:循环打印下面的信息,显示一直循环调用。这这个socket都是客户端的连接,并且已经发送完数据
epoll_wait(4, [{EPOLLIN, {u32=24, u64=523904405733400}}, {EPOLLIN, {u32=25, u64=551181743030297}}, {EPOLLIN, {u32=15, u64=528938107404303}}, {EPOLLIN, {u32=37, u64=382690176008229}}, {EPOLLIN, {u32=27, u64=446019468787739}}, {EPOLLIN, {u32=26, u64=499624955609114}}, {EPOLLIN, {u32=38, u64=326211356065830}}], 64, 256) = 7

netstat:查看这几个socket,tcp状态信息(显示接受队列有数据,但是不读取)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        1      0 zte-rdcloud-testce:http 172.30.0.2:50762        CLOSE_WAIT  41/python
tcp     1830      0 zte-rdcloud-testce:http 172.30.0.4:38716        CLOSE_WAIT  41/python
tcp      884      0 zte-rdcloud-testce:http 172.30.0.3:51108        CLOSE_WAIT  41/python
tcp     1011      0 zte-rdcloud-testce:http 172.30.0.3:36216        CLOSE_WAIT  41/python
tcp        1      0 zte-rdcloud-testce:http 172.30.0.2:53818        CLOSE_WAIT  41/python
tcp        1      0 zte-rdcloud-testce:http 172.30.0.4:56326        CLOSE_WAIT  41/python
tcp     6979      0 zte-rdcloud-testce:http 172.30.0.3:50746        CLOSE_WAIT  41/python

Copyright © 2016-2020 www.365daan.com All Rights Reserved. 365答案网 版权所有 备案号:

部分内容来自互联网,版权归原作者所有,如有冒犯请联系我们,我们将在三个工作时内妥善处理。