| 1 | +.TH NETTRACE 8 "20 JULY 2022" Linux "User Manuals" |
| 2 | +.SH NAME |
| 3 | +.PP |
| 4 | +nettrace \- a network packet tracing and network problem diagnosis tool for Linux |
| 5 | +.SH SYNOPSIS |
| 6 | +.PP |
| 7 | +\fB\fCnettrace\fR [options] |
| 8 | +.SH DESCRIPTION |
| 9 | +.PP |
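| 10 | +\fB\fCnettrace\fR is an eBPF\-based network toolkit that combines packet tracing (fault localization), network fault diagnosis, and network anomaly monitoring. |
| 11 | +It aims to provide a more efficient and easier\-to\-use way to solve network problems in complex scenarios. |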
| 12 | +.SH OPTIONS |
| 13 | +.TP |
| 14 | +\fB\fC\-s,\-\-saddr\fR \fIsource_address\fP |
| 15 | +Filter packets by source IP address |
| 16 | +.TP |
| 17 | +\fB\fC\-d,\-\-daddr\fR \fIdest_address\fP |
| 18 | +Filter packets by destination IP address |
| 19 | +.TP |
| 20 | +\fB\fC\-\-addr\fR \fIaddress\fP |
| 21 | +Filter packets by source or destination IP address |
| 22 | +.TP |
| 23 | +\fB\fC\-S,\-\-sport\fR \fIsource_port\fP |
| 24 | +Filter packets by UDP/TCP source port |
| 25 | +.TP |
| 26 | +\fB\fC\-D,\-\-dport\fR \fIdest_port\fP |
| 27 | +Filter packets by UDP/TCP destination port |
| 28 | +.TP |
| 29 | +\fB\fC\-\-port\fR \fIport\fP |
| 30 | +Filter packets by UDP/TCP source or destination port |
| 31 | +.TP |
| 32 | +\fB\fC\-p,\-\-proto\fR \fIprotocol\fP |
| 33 | +Filter packets by protocol (layer 3 or layer 4), for example \fI\-p udp\fP |
| 34 | +.TP |
| 35 | +\fB\fC\-t,\-\-trace\fR \fItraces\fP |
| 36 | +The kernel functions and tracepoints to enable (trace). |
| 37 | +.IP |
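| 38 | +These traced objects (kernel functions, tracepoints, and so on) are referred to here as tracers. |
| 39 | +All tracers are organized as a tree. Run the command |
| 40 | +\fInettrace \-t ?\fP |
| 41 | +to list all available tracers. |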
| 42 | +.IP |
| 43 | +By default, most tracers are enabled, while some device\-related tracers (such as ipvlan and bridge) are |
| 44 | +disabled. Use \fI\-t all\fP to enable all tracers. |
| 45 | +.IP |
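| 46 | +Multiple tracers can be specified at once, separated by \fI,\fP, for example \fInettrace \-t ip,link,kfree_skb\fP. |
| 47 | +Either a tracer directory (a node in the tree) or an individual tracer can be specified. |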
| 48 | +.TP |
| 49 | +\fB\fC\-\-ret\fR |
| 50 | +Show the return values of the traced kernel functions |
| 51 | +.TP |
| 52 | +\fB\fC\-\-detail\fR |
| 53 | +Show detailed trace information, including the current process, network interface, and CPU |
| 54 | +.TP |
| 55 | +\fB\fC\-\-basic\fR |
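| 56 | +Enable the \fB\fCbasic\fR trace mode. By default, the lifecycle trace mode is enabled. In basic mode, the kernel |
| 57 | +functions/tracepoints that a packet passes through are printed directly |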
| 58 | +.TP |
| 59 | +\fB\fC\-\-intel\fR |
| 60 | +Enable diagnostic mode |
| 61 | +.TP |
| 62 | +\fB\fC\-\-intel\-quiet\fR |
| 63 | +Show only packets that have problems; do not show normal packets |
| 64 | +.TP |
| 65 | +\fB\fC\-\-intel\-keep\fR |
| 66 | +Keep tracing. In \fB\fCintel\fR mode, tracing stops by default once an abnormal packet has been traced; with this option, tracing continues. |
| 67 | +.TP |
| 68 | +\fB\fC\-\-hooks\fR |
| 69 | +Print the hook functions registered on netfilter |
| 70 | +.TP |
| 71 | +\fB\fC\-v\fR |
| 72 | +Show the program's startup log messages |
| 73 | +.TP |
| 74 | +\fB\fC\-\-debug\fR |
| 75 | +Show debugging information |
| 76 | +.SH EXAMPLES |
| 77 | +.SS Lifecycle tracing |
| 78 | +.TP |
| 79 | +Trace ping packets whose source address is \fB\fC192.168.1.8\fR: |
| 80 | +\fInettrace \-p icmp \-s 192.168.1.8\fP |
| 81 | +.TP |
| 82 | +Trace the path of ping packets whose source address is \fB\fC192.168.1.8\fR through the IP and ICMP protocol layers: |
| 83 | +\fInettrace \-p icmp \-s 192.168.1.8 \-t ip,icmp\fP |
| 84 | +.TP |
| 85 | +Show detailed information: |
| 86 | +\fInettrace \-p icmp \-s 192.168.1.8 \-\-detail\fP |
| 87 | +.SS Diagnostic mode |
| 88 | +.PP |
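| 89 | +Usage is the same as above; adding the \fB\fC\-\-intel\fR option enables diagnostic mode. The lifecycle mode described above places high |
| 90 | +demands on the user, who must understand how the kernel network stack functions are used and what their return values mean, so it is hard to use. Diagnostic mode |
| 91 | +builds on lifecycle mode and provides richer information, so that people without network development experience can also locate |
| 92 | +and analyze complex network problems. |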
| 93 | +.PP |
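| 94 | +Compared with the normal mode, diagnostic mode provides more reference information, including which iptables tables and |
| 95 | +chains the current packet passed through, whether the packet was NATed, and whether the packet was cloned. Diagnostic mode defines three message levels: |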
| 96 | +.RS |
| 97 | +.IP \(bu 2 |
| 98 | +\fB\fCINFO\fR: normal informational message |
| 99 | +.IP \(bu 2 |
| 100 | +\fB\fCWARN\fR: warning; the packet may have a problem and deserves attention |
| 101 | +.IP \(bu 2 |
| 102 | +\fB\fCERROR\fR: error; something went wrong with the packet (for example, it was dropped). |
| 103 | +.RE |
| 104 | +.PP |
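| 105 | +If the current packet has an \fB\fCERROR\fR, the tool gives some diagnostic and fix advice and terminates the current diagnosis. Adding |
| 106 | +\fB\fC\-\-intel\-keep\fR keeps the tool from exiting when an \fB\fCERROR\fR event occurs, so that tracing and analysis continue. The following log shows what is printed when an anomaly occurs: |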
| 107 | +.PP |
| 108 | +.RS |
| 109 | +.nf |
| 110 | +\&./nettrace \-p icmp \-\-intel \-\-saddr 192.168.122.8 |
| 111 | +begin trace... |
| 112 | +***************** ffff889fb3c64f00 *************** |
| 113 | +[4049.295546] [__netif_receive_skb_core] ICMP: 192.168.122.8 \-> 10.123.119.98 ping request, seq: 0 |
| 114 | +[4049.295566] [nf_hook_slow ] ICMP: 192.168.122.8 \-> 10.123.119.98 ping request, seq: 0 *ipv4 in chain: PRE_ROUTING* |
| 115 | +[4049.295578] [nft_do_chain ] ICMP: 192.168.122.8 \-> 10.123.119.98 ping request, seq: 0 *iptables table:nat, chain:PREROUT* *packet is accepted* |
| 116 | +[4049.295594] [nf_hook_slow ] ICMP: 192.168.122.8 \-> 10.123.119.98 ping request, seq: 0 *bridge in chain: PRE_ROUTING* |
| 117 | +[4049.295612] [__netif_receive_skb_core] ICMP: 192.168.122.8 \-> 10.123.119.98 ping request, seq: 0 |
| 118 | +[4049.295624] [ip_rcv ] ICMP: 192.168.122.8 \-> 10.123.119.98 ping request, seq: 0 |
| 119 | +[4049.295629] [ip_rcv_core ] ICMP: 192.168.122.8 \-> 10.123.119.98 ping request, seq: 0 |
| 120 | +[4049.295640] [nf_hook_slow ] ICMP: 192.168.122.8 \-> 10.123.119.98 ping request, seq: 0 *ipv4 in chain: PRE_ROUTING* |
| 121 | +[4049.295644] [ip_rcv_finish ] ICMP: 192.168.122.8 \-> 10.123.119.98 ping request, seq: 0 |
| 122 | +[4049.295655] [ip_route_input_slow ] ICMP: 192.168.122.8 \-> 10.123.119.98 ping request, seq: 0 |
| 123 | +[4049.295664] [fib_validate_source ] ICMP: 192.168.122.8 \-> 10.123.119.98 ping request, seq: 0 |
| 124 | +[4049.295683] [ip_forward ] ICMP: 192.168.122.8 \-> 10.123.119.98 ping request, seq: 0 |
| 125 | +[4049.295687] [nf_hook_slow ] ICMP: 192.168.122.8 \-> 10.123.119.98 ping request, seq: 0 *ipv4 in chain: FORWARD* *packet is dropped by netfilter (NF_DROP)* |
| 126 | +[4049.295695] [nft_do_chain ] ICMP: 192.168.122.8 \-> 10.123.119.98 ping request, seq: 0 *iptables table:filter, chain:FORWARD* *packet is dropped by iptables/iptables\-nft* |
| 127 | +[4049.295711] [kfree_skb ] ICMP: 192.168.122.8 \-> 10.123.119.98 ping request, seq: 0 *packet is dropped by kernel* |
| 128 | +\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\- ANALYSIS RESULT \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\- |
| 129 | +[1] ERROR happens in nf_hook_slow(netfilter): |
| 130 | + packet is dropped by netfilter (NF_DROP) |
| 131 | + fix advice: |
| 132 | + check your netfilter rule |
| 133 | + |
| 134 | +[2] ERROR happens in nft_do_chain(netfilter): |
| 135 | + packet is dropped by iptables/iptables\-nft |
| 136 | + fix advice: |
| 137 | + check your iptables rule |
| 138 | + |
| 139 | +[3] ERROR happens in kfree_skb(life): |
| 140 | + packet is dropped by kernel |
| 141 | + location: |
| 142 | + nf_hook_slow+0x96 |
| 143 | + drop reason: |
| 144 | + NETFILTER_DROP |
| 145 | + |
| 146 | +analysis finished! |
| 147 | + |
| 148 | +end trace... |
| 149 | +.fi |
| 150 | +.RE |
| 151 | +.PP |
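| 152 | +The log shows that the packet was dropped while passing through the FORWARD chain of the iptables filter table. |
| 153 | +The analysis result lists all abnormal events, and a single packet trace may hit several diagnostic results. The advice here is for |
| 154 | +the user to check whether the iptables rules are at fault. |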
| 155 | +.PP |
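| 156 | +The \fB\fCkfree_skb\fR tracepoint is adapted to the kernel's \fB\fCdrop reason\fR feature (see droptrace for details), |
| 157 | +so the functionality of droptrace is in effect integrated into the diagnostic results here. In this example, the reported drop |
| 158 | +reason is \fB\fCNETFILTER_DROP\fR. The following command can therefore be used to monitor all packet drop events in the kernel together with their drop reasons: |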
| 159 | +.PP |
| 160 | +\fInettrace \-t kfree_skb \-\-intel \-\-intel\-keep\fP |
| 161 | +.SS netfilter support |
| 162 | +.PP |
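| 163 | +Network firewalls are a major source of network faults and connectivity failures, so the \fB\fCnettrace\fR tool provides |
| 164 | +thorough support for \fB\fCnetfilter\fR, including the older \fB\fCiptables\-legacy\fR and the newer \fB\fCiptables\-nft\fR. In diagnostic mode, |
| 165 | +\fB\fCnettrace\fR can trace the \fB\fCiptables\fR tables and \fB\fCiptables\fR chains that a packet passes through, and it gives |
| 166 | +a hint when a packet is dropped because of iptables, as the example above shows. Besides supporting iptables, |
| 167 | +\fB\fCnettrace\fR also supports the netfilter subsystem as a whole and can display the protocol family |
| 168 | +and chain name at each HOOK point. In addition, to handle packet drops caused by third\-party kernel modules registered with netfilter, |
| 169 | +\fB\fCnettrace\fR can print all hook functions on the current \fB\fCHOOK\fR when the \fB\fC\-\-hooks\fR option is added, allowing |
| 170 | +deeper analysis: |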
| 171 | +.PP |
| 172 | +.RS |
| 173 | +.nf |
| 174 | +\&./nettrace \-p icmp \-\-intel \-\-saddr 192.168.122.8 \-\-hooks |
| 175 | +begin trace... |
| 176 | +***************** ffff889faa054500 *************** |
| 177 | +[5810.702473] [__netif_receive_skb_core] ICMP: 192.168.122.8 \-> 10.123.119.98 ping request, seq: 943 |
| 178 | +[5810.702491] [nf_hook_slow ] ICMP: 192.168.122.8 \-> 10.123.119.98 ping request, seq: 943 *ipv4 in chain: PRE_ROUTING* |
| 179 | +[5810.702504] [nft_do_chain ] ICMP: 192.168.122.8 \-> 10.123.119.98 ping request, seq: 943 *iptables table:nat, chain:PREROUT* *packet is accepted* |
| 180 | +[5810.702519] [nf_hook_slow ] ICMP: 192.168.122.8 \-> 10.123.119.98 ping request, seq: 943 *bridge in chain: PRE_ROUTING* |
| 181 | +[5810.702527] [__netif_receive_skb_core] ICMP: 192.168.122.8 \-> 10.123.119.98 ping request, seq: 943 |
| 182 | +[5810.702535] [ip_rcv ] ICMP: 192.168.122.8 \-> 10.123.119.98 ping request, seq: 943 |
| 183 | +[5810.702540] [ip_rcv_core ] ICMP: 192.168.122.8 \-> 10.123.119.98 ping request, seq: 943 |
| 184 | +[5810.702546] [nf_hook_slow ] ICMP: 192.168.122.8 \-> 10.123.119.98 ping request, seq: 943 *ipv4 in chain: PRE_ROUTING* |
| 185 | +[5810.702551] [ip_rcv_finish ] ICMP: 192.168.122.8 \-> 10.123.119.98 ping request, seq: 943 |
| 186 | +[5810.702556] [ip_route_input_slow ] ICMP: 192.168.122.8 \-> 10.123.119.98 ping request, seq: 943 |
| 187 | +[5810.702565] [fib_validate_source ] ICMP: 192.168.122.8 \-> 10.123.119.98 ping request, seq: 943 |
| 188 | +[5810.702579] [ip_forward ] ICMP: 192.168.122.8 \-> 10.123.119.98 ping request, seq: 943 |
| 189 | +[5810.702583] [nf_hook_slow ] ICMP: 192.168.122.8 \-> 10.123.119.98 ping request, seq: 943 *ipv4 in chain: FORWARD* *packet is dropped by netfilter (NF_DROP)* |
| 190 | +[5810.702586] [nft_do_chain ] ICMP: 192.168.122.8 \-> 10.123.119.98 ping request, seq: 943 *iptables table:filter, chain:FORWARD* *packet is dropped by iptables/iptables\-nft* |
| 191 | +[5810.702599] [kfree_skb ] ICMP: 192.168.122.8 \-> 10.123.119.98 ping request, seq: 943 *packet is dropped by kernel* |
| 192 | +\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\- ANALYSIS RESULT \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\- |
| 193 | +[1] ERROR happens in nf_hook_slow(netfilter): |
| 194 | + packet is dropped by netfilter (NF_DROP) |
| 195 | + |
| 196 | + following hook functions are blamed: |
| 197 | + nft_do_chain_ipv4 |
| 198 | + |
| 199 | + fix advice: |
| 200 | + check your netfilter rule |
| 201 | + |
| 202 | +[2] ERROR happens in nft_do_chain(netfilter): |
| 203 | + packet is dropped by iptables/iptables\-nft |
| 204 | + fix advice: |
| 205 | + check your iptables rule |
| 206 | + |
| 207 | +[3] ERROR happens in kfree_skb(life): |
| 208 | + packet is dropped by kernel |
| 209 | + location: |
| 210 | + nf_hook_slow+0x96 |
| 211 | + drop reason: |
| 212 | + NETFILTER_DROP |
| 213 | + |
| 214 | +analysis finished! |
| 215 | + |
| 216 | +end trace... |
| 217 | +.fi |
| 218 | +.RE |
| 219 | +.PP |
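| 220 | +As shown above, \fB\fCfollowing hook functions are blamed\fR lists all the hook functions responsible for the current \fB\fCnetfilter\fR |
| 221 | +drop. Here there is only one hook function, the one belonging to \fB\fCiptables\fR. |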
| 222 | +.SH REQUIREMENTS |
| 223 | +.PP |
| 224 | +The kernel must support the CONFIG_BPF and CONFIG_KPROBE features |
| 225 | +.SH OS |
| 226 | +.PP |
| 227 | +Linux |
| 228 | +.SH AUTHOR |
| 229 | +.PP |
| 230 | +Menglong Dong |
| 231 | +.SH SEE ALSO |
| 232 | +.PP |
| 233 | +.BR nettrace-legacy (8), |
| 234 | +.BR droptrace (8) |