[Deepin-Kernel-SIG] [linux 6.6.y] [Upstream] tcp: remove 64 KByte limit for initial tp->rcv_wnd value#1713
Conversation
mainline inclusion from mainline-v6.10-rc1 categroy: bugfix Recently, we had some servers upgraded to the latest kernel and noticed the indicator from the user side showed worse results than before. It is caused by the limitation of tp->rcv_wnd. In 2018 commit a337531 ("tcp: up initial rmem to 128KB and SYN rwin to around 64KB") limited the initial value of tp->rcv_wnd to 65535, most CDN teams would not benefit from this change because they cannot have a large window to receive a big packet, which will be slowed down especially in long RTT. Small rcv_wnd means slow transfer speed, to some extent. It's the side effect for the latency/time-sensitive users. To avoid future confusion, current change doesn't affect the initial receive window on the wire in a SYN or SYN+ACK packet which are set within 65535 bytes according to RFC 7323 also due to the limit in __tcp_transmit_skb(): th->window = htons(min(tp->rcv_wnd, 65535U)); In one word, __tcp_transmit_skb() already ensures that constraint is respected, no matter how large tp->rcv_wnd is. The change doesn't violate RFC. Let me provide one example if with or without the patch: Before: client --- SYN: rwindow=65535 ---> server client <--- SYN+ACK: rwindow=65535 ---- server client --- ACK: rwindow=65536 ---> server Note: for the last ACK, the calculation is 512 << 7. After: client --- SYN: rwindow=65535 ---> server client <--- SYN+ACK: rwindow=65535 ---- server client --- ACK: rwindow=175232 ---> server Note: I use the following command to make it work: ip route change default via [ip] dev eth0 metric 100 initrwnd 120 For the last ACK, the calculation is 1369 << 7. When we apply such a patch, having a large rcv_wnd if the user tweak this knob can help transfer data more rapidly and save some rtts. Fixes: a337531 ("tcp: up initial rmem to 128KB and SYN rwin to around 64KB") Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Link: https://lore.kernel.org/r/20240521134220.12510-1-kerneljasonxing@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> (cherry picked from commit 378979e) Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
mainline inclusion from mainline-v6.10-rc2 category: bugfix Jason commit made checks against ACK sequence less strict and can be exploited by attackers to establish spoofed flows with less probes. Innocent users might use tcp_rmem[1] == 1,000,000,000, or something more reasonable. An attacker can use a regular TCP connection to learn the server initial tp->rcv_wnd, and use it to optimize the attack. If we make sure that only the announced window (smaller than 65535) is used for ACK validation, we force an attacker to use 65537 packets to complete the 3WHS (assuming server ISN is unknown) Fixes: 378979e ("tcp: remove 64 KByte limit for initial tp->rcv_wnd value") Link: https://datatracker.ietf.org/meeting/119/materials/slides-119-tcpm-ghost-acks-00 Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Link: https://lore.kernel.org/r/20240523130528.60376-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Conflicts: net/ipv4/tcp_ipv4.c net/ipv6/tcp_ipv6.c (cherry picked from commit f4dca95) Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
Reviewer's GuideAdjusts TCP initial receive window handling to remove the 64KB cap while ensuring SYNACK-exposed window and SYN-RECV state validation remain limited to 64KB for hardening, reusing a new helper across IPv4/IPv6 paths. File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- Consider using the existing U16_MAX (or a TCP-specific constant) instead of the hard-coded 65535U in tcp_synack_window() to make the RFC limit clearer and avoid magic numbers.
- tcp_synack_window() is TCP-specific but currently lives in request_sock.h, which is fairly generic; consider moving it to a TCP header (or at least prefixing the comment more explicitly as TCP-only) to avoid leaking protocol-specific helpers into generic request_sock code.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Consider using the existing U16_MAX (or a TCP-specific constant) instead of the hard-coded 65535U in tcp_synack_window() to make the RFC limit clearer and avoid magic numbers.
- tcp_synack_window() is TCP-specific but currently lives in request_sock.h, which is fairly generic; consider moving it to a TCP header (or at least prefixing the comment more explicitly as TCP-only) to avoid leaking protocol-specific helpers into generic request_sock code.
## Individual Comments
### Comment 1
<location path="include/net/request_sock.h" line_range="249-251" />
<code_context>
+ * This means the SEG.WND carried in SYNACK can not exceed 65535.
+ * We use this property to harden TCP stack while in NEW_SYN_RECV state.
+ */
+static inline u32 tcp_synack_window(const struct request_sock *req)
+{
+ return min(req->rsk_rcv_wnd, 65535U);
+}
#endif /* _REQUEST_SOCK_H */
</code_context>
<issue_to_address>
**nitpick:** Use U16_MAX instead of hard-coded 65535U for clarity and consistency
Since this helper is specifically constraining the window to 16 bits, using U16_MAX here makes that intent explicit, avoids a magic number, and stays consistent with other TCP code that uses the same limit.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| static inline u32 tcp_synack_window(const struct request_sock *req) | ||
| { | ||
| return min(req->rsk_rcv_wnd, 65535U); |
There was a problem hiding this comment.
nitpick: Use U16_MAX instead of hard-coded 65535U for clarity and consistency
Since this helper is specifically constraining the window to 16 bits, using U16_MAX here makes that intent explicit, avoids a magic number, and stays consistent with other TCP code that uses the same limit.
There was a problem hiding this comment.
Pull request overview
This PR backports an upstream Linux TCP change to the deepin 6.6.y kernel that removes the 64KB cap on the initial tp->rcv_wnd while still ensuring SYNACK segments advertise a window that fits within the 16-bit SEG.WND field. A new tcp_synack_window() helper centralizes the 64KB clamp and is reused for IPv4/IPv6 ACK paths and the tcp_check_req() window check.
Changes:
- Remove
U16_MAXcap from initial receive window selection intcp_select_initial_window(). - Introduce
tcp_synack_window()helper ininclude/net/request_sock.hthat clampsrsk_rcv_wndto 65535. - Use the helper in IPv4/IPv6
reqsk_send_ackpaths and intcp_check_req()window validation.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| include/net/request_sock.h | Adds the tcp_synack_window() helper enforcing the 64KB cap, with RFC 7323 explanatory comment. |
| net/ipv4/tcp_output.c | Removes the U16_MAX clamp on initial rcv_wnd in tcp_select_initial_window(). |
| net/ipv4/tcp_minisocks.c | Uses tcp_synack_window() in the in-window check of tcp_check_req(). |
| net/ipv4/tcp_ipv4.c | Replaces direct req->rsk_rcv_wnd with tcp_synack_window(req) in tcp_v4_reqsk_send_ack(); drops moved RFC comment. |
| net/ipv6/tcp_ipv6.c | Same substitution for tcp_v6_reqsk_send_ack(); drops moved RFC comment. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Summary by Sourcery
Relax TCP initial receive window limits while constraining SYNACK window advertisements for security and RFC compliance.
Bug Fixes:
Enhancements: