Tuesday, May 27, 2008

scp stalled through firewall, ssh no problem

  • someone says:
I have seen similar effects by several reasons:
- disallow icmp and mtu mismatch between networks (e.g. different
networks with then ethernet).
- Split routes
- firewall timeouts
- auto negotiation
========
Stalled "scp" session
Symptom: When "scp" huge files (> 4GB) between hosts, it stalls forever at random instants. It even happens with ftp/rsync. Two reasons may attribute to this problem:
1. Since scp greedyly grabs as much bandwidth of the network as possible when it transfers files, any delay caused by the network switch or the SuSE firewall can easily make the TCP connection stalled.
For this reason, the solution is to limit the bandwidth quota for scp as below:
username@localhost> scp -l 2000 SOURCE DESTINATION # The option "-l 2000" limits the bandwidth up to 2000 Kbit/s which is safe and fast enough.
2. It is due to the Linux SACK implementation problem for
both 2.4 and 2.6 when the TCP window is > 20 MB. Linux
takes such long time to locate the SACKed packet that
a TCP timeout is easily reached and CWND goes back to
the first packet when there are too many packets in flight
and a SACK event is invoked.
Please refer to the following links for information about
SACK:
http://www.ietf.org/rfc/rfc2018.txt
http://www.ietf.org/rfc/rfc1072.txt
It might be working to restrict the TCP buffer size to about 12 MB. However,
the total throughput is limited. The better solution may be:
username@localhost> su # Enter the root password
append "net.ipv4.tcp_sack=0" to /etc/sysctl.conf
username@localhost> sysctl -p
Or
username@localhost> su # Enter the root password
username@localhost> cat 0 > /proc/sys/net/ipv4/tcp_sack
Or
username@localhost> su # Enter the root password
username@localhost> sysctl -w net.ipv4.tcp_sack=0
With this configuration, the SSH transfer of huge-sized file will stall occasionally with every short period of less than 1 second and then recover automatically. That means the simple cumulative acknowledgement scheme of TCP is robust enough.
FYI: There are many other suggestions through the internet as listed below (unfortunately, non of them worked on my machine):
1. Eliminating all the DROP rules for port 22 inside the iptables.
2. Turning off SuSEfirewall2.
3. Limiting the bandwidth by:
username@localhost> scp -l 2000
4. Changing the MTU of NIC by:
username@localhost> ifconfig eth0 mtu xxx
5. Increasing the queue for transmission by
username@localhost> ifconfig eth0 txqueuelen 2000
6. Tuning TCP performance by
net.core.rmem_max=16777216
net.core.wmem_max=16777216
net.core.netdev_max_backlog=2500
net.ipv4.tcp_rmem=4096 87380 16777216
net.ipv4.tcp_wmem=4096 65536 16777216
net.ipv4.tcp_no_metrics_save=1
net.ipv4.tcp_timestamps=0
7. Turning off the buggy TCP segmentation offload by
username@localhost> ethtool -K eth0 tso off
8. Compressing the files being transfered by
username@localhost> scp -C
9. Using pipe and std io to avoid possible "scp" huge file
limitation by
username@localhost> cat localfile | ssh ravana cat ">" remotefile
Or
username@localhost> tar cf - . | ssh ravana tar xvpf -
10. Clamping MSS by
username@localhost> iptables -I FORWARD 1 -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu"

====for my company, I used the following over 2M lease line link

scp -l 1500 VMware-server-1.0.5-80187.i386.rpm 1.2.3.4:/tmp

4 comments:

boes said...

This problem can be solved by adding a line to the iptables configuration:

-A RH-Firewall-1-INPUT -p tcp -m state --state INVALID -m tcp --sport 22 -j ACCEPT

See https://bugzilla.redhat.com/show_bug.cgi?id=161898 for details.

Andrew LeTourneau www.CenterOrbit.com said...

I was experiencing this stall problem very often on my work server. I tried every solution you had to offer with no luck. I did find out the problem though, my server had a few virtuals which essential sucked up all system RAM, so as soon as a transfer started it quickly choked the transfer and caused stalls. After limiting the virtuals RAM amount so the host OS didnt need to page so actively I had no more stall problems and transfer rates doubled. I figured I would post a comment so if anyone had my problem, they would find this fix! By the way thank you so much for this post, it's posts like these that really help troubleshoot big problems, you have helped many. Thank you

aalia lyon said...

That’s a nice blog , this blog helps you, please check this url and solve your windows 7 related problem.
windows firewall error 1068 windows 7
Thanks
Aalia lyon

Chan PK said...

This is the simplest, clearest explanation of this topic I have found. Thanks!
linux scp