ADSL MTU mystery - partial (complete?) solution
- To: linux-il <linux-il(at-nospam)linux.org.il>, support(at-nospam)bezeqint.net
- Subject: ADSL MTU mystery - partial (complete?) solution
- From: Shachar Shemesh <linuxil(at-nospam)consumer.org.il>
- Date: Fri, 14 Sep 2001 01:30:18 +0300
- Delivered-To: linux.org.il-linux-il@linux.org.il
- Sender: linux-il-bounce(at-nospam)cs.huji.ac.il
- User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.3+) Gecko/20010823
I am CCing this email to the bezeq intl. support email. The specific
problem was solved for the person who was involved, but perhaps they
will choose to fix the overall problem.
I had a call from a good friend, who is connecting to the Internet
using a Linux box doing IPChains masquerading, and a dial-up (56K). No
ADSL, or any other fancy stuff.
It turns out that, starting two days ago, whenever a larger-than-trivial
mail message arrives for him, Outlook Express hangs when trying to D/L
it using POP3. Yes, they are hosting a W2K machine behind the Linux gateway.
I SSHed into his box, and spent the next hour and a half or so trying to
figure out what was going on. tcpdump proved to be just useful enough to
be able to capture the packets into a file, which I then scped to my
machine, and analyzed with ethereal.
It turns out that the POP3 server (mail.bezeqint.net) was sending
packets with the "Don't Fragment" bit set. So far - no great surprise
(PMTU discovery). It also turned out that packets amounting to 1500
bytes were dropped. I know this because I did receive the packet
following the dropped 1500-byte packet, and I could calculate its size
from the sequence number delta.
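The sequence number arithmetic can be sketched roughly as follows. This
is only an illustration: the function names and the sample numbers are
made up, and it assumes exactly one segment was lost and plain 20-byte
IPv4 and TCP headers with no options.

```python
SEQ_SPACE = 2 ** 32  # TCP sequence numbers wrap around at 32 bits


def missing_payload_bytes(last_seq: int, last_len: int, next_seq: int) -> int:
    """Payload bytes missing between two segments seen in a capture.

    last_seq / last_len describe the last segment received before the gap,
    next_seq is the sequence number of the first segment seen after it.
    """
    expected_next = (last_seq + last_len) % SEQ_SPACE
    return (next_seq - expected_next) % SEQ_SPACE


def ip_packet_size(payload_len: int, ip_header: int = 20, tcp_header: int = 20) -> int:
    """Total IP packet size for a given TCP payload (assumes no options)."""
    return payload_len + ip_header + tcp_header


# Hypothetical capture: a 500-byte segment at seq 1000, then the next
# segment that actually arrived starts at seq 2960.
gap = missing_payload_bytes(1000, 500, 2960)
print(gap, ip_packet_size(gap))
```

If the gap is larger than one full-size segment, more than one packet
was lost and the delta only gives their combined payload size.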
Packets as big as 1440 bytes seemed to go through with no problem.
My conclusion was that the mail server was configured to use a local MTU
of 1500 bytes. One of the routers along the path between the POP server
and my friend's IP had a lower MTU (can anyone guess why?), and since
the packets were marked "DF", it dropped them. A router (possibly the
same one) was configured to block (probably) all ICMPs, and so the
"Fragmentation Needed" ICMP never reached the POP server, which simply
tried again and again to retransmit.
Had I had the desire, I could have found out exactly which router it was
that had the lower MTU, and which one blocked the ICMPs. I don't think
it's my job, however. I have notified Bezeq Intl. by phone of the problem
(more details later), as well as CCing them on this email, so they
should be able to solve the problem on their own.
Two points of interest are:
1. How come no "legitimate", i.e. - Windows only - user came across this
problem?
2. How to solve the problem for my friend?
2 seemed easier. We have all heard of this problem before, and know that
you need to set the MTU lower. Doing this on the Linux gateway indeed
allowed us to telnet to port 110 from the gateway, and D/L the message.
It did not help the Windows machine, however. Why? And why would
lowering the MTU solve the problem at all, when the problem was with
packets coming from the server? It would appear that setting the MTU on
our machine shouldn't affect the problem for incoming packets.
The answer lies in a field called "MSS", or "Maximum Segment Size". This
is announced by each host during connection establishment. Each side
tells the other one not to send segments bigger than X. The value each
host chooses for its advertised MSS is derived from the MTU of the same
interface (for IPv4, the MTU minus 40 bytes of IP and TCP headers)!!
Lowering the MTU on the Linux machine caused it to send out a lower MSS,
which meant that the packets never required fragmentation, and the
problem was bypassed.
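A small sketch of that relationship, assuming IPv4 with 20-byte IP and
TCP headers and no options. The function names and the 1400-byte MTU
are hypothetical, chosen only to illustrate the effect.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options on data segments


def advertised_mss(mtu: int) -> int:
    """MSS a host typically advertises for an interface with this MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER


def largest_incoming_packet(client_mtu: int, server_mtu: int) -> int:
    """Largest IP packet the server will send to the client.

    The server is limited both by the MSS the client advertised and by
    its own interface MTU.
    """
    mss = min(advertised_mss(client_mtu), advertised_mss(server_mtu))
    return mss + IPV4_HEADER + TCP_HEADER


# Client left at 1500: the server sends full 1500-byte packets.
print(largest_incoming_packet(1500, 1500))
# Client lowered to a hypothetical 1400: the server never exceeds 1400.
print(largest_incoming_packet(1400, 1500))
```

The point is that the client's MTU setting controls the size of packets
travelling towards it, but only for connections the client itself
opens after the change.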
This also explains why lowering the MTU on the Linux box's interface
didn't help clients behind it. Lowering the MTU on the Linux side indeed
caused hosts behind it to lower the chunk size of outgoing packets, but
did not affect the MSS those hosts advertised at connection startup, and
therefore did not work around the configuration bug in the ISP's routers.
As for question number 1 - why don't Windows machines suffer from the
same problem? The answer is that the default MTU for a PPP connection on
Windows is 512 bytes.
I called Bezeq International at 20 past midnight. I waited approx. 10
minutes on hold. At the end I was answered by a nice girl. To save me a
lot of explaining I started by asking her whether she knew what "Don't
Fragment", "Fragments" and "MSS" meant. She started guessing about the
fragmentation needed thing, and I told her that I wanted to report a
router misconfiguration, and she would save me time if she transferred
this to someone who could actually fix it.
About 5 minutes later a technician called "Assi" called me back. It took
him a little while of totally not following me, and then he said "ok,
we'll take care of it". Whether he actually understood me, or just got
tired of me is left as an exercise for the readers.
Summary:
A. The connectivity problems people have been experiencing are a result
of routers dropping ICMPs, and (possibly other) routers needing smaller
MTUs. This is a configuration problem at the ISP, and is not a result of
misconfiguration on your end (ADSL, NAT, or otherwise). Some ISPs (and
Bezeq Intl does, sadly, fall under that category) won't recognize this.
B. Luckily, you can work around this problem by lowering the MTU on ALL
MACHINES that participate in the communication. This works not because
the MTU is too high, but because the MSS is taken from the MTU. It will
therefore not help to lower the MTU only on the gateway, or to change
the MTU after the connection is already established.
C. I tried to contact Mulix on IRC to find out how to lower the MTU on
Windows 2000. While he didn't know the answer to that one, he was able
to tell me that 2.4.4 and higher is capable of rewriting the MSS (I
think that's what it means) on packets going through. There is also a
module for 2.2 that does the same (STFW for clamp_mss).
D. There is a (very incomplete) KB article at Microsoft's that explains
how to change Windows' MTU. The very short summary is: change the
registry at (for NT or 2000)
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\Interfaces
Find the right interface (by the IP address), and add a DWORD type value
called "MTU". Set it to the right MTU for that interface, and reboot
the machine (this is Windows, remember?). As usual - apply at your own
risk, I will not be liable blah blah blah.
E. ISP support can be downright MEAN when they want to. Bezeq Intl.
asked my friend's mother for her WINDOWS REGISTRATION KEY!!!! What
possible relevance it can have - no one knows.
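To make the MSS rewriting mentioned in point C a bit more concrete, here
is a rough sketch of what such a rewrite does to the TCP options of a
SYN packet passing through the gateway. This is not the kernel's code:
the function name and sample bytes are made up for illustration, and a
real implementation must also fix up the TCP checksum, which is left
out here.

```python
import struct

TCPOPT_EOL = 0  # end of option list
TCPOPT_NOP = 1  # single-byte padding
TCPOPT_MSS = 2  # kind=2, length=4, followed by a 16-bit MSS value


def clamp_mss(options: bytes, max_mss: int) -> bytes:
    """Return a copy of a TCP options block with the MSS lowered to max_mss.

    Options other than MSS are left untouched. Malformed or truncated
    option blocks are returned as they were from that point on.
    """
    out = bytearray(options)
    i = 0
    while i < len(out):
        kind = out[i]
        if kind == TCPOPT_EOL:
            break
        if kind == TCPOPT_NOP:
            i += 1
            continue
        if i + 1 >= len(out):
            break  # truncated option header
        length = out[i + 1]
        if length < 2 or i + length > len(out):
            break  # malformed length field
        if kind == TCPOPT_MSS and length == 4:
            (mss,) = struct.unpack_from("!H", out, i + 2)
            if mss > max_mss:
                struct.pack_into("!H", out, i + 2, max_mss)
        i += length
    return bytes(out)


# Hypothetical SYN options: MSS 1460, NOP, NOP, SACK-permitted.
syn_options = bytes([2, 4, 0x05, 0xB4, 1, 1, 4, 2])
print(clamp_mss(syn_options, 1400).hex())
```

Because the gateway rewrites the SYN on behalf of the machines behind
it, those machines do not need their own MTU changed.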
I hope this helps. It certainly shed some light on the mystery for me. I
believe the offending router drops all ICMP packets, of any type.
traceroute out of the machine miraculously ceases to return replies two
hops away from the machine.
The problem appeared all of a sudden, about two days ago. My first
suggestion was that traffic to the POP server had been directed through
routers residing in downtown New York, in the World Trade Center
towers. And yet, no one at Bezeq Intl. was in a position to know
anything about it.
Shachar