
Re: ADSL MTU mystery - partial (complete?) solution



Shachar,
The problem you described has given us a lot of trouble ever since the ADSL
service started. A few weeks ago I was able to study it thoroughly, and I
published a detailed document describing the problem, its cause and the
client-side configuration that works around it.
See:
http://damyen.technion.ac.il/~dani/adsl-mtu.txt

It is a pity that Israeli ISPs concentrate on marketing and not on
technical skills. The "black hole" problem caused by broken path MTU
discovery is well known and documented in a dedicated RFC, so there is no
excuse for any ISP that causes it.

Dani

On Fri, 14 Sep 2001, Shachar Shemesh wrote:

> I am CCing this email to the bezeq intl. support email. The specific
> problem was solved for the person who was involved, but perhaps they
> will choose to fix the overall problem.
>
> I have had a call from a good friend, who is connecting to the Internet
> using a Linux box doing IPChains masquerading, and a dial-up (56K). No
> ADSL, or any other fancy stuff.
>
> It turns out that, starting two days ago, whenever a larger-than-trivial
> mail message arrives for him, Outlook Express hangs when trying to
> download it using POP3. Yes, they are hosting a W2K machine behind the
> Linux gateway.
>
> I SSHed into his box, and spent the next hour and a half or so trying to
> figure out what was going on. tcpdump proved to be just useful enough to
> capture the packets into a file, which I then copied to my machine with
> scp and analyzed with Ethereal.
>
> It turns out that the POP3 server (mail.bezeqint.net) was sending
> packets with the "Don't Fragment" bit set. So far - no great surprise
> (PMTU discovery). It also turned out that packets of 1500 bytes were
> dropped. I know this because I did receive the packet following the
> dropped 1500-byte packet, and I could calculate its size from the
> sequence number delta.
>
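
The sequence-number arithmetic mentioned above can be sketched as follows.
This is only an illustration: the sequence numbers are made up, not taken
from the actual capture, and the 40 bytes of headers assume IPv4 and TCP
with no options. Note that a gap could also span more than one lost segment.

```python
# Assumption: 20-byte IPv4 header + 20-byte TCP header, no options.
TCP_IP_HEADERS = 40

def missing_payload(prev_seq, prev_len, next_seq, modulo=2**32):
    """Bytes of TCP payload missing between the end of the last segment
    seen and the start of the next segment that did arrive.
    TCP sequence numbers wrap around at 2**32."""
    expected = (prev_seq + prev_len) % modulo
    return (next_seq - expected) % modulo

# Hypothetical values: last segment seen started at 1000 and carried
# 500 bytes; the next segment that arrived started at 2960.
gap = missing_payload(prev_seq=1000, prev_len=500, next_seq=2960)
print(gap)                   # 1460 bytes of payload never arrived
print(gap + TCP_IP_HEADERS)  # 1500 bytes on the wire, headers included
```
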
> Packets as big as 1440 bytes seemed to go through with no problem.
>
> My conclusion was that the mail server was configured to use a local MTU
> of 1500 bytes. One of the routers along the path between the POP server
> and my friend's IP had a lower MTU (if anyone can guess why?), and since
> the packets were marked "DF", dropped them. A router (possibly the same
> one) was configured to block (probably) all ICMPs, and so the
> "Fragmentation Needed" ICMP never reached the POP server, which simply
> tried again and again to retransmit.
>
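
A toy model of the failure described above, for readers who have not met it
before. It is a sketch under stated assumptions, not a model of any real
router: the function name, the retry limit and the 1492-byte hop are all
hypothetical.

```python
def deliver(size, path_mtus, icmp_blocked, max_tries=5):
    """Send a DF-marked packet of `size` bytes across hops with the given
    MTUs. Returns (delivered, final_size, tries)."""
    for attempt in range(1, max_tries + 1):
        # First hop whose MTU is too small for the packet, if any.
        bottleneck = next((m for m in path_mtus if m < size), None)
        if bottleneck is None:
            return True, size, attempt
        if not icmp_blocked:
            # "Fragmentation Needed" reaches the sender, which shrinks
            # its packets to the reported MTU and tries again.
            size = bottleneck
        # Otherwise the sender hears nothing and retransmits unchanged.
    return False, size, max_tries

path = [1500, 1492, 1500]  # hypothetical hop MTUs
print(deliver(1500, path, icmp_blocked=False))  # (True, 1492, 2)
print(deliver(1500, path, icmp_blocked=True))   # (False, 1500, 5)
```

With ICMP filtered, the sender never learns that it must shrink its packets,
which is the "black hole".
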
> Had I had the desire, I could have found out exactly which router it was
> that had the lower MTU, and which blocked the ICMPs. I don't think it's
> my job, however. I have notified bezeq intl. by phone of the problem
> (more details later), as well as CCing them on this email, so they
> should be able to solve the problem on their own.
>
> Two points of interest are:
> 1. How come no "legitimate", i.e. - Windows only - user came across this
> problem?
> 2. How to solve the problem for my friend.
>
> 2 seemed easier. We have all heard of this problem before, and know that
> you need to set the MTU lower. Doing this on the Linux gateway indeed
> allowed us to telnet to port 110 and download the message. It did not
> help the Windows machine, however. Why? And why would lowering the MTU
> solve the problem at all, given that the problem was with packets coming
> from the server? It appears that setting the MTU on our machine shouldn't
> affect incoming packets.
>
> The answer lies in a field called "MSS", or "Maximum Segment Size". Each
> host announces it during connection establishment, telling the other
> side not to send segments bigger than X. The value each host advertises
> as its MSS is derived from the MTU of the same interface (the MTU minus
> the IP and TCP headers, normally 40 bytes)! Lowering the MTU on the Linux
> machine caused it to send out a lower MSS, which meant that the packets
> never required fragmentation, and the problem was bypassed.
>
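
To make the MTU/MSS relation concrete, here is a minimal sketch. It assumes
IPv4 and TCP headers of 20 bytes each with no options; the function names
are mine, for illustration only.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options

def advertised_mss(interface_mtu):
    """MSS a host announces in its SYN for a given interface MTU."""
    return interface_mtu - IPV4_HEADER - TCP_HEADER

def largest_packet_from_peer(interface_mtu):
    """Largest IP packet the peer will send once it honours our MSS."""
    return advertised_mss(interface_mtu) + IPV4_HEADER + TCP_HEADER

print(advertised_mss(1500))            # 1460
print(advertised_mss(1400))            # 1360
print(largest_packet_from_peer(1400))  # 1400
```

So a host whose MTU is set to 1400 asks the server for segments that fit in
1400-byte packets, regardless of what the server's own MTU is.
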
> This also explains why lowering the MTU on the Linux box's interface
> didn't help clients behind it. It indeed caused hosts behind it to lower
> the size of outgoing packets, but did not affect the MSS they announced
> at connection setup, and therefore did not work around the configuration
> bug in the ISP's routers.
>
> As for question number 1 - why don't Windows machines suffer from the
> same problem? The answer is that the default MTU for a PPP connection on
> Windows is 512 bytes.
>
> I called Bezeq International at 20 past midnight. I waited approx. 10
> minutes on hold. At the end I was answered by a nice girl. To save me a
> lot of explaining I started by asking her whether she knew what "Don't
> Fragment", "Fragments" and "MSS" meant. She started guessing about the
> fragmentation needed thing, and I told her that I wanted to report a
> router misconfiguration, and she would save me time if she transferred
> this to someone who could actually fix it.
>
> About 5 minutes later a technician called "Assi" called me back. It took
> him a little while of totally not following me, and then he said "ok,
> we'll take care of it". Whether he actually understood me, or just got
> tired of me is left as an exercise for the readers.
>
> Summary:
> A. The connectivity problems people have been experiencing are a result
> of routers dropping ICMPs, and (possibly other) routers needing smaller
> MTUs. This is a configuration problem at the ISP, and is not a result of
> misconfiguration on your end (ADSL, NAT, or otherwise). Some ISPs (and
> Bezeq Intl does, sadly, fall under that category) won't recognize this.
> B. Luckily, you can work around this problem by lowering the MTU on ALL
> MACHINES that participate in the communication. This works not because
> the MTU is too high, but because the MSS is taken from the MTU. It will
> therefore not help to lower the MTU only on the gateway, or to change
> the MTU after the connection is already established.
> C. I tried to contact Mulix on IRC to find out how to lower the MTU on
> Windows 2000. While he didn't know the answer to that one, he was able
> to tell me that kernel 2.4.4 and higher is capable of rewriting the MSS
> (I think that's what it means) on packets going through. There is also
> a module for 2.2 that does the same (STFW for clamp_mss).
> D. There is a (very incomplete) KB article at Microsoft's site that
> explains how to change Windows' MTU. The very short summary is: change
> the registry (for NT or 2000) at
> HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\Interfaces
> Find the right interface (by its IP address), and add a DWORD value
> called "MTU". Set it to the right MTU for that interface, and reboot
> the machine (this is Windows, remember?). As usual - apply at your own
> risk, I will not be liable blah blah blah.
> E. ISP support can be downright MEAN when they want to. Bezeq Intl
> asked my friend's mother for her WINDOWS REGISTRATION KEY!!!! What
> possible relevance it can have - no one knows.
>
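
Regarding item C: what "rewriting the MSS on packets going through" amounts
to can be sketched like this. It is an illustration only, not the kernel's
code: the function name is hypothetical, it works on the raw TCP option
bytes of a SYN, and a real gateway would also have to fix the TCP checksum
afterwards.

```python
import struct

def clamp_mss(options: bytes, max_mss: int) -> bytes:
    """Return TCP option bytes with the MSS option (kind 2) lowered to
    max_mss when it advertises more. Other options are left untouched."""
    out = bytearray(options)
    i = 0
    while i < len(out):
        kind = out[i]
        if kind == 0:            # End of Option List
            break
        if kind == 1:            # NOP is a single byte with no length
            i += 1
            continue
        if i + 1 >= len(out):
            break                # truncated option, stop parsing
        length = out[i + 1]
        if length < 2 or i + length > len(out):
            break                # malformed option, stop parsing
        if kind == 2 and length == 4:
            (mss,) = struct.unpack_from("!H", out, i + 2)
            if mss > max_mss:
                struct.pack_into("!H", out, i + 2, max_mss)
        i += length
    return bytes(out)

# Hypothetical SYN options: MSS 1460, NOP, NOP, SACK-permitted.
syn_options = bytes([2, 4, 0x05, 0xB4, 1, 1, 4, 2])
print(clamp_mss(syn_options, 1400).hex())  # 020405780101
```

Doing this on the gateway means the hosts behind it can keep their own MTU
settings.
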
> I hope this helps. It certainly shed some light on the mystery for me. I
> believe the offending router drops all ICMP packets, of any type.
> traceroute out of the machine miraculously ceases to return replies two
> hops away from the machine.
>
> The problem appeared all of a sudden, about two days ago. My first
> guess was that traffic to the POP server had been directed through
> routers residing in downtown New York, in the World Trade Center
> towers. And yet, no one at Bezeq Intl. was in a position to know
> anything about it.
>
>             Shachar
>
>
>
> =================================================================
> To unsubscribe, send mail to linux-il-request@linux.org.il with
> the word "unsubscribe" in the message body, e.g., run the command
> echo unsubscribe | mail linux-il-request@linux.org.il
>

