Re: TCP-UDP demon

From: Miguel Arroz (arroz@guiamac.com)
Date: Wed 01 Sep 2004 - 23:03:07 GMT


Hi!

   I brought up this TCP subject on the mailing list some weeks ago, and I
will try to explain the problem as best I can (English is not my
native language).

   I am developing a client base for RoboCup Rescue in the LISP
language. The main reason is that LISP is a very, very powerful
language, with features you cannot find in other popular languages,
and it's really easy and fast to develop agents with very interesting
behaviour in this language. I'm sure many of you do not agree with me,
or agree but still prefer other languages for some reason.
That's OK, and I think that having more choice of base language is
a good thing. Someone asked about a Python base, and I think that's
great.

   Because of this, I had to carefully analyze the protocol
specification and its implementation in Java (because, unfortunately,
the documentation is not up to date, and some modifications were only
reflected in the code, not the docs). I found some bad things. For
example, the Java implementation doesn't check a lot of things (it
doesn't even check the magic number, so a "strange" packet may arrive
at the machine, on the same port, and the program may simply crash).
But, worst of all, the protocol does NOT work. More exactly, its
specification is not correct, so **there is no guarantee that it will
work in any given situation**. Obviously, if you have very fast machines
and a network segment just for your own use, the problem might never
show up. But it's there, and, as Murphy well said, it will hit you at
the worst possible time (probably in the middle of RoboCup 2005!).
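   To make the missing check concrete, here is a minimal Python sketch of
a receiver that validates each datagram before trusting it. The header
layout (four big-endian 16-bit fields: magic number, message id,
fragment index, fragment count), the EXPECTED_MAGIC value, and the names
Fragment and parse_fragment are all hypothetical; the real layout and
magic number have to be taken from the kernel source, not from this
example.

```python
import struct
from typing import NamedTuple, Optional

# Hypothetical header layout: four big-endian unsigned 16-bit fields.
HEADER = struct.Struct(">HHHH")
EXPECTED_MAGIC = 0x1234  # placeholder value, NOT the real RoboCup Rescue one


class Fragment(NamedTuple):
    message_id: int
    index: int
    total: int
    payload: bytes


def parse_fragment(datagram: bytes) -> Optional[Fragment]:
    """Return a Fragment, or None if the datagram should be discarded."""
    if len(datagram) < HEADER.size:
        return None  # too short to even hold a header
    magic, message_id, index, total = HEADER.unpack_from(datagram)
    if magic != EXPECTED_MAGIC:
        return None  # a stray packet from some other program on this port
    if total == 0 or index >= total:
        return None  # inconsistent header: never trust it
    return Fragment(message_id, index, total, datagram[HEADER.size:])
```

   Anything that fails a check is dropped instead of being handed to the
parser, so a stray packet costs one discarded datagram rather than a
crash.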

   As I am developing the LISP kernel on my personal notebook (an Apple
PowerBook G3 @ 500 MHz), I cannot say that my machine is fast by
today's standards. Even worse, my server is a Pentium 4 @ 3 GHz, so it
pushes information very fast, and the PowerBook cannot keep up. I
knew that I *could* lose packets, but I implemented most of the LISP
base anyway. And, surprise, surprise, I'm losing packets, right on the
first message the kernel sends to the agent, with all the world
information. It's about 230 packets, and half of them do not arrive. As
you described, they arrive at the network card, but not at the process.
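   Packets that reach the network card but not the process are usually
dropped because the socket's receive buffer is full. One common
mitigation is to ask the OS for a larger buffer, so a burst like the
roughly 230-packet initial message has more room to queue. The Python
sketch below (the helper open_agent_socket and its 1 MiB default are
illustrative assumptions) only makes overflow less likely; it does not
give UDP any delivery guarantee.

```python
import socket
from typing import Tuple


def open_agent_socket(host: str = "0.0.0.0", port: int = 0,
                      rcvbuf: int = 1024 * 1024) -> Tuple[socket.socket, int]:
    """Open a UDP socket with an enlarged receive buffer (hypothetical helper)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Request a bigger kernel queue for incoming datagrams. The OS may cap the
    # request (on Linux, at net.core.rmem_max), and Linux reports double the
    # granted size to cover bookkeeping, so read the effective value back.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf)
    effective = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    sock.bind((host, port))
    return sock, effective
```

   Logging the effective size at startup shows whether the request was
honoured. Even with a large buffer, datagrams are still dropped once it
fills, so a slow receiver remains a problem.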

   The main problem is that this is supposed to happen! The UDP protocol
does not offer you ANY guarantee. Even on the same machine, you simply
DO NOT HAVE THE GUARANTEE that the information will arrive intact. Of
course, in near-optimal conditions, it does arrive most of the time. But
that's pure luck, and I think RoboCup Rescue should not be based on luck.
If anyone wants to implement a protocol on top of UDP, that protocol
MUST be fault-tolerant. That means the protocol MUST be able to recover
from failures that include "strange" packets arriving (unexpected
packets from other computers), packets arriving out of order, repeated
packets, and lost packets. The RoboCup protocol clearly does not handle
some of these cases, so it cannot be run over UDP, because there is no
guarantee that it will work (and as you can see, several people are
having problems).
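   As an illustration of what "fault-tolerant" means on the receiving
side, here is a minimal Python sketch, assuming each datagram carries a
message id, a fragment index and a fragment count (a hypothetical
layout, not the current RoboCup Rescue format). Duplicates are ignored,
fragments may arrive in any order, and missing() reports which pieces
were lost. The class name Reassembler is made up, and actually asking
the sender to retransmit is left out.

```python
from typing import Dict, Optional, Set


class Reassembler:
    """Collects the fragments of multi-packet messages (hypothetical sketch)."""

    def __init__(self) -> None:
        self._parts: Dict[int, Dict[int, bytes]] = {}  # message_id -> {index: payload}
        self._totals: Dict[int, int] = {}              # message_id -> fragment count

    def add(self, message_id: int, index: int, total: int,
            payload: bytes) -> Optional[bytes]:
        """Store one fragment; return the whole message once every piece is present."""
        known_total = self._totals.setdefault(message_id, total)
        if total != known_total or not 0 <= index < total:
            return None  # inconsistent with earlier fragments: drop it
        parts = self._parts.setdefault(message_id, {})
        parts.setdefault(index, payload)  # a repeated fragment is simply ignored
        if len(parts) < total:
            return None  # still waiting; fragments may arrive in any order
        del self._parts[message_id], self._totals[message_id]
        return b"".join(parts[i] for i in range(total))

    def missing(self, message_id: int) -> Set[int]:
        """Indices not yet received, i.e. what a retransmission request would name."""
        total = self._totals.get(message_id, 0)
        return set(range(total)) - set(self._parts.get(message_id, {}))
```

   A complete protocol would also need timeouts and a message from the
receiver telling the sender which indices missing() returned. This
sketch also forgets a message once it is complete, so a late duplicate
of an old fragment would start a new entry that never finishes; a real
implementation would have to expire those.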

   This is why I think that building a UDP-TCP bridge is no solution at
all (in the end, it will make the problem worse, because it's another
process slowing down the computer), and it will be a lot of work for
nothing. If we are determined to find a good solution, we have to
improve the quality of the protocol and build a good working base. We
should NOT make hacks that solve nothing, require work, and may delay
the right solution (because, if people are busy doing the hack, they
cannot be doing the "good thing").
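   If the kernel and simulators are migrated to TCP instead, as proposed
in this thread, the transport takes care of loss, duplication and
ordering, but the application still has to mark where each message
ends, because TCP delivers a byte stream rather than datagrams. A
minimal Python sketch of length-prefixed framing follows; the 4-byte
big-endian length prefix and the function names are assumptions, not
part of any existing RoboCup Rescue code.

```python
import socket
import struct

LENGTH = struct.Struct(">I")  # assumed 4-byte big-endian length prefix


def send_message(sock: socket.socket, payload: bytes) -> None:
    # TCP is a byte stream, so every message is prefixed with its length.
    sock.sendall(LENGTH.pack(len(payload)) + payload)


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    # A single recv() may return fewer bytes than asked for, so loop.
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection mid-message")
        buf += chunk
    return bytes(buf)


def recv_message(sock: socket.socket) -> bytes:
    (length,) = LENGTH.unpack(_recv_exact(sock, LENGTH.size))
    return _recv_exact(sock, length)
```

   With this framing, the 230-packet world update would simply become
one length-prefixed message on a TCP connection, and no fragment
bookkeeping is needed at the application level.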

   I hope I have made my point clear. If there is something that you do
not understand, due to my English or some other reason, feel free to
send me mail.

   Yours

Miguel Arroz

On 1 Sep 2004, at 8:00, Cameron Skinner wrote:

> I'm not sure if the TCP-UDP bridge idea will fix the problem or not,
> largely because no-one seems to know exactly where the problem is. For
> example, I used tcpdump to track all packets going between our
> simulator machine and our agent machine. As far as I could see every
> packet that was sent did arrive at the agent machine's network card,
> but some did not make it to the agent processes. This suggests a
> buffer overflow, or possibly malformed packets. On the other hand,
> when both the simulators and the agents are run on one machine there
> are no lost packets - which suggests a network problem rather than an
> OS buffer problem.
>
> In short, we do not know what the problem is. Perhaps someone
> (preferably the person who suggested the idea) could implement the
> TCP-UDP bridge and let us all know how it performs.
>
> In the meantime, do we have any volunteers to take responsibility for
> migrating the kernel and simulators to TCP? Come to think of it, a good
> first step would be to create a robocup rescue library in C. I've just
> added a module to the CVS repository on sourceforge called librescue -
> feel free to start writing stuff and submit it via the sourceforge web
> page (only the Technical Committee have commit rights on the CVS
> repository).
>
> Cameron Skinner
> TC Chair
>
> --
> Cameron Skinner
> Artificial Intelligence Group
> Department of Computer Science
> The University of Auckland
>
> email: cam@cs.auckland.ac.nz
> phone: +64 9 3737599 x82924
>
>


   Miguel Arroz - arroz@guiamac.com - http://www.guiamac.com


