Re: TCP-UDP demon

From: Mahdi Milani Fard (m.milanifard@ece.ut.ac.ir)
Date: Thu 02 Sep 2004 - 07:27:36 GMT


Hi,

Well, if you have buffer overflows, that means the module is not reading
from the buffer fast enough to get all the data received on the socket.
Even over TCP, the slow module would still be the bottleneck, so the data
would not get through in time. You can increase the UDP buffer, but even
that would not solve the problem (please do change the buffer size and see
if it makes any difference). The only thing that would change with TCP
communication is that you get failure messages. But there is a problem
here. TCP retries sending the packets if the destination computer does not
confirm that they were received. If that fails a few times, it reports a
failure to the sender. But what can you do if you find out that a message
was lost 5 cycles ago? You have the guarantee that if a packet is not
received, you'll get a failure message. But that doesn't mean you can do
anything useful with it. The failure message might come too late.
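
For anyone who wants to try the buffer change, here is a minimal sketch in
Python. The function name and the 1 MB figure are only illustrative
assumptions, not anything taken from the kernel or an existing base. The OS
may cap or adjust the requested size, so the sketch reads back what was
actually granted.

```python
import socket


def request_rcvbuf(sock, size_bytes):
    """Ask the OS for a larger receive buffer on an existing socket.

    Returns the size the OS reports afterwards, which may differ from
    the request (some systems cap it, Linux reports a doubled value).
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size_bytes)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


if __name__ == "__main__":
    # Hypothetical usage: 1 MB is an arbitrary value to experiment with.
    udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    print("receive buffer granted:", request_rcvbuf(udp, 1 << 20))
    udp.close()
```

If packets are still lost with a much larger buffer, the overflow
explanation becomes less likely.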

I believe we should be working to find the "real" problem here. As someone
said, if the kernel and agent modules run on the same machine, no packet
loss occurs, which suggests a network communication error. And there are
those who claim that the problem is a buffer overflow. I believe it's
better to find the problem than to come up with a solution (TCP, I mean)
which might not be of much use. It's easier to trace the problem on a
TCP-based system, but what if TCP does not change anything at all?
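
One way to narrow this down without moving to TCP first might be to log
what actually reaches the agent process. The sketch below assumes each
packet of a message carries a sequence number from 0 to total-1; that is
an assumption for illustration, I have not checked it against the current
header format. Given the numbers in arrival order, it separates lost,
repeated, reordered and unexpected packets.

```python
def classify_arrivals(received, total):
    """Classify the packets of one message by sequence number.

    received: sequence numbers in the order they reached the process.
    total:    how many packets the sender sent (numbered 0..total-1).
    """
    seen = set()
    duplicates = []
    unexpected = []
    out_of_order = 0
    highest = -1
    for seq in received:
        if not 0 <= seq < total:
            unexpected.append(seq)   # not part of this message
            continue
        if seq in seen:
            duplicates.append(seq)   # same packet delivered twice
            continue
        seen.add(seq)
        if seq < highest:
            out_of_order += 1        # arrived after a later packet
        else:
            highest = seq
    missing = [s for s in range(total) if s not in seen]
    return {
        "missing": missing,
        "duplicates": duplicates,
        "out_of_order": out_of_order,
        "unexpected": unexpected,
    }


if __name__ == "__main__":
    print(classify_arrivals([0, 2, 1, 2, 7], 5))
```

Running the same log on one machine and across the network, with small
and large buffers, should show whether the losses follow the network or
the buffer.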

Well, I wanted to start a base in Python. Python has a lot of potential
and seems to fit the needs of a base nicely. YabAPI makes heavy use of
lists, properties, conditions and actions, and all of these can be
implemented in Python with much less labor than it would take in Java or
C++. But as I'm a newbie here, I don't think I can do that by myself. We
are a new team here at UT and it's a bit difficult for us to start a base
from scratch. So if anyone is interested and wants to help, I'll be happy
to start the base with their help.

--Millenarian

> Hi!
>
> I brought this TCP subject to the mailing list some weeks ago, and I
> will try to explain the problem the best I can (English is not my
> native language).
>
> I am developing a client-base for the RoboCup Rescue in the LISP
> language. The main reason is that LISP is a very, very powerful
> language, with features you cannot find in other popular languages,
> and it's really easy and fast to develop agents with very interesting
> behaviour in this language. I'm sure many of you do not agree with me,
> or may agree but still prefer other languages for some reason.
> That's OK, and I think that having more choice of base language is
> a good thing. Someone asked about a Python base, and I think that's
> great.
>
> Because of this, I had to carefully analyze the protocol
> specification, and its implementation in Java (because, unfortunately,
> the documentation is not updated, and some modifications were only
> reflected in the code, not the docs). I found out some bad things. For
> example, the Java implementation doesn't check a lot of stuff (it
> doesn't even check the magic number, so a "strange" packet may arrive
> at the machine, on the same port, and the program may simply crash).
> But, worst of all, the protocol does NOT work. More exactly, its
> specification is not correct, so **there is no guarantee that it will
> work in any situation**. Obviously, if you have very fast machines, and
> a network segment just for your usage, the problem might never show up.
> But it's there, and, as Murphy well said, it will hit you at the worst
> possible time (probably in the middle of RoboCup 2005!).
>
> As I am developing the LISP kernel on my personal notebook (an Apple
> Powerbook G3 @ 500 MHz), I cannot say that my machine is fast by
> today's standards. Even worse, my server is a Pentium 4 @ 3 GHz, so it
> pushes information very fast, and the Powerbook cannot handle it. I
> knew that I *could* lose packets, but anyway, I implemented most of
> the LISP base. And, surprise, surprise, I'm losing packets. Right on
> the first message the kernel sends to the agent, with all the world
> information. It's about 230 packets, and half of them do not arrive. As
> you described, they arrive at the network card, but not at the process.
>
> The main problem is that this is supposed to happen! The UDP protocol
> does not offer you ANY guarantee. Even on the same machine, you simply
> DO NOT HAVE THE GUARANTEE that the information will arrive OK. Of
> course, in near-optimal conditions, that happens most of the time. But
> it's pure luck. And I think RoboCup Rescue should not be based on luck.
> If anyone wants to implement a protocol on top of UDP, that protocol
> MUST be fault-tolerant. That means the protocol MUST be ready to
> recover from failures that may include "strange" packets arriving
> (unexpected packets from other computers), packets not arriving in
> the right order, repeated packets, and lost packets. It's obvious that
> the RoboCup protocol does not support some of these features, so it
> cannot be done over UDP, because there are no guarantees that it will
> work (and as you may see, several people are having problems).
>
> This is why I think that building a UDP-TCP bridge is no solution at
> all (in the end, it will make the problem worse, because it's another
> process slowing down the computer), and it will be a lot of work for
> nothing. If we are determined to build a good solution, we have to
> improve its quality, and build a good working base. We should NOT make
> hacks that solve nothing, require work, and may delay the right
> solution (because, if people are busy doing the hack, they cannot be
> doing the "good thing").
>
> I hope I have made my point clear. If there is something that you do
> not understand, due to my English or some other reason, feel free to
> send mail.
>
> Yours
>
> Miguel Arroz
>
> On 1 Sep 2004, at 8:00, Cameron Skinner wrote:
>
>> I'm not sure if the TCP-UDP bridge idea will fix the problem or not,
>> largely because no-one seems to know exactly where the problem is. For
>> example, I used tcpdump to track all packets going between our
>> simulator machine and our agent machine. As far as I could see every
>> packet that was sent did arrive at the agent machine's network card,
>> but some did not make it to the agent processes. This suggests a
>> buffer overflow, or possibly malformed packets. On the other hand,
>> when both the simulators and the agents are run on one machine there
>> are no lost packets - which suggests a network problem rather than an
>> OS buffer problem.
>>
>> In short, we do not know what the problem is. Perhaps someone
>> (preferably the person who suggested the idea) could implement the
>> TCP-UDP bridge and let us all know how it performs.
>>
>> In the meantime, do we have any volunteers to take responsibility for
>> migrating the kernel and simulators to TCP? Come to think of it, a
>> good first step would be to create a robocup rescue library in C. I've
>> just added a module to the CVS repository on sourceforge called
>> librescue - feel free to start writing stuff and submit it via the
>> sourceforge web page (only the Technical Committee have commit rights
>> on the CVS repository).
>>
>> Cameron Skinner
>> TC Chair
>>
>> --
>> Cameron Skinner
>> Artificial Intelligence Group
>> Department of Computer Science
>> The University of Auckland
>>
>> email: cam@cs.auckland.ac.nz
>> phone: +64 9 3737599 x82924
>>
>>
>
> "I felt like putting a bullet between
> the eyes of every Panda that wouldn't
> scr*w to save its species." -- Fight Club
>
> Miguel Arroz - arroz@guiamac.com - http://www.guiamac.com


