Proposals with Learning and Knowledge

From: Zijian Ren (zijian.ren@gmail.com)
Date: Mon 29 Nov 2004 - 23:09:20 GMT

Previous message: Newton Shoemaker: "Instead of the centurion,"

Hello everyone,

I would like to propose rescue simulation competition procedures for
evaluating teams' learning and knowledge abilities.

A. Competition Setting for Learning
To demonstrate learning performance, there should be two phases:
training and testing. The rescue settings used for training and testing
should be inherently related.

I propose several training and testing scenarios:
1. Training and testing use exactly the same map setting, which is a
popular scenario. Set a fixed number of runs (such as 10).

2. Training and testing use different parts of the same map.
For example, there are a full map A, two half maps B1, B2, and four
quarter maps C1, C2, C3, C4.
a) Training uses A and testing uses B1, C2, etc.
b) Training uses B1, C3 and testing uses A, B2, C1, etc.

3. Training and testing use the same city map with different settings:
number of rescue agents, fire points, etc.

4. Training and testing use the same disaster type with different city maps.
For example, one disaster type may be "few fires and severe collapse"
and another may be "intense fires and slight collapse."
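
Scenario 2 could be set up roughly as in the sketch below. How the
full map is actually cut into halves and quarters is not specified
above, so the bounding-box representation, the split_map helper and
the map size are assumptions made only for illustration.

```python
from typing import Dict, Tuple

# (x_min, y_min, x_max, y_max) in arbitrary map units; purely illustrative.
Box = Tuple[float, float, float, float]


def split_map(full: Box) -> Dict[str, Box]:
    """Return the full map A, two halves B1-B2 and four quarters C1-C4.

    Hypothetical helper: cuts the bounding box down the middle of each axis.
    """
    x0, y0, x1, y1 = full
    xm, ym = (x0 + x1) / 2, (y0 + y1) / 2
    return {
        "A": full,
        "B1": (x0, y0, xm, y1),
        "B2": (xm, y0, x1, y1),
        "C1": (x0, y0, xm, ym),
        "C2": (xm, y0, x1, ym),
        "C3": (x0, ym, xm, y1),
        "C4": (xm, ym, x1, y1),
    }


def area(box: Box) -> float:
    """Area of a bounding box, used here only as a sanity check."""
    x0, y0, x1, y1 = box
    return (x1 - x0) * (y1 - y0)


# Assumed map size; any rectangle works.
maps = split_map((0.0, 0.0, 400.0, 300.0))

# Scenario 2a: train on the full map, test on sub-maps.
scenario_2a = {"train": ["A"], "test": ["B1", "C2"]}
# Scenario 2b: train on sub-maps, test on the full map and other sub-maps.
scenario_2b = {"train": ["B1", "C3"], "test": ["A", "B2", "C1"]}
```

Each scenario is just a pair of name lists, so the same training and
testing harness can be reused for 2a and 2b.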

B. Performance Metric
1. Teams are judged by their average score overall, per city, or per
disaster type.
For example, if there are 3 city maps and 4 disaster types, a total of
12 tests can be used.
Teams can be ranked by the average of all 12 tests, the average for each
of cities 1 to 3, and the average for each of disaster types I to IV.

2. Compare the initial (before learning) and final (after learning) scores.
This method is mostly used when training and testing share the same setting.
It has a drawback in competition: a team may intentionally worsen
its initial performance to show a larger improvement.

3. Subjective evaluation of learned knowledge.
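
The averages in metric 1 could be computed roughly as follows. This is
only a sketch: the score values are made up, and the summarize function
and the city and disaster type labels are hypothetical names.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Tuple

# scores[(city, disaster_type)] = score of one test run.
Scores = Dict[Tuple[str, str], float]


def summarize(scores: Scores) -> dict:
    """Average score over all tests, per city and per disaster type."""
    by_city = defaultdict(list)
    by_type = defaultdict(list)
    for (city, dtype), score in scores.items():
        by_city[city].append(score)
        by_type[dtype].append(score)
    return {
        "total": mean(list(scores.values())),
        "city": {c: mean(v) for c, v in by_city.items()},
        "disaster_type": {t: mean(v) for t, v in by_type.items()},
    }


# 3 city maps x 4 disaster types = 12 tests, with invented scores.
cities = ["city1", "city2", "city3"]
types = ["I", "II", "III", "IV"]
scores = {
    (c, t): 10.0 * (i + 1) + j
    for i, c in enumerate(cities)
    for j, t in enumerate(types)
}
summary = summarize(scores)
```

Teams could then be sorted on summary["total"], or on any single entry
of summary["city"] or summary["disaster_type"].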

C. Knowledge with Software Implementation
The purpose of learning is to acquire knowledge, but how should that
knowledge be stored?
With software programs, a convenient way to store knowledge may be
disk files. The knowledge may be kept in one file, "All_Knowledge.txt",
for all agents, or in several files corresponding to different agents,
such as "FireBrigade_Knowledge.txt". The file format should preferably
be human-readable, so that it can be understood by other teams or even
the general public.
GUIDELINE: these files CANNOT be used as an agent communication method
during one run (300 steps).
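
One possible way to follow this guideline is to read the knowledge file
once before a run and write it once after the run, as sketched below.
JSON is only an assumed choice of human-readable format, and the
function names are hypothetical; the file naming follows the example
above.

```python
import json
import tempfile
from pathlib import Path


def knowledge_path(directory: Path, agent_type: str) -> Path:
    """Build a per-agent file name such as "FireBrigade_Knowledge.txt"."""
    return directory / f"{agent_type}_Knowledge.txt"


def load_knowledge(directory: Path, agent_type: str) -> dict:
    """Read knowledge once, before a run starts; empty if no file exists."""
    path = knowledge_path(directory, agent_type)
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def save_knowledge(directory: Path, agent_type: str, knowledge: dict) -> None:
    """Write knowledge once, after a run ends (never during the run)."""
    path = knowledge_path(directory, agent_type)
    text = json.dumps(knowledge, indent=2, sort_keys=True)
    path.write_text(text, encoding="utf-8")


# Usage: two consecutive runs sharing knowledge through the file only.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    before_first_run = load_knowledge(folder, "FireBrigade")
    save_knowledge(folder, "FireBrigade", {"water_refill": {"I": 3000}})
    before_second_run = load_knowledge(folder, "FireBrigade")
```

Because the file is touched only between runs, it cannot serve as a
communication channel inside the 300 steps of a run.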

Below is an example of the fire brigade knowledge I have in mind:
"FireBrigade_Knowledge.txt"
-----------------------------------------------------------------------------------------------
Amount of water refill = 3000 in disaster type I.
Amount of water refill = 2000 in early stage of disaster type II
(before cycle 30).
Amount of water refill = 6000 in late stage of disaster type II (after
cycle 100).

if (number of active FB/number of fires > 4) then try to put out all fires.
if (number of active FB/number of fires < 1/3) then try to stop the fire
from spreading or put emphasis on helping ambulance teams (burning
buildings where ambulance teams are rescuing persons have higher priority).

If the blockade is slight, the principle of path planning is the shortest path.
If the blockade is severe, the principle of path planning is to use
broader roads as much as possible, even if this path may be a little
longer.

/*If the collapse is severe and there are few fires, messages with
ambulance teams get a higher transfer weight within the limited
communication capacity. */
weight of FB messages = 0.2, weight of AT messages = 0.6, and weight
of PF messages = 0.2 (total weight is 1.0)

/*If the blockade is severe, messages with police forces get a higher
transfer weight within the limited communication capacity. */
weight of FB messages = 0.3, weight of AT messages = 0.2, and weight
of PF messages = 0.5 (total weight is 1.0)
________________________________________________________________________
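
To show how an agent program might consume such a file, here is a rough
sketch that encodes some of the rules above as functions. The function
names, the strategy labels and the situation keys are all hypothetical.
Cases the example leaves open (for instance disaster type II between
cycles 30 and 100, or a ratio between 1/3 and 4) return None instead of
a guessed value.

```python
from fractions import Fraction
from typing import Dict, Optional


def water_refill(disaster_type: str, cycle: int) -> Optional[int]:
    """Amount of water to refill, following the example file."""
    if disaster_type == "I":
        return 3000
    if disaster_type == "II":
        if cycle < 30:       # early stage
            return 2000
        if cycle > 100:      # late stage
            return 6000
    return None  # not specified in the example


def fire_strategy(active_fb: int, fires: int) -> Optional[str]:
    """Pick a strategy from the ratio of active fire brigades to fires."""
    if fires <= 0:
        return None
    ratio = Fraction(active_fb, fires)  # exact, avoids float rounding at 1/3
    if ratio > 4:
        return "extinguish_all"
    if ratio < Fraction(1, 3):
        return "contain_or_support_ambulance"
    return None  # not specified in the example


# Message weights for FB (fire brigade), AT (ambulance team), PF (police force).
MESSAGE_WEIGHTS: Dict[str, Dict[str, float]] = {
    "severe_collapse_few_fires": {"FB": 0.2, "AT": 0.6, "PF": 0.2},
    "severe_blockade": {"FB": 0.3, "AT": 0.2, "PF": 0.5},
}


def message_weights(situation: str) -> Dict[str, float]:
    """Look up the weights for a situation key (keys are assumed names)."""
    return MESSAGE_WEIGHTS[situation]
```

For example, water_refill("II", 10) gives 2000, and fire_strategy(9, 2)
gives "extinguish_all" because 9/2 is greater than 4.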

D. Online Learning during One Run
Learning (knowledge acquisition) in the above situations mostly happens
between runs. In the current run, agents use knowledge acquired in past
runs. What about agents learning and improving their performance during
one run? In my opinion, 300 steps in one run may be too short. One run
with 1000, 3000 or more steps may be more suitable for online learning
during one run.

Zijian Ren
Phoenix Team
Email: zijian.ren@gmail.com
URL: http://www.geocities.com/zijian_ren


This archive was generated by hypermail 2.1.3 : Mon 29 Nov 2004 - 23:09:58 GMT