From: Tim S. <ts...@ai...> - 2012-01-13 05:20:13
|
Long post coming up here as I catch up on the last week's worth of discussions, so I apologize for the length as I ramble on answering some of the things discussed.

> 2) "learning" in terms of let it do some exercise, tweak parameters,
> redo same/similar exercise, if result better it was a good change
> otherwise not.
>
> The hard part here is, it's "hard" to "redo" same, because e.g. dice
> rolls will be different, opponent might act differently; not to mention
> the "skew" of it would learn to exploit "other Colossus AIs" weaknesses.
>
> Argh, the glibber between my ears is producing too many ideas again!

Actually this isn't going to be as bad as you think. In theory, other than the dice rolls, AI vs. AI should make the exact same moves, or very close. This is the method I favor, and I'll outline further down what I mean.

> Other than the "non-random dice" (which is a fixed sequence, and might
> be unfair nevertheless, depending on who draws how many numbers),
> another approach:
>
> "hits as per expectation":
>
> Let's say:
> 6 rolls for a 4-6: 3 hits.
> 9 rolls for a 6: 1 + 1/2, hm, 1.5.
> 5 rolls for a 3 => 5 x 2/3 = 10/3 = 3.33.
>
> Now, three options: round up, round down, "levelled". Levelled: round
> down or up, whichever is closer, but keep that in mind and put it into
> the calculation for the next roll.

I agree with this too. Limit the randomness as much as possible to get reasonably deterministic dice rolls. What I would favor doing is: for every 6 dice rolled you get EXACTLY one of each number. Then for the fractional leftovers you roll randomly, with no duplicates. So if, for example, a unit rolls 8 dice, it would get 1, 2, 3, 4, 5, 6 and then 2 numbers rolled randomly; but if those 2 rolls were 5, 5, it would re-roll that second 5 so that no number could appear three times. That seems to be as close to expected value as we can get.

> What about reinforcement learning?
>
> I was thinking (but can't find an easy-to-use library) to use the
> creatures/terrain/whatever as the 'input vector', and to try to get a good
> output by:
>
> 1) creating an 'attack' objective for each of the Legion's creatures
> 2) creating a 'preserve' objective for each of the Legion's creatures
> 3) creating a 'destroy' objective for each of the opposite Legion's creatures
>
> and the 'output vector' would be the priorities for the objectives (and
> because the evaluation of each objective has a lot of parameters, we might
> throw a few of those in here as well).
>
> Then for a battle with no other objectives (such as preserving a Titan), a
> simplified final result could be the reward in the reinforcement system (using
> for instance a formula like "my points left minus the other points left").
>
> Then replaying the same battle over and over with randomized parameters could
> be used as a training set, with both 'good' and 'bad' results. Then we'd see if
> the system can suggest a 'good' set of parameters for the battle.

Yes, I agree 100% with what Romain proposes here. This is the method I would use to train the AI. Romain has already done a fantastic job with the battle simulator, where he showed the results of his AI vs. the others with identical units. That, I believe, can be used to do what we want.
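For illustration, a minimal sketch in Java (the language Colossus is written in) of the stratified dice scheme described above. The class name and method are hypothetical, not part of the Colossus code base: every full group of six dice yields each face exactly once, and the leftover dice are rolled randomly but forced to be distinct faces.

#####
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Hypothetical helper: near-deterministic dice as proposed above.
public final class StratifiedDice
{
    private final Random random = new Random();

    public List<Integer> roll(int numDice)
    {
        List<Integer> rolls = new ArrayList<Integer>();
        // Full groups of six dice: exactly one of each face.
        for (int group = 0; group < numDice / 6; group++)
        {
            for (int face = 1; face <= 6; face++)
            {
                rolls.add(face);
            }
        }
        // Leftover dice: random, but no duplicate faces among the leftovers,
        // so no face can show up more often than (full groups + 1) times.
        Set<Integer> used = new HashSet<Integer>();
        for (int i = 0; i < numDice % 6; i++)
        {
            int face;
            do
            {
                face = random.nextInt(6) + 1;
            }
            while (!used.add(face));
            rolls.add(face);
        }
        Collections.shuffle(rolls, random); // hide the pattern from the player
        return rolls;
    }
}
#####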
Here's what I have been thinking about over the last week while reading these emails and reviewing the lecture notes. There are a lot of different ways to do machine learning, and the professor stressed MANY times that choosing the right one was probably the most important thing you could do in order to avoid wasting a lot of time; one of his recommendations was to start with some simple up-front examples that tell you whether you are on the right track. That way, before you invest more time, you know whether you have a reasonable chance to succeed.

I mentioned a couple of emails back having a lot of variables for the neural network concept. I still think that's the way to go. What I would want to try to do is get an accurate value for every unit in the battle. This value would be calculated each time it was your action (so potentially 14 times in all, twice per turn). Here's how I would initially start with valuing each unit (this would be a function that takes a unit and returns a value).

1) Base value: power * skill. So an Angel would be 6*4=24. No surprises there. However, the base value should be modified downward as the unit takes damage, so an Angel with 2 hits on it is now worth less than 24. How much less? I propose initially to make it scale, with the first hit removing the least amount of value and the last hit removing the most. So for example a Centaur is 3*4=12. Rather than each hit being worth 4 (so that a 1-damage Centaur is worth 2*4=8), I suggest using 2, 4, 6. So a 1-damage Centaur is worth 4+6=10 and a 2-damage Centaur is worth 6. I do it this way so that when a unit is down to its last hits, applying those hits is worth a lot when deciding WHICH unit to strike (i.e. this helps the AI eliminate weak units). During testing we'll let the learning process figure out how much each hit is worth. So the Angel hits would be 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, and so on for other units.

2) Recruit value: whether the unit can recruit anywhere at all (other than the basic 3 tower units). In the actual game it would have to check the caretaker stacks etc. to see what remained to recruit; in the simulation we can set this manually to test scenarios. The recruit value would be equal to the best unit this unit might, in theory, be able to directly recruit in the future, taking into account the other units in the stack at the moment. So if, for example, a stack contained a single Giant, its best recruit is another Giant; if a stack contained 2 Giants, the best recruit would be a Colossus. This is hard to value of course, but again I suggest starting with something like (level in tree * unit value / 2) so that higher-level units are worth more. By level, I mean a Centaur is L1, a Lion L2, a Minotaur L3, a Dragon L4 and a Colossus L5, so a Centaur would be worth 1*12/2=6, a Lion 2*15/2=15, a Minotaur 3*16/2=24, and so on.

3) Recruit here: as in the hex where the battle is taking place. If the answer is yes, double the value obtained in (2).

4) Titan: if the unit is a Titan, multiply its value by 5. Obviously this makes Titans prime targets and prime things to defend. This might need to be increased, of course, if it's not high enough.

Now we may need a few other things in this calculation, but this would be my initial crack at it. This should do what Romain wanted to do above (attack, preserve, destroy) by making units valuable.
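A rough sketch, again in Java, of the valuation function outlined above. All weights are the initial guesses from this mail and would be tuned by the learning pass later; the Unit class and its fields are hypothetical stand-ins for whatever the real creature/legion classes expose, and the per-hit scaling shown is just one increasing scheme whose steps sum to power * skill (it reproduces the 2, 4, 6 Centaur example, though not the exact Angel numbers).

#####
// Hypothetical sketch of the unit-valuation function described above.
public final class UnitValuator
{
    /** Minimal stand-in for one creature in a battle. */
    public static final class Unit
    {
        final int power;
        final int skill;
        final int hits;                    // damage already taken
        final int bestRecruitLevel;        // tree level of best future recruit, 0 if none
        final double bestRecruitBaseValue; // power * skill of that recruit, 0 if none
        final boolean canRecruitHere;      // recruit possible in this battle's terrain
        final boolean titan;

        public Unit(int power, int skill, int hits, int bestRecruitLevel,
            double bestRecruitBaseValue, boolean canRecruitHere, boolean titan)
        {
            this.power = power;
            this.skill = skill;
            this.hits = hits;
            this.bestRecruitLevel = bestRecruitLevel;
            this.bestRecruitBaseValue = bestRecruitBaseValue;
            this.canRecruitHere = canRecruitHere;
            this.titan = titan;
        }
    }

    public static double valueOf(Unit u)
    {
        double value = baseValue(u);

        // (2) Recruit value: tree level * base value of the best recruit / 2.
        double recruit = u.bestRecruitLevel * u.bestRecruitBaseValue / 2.0;
        // (3) Doubled when the recruit could happen in this very hex.
        if (u.canRecruitHere)
        {
            recruit *= 2.0;
        }
        value += recruit;

        // (4) Titans are prime targets and prime things to defend.
        if (u.titan)
        {
            value *= 5.0;
        }
        return value;
    }

    // (1) Base value power * skill, reduced as damage accumulates; the first
    // hit removes the least value and the last hit the most. This scaling
    // makes the h-th hit remove h * step, so a Centaur (12) loses 2, 4, 6.
    private static double baseValue(Unit u)
    {
        int hitsToKill = u.power;          // a creature dies at 'power' hits
        double full = u.power * u.skill;
        double step = 2.0 * full / (hitsToKill * (hitsToKill + 1.0));
        double value = full;
        for (int h = 1; h <= Math.min(u.hits, hitsToKill); h++)
        {
            value -= step * h;
        }
        return Math.max(0.0, value);
    }
}
#####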
So in his example of 2 Ogres and 1 Troll, if we changed the simulation to occur on Hills, for example (where a Minotaur can be recruited by the Ogres), then the AI should try to preserve the Ogres, since they are more valuable.

We'll know how good a job the AI did by doing the following: calculate the value of the units at battle start, then calculate the value of the units at battle end (all damage healed, of course). The closer you are to the initial value, the better you have done, since the AI should attempt to save the most valuable units and kill the most valuable enemy units. The goal would be to figure out the right values for each unit, which means figuring out the value of each of those 4 parts of the calculation.

The advantage of Romain's simulation is that we can simply tell it where the combat is taking place and run a lot of combats to see whether this is working, and adjust the numbers as needed. Once this part is done, the next step will be improving the movement on the board (though Romain's existing experimental AI may already be good enough if we get proper unit valuations).

Tim
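A small sketch of that scoring idea, reusing the hypothetical UnitValuator from the previous sketch; the class and method names are placeholders, and survivors are assumed to be passed in with their damage healed (hits = 0).

#####
import java.util.List;

// Hypothetical scoring helper for the training loop described above.
public final class BattleScore
{
    /** Rewards keeping your own value and destroying the enemy's. */
    public static double score(List<UnitValuator.Unit> friendlyAtStart,
        List<UnitValuator.Unit> friendlySurvivors,
        List<UnitValuator.Unit> enemyAtStart,
        List<UnitValuator.Unit> enemySurvivors)
    {
        double friendlyLost = total(friendlyAtStart) - total(friendlySurvivors);
        double enemyLost = total(enemyAtStart) - total(enemySurvivors);
        return enemyLost - friendlyLost;
    }

    private static double total(List<UnitValuator.Unit> units)
    {
        double sum = 0.0;
        for (UnitValuator.Unit u : units)
        {
            sum += UnitValuator.valueOf(u);
        }
        return sum;
    }
}
#####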
|
From: Clemens K. <lem...@sa...> - 2012-01-11 20:10:38
|
> > from 'bad'? Doesn't it require the learning process to only include
> > 'good'?
> > Doesn't it preclude learning on-the-fly unless you can somehow tell
> > 'good' from 'bad'?
>
> What's important is that a human did not have to categorise the game
> as a win or a loss.

Exactly. I think we have to distinguish between two principal approaches
(AI here meaning "any artificial intelligence"):

1) Let the AI read lots of data and learn / conclude from that.
Letting it analyze the reports of all battles on the server was something
somebody mentioned. (I don't think this approach is feasible for us.)
Too many cases, all different, and no "is this good or bad".
2) "learning" in terms of let it do some exercise, tweak parameters,
redo same/similar exercise, if result better it was a good change
otherwise not.
The hard part here is, it's "hard" to "redo" same, because e.g. dice
rolls will be different, opponent might act differently; not to mention
the "skew" of it would learn to exploit "other Colossus AIs" weaknesses.
Argh, the glibber between my ears is producing too many ideas again!
Can we involve real people? E.g., provide on the server a feature
"do battle practice". We define few situations which we want to train
AI first. If users login to server (and e.g. no other player free),
they can spend their time with fighting one of the scenarios against
the AI. Since it's "few", we can even "define" our best guess or
experience (let two good human players play it 5 times?)
"what is a good outcome and what a bad one".
The AI plays those scenarios against the users, and it might grow
better on it! Users might also just redo same just to "practice for
themselves".
[However, "let it run on the server to store results easily" is here in
conflict with "the server has only one CPU" -- so, not too many of
that at same time, or lower priority ("nice" value") -- hopefully the
"new AI" would not need "30 secs full CPU" as old ones??
Make it possible to re-run same scenarios on user PC, makes it harder
to transmit back the results so server and "update" changed parameters
from server to user PC regularly...]
(( Hey, I just said "I have ideas", not that they are good or feasible
ones ;-))
One more step to level the "dice always different": Tweaked dice rolls.
Other than the "non-rondom dice" (which is a fixed sequence, and might
be unfair nevertheless, depending on who draws how many numbers),
another approach:
"hits as per expectation":
Let's say:
6 rolls for a 4-6: 3 hits.
9 rolls for a 6: 1 + 1/2, hm, 1.5.
5 rolls for a 3 => 5 x 2/3 = 10/3 = 3.33.
Now, three options: round up, round down, "levelled". Levelled: round down or up, whichever is closer, but keep that in mind and put it into the calculation for the next roll.
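A minimal sketch of the "levelled" option in Java; the class is hypothetical and not part of Colossus. It rounds each roll's expected hits to the nearest integer and carries the rounding error into the next roll, so the three example rolls above come out as 3, 2 and 3 hits.

#####
// Hypothetical helper: "hits as per expectation" with levelled rounding.
public final class LevelledHits
{
    private double carry = 0.0; // rounding error owed from earlier rolls

    /**
     * @param dice         number of dice rolled
     * @param strikeNumber face needed to hit (4 means 4, 5 or 6 hit)
     * @return hits to award for this roll
     */
    public int hits(int dice, int strikeNumber)
    {
        double hitChance = (7 - strikeNumber) / 6.0;
        double expected = dice * hitChance + carry;
        int awarded = (int) Math.round(expected);
        awarded = Math.max(0, Math.min(dice, awarded)); // stay in legal range
        carry = expected - awarded;                     // remember the remainder
        return awarded;
    }
}
#####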
I think users might appreciate this approach, to be able to "improve
their own strategy and not depend on dice luck"?
BR,
Clemens
-------- Original Message --------
> Date: Wed, 11 Jan 2012 21:50:43 +1030
> From: Barrie Treloar <bae...@gm...>
> To: Romain Dolbeau <ro...@do...>
> CC: col...@li...
> Subject: Re: [Colossus-developers] (non-colossus specific) Stanford online course (free) for Artificial Intelligence and Machine Learning.
> On Wed, Jan 11, 2012 at 7:02 PM, Romain Dolbeau <ro...@do...>
> wrote:
> > On 01/10/12 21:38, Barrie Treloar wrote:
> >> Reinforcement learning is a type of unsupervised learning.
> >> You don't manually evaluate the battle, but as you point out use some
> >> formula to determine the results.
> >
> > What I don't understand is: if you don't do that, how does the AI learn
> 'good'
> > from 'bad'? Doesn't it requires the learning process to only include
> 'good'?
> > Doesn't it preclude for learning on-the-fly unless you can somehow tell
> 'good'
> > from 'bad'?
> >
> > I should take the course but I don't have the time :-(
>
> I have all the videos and programming exercises :)
>
> An example way to do the reinforcement would be at the end of the
> game, if the AI wins it gets +1, and if it loses it gets -1. (Or +6
> for 6 players or something like that)
>
> But you ideally do not want to wait that long to train your AI.
>
> So we need a way to measure the "goodness" of a single battle.
> I think someone suggested points accrued vs creatures lost.
> Perhaps even getting an angel?
>
> What's important is that a human did not have to categorise the game
> as a win or lose.
>
|
|
From: Barrie T. <bae...@gm...> - 2012-01-11 11:20:55
|
On Wed, Jan 11, 2012 at 7:02 PM, Romain Dolbeau <ro...@do...> wrote:
> On 01/10/12 21:38, Barrie Treloar wrote:
>> Reinforcement learning is a type of unsupervised learning.
>> You don't manually evaluate the battle, but as you point out use some
>> formula to determine the results.
>
> What I don't understand is: if you don't do that, how does the AI learn 'good'
> from 'bad'? Doesn't it require the learning process to only include 'good'?
> Doesn't it preclude learning on-the-fly unless you can somehow tell 'good'
> from 'bad'?
>
> I should take the course but I don't have the time :-(

I have all the videos and programming exercises :)

An example way to do the reinforcement would be that at the end of the game, if the AI wins it gets +1, and if it loses it gets -1 (or +6 for 6 players, or something like that).

But you ideally do not want to wait that long to train your AI.

So we need a way to measure the "goodness" of a single battle. I think someone suggested points accrued vs. creatures lost. Perhaps even getting an angel?

What's important is that a human did not have to categorise the game as a win or a loss.
|
From: Romain D. <ro...@do...> - 2012-01-11 08:33:04
|
On 01/10/12 21:38, Barrie Treloar wrote:
> Reinforcement learning is a type of unsupervised learning.
> You don't manually evaluate the battle, but as you point out use some
> formula to determine the results.

What I don't understand is: if you don't do that, how does the AI learn 'good' from 'bad'? Doesn't it require the learning process to only include 'good'? Doesn't it preclude learning on-the-fly unless you can somehow tell 'good' from 'bad'?

I should take the course but I don't have the time :-(

Cordially,

--
Romain Dolbeau <ro...@do...>
|
From: Romain D. <ro...@do...> - 2012-01-11 08:30:42
|
On 01/10/12 21:23, Clemens Katzer wrote:
> First, I cannot map the word "reinforcement" into this context.

Yeah, on the Colossus mailing list it's definitely an ambiguous term, sorry :-) Barrie explained it well: as in 'reinforcement learning'.

> In this approach, the "other side" must remain constant,
> i.e. can not use for example the same changing AI; but if doing so,
> isn't there the risk that the new AI primarily learns to exploit
> the other AI's weaknesses?

It doesn't have to remain constant, I think, but yes, it will learn to exploit the mistakes that the other AI makes. But once the code 'works', nothing prevents you from injecting human-vs-AI battle results into the training set, I believe.

Cordially,

--
Romain Dolbeau <ro...@do...>
|
From: Barrie T. <bae...@gm...> - 2012-01-10 20:38:59
|
On Wed, Jan 11, 2012 at 6:53 AM, Clemens Katzer <lem...@sa...> wrote:
>
>> > Supervised and Unsupervised.
>> > In supervised you know the correct answer,
>> > in unsupervised you don't.
>>
>> What about reinforcement learning?
>
> First, I cannot map the word "reinforcement" into this context.
> For me, reinforcement translates to: "some army attacks, and if they
> are too weak, they call for reinforcement (some higher-level general
> dispatches additional troops to make them stronger)".
>
> Or: "if you tease me, I call my big brother as reinforcement" :)

It's reinforcement as in carrot/stick. You do well and the mouse gets a piece of cheese; you do badly and you get an electric shock.

> Regarding your approach: I do not see what exactly is the difference
> between this and "unsupervised" - unsupervised learning I would have
> imagined is of the form: one defines a "measurement" and the overall
> goal is that the "artificial intelligence" shall maximize the achieved
> value.

Reinforcement learning is a type of unsupervised learning. You don't manually evaluate the battle, but, as you point out, use some formula to determine the results.

> In this approach, the "other side" must remain constant,
> i.e. can not use for example the same changing AI; but if doing so,
> isn't there the risk that the new AI primarily learns to exploit
> the other AI's weaknesses?
>
> But then, perhaps I got it all wrong :)

You are trying to improve your AI, so it doesn't matter what the opposition does, and it can change. As long as your AI is improving based on the evaluation criteria then it's working; and yes, if you don't have enough different types of input data you can skew your AI to maximise against a strategy that doesn't exist.
|
From: Clemens K. <lem...@sa...> - 2012-01-10 20:23:42
|
> > Supervised and Unsupervised.
> > In supervised you know the correct answer,
> > in unsupervised you don't.
>
> What about reinforcement learning?

First, I cannot map the word "reinforcement" into this context. For me, reinforcement translates to: "some army attacks, and if they are too weak, they call for reinforcement (some higher-level general dispatches additional troops to make them stronger)".

Or: "if you tease me, I call my big brother as reinforcement" :)

Regarding your approach: I do not see what exactly the difference is between this and "unsupervised". Unsupervised learning, I would have imagined, is of the form: one defines a "measurement" and the overall goal is that the "artificial intelligence" shall maximize the achieved value.

In this approach, the "other side" must remain constant, i.e. it cannot use, for example, the same changing AI; but if doing so, isn't there the risk that the new AI primarily learns to exploit the other AI's weaknesses?

But then, perhaps I got it all wrong :)

Just my 5 €-cents (as long as we still have that currency ;-)

BR,
Clemens

> I was thinking (but can't find an easy-to-use library) to use the
> creatures/terrain/whatever as the 'input vector', and to try to get a good
> output by:
>
> 1) creating an 'attack' objective for each of the Legion's creatures
> 2) creating a 'preserve' objective for each of the Legion's creatures
> 3) creating a 'destroy' objective for each of the opposite Legion's creatures
>
> and the 'output vector' would be the priorities for the objectives (and
> because the evaluation of each objective has a lot of parameters, we might
> throw a few of those in here as well).
>
> Then for a battle with no other objectives (such as preserving a Titan), a
> simplified final result could be the reward in the reinforcement system (using
> for instance a formula like "my points left minus the other points left").
>
> Then replaying the same battle over and over with randomized parameters could
> be used as a training set, with both 'good' and 'bad' results. Then we'd see if
> the system can suggest a 'good' set of parameters for the battle.
>
> Could that work?
>
> Cordially,
>
> --
> Romain Dolbeau <rom...@ca...>
|
From: Romain D. <rom...@ca...> - 2012-01-10 15:48:00
|
On 01/10/12 05:36, Barrie Treloar wrote:
> The other thing I was thinking was that there are two types of AI learning:
> Supervised and Unsupervised.
> In supervised you know the correct answer,
> in unsupervised you don't.

What about reinforcement learning?

I was thinking (but can't find an easy-to-use library) to use the creatures/terrain/whatever as the 'input vector', and to try to get a good output by:

1) creating an 'attack' objective for each of the Legion's creatures
2) creating a 'preserve' objective for each of the Legion's creatures
3) creating a 'destroy' objective for each of the opposite Legion's creatures

and the 'output vector' would be the priorities for the objectives (and because the evaluation of each objective has a lot of parameters, we might throw a few of those in here as well).

Then for a battle with no other objectives (such as preserving a Titan), a simplified final result could be the reward in the reinforcement system (using for instance a formula like "my points left minus the other points left").

Then replaying the same battle over and over with randomized parameters could be used as a training set, with both 'good' and 'bad' results. Then we'd see if the system can suggest a 'good' set of parameters for the battle.

Could that work?

Cordially,

--
Romain Dolbeau <rom...@ca...>
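A minimal sketch of how these objectives and the reward might be wired up, assuming hypothetical Objective and priority types (ExperimentalAI has its own objective code, which this does not claim to reproduce): one attack/preserve objective per own creature, one destroy objective per enemy creature, a priority per objective as the learner's output, and the reward computed as "my points left minus the other points left".

#####
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the objective/reward setup proposed above.
public final class ObjectiveSketch
{
    enum Kind { ATTACK, PRESERVE, DESTROY }

    static final class Objective
    {
        final Kind kind;
        final String creatureName;
        double priority;            // the learner's output vector entry

        Objective(Kind kind, String creatureName)
        {
            this.kind = kind;
            this.creatureName = creatureName;
        }
    }

    /** One attack + one preserve objective per own creature,
     *  one destroy objective per enemy creature. */
    static List<Objective> buildObjectives(List<String> ownCreatures,
        List<String> enemyCreatures)
    {
        List<Objective> objectives = new ArrayList<Objective>();
        for (String name : ownCreatures)
        {
            objectives.add(new Objective(Kind.ATTACK, name));
            objectives.add(new Objective(Kind.PRESERVE, name));
        }
        for (String name : enemyCreatures)
        {
            objectives.add(new Objective(Kind.DESTROY, name));
        }
        return objectives;
    }

    /** Reward: "my points left minus the other points left". */
    static int reward(int myPointsLeft, int otherPointsLeft)
    {
        return myPointsLeft - otherPointsLeft;
    }
}
#####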
|
From: Romain D. <rom...@ca...> - 2012-01-10 12:20:30
|
On 01/10/12 11:24, Kim Milvang-Jensen wrote:
> As far as I know (it has been a while since I checked), SimpleAI,
> RationalAI and MilvangAI actually run the exact same battle code.

That's also what I remembered: only ExperimentalAI changed the battle code (and was SimpleAI in disguise for the strategic part), while RationalAI and MilvangAI changed the strategy, but were SimpleAI in disguise on the Battlelands.

Cordially,

--
Romain Dolbeau <rom...@ca...>
|
From: Kim Milvang-J. <ki...@mi...> - 2012-01-10 11:18:41
|
As far as I know (it has been a while since I checked), SimpleAI, RationalAI and MilvangAI actually run the exact same battle code. Maybe I should consider looking at this again :)

On 10/01/2012 09:46 "Romain Dolbeau" <ro...@do...> wrote:
> On 01/09/12 18:44, Clemens Katzer wrote:
>
> > This gives a good start for the "thing around" for those who rather
> > care about "AI stuff" than "how to create or run a battle N times".
>
> I had 100 games run overnight for each possible AI vs. AI combination in
> ExperimentalAI, SimpleAI, RationalAI and MilvangAI. Here it comes, straight
> from SQL:
>
> 2 * Troll + 3 * Ogre vs. same in Plains:
> +----------+----------+------+----------+----------------+----------------+
> | attacker | defender | draw | timeloss | attAI          | defAI          |
> +----------+----------+------+----------+----------------+----------------+
> |       75 |       15 |   10 |        0 | SimpleAI       | SimpleAI       |
> |       39 |       40 |   19 |        2 | SimpleAI       | ExperimentalAI |
> |       72 |       13 |   14 |        1 | SimpleAI       | RationalAI     |
> |       67 |       20 |   13 |        0 | SimpleAI       | MilvangAI      |
> |       57 |       19 |   24 |        0 | ExperimentalAI | SimpleAI       |
> |       50 |       33 |   15 |        2 | ExperimentalAI | ExperimentalAI |
> |       67 |       12 |   21 |        0 | ExperimentalAI | RationalAI     |
> |       65 |       19 |   16 |        0 | ExperimentalAI | MilvangAI      |
> |       75 |       14 |   11 |        0 | RationalAI     | SimpleAI       |
> |       32 |       48 |   17 |        3 | RationalAI     | ExperimentalAI |
> |       72 |       10 |   18 |        0 | RationalAI     | RationalAI     |
> |       63 |       15 |   22 |        0 | RationalAI     | MilvangAI      |
> |       71 |       18 |   11 |        0 | MilvangAI      | SimpleAI       |
> |       43 |       40 |   16 |        1 | MilvangAI      | ExperimentalAI |
> |       69 |       15 |   16 |        0 | MilvangAI      | RationalAI     |
> |       70 |       19 |   11 |        0 | MilvangAI      | MilvangAI      |
> +----------+----------+------+----------+----------------+----------------+
>
> As far as I can tell, SimpleAI/RationalAI/MilvangAI look similar, and they are
> all marginally better at attacking each other than ExperimentalAI is. On the
> other hand, ExperimentalAI has a better defense.
>
> I'm not sure 100 battles is anywhere close to being high enough to be
> significant.
>
> > I think this eval code of yours should be checked in, either in a branch
> > or ideally into trunk (can it be there, i.e. is it inactive
> > as long as not explicitly forced to "do something"?)
>
> It's really ugly, and creates a dependence on java.sql.* ...
>
> Cordially,
|
From: Clemens K. <lem...@sa...> - 2012-01-10 08:57:48
|
> It's really ugly, and creates a dependence on java.sql.* ...

OK, that's not so nice. Let's see.

BR,
Clemens

-------- Original Message --------
> Date: Tue, 10 Jan 2012 09:46:39 +0100
> From: Romain Dolbeau <ro...@do...>
> To: col...@li...
> Subject: Re: [Colossus-developers] Evaluating a battle
>
> On 01/09/12 18:44, Clemens Katzer wrote:
>
> > This gives a good start for the "thing around" for those who rather
> > care about "AI stuff" than "how to create or run a battle N times".
>
> I had 100 games run overnight for each possible AI vs. AI combination in
> ExperimentalAI, SimpleAI, RationalAI and MilvangAI. Here it comes, straight
> from SQL:
>
> 2 * Troll + 3 * Ogre vs. same in Plains:
> +----------+----------+------+----------+----------------+----------------+
> | attacker | defender | draw | timeloss | attAI          | defAI          |
> +----------+----------+------+----------+----------------+----------------+
> |       75 |       15 |   10 |        0 | SimpleAI       | SimpleAI       |
> |       39 |       40 |   19 |        2 | SimpleAI       | ExperimentalAI |
> |       72 |       13 |   14 |        1 | SimpleAI       | RationalAI     |
> |       67 |       20 |   13 |        0 | SimpleAI       | MilvangAI      |
> |       57 |       19 |   24 |        0 | ExperimentalAI | SimpleAI       |
> |       50 |       33 |   15 |        2 | ExperimentalAI | ExperimentalAI |
> |       67 |       12 |   21 |        0 | ExperimentalAI | RationalAI     |
> |       65 |       19 |   16 |        0 | ExperimentalAI | MilvangAI      |
> |       75 |       14 |   11 |        0 | RationalAI     | SimpleAI       |
> |       32 |       48 |   17 |        3 | RationalAI     | ExperimentalAI |
> |       72 |       10 |   18 |        0 | RationalAI     | RationalAI     |
> |       63 |       15 |   22 |        0 | RationalAI     | MilvangAI      |
> |       71 |       18 |   11 |        0 | MilvangAI      | SimpleAI       |
> |       43 |       40 |   16 |        1 | MilvangAI      | ExperimentalAI |
> |       69 |       15 |   16 |        0 | MilvangAI      | RationalAI     |
> |       70 |       19 |   11 |        0 | MilvangAI      | MilvangAI      |
> +----------+----------+------+----------+----------------+----------------+
>
> As far as I can tell, SimpleAI/RationalAI/MilvangAI look similar, and they are
> all marginally better at attacking each other than ExperimentalAI is. On the
> other hand, ExperimentalAI has a better defense.
>
> I'm not sure 100 battles is anywhere close to being high enough to be
> significant.
>
> > I think this eval code of yours should be checked in, either in a branch
> > or ideally into trunk (can it be there, i.e. is it inactive
> > as long as not explicitly forced to "do something"?)
>
> It's really ugly, and creates a dependence on java.sql.* ...
>
> Cordially,
>
> --
> Romain Dolbeau <ro...@do...>
|
From: Romain D. <ro...@do...> - 2012-01-10 08:46:56
|
On 01/09/12 18:44, Clemens Katzer wrote:
> This gives a good start for the "thing around" for those who rather
> care about "AI stuff" than "how to create or run a battle N times".

I had 100 games run overnight for each possible AI vs. AI combination in ExperimentalAI, SimpleAI, RationalAI and MilvangAI. Here it comes, straight from SQL:

2 * Troll + 3 * Ogre vs. same in Plains:
+----------+----------+------+----------+----------------+----------------+
| attacker | defender | draw | timeloss | attAI          | defAI          |
+----------+----------+------+----------+----------------+----------------+
|       75 |       15 |   10 |        0 | SimpleAI       | SimpleAI       |
|       39 |       40 |   19 |        2 | SimpleAI       | ExperimentalAI |
|       72 |       13 |   14 |        1 | SimpleAI       | RationalAI     |
|       67 |       20 |   13 |        0 | SimpleAI       | MilvangAI      |
|       57 |       19 |   24 |        0 | ExperimentalAI | SimpleAI       |
|       50 |       33 |   15 |        2 | ExperimentalAI | ExperimentalAI |
|       67 |       12 |   21 |        0 | ExperimentalAI | RationalAI     |
|       65 |       19 |   16 |        0 | ExperimentalAI | MilvangAI      |
|       75 |       14 |   11 |        0 | RationalAI     | SimpleAI       |
|       32 |       48 |   17 |        3 | RationalAI     | ExperimentalAI |
|       72 |       10 |   18 |        0 | RationalAI     | RationalAI     |
|       63 |       15 |   22 |        0 | RationalAI     | MilvangAI      |
|       71 |       18 |   11 |        0 | MilvangAI      | SimpleAI       |
|       43 |       40 |   16 |        1 | MilvangAI      | ExperimentalAI |
|       69 |       15 |   16 |        0 | MilvangAI      | RationalAI     |
|       70 |       19 |   11 |        0 | MilvangAI      | MilvangAI      |
+----------+----------+------+----------+----------------+----------------+

As far as I can tell, SimpleAI/RationalAI/MilvangAI look similar, and they are all marginally better at attacking each other than ExperimentalAI is. On the other hand, ExperimentalAI has a better defense.

I'm not sure 100 battles is anywhere close to being high enough to be significant.

> I think this eval code of yours should be checked in, either in a branch
> or ideally into trunk (can it be there, i.e. is it inactive
> as long as not explicitly forced to "do something"?)

It's really ugly, and creates a dependence on java.sql.* ...

Cordially,

--
Romain Dolbeau <ro...@do...>
|
From: Barrie T. <bae...@gm...> - 2012-01-10 04:36:58
|
The other thing I was thinking was that there are two types of AI learning: supervised and unsupervised. In supervised learning you know the correct answer; in unsupervised you don't.

If we assume that human players are on average better than the AI players, then we can use real game data to train the AI with moves that humans have made. That would help bootstrap the learning process instead of having an untrained neural net fighting itself. Previously I had assumed unsupervised learning.

This is where Clemens's Colossus server would come in handy for harvesting data. Which would make fixing up the save files more valuable :)
|
From: Clemens K. <lem...@sa...> - 2012-01-09 17:44:15
|
Thanks a lot, Romain! This gives a good start for the "thing around" for those who rather care about "AI stuff" than "how to create or run a battle N times".

I think this eval code of yours should be checked in, either in a branch or ideally into trunk (can it be there, i.e. is it inactive as long as not explicitly forced to "do something"?)

Well, I'll take a look.

BR,
Clemens

-------- Original Message --------
> Date: Mon, 09 Jan 2012 17:56:39 +0100
> From: Romain Dolbeau <ro...@do...>
> To: col...@li...
> Subject: [Colossus-developers] Evaluating a battle (was: Stanford online course ...)
>
> On 01/08/12 12:09, Romain Dolbeau wrote:
> > I think that if we could have a simple battle (4 or 5 pieces, same on
> > both side, in Plains) that a "TrainedAI" could learn to fight
> > "optimally" against both SimpleAI, ExpAI, other AIs and itself
>
> Just to know where we stand now, I'm running an updated version of my
> very old SQL code. I've created a battle like the above (2 Trolls and 3 Ogres
> vs. the same, in Plains), and I'm running ExpAI vs. SimpleAI and SimpleAI
> vs. ExpAI, and storing the results in my database. I have no idea if one has
> an actual edge over the other...
>
> I'm attaching the needed stuff to run the same.
>
> Procedure:
> 1) put BattleRecordFilteredResults.java, BattleRecord.java, BattleRecordSQL.java in the server package (from an SVN checkout)
> 2) apply the patch 'patch'
> 3) get 'mysql-connector-java-5.0.8-bin.jar' from <http://dev.mysql.com/downloads/connector/j/5.0.html> and put it in libs/
> 4) rebuild the code
> 5) rebuild the tools ('ant tools')
> 6) you can regenerate a custom battle with MakeBattle like this (the 2 XML files I use are in 'stuff' already, you can use them directly):
>
> #####
> java -classpath build/ant/classes net.sf.colossus.tools.MakeBattle --dlist=Troll,Troll,Ogre,Ogre,Ogre --alist=Troll,Troll,Ogre,Ogre,Ogre --aAI=ExperimentalAI --dAI=SimpleAI > Test_ExpAI_vs_SimpleAI.xml
>
> java -classpath build/ant/classes net.sf.colossus.tools.MakeBattle --dlist=Troll,Troll,Ogre,Ogre,Ogre --alist=Troll,Troll,Ogre,Ogre,Ogre --dAI=ExperimentalAI --aAI=SimpleAI > Test_SimpleAI_vs_ExpAI.xml
> #####
>
> 7) have a mysql database running on the local machine; you need to create the database inside SQL and to grant the privileges to the proper user:
>
> ##### (inside mysql)
> create database ColossusBattleRecordV2;
>
> GRANT ALL PRIVILEGES ON ColossusBattleRecordV2.* TO 'colossus'@'localhost' identified by 'colossus';
> #####
>
> (you can change a lot of things inside BattleRecordSQL.java)
>
> 8) to test that everything works (from the just rebuilt Colossus SVN checkout):
>
> ./test.sh 1 /absolute/path/to/Test_ExpAI_vs_SimpleAI.xml
>
> should run the battle and store the results in the database. Then you can
> run both a gazillion times to get statistics on your favorite battle with
> your favorite AIs.
>
> So far, I have 27/7/6 and 15/15/10 (attacker win/defender win/draw, no
> timeloss yet) depending on whether ExpAI (first case) or SimpleAI (second
> case) is the attacker.
>
> Cordially,
>
> P.S. beware, the code is ugly, it's a quick'n'dirty hack to save time on
> creating statistics.
>
> --
> Romain Dolbeau <ro...@do...>
|
From: Romain D. <ro...@do...> - 2012-01-09 16:57:00
|
On 01/08/12 12:09, Romain Dolbeau wrote:
> I think that if we could have a simple battle (4 or 5 pieces, same on
> both side, in Plains) that a "TrainedAI" could learn to fight
> "optimally" against both SimpleAI, ExpAI, other AIs and itself

Just to know where we stand now, I'm running an updated version of my very old SQL code. I've created a battle like the above (2 Trolls and 3 Ogres vs. the same, in Plains), and I'm running ExpAI vs. SimpleAI and SimpleAI vs. ExpAI, and storing the results in my database. I have no idea if one has an actual edge over the other...

I'm attaching the needed stuff to run the same.

Procedure:
1) put BattleRecordFilteredResults.java, BattleRecord.java, BattleRecordSQL.java in the server package (from an SVN checkout)
2) apply the patch 'patch'
3) get 'mysql-connector-java-5.0.8-bin.jar' from <http://dev.mysql.com/downloads/connector/j/5.0.html> and put it in libs/
4) rebuild the code
5) rebuild the tools ('ant tools')
6) you can regenerate a custom battle with MakeBattle like this (the 2 XML files I use are in 'stuff' already, you can use them directly):

#####
java -classpath build/ant/classes net.sf.colossus.tools.MakeBattle --dlist=Troll,Troll,Ogre,Ogre,Ogre --alist=Troll,Troll,Ogre,Ogre,Ogre --aAI=ExperimentalAI --dAI=SimpleAI > Test_ExpAI_vs_SimpleAI.xml

java -classpath build/ant/classes net.sf.colossus.tools.MakeBattle --dlist=Troll,Troll,Ogre,Ogre,Ogre --alist=Troll,Troll,Ogre,Ogre,Ogre --dAI=ExperimentalAI --aAI=SimpleAI > Test_SimpleAI_vs_ExpAI.xml
#####

7) have a mysql database running on the local machine; you need to create the database inside SQL and to grant the privileges to the proper user:

##### (inside mysql)
create database ColossusBattleRecordV2;

GRANT ALL PRIVILEGES ON ColossusBattleRecordV2.* TO 'colossus'@'localhost' identified by 'colossus';
#####

(you can change a lot of things inside BattleRecordSQL.java)

8) to test that everything works (from the just rebuilt Colossus SVN checkout):

./test.sh 1 /absolute/path/to/Test_ExpAI_vs_SimpleAI.xml

should run the battle and store the results in the database. Then you can run both a gazillion times to get statistics on your favorite battle with your favorite AIs.

So far, I have 27/7/6 and 15/15/10 (attacker win/defender win/draw, no timeloss yet) depending on whether ExpAI (first case) or SimpleAI (second case) is the attacker.

Cordially,

P.S. beware, the code is ugly, it's a quick'n'dirty hack to save time on creating statistics.

--
Romain Dolbeau <ro...@do...>
|
From: Barrie T. <bae...@gm...> - 2012-01-08 19:30:39
|
On Sun, Jan 8, 2012 at 9:39 PM, Romain Dolbeau <ro...@do...> wrote:
> I think that if we could have a simple battle (4 or 5 pieces, same on
> both side, in Plains) that a "TrainedAI" could learn to fight
> "optimally" against both SimpleAI, ExpAI, other AIs and itself, then we
> would have a starting point. The idea is no Titan (too complicated at
> first), an indecisive outcome (the Battle can go either way and a
> skilled AI is useful to have), enough pieces to have a fine-grained
> result (winning = good, the more pieces left the better) but not too
> many pieces, no reinforcement/summoning at first. That would teach the
> AI placement, defense, offense, ganging up, and avoiding time loss.

Well, and this is where my lack of AI knowledge comes in. You don't need just one AI algorithm; you may need a few for different scenarios. And you can even have a few for the same scenario and then somehow select which answer you like better.

Also, the reinforcement part can come way later than the end of the combat, e.g. the result of the entire game. So as long as there are enough input parameters and hidden units internally to capture different strategies, and enough games played where enough of those actions listed above are occurring, then the weights should be manipulated to start using them. That's where the freakishness of a neural net comes in: you can't really look inside and see what it is doing and why.

The further away the reinforcement, the more memory (or disk space, or time) is needed to generate an AI learning cycle. And this may become prohibitive in time, memory, etc. to actually generate.
|
From: <ro...@do...> - 2012-01-08 11:33:34
|
Romain Dolbeau <ro...@do...> wrote:
> Of course, having neither the time nor the knowledge I'm not going to
> code it, so in the end it's in the hands of whomever is courageous
> enough to jump into it :-)

Apparently some of the work has been done already, and maybe we could reuse it:

<http://en.wikipedia.org/wiki/Encog>
<http://en.wikipedia.org/wiki/Neuroph>

The second seems to have a list of applications using it on the web site; I didn't find one for the first.

Cordially,

--
Romain Dolbeau <ro...@do...>
|
From: <ro...@do...> - 2012-01-08 11:09:35
|
Clemens Katzer <lem...@sa...> wrote:
> I guess we all agree that Titan is slightly more complex than Backgammon.

I guess we do. I'm not sure about getting the agreement of backgammon players ;-)

> In that sense, some of those objectives Romain came up with would
> be rather high-level objectives, imposed from "outside" to the battle.

I agree. I think those are a completely different kind of AI, with a lot of player wisdom thrown in.

> But basic idea: partition the problem and train the AI for specific
> situations first?

I agree 100%. I think that if we could have a simple battle (4 or 5 pieces, the same on both sides, in Plains) that a "TrainedAI" could learn to fight "optimally" against both SimpleAI, ExpAI, other AIs and itself, then we would have a starting point. The idea is no Titan (too complicated at first), an indecisive outcome (the battle can go either way and a skilled AI is useful to have), enough pieces to have a fine-grained result (winning = good, the more pieces left the better) but not too many pieces, and no reinforcement/summoning at first. That would teach the AI placement, defense, offense, ganging up, and avoiding time loss.

Then step 2 could be to vary the pieces, first in type, then in numbers, and finally with different numbers on each side. Then introduce the hard stuff: terrain, Titans, and adding the ability to do "intelligent" stuff like killing good pieces by having externally introduced objectives.

Of course, having neither the time nor the knowledge I'm not going to code it, so in the end it's in the hands of whomever is courageous enough to jump into it :-)

Cordially & happy new year everyone,

--
Romain Dolbeau <ro...@do...>
|
From: Clemens K. <lem...@sa...> - 2012-01-08 11:03:42
|
Hello all, that's a very interesting discussion.

I guess we all agree that Titan is slightly more complex than Backgammon. If I understood right, they defined a set of parameters. The AI would always play a whole game with a chosen set, and perhaps adjust, or adjust based on results in some DB. And most of all, there is a clear "good or bad". In Titan, even the evaluation "was that a good outcome or not" might already be tricky (the pure value of remaining creatures, for example, does not tell much).

But can we partition the problem? For example, whether Titan(s) are involved or not would give clearly different main objectives. In that sense, some of those objectives Romain came up with would be rather high-level objectives, imposed from "outside" onto the battle. And have the AI learn for different overall scenarios independently. I.e., as long as we can easily categorize a given situation up front (own Titan or not, enemy Titan or not, annihilation or tactical surgery (kill possible recruiters)), we can choose "the AI for just that purpose". Start with some of those, or combinations of those. Define one or a few example battle situations, and evaluate a "pretty good outcome". So in a real game, if the situation is classified as one of the "trained" (covered) categories, use a table with parameters fine-tuned for it; in other cases go on with a best-guess general AI (or the one we have)? And e.g. start with simple things like "plains" only, or plains and bog and trees (just movement impact).

I bet those researchers did not come up with the perfect set of parameters at once. They chose some, ran 300K... hm, not so good... some others, ... better... Or well, one can choose 100, and the 50 unimportant ones will just not play a role, or what?

But basic idea: partition the problem and train the AI for specific situations first?

BR,
Clemens

-------- Original Message --------
> Date: Sun, 8 Jan 2012 09:46:03 +0100
> From: ro...@do...
> To: ts...@ai..., col...@li...
> Subject: Re: [Colossus-developers] (non-colossus specific) Stanford online course (free) for Artificial Intelligence and Machine Learning.
>
> Tim Sowden <ts...@ai...> wrote:
>
> > They are offering the ML Class again this semester. Even if you don't
> > want to do the programming assignments it's worth listening to the video
> > lectures (2-3 hrs a week) just to see the latest techniques in use.
>
> D*mn, that's a lot of time :-(
>
> I'm going to comment in a way that may look like criticism, but it's
> really just a way of telling that we have been thinking about it for a
> long time, and that there's a gazillion corner cases that makes the
> problem really, really difficult.
>
> I want this to happen. I just strongly believe it's incredibly
> difficult, and that one should set one's expectations reasonably low
> before starting to code anything, otherwise disappointment will ensue.
> See also my comments in my answer to Barrie.
>
> > So back to battles in Colossus. I'd imagine we'd need at least 20
> > parameters and maybe 30. For example what I'd initially consider
> > important is:
> >
> > 1) Value of each unit on both sides
>
> Why not go for the unit itself, complete with all the parameters?
>
> > You want to
> > max the value of your units and min the value of the enemy units.
>
> I disagree already :-) If the opposing Titan is there and yours isn't,
> you want to kill the Titan; other units are of secondary importance. In
> the reverse case, saving your Titan is the prime concern.
>
> And altering the 'combat value' in opposition to the 'point value' may
> be extremely difficult; killing a lone Min is often more important than
> killing a pair of easily replaced accompanying creatures such as Lio,
> and there's a million ways the value can be changed.
>
> > 2) The number of facings you present. Where facings is where an enemy
> > unit can be next to you to attack you or range strike you. This would be
> > a total for all your men, lower = better obviously as it means fewer
> > targets for the enemy to attack.
>
> ... and for you to attack. When you are on a timer, not necessarily a
> good idea. If you have a fresh Ser versus 2 Ogr (unlikely :-), you want
> to be sure near turns 6-7 that both Ogr are in contact...
>
> > 3) Enemy facings. Higher = better since you want to maximize theirs so
> > you get more potential attacks.
>
> If you have a Col and the enemies are Ogr, you don't need that many
> facings...
>
> Also, Dune and others: better to have one clean facing than two Dune
> against a native, in particular if you can't get two pieces to the facings
> anyway.
>
> Very few things are clear-cut in this game with no 'if' or 'but' :-(
>
> Finding the proper parameters is indeed IMHO the second big problem,
> after the 'did I win or did I lose' problem.
>
> > 4) Number of potential damage you can do. This is the number of dice you
> > roll on attack X chance to hit for each unit you have that can attack.
> > Basically your expected damage. You want to maximize this.
> > 5) Potential damage you can take. You want to minimize this.
>
> Those I agree with :-)
>
> Cordially,
>
> --
> Romain Dolbeau <ro...@do...>
|
From: <ro...@do...> - 2012-01-08 08:46:15
|
Tim Sowden <ts...@ai...> wrote:
> They are offering the ML Class again this semester. Even if you don't
> want to do the programming assignments it's worth listening to the video
> lectures (2-3 hrs a week) just to see the latest techniques in use.

D*mn, that's a lot of time :-(

I'm going to comment in a way that may look like criticism, but it's really just a way of telling that we have been thinking about it for a long time, and that there's a gazillion corner cases that makes the problem really, really difficult.

I want this to happen. I just strongly believe it's incredibly difficult, and that one should set one's expectations reasonably low before starting to code anything, otherwise disappointment will ensue. See also my comments in my answer to Barrie.

> So back to battles in Colossus. I'd imagine we'd need at least 20
> parameters and maybe 30. For example what I'd initially consider
> important is:
>
> 1) Value of each unit on both sides

Why not go for the unit itself, complete with all the parameters?

> You want to
> max the value of your units and min the value of the enemy units.

I disagree already :-) If the opposing Titan is there and yours isn't, you want to kill the Titan; other units are of secondary importance. In the reverse case, saving your Titan is the prime concern.

And altering the 'combat value' in opposition to the 'point value' may be extremely difficult; killing a lone Min is often more important than killing a pair of easily replaced accompanying creatures such as Lio, and there's a million ways the value can be changed.

> 2) The number of facings you present. Where facings is where an enemy
> unit can be next to you to attack you or range strike you. This would be
> a total for all your men, lower = better obviously as it means fewer
> targets for the enemy to attack.

... and for you to attack. When you are on a timer, not necessarily a good idea. If you have a fresh Ser versus 2 Ogr (unlikely :-), you want to be sure near turns 6-7 that both Ogr are in contact...

> 3) Enemy facings. Higher = better since you want to maximize theirs so
> you get more potential attacks.

If you have a Col and the enemies are Ogr, you don't need that many facings...

Also, Dune and others: better to have one clean facing than two Dune against a native, in particular if you can't get two pieces to the facings anyway.

Very few things are clear-cut in this game with no 'if' or 'but' :-(

Finding the proper parameters is indeed IMHO the second big problem, after the 'did I win or did I lose' problem.

> 4) Number of potential damage you can do. This is the number of dice you
> roll on attack X chance to hit for each unit you have that can attack.
> Basically your expected damage. You want to maximize this.
> 5) Potential damage you can take. You want to minimize this.

Those I agree with :-)

Cordially,

--
Romain Dolbeau <ro...@do...>
|
From: <ro...@do...> - 2012-01-08 08:46:15
|
Barrie Treloar <bae...@gm...> wrote:
> The winner was given a +1 and the loser a -1 to reinforce the learning
> and a new game played.

... and unfortunately that's where the trouble begins. That kind of AI research assumes you have an identifiable outcome: good, bad, and potentially something in between. That's how reinforcement works. We don't have that yet.

Also, most of it is either AI-solving-problem, or our case: AI-besting-human-or-AI. Unfortunately, in the second case, it's mostly about symmetrical problems - chess, backgammon - all of those start with the same pieces in the same (or mirrored) positions, and on homogeneous terrain. We don't have that, either: even two identical legions in Plains have different entries, 4-wide vs. 3-wide.

Say that a Min-Ogr-Ogr-Ogr attacks Cen-Cen. What is a win for the attacker; for the defender? If Att crushes Def with no loss, it's easy, but what if Att loses the Min? That likely becomes a win for the Def... And sometimes the win/lose status will depend on the outside world.

> You would run a single combat situation, using the same neural network
> to tell you what to do.
> After the combat is resolved, if you win the values are probably ok
> and don't need changing.
> If you lose then the values need to be back-propagated to change the
> hidden units' values.

Which illustrates my point; that was the point of the "Objective" code I added a couple of years ago to ExpAI: first, you need to be able to tell whether you won or lost or something in between, before you can start to learn anything from a battle.

Some more comments in my answer to the e-mail from Tim, and a disclaimer that also applies to this one.

Cordially,

--
Romain Dolbeau <ro...@do...>
|
From: Tim S. <ts...@ai...> - 2012-01-08 04:52:45
|
David,

They are offering the ML Class again this semester. Even if you don't want to do the programming assignments, it's worth listening to the video lectures (2-3 hrs a week) just to see the latest techniques in use.

As for the Titan applicability, it's very applicable. What the article doesn't explicitly state is that a neural network approach is NOT one that does a min/max board expansion several moves deep, so the number of combinations of moves isn't important. Instead, what it does is collect data from hundreds of thousands of ALREADY played battles and evaluate the current position based on what it's seen in the past and the weights it has given to the important parameters. Choosing the important parameters is the hard part.

But let me give you an idea of how it works. The backgammon AI is evaluating 20 parameters at each move. My guess is those parameters are things like the number of points you have, the number your opponent has, the number of stones that aren't on points for you/opponent, the value of the doubling cube, how close you are to finishing the game, how close the opponent is, etc. From that it decides how to move its pieces, not from guessing what might be the next 2 or 3 rolls of the dice.

So back to battles in Colossus. I'd imagine we'd need at least 20 parameters and maybe 30. For example, what I'd initially consider important is:

1) Value of each unit on both sides (so potentially 14 parameters, one per unit, where a Titan is worth a lot and other units are worth far less depending on whether they might recruit in the future, etc.). This value may change over time during the battle (i.e. if you have 2 identical units you can recruit with but lose 1, the one left is weighted more; or units are worth less when damaged and about to die, etc.). You want to max the value of your units and min the value of the enemy units.

2) The number of facings you present, where a facing is anywhere an enemy unit can be next to you to attack you or range strike you. This would be a total for all your men; lower = better, obviously, as it means fewer targets for the enemy to attack.

3) Enemy facings. Higher = better, since you want to maximize theirs so you get more potential attacks.

4) The amount of potential damage you can do. This is the number of dice you roll on attack times the chance to hit, for each unit you have that can attack. Basically your expected damage. You want to maximize this.

5) Potential damage you can take. You want to minimize this.

If we started with that, it would give between 6 and 18 total parameters depending on whether the combat is 1-on-1 up to 7-on-7. Over time we may add more or fewer if some turn out not to be important or we find other very important ones. Doing this means you don't have to worry much about battle terrain and other things, since the above parameters include all that via facings and damage potential.

Then you simply play a LOT of battles (300K worth). The neural net comes up with a value for each of the parameters at each phase (attacker, defender) in the battle. These get saved. The idea is that the algorithm will, over time, slightly re-weight each of those parameters as less/more important as it goes over the 300K battles until it finds an optimal weight for each parameter. Those optimal weights form the final weights for the AI, so that in an actual game the AI needs only a few microseconds to decide the best move in a turn and doesn't need to generate tons of potential moves into the future. It's really fascinating stuff.
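A small sketch of what that parameter list could look like as a fixed-length input vector, in Java. The class, the seven-unit cap and the argument layout are assumptions made for illustration, not the real Colossus API; the learner would consume the returned array at each phase.

#####
// Hypothetical feature vector builder for the parameters listed above.
public final class BattleFeatures
{
    public static final int MAX_UNITS = 7;

    public static double[] build(double[] myUnitValues,    // up to 7 entries
        double[] enemyUnitValues,                          // up to 7 entries
        int myFacings, int enemyFacings,
        double myExpectedDamage, double enemyExpectedDamage)
    {
        double[] features = new double[2 * MAX_UNITS + 4];
        copyPadded(myUnitValues, features, 0);             // 1) own unit values
        copyPadded(enemyUnitValues, features, MAX_UNITS);  // 1) enemy unit values
        features[2 * MAX_UNITS]     = myFacings;           // 2) own facings (minimize)
        features[2 * MAX_UNITS + 1] = enemyFacings;        // 3) enemy facings (maximize)
        features[2 * MAX_UNITS + 2] = myExpectedDamage;    // 4) expected damage dealt
        features[2 * MAX_UNITS + 3] = enemyExpectedDamage; // 5) expected damage taken
        return features;
    }

    private static void copyPadded(double[] src, double[] dst, int offset)
    {
        for (int i = 0; i < MAX_UNITS; i++)
        {
            dst[offset + i] = i < src.length ? src[i] : 0.0; // dead/absent = 0
        }
    }
}
#####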
This concept is literally being used all over the place now on the web to recommend stuff for you to buy (Amazon and similar purchases concept it has) in what's called a recommender system. Tim > On Sun, Jan 8, 2012 at 7:29 AM, David Ripton<dr...@ri...> wrote: >> This is an interesting article, but it doesn't go into enough detail for >> me to understand how to actually implement the neural networks. Guess I >> have to take the class. > The ML class does have enough detail. > Its a little thin, but I believe I have enough knowledge to give it a > crack - its just on a long list of things TODO... > >>> If you glance at it you'll see that it uses neural networks (something >>> covered in detail in the machine learning course including coding >>> examples) and that it took on order of 300,000 games (self playing) to >>> improve to world class. More games improved it slightly not the major >>> jump was those first 300,000 games (hence my comment about collecting a >>> LOT of data). >>> >>> In Colossus case, you don't really need to play complete games as much >>> as run combats to improve the combat part. >> How practical do you think this is for Titan? >> >> The article says that a well-played backgammon game is about 50-60 moves >> long, and each move means 21 possible dice rolls and about 20 legal >> moves per roll, so about 20 moves to consider for the current known >> roll, and 420 moves to consider for each subsequent unknown roll. >> >> A Titan battle is no more than 7 turns per player, but each move can >> have a huge branching factor. For example, a defending legion with 7 >> different 4-skill creatures entering the plains has over 160 million >> possible moves. You have to narrow that down dramatically to make the >> AI fast enough, but I'm not sure whether that narrowing invalidates the >> rest of the neural network idea. > Yes but they are all "kind of the same". > And there is nothing that says the games have to include a human. > The backgammon example used the same AI to play both sides. > The winner was given a +1 and the loser a -1 to reinforce the learning > and a new game played. > > The hard part would be to work out the feature set to use as the input vector. > It would have to include stuff like attacking units, defending units, > battle map positions, battle map terrain, etc. > Obviously some stuff is linked like units and positions so you need to > make sure they get encoded together. > > The example used of image detection takes a 80x80 grey scale image and > converts it into a single dimensional array. > > Thinking of the top of my head, we would use something like > [ > ATTACKING_UNITS, > DEFENDING_UNITS, > BATTLE_MAP_TERRAIN > ] > Where each of these is a single dimensional array that represents a > position on the battle map, including the starting position of being > outside the battle map. > > The output of this might be (and this is hazy since most of the > examples I remember were classification rather than regression) [ > FROM, TO ] where these are both battle map terrain positions. I'm not > sure whether including a boolean to indicate move or attack is > worthwhile as it should be obvious if TO already has a unit in it. > > You would run a single combat situation, using the same neural network > to tell you what to do. > After the combat is resolved, if you win the values are probably ok > and don't need changing. > If you lose then the values need to be back-propogated to change the > hidden units values. 
> The hard part here is how to do this back-propagation, since the class
> needs to know the "correct" answer and in this case we don't know that.
>
> Selecting the right type of AI algorithm is also part of the challenge.
> Should we be using some sort of search strategy like A*, reinforcement
> learning, or neural networks? Half the fun might be giving each a crack
> and seeing what can be done.
>
>>> Then once that's improved
>>> you'd improve the playing board part of the AI in a similar manner
>>> (moving, splitting, recruiting etc).
>>
>> I think the only part of the masterboard AI that's hard is predicting
>> the results of battles. Given that, the simple heuristics that we've
>> had for a decade are probably good enough.
> Probably :)
> But there are lots of edge cases, like letting a unit die so you have
> room to summon, or focusing on killing so the opponent can no longer
> recruit on that tree again.
>
> And having a simulation of the battle outcome might also help in the
> surrender negotiations.
|
|
From: Barrie T. <bae...@gm...> - 2012-01-07 23:29:19
|
On Sun, Jan 8, 2012 at 7:29 AM, David Ripton <dr...@ri...> wrote:
> This is an interesting article, but it doesn't go into enough detail for
> me to understand how to actually implement the neural networks. Guess I
> have to take the class.

The ML class does have enough detail.
It's a little thin, but I believe I have enough knowledge to give it a crack - it's just on a long list of things TODO...

>> If you glance at it you'll see that it uses neural networks (something
>> covered in detail in the machine learning course including coding
>> examples) and that it took on the order of 300,000 games (self playing) to
>> improve to world class. More games improved it slightly, but the major
>> jump was those first 300,000 games (hence my comment about collecting a
>> LOT of data).
>>
>> In Colossus' case, you don't really need to play complete games as much
>> as run combats to improve the combat part.
>
> How practical do you think this is for Titan?
>
> The article says that a well-played backgammon game is about 50-60 moves
> long, and each move means 21 possible dice rolls and about 20 legal
> moves per roll, so about 20 moves to consider for the current known
> roll, and 420 moves to consider for each subsequent unknown roll.
>
> A Titan battle is no more than 7 turns per player, but each move can
> have a huge branching factor. For example, a defending legion with 7
> different 4-skill creatures entering the plains has over 160 million
> possible moves. You have to narrow that down dramatically to make the
> AI fast enough, but I'm not sure whether that narrowing invalidates the
> rest of the neural network idea.

Yes, but they are all "kind of the same".
And there is nothing that says the games have to include a human.
The backgammon example used the same AI to play both sides.
The winner was given a +1 and the loser a -1 to reinforce the learning, and a new game played.

The hard part would be to work out the feature set to use as the input vector.
It would have to include stuff like attacking units, defending units, battle map positions, battle map terrain, etc.
Obviously some stuff is linked, like units and positions, so you need to make sure they get encoded together.

The image-detection example takes an 80x80 grey-scale image and converts it into a single-dimensional array.

Thinking off the top of my head, we would use something like
[
ATTACKING_UNITS,
DEFENDING_UNITS,
BATTLE_MAP_TERRAIN
]
where each of these is a single-dimensional array that represents a position on the battle map, including the starting position of being outside the battle map.

The output of this might be (and this is hazy since most of the examples I remember were classification rather than regression) [ FROM, TO ], where these are both battle map positions. I'm not sure whether including a boolean to indicate move or attack is worthwhile, as it should be obvious if TO already has a unit in it.

You would run a single combat situation, using the same neural network to tell you what to do.
After the combat is resolved, if you win the values are probably OK and don't need changing.
If you lose then the values need to be back-propagated to change the hidden units' values.

The hard part here is how to do this back-propagation, since the class needs to know the "correct" answer and in this case we don't know that.

Selecting the right type of AI algorithm is also part of the challenge.
Should we be using some sort of search strategy like A*, reinforcement learning, or neural networks?
Half the fun might be giving each a crack and seeing what can be done.

>> Then once that's improved
>> you'd improve the playing board part of the AI in a similar manner
>> (moving, splitting, recruiting etc).
>
> I think the only part of the masterboard AI that's hard is predicting
> the results of battles. Given that, the simple heuristics that we've
> had for a decade are probably good enough.

Probably :)
But there are lots of edge cases, like letting a unit die so you have room to summon, or focusing on killing so the opponent can no longer recruit on that tree again.

And having a simulation of the battle outcome might also help in the surrender negotiations.
|
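To make the flat input vector idea above a bit more concrete, here is one possible encoding - only a sketch, since the hex count, the creature-id scheme and the array layout are all guesses rather than the real Colossus data model:

// Rough sketch of the [ATTACKING_UNITS, DEFENDING_UNITS, BATTLE_MAP_TERRAIN]
// encoding suggested above, flattened into one array a network could take as input.
public class BattleEncoder {

    // Assumed 27 battle hexes plus one pseudo-hex for "still off the board / entering";
    // adjust to the real map size.
    static final int HEXES = 28;

    /**
     * inputs[0..27]  : id of the attacking creature on each hex (0 = none)
     * inputs[28..55] : id of the defending creature on each hex (0 = none)
     * inputs[56..83] : terrain code of each hex (plain, bramble, drift, ...)
     */
    static double[] encode(int[] attackerByHex, int[] defenderByHex, int[] terrainByHex) {
        double[] inputs = new double[3 * HEXES];
        for (int h = 0; h < HEXES; h++) {
            inputs[h]             = attackerByHex[h];
            inputs[HEXES + h]     = defenderByHex[h];
            inputs[2 * HEXES + h] = terrainByHex[h];
        }
        return inputs;
    }

    public static void main(String[] args) {
        int[] att = new int[HEXES], def = new int[HEXES], ter = new int[HEXES];
        att[27] = 5;   // one attacking creature still off-board (the entry pseudo-hex)
        def[10] = 3;   // a defender sitting on hex 10
        ter[10] = 2;   // hex 10 is, say, bramble
        System.out.println("input vector length = " + encode(att, def, ter).length);
    }
}

In practice something like a one-hot encoding (one input per creature type per hex) would probably train better than raw id numbers, which is roughly what the backgammon network did with its per-point checker counts.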
|
From: David R. <dr...@ri...> - 2012-01-07 21:44:16
|
On 01/07/12 13:26, Tim Sowden wrote:
> Here's a link to one of the games discussed in the class. It's a famous
> example of one of the first learning algorithms ever applied to improve
> an AI - in this case for Backgammon. The 'hidden units' represent
> important ideas you want the AI to learn, stuff like initial placement
> of men, using range strikers and magic missile units effectively,
> reducing facing, etc.
>
> http://www.research.ibm.com/massive/tdl.html

This is an interesting article, but it doesn't go into enough detail for me to understand how to actually implement the neural networks. Guess I have to take the class.

> If you glance at it you'll see that it uses neural networks (something
> covered in detail in the machine learning course including coding
> examples) and that it took on the order of 300,000 games (self playing) to
> improve to world class. More games improved it slightly, but the major
> jump was those first 300,000 games (hence my comment about collecting a
> LOT of data).
>
> In Colossus' case, you don't really need to play complete games as much
> as run combats to improve the combat part.

How practical do you think this is for Titan?

The article says that a well-played backgammon game is about 50-60 moves long, and each move means 21 possible dice rolls and about 20 legal moves per roll, so about 20 moves to consider for the current known roll, and 420 moves to consider for each subsequent unknown roll.

A Titan battle is no more than 7 turns per player, but each move can have a huge branching factor. For example, a defending legion with 7 different 4-skill creatures entering the plains has over 160 million possible moves. You have to narrow that down dramatically to make the AI fast enough, but I'm not sure whether that narrowing invalidates the rest of the neural network idea.

> Then once that's improved
> you'd improve the playing board part of the AI in a similar manner
> (moving, splitting, recruiting etc).

I think the only part of the masterboard AI that's hard is predicting the results of battles. Given that, the simple heuristics that we've had for a decade are probably good enough.

--
David Ripton
dr...@ri...
|
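As a rough sanity check on that 160-million figure: if each of the 7 creatures can reach somewhere around 15 hexes on an open map, treating them independently gives about 15^7, roughly 170 million placements. So the order of magnitude looks right, with the exact count depending on the reachable-hex estimate and on forbidding two creatures from sharing a hex.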
|
From: Tim S. <ts...@ai...> - 2012-01-07 18:27:20
|
Romain,

> (Un)fortunately, over the years, I've become convinced programming is
> not the problem with the AIs - I believe we don't even know what we want
> to do/can do. When we do, lots of people will be willing to jump in to
> do the coding bit.
>
> Hopefully those courses will get us closer :-)
>
> I would love to hear the ideas of someone who has some fresh up-to-date
> knowledge on the subject.

I would agree with you. All the basic parts seem to be in place now in the game; it just needs refinement of what's there.

Here's a link to one of the games discussed in the class. It's a famous example of one of the first learning algorithms ever applied to improve an AI - in this case for Backgammon. The 'hidden units' represent important ideas you want the AI to learn, stuff like initial placement of men, using range strikers and magic missile units effectively, reducing facing, etc.

http://www.research.ibm.com/massive/tdl.html

If you glance at it you'll see that it uses neural networks (something covered in detail in the machine learning course including coding examples) and that it took on the order of 300,000 games (self playing) to improve to world class. More games improved it slightly, but the major jump was those first 300,000 games (hence my comment about collecting a LOT of data).

In Colossus' case, you don't really need to play complete games as much as run combats to improve the combat part. Then once that's improved you'd improve the playing board part of the AI in a similar manner (moving, splitting, recruiting etc).

> famous last words :-) Some off-list words on the subject were collected
> here: <https://sourceforge.net/apps/trac/colossus/wiki/AiThoughts>,
> plus obviously the mailing list archive on SF. The subject of AIs has
> been on the table for nearly 10 years now.

I checked out the link. Many of the things discussed there would be the hidden units of the neural network.

Tim
|
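For anyone curious what the "winner gets +1, loser gets -1" learning from that article looks like in code, here is a very small TD(0) sketch with a linear value function standing in for the network. It is purely illustrative - none of it comes from Colossus, and the real TD-Gammon setup used TD(lambda) with eligibility traces and a multi-layer net:

// Tiny TD(0) learner with a linear value function.  Reward is 0 for every
// non-final move and +1 / -1 when the battle ends; the discount factor is 1.
public class TdLearner {

    private final double[] weights;
    private final double alpha;   // learning rate

    public TdLearner(int nFeatures, double alpha) {
        this.weights = new double[nFeatures];
        this.alpha = alpha;
    }

    /** V(s) = w . f(s): the predicted final result, roughly +1 (win) to -1 (loss). */
    public double value(double[] features) {
        double v = 0.0;
        for (int i = 0; i < features.length; i++) {
            v += weights[i] * features[i];
        }
        return v;
    }

    /** One update after a move from state s to sNext (sNext is ignored on the terminal step). */
    public void update(double[] s, double[] sNext, double reward, boolean terminal) {
        double target = terminal ? reward : value(sNext);
        double error = target - value(s);           // the temporal-difference error
        for (int i = 0; i < s.length; i++) {
            weights[i] += alpha * error * s[i];      // nudge weights toward the target
        }
    }

    public static void main(String[] args) {
        TdLearner learner = new TdLearner(6, 0.01);
        double[] before = { 120, 90, 4, 7, 3.3, 2.1 };   // feature vector before a move
        double[] after  = { 120, 75, 3, 8, 3.5, 1.8 };   // feature vector after the move
        learner.update(before, after, 0.0, false);       // ordinary mid-battle move
        learner.update(after, after, +1.0, true);        // terminal step: we won this battle
        System.out.println("V(before) is now " + learner.value(before));
    }
}

A full training run would repeat this over a few hundred thousand AI-vs-AI battles, calling update() after every move. That is also the answer to the back-propagation-without-a-known-answer problem raised earlier in the thread: the "correct" value for each position is simply taken to be the value of the position that followed it, plus the final win/loss reward.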