Machine Learning AI Part 2


Well, it's been a long time coming, but I've finally pulled my socks up and written a part 2 to my machine learning post. Firstly, I'd like to address the video attached to this post - chiefly, that it shows the MLAgents AI within Unity learning to be as quick as I am... in less than three hours.

Yup. That's pretty cool, right? As a game designer, I have spent weeks designing classical AI implementations to solve problems far less complex than the ones these race-driver-spaceship-pilot AI guys have to solve, all whilst achieving much less convincing results. So, what's the magic sauce? Well, it wasn't as simple as just training these guys for a few hours... I might have guiltily misled you, oh gullible reader, for it took more like three weeks to work out an environment and a set of hyperparameters that worked consistently. That said, once I'd done that, it really did take just three hours to build an AI that could race around as fast as, or faster than, I can.

The key principle is to approach the problem in a similar way to how one would structure a classical AI: think of all of the problems that the AI will have to solve. In this case, I want my AI pilots to be able to fly around without touching the walls (much), without hitting each other (much), and as fast as, or faster than, an average human player. I can do all of these things (go me!) because I have eyes with which I can see the walls, players and AI bots, and I understand that I need to race in the correct direction around the track and cross the finish line as quickly as I possibly can. Fortunately, we can simulate eyes for the AI bots by giving them "Raycasters" (built into the MLAgents toolkit - these scripts fire a number of raycasts out from a chosen point on your gameObject, in a circular or semi-circular pattern). These raycasts return information about the objects they intersect, so, if we give the walls and other AI bots appropriate tags, the AI can "see" those objects every time a raycast hits one. Cool! But you can't just give the AI that information and hope it learns what to do with it...
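For the curious, here's roughly what that looks like in code. This is a minimal sketch only - in practice you'd configure the raycaster component in the Inspector, and the tag names here are my placeholders, not anything from the actual project:

```csharp
using System.Collections.Generic;
using UnityEngine;
using Unity.MLAgents.Sensors;

// A minimal sketch of wiring up an ML-Agents ray perception sensor in code.
// Tag names ("wall", "checkpoint", "agent") are illustrative placeholders.
public class PilotSensorSetup : MonoBehaviour
{
    void Awake()
    {
        var rays = gameObject.AddComponent<RayPerceptionSensorComponent3D>();
        rays.SensorName = "PilotRays";
        rays.RaysPerDirection = 5;   // 5 rays each side + 1 forward = 11 total
        rays.MaxRayDegrees = 90f;    // semi-circular fan in front of the ship
        rays.RayLength = 30f;
        // Only objects carrying these tags are reported back to the agent;
        // everything else is invisible to it.
        rays.DetectableTags = new List<string> { "wall", "checkpoint", "agent" };
    }
}
```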

Intrinsically, the AI has no knowledge of what a race even is, let alone how to participate in one. Fortunately, we can show them in several ways. First of all, I created numerous checkpoint objects around the circuit and allowed the AI to see them with its raycasts, as well as feeding the AI vector information about the alignment of these checkpoints - i.e. which way is forward. This brings us to the kind-of tricky part, which is designing the reward/punishment scheme for the bots. I started by giving them a +1 reward for hitting the correct checkpoint and a -1 reward (punishment) for hitting the wrong one. This should teach them to go only in the correct direction. I gave them a -0.5 reward for hitting a wall or a player/AI, and a -0.01 reward for each frame that they remain in contact with a wall. With this, I was ready to go!
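Here's a rough sketch of what that scheme might look like as an ML-Agents Agent script. The tag names and the little Checkpoint helper are my own illustrative assumptions, not the actual project code:

```csharp
using UnityEngine;
using Unity.MLAgents;
using Unity.MLAgents.Sensors;

// Hypothetical helper: a trigger placed around the circuit, numbered in lap order.
public class Checkpoint : MonoBehaviour
{
    public int Index;
}

// A sketch of the reward scheme described above, assuming checkpoints are
// trigger colliders and walls/other ships are solid colliders.
public class PilotAgent : Agent
{
    public Checkpoint[] Checkpoints;   // assigned in the Inspector, in lap order
    int m_Next;                        // index of the checkpoint we expect next

    public override void CollectObservations(VectorSensor sensor)
    {
        // "Which way is forward": the next checkpoint's facing, in our local frame.
        sensor.AddObservation(
            transform.InverseTransformDirection(Checkpoints[m_Next].transform.forward));
    }

    void OnTriggerEnter(Collider other)
    {
        var cp = other.GetComponent<Checkpoint>();
        if (cp == null) return;
        if (cp.Index == m_Next)
        {
            AddReward(+1f);                               // correct checkpoint
            m_Next = (m_Next + 1) % Checkpoints.Length;
        }
        else
        {
            AddReward(-1f);                               // wrong way!
        }
    }

    void OnCollisionEnter(Collision c)
    {
        if (c.gameObject.CompareTag("wall") || c.gameObject.CompareTag("agent"))
            AddReward(-0.5f);                             // hit a wall or another pilot
    }

    void OnCollisionStay(Collision c)
    {
        if (c.gameObject.CompareTag("wall"))
            AddReward(-0.01f);                            // scraping a wall, every frame
    }
}
```

The per-frame trickle matters more than it looks: a one-off bump is easy to shrug off, but a cost that accrues every frame makes hugging walls a consistently losing strategy.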

They learned absolutely nothing. For hours I watched, my spirit slowly waning, as my brave young pilots repeatedly headbutted walls, seemingly masochistic in their pursuit of punishment, until finally they came to rest in a state of absolute stillness - perhaps afraid to continue receiving punishment, perhaps satisfied that they had learned to simply stay out of trouble and avoid ambition. I felt like an abusive father who had crushed the hopes and dreams of his children. These guys were going to need some help.

Help was on its way, in due course, for it didn't take too much reading to find that the best way to teach these idiot children of mine to do something useful for once was simply to show them how. Oh! I can do that? Yes, I could. The MLAgents toolkit allows you to record a demonstration for the AI and include it as part of the learning process. I proceeded to drive many mindless laps of the training environment before being satisfied that only a terminally stupid machine could fail to glean what it was meant to do. Time to hit the learn button.
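Mechanically, recording boils down to attaching a DemonstrationRecorder to the agent and driving. Something like this sketch (the name and directory are placeholders); the saved .demo file can then be referenced from the trainer's YAML config so training can imitate it:

```csharp
using UnityEngine;
using Unity.MLAgents.Demonstrations;

// A sketch of recording human demonstrations with ML-Agents. Normally you'd
// add the DemonstrationRecorder in the Inspector; values here are placeholders.
public class DemoSetup : MonoBehaviour
{
    void Awake()
    {
        var rec = gameObject.AddComponent<DemonstrationRecorder>();
        rec.Record = true;                           // start capturing my laps
        rec.DemonstrationName = "PilotLaps";         // placeholder file name
        rec.DemonstrationDirectory = "Assets/Demos"; // placeholder path
    }
}
```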

They did... better? Sometimes, they still liked to headbutt walls for minutes on end but sometimes, in glorious moments of near-sentience, they would complete a lap. Often, they would complete another lap before falling back into their favourite pastime of smashing into things and accruing as much punishment as they could. So what was I missing? What could I not see? Bingo! That was it - I wasn't the only blind one - my children couldn't see past the checkpoints! I gleefully explained to my partner that I had accidentally blinded my children and she congratulated me on working it out. Picture this: an AI agent fires multiple rays to gather information about what is around it. These rays intersect walls, other AI and checkpoints. When a ray hits one of these things, it terminates. Therefore, if an AI agent can see a checkpoint in front of it, it can't see whether there is a wall on the other side. Solution? More eyes! Instead of grotesque, headbanging cyclopses, my children needed two eyes: one for seeing checkpoints & AI, and one that only sees walls.
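In code, the "two eyes" fix might look something like the sketch below: two separate ray sensors with different detectable tags. The layer mask is my assumption (it presumes walls live on their own layer) so that the wall-eye's rays pass straight through checkpoints instead of being physically blocked by them:

```csharp
using System.Collections.Generic;
using UnityEngine;
using Unity.MLAgents.Sensors;

// Sketch of the "two eyes" fix: a checkpoint in front of the ship can no
// longer hide the wall behind it. Tag and layer names are placeholders.
public class TwoEyesSetup : MonoBehaviour
{
    void Awake()
    {
        // Eye #1: sees checkpoints and other pilots only.
        var raceEye = gameObject.AddComponent<RayPerceptionSensorComponent3D>();
        raceEye.SensorName = "RaceEye";   // sensor names must be unique per agent
        raceEye.DetectableTags = new List<string> { "checkpoint", "agent" };

        // Eye #2: sees walls only. The layer mask keeps its rays from being
        // stopped by checkpoint colliders along the way.
        var wallEye = gameObject.AddComponent<RayPerceptionSensorComponent3D>();
        wallEye.SensorName = "WallEye";
        wallEye.DetectableTags = new List<string> { "wall" };
        wallEye.RayLayerMask = LayerMask.GetMask("Walls"); // assumed "Walls" layer
    }
}
```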

By the way, although this is a short story, I believe I was firmly into week two before I worked out the need for multiple sets of raycasters, so believe me when I say that by this time I had actually begun to consider these thick-as-pig-s**t AI idiots as my children. I only wanted what was best for them, so I dialled back the punishment a touch, gave them their new wall-seeing eyes and busily recorded newer, more in-depth demonstrations before reading them a bedtime story and tucking them in for the night. Tomorrow, they would start big school, and I wanted nothing more than for them to be less fundamentally idiotic than they had thus far demonstrated themselves (and me, I guess) to be.

IT WORKED! Almost immediately, the improvements were there: active wall avoidance, iterating on the fastest line around the circuit, avoiding crashes and racing side-by-side! Wow! But why are they driving in reverse now? OK, a small punishment for using reverse and... go. Better again, but they were too cautious - so what about a small, trickling punishment applied every frame, encouraging them to reach the next checkpoint faster in order to offset it? Much like my home life as a child (tinyViolin.gif). It was really working now - every few hundred thousand "steps" (learning cycles), I was able to pause and export an updated brain. We had ok-ish brains, good brains, great brains and super-human brains - enough to build several difficulty modes to challenge all sorts of players.
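Sketched in code, those last tweaks might look something like this - assuming a single continuous throttle action where negative values mean reverse, and that the agent has a MaxStep set. The magnitudes are illustrative, not the project's actual values:

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Actuators;

// A sketch of the final reward tweaks. Movement code is omitted; this only
// shows the reward side of OnActionReceived.
public class PilotAgentRewards : Agent
{
    public override void OnActionReceived(ActionBuffers actions)
    {
        float throttle = actions.ContinuousActions[0];

        if (throttle < 0f)
            AddReward(-0.005f);   // discourage (but don't forbid) reversing

        // Trickling "get on with it" penalty: every step costs a little, so
        // reaching the next checkpoint sooner is worth more overall.
        // Assumes MaxStep > 0, i.e. episodes have a fixed maximum length.
        AddReward(-1f / MaxStep);
    }
}
```

A nice property of the -1/MaxStep pattern is that the total existential penalty sums to roughly -1 over a full episode, so it stays in proportion to the +1 checkpoint rewards rather than drowning them out.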

I had done it. Parenthood +1 to my imaginary RPG stats, and a huge sigh of relief that I could finally move on to another part of the project. I am seriously looking forward to uploading a new version of the game with these AI agents included and I hope that some of you find the time to give me some feedback regarding the way it plays. Thanks to anyone who has read this far and if you have, maybe you'd be so kind as to go just one step further and follow my dev journey on one of the social media sites linked on our website: www.spudcannongames.com

I promise to write again soon!

Bumbazoid
