
Week 7 & 8 - Finding Shelter(Part Two)

  • Writer: Abhijit Baruah
  • Jul 19, 2022
  • 2 min read

The issues mentioned in the week 6 blog post proved to be a very tough problem to solve.

The main issue was teaching the ML agent to essentially perform "two tasks" at the same time, which is an ongoing topic of discussion in the ML-Agents community.

One possible solution was mentioned in this thread :-

Personally, I did not want to implement either solution, as the main goal of this project was to train ONE brain to do multiple things to survive, as opposed to two different brains on the same agent.


I spent the majority of week 7 using imitation learning AND behavioral cloning on top of the extrinsic reinforcement learning reward. Doing this surfaced a weird problem wherein the agents would learn what to do very quickly, then regress over time, completely failing to get better. This was also an issue reported on the ML-Agents forum :-


During week 8, I first focused on reducing my bloated observation space. To accomplish this, I first removed the zone system mentioned in week 3 :-

I decided to pad the observation vector with 0 when the world had no food, and tweaked the model to ignore the observations in that vector if they were 0. This dropped my observation space from 62 to 22.
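The padding idea can be sketched roughly as follows. This is an illustrative Python sketch, not the project's actual (C#) agent code; the food count, observation layout, and function name are all assumptions:

```python
# Illustrative sketch of zero-padding a fixed-size observation vector.
# Assumes the world holds at most MAX_FOOD food items, each observed
# as an (x, z) position. All names here are hypothetical.
MAX_FOOD = 5
OBS_PER_FOOD = 2  # x and z per food item

def food_observations(food_positions):
    """Return a fixed-length observation list, zero-padded when
    fewer than MAX_FOOD food items exist in the world."""
    obs = []
    for x, z in food_positions[:MAX_FOOD]:
        obs.extend([x, z])
    # Pad the remaining slots with 0.0 so the vector length never changes,
    # which keeps the network's input size constant across episodes.
    obs.extend([0.0] * (MAX_FOOD * OBS_PER_FOOD - len(obs)))
    return obs
```

The key point is that the network always receives the same input size, and the zeroed slots carry no signal for it to act on.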

Additionally, I converted a lot of my observations to ratios between 0 and 1, rather than bools that would only ever be 0 or 1.

This enabled my agent to "sense" the probability of the environment changing and react accordingly.
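As a rough sketch of the idea, a boolean "event active" observation can be replaced with a ratio of how close the event is. The timer names below are assumptions for illustration, not the project's actual variables:

```python
# Illustrative sketch: encode an upcoming event as a 0..1 ratio
# instead of a bare 0/1 bool, so the agent can sense the event
# approaching and react before it starts.
def event_progress(time_until_event, warning_duration):
    """Return 0.0 when the event is far away, rising to 1.0 as it arrives."""
    if time_until_event >= warning_duration:
        return 0.0
    return 1.0 - (time_until_event / warning_duration)
```

A bool only tells the agent the event has already started; the ratio gives it a gradient to act on while there is still time to reach shelter.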

Finally, I changed my model parameters to use ONLY imitation learning on top of the extrinsic reinforcement learning reward.
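In ML-Agents terms, this kind of setup is expressed in the trainer configuration as reward signals. A rough sketch of the shape such a config takes is below; the behavior name, demo path, and hyperparameter values are illustrative, not the project's actual settings:

```yaml
behaviors:
  SurvivalAgent:            # behavior name is an assumption
    trainer_type: ppo
    reward_signals:
      extrinsic:            # the environment's own survival reward
        gamma: 0.99
        strength: 1.0
      gail:                 # imitation learning from recorded demos
        gamma: 0.99
        strength: 0.5
        demo_path: Demos/ShelterDemo.demo   # hypothetical demo file
```

Dropping the separate behavioral cloning block and keeping only the GAIL-style imitation signal alongside the extrinsic reward matches the "one brain" goal described above.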

This resulted in the impossible: the model now trained and slowly converged to the desired result over 14+ hours of training (sigh). :-






The yellow cube is the shelter, where the agent has to spend some amount of time during certain events in the environment. In this demonstration, the event is indicated by the text "ANIMAL ARRIVING". If the agent fails to reach the shelter in time, or leaves the shelter before the event is finished, it fails to survive.




The images above show that training peaked somewhere around 18M iterations, which correlates with the reduction in the imitation learning loss.


For the next week, I am going to focus on adding more visual polish to the environment events that require taking shelter!


