Week 6: Finding Shelter
- Abhijit Baruah
- Jul 5, 2022
- 2 min read
This week I set out to teach the bot to take shelter during certain events. I came up with the concept of an animal invasion that would occur at random in the world. During the event, the agent had a fixed amount of time to reach the shelter and had to remain there until the event finished. If it failed to reach the shelter in time, or left before the event was over, it died and the episode reset.
This proved a harder task than I anticipated. At first I added a positive reward for every frame the agent spent in the shelter during an animal invasion, and added the shelter's position to its observations.
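In reward-shaping terms, this first attempt looks roughly like the sketch below. The function name and reward magnitude are my own placeholders, not values from the actual project (in Unity this logic would live in the agent's C# script):

```python
SHELTER_BONUS = 0.01  # assumed magnitude; the post doesn't state the actual value

def frame_reward(in_shelter: bool, invasion_active: bool) -> float:
    """First attempt: a small bonus for every frame spent
    in the shelter while an invasion is running."""
    if invasion_active and in_shelter:
        return SHELTER_BONUS
    return 0.0
```

Summed over frames, this makes sheltering during an invasion the highest-value behavior, but it says nothing about what the agent should do once inside.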
After training the model for over 3 hours, I observed that it hit a plateau: the agent would collect food, deposit it, and enter the shelter during the invasion, but would immediately exit, ending its run.
To counter this I added a negative reward for every frame the agent moved while inside the shelter during the invasion, giving it an incentive to stop once it found itself in the shelter.
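The second shaping term can be sketched the same way; again, the names and the penalty magnitude are illustrative assumptions:

```python
MOVE_PENALTY = 0.005  # assumed magnitude, not the project's actual value

def movement_penalty(in_shelter: bool, invasion_active: bool, speed: float) -> float:
    """Second attempt: penalize motion inside the shelter during an
    invasion, nudging the agent to stand still once it is safe."""
    if invasion_active and in_shelter and speed > 0.0:
        return -MOVE_PENALTY
    return 0.0
```

Note that this penalty only fires during the invasion, which is exactly why it cannot stop the agent from loitering in the shelter afterwards.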
This led to another problem: the agent would now remain in the shelter for long periods even after the invasion ended, diminishing its health and its subsequent chances of surviving.
To solve this I tried adding a negative reward every 20 seconds the agent spent in the shelter while there was no animal invasion. This also proved futile: the agent was now simply clueless about whether it should stay or move.
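This third term needs a timer, since the penalty recurs every 20 seconds rather than every frame. A minimal sketch, with an assumed penalty magnitude (only the 20-second interval comes from the post):

```python
LOITER_PENALTY = 0.1     # assumed magnitude
LOITER_INTERVAL = 20.0   # seconds, as described in the post

class LoiterTimer:
    """Third attempt: charge a recurring penalty for every 20 s the
    agent idles in the shelter while no invasion is running."""

    def __init__(self) -> None:
        self.elapsed = 0.0

    def update(self, dt: float, in_shelter: bool, invasion_active: bool) -> float:
        if in_shelter and not invasion_active:
            self.elapsed += dt
            if self.elapsed >= LOITER_INTERVAL:
                self.elapsed = 0.0  # restart the 20 s window
                return -LOITER_PENALTY
        else:
            self.elapsed = 0.0  # timer resets once the agent leaves or an invasion starts
        return 0.0
```

With three hand-tuned terms pulling in different directions, the shaped reward had become ambiguous, which is consistent with the confused behavior described above.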
Frustrated, I did some more research and found that I would need additional training methods on top of reinforcement learning. The chances of the agent's random exploration leading it to stop in the shelter at exactly the right time, stay until the invasion was done, and then go back to collecting food were very, very slim. To improve training, next week I will try implementing imitation learning using GAIL and behavioral cloning on top of RL to see if it yields better results. (sigh)
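In ML-Agents, GAIL and behavioral cloning are enabled through the trainer configuration file alongside the usual PPO settings. A sketch of what next week's config might look like; the behavior name, demo path, and strength values are placeholders, not the project's actual settings:

```yaml
behaviors:
  ForagerAgent:            # hypothetical behavior name
    trainer_type: ppo
    reward_signals:
      extrinsic:
        strength: 1.0
        gamma: 0.99
      gail:                # adversarial imitation from recorded demos
        strength: 0.5
        demo_path: Demos/ShelterDemo.demo   # placeholder path
    behavioral_cloning:    # directly clone the demonstrated actions early on
      demo_path: Demos/ShelterDemo.demo
      strength: 0.5
```

Both signals consume a `.demo` file recorded by playing the agent manually, so the first step will be recording demonstrations of a successful shelter run.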
For more information about GAIL, behavioral cloning in ML-Agents, and imitation learning:

