IBM Torcs AI racing competition
Hello readers!
My friend Patryk and I are taking part in the 2026 edition of the IBM AI Racing Competition.
The goal is to develop an AI agent that can drive a car around the "Corkscrew" track in the Torcs simulator and complete a full lap.
For our solution, we trained a TD3 (Twin Delayed Deep Deterministic Policy Gradient) reinforcement learning agent to drive the car.
Reinforcement learning agents learn to control a system through a point system. In a nutshell, they receive positive points (a reward) for doing something correctly and negative points (a punishment) for doing something incorrectly, and they learn to behave in a way that maximizes the total reward. Of course, they have no understanding of the actual purpose, so we have to program the constraints and the point system ourselves. This is challenging, because agents like to find workarounds that earn free points or stop them from losing points. If the reward system is designed badly, the neural network may end up learning to fail quickly or to never do anything, for example.
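To make the idea of a point system more concrete, here is a minimal sketch of a per-step reward function. It is an illustration only, not the reward we actually trained with: the function name, the inputs (`forward_speed`, `track_offset`, `off_track`) and all the constants are assumptions chosen for the example.

```python
# Hypothetical constants, picked only for illustration.
OFF_TRACK_PENALTY = -100.0   # punishment for leaving the track
IDLE_PENALTY = -1.0          # small punishment for every step spent not moving
MIN_PROGRESS_SPEED = 1.0     # m/s, below this the car counts as standing still


def step_reward(forward_speed: float, track_offset: float, off_track: bool) -> float:
    """Return the reward for a single simulation step.

    forward_speed: speed along the track direction, in m/s (assumed input)
    track_offset:  normalised distance from the track centre,
                   0.0 = centre, +/-1.0 = track edge (assumed input)
    off_track:     True if the car has left the track
    """
    if off_track:
        return OFF_TRACK_PENALTY
    if forward_speed < MIN_PROGRESS_SPEED:
        # Without this, "never do anything" would be a safe way to avoid punishment.
        return IDLE_PENALTY
    # Reward progress, scaled down the further the car is from the centre line.
    centering = 1.0 - min(abs(track_offset), 1.0)
    return forward_speed * centering


print(step_reward(10.0, 0.0, False))   # fast and centred
print(step_reward(10.0, 0.5, False))   # fast but halfway to the edge
print(step_reward(0.0, 0.0, False))    # standing still
print(step_reward(10.0, 0.0, True))    # off the track
```

The relative sizes of the penalties matter a lot. In this sketch, for example, standing still for more than 100 steps costs more than crashing once, so an agent could still decide that failing quickly is the cheaper option.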
First, we pre-trained our network to drive the whole circuit from start to finish without going off-track, with the maximum speed capped at a low value. After many hours of trial and error, the TD3 agent learned to finish a lap.
Since it was driving very chaotically, the next fine-tuning session focused on stability. The neural network learned to drive straight instead of turning all the time.
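One common way to express such a stability goal is to punish large changes in steering between two consecutive steps. The sketch below shows that idea only. It is not necessarily the term we used, and the function name and the `weight` value are assumptions.

```python
def steering_smoothness_penalty(prev_steer: float, steer: float, weight: float = 0.5) -> float:
    """Return a non-positive reward term that grows with the steering change.

    prev_steer, steer: steering commands in [-1.0, 1.0] from the previous
                       and the current step
    weight:            hypothetical scaling factor for the penalty
    """
    return -weight * abs(steer - prev_steer)


print(steering_smoothness_penalty(0.0, 0.0))    # no change, no penalty
print(steering_smoothness_penalty(-1.0, 1.0))   # full left to full right
```

A term like this would be added to the normal step reward, so that zig-zagging costs points even when the car stays on the track.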
For the next fine-tuning session, we removed the speed cap and implemented automatic gear switching. The AI model learned how to control its speed, but became less effective at finishing a lap.
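Automatic gear switching can be done with a few simple rules based on engine RPM, so the neural network does not have to learn it. Here is a rough sketch of that approach. The RPM thresholds, the number of gears and the function name are assumptions for the example, not our tuned values.

```python
# Hypothetical thresholds, picked only for illustration.
UPSHIFT_RPM = 8000.0
DOWNSHIFT_RPM = 3000.0
MAX_GEAR = 6


def select_gear(gear: int, rpm: float) -> int:
    """Return the gear to use in the next step, given the current gear and RPM."""
    if gear < 1:
        # Neutral or reverse: start in first gear.
        return 1
    if rpm > UPSHIFT_RPM and gear < MAX_GEAR:
        return gear + 1
    if rpm < DOWNSHIFT_RPM and gear > 1:
        return gear - 1
    return gear


print(select_gear(1, 9000.0))  # engine revving high in first gear
print(select_gear(3, 2000.0))  # engine revving low in third gear
print(select_gear(3, 5000.0))  # RPM in the comfortable range
```

Keeping a gap between the two thresholds helps avoid switching back and forth between two gears on every step.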
The final fine-tuning session restored its ability to complete the whole track. The AI car could finally finish a whole race!
Since some random failures were still occurring, we decided to add a safety envelope for the AI system in the form of a deterministic fallback. If the neural network stops responding, or the car goes off the track and the AI is unable to get back onto the road, the deterministic program takes control until the AI can drive again.
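The fallback logic can be pictured as a small wrapper around the agent. The sketch below is a simplified illustration, not our exact implementation: the names `choose_action` and `fallback_action`, the `(steer, throttle)` action format, the sign conventions for `angle` and `track_offset`, and the constants are all assumptions. It also hands over control as soon as the car is off the track, which is stricter than waiting to see whether the AI can recover on its own.

```python
import math
from typing import Optional, Tuple

Action = Tuple[float, float]  # (steer in [-1, 1], throttle in [0, 1]), assumed format

FALLBACK_THROTTLE = 0.3  # hypothetical gentle throttle while recovering


def fallback_action(angle: float, track_offset: float) -> Action:
    """Deterministic rule: align with the track and steer back towards its centre.

    angle:        angle between the car and the track direction, in radians
    track_offset: normalised distance from the track centre (beyond +/-1.0 is off-track)
    """
    steer = angle - 0.5 * track_offset
    steer = max(-1.0, min(1.0, steer))  # clamp to the valid steering range
    return (steer, FALLBACK_THROTTLE)


def choose_action(agent_action: Optional[Action], angle: float, track_offset: float) -> Action:
    """Use the agent's action when it is valid and the car is on the track."""
    agent_failed = agent_action is None or any(math.isnan(v) for v in agent_action)
    off_track = abs(track_offset) > 1.0
    if agent_failed or off_track:
        return fallback_action(angle, track_offset)
    return agent_action


print(choose_action((0.1, 0.8), 0.0, 0.2))   # agent is fine, its action is used
print(choose_action(None, 0.0, 0.0))         # agent did not respond
print(choose_action((0.1, 0.8), 0.0, 1.5))   # car is off the track
```

Because the check runs on every step, the agent gets control back automatically once it responds again and the car is back on the road.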
We used an IBM Granite LLM, more specifically Granite 4.0 H 32B, to help us understand how Torcs works, optimize our code, and discuss possible solutions and the reward system for the TD3 agent.
We are thankful for the opportunity to take part in the competition. We wish the best of luck to all the other participants, and we hope our solution performs well.