ISKRA Experiment 07_10_2024-1 - Maze
Hello everyone reading this!
Last time we tested how ChatGPT deals with the task of fighting a stationary enemy in Minecraft. In the long run, that research may benefit the development of intelligent NPCs in games. But unlike an NPC, which usually has full data on everything that happens in the game, the AI was given only as much as a human Minecraft player can see. It was playing like a human player would.
In the next experiment I was supposed to continue creating the artificial Minecraft player, but I got a question from someone who read my previous post:
How would AI deal with a maze?
Curiosity and questions first, so let's find out.
But before we start, let me remind you what Project ISKRA is about:
Project ISKRA is my research project meant to see how generative AI can deal with practical tasks and control robots or digital bodies to accomplish objectives. The goal is to do that on default models, without fine-tuning (teaching the AI how to deal with specific tasks on its internal level) and without making AIs from scratch.
This experiment will not involve a robot or a physical body (though future iterations, for example with 3D mazes, may), but it will still check how AI deals with the task of solving a maze, and what can be done to make it good at it.
Setup:
Let's first think how to interact with the maze.
Let's try drawing straight lines from point X_1 Y_1 to X_2 Y_2. I created a function which does that.
def draw_line(image, x1, y1, x2, y2):
    d = ImageDraw.Draw(image)
    d.line([(x1, y1), (x2, y2)], fill=(255,0,0), width=4)
    image.save("log/"+str(time.time())+".jpg")
    return image
This function also saves every version of the maze image to the hard drive. Our AI will generate starting and ending points, which this function will use to draw the line. With this approach it will solve the maze step by step.
Let's define the AIs. We will use a similar approach as in Minecraft: a Brain AI which will be solving the maze, and a Movement AI which will be extracting the drawing commands, because it is just easier to use GPT-3.5-turbo for that than to write the text extraction from scratch.
Brain and Movement are the names of the AIs I used in the Minecraft experiment. I will stick to them, even though Movement's task is not related to moving anything now.
Let's now define the AI agents. I will use my own interface for interacting with the OpenAI API.
Here is the code which defines them and also contains the first versions of their SYSTEM messages (a SYSTEM message is text which determines how the AI is supposed to behave):
system_brain = '''You will recieve a picture of a maze. Solve it by getting from point A to B.
You can draw a straight red line on the maze image by providing start and end X and Y coordinates in pixels. Separate the values with |.
For example, to draw a line from point (100,200) to point (150,300) write:
100|200|150|300
Draw new lines one by one to solve the maze. Do only one step at a time.'''
system_movement = '''You will recieve a text, which will contain a string which determines how to draw a line.
For example, command to draw a line from point (100,200) to point (150,300) is:
100|200|150|300
Extract the command from text given to you. Write only the command, never write anything else.
'''
brain = ImgChatGPTAgent(api_key, 'gpt-4o-mini', 500, system_brain)
movement = ChatGPTAgent(api_key, 'gpt-3.5-turbo', 500, system_movement)
I added a little more code to send the image to the AI, to see if they work. And apparently they do:
Now let's add a loop and drawing the line. The new code looks like this:
while True:
    response = brain.GetResponseWithImg(image,"Continue solving the maze")
    commands = movement.GetResponse(response)
    print(time.time())
    print("Response: "+response)
    print("Commands: "+commands)
    command_table = commands.split("|")
    image = draw_line(image, int(command_table[0]), int(command_table[1]), int(command_table[2]), int(command_table[3]))
Let's now start the AI and see what happens.
Tests:
Sadly, after a longer while, we got something like this:
Let's try again with GPT-4o instead of GPT-4o-mini. GPT-4o is a more advanced model.
The result is sadly not good either, though it seems to make more sense.
Sadly, GPT-3.5-turbo answered with more than it was supposed to, which crashed the program. Maybe the AI would eventually have gotten to B, even if not by solving the maze.
"Commands: There seems to be an error in the input. Let's try again.
Move down:
400|520|400|600"
- GPT-3.5-turbo's (Movement's) response
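As a side note, a reply like the one above could probably also be handled without a second model. The sketch below is not the code used in this experiment; it assumes the command always appears as four whole numbers separated by `|`, and `extract_line_command` is a hypothetical helper name.

```python
import re

# Four non-negative integers separated by "|", e.g. "400|520|400|600".
COMMAND_PATTERN = re.compile(r"(\d+)\s*\|\s*(\d+)\s*\|\s*(\d+)\s*\|\s*(\d+)")

def extract_line_command(text):
    """Return (x1, y1, x2, y2) from the last command found in text, or None."""
    matches = COMMAND_PATTERN.findall(text)
    if not matches:
        return None  # nothing usable; the caller can skip this iteration
    return tuple(int(value) for value in matches[-1])

reply = """There seems to be an error in the input. Let's try again.
Move down:
400|520|400|600"""
print(extract_line_command(reply))  # (400, 520, 400, 600)
```

Taking the last match is a design choice: when a reply mentions an earlier attempt first, the final command is most likely the one that was meant.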
Let's lower the TEMPERATURE of Movement a little and try again. I will set it to 0.8; this decreases randomness, making unlikely tokens (words) less likely to appear.
Fun fact: a very high TEMPERATURE makes ChatGPT write gibberish until the token (word) limit is reached. This is because the end of the message is a token itself, and when tokens become very random due to high TEMPERATURE, the invisible "I stopped generating text" token does not manage to appear. A lower TEMPERATURE makes the AI respond in a more generic manner, a higher one makes it more random and therefore more "creative" in writing, and one that is too high results in gibberish. If you want the AI to reliably copy a text (like in our case now), I recommend low TEMPERATURE settings.
Anyway, Movement's definition is now:
movement = ChatGPTAgent(api_key, 'gpt-3.5-turbo', 500, system_movement, 0.8)
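To illustrate what TEMPERATURE does, here is a toy calculation. It is a simplified sketch of the usual softmax-with-temperature formula, not OpenAI's actual implementation, and the scores for the three candidate tokens are made up.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw token scores into probabilities, scaled by temperature."""
    scaled = [score / temperature for score in logits]
    highest = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - highest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up scores for three candidate tokens.
logits = [4.0, 2.0, 1.0]

for temperature in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

The lower the value, the more probability goes to the top-scoring token; the higher the value, the closer all tokens get to being equally likely.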
Let's run the code again now.
Sadly, it makes even less sense now:
Let's lower the TEMPERATURE of Brain as well, let's say to 0.5, and try again. It's done analogously to Movement.
The result is very similar to when the TEMPERATURE was at the default of 1.
Let's now try a TEMPERATURE of 1.4. Hopefully it won't end up writing gibberish.
After giving it the same amount of time, the result did not make sense either.
Trying out different TEMPERATURES may have turned out pointless in the end, but it was nice to check how that setting affects the program.
My personal conclusion is that the model is not capable of pixel-perfect image analysis, which is why it heavily approximates the starting and ending points of the lines it draws.
It was, of course, never designed to do such tasks either.
A solution could be fine-tuning (further training of the AI) for solving mazes, but let's try to do as much as we can with the default AI models available.
Can ChatGPT actually determine coordinates of anything in an image?:
Let's first see if GPT-4o or 4o-mini are pixel-perfect. To do that, I will first create an image:
This image, which I just drew in Gnome Paint (I am now working on my second, Linux laptop btw), contains two purple squares, one blue point, one green point and one red pixel (around the top left corner, likely not visible). The image is 300x300 pixels.
Let's see if ChatGPT can tell exact positions of these points. Let's first try cheaper and faster GPT-4o-mini:
def draw_square(image, x, y, wall=40):
    new_image = image.copy()
    d = ImageDraw.Draw(new_image)
    c_x = wall//2
    c_y = wall//2
    if x>c_x and y>c_y:
        d.rectangle((x-c_x , y-c_y , x-c_x+wall , y-c_y+wall), fill=(255,0,0), width=1)
    else:
        d.rectangle((x , y , x+wall , y+wall), fill=(255,0,0), width=1)
    new_image.save("log/"+str(time.time())+".jpg")
    return new_image
current_x=230
current_y=100
new_image = draw_square(image, current_x, current_y)
#image.show()
system_brain = '''You will recieve a picture of a maze. Solve it by getting from point A to B.
You control a red square. Solve the maze by moving it from point A to B.
Control the red square by writing direction and then amount of pixels. Available directions are:
W - up/north
S - down/south
A - left/west
D - right/east
Examples:
Move 150 pixels north:
W150
Move 30 pixels left:
A30
Move only once per iteration. Never make multiple movements at once.
'''
system_movement = '''You will recieve a text, which will contain a letter and number, which indicate movement.
Available directions are:
W - up/north
S - down/south
A - left/west
D - right/east
Examples:
Move 150 pixels north:
W150
Move 30 pixels left:
A30
Extract the command from text given to you. Write only the command, never write anything else.
'''
brain = ImgChatGPTAgent(api_key, 'gpt-4o', 500, system_brain, 1.4)
movement = ChatGPTAgent(api_key, 'gpt-3.5-turbo', 500, system_movement, 0.8)
while True:
    response = brain.GetResponseWithImg(new_image,"Continue solving the maze")
    commands = movement.GetResponse(response)
    print(time.time())
    print("Response: "+response)
    print("Commands: "+commands)
    command_table = []
    command_table.append(str(commands[0]).lower())
    command_table.append(int(commands[1:]))
    if command_table[0]=="w":
        current_y = current_y - command_table[1]
    elif command_table[0]=="s":
        current_y = current_y + command_table[1]
    elif command_table[0]=="a":
        current_x = current_x - command_table[1]
    elif command_table[0]=="d":
        current_x = current_x + command_table[1]
    else:
        print("An error occurred.")
        quit()
    new_image = draw_square(image, current_x, current_y)
while True:
    response = brain.GetResponseWithImg(new_image,"Continue solving the maze")
    if str(response.strip()).lower()=="end":
        print("DONE")
        quit()
    commands = movement.GetResponse(response)
    ...
%Run ISKRAlabirynt.py
Type image file name: maze.jpg
1728501065.2808523
Response: The red square is at the starting point A. Let's move it towards point B.
Move 50 pixels down:
S50
Commands: S50
1728501069.1735294
Response: The red square is currently 50 pixels down from the top left corner.
Move 50 pixels right:
D50
Commands: D50
1728501071.7336936
Response: The red square is at the starting point A. Let's move to solve the maze toward point B.
Move 100 pixels down:
S100
Commands: S100
1728501074.598248
Response: The red square is currently positioned near the top left, labeled A.
Move 100 pixels down:
S100
Commands: S100
1728501077.463576
Response: The red square is currently near the middle on the left side. Let's continue navigating through the maze.
Move 50 pixels down:
S50
Commands: S50
1728501080.1276205
Response: The red square is currently a little below the starting point A.
Move 100 pixels down:
S100
Commands: S100
1728501082.8906786
Response: The red square is near the bottom left. Let's move it along the path to point B.
Move 100 pixels right:
D100
Commands: D100
1728501085.8643894
Response: The red square is positioned a bit down the path on the left side. Let’s continue solving.
Move 50 pixels down:
S50
Commands: S50
1728501088.6266737
Response: The red square is on the left side of the maze. Let's move right.
Move 50 pixels right:
D50
Commands: D50
1728501091.595703
Response: The red square is on the left side of the maze.
Move 50 pixels right:
D50
Commands: D50
1728501094.3633986
Response: The red square has moved slightly to the right. Let's continue navigating.
Move 50 pixels down:
S50
Commands: S50
1728501096.9310725
Response: The red square is currently in the middle of the left side. Let's move further.
Move 100 pixels down:
S100
Commands: S100
1728501099.3823495
Response: The red square is currently near the bottom left corner of the maze.
Move 100 pixels right:
D100
Commands: D100
1728501104.0511148
Response: The red square is near the middle on the left side. Let's continue navigating through the maze.
Move 50 pixels up:
W50
Commands: W50
1728501107.1614254
Response: The red square is in the middle section of the maze. Let's move it down.
Move 50 pixels down:
S50
Commands: S50
1728501109.8225186
Response: The red square is currently positioned on the left side of the maze.
Move 50 pixels right:
D50
Commands: D50
1728501112.690546
Response: The red square is close to point B. Let's move toward it.
Move 50 pixels up:
W50
Commands: W50
1728501115.8683639
Response: The red square is close to point B, on the left side of the inner path. Let’s proceed towards B.
Move 50 pixels up:
W50
Commands: W50
1728501118.9399807
Response: The red square is close to the center, near point B.
Move 50 pixels right:
D50
Commands: D50
1728501123.2421985
Response: The red square is close to the inner section near point B.
Move 50 pixels left:
A50
Commands: A50
1728501125.9731746
Response: The red square is close to the center and needs to reach point B. Let's move down.
Move 50 pixels down:
S50
Commands: S50
1728501128.7706983
Response: The red square is on the left side, near the center. Let's move it toward point B.
Move 50 pixels right:
D50
Commands: D50
1728501131.3271213
Response: The red square is positioned near the central area on the right side.
Move 100 pixels up:
W100
Commands: W100
1728501134.7072434
Response: The red square is close to point B on the inner path.
Move 100 pixels down:
S100
Commands: S100
1728501137.5772278
Response: The red square is near point B on the right side. Let's move it closer.
Move 50 pixels up:
W50
Commands: W50
1728501140.239705
Response: The red square is close to point B on the right side. Let's move towards B.
Move 50 pixels left:
A50
Commands: A50
1728501143.0052705
Response: The red square is currently near the center on the left side of the maze. Let's move it towards point B.
Move 50 pixels right:
D50
Commands: D50
1728501145.97357
Response: The red square is currently near the middle, just to the right of point B.
Move 50 pixels up:
W50
Commands: W50
1728501151.4021986
Response: The red square is currently close to the center on the left side of the path. Let's move toward point B.
Move 50 pixels right:
D50
Commands: D50
1728501154.6785378
Response: The red square is currently close to the center at point B. The maze is solved! If there's anything else you need, feel free to ask.
Commands: Great job! If you have any more tasks or questions, feel free to ask.
Traceback (most recent call last):
  File "/home/wojtekb30/Pulpit/ISKRAlabirynt/ISKRAlabirynt.py", line 88, in <module>
    command_table.append(int(commands[1:]))
ValueError: invalid literal for int() with base 10: 'reat job! If you have any more tasks or questions, feel free to ask.'
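A crash like this one could likely be avoided by validating the reply before converting it to a number. The snippet below is only a sketch and not the code used in this experiment; `parse_move` and the `DIRECTIONS` table are hypothetical names, and it assumes a command is always one of the letters W, A, S, D followed directly by a whole number.

```python
import re

# One direction letter followed by a pixel count, e.g. "S50" or "w150".
MOVE_PATTERN = re.compile(r"\b([WASD])(\d+)\b", re.IGNORECASE)

# Image coordinates: Y grows downwards, so "W" (up) decreases Y.
DIRECTIONS = {"w": (0, -1), "s": (0, 1), "a": (-1, 0), "d": (1, 0)}

def parse_move(text):
    """Return (dx, dy) for the first command in text, or None if there is none."""
    match = MOVE_PATTERN.search(text)
    if match is None:
        return None
    step_x, step_y = DIRECTIONS[match.group(1).lower()]
    pixels = int(match.group(2))
    return step_x * pixels, step_y * pixels

print(parse_move("S50"))  # (0, 50)
print(parse_move("Great job! If you have any more tasks or questions, feel free to ask."))  # None
```

With a `None` result the main loop can decide what to do (retry, stop, or treat it as the end of the run) instead of failing inside `int()`.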
current_x=230
current_y=100
new_image = draw_square(image, current_x, current_y)
#image.show()
system_brain = '''You will recieve a picture of a maze. Solve it by getting from point A to B.
You control a red square. Solve the maze by moving it from point A to B.
Control the red square by writing direction and then amount of pixels. Available directions are:
W - up/north
S - down/south
A - left/west
D - right/east
Examples:
Move 150 pixels north:
W150
Move 30 pixels left:
A30
Move only once per iteration. Never make multiple movements at once.
Write your own thoughts and comments.
Go thru the maze and not directly to point B.
Write "END" and nothing more once point B is reached.
Current position of the red square:
'''
system_movement = '''You will recieve a text, which will contain a letter and number, which indicate movement.
Available directions are:
W - up/north
S - down/south
A - left/west
D - right/east
Examples:
Move 150 pixels north:
W150
Move 30 pixels left:
A30
Extract the command from text given to you. Write only the command, never write anything else.
'''
system_helper = '''You will recieve a picture of a maze. Help solve it.
The goal is to move the red square from point A to point B thru the maze.
The square moves only once per iteration.
The red square must never move thru a wall.
Write recommendations of only 1 next movement.
Current position of the red square:
'''
brain = ImgChatGPTAgent(api_key, 'gpt-4o', 500, system_brain, 1)
movement = ChatGPTAgent(api_key, 'gpt-3.5-turbo', 500, system_movement, 0.8)
helper = ImgChatGPTAgent(api_key, 'gpt-4o', 255, system_helper, 1)
while True:
    updated_system_brain = system_brain + " X: " + str(current_x) + " Y: " + str(current_y)
    updated_system_helper = system_helper + " X: " + str(current_x) + " Y: " + str(current_y)
    helper.ChangeSystemMessage(updated_system_helper)
    tip = helper.GetResponseWithImg(new_image, "What should I do now to continue solving the maze? How to move to not move thru walls?")
    print("Tip: "+tip)
    brain.ChangeSystemMessage(updated_system_brain)
    response = brain.GetResponseWithImg(new_image, tip)
Nothing changed in the path, since the Helper AI considers the same wrong path correct.
from APIWojOpenAI import * #my OpenAI API interface, available on my GitHub
from WojSafeAPI import * #safely loads my own API key; you will not be able to use this library or its function
from PIL import Image, ImageDraw
import time
api_key = YourAPIKeyHere("openai")
image = Image.open(str(input("Type image file name: ")))
image = image.convert('RGB')
def draw_line(image, x1, y1, x2, y2):
    d = ImageDraw.Draw(image)
    d.line([(x1, y1), (x2, y2)], fill=(255,0,0), width=4)
    image.save("log/"+str(time.time())+".jpg")
    return image
def draw_square(image, x, y, wall=40):
    new_image = image.copy()
    d = ImageDraw.Draw(new_image)
    c_x = wall//2
    c_y = wall//2
    if x>c_x and y>c_y:
        d.rectangle((x-c_x , y-c_y , x-c_x+wall , y-c_y+wall), fill=(255,0,0), width=1)
    else:
        d.rectangle((x , y , x+wall , y+wall), fill=(255,0,0), width=1)
    new_image.save("log/"+str(time.time())+".jpg")
    return new_image
current_x=230
current_y=100
new_image = draw_square(image, current_x, current_y)
#image.show()
system_brain = '''You will recieve a picture of a maze. Solve it by getting from point A to B.
You control a red square. Solve the maze by moving it from point A to B.
The maze has black walls and white floor.
Never move the red square into a green area.
Control the red square by writing direction and then amount of pixels. Available directions are:
W - up/north
S - down/south
A - left/west
D - right/east
Examples:
Move 150 pixels north:
W150
Move 30 pixels left:
A30
Move only once per iteration. Never make multiple movements at once.
Write your own thoughts and comments.
Go thru the maze and not directly to point B.
Write "END" and nothing more once point B is reached.
Enter the maze by first moving to the right.
Current position of the red square:
'''
system_movement = '''You will recieve a text, which will contain a letter and number, which indicate movement.
Available directions are:
W - up/north
S - down/south
A - left/west
D - right/east
Examples:
Move 150 pixels north:
W150
Move 30 pixels left:
A30
Extract the command from text given to you. Write only the command, never write anything else.
'''
system_helper = '''You will recieve a picture of a maze. Help solve it.
The goal is to move the red square from point A to point B thru the maze.
The maze has black walls and white floor.
Never move the red square into a green area.
The square moves only once per iteration.
The red square must never move thru a wall.
Write recommendations of only 1 next movement.
Current position of the red square:
'''
brain = ImgChatGPTAgent(api_key, 'gpt-4o', 500, system_brain, 1)
movement = ChatGPTAgent(api_key, 'gpt-3.5-turbo', 500, system_movement, 0.8)
helper = ImgChatGPTAgent(api_key, 'gpt-4o', 500, system_helper, 1)
while True:
    updated_system_brain = system_brain + " X: " + str(current_x) + " Y: " + str(current_y)
    updated_system_helper = system_helper + " X: " + str(current_x) + " Y: " + str(current_y)
    #helper.ChangeSystemMessage(updated_system_helper)
    #tip = helper.GetResponseWithImg(new_image, "What should I do now to continue solving the maze? How to move to not move thru walls?")
    #print("Tip: "+tip)
    brain.ChangeSystemMessage(updated_system_brain)
    response = brain.GetResponseWithImg(new_image, "Continue solving the maze")
    if str(response.strip()).lower()=="end":
        print("DONE")
        quit()
    commands = movement.GetResponse(response)
    if str(commands.strip()).lower()=="end":
        print("DONE")
        quit()
    print(time.time())
    print("Response: "+response)
    print("Commands: "+commands)
    command_table = []
    command_table.append(str(commands[0]).lower())
    command_table.append(int(commands[1:]))
    if command_table[0]=="w":
        current_y = current_y - command_table[1]
    elif command_table[0]=="s":
        current_y = current_y + command_table[1]
    elif command_table[0]=="a":
        current_x = current_x - command_table[1]
    elif command_table[0]=="d":
        current_x = current_x + command_table[1]
    else:
        print("An error occurred.")
        quit()
    new_image = draw_square(image, current_x, current_y)
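The prompts above ask the model never to move through walls, but nothing in the loop itself enforces that. One possible safeguard, sketched below under assumptions, is to check the path before applying a move. This is not part of the experiment's code: the maze is represented here as a hypothetical list of rows where 1 means wall (instead of reading real pixels from the image), and `path_is_clear` is an illustrative name.

```python
def path_is_clear(grid, x, y, dx, dy):
    """Check every cell on a straight horizontal or vertical move.

    grid is a list of rows; grid[row][col] == 1 marks a wall.
    Returns False if the move leaves the grid or crosses a wall.
    """
    steps = max(abs(dx), abs(dy))
    step_x = (dx > 0) - (dx < 0)  # -1, 0 or 1
    step_y = (dy > 0) - (dy < 0)
    for i in range(1, steps + 1):
        col, row = x + step_x * i, y + step_y * i
        if not (0 <= row < len(grid) and 0 <= col < len(grid[0])):
            return False
        if grid[row][col] == 1:
            return False
    return True

# Tiny hand-made maze: a wall segment in the middle row.
maze = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
print(path_is_clear(maze, 0, 0, 3, 0))  # True: along the open top row
print(path_is_clear(maze, 1, 0, 0, 2))  # False: straight down into the wall
```

A rejected move could then be reported back to Brain in the next message, so the model gets feedback instead of silently passing through a wall.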