Wojtekb30's Blog (EN)

Posts

Showing posts with the label Programming

Pre-training a GPT-2 AI model on 32 GB of data. I made it both draw and recognise sketches!

August 21, 2025

Hello everyone! In this post I will tell you how I trained a GPT-2 "Small" AI model from scratch on a huge dataset of around 32 GB of raw text files, using the Torch and HuggingFace Transformers libraries. I will explain the steps of such a project, tell you how it went for me and how I would recommend doing these things now, and of course I will also show you the code that pre-trains the AI model from a dataset. Let's start from the beginning:

1. The idea, purpose and raw dataset

First and foremost, a project like this needs an idea and should have a purpose. I pre-trained (trained from scratch) the AI model just to learn more about the process, but pre-training LLM AI models of that scale is rarely done and is usually pointless, as there are a lot of base models already trained on specific languages that can then just be fine-tuned, for example into a specific response format. But let's assume that we want to pre-train such a model anyway, for example as an a...
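To give a rough sense of the scale of a GPT-2 "Small" model, its parameter count can be worked out by hand. This is only a sketch: it assumes the standard published GPT-2 Small hyperparameters (12 layers, hidden size 768, a vocabulary of 50257 tokens, 1024 positions, and output weights tied to the token embeddings), which may differ from the configuration actually used in the post. The function name is made up for illustration.

```python
def gpt2_param_count(n_layer: int = 12, n_embd: int = 768,
                     vocab_size: int = 50257, n_positions: int = 1024) -> int:
    """Count trainable parameters of a GPT-2 style model (tied output head)."""
    token_emb = vocab_size * n_embd            # token embedding table
    pos_emb = n_positions * n_embd             # learned position embeddings
    layer_norm = 2 * n_embd                    # scale + bias of one LayerNorm
    attn_qkv = n_embd * 3 * n_embd + 3 * n_embd    # fused query/key/value projection
    attn_out = n_embd * n_embd + n_embd            # attention output projection
    mlp_in = n_embd * 4 * n_embd + 4 * n_embd      # feed-forward expansion (4x)
    mlp_out = 4 * n_embd * n_embd + n_embd         # feed-forward contraction
    per_layer = 2 * layer_norm + attn_qkv + attn_out + mlp_in + mlp_out
    # One final LayerNorm after the last block; the output head reuses token_emb.
    return token_emb + pos_emb + n_layer * per_layer + layer_norm


if __name__ == "__main__":
    total = gpt2_param_count()
    print(f"{total:,} parameters (~{total / 1e6:.0f}M)")
```

Under these assumptions the result is about 124 million parameters, which is tiny next to current chat models but still a lot to train from scratch on 32 GB of text.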

Using agent-based approach to make ChatGPT capable of video analysis - part 2 - adding speech recognition and more

January 12, 2025

Hello. In this follow-up, we will continue developing the solution that lets ChatGPT describe videos. In my approach, the video is split into fragments and then these fragments are turned into "comics". Then, every fragment gets described and all the descriptions are sent to the master AI agent, which interacts with the user. Since the last post, I have updated the code to make it more modular and to move AI processing out of the initializer:

from moviepy.video.io.VideoFileClip import VideoFileClip
from PIL import Image
import os
import easyapiopenai
from WojSafeAPI import * #My own library for loading API keys safely, you cannot use it.

class Video_ChatGPT:
    def __init__(self,openai_api_key:str, video_path: str, time_between_frames_seconds: float, frames_per_scene: int, process_audio=True, frame_height = 480, watcher_model='gpt-4o-mini',master_model='gpt-4o', watcher_token_limit = 500, master_token_limit=15*10...

Using agent-based approach to make ChatGPT capable of video analysis - part 1 - proof of concept

December 12, 2024

Hello everyone! Today, we will create a way to let ChatGPT watch videos. But first, I will describe my approach to this problem. Models like GPT-4o and GPT-4o-mini are able to analyze static images. My approach to letting the AI analyze movies will look like this:
- The video will be split into individual frames at a specific interval.
- A certain number of frames will form a scene (for example, 8 frames taken every 4 seconds will make a 32-second-long scene in total). The scenes will then be assembled into images containing all the frames, creating a sort of "comic".
- If set to do so, the audio fragment for each scene will be processed through speech recognition to get a transcript of the dialogue. As I am writing this, OpenAI has released a model, gpt-4o-audio-preview, which they claim can analyze audio; but since I haven't used it yet, let's first extract speech with a classic speech recognition library.
In this part I am not going to add audio support yet, though. ...
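The frame and scene arithmetic described above can be sketched in plain Python. This is not the code from the post: the names `Scene` and `plan_scenes` are made up for illustration, and the sketch only plans timestamps; it does not read a video or build the "comic" images.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Scene:
    index: int
    frame_times: List[float]  # timestamps (seconds) of the frames in this scene
    start: float              # time of the first frame
    end: float                # end of the time span the scene covers


def plan_scenes(video_length_s: float, interval_s: float,
                frames_per_scene: int) -> List[Scene]:
    """Pick one frame every interval_s seconds and group frames into scenes."""
    n_frames = math.ceil(video_length_s / interval_s)
    times = [k * interval_s for k in range(n_frames)]
    scenes = []
    for i in range(0, n_frames, frames_per_scene):
        chunk = times[i:i + frames_per_scene]
        start = chunk[0]
        # Each frame stands for interval_s seconds; clamp to the video length.
        end = min(start + len(chunk) * interval_s, video_length_s)
        scenes.append(Scene(len(scenes), chunk, start, end))
    return scenes


if __name__ == "__main__":
    for scene in plan_scenes(70.0, 4.0, 8):
        print(scene.index, scene.start, scene.end, len(scene.frame_times))
```

With 8 frames per scene and a 4-second interval, every full scene spans 32 seconds, matching the example in the text; the last scene is simply shorter when the video length is not a multiple of 32 seconds.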

ISKRA experiment 02_10_2024-1 - "Defeat the Minecraft creeper" [reupload]

November 03, 2024

Hello everyone! In this post, I will describe my recent and first public ISKRA Project experiment. I have conducted some experiments before, but I believe they are not worth publishing. Let's consider this one the first official ISKRA Project experiment. Let's begin with the description of it.

I've built an installation in Minecraft Bedrock Edition:

This was the starting point and viewport of our AI:

The goal of the AI was to defeat the Creeper by using a set of pre-defined control commands:
W - move forward 1 block
S - move backwards 1 block
A - strafe left 1 block
D - strafe right 1 block
P - place a brown dirt block in front of you
R - destroy block in front of you
F - punch to attack/fight
The commands P and R were never actually implemented since they were unnecessary for this experiment. The Creeper was trapped in a structure to prevent...
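A command set like the one listed above can be represented as a small lookup table with a filter for the two unimplemented commands. This is a hypothetical sketch, not the ISKRA code: the excerpt does not show how the AI's replies were formatted, so the function name `parse_commands` and the assumption that the reply is a whitespace-separated string of command letters are mine.

```python
from typing import List

# Command letters and meanings as listed in the post.
COMMANDS = {
    "W": "move forward 1 block",
    "S": "move backwards 1 block",
    "A": "strafe left 1 block",
    "D": "strafe right 1 block",
    "P": "place a brown dirt block in front of you",
    "R": "destroy block in front of you",
    "F": "punch to attack/fight",
}

# The post states that P and R were never actually implemented.
UNIMPLEMENTED = {"P", "R"}


def parse_commands(reply: str) -> List[str]:
    """Keep only known, implemented command letters from a model reply.

    Assumes (hypothetically) that the reply is whitespace-separated letters.
    """
    accepted = []
    for token in reply.upper().split():
        if token in COMMANDS and token not in UNIMPLEMENTED:
            accepted.append(token)
    return accepted


if __name__ == "__main__":
    for letter in parse_commands("w w f p x"):
        print(letter, "-", COMMANDS[letter])
```

Filtering unknown or unimplemented letters before acting on them keeps a malformed model reply from triggering anything unexpected in the game.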