Posts

Showing posts with the label Python

Pre-training a GPT-2 AI model on 32 GB of data. I made it both draw and recognise sketches!

Hello everyone! In this post I will tell you how I trained a GPT-2 "Small" AI model from scratch on a huge dataset of around 32 GB of raw text files, using the Torch and Hugging Face Transformers libraries. I will explain the steps of such a project, tell you how it went for me and how I would recommend doing these things now, and of course I will also show you the code that pre-trains the AI model from a dataset. Let's start from the beginning: 1. The idea, purpose and raw dataset. First and foremost, a project like this needs an idea and should have a purpose. I pre-trained (trained from scratch) the AI model just to learn more about the process, but pre-training LLM AI models of that scale is rarely done and is usually pointless, as there are plenty of base models already trained on specific languages that can then simply be fine-tuned, for example into a specific response format. But let's assume that we want to pre-train such a model anyway, for example as an a...
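As a rough orientation before reading the full post, a minimal pre-training loop with Hugging Face Transformers could look like the sketch below. This is not the author's actual script; the tokenizer path, the corpus glob and the hyperparameters are illustrative assumptions.

# Minimal sketch: pre-training GPT-2 "Small" from scratch with Transformers.
# Paths and hyperparameters are assumptions, not the author's real settings.
from datasets import load_dataset
from transformers import (GPT2Config, GPT2LMHeadModel, GPT2TokenizerFast,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

# Assumes a tokenizer was already trained on the same corpus and saved locally.
tokenizer = GPT2TokenizerFast.from_pretrained("./my_tokenizer")
tokenizer.pad_token = tokenizer.eos_token

# Randomly initialised GPT-2 Small -- this is pre-training, not fine-tuning.
config = GPT2Config(vocab_size=len(tokenizer), n_positions=1024)
model = GPT2LMHeadModel(config)

# Load the raw .txt corpus and tokenize it into fixed-length samples.
dataset = load_dataset("text", data_files={"train": "corpus/*.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=1024)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# Causal language modelling: labels are the input shifted by one token.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

args = TrainingArguments(
    output_dir="gpt2-small-from-scratch",
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,
    num_train_epochs=1,
    fp16=True,
    save_steps=10_000,
    logging_steps=500,
)

Trainer(model=model, args=args, train_dataset=tokenized["train"],
        data_collator=collator).train()

With a corpus of this size the main practical concerns are tokenizer quality, batch size versus GPU memory, and checkpointing often enough that a crash does not cost days of training.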

Using agent-based approach to make ChatGPT capable of video analysis - part 2 - adding speech recognition and more

Hello. In this follow-up, we will continue developing the solution that lets ChatGPT describe videos. In my approach, the video is split into fragments and these fragments are turned into "comics". Every fragment then gets described, and all the descriptions are sent to the master AI agent, which interacts with the user. Since the last post, I have updated the code to make it more modular and to move AI processing out of the initializer:

from moviepy.video.io.VideoFileClip import VideoFileClip
from PIL import Image
import os
import easyapiopenai
from WojSafeAPI import *  # My own library for loading API keys safely, you cannot use it.

class Video_ChatGPT:
    def __init__(self, openai_api_key: str, video_path: str, time_between_frames_seconds: float, frames_per_scene: int, process_audio=True, frame_height=480, watcher_model='gpt-4o-mini', master_model='gpt-4o', watcher_token_limit=500, master_token_limit=15*10...
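To make the fragment-to-master flow concrete, here is a hedged sketch of the two calls involved: a "watcher" model that describes one comic image, and a "master" model that answers questions using only those text descriptions. This is not the author's Video_ChatGPT class or their easyapiopenai wrapper; it uses the standard openai client, and the prompts and model names are assumptions.

# Sketch of the watcher -> master agent flow (assumed prompts and client usage).
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def describe_fragment(comic_png_path: str) -> str:
    """Send one 'comic' (a grid of frames from a video fragment) to the watcher model."""
    with open(comic_png_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        max_tokens=500,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what happens in this video fragment."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content

def answer_about_video(fragment_descriptions: list[str], question: str) -> str:
    """The master agent only ever sees the per-fragment text descriptions."""
    summary = "\n".join(f"Fragment {i}: {d}" for i, d in enumerate(fragment_descriptions))
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": "You answer questions about a video using these fragment descriptions:\n" + summary},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

The design point is that the expensive vision model only sees small batches of frames, while the cheaper-to-prompt master model works purely with text, which keeps long videos within the context window.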