Posts

Using an agent-based approach to make ChatGPT capable of video analysis - part 2 - adding speech recognition and more

Hello. In this follow-up, we will continue developing the solution that lets ChatGPT describe videos. In my approach, the video is split into fragments, and these fragments are turned into "comics". Every fragment then gets described, and all the descriptions are sent to the master AI agent, which interacts with the user. Since the last post, I updated the code to make it more modular and to move the AI processing out of the initializer:

```python
from moviepy.video.io.VideoFileClip import VideoFileClip
from PIL import Image
import os
import easyapiopenai
from WojSafeAPI import *  # My own library for loading API keys safely, you cannot use it.

class Video_ChatGPT:
    def __init__(self, openai_api_key: str, video_path: str,
                 time_between_frames_seconds: float, frames_per_scene: int,
                 process_audio=True, frame_height=480,
                 watcher_model='gpt-4o-mini', master_model='gpt-4o',
                 watcher_token_limit=500, master_token_limit=15*10...
```
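The split-then-describe pipeline rests on simple timestamp arithmetic: sample a frame every `time_between_frames_seconds`, then chunk the samples into scenes of `frames_per_scene` frames each. A minimal stdlib-only sketch of that bookkeeping, using the same parameter names as the initializer above (the helper function itself is hypothetical, not part of the actual class):

```python
# Hypothetical helper illustrating the frame/scene timing logic;
# parameter names follow the Video_ChatGPT initializer.

def scene_timestamps(duration_s: float,
                     time_between_frames_seconds: float,
                     frames_per_scene: int) -> list[list[float]]:
    """Group frame-grab timestamps into scenes.

    Each scene holds up to `frames_per_scene` timestamps, spaced
    `time_between_frames_seconds` apart, covering the whole video.
    """
    timestamps = []
    t = 0.0
    while t < duration_s:
        timestamps.append(round(t, 3))
        t += time_between_frames_seconds
    # Chunk the flat timestamp list into per-scene lists.
    return [timestamps[i:i + frames_per_scene]
            for i in range(0, len(timestamps), frames_per_scene)]

# A 60-second clip sampled every 4 s with 8 frames per scene
# yields two scenes: frames at 0..28 s and at 32..56 s.
scenes = scene_timestamps(60.0, 4.0, 8)
```

Each inner list is then what gets rendered into one "comic" image and sent to a watcher model.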

Using an agent-based approach to make ChatGPT capable of video analysis - part 1 - proof of concept

Hello everyone! Today, we will create a way to let ChatGPT watch videos. But first, I will describe my approach to this problem. Models like GPT-4o and GPT-4o-mini are able to analyze static images, so my approach to letting the AI analyze movies looks like this:
- The video will be split into individual frames at a specific interval.
- A certain number of frames will form a scene (for example, 8 frames taken every 4 seconds make a 32-second scene). The scenes will then be assembled into images containing all the frames, creating a sort of "comic".
- If set to do so, the audio fragment for each scene will be processed through speech recognition to get a transcript of the dialogue.
As I am writing this, OpenAI has released a model, gpt-4o-audio-preview, which they claim can analyze audio; but since I haven't used it yet, let's first extract speech with a classic speech recognition library. In this part, I am not going to add audio support yet, though. ...
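The "comic" assembly step above is mostly geometry: pick a grid that can hold all the frames of a scene and compute where each frame is pasted. A stdlib-only sketch of that layout (the function name and the near-square grid choice are my assumptions, not necessarily what the post's code does):

```python
import math

def comic_layout(frames_per_scene: int, frame_w: int, frame_h: int):
    """Return (sheet_size, paste_positions) for a near-square grid
    holding `frames_per_scene` frames of frame_w x frame_h pixels."""
    cols = math.ceil(math.sqrt(frames_per_scene))
    rows = math.ceil(frames_per_scene / cols)
    # Frames fill the grid left-to-right, top-to-bottom.
    positions = [((i % cols) * frame_w, (i // cols) * frame_h)
                 for i in range(frames_per_scene)]
    return (cols * frame_w, rows * frame_h), positions

# 8 frames of 640x480 fit a 3x3-cell sheet, last cell left unused.
size, positions = comic_layout(8, 640, 480)
```

With Pillow you would then create an `Image` of `size` and `paste` each frame at its position.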

ISKRA experiment 02_10_2024-1 - "Defeat the Minecraft creeper" [reupload]

ISKRA experiment 02_10_2024-1 - "Defeat the Minecraft Creeper" Hello everyone! In this post, I will describe my recent, first public ISKRA Project experiment. I have conducted some experiments before, but I believe they are not worth publishing, so let's consider this one the first official ISKRA Project experiment. Let's begin with a description of it. I've built an installation in Minecraft Bedrock Edition; this was the starting point and viewport of our AI. The goal of the AI was to defeat the Creeper by using a set of pre-defined control commands:
W - move forward 1 block
S - move backward 1 block
A - strafe left 1 block
D - strafe right 1 block
P - place a brown dirt block in front of you
R - destroy the block in front of you
F - punch to attack/fight
The commands P and R were never actually implemented, since they were unnecessary for this experiment. The Creeper was trapped in a structure to prevent...
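The command set above is easy to mirror in code. A hypothetical sketch of just the movement bookkeeping on a 2D grid, assuming "forward" is +z (the real experiment drove Minecraft itself; P, R and F affect the world rather than the position, so they are no-ops here):

```python
def apply_commands(pos: tuple[int, int], commands: str) -> tuple[int, int]:
    """Apply W/S/A/D moves to an (x, z) grid position.
    P, R and F change the world, not the position, so they are ignored."""
    moves = {'W': (0, 1), 'S': (0, -1), 'A': (-1, 0), 'D': (1, 0)}
    x, z = pos
    for c in commands:
        dx, dz = moves.get(c.upper(), (0, 0))  # unknown/no-op commands move nothing
        x, z = x + dx, z + dz
    return (x, z)

# Two steps forward, one strafe right, then a punch: (0,0) -> (1,2)
end = apply_commands((0, 0), "WWDF")
```

Keeping a predicted position like this lets you cross-check whether the AI's issued commands actually put it where it thinks it is.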

Info on the first ISKRA project post

Edit 03.11.2024: The blog is again visible in search results. Hello everyone reading this! I noticed that my blog does not appear in the search results of any search engine. Suspecting it got shadowbanned because of words used in the first ISKRA experiment post, I decided to hide the post about experiment 02_10_2024_1, which was about making an AI fight a creeper in Minecraft. Once my blog returns to being visible, I will try to publish it again, rewritten in a way that would not make search engine algorithms see it as controversial and hide this blog.

AssemblerGPT - my GPT fine-tuning adventure

Hello everyone reading this! Before I start with the topic of this post, I would like to inform you that, due to being busy, the next ISKRA Project articles will sadly be delayed; I won't be able to publish them at regular intervals. Anyway, let's get into the topic I would like to write about here. I have fine-tuned different AI models in the past, but I had never fine-tuned one of OpenAI's latest GPT models. Some time ago, I decided to finally try that. For those of you who don't know, I will first explain what fine-tuning is. In a nutshell, it is a process of training an artificial intelligence, usually to shape its output (responses) in a specific manner, teach it to deal with new tasks, or provide it with new data. It is usually done by providing a lot of examples of input (for example, questions) and the desired output for each input (for example, correct answers to those questions). An example could be: if you fine-tune a text-generator AI on a theatrical play script...
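For OpenAI's chat models, those input/output example pairs are supplied as a JSONL file: one training example per line, each shaped like a chat-completion request plus the desired assistant reply. A minimal sketch of building such a file (the example content is made up):

```python
import json

# Each training example is one JSON object with a "messages" list:
# the prompt side (system/user) plus the desired assistant answer.
examples = [
    {"messages": [
        {"role": "system", "content": "You answer in Shakespearean English."},
        {"role": "user", "content": "What time is it?"},
        {"role": "assistant", "content": "Hark! The clock hath struck but noon."},
    ]},
]

# Fine-tuning files are JSONL: one serialized example per line.
jsonl = "\n".join(json.dumps(e) for e in examples)
```

Hundreds of such lines, uploaded through the fine-tuning API, are what actually shapes the model's responses.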