Hello everyone! Today we will build a way to let ChatGPT watch videos. But first, let me describe my approach to the problem. Models like GPT-4o and GPT-4o-mini can analyze static images, so my approach to letting the AI analyze movies looks like this:

- The video is split into individual frames at a fixed interval.
- A set number of frames forms a scene (for example, 8 frames taken every 4 seconds make a 32-second scene). Each scene is then assembled into a single image containing all of its frames, creating a sort of "comic".
- If enabled, the audio fragment for each scene is run through speech recognition to get a transcript of the dialogue.

As I am writing this, OpenAI has released a model called gpt-4o-audio-preview, which they claim can analyze audio; but since I haven't used it yet, let's first extract speech with a classic speech recognition library. In this part, though, I am not going to add audio support yet. ...
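To make the frame and scene arithmetic above concrete, here is a minimal sketch in plain Python. It only plans the work: it computes which timestamps would be sampled, how they group into scenes, and where each frame would sit in the "comic" grid. It does not read a video or draw an image. All the names (`Scene`, `frame_timestamps`, `build_scenes`, `grid_cells`, `grid_size`) and the 4-column layout are my own assumptions for illustration, not necessarily what the final code will use.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Scene:
    """Hypothetical container for one scene (name and fields are assumptions)."""
    index: int
    start: float             # timestamp of the first frame, in seconds
    end: float               # end of the time span the scene covers, in seconds
    timestamps: list[float]  # one entry per frame in the scene


def frame_timestamps(duration: float, interval: float) -> list[float]:
    """Return the timestamps (seconds) at which a frame would be sampled."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    stamps = []
    i = 0
    while i * interval < duration:
        stamps.append(i * interval)
        i += 1
    return stamps


def build_scenes(stamps: list[float], frames_per_scene: int,
                 interval: float, duration: float) -> list[Scene]:
    """Group consecutive frame timestamps into scenes of a fixed size."""
    if frames_per_scene <= 0:
        raise ValueError("frames_per_scene must be positive")
    scenes = []
    for n, first in enumerate(range(0, len(stamps), frames_per_scene)):
        chunk = stamps[first:first + frames_per_scene]
        # A scene covers one interval past its last frame, capped at the video end.
        end = min(chunk[-1] + interval, duration)
        scenes.append(Scene(index=n, start=chunk[0], end=end, timestamps=chunk))
    return scenes


def grid_cells(frame_count: int, columns: int) -> list[tuple[int, int]]:
    """Return the (column, row) cell of each frame, filled left to right."""
    return [(k % columns, k // columns) for k in range(frame_count)]


def grid_size(frame_count: int, columns: int) -> tuple[int, int]:
    """Return (columns, rows) needed to hold frame_count frames."""
    return columns, math.ceil(frame_count / columns)


if __name__ == "__main__":
    # Assumed example: a 70-second video, one frame every 4 seconds, 8 frames per scene.
    stamps = frame_timestamps(duration=70, interval=4)
    for scene in build_scenes(stamps, frames_per_scene=8, interval=4, duration=70):
        print(scene.index, scene.start, scene.end, len(scene.timestamps))
```

With 8 frames per scene and a 4-second interval, each full scene covers 32 seconds, matching the example in the list; the last scene is simply shorter when the video length is not a multiple of 32 seconds.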