Using an agent-based approach to make ChatGPT capable of video analysis - part 2 - adding speech recognition and more
Hello. In this follow-up, we will continue developing the solution that lets ChatGPT describe videos. In my approach, the video is split into fragments, and these fragments are turned into "comics". Every fragment then gets described, and all the descriptions are sent to the master AI agent, which interacts with the user. Since the last post, I have updated the code to make it more modular and moved the AI processing out of the initializer:

```python
from moviepy.video.io.VideoFileClip import VideoFileClip
from PIL import Image
import os
import easyapiopenai
from WojSafeAPI import *  # My own library for loading API keys safely; you cannot use it.


class Video_ChatGPT:
    def __init__(self, openai_api_key: str, video_path: str,
                 time_between_frames_seconds: float, frames_per_scene: int,
                 process_audio=True, frame_height=480,
                 watcher_model='gpt-4o-mini', master_model='gpt-4o',
                 watcher_token_limit=500, master_token_limit=15*10...
```
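To make the fragment-and-"comic" pipeline described above concrete, here is a minimal sketch of the two ideas: grouping frame timestamps into fragments, and tiling a fragment's frames into a single "comic" image. The function names, the grid layout, and the grouping logic are my assumptions for illustration; the parameter names mirror the constructor arguments, but this is not the article's actual implementation:

```python
from PIL import Image


def fragment_frame_times(duration, time_between_frames_seconds, frames_per_scene):
    """Split [0, duration) into fragments; return the frame timestamps per fragment.

    Each fragment covers frames_per_scene frames spaced
    time_between_frames_seconds apart (an assumed grouping scheme).
    """
    fragment_len = time_between_frames_seconds * frames_per_scene
    fragments = []
    start = 0.0
    while start < duration:
        times = [start + i * time_between_frames_seconds
                 for i in range(frames_per_scene)]
        fragments.append([t for t in times if t < duration])
        start += fragment_len
    return fragments


def frames_to_comic(frames, columns=2, frame_height=480):
    """Tile a list of PIL images into one grid ("comic") image.

    Each frame is resized to frame_height (keeping aspect ratio),
    then pasted left-to-right, top-to-bottom into a grid.
    """
    resized = []
    for f in frames:
        w = int(f.width * frame_height / f.height)
        resized.append(f.resize((w, frame_height)))
    cell_w = max(f.width for f in resized)
    rows = (len(resized) + columns - 1) // columns
    comic = Image.new("RGB", (cell_w * columns, frame_height * rows), "white")
    for i, f in enumerate(resized):
        comic.paste(f, ((i % columns) * cell_w, (i // columns) * frame_height))
    return comic
```

In this sketch, a 10-second clip with one frame per second and four frames per scene yields three fragments, and four 640x360 frames tile into a 2x2 grid. A single grid image per fragment means one vision-model call can see the whole fragment at once, which is presumably why the "comic" trick keeps token costs down.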