Video Query AI: Search your videos like you search text 🧠🎥

The Problem
Video content is rich but hard to search. If you've ever tried to find one particular moment in a long video (a specific visual, object, or scene), you know how frustrating it can be. Unlike text, video isn't naturally searchable by keywords or phrases unless someone takes the time to manually add descriptions (and honestly, who has time for that?).
That’s where Video Query AI comes in.
With Video Query AI, you can ask a question like:
❓ "When does the chef dice the onions in this cooking tutorial?"
❓ "Where is the elephant in this video?"
❓ "Can you show me where koalas are clapping?"
...and it will instantly give you the exact timestamps where those items appear in your videos. Whether you're searching across all your content or a specific video, you'll get precise results without any manual tagging required.
Introducing Video Query AI 🧠
A fully local, privacy-first app that helps you semantically search through videos using natural language. Simply upload a video, and we’ll do the heavy lifting to process frames, generate descriptions using AI, embed them, and allow fast search.
🎯 Use cases:
Find key moments in tutorials
Detect objects/scenes in surveillance videos
Find specific scenes or moments in videos with text search
App Walkthrough ✨
1️⃣ 🏠 Home & Video Library

Land on a clean dashboard with a sidebar listing all uploaded videos.
Browse or search through your video collection easily.
2️⃣ 📤 Video Upload & Processing

Drag and drop a video file for processing.
The backend kicks off a background job to:
Extract frames using ffmpeg
Generate descriptions via LLaVA
Generate vector embeddings of those descriptions for semantic search
Real-time progress updates
Updates are streamed live via WebSockets.
Progress persists across reloads using Redis Pub/Sub for state sync.
3️⃣ 🔍 Natural Language Search

Enter a query like "Where is the elephant?" or "Chef chopping onions".
The app performs a vector similarity search against frame captions.
You’ll get timestamps + thumbnails of the best matching moments in the video.
⚙ System Architecture
Under the Hood 🛠

Video Query AI follows a modular architecture built from:
Frontend: React + TypeScript with Vite and React Router
Backend: FastAPI serving REST and WebSocket endpoints
Job Queue: Redis + RQ for background processing
Embedding Store: ChromaDB for vector search
Realtime Updates: WebSockets with Redis Pub/Sub for progress tracking and resumable streams
📼 Video Processing Flow
Here’s what happens behind the scenes when you upload a video:

Upload
- File is saved to disk and its metadata is stored in ChromaDB.
Job Queuing
- A video processing job is pushed to a Redis Queue and handled asynchronously by a worker.
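
To make the queuing step concrete, here is a minimal sketch of how a job could be pushed onto RQ. The task name `process_video`, the queue name, the one-hour timeout, and the local Redis address are illustrative assumptions, not the app's actual code.

```python
from typing import Any


def process_video(video_id: str, video_path: str) -> None:
    """Hypothetical worker task: extract frames, caption, embed, store."""
    # The pipeline steps would run here, inside the RQ worker process.
    ...


def make_queue(name: str = "video-processing") -> Any:
    """Create an RQ queue on a local Redis server (assumed host and port)."""
    # Imported lazily so this module loads even without redis/rq installed.
    from redis import Redis
    from rq import Queue

    return Queue(name, connection=Redis(host="localhost", port=6379))


def enqueue_video(queue: Any, video_id: str, video_path: str) -> str:
    """Push a processing job onto the queue and return its job id."""
    # job_timeout gives long videos room to finish (one hour is a guess).
    job = queue.enqueue(process_video, video_id, video_path, job_timeout=3600)
    return job.id
```

An upload handler could call `enqueue_video(make_queue(), video_id, path)` and hand the returned job id to the frontend so it knows which progress stream to follow. Note that RQ workers need to be able to import the task function, so in a real project it would live in a module of its own.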
Frame Extraction
- Frames are extracted from the video using ffmpeg.
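
As a rough illustration of the extraction step, the sketch below samples frames with ffmpeg's `fps` filter and maps each output file back to an approximate timestamp. The one-frame-per-second rate and the `frame_%06d.jpg` naming pattern are assumptions chosen for the example.

```python
import subprocess
from pathlib import Path
from typing import List


def build_ffmpeg_command(video_path: str, out_dir: str, fps: float = 1.0) -> List[str]:
    """Build an ffmpeg command that writes `fps` frames per second as JPEGs."""
    pattern = str(Path(out_dir) / "frame_%06d.jpg")
    return ["ffmpeg", "-i", str(video_path), "-vf", f"fps={fps}", pattern]


def frame_timestamp(frame_number: int, fps: float = 1.0) -> float:
    """Approximate timestamp in seconds; ffmpeg numbers output frames from 1."""
    return (frame_number - 1) / fps


def extract_frames(video_path: str, out_dir: str, fps: float = 1.0) -> List[Path]:
    """Run ffmpeg (must be on PATH) and return the extracted frame files."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_ffmpeg_command(video_path, out_dir, fps), check=True)
    return sorted(Path(out_dir).glob("frame_*.jpg"))
```

Sampling at a fixed rate keeps the number of frames (and LLaVA calls) proportional to the video length; a lower rate trades search precision for faster processing.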
Frame Analysis
- Each frame is sent through LLaVA (via Ollama) to generate a description.
- The description is embedded into a vector using a sentence transformer.
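
Here is one way the analysis step could look, assuming Ollama is running locally on its default port with the `llava` model pulled. The prompt text and the `all-MiniLM-L6-v2` embedding model are my assumptions for the sketch; the post does not say which sentence transformer is used.

```python
import base64
import json
import urllib.request
from functools import lru_cache
from typing import List

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
DEFAULT_PROMPT = "Describe this video frame in one or two sentences."  # assumed prompt


def build_llava_payload(image_bytes: bytes, prompt: str = DEFAULT_PROMPT) -> dict:
    """Build the JSON body for Ollama's generate endpoint with one image."""
    return {
        "model": "llava",
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # ask for a single JSON response instead of chunks
    }


def describe_frame(image_path: str) -> str:
    """Send one frame to LLaVA via Ollama and return its text description."""
    with open(image_path, "rb") as f:
        body = json.dumps(build_llava_payload(f.read())).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"].strip()


@lru_cache(maxsize=1)
def _load_embedder(model_name: str):
    """Load the sentence-transformers model once and reuse it across frames."""
    from sentence_transformers import SentenceTransformer  # lazy, heavy import

    return SentenceTransformer(model_name)


def embed_description(text: str, model_name: str = "all-MiniLM-L6-v2") -> List[float]:
    """Turn a frame description into an embedding vector (a list of floats)."""
    return _load_embedder(model_name).encode(text).tolist()
```

Embedding the text description rather than the raw image keeps the search side simple: queries and frames live in the same text embedding space.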
Storage
- Vector embeddings + metadata are stored in ChromaDB.
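
The storage step might look something like this with ChromaDB's collection API. The collection name, the on-disk path, the id format, and the metadata fields (`video_id`, `timestamp`, `thumbnail`) are assumptions I picked for illustration.

```python
from typing import Any, Dict, List


def open_collection(path: str = "./chroma_data", name: str = "video_frames") -> Any:
    """Open (or create) a persistent ChromaDB collection; names are assumed."""
    import chromadb  # imported lazily so the helpers below work without it

    client = chromadb.PersistentClient(path=path)
    return client.get_or_create_collection(name)


def frame_id(video_id: str, frame_number: int) -> str:
    """Build a unique, sortable id for one frame of one video."""
    return f"{video_id}:{frame_number:06d}"


def store_frames(collection: Any, video_id: str, frames: List[Dict[str, Any]]) -> int:
    """Store a batch of analysed frames and return how many were written.

    Each frame dict is expected to hold: frame_number, timestamp,
    description, embedding, and thumbnail (a file path).
    """
    collection.add(
        ids=[frame_id(video_id, f["frame_number"]) for f in frames],
        embeddings=[f["embedding"] for f in frames],
        documents=[f["description"] for f in frames],
        metadatas=[
            {
                "video_id": video_id,
                "timestamp": f["timestamp"],
                "thumbnail": f["thumbnail"],
            }
            for f in frames
        ],
    )
    return len(frames)
```

Keeping `video_id` in the metadata is what later makes it possible to restrict a search to a single video.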
Progress Updates
- Real-time progress is sent to the frontend via WebSockets + Redis Pub/Sub.
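
On the worker side, progress reporting could be as small as the sketch below. It assumes a `progress:<job_id>` naming scheme and a simple JSON payload, both invented for this example, and it writes the latest state to a key as well as publishing it so that late subscribers can catch up.

```python
import json
from typing import Any


def progress_key(job_id: str) -> str:
    """Redis key and channel name for a job's progress (assumed scheme)."""
    return f"progress:{job_id}"


def publish_progress(redis_client: Any, job_id: str, processed: int, total: int) -> str:
    """Save the latest progress and broadcast it; returns the JSON payload."""
    percent = round(100 * processed / total, 1) if total else 0.0
    payload = json.dumps(
        {
            "job_id": job_id,
            "processed": processed,
            "total": total,
            "percent": percent,
            "status": "done" if processed >= total else "processing",
        }
    )
    key = progress_key(job_id)
    redis_client.set(key, payload)      # snapshot for clients that connect later
    redis_client.publish(key, payload)  # live update for connected clients
    return payload
```

The worker would call `publish_progress(redis.Redis(), job_id, i, total_frames)` after each frame (or every few frames) finishes.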
🔍 Search Flow
Users can search across:
All uploaded videos
A single selected video

When a query is made:
The backend embeds the query using the same embedding model.
A vector similarity search is performed in ChromaDB.
The top 10 closest matches (timestamps + thumbnails) are returned.
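
A hedged sketch of the search flow using ChromaDB's `query` call. It assumes each stored frame carries `video_id`, `timestamp`, and `thumbnail` metadata (my naming, not necessarily the app's), and that the query has already been embedded with the same model used for the frame descriptions.

```python
from typing import Any, Dict, List, Optional


def search_frames(
    collection: Any,
    query_embedding: List[float],
    video_id: Optional[str] = None,
    top_k: int = 10,
) -> List[Dict[str, Any]]:
    """Return the closest frames, optionally limited to a single video."""
    # A metadata filter narrows the search to one video; None searches all.
    where = {"video_id": video_id} if video_id else None
    result = collection.query(
        query_embeddings=[query_embedding],
        n_results=top_k,
        where=where,
    )
    # Chroma returns one inner list per query embedding; we sent exactly one.
    matches = []
    for description, metadata, distance in zip(
        result["documents"][0], result["metadatas"][0], result["distances"][0]
    ):
        matches.append(
            {
                "video_id": metadata["video_id"],
                "timestamp": metadata["timestamp"],
                "thumbnail": metadata["thumbnail"],
                "description": description,
                "distance": distance,  # smaller means a closer match
            }
        )
    return matches
```

Passing `video_id=None` covers the "all uploaded videos" case, while passing an id covers the "single selected video" case with the same code path.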
🔁 Real-Time Progress & Resumability
Even if the user refreshes the page mid-processing:
The frontend reconnects via WebSocket.
The backend reads the current job state from Redis and resumes updates seamlessly.
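
The reconnect behaviour can be sketched as a small async helper that a WebSocket handler would call. It assumes the worker keeps the latest progress JSON in a `progress:<job_id>` key and publishes updates on a channel of the same name (a hypothetical scheme), and that the client is a `redis.asyncio.Redis` created with `decode_responses=True`.

```python
import json
from typing import Any, Awaitable, Callable


async def stream_progress(
    redis_client: Any,
    job_id: str,
    send_text: Callable[[str], Awaitable[None]],
) -> None:
    """Replay the latest saved progress, then forward live updates."""
    key = f"progress:{job_id}"  # assumed key/channel naming
    pubsub = redis_client.pubsub()
    # Subscribe before reading the snapshot so no update falls in between.
    await pubsub.subscribe(key)
    try:
        snapshot = await redis_client.get(key)
        if snapshot is not None:
            await send_text(snapshot)  # catch the client up after a reload
            if json.loads(snapshot).get("status") == "done":
                return
        async for message in pubsub.listen():
            if message["type"] != "message":
                continue  # skip subscribe confirmations
            await send_text(message["data"])
            if json.loads(message["data"]).get("status") == "done":
                break
    finally:
        await pubsub.unsubscribe(key)
```

In a FastAPI WebSocket endpoint this would be called right after `await websocket.accept()`, passing `websocket.send_text` as the `send_text` argument. Because the snapshot is read on every connect, a page refresh simply starts a new stream from the current state.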
What’s Next ✨
🔍 Multi-modal search: Use both audio and visual cues
🧩 Tag-based search: Filter by detected objects/tags
🎬 Scene-wise processing: Break videos into meaningful segments and let users search by scene, not just individual frames.
That’s all folks
Building Video Query AI was a fun way to bring together my interests in video processing, LLMs, and systems engineering. I hope it helps you explore your video content more easily or gives you an idea for something you'd like to build!