
Video Query AI: Search your videos like you search text 🧠🎥


The Problem

Video content is rich but hard to search through. If you've ever tried to find one moment in a long video (a specific visual, object, or scene), you know how frustrating it can be. Unlike text, video isn't searchable by keywords or phrases unless someone takes the time to add descriptions by hand (and honestly, who has time for that?).

That’s where Video Query AI comes in.

With Video Query AI, you can ask a question like:

"When does the chef dice the onions in this cooking tutorial?"

"Where is the elephant in this video?"

"Can you show me where koalas are clapping?"

...and it gives you the timestamps where those moments appear in your videos. Whether you're searching across all your content or within a single video, you get relevant results without any manual tagging.

Introducing Video Query AI 🧠

A fully local, privacy-first app that lets you search through videos semantically using natural language. Simply upload a video, and we’ll do the heavy lifting: extracting frames, generating descriptions with AI, embedding those descriptions, and making them fast to search.

🎯 Use cases:

  • Find key moments in tutorials

  • Detect objects/scenes in surveillance videos

  • Find specific scenes or moments in videos with text search


App Walkthrough ✨

1️⃣ 🏠 Home & Video Library

  • Land on a clean dashboard with a sidebar listing all uploaded videos.

  • Browse or search through your video collection easily.

2️⃣ 📤 Video Upload & Processing

  • Drag and drop a video file for processing.

  • The backend kicks off a background job to:

    • Extract frames using ffmpeg

    • Generate descriptions via LLaVA

    • Generate vector embeddings of those descriptions for semantic search

  • Real-time progress updates

    • Updates are streamed live via WebSockets.

    • Progress persists across reloads: job state is kept in Redis, and Redis Pub/Sub keeps connected clients in sync.

3️⃣ 🔍 Search

  • Enter a query like "Where is the elephant?" or "Chef chopping onions".

  • The app performs a vector similarity search against frame captions.

  • You’ll get timestamps + thumbnails of the best matching moments in the video.
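For the frame-extraction step, one common approach is to sample frames at a fixed rate with ffmpeg's `fps` filter and derive each frame's timestamp from its index, which is what lets search results point back to a moment in the video. The sketch below only builds the command and does the timestamp arithmetic; the one-frame-per-second rate, the `frame_%05d.jpg` naming pattern, and the helper names `build_ffmpeg_command` and `frame_timestamp` are assumptions for illustration, not the app's actual settings.

```python
from pathlib import Path


def build_ffmpeg_command(video_path: str, out_dir: str, fps: float = 1.0) -> list[str]:
    """Build an ffmpeg command that samples `fps` frames per second into numbered JPEGs."""
    # ffmpeg's image2 muxer expands %05d to 00001, 00002, ... (numbering starts at 1)
    pattern = str(Path(out_dir) / "frame_%05d.jpg")
    # :g formats 1.0 as "1" and 0.5 as "0.5"
    return ["ffmpeg", "-i", video_path, "-vf", f"fps={fps:g}", pattern]


def frame_timestamp(frame_number: int, fps: float = 1.0) -> float:
    """Approximate time in seconds of a 1-based output frame produced by the fps filter."""
    return (frame_number - 1) / fps


if __name__ == "__main__":
    cmd = build_ffmpeg_command("cooking.mp4", "frames", fps=1.0)
    print(" ".join(cmd))
```

Running it is a matter of `subprocess.run(cmd, check=True)` with ffmpeg on the PATH and the output directory already created. At 1 fps, a frame saved as `frame_00011.jpg` maps to roughly the 10-second mark, which is the kind of number a search result would report.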


⚙ System Architecture

Under the Hood 🛠

Video Query AI follows a modular architecture built from:

  • Frontend: React + TypeScript with Vite and React Router

  • Backend: FastAPI serving REST and WebSocket endpoints

  • Job Queue: Redis + RQ for background processing

  • Embedding Store: ChromaDB for vector search

  • Realtime Updates: WebSockets with Redis Pub/Sub for progress tracking and resumable streams
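The reason for putting Redis + RQ between FastAPI and the heavy work is that an upload request can return immediately with a job ID while a separate worker does the processing. The standard-library sketch below imitates that split, with `queue.Queue` and a worker thread standing in for Redis and an RQ worker. `submit_video`, `process_video`, `worker`, and the `status` dictionary are hypothetical names, not the project's code.

```python
import queue
import threading
import uuid

jobs: queue.Queue = queue.Queue()  # stand-in for the Redis-backed RQ queue
status: dict[str, str] = {}        # stand-in for job state stored in Redis
status_lock = threading.Lock()


def process_video(path: str) -> None:
    """Placeholder for the heavy work: frame extraction, captioning, embedding."""


def submit_video(path: str) -> str:
    """What an upload endpoint might do: record the job, enqueue it, return at once."""
    job_id = uuid.uuid4().hex
    with status_lock:
        status[job_id] = "queued"
    jobs.put((job_id, path))
    return job_id


def worker() -> None:
    """Stand-in for an RQ worker: handle jobs until a None sentinel arrives."""
    while True:
        item = jobs.get()
        if item is None:
            jobs.task_done()
            return
        job_id, path = item
        with status_lock:
            status[job_id] = "processing"
        try:
            process_video(path)
            result = "done"
        except Exception:
            result = "failed"  # keep the worker alive if one job fails
        with status_lock:
            status[job_id] = result
        jobs.task_done()


if __name__ == "__main__":
    t = threading.Thread(target=worker, daemon=True)
    t.start()
    job_id = submit_video("cooking.mp4")  # returns before any processing happens
    jobs.join()                            # demo only: wait for the queue to drain
    print(job_id, status[job_id])
    jobs.put(None)
    t.join()
```

In the real stack, RQ runs workers as separate processes and Redis holds the queue, so queued jobs would outlive an API restart; the in-process queue here does not.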

📼 Video Processing Flow

Here’s what happens behind the scenes when you upload a video:

  1. Upload

    • File is saved to disk and its metadata is stored in ChromaDB.
  2. Job Queuing

    • A video processing job is pushed to a Redis Queue and handled asynchronously by a worker.
  3. Frame Extraction

    • Frames are extracted from the video using ffmpeg.
  4. Frame Analysis

    • Each frame is sent through LLaVA (via Ollama) to describe it.

    • The description is embedded into a vector using a sentence transformer.

  5. Storage

    • Vector embeddings and metadata are stored in ChromaDB.
  6. Progress Updates

    • Real-time progress is sent to the frontend via WebSockets + Redis Pub/Sub.
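Steps 4 to 6 boil down to a per-frame loop: caption the frame, embed the caption, keep the vector together with its metadata, and report progress. The sketch below keeps that shape but swaps the real models for placeholders: `describe_frame` stands in for the LLaVA call through Ollama, `embed` is a toy hashed bag-of-words function rather than a sentence transformer, and a plain list of `FrameRecord` objects replaces the ChromaDB collection. All of these names are hypothetical.

```python
import hashlib
import math
import re
from dataclasses import dataclass
from typing import Callable

DIM = 64  # toy vector size; real sentence-transformer models use hundreds of dimensions


def embed(text: str) -> list[float]:
    """Toy stand-in for a sentence transformer: hashed bag of words, L2-normalised."""
    vec = [0.0] * DIM
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        # md5 gives a stable bucket across runs (Python's str hash() is randomised per process)
        vec[int(hashlib.md5(word.encode()).hexdigest(), 16) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def describe_frame(frame_path: str) -> str:
    """Placeholder for the LLaVA-via-Ollama call that captions one frame."""
    return f"a frame from {frame_path}"


@dataclass
class FrameRecord:
    video_id: str
    timestamp: float  # seconds into the video
    thumbnail: str    # path to the extracted frame
    caption: str
    vector: list[float]


def process_frames(
    video_id: str,
    frames: list[tuple[str, float]],
    publish: Callable[[dict], None],
    describe: Callable[[str], str] = describe_frame,
) -> list[FrameRecord]:
    """Caption, embed, and collect each frame, publishing progress after every frame."""
    records: list[FrameRecord] = []
    for i, (frame_path, ts) in enumerate(frames, start=1):
        caption = describe(frame_path)
        records.append(FrameRecord(video_id, ts, frame_path, caption, embed(caption)))
        publish({"video_id": video_id, "done": i, "total": len(frames)})
    return records
```

In the real app, the records would be written to ChromaDB (vectors plus the video ID, timestamp, and thumbnail as metadata), and `publish` would send to Redis Pub/Sub rather than call a Python function.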

🔍 Search Flow

Users can search across:

  • All uploaded videos

  • A single selected video

When a query is made:

  1. The backend embeds the query using the same embedding model.

  2. A vector similarity search is performed in ChromaDB.

  3. Top 10 closest matches (timestamps + thumbnails) are returned.
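Conceptually, the search flow is a cosine similarity between the query vector and every stored caption vector, optionally restricted to one video, keeping the 10 best. ChromaDB answers this with an index rather than a linear scan, so the brute-force loop below is only a sketch of the idea. Records are plain dictionaries here, and `toy_embed` with its tiny `VOCAB` is a hypothetical stand-in for the real embedding model; the important detail it preserves is that the query goes through the same `embed` function as the captions.

```python
import heapq
import math
import re
from typing import Callable, Optional


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def search(
    query: str,
    records: list[dict],
    embed: Callable[[str], list[float]],
    video_id: Optional[str] = None,
    k: int = 10,
) -> list[dict]:
    """Embed the query with the caption model, then return the k most similar frames."""
    q = embed(query)
    # video_id=None searches every video; otherwise restrict to one video
    pool = [r for r in records if video_id is None or r["video_id"] == video_id]
    scored = [(cosine(q, r["vector"]), r) for r in pool]
    best = heapq.nlargest(k, scored, key=lambda pair: pair[0])
    return [
        {"video_id": r["video_id"], "timestamp": r["timestamp"],
         "thumbnail": r["thumbnail"], "score": s}
        for s, r in best
    ]


VOCAB = ["chef", "onions", "elephant", "koala", "clapping"]


def toy_embed(text: str) -> list[float]:
    """Hypothetical toy embedding: counts of a few fixed words."""
    words = re.findall(r"[a-z]+", text.lower())
    return [float(words.count(w)) for w in VOCAB]
```

With real embeddings nearly every frame gets some nonzero score, so the top 10 is always filled; a minimum-score cutoff is one way to hide weak matches, though the post doesn't say whether the app uses one.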

🔁 Real-Time Progress & Resumability

Even if the user refreshes the page mid-processing:

  • The frontend reconnects via WebSocket.

  • The backend reads the current job state from Redis and resumes updates seamlessly.
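Resumability depends on keeping the latest progress as stored state, not only as Pub/Sub messages: Redis Pub/Sub delivers a message only to subscribers connected at that moment, so a client that reconnects needs a snapshot first and live updates after. The sketch below models that with an in-memory class standing in for Redis (a dict for stored state, callbacks for subscribers). `ProgressBus`, `publish_progress`, `on_connect`, and `on_disconnect` are hypothetical names; a real version would use Redis `SET`/`GET` plus `PUBLISH`/`SUBSCRIBE` and push each message over the WebSocket.

```python
import json
from collections import defaultdict
from typing import Callable

Send = Callable[[str], None]  # anything that delivers one message to one client


class ProgressBus:
    """In-memory stand-in for Redis: a stored snapshot per job plus pub/sub channels."""

    def __init__(self) -> None:
        self.state: dict[str, str] = {}  # like SET/GET on a per-job key
        self.subscribers: dict[str, list[Send]] = defaultdict(list)  # like SUBSCRIBE

    def publish_progress(self, job_id: str, done: int, total: int) -> None:
        """Worker side: persist the latest snapshot, then fan it out (like PUBLISH)."""
        message = json.dumps({"job_id": job_id, "done": done, "total": total})
        self.state[job_id] = message
        for send in list(self.subscribers[job_id]):
            send(message)

    def on_connect(self, job_id: str, send: Send) -> None:
        """Connection side: subscribe, then replay the stored snapshot to catch up."""
        self.subscribers[job_id].append(send)
        snapshot = self.state.get(job_id)
        if snapshot is not None:
            send(snapshot)

    def on_disconnect(self, job_id: str, send: Send) -> None:
        self.subscribers[job_id].remove(send)
```

In a concurrent server, subscribing before reading the snapshot avoids missing an update that lands in between; the cost is that a client may occasionally see the same progress value twice, which is harmless for a progress bar.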


What’s Next ✨

  • 🔍 Multi-modal search: Use both audio and visual cues

  • 🧩 Tag-based search: Filter by detected objects/tags

  • 🎬 Scene-wise processing: Break videos into meaningful segments and let users search by scene, not just individual frames.

That’s all, folks!

Building Video Query AI was a fun way to bring together my interests in video processing, LLMs, and systems engineering. I hope it helps you explore your video content more easily or gives you an idea for something you'd like to build!

Let’s connect: