Turning YouTube videos into searchable knowledge

Designing and developing a SaaS product end-to-end

Gistra

Gistra was built to solve a specific problem: existing YouTube transcript tools were either poorly designed, couldn't handle bulk operations, or lacked AI-powered features. The goal was to create something better for content marketers, students, and researchers who need to work with large amounts of video content.

At its core, Gistra lets users extract transcripts in bulk and then interact with them using AI. You can search semantically across your entire library, chat with transcripts, and generate study materials like flashcards, quizzes, and guides from multiple videos at once. The design blends clean UI with subtle neo-brutalist touches to stand out from generic AI SaaS tools.

The core challenge was building a system that could process thousands of YouTube transcripts quickly and reliably while keeping performance fair for all users. Most tools sacrifice speed for reliability or vice versa. Gistra needed to give users fast access to their transcripts while managing a job queue that wouldn't let large requests clog the system. On top of that, it had to be secure (server-side auth and payment handling), scalable, and polished throughout.

Industry

AI SaaS

Technologies

  • Astro
  • Custom microservices
  • Supabase
  • AI

Services

  • Backend
  • AI integration
  • Design
  • Microservices


Finding the gap in the market

What existing tools got wrong

Most YouTube transcript tools fall into one of three buckets: poor user experience, single-video limitations, or missing AI features. Some require users to manually paste URLs one at a time. Others have great features but are clunky and unpleasant to use. And the ones that do offer bulk processing often lack the ability to actually do anything meaningful with all that data.

Worse still, many tools force users to download every video from a channel or playlist without the ability to selectively choose what they need. That's inefficient and frustrating when you only want specific content.

Gistra was designed to address all of these issues simultaneously. The bulk extraction system is two-step: first, it pulls a list of all available videos, then lets users select exactly what they want in a modal interface. This gives users control while keeping the workflow fast.

Designing the architecture

A security-first, full-stack approach

The system is built on Astro in SSR mode, which handles the frontend and acts as a backend-for-frontend. All authentication happens server-side through Supabase, which also stores transcripts and user data. Dodo Payments handles billing, connected to both Astro and Supabase. A separate microservice stack handles the actual transcript fetching and preparation.

From day one, security was a priority. The client stays "dumb"—it never has access to auth-critical information. All auth events, payment webhooks, and account management happen in protected server actions and endpoints, and rate limiting guards sensitive actions like login, signup, and password recovery.
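As a sketch of the rate limiting described above, a fixed-window limiter keyed by client and action could look like the following. The class, thresholds, and key format are illustrative assumptions, not Gistra's actual implementation:

```typescript
// Minimal fixed-window rate limiter, keyed by e.g. "ip:action".
// Hypothetical sketch: the real limiter lives inside Gistra's
// protected server endpoints; all names here are illustrative.
type WindowState = { count: number; resetAt: number };

class RateLimiter {
  private windows = new Map<string, WindowState>();

  constructor(
    private maxAttempts: number, // e.g. 5 login attempts...
    private windowMs: number,    // ...per 15-minute window
  ) {}

  // Returns true if the attempt is allowed, false if rate-limited.
  allow(key: string, now: number = Date.now()): boolean {
    const state = this.windows.get(key);
    if (!state || now >= state.resetAt) {
      // No window yet, or the old window expired: start a fresh one.
      this.windows.set(key, { count: 1, resetAt: now + this.windowMs });
      return true;
    }
    if (state.count >= this.maxAttempts) return false;
    state.count += 1;
    return true;
  }
}

// Example: guard a login action with 5 attempts per 15 minutes.
const loginLimiter = new RateLimiter(5, 15 * 60 * 1000);
```

Because the state lives server-side and the key includes the action, a burst of password-recovery attempts can't consume the login budget, and the client never sees any of it.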

Gistra system architecture diagram
We prioritized a practical, simple, but secure architecture

Building a fault-tolerant extraction engine

Fast, fair, and scalable

When a user requests hundreds of transcripts, they need some results immediately, but their request shouldn't overwhelm the system for everyone else. The solution is a priority queue system. The first 5-10 requests from any user get priority 1, meaning they're executed immediately. The rest are assigned lower priority. This gives users a quick win while protecting system performance.
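The priority rule above reduces to a small pure function. The job shape and the `PRIORITY_WINDOW` constant below are assumptions for illustration, not Gistra's internal schema:

```typescript
// First N jobs in a user's batch run at top priority; the rest queue
// behind them so one large request can't starve other users.
const PRIORITY_WINDOW = 5; // hypothetical cutoff within the stated 5-10 range

interface TranscriptJob {
  videoId: string;
  userId: string;
  priority: number; // 1 = execute immediately, higher = later
}

function assignPriorities(userId: string, videoIds: string[]): TranscriptJob[] {
  return videoIds.map((videoId, i) => ({
    videoId,
    userId,
    priority: i < PRIORITY_WINDOW ? 1 : 2,
  }));
}
```

The user sees their first handful of transcripts arrive right away, while the long tail of the batch competes fairly in the queue.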

Reliability was equally important. The primary method fetches video metadata from the YouTube Data API (under 200 ms) and uses a microservice for transcripts (under 2 seconds for a batch of 5-10). The microservice retries up to three times, depending on the error type. If a video has no captions, we surface that to the user and refund the credit immediately.

When everything fails, we call an Apify worker as a final fallback. It's slower but very reliable. Testing across 10,000 transcripts showed this approach achieves over 98% reliability, which is strong for a service depending on proxies and external APIs.
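A sketch of this layered strategy, with hypothetical fetcher signatures standing in for the real microservice and Apify calls, might look like:

```typescript
// Retry the primary microservice (up to three times for transient
// errors), then fall back to the slower but reliable Apify worker.
// Everything here is an illustrative assumption, not Gistra's API.
type Fetcher = (videoId: string) => Promise<string>;

// Permanent failure: the video simply has no captions. Don't retry;
// surface it to the user and refund the credit upstream.
class NoCaptionsError extends Error {}

async function fetchTranscript(
  videoId: string,
  primary: Fetcher,
  fallback: Fetcher,
  maxRetries = 3,
): Promise<string> {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      return await primary(videoId);
    } catch (err) {
      if (err instanceof NoCaptionsError) throw err; // not worth retrying
      // Transient error (proxy, network, upstream): retry until exhausted.
    }
  }
  return fallback(videoId); // final fallback: slower, very reliable
}
```

Classifying errors before retrying is what keeps the 98% figure honest: retrying a caption-less video three times would only waste time and credits.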

Clear feedback as data loads

Users always know what's happening. Postgres realtime connections monitor transcript status: when viewing a library, unfinished transcripts display a loading skeleton and pop in smoothly once ready. This gives users a clear picture of what's available, even when downloading hundreds of transcripts at once.
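The skeleton logic boils down to a small piece of state handling. The status values and row shape below are assumptions for illustration; in production the updates would arrive over Supabase's Postgres realtime channel rather than being applied by hand:

```typescript
// Hypothetical transcript states; only the finished ones render content.
type Status = "queued" | "processing" | "ready" | "failed";

interface TranscriptRow {
  id: string;
  status: Status;
}

// Apply a single realtime status update to the in-memory library view.
function applyUpdate(rows: TranscriptRow[], update: TranscriptRow): TranscriptRow[] {
  return rows.map((r) => (r.id === update.id ? update : r));
}

// A card shows a loading skeleton while its transcript is still in flight.
function showsSkeleton(row: TranscriptRow): boolean {
  return row.status === "queued" || row.status === "processing";
}
```

Keeping this as a pure function of row status means the UI never needs to poll: each realtime event swaps one row, and the affected skeleton disappears on the next render.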

Details of loading skeletons and components
Some examples of UI elements that communicate loading state.

AI-powered insights and interaction

Once transcripts are extracted, the real work begins. A job queue generates embeddings for each video in the background via OpenRouter. This doesn't block users from accessing their transcripts—the embeddings arrive shortly after. The embeddings power semantic search across entire libraries, letting users find relevant content across hundreds of videos.
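In production this ranking would live in the database (for example as a pgvector query in Supabase). Purely to illustrate the underlying idea, an in-memory cosine-similarity search over stored embeddings could be sketched as:

```typescript
// Cosine similarity between two embedding vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical shape for a video with a precomputed embedding.
interface EmbeddedVideo {
  videoId: string;
  embedding: number[];
}

// Rank the library by similarity to the query embedding, best first.
function semanticSearch(query: number[], library: EmbeddedVideo[], topK = 5): string[] {
  return [...library]
    .sort((x, y) =>
      cosineSimilarity(query, y.embedding) - cosineSimilarity(query, x.embedding))
    .slice(0, topK)
    .map((v) => v.videoId);
}
```

Because the embeddings are generated in a background job, this search works over whatever portion of the library has been embedded so far, and results simply improve as the queue drains.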

From there, users can generate bulk AI interactions: chat with transcripts, create flashcards, build quizzes, and develop study guides from multiple videos at once.

Since launching, Gistra has processed over 100,000 transcripts. The time between submitting a request and receiving the first transcript typically averages under one second, though it varies with location and network conditions.

Gistra now has a strong foundation on which to grow: a robust, stable, but easily maintainable architecture.