RecapGPT — How it works

Ingest

Paste any YouTube link.

Public videos, unlisted videos (if you have the link), playlists, and channels: RecapGPT pulls the audio and metadata in seconds. We don't bypass paywalls, DRM, or private content.

Long videos, including those running four hours or more, are split into chunks, processed in parallel, and stitched back together. Most three-hour podcasts finish in under thirty seconds.

Transcribe

Audio becomes accurate text.

A specialized transcription model handles speech-to-text, using speaker diarization to label who is speaking. It adapts to the speaker's vocabulary as the transcript progresses, so technical terms get more accurate the further it goes.

Confidence scores are tracked for each segment, and any segment that falls below the threshold is flagged in the final output.

Edit

A writing model takes over.

This is where most "AI summarizer" tools stop and hand you a wall of text. RecapGPT runs a separate editing pass with a model trained on long-form journalism. It restructures, removes filler, identifies the actual argument, and writes connective prose.

Every statement in the output must map back to a span of the source transcript, so the article doesn't introduce facts that aren't in the original.

Format

Choose how you'll read it.

The same source video can become a TL;DR, a 1,500-word essay, a 12-tweet thread, and a set of hierarchical study notes, all generated in parallel. Pick the format that fits your workflow.

It joins your library.

Every recap becomes part of a private, searchable corpus. Ask natural-language questions across everything you've processed and get cited answers — every quote linked to its exact second in the source video.

Try it on your longest open tab.

That four-hour podcast you've been meaning to watch? Recap it now.

Get started — free →