ClipsAI is a Python library that gives you clip-detection and dynamic 9:16 reframing primitives. Podcli is the end-to-end product around the same idea: CLI, web UI, captions, AI scoring, MCP server.
The features that change the day-to-day for clip creators.
ClipsAI is excellent at what it sets out to do: it gives Python developers two primitives. One does clip detection from transcripts; the other does speaker-aware dynamic 9:16 reframing. You import it, you call it, you wire it into your own pipeline. The caption step, the UI, the export queue, and the AI ranking are all yours to build.
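To make "a primitive you wire into your own pipeline" concrete, here is a minimal sketch of what a transcript-in, clip-spans-out function looks like. The pause-based splitting heuristic below is purely illustrative: ClipsAI's actual `ClipFinder` uses its own segmentation, and every name here is hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Word:
    """One transcript word with timestamps (as Whisper-style ASR emits)."""
    text: str
    start: float  # seconds
    end: float

def find_clips(words, min_len=5.0, max_len=60.0, pause=1.0):
    """Split a timed transcript into candidate clips at long pauses.

    Illustrative only: this is NOT ClipsAI's algorithm, just the shape
    of a clip-detection primitive — transcript in, (start, end) spans out.
    """
    if not words:
        return []
    clips, start = [], words[0].start
    for prev, nxt in zip(words, words[1:]):
        # Cut where the speaker pauses, if the clip is long enough.
        if nxt.start - prev.end >= pause and prev.end - start >= min_len:
            clips.append((start, prev.end))
            start = nxt.start
    if words[-1].end - start >= min_len:
        clips.append((start, words[-1].end))
    # Cap each clip at max_len seconds.
    return [(s, min(e, s + max_len)) for s, e in clips]
```

Everything above this function — rendering the crops, burning captions, ranking the candidates — is the part ClipsAI leaves to you.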
Podcli is the opposite shape: a finished pipeline. Same core building blocks (Whisper, Pyannote, face detection), plus the entire surface above them. Captions rendered via Remotion. A web UI with a 9:16 phone-frame preview. AI clip scoring against a knowledge base. An MCP server so Claude Code can drive it. If you don't want to build the wrapper, Podcli is the wrapper.
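The difference in shape can be sketched as a stage chain: one object flows through transcription, clip detection, scoring, reframing, and caption rendering in order. The stage names and data layout below are invented for illustration and are not Podcli's actual module structure.

```python
# Toy sketch of a "finished pipeline" shape: each stage is a plain
# function that enriches a job dict. Names are illustrative only.

def transcribe(video):
    return {"video": video, "transcript": "..."}

def detect_clips(job):
    job["clips"] = [(0.0, 30.0)]   # candidate (start, end) spans
    return job

def score_clips(job):
    job["scores"] = [0.8]          # e.g. relevance to a knowledge base
    return job

def reframe(job):
    job["crops"] = ["9:16"]        # speaker-aware crop per clip
    return job

def render_captions(job):
    job["captioned"] = True        # caption render step
    return job

PIPELINE = [transcribe, detect_clips, score_clips, reframe, render_captions]

def run(video):
    job = video
    for stage in PIPELINE:
        job = stage(job)
    return job
```

With ClipsAI you own this chain and every stage above clip detection and reframing; with Podcli the whole chain, plus the UI and MCP layer on top, ships as the product.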
If you're shipping your own clipping product or a hosted SaaS on top of clip detection, ClipsAI is a sensible foundation. You get Python-native primitives without the rest of the opinionated pipeline. Podcli's opinions (Remotion captions, knowledge-base scoring, MCP server) become baggage if you already have your own answers.
If you are a podcaster or studio that wants the clips themselves, not a clip-detection toolkit, install Podcli and run the CLI. The pipeline is already there. Same goes for anyone who lives in Claude Code or Cursor and wants the agent to do the whole thing; that's what the MCP server is for.
Direct answers to the searches people run before they decide.
No. Podcli is an independent project. It uses similar building blocks (Whisper, Pyannote, face detection) because that's the standard stack for this task, but the code, pipeline, captions, UI, and AI scoring layers are all separate.
Yes. The CLI is a single command. The web UI is a browser app. The MCP server takes natural-language instructions. You only touch code if you want to edit a caption style (React/Remotion) or extend the knowledge base (markdown).
Both use face/speaker detection to crop 16:9 → 9:16. Podcli adds mouth-motion analysis for two-person split-screen interviews, so you don't need diarization for the camera framing, plus a scene-cut guard that suppresses jittery pans on B-roll-heavy clips.
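The mouth-motion idea is simple enough to sketch: compare motion energy around each detected mouth region and frame whichever side is talking, with a margin so the crop doesn't flicker on near-ties. This is an assumed heuristic for illustration, not Podcli's exact algorithm.

```python
def pick_active_side(left_motion, right_motion, win=5, margin=0.1):
    """Choose which half of a 2-person split-screen to frame, per window.

    left_motion / right_motion: per-frame mouth-motion energy (e.g.
    frame-to-frame pixel diff around each detected mouth region).
    A side must beat the other by `margin`, averaged over `win` frames,
    to take over the frame — the hysteresis damps jittery switching.
    Illustrative heuristic only, not Podcli's implementation.
    """
    sides, current = [], "left"
    for i in range(0, len(left_motion), win):
        chunk_l = left_motion[i:i + win]
        chunk_r = right_motion[i:i + win]
        l = sum(chunk_l) / len(chunk_l)
        r = sum(chunk_r) / len(chunk_r)
        if l > r + margin:
            current = "left"
        elif r > l + margin:
            current = "right"
        # Otherwise keep the current side (hysteresis).
        sides.extend([current] * len(chunk_l))
    return sides
```

Because the decision comes from pixel motion rather than audio, it works even when both voices share one audio track, which is why the framing step can skip diarization here.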
The setup script handles the toolchain. You'll have a clip out the other side in a few minutes.