Step‑by‑Step: Building a YouTube Research Agent with Claude Code

If you’ve ever spent an afternoon rabbit‑holing through YouTube, only to forget which videos were worth saving, you’re not alone. Now imagine a tireless assistant that can find the best videos, extract summaries, pull key quotes, timestamp insights, and return sources on demand—fast. That’s exactly what a YouTube research agent can do. In this step‑by‑step guide, we’ll build a practical YouTube research agent with Claude Code, designed for creators, analysts, students, and obsessed learners who want signal over noise.

We’ll take a practical, direct route: architecture, code, prompts, and guardrails. Along the way, we’ll make opinionated choices you can swap later. By the end, you’ll have a working agent that can search YouTube, gather transcripts, reason across multiple videos, and produce clean research briefs.

What We’re Building (and Why It Matters)

Goal: A YouTube research agent that can:

Search YouTube by query

Rank results by relevance/engagement

Fetch transcripts (auto‑captions or third‑party)

Chunk and embed content for retrieval

Use Claude Code to synthesize multi‑video insights

Output structured notes: summary, claims, timestamps, quotes, and citations


Format: Step‑by‑step tutorial with runnable code and prompts

Outputs: Markdown research brief + JSON for programmatic use

Why it matters: YouTube is one of the largest public collections of talks, lessons, demos, and debates, but it’s noisy. A YouTube research agent built with Claude Code gives you an edge: you can aggregate insights across dozens of videos in minutes, not hours.

Architecture at a Glance

We’ll keep the first version simple and robust.

Inputs: a research query (e.g., "LLM agent architectures 2025"), optional constraints (date range, channel, duration)

YouTube Search: YouTube Data API v3 (or SerpAPI fallback)

Transcripts: YouTube Transcript API; fallback to ASR (e.g., Whisper) when unavailable

Chunking: Sentence‑aware segmentation (approx 800–1,200 tokens)

Embeddings: Use a local or hosted embedding model (e.g., text-embedding-3-large, nomic-embed-text, or bge-large)

Vector Store: Local FAISS for speed; can swap to Pinecone, Weaviate, or Qdrant

Reasoning: Claude Code for orchestration, tool use, synthesis, and code execution inside a controlled loop

Outputs: Markdown report + JSON index with citations, timestamps, and scores

Data flow: Query → Search → Fetch metadata → Transcript → Chunk → Embed → Retrieve top‑K → Claude Code synthesis → Report.
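The Chunk stage of this flow could look like the sketch below. The name transcript_to_docs matches the import used later in run_agent.py, but the body, the max_words and overlap_segments parameters, and the use of word counts as a rough stand-in for tokens are all assumptions. It groups whole caption segments rather than detecting sentence boundaries, since auto-captions often lack punctuation.

```python
def transcript_to_docs(segments, max_words=800, overlap_segments=2):
    """Group caption segments into chunks that keep start/end timestamps.

    segments: list of {"text": str, "start": float, "duration": float}.
    Word count is only an approximation of token count (assumption).
    """
    docs, current, words = [], [], 0

    def flush():
        # Emit the segments collected so far as one chunk.
        docs.append({
            "text": " ".join(s["text"].strip() for s in current),
            "start": current[0]["start"],
            "end": current[-1]["start"] + current[-1].get("duration", 0.0),
        })

    for seg in segments:
        n = len(seg["text"].split())
        if current and words + n > max_words:
            flush()
            # Carry the last few segments forward so neighbouring chunks overlap.
            current = current[-overlap_segments:] if overlap_segments else []
            words = sum(len(s["text"].split()) for s in current)
        current.append(seg)
        words += n
    if current:
        flush()
    return docs
```

Keeping start and end on every chunk is what makes timestamped citations possible later in the pipeline.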

Prerequisites and Setup

Python 3.10+

API keys: YOUTUBE_API_KEY, ANTHROPIC_API_KEY (for Claude Code)

Optional: OPENAI_API_KEY or local embeddings

Libraries:

google-api-python-client, youtube-transcript-api

faiss-cpu, numpy, pandas, tiktoken (or sentencepiece)

requests, pydantic, tenacity

anthropic (Claude API)

pip install google-api-python-client youtube-transcript-api faiss-cpu numpy pandas requests pydantic tenacity anthropic tiktoken

Environment variables:

export YOUTUBE_API_KEY=YOUR_YT_KEY
export ANTHROPIC_API_KEY=YOUR_ANTHROPIC_KEY

Step 1: YouTube Search with Filters

We’ll search YouTube and return structured metadata: title, channel, publish date, duration, views (if available), and videoId.

# file: yt_search.py
import os

from googleapiclient.discovery import build

# Read the key exported in the setup section.
YOUTUBE_API_KEY = os.environ["YOUTUBE_API_KEY"]
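The yt_search.py listing above stops right after the API key lookup. What follows is a minimal sketch of how search_youtube might look, not the original code. It assumes the search.list and videos.list endpoints of the YouTube Data API v3, and the returned keys (video_id, title, channel, url, and so on) are chosen to match what run_agent.py reads in Step 6. The service and published_after parameters are hypothetical conveniences.

```python
import os


def search_youtube(query, max_results=8, published_after=None, service=None):
    """Search YouTube and return one metadata dict per video.

    published_after: optional RFC 3339 string such as "2023-01-01T00:00:00Z".
    service: hypothetical hook for passing a pre-built (or fake) API client.
    """
    if service is None:
        # Imported lazily so the function can be exercised with a fake service.
        from googleapiclient.discovery import build
        service = build("youtube", "v3", developerKey=os.environ["YOUTUBE_API_KEY"])

    params = {"q": query, "part": "snippet", "type": "video",
              "order": "relevance", "maxResults": min(max_results, 50)}
    if published_after:
        params["publishedAfter"] = published_after
    items = service.search().list(**params).execute().get("items", [])
    ids = [it["id"]["videoId"] for it in items]
    if not ids:
        return []

    # search.list carries no duration or view count, so ask videos.list for them.
    details = service.videos().list(
        part="contentDetails,statistics", id=",".join(ids)
    ).execute().get("items", [])
    by_id = {d["id"]: d for d in details}

    results = []
    for it in items:
        vid = it["id"]["videoId"]
        extra = by_id.get(vid, {})
        views = extra.get("statistics", {}).get("viewCount")
        results.append({
            "video_id": vid,
            "title": it["snippet"]["title"],
            "channel": it["snippet"]["channelTitle"],
            "published_at": it["snippet"]["publishedAt"],
            # ISO 8601 duration string, for example PT12M3S
            "duration": extra.get("contentDetails", {}).get("duration"),
            "views": int(views) if views is not None else None,
            "url": f"https://www.youtube.com/watch?v={vid}",
        })
    return results
```

Ranking by engagement can then be a simple sort on the views field, with None treated as zero.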
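# file: orchestrator.py
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Prompt wording follows the templates in "Prompts You Can Reuse"; adjust freely.
SYSTEM_PROMPT = (
    "You are a meticulous research agent. Synthesize across multiple YouTube transcripts. "
    "Cite inline with [vID @ mm:ss], and include a Sources section with URLs. "
    "Return both a Markdown brief and a JSON payload of claims with timestamped support."
)
# Literal braces are doubled so str.format() leaves the JSON schema intact.
USER_TEMPLATE = (
    "Research goal: {goal}\n\nCandidate passages (ranked):\n{passages}\n\n"
    "Output: Summary, Key Insights, Notable Quotes (with timestamps), Contradictions & Gaps, Sources (URL, channel, date)\n\n"
    "---\n"
    "JSON schema: {{\"claims\":[{{\"claim\":str,\"support\":[{{\"video_id\":str,\"start\":float,\"end\":float}}]}}]}}\n"
)

def call_claude(goal: str, passages: list[dict]) -> str:
    # One header line per passage: rank, score, video ID, and time span.
    passages_str = "\n\n".join(
        f"[rank {p['rank']} | score {p['score']:.3f}] (vID={p.get('video_id','?')}, {p.get('start',0):.1f}-{p.get('end',0):.1f})\n{p['text']}"
        for p in passages
    )
    msg = client.messages.create(
        model="claude-3-5-sonnet-20240620",
        max_tokens=1800,
        temperature=0.2,
        system=SYSTEM_PROMPT,
        messages=[
            {"role": "user", "content": USER_TEMPLATE.format(goal=goal, passages=passages_str)}
        ])
    return msg.content[0].text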

Prompt tips when building a YouTube research agent with Claude Code:

Ask for structured outputs in both human‑readable and machine‑readable formats

Enforce timestamped citations

Encourage uncertainty disclosures and contradictions
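The tip about enforcing timestamped citations can be backed by a check in code. The sketch below assumes the model's reply contains a JSON object in the claims schema from the user prompt, and that each retrieved passage carries video_id, start, and end. The helper names extract_claims and unsupported_claims and the tolerance parameter are hypothetical.

```python
import json


def extract_claims(model_text):
    """Return the last JSON object in the reply that has a "claims" key, or None."""
    decoder = json.JSONDecoder()
    found = None
    for pos, ch in enumerate(model_text):
        if ch != "{":
            continue
        try:
            obj, _ = decoder.raw_decode(model_text, pos)
        except json.JSONDecodeError:
            continue
        if isinstance(obj, dict) and "claims" in obj:
            found = obj
    return found


def unsupported_claims(payload, passages, tolerance=1.0):
    """List claims whose support does not fall inside a retrieved passage of the same video."""
    def covered(span):
        return any(
            p.get("video_id") == span.get("video_id")
            and p.get("start", 0.0) - tolerance <= span["start"]
            and span["end"] <= p.get("end", 0.0) + tolerance
            for p in passages
        )

    bad = []
    for claim in payload.get("claims", []):
        support = claim.get("support", [])
        # A claim with no support at all counts as unsupported.
        if not support or not all(covered(s) for s in support):
            bad.append(claim["claim"])
    return bad
```

A non-empty result is a reasonable signal to re-prompt or to drop the offending claims before publishing.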

Step 6: Putting It All Together

Let’s wire up query → search → transcripts → chunks → embeddings → retrieve → synthesize.

# file: run_agent.py
from datetime import datetime, timezone

from yt_search import search_youtube
from transcripts import fetch_transcript
from chunking import transcript_to_docs
from embeddings import VectorStore
from orchestrator import call_claude


def build_corpus(query: str, max_videos: int = 8) -> list[dict]:
    """Search YouTube, then turn every available transcript into tagged chunks."""
    results = search_youtube(query, max_results=max_videos)
    corpus_docs = []
    for r in results:
        tx = fetch_transcript(r["video_id"]) or []
        if not tx:
            continue  # skip videos without captions
        docs = transcript_to_docs(tx)
        for d in docs:
            # Attach video metadata so every chunk can be cited later.
            d.update({
                "video_id": r["video_id"],
                "title": r["title"],
                "channel": r["channel"],
                "url": r["url"],
            })
        corpus_docs.extend(docs)
    return corpus_docs


def research(query: str, k: int = 12) -> str:
    """Run the full pipeline and return a Markdown brief."""
    corpus = build_corpus(query)
    if not corpus:
        return "No transcripts available."
    vs = VectorStore()  # create an instance, not a reference to the class
    vs.add(corpus)
    passages = vs.search(query, k=k)
    md = call_claude(query, passages)
    # datetime.utcnow() is deprecated since Python 3.12; use an aware UTC time.
    timestamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")
    return f"<!-- generated {timestamp} UTC -->\n\n" + md


if __name__ == "__main__":
    print(research("LLM agents for YouTube research"))

This baseline version of a YouTube research agent with Claude Code will search, retrieve, and synthesize multi‑video insights with citations. Upgrade the embeddings and add caching to make it production‑ready.
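run_agent.py expects a VectorStore with add() and search(), where each hit carries rank and score for the prompt. The following is a dependency-free stand-in, not the FAISS-backed store described in the architecture: toy_embed is a hypothetical hashed bag-of-words function used only so the pipeline runs end to end, and search() does a brute-force cosine scan. Swapping in a real embedding model and a FAISS index should only require changing the embed callable and the search internals.

```python
import hashlib
import math
import re


def toy_embed(text, dim=256):
    """Hashed bag-of-words vector, L2-normalised. A placeholder, not a real embedding model."""
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # hashlib gives a hash that is stable across runs, unlike built-in hash().
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class VectorStore:
    """In-memory store exposing the add()/search() interface used by run_agent.py."""

    def __init__(self, embed=toy_embed):
        self.embed = embed
        self.docs = []
        self.vectors = []

    def add(self, docs):
        for d in docs:
            self.docs.append(d)
            self.vectors.append(self.embed(d["text"]))

    def search(self, query, k=12):
        q = self.embed(query)
        # Vectors are unit length, so the dot product equals cosine similarity.
        scored = [(sum(a * b for a, b in zip(q, v)), i)
                  for i, v in enumerate(self.vectors)]
        scored.sort(reverse=True)
        return [
            {**self.docs[i], "rank": rank, "score": score}
            for rank, (score, i) in enumerate(scored[:k], start=1)
        ]
```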

Seven Upgrades To Make It Great

Better embeddings and hybrid search

Swap in high‑quality embeddings and add BM25 keyword search. Hybrid gives more recall on niche terms and better precision on abstract topics.

Expand tools for richer metadata

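Pull comments, like and view counts, and channel authority signals such as subscriber count. Public dislike counts are no longer exposed by the YouTube Data API, so a like/dislike ratio is not available for other people's videos. Add a re‑ranker (cross‑encoder) for the top 100 candidates.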

Multi‑turn research planning

Use Claude Code to propose a research plan: sub‑questions, hypotheses, and coverage checks. Execute iteratively until coverage thresholds are met.

Evidence tracking and counter‑evidence

For each claim, log supporting and contradicting snippets. Present both in reports; add confidence scores.

Long‑video strategies

Use scene detection via subtitles or Whisper word timings. Summarize per‑section before global synthesis to avoid context dilution.

Caching and persistence

Store transcripts, embeddings, and reports per query. Reuse when users tweak filters. Add deduplication by video ID.

Export formats and delivery

Export Markdown, PDF, and JSON. Deliver by email or Slack. Render timestamps as clickable links using the t parameter in seconds (for example, ?t=754 for 12:34).
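For the clickable timestamps in the last upgrade, a small helper is enough. This sketch assumes the short youtu.be link form with a t parameter in whole seconds; the function names timestamp_link and markdown_citation are hypothetical.

```python
def timestamp_link(video_id, seconds):
    """Return (label, url) for a moment in a video; t is given in whole seconds."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        label = f"{hours}:{minutes:02d}:{secs:02d}"
    else:
        label = f"{minutes:02d}:{secs:02d}"
    return label, f"https://youtu.be/{video_id}?t={total}"


def markdown_citation(video_id, seconds):
    """Format a citation such as [abc123 @ 12:34](https://youtu.be/abc123?t=754)."""
    label, url = timestamp_link(video_id, seconds)
    return f"[{video_id} @ {label}]({url})"
```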

Prompts You Can Reuse

Use these templates while building a YouTube research agent with Claude Code.

System: You are a meticulous research agent. Synthesize across multiple YouTube transcripts. Cite inline with [vID @ mm:ss], and include a Sources section with URLs. Return both a Markdown brief and a JSON payload of claims with timestamped support.

User: Research goal: {topic}
Constraints: focus on {audience or scope}; prefer sources within {date range}; include disagreements.
Candidate passages (ranked):
{retrieved_passages}
Output: Summary → Key Insights (bullets) → Notable Quotes (with timestamps) → Contradictions & Gaps → Sources. Then JSON {"claims": ...}

Guardrails and Ethics

Respect creator rights: Link to the original videos and avoid publishing large verbatim transcripts.

Be transparent: Show where claims come from using timestamps and video IDs.

Avoid over‑summarization: Preserve nuance; flag when captions are auto‑generated and likely noisy.

Handle sensitive topics carefully: Highlight uncertainty and seek diverse sources.

Troubleshooting: Common Issues and Fixes

"No transcript found"

Fallback to Whisper; try different languages; check if the video is region‑blocked.

Bad retrieval quality

Upgrade embeddings; add BM25; increase chunk overlap; parameter‑tune top‑K.

Hallucinated citations

Force strict citation schema; penalize unsupported claims; require exact timestamps present in retrieved chunks.

API quota limits

Cache aggressively; reduce max_results; batch requests; add back‑off with tenacity.

Long‑form drift

Summarize per‑section; constrain max tokens; use planning prompts with explicit outline.
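For the "No transcript found" case in the list above, the transcript step can try several languages and fail softly. This is a sketch under assumptions: it targets youtube-transcript-api 1.x, where an instance method fetch() returns an object with to_raw_data() (older releases exposed a static get_transcript() instead), and the fetcher parameter is a hypothetical hook for plugging in a different source such as a Whisper-based fallback.

```python
def _fetch_with_library(video_id, languages):
    # Assumes youtube-transcript-api 1.x; check the installed version before relying on this.
    from youtube_transcript_api import YouTubeTranscriptApi
    return YouTubeTranscriptApi().fetch(video_id, languages=languages).to_raw_data()


def fetch_transcript(video_id, languages=("en", "en-US", "en-GB"), fetcher=None):
    """Return caption segments as dicts with text/start/duration, or None if unavailable.

    fetcher: hypothetical hook, a callable (video_id, languages) -> list of dicts.
    """
    if fetcher is None:
        fetcher = _fetch_with_library
    try:
        segments = fetcher(video_id, list(languages))
    except Exception:
        # Captions disabled, video unavailable, or no track in these languages.
        # An ASR fallback such as Whisper would be called from here.
        return None
    return [
        {"text": s["text"],
         "start": float(s["start"]),
         "duration": float(s.get("duration", 0.0))}
        for s in segments
        if s.get("text", "").strip()  # drop empty caption lines
    ]
```

Returning None on failure fits the `fetch_transcript(...) or []` pattern used in run_agent.py, which then skips the video.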

Measuring Quality

Precision@K of retrieved chunks vs. a labeled set

Faithfulness rate: proportion of claims with verifiable timestamped support

Coverage: number of unique relevant videos cited

Latency: time from query to report
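The first three metrics are simple ratios and counts, and can be sketched as below. The function names and argument shapes are assumptions: retrieved_ids is an ordered list of chunk IDs, relevant_ids is a labeled set, and is_supported is whatever check you use to verify a claim's timestamped support.

```python
def precision_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the top-k retrieved chunks that appear in the labeled relevant set."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for r in retrieved_ids[:k] if r in relevant_ids)
    return hits / k


def faithfulness_rate(claims, is_supported):
    """Proportion of claims for which is_supported(claim) returns True."""
    if not claims:
        return 0.0
    return sum(1 for c in claims if is_supported(c)) / len(claims)


def coverage(claims, relevant_video_ids):
    """Number of unique relevant videos cited across all claims."""
    cited = {s["video_id"] for c in claims for s in c.get("support", [])}
    return len(cited & set(relevant_video_ids))
```

Latency is best measured around the whole research call, for example with time.perf_counter() before and after.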

Example: Researching "Vector Databases Explained"

Query: "vector databases explained for developers 2025"

Filters: videos after 2023, duration 6–30 minutes

Outcome: Agent cites 6 videos, highlights trade‑offs of HNSW vs. IVF‑PQ, discusses cost/recall, and links to benchmarks. Contradictions section compares vendor claims vs. open‑source results.

By the Way: Automating This Inside Your Workflow

If you work across docs and code, it’s worth automating the last mile. A small CLI can run nightly queries and drop Markdown briefs into your knowledge base. You can also wire it into issue templates for sprint research.
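A small CLI of that kind might look like the sketch below. It assumes the research function from run_agent.py and writes one dated Markdown file per query; the helper names, the --out-dir flag, and the file naming scheme are hypothetical.

```python
import argparse
import re
from datetime import date
from pathlib import Path


def slugify(text):
    """Lowercase the text and replace runs of non-alphanumerics with single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def write_brief(query, out_dir, research):
    """Run research(query) and save the Markdown under out_dir; returns the file path."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    path = out / f"{date.today().isoformat()}-{slugify(query)}.md"
    path.write_text(research(query), encoding="utf-8")
    return path


def main(argv=None, research=None):
    parser = argparse.ArgumentParser(description="Write a YouTube research brief to disk.")
    parser.add_argument("query")
    parser.add_argument("--out-dir", default="briefs")
    args = parser.parse_args(argv)
    if research is None:
        # Assumes run_agent.py from Step 6 is importable.
        from run_agent import research
    print(write_brief(args.query, args.out_dir, research))
```

Scheduling is then a matter of calling main() from cron or a CI job with the queries you want refreshed.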

Worth noting: if your workflow already lives in a browser sidebar or AI assistant, tools like Sider.AI can streamline the research loop—select a topic, run a search, capture transcripts, and draft a Claude‑powered summary right where you work. This can save context switching and make building a YouTube research agent with Claude Code even more practical for teams.

Key Takeaways

Building a YouTube research agent with Claude Code is a high‑leverage way to turn videos into actionable briefs.

The minimal stack: YouTube API + transcripts + chunking + embeddings + FAISS + Claude synthesis.

Upgrade paths: hybrid search, re‑ranking, planning loops, and strict citation tracking.

Start simple, measure faithfulness, and iterate toward reliability.

Next Steps

Implement a real embedding model and hybrid retrieval

Add a re‑ranking step and quality metrics

Create a scheduled job to refresh topics weekly

Package as a CLI and a lightweight web UI

FAQ

Q1: How do I start building a YouTube research agent with Claude Code? Begin with YouTube search, fetch transcripts, chunk content, embed into a vector store, and use Claude Code to synthesize results. The guide above provides step-by-step code to assemble a working pipeline.

Q2: What libraries are best for a YouTube research agent? Use the YouTube Data API for search, youtube-transcript-api for captions, FAISS for vector search, and the Anthropic SDK to call Claude. You can swap embeddings with OpenAI, Nomic, or BGE.

Q3: How do I ensure accurate citations and timestamps? Keep start/end timestamps during chunking and require Claude Code to cite [video_id @ mm:ss]. Validate that cited timestamps exist in retrieved chunks before publishing.

Q4: Can I use this agent for private or unlisted videos? Yes, if you have access and can fetch transcripts or run local ASR (e.g., Whisper). Always respect permissions and avoid distributing copyrighted content.

Q5: How can I scale this YouTube research agent for teams? Add caching, a shared vector store, job queues, and scheduled runs. Integrate with Slack or a wiki, and consider a browser-based assistant like Sider.AI to streamline researcher workflows.