Keywords AI
This is a comprehensive guide (estimated 12–15 minutes of reading), so you may want to save it for later. By the end, you'll understand Anthropic Agent SDK Skills, the Agent Skills documentation and specification, and how to build production-ready agentic workflows.
In the modern LLM development stack, we are moving away from monolithic "God Prompts." Cramming 20,000 tokens into a System Prompt leads to several critical problems: every request pays the full token cost, latency climbs, instructions dilute and conflict with one another, and the prompt becomes nearly impossible to maintain or test.
The solution is Agent Skills (part of the Anthropic Agent SDK). It introduces a Progressive Disclosure mechanism: loading metadata first and only injecting deep instructions or local resources when the task specifically requires them.
This isn't just theory. Claude Code, Cursor, and other major AI development tools have already adopted this pattern. By the end of this guide, you'll understand how to implement agent skills in your own systems and monitor them in production with KeywordsAI.
Agent skills use a two-phase loading pattern: first, only lightweight metadata (each skill's name and description) is loaded into context; second, the full instructions and any local resources are injected only when a request actually matches a skill.
This means instead of processing 20,000 tokens on every request, you might process 500 tokens for metadata scanning, and only occasionally expand to include specific skill instructions.
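As a back-of-the-envelope sketch of that saving, using the illustrative figures from this article (five installed skills at roughly 100 metadata tokens each, one ~3,000-token skill body disclosed when a request matches, versus a 20,000-token god prompt):

```python
NUM_SKILLS = 5
METADATA_TOKENS_PER_SKILL = 100   # Level 1: name + description only
DISCLOSED_SKILL_TOKENS = 3_000    # Level 2: one matching SKILL.md body
GOD_PROMPT_TOKENS = 20_000        # everything inlined on every request

metadata_scan = NUM_SKILLS * METADATA_TOKENS_PER_SKILL   # 500 tokens, every request
progressive = metadata_scan + DISCLOSED_SKILL_TOKENS     # 3,500 tokens when a skill fires

print(f"metadata-only request: {metadata_scan} tokens")
print(f"request with one skill disclosed: {progressive} tokens")
print(f"vs god prompt: {GOD_PROMPT_TOKENS} tokens "
      f"({1 - progressive / GOD_PROMPT_TOKENS:.0%} fewer)")
```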
To get started, your project needs a standardized directory structure. This ensures that tools like Claude Code, Cursor, and your own custom agents can discover your skills.
```
.claude/
└── skills/
    ├── video-processor/             # Skill 1
    │   ├── SKILL.md                 # Core definition (Required)
    │   ├── scripts/                 # Local Python/Node scripts
    │   │   └── extract_frames.py
    │   └── references/              # Style guides or technical docs
    │       └── video_codec_guide.md
    │
    ├── code-reviewer/               # Skill 2
    │   ├── SKILL.md
    │   └── references/
    │       ├── style_guide.md
    │       └── security_checklist.md
    │
    └── data-analyst/                # Skill 3
        ├── SKILL.md
        └── scripts/
            └── query_builder.py
```
SKILL.md

Per the official Agent Skills specification, your SKILL.md uses YAML frontmatter for metadata, followed by Markdown instructions.
```markdown
---
name: video-processor
description: Triggers when the user asks to summarize video content, generate subtitles, or take automated screenshots. Use when user mentions videos, MP4, WebM, or multimedia processing.
---

# Video Processing Specialist

You are a video engineering specialist with expertise in multimedia processing.

## Core Responsibilities

### 1. Video Summarization
When summarizing videos, use timestamps for every key point:
- Format: `[MM:SS] - Description of what happens`
- Include speaker names if identifiable
- Note any visual elements that text cannot capture

### 2. Screenshot Extraction
For screenshot requests:
- Call the local Python script: `scripts/extract_frames.py`
- Default to 1 frame per second unless specified
- Save with descriptive filenames: `scene_{timestamp}_{description}.png`

### 3. Subtitle Generation
- Use SRT format by default
- Include proper timing codes
- Break lines at natural speech pauses

## Output Format
- All final deliverables should be in **GitHub-flavored Markdown**
- Include a summary section at the top
- Provide download links to generated assets

## References
- See `references/video_codec_guide.md` for technical specifications
```
Important: The metadata uses YAML frontmatter (`---`), not six dashes. This is the official format per Anthropic's Agent Skills documentation. The `description` field should be detailed and include trigger keywords. This is what Claude scans first, keeping initial token costs near zero.
Your description field is critical for skill activation. Here are effective patterns:
Good descriptions (specific, action-oriented) name concrete triggers and file types, e.g. "Triggers when the user asks to summarize video content, generate subtitles, or take automated screenshots. Use when user mentions videos, MP4, WebM, or multimedia processing."

Poor descriptions (vague, passive) give the model nothing to match on, e.g. "For video tasks" or "Handles multimedia."
Understanding the "Brain vs. Hands" distinction is vital for a robust architecture.
| Component | Role | When it Loads | Token Cost | Benefit |
|---|---|---|---|---|
| Prompt | Context | Always | Low (~500-1,000 tokens) | Sets the baseline persona and current state |
| Agent Skills | The Brain | On-Demand | Medium (~1,000-5,000 tokens) | Handles complex logic and specialized rules |
| MCP | The Hands | On Call | Variable | Provides standardized "hooks" to external apps |
User request: "Audit my latest GitHub PR and post the results to Slack"
Execution flow:
1. The Agent Skill (`code-reviewer`) is triggered by the word "audit" → loads the security checklist and style guide enforcement rules
2. MCP: `github.getPullRequest()` → fetches the PR diff
3. MCP: `slack.postMessage()` → posts the audit results

Without this separation, the security checklist, the style rules, and both API integrations would all have to live in one bloated prompt on every request. With the Trinity, each piece loads only when it is needed: the prompt sets the context, the skill supplies the expertise, and MCP handles the external calls, as sketched below.
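A minimal sketch of that flow (the `call_claude`, `github`, and `slack` arguments are hypothetical stand-ins for your own LLM wrapper and MCP client bindings, not a real SDK):

```python
def load_skill(name: str) -> dict:
    """Read a skill's SKILL.md and return its Markdown body (after the frontmatter)."""
    with open(f".claude/skills/{name}/SKILL.md") as f:
        content = f.read()
    return {"name": name, "instructions": content.split("---", 2)[-1].strip()}

def audit_pr_and_notify(base_prompt, call_claude, github, slack):
    # The Brain: "audit" in the request matched the code-reviewer skill,
    # so its SKILL.md instructions are disclosed into the system prompt.
    skill = load_skill("code-reviewer")
    system_prompt = f"{base_prompt}\n\n{skill['instructions']}"

    # The Hands: MCP tools fetch external data and deliver the result.
    diff = github.get_pull_request(repo="my-repo", number=42)        # hypothetical MCP binding
    review = call_claude(system_prompt, f"Audit this diff:\n{diff}")  # hypothetical LLM wrapper
    slack.post_message(channel="#eng", text=review)                   # hypothetical MCP binding
```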
Use Prompt for: user preferences, the current file context, and session state.

Use Agent Skills for: reusable workflows, domain expertise, and complex multi-step processes.

Use MCP for: external service connections (Slack, GitHub, databases).
The beauty of the Anthropic Agent Skills standard is that it works across your entire dev environment.
Claude Code is Anthropic's official agentic coding tool that runs in your terminal. With KeywordsAI integration, you get full observability into every skill activation, thinking block, and tool call.
Step-by-step setup:
```bash
npm install -g @anthropic-ai/claude-code
```
```bash
mkdir -p ~/.claude/skills
cd ~/.claude/skills
```
Add your SKILL.md files here. Claude Code will auto-discover them.
To capture every skill activation in KeywordsAI, add the observability hook:
Download the hook script:
```bash
# Create hooks directory
mkdir -p ~/.claude/hooks

# Download KeywordsAI hook
curl -o ~/.claude/hooks/keywordsai_hook.py \
  https://raw.githubusercontent.com/Keywords-AI/keywordsai-example-projects/main/example_scripts/python/claude_code/keywordsai_hook.py
```
Set environment variables (add to .bashrc, .zshrc, or PowerShell $PROFILE):
```bash
export KEYWORDSAI_API_KEY="your-api-key"
export TRACE_TO_KEYWORDSAI="true"

# Optional: Enable debug logging
export CC_KEYWORDSAI_DEBUG="true"
```
Configure Claude Code settings at ~/.claude/settings.json:
{ "hooks": { "Stop": [ { "hooks": [ { "type": "command", "command": "python ~/.claude/hooks/keywordsai_hook.py" } ] } ] } }
With KeywordsAI observability, every Claude Code conversation is traced:
| Data Captured | Description |
|---|---|
| Skill activations | Which skills were triggered and why |
| Thinking blocks | Extended thinking content |
| Tool calls | File reads, writes, bash commands |
| Token usage | Prompt, completion, and cache tokens |
| Timing | Skill load time and execution latency |
| Hierarchical traces | Parent-child relationships between spans |
After setup, your Claude Code traces appear in KeywordsAI with full hierarchy:
```
claudecode_abc123_turn_1 (2.5s)
├── Skill: video-processor (0.8s) - "Detected video processing request"
├── Tool: Read (0.1s) - {"path": "scripts/extract_frames.py"}
├── Thinking (0.5s) - "I'll extract frames at 1 fps..."
├── Tool: Bash (1.0s) - "python scripts/extract_frames.py input.mp4"
└── Token usage: 1,234 prompt / 567 completion / 200 cache
```
For complete setup details, see Claude Code Observability with KeywordsAI.
Cursor has native support for agent skills in the nightly build:
Step-by-step setup:
Enable Nightly Channel: switch Cursor to the Nightly update channel in its settings and restart the editor so agent skills support is available.
Create Skills Directory:
```bash
mkdir -p .claude/skills
cd .claude/skills
```
Add Your Skills:
- Add SKILL.md files with proper YAML frontmatter

Verify Discovery:

- Confirm the agent picks up your skills by sending a request that should trigger one of them
Usage:
- Use @Agent in chat to invoke agentic mode

Cursor + KeywordsAI: While Cursor doesn't yet have the same observability hooks as Claude Code, you can capture Cursor agent traces using the Cursor Agent Tracing setup.
Pro tip: Skills work best with @Agent mode rather than inline chat. Agent mode has higher context limits and better tool-calling support.
For those building their own wrappers via the Anthropic API, you'll need to manually implement skill discovery and disclosure.
Required API headers:
```
anthropic-version: 2023-06-01
anthropic-beta: skills-2025-10-02,code-execution-2025-08-25
```
Implementation pseudocode:
```python
import anthropic
import os
import glob

def load_skill_metadata(skills_dir=".claude/skills"):
    """Scan for SKILL.md files and extract metadata"""
    skills = []
    for skill_path in glob.glob(f"{skills_dir}/*/SKILL.md"):
        with open(skill_path) as f:
            content = f.read()
        # Extract the YAML frontmatter between the first pair of --- delimiters
        metadata_section = content.split("---")[1]
        # Parse name and description
        # ... parsing logic ...
        skills.append({
            "name": name,
            "description": description,
            "full_path": skill_path
        })
    return skills

def should_disclose_skill(skill, user_message, conversation_history):
    """
    Determine if skill should be loaded based on:
    - Keyword matching in description
    - Semantic similarity
    - Explicit user request
    """
    # Simple keyword matching
    keywords = extract_keywords(skill["description"])
    if any(kw in user_message.lower() for kw in keywords):
        return True

    # More sophisticated: use embeddings
    # similarity = cosine_similarity(
    #     embed(skill["description"]),
    #     embed(user_message)
    # )
    # return similarity > 0.7

    return False

def build_system_prompt(base_prompt, active_skills):
    """Construct the system prompt with disclosed skills"""
    prompt_parts = [base_prompt]

    for skill in active_skills:
        with open(skill["full_path"]) as f:
            content = f.read()
        # Extract everything after the closing --- of the frontmatter
        instructions = content.split("---")[2]
        prompt_parts.append(f"\n\n# {skill['name']} Skill\n{instructions}")

    return "\n".join(prompt_parts)

# Usage
client = anthropic.Anthropic(api_key=os.environ.get("ANTHROPIC_API_KEY"))

# 1. Load skill metadata (do this once at startup)
available_skills = load_skill_metadata()

# 2. For each user message, decide which skills to disclose
user_message = "Can you review this pull request for security issues?"
active_skills = [
    skill for skill in available_skills
    if should_disclose_skill(skill, user_message, [])
]

# 3. Build the final system prompt
base_prompt = "You are a helpful coding assistant."
system_prompt = build_system_prompt(base_prompt, active_skills)

# 4. Make the API call
message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=4096,
    system=system_prompt,
    messages=[
        {"role": "user", "content": user_message}
    ]
)
```
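The pseudocode above leaves out the frontmatter parsing. A minimal sketch of that step, assuming PyYAML is installed and the SKILL.md layout shown earlier in this guide:

```python
import yaml

def parse_skill_file(path: str) -> dict:
    """Split a SKILL.md into its YAML frontmatter and Markdown body."""
    with open(path, encoding="utf-8") as f:
        content = f.read()

    # The frontmatter sits between the first two "---" delimiters.
    _, frontmatter, body = content.split("---", 2)
    metadata = yaml.safe_load(frontmatter)

    return {
        "name": metadata["name"],
        "description": metadata["description"],
        "instructions": body.strip(),
        "full_path": path,
    }
```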
Key considerations for production: cache the metadata scan at startup instead of re-reading SKILL.md files on every request, log which skills were disclosed and why, and monitor per-request token usage so disclosure stays cheap.
A major question for DevRel and Engineering teams is: How do we manage and monitor these skills in production?
This is where KeywordsAI becomes essential. It provides both a management layer for your skills and full observability into how they're being used.
The Problem: In the basic setup, skills live in .claude/skills/ within your codebase. But what if you want to update a skill without redeploying, share the same skill across multiple projects and teams, or A/B test different versions of its instructions?
The Solution: Host your skill instructions in KeywordsAI Prompt Management.
Setup:
Create a Skill Registry in KeywordsAI:
```python
# Instead of reading from file:
# skill_content = open(".claude/skills/video-processor/SKILL.md").read()

# Fetch from KeywordsAI:
import requests

response = requests.get(
    "https://api.keywordsai.co/api/prompts/video-processor",
    headers={"Authorization": f"Bearer {KEYWORDS_AI_API_KEY}"}
)
skill_content = response.json()["content"]
```
Benefits: skills are centrally versioned and editable without a redeploy, the same skill can be shared across projects and teams, and instruction changes can be rolled out or rolled back like any other prompt.
A/B Testing Skills:
```python
# KeywordsAI can serve different skill versions to different users
response = requests.get(
    "https://api.keywordsai.co/api/prompts/video-processor",
    headers={
        "Authorization": f"Bearer {KEYWORDS_AI_API_KEY}",
        "X-User-ID": user_id  # KeywordsAI handles A/B assignment
    }
)
```
The Problem: In an agentic workflow, a single user prompt might trigger multiple skill activations, extended thinking blocks, local script executions, and external tool calls.
How do you debug when something goes wrong? How do you know if a skill is working effectively?
The Solution: KeywordsAI Observability.
1. Skill Activation Tracking
KeywordsAI automatically detects when agent skills are activated:
{ "trace_id": "claudecode_abc123_turn_1", "workflow_name": "claudecode_abc123", "thread_id": "claudecode_abc123", "spans": [ { "span_id": "span_001", "span_type": "agent", "name": "User Request", "prompt_messages": [{"role": "user", "content": "Extract frames from video.mp4"}], "completion": "I'll use the video-processor skill...", "children": ["span_002", "span_003"] }, { "span_id": "span_002", "span_type": "generation", "name": "Skill: video-processor", "prompt_messages": [{"role": "system", "content": "# Video Processing Specialist\n..."}], "metadata": { "skill_name": "video-processor", "skill_loaded": true, "load_time_ms": 45, "token_count": 3240 } }, { "span_id": "span_003", "span_type": "tool", "name": "Tool: Bash", "input": {"command": "python scripts/extract_frames.py video.mp4"}, "output": "Extracted 240 frames to output/", "latency_ms": 1200 } ], "total_tokens": { "prompt": 1234, "completion": 567, "cache_creation": 200, "cache_read": 3000 } }
Key metrics tracked: which skill was activated, how long it took to load (load_time_ms), how many tokens its instructions added, and the latency and output of every downstream tool call.
2. Prompt Expansion Analysis
See exactly when and why skills were disclosed:
{ "request_id": "req_abc123", "user_message": "Review this PR for security issues", "skill_metadata_scanned": [ {"name": "video-processor", "matched": false}, {"name": "code-reviewer", "matched": true, "reason": "keyword: 'review'"}, {"name": "data-analyst", "matched": false} ], "skills_disclosed": [ { "name": "code-reviewer", "token_count": 3240, "disclosure_time_ms": 45 } ], "total_prompt_tokens": 3740 }
3. Cost & Token Monitoring
Track the efficiency gains from progressive disclosure:
| Metric | Without Skills (God Prompt) | With Skills (Progressive) | Savings |
|---|---|---|---|
| Average Tokens/Request | 18,500 | 4,200 | 77% |
| Cost/Request | $0.37 | $0.08 | 78% |
| Latency (ms) | 3,200 | 1,100 | 66% |
KeywordsAI tracks these metrics automatically and shows you the ROI of your skill architecture.
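As a quick sanity check, the savings percentages in the table follow directly from the raw numbers:

```python
# (god prompt, progressive) pairs from the table above
rows = {
    "tokens/request": (18_500, 4_200),
    "cost/request":   (0.37, 0.08),
    "latency (ms)":   (3_200, 1_100),
}

for metric, (god_prompt, progressive) in rows.items():
    saving = (god_prompt - progressive) / god_prompt
    print(f"{metric}: {saving:.0%} saved")   # ~77%, ~78%, ~66%
```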
4. Tool Success Rate
If your skills use local scripts (like extract_frames.py), KeywordsAI can monitor their success:
{ "skill": "video-processor", "tool_call": "scripts/extract_frames.py", "status": "failed", "error": "FileNotFoundError: video.mp4 not found", "stack_trace": "...", "timestamp": "2026-01-14T10:30:00Z" }
5. Skill Performance Analytics
KeywordsAI dashboard shows: activation frequency per skill, average token cost and latency per activation, tool success rates for bundled scripts, and how these metrics trend across skill versions.
Step 1: Install the SDK
```bash
pip install keywordsai
```
Step 2: Wrap Your Agent
```python
import os

from keywordsai import KeywordsAI

# Initialize
kai = KeywordsAI(api_key=os.environ["KEYWORDS_AI_API_KEY"])

# Wrap your agent function
@kai.trace_agent(name="code-review-agent")
def process_user_request(user_message):
    # Your existing skill discovery logic
    active_skills = discover_skills(user_message)

    # Log skill disclosure
    kai.log_skill_disclosure(
        skills=[s["name"] for s in active_skills],
        trigger=user_message
    )

    # Build prompt with skills
    system_prompt = build_system_prompt(base_prompt, active_skills)

    # Make LLM call (automatically traced)
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=4096,
        system=system_prompt,
        messages=[{"role": "user", "content": user_message}]
    )

    return response

# All calls are now traced in KeywordsAI
result = process_user_request("Review this PR for security issues")
```
Step 3: View in Dashboard
Navigate to the KeywordsAI dashboard to see each traced request, the skills that were disclosed and why, the fully expanded prompt, token and cost breakdowns, and every tool call made along the way.
Once you have observability, you can optimize:
1. Skill Description Tuning
If a skill has low activation when it should trigger, the description field is probably too vague — add the concrete trigger keywords users actually type and re-test.

2. Instruction Pruning
If a skill is disclosed often but rarely improves the answer, prune its instructions to cut token cost or tighten its description so it activates less often — the trace data sketched below helps you spot this.
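For example, a rough sketch of how you might compute disclosure and usefulness rates from exported trace data (the field names here are assumptions for illustration, not the KeywordsAI export schema):

```python
from collections import Counter

def skill_stats(traces: list[dict]) -> dict:
    """traces: e.g. [{"skills_disclosed": ["code-reviewer"], "skill_improved_answer": True}, ...]"""
    disclosed, useful = Counter(), Counter()
    for trace in traces:
        for name in trace.get("skills_disclosed", []):
            disclosed[name] += 1
            if trace.get("skill_improved_answer"):
                useful[name] += 1
    return {
        name: {"disclosures": count, "useful_rate": useful[name] / count}
        for name, count in disclosed.items()
    }

# A skill with many disclosures but a low useful_rate is a pruning candidate.
```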
3. Skill Splitting
If one skill handles multiple unrelated tasks, split it into smaller, focused skills so only the relevant expertise is disclosed.

Example:
- Before: `code-reviewer` (8,000 tokens, handles security + style + performance)
- After: `security-auditor` (2,500 tokens), `style-enforcer` (1,800 tokens), `performance-analyzer` (3,200 tokens)

Understanding the official Agent Skills specification is crucial for building portable, interoperable skills that work across all platforms.
Agent Skills use progressive disclosure with three distinct loading levels:
Level 1 — Metadata: The YAML frontmatter is loaded at startup and included in the system prompt:
```yaml
---
name: pdf-processing
description: Extract text and tables from PDF files, fill forms, merge documents. Use when working with PDF files or when the user mentions PDFs, forms, or document extraction.
---
```
Token cost: ~100 tokens per skill
When loaded: At agent startup
Impact: You can install dozens of skills with minimal context penalty
Level 2 — Instructions: The main body of SKILL.md contains procedural knowledge:
```markdown
# PDF Processing Specialist

## Quick Start

Use pdfplumber to extract text from PDFs:

    import pdfplumber
    with pdfplumber.open("document.pdf") as pdf:
        text = pdf.pages[0].extract_text()

For advanced form filling, see [FORMS.md](FORMS.md).
```
Token cost: Under 5,000 tokens
When loaded: When skill description matches user request
Impact: Only relevant skills consume context
Level 3 — Bundled resources: Additional files are accessed via the filesystem:
```
pdf-skill/
├── SKILL.md          # Main instructions
├── FORMS.md          # Form-filling guide (loaded only when referenced)
├── REFERENCE.md      # Detailed API docs (loaded only when needed)
└── scripts/
    └── fill_form.py  # Executed via bash (code never loads into context)
```
Token cost: Effectively unlimited
When loaded: Only when explicitly referenced
Impact: Scripts execute without consuming context; documentation files load on-demand
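A minimal sketch of that pattern, using the example layout above — the agent (or your harness) runs the bundled script, and only its output ever reaches the model's context:

```python
import subprocess

# The script's source code never enters the context window;
# only its stdout/stderr (the result) is fed back to the agent.
result = subprocess.run(
    ["python", "pdf-skill/scripts/fill_form.py", "input.pdf", "output.pdf"],
    capture_output=True,
    text=True,
)
agent_observation = result.stdout if result.returncode == 0 else result.stderr
print(agent_observation)
```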
Agent Skills run in a code execution environment where Claude has filesystem access and shell tools such as `cat` and `ls`, so it can read bundled files and execute scripts on demand.

Example loading sequence:
1. At startup, only the metadata is in context: `pdf-processing - Extract text and tables from PDF files...`
2. The user asks a PDF question, so the skill matches
3. Claude runs `bash: cat pdf-skill/SKILL.md` → Instructions loaded

Per the official specification:
Required fields:
- `name`: Lowercase letters, numbers, hyphens only (max 64 chars)
- `description`: Clear description of what the skill does and when to use it (max 1,024 chars)

Prohibited in both fields: XML tags.
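A small validation sketch for those constraints (the limits mirror the spec quoted above; the function itself is illustrative):

```python
import re

def validate_skill_metadata(name: str, description: str) -> list[str]:
    """Return a list of frontmatter rule violations (empty list means valid)."""
    errors = []
    if not re.fullmatch(r"[a-z0-9-]{1,64}", name):
        errors.append("name: lowercase letters, numbers, hyphens only (max 64 chars)")
    if not description or len(description) > 1024:
        errors.append("description: required, max 1,024 chars")
    if re.search(r"<[^>]+>", f"{name} {description}"):
        errors.append("name/description: markup such as XML tags is not allowed")
    return errors
```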
Description best practices:
❌ Too vague: "For video tasks"
✅ Specific and trigger-rich: "Triggers when user mentions: video, mp4, webm, summarize footage, extract frames, generate subtitles, or multimedia processing"
Understanding when to use each tool:
| Feature | System Prompt | Agent Skills | MCP |
|---|---|---|---|
| Purpose | Current context | Specialized expertise | External tool access |
| When loaded | Always | On-demand | On-call |
| Token cost | ~500-1,000 | ~100-5,000 | Metadata only |
| Best for | Session state | Domain knowledge | API integrations |
| Example | "You are helpful" | "Video processing workflows" | "GitHub API connector" |
Use System Prompt for: User preferences, current file context, session state
Use Agent Skills for: Reusable workflows, domain expertise, complex multi-step processes
Use MCP for: External service connections (Slack, GitHub, databases)
The Agent Skills standard ensures write once, use everywhere:
✅ Works in:
- Claude Code
- Cursor (nightly build)
- Your own agents built on the Anthropic API (via the `skill_id` parameter)

Same SKILL.md format across all platforms without requiring platform-specific modifications.
Some tasks require multiple skills in sequence:
# User: "Analyze this video and generate a report" # Chain: video-processor → data-analyst → report-generator def handle_complex_task(user_message): # Phase 1: Identify required skills skill_chain = plan_skill_chain(user_message) # Result: ["video-processor", "data-analyst", "report-generator"] # Phase 2: Execute in order context = {} for skill_name in skill_chain: skill = load_skill(skill_name) result = execute_skill(skill, context) context[skill_name] = result # Pass results to next skill return context["report-generator"]
KeywordsAI tracing will show this as a linked chain of requests, making it easy to debug multi-stage workflows.
If a skill fails, have a backup:
```python
def execute_with_fallback(primary_skill, fallback_skill, context):
    try:
        return execute_skill(primary_skill, context)
    except Exception as e:
        kai.log_skill_failure(primary_skill, error=str(e))
        return execute_skill(fallback_skill, context)
```
Different users may need different skills:
```python
def load_user_skills(user_id):
    base_skills = load_skill_metadata(".claude/skills")

    # Check user permissions
    user_permissions = get_user_permissions(user_id)

    # Filter skills by permission
    allowed_skills = [
        skill for skill in base_skills
        if skill["name"] in user_permissions["allowed_skills"]
    ]

    return allowed_skills
```
KeywordsAI can track per-user skill usage and help you understand which roles need which capabilities.
✅ Skill Design:
- Use YAML frontmatter (`---`, not `------`)
- `name`: lowercase, hyphens, numbers only (max 64 chars)
- `description`: clear, specific, max 1,024 chars

✅ Performance:

- Keep the SKILL.md body under ~5,000 tokens; move deep detail into reference files
- Cache the metadata scan at startup instead of re-reading skills on every request
✅ Maintenance:
- Version skills when making breaking changes (e.g., SKILL.v2.md)

✅ Security:

- Review bundled scripts before letting the agent execute them
- Restrict which skills each user or role can load (see the permission-based loading pattern above)
✅ Observability:

- Trace every skill activation, tool call, and thinking block in KeywordsAI
- Monitor token usage, cost, and latency per skill
- Track tool success rates for bundled scripts
The shift from monolithic prompts to Anthropic Agent SDK Skills represents a fundamental change in how we build with LLMs. By adopting progressive disclosure, you gain lower token costs, faster responses, and modular expertise that you can version, test, and monitor independently.
The combination of Agent Skills for the brain, MCP for the hands, and KeywordsAI for the observability gives you a production-ready stack for building sophisticated AI systems.
Week 1: Set up your first skill
- Create the `.claude/skills/` directory and add your first SKILL.md with YAML frontmatter

Week 2: Integrate KeywordsAI

- Install the observability hook and confirm that skill activations show up as traces
Week 3: Expand and optimize

- Add more skills, tune descriptions that under-trigger, and split oversized skills
Month 2+: Production at scale

- Host skill instructions in KeywordsAI prompt management, A/B test versions, and enforce per-user skill permissions
Official Documentation: Anthropic's Agent Skills specification and the Claude Code documentation.

KeywordsAI Resources: the Claude Code Observability and Cursor Agent Tracing guides linked above.

Community & Examples: the Keywords-AI/keywordsai-example-projects repository on GitHub (used for the hook script earlier in this guide).
This guide covered: why monolithic "God Prompts" break down, how progressive disclosure and the SKILL.md format work, how to use skills in Claude Code, Cursor, and custom API integrations, how Prompt, Agent Skills, and MCP divide responsibilities, and how KeywordsAI provides skill management and full observability in production.
Ready to build the future of AI applications? Start with your first skill today, and use KeywordsAI to ensure it's production-ready from day one.