Keywords AI
This is a comprehensive guide (estimated 12–15 minutes of reading), so you may want to save it for later. By the end, you'll understand Anthropic Agent SDK Skills, the Agent Skills documentation and specification, and how to build production-ready agentic workflows.
In the modern LLM development stack, we are moving away from monolithic "God Prompts." Cramming 20,000 tokens into a System Prompt leads to several critical problems: every request pays the full token cost, latency climbs, instructions dilute and conflict with one another, and the prompt becomes nearly impossible to maintain or test.
The solution is Agent Skills (part of the Anthropic Agent SDK). It introduces a Progressive Disclosure mechanism: loading metadata first and only injecting deep instructions or local resources when the task specifically requires them.
This isn't just theory. Claude Code, Cursor, and other major AI development tools have already adopted this pattern. By the end of this guide, you'll understand how to implement agent skills in your own systems and monitor them in production with KeywordsAI.
Agent skills use a two-phase loading pattern: first, only lightweight metadata (each skill's name and description) is loaded into context; second, the full instructions and any local resources are injected only when a request actually matches a skill.
This means instead of processing 20,000 tokens on every request, you might process 500 tokens for metadata scanning, and only occasionally expand to include specific skill instructions.
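As a back-of-the-envelope sketch of that saving, using the illustrative figures from this article (five installed skills at roughly 100 metadata tokens each, one ~3,000-token skill body disclosed when a request matches, versus a 20,000-token god prompt):

```python
NUM_SKILLS = 5
METADATA_TOKENS_PER_SKILL = 100   # Level 1: name + description only
DISCLOSED_SKILL_TOKENS = 3_000    # Level 2: one matching SKILL.md body
GOD_PROMPT_TOKENS = 20_000        # everything inlined on every request

metadata_scan = NUM_SKILLS * METADATA_TOKENS_PER_SKILL   # 500 tokens, every request
progressive = metadata_scan + DISCLOSED_SKILL_TOKENS     # 3,500 tokens when a skill fires

print(f"metadata-only request: {metadata_scan} tokens")
print(f"request with one skill disclosed: {progressive} tokens")
print(f"vs god prompt: {GOD_PROMPT_TOKENS} tokens "
      f"({1 - progressive / GOD_PROMPT_TOKENS:.0%} fewer)")
```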
To get started, your project needs a standardized directory structure. This ensures that tools like Claude Code, Cursor, and your own custom agents can discover your skills.
```
.claude/
└── skills/
    ├── video-processor/             # Skill 1
    │   ├── SKILL.md                 # Core definition (Required)
    │   ├── scripts/                 # Local Python/Node scripts
    │   │   └── extract_frames.py
    │   └── references/              # Style guides or technical docs
    │       └── video_codec_guide.md
    │
    ├── code-reviewer/               # Skill 2
    │   ├── SKILL.md
    │   └── references/
    │       ├── style_guide.md
    │       └── security_checklist.md
    │
    └── data-analyst/                # Skill 3
        ├── SKILL.md
        └── scripts/
            └── query_builder.py
```
SKILL.md

Per the official Agent Skills specification, your SKILL.md uses YAML frontmatter for metadata, followed by Markdown instructions.
```markdown
---
name: video-processor
description: Triggers when the user asks to summarize video content, generate subtitles, or take automated screenshots. Use when user mentions videos, MP4, WebM, or multimedia processing.
---

# Video Processing Specialist

You are a video engineering specialist with expertise in multimedia processing.

## Core Responsibilities

### 1. Video Summarization
When summarizing videos, use timestamps for every key point:
- Format: `[MM:SS] - Description of what happens`
- Include speaker names if identifiable
- Note any visual elements that text cannot capture

### 2. Screenshot Extraction
For screenshot requests:
- Call the local Python script: `scripts/extract_frames.py`
- Default to 1 frame per second unless specified
- Save with descriptive filenames: `scene_{timestamp}_{description}.png`

### 3. Subtitle Generation
- Use SRT format by default
- Include proper timing codes
- Break lines at natural speech pauses

## Output Format
- All final deliverables should be in **GitHub-flavored Markdown**
- Include a summary section at the top
- Provide download links to generated assets

## References
- See `references/video_codec_guide.md` for technical specifications
```
Important: The metadata uses YAML frontmatter (`---`), not six dashes. This is the official format per Anthropic's Agent Skills documentation. The `description` field should be detailed and include trigger keywords. This is what Claude scans first, keeping initial token costs near zero.
Your description field is critical for skill activation. Here are effective patterns:
Good descriptions (specific, action-oriented) name concrete triggers and file types, e.g. "Triggers when the user asks to summarize video content, generate subtitles, or take automated screenshots. Use when user mentions videos, MP4, WebM, or multimedia processing."

Poor descriptions (vague, passive) give the model nothing to match on, e.g. "For video tasks" or "Handles multimedia."
Understanding the "Brain vs. Hands" distinction is vital for a robust architecture.
| Component | Role | When it Loads | Token Cost | Benefit |
|---|---|---|---|---|
| Prompt | Context | Always | Low (~500-1,000 tokens) | Sets the baseline persona and current state |
| Agent Skills | The Brain | On-Demand | Medium (~1,000-5,000 tokens) | Handles complex logic and specialized rules |
| MCP | The Hands | On Call | Variable | Provides standardized "hooks" to external apps |
User request: "Audit my latest GitHub PR and post the results to Slack"
Execution flow:
1. The Agent Skill (`code-reviewer`) is triggered by the word "audit" → loads the security checklist and style guide enforcement rules
2. MCP: `github.getPullRequest()` → fetches the PR diff
3. MCP: `slack.postMessage()` → posts the audit results

Without this separation, the security checklist, the style rules, and both API integrations would all have to live in one bloated prompt on every request. With the Trinity, each piece loads only when it is needed: the prompt sets the context, the skill supplies the expertise, and MCP handles the external calls, as sketched below.
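A minimal sketch of that flow (the `call_claude`, `github`, and `slack` arguments are hypothetical stand-ins for your own LLM wrapper and MCP client bindings, not a real SDK):

```python
def load_skill(name: str) -> dict:
    """Read a skill's SKILL.md and return its Markdown body (after the frontmatter)."""
    with open(f".claude/skills/{name}/SKILL.md") as f:
        content = f.read()
    return {"name": name, "instructions": content.split("---", 2)[-1].strip()}

def audit_pr_and_notify(base_prompt, call_claude, github, slack):
    # The Brain: "audit" in the request matched the code-reviewer skill,
    # so its SKILL.md instructions are disclosed into the system prompt.
    skill = load_skill("code-reviewer")
    system_prompt = f"{base_prompt}\n\n{skill['instructions']}"

    # The Hands: MCP tools fetch external data and deliver the result.
    diff = github.get_pull_request(repo="my-repo", number=42)        # hypothetical MCP binding
    review = call_claude(system_prompt, f"Audit this diff:\n{diff}")  # hypothetical LLM wrapper
    slack.post_message(channel="#eng", text=review)                   # hypothetical MCP binding
```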
Use Prompt for: user preferences, the current file context, and session state.

Use Agent Skills for: reusable workflows, domain expertise, and complex multi-step processes.

Use MCP for: external service connections (Slack, GitHub, databases).
The beauty of the Anthropic Agent Skills standard is that it works across your entire dev environment.
Claude Code is Anthropic's official agentic coding tool that runs in your terminal. With KeywordsAI integration, you get full observability into every skill activation, thinking block, and tool call.
Step-by-step setup:
```bash
npm install -g @anthropic-ai/claude-code
```
```bash
mkdir -p ~/.claude/skills
cd ~/.claude/skills
```
Add your SKILL.md files here. Claude Code will auto-discover them.
To capture every skill activation in KeywordsAI, add the observability hook:
Download the hook script:
```bash
# Create hooks directory
mkdir -p ~/.claude/hooks

# Download KeywordsAI hook
curl -o ~/.claude/hooks/keywordsai_hook.py \
  https://raw.githubusercontent.com/Keywords-AI/keywordsai-example-projects/main/example_scripts/python/claude_code/keywordsai_hook.py
```
Set environment variables (add to .bashrc, .zshrc, or PowerShell $PROFILE):
```bash
export KEYWORDSAI_API_KEY="your-api-key"
export TRACE_TO_KEYWORDSAI="true"

# Optional: Enable debug logging
export CC_KEYWORDSAI_DEBUG="true"
```
Configure Claude Code settings at ~/.claude/settings.json:
{ "hooks": { "Stop": [ { "hooks": [ { "type": "command", "command": "python ~/.claude/hooks/keywordsai_hook.py" } ] } ] } }
With KeywordsAI observability, every Claude Code conversation is traced:
| Data Captured | Description |
|---|---|
| Skill activations | Which skills were triggered and why |
| Thinking blocks | Extended thinking content |
| Tool calls | File reads, writes, bash commands |
| Token usage | Prompt, completion, and cache tokens |
| Timing | Skill load time and execution latency |
| Hierarchical traces | Parent-child relationships between spans |
After setup, your Claude Code traces appear in KeywordsAI with full hierarchy:
```
claudecode_abc123_turn_1 (2.5s)
├── Skill: video-processor (0.8s) - "Detected video processing request"
├── Tool: Read (0.1s) - {"path": "scripts/extract_frames.py"}
├── Thinking (0.5s) - "I'll extract frames at 1 fps..."
├── Tool: Bash (1.0s) - "python scripts/extract_frames.py input.mp4"
└── Token usage: 1,234 prompt / 567 completion / 200 cache
```
For complete setup details, see Claude Code Observability with KeywordsAI.
Cursor has native support for agent skills in the nightly build:
Step-by-step setup:
Enable Nightly Channel: switch Cursor to the Nightly update channel in its settings and restart the editor so agent skills support is available.
Create Skills Directory:
```bash
mkdir -p .claude/skills
cd .claude/skills
```
Add Your Skills:
- Add SKILL.md files with proper YAML frontmatter

Verify Discovery:

- Confirm the agent picks up your skills by sending a request that should trigger one of them
Usage:
- Use @Agent in chat to invoke agentic mode

Cursor + KeywordsAI: While Cursor doesn't yet have the same observability hooks as Claude Code, you can capture Cursor agent traces using the Cursor Agent Tracing setup.
Pro tip: Skills work best with @Agent mode rather than inline chat. Agent mode has higher context limits and better tool-calling support.
For those building their own wrappers via the Anthropic API, you'll need to manually implement skill discovery and disclosure.
Required API headers:
```
anthropic-version: 2023-06-01
anthropic-beta: skills-2025-10-02,code-execution-2025-08-25
```
Implementation pseudocode:
```python
import anthropic
import os
import glob

def load_skill_metadata(skills_dir=".claude/skills"):
    """Scan for SKILL.md files and extract metadata"""
    skills = []
    for skill_path in glob.glob(f"{skills_dir}/*/SKILL.md"):
        with open(skill_path) as f:
            content = f.read()
        # Extract the YAML frontmatter between the first pair of --- delimiters
        metadata_section = content.split("---")[1]
        # Parse name and description
        # ... parsing logic ...
        skills.append({
            "name": name,
            "description": description,
            "full_path": skill_path
        })
    return skills

def should_disclose_skill(skill, user_message, conversation_history):
    """
    Determine if skill should be loaded based on:
    - Keyword matching in description
    - Semantic similarity
    - Explicit user request
    """
    # Simple keyword matching
    keywords = extract_keywords(skill["description"])
    if any(kw in user_message.lower() for kw in keywords):
        return True

    # More sophisticated: use embeddings
    # similarity = cosine_similarity(
    #     embed(skill["description"]),
    #     embed(user_message)
    # )
    # return similarity > 0.7

    return False

def build_system_prompt(base_prompt, active_skills):
    """Construct the system prompt with disclosed skills"""
    prompt_parts = [base_prompt]

    for skill in active_skills:
        with open(skill["full_path"]) as f:
            content = f.read()
        # Extract everything after the closing --- of the frontmatter
        instructions = content.split("---")[2]
        prompt_parts.append(f"\n\n# {skill['name']} Skill\n{instructions}")

    return "\n".join(prompt_parts)

# Usage
client = anthropic.Anthropic(api_key=os.environ.get("ANTHROPIC_API_KEY"))

# 1. Load skill metadata (do this once at startup)
available_skills = load_skill_metadata()

# 2. For each user message, decide which skills to disclose
user_message = "Can you review this pull request for security issues?"
active_skills = [
    skill for skill in available_skills
    if should_disclose_skill(skill, user_message, [])
]

# 3. Build the final system prompt
base_prompt = "You are a helpful coding assistant."
system_prompt = build_system_prompt(base_prompt, active_skills)

# 4. Make the API call
message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=4096,
    system=system_prompt,
    messages=[
        {"role": "user", "content": user_message}
    ]
)
```
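The pseudocode above leaves out the frontmatter parsing. A minimal sketch of that step, assuming PyYAML is installed and the SKILL.md layout shown earlier in this guide:

```python
import yaml

def parse_skill_file(path: str) -> dict:
    """Split a SKILL.md into its YAML frontmatter and Markdown body."""
    with open(path, encoding="utf-8") as f:
        content = f.read()

    # The frontmatter sits between the first two "---" delimiters.
    _, frontmatter, body = content.split("---", 2)
    metadata = yaml.safe_load(frontmatter)

    return {
        "name": metadata["name"],
        "description": metadata["description"],
        "instructions": body.strip(),
        "full_path": path,
    }
```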
Key considerations for production: cache the metadata scan at startup instead of re-reading SKILL.md files on every request, log which skills were disclosed and why, and monitor per-request token usage so disclosure stays cheap.
A major question for DevRel and Engineering teams is: How do we manage and monitor these skills in production?
This is where KeywordsAI becomes essential. It provides both a management layer for your skills and full observability into how they're being used.
The Problem: In the basic setup, skills live in .claude/skills/ within your codebase. But what if you want to update a skill without redeploying, share the same skill across multiple projects and teams, or A/B test different versions of its instructions?
The Solution: Host your skill instructions in KeywordsAI Prompt Management.
Setup:
Create a Skill Registry in KeywordsAI:
```python
# Instead of reading from file:
# skill_content = open(".claude/skills/video-processor/SKILL.md").read()

# Fetch from KeywordsAI:
import requests

response = requests.get(
    "https://api.keywordsai.co/api/prompts/video-processor",
    headers={"Authorization": f"Bearer {KEYWORDS_AI_API_KEY}"}
)
skill_content = response.json()["content"]
```
Benefits: skills are centrally versioned and editable without a redeploy, the same skill can be shared across projects and teams, and instruction changes can be rolled out or rolled back like any other prompt.
A/B Testing Skills:
```python
# KeywordsAI can serve different skill versions to different users
response = requests.get(
    "https://api.keywordsai.co/api/prompts/video-processor",
    headers={
        "Authorization": f"Bearer {KEYWORDS_AI_API_KEY}",
        "X-User-ID": user_id  # KeywordsAI handles A/B assignment
    }
)
```
The Problem: In an agentic workflow, a single user prompt might trigger multiple skill activations, extended thinking blocks, local script executions, and external tool calls.
How do you debug when something goes wrong? How do you know if a skill is working effectively?
The Solution: KeywordsAI Observability.
1. Skill Activation Tracking
KeywordsAI automatically detects when agent skills are activated:
{ "trace_id": "claudecode_abc123_turn_1", "workflow_name": "claudecode_abc123", "thread_id": "claudecode_abc123", "spans": [ { "span_id": "span_001", "span_type": "agent", "name": "User Request", "prompt_messages": [{"role": "user", "content": "Extract frames from video.mp4"}], "completion": "I'll use the video-processor skill...", "children": ["span_002", "span_003"] }, { "span_id": "span_002", "span_type": "generation", "name": "Skill: video-processor", "prompt_messages": [{"role": "system", "content": "# Video Processing Specialist\n..."}], "metadata": { "skill_name": "video-processor", "skill_loaded": true, "load_time_ms": 45, "token_count": 3240 } }, { "span_id": "span_003", "span_type": "tool", "name": "Tool: Bash", "input": {"command": "python scripts/extract_frames.py video.mp4"}, "output": "Extracted 240 frames to output/", "latency_ms": 1200 } ], "total_tokens": { "prompt": 1234, "completion": 567, "cache_creation": 200, "cache_read": 3000 } }
Key metrics tracked: which skill was activated, how long it took to load (load_time_ms), how many tokens its instructions added, and the latency and output of every downstream tool call.
2. Prompt Expansion Analysis
See exactly when and why skills were disclosed:
{ "request_id": "req_abc123", "user_message": "Review this PR for security issues", "skill_metadata_scanned": [ {"name": "video-processor", "matched": false}, {"name": "code-reviewer", "matched": true, "reason": "keyword: 'review'"}, {"name": "data-analyst", "matched": false} ], "skills_disclosed": [ { "name": "code-reviewer", "token_count": 3240, "disclosure_time_ms": 45 } ], "total_prompt_tokens": 3740 }
3. Cost & Token Monitoring
Track the efficiency gains from progressive disclosure:
| Metric | Without Skills (God Prompt) | With Skills (Progressive) | Savings |
|---|---|---|---|
| Average Tokens/Request | 18,500 | 4,200 | 77% |
| Cost/Request | $0.37 | $0.08 | 78% |
| Latency (ms) | 3,200 | 1,100 | 66% |
KeywordsAI tracks these metrics automatically and shows you the ROI of your skill architecture.
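As a quick sanity check, the savings percentages in the table follow directly from the raw numbers:

```python
# (god prompt, progressive) pairs from the table above
rows = {
    "tokens/request": (18_500, 4_200),
    "cost/request":   (0.37, 0.08),
    "latency (ms)":   (3_200, 1_100),
}

for metric, (god_prompt, progressive) in rows.items():
    saving = (god_prompt - progressive) / god_prompt
    print(f"{metric}: {saving:.0%} saved")   # ~77%, ~78%, ~66%
```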
4. Tool Success Rate
If your skills use local scripts (like extract_frames.py), KeywordsAI can monitor their success:
{ "skill": "video-processor", "tool_call": "scripts/extract_frames.py", "status": "failed", "error": "FileNotFoundError: video.mp4 not found", "stack_trace": "...", "timestamp": "2026-01-14T10:30:00Z" }
5. Skill Performance Analytics
KeywordsAI dashboard shows: activation frequency per skill, average token cost and latency per activation, tool success rates for bundled scripts, and how these metrics trend across skill versions.
Step 1: Install the SDK
```bash
pip install keywordsai
```
Step 2: Wrap Your Agent
```python
import os

from keywordsai import KeywordsAI

# Initialize
kai = KeywordsAI(api_key=os.environ["KEYWORDS_AI_API_KEY"])

# Wrap your agent function
@kai.trace_agent(name="code-review-agent")
def process_user_request(user_message):
    # Your existing skill discovery logic
    active_skills = discover_skills(user_message)

    # Log skill disclosure
    kai.log_skill_disclosure(
        skills=[s["name"] for s in active_skills],
        trigger=user_message
    )

    # Build prompt with skills
    system_prompt = build_system_prompt(base_prompt, active_skills)

    # Make LLM call (automatically traced)
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=4096,
        system=system_prompt,
        messages=[{"role": "user", "content": user_message}]
    )

    return response

# All calls are now traced in KeywordsAI
result = process_user_request("Review this PR for security issues")
```
Step 3: View in Dashboard
Navigate to the KeywordsAI dashboard to see each traced request, the skills that were disclosed and why, the fully expanded prompt, token and cost breakdowns, and every tool call made along the way.
Once you have observability, you can optimize:
1. Skill Description Tuning
If a skill has low activation when it should trigger, the description field is probably too vague — add the concrete trigger keywords users actually type and re-test.

2. Instruction Pruning
If a skill is disclosed often but rarely improves the answer, prune its instructions to cut token cost or tighten its description so it activates less often — the trace data sketched below helps you spot this.
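For example, a rough sketch of how you might compute disclosure and usefulness rates from exported trace data (the field names here are assumptions for illustration, not the KeywordsAI export schema):

```python
from collections import Counter

def skill_stats(traces: list[dict]) -> dict:
    """traces: e.g. [{"skills_disclosed": ["code-reviewer"], "skill_improved_answer": True}, ...]"""
    disclosed, useful = Counter(), Counter()
    for trace in traces:
        for name in trace.get("skills_disclosed", []):
            disclosed[name] += 1
            if trace.get("skill_improved_answer"):
                useful[name] += 1
    return {
        name: {"disclosures": count, "useful_rate": useful[name] / count}
        for name, count in disclosed.items()
    }

# A skill with many disclosures but a low useful_rate is a pruning candidate.
```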
3. Skill Splitting
If one skill handles multiple unrelated tasks, split it into smaller, focused skills so only the relevant expertise is disclosed.

Example:
- Before: `code-reviewer` (8,000 tokens, handles security + style + performance)
- After: `security-auditor` (2,500 tokens), `style-enforcer` (1,800 tokens), `performance-analyzer` (3,200 tokens)

Understanding the official Agent Skills specification is crucial for building portable, interoperable skills that work across all platforms.
Agent Skills use progressive disclosure with three distinct loading levels:
Level 1 — Metadata: The YAML frontmatter is loaded at startup and included in the system prompt:
```yaml
---
name: pdf-processing
description: Extract text and tables from PDF files, fill forms, merge documents. Use when working with PDF files or when the user mentions PDFs, forms, or document extraction.
---
```
Token cost: ~100 tokens per skill
When loaded: At agent startup
Impact: You can install dozens of skills with minimal context penalty
Level 2 — Instructions: The main body of SKILL.md contains procedural knowledge:
```markdown
# PDF Processing Specialist

## Quick Start

Use pdfplumber to extract text from PDFs:

    import pdfplumber
    with pdfplumber.open("document.pdf") as pdf:
        text = pdf.pages[0].extract_text()

For advanced form filling, see [FORMS.md](FORMS.md).
```
Token cost: Under 5,000 tokens
When loaded: When skill description matches user request
Impact: Only relevant skills consume context
Level 3 — Bundled resources: Additional files are accessed via the filesystem:
```
pdf-skill/
├── SKILL.md          # Main instructions
├── FORMS.md          # Form-filling guide (loaded only when referenced)
├── REFERENCE.md      # Detailed API docs (loaded only when needed)
└── scripts/
    └── fill_form.py  # Executed via bash (code never loads into context)
```
Token cost: Effectively unlimited
When loaded: Only when explicitly referenced
Impact: Scripts execute without consuming context; documentation files load on-demand
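A minimal sketch of that pattern, using the example layout above — the agent (or your harness) runs the bundled script, and only its output ever reaches the model's context:

```python
import subprocess

# The script's source code never enters the context window;
# only its stdout/stderr (the result) is fed back to the agent.
result = subprocess.run(
    ["python", "pdf-skill/scripts/fill_form.py", "input.pdf", "output.pdf"],
    capture_output=True,
    text=True,
)
agent_observation = result.stdout if result.returncode == 0 else result.stderr
print(agent_observation)
```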
Agent Skills run in a code execution environment where Claude has filesystem access and shell tools such as `cat` and `ls`, so it can read bundled files and execute scripts on demand.

Example loading sequence:
1. At startup, only the metadata is in context: `pdf-processing - Extract text and tables from PDF files...`
2. The user asks a PDF question, so the skill matches
3. Claude runs `bash: cat pdf-skill/SKILL.md` → Instructions loaded

Per the official specification:
Required fields:
- `name`: Lowercase letters, numbers, hyphens only (max 64 chars)
- `description`: Clear description of what the skill does and when to use it (max 1,024 chars)

Prohibited in both fields: XML tags.
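A small validation sketch for those constraints (the limits mirror the spec quoted above; the function itself is illustrative):

```python
import re

def validate_skill_metadata(name: str, description: str) -> list[str]:
    """Return a list of frontmatter rule violations (empty list means valid)."""
    errors = []
    if not re.fullmatch(r"[a-z0-9-]{1,64}", name):
        errors.append("name: lowercase letters, numbers, hyphens only (max 64 chars)")
    if not description or len(description) > 1024:
        errors.append("description: required, max 1,024 chars")
    if re.search(r"<[^>]+>", f"{name} {description}"):
        errors.append("name/description: markup such as XML tags is not allowed")
    return errors
```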
Description best practices:
❌ Too vague: "For video tasks"
✅ Specific and trigger-rich: "Triggers when user mentions: video, mp4, webm, summarize footage, extract frames, generate subtitles, or multimedia processing"
Understanding when to use each tool:
| Feature | System Prompt | Agent Skills | MCP |
|---|---|---|---|
| Purpose | Current context | Specialized expertise | External tool access |
| When loaded | Always | On-demand | On-call |
| Token cost | ~500-1,000 | ~100-5,000 | Metadata only |
| Best for | Session state | Domain knowledge | API integrations |
| Example | "You are helpful" | "Video processing workflows" | "GitHub API connector" |
Use System Prompt for: User preferences, current file context, session state
Use Agent Skills for: Reusable workflows, domain expertise, complex multi-step processes
Use MCP for: External service connections (Slack, GitHub, databases)
The Agent Skills standard ensures write once, use everywhere:
✅ Works in:
- Claude Code
- Cursor (nightly build)
- Your own agents built on the Anthropic API (via the `skill_id` parameter)

Same SKILL.md format across all platforms without requiring platform-specific modifications.
Some tasks require multiple skills in sequence:
# User: "Analyze this video and generate a report" # Chain: video-processor → data-analyst → report-generator def handle_complex_task(user_message): # Phase 1: Identify required skills skill_chain = plan_skill_chain(user_message) # Result: ["video-processor", "data-analyst", "report-generator"] # Phase 2: Execute in order context = {} for skill_name in skill_chain: skill = load_skill(skill_name) result = execute_skill(skill, context) context[skill_name] = result # Pass results to next skill return context["report-generator"]
KeywordsAI tracing will show this as a linked chain of requests, making it easy to debug multi-stage workflows.
If a skill fails, have a backup:
```python
def execute_with_fallback(primary_skill, fallback_skill, context):
    try:
        return execute_skill(primary_skill, context)
    except Exception as e:
        kai.log_skill_failure(primary_skill, error=str(e))
        return execute_skill(fallback_skill, context)
```
Different users may need different skills:
```python
def load_user_skills(user_id):
    base_skills = load_skill_metadata(".claude/skills")

    # Check user permissions
    user_permissions = get_user_permissions(user_id)

    # Filter skills by permission
    allowed_skills = [
        skill for skill in base_skills
        if skill["name"] in user_permissions["allowed_skills"]
    ]

    return allowed_skills
```
KeywordsAI can track per-user skill usage and help you understand which roles need which capabilities.
✅ Skill Design:
- Use YAML frontmatter (`---`, not `------`)
- `name`: lowercase, hyphens, numbers only (max 64 chars)
- `description`: clear, specific, max 1,024 chars

✅ Performance:

- Keep the SKILL.md body under ~5,000 tokens; move deep detail into reference files
- Cache the metadata scan at startup instead of re-reading skills on every request
✅ Maintenance:
- Version skills when making breaking changes (e.g., SKILL.v2.md)

✅ Security:

- Review bundled scripts before letting the agent execute them
- Restrict which skills each user or role can load (see the permission-based loading pattern above)
✅ Observability:

- Trace every skill activation, tool call, and thinking block in KeywordsAI
- Monitor token usage, cost, and latency per skill
- Track tool success rates for bundled scripts
The shift from monolithic prompts to Anthropic Agent SDK Skills represents a fundamental change in how we build with LLMs. By adopting progressive disclosure, you gain lower token costs, faster responses, and modular expertise that you can version, test, and monitor independently.
The combination of Agent Skills for the brain, MCP for the hands, and KeywordsAI for the observability gives you a production-ready stack for building sophisticated AI systems.
Week 1: Set up your first skill
- Create the `.claude/skills/` directory and add your first SKILL.md with YAML frontmatter

Week 2: Integrate KeywordsAI

- Install the observability hook and confirm that skill activations show up as traces
Week 3: Expand and optimize

- Add more skills, tune descriptions that under-trigger, and split oversized skills
Month 2+: Production at scale

- Host skill instructions in KeywordsAI prompt management, A/B test versions, and enforce per-user skill permissions
Official Documentation: Anthropic's Agent Skills specification and the Claude Code documentation.

KeywordsAI Resources: the Claude Code Observability and Cursor Agent Tracing guides linked above.

Community & Examples: the Keywords-AI/keywordsai-example-projects repository on GitHub (used for the hook script earlier in this guide).
This guide covered: why monolithic "God Prompts" break down, how progressive disclosure and the SKILL.md format work, how to use skills in Claude Code, Cursor, and custom API integrations, how Prompt, Agent Skills, and MCP divide responsibilities, and how KeywordsAI provides skill management and full observability in production.
Ready to build the future of AI applications? Start with your first skill today, and use KeywordsAI to ensure it's production-ready from day one.