
Skills Pattern

A filesystem-based approach to tool management that cuts tool-definition overhead by roughly 98%, loading tool definitions on demand instead of sending every tool with every request.


February 18, 2026

Traditional function calling requires sending every tool definition with every API request. For a capable agent with 50 specialized tools, each definition averaging 3,000 tokens, that is 150,000 tokens of overhead per request — before any user message, context, or reasoning has been included. This scaling problem makes complex agents prohibitively expensive, and it gets worse as you add capabilities. The Skills Pattern solves this by treating tools as files on disk that are loaded on demand, rather than static definitions passed wholesale with every call.

The Problem: Context Bloat from Tools

The core insight is simple: an agent almost never needs all of its tools simultaneously. A user asking to “search the web for recent news about AI” needs the web-search skill. They do not need the code-review skill, the data-analysis skill, or the email-composer skill. Sending all of those definitions with every request wastes tokens, inflates latency, and degrades reasoning quality by crowding the context with irrelevant information.

The Skills Pattern solves this by giving the agent access to a skills/ directory. The agent reads skill files as needed — exactly like a developer reads documentation — rather than having everything preloaded into memory whether it is relevant or not.

50 tools × 3,000 tokens/tool = 150,000 tokens/request (traditional)
50 skills × 50 tokens/metadata + 1 skill × 1,000 tokens = 3,500 tokens (skills pattern)
Savings: 97.7%
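
The arithmetic is worth checking once in code. A quick sketch of the same budget, using the per-item token averages assumed above:

# Back-of-the-envelope token budget, using the averages assumed in this post.
TOOLS = 50
TOKENS_PER_DEFINITION = 3_000   # one full tool definition
TOKENS_PER_METADATA = 50        # name + description + triggers
TOKENS_PER_SKILL_BODY = 1_000   # one loaded SKILL.md body

traditional = TOOLS * TOKENS_PER_DEFINITION                            # 150,000
skills_pattern = TOOLS * TOKENS_PER_METADATA + TOKENS_PER_SKILL_BODY   # 3,500

print(f"Savings: {1 - skills_pattern / traditional:.1%}")              # 97.7%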

Three Pillars of the Skills Pattern

Skills Pattern Architecture
┌─────────────────────────────────────────────────────────────┐
│                     Skills Pattern                          │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  1. FILESYSTEM AS TOOL STORAGE                              │
│     skills/                                                 │
│     ├── web-search/SKILL.md                                 │
│     ├── code-review/SKILL.md                                │
│     └── data-analysis/SKILL.md                              │
│                                                             │
│  2. PROGRESSIVE DISCLOSURE                                  │
│     ┌──────────┐   ┌───────────────┐   ┌──────────────┐    │
│     │ Metadata │ → │ Instructions  │ → │  Examples    │    │
│     │ ~50 tok  │   │ ~1000 tok     │   │ ~2000 tok    │    │
│     └──────────┘   └───────────────┘   └──────────────┘    │
│         ↑                  ↑                  ↑             │
│      Always            On select          If complex        │
│                                                             │
│  3. DATABASE-BACKED DISCOVERY (Optional)                    │
│     ┌─────────────┐                                         │
│     │ Vector DB   │  ← Embed skill descriptions             │
│     │ (Chroma,    │  ← Semantic search for relevance        │
│     │  Qdrant)    │  ← Skip metadata scanning               │
│     └─────────────┘                                         │
│                                                             │
└─────────────────────────────────────────────────────────────┘

1. Filesystem as Tool Storage

Skills are organized as directories, each containing a SKILL.md file with YAML frontmatter for metadata and a markdown body for instructions. The agent can list, read, and navigate these files using standard filesystem tools — no special API required. This makes skills introspectable, version-controllable, and user-extensible without code changes.

The following shows how to scan the filesystem and load skill metadata:

from pathlib import Path
from dataclasses import dataclass
import yaml

@dataclass
class SkillMetadata:
    name: str
    description: str
    triggers: list[str]
    tools_required: list[str]

def discover_skills(skills_dir: Path) -> dict[str, SkillMetadata]:
    """Scan filesystem to discover available skills."""
    skills = {}

    for skill_path in skills_dir.iterdir():
        if not skill_path.is_dir():
            continue

        skill_file = skill_path / "SKILL.md"
        if not skill_file.exists():
            continue

        # Parse SKILL.md frontmatter
        content = skill_file.read_text()
        metadata = parse_skill_frontmatter(content)

        skills[skill_path.name] = SkillMetadata(
            name=metadata.get("name", skill_path.name),
            description=metadata.get("description", ""),
            triggers=metadata.get("triggers", []),
            tools_required=metadata.get("tools", [])
        )

    return skills

def parse_skill_frontmatter(content: str) -> dict:
    """Extract YAML frontmatter from SKILL.md."""
    if not content.startswith("---"):
        return {}
    end_idx = content.find("---", 3)
    if end_idx == -1:
        return {}
    frontmatter = content[3:end_idx].strip()
    # safe_load returns None for empty frontmatter; normalize to a dict
    return yaml.safe_load(frontmatter) or {}

Each SKILL.md file follows a standard format. The frontmatter captures everything needed for skill selection — name, description, trigger phrases, required tools, and a token estimate for prioritization. The markdown body provides the full instructions that get loaded only when the skill is selected.

The triggers field deserves particular attention. These are phrases that help the agent quickly match user requests to relevant skills without reading the full instructions. Specific, domain-relevant trigger phrases dramatically improve selection accuracy. Overly generic triggers like “help me” match everything and should be avoided.
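
For illustration, a SKILL.md for the web-search skill could look like the sketch below. The field names match what discover_skills reads; the specific triggers, tool names, and instructions are hypothetical:

---
name: web-search
description: Search the web and summarize recent, relevant results for a topic.
triggers:
  - "search the web"
  - "find recent news"
  - "look up current information"
tools:
  - http_get
  - html_to_text
token_estimate: 800
---

# Web Search

1. Turn the user's request into one to three focused queries.
2. Fetch results with http_get and strip markup with html_to_text.
3. Summarize the most recent, relevant sources and cite their URLs.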

2. Progressive Disclosure

Not all skill information is needed for every request. Progressive disclosure loads context in stages, adding only what is necessary at each step.

Three stages of progressive skill loading
Stage              Content                         Tokens        When Loaded
1. Metadata        Name, description, triggers     ~50/skill     Always (for selection)
2. Instructions    Full SKILL.md body              ~500–1000     After skill selected
3. Resources       Examples, templates, schemas    Variable      Only for complex tasks

from dataclasses import dataclass
from enum import Enum
from pathlib import Path

class DisclosureLevel(Enum):
    METADATA = 1     # Name, description, triggers (~50 tokens)
    INSTRUCTIONS = 2 # Full SKILL.md body (~500-1000 tokens)
    RESOURCES = 3    # Examples, templates (~variable)

@dataclass
class SkillContext:
    name: str
    level: DisclosureLevel
    content: str
    token_count: int

class ProgressiveSkillLoader:
    def __init__(self, skills_dir: Path):
        self.skills_dir = skills_dir
        self._metadata_cache: dict[str, dict] = {}

    def get_skill_list(self) -> list[dict]:
        """Stage 1: Return minimal metadata for all skills."""
        skills = []
        for skill_path in self.skills_dir.iterdir():
            if not skill_path.is_dir():
                continue
            metadata = self._load_metadata(skill_path.name)
            skills.append({
                "name": metadata.get("name", skill_path.name),
                "description": metadata.get("description", "")[:100],  # Truncate
                "triggers": metadata.get("triggers", [])[:5]   # Limit
            })
        return skills

    def load_instructions(self, skill_name: str) -> SkillContext:
        """Stage 2: Load full instructions on demand."""
        skill_file = self.skills_dir / skill_name / "SKILL.md"
        content = skill_file.read_text()
        body = self._extract_body(content)
        return SkillContext(
            name=skill_name,
            level=DisclosureLevel.INSTRUCTIONS,
            content=body,
            token_count=self._estimate_tokens(body)
        )

    def load_resources(self, skill_name: str) -> SkillContext:
        """Stage 3: Load examples and additional resources."""
        examples_dir = self.skills_dir / skill_name / "examples"
        resources = []
        if examples_dir.exists():
            for example_file in examples_dir.iterdir():
                resources.append(example_file.read_text())
        combined = "\n---\n".join(resources)
        return SkillContext(
            name=skill_name,
            level=DisclosureLevel.RESOURCES,
            content=combined,
            token_count=self._estimate_tokens(combined)
        )

    def _load_metadata(self, skill_name: str) -> dict:
        if skill_name not in self._metadata_cache:
            skill_file = self.skills_dir / skill_name / "SKILL.md"
            content = skill_file.read_text()
            self._metadata_cache[skill_name] = parse_skill_frontmatter(content)
        return self._metadata_cache[skill_name]

    def _extract_body(self, content: str) -> str:
        """Return the markdown body that follows the YAML frontmatter."""
        if content.startswith("---"):
            end_idx = content.find("---", 3)
            if end_idx != -1:
                return content[end_idx + 3:].strip()
        return content

    def _estimate_tokens(self, text: str) -> int:
        """Rough heuristic: roughly four characters per token."""
        return len(text) // 4

# Usage in agent (llm is any client exposing the select/respond helpers used below)
class SkillAwareAgent:
    def __init__(self, loader: ProgressiveSkillLoader, llm):
        self.loader = loader
        self.llm = llm

    def process(self, query: str) -> str:
        # Stage 1: Select skill from metadata
        skill_list = self.loader.get_skill_list()
        selected = self.llm.select_skill(query, skill_list)

        if not selected:
            return self.llm.respond_without_skill(query)

        # Stage 2: Load instructions
        context = self.loader.load_instructions(selected)

        # Stage 3: Load examples for complex tasks
        if self.is_complex_task(query):
            resources = self.loader.load_resources(selected)
            context.content += "\n\n" + resources.content

        return self.llm.respond_with_context(query, context.content)

    def is_complex_task(self, query: str) -> bool:
        """Placeholder heuristic: long or multi-part requests get the extra examples."""
        return len(query.split()) > 30 or " and " in query.lower()

The token savings compound quickly. With 50 skills and the progressive approach, a typical request uses roughly 3,500 tokens instead of 150,000 — a 97.7% reduction. The vector-search variant described next skips the in-context metadata list entirely, trimming the per-request budget further still.

3. Database-Backed Tool Discovery

For skill libraries with 50 or more skills, scanning metadata files on every request becomes slow. Vector databases enable instant semantic search: embed each skill’s description and triggers once at index time, then at runtime query with the user’s message to find the closest match without touching the filesystem.

Vector-Based Skill Discovery
User Query: "help me analyze this spreadsheet"
                  │
                  ▼
          ┌───────────────┐
          │ Embed Query   │
          │ (384-dim vec) │
          └───────────────┘
                  │
                  ▼
    ┌─────────────────────────┐
    │     Vector Database     │
    │  ┌─────────────────┐   │
    │  │ data-analysis   │●──┼── 0.92 similarity
    │  │ visualization   │●──┼── 0.78 similarity
    │  │ web-search      │●──┼── 0.31 similarity
    │  │ code-review     │●──┼── 0.22 similarity
    │  └─────────────────┘   │
    └─────────────────────────┘
                  │
                  ▼
      Top match: data-analysis
      Load: skills/data-analysis/SKILL.md

import chromadb
from chromadb.utils import embedding_functions
from dataclasses import dataclass
from pathlib import Path

@dataclass
class SkillMatch:
    name: str
    description: str
    score: float
    tools: list[str]

class VectorSkillDiscovery:
    def __init__(self, persist_dir: str = "./skill_vectors"):
        self.client = chromadb.PersistentClient(path=persist_dir)

        self.embedding_fn = embedding_functions.SentenceTransformerEmbeddingFunction(
            model_name="all-MiniLM-L6-v2"
        )

        self.collection = self.client.get_or_create_collection(
            name="skills",
            embedding_function=self.embedding_fn,
            metadata={"hnsw:space": "cosine"}
        )

    def index_skill(self, skill: dict) -> None:
        """Add or update a skill in the vector database."""
        text = f"{skill['name']}: {skill['description']}"
        if skill.get('triggers'):
            text += f" Triggers: {', '.join(skill['triggers'])}"

        self.collection.upsert(
            ids=[skill['name']],
            documents=[text],
            metadatas=[{
                "name": skill['name'],
                "description": skill['description'],
                "tools": ",".join(skill.get('tools', [])),
                "token_estimate": skill.get('token_estimate', 0)
            }]
        )

    def find_skills(
        self,
        query: str,
        top_k: int = 3,
        min_score: float = 0.5
    ) -> list[SkillMatch]:
        """Find most relevant skills for a query."""
        results = self.collection.query(
            query_texts=[query],
            n_results=top_k,
            include=["documents", "metadatas", "distances"]
        )

        matches = []
        for i, distance in enumerate(results['distances'][0]):
            score = 1 - distance
            if score < min_score:
                continue
            metadata = results['metadatas'][0][i]
            matches.append(SkillMatch(
                name=metadata['name'],
                description=metadata['description'],
                score=score,
                tools=metadata['tools'].split(',') if metadata['tools'] else []
            ))

        return matches

    def reindex_all(self, skills_dir: Path) -> int:
        """Reindex all skills from filesystem."""
        count = 0
        for skill_path in skills_dir.iterdir():
            if not skill_path.is_dir():
                continue
            skill_file = skill_path / "SKILL.md"
            if not skill_file.exists():
                continue
            metadata = parse_skill_frontmatter(skill_file.read_text())
            metadata['name'] = skill_path.name
            self.index_skill(metadata)
            count += 1
        return count
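
A minimal usage sketch (the directory path and query are illustrative):

# Index every skill once at startup, then query per request.
discovery = VectorSkillDiscovery()
discovery.reindex_all(Path("skills"))

for match in discovery.find_skills("help me analyze this spreadsheet"):
    print(f"{match.name}: {match.score:.2f}")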

Discovery approaches compared
Approach           Pros                             Cons                         Best For
Keyword/Trigger    Simple, fast, no dependencies    Misses synonyms, brittle     <20 skills
LLM Selection      Understands intent               Extra API call, latency      20–50 skills
Vector Search      Semantic matching, fast          Requires embedding model     50+ skills
Hybrid             Best accuracy                    Most complex                 Production systems
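
The hybrid row above can be as simple as checking declared trigger phrases first and falling back to vector search when nothing matches. A minimal sketch built on the classes defined earlier (the 0.5 score threshold is an arbitrary choice):

class HybridSkillDiscovery:
    """Exact trigger match first, semantic search as a fallback."""

    def __init__(self, metadata: dict[str, SkillMetadata], vectors: VectorSkillDiscovery):
        self.metadata = metadata   # output of discover_skills()
        self.vectors = vectors

    def find_skill(self, query: str) -> str | None:
        lowered = query.lower()
        # Cheap path: a declared trigger phrase appears verbatim in the query.
        for name, meta in self.metadata.items():
            if any(trigger.lower() in lowered for trigger in meta.triggers):
                return name
        # Fallback: semantic search over the embedded descriptions.
        matches = self.vectors.find_skills(query, top_k=1, min_score=0.5)
        return matches[0].name if matches else None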

Evaluation

Skill discovery quality should be measured with explicit test cases that have ground-truth skill assignments. The key metrics are precision at rank 1 (is the top result correct?) and mean reciprocal rank (how high does the correct skill appear on average?). Latency matters too — vector search should complete in under 50ms.

from dataclasses import dataclass
import time

@dataclass
class TestCase:
    query: str
    relevant_skills: set[str]

@dataclass
class EvaluationResult:
    precision_at_1: float
    precision_at_3: float
    recall_at_3: float
    avg_latency_ms: float
    mrr: float  # Mean Reciprocal Rank

def evaluate_skill_discovery(
    test_cases: list[TestCase],
    discovery
) -> EvaluationResult:
    """Evaluate skill discovery accuracy and performance."""

    p1_scores, p3_scores, recall_scores = [], [], []
    latencies, reciprocal_ranks = [], []

    for case in test_cases:
        start = time.perf_counter()
        results = discovery.find_skills(case.query, top_k=3)
        latencies.append((time.perf_counter() - start) * 1000)

        result_names = [r.name for r in results]

        # Precision@1 (guard against an empty result list)
        p1_scores.append(1.0 if result_names and result_names[0] in case.relevant_skills else 0.0)

        # Precision@3
        hits = sum(1 for r in result_names[:3] if r in case.relevant_skills)
        p3_scores.append(hits / 3)

        # Recall@3
        recall_scores.append(hits / len(case.relevant_skills))

        # Mean Reciprocal Rank
        for i, name in enumerate(result_names):
            if name in case.relevant_skills:
                reciprocal_ranks.append(1.0 / (i + 1))
                break
        else:
            reciprocal_ranks.append(0.0)

    return EvaluationResult(
        precision_at_1=sum(p1_scores) / len(p1_scores),
        precision_at_3=sum(p3_scores) / len(p3_scores),
        recall_at_3=sum(recall_scores) / len(recall_scores),
        avg_latency_ms=sum(latencies) / len(latencies),
        mrr=sum(reciprocal_ranks) / len(reciprocal_ranks)
    )

Key metrics for skill discovery evaluation
Metric                 What it Measures                         Target
Precision@1            Is the top result the right skill?       >90%
Precision@3            How many of top 3 are relevant?          >80%
Mean Reciprocal Rank   How high is the correct skill ranked?    >0.85
Latency                Time to find relevant skill(s)           <50ms
False Positive Rate    Skills selected but not relevant         <5%
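
For example, a small harness could exercise the discovery layer against a handful of labeled queries (the queries and expected skills below are hypothetical):

cases = [
    TestCase("help me analyze this spreadsheet", {"data-analysis"}),
    TestCase("find recent news about AI", {"web-search"}),
    TestCase("review this pull request for bugs", {"code-review"}),
]

result = evaluate_skill_discovery(cases, discovery)
print(f"P@1={result.precision_at_1:.2f}  MRR={result.mrr:.2f}  "
      f"latency={result.avg_latency_ms:.1f}ms")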

Common Pitfalls

Tags: skills, tool-management, token-savings