
4 posts tagged with "agents"


How Do You Know If a Skill Is Any Good? LLM-as-Judge Scoring

· 13 min read
Manny Silva
Creator of Docs as Tests and Doc Detective | Head of Docs at Skyflow


Last time, I walked through writing skills that agents can actually execute and introduced skill-validator as a way to catch structural and content issues before an agent ever sees the skill. At the end, I mentioned that skill-validator also supports LLM-as-judge scoring across dimensions like clarity, actionability, token efficiency, and novelty—and promised to dig into that.

This is that post.
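skill-validator's actual scoring pipeline isn't shown on this page, but as a rough sketch of the idea: an LLM judge rates a skill on each dimension (clarity, actionability, token efficiency, novelty — the dimensions named above), and those per-dimension scores roll up into a single result. Everything below besides the dimension names is assumed: the function name, the 1–5 scale, and the simple averaging are illustrative only.

```python
# Hypothetical sketch: aggregate per-dimension LLM-judge scores for a skill.
# Dimension names come from the post; the scale and aggregation are assumptions.
DIMENSIONS = ("clarity", "actionability", "token_efficiency", "novelty")


def aggregate_scores(scores: dict, max_score: int = 5) -> dict:
    """Combine per-dimension judge scores (1..max_score) into an overall ratio."""
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"judge response missing dimensions: {missing}")
    total = sum(scores[d] for d in DIMENSIONS)
    return {
        "per_dimension": {d: scores[d] for d in DIMENSIONS},
        "overall": round(total / (len(DIMENSIONS) * max_score), 2),
    }


result = aggregate_scores(
    {"clarity": 4, "actionability": 5, "token_efficiency": 3, "novelty": 2}
)
print(result["overall"])  # → 0.7
```

A real judge would also need a rubric prompt per dimension and some defense against malformed model output; this only shows the aggregation step.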

Writing Skills That Agents Can Actually Execute

· 10 min read
Manny Silva
Creator of Docs as Tests and Doc Detective | Head of Docs at Skyflow


First, I argued that agent configurations are documentation. Next, I made the case that they're specifically internal documentation and should be managed that way. Both times I covered five doc types: project descriptions, agent definitions, orchestration patterns, skills, and plans/specs.

Of those five, skills are the hardest to write well. Let's walk through how I handled writing and validating skills for Doc Detective's agent tools.

Your Agent Configs Are Internal Docs. Manage Them That Way.

· 10 min read
Manny Silva
Creator of Docs as Tests and Doc Detective | Head of Docs at Skyflow


A few months into working with AI agents on a documentation project, I noticed some inconsistency in agent behaviors and decided to do some digging. It turned out the AGENTS.md file in our repo — the one telling agents how to behave, where things were, and what to escalate — had grown to over 800 lines, and a few people (or, more likely, their agents) had added rules independently, some subtly contradicting each other.

The agents weren't broken. They were following instructions that didn't serve them well.