
How to build a skill that actually works

Writing a custom instructions prompt takes seconds. Building a reusable AI skill that behaves reliably in real use is harder. The difference is in the details: you need a tightly defined problem, a way to test the output, and a clean check to verify the skill behaves as intended.


A good skill solves one specific problem that keeps coming back. If you can’t describe that problem in a single sentence, the skill is probably too vague. The more precise you are, the easier it is for the model to do the right thing.

Bad: A git commit skill.

Good: Our commit messages drift away from our prefix and style rules unless I remind the model every time.


Don’t start by writing the whole markdown file from scratch. Use an interactive setup step first. It helps surface the things you might forget to write down.

The interview

The setup asks a few simple questions and pushes you to be more specific.

> /skill-creator

What problem does this skill solve?
> Commit messages drift from our convention unless I remind the model.

When should it trigger?
> Whenever I’m about to run git commit.

What’s the exact convention?
> Imperative subject, under 50 chars, with a prefix like feat or fix, and a body that explains why the change exists.

The draft

That interview can then produce a minimal skill file: a short YAML frontmatter block for triggering, plus a markdown section with the actual rules.

---
name: commit-style
description: Enforce prefix, imperative mood, max 50 chars, wrap body at 72.
---

# Commit Style Guidelines

1. Prefix: feat, fix, chore, or docs.
2. Subject: Imperative mood, max 50 characters, no trailing period.
3. Body: Explain why the change is being made, wrapped at 72 characters.

The description matters a lot. It’s the main signal the model uses to decide whether the skill should load. If it’s vague, the skill may never trigger when you need it.
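One way to catch a weak description early is to lint the frontmatter before testing anything else. The sketch below is illustrative only: the parser handles just the simple `key: value` lines shown in the draft (it is not a full YAML parser), and the function names, the required terms, and the minimum word count are all assumptions, not part of any skill tooling.

```python
# Sample skill file, copied from the draft above.
SKILL_TEXT = """---
name: commit-style
description: Enforce prefix, imperative mood, max 50 chars, wrap body at 72.
---

# Commit Style Guidelines
"""


def parse_frontmatter(text: str) -> dict[str, str]:
    """Read simple `key: value` pairs from a block delimited by --- lines."""
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("skill file must start with a --- line")
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    raise ValueError("frontmatter block is never closed")


def description_problems(
    fields: dict[str, str], required_terms: list[str], min_words: int = 6
) -> list[str]:
    """List reasons the description might fail to trigger the skill."""
    problems = []
    description = fields.get("description", "")
    if not fields.get("name"):
        problems.append("missing name")
    if len(description.split()) < min_words:
        problems.append("description is very short")
    for term in required_terms:
        # Hypothetical rule: the words a user would say should appear verbatim.
        if term.lower() not in description.lower():
            problems.append(f"description never mentions '{term}'")
    return problems


fields = parse_frontmatter(SKILL_TEXT)
for problem in description_problems(fields, required_terms=["git", "commit"]):
    print(problem)
```

Run against the draft, this flags that the description mentions neither "git" nor "commit", which are the words most likely to appear in the request that should trigger it.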

The best way to know whether the skill works is to test it against actual inputs. Write a small set of pass/fail examples based on the kind of text the skill should handle. That gives you something concrete to check, instead of relying on vibes.

# Evals
- input: "staged a bug fix for checkout redirection"
  expect: prefix with "fix:", imperative mood, under 50 chars.

- input: "merging main into release"
  expect: do NOT run this skill on merge commits.

If the skill fails on these, the instructions are still too loose.
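The mechanical parts of those expectations can be checked by a script once the model has produced a commit subject for each input. The following is a rough sketch under stated assumptions: the subjects in `CASES` are made-up model outputs, merge commits are recognised only by git's default "Merge ..." subject, and 50 is treated as an inclusive limit as in the skill file. Imperative mood is left out because it needs a human or model judge.

```python
import re

# Prefixes come from the skill file; everything else here is illustrative.
PREFIX_PATTERN = re.compile(r"^(feat|fix|chore|docs): \S")
MAX_SUBJECT_LENGTH = 50


def check_subject(subject: str) -> list[str]:
    """Return the rules a commit subject breaks (empty list means pass)."""
    # Assumption: merge commits keep git's default subject and are out of scope.
    if subject.startswith("Merge "):
        return []
    failures = []
    if not PREFIX_PATTERN.match(subject):
        failures.append("missing prefix")
    if len(subject) > MAX_SUBJECT_LENGTH:
        failures.append("subject too long")
    if subject.endswith("."):
        failures.append("trailing period")
    return failures


# Hypothetical model outputs paired with the failures the checker should report.
CASES = [
    ("fix: redirect checkout to the cart page", []),
    ("Fixed the checkout redirect.", ["missing prefix", "trailing period"]),
    ("Merge branch 'main' into release", []),
]

for subject, expected in CASES:
    result = check_subject(subject)
    status = "PASS" if result == expected else "FAIL"
    print(f"{status}: {subject!r} -> {result}")
```

Keeping the second case as a known-bad example is deliberate: it confirms the checker can actually fail, so a run of passes means something.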


Testing the skill in the same chat where you wrote it is misleading. The model already has context, so it can seem like the skill is working even when the file itself is underspecified.

A better test is to start a fresh session, load only the skill file, and try a realistic prompt. That shows you whether the skill stands on its own. You can also ask the fresh session to review the file itself. For example:

Based on everything you know about this skill and my intent, are there any improvements you would make?

In one clean-room test, the review exposed two problems: the skill never explicitly mentioned git, and some of the instructions were too soft. Phrases like “try to” are easy to ignore. If you want reliable behaviour, use direct rules.
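Soft wording is easy to scan for before a clean-room run. Here is a minimal sketch, assuming a hand-picked phrase list: only “try to” comes from the example above, and the other phrases and the function name are hypothetical.

```python
# "try to" is from the article; the remaining phrases are illustrative guesses.
SOFT_PHRASES = ("try to", "if possible", "ideally", "where appropriate")


def find_soft_phrases(skill_text: str) -> list[tuple[int, str]]:
    """Return (line number, phrase) for each soft phrase found in the text."""
    hits = []
    for number, line in enumerate(skill_text.splitlines(), start=1):
        lowered = line.lower()
        for phrase in SOFT_PHRASES:
            if phrase in lowered:
                hits.append((number, phrase))
    return hits


# Hypothetical draft rules with two soft instructions.
DRAFT = """1. Try to keep the subject under 50 characters.
2. Use a prefix: feat, fix, chore, or docs.
3. Ideally, explain why in the body."""

for number, phrase in find_soft_phrases(DRAFT):
    print(f"line {number}: soft phrase '{phrase}'")
```

A hit is a prompt to rewrite the line as a direct rule, not proof that the instruction is wrong.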


Once the skill behaves well in tests, commit it and use it in real work. Then watch where it still fails. Most skills improve fastest after they’ve been used a few times. The first version should be good enough to run; the second version should be better because it has real failure data.


1. Describe the friction point in one concrete sentence.
2. Use an interactive setup to pull out edge cases.
3. Write pass/fail tests using real inputs.
4. Check the skill in a fresh session.
5. Tighten the markdown, retest, and commit it.


Steven Dempster

© 2026 Steven Dempster