How to build a skill that actually works
Writing a custom-instructions prompt takes seconds. Building a reusable AI skill that behaves reliably in real use is harder. The difference is in the details: you need a tightly defined problem, a way to test the output, and a clean-session check that the skill behaves as intended.
01 Define the friction clearly
A good skill solves one specific problem that keeps coming back. If you can’t describe that problem in a single sentence, the skill is probably too vague. The more precise you are, the easier it is for the model to do the right thing.
Bad: “A git commit skill.”
Good: “Our commit messages drift away from our prefix and style rules unless I remind the model every time.”
02 Let the interview shape the skill
Don’t start by writing the whole markdown file from scratch. Use an interactive setup step first. It helps surface the things you might forget to write down.
The interview
The setup asks a few simple questions and pushes you to be more specific.
> /skill-creator
What problem does this skill solve?
> Commit messages drift from our convention unless I remind the model.
When should it trigger?
> Whenever I’m about to run git commit.
What’s the exact convention?
> Imperative subject, under 50 chars, with a prefix like feat or fix, and a body that explains why the change exists.
The draft
That interview can then produce a minimal skill file: a short YAML frontmatter block for triggering, plus a markdown section with the actual rules.
---
name: commit-style
description: Enforce prefix, imperative mood, max 50 chars, wrap body at 72.
---
# Commit Style Guidelines
1. Prefix: feat, fix, chore, or docs.
2. Subject: Imperative mood, max 50 characters, no trailing period.
3. Body: Explain why the change is being made, wrapped at 72 characters.
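The mechanical parts of these rules can also be checked in code, which is useful when you later want to grade the skill's output. The following is a minimal sketch, not part of the skill file itself. The function name `check_commit_message` is hypothetical, the 50-character limit is assumed to cover the whole subject line including the prefix, and imperative mood is left unchecked because it cannot be verified reliably with simple string rules.

```python
import re

# Prefixes taken from rule 1 of the draft skill file.
ALLOWED_PREFIXES = ("feat", "fix", "chore", "docs")
# Assumed subject shape: "<prefix>: <summary>".
SUBJECT_RE = re.compile(r"^(?P<prefix>[a-z]+): (?P<summary>.+)$")


def check_commit_message(message: str) -> list[str]:
    """Return a list of rule violations; an empty list means the message passes."""
    lines = message.splitlines()
    subject = lines[0] if lines else ""
    problems = []

    match = SUBJECT_RE.match(subject)
    if match is None or match.group("prefix") not in ALLOWED_PREFIXES:
        problems.append("subject must start with feat:, fix:, chore:, or docs:")
    # Assumption: the limit applies to the full subject line, prefix included.
    if len(subject) > 50:
        problems.append("subject is longer than 50 characters")
    if subject.endswith("."):
        problems.append("subject ends with a period")

    # Every line after the subject counts as body; blank lines pass trivially.
    for number, line in enumerate(lines[1:], start=2):
        if len(line) > 72:
            problems.append(f"line {number} is longer than 72 characters")
    return problems
```

Calling `check_commit_message("fix: redirect checkout to cart page")` returns an empty list, while a message such as `"Fixed the checkout bug."` is flagged for both the missing prefix and the trailing period.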
03 Test with real examples
The best way to know whether the skill works is to test it against actual inputs. Write a small set of pass/fail examples based on the kind of text the skill should handle. That gives you something concrete to check, instead of relying on vibes.
# Evals
- input: "staged a bug fix for checkout redirection"
  expect: prefix with "fix:", imperative mood, under 50 chars.
- input: "merging main into release"
  expect: do NOT run this skill on merge commits.
If the skill fails on these, the instructions are still too loose.
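Pass/fail examples like these can be turned into a small harness so the check is repeatable. This is a sketch under stated assumptions: `EvalCase`, `run_case`, and `fake_skill` are hypothetical names, the skill is modelled as a function that returns a commit message or `None` when it declines to run, and `fake_skill` stands in for a real model call so the example runs offline. Imperative mood is not checked here.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class EvalCase:
    prompt: str
    should_trigger: bool
    required_prefix: Optional[str] = None
    # Assumption: "max 50 characters" from the skill file, applied to the subject.
    max_subject_len: int = 50


def run_case(case: EvalCase, skill: Callable[[str], Optional[str]]) -> bool:
    """Return True if the skill's output satisfies the case."""
    output = skill(case.prompt)
    if not case.should_trigger:
        # The skill should stay out of the way, signalled here by None.
        return output is None
    if output is None:
        return False
    subject = output.splitlines()[0] if output else ""
    if case.required_prefix and not subject.startswith(case.required_prefix):
        return False
    return len(subject) <= case.max_subject_len


# The two cases from the evals above.
CASES = [
    EvalCase("staged a bug fix for checkout redirection", True, "fix:"),
    EvalCase("merging main into release", False),
]


def fake_skill(prompt: str) -> Optional[str]:
    # Hypothetical stand-in for a real model call.
    if prompt.startswith("merging"):
        return None
    return "fix: correct checkout redirect"


results = {case.prompt: run_case(case, fake_skill) for case in CASES}
```

Swapping `fake_skill` for a function that calls the real model keeps the cases and the pass/fail logic unchanged, so a failing case points at the skill file and not at the harness.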
04 Validate in a clean session
Testing the skill in the same chat where you wrote it is misleading. The model already has context, so it can seem like the skill is working even when the file itself is underspecified.
A better test is to start fresh, load only the skill file, and try a realistic prompt. That shows you whether the skill stands on its own. You can also ask the fresh session to critique the file itself. For example:
Based on everything you know about this skill and my intent, are there any improvements you would make?
One clean-room test exposed two problems in the skill: the file didn’t clearly say the word git, and some of the instructions were too soft. Phrases like “try to” are easy to ignore. If you want reliable behaviour, use direct rules.
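Both problems are easy to scan for before the clean-session test. The sketch below is illustrative only: `find_soft_phrases` and `mentions_keyword` are hypothetical helpers, and the phrase list is an assumed starting point in which only “try to” comes from the text above.

```python
import re

# Hypothetical starter list; only "try to" comes from the article.
SOFT_PHRASES = ("try to", "if possible", "ideally", "where appropriate")


def find_soft_phrases(skill_text: str) -> list[tuple[int, str]]:
    """Return (line_number, phrase) pairs for each soft phrase found."""
    hits = []
    for number, line in enumerate(skill_text.splitlines(), start=1):
        lowered = line.lower()
        for phrase in SOFT_PHRASES:
            if re.search(r"\b" + re.escape(phrase) + r"\b", lowered):
                hits.append((number, phrase))
    return hits


def mentions_keyword(skill_text: str, keyword: str) -> bool:
    """Check whether a trigger word such as "git" appears as a whole word."""
    pattern = r"\b" + re.escape(keyword) + r"\b"
    return re.search(pattern, skill_text, re.IGNORECASE) is not None
```

A scan like this only catches wording. It does not replace the fresh-session test, which is what shows whether the model actually follows the rules.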
05 Ship, then tighten
Once the skill behaves well in tests, commit it and use it in real work. Then watch where it still fails. Most skills improve fastest after they’ve been used a few times. The first version should be good enough to run; the second version should be better because it has real failure data.
06 Checklist
Describe the friction point in one concrete sentence.
Use an interactive setup to pull out edge cases.
Write pass/fail tests using real inputs.
Check the skill in a fresh session.
Tighten the markdown, retest, and commit it.