
Frontmatter as a contract between AI and your CMS

Why the YAML block at the top of every MDX file is the most important interface in an AI content pipeline, and what breaks when you treat it as decoration.

Beka A.
Founder

The first AI-generated draft Branchpost ever opened as a pull request had a beautiful body and broken frontmatter. The title was 140 characters with a colon in the middle. The date was a string in one place and a Date object in another. The tags array contained a single string with commas inside it. The body would have rendered fine. The site would not have built.

That failure taught me something I now consider load-bearing for the whole product: the frontmatter is not metadata. It is the contract between the model and the CMS, and if you don't treat it like an API schema, your pipeline will leak in ways that look like writing quality problems but are actually type errors.

What the contract is actually for

Most blog engines treat frontmatter as a convenient place to stash a title and a date. That's underselling it. In an AI-driven pipeline, frontmatter does four jobs at once:

  1. It tells the build system how to render the page (title, cover, draft flag).
  2. It tells the site how to list the page (tags, category, featured).
  3. It tells the reader what the post claims before they read it (summary).
  4. It tells the model what shape the output must take before it generates a single word of body.

The fourth job is the one nobody writes about. The frontmatter schema, fed back into the generation prompt, is the cheapest, most reliable constraint you have on AI output. A model that knows it must produce a summary under 200 characters and a tags array of two to five strings from a defined vocabulary will hallucinate less in the body, because the structural rails carry into the prose.

The drift problem

Without a strict schema, AI-generated drafts drift in small ways that compound. Draft 1 uses description. Draft 2 uses summary. Draft 3 invents excerpt. Your CMS reads one of them. The other two render as missing fields. Multiply that by ten posts a week and your archive page is half-empty for reasons nobody can debug because the failures are silent.
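To make the drift visible instead of silent, a pre-check can look for the known wrong spellings of a field and report them by name. The sketch below is a minimal illustration, not our production code: FIELD_ALIASES and findFieldDrift are hypothetical names, and the alias list only covers the three spellings mentioned above.

```typescript
// Hypothetical alias map: the canonical field name and the drifted
// spellings seen in drafts. Extend it to match your own schema.
const FIELD_ALIASES: Record<string, string[]> = {
  summary: ['description', 'excerpt'],
}

interface DriftIssue {
  found: string // key the draft actually used
  expected: string // canonical key the CMS reads
}

// Returns one issue per drifted key; an empty array means no known drift.
function findFieldDrift(frontmatter: Record<string, unknown>): DriftIssue[] {
  const issues: DriftIssue[] = []
  for (const expected of Object.keys(FIELD_ALIASES)) {
    for (const alias of FIELD_ALIASES[expected]) {
      if (alias in frontmatter) {
        issues.push({ found: alias, expected })
      }
    }
  }
  return issues
}
```

A check like this turns "the summary is missing" into "the draft used excerpt where summary was expected", which is a far easier message for a reviewer or a retry prompt to act on.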

The fix isn't a better prompt. The fix is a validator that runs before the PR is opened and rejects the draft if any field is wrong. We use Zod:

import { z } from 'zod'

// Single definition shared by the build, the generation prompt, and the PR check.
const FrontmatterSchema = z.object({
  title: z.string().min(10).max(80),
  date: z.coerce.date(), // coerces a date string or a Date object into a Date
  summary: z.string().min(40).max(200),
  tags: z.array(z.string()).min(2).max(5), // checks the count only, not the vocabulary
  category: z.enum(['Engineering', 'Product', 'Workflow', 'Field notes']),
  author: z.string(),
  authorRole: z.string(),
  cover: z.object({
    glyph: z.string().max(20),
    subtitle: z.string().optional(),
  }),
  featured: z.boolean().default(false),
  draft: z.boolean().default(false),
})

// Static type derived from the schema, so the two cannot diverge.
export type Frontmatter = z.infer<typeof FrontmatterSchema>

That schema is the single source of truth. The build uses it. The model gets a serialized version of it in the generation prompt. The PR check re-runs it before merge. Three enforcement points, one definition.
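Each of those enforcement points first has to separate the YAML block from the body before anything can be validated. A minimal sketch of that step follows, assuming the file opens with a --- delimited block; splitFrontmatter is a hypothetical helper, and turning the raw YAML text into an object is left to whichever YAML parser you already use.

```typescript
interface SplitMdx {
  frontmatter: string // raw YAML text, not yet parsed
  body: string
}

// Splits an MDX source into its raw YAML block and its body.
// Returns null when the file does not open with a --- delimited block.
function splitFrontmatter(source: string): SplitMdx | null {
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?([\s\S]*)$/.exec(source)
  if (match === null) return null
  return { frontmatter: match[1], body: match[2] }
}
```

Returning null rather than an empty object keeps "no frontmatter at all" distinct from "frontmatter present but invalid", so the two failures can be reported differently.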

Feeding the schema back to the model

Here's the part most teams skip. The model is not psychic. If you want it to produce valid frontmatter, you have to tell it the shape, and you have to tell it in a form it can actually follow. We inline a JSON Schema representation of the Zod schema into the generation prompt, along with the enum values for category and the allowed tag vocabulary from the publication context.

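// Assumes the zod-to-json-schema package, which exports zodToJsonSchema.
import { zodToJsonSchema } from 'zod-to-json-schema'

// allowedTags and allowedCategories are string arrays from the publication context.
const prompt = `
Produce an MDX file with frontmatter matching this schema:

${JSON.stringify(zodToJsonSchema(FrontmatterSchema), null, 2)}

Allowed tags: ${allowedTags.join(', ')}
Allowed categories: ${allowedCategories.join(', ')}

The frontmatter must come first, between --- delimiters, in YAML.
`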

The model gets the schema. The validator enforces it. If validation fails, the pipeline retries with the error message appended to the prompt. That feedback loop catches roughly 90% of frontmatter failures on the first retry in our runs, which matters because retries are where the cost of an AI content pipeline actually lives.
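The retry loop described above can be sketched roughly as follows. This is an illustration under stated assumptions, not our actual pipeline code: generateWithRetry, buildRetryPrompt, and ValidationResult are hypothetical names, the model call and the validator are passed in as functions, and the cap of three attempts is arbitrary.

```typescript
// Hypothetical result shape; `errors` holds human-readable messages.
interface ValidationResult {
  ok: boolean
  errors: string[]
}

// Appends the validator's complaints to the original prompt.
function buildRetryPrompt(basePrompt: string, errors: string[]): string {
  const list = errors.map((e) => `- ${e}`).join('\n')
  return `${basePrompt}\nThe previous draft failed validation:\n${list}\nFix these fields and return the full file again.`
}

// `generate` and `validate` are injected so the sketch stays independent
// of any particular model client or schema library.
async function generateWithRetry(
  basePrompt: string,
  generate: (prompt: string) => Promise<string>,
  validate: (draft: string) => ValidationResult,
  maxAttempts = 3,
): Promise<string> {
  let prompt = basePrompt
  let lastErrors: string[] = []
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const draft = await generate(prompt)
    const result = validate(draft)
    if (result.ok) return draft
    lastErrors = result.errors
    // Always rebuild from the base prompt so errors do not pile up across attempts.
    prompt = buildRetryPrompt(basePrompt, result.errors)
  }
  throw new Error(
    `Frontmatter still invalid after ${maxAttempts} attempts: ${lastErrors.join('; ')}`,
  )
}
```

Capping the attempts matters for the cost reason given above: a draft that keeps failing should surface as an error for a human rather than burn more generations.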

Tags are the field that breaks first

Of all the frontmatter fields, tags are the one I've seen drift the most. The model will happily invent a new tag every post: "ai-workflows" in one draft, "ai-workflow" in the next, "workflows-ai" in a third. Your tag pages fragment. Nobody notices until you try to build a related-posts feature and realize you have 47 tags with one post each.

The fix is brutal: tags are an enum, not a free-text field. The publication context defines the allowed set. Anything outside that set fails validation. If the model thinks a new tag is needed, that's a conversation for the human reviewer, not a unilateral decision baked into a draft.
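As a rough sketch of what "tags are an enum" means in practice, the check below splits a draft's tags into accepted and rejected against a fixed vocabulary. ALLOWED_TAGS, isAllowedTag, and checkTags are hypothetical names and the tag values are made up for illustration; in a Zod schema the same constraint would likely be expressed as an array of z.enum values built from the publication context.

```typescript
// Hypothetical vocabulary; in practice this comes from the publication context.
const ALLOWED_TAGS = ['ai-workflows', 'frontmatter', 'mdx', 'validation'] as const
type Tag = (typeof ALLOWED_TAGS)[number]

// Type guard: narrows a free-text string to a member of the vocabulary.
function isAllowedTag(tag: string): tag is Tag {
  return (ALLOWED_TAGS as readonly string[]).indexOf(tag) !== -1
}

// Splits a draft's tags so the reviewer sees exactly which ones were invented.
function checkTags(tags: string[]): { accepted: Tag[]; rejected: string[] } {
  const accepted: Tag[] = []
  const rejected: string[] = []
  for (const tag of tags) {
    if (isAllowedTag(tag)) accepted.push(tag)
    else rejected.push(tag)
  }
  return { accepted, rejected }
}
```

Reporting the rejected tags by name, rather than only failing, gives the human reviewer the starting point for the "do we need a new tag?" conversation.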

This is the same logic that applies to reviewing the draft itself: the frontmatter is the function signature, and a wrong signature means the implementation doesn't get read.

What the contract buys you

When the frontmatter contract holds, three things become true. Your CMS never silently breaks on a missing field. Your archive, tag, and category pages stay coherent across hundreds of posts. And your model output becomes meaningfully more predictable in the body, because the constraints at the top of the file propagate down. A model that has committed to a 160-character summary writes a tighter introduction. A model that has chosen a category writes within that category's conventions.

Treat the YAML block like an API. Validate it like an API. Version it like an API. Everything downstream gets easier.

