Writing prompts that produce code-aware blog drafts

A draft landed in my review queue last week that opened with "In modern software development, performance is critical." I closed the PR without reading further. The repository it was generated against had three commits that week touching a Shiki highlighter cache, a Drizzle migration, and a retry handler in the Anthropic client. The draft mentioned none of them. That gap, between what the repo actually says and what the model writes, is the entire problem code-aware prompting has to solve.

Most AI writing tools accept a topic and produce a post. The post sounds plausible because language models are good at plausible. It's also interchangeable with any other tool's output because the only input was the topic. If you want a draft that could only have come from your repo, you have to feed the model your repo, in a shape it can use.

What "code-aware" actually means as prompt input

Code-aware is not "the model has seen your code." That framing leads people to dump a tarball into context and hope. It doesn't work, because most of a repo is noise relative to a single blog post, and because raw code is the wrong abstraction for prose generation anyway.

What the model needs is a structured digest of recent signal. For a Branchpost run, that digest looks roughly like this:

{
  "recent_commits": [
    {
      "sha": "a3f1c92",
      "message": "Cache Shiki highlighter across MDX files",
      "files": ["lib/mdx/highlighter.ts", "lib/mdx/pipeline.ts"],
      "diff_summary": "Hoisted createHighlighter out of per-file loop"
    }
  ],
  "touched_modules": ["lib/mdx", "lib/cache"],
  "open_issues_tagged_blog": [...],
  "package_changes": { "added": [], "removed": [], "updated": ["shiki@1.22"] },
  "topic_queue_entry": {
    "title": "Writing prompts that produce code-aware drafts",
    "notes": "Show how repo signals feed the prompt"
  }
}

That JSON is not the prompt. It's the evidence the prompt cites. The prompt itself is a set of instructions that tells the model how to weave this evidence into prose without lying about it.
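To make the digest's shape concrete, here is a minimal TypeScript sketch of it. The field names mirror the JSON above; the type names (`CommitSignal`, `RepoDigest`) and the `checkDigest` helper are hypothetical, invented for illustration rather than taken from Branchpost's code.

```typescript
// Hypothetical type names; the field names mirror the JSON digest above.
interface CommitSignal {
  sha: string;
  message: string;
  files: string[];
  diff_summary: string;
}

interface RepoDigest {
  recent_commits: CommitSignal[];
  touched_modules: string[];
  open_issues_tagged_blog: unknown[]; // elided in the example, so left untyped
  package_changes: { added: string[]; removed: string[]; updated: string[] };
  topic_queue_entry: { title: string; notes: string };
}

// Returns a list of problems; an empty list means the digest is usable as evidence.
function checkDigest(digest: RepoDigest): string[] {
  const problems: string[] = [];
  for (const commit of digest.recent_commits) {
    // A commit without file paths is a slogan, not a groundable claim.
    if (commit.files.length === 0) {
      problems.push(`commit ${commit.sha} has no file paths`);
    }
  }
  if (digest.topic_queue_entry.notes.trim() === "") {
    problems.push("topic queue entry has no author note");
  }
  return problems;
}

const exampleDigest: RepoDigest = {
  recent_commits: [
    {
      sha: "a3f1c92",
      message: "Cache Shiki highlighter across MDX files",
      files: ["lib/mdx/highlighter.ts", "lib/mdx/pipeline.ts"],
      diff_summary: "Hoisted createHighlighter out of per-file loop",
    },
  ],
  touched_modules: ["lib/mdx", "lib/cache"],
  open_issues_tagged_blog: [], // left empty here; the original example elides it
  package_changes: { added: [], removed: [], updated: ["shiki@1.22"] },
  topic_queue_entry: {
    title: "Writing prompts that produce code-aware drafts",
    notes: "Show how repo signals feed the prompt",
  },
};

console.log(checkDigest(exampleDigest));
```

Checking the digest before it reaches the prompt is a design choice of this sketch: it is cheaper to reject thin evidence up front than to review a draft built on it.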

The four signals worth extracting

Not every repo signal is useful to a blog draft. After running a lot of these, the four that consistently produce non-generic output are:

Commit messages with file paths. A commit message alone is a slogan. A commit message paired with the files it touched is a claim the model can ground a paragraph in.
package.json deltas. "We upgraded Shiki to 1.22" is a concrete sentence the model couldn't have invented. Dependency changes have one of the highest signal-to-token ratios available.
Closed issues and merged PRs since the last post. These describe decisions, not just changes. Decisions are what readers actually want to read about.
The topic queue entry itself. A short note from the author saying "the angle here is X" beats any amount of inferred context.
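The first two signals can be sketched as plain functions. This is a rough illustration, not Branchpost's extractor: `parseGitLog` assumes the text comes from something like `git log --name-only --pretty=format:"@@%h%x09%s"`, where the `@@` prefix is a marker chosen for this sketch, and `diffDependencies` assumes you have already read the `dependencies` maps from two versions of package.json. All function and type names are hypothetical.

```typescript
interface CommitWithFiles {
  sha: string;
  message: string;
  files: string[];
}

// Parses text in the assumed format: a header line "@@<sha>\t<subject>"
// followed by one touched file path per line. Blank lines are ignored.
function parseGitLog(output: string): CommitWithFiles[] {
  const commits: CommitWithFiles[] = [];
  let current: CommitWithFiles | undefined;
  for (const rawLine of output.split("\n")) {
    const line = rawLine.trim();
    if (line === "") continue;
    if (line.slice(0, 2) === "@@") {
      const header = line.slice(2);
      const tab = header.indexOf("\t");
      current = {
        sha: tab === -1 ? header : header.slice(0, tab),
        message: tab === -1 ? "" : header.slice(tab + 1),
        files: [],
      };
      commits.push(current);
    } else if (current) {
      current.files.push(line);
    }
  }
  return commits;
}

type DependencyMap = { [name: string]: string };

interface PackageChanges {
  added: string[];
  removed: string[];
  updated: string[];
}

function hasOwn(map: DependencyMap, name: string): boolean {
  return Object.prototype.hasOwnProperty.call(map, name);
}

// Compares two dependency maps and reports the delta in the digest's shape.
function diffDependencies(before: DependencyMap, after: DependencyMap): PackageChanges {
  const changes: PackageChanges = { added: [], removed: [], updated: [] };
  for (const name of Object.keys(after)) {
    const entry = `${name}@${after[name]}`;
    if (!hasOwn(before, name)) {
      changes.added.push(entry);
    } else if (before[name] !== after[name]) {
      changes.updated.push(entry);
    }
  }
  for (const name of Object.keys(before)) {
    if (!hasOwn(after, name)) changes.removed.push(name);
  }
  return changes;
}
```

Both functions take strings and plain objects rather than shelling out or reading files, which keeps them easy to test and leaves the I/O to whatever runs the pipeline.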

Notice what's not on that list: the full contents of source files. We tried it. The prompt costs many times more tokens and the draft comes out worse, because raw code invites the model to quote it verbatim instead of explaining it.

The prompt structure that uses this

The prompt has three parts, in this order:

[publication context]         // cached, ~8k tokens
[repo digest as JSON]         // fresh per run, ~1-2k tokens
[generation instructions]     // fresh per run, ~400 tokens

The generation instructions are where code-awareness gets enforced. The relevant lines, paraphrased from our actual prompt:

When you make a technical claim, reference a specific file path, commit, or dependency from the repo digest above. If you cannot ground a claim in the digest, either omit it or label it as a general principle, not a fact about this codebase.

That single instruction is responsible for most of the difference between "In modern software development, performance is critical" and "Hoisting the Shiki highlighter out of lib/mdx/pipeline.ts cut cold-start overhead from 6s to 1s." The model is capable of both. It writes the second one when the prompt forces it to.
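A minimal sketch of assembling the three parts in that order follows. The names (`PromptPart`, `buildPrompt`, `renderPrompt`) and the `cacheable` flag are assumptions made for illustration; how a cacheable part maps onto a provider's caching mechanism is deliberately left out.

```typescript
interface PromptPart {
  label: string;
  text: string;
  cacheable: boolean; // hypothetical flag; map it to your provider's caching mechanism
}

// The grounding instruction quoted above, kept as a single constant.
const GROUNDING_INSTRUCTION =
  "When you make a technical claim, reference a specific file path, commit, " +
  "or dependency from the repo digest above. If you cannot ground a claim in " +
  "the digest, either omit it or label it as a general principle, not a fact " +
  "about this codebase.";

// Builds the parts in the order described: stable context first, fresh evidence
// second, instructions last.
function buildPrompt(
  publicationContext: string,
  digest: object,
  extraInstructions: string[] = []
): PromptPart[] {
  return [
    { label: "publication context", text: publicationContext, cacheable: true },
    { label: "repo digest", text: JSON.stringify(digest, null, 2), cacheable: false },
    {
      label: "generation instructions",
      text: [GROUNDING_INSTRUCTION].concat(extraInstructions).join("\n"),
      cacheable: false,
    },
  ];
}

// Flattens the parts into one string, mainly useful for logging and review.
function renderPrompt(parts: PromptPart[]): string {
  return parts.map((part) => `[${part.label}]\n${part.text}`).join("\n\n");
}
```

Keeping the stable part first matters if the caching in use works on prompt prefixes, which is the assumption here: anything that changes per run should come after the part you want reused.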

Where it still fails

Code-aware prompting does not fix structure. A draft can cite every commit correctly and still ramble. It does not fix tone, which is what publication context is for. And it does not catch the case where the model invents a plausible-sounding file path that doesn't exist in the repo. That last one is rare but real, and it's why the review step matters regardless of how good the prompt is.
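The invented-path case is the one failure here that a script can partly catch before a human looks at the draft. The sketch below is an assumption about how such a check might work, not a description of Branchpost's review step: it pulls path-like strings out of the draft with a regular expression and reports any that do not appear in the digest's file lists. The pattern is a heuristic and will also match some URLs, so treat its output as a prompt for review rather than a verdict.

```typescript
// Heuristic: at least one "/" separator and a file extension at the end.
const PATH_PATTERN = /\b[\w.-]+(?:\/[\w.-]+)+\.\w+\b/g;

// Returns the path-like strings in the draft that are not in knownPaths,
// in order of first appearance and without duplicates.
function findUnknownPaths(draft: string, knownPaths: string[]): string[] {
  const mentioned = draft.match(PATH_PATTERN) || [];
  const unknown: string[] = [];
  for (const path of mentioned) {
    if (knownPaths.indexOf(path) === -1 && unknown.indexOf(path) === -1) {
      unknown.push(path);
    }
  }
  return unknown;
}

// Usage: knownPaths would be the flattened "files" arrays from the digest.
const knownPaths = ["lib/mdx/highlighter.ts", "lib/mdx/pipeline.ts"];
const draft =
  "Hoisting the Shiki highlighter out of lib/mdx/pipeline.ts helped. " +
  "We also touched lib/mdx/cache.ts.";
console.log(findUnknownPaths(draft, knownPaths));
```

In this usage example `lib/mdx/cache.ts` is a made-up path standing in for a hallucinated one; it is the only entry the check reports.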

The other failure mode is over-citation. If you tell the model "ground every claim in a file path," it will produce drafts that read like a changelog: "We changed x.ts. Then we changed y.ts. Then we changed z.ts." A blog post is not a commit log. The prompt has to allow synthesis, not just citation, which is why the instruction reads "when you make a technical claim" rather than "in every sentence."

The shortest version of the lesson

If you want a draft that references your actual work, the model needs your actual work in a shape it can use. That shape is not a tarball. It's a digest: commits with file paths, dependency deltas, closed decisions, and an author's note about the angle. Pair that digest with an instruction that forbids ungrounded technical claims, and the difference shows up in the first paragraph.

The reason this is hard to retrofit onto existing tools is that they don't have a repo to read from. They have a text box. Branchpost watches your repo and opens PRs with blog drafts. You review them like code. branchpost.dev
