AI can't tell your biggest feature from a dependency update

A product lead at a developer tools company told us about the time he tried using AI to speed up his changelog. He pointed it at a quarter’s worth of PRs and asked for a summary of what shipped.

The AI looked at the PR volume and decided the most important thing that quarter was a dependency update. Granted, it was a massive dependency update, forty-plus PRs and a huge amount of churn, but it was plainly not the headline feature for a whole quarter of work.

The migration was pure infrastructure: critical internally, but ideally invisible externally (if customers noticed it, something had gone wrong).

Meanwhile, the three PRs that actually introduced a major new product capability got a passing mention at the bottom.

Volume is not importance

Generic AI tools have a fundamental problem when it comes to changelogs: they treat all code changes equally. More PRs means more important, more lines changed means more significant, and more commits means more worth talking about.

That’s the opposite of how customer communication works. The change that matters most to customers might be a single commit that fixes a billing display bug. The change that matters least might be a month-long infrastructure project with hundreds of commits.

Without product context, AI has no way to tell the difference. It can read the code and summarize what changed technically, but what it can’t do is judge whether a customer would care.

The prioritization gap

The PM told us his team intentionally structures their public changelog to always lead with the stuff that benefits customers. It goes:

  • Major features in the first paragraph, then
  • improvements in the second most prominent placement, then
  • bug fixes that customers reported.

The “internal work” (migrations, dependency updates, and refactors) either gets a brief mention at the bottom or gets left out entirely.

That prioritization requires judgment: knowing which changes customers are actually going to see, which changes fix bugs that customers have open tickets for, and which feature improvements are worth announcing versus which ones just need to work quietly in the background.
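To make that ordering concrete, here's a toy sketch (not Changebot's actual logic, and not code from the team in this story) of the prioritization rule: each entry is ranked by a hand-assigned customer-impact category, internal work is dropped by default, and PR volume never enters the decision. The category names and sample data are invented for illustration.

```python
# Hypothetical customer-impact ranking: lower number = more prominent placement.
CATEGORY_PRIORITY = {
    "feature": 0,           # major features lead the changelog
    "improvement": 1,       # improvements come next
    "customer_bugfix": 2,   # fixes for bugs customers reported
    "internal": 3,          # migrations, dependency updates, refactors
}

def order_for_changelog(changes, include_internal=False):
    """Sort entries by customer impact; raw PR count never decides placement."""
    visible = [c for c in changes
               if include_internal or c["category"] != "internal"]
    return sorted(visible, key=lambda c: CATEGORY_PRIORITY[c["category"]])

# Sample data echoing the story: the 40-PR migration loses to the 3-PR feature.
changes = [
    {"title": "Dependency migration", "category": "internal", "pr_count": 40},
    {"title": "New API capability", "category": "feature", "pr_count": 3},
    {"title": "Billing display fix", "category": "customer_bugfix", "pr_count": 1},
]

for entry in order_for_changelog(changes):
    print(entry["title"])  # feature first, customer-reported fix second
```

The point of the sketch is the inversion: volume (`pr_count`) is carried along but never consulted, which is exactly the judgment a generic model lacks.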

When he tried ChatGPT, the output was technically accurate but strategically useless. The AI had no sense of customer impact, no awareness of what matters to the audience. He spent more time restructuring and rewriting the AI’s output than it would have taken to write from scratch.

The voice problem

Even when you point AI at the right changes, there’s a second issue: the output sounds like AI.

“We’re excited to announce” followed by feature descriptions that could be about any product. The PM characterized the writing as “disingenuous.” His team’s changelog has a specific tone because they’re selling to developers: it needs to be direct, technical, and honest. The AI’s version sounded like a press release from a company trying too hard.

This isn’t just an aesthetic complaint. Customers can tell when content is AI-generated, and for a developer tools company, authenticity matters. Your changelog is a direct line to your most engaged users. If it reads like a marketing bot wrote it, you’ve damaged trust with the exact people who care the most.

The prompt engineering trap

His next attempt was the obvious one: “fix the prompts.” Give the AI more context, tell it explicitly which changes matter, specify the tone, and provide examples of both good and bad updates.

Teams try this, and it works… kind of. But even when the output is acceptable, you’re now spending the time you would have spent writing on futzing with prompts instead.

I’m a bit of a prompt futzer myself, but if you’re manually providing all the context that makes a good changelog (customer impact, prioritization, voice), at some point the effort of managing the AI exceeds the effort of just writing the changelog yourself.

And that’s how we wound up on the call. His conclusion: with all the information at his fingertips, he could write a killer update in “two, three hours,” so the hours of data wrangling, guess-and-check, and editing, plus still having to do all the sharing manually, weren’t worth it.

What “AI-powered” should mean

The problem isn’t that “AI is bad.” It’s the nature of using generic AI to solve an expert problem.

A general-purpose language model doesn’t know your product, your customers, or your voice. It doesn’t know that a migration is internal work. It doesn’t know that three PRs matter more than forty. It doesn’t know that your changelog should sound direct and technical, not breathless and generic.

All of that context lives in the expertise of your product managers and product marketers (if you’re lucky enough to have dedicated product marketers!).

Tools built for the hands of these experts solve the problem differently. Instead of asking you to provide context through prompts, they build context from your actual workflow: commit messages, PR descriptions, ticket metadata, historical changelog entries. They learn what your team considers customer-facing versus internal. They match your existing voice instead of defaulting to AI-speak.

Our goal isn’t to remove the human from the changelog. It’s to handle the parts that don’t require human judgment, like aggregation, categorization, and first-draft generation, so the human can focus on the parts that do: prioritization, framing, and making sure the right message reaches the right audience.


If you’ve tried using AI for your changelog and found the output generic or the editing overhead too high, let’s have a chat about how Changebot generates updates with product context built in.