Claude Code's product lead talks usage limits, transparency, and the "lean harness"

We don’t find that it makes a measurable improvement in performance, but we’ve designed Claude Code to be extensible enough that if you want a plugin that does that, it’s available, and you can connect it. But we’ve found that Claude Code is pretty good at navigating the codebase and generating high-quality code without adding that.

Ars: The question is less about the quality of the code than the efficiency of getting there, right? Because, again, people get very frustrated with usage limits. Sometimes people try to introduce some kind of structure for an LLM, and they find out that has an unexpected hidden cost. Is that what you’re saying happens with that kind of semantic information? Do you have data that tells you that’s not the way to go with this?

Wu: Going by the evals, we don’t see a measurable change. And I think we generally lean more toward shipping a leaner harness with fewer opinionated tools and just letting developers add their own if they want. So unless a tool clearly improves token performance or accuracy, we default toward not shipping it.

I think token efficiency is always top of mind for us because we just want to give people the maximum amount of intelligence per token, so we’re constantly experimenting with ways to reduce token usage, but it’s actually harder than I wish it were to do well.

For us, the most important thing is just maintaining intelligence, so we would only ship something if we felt it actually made the model more intelligent, because that’s really the north star for us, not token efficiency.

Ars: For some users it might be easier to accept limitations on token availability if it were more transparent. But at the same time, my impression is that providing real transparency about token usage—“this task cost this much because you did this instead of this”—is actually hard to do.

I assume you’ve looked into ways to communicate that to users. What have you found when you’ve tried to do that?

Wu: We did get a lot of questions about that, like, “Hey, my usage limits got used up quickly, where did they go?” And I think that’s totally valid, and we need to be transparent about that. It is hard to diagnose.

So when people have these complaints, we pick a few people, we jump on a call with them, and we actually just debug live, because your full transcript is stored locally, so you already have all the data on your computer about all the tokens that you use…

We noticed two main patterns. One, people have these really long sessions, they step away for two hours, they come back and then the cache is broken—and when the cache is broken, it’s actually much more expensive to send the next query. So we start showing a notification that says, “Hey, the cache is broken, run /clear if you want to start a new session.” So it’s just a reminder that this one’s pretty expensive to resume. Also, when you run /usage, you’ll actually see, “Hey, these sessions cost a lot because your cache is broken.”
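Why a broken cache makes the next query “much more expensive” can be sketched with a back-of-the-envelope cost model. This is an illustrative sketch, not Claude Code’s internals: the dollar rate and the cache-read discount below are assumed round numbers in the spirit of typical prompt-caching pricing, where re-reading cached context costs a small fraction of processing uncached input tokens.

```python
# Illustrative sketch (assumed rates, not Claude Code internals): the cost of
# resuming a long session with a live prompt cache vs. after the cache expires.

BASE_INPUT_COST = 3.00 / 1_000_000   # $ per input token (hypothetical rate)
CACHE_READ_MULT = 0.1                # cached context re-read at 10% of base (assumed)

def resume_cost(context_tokens: int, new_tokens: int, cache_alive: bool) -> float:
    """Cost of sending the next query on top of an existing conversation."""
    if cache_alive:
        # Only the new tokens are billed at the full rate; the long
        # transcript is served from cache at a steep discount.
        return context_tokens * BASE_INPUT_COST * CACHE_READ_MULT \
            + new_tokens * BASE_INPUT_COST
    # Cache expired: the entire transcript is re-processed at full price.
    return (context_tokens + new_tokens) * BASE_INPUT_COST

# A session you step away from for two hours, with a 150k-token transcript:
warm = resume_cost(150_000, 500, cache_alive=True)
cold = resume_cost(150_000, 500, cache_alive=False)
print(f"warm resume: ${warm:.4f}, cold resume: ${cold:.4f}")
```

Under these assumed numbers the cold resume is roughly ten times the warm one, which is the intuition behind the notification nudging users to run /clear rather than resume an expired session.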


Source: arstechnica.com…
