It's easy to argue we have developed a horrible habit as devs.
A lot of us just let coding agents run with full permissions, not worrying too much about the issues this can cause. On personal projects especially, we are happy to give Claude or Codex broad access to the machine, because the pain of clicking Approve every 30 seconds feels like torture. Once you use these tools for any amount of real work instead of just making quick demos, the pain becomes obvious, fast. The permissions systems might look reasonable at first, then you spend a few days inside them and you realise they were built largely to have the user confirm the product's internal plumbing.
The tools quickly adapted the way you'd expect. Every single one of them ended up with some version of Always Approve, because everyone hit the same wall. We didn't really solve this problem though. We just built enough of a safety net that most of us don't care about the mistakes. Worktrees, branches, logs, isolated environments, a good record of what's been changed, a good ability to throw the bad work away, or my personal favorite: "I doubt it would ever run rm -rf / but if it does, it's 3 min to reinstall my OS and 5 more to get the rest back up. I can live with that." We get comfortable because the setup can survive the imperfections of the model, not because the models are good enough to be fully trustworthy.
This has been stuck in my head recently as I've been messing around with little personal finance prototypes and the Plaid sandbox, trying to think through how actual "agentic products" move beyond harmless work and start touching areas where making a mistake comes with a real cost. This also ties into a pretty common question: how do we build this for the capabilities of the models in 6 months without making the product far too painful to use today?
The dev space matters here because it is the first space where people have really felt the full pain of dealing with these approval-heavy agent workflows. If you use these tools all day you learn very quickly that constant prompts are unbearable, but the deeper lesson is about attention. As a user I don't want to sit there blessing every single invisible step the system wants to take on the way to doing what I asked it to do.
Why this keeps coming up
You hear this everywhere: "build for the models of tomorrow", and I think this is the right take. If you design super complex systems around current model limitations you're going to end up with products that feel ancient incredibly quickly. The issue is that many people hear this and are too lazy about how they act on it. They build what workflows will look like next and then secure it by exposing all the scaffolding to user permissions today. In theory the product is "agentic", but in practice it feels worse than doing it yourself.
What we need to aim for is the seamlessness of having someone we hired to do this for us.
This is where a lot of the current "AI" products get stuck. One version is just glorified ChatGPT with extra context and maybe a nicer wrapper. These get less interesting by the minute and already feel dated, because the labs keep absorbing the obvious version of the feature set. The other approach technically does things on its own, but makes the user approve every small step in the workflow. Approve reading this. Approve checking that. Approve drafting this. Approve sending that. At a certain point I haven't delegated anything. I'm just using a crappier interface to do the same thing.
Permissions and approvals are different problems.
Some access you simply should not give to the system. So yeah, least privilege still matters, but once you move from chat to action the interruption problem shows up immediately. How many times does the user have to get dragged back in before the whole thing starts to feel like torture? This is what I keep coming back to. These "agentic workflows" need a second design principle next to least privilege: least approvals. Or more precisely, the least approvals required for a still-safe system.
So where do the approvals actually belong
What we are aiming for here is basically approval compression.
By that I don't mean hiding five important decisions behind one lazy approve at the end. That would be brain dead. I mean letting the system do the reversible work, the checking, the comparing, the prep, all in the background, then surfacing the real edge as one clean decision. Compress approvals, not legibility. I still need to know what is about to happen. I still absolutely need a clear summary, a reason, a record of what happened after the fact. What I don't really need though is to approve the system thinking.
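To make the shape concrete, here's a minimal sketch of that pattern. Everything in it is made up for illustration, not taken from any real agent framework: reversible steps run quietly but leave a legible record, and irreversible steps get compressed into one decision at the edge.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Action:
    description: str
    reversible: bool          # reversible prep runs quietly; irreversible waits
    run: Callable[[], None]

@dataclass
class Plan:
    actions: List[Action]
    log: List[str] = field(default_factory=list)
    pending: List[Action] = field(default_factory=list)

    def execute(self) -> None:
        for a in self.actions:
            if a.reversible:
                a.run()                         # do the work in the background
                self.log.append(a.description)  # ...but keep the record
            else:
                self.pending.append(a)          # hold for one approval at the edge

    def summary(self) -> str:
        # One clean decision surface: what happened, what is about to happen.
        done = "".join(f"done: {d}\n" for d in self.log)
        asks = "".join(f"needs approval: {a.description}\n" for a in self.pending)
        return done + asks

    def approve(self) -> None:
        for a in self.pending:
            a.run()
        self.pending.clear()
```

The point of the sketch is that `summary()` is where legibility lives: compressing approvals down to one does not mean hiding what the system did along the way.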
The dev side has been very useful here because it forced the tradeoff out into the open. We never really solved the problem, but many people got so fed up with it that they are building their own layers on top of coding agents. This has produced an impressive range. You've got everything from 99, which Prime made, all the way to things like vibe-kanban (which, if you look at my GitHub, you'll know I tried for a bit and contributed to).
These sit at almost opposite ends of the philosophy we can take here. 99 aims for the smallest, most targeted edits so you get things exactly how you want, and in that sense it doesn't need constant permissions, since it's making such small changes at a time. At the other extreme is vibe-kanban, where you can try not even looking at the code and just manage a kanban board of agents.
Neither of them was really for me, but both were, in different ways, removing the need for constant approvals. That is the key question to carry over to other domains: how do we build the right surfaces so that we don't deal with constant approvals?
Where finance forces the issue
Finance makes this much harder very quickly because fake delegation falls apart the moment you are moving my money.
If I say "I just got a $10k bonus, sort it out", I do not mean "walk me through every internal step". What I want is the agent to handle it the way I would let a competent person handle it for me. Look at what I already have. Look at where I am underweight, where I am overexposed, where I am behind on something I told you I care about. Check the boring constraints. Consider the tax implications. Then, and only then, come back to me when there is something real to approve.
That only works though if the system already knows the rough shape of what I want. Otherwise one final approval is just me rubber stamping a random set of hidden judgment calls. So yes, there has to be some standing policy underneath this. Keep me above some cash floor. Don't trigger taxable sales without asking. Don't use new counterparties without asking. Stay inside the bounds I already set for diversification and risk. The user shouldn't have to think about the machinery every time. That is the whole point. I should be able to instruct it the way I would my team, and trust that it can handle it. Here OpenClaw taught us a lot about managing memory, working with preferences, and how messy all of this gets when you try to persist it properly, but breaking that down is probably a whole separate write-up.
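Those standing rules are small enough to write down directly. A minimal sketch, where every rule, name, and threshold is a hypothetical stand-in for whatever the user actually configured:

```python
# Hypothetical standing policy. A real system would load these values
# from the user's own configuration, not hardcode them.
CASH_FLOOR = 5_000
KNOWN_COUNTERPARTIES = {"vanguard", "fidelity"}

def escalation_reasons(transfer: dict, cash_after: float) -> list:
    """Return every reason this transfer must come back to the user.

    An empty list means the move stays inside the standing policy, so
    the agent can fold it into the final summary instead of
    interrupting mid-flow.
    """
    reasons = []
    if cash_after < CASH_FLOOR:
        reasons.append(f"cash would drop below the ${CASH_FLOOR:,} floor")
    if transfer.get("taxable_sale"):
        reasons.append("would trigger a taxable sale")
    if transfer["counterparty"] not in KNOWN_COUNTERPARTIES:
        reasons.append(f"new counterparty: {transfer['counterparty']}")
    return reasons
```

The design choice that matters is returning reasons rather than a bare yes/no: when the agent does come back to me, it can say exactly which rule it is about to cross.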
Once that is in place though, the workflow gets much cleaner. The agent can reconcile balances, check goals, compare a couple of paths, prepare the transfers, build the final summary, all quietly in the background. Then when it reaches the point where money is about to move, it gives me one clear view of what is happening. X amount goes here. Y amount goes there. These accounts are involved. This stays inside the rules you've already set. Approve.
This is what a system I would actually wanna use should feel like.
The approval boundary lines up with a real boundary. Money leaving an account. A new counterparty. A hard-to-undo action. A decision with legal or tax consequences, the kind I would get a phone call about anyway if I delegated this out to someone else. Those are the moments where, as the user, I should come back into the flow. The individual steps of internal implementation are not.
So where might this be going
As much as everyone on Twitter would have you believe this revolution happened more than 9 months ago, I think we only really reached the point where the models can do enough useful work in the background for the product questions to matter in December. Now that we're here though, we have to adapt quickly before the next shift. We don't wanna stay the crummy glorified chat app of two years ago, and in a couple of years we'll likely have good enough safety to remove most of these approvals. This means right now we have to figure out how to build workflows today that work and feel like what we'll have by the end of next year, without exposing users to unreasonable risk.
This brings us back to where we've been throughout this article. Let the system do as much as it can quietly. Keep the last meaningful checkpoint. Then as models get better, and as the system gets better at understanding intent, we slowly let the approvals shrink and eventually disappear naturally. We shouldn't need to reinvent the product later. Better models should just remove friction from the same basic shape.
In the long term I do think much fuller autonomy is coming in a lot of domains. The end state is probably a system that handles the routine work on its own and only asks when it is unsure, the same way a good operator would. Right now though, we are not there yet. So for now, one clean approval at the real edge is probably the closest version of the future worth shipping.