All vibes, no QA!

The week before Easter I visited Microsoft's HQ in Redmond for the annual MVP Summit. As you probably expect, the topic of everything AI was almost impossible to avoid (I tried). When I woke up on Saturday morning, with hours to spare before we had to drag our weary butts and heavy luggage to the airport, I had some shower thoughts: there must be something to this whole AI thing, right? "Everyone" raving about the amazing opportunities and possibilities can't all be wrong, can they?

I have no problem admitting I am a sceptic. When people try too hard to sell me something, my spider sense tingles. The myriad examples of people who have vibed away and ended up with amazing catastrophes have made me laugh (out loud), and my own attempts at using various AI assistants for work have been "sometimes maybe good, sometimes maybe shit", to quote the Milan legend Gattuso. Maybe it's just me being too old and grumpy to be interested in the new stuff. It's been some years since I passed that magic 35-year marker Douglas Adams described, after all:

“I've come up with a set of rules that describe our reactions to technologies:

1. Anything that is in the world when you’re born is normal and ordinary and is just a natural part of the way the world works.
2. Anything that's invented between when you’re fifteen and thirty-five is new and exciting and revolutionary and you can probably get a career in it.
3. Anything invented after you're thirty-five is against the natural order of things.”

― Douglas Adams, The Salmon of Doubt: Hitchhiking the Galaxy One Last Time

Anyways! It was with this slight feeling of doom that I packed my bags, discovered there was no breakfast at the hotel, and headed towards the closest place where one could procure a cup of coffee and, hopefully, something a Scandinavian would classify as breakfast (i.e. nothing close to what Americans think breakfast is). Like the elder millennial desperately trying to stay young, I brought my laptop and decided to give all of this a more serious try.

The app

The best tip my brain assimilated from all this AI-related information was to find a real-world problem to solve. I think it was a reply from one of the .NET Aspire developers to a question from the audience. And I have a few of these in my backlog: things that would remove some annoyance from my life, but also feel safe enough because they don't touch anything important (like my actual job). So I picked one I think is pretty nice and complex:

I want a web app for focused writing. No disturbances, no big interface with loads of buttons for formatting content, typewriter scrolling, pleasant colors, autosaving so I don't end up losing anything. There are a few of these on the market, but as far as I could find the last time I investigated, none of them have integrations with https://sanity.io.
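To illustrate the autosave requirement, here is a minimal sketch of how debounced autosaving might look. To be clear: this is not Earnesty's actual code; the `/api/drafts` endpoint and payload shape are hypothetical placeholders (a real backend would then write the draft to Sanity via its HTTP API).

```typescript
// Generic debounce: delays calls to fn until ms milliseconds have
// passed without a new call, so only the last keystroke triggers a save.
function debounce<T extends unknown[]>(fn: (...args: T) => void, ms: number) {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: T) => {
    clearTimeout(timer);
    timer = setTimeout(() => fn(...args), ms);
  };
}

// Hypothetical autosave: fire at most once per 1.5s pause in typing.
const autosave = debounce((text: string) => {
  // Hypothetical endpoint; the backend would persist the draft to Sanity.
  fetch("/api/drafts", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text }),
  });
}, 1500);
```

The editor would call `autosave(currentText)` on every change event; the debounce ensures the network only sees the final state after the writer pauses.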

I also set a couple of rules for myself:

  1. I am not allowed to write any code myself. Just give instructions.
  2. Reviewing pull requests and reading code is not a lot of fun. I don't want to do that either.

The result of this experiment is "Earnesty". Obviously a silly wordplay on "earnest" and the fact that Ernest Hemingway, while being quite a douche, is still my favorite author (and the name Hemingway was already in use by a pretty similar app). You can try it out yourself at https://write.liasis.dev or view the source code at https://github.com/sjovang/earnesty (you are allowed to look at the commit history and laugh as much as you want; it's an experiment, it is supposed to be weird).

Sometimes may be good

Frontend-related tasks go pretty well most of the time. Way better than what I am able to create myself. To be fair, I am not a frontend engineer. I am not really a developer either. I am an old ops dude who likes to automate stuff so I don't have to endlessly solve the same problem over and over.

Describing intent over details produces better results, but multiple iterations are often needed to go from idea to working feature.

I like the interactive experience of using copilot-cli: switch between ask, plan and autopilot (/yolo) modes, glance at the agent output as it works, and stop it if it goes down the wrong rabbit holes.

When an agent got stuck being stupid, it was helpful to switch models, iterate, find a solution, and remember to have it update its own copilot-instructions file to encourage better behavior in the future. That mostly worked well.
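For context, the copilot-instructions file is a plain markdown file in the repository (`.github/copilot-instructions.md`) that Copilot reads before it starts working. The entries below are made-up examples of the kind of corrective rules I mean, not my actual file:

```markdown
<!-- .github/copilot-instructions.md — hypothetical example entries -->
# Project conventions

- Never merge a pull request unless explicitly told to.
- Run the test suite before declaring a task done.
- Do not add new dependencies without asking first.
- When a fix fails twice, stop and summarize what you tried instead of retrying.
```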

Sometimes may be bad

At first I made the mistake of giving too specific instructions. I wanted to use https://aspire.dev because I also wanted to play around with how it maps app requirements to Azure infrastructure. That was stupid. It turns out my app gains few, if any, benefits from what Aspire does. It just complicates things and confuses my agents.

Working with authentication is a pain and makes agents go stupid. I expected OpenID Connect to be a pretty easy concept to implement, but I ended up with soooooo maaaanyyyy iteeeeraaations of basically telling agents "no, it's still not working. These are the errors and this is what the user sees" until I finally managed to write good enough prompts and guide them in the right direction.

Sometimes there will be straight-up lies. It wasn't until I was about halfway into the experiment that I discovered that the "save" button was not actually implemented at all. We're making a !$!%!$ editor. It's kinda useless if you can't save your words.

Actually getting the backend APIs to work properly took way too much time. It absolutely didn't do the right things until I first made sure it implemented tests (and it went completely overboard and made like 100+ of them) and attached Application Insights to my app to provide traces of what was actually going on. Turns out the whole thing was an import error: the necessary dependencies were not pulled from npm ...

I tried using agents directly in GitHub. It's probably nice if you have a solid setup with GitHub Actions, agent instructions, policies and other nice things, and you're working on a larger scale where you're juggling many feature requests and bug reports. But I just found the whole flow of writing an issue, waiting for output, looking at the PR and giving GitHub Copilot new instructions to iterate until it was ready to merge too slow and boring compared to using copilot-cli.

However, it was not as dreary as trying to use https://github.com/github/spec-kit. That thing just smells of all my least favorite parts in agile. If I wanted to be a scrum master I would have pursued that path, but I'm not a big fan of bureaucracy and I would much rather solve problems than be stuck in meetings.

It’s also really easy to do very stupid things. At one point I got bored with approving the PRs (I just blindly approved them; the commit log holds definite proof), typed /agent and made a merge-boss. It immediately took that as a sign to just start merging everything and kicking off builds. I laughed at my own stupidity, hammered escape to stop copilot-cli from doing what it was doing, and gave it stricter instructions to only do that when I tell it to. But it shows clearly how easily the big failures, like people deleting their whole infrastructure, can happen.

I still have concerns

"Blow tokens fast" is my new anthem. Honestly. Creating this app consumed 30-40% of the monthly tokens included in my GitHub Pro+ subscription. And to be clear: I did not spend anywhere near that much time chatting with copilot-cli. How is this going to scale for a team of super productive slop jockeys, I mean developers? Who is going to pay for all of this?

I didn't learn much of value from this whole thing. When I do hobby projects, one of the most important outcomes is new knowledge that I can bring over to the (more) professional part of my life. When I rewrote my website for the ∞-th time I browsed through a bunch of docs, compared different frameworks, learned a bit of CSS again for the first time in many years, and in the end I was a bit wiser than I was before I started.

Sure, it took me a lot more time than it took to create this app, but those are skills and knowledge I can apply somewhere else. If the only thing I learned from this was to write slightly better prompts and guide my fleet of agents, how useful will that actually be when everything in the AI space changes every month and I get a new idea 3 months from now?

This leads me to thinking about junior developers and students trying to land or cope with their first job. Solving problems with computers basically boils down to two fundamental things:

  1. Pattern matching
  2. Reducing complex problems into smaller bits that are easy to understand

What happens to our next generation of developers, engineers and promising talents when they're used to just asking AI for help both in their studies and their entry-level job? How will they become seasoned seniors who can draw from experience, translate it into a new domain and solve complex problems when they have skipped the "hard" phases of figuring things out, practicing, being mentored by humans and making mistakes they learn from?

This experiment was pretty fun, but the whole AI party with the army of influencers still feels a bit like this

I am Locutus of Borg. You will be AIssimilated.

And finally: I have a working app. A proof of concept, or minimum viable product. I have basically not read through any of the code. I have no clue if there are critical security problems, or about the quality of the generated code. And because I didn’t learn much and just gave away control, I have a tech stack where the only parts I’m familiar with are the infrastructure components. I could continue vibing along in my ignorance, make a shinier landing page, monetize it, market it, and I’m sure there are people in the world who would think: this seems nice, I can stop paying for FocusWriter or Hemingway now. And then I’d just wait for a catastrophe to happen that I would probably not be capable of fixing.

AI tools, agents, copilots, Claude, and whatever else they are named, amplify behavior. That goes both ways. They might give you false confidence with their friendly language, but if you try to do silly things they will just enable you to generate 10x more silly things, and somewhere in the output you didn’t read are a few lines similar to “This is dangerous, but the user instructed me to do it”.

To be really good at something you still need to put in the work, think about the problems, talk to the rubber duck, collaborate with your other team members, learn, read the documentation, fail, make mistakes and try again. In the end it will be worth it, and you can use the AI tools to amplify your output, actually evaluate what it’s doing, and be confident that the work you sign off on brings value to the world.

(and yes. This whole post was written in my shiny new writing app. It’s pretty nice. A lot more pleasant than using Sanity Studio)