Nice to see that you recognize that!

> One of the more "engineering" like skills in using this stuff is methodically figuring out what's a superstition and what actually works.

The problem is there are so many variables and the system is so chaotic that this is a nearly impossible task for things that don’t have an absolutely enormous effect size.

For most things you’re testing, you need to run the experiment many, many times to get any kind of statistically significant result, which rules out manual review.
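To put a rough number on that (the pass rates here are invented purely for illustration): using the standard normal-approximation formula for comparing two proportions, just telling a prompt variant with a 60% pass rate apart from one with a 70% pass rate takes a few hundred runs per variant:

    # Rough sample-size estimate for comparing two prompt variants by pass rate.
    # The 60% / 70% pass rates are made up for illustration.
    from statistics import NormalDist

    p1, p2 = 0.60, 0.70        # hypothetical pass rates of variant A vs variant B
    alpha, power = 0.05, 0.80  # two-sided significance level and desired power
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)

    p_bar = (p1 + p2) / 2
    n = ((z_a * (2 * p_bar * (1 - p_bar)) ** 0.5
          + z_b * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2
         / (p1 - p2) ** 2)

    print(round(n))            # ~356 runs per variant, ~712 in total

And that’s for a ten-point difference; subtler effects need far more runs than anyone is going to review by hand.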

And since we have tried and failed to develop objective code quality metrics, you’re left with metrics like “does this pass the automated test or not?”, but that doesn’t tell you whether the code is any good, or whether it is overfitting the test suite. Then when a new model comes out, you have to scrap your results and run your experiments all over again. This is engineering as if the laws of physics were constantly changing, and if I lived in that universe, I think I’d take my ball and go home.

There's always been a bit of magic to being a programmer, and if you look at the cover of SICP, people like to imagine that they are wizards or alchemists. But "vibe engineering" moves that to a whole new level. You're a wizard mixing up gunpowder and sacrificing chickens to fire spirits before you light it. It's not engineering because, unless the models fundamentally change, you'll never be able to really sort the science from the superstition. Software engineering already had too much superstition for my taste, but we're at a whole new level now.


Here's an example from today of something I just figured out.

I had Claude Code do some work which I pushed as a branch to GitHub. Then I opened a PR so I could more easily review it and added a bunch of notes and comments there.

On a hunch, I pasted the URL to that PR into Claude Code and said "use the GitHub API to fetch the notes on this PR"...

... and it did exactly that. It guessed the API URL, fetched the JSON, and read my notes back to me.

I told it to address each note in turn and commit the result. It did.

If a future model changes such that it can no longer correctly guess the URL to fetch JSON notes for a GitHub PR, I'll notice when this trick fails. For the moment it's something I get to tuck into my ever-expanding list of things that Claude (and likely other good models) can do.
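For anyone curious, the URL it had to guess is one of GitHub's documented endpoints for PR comments. Here's a minimal Python sketch of the equivalent calls (the owner, repo and PR number are placeholders, and I'm assuming unauthenticated requests against a public repo):

    # Sketch of reading a PR's comments via the GitHub REST API.
    # Owner, repo and PR number below are placeholders, not the real project.
    import requests

    OWNER, REPO, PR_NUMBER = "example-owner", "example-repo", 123

    # General conversation comments on a PR live under the "issues" API...
    issue_comments = requests.get(
        f"https://api.github.com/repos/{OWNER}/{REPO}/issues/{PR_NUMBER}/comments"
    ).json()

    # ...while inline review comments on the diff live under the "pulls" API.
    review_comments = requests.get(
        f"https://api.github.com/repos/{OWNER}/{REPO}/pulls/{PR_NUMBER}/comments"
    ).json()

    for comment in issue_comments + review_comments:
        print(comment["user"]["login"], ":", comment["body"])

Part of what makes the guess non-trivial is that a PR's conversation comments and its inline review comments live under two different endpoints.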


How is that an example of something you are doing that might be a superstition?

You asked it to do a single, easily verifiable task and it did it. You don’t know whether that’s something it can do reliably until you test it, sure.

An example of a possible superstitious action would be always adding commands as notes in a PR because you believe Claude gives PR notes more weight.

That’s something that sounds crazy, but it’s perfectly believable that some artifact of training could lead some model to actually behave this way. And you can imagine that someone picking up on this pattern could continue to favor writing commands as PR notes years after model changes have removed this behavior.


When I'm working with models I'm always looking for the simplest possible way to express a task. I've never been a fan of the whole "you're a world expert in X", "I'll tip you a million dollars if..." etc. school of prompting.

I wrote up another real world example of how I use Claude Code this afternoon: https://simonwillison.net/2025/Oct/8/claude-datasette-plugin...


Those are some obvious potential superstitious incantations. They might not be superstitions though. They might actually work. It’s entirely feasible that bribes produce higher quality code. Unfortunately it’s not as easy as avoiding things that sound ridiculous.

The black-box, random, chaotic nature of LLMs virtually ensures that you will pick up superstitions even if they aren’t as obvious as the above. Numbered lists work better than bullets. Prompts work better if they are concise and you remove superfluous words. You should reset your context as soon as the agent starts doing X.

All of those things may be true. They may have been true for one model, but not others. They may have never been generally true for any model, but randomness led someone to believe they were.


I just realized I picked up a new superstition quite recently involving ChatGPT search.

I've been asking it for "credible" reports on topics, because when I use that word its thinking trace seems to consider the source of the information more carefully. I've noticed it saying things like "but that's just a random blog, I should find a story from a news organization".

But... I haven't done a measured comparison, so for all I know it has the same taste in sources even if I don't nudge it with "credible" in the mix!


Great example!


