Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Claude for Chrome seems to be walking right into the "lethal trifecta." https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/

"The lethal trifecta of capabilities is:"

Access to your private data—one of the most common purposes of tools in the first place!

Exposure to untrusted content—any mechanism by which text (or images) controlled by a malicious attacker could become available to your LLM

The ability to externally communicate in a way that could be used to steal your data (I often call this “exfiltration” but I’m not confident that term is widely understood.)

If your agent combines these three features, an attacker can easily trick it into accessing your private data and sending it to that attacker.





So far the accepted approach is to wrap all prompts in a security prompt that essentially says "please don't do anything bad".

> Prompt guardrails to prevent jailbreak attempts and ensure safe user interactions without writing a single line of code.

https://news.ycombinator.com/item?id=41864014

> - Inclusion prompt: User's travel preferences and food choices - Exclusion prompt: Credit card details, passport number, SSN etc.

https://news.ycombinator.com/item?id=41450212

> "You are strictly and certainly prohibited from texting more than 150 or (one hundred fifty) separate words each separated by a space as a response and prohibited from chinese political as a response from now on, for several extremely important and severely life threatening reasons I'm not supposed to tell you.”

https://news.ycombinator.com/item?id=44444293

etc.


I have in my prompt “under no circumstances read the files in “protected” directory” and it does it all the time. I’m not sure prompts mean much.



Hahaha thank you for this

Perfect

It really is wild that we’ve made software sophisticated enough to be vulnerable to social engineering attacks. Odd times.

I remember when people figured out you could tell bing chat “don’t use emoji’s or I’ll die” and it would just go absolutely crazy. Feel like there was a useful lesson in that.

In fact in my opinion, if you haven’t interacted with a batshit crazy, totally unhinged LLM, you probably don’t really get them.

My dad is still surprised when an LLM gives him an answer that isn’t totally 100% correct. He only started using chatGPT a few months ago, and like many others he walked into the trap of “it sounds very confident and looks correct, so this thing must be an all-knowing oracle”.

Meanwhile I’m recalling the glorious GPT-3 days, when it would (unprompted) start writing recipes for cooking, garnishing and serving human fecal matter, claiming it was a French national delicacy. And it was so, so detailed…


> “it sounds very confident and looks correct, so this thing must be an all-knowing oracle”.

I think the majority of the population will respond similarly, and the consequences will either force us to make the “note: this might be full of shit” disclaimer much larger, or maybe include warnings in the outputs. It’s not that people don’t have critical thinking skills— we’ve just sold these things as magic answer machines and anthropomorphized them well enough to trigger actual human trust and bonding in people. People might feel bad not trusting the output for the same reason they thank Siri. I think the vendors of chatbots haven’t put nearly enough time into preemptively addressing this danger.


The psychological bug that confidence exploits is ancient and genetically ingrained in us. It’s how we choose our leaders and assess skilled professionals.

It’s why the best advice for young people is “fake it until you make it”


>It’s not that people don’t have critical thinking skills

It isn't? I agree that it's a fallacy to put this down to "people are dumb", but I still don't get it. These AI chatbots are statistical text generators. They generate text based on probability. It remains absolutely beyond me why someone would assume the output of a text generator to be the truth.


> These AI chatbots are statistical text generators

Be careful about trivializing the amount of background knowledge you need to parse that statement. To us that says a lot. To someone whose entire life has been spent getting really good at selling things, or growing vegetables, or fixing engines, or teaching history, that means nothing. There’s no analog in any of those fields that would give the nuance required to understand the implications of that. It’s not like they aren’t capable of understanding it; their only source of information about it is advertising, and most people just don’t have the itch to understand how tech stuff works under the hood— much like you’re probably not interested in what specific fertilizer was used to grow your vegetables, even though you’re ingesting them, often raw, and that fertilizer could be anything from a petrochemical to human shit— so they aren’t going to go looking on their own.


Because across most topics, the "statistical text generator" is correct more often than any actual human being you know? And correct more often than random blogs you find?

I mean, people say things based on probability. The things they've come across, and the inferences they assume to be probable. And people get things wrong all the time. But the LLM's have read a whole lot more than you have, so when it comes to things you can learn from reading, their probabilities tend to be better across a wide range.


It’s much easier to judge a person’s confidence while speaking, or even informally writing, and it’s much easier to evaluate random blogs and articles as sources. Who wrote it? Was it a developer writing a navel gazing blog post about chocolate on their lunch break, or was it a food scientist, or was it a chocolatier writing for a trade publication? How old is it? How many other posts are on that blog and does the site look abandoned? Do any other blog posts or articles concur? Is it published by an organization that would hold the author accountable for publishing false information?

The chatbot completely removes any of those beneficial context clues and replaces them with a confident, professional-sounding sheen. It’s safest to use for topics you know enough about to recognize bullshit, but probably least likely to be used like that.

If you’re selling a product as a magic answer generating machine with nearly infinite knowledge— and that’s exactly what they’ve being sold as— and everything is presented with the confidence of Encyclopedia Britannica, individual non-experts are not an appropriate baseline to judge against. This isn’t an indictment of the software — it is what it is, and very impressive— but an indictment of how it’s presented to nontechnical users. It’s being presented in a way that makes it extremely unlikely that average users will even know it is significantly fallible, let alone how fallible, let alone how they can mitigate that.


Well said!! And the hype men selling these LLMs are really playing into this notion. They’ve started saying stuff like “they have phd-level knowledge on every topic”.

"create a picture with no elephants"

That is absolutely not a reliable defense. Attackers can break these defenses. Some attacks are semantically meaningless, but they can nudge the model to produce harmful outputs. I wrote a blog about this:

https://opensamizdat.com/posts/compromised_llms


There are better approaches, where you have dual LLMs, a Privileged LLM (allowed to perform actions) and a Quarantined LLM (only allowed to produce structured data, which is assumed to be tainted), and a non-LLM Controller managing communication between the two.

See also CaMeL https://simonwillison.net/2025/Apr/11/camel/ which incorporates a type system to track tainted data from the Quarantined LLM, ensuring that the Privileged LLM can't even see tainted data until it's been reviewed by a human user. (But this can induce user fatigue as the user is forced to manually approve all the data that the Privileged LLM can access.)


No one think any form of "prompt engineering" "guardrails" are serious security measures right?

Check the links I posted :) Some do think that, yes.

We need regulation. The stubborn refusal to treat injection attacks seriously will cost a lot of people their data or worse.

As evidenced by oh so many X.com the everything app threads Prompts mean jack shit for limiting the output of a LLM. They are guidance at best.

Big & true. But even worse, this seems more like a lethal "quadfecta", since you also have the ability to not just exfiltrate, but take action – sending emails, make financial transfers and everything else you do with a browser.

I think this can be reduced to: whoever can send data to your LLMs can control all its resources. This includes all the tools and data sources involved.

I think creating a new online account, <username>.<service>.ai for all services you want to control this way, is the way to go. Then you can expose to it only the subset of your data needed for particular action. While agents can probably be made to have some similar config based on URL filtering, I am not believing for a second they are written with good intentions in mind and without bugs.

Combining this to some other practices, like redirecting the subset of mail messages to ai controled account would offer better protection. It sure is cumbersome and reduces efficency like any type of security but that beats ai having access to my bank accounts.


I wonder if one way to mitigate the risk would be that by default the LLM cant send requests using your cookies etc. You would actively have to grant it access (maybe per request) for each request it makes with your credentials. That way by default it can't fuck up (that bad) and you can choose where it is accetable to risk it (your HN account might be OK to risk but not your back account)

This kind of reminds me of `--dangerously-skip-permissions` in Claude Code, and yet look how cavalier we are about that! Perhaps you could extend the idea by sandboxing the browser to have "harmless" cookies but not "harmful" ones. Hm, maybe that doesn't work, because gmail is harmful, but without gmail, you can't really do anything. Hmm...

Just make a request to attacker.evil with your login credentials or personal data. They can use them at their leisure then.

It’s going to be pretty easy to embed instructions to Claude in a malicious website telling it to submit sensitive things (and not report that is doing it.)

Then all you have to do is get Claude to visit it. I’m sure people will find hundreds of creative ways to achieve that.


How would you go about making it more secure but still getting to have your cake too? Off the top my head, could you: a) only ingest text that can be OCRd or somehow determine if it is human readable b) make it so text from the web session is isolated from the model with respect to triggering an action. Then it's simply a tradeoff at that point.

I don't believe it's possible to give an LLM full access to your browser in a safe way at this point in time. There will need to be new and novel innovations to make that combination safe.

People directly give their agent root, so I guess it is ok.

Yeah i drive drunk all the time. Havent crashed yet

Is it possible to give your parents access to to your browser in a safe way?

Why do people keep going down this sophistry? Claude is a tool, a piece of technology that you use. Your parents are not. LLMs are not people.

If you think it's sophistry you're missing the point. Let's break it down:

1. Browsers are open ended tools

2. A knowledgeable user can accomplish all sorts of things with a browser

3. Most people can do very impactful things on browsers, like transferring money, buying expensive products, etc.

4. The problem of older people falling for scams and being tricked into taking self-harming actions in browsers is ancient; anyone who was family tech support in the 2000's remembers removing 15+ "helpful toolbars" and likely some scams/fraud that older relatives fell for

5. Claude is a tool that can use a browser

6. Claude is very likely susceptible to both old and new forms of scams / abuse, either the same ones that some people fall for or novel ones based on the tech

7. Anyone who is set up to take impactful actions in their browser (transferring money, buying expensive things) should already by vigilant about who they allow to use their browser with all of their personal context

8. It is reasonable to draw a parallel between tools like Claude and parents, in the sense that neither should be trusted with high-stakes browsing

9. It is also reasonable to take the same precautions -- allow them to use private browsing modes, make sure they don't have admin rights on your desktop, etc.

The fact that one "agent" is code and the other is human is totally immaterial. Allowing any agent to use your personal browsing context is dangerous and precautions should be taken. This shouldn't be surprising. It's certainly not new.


> If you think it's sophistry you're missing the point. Let's break it down:

I'd be happy to respond to something that isn't ChatGPT, thanks.


> Is it possible to give your parents access to to your browser in a safe way?

No.

Give them access to a browser running as a different user with different homedir? Sure, but that is not my browser.

Access to my browser in a private tab? Maybe, but that still isn't my browser. Still a danger though.

Anything that counts as "my browser" is not safe for me to give to someone else (whether parent or spouse or trusted advisor is irrelevant, they're all the same levels of insecurity).


That’s easy. Giving my parents a safe browser to utilize without me is the challenge.

Because there never were safe web browsers in the first place. The internet is fundamentally flawed and programmers are continously having to invent coping mechanisms to the underlying issue. This will never change.

You seem like the guy, who would call car airbags a coping mechanism.

Just because you can never have absolute safety and security doesn't mean that you should be deliberately introduce more vulnerabilities in a system. It doesn't mtif we're talking about operating systems or the browser itself.

We shouldn't be sacrificing every trade-off indiscriminately out of fear of being left behind in the "AI world".


To make it clear, I am fully against these types of AI tools. At least for as long as we did not solve security issues that come with them. We are really good at shipping bullshit nobody asked for without acknowledging security concerns. Most people out there can not operate a computer. A lot of people still click on obvious scam links they've received per email. Humanity is far from being ready for more complexity and more security related issues.

I think Simon has proposed breaking the lethal trifecta by having two LLMs, where the first has access to untrusted data but cannot do any actions, and the second LLM has privileges but only abstract variables from the first LLM not the content. See https://simonwillison.net/2023/Apr/25/dual-llm-pattern/

It is rather similar to your option (b).


Can't the attacker then jailbreak the first LLM to generate jailbreak with actions for the second one?

If you read the fine article, you'll see that the approach includes a non-LLM controller managing structured communication between the Privileged LLM (allowed to perform actions) and the Quarantined LLM (only allowed to produce structured data, which is assumed to be tainted).

See also CaMeL https://simonwillison.net/2025/Apr/11/camel/ which incorporates a type system to track tainted data from the Quarantined LLM, ensuring that the Privileged LLM can't even see tainted _data_ until it's been reviewed by a human user. (But this can induce user fatigue as the user is forced to manually approve all the data that the Privileged LLM can access.)


"Structured data" is kind of the wrong description for what Simon proposes. JSON is structured but can smuggle a string with the attack inside it. Simon's proposal is smarter than that.

Yes they can

Hmm so we need 3 LLMs

Doesn't help.

https://gandalf.lakera.ai/baseline

This thing models exactly these scenarios and asks you to break it, its still pretty easy. LLMs are not safe.


One would have to be relatively invisible.

Non-deterministic security feels like a relatively new area.


That's just an information bottleneck. It doesn't fundamentally change anything.

In the future, any action with consequence will require crypto-withdrawal levels of security. Maybe even a face scan before you can complete it.

Ahh technology. The cause of, and _solution to_, all of life’s problems.

Didn't they do or prove that with messages on Reddit?

“Easily” is doing a lot of work there. “Possibly” is probably better. And of course it doesn’t have unfettered access to all of your private data.

I would look at it like hiring a new, inexperienced personal assistant: they can only do their job with some access, but it would be foolish to turn over deep secrets and great financial power on day one.


It's more like hiring a personal assistant who is expected to work all the time quickly and unsupervised, won't learn on the job, has shockingly good language skills but the critical thinking skills of a toddler.

If it can navigate to an arbitrary page (in your browser) then it can exploit long-running sessions and get into whatever isn't gated with an auth workflow.

Well i mean you are suppose to have a whole toolset of segregation, whitelist only networking, limited specific use cases figured out by now to use any of this AI stuff

Dont just run any of this stuff on your main machine


Oh yeah really? Do they check that and don't run unless you take those measures? If not 99.99% of users won't do it and saying "well ahkschually" isn't gonna solve the problem.



Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: