The internet would come to a grinding halt as everyone would suddenly become mindful of their browsing. It's not hard to imagine a situation where, say, pornhub sells its access data and the next day you get sacked at your teaching job.
It doesn't need to. Thanks to asymmetric cryptography governments can in theory provide you with a way to prove you are a human (or of a certain age) without:
1. the government knowing who you are authenticating yourself to
2. or the recipient learning anything but the fact that you are a human
3. or the recipient being able to link you to a previous session if you authenticate yourself again later
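One concrete primitive behind such schemes is the blind signature: the government signs your token without ever seeing it, so later it cannot link the token back to the issuance. A toy sketch with textbook RSA and tiny illustrative numbers (real deployments use 2048+ bit keys and hash the message first; this is only to show the mechanics):

```python
from math import gcd

# Toy RSA parameters for illustration only -- never use keys this small.
p, q = 61, 53
n = p * q                           # public modulus
e = 17                              # public exponent
d = pow(e, -1, (p - 1) * (q - 1))   # issuer's private signing exponent

def blind(token: int, r: int) -> int:
    """User blinds the token before sending it to the issuer."""
    assert gcd(r, n) == 1
    return (token * pow(r, e, n)) % n

def sign_blinded(blinded: int) -> int:
    """Issuer signs without learning the underlying token."""
    return pow(blinded, d, n)

def unblind(blind_sig: int, r: int) -> int:
    """User strips the blinding factor, leaving a plain signature."""
    return (blind_sig * pow(r, -1, n)) % n

def verify(token: int, sig: int) -> bool:
    """Any website can check the signature using only the public key."""
    return pow(sig, e, n) == token

token, r = 42, 7                    # r is the user's secret blinding factor
sig = unblind(sign_blinded(blind(token, r)), r)
print(verify(token, sig))           # True: a valid signature the issuer never saw
```

The issuer only ever sees `blind(token, r)`, which is statistically unlinkable to `token` itself, which is how point 1 can hold.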
The EU is trying to build such a scheme for online age verification (I'm not sure if their scheme also extends to point 3 though. Probably?).
But I don't get how it goes for spam or scraping: if I can pass the test "anonymously", then what prevents me from doing it for illegal purposes?
I get it for age verification: it is difficult for a child to get a token that says they are allowed to access porn because adults around them don't want them to access porn (and even though one could sell tokens online, it effectively makes it harder to access porn as a child).
But how does it prevent someone from using their ID to get tokens for their scraper? If it's anonymous, then there is no risk in doing it, is there?
IIRC, you could use asymmetric cryptography to derive a site-specific pseudonymous token from the service and your government ID without the service knowing what your government ID is or the government provider knowing what service you are using.
The service then links the token to your account and uses ordinary detection measures to see if you're spamming, flooding, phishing, whatever. If you do, the token gets blacklisted and you can no longer sign on to that service.
This isn't foolproof - you could still bribe random people on the street to be men/mules in the middle and do your flooding through them - but it's much harder than just spinning up ten thousand bots on a residential proxy.
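The per-site derivation described above can be sketched like this (assuming, for illustration, that the user holds a single secret key attested by the government; the zero-knowledge machinery that proves the key is attested is elided):

```python
import hashlib
import hmac

def site_token(secret_key: bytes, site: str) -> str:
    """Derive a stable, site-specific pseudonym from the user's secret key.

    The same user always gets the same token on the same site (so bans
    stick), but tokens for different sites are cryptographically unlinkable.
    """
    return hmac.new(secret_key, site.encode(), hashlib.sha256).hexdigest()

key = b"user-secret-attested-by-government"   # illustrative placeholder
print(site_token(key, "b.com") == site_token(key, "b.com"))  # True: stable
print(site_token(key, "b.com") == site_token(key, "c.com"))  # False: unlinkable
```

Neither b.com nor c.com can recover `key` from their tokens, and comparing notes across sites tells them nothing, yet b.com can still blacklist a misbehaving token permanently.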
But that does not really answer my question: if a human can prove that they are human anonymously (by getting an anonymous token), what prevents them from passing that token to an AI?
The whole point is to prevent a robot from accessing the API. If you want to detect the robot based on its activity, you don't need to bother humans with the token in the first place: just monitor the activity.
It does not prevent a bot from using your ID. But a) the repercussions for getting caught are much more tangible when you can't hide behind anonymity - you risk getting blanket banned from the internet and b) the scale is significantly reduced - how many people are willing to rent/sell their IDs, i.e., their right to access the internet?
Edit: ok I see the argument that the feedback mechanism could be difficult when all the website can report is "hey, you don't know me but this dude from request xyz you just authenticated fucked all my shit up". But at the end of the day, privacy preservation is an implementation detail I don't see governments guaranteeing.
> But at the end of the day, privacy preservation is an implementation detail I don't see governments guaranteeing.
Sure, I totally see how you can prevent unwanted activity by identifying the users. My question was about the privacy-preserving way. I just don't see how that would be possible.
It does work as long as the attesting authority doesn't allow issuing a new identity (before it expires) if the old one is lost.
You generate a keypair, send your public key to the attesting authority A, and keep your private key. You get back a certificate.
You visit site b.com, and it asks for your identity, so you hash b.com|yourprivatekey. You submit the hash to b.com, along with a ZKP that you possess a private key that makes the hash work out, and that the private key corresponds to the public key in the certificate, and that the certificate has a valid signature from A.
If you break the rules of b.com, b.com bans your hash. Also, they set a hard rate limit on how many requests per hash are allowed. You could technically sell your hash and proof, but a scraper would need to buy up lots of them to do scraping.
Now the downside is that if you go to A and say your private key was compromised, or you lost control of it - the answer has to be tough luck. In reality, the certificates would expire after a while, so you could get a new hash every 6 months or something (and circumvent the bans), and if you lost the key, you'd need to wait out the expiry. The alternative is a scheme where you and A share a secret key - but then they can calculate your hash and conspire with b.com to unmask you.
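On b.com's side, enforcing against a hash is plain bookkeeping; a minimal sketch (in-memory counters for illustration; a real service would persist these and reset the rate window periodically):

```python
from collections import defaultdict

class HashGate:
    """Ban list plus per-hash rate limit, keyed by the pseudonymous hash."""

    def __init__(self, max_requests_per_window: int):
        self.limit = max_requests_per_window
        self.banned: set[str] = set()
        self.counts: dict[str, int] = defaultdict(int)

    def allow(self, user_hash: str) -> bool:
        """Reject banned hashes and hashes over their request budget."""
        if user_hash in self.banned:
            return False
        self.counts[user_hash] += 1
        return self.counts[user_hash] <= self.limit

    def ban(self, user_hash: str) -> None:
        """Called when the user breaks the rules; the ban outlives sessions
        because the hash stays stable until the certificate expires."""
        self.banned.add(user_hash)

gate = HashGate(max_requests_per_window=3)
print([gate.allow("abc123") for _ in range(4)])  # [True, True, True, False]
gate.ban("abc123")
print(gate.allow("abc123"))                      # False: banned outright
```

This is why a scraper would need to buy many hashes: each one is individually rate-limited and individually bannable, and a fresh hash only arrives with a fresh certificate.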
Isn't the whole point of a privacy-preserving scheme that you can request as many "certificates" from the attesting authority as you like and it won't care (because you may need as many as the number of websites you visit), and that b.com won't be able to link you across them, and therefore if it bans certificate C1, you can just start using certificate C2?
And then of course, if you need millions of certificates because b.com keeps banning you, it means that they ban you based on your activity, not based on your lack of certificate. And in that case, it feels like the certificate is useless in the first place: b.com has to monitor and ban you already.
There isn't a technical solution to this: governments and providers not only want proof of identity matching IDs, they want proof of life, too.
This will always end with live video of the person requesting to log in to provide proof of life at the very least, and if they're lazy/want more data, they'll tie in their ID verification process to their video pipeline.
That's not the kind of proof of life the government and companies want online. They want to make sure their video identification 1) is of a living person right now, and 2) that living person matches their government ID.
It's a solution to the "grandma died but we've been collecting her Social Security benefits anyway", or "my son stole my wallet with my ID & credit card", or (god forbid) "We incapacitated/killed this person to access their bank account using facial ID".
It's also a solution to the problem advertisers, investors and platforms face of 1) wanting huge piles of video training data for free and 2) determining that a user truly is a monetizable human being and not a freeloader bot using stolen/sold credentials.
> That's not the kind of proof of life the government and companies want online.
Well that's your assumption about governments, but it doesn't have to be true. There are governments that don't try to exploit their people. The question is whether such governments can have technical solutions to achieve that or not (I'm genuinely interested in understanding whether or not it's technically feasible).
It's the kind of proof my government already asks of me to sign documents much, much more important than watching adult content, such as social security benefits.
Such schemes have the fatal flaw that they can be trivially abused. All you need are a couple of stolen/sold identities and bots start proving their humanness and adultness to everyone.
> Such schemes have the fatal flaw that they can be trivially abused
I wouldn't expect the abuse rate to be higher than what it is for chip-and-pin debit cards. PKI failure modes are well understood and there are mitigations galore.
I did think of asymmetric cryptography, but I assumed the validators would be third parties / individual websites and therefore connections could be made using your public key. But I guess having the government itself provide the authentication service makes more sense.
I wonder if they'd actually honor point 1 instead of forcing recipients to be registered, as presumably they'd be interested in tracking user activity.
Besides making yourself party to a criminal conspiracy, I suspect it would be partly the same reason you won't sell/rent your real-world identity to other people today; an illegal immigrant may be willing to rent it from you right now.
Mostly, it will be because online identities will be a market for lemons: there will be so many fake/expired/revoked identities being sold that each one will be worth pennies, and that's not commensurate with the risk of someone committing crimes that get linked to your government-registered identity.
> the same reason you won't sell/rent your real-world identity to other people today
If you sell your real-world identity to other people today, and they get arrested, then the police will know your identity (obviously). How does that work with a privacy-preserving scheme? If you sell your anonymous token that says that you are a human to a machine and the machine gets arrested, then the police won't be able to know who you are, right? That was the whole point of the privacy-preserving token.
I'm genuinely interested, I don't understand how it can work technically and be privacy-preserving.
> With privacy preserving cryptography the tokens are standalone and have no ties to the identity that spawned them.
I suspect there will be different levels of attestation, from the anonymous ("this is an adult"), to the semi-anonymous ("this person was born in 20YY and resides in administrative region XYZ"), to the complete record ("This is John Quincy Smith III, born on YYYY-MM-DD, with ID doc number ABC123"). Somewhere in between the extremes is a pseudonymous token that's strongly tied to a single identity with non-repudiation.
Anonymous identities that can be easily churned out on demand by end-users have zero anti-bot utility.
100% agree, but it will be necessary for any non-repudiation use cases, like signing contracts remotely. There is no one-size-fits-all approach to online identity management.
While it's the privacy advocate's ideal, the political reality is that very few governments will deploy "privacy preserving" cryptography that gets in the way of LE investigations[1]. The best you can hope for is some escrowed service that requires a warrant to unmask the identity behind any given token, so privacy is preserved in most cases, and against most parties, except law enforcement with a valid warrant.
1. They can do it overtly in the design of the system, or covertly via side-channels, logging, or leaking bits in ways that are hard for an outsider to detect without access to the complete source code and/or system outputs, such as not-quite-random pseudo-randoms.
> Mostly, it will be because online identities will be a market for lemons: there will be so many fake/expired/revoked identities being sold that each one will be worth pennies, and that's not commensurate with the risk of someone committing crimes that get linked to your government-registered identity.
That would be trivially solved by validating the sold identities with the same verification mechanisms they would eventually be used with.
You are right about the negative outcomes that this might have but you have way too much faith in the average person caring enough before it happens to them.
I live with the naïve and optimistic dream that something like that would just show that everyone was in the list so they can't use it to discriminate against people.