"On Meta's fourth-quarter earnings call with analysts, in January, he predicted that 2025 was going to be a "pivotal year for the metaverse". In July's second-quarter earnings call, the metaverse only warranted one solitary mention in passing, in response to an analyst question about the company's "AI glasses".
In contrast, AI was mentioned 62 times, and another 12 times on a follow-up call, according to AlphaSense. "Superintelligence" got 20 shout-outs."
Normally I skip the images but this one is perfect:
What is interesting IMO about the "confidently wrong" phenomenon is that it was also commonly found in internet forums and online commentary in general prior to widespread use of today's confidently wrong "AI". That is, online commenters routinely were, and still are, "confidently wrong". IMHO and IME, the "confidently wrong" phenomenon was and still is more heavily represented in online commentary than "IRL".
No surprise IMO that, generally, online commenters and so-called "tech" companies, who tend to be overly fixated on computers as the solution to all problems, are also the most numerous promoters of confidently wrong "AI".
The nature of the medium itself, and those so-called "tech" companies that have sought to dominate it through intermediation and "ad services",^1 could have something to do with the acceptance and promotion of confidently wrong "AI": namely, the medium's capacity to reduce critical thinking and the relative ease with which uninformed opinions, misinformation, and other non-factual "confidently wrong" information can be spread by virtually anyone.
1. If "confidently wrong" information is popular, if it "goes viral", then with few exceptions it will be promoted by these companies to drive traffic and increase ad services revenue.
"It's just that some greedy companies are writing incredibly shitty crawlers that don't follow any of the enstablished [sic] conventions (respecting robots.txt, using proper UA string, rate limiting, whatever)."
How does "proper UA string" solve this "blowing up websites" problem
The only thing that matters with respect to the "blowing up websites" problem is rate-limiting, i.e., behaviour
"Shitty crawlers" are a nuisance because of their behaviour, i.e., request rate, not because of whatever UA string they send; the behaviour is what is "shitty" not the UA string. The two are not necessarily correlated and any heuristic that naively assumes so is inviting failure
"Spoofed" UA strings have been facilitated and expected since the earliest web browsers
To borrow the parent's phrasing, the "blowing up websites" problem has nothing to do with UA string specifically
It may have something to do with website operator reluctance to set up rate-limiting though; this despite widespread implementation of "web APIs" that use rate-limiting
NB. I'm not suggesting rate-limiting is a silver bullet. I'm suggesting that without rate-limiting, UA string as a means of addressing the "blowing up websites" problem is inviting failure
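To make "behaviour" concrete, here is a minimal sketch of per-IP rate-limiting in Python; the limit of roughly one request per second with a burst of ten is a made-up example, not anyone's published policy, and the point is that the decision never consults the UA string

  # Minimal sketch of per-IP rate-limiting (token bucket); the limits are
  # hypothetical examples, not a recommendation or anyone's actual policy.
  # The decision is based on behaviour (request rate), never on UA string.
  import time
  from collections import defaultdict

  RATE = 1.0    # assumed sustained limit: ~1 request/second per IP
  BURST = 10.0  # assumed burst allowance

  _buckets = defaultdict(lambda: {"tokens": BURST, "ts": time.monotonic()})

  def allow(ip: str) -> bool:
      """True if this request from `ip` is within the (hypothetical) limit."""
      b = _buckets[ip]
      now = time.monotonic()
      b["tokens"] = min(BURST, b["tokens"] + (now - b["ts"]) * RATE)
      b["ts"] = now
      if b["tokens"] >= 1.0:
          b["tokens"] -= 1.0
          return True
      return False  # over the limit: throttle or reject, whatever the UA string says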
Some of these crawlers appear to be designed to avoid rate limiting based on IP. I regularly see millions of unique ips doing strange requests, each just one or at most a few per day. When a response contains a unique redirect I often see a geographically distinct address fetching the destination.
"I regularly see millions of unique ips doing strange requests, each just one or at most a few per day."
How would UA string help
For example, a crawler making "strange" requests can send _any_ UA string, and a crawler doing "normal" requests can also send _any_ UA string.
The "doing requests" is what I refer to as "behaviour"
A website operator might think "Crawlers making strange requests send UA string X but not Y"
Let's assume the "strange" requests cause a "website load" problem^1
Then a crawler, or any www user, makes a "normal" request and sends UA string X; the operator blocks or redirects the request, unnecessarily
Then a crawler makes "strange" request and sends UA string Y; the operator allows the request and the website "blows up"
What matters for the "blowing up websites" problem^1 is behaviour, not UA string
1. The article's title calls it the "blowing up websites" problem, but the article text calls it a problem with "website load". As always the details are missing. For example, what is the "load" at issue. Is it TCP connections or HTTP requests. What number of simultaneous connections and/or requests per second is acceptable, and what number is not. Again, behaviour is the issue, not UA string
The acceptable numbers need to be published; for example, see documentation for "web APIs"
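As an illustration of "publishing" the numbers, a sketch that reuses the hypothetical allow() above and tells the client the limit and when to retry; Retry-After is a standard HTTP header, while the X-RateLimit-* names are only a convention popularised by public web APIs, and the figures are made up

  # Sketch of advertising the limit to clients, reusing the hypothetical
  # allow() above. Retry-After is standard HTTP; X-RateLimit-* is merely a
  # common web-API convention. The numbers are made up.
  from http.server import BaseHTTPRequestHandler, HTTPServer

  class Handler(BaseHTTPRequestHandler):
      def do_GET(self):
          ip = self.client_address[0]
          if not allow(ip):                                # behaviour check, not UA check
              self.send_response(429)                      # Too Many Requests
              self.send_header("Retry-After", "10")        # seconds until tokens refill
              self.send_header("X-RateLimit-Limit", "60")  # hypothetical: 60 requests/minute
              self.end_headers()
              return
          self.send_response(200)
          self.end_headers()
          self.wfile.write(b"ok\n")

  # HTTPServer(("127.0.0.1", 8080), Handler).serve_forever()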
"Some of these crawlers appear to be designed to avoid rate limiting based on IP."
Unless the rate is exceeded, the limit is not being avoided
"I regularly see millions of unique ips doing strange requests, each just one or at most a few per day."
Assuming the rate limit is more than one or a few requests every 24h this would be complying with the limit, not avoiding it
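Some illustrative arithmetic (the figures are assumptions, not measurements) shows how a fleet can comply with a per-IP limit yet still produce a sizeable aggregate; whether that aggregate degrades QoS is the separate question discussed below

  # Illustrative arithmetic only; the figures are assumptions, not measurements.
  unique_ips = 2_000_000          # "millions of unique IPs" (assumed)
  reqs_per_ip_per_day = 3         # "one or at most a few per day" (assumed)

  total_per_day = unique_ips * reqs_per_ip_per_day
  avg_per_second = total_per_day / 86_400

  print(f"{total_per_day:,} requests/day, about {avg_per_second:.0f} requests/second on average")
  # 6,000,000 requests/day, about 69 requests/second on average
  # Each IP stays well under a per-IP limit, so the limit is complied with,
  # not avoided; whether the aggregate is a problem is a separate question.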
It could be that sometimes the problem website operators are concerned about is not "website load", i.e., the problem the article is discussing, it is actually something else (NB. I am not speculating about this particular operator, I am making a general observation)
If a website is able to fulfill all requests from unique IPs without affecting quality of service, then it stands to reason "website load" is not a problem the website operator is having
For example, the article's title claims Meta is amongst the "worst offenders" of creating excessive website load caused by "AI crawlers, fetchers"
Meta has been shown to have used third-party proxy services with rotating IP addresses in order to scrape other websites; it also sued one of these services because it was being used to scrape Meta's website, Facebook
Whether the problem that Meta was having with this "scraping" was "website load" is debatable; if the requests were being fulfilled without affecting QoS, then arguably "website load" was not a problem
Rate-limiting addresses the problem of website load; it allows website operators to ensure that requests from all IP addresses are adequately served as opposed to preferentially servicing some IP addresses to the detriment of others (degraded QoS)
Perhaps some website operators become concerned that many unique IP addresses may be under the control of a single entity, and that this entity may be a competitor; this could be a problem for them
But if their website is able to fulfill all the requests it receives without degrading QoS then arguably "website load" is not a problem they are having
NB. I am not suggesting that a high volume of requests from a single entity, each complying with a rate-limit, is acceptable, nor am I making any comment about the practice of "scraping" for commercial gain. I am only commenting on what rate-limiting is designed to do and whether it works for that purpose
"Indeed, as long ago as the 1960s, that phenomenon was noticed by Joseph Weizenbaum, the designer of the pioneering chatbot ELIZA, which replicated the responses of a psychotherapist so convincingly that even test subjects who knew they were conversing with a machine thought it displayed emotions and empathy.
"What I had not realized," Weizenbaum wrote in 1976, "is that extremely short exposures to a relatively simple computer program could induce powerful delusional thinking in quite normal people." Weizenbaum warned that the "reckless anthropomorphization of the computer" - that is, treating it as some sort of thinking companion - produced a "simpleminded view of intelligence.""
Wishful thinking: the OpenWRT userland could replace dnsmasq with two separate programs. The DHCP server, odhcpd, is already included (for DHCPv6). They just need to write the DNS software.
I always disable/remove dnsmasq when I can. Compared to the alternatives, I have never liked it. This is at least the second major dnsmasq coding mistake that has been published in recent memory.^1 Pi-Hole was based on dnsmasq which turned me off that as well.
In the past, this has been the case. I looked and didn’t see anything on the forum about this news, but it may be too soon to hit the forum? I don’t visit it very often.
While the functionality/complexity of dnsmasq makes me nervous and I don't use it (I don't have a use case for it), it isn't clear to me that dnsmasq is doing anything wrong in this particular case.
I think there were two sets of 7 total vulnerabilities at the same time so they might be perceived as one event? I don’t know for sure, the wording was kind of ambiguous.
> Dnsmasq has two sets of vulnerabilities, one set of memory corruption issues handling DNSSEC and a second set of issues validating DNS responses. These vulnerabilities could allow an attacker to corrupt memory on the target device and perform cache poisoning attacks against the target environment.
> These vulnerabilities are also tracked as ICS-VU-668462 and referred to as DNSpooq.
Unbound unfortunately has a pair of issues ([1][2]) that in some situations (adblocking, source-address-based DNS selection) can make it a less than optimal match for some use cases.
"Some users of our service (NextDNS), discovered this issue since edgekey.net has been added to some anti-tracker blocklists, resulting in the blocking of large sites like apple.com, airbnb.com, ebay.com when used with unbound."
Just as Pi-Hole is a modified dnsmasq, NextDNS may be a modified Unbound
For HTTP I use a localhost-bound TLS forward proxy that has the DNS data in memory; I gather the DNS data in bulk from various sources using various methods; there are no remote DNS queries when I make HTTP requests
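A stripped-down sketch of that idea (not the actual proxy): the hostname-to-address data lives in a local map populated out of band, so making an HTTPS request triggers no remote DNS query; the address below is a documentation-range placeholder, not real gathered data

  # Sketch of the idea only (not the actual proxy). The hostname-to-address
  # map is populated out of band; making an HTTPS request involves no remote
  # DNS query. The address below is a documentation-range placeholder.
  import socket, ssl

  HOSTS = {"www.example.com": "203.0.113.10"}   # placeholder; bulk-gathered in practice

  def https_get(host: str, path: str = "/") -> bytes:
      ip = HOSTS[host]                          # in-memory lookup, no resolver involved
      ctx = ssl.create_default_context()
      with socket.create_connection((ip, 443)) as sock:
          # SNI and certificate verification still use the hostname, not the IP
          with ctx.wrap_socket(sock, server_hostname=host) as tls:
              req = f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
              tls.sendall(req.encode())
              chunks = []
              while data := tls.recv(4096):
                  chunks.append(data)
      return b"".join(chunks)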
Unbound is overkill for how I use DNS on the local network
Psst! NSD isn't a "resolver" at all. Traditional DNS terminology is tricky to use (given that what is covered by "resolver" in the RFCs does not match how most people see the system as divided up) but something that does not do the resolving part at all is definitely not a resolver.
https://web.archive.org/web/20250823105045if_/https://utcc.u...