"On Meta's fourth-quarter earnings call with analysts, in January, he predicted that 2025 was going to be a "pivotal year for the metaverse". In July's second-quarter earnings call, the metaverse only warranted one solitary mention in passing, in response to an analyst question about the company's "AI glasses".
In contrast, AI was mentioned 62 times, and another 12 times on a follow-up call, according to AlphaSense. "Superintelligence" got 20 shout-outs."
Normally I skip the images but this one is perfect:
What is interesting IMO about the "confidently wrong" phenomenon is that it was also commonly found in internet forums and online commentary in general prior to widespread use of today's confidently wrong "AI". That is, online commenters routinely were, and still are, "confidently wrong". IMHO and IME, the "confidently wrong" phenomenon was and still is more heavily represented in online commentary than "IRL".
No surprise IMO that, generally, online commenters and so-called "tech" companies, who tend to be overly fixated on computers as the solution to all problems, are also the most numerous promoters of confidently wrong "AI".
The nature of the medium itself, and those so-called "tech" companies that have sought to dominate it through intermediation and "ad services",^1 could have something to do with the acceptance and promotion of confidently wrong "AI": namely, the medium's capacity to reduce critical thinking and the relative ease with which uninformed opinions, misinformation, and other non-factual "confidently wrong" information can be spread by virtually anyone.
1. If "confidently wrong" information is popular, if it "goes viral", then with few exceptions it will be promoted by these companies to drive traffic and increase ad services revenue.
"It's just that some greedy companies are writing incredibly shitty crawlers that don't follow any of the enstablished [sic] conventions (respecting robots.txt, using proper UA string, rate limiting, whatever)."
How does "proper UA string" solve this "blowing up websites" problem
The only thing that matters with respect to the "blowing up websites" problem is rate-limiting, i.e., behaviour
"Shitty crawlers" are a nuisance because of their behaviour, i.e., request rate, not because of whatever UA string they send; the behaviour is what is "shitty" not the UA string. The two are not necessarily correlated and any heuristic that naively assumes so is inviting failure
"Spoofed" UA strings have been facilitated and expected since the earliest web browsers
To borrow the parent's phrasing, the "blowing up websites" problem has nothing to do with UA string specifically
It may have something to do with website operator reluctance to set up rate-limiting though; this despite widespread implementation of "web APIs" that use rate-limiting
NB. I'm not suggesting rate-limiting is a silver bullet. I'm suggesting that without rate-limiting, UA string as a means of addressing the "blowing up websites" problem is inviting failure
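To make "behaviour" concrete, here is a minimal sketch of per-IP rate-limiting in Python; the limit of roughly one request per second with a burst of ten is a made-up example, not anyone's published policy, and the point is that the decision never consults the UA string

  # Minimal sketch of per-IP rate-limiting (token bucket); the limits are
  # hypothetical examples, not a recommendation or anyone's actual policy.
  # The decision is based on behaviour (request rate), never on UA string.
  import time
  from collections import defaultdict

  RATE = 1.0    # assumed sustained limit: ~1 request/second per IP
  BURST = 10.0  # assumed burst allowance

  _buckets = defaultdict(lambda: {"tokens": BURST, "ts": time.monotonic()})

  def allow(ip: str) -> bool:
      """True if this request from `ip` is within the (hypothetical) limit."""
      b = _buckets[ip]
      now = time.monotonic()
      b["tokens"] = min(BURST, b["tokens"] + (now - b["ts"]) * RATE)
      b["ts"] = now
      if b["tokens"] >= 1.0:
          b["tokens"] -= 1.0
          return True
      return False  # over the limit: throttle or reject, whatever the UA string says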
Some of these crawlers appear to be designed to avoid rate limiting based on IP. I regularly see millions of unique ips doing strange requests, each just one or at most a few per day. When a response contains a unique redirect I often see a geographically distinct address fetching the destination.
"I regularly see millions of unique ips doing strange requests, each just one or at most a few per day."
How would UA string help
For example, a crawler making "strange" requests can send _any_ UA string, and a crawler doing "normal" requests can also send _any_ UA string.
The "doing requests" is what I refer to as "behaviour"
A website operator might think "Crawlers making strange requests send UA string X but not Y"
Let's assume the "strange" requests cause a "website load" problem^1
Then a crawler, or any www user, makes a "normal" request and sends UA string X; the operator blocks or redirects the request, unnecessarily
Then a crawler makes "strange" request and sends UA string Y; the operator allows the request and the website "blows up"
What matters for the "blowing up websites" problem^1 is behaviour, not UA string
1. The article's title calls it the "blowing up websites" problem, but the article text calls it a problem with "website load". As always the details are missing. For example, what is the "load" at issue. Is it TCP connections or HTTP requests. What number of simultaneous connections and/or requests per second is acceptable, and what number is not. Again, behaviour is the issue, not UA string
The acceptable numbers need to be published; for example, see documentation for "web APIs"
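As an illustration of "publishing" the numbers, a sketch that reuses the hypothetical allow() above and tells the client the limit and when to retry; Retry-After is a standard HTTP header, while the X-RateLimit-* names are only a convention popularised by public web APIs, and the figures are made up

  # Sketch of advertising the limit to clients, reusing the hypothetical
  # allow() above. Retry-After is standard HTTP; X-RateLimit-* is merely a
  # common web-API convention. The numbers are made up.
  from http.server import BaseHTTPRequestHandler, HTTPServer

  class Handler(BaseHTTPRequestHandler):
      def do_GET(self):
          ip = self.client_address[0]
          if not allow(ip):                                # behaviour check, not UA check
              self.send_response(429)                      # Too Many Requests
              self.send_header("Retry-After", "10")        # seconds until tokens refill
              self.send_header("X-RateLimit-Limit", "60")  # hypothetical: 60 requests/minute
              self.end_headers()
              return
          self.send_response(200)
          self.end_headers()
          self.wfile.write(b"ok\n")

  # HTTPServer(("127.0.0.1", 8080), Handler).serve_forever()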
"Some of these crawlers appear to be designed to avoid rate limiting based on IP."
Unless the rate is exceeded, the limit is not being avoided
"I regularly see millions of unique ips doing strange requests, each just one or at most a few per day."
Assuming the rate limit is more than one or a few requests every 24h this would be complying with the limit, not avoiding it
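Some illustrative arithmetic (the figures are assumptions, not measurements) shows how a fleet can comply with a per-IP limit yet still produce a sizeable aggregate; whether that aggregate degrades QoS is the separate question discussed below

  # Illustrative arithmetic only; the figures are assumptions, not measurements.
  unique_ips = 2_000_000          # "millions of unique IPs" (assumed)
  reqs_per_ip_per_day = 3         # "one or at most a few per day" (assumed)

  total_per_day = unique_ips * reqs_per_ip_per_day
  avg_per_second = total_per_day / 86_400

  print(f"{total_per_day:,} requests/day, about {avg_per_second:.0f} requests/second on average")
  # 6,000,000 requests/day, about 69 requests/second on average
  # Each IP stays well under a per-IP limit, so the limit is complied with,
  # not avoided; whether the aggregate is a problem is a separate question.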
It could be that sometimes the problem website operators are concerned about is not "website load", i.e., the problem the article is discussing, it is actually something else (NB. I am not speculating about this particular operator, I am making a general observation)
If a website is able to fulfill all requests from unique IPs without affecting quality of service, then it stands to reason "website load" is not a problem the website operator is having
For example, the article's title claims Meta is amongst the "worst offenders" of creating excessive website load caused by "AI crawlers, fetchers"
Meta has been shown to have used third-party proxy services with rotating IP addresses in order to scrape other websites; it also sued one of these services because it was being used to scrape Meta's website, Facebook
Whether the problem that Meta was having with this "scraping" was "website load" is debatable; if the requests were being fulfilled without affecting QoS, then arguably "website load" was not a problem
Rate-limiting addresses the problem of website load; it allows website operators to ensure that requests from all IP addresses are adequately served as opposed to preferentially servicing some IP addresses to the detriment of others (degraded QoS)
Perhaps some website operators become concerned that many unique IP addresses may be under the control of a single entity, and that this entity may be a competitor; this could be a problem for them
But if their website is able to fulfill all the requests it receives without degrading QoS then arguably "website load" is not a problem they are having
NB. I am not suggesting that a high volume of requests from a single entity, each complying with a rate-limit, is acceptable, nor am I making any comment about the practice of "scraping" for commercial gain. I am only commenting on what rate-limiting is designed to do and whether it works for that purpose
"Indeed, as long ago as the 1960s, that phenomenon was noticed by Joseph Weizenbaum, the designer of the pioneering chatbot ELIZA, which replicated the responses of a psychotherapist so convincingly that even test subjects who knew they were conversing with a machine thought it displayed emotions and empathy.
"What I had not realized," Weizenbaum wrote in 1976, "is that extremely short exposures to a relatively simple computer program could induce powerful delusional thinking in quite normal people." Weizenbaum warned that the "reckless anthropomorphization of the computer" - that is, treating it as some sort of thinking companion - produced a "simpleminded view of intelligence.""
Wishful thinking: the OpenWRT userland could replace dnsmasq with two separate programs. The DHCP server, odhcpd, is already included (for DHCPv6). They just need to write the DNS software.
I always disable/remove dnsmasq when I can. Compared to the alternatives, I have never liked it. This is at least the second major dnsmasq coding mistake that has been published in recent memory.^1 Pi-Hole was based on dnsmasq which turned me off that as well.
In the past, this has been the case. I looked and didn’t see anything on the forum about this news, but it may be too soon to hit the forum? I don’t visit it very often.
While the functionality/complexity of dnsmasq makes me nervous and I don't use it (I don't have a use case for it), it isn't clear to me that dnsmasq is doing anything wrong in this particular case.
I think there were two sets of 7 total vulnerabilities at the same time so they might be perceived as one event? I don’t know for sure, the wording was kind of ambiguous.
> Dnsmasq has two sets of vulnerabilities, one set of memory corruption issues handling DNSSEC and a second set of issues validating DNS responses. These vulnerabilities could allow an attacker to corrupt memory on the target device and perform cache poisoning attacks against the target environment.
> These vulnerabilities are also tracked as ICS-VU-668462 and referred to as DNSpooq.
Unbound unfortunately has a pair of issues ([1][2]) that in some situations (adblocking, source-address-based DNS selection) can make it a less than optimal match for some use cases.
"Some users of our service (NextDNS), discovered this issue since edgekey.net has been added to some anti-tracker blocklists, resulting in the blocking of large sites like apple.com, airbnb.com, ebay.com when used with unbound."
Just as Pi-Hole is a modified dnsmasq, NextDNS may be a modified Unbound
For HTTP I use a localhost-bound TLS forward proxy that has the DNS data in memory; I gather the DNS data in bulk from various sources using various methods; there are no remote DNS queries when I make HTTP requests
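A stripped-down sketch of that idea (not the actual proxy): the hostname-to-address data lives in a local map populated out of band, so making an HTTPS request triggers no remote DNS query; the address below is a documentation-range placeholder, not real gathered data

  # Sketch of the idea only (not the actual proxy). The hostname-to-address
  # map is populated out of band; making an HTTPS request involves no remote
  # DNS query. The address below is a documentation-range placeholder.
  import socket, ssl

  HOSTS = {"www.example.com": "203.0.113.10"}   # placeholder; bulk-gathered in practice

  def https_get(host: str, path: str = "/") -> bytes:
      ip = HOSTS[host]                          # in-memory lookup, no resolver involved
      ctx = ssl.create_default_context()
      with socket.create_connection((ip, 443)) as sock:
          # SNI and certificate verification still use the hostname, not the IP
          with ctx.wrap_socket(sock, server_hostname=host) as tls:
              req = f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
              tls.sendall(req.encode())
              chunks = []
              while data := tls.recv(4096):
                  chunks.append(data)
      return b"".join(chunks)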
Unbound is overkill for how I use DNS on the local network
Psst! NSD isn't a "resolver" at all. Traditional DNS terminology is tricky to use (given that what is covered by "resolver" in the RFCs does not match how most people see the system as divided up) but something that does not do the resolving part at all is definitely not a resolver.
https://web.archive.org/web/20250823105045if_/https://utcc.u...