Hacker News | Mikhail_Edoshin's comments

But it leads to something more important.

"If I speak in the tongues of men or of angels, but do not have love, I am only a resounding gong or a clanging cymbal. If I have the gift of prophecy and can fathom all mysteries and all knowledge, and if I have a faith that can move mountains, but do not have love, I am nothing."


A paginated view is stable; a scrollable one is not. This hinders spatial memory immensely, yet goes unnoticed.

There are ID3 tags used for MP3 and other files. Old players may not know about them and may misread tag data as MPEG frames. To prevent this, a tag may insert a 00 byte after any FF byte that would otherwise look like the start of a frame sync ("unsynchronisation"). Or it may not, because most players are now aware of tags. So there is a preference, at two levels: a default and a per-tag setting. Not too hard to program, but rather unfriendly to a grammar-based approach.
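The scheme can be sketched roughly as follows. This is my reading of the ID3v2 spec, simplified: a false sync is an FF byte followed by a byte with its top three bits set (or by 00), and unsynchronisation inserts a 00 after the FF; the trailing-FF edge case the spec also covers is omitted here.

```python
def unsynchronize(data: bytes) -> bytes:
    """Insert a 00 after any FF that could be mistaken for an MPEG sync."""
    out = bytearray()
    for i, b in enumerate(data):
        out.append(b)
        if b == 0xFF and i + 1 < len(data) and (data[i + 1] >= 0xE0 or data[i + 1] == 0x00):
            out.append(0x00)
    return bytes(out)

def resynchronize(data: bytes) -> bytes:
    """Reverse the transform: drop the 00 that follows an FF."""
    out = bytearray()
    skip = False
    for i, b in enumerate(data):
        if skip:
            skip = False
            continue
        out.append(b)
        if b == 0xFF and i + 1 < len(data) and data[i + 1] == 0x00:
            skip = True
    return bytes(out)
```

Note that the transform is not context-free: whether a 00 is data or padding depends on the byte before it, which is exactly what makes it awkward for a grammar-based parser.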

(Another example is checksums.)


Good taste implies there is more than taste. It implies there is a true nature of things and what we call taste is a recognition of things that are true to their nature.

No need to contemplate platonic ideals; we've all experienced code that is relatively easy to read and modify, performs well, handles errors well, etc.

The author's definition of taste as a prioritization of various engineering values is one we can understand based on experience.


Fitness for a purpose has a very deep meaning. In "The Timeless Way of Building" Christopher Alexander used this term as the last attempt to describe "the quality without a name".

Well, "a sufficiently advanced technology is indistinguishable from magic". It's just that here it is the same in a bad way, not a good way.

Russia has a working system that tracks retail sales of individual cans of beer, bottles of milk and such. Initially it was introduced to track things like shoes and furs that were massively counterfeited, but then expanded to include other goods. So now in a grocery store you use it, for example, for all milk products (milk, cheese, ice cream, etc.), vegetable oil, beer, mineral water. Technically you just scan a different barcode (QR code). There's also an app you can use to scan the thing and get more information, such as the exact producer. The general idea was to fight counterfeit goods, but as a side effect it also enforces shelf life rules or may help to find a drugstore that has a specific drug.

So it is possible and not that expensive even as a country-wide system for goods that cost around $1 (a typical can of beer).


And yes, it does have additional codes for larger-scale packages. So a pack of cigarettes gets its own code, a carton gets its own code, a box of cartons gets its own code. A wholesaler can just scan the box and the system updates the status of every pack inside.
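The aggregation idea described above can be modeled as a simple tree, where marking a container cascades to everything nested inside it. This is a toy illustration of the concept, not the actual system's data model:

```python
from dataclasses import dataclass, field

@dataclass
class Unit:
    """One traceable unit: a pack, a carton, or a box of cartons."""
    code: str
    status: str = "in_transit"
    children: list = field(default_factory=list)

    def mark(self, status: str) -> None:
        # One scan of a container updates every unit nested under it.
        self.status = status
        for child in self.children:
            child.mark(status)

packs = [Unit(f"pack-{i}") for i in range(10)]
carton = Unit("carton-1", children=packs)
box = Unit("box-1", children=[carton])

box.mark("received")   # the wholesaler scans only the box
```

After the single scan, every pack inside reports the new status.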


What am I missing about this? Couldn't the scammer just replicate the QR code of a legit shop? I thought the point of counterfeit goods was to fool you into buying them instead of the real thing. I guess part of the process would have to be verifying that every shipment of goods received was accurately tracked from a valid "ship from" address, but I have to imagine there's a lot of common warehousing in use for bulk goods. I'm not understanding how the QR code helps solve that.

Maybe a unique bar code per item that includes some private hash information making it unique to the producer? Sort of an electronic signature for physical goods? Then, if there's a centralized database, copying the QR codes wouldn't do much good. You might be able to slip one in if it is sold before the real version, but each subsequent copy could be caught.
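The idea above can be sketched with a MAC over a per-item serial. Everything here is hypothetical (the key, the code format, the registry); real track-and-trace systems use their own cryptographic schemes, but the two failure modes — forged codes and copied codes — fall out the same way:

```python
import hmac, hashlib

PRODUCER_KEY = b"producer-secret"   # hypothetical key shared with the registry

def make_item_code(serial: str) -> str:
    """Item code = serial + a tag only the key holder can compute."""
    tag = hmac.new(PRODUCER_KEY, serial.encode(), hashlib.sha256).hexdigest()[:16]
    return f"{serial}:{tag}"

sold = set()   # central registry of serials already sold

def check_scan(code: str) -> str:
    serial, tag = code.rsplit(":", 1)
    expected = hmac.new(PRODUCER_KEY, serial.encode(), hashlib.sha256).hexdigest()[:16]
    if not hmac.compare_digest(tag, expected):
        return "forged code"        # an attacker can't mint new valid serials
    if serial in sold:
        return "duplicate"          # a copied code is caught on the second sale
    sold.add(serial)
    return "ok"
```

A scammer who photocopies one legitimate code gets at most one sale past the system; the registry flags every repeat.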


This is fascinating in the context of how they use and abuse intermediaries to buy and smuggle western tech into Russia. If every chip were that well tracked, it would be a lot easier to clamp down on it.


FWIW, this was the original vision of UPC.


A long time ago I considered myself an atheist. Then I noticed a strange thing: I liked Chesterton. (Or Graham Greene, also an open Catholic.) Why? Why did their writings appear more profound than others'? I couldn't answer it then, so I just noted the fact and kept reading. I guess it made me more open. A Buddhist would say it was a good karmic sign.


One aspect of Unicode that is probably not obvious is that with Unicode it is possible to keep using old encodings just fine. You can always get their Unicode equivalents; this is what Unicode was about. Otherwise, just keep the data as is, tagged with the encoding. This nicely extends to filesystem "encodings" too.


For example, modern Python internally uses three fixed-width forms (Latin-1, UCS-2 and UCS-4, per PEP 393), depending on the contents of the string. But the same can be done for all encodings, and also for things like file names that do not follow Unicode. The Unicode standard does not dictate that everything must take the same form; it can be used to keep existing forms while making them compatible.
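The effect of PEP 393's flexible representation is visible from Python itself. A small check, assuming CPython (other implementations may store strings differently): three strings of equal length take different amounts of memory because each is stored at the narrowest width that fits its widest character.

```python
import sys

# Equal length, different internal width (CPython's PEP 393 representation):
ascii_s = "a" * 100           # fits in Latin-1: 1 byte per character
bmp_s   = "\u0100" * 100      # needs UCS-2:    2 bytes per character
wide_s  = "\U0001F600" * 100  # needs UCS-4:    4 bytes per character

print(sys.getsizeof(ascii_s) < sys.getsizeof(bmp_s) < sys.getsizeof(wide_s))  # True
```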


Here is what a UTF-8 decoder needs to handle:

1. Invalid bytes. Some bytes cannot appear in a UTF-8 string at all; there are two such ranges (C0-C1 and F5-FF).

2. Conditionally invalid continuation bytes. Usually you read a continuation byte and extract its data, but after certain lead bytes the valid range of the first continuation byte is further restricted.

3. Surrogates. They cannot appear in a valid UTF-8 string, so if they do, it is an error and you need to mark it as such. Or maybe process them as in CESU-8, which means making sure they are correctly paired. Or maybe process them as in WTF-8: read them and let them through.

4. Form issues: an incomplete sequence, or a continuation byte without a lead byte.

It is much more complicated than UTF-16, which only has surrogates, and those are pretty straightforward.
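The four cases can be made concrete with a minimal strict decoder. This is a sketch for illustration (real decoders are table-driven and much faster); it takes the common approach of replacing each offending byte with U+FFFD:

```python
def decode_utf8(data: bytes) -> list:
    """Strict UTF-8 decode to a list of code points; errors become U+FFFD."""
    out, i, n = [], 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:                          # ASCII
            out.append(b); i += 1
        elif 0xC2 <= b <= 0xDF:               # 2-byte sequence
            if i + 1 < n and 0x80 <= data[i + 1] <= 0xBF:
                out.append(((b & 0x1F) << 6) | (data[i + 1] & 0x3F)); i += 2
            else:                             # case 4: incomplete / bad form
                out.append(0xFFFD); i += 1
        elif 0xE0 <= b <= 0xEF:               # 3-byte sequence
            lo, hi = 0x80, 0xBF
            if b == 0xE0: lo = 0xA0           # case 2: reject overlongs
            if b == 0xED: hi = 0x9F           # case 3: reject surrogates
            if i + 2 < n and lo <= data[i + 1] <= hi and 0x80 <= data[i + 2] <= 0xBF:
                out.append(((b & 0x0F) << 12) | ((data[i + 1] & 0x3F) << 6)
                           | (data[i + 2] & 0x3F)); i += 3
            else:
                out.append(0xFFFD); i += 1
        elif 0xF0 <= b <= 0xF4:               # 4-byte sequence
            lo, hi = 0x80, 0xBF
            if b == 0xF0: lo = 0x90           # case 2: reject overlongs
            if b == 0xF4: hi = 0x8F           # case 2: reject > U+10FFFF
            if (i + 3 < n and lo <= data[i + 1] <= hi
                    and 0x80 <= data[i + 2] <= 0xBF and 0x80 <= data[i + 3] <= 0xBF):
                out.append(((b & 0x07) << 18) | ((data[i + 1] & 0x3F) << 12)
                           | ((data[i + 2] & 0x3F) << 6) | (data[i + 3] & 0x3F)); i += 4
            else:
                out.append(0xFFFD); i += 1
        else:                                 # cases 1 & 4: C0-C1, F5-FF, lone continuation
            out.append(0xFFFD); i += 1
    return out
```

The restricted ranges after E0/ED/F0/F4 are exactly the "conditionally invalid continuation bytes" of case 2, and the ED branch is where surrogates get rejected.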


I've written some Unicode transcoders; UTF-8 decoding devolves to a quartet of switch statements, and each of the issues you've mentioned ends up being a case statement where the solution is to replace the offending sequence with U+FFFD.

UTF-16 is simple as well but you still need code to absorb BOMs, perform endian detection heuristically if there's no BOM, and check surrogate ordering (and emit a U+FFFD when an illegal pair is found).

I don't think there's an argument for either being complex, the UTFs are meant to be as simple and algorithmic as possible. -8 has to deal with invalid sequences, -16 has to deal with byte ordering, other than that it's bit shifting akin to base64. Normalization is much worse by comparison.

My preference for UTF-8 isn't one of code complexity, I just like that all my 70's-era text processing tools continue working without too many surprises. The features like self-synchronization are nice too compared to what we _could_ have gotten as UTF-8.
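For comparison, the UTF-16 side described above (BOM handling plus surrogate pairing) fits in a similarly small sketch. One assumption made here: with no BOM, it defaults to big-endian rather than running a heuristic, and bad surrogates become U+FFFD.

```python
def decode_utf16(data: bytes) -> list:
    """Minimal UTF-16 decode to code points: BOM, endianness, surrogate pairs."""
    if data[:2] == b"\xff\xfe":
        order, data = "little", data[2:]
    elif data[:2] == b"\xfe\xff":
        order, data = "big", data[2:]
    else:
        order = "big"                         # no BOM: assume big-endian
    units = [int.from_bytes(data[i:i + 2], order) for i in range(0, len(data) - 1, 2)]
    out, i = [], 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:             # high surrogate: needs a low one next
            if i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                out.append(0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00))
                i += 2
            else:
                out.append(0xFFFD); i += 1
        elif 0xDC00 <= u <= 0xDFFF:           # lone low surrogate
            out.append(0xFFFD); i += 1
        else:
            out.append(u); i += 1
    return out
```

Other than the ordering concerns, it really is just bit shifting, which supports the point that neither UTF is the hard part; normalization is where the real complexity lives.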

