SQLite's File Format

alphazard · 2025-09-07T14:06:45 1757254005

SQLite is a great example of a single factor mattering more than everything else combined. A database contained in a single file is such a good idea that it outweighs a poorly designed storage layer, poorly designed column formats, and a terrible SQL implementation.

If craftsmanship is measured by the long tail of good choices that give something a polished and pristine feel, then SQLite was built with none of it. And yet, it's by far the best initial choice for every project that needs a database. Most projects will never need to switch to anything more.

sethev · 2025-09-07T19:26:22 1757273182

This seems like an unnecessarily negative comment. I've been a user of SQLite for over 20 years now (time flies!). What you're calling lack of polish, I would chalk up to Dr. Hipp having been conscientious about maintaining compatibility over the long term. So much so that the Library of Congress recommends it for long-term preservation of data.

Long term compatibility (i.e. prioritizing the needs of users vs chasing inevitably changing ideas about what feels polished or pristine), near fanatical dedication to testing and quality, and sustained improvement over decades - these are the actual signs of true craftsmanship in an engineering project.

(plus, I don't agree with you that the storage layer, column format, or SQL implementation are bad).

alphazard · 2025-09-07T22:47:09 1757285229

> I would chalk up to Dr. Hipp having been conscientious about maintaining compatibility over the long term.

I agree. I am not suggesting that the SQLite team doesn't know how to make the technology better. Just that they aren't/haven't. Backwards compatibility is a good reason not to.

My original comment was contrasting craftsmanship and utility, since both are somewhat prized on HN, but they aren't the same thing at all. Look at a system like Wireguard. A huge amount of small decisions went into making that as simple and secure as it is. When most developers are confronted with similar decisions, they perform almost randomly and accumulate complexity over the long tail of decisions (it doesn't matter just pick a way). With Wireguard, every design decision reliably drove toward simplicity (it does matter, choose carefully).

jmull · 2025-09-07T23:28:45 1757287725

I don't think they ever hesitate to make sqlite better. It's just that they have a different definition of "better" than you.

ttz · 2025-09-08T02:45:51 1757299551

> contrasting craftsmanship and utility, since both are somewhat prized on HN

I'd say they're prized everywhere, though "craftsmanship" is really subjective. and the HN I usually [edit/add: see] seems to have more a meta of "criticize anything someone tries to build, and rave about IQ" tbh ;)

SQLite works and I don't have to think about why it works (too much). That is IMO a true hallmark of solid engineering.

pmarreck · 2025-09-07T14:35:16 1757255716

> If craftsmanship is measured by the long tail of good choices that give something a polished and pristine feel, then SQLite was built with none of it.

It apparently has an extensive and thorough test suite. That's an excellent design choice that tons of other projects could learn from, and is probably a key element of its success.

Sometimes a poorly-designed thing that is excellently-documented and thoroughly-tested is better than a brilliantly-designed thing that is lacking in those. In fact, unless the number of users of the thing is 1 (the creator), the former is likely a better option across all possible use-cases.

Perhaps we could generalize this by stating that determinism > pareto-optimality.

chasil · 2025-09-07T15:03:56 1757257436

Digital Equipment Corporation sold a SQL database known as Rdb that could also run as a single file.

It was the first database to introduce a cost-based optimizer, and ran under both VMS and Digital UNIX.

Oracle bought it, and VMS versions are still supported.

https://www.oracle.com/database/technologies/related/rdb.htm...

https://en.m.wikipedia.org/wiki/Oracle_Rdb

(My employer is still using the VMS version.)

owyn · 2025-09-07T17:53:12 1757267592

Oh! RDB was the first database I worked with. I forgot all about it. I do remember refactoring the data layer so that it also worked with Berkeley DB, which is also owned by Oracle now. Or maybe it was the other way around? There was no SQL involved in that particular application so it was just a K/V store. Working with a local data file was the primary design goal, no client/server stuff was even on the radar. SQLite would have been perfect if it had existed.

kevin_thibedeau · 2025-09-07T15:39:29 1757259569

It was designed to be a DB for Tcl at a time when that language didn't have typed objects. Its SQL implementation reflects that. Where are the grand Python, or Perl, or JS DBs?

skissane · 2025-09-07T22:53:26 1757285606

It actually does have typed values; it is just that the schema didn’t constrain the value types stored in each column. Until relatively recently the column type was mostly just documentation. However, now it has STRICT tables which do constrain the value types of columns. And for a lot longer you’ve been able to implement the same thing manually using check constraints, which is a bit verbose if you are writing the schema by hand, much less of a problem if it is being generated out of ORM model classes/etc.
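A minimal sketch of the two approaches mentioned above, using Python's standard sqlite3 module. The table names (loose, checked, strict_t) are made up for illustration, and the STRICT part is guarded because it assumes the linked SQLite library is version 3.37.0 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Ordinary table: the declared column type is only a hint (type affinity),
# so text that does not look like a number is stored as text.
conn.execute("CREATE TABLE loose (n INTEGER)")
conn.execute("INSERT INTO loose VALUES ('not a number')")
loose_type = conn.execute("SELECT typeof(n) FROM loose").fetchone()[0]

# Manual enforcement: a CHECK constraint on typeof() rejects the same insert.
conn.execute("CREATE TABLE checked (n INTEGER CHECK (typeof(n) = 'integer'))")
try:
    conn.execute("INSERT INTO checked VALUES ('not a number')")
    check_rejected = False
except sqlite3.IntegrityError:
    check_rejected = True

# STRICT tables are only available in newer SQLite versions (assumed 3.37.0+),
# so this part is skipped on older libraries and strict_rejected stays None.
strict_rejected = None
if sqlite3.sqlite_version_info >= (3, 37, 0):
    conn.execute("CREATE TABLE strict_t (n INTEGER) STRICT")
    try:
        conn.execute("INSERT INTO strict_t VALUES ('not a number')")
        strict_rejected = False
    except sqlite3.Error:
        strict_rejected = True

print(loose_type, check_rejected, strict_rejected)
```

Note that the CHECK version as written also rejects NULL, since typeof(NULL) is 'null'; a real schema would probably want to allow that explicitly.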

degamad · 2025-09-08T02:00:28 1757296828

>> It was designed to be a DB for Tcl at a time when that language didn't have typed objects. Its SQL implementation reflects that.

> It actually does have typed values

Now. As the article points out, they were not part of the initial design, because of the Tcl heritage.

skissane · 2025-09-08T05:58:47 1757311127

AFAIK it has always had typed values. Don’t confuse column types (which constrain a column to containing only values of a specified type) with value types (which enable it to treat the string “12” and the integer 12 and the floating point 12.0 as three distinct values)

Tcl has value types. Tcl 7.x and earlier only had one data type, the string, so adding two integers required two string-to-int conversions followed by an int-to-string conversion. In 1997, Tcl 8.x was released, which internally has distinct value types (int, string, etc), although it retains the outward appearance of “everything-is-a-string” for backward compatibility. So SQLite’s Tcl heritage included distinguishing different types of values, as is done in post-1997 Tcl.
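The value-type versus column-type distinction can be seen with typeof(). This is a small sketch using Python's sqlite3 module; the table t with an untyped column v is a made-up example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Literals carry their own type; no column is involved here.
literal_types = conn.execute(
    "SELECT typeof('12'), typeof(12), typeof(12.0)"
).fetchone()

# A column declared without any type stores each value as it was given.
conn.execute("CREATE TABLE t (v)")
conn.executemany("INSERT INTO t VALUES (?)", [("12",), (12,), (12.0,)])
stored_types = [
    row[0] for row in conn.execute("SELECT typeof(v) FROM t ORDER BY rowid")
]

print(literal_types)
print(stored_types)
```

Both queries report text, integer and real in that order: the type travels with each value rather than with the column.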

codesnik · 2025-09-07T19:11:43 1757272303

I've never used it, but perl contains support for Berkeley DB in stdlib since forever. But sqlite maps to perl just fine.

miohtama · 2025-09-07T19:26:22 1757273182

ZODB https://zodb.org/en/latest/

Jabbles · 2025-09-07T15:31:56 1757259116

> poorly designed storage layer, poorly designed column formats, and a terrible SQL implementation

Is this opinion shared by others?

chasil · 2025-09-07T15:54:41 1757260481

Dr. Hipp has said several times that nobody expected a weakly-typed database to achieve the pervasiveness that is observed with SQLite.

At the same time, strict tables address some of the concern of those coming from conventional databases.

Dates and times are a core problem to SQLite not seen elsewhere as far as I know, but this does evade UTC and constantly shifting regional time. My OS gets timezone updates every few months, and avoiding that had foresight.

Default conformance with Postel's Law is SQLite's stance, and it does seem to work with the ANSI standard.

SQLite · 2025-09-07T19:05:49 1757271949

> Dr. Hipp has said several times that nobody expected a weakly-typed database to achieve the pervasiveness that is observed with SQLite.

I don't remember ever saying that. Rather, see https://sqlite.org/flextypegood.html for detailed explanation of why I think flexible typing ("weak typing" is a pejorative and inaccurate label) is a useful and innovative feature, not a limitation or a bug. I am surprised at how successful SQLite has become, but if anything, the flexible typing system is a partial explanation for that success, not a cause of puzzlement.

chasil · 2025-09-07T19:51:57 1757274717

Did I misinterpret the experts' assertion of impossibility?

"I had this crazy idea that I’m going to build a database engine that does not have a server, that talks directly to disk, and ignores the data types, and if you asked any of the experts of the day, they would say, “That’s impossible. That will never work. That’s a stupid idea.” Fortunately, I didn’t know any experts and so I did it anyway, so this sort of thing happens. I think, maybe, just don’t listen to the experts too much and do what makes sense. Solve your problem."

https://corecursive.com/066-sqlite-with-richard-hipp/

jmull · 2025-09-07T21:06:48 1757279208

> Did I misinterpret the experts' assertion of impossibility?

Misstated, I'd say. You said "nobody" but the actual quote is about the assumed conventional wisdom of the time, which is quite different. And while this was probably inadvertent, you phrased it in a way that almost made it sound like that was Dr. Hipp's original opinion, which, of course, is the opposite of true.

alberth · 2025-09-07T16:55:29 1757264129

While nobody expected it … it should not be unexpected.

Typically, the Lowest-Common-Denominator wins mass appeal/usage.

By not having safety checks or even type enforcement, SQLite actually caters to more use cases, not fewer.

chrisweekly · 2025-09-07T19:14:01 1757272441

I often forget or mix up which "Law" refers to which observation, and I'm surely not the only one. So:

Postel's Law, also known as the Robustness Principle, is a guideline in software design that states: "be conservative in what you send, be liberal in what you accept."

formerly_proven · 2025-09-07T18:07:07 1757268427

SQLite probably doesn't do anything with times and dates except punting some functions to the limited libc facilities because including any proper date-time facilities would basically double the footprint of SQLite. Same for encodings and collations.

62 · 2025-09-08T05:23:23 1757309003

Same for encodings and collations.

da_chicken · 2025-09-07T16:26:18 1757262378

I think it's one of the reasons DuckDB has seen the popularity that it has.

benjiro · 2025-09-07T19:49:42 1757274582

DuckDB is a columnar database, and columnar DBs are way better for analytics, statistics... That is its main reason for its popularity, the ability to run specific workloads that row based databases will struggle/be slower at.

Nothing to do with the poster's badly formulated complaint about SQLite. By that metric DuckDB has a ton of issues that even outscale SQLite's.

qaq · 2025-09-07T19:50:30 1757274630

That's a strange argument: DuckDB is for OLAP and SQLite is for OLTP.

da_chicken · 2025-09-08T05:00:12 1757307612

Yeah, but most applications are small. So, at the scale of most applications you can drop in DuckDB with zero change in actual performance. It still has indexes to support highly selective queries because it needs to have functional primary keys.

crazygringo · 2025-09-07T23:31:51 1757287911

> outweighs a poorly designed storage layer, poorly designed column formats, and a terrible SQL implementation

You're going to have to expand on that, because I have no idea what you're talking about, nor does anyone else here seem to.

It's a relational database meant primarily for a single user. It's SQL. It works as expected. It's performant. It's astonishingly reliable.

The only obviously questionable design decision I'm aware of is for columns to be able to mix types, but that's more "differently designed" rather than "poorly designed", and it's actually fantastic for automatically saving space on small integers. And maybe the fact ALTER TABLE is limited, but there are workarounds and it's not like you'll be doing that much in production anyways.

What are your specific problems with it?

pluto_modadic · 2025-09-08T06:08:48 1757311728

I think they do a good job with test coverage, compatibility, and sustainable support. Can't say that about most every other hype database made by a fortune 500 and shut down 3 years later.

christophilus · 2025-09-07T16:56:38 1757264198

Firebird also fits the bill, I think, but never took off. Firebird even supports client-server deployments.

AlexClickHouse · 2025-09-07T20:09:16 1757275756

Exactly as in MS Access, Interbase/Firebird, and dBase II.

mockingloris · 2025-09-07T12:45:22 1757249122

> From the official SQLite Database File Format page.

The maximum size database would be 4294967294 pages at 65536 bytes per page or 281,474,976,579,584 bytes (about 281 terabytes).

Usually SQLite will hit the maximum file size limit of the underlying filesystem or disk hardware long before it hits its own internal size limit.

yread · 2025-09-07T16:51:43 1757263903

The kioxia lc9 is sold with capacities up to 245TB, so we are like 1 year max away from having a single disk with more than 281TB

saghm · 2025-09-07T13:00:52 1757250052

"Usually"? I'm not saying there are literally no computers in existence that might have this much space on a single filesystem, but...has there ever been a known case of someone hitting this limit with a single SQLite file?

wongarsu · 2025-09-07T14:39:12 1757255952

That's just 10 30TB HDDs. Throw in two more for redundancy and mount them in a single zfs raidz2 (a fancy RAID6). At about $600 per drive that's just $7200. Half that if you go with 28TB refurbished drives (throw in another drive to make up for lost capacity). That is in the realm of lots of people's hobby projects (mostly people who end up on /r/datahoarder). If you aren't into home-built NAS hardware you can even do this with stock Synology or QNAP devices

The limit is more about how much data you want to keep in sqlite before switching to a "proper" DBMS.

Also the limit above is for someone with the foresight that their database will be huge. In practice most sqlite files use the default page size of 4096, or 1024 if you created the file before the 2016 version. That limits your file to 17.6TB or 4.4TB respectively.
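The size figures in this subthread can be reproduced with a few lines of arithmetic. The page count below is the one quoted from the file format page; the function name max_db_bytes is just for illustration.

```python
# Maximum page count as quoted from the SQLite file format page.
MAX_PAGE_COUNT = 4294967294


def max_db_bytes(page_size: int) -> int:
    """Largest possible database file for a given page size, in bytes."""
    return MAX_PAGE_COUNT * page_size


# 65536 is the largest page size; 4096 and 1024 are the defaults mentioned above.
for page_size in (65536, 4096, 1024):
    size = max_db_bytes(page_size)
    print(f"{page_size:>5}-byte pages: {size:,} bytes (~{size / 1e12:.1f} TB)")
```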

mastax · 2025-09-07T16:51:47 1757263907

Last week I threw together a 840TB system to do a data migration. $1500 used 36-bay 4U, 36 refurbished Exos X28 drives, 3x12 RAIDz2. $15000 all in.

hiatus · 2025-09-08T02:21:03 1757298063

Where did you source the drives?

mjevans · 2025-09-07T14:02:17 1757253737

Never underestimate the ability of an organization to throw money at hardware and use things _far_ past their engineered scale as long as the performance is still good enough to not make critical infrastructure changes that, while necessary, might take real engineering.

Though to be fair to those organizations, it's amazing the performance someone can get out of a quarter million dollars of off the shelf server gear. Just imagine how much RAM and enterprise grade flash that can get someone off of AMD or Intel's highest bin CPU even at that budget!

dahart · 2025-09-07T14:02:57 1757253777

Poking around for only a minute, the largest SQLite file I could find is 600GB https://www.reddit.com/r/learnpython/comments/1j8wt4l/workin...

The largest filesystems I could find are ~1EB and 700PB at Oak Ridge.

FWIW, I took the ‘usually’ to mean usually the theoretical file size limit on a machine is smaller than theoretical SQLite limit. It doesn’t necessarily imply that anyone’s hit the limit.

mockingloris · 2025-09-07T13:13:33 1757250813

Wondered the same thing. That's a lot of data for just one file!

Did a full-day deep dive into SQLite a while back; funny how one tiny database runs the whole world—phones, AI, your fridge, your face... and like, five people keep it alive.

Blows my mind.

dmd · 2025-09-07T15:05:05 1757257505

> I'm not saying there are literally no computers in existence that might have this much space on a single filesystem

I don't use it for sqlite, but having multi-petabyte filesystems, in 2025, is not rare.

webstrand · 2025-09-07T13:41:16 1757252476

With block level compression you might manage it. But you'd have to be trying for it specifically.

formerly_proven · 2025-09-07T18:08:45 1757268525

Seen bigger files on HPC systems. Granted, these were not generated intentionally. But still, they were.

adzm · 2025-09-07T12:43:54 1757249034

I certainly do appreciate that the file format internals are so well documented here. It really reveals a lot of information about the inner workings of sqlite itself. I highly recommend reading it; I actually saved a copy for a rainy day sometime and it was very insightful and absolutely influenced some design decisions using sqlite in the future.

chasil · 2025-09-07T15:14:21 1757258061

The format itself is a U.S. federal standard, and cannot be changed. That has advantages and drawbacks.

https://www.sqlite.org/locrsf.html

justin66 · 2025-09-07T15:43:08 1757259788

I assume the SQLite team could increment the version to 4 if they really needed to, and leave the LOC to update (or not) their recommendation, which specifies version 3.

chasil · 2025-09-07T16:05:37 1757261137

Very true.

However, a significant fraction of the current installed base would not upgrade, requiring new feature development for both versions.

The test harness would also need implementations for both versions.

Then the DO-178B status would need maintenance for both.

That introduces significant complexity.

johannes1234321 · 2025-09-07T18:03:29 1757268209

Compared to the amount of SQLite database files in the world only few are shared between different applications. If there is an upgrade path most won't notice. The bigger issue imo is API and SQL dialect compatibility.

lisper · 2025-09-07T14:43:44 1757256224

> The database page size in bytes. Must be a power of two between 512 and 32768 inclusive, or the value 1 representing a page size of 65536.

What an odd design choice. Why not just have the value be the base 2 logarithm of the page size, i.e. a value between 9 and 16?

SQLite · 2025-09-07T19:26:00 1757273160

> Why not just have the value be the base 2 logarithm of the page size, i.e. a value between 9 and 16?

Yes, that would have been a better choice. Originally, the file format only supported page sizes between 512 and 32768, though, and so it just seemed natural to stuff the actual number into a 2-byte integer. The 65536 page size capability was added years later (at the request of a client) and so I had to implement the 65536 page size in a backwards compatible way. The design is not ideal for human readability, but there are no performance issues nor unreasonable code complications.

The page size value is not the only oddity. There are other details in the file format that could have been done better. But with trillions of databases in circulation, it seems best to leave these minor quirks as they are rather than to try to create a new, more perfect, but also incompatible format.
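For anyone curious what that quirk looks like in practice, here is a rough sketch that decodes the page size from a database header in Python. It assumes the field is a 2-byte big-endian integer at offset 16 of the 100-byte header, as the file format page describes; decode_page_size and the demo.db file name are made up for this example.

```python
import os
import sqlite3
import struct
import tempfile


def decode_page_size(header: bytes) -> int:
    """Decode the page size field, treating the stored value 1 as 65536."""
    (raw,) = struct.unpack(">H", header[16:18])  # 2-byte big-endian at offset 16
    return 65536 if raw == 1 else raw


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.db")
    conn = sqlite3.connect(path)
    # The page size has to be chosen before the database is first written.
    conn.execute("PRAGMA page_size = 65536")
    conn.execute("CREATE TABLE t (x)")
    conn.commit()
    reported = conn.execute("PRAGMA page_size").fetchone()[0]
    conn.close()
    with open(path, "rb") as f:
        header = f.read(100)  # the database header occupies the first 100 bytes

page_size = decode_page_size(header)
print(header[:16], page_size, reported)
```

The decoded value should match what PRAGMA page_size reports for the same file, whichever page size the library actually used.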

Retr0id · 2025-09-07T16:27:04 1757262424

There exists hardware with non-power-of-two disk sector sizes. Although sqlite's implementation requires powers-of-two today, a future implementation could conceivably not. Representing 64k was presumably an afterthought.

https://eki.moe/posts/using-520-byte-sector-disks/

kevincox · 2025-09-07T15:33:40 1757259220

If I had to guess this field was specified before page sizes of 65536 were supported. And at that point using the value 1 for page sizes of 65536 made the most sense.

kayson · 2025-09-07T14:50:31 1757256631

Any recommendations from HN for a write-once (literally once), data storage format that's suitable for network storage?

sqlite docs recommend avoiding using it on network storage, though from what I can gather, it's less of an issue if you're truly only doing reads (meaning I could create it locally and then copy it to network storage). Apache Parquet seems promising, and it seems to support indexing now which is an important requirement.

mcculley · 2025-09-07T15:22:33 1757258553

SQLite works fine over read-only NFS, in my experience. Just only work on an immutable copy and restart your application if ever changing it. If your application is short lived and can only ever see an immutable copy on the path, then it is a great solution.
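A sketch of the "build locally, then only ever read an immutable copy" pattern in Python. It uses SQLite's URI filename parameters mode=ro and immutable=1; a temporary directory stands in for the network mount here, and the kv table and snapshot.db name are invented for the example.

```python
import os
import pathlib
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "snapshot.db")

    # Build the database locally, as a writer would before publishing it.
    writer = sqlite3.connect(path)
    writer.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")
    writer.execute("INSERT INTO kv VALUES ('greeting', 'hello')")
    writer.commit()
    writer.close()

    # Open read-only. immutable=1 promises SQLite that the file cannot change,
    # so it can skip locking and change detection on the file.
    uri = pathlib.Path(path).as_uri() + "?mode=ro&immutable=1"
    reader = sqlite3.connect(uri, uri=True)
    value = reader.execute("SELECT v FROM kv WHERE k = 'greeting'").fetchone()[0]

    # Writes through this connection are refused.
    try:
        reader.execute("INSERT INTO kv VALUES ('x', 'y')")
        write_blocked = False
    except sqlite3.OperationalError:
        write_blocked = True
    reader.close()

print(value, write_blocked)
```

The immutable flag is only safe if the promise holds: if something does modify the file while it is open this way, queries may return wrong results, which is why restarting the application on any change matters.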

usefulcat · 2025-09-08T04:48:56 1757306936

Ordinary files inside squashfs?

https://www.kernel.org/doc/html/latest/filesystems/squashfs....

nasretdinov · 2025-09-07T16:31:02 1757262662

SQLite does work on NFS even in read-write scenario. Discovered by accident, but my statement still holds. The WAL mode is explicitly not supported over network filesystems, but I guess you don't expect it to :)

kayson · 2025-09-07T17:06:26 1757264786

My experience has been the opposite... Lots of db lock and corruption issues. The FAQ doesn't call out WAL specifically, just says don't do it at all: https://www.sqlite.org/faq.html#q5

kirici · 2025-09-07T18:39:08 1757270348

I've had multiple flaky issues with SQLite (e.g. non-HA Grafana) on Azure Files using NFS v4.1 leading to locked DBs. Perhaps some implementations work, I'm not gonna rely on it or advise others to do so.

nasretdinov · 2025-09-07T18:57:27 1757271447

Yeah trying to write from several hosts will certainly fail if you don't have advisory locks working, which is not a given, so you are right of course

simlevesque · 2025-09-07T17:25:22 1757265922

Parquet files are what I use.

heavyset_go · 2025-09-08T09:34:18 1757324058

SQLite over NFS works if you have one writer and many readers.

pstuart · 2025-09-07T18:01:47 1757268107

Multiple writers on network storage is the issue. Reading should be totally fine.

relium · 2025-09-07T23:11:10 1757286670

The one issue I have with SQLite's file format is that if part of the file gets corrupted, you can't easily recover the rest of the file. I asked Richard Hipp about this many years ago and he said that fixing the problem would unfortunately break binary compatibility.

rkagerer · 2025-09-07T23:51:31 1757289091

The fact this fits in a few pages and is so approachable is a testament to its simplicity. I think I'd find it a lot harder to grok the file format of, for example, a Word doc/docx file.

mdaniel · 2025-09-08T01:12:35 1757293955

I wouldn't put .doc and .docx next to one another, as they're only tangentially related. I'd bet getting the <html><body><p>hello, world</p></body></html> of .docx would be some silliness, but would not be hard to grok. I couldn't readily find a browsable copy of ECMA 376 4th Ed online but https://github.com/PumasAI/WriteDocx.jl/blob/v1.2.0/docs/src... was in the ballpark of what I expected to find in some section of the actual spec

mrtimo · 2025-09-07T20:38:37 1757277517

It’s 2025. Let’s separate storage from processing. SQLite showed how elegant embedded databases can be, but the real win is formats like Parquet: boring, durable storage you can read with any engine. Storage stays simple, compute stays swappable. That’s the future.

re · 2025-09-07T20:48:13 1757278093

Counterpoint: "The two versions of Parquet" https://news.ycombinator.com/item?id=44970769 (17 days ago, 50 comments)

codedokode · 2025-09-07T21:39:07 1757281147

As I understood by reading the short description, Parquet is a column-oriented format which is made for selecting data and which is difficult to use for updating (like Yandex Clickhouse).

cyanydeez · 2025-09-07T13:30:10 1757251810

The neatest thing I've seen is that you can put a SQLite db on an HTTP server and read it effectively using range requests.
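The core of the trick is that a page number maps directly to a byte range in the file. A minimal sketch in Python, assuming pages are numbered from 1 and stored back to back; the helper names are hypothetical and this is not taken from any particular implementation.

```python
def page_byte_range(page_number: int, page_size: int) -> tuple[int, int]:
    """Inclusive (start, end) byte offsets of a 1-indexed page in the file."""
    if page_number < 1:
        raise ValueError("pages are numbered from 1")
    start = (page_number - 1) * page_size
    return start, start + page_size - 1


def range_header(page_number: int, page_size: int) -> dict[str, str]:
    """HTTP Range header that fetches exactly one database page."""
    start, end = page_byte_range(page_number, page_size)
    return {"Range": f"bytes={start}-{end}"}


print(range_header(1, 4096))
print(range_header(3, 4096))
```

A real client would likely coalesce adjacent pages into one request and cache what it has already fetched.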

johannes1234321 · 2025-09-07T18:12:53 1757268773

There are implementations for that: For example https://github.com/psanford/sqlite3vfshttp or https://github.com/phiresky/sql.js-httpvfs

ncruces · 2025-09-07T23:11:26 1757286686

The latency on those requests matters, though.

You'll probably benefit from using the largest possible page size; also, keep alive; etc.

But even then, you'll pull at most 64 KiB per request. If you managed to have response times of 10 ms, you'd be pulling at most 52 Mbps.

So yeah, if your queries end up reading just a couple of pages, it's great. If they require a full table scan, you need some smart prefetching+caching to hide the latency.
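The back-of-the-envelope numbers above work out as follows. This sketch assumes one page per request with no pipelining or prefetching; the function names and the 1 GiB example file are illustrative.

```python
import math


def max_throughput_mbps(bytes_per_request: int, latency_s: float) -> float:
    """Upper bound on throughput when requests are issued one at a time."""
    return bytes_per_request * 8 / latency_s / 1e6


def sequential_scan_seconds(file_bytes: int, page_size: int, latency_s: float) -> float:
    """Time to read a whole file page by page, one request per page."""
    return math.ceil(file_bytes / page_size) * latency_s


page = 64 * 1024  # largest page size, 64 KiB
print(max_throughput_mbps(page, 0.010))             # roughly 52 Mbps
print(sequential_scan_seconds(2**30, page, 0.010))  # a 1 GiB full scan, in seconds
```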

simlevesque · 2025-09-07T17:27:41 1757266061

In my experience, this works when the db is read only.

And in these read only cases I'd use Parquet files queried with Duckdb Wasm.

pmarreck · 2025-09-07T14:38:00 1757255880

so basically using the http server as a randomly-accessed data store? sounds about right

SchwKatze · 2025-09-07T14:00:41 1757253641

Sometimes I ask myself whether we could do a better file format, something like Parquet but row-oriented.

Dwedit · 2025-09-07T16:47:12 1757263632

My only question is if you really need a prefix before every value to say what type it is.

lawrencejgd · 2025-09-07T20:32:33 1757277153

Any field in SQLite can contain any type. Even if the schema says that a field should be INTEGER, it could hold TEXT, so it's necessary to record the type of every single value.

maxbond · 2025-09-08T16:37:48 1757349468

Indeed, unless it's a strict table, you can put gibberish in the type field (or forego giving a column a type altogether).
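For reference, the per-value prefix is the "serial type" code from the record format, and it also encodes the stored size, which is how small integers end up taking less space. The sketch below reflects my reading of the file format page; serial_type_info is a made-up helper name.

```python
def serial_type_info(serial_type: int) -> tuple[str, int]:
    """Map a record serial type code to (kind, payload size in bytes)."""
    fixed = {
        0: ("null", 0),
        1: ("integer", 1),
        2: ("integer", 2),
        3: ("integer", 3),
        4: ("integer", 4),
        5: ("integer", 6),
        6: ("integer", 8),
        7: ("float", 8),
        8: ("integer constant 0", 0),
        9: ("integer constant 1", 0),
    }
    if serial_type in fixed:
        return fixed[serial_type]
    if serial_type < 0 or serial_type in (10, 11):
        raise ValueError(f"serial type {serial_type} is invalid or reserved")
    if serial_type % 2 == 0:
        return ("blob", (serial_type - 12) // 2)  # even codes >= 12 are blobs
    return ("text", (serial_type - 13) // 2)      # odd codes >= 13 are text


print(serial_type_info(1))   # a small integer stored in a single byte
print(serial_type_info(23))  # a 5-byte text value
```

So the prefix is not pure overhead: it carries the length for text and blobs, and lets the values 0 and 1 be stored with no payload at all.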

porridgeraisin · 2025-09-07T13:23:32 1757251412

Related: https://sqlite-internal.pages.dev/

Discussions: https://news.ycombinator.com/item?id=43682006 | 5 months ago | 41 comments