Orama is definitely a hidden gem, and it's a clever usage for complementary indexing!
Also agreed that Triplit's DX is excellent. I'd recommend giving it another look; Triplit's recent 1.0 release brings up to a 10x performance boost (https://www.triplit.dev/blog/triplit-1.0).
Since your use-case is data in the range of gigabytes, you could consider using duckdb-wasm. However I'm not sure how to best integrate this with collaboration / CRDTs (sqlRooms is also interesting prior art).
But does Replicache work for your native targets? Or are you okay with a different data layer for native (SQLite) vs. web (a boutique data model on top of IndexedDB)? At the start of the article it sounds like the goal is a single abstraction across web and native mobile, and that solutions which bifurcate the implementation are unacceptable; but then we end up preferring a solution that differs between the web and native targets.
Zero (and I believe Replicache as well) layer their own SQL-like semantics on top of an arbitrary KV store, much like the layering of SQLite-over-IndexedDB discussed; like SQLite-over-IndexedDB, I believe they are storing binary byte pages in the underlying KV store and each page contains data for one-or-more Replicache/Zero records. The big difference between SQLite-over-IndexedDB and Zero-over-IndexedDB is that Zero is written with sympathy to IndexedDB's performance characteristics, whereas SQLite is written with sympathy to conventional filesystem performance.
On the subject of "keep the whole thing in memory": this is what Zero does for its instant performance, and why they suggest limiting your working set (the data desired at app boot) to ~40MB, although I can't find a reference for this. Zero is smart, though, and will pick the 40MB for you. Hopefully the Zero folks come by and correct me if I'm wrong.
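To make the layering concrete, here is a toy sketch of records aggregated into JSON pages in a KV store, with a plain Map standing in for IndexedDB. The page-key scheme and page count are invented for illustration; real engines size pages in bytes and use very different layouts.

```typescript
// Hypothetical illustration: many small records are grouped into JSON
// "pages", and only whole pages are written to the underlying KV store.
type Row = { id: string; data: unknown };

const NUM_PAGES = 3; // toy fixed page count; real engines split by byte size

class PagedStore {
  // A Map stands in for IndexedDB / any KV store.
  private kv = new Map<string, string>();

  private pageKey(id: string): string {
    // Toy scheme: hash the record id onto a fixed set of pages.
    let h = 0;
    for (const c of id) h = (h * 31 + c.charCodeAt(0)) | 0;
    return `page-${Math.abs(h) % NUM_PAGES}`;
  }

  put(rec: Row): void {
    const key = this.pageKey(rec.id);
    const page: Row[] = JSON.parse(this.kv.get(key) ?? "[]");
    const next = page.filter((r) => r.id !== rec.id).concat(rec);
    this.kv.set(key, JSON.stringify(next)); // one KV write covers many records
  }

  get(id: string): unknown | undefined {
    const page: Row[] = JSON.parse(this.kv.get(this.pageKey(id)) ?? "[]");
    return page.find((r) => r.id === id)?.data;
  }
}
```

The point of the aggregation is that one round trip to the slow KV layer amortizes over many logical records, which is exactly the sympathy-to-IndexedDB design mentioned above.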
> Zero (and I believe Replicache as well) layer their own SQL-like semantics on top of an arbitrary KV store, much like the layering of SQLite-over-IndexedDB discussed
Replicache exposes only a kv interface. Zero does expose a SQL-like interface.
> I believe they are storing binary byte pages in the underlying KV store and each page contains data for one-or-more Replicache/Zero records.
The pages are JSON values, not binary-encoded, but that's an impl detail. At the big-picture level, you're right that both Replicache and Zero aggregate many values into pages that are stored in IDB (or SQLite in React Native).
> On the subject of "keep the whole thing in memory": this is what Zero does for its instant performance, and why they suggest limiting your working set (the data desired at app boot) to ~40MB, although I can't find a reference for this. Zero is smart, though, and will pick the 40MB for you. Hopefully the Zero folks come by and correct me if I'm wrong.
Replicache and Zero are a bit different here. Replicache keeps only up to 64MB in memory. It uses an LRU cache to manage this. The rest is paged in and out of IDB.
This ended up being a really big perf cliff because bigger applications would thrash against this limit.
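The thrashing failure mode is easy to picture with a byte-budgeted LRU. This is a sketch of the idea only, not Replicache's actual implementation; the budget number and eviction policy are illustrative.

```typescript
// Sketch of a byte-budgeted LRU: an entry evicted here would have to be
// re-read from IDB on its next access, which is the "thrash" once the
// working set exceeds the budget.
class ByteLRU {
  private entries = new Map<string, Uint8Array>(); // Map preserves insertion order
  private used = 0;
  constructor(private budget: number) {}

  set(key: string, value: Uint8Array): void {
    const old = this.entries.get(key);
    if (old) {
      this.used -= old.byteLength;
      this.entries.delete(key);
    }
    this.entries.set(key, value);
    this.used += value.byteLength;
    // Evict least-recently-used entries until we fit the budget again.
    for (const [k, v] of this.entries) {
      if (this.used <= this.budget) break;
      this.entries.delete(k);
      this.used -= v.byteLength;
    }
  }

  get(key: string): Uint8Array | undefined {
    const v = this.entries.get(key);
    if (v) {
      // Refresh recency by re-inserting at the back of the Map.
      this.entries.delete(key);
      this.entries.set(key, v);
    }
    return v;
  }
}
```

When an app's hot data fits the budget this is invisible; when it doesn't, every cache miss becomes a round trip to storage, which is the perf cliff described above.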
In Zero, we just keep the entire client datastore in memory. Basically we use IDB/SQLite as a backup/restore target. We don't page in and out of it.
This might sound worse, but the difference is Zero's query-driven sync. Queries automatically fall back to the server and sync. So the whole model is different. You don't sync everything, you just sync what you need. From some upcoming docs:
I really like Zero’s approach: it feels very much like Triplit, including many of its features like query-based smart caching. However, what holds me back from using it is that, unlike Triplit, Zero currently lacks support for offline modifications, which seems like a major obstacle for a truly local‑first library.
Yes, Replicache works beautifully on our mobile/native targets.
The constructor allows you to pass in any arbitrary KVStore provider, and we happen to use op-sqlite as its performance is exceptional.
There is no "different data layer" per se, just a different storage mechanism.
Replicache also holds a mem cache that is limited to ~50MB if I recall. Our use case is extremely data-heavy, so we might end up never migrating to Zero – who knows.
Perhaps I misunderstood your question, let me know if I can clarify further.
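For illustration, the pluggable-store arrangement could look like the sketch below. The interface and names here are hypothetical, not Replicache's actual API, and an in-memory Map stands in for the op-sqlite backend.

```typescript
// Hypothetical shape of a pluggable KV store: the sync layer talks only
// to this interface, and each platform supplies its own backend
// (IndexedDB on web, a SQLite binding like op-sqlite on React Native).
interface KVStore {
  get(key: string): Promise<string | undefined>;
  put(key: string, value: string): Promise<void>;
  del(key: string): Promise<void>;
}

// In-memory stand-in; a native build would instead wrap SQLite
// statements like `INSERT OR REPLACE INTO kv(key, value) VALUES (?, ?)`.
class MemKVStore implements KVStore {
  private m = new Map<string, string>();
  async get(key: string) { return this.m.get(key); }
  async put(key: string, value: string) { this.m.set(key, value); }
  async del(key: string) { this.m.delete(key); }
}

// Application code above the interface is identical on every platform;
// only the store passed in at construction time changes.
async function saveProfile(store: KVStore, name: string): Promise<string> {
  await store.put("profile/name", name);
  return (await store.get("profile/name"))!;
}
```

That is the sense in which there's no "different data layer", just a different storage mechanism plugged in underneath.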
Ah, I understood "native application in some targets" to mean you're writing application code in languages other than JavaScript/TypeScript; not that sometimes you're React Native and sometimes you're Web/DOM but you're always TypeScript.
Notion always* has a webview component, even in native apps, but we also have a substantial amount of "true native" Swift/Kotlin. We can't use Replicache/Zero today because our native code and our webview share the SQLite database and both need to be able to read and write the data there; if we use Replicache that would make our persisted data opaque bytes to Swift/Kotlin.
*There's many screens of the Android/iOS app that are entirely native but the editor will probably remain a webview for a while yet.
The punchline of this article is that all the implementations they tried (WatermelonDB, PowerSync, ElectricSQL, Triplit, InstantDB, Convex) are built on top of IndexedDB.
"The root cause is that all of these offline-first tools for web are essentially hacks. PowerSync itself is WASM SQLite... On top of IndexedDB."
But there's a new web storage API in town, Origin Private File System. https://developer.mozilla.org/en-US/docs/Web/API/File_System... "It provides access to a special kind of file that is highly optimized for performance and offers in-place write access to its content."
OPFS reached Baseline "Newly Available" in March 2023; it will be "Widely Available" in September.
WASM sqlite on OPFS is, finally, not a hack, and is pretty much exactly what the author needed in the first place.
We do see about 10x the database row corruption rate w/ WASM OPFS SQLite compared to the same logic running against native SQLite. For read-side cache use-case this is recoverable and relatively benign but we're not moving write-side use-case from IndexedDB to WASM-OPFS-SQLite until things look a bit better. Not to put the blame on SQLite here, there's shared responsibility for the corruption between the host application (eg Notion), the SQLite OPFS VFS authors, the user-agent authors, and the user's device to ensure proper locking and file semantics.
Yeah, I did fail to mention OPFS in the blog post. It does look very promising, but we're not in a position to build on emergent tech – we need a battle-tested stack. Boring over exciting.
Not sure anything in the offline-first ecosystem qualifies as "boring" yet. You would need some high-profile successful examples that have been around for a few years to earn that title
Maintenance mode doesn't mean "this is so mature we don't have anything else to add", it means "we don't want to spend any more time on it so we'll only fix bugs and that's it".
Some notable companies using Replicache are Vercel and Productlane. It's a very mature product.
The Rocicorp team have decided to focus on a different product, Zero, which is far less "offline-first" in that it does not sync all data, but rather syncs data based on queries. This works well for applications that have unbounded amounts of data (ie something like Instagram), but is _not_ what we want or need at Marco.
The majority of the cost in a database is often serializing/deserializing data. By using IDB from JS, we delegate that to the browser's highly optimized native code. The data goes from JS vals to binary serialization in one hop.
If we were to use OPFS, we would instead have to do that marshaling ourselves in JS. JS is much slower than native code, so the resulting impl would probably be a lot slower.
We could attempt to move that code into Rust/C++ via WASM, but then we have a different problem: we have to marshal between JS types and native types first, before writing to OPFS. So there are now two hops: JS -> C++ -> OPFS.
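The two hops can be made concrete with plain stdlib calls. This is a toy illustration, not Replicache code; a growable buffer stands in for the OPFS file handle.

```typescript
// Toy illustration of the extra hop: a structured JS value must first be
// marshaled to bytes in JS (hop 1) before a byte-oriented sink such as
// an OPFS file or WASM linear memory can accept it (hop 2).
const record = { id: 42, title: "hello", tags: ["a", "b"] };

// Hop 1: JS value -> UTF-8 bytes, paid for in JS.
const bytes: Uint8Array = new TextEncoder().encode(JSON.stringify(record));

// Hop 2: bytes -> byte sink. A growable buffer stands in for OPFS here.
class ByteSink {
  private chunks: Uint8Array[] = [];
  write(b: Uint8Array): number {
    this.chunks.push(b);
    return b.byteLength; // number of bytes written, like a file handle
  }
  size(): number {
    return this.chunks.reduce((n, c) => n + c.byteLength, 0);
  }
}

const sink = new ByteSink();
const written = sink.write(bytes);

// Reading back reverses both hops.
const roundTrip = JSON.parse(new TextDecoder().decode(bytes));
```

Storing the object directly in IDB skips hop 1 entirely, because the browser's native structured-clone serializer does it in one step.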
We have actually explored this in a previous version of Replicache and it was much slower. The marshalling between JS and WASM killed it. That's why Replicache has the design it does.
I don't personally think we can do this well until WASM and JS can share objects directly, without copies.
Triplit and Orama are definitely often overlooked hidden gems.
Since the post is already a few months old, it's worth mentioning that the newly released Triplit 1.0 shipped a massive performance update (up to 10x). You should definitely reconsider it for larger-scale data projects, and the team is highly knowledgeable. https://www.triplit.dev/blog/triplit-1.0
I’m doing offline-first apps at work and want to emphasize that you’re constraining yourself a lot trying to do this.
As mentioned, everything fast(ish) is using SQLite under the hood. If you don’t already know, SQLite has a limited set of types, and some funky defaults. How are you going to take this loosey-goosey typed data and store it in a backend database when you sync? What about foreign key constraints, etc., can you live without those? Some of the sync solutions don’t support enforcing them on the client.
Also, the SQLite query planner isn’t great in my experience, even when you’re only joining on ids/indexes.
Document databases seem more friendly/natural, but as mentioned, IndexedDB is slow.
I wish this looked at https://rxdb.info/ more. They have some posts that led me to believe they have a good grasp on the issues in this space, at least.
Also, OPFS is a newish thing everyone is using to store SQLite directly instead of wrapping IndexedDB for better performance.
Notion is a very async collaborative application and we rely on a form of transactions. When you make a change in Notion like moving a bunch of blocks from one page to another, we compose the transaction client-side given the client's in-memory snapshot view of the universe, and send the transaction to the server. If the transaction turns out to violate some server-side validation (like a permissions issue), we reject the change as a unit and roll back the client.
I'm not sure how we'd do this kind of thing with RxDB. If we model it as a delete in one document and an insert into another document, we'd get data loss. Maybe they'd tell us our app shouldn't have that feature.
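A sketch of the transaction shape described above: the client applies the whole move to its in-memory snapshot optimistically, and the server accepts or rejects it as a unit. Names and the validation hook are invented for illustration, not Notion's actual code.

```typescript
// Toy optimistic transaction: apply to local state immediately, then
// roll the whole thing back if server-side validation rejects it.
type Page = { id: string; blocks: string[] };
type Move = { block: string; from: string; to: string };

function applyMove(pages: Map<string, Page>, m: Move): void {
  const src = pages.get(m.from)!;
  const dst = pages.get(m.to)!;
  src.blocks = src.blocks.filter((b) => b !== m.block);
  dst.blocks.push(m.block);
}

function commitTransaction(
  pages: Map<string, Page>,
  moves: Move[],
  serverValidate: (moves: Move[]) => boolean, // e.g. a permissions check
): boolean {
  // Snapshot for rollback: deep-copy the state the transaction touches.
  const snapshot = new Map(
    [...pages].map(([k, p]) => [k, { ...p, blocks: [...p.blocks] }]),
  );
  for (const m of moves) applyMove(pages, m); // optimistic local apply
  if (serverValidate(moves)) return true;
  // Rejected as a unit: restore the snapshot, no partial state survives.
  pages.clear();
  for (const [k, p] of snapshot) pages.set(k, p);
  return false;
}
```

The key property is atomicity across documents: either every block lands on the destination page, or none do, which is exactly what a naive delete-then-insert over two independent documents can't guarantee.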
I am continually bewildered how no one ever gives RxDB, which has been around for many years longer than the rest of these tools, any love.
It has so many optimizations and features that the others don't, and it's even better when you use the premium add-ons. I compared it to pretty much everything, and it's not even close.
IndexedDB is a standard and can be implemented however the user-agent sees fit. Chromium source tree has an implementation on LevelDB and an implementation on SQLite; I'm not sure how they pick the appropriate backend. Firefox and WebKit both appear to use SQLite as the backend.
WebSQL was a clunky API, but not as clunky as IndexedDB which is truly yucky and very easy to get wrong in modern apps that use promises.
wa-sqlite on top of OPFS is actually pretty great these days. Performance is about half of what I'd get in native SQLite, which is not too bad overall. It's around 10x faster than SQLite on top of IndexedDB for large databases in my experience.
It's much better than WebSQL could ever be. You get the full power of modern SQLite, with the version, compile options, additional extensions, all under your control.
Nice post! I'm building an offline-first collaboration app and went down the route of building a custom sync engine, mainly because the app is open-source and I didn't want to introduce any dependency. I've implemented a simple cursor-based sync with Postgres on the server and SQLite on the client.
Initially I built only a desktop client, because I didn't like IndexedDB. After the app landed on HN, someone recommended checking out OPFS (Origin Private File System).
Now we have a full offline-first app in web using SQLite on top of OPFS. We didn't test it with large scale yet, but so far looks very promising. The good thing is that we use Kysely as an abstraction for performing queries in SQLite which helps us share most of the code across both platforms (electron + web) with some minor abstractions.
Depending on your data model, LiveStore is a completely open-source, SQLite based approach for local first sync-y apps: https://livestore.dev/
It's oriented around event sourcing and syncs the events, which get materialized into local table views on clients. It's got pretty slick devtools too.
I did look into it back then, but was not very convenient for my use case. Apart from the data model, I wanted to use Yjs for conflict resolution and wanted more direct control over the sync.
P.S. Just wanted to say thank you for all the contributions you make here on HN. Colanode (the app I'm building) is an alternative to Notion, and I learned a lot about how you (Notion) build things through reading your comments.
Triplit has migrated its data format away from triples since this post was authored, so the memory concerns mentioned are no longer relevant since 1.0 https://www.triplit.dev/blog/triplit-1.0 (source: I'm the Triplit author). I don't think triples as a format are inherently bad, but for us it did entail more data on disk, more difficult and slower querying, and more objects in the JS heap ballooning RAM.
I find InstantDB's page confusing: how far is it open-source and self-hostable? I don't mind you having a sustainable cash flow, but it all seems a bit unclear which parts are fully open-source and self-hostable.
I wrap that in FlatDB, which is an opinionated flat cache for the files with metadata inline, used for very fast searches (searching 150k messages in less than 20ms on my 4 year old phone). This handles a lot of tricky cases, like accidental cache modification, and editing the database in different tabs.
I struggled with this landscape a few years ago when building Mere Medical to manage my own medical records. To be fair, I was aiming for not just offline-first, but offline-only (user data was exclusively stored on device, not in any server). I got surprisingly far with RxDB, but it definitely felt like I was pushing these tools and the web platform to their limit.
There’s just an assumption that these client databases don’t need mature tools and migration strategies, as “it’s just a web client, you can always just re-sync with a server”. Few client DBs felt mature enough to warrant building my entire app on, as they’re not the easiest to migrate off of.
I also tried LokiJS which is mentioned in the OP. I even forked it (renamed it SylvieJS, lol) to rewrite it in TS and update some of the adapters. I ultimately moved away from it as well. I found an in-memory DB will struggle past a few hundred MBs, which I hit pretty quickly.
No matter what DB you use, you're realistically using IndexedDB under the hood. What surprised me was that a query to IndexedDB can be slower than a network call. Like, what.
On midrange and below Android devices, literally any local persisted data access can be slower than a network call. Even a point read from a small SQLite b-tree can be coming off a $3 microsd card and a CPU equivalent to a 10 year old iPhone. https://infrequently.org/2024/01/performance-inequality-gap-...
So, the final choice is Dexie + a custom backend? I've researched these open-source solutions before; I think Dexie is a choice you will never regret.
Related question to people building local-first - what size of db is too big? I always see examples doing todo lists etc which seems perfect for this. But what about apps with larger databases. When does local-first no longer make sense?
I wonder the same thing, especially thinking about local-first image storage.
Iirc there are different limits on IndexedDB sizes depending on the browser/platform, and the tighter limit is around 1GB. But I would love to hear from people that ran into those limits.
These topics and the mentioned tools always fail to demonstrate how the sync engine works in multiplayer modes. And mostly they only support backends written in TS/JS; worse, some only work with a third-party service (which you can't put your business logic in). The first thing I scan for in these tools' docs is how they handle write conflicts, where the conflict is at the semantic level, not the data-exchange-format level.
Or you could … just build it directly on IndexedDB. That's what we did for our offline support at Fastmail, with just a small wrapper function to make the API promise-based: https://www.fastmail.com/blog/offline-architecture/
The performance has been pretty decent, even with multi-gigabyte mailboxes.
The offline support has been great. I used to have to keep another mail app synced with my fastmail inbox over IMAP just in case I needed access to an email and had crappy connection. Now I can just have the one email icon on my homescreen.
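The wrapper Fastmail describes is essentially a promisification of IDBRequest's success/error callbacks. A minimal version might look like this; it's typed against a structural stand-in for IDBRequest so the same function can also be exercised outside the browser (the fake below is for demonstration only).

```typescript
// Minimal promise wrapper over the IDBRequest callback shape. In the
// browser you would pass a real IDBRequest; the structural type lets
// the same function run against a fake elsewhere.
interface RequestLike<T> {
  onsuccess: (() => void) | null;
  onerror: (() => void) | null;
  result?: T;
  error?: unknown;
}

function promisify<T>(req: RequestLike<T>): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    req.onsuccess = () => resolve(req.result as T);
    req.onerror = () => reject(req.error);
  });
}

// Fake request that fires asynchronously, like IndexedDB does.
function fakeRequest<T>(value: T): RequestLike<T> {
  const req: RequestLike<T> = { onsuccess: null, onerror: null };
  setTimeout(() => {
    req.result = value;
    req.onsuccess?.();
  }, 0);
  return req;
}
```

With a real database this lets you write `await promisify(store.get(key))` inside async functions instead of nesting callbacks, which removes most of what makes the raw API painful.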
Not sure there is a formal definition, but here's my current understanding:
In a local-first approach, changes are initially stored locally, but there's an expectation to eventually connect to a server backend to merge these changes, typically within days, weeks, or months. On the other hand, an offline-first approach may not even require a backend, functioning seamlessly regardless of internet connectivity.
These distinctions may blur as sync engines improve, allowing clients to remain offline for increasingly extended periods. Ultimately, the differentiating factor might hinge on whether there's a central authority that enforces migrations or changes.
Except in pre-sales, I can’t imagine anyone using an email heavy workflow in 2025.
In my personal life, email is only for one way transactions. Where some company is sending email to me or spam. Even the one newsletter I subscribe to - Stratechery is available as a podcast and an RSS feed.
In my professional life, of course all internal communication happens on Slack (700 employees) and even in consulting, the first thing we do after a deal closes is either invite customers to our Slack or ask to be invited to their platform.
What do you mean by this? I send 5-15 emails a day at a minimum throughout the day and receive just as many directly with another 2-3x as cc in various distribution lists (which I read in full). Add in server notifications, automated reports from data processing scripts, and the generic info@company.com inbox and it's probably close to 100 in a day with ease. Lots of skimming and Ctrl-Q'ing and it's hardly a burden.
The lasting power of email is that it's one of the few federated communication channels that has a global network effect. Email and chat are two different media for different purposes. You have plausible deniability when a single message in a group chat is missed. When an email is sent to the team with a change in procedure you can have some expectations that it will be seen and it also provides a one-one or one-many channel for clarification.
I'm not familiar with how the sales world works but I use email every day with clients, vendors, the team, my boss(es), and many other intra-company relationships. I think you have a lack of imagination in this regard :)
> various distribution lists (which I read in full). Add in server notifications, automated reports from data processing scripts,
And all of those can just as easily be sent to a Slack channel without everyone bothering to create email rules since they are automatically sent to the correct Slack channel where if it’s an actionable alert, a responsible party can add an “ack” reaction that kicks off a workflow that says this person is handling it.
This can also be integrated into your CRM or whatever you use, something like ServiceNow. We have all sorts of workflows and integrations with Slack.
> You have plausible deniability when a single message in a group chat is missed. When an email is sent to the team with a change in procedure you can have some expectations that it will be seen and it also provides a one-one or one-many channel for clarification
How are you any less likely to miss an email than miss a channel set aside for leadership announcements that only certain people can send a message to? Then you also have the “reply all” issue that I’ve seen blow up email servers. Messages allow threading etc. in Slack, and it’s a lot easier to ignore a thread that doesn’t pertain to you and follow those that do.
Everyone at our 1000-person company communicates through Slack, up to and including our CEO for announcements and updates.
I don’t think I’ve emailed someone internally in over 8 years except to forward an external email and during that time, I’ve worked for startups and the second largest employer in the US.
Ironically, the only use I have for Slack is to communicate with an external web developer that we contracted.
In my world... EVERYONE (50/50 internal and external) is on email and/or Teams (or the phone). It works. Shit gets done. It's a small, high-trust environment of autonomous people doing rapidly changing work.
There is a working world where email makes sense and trying to "make Slack a thing" would be (rightfully) scoffed at. If I'm yanked out of this and dropped into some Slack/ticket/KPI/whatever environment I will adapt and play ball.
In my experience, that's because corporate mail/groupware (especially anything by Microsoft) is configured so that it is completely non-productive, and so people seek out alternatives. Slack isn't as configurable, so can't be made as bad, but it's still pretty bad and builds in assumptions that make trying to use it asynchronously a major pain.
How much easier can it be to work asynchronously than to use Slack? For the most part people don’t expect an immediate response. You can sync it with your corporate calendar to see when people are in meetings or on PTO; it shows you other people’s time zones and their status. Most importantly, you can schedule messages.
There are times where I’m running errands during the day or traveling and working late and I schedule a message for 8:00 or 9:00 their time.
From a personal use standpoint, I’m looking at all of my credit card companies, hotel apps, flight apps etc and they all either have in app messaging or integrate with iMessage.
Now I’m seeing more companies that want to integrate with messaging platforms for customer support - one of my specialties is implementing call centers with Amazon Connect. I’ve never been asked in 5 years to integrate customer support with inbound email.
Yes, because as a business you shouldn’t have a sales pitch ready for people when they ask you why they should spend money on your product.
It’s the first thing you do if you are trying to make a product that you plan to monetize: ask who the ideal user is and how you will market it. If it’s just a hobby project to learn a new-to-you technology, or you just wanted to scratch an itch, that’s fine.
We've had great success with Replicache+Orama since this was written. We're keen to give Zero a spin once it's a bit more stable.
Triplit has essentially folded as a "company" and become some sort of open-source initiative instead.
InstantDB has matured massively and is definitely worth a look for anyone starting a new project.