Matryoshka embeddings are not sparse. And SPLADE can scale to tens or hundreds o...

faxipay349 · 2025-09-01T02:27:57 1756693677

Yeah, the standard SPLADE model trained from BERT typically already has a vocabulary/vector size of 30,522. If the SPLADE model is based on a multilingual encoder such as mBERT or XLM-R, the vocabulary size, and with it the vector size, inherently expands to roughly 120,000 (mBERT) or 250,000 (XLM-R).

CuriouslyC · 2025-08-30T00:26:37 1756513597

If you consider the actual latent space to be the full higher-dimensional representation, and you take the first principal component, the other components are zero. Pretty sparse. No, it's not a linked-list sparse matrix. Don't be a pedant.

yorwba · 2025-08-30T06:55:54 1756536954

When you truncate Matryoshka embeddings, you get the storage benefits of low-dimensional vectors with the limited expressiveness of low-dimensional vectors. Usually, what people look for in sparse vectors is to combine the storage benefits of low-dimensional vectors with the expressiveness of high-dimensional vectors. For that, you need the non-zero dimensions to be different for different vectors.
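A minimal sketch of that distinction, in plain Python. All weights and token indices are made up for illustration, and `truncate` and `sparse_dot` are hypothetical helpers, not part of any library. Truncated vectors all occupy the same first k dimensions, while sparse vectors each store only a few entries but can place them anywhere in a vocabulary-sized space.

```python
# Illustrative only: toy vectors with invented values.
VOCAB_SIZE = 30522  # BERT WordPiece vocabulary size; one slot per token in a SPLADE-style vector


def truncate(dense, k):
    """Matryoshka-style truncation: keep only the first k dimensions."""
    return dense[:k]


def sparse_dot(a, b):
    """Dot product of two sparse vectors stored as {dimension_index: weight}."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller dict
    return sum(w * b[i] for i, w in a.items() if i in b)


# Truncated dense vectors: every document uses the SAME k dimensions.
dense_a = truncate([0.12, -0.40, 0.33, 0.08, -0.21, 0.05, 0.17, -0.09], 4)
dense_b = truncate([0.30, 0.11, -0.25, 0.42, 0.07, -0.18, 0.02, 0.26], 4)
dense_dims_a = set(range(len(dense_a)))
dense_dims_b = set(range(len(dense_b)))

# Sparse vectors: few stored entries each, but at DIFFERENT indices
# drawn from the full VOCAB_SIZE-dimensional space (indices are hypothetical).
sparse_a = {101: 1.2, 2054: 0.7, 17953: 0.4}
sparse_b = {2054: 0.9, 8872: 1.1, 30001: 0.3}
assert all(0 <= i < VOCAB_SIZE for i in list(sparse_a) + list(sparse_b))

# Dimensions used across the two sparse documents: more than either stores alone.
sparse_dims_used = set(sparse_a) | set(sparse_b)
overlap_score = sparse_dot(sparse_a, sparse_b)  # only index 2054 is shared

print("dense dims identical:", dense_dims_a == dense_dims_b)
print("entries stored per sparse vector:", len(sparse_a), len(sparse_b))
print("distinct dims used across sparse vectors:", len(sparse_dims_used))
```

Both representations store a handful of numbers per document, but the two truncated vectors can only ever vary along the same four axes, whereas the two sparse vectors already span five distinct dimensions between them.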

zwaps · 2025-08-30T06:44:58 1756536298

No one means Matryoshka embeddings when they talk about sparse embeddings. This is not pedantic.

CuriouslyC · 2025-08-30T06:48:31 1756536511

No one means wolves when they talk about dogs. Obviously wolves and dogs are TOTALLY different things.

cap11235 · 2025-08-30T13:56:20 1756562180