- Organizations don't use that much data.
Of queries that scan at least 1 MB, the median query scans about 100 MB. The 99.9th percentile query scans about 300 GB.
But 99.9% of real-world queries could run on a single large node.
I did the analysis for this post using DuckDB, and it can scan the entire 11 GB Snowflake query sample on my Mac Studio in a few seconds.
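As a rough illustration of that kind of analysis (the file name and the `bytes_scanned` column are assumptions for the sketch, not the actual schema of the Snowflake sample), a DuckDB percentile query over a Parquet dump could look like this:

```python
# Sketch only: assumes a Parquet file with a bytes_scanned column;
# the real Snowflake query sample may use different names.
import duckdb

result = duckdb.sql("""
    SELECT
        quantile_cont(bytes_scanned, 0.5)   AS median_bytes,   -- ~100 MB in the sample
        quantile_cont(bytes_scanned, 0.999) AS p999_bytes      -- ~300 GB in the sample
    FROM read_parquet('snowflake_queries.parquet')
    WHERE bytes_scanned >= 1e6   -- only queries scanning at least 1 MB
""").fetchall()

print(result)
```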
When we think about new database architectures, we’re hypnotized by scaling limits. If it can’t handle petabytes, or at least terabytes, it’s not in the conversation. But most applications will never see a terabyte of data, even if they’re successful. We’re using jackhammers to drive finish nails.
As an industry, we’ve become absolutely obsessed with “scale”, seemingly at the expense of all else: simplicity, ease of maintenance, and reducing developer cognitive load.
Years it takes for data to grow 10x, by annual growth rate (quick check below the list):
10% -> ~24y
50% -> ~5.7y
200% -> ~2.1y
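These figures follow from a simple compounding formula: years to 10x = ln(10) / ln(1 + annual growth rate). A quick check:

```python
import math

def years_to_10x(annual_growth: float) -> float:
    """Years until data grows 10x at a constant annual growth rate."""
    return math.log(10) / math.log(1 + annual_growth)

for rate in (0.10, 0.50, 2.00):
    print(f"{rate:.0%} per year -> {years_to_10x(rate):.1f} years")
# 10% -> 24.2 years, 50% -> 5.7 years, 200% -> 2.1 years
```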
Scaling is also a luxury problem in many cases: needing it means the business is doing well.
- Hardware is getting really, really good
In the last decade:
SSDs got ~5.6x cheaper, hold ~30x more data on a single drive, and got ~11x faster in sequential reads and ~18x faster in random reads.
CPU core counts went up ~2.6x, price per core dropped at least 5x, and each Turin core is probably 2x-2.5x faster as well.
Distributed systems also become overkill for more and more workloads as hardware improves faster than data grows.