The bun single binary performs better!
64-bit pointer addresses can be compressed to 32 bits.
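A minimal sketch of the idea, assuming the common arena/index technique: instead of 8-byte pointers, store 4-byte indices into one contiguous buffer. The names (`Arena`, `Node`, `NIL`) are illustrative, not from any particular codebase.

```rust
// Sketch: replace 64-bit pointers with 32-bit indices into one arena.
const NIL: u32 = u32::MAX; // sentinel instead of a null pointer

struct Node {
    value: u32,
    next: u32, // 4-byte "compressed pointer" (an index), not an 8-byte Box
}

struct Arena {
    nodes: Vec<Node>,
}

impl Arena {
    fn push(&mut self, value: u32, next: u32) -> u32 {
        self.nodes.push(Node { value, next });
        (self.nodes.len() - 1) as u32
    }
}

fn main() {
    // A Node is 8 bytes; with a real 64-bit pointer it would be 16.
    assert_eq!(std::mem::size_of::<Node>(), 8);

    let mut arena = Arena { nodes: Vec::new() };
    let tail = arena.push(2, NIL);
    let head = arena.push(1, tail);

    // Walk the chain through indices: 1 -> 2
    let mut sum = 0;
    let mut cur = head;
    while cur != NIL {
        let n = &arena.nodes[cur as usize];
        sum += n.value;
        cur = n.next;
    }
    assert_eq!(sum, 3);
    println!("sum = {sum}");
}
```

Halving the pointer size also means twice as many links fit in each cache line.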
Switching from PNG/JPEG at quality 90 to AVIF at quality 50 saves at least 75% of bandwidth.
The most innovative idea is to compress resources in advance, before they are used.
[Pre-compressing before deploying] means the files can be compressed only once, at the maximum level, and nginx can be told to serve the pre-compressed files directly. Zero CPU per request, and above all a better final ratio, because we can compress harder.
Moreover, Zopfli can compress to .zip with 3-8% better efficiency.
# Serve pre-compressed files generated at build time
gzip_static on;
brotli_static on; # requires libnginx-mod-http-brotli-static
# Fallback for content that is not pre-compressed
gzip on;
gzip_vary on;
gzip_min_length 1024;
gzip_types text/plain text/css text/xml text/javascript
application/javascript application/json
application/xml image/svg+xml;
Brotli compression shrinks the HTML by 81%. The Core Web Vitals score went from 70-85 to 99.
OK, FreeType renders fonts on LCD screens 40% faster
Reading a file is actually slow.
getCurrentThreadUserTime() uses many syscalls because it reads from /proc.
clock_gettime(CLOCK_THREAD_CPUTIME_ID) has only one syscall and a direct function call chain.
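A Linux-only sketch of that single call, assuming the constant and struct layout from the Linux/glibc headers (`CLOCK_THREAD_CPUTIME_ID = 3`, 64-bit `timespec` fields); the function name `thread_cpu_time_ns` is mine:

```rust
// Linux-only sketch: call clock_gettime(CLOCK_THREAD_CPUTIME_ID) directly,
// instead of opening and parsing files under /proc.
#[repr(C)]
struct Timespec {
    tv_sec: i64,
    tv_nsec: i64,
}

const CLOCK_THREAD_CPUTIME_ID: i32 = 3; // value from the Linux headers

extern "C" {
    fn clock_gettime(clk_id: i32, tp: *mut Timespec) -> i32;
}

fn thread_cpu_time_ns() -> Option<u64> {
    let mut ts = Timespec { tv_sec: 0, tv_nsec: 0 };
    // One call and a direct function chain; no /proc round-trips.
    let rc = unsafe { clock_gettime(CLOCK_THREAD_CPUTIME_ID, &mut ts) };
    if rc == 0 {
        Some(ts.tv_sec as u64 * 1_000_000_000 + ts.tv_nsec as u64)
    } else {
        None
    }
}

fn main() {
    // Burn a little CPU so the counter is visibly non-zero.
    let mut x = 0u64;
    for i in 0..1_000_000 {
        x = x.wrapping_add(i);
    }
    let ns = thread_cpu_time_ns().expect("clock_gettime failed");
    println!("thread CPU time: {} ns (x = {})", ns, x);
}
```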
The optimisation can be done, but:
- The kernel policy is clear: don't break userspace
- It's not documented anywhere!
- Author's take: if glibc depends on it, it's not going away.
This is why I like browsing commits of large open source projects. A 40-line deletion eliminated a 400x performance gap. The fix required no new kernel features, just knowledge of a stable-but-obscure Linux ABI detail.
The lessons:
- read the kernel source. POSIX tells you what's portable; the kernel source code tells you what's possible.
- check the old assumptions: revisiting them occasionally pays off.
Optimizations that don't need Rust:
- HTTP range requests for metadata
- Parallel downloads
- Global cache with hardlinks
- Python-free resolution
- PubGrub resolver algorithm
Rust has benefits though:
- zero-copy deserialization
- Thread-level parallelism
- No interpreter startup
- compact version representation: uv packs versions into u64 integers. This micro-optimization compounds across millions of comparisons
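A hedged sketch of the packing idea (not uv's actual bit layout): give each component a fixed-width field in a u64, so comparing two versions is a single integer comparison with the right ordering built in.

```rust
// Illustrative only: 16 bits per component, high bits = most significant
// component, so plain integer ordering matches semantic version ordering.
fn pack_version(major: u64, minor: u64, patch: u64) -> u64 {
    debug_assert!(major < (1 << 16) && minor < (1 << 16) && patch < (1 << 16));
    (major << 32) | (minor << 16) | patch
}

fn main() {
    assert!(pack_version(1, 2, 3) < pack_version(1, 2, 4));
    // No string-comparison trap: 1.10.0 correctly sorts after 1.9.9.
    assert!(pack_version(1, 9, 9) < pack_version(1, 10, 0));
    assert!(pack_version(2, 0, 0) > pack_version(1, 99, 99));
    println!("ordering holds");
}
```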
uv is possible because of several PEPs that have landed since 2016 (too recent for me): PEP 518, 517, 621, and 658. These are the low-hanging fruits: static metadata, no code execution to discover dependencies, and the ability to resolve everything upfront before downloading.
How to optimize a Rust program to squeeze out maximum performance while using as little RAM as possible.
These are obvious to me, but they are good.
Some of them are totally useless for Rust in comparison, though the two have different targets. It is moreover awesome to see 100x improvements.
The heap is a performance killer in Rust. One workaround is to swap in a more efficient memory allocator such as jemalloc or mimalloc.
In Cargo.toml:
[dependencies]
mimalloc = "0.1"
In main.rs:
#[global_allocator]
static GLOBAL: mimalloc::MiMalloc = mimalloc::MiMalloc;
The best performance optimisation is to avoid the heap. There is the heapless crate for that. "The only thing to know is that the size of heapless types and collections needs to be known at compile-time."
A minimalistic UI and a minimal page weight
- The current hardware bottleneck isn't I/O anymore but system calls.
Each system call causes a CPU mode switch between user mode and kernel mode. The switch costs 1000-1500 CPU cycles.
On a 3GHz processor, 1000-1500 cycles is about 500 nanoseconds. This might sound negligibly fast, but modern SSDs can handle over 1 million operations per second. If each operation requires a system call, you're burning 1.5 billion cycles per second just on mode switching.
A package manager can trigger 50k+ system calls to install React, for example.
- JS adds overhead, especially with Node.js and its layers: there are more steps in the pipeline to read a file's contents. Bun reads package.json 2.2x faster than Node.js because of this.
Another use case is string optimization. package-lock files have an expected format with predefined strings ("MIT", "license", etc.). These repeated strings can be optimized.
The manifest of each package is stored in a binary format
Bun stores the response's ETag and sends an If-None-Match header.
The buffer for tarball decompression is allocated in advance. When the data size is unknown, the buffer must be reallocated to grow. Bun buffers the entire tarball before decompressing. Most JS packages are around 1MB, so this is fine (the TypeScript package is 50MB and still OK).
The uncompressed file size is stored in the last 4 bytes of the gzip format.
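A small sketch of reading that trailer, assuming the gzip spec (RFC 1952): the last 4 bytes are ISIZE, the uncompressed size modulo 2^32, little-endian. The function name is mine.

```rust
// Read ISIZE (uncompressed size mod 2^32) from a gzip stream's trailer.
fn gzip_uncompressed_size(data: &[u8]) -> Option<u32> {
    let tail: [u8; 4] = data.get(data.len().checked_sub(4)?..)?.try_into().ok()?;
    Some(u32::from_le_bytes(tail))
}

fn main() {
    // Hand-built gzip stream of the empty input:
    // 10-byte header, empty deflate block, CRC32 = 0, ISIZE = 0.
    let empty_gz: [u8; 20] = [
        0x1f, 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x03, // header
        0x03, 0x00, // deflate: final empty block
        0x00, 0x00, 0x00, 0x00, // CRC32
        0x00, 0x00, 0x00, 0x00, // ISIZE
    ];
    assert_eq!(gzip_uncompressed_size(&empty_gz), Some(0));
    println!("ok");
}
```

Knowing the final size up front lets the decompression buffer be allocated once, with no growth reallocations.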
Bun uses libdeflate, optimized with SIMD instructions.
For comparison, Node.js uses a readStream, which is not as efficient as a seek operation.
Cache-friendly data layout
JSON is inefficient because every access is another pointer hop. "The CPU accesses a pointer that tells it where Next's data is located in memory. This data then contains yet another pointer to where its dependencies live, which in turn contains more pointers to the actual dependency strings."
Fetching data from RAM is slow, so the CPU mitigates this by loading data in cache lines.
Because JSON (and especially JS objects) is stored at scattered addresses in RAM, the cache line is wasted or only a few of its bytes get used.
This optimization works great for data that's stored sequentially, but it backfires when your data is scattered randomly across memory.
The nested structure of objects creates what's called "pointer chasing", a common anti-pattern in systems programming.
For a project with 1000 packages averaging 5 dependencies, that's 2ms of pure memory latency.
- Structure of arrays (SoA) instead of arrays of structs
Bun uses large contiguous buffers. While a package entry is only 8 bytes, the CPU loads an entire 64-byte cache line, so packages[0] through packages[7] arrive together.
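A sketch of the SoA layout (the field names are illustrative, not Bun's): each field lives in its own contiguous buffer, so a scan of one field walks straight through memory and every cache line is fully used.

```rust
// Array of structs: each element drags along fields we don't need,
// pushing the interesting values far apart in memory.
#[allow(dead_code)]
struct PackageRow {
    id: u32,
    dep_count: u32,
    name: [u8; 56], // other fields
}

// Struct of arrays: one contiguous buffer per field.
struct PackagesSoA {
    ids: Vec<u32>,
    dep_counts: Vec<u32>,
    names: Vec<[u8; 56]>,
}

fn total_deps(p: &PackagesSoA) -> u64 {
    // Touches only dep_counts: 16 u32 counts per 64-byte cache line.
    p.dep_counts.iter().map(|&c| c as u64).sum()
}

fn main() {
    let n: u32 = 1000;
    let soa = PackagesSoA {
        ids: (0..n).collect(),
        dep_counts: vec![5; n as usize],
        names: vec![[0u8; 56]; n as usize],
    };
    assert_eq!(total_deps(&soa), 5000);
    println!("total deps: {}", total_deps(&soa));
}
```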
As a sidenote: Bun originally used a binary lockfile format (bun.lockb) to avoid JSON parsing overhead entirely, but binary files are impossible to review in pull requests and can't be merged when conflicts happen.
- File copying
Copying a file can be expensive, as the data first runs through kernel memory. There are ways to optimize it though.
On macOS, clonefile can clone entire directories in a single copy-on-write operation.
Linux has hardlinks, with fallbacks such as the FICLONE ioctl for Btrfs and XFS, copy_file_range, or sendfile.
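A minimal sketch of the hardlink trick with the standard library (file names and the `link_from_cache` helper are made up for the demo): the destination gets a second directory entry for the same inode, so no bytes are copied at all.

```rust
use std::fs;
use std::io::Write;
use std::path::Path;

// Materialize a cached file into a project by hardlinking, not copying.
fn link_from_cache(cache_path: &Path, dest: &Path) -> std::io::Result<()> {
    // Same inode, two directory entries: zero data movement.
    fs::hard_link(cache_path, dest)
}

fn main() -> std::io::Result<()> {
    let dir = std::env::temp_dir();
    let cache = dir.join("demo-cache-pkg.txt");
    let dest = dir.join("demo-project-pkg.txt");
    let _ = fs::remove_file(&dest); // ignore error if absent

    fs::File::create(&cache)?.write_all(b"package bytes")?;
    link_from_cache(&cache, &dest)?;

    assert_eq!(fs::read(&dest)?, b"package bytes");
    println!("hardlinked ok");
    Ok(())
}
```

A real package manager would fall back to a reflink (FICLONE) or a plain copy when the cache and the project live on different filesystems, since hardlinks can't cross them.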
- Multi-Core parallelism
Bun uses lock-free data structures. It also uses a thread pool of 64 concurrent HTTP connections.
Each thread gets its own memory pool.
- Conclusion
[...] npm gave us a foundation to build on, yarn made managing workspaces less painful, and pnpm came up with a clever way to save space and speed things up with hardlinks. Each worked hard to solve the problems developers were actually hitting at the time. But that world no longer exists. SSDs are 70× faster, CPUs have dozens of cores, and memory is cheap. The real bottleneck shifted from hardware speed to software abstractions. [...] The tools that will define the next decade of developer productivity are being written right now, by teams who understand that performance bottlenecks shifted when storage got fast and memory got cheap. Installing packages 25x faster isn't "magic": it's what happens when tools are built for the hardware we actually have.
It's absolutely possible to beat even the best sort implementations with domain specific knowledge, careful benchmarking and an understanding of CPU micro-architectures. At the same time, assumptions will become invalid, mistakes can creep in silently and good sort implementations can be surprisingly fast even without prior domain knowledge. If you have access to a high-quality sort implementation, think twice about replacing it with something home-grown.
Tuned sorting > tuned hash table > baseline sorting > baseline hash table.
Note that sorting wins, but it depends on the data size: tuned sorting is more efficient for data sizes above 256KiB.
Quicksort handles spread-out data more efficiently than radix sort; radix sort is more efficient on random data.
Using a faster hash algorithm combined with radix sort performs better than quicksort though: hash with MulSwapMul, then radix sort using diverting LSD radix sort.
Hashed radix sort is more restrictive than hash tables though. Where batching is viable, hashed radix sort is typically viable too.
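To make the radix-sort half concrete, here is a minimal LSD radix sort on u64 keys, one byte per pass. This is a simplification: the diverting LSD variant and the MulSwapMul hash from the article are not reproduced here.

```rust
// Minimal LSD radix sort: 8 stable counting-sort passes, one byte each,
// least significant byte first.
fn radix_sort_u64(keys: &mut Vec<u64>) {
    let mut buf = vec![0u64; keys.len()];
    for pass in 0..8 {
        let shift = pass * 8;

        // Histogram of the current byte.
        let mut counts = [0usize; 256];
        for &k in keys.iter() {
            counts[((k >> shift) & 0xff) as usize] += 1;
        }

        // Prefix sums: starting offset of each bucket.
        let mut total = 0;
        for c in counts.iter_mut() {
            let n = *c;
            *c = total;
            total += n;
        }

        // Stable scatter into the buffer, then swap roles.
        for &k in keys.iter() {
            let b = ((k >> shift) & 0xff) as usize;
            buf[counts[b]] = k;
            counts[b] += 1;
        }
        std::mem::swap(keys, &mut buf);
    }
}

fn main() {
    let mut v: Vec<u64> = vec![170, 45, 75, 90, 802, 24, 2, 66];
    radix_sort_u64(&mut v);
    assert_eq!(v, vec![2, 24, 45, 66, 75, 90, 170, 802]);
    println!("{v:?}");
}
```

The "hash then radix sort" idea runs exactly these passes, but over hashed keys, which makes the byte distribution uniform and the bucket sizes predictable.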
Optimizing some endpoints in Rust inside a Go app.
The results show nearly 2x performance.
Computer architectures encompass 13 orders of magnitude of performance! That’s roughly the difference between something like a trivial function call processing data in L1 cache to a remote call out to something in another region. People often make relatively “small” mistakes of 3 or 4 orders of magnitude, which is still crazy if you think about it, but that’s considered to be a minor sin relative to making the architecture diagram look pretty.
It's easy to accidentally perform expensive operations in software systems.
If we can delete a single unnecessary 10^9-order operation, that pays for a lot of unnecessary 10^5-order operations.
Optimize for the higher order of magnitude.