Because the problem is so low-level that Rust's guarantees cannot hold at the assembly-code level.
Safe wrappers can be built upon it though.
Reading a file is actually slow.
getCurrentThreadUserTime() uses many syscalls because it reads from /proc.
clock_gettime(CLOCK_THREAD_CPUTIME_ID) has only one syscall and a direct function call chain.
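A minimal sketch of the single-syscall path, assuming a POSIX system. The helper name `thread_cpu_time_ns` is made up for illustration. Note that the plain `CLOCK_THREAD_CPUTIME_ID` reports total CPU time (user + system) for the calling thread, so this is not the user-time-only variant discussed in the linked post.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: CPU time consumed by the calling thread, in
 * nanoseconds, or -1 if the clock is not available. */
static int64_t thread_cpu_time_ns(void)
{
    struct timespec ts;

    /* One call, no file parsing. */
    if (clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts) != 0) {
        return -1;
    }
    /* Sanity check on the value handed back by the C library. */
    assert(ts.tv_nsec >= 0 && ts.tv_nsec < 1000000000L);
    return (int64_t)ts.tv_sec * 1000000000LL + (int64_t)ts.tv_nsec;
}
```

Calling it before and after a piece of work and subtracting gives the CPU time that work cost on the current thread.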
The optimisation can be done, but:
- The kernel policy is clear: don't break userspace
- It isn't documented anywhere!
- Author's take: if glibc depends on it, it's not going away.
This is why I like browsing commits of large open source projects. A 40-line deletion eliminated a 400x performance gap. The fix required no new kernel features, just knowledge of a stable-but-obscure Linux ABI detail.
The lessons:
- Read the kernel source. POSIX tells you what's portable; the kernel source code tells you what's possible.
- Check the old assumptions: revisiting them occasionally pays off.
#define MAKE_U32_FROM_TWO_U16(high, low) ( ((uint32_t)(high) << 16) | ((uint32_t)(low) & 0xFFFF) )
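A small usage sketch for the macro above. The helper `pack_and_check` is hypothetical and only there to show the round trip.

```c
#include <assert.h>
#include <stdint.h>

#define MAKE_U32_FROM_TWO_U16(high, low) ( ((uint32_t)(high) << 16) | ((uint32_t)(low) & 0xFFFF) )

/* Hypothetical helper: pack two halves and verify they can be recovered. */
static uint32_t pack_and_check(uint16_t high, uint16_t low)
{
    uint32_t packed = MAKE_U32_FROM_TWO_U16(high, low);

    assert((uint16_t)(packed >> 16) == high);    /* upper 16 bits */
    assert((uint16_t)(packed & 0xFFFF) == low);  /* lower 16 bits */
    return packed;
}
```

The casts to `uint32_t` happen before the shift, which avoids shifting a promoted (signed) `int`, and the mask on `low` keeps stray upper bits from leaking into the high half.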
Rule 1: Restrict all code to very simple control flow constructs.
Rule 2: Give all loops a fixed upper bound.
Rule 3: Do not use dynamic memory allocation after initialization.
Rule 4: No function should be longer than what can be printed on a single sheet of paper in a standard format with one line per statement and one line per declaration. Typically, this means no more than about 60 lines of code per function.
Rule 5: The code's assertion density should average to minimally two assertions per function. Assertions must be used to check for anomalous conditions that should never happen in real-life executions. Assertions must be side-effect free and should be defined as Boolean tests. When an assertion fails, an explicit recovery action must be taken such as returning an error condition to the caller of the function that executes the failing assertion. Any assertion for which a static checking tool can prove that it can never fail or never hold violates this rule.
Rule 6: Declare all data objects at the smallest possible level of scope.
Rule 7: Each calling function must check the return value of nonvoid functions, and each called function must check the validity of all parameters provided by the caller.
Rule 8: The use of the preprocessor must be limited to the inclusion of header files and simple macro definitions.
Rule 9: The use of pointers must be restricted. Specifically, no more than one level of dereferencing should be used.
Rule 10: All code must be compiled, from the first day of development, with all compiler warnings enabled at the most pedantic setting available. All code must compile without warnings. All code must also be checked daily with at least one, but preferably more than one, strong static source code analyzer and should pass all analyses with zero warnings.
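A short sketch of what a function written in the spirit of some of these rules might look like. The names (`sum_samples`, `MAX_SAMPLES`, `status_t`) are invented for illustration; this is not taken from the original rule set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rule 8: an enum constant instead of leaning on the preprocessor. */
enum { MAX_SAMPLES = 64 };

typedef enum { STATUS_OK = 0, STATUS_BAD_ARG = -1 } status_t;

/* Hypothetical example: sum at most MAX_SAMPLES values. */
static status_t sum_samples(const int32_t *samples, size_t count, int64_t *out_sum)
{
    /* Rule 7: the callee validates every parameter.
     * Rule 5: a failed check returns an error instead of carrying on. */
    if (samples == NULL || out_sum == NULL || count > MAX_SAMPLES) {
        return STATUS_BAD_ARG;
    }

    int64_t sum = 0; /* Rule 6: smallest possible scope */

    /* Rule 2: the loop has a fixed upper bound that a tool can verify. */
    for (size_t i = 0; i < count && i < MAX_SAMPLES; i++) {
        sum += samples[i];
    }

    *out_sum = sum;
    return STATUS_OK;
}
```

Per Rule 7, the caller is expected to check the returned `status_t` rather than read `out_sum` unconditionally.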
- In Rust, this struct is 16 bytes (on x86_64, again) and in C, it is 24. This is because Rust is free to reorder the fields to optimize for size, while C is not.
- Social factor: with Rust, it is more comfortable to attempt slightly riskier code than you would dare to write in the equivalent C. The Firefox team failed twice to parallelize Firefox's style layout. They got it right the third time, with Rust. Does a junior write faster production code in Rust than in C?
- Compile time vs runtime: Rust also adds a bit of safety at runtime (for example, bounds checks on index access).
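The struct from the first bullet is not reproduced in these notes, so the field list below (`uint32_t`, `uint64_t`, `uint32_t`) is an assumption chosen to match the 24-versus-16-byte figures on x86_64. The sketch shows the C side only: the compiler must keep declaration order, so padding appears unless the fields are reordered by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed field list, in "natural" declaration order. */
struct declared_order {
    uint32_t a; /* 4 bytes, then padding so that b is suitably aligned */
    uint64_t b;
    uint32_t c; /* 4 bytes, then tail padding */
};

/* Same fields, reordered by hand the way a compiler could if allowed to. */
struct size_sorted {
    uint64_t b;
    uint32_t a;
    uint32_t c;
};

/* Bytes of padding the compiler inserted into the declared-order layout. */
static size_t declared_order_padding(void)
{
    size_t payload = sizeof(uint32_t) + sizeof(uint64_t) + sizeof(uint32_t);

    assert(sizeof(struct declared_order) >= payload);
    return sizeof(struct declared_order) - payload;
}
```

On a typical x86_64 ABI, `sizeof(struct declared_order)` is 24 and `sizeof(struct size_sorted)` is 16; other ABIs may align `uint64_t` differently.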
If C is the fastest language, is there any inherent reason why Rust could not do the same things? At the fundamental level, the answer is “there’s no difference between the two.”
But projects do not run on fundamentals alone: "We’re usually talking about something in the context of engineering, a specific project, with specific developers, with specific time constraints, and so on. I think that there are so many variables that it is difficult to draw generalized conclusions."
This is huge:
Cores may stay idle for seconds while ready threads are waiting in runqueues. In our experiments, these performance bugs caused many-fold performance degradation for synchronization-heavy scientific applications, 13% higher latency for kernel make, and a 14-23% decrease in TPC-H throughput for a widely used commercial database.
DOI: https://dl.acm.org/doi/10.1145/2901318.2901326
It may be useful to read it completely.
Fixes:
- compare the minimum load of each scheduling group instead of the average
- Linux spawns threads on the same core as their parent thread: a node can steal threads from another node by comparing the average load
and two others
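To see why the first fix matters, here is a toy model, not kernel code. The function names and the load numbers are made up: group A has one very heavy thread and three idle cores, group B has two medium threads on every core. Comparing averages hides A's idle cores; comparing minimums exposes them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Average load of a scheduling group (toy model, integer division). */
static long avg_load(const long *loads, size_t n)
{
    assert(loads != NULL && n > 0);
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += loads[i];
    }
    return sum / (long)n;
}

/* Load of the least loaded core in a scheduling group. */
static long min_load(const long *loads, size_t n)
{
    assert(loads != NULL && n > 0);
    long min = loads[0];
    for (size_t i = 1; i < n; i++) {
        if (loads[i] < min) {
            min = loads[i];
        }
    }
    return min;
}

/* Should a core in `own` try to steal work from `other`? */
static bool should_steal_by_avg(const long *own, size_t n_own,
                                const long *other, size_t n_other)
{
    return avg_load(other, n_other) > avg_load(own, n_own);
}

static bool should_steal_by_min(const long *own, size_t n_own,
                                const long *other, size_t n_other)
{
    return min_load(other, n_other) > min_load(own, n_own);
}
```

With loads `{1000, 0, 0, 0}` for A and `{200, 200, 200, 200}` for B, the averages are 250 and 200, so the average-based test tells A's idle cores not to steal, while the minimum-based test (0 versus 200) tells them to.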
Their tools are worth a look too (an online sanity checker for invariants such as "No core remains idle while another core is overloaded").
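The quoted invariant can be sketched as a check over per-core runqueue lengths. This is a simplification under stated assumptions, not the authors' tool: "overloaded" is taken to mean at least two runnable threads on a core, and thread affinity is ignored. The name `has_wasted_core` is invented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true when the invariant is violated: some core is idle while
 * another core has threads waiting behind the one it is running. */
static bool has_wasted_core(const unsigned *runqueue_len, size_t ncores)
{
    assert(runqueue_len != NULL || ncores == 0);

    bool any_idle = false;
    bool any_overloaded = false;

    for (size_t i = 0; i < ncores; i++) {
        if (runqueue_len[i] == 0) {
            any_idle = true;
        } else if (runqueue_len[i] >= 2) { /* assumption: 2+ means overloaded */
            any_overloaded = true;
        }
    }
    return any_idle && any_overloaded;
}
```

A brief violation is normal while the load balancer catches up, so a real checker would have to confirm that the condition persists before reporting it.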
During the 00s, dozens of papers described new scheduling algorithms, [... but] few of them were adopted in mainstream operating systems, mainly because it is not clear how to integrate all these ideas in scheduler safely.
Similarly, the Related Work section describes the current state of research in other domains: performance bugs, kernel correctness, tracing.
The resources are available on GitHub: https://github.com/jplozi/wastedcores
An optimisation that I don't really understand.
Explaining pointers in one image
A C Compiler in 512 bytes
Impressive!
I have spent several decades writing C and C++, and even a bit of assembly. And frankly, I admire the few coders who manage to be "safe".
It's like juggling chainsaws: just because some people can pull it off doesn't mean it should be recommended.
4 kB in assembly, or a few bytes using the OS builtins
Some examples of why C is faster than Java: it comes down to both the language and the algorithms.
Only one instance of the same job at a time?
My solution to deal with this is to bind an IPv6 localhost ::1 socket to a given port. Only one process can do this, and thus it’s a very effective mutex. No lock files to cause havoc, no dealing with the dark and buggy corners of advisory file locking.
For shell scripts, simply replace the #!/bin/sh with #!/somewhere/bin/lock 2048 where 2048 is the port number you will use to enforce the lock (greater than 1024 if you do not want to deal with the hassles of privileged ports).
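A minimal sketch of the bind-as-mutex idea from the quote, assuming a system with IPv6 loopback enabled. It covers only the locking step, not the `lock` interpreter wrapper, and the function name `acquire_port_lock` is made up.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: try to take the "lock" by binding [::1]:port.
 * Returns the socket fd on success (keep it open for as long as the lock
 * is needed), or -1 if the port is taken or IPv6 is unavailable. */
static int acquire_port_lock(unsigned short port)
{
    assert(port != 0); /* port 0 would pick a random port: no mutual exclusion */

    int fd = socket(AF_INET6, SOCK_STREAM, 0);
    if (fd < 0) {
        return -1;
    }

    struct sockaddr_in6 addr;
    memset(&addr, 0, sizeof addr);
    addr.sin6_family = AF_INET6;
    addr.sin6_addr = in6addr_loopback; /* ::1 */
    addr.sin6_port = htons(port);

    /* No SO_REUSEADDR / SO_REUSEPORT: a second bind on the same port fails. */
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The lock is released when the descriptor is closed, including when the process dies, which is what makes this sturdier than a stale lock file.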
Takeaways:
- It is possible to list a directory with 8 million files in it.
- strace is your friend
- Don't be afraid to compile code and modify it (hell, simple C compiles so fast it could be interpreted)
- There is no good reason to have 8 million files in a directory :-), but this was a good learning experience (and possibly a good interview question).
The last point is the most important :D