kernel/fork: group allocation/free of per-cpu counters for mm struct
author Mateusz Guzik <mjguzik@gmail.com>
Wed, 23 Aug 2023 05:06:09 +0000 (07:06 +0200)
committer Dennis Zhou <dennis@kernel.org>
Fri, 25 Aug 2023 15:10:35 +0000 (08:10 -0700)
commit 14ef95be6f5558fb9e43aaf06ef9a1d6e0cae6c8
tree 7abdf1224e08569f9a4dbd499192deeaba8729ef
parent c439d5e8a0deb7310b5bb4e5f2fe47c40ff5297f

A trivial execve scalability test which tries to be very friendly
(statically linked binaries, all separate) is predominantly bottlenecked
by back-to-back per-cpu counter allocations which serialize on global
locks.

Ease the pain by allocating and freeing them in one go.
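
To illustrate why grouping helps, here is a minimal userspace sketch, not
the kernel code. It assumes a hypothetical allocator guarded by a single
global lock and counts how often that lock is taken. All names in it
(struct counter, locked_alloc, counters_init_each, counters_init_many and
so on) are invented for illustration and are not the kernel's per-cpu
counter API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

#define NR_SIM_CPUS 4	/* assumption: number of simulated CPUs */

/* Hypothetical stand-in for a per-cpu counter: one slot per simulated CPU. */
struct counter {
	long *slots;
};

/* Single global lock standing in for the allocator's serialization point. */
static atomic_flag alloc_lock = ATOMIC_FLAG_INIT;
static unsigned long lock_acquisitions;

static void global_lock(void)
{
	while (atomic_flag_test_and_set(&alloc_lock))
		;	/* spin until the flag is released */
	lock_acquisitions++;
}

static void global_unlock(void)
{
	atomic_flag_clear(&alloc_lock);
}

/* Every allocation and every free takes the global lock once. */
static long *locked_alloc(size_t nslots)
{
	long *p;

	global_lock();
	p = calloc(nslots, sizeof(*p));
	global_unlock();
	return p;
}

static void locked_free(long *p)
{
	global_lock();
	free(p);
	global_unlock();
}

/* One at a time: n lock round trips to set up n counters. */
static int counters_init_each(struct counter *c, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		c[i].slots = locked_alloc(NR_SIM_CPUS);
		if (!c[i].slots) {
			while (i--)	/* roll back what was allocated */
				locked_free(c[i].slots);
			return -1;
		}
	}
	return 0;
}

static void counters_destroy_each(struct counter *c, size_t n)
{
	for (size_t i = 0; i < n; i++)
		locked_free(c[i].slots);
}

/* Grouped: a single allocation backs all n counters. */
static int counters_init_many(struct counter *c, size_t n)
{
	long *block = locked_alloc(n * NR_SIM_CPUS);

	if (!block)
		return -1;
	for (size_t i = 0; i < n; i++)
		c[i].slots = block + i * NR_SIM_CPUS;
	return 0;
}

static void counters_destroy_many(struct counter *c, size_t n)
{
	assert(n > 0);
	locked_free(c[0].slots);	/* the first counter points at the block */
}
```

In this model, setting up and tearing down four counters one at a time
takes the global lock eight times, while the grouped variant takes it
twice regardless of the number of counters.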

Bench can be found here:
http://apollo.backplane.com/DFlyMisc/doexec.c

$ cc -static -O2 -o static-doexec doexec.c
$ ./static-doexec $(nproc)

Even at a very modest scale of 26 cores (ops/s):
before: 133543.63
after: 186061.81 (+39%)

While with the patch these allocations remain a significant problem,
the primary bottleneck shifts to page release handling.

Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://lore.kernel.org/r/20230823050609.2228718-3-mjguzik@gmail.com
[Dennis: reflowed 1 line]
Signed-off-by: Dennis Zhou <dennis@kernel.org>
kernel/fork.c