https://blog.twitter.com/engineering/en_us/topics/open-source/2021/dropping-cache-didnt-drop-cache.html

Eng_EXPLORE_Pink.png.twimg.768.png

The article requires some prerequisite knowledge about Linux kernel memory management subsystem.

Recently, while investigating an OOM (out-of-memory) problem, Twitter engineers found that the slab cache was increasing consistently, but the page cache was decreasing consistently. A closer look showed that the highest consumption of the slab cache was the dentry cache, and the dentry caches were charged to one memory control group (cgroup). It seems that the Linux kernel’s memory reclaimer only reclaimed the page cache, but didn’t reclaim the dentry cache at all.

We tried to drop the slab cache by running:

but surprisingly it didn’t drop the cache.

By debugging the problem, we found that there was a rare race condition in the shrinker code.

Background

To fully understand what prevented the slab cache from being dropped, we need to take a closer look at the Linux kernel internals, particularly the following memory management subsystem:

Memory reclaimer
Shrinker
Memory cgroups

The memory reclaimer

The memory resource for a machine is not infinite.There might be more demand than supply for memory resources if we run multiple programs simultaneously or run a very big program. Once available memory is low, the memory reclaimer (aka kswapd or direct reclaim) tries to evict “old” memory to free enough space to satisfy the allocators. The reclaimer scans lists (most of which are least-recently-used lists, aka LRU) to evict the not-recently-used memory. The lists include anonymous memory lists, file memory lists, and slab caches (Figure 1).

The reclaimer scans all the above lists, except the unevictable list. However, sometimes the reclaimer only scans a few of them. For example, the reclaimer does not scan the anonymous memory list if the swap partition is not available.

The shrinker

When the reclaimer is trying to reclaim slab caches, it needs to call shrinkers. Slab caches are memory objects with different sizes managed by different subsystems. For example, this includes the inode cache and dentry cache. Some slab caches are reclaimable. Since these caches hold the data for specific subsystems and have specific states, they can be reclaimed by dedicated shrinkers that understand how the caches are managed.

Filesystem metadata caches have a dedicated shrinker for each segment of metadata. The reclaimable filesystem metadata caches are in memory cgroup-aware and non-uniform memory access (NUMA) aware LRU lists. The shrinker scans these lists to reclaim some of the caches when memory pressure is hit.

Background

The memory reclaimer

The shrinker

The memory cgroup