# Native Memory Allocator Verification
This document describes how to verify the native memory allocator on Android.
This procedure should be followed when upgrading or moving to a new allocator.
A minor upgrade might not need to run all of the benchmarks; however,
at least the
[SQL Allocation Trace Benchmark](#sql-allocation-trace-benchmark),
[Memory Replay Benchmarks](#memory-replay-benchmarks) and
[Performance Trace Benchmarks](#performance-trace-benchmarks) should be run.

It is important to note that there are two modes for a native allocator
to run in on Android. The first is the normal allocator, the second is
called the low memory config, which is designed to run on memory-constrained
systems and be a bit slower, but take less RSS. To enable the low memory
config, add this line to the `BoardConfig.mk` for the given target:

    MALLOC_LOW_MEMORY := true

This is valid starting with Android V (API level 35). Before that, the
way to enable the low memory config was:

    MALLOC_SVELTE := true

The `BoardConfig.mk` file is usually found in the directory
`device/<DEVICE_NAME>/` or in a subdirectory.

When evaluating a native allocator, make sure that you benchmark both
modes.

## Android Extensions
Android supports a few non-standard functions and mallopt controls that
a native allocator needs to implement.

### Iterator Functions
These are functions that are used to implement a memory leak detector
called `libmemunreachable`.

#### malloc\_disable
This function, when called, should pause all threads that are making a
call to an allocation function (malloc/free/etc). When a call
is made to `malloc_enable`, the paused threads should start running again.

#### malloc\_enable
This function, when called, does nothing unless there was a previous call
to `malloc_disable`. This call will unpause any thread which is making
a call to an allocation function (malloc/free/etc) when `malloc_disable`
was called previously.

#### malloc\_iterate
This function enumerates all of the allocations currently live in the
system. It is meant to be called after a call to `malloc_disable` to
prevent further allocations while this call is being executed. The best
description of what is expected from this function is its tests in
`bionic/tests/malloc_iterate_test.cpp`.

### Mallopt Extensions
These are mallopt options that Android requires for a native allocator
to work efficiently.

#### M\_DECAY\_TIME
When set to zero, `mallopt(M_DECAY_TIME, 0)`, it is expected that an
allocator will attempt to purge and release any unused memory back to the
kernel on free calls. This is important in Android to avoid consuming extra
RSS.

When set to non-zero, `mallopt(M_DECAY_TIME, 1)`, an allocator can delay the
purge and release action. The amount of delay is up to the allocator
implementation, but it should be a reasonable amount of time. The jemalloc
allocator was implemented to have a one-second delay.

The drawback to this option is that most allocators do not have a separate
thread to handle the purge, so the decay is only handled when an
allocation operation occurs. For server processes, this can mean that
RSS is slightly higher when the server is waiting for the next connection
and no other allocation calls are made. The `M_PURGE` option is used to
force a purge in this case.

For all applications on Android, the call `mallopt(M_DECAY_TIME, 1)` is
made by default. The idea is that it allows application frees to run a
bit faster, while only increasing RSS a bit.

#### M\_PURGE
When called, `mallopt(M_PURGE, 0)`, an allocator should purge and release
any unused memory immediately. The argument for this call is ignored. If
possible, this call should clear thread cached memory if it exists. The
idea is that this can be called to purge memory that has not been
purged when `M_DECAY_TIME` is set to one. This is useful if you have a
server application that does a lot of native allocations and the
application wants to purge that memory before waiting for the next connection.
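
The interaction between `M_DECAY_TIME` and `M_PURGE` can be illustrated with a
toy model. The sketch below is not how any real allocator is implemented: the
`ToyAllocator` class, its one-second delay, and its bookkeeping are all
hypothetical and exist only to show why an idle server keeps unused memory
around until a purge is forced.

```python
class ToyAllocator:
    """Toy model of decay-based purging. Hypothetical; not a real allocator."""

    def __init__(self, decay_time):
        self.decay_time = decay_time  # 0: purge on free, non-zero: delayed purge
        self.delay = 1.0              # assumed delay in seconds (illustrative)
        self.dirty = []               # (time_freed, size) of unused, unpurged memory

    def dirty_bytes(self):
        """Unused bytes that still count against RSS in this model."""
        return sum(size for _, size in self.dirty)

    def _decay(self, now):
        # Keep only entries whose delay has not yet expired.
        self.dirty = [(t, s) for t, s in self.dirty if now - t < self.delay]

    def free(self, size, now):
        self.dirty.append((now, size))
        if self.decay_time == 0:
            self.dirty.clear()        # models purging on every free call
        else:
            self._decay(now)          # decay only runs inside allocator calls

    def malloc(self, size, now):
        if self.decay_time != 0:
            self._decay(now)          # an allocation gives decay a chance to run

    def purge(self):
        self.dirty.clear()            # models mallopt(M_PURGE, 0)


# A server frees memory and then waits for the next connection.
server = ToyAllocator(decay_time=1)
server.free(4096, now=0.0)
print(server.dirty_bytes())  # still 4096: no allocator call has run the decay
server.purge()
print(server.dirty_bytes())  # 0 after the forced purge
```

In this model, memory freed with a non-zero decay time is only released when a
later allocator call happens after the delay, which is the server scenario
described above.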

## Correctness Tests
These are the tests that should be run to verify an allocator is
working properly according to Android.

### Bionic Unit Tests
The bionic unit tests contain a small number of allocator tests. These
tests primarily verify Android extensions and non-standard behavior
of allocation routines, such as what happens when a non-power-of-two alignment
is passed to memalign.

To run all of the compliance tests:

    adb shell /data/nativetest64/bionic-unit-tests/bionic-unit-tests --gtest_filter="malloc*"
    adb shell /data/nativetest/bionic-unit-tests/bionic-unit-tests --gtest_filter="malloc*"

The allocation tests are not meant to be complete, so it is expected
that a native allocator will have its own set of tests that can be run.

### Libmemunreachable Tests
The libmemunreachable tests verify that the iterator functions are working
properly.

To run all of the tests:

    adb shell /data/nativetest64/memunreachable_binder_test/memunreachable_binder_test
    adb shell /data/nativetest/memunreachable_binder_test/memunreachable_binder_test
    adb shell /data/nativetest64/memunreachable_test/memunreachable_test
    adb shell /data/nativetest/memunreachable_test/memunreachable_test
    adb shell /data/nativetest64/memunreachable_unit_test/memunreachable_unit_test
    adb shell /data/nativetest/memunreachable_unit_test/memunreachable_unit_test

### CTS Entropy Test
In addition to the bionic tests, there is also a CTS test that is designed
to verify that the addresses returned by malloc are sufficiently randomized
to help defeat potential security bugs.

To run this test:

    atest AslrMallocTest

If there are multiple devices connected to the system, use `-s <SERIAL>`
to specify a device.

## Performance
There are multiple ways to evaluate the performance of a native
allocator on Android. One is allocation speed in various scenarios,
another is total RSS taken by the allocator.

The last is virtual address space consumed in 32 bit applications. There is
a limited amount of address space available in 32 bit apps, and there have
been allocator bugs that cause memory failures when too much virtual
address space is consumed. For 64 bit executables, this can be ignored.

NOTE: The default native allocator operates differently in an application
versus command-line tools running in the shell. In order to run the same
as an application, follow these instructions:

    > adb shell
    # export MALLOC_USE_APP_DEFAULTS=1
    # <Run command-line benchmarks>

Running without setting this environment variable can result in different
performance and even different RSS usage for the benchmarks mentioned below.
The environment variable has only been available since API level 36.
Applications have used different native allocator defaults than command-line
tools since API level 26 (Android O).

### Bionic Benchmarks
These are the microbenchmarks that are part of the bionic benchmarks
suite. These benchmarks can be built using this command:

    mmma -j bionic/benchmarks

These benchmarks are only used to verify the speed of the allocator and
ignore anything related to RSS and virtual address space consumed.

For all of these benchmark runs, it can be useful to add these two options:

    --benchmark_repetitions=XX
    --benchmark_report_aggregates_only=true

This will run the benchmark XX times and then report the mean, median, and
stddev, which helps to get a number that can be compared to the new allocator.

In addition, there is another option:

    --bionic_cpu=XX

This option locks the benchmark to run only on core XX. It avoids
any issue related to the code migrating from one core to another
with different characteristics. For example, on a big-little cpu, if the
benchmark moves from big to little or vice-versa, this can cause scores
to fluctuate in indeterminate ways.

For most runs, the best set of options to add is:

    --benchmark_repetitions=10 --benchmark_report_aggregates_only=true --bionic_cpu=3

On most phones with a big-little cpu, the third core is the little core.
Choosing to run on the little core tends to highlight any performance
differences.
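
When many benchmark runs are needed, it can help to assemble the command lines
in one place. The following is a minimal sketch, not part of bionic: the helper
name `bionic_benchmark_cmd` and its parameters are hypothetical, the binary
paths are the ones used by the commands in this document, and prefixing the
command with `MALLOC_USE_APP_DEFAULTS=1` is assumed to be equivalent to
exporting the variable in the shell first.

```python
import shlex

# On-device benchmark binaries, as used by the commands in this document.
BENCH_BINARIES = {
    64: "/data/benchmarktest64/bionic-benchmarks/bionic-benchmarks",
    32: "/data/benchmarktest/bionic-benchmarks/bionic-benchmarks",
}


def bionic_benchmark_cmd(bits, bench_filter, repetitions=10, cpu=3,
                         app_defaults=False):
    """Return an argv list that runs one bionic benchmark through adb."""
    remote = [
        BENCH_BINARIES[bits],
        f"--benchmark_filter={bench_filter}",
        f"--benchmark_repetitions={repetitions}",
        "--benchmark_report_aggregates_only=true",
        f"--bionic_cpu={cpu}",
    ]
    if app_defaults:
        # Assumption: an inline assignment behaves like `export` for this run.
        remote.insert(0, "MALLOC_USE_APP_DEFAULTS=1")
    # Quote each argument so the device shell sees it unchanged.
    return ["adb", "shell", " ".join(shlex.quote(arg) for arg in remote)]


for bits in (64, 32):
    print(" ".join(bionic_benchmark_cmd(bits, "stdlib_malloc_free_default")))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` on the
host. Keep in mind that `MALLOC_USE_APP_DEFAULTS` is only available since API
level 36.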

#### Allocate/Free Benchmarks
These are the benchmarks to verify the allocation speed of a loop doing a
single allocation, touching every page in the allocation to make it resident
and then freeing the allocation.

To run the benchmarks with `mallopt(M_DECAY_TIME, 0)`, use these commands:

    adb shell /data/benchmarktest64/bionic-benchmarks/bionic-benchmarks --benchmark_filter=stdlib_malloc_free_default
    adb shell /data/benchmarktest/bionic-benchmarks/bionic-benchmarks --benchmark_filter=malloc_free_default

To run the benchmarks with `mallopt(M_DECAY_TIME, 1)`, use these commands:

    adb shell /data/benchmarktest64/bionic-benchmarks/bionic-benchmarks --benchmark_filter=stdlib_malloc_free_decay1
    adb shell /data/benchmarktest/bionic-benchmarks/bionic-benchmarks --benchmark_filter=malloc_free_decay1

The last value in the output is the size of the allocation in bytes. It is
useful to look at these kinds of benchmarks to make sure that there are
no outliers, but these numbers should not be used to make a final decision.
If these numbers are slightly worse than the current allocator, the
single thread numbers from trace data are a better representation of
real world situations.

#### Multiple Allocations Retained Benchmarks
These are the benchmarks that examine how the allocator handles multiple
allocations of the same size at the same time.

The first set of these benchmarks does a set number of 8192 byte allocations
in one loop, and then frees all of the allocations at the end of the loop.
Only the time it takes to do the allocations is recorded; the frees are not
counted. The value of 8192 was chosen since the jemalloc native allocator
had issues with this size. It is possible other sizes might show different
results, but, as mentioned before, these microbenchmark numbers should
not be used as absolutes for determining if an allocator is worth using.

This benchmark is designed to verify that there is no performance issue
related to having multiple allocations alive at the same time.

To run the benchmarks with `mallopt(M_DECAY_TIME, 0)`, use these commands:

    adb shell /data/benchmarktest64/bionic-benchmarks/bionic-benchmarks --benchmark_filter=stdlib_malloc_multiple_8192_allocs_default
    adb shell /data/benchmarktest/bionic-benchmarks/bionic-benchmarks --benchmark_filter=stdlib_malloc_multiple_8192_allocs_default

To run the benchmarks with `mallopt(M_DECAY_TIME, 1)`, use these commands:

    adb shell /data/benchmarktest64/bionic-benchmarks/bionic-benchmarks --benchmark_filter=stdlib_malloc_multiple_8192_allocs_decay1
    adb shell /data/benchmarktest/bionic-benchmarks/bionic-benchmarks --benchmark_filter=stdlib_malloc_multiple_8192_allocs_decay1

For these benchmarks, the last parameter is the total number of allocations to
do in each loop.

The other variation of this benchmark is to always do forty allocations in
each loop, but vary the size of the forty allocations. As with the other
benchmark, only the time it takes to do the allocations is tracked; the
frees are not counted. Forty allocations is an arbitrary number that could
be modified in the future. It was chosen because a version of the native
allocator, jemalloc, showed a problem at forty allocations.

To run the benchmarks with `mallopt(M_DECAY_TIME, 0)`, use these commands:

    adb shell /data/benchmarktest64/bionic-benchmarks/bionic-benchmarks --benchmark_filter=stdlib_malloc_forty_default
    adb shell /data/benchmarktest/bionic-benchmarks/bionic-benchmarks --benchmark_filter=stdlib_malloc_forty_default

To run the benchmarks with `mallopt(M_DECAY_TIME, 1)`, use these commands:

    adb shell /data/benchmarktest64/bionic-benchmarks/bionic-benchmarks --benchmark_filter=stdlib_malloc_forty_decay1
    adb shell /data/benchmarktest/bionic-benchmarks/bionic-benchmarks --benchmark_filter=stdlib_malloc_forty_decay1

For these benchmarks, the last parameter in the output is the size of the
allocation in bytes.

As with the other microbenchmarks, an allocator with numbers close to
the current values is usually sufficient to consider making
a switch. The trace benchmarks are more important than these benchmarks
since they simulate real world allocation profiles.
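
Checking whether two sets of microbenchmark numbers are close can be
automated. The sketch below assumes the results were captured as Google
Benchmark JSON reports (for example with `--benchmark_format=json`, assuming
`bionic-benchmarks` passes that flag through) and that aggregate entries carry
a `_median` name suffix. The function names and the 10% tolerance are
hypothetical choices, not thresholds defined by Android.

```python
import json


def median_times(report_text):
    """Map benchmark name -> median cpu_time from a JSON benchmark report."""
    report = json.loads(report_text)
    result = {}
    for entry in report.get("benchmarks", []):
        name = entry["name"]
        # Aggregate rows are assumed to be named "<benchmark>_median".
        if name.endswith("_median"):
            result[name[: -len("_median")]] = entry["cpu_time"]
    return result


def find_regressions(current, candidate, tolerance=0.10):
    """Return benchmarks where the candidate is slower by more than tolerance."""
    slower = {}
    for name, base in current.items():
        new = candidate.get(name)
        if new is not None and base > 0 and (new - base) / base > tolerance:
            slower[name] = (base, new)
    return slower


# Example with made-up numbers; real input would come from two report files.
current = {"stdlib_malloc_forty_default/8": 100.0}
candidate = {"stdlib_malloc_forty_default/8": 130.0}
print(find_regressions(current, candidate))
```

A benchmark flagged here is only a prompt to look closer; as noted above, the
trace benchmarks carry more weight than the microbenchmarks.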

#### SQL Allocation Trace Benchmark
This benchmark is a trace of the allocations performed when running
the SQLite BenchMark app.

This benchmark is designed to verify that the allocator will be performant
in a real world allocation scenario. SQL operations were chosen as a
benchmark because these operations tend to do lots of malloc/realloc/free
calls, and they tend to be on the critical path of applications.

To run the benchmarks with `mallopt(M_DECAY_TIME, 0)`, use these commands:

    adb shell /data/benchmarktest64/bionic-benchmarks/bionic-benchmarks --benchmark_filter=malloc_sql_trace_default
    adb shell /data/benchmarktest/bionic-benchmarks/bionic-benchmarks --benchmark_filter=malloc_sql_trace_default

To run the benchmarks with `mallopt(M_DECAY_TIME, 1)`, use these commands:

    adb shell /data/benchmarktest64/bionic-benchmarks/bionic-benchmarks --benchmark_filter=malloc_sql_trace_decay1
    adb shell /data/benchmarktest/bionic-benchmarks/bionic-benchmarks --benchmark_filter=malloc_sql_trace_decay1

On these benchmarks, the new allocator should be as performant as the
current allocator.

#### mallinfo Benchmark
This benchmark only verifies that mallinfo is still close to the performance
of the current allocator.

To run the benchmark, use these commands:

    adb shell /data/benchmarktest64/bionic-benchmarks/bionic-benchmarks --benchmark_filter=BM_mallinfo
    adb shell /data/benchmarktest/bionic-benchmarks/bionic-benchmarks --benchmark_filter=BM_mallinfo

Calls to mallinfo are used in ART so a new allocator is required to be
nearly as performant as the current allocator.

#### mallopt M\_PURGE Benchmark
This benchmark tracks the cost of calling `mallopt(M_PURGE, 0)`. As with the
mallinfo benchmark, it's not necessary for this to be better than the previous
allocator, only that the performance be in the same order of magnitude.

To run the benchmark, use these commands:

    adb shell /data/benchmarktest64/bionic-benchmarks/bionic-benchmarks --benchmark_filter=BM_mallopt_purge
    adb shell /data/benchmarktest/bionic-benchmarks/bionic-benchmarks --benchmark_filter=BM_mallopt_purge

These calls are used to free unused memory pages back to the kernel.

### Memory Trace Benchmarks
These benchmarks measure all three axes of a native allocator: RSS, virtual
address space consumed, and speed of allocation. They are designed to
run on a trace of the allocations from a real world application or system
process.

To build this benchmark:

    mmma -j system/extras/memory_replay

This will build two executables:

    /system/bin/memory_replay32
    /system/bin/memory_replay64

And these two benchmark executables:

    /data/benchmarktest64/trace_benchmark/trace_benchmark
    /data/benchmarktest/trace_benchmark/trace_benchmark

#### Memory Replay Benchmarks
These benchmarks display RSS, virtual memory consumed (VA space), and do a
bit of performance testing on actual traces taken from running applications.

The trace data includes what thread does each operation, so the replay
mechanism will simulate this by creating threads and replaying the operations
on a thread as if it was rerunning the real trace. The only issue is that
this is a worst case scenario for allocations happening at the same time
in all threads since it collapses all of the allocation operations to occur
one after another. This causes many threads to allocate at the same
time. The trace data does not include timestamps,
so it is not possible to create a completely accurate replay.

To generate these traces, see the [Malloc Debug documentation](https://android.googlesource.com/platform/bionic/+/main/libc/malloc_debug/README.md),
specifically the option [record\_allocs](https://android.googlesource.com/platform/bionic/+/main/libc/malloc_debug/README.md#record_allocs_total_entries).

To run these benchmarks, first copy the trace files to the target using
this command:

    adb push system/extras/memory_replay/traces /data/local/tmp

Since all of the traces come from applications, the `memory_replay` program
will always call `mallopt(M_DECAY_TIME, 1)` before running the trace.

Run the benchmark thusly:

    adb shell memory_replay64 /data/local/tmp/traces/XXX.zip
    adb shell memory_replay32 /data/local/tmp/traces/XXX.zip

Where XXX.zip is the name of a zipped trace file. The `memory_replay`
program also can process text files, but all trace files are currently
checked in as zip files.

Every 100000 allocation operations, a dump of the RSS and VA space will be
performed. At the end, a final RSS and VA space number will be printed.
For the most part, the intermediate data can be ignored, but it is always
a good idea to look over the data to verify that no strange spikes are
occurring.
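
Looking over the intermediate data for spikes can be partly automated once the
RSS or VA numbers have been copied out of the `memory_replay` output into a
list. The sketch below makes no assumption about the output format; the
function name `find_spikes` and the 1.5x factor are hypothetical and should be
tuned to the trace being examined.

```python
def find_spikes(samples, factor=1.5):
    """Return (index, value) pairs that exceed both neighbours by `factor`."""
    spikes = []
    for i, value in enumerate(samples):
        # Neighbours are the samples directly before and after, if present.
        neighbours = samples[max(i - 1, 0):i] + samples[i + 1:i + 2]
        if neighbours and value > factor * max(neighbours):
            spikes.append((i, value))
    return spikes


# Made-up RSS samples, one per intermediate dump.
rss_samples = [100, 102, 101, 400, 103]
print(find_spikes(rss_samples))
```

Comparing against neighbouring samples rather than a global average means a
steady ramp in RSS is not reported, while a short-lived jump is.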

The performance number is a measure of the time it takes to perform all of
the allocation calls (malloc/memalign/posix_memalign/realloc/free/etc).
For any call that allocates a pointer, the time for the call and the time
it takes to make the pointer completely resident in memory is included.

The performance numbers for these runs tend to have a wide variability so
they should not be used as absolute values for comparison against the
current allocator. But, they should be in the same range as the current
values.

When evaluating an allocator, one of the most important traces is the
camera.txt trace. The camera application does very large allocations,
and some allocators might leave large virtual address maps around
rather than delete them. When that happens, it can lead to allocation
failures and would cause the camera app to abort/crash. It is
important to verify that when running this trace using the 32 bit replay
executable, the virtual address space consumed is not much larger than the
current allocator. A small increase (on the order of a few MBs) would be okay.
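
The virtual address space check for the camera trace can be written down as a
simple comparison. This is only a sketch: the function name
`va_within_budget` is hypothetical, and the default budget of 4 MB is an
assumed reading of "a few MBs", not a value specified by Android.

```python
MB = 1024 * 1024


def va_within_budget(current_va_bytes, candidate_va_bytes, budget_mb=4):
    """True if the candidate uses at most budget_mb more VA than current."""
    return candidate_va_bytes - current_va_bytes <= budget_mb * MB


# Made-up final VA numbers from two 32 bit replay runs of the same trace.
print(va_within_budget(500 * MB, 503 * MB))
print(va_within_budget(500 * MB, 600 * MB))
```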

There is no specific benchmark for memory fragmentation; instead, the RSS
when running the memory traces acts as a proxy for this. An allocator that
is fragmenting badly will show an increase in RSS. The best trace for
tracking fragmentation is system\_server.txt which is an extremely long
trace (~13 million operations). The total number of live allocations goes
up and down a bit, but stays mostly the same so an allocator that fragments
badly would likely show an abnormal increase in RSS on this trace.

NOTE: When a native allocator calls mmap, it is expected that the allocator
will name the map using the call:

    prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, <PTR>, <SIZE>, "libc_malloc");

If the native allocator creates a different name, then it is necessary to
modify the file:

    system/extras/memory_replay/NativeInfo.cpp

The `GetNativeInfo` function needs to be modified to include the name
of the maps that this allocator includes.

In addition, in order for the frameworks code to keep track of the memory
of a process, any named maps must be added to the file:

    frameworks/base/core/jni/android_os_Debug.cpp

Modify the `load_maps` function and add a check of the new expected name.

#### Performance Trace Benchmarks
This is a benchmark that treats the trace data as if all allocations
occurred in a single thread. This is the scenario that could
happen if all of the allocations are spaced out in time so no thread
ever does an allocation at the same time as another thread.

Run these benchmarks thusly:

    adb shell /data/benchmarktest64/trace_benchmark/trace_benchmark
    adb shell /data/benchmarktest/trace_benchmark/trace_benchmark

When run without any arguments, the benchmark will run over all of the
traces and display data. It takes many minutes to complete these runs in
order to get as accurate a number as possible.