# Native Memory Allocator Verification
This document describes how to verify the native memory allocator on Android.
This procedure should be followed when upgrading or moving to a new allocator.
A minor upgrade might not need to run all of the benchmarks; however, at
least the
[SQL Allocation Trace Benchmark](#sql-allocation-trace-benchmark),
[Memory Replay Benchmarks](#memory-replay-benchmarks) and
[Performance Trace Benchmarks](#performance-trace-benchmarks) should be run.

It is important to note that a native allocator can run in two modes on
Android. The first is the normal allocator; the second is called the low
memory config, which is designed for memory constrained systems and is a bit
slower, but takes less RSS. To enable the low memory config, add this line to
the `BoardConfig.mk` for the given target:

    MALLOC_LOW_MEMORY := true

This is valid starting with Android V (API level 35). Before that, the
way to enable the low memory config is:

    MALLOC_SVELTE := true

The `BoardConfig.mk` file is usually found in the directory
`device/<DEVICE_NAME>/` or in a subdirectory.

When evaluating a native allocator, make sure that you benchmark both
configurations.

## Android Extensions
Android supports a few non-standard functions and mallopt controls that
a native allocator needs to implement.

### Iterator Functions
These are functions that are used to implement a memory leak detector
called `libmemunreachable`.

#### malloc\_disable
This function, when called, should pause all threads that are making a
call to an allocation function (malloc/free/etc). When a call
is made to `malloc_enable`, the paused threads should start running again.

#### malloc\_enable
This function, when called, does nothing unless there was a previous call
to `malloc_disable`. This call will unpause any thread that was blocked in
an allocation function (malloc/free/etc) because of the previous
`malloc_disable` call.

#### malloc\_iterate
This function enumerates all of the allocations currently live in the
process. It is meant to be called after a call to `malloc_disable` to
prevent further allocations while this call is being executed. The best
description of what is expected from this function is its set of tests in
`bionic/tests/malloc_iterate_test.cpp`.
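
To make the expected semantics of the three iterator functions concrete, here is a minimal sketch of a toy allocator that wraps the system `malloc`. Every name prefixed with `toy_` is hypothetical, and the callback signature is only modeled on bionic's `malloc_iterate`, so treat it as an assumption. A single mutex acts as the "disable" gate: while `toy_malloc_disable` holds it, any other thread entering `toy_malloc` or `toy_free` blocks until `toy_malloc_enable` releases it. A real allocator would more likely acquire all of its internal locks instead.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical toy allocator that records every live block in a fixed table
 * so the blocks can be enumerated. It only illustrates the semantics. */
#define TOY_MAX_LIVE 1024

static pthread_mutex_t toy_lock = PTHREAD_MUTEX_INITIALIZER;
static void* toy_live[TOY_MAX_LIVE];
static size_t toy_sizes[TOY_MAX_LIVE];

void* toy_malloc(size_t size) {
  void* p = NULL;
  pthread_mutex_lock(&toy_lock);  /* Blocks while allocation is disabled. */
  for (size_t i = 0; i < TOY_MAX_LIVE; i++) {
    if (toy_live[i] == NULL) {
      p = malloc(size > 0 ? size : 1);
      if (p != NULL) {
        toy_live[i] = p;
        toy_sizes[i] = size;
      }
      break;
    }
  }
  pthread_mutex_unlock(&toy_lock);
  return p;  /* NULL if out of memory or the table is full. */
}

void toy_free(void* p) {
  if (p == NULL) return;
  pthread_mutex_lock(&toy_lock);  /* Blocks while allocation is disabled. */
  for (size_t i = 0; i < TOY_MAX_LIVE; i++) {
    if (toy_live[i] == p) {
      toy_live[i] = NULL;
      toy_sizes[i] = 0;
      break;
    }
  }
  free(p);
  pthread_mutex_unlock(&toy_lock);
}

/* Pauses every other thread that enters toy_malloc/toy_free. With a default
 * mutex, toy_malloc_enable must be called from the same thread. */
void toy_malloc_disable(void) { pthread_mutex_lock(&toy_lock); }
void toy_malloc_enable(void) { pthread_mutex_unlock(&toy_lock); }

/* Reports every live block whose address lies in [base, base + size).
 * Must only be called between toy_malloc_disable and toy_malloc_enable. */
int toy_malloc_iterate(uintptr_t base, size_t size,
                       void (*callback)(uintptr_t ptr, size_t block_size, void* arg),
                       void* arg) {
  for (size_t i = 0; i < TOY_MAX_LIVE; i++) {
    uintptr_t addr = (uintptr_t)toy_live[i];
    if (toy_live[i] != NULL && addr >= base && addr - base < size) {
      callback(addr, toy_sizes[i], arg);
    }
  }
  return 0;
}

/* Example caller: sums the bytes of all live blocks, leak-detector style. */
static void add_size(uintptr_t ptr, size_t block_size, void* arg) {
  (void)ptr;
  *(size_t*)arg += block_size;
}

size_t toy_live_bytes(void) {
  size_t total = 0;
  toy_malloc_disable();
  toy_malloc_iterate(0, SIZE_MAX, add_size, &total);
  toy_malloc_enable();
  return total;
}
```

The caller follows the order this section describes: disable, iterate, enable. Because of that ordering, no block can appear or disappear while the callback runs.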

### Mallopt Extensions
These are mallopt options that Android requires for a native allocator
to work efficiently.

#### M\_DECAY\_TIME
When set to zero, `mallopt(M_DECAY_TIME, 0)`, it is expected that an
allocator will attempt to purge and release any unused memory back to the
kernel on free calls. This is important in Android to avoid consuming extra
RSS.

When set to non-zero, `mallopt(M_DECAY_TIME, 1)`, an allocator can delay the
purge and release action. The amount of delay is up to the allocator
implementation, but it should be a reasonable amount of time. The jemalloc
allocator was implemented to have a one second delay.

The drawback to this option is that most allocators do not have a separate
thread to handle the purge, so the decay is only handled when an
allocation operation occurs. For server processes, this can mean that
RSS is slightly higher when the server is waiting for the next connection
and no other allocation calls are made. The `M_PURGE` option is used to
force a purge in this case.

For all applications on Android, the call `mallopt(M_DECAY_TIME, 1)` is
made by default. The idea is that it allows application frees to run a
bit faster, while only increasing RSS a bit.
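
The deferred purge can be modeled in a few lines, including the drawback that decay only advances when an allocation operation runs. This is a minimal sketch and does not show how jemalloc or any real allocator tracks dirty pages. The `decay_model` type and the `model_*` functions are hypothetical. Time is passed in explicitly in milliseconds so the behavior is deterministic, and all dirty memory is released at once rather than gradually.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of M_DECAY_TIME semantics; not a real allocator. */
typedef struct {
  int decay_enabled;         /* 0 models M_DECAY_TIME 0, 1 models M_DECAY_TIME 1. */
  uint64_t decay_ms;         /* Delay picked by the allocator, e.g. 1000 ms. */
  size_t dirty_bytes;        /* Freed but not yet returned to the kernel (counts as RSS). */
  uint64_t oldest_dirty_ms;  /* Time at which the oldest dirty memory was freed. */
  size_t released_bytes;     /* Total memory returned to the kernel so far. */
} decay_model;

static void model_release_all(decay_model* m) {
  m->released_bytes += m->dirty_bytes;
  m->dirty_bytes = 0;
}

/* Decay is only processed here, i.e. when some allocation operation runs. */
static void model_maybe_decay(decay_model* m, uint64_t now_ms) {
  if (m->dirty_bytes > 0 && now_ms - m->oldest_dirty_ms >= m->decay_ms) {
    model_release_all(m);
  }
}

void model_malloc(decay_model* m, uint64_t now_ms) {
  model_maybe_decay(m, now_ms);
}

void model_free(decay_model* m, size_t bytes, uint64_t now_ms) {
  model_maybe_decay(m, now_ms);
  if (m->dirty_bytes == 0) m->oldest_dirty_ms = now_ms;
  m->dirty_bytes += bytes;
  if (!m->decay_enabled) model_release_all(m);  /* Decay time 0: purge on free. */
}

/* Models mallopt(M_PURGE, 0): release everything immediately. */
void model_purge(decay_model* m) {
  model_release_all(m);
}
```

With decay enabled, memory freed at time 0 stays resident until some allocation call happens at least `decay_ms` later. A process that makes no further calls keeps that memory indefinitely, and an explicit purge closes exactly that gap.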

#### M\_PURGE
When `mallopt(M_PURGE, 0)` is called, an allocator should purge and release
any unused memory immediately. The argument for this call is ignored. If
possible, this call should also clear thread cached memory. The idea is that
this can be called to purge memory that has not yet been purged because
`M_DECAY_TIME` is set to one. This is useful for a server application that
does a lot of native allocations and wants to purge that memory before
waiting for the next connection.
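
A server following this advice might enable decay once at startup and then purge just before it blocks waiting for the next connection. The sketch below assumes that `M_DECAY_TIME` and `M_PURGE` come from bionic's `<malloc.h>`. Both are guarded with `#if defined` so the file also compiles against C libraries that lack them, such as glibc, where the helpers simply return -1. The helper names are hypothetical.

```c
#include <assert.h>
#include <malloc.h>

/* Returns mallopt's result (normally 1 on success, 0 on failure), or -1
 * when this C library does not provide M_DECAY_TIME. */
int server_enable_decay(void) {
#if defined(M_DECAY_TIME)
  return mallopt(M_DECAY_TIME, 1);
#else
  return -1;
#endif
}

/* Intended to be called after a request is handled, just before waiting
 * for the next one. Returns -1 when M_PURGE is not available. */
int server_purge_before_idle(void) {
#if defined(M_PURGE)
  return mallopt(M_PURGE, 0);  /* The argument is ignored. */
#else
  return -1;
#endif
}
```

In a request loop, the pattern is: call `server_enable_decay()` once, then for each request handle it, call `server_purge_before_idle()`, and only then block in `accept()` or its equivalent.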

## Correctness Tests
These are the tests that should be run to verify that an allocator is
working properly on Android.

### Bionic Unit Tests
The bionic unit tests contain a small number of allocator tests. These
tests primarily verify Android extensions and non-standard behavior
of allocation routines, such as what happens when a non-power of two alignment
is passed to memalign.

To run all of the compliance tests:

    adb shell /data/nativetest64/bionic-unit-tests/bionic-unit-tests --gtest_filter="malloc*"
    adb shell /data/nativetest/bionic-unit-tests/bionic-unit-tests --gtest_filter="malloc*"

The allocation tests are not meant to be complete, so it is expected
that a native allocator will have its own set of tests that can be run.

### Libmemunreachable Tests
The libmemunreachable tests verify that the iterator functions are working
properly.

To run all of the tests:

    adb shell /data/nativetest64/memunreachable_binder_test/memunreachable_binder_test
    adb shell /data/nativetest/memunreachable_binder_test/memunreachable_binder_test
    adb shell /data/nativetest64/memunreachable_test/memunreachable_test
    adb shell /data/nativetest/memunreachable_test/memunreachable_test
    adb shell /data/nativetest64/memunreachable_unit_test/memunreachable_unit_test
    adb shell /data/nativetest/memunreachable_unit_test/memunreachable_unit_test

### CTS Entropy Test
In addition to the bionic tests, there is also a CTS test that is designed
to verify that the addresses returned by malloc are sufficiently randomized
to help defeat potential security bugs.

Run this test thusly:

    atest AslrMallocTest

If there are multiple devices connected to the system, use `-s <SERIAL>`
to specify a device.

## Performance
There are multiple ways to evaluate the performance of a native allocator on
Android. One is allocation speed in various scenarios; another is the total
RSS taken by the allocator.

The last is virtual address space consumed in 32 bit applications. There is
a limited amount of address space available in 32 bit apps, and there have
been allocator bugs that cause allocation failures when too much virtual
address space is consumed. For 64 bit executables, this can be ignored.

NOTE: The default native allocator operates differently in an application
versus command-line tools running in the shell. To run with the same
behavior as an application, follow these instructions:

    > adb shell
    # export MALLOC_USE_APP_DEFAULTS=1
    # <Run command-line benchmarks>

Running without setting this environment variable can result in different
performance and even different RSS usage for the benchmarks mentioned below.
The environment variable has only been available since API level 36.
Applications have used different native allocator defaults than command-line
tools since API level 26 (Android O).

### Bionic Benchmarks
These are the microbenchmarks that are part of the bionic benchmark suite.
These benchmarks can be built using this command:

    mmma -j bionic/benchmarks

These benchmarks are only used to verify the speed of the allocator and
ignore anything related to RSS and virtual address space consumed.

For all of these benchmark runs, it can be useful to add these two options:

    --benchmark_repetitions=XX
    --benchmark_report_aggregates_only=true

This runs the benchmark XX times and then reports only the mean, median, and
stddev, which gives a number that can be compared between the current
allocator and the new allocator.
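
When comparing the current allocator against a candidate, the median is usually the more robust of these aggregates, because a single slow repetition moves the mean but not the median. The following is a hedged illustration of that arithmetic only. The function names are hypothetical, and the inputs are assumed to be per-repetition times copied out of the benchmark output.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static int compare_doubles(const void* a, const void* b) {
  double x = *(const double*)a;
  double y = *(const double*)b;
  return (x > y) - (x < y);
}

/* Median of n samples; a sorted copy is used so the input is untouched.
 * Returns -1.0 if n is 0 or the copy cannot be allocated. */
double median_of(const double* samples, size_t n) {
  if (n == 0) return -1.0;
  double* sorted = malloc(n * sizeof(double));
  if (sorted == NULL) return -1.0;
  memcpy(sorted, samples, n * sizeof(double));
  qsort(sorted, n, sizeof(double), compare_doubles);
  double m = (n % 2 == 1) ? sorted[n / 2]
                          : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
  free(sorted);
  return m;
}

/* Percent by which the candidate's median time exceeds the current median
 * (which must be positive); a negative result means the candidate is faster. */
double percent_slower(const double* current, size_t n_current,
                      const double* candidate, size_t n_candidate) {
  double base = median_of(current, n_current);
  return (median_of(candidate, n_candidate) - base) / base * 100.0;
}
```

For example, with current times of 100, 110 and 90 ns and candidate times of 105, 104, 106 and 200 ns, the candidate comes out about 5.5% slower. The single 200 ns outlier barely affects that result.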

In addition, there is another option:

    --bionic_cpu=XX

This locks the benchmark to run only on core XX. It also avoids
any issue related to the code migrating from one core to another
with different characteristics. For example, on a big-little cpu, if the
benchmark moves from big to little or vice-versa, this can cause scores
to fluctuate in indeterminate ways.
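
Pinning a thread to one core is a standard Linux feature, and the same effect can be sketched with `sched_setaffinity`. Whether `--bionic_cpu` is implemented exactly this way is an assumption. `pin_to_cpu` and `first_allowed_cpu` are hypothetical helpers.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/* Restricts the calling thread to a single CPU. Returns 0 on success, or -1
 * with errno set (for example if the CPU does not exist or is not allowed). */
int pin_to_cpu(int cpu) {
  cpu_set_t set;
  CPU_ZERO(&set);
  CPU_SET(cpu, &set);
  return sched_setaffinity(0, sizeof(set), &set);  /* 0 = calling thread. */
}

/* Lowest-numbered CPU the calling thread is currently allowed to run on,
 * or -1 on error. */
int first_allowed_cpu(void) {
  cpu_set_t set;
  CPU_ZERO(&set);
  if (sched_getaffinity(0, sizeof(set), &set) != 0) return -1;
  for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
    if (CPU_ISSET(cpu, &set)) return cpu;
  }
  return -1;
}
```

On a device, `pin_to_cpu(3)` would restrict the thread to core 3, the same core that `--bionic_cpu=3` selects.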

For most runs, the best set of options to add is:

    --benchmark_repetitions=10 --benchmark_report_aggregates_only=true --bionic_cpu=3

On most phones with a big-little cpu, core 3 is one of the little cores.
Running on a little core tends to highlight any performance differences.

#### Allocate/Free Benchmarks
These are the benchmarks to verify the allocation speed of a loop doing a
single allocation, touching every page in the allocation to make it resident,
and then freeing the allocation.
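
The measured loop can be sketched in plain C. This is a hypothetical re-creation for illustration, not the bionic benchmark source. It allocates `size` bytes, writes one byte per page so every page becomes resident, frees the block, and reports the mean time per iteration using `CLOCK_MONOTONIC`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

static uint64_t now_ns(void) {
  struct timespec ts;
  clock_gettime(CLOCK_MONOTONIC, &ts);
  return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Mean nanoseconds per malloc + touch + free iteration, or -1.0 on failure. */
double malloc_touch_free_ns(size_t size, size_t iterations) {
  if (iterations == 0) return -1.0;
  size_t page = (size_t)sysconf(_SC_PAGESIZE);
  uint64_t start = now_ns();
  for (size_t i = 0; i < iterations; i++) {
    volatile char* p = malloc(size);
    if (p == NULL) return -1.0;
    /* One write per page is enough to fault the page in. */
    for (size_t off = 0; off < size; off += page) p[off] = 1;
    free((void*)p);
  }
  return (double)(now_ns() - start) / (double)iterations;
}
```

The `volatile` pointer keeps the compiler from discarding the page-touching writes, so the cost of making the memory resident stays in the measurement.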

To run the benchmarks with `mallopt(M_DECAY_TIME, 0)`, use these commands:

    adb shell /data/benchmarktest64/bionic-benchmarks/bionic-benchmarks --benchmark_filter=stdlib_malloc_free_default
    adb shell /data/benchmarktest/bionic-benchmarks/bionic-benchmarks --benchmark_filter=malloc_free_default

To run the benchmarks with `mallopt(M_DECAY_TIME, 1)`, use these commands:

    adb shell /data/benchmarktest64/bionic-benchmarks/bionic-benchmarks --benchmark_filter=stdlib_malloc_free_decay1
    adb shell /data/benchmarktest/bionic-benchmarks/bionic-benchmarks --benchmark_filter=malloc_free_decay1

The last value in the output is the size of the allocation in bytes. It is
useful to look at these kinds of benchmarks to make sure that there are
no outliers, but these numbers should not be used to make a final decision.
If these numbers are slightly worse than those of the current allocator, the
single thread numbers from the trace data are a better representation of
real world situations.

#### Multiple Allocations Retained Benchmarks
These are the benchmarks that examine how the allocator handles multiple
allocations of the same size at the same time.

The first set of these benchmarks does a set number of 8192 byte allocations
in one loop, and then frees all of the allocations at the end of the loop.
Only the time it takes to do the allocations is recorded; the frees are not
counted. The value of 8192 was chosen since the jemalloc native allocator
had issues with this size. It is possible other sizes might show different
results, but, as mentioned before, these microbenchmark numbers should
not be used as absolutes for determining if an allocator is worth using.

This benchmark is designed to verify that there is no performance issue
related to having multiple allocations alive at the same time.
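
Here is a hypothetical sketch of the retained-allocations measurement. `count` blocks of `size` bytes are all kept alive, only the allocation phase is timed, and the frees happen after the clock stops. Calling it with `(count, 8192)` mirrors the first set of benchmarks, and `(40, size)` mirrors the forty-allocation variant. The function name and structure are assumptions, not the bionic benchmark code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

static uint64_t now_ns(void) {
  struct timespec ts;
  clock_gettime(CLOCK_MONOTONIC, &ts);
  return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Nanoseconds spent making `count` live allocations of `size` bytes, or
 * -1.0 on failure. The frees afterwards are deliberately not timed. */
double retained_allocs_ns(size_t count, size_t size) {
  void** ptrs = calloc(count, sizeof(void*));
  if (ptrs == NULL) return -1.0;
  size_t done = 0;
  uint64_t start = now_ns();
  while (done < count) {
    ptrs[done] = malloc(size);
    if (ptrs[done] == NULL) break;
    done++;
  }
  uint64_t elapsed = now_ns() - start;
  for (size_t i = 0; i < done; i++) free(ptrs[i]);  /* Not counted. */
  free(ptrs);
  return done == count ? (double)elapsed : -1.0;
}
```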

To run the benchmarks with `mallopt(M_DECAY_TIME, 0)`, use these commands:

    adb shell /data/benchmarktest64/bionic-benchmarks/bionic-benchmarks --benchmark_filter=stdlib_malloc_multiple_8192_allocs_default
    adb shell /data/benchmarktest/bionic-benchmarks/bionic-benchmarks --benchmark_filter=stdlib_malloc_multiple_8192_allocs_default

To run the benchmarks with `mallopt(M_DECAY_TIME, 1)`, use these commands:

    adb shell /data/benchmarktest64/bionic-benchmarks/bionic-benchmarks --benchmark_filter=stdlib_malloc_multiple_8192_allocs_decay1
    adb shell /data/benchmarktest/bionic-benchmarks/bionic-benchmarks --benchmark_filter=stdlib_malloc_multiple_8192_allocs_decay1

For these benchmarks, the last parameter is the total number of allocations to
do in each loop.

The other variation of this benchmark always does forty allocations in
each loop, but varies the size of the forty allocations. As with the other
benchmark, only the time it takes to do the allocations is tracked; the
frees are not counted. Forty allocations is an arbitrary number that could
be modified in the future. It was chosen because a version of the native
allocator, jemalloc, showed a problem at forty allocations.

To run the benchmarks with `mallopt(M_DECAY_TIME, 0)`, use these commands:

    adb shell /data/benchmarktest64/bionic-benchmarks/bionic-benchmarks --benchmark_filter=stdlib_malloc_forty_default
    adb shell /data/benchmarktest/bionic-benchmarks/bionic-benchmarks --benchmark_filter=stdlib_malloc_forty_default

To run the benchmarks with `mallopt(M_DECAY_TIME, 1)`, use these commands:

    adb shell /data/benchmarktest64/bionic-benchmarks/bionic-benchmarks --benchmark_filter=stdlib_malloc_forty_decay1
    adb shell /data/benchmarktest/bionic-benchmarks/bionic-benchmarks --benchmark_filter=stdlib_malloc_forty_decay1

For these benchmarks, the last parameter in the output is the size of the
allocation in bytes.

As with the other microbenchmarks, an allocator with numbers close to the
current values is usually sufficient to consider making a switch. The trace
benchmarks are more important than these benchmarks since they simulate
real world allocation profiles.

#### SQL Allocation Trace Benchmark
This benchmark is a trace of the allocations performed when running
the SQLite BenchMark app.

This benchmark is designed to verify that the allocator will be performant
in a real world allocation scenario. SQL operations were chosen as a
benchmark because these operations tend to do lots of malloc/realloc/free
calls, and they tend to be on the critical path of applications.

To run the benchmarks with `mallopt(M_DECAY_TIME, 0)`, use these commands:

    adb shell /data/benchmarktest64/bionic-benchmarks/bionic-benchmarks --benchmark_filter=malloc_sql_trace_default
    adb shell /data/benchmarktest/bionic-benchmarks/bionic-benchmarks --benchmark_filter=malloc_sql_trace_default

To run the benchmarks with `mallopt(M_DECAY_TIME, 1)`, use these commands:

    adb shell /data/benchmarktest64/bionic-benchmarks/bionic-benchmarks --benchmark_filter=malloc_sql_trace_decay1
    adb shell /data/benchmarktest/bionic-benchmarks/bionic-benchmarks --benchmark_filter=malloc_sql_trace_decay1

These numbers should be at least as good as those of the current allocator.

#### mallinfo Benchmark
This benchmark only verifies that mallinfo is still close to the performance
of the current allocator.

To run the benchmark, use these commands:

    adb shell /data/benchmarktest64/bionic-benchmarks/bionic-benchmarks --benchmark_filter=BM_mallinfo
    adb shell /data/benchmarktest/bionic-benchmarks/bionic-benchmarks --benchmark_filter=BM_mallinfo

Calls to mallinfo are used in ART, so a new allocator is required to be
nearly as performant as the current allocator.

#### mallopt M\_PURGE Benchmark
This benchmark tracks the cost of calling `mallopt(M_PURGE, 0)`. As with the
mallinfo benchmark, it's not necessary for this to be better than the previous
allocator, only that the performance be in the same order of magnitude.

To run the benchmark, use these commands:

    adb shell /data/benchmarktest64/bionic-benchmarks/bionic-benchmarks --benchmark_filter=BM_mallopt_purge
    adb shell /data/benchmarktest/bionic-benchmarks/bionic-benchmarks --benchmark_filter=BM_mallopt_purge

These calls are used to free unused memory pages back to the kernel.

### Memory Trace Benchmarks
These benchmarks measure all three axes of a native allocator: RSS, virtual
address space consumed, and speed of allocation. They are designed to
run on a trace of the allocations from a real world application or system
process.

To build this benchmark:

    mmma -j system/extras/memory_replay

This will build two executables:

    /system/bin/memory_replay32
    /system/bin/memory_replay64

And these two benchmark executables:

    /data/benchmarktest64/trace_benchmark/trace_benchmark
    /data/benchmarktest/trace_benchmark/trace_benchmark

#### Memory Replay Benchmarks
These benchmarks display RSS and virtual memory consumed (VA space), and do a
bit of performance testing on actual traces taken from running applications.

The trace data records which thread performs each operation, so the replay
mechanism simulates this by creating threads and replaying the operations
on each thread as if it were rerunning the real trace. The drawback is that
this is a worst case scenario for concurrent allocations: since all of the
allocation operations are collapsed to occur one after another, many threads
end up allocating at the same time. The trace data does not include
timestamps, so it is not possible to create a completely accurate replay.

To generate these traces, see the [Malloc Debug documentation](https://android.googlesource.com/platform/bionic/+/main/libc/malloc_debug/README.md),
specifically the option [record\_allocs](https://android.googlesource.com/platform/bionic/+/main/libc/malloc_debug/README.md#record_allocs_total_entries).

To run these benchmarks, first copy the trace files to the target using
this command:

    adb push system/extras/memory_replay/traces /data/local/tmp

Since all of the traces come from applications, the `memory_replay` program
will always call `mallopt(M_DECAY_TIME, 1)` before running the trace.

Run the benchmark thusly:

    adb shell memory_replay64 /data/local/tmp/traces/XXX.zip
    adb shell memory_replay32 /data/local/tmp/traces/XXX.zip

Here, XXX.zip is the name of a zipped trace file. The `memory_replay`
program can also process text files, but all trace files are currently
checked in as zip files.

Every 100000 allocation operations, a dump of the RSS and VA space will be
performed. At the end, a final RSS and VA space number will be printed.
For the most part, the intermediate data can be ignored, but it is always
a good idea to look over the data to verify that no strange spikes are
occurring.

The performance number is a measure of the time it takes to perform all of
the allocation calls (malloc/memalign/posix_memalign/realloc/free/etc).
For any call that allocates a pointer, both the time for the call and the
time it takes to make the pointer completely resident in memory are included.

The performance numbers for these runs tend to have a wide variability, so
they should not be used as absolute values for comparison against the
current allocator. However, they should be in the same range as the current
values.

When evaluating an allocator, one of the most important traces is the
camera.txt trace. The camera application does very large allocations,
and some allocators might leave large virtual address maps around
rather than delete them. When that happens, it can lead to allocation
failures that cause the camera app to abort or crash. It is
important to verify that when running this trace using the 32 bit replay
executable, the virtual address space consumed is not much larger than with
the current allocator. A small increase (on the order of a few MBs) would be okay.

There is no specific benchmark for memory fragmentation; instead, the RSS
when running the memory traces acts as a proxy for it. An allocator that
is fragmenting badly will show an increase in RSS. The best trace for
tracking fragmentation is system\_server.txt, which is an extremely long
trace (~13 million operations). The total number of live allocations goes
up and down a bit but stays mostly the same, so an allocator that fragments
badly would likely show an abnormal increase in RSS on this trace.

NOTE: When a native allocator calls mmap, it is expected that the allocator
will name the map using the call:

    prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, <PTR>, <SIZE>, "libc_malloc");
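
For an allocator that obtains memory directly with `mmap`, the naming step can be sketched as follows. `map_named` is a hypothetical helper. The `PR_SET_VMA` constants come from the kernel's `<linux/prctl.h>` and may be missing from older headers, and kernels without anonymous VMA naming reject the call. For both reasons, the sketch treats naming as best effort and does not fail the allocation if naming fails.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/prctl.h>

/* Maps `size` bytes of anonymous memory and tries to name the region
 * "libc_malloc" so that tools can attribute it to the allocator.
 * Returns NULL only if the mapping itself fails. */
void* map_named(size_t size) {
  void* p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED) return NULL;
#if defined(PR_SET_VMA) && defined(PR_SET_VMA_ANON_NAME)
  /* A string literal keeps the name valid for the life of the process.
   * The return value is ignored: naming is best effort. */
  prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, (unsigned long)p,
        (unsigned long)size, (unsigned long)"libc_malloc");
#endif
  return p;
}
```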

If the native allocator uses a different name, then it is necessary to
modify the file:

    system/extras/memory_replay/NativeInfo.cpp

The `GetNativeInfo` function needs to be modified to include the names
of the maps that this allocator creates.
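
The following rough sketch shows why the name matters. It attributes mappings by name, summing the virtual size of every line in `/proc/self/maps` that contains a given string such as `[anon:libc_malloc]`. It illustrates the idea only and is not the `GetNativeInfo` implementation. It also reports VA space only, since RSS requires the per-mapping fields in `/proc/self/smaps`. The function name is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Total virtual size, in bytes, of the mappings whose line in
 * /proc/self/maps contains `name`. Returns -1 if the file cannot be read. */
long long named_map_bytes(const char* name) {
  FILE* fp = fopen("/proc/self/maps", "r");
  if (fp == NULL) return -1;
  char line[8192];
  long long total = 0;
  while (fgets(line, sizeof(line), fp) != NULL) {
    unsigned long long start, end;
    /* Each line starts with "start-end" in hexadecimal. */
    if (sscanf(line, "%llx-%llx", &start, &end) != 2) continue;
    if (strstr(line, name) != NULL) total += (long long)(end - start);
  }
  fclose(fp);
  return total;
}
```

Maps created under any other name contribute nothing to a lookup for `[anon:libc_malloc]`. This is why tools that search by name have to be told about a new one.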

In addition, in order for the frameworks code to keep track of the memory
of a process, any named maps must be added to the file:

    frameworks/base/core/jni/android_os_Debug.cpp

Modify the `load_maps` function and add a check of the new expected name.

#### Performance Trace Benchmarks
This is a benchmark that treats the trace data as if all allocations
occurred in a single thread. This is the scenario that could
happen if all of the allocations are spaced out in time so no thread
ever does an allocation at the same time as another thread.

Run these benchmarks thusly:

    adb shell /data/benchmarktest64/trace_benchmark/trace_benchmark
    adb shell /data/benchmarktest/trace_benchmark/trace_benchmark
				428
				429	When run without any arguments, the benchmark will run over all of the
				430	traces and display data. It takes many minutes to complete these runs in
				431	order to get as accurate a number as possible.