# Native Memory Allocator Verification
This document describes how to verify the native memory allocator on Android.
This procedure should be followed when upgrading or moving to a new allocator.
A minor upgrade might not need to run all of the benchmarks. However,
at least the
[SQL Allocation Trace Benchmark](#sql-allocation-trace-benchmark),
[Memory Replay Benchmarks](#memory-replay-benchmarks) and
[Performance Trace Benchmarks](#performance-trace-benchmarks) should be run.

It is important to note that there are two modes for a native allocator
to run in on Android. The first is the normal allocator. The second is
called the low memory config, which is designed to run on memory constrained
systems and be a bit slower, but take less RSS. To enable the low memory
config, add this line to the `BoardConfig.mk` for the given target:

    MALLOC_LOW_MEMORY := true

This is valid starting with Android V (API level 35). Before that, the
way to enable the low memory config is:

    MALLOC_SVELTE := true

The `BoardConfig.mk` file is usually found in the directory
`device/<DEVICE_NAME>/` or in a sub directory.

When evaluating a native allocator, make sure that you benchmark both
modes.

## Android Extensions
Android supports a few non-standard functions and mallopt controls that
a native allocator needs to implement.

### Iterator Functions
These are functions that are used to implement a memory leak detector
called `libmemunreachable`.

#### malloc\_disable
This function, when called, should pause all threads that are making a
call to an allocation function (malloc/free/etc). When a call
is made to `malloc_enable`, the paused threads should start running again.

#### malloc\_enable
This function, when called, does nothing unless there was a previous call
to `malloc_disable`. This call will unpause any thread that was paused
while making a call to an allocation function (malloc/free/etc) after
`malloc_disable` was called.

#### malloc\_iterate
This function enumerates all of the allocations currently live in the
system. It is meant to be called after a call to `malloc_disable` to
prevent further allocations while this call is being executed. The best
description of what is expected for this function is the set of tests
in `bionic/tests/malloc_iterate_test.cpp`.

### Mallopt Extensions
These are mallopt options that Android requires for a native allocator
to work efficiently.

#### M\_DECAY\_TIME
When set to zero, `mallopt(M_DECAY_TIME, 0)`, it is expected that an
allocator will attempt to purge and release any unused memory back to the
kernel on free calls. This is important in Android to avoid consuming extra
RSS.

When set to non-zero, `mallopt(M_DECAY_TIME, 1)`, an allocator can delay the
purge and release action. The amount of delay is up to the allocator
implementation, but it should be a reasonable amount of time. The jemalloc
allocator was implemented to have a one second delay.

The drawback to this option is that most allocators do not have a separate
thread to handle the purge, so the decay is only handled when an
allocation operation occurs. For server processes, this can mean that
RSS is slightly higher when the server is waiting for the next connection
and no other allocation calls are made. The `M_PURGE` option is used to
force a purge in this case.

For all applications on Android, the call `mallopt(M_DECAY_TIME, 1)` is
made by default. The idea is that it allows application frees to run a
bit faster, while only increasing RSS a bit.

#### M\_PURGE
When called, `mallopt(M_PURGE, 0)`, an allocator should purge and release
any unused memory immediately. The argument for this call is ignored. If
possible, this call should clear thread cached memory if it exists. The
idea is that this can be called to purge memory that has not been
purged when `M_DECAY_TIME` is set to one. This is useful if you have a
server application that does a lot of native allocations and the
application wants to purge that memory before waiting for the next connection.

## Correctness Tests
These are the tests that should be run to verify an allocator is
working properly according to Android.

### Bionic Unit Tests
The bionic unit tests contain a small number of allocator tests. These
tests are primarily verifying Android extensions and non-standard behavior
of allocation routines such as what happens when a non-power of two alignment
is passed to memalign.

To run all of the compliance tests:

    adb shell /data/nativetest64/bionic-unit-tests/bionic-unit-tests --gtest_filter="malloc*"
    adb shell /data/nativetest/bionic-unit-tests/bionic-unit-tests --gtest_filter="malloc*"

The allocation tests are not meant to be complete, so it is expected
that a native allocator will have its own set of tests that can be run.

### Libmemunreachable Tests
The libmemunreachable tests verify that the iterator functions are working
properly.

To run all of the tests:

    adb shell /data/nativetest64/memunreachable_binder_test/memunreachable_binder_test
    adb shell /data/nativetest/memunreachable_binder_test/memunreachable_binder_test
    adb shell /data/nativetest64/memunreachable_test/memunreachable_test
    adb shell /data/nativetest/memunreachable_test/memunreachable_test
    adb shell /data/nativetest64/memunreachable_unit_test/memunreachable_unit_test
    adb shell /data/nativetest/memunreachable_unit_test/memunreachable_unit_test

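The six commands above follow a regular pattern, so they can be driven from
a small host-side loop. The following is only a sketch: the function name
`run_memunreachable_tests` and the `ADB` override variable are made up for
illustration, and it assumes that `adb shell` passes the exit status of the
test binary back to the host.

```shell
# Hypothetical helper, not part of the Android tree.
# Runs every libmemunreachable test for 64 bit and 32 bit, stopping at the
# first command that fails. Set ADB=echo to print the commands instead of
# running them.
run_memunreachable_tests() {
    for name in memunreachable_binder_test memunreachable_test memunreachable_unit_test; do
        for dir in nativetest64 nativetest; do
            ${ADB:-adb} shell "/data/$dir/$name/$name" || return 1
        done
    done
}
```

Stopping at the first failure is a design choice for quick feedback. Dropping
the `|| return 1` would run all six tests regardless of failures.
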
### CTS Entropy Test
In addition to the bionic tests, there is also a CTS test that is designed
to verify that the addresses returned by malloc are sufficiently randomized
to help defeat potential security bugs.

Run this test thusly:

    atest AslrMallocTest

If there are multiple devices connected to the system, use `-s <SERIAL>`
to specify a device.

## Performance
There are three ways to evaluate the performance of a native
allocator on Android. One is allocation speed in various scenarios.
Another is total RSS taken by the allocator.

The last is virtual address space consumed in 32 bit applications. There is
a limited amount of address space available in 32 bit apps, and there have
been allocator bugs that cause memory failures when too much virtual
address space is consumed. For 64 bit executables, this can be ignored.

NOTE: The default native allocator operates differently in an application
versus command-line tools running in the shell. In order to run the same
as an application, follow these instructions:

    > adb shell
    # export MALLOC_USE_APP_DEFAULTS=1
    # <Run command-line benchmarks>

Running without setting this environment variable can result in different
performance and even different RSS usage for the benchmarks mentioned below.
The environment variable has only been available since API level 36.
Applications have used different native allocator defaults than command-line
tools since API level 26 (Android O).

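For a single command it can be convenient to set the variable inline rather
than in an interactive shell. A minimal sketch, assuming the device shell
accepts the usual `VAR=value command` prefix form; the helper name
`run_with_app_defaults` and the `ADB` override variable are hypothetical:

```shell
# Hypothetical helper, not part of the Android tree.
# Runs one command on the device with MALLOC_USE_APP_DEFAULTS=1 set only for
# that command. Set ADB=echo to print the adb invocation instead of running it.
run_with_app_defaults() {
    ${ADB:-adb} shell "MALLOC_USE_APP_DEFAULTS=1 $*"
}
```

For example, `run_with_app_defaults /data/benchmarktest64/trace_benchmark/trace_benchmark`
would run the 64 bit trace benchmark with the variable set.
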
### Bionic Benchmarks
These are the microbenchmarks that are part of the bionic benchmarks suite.
These benchmarks can be built using this command:

    mmma -j bionic/benchmarks

These benchmarks are only used to verify the speed of the allocator and
ignore anything related to RSS and virtual address space consumed.

For all of these benchmark runs, it can be useful to add these two options:

    --benchmark_repetitions=XX
    --benchmark_report_aggregates_only=true

This will run the benchmark XX times and then report the mean, median, and
stddev, which helps to get a number that can be compared to the new allocator.

In addition, there is another option:

    --bionic_cpu=XX

This option locks the benchmark to only run on core XX, which avoids
any issue related to the code migrating from one core to another
with different characteristics. For example, on a big-little cpu, if the
benchmark moves from big to little or vice-versa, this can cause scores
to fluctuate in indeterminate ways.

For most runs, the best set of options to add is:

    --benchmark_repetitions=10 --benchmark_report_aggregates_only=true --bionic_cpu=3

On most phones with a big-little cpu, the third core is the little core.
Choosing to run on the little core can tend to highlight any performance
differences.

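The recommended options can be wrapped in a small host-side helper so that
every run uses the same settings for both the 64 bit and 32 bit binaries.
This is a sketch only: `run_bionic_benchmark` and the `ADB` override variable
are hypothetical names, the binary paths are the ones used by the commands in
the sections that follow, and the default core of 3 is simply the value
suggested above.

```shell
# Hypothetical helper, not part of the Android tree.
# $1 is the --benchmark_filter value, $2 is the core to pin to (default 3).
# Set ADB=echo to print the commands instead of running them.
run_bionic_benchmark() {
    filter="$1"
    cpu="${2:-3}"
    for dir in benchmarktest64 benchmarktest; do
        ${ADB:-adb} shell "/data/$dir/bionic-benchmarks/bionic-benchmarks" \
            "--benchmark_filter=$filter" \
            --benchmark_repetitions=10 \
            --benchmark_report_aggregates_only=true \
            "--bionic_cpu=$cpu"
    done
}
```

Keeping the options in one place makes it less likely that the current and
the new allocator are measured with different settings.
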
#### Allocate/Free Benchmarks
These are the benchmarks to verify the allocation speed of a loop doing a
single allocation, touching every page in the allocation to make it resident
and then freeing the allocation.

To run the benchmarks with `mallopt(M_DECAY_TIME, 0)`, use these commands:

    adb shell /data/benchmarktest64/bionic-benchmarks/bionic-benchmarks --benchmark_filter=stdlib_malloc_free_default
    adb shell /data/benchmarktest/bionic-benchmarks/bionic-benchmarks --benchmark_filter=malloc_free_default

To run the benchmarks with `mallopt(M_DECAY_TIME, 1)`, use these commands:

    adb shell /data/benchmarktest64/bionic-benchmarks/bionic-benchmarks --benchmark_filter=stdlib_malloc_free_decay1
    adb shell /data/benchmarktest/bionic-benchmarks/bionic-benchmarks --benchmark_filter=malloc_free_decay1

The last value in the output is the size of the allocation in bytes. It is
useful to look at these kinds of benchmarks to make sure that there are
no outliers, but these numbers should not be used to make a final decision.
If these numbers are slightly worse than the current allocator, the
single thread numbers from trace data are a better representation of
real world situations.

#### Multiple Allocations Retained Benchmarks
These are the benchmarks that examine how the allocator handles multiple
allocations of the same size at the same time.

The first set of these benchmarks does a set number of 8192 byte allocations
in one loop, and then frees all of the allocations at the end of the loop.
Only the time it takes to do the allocations is recorded; the frees are not
counted. The value of 8192 was chosen since the jemalloc native allocator
had issues with this size. It is possible other sizes might show different
results, but, as mentioned before, these microbenchmark numbers should
not be used as absolutes for determining if an allocator is worth using.

This benchmark is designed to verify that there is no performance issue
related to having multiple allocations alive at the same time.

To run the benchmarks with `mallopt(M_DECAY_TIME, 0)`, use these commands:

    adb shell /data/benchmarktest64/bionic-benchmarks/bionic-benchmarks --benchmark_filter=stdlib_malloc_multiple_8192_allocs_default
    adb shell /data/benchmarktest/bionic-benchmarks/bionic-benchmarks --benchmark_filter=stdlib_malloc_multiple_8192_allocs_default

To run the benchmarks with `mallopt(M_DECAY_TIME, 1)`, use these commands:

    adb shell /data/benchmarktest64/bionic-benchmarks/bionic-benchmarks --benchmark_filter=stdlib_malloc_multiple_8192_allocs_decay1
    adb shell /data/benchmarktest/bionic-benchmarks/bionic-benchmarks --benchmark_filter=stdlib_malloc_multiple_8192_allocs_decay1

For these benchmarks, the last parameter is the total number of allocations to
do in each loop.

The other variation of this benchmark is to always do forty allocations in
each loop, but vary the size of the forty allocations. As with the other
benchmark, only the time it takes to do the allocations is tracked; the
frees are not counted. Forty allocations is an arbitrary number that could
be modified in the future. It was chosen because a version of the native
allocator, jemalloc, showed a problem at forty allocations.

To run the benchmarks with `mallopt(M_DECAY_TIME, 0)`, use these commands:

    adb shell /data/benchmarktest64/bionic-benchmarks/bionic-benchmarks --benchmark_filter=stdlib_malloc_forty_default
    adb shell /data/benchmarktest/bionic-benchmarks/bionic-benchmarks --benchmark_filter=stdlib_malloc_forty_default

To run the benchmarks with `mallopt(M_DECAY_TIME, 1)`, use these commands:

    adb shell /data/benchmarktest64/bionic-benchmarks/bionic-benchmarks --benchmark_filter=stdlib_malloc_forty_decay1
    adb shell /data/benchmarktest/bionic-benchmarks/bionic-benchmarks --benchmark_filter=stdlib_malloc_forty_decay1

For these benchmarks, the last parameter in the output is the size of the
allocation in bytes.

As with the other microbenchmarks, an allocator with numbers close to
the current values is usually sufficient to consider making
a switch. The trace benchmarks are more important than these benchmarks
since they simulate real world allocation profiles.

#### SQL Allocation Trace Benchmark
This benchmark is a trace of the allocations performed when running
the SQLite BenchMark app.

This benchmark is designed to verify that the allocator will be performant
in a real world allocation scenario. SQL operations were chosen as a
benchmark because these operations tend to do lots of malloc/realloc/free
calls, and they tend to be on the critical path of applications.

To run the benchmarks with `mallopt(M_DECAY_TIME, 0)`, use these commands:

    adb shell /data/benchmarktest64/bionic-benchmarks/bionic-benchmarks --benchmark_filter=malloc_sql_trace_default
    adb shell /data/benchmarktest/bionic-benchmarks/bionic-benchmarks --benchmark_filter=malloc_sql_trace_default

To run the benchmarks with `mallopt(M_DECAY_TIME, 1)`, use these commands:

    adb shell /data/benchmarktest64/bionic-benchmarks/bionic-benchmarks --benchmark_filter=malloc_sql_trace_decay1
    adb shell /data/benchmarktest/bionic-benchmarks/bionic-benchmarks --benchmark_filter=malloc_sql_trace_decay1

The performance on this benchmark should match that of the current allocator.

#### mallinfo Benchmark
This benchmark only verifies that mallinfo is still close to the performance
of the current allocator.

To run the benchmark, use these commands:

    adb shell /data/benchmarktest64/bionic-benchmarks/bionic-benchmarks --benchmark_filter=BM_mallinfo
    adb shell /data/benchmarktest/bionic-benchmarks/bionic-benchmarks --benchmark_filter=BM_mallinfo

Calls to mallinfo are used in ART so a new allocator is required to be
nearly as performant as the current allocator.

#### mallopt M\_PURGE Benchmark
This benchmark tracks the cost of calling `mallopt(M_PURGE, 0)`. As with the
mallinfo benchmark, it's not necessary for this to be better than the previous
allocator, only that the performance be in the same order of magnitude.

To run the benchmark, use these commands:

    adb shell /data/benchmarktest64/bionic-benchmarks/bionic-benchmarks --benchmark_filter=BM_mallopt_purge
    adb shell /data/benchmarktest/bionic-benchmarks/bionic-benchmarks --benchmark_filter=BM_mallopt_purge

These calls are used to free unused memory pages back to the kernel.

### Memory Trace Benchmarks
These benchmarks measure all three axes of a native allocator: RSS, virtual
address space consumed, and speed of allocation. They are designed to
run on a trace of the allocations from a real world application or system
process.

To build this benchmark:

    mmma -j system/extras/memory_replay

This will build two executables:

    /system/bin/memory_replay32
    /system/bin/memory_replay64

And these two benchmark executables:

    /data/benchmarktest64/trace_benchmark/trace_benchmark
    /data/benchmarktest/trace_benchmark/trace_benchmark

#### Memory Replay Benchmarks
These benchmarks display RSS, virtual memory consumed (VA space), and do a
bit of performance testing on actual traces taken from running applications.

The trace data includes what thread does each operation, so the replay
mechanism will simulate this by creating threads and replaying the operations
on a thread as if it was rerunning the real trace. The only issue is that
this is a worst case scenario for allocations happening at the same time
in all threads since it collapses all of the allocation operations to occur
one after another. This causes many threads to allocate at the same
time. The trace data does not include timestamps,
so it is not possible to create a completely accurate replay.

To generate these traces, see the [Malloc Debug documentation](https://android.googlesource.com/platform/bionic/+/main/libc/malloc_debug/README.md),
the option [record\_allocs](https://android.googlesource.com/platform/bionic/+/main/libc/malloc_debug/README.md#record_allocs_total_entries).

To run these benchmarks, first copy the trace files to the target using
this command:

    adb push system/extras/memory_replay/traces /data/local/tmp

Since all of the traces come from applications, the `memory_replay` program
will always call `mallopt(M_DECAY_TIME, 1)` before running the trace.

Run the benchmark thusly:

    adb shell memory_replay64 /data/local/tmp/traces/XXX.zip
    adb shell memory_replay32 /data/local/tmp/traces/XXX.zip

Where XXX.zip is the name of a zipped trace file. The `memory_replay`
program also can process text files, but all trace files are currently
checked in as zip files.

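To replay every trace rather than a single file, the two commands can be
wrapped in a loop. A sketch, assuming the host directory passed in (by default
`system/extras/memory_replay/traces`, relative to the top of the source tree)
holds the same zip files that were pushed to `/data/local/tmp/traces`; the
function name `replay_all_traces` and the `ADB` override variable are
hypothetical:

```shell
# Hypothetical helper, not part of the Android tree.
# $1 is the host directory holding the zipped traces.
# Set ADB=echo to print the commands instead of running them.
replay_all_traces() {
    trace_dir="${1:-system/extras/memory_replay/traces}"
    for trace in "$trace_dir"/*.zip; do
        # Skip the unexpanded pattern when the directory has no zip files.
        [ -e "$trace" ] || continue
        name="$(basename "$trace")"
        for replay in memory_replay64 memory_replay32; do
            # Print a header so the output of each run can be told apart.
            echo "== $replay $name =="
            ${ADB:-adb} shell "$replay" "/data/local/tmp/traces/$name"
        done
    done
}
```

The header line printed before each run is only there to make it easier to
match RSS and VA space numbers to a trace when the output is saved to a file.
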
Every 100000 allocation operations, a dump of the RSS and VA space will be
performed. At the end, a final RSS and VA space number will be printed.
For the most part, the intermediate data can be ignored, but it is always
a good idea to look over the data to verify that no strange spikes are
occurring.

The performance number is a measure of the time it takes to perform all of
the allocation calls (malloc/memalign/posix_memalign/realloc/free/etc).
For any call that allocates a pointer, the time for the call and the time
it takes to make the pointer completely resident in memory is included.

The performance numbers for these runs tend to have a wide variability so
they should not be used as absolute values for comparison against the
current allocator. But, they should be in the same range as the current
values.

When evaluating an allocator, one of the most important traces is the
camera.txt trace. The camera application does very large allocations,
and some allocators might leave large virtual address maps around
rather than delete them. When that happens, it can lead to allocation
failures and would cause the camera app to abort/crash. It is
important to verify that when running this trace using the 32 bit replay
executable, the virtual address space consumed is not much larger than the
current allocator. A small increase (on the order of a few MBs) would be okay.

There is no specific benchmark for memory fragmentation. Instead, the RSS
when running the memory traces acts as a proxy for this. An allocator that
is fragmenting badly will show an increase in RSS. The best trace for
tracking fragmentation is system\_server.txt which is an extremely long
trace (~13 million operations). The total number of live allocations goes
up and down a bit, but stays mostly the same so an allocator that fragments
badly would likely show an abnormal increase in RSS on this trace.

NOTE: When a native allocator calls mmap, it is expected that the allocator
will name the map using the call:

    prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, <PTR>, <SIZE>, "libc_malloc");

If the native allocator creates a different name, then it is necessary to
modify the file:

    system/extras/memory_replay/NativeInfo.cpp

The `GetNativeInfo` function needs to be modified to include the name
of the maps that this allocator includes.

In addition, in order for the frameworks code to keep track of the memory
of a process, any named maps must be added to the file:

    frameworks/base/core/jni/android_os_Debug.cpp

Modify the `load_maps` function and add a check of the new expected name.

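One way to check which name an allocator gives its maps is to look at the
`/proc/<PID>/maps` file of a process, where named anonymous maps are listed
as `[anon:<NAME>]`. The helper below is a sketch with a hypothetical name,
`count_named_maps`. It counts the lines on standard input whose map name
starts with the given string.

```shell
# Hypothetical helper, not part of the Android tree.
# Reads a maps listing on stdin and prints how many lines contain an
# anonymous map whose name starts with $1.
count_named_maps() {
    # -F treats the pattern as a fixed string, so "[" is matched literally.
    # grep exits non-zero when the count is 0, hence the "|| true".
    grep -cF "[anon:$1" || true
}
```

A possible use is `adb shell cat /proc/<PID>/maps | count_named_maps libc_malloc`.
Reading the maps of another process may require a rooted device. A count of
zero for a process that is known to allocate suggests the allocator is using
a different name.
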
#### Performance Trace Benchmarks
This is a benchmark that treats the trace data as if all allocations
occurred in a single thread. This is the scenario that could
happen if all of the allocations are spaced out in time so no thread
ever does an allocation at the same time as another thread.

Run these benchmarks thusly:

    adb shell /data/benchmarktest64/trace_benchmark/trace_benchmark
    adb shell /data/benchmarktest/trace_benchmark/trace_benchmark

When run without any arguments, the benchmark will run over all of the
traces and display data. It takes many minutes to complete these runs in
order to get as accurate a number as possible.