armlint examines AArch64 machine code to find suboptimal instruction
sequences. For example, building the constant 0x66666666 as

    movz w0, #0x6666
    movk w0, #0x6666, lsl #16

is two instructions where one would do, because 0x66666666 is encodable
as an AArch64 logical (bitmask) immediate:

    mov w0, #0x66666666 ; orr w0, wzr, #0x66666666

armlint helps compiler writers and assembly authors generate tighter code, and documents corners of the A64 instruction set.
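
Whether a 32-bit constant is encodable as a bitmask immediate can be tested without enumerating encodings. The following is a minimal sketch, not armlint's own code; the function names are made up for illustration. It relies on the architectural rule that a logical immediate is an element of 2, 4, 8, 16, or 32 bits, replicated across the register, where the element is a rotated run of contiguous 1 bits that is neither all zeros nor all ones.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if x is one unbroken run of 1 bits, e.g. 0b00111000.
 * Hypothetical helper, named for this sketch only. */
static bool is_contiguous_run(uint32_t x)
{
    if (x == 0) {
        return false;
    }
    uint32_t filled = x | (x - 1);        /* set every bit below the run */
    return (filled & (filled + 1)) == 0;  /* result must be 2^k - 1 */
}

/* True if value can be encoded as a 32-bit A64 logical immediate. */
bool is_logical_imm32(uint32_t value)
{
    /* All-zeros and all-ones have no bitmask encoding. */
    if (value == 0 || value == UINT32_MAX) {
        return false;
    }

    /* Shrink to the smallest element whose repetition forms value. */
    unsigned size = 32;
    while (size > 2) {
        unsigned half = size / 2;
        uint32_t half_mask = (1u << half) - 1u;
        uint32_t lo = value & half_mask;
        uint32_t hi = (value >> half) & half_mask;
        if (lo != hi) {
            break;
        }
        size = half;
    }

    uint32_t mask = (size == 32) ? UINT32_MAX : ((1u << size) - 1u);
    uint32_t element = value & mask;

    /* A rotated run of ones either does not wrap (the ones are
     * contiguous) or wraps (then the zeros are contiguous). */
    return is_contiguous_run(element) ||
           is_contiguous_run(~element & mask);
}
```

With this sketch, is_logical_imm32(0x66666666) is true (the element is the 4-bit pattern 0110), while a constant such as 0x12345678 is rejected and still needs a MOVZ/MOVK pair.
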
armlint is a peephole analyzer. It decodes each 32-bit A64 instruction
directly from the binary and matches it by mask and value, resolving
aliases (for example MUL is MADD with a zero accumulator) so that
both spellings of a pattern are caught. It then looks for a short window
of adjacent instructions that a shorter or cheaper encoding can replace.
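
Mask-and-value matching with alias resolution can be sketched as follows. The type and function names here (a64_pattern, a64_matches, a64_field, a64_is_mul) are hypothetical and are not taken from armlint; the MADD mask and value come from the A64 encoding, where bits 30:21 and bit 15 are fixed and MUL is the case whose accumulator field Ra names the zero register.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical pattern type for illustration; not armlint's own API. */
typedef struct {
    uint32_t mask;   /* bits that must match */
    uint32_t value;  /* required value of those bits */
} a64_pattern;

/* MADD Rd, Rn, Rm, Ra: bits 30:21 = 0011011000 and bit 15 (o0) = 0.
 * Bit 31 (sf) is left out of the mask so W and X forms both match. */
static const a64_pattern A64_MADD = { 0x7FE08000u, 0x1B000000u };

/* True if the fixed bits of insn agree with the pattern. */
static bool a64_matches(uint32_t insn, a64_pattern p)
{
    return (insn & p.mask) == p.value;
}

/* Extract a bit field of the given width (1..31) starting at lsb. */
static unsigned a64_field(uint32_t insn, unsigned lsb, unsigned width)
{
    return (insn >> lsb) & ((1u << width) - 1u);
}

/* MUL is the alias of MADD whose Ra field (bits 14:10) is 31 (XZR/WZR). */
static bool a64_is_mul(uint32_t insn)
{
    return a64_matches(insn, A64_MADD) && a64_field(insn, 10, 5) == 31;
}
```

For example, 0x9B027C20 (mul x0, x1, x2) and 0x9B020C20 (madd x0, x1, x2, x3) both match A64_MADD, but only the first satisfies a64_is_mul, so a check written against MADD sees both spellings.
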
The overriding rule is soundness: armlint emits a finding only when the rewrite provably preserves the architectural result. For a tool that suggests code changes, a false positive is the worst failure, so it errs toward false negatives -- a missed opportunity is cheaper than a wrong one. Each check documents the exact conditions under which its rewrite is equivalent; the constraints below are the ones they share.
- Strict adjacency. A producer and its consumer must be consecutive; an unrelated instruction between them suppresses the finding. armlint does not reorder code or look through intervening instructions.
- Liveness is proved structurally, not analyzed. A producer-into-consumer fold fires only when the consumer overwrites the producer's destination register, proving the intermediate value is dead. There is no general-purpose register liveness pass.
- MOV-chain folds assume the constant is dead. Folds that absorb a materialized constant -- MUL/MNEG/UDIV by a constant, MOV + ADD/AND/ORR/EOR, MOV #0 -- report a saving that holds only if the constant register feeds nothing else, which armlint cannot confirm without a liveness pass. The consumer rewrite itself stays valid regardless.
- Flag liveness uses a bounded forward scan. The branch- and flag-folding checks drop a CMP/TST only after confirming that no later instruction reads N/C/V before they are overwritten, scanning the fall-through path for a limited window. The branch-target path is never followed, so a finding near a taken edge is suppressed rather than risked.
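
Two of the constraints above, the structural dead-value proof and the bounded flag scan, can be sketched in a few lines. This is an illustration under stated assumptions rather than armlint's implementation: insn_info is a hypothetical pre-decoded summary of one instruction, and the choice to stop at any branch is one conservative reading of "the branch-target path is never followed".

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical decoded-instruction summary; not armlint's real type. */
typedef struct {
    int  dest;          /* register number written, or -1 if none */
    int  src;           /* register number read as the folded operand, or -1 */
    bool reads_flags;   /* consumes NZCV (e.g. B.cond, CSEL, ADC) */
    bool writes_flags;  /* overwrites NZCV (e.g. CMP, ADDS, TST) */
    bool is_branch;     /* any control transfer */
} insn_info;

/* Structural liveness: the consumer reads the producer's result and then
 * overwrites the same register, so the intermediate value cannot be
 * observed afterward. The two instructions are assumed to be adjacent. */
bool intermediate_is_dead(const insn_info *producer, const insn_info *consumer)
{
    return producer->dest >= 0 &&
           consumer->src == producer->dest &&
           consumer->dest == producer->dest;
}

/* Bounded forward scan: starting at insns[start], look at up to `window`
 * instructions on the fall-through path. Returns true only if the flags
 * are provably overwritten before anything reads them. */
bool flags_dead_after(const insn_info *insns, size_t count,
                      size_t start, size_t window)
{
    for (size_t i = start; i < count && i - start < window; i++) {
        if (insns[i].reads_flags) {
            return false;  /* flags are still live */
        }
        if (insns[i].writes_flags) {
            return true;   /* overwritten before any read */
        }
        if (insns[i].is_branch) {
            return false;  /* would leave the fall-through path: give up */
        }
    }
    return false;  /* ran out of window or code: assume live */
}
```

Both functions answer "unknown" with false, which matches the bias toward false negatives: an inconclusive scan suppresses the finding instead of risking an unsound rewrite.
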
Findings are opportunities, not guaranteed speedups: some -- the pre- and post-indexed addressing folds -- are code-size and front-end wins that are backend-neutral. Each check's notes say what its rewrite actually saves.
Each row links to its full description -- mechanics, soundness, and what the rewrite saves -- in analyses.md.
armlint depends on Capstone and uses
pkg-config to locate it. On macOS:

    brew install capstone

On Debian/Ubuntu:

    apt install libcapstone-dev pkg-config

Build:

    git clone https://github.com/gaul/armlint.git armlint
    cd armlint
    make all

Two test suites are available. make test runs the unit tests against
fabricated byte sequences, exercising the check registry directly.
make integration-test runs the snapshot suite under fixtures/:
each .s is assembled with clang -arch arm64 and armlint's
output is diffed against a checked-in .expected file. The
integration suite covers the Mach-O parser and the report formatting,
which the unit tests bypass; it skips cleanly on hosts without an
arm64 toolchain. After an intentional output change, regenerate the
snapshots with make integration-test-regen and review the diff
before committing.
armlint is intended to be used from compiler test suites, which should
#include "armlint.h" and link against libarmlint.a. Disassemble the
just-emitted machine code with check_instructions; its return value is
the number of opportunities found, which a test can assert is zero:

    #include "armlint.h"  // also includes <capstone/capstone.h>

    // code/code_len: the AArch64 bytes to check (e.g. a function the
    // compiler just emitted); base_addr is the address they load at.
    // Returns the opportunity count (0 == clean), or -1 on a decode error.
    int lint(const uint8_t *code, size_t code_len, uint64_t base_addr)
    {
        csh handle;
        if (cs_open(CS_ARCH_ARM64, CS_MODE_ARM, &handle) != CS_ERR_OK) {
            return -1;
        }
        cs_option(handle, CS_OPT_DETAIL, CS_OPT_ON);

        armlint_summary *summary = armlint_summary_create();
        int findings = check_instructions(
            handle, code, code_len, base_addr, /*verbose=*/true, summary);
        armlint_summary_print(summary);  // optional by-type tally
        armlint_summary_destroy(summary);
        cs_close(&handle);
        return findings;
    }

The summary is optional -- pass NULL to skip the by-type tally --
and verbose controls whether each opportunity is printed as it is
found. armlint can also read arbitrary AArch64 binaries (ELF, thin
Mach-O, or universal/fat Mach-O) directly:

    ./armlint /path/to/aarch64/binary
    ./armlint /bin/ls

By default armlint prints only a summary: the opportunities grouped by type and sorted by prevalence, so it is clear which to look at first, followed by a total and the number of instructions scanned. A large binary can have hundreds of thousands of opportunities, so the per-opportunity detail is suppressed unless requested:

    $ ./armlint /bin/ls
    Optimization opportunities by type:
    39 ADD + LDR foldable to pre-indexed LDR
    36 ADD + LDR foldable to immediate-offset LDR
    1 adjacent STRs foldable into STP
    76 optimization opportunities in 4153 instructions

Pass -v to also print each opportunity -- its one-line summary plus
the offending instructions, as shown below -- ahead of the summary:

    $ ./armlint -v /bin/ls
    ADD + LDR foldable to immediate-offset LDR at offset: 0x60: -> ldr w8, [x8, #0x2c] (2 instructions)
    add x8, x8, #0x2c
    ldr w8, [x8]
    ...

The process exits non-zero when any opportunity is found, so armlint can gate a compiler test suite.
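
The ordering of the default summary (one row per type, most prevalent first, then a total) can be reproduced with a small sort. This is only a sketch of that presentation logic: tally_entry, by_count_desc, and sort_tally are hypothetical names, and breaking ties alphabetically is an assumption, not documented armlint behavior.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-type tally row; not armlint's summary type. */
typedef struct {
    const char *description;  /* e.g. "adjacent STRs foldable into STP" */
    unsigned count;           /* number of findings of this type */
} tally_entry;

/* qsort comparator: larger counts first, ties broken alphabetically. */
static int by_count_desc(const void *a, const void *b)
{
    const tally_entry *x = a;
    const tally_entry *y = b;
    if (x->count != y->count) {
        return (x->count < y->count) ? 1 : -1;
    }
    return strcmp(x->description, y->description);
}

/* Sorts the rows in place by prevalence and returns the grand total. */
unsigned sort_tally(tally_entry *entries, size_t n)
{
    unsigned total = 0;
    for (size_t i = 0; i < n; i++) {
        total += entries[i].count;
    }
    qsort(entries, n, sizeof entries[0], by_count_desc);
    return total;
}
```

Fed the three rows from the /bin/ls run above in any order, the sketch returns a total of 76 and places the pre-indexed LDR row first; a caller gating a test suite would treat a non-zero total as failure.
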
- Arm Architecture Reference Manual - A64 instruction set
- Arm Cortex-A optimization guides - per-microarchitecture tuning notes
- Capstone disassembly framework - library to parse instructions
- x86lint - x86-64 equivalent of armlint
Copyright (C) 2026 Andrew Gaul
Licensed under the Apache License, Version 2.0