Build Configuration - The Rust Performance Book

You can drastically change the performance of a Rust program without changing its code, just by changing its build configuration. There are many possible build configurations for each Rust program. The one chosen will affect several characteristics of the compiled code, such as compile times, runtime speed, memory use, binary size, debuggability, profilability, and which architectures your compiled program will run on.

Most configuration choices will improve one or more characteristics while worsening one or more others. For example, a common trade-off is to accept worse compile times in exchange for higher runtime speeds. The right choice for your program depends on your needs and the specifics of your program, and performance-related choices (which is most of them) should be validated with benchmarking.

It is worth reading this chapter carefully to understand all the build configuration choices. However, for the impatient or forgetful, cargo-wizard encapsulates this information and can help you choose an appropriate build configuration.

Note that Cargo only looks at the profile settings in the Cargo.toml file at the root of the workspace. Profile settings defined in dependencies are ignored. Therefore, these options are mostly relevant for binary crates, not library crates.

Release Builds

The single most important build configuration choice is simple but easy to overlook: make sure you are using a release build rather than a dev build when you want high performance. This is usually done by specifying the --release flag to Cargo.

Dev builds are the default. They are good for debugging, but are not optimized. They are produced if you run cargo build or cargo run. (Alternatively, running rustc without additional options also produces an unoptimized build.)

Consider the following final line of output from a cargo build run.

Finished dev [unoptimized + debuginfo] target(s) in 29.80s

This output indicates that a dev build has been produced. The compiled code will be placed in the target/debug/ directory. cargo run will run the dev build.

In comparison, release builds are much more optimized, omit debug assertions and integer overflow checks, and omit debug info. 10-100x speedups over dev builds are common! They are produced if you run cargo build --release or cargo run --release. (Alternatively, rustc has multiple options for optimized builds, such as -O and -C opt-level.) This will typically take longer than a dev build because of the additional optimizations.

Consider the following final line of output from a cargo build --release run.

Finished release [optimized] target(s) in 1m 01s

This output indicates that a release build has been produced. The compiled code will be placed in the target/release/ directory. cargo run --release will run the release build.

See the Cargo profile documentation for more details about the differences between dev builds (which use the dev profile) and release builds (which use the release profile).

The default build configuration choices used in release builds provide a good balance between the above-mentioned characteristics such as compile times, runtime speed, and binary size. But there are many possible adjustments, as the following sections explain.

Maximizing Runtime Speed

The following build configuration options are designed primarily to maximize runtime speed. Some of them may also reduce binary size.

Codegen Units

The Rust compiler splits crates into multiple codegen units to parallelize (and thus speed up) compilation. However, this might cause it to miss some potential optimizations. You may be able to improve runtime speed and reduce binary size, at the cost of increased compile times, by setting the number of units to one. Add these lines to the Cargo.toml file:

[profile.release]
codegen-units = 1

Example 1, Example 2.

Link-time Optimization

Link-time optimization (LTO) is a whole-program optimization technique that can improve runtime speed by 10-20% or more, and also reduce binary size, at the cost of worse compile times. It comes in several forms.

The first form of LTO is thin local LTO, a lightweight form of LTO. By default the compiler uses this for any build that involves a non-zero level of optimization. This includes release builds. To explicitly request this level of LTO, put these lines in the Cargo.toml file:

[profile.release]
lto = false

The second form of LTO is thin LTO, which is a little more aggressive, and likely to improve runtime speed and reduce binary size while also increasing compile times. Use lto = "thin" in Cargo.toml to enable it.

The third form of LTO is fat LTO, which is even more aggressive, and may improve performance and reduce binary size further while increasing build times again. Use lto = "fat" in Cargo.toml to enable it.

Finally, it is possible to fully disable LTO, which will likely worsen runtime speed and increase binary size but reduce compile times. Use lto = "off" in Cargo.toml for this. Note that this is different to the lto = false option, which, as mentioned above, leaves thin local LTO enabled.
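
For reference, all of these settings live in the same place. The following Cargo.toml sketch simply gathers the values mentioned above; pick exactly one and benchmark the result:

[profile.release]
# Choose exactly one of the following:
# lto = false   # thin local LTO, the default for optimized builds
# lto = "thin"  # thin LTO
lto = "fat"     # fat LTO
# lto = "off"   # no LTO at all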

Alternative Allocators

It is possible to replace the default (system) heap allocator used by a Rust program with an alternative allocator. The exact effect will depend on the individual program and the alternative allocator chosen, but large improvements in runtime speed and large reductions in memory usage have been seen in practice. The effect will also vary across platforms, because each platform’s system allocator has its own strengths and weaknesses. The use of an alternative allocator is also likely to increase binary size and compile times.

jemalloc

One popular alternative allocator for Linux and Mac is jemalloc, usable via the tikv-jemallocator crate. To use it, add a dependency to your Cargo.toml file:

[dependencies]
tikv-jemallocator = "0.5"

Then add the following to your Rust code, e.g. at the top of src/main.rs:

#[global_allocator]
static GLOBAL: tikv_jemallocator::Jemalloc = tikv_jemallocator::Jemalloc;

Furthermore, on Linux, jemalloc can be configured to use transparent huge pages (THP). This can further speed up programs, possibly at the cost of higher memory usage.

Do this by setting the MALLOC_CONF environment variable appropriately before building your program, for example:

MALLOC_CONF="thp:always,metadata_thp:always" cargo build --release

The system running the compiled program also has to be configured to support THP. See this blog post for more details.

mimalloc

Another alternative allocator that works on many platforms is mimalloc, usable via the mimalloc crate. To use it, add a dependency to your Cargo.toml file:

[dependencies]
mimalloc = "0.1"

Then add the following to your Rust code, e.g. at the top of src/main.rs:

#[global_allocator]
static GLOBAL: mimalloc::MiMalloc = mimalloc::MiMalloc;

CPU Specific Instructions

If you do not care about the compatibility of your binary on older (or other types of) processors, you can tell the compiler to generate the newest (and potentially fastest) instructions specific to a certain CPU architecture, such as AVX SIMD instructions for x86-64 CPUs.

To request these instructions from the command line, use the -C target-cpu=native flag. For example:

RUSTFLAGS="-C target-cpu=native" cargo build --release

Alternatively, to request these instructions from a config.toml file (for one or more projects), add these lines:

[build]
rustflags = ["-C", "target-cpu=native"]

This can improve runtime speed, especially if the compiler finds vectorization opportunities in your code.

If you are unsure whether -C target-cpu=native is working optimally, compare the output of rustc --print cfg and rustc --print cfg -C target-cpu=native to see if the CPU features are being detected correctly in the latter case. If not, you can use -C target-feature to target specific features.
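
If you try that, the comparison might look something like the sketch below. The grep filter and the avx2/fma feature names are only illustrative; the features you see, and the ones worth enabling, depend on your CPU and your code:

# Compare the target features enabled by default vs. with target-cpu=native.
rustc --print cfg | grep target_feature
rustc --print cfg -C target-cpu=native | grep target_feature

# If a feature you expect is missing, request it explicitly.
RUSTFLAGS="-C target-feature=+avx2,+fma" cargo build --release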

Profile-guided Optimization

Profile-guided optimization (PGO) is a compilation model where you compile your program, run it on sample data while collecting profiling data, and then use that profiling data to guide a second compilation of the program. This can improve runtime speed by 10% or more. Example 1, Example 2.

It is an advanced technique that takes some effort to set up, but is worthwhile in some cases. See the rustc PGO documentation for details. Also, the cargo-pgo command makes it easier to use PGO (and BOLT, which is similar) to optimize Rust binaries.
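
For concreteness, here is a sketch of the flag-based workflow described in the rustc PGO documentation. The program name, the sample inputs, and the /tmp/pgo-data directory are placeholders; llvm-profdata must match the LLVM version used by rustc (the llvm-tools-preview rustup component provides a suitable copy):

# Step 1: build an instrumented binary that records profiling data when run.
RUSTFLAGS="-Cprofile-generate=/tmp/pgo-data" cargo build --release

# Step 2: run it on representative workloads to produce .profraw files.
./target/release/myprogram sample-input-1
./target/release/myprogram sample-input-2

# Step 3: merge the raw profiles into a single .profdata file.
llvm-profdata merge -o /tmp/pgo-data/merged.profdata /tmp/pgo-data

# Step 4: rebuild, using the merged profile to guide optimizations.
RUSTFLAGS="-Cprofile-use=/tmp/pgo-data/merged.profdata" cargo build --release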

Unfortunately, PGO is not supported for binaries hosted on crates.io and distributed via cargo install, which limits its usability.

Minimizing Binary Size

The following build configuration options are designed primarily to minimize binary size. Their effects on runtime speed vary.

Optimization Level

You can request an optimization level that aims to minimize binary size by adding these lines to the Cargo.toml file:

[profile.release]
opt-level = "z"

This may also reduce runtime speed.

An alternative is opt-level = "s", which targets minimal binary size a little less aggressively. Compared to opt-level = "z", it allows slightly more inlining and also the vectorization of loops.
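
If you want to try that variant, it goes in the same place in the Cargo.toml file:

[profile.release]
opt-level = "s"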

Abort on panic!

If you do not need to unwind on panic, e.g. because your program doesn’t use catch_unwind, you can tell the compiler to simply abort on panic. On panic, your program will still produce a backtrace.

This might reduce binary size and increase runtime speed slightly, and may even reduce compile times slightly. Add these lines to the Cargo.toml file:

[profile.release]
panic = "abort"

Strip Debug Info and Symbols

You can tell the compiler to strip debug info and symbols from the compiled binary. Add these lines to Cargo.toml to strip just debug info:

[profile.release]
strip = "debuginfo"

Alternatively, use strip = "symbols" to strip both debug info and symbols.

Prior to Rust 1.77, the default behaviour was to do no stripping. As of Rust 1.77 the default behaviour is to strip debug info in release builds.

Stripping debug info can greatly reduce binary size. On Linux, the binary size of a small Rust program might shrink by 4x when debug info is stripped. Stripping symbols can also reduce binary size, though generally not by as much. Example. The exact effects are platform-dependent.

However, stripping makes your compiled program more difficult to debug and profile. For example, if a stripped program panics, the backtrace produced may contain less useful information than normal. The exact effects for the two levels of stripping depend on the platform.

Other Ideas

For more advanced binary size minimization techniques, consult the comprehensive documentation in the excellent min-sized-rust repository.

Minimizing Compile Times

The following build configuration options are designed primarily to minimize compile times.

Linking

A big part of compile time is actually linking time, particularly when rebuilding a program after a small change. It is possible to select a faster linker than the default one.

One option is lld, which is available on Linux and Windows. To specify lld from the command line, use the -C link-arg=-fuse-ld=lld flag. For example:

RUSTFLAGS="-C link-arg=-fuse-ld=lld" cargo build --release

Alternatively, to specify lld from a config.toml file (for one or more projects), add these lines:

[build]
rustflags = ["-C", "link-arg=-fuse-ld=lld"]

lld is not fully supported for use with Rust, but it should work for most use cases on Linux and Windows. There is a GitHub Issue tracking full support for lld.

Another option is mold, which is currently available on Linux and macOS. Simply substitute mold for lld in the instructions above. mold is often faster than lld. Example. It is also much newer and may not work in all cases.
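
For example, the command-line version of the lld instructions above, with mold substituted in, would look like this (assuming mold is installed and your linker driver accepts -fuse-ld=mold):

RUSTFLAGS="-C link-arg=-fuse-ld=mold" cargo build --release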

Unlike the other options in this chapter, there are no trade-offs here! Alternative linkers can be dramatically faster, without any downsides.

Experimental Parallel Front-end

If you use nightly Rust, you can enable the experimental parallel front-end. It may reduce compile times at the cost of higher compile-time memory usage. It won’t affect the quality of the generated code.

You can do that by adding -Zthreads=N to RUSTFLAGS, for example:

RUSTFLAGS="-Zthreads=8" cargo build --release

Alternatively, to enable the parallel front-end from a config.toml file (for one or more projects), add these lines:

[build]
rustflags = ["-Z", "threads=8"]

Values other than 8 are possible, but that is the number that tends to give the best results.

In the best cases, the experimental parallel front-end reduces compile times by up to 50%. But the effects vary widely and depend on the characteristics of the code and its build configuration, and for some programs there is no compile time improvement.

Cranelift Codegen Back-end

If you use nightly Rust on x86-64/Linux or ARM/Linux, you can enable the Cranelift codegen back-end. It may reduce compile times at the cost of lower quality generated code, and therefore is recommended for dev builds rather than release builds.

First, install the back-end with this rustup command:

rustup component add rustc-codegen-cranelift-preview --toolchain nightly

To select Cranelift from the command line, use the -Zcodegen-backend=cranelift flag. For example:

RUSTFLAGS="-Zcodegen-backend=cranelift" cargo +nightly build

Alternatively, to specify Cranelift from a config.toml file (for one or more projects), add these lines:

[unstable]
codegen-backend = true

[profile.dev]
codegen-backend = "cranelift"

For more information, see the Cranelift documentation.

Custom profiles

In addition to the dev and release profiles, Cargo supports custom profiles. It might be useful, for example, to create a custom profile halfway between dev and release if you find the runtime speed of dev builds insufficient and the compile times of release builds too slow for everyday development.
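
As a sketch of what such a profile might look like (the name fast-dev and the opt-level value here are arbitrary choices), a custom profile inherits from an existing profile in Cargo.toml and overrides a few settings:

[profile.fast-dev]
inherits = "dev"
opt-level = 2

It is then selected with the --profile flag, and its output goes into a directory named after the profile:

cargo build --profile fast-dev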

Summary

There are many choices to be made when it comes to build configurations. The following points summarize the above information into some recommendations.

  • If you want to maximize runtime speed, consider all of the following: codegen-units = 1, lto = "fat", an alternative allocator, and panic = "abort".
  • If you want to minimize binary size, consider opt-level = "z", codegen-units = 1, lto = "fat", panic = "abort", and strip = "symbols".
  • In either case, consider -C target-cpu=native if broad architecture support is not needed, and cargo-pgo if it works with your distribution mechanism.
  • Always use a faster linker if you are on a platform that supports it, because there are no downsides to doing so.
  • Use cargo-wizard if you need additional help with these choices.
  • Benchmark all changes, one at a time, to ensure they have the expected effects.

Finally, this issue tracks the evolution of the Rust compiler’s own build configuration. The Rust compiler’s build system is stranger and more complex than that of most Rust programs. Nonetheless, this issue may be instructive in showing how build configuration choices can be applied to a large program.
