The scrypt parameters (2024)

The recommended scrypt parameters in the Go docs were recently brought up for discussion given they haven't changed since 2009.

Even if at this point I memorized the three numbers (N=16384, r=8, p=1) I only have a vague understanding of their meaning, so I took some time to read the scrypt paper.

It's an enjoyable and witty read, even if mathy at times, with lots of future predictions and reality modeling. Really drives across how scrypt is a fine piece of engineering. Also, it's single column with numbered pages, which earns it 100 points in my book.

The definitions are nested, each building on top of the previous one. In this post I summed up how each parameter impacts the whole scrypt algorithm. Finally, I had a look at what parameters you should use in 2017.

𝑟

𝑟 is the second parameter, but we start with it because it's used by the deepest nested function, BlockMix.

BlockMix turns a hash function with 𝑘-bit long inputs and outputs into a hash function with 2𝑟𝑘-bit long inputs and outputs. That is, it makes the core hash function in scrypt 2𝑟 wider.

It does that by iterating the hash function 2𝑟 times, so both memory usage (to store the hash values) and CPU time scale linearly with it. That is, if 𝑟 doubles the resources double.

That's useful because scrypt applies the hash to "random" memory positions. CPUs load memory in fixed-size blocks called cache lines. If the hash block size is smaller than the cache line, all the rest of the loaded line will be wasted memory bandwidth. Also, it dilutes the memory latency cost. Percival predicted both cache line sizes and memory latencies would increase over time, so made the hash size tunable to prevent scrypt from becoming latency-bound.

I have read that 𝑟 tunes memory usage, and believed it meant it is a memory-only work factor. That is incorrect because both CPU and memory scale with 𝑟. Also, while 𝑟 acts as a work factor, it's unclear increasing it provides the same security as 𝑁 (since there is no added randomization in memory accesses, see below), so it shouldn't be used as one.

𝑁

𝑁 is the one and only work factor.

Memory and CPU usage scale linearly with 𝑁. The mixing function, ROMix, stores 𝑁 sequential hash results in RAM, to then load them in a random order and sequentially xor and hash them.

The reason 𝑁 must be a power of two is that to randomly select one of the 𝑁 memory slots at each iteration, scrypt converts the hash output to an integer and reduces it mod 𝑁. If 𝑁 is a power of two, that operation can be optimized into simple (and fast) binary masking.

Estimating scrypt memory usage

scrypt requires 𝑁 times the hash block size memory. Because of BlockMix, the hash block size is 2𝑟 the underlying hash output size. In scrypt, that hash is the Salsa20 core, which operates on 64-bytes blocks.

So the minimum memory requirement of scrypt is:

𝑁 × 2𝑟 × 64 = 128 × 𝑁 × 𝑟 bytes

For 𝑁 = 16384 and 𝑟 = 8 that would be 16 MiB. It scales linearly with 𝑁 and 𝑟, and some implementations or APIs might cause internal copying doubling the requirement.

𝑝

𝑝 is used in the outmost function, MFcrypt. It is a parallelization parameter. 𝑝 instances of the mixing function are run independently and their outputs concatenated as salt for the final PBKDF2.

𝑝 > 1 can be handled in two ways: sequentially, which does not increase memory usage but requires 𝑝 times the CPU and wall clock time; or parallelly, which requires 𝑝 times the memory and effective CPU time, but does not increase wall clock time.

So 𝑝 can be used to increase CPU time without affecting memory requirements when handled sequentially, or without affecting wall clock time when handled parallelly. However, it offers attackers the same opportunity to optimize for processing or memory.

Parameters for 2017

We apply the same methodology of the paper to pick recommended 𝑁 values for interactive logins and file encryption: the biggest power of two that will run in less than 100ms and 5s respectively on "the CPU in the author's laptop" (a 3.1 GHz Intel Core i5).

func main() {for n := uint8(14); n < 22; n++ {b := testing.Benchmark(func(b *testing.B) {for i := 0; i < b.N; i++ {scrypt.Key([]byte("password"), []byte("salt"), 1<<n, 8, 1, 32)}})t := b.T / time.Duration(b.N)fmt.Printf("N = 2^%d\t%dms\n", n, t/time.Millisecond)}}
  • interactive logins: 2^15 — 1 << 15 — 32 768 — 86ms
  • file encryption: 2^20 — 1 << 20 — 1 048 576 — 3802ms

Curiously enough, the execution time of 𝑁 = 2^20 is exactly the same as in the paper's Table 1, while the sub-100ms value went from 2^14 to 2^15.

Cache line sizes have not significantly increased since 2009, so 8 should still be optimal for 𝑟.

If we really wanted to insist that CPUs have changed in 10 years we could say that more cores are now available, and increase the 𝑝 factor. However, common implementations don't spread the load of 𝑝 and instead compute each instance sequentially. Also, many use cases involve processing multiple parallel requests, so the available cores are not idle. So it seems ok to leave 𝑝 at 1.

Final miscellaneous notes

Colin Percival seems to agree [1] [2] on the new parameters.

Since the final output of scrypt is generated by PBKDF2(HMAC‑SHA256, Password, MixingOutput, 1), even if everything about scrypt were broken, it would still be a secure KDF as long as PBKDF2 with 1 iterations is. (While scrypt uses PBKDF2, it doesn't use it for its work factor.)

Best quote from the paper:

those few organizations which have the resources and inclination to design and fabricate custom circuits for password-cracking tend to be somewhat secretive

If you like digging into a cryptography paper now and then, you might enjoy following me on Twitter.

I never truly understood what the scrypt parameters 𝑁, 𝑟 and 𝑝 meant. So I read the paper and wrote it up for you. https://t.co/BL2a0BWAWH

— Filippo Valsorda (@FiloSottile) October 4, 2017

This post features my favourite Unicode points: U+2011 NON-BREAKING HYPHEN, U+202F NARROW NO-BREAK SPACE and the Mathematical Alphanumeric Symbols.

As an expert with a comprehensive understanding of the scrypt algorithm and its parameters, let me delve into the concepts mentioned in the article and provide an in-depth analysis.

  1. N (Iteration Count):

    • Definition: N is the iteration count or work factor that significantly influences both memory and CPU usage in the scrypt algorithm.
    • Impact: Memory and CPU usage scale linearly with N. The higher the N, the more resources are required.
    • Explanation: The mixing function, ROMix, involves storing N sequential hash results in RAM, loading them in a random order, and sequentially XOR and hashing them. N must be a power of two for optimization purposes.
  2. r (BlockMix Depth):

    • Definition: r is the second parameter in scrypt, affecting the depth of the BlockMix function.
    • Impact: BlockMix turns the core hash function in scrypt 2r wider. Both memory usage and CPU time scale linearly with r.
    • Explanation: BlockMix iterates the hash function 2r times, preventing scrypt from becoming latency-bound as it applies the hash to "random" memory positions. It addresses cache line sizes and memory latencies predicted to increase over time.
  3. p (Parallelization Factor):

    • Definition: p is the parallelization factor used in the outermost function, MFcrypt. It determines how many instances of the mixing function run independently.
    • Impact: p instances of the mixing function are run independently, affecting both CPU time and memory usage.
    • Explanation: p can be handled sequentially or parallelly. Sequential handling increases CPU time but does not affect memory requirements, while parallel handling increases both memory and CPU time. It can be used to adjust CPU time without impacting memory when handled sequentially or without affecting wall clock time when handled parallelly.
  4. Choosing Parameters for 2017:

    • The methodology from the scrypt paper is applied to pick recommended N values for interactive logins and file encryption.
    • The recommended N values are chosen based on the biggest power of two that runs in a specified time on the author's laptop CPU.
    • Parameters for interactive logins: 2^15 (32,768) with an execution time of 86ms.
    • Parameters for file encryption: 2^20 (1,048,576) with an execution time of 3802ms.
  5. Miscellaneous Notes:

    • Cache line sizes are kept at 8, as they have not significantly increased since 2009.
    • The author discusses the possibility of increasing the p factor due to the availability of more cores in modern CPUs but notes that common implementations compute each instance sequentially.

In summary, the scrypt parameters N, r, and p play crucial roles in determining the security, memory usage, and computational requirements of the algorithm. The article provides a detailed exploration of these parameters and their implications in the context of scrypt's design and functionality.

The scrypt parameters (2024)
Top Articles
Latest Posts
Article information

Author: Rubie Ullrich

Last Updated:

Views: 6012

Rating: 4.1 / 5 (72 voted)

Reviews: 95% of readers found this page helpful

Author information

Name: Rubie Ullrich

Birthday: 1998-02-02

Address: 743 Stoltenberg Center, Genovevaville, NJ 59925-3119

Phone: +2202978377583

Job: Administration Engineer

Hobby: Surfing, Sailing, Listening to music, Web surfing, Kitesurfing, Geocaching, Backpacking

Introduction: My name is Rubie Ullrich, I am a enthusiastic, perfect, tender, vivacious, talented, famous, delightful person who loves writing and wants to share my knowledge and understanding with you.