NB: The benchmark conclusions are out of date, since no results have been collected for Intel’s and AMD’s latest desktop CPU offerings.
TL;DR The Intel Core i7-8700K is 20%-30% faster than the AMD Ryzen 7 2700X at Haskell compile workloads, and performs 7%-19% better in terms of performance per Euro when you buy a completely new system. The AMD Ryzen 7 2700X is 2%-7% better in terms of performance per Euro when you’re upgrading an existing DDR3 system.
This is a blog post about pitting different CPUs against each other where compiling Haskell projects is the benchmarked workload; it is not about benchmarking Haskell programs, profiling in order to improve the run-time of your Haskell program, improving GHC to lower compile times, etc.
This project started when we acquired two new machines for the office, a “desktop” and a “server” machine. Since we were busy with projects, and the desktop was mostly an upgrade to an existing machine, we decided to run some benchmarks on them before taking them into production. Not just any random benchmark though, we wanted to see how well they performed at the job at hand: compiling Haskell projects. When we picked the parts for our desktop, we had to make our decision based on benchmarks found on sites like Phoronix and https://openbenchmarking.org/, specifically benchmarks such as these compilation benchmarks. Those benchmarks are C compile time benchmarks though, and so all we could hope for is that the numbers would translate to Haskell/GHC compile times.
In the compilation benchmarks at Phoronix, AMD’s Ryzen 7 2700X seemed pretty much on par with Intel’s Core i7-8700K, so we decided to build our desktop around the 2700X, hoping that its 8 cores would give it a leg up over the i7-8700K’s 6 cores in our highly parallel test suites. As we will see in this blog post, however, for compiling Haskell projects the Intel Core i7-8700K would have been the better choice.
Contribute? Disagree?
Our benchmark script, and collected results, can all be found on the github project hosting this blog. Even better than simply running our benchmark scripts would be to collaborate in adding some Haskell compile benchmarks to https://openbenchmarking.org/.
Haskell desktop benchmarks
In your day-to-day development cycle you probably execute the following compile tasks:
1. Compile the module you’re currently working on (very often)
2. Compile your project and run the (fast) test suite (frequent); slow tests are for your CI.
3. Compile your project and all its dependencies (infrequent)
Tasks 2 and 3 are likely to benefit from CPUs with more cores, which can exploit the available parallelism, while task 1 will likely benefit from higher single-core performance. However, given the dependencies between modules and packages, the available parallelism might be limited, so on tasks 2 and 3 a CPU with fewer cores but higher single-threaded performance might still outperform a CPU with more cores but lower single-threaded performance.
Haskell test environment
All of the tests were run using GHC 8.4.4 in combination with cabal-install 2.4.1.0 which were acquired through ghcup:
$ ( mkdir -p ~/.ghcup/bin && curl https://raw.githubusercontent.com/haskell/ghcup/master/ghcup > ~/.ghcup/bin/ghcup && chmod +x ~/.ghcup/bin/ghcup) && echo "Success"
$ export PATH="$HOME/.cabal/bin:$HOME/.ghcup/bin:$PATH"
$ ghcup install 8.4.4
$ ghcup set 8.4.4
$ ghcup install-cabal
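Since all numbers below were collected with GHC 8.4.4 and cabal-install 2.4.1.0, a small guard can catch a different toolchain on PATH before a long benchmark run. This is a sketch: the require_version helper is hypothetical; only the --numeric-version flag of ghc and cabal is real.

```shell
#!/usr/bin/env bash
# Hypothetical guard: fail early if a tool's version differs from the
# one used for the benchmarks. Relies on the tool's --numeric-version flag.
require_version() {
  local tool=$1 expected=$2 actual
  actual=$("$tool" --numeric-version) || return 2
  if [[ "$actual" != "$expected" ]]; then
    echo "$tool: expected $expected, found $actual" >&2
    return 1
  fi
}

# Usage before a benchmark run:
#   require_version ghc 8.4.4 && require_version cabal 2.4.1.0
```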
The tests
To benchmark all three compile tasks, we have created the following tests.
1. Building the Clash compiler
This builds the Clash compiler and all of its dependencies, including their Haddock documentation. The Clash compiler has many dependencies, large and small, so it gives us a large range of Haskell projects with which we can exercise different levels of parallelism.
We make a checkout of a fixed commit, build it once to populate the download cache, then delete the Cabal store and dist-newstyle directory, and subsequently run:
cabal new-build clash-ghc --ghc-options="+RTS -qn8 -A32M -RTS -j{GHC_THREADS}" -j{CABAL_THREADS}
We repeat this process for different values of GHC_THREADS and CABAL_THREADS, deleting the Cabal store and dist-newstyle directories between runs. Some additional info on the flags:
- -j{GHC_THREADS}: we compile with multiple GHC threads, i.e. exploit the available compile-parallelism within a single package.
- -j{CABAL_THREADS}: we compile with multiple Cabal threads, i.e. exploit the available compile-parallelism between packages.
- +RTS -qn8 -A32M -RTS: these settings were given to us by Ben Gamari, GHC maintainer, after we discovered very poor performance at higher thread counts. The -qn8 setting limits the number of threads participating in garbage collection (GC) to 8: GC is bandwidth-bound, so having too many cores participate in GC can hurt performance, and synchronization between a large number of GC threads hurts performance as well. To give an indication: on one of the benchmarked machines, running the test with 64 GHC threads and 64 Cabal threads, the runtime went from 1742s to 377.14s with the updated RTS settings. The -A32M setting sets the allocation area to 32MB, reducing the number of collections and promotions. Benchmarking the effect of different values for these options would be a blog post on its own; given that the chosen values improved performance across the board, we kept them fixed for all variations of GHC_THREADS and CABAL_THREADS. Really, you want to check the productivity number by running GHC with +RTS -s -RTS to see whether, and by how much, RTS parameters improve compiler performance.
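As a rough aid for that check, the helper below (hypothetical, not part of our benchmark scripts) pulls the productivity percentage out of the +RTS -s summary. It assumes the summary contains a line starting with Productivity, as GHC 8.x prints it, and reports the first percentage on that line.

```shell
#!/usr/bin/env bash
# Read a "+RTS -s" summary on stdin and print the productivity figure.
# Assumed line format (GHC 8.x):
#   "  Productivity  77.2% of total user, 77.9% of total elapsed"
# Exits non-zero when no such line is found.
productivity() {
  awk '$1 == "Productivity" { print $2; found = 1; exit }
       END { exit !found }'
}

# Hypothetical usage: -s writes to stderr, so swap the streams.
#   ghc -fforce-recomp -c Foo.hs +RTS -s -qn8 -A32M -RTS 2>&1 >/dev/null | productivity
```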
We’ll be comparing the following results between the different machines:
- GHC_THREADS=1 CABAL_THREADS=1: to compare single-threaded performance, which is important for task 1.
- GHC_THREADS=N CABAL_THREADS=1: to compare multi-core performance, which is important for task 2. The Clash compiler and its dependencies vary in size and in the amount of inter-module dependencies, so these numbers represent the average multi-core performance of the CPUs.
- GHC_THREADS=X CABAL_THREADS=Y: to compare multi-core performance, which is important for task 3. These numbers represent the peak multi-core performance of the CPUs.
For N, X, and Y we report the best-performing values for each machine, listed in the Configuration column of the result tables.
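The sweep over GHC_THREADS and CABAL_THREADS can be sketched in bash as follows. This is not our actual benchmark script (that lives in the GitHub project); the values in THREAD_COUNTS and the DRY_RUN switch are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch of the "Building Clash" sweep over GHC_THREADS x CABAL_THREADS.
# Assumption: THREAD_COUNTS; the real values are in the benchmark repo.
THREAD_COUNTS=(1 2 4 8 16)

# Execute a command, or only print it when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    echo "$*"
  else
    "$@"
  fi
}

clash_sweep() {
  local g c
  for g in "${THREAD_COUNTS[@]}"; do
    for c in "${THREAD_COUNTS[@]}"; do
      # Fresh start for every run: remove the default Cabal store and dist-newstyle.
      run rm -rf "$HOME/.cabal/store" dist-newstyle
      echo "GHC_THREADS=$g CABAL_THREADS=$c" >&2
      # bash's time keyword reports the wall-clock time on stderr.
      time run cabal new-build clash-ghc \
        --ghc-options="+RTS -qn8 -A32M -RTS -j$g" -j"$c"
    done
  done
}
```

Running DRY_RUN=1 clash_sweep first is a cheap way to review the 25 build commands before committing several hours of machine time.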
2. Building the Stack executable
This builds the stack-1.9.3 executable, without haddock. It has even more dependencies than the Clash compiler, and probably holds more weight in terms of projects-haskellers-care-about. We build it once to populate the download cache, then delete the Cabal store and subsequently:
- Edit the global ~/.cabal/config to set: ghc-options: +RTS -qn8 -A32M -RTS -j{GHC_THREADS}
- Run: cabal new-install stack-1.9.3 -j{CABAL_THREADS}
We repeat this process for different values of GHC_THREADS and CABAL_THREADS, deleting the Cabal store between runs. The flags have the same meaning as in the “Building Clash” test, and we’ll be comparing the results for the same variations of GHC_THREADS and CABAL_THREADS as we do for the “Building Clash” test.
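Rewriting the ghc-options line between runs is easy to automate. The hypothetical set_ghc_threads helper below assumes the config file already contains an uncommented ghc-options: line and replaces it, keeping its indentation; all other lines are copied unchanged.

```shell
#!/usr/bin/env bash
# Hypothetical helper: point the "ghc-options:" line of a Cabal config
# file at a given GHC thread count. Lines without ghc-options: are kept.
set_ghc_threads() {
  local config=$1 ghc_threads=$2
  local opts="+RTS -qn8 -A32M -RTS -j${ghc_threads}"
  awk -v opts="$opts" '
    /^[ \t]*ghc-options:/ {
      match($0, /^[ \t]*/)                    # keep the original indentation
      print substr($0, 1, RLENGTH) "ghc-options: " opts
      next
    }
    { print }
  ' "$config" > "$config.tmp" && mv "$config.tmp" "$config"
}

# Hypothetical usage for one (GHC_THREADS, CABAL_THREADS) pair:
#   set_ghc_threads ~/.cabal/config 8
#   rm -rf ~/.cabal/store
#   cabal new-install stack-1.9.3 -j8
```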
3. Building GHC
This builds an almost-“perf” build of GHC for a specific commit, i.e. the build flavour that’s included in binary distributions. The “almost” part is that we do not build the documentation. The command that we run for the test is:
make -j{THREADS}
where we run make clean and ./configure before every run. We’ll compare results for THREADS=1 for single-core performance (task 1), and THREADS=N for multi-core performance (tasks 2 and 3).
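A single timed GHC build run might look like the sketch below. The helper names are made up; it assumes GNU date (for the %N nanoseconds format) and a GHC checkout that has already been booted, so that ./configure exists.

```shell
#!/usr/bin/env bash
# Run a command, discard its output, and print its wall-clock time in
# seconds with two decimals. Returns the command's status if it fails.
# Assumes GNU date, which supports %N (nanoseconds).
elapsed_seconds() {
  local start end
  start=$(date +%s%N)
  "$@" >/dev/null 2>&1 || return
  end=$(date +%s%N)
  awk -v ns="$((end - start))" 'BEGIN { printf "%.2f\n", ns / 1e9 }'
}

# Hypothetical GHC build run: make clean and ./configure, then time make -j.
ghc_build_bench() {
  local threads=$1
  make clean >/dev/null 2>&1 || true
  ./configure >/dev/null 2>&1 || return
  elapsed_seconds make -j"$threads"
}
```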
4. GHC Testsuite
This runs the fast test suite of GHC. We start with the above-mentioned checkout of the GHC compiler, run make maintainer-clean to clear ALL the build artifacts, then run ./validate --build-only to build a version of GHC that will execute the test suite, and finally run:
THREADS={NUMTHREADS} ./validate --no-clean --testsuite-only
Although the script iterates over multiple values of NUMTHREADS, for this blog post we’ll just be looking at THREADS=N, i.e. we only compare multi-core performance.
5. Clash Testsuite
The Clash integration tests convert Haskell to HDL, and then run an HDL simulator to check whether the generated HDL is correct. Because setting up these simulators can be a pain, for this benchmark we only run the convert-to-HDL part. The command that we run is:
cabal new-run -- clash-testsuite -p clash -j{THREADS}
Although the script iterates over multiple values of THREADS, for this blog post we’ll just be looking at -jN, i.e. we only compare multi-core performance.
Systems
We had several systems at our disposal for this benchmark. We’ll classify them under “desktop” and “server” given their intended use.
Desktops
AMD Ryzen 7 2700X
This is the desktop that we acquired as an upgrade; it has the following specs:
- CPU: Ryzen 2700X (physical cores: 8)
- Motherboard: ASRock X470 Master SLI
- Memory: Corsair CMK32GX4M2B3000C15
- SSD: Samsung 970 Evo 1TB
Note that we actually ordered the above machine with some G.Skill Fortis F4-2400C15Q-64GFT memory, but for this performance shootout we’ll be comparing it using the faster Corsair CMK32GX4M2B3000C15 memory. We’ll discuss the effect of faster memory in a different section.
We configured this machine as follows:
- Memory settings: 32 GB (2x16GB) DDR4-3000 16-17-17-35
- OS: Ubuntu 18.04.1 LTS
- uname -vr: 4.15.0-39-generic #42-Ubuntu SMP Tue Oct 23 15:48:01 UTC 2018
- CPU power governor: performance
The “CPU power governor” is the value set using:
cpupower frequency-set -g {GOVERNOR}
The performance governor ensures that the Linux kernel picks operating frequencies such that the CPU can perform at its very best (at the cost of power efficiency).
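To double-check that the governor setting took effect on every core, one can read it back from sysfs. A sketch: the check_governor helper is hypothetical, and the sysfs root is a parameter so the check can also be exercised on a fake directory tree.

```shell
#!/usr/bin/env bash
# Check that every CPU uses the expected cpufreq governor.
# Returns 0 if all match, 1 if some CPU differs, 2 if no cpufreq info exists.
check_governor() {
  local expected=${1:-performance}
  local root=${2:-/sys/devices/system/cpu}
  local f gov status=0
  for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
    # An unmatched glob stays literal, so the file will not exist.
    [[ -e "$f" ]] || { echo "no cpufreq info under $root" >&2; return 2; }
    gov=$(<"$f")
    if [[ "$gov" != "$expected" ]]; then
      echo "${f%/cpufreq/*}: $gov" >&2   # report the offending CPU
      status=1
    fi
  done
  return "$status"
}

# Usage: sudo cpupower frequency-set -g performance && check_governor performance
```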
Intel Core i7-8700K
One of our clients, Myrtle.ai, graciously allowed us to use one of their desktops to run this benchmark. It’s roughly equal to the machine we would’ve picked as the counterpart to the above Ryzen 7 2700X machine. It has the following specifications:
- CPU: Core i7-8700K (physical cores: 6)
- Motherboard: Asus PRIME Z370-P II
- Memory: 4x Corsair CM4X16GC3000C15K4
- SSD: Samsung 970 EVO 1TB
And is configured as follows:
- Memory settings: 64 GB (4x16GB) DDR4-3000 15-17-17-35
- OS: Ubuntu 18.04.1 LTS
- uname -vr: 4.15.0-39-generic #42-Ubuntu SMP Tue Oct 23 15:48:01 UTC 2018
- CPU power governor: performance
Intel Core i7-7700K
This is one of our own machines again. We used the RAM from this machine in the Ryzen 7 2700X machine for the purposes of this benchmark.
- CPU: Core i7-7700K (physical cores: 4)
- Motherboard: Asus Prime Z270-A
- Memory: Corsair CMK32GX4M2B3000C15
- SSD: Samsung 960 Pro 512GB + Samsung 960 EVO 250GB
It’s configured as follows, using a vendor overclock setting all cores to run at 4.8GHz.
- Overclock: all cores 4.8GHz
- Memory settings: 32 GB (2x16GB) DDR4-3000 16-17-17-35
- OS: Ubuntu 18.04.1 LTS
- uname -vr: 4.15.0-42-generic #45-Ubuntu SMP Thu Nov 15 19:32:57 UTC 2018
- CPU power governor: performance
Servers
We are including two server type machines as well, mostly to see how much more parallelism is available in the GHC and Clash test suites. For the Clash, Stack, and GHC compile benchmarks, they are under-utilized; i.e. to make a better comparison we should be looking at compiles-per-day where the server machines are configured to execute multiple compiles in parallel. Perhaps something for a follow-up blog post.
AMD Threadripper 2990wx
Our new build server:
- CPU: Threadripper 2990wx (physical cores: 32)
- Motherboard: ASRock X399 Taichi
- Memory: 8x Samsung M391A2K43BB1-CRC
- SSD: Samsung 970 Pro 1TB
Which for the purposes of this benchmark was configured as follows:
- Memory settings: 128 GB (8x16GB) DDR4-2666 18-19-19-43 ECC
- OS: Ubuntu 18.04.1 LTS
- uname -vr: 4.15.0-42-generic #45-Ubuntu SMP Thu Nov 15 19:32:57 UTC 2018
- CPU power governor: performance
Intel Xeon Gold 6140M
One of our clients, Myrtle.ai, graciously allowed us to use one of their beefy servers to run this benchmark.
- CPU: 2x Xeon Gold 6140M (physical cores: 2x 18)
- Motherboard: Intel S2600STB
- Memory: 16x Kingston KSM26RS4/16HAI
Which for the purposes of this benchmark was configured as follows:
- Memory settings: 256 GB (16x16GB) DDR4-2666 19-19-19-32 ECC
- OS: Ubuntu 18.04.1 LTS
- uname -vr: 4.15.0-36-generic #39-Ubuntu SMP Mon Sep 24 16:19:09 UTC 2018
- CPU power governor: performance
Shootout
Single core performance
We start by comparing single-core performance.
Building Clash
Machine | Time (s) | Compiles / Day | vs #1 | vs N-1 |
---|---|---|---|---|
Intel Core i7-7700K@4.8GHz | 674.83 | 128 | - | - |
Intel Core i7-8700K | 683.09 | 126 | 1.01x slower | 1.01x slower |
AMD Ryzen 7 2700X | 876.63 | 99 | 1.30x slower | 1.28x slower |
AMD Threadripper 2990wx | 949.85 | 91 | 1.41x slower | 1.08x slower |
2x Intel Xeon Gold 6140M | 952.82 | 91 | 1.41x slower | 1.00x slower |
Building Stack
Machine | Time(s) | Compiles / Day | vs #1 | vs N-1 |
---|---|---|---|---|
Intel Core i7-8700K | 1008.95 | 86 | - | - |
Intel Core i7-7700K@4.8GHz | 1030.55 | 84 | 1.02x slower | 1.02x slower |
AMD Ryzen 7 2700X | 1314.9 | 66 | 1.30x slower | 1.28x slower |
2x Intel Xeon Gold 6140M | 1406.78 | 61 | 1.39x slower | 1.07x slower |
AMD Threadripper 2990wx | 1443.92 | 60 | 1.43x slower | 1.03x slower |
Here we see that the Intel desktop CPUs are 30% faster than the AMD desktop CPU, and 40% faster than the server machines.
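The derived columns in these tables follow directly from the measured times: Compiles / Day is 86400 divided by the time in seconds, “vs #1” is the time divided by the fastest time, and “vs N-1” is the time divided by the time in the row above. A small awk-based sketch (derive_columns is a hypothetical helper) that recomputes them:

```shell
#!/usr/bin/env bash
# Recompute "Compiles / Day", "vs #1" and "vs N-1" from measured times.
# Input: "machine<TAB>seconds" lines, sorted fastest first.
derive_columns() {
  awk -F'\t' '
    NR == 1 { first = $2 }
    {
      vs1  = (NR == 1) ? "-" : sprintf("%.2fx slower", $2 / first)
      vsn1 = (NR == 1) ? "-" : sprintf("%.2fx slower", $2 / prev)
      # A day has 86400 seconds.
      printf "%s | %.2f | %.0f | %s | %s |\n", $1, $2, 86400 / $2, vs1, vsn1
      prev = $2
    }'
}
```

Fed the first three single-core “Building Stack” times, it reproduces those rows of the table (with times printed to two decimals).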
Average single-project multi-core performance
Next we compare multi-core performance for the “average” Haskell project.
Building Clash
Machine | Time (s) | Compiles / Day | vs #1 | vs N-1 | Configuration |
---|---|---|---|---|---|
Intel Core i7-7700K@4.8GHz | 499.87 | 173 | - | - | GHC_THREADS=8 |
Intel Core i7-8700K | 502.44 | 172 | 1.01x slower | 1.01x slower | GHC_THREADS=8 |
AMD Ryzen 7 2700X | 642.1 | 135 | 1.28x slower | 1.28x slower | GHC_THREADS=16 |
AMD Threadripper 2990wx | 719.51 | 120 | 1.44x slower | 1.12x slower | GHC_THREADS=16 |
2x Intel Xeon Gold 6140M | 723.8 | 119 | 1.45x slower | 1.01x slower | GHC_THREADS=8 |
Building Stack
Machine | Time(s) | Compiles / Day | vs #1 | vs N-1 | Configuration |
---|---|---|---|---|---|
Intel Core i7-8700K | 706.84 | 122 | - | - | GHC_THREADS=8 |
Intel Core i7-7700K@4.8GHz | 711.95 | 121 | 1.01x slower | 1.01x slower | GHC_THREADS=8 |
AMD Ryzen 7 2700X | 908.99 | 95 | 1.29x slower | 1.28x slower | GHC_THREADS=8 |
2x Intel Xeon Gold 6140M | 1023 | 84 | 1.45x slower | 1.13x slower | GHC_THREADS=8 |
AMD Threadripper 2990wx | 1036.4 | 83 | 1.47x slower | 1.01x slower | GHC_THREADS=32 |
Again we see that the Intel desktop CPUs are around 30% faster than the AMD desktop CPU. We get about a 1.4x speedup compared to single-core compiles, meaning that either there isn’t a lot of available parallelism within projects, or we are not able to exploit it. Additionally, it seems that, with the exception of the Intel Core i7-7700K, we do not consistently achieve the best single-project multi-core performance by setting the number of GHC threads equal to the number of virtual CPU cores. We can only speculate as to the reasons for this.
Peak multi-core performance
Finally we compare peak multi-core performance, i.e. we try to exercise all CPU cores as much as possible.
Building Clash
Machine | Time (s) | Compiles / Day | vs #1 | vs N-1 | Configuration |
---|---|---|---|---|---|
Intel Core i7-8700K | 289.65 | 298 | - | - | GHC_THREADS=12 CABAL_THREADS=8 |
Intel Core i7-7700K@4.8GHz | 306.53 | 282 | 1.06x slower | 1.06x slower | GHC_THREADS=4 CABAL_THREADS=4 |
2x Intel Xeon Gold 6140M | 369.72 | 234 | 1.28x slower | 1.21x slower | GHC_THREADS=8 CABAL_THREADS=72 |
AMD Ryzen 7 2700X | 372.79 | 232 | 1.29x slower | 1.01x slower | GHC_THREADS=16 CABAL_THREADS=16 |
AMD Threadripper 2990wx | 375.59 | 230 | 1.30x slower | 1.01x slower | GHC_THREADS=32 CABAL_THREADS=32 |
For building Clash, the Intel desktop CPUs are roughly 20%-30% faster than the rest. On average we get about a 2.4x speedup compared to single-core compiles, meaning that there is more inter-package parallelism available than inter-module parallelism, or that we’re better at exploiting it.
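The 2.4x figure is the mean of the per-machine ratios between the single-core and the peak multi-core Clash build times from the tables above. A sketch to recompute it (mean_speedup is a hypothetical helper):

```shell
#!/usr/bin/env bash
# Per-machine and mean speedup of the peak multi-core build over the
# single-core build. Input: "machine<TAB>single_s<TAB>multi_s" lines.
mean_speedup() {
  awk -F'\t' '
    { s = $2 / $3; sum += s; printf "%s: %.2fx\n", $1, s }
    END { if (NR > 0) printf "mean: %.2fx\n", sum / NR }'
}

# Single-core and peak multi-core "Building Clash" times, in seconds.
printf '%s\t%s\t%s\n' \
  "Intel Core i7-7700K@4.8GHz" 674.83 306.53 \
  "Intel Core i7-8700K"        683.09 289.65 \
  "AMD Ryzen 7 2700X"          876.63 372.79 \
  "AMD Threadripper 2990wx"    949.85 375.59 \
  "2x Intel Xeon Gold 6140M"   952.82 369.72 | mean_speedup
```

This prints per-machine speedups between 2.20x (i7-7700K) and 2.58x (Xeon), followed by mean: 2.40x.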
Building Stack
Machine | Time(s) | Compiles / Day | vs #1 | vs N-1 | Configuration |
---|---|---|---|---|---|
Intel Core i7-8700K | 289.42 | 298 | - | - | GHC_THREADS=4 CABAL_THREADS=8 |
2x Intel Xeon Gold 6140M | 315.74 | 273 | 1.09x slower | 1.09x slower | GHC_THREADS=8 CABAL_THREADS=18 |
AMD Threadripper 2990wx | 329.23 | 262 | 1.14x slower | 1.04x slower | GHC_THREADS=32 CABAL_THREADS=8 |
Intel Core i7-7700K@4.8GHz | 342.92 | 251 | 1.18x slower | 1.04x slower | GHC_THREADS=4 CABAL_THREADS=8 |
AMD Ryzen 7 2700X | 360.02 | 239 | 1.24x slower | 1.05x slower | GHC_THREADS=16 CABAL_THREADS=8 |
For building the Stack executable we get about a 4x speedup compared to the single-core compiles. The difference in improvement compared to building the Clash compiler could either be caused by:
- The lack of haddock generation in the Stack build, where haddock generation is highly sequential.
- Having even more inter-package parallelism at our disposal.
So while the Intel Core i7-8700K is 24% faster than the AMD Ryzen 7 2700X, Ryzen was able to close the gap with the Intel Core i7-7700K.
Building GHC
Machine | Time(s) | Compiles / Day | vs #1 | vs N-1 | Configuration |
---|---|---|---|---|---|
Intel Core i7-8700K | 1205.29 | 72 | - | - | THREADS=8 |
Intel Core i7-7700K@4.8GHz | 1305.27 | 66 | 1.08x slower | 1.08x slower | THREADS=8 |
2x Intel Xeon Gold 6140M | 1328.3 | 65 | 1.10x slower | 1.02x slower | THREADS=72 |
AMD Threadripper 2990wx | 1382.93 | 62 | 1.15x slower | 1.04x slower | THREADS=64 |
AMD Ryzen 7 2700X | 1572.71 | 55 | 1.30x slower | 1.14x slower | THREADS=16 |
Also for building the GHC compiler, the Intel desktop CPUs perform better than the competition.
GHC Testsuite
Machine | Time(s) | Runs / Day | vs #1 | vs N-1 | Configuration |
---|---|---|---|---|---|
2x Intel Xeon Gold 6140M | 106.44 | 812 | - | - | THREADS=72 |
AMD Threadripper 2990wx | 159.48 | 542 | 1.50x slower | 1.50x slower | THREADS=64 |
Intel Core i7-8700K | 265.16 | 326 | 2.49x slower | 1.66x slower | THREADS=12 |
AMD Ryzen 7 2700X | 293.69 | 294 | 2.76x slower | 1.11x slower | THREADS=16 |
Intel Core i7-7700K@4.8GHz | 343.06 | 252 | 3.22x slower | 1.17x slower | THREADS=8 |
It’s when we start running the highly parallel test suites that we finally see the benefit of the high core counts of our servers, with the beefy Intel server taking the lead. While both Intel desktop CPUs took top spots in nearly all of the other benchmarks, the Intel Core i7-7700K’s 4 physical cores lose out against the AMD Ryzen 7 2700X’s 8 physical cores. However, despite having two fewer cores, the Intel Core i7-8700K is still 11% faster than the AMD Ryzen 7 2700X.
Clash Testsuite
Machine | Time(s) | Runs / Day | vs #1 | vs N-1 | Configuration |
---|---|---|---|---|---|
2x Intel Xeon Gold 6140M | 45.63 | 1893 | - | - | THREADS=72 |
AMD Threadripper 2990wx | 64.84 | 1333 | 1.42x slower | 1.42x slower | THREADS=32 |
Intel Core i7-8700K | 134.27 | 643 | 2.94x slower | 2.07x slower | THREADS=8 |
AMD Ryzen 7 2700X | 157.87 | 547 | 3.46x slower | 1.18x slower | THREADS=16 |
Intel Core i7-7700K@4.8GHz | 177.77 | 486 | 3.90x slower | 1.13x slower | THREADS=8 |
We get similar results for the highly parallel Clash test suite, with the Intel Core i7-7700K coming in last, but the Intel Core i7-8700K still being 18% faster than the AMD Ryzen 7 2700X.
Effect of faster RAM
When picking parts for a new desktop, we always wondered whether faster RAM would have a significant impact. So we swapped the DDR4-2400 RAM from our AMD Ryzen 7 2700X desktop with the DDR4-3000 RAM from our Intel Core i7-7700K desktop, and observed the following differences.
Intel Core i7-7700K@4.8GHz
Across the board, the Intel Core i7-7700K barely benefits from the faster RAM: it is only 0%-4% faster with DDR4-3000.
Building Clash
Memory | Time(s) | Compiles / Day | vs other | Configuration |
---|---|---|---|---|
2x 16GB DDR4-3000 16-17-17-35 | 306.53 | 282 | - | GHC_THREADS=4 CABAL_THREADS=4 |
2x 16GB DDR4-2400 15-15-15-39 | 306.88 | 282 | - | GHC_THREADS=4 CABAL_THREADS=4 |
Building Stack
Memory | Time(s) | Compiles / Day | vs other | Configuration |
---|---|---|---|---|
2x 16GB DDR4-3000 16-17-17-35 | 342.92 | 252 | - | GHC_THREADS=4 CABAL_THREADS=8 |
2x 16GB DDR4-2400 15-15-15-39 | 346.59 | 249 | 1.01x slower | GHC_THREADS=4 CABAL_THREADS=8 |
Building GHC
Memory | Time(s) | Compiles / Day | vs other | Configuration |
---|---|---|---|---|
2x 16GB DDR4-3000 16-17-17-35 | 1305.27 | 66 | - | THREADS=8 |
2x 16GB DDR4-2400 15-15-15-39 | 1331.31 | 65 | 1.02x slower | THREADS=8 |
GHC Testsuite
Memory | Time(s) | Runs / Day | vs other | Configuration |
---|---|---|---|---|
2x 16GB DDR4-3000 16-17-17-35 | 343.06 | 252 | - | THREADS=8 |
2x 16GB DDR4-2400 15-15-15-39 | 349.64 | 247 | 1.02x slower | THREADS=8 |
Clash Testsuite
Memory | Time(s) | Runs / Day | vs other | Configuration |
---|---|---|---|---|
2x 16GB DDR4-3000 16-17-17-35 | 177.77 | 486 | - | THREADS=8 |
2x 16GB DDR4-2400 15-15-15-39 | 184.04 | 469 | 1.04x slower | THREADS=8 |
AMD Ryzen 7 2700X
It’s quite a different story for our AMD Ryzen 7 2700X machine:
Building Clash
Memory | Time(s) | Compiles / Day | vs other | Configuration |
---|---|---|---|---|
2x 16GB DDR4-3000 16-17-17-35 | 372.79 | 232 | - | GHC_THREADS=16 CABAL_THREADS=16 |
2x 16GB DDR4-2400 15-15-15-39 | 384.68 | 225 | 1.03x slower | GHC_THREADS=16 CABAL_THREADS=16 |
Building Stack
Memory | Time(s) | Compiles / Day | vs other | Configuration |
---|---|---|---|---|
2x 16GB DDR4-3000 16-17-17-35 | 360.02 | 240 | - | GHC_THREADS=8 CABAL_THREADS=8 |
2x 16GB DDR4-2400 15-15-15-39 | 382.71 | 226 | 1.06x slower | GHC_THREADS=16 CABAL_THREADS=8 |
Building GHC
Memory | Time(s) | Compiles / Day | vs other | Configuration |
---|---|---|---|---|
2x 16GB DDR4-3000 16-17-17-35 | 1572.71 | 55 | - | THREADS=16 |
2x 16GB DDR4-2400 15-15-15-39 | 1693.69 | 51 | 1.08x slower | THREADS=16 |
GHC Testsuite
Memory | Time(s) | Runs / Day | vs other | Configuration |
---|---|---|---|---|
2x 16GB DDR4-3000 16-17-17-35 | 293.69 | 294 | - | THREADS=16 |
2x 16GB DDR4-2400 15-15-15-39 | 326.18 | 265 | 1.11x slower | THREADS=16 |
Clash Testsuite
Memory | Time(s) | Runs / Day | vs other | Configuration |
---|---|---|---|---|
2x 16GB DDR4-3000 16-17-17-35 | 157.87 | 547 | - | THREADS=16 |
2x 16GB DDR4-2400 15-15-15-39 | 171.09 | 505 | 1.08x slower | THREADS=8 |
Haskell desktop buyer’s guide
So let’s say you’re in a similar situation to ours: you need to get a new Haskell desktop. What do you get?
Costs (on 12-Dec-2018)
First we check the costs of the Intel option and the AMD option. Note that for the component/price selection I went for:
- A respectable vendor
- The “cheapest” component that had decent reviews, from brands that haven’t failed me (your experience may differ!).
So there might be cheaper options, but at what cost?
Also, the prices listed are basically only valid at the time of collection: December 12th 2018. And, being from the Netherlands, we are ineligible for cashback/discounts available to e.g. those that live in the US.
Upgrade only
Let’s say you have an existing case and video card, and your previous machine used DDR3 memory: what are the costs for your upgrade path? We picked DDR4-3000 for both the AMD and the Intel option because we saw that the Ryzen 7 2700X definitely benefits from faster RAM; we use DDR4-3000 for our Core i7-8700K as well because that’s what our benchmarked i7-8700K machine had. Also, the difference in price compared to e.g. DDR4-2400 is worth it in terms of the performance improvement.
In the original calculations I forgot to add a CPU cooler to the costs of the Intel upgrade path; that has now been fixed.
| Option | Configuration | Price | Price vs other |
|---|---|---|---|
| AMD | CPU: AMD Ryzen 7 2700X | | |
| | Motherboard: ASRock B450M Pro4 | | |
| | Memory: Corsair CMK16GX4M2B3000C15 | | |
| | Total | €548,75 | - |
| Intel | CPU: Intel Core i7-8700K | | |
| | Motherboard: MSI Z370-A PRO | | |
| | Memory: Corsair CMK16GX4M2B3000C15 | | |
| | CPU cooler: Cooler Master Hyper 212 Evo | | |
| | Total | €728,35 | 33% more expensive |
Whereas AMD allows memory overclocking (DDR4-3000) on its mid-range B450 motherboard chipset, Intel only supports memory overclocking on its higher-end Z370/Z390 chipsets. Combined with the higher price of the i7-8700K itself and the separate CPU cooler, the more expensive motherboard makes the Intel option 33% more expensive than the AMD option.
Complete system
A requirement that we set for the full system is that it should be able to drive a 4K@60Hz monitor, whether through HDMI or DisplayPort, and that it is silent.
- Case: Cooler Master Silencio 452; used in the benchmarked i7-8700K machine; inaudible in a quiet office environment.
- PSU: Seasonic Focus 450 Gold; unlike some other brands, Seasonic’s single 12V-rail PSUs have never failed me.
- CPU cooler (Intel only): Cooler Master Hyper 212 Evo; used in the benchmarked i7-8700K machine; inaudible in a quiet office environment.
- Video card (AMD only): Gigabyte GeForce GT 1030 Silent Low Profile 2G; brand never failed me.
- SSD: WD Black NVMe SSD 1TB; has good reviews, slightly cheaper than the Samsung 970 EVO 1TB.
| Option | Configuration | Price | Price vs other |
|---|---|---|---|
| AMD | CPU: AMD Ryzen 7 2700X | | |
| | Motherboard: ASRock B450M Pro4 | | |
| | Memory: Corsair CMK16GX4M2B3000C15 | | |
| | Video card: Gigabyte GeForce GT 1030 Silent Low Profile 2G | | |
| | SSD: WD Black NVMe SSD 1TB | | |
| | Case: Cooler Master Silencio 452 | | |
| | PSU: Seasonic Focus 450 Gold | | |
| | Assembly | | |
| | Total | €1.066,74 | - |
| Intel | CPU: Intel Core i7-8700K | | |
| | Motherboard: MSI Z370-A PRO | | |
| | Memory: Corsair CMK16GX4M2B3000C15 | | |
| | CPU cooler: Cooler Master Hyper 212 Evo | | |
| | SSD: WD Black NVMe SSD 1TB | | |
| | Case: Cooler Master Silencio 452 | | |
| | PSU: Seasonic Focus 450 Gold | | |
| | Assembly | | |
| | Total | €1.167,80 | 9% more expensive |
The relative cost difference for the full system is smaller than for the upgrade-only path because:
- The total costs are higher for both options, which lowers the relative difference.
- The Intel Core i7-8700K has an integrated GPU that can drive the 4K@60Hz screen, whereas the AMD Ryzen 7 2700X needs a discrete GPU.
Here we see that the Intel Core i7-8700K system is only 9% more expensive compared to the AMD Ryzen 7 2700X system.
Value for money
We use compiles per year per Euro as our criterion for judging value for money, i.e. the number of compiles per year you get for every Euro spent.
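In formula form: compiles per year per Euro = (86400 × 365 / time in seconds) / price in Euro. As a sketch (per_year_per_euro is a hypothetical helper; prices must be given with a decimal point rather than the decimal comma used in the cost tables):

```shell
#!/usr/bin/env bash
# Compiles (or runs) per year per Euro, rounded to a whole number.
# $1: seconds per compile, $2: price in Euro with a decimal point (548.75).
per_year_per_euro() {
  local seconds=$1 price=$2
  awk -v t="$seconds" -v p="$price" \
    'BEGIN { printf "%.0f\n", (86400 * 365 / t) / p }'
}
```

For example, per_year_per_euro 372.79 548.75 gives 154 and per_year_per_euro 289.65 728.35 gives 149, the upgrade-path figures for building Clash.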
Building Clash
For building Clash, the AMD Ryzen 7 2700X is slightly better when taking the upgrade path, while for the full system path, the Intel Core i7-8700K is clearly better.
Upgrade
Machine | Time(s) | Compiles / Year / € | vs other | Configuration |
---|---|---|---|---|
AMD Ryzen 2700X | 372.79 | 154 | 1.03x better | GHC_THREADS=16 CABAL_THREADS=16 |
Intel Core i7-8700K | 289.65 | 149 | - | GHC_THREADS=12 CABAL_THREADS=8 |
Full system
Machine | Time(s) | Compiles / Year / € | vs other | Configuration |
---|---|---|---|---|
Intel Core i7-8700K | 289.65 | 93 | 1.18x better | GHC_THREADS=12 CABAL_THREADS=8 |
AMD Ryzen 2700X | 372.79 | 79 | - | GHC_THREADS=16 CABAL_THREADS=16 |
Building Stack
And we see similar results for building Stack.
Upgrade
Machine | Time(s) | Compiles / Year / € | vs other | Configuration |
---|---|---|---|---|
AMD Ryzen 2700X | 360.02 | 160 | 1.07x better | GHC_THREADS=16 CABAL_THREADS=8 |
Intel Core i7-8700K | 289.42 | 150 | - | GHC_THREADS=4 CABAL_THREADS=8 |
Full system
Machine | Time(s) | Compiles / Year / € | vs other | Configuration |
---|---|---|---|---|
Intel Core i7-8700K | 289.42 | 93 | 1.14x better | GHC_THREADS=4 CABAL_THREADS=8 |
AMD Ryzen 2700X | 360.02 | 82 | - | GHC_THREADS=16 CABAL_THREADS=8 |
Building GHC
And also for building GHC.
Upgrade
Machine | Time(s) | Compiles / Year / € | vs other | Configuration |
---|---|---|---|---|
AMD Ryzen 2700X | 1572.71 | 37 | 1.02x better | THREADS=16 |
Intel Core i7-8700K | 1205.29 | 36 | - | THREADS=8 |
Full system
Machine | Time(s) | Compiles / Year / € | vs other | Configuration |
---|---|---|---|---|
Intel Core i7-8700K | 1205.29 | 22 | 1.19x better | THREADS=8 |
AMD Ryzen 2700X | 1572.71 | 19 | - | THREADS=16 |
GHC Testsuite
For the GHC test suite, for the upgrade path, the AMD Ryzen 7 2700X clearly offers better value for money, while they’re on par for the full system path.
Upgrade
Machine | Time(s) | Runs / Year / € | vs other | Configuration |
---|---|---|---|---|
AMD Ryzen 2700X | 293.69 | 196 | 1.20x better | THREADS=16 |
Intel Core i7-8700K | 265.16 | 163 | - | THREADS=12 |
Full system
Machine | Time(s) | Runs / Year / € | vs other | Configuration |
---|---|---|---|---|
Intel Core i7-8700K | 265.16 | 102 | 1.01x better | THREADS=12 |
AMD Ryzen 2700X | 293.69 | 101 | - | THREADS=16 |
Clash Testsuite
For the Clash test suite, the Intel Core i7-8700K and the AMD Ryzen 7 2700X trade places between the upgrade path and the full system path. The AMD Ryzen 7 2700X gives better value for money for the upgrade path, while the Intel Core i7-8700K does better for the full system path.
Upgrade
Machine | Time(s) | Runs / Year / € | vs other | Configuration |
---|---|---|---|---|
AMD Ryzen 2700X | 157.87 | 364 | 1.13x better | THREADS=16 |
Intel Core i7-8700K | 134.27 | 322 | - | THREADS=8 |
Full system
Machine | Time(s) | Runs / Year / € | vs other | Configuration |
---|---|---|---|---|
Intel Core i7-8700K | 134.27 | 201 | 1.07x better | THREADS=8 |
AMD Ryzen 2700X | 157.87 | 187 | - | THREADS=16 |
Conclusions
We think it is safe to conclude that for building Haskell projects, the Intel Core i7-8700K is the better CPU compared to the AMD Ryzen 7 2700X, both in terms of absolute performance and, when buying a complete new system, in terms of performance per Euro. For the compile tasks, the Intel Core i7-8700K performs 24%-30% better than the AMD Ryzen 7 2700X in terms of absolute performance, and 7%-19% better in terms of performance per Euro. There can be a myriad of reasons why the relative performance of the AMD Ryzen 7 2700X vs the Intel Core i7-8700K is worse for Haskell compile workloads than it is for “the average” workload: caching strategies, cache sizes, prefetching implementation, branch-prediction implementations, memory hierarchies, core frequencies, Haskell/GHC evaluation/run-time behavior, etc. We might investigate whether the core frequency difference is dominant by artificially lowering the Core i7-8700K’s frequency; but that doesn’t change the out-of-the-box performance difference between the two parts.
In the future we plan to add some of the benchmarks from this blog post to the https://openbenchmarking.org/ suite, for multiple reasons:
- As we’ve discovered, creating and running benchmark scripts can be a pain. For ourselves, and others, making it part of a standard benchmarking suite means collecting stats will be easier and more reliable.
- Maybe we can convince review sites to include one of the benchmarks, e.g. building the Stack executable, in their collection of tests, meaning we can get early insight into how new CPUs, in different configurations, perform at a task we care about.
- On that note, it would be interesting to see how Intel’s latest desktop CPUs, the Core i7-9700K and the Core i9-9900K, perform on our workloads. This is especially interesting given that the i7-9700K has 8 physical cores vs the i7-8700K’s 6, but, unlike the i7-8700K, does not have hyperthreading, while an i7-9700K costs the same as an i7-8700K.
- To see whether AMD’s next line of CPUs can close the gap with Intel on our Haskell compilation workloads.