Sunday, November 24, 2019

Battle of the Boards: Jetson Nano vs Raspberry Pi 4 (and overclocked)

Shortly after I got my Jetson Nano up and running, the Raspberry Pi 4 came out - and, on paper, it looks like it should actually thrash the Nano for just about everything except GPU tasks.  Does it?

Actual comparisons between those two boards are hard to come by.  The internet is long on spec sheet comparisons and awfully short on real world, head to head benchmark results.  When the Pi4 came out, it had some firmware limitations that hurt performance and thermals, which have mostly been resolved by now (supposedly).  The Jetson Nano comes stock with a massive heatsink that really helps out.  So... how do things stack up in the real world?



I've got my Raspberry Pi 3B+, my Raspberry Pi 4, my Jetson Nano (with the very nice stock heatsink), and, for comparison, Clank and a few other machines.  Let's get testing!


The Pi3, Pi4, Jetson Nano, and Clank

I've got a collection of broadly similar machines here that I'm benchmarking.  They're all quad core machines, with various amounts of RAM.  And, by modern desktop standards, they're all "gutless wonders."  Despite that, I've used them all fairly extensively (except the Pi4, which just showed up).

I have a thing for gutless wonder ARM desktops.  I've talked about doing it with the Raspberry Pi 3B (results: Don't use btrfs on USB), the Raspberry Pi 3B+ (works fine, just a bit RAM limited), and the Jetson Nano (an honest desktop replacement for most tasks).

The Raspberry Pi 3B+ comes with 1GB of RAM, USB2, and a cluster of 4 ARM Cortex A-53 cores at 1.4GHz.  It also has improved thermals - the PCB uses a thicker metal base layer, and there's a heat spreader over the CPU.  Supposedly, these improve the throttling behavior under load a lot (though I typically just prefer a massive heatsink anyway).


Clank is an old Asus 1215N I wrote about as well.  This machine sports a dual core Atom D525 at 1.8GHz, but it's hyperthreaded - so I've got 4 logical cores.  It has the most RAM of anything in here, at 8GB, but the CPU shows its age, and the (integrated) GPU is barely better than a framebuffer.  However, it's a full computer with battery pack - which means it's a lot more useful, out of the box, than these little boards.


Moving to more recent ARM computers, I've got the Jetson Nano (also the subject of several posts - benchmarking, desktop use).  This sports four Cortex A-57 cores at 1.47GHz, a luxurious 4GB of RAM, and a beefy nVidia Maxwell GPU.  My Nano has been serving as the light desktop in my office, having replaced the 3B+ back when it came out.  The Nano comes with USB3 ports, which are properly good for external interfaces - plus real gigabit.


Finally, I've got a new Raspberry Pi 4 with 4GB of RAM.  This is the newest system in the test, and it comes with a set of four Cortex A-72 cores at 1.5GHz, USB3, and a new GPU architecture.  On paper, this should be a good bit faster than the Nano - but paper and reality often differ.


Interestingly, it turns out that the Pi4 can be overclocked rather substantially.  Not a few dozen MHz - you can, with a bit of extra voltage, typically get a production Pi4 up to 2GHz!  That's an extra 500MHz on a 1.5GHz base speed, or an extra 33%!  That's the sort of difference you might notice - but how does it impact benchmarks?  As it turns out, significantly!

Then, where it's useful for comparison, I'll toss in results from some newer systems.  They're just a random sampling of modern systems I have access to at some level.

Heatsinks: Yes, I Use Them

While I've done some testing on bare boards, most of the testing was done in a slightly more heatsink-heavy configuration.  The Jetson Nano looks identical to how it ships, because it's totally fine stock.  The Pi 3B+ has a rather comical heatsink arrangement I've covered before - it lets it run at 1.4GHz all day long without any issues.  And the Pi4 has a neat little aluminum heatsink on it that does a great job of keeping it cool, even when overclocked quite hard (as long as the fans are turning).


With the exception of the kernel builds (which I used to test thermals as well as performance), all the tests are done with the heatsinks on.  This means that even on the long running tests, the Pi3B+ was running at 1.4GHz, the Pi4 was running at full intended speed (either 1.5GHz or 2GHz), and the Nano was similarly running at full tilt.  Throttling simply isn't a factor in these results unless noted.

Benchmark Tests and Goals

There are a lot of benchmarks out for Raspberry Pis, and there are a lot of benchmarks out for the Jetson Nano.  Unfortunately, they tend to be in totally different spaces.  The Raspberry Pi benchmarks are usually purely synthetic things like linpack, and the Jetson Nano, as a big GPU with a few ARM cores bolted on, tends to get benchmarked on machine learning tasks and GPU compute.

I'm not really interested in either of those.  I'm interested in the sort of practical, day to day benchmarks that show how these machines work as actual end user desktops (or laptops).  For that sort of use, disk IO on random small block workloads matters a ton.  As much as it sometimes pains me, browser performance is the dominant factor for most desktops these days.  And, just to stack things together, I'll be doing some large parallel builds to test sustained throughput.  You wouldn't want to use one of these as a build cluster node, but... well, actually, maybe you would!  That's an interesting question I might answer some other day, though.

If you're doing machine learning or scientific compute on little small board computers for some reason, these results probably won't be very useful to you.  But if you're planning to use one for desktop use, you might learn a lot!  And, I'll very successfully compare the Jetson Nano to the Raspberry Pi 4.

Tests I'll be using:

  • iozone3 to test disk performance.  Command line: iozone -e -I -a -s 200M -r 4k -r 1M -r 16M -i 0 -i 1 -i 2  This tests disk performance in a variety of methods and spits out throughput results.
  • mbw for basic RAM bandwidth testing.  This uses several different techniques to copy memory (ranging from "optimized" to "really dumb") and shows throughput.
  • 7zip's benchmark mode for CPU comparison (a general performance comparison).  Output is in MIPS.
  • SunSpider 1.0, JetStream 1.1, JetStream 2, Speedometer 2.0, and MotionMark.  Some of the newest versions wouldn't run properly on the Pi3, so I chose a set that works on most of the hardware.  These were all run in Chromium on the various different devices (up to date as of testing).  Since the goal is a hardware comparison and not a browser comparison, using the same browser across platforms seemed fair.
  • A Linux kernel build for the Raspberry Pi 4 (regardless of the host platform).  This is a long running test of real world performance (and also thermal throttling).
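If you want to repeat the disk tests on your own hardware, the iozone command line above can be assembled and launched from a small script.  This is a minimal sketch in Python: the helper name iozone_command and the optional target path are my own illustrative choices, not part of iozone.  The flag meanings in the comments follow the iozone documentation.

```python
def iozone_command(file_size="200M", record_sizes=("4k", "1M", "16M"),
                   tests=(0, 1, 2), target=None):
    """Build the iozone argument list used for the disk tests in this post."""
    cmd = [
        "iozone",
        "-e",             # include flush (fsync) time in the results
        "-I",             # use direct IO, bypassing the page cache
        "-a",             # automatic mode
        "-s", file_size,  # size of the test file
    ]
    for size in record_sizes:
        cmd += ["-r", size]       # record (block) size to test
    for test in tests:
        cmd += ["-i", str(test)]  # 0 = write, 1 = read, 2 = random read/write
    if target is not None:
        cmd += ["-f", target]     # hypothetical: put the temp file on a given disk
    return cmd


print(" ".join(iozone_command()))
```

Passing the resulting list to subprocess.run() from the directory you want to test (or using the target argument) should reproduce the same run, assuming iozone is installed.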

Raspberry Pi Setup

For the Pis, I'm using a recent Buster install - the 2019-09-26 full desktop image.  I'm installing it, going through the setup process (including fully updating things), and working from there.  This is the condition most Pis will run in - a stock Raspbian install, updated on occasion.  Could I get better performance with tweaks?  Probably.  Is that representative?  Probably not.  I'm also going to stick with 32-bit kernels, even though there's a 64-bit option that might help performance slightly.  Again, the common case is the 32-bit kernel (which is fine, even with 4GB of RAM on the Pi4 - it uses LPAE to fully access that RAM space).  I'm not actually sure if all the bugs with the 64-bit kernel have been fully worked out - for a while, you couldn't use more than 3GB of RAM if you wanted USB.

The Pi3B+ is in the FLIRC case with external heatsink, so can hold 1.4GHz.  The Pi4 has a nice chunky case with fans, so will hold whatever the current max clock rate is (1.5GHz or 2GHz).  The Jetson Nano comes with a large enough heatsink that throttling isn't an issue, and Clank has more than enough cooling to stay at full clocks - so these should be just about best case tests!

For the Pi4's overclocked test, I'm using the following lines in config.txt:

over_voltage=4
arm_freq=2000

Obviously, me being able to clock a Pi4 at 2GHz is no guarantee you will be able to, but it seems like almost all of them run there with a bit of voltage thrown at them - so it's a safe enough bet.  Just use a heatsink.
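For a quick sanity check on what those two lines ask for, here's a small Python sketch that parses config.txt style text and works out the overclock relative to stock.  The function names are mine, the 1500MHz stock figure is the Pi4 clock quoted earlier, and the parser is deliberately simplistic (it ignores section headers like [pi4] and anything after a #).

```python
STOCK_PI4_MHZ = 1500  # stock Pi4 clock, as quoted in this post


def parse_config(text):
    """Return key=value settings from config.txt style text as a dict of strings."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "=" in line:
            key, value = line.split("=", 1)
            settings[key.strip()] = value.strip()
    return settings


def overclock_percent(arm_freq_mhz, stock_mhz=STOCK_PI4_MHZ):
    """Percentage increase of the requested clock over the stock clock."""
    return 100.0 * (arm_freq_mhz - stock_mhz) / stock_mhz


sample = "over_voltage=4\narm_freq=2000\n"
cfg = parse_config(sample)
print(cfg, round(overclock_percent(int(cfg["arm_freq"])), 1))
```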

SD Card Tests

My opinion of running an OS off an SD card is probably well known at this point: I think it's a stupid idea.  SD cards are designed to hold media - photos, music, or video.  This implies large blocks of data being written/read linearly, and they handle that quite well.  They don't handle heavy OS use well, and that's before they fall over dead from taking a ton of write traffic.

But, they're still used by a lot of people.  Plus, they make a decent bit of scratch space for downloads or something like that.

The Raspberry Pi 3 is limited by using one of the older, slower SD card transfer modes.  It will never see more than about 25MB/s, and in practice you're doing well to see 20MB/s.

When I tested the Jetson Nano, it was obviously using the newer, higher speed transfer modes - and the Pi4 does as well.  It's a significant improvement over the Pi3 for sustained transfers, but this doesn't help with the random IO tests, because that's limited by the card.  How does it compare to the Nano?

I'm testing linear reads/writes, and random reads/writes.  To get the range of performance, I tested with 4k blocks, 1MB blocks, and 16MB blocks.  The smaller side represents regular OS use (loading binaries, writing small files, swap), and the large side represents the limit of performance one might see when loading bulk data, copying files around, or serving data across the network.  It turns out that the 1MB and 16MB performance is nearly identical, so I'm just using the 16MB numbers for a best case, large transfer situation.

The Raspberry Pi 3B+ and Raspberry Pi 4 are using the same physical card for the tests (along with Clank).  The Nano has a similar card, but it's not the same physical card (re-imaging stuff is pretty disruptive and the Nano is actually a working computer in my office).

Right away, it's obvious that the Pi4 is able to run a lot faster than the 3 in transfer from the SD card.  But even on large block transfers, which are the best case for an SD card, the Pi 4 falls far short of what the card should be able to do.  The Nano manages the sort of read and write speeds I'd expect from a modern SD card.  Clank is clearly using the low speed mode as well, and is slightly slower than the Pi3.


But operating systems don't normally do gigantic block reads and writes (except for bulk file copying).  They typically do small reads/writes - and, for most modern OSes, it's safe to assume these are 4KB in size.  This matches the (typical) memory page size used.  ARM can use a variety of page sizes, but 4k pages match x86, and most ARM systems are set up that way as well.
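If you want to confirm the page size on a particular board rather than assume it, the running kernel will tell you.  A minimal sketch, assuming Python 3 on a Linux (or other Unix) system:

```python
import mmap
import os

# Ask the OS for its memory page size in bytes; both calls report the same value.
page_size = os.sysconf("SC_PAGE_SIZE")
print(page_size, mmap.PAGESIZE)
```

On a setup with 4k pages, both numbers should come back as 4096.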

Here... well, there's very little difference between the Pi3B+ and the Pi4.  The 4 is slightly faster, but not by any real significant amount.  Interestingly, Clank manages a far higher write performance on the SD card than the Pis, but the Pis manage to read a lot faster.  Unsurprisingly, the Nano comes out on top in read performance again.

Practically, though, there's no meaningful difference in random IO performance between the Pi3 and the Pi4.  A difference is measurable, but it's nothing radical.


I know people are sick of hearing it, but spend the money on a crappy little used SSD and a USB to SSD adapter for desktop use of any of these small computers.

The results here are clear enough, though - the Nano's SD card performance, for whatever it's worth, is rather significantly better.

SSD Tests

All these SSD tests are with the exact same physical device - a little 32GB SSD in a USB3 enclosure.  It's formatted ext4, and I ran the iozone tests against it, mounted normally (with default options).

For the large block tests, the Pi3's performance isn't surprising.  The Pi3B+ only has USB2, and the SSD can trivially handle that ~35MB/s IO - so regardless of the test, the USB2 bus limits the Pi3 (and Clank).

What's interesting here is that, despite having USB3, the Pi4 can't manage even 200MB/s off a device that can exceed 350MB/s.  I tried the Pi4 both at stock clocks and overclocked to 2GHz - there's not a huge difference, but the disk IO is marginally faster at 2GHz.  Does it matter?  No, but it's at least interesting!

The Nano, for large block reads, absolutely thrashes the Pi4 - no questions.  It manages the exact same read speeds as a modern Intel desktop.  I've previously learned that Intel chips tend to radically outperform ARM chips in USB3 performance, so I like to test my devices on an Intel box as a "ground truth" set of numbers about what the actual hardware (SSD and adapter) can do.  The Nano easily keeps up with the Intel box here.


Moving to the small block performance, though, it's clear that none of the ARM boxes can really run the SSD at the performance limits.  The Intel box gets 50MB/s in small block read and write, and all the ARM boxes are stuck around 15MB/s.  Even overclocked, there's no real difference on the Pi4.

I ran into this result benchmarking the Nano the first time, and I don't have any real explanation.  Even with USB3, and an SSD clearly capable of quite reasonable 4k block performance, none of the ARM boxes can even manage to perform well enough to saturate a USB2 link.  There's barely a difference between the 3B+ (with USB2) and any of the USB3 systems.  Clank, of course... well, look, it's old, ok?
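It can help to restate the small block numbers as operations per second instead of MB/s.  The sketch below is just arithmetic on the figures quoted above, not an explanation of the gap.  It assumes 1MB = 1024KB and, for the latency figure, a single request in flight at a time (that part is my assumption about how the test behaves).

```python
def iops(throughput_mb_s, block_kb=4):
    """Convert MB/s at a given block size into IO operations per second."""
    return throughput_mb_s * 1024 / block_kb


def latency_us(throughput_mb_s, block_kb=4):
    """Average time per operation in microseconds, assuming one request at a time."""
    return 1_000_000 / iops(throughput_mb_s, block_kb)


for label, mb_s in [("ARM boards", 15), ("Intel box", 50), ("USB2 large block limit", 35)]:
    print(f"{label}: {iops(mb_s):.0f} IOPS, {latency_us(mb_s):.0f} us per 4k op")
```

Seen this way, the ARM boards are spending roughly 260 microseconds per 4k operation against about 78 on the Intel box, which suggests the limit is per-request overhead somewhere in the path rather than raw link bandwidth.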


Again, the Jetson Nano comes out ahead in SSD performance - by a good margin.  But only on the large block transfers.  On the small, 4k blocks?  The only real conclusion one can draw is that ARM USB stacks suck universally.  If you're the sort of kernel developer who can shed some light on the issues here, please, let's chat - I'd love to understand it.

But it's still an awful lot faster than the SD card, especially in write performance.

Memory Bandwidth Tests

My next head to head test is memory bandwidth.  Here, I'm using the mbw benchmark, with stock parameters.  This is a "dumb" benchmark - it uses things like "memcpy" and "I'm going to copy data in a loop."  It's the sort of stuff you'll find real software using, and it's "memory architecture oblivious" - it's not tuned to the memory system because it doesn't know anything about the memory system in the first place.  If you're not doing scientific compute, this is the sort of thing software uses.
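To make "dumb copy" concrete, here's a rough Python sketch of the same idea: time a plain bulk copy of a buffer and report MB/s.  This is not mbw and won't match its numbers (the function name is mine, and the timing includes allocating the destination), but it's a memory architecture oblivious copy in the same spirit.

```python
import time


def copy_bandwidth_mb_s(size_mb=32, repeats=5):
    """Time a bulk copy of a size_mb buffer and return the best MB/s seen."""
    # Fill with a non-zero byte so the source pages are really backed by RAM.
    src = bytearray(b"\xa5") * (size_mb * 1024 * 1024)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst = bytes(src)  # one bulk copy of the whole buffer, done in C
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
        assert len(dst) == len(src)
    return size_mb / best


print(round(copy_bandwidth_mb_s()), "MB/s")
```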

In a totally unsurprising result, the Pi4 is rather substantially faster than the Pi3 - it manages around 2250MB/s instead of a hair over 1000MB/s.  The optimized copy performance isn't dependent on the clock speed, but the dumb copy?  That gets significantly faster when the Pi4 is overclocked.

But, it's still not enough to best the Jetson Nano, which manages around 3GB/s of memory copy bandwidth.


Memory bandwidth: Another win for the Jetson Nano.

7z Benchmark

So far, the Nano just keeps coming out on top.  But it's running an older ARM A-57 CPU core as compared to the Raspberry Pi 4's A-72 cores (which can be pushed an awful lot faster).

My next test is the 7zip benchmark.  This does some 7zip related testing, and spits out an average MIPS (Millions of Instructions Per Second) score (for compression, decompression, and then the combined average).  This is dependent on CPU core performance, caches, memory bandwidth... and here, the Raspberry Pi 4's architecture really shows off.  You can see the quite significant performance boost when overclocked, and it really likes decompressing stuff, coming in twice as fast as the Nano when overclocked.


So, here, on a mostly CPU-bound test, the Raspberry Pi 4's A-72 cores can show off and demonstrate a commanding lead.

Browser Tests

And here, we get to some interesting real world testing.  The Nano has faster memory.  The Pi4 has a faster CPU.  Which matters more?  Browser Benchmark Battle!

Before diving into the numbers, I want to give you some context as to how these systems actually perform on the modern internet (of 2019).

Clank is a gutless wonder.  It's usable on the internet, barely.  The Raspberry Pi 3B+ is more usable, but it still struggles with anything particularly complex.  It's the bottom end of "generally useful on the internet."  The Jetson Nano and Raspberry Pi 4 are perfectly capable modern internet machines, though it's still possible to choke them out if you try hard enough.

The first benchmark is SunSpider 1.0.  This is a set of Javascript tests, and lower scores are faster.  It's no longer used (partly because, like any benchmark, people started gaming it), but it shows the first hint of a trend we'll be seeing throughout these tests.  Stock, the Pi4 is slightly faster than the Jetson Nano, but overclocked?  There's a very significant performance gap between the two.  Clank and the Pi3B+ both come in around the same here, which is surprising given how different they feel on the internet.  And, while a 2GHz Pi takes 670ms to chew through it, a 2018 Mac Mini can rip through it in a mere 250ms!


The rest of the benchmarks all have the nice trait of "having a benchmark score in the same general numeric ballpark as each other" and "higher numbers are better."  I changed the ordering here to compare the stock Pi4 and the Jetson Nano next to each other, with the overclocked Pi4 also next to the Nano.

Stock for stock, the Pi4 and the Jetson Nano are pretty much dead even.  Interestingly (to me), the Pi4 actually outscores the Nano in MotionMark, despite the Nano having a cut down desktop GPU in it.  But they're really, really similar in performance results.  One has faster memory, one has a faster CPU, and it comes out about even.

But then I push the Pi4 up to 2GHz - and even though the memory bandwidth is no better, the CPU performance boost is enough to give the Pi4 a commanding lead over the Nano in every single benchmark.  Even with the GPU still running at stock speeds, MotionMark gets a significant improvement, and the rest follow.


There's no question that the Pi4, at 2GHz, is the clear winner in this test.  At stock speeds, they work out to just about the same performance.  And, of course, they utterly thrash the old Pi3B+.

Comparing the Pi4 and Nano to a 2018 Mac Mini, the Mac Mini is unsurprisingly a lot faster - but only by a factor of roughly 2-4x.  It's not the sort of commanding lead one would expect a thousand dollar computer to have over a $65 board - but, the results don't lie.  In JetStream 2, the Mac Mini is 4.4x faster than the Pi4 at 2GHz, and only 2.75x faster in Speedometer.  The Pi4 and Nano are both absolutely acceptable machines on the modern internet, and both have (or can have) enough RAM to easily handle modern tasks.


Kernel Builds and Thermal Management

Finally, because it's a wonderful real world workload, I've done a few kernel builds.  It's a non-synthetic benchmark that relies on the CPU and memory performance.  It's not Linpack - it's just beating the crap out of the system for most of an hour.

While I was playing with these builds, I did some thermal testing as well.  I've heard claims about how the Pi4 on the current firmware doesn't need a heatsink, and I know the Pi 3B+ has improved thermals - but I've not really beat on them in the "bare board" configuration.

The tests aren't exactly identical here - the Nano is building a 64-bit kernel and the Pis are building 32-bit kernels.  This is because the Nano has a 64-bit OS and the Pis have a 32-bit OS.  Yes, you can cross compile stuff, and, no, I just didn't feel like doing that much work.  All the boards are building a Raspberry Pi 4 stock 4.19 kernel in their native bit-width.

First, the results:


As has been the case on the rest of the tests, the Jetson Nano and the stock Raspberry Pi 4 run about dead even.  Overclocked to 2GHz, the Pi4, as expected, pulls ahead.  The Pi 3B+ is pretty far behind, as shouldn't be a surprise by now.

To really stress the thermals of the Pi4, I did the build at stock clocks "bare" - no heatsink, just the board in a 70F office.  It throttled.


But... it took a long time to start throttling - easily 15 minutes, though I didn't datalog the run.  Plus, it would jump right back up to the stock speed after a brief period of throttling.  If you're not running the Pi4 wide open, 100% of the time, it's actually decent enough without a heatsink.  If it's in a case, this would probably be worse, but it's not nearly as bad as the Pi3B (not the 3B+) that will throttle if you look at it wrong.
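Since I didn't datalog that run, here's the sort of logger that would capture it next time.  A minimal sketch, assuming the usual Linux sysfs paths for the SoC temperature (in millidegrees C) and the current CPU clock (in kHz).  These paths exist on Raspbian, but the thermal zone numbering may differ on other boards, so treat them as assumptions.

```python
import time
from pathlib import Path

# Assumed sysfs locations; adjust for your board.
TEMP_PATH = Path("/sys/class/thermal/thermal_zone0/temp")
FREQ_PATH = Path("/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq")


def parse_temp_c(raw):
    """sysfs reports millidegrees C, e.g. '62345' is 62.345C."""
    return int(raw.strip()) / 1000.0


def parse_freq_mhz(raw):
    """cpufreq reports kHz, e.g. '1500000' is 1500MHz."""
    return int(raw.strip()) / 1000.0


def sample():
    """Return (temp_c, freq_mhz), or None if the sensors aren't readable."""
    try:
        return parse_temp_c(TEMP_PATH.read_text()), parse_freq_mhz(FREQ_PATH.read_text())
    except (OSError, ValueError):
        return None


def log(samples=3, interval_s=1.0):
    """Print CSV lines of time, temperature, and clock speed."""
    for _ in range(samples):
        reading = sample()
        if reading is None:
            print("sysfs sensors not available on this machine")
            return
        temp_c, freq_mhz = reading
        print(f"{time.strftime('%H:%M:%S')},{temp_c:.1f},{freq_mhz:.0f}")
        time.sleep(interval_s)


if __name__ == "__main__":
    log()
```

Run it in a second terminal with a larger sample count while the build is going, and the point where the clock drops below stock shows up directly in the output.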

Running at 2GHz, the system needs a heatsink - period.  Even with the massive heatsink on mine, without the fans running, it'll still get into throttling.  But with the fans turning, it's perfectly fine and holds about 60-65C while running under full load.

While it's slower, I'm really pretty impressed with the 3B+ as well.  Bare, it throttles down from 1.4GHz to 1.2GHz - which I knew it would do.  But it then soldiers on at 65C and 1.2GHz for the rest of the kernel builds!  This is a significant improvement over the 3B, which will drop down to well below 1GHz in the same conditions.
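On the Pis, the firmware also keeps its own record of throttling, which you can read with "vcgencmd get_throttled".  It prints a hex bitmask like throttled=0x50005.  The sketch below decodes that string in Python; the bit meanings follow the Raspberry Pi documentation as I understand it, and the function name is mine.  It only parses the text, so it runs anywhere.

```python
# Bit positions in the get_throttled value (per the Raspberry Pi docs).
FLAGS = {
    0: "under-voltage detected",
    1: "ARM frequency capped",
    2: "currently throttled",
    3: "soft temperature limit active",
    16: "under-voltage has occurred",
    17: "ARM frequency capping has occurred",
    18: "throttling has occurred",
    19: "soft temperature limit has occurred",
}


def decode_throttled(output):
    """Turn 'throttled=0x50005' into a list of human readable flag names."""
    value = int(output.strip().split("=", 1)[1], 16)
    return [name for bit, name in FLAGS.items() if value & (1 << bit)]


print(decode_throttled("throttled=0x50005"))
```

A value of 0x0 after a long build means the board never throttled and never saw low voltage, which is a handy check on both the heatsink and the power supply.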

But, while you can run the Pi4 acceptably without a heatsink, you really should use a good heatsink - and then overclock the thing to 2GHz!

Conclusions: Get the Pi4, Overclock It!

So, when it comes down to it, if you're looking for a little small board computer with halfway decent software support for desktop use, which one is better?

As much as I like the Jetson Nano, the results are pretty clear - stock for stock, they perform nearly identically and the Pi4 is quite a bit cheaper.  But, if you spend a couple bucks on a good heatsink and are willing to push your Pi4 to 2GHz, the Pi4 is significantly faster at the sort of things you care about doing - and also, still cheaper.  Just make sure you have a good power supply.

Plus, the Raspberry Pi 4 is going to have better software support long term.  It's a far more popular board than the Nano.  I'm not sure if the Nano will get a kernel update or not, while the Raspberry Pi Foundation and developers are working to get a lot of the Pi code into the mainline kernel.


If you already have a Jetson Nano for desktop use and are happy with the performance, I wouldn't rush out and replace it with a Pi4 - even overclocked, I don't think there's quite enough of a performance delta to justify that.  But if you have neither?  There's just no question that the Pi4 is a faster board for less money in late 2019.  The areas where the Nano pulls ahead simply don't matter for day to day use.

Finally, if you find my reviews like this useful, and are considering buying a board, I'd appreciate you using my eBay affiliate links.  It doesn't change the cost of your purchase any, and it tosses a few coins back my way to help cover the cost of the hardware I review.  I'm not getting rich on this blog by any means, but it's always nice when it covers the hardware costs.  There's also a donate button in the upper right, should you want to use that.

Raspberry Pi 4/4GB: $60 on eBay
Jetson Nano: $120 on eBay

2 comments:

  1. I appreciate the non-trivial amount of work that went into this post. I don't have any of these devices but thanks for all the testing.

  2. Thank you for this. I use a Pi4 as a desktop and it (mostly) copes. Just enough grind to run Blender and do simple modelling, then Slic3r to print it.
