Saturday, February 11, 2017

Common Arduino Library SRAM Use

I've been playing with Arduino for a few months now, and one of the things I've found incredibly frustrating is just how much dynamic/global memory (SRAM) most of the common libraries use - the standard Serial library uses nearly 200 bytes of precious RAM, always and forever, just to print a single log message, and a lot of others aren't that much better.

Further, advice about Arduino memory pressure tends towards the handwavy "Well... don't use globals, and use the F() macro..." side of things - which, while accurate, is missing some very important information for understanding what is happening and how you can resolve the issues.

Here's a particularly bad example of what I'm talking about.  With literally nothing done but some libraries initialized, this code is using 1101/2048 bytes of RAM - 53%, and I haven't done a single useful thing yet!


I'm going to dive into library memory use in some depth, look at a few examples, and profile some commonly used libraries!  Read on if you're interested!

Arduino Memory Types

Hopefully this is review for you, but if not, the Arduino has 3 different types of memory - with 4 different ways of using it.  The sizes I list are based on the Arduino Uno (Atmega 328) - other variants will have the same types of memory, but different sizes.

Program Memory

The microcontroller has 32 KB of program memory (flash) used for storing programs, though 512 bytes of that is taken by the bootloader.  This is the "hard drive" space where your sketch lives, but it's also useful as a place to store strings and other read-only data.  This region is read-only during normal execution - you can't update it from the executing code.

For most people, and for most programs, this is a huge amount of space, and it's unlikely to run out - so shove as much in here as you can!

The F() macro and PROGMEM keyword will store strings and other data structures in this memory.  You can't access them directly, though - you need to use special functions to access them.  Most of these functions are suffixed with _P - though I'll talk about this more later, and in future posts.

EEPROM

The what-huh?  Nobody tends to know much about this chunk of memory, but the Atmega 328 has 1024 bytes of EEPROM (Electrically Erasable Programmable Read Only Memory - basically flash memory, despite the rather contradictory name) accessible to the executing programs.  Data stored to this memory does persist across power cycles, and programs can write to it.  It's a great place to store unit IDs or calibration data, or to store data on shutdown for use on a future boot.  Note that this memory is by no means secure, so storing crypto keys here isn't a great idea if you're concerned about someone having physical access to your device.

The interface consists of "Here's 1024 bytes, have fun."  Structure and all the other details about how data is stored are left up to the programmer.  There's a very basic library, but you can just access it like an array if you want.

Importantly, the EEPROM has a limited number of write cycles.  The ATmega328 (Arduino Uno) has a rated cycle life of 100k cycles for EEPROM cells, though a perhaps mildly insane person has tested it and gotten past 1 million cycles before seeing failures.  Still, it's not a storage space you should be writing on every loop through your program.  On top of the limited write cycles, accessing this memory is slow - a write takes over 3 ms.
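Since each cell only survives a limited number of writes, one common precaution is to read a cell first and skip the write when the value hasn't changed; the Arduino EEPROM library's EEPROM.update() behaves this way.  Here's a rough desktop C++ model of the idea, not the library's actual code: FakeEeprom, its write counters, and cycles_after_repeated_update are hypothetical names invented for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the 1024-byte EEPROM, with a per-cell
// counter so we can see how many write cycles each cell has used.
struct FakeEeprom {
  static constexpr std::size_t kSize = 1024;
  uint8_t cells[kSize] = {};
  uint32_t write_cycles[kSize] = {};

  uint8_t read(std::size_t addr) const { return cells[addr % kSize]; }

  // Unconditional write: always costs the cell one cycle.
  void write(std::size_t addr, uint8_t value) {
    cells[addr % kSize] = value;
    ++write_cycles[addr % kSize];
  }

  // Write only when the stored value differs, so repeated saves of an
  // unchanged value cost nothing.
  void update(std::size_t addr, uint8_t value) {
    if (read(addr) != value) write(addr, value);
  }
};

// Saves the same unit ID `times` times and reports the cycles consumed.
inline uint32_t cycles_after_repeated_update(uint32_t times) {
  FakeEeprom eeprom;
  for (uint32_t i = 0; i < times; ++i) eeprom.update(0, 42);
  return eeprom.write_cycles[0];
}
```

In this model, saving the same value a thousand times wears the cell once, where the unconditional write() would have worn it a thousand times.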

SRAM

Finally!  The main topic of this post - the SRAM!  SRAM stands for "Static RAM" - as contrasted to the more common DRAM ("Dynamic RAM").  They're entirely different in implementation, but from a programmer perspective, they look the same.  It's RAM.

The Arduino Uno has 2048 bytes of SRAM available - 2 KB.  It's a tight resource, but far too many people try to treat an Arduino an awful lot like a desktop computer, and don't think about how they're using memory (or just assume that their particular library is the only library anyone will ever use in a program).

The SRAM is used for global variables, as well as for stack variables (variables allocated within a function).  Any variable you can write to at runtime lives here.  And, if you run out of it, the stack and heap collide, and Bad Things happen (usually the executing program crashes, though weird data corruption is also possible).

Every byte is important - so don't waste it!  Sadly, a lot of common libraries do waste large amounts of it, or I wouldn't be writing this post.

It's worth mentioning that having separate memory regions (and data paths) for program code and data makes the Arduino a solid example of the Harvard architecture.  Most modern computers are a Von Neumann architecture, in which program code and data are mixed in a single memory region (at least until you're deep inside the processor).  These architectures are not identical, and if you try to treat them like they are, you end up with sub-optimal results.

F() and PROGMEM - Strings in PROGMEM

Finally, it's helpful to know what the often-mentioned F() macro and PROGMEM specifier actually do on Arduino.

Here's why the Atmega being a Harvard architecture matters - strings, by default, go in the data memory!  This is so the string data can be treated as "normal data" and accessed by normal pointers - a pointer, on the Atmega, is a pointer to data memory and only data memory.  This is also the only memory that can be written at runtime, and you might want to change that string - so it goes in the SRAM, even if constant.

The problem here is that there's only 2048 bytes of RAM to use - and constant strings are not a good use of that RAM!

The PROGMEM specifier and F() macro simply tell the compiler, "Put this constant data in program memory."  Data in Program Space covers many of the details of this, and there are plenty of other pages showing how to do this if you have questions.  I'll just leave things at, "You should always put constant strings in program memory unless you have some very, very good reason."  Some of the subsequent posts in this series will have more concrete examples as well.
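To make the "special functions" point a bit more concrete, here's a minimal sketch of reading a PROGMEM string byte by byte.  On AVR it uses the real avr/pgmspace.h; the #else branch is a made-up desktop shim so the same code compiles off-target.  kGreeting and count_chars_in_flash are hypothetical names, not anything from the Arduino core.

```cpp
#include <stddef.h>
#include <string.h>

#ifdef __AVR__
#include <avr/pgmspace.h>  // real PROGMEM, pgm_read_byte, strlen_P
#else
// Desktop-only shim (an assumption for illustration): there is no
// separate program memory here, so these collapse to ordinary accesses.
#define PROGMEM
#define pgm_read_byte(addr) (*(const unsigned char *)(addr))
#define strlen_P(s) strlen(s)
#endif

// On AVR, PROGMEM keeps this array in flash instead of copying it to SRAM.
const char kGreeting[] PROGMEM = "Hello, World!";

// A plain pointer dereference would read data memory on AVR, so each
// byte is fetched through pgm_read_byte() instead.
inline size_t count_chars_in_flash(const char *flash_str, char wanted) {
  size_t hits = 0;
  const size_t len = strlen_P(flash_str);
  for (size_t i = 0; i < len; ++i) {
    if ((char)pgm_read_byte(flash_str + i) == wanted) ++hits;
  }
  return hits;
}
```

The cost is a slightly clumsier access pattern; the payoff is that the 14 bytes of "Hello, World!" (terminator included) stay out of SRAM.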

Profiling Library SRAM Use

And now, the interesting part of this post - library SRAM use!

There are a lot of common libraries that people use in Arduino sketches, and I haven't found a list of how much SRAM some of the common libraries use!

I gathered these numbers using Arduino 1.6.13 (current as of the time I wrote this post) on OS X.  I expect they'd be consistent across platforms, but they may not remain exactly accurate in future versions of Arduino as memory use is optimized (or made worse).  I expect they will remain reasonably close, though!

Totally Blank Sketch: 9 bytes of SRAM

My first test is a baseline - this is literally a sketch that doesn't do anything (like a certain band of pirates).


Code:
void setup() {}
void loop() {}

Compiler output:
Sketch uses 444 bytes (1%) of program storage space. Maximum is 32,256 bytes.
Global variables use 9 bytes (0%) of dynamic memory, leaving 2,039 bytes for local variables. Maximum is 2,048 bytes.

This is the absolute minimum you're ever going to use in an Arduino sketch: 444 bytes of program storage space (the basic runtime takes a bit of space), and 9 bytes of global SRAM.  The difference between 32,256 available bytes of program memory and the hardware program memory size of 32,768 bytes (512 bytes) goes to the bootloader.

Where is that 9 bytes of SRAM going?  The millisecond timer!  millis()!

If you snoop around wiring.c, you'll notice a few global variables:

volatile unsigned long timer0_overflow_count = 0;
volatile unsigned long timer0_millis = 0;
static unsigned char timer0_fract = 0;

These are included in every sketch, and as an unsigned long is 4 bytes, and an unsigned char is 1 byte, you can see how 4 + 4 + 1 makes 9 bytes of SRAM used!  Now you know, and knowing where that 9 bytes is going, I hear, is one amazing bit of party trivia!  Or maybe I just have friends who attend weird parties.  Anyway.

I'm inclined to say that the overflow_count doesn't really need to be 4 bytes, but, hey.  At least it's only 4 bytes.

Serial Logging: 177 bytes of SRAM

Let's test out something more useful, though - like a basic Hello World sketch.  That requires some serial logging output, which requires including the Serial library, initializing serial, and printing out a message (note that I store the message in program memory using the F() macro so the string doesn't use any SRAM).


Code:
void setup() {
  Serial.begin(9600);
  Serial.println(F("Hello, World!"));
}

void loop() {}

Compiler output:
Sketch uses 1,466 bytes (4%) of program storage space. Maximum is 32,256 bytes.
Global variables use 186 bytes (9%) of dynamic memory, leaving 1,862 bytes for local variables. Maximum is 2,048 bytes.

Whoa!  What happened?  The additional 1022 bytes of program memory use makes sense, but an additional 177 bytes of SRAM, for this trivial chunk of code?
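The bookkeeping here is just "reported number minus the blank-sketch baseline".  As a small worked check, using only the figures from the two compiler reports above (the function names are hypothetical):

```cpp
// Numbers copied from the blank-sketch compiler report.
constexpr int kBlankFlashBytes = 444;
constexpr int kBlankSramBytes = 9;

// What a sketch costs beyond the empty-sketch baseline.
constexpr int flash_over_baseline(int reported) { return reported - kBlankFlashBytes; }
constexpr int sram_over_baseline(int reported) { return reported - kBlankSramBytes; }

// Hello World reported 1,466 bytes of flash and 186 bytes of SRAM.
static_assert(flash_over_baseline(1466) == 1022, "Hello World flash cost");
static_assert(sram_over_baseline(186) == 177, "Hello World SRAM cost");
```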

Understanding Serial Memory Use

Clearly, the Serial class, even when being used trivially, uses a lot of memory somewhere.  It's easy enough to find - variables and arrays stored in the class use memory, but this is an awful lot.  Let's take a look at HardwareSerial.h - you can find it in your local install, or you can just go look at it on Github.

The size of the transmit and receive buffers is set based on the Arduino memory size.  For an Uno, with 2k of RAM, each is 64 bytes - so 64 bytes for transmit, 64 bytes for receive.  That's 128 bytes right there.  These buffers allow the Serial library to do some cool stuff with interrupts for input and output, but they do take a hefty chunk of memory.

#if ((RAMEND - RAMSTART) < 1023)
#define SERIAL_TX_BUFFER_SIZE 16
#else
#define SERIAL_TX_BUFFER_SIZE 64
#endif

Here are all the variables stored in an instance of Serial:

volatile uint8_t * const _ubrrh;
volatile uint8_t * const _ubrrl;
volatile uint8_t * const _ucsra;
volatile uint8_t * const _ucsrb;
volatile uint8_t * const _ucsrc;
volatile uint8_t * const _udr;
bool _written;
volatile rx_buffer_index_t _rx_buffer_head;
volatile rx_buffer_index_t _rx_buffer_tail;
volatile tx_buffer_index_t _tx_buffer_head;
volatile tx_buffer_index_t _tx_buffer_tail;
unsigned char _rx_buffer[SERIAL_RX_BUFFER_SIZE];
unsigned char _tx_buffer[SERIAL_TX_BUFFER_SIZE];

A pointer on Arduino is 2 bytes, the buffer head and tail pointers are 1 byte, the buffers are 64 bytes each... it adds up!
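Tallying just the members listed above is a quick sanity check.  This is a back-of-the-envelope sketch, assuming the ATmega328 sizes already mentioned (2-byte pointers, 1-byte indices, 64-byte buffers) plus a 1-byte bool; the constant names are made up for the example.

```cpp
#include <cstddef>

// Sizes assumed for the ATmega328: 2-byte data pointers, 1-byte bool,
// 1-byte buffer indices, and 64-byte buffers (the Uno case above).
constexpr std::size_t kPointerBytes = 2;
constexpr std::size_t kBoolBytes = 1;
constexpr std::size_t kIndexBytes = 1;
constexpr std::size_t kBufferBytes = 64;

constexpr std::size_t kRegisterPointers = 6;  // _ubrrh, _ubrrl, _ucsra, _ucsrb, _ucsrc, _udr
constexpr std::size_t kBufferIndices = 4;     // rx/tx head and tail
constexpr std::size_t kBuffers = 2;           // _rx_buffer and _tx_buffer

// Bytes used by the members listed above, ignoring inherited state.
constexpr std::size_t kListedMemberBytes =
    kRegisterPointers * kPointerBytes +  // 12
    kBoolBytes +                         // 1 (_written)
    kBufferIndices * kIndexBytes +       // 4
    kBuffers * kBufferBytes;             // 128

static_assert(kListedMemberBytes == 145, "12 + 1 + 4 + 128");
```

So the listed members alone come to 145 bytes, and 128 of those are the two buffers.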

On top of that, Serial inherits from Stream - which inherits from Print.  They have their own variables taking up space.

If you want to see how big a class is, as instantiated, you can use the "sizeof" operator:
// Labels wrapped in F() so they don't add to the SRAM being measured.
Serial.print(F("Sizeof Serial: "));
Serial.println(sizeof(Serial));
Serial.print(F("Sizeof Stream: "));
Serial.println(sizeof(Stream));
Serial.print(F("Sizeof Print: "));
Serial.println(sizeof(Print));

This gives results that look like this:
Sizeof Serial: 157
Sizeof Stream: 12
Sizeof Print: 4

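It's tempting to add those up to 173 bytes, but that double-counts: Serial's class inherits from Stream, which inherits from Print, so sizeof(Serial) already includes Stream's 12 bytes, which in turn include Print's 4.  The Serial object itself accounts for 157 of the 177 bytes.  To borrow a useful phrase from academia, "Finding the remaining 20 bytes is left as an exercise to the reader."  (A hint: on AVR, the virtual function tables for these classes typically live in SRAM too.)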

Comprehensive Exercising of Functions

Code doesn't get included unless used.  Let's use a few more Serial functions and see what happens.

Floating point is usually a good way to bloat the code size for fun - so I'll try that, along with a few input functions.


Code:
void setup() {
  Serial.begin(9600);
  Serial.println(F("Serial Exercising"));
}

void loop() {
  if (Serial.available()) {
    int test_int = Serial.parseInt();
    Serial.print(F("Read int: "));
    Serial.println(test_int);
  
    float test_float = Serial.parseFloat();
    Serial.print(F("Read float: "));
    Serial.println(test_float);
  }
}

Compiler output:
Sketch uses 3,808 bytes (11%) of program storage space. Maximum is 32,256 bytes.
Global variables use 200 bytes (9%) of dynamic memory, leaving 1,848 bytes for local variables. Maximum is 2,048 bytes.

There's an additional 3364 bytes of program memory used over the blank sketch, or another 2342 bytes over the simple Hello World program - this shows that only the functions actually being used are compiled in, and also makes the point, rather clearly, that float processing code is bulky (I will argue, quite strongly, that you probably have no business using floats on an Arduino).  This code also uses another 14 bytes of SRAM - have fun finding them!
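For a flavor of what "no floats" can look like in practice, here's a hedged sketch of integer-only scaling.  The scenario is an assumption chosen for illustration, not something from the sketch above: a 10-bit ADC reading converted to millivolts against a 5000 mV reference.  All names are hypothetical.

```cpp
#include <cstdint>

// Assumed setup for illustration: a 10-bit ADC reading (0..1023)
// measured against a 5000 mV reference.
constexpr uint16_t kAdcMax = 1023;
constexpr uint32_t kReferenceMillivolts = 5000;

// Integer-only conversion; the 32-bit intermediate avoids overflow
// (1023 * 5000 does not fit in 16 bits).
constexpr uint16_t adc_to_millivolts(uint16_t raw) {
  return static_cast<uint16_t>(static_cast<uint32_t>(raw) *
                               kReferenceMillivolts / kAdcMax);
}

// Split millivolts into the two integers needed to print volts with
// three decimal places, without touching float code.
constexpr uint16_t whole_volts(uint16_t millivolts) {
  return static_cast<uint16_t>(millivolts / 1000);
}
constexpr uint16_t fractional_millivolts(uint16_t millivolts) {
  return static_cast<uint16_t>(millivolts % 1000);
}
```

When printing, the fractional part needs zero-padding to three digits (so 5 mV prints as ".005"), which is a couple of comparisons rather than the float formatting code.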

SoftwareSerial Library: 119 bytes of SRAM

Another commonly used library in Arduino sketches is the SoftwareSerial library.  This is a library that allows for serial communication on any of the digital IO pins (though not as fast as on the hardware serial port).  Let's try it out!  I have it set up to output on the hardware serial pins - which actually does work as you might expect and send output over the hardware serial interface, though I wouldn't try to use both Serial and SoftwareSerial at the same time.


Code:
#include <SoftwareSerial.h>

SoftwareSerial SoftSerial(0, 1);

void setup() {
  SoftSerial.begin(9600);
  SoftSerial.println(F("Software Serial Exercising"));
}

void loop() {}

Compiler output:
Sketch uses 2,334 bytes (7%) of program storage space. Maximum is 32,256 bytes.
Global variables use 128 bytes (6%) of dynamic memory, leaving 1,920 bytes for local variables. Maximum is 2,048 bytes.

Wow!  Look at that!  I've got serial output on the hardware pins with only 119 bytes of SRAM use!  However, it uses 1890 bytes of program memory (more than the hardware serial library, which makes some sense as it's having to do things in software that the hardware library does with hardware).

If we take a look at SoftwareSerial.h,  we see some variables similar to HardwareSerial, but this class doesn't have a transmit buffer - things are just blasted out on the port without using interrupts.

uint8_t _receivePin;
uint8_t _receiveBitMask;
volatile uint8_t *_receivePortRegister;
uint8_t _transmitBitMask;
volatile uint8_t *_transmitPortRegister;
volatile uint8_t *_pcint_maskreg;
uint8_t _pcint_maskvalue;
uint16_t _rx_delay_centering;
uint16_t _rx_delay_intrabit;
uint16_t _rx_delay_stopbit;
uint16_t _tx_delay;
uint16_t _buffer_overflow:1;
uint16_t _inverse_logic:1;
static uint8_t _receive_buffer[_SS_MAX_RX_BUFF]; 
static volatile uint8_t _receive_buffer_tail;
static volatile uint8_t _receive_buffer_head;
static SoftwareSerial *active_object;

I'd offer that the _buffer_overflow and _inverse_logic bits aren't saving anything by being bit-size unless packed into a struct (further solidifying my opinion that the people writing this code seem to have little real world experience with actual embedded programming or low level programming in general), but otherwise, this is what the SoftwareSerial object has.  Since the receive buffers are static variables, they are actually defined over in the .cpp file - so I don't think you'll be able to receive on multiple SoftwareSerial instances at the same time without them stomping on each other.  Be warned.
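The static-buffer point is easy to see in miniature.  This is a hypothetical desktop model, not SoftwareSerial's code: MiniSoftPort and everything in it are invented names, and the only things borrowed from the real class are the ideas of a static receive buffer, a static active-object pointer, and a listen() call that claims them.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical miniature of the layout above: the receive buffer and the
// active-object pointer are static, so they exist once per class.
class MiniSoftPort {
 public:
  static constexpr std::size_t kBufferSize = 64;

  explicit MiniSoftPort(uint8_t rx_pin) : rx_pin_(rx_pin) {}

  uint8_t rx_pin() const { return rx_pin_; }

  // Claim the shared buffer; whoever listened before loses it.
  void listen() {
    active_object_ = this;
    buffer_head_ = 0;
  }
  bool is_listening() const { return active_object_ == this; }

  // Only the active port stores incoming bytes; the others drop them.
  bool on_byte_received(uint8_t value) {
    if (!is_listening() || buffer_head_ >= kBufferSize) return false;
    receive_buffer_[buffer_head_++] = value;
    return true;
  }

 private:
  uint8_t rx_pin_;
  static uint8_t receive_buffer_[kBufferSize];
  static std::size_t buffer_head_;
  static MiniSoftPort *active_object_;
};

// Static members need one definition outside the class (the ".cpp" part).
uint8_t MiniSoftPort::receive_buffer_[MiniSoftPort::kBufferSize];
std::size_t MiniSoftPort::buffer_head_ = 0;
MiniSoftPort *MiniSoftPort::active_object_ = nullptr;

// The per-instance cost is just the pin number; the buffer is not in it.
static_assert(sizeof(MiniSoftPort) == 1, "statics are not part of each object");
```

The upside is that a second instance costs almost nothing in SRAM; the downside is exactly the one described above, since only one instance can own the buffer at a time.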

Wire Library: 176 bytes of SRAM

Another useful Arduino library is the Wire library.  This handles communication with I2C/"Two Wire" devices.

A basic sketch to do something useful will open the Wire device, start a transaction, write some data, maybe read some data, and end the transmission.


Code:
#include <Wire.h>

void setup() {
  Wire.begin();
  Wire.beginTransmission(0x00);
  Wire.write(0); 
  Wire.endTransmission();
}

void loop() {}


Compiler output:
Sketch uses 2,090 bytes (6%) of program storage space. Maximum is 32,256 bytes.
Global variables use 185 bytes (9%) of dynamic memory, leaving 1,863 bytes for local variables. Maximum is 2,048 bytes.

This library uses 176 bytes of SRAM, and 1646 bytes of program memory to do something reasonably useful.

You can look at Wire.h and Wire.cpp if you want to see where the memory is going, but it's mostly transmit/receive buffers and state storage, just like the other communication libraries.

SparkFun ESP8266 AT Library: 684 bytes of SRAM (565 without SoftwareSerial)

Finally, I thought I'd share the particularly atrocious library that started me down this path.  SparkFun has an ESP8266 WiFi Shield and an AT library to go with it.  I've written up some notes on this device (I wasn't terribly happy with it, mostly due to the time wasted figuring out that the u.fl connector was flakey), but the library is... well.  Let's take a look.


Code:
#include <SoftwareSerial.h>
#include <SparkFunESP8266WiFi.h>

void setup() {
  esp8266.begin();
}

void loop() {}

Compiler output:
Sketch uses 4,486 bytes (13%) of program storage space. Maximum is 32,256 bytes.
Global variables use 693 bytes (33%) of dynamic memory, leaving 1,355 bytes for local variables. Maximum is 2,048 bytes.

Well then.  This library, with what it includes, uses 684 bytes of SRAM.  If you ignore the SoftwareSerial contribution, it only uses a mere 565 bytes.  Out of 2048 bytes!  At least it only uses 4042 bytes of program memory to initialize (it actually uses a lot more if you try to do something useful).

That's just obscene, and I don't even have the Serial library involved!

No wonder I was having trouble fitting something useful into my available RAM...

Understanding Library Memory Use

There's a thing called "embedded programming."  It involves understanding the nature and limits of one's device, and being careful in one's use of memory.  Whoever wrote this library is obviously not familiar with the concept.

Let's take a look at util/ESP8266_AT.h for a second.

const char RESPONSE_OK[] = "OK\r\n";
const char RESPONSE_ERROR[] = "ERROR\r\n";
const char RESPONSE_FAIL[] = "FAIL";
const char RESPONSE_READY[] = "READY!";
...

Do you see the PROGMEM specifier that puts these explicitly constant strings in program memory?  I certainly don't.  Do you see repeated "\r\n" strings that should probably be a common shared variable if they're taking up space in SRAM?  I do!
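To put a number on just the four constants quoted above, sizeof will count them, terminators included.  This is only a tally of the quoted lines (the header has more), and it assumes all four actually end up linked into the sketch; kQuotedStringBytes is a made-up name.

```cpp
#include <cstddef>

// The four constants quoted above, exactly as declared (no PROGMEM).
const char RESPONSE_OK[] = "OK\r\n";
const char RESPONSE_ERROR[] = "ERROR\r\n";
const char RESPONSE_FAIL[] = "FAIL";
const char RESPONSE_READY[] = "READY!";

// sizeof on a char array counts the terminating '\0' as well.
constexpr std::size_t kQuotedStringBytes =
    sizeof(RESPONSE_OK) +     // 5
    sizeof(RESPONSE_ERROR) +  // 8
    sizeof(RESPONSE_FAIL) +   // 5
    sizeof(RESPONSE_READY);   // 7

static_assert(kQuotedStringBytes == 25, "5 + 8 + 5 + 7");
```

That's 25 bytes for four short strings, on a part where the entire blank-sketch baseline is 9.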

This file is just all kinds of wrong.  There's simply no reason to do this on a RAM-limited platform like the Arduino - either you're being lazy, or you don't know better.  I don't know for sure which is the case here (if I had to guess, I'd say an intern who didn't really understand C or embedded programming wrote this code), but publishing a library with stuff like this is just naughty!  It's wasteful, and it sets a really bad example.

There's a 128 byte receive buffer in SparkFunESP8266WiFi.cpp - on top of the SoftwareSerial receive buffer.  The command and response strings?  Still strings stored in SRAM!  Here's a particularly bad chunk (watch the quoted string constants):

char *p, *q;
// Look for "AT version" in the rxBuffer
p = strstr(esp8266RxBuffer, "AT version:");
if (p == NULL) return ESP8266_RSP_UNKNOWN;
p += strlen("AT version:");
q = strchr(p, '\r'); // Look for \r
if (q == NULL) return ESP8266_RSP_UNKNOWN;
strncpy(ATversion, p, q-p);

// Look for "SDK version:" in the rxBuffer
p = strstr(esp8266RxBuffer, "SDK version:");
if (p == NULL) return ESP8266_RSP_UNKNOWN;
p += strlen("SDK version:");
q = strchr(p, '\r'); // Look for \r
if (q == NULL) return ESP8266_RSP_UNKNOWN;
strncpy(SDKversion, p, q-p);

// Look for "compile time:" in the rxBuffer
p = strstr(esp8266RxBuffer, "compile time:");
if (p == NULL) return ESP8266_RSP_UNKNOWN;
p += strlen("compile time:");
q = strchr(p, '\r'); // Look for \r
if (q == NULL) return ESP8266_RSP_UNKNOWN;
strncpy(compileTime, p, q-p);

Just... no.  Duplicating strings to be lazy about using strlen instead of storing the string in program memory and using the strlen_P function?  Sloppy and wasteful.  I know this can be done radically more efficiently, because I've written an ESP8266 AT client library with only 100 bytes of SRAM use - almost entirely in SoftwareSerial (I'll cover this in a future post - look for it, though poking through my GitHub repos might give you some ideas).
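One possible shape for a tidier version, offered as a sketch rather than as the library's fix or my own library's code: a single helper that takes each prefix once, gets its length from sizeof at compile time, and always terminates the output.  copy_field_after and kAtVersionPrefix are hypothetical names.

```cpp
#include <stddef.h>
#include <string.h>

// Copies the text between `prefix` and the next '\r' in `rx` into `out`
// (always NUL-terminated). Returns false if either marker is missing.
// `prefix_len` is passed in so the literal is not scanned at run time.
inline bool copy_field_after(const char *rx, const char *prefix,
                             size_t prefix_len, char *out, size_t out_size) {
  if (out_size == 0) return false;
  const char *p = strstr(rx, prefix);
  if (p == NULL) return false;
  p += prefix_len;
  const char *q = strchr(p, '\r');
  if (q == NULL) return false;
  size_t n = (size_t)(q - p);
  if (n >= out_size) n = out_size - 1;  // truncate rather than overflow
  memcpy(out, p, n);
  out[n] = '\0';
  return true;
}

// Each prefix is written once; sizeof(kAtVersionPrefix) - 1 is its
// length, known at compile time.
static const char kAtVersionPrefix[] = "AT version:";
```

As written this still keeps the prefix in SRAM.  On AVR the next step would be marking the prefix PROGMEM and swapping strstr for avr-libc's strstr_P, which takes its second argument from program memory.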

My advice?  Look through this library, then don't write anything like it.  It's simply bad programming for an embedded device.

Oh, and don't think the other ESP8266 AT client libraries are any better.  I tested a few of them - they're not.

Damned Kids On My Lawn

At this point, I feel like I'm required to say something snarky about colleges only teaching Java and Python on machines with 8GB or more of RAM - so I will.

Telling students, "Don't worry about memory, the garbage collector will handle it!" (though I expect the second part isn't often added) is a great way to end up with programmers who do insanely wasteful things with memory - like, on a Harvard architecture, put constant strings in data memory.  And don't understand the difference between stack and heap.  And very likely have never even been introduced to the differences between a Harvard and Von Neumann architecture.

I understand that teaching Python or Java is "easier" at some level - but not teaching people about RAM leads to writing some funky code later in life - for "fails catastrophically in production" varieties of "funky."  I can rant for half an hour straight about the many interesting ways Java GC blows up in production.

Teach people about RAM early in life, and if they do work with a garbage collected language, they have a chance of doing something reasonable.

If you're currently in college or just bored - there's an insane shortage of people who are comfortable with low level C and can write tight code that runs in Ring 0 of x86 or the comparable spaces in ARM, PowerPC, or whatever.  If your teachers refuse to teach you that stuff, learn it yourself.  Virtual machines and emulators make it so much easier today - you can single step your bootloader in Bochs and work out all sorts of goofy bugs.  If you can write your own hypervisor (that boots from the boot sector, or even that chainloads from Grub and runs as 64-bit code), you probably won't lack for work.  And if you can do that, and are looking for work, get in touch with me.  Contact form is on the right.

Closing Thoughts: Don't Be Stupid

If you're using libraries, this information should help you understand how to analyze how much of your precious SRAM they're using.

If you're writing libraries, don't be stupid.  If you can't justify every byte of SRAM you use, you're well on your way to being stupid with SRAM use.

I'll be talking about this more in the coming weeks, and showing off two of my low-SRAM libraries.  It turns out you can do perfectly functional serial logging with a whopping 0 bytes of SRAM, and can write an entirely functional ESP8266 AT client library in about 100 bytes of SRAM (counting SoftwareSerial's receive buffer - and you most assuredly don't need a duplicate buffer).

There's simply no excuse for sloppy libraries on the Arduino - it's possible to do better, and so you should.

If you happen to have found any other hugely wasteful libraries, let me know in the comments - I might add them to the post!
