The following chapter assumes that you are familiar with basic assembler operations for AVR microcontrollers. Below, we explain the most important construction elements and assembler instructions for manipulating the Arduino Uno's (figure 1) GPIOs, based on the ATmega328P microcontroller.
The Arduino Uno exposes a number of GPIOs that can serve as binary inputs and outputs or analogue inputs, and many of them also provide advanced, hardware-accelerated functions such as UART, SPI, I2C, PWM, and ADC. Not every pin on the development board is truly “general-purpose”, however: there is no internal multiplexer, so each of these hardware functions is bound to particular GPIOs and cannot be moved to other pins.
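On the programming level, GPIOs are grouped into 3 “ports” (figure 2), and this is how you access them:
- PortB: bits 0-5 correspond to GPIOs D8-D13,
- PortC: bits 0-5 correspond to analogue inputs A0-A5,
- PortD: bits 0-7 correspond to GPIOs D0-D7.
A bit in the port corresponds to a single GPIO pin, e.g. bit 5 (the 6th bit, counting from zero) of PortB corresponds to GPIO D13 and is connected to the built-in LED.
Each port has three 8-bit registers assigned (x stands for the port letter B, C or D):
- DDRx: data direction register (1 = output, 0 = input),
- PORTx: output register; for a pin configured as an input, it enables the pull-up,
- PINx: input pins register, used to read the current pin states.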
There is a set of assembler instructions that operate on Ports (I/O registers), as shown in table 1.
Accessing ports directly with these instructions is roughly 50 times faster than calling digitalRead, digitalWrite, and similar functions in C++.
| Instruction | Description |
|---|---|
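| SBI | Set bit in I/O register |
| CBI | Clear bit in I/O register |
| SBIS | Skip next instruction if bit in I/O register is set (1) |
| SBIC | Skip next instruction if bit in I/O register is clear (0) |
| IN | Read an I/O (hardware) register into a general-purpose register (R0-R31) |
| OUT | Write a general-purpose register (R0-R31) to an I/O (hardware) register |
| ANDI | Logical AND of a general-purpose register (R16-R31) with a constant; masks (clears) selected bits |
| ORI | Logical OR of a general-purpose register (R16-R31) with a constant; sets selected bits |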
A common scenario is to first configure the GPIO as an input or an output (using the corresponding DDRx register), and then set a bit (SBI), clear it (CBI), test it (SBIS, SBIC), read the whole register (IN) or write the whole register (OUT).
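The IN and OUT instructions operate on whole 8-bit registers rather than on single bits. They are general-purpose instructions that cover the whole range of I/O registers (0-63), not only the aforementioned DDRx, PORTx and PINx registers. The bit-level instructions SBI, CBI, SBIS and SBIC reach only the lower half of that range (0-31), which still includes all DDRx, PORTx and PINx registers.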
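The difference between bit-level and register-level access can be sketched as follows. This is only an illustration, assuming GNU assembler syntax and the Port B addresses of the ATmega328P; the choice of r16 and of bit 5 is arbitrary.

```asm
; Illustrative sketch: bit-level vs. register-level access to Port B
.equ DDRB,  0x04
.equ PORTB, 0x05

.org 0x0000
    rjmp start
start:
    ; Bit-level access: only bit 5 changes, the other bits are untouched
    sbi  DDRB, 5          ; PB5 becomes an output
    sbi  PORTB, 5         ; PB5 driven high
    cbi  PORTB, 5         ; PB5 driven low

    ; Register-level access: read, modify and write all 8 bits
    in   r16, PORTB       ; copy PORTB into r16
    ori  r16, (1 << 5)    ; set bit 5 in the copy
    out  PORTB, r16       ; write the whole register back

    in   r16, PORTB
    andi r16, 0xDF        ; keep every bit except bit 5 (0xDF = 0b11011111)
    out  PORTB, r16
halt:
    rjmp halt             ; stay here forever
```

The sbi/cbi form takes a single instruction and cannot disturb neighbouring bits, while the in/ori/out form is useful when several bits have to change at once.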
Template for the assembler code
A program written in plain assembler (rather than C++ mixed with assembler) needs a specific structure: the code must be placed (loaded) in memory starting exactly at address 0x0000.
    .org 0x0000
        rjmp start
    start:
        ...
It is common practice to start with rjmp (relative jump), which makes it easier to place data before the start of the code. It is also good “embedded” practice to keep it even if it does not really jump anywhere, as in this example. Omitting it may cause trouble later, when you decide to declare some data.
Core I/O registers and their IDs
To operate on I/O registers, the developer must either include a library with definitions or (when programming in pure assembler) declare them on their own.
Table 2 below lists the I/O registers used to control GPIOs (Ports B, C and D) together with their addresses:
| Name | Address (I/O) | Description |
|---|---|---|
| PINB | 0x03 | Input pins register (Port B) |
| DDRB | 0x04 | Data direction register (Port B) |
| PORTB | 0x05 | Output register/pull-up enable (Port B) |
| PINC | 0x06 | Input pins register (Port C) |
| DDRC | 0x07 | Data direction register (Port C) |
| PORTC | 0x08 | Output register/pull-up enable (Port C) |
| PIND | 0x09 | Input pins register (Port D) |
| DDRD | 0x0A | Data direction register (Port D) |
| PORTD | 0x0B | Output register/pull-up enable (Port D) |
The easiest way is to declare constants (converted to values at assembly time) and insert them before the code starts (note that they do not exist in memory, so they do not disturb code placement or proper execution):
    ; I/O registers
    .equ PINB, 0x03
    .equ DDRB, 0x04
    .equ PORTB, 0x05
    .equ PINC, 0x06
    .equ DDRC, 0x07
    .equ PORTC, 0x08
    .equ PIND, 0x09
    .equ DDRD, 0x0A
    .equ PORTD, 0x0B

    ; your code starts here
    .org 0x0000
        rjmp start
    start:
        ...
An .equ symbol is replaced by its value when the code is assembled; thus it does not exist in the final binary code. Depending on the assembler, the declaration is written either as .equ PINB, 0x03 (GNU assembler) or as .equ PINB = 0x03 (Atmel/Microchip AVRASM).
Below are sections representing common usage scenarios for GPIO management:
Use GPIO as output
In this scenario, we use a GPIO as an output. The simplest approach is to use the built-in LED, which gives instantly observable results.
The built-in LED is connected to GPIO13 (D13) and is controlled via PortB (bit 5, zero-based indexing; see figure 2). On the Arduino Uno, the built-in LED is on in the HIGH (1) state and off in the LOW (0) state of GPIO13.
It is also convenient to declare a bit number representing the built-in LED position in PortB, so instead of using a number, we can use an identifier, such as .equ PB5,5.
This code flashes the built-in LED.
    .equ DDRB, 0x04
    .equ PORTB, 0x05
    .equ PB5, 5             ; PB5 is GPIO 13, and it is a built-in LED

    .org 0x0000
        rjmp RESET
Step 1 - configure GPIO13 (PortB, bit 5) as output, using DDRB register:
    RESET:
        ldi r16, 1 << PB5   ; Set bit 5
        out DDRB, r16       ; Set PB5 as output
Step 2 - switch the LED on and off in a loop, setting and clearing bit 5 of PortB directly with sbi and cbi:
    LOOP:
        sbi PORTB, PB5      ; Turn LED on (PB5 high)
        rcall delay
        cbi PORTB, PB5      ; Turn LED off (PB5 low)
        rcall delay
        rjmp LOOP
This implementation of the delay is based on calculating the CPU cycles used to execute the following algorithm:
    delay:
        ldi r20, 43         ; Outer loop
    outer_loop:
        ldi r18, 250        ; Mid loop
    mid_loop:
        ldi r19, 250        ; Inner loop
    inner_loop:
        dec r19
        brne inner_loop
        dec r18
        brne mid_loop
        dec r20
        brne outer_loop
        ret
The instructions used in those loops are listed in table 3, along with the number of cycles each one takes:
| Instruction | Cycles |
|---|---|
| ldi | 1 |
| dec | 1 |
| brne | 2 (taken), 1 (not taken) |
| ret | 4 |
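The inner loop runs exactly 250 times. Thus, the exact number of cycles used is calculated as:
- 1 cycle for the loop initialisation (ldi r19,250),
- 250 × 1 cycle for the decrement (dec r19),
- 249 × 2 cycles for brne when the branch is taken, plus 1 cycle on the last pass, when it is not taken.
The total for this inner loop is then 1 + 250 + 498 + 1 = 750 clock cycles of the ATmega328P MCU.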
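The mid loop also runs 250 times. Each of the 250 mid-loop passes uses:
- 750 cycles for the inner loop (this includes ldi r19,250, which is executed at the start of every mid-loop pass),
- 1 cycle for the decrement (dec r18),
- 2 cycles for brne when the branch is taken (249 passes) or 1 cycle on the last pass.
Together with 1 cycle for ldi r18,250, the total cost at the mid-loop level is 1 + 250 × 751 + 249 × 2 + 1 = 188250 cycles.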
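The outer loop runs 43 times and executes the whole mid loop on each pass. The exact number of cycles used is:
- 1 cycle to initialise the outer loop (ldi r20,43),
- 43 × 188250 cycles for the mid loop (including ldi r18,250),
- 43 × 1 cycle for the decrement (dec r20),
- 42 × 2 cycles for brne when the branch is taken, plus 1 cycle on the last pass.
The final cost of the loops is 1 + 8094750 + 43 + 84 + 1 = 8094879 cycles.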
The final ret adds another 4 cycles.
Thus, the total cost of the delay section is 8094883 clock cycles.
The ATmega328P on the Arduino Uno runs at 16 MHz; thus, each cycle takes 1/16000000 of a second.
Overall, the algorithm's execution time is 8094883/16000000 s, which is about 0.5 s (more precisely, about 506 ms). The rcall that invokes the routine adds 3 more cycles, which is negligible here. Not perfect, but good enough for this approach. Implementing delays this way is straightforward but also troublesome, because every change to the loops requires recounting the cycles, and there are better solutions, such as using timers.
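As an illustration of the timer-based alternative mentioned above, here is a minimal sketch that polls Timer1 in CTC mode instead of counting cycles. It is not part of the original example: the register addresses and bit numbers are taken from the ATmega328P datasheet, and the name HALF_SECOND_TOP is hypothetical. It assumes a 16 MHz clock and a prescaler of 256.

```asm
; Sketch only: toggle the built-in LED every 0.5 s using Timer1 (CTC mode, polling)
.equ DDRB,   0x04
.equ PORTB,  0x05
.equ TIFR1,  0x16          ; I/O address, reachable with sbis/sbi
.equ TCCR1A, 0x80          ; data-memory addresses, accessed with sts
.equ TCCR1B, 0x81
.equ OCR1AL, 0x88
.equ OCR1AH, 0x89
.equ PB5,    5
.equ OCF1A,  1             ; compare match A flag in TIFR1
.equ CS12,   2             ; prescaler clk/256 in TCCR1B
.equ WGM12,  3             ; CTC mode (TOP = OCR1A) in TCCR1B
.equ HALF_SECOND_TOP, (16000000 / 256 / 2) - 1   ; hypothetical name; = 31249

.org 0x0000
    rjmp start
start:
    sbi  DDRB, PB5                 ; LED pin as output
    ldi  r17, (1 << PB5)           ; toggle mask
    ldi  r16, 0
    sts  TCCR1A, r16               ; no output-compare pins used
    ldi  r16, hi8(HALF_SECOND_TOP)
    sts  OCR1AH, r16               ; 16-bit register: high byte first
    ldi  r16, lo8(HALF_SECOND_TOP)
    sts  OCR1AL, r16
    ldi  r16, (1 << WGM12) | (1 << CS12)
    sts  TCCR1B, r16               ; start the timer
loop:
    sbis TIFR1, OCF1A              ; wait until the compare flag is set
    rjmp loop
    sbi  TIFR1, OCF1A              ; clear the flag by writing 1 to it
    in   r16, PORTB
    eor  r16, r17                  ; flip bit 5
    out  PORTB, r16
    rjmp loop
```

With this approach, the period depends only on the timer settings, so adding instructions to the main loop does not change the timing.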
Use serial port for tracing
The Arduino Uno has no direct debugging capabilities, such as step-by-step execution, so tracing can be used to monitor program execution. The board has no rich user interface, such as a display, however. One of the tracing methods is to send information via the serial port; it can then be visualised on the developer's computer using any serial port monitor tool.
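UART uses two pins:
- RX: GPIO D0 (PortD, bit 0), which receives data,
- TX: GPIO D1 (PortD, bit 1), which transmits data.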
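While it is possible to implement a full serial port protocol using GPIOs alone (so-called soft-serial), here we will use a hardware-implemented UART with several registers, as shown in table 4. These registers lie outside the I/O address range available to IN and OUT (0-63), so they are accessed with the lds and sts instructions, using the data-memory addresses listed in the table.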
| Register | Address | Official Name | Common Name | Bits | Description |
|---|---|---|---|---|---|
| UDR0 | 0xC6 | USART I/O Data Register | Data register / TX-RX buffer | 7:0 | Write to transmit data, read to receive data |
| UCSR0A | 0xC0 | USART Control and Status Register A | Status register | RXC0, TXC0, UDRE0, FE0, DOR0, UPE0, U2X0, MPCM0 | Status flags (ready, complete, errors, speed mode) |
| UCSR0B | 0xC1 | USART Control and Status Register B | Control register | RXCIE0, TXCIE0, UDRIE0, RXEN0, TXEN0, UCSZ02, RXB80, TXB80 | Enable TX/RX, interrupts, 9-bit mode |
| UCSR0C | 0xC2 | USART Control and Status Register C | Configuration / Frame register | UMSEL01:0, UPM01:0, USBS0, UCSZ01:0, UCPOL0 | Frame format (mode, parity, stop bits, data size) |
| UBRR0L | 0xC4 | USART Baud Rate Register Low | Baud rate register (low) | 7:0 | Lower byte of baud rate divider |
| UBRR0H | 0xC5 | USART Baud Rate Register High | Baud rate register (high) | 3:0 | Upper byte of baud rate divider |
In the example below, we will use TX only to send data from the MCU to the developer's PC. Let's start with some declarations for registers used during serial transmission and flags:
    .equ UBRR0H, 0xC5
    .equ UBRR0L, 0xC4
    .equ UCSR0A, 0xC0
    .equ UCSR0B, 0xC1
    .equ UCSR0C, 0xC2
    .equ UDR0, 0xC6
    .equ TXEN0, 3           ; bit 3 of UCSR0B enables or disables the UART transmitter
    .equ UDRE0, 5           ; bit 5 of UCSR0A indicates the transmit buffer is empty
Then let's define a message “Hello World”. The trailing bytes 13 and 10 (CR, LF) are the Windows-standard end-of-line sequence, and the string is 0-terminated.
    .org 0x0000
        rjmp reset

    message:
        .byte 'H','e','l','l','o',' ','W','o','r','l','d',13,10,0
        .balign 2           ; keep the following code word-aligned if the message length changes
The following section initialises the serial port for 9600bps:
    reset:
        ldi r16, hi8(103)
        sts UBRR0H, r16     ; high byte of the divider
        ldi r16, lo8(103)
        sts UBRR0L, r16     ; low byte of the divider
The value 103 is loaded into the UBRR register: the high byte into UBRR0H and the low byte into UBRR0L. This divider can be calculated according to the following formula 3:
UBRR = Fcpu / (16 × baud) - 1
where Fcpu is 16 MHz and baud is the desired transmission speed. For 9600 bps, this gives 103.17, which is rounded to 103. As a result, the actual speed is not exactly 9600 bps but rather ~9615 bps. A tolerance of up to 2% is acceptable (here, the error is 0.16%).
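The divider does not have to be computed by hand, because the assembler can evaluate the expression at assembly time. The fragment below is a sketch under that assumption (GNU assembler, integer arithmetic); the symbol names F_CPU, BAUD and UBRR_VALUE are hypothetical and do not appear in the original example.

```asm
; Sketch: let the assembler compute the baud rate divider
.equ UBRR0H, 0xC5
.equ UBRR0L, 0xC4
.equ F_CPU, 16000000                          ; assumed CPU clock in Hz
.equ BAUD,  9600                              ; assumed transmission speed in bps
.equ UBRR_VALUE, (F_CPU / (16 * BAUD)) - 1    ; integer division: 104 - 1 = 103

.org 0x0000
    rjmp reset
reset:
    ldi r16, hi8(UBRR_VALUE)
    sts UBRR0H, r16                           ; high byte of the divider
    ldi r16, lo8(UBRR_VALUE)
    sts UBRR0L, r16                           ; low byte of the divider
halt:
    rjmp halt
```

Note that integer division truncates instead of rounding to the nearest value. For 9600 bps at 16 MHz, both give 103, but other speeds may need an explicitly rounded expression.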
The next step is to enable the UART transmitter:
    ldi r16, (1 << TXEN0)
    sts UCSR0B, r16
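and configure the frame format (8 data bits, no parity, 1 stop bit, in short 8N1, the most common case):
    .equ UCSZ00, 1          ; bit 1 of UCSR0C
    .equ UCSZ01, 2          ; bit 2 of UCSR0C
    ldi r16, (1 << UCSZ01) | (1 << UCSZ00)   ; 8 data bits; parity and stop-bit fields stay 0
    sts UCSR0C, r16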
Now it is time to send the string to the transmitter, byte by byte. A pointer to the string is loaded into the Z register (ZH and ZL, respectively) using ldi. The string is processed character by character until a 0 (the end of the string) is encountered; then the transmission starts again from the beginning.
    main:
        ldi ZH, hi8(message)
        ldi ZL, lo8(message)
    send_loop:
        lpm r18, Z+
        cpi r18, 0
        breq main
The next character can be sent only after the previous one has left the transmit buffer. The transmitter is ready for the next byte only when bit UDRE0 in register UCSR0A is set (1). While it is 0, the code has to wait; once it is 1, the next byte can be written to UDR0:
    wait_udre:
        lds r19, UCSR0A
        sbrs r19, UDRE0
        rjmp wait_udre
        sts UDR0, r18
        rjmp send_loop
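For tracing, it is often handy to emit a single marker character at chosen points in the program. The sketch below wraps the wait-and-send step from the example above in a subroutine; the name trace_char and the 'A'/'B' markers are hypothetical, and the divider 103 assumes 9600 bps at 16 MHz.

```asm
; Sketch: send single-character checkpoints over the UART
.equ UCSR0A, 0xC0
.equ UCSR0B, 0xC1
.equ UCSR0C, 0xC2
.equ UBRR0L, 0xC4
.equ UBRR0H, 0xC5
.equ UDR0,   0xC6
.equ UCSZ00, 1
.equ UCSZ01, 2
.equ TXEN0,  3
.equ UDRE0,  5

.org 0x0000
    rjmp reset
reset:
    ldi  r16, hi8(103)
    sts  UBRR0H, r16
    ldi  r16, lo8(103)
    sts  UBRR0L, r16                        ; 9600 bps at 16 MHz
    ldi  r16, (1 << UCSZ01) | (1 << UCSZ00)
    sts  UCSR0C, r16                        ; 8N1 frame
    ldi  r16, (1 << TXEN0)
    sts  UCSR0B, r16                        ; enable the transmitter

    ldi  r18, 'A'                           ; checkpoint A reached
    rcall trace_char
    ; ... code under observation would go here ...
    ldi  r18, 'B'                           ; checkpoint B reached
    rcall trace_char
halt:
    rjmp halt

; Sends the byte in r18; uses r19 as scratch
trace_char:
    lds  r19, UCSR0A
    sbrs r19, UDRE0                         ; skip the jump once the buffer is empty
    rjmp trace_char
    sts  UDR0, r18
    ret
```

The sequence of characters seen in the serial monitor then shows how far the program has progressed.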
Use GPIO as input
Use GPIO as input with pull-up
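A minimal sketch of this scenario is shown below. It assumes a push button wired between GPIO D2 (PortD, bit 2) and GND, which is not part of the original text; the built-in LED on PB5 mirrors the button state. The pin is made an input by clearing its DDRD bit, and the internal pull-up is enabled by setting the same bit in PORTD.

```asm
; Sketch: read a button on D2 (assumed wiring: button to GND) with the internal pull-up
.equ DDRB,  0x04
.equ PORTB, 0x05
.equ PIND,  0x09
.equ DDRD,  0x0A
.equ PORTD, 0x0B
.equ PB5,   5               ; built-in LED
.equ PD2,   2               ; hypothetical button pin

.org 0x0000
    rjmp start
start:
    sbi  DDRB, PB5          ; LED pin as output
    cbi  DDRD, PD2          ; button pin as input
    sbi  PORTD, PD2         ; enable the internal pull-up on the input
loop:
    sbic PIND, PD2          ; pin low (button pressed)? then skip the jump
    rjmp released
    sbi  PORTB, PB5         ; pressed: LED on
    rjmp loop
released:
    cbi  PORTB, PB5         ; released: pin pulled high, LED off
    rjmp loop
```

With the pull-up enabled, an open button reads as 1 and a pressed button reads as 0, so no external resistor is needed.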
Reading analogue values is not as straightforward as reading binary ones.
The built-in ADC has a 10-bit resolution and 6 channels (A0-A5). It also uses a configurable reference voltage, typically 5 V.
The low-level, register-based ADC operations use the following formula to obtain an ADC value (figure 4), based on the input voltage Vgpio and the reference voltage Vref: ADC = Vgpio × 1024 / Vref. For example, with Vref = 5 V, an input of 2.5 V reads as 512.
Analogue reading uses a complex setup of ADC-related registers as presented in table 5:
| Register | Description |
|---|---|
ADMUX | Selects voltage reference and |