Introduction to Arduino Uno programming in Assembler

This chapter assumes that you are familiar with basic assembler operations for AVR microcontrollers. Below, we explain the most important building blocks and assembler instructions for manipulating the GPIOs of the Arduino Uno (figure 1), which is based on the ATmega328P microcontroller.

Figure 1: Arduino Uno development board

GPIO and Ports

The Arduino Uno exposes a number of GPIOs that can serve as binary inputs and outputs or as analogue inputs, and many of them also provide hardware-accelerated functions such as UART, SPI, I2C, PWM, and ADC. Not all pins on the development board are truly “general-purpose”, however: there is no internal multiplexer, so functions such as UART, I2C, SPI, PWM and ADC are bound to particular GPIOs and cannot be moved to other pins.

At the programming level, GPIOs are grouped into 3 “ports” (figure 2), and this is how you access them:

  • PortB, with GPIOs from D8 to D13,
  • PortC, with GPIOs from A0 to A5,
  • PortD, with GPIOs from D0 to D7.

A bit in the port corresponds to a single GPIO pin, e.g. bit 5 of PortB (the 6th bit, since counting starts at zero) corresponds to GPIO D13 and is connected to the built-in LED.

Figure 2: Arduino ports

IO Registers

Each Port has assigned three 8-bit registers:

  • DDRx (Data Direction Register): there are 3 of those registers, one per Port (B, C, D): DDRB, DDRC and DDRD. These registers configure each GPIO as Input (0) or Output (1). Configuration is done “per bit”, so it is equivalent to controlling each GPIO individually.
  • PORTx (Port Data Register): there are also 3 of those registers: PORTB, PORTC and PORTD. Their effect depends on the value of the corresponding bit in the DDRx register, that is, on whether the pin is configured as input or output:
    • If a specific GPIO pin (represented as a bit in the related DDRx register) is set as output, then PORTx bit directly affects the GPIO output: 1 is HIGH (+5V), while 0 is LOW (0V).
    • If a specific GPIO pin is set to input, PORTx value controls the internal pull-up resistor: 1 enables pull-up, 0 disables it.
  • PINx (Pin Value Register) represents the current input state of the GPIO.
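
The register roles listed above can be sketched in a few instructions. This is a minimal illustration in GNU avr-as syntax rather than a definitive implementation: the choice of D2 (PortD, bit 2) as the input pin and the bit names PB5 and PD2 are assumptions made for the example.

```asm
; Assumed ATmega328P I/O addresses and illustrative bit names
.equ DDRB,  0x04
.equ PORTB, 0x05
.equ PIND,  0x09
.equ DDRD,  0x0A
.equ PORTD, 0x0B
.equ PB5, 5               ; D13, built-in LED
.equ PD2, 2               ; D2, arbitrary example input pin

    .org 0x0000
    rjmp start

start:
    sbi DDRB, PB5         ; DDRB bit 5 = 1: D13 becomes an output
    sbi PORTB, PB5        ; pin is an output: PORTB bit 5 = 1 drives D13 HIGH

    cbi DDRD, PD2         ; DDRD bit 2 = 0: D2 becomes an input
    sbi PORTD, PD2        ; pin is an input: PORTD bit 2 = 1 enables the pull-up
    nop                   ; give the input synchroniser one cycle to settle

    in   r16, PIND        ; PIND holds the current state of all Port D pins
    andi r16, 1 << PD2    ; keep only bit 2: zero means D2 reads LOW

done:
    rjmp done             ; nothing more to do in this sketch
```

The same PORTx bit therefore has two meanings, and which one applies is decided only by the matching DDRx bit.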

Instructions

There is a set of assembler instructions that operate on Ports (I/O registers), as shown in table 1.

Assembler-level operations on ports are much faster than digitalRead(), digitalWrite() and similar Arduino functions in C++: roughly 50 times faster.

Table 1: Common GPIO-related I/O instructions
Instruction | Description
SBI | Set a bit in an I/O register
CBI | Clear a bit in an I/O register
SBIS | Skip the next instruction if a bit in an I/O register is set (1)
SBIC | Skip the next instruction if a bit in an I/O register is clear (0)
IN | Read an I/O register into a general-purpose register (R0-R31)
OUT | Write a general-purpose register to an I/O register
ANDI | Logical AND of a register (R16-R31) with a constant; used to mask (clear) bits
ORI | Logical OR of a register (R16-R31) with a constant; used to set bits

A common scenario is to first configure the GPIO as input or output (using the correct DDRx register), and then set it (SBI), clear it (CBI), test it (SBIS, SBIC), read the whole register (IN) or write the whole register (OUT).
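
As a hedged illustration of this scenario, the sketch below mirrors a push button on the built-in LED using the skip instruction SBIC. The wiring is an assumption: a button between D2 and GND, relying on the internal pull-up, so a pressed button reads LOW. Labels and bit names are illustrative.

```asm
; Assumed ATmega328P I/O addresses and illustrative bit names
.equ DDRB,  0x04
.equ PORTB, 0x05
.equ PIND,  0x09
.equ DDRD,  0x0A
.equ PORTD, 0x0B
.equ PB5, 5               ; D13, built-in LED
.equ PD2, 2               ; D2, assumed button to GND

    .org 0x0000
    rjmp start

start:
    sbi DDRB, PB5         ; LED pin as output
    cbi DDRD, PD2         ; button pin as input
    sbi PORTD, PD2        ; enable the pull-up on the button pin

loop:
    sbic PIND, PD2        ; skip the next instruction if D2 reads LOW (pressed)
    rjmp released         ; D2 reads HIGH: button is not pressed
    sbi PORTB, PB5        ; pressed: drive D13 HIGH
    rjmp loop

released:
    cbi PORTB, PB5        ; not pressed: drive D13 LOW
    rjmp loop
```

The skip instructions test a single bit without loading the register into a general-purpose register first, which keeps polling loops short.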

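IN and OUT instructions operate on whole 8-bit registers rather than on single bits. They are general-purpose instructions that cover the whole range of I/O registers (0-63), beyond the aforementioned DDRx, PORTx and PINx registers. SBI, CBI, SBIS and SBIC, by contrast, reach only the lower 32 I/O registers (0-31), which include all the port registers discussed here.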
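
The difference between whole-register and single-bit access can be sketched as follows. This is an illustrative fragment in GNU avr-as syntax; the bit patterns are arbitrary values chosen for the example.

```asm
; Assumed ATmega328P I/O addresses
.equ DDRD,  0x0A
.equ PORTD, 0x0B

    .org 0x0000
    rjmp start

start:
    ldi r16, 0xF0         ; bits 7..4 = 1, bits 3..0 = 0
    out DDRD, r16         ; one write: D4..D7 outputs, D0..D3 inputs

    in   r16, PORTD       ; read-modify-write: fetch all 8 bits of PORTD
    ori  r16, 0x30        ; set bits 5 and 4, leave the others unchanged
    andi r16, 0x7F        ; clear bit 7, leave the others unchanged
    out  PORTD, r16       ; write all 8 bits back at once

    sbi PORTD, 6          ; single-bit alternative: set bit 6 only
    cbi PORTD, 4          ; single-bit alternative: clear bit 4 only

done:
    rjmp done
```

IN/OUT with ANDI/ORI is convenient when several bits change together, while SBI/CBI change one bit in a single instruction without a temporary register.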

Examples

Template for the assembler code

Using plain assembler (rather than C++ with embedded assembler) requires a specific program layout: the code must be placed (loaded) in memory starting exactly at address 0x0000.

    .org 0x0000
    rjmp start
 
start:
...

It is common practice to start with rjmp (relative jump), which makes it easier to place data before the start of the code. It is also good “embedded” practice to keep it even if it does not really jump anywhere, as in this example. Omitting it may cause trouble later, when you decide to declare some data.

Core I/O registers and their IDs
To operate on I/O registers, the developer must either include a library with definitions or (when programming in pure assembler) declare them on their own.
Table 2 below lists the I/O registers used to control GPIOs (Ports B, C and D) and their addresses:

Table 2: I/O registers and their addresses (IDs)
Name | Address (I/O) | Description
PINB | 0x03 | Input pins register (Port B)
DDRB | 0x04 | Data direction register (Port B)
PORTB | 0x05 | Output register/pull-up enable (Port B)
PINC | 0x06 | Input pins register (Port C)
DDRC | 0x07 | Data direction register (Port C)
PORTC | 0x08 | Output register/pull-up enable (Port C)
PIND | 0x09 | Input pins register (Port D)
DDRD | 0x0A | Data direction register (Port D)
PORTD | 0x0B | Output register/pull-up enable (Port D)

The easiest way is to declare constants (converted to values at assembly time) and insert them before the code starts. Note that they do not occupy memory, so they do not disturb code placement or execution:

; I/O registers
.equ PINB,  0x03
.equ DDRB,  0x04
.equ PORTB, 0x05
.equ PINC,  0x06
.equ DDRC,  0x07
.equ PORTC, 0x08
.equ PIND,  0x09
.equ DDRD,  0x0A
.equ PORTD, 0x0B
 
; your code starts here
    .org 0x0000
    rjmp start
 
start:
...
.equ is replaced by its value in the code at assembly time; thus, it does not exist in the final binary.
Depending on the assembler you use, there are two syntax variants: .equ PINB, 0x03 (GNU assembler, used in this chapter) or .equ PINB = 0x03 (Atmel/Microchip AVR assembler).

Below are sections representing common usage scenarios for GPIO management:

Use GPIO as output
In this scenario, we use a GPIO as an output. The simplest way is to use the built-in LED to get instantly observable results.
The built-in LED is connected to GPIO13 (D13) and is controlled via PortB (bit 5, zero-based indexing; see figure 2). On the Arduino Uno, the built-in LED is on in the HIGH (1) state and off in the LOW (0) state of GPIO13. It is also convenient to declare a bit number representing the built-in LED position in PortB, so instead of using a number, we can use an identifier, such as .equ PB5, 5.

This code flashes the built-in LED.

.equ DDRB,  0x04
.equ PORTB, 0x05
.equ PB5, 5                 ; PB5 is GPIO 13, and it is a built-in LED
    .org 0x0000
    rjmp RESET

Step 1 - configure GPIO13 (PortB, bit 5) as output, using DDRB register:

RESET:
    ldi r16, 1 << PB5        ; Bit mask with only bit 5 set
    out DDRB, r16            ; Set PB5 as output

Step 2 - switch the LED on and off in a loop, setting and clearing bit 5 of PortB directly with sbi and cbi:

LOOP:
    sbi PORTB, PB5           ; Turn LED on (D13 HIGH)
    rcall delay
    cbi PORTB, PB5           ; Turn LED off (D13 LOW)
    rcall delay
    rjmp LOOP

This implementation of the delay is based on calculating the CPU cycles used to execute the following algorithm:

delay:
    ldi r20, 43     ; Outer loop
outer_loop:
    ldi r18, 250    ; Mid loop
mid_loop:
    ldi r19, 250    ; Inner loop
inner_loop:
    dec r19
    brne inner_loop
    dec r18
    brne mid_loop
    dec r20
    brne outer_loop
    ret

The instructions used in those loops are listed in table 3, along with the number of cycles each one takes:

Table 3: Selected AVR instruction timings
Instruction | Cycles
ldi | 1
dec | 1
brne | 2 (taken), 1 (not taken)
ret | 4

The inner loop runs exactly 250 times. Thus, the exact number of cycles used is calculated as:

  • 1×1 (loop init, ldi r19,250),
  • 250×1 (250 executions of dec r19),
  • 249×2 + 1×1 = 499 (249 executions of brne with jump + 1 when not jumping).

The total for the inner loop, including its init, is then 1 + 250 + 499 = 750 clock cycles of the ATmega328P MCU.

The mid loop also runs 250 times. Over all 250 passes, it uses:

  • 1×1 (ldi r18,250 for mid-loop init),
  • 250×750 (inner loop execution cost, as counted above, because the inner loop is nested inside the mid loop),
  • 250×1 (250 executions of dec r18),
  • 249×2 + 1×1 = 499 (249 executions of brne with jump + 1 when not jumping).

Thus, at the level of the mid loop, the total cost is 1 + 187500 + 250 + 499 = 188250 cycles.

The outer loop runs 43 times. It executes the mid loop 43 times, and the exact number of cycles used is:

  • 1×1 (ldi r20,43 initialises the outer loop),
  • 43×188250 (the mid loop is executed 43 times),
  • 43×1 (cost of dec r20 is 1 cycle),
  • 42×2 + 1×1 = 85 (42 executions of brne with jump + 1 when not jumping).

The final cost of the loops is 1 + 8094750 + 43 + 85 = 8094879 cycles.
An extra 4 cycles are needed for the final ret.

Thus, the total cost of the delay section is 8 094 883 clock cycles.

The ATmega328P on the Arduino Uno runs at 16 MHz; thus, each cycle takes 1/16000000 of a second.
Overall, the algorithm's execution time is 8094883/16000000 s, which is about 0.5 s (506 ms, to be precise). Not perfect, but good enough for this approach. Implementing delays this way is straightforward but also troublesome, and there are better solutions, such as using timers.
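
The step-by-step count above can be condensed into one closed form, which may help when choosing other loop constants. The symbols c (iteration count) and B (cycles spent inside the loop before dec and brne) are introduced here for illustration only. The result covers the delay routine alone, without the rcall that invokes it.

```latex
% c = iteration count, B = cycles of the loop body executed before dec/brne
\[
N(c, B) = \underbrace{1}_{\texttt{ldi}} + c \cdot B
        + \underbrace{c \cdot 1}_{\texttt{dec}}
        + \underbrace{(c - 1) \cdot 2 + 1}_{\texttt{brne}}
        = c\,(B + 3)
\]
\[
N_{inner} = 250 \cdot (0 + 3) = 750, \qquad
N_{mid} = 250 \cdot (750 + 3) = 188\,250, \qquad
N_{outer} = 43 \cdot (188\,250 + 3) = 8\,094\,879
\]
\[
t = \frac{N_{outer} + 4}{F_{cpu}} = \frac{8\,094\,883}{16\,000\,000\ \mathrm{Hz}} \approx 0.506\ \mathrm{s}
\]
```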

Use serial port for tracing
The Arduino Uno has no direct debugging capabilities, such as step-by-step execution. To monitor program execution, tracing can be used instead. The board has no rich user interface, such as a display, so one tracing method is to send information via the serial port. It can then be viewed on the developer's computer using any serial port monitor tool.

UART uses two pins:

  • TX (PortD, pin 1) - data from MCU to the external world,
  • RX (PortD, pin 0) - data from the external world to the MCU.

While it is possible to implement a full serial port protocol using GPIOs alone (so-called soft-serial), here we will use the hardware UART, which is controlled by several registers, as shown in table 4.

Table 4: Serial port (UART) related registers
Register | Address | Official Name | Common Name | Bits | Description
UDR0 | 0xC6 | USART I/O Data Register | Data register / TX-RX buffer | 7:0 | Write to transmit data, read to receive data
UCSR0A | 0xC0 | USART Control and Status Register A | Status register | RXC0, TXC0, UDRE0, FE0, DOR0, UPE0, U2X0, MPCM0 | Status flags (ready, complete, errors, speed mode)
UCSR0B | 0xC1 | USART Control and Status Register B | Control register | RXCIE0, TXCIE0, UDRIE0, RXEN0, TXEN0, UCSZ02, RXB80, TXB80 | Enable TX/RX, interrupts, 9-bit mode
UCSR0C | 0xC2 | USART Control and Status Register C | Configuration / Frame register | UMSEL01:0, UPM01:0, USBS0, UCSZ01:0, UCPOL0 | Frame format (mode, parity, stop bits, data size)
UBRR0L | 0xC4 | USART Baud Rate Register Low | Baud rate register (low) | 7:0 | Lower byte of baud rate divider
UBRR0H | 0xC5 | USART Baud Rate Register High | Baud rate register (high) | 3:0 | Upper byte of baud rate divider

In the example below, we will use TX only to send data from the MCU to the developer's PC. Let's start with some declarations for registers used during serial transmission and flags:

.equ UBRR0H, 0xC5
.equ UBRR0L, 0xC4
.equ UCSR0A, 0xC0
.equ UCSR0B, 0xC1
.equ UCSR0C, 0xC2
.equ UDR0,   0xC6
 
.equ TXEN0, 3      ; bit 3 of UCSR0B enables the UART transmitter
.equ UDRE0, 5      ; bit 5 of UCSR0A indicates the transmit buffer is empty

Then let's define the message “Hello World”. The trailing bytes 13 and 10 (CR, LF) are the Windows-standard end-of-line sequence, and the string is 0-terminated.

.org 0x0000
    rjmp reset
message:
    .byte 'H','e','l','l','o',' ','W','o','r','l','d',13,10,0

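The following section initialises the serial port for 9600 bps. It starts at the reset label, which is the target of the rjmp placed at address 0x0000:

reset:
    ldi r16, hi8(103)        ; high byte of the baud rate divider
    sts UBRR0H, r16
    ldi r16, lo8(103)        ; low byte of the baud rate divider
    sts UBRR0L, r16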

The value 103 is loaded into the UBRR register: the high byte into UBRR0H and the low byte into UBRR0L. This divider can be calculated according to the formula in figure 3:

Figure 3: UART prescaler equation

where Fcpu is 16 MHz. Note that a divider of 103 does not give exactly 9600 bps but rather ~9615 bps. A tolerance of up to 2% is acceptable (here, 0.16%).
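
Written out as a worked example, the calculation looks as follows. It uses the normal-speed asynchronous formula from the ATmega328P datasheet, which assumes the U2X0 bit is 0 (the example code never sets it). The symbol names are chosen here for illustration.

```latex
\[
\mathrm{UBRR} = \frac{F_{cpu}}{16 \cdot \mathrm{baud}} - 1
             = \frac{16\,000\,000}{16 \cdot 9600} - 1
             \approx 103.17 \;\rightarrow\; 103
\]
\[
\mathrm{baud}_{actual} = \frac{F_{cpu}}{16 \cdot (\mathrm{UBRR} + 1)}
                       = \frac{16\,000\,000}{16 \cdot 104}
                       \approx 9615\ \mathrm{bps},
\qquad
\frac{9615 - 9600}{9600} \approx 0.16\,\%
\]
```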

Next step is to enable UART:

ldi r16, (1 << TXEN0)
sts UCSR0B, r16

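and configure the frame format (8 data bits, no parity, 1 stop bit, 8N1 for short - the most common case) by setting bits UCSZ01 (bit 2) and UCSZ00 (bit 1) in UCSR0C:

ldi r16, (1 << 2) | (1 << 1)   ; UCSZ01 | UCSZ00: 8 data bits; parity and stop bits stay 0 (none, 1 stop bit)
sts UCSR0C, r16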

Now it is time to send the string to the transmitter, byte by byte. The pointer to the string is loaded into the Z register (ZH and ZL, respectively) using ldi. The string is processed character by character until a 0 (the end of the string) is encountered.

main:
    ldi ZH, hi8(message)
    ldi ZL, lo8(message)
 
send_loop:
    lpm r18, Z+
    cpi r18, 0
    breq main

The next character can be sent only after the transmitter has taken the previous one. The transmitter is ready for the next byte only when bit UDRE0 in register UCSR0A is set (1). While it is 0, the code keeps polling; once it is 1, the next byte can be written to UDR0:

wait_udre:
    lds r19, UCSR0A
    sbrs r19, UDRE0
    rjmp wait_udre
 
    sts UDR0, r18
    rjmp send_loop
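
For convenience, the fragments of this section can be assembled into one listing. This is a sketch in GNU avr-as syntax that assumes a 16 MHz clock. The bit names UCSZ00 and UCSZ01 follow the ATmega328P datasheet but are not among the declarations shown earlier. The .balign directive is an added precaution that keeps code word-aligned if the message length changes.

```asm
; UART registers live in data space above the 0-63 I/O range: use lds/sts
.equ UCSR0A, 0xC0
.equ UCSR0B, 0xC1
.equ UCSR0C, 0xC2
.equ UBRR0L, 0xC4
.equ UBRR0H, 0xC5
.equ UDR0,   0xC6

.equ TXEN0,  3                ; UCSR0B: transmitter enable
.equ UDRE0,  5                ; UCSR0A: transmit buffer empty
.equ UCSZ00, 1                ; UCSR0C: character size, bit 0
.equ UCSZ01, 2                ; UCSR0C: character size, bit 1

    .org 0x0000
    rjmp reset

message:
    .byte 'H','e','l','l','o',' ','W','o','r','l','d',13,10,0
    .balign 2                 ; keep the code below on a 16-bit boundary

reset:
    ldi r16, hi8(103)         ; baud rate divider for 9600 bps
    sts UBRR0H, r16
    ldi r16, lo8(103)
    sts UBRR0L, r16

    ldi r16, (1 << TXEN0)     ; enable the transmitter only
    sts UCSR0B, r16

    ldi r16, (1 << UCSZ01) | (1 << UCSZ00)   ; 8N1 frame
    sts UCSR0C, r16

main:
    ldi ZH, hi8(message)      ; Z points at the first character
    ldi ZL, lo8(message)

send_loop:
    lpm r18, Z+               ; load next character from flash, advance Z
    cpi r18, 0
    breq main                 ; terminator reached: send the message again

wait_udre:
    lds r19, UCSR0A
    sbrs r19, UDRE0           ; skip the jump once the buffer is empty
    rjmp wait_udre

    sts UDR0, r18             ; hand the character to the transmitter
    rjmp send_loop
```

The program repeats the message endlessly, so a serial monitor set to 9600 bps, 8N1 should show a continuous stream of “Hello World” lines.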

Use GPIO as input

Use GPIO as input with pull-up

Reading analogue values

Reading analogue values is not as straightforward as reading binary ones. The built-in ADC has 10-bit resolution and 6 channels (A0-A5). It also uses a configurable reference voltage, typically 5V.
The low-level, register-based ADC operations produce an ADC value according to the following formula (figure 4), based on the input voltage Vgpio and the reference voltage Vref.

Figure 4: ADC value calculation based on the input voltage and reference voltage
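
For reference, the relation can be written out as below. This follows the single-ended conversion formula from the ATmega328P datasheet. The 2.5 V input is an arbitrary value chosen for the worked example, and the 10-bit result cannot exceed 1023.

```latex
\[
\mathrm{ADC} = \frac{V_{gpio} \cdot 1024}{V_{ref}}, \qquad 0 \le \mathrm{ADC} \le 1023
\]
\[
V_{gpio} = 2.5\ \mathrm{V},\quad V_{ref} = 5\ \mathrm{V}
\;\Rightarrow\;
\mathrm{ADC} = \frac{2.5 \cdot 1024}{5} = 512
\]
```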

Analogue reading uses a complex setup of ADC-related registers as presented in table 5:

Table 5: ADC-related registers used for reading the analogue values of GPIOs
Register Description
ADMUX Selects voltage reference and
en/multiasm/exercisesbook/arduinouno.txt · Last modified: 2026/04/02 23:03 by pczekalski
CC Attribution-Share Alike 4.0 International