This content was implemented under the following project:
Consortium Partners
Erasmus+ Disclaimer
This project has been co-funded by the European Union.
Views and opinions expressed are, however, those of the author or authors only and do not necessarily reflect those of the European Union or the Foundation for the Development of the Education System. Neither the European Union nor the entity providing the grant can be held responsible for them.
Copyright Notice
This content was created by the MultiASM Consortium 2023–2026.
The content is copyrighted and distributed under the CC BY-NC Creative Commons Licence and is free for non-commercial use.
In case of commercial use, please get in touch with a MultiASM Consortium representative.
This manual is intended to help students bootstrap into assembler programming across a variety of applications. It presents practical exercises in a hands-on lab format, often also covering toolchain configuration. Some sections present hardware details, such as the remote IoT and remote ARM laboratories. Others assume the student owns or has access to a PC and can install software.
ARM processors are omnipresent, ranging from simple IoT devices to laptops, notebooks, and workstations.
For this reason, we had to select one technology to use for a practical introduction and experimentation.
To present both hardware interfacing and programming, the obvious choice is the Raspberry Pi. The following chapters present laboratory details and scenarios.
Follow the links below to the lab descriptions and scenarios:
Assembler programming for embedded systems uses an integrated solution for IoT laboratories, namely VREL NextGen Software.
Users connect to the system using a web browser and develop software in the browser, compile it and inject it into the microcontroller, all remotely. Next, they use a web camera to observe the results.
The following chapters present more data on how to use the VREL NextGen remote labs system.
The following chapter assumes that you are familiar with basic assembler operations for AVR microcontrollers. Below, we explain the most important constructs and assembler instructions for manipulating the GPIOs of the Arduino Uno (figure 2), which is based on the ATmega328P microcontroller.
The Arduino Uno exposes a number of GPIOs that can serve as binary inputs and outputs or analogue inputs, and many of them provide advanced, hardware-accelerated functions, such as UART, SPI, I2C, PWM, and ADC. In fact, not all pins on the development board are truly “general-purpose”: there is no internal multiplexer, so functions such as UART, I2C, SPI, PWM and ADC are bound to particular GPIOs and cannot be remapped.
At the programming level, GPIOs are grouped into 3 “ports” (figure 3), and this is how you access them:
A bit in the port corresponds to a single GPIO pin, e.g. bit 5 (6th, zero-ordered) of PortB corresponds to GPIO D13 and is connected to the built-in LED.
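As a quick host-side illustration (Python, not code that runs on the MCU), the pin-to-port correspondence can be sketched as below. It assumes the standard Arduino Uno R3 layout, in which D0-D7 belong to Port D, D8-D13 to Port B and A0-A5 to Port C; the helper names `port_bit` and `bit_mask` are hypothetical.

```python
def port_bit(pin):
    """Map an Arduino Uno pin name such as 'D13' or 'A5' to (port letter, bit number)."""
    kind, number = pin[0].upper(), int(pin[1:])
    if kind == "D" and 0 <= number <= 7:
        return ("D", number)          # D0-D7  -> PD0-PD7
    if kind == "D" and 8 <= number <= 13:
        return ("B", number - 8)      # D8-D13 -> PB0-PB5
    if kind == "A" and 0 <= number <= 5:
        return ("C", number)          # A0-A5  -> PC0-PC5
    raise ValueError(f"unknown Arduino Uno pin: {pin}")


def bit_mask(bit):
    """Return the 8-bit mask with only the given bit set, as in `1 << PB5`."""
    return 1 << bit


port, bit = port_bit("D13")
print(port, bit, hex(bit_mask(bit)))  # B 5 0x20
```

The mask 0x20 is the same value the assembler computes for an expression such as `1 << 5` when bit 5 of a port register is configured.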
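Each port has three 8-bit registers assigned: DDRx (data direction), PORTx (output / pull-up enable) and PINx (input pins), where x is B, C or D.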
There is a set of assembler instructions that operate on Ports (I/O registers), as shown in table 1.
These instructions operate directly on the hardware registers and are roughly 50 times faster than digitalRead, digitalWrite, and other equivalent functions in C++.
| Instruction | Description |
|---|---|
| SBI | Set bit in I/O register |
| CBI | Clear bit in I/O register |
| SBIS | Skip if bit in I/O register is set (1) |
| SBIC | Skip if bit in I/O register is clear (0) |
| IN | Read hardware register to the general-purpose register (R0-R31) |
| OUT | Write the general-purpose register to the hardware register |
| ANDI | Logical AND of a general-purpose register with a constant (masks bits) |
| ORI | Logical OR of a general-purpose register with a constant (sets bits) |
A common scenario is to first configure the GPIO as an input or an output (using the corresponding DDRx register), and then set it (SBI), reset it (CBI), check it (SBIS, SBIC), read the whole register (IN) or write the whole register (OUT).
IN and OUT instructions operate on whole 8-bit registers rather than on single bits. These are general-purpose instructions, covering the whole range of I/O registers (0-63), beyond the aforementioned DDRx, PORTx and PINx registers.
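Because IN and OUT move whole bytes, changing a single bit through them is a read-modify-write sequence: read the register, combine it with a mask using ORI or ANDI, and write it back. The host-side Python sketch below only models that bit arithmetic; the starting register value is a hypothetical example and the function names `ori` and `andi` are illustrative.

```python
PB5 = 5  # bit position of the built-in LED in PORTB (from this manual)


def ori(register, constant):
    """Model of ORI: OR an 8-bit register value with a constant (sets bits)."""
    return (register | constant) & 0xFF


def andi(register, constant):
    """Model of ANDI: AND an 8-bit register value with a constant (masks bits)."""
    return register & constant & 0xFF


portb = 0b00000001                         # value an IN would read (hypothetical)
portb = ori(portb, 1 << PB5)               # set bit 5, keep the rest
print(f"{portb:08b}")                      # 00100001
portb = andi(portb, ~(1 << PB5) & 0xFF)    # clear bit 5 with mask 0xDF
print(f"{portb:08b}")                      # 00000001
```

SBI and CBI achieve the same single-bit change in one instruction, which is why they are preferred when only one bit has to change.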
Template for the assembler code
Using plain assembler (not C++ combined with assembler) requires a specific application layout: the program must be located (loaded) in memory exactly at address 0x0000.

    .org 0x0000
        rjmp start
    start:
        ...

It is common practice to start with rjmp (relative jump), which makes it easier to place data before the start of the code. It is also good “embedded” practice to keep it even if it does not really jump anywhere, as in this example. Omitting it may cause trouble later, when you decide to declare some data.
Core I/O registers and their IDs
To operate on I/O registers, the developer must either include a library with definitions or (when programming in pure assembler) declare them on their own.
Table 2 below lists the I/O registers used to control GPIOs (Ports B, C and D) and their addresses:
| Name | Address (I/O) | Description |
|---|---|---|
| PINB | 0x03 | Input pins register (Port B) |
| DDRB | 0x04 | Data direction register (Port B) |
| PORTB | 0x05 | Output register/pull-up enable (Port B) |
| PINC | 0x06 | Input pins register (Port C) |
| DDRC | 0x07 | Data direction register (Port C) |
| PORTC | 0x08 | Output register/pull-up enable (Port C) |
| PIND | 0x09 | Input pins register (Port D) |
| DDRD | 0x0A | Data direction register (Port D) |
| PORTD | 0x0B | Output register/pull-up enable (Port D) |
The easiest way is to declare constants (converted to values at assembly time) and insert them before the code starts. Note that they do not occupy memory, so they do not disturb code placement or proper execution:

    ; I/O registers
    .equ PINB, 0x03
    .equ DDRB, 0x04
    .equ PORTB, 0x05
    .equ PINC, 0x06
    .equ DDRC, 0x07
    .equ PORTC, 0x08
    .equ PIND, 0x09
    .equ DDRD, 0x0A
    .equ PORTD, 0x0B
    ; your code starts here
    .org 0x0000
        rjmp start
    start:
        ...

.equ is converted into a value and substituted into the code at assembly time; thus it does not exist in the final binary.
Depending on the assembler, the syntax is either .equ PINB, 0x03 (GNU assembler, as used here) or .equ PINB = 0x03 (AVRASM).
Below are sections representing common usage scenarios for GPIO management:
Use GPIO as output
In this scenario, we use a GPIO as an output. The simplest approach is to use the built-in LED, which gives instantly observable results.
The built-in LED is connected to GPIO13 (D13) and is controlled via PortB (bit 5, zero-based; see figure 3). The LED is on when GPIO13 is LOW (0) and off when GPIO13 is HIGH (1).
It is also convenient to declare a bit number representing the built-in LED position in PortB, so instead of using a number, we can use an identifier, such as .equ PB5,5.
This code flashes the built-in LED.
    .equ DDRB, 0x04
    .equ PORTB, 0x05
    .equ PB5, 5              ; PB5 is GPIO 13, and it is a built-in LED
    .org 0x0000
        rjmp RESET
Step 1 - configure GPIO13 (PortB, bit 5) as output, using DDRB register:
    RESET:
        ldi r16, 1 << PB5    ; Set bit 5
        out DDRB, r16        ; Set PB5 as output

Step 2 - switch the LED on and off in a loop by setting and clearing PortB's bit 5 directly with sbi and cbi:

    LOOP:
        sbi PORTB, PB5       ; Turn LED off
        rcall delay
        cbi PORTB, PB5       ; Turn LED on
        rcall delay
        rjmp LOOP
The delay is implemented with three nested busy loops; its duration follows from counting the CPU cycles used to execute the following algorithm:

    delay:
        ldi r20, 43          ; Outer loop
    outer_loop:
        ldi r18, 250         ; Mid loop
    mid_loop:
        ldi r19, 250         ; Inner loop
    inner_loop:
        dec r19
        brne inner_loop
        dec r18
        brne mid_loop
        dec r20
        brne outer_loop
        ret

The instructions used in those loops are listed in table 3, along with the number of cycles each takes:
| Instruction | Cycles |
|---|---|
| ldi | 1 |
| dec | 1 |
| brne | 2 (taken), 1 (not taken) |
| ret | 4 |
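The inner loop runs exactly 250 times. Thus, the exact number of cycles used is calculated as:
- 1 cycle to initialise the counter (ldi r19,250),
- 249 passes of 3 cycles each, where the branch is taken (dec r19 is 1 cycle, brne is 2 cycles),
- 2 cycles for the last pass, where the branch is not taken (dec r19 is 1 cycle, brne is 1 cycle).
The total for the inner loop is then 1 + 747 + 2 = 750 clock cycles of the ATmega328P MCU.
The mid loop also runs 250 times. Each of the 250 mid-loop passes uses:
- 750 cycles for the inner loop (including ldi r19,250, which re-initialises it),
- 1 cycle to decrement the mid-loop counter (dec r18),
- 2 cycles for brne when the branch is taken (1 cycle on the last pass).
Thus, including 1 cycle for ldi r18,250, the mid loop consumes 1 + 249 × 753 + 752 = 188250 cycles.
The outer loop runs 43 times. It runs the mid loop 43 times, and the exact number of cycles used is:
- 1 cycle (ldi r20,43 initialises the outer loop),
- 42 passes of 188253 cycles each (188250 for the mid loop, dec r20 is 1 cycle, brne taken is 2 cycles),
- 188252 cycles for the last pass, where brne is not taken (1 cycle).
The final cost of the loops is 1 + 42 × 188253 + 188252 = 8094879 cycles.
An extra 4 cycles are used by the final ret.
Thus, the total cost of the delay section is 8 094 883 clock cycles.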
The ATmega328P runs at 16 MHz; thus, each cycle takes 1/16000000 of a second.
Overall, the algorithm's execution time is 8094883/16000000 s, which is about 0.5 s (506 ms, to be precise). Not perfect, but good enough for this purpose. Implementing delays this way is straightforward but also troublesome, and there are better solutions, such as using timers.
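The cycle count can be double-checked on a PC with a few lines of Python. This is only a host-side sketch of the arithmetic above, using the cycle costs from table 3; the helper name `loop_cycles` is hypothetical, and the 3 cycles of the calling rcall are not included, as in the text.

```python
F_CPU = 16_000_000  # Hz, clock of the ATmega328P on the Arduino Uno


def loop_cycles(count, body_cycles):
    """Cycles for: ldi counter; then `count` passes of {body; dec; brne}."""
    taken_pass = body_cycles + 1 + 2      # dec (1) + brne taken (2)
    last_pass = body_cycles + 1 + 1       # dec (1) + brne not taken (1)
    return 1 + (count - 1) * taken_pass + last_pass   # leading 1 is the ldi


inner = loop_cycles(250, 0)       # 750
mid = loop_cycles(250, inner)     # 188250
outer = loop_cycles(43, mid)      # 8094879
total = outer + 4                 # plus ret: 8094883

print(inner, mid, outer, total)
print(round(total / F_CPU * 1000), "ms")   # 506 ms
```

Changing the three loop constants in the calls shows directly how the delay scales, which is handy when a different blink period is needed.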
Use serial port for tracing
The Arduino Uno has no direct debugging capabilities, such as step-by-step execution. Tracing can be used instead to monitor program execution. There is no rich user interface, such as a display, so one of the tracing methods is sending information via the serial port. It can then be visualised on a developer's computer using any serial port monitor tool.
UART uses two pins: RX (D0, PD0) to receive data and TX (D1, PD1) to transmit data.
While it is possible to implement a full serial port protocol using GPIOs alone (so-called soft-serial), here we will use a hardware-implemented UART with several registers, as shown in table 4.
| Register | Address | Official Name | Common Name | Bits | Description |
|---|---|---|---|---|---|
| UDR0 | 0xC6 | USART I/O Data Register | Data register / TX-RX buffer | 7:0 | Write to transmit data, read to receive data |
| UCSR0A | 0xC0 | USART Control and Status Register A | Status register | RXC0, TXC0, UDRE0, FE0, DOR0, UPE0, U2X0, MPCM0 | Status flags (ready, complete, errors, speed mode) |
| UCSR0B | 0xC1 | USART Control and Status Register B | Control register | RXCIE0, TXCIE0, UDRIE0, RXEN0, TXEN0, UCSZ02, RXB80, TXB80 | Enable TX/RX, interrupts, 9-bit mode |
| UCSR0C | 0xC2 | USART Control and Status Register C | Configuration / Frame register | UMSEL01:0, UPM01:0, USBS0, UCSZ01:0, UCPOL0 | Frame format (mode, parity, stop bits, data size) |
| UBRR0L | 0xC4 | USART Baud Rate Register Low | Baud rate register (low) | 7:0 | Lower byte of baud rate divider |
| UBRR0H | 0xC5 | USART Baud Rate Register High | Baud rate register (high) | 3:0 | Upper byte of baud rate divider |
In the example below, we will use TX only to send data from the MCU to the developer's PC. Let's start with some declarations for registers used during serial transmission and flags:
    .equ UBRR0H, 0xC5
    .equ UBRR0L, 0xC4
    .equ UCSR0A, 0xC0
    .equ UCSR0B, 0xC1
    .equ UCSR0C, 0xC2
    .equ UDR0, 0xC6
    .equ TXEN0, 3    ; bit 3 of UCSR0B enables or disables the transmitter
    .equ UDRE0, 5    ; bit 5 of UCSR0A indicates the transmit buffer is empty

Then let's define the message “Hello World”. The trailing bytes 13 and 10 are the Windows-standard end-of-line sequence, and the string is 0-terminated.

    .org 0x0000
        rjmp reset
    message:
        .byte 'H','e','l','l','o',' ','W','o','r','l','d',13,10,0
The following section initialises the serial port for 9600 bps. It starts at the reset label, which is the target of the initial rjmp:

    reset:
        ldi r16, hi8(103)
        sts UBRR0H, r16
        ldi r16, lo8(103)
        sts UBRR0L, r16
The value 103 is loaded into the UBRR register pair: the high byte into UBRR0H and the low byte into UBRR0L. The divider can be calculated according to the following formula 4:

$$UBRR = \frac{F_{cpu}}{16 \cdot baud} - 1$$

where Fcpu is 16 MHz. For 9600 bps this gives 103.17, which is rounded to 103. Note that the resulting rate is not exactly 9600 bps but rather ~9615 bps. A tolerance of up to 2% is acceptable (here, 0.16%).
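The numbers above can be reproduced with a short host-side Python sketch. It assumes the normal-speed asynchronous mode of the USART (U2X0 = 0), for which the divider relation is UBRR = Fcpu / (16 × baud) - 1; the function names are hypothetical.

```python
F_CPU = 16_000_000  # Hz


def ubrr_for(baud, f_cpu=F_CPU):
    """Nearest integer divider for the requested baud rate (normal-speed mode assumed)."""
    return round(f_cpu / (16 * baud)) - 1


def actual_baud(ubrr, f_cpu=F_CPU):
    """Baud rate actually produced by a given divider value."""
    return f_cpu / (16 * (ubrr + 1))


ubrr = ubrr_for(9600)
rate = actual_baud(ubrr)
error_percent = (rate - 9600) / 9600 * 100

print(ubrr, ubrr >> 8, ubrr & 0xFF)              # 103 0 103 (value, high byte, low byte)
print(round(rate, 2), round(error_percent, 2))   # 9615.38 0.16
```

The high and low bytes printed here correspond to what hi8(103) and lo8(103) load into UBRR0H and UBRR0L.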
Next step is to enable UART:
    ldi r16, (1 << TXEN0)
    sts UCSR0B, r16
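and configure the frame format (8 data bits, no parity, 1 stop bit, 8N1 for short, which is the most common case):

    ldi r16, (1 << 2) | (1 << 1)   ; UCSZ01 (bit 2) and UCSZ00 (bit 1): 8 data bits
    sts UCSR0C, r16                ; parity and stop-bit fields stay 0: 8N1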
Now it is time to send the string to the transmitter, byte by byte. A pointer to the string is loaded into the Z register (ZH and ZL, respectively) using ldi. The string is processed character by character until a 0 (the end of the string) is encountered.

    main:
        ldi ZH, hi8(message)
        ldi ZL, lo8(message)
    send_loop:
        lpm r18, Z+
        cpi r18, 0
        breq main            ; end of string: start again from the first character
The next character can be sent only after the previous one has been taken over by the transmitter. The transmitter is ready for the next byte only when bit UDRE0 in register UCSR0A is set (1). While it is 0, the program has to wait; once it is 1, the next byte can be written to UDR0:

    wait_udre:
        lds r19, UCSR0A
        sbrs r19, UDRE0      ; skip the jump below when UDRE0 is set
        rjmp wait_udre
        sts UDR0, r18
        rjmp send_loop
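To see why the polling loop is needed, it helps to compare the time one character spends on the wire with the speed of the CPU. The host-side Python sketch below assumes an 8N1 frame (1 start bit, 8 data bits, 1 stop bit), the divider value 103 used above and normal-speed mode (U2X0 = 0); the function name `frame_cycles` is hypothetical.

```python
F_CPU = 16_000_000  # Hz


def frame_cycles(ubrr, data_bits=8, stop_bits=1):
    """CPU cycles needed to shift out one frame (start + data + stop bits)."""
    cycles_per_bit = 16 * (ubrr + 1)      # normal-speed mode (U2X0 = 0), assumed
    return cycles_per_bit * (1 + data_bits + stop_bits)


cycles = frame_cycles(103)
print(cycles)                             # 16640
print(cycles / F_CPU * 1000, "ms")        # roughly 1.04 ms per character
```

One character therefore occupies the transmitter for about 16640 CPU cycles, while fetching the next byte takes only a handful of cycles, so without waiting for UDRE0 the data register would be overwritten long before the previous byte has left.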
Use GPIO as input
Use GPIO as input with pull-up
Reading analogue values is not as straightforward as reading binary ones.
The built-in ADC has 10-bit resolution and 6 channels (A0-A5). It also uses a configurable reference voltage, typically 5 V.
The low-level, register-based ADC operations use the following formula to obtain an ADC value (figure 5), based on the input voltage Vgpio and the reference voltage Vref.
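As a minimal host-side sketch in Python, the conversion can be modelled as below. It assumes the usual relation for a 10-bit converter, ADC = Vgpio × 1024 / Vref, truncated to an integer and limited to the range 0-1023; the function names are hypothetical and the real converter also adds noise and quantisation error that this model ignores.

```python
def adc_reading(v_gpio, v_ref=5.0, bits=10):
    """Ideal ADC result for input voltage v_gpio and reference v_ref (both in volts)."""
    levels = 1 << bits                        # 1024 steps for a 10-bit ADC
    raw = int(v_gpio * levels / v_ref)        # truncate to an integer code
    return max(0, min(levels - 1, raw))       # clamp to 0..1023


def adc_to_voltage(raw, v_ref=5.0, bits=10):
    """Convert an ADC code back to the voltage it represents."""
    return raw * v_ref / (1 << bits)


print(adc_reading(2.5))        # 512
print(adc_reading(5.0))        # 1023 (clamped to the largest code)
print(adc_to_voltage(512))     # 2.5
```

With a 5 V reference, one step corresponds to 5 / 1024 V, that is about 4.9 mV.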
Analogue reading uses a complex setup of ADC-related registers as presented in table 5:
| Register | Description |
|---|---|
| ADMUX | Selects voltage reference and |
Each laboratory node is equipped with an Arduino Uno R3 development board, based on the ATmega328P MCU. It also has two extension boards:
There are 10 laboratory nodes. They can be used independently, but for collaboration, nodes are interconnected symmetrically, with GPIOs described in the hardware reference section below.
Table 6 lists all hardware components and their details. Note that some elements are physically present, but their use is not supported via the remote lab, e.g., buttons and a buzzer.
The node is depicted in figure 6, and its interface schematic is presented in figure 7. The schematic presents only the components used in scenarios and accessible via the VREL NextGen environment (controllable and observable via video stream), omitting unused components such as buttons, a buzzer, and a potentiometer.
| Component ID | Component | Hardware Details (controller) | Control method | GPIOs (as mapped to the Arduino Uno) | Remarks |
|---|---|---|---|---|---|
| D1 | LED (red) | direct via GPIO | binary (0→on, 1→off) | GPIO13 | |
| D2 | LED (red) | direct via GPIO | binary (0→on, 1→off) | GPIO12 | |
| D3 | LED (red) | direct via GPIO | binary (0→on, 1→off) | GPIO11 | |
| D4 | LED (red) | direct via GPIO | binary (0→on, 1→off) | GPIO10 | shared with interconnection with another module |
| LED4 | 4x 7-segment display | indirect, via two 74HC575 registers | serial load to 2 registers, daisy-chained | GPIO8 - serial input of the controller; GPIO7 - shift data internally on the rising edge (write next bit and shift data in serial); GPIO4 - reset display buffer | |
Devices (laboratory nodes) are interconnected in pairs, so it is possible to work in groups and implement scenarios involving more than one device:
Interconnections are symmetrical, so that device 1 can send data to device 2 and vice versa (similar to serial communication). Note that analogue inputs are also involved in the interconnection interface. See figure 8 for details.
The in-series resistors protect the Arduino boards' outputs from excessive current when both pins are configured as outputs with opposite logic states.
The capacitors on the analogue lines filter the PWM signal, providing a stable voltage for the analogue-to-digital converter to measure.
| Arduino Uno pin name | AVR pin name | Alternate function | Comment |
|---|---|---|---|
| D2 | PD2 | INT0 | Interrupt input |
| D5 | PD5 | T1 | Timer/counter input |
| D6 | PD6 | OC0A | PWM output to generate analogue voltage |
| D9 | PB1 | OC1A | Digital output / Timer output |
| D10 | PB2 | OC1B | Digital output / Timer output |
| A5 | PC5 | ADC5 | Analogue input |
Such a connection makes it possible to implement a variety of scenarios: