Overview - Branch and link (BL) for subroutines

What is it?

Branch and link (BL) is the ARM instruction used to call subroutines (functions). It branches to the subroutine and, at the same time, saves the address of the instruction after the BL in the link register (LR), so the program can return once the subroutine finishes. This lets the processor jump to a different part of the code and come back later, which is essential for organizing code into reusable blocks.

Why it matters

Without BL, programs would have to repeat code instead of reusing it, making them larger and harder to maintain. BL enables efficient function calls and returns, which are fundamental to structured programming and complex software. Because the return address is recorded at the moment of the call, the same subroutine can be called from many places and still return to the right one.

Where it fits

Before learning BL, you should understand basic ARM instructions and how the program counter (PC) works. After mastering BL, you can learn about stack usage for saving registers during subroutine calls, and then about more advanced control flow such as conditional branches and interrupt handling.

Mental Model

Core Idea

Branch and link lets the processor jump to a subroutine while remembering where to return by saving the return address in the link register (LR, register R14).

Think of it like...

Imagine you are reading a book and come across a footnote. You mark the page you are on before flipping to the footnote, so after reading it, you can return exactly where you left off.

┌───────────────┐     ┌───────────────┐
│ Current Code  │────▶│ Subroutine    │
│ (PC points)   │     │ (BL jumps)    │
└──────┬────────┘     └──────┬────────┘
       │                      │
       │ Save return address  │
       │ in Link Register (LR)│
       │                      │
       ◀──────────────────────┘
       Return after subroutine finishes
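The flow in the diagram can be traced with a small Python model. This is only a sketch of the control flow, not of real ARM encodings: the `run` function, the tuple-based "instructions", and the opcode names `NOP`, `BX_LR` and `HALT` are all assumptions made for illustration, and every instruction is taken to occupy 4 bytes as in ARM state.

```python
# Toy model of BL and return. It mimics the control flow only,
# not real ARM instruction encodings.

def run(program, entry):
    """Execute {address: (op, arg)} and return the list of visited addresses."""
    pc, lr = entry, None
    trace = []
    while True:
        op, arg = program[pc]
        trace.append(pc)
        if op == "BL":           # call: remember the next instruction, then jump
            lr = pc + 4
            pc = arg
        elif op == "BX_LR":      # return: copy LR back into PC
            pc = lr
        elif op == "HALT":
            return trace
        else:                    # any other instruction: continue in sequence
            pc += 4

program = {
    0x00: ("NOP", None),
    0x04: ("BL", 0x20),      # call the subroutine at 0x20
    0x08: ("NOP", None),     # execution resumes here after the return
    0x0C: ("HALT", None),
    0x20: ("NOP", None),     # subroutine body
    0x24: ("BX_LR", None),   # return to the caller
}

trace = run(program, 0x00)
print([hex(a) for a in trace])   # ['0x0', '0x4', '0x20', '0x24', '0x8', '0xc']
```

The trace leaves the main sequence at 0x04, visits the subroutine, and comes back at 0x08, which is exactly the "bookmark" behaviour of the footnote analogy.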

Build-Up - 7 Steps

1

Foundation: Understanding the Program Counter

Concept: Introduce the program counter (PC) as the register that holds the address of the next instruction to execute.

The program counter (PC, register R15) tracks which instruction the processor executes next. After each instruction that does not branch, the PC advances automatically by the size of that instruction (4 bytes in ARM state, 2 or 4 bytes in Thumb state). This keeps the program running in sequence.

Result

You know that the PC controls the flow of instructions and moves forward step-by-step.

Understanding the PC is crucial because BL changes the PC to jump to subroutines and must save the return address to continue execution later.
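One detail of 32-bit ARM is worth pinning down before looking at BL: an instruction that reads the PC does not see its own address. In ARM state it sees its address plus 8, and in Thumb state its address plus 4, and branch offsets are measured from that value. A minimal Python sketch of the arithmetic, with hypothetical helper names `next_pc` and `pc_as_read`:

```python
def next_pc(instr_addr, size=4):
    """Address of the following instruction when no branch is taken."""
    return instr_addr + size

def pc_as_read(instr_addr, thumb=False):
    """Value an instruction at instr_addr observes when it reads the PC."""
    return instr_addr + (4 if thumb else 8)

print(hex(next_pc(0x8000)))                  # 0x8004
print(hex(pc_as_read(0x8000)))               # 0x8008
print(hex(pc_as_read(0x8000, thumb=True)))   # 0x8004
```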

2

Foundation: What is a Subroutine?

3

Intermediate: How Branch and Link Works

4

Intermediate: Returning from Subroutines Using LR

5

Intermediate: Limitations of a Single Link Register

6

Advanced: BL in Thumb and ARM Modes

7

Expert: BL Instruction Encoding and Pipeline Effects

Under the Hood

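When BL executes, the processor computes the target address by adding a sign-extended offset, encoded in the instruction, to the PC value the instruction sees (its own address plus 8 in ARM state, plus 4 in Thumb state). It stores the return address, which is the address of the instruction immediately after the BL, in the link register (LR); in Thumb state, bit 0 of LR is also set so that a later BX LR returns to Thumb state. The PC is then updated to the target address, so the processor fetches its next instructions from the subroutine. On return, the value in LR is copied back into the PC and execution resumes after the call. On pipelined cores, a taken branch that was not predicted correctly also discards the instructions already fetched from the fall-through path.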
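The address arithmetic can be made concrete for the ARM-state (A32) encoding, where BL consists of a 4-bit condition, the bits 1011, and a signed 24-bit word offset. The Python sketch below assumes that encoding and the ARM-state rule that the PC reads as the instruction address plus 8; the function names `bl_encode` and `bl_execute` are hypothetical, and Thumb encodings are not covered.

```python
def bl_encode(instr_addr, target, cond=0xE):
    """Encode an A32 BL at instr_addr that calls target (cond 0xE = always)."""
    offset = target - (instr_addr + 8)       # PC reads as instruction address + 8
    if offset % 4 != 0:
        raise ValueError("offset must be a multiple of 4")
    imm24 = offset >> 2                      # offset is stored in words
    if not -(1 << 23) <= imm24 < (1 << 23):
        raise ValueError("target out of range for BL")
    return (cond << 28) | (0b1011 << 24) | (imm24 & 0xFFFFFF)

def bl_execute(instr_addr, word):
    """Return (new_pc, lr) after executing the A32 BL `word` at instr_addr."""
    imm24 = word & 0xFFFFFF
    if imm24 & 0x800000:                     # sign-extend the 24-bit field
        imm24 -= 1 << 24
    new_pc = (instr_addr + 8 + (imm24 << 2)) & 0xFFFFFFFF
    lr = (instr_addr + 4) & 0xFFFFFFFF       # instruction right after the BL
    return new_pc, lr

word = bl_encode(0x8000, 0x8010)
print(hex(word))                                    # 0xeb000002
print([hex(v) for v in bl_execute(0x8000, word)])   # ['0x8010', '0x8004']
```

Note that the return address is the BL's own address plus 4, while the branch offset is measured from the BL's address plus 8.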

Why designed this way?

BL was designed to support subroutine calls efficiently, with minimal instructions and hardware overhead. Using a dedicated link register avoids the need to push return addresses onto the stack for simple calls, which speeds up execution. Relative addressing allows position-independent code, which is important for flexible memory layouts. The alternative, pushing the return address onto the stack on every call, costs a memory write even for leaf subroutines that never need it.

┌───────────────┐      ┌───────────────┐      ┌───────────────┐
│ Current PC    │─────▶│ Calculate     │─────▶│ Save return   │
│ (Instruction) │      │ Target Addr   │      │ address in LR │
└──────┬────────┘      └──────┬────────┘      └──────┬────────┘
       │                      │                      │
       │                      │                      │
       │                      ▼                      ▼
       │               ┌───────────────┐      ┌───────────────┐
       │               │ Update PC to  │◀─────│ Return from   │
       │               │ Target Addr   │      │ Subroutine    │
       │               └───────────────┘      └───────────────┘

Myth Busters - 4 Common Misconceptions

Quick: Does BL automatically save all registers before calling a subroutine? Commit to yes or no.

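Common Belief: BL saves all the processor registers automatically before jumping to the subroutine.

Reality: BL saves only the return address, and only in LR. Every other register keeps whatever value it had; if the caller or the subroutine needs registers preserved, the code must save and restore them itself, typically with PUSH and POP.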

Quick: Can BL jump to any address in memory? Commit to yes or no.

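Common Belief: BL can jump to any address in memory regardless of distance.

Reality: BL encodes its target as a signed offset from the PC, so its reach is limited: about ±32 MB in ARM state and about ±16 MB for the 32-bit Thumb-2 encoding (±4 MB for the original Thumb BL). Targets further away need an indirect branch through a register, such as BLX with a register operand.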

Quick: Does the link register (LR) hold the return address permanently? Commit to yes or no.

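Common Belief: LR holds the return address permanently until the program ends or is reset.

Reality: LR holds the return address only until something overwrites it. The next BL replaces it, so a subroutine that calls another subroutine must save LR first, usually on the stack.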

Quick: Is the return from subroutine always done by a special 'return' instruction? Commit to yes or no.

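Common Belief: ARM processors have a dedicated 'return' instruction to come back from subroutines.

Reality: The 32-bit ARM and Thumb instruction sets have no dedicated return instruction. A subroutine returns by copying LR into the PC, for example with MOV PC, LR or BX LR, or by popping a saved return address straight into the PC. (The 64-bit A64 instruction set does add a RET instruction.)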

Expert Zone

1

BL uses relative addressing which enables position-independent code, crucial for shared libraries and embedded systems.

2

The link register (LR) is register R14. Outside its role as the return-address register, it can be used as a general-purpose register, provided the program carefully saves and restores the return address.

3

Some ARM cores implement branch prediction and pipeline optimizations that affect BL performance, which experts must consider in real-time systems.

When NOT to use

BL is not suitable for targets beyond its offset range (about ±32 MB in ARM state); in such cases, use an indirect branch through a register, such as BLX with a register operand. BL alone is also not enough for nested calls: any subroutine that itself uses BL must save LR, usually on the stack, and restore it before returning.
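The range rule can be sketched as a simple check. The Python below assumes the ARM-state encoding (a signed 24-bit word offset measured from the BL's address plus 8); `a32_bl_reaches` and `call_sequence` are hypothetical helpers, and the register-indirect fallback mirrors the LDR/BLX pattern rather than describing what any particular assembler or linker actually emits.

```python
# Reach of an A32 (ARM-state) BL: signed 24-bit word offset from address + 8.
A32_BL_MIN = -(1 << 25)        # -32 MiB
A32_BL_MAX = (1 << 25) - 4     # +32 MiB minus one word

def a32_bl_reaches(instr_addr, target):
    """True if a BL at instr_addr can branch directly to target."""
    offset = target - (instr_addr + 8)
    return offset % 4 == 0 and A32_BL_MIN <= offset <= A32_BL_MAX

def call_sequence(instr_addr, target, label):
    """Pick a direct BL, or a register-indirect call when the target is too far."""
    if a32_bl_reaches(instr_addr, target):
        return [f"BL {label}"]
    return [f"LDR R12, ={label}", "BLX R12"]

print(call_sequence(0x8000, 0x9000, "near_function"))
# ['BL near_function']
print(call_sequence(0x8000, 0x0800_0000, "far_away_function"))
# ['LDR R12, =far_away_function', 'BLX R12']
```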

Production Patterns

In production ARM code, BL is used for fast subroutine calls with minimal overhead. Compilers generate BL for function calls and manage LR saving/restoring on the stack for nested calls. Hand-written assembly often uses BL combined with stack operations to handle recursion and interrupts.

Connections

Function Calls in High-Level Languages

BL is the instruction a compiler typically emits for a direct function call in a compiled language like C.

Understanding BL helps demystify how high-level function calls translate into machine instructions and how return addresses are managed.

Stack and Call Stack

BL relies on the link register for return addresses, but complex programs use the stack to save LR and other registers during nested calls.

Knowing BL's limitations clarifies why the call stack is essential for managing multiple nested function calls safely.

Human Memory and Task Switching

BL's saving and restoring of return addresses is similar to how humans remember where they left off when switching tasks.

This cross-domain connection shows how managing context and return points is a universal concept in both computing and human cognition.

Common Pitfalls

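#1 Overwriting LR without saving it before nested calls

Wrong approach:

    ; inside a subroutine that makes calls of its own
    BL subroutine1      ; overwrites LR with the address of the next instruction
    BL subroutine2      ; overwrites LR again
    MOV PC, LR          ; LR now points here, not at the original caller

Correct approach:

    PUSH {LR}           ; save the caller's return address
    BL subroutine1
    BL subroutine2
    POP {LR}            ; restore it (POP {PC} would combine this with the return)
    MOV PC, LR          ; return to the caller

Root cause: Assuming LR is preserved automatically leads to losing the original return address during nested calls.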
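The effect of a clobbered LR can be reproduced in a small Python model. As before, this is a sketch of control flow only: the `run` function, the opcode names `PUSH_LR`, `POP_PC`, `BX_LR` and `HALT`, and the step budget used to detect a program that never finishes are all assumptions for illustration.

```python
def run(program, entry, max_steps=20):
    """Run {address: (op, arg)}; return (trace, halted)."""
    pc, lr, stack, trace = entry, None, [], []
    for _ in range(max_steps):
        op, arg = program[pc]
        trace.append(pc)
        if op == "BL":            # call: LR <- next instruction, PC <- target
            lr, pc = pc + 4, arg
        elif op == "BX_LR":       # return through LR
            pc = lr
        elif op == "PUSH_LR":     # save the return address on the stack
            stack.append(lr)
            pc += 4
        elif op == "POP_PC":      # return through the saved address
            pc = stack.pop()
        elif op == "HALT":
            return trace, True
        else:
            pc += 4
    return trace, False           # step budget exhausted, HALT never reached

broken = {
    0x00: ("BL", 0x10),       # main calls outer
    0x04: ("HALT", None),
    0x10: ("BL", 0x20),       # outer calls leaf, overwriting LR with 0x14
    0x14: ("BX_LR", None),    # "returns" to 0x14 itself, forever
    0x20: ("BX_LR", None),    # leaf
}
fixed = {
    0x00: ("BL", 0x10),
    0x04: ("HALT", None),
    0x10: ("PUSH_LR", None),  # outer saves main's return address first
    0x14: ("BL", 0x20),
    0x18: ("POP_PC", None),   # and returns through the saved copy
    0x20: ("BX_LR", None),
}

broken_trace, broken_halted = run(broken, 0x00)
fixed_trace, fixed_halted = run(fixed, 0x00)
print(broken_halted, hex(broken_trace[-1]))        # False 0x14
print(fixed_halted, [hex(a) for a in fixed_trace])
# True ['0x0', '0x10', '0x14', '0x20', '0x18', '0x4']
```

In the broken program, the inner BL leaves LR pointing at the outer subroutine's own return instruction, so the return loops on itself and never gets back to main.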

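#2 Using BL to jump beyond its range

Wrong approach:

    BL far_away_function    ; target too far for BL offset

Correct approach:

    LDR R12, =far_away_function     ; load the full 32-bit address
    BLX R12                         ; call through the register; LR is still set

Root cause: Not understanding BL's limited relative offset range leads to calls that cannot be encoded or that land at the wrong address.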

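#3 Ending a subroutine without restoring PC from LR

Wrong approach:

    BL subroutine       ; call
    ; ... inside subroutine: no return instruction,
    ; so execution runs on into whatever code follows

Correct approach:

    BL subroutine       ; call
    ; ... last instruction of subroutine:
    MOV PC, LR          ; return to the caller (BX LR also works and supports ARM/Thumb interworking)

Root cause: Forgetting to load LR back into PC prevents returning to the caller.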

Key Takeaways

Branch and link (BL) is an ARM instruction that calls subroutines by jumping to their address and saving the return address in the link register (LR).

The link register holds the return address temporarily and must be saved manually if subroutines call other subroutines.

Returning from a subroutine is done by moving the LR back into the program counter (PC), resuming execution after the call.

BL uses relative addressing with limited range and different encodings in ARM and Thumb modes, affecting how far it can jump.

Understanding BL's mechanism and limitations is essential for writing correct and efficient ARM assembly, especially in nested or recursive function calls.