
Branch and link (BL) for subroutines in ARM Architecture - Deep Dive

Overview - Branch and link (BL) for subroutines
What is it?
Branch and link (BL) is an instruction in ARM processors used to call subroutines or functions. It saves the address of the next instruction so the program can return after the subroutine finishes. This allows the processor to jump to a different part of the code and come back later. It is essential for organizing code into reusable blocks.
Why it matters
Without BL, programs would have to repeat code instead of reusing it, making them larger and harder to maintain. BL enables efficient function calls and returns, which are fundamental for structured programming and complex software. It also allows the processor to keep track of where to return after a subroutine, preventing errors and crashes.
Where it fits
Before learning BL, you should understand basic ARM instructions and how the program counter (PC) works. After mastering BL, you can learn about stack usage for saving registers during subroutine calls and more advanced control flow topics such as conditional branches and interrupts.
Mental Model
Core Idea
Branch and link lets the processor jump to a subroutine while remembering where to return by saving the return address in a special register.
Think of it like...
Imagine you are reading a book and come across a footnote. You mark the page you are on before flipping to the footnote, so after reading it, you can return exactly where you left off.
┌───────────────┐     ┌───────────────┐
│ Current Code  │────▶│ Subroutine    │
│ (PC points)   │     │ (BL jumps)    │
└──────┬────────┘     └──────┬────────┘
       │                      │
       │ Save return address  │
       │ in Link Register (LR)│
       │                      │
       ◀──────────────────────┘
       Return after subroutine finishes
Build-Up - 7 Steps
1. Foundation: Understanding the Program Counter
Concept: Introduce the program counter (PC) as the register that holds the address of the next instruction to execute.
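The program counter (PC) in ARM processors tracks which instruction the processor fetches next. After each instruction that does not branch, the PC advances to the following instruction automatically, which keeps the program running in sequence. Because of the classic ARM pipeline, code that reads the PC sees the address of the current instruction plus 8 in ARM state (plus 4 in Thumb state).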
Result
You know that the PC controls the flow of instructions and moves forward step-by-step.
Understanding the PC is crucial because BL changes the PC to jump to subroutines and must save the return address to continue execution later.
2. Foundation: What is a Subroutine?
Concept: Explain subroutines as reusable blocks of code that perform specific tasks and can be called from different places.
A subroutine is a set of instructions designed to perform a particular job. Instead of writing the same code multiple times, you write it once as a subroutine and call it whenever needed. This makes programs shorter and easier to manage.
Result
You understand that subroutines help organize code and avoid repetition.
Knowing what subroutines are helps you see why the processor needs a way to jump to them and return back, which BL provides.
3. Intermediate: How Branch and Link Works
Before reading on: do you think BL saves the return address automatically or do you have to save it manually? Commit to your answer.
Concept: BL instruction jumps to a subroutine and automatically saves the return address in the link register (LR).
When the processor executes BL, it stores the address of the next instruction (after BL) into the link register (LR). Then it sets the PC to the subroutine's address, effectively jumping there. After the subroutine finishes, the program can use LR to return to the saved address.
Result
The processor jumps to the subroutine and remembers where to come back without extra instructions.
Knowing that BL automatically saves the return address in LR simplifies subroutine calls and prevents errors from manual saving.
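The two effects of BL described above can be sketched as a toy Python model. This is only an illustration, not an emulator: the register dictionary, the function name `branch_and_link`, and the example addresses are assumptions made for this sketch, and every instruction is assumed to be 4 bytes long, as in ARM state.

```python
# Toy model of BL in ARM state (illustrative only, not an emulator).
# Assumption: every instruction is 4 bytes long.

def branch_and_link(regs, bl_address, target):
    """Apply the two effects of a BL located at bl_address."""
    regs["LR"] = bl_address + 4   # address of the instruction after the BL
    regs["PC"] = target           # execution continues in the subroutine
    return regs

regs = {"PC": 0x8000, "LR": 0}
branch_and_link(regs, bl_address=0x8000, target=0x9000)
print(hex(regs["LR"]), hex(regs["PC"]))  # 0x8004 0x9000
```

Both updates happen as part of the single BL instruction, which is why the caller needs no extra instruction to record where to come back to.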
4. Intermediate: Returning from Subroutines Using LR
Before reading on: do you think the processor returns from a subroutine by using the PC or the LR? Commit to your answer.
Concept: The link register (LR) holds the return address, which is loaded back into the PC to continue execution after the subroutine.
When the subroutine completes, it executes a return instruction that copies the LR value back into the PC. The usual form is BX LR; MOV PC, LR also works in ARM-only code but does not switch between ARM and Thumb state. Either way, the processor jumps back to the instruction following the original BL call.
Result
The program resumes exactly where it left off before the subroutine call.
Understanding the LR's role in returning ensures you grasp how control flow is maintained across subroutine calls.
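A complete call and return can be traced with a small toy interpreter. This is a sketch under stated assumptions: the program is a hypothetical address-to-instruction table, instructions are 4 bytes each, `RET` stands in for a return such as BX LR, and `ADD1` and `HALT` are made-up placeholder operations that exist only in this example.

```python
# Toy interpreter: each address maps to a (mnemonic, operand) pair.
# "RET" models copying LR into PC; "ADD1" and "HALT" are placeholders.

def run(program, start):
    regs = {"PC": start, "LR": 0, "R0": 0}
    trace = []
    while True:
        addr = regs["PC"]
        op, arg = program[addr]
        trace.append(addr)
        if op == "BL":
            regs["LR"] = addr + 4      # remember the return address
            regs["PC"] = arg           # jump to the subroutine
        elif op == "RET":
            regs["PC"] = regs["LR"]    # return: copy LR into PC
        elif op == "ADD1":
            regs["R0"] += 1
            regs["PC"] = addr + 4      # ordinary sequential step
        elif op == "HALT":
            return regs, trace

program = {
    0x00: ("BL", 0x10),     # main: call the subroutine at 0x10
    0x04: ("ADD1", None),   # executed after the subroutine returns
    0x08: ("HALT", None),
    0x10: ("ADD1", None),   # subroutine body
    0x14: ("RET", None),    # subroutine return
}

regs, trace = run(program, 0x00)
print([hex(a) for a in trace])  # ['0x0', '0x10', '0x14', '0x4', '0x8']
print(regs["R0"])               # 2
```

The trace shows execution leaving address 0x0, running the subroutine at 0x10 and 0x14, and resuming at 0x4, the instruction right after the BL.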
5. Intermediate: Limitations of Single Link Register
Before reading on: can BL handle nested subroutine calls without extra steps? Commit to your answer.
Concept: Since LR holds only one return address, nested or recursive calls require saving LR elsewhere to avoid overwriting it.
If a subroutine calls another subroutine, the LR value will be overwritten by the new BL instruction. To handle this, the program must save LR on the stack or another register before calling another subroutine, then restore it before returning.
Result
You learn that managing LR is essential for nested or recursive subroutine calls.
Knowing LR's single-slot nature explains why stack usage is critical in real programs for preserving return addresses.
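The effect of a nested call on LR can be reproduced in a toy Python model. This is a sketch, not real ARM code: the address tables are hypothetical, instructions are assumed to be 4 bytes, and the mnemonics `PUSH_LR`, `POP_PC`, `BX_LR`, and `HALT` are simplified stand-ins for PUSH {LR}, POP {PC}, BX LR, and the end of the program. A step limit is included so the broken version can be detected instead of looping forever.

```python
# Toy model: compare a nested call with and without saving LR on a stack.

def run(program, start, max_steps=50):
    regs = {"PC": start, "LR": 0}
    stack = []
    for _ in range(max_steps):
        addr = regs["PC"]
        op, arg = program[addr]
        if op == "BL":
            regs["LR"], regs["PC"] = addr + 4, arg   # overwrites LR
        elif op == "PUSH_LR":
            stack.append(regs["LR"])                 # save return address
            regs["PC"] = addr + 4
        elif op == "POP_PC":
            regs["PC"] = stack.pop()                 # return via saved address
        elif op == "BX_LR":
            regs["PC"] = regs["LR"]                  # return via LR
        elif op == "HALT":
            return "halted"
    return "stuck"

# outer calls inner without saving LR: outer's own return address is lost.
broken = {
    0x00: ("BL", 0x10), 0x04: ("HALT", None),        # main
    0x10: ("BL", 0x20), 0x14: ("BX_LR", None),       # outer
    0x20: ("BX_LR", None),                           # inner
}

# outer saves LR before the nested call and returns by popping it into PC.
fixed = {
    0x00: ("BL", 0x10), 0x04: ("HALT", None),        # main
    0x10: ("PUSH_LR", None), 0x14: ("BL", 0x20),     # outer
    0x18: ("POP_PC", None),
    0x20: ("BX_LR", None),                           # inner
}

print(run(broken, 0x00))  # stuck
print(run(fixed, 0x00))   # halted
```

In the broken version, the inner BL replaces LR with 0x14, so the BX_LR at 0x14 keeps jumping to itself. In the fixed version, the original return address 0x04 survives on the stack.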
6. Advanced: BL in Thumb and ARM Modes
Before reading on: do you think BL works the same in ARM and Thumb instruction sets? Commit to your answer.
Concept: BL instruction exists in both ARM and Thumb modes but differs in encoding and range due to instruction size differences.
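ARM state uses 32-bit instructions, and BL carries a 24-bit word offset, which gives a range of about ±32 MB. In the original Thumb instruction set, BL is encoded as a pair of 16-bit halfwords with a range of about ±4 MB; Thumb-2 treats the pair as a single 32-bit instruction and extends the range to about ±16 MB. This affects how far away subroutines can be placed and how BL is assembled.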
Result
You understand that BL adapts to different ARM instruction sets with different encodings and ranges.
Recognizing BL's mode-dependent behavior helps in writing and debugging assembly for different ARM architectures.
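A reachability check makes the range difference concrete. The sketch below is a rough model: the ±32 MB figure for ARM state comes from the text, while the ±4 MB (original Thumb) and ±16 MB (Thumb-2) figures are taken from the ARM architecture documentation rather than from this page. The function name `bl_reaches` and the example addresses are hypothetical.

```python
# Approximate BL ranges in bytes, measured from the PC value the
# instruction sees (BL address + 8 in ARM state, + 4 in Thumb state).
MB = 1024 * 1024
BL_RANGE = {
    "ARM": 32 * MB,       # from the text above
    "Thumb": 4 * MB,      # original Thumb encoding (assumed figure)
    "Thumb-2": 16 * MB,   # Thumb-2 encoding (assumed figure)
}

def bl_reaches(state, bl_address, target):
    """Return True if a BL at bl_address can encode a branch to target."""
    pc = bl_address + (8 if state == "ARM" else 4)
    offset = target - pc
    limit = BL_RANGE[state]
    return -limit <= offset < limit

caller = 0x0010_0000
near = caller + 8 * MB     # 8 MB ahead of the caller
far = caller + 64 * MB     # 64 MB ahead of the caller

print(bl_reaches("ARM", caller, near))      # True
print(bl_reaches("Thumb", caller, near))    # False
print(bl_reaches("Thumb-2", caller, near))  # True
print(bl_reaches("ARM", caller, far))       # False
```

A target that fails this check needs a different call sequence, such as loading the address into a register and branching through it.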
7. Expert: BL Instruction Encoding and Pipeline Effects
Before reading on: does BL affect the processor pipeline or cause delays? Commit to your answer.
Concept: BL encodes its target as a PC-relative offset and, like any taken branch, disturbs the instruction pipeline. ARM has no branch delay slots, so the cost appears as a pipeline flush and refill.
BL encodes the target address as an offset relative to the PC value (the BL's own address plus 8 in ARM state). When the branch is taken, instructions already fetched from the sequential path are discarded and the pipeline is refilled from the new address. This costs a few cycles, often called a branch penalty or pipeline bubble. Cores with branch prediction hide much of this cost, but understanding it is key for performance tuning.
Result
You learn that BL impacts processor performance due to pipeline changes during branching.
Knowing pipeline effects of BL helps optimize critical code sections and understand timing in embedded systems.
Under the Hood
When BL executes, the processor calculates the target address by adding a signed offset to the PC value, which reads as the BL's own address plus 8 in ARM state or plus 4 in Thumb state. It then stores the return address, which is the address of the instruction immediately after the BL, into the link register (LR); in Thumb state, bit 0 of LR is also set so that a later BX LR returns to Thumb state. The PC is updated to the target address, causing the processor to fetch instructions from the subroutine. Upon return, the LR is loaded back into the PC to resume execution. Internally, this involves pipeline flushing and register updates to maintain correct flow.
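The address arithmetic can be checked against the ARM-state (A32) encoding of BL, which places a signed 24-bit word offset in the low bits of the instruction. The sketch below assumes the always-executed condition code, giving the base pattern `0xEB000000`, and uses the fact that in ARM state the PC reads as the BL's address plus 8 while the return address is the BL's address plus 4. The function names and example addresses are hypothetical.

```python
# Encode and decode an unconditional ARM-state BL (base pattern 0xEB000000).

def encode_bl(bl_address, target):
    """Return the 32-bit instruction word for a BL at bl_address."""
    offset = target - (bl_address + 8)        # relative to the PC value
    if offset % 4 or not -(1 << 25) <= offset < (1 << 25):
        raise ValueError("target not reachable by an ARM-state BL")
    return 0xEB000000 | ((offset >> 2) & 0xFFFFFF)

def decode_bl(word, bl_address):
    """Return (target, return_address) for a BL word at bl_address."""
    imm24 = word & 0xFFFFFF
    if imm24 & 0x800000:                      # sign-extend the 24-bit field
        imm24 -= 1 << 24
    target = bl_address + 8 + (imm24 << 2)
    return target, bl_address + 4

forward = encode_bl(0x8000, 0x8010)
backward = encode_bl(0x8010, 0x8000)
print(hex(forward))    # 0xeb000002
print(hex(backward))   # 0xebfffffa
print([hex(v) for v in decode_bl(forward, 0x8000)])  # ['0x8010', '0x8004']
```

Because only the distance between caller and target is stored, the same instruction word stays valid if both are moved together in memory.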
Why designed this way?
BL was designed to efficiently support subroutine calls with minimal instructions and hardware overhead. Using a dedicated link register avoids the need to push return addresses onto the stack for simple calls, speeding up execution. The relative addressing allows position-independent code, which is important for flexible memory layouts. The alternative of pushing the return address onto the stack on every call requires a memory access, which costs time even for leaf subroutines that never call anything else.
┌───────────────┐      ┌───────────────┐      ┌───────────────┐
│ Current PC    │─────▶│ Calculate     │─────▶│ Save return   │
│ (Instruction) │      │ Target Addr   │      │ address in LR │
└──────┬────────┘      └──────┬────────┘      └──────┬────────┘
       │                      │                      │
       │                      │                      │
       │                      ▼                      ▼
       │               ┌───────────────┐      ┌───────────────┐
       │               │ Update PC to  │◀─────│ Return from   │
       │               │ Target Addr   │      │ Subroutine    │
       │               └───────────────┘      └───────────────┘
Myth Busters - 4 Common Misconceptions
Quick: Does BL automatically save all registers before calling a subroutine? Commit to yes or no.
Common Belief: BL saves all the processor registers automatically before jumping to the subroutine.
Reality: BL only saves the return address in the link register (LR). It does not save any other registers; the programmer must save and restore them if needed.
Why it matters: Assuming BL saves all registers can cause bugs where important data is overwritten during subroutine calls, leading to unpredictable behavior.
Quick: Can BL jump to any address in memory? Commit to yes or no.
Common Belief: BL can jump to any address in memory regardless of distance.
Reality: BL uses a relative offset with limited range (about ±32 MB in ARM state, less in Thumb state), so it cannot reach addresses outside this range directly.
Why it matters: A call to a distant subroutine needs an indirect branch through a register. Without one, the toolchain typically either reports an out-of-range error or inserts extra code to bridge the gap.
Quick: Does the link register (LR) hold the return address permanently? Commit to yes or no.
Common Belief: LR holds the return address permanently until the program ends or is reset.
Reality: LR is overwritten by each BL instruction; it only holds the return address temporarily and must be saved if nested calls occur.
Why it matters: Not saving LR before nested calls leads to losing return addresses and program crashes.
Quick: Is the return from subroutine always done by a special 'return' instruction? Commit to yes or no.
Common Belief: ARM processors have a dedicated 'return' instruction to come back from subroutines.
Reality: In 32-bit ARM code, a return is done by copying LR back into the PC with an ordinary instruction such as BX LR (or MOV PC, LR in older code), or by popping a saved return address straight into the PC. There is no special return instruction.
Why it matters: Misunderstanding this can confuse learners about how control flow returns and how to write correct assembly.
Expert Zone
1. BL uses relative addressing which enables position-independent code, crucial for shared libraries and embedded systems.
2. The link register (LR) is a general-purpose register and can be used for other purposes if the program carefully saves and restores it.
3. Some ARM cores implement branch prediction and pipeline optimizations that affect BL performance, which experts must consider in real-time systems.
When NOT to use
BL is not suitable for very long jumps beyond its offset range; in such cases, indirect branching via registers or other instructions like BLX with register targets should be used. Also, for complex nested calls, managing LR manually or using stack-based call/return sequences is necessary.
Production Patterns
In production ARM code, BL is used for fast subroutine calls with minimal overhead. Compilers generate BL for function calls and manage LR saving/restoring on the stack for nested calls. Hand-written assembly often uses BL combined with stack operations to handle recursion and interrupts.
Connections
Function Calls in High-Level Languages
BL is the low-level mechanism behind function calls in compiled languages like C, where a call to a named function typically becomes a BL instruction.
Understanding BL helps demystify how high-level function calls translate into machine instructions and how return addresses are managed.
Stack and Call Stack
BL relies on the link register for return addresses, but complex programs use the stack to save LR and other registers during nested calls.
Knowing BL's limitations clarifies why the call stack is essential for managing multiple nested function calls safely.
Human Memory and Task Switching
BL's saving and restoring of return addresses is similar to how humans remember where they left off when switching tasks.
This cross-domain connection shows how managing context and return points is a universal concept in both computing and human cognition.
Common Pitfalls
#1 Overwriting LR without saving it before nested calls
Wrong approach:
    BL subroutine1
    BL subroutine2
    MOV PC, LR          ; LR no longer holds this routine's return address
Correct approach:
    PUSH {LR}
    BL subroutine1
    BL subroutine2
    POP {LR}
    BX LR
Root cause: Assuming LR is preserved automatically leads to losing the original return address during nested calls.
#2 Using BL to jump beyond its range
Wrong approach:
    BL far_away_function    ; target too far for BL offset
Correct approach:
    LDR R12, =far_away_function
    BLX R12
Root cause: Not understanding BL's limited relative offset range leads to calls that cannot be encoded.
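#3 Ending a subroutine without copying LR back into the PC
Wrong approach:
    subroutine:
        ; subroutine body, with no return instruction at the end
        ; execution falls through into whatever code follows
Correct approach:
    subroutine:
        ; subroutine body
        BX LR               ; return to the instruction after the caller's BL
Root cause: Forgetting to load LR back into PC at the end of the subroutine prevents returning to the caller.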
Key Takeaways
Branch and link (BL) is an ARM instruction that calls subroutines by jumping to their address and saving the return address in the link register (LR).
The link register holds the return address temporarily and must be saved manually if subroutines call other subroutines.
Returning from a subroutine is done by moving the LR back into the program counter (PC), resuming execution after the call.
BL uses relative addressing with limited range and different encodings in ARM and Thumb modes, affecting how far it can jump.
Understanding BL's mechanism and limitations is essential for writing correct and efficient ARM assembly, especially in nested or recursive function calls.