Overview - Common subexpression elimination

What is it?

Common subexpression elimination (CSE) is a technique used in compilers to find and remove repeated calculations in a program. When the same expression is computed multiple times, CSE saves the result the first time and reuses it later instead of recalculating. This makes the program run faster and use less computing power. It works by analyzing the program's code to spot these repeated expressions.

Why it matters

Without CSE, programs waste time doing the same calculations over and over, which slows down execution and uses more energy. By eliminating repeated work, programs become more efficient, which is especially important in large software or systems with limited resources. This optimization helps software run faster and can reduce costs in computing environments.

Where it fits

Before learning CSE, you should understand basic compiler concepts like syntax trees and intermediate code representation. After mastering CSE, you can explore other compiler optimizations like dead code elimination and loop invariant code motion. CSE fits into the optimization phase of a compiler's workflow.

Mental Model

Core Idea

If a calculation is done once and its result doesn't change, reuse that result instead of recalculating it multiple times.

Think of it like...

Imagine you need to bake several cakes using the same frosting recipe. Instead of making the frosting fresh for each cake, you make it once and use it for all cakes. This saves time and effort.

Program Code
  │
  ▼
Detect repeated expressions
  │
  ▼
Calculate expression once
  │
  ▼
Replace repeats with stored result
  │
  ▼
Optimized Program

Build-Up - 7 Steps

1

FoundationUnderstanding expressions in code

Concept: Learn what expressions are and how they appear in programs.

Expressions are parts of code that compute values, like adding numbers or calling functions. For example, in 'a + b', 'a + b' is an expression. Programs often have many expressions, some of which might be the same.

Result

You can identify expressions in simple code snippets.

Knowing what expressions are is essential because CSE focuses on optimizing repeated expressions.

2

FoundationWhat makes expressions common

3

IntermediateDetecting common subexpressions

4

IntermediateReplacing expressions with stored results

5

IntermediateHandling side effects and variable changes

6

AdvancedGlobal vs local common subexpression elimination

7

ExpertChallenges and surprises in CSE implementation

Under the Hood

CSE works by analyzing the program's intermediate representation, usually a control flow graph, to find expressions computed multiple times with unchanged inputs. It uses data flow analysis to track where values are available and safe to reuse. When a common expression is found, the compiler inserts instructions to compute it once and stores the result in a temporary location. Later uses are replaced by this stored value, avoiding recomputation.

Why designed this way?

CSE was designed to improve program efficiency by reducing redundant work. Early compilers executed code literally as written, causing repeated calculations. By analyzing code structure and data dependencies, CSE balances correctness and optimization. Alternatives like recomputing every time are simpler but slower, while more aggressive optimizations risk changing program behavior.

┌─────────────────────────────┐
│       Source Code           │
└─────────────┬───────────────┘
              │
              ▼
┌─────────────────────────────┐
│ Intermediate Representation │
└─────────────┬───────────────┘
              │
              ▼
┌─────────────────────────────┐
│ Data Flow Analysis for CSE   │
│ - Identify repeated exprs   │
│ - Check variable stability  │
└─────────────┬───────────────┘
              │
              ▼
┌─────────────────────────────┐
│ Insert temp vars & replace   │
│ repeated exprs with temps    │
└─────────────┬───────────────┘
              │
              ▼
┌─────────────────────────────┐
│    Optimized Intermediate    │
│          Code                │
└─────────────────────────────┘

Myth Busters - 3 Common Misconceptions

Quick: Do you think all repeated expressions can be safely reused? Commit to yes or no.

Common Belief:If an expression appears multiple times, it can always be replaced by a stored result.

Tap to reveal reality

Quick: Does CSE always reduce the size of the compiled program? Commit to yes or no.

Common Belief:CSE always makes the program smaller because it removes repeated calculations.

Tap to reveal reality

Quick: Can CSE sometimes slow down a program? Commit to yes or no.

Common Belief:CSE always speeds up programs by avoiding repeated work.

Tap to reveal reality

Expert Zone

1

CSE effectiveness depends heavily on the target machine's architecture, especially register availability and cache behavior.

2

Global CSE requires careful handling of control flow to avoid incorrect reuse across different execution paths.

3

Some compilers combine CSE with other optimizations like constant propagation for greater effect.

When NOT to use

Avoid aggressive CSE in programs with many side-effecting operations or when targeting architectures with limited registers. Instead, use simpler local optimizations or profile-guided optimization to decide where CSE helps most.

Production Patterns

In real-world compilers, CSE is often applied selectively based on profiling data. It is combined with register allocation and instruction scheduling to balance speed and code size. Some compilers perform CSE after inlining functions to catch more opportunities.

Connections

Memoization

Both reuse previously computed results to save work.

Understanding CSE helps grasp memoization in programming, where function results are cached to avoid repeated calculations.

Data Flow Analysis

CSE relies on data flow analysis to determine where expressions are safe to reuse.

Knowing data flow analysis deepens understanding of how compilers optimize code safely.

Caching in Computer Systems

Both CSE and caching store results to avoid repeated work, improving efficiency.

Recognizing this connection shows how similar principles apply from low-level hardware to high-level software optimization.

Common Pitfalls

#1Reusing expressions that involve variables which change between uses.

Wrong approach:temp = a + b ... a = 5 ... use temp instead of a + b

Correct approach:temp = a + b ... use a + b again after a changes

Root cause:Misunderstanding that variable changes invalidate previously computed results.

#2Applying CSE to expressions with side effects like function calls that modify state.

Wrong approach:temp = print_and_return(x) ... use temp instead of calling print_and_return(x) again

Correct approach:Call print_and_return(x) each time to preserve side effects

Root cause:Ignoring that some expressions do more than just compute values.

#3Assuming CSE always reduces code size and applying it everywhere.

Wrong approach:Replace all repeated expressions with temps regardless of context.

Correct approach:Apply CSE selectively, considering code size and register pressure.

Root cause:Not recognizing the trade-offs between speed and code size.

Key Takeaways

Common subexpression elimination finds repeated calculations and reuses their results to make programs faster.

Not all repeated expressions can be safely reused; side effects and variable changes limit CSE's application.

CSE involves a trade-off between reducing computation time and potentially increasing code size or register usage.

Understanding data flow and program structure is essential to applying CSE correctly and effectively.

Advanced CSE techniques analyze the whole program and balance optimization benefits against resource costs.