Overview - Remove Duplicates from Sorted Array Two Pointer

What is it?

Removing duplicates from a sorted array means changing the array so that each number appears only once. Since the array is sorted, duplicates appear next to each other. The two-pointer technique uses two markers to scan and update the array efficiently without extra space. This method changes the array in place and returns the new length of the unique elements.

Why it matters

Without this technique, removing duplicates would require extra space or multiple passes, making it slower and less memory efficient. In real life, when you want to clean up sorted data like contact lists or logs, this method helps do it quickly and neatly. It saves memory and time, which is important for large data sets or limited devices.

Where it fits

Before learning this, you should understand arrays and basic loops. After this, you can learn more complex array problems, sliding window techniques, or linked list duplicate removals.

Mental Model

Core Idea

Use two pointers to track unique elements and overwrite duplicates in a sorted array, keeping only one copy of each number.

Think of it like...

Imagine walking through a line of people sorted by height, and you want to keep only one person of each height. One hand points to the last unique person you kept, and the other hand scans ahead to find the next new height to keep.

Array: [1, 1, 2, 3, 3, 4]

Pointers:
  i -> last unique index
  j -> current scanning index

Start:
 i=0 (points to 1), j=1 (points to 1)

Process:
 j moves forward, when nums[j] != nums[i], i++ and nums[i] = nums[j]

Result:
 [1, 2, 3, 4, 3, 4]
 i=3 (index of last unique)

Unique elements: nums[0..i] = [1, 2, 3, 4]

Build-Up - 7 Steps

1

FoundationUnderstanding Sorted Arrays

Concept: Learn what a sorted array is and why duplicates appear consecutively.

A sorted array is a list of numbers arranged from smallest to largest. Because of this order, if a number repeats, all its copies are next to each other. For example, in [1, 1, 2, 3, 3], the duplicates 1 and 3 are side by side.

Result

Duplicates are grouped together, making it easier to find and remove them.

Knowing duplicates are consecutive lets us scan the array once without jumping around.

2

FoundationWhat Does Removing Duplicates Mean?

3

IntermediateIntroducing Two Pointers Technique

4

IntermediateStep-by-Step Dry Run Example

5

IntermediateWriting the Python Code

6

AdvancedWhy This Method Is Efficient

7

ExpertHandling Edge Cases and Variations

Under the Hood

The two-pointer method uses one pointer to track the position of the last unique element found (i), and another pointer (j) to scan through the array. When nums[j] differs from nums[i], it means a new unique element is found. The algorithm increments i and copies nums[j] to nums[i], effectively overwriting duplicates. This happens in a single pass, modifying the array in place without extra memory allocation.

Why designed this way?

This method was designed to optimize both time and space. Before, removing duplicates often required extra arrays or multiple passes. Sorting arrays made duplicates consecutive, enabling a linear scan. The two-pointer approach leverages this property to overwrite duplicates efficiently. Alternatives like hash sets use extra space, which is costly for large data. This design balances simplicity, speed, and memory use.

Input Array: [1, 1, 2, 3, 3, 4]

Pointers:
 i -> last unique index
 j -> scanning index

Start:
 i=0 (points to 1)
 j=1 (points to 1)

Process:
 j moves forward
 ├─ if nums[j] == nums[i], skip
 └─ if nums[j] != nums[i], i++, nums[i] = nums[j]

Final:
 Array front: [1, 2, 3, 4]
 i=3 (last unique index)

Return i+1 = 4

Myth Busters - 4 Common Misconceptions

Quick: Does this method work correctly on unsorted arrays? Commit yes or no before reading on.

Common Belief:This two-pointer method removes duplicates from any array, sorted or not.

Tap to reveal reality

Quick: Does this method create a new array to store unique elements? Commit yes or no before reading on.

Common Belief:The method creates a new array to hold unique elements.

Tap to reveal reality

Quick: After running the method, is the entire array guaranteed to have only unique elements? Commit yes or no before reading on.

Common Belief:The entire array after the method contains only unique elements.

Tap to reveal reality

Quick: Does the method always return the new length as the last index of the array? Commit yes or no before reading on.

Common Belief:The new length returned is the last index of the array.

Tap to reveal reality

Expert Zone

1

The method relies on the array being sorted; if the array is nearly sorted, small modifications can optimize performance further.

2

Overwriting duplicates in place means the tail of the array still holds old values; understanding this helps avoid bugs when using the array after removal.

3

The two-pointer approach can be adapted to remove duplicates from linked lists or other linear data structures with minor changes.

When NOT to use

Do not use this method on unsorted arrays; instead, use hash sets or sorting first. Also avoid if you need to preserve the original array order without modification; in that case, create a new array for unique elements.

Production Patterns

This method is widely used in coding interviews and production systems where memory efficiency is critical, such as embedded systems or mobile apps. It is also a building block for more complex algorithms like merging sorted lists or cleaning data streams.

Connections

Sliding Window Technique

Both use multiple pointers to scan arrays efficiently.

Understanding two-pointer methods helps grasp sliding windows, which track ranges instead of unique elements.

Hash Set for Duplicate Removal

Alternative approach to remove duplicates using extra memory.

Knowing the two-pointer method clarifies tradeoffs between time, space, and input requirements.

Data Deduplication in Storage Systems

Both remove repeated data to save space.

Understanding array duplicate removal helps appreciate how storage systems optimize disk usage by identifying and removing repeated data blocks.

Common Pitfalls

#1Trying to remove duplicates from an unsorted array using this method.

Wrong approach:def remove_duplicates(nums): i = 0 for j in range(1, len(nums)): if nums[j] != nums[i]: i += 1 nums[i] = nums[j] return i + 1 arr = [3,1,2,3,1] length = remove_duplicates(arr) print(arr[:length])

Correct approach:arr.sort() length = remove_duplicates(arr) print(arr[:length])

Root cause:Misunderstanding that the method requires sorted arrays to work correctly.

#2Using the returned length as an index without adjusting for zero-based indexing.

Wrong approach:length = remove_duplicates(arr) print(arr[length]) # Incorrect: index out of range or wrong element

Correct approach:length = remove_duplicates(arr) print(arr[length - 1]) # Correct: last unique element

Root cause:Confusing length (count) with last index (length - 1).

#3Assuming the entire array after removal contains only unique elements.

Wrong approach:length = remove_duplicates(arr) for num in arr: print(num) # Prints duplicates beyond length

Correct approach:length = remove_duplicates(arr) for num in arr[:length]: print(num) # Prints only unique elements

Root cause:Not using the returned length to limit array access.

Key Takeaways

Removing duplicates from a sorted array is efficient because duplicates are consecutive.

The two-pointer technique uses one pointer to track unique elements and another to scan, modifying the array in place.

This method runs in linear time and constant space, making it ideal for large data sets.

It only works correctly on sorted arrays; unsorted arrays need sorting or different methods.

Always use the returned length to access the unique elements portion of the array.