Overview - Remove Duplicates from Sorted Array Two Pointer

What is it?

Removing duplicates from a sorted array means changing the array so that each number appears only once. Since the array is sorted, duplicates are always next to each other. The two-pointer technique uses two indexes to scan and rewrite the array in place without extra space. This method keeps the unique numbers at the start of the array and returns the new length.

Why it matters

Without removing duplicates, data can be messy and cause wrong results in programs. For example, counting unique items or searching becomes inefficient. This technique helps save memory and time by modifying the array directly. It is a common step in many real-world tasks like cleaning data or preparing lists for fast searching.

Where it fits

Before learning this, you should understand arrays and basic loops. After this, you can learn about more complex array problems, like merging sorted arrays or sliding window techniques. This also builds a foundation for understanding in-place algorithms and space optimization.

Mental Model

Core Idea

Use two pointers to track unique elements and overwrite duplicates in a sorted array, keeping only one copy of each number.

Think of it like...

Imagine you have a line of identical books sorted by title. You walk along with two fingers: one finger marks where to place the next unique book, and the other finger scans through the line. When you find a new title, you move it next to the last unique book found.

Index:  0   1   2   3   4   5   6
Array: [1,  1,  2,  2,  3,  4,  4]

Two pointers:
  i -> position to place next unique number
  j -> scanning through array

Step by step:
 i=0, j=1: arr[j]=1 same as arr[i]=1, skip
 i=0, j=2: arr[j]=2 different, i=1, arr[i]=2
 i=1, j=3: arr[j]=2 same, skip
 i=1, j=4: arr[j]=3 different, i=2, arr[i]=3
 i=2, j=5: arr[j]=4 different, i=3, arr[i]=4
 i=3, j=6: arr[j]=4 same, skip

Result array front: [1, 2, 3, 4]

Build-Up - 7 Steps

1

FoundationUnderstanding Sorted Arrays

Concept: Learn what a sorted array is and why duplicates appear next to each other.

A sorted array is a list of numbers arranged from smallest to largest. Because of this order, if a number repeats, all copies are right next to each other. For example, in [1,1,2,2,3], the duplicates 1 and 2 are side by side.

Result

Knowing that duplicates are neighbors helps us find and remove them easily.

Understanding the sorted property is key because it allows us to detect duplicates by comparing neighbors only.

2

FoundationBasics of Two Pointer Technique

3

IntermediateApplying Two Pointers to Remove Duplicates

4

IntermediateIn-place Modification Without Extra Space

5

IntermediateHandling Edge Cases in Two Pointer Method

6

AdvancedCode Implementation in C with Annotations

7

ExpertPerformance and Limitations of Two Pointer Approach

Under the Hood

The two-pointer method uses one pointer (j) to scan each element and another pointer (i) to track the position of the last unique element. When a new unique element is found at j, i is incremented and the element at j is copied to i. This overwrites duplicates and compacts unique elements at the start. The array is modified in place, so no extra memory is used. The final count is i+1 because i is zero-based.

Why designed this way?

This method was designed to efficiently remove duplicates without extra memory, leveraging the sorted property to detect duplicates by simple neighbor comparison. Alternatives like using hash sets require extra space. Sorting first then removing duplicates is slower. The two-pointer approach balances speed and memory use, ideal for large sorted datasets.

Start:
 nums: [1, 1, 2, 2, 3, 4, 4]
 i=0 (write), j=1 (scan)

Loop:
 j=1: nums[j]=1 == nums[i]=1 -> skip
 j=2: nums[j]=2 != nums[i]=1 -> i=1, nums[i]=2
 j=3: nums[j]=2 == nums[i]=2 -> skip
 j=4: nums[j]=3 != nums[i]=2 -> i=2, nums[i]=3
 j=5: nums[j]=4 != nums[i]=3 -> i=3, nums[i]=4
 j=6: nums[j]=4 == nums[i]=4 -> skip

End:
 Unique elements at nums[0..i]: [1, 2, 3, 4]

Myth Busters - 3 Common Misconceptions

Quick: Does this method work on unsorted arrays without changes? Commit yes or no.

Common Belief:This two-pointer method removes duplicates from any array, sorted or not.

Tap to reveal reality

Quick: Does this method create a new array to store unique elements? Commit yes or no.

Common Belief:The method creates a new array to hold unique elements, so the original array stays unchanged.

Tap to reveal reality

Quick: After running, does the array length automatically change? Commit yes or no.

Common Belief:The array length changes automatically after removing duplicates.

Tap to reveal reality

Expert Zone

1

The write pointer i only moves when a new unique element is found, preventing unnecessary writes and improving cache efficiency.

2

This method assumes stable memory and no concurrent modifications; in multithreaded contexts, synchronization is needed.

3

The approach can be adapted to remove duplicates allowing up to k occurrences by adjusting the comparison logic.

When NOT to use

Do not use this method on unsorted arrays or when the original array must remain unchanged. Instead, use hash sets for unsorted arrays or copy the array before modification. For allowing duplicates up to a limit, use modified two-pointer logic or frequency counting.

Production Patterns

In production, this method is used in database query optimizations to remove duplicate sorted results, in data cleaning pipelines to compact sorted logs, and in embedded systems where memory is limited and in-place operations are critical.

Connections

Sliding Window Technique

Both use multiple pointers to scan arrays efficiently.

Understanding two pointers here helps grasp sliding windows, which also move pointers to track subarrays dynamically.

Run-Length Encoding (RLE)

Both exploit consecutive duplicates in sorted data.

Knowing how duplicates cluster in sorted arrays aids understanding RLE compression that counts repeated elements.

Data Deduplication in Storage Systems

The concept of removing duplicates in arrays parallels removing duplicate data blocks in storage.

Recognizing this connection shows how simple array techniques scale to complex systems saving space and improving efficiency.

Common Pitfalls

#1Trying to remove duplicates from an unsorted array using this method.

Wrong approach:int removeDuplicates(int* nums, int numsSize) { int i = 0; for (int j = 1; j < numsSize; j++) { if (nums[j] != nums[i]) { i++; nums[i] = nums[j]; } } return i + 1; } // Called with nums = [3,1,2,3,2], unsorted array

Correct approach:// First sort the array, then remove duplicates qsort(nums, numsSize, sizeof(int), compare); int removeDuplicates(int* nums, int numsSize) { if (numsSize == 0) return 0; int i = 0; for (int j = 1; j < numsSize; j++) { if (nums[j] != nums[i]) { i++; nums[i] = nums[j]; } } return i + 1; }

Root cause:Misunderstanding that the method requires sorted input to detect duplicates by neighbor comparison.

#2Assuming the array length changes automatically after removal.

Wrong approach:int newLength = removeDuplicates(arr, size); for (int i = 0; i < size; i++) { printf("%d ", arr[i]); } // Prints entire original array including duplicates

Correct approach:int newLength = removeDuplicates(arr, size); for (int i = 0; i < newLength; i++) { printf("%d ", arr[i]); } // Prints only unique elements

Root cause:Not using the returned new length to limit access to valid unique elements.

#3Creating a new array inside the function to store unique elements.

Wrong approach:int* removeDuplicates(int* nums, int numsSize, int* returnSize) { int* result = malloc(numsSize * sizeof(int)); int i = 0; for (int j = 0; j < numsSize; j++) { if (j == 0 || nums[j] != nums[j-1]) { result[i++] = nums[j]; } } *returnSize = i; return result; }

Correct approach:int removeDuplicates(int* nums, int numsSize) { if (numsSize == 0) return 0; int i = 0; for (int j = 1; j < numsSize; j++) { if (nums[j] != nums[i]) { i++; nums[i] = nums[j]; } } return i + 1; }

Root cause:Not realizing the problem requires in-place modification without extra memory.

Key Takeaways

Removing duplicates from a sorted array is efficient because duplicates are adjacent, allowing simple neighbor comparison.

The two-pointer technique uses one pointer to scan and another to write unique elements, modifying the array in place.

This method runs in linear time and constant space, making it optimal for large sorted arrays.

It only works on sorted arrays and modifies the original array, so input conditions and side effects must be considered.

Understanding this technique builds a foundation for many array problems and teaches in-place data manipulation.