Remove Duplicates from Sorted Array Two Pointer in DSA Python - Deep Dive

Overview - Remove Duplicates from Sorted Array Two Pointer
What is it?
Removing duplicates from a sorted array means changing the array so that each number appears only once. Since the array is sorted, duplicates appear next to each other. The two-pointer technique uses two markers to scan and update the array in a single pass without extra space. It changes the array in place and returns the number of unique elements, which now occupy the front of the array.
Why it matters
Without this technique, removing duplicates typically requires extra space (for example a set or a second array) or repeated shifting of elements, which costs memory or time. In real life, when you want to clean up sorted data like contact lists or logs, this method does it in one pass. It saves memory and time, which is important for large data sets or limited devices.
Where it fits
Before learning this, you should understand arrays and basic loops. After this, you can learn more complex array problems, sliding window techniques, or linked list duplicate removals.
Mental Model
Core Idea
Use two pointers to track unique elements and overwrite duplicates in a sorted array, keeping only one copy of each number.
Think of it like...
Imagine walking through a line of people sorted by height, and you want to keep only one person of each height. One hand points to the last unique person you kept, and the other hand scans ahead to find the next new height to keep.
Array: [1, 1, 2, 3, 3, 4]

Pointers:
  i -> last unique index
  j -> current scanning index

Start:
 i=0 (points to 1), j=1 (points to 1)

Process:
 j moves forward, when nums[j] != nums[i], i++ and nums[i] = nums[j]

Result:
 [1, 2, 3, 4, 3, 4]
 i=3 (index of last unique)

Unique elements: nums[0..i] = [1, 2, 3, 4]
Build-Up - 7 Steps
1. Foundation: Understanding Sorted Arrays
Concept: Learn what a sorted array is and why duplicates appear consecutively.
A sorted array is a list of numbers arranged from smallest to largest. Because of this order, if a number repeats, all its copies are next to each other. For example, in [1, 1, 2, 3, 3], the two 1s are adjacent and so are the two 3s.
Result
Duplicates are grouped together, making it easier to find and remove them.
Knowing duplicates are consecutive lets us scan the array once without jumping around.
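A small sketch can make the "duplicates are consecutive" property visible. The helper name runs is hypothetical and is only used here for illustration; it relies on itertools.groupby, which groups adjacent equal values.

```python
from itertools import groupby


def runs(nums: list[int]) -> list[tuple[int, int]]:
    """Return (value, run_length) pairs for each block of adjacent equal values."""
    return [(value, len(list(group))) for value, group in groupby(nums)]


# Sorted input: every value forms exactly one run.
print(runs([1, 1, 2, 3, 3]))  # [(1, 2), (2, 1), (3, 2)]

# Unsorted input: the same value can show up in several separate runs.
print(runs([3, 1, 3, 1]))     # [(3, 1), (1, 1), (3, 1), (1, 1)]
```

In the sorted case each value appears in one run only, which is what lets a single left-to-right scan find every duplicate.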
2. Foundation: What Does Removing Duplicates Mean?
Concept: Removing duplicates means keeping only one copy of each number in the array.
If the array is [1, 1, 2, 3, 3], the unique values are [1, 2, 3]. We want to get them without creating a new array: the unique values are moved to the front of the original array, and a new length of 3 tells us how many of them there are.
Result
The array's first part holds unique numbers, and the rest can be ignored.
Changing the array in place avoids allocating a second array, which matters for large data.
3. Intermediate: Introducing Two Pointers Technique
Concept: Use two pointers to track unique elements and scan the array simultaneously.
One pointer (i) marks the position of the last unique number found. The other pointer (j) scans through the array. When nums[j] is different from nums[i], we move i forward and copy nums[j] there. This way, unique numbers accumulate at the start.
Result
Unique numbers are collected at the front of the array without extra space.
Two pointers let us compare and overwrite duplicates efficiently in one pass.
4. Intermediate: Step-by-Step Dry Run Example
Before reading on: do you think the array changes immediately when a duplicate is found, or only when a new unique number appears? Commit to your answer.
Concept: Walk through the algorithm with a sample array to see how pointers move and update the array.
Array: [0,0,1,1,1,2,2,3,3,4]
Start: i=0, j=1
- j=1: nums[j]=0 == nums[i]=0, skip
- j=2: nums[j]=1 != nums[i]=0, i=1, nums[i]=1
- j=3: nums[j]=1 == nums[i]=1, skip
- j=4: nums[j]=1 == nums[i]=1, skip
- j=5: nums[j]=2 != nums[i]=1, i=2, nums[i]=2
- j=6: nums[j]=2 == nums[i]=2, skip
- j=7: nums[j]=3 != nums[i]=2, i=3, nums[i]=3
- j=8: nums[j]=3 == nums[i]=3, skip
- j=9: nums[j]=4 != nums[i]=3, i=4, nums[i]=4
Final array front: [0,1,2,3,4]
New length: i+1 = 5
Result
Array front holds unique elements: [0,1,2,3,4], length 5
Seeing the pointers move clarifies how duplicates are skipped and unique values collected.
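The dry run above can be reproduced with a tracing variant of the algorithm. This is a minimal sketch; the function name trace_remove_duplicates and the wording of the printed messages are illustrative choices, not part of the original lesson.

```python
def trace_remove_duplicates(nums: list[int]) -> int:
    """Two-pointer duplicate removal that prints what happens at each j."""
    if not nums:
        return 0
    i = 0  # index of the last unique value kept so far
    for j in range(1, len(nums)):
        if nums[j] != nums[i]:
            i += 1
            nums[i] = nums[j]  # the array only changes on a new value
            print(f"j={j}: new value {nums[j]}, written at i={i}")
        else:
            print(f"j={j}: duplicate of nums[{i}]={nums[i]}, skip")
    return i + 1


arr = [0, 0, 1, 1, 1, 2, 2, 3, 3, 4]
length = trace_remove_duplicates(arr)
print(arr[:length], length)  # [0, 1, 2, 3, 4] 5
```

The trace shows that nothing is written when a duplicate is seen; a write happens only when j reaches a value different from nums[i].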
5. Intermediate: Writing the Python Code
Concept: Translate the two-pointer logic into clean, runnable Python code.
def remove_duplicates(nums: list[int]) -> int:
    if not nums:
        return 0
    i = 0  # index of the last unique element
    for j in range(1, len(nums)):
        if nums[j] != nums[i]:
            i += 1
            nums[i] = nums[j]
    return i + 1

# Example usage:
arr = [1, 1, 2, 3, 3]
length = remove_duplicates(arr)
print(arr[:length])
Result
[1, 2, 3]
Writing code solidifies understanding and shows how simple the approach is in practice.
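One way to gain confidence in the function is to compare it against a simple reference on many random sorted inputs. The sketch below repeats remove_duplicates so that it runs on its own; the check helper, the fixed seed, and the input sizes are arbitrary assumptions made for illustration.

```python
import random


def remove_duplicates(nums: list[int]) -> int:
    if not nums:
        return 0
    i = 0
    for j in range(1, len(nums)):
        if nums[j] != nums[i]:
            i += 1
            nums[i] = nums[j]
    return i + 1


def check(trials: int = 200) -> None:
    """Compare the two-pointer result with sorted(set(...)) on random sorted lists."""
    rng = random.Random(0)  # fixed seed so the run is repeatable
    for _ in range(trials):
        size = rng.randint(0, 15)
        nums = sorted(rng.randint(0, 9) for _ in range(size))
        expected = sorted(set(nums))
        k = remove_duplicates(nums)
        assert nums[:k] == expected, (nums, k, expected)


check()
print("all checks passed")
```

The reference sorted(set(nums)) uses extra memory, which is fine for a test even though the function under test does not.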
6. Advanced: Why This Method Is Efficient
Before reading on: do you think this method uses extra memory or modifies the array in place? Commit to your answer.
Concept: Understand the time and space efficiency of the two-pointer approach.
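This method scans the array once (O(n) time) and uses only two index variables (O(1) extra space). It modifies the array in place, so no second array is needed. This beats building a new array, which costs O(n) extra space, and it beats deleting each duplicate one by one, which can take O(n^2) time because every deletion shifts the elements after it.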
Result
Fast and memory-efficient duplicate removal.
Knowing the efficiency helps choose this method for large data or memory-limited environments.
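For contrast, here is a sketch of a straightforward alternative that deletes each duplicate as it is found. The name remove_duplicates_naive is hypothetical. It produces the same unique values, but list.pop at an index shifts all later elements, so an input with many duplicates can cost quadratic time.

```python
def remove_duplicates_naive(nums: list[int]) -> int:
    """Delete adjacent duplicates one at a time; correct but potentially O(n^2)."""
    j = 1
    while j < len(nums):
        if nums[j] == nums[j - 1]:
            nums.pop(j)  # shifts every later element one slot left
        else:
            j += 1
    return len(nums)


arr = [1, 1, 1, 2, 2, 3]
print(remove_duplicates_naive(arr), arr)  # 3 [1, 2, 3]
```

The two-pointer version avoids the shifting entirely: each element is read once and written at most once.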
7. Expert: Handling Edge Cases and Variations
Before reading on: do you think this method works if the array is empty or has one element? Commit to your answer.
Concept: Explore how the method behaves with empty arrays, single elements, or all duplicates, and how to adapt it for unsorted arrays.
If the array is empty, return 0 immediately. If it has one element, the loop never runs and the method returns 1. If all elements are equal, the method also returns 1. For unsorted arrays, this method fails because duplicates are not consecutive; sorting first or using a set is needed. Variations include removing duplicates from linked lists or counting unique elements without modifying the array.
Result
Robust handling of all input cases and understanding method limits.
Anticipating edge cases prevents bugs and knowing limits guides correct method choice.
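The edge cases described above can be checked directly. This sketch repeats remove_duplicates so it is self-contained; the particular test inputs are arbitrary examples.

```python
def remove_duplicates(nums: list[int]) -> int:
    if not nums:
        return 0
    i = 0
    for j in range(1, len(nums)):
        if nums[j] != nums[i]:
            i += 1
            nums[i] = nums[j]
    return i + 1


# empty, single element, all equal, already unique
for case in ([], [7], [5, 5, 5, 5], [1, 2, 3]):
    nums = list(case)  # copy so the original case stays intact for printing
    k = remove_duplicates(nums)
    print(case, "->", nums[:k], "length", k)

# [] -> [] length 0
# [7] -> [7] length 1
# [5, 5, 5, 5] -> [5] length 1
# [1, 2, 3] -> [1, 2, 3] length 3
```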
Under the Hood
The two-pointer method uses one pointer to track the position of the last unique element found (i), and another pointer (j) to scan through the array. When nums[j] differs from nums[i], it means a new unique element is found. The algorithm increments i and copies nums[j] to nums[i], effectively overwriting duplicates. This happens in a single pass, modifying the array in place without extra memory allocation.
Why designed this way?
This method is designed to optimize both time and space. Other ways of removing duplicates often need an extra array or multiple passes. Because the array is sorted, duplicates are consecutive, which enables a single linear scan. The two-pointer approach uses this property to overwrite duplicates as it goes. Alternatives like hash sets use O(n) extra space, which is costly for large data. This design balances simplicity, speed, and memory use.
Input Array: [1, 1, 2, 3, 3, 4]

Pointers:
 i -> last unique index
 j -> scanning index

Start:
 i=0 (points to 1)
 j=1 (points to 1)

Process:
 j moves forward
 ├─ if nums[j] == nums[i], skip
 └─ if nums[j] != nums[i], i++, nums[i] = nums[j]

Final:
 Array front: [1, 2, 3, 4]
 i=3 (last unique index)

Return i+1 = 4
Myth Busters - 4 Common Misconceptions
Quick: Does this method work correctly on unsorted arrays? Commit yes or no before reading on.
Common Belief: This two-pointer method removes duplicates from any array, sorted or not.
Reality: It only works correctly on sorted arrays because duplicates must be consecutive to be detected by comparing each element with the last one kept.
Why it matters: Using it on unsorted arrays leads to incorrect results: only adjacent repeats are removed, so duplicates that are separated by other values stay in the array.
Quick: Does this method create a new array to store unique elements? Commit yes or no before reading on.
Common Belief: The method creates a new array to hold unique elements.
Reality: It modifies the original array in place without extra arrays, saving memory.
Why it matters: Thinking it uses extra space may discourage using this efficient method in memory-sensitive situations.
Quick: After running the method, is the entire array guaranteed to have only unique elements? Commit yes or no before reading on.
Common Belief: The entire array after the method contains only unique elements.
Reality: Only the first part of the array up to the returned length contains unique elements; the rest remains unchanged and may have duplicates.
Why it matters: Misunderstanding this can cause bugs if code uses the whole array without considering the new length.
Quick: Does the method always return the new length as the last index of the array? Commit yes or no before reading on.
Common Belief: The new length returned is the last index of the array.
Reality: The method returns the count of unique elements, which is last unique index plus one, not the last index of the original array.
Why it matters: Confusing length with index can cause off-by-one errors in further processing.
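The third and fourth misconceptions are easy to see by printing the whole list after the call. The sketch below repeats remove_duplicates so it runs standalone; truncating with del nums[k:] is shown as one optional way to physically drop the leftover tail, not something the method itself does.

```python
def remove_duplicates(nums: list[int]) -> int:
    if not nums:
        return 0
    i = 0
    for j in range(1, len(nums)):
        if nums[j] != nums[i]:
            i += 1
            nums[i] = nums[j]
    return i + 1


nums = [1, 1, 2, 3, 3, 4]
k = remove_duplicates(nums)

print(k)     # 4: a count, so the last unique index is k - 1 = 3
print(nums)  # [1, 2, 3, 4, 3, 4]: the tail still holds old values

del nums[k:]  # optional: shrink the list to the unique prefix
print(nums)   # [1, 2, 3, 4]
```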
Expert Zone
1. The method relies on the array being fully sorted; on a nearly sorted array, duplicates that are not adjacent are missed, so the input must be sorted first.
2. Overwriting duplicates in place means the tail of the array still holds old values; understanding this helps avoid bugs when using the array after removal.
3. The two-pointer approach can be adapted to remove duplicates from linked lists or other linear data structures with minor changes.
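As an illustration of the linked-list adaptation, here is a minimal sketch for a sorted singly linked list. The ListNode class and the helpers from_list and to_list are hypothetical scaffolding written for this example. Instead of copying values forward, the list version unlinks the duplicate node.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class ListNode:
    val: int
    next: Optional[ListNode] = None


def dedupe_sorted_list(head: Optional[ListNode]) -> Optional[ListNode]:
    """Unlink nodes whose value repeats the previous node's value."""
    node = head
    while node is not None and node.next is not None:
        if node.next.val == node.val:
            node.next = node.next.next  # skip over the duplicate node
        else:
            node = node.next            # new value: advance
    return head


def from_list(values: list[int]) -> Optional[ListNode]:
    head: Optional[ListNode] = None
    for v in reversed(values):
        head = ListNode(v, head)
    return head


def to_list(head: Optional[ListNode]) -> list[int]:
    out = []
    while head is not None:
        out.append(head.val)
        head = head.next
    return out


print(to_list(dedupe_sorted_list(from_list([1, 1, 2, 3, 3]))))  # [1, 2, 3]
```

Unlike the array version, no stale tail is left behind, because the duplicate nodes are removed from the chain rather than overwritten.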
When NOT to use
Do not use this method on unsorted arrays; instead, use a hash set or sort first (note that sorting changes the original order). Also avoid it if the original array must stay unmodified; in that case, build a new array of unique elements.
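For the cases where the two-pointer method does not apply, a sketch of two common alternatives follows. The function name unique_preserving_order is hypothetical. It relies on dict.fromkeys, which keeps the first occurrence of each key in insertion order, and it uses O(n) extra memory, which is the tradeoff the in-place method avoids.

```python
def unique_preserving_order(nums: list[int]) -> list[int]:
    """Return a new list with first occurrences only; the input is not modified."""
    return list(dict.fromkeys(nums))


original = [3, 1, 2, 3, 1]

print(unique_preserving_order(original))  # [3, 1, 2]
print(original)                           # [3, 1, 2, 3, 1] (unchanged)

# If the original order does not matter, a sorted list of unique values also works.
print(sorted(set(original)))              # [1, 2, 3]
```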
Production Patterns
This method is a common coding interview problem and is useful wherever memory is tight, such as embedded systems or mobile apps. The same read-pointer and write-pointer pattern also appears in related tasks like merging sorted lists or cleaning sorted data streams.
Connections
Sliding Window Technique
Both use multiple pointers to scan arrays efficiently.
Understanding two-pointer methods helps grasp sliding windows, which track ranges instead of unique elements.
Hash Set for Duplicate Removal
Alternative approach to remove duplicates using extra memory.
Knowing the two-pointer method clarifies tradeoffs between time, space, and input requirements.
Data Deduplication in Storage Systems
Both remove repeated data to save space.
Understanding array duplicate removal helps appreciate how storage systems optimize disk usage by identifying and removing repeated data blocks.
Common Pitfalls
#1 Trying to remove duplicates from an unsorted array using this method.
Wrong approach:
def remove_duplicates(nums):
    i = 0
    for j in range(1, len(nums)):
        if nums[j] != nums[i]:
            i += 1
            nums[i] = nums[j]
    return i + 1

arr = [3, 1, 2, 3, 1]
length = remove_duplicates(arr)
print(arr[:length])  # [3, 1, 2, 3, 1]: duplicates remain
Correct approach:
arr.sort()  # duplicates become adjacent
length = remove_duplicates(arr)
print(arr[:length])  # [1, 2, 3]
Root cause: Misunderstanding that the method requires sorted arrays to work correctly.
#2 Using the returned length as an index without adjusting for zero-based indexing.
Wrong approach:
length = remove_duplicates(arr)
print(arr[length])  # Incorrect: index out of range or wrong element
Correct approach:
length = remove_duplicates(arr)
print(arr[length - 1])  # Correct: last unique element
Root cause: Confusing length (count) with last index (length - 1).
#3 Assuming the entire array after removal contains only unique elements.
Wrong approach:
length = remove_duplicates(arr)
for num in arr:
    print(num)  # Prints duplicates beyond length
Correct approach:
length = remove_duplicates(arr)
for num in arr[:length]:
    print(num)  # Prints only unique elements
Root cause: Not using the returned length to limit array access.
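One defensive option against the first pitfall is to verify the precondition before running the algorithm. The sketch below is an assumption about how such a guard could look, not part of the original method; the names is_sorted and remove_duplicates_checked are hypothetical, and the check adds one extra O(n) pass.

```python
def is_sorted(nums: list[int]) -> bool:
    """True if nums is in non-decreasing order."""
    return all(nums[k] <= nums[k + 1] for k in range(len(nums) - 1))


def remove_duplicates_checked(nums: list[int]) -> int:
    """Two-pointer duplicate removal that rejects unsorted input."""
    if not is_sorted(nums):
        raise ValueError("input must be sorted in non-decreasing order")
    if not nums:
        return 0
    i = 0
    for j in range(1, len(nums)):
        if nums[j] != nums[i]:
            i += 1
            nums[i] = nums[j]
    return i + 1


arr = [1, 1, 2, 3, 3]
k = remove_duplicates_checked(arr)
print(arr[:k])  # [1, 2, 3]

try:
    remove_duplicates_checked([3, 1, 2, 3, 1])
except ValueError as err:
    print("rejected:", err)
```

Whether the extra pass is worth it depends on how much the caller can be trusted to supply sorted data.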
Key Takeaways
Removing duplicates from a sorted array is efficient because duplicates are consecutive.
The two-pointer technique uses one pointer to track unique elements and another to scan, modifying the array in place.
This method runs in linear time and constant space, making it ideal for large data sets.
It only works correctly on sorted arrays; unsorted arrays need sorting or different methods.
Always use the returned length to access the unique elements portion of the array.