Bird
0
0
DSA Cprogramming~15 mins

Remove Duplicates from Sorted Array Two Pointer in DSA C - Deep Dive

Choose your learning style9 modes available
Overview - Remove Duplicates from Sorted Array Two Pointer
What is it?
Removing duplicates from a sorted array means changing the array so that each number appears only once. Since the array is sorted, duplicates are always next to each other. The two-pointer technique uses two indexes to scan and rewrite the array in place without extra space. This method keeps the unique numbers at the start of the array and returns the new length.
Why it matters
Without removing duplicates, data can be messy and cause wrong results in programs. For example, counting unique items or searching becomes inefficient. This technique helps save memory and time by modifying the array directly. It is a common step in many real-world tasks like cleaning data or preparing lists for fast searching.
Where it fits
Before learning this, you should understand arrays and basic loops. After this, you can learn about more complex array problems, like merging sorted arrays or sliding window techniques. This also builds a foundation for understanding in-place algorithms and space optimization.
Mental Model
Core Idea
Use two pointers to track unique elements and overwrite duplicates in a sorted array, keeping only one copy of each number.
Think of it like...
Imagine you have a line of identical books sorted by title. You walk along with two fingers: one finger marks where to place the next unique book, and the other finger scans through the line. When you find a new title, you move it next to the last unique book found.
Index:  0   1   2   3   4   5   6
Array: [1,  1,  2,  2,  3,  4,  4]

Two pointers:
  i -> position to place next unique number
  j -> scanning through array

Step by step:
 i=0, j=1: arr[j]=1 same as arr[i]=1, skip
 i=0, j=2: arr[j]=2 different, i=1, arr[i]=2
 i=1, j=3: arr[j]=2 same, skip
 i=1, j=4: arr[j]=3 different, i=2, arr[i]=3
 i=2, j=5: arr[j]=4 different, i=3, arr[i]=4
 i=3, j=6: arr[j]=4 same, skip

Result array front: [1, 2, 3, 4]
Build-Up - 7 Steps
1
FoundationUnderstanding Sorted Arrays
🤔
Concept: Learn what a sorted array is and why duplicates appear next to each other.
A sorted array is a list of numbers arranged from smallest to largest. Because of this order, if a number repeats, all copies are right next to each other. For example, in [1,1,2,2,3], the duplicates 1 and 2 are side by side.
Result
Knowing that duplicates are neighbors helps us find and remove them easily.
Understanding the sorted property is key because it allows us to detect duplicates by comparing neighbors only.
2
FoundationBasics of Two Pointer Technique
🤔
Concept: Introduce two pointers to scan and modify arrays efficiently.
Two pointers are two indexes that move through the array. One pointer scans every element, the other tracks where to write unique elements. This avoids extra space and keeps the array compact.
Result
You can process arrays in one pass, saving time and memory.
Knowing how two pointers work sets the stage for many efficient array algorithms.
3
IntermediateApplying Two Pointers to Remove Duplicates
🤔Before reading on: do you think both pointers move forward every time, or only one moves sometimes? Commit to your answer.
Concept: Use one pointer to scan and another to mark unique positions, moving the second only when a new unique number is found.
Start with i=0 (write pointer) and j=1 (scan pointer). Compare arr[j] with arr[i]. If different, increment i and copy arr[j] to arr[i]. If same, just move j forward. This way, unique numbers accumulate at the start.
Result
The array front holds unique numbers, and the function returns the count of unique elements.
Understanding that the write pointer only moves on unique finds prevents overwriting and keeps the array clean.
4
IntermediateIn-place Modification Without Extra Space
🤔Before reading on: do you think this method needs extra arrays or memory? Commit to yes or no.
Concept: Modify the original array directly without using extra space for duplicates removal.
Instead of creating a new array, we overwrite duplicates in the original array. This saves memory and is efficient for large data.
Result
The original array is changed to hold only unique elements at the start.
Knowing in-place modification is possible helps write memory-efficient code, crucial for large datasets.
5
IntermediateHandling Edge Cases in Two Pointer Method
🤔Before reading on: what happens if the array is empty or has one element? Predict the output.
Concept: Consider special cases like empty arrays or arrays with all duplicates to ensure robustness.
If the array is empty, return 0. If only one element, return 1. If all elements are the same, the result is one unique element. The two-pointer method naturally handles these by careful pointer checks.
Result
The method works correctly for all edge cases without extra code.
Understanding edge cases prevents bugs and ensures the method is reliable in all situations.
6
AdvancedCode Implementation in C with Annotations
🤔Before reading on: do you think the function returns the new length or the modified array? Commit to your answer.
Concept: Write a complete C function using two pointers to remove duplicates and return the new length.
int removeDuplicates(int* nums, int numsSize) { if (numsSize == 0) return 0; int i = 0; for (int j = 1; j < numsSize; j++) { if (nums[j] != nums[i]) { i++; nums[i] = nums[j]; } } return i + 1; } // Example usage: // int arr[] = {1,1,2,2,3,4,4}; // int len = removeDuplicates(arr, 7); // After call, arr front: 1,2,3,4 and len=4
Result
Function returns 4 and array front is [1,2,3,4].
Seeing the full code clarifies how pointers move and how the array is updated step-by-step.
7
ExpertPerformance and Limitations of Two Pointer Approach
🤔Before reading on: do you think this method is always the fastest or are there cases it struggles? Commit to your answer.
Concept: Analyze time and space complexity and understand when this method is best or limited.
The method runs in O(n) time and O(1) space, optimal for sorted arrays. However, it only works because the array is sorted. For unsorted arrays, this approach fails and needs sorting first or a different method. Also, it modifies the original array, which might not be allowed in some cases.
Result
You know when to apply this method and when to choose alternatives.
Understanding the method's boundaries helps avoid misuse and guides choosing the right tool for the problem.
Under the Hood
The two-pointer method uses one pointer (j) to scan each element and another pointer (i) to track the position of the last unique element. When a new unique element is found at j, i is incremented and the element at j is copied to i. This overwrites duplicates and compacts unique elements at the start. The array is modified in place, so no extra memory is used. The final count is i+1 because i is zero-based.
Why designed this way?
This method was designed to efficiently remove duplicates without extra memory, leveraging the sorted property to detect duplicates by simple neighbor comparison. Alternatives like using hash sets require extra space. Sorting first then removing duplicates is slower. The two-pointer approach balances speed and memory use, ideal for large sorted datasets.
Start:
 nums: [1, 1, 2, 2, 3, 4, 4]
 i=0 (write), j=1 (scan)

Loop:
 j=1: nums[j]=1 == nums[i]=1 -> skip
 j=2: nums[j]=2 != nums[i]=1 -> i=1, nums[i]=2
 j=3: nums[j]=2 == nums[i]=2 -> skip
 j=4: nums[j]=3 != nums[i]=2 -> i=2, nums[i]=3
 j=5: nums[j]=4 != nums[i]=3 -> i=3, nums[i]=4
 j=6: nums[j]=4 == nums[i]=4 -> skip

End:
 Unique elements at nums[0..i]: [1, 2, 3, 4]
Myth Busters - 3 Common Misconceptions
Quick: Does this method work on unsorted arrays without changes? Commit yes or no.
Common Belief:This two-pointer method removes duplicates from any array, sorted or not.
Tap to reveal reality
Reality:It only works correctly on sorted arrays because duplicates must be adjacent to detect them by neighbor comparison.
Why it matters:Using it on unsorted arrays leads to incorrect results and missed duplicates, causing bugs in programs.
Quick: Does this method create a new array to store unique elements? Commit yes or no.
Common Belief:The method creates a new array to hold unique elements, so the original array stays unchanged.
Tap to reveal reality
Reality:The method modifies the original array in place, overwriting duplicates and changing the input array.
Why it matters:If the original array must be preserved, this method is unsuitable without copying first.
Quick: After running, does the array length automatically change? Commit yes or no.
Common Belief:The array length changes automatically after removing duplicates.
Tap to reveal reality
Reality:The array size in memory stays the same; only the first part contains unique elements. The function returns the new length to indicate how many elements are valid.
Why it matters:Misunderstanding this leads to accessing invalid elements beyond the new length, causing errors.
Expert Zone
1
The write pointer i only moves when a new unique element is found, preventing unnecessary writes and improving cache efficiency.
2
This method assumes stable memory and no concurrent modifications; in multithreaded contexts, synchronization is needed.
3
The approach can be adapted to remove duplicates allowing up to k occurrences by adjusting the comparison logic.
When NOT to use
Do not use this method on unsorted arrays or when the original array must remain unchanged. Instead, use hash sets for unsorted arrays or copy the array before modification. For allowing duplicates up to a limit, use modified two-pointer logic or frequency counting.
Production Patterns
In production, this method is used in database query optimizations to remove duplicate sorted results, in data cleaning pipelines to compact sorted logs, and in embedded systems where memory is limited and in-place operations are critical.
Connections
Sliding Window Technique
Both use multiple pointers to scan arrays efficiently.
Understanding two pointers here helps grasp sliding windows, which also move pointers to track subarrays dynamically.
Run-Length Encoding (RLE)
Both exploit consecutive duplicates in sorted data.
Knowing how duplicates cluster in sorted arrays aids understanding RLE compression that counts repeated elements.
Data Deduplication in Storage Systems
The concept of removing duplicates in arrays parallels removing duplicate data blocks in storage.
Recognizing this connection shows how simple array techniques scale to complex systems saving space and improving efficiency.
Common Pitfalls
#1Trying to remove duplicates from an unsorted array using this method.
Wrong approach:int removeDuplicates(int* nums, int numsSize) { int i = 0; for (int j = 1; j < numsSize; j++) { if (nums[j] != nums[i]) { i++; nums[i] = nums[j]; } } return i + 1; } // Called with nums = [3,1,2,3,2], unsorted array
Correct approach:// First sort the array, then remove duplicates qsort(nums, numsSize, sizeof(int), compare); int removeDuplicates(int* nums, int numsSize) { if (numsSize == 0) return 0; int i = 0; for (int j = 1; j < numsSize; j++) { if (nums[j] != nums[i]) { i++; nums[i] = nums[j]; } } return i + 1; }
Root cause:Misunderstanding that the method requires sorted input to detect duplicates by neighbor comparison.
#2Assuming the array length changes automatically after removal.
Wrong approach:int newLength = removeDuplicates(arr, size); for (int i = 0; i < size; i++) { printf("%d ", arr[i]); } // Prints entire original array including duplicates
Correct approach:int newLength = removeDuplicates(arr, size); for (int i = 0; i < newLength; i++) { printf("%d ", arr[i]); } // Prints only unique elements
Root cause:Not using the returned new length to limit access to valid unique elements.
#3Creating a new array inside the function to store unique elements.
Wrong approach:int* removeDuplicates(int* nums, int numsSize, int* returnSize) { int* result = malloc(numsSize * sizeof(int)); int i = 0; for (int j = 0; j < numsSize; j++) { if (j == 0 || nums[j] != nums[j-1]) { result[i++] = nums[j]; } } *returnSize = i; return result; }
Correct approach:int removeDuplicates(int* nums, int numsSize) { if (numsSize == 0) return 0; int i = 0; for (int j = 1; j < numsSize; j++) { if (nums[j] != nums[i]) { i++; nums[i] = nums[j]; } } return i + 1; }
Root cause:Not realizing the problem requires in-place modification without extra memory.
Key Takeaways
Removing duplicates from a sorted array is efficient because duplicates are adjacent, allowing simple neighbor comparison.
The two-pointer technique uses one pointer to scan and another to write unique elements, modifying the array in place.
This method runs in linear time and constant space, making it optimal for large sorted arrays.
It only works on sorted arrays and modifies the original array, so input conditions and side effects must be considered.
Understanding this technique builds a foundation for many array problems and teaches in-place data manipulation.