Remove Duplicates from a Sorted Array: A Complete Guide

Remove duplicates from a sorted array

Have you ever seen something twice?

Have you ever been to a party and noticed someone wearing the same shirt as you? It’s a little awkward, right? This is called duplication. It happens in many places, even in the digital world. Imagine you’re working on a project, and you have a list of numbers or names. Sometimes, the same number or name shows up more than once. This is what we call duplicates.

Remove duplicates from a sorted array

In real life, think about Thomas Edison, the inventor of the light bulb. He tried thousands of different ways to create a light bulb before he succeeded. Imagine if he kept trying the same thing over and over again. It would be a waste of time, just like duplicates in our data can waste space and cause confusion. That's why it's important to know how to remove them.

So, how do we get rid of these duplicates in a computer list? Let's explore this step by step.

Problem statement: Why do we need to remove duplicates?

When you have a list, or as we call it in computer terms, an array, sometimes the same number or word can appear more than once. This can happen for many reasons, like mistakes in data entry or errors in calculations.

For example, if you have a list of your favorite numbers: [2, 2, 3, 3, 5, 7, 7, 7], you can see that some numbers are repeated. But if you only want unique numbers in your list, you need to remove the duplicates. Your final list should look like this: [2, 3, 5, 7].

Now, how do we solve this problem?

Sample input

Let’s look at a simple example to understand this better:

python
1 nums = [1, 1, 2]

Sample output

python
1[1, 2, 3, 4]

The goal is to remove the repeated numbers and only keep one of each.

How to solve the problem?

Now that we know what the problem is, let’s think about how to solve it.

  • Start with the First Number: Look at the first number in the list.
  • Compare with the Next: Check if the next number is the same.
  • Skip or Keep: If it’s the same, skip it. If it’s different, keep it.
  • Move to the Next: Move to the next number and repeat the process until you reach the end of the list.

This process is simple and can be broken down into a few easy steps.

Pseudo code: Explaining in simple terms

Before we jump into the actual coding, let’s write down what we need to do in plain English. This is called pseudo code, and it’s great for beginners.

  • Start with an empty list to store unique numbers.
  • Look at each number in the original list.
  • If the number is not the same as the previous number, add it to the new list.
  • Continue until all numbers have been checked.
  • Return the new list with no duplicates.

Method 1: Brute force approach using extra memory

The brute force approach is a simple way to remove duplicates. It uses extra memory to store the unique elements and then creates a new list from them.

How it works:

  • Step 1: Create a set, which automatically removes duplicates because sets do not allow duplicate values.
  • Step 2: Loop through the array and add each element to the set.
  • Step 3: Convert the set back to a list.

Python code:

python
1def remove_duplicates_brute_force(arr):
2    # Using a set to store unique elements
3    unique_set = set(arr)
4    
5    # Convert the set back to a list
6    unique_list = list(unique_set)
7    
8    # Sort the list to maintain order
9    unique_list.sort()
10    
11    return unique_list
12
13# Sample input
14arr = [1, 1, 2, 2, 3, 3, 4]
15# Expected output: [1, 2, 3, 4]
16print("Output:", remove_duplicates_brute_force(arr))

Code explanation:

  • Set: The set automatically removes duplicates.
  • Sorting: We sort the list again after converting from the set to maintain the original order.

Java Code:

java
1import java.util.HashSet;
2import java.util.List;
3import java.util.ArrayList;
4import java.util.Collections;
5
6public class RemoveDuplicatesBruteForce {
7    public static List<Integer> removeDuplicates(int[] arr) {
8        // Using a set to store unique elements
9        HashSet<Integer> uniqueSet = new HashSet<>();
10        
11        // Add elements to the set
12        for (int num : arr) {
13            uniqueSet.add(num);
14        }
15        
16        // Convert the set back to a list
17        List<Integer> uniqueList = new ArrayList<>(uniqueSet);
18        
19        // Sort the list to maintain order
20        Collections.sort(uniqueList);
21        
22        return uniqueList;
23    }
24
25    public static void main(String[] args) {
26        int[] arr = {1, 1, 2, 2, 3, 3, 4};
27        System.out.println("Output: " + removeDuplicates(arr));
28    }
29}

C++ code:

cpp
1#include <iostream>
2#include <set>
3#include <vector>
4#include <algorithm>
5
6std::vector<int> removeDuplicatesBruteForce(const std::vector<int>& arr) {
7    // Using a set to store unique elements
8    std::set<int> uniqueSet(arr.begin(), arr.end());
9    
10    // Convert the set back to a vector
11    std::vector<int> uniqueList(uniqueSet.begin(), uniqueSet.end());
12    
13    return uniqueList;
14}
15
16int main() {
17    std::vector<int> arr = {1, 1, 2, 2, 3, 3, 4};
18    std::vector<int> result = removeDuplicatesBruteForce(arr);
19    
20    std::cout << "Output: ";
21    for (int num : result) {
22        std::cout << num << " ";
23    }
24    std::cout << std::endl;
25    
26    return 0;
27}

Pros and cons:

  • Pros: Easy to understand and implement.
  • Cons: Uses extra memory because of the set and sorting, which adds to the time complexity.

Method 2: Optimal approach using index pointer

The optimal approach is more efficient and doesn’t use extra memory. This method removes duplicates in place by using an index pointer.

How it works:

  • Use Two Pointers: One pointer i to iterate through the array and another pointer j to track the position of unique elements.
  • Compare Adjacent Elements: If the current element is different from the previous one, move the pointer j and copy the current element to this position.
  • Return the Unique Array: The first j+1 elements of the array will be the unique elements.

Python Code:

python
1def remove_duplicates_optimal(arr):
2    if not arr:
3        return []
4    
5    j = 0  # Pointer to track the position of unique elements
6    
7    for i in range(1, len(arr)):
8        if arr[i] != arr[j]:
9            j += 1
10            arr[j] = arr[i]
11    
12    return arr[:j+1]
13
14# Sample input
15arr = [1, 1, 2, 2, 3, 3, 4]
16# Expected output: [1, 2, 3, 4]
17print("Output:", remove_duplicates_optimal(arr))

Code explanation:

  • Pointer j: Tracks the position where the next unique element should be placed.
  • Efficient In-Place Operation: This method uses constant extra space, making it very efficient.
java
1public class RemoveDuplicatesOptimal {
2    public static int[] removeDuplicates(int[] arr) {
3        if (arr.length == 0) {
4            return new int[0];
5        }
6        
7        int j = 0; // Pointer to track the position of unique elements
8        
9        for (int i = 1; i < arr.length; i++) {
10            if (arr[i] != arr[j]) {
11                j++;
12                arr[j] = arr[i];
13            }
14        }
15        
16        // Return the unique part of the array
17        int[] result = new int[j + 1];
18        System.arraycopy(arr, 0, result, 0, j + 1);
19        return result;
20    }
21
22    public static void main(String[] args) {
23        int[] arr = {1, 1, 2, 2, 3, 3, 4};
24        int[] result = removeDuplicates(arr);
25        System.out.print("Output: ");
26        for (int num : result) {
27            System.out.print(num + " ");
28        }
29    }
30}

C++ code:

cpp
1#include <iostream>
2#include <vector>
3
4std::vector<int> removeDuplicatesOptimal(std::vector<int>& arr) {
5    if (arr.empty()) {
6        return {};
7    }
8    
9    int j = 0; // Pointer to track the position of unique elements
10    
11    for (size_t i = 1; i < arr.size(); i++) {
12        if (arr[i] != arr[j]) {
13            j++;
14            arr[j] = arr[i];
15        }
16    }
17    
18    return std::vector<int>(arr.begin(), arr.begin() + j + 1);
19}
20
21int main() {
22    std::vector<int> arr = {1, 1, 2, 2, 3, 3, 4};
23    std::vector<int> result = removeDuplicatesOptimal(arr);
24    
25    std::cout << "Output: ";
26    for (int num : result) {
27        std::cout << num << " ";
28    }
29    std::cout << std::endl;
30    
31    return 0;
32}

Pros and cons:

  • Pros: Very efficient in terms of both time and space complexity.
  • Cons: Modifies the original array, which might not be suitable if you need to keep the original data intact.

Conclusion: Keep your data clean and efficient

Removing duplicates from a sorted array is a common task that can be solved in different ways. The brute force approach is simple and easy to understand, but it uses extra memory. The optimal approach, on the other hand, is more efficient, especially for large datasets, as it doesn’t require additional space.

Whether you’re a beginner or an experienced coder, understanding both methods gives you the flexibility to choose the best solution for your needs. With these methods in your toolkit, you’ll be well-prepared to handle similar problems in your coding journey.

Remember, clean data means better results, just like how Thomas Edison’s persistence led to the invention of the light bulb. Now that you know how to remove duplicates from a sorted array, you’re ready to take on more challenging coding tasks!

Frequently Asked Questions

Sign-in First to Add Comment

Leave a comment 💬

All Comments

No comments yet.