You may already have the formula to remove duplicates from comma-delimited strings in Google Sheets. But do you have an array formula to expand for each row? Possibly, won’t, am I right?

The purpose of the array formula here is to remove duplicates from comma (pipe or any other) delimited strings from each row by residing in the topmost row in the range.

I am not simply going to provide you the array formula. But I have included the total steps involved, and there are nine main steps.

If you are a newbie, then you might not understand the steps. But definitely, you can understand what is going behind the array formula that removes duplicates from comma delimited strings. In addition to that, you can start learning some tips from the steps, such as regex use.

**Non-Array Formula to Remove Duplicates from Comma-Delimited Strings**

Regarding the non-array formula, you can use the below one in cell B2 for the comma-delimited string in cell A2 that contains duplicates.

`=ArrayFormula(textjoin(", ",true,unique(trim(split(A2,",")),true)))`

The formula separates the strings by splitting (SPLIT) them. Then UNIQUE and combine (TEXTJOIN) it back to a comma-delimited string.

Even if the formula contains the ARRAYFORMULA function, it will only remove duplicates from the comma-delimited string in cell A2. For cell A3, you should drag this formula down.

The SPLIT function in the formula creates an array by splitting comma delimited strings. That’s why I have used the ARRAYFORMULA function.

What about writing an expanding array formula to remove duplicates from the comma-delimited strings in Google Sheets?

As I have mentioned, I do have the expanding formula. If you use the same in the above example, you should insert the formula only in cell B2. It will take the range A2:A and expand the results accordingly.

**Array Formula to Remove Duplicates from Comma-Delimited Strings**

The array formula is a bit lengthier. Here it is.

`=ArrayFormula(transpose(split(textjoin(", ",1,IFERROR((if(row(indirect("A2:A"&counta(to_text(unique(filter(flatten(split(if(len(A2:A),row(A2:A),)&" "&trim(substitute(A2:A,",",","&row(A2:A)&" ")),",")),flatten(split(if(len(A2:A),row(A2:A),)&" "&trim(substitute(A2:A,",",","&row(A2:A)&" ")),","))<>""))))))-match(REGEXEXTRACT(to_text(unique(filter(flatten(split(if(len(A2:A),row(A2:A),)&" "&trim(substitute(A2:A,",",","&row(A2:A)&" ")),",")),flatten(split(if(len(A2:A),row(A2:A),)&" "&trim(substitute(A2:A,",",","&row(A2:A)&" ")),","))<>""))),"\w+"),REGEXEXTRACT(to_text(unique(filter(flatten(split(if(len(A2:A),row(A2:A),)&" "&trim(substitute(A2:A,",",","&row(A2:A)&" ")),",")),flatten(split(if(len(A2:A),row(A2:A),)&" "&substitute(A2:A,", ",","&row(A2:A)&" "),","))<>""))),"\w+"),0)=1,"|",)®EXREPLACE(to_text(unique(filter(flatten(split(if(len(A2:A),row(A2:A),)&" "&trim(substitute(A2:A,",",","&row(A2:A)&" ")),",")),flatten(split(if(len(A2:A),row(A2:A),)&" "&trim(substitute(A2:A,",",","&row(A2:A)&" ")),","))<>""))),"^(\w*\W*)1","")))),"|")))`

For the sample sheet with the formula entered, please scroll down to the end of the post.

As I have mentioned, we can explain the above formula in 9 steps. For the explanation purpose, I am going to classify them under three categories. Here are them.

- Split Strings and Row Wise Unique (3 steps).
- Elements to Combine Strings Back to Its Rows (4 steps).
- Final Steps to Remove Duplicates from Comma Delimited Strings (2 steps).

I hope the below steps could enlighten you on what is happening behind the array formula that removes duplicates from the comma-delimited strings.

**Notes**

Please note that I won’t explain how to combine the formulas under each step to make the final formula. It’s easy to do but a little trickier.

The reason, the step-by-step formulas are for a limited (closed) range. But the final combined one is for an infinite (open) range.

We are considering comma delimited strings. If the strings are separated by some other delimiter, such as a pipe; then, you should make the changes accordingly in the formula.

Let’s go to the steps to understand how the above array formula removes duplicates from comma-delimited strings in Google Sheets.

**Split Strings and Row Wise Unique**

The below formulas go in cells B2, C2, and D2. Please see the image below to understand what they return.

1. Inserting Row Numbers and a White Space Immediately After the Delimiters.

`=ArrayFormula(trim(substitute(A2:A,",",","&row(A2:A)&" ")))`

2. Inserting Row Numbers at the Beginning of Values in Each Row.

`=ArrayFormula(if(len(A2:A),row(A2:A),)&" "&B2:B)`

The purpose of the above two formulas is to add row numbers to each string. For example, all the values in row # 2 will get row number 2 preceding them.

3. Split Strings Based on Delimiter, Unique, and Filter.

`=ArrayFormula(to_text(unique(filter(flatten(split(C2:C,",")),flatten(split(C2:C,","))<>""))))`

When you unique the strings this way, the formula will only remove duplicates from the same row. Because each value has its row numbers associated with it.

For example, if the text “BB” is in row 2 and row # 100 one time each, they won’t get eliminated because the values are now “2 BB” and “100 BB”, not duplicates.

We have solved one main issue that we may encounter when trying to remove duplicates from comma-delimited strings.

**Elements to Combine Strings Back to Its Rows**

The below formulas go in cells E2, F2, G2, and H2.

The step # 3 formula in the previous step returns unique row number added values. Please see column D. Below is the 4th step, which will extract the row numbers, and in the fifth step, the rest of the values. Please refer to the image below.

4. Extracting Row Numbers.

`=ArrayFormula(REGEXEXTRACT(D2:D16,"\w+"))`

5. Extracting Strings.

`=ArrayFormula(REGEXREPLACE(D2:D16,"^(\w*\W*)1",""))`

Related: Regex to Get All Words after Nth Word in a Sentence in Google Sheets.

The purpose of the following two (steps # 6 and 7) formulas is to return a separator to mark the end of each row based on column D.

We will achieve that by returning the running count of step # 4 output and then replacing sequence number 1 with a pipe and the rest of the values with blanks.

You can better understand them with the help of the screenshot below.

6. Running Count Based on Extracted Row Numbers.

`=ArrayFormula(row(A2:A16)-match(E2:E16,E2:E16,0))`

7. Delimiter to Separate Rows.

`=ArrayFormula(if(G2:G16=1,"|",))`

Note:- Here, I have used the `|`

character. If your strings are separated by the same, not a comma, then use here a `~`

character instead.

**Final Formula that Removes Duplicates from Comma-Delimited Strings**

Please see columns F and H. We want those columns in the final step. First, we will combine the pipe (or tilde) separators in column H with the strings in column F.

Then join them using a comma separator, split at the pipe (or tilde) separator, and transpose.

The following formulas go in cells I2 and J2.

8. Add Separators with Extracted Strings.

`=ArrayFormula(H2:H16&F2:F16)`

9. Combine Strings Using Comma Delimiter and Split Based on Separator.

`=transpose(split(textjoin(", ",1,I2:I16),"|"))`

This way we can remove duplicates from comma-delimited strings in Google Sheets.

**Pros and Cons**

**Pros**

- Array Formula:- It’s an array formula. So it will automatically include new values in A2:A.
- Row Wise Duplicate Identification:- One of the advantages of the above formula is that it removes duplicates based on rows. I have already mentioned the same in the post. If that’s not your priority, then there is no need to go for such a lengthier formula. You can try this one – Extract Unique Values from a Comma Separated List in Google Sheets.

**Cons**

- It may resource hungry. I didn’t test the formula in a larger range.
- Blank Row Issue:- For example, if the comma-delimited strings are in A2:A100, there shouldn’t be any blank row in between.
- May return error in a larger data range.
- May contain other unknown flaws (you can notify the same via comments).

Thanks for the stay. Enjoy!

## No comments:

Write Comments