How to categorize data in Google Spreadsheet using regular expressions

by G Adams   Last Updated August 10, 2016 08:03 AM

I'm trying to categorize some data in a Google spreadsheet.

  • I have "phrases to screen" in column A, TAB "Phrases". For example "blueish house"
  • I have "words" in column A, TAB "Words". For example, "blue"
  • I have "categories" in column B, TAB "Words". For example, "color"

I created a sample file here: There's a link to a related post too that can help.

I'd like to use regular expressions using the words in column A to match some phrases in A, and then return the appropriate category from B.

For example, I'd like to find "blue" in "blueish house" and return "color". I'd like to find "white" in "great white shoes" and return "color", but not when followed by numbers. Hence, I need to use regular expressions.

I'm using the following formula in column C TAB "phrases". It works, but not with regular expressions. It returns the reg. expressions itself and then can't match the category.

=arrayformula(vlookup(arrayformula(iferror(regexextract(A2:A8,join("|",Words!$A$1:$A$7)))),Words!A$1:B,2,0))

I tried to adapt the formula (from the related post) in column D but it's not working.



Answers 1


SO here is what I did - I added a sheet to your doc called SO Test - Aurielle.

Then I made a unique list of the possible categories in column B using:

=UNIQUE(Words!B:B)

In Column A, I did a JOIN using the regex AND operator which is | and used the formula:

=IF(ISTEXT(B2),JOIN("|",FILTER(Words!A:A,Words!B:B=B2)),)

Basically the filter restricts it to combine keywords by their category value.

Then in column D I added this formula:

=IFERROR(INDEX(A$2:B,MATCH(TRUE,ARRAYFORMULA(REGEXMATCH(C2,INDIRECT("A"&2&":A"&3+COUNTA(B$2:B)))),0),2))

Basically what happens here , is if you break it down from the inside out is I'm using ARRAYFORMULA along with REGEXMATCH - which returns true or false depending on which row the value actually exists in.

So I use the word TRUE to be my key for the MATCH formula, then using INDEX, I navigate it to pull in the index row, and one column over thus grabbing the category.

NOTE: I also added an additional INDIRECT formula in there to calculate how many values actually exist in column B so that your formula will dynamically accommodate on the number of rows it needs to..

enter image description here

Aurielle Perlmann
Aurielle Perlmann
August 13, 2016 12:09 PM

Related Questions


Is it possible to use a loop in Google Spreadsheets?

Updated October 01, 2017 08:03 AM

REGEXEXTRACT and French accents in Google Spreadsheets

Updated October 01, 2017 06:03 AM