Indexing strings that contain a word of a specific length

by Wencheng Lau-Medrano   Last Updated May 15, 2019 16:26 PM

I have a list of names which looks like this:

c("xxxxxx xx",             "xxx yyy xxxxx",       "xxx yy xxxxxx", 
  "xxxxxxx yyyyyyy xxxxx", "xxxx xxxx",           "xxx yyyyyy xxx", 
  "xxxxx yyyyy xxxxxxxx",  "xxx yyyyyyyy xxxx",   "xx xxx", 
  "xxxxx yyyyy xxxxx",     "xxxx yy xxxxxx",      "xxxxx yyyy xxx", 
  "xxxxxxx yy xxxxx",      "xxxxx yyyyyyy xxxxx", "xxxx yyyy xxxxxx", 
  "xxxxx yyyy xxxxx",      "xxxxxxxx  xxxxx",     "xxxxxx yyyyyyyy xxxxx", 
  "xxxxxx yy xxxxx",       "xxx yyyy xxxxxx")

I need to extract (index) all the names that contain a word of 4 to 6 letters.

I know that I could split each string, count the characters of each word with nchar, and then index the strings that have a word with a length between 2 and 4. But is there a way to do that in a single line using regular expressions?
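For reference, the split-and-count approach described above could be sketched roughly as follows. The sample vector is shortened and hypothetical, and min_len and max_len are placeholder bounds to adjust as needed.

```r
# Hypothetical shortened sample, not the full list from the question
names_vec <- c("xxxxxx xx", "xxx yyy xxxxx", "xxxxxxx yyyyyyy", "xx xxx")

# Placeholder bounds for the word length (assumption: inclusive on both ends)
min_len <- 4
max_len <- 6

# Split each string on runs of whitespace, giving a list of word vectors
words <- strsplit(names_vec, "[[:space:]]+")

# For each string, check whether any word length falls inside the bounds
has_word <- vapply(
  words,
  function(w) any(nchar(w) >= min_len & nchar(w) <= max_len),
  logical(1)
)

has_word         # logical vector, one element per name
which(has_word)  # numeric indices of the matching names
```

This works, but it takes several steps, which is why a single regular expression is attractive here.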

The expected output must be a vector, either numeric:

[1]  1  2  3  5  6  8  9 11 12 13 15 16 20

or logical:

[1]  TRUE  TRUE  TRUE FALSE  TRUE  TRUE FALSE  TRUE  TRUE FALSE 
[11] TRUE  TRUE  TRUE FALSE  TRUE  TRUE FALSE FALSE FALSE  TRUE
Tags : r regex


Answers 1


Base R
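You can use grepl:

grepl("\\b\\w{4,6}\\b", my.text)
# [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE

stringr
You can use stringr's str_detect:

library(stringr)
str_detect(my.text, "\\b\\w{4,6}\\b")
# [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE

In both versions the key point is the regular expression: "\\w{4,6}" matches a run of 4 to 6 word characters, and the "\\b" word boundaries on each side make sure that run is a whole word. Without the boundaries, the pattern would also match the first characters of any longer word.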
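To see what word boundaries change, here is a small sketch comparing the pattern with and without "\\b". The test strings are hypothetical and chosen so that only the first one contains a word of 4 to 6 letters.

```r
# Hypothetical test strings: word lengths are 5/2, 8/2 and 2/3
x <- c("xxxxx yy", "xxxxxxxx yy", "xx xxx")

# Without boundaries: any run of at least 4 word characters matches,
# so the 8-letter word in the second string is accepted too
unanchored <- grepl("\\w{4,6}", x)

# With boundaries: the run of 4 to 6 word characters must be a whole word
anchored <- grepl("\\b\\w{4,6}\\b", x)

unanchored
anchored
```

The two results differ only for the second string, where the only long word has 8 letters.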

Data

my.text <- c("xxxxxx xx", "xxx yyy xxxxx", "xxx yy xxxxxx", "xxxxxxx yyyyyyy xxxxx", 
             "xxxx xxxx", "xxx yyyyyy xxx", "xxxxx yyyyy xxxxxxxx","xxx yyyyyyyy xxxx", "xx xxx")
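If numeric indices are wanted instead of a logical vector, base R's grep returns them directly. A minimal sketch, assuming the whole-word pattern "\\b\\w{4,6}\\b" and a shortened, hypothetical sample vector:

```r
# Hypothetical shortened sample
sample_names <- c("xxxxxx xx", "xxx yyy xxxxx", "xx xxx")
pattern <- "\\b\\w{4,6}\\b"

# grep() returns the positions of the matching elements
idx <- grep(pattern, sample_names)

# Equivalent: convert the logical vector from grepl() with which()
idx_from_logical <- which(grepl(pattern, sample_names))

idx
sample_names[idx]  # the matching names themselves
```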
kath
May 15, 2019 16:21 PM
