Text patterns with regular expressions

Visualizing matches

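The stringr functions str_view() and str_view_all(), added in stringr 1.1.0, highlight the first match and all matches of a pattern, respectively. From stringr 1.5.0 on, str_view() shows every match and str_view_all() is deprecated.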

Example

Collapse BabyNames into one row per name and sex (about 100,000 rows), adding up the counts over all the years. Call the result NameList.

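library(dplyr)

# BabyNames must already be loaded; it is not part of dplyr
NameList <-
  BabyNames %>%
  group_by(name, sex) %>%
  summarise(total = sum(count)) %>%
  ungroup()  # drop the leftover grouping by name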

Write a regular expression that picks out each of the following from NameList:

  • Names ending with “a”.
  • Names ending with a vowel.
  • Names ending with a vowel or “y”.
  • Names with three consonants in a row.
  • Names with three vowels in a row.
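One possible set of patterns, sketched with stringr::str_detect() on a small hypothetical vector of names rather than the full NameList. Two assumptions: "consonant" means any letter other than a, e, i, o, u (so y counts), and the two "in a row" patterns ignore case, because names start with a capital letter.

```r
library(stringr)

# Hypothetical sample names; with the real data, use NameList$name
sample_names <- c("Anna", "Chloe", "Mary", "Christopher", "Louise", "Bob")

# Endings: the final letter is lowercase, so no case folding is needed
ends_a       <- str_detect(sample_names, "a$")
ends_vowel   <- str_detect(sample_names, "[aeiou]$")
ends_vowel_y <- str_detect(sample_names, "[aeiouy]$")

# Runs anywhere in the name: ignore case so "Chr" in "Christopher" counts
three_consonants <- str_detect(sample_names,
                               regex("[bcdfghjklmnpqrstvwxyz]{3}", ignore_case = TRUE))
three_vowels     <- str_detect(sample_names,
                               regex("[aeiou]{3}", ignore_case = TRUE))
```

To run a pattern against the whole table, use it in a filter, e.g. NameList %>% filter(str_detect(name, "a$")).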

Interactive site for testing expressions: http://regexone.com/

Extraction

With just one pattern to match, use stringr::str_extract().
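A small sketch of str_extract() on made-up names: each string yields its first match, and a string with no match yields NA. This is why names ending in a consonant get vowel = NA in the next example.

```r
library(stringr)

# Hypothetical sample names
x <- c("Anna", "Chloe", "Mary")

# Final run of one or more vowels; "Mary" ends in "y", so it has no match
endings <- str_extract(x, "[aeiou]+$")
endings
```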

What are the most common vowel endings, by sex?

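library(dplyr)
library(tidyr)

NameList %>%
  # Final run of vowels; NA when the name ends in a consonant
  mutate(vowel = stringr::str_extract(name, "[aeiou]+$")) %>%
  group_by(sex, vowel) %>%
  summarise(total = sum(total)) %>%
  arrange(sex, desc(total)) %>%
  # One column per sex; spread() returns the rows ordered by vowel
  spread(key = sex, value = total)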
## # A tibble: 92 × 3
##    vowel        F      M
## *  <chr>    <int>  <int>
## 1      a 45710854 594191
## 2     aa     8827    572
## 3    aai       80     NA
## 4     ae   268561  37868
## 5    aea      452     NA
## 6    aee       45     35
## 7    aeo       NA    270
## 8     ai    52462  79942
## 9    aia    25078    579
## 10   aie      420     15
## # ... with 82 more rows

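To pull several pieces out of each string at once, use tidyr::extract(): each capture group in a single regular expression becomes its own column.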
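A minimal sketch on three hypothetical names that all end in a vowel. One regular expression with two capture groups produces two new columns; the names stem and ending are made up for this illustration. The lazy (.*?) leaves the whole final run of vowels to ([aeiou]+).

```r
library(tidyr)

# Hypothetical sample data
names_df <- data.frame(name = c("Anna", "Chloe", "Louise"),
                       stringsAsFactors = FALSE)

# Group 1 -> stem, group 2 -> ending; remove = FALSE keeps the name column
split_names <- tidyr::extract(
  names_df, name,
  into = c("stem", "ending"),
  regex = "^(.*?)([aeiou]+)$",
  remove = FALSE
)
split_names
```

Calling it as tidyr::extract() avoids a clash with magrittr's extract() when both packages are attached. Recent tidyr releases (1.3.0 on) mark extract() as superseded by separate_wider_regex(), but it still works.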