R Strings

Work with strings with stringr :: CHEAT SHEET

The stringr package provides a set of internally consistent tools for working with character strings, i.e. sequences of characters surrounded by quotation marks.

Detect Matches

str_detect(string, pattern)
  Detect the presence of a pattern match in a string.
  str_detect(fruit, "a")

str_which(string, pattern)
  Find the indexes of strings that contain a pattern match.
  str_which(fruit, "a")

str_count(string, pattern)
  Count the number of matches in a string.
  str_count(fruit, "a")

str_locate(string, pattern)
  Locate the positions of pattern matches in a string. Also str_locate_all.
  str_locate(fruit, "a")

Subset Strings

str_sub(string, start = 1L, end = -1L)
  Extract substrings from a character vector.
  str_sub(fruit, 1, 3); str_sub(fruit, -2)

str_subset(string, pattern)
  Return only the strings that contain a pattern match.
  str_subset(fruit, "b")

str_extract(string, pattern)
  Return the first pattern match found in each string, as a vector. Also str_extract_all to return every pattern match.
  str_extract(fruit, "[aeiou]")

str_match(string, pattern)
  Return the first pattern match found in each string, as a matrix with a column for each ( ) group in pattern. Also str_match_all.
  str_match(sentences, "(a|the) ([^ ]+)")

Manage Lengths

str_length(string)
  The width of strings (i.e. number of code points, which generally equals the number of characters).
  str_length(fruit)

str_pad(string, width, side = c("left", "right", "both"), pad = " ")
  Pad strings to constant width.
  str_pad(fruit, 17)

str_trunc(string, width, side = c("right", "left", "center"), ellipsis = "...")
  Truncate the width of strings, replacing content with ellipsis.
  str_trunc(fruit, 3)

str_trim(string, side = c("both", "left", "right"))
  Trim whitespace from the start and/or end of a string.
  str_trim(fruit)
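
The detection, subsetting, and length functions above can be tried on a small vector. The sketch below assumes the stringr package is installed. The vector x is a made-up input used in place of the fruit dataset so the results are easy to check by eye.

```r
library(stringr)

# Hypothetical input vector (not the built-in fruit dataset)
x <- c("apple", "banana", "pear", "kiwi")

has_a   <- str_detect(x, "a")          # logical: does each string contain "a"?
which_a <- str_which(x, "a")           # indexes of the strings that contain "a"
count_a <- str_count(x, "a")           # number of "a" matches per string

first3  <- str_sub(x, 1, 3)            # first three characters of each string
with_b  <- str_subset(x, "b")          # only the strings that contain "b"
vowel   <- str_extract(x, "[aeiou]")   # first vowel found in each string

len     <- str_length(x)               # number of characters per string
padded  <- str_pad("kiwi", 6)          # pads on the left by default
short   <- str_trunc("banana", 5)      # width counts the "..." ellipsis

print(data.frame(x, has_a, count_a, first3, vowel, len))
```

Each function is vectorised over its first argument, so the result has one element (or one row, for str_locate and str_match) per input string, except for str_which and str_subset, which return only the matching positions or strings.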

Mutate Strings

str_sub() <- value
  Replace substrings by identifying the substrings with str_sub() and assigning into the results.
  str_sub(fruit, 1, 3) <- "str"

str_replace(string, pattern, replacement)
  Replace the first matched pattern in each string.
  str_replace(fruit, "a", "-")

str_replace_all(string, pattern, replacement)
  Replace all matched patterns in each string.
  str_replace_all(fruit, "a", "-")

str_to_lower(string, locale = "en") (1)
  Convert strings to lower case.
  str_to_lower(sentences)

str_to_upper(string, locale = "en") (1)
  Convert strings to upper case.
  str_to_upper(sentences)

str_to_title(string, locale = "en") (1)
  Convert strings to title case.
  str_to_title(sentences)

Join and Split

str_c(..., sep = "", collapse = NULL)
  Join multiple strings into a single string.
  str_c(letters, LETTERS)

str_c(..., sep = "", collapse = NULL)
  Collapse a vector of strings into a single string.
  str_c(letters, collapse = "")

str_dup(string, times)
  Repeat strings times times.
  str_dup(fruit, times = 2)

str_split_fixed(string, pattern, n)
  Split a vector of strings into a matrix of substrings (splitting at occurrences of a pattern match). Also str_split to return a list of substrings.
  str_split_fixed(fruit, " ", n=2)

glue::glue(..., .sep = "", .envir = parent.frame(), .open = "{", .close = "}")
  Create a string from strings and {expressions} to evaluate.
  glue::glue("Pi is {pi}")

glue::glue_data(.x, ..., .sep = "", .envir = parent.frame(), .open = "{", .close = "}")
  Use a data frame, list, or environment to create a string from strings and {expressions} to evaluate.
  glue::glue_data(mtcars, "{rownames(mtcars)} has {hp} hp")

Order Strings

str_order(x, decreasing = FALSE, na_last = TRUE, locale = "en", numeric = FALSE, ...) (1)
  Return the vector of indexes that sorts a character vector.
  x[str_order(x)]

str_sort(x, decreasing = FALSE, na_last = TRUE, locale = "en", numeric = FALSE, ...) (1)
  Sort a character vector.
  str_sort(x)

Helpers

str_conv(string, encoding)
  Override the encoding of a string.
  str_conv(fruit,"ISO-8859-1")

str_view(string, pattern, match = NA)
  View HTML rendering of first regex match in each string.
  str_view(fruit, "[aeiou]")

str_view_all(string, pattern, match = NA)
  View HTML rendering of all regex matches.
  str_view_all(fruit, "[aeiou]")

str_wrap(string, width = 80, indent = 0, exdent = 0)
  Wrap strings into nicely formatted paragraphs.
  str_wrap(sentences, 20)

(1) See bit.ly/ISO639-1 for a complete list of locales.
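
The mutating, joining, and ordering functions above can be sketched as follows, assuming the stringr package is installed. The inputs y and z are made-up values chosen for illustration, and the glue functions are left out to avoid a second package dependency.

```r
library(stringr)

y <- "banana"                                  # hypothetical input

first_only <- str_replace(y, "a", "-")         # replaces only the first "a"
every_one  <- str_replace_all(y, "a", "-")     # replaces every "a"

str_sub(y, 1, 3) <- "str"                      # assign into a substring; y is modified

upper <- str_to_upper("a string")
title <- str_to_title("a string")

pairs  <- str_c(c("a", "b"), c("A", "B"))      # element-wise join
joined <- str_c(c("a", "b", "c"), collapse = "")  # collapse a vector to one string
dup    <- str_dup("ab", times = 2)

# Split into at most n = 2 pieces; the remainder stays in the last column
parts <- str_split_fixed("red apple pie", " ", n = 2)

z      <- c("pear", "apple", "kiwi")           # hypothetical input
idx    <- str_order(z)                         # indexes that would sort z
sorted <- str_sort(z)

print(list(first_only, every_one, y, parts, sorted))
```

Note that z[str_order(z)] and str_sort(z) give the same result; str_order is useful when the same ordering must be applied to other vectors or to the rows of a data frame.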

RStudio® is a trademark of RStudio, Inc. • CC BY SA RStudio • info@rstudio.com • 844-448-1212 • rstudio.com • Learn more at stringr.tidyverse.org • Diagrams from @LVaudor • stringr 1.2.0 • Updated: 2017-10

Need to Know

Pattern arguments in stringr are interpreted as regular expressions after any special characters have been parsed.

In R, you write regular expressions as strings, sequences of characters surrounded by double quotes ("") or single quotes ('').

Some characters cannot be represented directly in an R string. These must be represented as special characters, sequences of characters that have a specific meaning, e.g.

  Special Character    Represents
  \\                   \
  \"                   "
  \n                   new line

Run ?"'" to see a complete list.

Because of this, whenever a \ appears in a regular expression, you must write it as \\ in the string that represents the regular expression.

Use writeLines() to see how R views your string after all special characters have been parsed.

  writeLines("\\.")
  # \.

  writeLines("\\ is a backslash")
  # \ is a backslash

Regular Expressions

Regular expressions, or regexps, are a concise language for describing patterns in strings.

MATCH CHARACTERS

see <- function(rx) str_view_all("abc ABC 123\t.!?\\(){}\n", rx)

  string         regexp          matches                                       example
  (type this)    (to mean this)  (which matches this)
                 a (etc.)        a (etc.)                                      see("a")
  \\.            \.              .                                             see("\\.")
  \\!            \!              !                                             see("\\!")
  \\?            \?              ?                                             see("\\?")
  \\\\           \\              \                                             see("\\\\")
  \\(            \(              (                                             see("\\(")
  \\)            \)              )                                             see("\\)")
  \\{            \{              {                                             see("\\{")
  \\}            \}              }                                             see("\\}")
  \\n            \n              new line (return)                             see("\\n")
  \\t            \t              tab                                           see("\\t")
  \\s            \s              any whitespace (\S for non-whitespaces)       see("\\s")
  \\d            \d              any digit (\D for non-digits)                 see("\\d")
  \\w            \w              any word character (\W for non-word chars)    see("\\w")
  \\b            \b              word boundaries                               see("\\b")
                 [:digit:]       digits                                        see("[:digit:]")
                 [:alpha:]       letters                                       see("[:alpha:]")
                 [:lower:]       lowercase letters                             see("[:lower:]")
                 [:upper:]       uppercase letters                             see("[:upper:]")
                 [:alnum:]       letters and numbers                           see("[:alnum:]")
                 [:punct:]       punctuation                                   see("[:punct:]")
                 [:graph:]       letters, numbers, and punctuation             see("[:graph:]")
                 [:space:]       space characters (i.e. \s)                    see("[:space:]")
                 [:blank:]       space and tab (but not new line)              see("[:blank:]")
                 .               every character except a new line             see(".")

For the [: :] classes above: many base R functions require classes to be wrapped in a second set of [ ], e.g. [[:digit:]]

[Diagram of character classes: [:space:] contains new line and [:blank:] (space, tab). [:graph:] contains [:punct:] (. , : ; ? ! \ | / ` = * + - ^ _ ~ " ' [ ] { } ( ) < > @ # $) and [:alnum:]. [:alnum:] contains [:digit:] (0 1 2 3 4 5 6 7 8 9) and [:alpha:], which contains [:lower:] (a to z) and [:upper:] (A to Z).]
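
The escaping rule in Need to Know can be checked directly. This is a minimal sketch, assuming the stringr package is installed. The vector x and the sample sentence are made-up inputs, and the bracketed class is written in its doubled form [[:digit:]] so the same pattern works in both stringr and base R.

```r
library(stringr)

# R parses the string "\\." into the two characters \. before the
# regex engine sees it.
writeLines("\\.")
n_chars <- nchar("\\.")

x <- c("a.c", "abc")                   # hypothetical inputs

dot_any     <- str_detect(x, ".")      # unescaped . matches any character
dot_literal <- str_detect(x, "\\.")    # escaped \. matches only a literal dot

# A character class wrapped in a second set of brackets
s <- "abc ABC 123"
digits_stringr <- str_extract(s, "[[:digit:]]+")
digits_base    <- regmatches(s, regexpr("[[:digit:]]+", s))

print(list(dot_any, dot_literal, digits_stringr, digits_base))
```

The unescaped pattern matches both inputs because . stands for any character, while the escaped pattern matches only the string that really contains a dot.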

INTERPRETATION

Patterns in stringr are interpreted as regexes. To change this default, wrap the pattern in one of:

regex(pattern, ignore_case = FALSE, multiline = FALSE, comments = FALSE, dotall = FALSE, ...)
  Modifies a regex to ignore case, match the end of lines as well as the end of strings, allow R comments within regexes, and/or have . match everything including \n.
  str_detect("I", regex("i", TRUE))

fixed()
  Matches raw bytes but will miss some characters that can be represented in multiple ways (fast).
  str_detect("\u0130", fixed("i"))

coll()
  Matches raw bytes and will use locale specific collation rules to recognize characters that can be represented in multiple ways (slow).
  str_detect("\u0130", coll("i", TRUE, locale = "tr"))

boundary()
  Matches boundaries between characters, line_breaks, sentences, or words.
  str_split(sentences, boundary("word"))

ALTERNATES

alt <- function(rx) str_view_all("abcde", rx)

  regexp      matches          example
  ab|d        or               alt("ab|d")
  [abe]       one of           alt("[abe]")
  [^abe]      anything but     alt("[^abe]")
  [a-c]       range            alt("[a-c]")

QUANTIFIERS

quant <- function(rx) str_view_all(".a.aa.aaa", rx)

  regexp      matches              example
  a?          zero or one          quant("a?")
  a*          zero or more         quant("a*")
  a+          one or more          quant("a+")
  a{n}        exactly n            quant("a{2}")
  a{n, }      n or more            quant("a{2,}")
  a{n, m}     between n and m      quant("a{2,4}")

ANCHORS

anchor <- function(rx) str_view_all("aaa", rx)

  regexp      matches              example
  ^a          start of string      anchor("^a")
  a$          end of string        anchor("a$")

GROUPS

ref <- function(rx) str_view_all("abbaab", rx)

Use parentheses to set precedence (order of evaluation) and create groups.

  regexp      matches              example
  (ab|d)e     sets precedence      alt("(ab|d)e")

Use an escaped number to refer to and duplicate parentheses groups that occur earlier in a pattern. Refer to each group by its order of appearance.

  string         regexp            matches                 example
  (type this)    (to mean this)    (which matches this)    (the result is the same as ref("abba"))

LOOK AROUNDS

look <- function(rx) str_view_all("bacad", rx)

  regexp      matches              example
  a(?=c)      followed by          look("a(?=c)")
  a(?!c)      not followed by      look("a(?!c)")
  (?<=b)a     preceded by          look("(?<=b)a")
  (?