Title: | Safe, Multiple, Simultaneous String Substitution |
---|---|
Description: | Designed to enable simultaneous substitution in strings in a safe fashion. Safe means it does not rely on placeholders (which can cause errors in same length matches). |
Authors: | Mark Ewing [aut, cre] |
Maintainer: | Mark Ewing <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.7.3 |
Built: | 2025-01-10 03:00:00 UTC |
Source: | https://github.com/bmewing/mgsub |
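To make the motivation in the Description concrete, the base R sketch below shows what goes wrong when a swap of two words is attempted with chained gsub calls. It uses only base R, and the variable names are illustrative.

```r
x <- "hey, ho"

# Sequential replacement: the second call also rewrites the output of the first.
step1 <- gsub("hey", "ho", x)      # every "hey" becomes "ho"
step2 <- gsub("ho", "hey", step1)  # every "ho", old or new, becomes "hey"

print(step2)
```

The intended swap would give "ho, hey", but the chained calls cannot tell an original "ho" from one produced by the first step. A simultaneous substitution avoids this by finding all matches in the original string before replacing any of them.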
The hard worker doing everything for mgsub_censor
censor_worker(string, pattern, censor, split = any(nchar(censor) > 1), seed = NULL, ...)
string |
a character vector where replacements are sought |
pattern |
Character string to be matched in the given character vector |
censor |
character to use in censoring - see details |
split |
if a multicharacter censor pattern is provided, should it be split to preserve original string length |
seed |
optional parameter to fix sampling of multicharacter censors |
... |
arguments to pass to regexpr family |
Fast escape function for limited case where only one pattern provided actually matches anything
fast_replace(string, pattern, replacement, ...)
string |
a character vector where replacements are sought |
pattern |
Character string to be matched in the given character vector |
replacement |
Character vector equal in length to pattern, or of length one, giving the replacement for each matched pattern. |
... |
arguments to pass to gsub |
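The shortcut described above can be pictured with a small base R sketch. This is an assumption about the general idea, not the package's implementation: when exactly one pattern occurs in the string, no two replacements can interfere, so a single gsub call is enough. The names hits and result are hypothetical.

```r
string <- "hey, ho"
pattern <- c("hey", "zzz")
replacement <- c("ho", "unused")

# Check which patterns occur in the string at all.
hits <- vapply(pattern, function(p) grepl(p, string), logical(1))

if (sum(hits) == 1) {
  # Only one pattern matches, so a plain gsub is safe.
  result <- gsub(pattern[hits], replacement[hits], string)
} else {
  # Sketch only: the general multi-pattern case is not handled here.
  result <- NA_character_
}

print(result)
```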
Helper function used to identify which results from gregexpr overlap other matches and filter out shorter, overlapped results
filter_overlap(x)
x |
Matrix of gregexpr results, 4 columns, index of column matched, start of match, length of match, end of match. Produced exclusively from a worker function in mgsub |
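As an illustration of the overlap filtering described above, here is a minimal base R sketch. It is a hypothetical re-implementation written for this example (filter_overlap_sketch is not a package function) and assumes the four-column layout given for x: pattern index, start, length, end.

```r
# Hypothetical sketch, not the package's code.
filter_overlap_sketch <- function(x) {
  # Visit longer matches first so they take priority over shorter ones.
  x <- x[order(-x[, 3], x[, 2]), , drop = FALSE]
  keep <- logical(nrow(x))
  for (i in seq_len(nrow(x))) {
    kept <- x[keep, , drop = FALSE]
    # Two ranges overlap when each one starts before the other ends.
    overlaps <- any(x[i, 2] <= kept[, 4] & x[i, 4] >= kept[, 2])
    if (!overlaps) keep[i] <- TRUE
  }
  out <- x[keep, , drop = FALSE]
  # Return the surviving matches in order of position.
  out[order(out[, 2]), , drop = FALSE]
}

# Columns: pattern index, start, length, end
m <- rbind(
  c(1, 1, 3, 3),  # overlapped by the longer match below
  c(2, 2, 5, 6),
  c(3, 8, 2, 9)
)
filtered <- filter_overlap_sketch(m)
print(filtered)
```

In this example the first row is dropped because the longer match from pattern 2 covers part of the same range, while the third row does not overlap anything and is kept.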
Helper function to be used in a loop to check each pattern provided for matches
get_matches(string, pattern, i, ...)
string |
a character vector where replacements are sought |
pattern |
Character string to be matched in the given character vector |
i |
an iterator provided by a looping function |
... |
arguments to pass to gregexpr |
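A rough sketch of how gregexpr output for one pattern could be turned into the four-column match matrix used by filter_overlap is shown below. The function get_matches_sketch is hypothetical and written only for illustration. It assumes a single input string and the column order pattern index, start, length, end.

```r
# Hypothetical sketch, not the package's code.
get_matches_sketch <- function(string, pattern, i, ...) {
  m <- gregexpr(pattern[i], string, ...)[[1]]
  # gregexpr reports -1 when the pattern does not occur.
  if (m[1] == -1) {
    return(matrix(numeric(0), ncol = 4))
  }
  start <- as.integer(m)
  len <- attr(m, "match.length")
  # One row per match: pattern index, start, length, end.
  unname(cbind(i, start, len, start + len - 1))
}

patterns <- c("hey", "ho")
res <- get_matches_sketch("hey, ho", patterns, 2)
none <- get_matches_sketch("hey, ho", c("xyz"), 1)
print(res)
```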
mgsub - A safe, simultaneous, multiple global string replacement wrapper that allows access to multiple methods of specifying matches and replacements.
mgsub(string, pattern, replacement, recycle = FALSE, ...)
string |
a character vector where replacements are sought |
pattern |
Character string to be matched in the given character vector |
replacement |
Character vector equal in length to pattern, or of length one, giving the replacement for each matched pattern. |
recycle |
Logical. Should replacement be recycled if lengths differ? |
... |
Converted string.
mgsub("hey, ho", pattern = c("hey", "ho"), replacement = c("ho", "hey"))
mgsub("developer", pattern = c("e", "p"), replacement = c("p", "e"))
mgsub("The chemical Dopaziamine is fake", pattern = c("dopa(.*?) ", "fake"), replacement = c("mega\\1 ", "real"), ignore.case = TRUE)
mgsub_censor - A safe, simultaneous, multiple global string censoring (replace matches with a censoring character like '*')
mgsub_censor(string, pattern, censor = "*", split = any(nchar(censor) > 1), seed = NULL, ...)
string |
a character vector to censor |
pattern |
regular expressions used to identify where to censor |
censor |
character to use in censoring - see details |
split |
if a multicharacter censor pattern is provided, should it be split to preserve original string length |
seed |
optional parameter to fix sampling of multicharacter censors |
... |
When censor is provided as a >1 length vector or as a multicharacter string with split = TRUE, it will be sampled to return random censoring patterns. This can be helpful if you want to create cartoonish swear censoring. If needed, the randomization can be controlled with the seed argument.
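The sampling behaviour described above can be sketched in base R as follows. This is an illustrative assumption about the idea, not the package's implementation. The helper censor_sketch and the symbol set are hypothetical.

```r
# Hypothetical sketch, not the package's code.
censor_sketch <- function(n, censor, seed = NULL) {
  # Fixing the seed makes the random draw reproducible.
  if (!is.null(seed)) set.seed(seed)
  # Draw n censor characters with replacement and glue them together.
  paste(sample(censor, n, replace = TRUE), collapse = "")
}

# Split a multicharacter censor string into single characters so the
# result has the same length as the text being hidden.
symbols <- strsplit("#@!$", "")[[1]]

a <- censor_sketch(4, symbols, seed = 42)
b <- censor_sketch(4, symbols, seed = 42)
print(a)
```

With the same seed, both calls return the same four-character string, which is the kind of control the seed argument is meant to provide.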
Censored string.
mgsub_censor("Flowers for a friend", pattern=c("low"), censor="*")
The hard worker doing everything for mgsub
worker(string, pattern, replacement, ...)
string |
a character vector where replacements are sought |
pattern |
Character string to be matched in the given character vector |
replacement |
Character vector equal in length to pattern, or of length one, giving the replacement for each matched pattern. |
... |
arguments to pass to regexpr family |