\documentclass[nojss]{jss} \DeclareGraphicsExtensions{.pdf,.eps} %% need no \usepackage{Sweave} \author{Gabor Grothendieck\\GKX Associates Inc.} \Plainauthor{Gabor Grothendieck} \title{\pkg{gsubfn}: Utilities for Strings and for Function Arguments.} \Plaintitle{gsubfn: Utilities for Strings and for Function Arguments.} \Keywords{gsub, strings, \proglang{R}} \Plainkeywords{gsub, strings, R} \Abstract{ \pkg{gsubfn} is an \proglang{R} package used for string matching, substitution and parsing. A seemingly small generalization of \code{gsub}, namely allow the replacement string to be a replacement function, formula or \pkg{proto} object, can result in significantly increased power and applicability. The resulting function, \code{gsubfn} is the namesake of this package. Built on top of \code{gsubfn} is \code{strapply} which is similar to \code{gsubfn} except that it returns the output of the function rather than substituting it back into the source string. In the case of a replacement formula the formula is interpreted as a function as explained in the text. In the case of a replacement \pkg{proto} object the object space is used to store persistant data to be communicated from one function invocation to the next as well as to store the replacement function/method itself. The ability to have formula arguments that represent functions can be used not only in the functions of the \pkg{gsubfn} package but can also be used with any \proglang{R} function without modifying its source. Just preface any \proglang{R} function with \code{fn\$} and subject to certain rules which are intended to distinguish which formulas are intended to be functions and which are not, the formula arguments will be translated to functions, e.g. \code{fn\$integrate(\~{} x\^{}2/, 0, 1)}. This facility has widespread applicability right across \proglang{R} and its packages. \code{match.funfn}, is provided to allow developers to readily build this functionality into their own functions so that even the \code{fn\$} prefix need not be used. } \Address{ Gabor Grothendieck\\ GKX Associates Inc.\\ E-mail: \email{ggrothendieck@gmail.com} } \begin{document} \SweaveOpts{engine=R,eps=FALSE} %\VignetteIndexEntry{gsubfn: Utilities for Strings and for Function Arguments.} %\VignetteDepends{proto} %\VignetteKeywords{gsub, strings, R} %\VignettePackage{gsubfn} <>= library("gsubfn") library("proto") @ \section{Introduction} \label{sec:intro} The \proglang{R} system for statistical computing %% \citep[\url{http://www.R-project.org/}]{zoo:R:2005} contains a powerful function for string substitution called \code{gsub} which takes a regular expression, replacement string and source string and replaces all matches of the regular expression in the source string with the replacement string. Parenthesized items in the regular expression, called back references, can be referred to in the replacement string further increasing the range of applications that \code{gsub} can address. The key function and namesake of the \pkg{gsubfn} package is a function which is similar to \code{gsub} but the replacement string can optionally be a replacement function, formula (representing a function) or replacement \code{proto} object. Associated functions built on top of \code{gsubfn} are \code{strapply} which is an \code{apply} style function that is like \code{gsubfn} except that it returns the output of the replacement function rather than substituting it back into the string and \code{strapplyc} which is a faster version specialized to use \code{c} rather than a general function. In the case that a function is passed to \code{gsubfn}, for each match of the regular expression in the source string, the replacement function is called with one argument per backreference or if no backreferences with the match (unless instructed otherwise by the \code{backref} argument). The output of the replacement function is substituted back into the string replacing the match. In those cases where persistance is needed between invocations of the function a \code{proto} object containing a replacement method (a method is another name for function in this context) can be used and the object itself can be used by the replacement method as a repository for data that is to persist between calls to the replacement method. Such persistant data might be counts, prior matches and so on. Also \code{gsubfn} automatically places the argument values that \code{gsubfn} was called with as well as a \code{count} representing the number of matches so far into the object for use by the function. \code{pre} and \code{post} functions can also be entered into the object and are triggerred at the beginning and end, respectively, of each string. The idea of using a replacement function is also found in the \proglang{Lua} language \url{http://www.lua.org/manual/5.1/manual.html#pdf-string.gsub}. %% need to figure out how to to bibtex stuff %% \citep{gsubfn:Ierusalimschy:2003} . \pkg{gsubfn} follows that idea and builds on it with \pkg{proto} objects, formulas and associated function \code{strapply}. The remainder of this article is organized as follows: Section~\ref{sec:gsubfn with functions} explains the use \code{gsubfn} with replacement functions. Section~\ref{sec:gsubfn with proto objects} explains the use \code{gsubfn} with \pkg{proto} objects for applications requiring persistance between calls. Section~\ref{sec:strapply} explains the use \code{strapply} and Section~\ref{sec:Misc} explains the use of \code{cat0} and \code{paste0}. The functions specified in \code{gsubfn} can be specified as functions or using a formula notation. Facilities are included for using that notation with any \proglang{R} function, not just the ones in the \pkg{gsubfn} package. Section~\ref{sec:fn} explains this facility even if the function in question, e.g. \code{apply}, \code{integrate} was not so written and Section~\ref{sec:match.funfn} explains how developers can embed this into their own functions. \textit{Prerequisites}. The reader should be familiar with \proglang{R} and, in particular the \proglang{R} \code{gsub} function. Within \proglang{R}, help on \code{gsub} is found via the \code{?gsub} command and on the net it can be found at \begin{itemize} \item{\url{http://stat.ethz.ch/R-manual/R-patched/library/base/html/grep.html}} \end{itemize} The reader should also be familiar with regular expressions. Within \proglang{R}, help on regular expressions is found via the command \code{?regex} and on the net it can be found at \begin{itemize} \item{\url{http://stat.ethz.ch/R-manual/R-patched/library/base/html/regex.html}} \end{itemize} Other Internet sources of information on regular expressions not specifically concerned with \proglang{R} are \begin{itemize} \item{Perl compatible regular expressions. \url{http://www.pcre.org/}} \item{Regular expressions. \url{http://www.regular-expressions.info/}} \item{Wikipedia. \url{http://en.wikipedia.org/wiki/Regular_expression}} \end{itemize} The discussions of passing \pkg{proto} objects to \code{gsubfn} and \code{strapply} require a minimal understanding of \proglang{R} environments using the \proglang{R} help command \code{?environment} and the \proglang{R} Language Manual found online at \begin{itemize} \item{\url{http://stat.ethz.ch/R-manual/R-patched/library/base/html/environment.html}} \item{\url{http://finzi.psych.upenn.edu/R/doc/manual/R-lang.html#Environment-objects}} \end{itemize} Since the use of the \pkg{proto} package itself is relatively restricted we will include sufficient information so that outside reference to the \pkg{proto} package will be unnecessary for the restricted purpose of using it here.\footnote{ More about \pkg{proto} is available in on the \pkg{proto} home page: \url{http://r-proto.googlecode.com} .} \section[The gsubfn Function]{The \code{gsubfn} Function} \label{sec:gsubfn with functions} \textit{Introduction}. The \code{gsubfn} function has a similar calling sequence to the \proglang{R} \code{gsub} function. The first argument is a regular expression, the second argument is a replacement string, replacement function, replacement formula representing a function or a replacement \pkg{proto} object. The third argument is the source string or a vector of such strings. In this section we are mainly concerned with replacement functions and replacement formulas representing replacement functions. In this case the replacement function is called for each match. The match and back references are passed as arguments. The input string is then copied to the output with the match being replaced with the output of the replacement function. \textit{Replacement function}. The replacement function can be specified by a formula in which the left hand side of the formula are the arguments separated by \code{"+"} (or any other valid formula symbol) while the right hand side represents the body. The environment of the formula will be used as the environment of the generated funciton. If the arguments on the left hand side are omitted then the free variables on the right hand side are used as arguments in the order encountered. \textit{Back References}. If the \code{backref} argument is not specified then all backreferences are passed to the function as separate arguments. If \code{backref} is \code{0} then no back references are passed and the entire match is passed. If \code{backref} is a postive integer, $n$, then the match and the first $n$ back references are passed. If \code{backref} is a negative integer then the match is not passed and the absolute value of \code{backref} is used as the number of back references to pass. Since \code{gsubfn} uses a potentially time consuming trial and error algorithm to automatically determine the number of back references the performance can be sped up somewhat by specifying \code{backref} even if all back references are to be passed. \textit{Example}. This example below replaces \code{x:y} pairs in \code{s} with their sum. The formula in this example is equivalent to specifying the function \code{function(x, y) as.numeric(x) + as.numeric(y)} : <>= s <- 'abc 10:20 def 30:40 50' gsubfn('([0-9]+):([0-9]+)', ~ as.numeric(x) + as.numeric(y), s) @ \section[gsubfn with lists]{\code{gsubfn} with \code{list} objects} \label{sec:gsubfn with list objects} \textit{Example}. If the replacement object is a list then the match is matched against the names of the list and the corresponding value is returned. If no name matches then the first unnamed list component is returned. If there is still no match then the string to be matched is returned so that effectively the lookup is ignored. For example: <>= dat <- c('3.5G', '88P', '19') # test data gsubfn('[MGP]$', list(M = 'e6', G = 'e9', P = 'e12'), dat) @ \section[gsubfn with proto objects]{\code{gsubfn} with \pkg{proto} objects} \label{sec:gsubfn with proto objects} \textit{Introduction}. In some applications one may need information from prior matches on current matches. This may be as simple as a count or as comprehensive as all prior matches. This is accomplished by passing a \pkg{proto} object whose object space can contain variables to be shared among the invocations of the matching function. The matching function itself is also be stored in the object as are the arguments to \code{gsubfn} and a special variable \code{count} which is automatically set to the match number. \textit{Proto}. A \pkg{proto} object is an \proglang{R} environment with an \proglang{S3} class of \code{c("proto", "environment")}. A \pkg{proto} object is created by calling the \code{"proto"} function with the components to be inserted given as arguments. This is very similar to the way lists are constructed in \proglang{R} except that unlike a list a \pkg{proto} object represents an \proglang{R} environment. \textit{Example}. The use of \pkg{proto} objects is best introduced via example. In the following example \code{p} is a \pkg{proto} object which contains one function \code{fun}. A function component of a \pkg{proto} object is called a method and we will use this terminology henceforth. In this example after the \code{proto} command to create \code{p} we examine the class of \code{p} and check the components of \code{p} using \code{ls}. Also we display the \code{fun} component itself. These are some of the basic operations on \pkg{proto} objects. Finally we run \code{gsubfn} using the regular expression \code{\textbackslash\textbackslash{}w+} and the \pkg{proto} object \code{p}. \code{gsubfn} looks for a component called \code{fun} in \code{p} and uses that as the replacement method/function. The arguments to \code{fun} are always the object itself, often represented by the formal argument \code{this}, \code{self} or just \code{.}, followed by the match and back references. In this example there are no back references. Here \code{fun} simply returns the match suffixed by the count of the match. The \code{count} variable is automatically placed into \code{p} by \code{gsubfn}. This has the effect of suffixing the first word with with \code{{1}}, the second with \code{{2}} and so on. After running \code{gsubfn} we examine \code{p} again noticing all the components that were added by \code{gsubfn} and we also examine the \code{count} component which shows how many matches were found. Note that use of \code{paste0} which is like \code{paste} but has a default \code{sep} of \code{""}. <>= p <- proto(fun = function(this, x) paste0(x, "{", count, "}")) class(p) ls(p) with(p, fun) s <- c("the dog and the cat are in the house", "x y x") gsubfn("\\w+", p, s) ls(p) p$count @ \textit{\code{pre} and \code{post}}. \code{gsubfn} knows about three methods: \code{fun} which we have already seen as well as \code{pre} and \code{post}. The latter two are optional and are run before each string and after each string respectively. Suppose we wish to suffix each word not by the count of all words but just by the count of that word. Thus the third occurrence of \code{"the"} will be suffixed with \code{{3}} rather than \code{{8}}. In that case we will set up a \code{words} list in the \code{pre} method. This method will be invoked at the start of each of the two strings in \code{s}. The \code{words} list itself is stored in the \code{pwords} \pkg{proto} object. Since all the methods of a \code{proto} object can share its contents \code{fun} can also make use of it. In the example below, each time we match a word, \code{pwords\$fun} adds it to the list \code{words}, if not already there, and increments it so that words[["the"]] will be \code{1} after \code{"the"} is encountered for the first time, \code{2} after the second time and so on. At the end of the example we look at what variables are in \code{pwords} and also check the contents of the \code{words} list. <>= pwords <- proto( pre = function(this) { this$words <- list() }, fun = function(this, x) { if (is.null(words[[x]])) this$words[[x]] <- 0 this$words[[x]] <- words[[x]] + 1 paste0(x, "{", words[[x]], "}") } ) gsubfn("\\w+", pwords, "the dog and the cat are in the house") ls(pwords) dput(pwords$words) @ Additional examples of the use of \pkg{proto} objects with \code{gsubfn} are available via the command \code{demo("gsubfn-proto")}. \section[strapply]{\code{strapply}} \label{sec:strapply} \textit{Introduction}. The strapply function is similar to the \code{gsubfn} function but instead of replacing the matched strings it returns the output of the function in a list or simplified structure. A typical use would be to split a string based on content rather than on delimiters. The arguments are analogous to the arguments in \code{apply}. In both the object to be applied over is the first argument. A modifier, which is an index for \code{apply} and a regular expression for \code{strapply} is the second argument. The third argument is a function in both cases although in strapply, in analogy to \code{gsubfn} it can also be a \pkg{proto} object. By default \code{strapply} uses the \code{tcl} regular expression engine but if the argument \code{engine="R"} is used or if the function is a proto object then the \code{R} regular expression engine is used instead. The \code{tcl} engine is much faster. (\code{tcl} regular expressions are largely identical to regular expressions in R. See this link \url{https://www.tcl.tk/man/tcl8.5/TclCmd/re_syntax.htm} for details.) The \code{simplify} argument is similar to the \code{simplify} argument in \code{sapply} and, in fact, is passed to \code{sapply} if it is \code{logical}. If \code{simplify} is a function or a formula representing a function then the output of \code{strapply} is passed as \code{output} to it via \code{do.call(simplify, output)}. \textit{Example}. To separate out the initial digits from the rest returning the the initial digits and the rest as two separate fields we can write this: <>= s <- c('123abc', '12cd34', '1e23') strapply(s, '^([[:digit:]]+)(.*)', c, simplify = rbind) @ In this example we calculate the midpoint of each interval. (Note to myself. The following code works if we enter it into R but not in the vignette. Figure out what is wrong. In the meantime we only show the source and but don't run it.) <>= as.num <- function(x) if (x == "NA") NA else as.numeric(x) rn <- c("[-11.9,-10.6]", "(NA,9.3]", "(9.3,8e01]", "(8.01,Inf]") colMeans(strapply(rn, "[^][(),]+", as.num, simplify = TRUE)) @ \code{combine}. The \code{combine} argument can be specified as a function which is to be applied to the output of the replacement function after each call. It defaults to \code{c}. Another popular choice is \code{list}. The following example illustrates the difference: <>= s <- c('a:b c:d', 'e:f') dput(strapply(s, '(.):(.)', c)) dput(strapply(s, '(.):(.)', c, combine = list)) @ \code{strapply} and \code{proto}. \code{strapply} can be used with \pkg{proto} in the same way as as \code{gsubfn}. For example, suppose we wish to extract the words from a string together with their ordinal occurrence number. Previously we did this with \code{gsubfn} and inserted the number back into the string. This time we want to extract it. (Note to myself. The following code works if we enter it into R and even works as part of the vignette if we use R CMD Sweave but if we use R CMD build then it does not work. Figure out what is wrong. In the meantime we only show the source and but don't run it.) <>= pwords2 <- proto( pre = function(this) { this$words <- list() }, fun = function(this, x) { if (is.null(words[[x]])) this$words[[x]] <- 0 this$words[[x]] <- words[[x]] + 1 list(x, words[[x]]) } ) strapply("the dog and the cat are in the house", "\\w+", pwords2, combine = list, simplify = x ~ do.call(rbind, x) ) ls(pwords2) dput(pwords2$words) @ \section[Miscellaneous]{Miscellaneous} \label{sec:Misc} The \code{cat0} and \code{paste0} function are like \code{cat} and \code{paste} they have a default \code{sep} of \code{""}. Here is an example of using paste0. This example retrieves overlapping segments consisting of a space, letter, space, letter and space. Only the final space, letter, space is returned. Because we did not specify \code{backref} it will think there are two back references (since it will interpret the lookahead expression as an extra back reference); however, the second is empty so it does no harm in passing it to \code{paste0}. It uses the zero-lookahead perl style pattern matching expression. <>= strapply(' a b c d e f ', ' [a-z](?=( [a-z] ))', paste0)[[1]] @ \section[fn]{\code{fn}} \label{sec:fn} Wherever a function can be specified in \code{gsubfn} and \code{strapply} one can specify a formula instead as discussed previously. This facility has been extended to work with any \proglang{R} function. Just preface the function with \code{fn\$} and \begin{enumerate} \item{formula arguments will be intercepted and translated to functions allowing a compact representation of the call. Which formulas are actually translated to functions is dependent on rules to be discussed. The right hand side of the formula represents the body of the function. The left hand side of the formula represents the arguments and defaults to the free variables in the order encountered. The environment of the function is set to the environment of the formula. \code{letters}, \code{LETTERS} and \code{pi} are not considered free variables and will not appear in arguments. } \item{character arguments will be intercepted and quasi-perl style string interpolation will be performed. Which character strings to operate on are dependent on rules to be discussed.} \item{the \code{simplify} argument if its value is a function is intercepted. In that case if \code{result} is the result of running the function without the \code{simplify} argument then it returns \code{do.call(simplify, result)}.} \end{enumerate} The rules for determining which formulas to translate and which character strings to apply quasi-perl style string interpolation are as follows: \begin{enumerate} \item{any formula argument that has been specified with a double \code{\~{}}, i.e. \code{\~{}\~{}}, is converted to a function after removing the double \code{\~{}} and replacing it with a single \code{\~{}}.} \item{any character string argument that has been specified with a first character of \code{\textbackslash{}1} has string interpolation applied to it after the \code{\textbackslash{}1} is removed.} \item{if there are no formulas with double \code{\~{}} and no character strings beginning with \code{\textbackslash{}1} then all formulas are converted to functions and if there are no formulas then all character strings have string interpolation done.} \end{enumerate} The last possibility is the actually the most commonly used and almost all our examples will illustrate that case. For example, <>= fn$integrate(~ sin(x) + sin(x), 0, pi/2) fn$lapply(list(1:4, 1:5), ~ LETTERS[x]) fn$mapply(~ seq_len(x) + y * z, 1:3, 4:6, 2) # list(9, 11:12, 13:15) fn$by(CO2[4:5], CO2[2], x ~ coef(lm(uptake ~ ., x)), simplify = rbind) @ Here is an example where we have two formulas, one of which should be translated and another should not. In this case we place a double \code{\~{}} in the second formula to signify that one it represents a function. The first formula is then correctly left untranslated. This example places a panel number in the body of each panel. <>= library(lattice) library(grid) print(fn$xyplot(uptake ~ conc | Plant, CO2, panel = ~~ { panel.xyplot(...); grid.text(panel.number(), .1, .85) })) @ \begin{figure}[hpb] \begin{center} <>= <> @ \caption{ \code{ fn\$xyplot }} \label{fig:gsubfn-fn-lattice-caption} \end{center} \end{figure} As mentioned briefly above, the \code{fn\$} prefix will also intercept any \code{simplify} argument if that argument is a function (but will not intercept it if it is \code{TRUE} or \code{FALSE}). In the case of inteception it runs the command then applies \code{do.call(simplify, result)} to the result of the command. A typical use would be with \code{by} as in the following example to calculate the regression coefficients of \code{uptake} on \code{conc} for each \code{Treatment}. This replaces the sligtly uglier \code{do.call} construct which would otherwise have been required. <>= fn$by(CO2, CO2$Treatment, d ~ coef(lm(uptake ~ conc, d)), simplify = rbind) @ Here are some additional examples to illustrate the wide range of application. The first replaces codes with upper case letters. Note that \code{LETTERS} is never interpreted as a free variable so the default argument is \code{x} here: <>= fn$lapply(list(1:4, 1:3), ~ LETTERS[x]) @ The next example uses \code{aggregate} to calculate the midrange of each of \code{conc} and \code{uptake} for each \code{Type}. The calculation is repeated using \code{cast} from the \pkg{reshape} package and again using \code{summaryBy} from the \pkg{doBy} package. Note that in two of the examples there are two formulas each and one is to be regarded as a function and the other not so we must use double \code{~{}} to distinguish the cases. <>= fn$aggregate(iris[-5], iris[5], ~ mean(range(x))) library(reshape) fn$cast(Type ~ variable, data = melt(CO2, id = 1:3), ~~ mean(range(x))) library(doBy) # need version 0.9 or later fn$summaryBy(. ~ Treatment, CO2[3:5], FUN = ~~ c(midrange = mean(range(x)))) @ <>= fn$aggregate(iris[-5], iris[5], ~ mean(range(x))) @ <>= library(reshape) fn$cast(Treatment ~ variable, data = melt(CO2, id = 1:3), ~~ mean(range(x))) library(doBy) # need version 0.9 or later fn$summaryBy(. ~ Treatment, CO2[3:5], FUN = ~~ c(midrange = mean(range(x)))) @ Here is another common use of \code{aggregate} or \code{by}. This calculates a weighted mean of the first column using weights in the second column all grouped by columns \code{A} and \code{B}. The \code{aggregate} example aggregates over indexes to circumvent the restriction of a single input to the aggregation function. Both \code{i} and \code{X} are free variables but we only want \code{i} to be an argument so we must specify it explicitly. <>= set.seed(1) X <- data.frame(X = rnorm(24), W = runif(24), A = gl(2, 1, 24), B = gl(2, 2, 24)) fn$aggregate(1:nrow(X), X[3:4], i ~ weighted.mean(X[i,1], X[i,2])) @ A number of mathematical functions take functions as arguments. Here we show the use of \code{fn\$} with \code{integrate} and \code{optimize}. <>= fn$integrate(~1/((x+1)*sqrt(x)), lower = 0, upper = Inf) fn$optimize(~ x^2, c(-1,1)) @ \proglang{S4} \code{setGeneric} and \code{setMethod} calls have function arguments that \code{fn\$} can be used with. In the following example we create an \proglang{S4} class \code{ooc} whose representation contains a single variable \code{a}. We then define a generic function \code{incr}. In this case the function arguments cannot be deduced from the body so we specify them explicitly. Then we define an \code{incr} method for class \code{ooc}. Since \code{a} is a free variable again we must define the arguments explicitly to ensure that it is not automatically included. Finally we illustrate the use of the \code{incr} method we just defined. <>= setClass('ooc', representation(a = 'numeric')) fn$setGeneric('incr', x + value ~ standardGeneric('incr')) fn$setMethod('incr', 'ooc', x + value ~ {x@a <- x@a+value; x}) oo <- new('ooc', a = 1) oo <- incr(oo,1) oo @ One commonly used calculation in quantile regression is the creation of a regression plot for each of a variety of values of \code{tau}. Here we plot \code{x} vs. \code{y} and then superimpose quantile regression lines for various \code{tau} values using \code{lapply} to avoid a loop. The \code{lapply} function of \code{tau} is specified using a formula. <>= library(quantreg) data(engel) plot(engel$income, engel$foodexp, xlab = 'income', ylab = 'food expenditure') junk <- fn$lapply(1:9/10, tau ~ abline(coef(rq(foodexp ~ income, tau, engel)))) @ <>= plot(engel$income, engel$foodexp, xlab = 'income', ylab = 'food expenditure') junk <- fn$lapply(1:9/10, tau ~ abline(coef(rq(foodexp ~ income, tau, engel)))) @ \begin{figure}[hpb] \begin{center} <>= <> @ \caption{ \code{ Plot \code{engel} data with quantile lines } } \label{fig:gsubfn-fn-quantreg-caption} \end{center} \end{figure} In time series we may wish to calculate a rolling summary of the data. In this case we calculate a rolling midrange of the data using the \pkg{zoo} function \code{rollapply}: <>= library(zoo) fn$rollapply(LakeHuron, 12, ~ mean(range(x))) @ A common statistical technique for assessing statistics is the bootstrap technique provided in package \pkg{boot}. Here we compactly the bias and standard error of the median statistic using the \code{rivers} data set and 2000 samples. <>= library(boot) set.seed(1) fn$boot(rivers, ~ median(x[d]), R = 2000) @ Here is a plotting application that illustrates that \code{pi} is automatically excluded from default arguments. <>= x <- 0:50/50 matplot(x, fn$outer(x, 1:8, ~ sin(x * k*pi)), type = 'blobcsSh') @ \begin{figure}[hpb] \begin{center} <>= <> @ \caption{ \code{matplot(x, fn\$outer(x, 1:8, \~{} sin(x * k*pi)), type = 'blobcsSh')} } \label{fig:gsubfn-fn-pi-caption} \end{center} \end{figure} Here we define matrix multiplication in terms of two calls to \code{apply} and the inner product definition. The advantage of this is that it can easily be modified to use different inner products. This illustrates a nested use of \code{fn\$}: <>= a <- matrix(4:1, 2); b <- matrix(1:4, 2) # test matrices fn$apply(b, 2, x ~ fn$apply(a, 1, y ~ sum(x*y))) a %*% b @ Another example of nesting is the following which generates all subsequences of \code{1:4}. <>= L <- fn$apply(fn$sapply(1:4, ~ rbind(i,i:4), simplify = cbind), 2, ~ x[1]:x[2]) dput(L) @ In the \proglang{Python} language there exists a convenient notation for expressing lists with side conditions. For example, \code{[ x*x for x in range(1,11) if x\%2 == 0]}. To express this in \proglang{R} using \code{fn\$} we can write it like this which gets fairly close to the \proglang{Python} formulation: <>= fn$sapply( 1:10, ~ if (x%%2==0) x^2, simplify = c) @ Here is an example of string interpolation: <>= fn$cat("pi = $pi, exp = `exp(1)`\n") @ \section[match.funfn and as.function.formula]{\code{match.funfn} and \code{as.function.formula}} \label{sec:match.funfn} Developers who wish to add the \code{fn\$} capability to their own functions (so that the user does not have to prepend them with \code{fn\$)} can use the supplied \code{match.funfn} function which in turn uses the \code{as.function.formula} function to convert formulas to functions. \code{match.funfn} is like the \code{match.fun} in \proglang{R} function except that it also converts formulas, not just character strings. For example with the definition of \code{sq} shown below the formal argument $f$ can be a formula, character string or function as shown in the statements following: <>= sq <- function(f, x) { f <- match.funfn(f); f(x^2) } sq(~ exp(x)/x, pi) f <- function(x) exp(x)/x sq('f', pi) # character string f <- function(x) exp(x)/x sq(f, pi) sq(function(x) exp(x)/x, pi) @ \section{Summary} \label{sec:summary} By simply extending the replacement string in \code{gsub} to functions, formulas and \pkg{proto} objects we obtain a function which on the surface appears nearly identical to \code{gsub} but, in fact, has powerful ramifications for processing. \section*{Computational details} The results in this paper were obtained using \proglang{R} \Sexpr{paste(R.Version()[6:7], collapse = ".")} with the packages \pkg{boot} \Sexpr{gsub("-", "--", packageDescription("boot")$Version)}, \pkg{doBy} \Sexpr{gsub("-", "--", packageDescription("doBy")$Version)}, \pkg{grid} \Sexpr{gsub("-", "--", packageDescription("grid")$Version)}, \pkg{gsubfn} \Sexpr{gsub("-", "--", packageDescription("gsubfn")$Version)}, \pkg{lattice} \Sexpr{gsub("-", "--", packageDescription("lattice")$Version)}, \pkg{proto} \Sexpr{gsub("-", "--", packageDescription("proto")$Version)}, \pkg{quantreg} \Sexpr{gsub("-", "--", packageDescription("quantreg")$Version)} and \pkg{reshape} \Sexpr{gsub("-", "--", packageDescription("reshape")$Version)}. \proglang{R} itself and all packages used are available from CRAN at \url{http://CRAN.R-project.org/}. %% \bibliography{gsubfn} %% \newpage %% \begin{appendix} %% \section{Reference card} %% \input{gsubfn-refcard-raw} %% \end{appendix} \end{document}