--- title: "Introduction to R6 classes" output: html_document: theme: null css: mystyle.css toc: yes vignette: > %\VignetteEngine{knitr::rmarkdown} %\VignetteIndexEntry{Introduction to R6 classes} %\usepackage[utf8]{inputenc} --- ```{r echo = FALSE} library(pryr) knitr::opts_chunk$set(collapse = TRUE, comment = "#>") ``` The R6 package provides a type of class which is similar to R's standard reference classes, but it is more efficient and doesn't depend on S4 classes and the methods package. ## R6 classes R6 classes are similar to R's standard reference classes, but are lighter weight, and avoid some issues that come along with using S4 classes (R's reference classes are based on S4). For more information about speed and memory footprint, see the Performance vignette. Unlike many objects in R, instances (objects) of R6 classes have reference semantics. R6 classes also support: * public and private methods * active bindings * inheritance (superclasses) which works across packages Why the name R6? When R's reference classes were introduced, some users, following the names of R's existing class systems S3 and S4, called the new class system R5 in jest. Although reference classes are not actually called R5, the name of this package and its classes takes inspiration from that name. The name R5 was also a code-name used for a different object system started by Simon Urbanek, meant to solve some issues with S4 relating to syntax and performance. However, the R5 branch was shelved after a little development, and it was never released. ### Basics Here's how to create a simple R6 class. The `public` argument is a list of items, which can be functions and fields (non-functions). Functions will be used as methods. ```{r} library(R6) Person <- R6Class("Person", public = list( name = NULL, hair = NULL, initialize = function(name = NA, hair = NA) { self$name <- name self$hair <- hair self$greet() }, set_hair = function(val) { self$hair <- val }, greet = function() { cat(paste0("Hello, my name is ", self$name, ".\n")) } ) ) ``` To instantiate an object of this class, use `$new()`: ```{r} ann <- Person$new("Ann", "black") ann ``` The `$new()` method creates the object and calls the `initialize()` method, if it exists. Inside methods of the class, `self` refers to the object. Public members of the object (all you've seen so far) are accessed with `self$x`, and assignment is done with `self$x <- y`. Note that by default, `self` is required to access members, although for non-portable classes which we'll see later, it is optional. Once the object is instantiated, you can access values and methods with `$`: ```{r} ann$hair ann$greet() ann$set_hair("red") ann$hair ``` Implementation note: The external face of an R6 object is basically an environment with the public members in it. This is also known as the *public environment*. An R6 object's methods have a separate *enclosing environment* which, roughly speaking, is the environment they "run in". This is where `self` binding is found, and it is simply a reference back to public environment. ### Private members In the previous example, all the members were public. It's also possible to add private members: ```{r} Queue <- R6Class("Queue", public = list( initialize = function(...) { for (item in list(...)) { self$add(item) } }, add = function(x) { private$queue <- c(private$queue, list(x)) invisible(self) }, remove = function() { if (private$length() == 0) return(NULL) # Can use private$queue for explicit access head <- private$queue[[1]] private$queue <- private$queue[-1] head } ), private = list( queue = list(), length = function() base::length(private$queue) ) ) q <- Queue$new(5, 6, "foo") ``` Whereas public members are accessed with `self`, like `self$add()`, private members are accessed with `private`, like `private$queue`. The public members can be accessed as usual: ```{r} # Add and remove items q$add("something") q$add("another thing") q$add(17) q$remove() q$remove() ``` However, private members can't be accessed directly: ```{r eval = FALSE} q$queue #> NULL q$length() #> Error: attempt to apply non-function ``` A useful design pattern is for methods to return `self` (invisibly) when possible, because it makes them chainable. For example, the `add()` method returns `self` so you can chain them together: ```{r} q$add(10)$add(11)$add(12) ``` On the other hand, `remove()` returns the value removed, so it's not chainable: ```{r} q$remove() q$remove() q$remove() q$remove() ``` ### Active bindings Active bindings look like fields, but each time they are accessed, they call a function. They are always publicly visible. ```{r} Numbers <- R6Class("Numbers", public = list( x = 100 ), active = list( x2 = function(value) { if (missing(value)) return(self$x * 2) else self$x <- value/2 }, rand = function() rnorm(1) ) ) n <- Numbers$new() n$x ``` When an active binding is accessed as if reading a value, it calls the function with `value` as a missing argument: ```{r} n$x2 ``` When it's accessed as if assigning a value, it uses the assignment value as the `value` argument: ```{r} n$x2 <- 1000 n$x ``` If the function takes no arguments, it's not possible to use it with `<-`: ```{r eval=FALSE} n$rand #> [1] 0.2648 n$rand #> [1] 2.171 n$rand <- 3 #> Error: unused argument (quote(3)) ``` Implementation note: Active bindings are bound in the public environment. The enclosing environment for these functions is also the public environment. ### Inheritance One R6 class can inherit from another. In other words, you can have super- and sub-classes. Subclasses can have additional methods, and they can also have methods that override the superclass methods. In this example of a queue that retains its history, we'll add a `show()` method and override the `remove()` method: ```{r} # Note that this isn't very efficient - it's just for illustrating inheritance. HistoryQueue <- R6Class("HistoryQueue", inherit = Queue, public = list( show = function() { cat("Next item is at index", private$head_idx + 1, "\n") for (i in seq_along(private$queue)) { cat(i, ": ", private$queue[[i]], "\n", sep = "") } }, remove = function() { if (private$length() - private$head_idx == 0) return(NULL) private$head_idx <<- private$head_idx + 1 private$queue[[private$head_idx]] } ), private = list( head_idx = 0 ) ) hq <- HistoryQueue$new(5, 6, "foo") hq$show() hq$remove() hq$show() hq$remove() ``` Superclass methods can be called with `super$xx()`. The `CountingQueue` (example below) keeps a count of the total number of objects that have ever been added to the queue. It does this by overriding the `add()` method -- it increments a counter and then calls the superclass's `add()` method, with `super$add(x)`: ```{r} CountingQueue <- R6Class("CountingQueue", inherit = Queue, public = list( add = function(x) { private$total <<- private$total + 1 super$add(x) }, get_total = function() private$total ), private = list( total = 0 ) ) cq <- CountingQueue$new("x", "y") cq$get_total() cq$add("z") cq$remove() cq$remove() cq$get_total() ``` ### Fields containing reference objects If your R6 class contains any fields that also have reference semantics (e.g., other R6 objects, and environments), those fields should be populated in the `initialize` method. If the field set to the reference object directly in the class definition, that object will be shared across all instances of the R6 objects. Here's an example: ```{r} SimpleClass <- R6Class("SimpleClass", public = list(x = NULL) ) SharedField <- R6Class("SharedField", public = list( e = SimpleClass$new() ) ) s1 <- SharedField$new() s1$e$x <- 1 s2 <- SharedField$new() s2$e$x <- 2 # Changing s2$e$x has changed the value of s1$e$x s1$e$x ``` To avoid this, populate the field in the `initialize` method: ```{r} NonSharedField <- R6Class("NonSharedField", public = list( e = NULL, initialize = function() self$e <- SimpleClass$new() ) ) n1 <- NonSharedField$new() n1$e$x <- 1 n2 <- NonSharedField$new() n2$e$x <- 2 # n2$e$x does not affect n1$e$x n1$e$x ``` ## Portable and non-portable classes In R6 version 1.0.1, the default was to create **non-portable** classes. In subsequent versions, the default is to create **portable** classes. The two most noticeable differences are that portable classes: * Support inheritance across different packages. Non-portable classes do not do this very well. * Always require the use of `self` and `private` to access members, as in `self$x` and `private$y`. Non-portable classes can access these members with just `x` and `y`, and do assignment to these members with the `<<-` operator. The implementation of the first point is such that it makes the second point necessary. ### Using `self` and `<<-` With reference classes, you can access the field without `self`, and assign to fields using `<<-`. For example: ```{r} RC <- setRefClass("RC", fields = list(x = 'ANY'), methods = list( getx = function() x, setx = function(value) x <<- value ) ) rc <- RC$new() rc$setx(10) rc$getx() ``` The same is true for non-portable R6 classes: ```{r} NP <- R6Class("NP", portable = FALSE, public = list( x = NA, getx = function() x, setx = function(value) x <<- value ) ) np <- NP$new() np$setx(10) np$getx() ``` But for portable R6 classes (this is the default), you must use `self` and/or `private`, and `<<-` assignment doesn't work -- unless you use `self`, of course: ```{r} P <- R6Class("P", portable = TRUE, # This is default public = list( x = NA, getx = function() self$x, setx = function(value) self$x <- value ) ) p <- P$new() p$setx(10) p$getx() ``` For more information, see the Portable vignette. ## Other topics ### Adding members to an existing class It is sometimes useful to add members to a class after the class has already been created. This can be done using the `$set()` method on the generator object. ```{r} Simple <- R6Class("Simple", public = list( x = 1, getx = function() self$x ) ) Simple$set("public", "getx2", function() self$x*2) # To replace an existing member, use overwrite=TRUE Simple$set("public", "x", 10, overwrite = TRUE) s <- Simple$new() s$x s$getx2() ``` The new members will be present only in instances that are created after `$set()` has been called. To prevent modification of a class, you can use `lock_class=TRUE` when creating the class. You can also lock and unlock a class as follows: ```{r} # Create a locked class Simple <- R6Class("Simple", public = list( x = 1, getx = function() self$x ), lock_class = TRUE ) # This would result in an error # Simple$set("public", "y", 2) # Unlock the class Simple$unlock() # Now it works Simple$set("public", "y", 2) # Lock the class again Simple$lock() ``` ### Cloning objects By default, R6 objects have method named `clone` for making a copy of the object. ```{r} Simple <- R6Class("Simple", public = list( x = 1, getx = function() self$x ) ) s <- Simple$new() # Create a clone s1 <- s$clone() # Modify it s1$x <- 2 s1$getx() # Original is unaffected by changes to the clone s$getx() ``` ```{r clone-size, echo=FALSE} # Calculate size of clone method in this block. Cloneable <- R6Class("Cloneable", cloneable = TRUE) NonCloneable <- R6Class("NonCloneable", cloneable = FALSE) c1 <- Cloneable$new() c2 <- Cloneable$new() # Bytes for each new cloneable object cloneable_delta <- object_size(c1, c2) - object_size(c2) nc1 <- NonCloneable$new() nc2 <- NonCloneable$new() # Bytes for each new noncloneable object noncloneable_delta <- object_size(nc1, nc2) - object_size(nc2) # Number of bytes used by each copy of clone method additional_clone_method_bytes <- cloneable_delta - noncloneable_delta additional_clone_method_bytes_str <- capture.output(print(additional_clone_method_bytes)) # Number of bytes used by first copy of a clone method first_clone_method_bytes <- object_size(c1) - object_size(nc1) # Need some trickery to get the nice output from pryr::print.bytes first_clone_method_bytes_str <- capture.output(print(first_clone_method_bytes)) ``` If you don't want a `clone` method to be added, you can use `cloneable=FALSE` when creating the class. If any loaded R6 object has a `clone` method, that function uses `r first_clone_method_bytes_str`, but for each additional object, the `clone` method costs a trivial amount of space (`r additional_clone_method_bytes` bytes). #### Deep cloning If there are any fields which are objects with reference sematics (environments, R6 objects, reference class objects), the copy will get a reference to the same object. This is sometimes desirable, but often it is not. For example, we'll create an object `c1` which contains another R6 object, `s`, and then clone it. Because the original's and the clone's `s` fields both refer to the same object, modifying it from one results in a change that is reflect in the other. ```{r} Simple <- R6Class("Simple", public = list(x = 1)) Cloneable <- R6Class("Cloneable", public = list( s = NULL, initialize = function() self$s <- Simple$new() ) ) c1 <- Cloneable$new() c2 <- c1$clone() # Change c1's `s` field c1$s$x <- 2 # c2's `s` is the same object, so it reflects the change c2$s$x ``` To make it so the clone receives a *copy* of `s`, we can use the `deep=TRUE` option: ```{r} c3 <- c1$clone(deep = TRUE) # Change c1's `s` field c1$s$x <- 3 # c2's `s` is different c3$s$x ``` The default behavior of `clone(deep=TRUE)` is to copy fields which are R6 objects, but not copy fields which are environments, reference class objects, or other data structures which contain other reference-type objects (for example, a list with an R6 object). If your R6 object contains these types of objects and you want to make a deep clone of them, you must provide your own function for deep cloning, in a private method named `deep_clone`. Below is an example of an R6 object with two fields, `a` and `b`, both of which which are environments, and both of which contain a value `x`. It also has a field `v` which is a regular (non-reference) value, and a private `deep_clone` method. The `deep_clone` method is be called once for each field. It is passed the name and value of the field, and the value it returns is be used in the clone. ```{r} CloneEnv <- R6Class("CloneEnv", public = list( a = NULL, b = NULL, v = 1, initialize = function() { self$a <- new.env(parent = emptyenv()) self$b <- new.env(parent = emptyenv()) self$a$x <- 1 self$b$x <- 1 } ), private = list( deep_clone = function(name, value) { # With x$clone(deep=TRUE) is called, the deep_clone gets invoked once for # each field, with the name and value. if (name == "a") { # `a` is an environment, so use this quick way of copying list2env(as.list.environment(value, all.names = TRUE), parent = emptyenv()) } else { # For all other fields, just return the value value } } ) ) c1 <- CloneEnv$new() c2 <- c1$clone(deep = TRUE) ``` When `c1$clone(deep=TRUE)` is called, the `deep_clone` method is called for each field in `c1`, and is passed the name of the field and value. In our version, the `a` environment gets copied, but `b` does not, nor does `v` (but that doesn't matter since `v` is not a reference object). We can test out the clone: ```{r} # Modifying c1$a doesn't affect c2$a, because they're separate objects c1$a$x <- 2 c2$a$x # Modifying c1$b does affect c2$b, because they're the same object c1$b$x <- 3 c2$b$x # Modifying c1$v doesn't affect c2$v, because they're not reference objects c1$v <- 4 c2$v ``` In the example `deep_clone` method above, we checked the name of each field to determine what to do with it, but we could also check the value, by using `inherits(value, "R6")`, or `is.environment()`, and so on. ### Printing R6 objects to the screen R6 objects have a default `print` method that lists all members of the object. If a class defines a `print` method, then it overrides the default one. ```{r} PrettyCountingQueue <- R6Class("PrettyCountingQueue", inherit = CountingQueue, public = list( print = function(...) { cat(" of ", self$get_total(), " elements\n", sep = "") invisible(self) } ) ) ``` ```{r} pq <- PrettyCountingQueue$new(1, 2, "foobar") pq ``` ### Finalizers Sometimes it's useful to run a function when the object is garbage collected. For example, you may want to make sure a file or database connection gets closed. To do this, you can define a `finalize()` method, which will be called with no arguments when the object is garbage collected. ```{r} A <- R6Class("A", public = list( finalize = function() { print("Finalizer has been called!") } )) # Instantiate an object: obj <- A$new() # Remove the single existing reference to it, and force garbage collection # (normally garbage collection will happen automatically from time # to time) rm(obj); gc() ``` Finalizers are implemented using the `reg.finalizer()` function, and they set `onexit=TRUE`, so that the finalizer will also be called when R exits. This is useful in some cases, like database connections. ## Summary R6 classes provide capabilities that are common in other object-oriented programming languages. They're similar to R's built-in reference classes, but are simpler, smaller, and faster, and they allow inheritance across packages.