R blocks
You can write R code in blocks to load, transform, and export data.
Add an R block to a pipeline
- Create a new pipeline or open an existing pipeline.
- Add a data loader, transformer, or data exporter block.
- Select R.
Example pipeline
- Data loader
load_data <- function() {
    # Specify your data loading logic here
    # Return value: loaded dataframe
    df <- read.csv(url('https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv'))
    df
}
- Transformer
library("pacman")
p_load(dplyr)
transform <- function(df_1, ...) {
    # Specify your transformation logic here
    # Return value: transformed dataframe
    df_1 <- filter(df_1, Pclass < 3)
    df_1
}
- Data exporter
export_data <- function(df_1, ...) {
    # Specify your data exporting logic here
    # Return value: exported dataframe
    write.csv(df_1, "titanic_filtered.csv", row.names = FALSE)
}
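To check the logic of the three example blocks before adding them to a pipeline, the functions can be chained by hand in a plain R session. The sketch below is a rough local approximation, assuming each block's return value is handed to the next block as df_1. The small inline data frame, the temporary output file, and the extra path argument on export_data are stand-ins that are not part of the example above, and base R subsetting replaces dplyr so that no packages are needed.

```r
# Stand-in loader: a small inline sample instead of the remote Titanic CSV.
load_data <- function() {
    data.frame(
        Name = c("A", "B", "C", "D"),
        Pclass = c(1, 2, 3, 3),
        stringsAsFactors = FALSE
    )
}

# Same filter as the example transformer, written with base R subsetting.
transform <- function(df_1, ...) {
    df_1[df_1$Pclass < 3, , drop = FALSE]
}

# Writes to a caller-supplied path (hypothetical extra argument) and
# returns the data frame so the result can be inspected afterwards.
export_data <- function(df_1, path, ...) {
    write.csv(df_1, path, row.names = FALSE)
    df_1
}

# Chain the blocks by hand, in pipeline order.
out_path <- tempfile(fileext = ".csv")
loaded <- load_data()
filtered <- transform(loaded)
exported <- export_data(filtered, out_path)

# Read the file back to confirm what was written.
written <- read.csv(out_path)
print(written)
```

Only rows with Pclass below 3 should appear in the file that is read back.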
Install R packages
Add the following at the start of your code in your R block:
pacman::p_load(package1, package2, package3)
Or
library("pacman")
p_load(dplyr)
Note
The first time you run the R block, any packages that are missing are installed. On later runs, the packages are already installed and are only loaded.
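The install-on-first-run behavior described in this note can be approximated in base R. The sketch below is not pacman's implementation; missing_packages and ensure_packages are hypothetical helper names, and the logic assumes a package counts as installed when requireNamespace() can find it.

```r
# Hypothetical helper: return the names of packages that are not installed.
missing_packages <- function(pkgs) {
    installed <- vapply(
        pkgs,
        function(p) requireNamespace(p, quietly = TRUE),
        logical(1)
    )
    pkgs[!installed]
}

# Hypothetical helper: install whatever is missing, then attach everything.
ensure_packages <- function(pkgs) {
    to_install <- missing_packages(pkgs)
    if (length(to_install) > 0) {
        # Only reached when something is missing, typically on the first run.
        install.packages(to_install)
    }
    for (p in pkgs) {
        library(p, character.only = TRUE)
    }
    invisible(pkgs)
}

# Packages that ship with R are always present, so nothing is installed here.
ensure_packages(c("stats", "utils"))
```

In a non-interactive session, install.packages() may also need a CRAN mirror to be configured; that setup is outside the scope of this sketch.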
What is pacman?
pacman is an R package management tool. You can use p_library() to list all the packages available in your local library.
The pacman documentation describes more useful functions: https://www.rdocumentation.org/packages/pacman/versions/0.5.1
Runtime variables
Runtime variables can be accessed through the global_vars vector, for example global_vars['execution_date'].
Example code:
load_data <- function() {
    df <- read.csv(file='titanic_clean.csv')
    df['date'] <- global_vars['execution_date']
    df
}
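Outside a pipeline run, global_vars does not exist, so the example above cannot be tried directly in a plain R session. The sketch below defines a stand-in global_vars as a named character vector (an assumption about its shape, with made-up values) and uses a small inline data frame instead of titanic_clean.csv. It also illustrates a general R detail: single brackets keep the element's name, while double brackets return the bare value.

```r
# Stand-in for the object provided at run time (assumed shape, made-up values).
global_vars <- c(execution_date = "2024-01-01", env = "dev")

load_data <- function() {
    # Inline sample data instead of reading titanic_clean.csv.
    df <- data.frame(Name = c("A", "B"), Pclass = c(1, 2))
    # [[ ]] extracts the bare value, which is recycled to every row.
    df$date <- global_vars[["execution_date"]]
    df
}

df <- load_data()
print(df)

# Single brackets return a named element; double brackets drop the name.
print(names(global_vars["execution_date"]))
print(is.null(names(global_vars[["execution_date"]])))
```

Either bracket form fills the new column with the same value; the double-bracket form simply avoids carrying the name along.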