install.packages("openxlsx2")
2 Basics
Welcome to the basic manual to openxlsx2
. In this manual you will learn how to use openxlsx2
to import data from xlsx-files to R as well as how to export data from R to xlsx, and how to import and modify these openxml workbooks in R. This package is based on the work of many contributors to openxlsx
. It was mostly rewritten using pugixml
and R6
making use of modern technology, providing a fresh and easy to use R package.
Over the years many people have worked on the tricky task to handle xls and xlsx files. Notably openxlsx
, but there are countless other R-packages as well as third party libraries or calculation software capable of handling such files. Please feel free to use and test your files with other software and or let us know about your experience. Open an issue on github or write us a mail.
2.1 First steps
First let’s assume that you have a working installation of openxlsx2
otherwise run the lines below to install the latest CRAN release:
Now we load the library:
2.2 Handling workbooks
The foundation of openxlsx2
is a workbook object. You can think of this object as a workbook loaded in a spreadsheet software. We import the entire thing. Every sheet, every chart, image, column, formula style, conditional formatting, pivot table and whatever else a spreadsheet file is allowed to carry. Therefore if you have a file that you want to work with, you can load it with:
wb <- wb_load("your_file.xlsx")
We usually name workbook objects wb
in our documentation, but this is no obligation, you can name your workbook object whatever you like to call them.
If you do not have a workbook yet, it is possible to create one. In the next line we will use three wrapper functions wb_workbook()
, wb_add_worksheet()
, and wb_add_data()
. The wrapper functions are piped together using the classic magrittr
pipe operator %>%
, but similarly you can use the R
’s native pipe operator |>
. 1 We assume that you have a dataset your_data
, either a vector, a matrix or a data frame and want to write this in a worksheet:
wb <- wb_workbook() %>% wb_add_worksheet() %>% wb_add_data(x = your_data)
Okay, now you have a workbook object, but what have we actually done? Let’s work along the pipe syntax: (1) first we have created the workbook object wb_worbkook()
, (2) next we have assigned it a worksheet wb_add_worksheet()
, and (3) we have written data onto the worksheet.
Let’s try this with actual data. We use the mtcars
dataset. In the code we switch the fictional your_data
with mtcars
:
wb <- wb_workbook() %>% wb_add_worksheet() %>% wb_add_data(x = mtcars)
Let’s see what the output looks like:
wb
#> A Workbook object.
#>
#> Worksheets:
#> Sheets: Sheet 1
#> Write order: 1
The output looks a little cryptic, it simply tells the name of the worksheet: wb_add_worksheet()
created a default worksheet name "Sheet 1"
. In the code above you can see that we do not use sheet
to tell wb_add_data()
where it should write the data. This is because internally we use a waiver current_sheet()
so that we do not have to write sheet = "Sheet 1"
whenever we work on the same worksheet. Basically the current sheet is updated whenever a new worksheet is added to the workbook.
wb <- wb_workbook() %>% wb_add_worksheet() %>% wb_add_worksheet() %>% wb_add_data(x = mtcars)
This will create two sheets "Sheet 1"
and "Sheet 2"
and the data will be written to the second sheet.
wb
#> A Workbook object.
#>
#> Worksheets:
#> Sheets: Sheet 1, Sheet 2
#> Write order: 1, 2
So how can we access the data on the sheet? Either with wb_to_df()
our internal handler to read from workbooks (this is the underlying function for wb_read()
and read_xlsx()
which are mere aliases for wb_to_df()
). So lets have a look at the top of the output:
Ah! The output is on the second sheet. We need either sheet = 2
or sheet = "Sheet 2"
. We go with the second variant, because the sheet index position and their name might differ.
wb %>% wb_to_df(sheet = "Sheet 2") %>% head()
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 2 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#> 3 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#> 4 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#> 5 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#> 6 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
#> 7 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
This looks like the head of the mtcars
dataset. So we have successfully read from the workbook. Now you want to export the workbook to a file:
Alternatively you can directly open it in a spreadsheet software (if you have one installed):
Once again, lets try this with the USPersonalExpenditure
dataset:
wb <- wb_workbook()
wb_add_worksheet(wb, sheet = "USexp")
wb_add_data(wb, "USexp", USPersonalExpenditure)
#> Error : Can't add data to a workbook with no worksheet.
#> Did you forget to add a worksheet with `wb_add_worksheet()`?
Dang! What did we do? We’ve added a worksheet, but wait, did we? No, you have to assign wrapper functions to an object for them to have an effect. Wrapper functions do not alter the workbook objects they are executed on. You can check that the workbook has no worksheets:
wb %>% wb_get_sheet_names()
#> named character(0)
Once we assign a sheet, this changes, and the data was correctly written:
wb <- wb_workbook()
wb <- wb_add_worksheet(wb, sheet = "USexp")
wb <- wb_add_data(wb, "USexp", USPersonalExpenditure)
wb_get_sheet_names(wb)
#> USexp
#> "USexp"
wb_to_df(wb)
#> 1940 1945 1950 1955 1960
#> 2 22.200 44.500 59.60 73.2 86.80
#> 3 10.500 15.500 29.00 36.5 46.20
#> 4 3.530 5.760 9.71 14.0 21.10
#> 5 1.040 1.980 2.45 3.4 5.40
#> 6 0.341 0.974 1.80 2.6 3.64
Now you’re probably thinking, I don’t want to assign the workbook object all the time and all the wb_
functions are a little tedious to type. There is an alternative for you and it is called chaining. Since the workbook is a R6
object internally, you can make use of chains. Basically every function that starts with wb_
should have a underlying function of the same name without the prefix. So our data writing example from above can be written as:
wb <- wb_workbook()$add_worksheet("USexp")$add_data(x = USPersonalExpenditure)
wb$to_df()
#> 1940 1945 1950 1955 1960
#> 2 22.200 44.500 59.60 73.2 86.80
#> 3 10.500 15.500 29.00 36.5 46.20
#> 4 3.530 5.760 9.71 14.0 21.10
#> 5 1.040 1.980 2.45 3.4 5.40
#> 6 0.341 0.974 1.80 2.6 3.64
Whether you use wrapper functions or chain functions is up to you and personal preference. There is just one thing to remember, the documentation is exclusively written for the wrapper function. So if you want to know the arguments for the wb$add_data()
part, you have to lookup the wrapper functions man page ?wb_add_data
.
2.3 Importing as workbook
In addition to importing directly from xlsx or xlsm files, openxlsx2
provides the wbWorkbook
class used for importing and modifying entire the openxml files in R
. This workbook
class is the heart of openxlsx2
and probably the reason why you are reading this manual in the first place.
Importing a file into a workbook looks like this:
# the file we are going to load
file <- system.file("extdata", "openxlsx2_example.xlsx", package = "openxlsx2")
# loading the file into the workbook
wb <- wb_load(file = file)
The additional options wb_load()
provides are for internal use: sheet
loads only a selected sheet from the workbook and data_only
reads only the data parts from a workbook and ignores any additional graphics or pivot tables. Both functions create workbook objects that can only be used to read data, and we do not recommend end users to use them. Especially not if they intend to re-export the workbook afterwards.
Once a workbook is imported, we provide several functions to interact with and modify it (the wb_to_df()
function mentioned above works the same way for an imported workbook). It is possible to add new sheets and remove sheets, as well as to add or remove data. R-plots can be inserted and also the style of the workbook can be changed, new fonts, background colors and number formats. There is a wealth of options explained in the man pages and the additional style vignette (more vignettes to follow).
3 Exporting data
3.1 Exporting data frames or vectors
If you want to export a data frame from R, you can use write_xlsx()
which will create an xlsx file. This file can be tweaked further. See ?write_xlsx
to see all the options. (further explanation and examples will follow).
write_xlsx(x = mtcars, file = "mtcars.xlsx")
3.2 Exporting a wbWorkbook
Imported workbooks can be saved as xlsx or xlsm files with the wrapper wb_save()
or with wb$save()
. Both functions take the filename and an optional overwrite
option. If the latter is set, an optional guard is provided to check if the file you want to write already exists. But be careful, this is optional. The default is to save the file and replace an existing file. Of course, on Windows, files that are locked (for example, if they were opened by another process) will not be replaced.
# replace the existing file
wb$save("mtcars.xlsx")
# do not overwrite the existing file
try(wb$save("mtcars.xlsx", overwrite = FALSE))
3.3 dims
/ wb_dims()
In openxlsx2
functions that interact with worksheet cells are using dims
as argument and require the users to provide these. dims
are cells or cell ranges in A1 notation. The single argument dims
hereby replaces col
/row
, cols
/rows
and xy
. Since A1 notation is rather simple in the first few columns it might get confusing after the 26. Therefore we provide a wrapper to construct it:
# various options
wb_dims(from_row = 4)
#> [1] "A4"
wb_dims(rows = 4, cols = 4)
#> [1] "D4"
wb_dims(rows = 4, cols = "D")
#> [1] "D4"
wb_dims(rows = 4:10, cols = 5:9)
#> [1] "E4:I10"
wb_dims(rows = 4:10, cols = "A:D") # same as below
#> [1] "A4:D10"
wb_dims(rows = seq_len(7), cols = seq_len(4), from_row = 4)
#> [1] "A4:D10"
# 10 rows and 15 columns from indice B2.
wb_dims(rows = 1:10, cols = 1:15, from_col = "B", from_row = 2)
#> [1] "B2:P11"
# data + col names
wb_dims(x = mtcars)
#> [1] "A1:K33"
# only data
wb_dims(x = mtcars, select = "data")
#> [1] "A2:K33"
# The dims of the values of a column in `x`
wb_dims(x = mtcars, cols = "cyl")
#> [1] "B2:B33"
# a column in `x` with the column name
wb_dims(x = mtcars, cols = "cyl", select = "x")
#> [1] "B1:B33"
# rows in `x`
wb_dims(x = mtcars)
#> [1] "A1:K33"
# in a wb chain
wb <- wb_workbook()$
add_worksheet()$
add_data(x = mtcars)$
add_fill(
dims = wb_dims(x = mtcars, rows = 1:5), # only 1st 5 rows of x data
color = wb_color("yellow")
)$
add_fill(
dims = wb_dims(x = mtcars, select = "col_names"), # only column names
color = wb_color("cyan2")
)
# or if the data's first coord needs to be located in B2.
wb_dims_custom <- function(...) {
wb_dims(x = mtcars, from_col = "B", from_row = 2, ...)
}
wb <- wb_workbook()$
add_worksheet()$
add_data(x = mtcars, dims = wb_dims_custom())$
add_fill(
dims = wb_dims_custom(rows = 1:5),
color = wb_color("yellow")
)$
add_fill(
dims = wb_dims_custom(select = "col_names"),
color = wb_color("cyan2")
)
-
Basically a pipe operator allows to write code from left to right. Without pipes the code would look like this:
wb <- wb_add_data(wb_add_worksheet(wb_workbook()), x = your_data)