1 Introduction

Unfortunately the entire business world is still built almost entirely on Microsoft Office tools and whenever data is involved, this means that it is largely built on the spreadsheet software Excel. R users that want to interact with this previously closed-source file format had to rely on various packages (the following is not necessarily a complete list of all packages). Packages that create workbook objects like xlsx (Dragulescu and Arendt 2023) and openxlsx (Schauberger and Walker 2023) and packages for special tasks namely readxl (Wickham and Bryan 2023), readxlsb (Allen 2023)¹, tidyxl (Garmonsway 2022), writexl (Ooms 2023) and WriteXLS (Schwartz 2022), some are Windows exclusive interacting with Excel via a DCOM server RDCOMClient and RExcel ², some are not, XLconnect. ³

In Excel 2007 a new open standard called OOXML(short for Office Open XML)⁴ which we will refer to as openxml was introduced. In December 2006 this standard was accepted by the ECMA and it subsequently replaced the previously used xls files wherever people are working with spreadsheet software (after all we are all aware that accounting does not really care whatever file format they are using as long as it opens up in their favorite spreadsheet software). The openxml standard introduced the so called Excel 2007 workbook format xlsx. These files are a collection of zipped XML-files. This makes it easy to import the files to R, because all you need is a tool to unzip the files and an XML-parser to import the files as data frames. Still, since there are various tasks available to interact with spreadsheet files, there are also various tools required. If all you want to do is read from files readxl is probably enough, if all you want to do is write xlsx files writexl is probably the fastest choice available. Yet there are a plethora of other tasks available and this book is about them.

The predecessor to openxlsx2 (Barbone and Garbuszus 2024) called openxlsx (originally founded by Andrew Walker) was inspired by the rJava based xlsx package, but dropped the rJava dependency, and the support for the old xls files and wrote a custom XML parser in Rcpp (Eddelbuettel and François 2011). Later Phillip Schauberger picked up the abandoned openxlsx package and continues to maintain it. Finally openxlsx2 was forked from openxlsx to include (1) the pugixml library (Kapoulkine 2006-2023) to address shortcomings of the openxlsx XML parser and (2) to switch from methods to the R6 package (Chang 2021) to introduce modern programming flows. Since then openxlsx2 has evolved a lot, includes many new features and is in a semi-stable API state since release 1.0.⁵ This manual is supposed to bundle and extend the existing vignettes and to document the changes.

1.1 Installation

You can install the stable version of openxlsx2 with:

install.packages('openxlsx2')

You can install the development version of openxlsx2 from GitHub with:

# install.packages("remotes")
remotes::install_github("JanMarvin/openxlsx2")

Or from r-universe with:

# Enable repository from janmarvin
options(repos = c(
  janmarvin = 'https://janmarvin.r-universe.dev',
  CRAN = 'https://cloud.r-project.org'))
# Download and install openxlsx2 in R
install.packages('openxlsx2')

1.2 Working with the package

We offer two distinct workflows for interacting with Excel files:

The first one is to simply work with R objects. It is possible to read (read_xlsx()) and write (write_xlsx()) data from and to files. We offer a number of options in the commands to support various features of the openxml format, including reading and writing named ranges and tables. Furthermore, there are several ways to read certain information of an openxml spreadsheet without having opened it in a spreadsheet software before, e.g. to get the contained sheet names or tables.
As a second variant openxlsx2 offers the work with so called wbWorkbook objects. Here an openxml file is read into a corresponding wbWorkbook object (wb_load()) or a new one is created (wb_workbook()). Afterwards the object can be further modified using various functions. For example, worksheets can be added or removed, the layout of cells or entire worksheets can be changed, and cells can be modified (overwritten or rewritten). Afterwards the wbWorkbook objects can be written as openxml files and processed by suitable spreadsheet software.

1.3 Example

This is a basic example which shows you how to solve a common problem:

library(openxlsx2)
# read xlsx or xlsm files
path <- system.file("extdata/openxlsx2_example.xlsx", package = "openxlsx2")
read_xlsx(path)

    Var1 Var2 <NA>  Var3  Var4       Var5         Var6    Var7     Var8
3   TRUE    1   NA     1     a 2023-05-29 3209324 This #DIV/0! 01:27:15
4   TRUE   NA   NA #NUM!     b 2023-05-23         <NA>       0 14:02:57
5   TRUE    2   NA  1.34     c 2023-02-01         <NA> #VALUE! 23:01:02
6  FALSE    2   NA  <NA> #NUM!       <NA>         <NA>       2 17:24:53
7  FALSE    3   NA  1.56     e       <NA>         <NA>    <NA>     <NA>
8  FALSE    1   NA   1.7     f 2023-03-02         <NA>     2.7 08:45:58
9     NA   NA   NA  <NA>  <NA>       <NA>         <NA>    <NA>     <NA>
10 FALSE    2   NA    23     h 2023-12-24         <NA>      25     <NA>
11 FALSE    3   NA  67.3     i 2023-12-25         <NA>       3     <NA>
12    NA    1   NA   123  <NA> 2023-07-31         <NA>     122     <NA>

# or import workbooks
wb <- wb_load(path)
wb

A Workbook object.
 
Worksheets:
 Sheets: Sheet1, Sheet2 
 Write order: 1, 2

# read a data frame
wb_to_df(wb)

    Var1 Var2 <NA>  Var3  Var4       Var5         Var6    Var7     Var8
3   TRUE    1   NA     1     a 2023-05-29 3209324 This #DIV/0! 01:27:15
4   TRUE   NA   NA #NUM!     b 2023-05-23         <NA>       0 14:02:57
5   TRUE    2   NA  1.34     c 2023-02-01         <NA> #VALUE! 23:01:02
6  FALSE    2   NA  <NA> #NUM!       <NA>         <NA>       2 17:24:53
7  FALSE    3   NA  1.56     e       <NA>         <NA>    <NA>     <NA>
8  FALSE    1   NA   1.7     f 2023-03-02         <NA>     2.7 08:45:58
9     NA   NA   NA  <NA>  <NA>       <NA>         <NA>    <NA>     <NA>
10 FALSE    2   NA    23     h 2023-12-24         <NA>      25     <NA>
11 FALSE    3   NA  67.3     i 2023-12-25         <NA>       3     <NA>
12    NA    1   NA   123  <NA> 2023-07-31         <NA>     122     <NA>

# and save
temp <- temp_xlsx()
if (interactive()) wb_save(wb, temp)

## or create one yourself
wb <- wb_workbook()
# add a worksheet
wb$add_worksheet("sheet")
# add some data
wb$add_data("sheet", cars)
# open it in your default spreadsheet software
if (interactive()) wb$open()

1.4 Authors and contributions

For a full list of all authors that have made this package possible and for whom we are grateful, please see:

system.file("AUTHORS", package = "openxlsx2")

If you feel like you should be included on this list, please let us know. If you have something to contribute, you are welcome. If something is not working as expected, open issues or if you have solved an issue, open a pull request. Please be respectful and be aware that we are volunteers doing this for fun in our unpaid free time. We will work on problems when we have time or need.

1.5 License

The openxlsx2 package is licensed under the MIT license and is based on openxlsx (by Alexander Walker and Philipp Schauberger; COPYRIGHT 2014-2022) and pugixml (by Arseny Kapoulkine; COPYRIGHT 2006-2023). Both released under the MIT license.

1.6 A note on speed and memory usage

The current state of openxlsx2 is that it is reasonably fast. That is, it works well with reasonably large input data when reading or writing. It may not work well with data that tests the limits of the openxml specification. Things may slow down on the R side of things, and performance and usability will depend on the speed and size of the local operating system’s CPU and memory.

Note that there are at least two cases where openxlsx2 constructs potentially large data frames (i) when loading, openxlsx2 usually needs to read the entire input file into pugixml and convert it into long data frame(s), and wb_to_df() converts one long data frame into two data frames that construct the output object and (ii) when adding data to the workbook, openxlsx2 reshapes the input data frame into a long data frame and stores it in the workbook, and writes the entire worksheet into a pugixml file that is written when it is complete. Applying cell styles, date conversions etc. will further slow down the process and finally the sheets will be zipped to provide the xlsx output.

Therefore, if you are faced with an unreasonably large dataset, either give yourself enough time, use another package to write the xlsx output (openxlsx2 was not written with the intention of working with maximum memory efficiency), and by all means use other ways to store data (binary file formats or a database). However, we are always happy to improve, so if you have found a way to improve what we are currently doing, please let us know and open an issue or a pull request.

1.7 Installation Requirements

We strive to limit our dependencies to the bare minimum required to run the package. Currently, these are R6, Rcpp, and stringi. All three are dependency-free themselves (apart from depending on R) and should be available on any recent R setup that includes devtools. (In fact, many popular packages with extensive dependency trees—such as the tidyverse, roxygen2, quarto, or shiny—already include at least two of these).

In addition, the zip package is required to create xlsx output files if a system tool like zip or bsdtar (or tar on Windows) is unavailable. To build openxlsx2 from source, a C++11 compatible compiler must be available. Other dependencies are optional and limited to specific features, or to our test suite, involving ggplot2, mschart, openssl, or rvg.

db <- tools::CRAN_package_db()
deps <- tools::package_dependencies(package="openxlsx2", recursive=TRUE, db = db)$openxlsx2
# ignore packages shipped by R
setdiff(deps, c("grDevices", "utils", "methods", "tools", "stats"))

[1] "R6"      "Rcpp"    "stringi"

1.8 Invitation to contribute

We have put a lot of work into openxlsx2 to make it useful for our needs, improving what we found useful about openxlsx and removing what we didn’t need. We do not claim to be omniscient about all the things you can do with spreadsheet software, nor do we claim to be omniscient about all the things you can do in openxlsx2. Nevertheless, we are quite fond of our little package and invite others to try it out and comment on what they like and of course what they think we are missing or if something doesn’t work. openxlsx2 is a complex piece of software that certainly does not work bug-free, even if we did our best. If you want to contribute to the development of openxlsx2, please be our guest on our Github. Join or open a discussion, post or fix issues or write us a mail.

Since the original creation of this section readxlsb has been archived on CRAN. A release that includes a bug fix for overflowing values can be found at https://github.com/JanMarvin/readxlsb (but this is not actively developed or maintained).↩︎
See https://github.com/omegahat/RDCOMClient.↩︎
And there are many other packages on CRAN for working with xls/xlsx spreadsheet files. Without a guarantee for completeness: SheetReader (Henze 2024), tablexlsx (Dotta and Blasco 2024), xlsx2dfs (Kim 2019), tablaxlsx (Rodríguez 2023), xlr (Hilderson 2025), xlcutter (Gruson 2023), knitxl (Dreano 2023), xlcharts (Luginbuhl 2024), joinXL (Glanville 2016).↩︎
See https://wikipedia.org/wiki/Office_Open_XML.↩︎
With ‘semi-stable’ we promise not to break the API unless we come across a bug that forces us to. All breaking changes are mentioned in the changelog.↩︎