# Create artificial xlsx file
wb <- wb_workbook()$add_worksheet()$add_data(x = t(c(1, 1)), col_names = FALSE)$
add_formula(dims = "C1", x = "A1 + B1")
# Users should never modify cc as shown here
wb$worksheets[[1]]$sheet_data$cc$v[3] <- 2
# we expect a value of 2
wb_to_df(wb, col_names = FALSE)
#> A B C
#> 1 1 1 2
8 Spreadsheet formulas
Below you find various examples how to create formulas with openxlsx2
. Though, before we start with the examples, let us begin with a word of warning. Please be aware, while it is possible to create all these formulas, they are not evaluated unless they are opened in spreadsheet software. Even worse, if there are cells containing the result of some formula, it can not be trusted unless the formula is evaluated in spreadsheet software.
This can be shown in a simple example: We have a spreadsheet with a formula A1 + B1
. This formula was evaluated with spreadsheet software as A1 + B1 = 2
. Therefore if we read the cell, we see the value 2. Lets recreate this output in openxlsx2
Now, lets assume we modify the data in cell A1
.
wb$add_data(x = 2)
# we expect 3
wb_to_df(wb, col_names = FALSE)
#> A B C
#> 1 2 1 2
What happened? Even though we see cells A1
and B1
show a value of 2
and 1
our formula in C1
was not updated. It still shows a value of 2
. This is because openxlsx2
does not evaluate formulas and workbooks on a more general scale. In the open xml style the cell looks something like this:
<c r="C1">
<f>A1 + B1</f>
<v>2</v>
</c>
And when we read from this cell, we always return the value of v
. In this case it is obvious, but still wrong and it is a good idea to check if underlying fields contain formulas.
wb_to_df(wb, col_names = FALSE, show_formula = TRUE)
#> A B C
#> 1 2 1 A1 + B1
If openxlsx2
writes formulas, as shown in the examples below, the fields will be entirely blank. These fields will only be evaluated and filled, once the output file is opened in spreadsheet software.
The only way to avoid surprises is to be aware of this all the time and similar, checking for similar things all the time.
8.1 Simple formulas
Generally speaking it is possible to use all valid formulas allowed in spreadsheet software. This can be functions, arithmetic operators or a mix of both. It’s possible to create functions for all by spreadsheet software supported functions, including custom vml ones (though this requires a workbook that was loaded with the required macros).
wb <- wb_workbook()$add_worksheet()$
add_data(x = head(cars))$
add_formula(x = "SUM(A2, B2)", dims = "D2")$
add_formula(x = "A2 + B2", dims = "D3")
8.2 Array formulas
Array formulas in openxml spreadsheets allow performing multiple calculations on a data vector or ‘array’ instead of a single cell. An array is similar to a vector in R. Unlike regular formulas that operate on a single value, array formulas can process multiple values simultaneously. An important distinction is that that you need array formulas, whenever the formula evaluates an array, even if the output creates only a single cell. So something like this SUM(ABS(A2:A11))
would require an array formula, because the SUM()
function is called on a function that returns an array ABS(A2:A11)
. If the previous formula in written as basic formula, spreadsheet software is likely to mess it up and tries to insert @
characters in the formula.
wb <- wb_workbook()$add_worksheet()$
add_data(x = head(cars))$
add_formula(x = "A2:A7 * B2:B7", dims = "C2:C7", array = TRUE)
8.3 Array formulas creating multiple fields
In the example below we want to use MMULT()
which creates a matrix multiplication. This requires us to write an array formula and to specify the region where the output will be written to.
Similar a the coefficients of a linear regression
8.4 Modern spreadsheet functions
Spreadsheet functions are constantly evolving and similarly extended. Several formulas introduced in the MS365 Excel versions require _xlfn.
as prefix for the function name. Such formulas will only be evaluated with compatible spreadsheet software. In case of doubt, see Excel functions (alphabetical) for a list of all functions and an indicator for the software version in which they were introduced.
8.6 Cell error handling
Spreadsheet users will be familiar with various errors thrown once formulas are used. These are not always useful in spreadsheet software and can be removed using wb_add_ignore_error()
. This function allows to fine tune the errors that are returned per cell.
8.7 cells metadata (cm) formulas
Similar to array formulas, these cell metadata (cm) formulas hide to the user that they are array formulas. Using these is implemented in openxlsx2
> 0.6.1:
wb <- wb_workbook()$add_worksheet()$
add_data(x = head(cars))$
add_formula(x = 'SUM(ABS(A2:A7))', dims = "D2", cm = TRUE)
#> Warning in write_data2(wb = wb, sheet = sheet, data = x, name = name, colNames
#> = colNames, : modifications with cm formulas are experimental. use at own risk
# wb$open()
8.8 dataTable
formulas1
A | B | C | |
---|---|---|---|
1 | sales_price | COGS | sales_quantity |
2 | 20 | 5 | 1 |
3 | 30 | 11 | 2 |
4 | 40 | 13 | 3 |
Given a basic table like the above, a similarly basic formula for total_sales
would be “= A2 * C2” with the row value changing at each row.
An implementation for this formula using wb_add_formula()
would look this (taken from current documentation) lets say we’ve read in the data and assigned it to the table company_sales
## creating example data
company_sales <- data.frame(
sales_price = c(20, 30, 40),
COGS = c(5, 11, 13),
sales_quantity = c(1, 2, 3)
)
## write in the formula
company_sales$total_sales <- paste(paste0("A", 1:3 + 1L), paste0("C", 1:3 + 1L), sep = " * ")
## add the formula class
class(company_sales$total_sales) <- c(class(company_sales$total_sales), "formula")
## write a workbook
wb <- wb_workbook()$
add_worksheet("Total Sales")$
add_data_table(x = company_sales)
Then we create the workbook, worksheet, and use wb_add_data_table()
.
One of the advantages of the open xml dataTable
syntax is that we don’t have to specify row numbers or columns as letters. The table also grows dynamically, adding new rows as new data is appended and extending formulas to the new rows. These dataTable
have named columns that we can use instead of letters. When writing the formulas within the dataTable
we would use the following syntax [@[column_name]]
to reference the current row. So the total_sales
formula written in open xml in dataTable
would look like this; =[@[sales_price]] * [@[sales_quantity]]
If we are writing the formula outside of the dataTable
we have to reference the table name. In this case lets say the table name is ‘daily_sales’ =daily_sales[@[sales_price]] * daily_sales[@[sales_quantity]]
However, if we were to pass this as the text for the formula to be written it would cause an error because the syntax that open xml requires for selecting the current row is different.
In open xml the dataTable
formula looks like this:
<calculatedColumnFormula>
daily_sales[[#This Row],[sales_price]]*daily_sales[[#ThisRow],[sales_quantity]]
</calculatedColumnFormula>
Now we can see that open xml replaces [@[sales_price]]
with daily_sales[[#This Row],[sales_price]]
We must then use this syntax when writing formulas for dataTable
## Because we want the `dataTable` formula to propagate down the entire column of the data
## we can assign the formula by itself to any column and allow that single string to be repeated for each row.
## creating example data
example_data <-
data.frame(
sales_price = c(20, 30, 40),
COGS = c(5, 11, 13),
sales_quantity = c(1, 2, 3)
)
## base R method
example_data$gross_profit <- "daily_sales[[#This Row],[sales_price]] - daily_sales[[#This Row],[COGS]]"
example_data$total_COGS <- "daily_sales[[#This Row],[COGS]] * daily_sales[[#This Row],[sales_quantity]]"
example_data$total_sales <- "daily_sales[[#This Row],[sales_price]] * daily_sales[[#This Row],[sales_quantity]]"
example_data$total_gross_profit <- "daily_sales[[#This Row],[total_sales]] - daily_sales[[#This Row],[total_COGS]]"
class(example_data$gross_profit) <- c(class(example_data$gross_profit), "formula")
class(example_data$total_COGS) <- c(class(example_data$total_COGS), "formula")
class(example_data$total_sales) <- c(class(example_data$total_sales), "formula")
class(example_data$total_gross_profit) <- c(class(example_data$total_gross_profit), "formula")
wb$
add_worksheet("Daily Sales")$
add_data_table(
x = example_data,
table_style = "TableStyleMedium2",
table_name = "daily_sales"
)
And if we open the workbook to view the table we created we can see that the formula has worked.
A | B | C | D | E | F | G | |
---|---|---|---|---|---|---|---|
1 | sales_price | COGS | sales_quantity | gross_profit | total_COGS | total_sales | total_gross_profit |
2 | 20 | 5 | 1 | 15 | 5 | 20 | 15 |
3 | 30 | 11 | 2 | 19 | 22 | 60 | 38 |
4 | 40 | 13 | 3 | 27 | 39 | 120 | 81 |
We can also see that it has replaced [#This Row]
with @
.
A | B | C | D | E | F | G | |
---|---|---|---|---|---|---|---|
1 | sales_price | COGS | sales_quantity | gross_profit | total_COGS | total_sales | total_gross_profit |
2 | 20 | 5 | 1 | =[@sales_price] - [@COGS] | =[@COGS] * [@sales_quantity] | =[@sales_price] * [@sales_quantity] | =[@[total_sales]] - [@[total_COGS]] |
3 | 30 | 11 | 2 | =[@sales_price] - [@COGS] | =[@COGS] * [@sales_quantity] | =[@sales_price] * [@sales_quantity] | =[@[total_sales]] - [@[total_COGS]] |
4 | 40 | 13 | 3 | =[@sales_price] - [@COGS] | =[@COGS] * [@sales_quantity] | =[@sales_price] * [@sales_quantity] | =[@[total_sales]] - [@[total_COGS]] |
For completion, the formula as we wrote it appears as;
D | E | F | G |
---|---|---|---|
gross_profit | total_COGS | total_sales | total_gross_profit |
=gross_profit[[#This Row],[sales_price]] - gross_profit[[#This Row],[COGS]] | =gross_profit[[#This Row],[COGS]] * gross_profit[[#This Row],[sales_quantity]] | =gross_profit[[#This Row],[sales_price]] * gross_profit[[#This Row],[sales_quantity]] | =gross_profit[[#This Row],[total_sales]] - gross_profit[[#This Row],[total_COGS]] |
=gross_profit[[#This Row],[sales_price]] - gross_profit[[#This Row],[COGS]] | =gross_profit[[#This Row],[COGS]] * gross_profit[[#This Row],[sales_quantity]] | =gross_profit[[#This Row],[sales_price]] * gross_profit[[#This Row],[sales_quantity]] | =gross_profit[[#This Row],[total_sales]] - gross_profit[[#This Row],[total_COGS]] |
=gross_profit[[#This Row],[sales_price]] - gross_profit[[#This Row],[COGS]] | =gross_profit[[#This Row],[COGS]] * gross_profit[[#This Row],[sales_quantity]] | =gross_profit[[#This Row],[sales_price]] * gross_profit[[#This Row],[sales_quantity]] | =gross_profit[[#This Row],[total_sales]] - gross_profit[[#This Row],[total_COGS]] |
#### sum dataTable examples
wb$add_worksheet("sum_examples")
### Note: dataTable formula do not need to be used inside of dataTables. dataTable formula are for referencing the data within the dataTable.
sum_examples <- data.frame(
description = c("sum_sales_price", "sum_product_Price_Quantity"),
formula = c( "", "")
)
wb$add_data(x = sum_examples)
# add formulas
wb$add_formula(x = "sum(daily_sales[[#Data],[sales_price]])", dims = "B2")
wb$add_formula(x = "sum(daily_sales[[#Data],[sales_price]] * daily_sales[[#Data],[sales_quantity]])", dims = "B3", array = TRUE)
#### dataTable referencing
wb$add_worksheet("dt_references")
### Adding the headers by themselves.
wb$add_formula(
x = "daily_sales[[#Headers],[sales_price]:[total_gross_profit]]",
dims = "A1:G1",
array = TRUE
)
### Adding the raw data by reference and selecting them directly.
wb$add_formula(
x = "daily_sales[[#Data],[sales_price]:[total_gross_profit]]",
start_row = 2,
dims = "A2:G4",
array = TRUE
)
# wb$open()
this example was originally provided by @zykezero for
openxlsx
.↩︎