readspss – importing SPSS files to R
Welcome to the readspss package. The package was written from scratch,
with additional code for special cases. It started as a side project for
me to learn Rcpp and grew over the years. Today readspss is well tested
using every sav, por and zsav file I could lay my hands on. Import is
considered feature complete; write support is available, though not yet
for every feature.
Installation is provided using r-universe or remotes.
With r-universe:
options(repos = c(
  janmarvin = 'https://janmarvin.r-universe.dev',
  CRAN = 'https://cloud.r-project.org'))
install.packages('readspss')
With remotes:
remotes::install_github("JanMarvin/readspss")
For the import of (z)sav and por files, read.sav() and read.por() are
available.
library(readspss)
# example using sav
fl_sav <- system.file("extdata", "electric.sav", package = "readspss")
ds <- read.sav(fl_sav)
# example using zsav
fl_zsav <- system.file("extdata", "cars.zsav", package = "readspss")
dz <- read.sav(fl_zsav)
# example using por
fl_por <- system.file("extdata", "electric.por", package = "readspss")
dp <- read.por(fl_por)
Both functions return data.frame objects, containing numerics, dates, factors or characters.
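To check which of these types a given import produced, the first class of every column can be listed with base R. The following is a minimal sketch on a hand-built stand-in data frame; the column names are invented for illustration, and an imported object such as ds would be passed the same way.

```r
# Stand-in for an imported data.frame; the columns are made up for illustration.
standin <- data.frame(
  amount = c(1.5, 2.5),
  day    = as.Date(c("2020-01-01", "2020-01-02")),
  group  = factor(c("a", "b")),
  note   = c("x", "y"),
  stringsAsFactors = FALSE
)

# First class of each column, returned as a named character vector.
column_classes <- vapply(standin, function(x) class(x)[1], character(1))
print(column_classes)
```

Which classes appear for a real file depends on its contents and on the import options used.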
For user-specific needs the package supports many of the options
available in foreign, such as convert.factors and use.missings. If one is
familiar with said package, one should have no problem adapting to
readspss. If, for example, one does not want factors, since they work
differently in R than in SPSS, this can be achieved using the following
code.
# example using sav
fl_sav <- system.file("extdata", "electric.sav", package = "readspss")
ds <- read.sav(fl_sav, convert.factors = FALSE)
# example using zsav
fl_zsav <- system.file("extdata", "cars.zsav", package = "readspss")
dz <- read.sav(fl_zsav, convert.factors = FALSE)
# example using por
fl_por <- system.file("extdata", "electric.por", package = "readspss")
dp <- read.por(fl_por, convert.factors = FALSE)
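One reason to skip factor conversion is that an R factor renumbers its underlying integer codes, while SPSS value labels are attached to arbitrary numeric codes. The sketch below uses a made-up survey item to show this in base R. It illustrates general R behaviour only and does not show how readspss builds factors internally.

```r
# Hypothetical item with SPSS-style codes: 0 = no, 1 = yes, 9 = refused.
codes <- c(0, 1, 1, 0, 9)

# As an R factor the labels are kept, but the integers are renumbered 1..k.
as_factor <- factor(codes, levels = c(0, 1, 9),
                    labels = c("no", "yes", "refused"))
factor_codes <- as.integer(as_factor)

print(as_factor)
print(factor_codes)  # 1 2 2 1 3: the original codes 0/1/9 are not kept
```

Keeping the raw codes avoids this renumbering when the numeric values themselves matter for the analysis.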
Since many features are self-explanatory, not all will be explained.
Of course readspss can handle sav files in different encodings; it
handles file sets without data; all types of missings SPSS developers
came up with over the years; short, long and longer strings; little and
big endian files; sav, por and zsav compressed files; files without
valid header information; old and new SPSS files. As stated above, it
handles every SPSS file I came across, and during development I came
across many.
Using code by Ben Pfaff, readspss can handle encrypted SPSS files.
flu <- system.file("extdata", "hotel.sav", package="readspss")
fle <- system.file("extdata", "hotel-encrypted.sav", package="readspss")
df_u <- read.sav(flu)
df_e <- read.sav(fle, pass = "pspp")
R data.frame objects can be exported using write.sav() and write.por().
library(readspss)
write.sav(cars, filepath = "cars.sav") # optional compress = TRUE
write.sav(cars, filepath = "cars.zsav") # optional compress = TRUE
#> Zsav compression is still experimental. Testing is welcome!
write.por(cars, filepath = "cars.por")
Export provides a few options: adding a label, compression of sav and
zsav files, and conversion of dates. Currently it is not possible to
export strings longer than 255 characters. All exported files can be
imported using SPSS and readspss (PSPP is expected to work).
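Because of the 255-character limit, it can help to check a data frame for overly long strings before exporting it. The helper below is a hypothetical base R sketch and not part of readspss. It assumes, conservatively, that the limit is counted in bytes, and it also looks at factor levels.

```r
# Hypothetical helper: names of columns holding strings longer than `limit`.
# Assumption: length is measured in bytes, which is never less than characters.
long_string_columns <- function(df, limit = 255L) {
  too_long <- vapply(df, function(x) {
    if (is.factor(x)) x <- levels(x)        # check the labels of factors
    if (!is.character(x)) return(FALSE)     # numerics, dates etc. are fine
    any(nchar(x, type = "bytes") > limit, na.rm = TRUE)
  }, logical(1))
  names(df)[too_long]
}

# Made-up example data: one column exceeds the limit.
demo <- data.frame(
  id    = 1:2,
  short = c("a", "b"),
  long  = c(strrep("x", 300), "ok"),
  stringsAsFactors = FALSE
)
flagged <- long_string_columns(demo)
print(flagged)
```

Columns reported this way would need to be shortened or dropped before calling write.sav() or write.por().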
One may wonder why the world needs another package to import SPSS data
into R. Similar tasks can be done by the foreign, memisc and haven
packages. The first two use code from older releases of PSPP, and R-Core
most likely has neither the time nor the need to update their codebase
to a newer PSPP release. Still, over the years the SPSS file format has
changed. Not drastically, but new features such as long strings were
implemented, features that foreign cannot handle. The newest of the
three aforementioned packages, haven, is a wrapper around the ReadStat C
library. Its development began around the time we started with
readstata13, so it has been around for quite some time now. Contrary to
many other people in the R world, I am not a huge fan of tibbles, which
are an integral part of haven. One can agree that this is a minor
problem. My bigger problem with the package is that I am not yet
convinced that ReadStat and haven are tested enough. Even though I am
sure that the authors of both made sure that in most cases their
software works, there are still cases where it does not. During the
development of readspss I reported a few bugs to the haven package.
Among them were incorrectly trimmed long strings and a severe bug where
por files were imported with badly incorrect values. All errors were
found using publicly available data files, writing unit tests and
comparing data across different R packages, PSPP and various versions of
SPSS. Until I see such practices adopted by other packages, I simply do
not trust them, and maybe you should not either. If the import of data
fails, one does not have to worry about anything else.
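The cross-package comparison mentioned above can be sketched in base R. The helper below is hypothetical and is not the test code used for readspss. It compares two imports column by column after converting values to character, so that differences in class (for example factor versus character) are ignored and only differing values are reported.

```r
# Hypothetical helper: names of columns whose values differ between imports.
compare_imports <- function(a, b) {
  stopifnot(is.data.frame(a), is.data.frame(b), identical(dim(a), dim(b)))
  same <- vapply(seq_along(a), function(i) {
    identical(as.character(a[[i]]), as.character(b[[i]]))
  }, logical(1))
  names(a)[!same]
}

# Made-up stand-ins for the same file imported by two different packages.
imp_a <- data.frame(x = c(1, 2, 3), s = c("a", "b", "c"),
                    stringsAsFactors = FALSE)
imp_b <- data.frame(x = c(1, 2, 3), s = factor(c("a", "b", "c")))
imp_c <- imp_b
imp_c$x[2] <- 20  # simulate an incorrectly imported value

print(compare_imports(imp_a, imp_b))  # no differing columns
print(compare_imports(imp_a, imp_c))  # reports column "x"
```

With real files the two arguments would come from different readers, for instance read.sav(file) and foreign::read.spss(file, to.data.frame = TRUE). Differences in how each package treats labels and missings may then need extra handling before the comparison is meaningful.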
The development of readspss began once development of readstata13
slowed down. Having written most of the C++ code to import dta files, I
had learned a lot about binary files and Rcpp development. Since SPSS
was another statistical software used at the university where I worked
at that time, it felt natural to have a look at sav files. Shortly
after, I learned that the dta file documentation is priceless and that
nothing comparable is available for SPSS, and development ceased for
quite some time. In February 2018 I changed jobs, resulting in many
train rides. A project was needed and development began again. The PSPP
documentation and countless hours of trial and error led to the current
state of the package.
readspss uses code by Ben Pfaff for the encryption part. It uses code
from TDA by Goetz Rohwer and Ulrich Poetter for the conversion of
numerics in the por parser. The PSPP documentation was a huge help.
Without the testing by Ulrich Poetter this package would not be as
complete as it is.