Function to read an SPSS sav file into a data.frame.

read.sav(
  file,
  convert.factors = TRUE,
  generate.factors = TRUE,
  encoding = TRUE,
  fromEncoding = NULL,
  use.missings = TRUE,
  debug = FALSE,
  override = FALSE,
  convert.dates = TRUE,
  add.rownames = FALSE,
  pass
)

Arguments

file

string. A sav-file to import. Can be a local file or a URL; in the latter case the file is downloaded before it is read.

convert.factors

logical. If TRUE, numeric or character variables will be converted into factors in R.

generate.factors

logical. Converts variables with partial labels into factors. E.g. if only the labels 1 - low and 5 - high are provided, labels 2, 3 and 4 will be created. Especially useful in combination with use.missings = TRUE.

encoding

logical. Should character values be converted to the encoding of the R session? If TRUE, read.sav will try the charcode stored inside the sav-file. If this value is 2 or not available, fromEncoding can be used to change the encoding.

fromEncoding

character. Encoding of the imported file. This information is stored inside the sav-file, but is currently unused. This option can still be used to define the initial encoding by hand.

use.missings

logical. Should missing values be converted? Defaults to TRUE.

debug

logical. Provides additional debug information. Most likely not useful to any user.

override

logical. The filename provided in file is checked for the extension sav. If the extension is different, nothing is read. Set this option to TRUE to override this behavior.

convert.dates

logical. Should dates be converted on the fly?

add.rownames

logical. If TRUE, the first column will be used as rownames. The variable will be dropped afterwards.

pass

character. If an encrypted sav-file is to be imported, this is the encryption key, with a maximum of ten characters.
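The effect of generate.factors described above can be pictured with a small base R sketch. It is an illustration only and not the code used inside read.sav; the vector x and the label table labels are hypothetical.

```r
# Illustration only: this is not the implementation used by read.sav.
x <- c(1, 3, 5, 2, 4, 1)          # hypothetical numeric variable
labels <- c(low = 1, high = 5)    # partial value labels: only 1 and 5 are labelled

# Collect every value that occurs in the data or in the label table.
vals <- sort(unique(c(x, labels)))

# Use the supplied label where one exists, otherwise the value itself.
labs <- as.character(vals)
labs[match(labels, vals)] <- names(labels)

f <- factor(x, levels = vals, labels = labs)
levels(f)
```

The unlabelled values 2, 3 and 4 become levels named after the values themselves, so no observation is lost when the variable is turned into a factor.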

Value

read.sav returns a data.frame with the following additional attributes:

  • row.names rownames

  • names colnames

  • datalabel datalabel

  • datestamp datestamp

  • timestamp timestamp

  • filelabel filelabel

  • class data.frame

  • vtype SPSS type; 0 is usually a numeric/integer

  • disppar matrix of display parameters if available

  • missings a list containing information about the missing values of the variables. If use.missings = TRUE, this information will be used to generate missings.

  • haslabel list of variables that contain labels

  • longstring character vector of long strings if any in file

  • longmissing character vector of missings in longstrings if any

  • longlabel character vector of long labels

  • cflag 0 if uncompressed, 1 if compressed

  • endian 2 or 3 if little endian, else 0

  • compression compression, similar to cflag; the value is stored twice in the sav-file

  • doc list containing documentation information if any

  • charcode encoding code; 2 is most likely CP1252

  • encoding sav-files sometimes contain the encoding as an extra string

  • ownEnc encoding of the R-session

  • doenc was the file supposed to be encoded?

  • autoenc was encoding applied to the file?

  • swapit were the bytes swapped?

  • totals character string of totals if any

  • dataview XML describing how the data should be printed

  • extraproduct additional string provided

  • label list containing label value information

  • varmatrix a matrix with information on how the data is stored

  • var.label variable labels

  • lmissings missings table if any in longstrings
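The attributes listed above are ordinary R attributes and can be read with attr() or attributes(). The sketch below uses a stand-in data.frame so that it runs without a sav-file; the attribute values are invented, and treating var.label as one label per column is an assumption made for the illustration.

```r
# Stand-in for a data.frame returned by read.sav(); all attribute values
# below are invented for illustration.
dd <- data.frame(id = 1:3, score = c(2.5, 3.0, 4.5))
attr(dd, "datalabel") <- "Example data"
attr(dd, "var.label") <- c("Respondent id", "Test score")  # assumed: one label per column
attr(dd, "cflag") <- 1

# A single attribute is retrieved by name.
attr(dd, "datalabel")

# Pair the variable labels with the column names for easier lookup.
var_labels <- setNames(attr(dd, "var.label"), names(dd))
var_labels[["score"]]

# List the names of everything attached to the object.
names(attributes(dd))
```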

Details

SPSS files are widely available, but for a long time only foreign and memisc provided functions to import sav-files into R. More recently haven joined them. This package aims to offer another alternative, to document the sav-format, and to provide additional options for importing the data. Values in sav-files are stored almost exclusively as numerics; only in compression mode are some integers stored as integers. Even so, they are returned as numerics.
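Because values are returned as numerics, whole-number columns arrive as doubles. If integer storage is wanted, a column can be converted after import. This is a minimal base R sketch; the vector x is a hypothetical stand-in for such a column.

```r
# Hypothetical stand-in for a column returned by read.sav():
# whole numbers stored as double, with one missing value.
x <- c(1, 2, 3, NA)
typeof(x)

# Convert only when every non-missing value is a whole number,
# so that no fractional information is silently dropped.
is_whole <- all(x == trunc(x), na.rm = TRUE)
if (is_whole) x <- as.integer(x)
typeof(x)
```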

Note

Information used to decode the sav-format was provided by TDA (www.stat.rub.de/tda.html) and PSPP (www.gnu.org/software/pspp/).

See also

read.spss (foreign), memisc and read_sav (haven).

Examples

fl <- system.file("extdata", "electric.sav", package = "readspss")
dd <- read.sav(fl)
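As a further sketch, the same file can be read with factor conversion and missing-value conversion switched off, which may help when comparing the stored codes with the default result. The call is guarded so that it is skipped when readspss is not installed; the object name dd_raw is chosen for this example only.

```r
# Skipped quietly when readspss is not installed.
dd_raw <- NULL
if (requireNamespace("readspss", quietly = TRUE)) {
  fl <- system.file("extdata", "electric.sav", package = "readspss")
  # Keep the stored codes: no factor conversion, missing values not converted.
  dd_raw <- readspss::read.sav(fl, convert.factors = FALSE, use.missings = FALSE)
  str(dd_raw)
}
```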