This function provides an overview of the variables in a dataframe, allowing efficient inspection of the factor levels, ranges for numeric variables, and numbers of missing values.

varView(
  data,
  columns = names(data),
  varViewCols = rosetta::opts$get(varViewCols),
  varViewRownames = TRUE,
  maxLevels = 10,
  truncLevelsAt = 50,
  showLabellerWarning = rosetta::opts$get(showLabellerWarning),
  output = rosetta::opts$get("tableOutput")
)

# S3 method for rosettaVarView
print(x, output = attr(x, "output"), ...)

Arguments

data

The dataframe containing the variables to view.

columns

The names of the columns (variables) in `data` to include; by default, all columns.

varViewCols

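The columns to show in the variable view, for example `index`, `label`, `values`, `level`, `valids`, `NAs` and `class` (see the examples).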

varViewRownames

Whether to set the variable names as the row names of the returned variable view data frame.

maxLevels

For factors, the maximum number of levels to show.

truncLevelsAt

For factor levels, the number of characters at which to truncate them.

showLabellerWarning

Whether to show a warning when value labels following the `labelled` convention (as set, for example, by `haven`) are encountered.

output

A character vector containing one or more of "console" and "viewer", and/or one or more filenames in existing directories. If `output` contains "viewer" and RStudio is used, the variable view is shown in the RStudio viewer.

x

The varView data frame to print.

...

Any additional arguments are passed along to the print.data.frame() function.

Value

A dataframe with the variable view.

Author

Gjalt-Jorn Peters & Melissa Gordon Wolf

Examples

### The default variable view
rosetta::varView(iris);
#> Variable view for 'iris':
#> 
#>              index                                     values      level valids
#> Sepal.Length     1  35 unique values ranging from 4.3 to 7.9. continuous    150
#> Sepal.Width      2    23 unique values ranging from 2 to 4.4. continuous    150
#> Petal.Length     3    43 unique values ranging from 1 to 6.9. continuous    150
#> Petal.Width      4  22 unique values ranging from 0.1 to 2.5. continuous    150
#> Species          5 setosa (1), versicolor (2) & virginica (3)    nominal    150
#>              NAs   class
#> Sepal.Length   0 numeric
#> Sepal.Width    0 numeric
#> Petal.Length   0 numeric
#> Petal.Width    0 numeric
#> Species        0  factor
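The `values`, `valids` and `NAs` columns above can be approximated in base R. The sketch below uses a hypothetical helper, `describeVar()`, which is not part of rosetta and is not its actual implementation; it only illustrates how these summaries relate to the data (for numeric vectors, the number of unique values and the range, ignoring missing values; for factors, the numbered levels).

```r
# Hypothetical helper (not part of rosetta): approximates the 'values',
# 'valids' and 'NAs' columns of the variable view for a single vector.
describeVar <- function(x) {
  valids <- sum(!is.na(x))
  nas <- sum(is.na(x))
  if (is.factor(x)) {
    lvls <- levels(x)
    # Number each level, e.g. "setosa (1)"
    numbered <- paste0(lvls, " (", seq_along(lvls), ")")
    if (length(numbered) > 1) {
      # Join with commas, using " & " before the last level
      values <- paste0(paste(head(numbered, -1), collapse = ", "),
                       " & ", tail(numbered, 1))
    } else {
      values <- paste(numbered, collapse = "")
    }
  } else if (is.numeric(x) && valids > 0) {
    # Count unique non-missing values and report the range
    values <- sprintf("%d unique values ranging from %s to %s.",
                      length(unique(x[!is.na(x)])),
                      min(x, na.rm = TRUE),
                      max(x, na.rm = TRUE))
  } else {
    values <- NA_character_
  }
  data.frame(values = values, valids = valids, NAs = nas,
             stringsAsFactors = FALSE)
}

describeVar(iris$Sepal.Length)
describeVar(iris$Species)
```

For `iris$Sepal.Length`, this yields the same description as the first row of the variable view above.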

### Only for a few variables in the dataset
rosetta::varView(iris, columns=c("Sepal.Length", "Species"));
#> Variable view for 'iris':
#> 
#>              index                                     values      level valids
#> Sepal.Length     1  35 unique values ranging from 4.3 to 7.9. continuous    150
#> Species          5 setosa (1), versicolor (2) & virginica (3)    nominal    150
#>              NAs   class
#> Sepal.Length   0 numeric
#> Species        0  factor

### Set some variable and value labels using the `labelled`
### standard, which is also used by `haven`
dat <- iris;
attr(dat$Sepal.Length, "label") <- "Sepal length";
attr(dat$Sepal.Length, "labels") <-
  c('one' = 1,
    'two' = 2,
    'three' = 3);

### varView automatically recognizes and shows these, adding
### a 'label' column
rosetta::varView(dat);
#> Variable view for 'dat':
#> 
#>              index        label                                     values
#> Sepal.Length     1 Sepal length               one (1), two (2) & three (3)
#> Sepal.Width      2                 23 unique values ranging from 2 to 4.4.
#> Petal.Length     3                 43 unique values ranging from 1 to 6.9.
#> Petal.Width      4               22 unique values ranging from 0.1 to 2.5.
#> Species          5              setosa (1), versicolor (2) & virginica (3)
#>                   level valids NAs   class
#> Sepal.Length ambiguous*    150   0 numeric
#> Sepal.Width  continuous    150   0 numeric
#> Petal.Length continuous    150   0 numeric
#> Petal.Width  continuous    150   0 numeric
#> Species         nominal    150   0  factor
#> 
#> 
#> * Note that value labels were set conform the `labeller` package convention, for example as a result of importing a dataset (from SPSS, STATA or SAS) using the `haven` package. These variables ('Sepal.Length') are considered continuous by R, but the assignment of value labels implies that the numeric values represent categories, and if that is the case, these variables should be stored as factors in R.
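As the note suggests, numeric variables whose value labels denote categories are better stored as factors. Below is a minimal base-R sketch, assuming the labels are stored in a `labels` attribute as in the example above; `labelsToFactor()` is a hypothetical helper, not a rosetta function.

```r
# Hypothetical helper (not part of rosetta): converts a numeric vector with
# a 'labels' attribute (named numeric vector) into a factor.
labelsToFactor <- function(x) {
  labs <- attr(x, "labels")
  # as.vector() drops the attributes; the label values become the factor
  # levels and their names the factor labels. Unlabelled values become NA.
  factor(as.vector(x), levels = unname(labs), labels = names(labs))
}

# A small example vector with value labels
x <- c(1, 2, 2, 3, NA)
attr(x, "labels") <- c(one = 1, two = 2, three = 3)

f <- labelsToFactor(x)
levels(f)
table(f, useNA = "ifany")
```

Applied to `dat$Sepal.Length` from the example above, every value would become `NA`, because none of its values equal 1, 2 or 3; the labels in that example only serve to demonstrate the warning.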

### You can also specify that you only want to see some columns
### in the variable view
rosetta::varView(dat,
                 varViewCols = c('label', 'values', 'level'));
#> Variable view for 'dat':
#> 
#>                     label                                     values      level
#> Sepal.Length Sepal length               one (1), two (2) & three (3) ambiguous*
#> Sepal.Width                  23 unique values ranging from 2 to 4.4. continuous
#> Petal.Length                 43 unique values ranging from 1 to 6.9. continuous
#> Petal.Width                22 unique values ranging from 0.1 to 2.5. continuous
#> Species                   setosa (1), versicolor (2) & virginica (3)    nominal
#> 
#> 
#> * Note that value labels were set conform the `labeller` package convention, for example as a result of importing a dataset (from SPSS, STATA or SAS) using the `haven` package. These variables ('Sepal.Length') are considered continuous by R, but the assignment of value labels implies that the numeric values represent categories, and if that is the case, these variables should be stored as factors in R.