This function provides an overview of the variables in a dataframe, allowing efficient inspection of factor levels, ranges of numeric variables, and numbers of missing values.
varView(
  data,
  columns = names(data),
  varViewCols = rosetta::opts$get(varViewCols),
  varViewRownames = TRUE,
  maxLevels = 10,
  truncLevelsAt = 50,
  showLabellerWarning = rosetta::opts$get(showLabellerWarning),
  output = rosetta::opts$get("tableOutput")
)
# S3 method for rosettaVarView
print(x, output = attr(x, "output"), ...)
data: The dataframe containing the variables to view.
columns: The columns to include.
varViewCols: The columns of the variable view.
varViewRownames: Whether to set the variable names as row names of the variable view dataframe that is returned.
maxLevels: For factors, the maximum number of levels to show.
truncLevelsAt: For factor levels, the number of characters at which to truncate.
showLabellerWarning: Whether to show a warning if labeller labels are encountered.
output: A character vector containing one or more of "console", "viewer", and one or more filenames in existing directories. If output contains "viewer" and RStudio is used, the variable view is shown in the RStudio viewer.
x: The varView data frame to print.
...: Any additional arguments are passed along to the print.data.frame() function.
A dataframe with the variable view.
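To give a sense of what the returned dataframe summarizes, the following is a minimal base-R sketch that computes a few comparable per-variable statistics (class, number of valid values, number of missing values, and number of unique values). The helper `summarizeVariables` is hypothetical and not part of rosetta; it does not reproduce the actual implementation or output format of `varView`.

```r
# Hypothetical helper (not part of rosetta): a rough per-variable overview
# built with base R only, one row per column of the input dataframe.
summarizeVariables <- function(data) {
  data.frame(
    index  = seq_along(data),
    class  = vapply(data, function(x) class(x)[1], character(1)),
    valids = vapply(data, function(x) sum(!is.na(x)), integer(1)),
    NAs    = vapply(data, function(x) sum(is.na(x)), integer(1)),
    unique = vapply(data, function(x) length(unique(x[!is.na(x)])), integer(1)),
    row.names = names(data),
    stringsAsFactors = FALSE
  )
}

overview <- summarizeVariables(iris)
print(overview)
```

Because the result is an ordinary dataframe with the variable names as row names, it can be indexed like any other, for example `overview["Species", "class"]`.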
### The default variable view
rosetta::varView(iris);
#> Variable view for 'iris':
#>
#> index values level valids
#> Sepal.Length 1 35 unique values ranging from 4.3 to 7.9. continuous 150
#> Sepal.Width 2 23 unique values ranging from 2 to 4.4. continuous 150
#> Petal.Length 3 43 unique values ranging from 1 to 6.9. continuous 150
#> Petal.Width 4 22 unique values ranging from 0.1 to 2.5. continuous 150
#> Species 5 setosa (1), versicolor (2) & virginica (3) nominal 150
#> NAs class
#> Sepal.Length 0 numeric
#> Sepal.Width 0 numeric
#> Petal.Length 0 numeric
#> Petal.Width 0 numeric
#> Species 0 factor
### Only for a few variables in the dataset
rosetta::varView(iris, columns=c("Sepal.Length", "Species"));
#> Variable view for 'iris':
#>
#> index values level valids
#> Sepal.Length 1 35 unique values ranging from 4.3 to 7.9. continuous 150
#> Species 5 setosa (1), versicolor (2) & virginica (3) nominal 150
#> NAs class
#> Sepal.Length 0 numeric
#> Species 0 factor
### Set some variable and value labels using the `labelled`
### standard, which is also used by `haven`
dat <- iris;
attr(dat$Sepal.Length, "label") <- "Sepal length";
attr(dat$Sepal.Length, "labels") <-
  c('one' = 1,
    'two' = 2,
    'three' = 3);
### varView automatically recognizes and shows these, adding
### a 'label' column
rosetta::varView(dat);
#> Variable view for 'dat':
#>
#> index label values
#> Sepal.Length 1 Sepal length one (1), two (2) & three (3)
#> Sepal.Width 2 23 unique values ranging from 2 to 4.4.
#> Petal.Length 3 43 unique values ranging from 1 to 6.9.
#> Petal.Width 4 22 unique values ranging from 0.1 to 2.5.
#> Species 5 setosa (1), versicolor (2) & virginica (3)
#> level valids NAs class
#> Sepal.Length ambiguous* 150 0 numeric
#> Sepal.Width continuous 150 0 numeric
#> Petal.Length continuous 150 0 numeric
#> Petal.Width continuous 150 0 numeric
#> Species nominal 150 0 factor
#>
#>
#> * Note that value labels were set conform the `labeller` package convention, for example as a result of importing a dataset (from SPSS, STATA or SAS) using the `haven` package. These variables ('Sepal.Length') are considered continuous by R, but the assignment of value labels implies that the numeric values represent categories, and if that is the case, these variables should be stored as factors in R.
### You can also specify that you only want to see some columns
### in the variable view
rosetta::varView(dat,
                 varViewCols = c('label', 'values', 'level'));
#> Variable view for 'dat':
#>
#> label values level
#> Sepal.Length Sepal length one (1), two (2) & three (3) ambiguous*
#> Sepal.Width 23 unique values ranging from 2 to 4.4. continuous
#> Petal.Length 43 unique values ranging from 1 to 6.9. continuous
#> Petal.Width 22 unique values ranging from 0.1 to 2.5. continuous
#> Species setosa (1), versicolor (2) & virginica (3) nominal
#>
#>
#> * Note that value labels were set conform the `labeller` package convention, for example as a result of importing a dataset (from SPSS, STATA or SAS) using the `haven` package. These variables ('Sepal.Length') are considered continuous by R, but the assignment of value labels implies that the numeric values represent categories, and if that is the case, these variables should be stored as factors in R.
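The note in the output above suggests storing value-labelled numeric variables as factors when the numbers represent categories. A minimal base-R sketch of one way to do that conversion is shown below. The toy vector `x` is hypothetical data; the sketch assumes the value labels sit in a "labels" attribute, as in the example above. The `haven` package also offers `as_factor()` for this purpose.

```r
# Toy vector with value labels stored in the "labels" attribute
# (hypothetical data, following the convention used in the example above)
x <- c(1, 3, 2, 2, 1)
attr(x, "labels") <- c('one' = 1, 'two' = 2, 'three' = 3)

# Use the labelled values as factor levels and their names as level labels
valueLabels <- attr(x, "labels")
xFactor <- factor(x,
                  levels = unname(valueLabels),
                  labels = names(valueLabels))

print(xFactor)
```

Values without a matching label become NA in this approach, so it only suits variables in which every observed value is labelled.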