Title: | Rename and Encode Data Frames Using External Crosswalk Files |
---|---|
Description: | A pair of functions for renaming and encoding data frames using external crosswalk files. It is especially useful when constructing master data sets from multiple smaller data sets that do not name or encode variables consistently across files. Based on similar commands in 'Stata'. |
Authors: | Benjamin Skinner [aut, cre] |
Maintainer: | Benjamin Skinner |
License: | MIT + file LICENSE |
Version: | 0.3.0 |
Built: | 2025-03-06 04:03:39 UTC |
Source: | https://github.com/btskinner/crosswalkr |
Encode data frame column using external crosswalk file.
encodefrom(.data, var, cw_file, raw, clean, label, delimiter = NULL, sheet = NULL, case_ignore = TRUE, ignore_tibble = FALSE)
encodefrom_(.data, var, cw_file, raw, clean, label, delimiter = NULL, sheet = NULL, case_ignore = TRUE, ignore_tibble = FALSE)
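Argument | Description |
---|---|
.data | Data frame or tbl_df |
var | Column name of vector to be encoded |
cw_file | Either a data frame object or a string with the path to an external crosswalk file, which has columns representing raw (current) vector values, clean (new) vector values, and labels for the new values |
raw | Name of column in cw_file that contains the current values of var |
clean | Name of column in cw_file that contains the new values for var |
label | Name of column in cw_file with labels for the new values |
delimiter | String delimiter used to parse cw_file when it is a delimited text file |
sheet | Specify which sheet to read if cw_file is an Excel workbook |
case_ignore | Ignore case when matching current (raw) values of var against the raw column of cw_file |
ignore_tibble | Ignore .data status as tbl_df and return a factor rather than a labelled vector |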
A vector that is either a factor or a labelled vector, depending on the input data and options.
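The encoding step can be pictured as a lookup: each value of var is matched against the raw column of the crosswalk and replaced by the corresponding clean value, with the label column supplying value labels. The base R sketch below illustrates that idea under stated assumptions. encode_sketch() is a hypothetical helper, not part of crosswalkr; it always returns a factor and does not reproduce the labelled-vector output for tibbles or the file-reading options.

```r
# Hypothetical sketch of crosswalk-based encoding in base R (not crosswalkr's code)
encode_sketch <- function(x, cw, raw, clean, label, case_ignore = TRUE) {
  key_x  <- as.character(x)
  key_cw <- as.character(cw[[raw]])
  if (case_ignore) {
    # Compare lower-cased strings so "kentucky" matches "Kentucky"
    key_x  <- tolower(key_x)
    key_cw <- tolower(key_cw)
  }
  # Row of the crosswalk matching each value of x (NA when unmatched)
  idx <- match(key_x, key_cw)
  # Levels are the clean values; the label column supplies the level labels
  factor(cw[[clean]][idx], levels = cw[[clean]], labels = cw[[label]])
}

cw <- data.frame(stname = c("Kentucky", "Tennessee", "Virginia"),
                 stfips = c(21, 47, 51),
                 stabbr = c("KY", "TN", "VA"),
                 stringsAsFactors = FALSE)
x <- c("Virginia", "kentucky", "Ohio")
res <- encode_sketch(x, cw, "stname", "stfips", "stabbr")
res
```

Values with no match in the raw column (here "Ohio") become NA rather than raising an error; whether crosswalkr behaves the same way is not stated above.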
encodefrom_(): Standard evaluation version of encodefrom (var, raw, clean, and label must be strings when using this version).
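The split between encodefrom() and encodefrom_() is the usual R distinction between non-standard evaluation (bare column names) and standard evaluation (strings). A minimal base R illustration follows; column_values() and column_values_() are hypothetical helpers, not crosswalkr functions, and the package may capture its arguments differently internally.

```r
# Standard evaluation: `var` is already a character string
column_values_ <- function(.data, var) {
  .data[[var]]
}

# Non-standard evaluation: capture the unevaluated argument as a string,
# then delegate to the standard-evaluation version
column_values <- function(.data, var) {
  var <- deparse(substitute(var))
  column_values_(.data, var)
}

df <- data.frame(state = c("Kentucky", "Tennessee", "Virginia"),
                 stringsAsFactors = FALSE)
a <- column_values(df, state)     # bare column name
b <- column_values_(df, "state")  # string, convenient inside loops or other functions
```

The string form is the one to reach for when column names are stored in variables, for example when iterating over several columns.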
library(crosswalkr)
df <- data.frame(state = c('Kentucky','Tennessee','Virginia'),
                 stfips = c(21,47,51),
                 cenregnm = c('South','South','South'))
df_tbl <- tibble::as_tibble(df)
cw <- get(data(stcrosswalk))
# data.frame input: returns a factor
df$state2 <- encodefrom(df, state, cw, stname, stfips, stabbr)
# tibble input: returns a labelled vector
df_tbl$state2 <- encodefrom(df_tbl, state, cw, stname, stfips, stabbr)
# ignore_tibble = TRUE: returns a factor even for tibble input
df_tbl$state3 <- encodefrom(df_tbl, state, cw, stname, stfips, stabbr, ignore_tibble = TRUE)
# convert labelled columns to factors, or drop the value labels
haven::as_factor(df_tbl)
haven::zap_labels(df_tbl)
Rename data frame columns using external crosswalk file.
renamefrom(.data, cw_file, raw, clean, label = NULL, delimiter = NULL, sheet = NULL, drop_extra = TRUE, case_ignore = TRUE, keep_label = FALSE, name_label = FALSE)
renamefrom_(.data, cw_file, raw, clean, label = NULL, delimiter = NULL, sheet = NULL, drop_extra = TRUE, case_ignore = TRUE, keep_label = FALSE, name_label = FALSE)
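Argument | Description |
---|---|
.data | Data frame or tbl_df |
cw_file | Either a data frame object or a string with the path to an external crosswalk file, which has columns representing raw (current) column names, clean (new) column names, and, optionally, labels for the new columns |
raw | Name of column in cw_file that contains the current column names |
clean | Name of column in cw_file that contains the new column names |
label | Name of column in cw_file with labels for the new columns |
delimiter | String delimiter used to parse cw_file when it is a delimited text file |
sheet | Specify which sheet to read if cw_file is an Excel workbook |
drop_extra | Drop extra columns in the current data frame if they are not matched in cw_file |
case_ignore | Ignore case when matching current (raw) column names against the raw column of cw_file |
keep_label | Keep the current label, if any, on data frame columns that aren't matched in cw_file |
name_label | Use the old (raw) column names as labels for the renamed columns |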
Data frame or tbl_df with new column names and labels.
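A rough base R sketch of the renaming logic, assuming the crosswalk is already a data frame: current names are matched to the raw column, replaced with the clean names, optionally labelled through a "label" attribute, and unmatched columns are dropped when drop_extra = TRUE. rename_sketch() is hypothetical; it does not model keep_label or file reading, and storing labels in a "label" attribute is an assumption about how crosswalkr keeps them.

```r
# Hypothetical sketch of crosswalk-based renaming (not crosswalkr's implementation)
rename_sketch <- function(.data, cw, raw, clean, label = NULL,
                          drop_extra = TRUE, case_ignore = TRUE,
                          name_label = FALSE) {
  old <- names(.data)
  key_data <- old
  key_cw <- as.character(cw[[raw]])
  if (case_ignore) {
    key_data <- tolower(key_data)
    key_cw <- tolower(key_cw)
  }
  # Crosswalk row matching each current column name (NA when unmatched)
  idx <- match(key_data, key_cw)
  matched <- !is.na(idx)
  out <- .data
  names(out)[matched] <- as.character(cw[[clean]][idx[matched]])
  # Attach a "label" attribute to each renamed column
  for (j in which(matched)) {
    lab <- if (name_label) {
      old[j]                               # old name becomes the label
    } else if (!is.null(label)) {
      as.character(cw[[label]][idx[j]])    # label taken from the crosswalk
    } else {
      NULL
    }
    if (!is.null(lab)) attr(out[[j]], "label") <- lab
  }
  # Keep only the renamed columns unless drop_extra = FALSE
  if (drop_extra) out <- out[matched]
  out
}

df <- data.frame(state = c("Kentucky", "Tennessee", "Virginia"),
                 fips = c(21, 47, 51),
                 region = c("South", "South", "South"),
                 stringsAsFactors = FALSE)
cw <- data.frame(old_name = c("state", "fips"),
                 new_name = c("stname", "stfips"),
                 label = c("Full state name", "FIPS code"),
                 stringsAsFactors = FALSE)
df1 <- rename_sketch(df, cw, "old_name", "new_name", "label")
names(df1)  # "stname" "stfips"
```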
renamefrom_(): Standard evaluation version of renamefrom (raw, clean, and label must be strings when using this version).
library(crosswalkr)
df <- data.frame(state = c('Kentucky','Tennessee','Virginia'),
                 fips = c(21,47,51),
                 region = c('South','South','South'))
cw <- data.frame(old_name = c('state','fips'),
                 new_name = c('stname','stfips'),
                 label = c('Full state name', 'FIPS code'))
# rename matched columns and attach labels from the label column
df1 <- renamefrom(df, cw, old_name, new_name, label)
# use the old column names as labels
df2 <- renamefrom(df, cw, old_name, new_name, name_label = TRUE)
# keep the unmatched column (region) instead of dropping it
df3 <- renamefrom(df, cw, old_name, new_name, drop_extra = FALSE)
An example state crosswalk. Includes information for all states plus the District of Columbia.
stcrosswalk
A data frame with 51 rows and 7 variables:
Two-digit state FIPS codes
Two-letter state abbreviation
Full state name
Census region number
Census region name
Census division number
Census division name
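A crosswalk such as stcrosswalk can also serve as a lookup table for attaching state attributes by FIPS code. The sketch below uses a hypothetical three-row excerpt whose column names (stfips, stabbr, cenregnm) follow the encodefrom example; it is not the packaged data, and the codes in df are made up for illustration.

```r
# Hypothetical excerpt with columns named as in the encodefrom example
cw <- data.frame(stfips = c(21, 47, 51),
                 stabbr = c("KY", "TN", "VA"),
                 cenregnm = c("South", "South", "South"),
                 stringsAsFactors = FALSE)

# Hypothetical data keyed only by FIPS code
df <- data.frame(stfips = c(51, 21))

# Row of the crosswalk for each FIPS code in df, preserving df's row order
idx <- match(df$stfips, cw$stfips)
df$stabbr <- cw$stabbr[idx]
df$cenregnm <- cw$cenregnm[idx]
df
```

Using match() rather than merge() keeps the original row order of df, which merge() does not guarantee.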
An example state and territory crosswalk. Includes information for all states, the District of Columbia, and territories.
sttercrosswalk
A data frame with 69 rows and 10 variables:
Two-digit FIPS codes
Two-letter abbreviation
Full name
Census region number
Census region name
Census division number
Census division name
Indicator for status as state
Indicator for status as state or DC
Status code: 1 := Under U.S. sovereignty; 2 := Minor Outlying Islands; 3 := Independent nation under Compact of Free Association with U.S.; 4 := Individual Minor Outlying Islands (within status 2)
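The numeric status codes listed above can be decoded into a factor in base R. In this sketch, status is a hypothetical vector of codes; the actual name of the status column in sttercrosswalk is not shown above.

```r
# Labels transcribed from the status-code description above
status_labels <- c("Under U.S. sovereignty",
                   "Minor Outlying Islands",
                   "Independent nation under Compact of Free Association with U.S.",
                   "Individual Minor Outlying Islands (within status 2)")

status <- c(1, 3, 4, 1)  # hypothetical codes

# levels = 1:4 fixes the code-to-label mapping even if some codes are absent
status_f <- factor(status, levels = 1:4, labels = status_labels)
table(status_f)
```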