Using ABS structures

Loading the package lazily loads a number of structures; a full list is available in the reference.

Objects stored in the absmapsdata package can be accessed with the read_absmap function:

Converting state names and abbreviations

The clean_state() function makes it easy to wrangle vectors of state names and abbreviations, which might be in different forms and possibly misspelled.

Let’s start with a character vector that includes some misspelled state names, some correctly spelled state names, and some abbreviations, both malformed and correctly formed.

x <- c("western Straya", "w. A ", "new soth wailes", "SA", "tazz", "Victoria",
       "northn territy")

To convert this character vector to a vector of abbreviations for state names, use clean_state():

clean_state(x)
#> [1] "WA"  "WA"  "NSW" "SA"  "Tas" "Vic" "NT"

If you want full names for the states rather than abbreviations:

clean_state(x, to = "state_name")
#> [1] "Western Australia"  "Western Australia"  "New South Wales"   
#> [4] "South Australia"    "Tasmania"           "Victoria"          
#> [7] "Northern Territory"

By default, clean_state() uses fuzzy or approximate string matching to match the elements in your character vector to state names/abbreviations. If you only want to permit exact matching, you can disable fuzzy matching. This means you will never get false matches, but you will also fail to match misspelled state names or malformed abbreviations; you’ll get an NA if no match can be found.

clean_state(x, fuzzy_match = FALSE)
#> [1] NA    NA    NA    "SA"  NA    "Vic" NA
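To give a feel for what approximate matching involves, here is a minimal base R sketch that picks the closest state name by relative edit distance and returns NA when nothing is close enough. It is an illustration only: `match_state_sketch()`, its `max_dist` threshold and the lookup vector are hypothetical, and clean_state() may use a different algorithm and different rules.

```r
# Illustrative only: `match_state_sketch` is a hypothetical helper,
# not a strayr function, and the 0.4 threshold is an arbitrary assumption.
state_names <- c("New South Wales", "Victoria", "Queensland", "South Australia",
                 "Western Australia", "Tasmania", "Northern Territory",
                 "Australian Capital Territory")

match_state_sketch <- function(x, max_dist = 0.4) {
  x_clean <- tolower(trimws(x))
  # Generalised Levenshtein distance between each input and each state name
  d <- adist(x_clean, tolower(state_names))
  # Divide by the length of the longer string so the threshold is relative
  len <- outer(nchar(x_clean), nchar(state_names), pmax)
  rel <- d / len
  # Index of the closest state name for each input
  best <- apply(rel, 1, which.min)
  best_dist <- rel[cbind(seq_along(x), best)]
  # Accept the closest name only if it is within the threshold
  ifelse(best_dist <= max_dist, state_names[best], NA_character_)
}

match_state_sketch(c("new soth wailes", "Victoria", "xyz"))
#> [1] "New South Wales" "Victoria"        NA
```

Lowering `max_dist` towards 0 approaches exact matching: fewer false matches, but more NA values for misspelled input.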

If your data is in a data frame, clean_state() works well within a dplyr::mutate() call:

library(dplyr)

x_df <- data.frame(state = x, stringsAsFactors = FALSE)

x_df %>% 
  mutate(state_abbr = clean_state(state))
#>             state state_abbr
#> 1  western Straya         WA
#> 2           w. A          WA
#> 3 new soth wailes        NSW
#> 4              SA         SA
#> 5            tazz        Tas
#> 6        Victoria        Vic
#> 7  northn territy         NT

clean_state() can also return an ‘unofficial’ state/territory colour for use in charts:

clean_state("Queensland", to = "colour")
#> [1] "#800000"

The palette_state_name_2016 palette supplies these unofficial state colours to ggplot2 fill scales:

library(ggplot2)

read_absmap("state2016") %>% 
    ggplot() + 
    geom_sf(aes(fill = state_name_2016), colour = NA) +
    scale_fill_manual(values = palette_state_name_2016)

Australian public holidays

This package includes the auholidays dataset, taken from the Australian Public Holidays Dates Machine Readable Dataset, as well as a helper function, is_holiday():

is_holiday('2020-01-01')
#> [1] TRUE
is_holiday('2019-05-27', jurisdictions = c('ACT', 'TAS'))
#> [1] TRUE

h_df <- data.frame(dates = c('2020-01-01', '2020-01-10'))

h_df %>%
  mutate(IsHoliday = is_holiday(dates))
#>        dates IsHoliday
#> 1 2020-01-01      TRUE
#> 2 2020-01-10     FALSE
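Conceptually, a holiday check like this is a membership test against a table of dates, optionally narrowed to some jurisdictions. The base R sketch below shows that idea with a two-row toy table. Everything in it is an assumption made for illustration: `toy_holidays`, its column names, the "NAT" code for national holidays and `is_holiday_sketch()` are hypothetical, and the real auholidays dataset and is_holiday() may be organised differently.

```r
# Toy lookup table for illustration only; the layout of the real
# auholidays dataset is not reproduced here.
toy_holidays <- data.frame(
  date         = as.Date(c("2020-01-01", "2019-05-27")),
  jurisdiction = c("NAT", "ACT"),  # "NAT" = national (assumed coding)
  stringsAsFactors = FALSE
)

# Hypothetical helper, not the strayr implementation.
is_holiday_sketch <- function(dates, jurisdictions = NULL) {
  tbl <- toy_holidays
  if (!is.null(jurisdictions)) {
    # Keep national holidays plus those of the requested jurisdictions
    tbl <- tbl[tbl$jurisdiction %in% c("NAT", jurisdictions), ]
  }
  # A date is a holiday if it appears in the (possibly filtered) table
  as.Date(dates) %in% tbl$date
}

is_holiday_sketch(c("2020-01-01", "2020-01-10"))
#> [1]  TRUE FALSE
is_holiday_sketch("2019-05-27", jurisdictions = "VIC")
#> [1] FALSE
```

Because the check is vectorised over `dates`, it slots into a mutate() call in the same way as is_holiday() above.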

Parsing income ranges

The parse_income_range() function extracts numbers from the income ranges commonly used in Australian data. For example:

parse_income_range("$1-$199 ($1-$10,399)", limit = "lower")
#> [1] 1
parse_income_range("$1-$199 ($1-$10,399)", limit = "upper")
#> [1] 199
parse_income_range("$1-$199 ($1-$10,399)", limit = "mid")
#> [1] 100

parse_income_range("e. $180,001 or more", limit = "upper")
#> [1] Inf
parse_income_range("e. $180,001 or more", limit = "upper", max_income = 300e3)
#> [1] 3e+05

parse_income_range("Nil income")
#> [1] 0
parse_income_range("Negative income")
#> [1] 0
parse_income_range("Negative income", negative_as_zero = FALSE)
#> [1] NA

tibble(income_range = c("Negative income",
                        "Nil income",
                        "$1,500-$1,749 ($78,000-$90,999)",
                        "$1,750-$1,999 ($91,000-$103,999)",
                        "$2,000-$2,999 ($104,000-$155,999)",
                        "$3,000 or more ($156,000 or more)")) %>% 
  mutate(lower = parse_income_range(income_range),
         mid   = parse_income_range(income_range, limit = "mid"),
         upper = parse_income_range(income_range, limit = "upper"))
#> # A tibble: 6 × 4
#>   income_range                      lower   mid upper
#>   <chr>                             <dbl> <dbl> <dbl>
#> 1 Negative income                       0     0     0
#> 2 Nil income                            0     0     0
#> 3 $1,500-$1,749 ($78,000-$90,999)    1500  1625  1749
#> 4 $1,750-$1,999 ($91,000-$103,999)   1750  1875  1999
#> 5 $2,000-$2,999 ($104,000-$155,999)  2000  2500  2999
#> 6 $3,000 or more ($156,000 or more)  3000   Inf   Inf
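The lower, mid and upper values above follow a simple pattern: take the first pair of dollar amounts, treat an open-ended "or more" range as having an infinite upper bound, and average the two for the midpoint. A minimal base R sketch of that arithmetic follows. `income_bounds_sketch()` is a hypothetical helper written for this illustration; it handles one string at a time, ignores the "Nil income" and "Negative income" cases, and is not how parse_income_range() is implemented.

```r
# Hypothetical helper for illustration; not the strayr implementation.
income_bounds_sketch <- function(x) {
  # Drop the trailing bracketed figures, e.g. " ($1-$10,399)"
  first_range <- sub("\\s*\\(.*\\)\\s*$", "", x)
  # Pull out every dollar amount such as "$1,500"
  amounts <- regmatches(first_range, gregexpr("\\$[0-9,]+", first_range))[[1]]
  values <- as.numeric(gsub("[$,]", "", amounts))
  lower <- values[1]
  # An open-ended range ("or more") has no second amount, so no upper bound
  upper <- if (length(values) >= 2) values[2] else Inf
  c(lower = lower, mid = (lower + upper) / 2, upper = upper)
}

income_bounds_sketch("$1-$199 ($1-$10,399)")
#> lower   mid upper 
#>     1   100   199
income_bounds_sketch("$3,000 or more ($156,000 or more)")
#> lower   mid upper 
#>  3000   Inf   Inf
```

The infinite midpoint in the second call is just (3000 + Inf) / 2, which is why the last row of the table shows Inf for both mid and upper unless max_income is supplied.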