censobr: Download Data from Brazil’s Population Census

I’ve always been a heavy user of Brazilian census data. This is one of the key data sets I use in most of my research projects on cities, urban and regional development, urban transport and accessibility. Like in many other countries, the population census in Brazil is the most comprehensive data collection process, covering a wide range of topics and using a consistent, high-quality method for the entire country at fine spatial resolution.

Nonetheless, getting access to census data has never been easy or convenient. Although Brazil’s official statistics and geography institute (IBGE) makes the census data publicly available, the data is not shared in a ready-to-use format. Moreover, census data sets are often larger than memory, which creates a serious barrier for users with limited computational resources.

To help overcome these two problems, I’ve created censobr, an R package to make it easy for anyone to download data from Brazil’s population census. You may install censobr from CRAN or the dev version from GitHub.
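The two installation routes can be sketched as below. The GitHub repository path `ipeaGIT/censobr` and the use of the {remotes} package are assumptions that are not stated in this post, so check the package page before running the second option.

```r
# Stable release from CRAN
install.packages("censobr")

# Development version from GitHub.
# Assumes the repository lives at ipeaGIT/censobr and that {remotes} is installed.
# install.packages("remotes")
# remotes::install_github("ipeaGIT/censobr")
```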

The package currently includes 5 main functions to download Census microdata:

  1. read_population()
  2. read_households()
  3. read_mortality()
  4. read_families()
  5. read_emigration()

At the moment, censobr only includes microdata from the 2000 and 2010 censuses, but it is being expanded to cover more years and data sets.

As I mentioned, the microdata of Brazilian censuses are often too big to load into users’ RAM. To overcome this problem, censobr is built on top of Apache Arrow, which allows users to analyze larger-than-memory data sets as they would a regular data.frame, simply using common functions from dplyr. More info in this vignette.
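As a minimal sketch of that workflow, the example below opens the 2010 population microdata and computes a weighted population total per state. The column names are assumptions not given in this post: `abbrev_state` is taken to be the state abbreviation added by censobr and `V0010` the sample weight in the 2010 microdata. The first call downloads the data, so it needs an internet connection.

```r
library(censobr)
library(dplyr)

# Open the 2010 population microdata as an Arrow dataset.
# Only the requested columns are referenced; the data is not loaded into RAM.
# Column names are assumptions: 'abbrev_state' (state abbreviation) and
# 'V0010' (sample weight).
pop <- read_population(
  year = 2010,
  columns = c("abbrev_state", "V0010"),
  showProgress = FALSE
)

# dplyr verbs build a lazy query; nothing is computed at this point
pop_by_state <- pop |>
  group_by(abbrev_state) |>
  summarise(total_pop = sum(V0010)) |>
  collect()  # executes the query and returns a regular tibble

head(pop_by_state)
```

Only the small aggregated table is brought into memory by `collect()`, which is what makes this approach workable on machines with limited RAM.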

Note: all data sets in censobr are enriched with geography columns following the name standards of the {geobr} package to help data manipulation and integration with spatial data from {geobr}.
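A hedged sketch of how the shared geography columns could be used to combine census results with {geobr} boundaries is shown below. It assumes that both packages expose a character column named `abbrev_state` and that `V0010` is the sample weight in the 2010 microdata; neither detail is confirmed in this post.

```r
library(censobr)
library(geobr)
library(dplyr)

# Weighted population per state from the 2010 microdata
# (column names 'abbrev_state' and 'V0010' are assumptions)
pop_by_state <- read_population(
  year = 2010,
  columns = c("abbrev_state", "V0010"),
  showProgress = FALSE
) |>
  group_by(abbrev_state) |>
  summarise(total_pop = sum(V0010)) |>
  collect()

# State boundaries from {geobr}
states <- read_state(year = 2010, showProgress = FALSE)

# Join census totals to the spatial data on the shared column
states_pop <- left_join(states, pop_by_state, by = "abbrev_state")
```

Because the key column carries the same name in both sources, the join needs no renaming step.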

P.S. The census also has a special place in my heart because of my demography background.
