
Aggregate dataset of open data without identifying information

Chris Hartgerink, Richard Klein, Jelte Wicherts

This module contains a principal dataset collated from several open datasets that we previously identified as not containing identifying information. The principal dataset serves as a pseudo-population from which smaller sample datasets, likewise free of identifying information, can be drawn. In a next step, these sample datasets will be used to generate precision estimates (α and 1-α) for algorithms that check open data for identifying information. The principal dataset shared here contains 30,251 rows and a maximum of 23 columns.
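
As a rough illustration of how smaller sample datasets might be drawn from a pseudo-population of this size, the following Python sketch samples rows at random. The sampling scheme (simple random sampling without replacement within each sample), the number and size of the samples, the function name `draw_samples`, and the placeholder rows are all assumptions made for illustration; the module itself does not specify how the samples will be drawn.

```python
import random


def draw_samples(population, n_samples, sample_size, seed=0):
    """Draw n_samples random samples of sample_size rows each.

    Rows are drawn without replacement within a sample; separate
    samples may overlap. The seed makes the draws reproducible.
    """
    rng = random.Random(seed)
    return [rng.sample(population, sample_size) for _ in range(n_samples)]


# Hypothetical stand-in for the principal dataset: one placeholder
# record per row (30,251 rows); the real columns are not reproduced here.
population = [{"row_id": i} for i in range(30251)]

# Assumed values: 5 samples of 100 rows each.
samples = draw_samples(population, n_samples=5, sample_size=100)
```

Fixing the seed keeps the sample datasets reproducible, which may be useful if the estimates derived from them need to be regenerated later.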