Data anonymization irreversibly transforms data in a privacy-preserving way. The result is still plaintext (unencrypted) data that remains useful to the CSPs and to external users, but it is less accurate than the original data and, therefore, carries a lower disclosure risk. Anonymization is performed only once, at the storage stage; after that, any query on the data (search, retrieval, calculations) runs transparently to CLARUS and the CSP, although it may return approximate results.
Two types of anonymization mechanisms have been designed:
- Data coarsening: it systematically generalizes each input record independently (one record at a time) according to a user-defined coarsening level. Since coarsened data are less detailed than the original data, the disclosure risk is reduced.
- Data microaggregation: it groups similar records into clusters of a fixed minimum size k and replaces the records in each cluster with the cluster's average values; thus, it transforms the whole dataset in a monolithic, global way and cannot be applied independently to each record. Since the (at least) k microaggregated records within each cluster are indistinguishable, the re-identification probability is lowered to at most 1/k.
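A minimal sketch of record-by-record coarsening follows. It assumes a hypothetical record schema with `age` and `zip` attributes and an assumed mapping from the coarsening level to granularity (age bins of 5 × level years, with `level` trailing ZIP digits masked). Neither the schema nor the mapping comes from CLARUS, and the function names are illustrative; the point is that each call sees only a single record.

```python
from typing import Dict, Tuple

# A record maps attribute names to values (hypothetical schema: "age", "zip", ...).
Record = Dict[str, object]


def coarsen_numeric(value: float, width: float) -> Tuple[float, float]:
    """Generalize a number to the half-open interval [lo, lo + width) that contains it."""
    lo = (value // width) * width
    return (lo, lo + width)


def coarsen_zip(zip_code: str, level: int) -> str:
    """Mask the last `level` characters of a postal code with '*'."""
    level = min(level, len(zip_code))
    return zip_code[: len(zip_code) - level] + "*" * level


def coarsen_record(record: Record, level: int) -> Record:
    """Coarsen a single record; no other record is consulted."""
    if level < 1:
        raise ValueError("coarsening level must be >= 1")
    out = dict(record)  # attributes not listed below are copied unchanged
    # Assumed mapping from level to detail: age bins of 5 * level years,
    # and `level` trailing ZIP digits suppressed.
    out["age"] = coarsen_numeric(float(record["age"]), width=5 * level)
    out["zip"] = coarsen_zip(str(record["zip"]), level)
    return out


if __name__ == "__main__":
    rows = [
        {"age": 37, "zip": "43007", "diagnosis": "flu"},
        {"age": 52, "zip": "43001", "diagnosis": "asthma"},
    ]
    for row in rows:
        print(coarsen_record(row, level=2))
```

For example, at level 2 the record `{"age": 37, "zip": "43007", "diagnosis": "flu"}` becomes an age interval of (30.0, 40.0) and the ZIP code `430**`, while the non-identifying attribute is left untouched. Raising the level widens the intervals and masks more digits, trading accuracy for a lower disclosure risk.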
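The text does not say which clustering heuristic CLARUS uses. The sketch below substitutes MDAV (Maximum Distance to Average Vector), a standard fixed-size microaggregation heuristic, restricted to numeric attributes and Euclidean distance; the function names are hypothetical. MDAV repeatedly takes the record farthest from the current centroid, groups it with its k - 1 nearest neighbours, and does the same for the record farthest from that one, so every cluster ends up with between k and 2k - 1 records.

```python
import math
from typing import List, Sequence

Point = List[float]  # one record's numeric quasi-identifier values


def _centroid(points: Sequence[Point]) -> Point:
    """Attribute-wise mean of a non-empty group of records."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def _farthest(idx: List[int], data: Sequence[Point], ref: Point) -> int:
    """Index (among idx) of the record farthest from ref (Euclidean distance)."""
    return max(idx, key=lambda i: math.dist(data[i], ref))


def _take_cluster(idx: List[int], data: Sequence[Point], seed: int, k: int) -> List[int]:
    """Remove seed and its k - 1 nearest remaining neighbours from idx; return them."""
    # Sort by distance to the seed; the tie-breaker keeps the seed itself first.
    chosen = sorted(idx, key=lambda i: (math.dist(data[i], data[seed]), i != seed))[:k]
    taken = set(chosen)
    idx[:] = [i for i in idx if i not in taken]
    return chosen


def mdav(data: Sequence[Point], k: int) -> List[Point]:
    """MDAV microaggregation: every output record is its cluster's centroid."""
    if k < 1 or len(data) < k:
        raise ValueError("need 1 <= k <= number of records")
    remaining = list(range(len(data)))
    clusters: List[List[int]] = []
    while len(remaining) >= 3 * k:
        c = _centroid([data[i] for i in remaining])
        r = _farthest(remaining, data, c)  # record farthest from the centroid
        clusters.append(_take_cluster(remaining, data, r, k))
        s = _farthest(remaining, data, data[r])  # remaining record farthest from r
        clusters.append(_take_cluster(remaining, data, s, k))
    if len(remaining) >= 2 * k:
        c = _centroid([data[i] for i in remaining])
        r = _farthest(remaining, data, c)
        clusters.append(_take_cluster(remaining, data, r, k))
    clusters.append(remaining)  # last cluster holds between k and 2k - 1 records
    # Replace each record by the centroid (average) of its cluster.
    out: List[Point] = [[] for _ in data]
    for cluster in clusters:
        cen = _centroid([data[i] for i in cluster])
        for i in cluster:
            out[i] = list(cen)
    return out
```

With `k = 3`, the one-attribute records `[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]` are replaced by `[2.0]` three times and `[11.0]` three times, so each released value is shared by at least k records. Because every centroid depends on which records share a cluster, adding or removing a single record can change the output for other records; this is why microaggregation has to run over the whole dataset at once.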