Evaluating Re-Identification Risk for Personal Health Information in Ontario Using Publicly Available Data Sources

Suleiman Jabbouri

Abstract

Canadian researchers contend that because there have been no published precedents of inappropriate disclosure of personal health information provided to them, the privacy risk of secondary use of such information is very low. However, trusting the good intentions of individuals is not a practical approach to protecting personal health information.

One commonly used approach to protecting data that may be disclosed for research purposes is to de-identify it. However, there is a lack of evidence to inform the de-identification decision. In particular, it is not known which variables increase the risk of re-identifying patients through record linkage with public databases. Studies have been performed in the US to examine re-identification risk, but it is not known whether the US findings carry over to Canada, although some have assumed that they do.

The goal of this research is to replicate, in Ontario, Dr. Sweeney's study conducted at Carnegie Mellon University, with specific focus on the variables that were found to be good for record linkage in the US: date of birth, gender, and postal code. We found that it is possible to re-identify specific sub-populations of professionals, since professional societies publish comprehensive membership lists. We focused on physicians in this analysis. The results indicate that approximately half of that sub-population can be re-identified using their gender and home postal code. The implication is that re-identification risk is relatively high for specific sub-populations: if a database containing de-identified personal health information is disclosed, and this information pertains to a known sub-population of professionals, then the risk of re-identification is relatively high unless variables such as gender, date of birth, and postal code are removed. Conversely, for the general adult population (i.e., non-professionals) and for youth, we were unable to re-identify individuals using publicly or semi-publicly available databases.
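
To illustrate the kind of measurement described above, the following is a minimal sketch of how one might estimate the fraction of a sub-population that is unique on a set of quasi-identifiers such as gender and postal code. It is not the method used in this research; the function name unique_fraction, the field names, and the sample records are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def unique_fraction(records, quasi_identifiers):
    """Return the fraction of records whose combination of
    quasi-identifier values appears exactly once in the list.

    A record that is unique on these fields could, in principle,
    be matched to a single row in a de-identified data set that
    retains the same fields.
    """
    if not records:
        return 0.0
    # Build one key per record from the chosen quasi-identifiers.
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    # Count the records whose key is not shared with any other record.
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(records)


# Hypothetical membership list (invented data, for illustration only).
members = [
    {"name": "A", "gender": "F", "postal_code": "K1A 0B1"},
    {"name": "B", "gender": "M", "postal_code": "K1A 0B1"},
    {"name": "C", "gender": "M", "postal_code": "K1A 0B1"},
    {"name": "D", "gender": "F", "postal_code": "M5V 2T6"},
]

risk = unique_fraction(members, ["gender", "postal_code"])
print(risk)
```

In this invented example, members A and D are each the only person with their gender and postal code combination, while B and C share one, so half of the list is unique on those two fields. Adding a further field such as date of birth to the list of quasi-identifiers can only keep the unique fraction the same or increase it.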