Safe Harbor: Protecting Patients, Improving Healthcare

Around the world and in every branch of medicine, electronic health records (EHRs) have revolutionized patient care. Just as EHRs have allowed healthcare providers to easily view, modify and share medical information about their patients, electronic records have also offered new opportunities for research and data analysis.

Electronic health records enable researchers to compare the data of large groups of patients across settings, modalities and time. EHR-based studies have been used to track diseases and vaccinations, develop optimal care strategies, and advocate for improved healthcare among at-risk demographics.

Although HIPAA protects individually identifiable patient health information, it includes specific provisions that allow certain data to be used for research and analysis by healthcare providers and researchers. Under these provisions, which fall within the HIPAA Privacy Rule, patient data can be de-identified, and so released from its protected status, by one of two methods. The first is Expert Determination, in which a qualified expert uses statistical analysis to show that the data carries a very small risk of being used to identify an individual. The second, and more commonly used, is the Safe Harbor method.

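HIPAA’s Safe Harbor provision is simple but powerful. It lists eighteen “identifiers” that must be removed from data before it can be shared with an outside party. These eighteen identifiers include names, street addresses, Social Security numbers, biometric identifiers and other individually unique information. Some potentially valuable details survive in reduced form. Records disclosed under Safe Harbor may keep the year of birth (for patients younger than 90) and the state of residence. They may also keep the first three digits of a ZIP code when that three-digit area contains more than 20,000 people. Full dates of birth, and geographic units smaller than a state such as cities and towns, must be removed. Once the eighteen identifiers have been removed, and the provider has no actual knowledge that the remaining information could identify a patient, the data is considered “de-identified”. At that point it cannot be traced back to a specific patient except through a separate re-identification code, if the provider has assigned one and kept it private.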
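For teams working with structured records, a few of the Safe Harbor rules can be expressed as simple field transformations. The Python sketch below is illustrative only. It assumes a hypothetical record layout, and field names such as ssn, birth_date and zip are invented for this example. It drops some direct identifiers, reduces dates to years, groups patients aged 90 or older, and truncates ZIP codes to three digits. The caller must supply the set of restricted three-digit ZIP prefixes from current Census data. The sketch does not cover all eighteen identifiers.

```python
from datetime import date

# Hypothetical field names for some direct identifiers; a real schema will differ,
# and this set does not cover all eighteen Safe Harbor identifiers.
DIRECT_IDENTIFIER_FIELDS = {
    "name", "street_address", "city", "phone", "email", "ssn",
    "medical_record_number", "health_plan_id",
}


def age_on(birth: date, as_of: date) -> int:
    """Whole years of age on the date `as_of`."""
    had_birthday = (as_of.month, as_of.day) >= (birth.month, birth.day)
    return as_of.year - birth.year - (0 if had_birthday else 1)


def safe_harbor_record(record: dict, restricted_zip3: set[str], as_of: date) -> dict:
    """Apply a few Safe Harbor-style transformations to one structured record."""
    # Drop fields that directly identify the patient.
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIER_FIELDS}

    # Birth date: keep only the year, unless the patient is 90 or older,
    # in which case the year is removed too and the age is grouped.
    birth = out.pop("birth_date", None)
    if birth is not None:
        if age_on(birth, as_of) >= 90:
            out["age_group"] = "90 or older"
        else:
            out["birth_year"] = birth.year

    # Any other date field (e.g. "admit_date") is reduced to its year.
    for key in [k for k, v in out.items() if isinstance(v, date)]:
        out[key.removesuffix("_date") + "_year"] = out.pop(key).year

    # ZIP code: keep the first three digits, or "000" for the sparsely
    # populated three-digit areas supplied by the caller.
    zip_code = out.pop("zip", None)
    if zip_code is not None:
        zip3 = str(zip_code)[:3]
        out["zip3"] = "000" if zip3 in restricted_zip3 else zip3

    return out
```

Passing the reference date explicitly (as_of) keeps the age calculation reproducible when the same extract is regenerated later.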

By removing privacy concerns from statistically valuable health records, the Safe Harbor standard gives both patients and researchers what they need. Public acceptance of electronic health records relies on healthcare providers’ ability to keep data completely confidential. At the same time, those conducting research into disease and healthcare standards must be able to access high-quality data that includes enough information to be statistically useful. When providers share patient records only after applying the Safe Harbor method, the records they share stay consistent. Providers can also point to a clear, specific standard that patients can feel comfortable with.

Since its implementation in 2003, the Safe Harbor standard has drawn some criticism. Patient privacy advocates worry that removing the eighteen identifiers does not go far enough to ensure that individual patients cannot be re-identified. They argue that in the age of online information-sharing, certain health, personal and geographic details could be combined to bring a patient’s identity to light.
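One common way to gauge the kind of risk these critics describe is k-anonymity. This measure is not part of the Safe Harbor rule itself. It asks how many records share each combination of quasi-identifiers, such as three-digit ZIP code, birth year and sex. A group of size 1 means that combination points to a single patient. The minimal Python sketch below assumes records are dictionaries. It also assumes the analyst chooses both the quasi-identifier fields and the threshold k, and the default of 5 is an arbitrary choice for illustration.

```python
from collections import Counter
from typing import Iterable, Mapping, Sequence


def group_sizes(records: Iterable[Mapping], quasi_identifiers: Sequence[str]) -> Counter:
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r.get(q) for q in quasi_identifiers) for r in records)


def smallest_group(records: Iterable[Mapping], quasi_identifiers: Sequence[str]) -> int:
    """The dataset's k: size of the smallest group (0 for an empty dataset)."""
    sizes = group_sizes(records, quasi_identifiers)
    return min(sizes.values()) if sizes else 0


def risky_combinations(records: Iterable[Mapping], quasi_identifiers: Sequence[str],
                       k: int = 5) -> dict:
    """Combinations shared by fewer than k records, which are easier to re-identify."""
    sizes = group_sizes(records, quasi_identifiers)
    return {combo: n for combo, n in sizes.items() if n < k}
```

Records flagged by risky_combinations could then be generalized further, for example by widening birth years into ranges, or withheld before the data set is shared.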

Data risk, of course, is always contextual. In a doctor’s hands, patient data should remain confidential and never be used to uncover more than the patient is willing to disclose. In the hands of someone with a reason to re-identify and misuse the data, even the smallest piece of information is a risk. HIPAA’s Privacy Rule places no further restrictions on properly de-identified data. The approach works on the expectation that such data will be shared with healthcare researchers and other professionals who have no motivation to identify individual patients. This holds whether the data complies with Safe Harbor or has been statistically shown to carry a very low risk of re-identification. The very purpose of de-identified data is to reveal trends and correlations across large groups of individuals, and it has little value when examined one record at a time.

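When de-identifying data according to Safe Harbor or other HIPAA standards, healthcare providers must be careful to remove every trace of personally identifiable information. Many also maintain privacy agreements with the party using the data. HIPAA formally requires a data use agreement when a provider shares a limited data set, which keeps some identifiers such as dates and city or town. For fully de-identified data, an agreement that the recipient will not attempt re-identification is an added safeguard rather than a legal requirement. Data use agreements are spelled out in the Privacy Rule, which took effect before electronic health records made data analysis as common and widespread as it is today.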

To ensure that Safe Harbor is applied consistently and the eighteen identifiers are removed from every entry, de-identification software used under careful oversight is highly recommended. Manual de-identification, especially on a large scale, tends to be costly, prohibitively time-consuming and prone to error. Automated software scours patient records thoroughly, navigating their variable structures and removing the identifiers with a high rate of accuracy. The software can even be configured to produce a de-identified data set tailored to a particular project, so that no more information is shared than is absolutely necessary.
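As a rough illustration of what automated scrubbing involves, the Python sketch below scans free-text notes for a few well-formatted identifiers and replaces them with placeholder tags. It looks for Social Security numbers, U.S.-style phone numbers, email addresses and numeric dates, and it counts each replacement. The patterns are assumptions made for this example and will miss many real-world formats. Names and addresses in narrative text generally require dedicated de-identification tools, and the output should still be reviewed.

```python
import re

# Illustrative patterns only (assumptions for this sketch): they catch a few
# well-formatted identifiers and will miss names, addresses and many other formats.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\(?\b\d{3}\)?[-. ]\d{3}[-.]\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
}


def scrub(text: str) -> tuple[str, dict[str, int]]:
    """Replace each match with a [LABEL] tag and count replacements per label."""
    counts: dict[str, int] = {}
    for label, pattern in PATTERNS.items():
        text, n = pattern.subn(f"[{label}]", text)
        counts[label] = n
    return text, counts
```

Replacing whole dates is stricter than Safe Harbor, which permits the year to be kept. The per-label counts give reviewers a quick way to spot notes that deserve a closer manual check.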

Thanks to the clear and effective standards set by Safe Harbor, and to advances in de-identification software that make meeting those standards possible, medical records now have the capacity to tell the story of not just one patient, but a group of patients with similar histories. Now and in the future, statistical analysis can tell us not only what factors lead to the development of particular problems, but what we can do to prevent them. By protecting the privacy of individual patients, healthcare providers and researchers can bring together data that will improve the health of all patients.