This past April, the George Mason University (GMU) Student Chapter experienced two significant successes.
The first occurred when the GMU Student Chapter competed at DataFest DC 2018, hosted by Summit Consulting on April 20–22, with 11 other area universities. Two GMU teams participated, including Stat of the Art.
George Mason’s presentation, “How Can Indeed Better Connect US Health Care Employers with Nurses?” won first place for Best Data Visualization.
The second success took place during Data Challenge DC the weekend of April 28. The GMU Student Chapter, led by Glen Hui, was instrumental in organizing the event. More than 30 students participated, six of whom were from GMU. One of GMU’s graduate statistics students was on the team that won the award for best data visualization.
The European Union’s (EU’s) recently adopted General Data Protection Regulation (GDPR) marks a major transition in data privacy protection in the EU, and it may affect approaches to data access and confidentiality protection more broadly, including in US research and other statistical activities.
After four years of preparation and debate, the GDPR was approved and adopted by the European Parliament in April 2016 and took effect May 25, 2018. Many day-to-day practical details remain to be worked out, including extraterritorial enforcement, but one thing is certain: The GDPR means more bureaucracy for all involved.
The GDPR replaces the Data Protection Directive. A regulation, such as the GDPR, is a binding legislative act that must be applied in its entirety across the EU, whereas a directive sets out a goal all EU countries must achieve but leaves it to the individual countries to decide how. Because a regulation applies directly, the GDPR, unlike the former directive, does not require any enabling legislation by member states. It is designed to harmonize data privacy laws across Europe, to protect and empower all EU residents with respect to their data privacy, and to reshape the way organizations across the region approach data privacy. The regulation applies to EU member states and to countries that are not EU members but belong to the European Economic Area (EEA).
In this increasingly data-driven world, where privacy cannot be completely guaranteed, the GDPR seeks to protect EU residents’ privacy and to guard against breaches and misuse of “personal data.” Personal data is defined broadly as any information relating to an identified or identifiable natural person (the data subject). An identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, or an online identifier, or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural, or social identity of that natural person.
Some personal data is categorized as special data: essentially sensitive personal data covering an individual’s religious or philosophical beliefs, health, racial or ethnic origin, trade union membership, political beliefs, or sex life or sexual orientation, as well as genetic data and biometric data (including photos when used for the purpose of uniquely identifying a natural person). The collection and use of special data are subject to greater restrictions than those for other types of personal data.
Pseudonymization is the processing of personal data in such a way that the data can no longer be attributed to a specific data subject without the use of additional information. It is a central feature of data protection by design, and the GDPR looks favorably upon data controllers that keep this “additional information” separate. In practice, direct identifiers (name, Social Security number, or contact information) should be kept in a separate file from indirect identifiers, which can reveal identities when combined with additional data points. Personal data that has been pseudonymized (e.g., key-coded or separated as described above) falls short of being anonymized and therefore can still fall within the scope of the GDPR, depending on how difficult it is to attribute the pseudonymized data to a particular individual.
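A minimal sketch of the key-coded separation described above, assuming simple dictionary records. The field names (`name`, `ssn`, `email`) and the helpers `pseudonymize` and `reidentify` are hypothetical and chosen only for illustration; this is not a compliance tool. Direct identifiers move to a key file indexed by a random study code, while indirect identifiers such as ZIP code and age stay in the research file, which is why the result is pseudonymized rather than anonymized.

```python
import secrets
from typing import Dict, List, Tuple

# Hypothetical field names treated as direct identifiers (illustration only).
DIRECT_IDENTIFIERS = ("name", "ssn", "email")

Record = Dict[str, str]


def pseudonymize(records: List[Record]) -> Tuple[List[Record], Dict[str, Record]]:
    """Split records into a research file and a separately stored key file.

    The research file keeps only the non-direct fields plus a random study code;
    the key file maps each study code back to the direct identifiers.
    """
    research_file: List[Record] = []
    key_file: Dict[str, Record] = {}
    for record in records:
        # Random code not derived from any identifier; regenerate on the
        # (very unlikely) chance of a collision with an existing code.
        code = secrets.token_hex(8)
        while code in key_file:
            code = secrets.token_hex(8)
        key_file[code] = {k: v for k, v in record.items() if k in DIRECT_IDENTIFIERS}
        row = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
        row["study_code"] = code
        research_file.append(row)
    return research_file, key_file


def reidentify(code: str, key_file: Dict[str, Record]) -> Record:
    """Re-attach direct identifiers; only possible with access to the key file."""
    return key_file[code]


if __name__ == "__main__":
    data = [{"name": "A. Example", "ssn": "000-00-0000", "zip": "22030", "age": "34"}]
    research, keys = pseudonymize(data)
    # ZIP code and age remain in the research file: indirect identifiers still carry risk.
    print(research)
```

In practice the key file would be stored and access-controlled separately from the research file; whether the research file is still in scope of the GDPR depends, as noted above, on how difficult re-identification would be.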
The GDPR has important extraterritorial applications. It applies to personal information on EU residents even when they are outside the EU. It applies not only to personal data controllers and processors located in the EU, but also to those located outside the EU if their activities involve personal information on EU residents.
Coverage is triggered if the activities relate either to offering goods or services to EU residents (e.g., over the internet), irrespective of whether payment is required, or to monitoring their behavior as far as that behavior takes place in the EU. When personal information on non-EU residents (e.g., US residents) is transferred to an EU data controller or processor, that data becomes subject to the GDPR (Article 3).
Breaching privacy has always been a serious matter; under the GDPR, it is now also costly. Organizations (both processors and controllers) in breach of the GDPR can be fined up to 4% of annual global turnover or 20 million euros, whichever is greater. This is the maximum fine, which can be imposed for the most serious infringements (e.g., not having sufficient customer consent to process data or violating the core of Privacy by Design concepts).
Main Topics
Main topics in the GDPR include the following:
- In the GDPR, conditions for consent have been strengthened. Requests for consent must be presented in an intelligible and easily accessible form, using clear and plain language, with the purpose of the data processing attached to that consent. It must be as easy to withdraw consent as it is to give it.
- Under the GDPR, breach notification is mandatory in all member states whenever a data breach is likely to “result in a risk for the rights and freedoms of individuals.”
- The GDPR increases data transparency and empowers data subjects through a right of access. Data subjects have the right to obtain from the data controller confirmation of whether personal data concerning them is being processed and, if so, where and for what purpose. The controller must provide a copy of the personal data free of charge.
- The right to be forgotten entitles the data subject to have the data controller erase their personal data, cease further dissemination of the data, and potentially have third parties halt processing of the data. The conditions for erasure, outlined in Article 17, include the data no longer being relevant to the original purposes of processing or the data subject withdrawing consent.
- Privacy by design is also included in the GDPR. It calls for data protection to be built in from the outset of system design rather than added later. More specifically, “The controller shall … implement appropriate technical and organisational measures … in an effective way … in order to meet the requirements of this Regulation and protect the rights of data subjects” (Article 25).
GDPR and Research
Research occupies a privileged position in the GDPR. By harmonizing privacy legislation across the EU member states and carving out exemptions for scientific, historical, statistical, and health research, the GDPR seeks to reconcile the often-competing values of privacy and innovation.
The research regime set out in Article 89 expressly allows the following across the EU:
- Broad consents for scientific research where consent cannot be secured for all specific purposes at the outset of data collection
- Further use of personal data for scientific or statistical research as a secondary compatible purpose
- The right of the data subject to object to processing of personal data (unless necessary in public interest)
- Restriction of the right of a data subject to exercise their “right to erasure” if it is likely to significantly impair processing for scientific research purposes
- Relaxation of the storage limitation principle granting the ability to store personal data for longer periods
- Isolated transfers of personal data to third countries taking into account legitimate expectations of society for an increase in knowledge
Additionally, information obligations in scientific research do not apply if meeting them would involve a disproportionate effort. This assessment takes into account the number of data subjects and the age of the data, and appropriate safeguards must be adopted. Furthermore, there is “no right to be forgotten” if exercising it is likely to significantly impair processing for scientific research purposes. Use of the Article 89 research regime is subject to the following conditions:
- Appropriate safeguards to protect the rights and freedoms of the data subject
- Adequate technical and security measures entrenching the principle of data minimization, with pseudonymized data used by default
- Compliance with recognized ethical safeguards
The grounds that researchers can use to process personal data are the following:
- Consent of the data subject/research participant for the research purpose(s).
- Legitimate interests of the data controller (or a third party). In determining these legitimate interests, you need to balance the interests of the controller against any prejudice to the rights and freedoms or interests of the data subject, taking into account the data subject’s reasonable expectations. Public authorities cannot base processing on this ground.
- Performance of a public interest task or exercise of official authority.
Under both the GDPR and the earlier directive, the EU does not allow data on EU residents to be transferred to a country outside the EU unless that country is deemed to have adequate data privacy laws. Unfortunately, the EU has determined that the United States does not currently have adequate data privacy laws, but organizations can navigate this by adhering to the EU-US Privacy Shield.
The EU-US Privacy Shield is a program under which participating US companies are considered to provide adequate data protection and can therefore receive transfers of EU data. Its predecessor, the Safe Harbour Framework, was replaced because the EU did not consider that agreement strict enough in protecting its citizens’ data. The GDPR protects the data of all EU residents, regardless of whether they are currently in the EU.
Being certified under the EU-US Privacy Shield can give your company a jump-start on meeting the GDPR’s standards and provide legal clarity and direction on the EU’s data protection laws, but it does not guarantee full GDPR compliance. It is also important to note that the EU-US Privacy Shield is reviewed every year and could change, so it is important to assign someone to stay current with all the updates.
Helpful Resources
General Data Protection Regulation (GDPR) Guidance Note for the Research Sector: Appropriate Use of Different Legal Bases Under the GDPR.
ICO (2018) Guide to the General Data Protection Regulation (GDPR). Information Commissioner’s Office.
ICO (2017) Preparing for the General Data Protection Regulation (GDPR): 12 Steps to Take Now. Information Commissioner’s Office.
Insights Association (2017) GDPR: FAQs on the EU General Data Protection Regulation.
Maldoff, G. (2016) Top 10 Operational Impacts of the GDPR: Part 8 – Pseudonymization. The Privacy Advisor.
The Government Statistics Section (GSS) organized three invited sessions, including a panel on using multiple data sources for federal statistics; seven topic-contributed sessions, including an update on the US Commission on Evidence-Based Policymaking; three roundtables, including a lunch discussion on combating breaks in time series when using multiple data sources; five contributed sessions; and one poster session. In addition to sponsoring these events, GSS is co-sponsoring eight invited sessions, six topic-contributed sessions, and six speed sessions.
GSS is also co-sponsoring a short course with the Section on Survey Research Methods (SRMS), titled “Applications of Hot Deck Imputation to Survey Data,” on July 31, taught by Rebecca Andridge of The Ohio State University and Jenny Thompson of the U.S. Census Bureau.
Hot deck imputation is a common method for handling missing data in which each missing value (the recipient) is replaced with an observed value from a “similar” unit (the donor). This half-day course is designed for survey practitioners who are interested in “seeing the methods in action.” Using examples from household and establishment surveys, the instructors will walk through each step of hot deck imputation, from donor selection options through variance estimation methods. The course will cover classical hot deck methods alongside more cutting-edge approaches, including fractional hot deck imputation. The instructors will also share their experiences with challenges that arise in implementing the hot deck (such as having fewer donors than recipients) and discuss various methods for overcoming them.
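As a rough illustration of the basic idea, and not the instructors’ course material, the sketch below implements one classical variant, a random hot deck within imputation classes: each recipient receives the value of a donor drawn at random, with replacement, from the observed units in the same class. It assumes the class labels are already given, and the function name `random_hot_deck` is hypothetical. Drawing with replacement is one simple way to cope with having fewer donors than recipients; variance estimation and fractional hot deck imputation are not shown.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional


def random_hot_deck(
    values: List[Optional[float]], classes: List[str], seed: int = 0
) -> List[float]:
    """Fill each missing value (None) with a randomly drawn donor value
    from the same imputation class; observed values are left unchanged."""
    if len(values) != len(classes):
        raise ValueError("values and classes must have the same length")
    rng = random.Random(seed)  # seeded so the imputation is reproducible

    # Build the donor pool for each imputation class from observed values.
    donors: Dict[str, List[float]] = defaultdict(list)
    for value, cls in zip(values, classes):
        if value is not None:
            donors[cls].append(value)

    imputed: List[float] = []
    for value, cls in zip(values, classes):
        if value is not None:
            imputed.append(value)
            continue
        pool = donors.get(cls)  # .get does not create an empty entry
        if not pool:
            # A class with recipients but no donors needs another strategy,
            # such as collapsing it with a neighboring class.
            raise ValueError(f"no donors in imputation class {cls!r}")
        # Draw with replacement, so one donor may serve several recipients.
        imputed.append(rng.choice(pool))
    return imputed


if __name__ == "__main__":
    print(random_hot_deck([10.0, None, 12.0, None, 30.0], ["a", "a", "a", "b", "b"]))
```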
More information about sessions, roundtables, and courses can be found online. As a reminder, roundtable and course space is limited, so sign up soon.
The events formerly known as the SPES/Q&P and Risk/SDNS mixers are morphing into one four-section joint mixer. This year, it will be the SPES/Q&P/Risk/Defense mixer at the 2018 Joint Statistical Meetings in Vancouver. We hope to see you Tuesday, July 31, in the Fairmont Waterfront Ballroom A from 6:30 p.m. to 8:30 p.m.
In the past, generous organizations and individuals have donated items such as books, software, CDs, DVDs, T-shirts, hats, ties, overalls (yes, overalls), pens, bags, water bottles, golf balls, blankets, coffee mugs, thumb drives, and the coveted Doughboy! Donated gifts have been both statistics and nonstatistics related.
We appreciate the generosity of our donors and hope you will consider adding to the excitement of the evening by donating door prizes this year. Of course, we will acknowledge all donors at the mixer.
The meetings are fast approaching, but there is still time to donate. Just complete the form at SurveyMonkey to provide contact information and donation descriptions.
We would prefer that you bring the items to the mixer or have them available at your booth for pickup. Also, mark the box of items “For SPES/Q&P/Risk/SDNS” in large letters so it is easily identified.
SPES JSM Contributed Sessions in Vancouver
SPES has the following four contributed sessions in place for the upcoming JSM in Vancouver:
- Computer Experiments, Statistical Engineering, and Applications in Physical Sciences
- New Development in Reliability Models and Innovative Applications
- Machine Learning and Applications in Complex Engineering Systems
- Recent Developments in Designs of Experiments and Response Surface Models
For more information, check out the JSM 2018 Online Program.
The Quality and Productivity (Q&P) section is sponsoring the following topic-contributed and contributed sessions at the Joint Statistical Meetings this year:
- New-Generation Experimental Design and Causal Inference in High-Tech Companies, organized by Tirthankar Dasgupta, Rutgers University
- Statistical Process Monitoring of High-Volume Data Streams, organized by Emmanuel Yashchin, IBM Research
- Field to Fork: Leading with Statistics in the Food Industry, organized by Shankang Qu, PepsiCo
- Modeling, Analysis, and Assessment, chaired by Douglas Ray, US Army RDECOM ARDEC
- Advances in Statistical Process Control, chaired by Ronald Fricker, Virginia Tech
Attendees are encouraged to use the online program to search for Q&P sessions. The Q&P Section also works closely with other ASA sections to co-sponsor sessions. In these situations, you will see Q&P listed as a co-sponsor in the online program, which contains more sessions than are listed above.
With a PhD in statistical astrophysics, David Corliss works in analytics architecture at Ford Motor Company while continuing astrophysics research on the side. He serves on the steering committee for the Conference on Statistical Practice and is president-elect of the Detroit Chapter. He is the founder of Peace-Work, a volunteer cooperative of statisticians and data scientists providing analytic support for charitable groups and applying statistical methods to issue-driven advocacy in poverty, education, and social justice.
With JSM just around the corner, it’s a good time to think about how to include Data for Good in your JSM activities. JSM is a huge event and can seem overwhelming. However, with a bit of planning, the Joint Statistical Meetings can be tamed and enjoyed.
It’s important to resist the temptation to overbook, dashing from one presentation to the next. JSM is about so much more than the papers! One strategy is to find the “big rocks”—a small number of activities most important to you—put them in your schedule, and then plan around them.
Everyone can make Data for Good one of those big rocks. Be sure to include time for meeting people, networking, and simply enjoying the event. As Student’s t often plays a role in my own D4G work, I always pay proper homage to William Gosset by raising a glass of a certain Irish stout.
When selecting papers, consider how important it is to attend each one in person. For example, I don’t know why anyone would want to attend mine, which is about keeping your skill set up to date by doing Data for Good projects, because the content is just as good in print (but the rest of the invited session is great). Make a list of the papers you can read later and the big rocks to see in person.
Networking is a huge part of conferences, so plan time for it! If there is a person you want to meet, attend a paper they are presenting (if there is one) and don’t book the following time slot.
Highlighted D4G Papers
An invited session, Data Science for Social Good, will be presented Thursday, August 2, from 10:30 a.m. to 12:20 p.m. DataKind founder Jake Porway will speak about designing for impact, followed by Darren Banks from RTI, who will touch on arrest-related deaths, and Erika Salomon from The University of Chicago, who will discuss interventions for people at risk of incarceration. The papers—and especially the discussion time at the end of the session—will be an important D4G highlight for JSM 2018.
Projects by Statistics Without Borders (SWB) and their partners are featured in several presentations. An invited session on August 1 from 8:30 a.m. to 10:20 a.m. will highlight recent SWB projects, including work related to the European migrant crisis and winter shelter for survivors of the 2015 Gorkha earthquake in Nepal.
Keep in mind that many of the most valuable presentations will be those on methodology that don’t carry a D4G label. Margaret Levenstein’s paper, “Transparency, Reproducibility, and Replicability in Work with Social and Economic Data,” is one good example. Presentations about working with public data sources, such as those mentioned in the May Stats4Good column, and those focusing on collaboration and communication with nonstatisticians will be especially helpful.
Not Attending JSM?
Not going to JSM, but interested in doing more with Data for Good? The presentations and other resources are not for attendees only. Because JSM is a nexus of all things statistical, searching its speakers, talks, and posters is valuable for anyone, and perhaps most of all for those unable to attend. Most of the research for this month’s column came from the JSM online program, which is a tremendously valuable resource. Look for the subjects and speakers that interest you most. Even if you can’t be there in person, you can still mine the presentations, look for opportunities, and make connections for your next project.
Bringing Data for Good Home
There are so many great opportunities at JSM, and everyone can make Data for Good one of them. Be sure to take some time to talk with presenters. Think about possibilities for your next D4G project and get connected with the people involved. When you are ready to leave, be sure to bring JSM—and Data for Good—home with you!