Presenting the importance of returning to security basics, the nature of differential privacy, and better tools to measure and improve your privacy and data governance
At this year’s Avast Data Summit, an internal event primarily intended for Avastians, Avast leaders and industry thought leaders gave seminars at the intersection of privacy, data, and security. Since its inception, the Avast Data Summit has been the event that drives Avast’s data-driven culture and connects privacy- and security-focused professionals with accomplished business thought leaders. Many of the topics presented at the event can help you classify, work with, and better secure your data. Following these suggestions can better protect your customers’ privacy and improve your own corporate security profile.
Companies exist in a changing data landscape. An evolving collection of data sources and products is used to produce reports, set management objectives, and guide a variety of corporate initiatives, such as improving customer experience and product features. This evolution requires a group of data curators who decide how trust relationships are established, and what data is retained and what is deleted. Our ultimate goal is that all Avast data is trusted, understood, and used in a meaningful, efficient, and secure manner.
A detailed diagram of the Avast Data Landscape presented during the event
In this post, we’ll discuss three main themes from the session presenters: the importance of returning to security basics, understanding the nature of differential privacy, and how to use better tools to measure and improve your privacy and data governance.
Going back to security basics
The first step in your data evolution may seem counterintuitive, because it requires going back to the security basics. Avast CISO Jaya Baloo and Red Team Lead Stephen Kho spoke at a fireside chat during the seminar. Some of these basics are well known to many of you: investing in IT protection up front rather than waiting to react to a breach in the future, for example. Or avoiding what Kho calls “shelfware”: software that is purchased for millions of dollars but quickly goes unused and ends up sitting on a shelf. Baloo has frequently said in previous talks that “security is a journey, not a destination,” meaning that you constantly have to re-evaluate your collection of tools and best practices. She mentioned that her role as a CISO for various organizations has often been akin to that of Cassandra of Greek myth, who accurately predicted the future but was believed by few. (She was talking about previous jobs, thankfully!)
Both speakers compared how organizations respond to breaches to the Kubler-Ross stages of grief. What that means is security teams need to move toward acceptance that their infrastructure will eventually become a target and get hacked: “There is no point in getting stuck at the denial stage,” Kho said.
Baloo spoke about the security roller coaster: “We ride it up to a particular plateau where we get more resources to resolve certain issues, but then everyone’s focus shifts to other priorities, and we ride it back down until the next incident happens.” The event moderators liked that picture and Avast Global Head of Security Jeff Williams welcomed everyone to the “Avast Theme Park” later in the event.
One step towards better security is creating a unified “purple” team out of the separate red (attackers, or penetration testers) and blue (defenders, or security operations center staffers) teams. Having both teams work together can help identify infrastructure weak points and places that need better monitoring, for example.
Kho said, “Attacks will happen and happen again. You have to be thorough enough to fix as much as possible, otherwise you will get hacked again. It is especially important that you fix as many bugs as possible in your code before you make it live.” This was illustrated by a session featuring Sean Vadig, who worked at Yahoo and is now part of Verizon Media. He reviewed some of the behind-the-scenes security issues in two breaches that happened in 2013 and 2014, when millions of customer records were exposed online. Both were likely caused by Russian state-sponsored hackers who took advantage of lax security practices. The team couldn’t connect a series of small intrusions to piece together the larger picture and realize that the hackers were still inside their network, which illustrates Kho’s point.
Back then, Yahoo had a terrible corporate culture where its security team didn’t want to work with other stakeholders and had a “trust no one” philosophy that made it difficult to recover from the attacks. “We also couldn’t quickly bring the right people in to help us understand what was stolen and its relative importance and context. We didn’t even know the names of our most critical files! The security team should be the enabler and protector of corporate revenue, and not just produce friction,” he said. “Security should speed up rather than slow down the process of getting code written and put into production.”
Vadig emphasized that going back to basics would have made both breaches less likely: for example, tightening access rights to prevent lateral movement across networks, resolving vulnerabilities, and patching quickly. “This should have been built into our corporate culture.” One of his recommendations was to factor security into personnel promotions to show that management values it.
Understanding differential privacy
Avast AI Data Scientist Sadia Afroz gave a talk on this topic, exploring how privacy comes in various shades of grey and isn’t just an all-or-nothing proposition. She presented a series of scenarios taken from real-world situations involving customers. For example, simply deleting a user’s data doesn’t guarantee their privacy, because residual replicated copies of their data could remain on various other systems. “We have to do a better job of measuring our customers’ privacy, because it isn’t free, and losing their trust could have a real cost to our business.”
Afroz cited studies showing that just a few pieces of information about someone, such as their birth date, zip code, and gender, can make their identification nearly certain. “We need to be asking the right questions, such as how we will use their data, what analytical tools we have or will have, and whether our analysis makes our customers more or less identifiable as a result.” She posited a series of scenarios in which a trusted data curator plays the role of a privacy intermediary, or firewall, between the data owners and the analysts. But what happens when data is published to untrusted places, or when the curator’s trust is broken? These and other questions she raised were thought-provoking. Afroz mentioned a series of blog posts by NIST on the topic. The posts go into more specifics about ways you can extract key business metrics, detect trends, and analyze statistics in your data while still preserving your customers’ privacy.
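The re-identification risk Afroz described is easy to demonstrate. The sketch below, using a tiny invented dataset (all names and values are hypothetical), counts how many combinations of birth date, zip code, and gender point to exactly one record, which is what makes those fields so dangerous as quasi-identifiers:

```python
from collections import Counter

# Hypothetical records: (birth_date, zip_code, gender) quasi-identifiers.
# Real datasets are far larger, but the uniqueness problem is the same.
records = [
    ("1980-04-12", "94110", "F"),
    ("1980-04-12", "94110", "M"),
    ("1975-09-30", "60614", "F"),
    ("1975-09-30", "60614", "F"),
    ("1992-01-05", "10003", "M"),
]

# Count how many records share each quasi-identifier combination.
counts = Counter(records)

# A record is re-identifiable when its combination is unique (count == 1).
unique = [combo for combo, c in counts.items() if c == 1]
print(f"{len(unique)} of {len(counts)} combinations identify exactly one person")
```

Even in this toy sample, most combinations single out one individual; at national scale, the studies Afroz cited found the same effect for a large share of the population.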
Tools to help improve your privacy and data governance
Sara Jordan is a senior researcher at the Future of Privacy Forum, a DC-based think tank. She gave a talk about various AI tools that can be used to improve your privacy and data governance. Part of this movement is toward making AI tools more ethical and transparent, which means developers have to be clear about the underlying technology and how they use various data pipelines to build their models. Part of this transparency is understanding what biases the developers bring to the modeling effort, and ensuring that those biases don’t get enshrined in the code itself. Another part is having legal and ethics screening and controls, such as review boards, to ensure that the tools operate as intended and don’t violate privacy or data governance requirements.
The tools fall into three basic categories:
First is annotated data diagrams, which illustrate what is in a particular dataset along with its other features. One effort, from Microsoft, is called Datasheets for Datasets, and it helps in understanding data quality. “It is like a nutrition label on food products, listing intended use cases and ingredients. If these labels are attached to our data sources, they help data users understand the flows of data, match them with an analyst’s expectations, and confirm that the right people have the correct access level for particular data elements.”
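In practice, such a “nutrition label” can be as simple as structured metadata stored alongside the data. The sketch below is a minimal, hypothetical datasheet in the spirit of Datasheets for Datasets; the field names and the `can_access` helper are illustrative assumptions, not part of any standard:

```python
# A hypothetical datasheet: a nutrition label describing a dataset's
# ingredients, intended uses, and required access level.
datasheet = {
    "name": "telemetry_events_v2",   # hypothetical dataset name
    "ingredients": ["event_type", "timestamp", "country_code"],
    "intended_uses": ["threat trend reporting", "product quality metrics"],
    "prohibited_uses": ["re-identifying individual users"],
    "contains_personal_data": False,
    "access_level": "analyst",       # role required to read this data
}

def can_access(user_role: str, sheet: dict) -> bool:
    """Check whether a user's role matches the dataset's required access level."""
    return user_role == sheet["access_level"]

print(can_access("analyst", datasheet))   # True
print(can_access("intern", datasheet))    # False
```

Attaching this kind of label to every data source gives analysts the expectations check Jordan describes, and gives access-control tooling something machine-readable to enforce.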
A second group of tools involves model registries or inventories. These help analysts understand the various relationships among your separate systems and track data dependencies and their consequences. Registries can help you keep track of upgrades and are also useful in calculating the return on investment of your data models. She cited the exemplary registry used by the City of Amsterdam, which tracks all of the algorithms the city has deployed for activities such as automated parking control or detecting illegal apartment rentals.
The third group of tools involves privacy-enhancing technologies. “We need to ask ourselves what the utility of the data is once we apply privacy to it. Making things private can be a technical challenge, but if we apply differential privacy methods, we can preserve privacy with minimal loss of usefulness. Is this a trade-off that we can accept?” Exploring this trade-off and its budgetary implications is a useful approach.
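The privacy-versus-utility trade-off can be made concrete with the Laplace mechanism, the textbook building block of differential privacy. The sketch below is a minimal illustration, not a production implementation: it adds Laplace noise scaled by 1/epsilon to a counting query, so a smaller epsilon (stronger privacy, a tighter privacy budget) yields a noisier and less useful answer:

```python
import math
import random

def private_count(true_count: int, epsilon: float) -> float:
    """Return a differentially private count via the Laplace mechanism.

    A counting query changes by at most 1 when one person joins or leaves
    the dataset (sensitivity 1), so noise drawn from Laplace(0, 1/epsilon)
    satisfies epsilon-differential privacy.
    """
    scale = 1.0 / epsilon
    u = random.random() - 0.5  # uniform on (-0.5, 0.5)
    # Inverse-CDF sampling of the Laplace distribution.
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_count + noise

# Smaller epsilon means stronger privacy but a noisier published statistic.
true_count = 1000
for eps in (0.1, 1.0, 10.0):
    print(f"epsilon={eps}: noisy count = {private_count(true_count, eps):.1f}")
```

Because the noise is unbiased, aggregate trends survive while any individual’s presence in the data is masked, which is exactly the business-metrics-without-re-identification property the NIST posts and Jordan’s talk describe.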