Exploring the intersection of privacy, AI, and security at CyberSec&AI Connected

David Strom 10 Nov 2021

CyberSec&AI Connected strives to deepen the ties between academia and industry in the fields of AI, cybersecurity, and machine learning

Last week, the third annual CyberSec&AI Connected was held virtually. Many sessions brought together academic and industry researchers along with leaders from Avast to explore the intersection of security and privacy and how AI and machine learning (ML) fit into both arenas.

In the conference's opening remarks, Avast CTO Michal Pechoucek and Avast Chief Data Officer Miroslav Umlauf set the tone for the first day’s sessions by saying, “The way we work online is changing exponentially.”

Dawn Song, a computer science professor at the University of California at Berkeley, started things off by citing that 2.5 quintillion bytes of data are generated every day, making it a challenge to protect all that information flowing around online. “We can’t copy concepts and methods that we have in the analog world to solve our digital problems,” she said. She outlined a four-part framework for responsible data use that includes:

  • Secure computing platforms, such as the Keystone open source secure processor hardware, 
  • Federated learning, whereby users’ data stays under their own control,
  • Differential privacy, using tools such as the Duet programming language and public data sets such as the Enron email collection, and
  • Distributed ledgers that can have immutable logs to help guarantee security.

She was bullish on each of these technologies, saying that in ten years we will all be using them to keep our data confidential by default, what she refers to as a “data commons.” 
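To make the differential privacy idea from Song’s framework concrete, here is a minimal sketch of the classic Laplace mechanism, the noise-adding technique most differential privacy tools build on. The dataset, query, and epsilon value are hypothetical illustrations, not drawn from any of the tools mentioned above.

```python
import math
import random

random.seed(42)

def laplace_noise(scale):
    # Inverse-CDF sampling from a zero-mean Laplace distribution.
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_count(values, predicate, epsilon):
    """Differentially private count: the true count plus Laplace noise.

    A counting query has sensitivity 1 (adding or removing one person
    changes the count by at most 1), so the noise scale is 1 / epsilon.
    Smaller epsilon means more noise and stronger privacy.
    """
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon)

ages = [23, 35, 41, 52, 67, 29, 71, 44]  # hypothetical survey data
noisy = dp_count(ages, lambda a: a >= 40, epsilon=0.5)
print(round(noisy, 2))  # near the true count of 5, but randomized
```

The key point is that the released answer is close enough to be useful in aggregate, while any single individual’s presence or absence in the data is masked by the noise.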

Understanding differential privacy

Differential privacy was touched on by several of the conference presenters during the day. Nicholas Carlini, a researcher at Google Brain, studied how private data can be revealed from the public GPT-2 language model, which was trained on text scraped from the public internet. He demonstrated what is called a membership inference attack, in which an attacker builds an ML model to determine whether particular data elements were used to train the target model itself. He feels a lot of work is still needed to deploy differential privacy in practice, because the computational challenges remain significant. “After all, you can’t retract private data once it gets revealed or extracted from models.”
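The intuition behind a membership inference attack can be shown with a deliberately toy example. The sketch below is a hypothetical illustration, not Carlini’s actual method: the “model” overfits by simply memorizing its training set, and the attacker exploits the resulting confidence gap between data the model has seen and data it has not.

```python
import random

random.seed(0)

# Toy "model" that memorizes its training points (extreme overfitting).
def train(points):
    return set(points)

def model_confidence(model, x):
    # An overfit model is far more confident on points it has seen.
    return 1.0 if x in model else 0.5

members = [random.random() for _ in range(100)]      # used for training
non_members = [random.random() for _ in range(100)]  # never seen by model
model = train(members)

# Attack: flag any input whose confidence exceeds a threshold as a
# likely member of the training set.
def infer_membership(model, x, threshold=0.9):
    return model_confidence(model, x) > threshold

hits = sum(infer_membership(model, x) for x in members)
false_alarms = sum(infer_membership(model, x) for x in non_members)
print(hits, false_alarms)  # prints "100 0"
```

Real attacks target models that generalize far better than this, but the principle is the same: any measurable behavioral gap between training and non-training inputs leaks information about who was in the training data.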

This was a common theme throughout the first day’s sessions. Alessandro Acquisti is a professor and privacy researcher at Carnegie Mellon University. He examined several myths about privacy with his own empirical social science research. He argued that privacy is not an indicator of something we are trying to hide, but rather a fundamental human right. He showed research in which a random picture of a student taken on campus was enough to recover digits of that person’s Social Security number. “It is getting almost impossible for individuals to effectively manage their online privacy. That horse has already left the barn.” He thinks it is hard to figure out who will bear the costs of consumers’ privacy, but believes that if “we are serious about privacy management, then federal regulation is the only way to go.”

The future of personal data privacy

Miroslav Umlauf, the Chief Data Officer at Avast, moderated a panel on the role that AI will play in the future of personal data privacy, which produced an interesting discussion among the panelists. David Freeman, an anti-abuse researcher at Facebook, talked about how quantifying risk is more art than science, and how truly anonymous data has little utility. “We are in the early days of understanding privacy as a science, and the more scientific approach and designs we can do, the better we’ll be to keep those privacy promises,” he said. Micah Sheller, an ML researcher at Intel, spoke about his lessons from being involved in various healthcare-related research projects. “I signed consent for a particular purpose, such as for treatment by my doctor. I want to know that data about me has a benefit to me and exactly what it is being used for. That kind of granularity works and is the holy grail that we want to provide users, but it conflicts with what we know about protecting intellectual property.” He spoke about the different perceptions of privacy when people volunteer their private data for cancer research, for example. “I won’t be as cooperative if I am asked to provide my data to help increase my insurance premiums, for example.”

Reza Shokri, an assistant professor at the National University of Singapore, also spoke on this panel. “A big challenge is how much can you guarantee users’ privacy that would hold under unknown future circumstances, when we don’t know what this future data architecture or computational capabilities will entail?” he asked. He suggested using data privacy meters and other tools to help quantify the risks involved, especially when it comes to deploying various differential privacy measures. Another panelist was Dr. Frida Polli, the CEO of Pymetrics, a firm that uses behavioral AI to help with workforce hiring decisions. Pymetrics believes that every user owns and controls all their data: for example, users can permanently delete their data at any point in the job application process. She mentioned that there is still a gap between what is ethical and what is legal: “Algorithms that do a horribly predictive job for potential job seekers are considered legal, so we still have a lot of work to do.”

Cybersecurity implications from AI

A second panel discussion, moderated by Lorenzo Cavallaro, a computer science professor at University College London, looked at the particular implications that AI and ML have for improving cybersecurity defenses. The panel included Avast CISO Jaya Baloo, who stated that they haven’t seen any AI-generated attacks so far and are unlikely to see them in the near future. “AI could be used to do some fuzzing for a supply chain attack, but I haven’t seen it yet. It is more likely that we would use AI to help customize a defensive framework, but attackers can find more useful zero day exploits elsewhere.”

Bret Hartman, who was formerly at Cisco and now teaches at California Polytechnic University, says that numerous cybersecurity startups he has examined use some form of AI and ML, especially when dealing with massive data collections. “But we are a long way from having AI predict the next ransomware campaign characteristics, for example.” Brad Miller, a software engineer at Google, agrees. “AI is just one component in a larger context. And attackers are looking at easier weaknesses, such as leveraging cryptographic bugs in SSL, for example.” Finally, Rajarshi Gupta, the general manager of ML services at Amazon Web Services, spoke about what will make it harder for AI to be deployed in defensive cybersecurity settings. “The environment is quickly evolving; software and attacks are very different now from 20 years ago, which means we can’t rely on older data to build our models. This means that ML and AI can’t recognize new malicious behavior very easily. When some new attack happens, AI can’t catch it.”

CyberSec&AI Connected was organized by The Avast Research Lab in cooperation with the Czech Technical University in Prague and the Private AI Collaborative Research Institute.
