The Importance of Data Masking

A guest blog by Steve Pomroy, Imperva

The Uber Breach and the Case for Data Masking

You’ve probably heard about the Uber data breach, involving the personal data of 57 million Uber customers and drivers and a six-figure bribe. Without a doubt, it’s nasty business. But for all that, this breach stings a little more for me and not just because I’m probably one of those Uber riders whose data was stolen. It’s worse because these data thieves could have been left holding the equivalent of digital fool’s gold instead of Uber’s crown jewels, so to speak. Let me explain.

Sensitive Data and Development Environments

If you read the article closely, you’ll notice that the hackers appeared to work their way in via the software engineering side of the house. They ultimately accessed an archive of sensitive rider and driver data after obtaining login credentials used by the software engineers (which the engineers had publicly posted on GitHub). In other words, the attackers found a copy of the production data that was being used by Uber’s software developers to improve and enhance its systems and applications. There is a fundamental problem with this approach to development: sensitive data should never be used for software development purposes.

The approach of simply copying a production database and dumping it into a lightly secured development environment might have been acceptable 20 years ago, but it’s completely unacceptable now. Why? Because every copy needlessly enlarges the attack surface, and because industry best-practice masking technology is now available that greatly reduces the risk while retaining the functionality of the data.

That data should have been masked (de-identified / pseudonymized) before being opened up to the software engineers. Masking would not only have given developers the realistic data they need to build high-quality software, it would also have protected the data subjects (Uber riders like you and me). By the way, copying data for dev purposes like this flies in the face of the data minimization principle found in the EU’s GDPR. If this had happened after May 2018, it would certainly have run afoul of the new rules.
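To make the idea concrete, here is a minimal sketch of what masking a single record before it reaches a development environment could look like. It is an illustration only, not a description of any particular product or of Uber's systems: the field names, the `MASKING_KEY` value, and the `pseudonym` and `mask_record` helpers are all hypothetical. The sketch uses a keyed hash (HMAC-SHA256) so that the same real value always maps to the same fake value, which keeps joins and lookups working while the real identifiers never leave production.

```python
import hashlib
import hmac

# Hypothetical secret for illustration; a real key would be stored and
# managed outside the development environment.
MASKING_KEY = b"example-key-kept-out-of-dev"


def pseudonym(value: str, length: int = 12) -> str:
    """Return a stable, keyed-hash token for a sensitive value."""
    digest = hmac.new(MASKING_KEY, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:length]


def mask_record(record: dict) -> dict:
    """Build a development-safe copy of a (hypothetical) rider record."""
    token = pseudonym(record["email"])
    return {
        # Non-identifying fields are passed through unchanged.
        "rider_id": record["rider_id"],
        "trip_count": record["trip_count"],
        # Identifying fields are replaced with realistic-looking stand-ins.
        "name": f"Rider {token[:6]}",
        "email": f"{token}@example.invalid",
        "phone": "555-01" + f"{int(token[:4], 16) % 100:02d}",
    }


if __name__ == "__main__":
    original = {
        "rider_id": 1001,
        "name": "Jane Doe",
        "email": "jane.doe@example.com",
        "phone": "415-555-2671",
        "trip_count": 42,
    }
    print(mask_record(original))
```

The masked record keeps the shape and types developers expect, but the name, email address, and phone number no longer point to a real person. A production-grade masking tool would cover far more than this (formats, referential integrity across tables, key management), so treat the sketch as a starting point for discussion.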

A Different Outcome

Every time I read one of these headlines, I cringe. It’s every executive’s worst nightmare to stand in front of the media and/or lawmakers and talk about the millions of user accounts stolen, the fraud protection they’re putting in place, and the “renewed” approach to security they’re (finally) taking—not to mention the fines regulators may impose, the brand damage, share price hit, etc. And, of course, there are the customers like you and me who have our personal data stolen all over again.

And then I turn the scenario completely inside out:

The CEO calls a press conference to announce that hackers broke into their systems and attempted to steal the data of 57 million customers from one of their development servers. Luckily, the stolen “data” had been masked. Not only did the attackers steal data with zero street value, but the company also worked with investigators to track down the hackers who tried to extort the company in exchange for their silence. The message in this scenario couldn’t be clearer: here’s the proof that our company takes the security of your data very seriously.

While data masking wouldn’t have prevented the Uber breach, it certainly would have mitigated the impact. Masked data reduces risk exposure and in this case would have kept Uber’s customer and driver data safe (and useless) in hackers’ hands.
