Data Breaches: Hardening Data Protection on Your Web Servers
This is our deep dive into aspects of sensitive data exposure, ranked #3 in OWASP’s top ten web application risks. We’ve looked at cloud database access misconfigurations and at preventing exposure through better user password hygiene; now we look at protecting data on the server side.
Hackers and sensitive data
All businesses store vast amounts of data, much of it collected by web applications and stored in internet-accessible data stores. Very broadly, this data can be categorized as user credentials (UID and password), personal (customer) information, and sensitive company information.
One of the biggest risks for this stored data is complacency, especially within SMBs. There is an ongoing perception that a business can be too small, or store too little data, to be targeted by an attacker. This is not true. The official 2018 statistics from the UK government’s Department for Digital, Culture, Media & Sport (DCMS) estimate that over 40% of small businesses experienced a data breach or attack over a period of 12 months.
Globally, the Verizon 2018 Data Breach Investigations Report (DBIR) found that 58% of all breach targets were small businesses. In the U.S., the problem is considered so severe that a new law (the NIST Small Business Cybersecurity Act) was enacted on 14 August 2018 to make the National Institute of Standards and Technology (NIST) provide easy-to-use cybersecurity resources specifically geared for small businesses.
Hackers want all of a company’s stored information. Credentials can be used against other accounts where the user might be re-using the same password; personal data can be collected, collated with other bits of personal data from elsewhere, and used for identity theft; and sensitive company data can be sold (especially if it is intellectual property) or used for extortion.
Loss of this data will lead to brand damage, lost business and remediation costs. If it involves the personal information of EU citizens, it could also lead to huge fines from the GDPR regulators.
The problem, however, is that no security expert will say you can guarantee to prevent all breaches. The consensus is that it is not a question of whether you will be breached, but when. Sooner or later, you will lose some or all of this sensitive data.
Since you cannot ultimately protect your network, the onus falls on protecting the data itself. Broadly speaking, this requires the application of hashing (for credential databases), anonymization (for personal information databases), or encryption (for all other sensitive company data).
Adequate use of these technologies will keep the data safe (mostly) if it is stolen, and will ensure the most lenient treatment from GDPR regulators following a breach.
Three technologies for protecting sensitive data on the server
Hashing is the application of a mathematical algorithm to a variable-length string of characters. The process has three primary characteristics. Firstly, the output from the algorithm is a standard fixed length regardless of the length of the input. Secondly, different inputs create different outputs (collisions are theoretically possible, but with a modern algorithm they are computationally infeasible to find). And thirdly, you cannot reverse the process to determine the input based on the output.
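The first two characteristics are easy to observe. The following is a minimal sketch using SHA-256 from Python's standard library purely as an illustration; the helper name sha256_hex is an assumption made for this example and is not part of any library.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of `text` as a hexadecimal string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


short_digest = sha256_hex("a")            # one-character input
long_digest = sha256_hex("a" * 10_000)    # ten-thousand-character input
other_digest = sha256_hex("b")            # a different one-character input

# Fixed length: both digests are 64 hex characters (256 bits).
print(len(short_digest), len(long_digest))
# Deterministic: hashing the same input again gives the same output.
print(sha256_hex("a") == short_digest)
# Different inputs give different outputs.
print(short_digest != other_digest)
```

The third characteristic, irreversibility, cannot be demonstrated in a few lines; it rests on the design of the algorithm.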
These characteristics make hashing technology ideal for handling user credentials. At registration, the user provides a password. This is hashed, and only the hash (output) is stored on the server. At subsequent logins, the user again provides the password. This is again hashed and compared to the hash output stored with that user name.
If it matches, the login succeeds. If it doesn’t match, the login fails. It is a method for ensuring that the user credentials remain private and are never stored in plain text on the server. If the system is breached and the credential database stolen, there is no way the hash can be reversed to discover the user’s actual password.
Password cracking involves guessing a password, running the same hash algorithm, and seeing if the output is the same output as that in the password database. This is why user passwords should be complex and unguessable – forcing the attacker to try every conceivable input to find the corresponding output.
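To see why guessable passwords are the weak point, consider this small sketch of the guessing process. It assumes, purely for illustration, a fast unsalted SHA-256 hash and a four-entry word list; the function names and sample passwords are invented for the example.

```python
import hashlib
from typing import Iterable, Optional


def fast_hash(password: str) -> str:
    """An unsalted, fast hash: the weakest way to store a password hash."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def guess_password(stolen_hash: str, candidates: Iterable[str]) -> Optional[str]:
    """Hash each candidate and return the one whose hash matches, if any."""
    for candidate in candidates:
        if fast_hash(candidate) == stolen_hash:
            return candidate
    return None


wordlist = ["123456", "password", "qwerty", "letmein"]  # illustrative guesses

weak_hash = fast_hash("letmein")             # a common, guessable password
strong_hash = fast_hash("v7#Lq9!zR2@tXw4p")  # a long, random password

print(guess_password(weak_hash, wordlist))    # found in the word list
print(guess_password(strong_hash, wordlist))  # not found: None
```

The hash of the common password falls to the first word list that contains it, while the random password forces the attacker back to exhaustive search.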
There are numerous hash algorithms. Which one to choose depends on the security needs of the data and the framework of the specific application. For Java applications, OWASP recommends the Argon2, PBKDF2, scrypt or bcrypt libraries. Also worth considering is the SHA (Secure Hash Algorithm) family, although its members are designed to be fast and so are better suited to password storage as a building block inside a construction such as PBKDF2 than on their own. More specific information on hashing can be found in OWASP’s cheat sheet on password protection and NIST’s Secure Hash Standard guidelines.
Two further options to consider are using a ‘slow’ algorithm and employing iterative hashing. The former adds a negligible delay at password registration and subsequent logins for individual users, but an excessive delay for a hacker attempting to crack a large number of stolen hashed passwords.
The latter, re-hashing the hashed output one or more times, has a similar but more extreme effect, and is consequently used less frequently. As always in such cases, you should be guided by risk analysis when choosing a method.
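The cost difference can be measured directly. The sketch below compares a single SHA-256 call with PBKDF2-HMAC-SHA256, which applies its underlying function repeatedly and so also serves as a simple illustration of iteration. The iteration count, the fixed demo salt and the time_call helper are assumptions made for this example; absolute timings will vary by machine.

```python
import hashlib
import time
from typing import Callable


def time_call(fn: Callable[[], object], repeats: int = 3) -> float:
    """Return the best wall-clock time in seconds over a few runs of fn()."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


password = b"example-password"
salt = b"fixed-demo-salt!"  # fixed only so the timing demo is repeatable

# One pass of a fast hash.
fast_seconds = time_call(lambda: hashlib.sha256(salt + password).digest())
# A deliberately slow derivation: 200,000 iterations (assumed value).
slow_seconds = time_call(
    lambda: hashlib.pbkdf2_hmac("sha256", password, salt, 200_000)
)

print(f"single SHA-256:   {fast_seconds:.6f} s")
print(f"PBKDF2, 200k its: {slow_seconds:.6f} s")
```

A fraction of a second is barely noticeable at login, but it is multiplied by every guess an attacker makes against every stolen hash.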
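Finally, hashed passwords should always be ‘salted’. A salt is a random value, unique to each user, that is combined with the password before hashing and stored alongside the resulting hash. It ensures that two users with the same password end up with different hashes, and it defeats precomputed lookup tables, forcing an attacker to crack each stolen hash separately. The salt itself does not need to be secret. If you want an additional secret ingredient, a separate value (often called a ‘pepper’) can be applied and stored away from the credential database, so that a hacker who steals the database alone cannot reproduce your hashing in his cracking attempts.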
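The effect of salting can be sketched in a few lines. This example assumes the common approach of a random per-user salt stored next to the hash; the function name, sample password and iteration count are illustrative only.

```python
import hashlib
import os


def hash_password(password: str, salt: bytes) -> bytes:
    """Hash a password together with the given salt (PBKDF2-HMAC-SHA256)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


# Two users who happen to choose the same password.
salt_a = os.urandom(16)
salt_b = os.urandom(16)
hash_a = hash_password("Summer2018", salt_a)
hash_b = hash_password("Summer2018", salt_b)

# The stored hashes differ, so cracking one does not reveal the other.
print(hash_a.hex())
print(hash_b.hex())

# With the stored salt, the legitimate server can still reproduce the hash.
print(hash_password("Summer2018", salt_a) == hash_a)
```

Because each hash depends on its own salt, a table of hashes computed in advance for common passwords is of no use against the stolen database.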
Encryption differs from hashing in one major way – with the correct decryption key, encrypted data can be returned to its original cleartext form. This makes it an ideal protection for documents containing sensitive company data still in use. The document can be encrypted for storage and decrypted for use.
The weak point for encryption is management of the decryption keys. If these can be accessed by an attacker, then the encryption offers no protection. Systems and methods for managing and protecting decryption keys are therefore essential. OWASP provides its own guidance on implementing encryption in web applications.
The advantage of encrypting sensitive data is that data protected by strong, well-implemented encryption cannot be read by criminals without the correct decryption key. As a result, data protection regulators often consider that stolen encrypted data is not necessarily lost data.
The big disadvantage of encryption is that encrypted data cannot be processed without first being decrypted. It is not useful for structured databases in constant use. For example, you cannot search for specific strings within the database because they no longer exist in that format.
This currently deters the use of encryption for such databases. It may change in the future with the evolution of an emerging type of encryption known as homomorphic encryption. This allows encrypted databases to be searched without requiring prior decryption. Such technologies exist, but are not yet mainstream.
Marketing is one area that suffers from this. Many companies maintain a database of customer or user information that can be used for marketing purposes. This is sensitive personal information often subject to data protection laws such as GDPR and HIPAA. But because the databases are in constant use, they cannot easily be encrypted. In such circumstances, the less secure option of data anonymization can be used.
At a basic level, anonymization ensures that stored personal data cannot be directly related to the individual concerned. The rest of the data remains in cleartext and can be processed. The theory is that if this data is stolen, it cannot be used by criminals because the individual concerned is unknown.
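As a rough illustration of the idea, the sketch below drops the fields that identify a person directly and coarsens two others, leaving the rest in cleartext for processing. The field names, the sample record and the generalization rules are all hypothetical, and a transformation this simple should not be assumed to meet any regulator's standard for anonymization.

```python
from typing import Any

# Hypothetical field names that identify a person directly.
DIRECT_IDENTIFIERS = {"name", "email", "phone"}


def age_band(age: int) -> str:
    """Generalize an exact age into a ten-year band, e.g. 34 -> '30-39'."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"


def anonymize(record: dict[str, Any]) -> dict[str, Any]:
    """Drop direct identifiers and coarsen fields that could narrow identity."""
    result = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "age" in result:
        result["age"] = age_band(result["age"])
    if "postcode" in result:
        # Keep only the first part of a UK-style postcode.
        result["postcode"] = result["postcode"].split()[0]
    return result


customer = {
    "name": "Jane Example",
    "email": "jane@example.com",
    "phone": "020 7946 0000",
    "age": 34,
    "postcode": "SW1A 1AA",
    "last_purchase": "garden furniture",
}

print(anonymize(customer))
```

The remaining fields can still be used for analysis, but whether their combination could single out an individual has to be assessed against the full data set.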
The advantage of anonymization is that it is a process accepted by data protection regulators. GDPR, for example, defines anonymous information as “information which does not relate to an identified or identifiable natural person or to personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable.” In such instances, GDPR simply does not apply.
Achieving an adequate level of anonymization, however, is not easy. The UK data protection regulator (the ICO) has provided a detailed code of practice: Anonymization: managing data protection risk.
Despite acceptance by data protection regulators, it is useful to note that security experts believe it is impossible to anonymize personal data so completely that de-anonymization cannot be achieved.
Finally, it is important to remember that while hashing and encryption are security features, anonymization does nothing for cybersecurity. Anonymization is primarily about fulfilling compliance requirements.
The three technologies of hashing, encryption and anonymization will go a long way to protecting you against OWASP’s #3 web application risk – that of sensitive data exposure.
But while the technologies are mature and well-understood, they are not always simple to implement effectively. Smaller companies with limited in-house expertise should always seek specialist professional advice.
There does, however, remain one problem – few companies know where all their sensitive data is stored. If you don’t know where it is, you cannot protect it. One starting point could be High-Tech Bridge’s Immuniweb Discovery service; once you have a concise, comprehensive list of all web applications and dependencies, you can start tackling the issues of compliance and sensitive data protection.