Making a hash of it: The lowdown on Inland Revenue and your data

08:30 am on 16 September 2024

Phil Pennington, Reporter

phil.pennington@rnz.co.nz


Analysis - The encrypted details of hundreds of thousands of taxpayers are being given by Inland Revenue to Facebook, Instagram, LinkedIn and Google for targeted advertising - but how good is the encryption?

Inland Revenue and its controlling minister Simon Watts say it is a lock.

"This process is irreversible," Watts told RNZ on Monday.

Many others demur. They say it is easily reversible - and one software developer built us something to show how.

Overseas, the objectors include the US Federal Trade Commission and the European Data Protection Supervisor. Closer to home, various software consultants contacted RNZ after the story ran to express their concerns.

"I was absolutely stunned that this is happening and I think there is a very clear privacy breach going on here," one emailed.

"The IRD seem to be saying: We have a secure method (hashed data) to communicate with data-hungry multinational tech organisations who make money by building products based on the data they can collect about you. Clearly that's a bad argument," another said.

A third, ex-hacker Adam Boileau, was blunt about sharing details with organisations that already had billions of data points to start with.

"Using hashing or other data aggregation in this context is, sadly, just a technological sleight of hand trick to bamboozle," he said.

One of Inland Revenue's defences for choosing to use what was arguably the best-bang-for-buck approach to targeted advertising that tax money could buy - after all, papers show it only spent about $400,000 with Facebook this way in the last six years - was a technical one: "Hashing is a type of cryptographic security method that turns identifiers into randomised code and cannot be reversed so identities are protected," it said.

"For example, john.doe@ird.govt.nz may come out hashed as wLKziR/6RoXDv1MDaXLH1UNUC9nIVr97jrTnL4TcxsM=. Meta, for example, uses this hashed information and compares it to its own hashed information to build custom audiences."
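The matching step described above can be sketched roughly as follows. This is a minimal illustration, assuming SHA-256 over a trimmed, lower-cased email address with Base64 output (the format of Inland Revenue's example); the normalisation rule and the function names `hash_identifier` and `match_audience` are hypothetical, not the platforms' documented process.

```python
import base64
import hashlib


def hash_identifier(value: str) -> str:
    """Return the Base64-encoded SHA-256 digest of a normalised identifier."""
    normalised = value.strip().lower()  # assumed normalisation, for illustration
    digest = hashlib.sha256(normalised.encode("utf-8")).digest()
    return base64.b64encode(digest).decode("ascii")


def match_audience(uploaded_hashes, platform_identifiers):
    """Return the platform's own identifiers whose hashes appear in the upload."""
    uploaded = set(uploaded_hashes)
    return [ident for ident in platform_identifiers
            if hash_identifier(ident) in uploaded]


# Hypothetical data: the advertiser uploads hashes, the platform holds plain emails.
upload = [hash_identifier("john.doe@ird.govt.nz")]
platform_users = ["jane@example.com", "John.Doe@ird.govt.nz "]
print(match_audience(upload, platform_users))
```

The same input always produces the same hash, which is what makes matching possible. It also means a platform that already holds the address learns exactly which of its users were on the uploaded list.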

'Cannot be reversed' - really?

Maybe 15 years ago (or whenever exactly it was that Inland Revenue started down this route - online marketer Jack Yan said it was an early adopter), "cannot be reversed" carried weight.

But times, and tech, change.

After the RNZ story revealed the practice, one software developer set out, of his own volition, to do some tyre kicking.

"To prove how ineffective hashing is for anonymising a finite set of values, I created a simple programme that converts any hashed (encrypted) NZ landline [phone] number back to the original (unencrypted) number," he told RNZ.

How long does it take?

"0.15 seconds."

He called his programme the 'SHA256 Generator': SHA stands for Secure Hash Algorithm, and SHA256 is the algorithm Facebook uses (which was introduced along with three other hashing algorithms, globally, over 20 years ago).

Here is the consultant's DIY recipe for reversing irreversible hashing: First, generate a list of all possible phone numbers for each area code. "For example, for the South Island, 03 000 0000 to 03 999 9999."

Next, generate a SHA256 hash for each. The 'Generator' will do that for you, super quick.

Store it in a database. Then when a hash lands, and you think it might be a phone number, ask the database.
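The consultant's programme was not published, but the recipe can be sketched in a few lines. The following is an illustration only, assuming numbers are hashed as plain digit strings with no spaces and using an in-memory dictionary in place of a database; `build_lookup` and `sha256_hex` are hypothetical helper names.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of text as a hex string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def build_lookup(area_code: str, start: int, stop: int) -> dict:
    """Map hash -> phone number for every 7-digit number in [start, stop)."""
    table = {}
    for n in range(start, stop):
        number = f"{area_code}{n:07d}"  # assumed format: digits only, no spaces
        table[sha256_hex(number)] = number
    return table


# A small slice for demonstration; the full range would be 0 to 10_000_000.
lookup = build_lookup("03", 3_000_000, 3_100_000)

incoming = sha256_hex("033045678")  # stands in for a hash that "lands"
print(lookup.get(incoming))  # the original number, if it is inside the slice
```

Building the table is the only slow part, and it is done once. After that, each lookup is a single dictionary query, which is consistent with the sub-second figure the developer quoted.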

"This is a well known technique for attacking hashed values," the consultant said.

As you might imagine, someone streamlined and packaged up this approach, calling it 'the rainbow table'.

"A rainbow table attack is a password-cracking method that uses a special table (a "rainbow table") to crack the password hashes in a database," a tech website said.


Boileau, the technical editor at Risky.biz, does weekly podcasts on security news. He compared hashing to a meat grinder for lumps of data.

"You can't tell by looking at a hash-sausage which bits of the pig went in."

A cyber attacker who stole a file of passwords had to attempt to decode the hashes, "to put the sausage into the grinder, turn the handle backwards, and get a pig out", he said.

"Instead of this folly, what we - I spent 20 years as a professional hacker - do, is just hash every word in the dictionary and see if we get a match."
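What Boileau describes is commonly called a dictionary attack. A minimal sketch, assuming the stolen hashes are plain unsalted SHA-256 (real password stores often add a salt and use a deliberately slow hash, which this ignores); the word list and the function name `dictionary_attack` are made up for illustration.

```python
import hashlib


def dictionary_attack(target_hash: str, candidates):
    """Return the first candidate whose SHA-256 hex digest matches, else None."""
    for word in candidates:
        if hashlib.sha256(word.encode("utf-8")).hexdigest() == target_hash:
            return word
    return None


# Hypothetical word list and "stolen" hash.
wordlist = ["password", "letmein", "kiwi", "rugby"]
stolen = hashlib.sha256(b"kiwi").hexdigest()
print(dictionary_attack(stolen, wordlist))
```

Nothing is run backwards here: the attacker only ever hashes forwards and compares, which is why the "irreversible" property of the hash does not help when the inputs are guessable.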

Surely that takes ages?

No. "Using the power of modern 3D gaming graphics equipment, we can do this at speeds of hundreds of billions of words per second. The maths for both are basically the same."

If you know something about the nature of the data that has been hashed, then the reversing gets even easier. For instance, if it is likely info on gender, or dates of birth, or phone numbers, or credit cards, then simply computing a hash for every possible phone number or credit card "is trivial, mere seconds or minutes of compute".
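A rough back-of-envelope check of that claim, under stated assumptions: a rate of 100 billion hashes per second (the low end of Boileau's "hundreds of billions"), seven free digits per landline area code, 120 years of possible birth dates, and card numbers where a six-digit issuer prefix is already known and the last digit is a checksum, leaving nine digits to guess. The figures are illustrative, not measurements.

```python
HASHES_PER_SECOND = 100_000_000_000  # assumption: 100 billion hashes per second

# Number of candidate values an attacker would need to hash (all assumptions).
search_spaces = {
    "landline numbers in one area code": 10**7,
    "dates of birth over 120 years": 120 * 366,
    "card numbers with a known issuer prefix": 10**9,
}


def seconds_to_exhaust(size: int, rate: int = HASHES_PER_SECOND) -> float:
    """Time in seconds to hash every candidate in a search space of this size."""
    return size / rate


for label, size in search_spaces.items():
    print(f"{label}: {size:,} candidates, {seconds_to_exhaust(size):.7f} seconds")
```

On these assumptions every one of the three spaces is exhausted in well under a second; slower hardware stretches that into the seconds or minutes Boileau mentions without changing the conclusion.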

"Ultimately, there is no easy way to share data with someone or an organisation you don't trust, especially if that organisation already has billions of data points to start with," Boileau said.

"If they want to correlate or investigate to de-anonymise data, they can do so."

They can. But do Facebook and Google and LinkedIn want to? What is in it for them, if they already have your name, date of birth, address, phone, and email contact?

"Look at the kinds of adverts that are being posted, which are targeted at specific people by the IRD," Daniel Wilson, a lecturer in the school of computer science at Auckland University, said.

Inland Revenue said it targeted ads at people with an income tax debt due, or a GST debt due, or a student loan debt due, or needing a Working for Families update.

"What happens if the aim of IRD is successful and someone clicks on one of the IRD adverts dished up by Facebook?" Wilson said.

"Facebook, for instance, keeps track of your ad activity." (You can check that out by going to 'menu', then 'Recent Ad Activity'.)

"So if I click on the IRD 'sort out your income tax debt' advert, that is logged ... giving information to Meta that, for instance, I am likely to have an income tax debt is pretty sensitive stuff.

"This is in a different league from Meta knowing that I am a fan of, say, popular science books."

Inland Revenue offered other defences, including that this was both within the law and an effective way to get tax revenue back.

It also stressed that it trusted the tech companies to do the right thing, including deleting the taxpayer's info quickly after use.

Wilson said Inland Revenue might think deletion limited its responsibility.

"But in the broader system context, if IRD is successful in their aim of getting a client to click on a specific ad that indicates a particular tax liability, this information is logged and - in the current environment - is free to be used by social media companies for activities like training AI systems," he said.

"Social media organisations would not have been able to collect this specific kind of information without IRD's targeted advertising campaigns."

Where is the regulator in this?

The Office of the Privacy Commissioner has told RNZ it did not have a general position on hashing, but could look at developing one, if need be. The US Federal Trade Commission and European regulators saw the need years ago.

One emailer said the commissioner needed to find out more about Inland Revenue, and conjured the slippery slope. "Up until a few years ago there was a red line that health data should not go offshore. That has been gradually whittled down."

Another said that Inland Revenue might be sailing close to the wind. They discussed how Google AdWords' Customer Match feature allowed a customer - like Inland Revenue - to upload a list of details to Google to target individuals directly with advertising.

One of the terms of service conditions was that the advertiser had to have a privacy policy which allowed them to share customer data with advertisers and third parties.

The emailer said they did not believe that Inland Revenue had acquired "the knowing, uncoerced consent for this usage of my private information" as required under the Privacy Act.

Inland Revenue defended hashing - then, after the RNZ story ran, said it would take another look "to ensure it is still safe to use".

But when was it last safe?
