Questions tagged [anonymization]

15 questions
45
votes
6 answers

How can I transform names in a confidential data set to make it anonymous, but preserve some of the characteristics of the names?

Motivation: I work with datasets that contain personally identifiable information (PII) and sometimes need to share part of a dataset with third parties in a way that doesn't expose PII or subject my employer to liability. Our usual approach here…
Air
  • 832
  • 9
  • 20
15
votes
5 answers

How can I ensure anonymity with queries to small datasets?

I'm building a service that will contain personal data relating to real people. Initially the dataset will be quite small, and as such it may be possible to identify individuals if the search parameters are narrowed sufficiently. An example of a…
mal
  • 253
  • 1
  • 6
8
votes
2 answers

Data anonymization in Python

I am working on an industrial project that uses real data. The data contains sensitive information about company operations that cannot be disclosed publicly. As a result, I need to anonymize the original data first before…
Muhammad Ali
  • 2,509
  • 5
  • 21
  • 22
7
votes
1 answer

How do we obfuscate or de-identify data to make it anonymous and share it publicly?

Right now, I am working on preparing a small dataset for release to the public by getting rid of sensitive information. While working on it, I wondered... what are the best practices for dealing with private or sensitive polynomial attributes in a…
mlane
  • 173
  • 4
6
votes
2 answers

Name Anonymization Software

Although I have seen a few good questions asked about data anonymization, I was wondering if there were answers to this more specific variant. I am seeking a tool (or to design one) that will anonymize human names from a specific country:…
Stumbler
  • 163
  • 5
4
votes
1 answer

How to release datasets with fingerprinting

I intend to monetise some large datasets. These datasets are anonymised and released to (paying) clients via a web API. Are there any standard algorithms such that if the datasets are intentionally leaked publicly, the data can be altered such…
DataAnon
  • 43
  • 2
2
votes
1 answer

How do you choose an appropriate $k$ to achieve $k$-anonymity for data?

How do you choose an appropriate $k$ to achieve $k$-anonymity for data? What methods exist that are agnostic to the business context for the problem?
kevins_1
  • 737
  • 8
  • 11
2
votes
1 answer

Does data anonymization conflict with GDPR rules?

There are GDPR articles that relate to a person's ownership of their data e.g., Art. 17 GDPR Right to erasure (‘right to be forgotten’) and Art. 20 GDPR Right to data portability. In case one would anonymize the data without a way to "restore" the…
thinwybk
  • 203
  • 1
  • 2
  • 8
1
vote
5 answers

How to protect data from internal data scientists?

In our company we want to protect data privacy internally. Meaning, we want to find a way to anonymize the data so the data science team members cannot expose it and yet still can use it for modelling. I googled and read about Pseudonymization. But…
Ahmedn1
  • 131
  • 2
1
vote
2 answers

How to evaluate k-anonymity for a dataset which is only a sample/subset

I work with a trajectory dataset which holds records from people using a certain ticketing app (for public transportation). The trajectory describes the route (i.e. an array of stations) of buses, trains, etc. But, there is only a small set of…
1
vote
0 answers

Evaluation of the preprocessing to make a dataset anonymous

I have a very large dataset from the NLP area and I want to make it anonymous. Is there any way to check if my pre-processing is correct? Generally, is there any way to evaluate how good the pre-processing is for anonymity? I want to mention…
0
votes
1 answer

How to identify a field as holding personally identifiable information from the name of the field itself using an ML model in Python?

Is it possible to automatically detect fields holding personal information (name, phone, address, SSN, passport, gov ID...) from their names, using Python, in order to upload datasets into the cloud after encrypting or anonymizing the PII fields? I am…
alim1990
  • 173
  • 1
  • 8
0
votes
1 answer

How to write custom de-identification algorithm in Python?

I have tried a simple algorithm to anonymize my data using the de-identification technique. But the code doesn't work for me. I want to anonymize the data by slightly changing the values. The data sample is available here import pandas as pd import…
Muhammad Ali
  • 2,509
  • 5
  • 21
  • 22
0
votes
2 answers

Anonymizing data

In https://www.kaggle.com/c/santander-product-recommendation/data it mentions that "Please note: This sample does not include any real Santander Spain customers, and thus it is not representative of Spain's customer base." What are the ways where…
william007
  • 775
  • 1
  • 10
  • 21
0
votes
1 answer

Data Anonymization for all domains?

I am using a dataset from the marketing and sales department. The dataset contains customer name (company name), company address, pincode, number of orders placed, revenue generated from that customer, etc. My question is whether I should hide/mask/anonymize…
The Great
  • 2,725
  • 3
  • 23
  • 49