Master Thesis: Injecting differential Privacy to the word Movers distance for user regulated Data Privacy
Finding similarities between users based on their profiles is very common, especially in social networking services and e-commerce applications, which recommend similar products to similar users. User similarity has a wide range of applications; for example, social networking services connect people with new friends based on profile data and common interests. Personal text messages are a good candidate for finding similarities between users. However, text messages carry high privacy risks because their contents are normally very sensitive and personal.

In this thesis, we propose different techniques to obfuscate an author's information. Word Mover's Distance is used to measure the similarity between users, and we inject noise into it. If two users want to find the similarity between them, they need to share their word embedding vectors. In this way, neither user learns the exact words used by the other, as the text is already converted into vectors and the embedding is secret.

We introduce six different techniques to inject noise into the original data. First, we create a noise collection of unrelated random text. Then we choose N random words from the noise collection and inject them into the original data. The word-level approaches are: adding these random words to the original text without changing it, deleting N random words from the original text, and replacing N random words in the text with N random words from the noise collection. Here N is a user-regulated variable ranging from 0 to 15. Similarly, we create random noise vectors whose values lie in [-N, N] and add them to the word vectors and to the weight vectors. Here N takes the values [-0.25, -0.2, ..., 0.2, 0.25].

We applied the six techniques to the 20newsgroups data and compared the results by calculating the accuracy before and after noise injection. Deleting words, replacing words, and adding noise vectors to the weights led to drops in accuracy but preserved more privacy, whereas adding words and adding noise vectors to the word vectors had less impact on accuracy but were also less private.
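The noise injection steps described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names, the tokenized-list input, the uniform noise distribution, and the use of the absolute value of N as the noise bound are all assumptions made for illustration. The abstract does not say how weights are treated after noise is added (for example, whether they are renormalized), so the sketch leaves them as-is.

```python
import random

# Hypothetical helpers; names and signatures are illustrative assumptions,
# not taken from the thesis.

def add_words(tokens, noise_pool, n, rng):
    """Insert n random noise words at random positions; original words are kept."""
    out = list(tokens)
    for word in rng.choices(noise_pool, k=n):
        out.insert(rng.randrange(len(out) + 1), word)
    return out

def delete_words(tokens, n, rng):
    """Drop n randomly chosen words (all of them if the text has fewer than n)."""
    drop = set(rng.sample(range(len(tokens)), min(n, len(tokens))))
    return [t for i, t in enumerate(tokens) if i not in drop]

def replace_words(tokens, noise_pool, n, rng):
    """Overwrite n randomly chosen positions with random noise words."""
    out = list(tokens)
    for i in rng.sample(range(len(out)), min(n, len(out))):
        out[i] = rng.choice(noise_pool)
    return out

def perturb_vectors(vectors, n, rng):
    """Add noise to every component of every vector.

    Assumption: noise is drawn independently and uniformly from [-|n|, |n|].
    The same helper can be applied to word vectors or to weight vectors.
    """
    bound = abs(n)
    return [[x + rng.uniform(-bound, bound) for x in vec] for vec in vectors]

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only to make the demo repeatable
    text = "the quick brown fox jumps over the lazy dog".split()
    noise_pool = ["piano", "harbor", "violet", "cement"]  # stand-in noise collection
    print(add_words(text, noise_pool, 3, rng))
    print(delete_words(text, 3, rng))
    print(replace_words(text, noise_pool, 3, rng))
    print(perturb_vectors([[0.1, -0.2], [0.3, 0.0]], 0.25, rng))
```

Setting N to 0 leaves the text or vectors unchanged in every helper, which matches the idea of N as a user-regulated privacy knob. The perturbed texts or vectors would then be passed to a Word Mover's Distance computation, which is outside the scope of this sketch.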
Supervisor: Tobias Eichinger
Type: Master Thesis
Duration: 6 months