Inhalt des Dokuments
Master Thesis: Efficient and Privacy-aware Similarity Estimation in D2D Scenarios
Efficient and Privacy-aware Similarity Estimation in D2D Scenarios
Centralized Online Social Network (OSN) platforms suffer from major privacy concerns of users who have to entrust a lot of private data to one single service provider. One advantage of such a centralized architecture is that it is easy for the service provider to conduct profile matching and suggest similar users as new friends. In peer-to-peer based Online Social Networks, privacy concerns are tackled.
Recent developments in sensor technologies and web services make the smartphone the optimal social networking device: it typically has only one user, is highly personalized, and contains location and other context data about the user. Additionally to explicitly expressed friendship connections with other users, similar users – in terms of some profile feature, e.g., taste in music – can be found and connected to in a different layer of the social graph. To address privacy concerns, utilizing P2P technology for a distributed OSN, the social graph – indicating the connections between users – can be stored in a distributed manner, storing information about A’s contacts on A’s device. In order to extend the graph, new users that are similar should be added to A’s connections.
In this work, the idea for a peer-to-peer OSN that utilizes users’ smartphones as nodes that automatically collect location and context data to form a profile about its user, is followed. In general, there are the problems of efficiency and privacy-awareness. In order to connect to new users, generated profiles should be compared with other users. If the comparison indicates that two users are similar with respect to some profile feature, an edge in the social graph should be created. In order to protect the users’ privacy, profile (features) should not be compared in clear text. Furthermore, if ad-hoc connections between smartphones are used, bandwidth constraints have to be considered.
In the described scenario, this thesis tackles questions of similarity estimation in Device-to-Device (D2D) scenarios. Probabilistic data structures like Bloom Filters, Count-Min Sketch, or Stream-Summary require very low data for storage. On the other hand, as these data structures are probabilistic, they are somewhat inaccurate regarding what they represent. This inaccuracy can be used to tackle the problem of privacy. Utilizing such data structures, in this thesis, it should be investigated, to what extend the similarity between users can be estimated and what similarity measures are appropriate. Regarding privacy or anonymity, concepts like k-anonymity or Differential Privacy, or an iterative similarity estimation approach, could be viable options.
For this thesis, we provide a sample data set, based on the Million Song Dataset (http://labrosa.ee.columbia.edu/millionsong/ ). This data contains information about what songs users listened to how many times, including a user ID, song title, artist name, album name, and genre.
The goal of this thesis is to develop a framework that facilitates similarity estimation on online social network profile data. This includes the following tasks:
- Provide an overview on the state of the art of appropriate data structures for the given scenario.
- Provide an overview on the state of the art of similarity measures.
- Design a framework that can select and execute the relevant similarity estimation.
- Prototypical implementation of the framework.
- Evaluation of the system’s performance with regard to resource efficiency and privacy-awareness, based on the given data set.
Supervisors: Felix Beierle, Kai
Type: Master Thesis
Duration: 6 months
10587 Berlin, Germany
Phone: +49 30 8353 58811
Fax: +49 30 8353 58409
e-mail query