TU Berlin

Service-centric Networking

Rieder, W. (2020). On The Usefulness of HTTP Responses to Identify Differences Between Non- And Web Trackers. Bachelor Thesis, Technische Universität Berlin.

Bachelor Thesis: On The Usefulness of HTTP Responses to Identify Differences Between Non- And Web Trackers

Title:

On The Usefulness of HTTP Responses to Identify Differences Between Non- And Web Trackers

Description:

Web tracking still poses a serious threat to the privacy and security of users. A variety of approaches have been taken to tackle this problem, including predefined blacklists and machine learning for automated web tracker detection. Both approaches have in common that the detection and blocking of web trackers is partially based on extensive research of HTTP traffic, especially HTTP requests. However, observing mostly HTTP requests might not provide a complete picture of how potential third-party trackers operate, or even of whether they can rightfully be classified as such. Therefore, this thesis aims to better understand HTTP responses and their usefulness in the context of web tracking.

First, a browser extension recorded HTTP response headers and additional metadata. Second, a JavaScript program, complemented by a command-line interface, processed and analyzed the data sets, which had an overall population of N = 21 838. Finally, a feature set containing ten features based on HTTP headers was derived, and prototype classification systems were built. The whole process was backed by statistical analysis and several metrics computed in R.
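The step of deriving features from recorded response headers can be illustrated with a minimal sketch. It is written in Python for brevity, whereas the thesis used a JavaScript program. The ten features of the thesis are not listed on this page, so the four features below (`has_set_cookie`, `has_etag`, `header_count`, `content_length`) and the function name `extract_features` are purely illustrative assumptions, not the author's feature set.

```python
from typing import Dict


def extract_features(headers: Dict[str, str]) -> Dict[str, float]:
    """Map the headers of one HTTP response to a small feature vector.

    The chosen features are hypothetical examples, not the thesis's features.
    """
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}

    # Content-Length may be missing or malformed; fall back to 0 in that case.
    raw_length = normalized.get("content-length", "").strip()
    content_length = float(raw_length) if raw_length.isdigit() else 0.0

    return {
        "has_set_cookie": float("set-cookie" in normalized),
        "has_etag": float("etag" in normalized),
        "header_count": float(len(normalized)),
        "content_length": content_length,
    }


# A made-up response resembling a small tracking pixel.
example_headers = {
    "Content-Type": "image/gif",
    "Content-Length": "43",
    "Set-Cookie": "id=abc123; Max-Age=31536000",
}
features = extract_features(example_headers)
print(features)
```

Each recorded response would yield one such vector, which could then be labeled as tracker or non-tracker according to the chosen ground truth and passed to a classifier.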

The analysis showed several significant differences and similarities between the HTTP header names and values of non-trackers and web trackers. Statistically significant associations between multiple HTTP headers and trackers were found, substantiated by effect size. Furthermore, the feature extraction process identified four new features that had not been considered in previous research on classifiers based on HTTP headers. The other six either confirmed previously identified features or opened up a new perspective on the composition of these features. Prototype classification systems, built to evaluate the effectiveness of the feature set and to give an approximate estimate of how well such a system would perform, achieved fairly good accuracy values of 0.7470 and 0.7536.

This thesis postulates that HTTP responses are indeed useful for differentiating between non-trackers and web trackers and as additional features for automated web tracker detection. However, the results should be viewed critically: the population size is small compared to past research, first- and third-party trackers were not considered separately, and a different ground truth might yield different findings. Further assessment and confirmation are needed to test the reliability and validity of the presented results.
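To make the idea of a statistically significant association "substantiated by effect size" concrete, the following sketch computes a Pearson chi-square test and the phi coefficient for a 2x2 table of header presence versus tracker status. The page does not state which tests or effect-size measures the thesis used in R, so the choice of chi-square and phi is an assumption, and the counts are invented for illustration only. They are not results from the thesis.

```python
import math
from typing import Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float, float]:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (chi2, p_value, phi), where phi is the effect size for a 2x2 table.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one observation")

    # Closed-form chi-square statistic for a 2x2 contingency table.
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With one degree of freedom, the upper tail equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    # Phi rescales chi-square by the sample size, giving a value in [0, 1].
    phi = math.sqrt(chi2 / n)
    return chi2, p_value, phi


# Hypothetical counts: rows = tracker / non-tracker,
# columns = header present / header absent.
chi2, p_value, phi = chi_square_2x2(30, 10, 20, 40)
print(f"chi2={chi2:.3f}, p={p_value:.5f}, phi={phi:.3f}")
```

Reporting phi alongside the p-value matters because, with tens of thousands of responses, even a very weak association can reach statistical significance.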

Supervisor: Philip Raschke

Type: Bachelor Thesis

Duration: 4 months
