Master Thesis: Cardinality Estimation for Spatio-Temporal Data Streams
With advances in mobile communication systems, huge amounts of data are generated continuously, and analyzing such large datasets has become a common task. This data also contains spatio-temporal information about users, which has been a driving force behind the growth of location analytics. One of the most basic problems in location analytics is finding the number of unique users in an area within a recent time window. This information is especially important for telecommunication and traffic management companies, which need it to deploy resources appropriately. When the only requirement is the number of unique users in an area, storing every received record individually is inefficient. In addition, companies must protect user privacy, both because of legal regulations and because of users' concerns about possible breaches. Processing this volume of data requires substantial memory and also poses a computational challenge. In such situations, the traditional approach of storing all user data and running complex computations over it is not an optimal solution when only a count of unique users is needed.
Probabilistic data structures are data structures that give an answer that holds with some probability rather than with certainty. One class of probabilistic data structures estimates cardinality, that is, the number of distinct elements in a dataset. This master thesis introduces a new approach that applies cardinality estimation data structures to spatio-temporal data streams in order to find the number of unique users in an area within a recent time window. Instead of an exact count, cardinality estimation algorithms return an estimate of the cardinality, but they dramatically reduce the computational and memory resources required. The inaccuracy these algorithms introduce is controlled. The thesis presents a detailed discussion of cardinality estimation algorithms and spatio-temporal data streams. We also designed and implemented a scalable framework that supports cardinality estimation for sliding-window-based spatio-temporal data streams.
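To make the idea concrete, the following is a minimal sketch of sliding-window cardinality estimation per spatial cell. It is not the framework developed in the thesis, and the abstract does not say which estimation algorithm the thesis uses. The sketch assumes a K-Minimum-Values (KMV) estimator, a fixed latitude/longitude grid, and fixed-length time buckets. All names (`KMVSketch`, `SlidingWindowCounter`, `cell_deg`, `bucket_seconds`) are hypothetical and chosen only for illustration.

```python
import hashlib
import heapq
import math
from collections import defaultdict


def hash01(item: str) -> float:
    """Map an item to a pseudo-uniform value in [0, 1) using SHA-1."""
    digest = hashlib.sha1(item.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


class KMVSketch:
    """K-Minimum-Values sketch: keeps only the k smallest hash values seen."""

    def __init__(self, k: int = 256):
        self.k = k
        self.values = set()  # at most k distinct hash values

    def add(self, item: str) -> None:
        self.values.add(hash01(item))
        if len(self.values) > self.k:
            self.values.remove(max(self.values))  # drop the largest value

    def merge(self, other: "KMVSketch") -> "KMVSketch":
        """Sketch of the union: the k smallest values of both sketches."""
        merged = KMVSketch(self.k)
        merged.values = set(heapq.nsmallest(self.k, self.values | other.values))
        return merged

    def estimate(self) -> float:
        if len(self.values) < self.k:
            return float(len(self.values))  # sketch not full: count directly
        return (self.k - 1) / max(self.values)  # standard KMV estimator


class SlidingWindowCounter:
    """One sketch per (grid cell, time bucket); a query merges recent buckets.

    Assumption: the window is approximated by whole time buckets.
    """

    def __init__(self, window_buckets=6, bucket_seconds=600, cell_deg=0.01, k=256):
        self.window_buckets = window_buckets
        self.bucket_seconds = bucket_seconds
        self.cell_deg = cell_deg
        self.k = k
        self.buckets = defaultdict(dict)  # cell -> {bucket index: KMVSketch}

    def _cell(self, lat: float, lon: float):
        return (math.floor(lat / self.cell_deg), math.floor(lon / self.cell_deg))

    def observe(self, user_id: str, lat: float, lon: float, ts: float) -> None:
        bucket = int(ts // self.bucket_seconds)
        per_bucket = self.buckets[self._cell(lat, lon)]
        per_bucket.setdefault(bucket, KMVSketch(self.k)).add(user_id)

    def unique_users(self, lat: float, lon: float, now: float) -> float:
        newest = int(now // self.bucket_seconds)
        oldest = newest - self.window_buckets + 1
        per_bucket = self.buckets.get(self._cell(lat, lon), {})
        for b in [b for b in per_bucket if b < oldest]:
            del per_bucket[b]  # expire buckets that left the window
        merged = KMVSketch(self.k)
        for b, sketch in per_bucket.items():
            if b <= newest:
                merged = merged.merge(sketch)
        return merged.estimate()
```

In this sketch only hash values are retained, never the raw user identifiers, and memory per cell is bounded by `k` values per time bucket regardless of how many users pass through. The parameter `k` trades memory for accuracy: for KMV the relative standard error is roughly 1/sqrt(k).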
The evaluation of our framework with real user data and artificially generated data shows that our system is a viable alternative to traditional systems in terms of the cost such systems incur. For scenarios in which a small compromise on accuracy does not affect business decisions, this framework is well suited.
Supervisor: Peter Ruppel
Type: Master Thesis
Duration: 6 months