Master Thesis: Predicting the next App based on Smartphone Data
The supply of smartphones has been growing rapidly, and smartphones have become commonplace in everyday life. Statistics indicate that people use smartphones more and more, and numerous applications are produced and used. At the same time, there is interest in improving the experience around application use. In particular, attention has focused on practical methods for predicting the next application, which can benefit three areas: (a) offering the user the next application on the screen with a simple interaction; (b) pre-loading the application or keeping it in memory; (c) optimizing the network environment for the applications in use. Fundamental to all three is an appropriate way to predict the application that a user is likely to use next. Hence, this thesis addresses the problem by building a prediction model. Early research on application usage prediction relied on contextual information, whereas more recent work has concentrated on building models that predict which applications will be used next. Nevertheless, most of these models suffered from limited prediction accuracy because of the limited amount or diversity of their data. Moreover, the studies did not give enough information about how their models perform for new users who did not take part in the training phase. To apply a model in actual circumstances, it is important to build it under conditions similar to the real world.
Therefore, this thesis analyzes the TYDR dataset and builds a prediction model based on LSTM networks, using a dataset of sufficient size and variety: about 22 million application usage records from 975 users and 19,485 applications, gathered over nine months. The data analysis shows that application usage history is the crucial feature of the dataset and that users exhibit patterns that depend on temporal elements. We implement prediction models based on LSTM networks, which have not been tried in other studies and are particularly well suited to this problem because they process time-ordered application usage data. The experiments try several time units (5 minutes, 1 hour, 12 hours, 1 day, and 1 week) for composing the input sequences for the LSTM networks. The best result is a Recall@8 score of about 76.3%, achieved by the bi-directional LSTM model with a 1-hour time unit for the input sequence. When we evaluate the model under conditions comparable to other studies (fewer applications, testing with users already present in the training dataset), it reaches a Recall@8 of about 90%. Furthermore, one of our models, the encoder-decoder LSTM model, can predict a sequence of the next applications over ten time steps with a Recall@8 of about 60% to 70%, which has not been investigated by other researchers.
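To make the reported metric concrete, the following is a minimal sketch of how a Recall@k score is commonly computed for next-item prediction: the share of test cases in which the application actually used next appears among the model's k highest-ranked candidates. The abstract does not spell out the exact definition used in the thesis, so this single-target form is an assumption, and the function name `recall_at_k` and the app names are hypothetical.

```python
from typing import Sequence


def recall_at_k(
    ranked_predictions: Sequence[Sequence[str]],
    actual_next: Sequence[str],
    k: int = 8,
) -> float:
    """Fraction of test cases whose true next app is among the top-k candidates.

    Assumption: each test case has exactly one true next application, and
    each prediction list is already sorted from most to least likely.
    """
    if len(ranked_predictions) != len(actual_next):
        raise ValueError("need one ranked prediction list per test case")
    if not actual_next:
        return 0.0
    hits = sum(
        1
        for ranked, truth in zip(ranked_predictions, actual_next)
        if truth in ranked[:k]
    )
    return hits / len(actual_next)


# Hypothetical toy data: four test cases with three ranked candidates each.
ranked = [
    ["mail", "chat", "maps"],
    ["chat", "camera", "mail"],
    ["maps", "browser", "chat"],
    ["camera", "mail", "maps"],
]
actual = ["chat", "mail", "music", "camera"]

print(recall_at_k(ranked, actual, k=2))  # 0.5
print(recall_at_k(ranked, actual, k=3))  # 0.75
```

With k = 8, a score of about 76.3% would mean that in roughly three out of four test cases the application used next is among the eight suggestions.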
Supervisor: Katerina Katsarou 
Type: Master Thesis
Duration: 6 months