Master Thesis: A Hybrid Recommender System for Documents Based on Enterprise Knowledge Graphs
This thesis proposes a concept for a hybrid recommender system (RS) for documents that combines content and graph signals. The concept covers both personalized and impersonalized recommendations in an enterprise context and builds on an enterprise knowledge graph. The proposed RS is a weighted similarity ensemble that takes one or more pretrained models as input and outputs the top l recommended documents. It follows a parallelized voting-scheme hybridization with optional reranking and dithering. Furthermore, it is parameterizable, so the input models, the voting and reranking strategies, the dithering, and the number l of returned documents can all be changed.
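The weighted ensemble described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the thesis implementation: the function names, the example scores, and the weights are hypothetical, the voting strategy shown is a plain weighted sum, the dithering uses one common formulation (re-sorting by log rank plus Gaussian noise), and reranking is left out.

```python
import math
import random
from collections import defaultdict


def weighted_vote(model_scores, weights):
    """Sum each model's similarity scores, scaled by that model's weight."""
    combined = defaultdict(float)
    for model, scores in model_scores.items():
        weight = weights.get(model, 0.0)  # models without a weight are ignored
        for doc, score in scores.items():
            combined[doc] += weight * score
    return dict(combined)


def dither(ranked, strength, rng):
    """Perturb a ranking: re-sort by log(rank) plus Gaussian noise (assumed scheme)."""
    if strength <= 0:
        return list(ranked)  # dithering disabled
    noisy = [(math.log(rank) + rng.gauss(0.0, strength), doc)
             for rank, doc in enumerate(ranked, start=1)]
    return [doc for _, doc in sorted(noisy)]


def recommend(model_scores, weights, top_l, dither_strength=0.0, seed=None):
    """Return the top-l documents of the weighted ensemble."""
    combined = weighted_vote(model_scores, weights)
    ranked = sorted(combined, key=combined.get, reverse=True)
    ranked = dither(ranked, dither_strength, random.Random(seed))
    return ranked[:top_l]


# Hypothetical similarity scores of three candidate blogs for one open blog.
scores = {
    "tfidf":   {"blog_a": 0.9, "blog_b": 0.2, "blog_c": 0.5},
    "doc2vec": {"blog_a": 0.1, "blog_b": 0.8, "blog_c": 0.5},
}
weights = {"tfidf": 0.7, "doc2vec": 0.3}

print(recommend(scores, weights, top_l=2))  # ['blog_a', 'blog_c']
```

Because the models, their weights, `top_l`, and `dither_strength` are all plain arguments, an operator could change any of them between calls, which mirrors the parameterizable design the abstract describes.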
The proposed concept for impersonalized recommendations is prototyped with four implementations of voting and reranking strategies as well as dithering. The prototype implements the proposed modularity and parametrization, so that at run time an RS operator can change the input models and their weights, switch voting and reranking strategies, adjust the dithering strength, and enable or disable reranking and dithering altogether. Furthermore, the RS operator can implement and add further strategies. The prototype is tested on an SAP Community Network (SCN) knowledge graph. The SCN is a platform where users can write blogs, ask questions, make comments, and follow blogs, users, and topics. The SCN knowledge graph contains data from more than 20 different systems, modeled with three vertex types: User, Content, and Term. The knowledge graph includes 13,550,980 vertices and 29,719,762 edges. For testing purposes, the following two state-of-the-art content-based models and three graph-based models were trained. TF-IDF finds similar documents based on the weights of words shared between two documents. Doc2Vec is a shallow two-layer neural network that captures semantic similarity between documents. The graph models include the graph-embedding model node2vec and the heterogeneous information network model PathSim, which utilizes the full power of knowledge graphs via meta-paths. Additionally, an already trained content-based model, FastText, was plugged into the RS to test the ability to parametrize the input models.
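To make the meta-path idea concrete, the following sketch computes PathSim scores along a symmetric Content-Term-Content meta-path. The standard PathSim definition is used: the number of path instances between two documents, doubled and divided by the sum of each document's path instances to itself. The choice of this meta-path and the tiny adjacency matrix are assumptions for illustration; the abstract does not state which meta-paths the thesis actually used.

```python
def commuting_matrix(adj):
    """M = A * A^T: counts Content-Term-Content path instances between documents."""
    n = len(adj)
    return [[sum(a * b for a, b in zip(adj[i], adj[j])) for j in range(n)]
            for i in range(n)]


def pathsim(adj):
    """PathSim(x, y) = 2 * M[x][y] / (M[x][x] + M[y][y]) for a symmetric meta-path."""
    m = commuting_matrix(adj)
    n = len(m)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            denom = m[i][i] + m[j][j]
            sim[i][j] = 2 * m[i][j] / denom if denom else 0.0
    return sim


# Hypothetical Content-Term adjacency: rows are documents, columns are terms.
content_term = [
    [1, 1, 0],  # doc 0 is linked to terms 0 and 1
    [1, 1, 0],  # doc 1 is linked to the same terms as doc 0
    [0, 1, 1],  # doc 2 shares only term 1 with the others
]

sim = pathsim(content_term)
print(sim[0][1], sim[0][2])  # 1.0 0.5
```

Documents 0 and 1 share all of their terms and score 1.0, while document 2 shares one of its two terms with document 0 and scores 0.5.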
To test the hybrid RS for impersonalized recommendations of SCN documents, a quantitative ensemble evaluation was performed on 20 different RS ensemble configurations. For this purpose, a browser extension was developed that injects five recommendations for the blog currently open on the SCN platform. The extension randomly chooses five ensemble configurations and injects one recommendation per ensemble. With the help of the extension, users could select the presented recommendations that they would explore further. The evaluation of the user selections shows that 863 blogs were read and 1603 recommendations were selected in total, which represents a click-through rate of 37.15%. Furthermore, the results show that the content-based models dominate, for two reasons. First, because transactional (who-viewed-what) data is lacking, PathSim and node2vec achieve low coverage. Second, content-based recommendations are easier to justify: it is clear to users that they are based on similar topics or content. Besides the quantitative ensemble evaluation, qualitative user feedback was collected. The feedback stresses the importance of the UX and of justifications, which might have a more significant impact on the user's impression of RS quality and on RS acceptance than the quality of the recommendations themselves. Furthermore, the users implicitly requested personalized recommendations, since the SCN covers so many specialized topics that it is difficult for a user to relate to many of them.
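The reported click-through rate can be reproduced under one assumption that the abstract does not state explicitly: every blog read displayed exactly five injected recommendations, so 863 reads correspond to 4315 impressions. A minimal check (the function and variable names are illustrative):

```python
def click_through_rate(clicks, impressions):
    """Return the click-through rate as a percentage."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return 100.0 * clicks / impressions


blogs_read = 863      # reported in the evaluation
recs_per_blog = 5     # assumption: five recommendations shown per blog read
selected = 1603       # reported in the evaluation

impressions = blogs_read * recs_per_blog
ctr = click_through_rate(selected, impressions)
print(f"{impressions} impressions, CTR = {ctr:.2f}%")  # 4315 impressions, CTR = 37.15%
```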
Supervisor: Dr. Peter Ruppel
Type: Master Thesis
Duration: 6 months