Bachelor Thesis: Cross-domain sentiment analytics using a Deep learning approach


As a result of increased internet use, huge volumes of text data are generated every day in the form of product reviews, blog entries, and similar content. Natural-language text carries the sentiments and opinions of its author, and machine learning can be used to extract those sentiments automatically. Sentiment analysis of product reviews is helpful for both customers and companies: customers can decide whether or not to purchase an item, while companies can understand customer behaviour by analysing purchasing patterns and feedback. This helps them implement marketing strategies that are aligned with customer satisfaction. Many researchers have successfully addressed the sentiment analysis of reviews using lexicon-based and machine learning approaches. However, only limited research has been done in the field of cross-domain sentiment analysis (CDSA). The main problem in CDSA is vocabulary mismatch, i.e., the same word having different meanings across different domains. Moreover, CDSA datasets are usually unlabelled. Labelling them manually is expensive and time-consuming, and therefore not feasible for human experts. In such cases, it is helpful to use a classifier that is trained on labelled data from one domain and can be transferred to unlabelled data from another. However, there is no reliable way of telling in advance how well the classifier generalizes to the target domain.

In this work, we first introduce the topic, state its motivation, and explain the challenges. We then describe the background information needed to understand the work. Afterwards, we analyse the related work in the field of CDSA. The concepts and designs used for the implementation are presented next. Finally, we compare the results obtained from our implementation not only across domains but also with similar work done by other researchers.

For this work, we have built a framework for cross-domain sentiment analysis, i.e., a framework that trains a classifier on one domain and tests it on another. Along with a deep learning algorithm (long short-term memory, LSTM), we have used supervised machine learning classifiers (logistic regression and support vector machine, SVM) and combined them into an ensemble classifier that predicts the polarity of product reviews as either positive or negative. We first preprocess the data and vectorize it to make it machine-readable. We then feed the data to the classifier, predict the polarity of the test data, and compare the predictions with the actual labels of the test data. The results are then evaluated and visualized. With our framework, we achieve 85% cross-domain accuracy. Although the state-of-the-art paper by Blitzer et al. [1] achieved up to 87% accuracy on the CDSA dataset, we achieve better results for some other cross-domain dataset combinations.
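The pipeline described above (preprocess, vectorize, train on a source domain, predict on a target domain, compare with the true labels) can be illustrated with a minimal, standard-library-only Python sketch. It is not the thesis implementation: the LSTM is left out, the toy reviews, the function names, the SGD training loop, and the ensembling rule (summing the decision scores of logistic regression and a linear SVM) are all assumptions made for illustration, since the abstract does not specify these details.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (minimal preprocessing)."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocab(texts):
    """Assign a feature index to every word seen in the source domain."""
    vocab = {}
    for text in texts:
        for token in tokenize(text):
            vocab.setdefault(token, len(vocab))
    return vocab


def vectorize(text, vocab):
    """Sparse bag-of-words vector {index: count}; out-of-vocabulary words are dropped."""
    counts = Counter(t for t in tokenize(text) if t in vocab)
    return {vocab[t]: c for t, c in counts.items()}


def score(w, x):
    """Linear decision score (no bias term, to keep the sketch short)."""
    return sum(w[i] * v for i, v in x.items())


def train_linear(X, y, n_features, loss, epochs=50, lr=0.1, l2=0.01):
    """Plain SGD on labels in {+1, -1}.

    loss="log" gives logistic regression, loss="hinge" a linear SVM.
    """
    w = [0.0] * n_features
    for _ in range(epochs):
        for x, label in zip(X, y):
            margin = label * score(w, x)
            if loss == "log":
                # Gradient of log(1 + exp(-margin)) with respect to the score.
                g = -label / (1.0 + math.exp(min(margin, 30.0)))
            else:
                # Subgradient of max(0, 1 - margin) with respect to the score.
                g = -label if margin < 1.0 else 0.0
            w = [wi * (1.0 - lr * l2) for wi in w]  # L2 weight decay
            for i, v in x.items():
                w[i] -= lr * g * v
    return w


def ensemble_predict(models, x):
    """Assumed ensembling rule: sum the decision scores, threshold at zero."""
    total = sum(score(w, x) for w in models)
    return 1 if total >= 0 else -1


def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    return sum(p == t for p, t in zip(predictions, labels)) / len(labels)


# Hypothetical toy data: source domain "books", target domain "kitchen".
source_texts = [
    "A great and wonderful story, I loved it",
    "Excellent writing, highly recommended",
    "Boring plot and terrible characters",
    "A waste of time, poorly written",
]
source_labels = [1, 1, -1, -1]
target_texts = [
    "Great blender, wonderful results",
    "Terrible toaster, total waste of money",
]
target_labels = [1, -1]

# Train both classifiers on the source domain only.
vocab = build_vocab(source_texts)
X_source = [vectorize(t, vocab) for t in source_texts]
models = [
    train_linear(X_source, source_labels, len(vocab), loss)
    for loss in ("log", "hinge")
]

# Evaluate the ensemble on the target domain.
X_target = [vectorize(t, vocab) for t in target_texts]
predictions = [ensemble_predict(models, x) for x in X_target]
print("cross-domain accuracy:", accuracy(predictions, target_labels))
```

Note that target-domain words such as "blender" or "toaster" never occur in the source vocabulary and are silently dropped by `vectorize`. The toy example only works because a few sentiment words are shared between the domains, which is a small-scale picture of the vocabulary mismatch problem.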

Keywords: Sentiment analysis, Opinion mining, Sentiment mining, Opinion extraction, Review mining, Emotion analysis mining, Cross-domain, Domain agnostic

Supervisor: Katerina Katsarou

Type: Bachelor Thesis

Duration: 4 months


TU Berlin - Service-centric Networking - TEL 19
Ernst-Reuter-Platz 7
10587 Berlin, Germany
Phone: +49 30 8353 58811
Fax: +49 30 8353 58409

Copyright TU Berlin 2008