Stock Return Prediction with Sentiment Analysis of Twitter Data

2022

We attempt to improve the accuracy of stock return prediction through sentiment analysis of Twitter data. Our hypothesis is that Twitter users consist mainly of retail investors, implying that aggregated sentiment will have the greatest influence on stocks with lower levels of institutional ownership.

Our analysis involves three sentiment approaches. The first labels tweets based on the magnitude and direction of changes in the stock's price. The second is a manual labelling approach, in which the authors went through the tweets and classified each one as having positive, negative or neutral sentiment. The last uses a dictionary created from financial tweets. For the first two approaches, we utilised three text classification methods: Naïve Bayes, Logistic Regression and Support Vector Machines (SVM).
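The three labelling approaches can be illustrated with a small, self-contained Python sketch. It is not the thesis code: the 1% neutral threshold, the next-day return horizon, the whitespace tokenizer, the mini-lexicon and the `$xyz` ticker are all hypothetical. Only Naïve Bayes is written out by hand; Logistic Regression and SVM would in practice typically come from a library such as scikit-learn (`LogisticRegression`, `LinearSVC`).

```python
import math
from collections import Counter, defaultdict

# Assumed threshold (not from the thesis): moves within +/-1% count as neutral.
THRESHOLD = 0.01

# Hypothetical mini-lexicon; the thesis dictionary built from financial tweets is not reproduced.
LEXICON = {"bullish": 1, "buy": 1, "moon": 1, "bearish": -1, "sell": -1, "crash": -1}


def tokenize(text):
    """Lower-case whitespace tokenizer (a real pipeline would also handle cashtags, URLs, emojis)."""
    return text.lower().split()


def price_label(next_day_return, threshold=THRESHOLD):
    """Approach 1: label a tweet by the direction and size of the subsequent price change."""
    if next_day_return > threshold:
        return "positive"
    if next_day_return < -threshold:
        return "negative"
    return "neutral"


def dictionary_score(text, lexicon=LEXICON):
    """Approach 3: mean polarity of the lexicon words found in the tweet (0.0 if none)."""
    hits = [lexicon[tok] for tok in tokenize(text) if tok in lexicon]
    return sum(hits) / len(hits) if hits else 0.0


class NaiveBayesText:
    """Minimal multinomial Naive Bayes with Laplace (add-alpha) smoothing."""

    def fit(self, texts, labels, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # Log prior plus the smoothed log likelihood of every token in the tweet.
            score = math.log(count / n_docs)
            for tok in tokenize(text):
                num = self.word_counts[label][tok] + self.alpha
                den = self.total_words[label] + self.alpha * v
                score += math.log(num / den)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy usage with made-up tweets and returns ($xyz is a placeholder ticker).
tweets = ["bullish on $xyz buy now", "sell before the crash", "holding $xyz today"]
returns = [0.025, -0.03, 0.002]
labels = [price_label(r) for r in returns]  # ['positive', 'negative', 'neutral']
model = NaiveBayesText().fit(tweets, labels)
print(model.predict("buy buy bullish"))  # positive
print(dictionary_score("sell before the crash"))  # -1.0
```

The price-based labels replace manual annotation as training targets, so the same classifier can be trained on either label source; the dictionary score needs no training at all.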

The sentiment features were used together with common financial features (momentum, liquidity and volatility) to compare predictive power across three supervised regression models: Random Forest, Gradient Boosting and a neural network model, LSTM.
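A minimal sketch of how financial and sentiment features might be assembled per stock-day, with and without sentiment, before being passed to the regression models. The abstract does not define the exact measures, so the choices here are assumptions: momentum as the cumulative return over a 5-day lookback, volatility as the sample standard deviation of daily returns, and liquidity proxied by the Amihud illiquidity ratio. The prices, volumes and sentiment value are made up. Fitting Random Forest, Gradient Boosting (e.g. scikit-learn's `RandomForestRegressor`, `GradientBoostingRegressor`) or an LSTM on these rows is not shown.

```python
import statistics


def daily_returns(prices):
    """Simple daily returns; element i-1 is the return earned on day i."""
    return [prices[i] / prices[i - 1] - 1 for i in range(1, len(prices))]


def momentum(prices, lookback):
    """Cumulative return over the last `lookback` days."""
    return prices[-1] / prices[-1 - lookback] - 1


def volatility(prices, lookback):
    """Sample standard deviation of the last `lookback` daily returns."""
    return statistics.stdev(daily_returns(prices)[-lookback:])


def amihud_illiquidity(prices, volumes, lookback):
    """Mean of |return| / traded value (price * volume); larger values mean a less liquid stock."""
    rets = daily_returns(prices)[-lookback:]
    traded_value = [p * v for p, v in zip(prices[-lookback:], volumes[-lookback:])]
    return statistics.mean(abs(r) / tv for r, tv in zip(rets, traded_value))


def feature_row(prices, volumes, sentiment=None, lookback=5):
    """Financial features for one stock-day, optionally extended with a sentiment score."""
    row = {
        "momentum": momentum(prices, lookback),
        "volatility": volatility(prices, lookback),
        "illiquidity": amihud_illiquidity(prices, volumes, lookback),
    }
    if sentiment is not None:
        row["sentiment"] = sentiment
    return row


prices = [100.0, 101.0, 102.0, 101.0, 103.0, 104.0]
volumes = [1000, 1200, 900, 1100, 1500, 1300]
baseline = feature_row(prices, volumes)  # financial features only
with_sentiment = feature_row(prices, volumes, sentiment=0.4)  # same row plus sentiment
print(sorted(baseline), sorted(with_sentiment))
```

Keeping both variants of each row makes the comparison in the findings direct: the same model is trained once on the baseline features and once with the extra sentiment column.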

We find that including sentiment decreases accuracy slightly across all models, and that including the level of institutional ownership of the stocks has a limited effect on improving predictions in our selected sample. We argue that a larger dataset may be needed to create an accurate market sentiment proxy, and that sentiment analysis should be more useful when focusing on special cases, such as peak volumes and the number of followers.
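One way the suggested special cases could be operationalised is sketched below; this is an illustration, not something the thesis implemented. Weighting each tweet by `log(1 + followers)`, the trailing-window length, the `k` multiplier, and the reading of "peak volumes" as a volume series (trading volume or tweet counts, since the function accepts either) are all assumptions, and the inputs are made up.

```python
import math
import statistics


def follower_weighted_sentiment(scored_tweets):
    """Weighted mean of (score, followers) pairs, weighting each tweet by log(1 + followers)."""
    weights = [math.log1p(followers) for _, followers in scored_tweets]
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(w * score for w, (score, _) in zip(weights, scored_tweets)) / total


def peak_days(volumes, window=20, k=2.0):
    """Indices of days whose volume exceeds the trailing mean by more than k trailing std devs."""
    peaks = []
    for i in range(window, len(volumes)):
        past = volumes[i - window:i]
        mu, sd = statistics.mean(past), statistics.pstdev(past)
        if sd > 0 and volumes[i] > mu + k * sd:
            peaks.append(i)
    return peaks


print(follower_weighted_sentiment([(1.0, 0), (-1.0, 999)]))  # -1.0
print(peak_days([100, 102, 98, 100, 300, 101], window=3))  # [4]
```

The log weighting is a design choice meant to keep a few very large accounts from dominating the daily aggregate; restricting the sentiment feature to peak days would then test whether sentiment matters mainly when attention is unusually high.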

Master's thesis, Master of Science (MSc) in Business Analytics, Handelshøyskolen BI, 2022