dc.description.abstract | We attempt to make improvements to stock return prediction accuracy through
sentiment analysis of Twitter data. Our hypothesis is that Twitter users mainly
consist of retail investors, implying that aggregated sentiment will influence
stocks with lower levels of institutional ownership.
Our analysis involves three sentiment approaches. The first approach assigns
labels to tweets based on the magnitude and direction of changes in the stock's price.
The second is a manual labelling approach, where the authors went through tweets
manually and determined whether the tweets had a positive, negative or neutral
sentiment. The last uses a dictionary created from financial tweets. For the
first two approaches, we utilised three text classification methods: Naïve Bayes,
Logistic Regression and SVM.
The sentiment features were used in tandem with common financial features
(momentum, liquidity and volatility) to compare predictive power across three
supervised regression models: Random Forest, Gradient Boosting and a neural
network model, LSTM.
We find that including sentiment in the models decreases accuracy slightly
across all models, and that including the level of institutional ownership of a stock has
limited effect on improving predictions in our selected sample. We argue that
a larger data set may be beneficial for creating an accurate market sentiment proxy, and
that sentiment analysis should be more useful when focusing on special cases, like
peak volumes and the number of followers. | en_US |