
The Bag of Words Model
In this week’s segment of our Stock Prediction Series, where we learn to use Deep Learning for stock market prediction, we will be creating dictionaries which contain our positive and negative sentiment words, in order to perform Sentiment Analysis on news articles!
Purpose: Define a Positive and Negative Dictionary for Sentiment Analysis
Functions Defined: None
Packages Used: Your Brain!
In order to do a simple sentiment analysis, we need to create a list of Positive Words and a list of Negative Words. A simple Google Search for such lists returns some pretty cool results. The Positive Word list can be found HERE. The Negative Word list can be found HERE. Name them “PositiveWords.csv” and “NegativeWords.csv”, respectively. I downloaded each list and uploaded it into Jupyter. In addition, I used a Financial Master Dictionary made by finance professor Bill McDonald. This CSV can be found HERE. I put these into a folder called “MasterDictionaryFiles” in the Working Directory.
If you haven’t already noticed, the Financial Master Dictionary CSV has a “Positive” column and a “Negative” column. If either value is greater than 0, we know the word has the corresponding connotation. We are going to use that when creating our dictionaries. To get started, go to your Jupyter Working Directory/MasterDictionaryFiles, choose New > Text File, and rename the new file to “PositiveWords.txt”. Do the same for “NegativeWords.txt”.
We can then write a function to add all Positive words to our PositiveWords.txt and all Negative words to our NegativeWords.txt. For the Financial Master Dictionary, we have to check whether a word’s Positive/Negative value is greater than 0, but for the other CSVs it should be a straightforward add to our own dictionaries. We must make sure to write each word in the format _WORD_. For example, the word “great” would be written as “_GREAT_”. This stops our program from matching parts of words when doing sentiment analysis. Consider the following situation:
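Here is a minimal sketch of what such a function could look like, using only Python’s built-in csv module. A few things in it are assumptions rather than facts from the files above: the two word-list CSVs are assumed to hold one word per row with no header, the Financial Master Dictionary is assumed to have a column named “Word” next to “Positive” and “Negative”, and the helper names (wrap, words_from_word_list, words_from_master, write_dictionary) are made up for illustration.

```python
import csv


def wrap(word):
    """Upper-case a word and wrap it in underscores, e.g. great -> _GREAT_."""
    return "_" + word.strip().upper() + "_"


def words_from_word_list(path):
    """Read a CSV assumed to hold one word per row (no header) and wrap each word."""
    with open(path, newline="", encoding="utf-8") as f:
        return [wrap(row[0]) for row in csv.reader(f) if row and row[0].strip()]


def words_from_master(path, column):
    """Return the wrapped words whose value in `column` is greater than 0.

    `column` is "Positive" or "Negative"; the "Word" column name is an assumption.
    """
    with open(path, newline="", encoding="utf-8") as f:
        return [
            wrap(row["Word"])
            for row in csv.DictReader(f)
            if row["Word"].strip() and float(row[column] or 0) > 0
        ]


def write_dictionary(out_path, words):
    """Write the unique words to a text file, one per line, in sorted order."""
    with open(out_path, "w", encoding="utf-8") as f:
        f.write("\n".join(sorted(set(words))) + "\n")
```

To build PositiveWords.txt you would combine words_from_word_list on PositiveWords.csv with words_from_master on the Master Dictionary CSV (using the “Positive” column) and pass the result to write_dictionary. NegativeWords.txt works the same way with the “Negative” column.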
The news article says: “Tesla Motors Inc. (TSLA) is extremely unprofitable”. Zoom in on the word “unprofitable”. If we were to use the simple Python “if __ in __” check, the program would find both “unprofitable” AND “profitable”, because the second word is contained inside the first. To avoid this, we put a symbol such as an underscore (_) on each side of the word to make sure the program isn’t matching parts of words. Thus the program, when checking for weighted words, would only find “_UNPROFITABLE_”, and not “_PROFITABLE_”, provided the words of the article are wrapped the same way before matching.
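A quick way to see the problem and the fix is the small sketch below. The way the headline is split into words here (a regular expression that grabs runs of letters) is only an assumption for the demo, not necessarily how the matching will be done in the next post.

```python
import re

headline = "Tesla Motors Inc. (TSLA) is extremely unprofitable"

# Naive substring check: "profitable" is found inside "unprofitable".
naive_hit = "profitable" in headline.lower()


def wrap_text(text):
    """Upper-case the text and wrap every run of letters in underscores."""
    words = re.findall(r"[A-Z]+", text.upper())
    return " ".join("_" + w + "_" for w in words)


wrapped = wrap_text(headline)

# With underscores on both sides, only the whole word matches.
positive_hit = "_PROFITABLE_" in wrapped
negative_hit = "_UNPROFITABLE_" in wrapped

print(naive_hit)     # True  (the false positive we want to avoid)
print(positive_hit)  # False
print(negative_hit)  # True
```

The leading underscore is what does the work: “_UNPROFITABLE_” contains “PROFITABLE_” but never “_PROFITABLE_”.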
Using this function, we should now have a Positive and Negative Dictionary. We are now ready for the fun part – Sentiment Analysis! Tune in next week to see some Python magic!
Leave any questions, comments, or concerns in the Comments Section below!