Analyzing Trends (in the Stock Market)
Purpose: Define a function to figure out an article’s sentiment: is it talking about the subject in a positive or negative manner?
Functions Defined: getSentiment()
Packages Used: Your Brain!
The most important part of any Artificial Intelligence or Deep Learning algorithm is the dataset used to train it. If we want a lot of data points, we need a lot of “X” variables and “Y” variables. In this case, our X variables are the sentiment of each article, and our Y variables are whether the stock went up or down (represented by a 0 or 1).
An example of a 2d plotted dataset:
Once we have a dataset, we can train our model with the dataset to find a relationship:
If we train our model with 1000 articles that we scrape, how do we find out whether each article was correlated with an upward or downward trend in the stock price? Well, if we can find the date the article was written, we can reference the historical stock price graph for that stock and see how it was affected. The simplest way to do this is to calculate the slope of the prices around that day. In this case, we take 12 points: from one day before the article’s publishing date up to 10 days past it. In other words, T-1 to T+10. We can later change this range of data collection if needed.
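To make the T-1 to T+10 window concrete, here is a minimal sketch of how the points could be pulled out of a list of closing prices. The function name window_points, its parameters, and the sample prices are all made up for illustration, and it assumes the prices are ordered oldest to newest, so "days" here really means rows (trading days) in the price history.

```python
def window_points(closes, t_index, days_before=1, days_after=10):
    """Return (x, y) pairs for the rows T-days_before .. T+days_after.

    closes  : list of closing prices, assumed ordered oldest to newest
    t_index : position in `closes` of the article's publishing date
    """
    start = t_index - days_before
    stop = t_index + days_after + 1  # +1 because slice ends are exclusive
    if start < 0 or stop > len(closes):
        raise ValueError("not enough price history around this date")
    window = closes[start:stop]
    # x counts rows from the start of the window: 0, 1, 2, ...
    return list(enumerate(window))


# Hypothetical prices, for illustration only
closes = [float(100 + i) for i in range(30)]
points = window_points(closes, t_index=5)
print(len(points))  # 12 points: T-1, T, T+1, ..., T+10
```

Keeping the range as parameters means it can be changed later without touching the rest of the code.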
So, how do we (1) get those data points and (2) get the slope from them? That’s what we’ll be focusing on in this part of the series.
There are two things we must do: first, get some data points, and second, get the slope from them. To get data points, we first need to download a CSV file containing the stock’s historical data. We use AlphaVantage, which provides free historical data through a simple HTTP request. You can create an AlphaVantage API Key to use their service. We can then request a CSV using pd.read_csv(url). Using AlphaVantage’s API, we want our URL to be the following:
We can pass our function getSlope() the articledate variable, which is of type ‘datetime’. We then convert that to a Year-Month-Day format, which is the same format our Historical Stock Prices CSV uses. We then define a simple function, SearchCSVForDate(), as follows:
def SearchCSVForDate(articledate):
    # StockHistory is the DataFrame loaded from the historical prices CSV
    return StockHistory[StockHistory['timestamp'] == articledate].index.tolist()
This takes in a date and returns the row number of that date in the CSV (as a list, which is empty if the date has no row). Now that we have a simple method to collect data points, we can calculate the slope.
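The date conversion and lookup can be sketched without any downloads. The version below uses only the standard library rather than pandas, and the helper name find_rows_for_date, the adjusted_close column name, and the sample rows are assumptions made for illustration; only the 'timestamp' column and the Year-Month-Day format come from the steps above.

```python
import csv
import io
from datetime import datetime

# Made-up rows shaped like the CSV described above: a 'timestamp' column
# holding Year-Month-Day strings. Real data would come from the download.
SAMPLE_CSV = """timestamp,adjusted_close
2020-01-06,217.0
2020-01-03,213.0
2020-01-02,212.0
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))


def find_rows_for_date(rows, articledate):
    """Return the indices of every row whose timestamp matches articledate."""
    key = articledate.strftime("%Y-%m-%d")  # datetime -> 'YYYY-MM-DD'
    return [i for i, row in enumerate(rows) if row["timestamp"] == key]


print(find_rows_for_date(rows, datetime(2020, 1, 3)))  # [1]
print(find_rows_for_date(rows, datetime(2020, 1, 4)))  # [] (a Saturday, no row)
```

The empty list is worth handling: an article published on a weekend or holiday will not have a matching row, so the caller needs a fallback such as trying the next date.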
Let’s break down the Least Squares Method. First, what should X and Y be? Imagine X as the X axis representing time in days (0,1,2,…). The Y axis is the Adjusted Close price of the stock (212,213,217,…). The top part of the formula is the sum of a bunch of (x-X)(y-Y) terms, where x and y are the coordinates of a specific point. The bottom part is the sum of a bunch of (x-X)^2 terms, where x is the x value of a specific point. If this doesn’t yet make sense, it will soon.
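Written out from the description of the top and bottom parts, the least-squares slope is usually stated as below. The symbol m for the slope, the index i, and the count n are notation introduced here for convenience.

```latex
m = \frac{\sum_{i=1}^{n} (x_i - \bar{X})(y_i - \bar{Y})}
         {\sum_{i=1}^{n} (x_i - \bar{X})^2},
\qquad
\bar{X} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\quad
\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} y_i
```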
The big X with a bar is a mathematical symbol meaning the average of all the x values. The same goes for the big Y. So we can first take the average of all the X values and Y values with a for-in loop. We can then focus on the actual formula.
We calculate the top and bottom parts separately. For the top: the sum function (represented by the strange E, which is the Greek letter sigma, Σ) is simply a for loop with a += in disguise. Each time it runs through the loop body, it references the next point and adds that point’s (x - Xavg) times (y - Yavg) to the running total. So it starts with the T-1 point, then T, then T+1, etc.
The bottom part is also a for loop with a +=. This time it’s taking only the x value of a specific point, subtracting Xavg, and squaring that. Adding that up for all the points, we should now have a bottom value.
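As a quick check by hand, take just the three example values mentioned earlier: days 0, 1, 2 with prices 212, 213, 217. This is only a toy calculation with three points, not the full window.

```latex
\bar{X} = \frac{0 + 1 + 2}{3} = 1, \qquad
\bar{Y} = \frac{212 + 213 + 217}{3} = 214

\text{top} = (0-1)(212-214) + (1-1)(213-214) + (2-1)(217-214) = 2 + 0 + 3 = 5

\text{bottom} = (0-1)^2 + (1-1)^2 + (2-1)^2 = 1 + 0 + 1 = 2
```

Dividing the top by the bottom gives 5 / 2 = 2.5, meaning the price rose by about 2.5 per day over those three points.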
We finish the formula by dividing the value from the top part by the value from the bottom part to get a number that is our slope!
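Putting the averages, the top part, and the bottom part together, a minimal sketch of the whole calculation might look like this. The name least_squares_slope and the error checks are illustrative choices, not necessarily how getSlope() is written.

```python
def least_squares_slope(xs, ys):
    """Slope of the least-squares line through the points (xs[i], ys[i])."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two points, with one y per x")

    # Averages, accumulated with plain for loops
    x_total = 0.0
    for x in xs:
        x_total += x
    y_total = 0.0
    for y in ys:
        y_total += y
    x_avg = x_total / len(xs)
    y_avg = y_total / len(ys)

    # Top and bottom parts: each sum is a for loop with a +=
    top = 0.0
    bottom = 0.0
    for x, y in zip(xs, ys):
        top += (x - x_avg) * (y - y_avg)
        bottom += (x - x_avg) ** 2

    if bottom == 0:
        raise ValueError("all x values are identical, slope is undefined")
    return top / bottom


# Days 0, 1, 2 with the example prices 212, 213, 217
print(least_squares_slope([0, 1, 2], [212, 213, 217]))  # 2.5
```

A positive result means prices trended upward over the window, and a negative one means they trended downward.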
That was quite a lot of mathematics, but you should now have a working function to automatically calculate the slope of a set of points. Congratulations! We now have a general view of how any given article has affected a stock’s price!
Any questions can be left in the comments! 🙂