If you’re even slightly interested in trading or investing, you’ve probably heard about WallStreetBets. It’s a huge community of retail traders on Reddit with 9.5M followers.
This community can move the market quite significantly these days. Here is a chart of GameStop, the most famous stock in the community. As you can see, the community triggered a 6000% rise in the stock in just one year, which is huge:
You can like the community or hate it, but you have to agree that it’s important to keep an eye on it. Personally, I don’t have enough time to read all the topics. But if you know a bit of programming, you can automate the process of parsing it and get some insight from it automatically.
In this article, I will show you how easy it is to parse Reddit in Python. In just a few lines of code, you can parse submissions and their comments.
First of all, let’s import the libraries we need for this example. PRAW stands for Python Reddit API Wrapper; I will use this library to parse data from Reddit.
import pandas as pd
import praw
Next, we have to authenticate with the Reddit API. Here is the code you can use for that:
reddit = praw.Reddit(
    user_agent="myApp",
    client_id="your_client_id",
    client_secret="your_client_secret",
)
For the user_agent parameter, you can use whatever you want. To get client_id and client_secret, you have to create an application on Reddit. It’s quite straightforward: go to https://www.reddit.com/prefs/apps and click the button “are you a developer? Create an app..”:
You’ll see the following form:
Fill in the application name and a redirect URI (the URL of your website, for example), then click “create app”. Your application will be created in just a moment. Here is how it looks. You can easily find your id (highlighted in green) and secret (highlighted in red) for it:
To start reading data, you first have to create an object for a subreddit. To create one for “wallstreetbets”, call the following method with the subreddit name as its only parameter:
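Once you have the id and secret, you may prefer not to hard-code them in your script. Here is a minimal sketch that reads them from environment variables instead. The variable names (REDDIT_CLIENT_ID and so on) and the helper load_reddit_credentials are my own assumptions for illustration, not something PRAW requires:

```python
import os

# Hypothetical environment variable names; pick whatever fits your setup.
ENV_TO_KWARG = {
    "REDDIT_CLIENT_ID": "client_id",
    "REDDIT_CLIENT_SECRET": "client_secret",
    "REDDIT_USER_AGENT": "user_agent",
}


def load_reddit_credentials(env=None):
    """Collect the keyword arguments for praw.Reddit from environment variables."""
    env = os.environ if env is None else env
    # Report every missing variable at once instead of failing on the first one
    missing = [name for name in ENV_TO_KWARG if not env.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {kwarg: env[name] for name, kwarg in ENV_TO_KWARG.items()}


# Usage (needs praw installed and the variables set):
# reddit = praw.Reddit(**load_reddit_credentials())
```

This keeps the secret out of the source file, which matters if you ever share the script or push it to a public repository.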
subred = reddit.subreddit("wallstreetbets")
Now the “subred” variable is an object you can use to get some information about the subreddit, for example:
- subred.created – Unix timestamp of when the subreddit was created
- subred.url – URL of subreddit
- subred.description – Subreddit description
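The creation time comes back as a raw Unix timestamp, so you will usually want to convert it before showing it to anyone. Here is a small sketch of that conversion, using the created_utc attribute that the PRAW documentation lists for subreddits. The helper describe_subreddit and the stand-in object with made-up values are my own, so the snippet runs without a Reddit connection:

```python
from datetime import datetime, timezone
from types import SimpleNamespace


def describe_subreddit(subred):
    """Return a few subreddit attributes in a human-readable form."""
    # created_utc is seconds since the Unix epoch, in UTC
    created = datetime.fromtimestamp(subred.created_utc, tz=timezone.utc)
    return {
        "url": subred.url,
        "created": created.strftime("%Y-%m-%d %H:%M:%S UTC"),
        # Descriptions can be long, so keep only the first 80 characters
        "description_preview": subred.description[:80],
    }


# Stand-in object with made-up values; with PRAW you would pass subred instead
fake_subred = SimpleNamespace(
    url="/r/example/",
    created_utc=0.0,
    description="An example description.",
)
info = describe_subreddit(fake_subred)
print(info["created"])  # 1970-01-01 00:00:00 UTC
```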
Also, this object has a bunch of methods that will allow you to get submissions from this subreddit. For example:
- subred.top() – top submissions for the subreddit
- subred.rising() – top rising submissions
- subred.new() – new submissions
- subred.search("GME") – search the subreddit for this keyword
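If you want to switch between these listings without duplicating code, you can look the method up by name. This is only a sketch: fetch_titles and FakeSubreddit are hypothetical names of mine, and the fake class just mimics the top, rising and new methods with made-up posts so the example runs offline. With PRAW you would pass the real subred object instead.

```python
from types import SimpleNamespace

ALLOWED_LISTINGS = ("top", "rising", "new")


def fetch_titles(subred, listing="new", limit=5):
    """Return up to `limit` titles from one of the subreddit's listings."""
    if listing not in ALLOWED_LISTINGS:
        raise ValueError(f"Unknown listing: {listing!r}")
    # Look up the method by name (subred.top, subred.rising or subred.new)
    listing_method = getattr(subred, listing)
    # The method returns a lazy generator; the list comprehension consumes it
    return [s.title for s in listing_method(limit=limit)]


class FakeSubreddit:
    """Offline stand-in that yields made-up submissions."""

    def _posts(self, prefix, limit):
        for i in range(limit):
            yield SimpleNamespace(title=f"{prefix} post {i}")

    def top(self, limit=100):
        return self._posts("top", limit)

    def rising(self, limit=100):
        return self._posts("rising", limit)

    def new(self, limit=100):
        return self._posts("new", limit)


print(fetch_titles(FakeSubreddit(), "rising", limit=2))
# ['rising post 0', 'rising post 1']
```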
These methods return a generator. You can then loop through it and get the data you need from every submission. For example, here is code that gets the top 100 submissions from wallstreetbets and builds a nice pandas data frame from them:
# Collect the fields we need from the top 100 submissions
subm_list = [
    [s.id, s.url, s.title, s.num_comments, s.score, s.author, s.created, s.selftext]
    for s in subred.top(limit=100)
]
# One row per submission, with readable column names
subm_df = pd.DataFrame(
    subm_list,
    columns=["id", "url", "title", "comments", "score", "author", "time", "content"],
)
The data frame looks pretty good:
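One thing to watch for: the author field holds a PRAW object rather than a plain string, and it can be None when the account behind a post was deleted. Below is a hedged sketch of a row builder that stores the username instead. The helper submission_row and the "[deleted]" placeholder are my own choices, and the stand-in objects carry made-up values so the snippet runs offline:

```python
from types import SimpleNamespace


def submission_row(s):
    """Flatten one submission into plain Python values."""
    # s.author can be None for deleted accounts; otherwise keep just the name
    author = s.author.name if s.author is not None else "[deleted]"
    return [s.id, s.url, s.title, s.num_comments, s.score, author, s.created, s.selftext]


# Stand-in submissions with made-up values; with PRAW, iterate subred.top(limit=100)
fake_posts = [
    SimpleNamespace(id="a1", url="https://example.com/a1", title="First post",
                    num_comments=3, score=10, author=SimpleNamespace(name="alice"),
                    created=0.0, selftext="hello"),
    SimpleNamespace(id="b2", url="https://example.com/b2", title="Second post",
                    num_comments=0, score=1, author=None,
                    created=60.0, selftext=""),
]
rows = [submission_row(s) for s in fake_posts]
print([row[5] for row in rows])  # ['alice', '[deleted]']
```

The resulting rows list can be passed to pd.DataFrame with the same column names as before.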
Using the PRAW library, you can easily parse the comments of a submission as well. Let’s do that for one particular submission. First, let’s create an object for a single submission:
sub = reddit.submission(id = "lnfh3v")
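After that, we can use this object to parse the comments of this submission. Calling sub.comments.replace_more with a limit of 0 removes the “load more comments” placeholders instead of fetching them, and looping over sub.comments then gives you only the top-level comments.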
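# Drop the "load more comments" placeholders without fetching them
sub.comments.replace_more(limit=0)
# Iterating over sub.comments yields the top-level comments
comm_list = [[c.id, c.author, c.body, c.score, c.created] for c in sub.comments]
comm_df = pd.DataFrame(comm_list, columns=["id", "author", "body", "score", "time"])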
This gives us a nice data frame with the comments for the submission we selected.
That’s it for this article. As you can see, it’s quite a simple library that allows you to parse Reddit pretty easily. For more information about it, check the official documentation: https://praw.readthedocs.io/en/latest/
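If you also want the replies, each comment exposes them through its replies attribute, so you can walk the tree recursively. Here is a minimal sketch of that idea. The walk_comments helper is my own, and the stand-in comments are made up so the code runs without a Reddit connection. With PRAW you would call sub.comments.replace_more first and then pass sub.comments:

```python
from types import SimpleNamespace


def walk_comments(comments, depth=0):
    """Yield (depth, comment) pairs for a comment tree, parents before replies."""
    for c in comments:
        yield depth, c
        # Recurse into the replies, one level deeper
        yield from walk_comments(c.replies, depth + 1)


# Stand-in comment tree with made-up values
reply = SimpleNamespace(id="c2", body="a reply", replies=[])
top = SimpleNamespace(id="c1", body="top-level", replies=[reply])
other = SimpleNamespace(id="c3", body="another", replies=[])

rows = [[depth, c.id, c.body] for depth, c in walk_comments([top, other])]
print(rows)
# [[0, 'c1', 'top-level'], [1, 'c2', 'a reply'], [0, 'c3', 'another']]
```

If you don’t need the depth, PRAW also offers sub.comments.list(), which returns all comments as one flat list.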