Downloading the entire history of OHLC bars from Kraken with Python

Kraken is a very popular exchange. Currently, it holds 3rd place in terms of 24h volume traded. Quite a lot of people try to create systematic trading models based on Kraken’s data. Some time ago, I already published an article where I showed how to get data from Kraken. The problem is that for OHLC, you get only the last 720 bars with the standard endpoint. Fortunately, there is a hack that will allow you to download OHLC for any timeframe for the entire history. In this article, I’ll show you how you can do that.

Kraken, unfortunately, has this limitation on the number of OHLCs you can get, but luckily there is no limit on getting trades/tick data. So I’ll show you how you can get all the tick data for the entire history and calculate OHLC for it yourself. It might sound complicated, but as you’ll see, it’s pretty simple.

First, let’s import the libraries we need in our script.

import pandas as pd
import datetime as dt
import requests
import time

Next, let’s define the endpoint we’ll use to download trades, the symbol, and the starting date for our process. We also define a trades_list variable that will store the downloaded results.

trades_list = []

url_get_trades = "https://api.kraken.com/0/public/Trades"

last_date = dt.datetime(2022, 1, 1).timestamp()
symbol = "XBTUSD"
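
Before looping over the API, it may help to see the shape of a single response. The sketch below parses a hand-written sample payload (the trade values are made up for illustration, not real market data) laid out the way Kraken documents the Trades endpoint: a result object holding the trades under the exchange's internal pair name, plus a last field that can be sent back as since. The helper name parse_trades_page is hypothetical and is not used elsewhere in this article.

```python
import pandas as pd

# Hand-written sample in the shape of a Trades response; values are illustrative only.
sample_response = {
    "error": [],
    "result": {
        "XXBTZUSD": [
            # price, volume, time, side, order type, misc, trade id
            ["46000.1", "0.010", 1641000000.1234, "b", "l", "", 1],
            ["46000.5", "0.250", 1641000001.5678, "s", "m", "", 2],
        ],
        "last": "1641000001567800000",
    },
}


def parse_trades_page(payload):
    """Return (trades DataFrame, `last` cursor) for one response payload."""
    result = payload["result"]
    # The trades are keyed by Kraken's internal pair name, so take the key that is not "last".
    pair_key = next(key for key in result if key != "last")
    ticks = pd.DataFrame(result[pair_key])
    # Column 2 holds the trade time as a Unix timestamp in seconds.
    ticks.index = pd.to_datetime(ticks[2], unit="s")
    return ticks, result["last"]


ticks, cursor = parse_trades_page(sample_response)
print(ticks.shape, cursor)
```

Note that price and volume arrive as strings, which is why they need to be converted to floats before any aggregation.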

After that, we’re ready to start downloading. The loop requests trades starting from the last timestamp we observed (the initial date on the first iteration), and each request returns up to 1000 of the trades that follow. It converts each response to a data frame indexed by trade time and appends it to our list. A built-in pause helps us avoid hitting Kraken’s rate limits for a free account.

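while True:
    try:
        req_params = {
            "pair": symbol,
            "since": last_date
        }

        res = requests.get(url_get_trades, params=req_params)
        # "result" is missing when Kraken returns an error, which raises KeyError
        result = res.json()["result"]

        # Trades are keyed by Kraken's internal pair name (the first key)
        symbol_internal = list(result.keys())[0]
        trades = result[symbol_internal]

        # A page with at most one trade means we have caught up with the latest data
        if len(trades) <= 1:
            break

        ticks = pd.DataFrame(trades)
        ticks.index = pd.to_datetime(ticks[2], unit="s")

        last_date = ticks.index[-1].timestamp()

        trades_list.append(ticks)

        # Pause after every 25 pages to stay within the rate limit
        if len(trades_list) % 25 == 0:
            time.sleep(5)

    except KeyError:
        # No "result" in the response (for example, rate limited): wait, then retry
        time.sleep(5)
        continue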
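
The loop above pages by the timestamp of the last trade it has seen. Kraken's response also carries a last value that is meant to be sent back as since, so another option is to page with that cursor. The sketch below is one possible way to structure this and is not the method used in this article: download_trades and its fetch_page argument are hypothetical names, and fetch_page is assumed to be any callable that returns the parsed JSON of one Trades request. Passing it in keeps the paging logic testable without network access.

```python
import time

import pandas as pd


def download_trades(fetch_page, pair, since, pause=0.0, max_pages=None):
    """Page through trades using the `last` cursor (a sketch under stated assumptions)."""
    pages = []
    while max_pages is None or len(pages) < max_pages:
        payload = fetch_page(pair, since)
        if payload.get("error"):
            # Surface API errors instead of retrying forever.
            raise RuntimeError(f"API error: {payload['error']}")
        result = payload["result"]
        pair_key = next(key for key in result if key != "last")
        rows = result[pair_key]
        if not rows:
            break  # no newer trades
        pages.append(pd.DataFrame(rows))
        if result["last"] == since:
            break  # cursor did not advance, so there is nothing more to fetch
        since = result["last"]
        time.sleep(pause)
    return pd.concat(pages, ignore_index=True) if pages else pd.DataFrame()


# Possible wiring with requests (performs real network calls, so it is left commented out):
# import requests
# url = "https://api.kraken.com/0/public/Trades"
# fetch = lambda pair, since: requests.get(url, params={"pair": pair, "since": since}).json()
# trades = download_trades(fetch, "XBTUSD", since=0, pause=1.0, max_pages=3)
```

Whichever paging style you use, dropping duplicates afterwards remains a sensible safeguard, since a boundary trade may appear on two consecutive pages.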

Depending on the start date, it can run for quite some time. I tested it on XBTUSD, and for a bit more than a year of data, it took about 1 hour. Next, let’s join the results in one data frame and explore them.

trades_df = pd.concat(trades_list)
trades_df.drop_duplicates(inplace=True)

trades_df.rename(columns={0: "price", 1: "volume", 2: "time", 3: "side", 4: "orderType", 5: "misc", 6: "number"}, inplace=True)
trades_df.index.name = "index"

trades_df.shape
(10552744, 7)

As you can see, around one year of data comes to more than 10M rows, but pandas handles that amount of data without trouble. You can check how it looks inside with, for example, trades_df.head().

We have a bunch of columns, but for us, only datetime (the index), price, and volume are essential. Let’s keep just those, rename price to close, and convert both columns to floats:

trades_df = trades_df.rename(columns = {"price": "close"})[['close', 'volume']]
trades_df = trades_df.astype({'close': float, 'volume': float})

Next, let’s create the open, high, and low columns we’ll need in our OHLC bars data frame. For now, each one is just a copy of close:

trades_df = trades_df.assign(open = trades_df.close, high = trades_df.close, low = trades_df.close)

The last thing that remains to get our bars data frame is to run resample() and agg() methods for our trades data frame.

bars_1h = trades_df.resample("1h").agg({"open": "first", "high": "max", "low": "min", "close": "last", "volume": "sum"})

If we check the resulting data frame, we’ll see a nice OHLC data frame for the 1-hour timeframe.
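
To see what the aggregation does on a small scale, here is a minimal sketch on synthetic trades (the prices and volumes are made up for illustration). It uses pandas' built-in ohlc() resampler method as an alternative to the copy-and-agg approach in this article, and the helper name trades_to_bars is hypothetical. It also drops intervals that contain no trades, which is a design choice rather than something the original script does.

```python
import pandas as pd


def trades_to_bars(trades, timeframe="1h"):
    """Aggregate trades (DatetimeIndex, 'price', 'volume') into OHLCV bars."""
    bars = trades["price"].resample(timeframe).ohlc()
    bars["volume"] = trades["volume"].resample(timeframe).sum()
    # Intervals without trades have NaN prices; drop them rather than invent values.
    return bars.dropna(subset=["close"])


# Synthetic trades, made up for illustration.
index = pd.to_datetime([
    "2022-01-01 00:05", "2022-01-01 00:20", "2022-01-01 00:50",
    "2022-01-01 02:10", "2022-01-01 02:40",
])
trades = pd.DataFrame(
    {
        "price": [100.0, 105.0, 98.0, 101.0, 103.0],
        "volume": [1.0, 2.0, 0.5, 1.5, 1.0],
    },
    index=index,
)

bars = trades_to_bars(trades, "1h")
print(bars)
```

The same function can be called with another timeframe string, such as "15min" or "1D", to build bars of a different size from the same trades.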

You can compare the result with TradingView, for example, and you should see the same values.

Conclusion

As you can see, it was a pretty simple script, and it was relatively easy to get OHLC for the entire period from Kraken. Working with tick data is also not a big issue for Python: it handled a data frame of 10M rows easily, and this should work fine on basically any PC. The nice thing about tick data is that you can do much more exciting things with it. For example, you can compute real Renko bars based on Kraken’s data and backtest strategies on them. I’ll create another article about it at some point as well.

