Neighbourhood Pool Watch: 10.5 Forecasting the network hashrate revisited

25th September, 2013

Recommended reading

0. Introduction

As you're probably aware, from December last year until August this year I was posting a weekly network hashrate and difficulty forecast, which was based on the lagged relationship between the BTC exchange rate and the network hashrate. The forecasts produced were reasonably accurate until April - May, when the ASIC onslaught really began in earnest.

I think there were five reasons the forecast method failed:

The BTC-hashrate relationship began to change as ever-increasing numbers of ASIC owners obtained a much higher rate of return than GPU owners for the same price and electricity cost.
For many, ASICs became "must have devices" and obtaining a return on investment became a secondary concern.
With the massive and unstable increase in hashrate, the ROI became quite hard to estimate, and many may have bought the devices with unrealistic expectations.
Since most ASIC prices - and hence the estimated return on investment - were BTC denominated, the exchange rate began to have less effect on the network hashrate.
The forecasts were aimed at testing the initial relationship I'd derived, so I didn't re-examine the relationship. Further, I was less concerned with the accuracy of the forecasts than with what that accuracy said about the changing BTC-network hashrate relationship.

When I began to re-examine forecasting the network hashrate, I decided on the following guidelines:

The forecasts should aim at accuracy and not attempt to describe any underlying relationship.
Each week the forecasting method should be re-examined for accuracy, and replaced if another method proves more accurate.
There would be no regression to external variables - the forecasts should only refer to previous values of the network hashrate.

So since August I've been developing strategies for forecasting the weekly average network hashrate, the daily average network hashrate, the network difficulty, and as an added bonus the BTC exchange rate. This post concerns forecasting the weekly average network hashrate; the others will be posted when I have time. I'll resume posting the weekly forecast next week.

1. A brief overview of the last four weeks' work.
If the actual forecasts are all that interest you, this section can be safely skipped - scroll down a bit. For those that are interested, I'm keeping this short. I don't intend it as a "How to" guide - Rob Hyndman's free online textbook does a much better job than I could, and can be followed quite easily if you have college level math or statistics, or with a bit of hard work if you have high school level math (and can still remember it).

So this section assumes that you know what I'm talking about. If not, spend a few days reading the first seven chapters of the textbook, and you'll understand. A second point is that I've only included weekly average hashrates after August 2010 in the dataset; immediately before this there was a massive hashrate spike just after MTGOX came online.

I had determined to start by using an ARIMA model to forecast the average weekly network hashrate, so the first analysis I made was of the "stationarity" of the network hashrate. Last year I thought it was stationary after a log transformation and a first difference; now I'm not so sure. The chart below shows the network hashrate, the log(network hashrate), the first difference of log(network hashrate) and the second difference of log(network hashrate):

In order to forecast, you really want to determine the transformations and differencing required to make a series stationary (that is, with no upward or downward trend and no seasonal changes). Luckily there are no seasonal changes, and it's quite clear that the second difference of the log(network hashrate) is stationary around zero.
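The log transformation and differencing described above can be sketched in a few lines. This is only an illustration: the `hashrate` values are made up (a hypothetical series with accelerating growth), and `difference` is a helper name of my own, not something from the original analysis.

```python
import math

def difference(series, order=1):
    """Return the `order`-th difference of a list of numbers."""
    out = list(series)
    for _ in range(order):
        out = [b - a for a, b in zip(out, out[1:])]
    return out

# Hypothetical weekly average hashrates (arbitrary units) with accelerating growth.
hashrate = [math.exp(0.02 * t * t + 0.1 * t) for t in range(10)]

log_hashrate = [math.log(h) for h in hashrate]
first_diff = difference(log_hashrate, order=1)   # weekly log growth rate, still trending
second_diff = difference(log_hashrate, order=2)  # change in weekly log growth rate

print(first_diff)
print(second_diff)
```

In this toy series the first difference still trends upwards while the second difference is flat, which is the kind of behaviour you look for before fitting an ARIMA model with d = 2.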

Next, I derived the autocorrelation function (ACF) and partial autocorrelation function (PACF), and plotted the resulting correlograms:

Hyndman's textbook describes using the ACF and PACF in determining ARIMA models:

The data may follow an ARIMA( $p,d,0$ ) model if the ACF and PACF plots of the differenced data show the following patterns:

the ACF is exponentially decaying or sinusoidal;

there is a significant spike at lag $p$ in PACF, but none beyond lag $p$ .

The data may follow an ARIMA( $0,d,q$ ) model if the ACF and PACF plots of the differenced data show the following patterns:

the PACF is exponentially decaying or sinusoidal;

there is a significant spike at lag $q$ in ACF, but none beyond lag $q$ .

So it looked like an ARIMA(0, 2, 1) model might be a good starting point. However, while I could ignore the spike at lag 22, I was a bit concerned about the significant spike at lag 10. What did it mean? Should I be using an ARIMA(10, 2, 0) model instead? I thought looking at changes in the ACF and PACF over time might be informative:

The grey lines show the 95% confidence interval. In the chronological PACF you can see that the lag 10 spike is reducing in significance over time, while lags one and two are increasing in significance. Since the correlation at lag ten is on the decline, I decided to ignore it and start with an ARIMA(0, 2, 1) model.
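For readers who want to see what sits behind a correlogram, here is a minimal sketch of the sample ACF, the PACF (via the Durbin-Levinson recursion) and the usual large-sample 95% bound of 1.96/sqrt(n). It is not the code used for the charts; the function names are my own, and the test series `x` is a simulated MA(1) process rather than hashrate data.

```python
import math
import random

def acf(x, max_lag):
    """Sample autocorrelations r_0..r_max_lag of a list of numbers."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(max_lag + 1)]

def pacf(x, max_lag):
    """Partial autocorrelations at lags 1..max_lag (Durbin-Levinson recursion)."""
    r = acf(x, max_lag)
    phi_prev = []  # coefficients of the AR(k-1) fit from the previous step
    out = []
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        phi_new = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)]
        phi_new.append(phi_kk)
        phi_prev = phi_new
        out.append(phi_kk)
    return out

# Simulated MA(1) series (hypothetical data, fixed seed for repeatability).
rng = random.Random(0)
eps = [rng.gauss(0, 1) for _ in range(201)]
x = [eps[t] - 0.6 * eps[t - 1] for t in range(1, 201)]

bound = 1.96 / math.sqrt(len(x))  # approximate 95% significance bound
significant_acf = [k for k, r in enumerate(acf(x, 20)) if k > 0 and abs(r) > bound]
significant_pacf = [k for k, p in enumerate(pacf(x, 20), start=1) if abs(p) > bound]
print(significant_acf, significant_pacf)
```

Running the same functions over successively longer windows of the differenced series would give a rough equivalent of the chronological view described above.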

From here, most of the time went into assessing various related ARIMA models for forecast accuracy over historical data and comparing them using the RMSE and the Diebold-Mariano test. I then compared the best of these with three alternatives: the best exponential smoothing model; a naive forecast, which assumes that this week's average network hashrate will be the same as last week's; and a slightly modified naive forecast, which assumes that the percentage increase in this week's network hashrate will be the same as last week's percentage increase.
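The two naive benchmarks and the RMSE comparison are simple enough to sketch. The numbers below are hypothetical, and extending the modified naive forecast beyond one week by compounding the same growth rate is my assumption for the sake of the example.

```python
import math

def naive_forecast(history, horizon):
    """Every future week equals the last observed week."""
    return [history[-1]] * horizon

def growth_naive_forecast(history, horizon):
    """Every future week repeats last week's percentage increase (compounded)."""
    ratio = history[-1] / history[-2]
    return [history[-1] * ratio ** h for h in range(1, horizon + 1)]

def rmse(forecast, actual):
    """Root mean squared error between two equal-length lists."""
    return math.sqrt(sum((f - a) ** 2 for f, a in zip(forecast, actual)) / len(actual))

# Hypothetical weekly averages growing 10% per week.
history = [100.0, 110.0, 121.0]
actual = [133.1, 146.41]

print(rmse(naive_forecast(history, 2), actual))
print(rmse(growth_naive_forecast(history, 2), actual))
```

On a steadily growing series like this one the modified naive forecast wins easily, which is why it makes a useful benchmark for any fitted model during a period of rapid hashrate growth.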

The result for the current week was the following model (using Box Cox lambda = 0 is equivalent to using a log transformation)
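On the Box-Cox remark: the transformation of a series $y_t$ is usually defined (as in Hyndman's textbook) by

$$
w_t = \begin{cases} \log y_t & \text{if } \lambda = 0, \\ (y_t^{\lambda} - 1)/\lambda & \text{otherwise,} \end{cases}
$$

and the two branches agree in the limit, since $(y_t^{\lambda} - 1)/\lambda \to \log y_t$ as $\lambda \to 0$ . Fitting a model with lambda = 0 is therefore the same as fitting it to the log of the network hashrate.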

The results of the chosen models (a slightly different one for every historical week) were surprisingly accurate - almost as accurate as the previous lagged BTC exchange rate regressed model while the relationship was stable, and much more accurate than the previous model in recent times. The plots below are to give readers a general overview of the models' performances (but were not used in assessing the models).

In recent times, even the eight-week forecast had percentage errors between -50% and +90%, and over the past 12 months of data the models had 80% confidence intervals of -50% to +35%.

It's clear that the forecasts tend to under-predict (about 65% of forecasts have negative errors). However, I've used bootstrapped confidence intervals derived from the data, so the forecast confidence intervals will take this into account.
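One simple way to build such intervals, sketched below, is to resample past percentage errors and apply them to the point forecast. This is an assumed approach for illustration, not necessarily the procedure used for the forecasts; the error values are invented, and I take the percentage error to mean (forecast - actual) / actual.

```python
import random

def bootstrap_interval(point_forecast, past_pct_errors, level=0.95,
                       n_boot=5000, seed=1):
    """Interval for the actual value, from resampled historical percentage errors.

    Assumes pct error = 100 * (forecast - actual) / actual, so that
    actual = forecast / (1 + error / 100).
    """
    rng = random.Random(seed)
    implied = sorted(point_forecast / (1 + rng.choice(past_pct_errors) / 100.0)
                     for _ in range(n_boot))
    lo_idx = int((1 - level) / 2 * (n_boot - 1))
    hi_idx = int((1 + level) / 2 * (n_boot - 1))
    return implied[lo_idx], implied[hi_idx]

# Hypothetical past errors: mostly negative, i.e. the model tends to under-predict.
errors = [-30, -20, -15, -10, -5, -2, 3, 8, 12, 20]

interval_50 = bootstrap_interval(100.0, errors, level=0.50)
interval_95 = bootstrap_interval(100.0, errors, level=0.95)
print(interval_50, interval_95)
```

Because the resampled errors are mostly negative, the resulting intervals sit asymmetrically above the point forecast, which is how a bootstrap of this kind absorbs a tendency to under-predict.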

2. The forecasts
If you skipped the previous section, then you might not understand what the forecasts represent. Think of them as an indication of possible hashrates to come. The point forecasts are the most probable values, but the actual hashrates will fall within the 50% confidence interval only half the time, and within the 95% confidence interval most of the time. Please do take the confidence intervals into account when reading the forecasts.

The confidence intervals are based on historical data, so they should be reasonably accurate. The next section contains the accuracy of the last eight weeks of one to eight week forecasts.

3. The errors
As previously mentioned, the forecasts tend to underestimate the network hashrate, as you can see in the table below. Keep in mind that the network hashrate can only be estimated, not measured directly, and the variance in that estimate means the true errors may be slightly different from these. At the current high rate of block-solving, the hashrate estimate error will be approximately +/- 5.5% (with 95% confidence), so any forecast within +/- 5.5% of the estimated average weekly network hashrate is as accurate as possible.
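To give a feel for where an error of that size can come from, here is a rough sketch that treats the number of blocks solved in a week as a Poisson count, so the relative 95% half-width is about 1.96/sqrt(blocks). This is my assumption for illustration; the calculation behind the +/- 5.5% figure is not spelled out here, the block count of 1270 is hypothetical, and the difficulty is assumed constant over the period.

```python
import math

def hashrate_estimate(difficulty, blocks_found, seconds):
    """Average hashes per second implied by blocks found at a fixed difficulty."""
    return difficulty * 2 ** 32 * blocks_found / seconds

def relative_error_95(blocks_found):
    """Approximate 95% half-width as a fraction, treating the block count as Poisson."""
    return 1.96 / math.sqrt(blocks_found)

blocks_at_target = 7 * 24 * 6   # 1008 blocks in a week at one block per ten minutes
blocks_fast_week = 1270         # hypothetical week with faster block-solving

print(relative_error_95(blocks_at_target))
print(relative_error_95(blocks_fast_week))
```

The more blocks solved in the averaging window, the tighter the estimate, so a week of fast block-solving gives a slightly smaller estimation error than a week at the ten-minute target.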

4. What next?
I'll be posting a new weekly average network forecast every Monday. Eventually, I also plan to post a network difficulty forecast after each retarget, and a 14-day daily average network forecast at some point each week - maybe with the weekly average forecast if I can find a way to put them both in the same post without overloading readers. I will also be producing a post on forecasting the BTC exchange rate; whether I make that a weekly post will depend on how useful readers find it.

Please don't forget to comment (or "plus one" if you can't be bothered commenting) if this post was useful to you and you'd like to see more - feedback allows me to determine how I can most usefully spend my time.

organofcorti.blogspot.com is a reader supported blog:

12QxPHEuxDrs7mCyGSx1iVSozTwtquDB3r

Thank you to Rob Hyndman for producing an excellent free online textbook on forecasting.
