Neighbourhood Pool Watch: How to make a more accurate 'proportion of network' pie chart part 1

Thursday 5th March 2015

Previous post: Why 'proportion of network' pie charts are misleading

0. Introduction

In the last post on this subject, a reader commented:

"i do not think these charts show block solve rate over time (which changes constantly) and i think thats the one missing variable here"

That is true, but at the time it was something I didn't really want to address because it's a bit complicated. This post addresses the idea and shows how the variance in the daily estimate of network proportion can be reduced to less than a tenth of what it is when the proportion is determined by counting the number of blocks solved in a time period.

1. Daily average block rate time series

Below are recent block solve rates for various block makers and for the network as a whole. The cross bar is the observed block rate, and the vertical bars show the extent of the 95% confidence interval for the population mean, which is the (unobserved) block rate intensity function.
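As a rough illustration of how such vertical bars can be computed, here is a minimal Python sketch (the plots in this post were made in R, so this is not the code behind them). It assumes the number of blocks solved in a day is a Poisson count and finds the exact (Garwood) 95% interval for its mean by bisection. The function names are hypothetical.

```python
import math

def poisson_cdf(k, mu):
    """P(X <= k) for X ~ Poisson(mu), summed term by term."""
    if k < 0:
        return 0.0
    term = math.exp(-mu)
    total = term
    for i in range(1, k + 1):
        term *= mu / i
        total += term
    return min(total, 1.0)

def _bisect_decreasing(f, lo, hi, iters=200):
    """Root of a function that decreases from positive to negative on [lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_poisson_ci(count, level=0.95):
    """Exact interval for the mean of a Poisson process given one observed count.

    Assumes counts of at most a few hundred, as for daily block counts.
    """
    alpha = 1 - level
    search_max = count + 10 * math.sqrt(count + 1) + 10  # generous upper search limit
    if count == 0:
        lower = 0.0
    else:
        # Lower limit: the mean at which P(X >= count) equals alpha / 2.
        lower = _bisect_decreasing(
            lambda mu: poisson_cdf(count - 1, mu) - (1 - alpha / 2), 0.0, search_max)
    # Upper limit: the mean at which P(X <= count) equals alpha / 2.
    upper = _bisect_decreasing(
        lambda mu: poisson_cdf(count, mu) - alpha / 2, 0.0, search_max)
    return lower, upper

# Example: a day on which 144 blocks were observed.
low, high = exact_poisson_ci(144)
print(f"95% interval for the mean daily block rate: {low:.1f} to {high:.1f}")
```

The interval is wide relative to the count, which is why a single day's observed block rate is a noisy estimate of the underlying rate.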

If you squint your eyes a bit, you can see that you can draw an imaginary curve that always intersects the vertical lines, even if it doesn't intersect the actual observed rate points. This imaginary line might be a good way to estimate the intensity function, but how do you prove that it's a good estimate?

This has been done before in this paper by Frenkel, Gertsbakh and Khvatskin (based on methods from a 1975 book by E. Çinlar, which I can't find online). The idea is to find a likely non-homogeneous Poisson process (NHPP) intensity function, and then use the method in section 5 of that paper to assess the fit of the estimated intensity function.

A couple of years ago I spent long months working on this problem, and in the process found that:

there is a reliable method for estimating the intensity function,
there aren't enough short duration inter-block time intervals, and
block timestamp errors are not significant enough to cause problems for the method if a daily observed block rate, or a block rate of lesser frequency, is used.

At some point when I have both time and inclination (and as long as Dave Hudson doesn't do it first) I'll publish these details, but for this post the important thing is that certain types of smoothing function, using specific parameters, are a good fit for the intensity function. Smoothers that I've found useful as estimators are a kernel smoother, a simple roughness-penalising smoothing spline, and a generalized additive model with a smoothing term. The first uses a bandwidth-estimating algorithm; the latter two use knot and alpha parameters that I've found best fit the observed block rate data.

I prefer the kernel smoother (it has fewer free parameters), so let's add one to the above plot:

You might notice that there are steps in the intensity function estimate for some of these plots. This is because when a retarget occurs and difficulty changes, the block rate changes immediately. Modelling this effect with a kernel smoother is done by creating a smoothed estimate of the intensity function of (block rate × mean daily difficulty), and then dividing this smoothed estimate by the mean daily difficulty.
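The multiply, smooth, then divide step can be sketched in Python as follows. This is only an illustration: it uses a plain Gaussian (Nadaraya-Watson) kernel smoother with a fixed, hand-picked bandwidth, whereas the estimates in this post use a bandwidth-estimating algorithm that is not reproduced here. The function names and the default bandwidth of 7 days are assumptions.

```python
import math

def gaussian_kernel_smooth(xs, ys, bandwidth):
    """Nadaraya-Watson estimate of ys at each point of xs, using a Gaussian kernel."""
    smoothed = []
    for x0 in xs:
        weights = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
        smoothed.append(sum(w * y for w, y in zip(weights, ys)) / sum(weights))
    return smoothed

def smooth_block_rate(days, blocks_per_day, mean_daily_difficulty, bandwidth=7.0):
    """Difficulty-adjusted intensity estimate (bandwidth is an assumed value).

    1. Multiply the daily block rate by the mean daily difficulty, giving a
       quantity that does not jump at a retarget.
    2. Smooth that quantity.
    3. Divide by the mean daily difficulty, which puts the retarget steps back.
    """
    work = [b * d for b, d in zip(blocks_per_day, mean_daily_difficulty)]
    smooth_work = gaussian_kernel_smooth(days, work, bandwidth)
    return [w / d for w, d in zip(smooth_work, mean_daily_difficulty)]

# Toy example: constant hashrate, difficulty rises by 25% on day 10.
days = list(range(20))
difficulty = [1.0] * 10 + [1.25] * 10
blocks = [150.0] * 10 + [120.0] * 10
estimate = smooth_block_rate(days, blocks, difficulty)
print(estimate[9], estimate[10])
```

In the toy example the estimate keeps the sharp step at the retarget instead of smearing it across neighbouring days, which is what smoothing the raw block rate directly would do.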

Plotting since mid-last year makes this a little more obvious:

I've removed the vertical 95% confidence intervals to make the plots a bit more legible, so each '+' just represents a block solve rate for a particular day. The blue line is the intensity function estimate, the red lines are the 95% confidence interval for the sample (observed) block solve rates.

The 95% confidence intervals are bumpy for two reasons: the step-function change in block rate after a retarget, and the fact that the observed block counts for a Poisson process must be integer values (whereas the intensity function need not be).
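The integer effect can be seen in a short Python sketch. It assumes the daily block count is Poisson with mean equal to the intensity function, and takes the 2.5% and 97.5% quantiles of that distribution as the limits. The function names are hypothetical, and the simple summation only works for intensities up to a few hundred blocks per day.

```python
import math

def poisson_quantile(p, mu):
    """Smallest integer k with P(X <= k) >= p, for X ~ Poisson(mu)."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    if mu > 700:
        raise ValueError("exp(-mu) underflows; this sketch assumes mu <= 700")
    k = 0
    term = math.exp(-mu)
    cdf = term
    while cdf < p:
        k += 1
        term *= mu / k
        cdf += term
    return k

def observed_count_interval(intensity, level=0.95):
    """Integer limits within which a Poisson count with this mean should usually fall."""
    alpha = 1 - level
    return (poisson_quantile(alpha / 2, intensity),
            poisson_quantile(1 - alpha / 2, intensity))

# The limits move in whole-block jumps as the intensity drifts smoothly.
for intensity in (10.0, 10.2, 10.4, 10.6, 10.8, 11.0):
    print(intensity, observed_count_interval(intensity))
```

Because the limits can only take whole-number values, a smoothly changing intensity function produces limits that stay flat and then jump, hence the bumps.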

2. How accurate is the estimated intensity function?

This is the entire point of this faffing around with smoothers and finding 'unknowable' intensity functions: is it more accurate than using a daily average block rate?

The next plot converts the data from block rates to percentage of the network - all data has been divided by the estimated network intensity function, and I'm stating again (without providing the proof) that the smoothed intensity function estimate is a good model of the actual intensity function.

The blue shaded area on the plots below is the 95% confidence interval for the percentage of network blocks solved. The red dashed line is the estimated intensity function, and the red shaded area is the 95% confidence interval for the estimated intensity function, created using bootstrapped residuals.
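For readers unfamiliar with bootstrapped residuals, the general idea can be sketched in Python. This is a generic residual bootstrap and not necessarily the exact procedure used for the plots: a centred moving average stands in for the kernel smoother, and the function names, window size and number of resamples are all assumptions.

```python
import random

def moving_average(ys, half_window=3):
    """Centred moving average; a stand-in for whichever smoother is preferred."""
    out = []
    for i in range(len(ys)):
        lo, hi = max(0, i - half_window), min(len(ys), i + half_window + 1)
        out.append(sum(ys[lo:hi]) / (hi - lo))
    return out

def residual_bootstrap_band(ys, smoother, n_boot=500, level=0.95, seed=0):
    """Pointwise band for a smoothed curve, built from resampled residuals."""
    rng = random.Random(seed)
    fitted = smoother(ys)
    residuals = [y - f for y, f in zip(ys, fitted)]
    curves = []
    for _ in range(n_boot):
        # Build a synthetic series: fitted curve plus residuals drawn with replacement.
        synthetic = [f + rng.choice(residuals) for f in fitted]
        curves.append(smoother(synthetic))
    lo_idx = int((1 - level) / 2 * (n_boot - 1))
    hi_idx = int((1 + level) / 2 * (n_boot - 1))
    lower, upper = [], []
    for i in range(len(ys)):
        column = sorted(curve[i] for curve in curves)
        lower.append(column[lo_idx])
        upper.append(column[hi_idx])
    return fitted, lower, upper

# Toy data: a block maker with a steady 20% share, observed with daily noise.
noise = random.Random(1)
daily_share = [0.20 + noise.gauss(0, 0.03) for _ in range(60)]
fitted, lower, upper = residual_bootstrap_band(daily_share, moving_average)
print(f"day 30: estimate {fitted[30]:.3f}, band {lower[30]:.3f} to {upper[30]:.3f}")
```

Here `daily_share` plays the role of the observed daily block rate divided by the estimated network intensity function. The resulting band describes uncertainty in the smoothed estimate, which is much narrower than the day-to-day scatter of the raw observations.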

As you can see, there is a huge difference between the two - the estimated intensity function is a more consistent, reliable and accurate measure of percentage of the network blocks solved by an arbitrary block maker. The variance in the observed block rates is due to the stochastic nature of the block finding process, but the variance in the estimated intensity function is much more affected by significant and continued changes in hashrate than by the block rate variance.

3. Summary

That's all I have time for right now. I have more details to post and I'll do that in the next few days some time. For the moment:

If you accept the premise that certain smoothers can model the block rate intensity function, then the estimated intensity function has much less variance than the raw observed block rates.
Variance in the observed block rates is due to the stochastic nature of the block finding process, but the variance in the estimated intensity function is much more affected by significant and continued changes in hashrate.
In the last graph, the blue shaded area indicates the error currently in daily block rate pie charts, and the red shaded area the error in pie charts that use the intensity function estimation method.

Organofcorti lives!

Created using R and various packages, especially data.table and ggplot2.
