
Thursday, 5 March 2015

How to make a more accurate 'proportion of network' pie chart part 2

Friday 6th March 2015



Previous posts:
Why 'proportion of network' pie charts are misleading
How to make a more accurate 'proportion of network' pie chart part 1

0. Introduction
Last post, I finished with the plot below. 
"The blue shaded area on the plots below is the 95% confidence interval for percentage of network block solved. The red dashed line is the estimated intensity function, and the red shaded area is the 95% confidence interval for the estimated intensity function, created using bootstrapped residuals."




I also mentioned that there was a very large difference between the 95% confidence interval for the block rate data and the 95% confidence interval for the smoothed block rate estimate. In this short post I'll quantify the reduction in variance that comes from using a smoothing spline.


1. Confidence interval comparison #1
If you take the upper and lower 95% confidence interval bounds for both the smoothed intensity estimate and the (Poisson distributed) block rate data, and divide each of them by the smoothed intensity estimate (which we have tested and found to be an adequate model for the unknowable intensity function), then you obtain the plot below.
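The normalisation step described above can be sketched in a few lines. This is a minimal Python illustration rather than the R code used for the plots. The function names are hypothetical, and the intensity of 144 blocks per day (the nominal rate at one block per ten minutes) is an assumed value rather than a figure from the data.

```python
import math

def poisson_quantile(lam, p):
    """Smallest k with P(X <= k) >= p for X ~ Poisson(lam)."""
    pmf = math.exp(-lam)  # P(X = 0)
    cdf = pmf
    k = 0
    while cdf < p:
        k += 1
        pmf *= lam / k    # P(X = k) from P(X = k - 1)
        cdf += pmf
    return k

def relative_interval(lower, upper, estimate):
    """Express interval bounds as a fraction of the intensity estimate."""
    return lower / estimate, upper / estimate

# Assumed intensity for illustration: 144 blocks per day.
intensity = 144.0
lower = poisson_quantile(intensity, 0.025)
upper = poisson_quantile(intensity, 0.975)
rel_lower, rel_upper = relative_interval(lower, upper, intensity)
print(lower, upper, round(rel_lower, 3), round(rel_upper, 3))
```

The same `relative_interval` call would be applied to the bootstrapped bounds of the smoothed estimate, so that both intervals are on a common scale centred on 1.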





2. Confidence interval comparison #2
The last plot is a clear illustration that the Poisson distributed variance is far larger than that of the smoothed intensity estimate. But exactly how much of an improvement is it? In the next plot we divide the intensity estimate's 95% confidence interval by the Poisson distributed block rate 95% confidence interval.



In most cases, the confidence interval for the estimate is only ten to fifteen percent as wide as the Poisson distributed confidence interval, and in all cases the estimate's confidence intervals are much narrower. That is to say, the estimate is more reliable than the raw data. Instead of having a confidence interval of +/- 50%, it would be more like +/- 5%.
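To make that arithmetic concrete, here is a minimal Python sketch (not the R code behind the plots). The relative bounds passed to `width_ratio` are hypothetical values chosen only to show the calculation. The 10% and 15% ratios and the +/- 50% starting point are the figures quoted above.

```python
def width_ratio(smooth_lower, smooth_upper, raw_lower, raw_upper):
    """Width of the smoothed-estimate interval as a fraction of the raw one."""
    return (smooth_upper - smooth_lower) / (raw_upper - raw_lower)

# Hypothetical relative bounds, chosen only to illustrate the calculation.
example_ratio = width_ratio(0.95, 1.05, 0.50, 1.50)

# Apply the ten to fifteen percent ratios to a +/- 50% interval.
raw_half_width = 0.50
narrowed = {ratio: raw_half_width * ratio for ratio in (0.10, 0.15)}

print(round(example_ratio, 3))
for ratio, half_width in narrowed.items():
    print(f"ratio {ratio:.2f}: +/- {half_width:.1%}")
```

A ratio of 0.10 turns +/- 50% into +/- 5%, and a ratio of 0.15 turns it into +/- 7.5%.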


3. Summary
So, that's how you calculate more reliable block rate derived statistics. Whether it's a pie chart or an estimate of the network hashrate, I am not comfortable using data derived from the daily number of blocks made: the variance is just too large. However, I am comfortable using daily statistics if they were calculated using the method described in this post and the last.
  • Using a smoothed intensity estimate allows more confident assessments of statistics derived from daily block rates.
  • Raw daily estimates - eg network hashrate or proportion of network attributable to a block maker - should not be used without understanding that the associated variance will mean a 95% confidence interval (think 'error range') of at least +/- 15% (the network hashrate), but more likely +/- 50% (average sized pool).
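As a rough check on the size of those error ranges, the normal approximation to a Poisson count gives a relative 95% half-width of about 1.96 / sqrt(expected blocks). The Python sketch below is a rule of thumb rather than the method used in these posts. The 144 blocks per day figure is the nominal network rate, and the 10% pool share is a hypothetical example of an average sized pool.

```python
import math

Z95 = 1.96  # two-sided 95% quantile of the standard normal distribution

def poisson_relative_half_width(expected_blocks):
    """Approximate 95% half-width of a Poisson count, relative to its mean."""
    return Z95 / math.sqrt(expected_blocks)

NETWORK_BLOCKS_PER_DAY = 144  # nominal rate: one block per ten minutes
POOL_SHARE = 0.10             # hypothetical pool with 10% of the network

network = poisson_relative_half_width(NETWORK_BLOCKS_PER_DAY)
pool = poisson_relative_half_width(NETWORK_BLOCKS_PER_DAY * POOL_SHARE)

print(f"network: +/- {network:.1%}")
print(f"pool:    +/- {pool:.1%}")
```

Under these assumptions the network figure comes out at roughly +/- 16% and the pool figure at roughly +/- 52%, in line with the ranges given in the summary.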




organofcorti.blogspot.com is a reader supported blog:

1QC2KE4GZ4SZ8AnpwVT483D2E97SLHTGCG



Created using R and various packages, especially data.table and ggplot2.

Find a typo or spelling error? Email me with the details at organofcorti@organofcorti.org and if you're the first to email me I'll pay you 0.01 btc per ten errors.

Please refer to the most recent blog post for current rates or rule changes.

I'm terrible at proofreading, so some of these posts may be worth quite a bit to the keen reader.
Exceptions:
  • Errors in text repeated across multiple posts: I will only pay for the most recent errors rather than every single occurrence.
  • Errors in chart texts: Since I can't fix the chart texts (since I don't keep the data that generated them) I can't pay for them. Still, they would be nice to know about!
I write in British English.










