Saturday 21st February 2015
0. Introduction
There's often a great deal of discussion over who has the largest share of the network, based on the number of blocks solved in some arbitrary period of time. However, there is a significant amount of variance in this sort of data that is often ignored.
1. Sample mean vs population mean
Determining the statistics of a population after being given sample data is a fairly common use of statistical analysis, yet it is almost never applied to block solve rates. Without understanding the difference between the sampled block rate and the actual block rate, conclusions about block solve rates can be incorrect.
For bitcoin mining block solve rates, the sample mean is the number of blocks solved by a block maker in the observed time period, and the population mean can be thought of as the average of the sample means if the time period were repeated an infinite number of times at the same hashrate.
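To make the distinction concrete, here is a minimal simulation sketch. It assumes blocks arrive as a Poisson process (exponential waiting times between blocks), and the population mean of 20 blocks per period, the seed, and the function name are all hypothetical values chosen for illustration.

```python
import random


def simulate_block_count(expected_blocks, rng):
    """Count the blocks found in one time period.

    Time is measured in units of the period, so the arrival rate equals
    the expected number of blocks per period (assumed Poisson process).
    """
    elapsed = rng.expovariate(expected_blocks)  # waiting time to the first block
    count = 0
    while elapsed < 1.0:
        count += 1
        elapsed += rng.expovariate(expected_blocks)  # waiting time to the next block
    return count


rng = random.Random(2015)  # fixed seed so the run is repeatable
population_mean = 20.0     # hypothetical "actual" blocks per period

# Repeat the same period many times at the same hashrate.
samples = [simulate_block_count(population_mean, rng) for _ in range(10000)]
long_run_average = sum(samples) / len(samples)

print("first ten sample means:", samples[:10])
print("smallest and largest:", min(samples), max(samples))
print("average over all periods:", round(long_run_average, 2))
```

Any single period can land well away from 20, but the average over many repeats settles close to it. A real data set only ever gives us one of those periods.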
2. Confidence intervals for the population mean
The population mean is what we want to know, but the data only provides the sample mean. Luckily, for Poisson distributed random variables (which block solve rates approximately are) we can estimate the range in which the population mean is likely to lie. Details for the method of estimation can be found here.
The confidence interval can be thought of as an indication of the confidence one has in the estimate: a 75% confidence interval will include the population mean (the unknown but actual block solve rate) 75% of the time, and a 99% confidence interval will contain the population mean 99% of the time.
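As a rough illustration of how such bounds can be computed, the sketch below uses the exact (Garwood) interval for a Poisson count, found by bisection on the Poisson distribution function. This is an assumption on my part: the method linked above may differ, and the function names are made up for this example.

```python
import math


def poisson_cdf(k, lam):
    """P(X <= k) for a Poisson variable with mean lam, summed in log space."""
    if k < 0:
        return 0.0
    if lam <= 0.0:
        return 1.0
    log_lam = math.log(lam)
    terms = (math.exp(i * log_lam - lam - math.lgamma(i + 1)) for i in range(k + 1))
    return min(1.0, math.fsum(terms))


def solve_mean(k, target):
    """Find the mean at which poisson_cdf(k, mean) equals target.

    The distribution function falls as the mean rises, so bisection works.
    """
    low, high = 0.0, float(k) + 1.0
    while poisson_cdf(k, high) > target:  # widen the bracket until it contains the root
        high *= 2.0
    for _ in range(200):
        mid = (low + high) / 2.0
        if poisson_cdf(k, mid) > target:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


def poisson_interval(blocks, confidence):
    """Exact two-sided interval for the population mean, given an observed count."""
    alpha = 1.0 - confidence
    lower = 0.0 if blocks == 0 else solve_mean(blocks - 1, 1.0 - alpha / 2.0)
    upper = solve_mean(blocks, alpha / 2.0)
    return lower, upper


for level in (0.75, 0.95, 0.99):
    lower, upper = poisson_interval(10, level)
    print(f"{level:.0%} interval for 10 blocks solved: {lower:.2f} to {upper:.2f}")
```

The exact interval is conservative: it contains the population mean at least as often as the stated confidence level, so the bounds are slightly wider than strictly needed.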
In the plot below, the x axis represents the number of blocks solved in any arbitrary time period and the y axis the upper and lower bounds for the listed confidence intervals.
It's a little hard to read the confidence intervals when the solve rate gets large, so the plot below provides the same data, except the population mean confidence intervals are presented as a percentage of the sample mean.
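The way the relative interval narrows can be sketched with the normal approximation to the Poisson distribution, where the interval is roughly the count plus or minus z times its square root. This is a simplification and not necessarily how the plots were made: it is poor for small counts (the lower bound is clamped at zero here), and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def relative_interval_percent(blocks, confidence):
    """Approximate interval bounds as a percentage of the sample mean.

    Uses blocks +/- z * sqrt(blocks), so the relative half-width is
    z / sqrt(blocks). Normal approximation only; unreliable for small counts.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    half_width = 100.0 * z / math.sqrt(blocks)
    return max(0.0, 100.0 - half_width), 100.0 + half_width


for blocks in (1, 10, 100, 1000):
    lower, upper = relative_interval_percent(blocks, 0.95)
    print(f"{blocks:5d} blocks: {lower:6.1f}% to {upper:6.1f}% of the sample mean")
```

Under this approximation the relative width shrinks with the square root of the block count, so a hundred times as many blocks only narrows the interval by a factor of ten.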
3. Summary
- The variance in block solve rates means that confidence intervals must be used to provide information about the population mean (or actual but unknown hashrate / block solve rate).
- Pie charts of block solve rates cannot provide this information, and give as much weight to a block maker that solves one block as to a block maker that solves one hundred blocks, or the same block maker in one hundred times the time.
- So don't put too much, um, confidence in 'proportion of network' pie charts.
There is a follow-up post here, which explains how to improve the accuracy of pie chart proportion of network estimates.
Organofcorti lives!
organofcorti.blogspot.com is a reader supported blog:
1QC2KE4GZ4SZ8AnpwVT483D2E97SLHTGCG
Created using R and various packages, especially data.table and ggplot2.
Find a typo or spelling error? Email me with the details at organofcorti@organofcorti.org and if you're the first to email me I'll pay you 0.01 btc per ten errors.
Please refer to the most recent blog post for current rates or rule changes.
I'm terrible at proofreading, so some of these posts may be worth quite a bit to the keen reader.
Exceptions:
- Errors in text repeated across multiple posts: I will only pay for the most recent errors rather than every single occurrence.
- Errors in chart texts: Since I can't fix the chart texts (since I don't keep the data that generated them) I can't pay for them. Still, they would be nice to know about!
I write in British English.



Your proofreading is fine i dont know about the charts thanks for the info here. The pie charts are only useful for determining who has what amount of the network and where. but i do not think these charts show block solve rate over time (which changes constantly) and i think thats the one missing variable here and might change the whole look of that chart. im not a math guy except for straightfoward problems and i dont have the data that made these either, but all in all tells you what the pie charts don't. good work (if i made myself sound like an idiot please explain where so i can correct myself as im still learning about bitcoin and have only been in this buisness 2 months)
> but i do not think these charts show block solve rate over time (which changes constantly) and i think thats the one missing variable here and might change the whole look of that chart
That's true and I address this in the weekly stats, eg:
http://2.bp.blogspot.com/-xYuO25QpLvc/VOHME_iwUuI/AAAAAAAABXw/-UaKExVFdWY/s1600/2b_pcnetworkPlot2015-02-16.png
so you seem to be experienced with this stuff where i am not. but those circles on the graph are block solve points? or something else? also ive been looking at other info that says bitcoin wont run out of bitcoins to mine until 2140?
1. Each point represents the daily average hashrate.
2. I think you mean that the block reward will only consist of transaction fees some time during the 22nd century?
same person by the way just logged into my google account