10th April, 2013
0. Introduction
A while back I was reading an interesting post by forum member Korbman (original here, January 2013 update here). I became especially interested in a set of questions that were only a small part of Korbman's paper:
- What are the statistics of bitcoin miners' hashrates? By statistics I mean the number of miners, the mean and standard deviation hashrate per miner, the distribution of hashrates per miner, and anything else that seems interesting. By miners, I mean the actual end user with as little as an NVIDIA GPU, or as much as an FPGA farm.
- What can be inferred from these data? Can any generalisations be made about miners' hashrates?
- How will pre-ASIC hashrate distributions differ from post-ASIC hashrate distributions?
I planned to obtain, for a particular day, the 24 hour average hashrates for each user account at as many pools as I could, analyse the per pool data, and then do the same for the combined data.
In doing this I am assuming that, as long as I manage to obtain a significant portion of the network hashrate, the combined data will be similar to that of the network as a whole. It turns out this is much harder than Korbman makes it seem - I've been working on this post sporadically ever since.
In the following I use the terms "miner", "hashrate contributor" and "user account" to mean the same thing - the amount of hashrate owned by any one entity (either person or company) at any given pool.
1. Data inaccuracy
Some pools have a "Hall of Fame", a listing of all miners and their average hashrates over a period of time. Other pools do not, so I needed to contact the pool ops directly and ask for the data. The first part of the plan - to have all data from the same day - was impossible. Instead, the data were collected during the week of 20th to 27th January 2013, when the average network hashrate was 21644 Ghps.
The requirement that all hashrates would be averaged over 24 hours was also abandoned. Only p2Pool does this; other pools average miner hashrates over anything from 15 minutes to three hours.
Alone, this would have a significant impact on the accuracy of this study, but there are other sources of inaccuracy:
- Variable difficulty causes increased variance for the hashrates of even the top hashrate user accounts.
  - Standard deviation increased, hashrate per miner probability distribution changed.
- Many miners split their hashes among several pools in order to reduce variance, or to pool hop.
  - Hashrates for user accounts at any given pool are reduced, overall number of miners increased, average and median hashrate per user decreased, hashrate per miner probability distribution changed.
- Short hashrate averaging window.
  - Increased variance / standard deviation.
  - Probability distribution affected: a shorter averaging window means fewer shares will be submitted. Many miners may submit only a small whole number of shares in the window, so their calculated hashrates fall on a handful of discrete values. This makes the probability distribution look discrete rather than continuous, which is how it should appear at this scale.
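To illustrate the last point, here is a minimal sketch of how a pool might turn a share count into a hashrate. It assumes the usual convention that one difficulty 1 (D1) share represents 2^32 hashes on average; the function name, the Mhps unit and the 15 minute window are illustrative choices of mine, not any particular pool's method.

```python
# Assumption: one difficulty-1 (D1) share represents 2**32 hashes on average.
HASHES_PER_D1_SHARE = 2 ** 32


def estimated_hashrate_mhps(d1_shares, window_seconds):
    """Estimate a miner's hashrate in Mhps from D1 shares seen in a window."""
    return d1_shares * HASHES_PER_D1_SHARE / window_seconds / 1e6


# Hypothetical 15 minute averaging window. A slow miner can only submit a
# whole number of shares, so the estimate moves in fixed steps.
short_window = 15 * 60
for shares in range(4):
    print(shares, "shares ->", round(estimated_hashrate_mhps(shares, short_window), 2), "Mhps")

# The step between neighbouring estimates shrinks as the window grows.
step_15min = estimated_hashrate_mhps(1, short_window)
step_24h = estimated_hashrate_mhps(1, 24 * 60 * 60)
print("step size, 15 min window:", round(step_15min, 3), "Mhps")
print("step size, 24 h window:", round(step_24h, 4), "Mhps")
```

With a short window every small miner lands on a multiple of the same step, which is what produces runs of identical hashrates; a 24 hour window makes the step roughly a hundred times smaller.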
2. Pool hashrate distributions.
Keeping the provisos regarding possible and probable sources of error in mind, below is a table showing statistics of the various pools for which I could obtain data.
The subset of user accounts here accounts for 14000/21644 = 64.7% of the network hashes. The relationship of the mean to the median, and of the 5th and 95th percentiles to the minima and maxima respectively, makes me suspect that the user hashrates for each pool will be distributed according to a Pareto distribution.
Note also that the mean hashrates for Bitclockers, Eligius, 50BTC.com and Slush's pool are much lower than for BTCGuild, Itzod, p2Pool and Polmine.
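For anyone wanting to reproduce this kind of per pool summary, the sketch below computes the statistics mentioned above using only Python's standard library. The function name and the sample hashrates are made up for illustration and are not data from any pool.

```python
import statistics


def summarise(hashrates):
    """Return per-pool summary statistics for a list of account hashrates."""
    ordered = sorted(hashrates)
    # quantiles(n=20) returns 19 cut points: the first is the 5th
    # percentile and the last is the 95th.
    cuts = statistics.quantiles(ordered, n=20)
    return {
        "accounts": len(ordered),
        "min": ordered[0],
        "p5": cuts[0],
        "median": statistics.median(ordered),
        "mean": statistics.fmean(ordered),
        "p95": cuts[-1],
        "max": ordered[-1],
        "sd": statistics.stdev(ordered),
    }


# Hypothetical hashrates (Ghps) for ten accounts, with one very large miner.
example = [0.1, 0.2, 0.3, 0.4, 0.5, 0.8, 1.2, 2.0, 5.0, 40.0]
stats = summarise(example)
for name, value in stats.items():
    print(name, round(value, 3))
```

A mean sitting far above the median, and a maximum far above the 95th percentile, is the heavy right tail that hints at a Pareto type distribution.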
Next is a violin plot for the pools above. If you haven't come across a violin plot before, think of it as a vertical density plot, width being proportional to number of users at a given hashrate. The advantage of this type of plot is that it also allows easy comparison of limits (compared to a density plot) and provides a more intuitive understanding of the results than traditional boxplots.
BTCGuild, Itzod, p2Pool and Polmine are similarly distributed - similar hashrate ranges and mode ("typical" hashrate). Eligius and Slush's pool are similarly distributed to each other, with smaller minima and mode. Bitclockers and 50BTC are both strangely bimodal - which is quite unexpected. I'm not sure if this is a feature of these pools or data error.
50BTC.com, Bitclockers and Eligius also have unusual empirical CDFs. Bitclockers has large steps in the CDF - lots of miners with exactly the same hashrates. This is probably due to one (or both) of two reasons:
- Hashrates are being calculated from the number of D1 shares being submitted over a short period of time
- A large number of small GPU or CPU miners that only submit one share in a larger timeframe.
Slush's pool's CDF is also a bit of an outlier. At the time this data was obtained (thanks Slush!) the pool was experiencing significant pool hopping. Although the effects of pool hopping are mitigated somewhat by Slush's score method, pool hopping is still viable (see here for details) but only for a very short period at the start of a round. This in turn means that there will be many more "low hashrate miners" - pool hopping accounts that provide a large amount of hashrate for only a small portion of a round.
Since I have no way to disentangle the pool hoppers from the fulltime miners, and since this post is directed to the hashrate distribution of fulltime miners, this is a problem. Lots of other pools will have miners come and go, but none as consistently and systematically as Slush's pool - and if I'd had data for DeepBit, I'm sure we would have seen similar results.
I have been able to fit the remaining pools to Pareto II distributions quite nicely. Knowing this, I decided to see how well all pools followed the Pareto Principle, which states that in applicable cases ~ 80% of the effects come from ~ 20% of the causes. In the case of hashrate distributions, we want to find the fraction p (where 0 ≤ p ≤ 1/2) such that the top 100*p % of all pool accounts own 100*(1 − p) % of the pool hashrate.
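One way to find p from a list of account hashrates is sketched below. This is my own illustration of the definition above rather than the exact procedure used for this post: it sorts accounts from largest to smallest and picks the cut where the cumulative hashrate share is closest to 1 − p. The function name and the sample data are hypothetical.

```python
def pareto_split(hashrates):
    """Return (p, share): the top fraction p of accounts and the share of
    hashrate they own, at the cut where share is closest to 1 - p."""
    ordered = sorted(hashrates, reverse=True)
    total = sum(ordered)
    n = len(ordered)
    best = None
    running = 0.0
    for k, rate in enumerate(ordered, start=1):
        running += rate
        p = k / n                # fraction of accounts included so far
        share = running / total  # fraction of hashrate they own
        gap = abs(share - (1.0 - p))
        if best is None or gap < best[0]:
            best = (gap, p, share)
    return best[1], best[2]


# Hypothetical hashrates (Ghps) for ten accounts, not real pool data.
example = [40.0, 5.0, 2.0, 1.2, 0.8, 0.5, 0.4, 0.3, 0.2, 0.1]
p, share = pareto_split(example)
print(f"top {100 * p:.0f}% of accounts own {100 * share:.1f}% of the hashrate")
```

If every account had the same hashrate the function would return p = 0.5, the upper bound given above; the more unequal the pool, the smaller p becomes.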
3. Estimating the network hashrate distribution of fulltime GPU/FPGA contributors.
Instead of using every pool, I've combined the hashrates only for those pools whose data do not have:
- An abnormal appearance, errors/approximations, or a large proportion of outliers.
- A large proportion of small hashrates, which would imply a large number of CPU or intermittent GPU miners.
From the analysis in part two I decided to remove the following pools:
- Bitclockers
- Eligius
- 50BTC.com
- Slush's pool
This leaves only the following pools:
- BTCGuild
- Itzod
- p2Pool
- Polmine
The combined hashrate of these last four pools was only 4597 Ghps compared to the actual network weekly average hashrate which was 21644 Ghps - only 21.2% of the network hashrate in total. Extrapolating from this subset to the entire network is going to be yet another source of error - but I really have no way to estimate how large an error it could be. Consider the following as a guide rather than an accurate estimate.
So combining the data and assuming that this subset is a representative sample of the network, below are the hashrate distribution statistics of fulltime GPU/FPGA miners for the pre ASIC network:
These statistics are similar to those of the constituent pools, as expected, and still look to be Pareto distributed.
Since I'm not comparing the density of the hashrate distribution to anything else, a normal density plot rather than violin plots will suffice. Again, the density is similar to the constituent densities (see violin plots for Itzod, p2Pool, BTCGuild and Polmine), and still looks Pareto distributed.
I next fitted the hashrates to a Pareto II curve. The best fit was to a Pareto II curve where mu = 0, alpha ≈ 1.22 and sigma ≈ 0.714. The eCDF shows how close the fit is, except in the lower tail. This is to be expected, since the lower hashrates will show the greatest amount of variation due to short averaging times and variable difficulty shares; the poorer fit there also suggests the presence of part time miners.
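To give a feel for what this fitted curve looks like, here is a small sketch that evaluates a Pareto II CDF with the parameters quoted above and compares it with the eCDF of a simulated sample. It assumes the common parametrisation in which P(X > x) = (1 + (x − mu)/sigma)^(−alpha), with alpha as the shape and sigma as the scale; the fitting software used for this post may define the parameters differently. The sample is drawn from the curve itself, so it only demonstrates the comparison and is not the pool data.

```python
import random

# Fitted values quoted in the text; the parametrisation is an assumption.
MU, ALPHA, SIGMA = 0.0, 1.22, 0.714


def pareto2_cdf(x, mu=MU, alpha=ALPHA, sigma=SIGMA):
    """P(X <= x) for a Pareto II variable; zero at or below the location mu."""
    if x <= mu:
        return 0.0
    return 1.0 - (1.0 + (x - mu) / sigma) ** (-alpha)


def pareto2_quantile(u, mu=MU, alpha=ALPHA, sigma=SIGMA):
    """Inverse of the CDF for 0 <= u < 1."""
    return mu + sigma * ((1.0 - u) ** (-1.0 / alpha) - 1.0)


def ecdf(sample, x):
    """Empirical CDF: the fraction of sample values that are <= x."""
    return sum(1 for value in sample if value <= x) / len(sample)


# Simulate hashrates by inverse transform sampling, then compare the
# empirical CDF with the theoretical one at a few points.
rng = random.Random(1)
sample = [pareto2_quantile(rng.random()) for _ in range(5000)]
for x in (0.25, 1.0, 5.0):
    print(x, "eCDF:", round(ecdf(sample, x), 3), "CDF:", round(pareto2_cdf(x), 3))
```

With real data, the same comparison would show where the observed hashrates depart from the fitted curve, for example in the lower tail.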
If we can continue to extrapolate from this dataset to the network as a whole, the Pareto Principle holds for the whole network - 21% of the miners own 79% of the network hashrate.
5. Conclusions:
- Hashrates are distributed amongst contributors as a Pareto II random variable.
- Hashrate ownership percentages follow the Pareto Principle: ~ 20 percent of pool accounts contribute ~ 80% of the hashrate.
- Minimum hashrates apparently vary by groups - four pools have hashrate averages per contributor approaching 2 Ghps, and some have averages less than a quarter of that.
6. Next:
Creating a sample of fulltime miners also had another purpose - to allow an estimate of the post-ASIC network hashrate at the point when the next plateau in hashrate occurs. I will do just that, and also explain how you can estimate this for yourself, in the next post: 12.2 Predicting post-ASIC network hashrates.
Donations help give me the time to analyse bitcoin mining related issues and post the results. If you enjoy these posts or find them helpful, please consider a small bitcoin donation:
12QxPHEuxDrs7mCyGSx1iVSozTwtquDB3r
Thank you to bitcointalk.org forum member Korbman for inspiring this post.








