
Monday 25 May 2015

May 24th 2015 Network Statistics



Changelog:

Errorlog:
  • Fixed smoothing for inequality estimates.
  • Fixed USD price estimates.

Notifications:
  • Nil.

0. New centralisation measures included
Brett Winton (Director of Research at ark-invest.com) contacted me this week to suggest I assess the HHI, or Herfindahl index. After reading a bit about it and its inverse, Gamma diversity (used by ecologists), I think I prefer this measure and its inverse to the others because they have a much more intuitive meaning.

To quote Brett: 
Note that HHI theoretically captures the equivalent share that would be enjoyed by equal-sized firms in the marketplace, so 1/HHI equals the equivalent number of competitive firms. (ie at .11 the market share structure is the equivalent of having 9 equal-sized miners.) 
This seems a more meaningful way to understand centralisation and inequality in mining, where a small number of species (block makers) compete over a limited resource (blocks).
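A minimal Python sketch of the idea, assuming the input is a hypothetical list with one block-maker label per solved block (not the data format used for these charts). The HHI is the sum of each block maker's squared share of blocks, and its inverse is the equivalent number of equal-sized block makers:

```python
from collections import Counter


def herfindahl(block_makers):
    """Sum of squared shares of blocks solved by each block maker."""
    counts = Counter(block_makers)
    total = sum(counts.values())
    return sum((n / total) ** 2 for n in counts.values())


def equivalent_block_makers(block_makers):
    """1/HHI: how many equal-sized block makers would give the same HHI."""
    return 1 / herfindahl(block_makers)


# Hypothetical period: nine block makers each solving 16 of 144 blocks.
equal_split = [f"maker{i}" for i in range(9) for _ in range(16)]
print(round(herfindahl(equal_split), 3))               # 0.111
print(round(equivalent_block_makers(equal_split), 1))  # 9.0

# Hypothetical uneven period: one maker solves 3 of every 4 blocks.
uneven = ["A"] * 3 + ["B"]
print(herfindahl(uneven), equivalent_block_makers(uneven))  # 0.625 1.6
```

Brett's example follows directly: an HHI of 0.11 gives 1 / 0.11 ≈ 9.1, or roughly nine equal-sized miners.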

1. Network hashrate has sudden drop
It doesn't look like variance, so some judicious monitoring is required. Of course, I have no idea what could cause the hashrate to drop by ~5% in a few days, so (weirdly enough) I'm hoping the change is network-wide "bad luck".

2. Still only 400 kB average block size
Is 400 kB some kind of network-wide hard limit? We're not really getting past that size as an average. I have some ideas why this might be so, but I still have some work to do.

3. Typical pooled miner size approaches 35 Thps.
The typical pooled miner's share of the network has been slowly falling and is now approaching a proportion of 0.0001 (0.01%) of the network, which means the typical miner now owns 35 Thps. You can't see this, but I'm looking sadly at an old GPU and some gridseeds.




The network hashrate
The plots below show the network hashrate since block height 1, for the last year and for the last six months. The mean estimate is calculated using the daily average hashrate.



The second and third charts also include confidence intervals for the hashrate, the mean hashrate estimate, and a 28-day forecast estimate.
  • The dashed line is the mean hashrate estimate.
  • The grey shaded area is the 95% confidence interval for the mean hashrate estimate.
  • The dotted line is the 95% confidence interval for daily hashrate averages, given the mean hashrate estimate, so 95% of the large grey dots (average daily hashrate) should be within the dotted line.
  • The blue shaded areas are the confidence intervals for the forecast.
  • Forecast confidence intervals are bootstrapped.
You'll notice that the mean forecast is not given, just the confidence intervals. The reason for this is that in the past people have focussed on the mean forecast, but I think the range of values the network hashrate could take is much more important.
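The dotted-line interval for daily averages can be sketched if we assume (an assumption, not necessarily how these charts were made) that each day's average hashrate is estimated from the number of blocks found that day, so that at a given mean hashrate the daily block count is Poisson distributed. Roughly difficulty × 2^32 hashes are needed per block, which converts a block count back into a hashrate. The difficulty value below is hypothetical:

```python
import math

SECONDS_PER_DAY = 86400


def poisson_quantile(lam, p):
    """Smallest k such that P(X <= k) >= p for X ~ Poisson(lam)."""
    k = 0
    pmf = math.exp(-lam)  # P(X = 0); fine for lam up to a few hundred
    cdf = pmf
    while cdf < p:
        k += 1
        pmf *= lam / k    # P(X = k) from P(X = k - 1)
        cdf += pmf
    return k


def daily_hashrate_interval(mean_hashrate, difficulty, level=0.95):
    """Range expected to hold `level` of daily average hashrate estimates,
    assuming each estimate is blocks_found * difficulty * 2**32 / 86400."""
    hashes_per_block = difficulty * 2**32
    lam = mean_hashrate * SECONDS_PER_DAY / hashes_per_block  # expected blocks/day
    alpha = (1 - level) / 2
    lo_blocks = poisson_quantile(lam, alpha)
    hi_blocks = poisson_quantile(lam, 1 - alpha)
    rate_per_block = hashes_per_block / SECONDS_PER_DAY  # hashrate per daily block
    return lo_blocks * rate_per_block, hi_blocks * rate_per_block


# Hypothetical: difficulty chosen so the mean hashrate gives exactly 144 blocks/day.
difficulty = 4.7e10
mean_hashrate = difficulty * 2**32 / 600
low, high = daily_hashrate_interval(mean_hashrate, difficulty)
print(f"{low / mean_hashrate:.3f} to {high / mean_hashrate:.3f} of the mean")
```

At about 144 blocks a day the band is roughly ±16% of the mean, which is why individual days can stray well away from the mean estimate.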








Miner profitability and forecast
  • The first plot below shows the weekly miner income and cumulative miner income for the past 52 weeks. 
  • The second plot shows the weekly miner income for the past 26 weeks with an eight week forecast.
  • The third plot shows the eight-week forecast of cumulative miner income.
  • Forecast confidence intervals are bootstrapped.

Again, the mean forecast is not given, for the same reasons I gave previously. An eight-week forecast is possible because these are weekly summary statistics; for daily summary statistics (such as those above) only a four-week forecast is possible with any accuracy.
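The forecasting model isn't spelled out here, so the following is only a sketch of one common way to bootstrap forecast intervals: treat log weekly income as a random walk, resample the observed week-to-week log changes with replacement to simulate many eight-week paths, and read the intervals off the simulated percentiles. The model, the function names and the input figures are all assumptions:

```python
import math
import random


def percentile_interval(values, level):
    """Simple percentile interval from a list of simulated values."""
    values = sorted(values)
    alpha = (1 - level) / 2
    n = len(values)
    return values[int(alpha * (n - 1))], values[int((1 - alpha) * (n - 1))]


def bootstrap_income_forecast(weekly_income, horizon=8, n_sims=5000,
                              level=0.95, seed=1):
    """Bootstrapped (low, high) intervals for each forecast week's income and
    for the income accumulated over the forecast period up to that week."""
    rng = random.Random(seed)
    logs = [math.log(x) for x in weekly_income]
    steps = [b - a for a, b in zip(logs, logs[1:])]  # observed log changes
    paths = []
    for _ in range(n_sims):
        current, path = logs[-1], []
        for _ in range(horizon):
            current += rng.choice(steps)  # resample a past weekly change
            path.append(math.exp(current))
        paths.append(path)
    weekly = [percentile_interval([p[h] for p in paths], level)
              for h in range(horizon)]
    cumulative = [percentile_interval([sum(p[:h + 1]) for p in paths], level)
                  for h in range(horizon)]
    return weekly, cumulative


# Hypothetical weekly income figures.
history = [2.10, 2.05, 2.12, 1.98, 1.95, 1.99, 1.90, 1.87, 1.92, 1.85]
weekly, cumulative = bootstrap_income_forecast(history)
for week, (lo, hi) in enumerate(weekly, start=1):
    print(f"week {week}: {lo:.2f} to {hi:.2f}")
```

Adding the income already earned to the cumulative intervals gives a forecast of total cumulative income.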





Transaction fees
Transaction fees are often overlooked by miners but will become very important for them - as the block reward decreases, transaction fees must necessarily go some way toward ameliorating the loss in block reward.

However, as can be seen in the top facet of the second plot below, transaction fees per block are not increasing (or even maintaining) their percentage of the block reward.

The lower facet plots the percentage of the maximum possible block size used.
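Both facets are simple ratios. As a sketch, assuming "block reward" here means the 25 BTC block subsidy in force in 2015 and that the maximum block size is the 1,000,000 byte consensus limit of the time, with hypothetical per-block inputs:

```python
from statistics import mean

BLOCK_SUBSIDY_BTC = 25.0          # subsidy between the 2012 and 2016 halvings
MAX_BLOCK_SIZE_BYTES = 1_000_000  # consensus block size limit in 2015


def fee_percent_of_subsidy(fees_btc, subsidy_btc=BLOCK_SUBSIDY_BTC):
    """Transaction fees in a block as a percentage of the block subsidy."""
    return 100 * fees_btc / subsidy_btc


def percent_block_size_used(size_bytes, max_size=MAX_BLOCK_SIZE_BYTES):
    """Block size as a percentage of the maximum possible block size."""
    return 100 * size_bytes / max_size


# Hypothetical blocks: (total fees in BTC, size in bytes).
blocks = [(0.08, 350_000), (0.12, 450_000), (0.10, 400_000)]
print(mean(fee_percent_of_subsidy(f) for f, _ in blocks))
print(mean(percent_block_size_used(s) for _, s in blocks))
```

An average block of 400 kB (taking 1 kB as 1,000 bytes) therefore uses 40% of the available space.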







Estimated mean and median miner hashrate
This estimate is actually the mean and median percentage of the network contributed by each miner.

Standard error has been calculated by bootstrap resampling of the data, and is shown by the shaded area.
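As an illustration of bootstrap standard errors (a generic sketch, not the exact procedure behind the chart), assume a hypothetical list of each miner's percentage of the network. Resampling it with replacement many times and taking the standard deviation of the recomputed mean or median gives the standard error:

```python
import random
import statistics


def bootstrap_se(data, stat, n_boot=2000, seed=1):
    """Bootstrap standard error of `stat` (e.g. mean or median) of `data`."""
    rng = random.Random(seed)
    n = len(data)
    replicates = [stat(rng.choices(data, k=n)) for _ in range(n_boot)]
    return statistics.stdev(replicates)


# Hypothetical per-miner shares of the network, in percent.
shares = [0.0004, 0.0009, 0.0011, 0.0015, 0.0021, 0.0030, 0.0052, 0.0110]
for name, stat in [("mean", statistics.mean), ("median", statistics.median)]:
    print(name, stat(shares), "+/-", bootstrap_se(shares, stat))
```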





Estimated number of miners
The known number of miners is calculated using the miner hashrate distribution that some pools provide. It is shown by the dashed line, with colour indicating the percentage of the network that those miners make up.

The estimated number of miners uses a model to estimate the number of miners at pools that do not provide such data. I will be attempting to optimise the model regularly, so this week's plot may not be the same as last week's.

Standard error for the estimate has been calculated by bootstrap resampling of the model data, and is shown by the shaded area.




Inequality measures
General inequality between block makers (facet 1)
Previously, I have described inequality measures. The two general inequality measures, the Gini coefficient and the Theil index, measure inequality between block makers. They are minimised when all block makers solve a similar number of blocks over a period of time, and maximised if only one of many block makers solves all the blocks for a given period of time (since we know that bitcoin mining is a stochastic process in which variance can be significant, a reasonable time period should be chosen).

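The Herfindahl index (the sum of the squared shares of blocks solved by each block maker) theoretically captures the equivalent share that would be enjoyed by equal-sized firms in the marketplace, so its inverse is the equivalent number of equal-sized block makers.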
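For reference, a minimal sketch of the two general measures using their standard textbook definitions, applied to a hypothetical list of block counts per block maker for one period. Block makers with zero blocks contribute nothing to the Theil sum (0 × log 0 is taken as 0):

```python
import math


def gini(counts):
    """Gini coefficient: sum of all pairwise absolute differences over 2 n^2 mean."""
    n = len(counts)
    mu = sum(counts) / n
    total_diff = sum(abs(a - b) for a in counts for b in counts)
    return total_diff / (2 * n * n * mu)


def theil(counts):
    """Theil T index; block makers with zero blocks contribute 0."""
    n = len(counts)
    mu = sum(counts) / n
    return sum((x / mu) * math.log(x / mu) for x in counts if x > 0) / n


equal = [36, 36, 36, 36]        # every block maker solves the same number
one_takes_all = [0, 0, 0, 144]  # one block maker solves everything
print(gini(equal), theil(equal))                  # 0.0 0.0
print(gini(one_takes_all), theil(one_takes_all))  # maxima for n = 4: (n-1)/n and log(n)
```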

Inequality between groups: smaller block makers and larger block makers (facet 2)
I'm using two ways to illustrate inequality between the half of the network with the highest concentration of hashrate, and the half of the network with the lowest concentration of hashrate.
Mining centralisation index = 1 - mean(Sblocks) / mean(Lblocks)
Sblocks = number of blocks solved by small block makers
Lblocks = number of blocks solved by large block makers

This index measures the inequality between two groups: the half of the network with the highest concentration of hashrate, and the half of the network with the lowest concentration of hashrate. It can be interpreted as:

Large to small density ratio = 1 / (1 - centralisation index)

For example, an index of 80% means that the average larger pool has 1 / (1 - 0.8) = 5 times the proportion of the network held by the average smaller pool.
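The index and the density ratio can be checked numerically. In this sketch the split into the smaller and larger halves of the network is assumed to have been done already; the inputs are hypothetical lists of blocks solved per block maker in each group:

```python
from statistics import mean


def centralisation_index(small_blocks, large_blocks):
    """Mining centralisation index = 1 - mean(Sblocks) / mean(Lblocks)."""
    return 1 - mean(small_blocks) / mean(large_blocks)


def density_ratio(index):
    """Large to small density ratio = 1 / (1 - centralisation index)."""
    return 1 / (1 - index)


# Hypothetical: five small makers with 2 blocks each, three large with 10 each.
small, large = [2, 2, 2, 2, 2], [10, 10, 10]
idx = centralisation_index(small, large)
print(round(idx, 3), round(density_ratio(idx), 3))  # 0.8 5.0
```

The ratio reduces to mean(Lblocks) / mean(Sblocks), which is why 1 / (1 - index) reads as "how many times larger".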

Mining centralisation index 2 = Sh * (log(Sh) - log(Sn)) + Lh * (log(Lh) - log(Ln))

Sh = Sblocks/(Sblocks + Lblocks)
Sn = No. small pools/(No. small pools + No. large pools)
Lh = Lblocks/(Sblocks + Lblocks)
Ln = No. large pools/(No. small pools + No. large pools)

This also has a range from maximum equality at 0 to maximum inequality at 1, but does not have an intuitive meaning (except that lower is better). 
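Index 2 has the form of a relative entropy (Kullback-Leibler divergence) between the groups' shares of blocks and their shares of pools, so it is zero exactly when each group solves blocks in proportion to its number of pools. A sketch, assuming natural logarithms (the base used for the charts isn't stated) and hypothetical counts:

```python
import math


def _term(share, baseline):
    """share * (log(share) - log(baseline)), taking 0 * log(0) as 0."""
    return 0.0 if share == 0 else share * (math.log(share) - math.log(baseline))


def centralisation_index_2(s_blocks, l_blocks, n_small, n_large):
    sh = s_blocks / (s_blocks + l_blocks)  # small group's share of blocks
    lh = l_blocks / (s_blocks + l_blocks)  # large group's share of blocks
    sn = n_small / (n_small + n_large)     # small group's share of pools
    ln_ = n_large / (n_small + n_large)    # large group's share of pools
    return _term(sh, sn) + _term(lh, ln_)


print(centralisation_index_2(500, 500, 40, 40))  # 0.0: blocks match pool counts
print(centralisation_index_2(500, 500, 70, 10))  # > 0: few large pools, half the blocks
```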

Below, the two general and two grouped inequality measures have been plotted. The Gini coefficient and the Theil index are quite similar, as are Mining centralisation indices 1 and 2.



General inequality between block makers: Gamma diversity
The Gamma diversity with q = 2 is equal to the inverse of the Herfindahl index, and in this case equals the equivalent number of competitive firms (here, block makers).
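Assuming "Gamma diversity with q = 2" refers to the Hill number (effective number) of order q computed over all block makers together, the general formula is (Σ p_i^q)^(1/(1-q)), with q = 1 defined as the exponential of Shannon entropy. A sketch with hypothetical block counts:

```python
import math


def hill_number(counts, q):
    """Effective number of block makers of order q from block counts."""
    total = sum(counts)
    shares = [c / total for c in counts if c > 0]
    if q == 1:  # limit as q -> 1: exponential of Shannon entropy
        return math.exp(-sum(p * math.log(p) for p in shares))
    return sum(p ** q for p in shares) ** (1 / (1 - q))


counts = [72, 24, 24, 24]  # hypothetical blocks per block maker
for q in (0, 1, 2):
    print(q, round(hill_number(counts, q), 3))
# q = 0 simply counts block makers; q = 2 equals 1 / HHI
```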



Inequality between groups: Public mining pools and non-pool block makers.
Another concern many people have is that public mining pools have a decreasing share of the network. Public mining pools are reliant on miners in order to make blocks and distribute rewards, and a pool with fewer miners has greater income variance.

This means that if a pool were doing something to the block chain that miners didn't like (anything from incorporating graffiti into the block chain - some of my favourite graffiti here - to Selfish Mining), miners could choose to leave the pool. Non-pool block makers might face fewer restrictions on their actions, which could be a problem for the network.

There are a number of different ways to analyse this, but I went with something quite simple:

Public mining pools % network  = P  /  N
P = no. of blocks attributable to public mining pools in some period of time
N = no. of blocks solved by network in same period of time.
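Tracking P / N over time only needs a label per block. A sketch, assuming a hypothetical list of booleans (True when a block was solved by a public pool) and a trailing window measured in blocks; the window length is an arbitrary choice:

```python
def rolling_public_share(is_public_pool, window):
    """Public mining pools % network over each trailing window of blocks."""
    if len(is_public_pool) < window:
        return []
    running = sum(is_public_pool[:window])  # True counts as 1
    shares = [100 * running / window]
    for i in range(window, len(is_public_pool)):
        # Slide the window: add the newest block, drop the oldest.
        running += is_public_pool[i] - is_public_pool[i - window]
        shares.append(100 * running / window)
    return shares


print(rolling_public_share([True, True, False, True, False, False], 2))
# [100.0, 50.0, 50.0, 50.0, 0.0]
```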

This is quite simple to understand. If you worry about mining pools disappearing, then the fact the line is slowly heading toward 50% won't help you sleep at night.




organofcorti.blogspot.com is a reader supported blog:

1QC2KE4GZ4SZ8AnpwVT483D2E97SLHTGCG



Created using R and various packages, especially dplyr, data.table, ggplot2 and forecast.




Thank you to blockchain.info and blocktrail.com for use of their transaction and address data, and coincadence.com for their p2pool miner data.

Find a typo or spelling error? Email me with the details at organofcorti@organofcorti.org and if you're the first to email me I'll pay you 0.01 btc per ten errors.

Please refer to the most recent blog post for current rates or rule changes.

I'm terrible at proofreading, so some of these posts may be worth quite a bit to the keen reader.
Exceptions:
  • Errors in text repeated across multiple posts: I will only pay for the most recent errors rather than for every single occurrence.
  • Errors in chart texts: Since I can't fix the chart texts (I don't keep the data that generated them), I can't pay for them. Still, they would be nice to know about!
I write in British English.


2 comments:

  1. Under #2 you mention 400 MB blocks. I'm assuming you meant KB? ;-)


Comments are switched off until the current spam storm ends.