
Friday, 12 December 2014

Measures of network hashrate centralisation


0. Introduction
I think we're all aware there is significant concern that if block makers (pools and large solo mining farms) are able to create significant proportions of the network's blocks in an arbitrary time period, then they could cause problems for the network.

For example:
  • 50% of the network or more:  a 51% attack becomes a real possibility. This can be performed by one entity, or a colluding group.


1. Measure the largest hashrate
Measuring the proportion of blocks created by the largest identifiable block creator seems to be a good place to start - we want some indicator that a block creator is approaching 50%. The plot below shows the proportion of network blocks created by the largest block creator at the time. The colours indicate which block creator was the largest at that point.

It should be noted that in this and all following measures, the "Unknown" proportion of the network is treated as a single entity (although this is almost certainly not the case), rather than simply removing the data.

The drawback of this method is that it only captures trends in a small section of the network - the largest block creator at a specific time. What about collusion? If the top two block creators colluded, they could perform a 51% attack more easily.
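As a rough sketch of this first measure, the snippet below finds the largest block creator in one time window and its share of the window's blocks. The function name `largest_share` and the pool labels are made up for illustration; they are not from the data behind the plots.

```python
from collections import Counter

def largest_share(creators):
    """Return (creator, share) for the creator with the most blocks.

    `creators` holds one label per block in the time window.
    "Unknown" is counted as a single entity, as in the post.
    """
    counts = Counter(creators)
    name, blocks = counts.most_common(1)[0]
    return name, blocks / len(creators)

# Hypothetical window of 10 blocks (illustrative labels, not real data).
window = ["PoolA"] * 4 + ["PoolB"] * 3 + ["Unknown"] * 2 + ["PoolC"]
name, share = largest_share(window)
print(name, share)  # PoolA 0.4
```

Applying the same function to successive windows would give a time series like the one described above.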


2. Measure the top two largest hashrates
Below is the proportion of the network controlled by the top two block creators at a given point in time (black). The red shaded and dotted line is the largest block creator, the blue shaded and dotted line is the second largest block creator.
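A minimal sketch of the combined share, assuming the blocks in a window have already been counted per creator. The name `top_k_share` and the counts are hypothetical.

```python
def top_k_share(block_counts, k=2):
    """Share of all blocks created by the k largest block creators.

    `block_counts` maps a creator label to its block count in the window.
    """
    ordered = sorted(block_counts.values(), reverse=True)
    return sum(ordered[:k]) / sum(ordered)

# Hypothetical counts for one window (not real data).
counts = {"PoolA": 40, "PoolB": 30, "Unknown": 20, "PoolC": 10}
print(top_k_share(counts, k=2))  # 0.7
```

With `k=1` this reduces to the largest-creator measure of section 1.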





It seems obvious that one or two entities creating 50% of network blocks for even a short period of time is a significant problem. However, the proportion of the network controlled by the largest or the top two largest block creators tells us nothing about the possibility of selfish mining. At this point it becomes necessary to leave such simple measures of centralisation for more general measures of inequality.

3. Gini coefficient
The Gini coefficient is well known and well described in many other places, so I'll keep this short. It's a comparison between a known distribution of ownership and a perfectly equitable distribution of ownership. The larger the coefficient, the more unequal or centralised the control of the network. As an example, the world Gini coefficient for income in 2005 was 0.68, and most first world countries have a distribution of incomes with a Gini coefficient less than 0.5. A list of national income Gini coefficients can be found here.
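For readers who want to try it, here is a small sketch using the standard rank-based formula for the Gini coefficient, applied to per-creator block counts. Which entities are included (and that "Unknown" is one entity) is an assumption of this sketch; the exact calculation behind the plots is not given in the post.

```python
def gini(block_counts):
    """Gini coefficient of a list of non-negative block counts.

    Uses G = 2 * sum(rank_i * x_i) / (n * sum(x)) - (n + 1) / n,
    with the counts sorted in ascending order and ranks starting at 1.
    """
    xs = sorted(block_counts)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one entity with at least one block")
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n

print(gini([10, 10, 10, 10]))  # 0.0  (perfectly even)
print(gini([10, 20, 30, 40]))  # 0.25
```

Note that for n entities the largest possible value is (n - 1) / n, so the result depends on how many entities are counted.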


4. Theil index
The Theil index is another related measure, based on entropy, and I prefer it because it's much simpler to calculate; however, I didn't find much comparison data. The version used here has been rescaled so that the index lies between 0 and 1.
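The post does not say which rescaling was used. One common choice, assumed here, is to divide the Theil T index by its maximum value ln(n) for n entities. The function name `theil_rescaled` is hypothetical.

```python
import math

def theil_rescaled(block_counts):
    """Theil T index of block counts, divided by ln(n) to lie in [0, 1].

    The ln(n) rescaling is an assumption; the post does not specify it.
    """
    n, total = len(block_counts), sum(block_counts)
    if n < 2 or total == 0:
        raise ValueError("need at least two entities and one block")
    mean = total / n
    # Entities with zero blocks contribute nothing (x * ln(x) -> 0 as x -> 0).
    t = sum((x / mean) * math.log(x / mean) for x in block_counts if x > 0) / n
    return t / math.log(n)

print(theil_rescaled([10, 10, 10, 10]))  # 0.0  (perfectly even)
print(theil_rescaled([0, 0, 0, 40]))     # 1.0  (one entity has everything)
```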



Compared to national income Gini coefficients, the bitcoin network hashrate distribution seems very unequal. However, the detrimental effects of income inequalities are quite different from the detrimental effects of unequal network control. Is a Gini coefficient of 0.65 or a Theil index of 0.35 bad, good or indifferent? To be honest, it doesn't really matter - all we need to pay attention to are changes in the measure, and in these last two cases downward trends are good and upward trends bad.


5. The hashrate centralisation index
Although a simple and intuitive interpretation is unnecessary, it can be useful when the only comparison data available is historic. Since 50% control of the network is a significant problem for bitcoin, I thought a useful bitcoin index should include the "50% of the network" figure as a point of comparison, and I came up with the index in the plot below.

It is calculated as follows for any arbitrary time period:
  • Sort the block creators from largest to smallest.
  • Calculate the cumulative sum of blocks created by each entity.
  • The group with the cumulative sum of blocks less than 50% of the total network blocks is denoted group A and the rest as group B. 
  • Paraphrased, group A is the group of the largest block creators that sums to at most 50% of network blocks for the time period, and group B the group of the smallest block creators that sums to 50% or just over.
  • The index is one minus the ratio of the average blocks per entity of group B to the average blocks per entity of group A:
Centralisation index: 
1 - mean(group B blocks per entity) / mean(group A blocks per entity)
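The steps above can be sketched as follows. This follows the worded definition (mean of group B divided by mean of group A), which is the direction consistent with the range of 0% to 100% described for the index. Two details are assumptions of this sketch: group A takes creators while the cumulative sum is at most 50%, and the case where the largest creator alone exceeds 50% (so group A is empty) raises an error, since the post does not say how it is handled.

```python
def centralisation_index(block_counts):
    """1 - mean(group B) / mean(group A) for a list of block counts.

    Group A: largest creators whose cumulative blocks are at most 50%
    of the total (assumed reading). Group B: everyone else.
    """
    counts = sorted(block_counts, reverse=True)
    total = sum(counts)
    group_a, running = [], 0
    for c in counts:
        if 2 * (running + c) > total:  # would push group A past 50%
            break
        running += c
        group_a.append(c)
    group_b = counts[len(group_a):]
    if not group_a or not group_b:
        raise ValueError("index undefined: one of the groups is empty")
    mean_a = sum(group_a) / len(group_a)
    mean_b = sum(group_b) / len(group_b)
    return 1 - mean_b / mean_a

# Hypothetical window: two creators with 25 blocks, ten with 5 blocks each.
example = [25, 25] + [5] * 10
print(round(centralisation_index(example), 3))  # 0.8
```

In the example, group A averages 25 blocks per entity and group B averages 5, so the larger group has five times the average of the smaller group and the index is 1 - 1/5 = 0.8.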

How should the index be intuitively interpreted? At 0%, the index indicates that there is no difference between the larger and smaller groups - that is, the hashrate is distributed evenly. At 100%, there would be only one block-solving entity - complete control and centralisation of the network's block-creating capability. Recently the index has scored ~80%, which means that the group of larger block makers has five times the average hashrate of the group of smaller block makers (1 - 1/5 = 0.8), so collusion and control of the network is much easier for some.


6. Summary
  • Available methods of centralisation measurement include:
    • The proportion of the network controlled by the largest or a group of the larger block creating entities;
    • The Gini coefficient and the Theil index;
    • The bitcoin centralisation index.
  • Of these, the last three are likely most useful to measure general inequality in the network hashrate distribution.
  • The Theil index and the bitcoin centralisation index are the easiest to measure; the Gini coefficient is more widely known.
  • None of the measures suggests a long-term increase in centralisation; the Gini coefficient and Theil index both indicate a decrease in centralisation since the start of the year.



Organofcorti lives!

organofcorti.blogspot.com is a reader supported blog.


2 comments:

  1. I found it very useful to use the co-orphan rate (CoOrp) as a measure of collusion. I defined it as [ Orp(A,B) + Orp(B,A) ] / [ N(A) + N(B) ], where Orp(A,B) is the number of orphan blocks of B that lost a competition to a block from A, and N(A) is the number of blocks mined by A, where both functions are computed over a time window. This index measures the possibility that A and B are colluding. In other words, it measures the possibility that A and B are in fact the same miner. As I defined it, it only gives accurate values if N(A) and N(B) are high. If CoOrp(A,B) is zero, then the two miners never compete. If CoOrp(A,B)=0.02, then the two miners always compete (2% is approximately the average orphan rate). I suppose you can compute CoOrp(X,Y) for every pair of miners having more than 20% for each 60 days and have a tolerably low variance in the estimator. Obviously miners can cheat not to be detected by this method, but that would cost them money, and also they must be aware they are being monitored.

    1. Hi Sergio,

      That is an interesting idea, but I see a problem in implementation. I think you'd need a non-standard client, a monitor that can be in constant connection to the majority of the network at any time. I see far fewer orphans than expected, and I think that's because I live about 10 seconds behind the network median latency and connect to only 8 peers.

      Any suggestions?
