1st June 2015
0. Introduction
Last week I saw a discussion on bitcointalk.org about the "biggest shares" found at solo.ckpool.org, and where they might be recorded. The quick answer to this is "in the block chain". This is because the difficulty of a share is just another way of describing a blockhash, and all the "biggest shares" - that is, the tiniest blockhashes - will have created blocks. Below is a chart of blockhash (in terms of difficulty) for every block ever produced, in bins of 14 days:
Blockhash difficulty as a multiple of network difficulty is a little more informative:
Why do some periods seem to have more or fewer blocks that have smaller blockhashes (greater multiple of difficulty)? This is an effect of using time instead of block height as a histogram bin. If we bin to (for example) 2016 blocks, the result is quite different:
It's necessary to use a logarithmic y axis and a logarithmic colour scale, or it becomes very hard to visualise the distribution of blockhashes as multiples of difficulty. This suggests a highly kurtotic (sharply peaked, with heavy tails) distribution. Why is this so? Aren't blockhashes just uniformly distributed random numbers? Yes, but the same values transformed to a "difficulty" measure are not.
1. Blockhashes and difficulty
What follows will probably be familiar to most readers, and this isn't really the place for an in-depth discussion for those who don't understand the relationship between blockhashes and difficulty, so I'll keep it simple.
When a miner attempts to create a new block, they take various data (for example the time, a hash of the transactions to be included, a hash of the previous block header, and some random data), assemble it into a block header, and then hash it twice with SHA-256.
The resulting hash of the block header is a 256-bit number (usually written in hexadecimal), and if the value of the hash is less than the maximum allowable hash divided by the network difficulty (the target), the result is a valid block and may be included in the blockchain if accepted by a majority of nodes.
If we divide the hash value by the target (that is, the maximum allowable blockhash divided by the network mining difficulty), then the blockhash of any block in the blockchain is transformed to a uniform random number between 0 and 1, and its reciprocal (target divided by blockhash) is the blockhash as a multiple of the current mining difficulty.
For example, block height 358751:
Blockhash = 000000000000000010b79efadf8d61b82c18fa72f7821d83fcb78ef32d6cce38
= 4.099063e+56
Difficulty = 48807487245
Maximum = 26959535291011309493156476344723991336010898738574164086137773096960
Target = (Maximum / Difficulty) = 5.523647e+56
As a uniform (0,1) random variable: Blockhash / Target = 0.7420935
As a multiple of difficulty: Target / Blockhash = 1.347539
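If you'd like to check these numbers yourself, here's a minimal R sketch (hex_to_num is a throwaway helper made up for this post; double precision loses the low-order bits of the hash, but that doesn't matter for the ratios we care about):

```r
# Convert a 64-digit hex blockhash to a numeric value. Doubles keep only
# ~53 bits of precision, which is plenty for the ratios below.
hex_to_num <- function(h) {
  digits <- strtoi(strsplit(tolower(h), "")[[1]], base = 16L)
  sum(digits * 16 ^ rev(seq_along(digits) - 1))
}

blockhash  <- "000000000000000010b79efadf8d61b82c18fa72f7821d83fcb78ef32d6cce38"
difficulty <- 48807487245
maximum    <- 65535 * 2^208          # maximum allowable hash, ~2.696e67
target     <- maximum / difficulty   # ~5.524e56

hex_to_num(blockhash) / target       # ~0.742, a uniform (0,1) random variable
target / hex_to_num(blockhash)       # ~1.348, the multiple of difficulty
```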
So the blockhash as a multiple of difficulty is an inverse uniform random number. If U is a uniform (0,1) random variate and Y = 1/U, then for y ≥ 1 the upper tail probability is P(Y > y) = 1/y, the lower tail CDF is F(y) = 1 - 1/y, the probability density is the first derivative of the CDF, 1/y², and the quantile for a lower tail probability q is 1/(1 - q).
So, how does the blockhash data from the block chain stack up against theory? So well that you can barely tell the difference between the ECDF and the theoretical CDF, or between the empirical and theoretical probability densities, and the QQ plot is 1:1. Empirical and theoretical data are (for once!) indistinguishable.
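If you'd rather not take my word (or the block chain's) for it, a quick simulation in R shows the same thing. This is just a sketch using base graphics rather than ggplot2, to keep it short:

```r
set.seed(1)
y <- 1 / runif(1e5)                  # simulated blockhashes as multiples of difficulty

# Theoretical lower tail CDF for y >= 1 is F(y) = 1 - 1/y, so the
# q-quantile is 1 / (1 - q). Compare empirical and theoretical quantiles.
q <- seq(0.01, 0.99, by = 0.01)
plot(1 / (1 - q), quantile(y, q), log = "xy",
     xlab = "theoretical quantile", ylab = "empirical quantile")
abline(0, 1)                         # points should sit on the 1:1 line
```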
2. So which block makers have had the tiniest, most difficult blocks?
From the foregoing, it's clear that when difficulty is higher, blockhashes will be smaller, and when difficulty is lower, blockhashes will be larger. So the highest-difficulty (that is, smallest) blockhash is always going to be a recent one, as long as the network hashrate continues to increase.
There is also little point in finding which pool has the largest average multiple of difficulty per blockhash, since the inverse of a uniformly distributed random variate has no finite mean or variance. In simple terms, this is not a distribution to which the "law of large numbers" applies. The inverse uniform distribution is in fact a Pareto distribution (with shape parameter 1); if you want to learn more, I'd start there.
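A quick simulation makes the point: the running mean of simulated multiples of difficulty never settles down, because the occasional enormous value keeps jolting it upwards.

```r
set.seed(2)
y <- 1 / runif(1e6)                  # a million simulated multiples of difficulty

# With no finite mean, the running mean never converges, no matter how many
# blocks you average over.
plot(cumsum(y) / seq_along(y), type = "l", log = "x",
     xlab = "number of blocks averaged", ylab = "running mean")
```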
So, a more interesting question is "Which blockhashes have been the largest multiple of the then-current network difficulty?", and I provide the answer in the plot below. Each point represents the multiple of the network mining difficulty at the time the blockhash was found, and the size of the point represents the total number of blocks found by that block maker.
The bars are the 95% confidence interval of the largest expected multiple of difficulty for a given number of solved blocks.
In case you're wondering how I calculated that: the kth order statistic of n uniformly distributed random variates is a beta distributed random variable, so the distribution of the inverse of the k = 1 (minimum) order statistic of the uniforms is the distribution of the k = n (maximum) order statistic of the n inverse uniform random variates.
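In R that works out to a couple of lines. The function below is just my sketch of the idea (the name max_multiple_ci is made up for this post): since the maximum of n uniform (0,1) variates is Beta(n, 1) distributed, the q-quantile of the largest multiple of difficulty after n blocks is 1 / (1 - qbeta(q, n, 1)).

```r
# 95% interval for the largest multiple of difficulty expected from n blocks.
# The largest of n inverse uniforms has the same distribution as
# 1 / (1 - max of n uniforms), and that maximum is Beta(n, 1) distributed.
max_multiple_ci <- function(n, level = 0.95) {
  q <- c((1 - level) / 2, 1 - (1 - level) / 2)
  1 / (1 - qbeta(q, n, 1))
}

sapply(c(10, 100, 1000, 10000), max_multiple_ci)   # the interval scales roughly with n
```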
You'll see two block makers have data outside the 95% confidence intervals. Given that there are seventy-seven block makers, that is acceptable.
All time tiniest blockhash compared to difficulty: "Unknown", followed by DeepBit, Slush, Discus Fish, and Eligius.
All time tiniest blockhash compared to difficulty and total number of blocks solved: simplecoin.us, followed by "Dot pool" (whoever that turns out to be), Halleychina.com, allinvain and IceDrill.
3. Summary
Having a blockhash at a much larger multiple of difficulty than necessary can't earn you any extra bitcoins, so it's a pointless type of "luck". But it's fun and interesting to see that some block makers have found blockhashes at almost three hundred thousand times the mining difficulty at the time, and that this is independent of the network mining difficulty.
So, guys at solo.ckpool.org: as I write this, your pool has solved 62 blocks. That means the 95% confidence interval for the largest multiple-of-difficulty blockhash you could expect is between 1/(1 - q0) and 1/(1 - q1), where q0 and q1 are the 2.5% and 97.5% quantiles of a Beta(62, 1) distribution, or between about 17 and 2450 times difficulty.
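Just to check that arithmetic in R:

```r
1 / (1 - qbeta(c(0.025, 0.975), 62, 1))   # roughly 17 and 2450
```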
organofcorti.blogspot.com is a reader supported blog:
1QC2KE4GZ4SZ8AnpwVT483D2E97SLHTGCG
Created using R and various packages, especially dplyr, data.table, ggplot2 and forecast.
Recommended reading:
- For help on ggplot2.
- For help on forecasting.
Find a typo or spelling error? Email me with the details at organofcorti@organofcorti.org and if you're the first to email me I'll pay you 0.01 btc per ten errors.
Please refer to the most recent blog post for current rates or rule changes.
I'm terrible at proofreading, so some of these posts may be worth quite a bit to the keen reader.
Exceptions:
- Errors in text repeated across multiple posts: I will only pay for the most recent errors rather than for every single occurrence.
- Errors in chart text: since I don't keep the data that generated the charts, I can't fix errors in them, and so I can't pay for them. Still, they would be nice to know about!
I write in British English.
Comments are switched off until the current spam storm ends.