0. Introduction
Pool: BitcoinPool
Pool ops: forum members Geebus and FairUser
Publicly identifiable pool operator: No.
Payout method: Standard proportional payout with a "pool hopper tax"
Hoppable? Yes.
Fee: Donations only
Current pool fulltime miner hashrate: 100 Ghps
Current pool hopper maximum hashrate: 700 Ghps
Availability of public statistics:
- The data published on the website is easy to access and covers all rounds since the start of the pool, but does not include shares per round.
- Shares per round is available on request.
- Miner status API available.
- Current pool hashrate.
Are public statistics likely to be reflecting actual data? Unknown, possibly not.
Public perception of pool and pool op: Unknown.
Should I mine here right now? No.
1. Introduction
Why did their share of the network drop significantly so quickly? There are many possible reasons. The list below is by no means exhaustive, and I cannot be sure they are actually contributing factors. If you'd like to read more about the pool's history, I recommend visiting the pool's forum, and both the locked thread and unlocked thread on bitcointalk.org.
First attack (syn flood), and more details on the attack.
Xenon481, quick discussion of issues.
Some responses from FairUser.
Accusation of stats fudging.
Xenon481, users not being paid.
FairUser: problem caused by bug not notifying of invalid/orphaned blocks.
Geebus on IP banning.
2. BitcoinPool's sample statistics vs population statistics
This post's analysis section is a departure from previous posts. I am assuming that readers don't need some of the comparison to generated data charts I used previously, so we'll go straight to the meat and potatoes.
All data for this post is available as a tab delimited file here: http://www.bitbin.it/xtJin0wW
Mean, median, variance, skewness and kurtosis are statistical parameters that we have previously used to analyse the distribution to which a pool's round length random variable belongs. The theoretical geometric distribution statistics and experimental parameters are shown in the table below for the available 907 blocks (911 currently solved blocks, minus 3 outliers and one orphaned block for which no data was provided).
The upper and lower bound for the mean, median, and standard deviation are the 95% confidence intervals for the sample statistics, rather than the population upper and lower bounds. For the skewness and kurtosis the lower and upper bounds are the bias-corrected and accelerated 95% confidence intervals as per D. Wright & J. Herrington.
The 95% confidence intervals for all statistics exclude the theoretical population mean, median, and standard deviation parameters of a geometric distribution where mean = shares per round / D.
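As a rough illustration of the comparison described above, the sketch below computes simple sample statistics and lists the theoretical values they would be compared against. It assumes that shares per round / D is well approximated by an exponential distribution with mean 1 (the continuous limit of a geometric distribution with p = 1/D). The function name `sample_stats` is hypothetical, and the moment estimators use a plain divisor of n rather than any bias correction, so they will not exactly match the bootstrap figures in the table.

```python
import math
import statistics

# Theoretical values for an exponential distribution with mean 1, used here
# as an approximation to (shares per round) / D for geometric round lengths.
THEORETICAL = {
    "mean": 1.0,
    "median": math.log(2),      # about 0.693
    "sd": 1.0,
    "skewness": 2.0,
    "excess_kurtosis": 6.0,
}

def sample_stats(x):
    """Return simple sample statistics (divisor n, no bias correction)."""
    n = len(x)
    mean = statistics.fmean(x)
    # Central moments of order 2, 3 and 4.
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return {
        "mean": mean,
        "median": statistics.median(x),
        "sd": math.sqrt(m2),
        "skewness": m3 / m2 ** 1.5,
        "excess_kurtosis": m4 / m2 ** 2 - 3.0,
    }
```

Calling `sample_stats` on the shares per round / D column of the data file and comparing each entry with `THEORETICAL` gives the point estimates only; the confidence intervals would still need to be bootstrapped separately.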
The Negative Binomial CDF (approximated using the gamma distribution) indicates that only 0.006% of runs of 914 blocks would have a mean shares per round / D larger than BitcoinPool's. This is a very unlikely eventuality.
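The gamma approximation mentioned above can be sketched as follows. The sum of n exponential round lengths with mean 1 follows a gamma distribution with integer shape n, and its upper tail can be written as a Poisson sum, which needs nothing beyond the standard library. The function name `prob_mean_at_least` and its arguments are hypothetical; the observed mean has to be supplied from the table, and this is an illustration of the idea rather than the exact calculation used for the figure quoted above.

```python
import math

def prob_mean_at_least(observed_mean, n_rounds):
    """P(mean of n_rounds iid Exp(1) round lengths exceeds observed_mean).

    The sum S of n Exp(1) variables is Gamma(n, 1). For integer n,
    P(S > x) = sum_{k=0}^{n-1} exp(-x) * x**k / k!  (a Poisson CDF).
    """
    x = observed_mean * n_rounds
    if x <= 0:
        return 1.0
    log_x = math.log(x)
    # Work in log space so the terms do not overflow for large n.
    terms = (
        math.exp(-x + k * log_x - math.lgamma(k + 1))
        for k in range(n_rounds)
    )
    return math.fsum(terms)
```

For example, `prob_mean_at_least(m, 914)` with `m` set to the pool's observed mean shares per round / D returns the fraction of 914-round runs that would be expected to look at least that unlucky.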
3. Quantiles and empirical cumulative distribution function.
As discussed in previous posts, the cumulative distribution function (CDF) of Bitcoin pooled mining round lengths describes the probability that a round will have been solved by the time a given number of shares has been contributed. For example, a CDF value close to 1 at a given round length means that almost all rounds are solved within that many shares. Since shorter rounds are more common, we expect the CDF to rise rapidly, and then level out and approach 1 at larger round lengths.
The comparison plot below on the left is a QQ plot (quantile-quantile) to compare the quantiles from BitcoinPool's distribution of shares per round / D and what that distribution should be theoretically.
Quantiles are points taken at regular intervals from the cumulative distribution function (CDF). The empirical quantiles are from BitcoinPool shares per round / D data, and the theoretical quantiles are calculated using the number of rounds in the dataset, so that there are as many theoretical quantiles as empirical data points, with the same cumulative probabilities. The empirical quantiles are then plotted as a function of theoretical quantiles, and the relationship here should be y = x.
The comparison plot below on the right shows the difference between the theoretical CDF for geometrically distributed variables and the empirical CDF (eCDF) for BitcoinPool round lengths. The eCDF is defined as:
eCDF = n / max(n), where 'n' is the rank of a datapoint in the set of data ordered by size, so that max(n) is the number of datapoints
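A minimal sketch of this definition, and of the eCDF minus theoretical CDF difference shown in the right-hand plot, is given below. It assumes the theoretical CDF of shares per round / D is the exponential approximation 1 - exp(-x); the function names are hypothetical.

```python
import math

def ecdf(data):
    """Return (sorted values, eCDF values), with eCDF = rank / number of points."""
    xs = sorted(data)
    n = len(xs)
    return xs, [(i + 1) / n for i in range(n)]

def exponential_cdf(x):
    """Theoretical CDF of shares per round / D under the Exp(1) approximation."""
    return 1.0 - math.exp(-x)

def cdf_difference(data):
    """Empirical minus theoretical CDF at each sorted data point."""
    xs, es = ecdf(data)
    return [(x, e - exponential_cdf(x)) for x, e in zip(xs, es)]
```

A pool with normal luck would give differences scattered around zero; persistently negative differences mean that fewer rounds than expected have finished by each round length.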

The QQ plot shows that the (shares per round) / D data varies quite significantly from those expected from a geometric distribution. The QQ line has a larger gradient than the expected y = x, and is in fact approximately y = 1.156x - meaning that for the middle 50% of round lengths the mean is 1.156 times greater than expected for an exponential / geometric distribution.
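One way to obtain a QQ line gradient of this kind is sketched below. It makes two assumptions that are not stated in the post: the theoretical quantiles use plotting positions (i + 0.5) / n, and the QQ line is drawn through the first and third quartiles (the convention used by R's `qqline`), which is why the gradient describes the middle 50% of the data. The function names are hypothetical.

```python
import math
import statistics

def exp_quantile(p):
    """Quantile function of the Exp(1) distribution."""
    return -math.log(1.0 - p)

def qq_points(data):
    """Pair each sorted observation with its theoretical Exp(1) quantile."""
    xs = sorted(data)
    n = len(xs)
    theoretical = [exp_quantile((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, xs))

def qq_line_slope(data):
    """Gradient of the line through the first and third quartiles."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    return (q3 - q1) / (exp_quantile(0.75) - exp_quantile(0.25))
```

Data that follows the expected distribution gives a gradient near 1; a gradient above 1 means the central part of the sample is stretched towards longer rounds.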
The eCDF and theoretical CDF are quite different. The shading shows the upper and lower 95% confidence interval bounds for the empirical CDF, and for a large portion of their length these plainly exclude the theoretical (population) CDF.
The eCDF provides only a visual guide. The Anderson - Darling K test provides a more accurate assessment of whether BitcoinPool's shares per round belong to the expected probability distribution.
5. Anderson - Darling K test
This is a non-parametric test of whether a sample is likely to have come from a particular probability distribution. Like the Kolmogorov - Smirnov test it compares the eCDF to the expected probability distribution CDF, but from my reading the ADK test is more accurate than the KS test for large sample sizes (~100). I tested the BitcoinPool shares per round / Difficulty data against ten thousand randomly generated exponential variables.
The hypotheses:
If
H0 = BitcoinPool's round lengths belong to a geometric distribution,
HA = BitcoinPool's round lengths do not belong to a geometric distribution,
Then:
p value = 0.00029
H0, the hypothesis that the BitcoinPool shares per round are geometrically distributed, can be rejected: the confidence intervals for the sample statistics all exclude the expected population parameters, and the ADK test rejects H0 in favour of HA at p < 0.0003.
It is extremely unlikely that BitcoinPool's shares per round are geometrically distributed, and it is very likely that the shares per round are on average abnormally large. This should be of great concern to the BitcoinPool pool ops and miners alike.
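For readers who want to experiment, the sketch below computes the one-sample Anderson - Darling statistic against a fully specified exponential distribution. Note that this is a swapped-in, simpler relative of the k-sample test used above (which compares the data with a simulated exponential sample), so it will not reproduce the p value quoted here. The function name and the `mean` parameter are hypothetical, and all observations are assumed to be strictly positive.

```python
import math

def anderson_darling_exp(data, mean=1.0):
    """One-sample Anderson - Darling statistic A^2 against Exp(mean).

    A^2 = -n - (1/n) * sum_{i=1}^{n} (2i - 1) *
          [ln F(x_i) + ln(1 - F(x_{n+1-i}))], with x sorted ascending
    and F(x) = 1 - exp(-x / mean). Requires every x > 0.
    """
    xs = sorted(data)
    n = len(xs)
    total = 0.0
    for i, x in enumerate(xs, start=1):
        # ln F(x_i); expm1 keeps precision for small x.
        log_f = math.log(-math.expm1(-x / mean))
        # ln(1 - F(x_{n+1-i})) simplifies to -x_{n+1-i} / mean.
        log_sf = -xs[n - i] / mean
        total += (2 * i - 1) * (log_f + log_sf)
    return -n - total / n
```

Larger values of A^2 indicate a worse fit. For a fully specified distribution the commonly tabulated 5% critical value is about 2.49, so a statistic well above that would point the same way as the ADK result.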
4. Has BitcoinPool always been so unlucky?
Below are the 100 block rolling mean and the cumulative mean shares per round / D, together with the population 95% confidence upper and lower bounds. From the rolling mean we note that from about the 480th block solved for the pool (July 2011) the 100 block rolling mean shares per round / D starts to leave the expected 1.0 mean, returning only once, and clearly exceeds the 95% confidence interval upper bound.
The cumulative mean chart shows that the cumulative mean shares per round / D has only been below the expected 1.0 in the first 13 rounds solved, and never again afterward. The cumulative mean shows the same increase beginning around solved block 480, exceeding the 95% confidence interval upper bound permanently just before solved block 600. The cumulative mean from block 700 to now shows a continual and gradual increase.
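The two series discussed in this section can be computed with a few lines, sketched below. The bounds function is an assumption on my part: it uses a normal approximation to the mean of n exponential variables with mean 1 and standard deviation 1, which is a simpler stand-in for exact gamma quantiles and may differ slightly from the bounds drawn in the charts. The function names are hypothetical.

```python
import math
from itertools import accumulate

def rolling_mean(x, window=100):
    """Mean of each consecutive block of `window` rounds."""
    sums = [0.0] + list(accumulate(x))
    return [
        (sums[i + window] - sums[i]) / window
        for i in range(len(x) - window + 1)
    ]

def cumulative_mean(x):
    """Mean of all rounds up to and including each round."""
    return [s / (i + 1) for i, s in enumerate(accumulate(x))]

def approx_mean_bounds(n, z=1.96):
    """Approximate 95% bounds for the mean of n Exp(1) round lengths."""
    half_width = z / math.sqrt(n)
    return 1.0 - half_width, 1.0 + half_width
```

For the rolling mean the bounds are constant (`approx_mean_bounds(100)`), while for the cumulative mean they narrow as more rounds are included (`approx_mean_bounds(k)` at the k-th solved block).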
6. Discussion and conclusions
I first noticed some anomalies in BitcoinPool's published data, and on 10th September 2012 contacted them to discuss my findings. Unfortunately I could not convince them that the data was anomalous in any way, and then Geebus simply stopped responding. I've not heard from them since and so I have decided to publish what I have.
I hope it's clear that I am not implying that Geebus or FairUser have acted dishonestly; however, I can say with 95% confidence that BitcoinPool's shares per round are much longer on average than expected, and we cannot reject the possibility that the shares per round do not belong to the expected statistical distribution. From this I can also say that Geebus and FairUser should be much more concerned about this than they appear.
In the next post I will analyse in more detail where the anomalies lie, and hopefully provide sufficient information for Geebus and FairUser to look for possible problems in their pool. In a later post I will analyse the "pool hopping protection" that the pool features.
Donations help give me the time to investigate pools and write these posts. If you enjoy them or find them helpful, please consider a small bitcoin donation:
12QxPHEuxDrs7mCyGSx1iVSozTwtquDB3r