Neighbourhood Pool Watch: 8.1 BitcoinPool: Another Bitclockers?

0. Introduction

Pool: BitcoinPool
Pool ops: forum members Geebus and FairUser
Publicly identifiable pool operator: No.
Payout method: Standard proportional payout with a "pool hopper tax"
Hoppable? Yes.
Fee: Donations only
Current pool fulltime miner hashrate: 100 Ghps
Current pool hopper maximum hashrate: 700 Ghps

Availability of public statistics:

Shares per round are not included in the easily available data, but the data that is published is readily accessible on the website and covers all rounds since the start of the pool.
Shares per round is available on request.
Miner status API available.
Current pool hashrate.

Accuracy of public statistics: Good - only three outlier rounds out of nine hundred and eleven needed to be removed from the data.
Are public statistics likely to be reflecting actual data? Unknown, possibly no.
Public perception of pool and pool op: Unknown.

Should I mine here right now? No.

1. Introduction

BitcoinPool solved their first block on 2011-02-24, and became popular quite quickly, reaching an average of 4% of the network hashrate by May 2011. Like Bitlc.net, they then lost that gain just as quickly and by August 2011 were at 1% of the network hashrate and declining. Their current base hashrate is about 0.2 to 0.5% of the network hashrate.

Why did their share of the network drop significantly so quickly? There are many possible reasons. The list below is by no means exhaustive, and I cannot be sure they are actually contributing factors. If you'd like to read more about the pool's history, I recommend visiting the pool's forum, and both the locked thread and unlocked thread on bitcointalk.org.

Confusions over the pool's definition of efficiency (interestingly, only just now becoming a necessity for most pools).

Xenon481's response.

First attack (syn flood), and more details on the attack.

Xenon481, quick discussion of issues.

Some responses from FairUser.

Accusation of stats fudging.

Xenon481, users not being paid.

FairUser: problem caused by bug not notifying of invalid/orphaned blocks.

Geebus on IP banning.

2. BitcoinPool's sample statistics vs population statistics

This post's analysis section is a departure from previous posts. I am assuming that readers don't need some of the comparison to generated data charts I used previously, so we'll go straight to the meat and potatoes.

All data for this post is available as a tab delimited file here: http://www.bitbin.it/xtJin0wW

Mean, median, variance, skewness and kurtosis are statistical parameters that we have previously used to analyse the distribution to which a pool's round length random variable belongs. The theoretical geometric distribution statistics and experimental parameters are shown in the table below for the available 907 blocks (911 currently solved blocks, minus 3 outliers and one orphaned block for which no data was provided).

The upper and lower bound for the mean, median, and standard deviation are the 95% confidence intervals for the sample statistics, rather than the population upper and lower bounds. For the skewness and kurtosis the lower and upper bounds are the bias-corrected and accelerated 95% confidence intervals as per D. Wright & J. Herrington.

The 95% confidence intervals for all statistics exclude the theoretical population mean, median, and standard deviation parameters of a geometric distribution where mean = shares per round / D.

The mean and median appear to be in an approximately correct ratio for a mean of 1.13, but the standard deviation is too low for a mean of 1.13, and, unusually, the confidence intervals for the skewness and kurtosis also exclude the population values.
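The sample statistics discussed above can be reproduced roughly as follows. This is a minimal sketch, not the code used for the table: it substitutes a simple percentile bootstrap for the bias-corrected and accelerated interval, and the `rounds` list is hypothetical generated data standing in for the pool's shares per round / D. The function names are my own. The population values assume round length / D is approximately exponential with mean 1.

```python
import math
import random
import statistics

def skewness(xs):
    """Sample skewness (third standardised moment, population form)."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * s ** 3)

def excess_kurtosis(xs):
    """Sample excess kurtosis (fourth standardised moment minus 3)."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum((x - m) ** 4 for x in xs) / (len(xs) * s ** 4) - 3.0

def percentile_ci(xs, stat, n_boot=1000, alpha=0.05, seed=1):
    """Percentile bootstrap interval (an assumption: simpler than BCa)."""
    rng = random.Random(seed)
    n = len(xs)
    boots = sorted(stat(rng.choices(xs, k=n)) for _ in range(n_boot))
    lower = boots[int(n_boot * alpha / 2)]
    upper = boots[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper

# Population values for an exponential distribution with mean 1.
POPULATION = {
    "mean": 1.0,
    "median": math.log(2),
    "sd": 1.0,
    "skewness": 2.0,
    "excess_kurtosis": 6.0,
}

# Hypothetical stand-in data: 907 generated round lengths, NOT pool data.
_rng = random.Random(42)
rounds = [_rng.expovariate(1.0) for _ in range(907)]

mean_ci = percentile_ci(rounds, statistics.fmean)
print("mean:", statistics.fmean(rounds), "95% CI:", mean_ci)
print("skewness:", skewness(rounds), "excess kurtosis:", excess_kurtosis(rounds))
```

A statistic is flagged when its interval excludes the matching entry in `POPULATION`.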

The Negative Binomial CDF (approximated using the gamma distribution) indicates that only 0.006% of runs of 914 blocks would have a mean shares per round / D larger than BitcoinPool's. This is a very unlikely eventuality.
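The gamma approximation can be sketched with the standard library alone. The sum of N geometric round lengths is negative binomial; divided by D it is approximately gamma distributed with shape N and scale 1. For integer shape the upper tail equals a Poisson CDF, which is what the hypothetical helper `erlang_sf` below computes. The inputs are assumptions: I use the 907 analysed rounds and the rounded mean of 1.13, so the result should only be expected to land in the same region as the quoted figure, not reproduce it exactly.

```python
import math

def erlang_sf(k, x):
    """P(sum of k independent Exp(1) variables > x).

    For integer k this equals P(Poisson(x) <= k - 1). The terms are
    summed in log space to avoid overflow for large k and x.
    """
    log_terms = [i * math.log(x) - x - math.lgamma(i + 1) for i in range(k)]
    peak = max(log_terms)
    return math.exp(peak) * sum(math.exp(t - peak) for t in log_terms)

n_rounds = 907        # rounds analysed (assumed input)
observed_mean = 1.13  # rounded sample mean of shares per round / D

# Probability that n_rounds fair rounds average at least observed_mean.
p_tail = erlang_sf(n_rounds, observed_mean * n_rounds)
print(f"P(mean >= {observed_mean}) over {n_rounds} rounds: {p_tail:.2e}")
```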

3. Quantiles and empirical cumulative distribution function.

As discussed in previous posts, the cumulative distribution function (CDF) of Bitcoin pooled mining round lengths describes the probability that a round will have been solved by the time a given number of shares has been contributed. For example, a large probability at a given round length means that most rounds will have been solved by the time that many shares have been submitted. Since shorter rounds are more common, we expect the CDF to rise rapidly, and then level out and approach 1 at larger round lengths.

The comparison plot below on the left is a QQ plot (quantile-quantile) to compare the quantiles from BitcoinPool's distribution of shares per round / D and what that distribution should be theoretically.

Quantiles are points taken at regular intervals from the cumulative distribution function (CDF). The empirical quantiles are from BitcoinPool shares per round / D data, and the theoretical quantiles are calculated using the number of rounds in the dataset, so that there are the same number of theoretical as empirical data with the same cumulative probabilities. The empirical quantiles are then plotted as a function of theoretical quantiles, and the relationship here should be y = x.
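As a rough illustration of how the QQ pairs are built, the sketch below matches each ordered data point with an exponential quantile at the same cumulative probability. The plotting position (i - 0.5) / n is my assumption, since several conventions exist, and `qq_pairs` is a hypothetical helper name.

```python
import math

def qq_pairs(data):
    """Return (theoretical, empirical) quantile pairs against Exp(1)."""
    n = len(data)
    empirical = sorted(data)
    # Inverse CDF of Exp(1) is -ln(1 - p); p uses the (i - 0.5) / n
    # plotting position (an assumed convention).
    theoretical = [-math.log(1.0 - (i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, empirical))

# Tiny made-up example; real input would be shares per round / D.
for theo, emp in qq_pairs([0.2, 1.7, 0.9, 0.4]):
    print(f"{theo:.3f}  {emp:.3f}")
```

If the data were exponential with mean 1, the pairs would scatter around the line y = x.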

The comparison plot below on the right shows the difference between the theoretical CDF for geometrically distributed variables and the empirical CDF (eCDF) for BitcoinPool round lengths. The eCDF is defined as:

eCDF = n/max(n) where 'n' is the rank of a datapoint in a set of data ordered by size
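In code, that definition and the comparison with the theoretical CDF look roughly like this. It is a minimal sketch: the function names are my own, and the theoretical CDF is the exponential approximation 1 - exp(-x) for round length / D.

```python
import math

def ecdf_points(data):
    """Return (value, eCDF) pairs: rank / number of points, ordered by size."""
    n = len(data)
    return [(x, rank / n) for rank, x in enumerate(sorted(data), start=1)]

def exponential_cdf(x):
    """Theoretical CDF for round length / D, assuming Exp(1)."""
    return 1.0 - math.exp(-x)

def cdf_differences(data):
    """eCDF minus theoretical CDF at each ordered data point."""
    return [(x, f - exponential_cdf(x)) for x, f in ecdf_points(data)]

# Made-up example values, not pool data.
for x, diff in cdf_differences([0.3, 2.1, 0.8, 1.2]):
    print(f"{x:.2f}  {diff:+.3f}")
```

Consistently negative differences mean rounds are running longer than the theoretical distribution predicts.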

The QQ plot shows that the (shares per round) / D data vary quite significantly from those expected from a geometric distribution. The QQ line has a larger gradient than the expected y = x, and is in fact approximately y = 1.156x - meaning that for the middle 50% of round lengths the mean is 1.156 times greater than expected for an exponential / geometric distribution.
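One common way to draw a QQ line is through the first and third quartiles, which is why its gradient describes the middle 50% of the data. The sketch below assumes that convention; I can't confirm it is exactly how the plotted line was fitted. `quartile_slope` is a hypothetical name, and the check data is generated, not pool data.

```python
import math
import statistics

def quartile_slope(data):
    """Gradient of the line through the (Q1, Q3) points of a QQ plot
    against Exp(1). Assumes the quartile-based QQ line convention."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    t1 = -math.log(0.75)  # theoretical first quartile of Exp(1)
    t3 = -math.log(0.25)  # theoretical third quartile of Exp(1)
    return (q3 - q1) / (t3 - t1)

# Generated check: exact exponential quantiles stretched by 1.156.
n = 1000
stretched = [-1.156 * math.log(1.0 - (i - 0.5) / n) for i in range(1, n + 1)]
print(round(quartile_slope(stretched), 3))
```

For data that is exponential but scaled by a constant, the gradient recovers that constant, which is how a value such as 1.156 can be read as "rounds 1.156 times longer than expected".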

The eCDF and theoretical CDF are quite different. The shading shows the upper and lower 95% confidence interval bounds for the empirical CDF, and for a large portion of their length these plainly exclude the theoretical (population) CDF.

The eCDF provides only a visual guide. The Anderson - Darling K test provides a more accurate assessment of whether BitcoinPool's shares per round belong to the expected probability distribution.

5. Anderson - Darling K test

This is a non-parametric test of whether a sample is likely to have come from a particular probability distribution. Like the Kolmogorov - Smirnov test it compares the eCDF to the expected probability distribution CDF, but from my reading the ADK test is more accurate than the KS test for large sample sizes ( ~ 100 ). I tested the BitcoinPool shares per round / Difficulty data against ten thousand randomly generated exponential variables.
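For readers who want to try something similar without a statistics package, here is a sketch of a related but different procedure: the one-sample Anderson - Darling statistic against a fully specified exponential distribution with mean 1, with the p value estimated by simulation. This is not the k-sample test against generated variables described above, so p values will not match exactly. The function names, simulation count and seed are all assumptions.

```python
import math
import random

def anderson_darling_exp(data):
    """One-sample Anderson - Darling A^2 statistic against Exp(1).

    All values must be strictly positive.
    """
    xs = sorted(data)
    n = len(xs)
    total = 0.0
    for i, x in enumerate(xs, start=1):
        log_cdf = math.log(-math.expm1(-x))  # ln F(x_(i))
        log_sf = -xs[n - i]                  # ln(1 - F(x_(n+1-i)))
        total += (2 * i - 1) * (log_cdf + log_sf)
    return -n - total / n

def monte_carlo_p_value(data, n_sims=500, seed=1):
    """Share of simulated Exp(1) samples with A^2 at least as large."""
    rng = random.Random(seed)
    n = len(data)
    observed = anderson_darling_exp(data)
    exceed = sum(
        anderson_darling_exp([rng.expovariate(1.0) for _ in range(n)]) >= observed
        for _ in range(n_sims)
    )
    # Add-one correction keeps the estimate away from exactly zero.
    return (exceed + 1) / (n_sims + 1)

# Generated example: rounds twice as long as expected, NOT pool data.
n = 200
long_rounds = [-2.0 * math.log(1.0 - (i - 0.5) / n) for i in range(1, n + 1)]
print(anderson_darling_exp(long_rounds), monte_carlo_p_value(long_rounds, n_sims=200))
```

A small p value here means the sample is unlikely to have come from the exponential distribution, which is the same logic as rejecting H0 below.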

The hypotheses:

If
H0 = BitcoinPool's round lengths belong to a geometric distribution,
HA = BitcoinPool's round lengths do not belong to a geometric distribution,
Then:

p value = 0.00029

H0, the hypothesis that the BitcoinPool shares per round are geometrically distributed, can be rejected: the 95% confidence intervals of the sample statistics all exclude the expected population parameters, and the ADK test rejects H0 at p < 0.0003.

It is extremely unlikely that BitcoinPool's shares per round are geometrically distributed, and it is very likely that the shares per round are on average abnormally large. This should be of great concern to both BitcoinPool's pool ops and its miners.

4. Has BitcoinPool always been so unlucky?

Below are the 100 block rolling mean and the cumulative mean shares per round / D, together with the population 95% confidence upper and lower bounds. From the rolling mean we note that from about the 480th block solved for the pool (July 2011) the 100 block rolling mean shares per round starts to depart from the expected 1.0 mean, returning only once, and clearly exceeding the 95% confidence interval upper bound.

The cumulative mean chart shows that the cumulative mean shares per round / D has only been below the expected 1.0 in the first 13 rounds solved, and never again afterward. The cumulative mean shows the same increase beginning around solved block 480, exceeding the 95% confidence interval upper bound permanently just before solved block 600. The cumulative mean from block 700 to now shows a continual and gradual increase.
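The rolling and cumulative means in this section can be computed as in the sketch below. The confidence bounds use a normal approximation, 1 +/- 1.96 / sqrt(n), which relies on the standard deviation of round length / D being 1 under the exponential assumption. That is my simplification, and the bounds on the charts may have been calculated differently. All function names are hypothetical.

```python
import math

def rolling_mean(values, window=100):
    """Mean of each consecutive run of `window` values."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value leaving the window
        if i >= window - 1:
            out.append(running / window)
    return out

def cumulative_mean(values):
    """Mean of the first 1, 2, 3, ... values."""
    out = []
    running = 0.0
    for i, v in enumerate(values, start=1):
        running += v
        out.append(running / i)
    return out

def normal_bounds(n, z=1.96):
    """Approximate 95% bounds for the mean of n Exp(1) round lengths."""
    half_width = z / math.sqrt(n)
    return 1.0 - half_width, 1.0 + half_width

# Made-up example values, not pool data.
example = [0.4, 1.9, 0.7, 2.6, 1.1, 0.2]
print(rolling_mean(example, window=3))
print(cumulative_mean(example))
print(normal_bounds(100))
```

For a 100 block window the approximate upper bound is about 1.196, so a rolling mean that stays above that level for long stretches is what the chart is highlighting.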

6. Discussion and conclusions

I first noticed some anomalies in BitcoinPool's published data, and on 10th September 2012 contacted them to discuss my findings. Unfortunately I could not convince them that the data was anomalous in any way, and then Geebus simply stopped responding. I've not heard from them since and so I have decided to publish what I have.

I hope it's clear that I am not implying that Geebus or FairUser have acted dishonestly. However, I can say with 95% confidence that BitcoinPool's shares per round are much longer on average than expected, and we cannot reject the possibility that the shares per round do not belong to the expected statistical distribution. From this I can also say that Geebus and FairUser should be much more concerned about this than they appear.

In the next post I will analyse in more detail where the anomalies lie, and hopefully provide sufficient information for Geebus and FairUser to look for possible problems in their pool. In a later post I will analyse the "pool hopping protection" that the pool features.

Donations help give me the time to investigate pools and write these posts. If you enjoy or find them helpful, please consider a small bitcoin donation:
12QxPHEuxDrs7mCyGSx1iVSozTwtquDB3r

Tuesday, 30 October 2012
