Neighbourhood Pool Watch: 4.1 Slush's pool

0. Introduction

Pool: mining.bitcoin.cz aka Slush's pool
Pool op: Slush
Publicly identifiable pool operator: Yes, Marek Palatinus is the pool operator.
Payout method: Exponentially scored proportional payout
Hoppable? Yes - at the moment.
Fee: 2%
Current pool hashrate: 1100 Ghps
Availability of public statistics:

All rounds since the start of the pool are available on request.
Miner submitted shares not available, miner score is.
Current pool hashrate.
Handy theoretical vs empirical CDF chart, based on all rounds since the start of the pool.
"Daily Pool Payout" chart - limited usefulness, but interesting.

Accuracy of public statistics: No obvious errors, and only one significant hash rate vs block duration outlier.
Are public statistics likely to be reflecting actual data? Yes
Public perception of pool and pool op: Very good.

Should I mine here right now? Yes, although there are some drawbacks caused by the current payout method.

Should I mine here after the payout method change to DGM? Yes.

Note: On Friday 13th April local time, Slush gave a timeframe of a few days for a changeover to a DGM payout method, which is provably fair and quite flexible. If his changeover estimate is accurate, the new payout method should be online sometime next week. Please keep the changeover in mind during my discussion of the exponential payout system.

Slush's pool is the longest lasting bitcoin mining pool and at the moment the third largest in terms of total pool hashrate. It opened to the public in October 2010, fully two months before Deepbit.net started, and for the last twelve months has had a pool hashrate between one and two thousand gigahashes per second. Miners posting on the Slush's pool bitcointalk forum praise Slush for the pool's reliability and Slush's very quick help with their mining problems. I am also a fan of Slush's pool, and I'll try to not let that affect my objectivity here.

There is only one drawback to mining at Slush's pool, and that is the exponential scoring method used. This new payout method was brought online in February 2011 by Slush in response to proof that in order to maximize mining profits, a pooled miner should leave a round at 0.43 x D (D = bitcoin mining difficulty).

The basic idea of the score system is to reduce the value of shares earlier in the round by making shares later in the round exponentially more valuable. For every submitted share the pool assigns it a 'score' as follows:

score = score + exp(round_time/c)

Rewards are then calculated proportionally:

reward = user score / total score * 50

Score is reset each round, and renormalised each hour within a round. Slush made the method, its necessity and its drawbacks quite clear to his miners and has been open and honest about it (in contrast to Bitclockers.com).
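The scoring rule above can be sketched in Python as follows. This is only an illustration, not Slush's actual implementation: the function name score_rewards, the format of the share list and the value of c in the example are assumptions. Only the formula score = score + exp(round_time/c) and the 50 BTC block reward come from the text, and the hourly renormalisation is left out.

```python
import math
from collections import defaultdict

BLOCK_REWARD = 50.0  # BTC per block, as in the reward formula above


def score_rewards(shares, c):
    """Exponentially scored proportional payout for one round.

    shares: list of (miner, seconds_since_round_start) tuples (hypothetical format).
    c: time constant in seconds (hypothetical value chosen by the caller).
    Returns a dict mapping each miner to their reward in BTC.
    """
    scores = defaultdict(float)
    for miner, round_time in shares:
        # Each share adds exp(round_time / c) to the submitting miner's score.
        scores[miner] += math.exp(round_time / c)
    total = sum(scores.values())
    # Rewards are proportional to score rather than to share count.
    return {miner: BLOCK_REWARD * s / total for miner, s in scores.items()}


# Two miners submit one share each, ten minutes apart, with c = 300 seconds.
example = score_rewards([("early", 0.0), ("late", 600.0)], c=300.0)
```

With these made-up numbers the later share earns most of the block, even though both miners submitted a single share. This is how the method makes early shares less valuable.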

The major drawbacks of the method are:

Increased payout variance compared to the standard proportional method;
Although hopping resistant, it is not proof against strategic miners;
Since the function is reliant on time rather than number of shares submitted, changes in total pool hashrate and D affect the usefulness of the score in reducing profitability for strategic miners.
Variance due to the score method and the method's ability to reduce a strategic miner's payout are in opposition: reducing variance by increasing c increases the profitability of strategic mining; reducing the profitability of strategic mining by decreasing c means increasing variance.
Increases in the pool hashrate have a similar effect to increasing c and vice versa;
Decreases in D have a similar effect to increasing c and vice versa.

For more information on Slush's scoring method and strategic mining, see How to hop 1, How to hop 2, and How to hop 8. For more information on strategic mining in proportional pools, see How to hop 5, How to hop 6, How to hop 7, and How to hop 9.

Please note that in the analysis that follows, there are two chronological breaks: one between February 2nd and February 29th 2012, and one between March 19th and March 20th 2012. These will in no way affect the analysis but should be kept in mind when interpreting the chronological hashrate graph which covers late February until mid April 2012.

1. Pool Hashrate

You'll notice there's much more short term variance in pool hashrate than for Arsbitcoin.com. This is likely due to strategic miners. Arsbitcoin.com did not suffer this type of strategic mining. The effects of strategic mining on fulltime miners at Slush's pool will be covered in the next post.

1. Do Slush's pool round lengths appear geometrically distributed?
Note: Data used in this post is available here as a .csv file. Each row contains: a Unix timestamp of when a block was solved; the total round shares in the block; the duration of the round in seconds; total round shares divided by D; and the Bitcoin difficulty (D) at the time.

As mentioned in previous posts, pooled mining round lengths should be geometrically distributed. Below is a comparison between Slush's pool and simulated data for:

Chronological ordering of total round shares as a fraction of D.
Histograms of total round shares as a fraction of D.
Boxplots of total round shares as a fraction of D, grouped by difficulty period.

These charts are much as we'd expect if Slush's pool's round shares were geometrically distributed. There are similar numbers of long and short rounds, and the histograms and boxplots are quite similar. Slush's pool passes the first round of analysis.

2. Theoretical comparisons - statistical parameters.

Mean, median, variance, skewness and kurtosis are statistical parameters that we have previously used to analyse the distribution to which a pool's round length random variable belongs. The theoretical geometric distribution statistics are:

mean = 1/p = D
median = -1/log(1-p,base=2) ~ 0.693*D
std.dev. = sqrt((1-p)/p^2) ~ D
skewness = (2-p)/sqrt(1-p) ~ 2
kurtosis = 5-p+1/(1-p) ~ 6
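As a quick numerical check of these approximations, the sketch below evaluates the standard closed forms for a geometric distribution with p = 1/D. The function name geometric_stats and the value of D are assumptions for illustration. The median ignores the rounding up to a whole number of shares, and the kurtosis is the excess kurtosis, written in the equivalent form 6 + p^2/(1-p).

```python
import math


def geometric_stats(p):
    """Theoretical statistics of a geometric distribution
    (number of trials up to and including the first success)."""
    return {
        "mean": 1.0 / p,
        # Continuous approximation; the exact median is this value rounded up.
        "median": -1.0 / math.log2(1.0 - p),
        "sd": math.sqrt(1.0 - p) / p,
        "skewness": (2.0 - p) / math.sqrt(1.0 - p),
        # Excess kurtosis, the quantity that approaches 6 for small p.
        "kurtosis": 6.0 + p * p / (1.0 - p),
    }


D = 1_000_000.0  # hypothetical difficulty, chosen only for illustration
stats = geometric_stats(1.0 / D)

# Express the location and scale statistics as a fraction of D.
mean_over_D = stats["mean"] / D
median_over_D = stats["median"] / D
sd_over_D = stats["sd"] / D
```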

The statistical parameters of Slush's pool round lengths are:

mean = 0.9666838
median = 0.6692116
st.dev = 1.016986
skewness = 1.926507
kurtosis = 4.571475

Round lengths are already divided by D so the parameters of Slush's pool's round length distribution are quite close to the values expected for a geometric distribution, except for a much smaller kurtosis (lighter tail). I've observed significant variation in pool round length kurtoses.

Although the exact standard errors for skewness and kurtosis are a function of the 6th and 8th central moments respectively of the originating distribution, they can be approximated as follows:

Standard error for skewness = sqrt(6/N)
Standard error for kurtosis = sqrt(24/N)

where N is the number of observations in the data.
(Tabachnick/Fidell 1996)

Using these standard error estimates for the skewness and kurtosis at Slush's pool:

Standard error for skewness = sqrt(6/756) = 0.089
Standard error for kurtosis = sqrt(24/756) = 0.178

A sample's statistics should usually be within two standard errors of the population parameters, which is clearly not the case for Slush's pool's kurtosis. However, in simulation I find that these standard error estimates for kurtosis are often significantly incorrect.
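The two-standard-error comparison can be made explicit with a few lines of Python. The sketch uses only the sample values and theoretical values quoted above. The variable names are mine.

```python
import math

N = 756  # number of rounds in the dataset

# Approximate standard errors quoted above (Tabachnick/Fidell 1996).
se_skew = math.sqrt(6.0 / N)
se_kurt = math.sqrt(24.0 / N)

# Distance of each sample statistic from the theoretical geometric value,
# measured in standard errors.
z_skew = (1.926507 - 2.0) / se_skew
z_kurt = (4.571475 - 6.0) / se_kurt
```

Here z_skew is well inside two standard errors of the theoretical value, while z_kurt is roughly eight standard errors below it.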

D. Wright & J. Herrington's paper on standard errors and confidence intervals for kurtosis and skewness shows that traditional estimates can be significantly incorrect, depending on the underlying distribution, and that these estimates can reject the null hypothesis when they shouldn't.

Instead, they have developed a method which appears more accurate for most distributions. Using the R package developed by D. Wright to assess confidence intervals for kurtosis and skewness, the following 95% confidence intervals for skewness and kurtosis of the round lengths as a fraction of D are obtained:

          Skewness      Kurtosis
lb BCa    1.66089667    3.0909960
ub BCa    2.26320923    7.0498422

where lb BCa and ub BCa are the lower and upper bounds of the bias-corrected and accelerated 95% confidence intervals.
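To give a feel for bootstrap confidence intervals, here is a minimal Python sketch. It is not the BCa method or the R package used above: it uses the simpler percentile bootstrap, and it runs on simulated exponential data rather than the pool's rounds. All function names, the seeds and the number of resamples are assumptions for illustration.

```python
import random


def central_moment(xs, k):
    """k-th central moment of a list of numbers."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** k for x in xs) / len(xs)


def skewness(xs):
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5


def excess_kurtosis(xs):
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2 - 3.0


def percentile_ci(xs, stat, n_boot=500, alpha=0.05, seed=1):
    """Percentile bootstrap interval for stat(xs); simpler than BCa."""
    rng = random.Random(seed)
    # Resample with replacement and recompute the statistic each time.
    boots = sorted(stat(rng.choices(xs, k=len(xs))) for _ in range(n_boot))
    lower = boots[int(n_boot * alpha / 2)]
    upper = boots[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper


# Simulated round lengths as a fraction of D (exponential with mean 1),
# standing in for the 756 real rounds.
sim_rng = random.Random(0)
rounds = [sim_rng.expovariate(1.0) for _ in range(756)]

skew_ci = percentile_ci(rounds, skewness)
kurt_ci = percentile_ci(rounds, excess_kurtosis)
```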

This method gives much wider confidence intervals for kurtosis and skewness than the traditional estimate, and shows that the theoretical values for a geometric distribution are within the confidence intervals for Slush's pool's kurtosis and skewness. Slush's pool passes the second round of analysis.

3. Theoretical comparisons - quantiles and empirical cumulative distribution function.

The cumulative distribution function (CDF) of Bitcoin pooled mining round lengths describes the probability that a round will be solved by the time a given number of shares have been contributed. For example, a CDF value close to 1 at some number of shares means that almost all rounds will have been solved by the time that many shares are submitted. Since shorter rounds are more common, we expect the CDF to rise rapidly, and then level out and approach 1 at larger round lengths.

The comparison plot below on the left is a QQ plot (quantile-quantile) to compare the quantiles from Slush's pool's round length distribution and what that distribution should be theoretically.

Quantiles are points taken at regular intervals from the cumulative distribution function (CDF). The empirical quantiles are from Slush's pool's round length data, and the theoretical quantiles are calculated using the number of rounds in the dataset, so that there are the same number of theoretical as empirical data, with the same cumulative probabilities (from 1/756 and up). The empirical quantiles are then plotted as a function of theoretical quantiles, and the relationship here should be y = x.

The comparison plot below on the right shows the difference between the theoretical CDF for geometrically distributed variables and the empirical CDF (eCDF) for Slush's pool round lengths. The eCDF is defined as:

eCDF = n/max(n) where 'n' is the rank of a datapoint in a set of data ordered by size

Both plots show that the round length data from Slush's pool do not vary significantly from those expected from a geometric distribution.
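The eCDF and the theoretical CDF described above can be sketched in Python as follows. The function names are mine, and the theoretical curve assumes round lengths are expressed as a fraction of D. In that case the geometric CDF 1 - (1 - 1/D)^(x*D) is, for large D, very close to the exponential form 1 - exp(-x) used here.

```python
import math


def ecdf(data):
    """Return the sorted data and the eCDF value (rank / N) at each point."""
    xs = sorted(data)
    n = len(xs)
    return xs, [(rank + 1) / n for rank in range(n)]


def theoretical_cdf(x):
    """Probability that a round is solved within x * D shares (large-D limit)."""
    return 1.0 - math.exp(-x)


# Three made-up round lengths as a fraction of D, for illustration only.
xs, probs = ecdf([0.5, 0.1, 2.0])

# Probability that a round is solved within D shares.
solved_within_D = theoretical_cdf(1.0)
```

Under this approximation about 63% of rounds are solved within D shares, which is why the CDF rises quickly before levelling out.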

We can assess the comparison between the eCDF and the theoretical CDF more rigorously using the Kolmogorov-Smirnov test. In this case:

If:
H0 = Slush's pool's round lengths belong to a geometric distribution,
HA = Slush's pool's round lengths do not belong to a geometric distribution,
Then:

p value = 0.3256

This means we cannot reject H0, so the data are consistent with Slush's pool's round lengths being geometrically distributed. Slush's pool passes the third round of analysis.
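For readers who want to see what the test computes, below is a minimal Python sketch of a one-sample Kolmogorov-Smirnov test. It is not the code used for this post. It tests against the exponential approximation 1 - exp(-x) for round lengths as a fraction of D, it uses the asymptotic Kolmogorov series for the p value, and the function names and demonstration data are assumptions.

```python
import math


def exp_cdf(x):
    """Theoretical CDF of round length as a fraction of D (large-D limit)."""
    return 1.0 - math.exp(-x)


def ks_statistic(data, cdf):
    """Largest vertical gap between the eCDF of data and the given CDF."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # Check the gap just after and just before each step of the eCDF.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def ks_pvalue(d, n, terms=100):
    """Asymptotic p value: 2 * sum over k >= 1 of (-1)^(k-1) * exp(-2 k^2 n d^2)."""
    lam = math.sqrt(n) * d
    if lam < 0.2:
        return 1.0  # the series converges slowly here and the p value is ~1
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                for k in range(1, terms + 1))
    return max(0.0, min(1.0, 2.0 * total))


# Made-up data lying exactly on the theoretical quantiles: a very good fit.
good = [-math.log(1.0 - (i - 0.5) / 100) for i in range(1, 101)]
d_good = ks_statistic(good, exp_cdf)
p_good = ks_pvalue(d_good, len(good))

# Made-up data that are far too short: a very poor fit.
bad = [0.001 * i for i in range(1, 101)]
p_bad = ks_pvalue(ks_statistic(bad, exp_cdf), len(bad))
```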

4. Is Slush's pool's mean round length probable?

Slush's pool's mean round length is 0.9666838 x D. Is this exceptionally lucky, or close to the expected mean?

To answer this we use the Central Limit Theorem (CLT), which states that the mean of a sufficiently large number of independent, identically distributed random variables will be approximately normally distributed. This means we can estimate the likelihood of Slush's pool's round length mean by using the standard deviation of the round lengths and the number of rounds:

mean of the sample mean = population mean
standard error of the sample mean (se) = sample sd / sqrt(number of observations in sample, n)

For Slush's pool:

population mean = D (1 when round lengths are expressed as a fraction of D)
sample mean = 0.9666838
sample sd = 1.016986
n = 756
se = 1.016986/sqrt(756) = 0.03698742

Using the normal CDF function in R, we get:

> pnorm(0.9666838,1, 0.03698742)
[1] 0.1838622

0.184 is much greater than the 0.05 threshold below which we could infer that the pool had received very unlikely luck, so the mean round length is within the acceptable range, as the plot below illustrates.
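The same calculation can be reproduced without R. The Python sketch below builds the normal CDF from the error function and recomputes the probability from the sample statistics quoted above. The function is named pnorm only to mirror the R call.

```python
import math


def pnorm(x, mean=0.0, sd=1.0):
    """Normal CDF, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


sample_mean = 0.9666838  # mean round length as a fraction of D
sample_sd = 1.016986
n = 756

# Standard deviation of the sample mean under the CLT.
std_error = sample_sd / math.sqrt(n)

# Probability of a mean round length this short or shorter,
# if the true mean is 1 (that is, D).
p_lower = pnorm(sample_mean, 1.0, std_error)
```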

Slush's pool passes the final round of statistical analysis.

5. Why shouldn't I mine at Slush's pool?

This section was originally going to show the increase in variance caused by Slush's exponential algorithm, and the next post was going to calculate the loss to full time miners caused by strategic miners. Since Slush's announcement, which I mentioned at the start of the post, there is not much point continuing with this section or the next post as planned.

Below is a chart from that post which shows how significantly Slush's pool was being affected by strategic miners (compare with the equivalent chart for Deepbit.net chart 2). The extent to which strategic miners add hashrate to the pool is clear even though the point at which strategic miners would leave the pool varied by pool hashrate and difficulty. My estimate was that strategic miners added at least 60 to 75% to the pool hashrate each round, and possibly more. This would have a noticeable effect on fulltime miner earnings.

After a changeover to the DGM payout method, adding to the pool hashrate at any point in the round will no longer affect fulltime miner earnings.

Further, with a DGM payout method in place, Slush's pool is likely to change from a reliable and trusted pool with a large hashrate, a responsive operator, relatively content miners, and a flawed payout method to a reliable and trusted pool with a large hashrate, a responsive operator, happy miners, and a fair payout method. Good luck with the changeover!

6. Conclusions

Slush's pool passes all four sets of statistical analysis, indicating it is very likely that the published pool statistics are accurate, and that miners at the pool are being paid as fairly as the current payout method allows.
Slush has a good reputation for honesty, openness, and responsiveness to his miners' enquiries and preferences.
Slush's pool is generally trusted by the community.
The current payout method, although better than proportional payouts, is flawed: it increases variance for miners and doesn't eliminate the possibility of strategic miners reducing the payout of full time miners.
Once the payout method is changed to Meni Rosenfeld's DGM, full time miners will not lose earnings to strategic miners. Variance in earnings can be controlled by this method, and can be minimised.

Donations help give me the time to investigate pools and write these posts. If you enjoy or find them helpful, please consider a small bitcoin donation:
12QxPHEuxDrs7mCyGSx1iVSozTwtquDB3r

Friday, 13 April 2012
