Neighbourhood Pool Watch: 5.2 p2Pool: Achieving expectation.

Pool: p2Pool
Pool op: No pool operator, although some pools provide access to p2Pool.
Developer: forrestv
Payout method: PPLNS variant
Hoppable? No.
Fee: Donation only.
Current pool hashrate: 370 Ghps
Availability of public statistics:

Local and global statistics are provided as part of the p2Pool client.
All rounds since the start of the pool are available in JSON at
http://p2pool.info/blocks - very easy to use once converted from JSON. Thanks to bitcointalk forum member twmz for providing the service.
p2pool.info has many interesting charts that are worthwhile investigating if you're interested in how the pool compares to other pooled mining methods.

Accuracy of public statistics: There are some inaccuracies which when averaged over all rounds are probably not very important. More details here.
Are public statistics likely to be reflecting actual data? Yes
Public perception of pool: Good.

Should I mine here right now? Yes. Do read about p2Pool and its pros and cons first, but poor pool luck and orphaned blocks are no longer a reason not to mine there.

0. Introduction
If you haven't read the previous post about p2Pool, or if it has been a while since you did, you should probably have a quick read over it again since it has a lot of introductory information about p2Pool and why you should mine there, and also addresses concerns that were raised at that time about the pool luck and orphaned block production. It might also be useful to leave the browser tab open to that post for reference. Today's post is a response to a request from the ever helpful bitcointalk forum member rav3n_pl.

The conclusion to the last post was that either the data collection method was faulty, there was some unknown way luck was affected in the pool without the loss of shares (unlikely and no known way of occurring) or the pool was simply having very bad luck. As far as orphan production went, all I could say was that it had been increasing.

In this post I'll present the latest pool statistics for p2Pool, showing that all statistics are as expected (pardon the pun in the title to this post), and that neither orphan production nor pool luck correlates with pool hashrate.

The data on which this post is based are posted here.

1. Pool hashrate
Although the pool hashrate has recovered somewhat since the June post, as a proportion of the network hashrate p2Pool has continually declined, as the average hashrate per round charts show below. I hope this post goes some way to encouraging more miners to at least try p2Pool and see if it suits them. As I show later, an increase in hashrate has in no obvious way correlated with an increase in orphaned blocks or poor pool luck.

The September 2012 spike in pool hashrate was attributable to Pyramining. Unfortunately it did not last, but it is quite helpful in testing the "increases in hashrate cause bad luck" and "increases in hashrate cause orphaned blocks" hypotheses.

2. p2Pool's sample statistics vs population parameters

I thought it best to get this out of the way quickly, so below is the moment comparison table. Instead of calculating the sample confidence interval, I've used the population 95% confidence interval, since the population distribution is well known and well defined. The mean population 95% confidence interval, the (estimated) negative binomial CDF, the median population 95% confidence interval, and the standard deviation confidence interval are all calculated using the two parameter gamma distribution:

mean and CDF: pgamma(q, shape = rounds, rate = rounds)
median: qgamma(p, rounds*log(2)^2, rounds*log(2))
std. deviation: qgamma(p, rounds/2, rounds/2)

Note that the last two are only approximations, but are sufficiently accurate for rounds > 50. The population 95% confidence intervals for skewness and kurtosis were each calculated using separate simulations of one million sets of 969 rounds.
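As a rough illustration of the calculations described above, here is a minimal sketch in Python (standard library only) rather than the R calls quoted in the post. It assumes round lengths divided by D behave like Exp(1) variables, so that the mean of n rounds is Gamma(shape = n, rate = n). The function names, the number of draws, and the number of simulated sets are all my own choices for the example; the post used one million sets, which this sketch does not attempt.

```python
import random


def percentile(sorted_values, p):
    """Simple percentile lookup on an already sorted list."""
    index = round(p * (len(sorted_values) - 1))
    return sorted_values[index]


def mean_interval(rounds, draws=20000, seed=1):
    """Monte Carlo 95% interval for the mean of `rounds` Exp(1) round lengths.

    The mean of n Exp(1) variables is Gamma(shape=n, rate=n). Python's
    gammavariate takes a scale parameter, so rate n becomes scale 1/n.
    """
    rng = random.Random(seed)
    sample = sorted(rng.gammavariate(rounds, 1.0 / rounds) for _ in range(draws))
    return percentile(sample, 0.025), percentile(sample, 0.975)


def skew_kurtosis(values):
    """Sample skewness and (non-excess) kurtosis from central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


def simulated_moment_intervals(rounds, sets=500, seed=2):
    """Simulated 95% intervals for skewness and kurtosis of `rounds` Exp(1) values."""
    rng = random.Random(seed)
    skews, kurts = [], []
    for _ in range(sets):
        values = [rng.expovariate(1.0) for _ in range(rounds)]
        skew, kurt = skew_kurtosis(values)
        skews.append(skew)
        kurts.append(kurt)
    skews.sort()
    kurts.sort()
    skew_interval = (percentile(skews, 0.025), percentile(skews, 0.975))
    kurt_interval = (percentile(kurts, 0.025), percentile(kurts, 0.975))
    return skew_interval, kurt_interval
```

Calling mean_interval(969) should give bounds a little over 6% either side of 1.0, and simulated_moment_intervals(969) gives intervals around the exponential distribution's skewness of 2 and kurtosis of 9. With only a few hundred simulated sets the interval ends are noisy, so treat them as indicative only.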

p2Pool's statistics are not only well within the population 95% confidence interval, but are also very close to the expected values - more so than perhaps any pool so far.

When did this happen? The cumulative mean chart below can help us determine when the mean pool round length exceeds expectations and recovers to within expectations. Analysing the cumulative statistic quantities of a system is a method often used in industry to determine when in a timeline the statistic of interest exceeds a particular value, and has a formalised method in CUSUM. We will be looking for the points at which the cumulative mean exceeds our predefined 95% confidence interval limit, at either the lower or upper bound.

p2Pool's mean shares per round / D returned below the upper 95% confidence interval from about solved block 630 until now. At about solved block 800, the mean shares per round / D show a significant decrease, which may be starting to level out now as the pool approaches a cumulative mean of 1.0 for the first time since the first few rounds of the pool's history.
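The idea of watching where the cumulative mean crosses the confidence limits can be sketched as follows. This is only an illustration under my own assumptions: it uses pointwise 95% limits for the mean of n Exp(1) rounds, approximated with the Wilson-Hilferty formula for gamma quantiles instead of an exact qgamma call, and it is not a formal CUSUM procedure. The function names are hypothetical.

```python
from statistics import NormalDist


def gamma_mean_quantile(p, rounds):
    """Wilson-Hilferty approximation to the p-quantile of Gamma(shape=rounds, rate=rounds)."""
    z = NormalDist().inv_cdf(p)
    k = float(rounds)
    return (1.0 - 1.0 / (9.0 * k) + z * (1.0 / (9.0 * k)) ** 0.5) ** 3


def cumulative_mean_excursions(shares_per_round_over_d, level=0.95):
    """Return (round, cumulative mean, lower, upper, outside) for each round.

    `outside` is True when the cumulative mean lies beyond the pointwise
    limits for that number of rounds.
    """
    tail = (1.0 - level) / 2.0
    total = 0.0
    rows = []
    for n, value in enumerate(shares_per_round_over_d, start=1):
        total += value
        mean = total / n
        lower = gamma_mean_quantile(tail, n)
        upper = gamma_mean_quantile(1.0 - tail, n)
        rows.append((n, mean, lower, upper, mean < lower or mean > upper))
    return rows
```

Feeding in the list of shares per round / D in block order and filtering on the last field gives the rounds at which the cumulative mean sat outside the limits. Because the limits are pointwise, a long history will show occasional excursions by chance alone.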

3. Quantiles and empirical cumulative distribution function
I've included this for completeness more than any other reason, to show the difference between the previous QQ and CDF plots and the current results.

If you compare these plots to the originals, you'll notice that the QQ plot and line are both much closer to the expected 1:1 than the previous, and the theoretical and empirical CDFs are very similar - the theoretical CDF well within the confidence interval for the empirical CDF. These new plots are all very much as expected for an exponentially distributed random variable. Next we'll test the hypothesis that the p2Pool shares per round / D are exponentially distributed.

4. Anderson Darling K test
This is a non-parametric test of whether a sample is likely to have come from a particular probability distribution. Like the Kolmogorov-Smirnov test it compares the eCDF to the expected probability distribution CDF, but from my reading the ADK test is more accurate than the KS test for large sample sizes (~100). I tested the p2Pool shares per round / Difficulty data against ten thousand randomly generated exponential variables.

The hypotheses:

If
H0 = p2Pool's round lengths belong to a geometric distribution,
HA = p2Pool's round lengths do not belong to a geometric distribution,
Then:

p value = 0.53266

With a p value well above the 0.05 significance level, H0 cannot be rejected: the p2Pool shares per round / D are consistent with an exponential distribution, which is the continuous approximation to the geometric distribution of round lengths.

5. Pool luck, orphaned blocks, and increasing pool hashrate
Ignoring for a moment the fact that no known mechanism could be responsible for affecting the actual number of difficulty 1 equivalent shares it takes to solve a block, let's look at a simple comparison between the cumulative mean shares per round / D, the cumulative mean orphans per round, and the pool hashrate.

Increases in the cumulative mean and the cumulative orphans per block don't seem to correlate at all with hashrate, certainly not in any obvious way. One orphaned block appeared while the pool hashrate was enjoying its 120 Ghps spike, but this then continued into a series of orphans. There was also a slight increase in cumulative shares per round / D during this time. Still, the only relationship that appears to be a trend is that between cumulative shares per round / D and cumulative orphans per block - more on that later.

There are a number of ways we can go about determining a possible correlation between the pool hashrate and shares per round / D or orphans - a correlation coefficient and linear regression for the first and logistic regression for the second. In both cases I did not find a significant correlation. The following charts show you why.

First consider shares per round / D as a function of hashrate:

If there was any significant correlation, shares per round / D would increase with increasing hashrate. There is no evident correlation between the two - either on a log (as above) or a linear scale of shares per round / D.

If there was a correlation between orphaned blocks and pool hashrate, the group of orphans per hashrate would be on the left side of the graph. Instead, they gather just below the main group of total solved blocks per hashrate. This means they only correlate with the number of total solved blocks (as we'd expect) and not pool hashrate.

Linear and logistic regression both support this conclusion - there is no relationship between p2Pool's hashrate and either pool luck or orphaned block production.
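A check of this kind can be sketched with nothing more than a correlation coefficient and a permutation test. This is a swapped-in, simpler technique, not the linear and logistic regressions used for the post. The function names, shuffle count, and seed are my own assumptions. For orphans, the second list would hold 1 for an orphaned block and 0 otherwise, which makes the coefficient a point-biserial correlation.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def permutation_p_value(xs, ys, shuffles=2000, seed=3):
    """Two-sided permutation p value for the observed correlation.

    Shuffling ys breaks any link with xs, so the share of shuffles with
    |r| at least as large as observed estimates how often such a
    correlation would arise by chance.
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(shuffles):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (shuffles + 1)
```

With per-round hashrate as xs and either shares per round / D or the 0/1 orphan flags as ys, a p value well above 0.05 would be consistent with the conclusion that hashrate is unrelated to both.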

6. Pool luck and orphaned blocks correlation
If you look at the chart of "Comparison of pool luck, orphaned blocks and hashrate", there appears to be a correlation in the cumulative orphans per round and mean shares per round / D from about block 500 onward. The chart below shows boxplots of shares per round / D, coloured for the number of orphaned blocks they contain. The darker boxplots do seem to have a slightly higher median than the lighter boxplots. Is it significant? If so what can it mean? Are the orphaned blocks increasing shares per round / D significantly?

This might be possible if the shares from an orphaned block were added to the shares of the next block, but p2Pool doesn't do this, and I have treated the data accordingly - shares per round / D is based on all solved blocks, orphaned or not.

So, is there a direct correlation between the two? I don't think so. Apart from the fact that I can't think of any mechanism that could cause both orphans and shares per round / D to increase, I think it's more likely that these increases are both correlated to time periods when something happened to p2Pool - for example a client update.

This brings us to our final chart for this post - cumulative mean shares per round / D and cumulative orphans per block, as a function of the date.

So far, I have been able to determine the following:

By early July, most of the p2Pool miners had changed from versions 1 and 2 to version 3.
By the end of September most of the p2Pool miners had changed from versions 3 and 4 to version 5.

Both of these are marked by thick vertical white lines on the plot. Unfortunately for the hypothesis that different versions could have caused more orphans and poorer pool luck, no change in client version occurred in mid August, when the cumulative pool luck ceased increasing and orphaned block production began to rise, and we still have no mechanism to explain the phenomena.

7. Discussion and conclusions
It is quite clear from this analysis that p2Pool's run of bad luck - in terms of difficulty 1 shares per round / D and orphaned block production - is over, and I hope I've dispelled the myth that an increased pool hashrate increases orphans and kills the pool's luck. I'll leave interested readers to provide an explanation of the odd correlation between orphaned block production, shares per round / D and certain time periods - please do comment below if you have any suggestions yourself.

All of the sample statistics fit well in the expected range, and we can't reject the hypothesis that p2Pool's shares per round / D are exponentially distributed.
There is no correlation between pool luck and pool hashrate, or orphaned block production and pool hashrate, as some have suggested.
Since solved block 500, the cumulative mean shares per round / D and the cumulative orphans produced per block seem to have both increased and decreased at the same time.
My guess is that the change in cumulative mean shares per round / D is a coincidence, and the change in orphaned block production is related somehow to changes in the p2Pool client.

I hope this post goes some way to encouraging more miners to test p2Pool for themselves.

Donations help give me the time to analyse bitcoin mining related issues and write these posts. If you enjoy or find them helpful, please consider a small bitcoin donation:

12QxPHEuxDrs7mCyGSx1iVSozTwtquDB3r


Wednesday, 21 November 2012
