
Wednesday, 20 June 2012

5.1 p2Pool - bad luck or flawed?






Please note:
This is old data - please do not use this post to inform yourself of the current p2Pool situation. The pool's luck is much better since this post, and p2Pool miners are earning as expected. A more recent update is 5.2 P2Pool: achieving expectation











Pool:  p2Pool
Pool op:  No pool operator, although some pools provide access to p2Pool.
Payout method: PPLNS variant
Hoppable? No.
Fee: Donation only.
Current pool hashrate: 250 Ghps
Availability of public statistics: 
  • All rounds since the start of the pool are available in JSON at http://p2pool.info/blocks.
  • Local and global statistics are provided as part of the p2Pool client.
  • p2pool.info has many interesting charts that are worthwhile investigating if you're interested in how the pool compares to other pooled mining methods.
Accuracy of public statistics: There are some inaccuracies which when averaged over all rounds are probably not very important. More details here.
Are public statistics likely to be reflecting actual data? Probably
Public perception of pool: Generally good.

Should I mine here right now? Yes, although you might not earn quite so much as on other pools, for reasons discussed below.

0. Introduction

p2Pool is a decentralised method of pooling shares with other miners to solve a block. Even if its hashrate rises to more than 51% of the network hashrate, it cannot be used to perform a "51% attack" (which could be used to fork the block chain and create a "double spend"). Because of this, p2Pool has many adherents and supporters, some donating significant sums to the pool in order to increase average miner earnings.

Another advantage of p2Pool is that, being decentralised, it is DDoS proof. The DDoS attacks that hit the larger pools during college holidays cannot be aimed at p2Pool or cause pool downtime.

In order to use p2Pool, a miner is directed to a p2Pool node, which must be run locally. A bitcoin node must also be running. The method by which p2Pool propagates shares to solve blocks is quite clever, and is similar to the way solved bitcoin blocks are propagated. While attempting to solve a block, each miner submits shares which are added to the share chain, and the share chain is used to determine which miners' shares have been accepted in solving the block. Once a block is solved, the local bitcoin node announces the block to the rest of the network and the orphan block race begins as the block propagates.

In the bitcoin network, new blocks are solved every 10 minutes or so. In order to reduce payout variance, lower difficulty shares are accepted. However, if difficulty 1 shares were accepted, as occurs in most non-decentralised pools, a long poll would occur much too frequently and the rate of stale shares would be much too high. So the share difficulty is much more dynamic than in the bitcoin network, and is set so that on average one share is accepted every 10 seconds.
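As a rough illustration of the ten second target, the share difficulty implied by a given pool hashrate can be estimated from the fact that a difficulty 1 share takes about 2^32 hashes on average. This is only a sketch: the function name is hypothetical, and the real p2Pool client adjusts difficulty from the observed share chain rather than from a quoted hashrate.

```python
HASHES_PER_DIFF1_SHARE = 2 ** 32  # average hashes needed to find one difficulty 1 share


def share_difficulty_for_interval(pool_hashrate_hps, seconds_per_share=10.0):
    """Difficulty at which the whole pool finds one share per interval, on average.

    Hypothetical helper for illustration; not taken from the p2Pool source.
    """
    return pool_hashrate_hps * seconds_per_share / HASHES_PER_DIFF1_SHARE


# Pool hashrate quoted in this post: 250 Ghps
diff = share_difficulty_for_interval(250e9)

# For comparison: how many difficulty 1 shares per second the same pool would produce
diff1_shares_per_second = 250e9 / HASHES_PER_DIFF1_SHARE
```

At 250 Ghps this gives a share difficulty of roughly 580, whereas difficulty 1 shares would arrive at around 58 per second.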

Since an increase in difficulty leads to higher variance, this may limit the maximum acceptable hashrate on any p2Pool group (of which there is currently only one) although the difficulty at which shares are created can be changed locally so that miners with a large hashrate will affect the overall difficulty less (at a cost of more variance to them). It also means increased variance over that expected at centralised pools using difficulty 1 shares, and this is most felt by miners with lower hashrates. 

Payouts are based on a hybrid PPLNS (Pay Per Last N Shares) scheme, and so cannot be mined strategically for greater expected share value - the expectation of a submitted share is always B/D (where B is the bitcoin reward and D is the current network difficulty). The last N shares accepted are either the last 8640 shares (which at 10 seconds per share is 24 hours of pool mining) or the total number of shares expected to solve three blocks, whichever is smaller.
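The window rule and the B/D expectation can be sketched as below. This assumes that "shares expected to solve three blocks" means three times the network difficulty divided by the current share difficulty, which is my reading rather than something confirmed from the p2Pool source. The function names and example numbers are hypothetical.

```python
def pplns_window(network_difficulty, share_difficulty, max_shares=8640, blocks=3):
    """Number of most recent shares that are paid when a block is solved.

    Assumption: the expected number of shares per block is D / share difficulty.
    """
    expected_shares_per_block = network_difficulty / share_difficulty
    return min(max_shares, int(blocks * expected_shares_per_block))


def expected_share_value(block_reward, network_difficulty, share_difficulty=1.0):
    """Expected value of one share: B/D for a difficulty 1 share, scaled by share difficulty."""
    return block_reward * share_difficulty / network_difficulty


# Hypothetical numbers, chosen only to exercise both branches of the rule
short_window = pplns_window(network_difficulty=1_000_000, share_difficulty=500)
capped_window = pplns_window(network_difficulty=2_000_000, share_difficulty=500)
value = expected_share_value(block_reward=50, network_difficulty=1_000_000)
```

With these made-up inputs the first window is limited by the three block rule (6000 shares) and the second by the 8640 share cap.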

Although a "withholding attack" cannot be used to gain a block reward without sharing, it could be used to withhold the block completely and prevent it from being announced. In order to provide a disincentive to the attacker, a subsidy of 0.5% is sent to the node that solved the block. For more details: https://en.bitcoin.it/wiki/P2Pool

There have been concerns by miners on the bitcointalk.org forum thread for p2Pool that there has been significantly poor luck at the pool for an extended period, and that the number of orphaned blocks created by the pool is higher than it should be. In this post we will investigate the published pool statistics generally, and then specifically with regard to these two concerns.

1. Pool Hashrate

Average pool hashrate was quite low until January this year, when p2Pool started to gain new users. Its hashrate rapidly increased to a maximum of 350 Ghps in April, after which it decreased rapidly to its current level of 250 Ghps.



Average pool hashrate as a function of round length should show no correlation if the payout method cannot be mined strategically. This is clearly the case as demonstrated below. 




Compare this to the change in hashrate with round length experienced at DeepBit in March this year (note that DeepBit seems to be 'pool hopped' much less now than was the case then, with some 'anti hopper' measures in place to reduce the hashrate increase due to strategic miners). More detail on strategic mining at DeepBit here.


2. Do p2Pool's round lengths appear geometrically distributed?
Note: Data used in this post is available here as a .csv file. Each row contains: the block height; a Unix timestamp indicating when the block was solved; the total round shares in the block; the duration of the round in seconds; total round shares divided by D; and the Bitcoin difficulty (D) at the time.

Since difficulty is dynamic for p2Pool, the comparable number of difficulty 1 shares that have been submitted to the share chain is estimated based on the pool hashrate. The actual number of shares submitted to the share chain is not used in the calculations below.
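One plausible way to make this estimate is sketched here, assuming a mean pool hashrate over the round is available and that a difficulty 1 share corresponds to 2^32 hashes on average. The function names are hypothetical and the exact estimation method used for the dataset may differ.

```python
HASHES_PER_DIFF1_SHARE = 2 ** 32


def equivalent_diff1_shares(mean_hashrate_hps, duration_seconds):
    """Estimated difficulty 1 shares the pool would have submitted during a round."""
    return mean_hashrate_hps * duration_seconds / HASHES_PER_DIFF1_SHARE


def round_length_over_d(mean_hashrate_hps, duration_seconds, network_difficulty):
    """Round length expressed as a fraction of the network difficulty D."""
    return equivalent_diff1_shares(mean_hashrate_hps, duration_seconds) / network_difficulty
```

A value of 1 from `round_length_over_d` corresponds to a round of exactly the expected length.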


As mentioned in previous posts, pooled mining round lengths should be geometrically distributed. Below is a comparison between equivalent difficulty 1 shares submitted by p2Pool miners per round and simulated data for:
  • Chronological ordering of total round shares as a fraction of D.
  • Boxplots of total round shares as a fraction of D, grouped by difficulty period.
  • A rolling mean of total round shares as a fraction of D.

Neither chart shows an obvious variation from the simulated rounds. There is a significant amount of variation in the first five box plots in both cases, due to the small number of blocks solved per difficulty period.




The one hundred block rolling mean of round length / Difficulty data shows a significant variation from what we would expect. The rolling mean is rarely below the blue broken line which indicates the expected round length. This means that the round length is generally much greater than expected. The red broken line indicates the actual mean value of round lengths as a fraction of difficulty.
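A minimal sketch of how simulated rounds and a rolling mean could be produced, using the exponential approximation to the geometric distribution so that round length / D has mean 1. The seed, function names and number of rounds are arbitrary assumptions, not the settings used for the charts in this post.

```python
import random


def simulate_round_lengths(n_rounds, mean=1.0, seed=0):
    """Simulated round lengths as a fraction of D (exponential approximation)."""
    rng = random.Random(seed)
    return [rng.expovariate(1.0 / mean) for _ in range(n_rounds)]


def rolling_mean(values, window=100):
    """Mean of each consecutive run of `window` values, in chronological order."""
    means = []
    total = 0.0
    for i, value in enumerate(values):
        total += value
        if i >= window:
            total -= values[i - window]  # drop the value that left the window
        if i >= window - 1:
            means.append(total / window)
    return means


simulated = simulate_round_lengths(300)
simulated_rolling = rolling_mean(simulated, window=100)
```

For fair rounds the rolling mean should wander on both sides of 1, rather than staying above it.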

3. Theoretical comparisons - quantiles and empirical cumulative distribution function.

As discussed in previous posts, the cumulative distribution function (CDF) of Bitcoin pooled mining round lengths describes the probability that a round will be solved by the time a given number of shares has been contributed. For example, a CDF value close to 1 at a given round length means that most rounds will have been solved by the time that many shares are submitted. Since shorter rounds are more common, we expect the CDF to rise rapidly, and then level out and approach 1 at larger round lengths.

The comparison plot below on the left is a QQ (quantile-quantile) plot, which compares the quantiles from p2Pool's round length distribution with what that distribution should be theoretically, using the exponential distribution approximation.

Quantiles are points taken at regular intervals from the cumulative distribution function (CDF). The empirical quantiles are from p2Pool round length data, and the theoretical quantiles are calculated using the number of rounds in the dataset, so that there are the same number of theoretical as empirical data, with the same cumulative probabilities. The empirical quantiles are then plotted as a function of theoretical quantiles, and the relationship here should be y = x. 
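The quantile pairing described above might be sketched like this for round lengths expressed as a fraction of D. The plotting positions (i - 0.5)/n and the least squares slope through the origin are my assumptions; the QQ line in the plot may have been fitted differently.

```python
import math


def exponential_qq(round_lengths_over_d):
    """Return (theoretical, empirical) quantiles for a unit mean exponential."""
    empirical = sorted(round_lengths_over_d)
    n = len(empirical)
    # Plotting positions (i - 0.5) / n keep the probabilities strictly below 1
    theoretical = [-math.log(1.0 - (i - 0.5) / n) for i in range(1, n + 1)]
    return theoretical, empirical


def slope_through_origin(x, y):
    """Least squares slope of y = k * x; k = 1 means the data match the theory."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
```

A slope noticeably above 1 indicates rounds that are longer than expected on average.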

The comparison plot below on the right shows the difference between the theoretical CDF for geometrically distributed variables and the empirical CDF (eCDF) for p2Pool round lengths. The eCDF is defined as:
ecdf = n/max(n), where 'n' is the rank of a datapoint in a set of data ordered by size
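In code, the eCDF definition and the theoretical curve it is compared against could look like the sketch below. The exponential CDF is used as the approximation to the geometric CDF, and the function names are hypothetical.

```python
import math


def ecdf(values):
    """Pairs of (value, rank / number of values) for the data ordered by size."""
    ordered = sorted(values)
    n = len(ordered)
    return [(x, rank / n) for rank, x in enumerate(ordered, start=1)]


def exponential_cdf(x, mean=1.0):
    """Theoretical CDF for round length / D, exponential approximation."""
    return 1.0 - math.exp(-x / mean)
```

Evaluating `exponential_cdf` with mean 1 and again with mean 1.104 at the eCDF's x values gives two theoretical curves of the kind compared in this section.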


Both plots show that the round length data from p2Pool do vary significantly from those expected from a geometric distribution with mean D. The QQ plot and QQ line show that although the data do seem to be geometrically distributed, the slope is greater than the expected y = x, and is closer to y = 1.104x.

The eCDF curves "Theoretical 1" and "Theoretical 2" are the CDF curves for mean=D and mean = 1.104D respectively. The eCDF clearly shows a very good match for "Theoretical 2" implying that the true mean for the round length distributions is better described by the mean round length of 1.104D.

4. Are p2Pool's round lengths likely to be geometrically distributed?

We can assess the comparison between the eCDF and the theoretical CDF more rigorously using the Kolmogorov-Smirnov test. In this case:

If:
H0 = p2Pool's round lengths belong to a geometric distribution,
HA = p2Pool's round lengths do not belong to a geometric distribution,
Then:
p value = 0.3303
This means we cannot reject H0, so the data are consistent with p2Pool's round lengths being distributed geometrically.
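For readers who want to repeat this kind of test, here is a minimal one sample Kolmogorov-Smirnov sketch against the exponential approximation. The p value uses the standard asymptotic series with a small sample correction, which is an approximation, and the mean tested against is left as a parameter because I am not asserting which mean produced the p value above. The function names are hypothetical.

```python
import math


def ks_statistic(values, mean=1.0):
    """Largest distance between the eCDF and the exponential CDF."""
    ordered = sorted(values)
    n = len(ordered)
    d = 0.0
    for i, x in enumerate(ordered):
        f = 1.0 - math.exp(-x / mean)
        # Check the eCDF step just after and just before this data point
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def ks_p_value(d, n, terms=100):
    """Approximate p value from the asymptotic Kolmogorov distribution."""
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    if lam < 0.2:
        return 1.0  # the series converges slowly here and the p value is effectively 1
    total = 0.0
    for k in range(1, terms + 1):
        total += (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
    return min(1.0, max(0.0, 2.0 * total))
```

A large p value only means the data do not contradict the hypothesised distribution; it does not prove the data come from it.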

5. Theoretical comparisons.
Mean, median, variance, skewness and kurtosis are statistical parameters that we have previously used to analyse the distribution to which a pool's round length random variable belongs. The theoretical geometric distribution statistics and experimental parameters are shown in the table below.

"Theoretical 1" and "Theoretical 2" are the theoretical parameters when the round length means are D and 1.104D respectively. The upper and lower bound for the mean, median, and standard deviation are the 95% confidence intervals for the sample statistics. For the skewness and kurtosis the lower and upper bounds are the bias-corrected and accelerated 95% confidence intervals as per D. Wright & J. Herrington.



The confidence intervals for mean, median and standard deviation exclude the theoretical population mean, median, and standard deviation parameters of a geometric distribution with mean = D with a confidence of 95%, but match those of a geometric distribution with mean = 1.104D quite well. Kurtosis and skewness are quite close to the expected values, and support the conclusion that the rounds are indeed geometrically distributed.
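The theoretical values for such a comparison can be sketched under the exponential approximation, in units of D. The sample statistics use simple population formulas with no bias correction and no bootstrap confidence intervals, so they are a simplification of the method described above. The function names are hypothetical.

```python
import math
import statistics


def exponential_theoretical_stats(mean):
    """Theoretical parameters for an exponential distribution with the given mean."""
    return {
        "mean": mean,
        "median": mean * math.log(2),
        "sd": mean,
        "skewness": 2.0,
        "excess_kurtosis": 6.0,  # equivalent to a kurtosis of 9
    }


def sample_stats(values):
    """Plain sample statistics (population formulas, no bias correction)."""
    n = len(values)
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return {
        "mean": m,
        "median": statistics.median(values),
        "sd": sd,
        "skewness": sum((v - m) ** 3 for v in values) / (n * sd ** 3),
        "excess_kurtosis": sum((v - m) ** 4 for v in values) / (n * sd ** 4) - 3.0,
    }


theoretical_1 = exponential_theoretical_stats(1.0)    # mean = D
theoretical_2 = exponential_theoretical_stats(1.104)  # mean = 1.104D
```

Note that skewness and kurtosis do not depend on the mean, which is why they can support the shape of the distribution even when the mean is off.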

6. Orphaned blocks
Orphaned blocks are valid blocks that have not been selected by a majority of nodes as part of the main block chain. This is usually because the same block has been solved at a similar time by another miner, propagated more quickly and so was accepted by a majority of nodes first. Blocks can also be orphaned as the result of a "51% attack", but that is not a likely reason for a pool producing more orphaned blocks than it should. 

Orphan production over the life of the pool is about 3.18%. This may be high, as from anecdotal evidence it seems that the usual rate of orphaned block production is about 1.5%. To my knowledge, the questions of the expected mean frequency of orphaned block creation, and of which probability distribution that frequency belongs to, have not yet been definitively answered. However, to determine whether the orphan problem has been getting worse at p2Pool, we need only determine whether the rate of orphan production has increased over time.

This has been done below. The first chart shows the occurrence of orphaned blocks (red) and the cumulative sum of orphaned blocks (black), per block solved for the life of the pool. It does appear that the rate of orphan production has increased during the second half of the pool's block production, although it must be kept in mind that with such a small sample this apparent increase may be due at least in part to the variance of the underlying probability distribution, rather than to a long term increase in orphan production.

Since an increase in hashrate may be a cause of increased orphan production, the lower two charts compare the two hundred block rolling means of orphan production and hashrate. The rolling mean of the orphan production rate increases roughly linearly, whereas the hashrate declines after April. It is therefore unlikely that the pool hashrate is having an effect on the rate of orphan production at p2Pool.
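The cumulative sum and rolling rate used in these charts can be sketched as follows, assuming the block list is available as a chronological sequence of flags (1 for an orphaned block, 0 otherwise). The function names and input format are hypothetical.

```python
def cumulative_orphans(orphan_flags):
    """Running count of orphaned blocks, one entry per solved block."""
    total = 0
    running = []
    for flag in orphan_flags:
        total += 1 if flag else 0
        running.append(total)
    return running


def rolling_orphan_rate(orphan_flags, window=200):
    """Fraction of orphaned blocks in each consecutive run of `window` blocks."""
    flags = [1 if flag else 0 for flag in orphan_flags]
    rates = []
    count = 0
    for i, flag in enumerate(flags):
        count += flag
        if i >= window:
            count -= flags[i - window]  # drop the block that left the window
        if i >= window - 1:
            rates.append(count / window)
    return rates
```

A steadily steepening cumulative curve, or a rising rolling rate, is what an increasing orphan rate would look like, although a small sample can produce the same picture by chance.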




7. Conclusions

  • Although p2Pool's round lengths appear geometrically distributed, they have a mean value of 1.104D, and are greater than the expected value of D with 95% confidence. This could be very "bad luck" but we cannot discount the possibility of the greater round lengths being inherent in the pool. However, as Meni Rosenfeld points out, there is no known mechanism by which this could occur.
  • A possibility is that the mechanism by which the equivalent number of difficulty 1 shares is estimated could be faulty. This could mean the actual round lengths could be closer to the expected round lengths, and miner payouts would not have been much less (or any less) than expected.
  • Orphan production seems to have been increasing, although with a small sample size this conclusion may be erroneous.
  • Orphan production rate does not appear to correlate with changes in p2Pool hashrate.
  • If the usual orphan rate at other pools is 1.5%, and the mean round length remains approximately 1.1D, then miners will earn 11% less here than their expected share values, with 95% confidence.
  • If p2Pool's orphan rate is in fact typical of other pools, then miners are earning approximately 9% less than expected.
  • Variance is significantly larger than for pooled mining using difficulty 1 shares, but much less than for solo mining.
  • Donations to p2Pool over the past 5 months total 131.13 btc, increasing miner earnings by an average of 0.52%.
  • Overall, miner share values have been either 8.5% or 10.5% less than expected.
  • Given the decentralised nature of the pool, some proponents feel that this is an acceptable price. 
  • DDoS attacks can also lead to downtime and loss of income for miners. This will not occur at p2Pool.
  • The concept of p2Pool is laudable, and I hope it continues to be developed and the round length problem solved. 
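
The earnings figures in the conclusions can be approximated with a short calculation. This sketch assumes the effects of long rounds, excess orphans and donations combine multiplicatively, which is my simplification and not necessarily how the rounded figures above were derived. The function name is hypothetical.

```python
def relative_earnings(mean_round_length_over_d, pool_orphan_rate,
                      baseline_orphan_rate, donation_bonus=0.0):
    """Earnings relative to expectation (1.0 means earning exactly as expected)."""
    luck_factor = 1.0 / mean_round_length_over_d
    orphan_factor = (1.0 - pool_orphan_rate) / (1.0 - baseline_orphan_rate)
    return luck_factor * orphan_factor * (1.0 + donation_bonus)


# Usual orphan rate elsewhere taken as 1.5%, p2Pool's as 3.18%
shortfall_excess_orphans = 1.0 - relative_earnings(1.104, 0.0318, 0.015)

# p2Pool's orphan rate taken as normal, so only the long rounds matter
shortfall_normal_orphans = 1.0 - relative_earnings(1.104, 0.015, 0.015)
```

These come out at roughly 11% and 9% respectively, before donations are taken into account.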



Afterword
I've had requests for some form of email notification for new posts. Although I can't do this through blogger, I always announce these posts at https://bitcointalk.org/index.php?topic=66026.0. If you click on "Notify" you'll receive email notifications to the address you have registered with bitcointalk.org (you must join bitcointalk.org to receive notification emails).

Donations help give me the time to investigate pools and write these posts. If you enjoy or find them helpful, please consider a small bitcoin donation:
12QxPHEuxDrs7mCyGSx1iVSozTwtquDB3r
