
Friday, 10 May 2013

13.1 BitMinter and "luck"

9th May, 2013

Pool: BitMinter, Bitcointalk forum thread
Pool op: forum member DrHaribo (real name Geir Harald Hansen)
Publicly identifiable pool operator: Yes.


Payout method: PPLNS, shift based and currently 10 shifts of 0.4 x Difficulty D1 shares.
Hoppable? No.

Variable difficulty: Yes, set to 20 shares per minute.
Provision for local work: Yes, getblocktemplate (GBT) and Stratum.
ASIC ready? Yes.

Fee: 1 % since 20th April 2013. Extras ("perks") available for donors.
Merged mining: Yes, NMC.
Transaction fees: Paid to miners.

Current pool hashrate: 13.1 Thps

Availability of public statistics: 

Quite thorough, lots of statistics, charts, and a public API.


Accuracy of public statistics: Accurate - no outlier data detected.
Are public statistics likely to be reflecting actual data? Yes.
Public perception of pool and pool op: Good.

Should I mine here right now? Yes.


0. Introduction
I have several reasons for wanting to write a post on BitMinter:
  1. Recently, there have been claims that BitMinter's bad luck in recent months was unusual and perhaps the result of a bug (the forum discussion starts here and goes on for pages). I performed the usual analyses and found that the average pool shares per round / D was within the 95% confidence interval, both for the entire pool history and for the last two months. However, simply posting something to this effect invites disbelief, so I decided to write a post detailing all the results.
  2. Just after this, Geir Harald Hansen (DrHaribo) approached me and commissioned an audit of the pool's block history data. This was quite handy, since I'd already thought about doing a post and had much of the groundwork completed.
  3. One of BitMinter's claims is that it pays more than any other large pool: it passes the transaction fees from solved blocks on to miners, provides merged mining of namecoin and charges only a 1% fee, although donations are encouraged.

1. Pool hashrate

The pool hashrate is an important factor in the variance experienced by a pool's users. The greater the hashrate, the more blocks are solved. Any given miner receives a smaller proportion of each reward, but rewards arrive more frequently, and smaller, more regular rewards mean reduced variance. This is why many miners prefer pools that have a larger proportion of the network hashrate.
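To put rough numbers on this, here is a minimal Python sketch that models a pool's block finds as a Poisson process. It assumes the network solves its target of 144 blocks per day and that the pool's share of the network is constant; `pool_block_stats` is a hypothetical helper, not anything BitMinter publishes. The relative spread (standard deviation / mean) of daily blocks, and so roughly of a miner's daily income under a proportional-style reward, falls as 1 / sqrt(expected blocks).

```python
import math

BLOCKS_PER_DAY = 144  # assumption: network hits its 10-minute target exactly


def pool_block_stats(network_share: float, days: float = 1.0) -> tuple[float, float]:
    """Expected blocks and coefficient of variation for a pool holding
    `network_share` of the network hashrate, treating finds as Poisson."""
    expected = BLOCKS_PER_DAY * days * network_share
    cv = 1.0 / math.sqrt(expected)  # Poisson: sd = sqrt(mean), so sd / mean = 1 / sqrt(mean)
    return expected, cv


for share in (0.04, 0.09, 0.12):
    blocks, cv = pool_block_stats(share)
    print(f"{share:.0%} of network: {blocks:.2f} blocks/day, relative spread {cv:.0%}")
```

Going from 4% to 12% of the network cuts the relative day-to-day spread by a factor of sqrt(3), which is the variance reduction the paragraph describes.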

The pool hashrates in the chart below are the average hashrates per round. The pool's hashrate was between 4% and 9% of the network hashrate from August last year until just recently, when ASICMiner added 4 Thps to the pool, increasing its proportion of the network hashrate to ~12%. New ASIC mining hardware joining the pool accounts for the first recent surge in hashrate, and ASICMiner accounts for the second, more recent large increase.




2. BitMinter's average shares per round / difficulty vs population statistics

All bitcoin data for this post are available as a .csv file: BitMinter BTC round history. All the results in this post can be confirmed using that data. If you're not sure how to do that yourself, post a comment and I'll try to help.

Since I hope at least some readers will be new miners at BitMinter who were concerned about the pool's "luck", I'll explain the tables below in a bit more detail than usual. If you already know what you're looking at, you can skip the next few paragraphs.

Without going into much detail, when you mine bitcoin (or namecoin), your GPU / FPGA / ASIC is doing something rather simple: it's just creating a hash of some data. Assuming you're not solo mining, if the hash's numerical value is sufficiently small, your mining pool accepts the hash as a proof of work, or share. This share may not solve the block on which the pool is working, but it proves you are making the attempt to solve a block.

There are various share reward methods. Some pay the expected amount per share for every share submitted (Pay Per Share and its variants). For others, such as Pay Per Last N Shares (which BitMinter uses) and the Double Geometric Method, the amount awarded to miners depends on the number of shares it takes to solve a block: the more shares a block requires, the less each share is rewarded.

Profitability is determined to a great extent by the number of shares per round, so this statistic is of great interest to a pool's users. Since each share has the same probability of solving a block, shares per round are geometrically distributed random variables. At very small probabilities such as we now have, dividing the number of shares per round by the network difficulty at which the block was mined gives a random variable that is well approximated by an exponential distribution with mean 1.
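The geometric-to-exponential approximation is easy to check by simulation. This sketch assumes each difficulty-1 share solves the block with probability 1/D, and uses a hypothetical D of one million (far below the real difficulty) only so it runs quickly; the shape of the result does not depend on D once 1/D is small.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

D = 1_000_000      # hypothetical difficulty, kept small so the simulation is fast
p = 1.0 / D        # approximate chance that one D1 share solves the block
rounds = 100_000

# Shares needed to solve each round: geometric (trials up to and including the first success)
shares_per_round = rng.geometric(p, size=rounds)
x = shares_per_round / D   # shares per round / difficulty

print(f"mean {x.mean():.3f} (Exp(1) mean: 1)")
print(f"P(x > 1) = {np.mean(x > 1):.3f} (Exp(1): {np.exp(-1):.3f})")
```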

Why use shares per round / difficulty? The geometric distribution is a special case of the negative binomial distribution. In pooled bitcoin mining, the number of shares required to solve a given number of blocks is a negative binomial random variable, so it could be used to calculate the probability of a given number of shares solving a given number of blocks. However, since the difficulty changes every 2016 blocks, only subsets of the data could be analysed this way.

In the same way that the exponential distribution is a good approximation of shares per round / difficulty for one solved block, the Erlang distribution is a good approximation for the average shares per round / difficulty over an arbitrary number of rounds. Since difficulty changes are no longer a factor, the probability of the average shares per round / difficulty for any time period can be calculated.

The cumulative distribution function (CDF) gives the probability that a random variable takes a value less than or equal to a particular limit (the lower tail probability); one minus the CDF gives the probability of a value greater than that limit (the upper tail probability). The Erlang distribution CDF in the tables below is the lower tail CDF. If you subtract this value from 1, you have the probability that the average shares per round / difficulty would be equal to or greater than the observed value.
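As a sketch of how these tail probabilities can be computed: the average of n independent Exp(1) values follows an Erlang distribution with shape k = n and rate n (scale 1/n), which SciPy exposes as the gamma distribution. The helper names below are illustrative, not taken from the post, and the 1.05 mean in the usage line is a made-up value.

```python
from scipy.stats import gamma


def mean_cdf(observed_mean: float, n_rounds: int) -> float:
    """Lower-tail probability P(mean <= observed_mean) for the average of
    n_rounds Exp(1) values: an Erlang(k=n_rounds, rate=n_rounds) CDF."""
    return gamma.cdf(observed_mean, a=n_rounds, scale=1.0 / n_rounds)


def mean_upper_tail(observed_mean: float, n_rounds: int) -> float:
    """P(mean >= observed_mean): the chance of "luck" at least this bad."""
    return gamma.sf(observed_mean, a=n_rounds, scale=1.0 / n_rounds)


# Hypothetical example: an average of 1.05 x D over 610 rounds
print(mean_cdf(1.05, 610), mean_upper_tail(1.05, 610))
```

Using `gamma.sf` for the upper tail avoids the precision loss of computing `1 - cdf` when the CDF is very close to 1.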

The tables below report:
  • The cumulative distribution function of the average shares per round / difficulty for the entire pool history (table 1) and for the last two months (table 2), and the 95% confidence interval upper and lower bounds.
  • Results of the Anderson-Darling test, with the null hypothesis being that shares per round / difficulty for each block is exponentially distributed, and the p value bounds within which that null hypothesis is not rejected.
  • The mean, median, standard deviation, skewness and kurtosis of shares per round / difficulty for each block, along with the expected values and the 95% confidence interval upper and lower bounds (confidence intervals for skewness and kurtosis have been estimated using simulated data).
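A minimal sketch of the goodness-of-fit checks in that list, assuming SciPy's `stats.anderson` with `dist="expon"` as a stand-in for whatever tool produced the tables. The expected values for Exp(1) are mean 1, median log(2), standard deviation 1, skewness 2 and excess kurtosis 6 (kurtosis 9 if the non-excess definition is used). Simulated data replace the real round history here.

```python
import numpy as np
from scipy import stats


def describe_vs_exponential(x):
    """Compare shares per round / D values with the Exp(1) expectations."""
    x = np.asarray(x, dtype=float)
    ad = stats.anderson(x, dist="expon")  # H0: x is exponentially distributed
    observed = {
        "mean": x.mean(),
        "median": np.median(x),
        "sd": x.std(ddof=1),
        "skewness": stats.skew(x),
        "excess kurtosis": stats.kurtosis(x),  # Fisher definition: normal = 0
    }
    expected = {"mean": 1.0, "median": np.log(2), "sd": 1.0,
                "skewness": 2.0, "excess kurtosis": 6.0}
    return ad, observed, expected


rng = np.random.default_rng(0)
ad, obs, exp_vals = describe_vs_exponential(rng.exponential(1.0, 3283))
print("AD statistic:", round(ad.statistic, 3))
for level, crit in zip(ad.significance_level, ad.critical_values):
    print(f"  reject H0 at {level}% if statistic > {crit:.3f}")
for key in obs:
    print(f"{key:>16}: observed {obs[key]:.3f}, Exp(1) expects {exp_vals[key]:.3f}")
```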



The results for the complete pool history are well inside the expected range. There is nothing unusual here, and the results are noteworthy only for how closely they match the expected values. The CDF indicates that, of all runs of 3283 solved blocks, 38% would have a higher average shares per round / difficulty (be less "lucky").

However, some miners are claiming that pool "luck" has been unusually poor only recently, so I also performed the same calculation for BitMinter shares per round / difficulty over the last sixty days:




The CDF indicates that only 8% of runs of 610 solved blocks would have a higher average (be less "lucky"). If this figure were 0.8%, there might be cause for concern, but if roughly one out of every twelve runs of 610 blocks is likely to have the same average or greater, then this isn't very unusual at all.
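The 95% bounds used for these comparisons can be reproduced from the Erlang distribution alone, since they depend only on the number of rounds. This sketch computes them for 3283 and 610 rounds; it does not reproduce the 38% and 8% figures, because the observed averages themselves are in the tables rather than the text. `mean_ci` is a hypothetical helper name.

```python
from scipy.stats import gamma


def mean_ci(n_rounds: int, level: float = 0.95) -> tuple[float, float]:
    """Central interval for the average shares per round / D over
    n_rounds blocks, from the Erlang(k=n, rate=n) distribution."""
    alpha = 1.0 - level
    lo = gamma.ppf(alpha / 2, a=n_rounds, scale=1.0 / n_rounds)
    hi = gamma.ppf(1 - alpha / 2, a=n_rounds, scale=1.0 / n_rounds)
    return lo, hi


for n in (3283, 610):
    lo, hi = mean_ci(n)
    print(f"{n} rounds: 95% interval for the mean is [{lo:.3f}, {hi:.3f}]")

# An upper-tail probability of 8% corresponds to about one run in 1 / 0.08 = 12.5
print(f"1 in {1 / 0.08:.1f} runs")
```

The shorter 60-day window gives a noticeably wider interval, which is why a run of poor luck over 610 blocks is less surprising than the same average over the whole history would be.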

The median and mean are much closer to their upper bound values, indicating the pool has indeed had poor luck over the last sixty days. However, the results are still well within the 95% confidence interval bounds, and the mean, median, standard deviation, skewness, kurtosis and the Anderson-Darling test result are all consistent with shares per round / difficulty being exponentially distributed.

In summary: some recent bad luck, but nothing statistically significant, and the data are distributed as expected.

The chart below shows how the cumulative mean of shares per round / difficulty has changed over time, compared to the Erlang distribution 95% confidence interval. The cumulative mean is very clearly well inside the confidence interval at all times since the start of the pool.
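A sketch of how a cumulative mean and its Erlang envelope can be computed, assuming the values come from the round-history CSV. Its column layout isn't described in this post, so simulated Exp(1) values stand in; `cumulative_mean_envelope` is a hypothetical name.

```python
import numpy as np
from scipy.stats import gamma


def cumulative_mean_envelope(x, level=0.95):
    """Cumulative mean of shares per round / D after each block, with the
    Erlang central interval for a mean of that many rounds."""
    x = np.asarray(x, dtype=float)
    n = np.arange(1, len(x) + 1)
    cum_mean = np.cumsum(x) / n
    alpha = 1.0 - level
    lower = gamma.ppf(alpha / 2, a=n, scale=1.0 / n)
    upper = gamma.ppf(1 - alpha / 2, a=n, scale=1.0 / n)
    return cum_mean, lower, upper


rng = np.random.default_rng(2)
x = rng.exponential(1.0, size=3283)   # stand-in for real shares per round / D values
cum_mean, lower, upper = cumulative_mean_envelope(x)
outside = np.mean((cum_mean < lower) | (cum_mean > upper))
print(f"fraction of blocks with the cumulative mean outside the interval: {outside:.3f}")
```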



The rolling mean also helps visualise how shares per round / D varies over time. The 10 block rolling mean confidence interval bounds are Erlang distribution quantiles at p = 0.025 and 0.975, with shape parameter k = 10 and scale parameter 1/10 (equivalently, rate λ = 10), so that the distribution has mean 1. The 100 and 1000 block rolling mean confidence intervals use the same probabilities, with k = 100 and scale 1/100, and k = 1000 and scale 1/1000, respectively.

Since a 95% confidence interval is being used, we can expect the confidence intervals to exclude roughly 5% of the data. 

10 round rolling mean: 4.8% of the data are excluded, 2.7% greater than the upper bound and 2.1% lower than the lower bound.
100 round rolling mean: 5.5% of the data are excluded, 4.8% greater than the upper bound and 0.7% lower than the lower bound.
1000 round rolling mean: 10.7% of the data are excluded, 0% greater than the upper bound and 10.7% lower than the lower bound.

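There's no clear tendency to either good or bad "luck". The 1000 round figure looks high, but successive 1000 round windows share 999 blocks and so are strongly correlated: a single stretch of good luck can push many consecutive windows below the lower bound at once.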
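The exclusion percentages can be computed along these lines. This is a sketch under the same Erlang assumptions (shape k = window, scale 1/window); `rolling_exclusion` is a hypothetical helper, and simulated data stand in for the BitMinter rounds.

```python
import numpy as np
from scipy.stats import gamma


def rolling_exclusion(x, window, level=0.95):
    """Rolling mean of shares per round / D over `window` blocks and the
    fraction of windows above / below the Erlang(k=window, rate=window) interval."""
    x = np.asarray(x, dtype=float)
    csum = np.concatenate(([0.0], np.cumsum(x)))
    rolling = (csum[window:] - csum[:-window]) / window  # one value per full window
    alpha = 1.0 - level
    lo = gamma.ppf(alpha / 2, a=window, scale=1.0 / window)
    hi = gamma.ppf(1 - alpha / 2, a=window, scale=1.0 / window)
    return rolling, np.mean(rolling > hi), np.mean(rolling < lo)


rng = np.random.default_rng(3)
x = rng.exponential(1.0, size=3283)   # stand-in data
for w in (10, 100, 1000):
    _, above, below = rolling_exclusion(x, w)
    print(f"{w:4d} round rolling mean: {above:.1%} above, {below:.1%} below")
```

Rerunning with different seeds is a quick way to see how far the 1000 round exclusion rate can wander from 5% even for perfectly exponential data.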







3. Quantiles and empirical cumulative distribution function.

These charts would help detect any abnormalities in BitMinter's shares per round / difficulty for each solved block, if there were any to be found (as was the case for Bitclockers and BitcoinPool). They are very useful if the initial investigation suggests the data are abnormal, less so when the results suggest the data are distributed as expected. I've included them for completeness, and as an opportunity for newer readers to learn something new and for older readers to brush up on their knowledge.

The comparison plot below on the left is a QQ (quantile-quantile) plot comparing the quantiles of BitMinter's distribution of shares per round / D with the theoretical quantiles that distribution should have.

Quantiles are points taken at regular intervals from the cumulative distribution function (CDF). The empirical quantiles are from BitMinter's shares per round / D data, and the theoretical quantiles (for exponentially distributed variables, using shares per round / D) are calculated using the number of rounds in the dataset, so that there are the same number of theoretical as empirical quantiles, at the same cumulative probabilities. The empirical quantiles are then plotted as a function of the theoretical quantiles, and the relationship should be y = x.
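A minimal sketch of the QQ pairing. The exponential quantile function is Q(p) = -ln(1 - p); the plotting positions (i - 0.5) / n are an assumed choice, since the post doesn't say which convention was used, and `exponential_qq` is a hypothetical name.

```python
import numpy as np


def exponential_qq(x):
    """Pair sorted shares per round / D with Exp(1) theoretical quantiles at
    the same cumulative probabilities, using (i - 0.5) / n plotting positions."""
    empirical = np.sort(np.asarray(x, dtype=float))
    n = len(empirical)
    probs = (np.arange(1, n + 1) - 0.5) / n
    theoretical = -np.log1p(-probs)   # inverse Exp(1) CDF: -ln(1 - p)
    return theoretical, empirical


rng = np.random.default_rng(6)
t, e = exponential_qq(rng.exponential(1.0, 20_000))
# Plotting e against t should give points close to the line y = x
```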

The comparison plot below on the right shows the difference between the theoretical CDF for an exponentially distributed variable (the probability that a round will be solved by a given number of shares / difficulty) and the empirical CDF (eCDF) for BitMinter shares per round / Difficulty.

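The eCDF is defined as:

$$\mathrm{eCDF}(x_{(i)}) = \frac{i}{n}$$

where $x_{(i)}$ is the i-th smallest of the n data points; that is, the data are sorted by size and each point's rank is divided by the total number of points.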
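The eCDF is a few lines of code. The post doesn't say how the 95% confidence interval for the eCDF was built, so this sketch assumes a Dvoretzky-Kiefer-Wolfowitz band, eps = sqrt(ln(2 / alpha) / (2n)), which is one common choice; `ecdf_with_band` is a hypothetical name.

```python
import numpy as np


def ecdf_with_band(x, alpha=0.05):
    """eCDF (i / n at the i-th smallest value), an assumed (1 - alpha) DKW band,
    and the Exp(1) CDF evaluated at the same points for comparison."""
    xs = np.sort(np.asarray(x, dtype=float))
    n = len(xs)
    ecdf = np.arange(1, n + 1) / n
    eps = np.sqrt(np.log(2.0 / alpha) / (2.0 * n))
    lower = np.clip(ecdf - eps, 0.0, 1.0)
    upper = np.clip(ecdf + eps, 0.0, 1.0)
    theoretical = 1.0 - np.exp(-xs)  # Exp(1) CDF
    return xs, ecdf, lower, upper, theoretical
```

Because the band width shrinks as 1 / sqrt(n), it is very narrow for the full pool history, which matches the observation that the first CDF vs eCDF plot is hard to read.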







In the first QQ plot, the relationship between the theoretical and empirical quantiles is approximately 1:1, as expected. In the second QQ plot, the empirical quantiles are generally larger than the theoretical quantiles beyond about 1.7 x Difficulty.

The first CDF vs eCDF plot is a little difficult to read since the theoretical CDF overlays the eCDF, and the 95% confidence interval for the eCDF is quite narrow. In the second plot, the theoretical CDF is slightly different to the eCDF, but still within the 95% confidence interval for the data.

In all, these plots don't change the result from section 2: over the history of the pool and over the last sixty days, the number of shares it takes to solve a block divided by the difficulty at which the block was solved is distributed exponentially.


4. Another quantile comparison: Boxplots of shares per round / difficulty.

Boxplots group data and show the relationships between their quartiles (25%, median and 75%). Outliers are the black dots. The theoretical median value of shares per round / Difficulty is log(2) (natural logarithm, ~0.693), shown on the boxplot chart below as the red line. The central line of each boxplot is the median for that group and is expected to be log(2), so the central line of each boxplot should be near the red line. The dashed red lines are the theoretical 25% and 75% quantiles, so the tops and bottoms of each box would be expected to line up with these.
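The reference lines follow directly from the exponential quantile function Q(p) = -ln(1 - p). A short sketch, assuming Exp(1) for shares per round / Difficulty:

```python
import math


# Exp(1) quantile function: Q(p) = -ln(1 - p)
def exp_quantile(p: float) -> float:
    return -math.log(1.0 - p)


lower_quartile = exp_quantile(0.25)  # -ln(0.75) = ln(4/3)
median = exp_quantile(0.50)          # ln(2)
upper_quartile = exp_quantile(0.75)  # ln(4)
print(f"{lower_quartile:.3f}, {median:.3f}, {upper_quartile:.3f}")  # 0.288, 0.693, 1.386
```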

Each boxplot groups the blocks solved between retargets, and the boxes are coloured to indicate the number of blocks solved in that period. The last box covers the period from the last retarget up until today, and so does not span the full time between retargets.

The quantiles of the grouped data approach the expected values more closely as the amount of data in each group increases, as is expected if the BitMinter shares per round / Difficulty are distributed exponentially.

I like this method of depicting shares per round / Difficulty, since it makes it obvious that the bluer the boxes (fewer blocks solved between retargets), the larger the variance, and the redder the boxes, the more closely the medians match the expected log(2). The simple lesson is that the greater the share of the network you have, the less variance you will experience.
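A quick simulation shows why small groups scatter so much. For Exp(1) data, the sample median of n values has a large-sample standard deviation of about 1 / sqrt(n) (the density at the median is 1/2). This sketch uses simulated groups as a stand-in for blocks solved between retargets; `median_spread` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(5)


def median_spread(blocks_per_group: int, groups: int = 2000) -> float:
    """Standard deviation of group medians of Exp(1) shares per round / D,
    for groups of the given size."""
    samples = rng.exponential(1.0, size=(groups, blocks_per_group))
    return float(np.median(samples, axis=1).std())


for n in (20, 200, 2000):
    print(f"{n:5d} blocks per group: median spread {median_spread(n):.3f} "
          f"(large-sample approximation 1/sqrt(n) = {1 / np.sqrt(n):.3f})")
```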




5. Orphaned blocks

If a pool user activates the "prepay" perk (which requires the donation percentage to be set to 1.5%), payment is received as soon as a block is solved, without waiting for confirmations: you get paid sooner, and orphaned blocks are paid. Is this perk worth 1.5%? And, importantly for miners who have not activated the perk, is the proportion of orphaned blocks at BitMinter especially high?

The first question is easier to answer than the second. The chart below shows the cumulative percentage of orphaned blocks over time, mostly between just under 1% and just under 1.5%. If you value immediate payment on solving a block, hate losing shares to orphaned blocks, can't stand variance or just want to donate to BitMinter, then 1.5% is probably a fair price to pay.

It's hard to tell whether or not the number of orphans at BitMinter is especially low or high since I don't have results for many other pools. It's certainly low compared to some pools.



7. Transaction fees.
Each bitcoin (and namecoin, see below) block solved by the pool has an associated reward, currently 25 btc. In addition, the transaction fees paid by people whose transactions are included in the block go to the pool.

Some pools do not pass this on to miners, some do. The chart for this section shows the cumulative transaction fees and cumulative earnings for a 1 Ghps miner from the start of the pool. The large uptick in the transaction fee earnings over the last twelve months is probably due mainly to SatoshiDice and partly to increased general use of bitcoin. 

Transaction fees did not make up a large percentage of earnings when the block reward was 50 btc. Since then, SatoshiDice has caused a significant increase in the number of transactions per block, bitcoin use has grown generally, and the block reward has halved, so transaction fees make up a growing part of the rewards a miner can expect. As the chart below shows, at BitMinter transaction fees have accounted for a growing percentage of miner earnings on top of the standard block reward. Since the reward halving, transaction fees have added up to 4% to the earnings from a block.





8. Merged mining
BitMinter also allows its users to mine namecoin using the same hashes that miners send for solving bitcoin blocks. Although the exchange rate of (currently) ~0.01 BTC / NMC doesn't seem very attractive, the extra does add up. It's simple to calculate how much extra you can expect to earn from merged mining namecoin (excluding namecoin transaction fees).

Expected BTC from merged mining NMC, per BTC block solved:

$$\frac{\mathit{DB}}{\mathit{DN}} \times N \times R$$

Expected % extra from merged mining NMC, per BTC block solved:

$$\frac{\mathit{DB}}{\mathit{DN}} \times \frac{N \times R}{B} \times 100$$

Where:
DB = bitcoin mining difficulty
DN = namecoin mining difficulty
B = bitcoin mining reward per block
N = namecoin mining reward per block
R = the BTC / NMC exchange rate

Currently, DN = 2275750.77, DB = 10076292.88, B = 25, N = 50 and R = 0.00838, so for each block a merged mining pool solves, an extra 1.86 btc (or 7.4%) is expected to be earned by the pool and distributed to miners.
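The same calculation as a small Python sketch, using the symbols defined above (DB / DN is the expected number of NMC blocks found per BTC block, since every hash is checked against both targets). The function name is illustrative only; namecoin transaction fees are ignored, as in the text.

```python
def merged_mining_extra(db: float, dn: float, b: float, n: float, r: float):
    """Expected extra BTC, and % of the BTC block reward, from merged-mined NMC
    per BTC block solved (NMC transaction fees ignored)."""
    expected_nmc_blocks = db / dn        # NMC blocks expected per BTC block
    extra_btc = expected_nmc_blocks * n * r
    extra_pct = extra_btc / b * 100.0
    return extra_btc, extra_pct


btc, pct = merged_mining_extra(db=10076292.88, dn=2275750.77, b=25, n=50, r=0.00838)
print(f"{btc:.2f} BTC extra per block ({pct:.1f} %)")  # 1.86 BTC extra per block (7.4 %)
```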

I couldn't obtain the recent BTC / NMC exchange rate history, so the charts below only run up to April 1st 2013. Data are taken directly from the BitMinter BTC and NMC block histories.






9. So, have miners earned significantly less at BitMinter than elsewhere?
The final chart consists of two plots:
  • The cumulative amount paid out by the pool compared to the amount that would have been paid had BitMinter been a fee free / 3% / 5% Pay Per Share pool, not including namecoin income.
  • Since concerns have been raised about the pool's performance over the past two months, the second plot shows the cumulative earnings for a 1 Ghps miner over the last sixty days, compared to the expected income if BitMinter's reward method had been fee free / 3% / 5% Pay Per Share.
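The PPS comparison lines rest on the standard expected-value calculation: a difficulty-1 share takes 2^32 hashes on average and, at 0% fee, is worth B / D. This sketch computes the expected daily PPS income for a 1 Ghps miner at the bitcoin difficulty quoted in the merged mining section; a real comparison over time would use the difficulty in force at each block. `pps_btc_per_day` is a hypothetical helper, and transaction fees and NMC are deliberately left out, matching the PPS baseline described above.

```python
def pps_btc_per_day(hashrate_hps: float, difficulty: float, reward_btc: float,
                    fee: float = 0.0) -> float:
    """Expected BTC per day for a miner at a Pay Per Share pool.

    A difficulty-1 share takes 2**32 hashes on average and is worth
    reward / difficulty at 0% fee; transaction fees and NMC are excluded."""
    shares_per_day = hashrate_hps * 86_400 / 2**32
    return shares_per_day * reward_btc / difficulty * (1.0 - fee)


D = 10076292.88   # bitcoin difficulty quoted in the merged mining section
for fee in (0.0, 0.03, 0.05):
    print(f"{fee:.0%} fee PPS, 1 GH/s: {pps_btc_per_day(1e9, D, 25, fee):.4f} BTC/day")
```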






The last part of the first plot shows that the pool has paid miners more than fee free PPS over the history of the pool, solely due to transaction fees. Toward the end, the "Actual payment" line starts to move closer to the "0% fee PPS" line as the 1% fee begins to eat into the extra earned from transaction fees.

The second plot shows that over the last two months miners have earned about as much as they would have with fee free PPS, sometimes a little more and sometimes a little less, even with the bad "luck" the pool has had over the last sixty days. Once namecoin earnings are included, BitMinter users have earned more than fee free PPS over the past two months.



10. Conclusions.
  • BitMinter's "luck" both over the history of the pool and over the last two months is not unusual. 
  • One out of every twelve reruns of the last sixty days' worth of blocks would be likely to have the same luck or worse.
  • While this does not prove that the pool is running normally, there is no evidence that it is not running normally.
  • Shares per round / Difficulty are distributed as expected for pooled bitcoin mining.
  • The claim that miners at BitMinter have consistently earned 10% less than expected over the past couple of months is false.
  • When transaction fee earnings and merged mining are taken into account, earnings are greater than for a fee free PPS pool that doesn't merged mine or include transaction fees. Miners can be confident that even when luck has been poor, they're still earning more than at a PPS pool that doesn't pay transaction fees and doesn't merged mine.
I think there are several reasons why concerns arose about BitMinter's luck.
  1. There was a large influx of new pool users who have not yet learned all there is to know about pooled bitcoin mining. Many of the more experienced pool users who have had to answer these questions repeatedly have started to avoid answering them, since in their experience these explanations become very time consuming. Perhaps I'll write a post on "luck" and pooled bitcoin mining; then new miners could simply be directed to it when the "luck" questions arise.
  2. Many miners seem unaware of the extra income from merged mining and transaction fees.
  3. BitMinter's charts are also part of the problem. For miners to intuitively understand whether or not statistics are unusual, confidence intervals must be provided. Without that guide, miners have no way to judge how bad or good "luck" has been over a given period of time. To my knowledge no public pools provide this data, so BitMinter is not the only pool with this problem. A smaller issue is that on charts without dates on the time axis, miners find it hard to judge how long luck has been either good or bad.



Thank you to forum member Phelix, owner of blockchained.com for the NMC price history data, and also to DrHaribo for commissioning me to do something I enjoy.

organofcorti.blogspot.com is a reader supported blog: 12QxPHEuxDrs7mCyGSx1iVSozTwtquDB3r





6 comments:

  1. Awesome article.
    Thanks for taking the time and effort to look into all the data and making the graphs so that even non maths boffins like myself can understand.

    1. Thanks for the feedback. I try to use basic high school / college level maths as much as I can, but even so there's a bit of a learning curve for most people.

      If anyone has trouble following the post, please post a comment about what it is you don't follow. It will make it easier for me to write a primer post on mining bitcoin and cryptocurrencies based on bitcoin.

  2. It's a good in-depth analysis, and thank you for taking the time to do it.

    Whilst a concrete set of numbers and graphs showing "luck" etc. over time is always useful, from a miner's perspective there's a simpler summary: having tried a few pools, and even having a p2pool solo mining setup, I've received more bitcoins in 90 days from being part of BitMinter for the same processing power than from any other (excluding the fee and the percentage donations for the perks).

    Taking into account the deductions, it's within 0.2% of any other pool payout, and over time I expect that to be outweighed by the NMC earnings (accumulated but not used at present).

    The only "gripe" is that the web-based worker stats aren't always correct. Whether there is a bug in the sampling or my hash levels are too low I don't know, but it almost always reports each worker as doing 0.00 Mh/s.

    1. Thanks, good to get feedback.

      Are the worker stats still wrong?

  3. Thanks for this great post. I just started mining a few days ago at BitMinter and found out about the bad luck; it's good to see it's just bad luck. I hope the good luck starts now :)

    1. Glad you found the post and that it helped.

      Pool "luck" is the first thing a new miner has questions about. Why do periods of bad luck seem longer than good? (Because "bad luck" means more shares, which take longer; "good luck" lasts much less time.) What submission difficulty should I use? (http://organofcorti.blogspot.com/2012/10/71-variable-pool-difficulty.html). What the hell is CPPSRB? (Capped PPS with Recent Backpay). Why is my hashrate at the pool so different from my hashrate measured locally? (Because it's determined by the number of shares you submit, which is affected by the submission difficulty.) And that's just for a start ... ;)


Comments are switched off until the current spam storm ends.