Neighbourhood Pool Watch: 15.1 Gigavps and Liquidbits Semi-Private Mining Pool

21st August, 2013

Pool: Gigavps and Liquidbits Semi-Private Mining Pool
Pool ops: The pool is a joint venture between forum member gigavps (real name James Gibson) and Liquidbits.
Publicly identifiable pool operator: Yes.
Coinbase signature: No signature.

Payout method: PPLNS, shift based. There are 10 shifts and each shift is of 0.5 x Mining Difficulty, so N=5. This is a low variance N.
Exploitable reward method? No.

Variable difficulty: Yes, set to 12 shares per minute, with a minimum of D4.

Provision for local work: Stratum.

ASIC ready? Yes.

Fee: 0 %

Merged mining: No

Transaction fees: Paid to miners.

Current pool hashrate: 35.3 Thps

Availability of pool statistics: Available only to pool members. Note that a worker API is being developed and should be available soon.

Accuracy of public statistics: Accurate - no outlier data detected.
Are public statistics likely to be reflecting actual data? Yes.
Public perception of pool and pool op: Good.

How to join: Apply to gigavps by private message on bitcointalk.org

Should I mine here right now? Yes.

DISCLAIMER: This is a commissioned pool audit, similar to the one I provided for Bitminter.com. If you have concerns about my impartiality, you can check the dataset and calculate the results for yourself.

0. Introduction

Earlier this year I was asked by gigavps to provide some input into some of the statistical charts for his pool, and also provide an audit for him to ensure the pool was running as it should. The pool had some bad luck but within an expected range, and the charts turned out quite well, and show at a glance how the pool's "luck" has been.

It's six months later now, and the pool has since become advertised on the bitcointalk forum, has become one of the five highest hashrate mining pools, and all of that hashrate seems to be ASIC - the minimum hashrate on the pool is ~ 8 Ghps. Recently gigavps once again requested an audit - although a pool audit can only point to errors in the data, it can provide prospective miners with the assurance that a pool operator is honest.

I've previously thought it might be useful to allow pool operators the opportunity to explain why they started the pool and how they think it differs from other pools, and gigavps was kind enough to provide the following:

"The semi-private mining pool was born out of the fact that downtime is the killer of mining revenue. The participants of the pool do not share the URL to help protect it from unwanted burdens like DDOS attacks. The pool also runs on redundant infrastructure to add another level of protection from downtime. All of this combines to create, what I feel is one of the most reliable pools around. For me, this reliability is worth more than just about any other aspect of a mining pool."

The concept of a private mining pool is interesting and has much to recommend it. Although a private URL suffers from the same problems as any method that relies on "security through obscurity", the bar for potential attackers is set much higher and there are easier targets for their vandalism.

Another advantage is that the pool operator can veto any new member and if high hashrate experienced miners are the core members of the pool then more time can be spent by the pool op solving problems rather than coaching new miners and explaining to them the vagaries of pool "luck". This might sound like elitism and in a sense it is, however there are many other options for new miners and I'm sure the members of gigavps' pool appreciate the fact he has more time to devote to the pool. Fewer miners with more experience means less time reiterating previous explanations.

1. Pool hashrate
As can be seen from the chart below, after advertising on the bitcointalk forum in June the pool's hashrate has on average increased linearly, although there have been several jumps as large hashrate miners have joined the pool. This means that since mid July, when the network hashrate started its current exponential rate of increase, the pool has managed to hold ~ 8 % of the network. This success can be attributed at least in part to using an "invite only" system - any newly joined miners are likely to increase the pool's hashrate significantly.

2. Income per block

The table below shows the actual pool rewards from the Semi-Private Mining Pool. The expected, lower and upper bounds are for a comparison "population pool": a fee-free pool that has had the same diff1 (share submission difficulty 1) equivalent shares / mining Difficulty. Since many pools still don't pay transaction fees to miners, the population pool does not either.

The Semi-Private Mining Pool has had slightly worse "luck" than average - for the same shares submitted at the same difficulty, 940 blocks would be expected rather than the 921 blocks the pool solved. However 98.0% of the expected number of blocks is still well within the 95% confidence range of 93.7% to 106.4% of the expected number of blocks. This means that the block reward payments are also in the acceptable range.
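The 95% range quoted above can be approximated by treating the number of blocks solved as a Poisson random variable whose mean is the expected number of blocks. What follows is only a sketch under that assumption, and not necessarily the calculation behind the table. The helper names are hypothetical, and 940 is the rounded figure from the text, so the printed percentages may differ slightly from those quoted.

```python
import math

def poisson_pmf(k, mean):
    # Work in log space so that large means (hundreds of blocks) don't underflow.
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))

def poisson_quantile(p, mean):
    # Smallest k for which P(X <= k) >= p.
    cdf, k = 0.0, 0
    while True:
        cdf += poisson_pmf(k, mean)
        if cdf >= p:
            return k
        k += 1

expected_blocks = 940  # rounded figure quoted in the text (assumption)
solved_blocks = 921

lower = poisson_quantile(0.025, expected_blocks)
upper = poisson_quantile(0.975, expected_blocks)

print(f"95% range: {lower} to {upper} blocks "
      f"({100 * lower / expected_blocks:.1f}% to "
      f"{100 * upper / expected_blocks:.1f}% of expected)")
print(f"solved: {solved_blocks} blocks "
      f"({100 * solved_blocks / expected_blocks:.1f}% of expected)")
```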

The number of orphaned blocks is similar to that for other pools at ~ 1.4% (1.5% seems to be usual). Interestingly, the transaction fees almost make up for the lower than average payment, bringing the result to within 226 btc of the expected amount. This is something miners should notice - over time transaction fees mount. For the Semi-Private Mining Pool, this has averaged to an extra ~ 1%. If the pool's luck had been slightly better, that could have almost made up for the orphaned blocks as well.

Since there is no fee at this pool, miners here probably ended up earning more than they could have elsewhere, except at pools that merge mine as well as pay transaction fees.

3. Shares per round statistics

All data for this post are available as a .csv file: The Semi-Private Mining Pool data.

I explained in the BitMinter post why shares per round are divided by the difficulty prior to analysis. I'll try to make time to write a post on this topic alone, but for now I'll repeat the introduction I provided then. If you understand the analysis already, skip the quoted portion below.

"...... Without going into much detail, when you mine bitcoin (or namecoin), your GPU / FPGA / ASIC is doing something rather simple - it's just creating a hash of some data. Assuming you're not solo-mining, if the hash's numerical value is sufficiently small your mining pool accepts the hash as a proof-of-work, or share. This share may not solve the block on which the pool is working, but it proves you are making the attempt to solve a block.
Share reward methods are various. Some pay the expected amount per share, for every share submitted (Pay Per Share and its variants). For others, such as Pay Per Last N Shares and the Double Geometric Method, the amount awarded to miners is affected by the number of shares it takes to solve a block. The more shares a block requires, the less each share is rewarded.
Profitability is determined to a great extent by the number of shares per round, and this pool statistic is of great interest to a pool's users. Since each share has the same probability of solving a block, the process of block solving results in shares per round that are geometrically distributed random variables. At very small probabilities such as we now have, dividing the number of shares per round by the network difficulty at which the block was mined results in a random variable that is a good approximation of an exponentially distributed random variable.
Why approximate using shares per round / difficulty? The geometric distribution is a special case of the negative binomial distribution. In the case of pooled bitcoin mining, the number of shares required to solve a number of blocks is a negative binomial random variable, so this could be used to determine the probability of a number of shares solving a number of blocks. However since difficulty changes every 2016 blocks only subsets of data can be analysed in this way.
In the same way that the exponential distribution is a good approximation of shares per round / difficulty for one solved block, the Erlang distribution is a good approximation for the average shares per round / difficulty over an arbitrary number of rounds. Since difficulty changes are no longer a factor, the probability of the average shares per round / difficulty for any time period can be calculated.
The cumulative distribution function (CDF) describes the probability that a random variable will have a value greater (upper tail probability) or less than (lower tail probability) a particular limit. The Erlang distribution CDF in the tables below is the lower tail CDF. If you subtract this value from 1 you have calculated the probability that the average shares per round / Difficulty would be equal to or greater than the data."
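The exponential approximation described in the quoted passage can be checked numerically. The sketch below assumes each diff1 share solves a block with probability 1/D and compares the geometric CDF with the exponential CDF of shares per round / D. The difficulty value and function names are hypothetical and chosen only for illustration.

```python
import math

def geometric_cdf(shares, difficulty):
    # P(block solved within `shares` diff1 shares), where each share
    # independently solves the block with probability 1 / difficulty.
    p = 1.0 / difficulty
    return -math.expm1(shares * math.log1p(-p))  # 1 - (1 - p)**shares

def exponential_cdf(rescaled_shares):
    # P(round length <= rescaled_shares), with round length in units of D.
    return -math.expm1(-rescaled_shares)  # 1 - exp(-x)

difficulty = 1_000_000  # hypothetical value, not the network difficulty

for x in (0.5, 1.0, 2.0):
    g = geometric_cdf(x * difficulty, difficulty)
    e = exponential_cdf(x)
    print(f"{x:.1f} x D: geometric {g:.7f}, exponential {e:.7f}, "
          f"difference {abs(g - e):.1e}")
```

The larger the difficulty, the smaller the difference between the two, which is why the approximation is described as good at the very small per-share probabilities involved.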

The table below shows:

The cumulative distribution function of the average shares per round / difficulty for the entire pool history.
Results of the Anderson-Darling test, with the null hypothesis being that shares per round / difficulty for each block is exponentially distributed, and the p value above which the data is treated as consistent with an exponential distribution.
The maximum length round, the expected maximum length, and the 95% confidence intervals.
The mean, median, standard deviation, skewness and kurtosis of shares per round / difficulty for each block, along with the expected values and the 95% confidence interval upper and lower bounds (confidence intervals for skewness and kurtosis have been estimated using simulated data).

The charts below it are a visualisation of the same data, the blue dot indicating actual data, and the red lines indicating the expected result and the 95% confidence interval.

The CDF indicates that for any run of 921 solved blocks, 25% of them would be longer (less "lucky"), and the Anderson-Darling test suggests the shares/mining Difficulty data is exponentially distributed. The longest round so far ( 6.49 x D ) is less than expected for 921 solved blocks, and the average round length a little more than expected (hence the slightly lower payout than expected). All quite unremarkable.
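The Erlang CDF figure can be roughly reproduced from the numbers already given. The sum of 921 rescaled round lengths is Erlang distributed with shape 921 and rate 1. The sketch below assumes that sum equals the 940 expected blocks quoted in section 2, which is a rounded figure, so the result is approximate. The function name is hypothetical.

```python
import math

def erlang_cdf(x, shape):
    # Lower tail CDF of an Erlang(shape, rate=1) random variable:
    # 1 - sum_{k=0}^{shape-1} exp(-x) * x**k / k!
    # Each term is computed in log space to avoid underflow.
    upper_tail = sum(
        math.exp(-x + k * math.log(x) - math.lgamma(k + 1))
        for k in range(shape)
    )
    return 1.0 - upper_tail

rounds = 921
total_rescaled_shares = 940.0  # assumption: rounded expected block count

cdf = erlang_cdf(total_rescaled_shares, rounds)
print(f"mean shares per round / D: {total_rescaled_shares / rounds:.3f}")
print(f"lower tail CDF: {cdf:.2f}, upper tail: {1 - cdf:.2f}")
```

The upper tail is the proportion of 921 block runs that would be expected to have a longer average round than this one.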

Summary: All of these results are inside the 95% confidence interval for exponentially distributed random variables.

Although the combined data over the history of the pool is distributed exponentially, the possibility that some parts of the data are not distributed exponentially cannot be discounted, specifically:

i. Chronological periods during which the data has not been distributed as expected; and

ii. Specific round lengths which deviate from the expected distribution.

The former can affect a miner's earnings for the periods during which significantly more or significantly fewer than expected shares were required to solve a round. The latter only needs to be analysed if the former is shown to be abnormal, but I include the analysis for completeness.

4. Chronological analysis

There are a few ways that changes in the data can be detected over the history of the pool. What follows is an analysis of changes in the average reward per rescaled share (a share rescaled by multiplying it by the share submission difficulty and dividing by the mining Difficulty, as mentioned previously), the median / percentile rescaled share per round for groups of blocks, the cumulative mean rescaled shares per round, and the average number of orphaned blocks.

4.1 Chronological changes in average reward per rescaled share for groups of 20 blocks.

The number of shares per round / difficulty (rescaled shares per round) determines to a greater or lesser extent the amount each share was worth, with the reward method itself being the other significant determinant.

For PPLNS, the amount paid out per block is the block reward plus the transaction fees. The number of blocks expected from ( 20 x mining difficulty ) rescaled shares is a Poisson distributed random variable with a mean of 20 and a variance of 20. If this is applied to the amount paid to miners at 25 BTC per block, the mean becomes 500 BTC and the variance 12,500 BTC squared (a standard deviation of about 112 BTC).

For those interested, the confidence interval for the chart below has been determined such that no more than one bar would be expected to exceed either the lower or upper confidence interval bound. This is done by first calculating alpha = 1 / (number of groups of blocks). The confidence interval is then from alpha/2 to 1 - alpha/2, and the confidence level is 1 - alpha. In this case the number of groups is 47, so the confidence level is 97.9% with a lower bound of 1.06% and an upper bound of 98.9%. The corresponding lower and upper bounds for a Poisson distributed random variable with mean 20 (number of blocks per group) are 10 and 31 blocks, so the reward confidence interval is 250 to 775 BTC per group.
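The bounds described above can be reproduced in a few lines. This is a sketch of the stated procedure, and the function name is hypothetical. The reward of 25 BTC per block is an assumption implied by the 250 to 775 BTC range, and transaction fees are ignored.

```python
import math

def poisson_quantile(p, mean):
    # Smallest k for which P(X <= k) >= p, built up term by term.
    term = math.exp(-mean)  # P(X = 0)
    cdf, k = term, 0
    while cdf < p:
        k += 1
        term *= mean / k    # P(X = k) from P(X = k - 1)
        cdf += term
    return k

groups = 47
blocks_per_group = 20
reward_per_block = 25  # BTC, assumed from the 250 to 775 BTC range

alpha = 1 / groups
lower_blocks = poisson_quantile(alpha / 2, blocks_per_group)
upper_blocks = poisson_quantile(1 - alpha / 2, blocks_per_group)

print(f"confidence level: {100 * (1 - alpha):.1f}%")
print(f"blocks per group: {lower_blocks} to {upper_blocks}")
print(f"reward per group: {lower_blocks * reward_per_block} to "
      f"{upper_blocks * reward_per_block} BTC")
```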

The transaction fees have not been removed, however these are small compared to the amount of variance encountered for rescaled shares per round in 20 blocks.

Only one group exceeds the confidence interval, as expected, and it exceeds the bound by only one block reward. The pool seems to have had some poor luck early on, partly compensated by a very lucky period immediately afterward. Since then, the changes in luck have been significantly less extreme. There don't appear to have been any significant changes in reward payment per rescaled share for the 20 block groups.

4.2 Chronological changes in the median and percentile rescaled share per round between retargets.

The boxplots below group rescaled shares per round between retargets and show relationships between their quartiles (25%, median and 75%). The aim is to give a visual analysis of the median, percentiles and range of the data, and usually each boxplot would contain the same number of datapoints. However miners remember luck in periods of time rather than in periods of blocks, so I think it's best to represent each time period chronologically. An arbitrary time interval could have been chosen, but my personal preference is to group by retarget. The last box starts at the most recent retarget, so it does not cover the full amount of time between retargets and has fewer blocks in it.

Outliers are the black dots. The theoretical median value of shares per round / Difficulty is log(2) ( ~ 0.693 ), shown on the boxplot chart below as the solid red line. The central line of each boxplot is the median for that group, and is expected to be log(2) - so the central line of each boxplot should be near the red line. The inner dashed red lines are the 25% and 75% quantiles, so the tops and bottoms of each box would be expected to line up with these.

The outer dashed red lines are the 2.5% and 97.5% quantiles, and the more blocks in a box, the more are expected to exceed these bounds.
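The reference lines on the boxplot chart are quantiles of an exponential distribution with mean 1. A minimal sketch of how they could be calculated follows (the function name is hypothetical):

```python
import math

def exponential_quantile(p):
    # Inverse of the CDF 1 - exp(-x) for an exponential variable with mean 1.
    return -math.log1p(-p)

for label, p in (("2.5%", 0.025), ("25%", 0.25), ("median", 0.5),
                 ("75%", 0.75), ("97.5%", 0.975)):
    print(f"{label:>6}: {exponential_quantile(p):.3f} x D")
```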

The quantiles of the grouped data approach the expected values more closely as the amount of data in each group increases, as expected for exponentially distributed data. There are no series of groups which consistently differ from the expected values.

4.3 Cumulative mean shares per round / mining Difficulty.

Another simple and effective way to judge the changes over the history of the pool is to chart the changes in cumulative mean shares per round / mining Difficulty. This variable is the one that would affect miners the most since it directly affects their rewards per share, and the confidence intervals are Erlang distributed. The cumulative mean is very clearly well inside the confidence interval at all times since the start of the pool.

4.4 Cumulative mean orphaned blocks.

All else being equal, about 1.5% orphaned blocks is usual, often varying from 1 to 2 % for established pools. Early on for a pool the cumulative average percentage of orphaned blocks can be quite variable, but as can be seen below after a few hundred rounds this will settle to a consistent amount - in the case of the Semi-Private Mining Pool, that's about 1 to 1.5%.

While I suspect the number of orphaned blocks appearing during an arbitrary number of rounds can be modelled reasonably well by the Poisson distribution (the mean and variance are both 0.014), I haven't yet proved it to my satisfaction and so I have not included confidence intervals for the data. However, if that model is assumed to be true then the 95% confidence interval for the number of orphaned blocks per 100 rounds with an average rate of 1.5% is between zero and four orphaned blocks per 100 rounds. So while the early high percentage of orphaned blocks is unusual, it's not unlikely.
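If the Poisson model is assumed, the zero to four range can be read from a short table of cumulative probabilities. The sketch below makes only that assumption, with a mean of 1.5 orphaned blocks per 100 rounds. The variable names are hypothetical.

```python
import math

mean_orphans = 1.5  # 1.5% orphan rate over 100 rounds

lower = upper = None
cumulative = 0.0
for orphans in range(10):
    pmf = math.exp(-mean_orphans) * mean_orphans ** orphans / math.factorial(orphans)
    cumulative += pmf
    print(f"{orphans} orphans: P = {pmf:.4f}, cumulative = {cumulative:.4f}")
    # Bounds are the first counts at which the cumulative probability
    # reaches 2.5% and 97.5% respectively.
    if lower is None and cumulative >= 0.025:
        lower = orphans
    if upper is None and cumulative >= 0.975:
        upper = orphans

print(f"95% interval: {lower} to {upper} orphaned blocks per 100 rounds")
```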

4.5 The maximum length round

As the number of solved blocks increases, the probability of a very large number of diff1 equivalent shares solving a block increases. It is possible to compare the pool's longest round with the expectation, median and 95% confidence interval of the longest round. At some point I'll post an explanation of this, however the distribution and the expectation of the maximum of an exponential distribution are both well known and a bit of googling should provide you with answers in the meantime.
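In the meantime, the well-known results can be sketched briefly. If each rescaled round length is an exponential variable with mean 1, the maximum of n rounds has CDF (1 - exp(-x))^n, and its expectation is the nth harmonic number. The function names below are hypothetical; the 921 rounds and the 6.49 x D longest round are taken from the text.

```python
import math

def max_round_quantile(p, rounds):
    # Invert (1 - exp(-x))**rounds = p for x.
    return -math.log(-math.expm1(math.log(p) / rounds))

def max_round_expectation(rounds):
    # Expected maximum of `rounds` unit exponentials: 1 + 1/2 + ... + 1/rounds.
    return sum(1.0 / k for k in range(1, rounds + 1))

rounds = 921
longest_observed = 6.49  # x D, from the text

expectation = max_round_expectation(rounds)
median = max_round_quantile(0.5, rounds)
lower = max_round_quantile(0.025, rounds)
upper = max_round_quantile(0.975, rounds)

print(f"expected longest round: {expectation:.2f} x D")
print(f"median longest round:   {median:.2f} x D")
print(f"95% interval:           {lower:.2f} to {upper:.2f} x D")
print(f"observed longest round: {longest_observed:.2f} x D")
```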

A single very long round is not usually a point of concern for most miners. It is also not a very useful diagnostic tool compared to other methods. However it is useful to illustrate maximums that have occurred, since a really long round could kill a pool if enough miners leave because of it. Knowing what an expected maximum could be might help miners keep their cool.

In the chart below I've plotted the rescaled shares per round compared to the expectation (solid red line), median (dashed red line) and 95% confidence interval (shaded red area) of the longest round. Although the longest round occurred quite early, it was within the 95% confidence interval and, since it is still the longest round, is now well below the mean and the median for the longest round.

5. Analysis of specific round lengths and probabilities

These charts would help detect any abnormalities in The Semi-Private Mining Pool's rescaled shares per round for each solved block if there were any to be found ( such as was the case for Bitclockers and BitcoinPool ). They are very useful if the initial investigation suggests data is abnormal, less so when results suggest data is distributed as expected. I've included them here for completeness.

The first plot below is a QQ plot (quantile-quantile) which compares the quantiles from The Semi-Private Mining Pool's distribution of rescaled shares per round and what that distribution should be theoretically.

Quantiles are points taken at regular intervals from the cumulative distribution function (CDF). The empirical quantiles are from The Semi-Private Mining Pool's rescaled shares per round and the theoretical quantiles for the exponential distribution are calculated using the number of rounds in the dataset. In this way there are the same number of theoretical as empirical data with the same cumulative probabilities. The empirical quantiles are then plotted as a function of theoretical quantiles, and the relationship here should be y = x. The red line shows the average relationship between the middle 50% of the empirical and theoretical quantiles.

The relationship between the theoretical and empirical quantiles is approximately 1:1, as expected, and there is no significant deviation from the relationship.

The next plot below on the right shows the difference between the theoretical CDF for an exponentially distributed variable ( the probability that a round will be solved after a given number of shares/difficulty) and the empirical CDF (or eCDF) for The Semi-Private Mining Pool's rescaled shares per round.

The eCDF is defined as:

eCDF = n/max(n), where 'n' is the rank of a datapoint in the set of data ordered by size
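The definition can be sketched in a few lines, with each datapoint's rank in the sorted data divided by the number of datapoints. The round lengths below are made up purely for illustration and are not the pool's data, and the function names are hypothetical.

```python
import math

def ecdf(data):
    # Pair each value with rank / number of datapoints, in ascending order.
    ordered = sorted(data)
    n = len(ordered)
    return [(x, (rank + 1) / n) for rank, x in enumerate(ordered)]

def exponential_cdf(x):
    # Theoretical CDF for rescaled shares per round.
    return 1.0 - math.exp(-x)

# Hypothetical rescaled shares per round, for illustration only.
example_rounds = [0.72, 0.05, 2.45, 0.31, 1.10]

for x, empirical in ecdf(example_rounds):
    theoretical = exponential_cdf(x)
    print(f"{x:.2f} x D: eCDF {empirical:.2f}, CDF {theoretical:.2f}, "
          f"difference {empirical - theoretical:+.2f}")
```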

The eCDF is lower than the CDF at approximately 0.5 to 1.5 rescaled shares per round, and is a little higher at about 3 rescaled shares per round. This means that rounds lasting 0.5 to 1.5 x Difficulty diff1 shares are slightly under-represented and some longer rounds over-represented. It's hard to tell from the CDF exactly how unusual this is, but the theoretical CDF is still within the 95% confidence interval for the eCDF so it's unlikely to be very unusual. The final plot below will help determine just how unusual the data is.

To gain a clearer appreciation of which numbers of rescaled shares per round are most affected, a histogram of the data can be used. Instead of assigning the rescaled shares per round to bins of equal width, we can assign different lengths of shares per round / D to equi-probable bins. This means that each bin is as likely to occur as any other - the probability of shares per round / D from 0.11 to 0.22 is the same as from 1.61 to 2.30.
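The edges of equi-probable bins are simply evenly spaced quantiles of the exponential distribution. A minimal sketch for ten bins follows (the function names are hypothetical):

```python
import math

def equiprobable_edges(bins):
    # Interior bin edges: the k/bins quantiles of an exponential with mean 1.
    return [-math.log1p(-k / bins) for k in range(1, bins)]

def bin_index(rescaled_shares, bins):
    # Zero-based bin for a round length, found from its theoretical CDF value.
    cdf = 1.0 - math.exp(-rescaled_shares)
    return min(bins - 1, int(cdf * bins))

edges = equiprobable_edges(10)
bounds = [0.0] + edges + [math.inf]
for i in range(10):
    print(f"bin {i + 1}: {bounds[i]:.2f} to {bounds[i + 1]:.2f} x D")
```

Each round can then be counted into its bin with `bin_index`, and the counts compared with the expected number per bin.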

Since each bin is just as likely to occur as any other, confidence intervals can be estimated as a Poisson probability. If there are a total of 10 bins, the expected number of blocks in each bin is the total number of solved blocks (921) divided by the number of bins (10) = 92.1. As previously, the confidence interval is based on the number of data points. Here there are ten bins, so alpha = 1/10 = 0.1, and the lower and upper limits are the 5% and 95% quantiles, for a 90% confidence interval. The Poisson probability quantiles that define the lower and upper limits of the 90% confidence interval are 77 and 108 blocks per bin respectively.

The number of rounds of 1.20 to 1.61 rescaled shares is slightly greater than the upper bound, and the number of rounds of 0.36 to 0.51 rescaled shares is slightly lower than the lower bound. In this case, two bins slightly outside the expected interval is not a matter for concern.

6. Summary

Although the Semi-Private Mining Pool has had slightly worse luck than usual over the pool's history, all data are within acceptable ranges and miners can be confident that the pool operator is running the pool honestly.
The pool has a similar percentage of orphaned blocks to the expected percentage of orphaned blocks.


Thursday, 22 August 2013
