Neighbourhood Pool Watch: 16.1 The network: Orphaned blocks part 1

0. Introduction
Most miners become fascinated with orphaned blocks at some point - What are they? This can be simply answered, and also answered simply. But then the questions escalate (or at least mine did): Why do orphaned blocks occur? Can they be minimised or predicted? Which pool has the fewest? What factors most influence the number of orphans produced by the network? Miners then find very few answers and mostly forget about orphaned blocks (until they lose income to an orphan race), putting them in the same category as forces of nature.

Meni Rosenfeld wrote a very clear and succinct answer to the question "What factors most influence the number of orphans produced by the network?". He writes:

"If the average time to find a block is T, and the typical time for a found block to propagate in the network is t, then the proportion of orphans among all blocks will be roughly 1/(1+T/t). As long as T>t there's not much risk for the network ..."

This was actually in response to the question: "How will a massive increase in hash power affect orphan rates?", but it does provide a clear answer to the factors affecting orphan production: the time to find a block, and the time for a block to propagate. However it doesn't apply to the rate of production of all orphaned blocks - just valid ones.
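To get a feel for the numbers, the quoted rule of thumb can be evaluated directly. This is only a sketch: the function name and the propagation times below are illustrative assumptions, not measurements.

```python
def orphan_proportion(block_time_s: float, propagation_time_s: float) -> float:
    """Rough orphan share from the quoted rule of thumb: 1 / (1 + T/t)."""
    if block_time_s <= 0 or propagation_time_s <= 0:
        raise ValueError("both times must be positive")
    return 1.0 / (1.0 + block_time_s / propagation_time_s)


# T = 600 s is the target block time; the values of t are hypothetical.
for t in (1.0, 6.0, 60.0):
    print(f"t = {t:>4} s -> proportion of orphans ~ {orphan_proportion(600.0, t):.4f}")
```

The proportion grows as t approaches T, and reaches one half when the two are equal.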

Definition:

Valid orphaned blocks: those that could have been accepted by the majority of clients.

Invalid orphaned blocks: those that could not have been accepted by the majority of clients.

This is an important distinction, since it's only valid orphaned blocks in which I'm interested - I'm hoping to find a nice simple relationship between the number of valid orphaned blocks and factors that might affect them. Invalid orphaned blocks are usually created in response to some change in the client rules once a majority is reached. They can also be created if a block's timestamp is greater than the network adjusted time plus 7200 seconds, or less than the median of the last 11 timestamps. Given these factors, the rate of invalid orphaned blocks is in principle unpredictable (although understandable in retrospect).
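The two timestamp checks described above can be sketched as follows. The function and constant names are hypothetical. The text says "less than the median"; the sketch also rejects a timestamp equal to the median, which is my understanding of the reference client and should be treated as an assumption.

```python
from statistics import median

MAX_FUTURE_DRIFT_S = 7200  # two hours, as quoted in the text


def timestamp_is_acceptable(block_time, previous_11_times, network_adjusted_time):
    """Return True if a block timestamp passes both timestamp checks."""
    median_time_past = median(previous_11_times)
    # Assumption: a timestamp equal to the median is also rejected.
    too_old = block_time <= median_time_past
    too_new = block_time > network_adjusted_time + MAX_FUTURE_DRIFT_S
    return not (too_old or too_new)
```

Note that the second check needs the network adjusted time as seen by the node at the moment the block arrived, which is not stored in the block chain.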

So, to continue with Meni's explanation, the time to find a block is reduced below 600 seconds any time the network hashrate increases for more than one retarget (2016 blocks). In this case, mining difficulty does not completely compensate for the increase and the result is an average block solve time of less than 600 seconds.
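A very simplified sketch of why rising hashrate shortens the block solve time: assume the hashrate is constant within each retarget period, and that the difficulty for a period is set so that the previous period's hashrate would have produced 600 second blocks. Both assumptions, and the function name, are mine.

```python
TARGET_S = 600.0  # target seconds per block


def mean_solve_time(previous_hashrate: float, current_hashrate: float) -> float:
    """Average seconds per block when difficulty was tuned for previous_hashrate."""
    if previous_hashrate <= 0 or current_hashrate <= 0:
        raise ValueError("hashrates must be positive")
    return TARGET_S * previous_hashrate / current_hashrate


# Hypothetical example: hashrate grows 20% from one retarget period to the next.
print(mean_solve_time(100.0, 120.0))
```

Under these assumptions, any sustained growth in hashrate gives an average solve time below 600 seconds, because the difficulty always lags one period behind.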

The time for a block to propagate depends to a great extent on local factors such as connectedness to the network. However, a very large block may take longer to propagate, so we would expect a valid orphaned block to be on average larger than the main chain block to which it lost the race.

So I decided to find a way to relate the number of valid orphaned blocks to the block solve rate and the block size. My main hypothesis is that the valid orphan occurrence rate (with respect to the network block height) is a non-homogeneous Poisson process (NHPP). This is based on observation and the fact that over short intervals the orphan occurrence rate is indeed consistent with a Poisson process. Plus, this is a falsifiable hypothesis, and even if the orphan occurrence rate is not an NHPP, in the process I may still be able to derive a significant relationship between the orphan occurrence rate and the factors that may affect it.
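One quick, informal check of "consistent with a Poisson process over short intervals" is the index of dispersion: for Poisson counts in equal-sized windows, the variance should be close to the mean. The sketch below is not the test used for this post, and the counts are made up for illustration.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Sample variance divided by mean; close to 1 for Poisson-like counts."""
    m = mean(counts)
    if m == 0:
        raise ValueError("mean count is zero; the index is undefined")
    return variance(counts) / m


# Hypothetical orphan counts in consecutive windows of equal block length.
window_counts = [2, 1, 3, 2, 2]
print(dispersion_index(window_counts))
```

A value well above 1 over a long span would not by itself rule out an NHPP, since a changing rate inflates the variance; the check is only meaningful over intervals short enough for the rate to be roughly constant.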

Since the recording of valid orphaned blocks is contingent on the connectivity of the client, the following results are using the longest standing and best connected record of orphaned blocks that I know of, which is at blockchain.info.

1. Some visual comparisons of the orphan occurrence rate and the block solve rate.

A good first step is always to look at the relationships in the data, and to identify outliers and data that need to be removed - in this case, invalid orphaned blocks.

The first chart compares the kernel densities of orphaned blocks to the number of seconds per block solved; the second is a histogram comparison of the percentage of the median number of orphans per retarget to the percentage of the expected number of days per retarget.

Neither of these looks very promising - there doesn't seem to be much of a correlation there at all. The two spikes in orphan production look suspicious though, so on to finding and removing the invalid orphaned blocks.

2. Reorgs, BIPs and other strangeness

Plotting the number of blocks between orphan races with respect to the number of orphans (since blockchain.info started recording them at height 142705) provides a nice way to find occurrences of forks and other invalid orphans:
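The quantities plotted here can be derived from a plain list of orphan block heights. A minimal sketch, assuming such a list is available (the function name and the sample heights other than 142705 are hypothetical):

```python
from collections import Counter


def orphan_gaps(orphan_heights):
    """Return (heights with orphans, gaps between them, orphans per height)."""
    per_height = Counter(orphan_heights)
    heights = sorted(per_height)
    # Number of blocks between one orphan race and the next.
    gaps = [later - earlier for earlier, later in zip(heights, heights[1:])]
    return heights, gaps, per_height


heights, gaps, per_height = orphan_gaps([142705, 142710, 142710, 142720])
print(gaps)                # blocks between consecutive orphan races
print(per_height[142710])  # orphans recorded at a single height
```

Runs of very small gaps, or heights with more than one orphan, are the places worth inspecting for forks and rule-change orphans.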

The red dots indicate one orphan per block height, green indicates two orphans per block height and blue three orphans per block height. Some of these may be forks, and we'll look at that later.

Three unusual features of the chart are apparent, each of which I've shaded in grey. I'll take them in reverse order of occurrence (which happens to be the order of interestingness).

The recent chain fork caused the group of orphans in the last grey shaded area just before the 900th orphan event. These blocks are well identified and were easily removed.

The next group, shortly before that, are identified by blockchain.info as orphan races at the following heights:

222562 222563 222564 222565 222566 222567 222568 222569 222570 222571 and 222572

After some sleuthing I found that these blocks were solved by Deepbit - but not as a fork, and, weirdly, some of them days after the original. Check it out:

Deepbit orphans recorded by blockchain.info:

Block height | Link to Deepbit stats page and orphan hash | Date/Time on page
222562 | https://deepbit.net/stats/1361577600 | 22.02 20:42:46
222563 | https://deepbit.net/stats/1361664000 | 23.02 12:29:02
222564 | https://deepbit.net/stats/1361664000 | 23.02 17:05:04
222565 | https://deepbit.net/stats/1361750400 | 24.02 03:11:42
222566 | https://deepbit.net/stats/1361750400 | 24.02 13:39:40
222567 | https://deepbit.net/stats/1361836800 | 25.02 17:14:57
222568 | https://deepbit.net/stats/1361836800 | 25.02 19:37:25
222569 | https://deepbit.net/stats/1361836800 | 25.02 22:30:17
222571 | https://deepbit.net/stats/1361923200 | 26.02 07:21:11
222572 | https://deepbit.net/stats/1361923200 | 26.02 07:50:53

There were a couple more on the last page not listed by blockchain.info:

Link to Deepbit stats page and orphan hash | Date/Time on page
https://deepbit.net/stats/1361923200 | 26.02 13:04:32
https://deepbit.net/stats/1361923200 | 26.02 21:04:09

I didn't look further than that. So that's 12 orphaned blocks in four days, sequential block heights that are not part of a fork, and orphan 'races' taking place days after the network already accepted the blocks against which the orphans 'raced'. Also, some of the races were against their own blocks:

Deepbit main chain blocks:

Block height | Link to Deepbit stats page and orphan hash | Date/Time on page
222563 | https://deepbit.net/stats/1361577600 | 22.02 17:11:23
222564 | https://deepbit.net/stats/1361577600 | 22.02 17:23:29

So something strange happened there, and I have no idea what. However, it's clear these orphaned blocks were anomalous: the timestamps reveal they could never have been accepted by the network, and so they need to be removed from the dataset.

Next, we come to the BIP-16/17 orphans, the first shaded group just prior to index 300. Many orphans were created while block solvers voted on which version of "Pay to script hash" (P2SH) they preferred. Once a majority of clients showed a preference for one over the other, transactions that were not accepted by the network caused orphaned blocks, as outlined in this bitcoin.stackexchange.com answer.

Since I'm not a developer and I don't know how to determine which of the blocks were orphaned because the clients had not updated to BIP-16 transactions and which were due to orphan races, I simply removed all orphaned blocks in the shaded area.

But it doesn't end there. Transaction 4005d6be caused 94 orphaned blocks between April 1st 2012 (the first day of BIP 16 majority) and October 19th 2012. I realised that there could be other transactions that also caused orphan blocks in this time for the same reason. If a transaction occurred in one orphaned block and again in another, the probability is high that the transaction caused the orphan. When I searched all the orphaned blocks for repeated transactions, I found that most orphans occurred between the same block heights as transaction 4005d6be, except for a few that were due to sequential orphan races and forks, such as the duplicated transactions in 258610 and 258608, and the fork starting at 250532 and finishing at 250533.
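The search for repeated transactions can be sketched like this, assuming the orphaned blocks have already been fetched into a mapping from block hash to transaction IDs. The function name and the sample identifiers are hypothetical, and this is not necessarily how the search for this post was carried out.

```python
from collections import defaultdict


def repeated_transactions(orphan_txs):
    """Map each txid found in more than one orphaned block to those blocks.

    orphan_txs: dict of orphan block hash -> iterable of transaction IDs.
    """
    blocks_by_tx = defaultdict(set)
    for block_hash, txids in orphan_txs.items():
        for txid in txids:
            blocks_by_tx[txid].add(block_hash)
    return {
        txid: sorted(blocks)
        for txid, blocks in blocks_by_tx.items()
        if len(blocks) > 1
    }


# Hypothetical data: "tx_b" appears in two different orphaned blocks.
sample = {
    "orphan_1": ["tx_a", "tx_b"],
    "orphan_2": ["tx_b", "tx_c"],
    "orphan_3": ["tx_d"],
}
print(repeated_transactions(sample))
```

A repeated transaction only flags a candidate; sequential orphan races and forks also produce duplicates, so each hit still needs a look at the block heights involved.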

So I removed all these orphans except for the forked orphans. However forked orphans also need to be adjusted - only the original valid orphan is useful, the remaining members of the fork could never be accepted by the main chain (unless the fork became the main chain) and so are invalid and were removed.
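One way to separate the original orphan of a fork from the later members is to look at parent hashes: an orphan whose parent is itself an orphan can only be a later member of a fork. This is a sketch under that assumption; the record layout (`hash`, `prev_hash`) and the function name are hypothetical.

```python
def split_fork_members(orphans):
    """Split orphans into (fork roots and lone orphans, later fork members).

    orphans: list of dicts with 'hash' and 'prev_hash' keys.
    """
    orphan_hashes = {o["hash"] for o in orphans}
    roots, later_members = [], []
    for o in orphans:
        # A parent that is itself an orphan means this block extended a fork.
        if o["prev_hash"] in orphan_hashes:
            later_members.append(o)
        else:
            roots.append(o)
    return roots, later_members


# Hypothetical two-block fork ("o1" then "o2") plus an unrelated orphan "o3".
sample = [
    {"hash": "o1", "prev_hash": "main_1"},
    {"hash": "o2", "prev_hash": "o1"},
    {"hash": "o3", "prev_hash": "main_7"},
]
roots, later = split_fork_members(sample)
print([o["hash"] for o in roots], [o["hash"] for o in later])
```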

There are still some invalids that can be found using the timestamp rule. However, this is a bit tricky since only the early timestamps are easily findable - the late timestamps have to be compared to the network adjusted time, which I don't know.

Instead, I compared the time difference between the orphaned blocks and the associated main chain block.

The orphans before block height 160 000 and after block height 220 000 are mostly within +/- 300 seconds of the main chain block to which they lost the orphan race. I can imagine that there would be some periods when the time between blocks is hours and there is a possibility of an orphan race occurring in that time; however, on average I consider it unlikely (although I could be wrong). I'm fairly certain that time differences over 1000 seconds are due to invalid blocks. In the end I decided on an arbitrary cut-off of +/- 600 seconds time difference as defining an invalid orphaned block (indicated on the chart above by the shaded area) and hoped that this arbitrary cut-off didn't affect the analysis too much.
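The cut-off itself is simple to apply once each orphan is paired with the main chain block at the same height. A minimal sketch, assuming timestamps in seconds and treating a difference of exactly 600 seconds as still valid (that boundary choice, and the names used, are my assumptions):

```python
CUTOFF_S = 600  # the arbitrary cut-off described above


def split_by_time_difference(pairs, cutoff_s=CUTOFF_S):
    """Split heights into (kept, dropped) by orphan/main-chain time difference.

    pairs: iterable of (height, orphan_timestamp, main_chain_timestamp).
    """
    kept, dropped = [], []
    for height, orphan_time, main_time in pairs:
        if abs(orphan_time - main_time) <= cutoff_s:
            kept.append(height)
        else:
            dropped.append(height)
    return kept, dropped


# Hypothetical rows: differences of 120 s, 600 s and 86 400 s.
rows = [(1, 1000, 1120), (2, 5000, 4400), (3, 90000, 3600)]
print(split_by_time_difference(rows))
```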

3. The clean results
The assumed invalid orphans total 212. Removing them from the data:

4. Summary of part 1

I was really hoping for a clear inverse relationship between the orphan occurrence rate and the block solve time, but looking at the last chart that doesn't seem to be the case. Where to from here?

Continue with the plan - find a function describing the expected number of orphans per block as a function of the block solve rate and block size.
If that leads nowhere, I'll try again using only the orphans prior to the BIP-16 orphans and those after the version 0.7 / version 0.8 fork, since there appear to be very few invalid orphaned blocks in these groups.

So I'll see what I can come up with and post the results as soon as I finish.

Thanks to blockchain.info for supplying the data on which this post was based.

organofcorti.blogspot.com is a reader supported blog:

12QxPHEuxDrs7mCyGSx1iVSozTwtquDB3r

Find a typo or spelling error? Email me with the details at organofcorti@organofcorti.org and if you're the first to email me I'll pay you per error:

I'm terrible at proofreading, so some of these posts may be worth quite a bit to the keen reader.

2 comments:

Anonymous, 24 October 2013 at 00:21
Good detective work! I wish I could find orphan blocks between 2009 and 2010. S.D.L.
organofcorti, 27 October 2013 at 19:01
Thanks! I noticed your post on bitcointalk asking if anyone has them. Why those years in particular? I think that given the few miners in 2009 and the early part of 2010, plus the fork and also the crazy hashrate increase just after MTGOX opened that you'd have a lot of unusual orphans?

For me, I'd like all the orphaned blocks of 2011. There were no forks that year that I can remember, and there was a continuous increase in hashrate for half the year, followed by a slow decrease. It would be perfect for an analysis of valid orphans vs hashrate.


Tuesday, 22 October 2013
