Thursday, 11 July 2013

11.4 MTGOX volume post Dwolla: A single statistical test


Note: I updated this analysis on the weekend, but only found time to finish the post just now. This means some of the time periods I refer to, such as "fifty-two days since May 17th", are relative to the date of the analysis, not the date of posting.

0. Introduction
In the last post I presented some comparisons between the MTGOX US$ BTC volume and that of other exchanges, which suggested that MTGOX's volume for US$ BTC trades had declined relative to the others. Redditor Liorithiel commented "Not a single statistical test was done". This was a fair comment: in that post I'd avoided figuring out a way to do one. I derived an answer not long after, but got sidetracked by some of the other fascinating things I found out about MTGOX's trade volume along the way, which I'll discuss in later posts.

1. Hypothesis
Let's start with -
Null hypothesis: MTGOX US$ / BTC volume has not reduced significantly since mid-May
Alternative hypothesis: MTGOX US$ / BTC volume has reduced significantly since mid-May

2. Volume
So how should volume be defined? Volume per day, per week or per n trades? Fiat volume or BTC? How can the trend in any of these measures be accounted for? For example, in the first 5000 trades 1726703 BTC was traded and in the last 5000 trades 18924.74 BTC was traded. In terms of US$, in the first 5000 trades US$177546.5 was traded for BTC and in the last 5000 trades US$1689397 was traded.

What is the volume being compared to in order to decide whether it has reduced or not? To be as comprehensive as possible, it should be compared to all exchange volumes for the entire history of BTC exchange, or at least to other US$ BTC exchanges - something I did in a non-analytical way last time. I got distracted before I managed to do that, and instead only compared the volume to the history of MTGOX US$ BTC volumes.

To provide an idea of the vast differences in scale and variance due to varying definitions of volume:





The variance in BTC or US$ per 50000 trades is much lower than that of their time-based counterparts, and the weekly US$ volume has the greatest variance, ranging over more than six orders of magnitude. Another advantage of volume per n trades is that the closure of the exchange has no effect on the data. So, which is the most useful volume measure?

A significant problem these charts illustrate is that none of the datapoints are independent - all of them (bar the first) depend on previous datapoints in some way, and show trends that suggest the influence of outside variables. However, most of the usual kinds of statistical analysis require independent, identically distributed (iid) random variables. One way to increase the independence is to use the percentage change in volume rather than the actual volume:





Note: The week before and the week after the closure of the exchange have been removed from the weekly percentage of volume change charts.

The difference in variance is quite striking, but all four volume measures seem equally random. In the end I decided to use BTC volume per unit time - it has a lower variance than time-based US$ volume, and is more generally useful than trade-based volume, since the number of trades per unit time varies so much.

Since it's a "before and after" change in volume that I want, I need variables defined over the amount of time since mid-May, compared with the same amount of time before that. Since it has been about fifty-two days since May 17th, I used fifty-two days as the time period.

In summary: I've defined the variable I'm analysing as the fifty-two day percentage change in BTC volume.

The hypotheses become -
Null hypothesis: MTGOX US$ / BTC fifty-two day BTC percentage change in volume has not reduced significantly since mid-May
Alternative hypothesis: MTGOX US$ / BTC fifty-two day BTC percentage change in volume has reduced significantly since mid-May

3. Modelling the difference of the log change in fifty-two day volume
Percentage change in volume is not easy to work with, so instead I have used a log transformation: the difference between the log of the volume in one period and the log of the volume in the previous period (the "diff-log" change in volume). This should result in a symmetric distribution with a mean close to zero.
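To illustrate why the log difference is easier to work with than the percentage change, here is a minimal sketch. The volumes are made up for the example (they are not MTGOX data), and the function names are mine. A doubling followed by a halving gives +100% and -50% as percentage changes, but equal and opposite values as log differences.

```python
import math

def pct_change(before, after):
    """Percentage change from `before` to `after`."""
    return 100.0 * (after - before) / before

def diff_log(before, after):
    """Difference of the natural logs: ln(after) - ln(before)."""
    return math.log(after) - math.log(before)

# Hypothetical period volumes in BTC: a doubling, then a halving.
volumes = [1000.0, 2000.0, 1000.0]

for before, after in zip(volumes, volumes[1:]):
    print(f"{before:.0f} -> {after:.0f}: "
          f"{pct_change(before, after):+.1f}%  "
          f"diff-log {diff_log(before, after):+.4f}")
```

The two percentage changes do not cancel, while the two log differences sum to zero, which is the symmetry referred to above.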

A second change is that rather than use only the diff-log changes for consecutive fifty-two day periods counted from day one, I created a dataset from several staggered starting points: one series starting at day one, one at day thirteen, one at day twenty-six, and one at day thirty-nine.
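The staggered dataset could be built along these lines. This is a sketch under my own assumptions, not the code used for the analysis: `daily` is a hypothetical list of daily BTC volumes with the closure period already removed, every fifty-two day block is assumed to have a positive volume, and the starting days one, thirteen, twenty-six and thirty-nine are taken literally as zero-based indices 0, 12, 25 and 38.

```python
import math

def period_volumes(daily, period=52, offset=0):
    """Sum daily volumes into consecutive, non-overlapping `period`-day
    blocks starting at index `offset`. An incomplete final block is dropped."""
    blocks = []
    for start in range(offset, len(daily) - period + 1, period):
        blocks.append(sum(daily[start:start + period]))
    return blocks

def diff_logs(blocks):
    """Log difference between each block and the block before it."""
    return [math.log(b) - math.log(a) for a, b in zip(blocks, blocks[1:])]

def pooled_diff_logs(daily, period=52, offsets=(0, 12, 25, 38)):
    """Pool the diff-log values from several staggered starting days.
    The offsets are zero-based, so 0 means 'day one' (an assumption)."""
    pooled = []
    for offset in offsets:
        pooled.extend(diff_logs(period_volumes(daily, period, offset)))
    return pooled
```

Note that the pooled values overlap in time, so they are not independent of each other; staggering gives more datapoints for fitting a distribution rather than more independent observations.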

Using this dataset I looked for distributions that could model the diff-log variables. The two best fits were the normal distribution and the logistic distribution, with the logistic model a much better fit than the normal model. I investigated further and found that while very good logistic models existed for the diff-log daily and weekly volumes, and for volume per 2000 trades up to volume per 100000 trades, very few normally distributed models existed, and all had a poorer fit than a logistic model. I think perhaps that the normal fit for the diff-log fifty-two day volume is the central limit theorem in action.
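One simple way to compare a normal and a logistic fit is by log-likelihood. The sketch below is only an illustration and is not the fitting procedure used for this post: it matches both models to the sample mean and standard deviation (method of moments) rather than doing a full maximum-likelihood fit, and all function names are my own.

```python
import math
import statistics

def normal_loglik(data, mu, sigma):
    """Total log-likelihood of `data` under Normal(mu, sigma)."""
    return sum(-0.5 * math.log(2 * math.pi * sigma ** 2)
               - (x - mu) ** 2 / (2 * sigma ** 2) for x in data)

def logistic_loglik(data, mu, s):
    """Total log-likelihood of `data` under Logistic(mu, s)."""
    total = 0.0
    for x in data:
        z = abs(x - mu) / s  # the density is symmetric about mu
        total += -z - math.log(s) - 2 * math.log1p(math.exp(-z))
    return total

def compare_fits(data):
    """Moment-matched fits of both models.
    Returns (normal log-likelihood, logistic log-likelihood)."""
    mu = statistics.fmean(data)
    sigma = statistics.pstdev(data)
    s = sigma * math.sqrt(3) / math.pi  # logistic variance is (s*pi)^2 / 3
    return normal_loglik(data, mu, sigma), logistic_loglik(data, mu, s)
```

Both models have two parameters here, so the higher log-likelihood indicates the better fit without any penalty for model size.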




I did investigate other models, of course, for example the expGamma, generalised normal, and lambda distributions, but these were either not a good fit, had too many degrees of freedom, or did not explain anything about the data (I use the lambda distribution mainly to determine whether data might be logistic or normally distributed).

Modelling a dataset means confidence intervals can be obtained, rather than using empirical quantiles which - especially near the extremities - will be significantly affected by outliers. Model details are given below.




4. Hypothesis test
For the hypothesis test, a p-value is required to assess the significance of a result. It is the probability of a percentage change in volume being at least as extreme as the actual percentage change in volume, and I usually use p < 0.05 to indicate a significantly unlikely result.

For this analysis I'm using a slightly different threshold. Since I want to find out whether or not the change in volume over the last fifty-two days is lower than expected, I've defined "lower than expected" as "unlikely to have happened yet". Excluding the time in 2011 during which MTGOX wasn't operating, the exchange has been trading for 1081 days. In that time there have been 20.788 fifty-two day periods; something that is unlikely to have happened yet will have a probability of less than 1/20.788, so I'm using a p-value threshold of 0.0481.
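The arithmetic behind that threshold, as a short check (the variable names are mine):

```python
trading_days = 1081   # days MTGOX has been trading, per the post
period_days = 52      # length of one "before" or "after" window

periods = trading_days / period_days   # number of fifty-two day periods
threshold = 1 / periods                # "unlikely to have happened yet"

print(f"periods = {periods:.3f}, threshold = {threshold:.4f}")
# prints: periods = 20.788, threshold = 0.0481
```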

Final hypothesis - 
Null hypothesis: The probability of the MTGOX US$ / BTC fifty-two day BTC percentage change in volume reducing significantly since mid-May is greater than 0.0481.
Alternative hypothesis: The probability of the MTGOX US$ / BTC fifty-two day BTC percentage change in volume reducing significantly since mid-May is less than or equal to 0.0481.

The volume in the fifty-two days prior to midnight 17th May: 6799130 BTC
The volume in the fifty-two days post midnight 17th May: 1862479 BTC
Difference in log volume: -1.29488

Probability of this change in volume given the logistic distribution parameters defined above:

P = 1 / (1 + exp(- (x - mu) / s))
where 
x = -1.29488
mu = 0.06398846
s = 0.37301752

P = 0.02550801 < 0.0481

Since P < p, the null hypothesis can be rejected at a 95.19% confidence level, and the MTGOX USD/BTC trade volume denominated in BTC has dropped significantly.
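The calculation in this section can be reproduced with a few lines of Python. The volumes, mu, s and the 0.0481 threshold are the values quoted in this post; the function and variable names are mine.

```python
import math

def logistic_cdf(x, mu, s):
    """P(X <= x) for a logistic distribution with location mu and scale s."""
    return 1.0 / (1.0 + math.exp(-(x - mu) / s))

MU = 0.06398846
S = 0.37301752
THRESHOLD = 0.0481

volume_before = 6799130.0   # BTC, fifty-two days prior to 17th May
volume_after = 1862479.0    # BTC, fifty-two days post 17th May

x = math.log(volume_after) - math.log(volume_before)
P = logistic_cdf(x, MU, S)

print(f"diff-log = {x:.5f}, P = {P:.6f}, significant = {P < THRESHOLD}")
```

Replacing `volume_before` with a different prior volume reruns the same test against that baseline.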

5. A second hypothesis test
It's possible that the abnormally large trade volume during April may have skewed these results. I decided to rerun the test excluding the outliers. By iteratively applying Grubbs' test to randomised samples of the daily volumes, I selected the volumes for the 12th, 13th, 16th and 17th April as outliers and multiplied the average of the remaining trade days by fifty-two for a rescaled volume of 5662176 BTC.

The rescaled volume in the fifty-two days prior to midnight 17th May: 5662176 BTC
The volume in the fifty-two days post midnight 17th May: 1862479 BTC
Difference in log volume: -1.1119

Probability of this change in volume given the logistic distribution parameters defined above:

P = 1 / (1 + exp(- (x - mu) / s))
where 
x = -1.1119
mu = 0.06398846
s = 0.37301752

P = 0.04099808 < 0.0481

Once again P < p, so the null hypothesis can be rejected at a 95.19% confidence level, and the MTGOX USD/BTC trade volume denominated in BTC has dropped significantly.


6. Summary

  • The BTC denominated volume of MTGOX US$/BTC trades has reduced significantly.
  • If the highest four trading days in April are removed from the dataset and the remaining daily volumes rescaled, the BTC denominated volume of MTGOX US$/BTC trades has still reduced significantly.
  • The removal of the outliers had a significant effect on the results, and if I had chosen to remove more high volume days, the results would have been quite different. However, the outliers were during the fall and rebound of the price in April, and I think they were the only appropriate daily volumes to remove from the dataset.
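To put a rough number on how sensitive the result is to the choice of outliers, the logistic model can be inverted to find the prior fifty-two day volume at which P would equal the 0.0481 threshold. This is my own back-of-the-envelope sketch using the parameters quoted in section 4; it assumes the post-May volume stays fixed and the names are illustrative.

```python
import math

MU = 0.06398846
S = 0.37301752
THRESHOLD = 0.0481
VOLUME_AFTER = 1862479.0   # BTC, fifty-two days post 17th May

def logistic_quantile(p, mu, s):
    """Inverse of the logistic CDF: the x for which P(X <= x) = p."""
    return mu + s * math.log(p / (1.0 - p))

# Diff-log value that sits exactly on the significance threshold.
x_crit = logistic_quantile(THRESHOLD, MU, S)

# Prior volume that would produce x_crit given the observed post volume.
break_even_before = VOLUME_AFTER / math.exp(x_crit)

print(f"critical diff-log = {x_crit:.4f}")
print(f"break-even prior volume = {break_even_before:,.0f} BTC")
```

Under these assumptions the break-even prior volume comes out at roughly 5.3 million BTC, not far below the rescaled 5662176 BTC, which is why removing a few more high volume days would have changed the outcome.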




