I was attempting to explain to someone why the method I'm using to test whether the next day would close up or down was working, and realized that I didn't have a good explanation. After all, yesterday had 56% of cases being down, and today was down. The method has been working nearly 80% of the time (even more if you disregard small differences like the Dow being down 4 pts on a day that was expected to be up). Intuitively, you would think that if the typical odds showing the bias I am testing are only in the 50-60% range, then the method should only work 50-60% of the time, yet it does not. Here is my attempt at an explanation:
One explanation I can think of for why it's working so well is that these numbers shouldn't be thought of as "odds". Really, it's more like the preponderance of the evidence, since it's difficult to determine which historical analogy should be used. What the odds table I created does is eliminate the garbage: showing that 56% of cases were down for today, while the method predicts correctly over 80% of the time, effectively "throws out" the historical analogies that aren't any good and shows that the majority of the remaining analogies carry a negative bias. So this is really a "decision by majority" rather than odds.
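The "decision by majority" idea can be made concrete. If each historical analogy is treated as an independent voter that calls the right direction with some probability p, the chance that the majority of n voters is right can be well above p (this is the logic of the Condorcet jury theorem). A minimal sketch, where the 30 voters and the 56% per-analogy accuracy are just illustrative numbers, not measured values:

```python
from math import comb

def majority_accuracy(n: int, p: float) -> float:
    """Probability that a strict majority of n independent voters,
    each individually correct with probability p, is correct
    (an exact tie counts as no majority)."""
    return sum(comb(n, k) * (p ** k) * ((1 - p) ** (n - k))
               for k in range(n // 2 + 1, n + 1))

# 30 analogies, each only a weak 56% signal on its own
print(majority_accuracy(30, 0.56))
```

Under the independence assumption, the vote of 30 weak 56% signals is right noticeably more often than 56%. Real analogies overlap and are correlated, so this is an intuition for the amplification, not a prediction of the actual hit rate.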
I don't have any threshold for the closeness of the fit; I just take the top 30 for the 2-5 period clusters. So among those 30 I assume there are some really good fits and some that aren't so good. I think my way of sampling 30 across the 4 clusters and taking the majority opinion has the effect of weeding out the bad fits.
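That weeding-out intuition can be checked with a toy simulation. Suppose, purely as an assumption for illustration, that of the 30 sampled analogies only 10 are genuinely informative (right 65% of the time) and the other 20 are pure noise (right 50% of the time). The noise votes split roughly evenly, so the informative minority still tilts the majority:

```python
import random

def simulate_vote_accuracy(n_good=10, p_good=0.65,
                           n_noise=20, trials=20000, seed=1):
    """Fraction of trials in which a strict-majority vote of n_good
    informative voters (accuracy p_good) plus n_noise coin-flip
    voters calls the right direction. All parameters are assumed
    for illustration, not taken from the actual odds table."""
    rng = random.Random(seed)
    n_total = n_good + n_noise
    hits = 0
    for _ in range(trials):
        correct_votes = sum(rng.random() < p_good for _ in range(n_good))
        correct_votes += sum(rng.random() < 0.5 for _ in range(n_noise))
        if correct_votes > n_total / 2:  # strict majority is right
            hits += 1
    return hits / trials

print(simulate_vote_accuracy())
```

Even with two-thirds of the voters being pure noise under these assumed numbers, the majority lands noticeably above 50%: the noise roughly cancels itself while the good fits add up, which is one way the bad fits get "weeded out" without any explicit threshold.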
There has to be an explanation for why it works so well. It absolutely could not keep working at this rate without a statistical reason behind it.