Monday, July 20, 2009

K-S Test Results

Monday:
Kolmogorov-Smirnov Test
The Kolmogorov-Smirnov test measures how similar two data sets are by finding the largest distance between their cumulative distribution functions. The IDL function, kstwo, takes two data sets as input and outputs the K-S statistic "D" and the corresponding "prob". If prob is small, the two sets are likely not drawn from the same distribution. I ran this test on my data and a couple of sets of random data, taking the mean and standard deviation over numerous trials for comparison. I got the kinds of results I was expecting between the random sets, but got two differing results on the GOODS stars depending on whether I included my whole star catalog or limited it to the 27th magnitude.
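For anyone without IDL, here is a rough Python sketch of what a two-sample K-S routine like kstwo computes. It is not the IDL source: the names ks_two_sample, ks_prob and trial_stats are made up for illustration, the significance formula is the standard asymptotic approximation (which I'm assuming is close to what kstwo uses), and the choice of sample standard deviation is also an assumption.

```python
import bisect
import math
import statistics


def ks_prob(lam):
    """Asymptotic K-S significance: 2 * sum_j (-1)^(j-1) * exp(-2 j^2 lam^2)."""
    if lam < 0.2:
        # The series converges slowly here, and the value is 1 to high precision.
        return 1.0
    total = 0.0
    for j in range(1, 101):
        total += 2.0 * (-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
    return min(max(total, 0.0), 1.0)


def ks_two_sample(a, b):
    """Return (D, prob) for two samples (hypothetical stand-in for kstwo)."""
    a, b = sorted(a), sorted(b)
    n1, n2 = len(a), len(b)
    d = 0.0
    # Compare the two empirical CDFs at every observed value.
    for x in a + b:
        f1 = bisect.bisect_right(a, x) / n1
        f2 = bisect.bisect_right(b, x) / n2
        d = max(d, abs(f1 - f2))
    n_eff = n1 * n2 / (n1 + n2)
    # Small-sample correction to the asymptotic formula (assumed, see lead-in).
    lam = (math.sqrt(n_eff) + 0.12 + 0.11 / math.sqrt(n_eff)) * d
    return d, ks_prob(lam)


def trial_stats(pairs):
    """Mean and sample stdev of D and prob over several (a, b) trials."""
    results = [ks_two_sample(a, b) for a, b in pairs]
    ds = [d for d, _ in results]
    ps = [p for _, p in results]
    return (statistics.mean(ds), statistics.stdev(ds),
            statistics.mean(ps), statistics.stdev(ps))
```

Calling trial_stats on a list of (data set, random set) pairs gives the four numbers reported for each comparison: d-mean, d-stdev, prob-mean and prob-stdev.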

Results

GOODS-N to 27th mag vs. Random 1 (Normalized*, 1 set vs 9 sets)
d-mean: 0.19797699
d-stdev: 0.04482917
prob-mean: 0.69592268
prob-stdev: 0.2092225

*The first time I ran it, I hadn't yet normalized the sets, so they had varying total populations, which resulted in even higher values of d and lower probabilities.

GOODS-N to 28th mag vs. Random 1 (Normalized, 1 set vs 9 sets)
d-mean: 0.089285724
d-stdev: 0.055698492
prob-mean: 0.99998375
prob-stdev: 0.18113585

GOODS-S to 27th mag vs. Random 1 (Normalized, 1 set vs 9 sets)
d-mean: 0.16683391
d-stdev: 0.027475102
prob-mean: 0.84035881
prob-stdev: 0.12022707

GOODS-S to 28th mag vs. Random 1 (Normalized, 1 set vs 9 sets)
d-mean: 0.12184878
d-stdev: 0.042824080
prob-mean: 0.99533697
prob-stdev: 0.19727988

Random 2 vs. Random 3 (1 set vs 9 sets)
d-mean: 0.14321605
d-stdev: 0.055629589
prob-mean: 0.92319757
prob-stdev: 0.20686308

Random 2 vs. Random 3 (9 sets vs 9 sets)
d-mean: 0.12757371
d-stdev: 0.032610029
prob-mean: 0.98843256
prob-stdev: 0.070053501

Random 4 vs. Random 5 (100 sets vs 100 sets)
d-mean: 0.1610636
d-stdev: 0.046950535
prob-mean: 0.8827874
prob-stdev: 0.18292440

Conclusions

I'm more comfortable going with the statistics done on the GOODS data to the 27th magnitude, since in my earlier work, eliminating the dimmest data points gave me a more stellar sample. With that cut, both fields look less like the random sets. I think the South field is still well within range to call "close to random", with a mean prob of about 84%, given the averages and standard deviations for the sets that are known to be random. I can't say the same quite as confidently for the North field, at almost 70%, but it lies at the edge of what I'd call random.
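One way to put a number on "within range" is to express each GOODS prob-mean as an offset from a random-vs-random baseline, in units of that baseline's standard deviation. The sketch below uses the Random 4 vs. Random 5 run as the baseline. That choice is an assumption made for illustration, as is the helper name sigma_offset; the input values are copied from the results listed above.

```python
# Baseline: Random 4 vs. Random 5 (100 sets vs 100 sets), an assumed choice.
baseline_mean = 0.8827874
baseline_stdev = 0.18292440

# prob-mean values for the two GOODS fields cut at the 27th magnitude.
goods_prob_mean = {
    "GOODS-N to 27th mag": 0.69592268,
    "GOODS-S to 27th mag": 0.84035881,
}


def sigma_offset(value, mean, stdev):
    """Offset of value from mean, in units of stdev (hypothetical helper)."""
    return (value - mean) / stdev


offsets = {
    name: sigma_offset(p, baseline_mean, baseline_stdev)
    for name, p in goods_prob_mean.items()
}
for name, z in offsets.items():
    print(f"{name}: {z:+.2f} sigma")
# GOODS-N to 27th mag: -1.02 sigma
# GOODS-S to 27th mag: -0.23 sigma
```

With this baseline the South field sits well inside one standard deviation of the random mean, while the North field sits right at about one standard deviation below it.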
