Tuesday, January 16, 2007

How I Spent My MLK Day

I used most of yesterday in an ill-begotten attempt to relearn some of the grad school statistics I had studied in a half-assed manner 16 years ago. Once again I was mucking about in the NOAA historical data on hurricanes, trying to figure out how valid or invalid the disaster models I wrote about before really were. I came up with some stuff, but I don't know enough about the stats to be all that confident about my results. While looking for similar work I came upon a post at Climate Audit that was looking at the same data. I posted what I found over there, hoping to get some response that might back me up or point out what basic tenet of statistical analysis I had screwed up. So far I've been ignored...which is fine...I'm used to THAT. In any event, this is what I posted:

I fret about commenting here (on the site) because I am NOT terribly conversant in a lot of the statistical methods (I went to grad school in political philosophy for crying out loud), but I have been looking at this same data for a while now. I’ve reached some conclusions, based upon my admittedly limited knowledge, but I’d like to know what folks think I’ve done right or wrong, and where the weaknesses of my understanding lie.

OK, so we are looking at tropical storm/hurricane formation in the Atlantic basin in an historical perspective. Based upon the caveats included in the NOAA historical data set, I decided to limit my investigation to the 1946-2005 time period. (It also makes for an even 60 year period.)

So I’ve loaded up my data onto a spreadsheet and found the following basic statistics:

# of Tropical Storms 634 (avg. 10.5667/year)
# of Hurricanes 369 (avg. 6.15/year)

Next, I get the standard deviation for the T. Storm numbers and get back a value of 3.9504 (and for just hurricanes a value of 2.5765).

Even I know these are very large numbers, so I look around for reasons (thank you Google) and I hit upon the possibility I could be looking at a lognormal distribution. Is this reasonable? I think it is, but I’m open to being convinced otherwise.

In any event I get log10 values for the data, and get
mean: 0.9971
standev: 0.1533
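
For anyone who wants to repeat that step outside a spreadsheet, here is a minimal sketch of how the log10 mean and standard deviation might be computed. The function name `log10_summary` is made up for illustration, the toy list is NOT the NOAA data, and the sketch assumes the sample standard deviation (dividing by n-1); the post does not say which spreadsheet formula was used.

```python
import math
import statistics


def log10_summary(counts):
    """Return (mean, sample standard deviation) of log10 of yearly counts.

    Hypothetical helper for illustration only.
    """
    # log10 is undefined for zero, so a year with no storms needs special handling.
    if any(c <= 0 for c in counts):
        raise ValueError("log10 is undefined for years with zero storms")
    logs = [math.log10(c) for c in counts]
    # statistics.stdev is the sample standard deviation (n - 1 denominator).
    return statistics.mean(logs), statistics.stdev(logs)


# Toy input, chosen only so the arithmetic is easy to check by hand.
toy_counts = [1, 10, 100]
mean_log, sd_log = log10_summary(toy_counts)
print(mean_log, sd_log)
```

Feeding it the 60 yearly storm counts in place of the toy list would be the equivalent of the spreadsheet step above.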

Extrapolating the numbers back out (i.e. 10^1.1504) I get an upper value of 14.1372 storms, with a mean (i.e. 10^0.9971) of 9.9334 storms. This makes the lower boundary 5.7296. OK, so I look at how much of the original data falls between 6 and 14 inclusive. I find 51 out of 60 (85%) observations fall in that range. So once again that looks pretty good, or at least I’m not seeing anything that argues against this being a lognormal distribution. I think.
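
One thing worth checking in that step is how the lower boundary is formed. The sketch below uses only the log10 mean and standard deviation quoted in the comment and compares two ways of getting a lower bound: mirroring the upper bound around the center in storm counts, which reproduces a value near 5.73, and subtracting one standard deviation in log space before converting back, which comes out near 7. The variable names are mine. If the data really are lognormal, the one-standard-deviation band is symmetric in log space, not in raw storm counts, so the second version is the one that matches that assumption.

```python
# Figures quoted in the comment above (log10 scale).
mean_log = 0.9971
sd_log = 0.1533

# Convert the center and the upper bound back to storm counts.
center = 10 ** mean_log
upper = 10 ** (mean_log + sd_log)

# Lower bound taken one standard deviation below the mean in log space.
lower_log_space = 10 ** (mean_log - sd_log)

# Lower bound obtained by mirroring the upper bound around the center
# in raw storm counts.
lower_mirrored = center - (upper - center)

print(f"center:                  {center:.2f}")
print(f"upper:                   {upper:.2f}")
print(f"lower (log space):       {lower_log_space:.2f}")
print(f"lower (mirrored counts): {lower_mirrored:.2f}")
```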

Anyway, I was doing all of this because of the news story about RMS downplaying most of the historical data and stressing the last 12 years. The argument seems to be that the last 12 years were so unusual, in terms of tropical storm activity, that it must be indicative of new rules to the game.

So I looked at the 12 years 1994-2005. Using the log10 numbers I get a mean of 1.1354 for that time period, which seems to be within ONE standard deviation of the 1946-2005 values (i.e., 0.9971 + 0.1533 = 1.1504).

That is about as far as my limited knowledge can take me. Where have I screwed up? What don't I know?


On the chance that someone with a statistical background reads this here, I pose the same questions. According to my work, the claim that the last 12 years of hurricane activity is significantly different from what came before is false. In fact I found that the activity fell within a standard deviation of the mean. I also ran the numbers using the data from 1946-1993 as a baseline and comparing the 1994-2005 years to that (i.e. a log10 mean of 0.9625, with a standard deviation of 0.1326, for the 1946-1993 data vs. a log10 mean of 1.1354 for the 1994-2005 period), and found the 12-year period was still only about 1.3 standard deviations away from the older numbers.
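
For anyone who wants to check the arithmetic, here is a minimal sketch of both comparisons, using only the log10 figures quoted above. The helper name `sd_distance` is made up for illustration. It only reproduces the comparison as I did it; whether the standard deviation of single years is the right yardstick for a 12-year average is a separate question that this sketch does not settle.

```python
def sd_distance(value, mean, sd):
    """Distance of value from mean, measured in standard deviations."""
    return (value - mean) / sd


# log10 mean of yearly tropical storm counts for 1994-2005 (from the post).
recent_mean = 1.1354

# Against the full 1946-2005 record: mean 0.9971, standard deviation 0.1533.
z_full = sd_distance(recent_mean, 0.9971, 0.1533)

# Against the 1946-1993 baseline: mean 0.9625, standard deviation 0.1326.
z_baseline = sd_distance(recent_mean, 0.9625, 0.1326)

print(f"vs. 1946-2005: {z_full:.2f} standard deviations")
print(f"vs. 1946-1993: {z_baseline:.2f} standard deviations")
```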

It seems clear to me that the claim of the folks at RMS ("Increases in hurricane frequency should be expected along the entire U.S. coast, but will be highest in the Gulf, Florida, and the Southeast, while lower in the Mid-Atlantic and the Northeast.") is in no way borne out by the data.

But I'll admit it would seem even clearer if I could know that I haven't made some sort of basic blunder.
