At Carveth’s Marina on Stoney Lake in Ontario there’s a sign where they keep track of the day when the ice is completely out of the lake. (Being in central Ontario, the lake freezes over completely in the winter.) The data is also available on their web site. I got curious about it and wondered if I could see any patterns.
I set up an Org file where I would use R. First, the raw data as a table:
Then I have an R source block that sets up the R session I’m going to use (I name all R sessions I use from Org as R:something):
The next block loads the raw data, forces the dates to be known as dates instead of just text, adds a new column for just the year, adds a num_days column that is the number of days since the start of the year (I don’t want to work with dates like “19 April,” which are clumsy, and leap years throw things off), adds a column for the decade the year is in, and then drops everything from before 1960.
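The post does this step in R. As a rough illustration only, here is a minimal Python sketch of the same preparation. It assumes the dates arrive as ISO strings like `1965-04-19` and that num_days counts 1 January as day 1, the usual day-of-year convention (R’s `lubridate::yday()` counts the same way). The function name `prepare` is hypothetical.

```python
from datetime import date

def prepare(raw_dates, first_year=1960):
    """Turn ISO date strings into rows with date, year, num_days and decade.

    Rows from before first_year are dropped, as in the post.
    """
    rows = []
    for text in raw_dates:
        d = date.fromisoformat(text)          # text -> a real date
        if d.year < first_year:               # drop everything before 1960
            continue
        rows.append({
            "date": d,
            "year": d.year,
            "num_days": d.timetuple().tm_yday,  # day of year, 1 Jan = 1
            "decade": d.year // 10 * 10,        # e.g. 1965 -> 1960
        })
    return rows
```

For example, `prepare(["1965-04-19"])` gives one row with year 1965, num_days 109 and decade 1960. In a leap year such as 1964 the same calendar date is day 110, so this convention still carries a one-day leap-year shift.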
Flipping over to the R:ice session, I can check that the ice_out data frame looks how I want:
Next, a chart showing, for each year, how many days into the year the ice went out. I add a best-fit line from a linear model made with lm (here’s a nice full explanation).
Visually there’s a definite downward trend there: the ice is going out earlier. I assume this is caused by climate change. Statistically, is there anything really going on here?
In the R session we can find out more about that linear regression by setting it up and then asking it to explain itself.
It’s saying the trend is statistically significant (the Pr values are small), but the R-squared measures (the coefficients of determination) are very low: only about 10% of the variation in num_days is explained by the year.
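For a single predictor, lm picks the slope and intercept that minimize the sum of squared vertical distances from the points to the line. R-squared is then one minus the ratio of the residual sum of squares to the total sum of squares: the share of the variance in num_days that the line accounts for. R’s `summary()` of an `lm` fit reports it as “Multiple R-squared” next to an “Adjusted R-squared” that penalizes for the number of predictors. A minimal Python sketch of both calculations for the one-predictor case; the function name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept.

    Returns (slope, intercept, r_squared, adjusted_r_squared).
    """
    n = len(xs)
    if n < 3 or n != len(ys):
        raise ValueError("need at least three (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual (unexplained) and total sums of squares.
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1 - ss_res / ss_tot
    # Adjusted R-squared with one predictor: 1 - (1 - R²)(n - 1)/(n - 2).
    adjusted = 1 - (1 - r_squared) * (n - 1) / (n - 2)
    return slope, intercept, r_squared, adjusted
```

Passing in the years and num_days values from 1960 onward should reproduce the coefficients and R-squared values in the model summary, up to rounding.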
The model is saying the line shown is y = -0.18851*x + 481.72452, where x is the year and y is num_days. Over 10 years that means the line drops by 1.8851 days, and -1.8851 is pretty close to -2.0, so I think of that as saying “every decade the ice goes out, more or less, almost two days earlier.” (The 481 is the intercept on the y axis, the value the line would have in year 0, and it’s large because we’re working with contemporary years like 2015; if you subtract 1960 from the years the intercept gets much smaller but the slope of the line stays the same.)
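The per-decade figure and the remark about the intercept are plain arithmetic on the two coefficients reported above. This small Python sketch checks them; the helper names are hypothetical.

```python
# Coefficients as reported by the model summary in the post.
SLOPE = -0.18851        # change in num_days per year
INTERCEPT = 481.72452   # where the line would sit in year 0

def predicted_num_days(year, slope=SLOPE, intercept=INTERCEPT):
    """Value of the fitted line y = slope * x + intercept at a given year."""
    return slope * year + intercept

# Ten years' worth of slope: the per-decade change.
per_decade = SLOPE * 10

# Measuring time as years since 1960 keeps the same slope; the new
# intercept is simply the line's value in 1960.
intercept_since_1960 = predicted_num_days(1960)

def predicted_num_days_since_1960(year):
    """The same line, written in terms of years since 1960."""
    return SLOPE * (year - 1960) + intercept_since_1960

print(f"per decade: {per_decade:.4f}")
print(f"1960: {predicted_num_days(1960):.2f}, 2015: {predicted_num_days(2015):.2f}")
```

So the line puts ice-out around day 112 in 1960 (about 22 April in a non-leap year) and around day 102 in 2015 (about 12 April), roughly ten days earlier over 55 years.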
The data points do not sit close to the line, as we can see. From one year to the next the ice could go out 25 days earlier or 25 days later. As to the variability, here’s the standard deviation of the number of days to ice out in each decade:
Seems like this decade the variation in when the ice goes out is greater. That fits with the idea of climate change bringing greater variability in weather, but of course this is just a guess.
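A minimal Python sketch of the per-decade standard deviation, assuming the data is available as (year, num_days) pairs; the function name is hypothetical. Python’s `statistics.stdev` divides by n - 1, as R’s `sd()` does, so the two should agree on the same numbers.

```python
from collections import defaultdict
from statistics import stdev

def sd_by_decade(rows):
    """Sample standard deviation of num_days for each decade.

    rows: iterable of (year, num_days) pairs. Decades with fewer than two
    years are skipped, because a sample standard deviation needs n >= 2.
    """
    groups = defaultdict(list)
    for year, num_days in rows:
        groups[year // 10 * 10].append(num_days)
    return {
        decade: stdev(values)
        for decade, values in sorted(groups.items())
        if len(values) >= 2
    }
```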
Back in Org, here’s a histogram of num_days:
That got me wondering how the distribution changed over the decades.
And then I realized that finally I had a chance to try out the ggridges library I’d heard of, because it can do just what I did, and much more, and make it look much nicer.
It certainly looks like things are creeping leftward: there are more years over 120 days at the top than at the bottom, and look how many more years come in under 80 days recently.
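The impression from the ridgeline chart can be put into numbers by counting, for each decade, the share of years below 80 days and above 120 days. A minimal Python sketch, again assuming (year, num_days) pairs; the thresholds come from the text above and the function name is hypothetical.

```python
from collections import defaultdict

def tail_shares(rows, low=80, high=120):
    """For each decade, return (share of years with num_days < low,
    share of years with num_days > high)."""
    groups = defaultdict(list)
    for year, num_days in rows:
        groups[year // 10 * 10].append(num_days)
    shares = {}
    for decade, values in sorted(groups.items()):
        n = len(values)
        shares[decade] = (
            sum(v < low for v in values) / n,
            sum(v > high for v in values) / n,
        )
    return shares
```

A rising first number and a falling second number from decade to decade would match the leftward creep seen in the chart.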
Now, bear in mind I’m not a climate scientist and I’m not a statistician, and all I had was a range of dates and I made a linear model and a histogram. There are many factors determining when the ice goes out: one must be the daily temperatures, and historical data on that is available from Environment Canada, but I’m not going to get into that. I have no information about when the lake froze in the first place (anecdotally, it’s later), or how thick the ice is (anecdotally, thinner; I don’t think people drive pickup trucks full of lumber over the ice any more).
Nevertheless, it seems reasonable to say that on average, more or less, since the 1960s the ice is going out almost two days earlier every decade.