Analyzing and visualizing weather data with R

Having moved from the West of Ireland to the Pacific Northwest, I was interested to observe that, while people in both regions regularly complain about the rain, the rainfall patterns appeared quite different. That initial observation led me to wonder how similar, or different, the patterns really are between the two regions. This blog describes the process of accessing historical weather data, processing the data frames with R, and visualizing the findings with R’s ggplot2 package. In doing so, I hope to reach a conclusion as to which ‘West Coast’ has more cause to complain about rainfall. The weather stations we’ll reference are located at Shannon and Vancouver international airports.


Sourcing and Accessing Data

Initial Google searches will likely lead to the R package ‘weatherData’. Despite its prominence in Google search results, this package no longer works because its API source, Weather Underground, moved to a paid model. A follow-up search may turn up Weather Underground’s own R package, but their free tier requires users to maintain a personal weather station in order to receive an API key.

I found the most appropriate solution was to pull data directly from the relevant national meteorological services. A number of R packages now wrap national weather archives, and Canadian data is accessible through the ‘weathercan’ package. The Irish Meteorological Service, Met Éireann, does not maintain an R package, but provides CSV data for download from met.ie.


Determining Metrics

While Irish records track all forms of precipitation under a single count, Canadian records differentiate between ‘total rain’ and ‘total precipitation’. The latter includes all forms of precipitation, while the former includes ‘all liquid precipitation’ but excludes snowfall. As snow is far more common in Vancouver than Shannon and the subject of this study is rainfall, it’s tempting to compare Shannon’s total precipitation to Vancouver’s total liquid precipitation. Ultimately though, it makes the most sense to stay consistent and compare identical metrics, which means using total precipitation for both stations.


Importing Data

For the CSV file from met.ie, it’s necessary to read from line 25 onwards, as the preceding lines contain the dataset’s glossary. For the Canadian data, it’s necessary to concatenate two datasets, as Vancouver International Airport’s weather station changed in June 2013.
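Both import steps can be sketched in base R. This is only an illustration: the file contents, the column names (`date`, `rain`, `total_precip`) and the values are made up, and the real met.ie export and Vancouver station downloads will look different. The download through weathercan is left out because it needs network access.

```r
# Build a stand-in for the met.ie download: 24 lines of glossary text,
# then the header row on line 25. All contents are illustrative.
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c(paste("glossary line", 1:24),
    "date,rain",
    "2013-06-10,0.0",
    "2013-06-11,4.2"),
  csv_path
)

# skip = 24 discards the glossary so that line 25 becomes the header.
shannon_raw <- read.csv(csv_path, skip = 24, stringsAsFactors = FALSE)

# Stand-ins for the two Vancouver station downloads (before and after
# the station change). Column names are assumptions.
vancouver_old <- data.frame(date = c("2013-06-10", "2013-06-11"),
                            total_precip = c(0.0, 1.6))
vancouver_new <- data.frame(date = c("2013-06-12", "2013-06-13"),
                            total_precip = c(3.0, 0.0))

# rbind() stacks the two frames; it requires matching column names.
vancouver_raw <- rbind(vancouver_old, vancouver_new)
```

If the two station downloads carry different columns, they would need to be reduced to a shared set before stacking.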


Standardizing the Data Frames

In order to efficiently work with these data frames, it’s necessary to format their date columns, restrict their date ranges, and standardize their column names.
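A minimal sketch of those three steps, assuming base R. The helper `standardize()`, the raw column names, the date formats and the date range are all hypothetical and chosen only to show the shape of the work.

```r
# Illustrative raw frames; column names and date formats are assumptions.
shannon_raw <- data.frame(
  date = c("30/12/1999", "31/12/1999", "01/01/2000", "02/01/2000"),
  rain = c(1.2, 0.0, 5.4, 0.0),
  stringsAsFactors = FALSE
)
vancouver_raw <- data.frame(
  date = c("1999-12-31", "2000-01-01", "2000-01-02", "2000-01-03"),
  total_precip = c(0.0, 2.0, NA, 7.5),
  stringsAsFactors = FALSE
)

# Hypothetical helper: parse the dates, rename the columns to a shared
# pair of names, and keep only rows inside the requested date range.
standardize <- function(df, date_col, precip_col, date_format, from, to) {
  out <- data.frame(
    date = as.Date(df[[date_col]], format = date_format),
    precip_mm = df[[precip_col]]
  )
  out[out$date >= from & out$date <= to, , drop = FALSE]
}

from <- as.Date("2000-01-01")
to <- as.Date("2000-01-02")

shannon <- standardize(shannon_raw, "date", "rain", "%d/%m/%Y", from, to)
vancouver <- standardize(vancouver_raw, "date", "total_precip",
                         "%Y-%m-%d", from, to)
```

With both frames sharing `date` and `precip_mm`, the same summary code can run against either station.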


Calculating the Highest Count of Consecutive Dry Days for each Year

My approach was to assign a unique id to each period of consecutive dry days, then use these ids to find the longest period in each year. The data showed that Vancouver consistently had the higher number of consecutive dry days. Vancouver’s longest dry periods typically fell between July and September, while Shannon’s were far less predictable. These dates could be included as tooltip values if creating interactive plots.
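One way to implement the run-id idea is sketched here in base R. It is not necessarily the code behind the published plots. The data is made up, and treating a dry spell that crosses New Year as two separate runs is an assumption of this sketch.

```r
# Illustrative daily data across two years; values are made up.
weather <- data.frame(
  date = as.Date(c("2019-12-28", "2019-12-29", "2019-12-30", "2019-12-31",
                   "2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04",
                   "2020-01-05", "2020-01-06")),
  precip_mm = c(0, 0, 3.1, 0, 0, 0, 0, 1.0, 0.4, 0)
)

weather$year <- as.integer(format(weather$date, "%Y"))
# A day counts as dry only if a reading exists and equals 0 mm.
weather$dry <- !is.na(weather$precip_mm) & weather$precip_mm == 0

# A new run starts whenever the dry flag flips or the year changes, so
# cumsum() gives every run of like days its own id.
n <- nrow(weather)
new_run <- c(TRUE, weather$dry[-1] != weather$dry[-n] |
                   weather$year[-1] != weather$year[-n])
weather$run_id <- cumsum(new_run)

# Keep the dry runs only and count the days in each one.
dry_days <- weather[weather$dry, ]
runs <- aggregate(precip_mm ~ year + run_id, data = dry_days, FUN = length)
names(runs)[names(runs) == "precip_mm"] <- "days"

# Longest dry run per year.
longest <- aggregate(days ~ year, data = runs, FUN = max)
```

Keeping `run_id` on the daily frame also makes it possible to look up the start and end dates of each year's longest run afterwards.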


Get Monthly Insights

In terms of monthly insights, I was interested in total precipitation volume, volume by range, and the count of dry days (days with 0 mm of precipitation). I also checked for NA values, of which there were 28 for Vancouver and 0 for Shannon. The resulting plots show greater precipitation volumes in Vancouver for the months of October to April, and higher counts of dry days in Vancouver for all but the month of March.
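The monthly totals, dry-day counts and NA check could look roughly like this in base R. The data is invented, and dropping NA readings from both summaries (rather than imputing them) is an assumption of this sketch.

```r
# Illustrative daily records; values are made up, including one NA.
weather <- data.frame(
  date = as.Date(c("2020-01-01", "2020-01-02", "2020-01-03",
                   "2020-02-01", "2020-02-02", "2020-02-03")),
  precip_mm = c(0, 4.5, NA, 0, 0, 12.0)
)

# Count missing readings before summarising.
na_count <- sum(is.na(weather$precip_mm))

# Calendar month as a number, so that results sort chronologically.
weather$month <- as.integer(format(weather$date, "%m"))

# na.rm = TRUE means a missing reading adds nothing to either figure.
total_mm <- tapply(weather$precip_mm, weather$month, sum, na.rm = TRUE)
dry_days <- tapply(weather$precip_mm == 0, weather$month, sum, na.rm = TRUE)

monthly <- data.frame(
  month = month.abb[as.integer(names(total_mm))],
  total_mm = as.vector(total_mm),
  dry_days = as.vector(dry_days)
)
```

On a multi-year dataset this pools every January together, every February together, and so on; grouping by year and month instead would give one row per calendar month.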


Calculate for mm Ranges

For plotting the precipitation range columns, it was necessary to convert the data frame to long format so that each range could be drawn as its own facet. This plot required more customization than the others and included a custom legend.
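A rough sketch of the range counts in long format, with one row per station, month and range, followed by a faceted plot. The range boundaries, the colours and the data are all assumptions, and the plot is built only if ggplot2 is installed.

```r
# Illustrative daily records for two stations; values are made up.
weather <- data.frame(
  station = rep(c("Shannon", "Vancouver"), each = 6),
  month = factor(rep(c("Jan", "Jan", "Jan", "Feb", "Feb", "Feb"), times = 2),
                 levels = c("Jan", "Feb")),
  precip_mm = c(0.5, 3.0, 7.2, 1.1, 12.4, 4.0,
                0.0, 15.3, 22.8, 0.0, 6.1, 30.5)
)

# Assumed range boundaries; the original analysis may have used others.
# Intervals are closed on the right, so exactly 0 falls in "0 mm".
weather$range <- cut(
  weather$precip_mm,
  breaks = c(-Inf, 0, 5, 10, 20, Inf),
  labels = c("0 mm", "0-5 mm", "5-10 mm", "10-20 mm", "20+ mm")
)

# Long format: one row per station, month and range, with a day count.
long <- as.data.frame(
  table(station = weather$station, month = weather$month,
        range = weather$range),
  responseName = "days"
)

# One facet per range, with a manually defined legend.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(long,
                       ggplot2::aes(x = month, y = days, fill = station)) +
    ggplot2::geom_col(position = "dodge") +
    ggplot2::facet_wrap(~ range) +
    ggplot2::scale_fill_manual(
      name = "Station",
      values = c(Shannon = "darkgreen", Vancouver = "steelblue")
    ) +
    ggplot2::labs(x = "Month", y = "Days")
}
```

Long format matters here because `facet_wrap()` splits on the values of a single column, which a wide frame with one column per range cannot provide.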


Conclusion

The findings show that Vancouver gets higher volumes of precipitation in shorter, isolated periods, while maintaining clear seasons with far less rain in summer. In contrast, Shannon typically has far fewer consecutive dry days and far more days with 0 to 5 mm of rainfall.

Ultimately, with something as arbitrary as complaining about the weather, people are always going to find an angle to lament their own experiences. In that spirit I’m going to conclude that Ireland has more cause to complain on the grounds that its weather requires its people to carry umbrellas on a higher number of days per year.

This project’s source code is viewable on GitHub.
