Spring 2022
Dew Point and Humidity in Boston

Since I have been living in Boston for a couple of years now, I wanted to compare the weather I have experienced with that of previous years. I found a data set on Kaggle that has the kind of variables I want to explore. The observations run from 2008 to 2018, which seemed good enough, although the creator of the data set describes it as follows: "Dataset contains ... for every day from 1/1/2013 - 4/8/2018 inclusive". The creator used https://www.wunderground.com to build the data set. Some of the key variables are: Year, Month, Day; High, Avg and Low Temps in °F; High, Avg and Low Dew Points in °F; High, Avg and Low Humidity in percent; Precipitation in inches; and Snow in inches, among various others.


Source: https://www.kaggle.com/jqpeng/boston-weather-data-jan-2013-apr-2018

Using ggplot I drew some exploratory visualizations for the data:
(List of built-in theme functions: https://ggplot2.tidyverse.org/reference/ggtheme.html)
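
A histogram of daily average humidity could be drawn along these lines. This is a minimal sketch, not the original script: the data frame `weather` and the column `avg_humidity` are hypothetical names, and the values are simulated stand-ins so that the block runs on its own. In practice the data frame would come from reading the Kaggle CSV.

```r
library(ggplot2)

# Simulated stand-in values (NOT the Boston data), clamped to a valid
# percentage range so the sketch runs without the Kaggle file.
set.seed(1)
weather <- data.frame(
  avg_humidity = pmin(pmax(rnorm(1000, mean = 65, sd = 15), 0), 100)
)

# Histogram of daily average humidity, styled with a built-in theme.
p <- ggplot(weather, aes(x = avg_humidity)) +
  geom_histogram(binwidth = 5, fill = "steelblue", colour = "white") +
  labs(
    title = "Distribution of daily average humidity",
    x = "Average humidity (%)",
    y = "Number of days"
  ) +
  theme_minimal()

print(p)
```

Swapping `theme_minimal()` for another function from the theme list linked above changes only the styling, not the bins.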

mean = 65.57509
median = 65
sd = 14.79349

The visualization above shows the distribution of daily average humidity from 2008 to 2018. The shape of the distribution has a single central region of high-frequency values: the main hump at about 65% is the only mode, which makes the distribution unimodal. The mean average humidity across the days is 65.57509, the median is very close at 65, and the standard deviation is 14.79349. The mean and median fall in the same bar, which is also the mode of the distribution. The mean is slightly larger than the median, which suggests a slight positive skew. This skewness is hard to see visually, because the difference is so small and the distribution falls off at a similar rate on both sides of the median.
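
One way to put a number on how small this skew is would be Pearson's second skewness coefficient, 3 * (mean - median) / sd. This measure was not part of the original analysis; the sketch below simply plugs in the three summary statistics reported above, and the variable names are illustrative.

```r
# Summary statistics reported in the text for daily average humidity.
humidity_mean   <- 65.57509
humidity_median <- 65
humidity_sd     <- 14.79349

# Gap between mean and median, expressed in standard deviations.
gap_in_sd <- (humidity_mean - humidity_median) / humidity_sd

# Pearson's second (median) skewness coefficient: 3 * (mean - median) / sd.
pearson_skew <- 3 * gap_in_sd

print(round(gap_in_sd, 3))     # 0.039
print(round(pearson_skew, 3))  # 0.117
```

The coefficient is positive but close to zero, which is consistent with a distribution that looks nearly symmetric in the histogram.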


Tools: RStudio