There are various ways that one can produce graphs of data. Being an R enthusiast, I can produce some really nice graphs using the basic graphics capabilities of R, as I’ve done with the bathymetry of Deep Creek Lake. But R can do much more. There are several JavaScipt libraries for producing graphs of data, such as HighCharts and Dygraphs, and R has an interface to them.
In this note I will show graphs of a number of datasets:
For now I will constrain myself to showing graphs with R basic graphics engine, and with Dygraphs, a Javascript based graphing engine.
In this note I will explore what these options can do. With R base graphics more information can be displayed in a graph, while with Dygraphs an interesting capability can be produced with interactions that allows one to hone in on as small a time interval as one wishes.
As will be seen, one can produce an estimate of the flowrate from the hydroelectric plant
With R base graphics one can produce graphs by just plotting x and y values and annotating them with whatever text one wants on it and with additional graphical elements.
Let’s look at the rainfall data from a local weather station, KMDOAKLA9, whose data can be found on WeatherUnderground. The station is located somewhere on Rock Lodge Rd, near the Deep Creek bridge: >Latitude / Longitude: N 39 ° 31 ‘25’‘, W 79 ° 20’ 21 ’’, Elevation: 2543 ft
I downloaded the data available since Jan 1, 2018 through March 30, 2018. I had to process the downloaded file to extract the daily rainfall in the following way and produce a rainfall plot:
file1 <- "../data/KMDOAKLA9_2018_Jan1_March31.txt"
x <- readLines(file1)
N <- length(x)
skip <- c("Wea", "Jan", "Feb", "Mar", "201")
Nskip <- length(skip)
i <- 0
yr <- 2018
rain_date <- vector() # To store the daily rainfall amount in inches
rain_amount <- vector() # To store the date of the rainfall
s <- vector() # A local vector for string splitting operations
t <- vector() # A local vector for string splitting operations
i <- 0 # Counter for lines processes
k <- 0 # Counter for storing the result
while(i < N){
i <- i + 1
s <- unlist(strsplit(x[i], split="\t")) # Break a line into its components
test <- FALSE
for(j in 1:Nskip){
subs <- substring(s[1], first=1, last=3)
if(subs == skip[j]) { test = TRUE; break }
}
if(subs == skip[1] || subs == skip[5]) { next }
if(subs == skip[2]) { mon <- skip[2] }
if(subs == skip[3]) { mon <- skip[3] }
if(subs == skip[4]) { mon <- skip[4] }
if(test == FALSE){
k <- k + 1
rain_date[k] <- paste(mon, s[1], yr, sep="-") # Create the date
s <- unlist(s) # Break the last column apart to separate the value from the unit
ss <- s[length(s)]
t <- unlist(strsplit(ss, split=" "))
rain_amount[k] <- t[1]
}
}
rain_amount <- as.numeric(rain_amount)
head(rain_amount)
## [1] 0.01 0.00 0.01 0.00 0.00 0.00
rain_date <- as.Date(rain_date, format="%b-%d-%Y")
head(rain_date)
## [1] "2018-01-01" "2018-01-02" "2018-01-03" "2018-01-04" "2018-01-05"
## [6] "2018-01-06"
plot(rain_date, rain_amount, type="l")
grid(col="black")
Dygraphs is a JavaScript library that can be used to produce a variety of graphs from data. R has a set of functions, via package ‘dygraphs’, that can interface with this library and produce web pages with embedded JavaScript powered graphs.
The following chart is a Dygraph plot of the same data, the rainfall at Deep Creek Lake since the beginning of this (2018) year, as plotted above with R base graphics engine, but with enhanced features that allows one to zoom in and out of the data.
I’m following the example displayed here using the “Dygraphs” plot package. with the rainfall data.
library(xts)
## Warning: package 'xts' was built under R version 3.4.4
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
rain_ts <- xts(rain_amount, order.by=rain_date)
library(dygraphs)
dygraph(rain_ts, main = "Rainfall Since Jan 1 at Weather Station: KMDOAKLA9") %>%
dyOptions(fillGraph = TRUE, fillAlpha = 0.2) %>%
dyOptions(drawPoints = TRUE, pointSize = 2) %>%
dyRangeSelector(dateWindow = c("2018-01-01", "2018-03-30"))
I’m now going to graph the lake level data the same way. First I read the data, then create an ‘xts’ object, plot it first with the xts package and then plot it into Dygraphs.
# Read the lake level and generator status data
file2 <- "../data/reduced_todays_output_20180331.txt"
x <- read.table(file=file2, header=TRUE, sep="\t")
# Extract into individual vectors
onoff <- x$ONOFF
level <- x$Level
datetime <- x$Datetime
# Convert ON/OFF to 1/0 values
onoff <- as.character(onoff)
onoff[onoff=="ON"] <- 1
onoff[onoff=="OFF"] <- 0
onoff <- as.numeric(onoff) # Convert from character to numeric
head(onoff, 50) # Show the first 50 numbers
## [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0
## [36] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
level <- as.numeric(level)
head(level, 50)
## [1] 2457.65 2457.63 2457.63 2457.64 2457.63 2457.63 2457.63 2457.62
## [9] 2457.61 2457.60 2457.58 2457.60 2457.60 2457.61 2457.61 2457.61
## [17] 2457.60 2457.60 2457.60 2457.60 2457.59 2457.59 2457.59 2457.60
## [25] 2457.58 2457.60 2457.58 2457.58 2457.58 2457.58 2457.59 2457.57
## [33] 2457.57 2457.59 2457.57 2457.60 2457.58 2457.57 2457.59 2457.59
## [41] 2457.58 2457.59 2457.58 2457.58 2457.57 2457.57 2457.57 2457.57
## [49] 2457.60 2457.60
datetime <- as.POSIXct(datetime) # Convert to time objects
head(datetime)
## [1] "2018-01-16 16:56:56 EST" "2018-01-16 17:07:08 EST"
## [3] "2018-01-16 17:17:23 EST" "2018-01-16 17:27:38 EST"
## [5] "2018-01-16 17:37:53 EST" "2018-01-16 17:48:08 EST"
level_ts <- xts(level, order.by=datetime) # Create time series objects
plot(level_ts) # Plot the lake levels with R graphics
The above graph is done with R basic graph functions and using an ‘xts’ time series object as defined in the ‘xts’ package.
The following graph is the same data but plotted with Dygraph. It assumes that the data has already been put in the desired format, as shown above.
# Plot lake levels with Dygraph
dygraph(level_ts, main = "Lake levels since Jan 16, 2018") %>%
dyOptions(fillGraph = TRUE, fillAlpha = 0.4) %>%
dyRangeSelector(dateWindow = c("2018-01-16", "2018-03-30"))
# Plot generator on/off with Dygraph
onoff_ts <- xts(onoff, order.by=datetime) # Create time series objects
# Plot generator on/off with Dygraph
dygraph(onoff_ts, main = "Generator ON/OFF since Jan 16, 2018") %>%
dyOptions(stepPlot = TRUE) %>%
dyOptions(fillGraph = TRUE, fillAlpha = 0.4) %>%
dyRangeSelector(dateWindow = c("2018-01-16", "2018-03-30"))
This is looking pretty good. It has the same slider option below the graph.
It’s quite interesting to notice how long the generator ran during this period, almost contoneously from February 11 to March 3, almost 20 days.
During January and February there were a lot of periods lasting roughly about 1/2 hr to 45 minutes. Not sure what’s going on!
To get an idea of how the generators are operation, we can look at the USGS Youghiogheny River water gage located near Hoyes Run, just below the tail race of Deep Creek Hydro. I’ve extracted the gage data from Jan 1 through March 30 of this year, 2018. The data is converted in the following two charts, the first one showing the flowrate in CFS, and the second the gage height in ft.
#========1=========2=========3=========4=========5=========6=========7=========8=========9=========0
usgs_hoyes_file <- "../data/USGS_Hoyes_2018March30.txt"
#========1=========2=========3=========4=========5=========6=========7=========8=========9=========0
# Get the river data - Discharge Rate and Stage Height
river.data <- read.table(file=usgs_hoyes_file, sep="\t",
comment.char="#", header=TRUE)
river.data <- river.data[-1,] # Get rid of the first record read; not of interest
# 02 00060 Discharge, cubic feet per second
# 03 00065 Gage height, feet
discharge.time <- strptime(river.data$datetime, format="%Y-%m-%d %H:%M")
discharge.time <- as.POSIXct(discharge.time, tz="EST")
discharge <- river.data$X70091_00060
discharge <- as.numeric(as.character(discharge))
gage.height <- river.data$X70090_00065
gage.height <- as.numeric(as.character(gage.height))
First I read the data and define it as local variables and condition them for use later. Next I’ll create the plot for both the discharge and height.
Plotting data with the axis as a time value is often a little tricky to make it look right, particularly with aligning a grid with the appropriate dates. Otherwise it’s standard base graphics commands.
#========1=========2=========3=========4=========5=========6=========7=========8=========9=========0
date1 <- discharge.time[1]
last <- length(discharge.time)
date2 <- discharge.time[last]
#========1=========2=========3=========4=========5=========6=========7=========8=========9=========0
# The following creates a graph with the river discharge data.
plot(discharge.time, discharge, type="l", cex=0.2,
xlab="Day/Time/Duration", ylab="Discharge (flow rate), cfs")
mtext(paste("River Discharge (cfs) AT HOYES RUN MD\n",
"from: ", date1, " to: ", date2,
"\nUSGS 03075500 YOUGHIOGHENY RIVER AT HOYES RUN, MD",
sep=""), side=3, line=1)
# Draw the grid
abline(v=axis.POSIXct(1, discharge.time), col="grey80")
(yaxp <- par("yaxp"))
## [1] 0 12000 6
abline(h=seq(yaxp[1], yaxp[2], (yaxp[2]-yaxp[1])/yaxp[3]),
lty=1, lwd=2, col = "grey80")
points(discharge.time, discharge, col="blue", cex=0.2)
#========1=========2=========3=========4=========5=========6=========7=========8=========9=========0
# The following creates a graph with the river gage height data.
plot(discharge.time, gage.height, type="l", cex=0.2,
xlab="Day/Time/Duration", ylab="Gage Height, ft")
mtext(paste("River Gage Height (ft) AT HOYES RUN, MD\n",
"from: ", date1, " to: ", date2,
"\nUSGS 03075500 YOUGHIOGHENY RIVER AT HOYES RUN, MD",
sep=""), side=3, line=1)
# Draw the grid
abline(v=axis.POSIXct(1, discharge.time), col="grey80")
yaxp <- par("yaxp")
abline(h=seq(yaxp[1], yaxp[2], (yaxp[2]-yaxp[1])/yaxp[3]),
lty=1, lwd=2, col = "grey80")
points(discharge.time, gage.height, col="red", cex=0.2)
So here we have the two graphs using the basic capabilities of R. Next I’ll plot the data with Dygraph, including the zoom slider at the bottom of the graph.
discharge_ts <- xts(discharge, order.by=discharge.time) # Create time series objects
plot(discharge_ts) # Plot the discharge with R graphics
The first thing to do is to cast the data as an xts time series object. Simple. Then plot the results with R. The graph looks nicer.
Next we’ll plot it with Dygraph.
dygraph(discharge_ts, main = "Discharge at Hoyes Run since Jan 1, 2018") %>%
dyOptions(fillGraph = TRUE, fillAlpha = 0.4) %>%
dyRangeSelector(dateWindow = c("2018-01-1", "2018-03-30"))
df <- data.frame(discharge.time, discharge)
df <- df[complete.cases(df), ]
discharge.time <- df$discharge.time
discharge <- df$discharge
discharge_ts <- xts(discharge, order.by=discharge.time) # Create time series objects
dygraph(discharge_ts, main = "Discharge at Hoyes Run since Jan 1, 2018") %>%
dyOptions(fillGraph = TRUE, fillAlpha = 0.4) %>%
dyRangeSelector(dateWindow = c("2018-01-1", "2018-03-30"))
The next observation is an interesting result when one slides the righthand slider all the way to the left so that you can see the data for Jan 1-5. The releases are seen very clearly. There is a lot of fluctuation in the maximum values because of turbulence in the river. This is inherent in the release, because the USGS gage is basically just downstream of the tailrace.
I used a screen ruler to measure the distance from the lowest value of the peak and the highest value of the peak down to the apparant baseblow and got the following numbers for the first nine releases since Jan 1:
low_value <- c(6.4, 6.6, 6.0, 5.8, 6.7, 6.6, 7.8, 6.4)
high_value <- c(10.3, 10.0, 9.2, 10.3, 10.8, 10.0, 10.7, 9.5)
scale <- 900/12.6
h1 <- low_value * scale
h2 <- high_value * scale
histogram <- function(x){
h <- hist(x, breaks=10, col="red",
main="Min and Max Release Values",
xlab="Release Value, cfs")
xfit <- seq(min(x),max(x),length=40)
yfit <- dnorm(xfit, mean=mean(x), sd=sd(x))
yfit <- yfit*diff(h$mids[1:2])*length(x)
lines(xfit, yfit, col="blue", lwd=2)
}
I also used as a scale factor the distance on the screen between 1100 and 200 cfs, which was 12.6 cm. I next created a function that would show values in a histogram, and also fitted the values to a normal distribution curve, mostly to aid in the visualization. I then plot them.
histogram(h1)
histogram(h2)
From these graphs, if I extract the peaks from the density curves and average them, I get about 590 cfs.
Another way to look at them is to plot the two histograms side-by-side. The following code was adapted from the reference cited in the code block:
# REF: https://stackoverflow.com/questions/3541713/how-to-plot-two-histograms-together-in-r
# Calculate the histograms - don't plot yet
h1gram <- hist(h1, plot = FALSE)
h2gram <- hist(h2, plot = FALSE)
# Calculate the range of the graph
xlim <- range(h2gram$breaks,h1gram$breaks)
ylim <- range(0,h2gram$density, h1gram$density)
# Plot the first graph
plot(h1gram, xlim = xlim, ylim = ylim,
col = rgb(1,0,0,0.4),xlab = 'Release Value, cfs',
freq = FALSE, ## relative, not absolute frequency
main = 'Distribution of Low and High Values of River Flow')
# Plot the second graph on top of this
opar <- par(new = FALSE)
plot(h2gram,xlim = xlim, ylim = ylim,
xaxt = 'n', yaxt = 'n', ## don't add axes
col = rgb(0,0,1,0.4), add = TRUE,
freq = FALSE) ## relative, not absolute frequency
# Add a legend in the corner
legend('topright',c('Low Value','High Value'),
fill = rgb(1:0,0,0:1,0.4), bty = 'n',
border = NA)
par(opar) # Reset the to the default plotting parameters
Visually from this graph I get a midpoint value of 600 cfs.
The reason I’m doing this is because there is no defined value for the “operating” release rate. The literature reports values from 570 to 620 cfs. So for analysis purposes, using 600 cfs is prabably the best value. The maximum error is probably around 1%.