Real-Time Forecasting of Deep Creek Hydro Operations, Part 1

Preface

This report describes how an operational system might be implemented and how Deep Creek Hydro might operate it while attempting to satisfy all stakeholders.

All of my work is done on a Mac running the latest version of the OS X operating system, Version 10.13.3. Other operating systems have similar capabilities, which are left to the 'student' to implement. This should not be a difficult task.

All of the coding is done in "R", a computer language originally developed for statistical analysis that has become a favorite language for many academic and private enterprise applications, not necessarily in statistics. The main link for R is https://www.r-project.org. The language is quite suitable for processing large amounts of data. In addition, it has outstanding capabilities for producing publication-quality graphs. R is free and supported by a large number of people all over the world.

This report is written in RMarkdown using RStudio. RStudio is an integrated development environment for R, while RMarkdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. See https://rmarkdown.rstudio.com for more details on using R Markdown. RMarkdown is oriented towards the R language and contains additional functionality for working with R itself. The combination of RStudio and RMarkdown allows one to execute scripts as part of the documentation using "knitr," an R package.
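For example, a minimal code chunk in an RMarkdown source file looks like this; when the document is knit, the chunk is executed and its output is inserted into the report. The chunk contents below are just an illustration, not part of the operational scripts:

```{r lake-summary}
x <- c(2458.50, 2458.53, 2458.60)   # a few lake levels, for illustration only
summary(x)                          # the summary table appears in the rendered document
```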

1. Introduction

With today's software and computer systems, pretty much everything can be done in a completely automated fashion. Some people may want to insert an operator action into the chain of events, but here, in this report, everything will be done automatically.

Computer systems these days, whether they run Mac OS X, Windows, or Linux, all have the capability to launch scripts automatically at specified times. Hence it is possible to trigger certain actions at specific times and, when these are integrated, to provide operators with a list of options to choose from, if so desired.

The Internet provides access to the various pieces of information needed to make things happen: lake levels, river flows, and weather, the three principal categories of information required. Data is collected from these sources, massaged into the form the models require, and then used to make certain kinds of assessments and forecasts. Once these are completed, various websites may need to be updated, also automatically, with schedules of water releases, the eventual goal of the effort.

2. Approach

I’ve categorized the required effort for an automated water release program from Deep Creek Hydro as follows:

1. Scraping - Extract raw information from various websites
2. Preprocessing - Cast the raw data collected above into a form that can be used by the various programs/scripts
3. Analysis - Perform analyses with the collected data to evaluate how well releases were handled in the past and to collect the information needed to make forecasts
4. Forecasting - Use the results of the various analyses to make forecasts of releases to be made
5. Publishing - Publish the results to operators, websites and other need-to-know organizations


Let’s examine each of the above steps.

3. Scraping

I’m using the term ‘scraping’ rather loosely. Here it refers to two types of processes:

  1. Extracting information/data from web pages without going through an official API (Application Programming Interface). For example, when you go to the Deep Creek Hydro web page that displays the lake level and generator status, the only way to get this data onto your computer in some kind of file is by copying and pasting the line with the data; there is no ‘download’ button. This data can, however, be scraped by a small script automatically, as often as you want. Pretty much any web page can be scraped in this way. After a page has been scraped, it needs to be processed to extract the desired information.
  2. Extracting/downloading data through an API (Application Programming Interface), such as for weather forecasts.

We’ll show the details for these two and, perhaps, others.
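For the second category, here is a minimal sketch of retrieving a weather forecast through an API, using the U.S. National Weather Service service (api.weather.gov) as an example. The coordinates, the contact string in the User-Agent, and the choice of service are assumptions for illustration only, not the specific source used later in this effort:

# Hypothetical example of API-based retrieval: fetch the NWS forecast for a
# point near Deep Creek Lake (coordinates are approximate). The NWS asks
# clients to identify themselves via the User-Agent header.
library(httr)
library(jsonlite)

ua <- user_agent("deepcreek-example (contact: you@example.com)")
pt <- GET("https://api.weather.gov/points/39.5,-79.3", ua)
pt_json      <- fromJSON(content(pt, as = "text", encoding = "UTF-8"))
forecast_url <- pt_json$properties$forecast        # forecast URL for this grid point
fc      <- GET(forecast_url, ua)
fc_json <- fromJSON(content(fc, as = "text", encoding = "UTF-8"))
head(fc_json$properties$periods[, c("name", "temperature", "shortForecast")])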

4. Lake Levels and Generator Status

The essence of the approach is to determine, from the current lake elevation, the amount of water that’s available in the lake for release through the hydroelectric generators and bypass flow. This is typically the amount of water available between the current lake level and the lower rule band elevation. For now, we’re interested in making a forecast for water usage once a day. The lake level and generator status are shown on the Deep Creek Hydro website and updated basically every 10 minutes. A simple script, shown here, extracts from the website the current water level of the lake and the status of the generator, which must then be analyzed for how that translates into potential future water releases. The R script that does this is as follows:

URL <- "https://renewableops.brookfield.com/en/Presence/North-America/Recreation-and-Safety/Deep-Creek-Hydro/Lake-Levels"
thepage <- readLines(URL)
## Warning in readLines(URL): incomplete final line found on 'https://
## renewableops.brookfield.com/en/Presence/North-America/Recreation-and-
## Safety/Deep-Creek-Hydro/Lake-Levels'
write(thepage, file="./results/hydrotestpage.txt")
pl <- length(thepage)
cat(paste(Sys.time(), pl, sep="\t"), file="./results/page_file.txt", append=TRUE, sep = "\n")

The URL of the page is specified and all of the lines on that page are read; the call takes only a short time. The page is saved to a file so it can be processed later without additional calls to the website. The length of the page in terms of lines is determined and written, along with a timestamp, to a second file. This is used as a check on whether, and when, the website page changes. We’ll come back to this later and display an alert message when that happens.

The warning displayed here simply signifies that the last line of the page has no EOF (End-Of-File) marker; it does not concern us.
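If the warning is a nuisance, it can be suppressed with the "warn" argument of readLines; this is just an optional tweak to the line shown above:

thepage <- readLines(URL, warn = FALSE)   # suppress the 'incomplete final line' warning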

Now I have to process that page and extract the relevant data. The page is stored, as above, on my computer, and I do not have to access the URL again until I want an update. The following lines of code start the analysis of the downloaded data:

file2 <- "./results/page_file.txt"
pl <- readLines(file2)

npl <- length(pl)
if(npl > 1) {
    x1 <- unlist(strsplit(pl[npl], split="\t"))
    x2 <- unlist(strsplit(pl[npl-1], split="\t"))
    if((x1[2] != x2[2])) {
        stop("The size of Deep Creek Hydro's web page has changed!")
    }
}

I first check whether the page has changed by comparing the number of lines read for two successive accesses of the Deep Creek Hydro website, as recorded in the file written above. To do the comparison I must have at least two entries. Thus, when “npl > 1”, I take each of the last two lines and split them at the tab (“\t”) separator. This results in two two-element vectors, with the second element being the number of lines of the Deep Creek Hydro webpage. Here I’m assuming that when these differ, it is possible that the website has changed, which must be checked before proceeding. If they are the same, the analysis continues; if not, it stops.

The next thing to do is to analyze the webpage and extract the relevant information.

By examining the way the html page is rendered, one can determine and count at which lines within the <tbody> element the data are located. It is found in the following portion of the website and in this format:

# NOTE: The result of this process is the following four lines:
# [1] "                        <td>OFF</td>"
# [2] "                        <td>9/29/2016 03:18</td>"
# [3] "                        <td>2457.75</td>"
# [4] "                        <td>9/26/2016 13:10 </td>"
# The meaning of this data is:
# 1. Generator Status
# 2. Last Reading
# 3. Current lake level (Feet above sea level)
# 4. Last Status Change

This needs to be found only once, unless Brookfield changes its page, at which point the algorithm may have to be changed as well. For now (March 2018), this is what works. Read the rest of the comments in the code snippet above and one should get the drift as to what is important.

Here is the segment of the overall script that extracts the data:

# Find the line number that contains the html token "<tbody>" and extract the data
first.line.number <- grep('<tbody>', thepage)
line1 <- first.line.number + 2
line2 <- first.line.number + 5
x <- thepage[line1:line2]   # the four lines containing the data

library(qdapRegex)
x <- rm_between(x, "<td>", "</td>", extract=TRUE)
unlist(x)

[1] “OFF” “3/13/2018 14:45” “2458.53” “3/13/2018 09:10”

The data segment is isolated into four elements of a temporary vector, x. Using the library “qdapRegex”, I can use its function “rm_between” to extract the text between the html tags and store the result back in “x”, giving the values shown above. Note that I “unlist” the result so that its entries are plain character data rather than a list. That’s basically all there is to it!

Now that I have these results, I want to save them. This is done simply by the following script segment:

thistime <- paste(format(Sys.time(), "%a %b %d %X %Y"), sep="\t")
current.data <- paste(thistime, x[1], x[3], sep="\t")
current.data <- noquote(current.data)
current.data

[1] Tue Mar 13 14:57:07 2018\tOFF\t2458.53

file.out <- "./results/reduced_data.txt"
write(current.data, file=file.out, append=TRUE)

I determine the current date and time and put that into an appropriate format. I then construct a character string with the “paste” function containing the current time, the generator status, and the lake level, all separated by a tab character. That string is then appended to an existing file, “reduced_data.txt”. The first set of entries of the resulting file looks something like this:

Sun Mar 11 16:55:28 2018    OFF 2458.5
Sun Mar 11 17:00:27 2018    OFF 2458.5
Sun Mar 11 17:01:02 2018    OFF 2458.5
Sun Mar 11 17:02:30 2018    OFF 2458.5
Sun Mar 11 17:02:58 2018    OFF 2458.5
Sun Mar 11 17:03:24 2018    OFF 2458.5
Sun Mar 11 17:04:16 2018    OFF 2458.5

Another housekeeping task is to produce a file used by several of my websites. This data is used to construct a panel that shows lake levels, generator status, and lower and upper rule band values. More on that later. The script fragment that does this is as follows:

x[4]       <- format(Sys.time(), "%m/%d/%Y %H:%M")
monthindex <- as.numeric(format(Sys.time(), "%m"))   # Extract the month index

extracted <- paste(x[2], x[1], x[4], x[3], monthindex, sep="  ")
extracted <- noquote(extracted)

fileConn <- "./results/levelboxitems.txt"
write(extracted, file=fileConn)
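The resulting “levelboxitems.txt” file contains a single line with the last-reading time, generator status, retrieval time, lake level, and month index, separated by two spaces. Using the values retrieved above, it would look something like this:

3/13/2018 14:45  OFF  03/13/2018 14:57  2458.53  3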

5. Automating the Retrieval

All of the above is put into a single script, called ‘deepcreekhydro.R’, that is executed automatically by the system at times set by the operator. Its execution is controlled by a bash shell script called ‘extract_hydro_data’. This process is different for different operating systems; the only one shown here is for MacOS 10.13.3.

Hence on my Mac this can be done in one of two ways: ‘launchd’ jobs or ‘cron’ jobs. The preferred way to add a timed job is to use ‘launchd’.

I have a launchd job that runs every 10 minutes. The property list (plist) file for it looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>EnvironmentVariables</key>
    <dict>
        <key>PATH</key>
        <string>/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:</string>
    </dict>
    <key>KeepAlive</key>
    <dict/>
    <key>Label</key>
    <string>com.pitiur.deepcreekhydrodata</string>
    <key>ProgramArguments</key>
    <array>
        <string>/DAS/scripts/extract_hydro_data</string>
    </array>
    <key>StartInterval</key>
    <integer>600</integer>
</dict>
</plist>

For this plist to work it must be located in a special place on my computer (for a per-user job, that is the ~/Library/LaunchAgents folder). What it says is: execute the bash shell script “extract_hydro_data” every 600 seconds. That bash script also runs a second R script, briefly mentioned earlier, that provides the data for constructing a small segment of a webpage on several of my websites; that script is called “lakelevelbox.R”. The bash script that oversees all of this looks as follows:

#!/bin/bash
cd /DAS/code/
R < deepcreekhydro.R --no-save
R < lakelevelbox.R --no-save


# For the deepcreekanswers.com website
USER='YOUR USER NAME'
PASS='YOUR PASSWORD'
/usr/bin/ftp -inv 69.195.124.109 << ftpEOF
   quote user $USER
   quote PASS $PASS
   prompt
   asci
   cd /public_html/deepcreekanswers/i_files
   lcd /DAS/results/
   put i_leveldata.php
   cd /public_html/senstech/i_files
   put i_leveldata.php
   quit
ftpEOF

# For the deepcreekscience.com website
USER='YOUR USER NAME'
PASS='YOUR PASSWORD'
/usr/bin/ftp -inv ftp.deepcreekwatershed.org << ftpEOF
   quote user $USER
   quote PASS $PASS
   prompt
   asci
   cd ./deepcreekscience/i_files
   lcd /DAS/results/
   put i_leveldata.php
   quit
ftpEOF

It is left to the “student” to figure out what it means. It basically runs the R scripts on your local computer, then logs into your website(s) and transfers the information generated by those scripts to several places that your website(s) can access. Here the information is added to my deepcreekanswers.com and deepcreekscience.com websites.
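The “lakelevelbox.R” script itself is not listed here. A minimal sketch of what such a script might do, assuming it reads “./results/levelboxitems.txt” and writes the “i_leveldata.php” fragment that the ftp commands upload (the markup below is invented purely for illustration):

# Hypothetical sketch of lakelevelbox.R: read the latest level-box data and
# write a small HTML/PHP fragment for the website panel. Field order matches
# the levelboxitems.txt file written above; the markup is an assumption.
items <- unlist(strsplit(readLines("./results/levelboxitems.txt"), split="  "))
html  <- c("<div class='levelbox'>",
           paste0("  <p>Lake level: ", items[4], " ft ASL (reading of ", items[1], ")</p>"),
           paste0("  <p>Generator: ", items[2], "</p>"),
           "</div>")
writeLines(html, "./results/i_leveldata.php")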

The basic thing we’ve accomplished above is to set up a system whereby the lake level and generator status, as reported on the Deep Creek Hydro website, are automatically retrieved every 10 minutes, with the data saved on my computer for future use. That future use is determined in more detail as the process evolves in this report.

6. The Available Water

Assume we want to make a forecast at 6 am every morning for when to conduct a release. The above computations are done very quickly, in just a few seconds. One can then modify the bash script so that as soon as 6:00 am, or just after, is reached, an R script is run that uses the last lake level entry in “reduced_data.txt” and computes the water available for releases. Alternatively, I can use the information in the “levelboxitems.txt” file, since this is the latest information retrieved. Let’s use it.
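For completeness, the first option, reading the last entry of “reduced_data.txt”, would look something like the following sketch; the field positions follow the file format shown earlier (timestamp, generator status, lake level, separated by tabs):

# Hypothetical alternative: take the most recent line of reduced_data.txt
# and extract the lake level from its third (tab-separated) field.
rd         <- readLines("./results/reduced_data.txt")
last       <- unlist(strsplit(rd[length(rd)], split="\t"))
lake.level <- as.numeric(last[3])    # third field is the lake level, ft ASL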

I need to compute the total water volume between the current lake level and the lower rule band.

The lake volume as a function of the lake level (above 2445 ft ASL) is given by the following expression, obtained from the page linked below:

    V = 1.0511e+07 - 8696.8 * L + 1.7989 * L^2
    V = Lake Volume, millions of cubic feet (converted to acre-feet in the code below)
    L = Lake Level, ft ASL

[The upper and lower rule bands are given there also](http://www.deepcreekscience.com/wamtesting.php).

The total available water is obtained by evaluating the volume equation at the current water level and at the current value of the lower rule band elevation, and taking the difference.
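In other words, with V(L) denoting the volume expression above and both levels in ft ASL:

    Available Water = V(current lake level) - V(lower rule band elevation)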

So what do the rule bands look like? Figure 1 shows them.

The lower rule band has been made into a function that, given a date, returns the lower rule band value for that date, a lake level in ft above sea level (ASL).

The function is defined as follows:

lower_rule_band <- function(dt)  {
# Computes the lower rule band value at a specified date, dt
#
# dt is the date in the format shown: "3/22/2016"
# Usage:    x <- lower_rule_band(dt)
#           returns lower.rb

    require(lubridate)
    # The origin of time for the time functions
    t.origin <- "1970-01-01 00:00:00"
    
    # The number of days per month
    days.per.month <- c(31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)
    tmp <-  as.Date(dt,'%m/%d/%Y') 
    date.year <- as.numeric(format(tmp,'%Y'))
    if((date.year %% 4) == 0) days.per.month[2] <- 29
    

    # Lowest and highest desirable at the end of the month [brookfield permit-2011jun.pdf]
    # There are 13 month values listed.  The first is the same as the last
    lower.ruleband <- c(2455.0, 2455.0, 2456.0, 2458.0, 2459.6, 2460.0, 2460.0, 2459.0, 2458.0, 2457.0, 2456.0, 2455.0, 2455.0)

    d1 <- as.POSIXlt(dt, format="%m/%d/%Y")

    month1 <- month(d1)

    days1 <- day(d1)
        
    lrb <- lower.ruleband[month1] + days1/days.per.month[month1] * (lower.ruleband[month1+1] - lower.ruleband[month1])

    return(lrb)
}

The function is called “lower_rule_band” and saved in a file called “lower_rule_band.R”. The single parameter passed into it is a date in the form “5/26/2018”. I use a package called “lubridate”, which makes extracting day, month, and year values easy, and I account for leap years. The lower rule band values, as specified in the latest version of the permit, are listed in a vector; linear interpolation is used to account for dates between the beginning and end of a month. Here is a simple test to check its values:

source("./functions/lower_rule_band.R")

dates <- c("1/1/2015", "2/15/2015", "3/15/2015", "4/1/2015", "4/15/2015", "5/15/2015", "6/15/2015", "7/15/2015", "7/31/2015", "8/1/2015", "9/15/2015", "10/1/2015", "11/15/2015", "12/15/2015")

for(i in 1:length(dates)){
    lr <- lower_rule_band(dates[i])
    # print(paste(dates[i], lr, sep="    "))
    print(paste(dates[i], (format(round(lr, 2), nsmall = 2)), sep="    "))
}

[1] "1/1/2015    2455.00"
[1] "2/15/2015    2455.54"
[1] "3/15/2015    2456.97"
[1] "4/1/2015    2458.05"
[1] "4/15/2015    2458.80"
[1] "5/15/2015    2459.79"
[1] "6/15/2015    2460.00"
[1] "7/15/2015    2459.52"
[1] "7/31/2015    2459.00"
[1] "8/1/2015    2458.97"
[1] "9/15/2015    2457.50"
[1] "10/1/2015    2456.97"
[1] "11/15/2015    2455.50"
[1] "12/15/2015    2455.00"

The result of this process is a series of lower rule band values. These can be checked for correctness against Figure 1 above.

So the status now is that I have the current lake level, the lower rule band value, and the difference, computed by the following chunk of R script:

file1 <- "./results/levelboxitems.txt"
level.data <- readLines(file1)
x <- unlist(strsplit(level.data, split="  "))
lake.level <- as.numeric(x[4])
s <- unlist(strsplit(x[3], split=" "))
dt <- unlist(s[1])
lrb.level <- as.numeric(lower_rule_band(dt))
delta <- as.numeric(format(round(lake.level - lrb.level, 2), nsmall = 2))
print(paste("Lake Level = ", lake.level, "  Lower Rule Band = ", lrb.level, "  delta = ", delta, sep=""))

[1] “Lake Level = 2458.53 Lower Rule Band = 2456.83870967742 delta = 1.69”

Note that I had to use the “as.numeric” function to convert the values to numeric type in order to compute the difference. As before, the result is printed with only two digits after the decimal point, which makes engineering sense, given that lake levels are reported to two decimal places on the Deep Creek Hydro website.

We now have to evaluate the lake volume equation given above at the two levels and subtract the values to obtain the volume available for releases.

# The lake volume function as a function of water elevation
lake_volume <- function(L){
    # V = Lake Volume, acre-feet
    # L = Lake Level, ft ASL
    V <- 1.0511e+07 - 8696.8 * L + 1.7989 * L^2
    V <- V * 1.0e+06/43560   # Convert from millions of cubic feet to acre-feet
    return(V)
}

L1 <- lake.level
L2 <- as.numeric(format(round(lrb.level, 2), nsmall = 2))

v1 <- lake_volume(L1)
v2 <- lake_volume(L2)

volume <- v1 - v2
print(format(round(volume, 1), nsmall = 1))

[1] “5643.4”

Note that I’ve made generous use of the “round” function. I rarely use it unless I want to print publication-quality numbers; usually it is just something like “print(volume)”.

To test whether the lake volume function behaves sensibly, I use it to compute the volume at L1 and the volume at L1 - 0.0001 ft. I would expect the difference to be small. Here it goes:

print(lake_volume(L1) - lake_volume(L1-0.0001))

[1] 0.3409069

And the result is as expected, a relatively small volume. (Dividing 0.3409 acre-feet by the 0.0001 ft change gives roughly 3,400 acre-feet per foot of level, i.e., an effective lake surface area of about 3,400 acres at this elevation, which is consistent with the size of Deep Creek Lake.)

Part 2 of this report continues the analysis and develops the Water Allocation Model (WAM).


Author: PLV
First Published: March 3, 2018
FILE: 2018-03-03-Real-Time Automated Analysis, Part 1
2018 PLV Some Rights Reserved
Contact: pete@senstech.com
