Predicting Energy Consumption (Part 1)

An Introduction to Time Series Analysis and Forecasting Using Python

Time Series Analysis & Forecasting

Time series data refers to a set of observations collected at different points in time, often at a regular interval. Analysis of time series data is crucial in a wide array of industries, including finance, epidemiology, meteorology, social sciences and many others.

In this article, we will look at the following topics within the domain of time series analysis and forecasting:

  • Exploratory Analysis
  • Visualizations
  • Seasonal Decomposition
  • Stationarity
  • ARIMA Models

These topics provide a very basic introduction to time series analysis and forecasting. More advanced forecasting methods will be discussed in Part 2 of this article. You can find a notebook with the code from this article in this GitHub repo.

Data Description

We will be analyzing hourly energy consumption data provided by PJM Interconnection. PJM Interconnection is a regional transmission organization (RTO) that coordinates distribution of electricity across a region including all or parts of 14 states in the northeastern United States. The data we will be looking at is composed of hourly energy consumption data from Duquesne Light Co. from January 1, 2005 to August 3, 2018. Duquesne Light Co. serves the Pittsburgh, PA, metropolitan area as well as large segments of Allegheny and Beaver Counties. A copy of the dataset we will be using can be found here. Individual service areas for large member companies of PJM, including Duquesne Light (DUQ), are shown in the figure below.

Image for post
Image for post

Energy consumption data are provided in megawatts (MW). The values provided represent a measurement of instantaneous power within the Duquesne Light grid. Since power represents the rate at which work is done, the data includes a collection of instantaneous rates, similar to a set of velocity data.

The total amount of energy consumed over a time period is represented as megawatt-hours (MWh). If power is constant, this value can be determined by multiplying the instantaneous power by the duration over which it is applied. In our data, instantaneous power varies hourly, so to determine the total amount of energy consumed we would need to integrate over time. We will be looking at consumption rates only for our data, as opposed to the total amount of energy consumed.

Exploratory Analysis

First, we will need to load the modules we will be using as well as our dataset:

# Load Modulesimport numpy as np 
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
# Set Plotting Styles'ggplot')
# Load Data
duq_df = pd.read_csv('data/DUQ_hourly.csv', index_col=[0], parse_dates=[0])
# Sort Data

When our data was originally stored in the CSV file, timestamps were converted to strings. We need to parse them back into datetime objects for our analysis and set them as the dataset index. This can be done automatically using the parse_dates and index_col parameters for the read_csv function included with Pandas. Once we have established a datetime index, we sort the values to make sure they are presented in chronological order. Skipping this step will reveal that the timestamps may have been sorted by their timestamp strings previously, as some timestamps are out of order.

A quick evaluation of the data indicates that it has 119,008 rows and one column (DUQ_MW). We will now go through the data to identify duplicate timestamps, impute missing values and prepare some basic visualizations.

Duplicate Values

Some datasets may include duplicate readings that share a common timestamp. The reason for this duplication should be investigated for each unique dataset, as reasons for data duplication can vary widely depending on the data collection methodology.

There are several approaches towards the handling of duplicate values. We have the option of discarding both or one of the duplicate values, or imputing the timestamp value using each of the available measurements. For our example, we will compute the mean energy consumption for each of our duplicate value pairs and use that value moving forward.

Once we have removed duplicate values, we manually set the frequency of the DatetimeIndex to hourly (‘H’). Normally, the date parser would be able to determine the frequency of the DatetimeIndex automatically. The presence of duplicate DatetimeIndex values prevents this from happening though, so the frequency must be set manually following removal of duplicate values. Setting the frequency now will help us avoid problems with plotting and calculations down the line.

# Identify Duplicate Indices
duplicate_index = duq_df[duq_df.index.duplicated()]
print(duq_df.loc[duplicate_index.index.values, :])
# Replace Duplicates with Mean Value
duq_df = duq_df.groupby('Datetime').agg(np.mean)
# Set DatetimeIndex Frequency
duq_df = duq_df.asfreq('H')

Missing Values

A quick search through the dataset indicates that there are 24 missing values across our date range. We will use mean interpolation to impute these missing values. We can do this using the interpolate() method on our dataframe object:

# Determine # of Missing Values
print('# of Missing DUQ_MW Values: {}'.format(len(duq_df[duq_df['DUQ_MW'].isna()])))
# Impute Missing Values
duq_df['DUQ_MW'] = duq_df['DUQ_MW'].interpolate(limit_area='inside', limit=None)

Visualizing Energy Consumption Data

First, let’s look at a simple time series plot of our data:

plt.plot(duq_df.index, duq_df['DUQ_MW'])
plt.title('Duquesne Light Energy Consumption')
plt.ylabel('Energy Consumption (MW)')
Image for post
Image for post

There is a clear annual seasonal pattern visible in the data, but the details of each individual year are obscured by the density of the plot. Let’s look at a single week:

plt.plot(duq_df.index[:WEEK_END_INDEX], duq_df['DUQ_MW'][:week_end_index])
plt.title('Duquesne Light Energy Consumption (One Week)')
plt.ylabel('Energy Consumption (MW)')
Image for post
Image for post

From this plot we can also see a clear daily seasonality, with a repeated pattern of hourly consumption rates occurring each day. We also observe an upward trend within this week of data, but this trend is likely a part of the annual seasonality we saw previously.

Let’s break our DatetimeIndex into separate features so we can look at some of these patterns a little more closely:

def create_features(df):
df['Date'] = df.index
df['Hour'] = df['Date'].dt.hour
df['DayOfWeek'] = df['Date'].dt.dayofweek
df['Quarter'] = df['Date'].dt.quarter
df['Month'] = df['Date'].dt.month
df['Year'] = df['Date'].dt.year
df['DayOfYear'] = df['Date'].dt.dayofyear
df['DayOfMonth'] = df['Date']
df['WeekOfYear'] = df['Date'].dt.weekofyear
df['DayOfYearFloat'] = df['DayOfYear'] + df['Hour'] / 24
df.drop('Date', axis=1, inplace=True)
return df
duq_df = create_features(duq_df)fig, axes = plt.subplots(2, 2, figsize=(16,16))# Day of Week
dow_labels = ['Mon', 'Tues', 'Wed', 'Thurs', 'Fri', 'Sat', 'Sun']
g = sns.boxplot(x=duq_df.DayOfWeek, y=duq_df.DUQ_MW, ax=axes[0][0])
# Month of Year
g = sns.boxplot(x=duq_df.Month, y=duq_df.DUQ_MW, ax=axes[0][1])
# Hour of Day
g = sns.boxplot(x=duq_df.Hour, y=duq_df.DUQ_MW, ax=axes[1][0])
# Year
g = sns.boxplot(x=duq_df.Year, y=duq_df.DUQ_MW, ax=axes[1][1])
fig.text(0.08, 0.5, 'Energy Consumption (MW)', va='center', rotation='vertical')
Image for post
Image for post

Here, we look at energy consumption in terms of the hour of day, day of the week, month of year, and year of our dataset range. We can see that energy consumption is slightly higher Monday-Friday than it is Saturday-Sunday, which is to be expected since many businesses do not operate on weekends in the U.S. Energy consumption typically peaks around July/August, and there are several more outliers within the months preceding and following the peak period than are observed during the cooler months (November — April). Hourly energy consumption is at its lowest around 4 AM. After this time it gradually increases until it plateaus around 2 PM. It then peaks again around 7 PM before declining again until it reaches its early morning low point. Mean energy consumption rates and interquartile ranges are fairly consistent from year to year, although a slight downward trend is observable.

Next, we’ll look at a seasonal plot to get a better idea how energy consumption patterns vary from year to year. We’ll use monthly mean energy consumption instead of hourly data to make it easier to distinguish between years when all of the data are plotted together.

year_group = duq_df.groupby(['Year', 'Month']).mean().reset_index()
years = duq_df['Year'].unique()
NUM_COLORS = len(years)
cm = plt.get_cmap('gist_rainbow')
fig = plt.figure()
ax = fig.add_subplot(111)
ax.set_prop_cycle(color=[cm(1.*i/NUM_COLORS) for i in range(NUM_COLORS)])
for i, y in enumerate(years):
df = year_group[year_group['Year'] == y]
#rolling_mean = df.DUQ_MW.rolling(window=7*24).mean()
plt.plot(df['Month'], df['DUQ_MW'])
plt.title('Mean Monthly Energy Consumption by Year')
plt.ylabel('Mean Energy Consumption (MW)')
Image for post
Image for post

Now, let’s plot each year’s worth of data on a separate subplot so individual years can be evaluated more closely. We will use a 7-day moving average to smooth the data, making general trends easier to identify for comparison.

num_rows = 7
num_cols = 2
year_index = 0
fig, axes = plt.subplots(num_rows, num_cols, figsize=(18,18))
years = duq_df['Year'].unique()
for i in range(num_rows):
for j in range(num_cols):
df = duq_df[duq_df['Year'] == years[year_index]]
rolling_mean = df['DUQ_MW'].rolling(window=7*24).mean()
axes[i][j].plot(df['DayOfYearFloat'], rolling_mean.values)
axes[i][j].set_ylim(1100, 2500)
year_index += 1

fig.text(0.5, 0.08, 'Elapsed Days', ha='center')
fig.text(0.08, 0.5, 'Energy Consumption (MW)', va='center', rotation='vertical')
Image for post
Image for post

These plots help us identify the unique fluctuations of energy consumption rate across each year. For example, in 2008 the peak energy consumption rate occurs much earlier than in other years. Some years, such as 2012, demonstrate a clear peak summer season, while other years, such as 2015, also show local maxima in the winter season as well.

Moving Average

In the plots we just created, we used a smoothing method called a moving average. A moving average is an average value calculated over a fixed window of observations. The average value is calculated over the fixed window length, and the window is shifted one unit into the future. A new average value is calculated, and this process is repeated in a stepwise fashion across the entire time series.

Moving averages are extremely useful for reducing the amount of noise that is present in a time series. They provide a smoothing effect that makes it easier to identify general trends and patterns within the data.

The size of the fixed-length window that is used to calculate the moving average is determined by the analyst. Larger window sizes result in greater smoothing but reduce granularity, obscuring patterns that appear at lower levels within the time series. The plot below includes a few moving averages calculated using our energy consumption data to demonstrate the smoothing effect produced by increasing the moving average window size.

YEAR_PERIOD = 24*365
month_roll = duq_df.rolling(MONTH_PERIOD).mean()
midyear_roll = duq_df.rolling(MIDYEAR_PERIOD).mean()
year_roll = duq_df.rolling(YEAR_PERIOD).mean()
fig, ax = plt.subplots(figsize=(24, 10))
plt.plot(month_roll.index, month_roll['DUQ_MW'], color='red', label='30-Day MA')
plt.plot(midyear_roll.index, midyear_roll['DUQ_MW'], color='blue', label='180-Day MA')
plt.plot(year_roll.index, year_roll['DUQ_MW'], color='black', label='365-Day MA')
plt.title('Duquesne Light Energy Consumption Moving Averages')
plt.ylabel('Energy Consumption (MW) Moving Average')
Image for post
Image for post

Seasonality and Classical Seasonal Decomposition

Classical seasonal decomposition asserts that time series data can be broken apart into four separate components:

  • Base Level / Average Value — The average, stationary value of the observations
  • Trend — Increase / decrease of observations over time
  • Seasonality — A pattern in the data that repeats over a fixed period
  • Residuals — Error

For our analysis, the Base Level and Trend will be combined to form a single Trend component.

It is important to note the difference between seasonal and cyclical patterns, as both may be observed in time series data. Seasonal patterns occur with a regular, fixed period. Seasonality may be used to describe regular patterns that repeat daily, weekly, annually, etc. Cyclic patterns, on the other hand, are observed when data rises and falls with an irregular period.

When trying to determine if an observed pattern is seasonal or cyclic, first determine whether or not the pattern recurs at regular intervals. Patterns that occur at regular intervals can usually be identified by period lengths that correspond to our calendar/time-keeping system. For example, patterns that repeat daily, weekly, or annually would be considered seasonal. If the observed period of the pattern is irregular and does not seem to be tied to our calendar/time-keeping system, it is most likely a cyclical pattern. For example, stock price data often rises and falls based on business cycles, but these cycles are irregular and very difficult to predict since they are based on numerous external factors. The distance between peaks produced by these cycles is not constant, indicating a cyclical pattern.

Let’s break our energy consumption data into three components (trend, seasonality, and residuals). To do this, we will use the seasonal_decompose function that is provided as part of the statsmodels module.

Additive vs. Multiplicative Decomposition

When using the seasonal_decompose function, we will need to specify whether the decomposition should assume an additive or multiplicative model. An additive model assumes that the sum of the individual time series components (Trend + Seasonality + Residuals) is equal to the observed data. A multiplicative model assumes that the product of the individual time series components (Trend * Seasonality * Residuals) is equal to the observed data.

Typically, additive decomposition is most appropriate if there is little to no variation in the seasonality over time. Multiplicative decomposition is most appropriate when the magnitude of seasonality varies over time. Before we perform seasonal decomposition, we will look at the magnitude of some of our seasonalities by measuring the peak-to-peak amplitude of each seasonal waveform. The peak-to-peak amplitude is equal to the difference between the maximum value (peak) and minimum value (trough) within each seasonal cycle.

max_daily_vals = duq_df.groupby(['Year', 'DayOfYear']).max()['DUQ_MW'].values
min_daily_vals = duq_df.groupby(['Year', 'DayOfYear']).min()['DUQ_MW'].values
daily_amp = max_daily_vals - min_daily_vals
plt.xlabel('Day #')
plt.ylabel('Amplitude (MW)')
plt.title('Estimated Daily Amplitude')
Image for post
Image for post

We can see that the daily amplitude also follows a seasonal pattern, increasing or decreasing depending on the day of the year. In an additive model, we would expect the amplitude to remain relatively constant. Since the amplitude changes over time, we will assume that a multiplicative model best applies to the data.

from statsmodels.tsa.seasonal import seasonal_decompose
mult_decomp = seasonal_decompose(duq_df['DUQ_MW'], model='multiplicative', extrapolate_trend='freq', period=ANNUAL_PERIOD)
Image for post
Image for post


The stationarity of a time series refers to whether or not its statistical properties (mean, variance, etc.) remain constant as time progresses. In a stationary time series, statistical properties of the data do not change over time, whereas in a non-stationary time series statistical properties vary. The observed values in a stationary time series are fully independent of time. As a result, time series that exhibit trend or seasonality are non-stationary.

It is important to consider the stationarity of a time series when the intention is to develop a forecasting model. Most forecasting approaches assume the time series is either stationary or can be rendered stationary via mathematical transformation.

Augmented Dickey-Fuller (ADF) Test

There are several statistical methods that can be used to evaluate stationarity, but one of the most common methods is the Augmented Dickey-Fuller test, or ADF test. The ADF test is a type of test known as a unit root test. A unit root is a stochastic trend component within a time series. It adds a random, unpredictable pattern to the time series that is unrelated to trend or seasonality. By detecting whether or not the time series contains a unit root, we can determine whether or not the series is stationary.

The ADF test is an augmented form of the Dickey-Fuller test. The Dickey-Fuller test assumes that the time series data can be approximated by a first order autoregressive model with a white noise error. For most real-world datasets, this is an oversimplification and does not provide an adequate model. The augmented version of the Dickey-Fuller test allows for datasets that are better fit using higher order autoregressive models.

An ADF test assumes the following hypotheses:

  • Null Hypothesis — If not rejected, suggests that the time series contains a unit root and is non-stationary.
  • Alternative Hypothesis — If the Null Hypothesis is rejected, suggests that the time series does not have a unit root and is stationary.

To perform an ADF test, we will use the adfuller function from the statsmodels module. This function takes a time series as input and outputs a tuple containing the following values:

  • ADF Test Statistic
  • p-value
  • Number of Lags Used
  • Number of Observations Used
  • Critical Values (1%, 5%, and 10%)
  • Maximum Information Criterion (if # of lags is specified)

The first two items (ADF Test Statistic and p-value) are the ones we are most interested in. The p-value will help us determine whether nor not we should reject the null hypothesis. For our analysis, we will reject the null hypothesis if the p-value obtained by the ADF test is < 0.05.

The ADF test is designed to identify non-stationarity caused by the presence of a unit root within the process of a time series, but it is not designed to detect non-stationarity in other forms such as seasonality. We can see an example of this with our energy consumption data, which includes clear seasonality:

from statsmodels.tsa.stattools import adfulleradf_result = adfuller(duq_df['DUQ_MW'])print(f'ADF Statistic: {adf_result[0]}')
print(f'p-value: {adf_result[1]}')
Image for post
Image for post

The p-value obtained by the ADF test is <0.05, indicating that we should reject the Null Hypothesis and accept our Alternative Hypothesis that the time series does not contain a unit root. However, our seasonal decomposition showed that there are very clear seasonality components within our data.

ARIMA Models

ARIMA is an acronym that stands for Autoregressive Integrated Moving Average. ARIMA models offer a basic approach to time series forecasting. An ARIMA model consists of three components:

  • Autoregressive Model Term (p)
  • Moving Average Model Term (q)
  • Order of Differencing (d)

The term “integrated” that is included in the ARIMA acronym refers to the use of differencing of raw data to make the time series stationary. Let’s explore the components of an ARIMA model a bit further.

Autoregressive Models

An autoregressive model is a model that makes use of a regression developed using past (lagged) values observed from the time series. In other words, past values from the time series are used to create a regression equation that can be used to predict future values of the time series. If we define the observed value of a time series at time t as y(t), an autoregressive model developed using the immediate preceding value from the time series would be expressed by the following linear regression equation:

Image for post
Image for post

In this equation, y(t-1) represents the observed value of the time series one unit time step prior to the value we are trying to predict, A represents the linear regression coefficient, B represents the linear regression constant term (in this case the mean, or base, value of our time series), and ε represents the regression error.

Autoregressive models are defined by their order, which is the number of preceding time series values that are used to predict future values. Autoregressive models may be referred to using the abbreviated form AR(n), where n represents the model order. The autoregressive model described above is a first order model since it only incorporates one preceding value into the regression, and it may be written as AR(1).

Moving Average

A moving average model is a form of regression model that uses a linear combination of white noise series terms to forecast future values. A white noise series is a sequence of uncorrelated variables that have a constant variance and mean. A white noise series does not exhibit autocorrelation.

To understand what this means, let’s look at the linear regression equation for a first order moving average model:

Image for post
Image for post

In this equation, y(t) represents the value of the time series at time t (the value we are trying to predict), C represents a regression coefficient, D represents a regression constant (the base level of our time series), ε(t-1) represents a white noise series lagged by one time unit, and ε(t) represents a white noise series.

This regression equation says that the next value in the time series can be predicted by taking the base level of the time series and adding white noise error. The amount of white noise to be added is a linear combination of past white noise error terms with different coefficients. These coefficients essentially produce a weighted average of previous error terms to be added to the time series base level.

It should be noted that the term “moving average” has two separate meanings in time series analysis: 1.) Data smoothing, and 2.) Use of white noise errors to develop a regression for forecasting.

Moving average models can be written as MA(n), where n represents the model order, or number of previous residual values that are included in the regression. The first order moving average model illustrated above can be written as MA(1).


When defining an ARIMA model, we must also specify the degree of differencing to implement. Differencing is a technique that attempts to increase stationarity by subtracting a previous observation from the current observation. Subtracting the observation immediately preceding the current observation produces a first difference:

first_difference = observed_value(t) - observed_value(t-1) 

Differencing can also be performed with a lag. A lag of 7, for example, means that an observation 7 time units prior to our current observation should be subtracted from our current observation:

weekly_lag_difference = observed_value(t) - observed_value(t-7)

Over-differencing has a negative impact on model performance, so it is important to select the correct degree of differencing when defining an ARIMA model. When specifying the degree of differencing for our ARIMA model, we will be identifying the number of times differencing will be performed on our time series. It is important to note that this is not the same as specifying a lag period. As an example, setting the degree of differencing equal to one will produce the following result:

differenced_series = observed_value(t) - observed_value(t-1)

Setting the degree of differencing equal to two will produce the following result (which can also be described as the first difference of the first difference):

differenced_series = (observed_value(t) - observed_value(t-1)) - 
(observed_value(t-1) - observed_value(t-2))

Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF)

To determine the moving average and autoregressive model orders that are most appropriate for our data, we will need to use the Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF), respectively. To understand what these are, we will need to define some additional terms.

Autocorrelation refers to the correlation between values of a time series and lagged values of the same time series. It represents the degree of correlation between present and past values of a time series. The autocorrelation of a time series can be calculated for several different lag values and plotted. This plot is known as the Autocorrelation Function, or ACF. The ACF for our energy consumption data is shown below.

from import plot_acfplot_acf(duq_df['DUQ_MW'], lags=50)
Image for post
Image for post

The Partial Autocorrelation Function, or PACF, summarizes the relationship between present and past values within a time series but removes the effects of any potential relationships at intermediate, or “lower-order”, time steps. A typical autocorrelation includes effects from both the direct correlation between an observation at a lagged time value and the present value as well as the correlations between intermediate time values and the present value. By removing the intermediate correlations, we are left with a PACF plot that we can use to determine the order of our moving average model. The PACF for the first 100 lags of our energy consumption data is shown below:

from import plot_pacfplot_pacf(duq_df['DUQ_MW'], lags=100)
Image for post
Image for post

In order to determine the correct order for our autoregressive and moving average model terms, we will need to review our ACF and PACF plots. In general, if the PACF shows a sharp cutoff after a certain point and the lag-1 autocorrelation is positive, the order of the AR term can be determined by looking at the lag value for which the PACF cuts off. The PACF is highly significant for the first 24-hour cycle then drops off. This suggests that an AR order of 24 may be appropriate. However, use of an order this large will dramatically slow down model training time.

Addition of a moving average term to our model can be evaluated by looking at the ACF plot. If the plot demonstrates a sharp cutoff point and the lag-1 autocorrelation is negative, this suggests that a MA term should be added to the model. We do not see either of these features in our ACF plot. This suggests that a MA term may not help our model.

The AR order demonstrated by the PACF plot is deceptive. Since we did not use a differenced series, seasonality is captured in the autocorrelations. An AR(24) model will be inefficient to train, so we are better off exploring other options for incorporating seasonality into our ARIMA model.

Seasonality & ARIMA

Seasonality is not addressed directly using a standard ARIMA model. However, we can use an ARIMA model adapted to manage seasonality to model data with clear seasonal trends. The seasonal ARIMA model we will be using is known as SARIMAX, and it is included in the statsmodels module. This model can also be used to create a standard ARIMA model by setting all parameters related to seasonality to zero.

A seasonal ARIMA model is defined using each of the model components identified for a standard ARIMA model (p, d, and q) with the addition of a set of seasonal terms (P, D, Q, and s) that must also be defined. When defining a seasonal ARIMA model, the terms P, D, and Q represent the autoregressive order, order of differencing, and moving average order, respectively, of the seasonal component of the data. The lowercase versions of these parameters represent the same set of orders for the non-seasonal component of the data. The value s represents the periodicity of the seasonality.

Example Models

We will create some baseline ARIMA models using different combinations of autoregression and moving average model components. We will then use these models to forecast energy consumption for just over one day. To speed up model training for this example, we will discard the first several years of data when creating our training dataset. We will only use data collected between January 1, 2014 and August 1, 2018 for model training, and we will attempt to predict data from August 1, 2018 to August 2, 2018.

Error Metric

We will be evaluating the performance of our model using mean squared error. Mean squared error (MSE) is a measurement of the average value of the squares of the error residuals. Here is the equation that will be used to calculate the mean squared error of the forecasted values:

Image for post
Image for post

In this equation, n represents the number of observations in our test set, yi represents the forecasted value at time step i, and ŷi represents the actual value at time step i. Using this equation, we are taking the difference between the forecasted and observed values at each time step (the error), squaring it, summing the squared errors over our forecasting window, and dividing by the sum by the number of time steps in our forecasting window to calculate the average (mean) squared error.

One-Step Forecasting

Forecasts will be generated using a one-step approach. This approach forecasts the value for the next time step in the series. The forecasted value is stored in a list. Once an actual value has been observed for the next time step, this value is incorporated into the time series. The model is then retrained and used to predict the value for the next unobserved time step. This process is repeated over a user-identified window of time.

This forecasting approach is being used as an example, but some problems may require forecasting of multiple unobserved time steps simultaneously. In fact, a multistep forecasting approach may be more suited towards our dataset since a utility would likely prefer to know the forecasted consumption for a full day or longer as opposed to only one hour ahead. We will stick with this approach for now though to see how selection of model parameters impacts our forecasting accuracy.

The example code below shows the one-step forecasting approach using a simple AR(1) model. This approach will be used for a few sets of parameters to generate some baseline models for comparison.

from statsmodels.tsa.arima_model import ARIMA
from sklearn.metrics import mean_squared_error
from datetime import datetime
train_series = duq_df.loc[(duq_df.index >= datetime(2014, 1, 1)) & (duq_df.index < datetime(2018, 8, 1)), 'DUQ_MW']
test_series = duq_df.loc[(duq_df.index >= datetime(2018, 8, 1)), 'DUQ_MW']
preds = []
history = [x for x in train_series]
for t in range(len(test_series)):
model = ARIMA(history, order=(1,0,0))
model_fit =
output = model_fit.forecast()
pred_series = pd.Series(preds, index=test_series.index)error = mean_squared_error(test_series, pred_series)
print('MSE: %.3f' % error)
plt.plot(test_series, label='Observed Values')
plt.plot(pred_series, color='blue', label='Predictions')
plt.ylabel('Energy Consumption (MW)')

Model Results

The table below summarizes the results from each of our models.

Image for post
Image for post

Plots of observed values and values forecasted using standard ARIMA models are shown below.

Image for post
Image for post

Plots of observed values and values forecasted using seasonal ARIMA models are shown below.

Image for post
Image for post

The forecast results indicate that autoregressive terms appear to be more important that moving average terms in the models that were developed. This coincides with our conclusions from the ACF plot that a moving average term may not be a useful model component. However, the combination of an autoregressive term and a moving average term provides better forecasting results than either term independently. This holds true when seasonality is incorporated into the model as well.

First-order differencing improved model performance both with and without incorporation of seasonality. We can also see that the addition of 24-hour seasonality to the model improved forecasting accuracy significantly. The only exception to this trend is the moving average model. Performance of this model declined heavily following the incorporation of seasonality.

The best performing model was a SARIMA model with a seasonality of 24 and a value of 1 for all available parameters related to autoregression, moving average, and differencing. Since we have only developed baseline models to see the impacts of different model terms, we may be able to obtain even better results by testing out different combinations of parameters and exploring our data more thoroughly. We may also be able to improve the results by adopting alternative forecasting models, such as exponential smoothing or LSTMs. We will explore these alternatives in Part 2 of this article.

Thanks for Reading!

Please give this article a clap if you found it useful. Code for this article can be found in this Github repository.


Water/Wastewater Engineer ♦ Data Nerd

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store