  • Majid Khattak
  • JAN 14, 2016

Time Series in Atmospheric Sciences



This post looks at applications of time series in atmospheric and environmental sciences. It also demonstrates the use of data transformation.

The time series considered here is the Southern Oscillation Index (SOI), an index of the El Niño-Southern Oscillation (ENSO) phenomenon. The index is used by major climate research organizations in the USA and around the world. It provides a measure of fluctuations in sea-level pressure due to El Niño and La Niña events. El Niño became a household name in 1998 because of the havoc it created in weather around the world, including North America. For details, the reader may refer to the web page by the National Oceanic and Atmospheric Administration.

Effects of El Niño

A webpage by the Naval Postgraduate School summarizes the major effects of El Niño as:

  • Changes in weather patterns, such as rainfall, temperatures and storm intensities.
  • Changes in ocean currents and temperatures.

These effects lead to the following social, economic and health-related consequences:

  • Incidences of fires.
  • Floods and droughts.
  • Agricultural problems and the collapse of fisheries, leading to higher food costs and famine.
  • Political and social unrest.
  • Epidemics such as malaria and hantavirus.

El Niño also has some benefits, such as:

  • Reduced number of hurricanes and cyclones in the North Atlantic.
  • Milder winters in southern Canada and the northern continental United States.
  • Replenishment of water supplies in the southwestern U.S.
  • Fewer epidemics in some areas due to drier weather (like malaria in southeastern Africa).

Computing the Southern Oscillation Index

In atmospheric science, an anomaly has nothing to do with abnormality, as many are likely to believe. A standardized anomaly is defined as the difference between the data and its mean, divided by the standard deviation of the data. Mathematically it is given as:

\[ Z = \frac{x - \bar{x}}{sd(x)} \]

where \(Z\) is the standardized anomaly, \(x\) is the data, \(\bar{x}\) is the mean of the data and \(sd(x)\) is the standard deviation of the data. The numerator in the above equation is called an anomaly.
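The standardized anomaly defined above can be sketched in a few lines of R. The helper name `standardize` and the pressure readings are assumptions made up for illustration; they are not part of the SOI data.

```r
# Hypothetical helper: standardized anomaly of a numeric vector,
# i.e. (x - mean(x)) / sd(x).
standardize <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

# Made-up pressure readings in hPa, for illustration only.
x <- c(1010.2, 1012.5, 1009.8, 1011.1, 1013.4)
z <- standardize(x)

# By construction the result has mean 0 and standard deviation 1.
mean(z)
sd(z)

# Base R's scale() applies the same centering and scaling.
all.equal(z, as.numeric(scale(x)))
```

The anomaly itself (the numerator) is simply `x - mean(x)`; dividing by the standard deviation makes values from different stations comparable.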

The SOI values are calculated as follows. Using the above equation, the standardized anomalies of the sea-level pressure are calculated for Tahiti in the Pacific and Darwin in the Northern Territory, Australia. Then we find the difference between the values for Tahiti and Darwin. Finally, we calculate the standardized anomalies of these differenced values. The above transformation is therefore applied twice when computing the SOI.

The SOI is used to study extreme weather events, such as rainfall patterns in North America, Australia and several other regions of the world. The SOI time series is frequently used in combination with other meteorological, agricultural and health data to forecast droughts, rainfall, diseases and crop yields. This helps in managing crop cultivation and in taking measures against flooding, droughts and the associated damage. It is also used to study the spread of epidemics in several regions of the world.
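The three steps can be sketched in R as follows. The pressure series here are simulated, and the names `tahiti`, `darwin`, `standardize` and `soi_sketch` are assumptions for illustration. Operational SOI products may compute the mean and standard deviation separately for each calendar month over a fixed base period, which this minimal sketch does not attempt.

```r
set.seed(1)

# Simulated monthly sea-level pressure (hPa) for ten years; not real data.
n <- 120
tahiti <- rnorm(n, mean = 1012, sd = 1.5)
darwin <- rnorm(n, mean = 1010, sd = 2.0)

# Hypothetical helper: standardized anomaly.
standardize <- function(x) (x - mean(x)) / sd(x)

# Step 1: standardized anomalies at each station.
z_tahiti <- standardize(tahiti)
z_darwin <- standardize(darwin)

# Step 2: Tahiti minus Darwin.
d <- z_tahiti - z_darwin

# Step 3: standardize the difference (the transformation applied a second time).
soi_sketch <- standardize(d)

summary(soi_sketch)
```

Because the last step is itself a standardization, the resulting index has mean 0 and standard deviation 1 over the period used.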

Exploratory Analysis

Here I perform an exploratory analysis of the Southern Oscillation Index. The data was obtained from the website of the National Oceanic and Atmospheric Administration.

# Read the SOI data from a local CSV file.
soi <- read.csv("data.csv")
# The data is a data frame.
class(soi)
## [1] "data.frame"
# Look at the first few rows: Date is in YYYYMM format.
head(soi)
##     Date Value
## 1 195101   1.5
## 2 195102   0.9
## 3 195103  -0.1
## 4 195104  -0.3
## 5 195105  -0.7
## 6 195106   0.2

When dealing with a time series, it is recommended to convert the data to a “ts” object.

# Monthly series starting in January 1951.
soi.ts <- ts(soi[,2], start = c(1951,1), frequency = 12)
class(soi.ts)
## [1] "ts"
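As a small aside, the start of the series can be derived from the YYYYMM `Date` column rather than typed in by hand. The sketch below rebuilds the first three rows shown above in a stand-in data frame; the names `soi_demo` and `demo.ts` are assumptions for illustration.

```r
# Stand-in for the first rows of the SOI data frame shown above.
soi_demo <- data.frame(Date  = c(195101, 195102, 195103),
                       Value = c(1.5, 0.9, -0.1))

# Split the first YYYYMM value into year and month.
first <- soi_demo$Date[1]
start_year  <- first %/% 100  # integer division
start_month <- first %% 100   # remainder

demo.ts <- ts(soi_demo$Value,
              start = c(start_year, start_month),
              frequency = 12)
start(demo.ts)
```

This assumes the rows are in chronological order with no missing months, since `ts()` only records the start and the frequency.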

Plotting the time series.

plot(soi.ts, main = "Southern Oscillation Index", xlab = "Year", ylab = "Southern Oscillation Index", type="l")

Since El Niño usually peaks at the end of the calendar year, I will also plot the January values for all the years using R’s “window()” function.

# Extract the January values of the Southern Oscillation Index.
# frequency = 1 keeps one value per year, starting from January 1951.
soi.jan <- window(soi.ts, start = c(1951,1), frequency = 1)
plot(soi.jan, main = "Southern Oscillation Index for January", xlab = "Year", ylab = "SOI for January")
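One way to check the January extraction is to compare it with a selection based on “cycle()”, which returns the position of each observation within the year. The sketch below uses a simulated series in place of `soi.ts`; the names `demo.ts`, `jan_window` and `jan_cycle` are assumptions for illustration.

```r
# Simulated three-year monthly series standing in for soi.ts.
set.seed(42)
demo.ts <- ts(rnorm(36), start = c(1951, 1), frequency = 12)

# One value per year, starting from January 1951.
jan_window <- window(demo.ts, start = c(1951, 1), frequency = 1)

# The same values picked by month position (1 = January).
jan_cycle <- demo.ts[cycle(demo.ts) == 1]

all.equal(as.numeric(jan_window), jan_cycle)
```

The `cycle()` form returns a plain numeric vector, while `window()` keeps the result as a yearly “ts” object, which is convenient for plotting against the year.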



In the figures above you can spot the large negative value corresponding to the 1982/83 El Niño, which occurred in December/January 1982/83. There is also a relatively large reading for 1998. The large negative values are a consequence of the eastward movement of precipitation activity from near Darwin to near Tahiti, with higher than normal pressure at Darwin and lower than normal pressure at Tahiti. There are several other measures of ENSO, such as the Oceanic Niño Index (ONI), which is used by the US National Oceanic and Atmospheric Administration.

For a general review of the applications of statistical and machine learning methods, please refer to “Statistical Methods in the Atmospheric Sciences” by Daniel Wilks. For readers wishing to delve deeper into the topic, there is an excellent graduate-level text, “The El Niño-Southern Oscillation Phenomenon” by Sarachik and Cane. It contains an overview of the theory in addition to the forecasting methods generally used. I have not read the latter, but it is on my reading list for 2016.
