Merton’s Structural Model and Extensions

Pioneered by Merton (1974) and Black and Scholes (1973), the structural (or asset value) model is one of the two primary classes of credit risk modeling approaches (the other being the reduced-form model). It assumes that at time t a firm with risky assets A_{t} is financed by equity E_{t} and zero-coupon debt D_{t} of face value K maturing at time T>t: A_{t}=E_{t}+D_{t}.

When the firm’s assets are worth more than its debt at time T, A_{T}\geqslant K, the debt holders are paid the full amount K and the shareholders’ equity is \left(A_{T}-K\right). On the other hand, when the firm fails to repay (and therefore defaults on) the debt at T, the debt holders can only recover A_{T}<K and the shareholders get nothing. The equity value at time T can therefore be represented as a European call option on the assets A_{t} with strike price K maturing at T: E_{T}=\max\left(A_{T}-K,0\right). The asset value is assumed to follow a geometric Brownian motion, with risk-neutral dynamics given by

(1)   \begin{equation*} dA_{t}=rA_{t}dt+\sigma_{A}A_{t}dW_{t} \end{equation*}

where r denotes the risk-free interest rate, \sigma_{A} is the volatility of the asset returns, and W_{t} is a Brownian motion under the risk-neutral measure. Applying the Black-Scholes formula gives

    \[ E_{t}=A_{t}\Phi\left(d_{1}\right)-Ke^{-r\left(T-t\right)}\Phi\left(d_{2}\right) \]

where d_{1}=\frac{1}{\sigma_{A}\sqrt{T-t}}\left[\ln\left(\frac{A_{t}}{K}\right)+\left(r+\frac{\sigma_{A}^{2}}{2}\right)\left(T-t\right)\right], d_{2} = d_{1}-\sigma_{A}\sqrt{T-t}, and \Phi\left(\cdot\right) denotes the standard normal \textit{cdf}. The risk-neutral probability of default at time T is given by \textrm{P}\left(A_{T}<K\right)=\Phi\left(-d_{2}\right).

A typical strategy for debt holders to protect themselves from credit risk is to buy a put option P_{t} on A_{t} with strike K maturing at T. The put option is worth \left(K-A_{T}\right) if A_{T}<K and nothing if A_{T}\geqslant K. Purchasing the put option hedges the credit risk of the loan completely, since the debt holder’s payoff equals K at maturity whether or not the obligor defaults. It therefore forms a risk-free position

(2)   \begin{equation*} D_{t}+P_{t}=Ke^{-r\left(T-t\right)}. \end{equation*}

The price of the put option P_{t} is determined by applying the Black-Scholes formula:

(3)   \begin{equation*} P_{t}=Ke^{-r\left(T-t\right)}\Phi\left(-d_{2}\right)-A_{t}\Phi\left(-d_{1}\right). \end{equation*}

Taking into account the credit spread (risk premium) s, the value of the risky bond is

(4)   \begin{equation*} D_{t}=Ke^{-\left(r+s\right)\left(T-t\right)}. \end{equation*}

Combining Eqs. (2)-(4) gives a closed-form formula for the credit spread

    \[ s=-\frac{1}{T-t}\ln\left[\Phi\left(d_{2}\right)+\frac{A_{t}}{K}e^{r\left(T-t\right)}\Phi\left(-d_{1}\right)\right] \]

where \frac{A_{t}}{K} is the inverse of the firm’s leverage ratio (face value of debt over asset value). Note that s depends only on the leverage, the asset volatility \sigma_{A}, the risk-free rate and the time to maturity, which is in line with economic intuition. Their nonlinear relationship can be observed in the figures below.
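For concreteness, here is a small base-R sketch that evaluates the formulas above with illustrative (not calibrated) inputs: the equity value, the risk-neutral default probability and the credit spread.

```r
# Merton model: equity value, risk-neutral default probability and credit spread
# Illustrative inputs (assumptions, not calibrated to any firm)
A     <- 100    # current asset value A_t
K     <- 80     # face value of the zero-coupon debt
r     <- 0.03   # risk-free rate
sigma <- 0.25   # asset volatility sigma_A
tau   <- 2      # time to maturity T - t

d1 <- (log(A / K) + (r + 0.5 * sigma^2) * tau) / (sigma * sqrt(tau))
d2 <- d1 - sigma * sqrt(tau)

E  <- A * pnorm(d1) - K * exp(-r * tau) * pnorm(d2)       # equity = call on assets
PD <- pnorm(-d2)                                          # risk-neutral P(A_T < K)
P  <- K * exp(-r * tau) * pnorm(-d2) - A * pnorm(-d1)     # put protecting the debt
D  <- K * exp(-r * tau) - P                               # value of the risky debt
s  <- -log(pnorm(d2) + (A / K) * exp(r * tau) * pnorm(-d1)) / tau   # credit spread

c(equity = E, default_prob = PD, debt = D, spread = s)
```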

Many approaches have been proposed to improve the classical Merton model. The first passage model introduced by Black and Cox (1976) allows the firm to default at any time before the debt matures. Jones et al. (1984) suggest introducing stochastic interest rates to improve the model’s performance. Longstaff and Schwartz (1995) employ a Vasicek process for the interest rate, dr_{t}=\left(a-br_{t}\right)dt+\sigma_{r}dW_{t}^{\left(r\right)}, Kim et al. (1993) consider a CIR process, dr_{t}=\left(a-br_{t}\right)dt+\sigma_{r}\sqrt{r_{t}}dW_{t}^{\left(r\right)}, and Briys and De Varenne (1997) let the interest rate follow a generalized Vasicek process, dr_{t}=\left(a\left(t\right)-b\left(t\right)r_{t}\right)dt+\sigma_{r}\left(t\right)dW_{t}^{\left(r\right)}. Comparing the Merton model with four of its extensions, Eom et al. (2004) find substantial spread prediction errors: some models underestimate the spreads observed in the market while others overestimate them.


  • Black, F., Cox, J. C., 1976. Valuing Corporate Securities: Some Effects of Bond Indenture Provisions. Journal of Finance 31, 351-367.
  • Black, F., Scholes, M., 1973. The Pricing of Options and Corporate Liabilities. Journal of Political Economy 81, 637-654.
  • Briys, E., De Varenne, F., 1997. Valuing Risky Fixed Rate Debt: An Extension. Journal of Financial and Quantitative Analysis 32 (2).
  • Eom, Y., Helwege, J., Huang, J., 2004. Structural Models of Corporate Bond Pricing: An Empirical Analysis. Review of Financial Studies 17 (2), 499-544.
  • Jones, E., Mason, S., Rosenfeld, E., 1984. Contingent Claims Analysis of Corporate Capital Structures. Journal of Finance 39 (3), 611-625.
  • Kim, I. J., Ramaswamy, K., Sundaresan, S., 1993. Does Default Risk in Coupons Affect the Valuation of Corporate Bonds?: A Contingent Claims Model. Financial Management, 117-131.
  • Longstaff, F. A., Schwartz, E. S., 1995. A Simple Approach to Valuing Risky Fixed and Floating Rate Debt. Journal of Finance 50, 789-819.
  • Merton, R. C., 1974. On the Pricing of Corporate Debt: The Risk Structure of Interest Rates. Journal of Finance 29 (2), 449-470.

Risk Measures

Consider assets whose returns follow different distributions but share the same mean and volatility. Standard mean-variance analysis indicates that all these assets are equally risky. In real markets, however, participants view the risk in them differently.

In practice, the problem of risk comparison is difficult because the underlying distribution of market prices and returns of various assets is unknown. One can try

  • to identify the distribution by maximum likelihood methods
  • to test the fitted distribution against other distributions using methods such as the Kolmogorov-Smirnov test

Practically, it is impossible to accurately identify the distribution of financial returns.

The most common approach to the problem of comparing the risk of assets having different distributions is to employ a risk measure that represents the risk of an asset as a single number that is comparable across assets.

Three risk measures are commonly used: volatility, Value-at-Risk (VaR) and Expected Shortfall (ES).

Volatility

Volatility is sufficient as a risk measure only when financial returns are normally distributed.

Value-at-Risk (VaR)

VaR is a single summary statistical measure of risk. It is distribution independent.

The three steps in VaR calculations:

  1. Specify the probability, p, of losses exceeding VaR: 1% is the most common choice; 0.1% is used for applications such as economic capital or long-run risk analysis for pension funds.
  2. Specify the holding period: usually one day.
  3. Specify the probability distribution of the P/L of the portfolio: usually by combining past observations with a statistical model.

There are three main issues that arise in the implementation of VaR:

  • VaR is only a quantile of the P/L distribution.
  • VaR is not a coherent risk measure: it is not subadditive in general, although it is subadditive in the special case of normally distributed returns.
  • VaR is easy to manipulate.

Expected Shortfall

Expected Shortfall is also known as tail VaR or conditional VaR (CVaR). It measures the expected loss conditional on losses exceeding VaR.

The ES is the negative of the expected P/L conditional on the loss exceeding VaR; equivalently, in terms of the tail (conditional) density f_{\textrm{VaR}}\left(\cdot\right),

    \begin{eqnarray*} \textrm{ES} & = & -E\left[Q\mid Q\leqslant-\textrm{VaR}\left(p\right)\right]\\ & = & -\int_{-\infty}^{-\textrm{VaR}\left(p\right)}xf_{\textrm{VaR}}\left(x\right)dx \end{eqnarray*}

If the P/L distribution is standard normal, then

    \[ \textrm{ES}=\frac{\phi\left(\Phi^{-1}\left(p\right)\right)}{p} \]

where \phi and \Phi are the normal density and distribution respectively.

Here is a sample R code
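(A minimal sketch: it assumes a standard normal P/L with p = 0.01 and uses simulated returns for the historical-simulation version.)

```r
# VaR and ES for p = 1% (illustrative probability)
p <- 0.01

# Closed-form values for a standard normal P/L
VaR_normal <- -qnorm(p)                    # about 2.33
ES_normal  <- dnorm(qnorm(p)) / p          # phi(Phi^{-1}(p)) / p, about 2.67

# Historical-simulation equivalents from a (here simulated) sample of P/L
set.seed(1)
y <- rnorm(100000)                         # stand-in for observed P/L
VaR_hs <- -quantile(y, p)                  # negative of the empirical p-quantile
ES_hs  <- -mean(y[y <= quantile(y, p)])    # average loss beyond VaR

c(VaR_normal = VaR_normal, ES_normal = ES_normal, VaR_hs = VaR_hs, ES_hs = ES_hs)
```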

Advantages of using ES:

  1. Any bank that has a VaR-based risk management system could implement ES without much additional effort.
  2. ES is subadditive while VaR is not.

However, in practice the vast majority of financial institutions employ VaR and not ES. The reasons may be:

  1. ES is measured with more uncertainty than VaR. The first step in ES estimation is ascertaining the VaR and the second step is obtaining the expectation of tail observations. This means that there are at least two sources of error in ES.
  2. More importantly, ES is much harder to backtest than VaR because the ES procedure requires estimates of the tail expectation to compare with the ES forecast. Therefore, in backtesting, ES can only be compared with the output from a model while VaR can be compared with actual observations.

Holding Periods

In practice, the most common holding period is one day, but many other holding periods are also employed: e.g. an hourly (or 10/20-minute) 90% VaR may be used on the trading floor.

The Basel Accords require financial institutions to model risk using a 10-day holding period. The majority of risk managers employ scaling laws to obtain such risk levels.

Square-root-of-time scaling \sqrt{T}

It supposes the observed random variables \left\{X_{t}\right\} are IID over time with variance \sigma^2. The variance of the sum of T consecutive Xs is then

    \[ \textrm{Var}\left(X_{t+1}+X_{t+2}+\ldots+X_{t+T}\right)=\textrm{Var}\left(X_{t+1}\right)+\textrm{Var}\left(X_{t+2}\right)+\ldots+\textrm{Var}\left(X_{t+T}\right)=T\sigma^{2} \]

This implies that volatility scales up by \sqrt{T}.

The square-root-of-time rule does not apply to VaR unless the returns are IID normal. Multi-day VaR forecasts should therefore not be obtained by scaling up daily VaR by \sqrt{T}, even though the 1996 amendment to the Basel Accord explicitly recommends doing so.
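As a quick illustration, the snippet below scales a one-day normal VaR to ten days under the IID normal assumption; the daily volatility and probability are illustrative.

```r
# Square-root-of-time scaling under the IID normal assumption (illustrative values)
sigma_daily <- 0.01                      # assumed daily volatility
p <- 0.01
VaR_1day  <- -qnorm(p) * sigma_daily     # one-day normal VaR
VaR_10day <- sqrt(10) * VaR_1day         # valid only if returns are IID normal
c(VaR_1day = VaR_1day, VaR_10day = VaR_10day)
```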

Multivariate Volatility Models

Most applications deal with portfolios where it is necessary to forecast the entire covariance matrix of asset returns.

Consider the univariate volatility model:

    \[ Y_{t} = \sigma_{t} Z_{t} \]

where Y_{t} are returns; \sigma_{t} is conditional volatility, and Z_{t} are random shocks.


The multivariate form of EWMA is

    \[ \hat{\Sigma}_{t}=\lambda\hat{\Sigma}_{t-1}+\left(1-\lambda\right)y_{t-1}^{\prime}y_{t-1} \]

with an individual element given by

    \[ \hat{\sigma}_{t,ij}=\lambda\hat{\sigma}_{t-1,ij}+\left(1-\lambda\right)y_{t-1,i}y_{t-1,j}\quad i,j=1,\ldots,K \]

where \lambda = 0.94 as per RiskMetrics.

A sample R code for EWMA is
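(This is a minimal sketch: it assumes a T\times K return matrix y, uses \lambda = 0.94 and initialises the recursion with the sample covariance.)

```r
# Multivariate EWMA covariance forecast (minimal sketch)
ewma_cov <- function(y, lambda = 0.94) {
  Sigma <- cov(y)                                   # initial value: sample covariance
  for (i in 1:nrow(y)) {
    Sigma <- lambda * Sigma + (1 - lambda) * y[i, ] %o% y[i, ]
  }
  Sigma                                             # one-step-ahead covariance forecast
}

# Example with simulated two-asset returns
set.seed(1)
y <- matrix(rnorm(2000, sd = 0.01), ncol = 2)
ewma_cov(y)
```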

Orthogonal GARCH (OGARCH)

It is usually very hard to estimate multivariate GARCH models. In practice, alternative methodologies for obtaining the covariance matrix are needed.

The orthogonal approach transforms linearly the observed returns matrix into a set of portfolios with the key property that they are uncorrelated, implying we can forecast their volatilities separately. This makes use of principal components analysis (PCA).

Orthogonalising covariance

The first step is to transform the return matrix y^{\left\{T\times K\right\}} into a matrix of uncorrelated portfolios u^{\left\{T\times K\right\}}. Denote by \hat{R}^{\left\{K\times K\right\}} the sample correlation matrix of y^{\left\{T\times K\right\}}, and by \Lambda^{\left\{K\times K\right\}} the orthogonal matrix whose columns are the eigenvectors of \hat{R}^{\left\{K\times K\right\}}. Then u^{\left\{T\times K\right\}} is defined by:

    \[ u^{\left\{T\times K\right\}}=y^{\left\{T\times K\right\}}\Lambda^{\left\{K\times K\right\}}. \]

The columns of u^{\left\{T\times K\right\}} are uncorrelated with each other, so we can run a univariate GARCH or a similar model on each column of u^{\left\{T\times K\right\}} separately to obtain its conditional variance forecast; these are collected in the diagonal matrix \hat{D}_{t}. We then obtain the forecast of the conditional covariance matrix of the returns by:

    \[ \hat{\Sigma}_{t}=\Lambda \hat{D}_{t} \Lambda^{\prime}. \]

This implies that the covariance terms can be ignored when modeling the covariance matrix of u, and the problem has been reduced to a series of univariate estimations.
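A minimal base-R sketch of the orthogonal approach follows; for simplicity it applies PCA to the sample covariance matrix (rather than the correlation matrix) and uses a univariate EWMA in place of GARCH for each component, both simplifying assumptions for illustration.

```r
# Orthogonal approach (sketch): PCA, then a univariate volatility model per component
set.seed(1)
y <- matrix(rnorm(3000, sd = 0.01), ncol = 3)     # stand-in for a T x K return matrix

Lambda <- eigen(cov(y))$vectors                   # eigenvectors (PCA on the covariance)
u      <- y %*% Lambda                            # uncorrelated principal-component portfolios

# Conditional variance of each component, here via EWMA for simplicity
lambda <- 0.94
D <- numeric(ncol(u))
for (k in 1:ncol(u)) {
  s2 <- var(u[, k])
  for (i in 1:nrow(u)) s2 <- lambda * s2 + (1 - lambda) * u[i, k]^2
  D[k] <- s2
}

Sigma_hat <- Lambda %*% diag(D) %*% t(Lambda)     # conditional covariance of the returns
Sigma_hat
```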

Large-scale implementations

In the above example, all the principal components (PCs) were used to construct the conditional covariance matrix. However, it is possible to use just a few of the columns. The highest eigenvalue corresponds to the most important principal component, the one that explains most of the variation in the data.

Such approaches are in widespread use because it is possible to construct the conditional covariance matrix for a very large number of assets. In a highly correlated environment, just a few principal components are required to represent system variation to a very high degree of accuracy. This is much easier than forecasting all volatilities directly in one go.

PCA also facilitates building a covariance matrix for an entire financial institution by iteratively combining the covariance matrices of the various trading desks, simply by using one or perhaps two principal components. For example, one can create the covariance matrices of small caps and large caps separately and use the first principal component to combine them into the covariance matrix of all equities. This can then be combined with the covariance matrix for fixed income assets, etc.

Correlation Models

Constant conditional correlations (CCC)

Bollerslev (1990) proposes the constant conditional correlations (CCC) model, where the time-varying covariances are proportional to the product of the conditional standard deviations. The conditional covariance matrix \hat{\Sigma}_{t} consists of two components that are estimated separately: the sample correlation matrix \hat{R} and the diagonal matrix of time-varying volatilities \hat{D}_{t}:

    \[ \hat{\Sigma}_{t} = \hat{D}_{t} \hat{R} \hat{D}_{t} \]


    \[ \hat{D}_{t}=\left(\begin{array}{ccc} \hat{\sigma}_{t,1} & 0 & 0\\ 0 & \ddots & 0\\ 0 & 0 & \hat{\sigma}_{t,K} \end{array}\right). \]

The volatility of each asset \hat{\sigma}_{t,k} follows a GARCH process or any of the univariate models discussed here.

This model guarantees the positive definiteness of \hat{\Sigma}_{t} if \hat{R} is positive definite.
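A minimal sketch of the CCC construction; for illustration the univariate volatilities come from EWMA rather than GARCH, and the data are simulated.

```r
# CCC construction: Sigma_t = D_t R D_t (sketch)
set.seed(1)
y <- matrix(rnorm(2000, sd = 0.01), ncol = 2)

R_hat <- cor(y)                               # constant conditional correlation matrix

# Time-varying volatilities, here from univariate EWMA (a GARCH model could be used instead)
lambda <- 0.94
sigma_t <- apply(y, 2, function(x) {
  s2 <- var(x)
  for (i in seq_along(x)) s2 <- lambda * s2 + (1 - lambda) * x[i]^2
  sqrt(s2)
})

D_t     <- diag(sigma_t)
Sigma_t <- D_t %*% R_hat %*% D_t              # conditional covariance matrix
Sigma_t
```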

Dynamic conditional correlations (DCC)

In particular, the assumption of correlations being constant over time is at odds with the vast amount of empirical evidence supporting nonlinear dependence. To correct this defect, Engle (2002) and Tse and Tsui (2002) propose the dynamic conditional correlations (DCC) model as an extension to the CCC model.

Unlike the CCC model, the DCC framework lets the correlation matrix be time dependent; it is obtained by rescaling a quasi-correlation matrix \hat{Q}_{t} so that it has unit diagonal:

    \[ \hat{R}_{t} = \textrm{diag}\left(\hat{Q}_{t}\right)^{-1/2}\hat{Q}_{t}\,\textrm{diag}\left(\hat{Q}_{t}\right)^{-1/2} \]

where \hat{Q}_{t} is a symmetric positive definite autoregressive matrix and is given by

    \[ \hat{Q}_{t}=\left(1-\zeta-\xi\right)\bar{Q}+\zeta Y_{t-1}^{\prime}Y_{t-1}+\xi\hat{Q}_{t-1} \]

where \bar{Q} is the K\times K unconditional covariance matrix of Y; \zeta,\xi > 0 and \zeta + \xi <1 to ensure positive definiteness and stationarity, respectively.

  • Pros: it can be estimated in two steps: one for parameters determining univariate volatilities and another for parameters determining the correlations.
  • Cons: parameters \zeta and \xi are constants implying that the conditional correlations of all assets are driven by the same underlying dynamics — often an unrealistic assumption.
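To make the recursion concrete, the sketch below performs one step of the DCC update and rescales \hat{Q}_{t} into a correlation matrix; the parameter values and inputs are illustrative, and in practice devolatilized (standardized) returns are used in place of raw returns.

```r
# One step of the DCC correlation recursion (sketch, illustrative parameters)
dcc_step <- function(Q_prev, z_prev, Q_bar, zeta = 0.05, xi = 0.90) {
  Q <- (1 - zeta - xi) * Q_bar + zeta * (z_prev %o% z_prev) + xi * Q_prev
  d <- 1 / sqrt(diag(Q))
  R <- diag(d) %*% Q %*% diag(d)              # rescale Q_t into a correlation matrix
  list(Q = Q, R = R)
}

# Two assets; Q_bar is an assumed unconditional (quasi-)correlation matrix and
# z_prev a vector of devolatilized returns at t-1
Q_bar <- matrix(c(1, 0.5, 0.5, 1), 2, 2)
step  <- dcc_step(Q_prev = Q_bar, z_prev = c(0.8, -1.2), Q_bar = Q_bar)
step$R
```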

Comparing the correlations estimated by the three models above (EWMA, OGARCH and DCC), the EWMA correlation forecasts are typically the most volatile. Both the DCC and OGARCH models produce more stable correlations, with OGARCH having the lowest fluctuations but the highest average correlations. The large swings in the EWMA correlations may be an overreaction.

Multivariate Extensions of GARCH

It is conceptually straightforward to develop multivariate extensions of the univariate GARCH-type models — such as multivariate GARCH (MVGARCH). Unfortunately, it is more difficult in practice because the most obvious model extensions result in the number of parameters exploding as the number of assets increases.

The BEKK model

There are a number of alternative MVGARCH models available, but the BEKK model, proposed by Engle and Kroner (1995), is probably the most widely used. It specifies the matrix of conditional covariances directly.

The general BEKK \left(L_{1},L_{2},K\right) model is given by

    \[ \Sigma_{t}=\Omega\Omega^{\prime}+\sum_{k=1}^{K}\sum_{i=1}^{L_{1}}A_{i,k}^{\prime}Y_{t-i}^{\prime}Y_{t-i}A_{i,k}+\sum_{k=1}^{K}\sum_{j=1}^{L_{2}}B_{j,k}^{\prime}\Sigma_{t-j}B_{j,k} \]

The number of parameters in the BEKK(1,1,K) model is K\left(5K+1\right)/2, i.e. 11 in the 2-asset case. For K=2:

    \begin{eqnarray*} \Sigma_{t} & = & \left(\begin{array}{cc} \sigma_{t,11} & \sigma_{t,12}\\ \sigma_{t,12} & \sigma_{t,22} \end{array}\right)\\  & = & \underbrace{\left(\begin{array}{cc} \omega_{11} & 0\\ \omega_{21} & \omega_{22} \end{array}\right)}_{\Omega}\underbrace{\left(\begin{array}{cc} \omega_{11} & 0\\ \omega_{21} & \omega_{22} \end{array}\right)^{\prime}}_{\Omega^{\prime}}+\underbrace{\left(\begin{array}{cc} \alpha_{11} & \alpha_{12}\\ \alpha_{21} & \alpha_{22} \end{array}\right)^{\prime}}_{A^{\prime}}\underbrace{\left(\begin{array}{cc} Y_{t-1,1}^{2} & Y_{t-1,1}Y_{t-1,2}\\ Y_{t-1,2}Y_{t-1,1} & Y_{t-1,2}^{2} \end{array}\right)}_{Y_{t-1}^{\prime}Y_{t-1}}\underbrace{\left(\begin{array}{cc} \alpha_{11} & \alpha_{12}\\ \alpha_{21} & \alpha_{22} \end{array}\right)}_{A}\\  &  & +\underbrace{\left(\begin{array}{cc} \beta_{11} & \beta_{12}\\ \beta_{21} & \beta_{22} \end{array}\right)^{\prime}}_{B^{\prime}}\underbrace{\left(\begin{array}{cc} \sigma_{t-1,11} & \sigma_{t-1,12}\\ \sigma_{t-1,21} & \sigma_{t-1,22} \end{array}\right)}_{\Sigma_{t-1}}\underbrace{\left(\begin{array}{cc} \beta_{11} & \beta_{12}\\ \beta_{21} & \beta_{22} \end{array}\right)}_{B} \end{eqnarray*}

where \omega, \alpha and \beta are coefficients. The basic ideas behind the BEKK and DCC models are similar: volatilities and correlations depend on their own past realisations and on shocks from squared asset returns.

  • Cons: too many parameters, which can be hard to interpret. Furthermore, many parameters are often found to be statistically insignificant, which suggests the model may be overparameterized.
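To make the recursion concrete, the function below computes one step of a two-asset BEKK(1,1) covariance update; the parameter matrices are illustrative, not estimated.

```r
# One step of the two-asset BEKK(1,1) covariance recursion (sketch)
bekk_step <- function(Sigma_prev, y_prev, Omega, A, B) {
  Omega %*% t(Omega) +
    t(A) %*% (y_prev %o% y_prev) %*% A +
    t(B) %*% Sigma_prev %*% B
}

# Illustrative (not estimated) parameter matrices
Omega <- matrix(c(0.01, 0.002, 0, 0.01), 2, 2)     # lower triangular
A     <- diag(c(0.25, 0.25))
B     <- diag(c(0.95, 0.95))
Sigma_prev <- matrix(c(1e-4, 5e-5, 5e-5, 1e-4), 2, 2)

bekk_step(Sigma_prev, y_prev = c(0.01, -0.02), Omega, A, B)
```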

Extreme Value Theory

Analogous to the central limit theorem, where the normal distribution acts as the limit for the distribution of the mean of a large number of i.i.d. random variables, extreme value theory (EVT) investigates the limit distribution of the sample maximum.

Empirical models of financial returns based on distributional assumptions such as the Gaussian, Student’s t and GED are often chosen for their ability to fit data near the mode, given that, by definition, only a few observations fall in the distribution tails. But effective risk management requires accurate estimation of the likelihood of rare events that could trigger catastrophic losses. Extreme value theory can be useful for this purpose because it is specifically aimed at modelling tail behaviour without requiring assumptions about the entire distribution, i.e. it provides a semi-parametric model for the tails of distribution functions.

Pros: much more accurate for applications focusing on the extremes.
Cons: by definition there are only a few extreme observations, so estimates are based on limited data.

EVT can be useful to explicitly identify the type of asymmetry in the extreme tails.

Regardless of the overall shape of the distribution, the tails of all distributions fall into one of three categories as long as the distribution of an asset return series does not change over time:

  • Weibull: Thin tails where the distribution has a finite endpoint
  • Gumbel: Tails decline exponentially
  • Frechet: Tails decline by a power law

Block maxima and peaks-over-threshold are the two main EVT modeling methodologies.

Generalized extreme value distribution

Let \{x_{t}\},\: t=1,\ldots,T, denote an iid process with distribution F\left(x\right). The maximum of a block of n<T observations, called the block maximum and denoted M_{n}=\max\left(x_{1},\ldots,x_{n}\right), follows asymptotically the probability distribution

    \begin{equation*} \textrm{P}\left[\frac{M_{n}-b_{n}}{a_{n}}\leqslant y\right]=F^{n}\left(a_{n}y+b_{n}\right)\rightarrow G\left(y\right),\qquad n\rightarrow+\infty \end{equation*}

as n\rightarrow+\infty for all y\in\mathbb{R}, where a_{n}>0 and b_{n} are appropriate constants, F^{n}\left(\cdot\right) is F\left(\cdot\right) raised to the power n, and G\left(\cdot\right) is a non-degenerate distribution function. According to the Extremal Types Theorem, the block maxima distribution G\left(\cdot\right) must be either Frechet, negative Weibull or Gumbel; these three distributions can be cast as members of the Generalized Extreme Value (GEV) distribution with cdf given by

    \begin{equation*} G\left(y\right)=\begin{cases} \exp\left\{ -\left(1+\xi\frac{y-\mu}{\beta}\right)^{-1/\xi}\right\} & \quad\xi\neq0\\ \exp\left\{ -e^{-\frac{y-\mu}{\beta}}\right\} & \quad\xi=0 \end{cases}, \end{equation*}

where \mu,\:\beta>0 and \xi are location, scale and shape parameters, respectively.

The GEV becomes the Frechet distribution for \xi>0, the negative Weibull distribution for \xi<0, and the Gumbel distribution for \xi=0.

Generalized Pareto distribution

Let \{x_{t}-u\}\: t=1,..,T, denote the exceedances or peaks-over-threshold process where x_{t}>u and u denotes a threshold loss. The exceedances distribution can be formalized as

    \[ \Pr\left[x_{t}-u\leqslant y\mid x_{t}>u\right]=\frac{F\left(y+u\right)-F\left(u\right)}{1-F\left(u\right)}\rightarrow H\left(y\right),\quad t=1,\ldots,T. \]

According to the Pickands-Balkema-de-Haan Theorem, for a sufficiently large threshold loss u, the exceedances distribution can be approximated by the Generalized Pareto Distribution (GPD) as

    \begin{equation*} H\left(y\right)=\begin{cases} 1-\left(1+\xi\frac{y}{\beta}\right)^{-1/\xi} & \quad\xi\neq0\\ 1-\exp\left\{ -\frac{y}{\beta}\right\} & \quad\xi=0 \end{cases}, \end{equation*}

where \beta>0 and \xi are scale and shape parameters, respectively. GPD nests the exponential distribution (\xi=0), the heavy-tailed Pareto Type I distribution (\xi>0) and the short-tailed Pareto Type II distribution (\xi<0).

The parameters of GPD are estimated by maximizing the corresponding log-likelihood function

    \begin{eqnarray*} \ln\mathfrak{L}(y_{1},\ldots,y_{N_{u}};\beta,\xi) & = & \sum_{j=1}^{N_{u}}\ln h\left(y_{j};\beta,\xi\right)\\  & = & -N_{u}\ln\beta-\left(1+\frac{1}{\xi}\right)\sum_{j=1}^{N_{u}}\ln\left(1+\xi\frac{y_{j}}{\beta}\right) \end{eqnarray*}

where N_{u} is the total number of observed exceedances y_{j}\equiv x_{j}-u for given threshold u.
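The maximization can be done with a short base-R routine; the data and threshold below are illustrative (simulated Student-t losses with a 95% threshold).

```r
# Fit a GPD to exceedances over a threshold by maximum likelihood (sketch)
gpd_loglik <- function(par, y) {
  beta <- par[1]; xi <- par[2]
  if (beta <= 0 || any(1 + xi * y / beta <= 0)) return(-1e10)   # penalise invalid values
  -length(y) * log(beta) - (1 + 1 / xi) * sum(log(1 + xi * y / beta))
}

set.seed(1)
x <- rt(5000, df = 4)                 # heavy-tailed stand-in for losses
u <- quantile(x, 0.95)                # illustrative threshold
y <- x[x > u] - u                     # exceedances over the threshold

fit <- optim(c(1, 0.1), gpd_loglik, y = y, control = list(fnscale = -1))
fit$par                               # estimates of (beta, xi)
```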

Hill Method

Alternatively, one can use the Hill method to estimate the tail index.
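A minimal sketch of the Hill estimator, using the k largest observations of a simulated heavy-tailed sample (k is illustrative):

```r
# Hill estimator of the tail index (sketch)
hill <- function(x, k) {
  xs     <- sort(x, decreasing = TRUE)               # descending order statistics
  xi_hat <- mean(log(xs[1:k])) - log(xs[k + 1])      # Hill estimate of xi
  c(xi = xi_hat, alpha = 1 / xi_hat)                 # alpha = 1/xi is the tail index
}

set.seed(1)
x <- abs(rt(5000, df = 4))       # heavy-tailed stand-in for losses (true alpha is about 4)
hill(x, k = 100)                 # k, the number of tail observations, is illustrative
```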

Finding the threshold

Several methods have been proposed to determine the optimal threshold.

  1. The most common approach is the eyeball method, where we look for a region in which the tail index estimates appear stable.
  2. More formal methods are based on minimizing the mean squared error (MSE) of the Hill estimator.

Univariate Volatility Modelling

A key modeling difficulty is that market volatility is not directly observable — unlike market prices it is a latent variable. Volatility must therefore be inferred by looking at how much market prices move.

We usually assume that mean return is zero. While this is obviously not correct, the daily mean is orders of magnitude smaller than volatility and therefore can usually be safely ignored for the purpose of volatility forecasting.

Moving average (MA) model

The most obvious and easy way to forecast volatility is simply to calculate the sample standard deviation from a sample of returns. Over time, we keep the sample size constant, and every day add the newest return to the sample and drop the oldest. This method is called the moving average (MA) model.

    \[ \hat{\sigma}_{t}^{2}=\frac{1}{W_{E}}\sum_{i=1}^{W_{E}}y_{t-i}^{2} \]

One key shortcoming of MA models is that observations are equally weighted. In practice, this method should not be used. It is very sensitive to the choice of estimation window length.

Exponentially weighted moving average (EWMA) model a.k.a. RiskMetrics

The moving average model can be improved by assigning greater weights to more recent observations:

    \begin{eqnarray*} \sigma_{t}^{2} & = & \left(1-\lambda\right)\left(y_{t-1}^{2}+\lambda y_{t-2}^{2}+\lambda^{2}y_{t-3}^{2}+\cdots\right)\\  & = & \lambda\sigma_{t-1}^{2}+\left(1-\lambda\right)y_{t-1}^{2} \end{eqnarray*}

RiskMetrics is a branded EWMA by setting \lambda=0.94:

    \[ \sigma_{t}^{2}=0.94\sigma_{t-1}^{2}+0.06y_{t-1}^{2} \]

EWMA can be thought of as a special case of GARCH(1,1):

    \[ \begin{array}{rl} \begin{array}{r} GARCH(1,1):\\ \\ \end{array} & \begin{array}{ll} \sigma_{t}^{2} & =\omega+\alpha y_{t-1}^{2}+\beta\sigma_{t-1}^{2}\\  & =\alpha y_{t-1}^{2}+\left(1-\alpha\right)\sigma_{t-1}^{2}\leftarrow\textrm{set }\omega=0,\alpha+\beta=1 \end{array}\end{array} \]

Cons: \lambda is constant and identical for all assets.

Pros: 1) it can be implemented much more easily than most alternatives; 2) multivariate forms can be applied in a straightforward fashion. Coupled with the fact that it often gives reasonable forecasts, EWMA is often the method of choice.

GARCH model and its extension models

Most volatility models are based on using returns that have been de-meaned (i.e., the unconditional mean has been subtracted from the returns). For random variables (RVs) Y_t, de-meaned means E(Y_{t})=0.

The innovation in returns is driven by random shocks Z_t where Z_{t} \sim D(0,1).

The return Y_t can then be indicated by:

    \[Y_{t} = \sigma_{t} Z_{t}\]



The ARCH(p) model of Engle specifies the conditional variance as a function of lagged squared returns:

    \[ \sigma_{t}^{2}=\omega+\sum_{i=1}^{p}\alpha_{i}Y_{t-i}^{2} \]

where p is the number of lags.

One of the biggest problems with the ARCH model concerns the long lag lengths required to capture the impact of historical returns on current volatility.



The GARCH(p,q) model adds lagged conditional variances to the ARCH specification:

    \[ \sigma_{t}^{2}=\omega+\sum_{i=1}^{p}\alpha_{i}Y_{t-i}^{2}+\sum_{j=1}^{q}\beta_{j}\sigma_{t-j}^{2} \]

where p and q are the orders of the ARCH and GARCH terms respectively; \omega , \alpha, \beta > 0 and \alpha + \beta <1. The unconditional volatility of GARCH(1,1) is given by


    \begin{eqnarray*} \sigma^{2} & = & E\left(\omega+\alpha Y_{t-1}^{2}+\beta\sigma_{t-1}^{2}\right)\\ & = & \omega+\alpha\sigma^{2}+\beta\sigma^{2}\\ & = & \frac{\omega}{1-\alpha-\beta} \end{eqnarray*}

The unconditional volatility is infinite when \alpha + \beta = 1 and undefined when \alpha + \beta > 1.

Multiperiod volatility

To obtain the volatility n-days-ahead:

    \[\sigma_{t+n|t}^{2}=\sigma^{2}+\left(\alpha+\beta\right)^{n-1}\left(\sigma_{t+1}^{2}-\sigma^{2}\right),\; n\geq1\]

At t+1, the conditional (one-step-ahead) volatility can be expressed as

    \begin{eqnarray*} \sigma_{t+1 \mid t}^{2}=E_{t}\left(Y_{t+1}^{2}\right) & = & \omega+\alpha Y_{t}^{2}+\beta\sigma_{t}^{2}\\ & = & \underbrace{\omega+\left(\alpha+\beta\right)\sigma^{2}}+\alpha\left(Y_{t}^{2}-\sigma^{2}\right)+\beta\left(\sigma_{t}^{2}-\sigma^{2}\right)\\ & = & \hspace{3em}\sigma^{2}\hspace{3em}+\alpha\left(Y_{t}^{2}-\sigma^{2}\right)+\beta\left(\sigma_{t}^{2}-\sigma^{2}\right) \end{eqnarray*}

We can now derive two-step-ahead volatility:

    \begin{eqnarray*} \sigma_{t+2\mid t}^{2}=E_{t}\left(Y_{t+2}^{2}\right) & = & E_{t}\left(E_{t+1}\left(Y_{t+2}^{2}\right)\right)\\ & = & E_{t}\left(\sigma^{2}+\alpha\left(Y_{t+1}^{2}-\sigma^{2}\right)+\beta\left(\sigma_{t+1}^{2}-\sigma^{2}\right)\right)\\ & = & \sigma^{2}+\alpha\left(E_{t}\left(Y_{t+1}^{2}\right)-\sigma^{2}\right)+\beta\left(\sigma_{t+1}^{2}-\sigma^{2}\right)\\ & = & \sigma^{2}+\left(\alpha+\beta\right)\left(\sigma_{t+1}^{2}-\sigma^{2}\right) \end{eqnarray*}

If  \alpha + \beta < 1, the second term above goes to zero as n \rightarrow \infty, which implies that the longer the forecast horizon, the closer the forecast will get to unconditional variance. The smaller (\alpha + \beta) the quicker the predictability of the process subsides.
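The n-day-ahead formula translates directly into code; the GARCH(1,1) parameters and the one-step-ahead forecast below are illustrative.

```r
# n-day-ahead GARCH(1,1) variance forecast (illustrative parameters)
omega <- 1e-6; alpha <- 0.08; beta <- 0.90
sigma2_uncond <- omega / (1 - alpha - beta)              # unconditional variance
sigma2_t1     <- 2 * sigma2_uncond                       # assumed one-step-ahead forecast

n <- 1:20
sigma2_n <- sigma2_uncond + (alpha + beta)^(n - 1) * (sigma2_t1 - sigma2_uncond)
round(sigma2_n / sigma2_uncond, 3)                       # decays towards 1 as n grows
```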

(G)ARCH in mean

The return on a risky security should be positively related to its risk. The conditional mean of a return, \mu_{t}, is dependent on some function of its conditional variance or standard deviation:

    \[Y_{ t }=\mu _{ t }+\sigma _{ t }Z_{ t }=\delta \sigma ^{ 2 }_{ t }+\sigma _{ t }Z_{ t }\]

where \delta is the parameter describing the impact volatility has on the mean.

Maximum Likelihood Estimation

The nonlinear nature of volatility models rules out estimation by standard linear regression methods such as OLS. Estimation is therefore done with the quasi-maximum likelihood (QML) approach.

Assuming the normal distribution, the density of the returns with GARCH(1,1) at t=2 is given by:

    \[f\left(y_{2}\right)=\frac{1}{\sqrt{2\pi\left(\omega+\alpha y_{1}^{2}+\beta\hat{\sigma}_{1}^{2}\right)}}\exp\left(-\frac{y_{2}^{2}}{2\left(\omega+\alpha y_{1}^{2}+\beta\hat{\sigma}_{1}^{2}\right)}\right)\]

The joint density of y is:

    \begin{eqnarray*} \prod_{t=2}^{T}f\left(y_{t}\right)=\prod_{t=2}^{T}\frac{1}{\sqrt{2\pi\left(\omega+\alpha y_{t-1}^{2}+\beta\hat{\sigma}_{t-1}^{2}\right)}}\exp\left(-\frac{y_{t}^{2}}{2\left(\omega+\alpha y_{t-1}^{2}+\beta\hat{\sigma}_{t-1}^{2}\right)}\right) \end{eqnarray*}

The log-likelihood function is then:

    \begin{eqnarray*} \log\mathcal{L}=\underbrace{-\frac{T-1}{2}\log\left(2\pi\right)}_{\textrm{constant}}-\frac{1}{2}\sum_{t=2}^{T}\left(\log\left(\omega+\alpha y_{t-1}^{2}+\beta\hat{\sigma}_{t-1}^{2}\right)+\frac{y_{t}^{2}}{\omega+\alpha y_{t-1}^{2}+\beta\hat{\sigma}_{t-1}^{2}}\right) \end{eqnarray*}

One way is to set \sigma_{1}^{2} to an arbitrary value, usually the sample variance of y_{t}; for large sample sizes the choice has little effect.

When the data contain a structural break, e.g. the 2007-2008 credit crunch, the estimates can differ considerably depending on whether \sigma_{1} is set to the unconditional volatility or initialised with EWMA.
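A compact base-R sketch of QML estimation of a GARCH(1,1) model, with \sigma_{1}^{2} initialised to the sample variance as described above; the data are simulated, the starting values are illustrative, and dedicated packages are typically used in practice.

```r
# Quasi-maximum likelihood estimation of GARCH(1,1) (sketch)
garch_loglik <- function(par, y) {
  omega <- par[1]; alpha <- par[2]; beta <- par[3]
  if (omega <= 0 || alpha < 0 || beta < 0 || alpha + beta >= 1) return(-1e10)
  n  <- length(y)
  s2 <- numeric(n)
  s2[1] <- var(y)                                 # initialise sigma_1^2 with the sample variance
  for (t in 2:n) s2[t] <- omega + alpha * y[t - 1]^2 + beta * s2[t - 1]
  sum(dnorm(y[2:n], mean = 0, sd = sqrt(s2[2:n]), log = TRUE))
}

# Simulate a GARCH(1,1) series to test the estimator
set.seed(1)
n <- 3000; omega <- 1e-6; alpha <- 0.10; beta <- 0.85
y <- numeric(n); s2 <- omega / (1 - alpha - beta)
for (t in 2:n) {
  s2   <- omega + alpha * y[t - 1]^2 + beta * s2
  y[t] <- sqrt(s2) * rnorm(1)
}

fit <- optim(c(1e-6, 0.05, 0.90), garch_loglik, y = y, control = list(fnscale = -1))
fit$par                                           # estimates of (omega, alpha, beta)
```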

Goodness of fit tests
  • Likelihood ratio tests and parameter significance: if models are nested, for example ARCH(1) against ARCH(4), one can form the LR test

        \[ \textrm{LR} = 2\left(\mathcal{L_{U}}-\mathcal{L_{R}}\right)\sim \chi^{2}_{\textrm{# restrictions}} \]

    where the number of restrictions is 3 in our case.

    In out-of-sample forecast comparisons, it is often the case that the more parsimonious models perform better, even if a more flexible model is significantly better in sample. If the more flexible model is not significantly better in sample, it is very unlikely to do better out of sample.

  • Analysis of model residuals: consider the normal ARCH(1) model. If the model is correct, the residuals are iid normal, so we can test the normality of the fitted (estimated) residuals:

        \[ \hat{z}_{t}=\frac{y_{t}}{\hat{\sigma}_{t}\left(\hat{\alpha}, \hat{\beta} \right)} \sim N\left(0,1 \right) \]

    One can use Jarque–Bera test for normality and Ljung–Box test for autocorrelations.

  • Statistical goodness-of-fit measures: competing models can be ranked by goodness-of-fit measures such as mean squared error (MSE). But the conditional variance is not observable even ex post, and hence volatility proxies, s_{t}, are required. The simplest volatility proxy is the squared return.

        \[ \begin{array}{rl} \textrm{Squared error}: & \sum_{t=1}^{T}\left(\hat{s}_{t}^{2}-\hat{\sigma}_{t}^{2}\right)^{2}\\ \textrm{QLIKE}: & \sum_{t=1}^{T}\left(\log\hat{\sigma}_{t}^{2}+\frac{\hat{s}_{t}^{2}}{\hat{\sigma}_{t}^{2}}\right) \end{array} \]

Other GARCH-type Models

Two types of extensions are common: asymmetry in the impact of positive/negative lagged returns (leverage effects), and allowing a power other than two in the volatility equation.

Leverage effects and asymmetry

Leverage effect: volatility tends to rise following bad news and fall following good news. The leverage effect is not easily detectable in stock indices and is not expected to be significant in foreign exchange.

Two common asymmetric specifications are the EGARCH model, which works with log variance,

    \[ \log\sigma_{t}^{2}=\omega+\sum_{i=1}^{p}\alpha_{i}Z_{t-i}+\sum_{i=1}^{p}\lambda_{i}\left(\mid Z_{t-i}\mid-E\left(\mid Z_{t-i}\mid\right)\right)+\sum_{j=1}^{q}\beta_{j}\log\sigma_{t-j}^{2} \]

and the GJR-GARCH model, which adds an indicator for negative shocks,

    \[ \sigma_{t}^{2}=\omega+\sum_{i=1}^{p}\alpha_{i}\varepsilon_{t-i}^{2}+\sum_{i=1}^{p}\lambda_{i}I_{\{\varepsilon_{t-i}<0\}}\varepsilon_{t-i}^{2}+\sum_{j=1}^{q}\beta_{j}\sigma_{t-j}^{2} \]

Power GARCH models

APARCH, introduced by Ding, Granger and Engle (1993), combines asymmetry and power effects in the same model and embeds (G)ARCH, GJR-GARCH, TS-GARCH, T-ARCH, N-ARCH and log-ARCH as special cases. It allows for leverage effects when \zeta\neq0 and power effects when \delta\neq 2.

        \[ \sigma_{t}^{\delta}=\omega+\sum_{i=1}^{p}\alpha_{i}\left(\mid Y_{t-i}\mid-\zeta_{i}Y_{t-i}\right)^{\delta}+\sum_{j=1}^{q}\beta_{j}\sigma_{t-j}^{\delta} \]

  • ARCH Model of Engle when \delta=2, \zeta_i=0, \beta_j=0
  • GARCH Model of Bollerslev when \delta=2, \zeta_i=0
  • TS-GARCH Model of Taylor and Schwert when \delta=1, \zeta_i=0
  • GJR-GARCH Model of Glosten, Jagannathan, and Runkle when \delta=2
  • T-ARCH Model of Zakoian when \delta=1
  • N-ARCH Model of Higgins and Bera when \zeta_i=0, \beta_j=0
  • Log-ARCH Model of Geweke and Pantula when \delta \rightarrow 0
Stochastic volatility

The volatility process is a function of an exogenous shock as well as past volatilities, so the process \sigma_t is itself random, with an innovation term that is not known at time t:

        \[ \begin{array}{rcl} Y_{t} & = & Z_{t}\sigma_{t}\\ Z_{t} & \sim & N\left(0,1\right)\\ \sigma_{t}^{2} & = & \exp\left(\delta_{0}+\delta_{1}\log\sigma_{t-1}^{2}+\delta_{3}\eta_{t}\right) \end{array} \]

    where the distribution of shocks is

        \[ \left(\begin{array}{c} Z_{t}\\ \eta_{t} \end{array}\right)\sim N\left(0,\left(\begin{array}{cc} 1 & \zeta\\ \zeta & 1 \end{array}\right)\right) \]

    The SV model has two innovation terms: Z_t for the return itself and  \eta_t for the conditional variance of the return.
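A short simulation of the SV process above with illustrative (assumed) parameter values, showing the two correlated innovation terms:

```r
# Simulate the stochastic volatility model above (illustrative parameter values)
set.seed(1)
n <- 1000
delta0 <- -0.2; delta1 <- 0.97; delta3 <- 0.15; zeta <- -0.3   # assumed values

Z   <- rnorm(n)                                 # return shock
eta <- zeta * Z + sqrt(1 - zeta^2) * rnorm(n)   # volatility shock, correlated with Z

log_s2 <- numeric(n)
log_s2[1] <- delta0 / (1 - delta1)              # start at the stationary mean of log sigma^2
for (t in 2:n) log_s2[t] <- delta0 + delta1 * log_s2[t - 1] + delta3 * eta[t]

Y <- exp(log_s2 / 2) * Z                        # returns with stochastic volatility
c(sd = sd(Y), kurtosis = mean(Y^4) / mean(Y^2)^2)
```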

Implied volatility

Implied volatility is obtained by taking the actual transaction prices of options traded in the market and using the Black–Scholes equation to back out the volatility that is implied by the option price.

Pros: based on current market prices rather than historical data: “forward-looking” estimators of volatility.

Cons: relies on the accuracy of the Black–Scholes model, which assumes constant conditional volatility and normal innovations \rightarrow the volatility smile/smirk observed in option markets.
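A minimal sketch of backing out implied volatility with base R’s uniroot; the option price and contract terms are illustrative, not market data.

```r
# Back out the Black-Scholes implied volatility from an observed call price (sketch)
bs_call <- function(S, K, r, tau, sigma) {
  d1 <- (log(S / K) + (r + 0.5 * sigma^2) * tau) / (sigma * sqrt(tau))
  d2 <- d1 - sigma * sqrt(tau)
  S * pnorm(d1) - K * exp(-r * tau) * pnorm(d2)
}

implied_vol <- function(price, S, K, r, tau) {
  uniroot(function(s) bs_call(S, K, r, tau, s) - price, interval = c(1e-4, 5))$root
}

# Illustrative (not market) numbers: recovers a volatility of about 0.20
implied_vol(price = 10.45, S = 100, K = 100, r = 0.05, tau = 1)
```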

Realized volatility

Realized volatility measures what actually happened in the past. It is based on intraday data, sampled at regular intervals (e.g., every 10 minutes), which are used to obtain the covariance matrix.

Pros: purely data driven and does not rely on parametric models.

Cons: intraday data need to be available.