Forecasting, especially in central banks, is evolving to incorporate a variety of models, following two main approaches: structural and reduced form. The challenge remains to identify which model – or combination of models – is likely to produce more accurate macroeconomic forecasts, especially in an environment that encompasses crisis events. This column argues that combining models is the optimal forecasting strategy in a context that includes crisis episodes. This superiority holds not only for the central tendencies of the forecasts, but also for the quantiles of the predictive distributions.
The prediction of macroeconomic time series is an essential input for policy decisions in central banks, and macroeconomic forecasting mainly follows two different approaches: structural and non-structural. Non-structural methods exploit reduced-form correlations between macroeconomic variables, while structural models are grounded in economic theory (as pointed out by Diebold 1998).
The literature acknowledges several advantages of vector autoregressions (VARs): they are easy to estimate, generate out-of-sample forecasts, and are flexible. However, they embed little (structural VARs) or no (unrestricted VARs) economic theory. The alternative, theory-based approach is generally built on dynamic stochastic general equilibrium (DSGE) models. While traditionally not used for forecasting, structural models have increasingly entered the mainstream of central banks’ forecasting toolboxes. This reflects the development – already anticipated by Lars Svensson, Deputy Governor of Sweden’s central bank, back in 2008 1 – of medium-sized ‘workhorses’ such as the model of Smets and Wouters (2003), whose fit to the data can compete with standard VAR models.
This raises an important question: does it make sense to embrace many models in the forecasting process, rather than prioritising a single, dominant framework? In other words, is there some gain from combining information from usually decoupled statistical and structural models? 2 Our answer to this question is yes, because different model combinations are liable to produce better point and density forecasts at different points in time.
When comparing the real-time forecasting accuracy of structural and reduced-form time series models, no single method can be considered best at all horizons (Gürkaynak et al. 2013). Simple autoregression models tend to be more accurate at short horizons and DSGE models are generally preferable at long horizons when forecasting output growth; the opposite is generally true for inflation. Combining models has been demonstrated to improve forecasts in a number of contexts (e.g. Elliott and Timmermann 2005, Goodwin 2000, Hall and Mitchell 2007), but typically this merging has been restricted to purely statistical models.
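The intuition behind forecast combination can be seen in a minimal sketch. The numbers below are hypothetical, not from the paper: when two models err in opposite directions, their equal-weight average can beat both under squared-error loss.

```python
# Hypothetical one-quarter-ahead GDP growth forecasts (per cent) from two
# models whose errors have opposite signs; not taken from the paper.
actual = 0.4                     # realised quarterly GDP growth
f_ar, f_dsge = 0.6, 0.2          # illustrative AR and DSGE forecasts
f_pool = 0.5 * (f_ar + f_dsge)   # equal-weight combination

sq_err = lambda f: (f - actual) ** 2
print(sq_err(f_ar), sq_err(f_dsge), sq_err(f_pool))
```

With offsetting errors the pooled forecast lands on the outcome; more generally, the gain depends on how correlated the models’ errors are.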
DSGE-VAR approach
The idea of merging DSGE and VAR models is not new. Looking at the US economy, Del Negro and Schorfheide (2004) showed how theoretical DSGE models which incorporate rational, forward-looking agents can inform (through priors) reduced-form time series models. In terms of forecasting, Lees et al. (2011) tested the predictive ability of a combination of a small-scale DSGE model and a statistical VAR model, which outperformed the Reserve Bank of New Zealand’s forecasts. Cai et al. (2018), using the New York Fed DSGE model, also showed that ‘empirical’ variants of DSGE models, expanded by including financial variables as observables, perform relatively well in terms of output growth forecasting accuracy compared to both the Blue Chip Survey and the Survey of Professional Forecasters (SPF).
Against this backdrop, in a recent paper (Martinez-Martin et al. 2024), we build a DSGE-VAR model in line with Del Negro et al. (2007), but with some key differences. 3 Our estimation procedure implicitly combines DSGE model parameters with those resulting from the estimation of a VAR model. 4 Crucially, instead of dogmatically imposing the cross-coefficient restrictions implied by the DSGE model on the VAR, we allow the relative weights of the two models to vary. A hyperparameter, 𝜆, reflects how well the DSGE model performs in terms of forecasting accuracy. 5 The smaller 𝜆 is, the less tightly the DSGE prior is imposed on the data. Considering a sample that includes several episodes of financial disruption, we find that the optimal weighting scheme puts 57% on the DSGE and 43% on the VAR (Figure 1).
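The mechanics of choosing 𝜆 by maximising the marginal likelihood can be illustrated in a deliberately stripped-down conjugate setting (this is a sketch, not the paper's implementation): synthetic data with an unknown mean, a Gaussian prior centred on a hypothetical DSGE-implied value, and a prior variance that shrinks as 𝜆 grows, so larger 𝜆 means a tighter DSGE prior.

```python
# Illustrative sketch: pick the DSGE-prior weight lambda by maximising the
# marginal likelihood on a grid. Data y_i ~ N(theta, sigma2) with prior
# theta ~ N(theta_dsge, tau2 / lam); all numbers are hypothetical.
import math
import random

random.seed(1)
n = 95                                           # window size, as in the paper
y = [random.gauss(0.5, 1.0) for _ in range(n)]   # synthetic 'GDP growth' data

theta_dsge = 0.4          # hypothetical DSGE-implied prior mean
sigma2, tau2 = 1.0, 1.0

def log_marginal(lam):
    """Log marginal likelihood of the sample mean: integrating theta out,
    the sample mean is N(theta_dsge, sigma2/n + tau2/lam)."""
    var = sigma2 / n + tau2 / lam
    ybar = sum(y) / n
    return -0.5 * math.log(2 * math.pi * var) - (ybar - theta_dsge) ** 2 / (2 * var)

grid = [0.1 + 0.05 * k for k in range(199)]      # lambda grid from 0.1 to 10
lam_star = max(grid, key=log_marginal)
print(f"optimal lambda on the grid: {lam_star:.2f}")
```

The actual DSGE-VAR evaluates the marginal likelihood of the whole VAR under DSGE-implied dummy observations, but the selection logic – scan a grid of 𝜆 values and keep the one with the highest marginal likelihood, as in Figure 1 – is the same.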
Figure 1 Marginal likelihood as a function of 𝜆 over time
Notes: DSGE–VAR log marginal likelihood evaluated over different values of 𝜆 following a rolling-window scheme. The sample spans 1981Q3 to 2015Q4, with a window size of 95 quarters. Black dots identify the highest log marginal likelihood for each estimation sample.
Sources: Martinez-Martin et al. (2024); authors’ calculations.
Forecast evaluation on point estimates
In a context marked by several episodes of financial tension, a time-varying investigation of each model’s real-time out-of-sample performance is particularly valuable. Without going into the details, we test the stability of the models’ relative forecasting performance for Spanish GDP growth by means of the Giacomini and Rossi (2010) fluctuation test. 6 Our results suggest that the DSGE–VAR model consistently outperforms the DSGE model in one-quarter-ahead forecasts, and even more so during financial crises (Figure 2, upper panel). However, for longer horizons the relative forecasting performance fluctuates in an inverse U-pattern, pointing to forecast instability.
In other words, the DSGE model outperforms the DSGE-VAR in forecasting over medium- to long-term horizons during financial crises. The solution to this puzzle, as we show in our paper, is that the DSGE model consistently underestimates GDP growth. As a result, it provides better forecasts when financial crises (i.e. the Great Recession and the sovereign debt crisis) are present in the evaluation sample because average GDP growth tends to be lower in those situations. This relationship is illustrated by plotting fluctuation test statistics against mean GDP growth for each step-ahead forecast in the corresponding evaluation sample (Figure 2, lower panel). Essentially, when fluctuation test statistics are negative (indicating better DSGE forecasts), mean GDP growth in the evaluation sample is also negative due to financial crises.
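The core of the fluctuation test is a standardised rolling mean of loss differentials between two competing forecasts. The sketch below is a stripped-down version for intuition only (the actual test uses a HAC long-run variance estimator and tabulated critical values); the error series are made up for illustration.

```python
# Simplified rolling statistic behind the Giacomini-Rossi fluctuation test:
# a standardised rolling mean of squared-error loss differentials.
import math

def fluctuation_stats(e1, e2, m):
    """e1, e2: forecast errors of model 1 (e.g. DSGE-VAR) and model 2
    (e.g. DSGE); m: rolling window length. Negative values mean model 1
    has been locally more accurate."""
    d = [a * a - b * b for a, b in zip(e1, e2)]        # loss differentials
    mean_d = sum(d) / len(d)
    sd = math.sqrt(sum((x - mean_d) ** 2 for x in d) / len(d))  # naive variance proxy
    return [sum(d[t:t + m]) / (sd * math.sqrt(m)) for t in range(len(d) - m + 1)]

# Model 1 uniformly more accurate -> every rolling statistic is negative:
stats = fluctuation_stats([0.1, -0.2, 0.1, -0.1, 0.2, -0.1],
                          [0.5, -0.6, 0.4, -0.5, 0.6, -0.4], m=3)
print(stats)
```

Tracking this statistic over time, rather than averaging over the whole sample, is what reveals the crisis-dependent reversals described above.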
Additionally, when forecasting output growth, it is worth highlighting that: (i) the DSGE model outperforms non-structural models at longer horizons but slightly underperforms at shorter horizons (Gürkaynak et al. 2013); (ii) during financial crises, the DSGE model struggles, especially when financial frictions are not considered (Cai et al. 2018); and (iii) relative to BVAR and AR models, DSGE and DSGE-VAR models’ accuracy is better only at the shortest forecast horizon.
Figure 2 Giacomini and Rossi (2010) fluctuation test statistics evolution over time (upper panel) and fluctuation test statistics against GDP mean growth in the evaluation sample (lower panel)
Notes: The upper panel reports the pairwise local relative forecasting performance of the DSGE–VAR against a pool of selected models (DSGE, VAR, BVAR, and AR, i.e. an autoregressive model) for GDP growth along different forecast horizons. When the statistic is below (above) the lower (upper) critical value, the DSGE–VAR forecasts significantly better (worse) at that point in time. Higher steps ahead (h) are associated with increasingly darker lines. Red dashed lines indicate the critical values at the 5% significance level. The lower panel reports the fluctuation test statistic (DSGE–VAR vs. DSGE) against the mean of the associated evaluation sample for GDP growth.
Sources: Martinez-Martin et al. (2024); authors’ calculations.
Predictive accuracy of density estimates
The last piece of the puzzle is how well model combinations perform in a context of high uncertainty. To that end, a common practice among forecasters is to analyse the quantiles of the predictive distribution. It is useful to report not only the point forecasts but also the uncertainty around them, i.e. density forecasts, both to assess whether they are correctly specified and to quantify the distribution of projections of key macroeconomic variables.
This exercise is based on the analysis of the one-quarter-ahead probability integral transforms (PITs), which measure the probability, under the predictive distribution, of observing a value below the realised value of GDP growth, for the DSGE and DSGE–VAR models. 7 The DSGE–VAR’s PIT is generally well behaved in the full sample, especially when compared with the DSGE model, which suffers from misspecification even when considering the ‘no crisis’ sample only (Figure 3). 8
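The PIT computation itself is simple: evaluate the predictive CDF at the realised value. The sketch below assumes Gaussian predictive densities purely for concreteness (the paper's densities come from the estimated models); under a correctly specified density forecast the resulting PITs are roughly uniform on [0, 1].

```python
# PIT sketch with Gaussian predictive densities (an assumption for
# illustration; the models' actual predictive densities need not be Gaussian).
import math

def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def pits(realised, pred_means, pred_sds):
    """One-step-ahead PITs: predictive CDF evaluated at each outcome."""
    return [normal_cdf(y, m, s) for y, m, s in zip(realised, pred_means, pred_sds)]

# A forecast density centred on the outcome gives a PIT of 0.5; a model that
# systematically underestimates growth pushes PITs towards 1.
print(pits([0.4, 0.4], [0.4, 0.0], [1.0, 0.5]))
```

A histogram of these PITs that piles up near 1 is exactly the signature of the DSGE model's systematic underestimation of GDP growth discussed above.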
Figure 3 Empirical distribution of the PIT for GDP using DSGE–VAR versus DSGE models
Notes: Empirical distribution of the probability integral transform (PIT) for one-quarter-ahead density forecasts of the Spanish GDP growth using both the DSGE–VAR and DSGE models.
Sources: Martinez-Martin et al. (2024); authors’ calculations.
Concluding remarks
Forecasts can be improved by mixing approaches, and this process need not rely on exceedingly sophisticated techniques: simple heuristics can suffice. The introduction of structural features into projection exercises not only provides policymakers with a more articulate narrative and better storytelling, but also improves forecast performance. That is a lesson for institutions to consider in the future.
Source: VoxEU