Behind the scenes of survey disagreement: Interpreting experts’ judgement

Consensus survey forecasts by professionals, often reported in media outlets, hide heterogeneous responses. This may be because respondents use different models to make forecasts, or because they disagree on which shocks have hit or will hit the economy. This column describes a way to reverse-engineer the expected structural shocks from the path of individual responses in the Survey of Professional Forecasters. It finds that most of the disagreement is due to differences in assessing the size of a shock and, to a lesser extent, to differences in model coefficients.

Understanding which types of shocks drive the main macroeconomic aggregates, such as GDP and prices, is an important but challenging endeavour for policymakers. In times of higher uncertainty, knowing whether supply, demand, or other factors are behind macroeconomic dynamics directly affects policy decisions (as in the 2021 inflation surge, see Wohlfart et al. 2021). A relevant source of insight is the opinion of professional forecasters. Their responses are often taken into account by – and affect the expectations of – the wider public. Economic agents such as firms and households, in turn, might make consumption, savings, and investment decisions based on these expectations (Gorodnichenko et al. 2021). The average (or ‘consensus’) response by experts is a widely reported quantity scrutinised by policymakers and media alike. This average, however, hides considerable heterogeneity among individual respondents (Meeks and Monti 2024). Despite their vast knowledge of the economy and broadly similar access to data, professional forecasters often disagree in their assessments (Mankiw et al. 2003). The reasons behind this disagreement are still understudied; potential explanations include differences in models and their coefficients (Patton and Timmermann 2010), differences in the ex-post judgement adjustment (Kohlhas and Walther 2021), and different expectations about the most likely future shock or about its size.

In a recent paper (Brenna and Budrys 2024), we propose a way to shed light on the reasons behind survey disagreement by exploiting the information contained in professional forecasts. In particular, we try to identify the judgement component of survey forecasts in a structural way.

Starting from the assumption that experts use a multivariate time series model to produce their forecasts (as many of them indeed declare; see Stark 2013 and ECB 2024), we feed each agent’s forecasts from horizon zero (the nowcast) to one year ahead into our own model (a structural vector autoregression with stochastic volatility). We use six of the variables available in the Philadelphia Fed Survey of Professional Forecasters (SPF), including main macro aggregates, a price index, and financial sector indicators. This framework allows us to decompose observed forecasts – for both the consensus and individuals – into a component which we call ‘model-based’ and another called ‘judgement-based’.
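To fix ideas, the decomposition can be sketched for a deliberately simplified case. The VAR(1) form, the symbols (c, A, B, e) and the omission of stochastic volatility are all assumptions made for illustration; this is not the exact specification estimated in the paper. Suppose the data follow y_t = c + A y_{t-1} + B ε_t and forecaster i, surveyed in quarter t with data up to t-1, expects shocks e_{i,t+k|t}. The h-quarter-ahead forecast then splits into a model-based part, in which all expected shocks are zero, and two judgement parts:

```latex
% Illustrative VAR(1) sketch; symbols are assumptions, not the paper's notation.
\[
f_{i,t+h|t}
= \underbrace{\sum_{j=0}^{h} A^{j} c + A^{h+1} y_{t-1}}_{\text{model-based}}
+ \underbrace{A^{h} B\, e_{i,t|t}}_{\text{judgement about the nowcast}}
+ \underbrace{\sum_{j=0}^{h-1} A^{j} B\, e_{i,t+h-j|t}}_{\text{judgement about future shocks}},
\qquad h = 0, 1, \dots, 4 .
\]
```

For h = 0 the last sum is empty, so the nowcast differs from the model-based forecast only through the shock the forecaster expects in the current quarter.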

We are especially interested in the latter component and in understanding whether it can tell us which structural shock forecasters deemed the most likely when they were producing their forecasts. We therefore use a statistical identification method (‘identification via stochastic volatility’) and label our shocks economically based on how well they reproduce structural shocks from the literature.

Judgement is widely used and improves accuracy

When decomposing the consensus forecasts at each point in time into model and judgement components, we find that judgement often improves forecast accuracy, especially at shorter horizons. Figure 1 shows this decomposition for the nowcast and the one-year-ahead forecast of real GDP growth; we distinguish between judgement added in the current quarter (judgement about nowcasts) and judgement about future shocks (referring to the following quarters). In addition to subjective assumptions, judgement about nowcasts can incorporate all higher-frequency information that becomes available during the survey quarter up to the day of its submission. This additional information makes a big difference during more uncertain times: it is in the most turbulent periods that forecasters seem to place somewhat less trust in a purely backward-looking time series model at quarterly frequency and instead rely more on higher-frequency information, as well as on their expertise.

Figure 1 Historical judgement decomposition for the nowcast and one-year-ahead forecast of real GDP growth rate

A) Real GDP growth nowcast

B) Real GDP growth: One-year-ahead forecast

Note: The figures show the decomposition of average SPF nowcasts and one-year-ahead forecasts into deterministic conditions, observed shocks, judgement about nowcasts, and judgement about forecasts. We use the posterior mean of the historical decomposition as our point estimate. Shaded areas represent NBER recession periods. Panel A also reports the realisation for GDP growth.

Experts mostly disagree on the size of shocks, less about the nature and coefficients

Moving to individual forecasts, we present a similar historical decomposition, but this time for disagreement, defined as the standard deviation across point forecasts. Figure 2 shows results for the disagreement about the one-year-ahead forecasts of real GDP growth and CPI inflation. The overall standard deviation is decomposed into disagreement coming from differences in coefficients and from differences in expected future shocks. Differences in coefficients account for less than a third of overall disagreement on average over the whole sample. Moreover, for both variables analysed here, it is mainly one shock that drives overall disagreement: the one labelled ‘unanticipated demand’ for GDP growth and the one labelled ‘cost-push’ for CPI inflation. This result suggests that, after accounting for differences in model coefficients, most of the disagreement stems from the professionals attributing a different size to the same shock: they might all agree that a demand shock is the dominant one for GDP growth over the forecast horizon but disagree on how large it will be.

Figure 2 Historical decomposition of one-year-ahead disagreement for GDP year-on-year growth rate and CPI year-on-year log-differences

A) Real GDP growth

B) CPI year-on-year log-differences

Note: The figures show the historical decomposition of the one-year-ahead disagreement, calculated as the standard deviation of the individual point forecasts, excluding the two smallest and largest values. Shaded areas represent NBER recession periods.
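The disagreement measure described in the note can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `trimmed_disagreement` and its `trim` parameter are hypothetical, the note is read as dropping the two smallest and the two largest forecasts, and the choice of the sample (n - 1) standard deviation is an assumption, since the column does not say which variant is used.

```python
import statistics


def trimmed_disagreement(point_forecasts, trim=2):
    """Standard deviation of point forecasts after dropping the `trim`
    smallest and `trim` largest values (hypothetical helper).

    Assumption: the sample standard deviation (n - 1 denominator) is used.
    """
    ordered = sorted(point_forecasts)
    # At least two forecasts must remain after trimming to compute a spread.
    if len(ordered) - 2 * trim < 2:
        raise ValueError("not enough forecasts left after trimming")
    kept = ordered[trim:len(ordered) - trim]
    return statistics.stdev(kept)


# Made-up one-year-ahead GDP growth forecasts (per cent) from eight respondents.
forecasts = [-5.0, 0.0, 1.0, 2.0, 3.0, 4.0, 10.0, 50.0]
print(trimmed_disagreement(forecasts))
```

Trimming keeps a handful of extreme responses (here -5.0, 0.0, 10.0 and 50.0) from dominating the measure, so the result reflects the spread among the bulk of respondents.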

Conclusion

We find evidence of heterogeneity in the responses of professional forecasters coming from both different estimated coefficients (i.e. different models of the economy) and different expected future shocks (i.e. different ‘expert judgement’). The latter seems primarily due to forecasters disagreeing on how large the shock hitting the economy will be, not on which shock it is. Our framework can inform policymakers by providing a structural analysis of individual forecasts at each SPF release and can shed light on why respondents disagree.

Source: VoxEU