Local GDP measures are crucial for economic research but are often unreliable or unavailable, particularly in developing countries. Meanwhile, traditional proxies like nightlights have limitations. This column uses a variety of remote sensing and emissions data to estimate local GDP using machine learning. The estimates portray both the cross-section and the dynamics of local GDP well, and as such will be useful for carrying out local spatial analysis of all forms.
Local GDP measures have become an important input for many economic studies. Not only do they underlie the fast-growing quantitative spatial literature (see Redding and Rossi-Hansberg 2017 and the many surveys in the forthcoming Handbook of Regional and Urban Economics, Volume 6), but any analysis that aims to investigate how local policies affect an economy must rely on some evidence on the level and changes in local economic activity. Unfortunately, local GDP measures are not reliably produced by statistical agencies in most countries in the world. Naturally, the availability of local GDP statistics varies widely across locations, as does their level of accuracy and frequency. Developed countries tend to produce local statistics more consistently, which implies that most of the spatial analysis focuses on them, leaving a gap in the analysis of many developing countries.
Economists have tried to fill this gap in the past using nightlights (Henderson et al. 2009), which provide a measure of the intensity of economic activity in a region. Unfortunately, the measure suffers from many problems since nightlights might be more intense in areas specialised in certain sectors, different types of bulbs might be differentially captured by satellites, and measurement over time used to be inconsistent due to depreciation in satellite sensors or cloud coverage. The advantage of using remote-sensing data is that they are measured uniformly across regions of the world, which provides consistent measurements in developed and developing countries. Some of the global datasets with local GDP estimates used by researchers in the past, such as G-Econ (Nordhaus 2006, Henderson et al. 2017, Chen et al. 2022), relied mostly on nightlights and population data. These estimates have been used in many studies, but they have not been recently updated. Furthermore, new methodologies and more data can be used to get better estimates.
In a recent paper (Rossi-Hansberg and Zhang 2025), we use a variety of remote sensing and emissions data to estimate local GDP using machine learning. We apply a random forest algorithm, which helps us limit overfitting, to local GDP shares. Our training sample is the available local data for several countries. Because we estimate the model in shares, we scale the predicted local shares by annual official GDP statistics, so country aggregates match official statistics. We estimate local yearly GDP from 2012 to 2021 at the 0.25, 0.5, and 1 degree levels. Beyond local GDP where available, we use data on population from LandScan, as well as satellite data for nightlights (Black Marble, with consistent measurements over time), land use, net primary productivity (a measure of vegetation), CO2 emissions from EDGAR, and terrain ruggedness. Our fit criterion is the out-of-sample log growth rate of GDP.
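The rescaling step can be illustrated with a minimal sketch. It is not the paper's code: the function name `rescale_to_official_gdp`, the cell identifiers, and all numbers are hypothetical. It assumes some model has already produced a non-negative raw score for each grid cell, and shows only how normalising those scores into within-country shares and multiplying by official national GDP makes the cell values add up to the official total.

```python
from collections import defaultdict

def rescale_to_official_gdp(cells, official_gdp):
    """Turn raw per-cell scores into local GDP that sums to official totals.

    cells: list of (country, cell_id, raw_score), raw_score >= 0 (hypothetical
           model output, not real data).
    official_gdp: dict mapping country -> official national GDP.
    """
    # Sum the raw scores within each country.
    totals = defaultdict(float)
    for country, _, score in cells:
        totals[country] += score

    local_gdp = {}
    for country, cell_id, score in cells:
        if totals[country] <= 0:
            raise ValueError(f"no positive scores for country {country}")
        share = score / totals[country]          # within-country share
        local_gdp[(country, cell_id)] = share * official_gdp[country]
    return local_gdp

# Made-up example: two countries, five grid cells.
cells = [
    ("A", "a1", 3.0), ("A", "a2", 1.0),
    ("B", "b1", 2.0), ("B", "b2", 2.0), ("B", "b3", 4.0),
]
official_gdp = {"A": 100.0, "B": 40.0}
local_gdp = rescale_to_official_gdp(cells, official_gdp)
```

By construction, summing `local_gdp` over the cells of a country returns that country's entry in `official_gdp`, which is why differences in averages across countries are pinned down by the official data rather than by the model.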
The resulting model seems to perform very well. Out-of-training-sample R2s for local GDP levels are 0.97 and above for both the developing-country and the developed-country samples. Furthermore, the out-of-training-sample R2s for log growth rates are uniformly above 0.63. These measures tell us that the model predicts both levels and growth rates well at all three spatial resolutions. The model relies most heavily on nightlights and lagged nightlights as well as population and its lags, but land cover and emissions measures are also relevant.
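To make the two fit measures concrete, here is a minimal sketch of how an out-of-sample R2 could be computed for levels and for log growth rates. It assumes the common definition R2 = 1 - SSR/SST on held-out observations; the paper may use a different definition. The helper names and the numbers are hypothetical, not the authors' data.

```python
import math

def r_squared(observed, predicted):
    """R2 as 1 - SSR/SST, evaluated on held-out observations."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def log_growth(series):
    """Log growth rate between consecutive years: log(y_t / y_{t-1})."""
    return [math.log(b / a) for a, b in zip(series, series[1:])]

# Made-up held-out GDP series for one location (observed vs predicted).
observed = [100.0, 110.0, 105.0, 120.0]
predicted = [98.0, 111.0, 107.0, 118.0]

r2_levels = r_squared(observed, predicted)
r2_growth = r_squared(log_growth(observed), log_growth(predicted))
```

Growth rates are differences of logs, so small errors in levels can translate into large relative errors in growth. This is one reason an R2 for growth rates is typically well below the R2 for levels.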
Figure 1 shows our estimates for GDP and GDP per capita in 2019 for a resolution of 1 degree. The implied geography is well known. The variability at the local level indicates a wide range of levels. Note that local GDP per capita exhibits large variability within countries (average differences across countries are determined by the official data). The large range of values for local GDP per capita within large countries such as China, Brazil or Mexico shows that the resulting GDP data exhibit a lot of variability relative to population counts. This is important since measures of GDP that simply scale population counts are not particularly useful to inform economic analysis.
Figure 1 Estimates of local GDP and GDP per capita for 2019 at the 1-degree resolution








The relationship between local GDP and population is explored further in Figure 2. Clearly, the relationship between the two is positive, as expected. However, the figure indicates that the relationship is slightly concave for locations with little population and turns convex for denser locations. These non-linearities could be associated with congestion forces coming from land endowment per capita in rural areas and with economies of scale and externalities in the densest locations. Note also that for any level of population (or density, given that the grid size is mostly fixed), the range of GDP levels is between 5 and 7 log points. This large range shows how much variables beyond the level of population matter in our estimation.
Figure 2 Relationship between local log GDP and local log population




We test the accuracy of our estimates on the 2020 COVID-19 pandemic. We do not use data from China to train the model since China lacks reliable local GDP measures for most of the years in our sample. To test our predictions for China, we collected data from the Provincial Statistical Yearbooks for seven leading provinces and compared them to the predictions of the model. China made an effort to standardise these data starting in 2019. Our model predicts the level of GDP extremely well, even during the year 2020, when some of these provinces were affected severely by the pandemic (R2 is above 0.95). Interestingly, the fit for growth rates between 2012 and 2019, before the data standardisation, is much worse, with an R2 of only 0.31. This is probably due to the lack of good quality data for this period. However, for the periods 2019-2020 and 2020-2021, the R2 of a regression between the predicted growth rates and the observed ones is around 0.6, which we view as very successful. If we focus on the main economic hub affected by the pandemic in Wuhan, the prediction successfully captures the evolution of GDP, including the decline and recovery during and after the pandemic. Figure 3 presents this evolution.
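For readers who want to reproduce this kind of check, the R2 of a simple regression of observed on predicted growth rates (with an intercept) equals the squared Pearson correlation between the two series. The sketch below computes it in plain Python. The function name and the growth rates are hypothetical and do not come from the Provincial Statistical Yearbooks.

```python
def regression_r2(x, y):
    """R2 of an OLS regression of y on x with an intercept.

    For a single regressor this equals the squared Pearson correlation.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return (s_xy * s_xy) / (s_xx * s_yy)

# Made-up growth rates for a handful of locations (illustration only).
predicted_growth = [0.02, -0.05, 0.01, 0.04, -0.01]
observed_growth = [0.03, -0.04, 0.00, 0.06, -0.03]

fit = regression_r2(predicted_growth, observed_growth)
```

Note that a regression R2 of this kind is unaffected by a constant shift or rescaling of the predictions, so it measures how well predicted and observed growth move together rather than whether they coincide.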
Figure 3 Evolution of local GDP in Wuhan’s main economic hub




We hope these estimates, which we aim to produce every year as new data become available, will be useful for carrying out local spatial analysis of all forms. We believe that the estimates portray both the cross-section and the dynamics of local GDP well. All data can be studied and downloaded at: http://bfidatastudio.org/gdp. Detailed descriptions of the methodology we use and other tests of the accuracy of the predictions can also be found on that site.
Source: VoxEU