collect_input_data.Rmd
To create crop distribution maps with SPAMc, various sources of data are required, including national and subnational agricultural statistics, cropland extent, a map of irrigated areas and spatially explicit information on economic and biophysical suitability. This section provides an overview of which raw data needs to be collected and where to store it so that it can be used by SPAMc as input. Details on how this data is processed before it is used as input into SPAMc are discussed in the articles on Processing subnational statistics and Processing spatial data.
Raw data is stored in the raw_data folder. When SPAMc is set up, you can specify the location of this folder. By default it is placed in the main SPAMc folder, but you can also create it anywhere else (e.g. on a data server or external drive). This might be convenient when you have limited capacity on your local hard drive to store the often large (i.e. several gigabytes) global spatial data files from which country data is clipped.
For practical reasons, we use separate subfolders for each data source (Table 1). The only exception is the subnational agricultural statistics, which are all stored in one folder. The raw data subfolders are automatically created when SPAMc is set up and therefore their names cannot be changed. However, if alternative and more detailed information is available from national sources (e.g. a cropland map), a new folder can be created in the raw_data folder to store the data.
As you probably have not yet gone through the process of setting up the model, you can either create a temporary raw data folder structure (Table 1) and copy the data to the SPAMc folder later, or set up the model first and save all the data directly in the correct location. In any case, the location of the raw data folder can be changed at any time, so there is some flexibility.
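A temporary raw data folder structure can also be created with a few lines of code. The sketch below is only an illustration: the subfolder names are assumptions taken from the paths mentioned in this article, not the official list that SPAMc creates, and the function name is hypothetical.

```python
from pathlib import Path

# Assumed subfolder names, based on the paths mentioned in this article.
# The folders that SPAMc creates itself may differ.
RAW_SUBFOLDERS = ["faostat", "aquastat", "gmia", "gia", "gaez"]


def create_raw_data_folders(root, subfolders=RAW_SUBFOLDERS):
    """Create one subfolder per data source under `root` and return the paths."""
    root = Path(root)
    created = []
    for name in subfolders:
        folder = root / name
        # parents=True also creates `root`; exist_ok=True makes reruns harmless.
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```

Calling `create_raw_data_folders("temp_raw_data")` would create the folders next to the script. They can be copied to the SPAMc raw data location later.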
# TODO ADD raw_data structure.
#TODO add table with all data and sources - use table from paper
National crop and price statistics are taken from FAOSTAT and AQUASTAT.
The FAOSTAT crops database is used as a source of data for all crops for which no subnational statistics are available. It is also used to scale all the subnational information so that it adds up to the FAOSTAT national totals. This is useful when SPAMc output is used in simulation models like GLOBIOM, which use FAOSTAT as their primary source of information and require a consistent data approach.
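The idea of scaling subnational statistics to a national total can be illustrated with a small sketch. It assumes simple proportional scaling, which may differ from what SPAMc does internally. The function name, region names and numbers are hypothetical.

```python
def scale_to_national_total(subnational, national_total):
    """Scale subnational values proportionally so they sum to `national_total`."""
    current_total = sum(subnational.values())
    if current_total == 0:
        raise ValueError("cannot scale: subnational values sum to zero")
    factor = national_total / current_total
    return {unit: value * factor for unit, value in subnational.items()}


# Hypothetical harvested area (ha) of one crop in three regions.
regions = {"north": 100.0, "central": 300.0, "south": 100.0}

# Hypothetical FAOSTAT national total for the same crop.
scaled = scale_to_national_total(regions, national_total=600.0)
```

Each region keeps its share of the total, so the relative distribution in the subnational statistics is preserved while the sum matches the national figure.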
Download the All Data Normalized file (Production_Crops_E_All_Data_(Normalized).zip) from the FAOSTAT Crops statistics website here and save it in the data/raw/faostat folder.
Unzip the file, which creates the file Production_Crops_E_All_Data_(Normalized).csv. Rename the file using the following format: YYYYMMDD_faostat_crops.csv. In the Malawi example we use: 20200303_faostat_crops.csv
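The renaming step can be scripted. The sketch below builds a name in the YYYYMMDD_label format and renames a file accordingly. Both helper functions are hypothetical and not part of SPAMc.

```python
from datetime import date
from pathlib import Path


def dated_filename(download_date, label, extension="csv"):
    """Build a name such as 20200303_faostat_crops.csv."""
    return f"{download_date:%Y%m%d}_{label}.{extension}"


def rename_with_date(path, download_date, label):
    """Rename `path` in place to the dated name, keeping its extension."""
    path = Path(path)
    new_name = dated_filename(download_date, label, path.suffix.lstrip("."))
    return path.rename(path.with_name(new_name))


# Name used in the Malawi example.
name = dated_filename(date(2020, 3, 3), "faostat_crops")
```

The same helper can be used for the other files that follow this naming format by changing the label.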
The FAOSTAT Prices database is used to calculate the potential revenue at each grid cell, which is used to determine the fitness score/priors.
Download the All Data Normalized file (Prices_E_All_Data_(Normalized).zip) from the FAOSTAT Producer Prices Annual statistics website here and save it in the data/raw/faostat folder.
Unzip the file, which creates the file Prices_E_All_Data_(Normalized).csv. Rename the file using the following format: YYYYMMDD_faostat_prices.csv. In this case we use: 20200303_faostat_prices.csv
Data from AQUASTAT is used to inform the share of irrigated crops in a country. If there are better (national) sources of information to determine this, they can be used as a substitute for the AQUASTAT data.
Download the irrigation data from AQUASTAT. As AQUASTAT does not offer a bulk download option, the easiest way is to go to the AQUASTAT database here, tick the irrigation and drainage development box under variables and tick the target country. Click submit and then save the file by selecting csv (flat) at the top of the page. Save this file in the data/aquastat folder. The name is not relevant, so you can pick any name.
Unfortunately, the AQUASTAT csv file contains several empty lines and mixes up statistics and metadata, which requires a bit of manual cleaning. Open the csv file in Excel and copy all contents to a new file. Remove the empty lines at the beginning (the content of cell A1 should now be Area) and remove all rows starting with metadata: at the bottom of the file or (even better) move the metadata to a new worksheet called metadata. Rename the worksheet with the cleaned AQUASTAT data to data and save the file using the following format: YYYYMMDD_aquastat_irrigation.xlsx, in the Malawi case 20200303_aquastat_irrigation.xlsx
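The manual cleaning can be approximated in code. The sketch below only separates the lines of the raw csv file into data and metadata. It assumes that the header row starts with Area and that metadata rows start with metadata:, as described above. The function name and the sample rows are hypothetical, and writing the result to an xlsx file with two worksheets is left out.

```python
def split_aquastat_lines(lines):
    """Split raw AQUASTAT csv lines into (data_lines, metadata_lines)."""
    data, metadata = [], []
    header_seen = False
    for line in lines:
        text = line.strip()
        # Skip empty lines, including lines that hold only commas or quotes.
        if not text.strip(',"'):
            continue
        unquoted = text.lstrip('"')
        if unquoted.lower().startswith("metadata:"):
            metadata.append(text)
        elif header_seen or unquoted.startswith("Area"):
            # Keep the header row and everything after it as data.
            header_seen = True
            data.append(text)
        # Any other line before the header is dropped.
    return data, metadata


# Hypothetical file content, for illustration only.
raw_lines = [
    "",
    ",,",
    '"Area","Year","Value"',
    '"Malawi",2010,73.5',
    "",
    "metadata: hypothetical note",
]
data_lines, metadata_lines = split_aquastat_lines(raw_lines)
```

It is worth comparing the output with the original file before relying on it, as the layout of the AQUASTAT export may change.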
The availability of subnational statistics greatly improves the crop allocation process in SPAMc. Four pieces of information are needed:
As explained in Chapter @ref(blocks), a cropland extent is needed to demarcate the area where crops can be allocated. To account for the uncertainty in the cropland extent, a synergy cropland map is constructed. There are two options to obtain this type of map. The first is to take an existing product that is available for 2010, which can be readily used as input. The second is to construct a country-specific synergy cropland extent. The latter might be preferred if high-quality country-specific cropland maps are available. We explain both options below.
If SPAM is used to produce crop distribution maps for around 2010, it is possible to use a global synergy map produced by Lu et al. (2020). This map was also used for the global SPAM2010 (Yu et al. 2020). The map, with a resolution of 500 x 500 m, is constructed by means of the Self-adapting statistics allocation model (SASAM), which combines and ranks five different global cropland products: GlobeLand30, CCI-LC, GlobCover 2009, MODIS C5, and the Unified Cropland Layer, as well as several regional cropland products, e.g. CORINE land cover for Europe and cropland maps for Australia and China. After harmonization of cropland classes, resolution and projection, cropland area statistics from FAOSTAT are used to rank the cropland maps and construct a scoring table that reflects the agreement among the datasets. In addition, maps are produced that contain the medium and maximum cropland area per grid cell.
Sources: http://data.ess.tsinghua.edu.cn/ data at 250 m for 2001 and 2010! Wang, J., C. Li., P. Gong. 2015. “Adaptively weighted decision fusion in 30 m land-cover mapping with Landsat and MODIS data.” International Journal of Remote Sensing 36 (14): 3659-3674.
JRC dataset mentioned by ulrike and Yating
SPAM requires spatially explicit information on the location of irrigated area or areas equipped for irrigation. We combine two sources of information: (1) the Global Map of Irrigation Areas (GMIA) version 5 (???), which presents the location of the area equipped for irrigation at a resolution of 5 arcmin, and (2) the Global Irrigation Areas map (GIA) (???), which depicts irrigated area at a resolution of 30 arcsec.
To download the GMIA, visit the GMIA website here and click Download the Global Map of Irrigation Areas - version 5.0 - area equipped for irrigation expressed in hectares per cell.
Unzip and save the file in the data/raw/gmia folder. The name of the file should be (???).
1. To download the GIA, visit the GIA data repository here and download the zip file.
2. Unzip and save the files in the data/raw/gia folder, not in a subfolder.
Spatially explicit information on the biophysical suitability and related potential yield of crops is a key factor to inform the allocation of crops in SPAM. We use the latest version of the global agro-ecological zones (GAEZ) data (version 3.0) as a source of information [1]. More information about the GAEZ can be found here. As downloading bulk information from this website is problematic, we stored the relevant GAEZ maps in a data repository.
Download the GAEZ maps from the data repository and save them in the data/raw/gaez folder.
We use global maps of travel time to major cities as a proxy for the accessibility of cropland. To take into account changes in infrastructure over time, we use two products that each represent a different period. Nelson (2008) presents a global travel time map for 2000 and Weiss et al. (2018) present a comparable product for 2015. SPAM selects the older product if the model is run for a year before 2008, the midpoint between the two maps. If SPAM is run for 2008 or later, there is no need to download the map produced by Nelson (2008).
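The selection rule for the travel time product can be written as a one-line check. This is a sketch of the rule as described in this section, not the SPAM source code. The function name and the returned labels are hypothetical.

```python
def select_travel_time_product(model_year, midpoint=2008):
    """Return which travel time map applies to `model_year`.

    Years before the midpoint use the 2000 map (Nelson 2008);
    the midpoint year and later use the 2015 map (Weiss et al. 2018).
    """
    return "travel_time_2000" if model_year < midpoint else "travel_time_2015"
```

For example, a model run for 2005 would use the 2000 map, while a run for 2008 or 2010 would use the 2015 map.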
1. Download the travel time map for 2015 (Weiss et al. 2018), unzip and save all files in the processed/data/travel_time_2015 folder (make sure to remove any subfolder).
2. If relevant, download the Accessibility to cities 2000 map from here, unzip and save all files in the processed/data/travel_time_2000 folder (make sure to remove any subfolder).
We use two sources of information to create a map of a country’s rural population. The first is WorldPop, which presents time series of gridded population. WorldPop maps were generated by applying a machine learning approach to downscale subnational population information. The second source is the Global Rural-Urban Mapping Project (GRUMPv1), which presents polygon information on urban areas that are identified by the extent of the nighttime lights and approximated urban extents (circles) based on buffered settlement points [2]. SPAM combines the WorldPop and GRUMP datasets to create a map of rural population.
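One simple way to think about combining the two datasets is to keep the gridded population only in cells that fall outside the urban extents. The sketch below illustrates that idea on a tiny grid. It assumes that both datasets have already been brought to the same grid and that the urban polygons have been rasterized to a mask. The actual procedure in SPAM may differ, and the function name and values are hypothetical.

```python
def rural_population(population, urban_mask):
    """Set the population to zero in every cell flagged as urban."""
    return [
        [0.0 if urban else pop for pop, urban in zip(pop_row, mask_row)]
        for pop_row, mask_row in zip(population, urban_mask)
    ]


# Hypothetical 2 x 3 grid of population counts per cell.
population = [[10.0, 250.0, 5.0], [0.0, 300.0, 12.0]]

# Hypothetical urban mask on the same grid (True means urban).
urban_mask = [[False, True, False], [False, True, False]]

rural = rural_population(population, urban_mask)
```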
Save the GRUMP data in the processed/data/grump folder.
Lu, Miao, Wenbin Wu, Liangzhi You, Linda See, Steffen Fritz, Qiangyi Yu, Yanbing Wei, Di Chen, Peng Yang, and Bing Xue. 2020. “A cultivated planet in 2010: 1. the global synergy cropland map.” Earth System Science Data. https://doi.org/10.5194/essd-2020-12.
Nelson, A. 2008. “Travel time to major cities: a global map of accessibility.” Global Environment Monitoring Unit, Joint Research Centre of the European Commission. http://forobs.jrc.ec.europa.eu/products/gam/.
Weiss, D. J., A. Nelson, H. S. Gibson, W. Temperley, S. Peedell, A. Lieber, M. Hancher, et al. 2018. “A global map of travel time to cities to assess inequalities in accessibility in 2015.” Nature 553 (7688): 333–36. https://doi.org/10.1038/nature25181.
Yu, Qiangyi, Liangzhi You, Ulrike Wood-Sichra, Yating Ru, Alison K. B. Joglekar, Steffen Fritz, Wei Xiong, Wenbin Wu, and Peng Yang. 2020. “A cultivated planet in 2010: 2. the global gridded agricultural production maps.” Earth System Science Data. https://doi.org/10.5194/essd-2020-11.
[1] It is expected that the results for GAEZv4 will be published soon. We will update this section when new data is available.
[2] GRUMP is a somewhat outdated product as it presents data for the year 1995. We aim to replace it by a more recent source (???).
[3] WorldPop also offers the possibility to download individual country maps at a resolution of 100 m x 100 m. These files are much larger, in particular for large countries like China, and are not required as we will aggregate the population maps to 30 arcsec, the current highest SPAM resolution.