Combine input data • mapspam2globiom

Now you have created all the necessary input, it is time to combine and reshape them so they can be used as input by SPAMc. The sections below briefly describe the main functions in mapspam2globiom to do this. Nearly all functions only require one input param, the object with the SPAMc parameters. Note that, although the functions might look simple as, ‘under the hood’ a lot of other functions are triggered, which automatically load the data that was created in previous steps, check them for consistency, reformat them from spatial to data table format and, where needed, run algorithms to harmonize the various inputs. This means that some of the functions might take some time to run, in particular if the resolution is set to 30 arcsec, which considerable increases the size of the model. All the functions send a message to the screen when they are processing and when they are finished so you know what is happening.

All the intermediate data output is saved in the processed_data/intermediate_output folder. In case you have set solve_level = 1, the various functions split the data in administrative level 1 chunks, which are saved in subfolders using the administrative unit level 1 as name. If solve_level = 0 there wil be only one subfolder that used the country iso3c code as name.

Prepare FAOSTAT crop prices

Potential revenue at the grid cell level is used to construct the scores/priors for the high-input and irrigated farming systems (see Model description for details). 00_prepare_faostat_crop_prices extracts and prepares this using the FAOSTAT price database you downloaded before (see Input data collection). As price data is often incomplete at the country level, we decided to use continental averaged prices instead. Make sure sure to set the faostat_version in the script, which corresponds to the date the FAOSTAT files were downloaded.

Prepare cropland

The function prepare_cropland() combines the three synergy cropland components (medium and maximum cropland and cropland ranking maps) into a data table and stores this in a file.

prepare_cropland(param)

Prepare irrigated area

prepare_irrigated_area() is similar to the function that prepares the cropland as it combines the synergy irrigated area maps (maximum irrigated area and ranking maps) into one file.

prepare_irrigated_area(param)

Prepare statistics

prepare_physical_area() combines the three agricultural statistics input files (harvested area, farming system shares and cropping intensity) to calculate the physical cropping area for all administrative units.

prepare_physical_area(param)

Harmonize inputs

Before the SPAMc can be run it is essential to harmonize the physical, cropland and irrigated area. As this data is coming from different sources, they will be never consistent. This would not be a problem if the cropland exent would be larger than the physical crop area, meaning there would be enough space to allocate the statistics on the cropland map. Similarly if the total irrigated area in the irrigated area map would be larger than the physical area of the irrigated farming systems the data would fit on the map. Unfortunately, often this does not happen, implying that SPAMc cannot be solved. In practice, we use ‘slack variables’ to ensure the model always solves (see Appendix). However, large slacks in the solution signal serious inconsistencies in the data and need to be corrected as much as possible.

harmonize_inputs() uses a number of steps to harmonize the various data sources:

Compare and harmonize Cropland and statistics. As a starting point the available cropland is set to the medium cropland values from the synergy cropland map. For each individual subnational unit (if data is not missing) and starting with the most detailed level in the data, the total available (medium) cropland underlying the unit is compared with the physical area from the statistics that need to be allocated. If the statistics ‘fit’ no adjustments are made. If they do not not fit, the administrative unit cropland is expanded by switching to the maximum cropland value from the synergy cropland map. Subsequently, the cropland area and physical area are compared again and a warning is issued when there still is not enough cropland and the model will introduce slacks to make it solve. In a next iteration, the cropland and physical area of the units in the subsequent administrative level are harmonized taking into account the expansion of cropland in the previous iteration. This continues till cropland and physical area at the national level are compared and harmonized.

In the end, the user has to decide if the slacks are acceptable or not. In our opinion small slacks (measured as share of total or administrative unit physical crop area) are no problem to deal with inconsistencies. However it slacks become very large we recommend scrutinizing your statistics and where possible make adjustments. Large slack often results from data entry errors or too rigid cropping intensity values. We provide some advise on how to deal with slack in the Appendix.
Compare and harmonize irrigated area and statistics In the next step, the irrigated area from the synergy irrigated area map and the irrigated physical area from the statistics are compared and harmonized. We start by ranking all irrigated grid cells from the most (rank 1) to the least (rank 10) reliable. Next, for each grid cell, we set the irrigated area to the minimum of the cropland area (taken from the previous harmonization step) and the irrigated area from the synergy irrigated area map. The grid cells are subsequently aggregated till the accumulated area is slightly larger than the irrigated physical area from the statistics. If the physical area turns out to be larger than the total irrigated cropland, meaning it does not fit, a new iteration is started in which the irrigated area per grid cell is increased by taking the maximum of the cropland area and the irrigated area. If this is still not sufficient, the irrigated area is further enlarged by taking the maximum of the maximum cropland area and the irrigated area for each grid cell. It this is still not sufficient a warning is issued that solving the model will introduce slack. Finally, the grid cell ranking from the synergy cropland is adjusted to factor in all the selected irrigated area cells.
Select grid cells to match with statistics In the final harmonization step the cropland and irrigation extent are compared with the statistics. Similar to the previous step, the grid cells are ranked and the cropland is aggregated stating with the most preferred grid cells (now also including the irrigated area grid cells) till it is slightly larger than the physical area from the (sub)national statistics. This is consequatively done for each individual administrative unit starting with the most detailed level and ending at the national level. This process makes sure that the cropland and irrigated area extent is reduced to include only the most reliable grid cells, while at the same time it ensures that they are still large enough to fit the physical crop area, including the irrigated area. The final cropland and irrigated area extent consists of the union of grid cells that are selected at each (sub)national administrative unit level.

harmonize_inputs(param)

Prepare priors and scores

The only thing left to do in terms of preparing the model input is to create the priors and the scores. Not surpisingly, this is done by prepare_priors_and_scores(). For convenience, the function will always create both the priors and the scores as model input even though one of the two might be redundant because you only want to run min_entropy, which requires the priors, or max_score, which requires the scores. In this way, you can easily change the model type parameter and directly run the model, without going through the data pre-processing steps.

Note that the function might take a some time to run as it will run three consecutive processes. First, the biophysical suitability and potential yield maps for all farming system and crop combinations are loaded and only grid cells that overlap with the cropland exent from the previous step are selected, after which all data is merged into one table and saved. This process also checks if the maps do not only contain zero values and, where needed, replaces the map by a substitute crop. This is important because it occasionally happens that the biophysical suitability and potential yield maps indicate zero suitability for a specific crop although the statistics indicate it is produced in the country. If we would not correct for this, most scores and priors for this crop would be zero, resulting in an ‘uninformed’ allocation of the crop, meaning it can be placed anywhere as long as the the constraints are satisfied and the objective function (minimization of cross-entropy or maximization of fitness score) is optimized. In case all the substitute crops have zero values, a warning is issued. We prepared a list of substitute crops that is stored in the mappings/replace_gaez.sv file. You can modify the list to add other substitute crops if you think these are more appropriate. The only requirement is that selected crop must be in the list of SPAM crops that is stored in mappings/crop.csv. The second and third process create data files with the priors and scores using the biophysical suitability and potential yield, among others, as input data.

prepare_priors_and_scores(param)

Finally, all the inputs, including the harmonized cropland extent, irrigated area extent and statistics, and the priors/scores are combined in one GAMS gdx file, which is used as input by SPAMc. The file contains a number of sets and parameter tables that define the model. Sets describe the dimensions of the model, while parameters contain the data along these dimensions. As part of the process to combine all the inputs and if relevant, artificial administrative units are created that represent the combination of all administrative units per crop for which subnational statistics are missing. These units are added to the list of administrative units from the subnational statistics. The names of these units, stored in the adm_area parameter table, start with the name of the lower level administrative unit which nests the units with missing data, followed by ART and the level for which data is missing and ending with the crop for which data is not available.

combine_inputs(param)