Model setup • mapspam2globiom

To setup SPAM-C several steps need to be taken. By now, we assume that all required software is installed and working. It is by far the easiest to create a new RStudio project and use the mapspam2globiom_mwi repository for Malawi as a template as a basis of your country SPAM-C version. All the steps described in the Articles run SPAM-C section use the code in mapspam2globiom_mwi repository as an example code so there is no need to start from scratch. All scripts related to model setup can be found in the 01_model_setup folder of the template. The most important is that you change the location on your hard disk where the model will be stored and set the model parameters (see below), which is done in the script 01_model_setup.r in the template. As the model parameters are needed as input in nearly all functions, this particular script is always loaded at the beginning of all the other scripts using the source() command. In this way, you only have to setup up the model once. It also ensures all the necessary packages are loaded into R.

Set model parameters and location of files

Nearly all functions in mapspam2globiom need input on (1) key parameters that determine the design of the model and how it will be solved and (2) the location of raw and intermediate data, which will be further processed. Both pieces of information are bundled in a spam_par object that is created by the function set_spam_par(). Another piece of information that is also stored in the spam_par object is the coordinate reference system of the maps. This is automatically set to WSG84 (epsg:4326).

The following model parameters need to be set before SPAM-C can be run:

iso3c country code (iso3c). The three letter ISO 3166-1 alpha-3 country code, also referred to as iso3c is used to extract national statistics from global datasets. A list of country codes can be found in Wikipedia.
Year (year). The year (four digits) will be used to select relevant global datasets when data is available for multiple years, in particular the FAOSTAT national statistics, population maps and and travel time maps. It is also useful to distinguish SPAM-C output when the model is used to create crop distribution maps for multiple years. (see spatial data).
Spatial resolution (res). The user can choose between two spatial resolutions: 30 arcsec (~1x1 km at the equator) by selecting "30sec" and 5 arcmin (~10x10 km at the equator) by selecting "5min". Note that selecting a resolution of 30 arcsec increases the model dimensions (number of grid cells) by a factor 100! This might mean the model becomes too big to be solved and/or your computer runs out of memory. We recommend experimenting with the 5 arcmin resolution first!
Depth of subnational statistics (adm_level). This parameter sets the most detailed level at which subnational administrative statistics are available. Accepted inputs are 0 (only national level data), 1 (national level and first level subnational data and 2 (national level, first and second level subnational data). In most cases first level data refers to states and province and level 2 data indicates districts, counties and departments. However, in the unusual case that you would have only national and county level data for a country (i.e. there is no state or province data available), you should set the adm_level parameter to 1 as the depth of your subnational statistics is 1.
Level at which the model is solved (solve_level). The model can be solved at the national level (0) or at the subnational level 1 (1). In the first case, all grid cells are optimized using national level constraints. This means that for crops for which no subnational level information is available, the national physical crop area will be distributed over the whole country, subject to the standard model constraints. In the latter case, the model is split into smaller administrative level 1 models, which are solved separately. Breaking up the model in smaller pieces will result in less complex and less memory intensive models, which are easier to solve. The disadvantage is that the subnatonal level models require full information (i.e. harvested area statistics, cropping intensity statistics and farmin system area shares) for all crops that are cultivated in the country at administrative unit level 1. For most countries, the subnational statistics tend to be incomplete and full crop information is only available at the national level (from FAOSTAT). Hence, the only way to run the model at administrative unit level 1 is to impute the missing crop area values (e.g. by taking values for comparable crops for which information is available, use national values or look for secondary information). Running the model at administrative unit 1 level is most useful for larger countries, such as China, Brazil and the USA, for which subnational statistics coverage is high and models tend to be large and complex due to the size of the country.
Type of model (‘model’). You can choose between two types of models: (1) minimization of cross-entropy (’“min_entropy”`) and (2) maximization of fitness score. See the section on Model description for details.

Similarly, the following two folders need to be defined when setting up the model:

SPAM main folder location (spam_path). This folder will be used to store all model related data, including processed data, mapping tables, intermediate output, final results and, unless specified otherwise by the user, the raw data.
Raw data folder (raw_path). Optionally the user can set a folder for the raw data. This might be convenient if hard disk capacity is insufficient to store the several gigabyte of data that are needed to store the global spatial datasets. In this case, a data server could be used to store the data.

The example below sets the SPAM-C parameters for Malawi and stores them in an object called param. Param is an arbitrary name but we recommend not to change it as we refer to param in the documentation of most functions.

# Load mapspam2globiom
library("mapspam2globiom")

# Set the folder where the model will be stored
# Note that R uses forward slashes even in Windows!!
spamc_path <- "C:/Users/dijk158/Dropbox/mapspam2globiom_mwi"

# Set SPAM-C parameters
param <- spam_par(spam_path = spamc_path,
 iso3c = "MWI",
 year = 2010,
 res = "5min",
 adm_level = 1,
 solve_level = 0,
 model = "max_score")
#> raw_path is not defined, set to raw_data in main folder

# Show parameters
print(param)
#> iso3c:  MWI 
#> year:  2010 
#> resolution:  5min 
#> adm level:  1 
#> solve level:  0 
#> model:  max_score 
#> spam path:  C:/Users/dijk158/Dropbox/mapspam2globiom_mwi 
#> raw data path:  C:/Users/dijk158/Dropbox/mapspam2globiom_mwi/raw_data 
#> country name:  Malawi 
#> iso3n:  454 
#> fao code:  130 
#> continent:  Africa 
#> crs:  +init=epsg:4326

Create SPAM-C folder structure

mapspam2globiom includes a function to create all necessary folders create_spam_folders(). The only input required is param which contains all the model parameters. The function will create three folders: (1) raw_data is created in the main spam_path unless indicated otherwise by the user and serves to store all the raw input data, (2) processed_data will be created in the main spam_path and is used to save all the intermediary data products as well as the final results and (3) mappings will be created in the main spam_path and contains all the lists and mappings that are needed to define the model (e.g. number crops) and harmonize various data sources (concordance tables). All tables are in csv format and will be created automatically into the mappings folder. Note that it is possible to adjust some of these tables if needed (e.g. adding a new crop). create_spam_folders() will only create csv files if the files do not exist. Hence, updated files will not be overwritten. If for some reason you want to recreate the original tables, simply remove the folder and/or files and they will be created again when create_spam_folders is run.

# Create SPAM-C folder structure in the spamc_path
create_spam_folders(param)

Prepare subnational administrative unit map

To allocate subnational crop area information, a complementary map (most likely in the form of a shapefile) with the location of the subnational administrative units is needed. The processing of this map cannot be captured by a single function as it involves several decisions that need to be made by the user. These are explained next.

Before the shapefile can be processed you need to load it into R by setting the name (in this case adm_2010_MWI.shp), which is stored in the raw_data/adm/ folder. The st_transform command projects the map to WSG84 so that it is consistent with all the other spatial data.

# replace the name of the shapefile with that of your own country.
iso3c_shp <- "adm_2010_MWI.shp"

# load shapefile
adm_map_raw <- read_sf(file.path(param$spam_path, glue("raw_data/adm/{iso3c_shp}")))

# plot
plot(adm_map_raw$geometry)

# Project to standard global projection
adm_map <- adm_map_raw %>%
  st_transform(param$crs)

For the model to work, it is essential that the attribute table with the names and codes of the administrative units has the correct column names. To make this easier, you only have to specify the column names of the attribute table which contain the names and code of the (sub)national units and store them in the admX_name_orig and admX_code_orig variables, respectively, where X is the administrative unit level. If you only have data at level 0 and 1, simply delete the lines of script that refer to administrative unit level 2, etc. In case of Malawi, we have data at level 0, 1 and 2 so for each level the column names for name and code need to be set. Columns in the attribute table that are not relevant will be automatically deleted. The script will also union all polygons with the same name and code. Hence, if you only have statistics for the combination of two subnational regions, simply give them the same name and code and they will automatically be dissolved into one polygon.

# Check names
head(adm_map)
names(adm_map)

# Set the original names, i.e. the ones that will be replaced. Remove adm1
# and/or adm2 entries if such data is not available.
adm0_name_orig <- "ADM0_NAME"
adm0_code_orig <- "FIPS0"
adm1_name_orig <- "ADM1_NAME"
adm1_code_orig <- "FIPS1"
adm2_name_orig <- "ADM2_NAME"
adm2_code_orig <- "FIPS2"

# Replace the names
names(adm_map)[names(adm_map) == adm0_name_orig] <- "adm0_name"
names(adm_map)[names(adm_map) == adm0_code_orig] <- "adm0_code"
names(adm_map)[names(adm_map) == adm1_name_orig] <- "adm1_name"
names(adm_map)[names(adm_map) == adm1_code_orig] <- "adm1_code"
names(adm_map)[names(adm_map) == adm2_name_orig] <- "adm2_name"
names(adm_map)[names(adm_map) == adm2_code_orig] <- "adm2_code"

# Only select relevant columns
adm_map <- adm_map %>%
  dplyr::select(adm0_name, adm0_code, adm1_name, adm1_code, adm2_name, adm2_code)

# Union separate polygons that belong to the same adm    
adm_map <- adm_map %>%
  group_by(adm0_name, adm0_code, adm1_name, adm1_code, adm2_name, adm2_code) %>%
  summarize() %>%
  ungroup() %>%
  mutate(adm0_name = param$country,
         adm0_code = param$iso3c)

# Check names
head(adm_map)
names(adm_map)

Polygons of administrative units where no crops should be allocated must be removed. If this is not done and in the rare case when the area statistics are larger than the cropland exent, the model might still allocate crops into these regions. The Malawi shapefile includes two of such regions. One is the Area under National Administration, which is the part of Lake Malawi that belongs to Malawi and Likoma, several small islands in lake Malawi that are not covered by the statistics. The script below shows how to remove them. It there is no reason to remove any polygons, simply delete the lines of script in the template. By running create_adm_map_pdf() a pdf file with maps of the (sub)national regions is created in the processed_data/maps/adm folder. Finally, the script saves the maps in shapefile and rds format in the relevant folder.

par(mfrow=c(1,2))
plot(adm_map$geometry, main = "ADM all polygons")

# Set names of ADMs that need to be removed from the polygon. 
adm1_to_remove <- c("Area under National Administration")
adm2_to_remove <- c("Likoma")

# Remove ADMs
adm_map <- adm_map %>%
  filter(adm1_name != adm1_to_remove) %>%
  filter(adm2_name != adm2_to_remove)

plot(adm_map$geometry, main = "ADM polygons removed")
par(mfrow=c(1,1))

# Save adm maps
temp_path <- file.path(param$spam_path, glue("processed_data/maps/adm/{param$res}"))
dir.create(temp_path, showWarnings = FALSE, recursive = TRUE)

saveRDS(adm_map, file.path(temp_path, glue("adm_map_{param$year}_{param$iso3c}.rds")))
write_sf(adm_map, file.path(temp_path, glue("adm_map_{param$year}_{param$iso3c}.shp")))

# Create pdf file with the location of administrative units
create_adm_map_pdf(param)

To link the subnational statistics to other spatial data a mapping table is needed (adm_list) that contains information on how the administrative units at various levels are nested, which is exactly the information that is stored in the attribute table of the administrative unit map. The following code, extracts the list from the shapefile and saves it as csv file in the processed_data/lists folder.

# Create adm_list
create_adm_list(adm_map, param)

Create grid and rasterize subnational administrative unit map

The create_grid() function creates a spatial grid at selected spatial resolution (i.e. 30 arcsec or 5 arcmin). The grid is subsequently used by rasterize_adm_map(), which rasterizes the administrative unit map at the selected resolution. Both the grid and the rasterized map are required by the model as input.

# Create grid
create_grid(param)

# Rasterize administrative unit map
rasterize_adm_map(param)

This is all it takes to set up the SPAM-C model! The next step is processing the raw subnational statistics and spatial data.