process_subnat_stat.Rmd
FAOSTAT crops
Run the script `agricultural_statistics\1_process_faostat_crops.r`. Before you run the script, make sure to set `faostat_version`, corresponding to the date the FAOSTAT files were downloaded, in this case `20200303`. The script produces country-specific files with harvested area statistics and saves them in the `data\processed\agricultural_statistics` folder, in our example `faostat_crops_2020_CHN.rds`.
FAOSTAT prices
3. Run the script (???) `agricultural_statistics\1_process_faostat_prices.r`. Before you run the script, make sure to set `faostat_version`, corresponding to the date the FAOSTAT files were downloaded, in this case `20200303`. The script produces country-specific files with price statistics and saves them in the `data\processed\agricultural_statistics` folder, in our example `faostat_crop_prices_2010_CHN.rds`.
AQUASTAT
3. Run `process_aquastat.r`. Note that AQUASTAT uses a category ‘Other fruits’ to refer to irrigated fruits. This category, which is relatively rare, is mapped to the Tropical fruits (`trof`) class in SPAM. However, for some countries it might be more appropriate to map Other fruits to the Temperate fruits class (`temf`). This can be done manually in the script. The code displays a message when the Other fruits category is present in the AQUASTAT data.
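The mapping rule above can be sketched as follows. This is a minimal Python illustration, not the code in `process_aquastat.r`. The function `map_aquastat_crops`, the parameter `other_fruits_class` and the two base entries (`Maize`, `Rice`) are hypothetical. Only the ‘Other fruits’ rule (`trof` by default, `temf` as an alternative) and the message come from the text.

```python
# Illustrative sketch only; not the code in process_aquastat.r.
# BASE_AQUASTAT2SPAM entries are hypothetical examples; only the
# 'Other fruits' -> trof (default) / temf rule comes from the text.
BASE_AQUASTAT2SPAM = {
    "Maize": "maiz",  # hypothetical example entry
    "Rice": "rice",   # hypothetical example entry
}


def map_aquastat_crops(aquastat_crops, other_fruits_class="trof"):
    """Map AQUASTAT crop labels to SPAM crop codes.

    'Other fruits' goes to Tropical fruits (trof) by default;
    pass other_fruits_class="temf" to use Temperate fruits instead.
    """
    if other_fruits_class not in ("trof", "temf"):
        raise ValueError("other_fruits_class must be 'trof' or 'temf'")
    mapping = {**BASE_AQUASTAT2SPAM, "Other fruits": other_fruits_class}
    if "Other fruits" in aquastat_crops:
        # Flag the rare category so the user can review the mapping.
        print(f"AQUASTAT contains 'Other fruits'; mapped to '{other_fruits_class}'.")
    # A KeyError here means a label still needs an entry in the mapping.
    return {crop: mapping[crop] for crop in aquastat_crops}
```

For a country where temperate fruits dominate, one would call `map_aquastat_crops(labels, other_fruits_class="temf")`. Any label without a mapping raises a `KeyError`, which makes gaps in the mapping visible.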
Here we only present a brief summary of the steps to prepare the data for the model. For examples, have a look at `subnational_harvested_area_2010_MWI.csv`, `farming_system_shares_2010_MWI.csv` and `cropping_intensity_2010_MWI.csv` in the `data/raw/subnational_statistics` folder.
A .csv file with (raw) subnational statistics in wide format. The file should contain harvested area for each ADM, down to the most detailed ADM level for which data is available. Hence, if ADM2-level data is available, data should be supplied for ADM0, ADM1 and ADM2. If ADM1-level data is available, data should be supplied for ADM0 and ADM1. If no subnational data is available, there is no need to create this file, as national data will be taken from FAOSTAT.
The first three columns of the file should contain the ADM name (`adm`), ADM code (`adm_code`) and ADM level (`adm_level`). The names and codes should match exactly those used in the country shapefile. The next columns, one for each crop, contain harvested area per crop and subnational unit, with the crop name as header. Crop information should be aggregated to the 40 SPAM crops and crop groups. A one-to-one mapping table between the crop names in the raw statistics file and the SPAM list must be stored in the `crop_orig2crop` worksheet of the `data/mappings/mappings_spam.xlsx` file. Note that crop names may not contain spaces, but underscores (`_`) are allowed.
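A minimal sketch of a header check for the raw file described above, written in Python for illustration only. It assumes the `crop_orig2crop` worksheet has already been read into a dictionary of the form `{raw crop name: SPAM crop}`. `validate_header` is a hypothetical helper, and the toy CSV uses made-up values.

```python
import csv
import io

# Column names from the text. crop_orig2crop stands in for the worksheet of
# the same name in data/mappings/mappings_spam.xlsx (reading it is not shown).
ID_COLUMNS = ["adm", "adm_code", "adm_level"]


def validate_header(header, crop_orig2crop):
    """Return a list of problems in the header of a raw subnational CSV."""
    problems = []
    if header[:3] != ID_COLUMNS:
        problems.append(f"first three columns must be {ID_COLUMNS}, got {header[:3]}")
    crops = header[3:]
    for crop in crops:
        if " " in crop:
            problems.append(f"crop name contains a space: '{crop}'")
        if crop not in crop_orig2crop:
            problems.append(f"crop '{crop}' has no entry in crop_orig2crop")
    # One-to-one: no two raw crop columns may map to the same SPAM crop.
    targets = [crop_orig2crop[c] for c in crops if c in crop_orig2crop]
    for spam_crop in sorted({t for t in targets if targets.count(t) > 1}):
        problems.append(f"several columns map to SPAM crop '{spam_crop}'")
    return problems


# Toy file with made-up values.
raw = "adm,adm_code,adm_level,maize,sweet_potato\nCountry,C,0,100,-999\n"
header = next(csv.reader(io.StringIO(raw)))
print(validate_header(header, {"maize": "maiz", "sweet_potato": "swpo"}))
```

With the toy header, the call prints an empty list. Returning a list of problems, rather than raising on the first one, reports all issues in one pass.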
The crops for which subnational data is available should be consistent with the list of crops for which FAOSTAT presents data. We use FAOSTAT as the main source for national statistics, and all subnational information will be scaled to FAOSTAT. See (???) for more information on how to do this. Empty cells or -999 can be used to indicate that data is missing for certain crop and ADM combinations. If a crop is not grown at all in the country (meaning FAOSTAT does not present data for it), values for all ADMs should be set to zero.
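The text does not spell out the exact rescaling rule, so the following is only a sketch under one assumption: each crop column is scaled proportionally so that its ADM0 value matches the FAOSTAT harvested area, while empty cells and -999 stay missing. `parse_value` and `scale_to_faostat` are hypothetical names. The ADM codes used in the example are made up.

```python
MISSING = -999  # missing-value code used in the raw files


def parse_value(cell):
    """Convert a raw cell to float; empty cells and -999 become None."""
    if cell is None or str(cell).strip() == "":
        return None
    value = float(cell)
    return None if value == MISSING else value


def scale_to_faostat(rows, crop, faostat_area):
    """Scale one crop column so that its ADM0 value equals the FAOSTAT area.

    rows: list of dicts with 'adm_code', 'adm_level' and a raw cell per crop.
    Returns {adm_code: scaled value or None}; missing values stay missing.
    """
    values = {r["adm_code"]: parse_value(r[crop]) for r in rows}
    adm0 = [values[r["adm_code"]] for r in rows if int(r["adm_level"]) == 0]
    if len(adm0) != 1 or adm0[0] in (None, 0.0):
        raise ValueError(f"{crop}: need exactly one non-zero ADM0 value")
    factor = faostat_area / adm0[0]
    return {code: None if v is None else v * factor for code, v in values.items()}
```

Take an ADM0 value of 200, an ADM1 value of 150, a second ADM1 coded -999 and a FAOSTAT area of 300. The factor is 1.5, so the known values become 300 and 225, while the -999 entry stays `None`.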
Folder: `01_adm_and_grid`
Key elements of SPAM are the definition of the country borders, the location of the administrative areas (ADMs) for which additional crop statistics are available, and the creation of a grid at a selected resolution that is used to allocate the crop and farming system shares. To run the model, a shapefile (or any other vector/polygon format) is needed that shows the country borders and the location of the ADMs at which the model is run.
The shapefile should be stored in the `data\raw\adm` folder. It is crucial that the file contains the following information (in the attribute table):
The above can be illustrated using the Malawi example (???). The Malawi model is run at ADM2 level, and therefore the shapefile contains the location of the ADM2s. However, the attribute table has six columns, which present a unique name and a unique ID for the ADM0 (i.e. country name and iso3c code), the ADM1s and the ADM2s. For at least one crop, statistics are available for all ADMs ((???)), so there is no reason to remove an ADM. But suppose that for (???) the entries for all crops were -999 (missing). The polygon for this ADM should still be included if there is cropland and the model should allocate crops in that region.
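The attribute-table requirements in the Malawi example can be checked with a small script. This Python sketch makes two assumptions. First, the six columns are named `adm0_name`, `adm0_code`, `adm1_name`, `adm1_code`, `adm2_name` and `adm2_code`; these are hypothetical names, and the actual shapefile may differ. Second, the table has already been read into a list of dictionaries, one per ADM2 polygon. `check_attribute_table` is a hypothetical helper.

```python
def check_attribute_table(records, levels=(0, 1, 2)):
    """Check that names and codes are unique per ADM level.

    records: one dict per polygon with keys adm{level}_name and
    adm{level}_code (hypothetical column names).
    Returns a list of problems, empty if none are found.
    """
    problems = []
    for level in levels:
        code2name = {}
        for rec in records:
            code, name = rec[f"adm{level}_code"], rec[f"adm{level}_name"]
            # The first name seen for a code is stored; a different one is a problem.
            if code2name.setdefault(code, name) != name:
                problems.append(f"ADM{level} code {code} has more than one name")
        names = list(code2name.values())
        for name in sorted({n for n in names if names.count(n) > 1}):
            problems.append(f"ADM{level} name {name} is used by more than one code")
    # Each unit at the most detailed level should appear exactly once.
    lowest = f"adm{max(levels)}_code"
    codes = [rec[lowest] for rec in records]
    if len(codes) != len(set(codes)):
        problems.append(f"{lowest} is not unique: expected one polygon per unit")
    return problems
```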
(??? TO ANNEX) A contrasting case is formed by two ADMs. The first is the Area under National Administration (MI01/MI01001), the part of Lake Malawi that falls within the borders of Malawi. The second is Likoma (MI03007), several small islands in this part of the lake. Obviously, we do not want to allocate any crops to the Area under National Administration. Normally, this area would not have any cropland, and therefore the model would not allocate any crops there. However, the country cropland layer is created by overlaying a polygon (with the country and ADM boundaries) with a raster (the cropland extent). As a result, several cropland cells may end up in, in this case, the Area under National Administration. These border effects are larger at lower resolutions, where larger grid cells are split by the polygon boundaries. Despite the erroneous availability of cropland in the Area under National Administration, the model will not allocate any crops there if the subnational statistics are set to zero. However, if for some reason the subnational statistics do not fit and the model introduces slack, cropland may still be allocated to the Area under National Administration because it is available there. To avoid this, we removed the region from the ADM polygon.
For the same reason, we remove Likoma. Although there may be cropland on these islands, the statistics do not include information on it. We therefore assume no crops are produced on Likoma (and if they are, the area will be very small anyway), and we also remove the polygons for Likoma from the shapefile. The (???) script shows how this can easily be done in R.
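Removing ADMs such as the Area under National Administration (`MI01001`) and Likoma (`MI03007`) amounts to a row filter on the attribute table. The (???) script does this in R. The Python sketch below only illustrates the logic on plain dictionaries. The name `drop_adms` and the field name `adm2_code` are assumptions.

```python
def drop_adms(records, drop_codes, code_field="adm2_code"):
    """Return the polygon records whose code is not in drop_codes.

    Raises ValueError if a code to drop is not found, which usually
    signals a typo in the list of codes.
    """
    drop_codes = set(drop_codes)
    found = {rec[code_field] for rec in records}
    unknown = drop_codes - found
    if unknown:
        raise ValueError(f"codes not in attribute table: {sorted(unknown)}")
    return [rec for rec in records if rec[code_field] not in drop_codes]
```

In R, the same row filter can be applied to an `sf` object. Each polygon's geometry is removed together with its attribute row.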
Run `01_process_adm.r`. Note that the script requires the following user input:
- `adm_2010_MWI.shp`
Run `02_create_adm_pdf.r`. This script creates a PDF file in the `data/processed/adm` folder that depicts the ADM map (or maps, if ADM1 and ADM2 data is available) with the ADM names. These maps are often helpful when processing and interpreting subnational data and results.
Raw subnational statistics data file with harvested area per ADM, down to the most detailed ADM level for which data is available. Hence, if ADM2-level data is available, data should be supplied for ADM0, ADM1 and ADM2. If ADM1-level data is available, data should be supplied for ADM0 and ADM1. If no subnational data is available, there is no need to create this file, as national data will be taken from FAOSTAT.
Subnational data should be supplied in a fixed format, described below. The easiest way to see this is probably to look at `subnational_statistics_2010_MWI.csv`, which illustrates it.
Data should be organized in a wide format. The first three columns contain the ADM name (`adm`), ADM code (`adm_code`) and ADM level (`adm_level`). These are followed by named columns, one for each crop, with harvested area per subnational unit.
Crop information should be aggregated to the 40 SPAM crops and crop groups (see (???) for suggestions on how to expand the number of crops). For convenience, the crop names (i.e. the headers of the crop data columns) can differ from the SPAM crop names. This is possible because a one-to-one mapping table between the crop names in the raw statistics file and the SPAM list must be stored in any case, in the `crop_orig2crop` worksheet of the `data/mappings/mappings_spam.xlsx` file. Note that crop names may not contain spaces, but underscores (`_`) are allowed.
The crops for which subnational data is available should be consistent with the list of crops for which FAOSTAT presents data. We use FAOSTAT as the main source for national statistics, and all subnational information will be scaled to FAOSTAT. Hence, there is an inconsistency if the subnational statistics indicate that a certain crop (say, sweet potato) is produced while FAOSTAT reports no harvested area for it.
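The sweet potato case above can be detected automatically. The following is a minimal Python sketch. It assumes that subnational totals and FAOSTAT areas have already been aggregated to SPAM crop codes (here `swpo` for sweet potato and `maiz` for maize). `crops_missing_in_faostat` is a hypothetical helper.

```python
def crops_missing_in_faostat(subnat_totals, faostat_area):
    """Return SPAM crops with positive subnational area but no FAOSTAT area.

    subnat_totals: {spam_crop: subnational total, or None if missing}
    faostat_area: {spam_crop: national harvested area from FAOSTAT}
    """
    return sorted(
        crop
        for crop, area in subnat_totals.items()
        # A crop absent from FAOSTAT or reported with zero area counts as "no area".
        if area is not None and area > 0 and not faostat_area.get(crop)
    )
```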
Adding crops which are not produced in the country (all values zero) or for which there are no subnational statistics (all values missing) is optional. They will be filtered out when processing the data.
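A sketch of the filtering step follows. It assumes each crop column has been parsed into a list of floats, with `None` for missing values. `drop_uninformative_crops` is a hypothetical name, not a function from the processing scripts.

```python
def drop_uninformative_crops(table):
    """Drop crop columns that are all zero or all missing (None).

    table: {crop: [value per ADM, float or None]}. Columns that mix zeros
    and missing values are kept, because the zeros still tell the model
    where the crop is not grown.
    """
    return {
        crop: values
        for crop, values in table.items()
        if not (all(v is None for v in values) or all(v == 0 for v in values))
    }
```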
Data should be consistent and add up. For example, the sum of maize harvested area over all ADM2s should be smaller than or equal to the maize harvested area of the ADM1 to which these ADM2s belong. If maize harvested area is available for all ADM2s that belong to one ADM1, the sum of the ADM2s should equal the ADM1 value. If information is missing for some of the ADM2s, their sum can be lower. The script to process the statistics includes some code to check the consistency of the data.
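The consistency rule can be written as a check on one crop and one ADM1 at a time. This is a Python sketch of the rule as stated, not the check implemented in the processing script. The function name and the absolute tolerance `tol` (used to absorb rounding) are assumptions.

```python
def check_adm2_sum(adm1_value, adm2_values, tol=1e-6):
    """Compare one crop's ADM1 value with the ADM2s that belong to it.

    Missing values are None. Returns a message describing the problem,
    or None if the values are consistent (or cannot be checked).
    """
    if adm1_value is None or not adm2_values:
        return None
    present = [v for v in adm2_values if v is not None]
    total = sum(present)
    # The ADM2 sum may never exceed the ADM1 value.
    if total > adm1_value + tol:
        return f"ADM2 sum {total} exceeds ADM1 value {adm1_value}"
    # With no missing ADM2s, the sum must match the ADM1 value.
    complete = len(present) == len(adm2_values)
    if complete and abs(total - adm1_value) > tol:
        return f"complete ADM2 sum {total} differs from ADM1 value {adm1_value}"
    return None
```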
Use -999 or empty cells to indicate that data is missing for certain crop and ADM combinations. If subnational statistics are completely missing for a certain crop, also when this crop is not relevant at all for the country, add a column
- Unique ADM name (`adm`) and unique ADM code (`adm_code`) that need to match exactly the ADM name and ADM code in the attribute table of the shapefile.
- A column that indicates
- Consistent
(???) A paragraph on how to deal with areas where no crops grow (e.g. refer to the Area under National Administration in MWI).
Note that it has to be in a certain format and use the column names `adm` and `adm_code`.
The availability of subnational statistics is key to improving the spatial allocation of crop area. From a technical perspective, SPAM can be run with only national-level crop information (??? option), but this would probably result in crop distribution maps of lower quality. Several types of subnational information are required.
First, a database with subnational crop information is needed. Such a database can contain data for only one ADM level (normally ADM1) or for both ADM1 and ADM2. Ideally, data for all 40 SPAM crops at both ADM levels would be available, but this is rarely the case. In practice, data is only available at ADM1, or at ADM1 and ADM2, for a selected number of crops. In this case, the allocation of crops with detailed spatial information will simply be more constrained than that of crops with only country-level data.
The database must be organized according to a fixed structure (see (???) for an example):
- It must have a ‘wide’ format, starting with the following (???) columns.
- Statistics are needed at ADM1 and/or ADM2 level. If ADM1 data is available for some crops but ADM2 data is not, it is still useful to include them. Any available information should be added.
It is important that the statistics are organized in a certain way:
- A list of all ADM1 and/or ADM2 administrative units is essential, also when no data is available. The model needs to know where there is data, but also where there is no data, because…
- Data needs to be consistent… A script is available to check this. If all data is available, it aggregates bottom-up to make sure that each ADM1 value is the total of its ADM2s. This also means that the ADM2 units in the maps should match the ADM1 units, etc.
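The bottom-up aggregation mentioned above can be sketched as follows. The sketch assumes the ADM2 values have already been grouped per ADM1. `aggregate_bottom_up` and its input layout are hypothetical. A supplied ADM1 value is overwritten whenever all its ADM2s are present, so it is worth reporting large differences before aggregating.

```python
def aggregate_bottom_up(adm1_table):
    """Set each ADM1 value to the sum of its ADM2s when all ADM2s are present.

    adm1_table: {adm1_code: (adm1_value or None, [adm2 values, float or None])}.
    ADM1 values with at least one missing ADM2 are kept as supplied.
    Returns {adm1_code: adm1_value}.
    """
    result = {}
    for code, (adm1_value, adm2_values) in adm1_table.items():
        if adm2_values and all(v is not None for v in adm2_values):
            result[code] = float(sum(adm2_values))
        else:
            result[code] = adm1_value
    return result
```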
When only national statistics are available (through FAOSTAT) and there are no subnational statistics:
Having information on where crops are not grown is also very valuable. Set these ADMs to 0 and the rest to -999 (or empty). In this way, the national statistics will only be distributed to areas where ADM values are missing. We did this for coffee in MWI, where secondary sources indicate that coffee is only produced in (???).
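The effect of setting ADMs to 0 and the rest to -999 can be illustrated with a simplified calculation. The part of the national total not covered by known ADM values goes only to ADMs with missing values. In SPAM this residual is allocated by the model at grid-cell level. The proportional split by cropland area below is an assumption made purely for illustration, and the function and ADM codes are hypothetical.

```python
def split_national_residual(adm_values, national_total, cropland):
    """Spread the part of the national total not covered by known ADM values.

    adm_values: {adm_code: known area, 0 where the crop is not grown,
                 None if missing}
    cropland: {adm_code: cropland area}, used as a hypothetical weight.
    Only ADMs with a missing value receive a share; ADMs set to 0 get nothing.
    Raises ValueError if known values exceed the total or no cropland can
    receive the residual.
    """
    known = sum(v for v in adm_values.values() if v is not None)
    residual = national_total - known
    if residual < 0:
        raise ValueError("known ADM values exceed the national total")
    open_adms = [code for code, v in adm_values.items() if v is None]
    weight = sum(cropland[code] for code in open_adms)
    if weight == 0:
        raise ValueError("no cropland in ADMs with missing values")
    return {code: residual * cropland[code] / weight for code in open_adms}
```

Suppose ADM `N` is set to 0, `C` and `S` are missing, the national total is 90 and the cropland weights of `C` and `S` are 2 and 1. Then `C` receives 60 and `S` receives 30, while `N` receives nothing.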
Secondary information. Trade and AQUASTAT.
Country reports.
Expert input.
OpenStreetMap - not discussed here.
Only use ADM1 when available and set ADM2 to -999.