# Configuration Files

Config files describe both _what_ to download and _how_ it should be downloaded. To this end, configs have two sections: `selection`, which describes the data desired from data sources, and `parameters`, which defines the details of the download. By convention, the `parameters` section comes first.

Configuration files can be written in `*.cfg` or `*.json`, but typically they're written in the former format (i.e. Python's native config language, which is similar to INI format).

Before jumping into the details of each section, let's look at a few example configs.

## Examples

The following demonstrate how to download weather data from ECMWF's Copernicus (CDS) and Meteorological Archival and Retrieval System (MARS) catalogues.

### Download ERA5 Pressure Level Reanalysis from Copernicus

```
[parameters]
client=cds  ; choose a data source client
dataset=reanalysis-era5-pressure-levels  ; specify a dataset to download from the data source (CDS-specific)
target_path=gs://ecmwf-output-test/era5/{}/{}/{}-pressure-{}.nc  ; create a template for the output file path
partition_keys=  ; define how we should partition the download by the "keys" in the `selection` section.
    year  ; See docs below for more explanation
    month
    day
    pressure_level
[selection]
product_type=ensemble_mean
format=netcdf
variable=
    divergence
    fraction_of_cloud_cover
    geopotential
pressure_level=
    500
year=  ; we can specify a list of values using multiple lines
    2015
    2016
    2017
month=
    01
day=
    01
    15
time=
    00:00
    06:00
    12:00
    18:00
```

### Download Yesterday's Surface Temperatures from MARS

```
[parameters]
client=mars  ; download from MARS (this is the default data source).
target_path=all.an  ; Download all the data from the selection section into one file.
[selection]
class = od
type = analysis
levtype = surface
date = -1  ; Yesterday -- see link to MARS request syntax in docs linked below.
time = 00/06/12/18  ; We can specify multiple values using `/` delimiters -- see MARS request syntax docs
param = z/sp
```

## `parameters` Section

_Parameters for the pipeline_

These describe which data source to download from, where the data should live, and how the download should be partitioned.

* `client`: (required) Select the weather API client. Supported values are `cds` for Copernicus and `mars` for MARS.
* `dataset`: (optional) Name of the target dataset. Allowed options are dictated by the client.
* `target_path`: (required) Download artifact filename template. Can use Python string format symbols. Must have the same number of format symbols as the number of partition keys.
* `partition_keys`: (optional) This determines how download jobs will be divided.
    * Value can be a single item or a list.
    * Each value must appear as a key in the `selection` section.
    * Each downloader will receive a config file with every parameter listed in the `selection`, _except_ for the fields specified by the `partition_keys`.
    * The downloader config will contain one instance of the cross-product of every key in `partition_keys`.
        * E.g. `['year', 'month']` will lead to a config set like `[(2015, 01), (2015, 02), (2015, 03), ...]`.
    * The list of keys will be used to format the `target_path`.

> **NOTE**: The `target_path` template is fully compatible with Python's standard string formatting.
> This includes being able to use named arguments (e.g. 'gs://bucket/{year}/{month}/{day}.nc') as well as specifying formats for strings
> (e.g. 'gs://bucket/{year:04d}/{month:02d}/{day:02d}.nc').

### Creating a date-based directory hierarchy

A date-based directory hierarchy can be created using Python's standard string formatting. Below are some examples of how to use `target_path` this way.

#### Examples

Note that any parameters that are not relevant to the target path have been omitted.

```
[parameters]
target_path=gs://ecmwf-output-test/era5/{date:%%Y/%%m/%%d}.nc
partition_keys=
    date
[selection]
date=2017-01-01/to/2017-01-02
```

will create `gs://ecmwf-output-test/era5/2017/01/01.nc` and `gs://ecmwf-output-test/era5/2017/01/02.nc`.

```
[parameters]
target_path=gs://ecmwf-output-test/era5/{date:%%Y/%%m/%%d}-pressure-{pressure_level}.nc
partition_keys=
    date
    pressure_level
[selection]
pressure_level=
    500
date=2017-01-01/to/2017-01-02
```

will create `gs://ecmwf-output-test/era5/2017/01/01-pressure-500.nc` and `gs://ecmwf-output-test/era5/2017/01/02-pressure-500.nc`.

```
[parameters]
target_path=gs://ecmwf-output-test/pressure-{pressure_level}/era5/{date:%%Y/%%m/%%d}.nc
partition_keys=
    date
    pressure_level
[selection]
pressure_level=
    500
date=2017-01-01/to/2017-01-02
```

will create `gs://ecmwf-output-test/pressure-500/era5/2017/01/01.nc` and `gs://ecmwf-output-test/pressure-500/era5/2017/01/02.nc`.

```
[parameters]
target_path=gs://ecmwf-output-test/era5/{year:04d}/{month:02d}/{day:02d}-pressure-{pressure_level}.nc
partition_keys=
    year
    month
    day
    pressure_level
[selection]
pressure_level=
    500
year=
    2017
month=
    01
day=
    01
    02
```

will create `gs://ecmwf-output-test/era5/2017/01/01-pressure-500.nc` and `gs://ecmwf-output-test/era5/2017/01/02-pressure-500.nc`.

> **Note**: Replacing the `target_path` of the above example with
> `target_path=gs://ecmwf-output-test/era5/{year}/{month}/{day}-pressure-{pressure_level}.nc`
>
> will create
>
> `gs://ecmwf-output-test/era5/2017/1/1-pressure-500.nc` and
> `gs://ecmwf-output-test/era5/2017/1/2-pressure-500.nc`.

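
To make the partitioning and path formatting above more concrete, here is a minimal Python sketch. It is not the downloader's actual implementation: the `expand_targets` helper and the already-parsed `selection` dictionary are hypothetical, and the sketch assumes that numeric values such as `01` have been parsed into integers (which is consistent with the unpadded `2017/1/1` paths in the note above).

```python
import itertools
from datetime import date

# Hypothetical, already-parsed `selection` values mirroring the last example.
selection = {
    "pressure_level": [500],
    "year": [2017],
    "month": [1],
    "day": [1, 2],
}
partition_keys = ["year", "month", "day", "pressure_level"]


def expand_targets(target_path, partition_keys, selection):
    """Format `target_path` once per element of the partition-key cross-product."""
    value_lists = [selection[key] for key in partition_keys]
    paths = []
    for combo in itertools.product(*value_lists):
        named = dict(zip(partition_keys, combo))
        # Values are passed positionally and by name, so both `{}` and
        # `{year}` style templates can be filled in.
        paths.append(target_path.format(*combo, **named))
    return paths


padded = expand_targets(
    "gs://ecmwf-output-test/era5/{year:04d}/{month:02d}/{day:02d}-pressure-{pressure_level}.nc",
    partition_keys,
    selection,
)
unpadded = expand_targets(
    "gs://ecmwf-output-test/era5/{year}/{month}/{day}-pressure-{pressure_level}.nc",
    partition_keys,
    selection,
)

# A `date` object accepts strftime-style format specs. Plain Python needs a
# single `%`; the doubled `%%` in the *.cfg examples is assumed to be an escape
# for the config parser's interpolation syntax.
by_date = "gs://ecmwf-output-test/era5/{date:%Y/%m/%d}.nc".format(date=date(2017, 1, 2))

print(padded)
print(unpadded)
print(by_date)
```

The zero-padded template yields `.../2017/01/01-pressure-500.nc`, while the plain `{month}`/`{day}` template yields `.../2017/1/1-pressure-500.nc`, which is why the format specs matter when a fixed-width directory layout is desired.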
### Subsections

Sometimes, we'd like to alternate passing certain parameters to each client. For example, certain data sources have limits on the number of API requests that can be made, enforcing a maximum per license. In these cases, the user can specify a parameters subsection. The downloader will overwrite the base parameters with the key-value pairs in each subsection, evenly alternating between each parameter set across the partitions.

To specify a subsection, create a new section with the following naming pattern: `[parameters.<subsection-name>]`. The `<subsection-name>` can be any string, but it's recommended to choose a name that describes the grouping of values in the section.

Here's an example of this type of configuration:

```
[parameters]
dataset=ecmwf-mars-output
target_path=gs://ecmwf-downloads/hres-single-level/{}.nc
partition_keys=
    date
[parameters.deepmind]
api_key=KKKKK1
api_url=UUUUU1
[parameters.research]
api_key=KKKKK2
api_url=UUUUU2
[parameters.cloud]
api_key=KKKKK3
api_url=UUUUU3
```

## `selection` Section

_Parameters used to select desired data_

These will be passed as request parameters to the specified API client. Selections are dependent on how each data source's catalog is structured.

### Copernicus / CDS

**License**: By using a Copernicus / CDS dataset, users agree to the terms and conditions specified in the [License](https://cds.climate.copernicus.eu/api/v2/terms/static/licence-to-use-copernicus-products.pdf) document.

**Catalog**: [https://cds.climate.copernicus.eu/cdsapp#!/search?type=dataset](https://cds.climate.copernicus.eu/cdsapp#!/search?type=dataset)

Visit the following to register / acquire API credentials: _[Install the CDS API key](https://cds.climate.copernicus.eu/api-how-to#install-the-cds-api-key)_. Afterwards, please set the `api_url` and `api_key` arguments in the `parameters` section of your configuration.
Alternatively, one can set these values as environment variables:

```shell
export CDSAPI_URL=$api_url
export CDSAPI_KEY=$api_key
```

For CDS parameter options, check out the [Copernicus documentation](https://cds.climate.copernicus.eu/cdsapp#!/search?type=dataset). See [this example](https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5-pressure-levels?tab=form) for what kind of requests one can make.

### MARS

**License**: By using a MARS dataset, users agree to the terms and conditions specified in the [License](https://www.ecmwf.int/en/forecasts/accessing-forecasts/licences-available) document.

**Catalog**: [https://apps.ecmwf.int/archive-catalogue/](https://apps.ecmwf.int/archive-catalogue/)

Visit the following to register / acquire API credentials: _[Install ECMWF Key](https://confluence.ecmwf.int/display/WEBAPI/Access+MARS#AccessMARS-key)_. Afterwards, please set the `api_url`, `api_key`, and `api_email` arguments in the `parameters` section of your configuration. Alternatively, one can set these values as environment variables:

```shell
export MARSAPI_URL=$api_url
export MARSAPI_EMAIL=$api_email
export MARSAPI_KEY=$api_key
```

For MARS parameter options, first read up on [MARS request syntax](https://confluence.ecmwf.int/display/WEBAPI/Brief+MARS+request+syntax). For a full range of what data can be requested, please consult the [MARS catalog](https://apps.ecmwf.int/archive-catalogue/). See [these examples](https://confluence.ecmwf.int/display/UDOC/MARS+example+requests) to discover the kinds of requests that can be made.

> **NOTE**: MARS data is stored on tape drives. It takes longer for multiple workers to request data than a single
> worker. Thus, it's recommended _not_ to set a partition key when writing MARS data configurations.
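
For both clients, credentials can come from either the `parameters` section or the environment variables listed above. The following Python sketch illustrates one way such a lookup could work. The `resolve_credentials` helper is hypothetical, and the precedence (config value first, environment variable second) is an assumption rather than documented behavior; only the key and variable names are taken from this page.

```python
import os

# Environment variable names and `parameters` keys as listed in the CDS and
# MARS sections. The lookup order below is an assumption for illustration.
ENV_NAMES = {
    "cds": {"api_url": "CDSAPI_URL", "api_key": "CDSAPI_KEY"},
    "mars": {
        "api_url": "MARSAPI_URL",
        "api_key": "MARSAPI_KEY",
        "api_email": "MARSAPI_EMAIL",
    },
}


def resolve_credentials(client, parameters, environ=None):
    """Hypothetical helper: collect credentials for `client` or raise if one is missing."""
    environ = os.environ if environ is None else environ
    resolved = {}
    for key, env_name in ENV_NAMES[client].items():
        # Prefer the value from the `parameters` section, then the environment.
        value = parameters.get(key) or environ.get(env_name)
        if not value:
            raise ValueError(
                f"Missing {key!r}: set it in [parameters] or export {env_name}"
            )
        resolved[key] = value
    return resolved


# Placeholder values only; a real run would use actual credentials.
creds = resolve_credentials(
    "mars",
    parameters={"api_key": "KKKKK1"},
    environ={"MARSAPI_URL": "UUUUU1", "MARSAPI_EMAIL": "user@example.com"},
)
print(creds)
```

Passing `environ` explicitly keeps the helper easy to test; leaving it out falls back to the process environment.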