⛈ weather-dl – Weather Downloader¶
Weather Downloader ingests weather data to cloud buckets, such as Google Cloud Storage (beta).
Features¶
Flexible Pipelines: weather-dl offers a high degree of control over what is downloaded via configuration files. Separate scripts need not be written to get new data or add parameters. For more, see the configuration docs.
Efficient Parallelization: The tool gives you full control over how downloads are sharded and parallelized (with good defaults). This lets you focus on the data and not the plumbing.
Hassle-Free Dev-Ops: weather-dl and Dataflow make it easy to spin up VMs on your behalf with one command. No need to keep your local machine online all night to acquire data.
Robust Downloads: If an error occurs when fetching a shard, Dataflow will automatically retry the download for you. Previously downloaded shards will be skipped by default, so you can re-run the tool without having to worry about duplication of work.
Note: Currently, only ECMWF’s MARS and CDS clients are supported. If you’d like to use weather-dl to work with other data sources, please file an issue (or consider making a contribution).
Usage¶
usage: weather-dl [-h] [-f] [-d] [-l] [-m MANIFEST_LOCATION] config
Weather Downloader ingests weather data to cloud storage.
positional arguments:
config path/to/config.cfg, containing client and data information. Accepts *.cfg and *.json files.
Common options:
-f, --force-download: Force redownload of partitions that were previously downloaded.
-d, --dry-run: Run pipeline steps without actually downloading or writing to cloud storage.
-l, --local-run: Run locally and download to local hard drive. The data and manifest directory is set by default to ‘<$CWD>/local_run’. The runner will be set to DirectRunner. The only other relevant options are the config and --direct_num_workers.
-m, --manifest-location MANIFEST_LOCATION: Location of the manifest. Either a Firestore collection URI (‘fs://?projectId= ’), a GCS bucket URI, or ‘noop:// ’ for an in-memory location.
-n, --num-requests-per-key: Number of concurrent requests to make per API key. Default: make an educated guess per client & config. Please see the client documentation for more details.
Invoke with -h or --help to see the full range of options.
For further information on how to write config files, please consult this documentation.
Usage Examples:
weather-dl configs/era5_example_config_local_run.cfg --local-run
Preview download with a dry run:
weather-dl configs/mars_example_config.cfg --dry-run
Using DataflowRunner
weather-dl configs/mars_example_config.cfg \
--runner DataflowRunner \
--project $PROJECT \
--temp_location gs://$BUCKET/tmp \
--job_name $JOB_NAME
Using the DataflowRunner and specifying 3 requests per license
weather-dl configs/mars_example_config.cfg \
-n 3 \
--runner DataflowRunner \
--project $PROJECT \
--temp_location gs://$BUCKET/tmp \
--job_name $JOB_NAME
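The Dataflow examples above depend on $PROJECT, $BUCKET, and $JOB_NAME being set in your shell. As a minimal sketch, a guard like the one below can stop a launch early when one of them is missing. The function name require_vars is hypothetical and not part of weather-dl.

```shell
# Hypothetical helper (not part of weather-dl): succeed only if every named
# environment variable is set and non-empty.
# The body runs in a subshell, so its working variables do not leak out.
require_vars() (
  missing=0
  for name in "$@"; do
    # Look up the value of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "error: \$$name is not set" >&2
      missing=1
    fi
  done
  exit "$missing"
)

# Example use before launching a Dataflow job:
# require_vars PROJECT BUCKET JOB_NAME && weather-dl configs/mars_example_config.cfg \
#   --runner DataflowRunner \
#   --project "$PROJECT" \
#   --temp_location "gs://$BUCKET/tmp" \
#   --job_name "$JOB_NAME"
```

The check only confirms that the variables are non-empty; it does not validate that the project, bucket, or job name are acceptable to Dataflow.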
For a full list of options for configuring the Dataflow pipeline, please review this table.
Monitoring¶
You can check on your ECMWF API jobs by visiting the client-specific job queue.
If you use Google Cloud Storage, we recommend using gsutil to inspect the progress of your downloads. For example:
# Check that the file-sizes of your downloads look alright
gsutil du -h gs://your-cloud-bucket/mars-data/*T00z.nc
# See how many downloads have finished
gsutil du -h gs://your-cloud-bucket/mars-data/*T00z.nc | wc -l
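If you know how many files the download should produce, the count above can be turned into a rough progress figure. The sketch below assumes you supply that expected total yourself; percent_done and EXPECTED are hypothetical names, not part of weather-dl or gsutil.

```shell
# Hypothetical helper: integer percentage of finished downloads (rounded down).
# $1 = number of files found, $2 = number of files expected.
percent_done() {
  if [ "$2" -le 0 ]; then
    echo "error: expected count must be positive" >&2
    return 1
  fi
  echo $(( $1 * 100 / $2 ))
}

# Example use (assumes gsutil is installed and EXPECTED is set by you):
# finished=$(gsutil du -h gs://your-cloud-bucket/mars-data/*T00z.nc | wc -l)
# echo "$(percent_done "$finished" "$EXPECTED")% complete"
```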
download-status¶
In addition, we’ve provided a simple tool for getting a rough measure of download state. Provided a bucket prefix, it will output the counts of the statuses in that prefix.
usage: download-status [-h] [-m MANIFEST_LOCATION] prefix
Check statuses of `weather-dl` downloads.
positional arguments:
prefix Prefix of the location string (e.g. a cloud bucket); used to filter which statuses to check.
Options
-m, --manifest-location: Specify the location of a manifest; this is the same as in weather-dl. Only supports Firebase Manifests.
Usage Examples:
download-status "gs://ecmwf-downloads/hres/world/"
...
The current download statuses for 'gs://ecmwf-downloads/hres/world/' are: Counter({'scheduled': 245, 'success': 116, 'in-progress': 4, 'failure': 1}).
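For scripting, a single count can be pulled out of that summary line. The following is a minimal sketch that assumes the output keeps the Counter({...}) shape shown above and is written to standard output; status_count is a hypothetical helper, not part of download-status.

```shell
# Hypothetical helper: read download-status output on stdin and print the
# count reported for one status name (for example: success, failure).
# Prints nothing if the status does not appear in the input.
status_count() {
  sed -n "s/.*'$1': \([0-9][0-9]*\).*/\1/p"
}

# Example use:
# download-status "gs://ecmwf-downloads/hres/world/" | status_count failure
```

Because it matches on the printed text, this will need adjusting if the summary format changes.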