Supported data on NCI Gadi

Overview

Teaching: 15 min
Exercises: 15 min
Compatibility: ESMValTool v2.11.0
Questions
  • What data can I get on Gadi?

  • How can I access and find datasets?

Objectives
  • Learn which Gadi projects hold the relevant data

  • Understand how observation data is organised for ESMValTool

  • Understand the download and CMORise functions available in ESMValTool

  • Understand how observation data is organised for ILAMB

Introduction

An advantage of using a supercomputer like Gadi at NCI, which is an ESGF node, is that a lot of data is already available. This saves us from searching for and downloading large datasets that can’t be handled on other computers.

What data can I get on Gadi?

Broadly, the datasets available which can be easily found and read in ESMValTool are:

What are the NCI projects I need to join?

To access this data, you need to join the relevant NCI projects. Log into NCI with your NCI account to find and join them; the NCI data catalogue can be searched for more information on the collections. Your memberships would have been checked when you ran the check_hackathon setup.

Data and NCI projects:

  • You can check if you’re a member or join ct11 with this link.

NCI data catalogue entries and their NCI projects:

There is also the NCI project zv30 for CMIP7 collaborative development and evaluation which will be covered later in this episode.

Pro tip: Configuration file rootpaths

Remember the config-user.yml file, where we set the directories in which ESMValTool looks for data. This is an example from the Gadi esmvaltool-workflow user configuration:

config rootpaths

rootpath:
  CMIP6: [/g/data/oi10/replicas/CMIP6, /g/data/fs38/publications/CMIP6, /g/data/xp65/public/apps/esmvaltool/replicas/CMIP6]
  CMIP5: [/g/data/r87/DRSv3/CMIP5, /g/data/al33/replicas/CMIP5/combined, /g/data/rr3/publications/CMIP5/output1]
  CMIP3: /g/data/r87/DRSv3/CMIP3
  CORDEX: [/g/data/rr3/publications/CORDEX/output, /g/data/al33/replicas/cordex/output]
  OBS: /g/data/ct11/access-nri/replicas/esmvaltool/obsdata-v2
  OBS6: /g/data/ct11/access-nri/replicas/esmvaltool/obsdata-v2
  obs4MIPs: [/g/data/ct11/access-nri/replicas/esmvaltool/obsdata-v2]
  ana4mips: [/g/data/ct11/access-nri/replicas/esmvaltool/obsdata-v2]
  native6: [/g/data/rt52/era5]
  ACCESS: /g/data/p73/archive/non-CMIP
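Each rootpath sits under a different NCI project, so a missing project membership typically shows up as a directory you cannot read. The following is a minimal sketch for checking this, assuming it is run on Gadi and using a few of the rootpaths listed above; `check_rootpaths` is a hypothetical helper, not part of ESMValTool.

```python
import os

# A few rootpaths copied from the configuration above. The NCI project
# code is the third path component (e.g. /g/data/oi10/... -> oi10).
ROOTPATHS = {
    "CMIP6": ["/g/data/oi10/replicas/CMIP6", "/g/data/fs38/publications/CMIP6"],
    "OBS6": ["/g/data/ct11/access-nri/replicas/esmvaltool/obsdata-v2"],
    "native6": ["/g/data/rt52/era5"],
}


def check_rootpaths(rootpaths):
    """Return (facet_project, nci_project, path, readable) for each path."""
    report = []
    for facet_project, paths in rootpaths.items():
        for path in paths:
            parts = path.split("/")
            nci_project = parts[3] if len(parts) > 3 else ""
            readable = os.path.isdir(path) and os.access(path, os.R_OK | os.X_OK)
            report.append((facet_project, nci_project, path, readable))
    return report


if __name__ == "__main__":
    for facet_project, nci_project, path, readable in check_rootpaths(ROOTPATHS):
        status = "ok" if readable else "not readable (have you joined the project?)"
        print(f"{facet_project:8s} {nci_project:5s} {path}: {status}")
```

Off Gadi every path is reported as not readable, which is expected.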

ESMValTool Tiers

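Observational datasets in ESMValTool are organised in tiers reflecting access restriction levels: Tier 1 holds obs4MIPs and ana4MIPs datasets, Tier 2 holds other freely available datasets, and Tier 3 holds restricted datasets, for example those that require registration or a licence agreement.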
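With the default directory layout that ESMValTool uses for OBS and OBS6, the tier is the first directory level under the rootpath (`Tier{tier}/{dataset}`). The sketch below lists the datasets found in each tier, assuming that layout holds for the collection on Gadi; `list_obs_datasets` is a hypothetical helper, not an ESMValTool function.

```python
from pathlib import Path


def list_obs_datasets(rootpath):
    """Map each tier directory name (e.g. 'Tier2') to its sorted dataset names."""
    tiers = {}
    for tier_dir in sorted(Path(rootpath).glob("Tier*")):
        if tier_dir.is_dir():
            tiers[tier_dir.name] = sorted(
                p.name for p in tier_dir.iterdir() if p.is_dir()
            )
    return tiers


if __name__ == "__main__":
    # OBS/OBS6 rootpath from the Gadi configuration (readable on Gadi only).
    obs_root = "/g/data/ct11/access-nri/replicas/esmvaltool/obsdata-v2"
    for tier, datasets in list_obs_datasets(obs_root).items():
        print(tier, len(datasets), "datasets")
```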

ERA5 in native6 and ERA5 daily in OBS6 Tier3

The native6 project refers to a collection of datasets that ESMValTool can read directly and convert to CMIP6 format for use in recipes. ESMValTool supports this with an extra facets file that maps the variable names across. This file would have been added to your ~/.esmvaltool/extra_facets directory, which is also used to fill in default facet values and help find the data. See more information on extra facets.

The original hourly data from the “ERA5 hourly data on single levels” and “ERA5 hourly data on pressure levels” collections have been transformed into daily means using the ESMValTool (v2.10) Python package. These are Tier 3 datasets for OBS6. Variables available are: 'clt', 'fx', 'pr', 'prw', 'psl', 'rlds', 'rsds', 'rsdt', 'tas', 'tasmax', 'tasmin', 'tdps', 'ua', 'uas', 'vas'

What is the ESMValTool observation data collection?

We have created a collection of observation datasets that can be pulled directly into ESMValTool. The data have been CMORised, meaning they are netCDF files formatted to follow CF conventions and CMIP project standards. A table of the available Tier 1 and 2 data can be found here, or you can expand the list below:

Observation collection

long_name | datasets | name
Ambient Aerosol Optical Thickness at 550nm | ESACCI-AEROSOL, MODIS | od550aer
Surface Upwelling Shortwave Radiation | CERES-EBAF, ESACCI-CLOUD, ISCCP-FH | rsus
Carbon Mass Flux out of Atmosphere Due to Net Biospheric Production on Land [kgC m-2 s-1] | GCP2018, GCP2020 | nbp
Surface Temperature | CFSR, ESACCI-LST, ESACCI-SST, HadISST, ISCCP-FH, NCEP-NCAR-R1 | ts
Daily Maximum Near-Surface Air Temperature | E-OBS, NCEP-NCAR-R1 | tasmax
Omega (=dp/dt) | NCEP-NCAR-R1 | wap
Surface Dissolved Inorganic Carbon Concentration | OceanSODA-ETHZ | dissicos
Liquid Water Path | ESACCI-CLOUD, MODIS | lwp
Surface Total Alkalinity | OceanSODA-ETHZ | talkos
Eastward Wind | CFSR, NCEP-NCAR-R1 | ua
Mole Fraction of N2O | TCOM-N2O | n2o
Grid-Cell Area for Ocean Variables | OceanSODA-ETHZ | areacello
Ambient Aerosol Optical Depth at 870nm | ESACCI-AEROSOL | od870aer
Surface Carbonate Ion Concentration | OceanSODA-ETHZ | co3os
Surface Upwelling Longwave Radiation | CERES-EBAF, ESACCI-CLOUD, ISCCP-FH | rlus
Dissolved Oxygen Concentration | CT2019, ESACCI-GHG, ESRL, GCP2018, GCP2020, Landschuetzer2016, Landschuetzer2020, OceanSODA-ETHZ, Scripps-CO2-KUM, WOA | o2
Specific Humidity | AIRS, AIRS-2-1, HALOE, JRA-25, NCEP-NCAR-R1, NOAA-CIRES-20CR | hus
TOA Outgoing Shortwave Radiation | CERES-EBAF, ESACCI-CLOUD, ISCCP-FH, JRA-25, JRA-55, NCEP-NCAR-R1, NOAA-CIRES-20CR | rsut
Sea Water Salinity | CALIPSO-GOCCP, ESACCI-LANDCOVER, ESACCI-SEA-SURFACE-SALINITY, PHC, WOA | so
Percentage Crop Cover | ESACCI-LANDCOVER | cropFrac
Percentage of the Grid Cell Occupied by Land (Including Lakes) | BerkeleyEarth | sftlf
Sea Surface Temperature | ATSR, HadISST, WOA | tos
Total Dissolved Inorganic Silicon Concentration | CFSR, GLODAP, HadISST, MOBO-DIC_MPIM, OSI-450-nh, OSI-450-sh, OceanSODA-ETHZ, PIOMAS, WOA | si
Daily Minimum Near-Surface Air Temperature | E-OBS, NCEP-NCAR-R1 | tasmin
Dissolved Inorganic Carbon Concentration | GLODAP, MOBO-DIC_MPIM, OceanSODA-ETHZ | dissic
Water Vapor Path | ISCCP-FH, JRA-25, NCEP-DOE-R2, NCEP-NCAR-R1, NOAA-CIRES-20CR, SSMI, SSMI-MERIS | prw
Surface Downwelling Longwave Radiation | CERES-EBAF, ISCCP-FH, JRA-55 | rlds
Geopotential Height | CFSR, NCEP-NCAR-R1 | zg
Northward Wind | CFSR, NCEP-NCAR-R1 | va
Relative Humidity | AIRS-2-0, AIRS-2-1, NCEP-DOE-R2, NCEP-NCAR-R1 | hur
Tree Cover Percentage | ESACCI-LANDCOVER | treeFrac
Percentage Cover by Shrub | ESACCI-LANDCOVER | shrubFrac
Bare Soil Percentage Area Coverage | ESACCI-LANDCOVER | baresoilFrac
Percentage Cloud Cover | CALIOP, CALIPSO-GOCCP, CloudSat, ESACCI-CLOUD, ISCCP, JRA-25, JRA-55, MODIS, MODIS-1-0, NCEP-DOE-R2, NCEP-NCAR-R1, NOAA-CIRES-20CR, PATMOS-x | cl
Total Alkalinity | GLODAP, OceanSODA-ETHZ | talk
Surface Upwelling Clear-Sky Shortwave Radiation | CERES-EBAF, ESACCI-CLOUD | rsuscs
Mole Fraction of CH4 | ESACCI-GHG, TCOM-CH4 | ch4
Precipitation | CRU, E-OBS, ESACCI-OZONE, GHCN, GPCC, GPCP-SG, ISCCP-FH, JRA-25, JRA-55, NCEP-DOE-R2, NCEP-NCAR-R1, NOAA-CIRES-20CR, PERSIANN-CDR, REGEN, SSMI, SSMI-MERIS, TRMM-L3, WFDE5, AGCD | pr
Ambient Fine Aerosol Optical Depth at 550nm | ESACCI-AEROSOL | od550lt1aer
Sea Surface Salinity | ESACCI-SEA-SURFACE-SALINITY, WOA | sos
Natural Grass Area Percentage | ESACCI-LANDCOVER | grassFrac
Primary Organic Carbon Production by All Types of Phytoplankton | Eppley-VGPM-MODIS | intpp
Eastward Near-Surface Wind | CFSR | uas
Air Temperature | AIRS, AIRS-2-1, BerkeleyEarth, CFSR, CRU, CowtanWay, E-OBS, GHCN-CAMS, GISTEMP, GLODAP, HadCRUT3, HadCRUT4, HadCRUT5, ISCCP-FH, Kadow2020, NCEP-DOE-R2, NCEP-NCAR-R1, NOAAGlobalTemp, OceanSODA-ETHZ, PHC, WFDE5, WOA | ta
Near-Surface Air Temperature | BerkeleyEarth, CFSR, CRU, CowtanWay, E-OBS, GHCN-CAMS, GISTEMP, HadCRUT3, HadCRUT4, HadCRUT5, ISCCP-FH, Kadow2020, NCEP-NCAR-R1, NOAAGlobalTemp, WFDE5 | tas
Surface Downwelling Clear-Sky Longwave Radiation | CERES-EBAF, JRA-55 | rldscs
Ambient Aerosol Absorption Optical Thickness at 550nm | ESACCI-AEROSOL | abs550aer
Total Dissolved Inorganic Phosphorus Concentration | WOA | po4
Sea Level Pressure | E-OBS, JRA-55, NCEP-NCAR-R1 | psl
Sea Water Potential Temperature | PHC, WOA | thetao
CALIPSO Percentage Cloud Cover | CALIPSO-GOCCP | clcalipso
Surface Aqueous Partial Pressure of CO2 | Landschuetzer2016, Landschuetzer2020, OceanSODA-ETHZ | spco2
Mass Concentration of Total Phytoplankton Expressed as Chlorophyll in Sea Water | ESACCI-OC | chl
Surface pH | OceanSODA-ETHZ | phos
TOA Outgoing Clear-Sky Longwave Radiation | CERES-EBAF, ESACCI-CLOUD, ISCCP-FH, JRA-25, JRA-55, NCEP-NCAR-R1 | rlutcs
Total Column Ozone | ESACCI-OZONE | toz
Near-Surface Relative Humidity | NCEP-NCAR-R1 | hurs
Surface Downward Mass Flux of Carbon as CO2 [kgC m-2 s-1] | GCP2018, GCP2020, Landschuetzer2016, OceanSODA-ETHZ | fgco2
Atmosphere CO2 | CT2019, ESRL, Scripps-CO2-KUM | co2s
pH | GLODAP, OceanSODA-ETHZ | ph
Condensed Water Path | MODIS, NOAA-CIRES-20CR | clwvi
Daily-Mean Near-Surface Wind Speed | CFSR, NCEP-NCAR-R1 | sfcWind
Surface Downwelling Shortwave Radiation | CERES-EBAF, ISCCP-FH | rsds
TOA Outgoing Clear-Sky Shortwave Radiation | CERES-EBAF, ESACCI-CLOUD, ISCCP-FH, JRA-25, JRA-55, NCEP-NCAR-R1 | rsutcs
Total Cloud Cover Percentage | CALIOP, CloudSat, ESACCI-CLOUD, ISCCP, JRA-25, JRA-55, MODIS, MODIS-1-0, NCEP-DOE-R2, NCEP-NCAR-R1, NOAA-CIRES-20CR, PATMOS-x | clt
Convective Cloud Area Percentage | CALIOP, CALIPSO-GOCCP | clc
Northward Near-Surface Wind | CFSR | vas
Surface Air Pressure | CALIPSO-GOCCP, E-OBS, ISCCP-FH, JRA-55, NCEP-NCAR-R1 | ps
TOA Outgoing Longwave Radiation | CERES-EBAF, ESACCI-CLOUD, ISCCP-FH, JRA-25, JRA-55, NCEP-NCAR-R1, NOAA-CIRES-20CR | rlut
Delta CO2 Partial Pressure | Landschuetzer2016 | dpco2
Surface Downwelling Clear-Sky Shortwave Radiation | CERES-EBAF | rsdscs
TOA Incident Shortwave Radiation | CERES-EBAF, ESACCI-CLOUD, ISCCP-FH | rsdt
Ice Water Path | ESACCI-CLOUD | clivi

ESMValTool data download and CMORise

ESMValTool can download and format certain observational datasets with its data commands; see here for more detail and a table of the datasets available to download and format. These are the download and format commands:

esmvaltool data download --config_file <path to config-user.yml>  <dataset-name>
esmvaltool data format --config_file <path to config-user.yml>  <dataset-name>
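To download and format several datasets in one go, the two commands can be scripted. The sketch below only builds the two command lines shown above and prints them; `data_commands` is a hypothetical helper, the config path and dataset name are placeholders, and actually running the commands requires an environment where `esmvaltool` is available.

```python
import shlex


def data_commands(config_file, dataset_name):
    """Return the download and format commands as argument lists."""
    return [
        ["esmvaltool", "data", step, "--config_file", config_file, dataset_name]
        for step in ("download", "format")
    ]


if __name__ == "__main__":
    # Placeholder config path and dataset name; substitute your own.
    for argv in data_commands("config-user.yml", "<dataset-name>"):
        print(shlex.join(argv))
        # To execute instead of printing: subprocess.run(argv, check=True)
```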

For observational data, the ESMValTool project facet is either OBS or OBS6: OBS data is in CMIP5 format and OBS6 data is in CMIP6 format.

Finding data examples

Find data in recipe

Some facets can take glob patterns (wildcards) as values. The project facet cannot be a wildcard; see reference.

An example recipe that will use all CMIP6 datasets and all ensemble members which have a ‘historical’ experiment could look like this:

Solution

datasets:
 - project: CMIP6
   exp: historical
   dataset: '*'
   institute: '*'
   ensemble: '*'
   grid: '*'
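Wildcards in facets behave like shell glob patterns. The sketch below uses Python's `fnmatch` module to show which values a pattern would select. It illustrates glob matching in general and is not ESMValCore's own implementation; the ensemble labels are made-up examples and `select` is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def select(values, pattern):
    """Return the values matching a glob pattern (case-sensitive)."""
    return [v for v in values if fnmatchcase(v, pattern)]


ensembles = ["r1i1p1f1", "r2i1p1f1", "r10i1p1f1", "r1i1p1f2"]
print(select(ensembles, "*"))         # every member
print(select(ensembles, "r1i1p1f*"))  # r1i1p1 with any forcing index
print(select(ensembles, "r?i1p1f1"))  # single-character realisation index only
```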

Find data using esmvalcore

Wildcards can also be used through the esmvalcore API. To include datasets on ESGF that may not be available locally, set search_esgf to always. This example looks for all ensemble members of a dataset.

Solution

from esmvalcore.dataset import Dataset
from esmvalcore.config import CFG

CFG['search_esgf'] = 'always'
dataset_search = Dataset(
    short_name='tos',
    mip='Omon',
    project='CMIP6',
    exp='historical',
    dataset='ACCESS-ESM1-5',
    ensemble='*',
    grid='gn',
)
ensemble_datasets = list(dataset_search.from_files())
ensemble_datasets
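`from_files()` yields one `Dataset` per match, each carrying its resolved facets in a `facets` mapping. The sketch below lists the ensemble members found in numeric rather than alphabetical order. `sorted_ensembles` is a hypothetical helper and assumes CMIP6-style variant labels such as `r10i1p1f1`.

```python
import re

# CMIP6 variant label: realisation, initialisation, physics, forcing indices.
VARIANT = re.compile(r"r(\d+)i(\d+)p(\d+)f(\d+)$")


def sorted_ensembles(datasets):
    """Return unique ensemble labels sorted by their numeric indices."""
    labels = {ds.facets["ensemble"] for ds in datasets}

    def key(label):
        match = VARIANT.match(label)
        if match:
            return (0, tuple(int(n) for n in match.groups()), label)
        # Labels that do not follow the pattern sort last, alphabetically.
        return (1, (), label)

    return sorted(labels, key=key)
```

For the search above this would be called as `sorted_ensembles(ensemble_datasets)`.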

Find all available datasets for a variable in CMIP6

Find all datasets available for variable tos in CMIP6 in concatenated experiments ‘historical’ and ‘ssp585’ for the time range 1850 to 2100.

Solution

from esmvalcore.dataset import Dataset

template = Dataset(
    short_name='tos',
    mip='Omon',
    activity='CMIP',
    institute='*',  # facet required to search locally
    project='CMIP6',
    exp=['historical', 'ssp585'],  # experiments to concatenate
    dataset='*',
    ensemble='*',
    grid='*',
    timerange='1850/2100',
)

all_datasets = list(template.from_files())
all_datasets
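A wildcard search like this can return a long list. One way to get an overview is to count the ensemble members found per model. The sketch below assumes each result exposes `facets['dataset']` and `facets['ensemble']`, as the `Dataset` objects returned by `from_files()` do; `members_per_model` is a hypothetical helper.

```python
from collections import defaultdict


def members_per_model(datasets):
    """Return {model name: number of distinct ensemble members}."""
    members = defaultdict(set)
    for ds in datasets:
        members[ds.facets["dataset"]].add(ds.facets["ensemble"])
    return {model: len(ens) for model, ens in sorted(members.items())}
```

For the search above this would be called as `members_per_model(all_datasets)`.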

What is ILAMB-Data?

The ILAMB community maintains a collection of reference datasets that have been carefully formatted following CF conventions. ACCESS-NRI hosts a replica of this ILAMB-data collection on NCI-Gadi as part of the ACCESS-NRI Replicated Datasets for Climate Model Evaluation NCI data collection, which can be accessed here. While we ensure this replica is regularly updated, the datasets were initially downloaded from primary sources and reformatted for use within the ILAMB framework. For specific reference information, please check the global attributes within the files.
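Checking the global attributes of a file can be done in a few lines. The sketch below assumes the `netCDF4` Python package is installed and that the files carry the descriptive global attributes recommended by the CF conventions (`title`, `institution`, `source`, `references`), which may not hold for every dataset. Both helper names are hypothetical.

```python
def read_global_attrs(path):
    """Return the global attributes of a netCDF file as a dict."""
    # Imported here so the filtering helper below works without netCDF4.
    from netCDF4 import Dataset

    with Dataset(path) as nc:
        return {name: nc.getncattr(name) for name in nc.ncattrs()}


def pick_reference_attrs(attrs, keys=("title", "institution", "source", "references")):
    """Keep only the attributes that usually describe a dataset's origin."""
    return {key: attrs[key] for key in keys if key in attrs}
```

A call such as `pick_reference_attrs(read_global_attrs(path))`, with `path` pointing at a file in the collection, returns whichever of those attributes the file defines.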

See something wrong in a dataset? Have a suggestion? This collection is continually evolving and depends on community input. Please submit requests for support of new observation datasets on the ACCESS-Hive Forum. You can also track progress by following the ILAMB-Data GitHub repository, or see what the ILAMB community is currently working on via the ILAMB Dataset Integration project board.

Observation collection

Albedo | CERESed4.1, GEWEX.SRB
Biomass | ESACCI, GEOCARBON, NBCD2000, Saatchi2011, Thurner, USForest, XuSaatchi2021
Burned Area | GFED4.1S
Carbon Dioxide | NOAA.Emulated, HIPPOAToM
Diurnal Max Temperature | CRU4.02
Diurnal Min Temperature | CRU4.02
Diurnal Temperature Range | CRU4.02
Ecosystem Respiration | FLUXNET2015, FLUXCOM
Evapotranspiration | GLEAMv3.3a, MODIS, MOD16A2
Global Net Ecosystem Carbon Balance | GCP, Hoffman
Gross Primary Productivity | FLUXNET2015, FLUXCOM, WECANN
Ground Heat Flux | CLASS
Latent Heat | FLUXNET2015, FLUXCOM, DOLCE, CLASS, WECANN
Leaf Area Index | AVHRR, AVH15C1, MODIS
Methane | FluxnetANN
Net Ecosystem Exchange | FLUXNET2015
Nitrogen Fixation | Davies-Barnard
Permafrost | Brown2002, Obu2018
Precipitation | CMAPv1904, FLUXNET2015, GPCCv2018, GPCPv2.3, CLASS
Runoff | Dai, LORA, CLASS
Sensible Heat | FLUXNET2015, FLUXCOM, CLASS, WECANN
Snow Water Equivalent | CanSISE
Soil Carbon | HWSD, NCSCDV22
Surface Air Temperature | CRU4.02, FLUXNET2015
Surface Downward LW Radiation | CERESed4.1, FLUXNET2015, GEWEX.SRB, WRMC.BSRN
Surface Downward SW Radiation | CERESed4.1, FLUXNET2015, GEWEX.SRB, WRMC.BSRN
Surface Net LW Radiation | CERESed4.1, FLUXNET2015, GEWEX.SRB, WRMC.BSRN
Surface Net Radiation | CERESed4.1, FLUXNET2015, GEWEX.SRB, WRMC.BSRN, CLASS
Surface Net SW Radiation | CERESed4.1, FLUXNET2015, GEWEX.SRB, WRMC.BSRN
Surface Relative Humidity | ERA5, CRU4.02
Surface Soil Moisture | WangMao
Surface Upward LW Radiation | CERESed4.1, FLUXNET2015, GEWEX.SRB, WRMC.BSRN
Surface Upward SW Radiation | CERESed4.1, FLUXNET2015, GEWEX.SRB, WRMC.BSRN
Terrestrial Water Storage Anomaly GRACE

IOMB-DATA list

Alkalinity | GLODAP2.2022
Anthropogenic DIC 1994-2007 | Gruber, OCIM
Chlorophyll | GLODAP2.2022, SeaWIFS, MODISAqua
Dissolved Inorganic Carbon | GLODAP2.2022
Nitrate | WOA2018, GLODAP2.2022
Oxygen | WOA2018, GLODAP2.2022
Phosphate | WOA2018, GLODAP2.2022
Salinity | WOA2018, GLODAP2.2022
Silicate | WOA2018, GLODAP2.2022
Temperature | WOA2018, GLODAP2.2022
Vertical Temperature Gradient | WOA2018, GLODAP2.2022

The CMIP7 collaborative development and evaluation project (zv30) on NCI-Gadi

The Australian CMIP7 community, supported by ACCESS-NRI, aims to establish a data space for effectively comparing and evaluating CMIP experiments in preparation for Australia’s forthcoming submission to CMIP7. This shared platform will serve as a collaborative hub, bringing together researchers and model developers to assess model outputs. It will enable comparisons with previous simulations and CMIP6 models, facilitating the real-time exchange of feedback. Additionally, this space will support iterative model improvement by providing a platform for testing and refining model configurations.

This collection is part of the zv30 project on NCI, managed by ACCESS-NRI. Similar to the NCI National data collections, users only have read access to this data. To share a dataset for model evaluation purposes, users must prepare the data according to CF conventions (i.e., CMORize the data) and submit a request to copy the dataset to the zv30 project. To do so, please contact Romain Beucher or Clare Richards at ACCESS-NRI.

If you have not done so already, please join the zv30 project.

ZV30 collection in ESMValTool

ESMValTool-workflow on Gadi has been configured to recognise this collection and to distinguish it from the other CMIP6 collections.

To use it, set the project facet to ZV30.

In recipe

datasets:
 - project: ZV30
   exp: piControl
   dataset: '*'
   institute: '*'
   ensemble: '*'
   grid: '*'

Key Points

  • Gadi hosts supported data collections that you can use straight away with ESMValTool and ILAMB