Supported data on NCI GADI
Overview
Teaching: 15 min
Exercises: 15 min
Compatibility: ESMValTool v2.11.0
Questions
What data can I get on Gadi?
How can I access and find datasets?
Objectives
Gain knowledge of relevant Gadi projects for data
How observation data is organised for ESMValTool
Understanding download and CMORise functions available in ESMValTool
How observation data is organised for the ILAMB
Introduction
An advantage of using a supercomputer like Gadi at NCI, which is also an ESGF node, is that a lot of data is already available. This saves us from searching for and downloading large datasets that can’t be handled on other computers.
What data can I get on Gadi?
Broadly, the datasets available which can be easily found and read in ESMValTool are:
- Observational data
- ERA5 data
- published CMIP6 data
- published CMIP5 data
What are the NCI projects I need to join?
On NCI, join the relevant NCI projects to access that data. The NCI data catalogue can be searched for more information on the collections. Log into NCI with your NCI account to find and join the projects. These would have been checked when you ran the check_hackathon set up.
Data and NCI projects:
- You can check if you’re a member or join ct11 with this link.
The NCI data catalog entries with NCI project:
- ESMValTool observation data collection: ct11
- ESMValTool ERA5 Daily datasets: ct11
- ERA5 : rt52 and ERA5-Land: zz93
- CMIP6 replicas: oi10 and Australian published: fs38
- CMIP5 replicas: al33 and Australian: rr3
There is also the NCI project zv30 for CMIP7 collaborative development and evaluation which will be covered later in this episode.
Pro tip: Configuration file rootpaths
Remember the config-user.yml file where we can set directories for ESMValTool to look for the data. This is an example from the Gadi esmvaltool-workflow user configuration:
config rootpaths
rootpath:
  CMIP6: [/g/data/oi10/replicas/CMIP6, /g/data/fs38/publications/CMIP6, /g/data/xp65/public/apps/esmvaltool/replicas/CMIP6]
  CMIP5: [/g/data/r87/DRSv3/CMIP5, /g/data/al33/replicas/CMIP5/combined, /g/data/rr3/publications/CMIP5/output1]
  CMIP3: /g/data/r87/DRSv3/CMIP3
  CORDEX: [/g/data/rr3/publications/CORDEX/output, /g/data/al33/replicas/cordex/output]
  OBS: /g/data/ct11/access-nri/replicas/esmvaltool/obsdata-v2
  OBS6: /g/data/ct11/access-nri/replicas/esmvaltool/obsdata-v2
  obs4MIPs: [/g/data/ct11/access-nri/replicas/esmvaltool/obsdata-v2]
  ana4mips: [/g/data/ct11/access-nri/replicas/esmvaltool/obsdata-v2]
  native6: [/g/data/rt52/era5]
  ACCESS: /g/data/p73/archive/non-CMIP
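Before running a recipe, it can help to check which of these root paths are visible from your session. The sketch below is a minimal illustration and not part of ESMValTool: the `ROOTPATHS` dictionary copies a few entries from the configuration above by hand, and `find_missing_rootpaths` is a hypothetical helper name. A directory reported as missing may mean you have not yet joined the corresponding NCI project.

```python
import os

# A few entries copied by hand from the rootpath example above (not the full list).
ROOTPATHS = {
    "CMIP6": ["/g/data/oi10/replicas/CMIP6", "/g/data/fs38/publications/CMIP6"],
    "CMIP5": ["/g/data/al33/replicas/CMIP5/combined",
              "/g/data/rr3/publications/CMIP5/output1"],
    "OBS6": ["/g/data/ct11/access-nri/replicas/esmvaltool/obsdata-v2"],
    "native6": ["/g/data/rt52/era5"],
}


def find_missing_rootpaths(rootpaths):
    """Return {project: [paths that are not visible as directories]}."""
    missing = {}
    for project, paths in rootpaths.items():
        # os.path.isdir returns False (rather than raising) when access is denied.
        absent = [path for path in paths if not os.path.isdir(path)]
        if absent:
            missing[project] = absent
    return missing


if __name__ == "__main__":
    for project, paths in find_missing_rootpaths(ROOTPATHS).items():
        print(f"{project}: cannot see {', '.join(paths)}")
```

Run on a Gadi login node, the script prints nothing when every listed directory is visible.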
ESMValTool Tiers
Observational datasets in ESMValTool are organised in tiers reflecting access restriction levels.
- Tier 1 Primarily Obs4MIPS where data is formatted, freely available and ready to use in ESMValTool.
- Tier 2 Data is freely available, CMORised datasets are available in ct11 in ESMValTool.
- Tier 3 These datasets have access restrictions. Licensing and acknowledgement may be required, so direct access to the data cannot be provided. ACCESS-NRI can provide support to download and CMORise.
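One way to see what each tier holds is to walk the observation root path. The sketch below assumes the collection follows ESMValTool's default layout for OBS and OBS6 data, with `Tier1`, `Tier2` and `Tier3` directories that each contain one sub-directory per dataset. That layout and the helper name `datasets_by_tier` are assumptions for illustration, so check the directory on Gadi before relying on it.

```python
from pathlib import Path


def datasets_by_tier(obs_root):
    """Map each TierN directory under obs_root to its sorted dataset folder names."""
    tiers = {}
    for tier_dir in sorted(Path(obs_root).glob("Tier[0-9]")):
        if tier_dir.is_dir():
            tiers[tier_dir.name] = sorted(
                child.name for child in tier_dir.iterdir() if child.is_dir()
            )
    return tiers


if __name__ == "__main__":
    # OBS/OBS6 root path as given in the rootpath configuration for Gadi.
    root = "/g/data/ct11/access-nri/replicas/esmvaltool/obsdata-v2"
    for tier, names in datasets_by_tier(root).items():
        print(tier, len(names), "datasets")
```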
ERA5 in native6 and ERA5 daily in OBS6 Tier3
- native6-era5
The project native6 refers to a collection of datasets that can be read directly into CMIP6 format for use in ESMValTool recipes. ESMValTool supports this with an extra facets file to map the variable names across. This would have been added to your ~/.esmvaltool/extra_facets directory, which is also used to fill out default facet values and help find the data. See more information on extra facets.
- ERA5 daily derived
The original hourly data from the “ERA5 hourly data on single levels” and “ERA5 hourly data on pressure levels”
collections have been transformed into daily means using the ESMValTool (v2.10) Python package.
These are Tier 3 datasets for OBS6. Variables available are:
'clt', 'fx', 'pr', 'prw', 'psl', 'rlds', 'rsds', 'rsdt',
'tas', 'tasmax', 'tasmin', 'tdps', 'ua', 'uas', 'vas'
What is the ESMValTool observation data collection?
We have created a collection of observation datasets that can be pulled directly into ESMValTool. The data has been CMORised, meaning the netCDF files are formatted to follow the CF conventions and the standards of the CMIP projects. There is a table of available Tier 1 and 2 data which can be found here, or you can expand the list below:
Observation collection
| long_name | datasets | name |
|---|---|---|
| Ambient Aerosol Optical Thickness at 550nm | ESACCI-AEROSOL, MODIS | od550aer |
| Surface Upwelling Shortwave Radiation | CERES-EBAF, ESACCI-CLOUD, ISCCP-FH | rsus |
| Carbon Mass Flux out of Atmosphere Due to Net Biospheric Production on Land [kgC m-2 s-1] | GCP2018, GCP2020 | nbp |
| Surface Temperature | CFSR, ESACCI-LST, ESACCI-SST, HadISST, ISCCP-FH, NCEP-NCAR-R1 | ts |
| Daily Maximum Near-Surface Air Temperature | E-OBS, NCEP-NCAR-R1 | tasmax |
| Omega (=dp/dt) | NCEP-NCAR-R1 | wap |
| Surface Dissolved Inorganic Carbon Concentration | OceanSODA-ETHZ | dissicos |
| Liquid Water Path | ESACCI-CLOUD, MODIS | lwp |
| Surface Total Alkalinity | OceanSODA-ETHZ | talkos |
| Eastward Wind | CFSR, NCEP-NCAR-R1 | ua |
| Mole Fraction of N2O | TCOM-N2O | n2o |
| Grid-Cell Area for Ocean Variables | OceanSODA-ETHZ | areacello |
| Ambient Aerosol Optical Depth at 870nm | ESACCI-AEROSOL | od870aer |
| Surface Carbonate Ion Concentration | OceanSODA-ETHZ | co3os |
| Surface Upwelling Longwave Radiation | CERES-EBAF, ESACCI-CLOUD, ISCCP-FH | rlus |
| Dissolved Oxygen Concentration | CT2019, ESACCI-GHG, ESRL, GCP2018, GCP2020, Landschuetzer2016, Landschuetzer2020, OceanSODA-ETHZ, Scripps-CO2-KUM, WOA | o2 |
| Specific Humidity | AIRS, AIRS-2-1, HALOE, JRA-25, NCEP-NCAR-R1, NOAA-CIRES-20CR | hus |
| TOA Outgoing Shortwave Radiation | CERES-EBAF, ESACCI-CLOUD, ISCCP-FH, JRA-25, JRA-55, NCEP-NCAR-R1, NOAA-CIRES-20CR | rsut |
| Sea Water Salinity | CALIPSO-GOCCP, ESACCI-LANDCOVER, ESACCI-SEA-SURFACE-SALINITY, PHC, WOA | so |
| Percentage Crop Cover | ESACCI-LANDCOVER | cropFrac |
| Percentage of the Grid Cell Occupied by Land (Including Lakes) | BerkeleyEarth | sftlf |
| Sea Surface Temperature | ATSR, HadISST, WOA | tos |
| Total Dissolved Inorganic Silicon Concentration | CFSR, GLODAP, HadISST, MOBO-DIC_MPIM, OSI-450-nh, OSI-450-sh, OceanSODA-ETHZ, PIOMAS, WOA | si |
| Daily Minimum Near-Surface Air Temperature | E-OBS, NCEP-NCAR-R1 | tasmin |
| Dissolved Inorganic Carbon Concentration | GLODAP, MOBO-DIC_MPIM, OceanSODA-ETHZ | dissic |
| Water Vapor Path | ISCCP-FH, JRA-25, NCEP-DOE-R2, NCEP-NCAR-R1, NOAA-CIRES-20CR, SSMI, SSMI-MERIS | prw |
| Surface Downwelling Longwave Radiation | CERES-EBAF, ISCCP-FH, JRA-55 | rlds |
| Geopotential Height | CFSR, NCEP-NCAR-R1 | zg |
| Northward Wind | CFSR, NCEP-NCAR-R1 | va |
| Relative Humidity | AIRS-2-0, AIRS-2-1, NCEP-DOE-R2, NCEP-NCAR-R1 | hur |
| Tree Cover Percentage | ESACCI-LANDCOVER | treeFrac |
| Percentage Cover by Shrub | ESACCI-LANDCOVER | shrubFrac |
| Bare Soil Percentage Area Coverage | ESACCI-LANDCOVER | baresoilFrac |
| Percentage Cloud Cover | CALIOP, CALIPSO-GOCCP, CloudSat, ESACCI-CLOUD, ISCCP, JRA-25, JRA-55, MODIS, MODIS-1-0, NCEP-DOE-R2, NCEP-NCAR-R1, NOAA-CIRES-20CR, PATMOS-x | cl |
| Total Alkalinity | GLODAP, OceanSODA-ETHZ | talk |
| Surface Upwelling Clear-Sky Shortwave Radiation | CERES-EBAF, ESACCI-CLOUD | rsuscs |
| Mole Fraction of CH4 | ESACCI-GHG, TCOM-CH4 | ch4 |
| Precipitation | CRU, E-OBS, ESACCI-OZONE, GHCN, GPCC, GPCP-SG, ISCCP-FH, JRA-25, JRA-55, NCEP-DOE-R2, NCEP-NCAR-R1, NOAA-CIRES-20CR, PERSIANN-CDR, REGEN, SSMI, SSMI-MERIS, TRMM-L3, WFDE5, AGCD | pr |
| Ambient Fine Aerosol Optical Depth at 550nm | ESACCI-AEROSOL | od550lt1aer |
| Sea Surface Salinity | ESACCI-SEA-SURFACE-SALINITY, WOA | sos |
| Natural Grass Area Percentage | ESACCI-LANDCOVER | grassFrac |
| Primary Organic Carbon Production by All Types of Phytoplankton | Eppley-VGPM-MODIS | intpp |
| Eastward Near-Surface Wind | CFSR | uas |
| Air Temperature | AIRS, AIRS-2-1, BerkeleyEarth, CFSR, CRU, CowtanWay, E-OBS, GHCN-CAMS, GISTEMP, GLODAP, HadCRUT3, HadCRUT4, HadCRUT5, ISCCP-FH, Kadow2020, NCEP-DOE-R2, NCEP-NCAR-R1, NOAAGlobalTemp, OceanSODA-ETHZ, PHC, WFDE5, WOA | ta |
| Near-Surface Air Temperature | BerkeleyEarth, CFSR, CRU, CowtanWay, E-OBS, GHCN-CAMS, GISTEMP, HadCRUT3, HadCRUT4, HadCRUT5, ISCCP-FH, Kadow2020, NCEP-NCAR-R1, NOAAGlobalTemp, WFDE5 | tas |
| Surface Downwelling Clear-Sky Longwave Radiation | CERES-EBAF, JRA-55 | rldscs |
| Ambient Aerosol Absorption Optical Thickness at 550nm | ESACCI-AEROSOL | abs550aer |
| Total Dissolved Inorganic Phosphorus Concentration | WOA | po4 |
| Sea Level Pressure | E-OBS, JRA-55, NCEP-NCAR-R1 | psl |
| Sea Water Potential Temperature | PHC, WOA | thetao |
| CALIPSO Percentage Cloud Cover | CALIPSO-GOCCP | clcalipso |
| Surface Aqueous Partial Pressure of CO2 | Landschuetzer2016, Landschuetzer2020, OceanSODA-ETHZ | spco2 |
| Mass Concentration of Total Phytoplankton Expressed as Chlorophyll in Sea Water | ESACCI-OC | chl |
| Surface pH | OceanSODA-ETHZ | phos |
| TOA Outgoing Clear-Sky Longwave Radiation | CERES-EBAF, ESACCI-CLOUD, ISCCP-FH, JRA-25, JRA-55, NCEP-NCAR-R1 | rlutcs |
| Total Column Ozone | ESACCI-OZONE | toz |
| Near-Surface Relative Humidity | NCEP-NCAR-R1 | hurs |
| Surface Downward Mass Flux of Carbon as CO2 [kgC m-2 s-1] | GCP2018, GCP2020, Landschuetzer2016, OceanSODA-ETHZ | fgco2 |
| Atmosphere CO2 | CT2019, ESRL, Scripps-CO2-KUM | co2s |
| pH | GLODAP, OceanSODA-ETHZ | ph |
| Condensed Water Path | MODIS, NOAA-CIRES-20CR | clwvi |
| Daily-Mean Near-Surface Wind Speed | CFSR, NCEP-NCAR-R1 | sfcWind |
| Surface Downwelling Shortwave Radiation | CERES-EBAF, ISCCP-FH | rsds |
| TOA Outgoing Clear-Sky Shortwave Radiation | CERES-EBAF, ESACCI-CLOUD, ISCCP-FH, JRA-25, JRA-55, NCEP-NCAR-R1 | rsutcs |
| Total Cloud Cover Percentage | CALIOP, CloudSat, ESACCI-CLOUD, ISCCP, JRA-25, JRA-55, MODIS, MODIS-1-0, NCEP-DOE-R2, NCEP-NCAR-R1, NOAA-CIRES-20CR, PATMOS-x | clt |
| Convective Cloud Area Percentage | CALIOP, CALIPSO-GOCCP | clc |
| Northward Near-Surface Wind | CFSR | vas |
| Surface Air Pressure | CALIPSO-GOCCP, E-OBS, ISCCP-FH, JRA-55, NCEP-NCAR-R1 | ps |
| TOA Outgoing Longwave Radiation | CERES-EBAF, ESACCI-CLOUD, ISCCP-FH, JRA-25, JRA-55, NCEP-NCAR-R1, NOAA-CIRES-20CR | rlut |
| Delta CO2 Partial Pressure | Landschuetzer2016 | dpco2 |
| Surface Downwelling Clear-Sky Shortwave Radiation | CERES-EBAF | rsdscs |
| TOA Incident Shortwave Radiation | CERES-EBAF, ESACCI-CLOUD, ISCCP-FH | rsdt |
| Ice Water Path | ESACCI-CLOUD | clivi |
ESMValTool data download and CMORise
ESMValTool has the capability to download and format certain observational datasets with data commands, see here for more detail and a table of datasets available to download and format. These are the download and format commands:
esmvaltool data download --config_file <path to config-user.yml> <dataset-name>
esmvaltool data format --config_file <path to config-user.yml> <dataset-name>
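If you need to fetch several datasets, the two commands can be scripted. The sketch below only builds the argument lists for the commands shown above and prints them. `build_data_commands` is a hypothetical helper, and the config path and dataset name in the usage lines are placeholders to replace with your own values.

```python
import shlex


def build_data_commands(config_file, dataset_name):
    """Return the download and format commands as argument lists."""
    return [
        ["esmvaltool", "data", step, "--config_file", config_file, dataset_name]
        for step in ("download", "format")
    ]


if __name__ == "__main__":
    # Placeholder values: substitute your own config file and dataset name.
    for cmd in build_data_commands("config-user.yml", "<dataset-name>"):
        print(shlex.join(cmd))
```

Passing each list to `subprocess.run(cmd, check=True)` would execute it. Keeping the build step separate lets you inspect the commands first.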
You will find the ESMValTool facet project for observational data can be OBS or OBS6, where OBS is CMIP5 format and OBS6 is CMIP6 format.
Finding data examples
Find data in recipe
Some facets can have glob patterns or wildcards for values. The facet project cannot be a wildcard, see reference.
An example recipe that will use all CMIP6 datasets and all ensemble members which have a ‘historical’ experiment could look like this:
Solution
datasets:
  - project: CMIP6
    exp: historical
    dataset: '*'
    institute: '*'
    ensemble: '*'
    grid: '*'
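To build intuition for what a wildcard facet selects, the sketch below applies shell-style patterns to a hand-written list of facet dictionaries using Python's `fnmatch` module. It is a simplified illustration only, not how ESMValTool searches for files. The candidate entries and the `matches` helper are made up for the example.

```python
from fnmatch import fnmatchcase


def matches(candidate, pattern_facets):
    """True if every facet in pattern_facets matches the candidate's value."""
    return all(
        fnmatchcase(str(candidate.get(key, "")), str(pattern))
        for key, pattern in pattern_facets.items()
    )


# Illustrative candidates only; real values come from the files on disk.
candidates = [
    {"project": "CMIP6", "exp": "historical", "dataset": "ACCESS-ESM1-5", "ensemble": "r1i1p1f1"},
    {"project": "CMIP6", "exp": "ssp585", "dataset": "ACCESS-ESM1-5", "ensemble": "r1i1p1f1"},
    {"project": "CMIP6", "exp": "historical", "dataset": "ACCESS-CM2", "ensemble": "r2i1p1f1"},
]

# The project is given explicitly; dataset and ensemble are wildcards.
pattern = {"project": "CMIP6", "exp": "historical", "dataset": "*", "ensemble": "*"}
selected = [c for c in candidates if matches(c, pattern)]
print([c["dataset"] for c in selected])
```

Only the two ‘historical’ entries are selected, because `exp` is fixed while `dataset` and `ensemble` accept any value.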
Find data using esmvalcore
The same wildcard search can be used through the esmvalcore API. To find all available datasets from ESGF which may not be available locally, set search_esgf to always. This example looks for all ensembles for a dataset.
Solution
from esmvalcore.dataset import Dataset
from esmvalcore.config import CFG

# Also search ESGF, not only the local rootpaths.
CFG['search_esgf'] = 'always'

dataset_search = Dataset(
    short_name='tos',
    mip='Omon',
    project='CMIP6',
    exp='historical',
    dataset='ACCESS-ESM1-5',
    ensemble='*',
    grid='gn',
)
# Expand the wildcard into one Dataset per ensemble member found.
ensemble_datasets = list(dataset_search.from_files())
ensemble_datasets
Find all available datasets for a variable in CMIP6
Find all datasets available for variable tos in CMIP6 in concatenated experiments ‘historical’ and ‘ssp585’ for the time range 1850 to 2100.
Solution
from esmvalcore.dataset import Dataset

template = Dataset(
    short_name='tos',
    mip='Omon',
    activity='CMIP',
    institute='*',  # facet required to search locally
    project='CMIP6',
    exp=['historical', 'ssp585'],
    dataset='*',
    # ensemble='*',
    grid='*',
    timerange='1850/2100',
)
# Expand the wildcards into one Dataset per match.
all_datasets = list(template.from_files())
all_datasets
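A wildcard search can return a long list, so it often helps to summarise it. The sketch below counts ensemble members per model from a list of facet dictionaries. With esmvalcore you could build that list as `[ds.facets for ds in all_datasets]`. Here a small hand-written list stands in so the example runs without access to the data, and `members_per_model` is a hypothetical helper.

```python
from collections import defaultdict


def members_per_model(facet_dicts):
    """Return {model name: sorted list of ensemble members}."""
    members = defaultdict(set)
    for facets in facet_dicts:
        # Fall back to "unknown" if a result carries no ensemble facet.
        members[facets["dataset"]].add(facets.get("ensemble", "unknown"))
    return {model: sorted(ens) for model, ens in sorted(members.items())}


# Stand-in data for illustration; not the result of a real search.
example = [
    {"dataset": "ACCESS-ESM1-5", "ensemble": "r1i1p1f1"},
    {"dataset": "ACCESS-ESM1-5", "ensemble": "r2i1p1f1"},
    {"dataset": "ACCESS-CM2", "ensemble": "r1i1p1f1"},
]
for model, ens in members_per_model(example).items():
    print(model, len(ens))
```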
What is ILAMB-Data?
The ILAMB community maintains a collection of reference datasets that have been carefully formatted following CF conventions. ACCESS-NRI hosts a replica of this ILAMB-data collection on NCI-Gadi as part of the ACCESS-NRI Replicated Datasets for Climate Model Evaluation NCI data collection, which can be accessed here. While we ensure this replica is regularly updated, the datasets were initially downloaded from primary sources and reformatted for use within the ILAMB framework. For specific reference information, please check the global attributes within the files.
See something wrong in a dataset? Have a suggestion? This collection is continually evolving and depends on community input. Please submit requests for support of new observation datasets on the ACCESS-Hive Forum. You can also track progress by following the ILAMB-Data GitHub repository, or check what ILAMB community users are currently working on via the ILAMB Dataset Integration project board.
Observation collection
- Albedo: CERESed4.1, GEWEX.SRB
- Biomass: ESACCI, GEOCARBON, NBCD2000, Saatchi2011, Thurner, USForest, XuSaatchi2021
- Burned Area: GFED4.1S
- Carbon Dioxide: NOAA.Emulated, HIPPOAToM
- Diurnal Max Temperature: CRU4.02
- Diurnal Min Temperature: CRU4.02
- Diurnal Temperature Range: CRU4.02
- Ecosystem Respiration: FLUXNET2015, FLUXCOM
- Evapotranspiration: GLEAMv3.3a, MODIS, MOD16A2
- Global Net Ecosystem Carbon Balance: GCP, Hoffman
- Gross Primary Productivity: FLUXNET2015, FLUXCOM, WECANN
- Ground Heat Flux: CLASS
- Latent Heat: FLUXNET2015, FLUXCOM, DOLCE, CLASS, WECANN
- Leaf Area Index: AVHRR, AVH15C1, MODIS
- Methane: FluxnetANN
- Net Ecosystem Exchange: FLUXNET2015
- Nitrogen Fixation: Davies-Barnard
- Permafrost: Brown2002, Obu2018
- Precipitation: CMAPv1904, FLUXNET2015, GPCCv2018, GPCPv2.3, CLASS
- Runoff: Dai, LORA, CLASS
- Sensible Heat: FLUXNET2015, FLUXCOM, CLASS, WECANN
- Snow Water Equivalent: CanSISE
- Soil Carbon: HWSD, NCSCDV22
- Surface Air Temperature: CRU4.02, FLUXNET2015
- Surface Downward LW Radiation: CERESed4.1, FLUXNET2015, GEWEX.SRB, WRMC.BSRN
- Surface Downward SW Radiation: CERESed4.1, FLUXNET2015, GEWEX.SRB, WRMC.BSRN
- Surface Net LW Radiation: CERESed4.1, FLUXNET2015, GEWEX.SRB, WRMC.BSRN
- Surface Net Radiation: CERESed4.1, FLUXNET2015, GEWEX.SRB, WRMC.BSRN, CLASS
- Surface Net SW Radiation: CERESed4.1, FLUXNET2015, GEWEX.SRB, WRMC.BSRN
- Surface Relative Humidity: ERA5, CRU4.02
- Surface Soil Moisture: WangMao
- Surface Upward LW Radiation: CERESed4.1, FLUXNET2015, GEWEX.SRB, WRMC.BSRN
- Surface Upward SW Radiation: CERESed4.1, FLUXNET2015, GEWEX.SRB, WRMC.BSRN
- Terrestrial Water Storage Anomaly: GRACE
IOMB-DATA list
- Alkalinity: GLODAP2.2022
- Anthropogenic DIC 1994-2007: Gruber, OCIM
- Chlorophyll: GLODAP2.2022, SeaWIFS, MODISAqua
- Dissolved Inorganic Carbon: GLODAP2.2022
- Nitrate: WOA2018, GLODAP2.2022
- Oxygen: WOA2018, GLODAP2.2022
- Phosphate: WOA2018, GLODAP2.2022
- Salinity: WOA2018, GLODAP2.2022
- Silicate: WOA2018, GLODAP2.2022
- Temperature: WOA2018, GLODAP2.2022
- Vertical Temperature Gradient: WOA2018, GLODAP2.2022
The CMIP7 collaborative development and evaluation project (zv30) on NCI-Gadi
The Australian CMIP7 community, supported by ACCESS-NRI, aims to establish a data space for effectively comparing and evaluating CMIP experiments in preparation for Australia’s forthcoming submission to CMIP7. This shared platform will serve as a collaborative hub, bringing together researchers and model developers to assess model outputs. It will enable comparisons with previous simulations and CMIP6 models, facilitating the real-time exchange of feedback. Additionally, this space will support iterative model improvement by providing a platform for testing and refining model configurations.
This collection is part of the zv30 project on NCI, managed by ACCESS-NRI. Similar to the NCI National data collections, users only have read access to this data. To share a dataset for model evaluation purposes, users must prepare the data according to CF conventions (i.e., CMORise the data) and submit a request to copy the dataset to the zv30 project. To do so, please contact Romain Beucher or Clare Richards at ACCESS-NRI.
If you have not done so already, please join the zv30 project.
ZV30 collection in ESMValTool
ESMValTool-workflow on Gadi has been configured to use this collection specifically and to distinguish it from the rest of the CMIP6 collections. You can do this by specifying the project facet as ZV30.
In recipe
datasets:
  - project: ZV30
    exp: piControl
    dataset: '*'
    institute: '*'
    ensemble: '*'
    grid: '*'
Key Points
Gadi hosts supported data collections that you can start using with ESMValTool and ILAMB.