Using HDF and NetCDF files¶
HDF-EOS files¶
MODIS data and other NASA data products come packaged as .hdf files. These are HDF-EOS files, a format specified by NASA and based on the HDF4 specification. More info here, and there are code examples for using HDF-EOS files with different languages here. For a MOD17 example see here.
Since HDF5 is the current and most widely supported HDF format, it may be easiest to first convert HDF-EOS files to HDF5 using a conversion tool. Download and unpack the tool, cd into its directory, and run ./h4toh5 ~/path/to/file.hdf.
Reading with python¶
If using HDF5 files, h5py or PyTables can be used to access the data in the file.
If using HDF4 data with python, check these resources:
- PyHDF info
- In Anaconda, you may want the conda-forge package
- hdf4 also seems to work
Also - maybe check this out:
Reprojecting the data¶
These files usually come in the standard sinusoidal projection, and lat/lon data may or may not be provided in the file. If there is no lat/lon data, it must be created from the file metadata (corner coordinates of the tile, cell size, etc.). It is possible to do this and reproject with GDAL (which can read the file metadata using GetGeoTransform) and Basemap in Python (see examples here).
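When a tile has no lat/lon arrays, they can be derived from the geotransform. The following is a minimal sketch of the underlying arithmetic, not the GDAL/Basemap workflow itself. It assumes the standard MODIS sinusoidal grid on a sphere of radius 6371007.181 m and a north-up geotransform in the six-element order returned by GetGeoTransform. The function names and the sample geotransform values are made up for illustration.

```python
import math

# Radius of the sphere used by the MODIS sinusoidal grid, in metres.
MODIS_SPHERE_RADIUS = 6371007.181


def sinusoidal_to_latlon(x, y, radius=MODIS_SPHERE_RADIUS):
    """Invert the sinusoidal projection: metres -> (lat, lon) in degrees."""
    lat = y / radius                      # latitude in radians
    lon = x / (radius * math.cos(lat))    # longitude in radians
    return math.degrees(lat), math.degrees(lon)


def pixel_center_latlon(geotransform, row, col):
    """Lat/lon of a pixel centre, given a north-up GDAL-style geotransform.

    geotransform = (x_origin, pixel_width, 0, y_origin, 0, -pixel_height)
    """
    x_origin, pixel_width, _, y_origin, _, pixel_height = geotransform
    x = x_origin + (col + 0.5) * pixel_width
    y = y_origin + (row + 0.5) * pixel_height   # pixel_height is negative
    return sinusoidal_to_latlon(x, y)


# Hypothetical geotransform for a tile with ~926.6 m cells (illustrative values).
example_gt = (-11119505.2, 926.625433, 0.0, 4447802.08, 0.0, -926.625433)
lat, lon = pixel_center_latlon(example_gt, row=0, col=0)
print(round(lat, 3), round(lon, 3))
```

Looping this over every row and column gives full lat/lon grids. Cells that fall outside the valid area of the projection (near the tile corners at high latitudes) can produce longitudes beyond ±180 and would need masking.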
If using only pyhdf, the MODIS Reprojection Tool may help:
https://lpdaac.usgs.gov/tools/modis_reprojection_tool
or the eos2dump tool:
http://hdfeos.org/software/eos2dump.php
NetCDF files¶
Reading the file¶
Some of this is cribbed from:
http://www.hydro.washington.edu/~jhamman/hydro-logic/blog/2013/10/12/plot-netcdf-data/
NetCDF files can be opened with GDAL (though this can be a little tricky):
from osgeo import gdal
ds = gdal.Open(mstmip_dir + 'BIOME-BGC_BG1_Monthly_NEP.nc4')
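Part of what makes GDAL tricky here is that a NetCDF file holding several variables is exposed as a set of subdatasets, each addressed by a name of the form NETCDF:"path":variable. Below is a small sketch of building that name. The helper function is hypothetical, and the variable name NEP is taken from the other examples in this section rather than checked against the file.

```python
def netcdf_subdataset_name(path, variable):
    """Build the GDAL subdataset name for one variable in a NetCDF file."""
    return 'NETCDF:"{}":{}'.format(path, variable)


name = netcdf_subdataset_name('BIOME-BGC_BG1_Monthly_NEP.nc4', 'NEP')
print(name)  # NETCDF:"BIOME-BGC_BG1_Monthly_NEP.nc4":NEP

# With GDAL installed, the variable could then be opened with:
#   from osgeo import gdal
#   ds = gdal.Open(name)
```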
Or you can just use the netCDF4-python module directly:
from netCDF4 import Dataset
ncdf2 = Dataset(mstmip_dir + 'BIOME-BGC_BG1_Monthly_NEP.nc4')
# Then pull out relevant variables
nep = ncdf2.variables['NEP'][-1] # data for the last time step (one month)
lats = ncdf2.variables['lat'][:]
lons = ncdf2.variables['lon'][:]
ncdftime = ncdf2.variables['time'][:]
nep_units = ncdf2.variables['NEP'].units
Timestamps¶
You can use the time converter in the netcdftime module (since superseded by cftime):
from netcdftime import utime
time_conv = utime('days since 1700-01-01 00:00:00')
times = time_conv.num2date(ncdftime)
print(times[-1])
Or just create a numpy array:
import numpy as np
td = np.array([np.timedelta64(int(i), 'D') for i in ncdftime])
times = td + np.datetime64('1700-01-01 00:00:00')
print(times[-1])
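Both approaches above hard-code the reference date. The sketch below instead parses the units string, using only the standard library. The function name is hypothetical. It assumes the file uses the standard (Gregorian) calendar and a units string of the form '<unit> since YYYY-MM-DD HH:MM:SS'. Files with other calendars (for example noleap) would need a calendar-aware converter.

```python
from datetime import datetime, timedelta

# Seconds per supported time unit.
_UNIT_SECONDS = {'seconds': 1, 'minutes': 60, 'hours': 3600, 'days': 86400}


def decode_times(values, units):
    """Convert numeric offsets to datetimes using a '<unit> since <date>' string."""
    unit, _, reference = units.partition(' since ')
    origin = datetime.strptime(reference.strip(), '%Y-%m-%d %H:%M:%S')
    scale = _UNIT_SECONDS[unit.strip().lower()]
    return [origin + timedelta(seconds=float(v) * scale) for v in values]


times = decode_times([0, 31, 365], 'days since 1700-01-01 00:00:00')
print(times[-1])  # 1701-01-01 00:00:00
```

In practice the units string would be read from the file's time variable rather than typed in, so the same function works for files with a different reference date.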