datacube.Datacube.load_data
- static Datacube.load_data(sources, geobox, measurements, resampling=None, fuse_func=None, dask_chunks=None, skip_broken_datasets=False, progress_cbk=None, extra_dims=None, patch_url=None, driver=None, **extra)
Load data from group_datasets() into an xarray.Dataset.

- Parameters:

- sources (DataArray) – DataArray holding a list of datacube.model.Dataset, grouped along the time dimension

- geobox (GeoBox | Dataset | DataArray) – A GeoBox defining the output spatial projection and resolution

- measurements (Mapping[str, Measurement] | list[Measurement]) – list of Measurement objects

- resampling (str | int | Resampling | dict[str, str | int | Resampling] | None) – The resampling method to use if re-projection is required. This can be a string or a dictionary mapping band name to resampling mode. In a dict, use '*' to mean "apply to all other bands"; for example, {'*': 'cubic', 'fmask': 'nearest'} would use cubic for all bands except fmask, for which nearest would be used. Valid values are: 'nearest', 'cubic', 'bilinear', 'cubic_spline', 'lanczos', 'average', 'mode', 'gauss', 'max', 'min', 'med', 'q1', 'q3'. Default is to use nearest for all bands.

- fuse_func (Callable[[ndarray, ndarray], None] | str | Mapping[str, Callable[[ndarray, ndarray], None] | str | None] | None) – Function used to fuse/combine/reduce data with the group_by parameter. By default, a pixel is copied from a dataset only where no valid (i.e. not nodata) pixel has already been copied from a previous dataset. If data (especially categorical data) appears wrong or unexpected in areas where datasets overlap, an appropriate fuse_func may help. The fuse_func can perform specific combining steps, and can be a dictionary if different fusers are needed per band (same format as the resampling dict described above).
Fuse functions should be defined as follows:
```python
def my_fuser(dst: np.ndarray, src: np.ndarray) -> None:
    # Create a boolean mask array of pixels from this src array to copy.
    mask = pixels_to_copy(src)
    # Efficiently copy only masked pixels to dst.
    np.copyto(dst, src, where=mask)
```
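As a runnable variant of the skeleton above — assuming, purely for illustration, a simple integer nodata value (pixels_to_copy in the skeleton stands for whatever masking logic your data needs) — a "first valid pixel wins" fuser might look like:

```python
import numpy as np

NODATA = -9999  # assumed nodata value for this example


def first_valid_fuser(dst: np.ndarray, src: np.ndarray) -> None:
    # Copy src pixels into dst only where dst still holds nodata
    # and src has a valid observation.
    mask = (dst == NODATA) & (src != NODATA)
    np.copyto(dst, src, where=mask)


# Fusing two overlapping "datasets":
dst = np.array([NODATA, 5, NODATA, 7])
src = np.array([1, 2, 3, NODATA])
first_valid_fuser(dst, src)
# dst is now [1, 5, 3, 7]
```

Note the fuser mutates dst in place and returns None, matching the required signature.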
For an example of a more sophisticated fuser function, see GeoscienceAustralia/dea-notebooks
Fuser functions should be importable top-level functions passed by fully qualified name, so that they can be serialised to dask workers. For driver-based loads, fuser functions MUST be passed as fully qualified names. For legacy loads, fuser functions may be passed as generic function objects, but this is deprecated and will be removed in a future release.
E.g.:
```python
data = dc.load(..., fuse_func="mymodule.my_fuser")
```
is preferred over:
```python
from mymodule import my_fuser

data = dc.load(..., fuse_func=my_fuser)
```
and this will raise an error:
```python
from mymodule import my_fuser

data = dc.load(..., fuse_func=my_fuser, driver="rio")
```
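The reason fully qualified names serialise cleanly is that a worker only needs the string to recover the callable. A sketch of that resolution step (illustrative only, not datacube's actual internals) looks like:

```python
import importlib


def resolve_fuser(qualified_name: str):
    # Split "package.module.function" into a module path and an attribute
    # name, import the module, and look the function up by name. This is
    # roughly what a loader must do to turn the string back into a
    # callable on a remote worker.
    module_path, _, func_name = qualified_name.rpartition(".")
    module = importlib.import_module(module_path)
    return getattr(module, func_name)


# Resolving a qualified name from the standard library:
add = resolve_fuser("operator.add")
assert add(2, 3) == 5
```

A plain function object, by contrast, would have to be pickled, which fails for lambdas and locally defined functions.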
- dask_chunks (Mapping[str, int | Literal['auto']] | None) – If provided, the data will be loaded on demand as a dask.array.Array. Should be a dictionary specifying the chunk size for each output dimension. Unspecified dimensions are auto-guessed: currently this means a chunk size of 1 for non-spatial dimensions, and the whole dimension (no chunking) for spatial dimensions. See the documentation on using xarray with dask for more information.
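The defaulting rule for unspecified dimensions can be made concrete with a small helper. Everything here (the helper name, the set of spatial dimension names) is a hypothetical sketch of the documented behaviour, not datacube API:

```python
# Hypothetical helper illustrating the documented dask_chunks defaulting
# rule: unspecified non-spatial dims get a chunk size of 1, unspecified
# spatial dims get the whole dimension.
SPATIAL_DIMS = {"x", "y", "longitude", "latitude"}


def normalise_chunks(dask_chunks: dict, dim_sizes: dict) -> dict:
    chunks = {}
    for dim, size in dim_sizes.items():
        if dim in dask_chunks:
            chunks[dim] = dask_chunks[dim]  # user-specified chunk size
        elif dim in SPATIAL_DIMS:
            chunks[dim] = size              # whole spatial dimension
        else:
            chunks[dim] = 1                 # one slice per chunk
    return chunks


print(normalise_chunks({"x": 512, "y": 512},
                       {"time": 10, "y": 2000, "x": 2000}))
# {'time': 1, 'y': 512, 'x': 512}
```

So a request like dask_chunks={'x': 512, 'y': 512} produces one time slice per chunk, tiled 512×512 spatially.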
- skip_broken_datasets (bool) – do not include broken datasets in the result.

- progress_cbk (Callable[[int, int], Any] | None) – if supplied, will be called for every file read with (files_processed_so_far, total_files). Only applicable to non-lazy loads; ignored when using dask.

- extra_dims (ExtraDimensions | None) – An ExtraDimensions object describing any additional dimensions on top of (t, y, x)

- patch_url (Callable[[str], str] | None) – if supplied, will be used to patch/sign the URL(s), as required to access some commercial archives.

- driver (str | ReaderDriver | None) – Optional. If provided, use the specified driver to load the data.
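The '*' fallback rule for per-band dicts (described for resampling above, and shared by the fuse_func dict format) can be sketched as a small resolver. This is illustrative only; datacube implements the lookup internally:

```python
# Sketch of the '*' fallback rule for per-band option dicts: an explicit
# band entry wins, otherwise the '*' entry applies, otherwise the default.
def resampling_for_band(resampling, band: str) -> str:
    if isinstance(resampling, str):
        # A plain string applies to every band.
        return resampling
    return resampling.get(band, resampling.get("*", "nearest"))


spec = {"*": "cubic", "fmask": "nearest"}
assert resampling_for_band(spec, "red") == "cubic"      # falls back to '*'
assert resampling_for_band(spec, "fmask") == "nearest"  # explicit entry
assert resampling_for_band("bilinear", "red") == "bilinear"
```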
- Return type: xarray.Dataset