datacube.Datacube.load_data

static Datacube.load_data(sources, geobox, measurements, resampling=None, fuse_func=None, dask_chunks=None, skip_broken_datasets=False, progress_cbk=None, extra_dims=None, patch_url=None, driver=None, **extra)

Load data from group_datasets() into an xarray.Dataset.

Parameters:
  • sources (DataArray) – DataArray holding a list of datacube.model.Dataset, grouped along the time dimension

  • geobox (GeoBox | Dataset | DataArray) – A GeoBox defining the output spatial projection and resolution

  • measurements (Mapping[str, Measurement] | list[Measurement]) – the Measurement objects to load, given either as a list or as a mapping of str to Measurement

  • resampling (Union[str, int, Resampling, dict[str, Union[str, int, Resampling]], None]) –

    The resampling method to use if re-projection is required. This can be either a string or a dictionary mapping band name to resampling mode. When using a dict, use '*' to mean “apply to all other bands”; for example, {'*': 'cubic', 'fmask': 'nearest'} uses cubic for all bands except fmask, for which nearest is used.

    Valid values are: 'nearest', 'cubic', 'bilinear', 'cubic_spline', 'lanczos', 'average', 'mode', 'gauss', 'max', 'min', 'med', 'q1', 'q3'

    Default is to use nearest for all bands.
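
The wildcard rule described above can be sketched in a few lines of Python. The helper name resolve_resampling is hypothetical and is not part of datacube; it only illustrates how a per-band dict with a '*' entry would be read, falling back to the documented default of 'nearest'.

```python
def resolve_resampling(resampling, band):
    """Return the resampling mode for one band.

    Illustrative helper only; this is not a datacube function.
    """
    if resampling is None:
        # Documented default: nearest for all bands.
        return "nearest"
    if isinstance(resampling, dict):
        if band in resampling:
            # An explicit per-band entry wins.
            return resampling[band]
        # Otherwise use the '*' entry, if any.
        return resampling.get("*", "nearest")
    # A single value applies to every band.
    return resampling


modes = {"*": "cubic", "fmask": "nearest"}
per_band = {band: resolve_resampling(modes, band) for band in ("red", "nir", "fmask")}
print(per_band)
```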

  • fuse_func (Union[Callable[[ndarray, ndarray], None], str, Mapping[str, Union[Callable[[ndarray, ndarray], None], str, None]], None]) –

    Function used to fuse/combine/reduce data with the group_by parameter.

    By default, pixels are only copied where valid (i.e. not nodata) pixels have not yet been copied from previous datasets.

    If data (especially categorical data) appears wrong or unexpected in areas where datasets overlap, then an appropriate fuse_func may help.

    The fuse_func can perform specific combining steps and can be a dictionary if different fusers are needed per band (similar format to the resampling dict described above).

    Fuse functions should be defined as follows:

    import numpy as np

    def my_fuser(dst: np.ndarray, src: np.ndarray) -> None:
        # Build a boolean mask of the pixels in src that should be copied.
        # pixels_to_copy is a placeholder for your own selection logic.
        mask = pixels_to_copy(src)

        # Copy only the masked pixels into dst, in place.
        np.copyto(dst, src, where=mask)
    

    For an example of a more sophisticated fuser function, see GeoscienceAustralia/dea-notebooks

    Fuser functions should be importable top-level functions, passed by fully qualified name so that they can be serialised to dask workers. For driver-based loads, fuser functions MUST be passed as fully qualified names. For legacy loads, fuser functions may be passed as function objects, but this will be deprecated and eventually removed in a future release.

    E.g.:

    data = dc.load(..., fuse_func="mymodule.my_fuser")
    

    is preferred over:

    from mymodule import my_fuser
    
    data = dc.load(..., fuse_func=my_fuser)
    

    and this will raise an error:

    from mymodule import my_fuser
    
    data = dc.load(..., fuse_func=my_fuser, driver="rio")
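
As a concrete instance of the fuser signature above, the following sketch fills only those destination pixels that are still nodata, which mirrors the default behaviour described earlier. It is not datacube's own implementation: the function name fill_nodata_fuser and the NODATA value of -999 are assumptions made for illustration.

```python
import numpy as np

# Assumed nodata value for this band; real products define their own.
NODATA = -999


def fill_nodata_fuser(dst: np.ndarray, src: np.ndarray) -> None:
    """Copy valid src pixels into dst only where dst is still nodata."""
    # Pixels not yet filled in dst, and actually valid in src.
    mask = (dst == NODATA) & (src != NODATA)
    # Modify dst in place; the fuser returns nothing.
    np.copyto(dst, src, where=mask)


# Two overlapping "datasets" reduced to 1-D arrays for the demonstration.
dst = np.array([1, NODATA, NODATA, 4], dtype="int16")
src = np.array([9, 2, NODATA, 8], dtype="int16")
fill_nodata_fuser(dst, src)
print(dst.tolist())
```

If this function lived in an importable module such as mymodule, it would be referenced as fuse_func="mymodule.fill_nodata_fuser".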
    

  • dask_chunks (Mapping[str, Union[int, Literal['auto']]] | None) –

    If provided, the data will be loaded on demand using dask.array.Array. Should be a dictionary specifying the chunk size for each output dimension. Chunk sizes for unspecified dimensions are guessed automatically; currently this means a chunk size of 1 for non-spatial dimensions and the whole dimension (no chunking) for spatial dimensions.

    See the documentation on using xarray with dask for more information.
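
The guessing rule for unspecified dimensions can be illustrated with a small sketch. The helper fill_chunks, the dimension names and the sizes are all hypothetical; the code only restates the rule in the text and makes no claim about how datacube computes chunks internally.

```python
def fill_chunks(dask_chunks, dim_sizes, spatial_dims=("y", "x")):
    """Fill in chunk sizes for dimensions missing from dask_chunks.

    Illustrative helper only; this is not a datacube function.
    """
    chunks = {}
    for dim, size in dim_sizes.items():
        if dim in dask_chunks:
            # Keep whatever the caller asked for.
            chunks[dim] = dask_chunks[dim]
        elif dim in spatial_dims:
            # Spatial dimension: one chunk covering the whole dimension.
            chunks[dim] = size
        else:
            # Non-spatial dimension: chunk size of 1.
            chunks[dim] = 1
    return chunks


chunks = fill_chunks({"x": 2048}, {"time": 12, "y": 4000, "x": 4000})
print(chunks)
```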

  • skip_broken_datasets (bool) – If True, broken datasets are left out of the result.

  • progress_cbk (Callable[[int, int], Any] | None) – A callable of the form (int, int) -> None. If supplied, it is called for every file read, with the arguments files_processed_so_far and total_files. Only applies to non-lazy loads; ignored when using dask.
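
A callback matching this signature might look like the sketch below. The name report_progress is hypothetical, and the loop at the end is only a stand-in for the loader; the exact values datacube passes on each call are not specified here.

```python
progress_log = []


def report_progress(files_processed_so_far: int, total_files: int) -> None:
    """Record and print how many files have been read so far."""
    progress_log.append((files_processed_so_far, total_files))
    print(f"read {files_processed_so_far}/{total_files} files")


# Stand-in for the loader invoking the callback once per file read.
total = 3
for done in range(1, total + 1):
    report_progress(done, total)
```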

  • extra_dims (ExtraDimensions | None) – An ExtraDimensions object describing any additional dimensions on top of (t, y, x)

  • patch_url (Callable[[str], str] | None) – If supplied, used to patch/sign the URL(s), as required to access some commercial archives.
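
As an illustration of the str -> str shape expected here, the sketch below appends a signing parameter to a URL's query string. The function name sign_url, the token value and the example URLs are all made up; a real archive would dictate its own signing scheme.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical signing parameter; real archives issue their own tokens.
SIGNING_TOKEN = "sig=EXAMPLE"


def sign_url(url: str) -> str:
    """Return url with the signing parameter appended to its query string."""
    parts = urlsplit(url)
    # Preserve any existing query parameters.
    query = f"{parts.query}&{SIGNING_TOKEN}" if parts.query else SIGNING_TOKEN
    return urlunsplit(parts._replace(query=query))


signed = sign_url("https://example.com/data/band.tif")
print(signed)
```

Such a function would be supplied as patch_url=sign_url.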

  • driver (str | ReaderDriver | None) – Optional. If provided, use the specified driver to load the data.

Return type:

Dataset