Dataloader

The dataloader is the component that provides data to the model. A dataloader usually (but not necessarily) takes raw information from datasets and processes it into the format needed by the model.

How the Existing Dataloader Works

X-modaler contains a built-in data loading pipeline. It’s good to understand how it works, in case you need to write a custom one.

X-modaler provides the functions xmodaler.datasets.build_xmodaler_{train,valtest}_loader, which create a default dataloader from a given config. Here is how build_xmodaler_{train,valtest}_loader works:

  • It uses a helper class (e.g., xmodaler.datasets.common.DatasetFromList) to load a list[dict] representing the dataset items in a lightweight format. These dataset items are not yet ready to be used by the model (e.g., images are not loaded into memory).

  • Each dict in this list is processed by the class xmodaler.datasets.common.MapDataset. Users can customize this per-item processing for specific datasets by implementing the __call__ function in a wrapper class (e.g., xmodaler.datasets.MSCoCoDataset), which is one of the arguments used to initialize MapDataset. The role of the wrapper class is to transform the lightweight representation of a dataset item into a format that is ready for the model to consume (including, e.g., reading images, sampling captions, or converting data to torch Tensors).

  • After gathering a list of mapped items, batching is handled by the collate_fn argument of torch.utils.data.DataLoader inside the xmodaler.datasets.build_xmodaler_{train,valtest}_loader functions.

The batched data is the output of the data loader. Typically, it’s also the input of model.forward().
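
The three steps above compose roughly as in the sketch below. The helper classes are the ones named above; the trivial collator and the builder signature are simplifying assumptions for illustration and may differ from the actual code in xmodaler.datasets.

```python
import torch.utils.data as torchdata

from xmodaler.datasets.common import DatasetFromList, MapDataset


def trivial_batch_collator(batch):
    # Assumption for illustration: keep the mapped items as a plain list and
    # let the model (or a later step) do any padding/stacking it needs.
    return batch


def build_loader_sketch(dataset_dicts, dataset_mapper, batch_size=16, num_workers=4):
    """
    dataset_dicts:  list[dict] of lightweight dataset items
    dataset_mapper: a wrapper-class instance whose __call__ turns one
                    lightweight dict into model-ready data
    """
    dataset = DatasetFromList(dataset_dicts)        # step 1: lightweight items
    dataset = MapDataset(dataset, dataset_mapper)   # step 2: per-item mapping
    return torchdata.DataLoader(                    # step 3: batching via collate_fn
        dataset,
        batch_size=batch_size,
        num_workers=num_workers,
        collate_fn=trivial_batch_collator,
    )
```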

Write a Custom Dataloader

Using a different “wrapper class” as the dataset_mapper argument of build_xmodaler_{train,valtest}_loader works for most cases of custom data loading. See Use Custom Datasets for how to customize the “wrapper class”; a minimal sketch is shown below.
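
The following is a minimal, hypothetical wrapper class. The dict keys (image_id, att_feats), the .npz feature files, and the feature_folder argument are assumptions made for illustration; real wrapper classes such as MSCoCoDataset are driven by the config and handle more fields.

```python
import copy

import numpy as np
import torch


class MyDatasetMapper:
    """Hypothetical wrapper class: turns one lightweight dict into model-ready data."""

    def __init__(self, feature_folder):
        # Hypothetical: directory holding precomputed .npz image features.
        self.feature_folder = feature_folder

    def __call__(self, dataset_dict):
        dataset_dict = copy.deepcopy(dataset_dict)  # do not mutate the cached item
        # Only now load the heavyweight data (e.g., precomputed region features).
        feats = np.load(f"{self.feature_folder}/{dataset_dict['image_id']}.npz")["features"]
        dataset_dict["att_feats"] = torch.as_tensor(feats, dtype=torch.float32)
        return dataset_dict


# Hypothetical usage, assuming the builder accepts the mapper as dataset_mapper:
# train_loader = build_xmodaler_train_loader(cfg, dataset_mapper=MyDatasetMapper("path/to/features"))
```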

Use a Custom Dataloader

If you use DefaultTrainer, you can override its build_xmodaler_{train,valtest}_loader methods to use your own dataloader, as sketched below.
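
A sketch of that override, assuming the hook names described above and classmethod-style hooks as in similar frameworks; build_my_train_loader and build_my_valtest_loader are hypothetical helpers you would implement yourself.

```python
from xmodaler.engine import DefaultTrainer

# Hypothetical helpers that build your own dataloaders from the config.
from my_project.data import build_my_train_loader, build_my_valtest_loader


class MyTrainer(DefaultTrainer):
    @classmethod
    def build_xmodaler_train_loader(cls, cfg):
        # Return your own training dataloader instead of the built-in one.
        return build_my_train_loader(cfg)

    @classmethod
    def build_xmodaler_valtest_loader(cls, cfg):
        # Return your own validation/test dataloader instead of the built-in one.
        return build_my_valtest_loader(cfg)
```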