xmodaler.engine¶
- xmodaler.engine.launch(main_func, num_gpus_per_machine, num_machines=1, machine_rank=0, dist_url=None, args=(), timeout=datetime.timedelta(seconds=1800))[source]¶
Launch multi-gpu or distributed training. This function must be called on all machines involved in the training. It will spawn child processes (defined by num_gpus_per_machine) on each machine.
- Parameters:
main_func – a function that will be called by main_func(*args)
num_gpus_per_machine (int) – number of GPUs per machine
num_machines (int) – the total number of machines
machine_rank (int) – the rank of this machine
dist_url (str) – url to connect to for distributed jobs, including protocol e.g. “tcp://127.0.0.1:8686”. Can be set to “auto” to automatically select a free port on localhost
timeout (timedelta) – timeout of the distributed workers
args (tuple) – arguments passed to main_func
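As a rough illustration of how num_gpus_per_machine, num_machines and machine_rank relate to each other, the sketch below computes the total number of worker processes and the global ranks hosted on one machine. The rank layout (machine_rank * num_gpus_per_machine + local_rank) is an assumption based on a common convention, not something this page confirms, and worker_layout is a hypothetical helper, not part of xmodaler.engine.

```python
def worker_layout(num_gpus_per_machine, num_machines=1, machine_rank=0):
    """Return (world_size, global ranks hosted on this machine).

    Assumed layout: one process per GPU, ranks numbered machine by machine.
    """
    if not 0 <= machine_rank < num_machines:
        raise ValueError("machine_rank must be in [0, num_machines)")
    world_size = num_gpus_per_machine * num_machines
    first_rank = machine_rank * num_gpus_per_machine
    return world_size, list(range(first_rank, first_rank + num_gpus_per_machine))


# Two machines with four GPUs each, seen from the second machine.
world_size, ranks = worker_layout(num_gpus_per_machine=4, num_machines=2, machine_rank=1)
print(world_size, ranks)
```

Under this assumed layout, every machine calls the launcher with the same num_machines and num_gpus_per_machine but its own machine_rank.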
- xmodaler.engine.default_argument_parser(epilog=None)[source]¶
Create a parser with some common arguments used by X-modaler users.
- Parameters:
epilog (str) – epilog passed to ArgumentParser describing the usage.
- Return type:
argparse.ArgumentParser
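The role of the epilog argument can be sketched with the standard library alone. The flags below are hypothetical and merely mirror the parameters of launch; the arguments actually defined by default_argument_parser are not listed on this page, so treat build_parser as a stand-in.

```python
import argparse


def build_parser(epilog=None):
    """Build a parser whose help text ends with `epilog` (stand-in, not X-modaler's)."""
    parser = argparse.ArgumentParser(
        epilog=epilog,
        # Keep line breaks in the epilog so usage examples stay readable.
        formatter_class=argparse.RawDescriptionHelpFormatter,
    )
    # Hypothetical flags, chosen to mirror the parameters of launch().
    parser.add_argument("--num-gpus", type=int, default=1)
    parser.add_argument("--num-machines", type=int, default=1)
    parser.add_argument("--machine-rank", type=int, default=0)
    parser.add_argument("--dist-url", default="auto")
    return parser


parser = build_parser(epilog="Example:\n  python train_net.py --num-gpus 2")
args = parser.parse_args(["--num-gpus", "2"])
print(args.num_gpus, args.dist_url)
```

The returned object is a plain argparse.ArgumentParser, so callers can add their own arguments before parsing.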
- xmodaler.engine.default_setup(cfg, args)[source]¶
Perform some basic common setups at the beginning of a job, including:
Set up the X-modaler logger
Log basic information about environment, cmdline arguments, and config
Backup the config to the output directory
- Parameters:
cfg (CfgNode) – the full config to be used
args (argparse.Namespace) – the command line arguments to be logged
- xmodaler.engine.default_writers(output_dir: str, max_iter: Optional[int] = None)[source]¶
Build a list of EventWriter to be used. It now consists of a CommonMetricPrinter, TensorboardXWriter and JSONWriter.
- Parameters:
output_dir – directory to store JSON metrics and tensorboard events
max_iter – the total number of iterations
- Returns:
a list of EventWriter objects.
- Return type:
list[EventWriter]
- class xmodaler.engine.DefaultTrainer(cfg)[source]¶
Bases:
TrainerBase
A trainer with default training logic. It does the following:
Build the model, optimizer and dataloader defined by the given config, and create an LR scheduler defined by the config.
Load the last checkpoint or cfg.MODEL.WEIGHTS, if it exists, when resume_or_load is called.
Register a few common hooks defined by the config.
It is created to simplify the standard model training workflow and reduce code boilerplate for users who only need the standard training workflow, with standard features. This means the class makes many assumptions about your training logic that may easily become invalid in new research. In fact, any assumptions beyond those made in TrainerBase are too much for research.
The code of this class is annotated with the restrictive assumptions it makes. When they do not work for you, you’re encouraged to:
Overwrite methods of this class, OR:
Subclass TrainerBase, which assumes only that the training runs in a loop. You can then add your own hooks if needed. OR:
Write your own training loop similar to train_net.py.
See the Training tutorials for more details.
Note that the behavior of this class, like other functions/classes in this file, is not stable, since it is meant to represent the “common default behavior”. It is only guaranteed to work well with the standard models and training workflow in X-modaler. To obtain more stable behavior, write your own training logic with other public APIs.
Examples:
    trainer = DefaultTrainer(cfg)
    trainer.resume_or_load()  # load last checkpoint or MODEL.WEIGHTS
    trainer.train()
- scheduler¶
- checkpointer¶
- Type:
- _write_metrics(loss_dict: Dict[str, Tensor], data_time: float, prefix: str = '')[source]¶
- Parameters:
loss_dict (dict) – dict of scalar losses
data_time (float) – time taken by the dataloader iteration
- after_step()¶
- after_train()¶
- before_step()¶
- before_train()¶
- class xmodaler.engine.HookBase[source]¶
Bases:
object
Base class for hooks that can be registered with TrainerBase.
Each hook can implement 4 methods. The way they are called is demonstrated in the following snippet:
    hook.before_train()
    for iter in range(start_iter, max_iter):
        hook.before_step()
        trainer.run_step()
        hook.after_step()
    iter += 1
    hook.after_train()
Notes
In the hook method, users can access self.trainer to access more properties about the context (e.g., model, current iteration, or config if using DefaultTrainer).
A hook that does something in before_step() can often be implemented equivalently in after_step(). If the hook takes non-trivial time, it is strongly recommended to implement the hook in after_step() instead of before_step(). The convention is that before_step() should only take negligible time.
Following this convention will allow hooks that do care about the difference between before_step() and after_step() (e.g., timer) to function properly.
- state_dict()[source]¶
Hooks are stateless by default, but can be made checkpointable by implementing state_dict and load_state_dict.
- trainer: TrainerBase = None¶
A weak reference to the trainer object. Set by the trainer when the hook is registered.
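The calling order of the four hook methods, together with the optional state_dict and load_state_dict pair, can be sketched with stand-in classes. SketchHook and SketchTrainer are hypothetical and do not import X-modaler; in particular the sketch stores a plain reference to the trainer, whereas the documentation above describes a weak reference.

```python
class SketchHook:
    """Stand-in hook that records the order of calls and counts finished steps."""

    trainer = None  # set when the hook is registered

    def __init__(self):
        self.calls = []
        self.steps_seen = 0

    def before_train(self):
        self.calls.append("before_train")

    def before_step(self):
        self.calls.append("before_step")

    def after_step(self):
        self.steps_seen += 1
        self.calls.append("after_step")

    def after_train(self):
        self.calls.append("after_train")

    # Implementing this pair is what would make the hook checkpointable.
    def state_dict(self):
        return {"steps_seen": self.steps_seen}

    def load_state_dict(self, state):
        self.steps_seen = state["steps_seen"]


class SketchTrainer:
    """Stand-in trainer that only knows it runs a loop and calls hooks in order."""

    def __init__(self, hooks, start_iter, max_iter):
        self.hooks = hooks
        self.iter = self.start_iter = start_iter
        self.max_iter = max_iter
        for hook in hooks:
            hook.trainer = self  # plain reference in this sketch

    def run_step(self):
        pass  # a real trainer would run one optimization step here

    def train(self):
        for hook in self.hooks:
            hook.before_train()
        for self.iter in range(self.start_iter, self.max_iter):
            for hook in self.hooks:
                hook.before_step()
            self.run_step()
            for hook in self.hooks:
                hook.after_step()
        for hook in self.hooks:
            hook.after_train()


hook = SketchHook()
SketchTrainer([hook], start_iter=0, max_iter=2).train()
print(hook.calls, hook.state_dict())
```

A restored hook would receive the saved dictionary through load_state_dict, so anything it needs after a restart has to be stored there.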
- class xmodaler.engine.TrainerBase[source]¶
Bases:
object
Base class for iterative trainer with hooks.
The only assumption we made here is: the training runs in a loop. A subclass can implement what the loop is. We made no assumptions about the existence of dataloader, optimizer, model, etc.
- iter¶
the current iteration.
- Type:
int
- start_iter¶
The iteration to start with. By convention the minimum possible value is 0.
- Type:
int
- max_iter¶
The iteration to end training.
- Type:
int
- storage¶
An EventStorage that’s opened during the course of training.
- Type:
- class xmodaler.engine.CallbackHook(*, before_train=None, after_train=None, before_step=None, after_step=None)[source]¶
Bases:
HookBase
Create a hook using callback functions provided by the user.
- __init__(*, before_train=None, after_train=None, before_step=None, after_step=None)[source]¶
Each argument is a function that takes one argument: the trainer.
- state_dict()¶
Hooks are stateless by default, but can be made checkpointable by implementing state_dict and load_state_dict.
- trainer: TrainerBase = None¶
A weak reference to the trainer object. Set by the trainer when the hook is registered.
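A minimal sketch of the callback idea, assuming each callback receives the trainer as its only argument as stated above. SketchCallbackHook is a hypothetical stand-in with the same keyword-only constructor; it is not the X-modaler implementation.

```python
class SketchCallbackHook:
    """Stand-in hook that forwards each phase to an optional user callback."""

    def __init__(self, *, before_train=None, after_train=None,
                 before_step=None, after_step=None):
        self._callbacks = {
            "before_train": before_train,
            "after_train": after_train,
            "before_step": before_step,
            "after_step": after_step,
        }
        self.trainer = None  # set by the trainer on registration

    def _call(self, name):
        callback = self._callbacks[name]
        if callback is not None:
            callback(self.trainer)

    def before_train(self):
        self._call("before_train")

    def after_train(self):
        self._call("after_train")

    def before_step(self):
        self._call("before_step")

    def after_step(self):
        self._call("after_step")


seen = []
hook = SketchCallbackHook(after_step=lambda trainer: seen.append(trainer))
hook.trainer = "trainer-placeholder"  # stands in for a real trainer object
hook.before_step()  # no callback given for this phase, so nothing happens
hook.after_step()
print(seen)
```

Phases without a callback are simply skipped, which keeps one-off hooks short.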
- class xmodaler.engine.IterationTimer(warmup_iter=3)[source]¶
Bases:
HookBase
Track the time spent for each iteration (each run_step call in the trainer). Print a summary at the end of training.
This hook uses the time between the call to its before_step() and after_step() methods. Under the convention that before_step() of all hooks should only take a negligible amount of time, the IterationTimer hook should be placed at the beginning of the list of hooks to obtain accurate timing.
- __init__(warmup_iter=3)[source]¶
- Parameters:
warmup_iter (int) – the number of iterations at the beginning to exclude from timing.
- state_dict()¶
Hooks are stateless by default, but can be made checkpointable by implementing state_dict and load_state_dict.
- trainer: TrainerBase = None¶
A weak reference to the trainer object. Set by the trainer when the hook is registered.
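The timing scheme described for IterationTimer can be sketched as follows: measure from before_step() to after_step() and discard the first warmup_iter measurements. SketchIterationTimer is a hypothetical stand-in, and the use of time.perf_counter is an assumption about how one might measure, not a statement about the actual implementation.

```python
import time


class SketchIterationTimer:
    """Stand-in timer that records per-step durations after a warmup phase."""

    def __init__(self, warmup_iter=3):
        self.warmup_iter = warmup_iter
        self.step_times = []  # seconds, warmup steps excluded
        self._steps_done = 0
        self._start = None

    def before_step(self):
        self._start = time.perf_counter()

    def after_step(self):
        elapsed = time.perf_counter() - self._start
        self._steps_done += 1
        if self._steps_done > self.warmup_iter:
            self.step_times.append(elapsed)


timer = SketchIterationTimer(warmup_iter=3)
for _ in range(5):
    timer.before_step()
    sum(range(1000))  # stands in for trainer.run_step()
    timer.after_step()
print(len(timer.step_times))
```

Any work that another hook does in before_step() after the timer has started is counted as step time, which is why the text above asks for the timer to come first and for before_step() to stay cheap.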
- class xmodaler.engine.PeriodicWriter(writers, period=20)[source]¶
Bases:
HookBase
Write events to EventStorage (by calling writer.write()) periodically.
It is executed every period iterations and after the last iteration. Note that period does not affect how data is smoothed by each writer.
- __init__(writers, period=20)[source]¶
- Parameters:
writers (list[EventWriter]) – a list of EventWriter objects
period (int) –
- before_step()¶
Called before each iteration.
- before_train()¶
Called before the first iteration.
- state_dict()¶
Hooks are stateless by default, but can be made checkpointable by implementing state_dict and load_state_dict.
- trainer: TrainerBase = None¶
A weak reference to the trainer object. Set by the trainer when the hook is registered.
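The phrase "every period iterations and after the last iteration" can be read as the predicate below. This is a sketch under the assumption that iterations are counted from 0 and that "every period iterations" means after iteration numbers period - 1, 2 * period - 1, and so on; should_fire is a hypothetical helper.

```python
def should_fire(iteration, period, max_iter):
    """Return True if a periodic hook would run after this 0-based iteration."""
    is_last = iteration == max_iter - 1
    is_periodic = period > 0 and (iteration + 1) % period == 0
    return is_periodic or is_last


fired = [i for i in range(50) if should_fire(i, period=20, max_iter=50)]
only_last = [i for i in range(50) if should_fire(i, period=0, max_iter=50)]
print(fired, only_last)
```

With period=20 and max_iter=50 the predicate selects iterations 19, 39 and 49; with period=0 only the last iteration remains.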
- class xmodaler.engine.PeriodicCheckpointer(checkpointer: Checkpointer, period: int, max_iter: Optional[int] = None, max_to_keep: Optional[int] = None, file_prefix: str = 'model')[source]¶
Bases:
PeriodicEpochCheckpointer, HookBase
Same as detectron2.checkpoint.PeriodicCheckpointer, but as a hook.
Note that when used as a hook, it is unable to save additional data other than what’s defined by the given checkpointer.
It is executed every period iterations and after the last iteration.
- __init__(checkpointer: Checkpointer, period: int, max_iter: Optional[int] = None, max_to_keep: Optional[int] = None, file_prefix: str = 'model') None [source]¶
- Parameters:
checkpointer – the checkpointer object used to save checkpoints.
period (int) – the period to save checkpoint.
max_iter (int) – maximum number of iterations. When it is reached, a checkpoint named “{file_prefix}_final” will be saved.
max_to_keep (int) – maximum number of most recent checkpoints to keep; older checkpoints will be deleted
file_prefix (str) – the prefix of checkpoint’s filename
- after_train()¶
Called after the last iteration.
- before_step()¶
Called before each iteration.
- save(name: str, **kwargs: Any) None [source]¶
Same arguments as Checkpointer.save(). Use this method to manually save checkpoints outside the schedule.
- Parameters:
name (str) – file name.
kwargs (Any) – extra data to save, same as in Checkpointer.save().
- state_dict()¶
Hooks are stateless by default, but can be made checkpointable by implementing state_dict and load_state_dict.
- step(iteration: int, epoch: int, **kwargs: Any) None ¶
Perform the appropriate action at the given iteration.
- Parameters:
iteration (int) – the current iteration, ranged in [0, max_iter-1].
kwargs (Any) – extra data to save, same as in Checkpointer.save().
- trainer: TrainerBase = None¶
A weak reference to the trainer object. Set by the trainer when the hook is registered.
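The interaction of period, max_iter, max_to_keep and file_prefix can be sketched without touching the file system. The zero-padded naming of periodic checkpoints and the choice not to count the "{file_prefix}_final" checkpoint against max_to_keep are assumptions of this sketch; simulate_checkpoints is a hypothetical helper that only tracks names.

```python
from collections import deque


def simulate_checkpoints(max_iter, period, max_to_keep=None, file_prefix="model"):
    """Return (names kept, names deleted) after a simulated training run."""
    recent = deque()  # periodic checkpoints, oldest first
    deleted = []
    for iteration in range(max_iter):
        if (iteration + 1) % period == 0:
            # Assumed naming scheme for periodic checkpoints.
            recent.append(f"{file_prefix}_{iteration:07d}")
            if max_to_keep is not None and len(recent) > max_to_keep:
                deleted.append(recent.popleft())
    # The documentation states that a "{file_prefix}_final" checkpoint is saved.
    kept = list(recent) + [f"{file_prefix}_final"]
    return kept, deleted


kept, deleted = simulate_checkpoints(max_iter=100, period=20, max_to_keep=2)
print(kept)
print(deleted)
```

Leaving max_to_keep at None keeps every periodic checkpoint in this sketch.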
- class xmodaler.engine.LRScheduler(optimizer=None, scheduler=None)[source]¶
Bases:
HookBase
A hook which executes a torch builtin LR scheduler and summarizes the LR. It is executed after every iteration.
- __init__(optimizer=None, scheduler=None)[source]¶
- Parameters:
optimizer (torch.optim.Optimizer) –
scheduler (torch.optim.LRScheduler) –
If any argument is not given, will try to obtain it from the trainer.
- after_train()¶
Called after the last iteration.
- before_step()¶
Called before each iteration.
- state_dict()¶
Hooks are stateless by default, but can be made checkpointable by implementing state_dict and load_state_dict.
- trainer: TrainerBase = None¶
A weak reference to the trainer object. Set by the trainer when the hook is registered.
- class xmodaler.engine.AutogradProfiler(enable_predicate, output_dir, *, use_cuda=True)[source]¶
Bases:
HookBase
A hook which runs torch.autograd.profiler.profile.
Examples:
    hooks.AutogradProfiler(
        lambda trainer: trainer.iter > 10 and trainer.iter < 20, self.cfg.OUTPUT_DIR
    )
The above example will run the profiler for the iterations between 10 and 20 (exclusive) and dump results to OUTPUT_DIR. We did not profile the first few iterations because they are typically slower than the rest. The result files can be loaded in the chrome://tracing page in the Chrome browser.
Note
When used together with NCCL on older GPUs, the autograd profiler may cause deadlock because it unnecessarily allocates memory on every device it sees. The memory management calls, if interleaved with NCCL calls, lead to deadlock on GPUs that do not support cudaLaunchCooperativeKernelMultiDevice.
- __init__(enable_predicate, output_dir, *, use_cuda=True)[source]¶
- Parameters:
enable_predicate (callable[trainer -> bool]) – a function which takes a trainer, and returns whether to enable the profiler. It will be called once every step, and can be used to select which steps to profile.
output_dir (str) – the output directory to dump tracing files.
use_cuda (bool) – same as in torch.autograd.profiler.profile.
- after_train()¶
Called after the last iteration.
- before_train()¶
Called before the first iteration.
- state_dict()¶
Hooks are stateless by default, but can be made checkpointable by implementing state_dict and load_state_dict.
- trainer: TrainerBase = None¶
A weak reference to the trainer object. Set by the trainer when the hook is registered.
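To see which steps a given enable_predicate selects, it can be evaluated against a stand-in object that only carries an iter attribute. The SimpleNamespace below is a hypothetical placeholder for a trainer; no profiler is run.

```python
from types import SimpleNamespace


def enable_predicate(trainer):
    """Same condition as the lambda in the AutogradProfiler example."""
    return trainer.iter > 10 and trainer.iter < 20


profiled = [i for i in range(30) if enable_predicate(SimpleNamespace(iter=i))]
print(profiled)
```

Because both comparisons are strict, the selected iterations are 11 through 19.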
- class xmodaler.engine.EvalHook(eval_period, eval_start, eval_function, iters_per_epoch, stage, multi_gpu_eval)[source]¶
Bases:
HookBase
Run an evaluation function periodically, and at the end of training.
It is executed every eval_period iterations and after the last iteration.
- __init__(eval_period, eval_start, eval_function, iters_per_epoch, stage, multi_gpu_eval)[source]¶
- Parameters:
eval_period (int) – the period to run eval_function. Set to 0 to not evaluate periodically (but still after the last iteration).
eval_function (callable) – a function which takes no arguments, and returns a nested dict of evaluation metrics.
Note
This hook must be enabled on all workers or on none. If you would like only certain workers to perform evaluation, give other workers a no-op function (eval_function=lambda: None).
- before_step()¶
Called before each iteration.
- before_train()¶
Called before the first iteration.
- state_dict()¶
Hooks are stateless by default, but can be made checkpointable by implementing state_dict and load_state_dict.
- trainer: TrainerBase = None¶
A weak reference to the trainer object. Set by the trainer when the hook is registered.
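The note about giving some workers a no-op function can be sketched as a small selector. Picking rank 0 as the evaluating worker is an assumption of this sketch, pick_eval_function is a hypothetical helper, and the returned metrics are placeholders rather than real results.

```python
def pick_eval_function(rank, evaluate):
    """Return the real evaluation function on rank 0 and a no-op elsewhere."""
    return evaluate if rank == 0 else (lambda: None)


def evaluate():
    # Placeholder nested dict; a real function would evaluate the model.
    return {"placeholder_task": {"placeholder_metric": 0.0}}


# Every worker still registers the hook; only the function it calls differs.
results = [pick_eval_function(rank, evaluate)() for rank in range(3)]
print(results)
```

Each worker calls a function at the same points in training, so the hook stays enabled everywhere while only one worker does the actual evaluation.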
- class xmodaler.engine.PreciseBN(period, model, data_loader, num_iter)[source]¶
Bases:
HookBase
The standard implementation of BatchNorm uses EMA in inference, which is sometimes suboptimal. This class computes the true average of statistics rather than the moving average, and puts the true averages into every BN layer in the given model.
It is executed every period iterations and after the last iteration.
- __init__(period, model, data_loader, num_iter)[source]¶
- Parameters:
period (int) – the period at which this hook is run, or 0 to not run during training. The hook will always run at the end of training.
model (nn.Module) – a module whose BN layers in training mode will all be updated by precise BN. Note that the user is responsible for ensuring that the BN layers to be updated are in training mode when this hook is triggered.
data_loader (iterable) – it will produce data to be run by model(data).
num_iter (int) – number of iterations used to compute the precise statistics.
- after_train()¶
Called after the last iteration.
- before_step()¶
Called before each iteration.
- before_train()¶
Called before the first iteration.
- state_dict()¶
Hooks are stateless by default, but can be made checkpointable by implementing state_dict and load_state_dict.
- trainer: TrainerBase = None¶
A weak reference to the trainer object. Set by the trainer when the hook is registered.
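The difference between the moving average and the true average can be shown on a handful of per-batch means. This is a plain-Python sketch: it uses the update rule running = (1 - momentum) * running + momentum * batch_value with momentum 0.1 and an initial value of 0, as in PyTorch's BatchNorm defaults, and assumes equally sized batches. It does not reproduce how PreciseBN itself gathers statistics.

```python
def ema_mean(batch_means, momentum=0.1, init=0.0):
    """Exponential moving average of per-batch means."""
    running = init
    for value in batch_means:
        running = (1 - momentum) * running + momentum * value
    return running


def true_mean(batch_means):
    """Plain average over the same batches (equal batch sizes assumed)."""
    return sum(batch_means) / len(batch_means)


batch_means = [1.0, 2.0, 3.0, 4.0]  # illustrative values only
print(ema_mean(batch_means), true_mean(batch_means))
```

After only four batches the moving average is still far below the true mean of 2.5, because it starts at 0 and moves by only 10% of the gap per batch. Averaging over num_iter batches does not depend on the starting value in this way.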
- class xmodaler.engine.ModelWeightsManipulating[source]¶
Bases:
HookBase
Init or bind weights after loading a model
- after_step()¶
Called after each iteration.
- after_train()¶
Called after the last iteration.
- before_step()¶
Called before each iteration.
- state_dict()¶
Hooks are stateless by default, but can be made checkpointable by implementing state_dict and load_state_dict.
- trainer: TrainerBase = None¶
A weak reference to the trainer object. Set by the trainer when the hook is registered.
- class xmodaler.engine.RetrievalTrainer(cfg)[source]¶
Bases:
DefaultTrainer
- _write_metrics(loss_dict: Dict[str, Tensor], data_time: float, prefix: str = '')¶
- Parameters:
loss_dict (dict) – dict of scalar losses
data_time (float) – time taken by the dataloader iteration
- after_step()¶
- after_train()¶
- static auto_scale_workers(cfg, num_workers: int)¶
- before_step()¶
- before_train()¶
- build_hooks()¶
- classmethod build_losses(cfg)¶
- classmethod build_lr_scheduler(cfg, optimizer, iters_per_epoch)¶
- classmethod build_model(cfg)¶
- classmethod build_optimizer(cfg, model)¶
- classmethod build_test_loader(cfg)¶
- classmethod build_train_loader(cfg)¶
- classmethod build_val_loader(cfg)¶
- build_writers()¶
- load_state_dict(state_dict)¶
- register_hooks(hooks: List[Optional[HookBase]]) None ¶
Register hooks to the trainer. The hooks are executed in the order they are registered.
- Parameters:
hooks (list[Optional[HookBase]]) – list of hooks
- resume_or_load(resume=True)¶
- run_step()¶
Implement the standard training logic described above.
- state_dict()¶
- train()¶
- Parameters:
start_iter (int) – See docs above
max_iter (int) – See docs above
- class xmodaler.engine.RLTrainer(cfg)[source]¶
Bases:
DefaultTrainer
- _write_metrics(loss_dict: Dict[str, Tensor], data_time: float, prefix: str = '')¶
- Parameters:
loss_dict (dict) – dict of scalar losses
data_time (float) – time taken by the dataloader iteration
- after_step()¶
- after_train()¶
- static auto_scale_workers(cfg, num_workers: int)¶
- before_step()¶
- before_train()¶
- build_hooks()¶
- classmethod build_losses(cfg)¶
- classmethod build_lr_scheduler(cfg, optimizer, iters_per_epoch)¶
- classmethod build_model(cfg)¶
- classmethod build_optimizer(cfg, model)¶
- classmethod build_test_loader(cfg)¶
- classmethod build_train_loader(cfg)¶
- classmethod build_val_loader(cfg)¶
- build_writers()¶
- load_state_dict(state_dict)¶
- register_hooks(hooks: List[Optional[HookBase]]) None ¶
Register hooks to the trainer. The hooks are executed in the order they are registered.
- Parameters:
hooks (list[Optional[HookBase]]) – list of hooks
- resume_or_load(resume=True)¶
- state_dict()¶
- classmethod test(cfg, model, test_data_loader, evaluator, epoch)¶
- train()¶
- Parameters:
start_iter (int) – See docs above
max_iter (int) – See docs above
- class xmodaler.engine.RLBeamTrainer(cfg)[source]¶
Bases:
DefaultTrainer
- _write_metrics(loss_dict: Dict[str, Tensor], data_time: float, prefix: str = '')¶
- Parameters:
loss_dict (dict) – dict of scalar losses
data_time (float) – time taken by the dataloader iteration
- after_step()¶
- after_train()¶
- static auto_scale_workers(cfg, num_workers: int)¶
- before_step()¶
- before_train()¶
- build_hooks()¶
- classmethod build_losses(cfg)¶
- classmethod build_lr_scheduler(cfg, optimizer, iters_per_epoch)¶
- classmethod build_model(cfg)¶
- classmethod build_optimizer(cfg, model)¶
- classmethod build_test_loader(cfg)¶
- classmethod build_train_loader(cfg)¶
- classmethod build_val_loader(cfg)¶
- build_writers()¶
- load_state_dict(state_dict)¶
- register_hooks(hooks: List[Optional[HookBase]]) None ¶
Register hooks to the trainer. The hooks are executed in the order they are registered.
- Parameters:
hooks (list[Optional[HookBase]]) – list of hooks
- resume_or_load(resume=True)¶
- state_dict()¶
- classmethod test(cfg, model, test_data_loader, evaluator, epoch)¶
- train()¶
- Parameters:
start_iter (int) – See docs above
max_iter (int) – See docs above
- class xmodaler.engine.SingleStreamRetrievalTrainer(cfg)[source]¶
Bases:
DefaultTrainer
- _write_metrics(loss_dict: Dict[str, Tensor], data_time: float, prefix: str = '')¶
- Parameters:
loss_dict (dict) – dict of scalar losses
data_time (float) – time taken by the dataloader iteration
- after_step()¶
- after_train()¶
- static auto_scale_workers(cfg, num_workers: int)¶
- before_step()¶
- before_train()¶
- build_hooks()¶
- classmethod build_losses(cfg)¶
- classmethod build_lr_scheduler(cfg, optimizer, iters_per_epoch)¶
- classmethod build_model(cfg)¶
- classmethod build_optimizer(cfg, model)¶
- classmethod build_test_loader(cfg)¶
- classmethod build_train_loader(cfg)¶
- classmethod build_val_loader(cfg)¶
- build_writers()¶
- load_state_dict(state_dict)¶
- register_hooks(hooks: List[Optional[HookBase]]) None ¶
Register hooks to the trainer. The hooks are executed in the order they are registered.
- Parameters:
hooks (list[Optional[HookBase]]) – list of hooks
- resume_or_load(resume=True)¶
- run_step()¶
Implement the standard training logic described above.
- state_dict()¶
- train()¶
- Parameters:
start_iter (int) – See docs above
max_iter (int) – See docs above
- class xmodaler.engine.SingleStreamRetrievalTrainerHardNegatives(cfg)[source]¶
Bases:
SingleStreamRetrievalTrainer
- _write_metrics(loss_dict: Dict[str, Tensor], data_time: float, prefix: str = '')¶
- Parameters:
loss_dict (dict) – dict of scalar losses
data_time (float) – time taken by the dataloader iteration
- after_step()¶
- after_train()¶
- static auto_scale_workers(cfg, num_workers: int)¶
- before_step()¶
- before_train()¶
- build_hooks()¶
- classmethod build_losses(cfg)¶
- classmethod build_lr_scheduler(cfg, optimizer, iters_per_epoch)¶
- classmethod build_model(cfg)¶
- classmethod build_optimizer(cfg, model)¶
- classmethod build_test_loader(cfg)¶
- classmethod build_train_loader(cfg)¶
- classmethod build_val_loader(cfg)¶
- build_writers()¶
- load_state_dict(state_dict)¶
- register_hooks(hooks: List[Optional[HookBase]]) None ¶
Register hooks to the trainer. The hooks are executed in the order they are registered.
- Parameters:
hooks (list[Optional[HookBase]]) – list of hooks
- resume_or_load(resume=True)¶
- state_dict()¶
- classmethod test(cfg, model, test_data_loader, evaluator, epoch)¶
- train()¶
- Parameters:
start_iter (int) – See docs above
max_iter (int) – See docs above
- class xmodaler.engine.TDENPretrainer(cfg)[source]¶
Bases:
DefaultTrainer
- _write_metrics(loss_dict: Dict[str, Tensor], data_time: float, prefix: str = '')¶
- Parameters:
loss_dict (dict) – dict of scalar losses
data_time (float) – time taken by the dataloader iteration
- after_step()¶
- after_train()¶
- static auto_scale_workers(cfg, num_workers: int)¶
- before_step()¶
- before_train()¶
- build_hooks()¶
- classmethod build_losses(cfg)¶
- classmethod build_lr_scheduler(cfg, optimizer, iters_per_epoch)¶
- classmethod build_model(cfg)¶
- classmethod build_optimizer(cfg, model)¶
- classmethod build_test_loader(cfg)¶
- classmethod build_train_loader(cfg)¶
- classmethod build_val_loader(cfg)¶
- build_writers()¶
- load_state_dict(state_dict)¶
- register_hooks(hooks: List[Optional[HookBase]]) None ¶
Register hooks to the trainer. The hooks are executed in the order they are registered.
- Parameters:
hooks (list[Optional[HookBase]]) – list of hooks
- resume_or_load(resume=True)¶
- state_dict()¶
- classmethod test(cfg, model, test_data_loader, evaluator, epoch)¶
- train()¶
- Parameters:
start_iter (int) – See docs above
max_iter (int) – See docs above
- class xmodaler.engine.VCRTrainer(cfg)[source]¶
Bases:
DefaultTrainer
- _write_metrics(loss_dict: Dict[str, Tensor], data_time: float, prefix: str = '')¶
- Parameters:
loss_dict (dict) – dict of scalar losses
data_time (float) – time taken by the dataloader iteration
- after_step()¶
- after_train()¶
- static auto_scale_workers(cfg, num_workers: int)¶
- before_step()¶
- before_train()¶
- build_hooks()¶
- classmethod build_losses(cfg)¶
- classmethod build_lr_scheduler(cfg, optimizer, iters_per_epoch)¶
- classmethod build_model(cfg)¶
- classmethod build_optimizer(cfg, model)¶
- classmethod build_test_loader(cfg)¶
- build_writers()¶
- load_state_dict(state_dict)¶
- register_hooks(hooks: List[Optional[HookBase]]) None ¶
Register hooks to the trainer. The hooks are executed in the order they are registered.
- Parameters:
hooks (list[Optional[HookBase]]) – list of hooks
- resume_or_load(resume=True)¶
- state_dict()¶
- train()¶
- Parameters:
start_iter (int) – See docs above
max_iter (int) – See docs above