
Hierarchical Combinatoric ResNet (HCR)

This tutorial works through a complete example of training baseline FvT and SvB classifiers for the HH4b analysis, using the skim datasets_HH4b_2024_v2 on rogue.

Setup environment

  • (optional) set up the apptainer cache and temp directories (see the sketch after this list)
  • start a container and enter the coffea4bees/python/ directory
  • (optional) set the base path for the workflow files:
export WFS="classifier/config/workflows/HH4b_2024_v2"
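
For the optional cache setup, a minimal sketch using Apptainer's standard APPTAINER_CACHEDIR and APPTAINER_TMPDIR environment variables; the paths below are placeholders, so pick a filesystem with enough free space:

export APPTAINER_CACHEDIR=/scratch/${USER}/apptainer/cache  # placeholder path
export APPTAINER_TMPDIR=/scratch/${USER}/apptainer/tmp      # placeholder path
mkdir -p ${APPTAINER_CACHEDIR} ${APPTAINER_TMPDIR}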

See Overview for details.

FvT Training and Evaluation

  • set the following variables in the ${WFS}/FvT/run.sh script (a sketch follows this list):
    • MODEL: the base path to store the FvT model
    • FvT: the base path to store the FvT friend trees
    • PLOT: the base path to store benchmark plots
    • all other variables are optional
  • run the following command:
source ${WFS}/FvT/run.sh
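
For reference, a hypothetical sketch of these assignments inside run.sh; the actual variable syntax is defined by the script itself, and every path below is a placeholder:

MODEL=/data/${USER}/HH4b_2024_v2/FvT/models  # placeholder: where FvT models are stored
FvT=/data/${USER}/HH4b_2024_v2/FvT/friends   # placeholder: where FvT friend trees are stored
PLOT=/data/${USER}/HH4b_2024_v2/FvT/plots    # placeholder: where benchmark plots are stored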

To understand the details of the whole workflow, read the comments in the following files, in order:

  • ${WFS}/FvT/train.yml
  • ${WFS}/FvT/evaluate.yml
  • ${WFS}/common.yml
  • ${WFS}/FvT/run.sh

SvB Training and Evaluation

  • set the following variables in the ${WFS}/SvB/run.sh script (a sketch follows this list):
    • MODEL: the base path to store the SvB models
    • SvB: the base path to store the SvB friend trees
    • FvT: the base path to the FvT friend trees (must be the same path used in the FvT training)
    • PLOT: the base path to store benchmark plots
    • all other variables are optional
  • run the following command:
source ${WFS}/SvB/run.sh
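
A hypothetical sketch of the corresponding assignments in the SvB run.sh; the key point is that FvT must point at the friend trees produced in the FvT step above. All paths are placeholders:

MODEL=/data/${USER}/HH4b_2024_v2/SvB/models  # placeholder: where SvB models are stored
SvB=/data/${USER}/HH4b_2024_v2/SvB/friends   # placeholder: where SvB friend trees are stored
FvT=/data/${USER}/HH4b_2024_v2/FvT/friends   # placeholder: must match the FvT training path
PLOT=/data/${USER}/HH4b_2024_v2/SvB/plots    # placeholder: where benchmark plots are stored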

To understand the details of the whole workflow, read the comments in the following files, in order (assuming you have already read the FvT config files):

  • ${WFS}/SvB/train.yml
  • ${WFS}/SvB/evaluate.yml (basically the same as the FvT evaluation)
  • ${WFS}/SvB/run.sh

Plotting

  • make a local copy of analysis_dask/config/userdata.cfg.yml and fill in all required fields (copy commands follow this list)
  • make a local copy of analysis_dask/config/classifier_plot_vars.cfg.yml; update the SvB and FvT friend tree paths in classifier_outputs to match the outputs of the evaluation scripts above, and modify classifier_datasets to select the datasets you want to plot
  • run the following command:
python dask_run.py \
    analysis_dask/config/userdata.local.cfg.yml \
    analysis_dask/config/cluster.cfg.yml#rogue_local_huge \
    analysis_dask/config/classifier_plot_vars.local.cfg.yml#2024_v2 \
    analysis_dask/config/classifier_plot.cfg.yml#2024_v2
  • the output will be available at {output_dir}/classifier_plot_2024_v2_{timestamp}/hists/classifier_basic.coffea
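
The local copies referenced above can be created as follows; the .local filenames match the arguments passed to dask_run.py:

cp analysis_dask/config/userdata.cfg.yml \
    analysis_dask/config/userdata.local.cfg.yml
cp analysis_dask/config/classifier_plot_vars.cfg.yml \
    analysis_dask/config/classifier_plot_vars.local.cfg.yml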

See Histogram for details.

Tips on Performance

  • Training:
    • in the main task train, consider increasing --max-trainers to train multiple models in parallel (CPU, GPU, and memory bound)
    • in -dataset HCR.*, consider increasing --max-workers (IO and CPU bound; requires extra memory)
    • in -setting ml.DataLoader:
      • always set optimize_sliceable_dataset to True if the dataset fits in memory. This option enables a custom data loader that uses torch's C++-based parallel slicing, which is significantly faster and more memory efficient than the default torch.utils.data.DataLoader.
      • if optimize_sliceable_dataset is disabled, consider increasing num_workers to speed up batch generation (mainly CPU bound; requires extra memory)
      • consider increasing batch_eval to speed up evaluation (mainly GPU memory bound)
    • in -setting torch.Training, consider using disable_benchmark to skip all benchmark steps.
  • Evaluation:
    • in the main task evaluate, consider increasing --max-evaluators to evaluate multiple models in parallel (CPU, GPU, and memory bound)
    • in -setting torch.DataLoader, consider increasing num_workers and batch_eval (IO and CPU bound; requires extra memory)
  • Merging k-folds:
    • in -analysis kfold.Merge,
      • consider increasing --workers (IO and CPU bound; requires extra memory)
      • consider using a finite --step to split ROOT files into smaller chunks
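
Most of the knobs above are set somewhere in the workflow files referenced earlier; before tuning one, an illustrative search over the workflow directory (run from coffea4bees/python/) can locate its current value:

grep -rn -e 'max-trainers' -e 'num_workers' -e 'batch_eval' ${WFS}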