Base Class Library Architecture¶
This library provides the foundational components for physics data analysis, organized into logical domains for easy discovery and maintainability.
Directory Structure¶
📊 Data Formats (data_formats/
)¶
Tools for converting and manipulating different data formats commonly used in physics analysis.
awkward/
- Awkward array operations, conversions, padding, and structure inspectionnumpy/
- NumPy array conversions and utilitiesroot/
- Comprehensive ROOT file I/O, tree manipulation, friend trees, and data chaining
Use when: You need to read ROOT files, convert between array formats, or manipulate data structures.
⚙️ Configuration (config/
)¶
Configuration management system with parsing, validation, and patching capabilities.
- Configuration loading from files
- Dynamic configuration patching
- Type-safe configuration protocols
- Extensible parser system
Use when: You need to manage analysis configuration, parse YAML/JSON config files, or implement configurable components.
🖥️ Distributed Computing (dask/
)¶
Utilities for distributed and parallel computing using Dask.
- Delayed computation wrappers
- Distributed histogram operations
- Awkward array integration with Dask
Use when: You need to scale computations across multiple cores or machines.
💾 Storage (storage/
)¶
Abstractions for working with different storage systems.
eos.py
- EOS (CERN file system) and XRootD integration- Local and remote file system abstractions
- Path utilities and file operations
Use when: You need to access files on EOS, handle remote storage, or work with distributed file systems.
🔢 Mathematics (math/
)¶
Mathematical utilities and algorithms.
- JIT compilation support with Numba
- Statistical functions
- Random number utilities
- Partitioning and balancing algorithms
Use when: You need mathematical operations, statistical computations, or performance-critical numerical code.
📈 Histograms (hist/
)¶
Histogram creation, manipulation, and operations.
- Histogram building and merging
- Template-based histogram operations
- Integration with other data formats
Use when: You need to create histograms, bin data, or perform histogram-based analysis.
⚛️ Physics (physics/
)¶
Physics-specific computations and domain logic.
- Physics object definitions
- Kinematic calculations
- Analysis-specific algorithms
Use when: You need physics calculations, particle object manipulations, or domain-specific physics operations.
📊 Plotting (plotting/
)¶
Visualization and plotting utilities.
- Plot generation and styling
- Integration with matplotlib/other backends
- Analysis-specific plotting helpers
Use when: You need to create plots, visualize data, or generate figures for analysis.
🔍 Event Selection (skimmer/
)¶
Event selection and filtering pipeline.
- Event filtering logic
- Selection criteria management
- Pipeline orchestration
Use when: You need to skim events, apply selection cuts, or build analysis pipelines.
🛠️ Scripts (scripts/
)¶
Command-line utilities and automation tools.
fix_eos.py
- Utility to repair corrupted EOS filescommon.sh
- Common shell utilities- Other automation and maintenance scripts
Use when: You need command-line tools, automation scripts, or utilities to fix common problems.
🧰 Utilities (utils/
)¶
Generic, reusable utilities that don't fit into specific domains.
- Argument parsing helpers
- JSON encoding/decoding
- String manipulation
- Retry mechanisms
- Generic wrapper functions
Use when: You need generic helper functions, argument parsing, or common utility operations.
🧪 Tests (tests/
)¶
Test suite mirroring the package structure.
- Unit tests for all modules
- Integration tests
- Test utilities and fixtures
Design Principles¶
1. Domain-Driven Organization¶
Each directory represents a clear conceptual domain that users can easily understand and discover.
2. Dependency Direction¶
Dependencies generally flow "upward" in this order:
utils → (storage, data_formats, math) → (hist, physics, dask) → (plotting, skimmer) → scripts
3. User-Centric Naming¶
Directory names reflect what users are trying to accomplish, not technical implementation details.
4. Independent Base Components¶
Components in this library should be independent of higher-level analysis code and reusable across different physics analyses.
Migration from Previous Structure¶
Import Path Changes¶
The following import paths have changed for better organization:
Old Import Path | New Import Path | Deprecation Shim |
---|---|---|
base_class.awkward |
base_class.data_formats.awkward |
✅ Available |
base_class.numpy |
base_class.data_formats.numpy |
✅ Available |
base_class.root |
base_class.data_formats.root |
✅ Available |
base_class.system |
base_class.storage |
✅ Available |
base_class.plots |
base_class.plotting |
✅ Available |
Backward Compatibility¶
Deprecation shims are provided for all moved modules. Old import paths will continue to work but will issue DeprecationWarning
s. Update your imports to the new paths to remove warnings.
Common Usage Patterns¶
Reading ROOT Files¶
from src.data_formats.root import TreeReader, Friend
from src.storage import EOS
# Read from EOS
reader = TreeReader(EOS("root://path/to/file.root"))
data = reader.read()
# Work with friend trees
friend = Friend(reader, "friend_name")
Array Format Conversion¶
from src.data_formats.awkward import to_numpy
from src.data_formats.numpy import from_
# Convert awkward to numpy
np_data = to_numpy(ak_array)
# Convert from base64 numpy
array = from_.base64(encoded_data)
Configuration Management¶
from src.config import ConfigManager, config
# Load and manage configuration
config_mgr = ConfigManager("config.yaml")
my_setting = config("analysis.cuts.pt_min")
Distributed Computing¶
from src.dask import delayed
from src.hist import hist
# Delayed histogram computation
@delayed
def compute_histogram(data):
return hist(data, bins=50)
Physics Corrections & Analysis Data (data/
)¶
Stores physics corrections, calibration files, and other analysis resources (previously in python/data/
, now in python/base_class/data/
).
Btag/
- b-tagging scale factors and calibration filesJEC/
- jet energy corrections and related resourcesMuon/
- muon corrections and scale factorsPU/
- pileup weights and correctionsgoldenJSON/
- certified JSON files for luminosity sectionspuId/
- pileup identification files- ...other physics analysis resources
Use when: You need physics corrections, calibration files, or certified JSONs for analysis workflows.
Note: All references to these files should now use the path
base_class/data/
instead of the legacydata/
location.
Contributing¶
When adding new functionality:
- Choose the right domain - Place code in the directory that matches its primary purpose
- Keep dependencies minimal - Avoid circular dependencies between domains
- Document the purpose - Update this README when adding new directories or major functionality
- Add tests - Mirror the structure in the
tests/
directory - Consider users - Think about discoverability and intuitive organization
Future Evolution¶
This structure is designed to grow with the analysis needs while maintaining clear separation of concerns. New domains can be added as needed, and existing domains can be subdivided if they become too large or complex.