Software Environment & Container Setup¶
This repository provides a comprehensive containerized software environment for high-energy physics analysis workflows. The setup ensures reproducible, portable execution across different computing environments including local machines, CERN computing facilities (LPC, lxplus), and distributed computing systems.
Overview¶
The repository uses a layered approach to software management:
- Containerization: Apptainer/Docker containers for isolated execution environments
- Package Management: Multiple options including Pixi, Conda, and pip for dependency management
- Workflow Orchestration: Integrated support for Snakemake workflows
- Multi-Environment Support: Automated configuration for different computing clusters
Quick Start¶
Most analysis workflows are executed using the run_container script:
# Interactive shell in analysis container
./run_container
# Run specific commands
./run_container python analysis_script.py
# Use combine container for statistical analysis
./run_container combine
# Execute Snakemake workflows
./run_container snakemake --snakefile my_workflow.snakefile
Container Images¶
The repository supports multiple specialized containers (also available in cvmfs):
Analysis Container (Default)¶
- Image:
gitlab-registry.cern.ch/cms-cmu/barista:latest - Purpose: Primary analysis environment with Coffea, Awkward Array, and HEP tools
- Usage: Default container for most physics analysis tasks
Combine Container¶
- Image:
gitlab-registry.cern.ch/cms-analysis/general/combine-container:CMSSW_11_3_4-combine_v9.1.0-harvester_v2.1.0 - Purpose: Statistical analysis using the CMS Combine tool
- Usage: Limit setting, significance calculations, and statistical inference
Environment Management¶
Pixi Environment (software/pixi/)¶
Modern package manager with cross-platform support:
- Configuration:
requirements.txtwith minimal Python and Snakemake dependencies - Features: Fast dependency resolution, reproducible environments
- Installation: Automatically handled by
run_containerscript
Conda Environment (software/conda/)¶
Traditional conda environment with comprehensive package list:
- Configuration:
environment.ymlwith 400+ scientific computing packages - Purpose: Legacy support and detailed dependency specification
- Includes: Machine learning libraries, data analysis tools, visualization packages
This environment is NOT recommended, but is maintained for cases where singularity is not available.
Docker Images (software/dockerfiles/)¶
Container definitions for various use cases:
Dockerfile_analysis: Primary analysis containerDockerfile_analysis_reana: REANA workflow executionml/: Machine learning specific containers
Computing Environment Support¶
The run_container script automatically detects and configures for different computing environments:
CERN LPC (cmslpc)¶
- Storage Binding:
/uscmst1b_scratch,/uscms_data/ - Pixi Location:
/uscms_data/d3/${USER}/.pixi(persistent storage) - Special Features: HTCondor job support with
.shellscript
CERN lxplus¶
- Storage Binding:
/afs,/eos,/cvmfs - Grid Security: Automatic mounting of grid certificates
- CVMFS Access: Full access to CERN software repositories
Local/Custom Environments¶
- Flexible Binding: Configurable mount points
- CVMFS Support: Optional access to distributed software
Key Features¶
Automatic Image Resolution¶
- CVMFS Integration: Uses unpacked images from
/cvmfs/unpacked.cern.chwhen available - Fallback: Downloads from registry if CVMFS unavailable
- Performance: Faster startup with pre-cached images
Workflow Integration¶
- Snakemake Support: Built-in workflow execution with proper container integration
- Job Queue: LPC-specific job submission with
lpcjobqueue - Batch Processing: Support for distributed computing systems
Usage Examples¶
Basic Analysis¶
# Start interactive session
./run_container
# Run analysis processor
./run_container python -m coffea4bees.analysis.processors.my_processor
### Statistical Analysis
```bash
# Combine tools
./run_container combine combine -M AsymptoticLimits datacard.txt
# Interactive combine session
./run_container combine
Workflow Execution¶
# Run complete analysis pipeline
./run_container snakemake --snakefile workflows/analysis.smk --cores 8
# Test workflow
./run_container snakemake --snakefile workflows/test.smk --dry-run