Optimization Framework

1. Motivation

In population balance modeling, the kernel parameters, i.e., the free parameters of the agglomeration and breakage models, determine whether the PBE accurately reflects the physical environment and material properties.

  • Many kernel parameters are not standard physical quantities, and some lack direct physical meaning.

  • Obtaining precise kernel parameters through experiments or theoretical derivations is often extremely difficult.

A common approach is therefore to use experimental PSD (Particle Size Distribution) data and determine the kernel parameters via inverse modeling / optimization.

In this framework, the target data (data_exp) is usually the PSD measured over time in experiments, while the model results (data_mod) are obtained by solving the dPBE (discretized PBE).


2. Basic Workflow

  1. Provide experimental data (data_exp)

    • Typically PSDs measured at multiple time points.

    • Can be density distributions \(q\) or cumulative distributions \(Q\).

    • The coordinate system (particle volume grid) must be consistent across all datasets.

  2. Initialize optimizer

    • Random kernel parameters (within predefined ranges) are fed into the dPBE solver.

    • The solver produces simulated PSD (data_mod).

    • ⚠️ Recommendation:

      • Either construct the dPBE grid identical to the experimental grid, or

      • Interpolate experimental data onto the solver grid in preprocessing.

    • If neither is possible, set smoothing = True in the config data. This applies KDE (Kernel Density Estimation) to map data_mod onto the experimental grid.

      • Note: KDE may underestimate peaks when the grid is very dense, so direct alignment is preferred. A sketch of this KDE mapping follows this workflow.

  3. Compute cost function

    • Defined as the “error” between data_mod and data_exp.

    • Configurable in the config data via delta_flag. Example:

      'delta_flag': [('qx', 'MSE')]
      

      means: convert both datasets to density distributions \(q_x\), then compute the Mean Squared Error (MSE). A sketch of this computation follows this workflow.

  4. Update kernel parameters

    • After each iteration, the error \(\delta\) is recorded.

    • The optimizer updates its predictive model and proposes a new candidate parameter set.

  5. Repeat iterations

    • The loop of simulating, evaluating the error, and updating the parameters is repeated until the maximum number of iterations is reached.

    • The optimizer selects the parameter set with the smallest error as the optimal kernel parameters.
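To make step 2 concrete, here is a minimal sketch of the KDE mapping, assuming 1-D density data and using scipy's weighted gaussian_kde; the function name smooth_to_grid and the normalization choice are illustrative, not the framework's exact implementation.

    import numpy as np
    from scipy.stats import gaussian_kde

    def smooth_to_grid(x_mod, q_mod, x_exp):
        # Map the simulated density q_mod (on grid x_mod) onto the
        # experimental grid x_exp via a weighted Gaussian KDE
        kde = gaussian_kde(x_mod, weights=q_mod)
        q_on_exp = kde(x_exp)
        # Renormalize so the density integrates to 1 on the new grid
        # (trapezoidal rule, written out for NumPy-version independence)
        norm = np.sum(0.5 * (q_on_exp[1:] + q_on_exp[:-1]) * np.diff(x_exp))
        return q_on_exp / norm

As noted above, such a KDE tends to flatten sharp peaks, which is why aligning the grids directly is preferred.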
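The cost computation from step 3 then reduces, for 'delta_flag': [('qx', 'MSE')], to a mean squared error between density distributions. This sketch assumes both datasets already live on the same grid; the helper q_from_Q is an illustrative way to handle data supplied as cumulative distributions.

    import numpy as np

    def q_from_Q(Q, x):
        # If the data arrive as cumulative distributions Q, recover the
        # density as q(x) = dQ/dx
        return np.gradient(Q, x)

    def compute_delta(q_mod, q_exp):
        # 'delta_flag': [('qx', 'MSE')]: mean squared error between
        # density distributions on a shared grid
        return np.mean((q_mod - q_exp) ** 2)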


3. Ray Tune Framework

This project uses Ray Tune to accelerate optimization with distributed parallelization.

  • Each dPBE simulation runs in its own parallel worker process.

  • Results are exchanged asynchronously to guide subsequent iterations.

  • ⚖️ Trade-off: synchronization is not real-time.

    • For a fixed budget of 100 iterations, result quality typically ranks: serial execution > 2 parallel workers > 4 parallel workers, because each worker proposes candidates based on slightly outdated information.

    • Recommended concurrency: 2–6 workers (controlled by max_concurrent); see the sketch below.
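As an orientation, a run with capped concurrency could be set up as follows with the current Ray Tune API (Tuner, TuneConfig). The kernel parameter corr_agg and the helper run_dpbe_and_compare are placeholders; the framework's own trainable wraps the dPBE solver instead of the dummy cost used here.

    from ray import tune

    def run_dpbe_and_compare(params):
        # Stand-in for: run the dPBE with the proposed kernel parameters,
        # map data_mod onto the experimental grid, compute the error
        return (params['corr_agg'] - 1e-2) ** 2

    def trainable(config):
        delta = run_dpbe_and_compare(config)
        return {'delta': delta}   # returning a dict reports the final metric

    tuner = tune.Tuner(
        trainable,
        param_space={'corr_agg': tune.loguniform(1e-4, 1e0)},
        tune_config=tune.TuneConfig(
            metric='delta',
            mode='min',
            num_samples=100,           # total iterations across all workers
            max_concurrent_trials=4,   # cf. max_concurrent above
        ),
    )
    results = tuner.fit()
    print(results.get_best_result().config)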

Actors in Ray Tune

Ray Tune builds on Ray's concept of Actors, which wrap optimization experiments as persistent class instances.

  • In this framework, these are OptCoreRay and OptCoreRayMulti.

  • Each actor encapsulates its own solver instance (e.g., a dPBE object), which is reused across iterations to reduce initialization overhead; the sketch below illustrates the pattern.
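The pattern looks roughly like the following; OptActor and DummySolver are illustrative stand-ins, not the actual interface of OptCoreRay.

    import ray

    class DummySolver:
        # Stand-in for a dPBE instance that is expensive to initialize
        def run(self, params):
            return sum(v ** 2 for v in params.values())

    @ray.remote
    class OptActor:
        def __init__(self):
            # Built once, then kept alive inside the actor's process
            self.solver = DummySolver()

        def evaluate(self, params):
            # Every call reuses the persistent solver instance
            return self.solver.run(params)

    # Usage: the actor lives in its own worker process
    # actor = OptActor.remote()
    # delta = ray.get(actor.evaluate.remote({'corr_agg': 0.1}))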

Caveats with Actors

  1. Resource leakage

    • Long-term repeated runs of dPBE may cause cumulative resource leaks.

    • To prevent this, actors periodically destroy and restart their dPBE instances.

    • Configurable via opt_params['max_reuse'] (number of solver runs before a restart). Together with the wait mechanism below, this is sketched after this list.

  2. Excessively fast iterations

    • If a single iteration takes less than about one second, inter-process communication can lag behind and become a bottleneck.

    • A wait mechanism is added: if runtime < wait_time, the process pauses until wait_time is reached.

    • Configurable in opt_params['wait_time'].
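Both safeguards fit naturally into the actor's evaluation method. This extends the DummySolver sketch above; the attribute names mirror opt_params['max_reuse'] and opt_params['wait_time'] but are otherwise illustrative.

    import time
    import ray

    @ray.remote
    class OptActor:
        def __init__(self, max_reuse=100, wait_time=1.0):
            self.max_reuse = max_reuse    # solver runs before a restart
            self.wait_time = wait_time    # minimum seconds per iteration
            self.n_runs = 0
            self.solver = DummySolver()   # stand-in for a dPBE instance

        def evaluate(self, params):
            # Caveat 1: periodically destroy and rebuild the solver
            if self.n_runs >= self.max_reuse:
                del self.solver
                self.solver = DummySolver()
                self.n_runs = 0
            start = time.time()
            delta = self.solver.run(params)
            self.n_runs += 1
            # Caveat 2: pad iterations that finish faster than wait_time
            elapsed = time.time() - start
            if elapsed < self.wait_time:
                time.sleep(self.wait_time - elapsed)
            return delta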


4. Usage Recommendations

  1. Iteration count

    • Depends on data quality, number of kernel parameters, parameter ranges, and degree of parallelization.

    • This is hard to predict in advance; set the count as high as resource limits allow, then monitor convergence.

  2. Resume from checkpoint

    • Optimization can be resumed from saved states, enabling long or interrupted runs (see the restore sketch after this list).

  3. Parameter search ranges

    • Choosing good search ranges is nontrivial.

    • Suggested workflow:

      • Start with a small number of iterations for testing.

      • Guided search algorithms quickly narrow down the promising region of the parameter space.

      • If a parameter consistently converges to a boundary of its range, expand or shift the search range accordingly (see the range sketch after this list).
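A minimal restore sketch with the current Ray Tune API; the experiment path is illustrative, and trainable is the same function that drove the original run.

    from ray import tune

    # Resume an interrupted experiment from its saved state
    tuner = tune.Tuner.restore(
        path='~/ray_results/dpbe_opt',   # illustrative results directory
        trainable=trainable,             # same trainable as the original run
    )
    results = tuner.fit()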
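The range workflow can be sketched in the same style; the parameter name, bounds, and boundary thresholds are all illustrative.

    from ray import tune

    # Round 1: a short exploratory run over a deliberately wide range
    tuner = tune.Tuner(
        trainable,   # same trainable as in the Ray Tune sketch above
        param_space={'corr_agg': tune.loguniform(1e-6, 1e2)},
        tune_config=tune.TuneConfig(metric='delta', mode='min', num_samples=20),
    )
    best = tuner.fit().get_best_result().config['corr_agg']

    # Round 2: if the best value piles up near a boundary, shift the range
    if best > 1e1:        # stuck at the upper end
        space = {'corr_agg': tune.loguniform(1e-2, 1e4)}
    elif best < 1e-5:     # stuck at the lower end
        space = {'corr_agg': tune.loguniform(1e-10, 1e-2)}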