Minutes of #7 Remote E-Beam Meeting (draft)

The meeting was held on Vidyo on 09/09/2020 - See indico

Participants

Hannes Bartosik, Nicolo Biancacci, Christian Carli, Roberto Corsini, Davide Gamba, Verena Kain, Andrea Latina, Nico Madysa, Richard Scrivens


Status of Reinforcement Learning Studies for E-Cooler Operation (Nico Madysa)

The typical LEIR accumulation cycle is described. Seven consecutive injections are dragged and cooled to a specific momentum. The Schottky spectrum is the typical observable used to see the effect of the cooling process and to optimize it; transverse beam profiles can also be measured. A few cooling parameters can be changed, typically the electron gun voltages and the ion/electron orbit steering.

The goal of the study is to adjust the cooling parameters so that the beam after cooling follows some desired distribution. The study, at the moment only at the simulation stage, is divided into sub-goals of increasing complexity:

  1. optimize for minimal r.m.s. momentum spread of the cooled beam
  2. optimize for given mean momentum and minimal r.m.s. momentum spread
  3. optimize other variables than final mean momentum and momentum spread
  4. match an arbitrary distribution
  5. use the Schottky spectra as the only observable

The first two steps are the ones presented here.
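As an illustration (not shown in the talk), the first two sub-goals can be encoded as a single scalar reward to be maximized; the function name, weights, and units below are hypothetical:

```python
import numpy as np

def reward(momenta, target_mean=None, spread_weight=1.0, mean_weight=1.0):
    """Toy reward for the cooling sub-goals: penalize the r.m.s. momentum
    spread (sub-goal 1) and, optionally, the distance of the mean momentum
    from a target value (sub-goal 2). Weights are purely illustrative."""
    spread = float(np.std(momenta))
    r = -spread_weight * spread           # sub-goal 1: minimal r.m.s. spread
    if target_mean is not None:           # sub-goal 2: also hit a mean momentum
        r -= mean_weight * abs(float(np.mean(momenta)) - target_mean)
    return r
```

A well-cooled distribution (small spread, mean on target) then scores higher than a poorly cooled one, which is the signal the Agent later optimizes.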

At the moment the study relies on several hundred datasets of RF-Track simulated data: a few thousand macro-particles (each representing 10^5 ions) are tracked through the cooler, which is modeled as a mesh of electrons of given characteristics. Each dataset corresponds to different initial conditions, both of the cooler settings and of the ion beam distribution. The whole particle distribution before and after cooling, together with the cooler settings, is used for the study.

Despite the large amount of simulated cases, they were found to be still too few for testing the reinforcement learning technique, and RF-Track is still too slow to produce all the desired cases. A possible workaround was to train an Artificial Neural Network (ANN) as a surrogate of the RF-Track engine, allowing several thousand cases to be produced in seconds.
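The surrogate idea can be sketched with a small feed-forward network fitted by gradient descent; the one-dimensional toy mapping, network size, and learning rate below are purely illustrative and are not the actual RF-Track surrogate:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "simulation": final momentum spread as a smooth function of one
# cooler setting. Stands in for RF-Track, purely for illustration.
def toy_sim(x):
    return 0.5 + 0.4 * np.tanh(2.0 * (x - 0.5))

# Training data: a few hundred (setting, result) pairs, as in the study.
X = rng.uniform(0.0, 1.0, size=(400, 1))
Y = toy_sim(X)

# One-hidden-layer network trained with plain gradient descent on the MSE.
W1 = rng.normal(0.0, 1.0, (1, 16)); b1 = np.zeros(16)
W2 = rng.normal(0.0, 1.0, (16, 1)); b2 = np.zeros(1)
lr = 0.05

def forward(x):
    h = np.tanh(x @ W1 + b1)
    return h, h @ W2 + b2

losses = []
for _ in range(5000):
    h, pred = forward(X)
    err = pred - Y
    losses.append(float(np.mean(err ** 2)))
    # Backpropagation of the mean-squared error.
    g2 = 2.0 * err / len(X)
    gW2 = h.T @ g2; gb2 = g2.sum(0)
    gh = (g2 @ W2.T) * (1.0 - h ** 2)
    gW1 = X.T @ gh; gb1 = gh.sum(0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1
```

Once trained, such a surrogate is orders of magnitude cheaper to evaluate than the simulation it mimics, which is what makes generating thousands of cases feasible.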

The first check was to verify that the ANN could replicate the results of RF-Track on the same range of input parameters, which was successful. The ANN can also be used to probe parameter regions outside the ones explored by RF-Track. The overall behavior of the results seems to match the trend of the region explored by RF-Track, at least for higher initial ion momentum and electron velocity. The correctness of this behavior could be probed, if necessary, with a few additional RF-Track simulations. In practice, it was found that the area explored by the RF-Track data largely covers the typical operational scenarios one could practically have in LEIR. One important observation is that the ANN is not able to reproduce very low final momentum spreads when compared to the RF-Track data. This could be explained by numerical approximations within the neural network. However, this should not be a threat for the outcome of the study, as those are isolated cases which might be unphysical already in the RF-Track-generated data.

The reinforcement learning setup is described: one "Environment" represents the actual accelerator; one "Agent" sends cooler settings to the Environment, which gives back to the Agent the cooling results and a reward: a number which the Agent will try to maximize. Details of the libraries used are also given:

  • gym is a Python package which defines interfaces, i.e. it defines what an "Environment" is, and so it allows one to write a class that models, for example, LEIR, as well as a class to act on it. This is, for example, where one would write the interaction with the accelerator via pyjapc, or with a simulation code such as RF-Track.
  • TD3 is the algorithm that was chosen for the Agent, and it allows one to define the complexity of the underlying neural network, in this case 2 layers with 64 neurons per layer.
  • An episode is a series of action steps performed by the Agent starting from an initial (typically random) condition. The episode is a success if the Environment has been optimized to reach the desired goal within the number of steps allowed per episode (here 10), and a failure otherwise.
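The Environment side can be sketched as a class following the gym reset/step pattern (written here without importing gym to stay self-contained); the one-parameter dynamics, the goal value, and all names are toy stand-ins, not the actual LEIR model:

```python
import random

class ToyCoolerEnv:
    """Minimal environment following the gym reset/step pattern.
    One 'cooler setting' drives the momentum spread toward a minimum;
    the dynamics and constants are illustrative only."""

    MAX_STEPS = 10       # steps allowed per episode, as in the talk
    GOAL = 0.05          # target r.m.s. momentum spread (arbitrary units)

    def reset(self):
        self.setting = random.uniform(-1.0, 1.0)
        self.steps = 0
        return self._spread()

    def _spread(self):
        # Spread is minimal when the setting sits at 0.
        return 0.01 + abs(self.setting)

    def step(self, action):
        """Apply a change of the cooler setting; return the observation,
        the reward, and whether the episode has ended."""
        self.setting += action
        self.steps += 1
        spread = self._spread()
        done = spread < self.GOAL or self.steps >= self.MAX_STEPS
        return spread, -spread, done

# Hand-written demo policy: nudge the setting toward its optimum.
env = ToyCoolerEnv()
obs = env.reset()
for _ in range(ToyCoolerEnv.MAX_STEPS):
    obs, rew, done = env.step(-env.setting * 0.5)
    if done:
        break
```

In the real setup the `step` body is where pyjapc (on the machine) or RF-Track (in simulation) would be called, and the Agent, not a hand-written policy, chooses the actions.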

Using the TD3 algorithm it is found that after a few training episodes, the Agent is able to find an optimal solution in a few steps.

The interaction with the Environment, which would eventually be the actual accelerator, might still be too slow or too "expensive" to be practical. To overcome this limitation the Dyna architecture was also tested. This foresees an additional layer, called the Surrogate, between the Environment and the Agent. The Surrogate is another Neural Network which replaces the Environment whenever possible, i.e. limiting the queries to the actual machine. An additional Score is introduced in the Dyna architecture, which is used as a criterion for deciding if the next action has to be applied to the Environment or to the Surrogate.
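The score-based routing can be illustrated schematically as follows; the threshold value, the trust score, and the class names are hypothetical and do not reproduce the exact criterion used in the study:

```python
def choose_backend(score, threshold=0.8):
    """Schematic Dyna-style routing: if the Surrogate's trust score is
    high enough, act on the cheap Surrogate; otherwise query the real
    (expensive) Environment. Score and threshold are illustrative."""
    return "surrogate" if score >= threshold else "environment"

class DynaRouter:
    """Count how often each backend is queried, mimicking the goal of
    limiting calls to the actual machine."""

    def __init__(self):
        self.calls = {"surrogate": 0, "environment": 0}

    def act(self, score):
        backend = choose_backend(score)
        self.calls[backend] += 1
        return backend
```

The fraction of calls routed to `"environment"` is then the quantity one tries to minimize without degrading the Agent's training.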

The Surrogate was crosschecked against the Environment with respect to the predicted final momentum parameters and reward. All final beam properties seem to be well predicted by the Surrogate, while, as expected, the initial beam properties are not, as those cannot be known by the Surrogate.

Using the Dyna architecture together with the TD3 algorithm, one obtains results comparable to those of TD3 alone, with a slight decrease (of the order of 10%) in the number of calls needed to the actual environment. This still has to be optimized, and the hope is to further reduce the number of necessary environment interactions.

The next steps will be to improve the efficiency of the Dyna+TD3 scheme, and then to start looking at other beam properties, e.g. transverse profiles.

Discussion

Nicolo suggested using the beam sigmas from RF-Track, rather than the single particles. Nico replied that this is the ultimate goal. Verena commented that using individual particle tracking was a cheap way to multiply the number of training data.

Andrea asked about the neural structure of the Surrogate, and how it was chosen. Nico replied that presently it uses 3 hidden layers of 128 neurons each. The choice was empirical, optimizing for overall performance.

Davide suggested adding some "noise", i.e. simulating an imperfect acquisition of the cooler parameters and beam properties. Verena commented that this is indeed possible, but it also opens questions on how to generate this noise in a meaningful way. This will have to be looked at.