In the coming years, the LHC experiments will need to produce an enormous number of Monte Carlo events to match the expected increase in luminosity. This challenge, together with the need to use more sophisticated generators, will stretch the computing resources and may limit the physics reach of the experiments.
In our paper, we propose a way to address these problems using Machine Learning techniques, specifically Generative Adversarial Networks (GANs). In this first attempt, we focussed on relatively simple di-jet events, but we also prepared the tools to produce more complex events, such as top quark pair events and multi-boson events, which are produced in large quantities at the LHC.
We actually trained two networks: one on the output of the generators (commonly called particle level) and another on events after detector simulation and reconstruction (also known as reconstruction or detector level). The good agreement of the first network with the training sample demonstrates that it is possible to use a relatively small number of events to train a GAN, which can subsequently be used to significantly increase the number of events: we are able to generate 1 million events in less than a minute. The good agreement of the second network demonstrates that the detector response and even the reconstruction steps can be reproduced by a GAN.
The results we obtained are a clear indication that GANs can have a high impact on the LHC experiments in several areas. You can find more details in the paper, but we would also like to share a video, which could not be included in the paper, showing how the GAN learns as it is trained.
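For readers new to GANs, the adversarial training mentioned above is usually written as a two-player game between a generator G and a discriminator D. The expression below is the standard textbook objective, shown only as orientation; the networks in the paper may well use a different loss variant. The symbols p_data (distribution of training events) and p_z (noise distribution fed to the generator) are notation introduced here.

```latex
\min_{G} \max_{D} \;
  \mathbb{E}_{x \sim p_{\mathrm{data}}} \left[ \log D(x) \right]
  + \mathbb{E}_{z \sim p_{z}} \left[ \log \left( 1 - D(G(z)) \right) \right]
```

The discriminator is rewarded for telling training events from generated ones, while the generator is rewarded for producing events the discriminator cannot tell apart from the training sample.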
Let us know what you think of this new application of the GAN in the comments below or by contacting the authors of the paper.
Written by Sitong An, Artem Golovatiuk, Nathan Simpson, and Hevjin Yarar. Edited by Sitong An.
A small squad of INSIGHTS ESRs (Sitong An @ CERN, Artem Golovatiuk @ Università di Napoli, Nathan Simpson @ University of Lund, and Hevjin Yarar @ INFN Padova) visited DESY for the 1st Terascale School of Machine Learning from 22 to 26 October 2018. This is our long overdue account of the school and the competition event that followed (spoiler alert: we won!).
P.S. Nathan, our newly-anointed team vlogger, has made a wonderful video about the event. Check it out [here]!
A bird’s-eye view of DESY and the Machine Learning School
DESY (Deutsches Elektronen-Synchrotron) is a national centre for Particle Physics, Accelerator Physics and Photon Science in a suburb of Hamburg, Germany. It used to host important Particle Physics facilities like HERA, a lepton-proton collider designed to probe the internal structure of protons and the properties of quarks (“is there anything smaller hidden inside the quarks?”). Nowadays, the focus of the on-site facilities has gradually shifted towards Photon and Accelerator Science, while sizeable groups of researchers work on data from ATLAS and CMS at CERN. DESY is one of the research partners of INSIGHTS, with Dr. Olaf Behnke from DESY as a member of the network.
The 1st Terascale School of Machine Learning covered an introduction to Deep Learning and hands-on tutorials on the usual tools of the trade: PyTorch, TensorFlow and Keras. It also went beyond the basics to include several talks from experts in the field on advanced topics like GANs (Generative Adversarial Networks) and semi-supervised/unsupervised learning.
Highlight of the Expert Talks
When using machine learning methods in high energy physics (HEP), the usual paradigm is to train on simulated data, while validation and testing are done on real data collected by the detector. In reality, we are unable to model real data perfectly, so there will always be discrepancies between our simulation and the real world. Benjamin Nachman, who is tackling this problem with weakly supervised machine learning, gave a talk on ‘Machine Learning with Less or no Simulation Dependence’. Training directly on data is normally not possible, since we do not have labels. However, when there are two well-defined classes (such as q and g for quark vs gluon jets in data), i.e. q in one mixed sample is statistically identical to q in other mixed samples, two methods were discussed: training using class proportions of mixed samples (ref ) and training directly on data using mixed samples (ref ). This talk was a great opportunity for us to learn about new, simulation-independent approaches to searching for new physics with Machine Learning.
On the last day of the school, Gilles Louppe gave a talk on ‘Likelihood-free Inference’. When discriminating between a null hypothesis and an alternative, the likelihood ratio is the most powerful test statistic. In the likelihood-free setup, a ratio of approximate likelihoods is used instead, constructed by projecting the observables onto a 1D summary statistic and running the simulations for different values of the parameters of interest. Reducing the problem to 1D is not ideal, since we then lose the correlations between the variables. One of the ideas introduced to address this was to use supervised classification to estimate the likelihood ratio. In this way, one does not need to evaluate the individual likelihoods and can use the estimated ratio for inference. For details, here is a link to check out.
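One way to see why training on mixed samples can work at all is the short derivation below. It is a sketch in our own notation, not a transcript of the talk: f_1 and f_2 are the (unknown) quark fractions of the two mixed samples, p_q and p_g are the densities of the pure classes, and L is the quark/gluon likelihood ratio.

```latex
% Two mixed samples with quark fractions f_1 > f_2
p_{M_i}(x) = f_i \, p_q(x) + (1 - f_i) \, p_g(x), \qquad i = 1, 2

% Likelihood ratio between the mixed samples, in terms of L(x) = p_q(x) / p_g(x)
L_{M_1/M_2}(x) = \frac{p_{M_1}(x)}{p_{M_2}(x)}
               = \frac{f_1 L(x) + (1 - f_1)}{f_2 L(x) + (1 - f_2)}

% Its derivative with respect to L is positive whenever f_1 > f_2
\frac{\partial L_{M_1/M_2}}{\partial L}
  = \frac{f_1 - f_2}{\left( f_2 L + 1 - f_2 \right)^2} > 0
```

Because the mixed-sample ratio is a monotonic function of L, a classifier that optimally separates the two mixed samples orders events in the same way as the optimal quark/gluon classifier, even though no per-jet labels were used. This relies on the assumption stated in the text that q (and g) look statistically identical in both samples.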
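The classification idea can be illustrated with a toy example. The sketch below is not taken from the talk: it assumes two 1D unit-width Gaussian hypotheses (means 0 and 1), equal sample sizes, and a hand-written logistic regression, and all function names are hypothetical. A classifier s(x) trained to separate balanced samples from the two hypotheses gives the ratio estimate r(x) = s(x) / (1 - s(x)), which here can be compared with the exact ratio.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, steps=300, lr=0.5):
    """Fit s(x) = sigmoid(w * x + b) by full-batch gradient descent on cross-entropy."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def estimated_ratio(x, w, b):
    """Estimate p1(x) / p0(x) as s / (1 - s); for a logistic model this equals exp(w * x + b)."""
    return math.exp(w * x + b)


def true_ratio(x, mu0=0.0, mu1=1.0):
    """Exact likelihood ratio of two unit-width Gaussians with means mu1 and mu0."""
    return math.exp(-0.5 * (x - mu1) ** 2 + 0.5 * (x - mu0) ** 2)


# Toy "simulations": balanced samples from the two hypotheses (label 0 and label 1).
rng = random.Random(0)
n_per_class = 1000
xs = [rng.gauss(0.0, 1.0) for _ in range(n_per_class)]
xs += [rng.gauss(1.0, 1.0) for _ in range(n_per_class)]
ys = [0] * n_per_class + [1] * n_per_class

w, b = fit_logistic(xs, ys)
for x in (-1.0, 0.5, 2.0):
    print(x, estimated_ratio(x, w, b), true_ratio(x))
```

For this toy problem the exact log-ratio is x - 0.5, so the fit should land near w = 1 and b = -0.5. In realistic applications the observables are high-dimensional and the classifier is a neural network, but the ratio is read off the classifier output in the same way.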
The Machine Learning Challenge
As part of the school, a machine learning challenge was held to allow students to test out their newly-acquired skills on a problem and a data set from particle physics. Specifically, this involved the tagging of heavy resonances, i.e. distinguishing heavy objects like the top quark, the W and Z bosons, or the Higgs from light quark and gluon jets. These jets leave energy deposits in the calorimeters of the detector, which can then be mapped to images that look a bit like this:
Using these images and the data from the detector, such as transverse momentum, pseudorapidity, and combinations of different variables, we were tasked with building a machine learning solution to classify jets as coming from a top quark or not. The challenge was organised by Gregor Kasieczka, who recently authored a nice summary paper on this very topic (machine learning for top tagging) – check it out at https://arxiv.org/pdf/1902.09914.pdf.
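To give a feel for what mapping energy deposits to images means, here is a minimal sketch. It is not the preprocessing used in the challenge: the function name, the grid size, the image width and the toy constituents are all made up for illustration. Each jet constituent is given as (pseudorapidity, azimuthal angle, transverse momentum), and the image pixel values are summed transverse momenta.

```python
import math


def jet_image(constituents, n_pixels=8, half_width=1.0):
    """Bin (eta, phi, pT) constituents into an n_pixels x n_pixels grid of summed pT.

    The image is centred on the pT-weighted centroid of the jet and covers
    [-half_width, +half_width) in both eta and phi around that centroid.
    Simplification: phi wrap-around at +-pi is ignored.
    """
    total_pt = sum(pt for _, _, pt in constituents)
    eta_c = sum(eta * pt for eta, _, pt in constituents) / total_pt
    phi_c = sum(phi * pt for _, phi, pt in constituents) / total_pt

    image = [[0.0] * n_pixels for _ in range(n_pixels)]
    pixel = 2.0 * half_width / n_pixels
    for eta, phi, pt in constituents:
        col = math.floor((eta - eta_c + half_width) / pixel)
        row = math.floor((phi - phi_c + half_width) / pixel)
        # Constituents falling outside the image window are dropped.
        if 0 <= row < n_pixels and 0 <= col < n_pixels:
            image[row][col] += pt
    return image


# Hypothetical three-constituent jet: (eta, phi, pT in GeV).
toy_jet = [(0.10, 0.20, 50.0), (0.15, 0.25, 30.0), (-0.30, 0.60, 20.0)]
image = jet_image(toy_jet)
for row in image:
    print(" ".join(f"{value:5.1f}" for value in row))
```

An image like this can be fed to a convolutional network, while the list of constituents itself is the natural input for a recurrent network.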
So what did we come up with, and how well did it perform?
Our INSIGHTS team had several major advantages compared to the other participants. First of all, we were a team of 4 people working together, which led to many fruitful discussions. This also allowed us to try different approaches at the same time and to distribute parts of the task (data preprocessing, trying out different hyperparameters or model architectures, etc.). What’s more, we had access to a GPU machine at the University of Naples, which gave us a great boost in computational power and the possibility to play around with relatively large models.
The winning model was jokingly named “A bit Tricky Beast”, because it was an epic Frankenstein’s monster composed of two Neural Networks trained separately and brought together by a third Neural Network. There was also a little trick in the way we trained the model. The first network was a CNN (Convolutional Neural Network) taking jet images as input. It was already pretty big, with about 1.7 million parameters. The second network was an RNN (Recurrent Neural Network) taking the preprocessed constituents. We used the particles’ 4-momenta together with physically motivated high-level features such as invariant mass m^2, transverse momentum pT and pseudorapidity η. Finally, as the cherry on top, we used several fully-connected layers to combine the outputs of the CNN and the RNN and produce one number: the probability of the jet coming from a top quark.
The trick was in the way we handled the data. In order to mimic the effect of data/Monte Carlo disagreement, the data used for scoring the solutions differed from the training data by some small fluctuations. However, the part of the test data provided to us and the part the organisers used for final scoring had the same fluctuations. Therefore, after thoroughly training our network on the provided training set, we trained it for a bit on the provided test data. This allowed our network to learn some features of the fluctuations applied to the test data and slightly boosted the performance.
After 9 hours of continuous coding, collaborating and drinking coffee, we produced several networks (with very slight differences among them) that took the 6 best scores in the challenge!
Overall, this school was a wonderful and fruitful experience for us. The breadth of the introduction allowed us to learn about and compare different Deep Learning tools, and the talks on advanced topics offered a glimpse into the kind of problems at the frontier of the field that the experts are working on. And, fairly obviously, we thoroughly enjoyed the hospitality of the school organisers, the tranquil campus of DESY and the city of Hamburg!
The modern scientific method has its origins in the 17th century and has been developing constantly ever since. And even though procedures may vary from one science to another, a crucial part of all of them is the comparison of experimental data with theoretical predictions. To draw any conclusions and solve physical problems based on observation and theory, one needs to develop a statistical model to connect the two. Therefore, one of our supervisors, Professor Wouter Verkerke, who happens to be an expert on the matter, gave a 3-day workshop for both INSIGHTS and local PhD students at the Dutch National Institute for Subatomic Physics (Nikhef) in Amsterdam.
As a Dutchman, it was a welcome excuse to travel back to my country and revisit the institute where I used to work before my PhD. Most of the ESRs arrived on the evening before the workshop. Because it had been a while since we last saw each other, we used the opportunity to catch up and exchange stories about our first few months. And to give the other ESRs a taste of Dutch culture, we did so whilst enjoying some drinks and “stamppot met rookworst” in the centre of Amsterdam. I thought it would ease them in, instead of throwing them in at the deep end with the raw herring and onions tradition. Maybe next time!
The next day we started our workshop, which had a clear-cut structure and a good build-up in complexity and detail. In the mornings Wouter gave us lectures on the theory, and in the afternoons we got to apply the concepts in a set of exercises in RooFit, one of the most used statistical modelling software packages at CERN and the brainchild of David Kirkby and Wouter himself. Throughout the workshop we went from basic concepts, such as typical probability density models, p-values and likelihood ratios, to more advanced topics, such as the incorporation of nuisance parameters, unfolding and Effective Lagrangian Morphing.
The workshop closed with a talk from former Nikhef PhD student Max Baak, who currently works at KPMG as chief data scientist. Because many PhD students and post-docs continue their careers in industry or business, Max was invited to talk about his experience at KPMG. He told us how he applies the knowledge he acquired in academia and used some of his recent business cases as examples. Good to see what some of the non-academic possibilities are!
Kudos to Wouter Verkerke for giving us such a complete and clear picture of statistical modeling in particle physics including hands-on experience in RooFit. It was a great workshop and hopefully we can come back soon for a follow-up!
My name is Pim Verschuuren and I am the ESR at Royal Holloway, University of London.
I am originally from the Netherlands, where I obtained my bachelor’s and master’s degrees in Physics from Utrecht University. Utrecht is the fourth biggest city of the Netherlands and has a history that traces back to the Romans, who laid the first foundations of the “Domstad”. Nowadays, however, Utrecht is a modern and progressive city where culture and science play an increasingly important role, with a vibrant student life for both natives and internationals. But apart from offering everything that I enjoyed whilst living there, it is also the place where I developed my interest in particle physics.
I discovered my passion for particle physics early on, during my bachelor’s. I therefore tried to immerse myself as much as possible in courses and research projects within this field and quickly came into contact with the organization that is the nexus of particle physics: CERN. Over the past few decades, this combined effort of thousands of technicians, engineers and physicists from all over the world has proven to be very fruitful, with the Higgs boson as the most recent crown jewel. I myself was lucky to contribute to both the ALICE and the ATLAS experiments, where my biggest project entailed measurements of Higgs boson properties.
After multiple projects within particle physics at CERN, I was convinced that a PhD in this field would be the right next step for me. But apart from the standard PhD characteristics, like the analysis of complex and abstract problems, I was also looking for some additional specifics. Machine learning is increasingly becoming part of many areas of our society, and the scientific community at CERN is no exception. I therefore sought a PhD that combines particle physics and the newest machine learning techniques, to be part of this surge of innovation. And taking into account my love for travelling, for working with a diverse group of scientists from all over the world and for keeping the learning curve as steep as possible, I came to the conclusion that a PhD in the INSIGHTS network was a perfect fit for me.
My main scientific subject will be on machine learning techniques in unfolding under the supervision of professor Glen Cowan from Royal Holloway, University of London. Just like with any scientific experiment the measuring devices in particle physics are never perfect. The measurements that should reflect nature perfectly actually give a convoluted picture specific to the measuring device. The whole game of unfolding is to take this convoluted picture and try to retrieve the true result that correctly reflects nature.
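A toy example may help to make the unfolding problem concrete. The sketch below uses a made-up two-bin response matrix and plain matrix inversion. This is only the textbook starting point, not the machine learning approach mentioned above: with real, statistically fluctuating data, naive inversion tends to amplify the fluctuations, which is why more refined methods are needed. All names and numbers are illustrative.

```python
def fold(response, truth):
    """Apply the detector response: measured[i] = sum_j response[i][j] * truth[j]."""
    return [
        sum(response[i][j] * truth[j] for j in range(len(truth)))
        for i in range(len(response))
    ]


def invert_2x2(matrix):
    """Invert a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = matrix
    det = a * d - b * c
    if det == 0:
        raise ValueError("response matrix is singular and cannot be inverted")
    return [[d / det, -b / det], [-c / det, a / det]]


# Hypothetical response: response[i][j] = P(measured in bin i | true value in bin j).
response = [
    [0.8, 0.3],
    [0.2, 0.7],
]
truth = [100.0, 50.0]               # what nature produced (unknown in practice)
measured = fold(response, truth)    # the smeared picture seen by the detector

# Unfolding by inversion: apply the inverse response to the measurement.
unfolded = fold(invert_2x2(response), measured)
print("measured:", measured)
print("unfolded:", unfolded)
```

In this noise-free toy the unfolded histogram reproduces the truth exactly; the difficulty in practice comes from the statistical fluctuations in the measured bins.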
After the few events that we have had with the network, I feel even more excited and motivated to contribute to INSIGHTS. All of the other ESRs and supervisors clearly feel the same way and have already proven to be a great source of inspiration and creativity. With a large part of the program still ahead of me, I look forward to collaborating with all of them!
We finished our last post by observing that, as human beings, we are not that good at evaluating uncertainties and that this can heavily affect the outcome of our decisions, both in our work and in our private life. It does not help that the most appropriate mathematical concepts to quantify uncertainties are too often presented through arcane formulas that can hardly be understood outside trivial didactical examples (dice throws, coin flips, card draws, etc.) and that seem unsuitable for describing situations as complex as real business phenomena.
The key idea to overcome these problems in business context, based on PangeaF experience, is to introduce the concept of subjective probability. That is, to quantify the probability of an event through the degree of belief that it would occur, based on the available information.
This latter concept is definitely a crucial step towards bringing probability into business applications, since it allows us to define probabilities for events which have never been observed before (e.g. the launch of a new product, the expansion towards a new market, etc.) and to include different degrees of information in the evaluations. Such an approach also gives, through Bayes’ rule, an easy way to update each evaluation in the presence of new sources of information. To fix the idea, you could ask two different people to evaluate how probable it is that the share price of a company will double: typically, they would answer with a very small probability, because doubling the value is a macroscopic increase. However, if one of the two has some insider contact who reveals that the company is going to release a revolutionary new product, then this person would assign a higher probability to the hypothetical doubling (typically still small, but not as small as before). Neither of the two would be wrong in their evaluation: it is just that different levels of knowledge about the event of interest lead to different quantifications. Moreover, subjective does not mean arbitrary: while subjects with different states of information can evaluate the probability of the same event differently, they must provide rational and factual assessments, relying on the rules of probability to evaluate multiple related events playing a role in the same problem.
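The share-price example can be put into numbers with Bayes’ rule. The figures below are purely illustrative assumptions (a 1% prior belief in the doubling, and a tip that is much more likely to reach us if the doubling is really coming); the function name is hypothetical as well.

```python
def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Posterior P(H | E) from the prior P(H) and the likelihoods P(E | H), P(E | not H)."""
    numerator = p_evidence_if_true * prior
    evidence = numerator + p_evidence_if_false * (1.0 - prior)
    return numerator / evidence


# Both people start from the same small degree of belief that the shares will double.
prior = 0.01

# Only the second person hears the tip about a revolutionary product.
# Assumed likelihoods: such a tip circulates in 60% of the cases where a doubling
# follows, and in 5% of the cases where it does not.
posterior = bayes_update(prior, p_evidence_if_true=0.60, p_evidence_if_false=0.05)

print(f"without the tip: {prior:.3f}")
print(f"with the tip:    {posterior:.3f}")
```

With these assumed inputs the belief rises from 1% to roughly 11%: still small, but clearly larger than before, and both assessments are consistent with the information each person holds.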
By using subjective probabilities and Bayesian networks to deal with complex connections among the measured quantities, it is possible:
to perform proper inference processes, unravelling the cause-effect relationship hidden in data in order to find the most probable reasons behind the observed events, even in the presence of complex scenarios and multiple competing causes;
to integrate the experts’ knowledge about a given problem, through appropriate relationships among the elements of a descriptive model and suitable probability distributions associated with different situations;
to obtain true probabilities from the computations, and not some hard-to-interpret estimate, informing us of how much we have to weigh the occurrence of each event, given the information we received.
These aspects are crucial in all decision making processes and they allow the agents to make their best assessment, through exploitation of all available information (i.e. data). And they come with great flexibility, since they can be applied to a variety of statistical distributions and of business sectors.
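As a minimal illustration of inference with competing causes, consider a toy Bayesian network with two independent causes and one observed effect. Everything in this sketch is invented for the example (the scenario, the probabilities and the function names); it is not a PangeaF model. The posterior is computed by brute-force enumeration of all possible worlds, which is only practical for very small networks.

```python
from itertools import product

# Hypothetical scenario: a sales drop (effect "e") can be caused by our own
# price increase (cause "a") and/or by a competitor's promotion (cause "b").
P_A = 0.3
P_B = 0.2
P_E_GIVEN = {
    (True, True): 0.90,
    (True, False): 0.60,
    (False, True): 0.70,
    (False, False): 0.05,
}


def joint(a, b, e):
    """Joint probability P(a, b, e) = P(a) * P(b) * P(e | a, b)."""
    p_a = P_A if a else 1.0 - P_A
    p_b = P_B if b else 1.0 - P_B
    p_e = P_E_GIVEN[(a, b)] if e else 1.0 - P_E_GIVEN[(a, b)]
    return p_a * p_b * p_e


def posterior(query, evidence):
    """P(query | evidence) by enumeration; both map variable names to booleans."""
    numerator = denominator = 0.0
    for a, b, e in product([True, False], repeat=3):
        world = {"a": a, "b": b, "e": e}
        if all(world[name] == value for name, value in evidence.items()):
            p = joint(a, b, e)
            denominator += p
            if all(world[name] == value for name, value in query.items()):
                numerator += p
    return numerator / denominator


p_a = posterior({"a": True}, {"e": True})
p_b = posterior({"b": True}, {"e": True})
p_a_knowing_b = posterior({"a": True}, {"e": True, "b": True})
print(f"P(price increase | sales drop)            = {p_a:.3f}")
print(f"P(promotion | sales drop)                 = {p_b:.3f}")
print(f"P(price increase | sales drop, promotion) = {p_a_knowing_b:.3f}")
```

Observing the sales drop makes both causes more probable than their priors, and learning that the competitor did run a promotion lowers the probability assigned to the price increase. The outputs are genuine probabilities that can be weighed directly in a decision.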
It is important to stress that moving towards data-driven decisions does not mean automating such decisions or removing the human factor from them. Algorithms should mostly be exploited for what they are good at: integrating the available information consistently, without biases interfering with the quantitative evaluation. The results provided by such algorithms then have to be combined, by human decision makers, with the external factors that can hardly be modelled in algorithms (no matter what some vendors claim): what is the risk level that a company can accept at the specific moment a decision has to be taken? What is the impact on stakeholders, in terms of long-term scenarios and company reputation? What are the ethical implications of one decision versus another?
Data-driven decisions, at least as PangeaF sees them, should be the moment to bring together the best that domain experts, data scientists and human decision makers can offer: experts can help spot the key meaningful relations among the measured quantities in a business process; data scientists can turn such relations, and what the historical data say, into a coherent and effective model, trading off advanced solutions against actual performance gains; human decision makers can take the results of the models and use them to make more effective choices, optimizing resources or focusing efforts on the important parts of the process.
In the next posts, we will present some of the exciting experiences PangeaF developed by building bridges between real world problems and advanced machine learning techniques.