Meet the ESRs: Sitong An

Hi there! I am Sitong An (安思同 in Chinese), a Marie Skłodowska-Curie Fellow at CERN with the INSIGHTS project and a PhD student at Carnegie Mellon University (CMU). Originally from China, I left at the age of 16 and travelled the world for my education. Currently, I'm working at CERN in Geneva, Switzerland, under the supervision of Dr. Sergei Gleyzer and Dr. Lorenzo Moneta. My PhD advisor at CMU is Prof. Manfred Paulini. Starting in September 2018, I will be working on Machine Learning/Deep Learning for Particle Physics for three years. I am immensely grateful to INSIGHTS and to my supervisors for giving me such a great opportunity to work in this exciting subfield.

A bit of background about me: I was born and raised in a small, nondescript city in northeastern China. As a kid, the thought of venturing overseas for education never crossed my mind. That was the case until 2009, when I was offered a scholarship (SM1) by the Singaporean government to attend high school in Singapore. It was a once-in-a-lifetime opportunity, a rare window to the world outside, and yet it was also a daunting choice to go to a foreign country and learn to survive on my own. Eventually, this became the decision that changed the path of my life. I spent four intense and memorable years in Singapore, attending Catholic High School and Hwa Chong Institution. To this day, I still feel a strong affinity for the dear “Little Red Dot”.

Singapore is an amazing city – you should visit if you’ve never been 🙂
Photo credit: Chensiyuan, Wikipedia

After my A-Levels there, I moved to the U.K. for my undergraduate education at the University of Cambridge, partially supported by scholarships from both the University and my college, Wolfson College Cambridge. I graduated in 2018 with a Bachelor of Arts and a Master of Natural Sciences (Physics). Along the way I was fortunate enough to have the opportunity to visit many places around the globe, including MIT for an exchange year abroad, and the Weizmann Institute (Rehovot, Israel) and DESY (Hamburg, Germany) for internships. The coursework at Cambridge could feel gruelling and never-ending at times, but it was a privilege to wander the paths once walked by Newton and Maxwell. Looking back, the three years I spent there were bittersweet, but still dream-like.

Call me biased – but for me Cambridge is the most beautiful university in the world
Photo credit: Sitong An, Commercial Rights Reserved

Working at CERN has been my dream and goal since high school. I remember the naive but passionate excitement I felt about the Higgs discovery while I was still a high school student. I remember seeing the advertisement on the CERN careers website for the INSIGHTS position and thinking “this is exactly what I want to do!” I also remember attending the interview nervously, fully aware of the competitiveness of the position, and telling my future supervisors how much I care about making an impact in this field that I love, to the fullest of my abilities. And… voilà, now I am here. As I sit in my office and type this blog post to tell you my story, I still can’t help but feel amazed at how these ten years passed by, and how that dream came true.

In the tunnel of Large Hadron Collider (LHC), CERN, Geneva, Switzerland
Photo credit: Andrés G. Delannoy

Over these three years, I will devote roughly half of my time here to the development of deep learning algorithms for particle physics experiments. Specifically, I’m currently investigating the use of Graph Neural Networks for event reconstruction at the upcoming High Granularity Calorimeter (HGCal) of the CMS experiment. Reconstruction algorithms are an important step in the workflow of high energy physics experiments. They take raw data from the detectors and convert them into physical objects that physicists understand – like particles, for example. Because of the sheer complexity of our detectors, deep learning holds promise for greatly enhancing the pattern recognition of our future reconstruction algorithms and empowering our detectors to make more precise measurements. This is, of course, a very brief and simplistic explanation, and I will describe this project in greater detail in a future technical blog post.
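To give a flavour of the graph idea (a toy illustration of my own in plain NumPy, not actual HGCal code): you can treat calorimeter hits as a point cloud, connect each hit to its nearest neighbours, and let every hit aggregate information from its neighbourhood – the basic “message passing” step that graph networks learn end-to-end.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy calorimeter hits: a (x, y, layer) position and an energy per hit.
pos = rng.normal(size=(200, 3))
energy = rng.exponential(1.0, size=200)

# Connect each hit to its k nearest neighbours in space.
k = 5
d2 = ((pos[:, None, :] - pos[None, :, :]) ** 2).sum(-1)
np.fill_diagonal(d2, np.inf)            # a hit is not its own neighbour
nbrs = np.argsort(d2, axis=1)[:, :k]    # (200, k) neighbour indices

# One hand-written message-passing step: each hit aggregates the mean
# energy of its neighbours and mixes it with its own energy. In a real
# GNN this update rule is a learned neural network, not a fixed average.
messages = energy[nbrs].mean(axis=1)
updated = 0.5 * energy + 0.5 * messages
```

The interesting part is that the graph structure adapts to each event, which suits an irregular, high-granularity detector far better than a fixed image grid.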

An artist’s impression of the High Granularity Calorimeter, taken from the cover of the HGCal TDR (Technical Design Report).

The other half of my time will be spent on developing software tools in support of the HEP-ML community – particle physicists who are developing and applying Machine Learning algorithms in their work. I am part of the ROOT team in the CERN EP-SFT group. ROOT is a data analysis framework widely used in the data workflow of high energy physics, and I will be contributing to ROOT-TMVA (Toolkit for Multivariate Data Analysis), the machine learning project within ROOT. My work will focus on the modernisation of ROOT-TMVA, aiming to allow physicists to develop and deploy machine learning models more easily with ROOT data. More details on this are coming too.

Accelerating Science at CERN
Photo credit: Sitong An, Commercial Rights Reserved

Apart from my technical work, I also care deeply about public engagement. High energy physics is a costly enterprise, and what we’re doing would not be possible without public support. I am a CERN guide as well as a qualified guide at both the CMS and ATLAS experiments. It is always an enjoyable experience to show visitors around and share our passion; to explain why we are doing this and why curiosity-driven fundamental research is important; and to see the awe-struck expressions of visitors when they see the underground detectors for the first time. I also volunteer actively at CERN public events, like the CERN Open Days and TEDxCERN.

Volunteering for TEDxCERN, November 2018

If you’re interested in learning more about me, you’re welcome to visit my website/blog by clicking here. It is still very simple and lacks much content at the moment, but I will furnish it with more details as my work progresses. You can also find ways to contact me there – feel free to reach out with questions or opportunities in Machine Learning.

If you’re a student or a teacher from a high school and interested in organising a virtual visit to CMS [more details], please do not hesitate to contact me for help too. If you are a middle or high school teacher or student from China or Singapore and would like to organise a remote virtual visit to the underground experiments at CERN, I am happy to help coordinate and to guide in Mandarin – please get in touch if needed. You can learn more about virtual visits by clicking here (page in English only).

Looking forward to sharing more of my journey here – stay tuned!

INSIGHTS visits DESY: the first Terascale School of Machine Learning

Written by Sitong An, Artem Golovatiuk, Nathan Simpson, and Hevjin Yarar. Edited by Sitong An.


A small squad of INSIGHTS ESRs (Sitong An @ CERN, Artem Golovatiuk @ Università di Napoli, Nathan Simpson @ University of Lund, and Hevjin Yarar @ INFN Padova) visited DESY for the 1st Terascale School of Machine Learning from 22 to 26 October 2018. This is our long-overdue account of the school and the competition that followed (spoiler alert: we won!).

P.S. Nathan, our newly-anointed team vlogger, has made a wonderful video about the event. Check it out [here]!

A bird’s-eye view of DESY and the Machine Learning School

A bird’s-eye view, literally

DESY (Deutsches Elektronen-Synchrotron) is a national centre for Particle Physics, Accelerator Physics and Photon Science on the outskirts of Hamburg, Germany. It used to host major particle physics facilities like HERA, a lepton–proton collider that probed the internal structure of the proton and the properties of quarks (“is there anything smaller hidden inside the quarks?”). Nowadays, the focus of the on-site facilities has gradually shifted towards Photon and Accelerator Science, though sizeable groups of researchers work on data from ATLAS and CMS at CERN. DESY is one of the research partners of INSIGHTS, with Dr. Olaf Behnke from DESY as a member of the network.

Main entrance to DESY. Photo credit: Sitong An

The 1st Terascale School of Machine Learning covered an introduction to Deep Learning and hands-on tutorials on the usual tools of the trade: PyTorch, TensorFlow and Keras. It also went beyond the basics to include several talks from experts in the field on advanced topics like GANs (Generative Adversarial Networks) and semi-supervised/unsupervised learning.

Highlights of the Expert Talks

When using machine learning methods in high energy physics (HEP), the usual paradigm is to train on simulated data, while validation and testing are done on real data collected by the detector. In reality, we are unable to model real data perfectly, so there will always be discrepancies between our simulation and the real world. Benjamin Nachman addressed this problem in his talk ‘Machine Learning with Less or no Simulation Dependence’, where he tackles it with weakly supervised machine learning. Training directly on data is not possible in the usual sense, since we have no labels. However, when two classes (such as quark vs. gluon jets in data) are well defined – i.e. the quarks in one mixed sample are statistically identical to the quarks in other mixed samples – two methods become available: training on the class proportions of mixed samples (ref ) and training directly on data using mixed samples (ref ). This talk was a great opportunity for us to learn about new, simulation-independent approaches to searching for new physics with Machine Learning.
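The second idea – training a classifier to tell two mixed samples apart – may sound like it cannot work, but it does: a classifier that optimally separates the mixtures is also optimal for separating the underlying classes. Here is a toy NumPy sketch of the idea (my own illustration, not code from the talk), with a hand-rolled logistic regression and a one-dimensional feature standing in for a jet:

```python
import numpy as np

rng = np.random.default_rng(0)

def draw(n_sig, n_bkg):
    # Signal and background differ only in the mean of one feature.
    x = np.concatenate([rng.normal(1.5, 1.0, n_sig),
                        rng.normal(0.0, 1.0, n_bkg)])
    y = np.concatenate([np.ones(n_sig), np.zeros(n_bkg)])
    return x, y

# Two mixed samples with different (unknown to us) signal fractions.
xa, ya = draw(8000, 2000)   # sample A: 80% signal
xb, yb = draw(2000, 8000)   # sample B: 20% signal

# Train logistic regression to tell sample A from sample B.
# Note: the targets are the *mixed-sample* labels, not the true classes.
X = np.concatenate([xa, xb])
t = np.concatenate([np.ones_like(xa), np.zeros_like(xb)])
w, b = 0.0, 0.0
for _ in range(2000):
    s = 1.0 / (1.0 + np.exp(-(w * X + b)))
    g = s - t                      # gradient of the cross-entropy loss
    w -= 0.1 * np.mean(g * X)
    b -= 0.1 * np.mean(g)

# The same classifier separates *true* signal from background. Measure
# this with the AUC, computed from the rank statistic.
scores = 1.0 / (1.0 + np.exp(-(w * X + b)))
truth = np.concatenate([ya, yb])
ranks = scores.argsort().argsort()
auc = (ranks[truth == 1].mean() - ranks[truth == 0].mean()) / len(ranks) + 0.5
```

At no point did the training see a true signal/background label, yet the resulting score separates the true classes well – which is exactly what makes this attractive when only data, and no trustworthy simulation, is available.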

On the last day of the school, Gilles Louppe gave a talk on ‘Likelihood-free Inference’. When discriminating between a null hypothesis and an alternative, the likelihood ratio is the most powerful test statistic. In the likelihood-free setup, one instead uses a ratio of approximate likelihoods, constructed by projecting the observables onto a 1D summary statistic and running the simulation for different parameters of interest. Reducing the problem to 1D is not ideal, since we lose the correlations between the variables. One of the ideas introduced to address this is to train a supervised classifier to estimate the likelihood ratio. This way, one never needs to evaluate the individual likelihoods and can use the estimated ratio directly for inference. For details, here is a link to check out.
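The classifier-to-ratio step rests on a simple identity: a classifier trained to distinguish samples from the two hypotheses converges to s(x) = p(x|θ1) / (p(x|θ1) + p(x|θ0)), so s/(1−s) recovers the likelihood ratio. A toy NumPy sketch of this “likelihood ratio trick” (again my own illustration, with two unit Gaussians where the exact ratio is known):

```python
import numpy as np

rng = np.random.default_rng(1)

# Samples from the two hypotheses p(x|θ0) = N(0,1) and p(x|θ1) = N(1,1).
x0 = rng.normal(0.0, 1.0, 20000)
x1 = rng.normal(1.0, 1.0, 20000)
X = np.concatenate([x0, x1])
y = np.concatenate([np.zeros_like(x0), np.ones_like(x1)])

# Logistic regression trained to tell θ1-samples from θ0-samples.
w, b = 0.0, 0.0
for _ in range(4000):
    s = 1.0 / (1.0 + np.exp(-(w * X + b)))
    g = s - y
    w -= 0.2 * np.mean(g * X)
    b -= 0.2 * np.mean(g)

def lratio(x):
    """Estimated likelihood ratio p(x|θ1)/p(x|θ0) via r = s/(1-s)."""
    s = 1.0 / (1.0 + np.exp(-(w * x + b)))
    return s / (1.0 - s)

# For these two Gaussians the exact ratio is exp(x - 1/2), i.e. the
# optimal classifier has w = 1, b = -0.5 -- training should recover this.
```

No likelihood was ever evaluated; the simulator was only used to draw samples, which is precisely the likelihood-free setting.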

The Machine Learning Challenge

As part of the school, a machine learning challenge was held to allow students to test out their newly-acquired skills on a problem and a data set from particle physics. Specifically, this involved the tagging of heavy resonances, i.e. distinguishing heavy objects like the top quark, the W and Z bosons, or the Higgs boson from light-quark and gluon jets. These jets leave energy deposits in the calorimeters of the detector, which can then be mapped to images that look a bit like this:

Overlay of 100k particle jet images, taken from

Using these images and data from the detector, such as the transverse momentum, pseudorapidity, and combinations of different variables, we were tasked with building a machine learning solution to classify jets as coming from a top quark or not. The challenge was organised by Gregor Kasieczka, who recently authored a nice summary paper on this very topic (machine learning for top tagging) – check it out at

So what did we come up with, and how well did it perform?

Our INSIGHTS team had several major advantages compared to the other participants. First of all, we were a team of four working together, which led to many fruitful discussions. This also allowed us to try different approaches at the same time and to divide up parts of the task (data preprocessing, trying out different hyperparameters or architectures of the model, etc.). What’s more, we had access to a GPU machine at the University of Naples, which gave us a great boost in computational power and the possibility to play around with relatively large models.

The winning model was jokingly named “A Bit Tricky Beast”, because it was an epic Frankenstein’s monster composed of two neural networks trained separately and brought together by a third. And there was a little trick in the way we trained it. The first network was a CNN (Convolutional Neural Network) taking jet images as input; it was already pretty big, with about 1.7 million parameters. The second network was an RNN (Recurrent Neural Network) taking the preprocessed jet constituents. We used the particles’ 4-momenta together with physically motivated high-level features such as the invariant mass m², the transverse momentum pT and the pseudorapidity η. Finally, as a cherry on top, we used several fully-connected layers to combine the outputs of the CNN and the RNN and produce a single number: the probability of the jet coming from a top quark.
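For the curious, here is a minimal PyTorch sketch of the two-branch structure. All layer sizes, the image resolution and the number of constituent features are made up for illustration – this is a sketch of the idea, not our actual competition code:

```python
import torch
import torch.nn as nn

class TwoBranchTagger(nn.Module):
    """Toy two-branch tagger: a CNN on jet images, an RNN on constituent
    lists, and fully-connected layers combining the two branch outputs."""
    def __init__(self):
        super().__init__()
        self.cnn = nn.Sequential(                      # image branch
            nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(8 * 20 * 20, 32), nn.ReLU(),
        )
        self.rnn = nn.GRU(input_size=7, hidden_size=32,  # constituent branch
                          batch_first=True)
        self.head = nn.Sequential(                     # combiner
            nn.Linear(64, 16), nn.ReLU(),
            nn.Linear(16, 1), nn.Sigmoid(),            # P(top quark)
        )

    def forward(self, images, constituents):
        a = self.cnn(images)                 # (batch, 32)
        _, h = self.rnn(constituents)        # final hidden state: (1, batch, 32)
        return self.head(torch.cat([a, h.squeeze(0)], dim=1))

# A random batch: 40x40 jet images and 20 constituents x 7 features each.
model = TwoBranchTagger()
prob = model(torch.randn(4, 1, 40, 40), torch.randn(4, 20, 7))
```

The appeal of this layout is that each branch can be trained and debugged on its own data representation before the combiner learns how much to trust each of them.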

The trick was in the way we handled the data. To mimic the effect of data–Monte Carlo disagreement, the data used for scoring the solutions differed from the training data by some small fluctuations. However, the part of the test data provided to us and the part the organisers used for the final scoring shared the same fluctuations. Therefore, after thoroughly training our network on the provided training set, we trained it for a little while on the provided test data. This allowed the network to learn some features of the fluctuations applied to the test data and slightly boosted its performance.

A schematic representation of the model’s architecture.

The network itself, together with the Jupyter notebook, can be found at

After nine hours of continuous coding, collaborating and coffee-drinking, we produced several networks (with only slight differences among them) that took the top six scores in the challenge!

Our winning photo with the challenge organiser Gregor Kasieczka and our prize – a bottle of nice Austrian wine 🙂


Overall, this school was a wonderful and fruitful experience for us. The breadth of the introduction allowed us to learn about and compare different Deep Learning tools, and the talks on advanced topics offered a glimpse into the kind of frontier problems the experts are working on. And – fairly obviously – we thoroughly enjoyed the hospitality of the school organisers, the tranquil DESY campus and the city of Hamburg!