Meet the ESRs: Diego Fernandez

Hi everyone!

My name is Diego, and I am the ESR at the INFN Napoli section. It has been just 4 months since I started, and even though the pandemic makes it difficult to visit beautiful places in the city or travel around the country, I am enjoying my experience here a lot and I am very happy to have been selected for this position!

I was born in one of the smallest cities in Spain, Soria, but I soon moved to Pamplona, in the north, where I attended elementary and high school. Maybe this city sounds more familiar to you because of its major festival, the “Sanfermines” (in case anyone is wondering: no, I haven’t run in front of the bulls yet!).

The running of the bulls during the San Fermin festival in Pamplona. [Retrieved from: https://www.dw.com]

Since I was young, I’ve always had a very strong passion for physics: I wanted to know the answers to the most fundamental questions. It was therefore very easy for me to choose which Bachelor’s degree I wanted. I started the degree in Physics at the University of Zaragoza, where I discovered and enjoyed student life, but also learnt what hard work means! From the first year, I was impatiently waiting for the later years, when the more fundamental and theoretical subjects are taught (quantum mechanics, QFT, nuclear physics…). It was there that I discovered my particular interest in particle physics, and started doing everything in my power to become a researcher in that field.

In the last year of my degree I had the opportunity to study abroad at San Diego State University (California, US). It was definitely one of the best years of my life: I met a lot of international people, mixed with different cultures, and was part of a different educational system with very good professors, which also helped to broaden my knowledge.

After my experience abroad and thanks to my Bachelor’s thesis supervisor, I started my Master’s in Nuclear Physics at the UCM in Madrid. At first it was difficult to find a suitable thesis, a tough decision because it can mark the starting point for the next few years of research, but I was lucky enough to be selected for a thesis on Higgs boson measurements within the CMS project at CIEMAT. I learned a lot about experimental high-energy particle physics, but one of the best parts was the opportunity to familiarize myself with Machine Learning techniques and how to apply them to particle physics experiments.

I knew that I wanted to contribute to future discoveries in the field of particle physics, so the next step would be to do a PhD on that topic. It was at that moment that I saw the INSIGHTS position at INFN Naples, and I had no doubt that I needed to apply for it! For those who wonder what my project is about, I’m working under the supervision of Drs. Orso Iorio and Luca Lista on the search for a vector-like quark T (a particle predicted by some Beyond the Standard Model theories) in the semileptonic decay mode with data from LHC Run 2. This VLQ T decays into a top quark and a Z or H boson. We are studying the case where the W from the top quark decays leptonically and the Z/H boson decays hadronically. The biggest effort is centered on the reconstruction of the leptonically decaying top quark, for which various Machine Learning techniques are used, such as Boosted Decision Trees and Deep Neural Networks.

Although my fellowship is a shortened one and does not yet lead to a PhD, I am making the most of it and I am very grateful to have received it. Stay tuned because (hopefully!) sooner or later I will be able to announce my enrollment in a PhD program!

Panoramic view of the city of Naples with Vesuvius in the background. [Retrieved from: http://www.italia.it]

Exploring Stromboli volcano with unsupervised Machine Learning

The 2019 explosions of Stromboli

Visiting an active volcano is a breathtaking experience, and each year thousands of tourists from around the world seek out the thrill of experiencing the power of the restless earth. One of the most visited active volcanoes in the world is Stromboli, a volcano known as the “Lighthouse of the Mediterranean” on a small island of the same name just off the coast of Sicily. Its persistent explosive Strombolian activity consists of several hundred small eruptions per day. Visiting this fiery mountain (which J. R. R. Tolkien allegedly identified with his fictional volcano Mount Doom in The Lord of the Rings) comes with a certain risk though: on very rare occasions it produces major explosions, and as a tourist you would then want to be as far away from the island as possible. To protect the tourists and the inhabitants of the island, the Italian National Institute of Geophysics and Volcanology (INGV) has installed a network that measures the seismic activity of Stromboli and constantly monitors various parameters that characterize its state. Every morning the INGV researchers have a briefing with the local government, and if the volcano shows anomalous behavior they can suspend access to the top of the volcano.

Figure 1: The left panel shows a typical Strombolian explosion that occurs several times per day. The right panel displays a snapshot of the major explosion on July 3.

Despite all of these measures, Stromboli erupted on the 3rd of July 2019 without showing any anomaly in the routinely monitored parameters. The explosion caused a pyroclastic flow (a fast-moving current of hot gas and volcanic matter) that extended for several hundred meters into the sea. Unfortunately, the event claimed one life, and it could have been much worse and led to a catastrophe if it had occurred a few hours later in the evening, when many people climb the volcano in guided groups. On the 28th of August 2019 a second paroxysm occurred, similar to that of July 3 and with another pyroclastic flow, luckily this time without any fatalities.

The quest for predictive parameters

Since the routinely monitored seismic parameters did not predict this major eruption, the INGV researchers are now analyzing the collected seismic data to understand whether other seismic parameters showed any anomalies in the months prior to the 2019 eruptions. Ideally, one wants to find new parameters that can potentially predict major eruptions, making a visit to Stromboli safer for future tourists and for the inhabitants. A major challenge is that the 2019 explosions are the first events of such strength to be measured at Stromboli: the previous one occurred in 2002, when the network was not yet installed. At the moment it is therefore impossible to quantify whether possible anomalies in the seismic parameters are indeed able to predict a major explosion. Only the future will tell.

During my secondment at the INGV in Naples I had the opportunity to work with the INGV seismologists to explore new seismic parameters using unsupervised Machine Learning. It had previously been conjectured that there is a connection between the waveforms of the seismic signals associated with the daily Strombolian explosions (see left panel of Figure 1) and the physical state of the volcano, e.g. the gas mixture. We therefore carried out a cluster analysis of the waveforms of the seismic signals to study how different clusters of waveforms behave in the time period of the 2019 paroxysms.

Visualization of seismic signals

In the period from May 2019 to September 2019 the INGV recorded around 20,000 seismic events that stem from the daily Strombolian explosions of the volcano. An example of a single signal is shown in Figure 2 below. The seismic signals associated with the explosions can be interpreted as time series. To clean the signals, they are preprocessed by applying a band-pass filter that only passes frequencies between 0.05 and 0.5 Hz, and they are then downsampled to reduce the dimensionality of the time series.
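The preprocessing described above can be sketched in a few lines. This is only an illustration under stated assumptions: the sampling rate of 50 Hz, the 60-second window, the downsampling factor and the function names are hypothetical, and a simple FFT-based "brick-wall" filter stands in for whatever band-pass filter was actually used in the analysis.

```python
import numpy as np


def bandpass_fft(signal, fs, f_low=0.05, f_high=0.5):
    """Keep only the frequency components between f_low and f_high (in Hz)."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    # Zero every frequency bin outside the pass band.
    spectrum[(freqs < f_low) | (freqs > f_high)] = 0.0
    return np.fft.irfft(spectrum, n=len(signal))


def downsample(signal, factor):
    """Keep every `factor`-th sample of an already band-limited signal."""
    return signal[::factor]


fs = 50.0                  # assumed sampling rate in Hz (not from the article)
t = np.arange(3000) / fs   # a 60-second synthetic record
slow = np.sin(2 * np.pi * 0.2 * t)        # component inside the pass band
fast = 0.5 * np.sin(2 * np.pi * 5.0 * t)  # component outside the pass band
raw = slow + fast

filtered = bandpass_fft(raw, fs)
# After filtering, nothing above 0.5 Hz is left, so a new rate of 2 Hz is safe.
reduced = downsample(filtered, 25)
```

In this toy record the 5 Hz component is removed and the 0.2 Hz component survives, while the downsampled signal has 25 times fewer points to feed into the later steps.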

Figure 2: The left panel shows a seismic signal of a Strombolian explosion; the top panel shows the raw signal, the middle panel shows the same signal after applying a bandpass filter and the bottom panel shows the downsampled signal. The right panel shows the output of the t-SNE algorithm.

In data science, several algorithms are available for visualizing high-dimensional data. In our case we use the t-SNE algorithm, which allows us to embed high-dimensional data in a low-dimensional space. Specifically, it models each high-dimensional object, in our case a high-dimensional signal, by a two-dimensional point in such a way that, with high probability, similar objects are modeled by nearby points and dissimilar objects by distant points. As such, t-SNE can be used to visualize clusters in high-dimensional data. Running this algorithm on our dataset of seismic signals produces the 2D distribution shown in the right panel of Figure 2 above. Each of the displayed points corresponds to a high-dimensional signal, and it becomes visible that there are structures in the data that we can exploit with a clustering algorithm.
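As a rough illustration of this embedding step, the sketch below runs scikit-learn's `TSNE` on synthetic data. The two random groups are a stand-in for the preprocessed seismic signals, and the perplexity value is an arbitrary choice for this toy example, not a setting taken from the analysis.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Synthetic stand-in for the signals: two groups of 30 "waveforms",
# each waveform being a vector of 20 samples.
group_a = rng.normal(0.0, 1.0, size=(30, 20))
group_b = rng.normal(8.0, 1.0, size=(30, 20))
signals = np.vstack([group_a, group_b])

# Embed every 20-dimensional signal as one point in the plane.
# The perplexity must be smaller than the number of samples.
embedding = TSNE(n_components=2, perplexity=10.0, random_state=0).fit_transform(signals)

print(embedding.shape)  # one (x, y) pair per signal
```

Plotting the two columns of `embedding` against each other gives the kind of scatter plot shown in the right panel of Figure 2.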

Clustering of seismic signals

Clustering defines structures by separating unlabeled datasets into homogeneous groups. For our analysis we use the K-Means algorithm, which clusters data by trying to separate the samples into n groups of equal variance, minimizing a criterion called the within-cluster sum of squares. One of the challenges of using K-Means is that the number of clusters has to be specified beforehand. Since we are working with unlabeled data, we don’t know the optimal number of clusters and there is no ground truth that could help us assess the performance of our clustering. The performance therefore has to be evaluated using the model itself. One way to do so is to use Silhouette scores, which measure the separation between the resulting clusters. Higher scores correspond to a model with better-defined clusters. The Silhouette coefficient is calculated from the mean distance between a sample and all other points in the same cluster (a) and the mean distance between a sample and all points in the nearest cluster that the sample is not a part of (b). The Silhouette coefficient for a single sample is then calculated as: (b - a) / max(a, b).
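To make the formula concrete, here is a minimal sketch that computes the Silhouette coefficient of every sample directly from the definitions of a and b given above. The function name and the four-point toy dataset are illustrative only, and the code assumes Euclidean distances and at least two clusters.

```python
import numpy as np


def silhouette_per_sample(X, labels):
    """Silhouette coefficient (b - a) / max(a, b) for every sample."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Matrix of pairwise Euclidean distances between all samples.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False  # a sample is not compared with itself
        if not same.any():
            continue  # a cluster with a single member gets a score of 0
        # a: mean distance to the other members of the sample's own cluster.
        a = dist[i, same].mean()
        # b: mean distance to the members of the nearest other cluster.
        b = min(
            dist[i, labels == other].mean()
            for other in np.unique(labels)
            if other != labels[i]
        )
        scores[i] = (b - a) / max(a, b)
    return scores


# Toy example: two tight, well-separated groups on a line.
points = [[0.0], [1.0], [10.0], [11.0]]
groups = [0, 0, 1, 1]
scores = silhouette_per_sample(points, groups)
```

For the first point, a = 1 and b = 10.5, so its coefficient is 9.5 / 10.5, close to +1, as expected for a sample that sits far from the neighboring cluster.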

Figure 3: The top panel displays the Silhouette coefficients (left) and t-SNE projection (right) for K-Means with n=2 clusters. The bottom panel shows the Silhouette coefficients (left) and t-SNE projection (right) for K-Means with n=4 clusters.

Silhouette coefficients near +1 indicate that the sample is far away from the neighboring clusters. A value of 0 indicates that the sample is on or very close to the decision boundary between two neighboring clusters and negative values indicate that those samples might have been assigned to the wrong cluster.

In Figure 3 the silhouette coefficients and the t-SNE projection of the resulting clusters are shown for the K-Means algorithm with n=2 and n=4 clusters for the seismic signals associated with Strombolian explosions. The red line indicates the average silhouette score of all samples. In our analysis we found that the highest average silhouette score is obtained with n=2 clusters, which indicates that the seismic signals form two main clusters. In Figure 4 the centroids of the two main clusters are shown. As expected, the waveforms of the two cluster centroids are very different. The example signal shown in Figure 2 can be clearly associated with Cluster 1.
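The selection of the number of clusters by the average silhouette score can be sketched with scikit-learn as follows. The synthetic two-blob dataset, the list of candidate values and the helper name `best_n_clusters` are assumptions made for illustration; they are not the data or code of the analysis.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def best_n_clusters(X, candidates=(2, 3, 4)):
    """Return the candidate n with the highest average silhouette score."""
    scores = {}
    for n in candidates:
        labels = KMeans(n_clusters=n, n_init=10, random_state=0).fit_predict(X)
        scores[n] = silhouette_score(X, labels)
    return max(scores, key=scores.get), scores


rng = np.random.default_rng(1)
# Synthetic stand-in for the waveforms: two well-separated groups.
X = np.vstack([
    rng.normal(0.0, 1.0, size=(100, 5)),
    rng.normal(10.0, 1.0, size=(100, 5)),
])

best_n, scores = best_n_clusters(X)
print(best_n, scores)
```

On data with two clearly separated groups, forcing three or four clusters splits a compact group and lowers the average score, which is the behavior used above to argue for n=2.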

Figure 4: The left panel shows the centroids of the two main clusters obtained with the K-Means algorithm. The right panel shows the time evolution of the two main clusters.

Since we have access to the time stamps of the signals we can display the time evolution of the signals in the two main clusters. The resulting distribution from May 2019 to September 2019 is shown in the right panel of Figure 4 and the 2019 explosions are marked with red lines. We observe a strong asymmetry between the two clusters in this time period.

Conclusions

The INGV researchers conjecture that there is a connection between the waveforms of the signals associated with the Strombolian explosions and the physical state of the volcano. The observed asymmetry in the time evolution of the waveform clusters might therefore indicate large changes in the physical state of the volcano during the eruption period. The ratio between the two main waveform clusters might be an important variable for monitoring the state of the volcano. In principle this variable can be monitored in real time by automatically assigning selected signals to one of the two clusters. The next important steps for the INGV researchers are to analyze the waveforms of the seismic signals over longer time scales and, at the same time, to analyze controlled laboratory experiments to better understand the connection between the physical state of volcanic systems and the resulting seismic signals.
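A minimal sketch of such a monitoring variable is given below: each new signal is assigned to the nearest of the two centroids, and the fraction of signals falling in one cluster is tracked per day. The function names, the two-sample "centroids" and the three events are hypothetical, and nearest-centroid assignment with Euclidean distance is an assumption that mirrors how K-Means assigns samples.

```python
from collections import Counter
from datetime import date

import numpy as np


def assign_cluster(signal, centroids):
    """Index of the centroid closest (Euclidean distance) to the signal."""
    distances = np.linalg.norm(np.asarray(centroids) - np.asarray(signal), axis=1)
    return int(np.argmin(distances))


def daily_cluster_fraction(events, centroids, cluster=0):
    """Map each day to the fraction of that day's signals assigned to `cluster`."""
    totals, hits = Counter(), Counter()
    for day, signal in events:
        totals[day] += 1
        if assign_cluster(signal, centroids) == cluster:
            hits[day] += 1
    return {day: hits[day] / totals[day] for day in sorted(totals)}


# Hypothetical centroids and events, just to show the bookkeeping.
centroids = [[0.0, 0.0], [1.0, 1.0]]
events = [
    (date(2019, 7, 1), [0.1, 0.0]),
    (date(2019, 7, 1), [0.9, 1.0]),
    (date(2019, 7, 2), [1.1, 0.9]),
]
fractions = daily_cluster_fraction(events, centroids)
```

A sustained drift of this daily fraction away from its usual level would be the kind of asymmetry between the clusters discussed above.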

Moreover, given that most of the routinely monitored seismic parameters showed no anomaly prior to the major eruptions in 2019, the ratio of the two main waveform clusters may be important for predicting future explosions. However, as data analysts we are in a dilemma: on the one hand we need more data to understand which variables are correlated with major explosions of Stromboli; on the other hand we don’t want to hope for further major explosions that endanger the lives of tourists and inhabitants. All we can do is wait and see what nature brings us, while trying our best to use the limited data we have to make visiting Stromboli a safe experience.

Walking, talking, and quacking like a Higgs boson

The ATLAS Collaboration at CERN, Europe’s particle physics laboratory, has reported a study of the Higgs boson, the elementary particle discovered at CERN’s Large Hadron Collider (LHC) in 2012. The results were presented at the biennial International Conference on High Energy Physics (ICHEP), hosted this year by the University of Prague but held entirely online because of the Covid-19 crisis. The collaboration has found that the “strength” with which the Higgs interacts with other particles agrees extremely well with the predictions of our best theory, the so-called Standard Model of Particle Physics.

Figure 1:  A depiction of a proton-proton collision in the ATLAS Experiment at CERN’s Large Hadron Collider resulting in production of a Higgs particle and a Z boson.  The Higgs boson decays into two other Z particles; one of the Zs decays into a pair of muons indicated in red and one of the Z bosons into an electron-positron pair shown in green (figure from the ATLAS Collaboration).

In 2012 physicists at CERN discovered a new particle with properties consistent with those predicted for the Higgs boson. In particular one could measure how often the Higgs boson would disintegrate or “decay” into other types of particles. But these and other properties were measured with limited precision, and as the world of elementary particles is large and complex, one could still question whether the new particle was really the Higgs. Over the last 8 years, studies by the ATLAS Collaboration as well as the competitor experiment called CMS have continued to reduce any doubt about whether the new particle is in fact the Higgs boson.  

The latest results from ATLAS for the coupling strengths are shown in Fig. 2 below. In both the upper and lower plots, the data points show on the vertical axis a quantity related to the “coupling strength” of the Higgs to other known elementary particles, namely, the muon (μ), tau lepton (τ), b-quark, W/Z bosons and the top-quark (t), while the horizontal axis indicates the mass of those particles. The dashed line shows the relation between these quantities predicted by the Standard Model. The data points are seen to agree very well indeed, with remaining small discrepancies consistent with estimated measurement uncertainties as indicated by the vertical bars on the points.

Figure 2:  Measurements of the coupling strength of the Higgs to other particles versus the mass of the particle with which it interacts.  The dashed lines indicate theoretical predictions of the Standard Model (figure from the ATLAS Collaboration).

Is the level of agreement enough to prove that the particle we’ve found is the Higgs?  Technically no, there is always some small room for doubt. But one should keep in mind that if the particle were to have some other non-Higgs explanation, then there would be no reason to expect anything like the pattern found in the figure above. So if it walks, talks and quacks like a Higgs, then we can regard this crucial part of the Standard Model to be well confirmed.

To scrutinise subtle signs of deviations of the data taken at the LHC from the Standard Model predictions, the Higgs measurements reported in this study can be interpreted in the framework of an Effective Field Theory. The framework helps in understanding how the signatures of new phenomena manifest in our detector even when the new phenomena occur at distances smaller than those directly probed by the LHC. This is an active area of ongoing work in the ATLAS Collaboration, so stay tuned for updates in the future.

The ATLAS Collaboration is an international consortium of 181 universities, and its members include 5 ESRs from INSIGHTS. Rahul Balasubramanian and his supervisor Wouter Verkerke contributed to this analysis. INSIGHTS Scientific Coordinator Glen Cowan (Royal Holloway, University of London) also played a role in the analysis, through the development of the statistical methods used and as Chair of its Editorial Board.

Infections and identified cases of COVID-19 from random testing data – Allen Caldwell, Max Planck Institute for Physics, Munich

Abstract: There are many hard-to-reconcile numbers circulating concerning Covid-19. Using reports from random testing, the fatality ratio per infection is evaluated and used to extract further information on the actual fraction of infections and the success of their identification for different countries.

The PDF can be downloaded here.

The value that is keeping Italy in lockdown

Yesterday the Italian government released the analysis that motivated the very mild easing of the lockdown. This is an impressive analysis and essentially takes into account all the points I raised last week: the population is divided into age groups with different susceptibility and transmission rates, the analysis is done at the regional level, and it takes into account an incredible amount of information. Really fantastic work.

However, all the predictions rely on one crucial number: the CFR in Lombardia, which is 0.657%. This is incredibly pessimistic and is caused by the fact that the Lombardia health system was overwhelmed. The same value for the rest of Italy would be much lower, about half. In Sweden, the CFR is 0.3% and in other countries it is even lower. If a value of 0.35% had been used, the estimate of critical cases would also have been (more or less) halved, and the rate of spread would also have been reduced, considering that twice as many people would have been categorised as immune.
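The arithmetic behind "twice as many people" can be sketched as follows, assuming the quoted percentage is used as deaths per infection, so that the number of past infections implied by a given death count is deaths divided by the fatality ratio. The death count of 10,000 is a made-up round number for illustration, not a figure from the government analysis.

```python
def implied_infections(deaths, fatality_ratio_percent):
    """Number of infections implied by a death count and a fatality ratio in %."""
    return deaths / (fatality_ratio_percent / 100.0)


deaths = 10_000  # hypothetical death count, for illustration only

with_lombardia_value = implied_infections(deaths, 0.657)
with_lower_value = implied_infections(deaths, 0.35)

# How many more people would count as already infected with the lower value.
ratio = with_lower_value / with_lombardia_value
print(round(ratio, 2))  # about 1.88, i.e. nearly twice as many
```

The ratio does not depend on the death count chosen: it is simply 0.657 / 0.35, which is why the lower value roughly doubles the number of people treated as immune.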

So why was the CFR of Lombardia used for the whole of Italy? The model is run on each region independently, so why not use the regional CFR too? I am sure the results would have been significantly different and the Italians would have been less upset.