The value that is keeping Italy in lockdown

Yesterday the Italian government released the analysis that motivated the very mild easing of the lockdown. This is an impressive analysis and essentially takes into account all the points I raised last week: the population is divided into age groups with different susceptibility and transmission rates, the model is run at the regional level, and it takes into account an incredible amount of information. Really fantastic work.

However, all the predictions rely on one crucial value: the CFR in Lombardia, which is 0.657%. This is incredibly pessimistic and is caused by the fact that the Lombardia health system was overwhelmed. The same value for the rest of Italy would be much lower, about half. In Sweden, the CFR is 0.3% and in other countries it is even lower. If a value of 0.35% had been used, the estimate of critical cases would also have been (more or less) halved, and the rate of spread would also have been reduced, considering that twice as many people would have been categorised as immune.
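The effect of the CFR on the inferred number of infected can be illustrated with a short calculation. This is only a minimal sketch, assuming that the number of infected is obtained by dividing a fixed death count by the CFR; the death count below is a hypothetical placeholder and not a figure from the government analysis.

```python
def implied_infected(deaths: float, cfr_percent: float) -> float:
    """Number of infected implied by a death count and a CFR given in percent."""
    return deaths / (cfr_percent / 100.0)


# Hypothetical death count, used only to illustrate the scaling.
deaths = 10_000

cfr_lombardia = 0.657    # CFR quoted for Lombardia [%]
cfr_alternative = 0.35   # alternative CFR discussed in the text [%]

infected_with_lombardia_cfr = implied_infected(deaths, cfr_lombardia)
infected_with_alternative_cfr = implied_infected(deaths, cfr_alternative)

# The ratio does not depend on the death count, only on the two CFR values.
ratio = infected_with_alternative_cfr / infected_with_lombardia_cfr
print(f"Implied infected grows by a factor {ratio:.2f} with the lower CFR")
```

With these two CFR values the implied number of infected (and hence of people assumed immune) grows by a bit less than a factor of two, whatever death count is used.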

So why was the CFR of Lombardia used for the whole of Italy? The model is run on each region independently, so why not use the regional CFR too? I am sure the results would have been significantly different and the Italians would have been less upset.


Not an exponential curve fit: a data analysis exercise on COVID-19 Italian data trying to estimate the number of infected people

Update at 15:30: The Swedish study has just been withdrawn, so IP3 is no longer valid. The two other models are not affected. This shows how difficult it is to analyse live data.

This post can be downloaded as PDF.

How many people are really infected? Nobody knows. The number of asymptomatic people is high, but nobody has a definitive number, mostly because any estimate would only be accurate for a small region, since the number of tests carried out differs significantly from place to place. Knowing the number of infected is useful for society, as a large number of recovered people would allow Europe to re-open, at least partially. Since our network aims to help society using particle physics tools, I will try to give an estimate based on some of the data available from Italy. Will my estimates be accurate? Probably not, but even if they only help someone to better understand the complex nature of analysing COVID data, I think it is worth sharing them.

Let me start by saying that this is not a quantitative analysis; I do not have the background to do that. I will make some assumptions which should be correct enough for a qualitative analysis and give an approximate result, which should be confirmed by experts.

Review of existing estimates

Since I started developing this method, two estimates have been produced in Italy. One is described in a preprint; the second is a survey from Doxa. The former uses the Case Fatality Ratio (CFR), defined as the probability of dying if infected, as measured in small countries (0.2%), on the Diamond Princess (1%) or in Italian hospitals (10%). I find these numbers not very precise. The early cases in small countries like Iceland and Luxembourg were imported by people travelling; this is usually a relatively young population, which we know has a lower fatality rate than average, so I find this value a bit optimistic. The estimate using the cruise ship Diamond Princess is based on 7 deaths out of more than 700 infected, so the statistical uncertainty is large. Furthermore, the population on a cruise ship probably does not reproduce the demographics of a country. Getting a good age distribution is crucial given the much higher mortality in the older age groups; this is the problem with using the hospital population, which is significantly skewed toward older people. The Doxa analysis is based on a sample of 1000 people spread across Italy, counting those who describe symptoms associated with the virus. There are several assumptions, but the main problem is that 1000 people are not enough to sample all regions and age groups accurately. In general, all methods try to extrapolate to the whole Italian population, which is, in my opinion, the biggest problem, as the diffusion depends strongly on geography. About half of all Italian cases are in Lombardia, so a more granular approach is needed.

The data

Moving to the data, the most granular information available is in the ISS (Istituto Superiore della Sanità) appendice con dettaglio regionale (appendix with region-by-region information). This appendix is published every week and the latest available data is from the 16th of April. The number of infected is provided for each province, but the demographic curves of the infected and the number of deaths are only presented for the regions. It would be great if ISS could share this additional data so that the predictions could be improved.

I also want to comment on the quality of this data. A lot of people on the internet question the validity and the usefulness of all these numbers. As in all data analysis, there is no wrong data; at most, there is bad data, and it is the job of the analyst to understand it. In general, domain expertise allows this to be done correctly, but in this case the confusion is generated by the attempt to extrapolate or aggregate to the national level what should be analysed as a localised problem. Treating it as a localised problem is what I will try to do in my analysis.

The analysis

The figure that intrigued me and led me to start this analysis is the demographic curve of infected people in Italy (an example is provided here). This is very different from the demographic curve of the whole Italian population (link). If we could find a segment of the population where most of the infected are identified, we could get a first approximation of the whole infected population. I think such a group exists: males over 80. I base this assumption on the fact that 40% of the infected males over 80 died because of the virus. Given such high mortality, it is unlikely that many will be asymptomatic, as most people under 40 are. Since most infected people in this group will be very symptomatic, most of them should be tested and identified. There are several problems with this assumption; for example, people may die before being tested, and a small fraction may have very mild or no symptoms at all. Hence, any result based on this assumption is a lower limit. The rate of asymptomatic cases in this age group could easily be measured by a dedicated study monitoring this group in a region or province; since this should be a constant number, it could then be applied to the whole country. Indeed, this information could be extracted from the available tests carried out in hospices. I selected males because an important factor in this pandemic is the sex of the infected; in the over-80 population, similar numbers of males and females are infected although the population is dominated by women. I do not think there is a substantial difference in the lifestyle of the two sexes, so both should be infected at the same rate. My assumption here is that women simply get milder symptoms and are therefore tested less, but have a similar infection rate. This is supported by the fact that the mortality rate for women is only 20%; it looks like women are more resistant, so they could also be asymptomatic more often.
Since for people over 90 the women-to-men ratio in the population and among the infected is significantly different from that of the 80-90 group, I will focus on the 80-90 population only (all demographic curves use 10-year ranges, so this also simplifies some of the data extraction). The first hypothesis is that the fraction of infected people in each region is at least as high as the fraction measured in the 80-90 age group; this hypothesis is denoted IP1 from now on. This assumption is supported by a recent study of the completely locked-down town of Vo, in Veneto, where the 80-90 age group had an infection rate comparable to that of the whole population; it must also be noted that only 5 cases were identified, so there is a large statistical uncertainty on this confirming result. One side note about this study: it showed that children do not get the infection and are not super-spreaders as they are for the normal influenza, so closing schools did not make a big impact and should be among the first measures to be lifted, as already done in Denmark, Norway and Germany.

In normal conditions, people in their 80s are not the most socially active group around. So, I would expect a significantly higher transmission and infection rate in younger age groups. As a physicist, what I find interesting about this estimate is that it is easy to correct, because it does not require knowing the absolute infection rate of the virus, only the relative infection rate between age groups! This could be derived from studies like the one cited above on the population of Vo. In particle physics, we measure a lot of ratios because many effects cancel out. I had a look at publications in this area that are based on non-lockdown conditions, and I could only find this paper. In summary, people below 50 are twice as likely to get infected as those over 65. This factor of two may not be the most appropriate for a lockdown scenario: it may be smaller (if everyone is really isolated at home, as was the case in the Vo study mentioned above, where the factor was 1) or higher, if a significant fraction of the younger working population continues to interact (50% of companies are still open as they are deemed essential) while the older population practises tight isolation. It is possible to get a rough estimate of this factor from the Vo study, as they measured the rate of infection at the beginning of the lockdown (so those cases would have been caused by the normal circulation of people) and after two weeks of total lockdown. Unfortunately, the new cases were so few that any extrapolation based on the ratio between age groups would be meaningless (error bigger than 100%). Comparing the age-inclusive rates has a smaller (but still large) statistical uncertainty: 0.3% during lockdown compared to 2.6% before. Taking the mid-point for the working group would give about 1.5%. The ratio of the working population to the locked-down older generation would then be 5 to 1, higher than the factor of two mentioned before.
When extrapolating to the total population, this value must be reduced, as the school-age population is in lockdown and those working would probably take measures to reduce infection. Combining all these effects, a rough estimate could be that the whole population is only twice as infected as the old; I welcome ideas to improve this estimate as I know it is not great. I will denote this as IP2 from now on.
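The arithmetic behind the factor of 5 can be written out explicitly. This is only a sketch of the rough estimate described above, using the two age-inclusive rates quoted from the Vo study; the variable names are mine.

```python
# Age-inclusive infection rates quoted from the Vo study [%].
rate_before_lockdown = 2.6   # normal circulation of people
rate_during_lockdown = 0.3   # after two weeks of total lockdown

# Working population: assumed to sit half-way between the two regimes.
working_rate = (rate_before_lockdown + rate_during_lockdown) / 2

# Isolated older population: assumed to follow the lockdown rate.
ratio_working_to_isolated = working_rate / rate_during_lockdown

print(f"Mid-point rate for the working group: {working_rate:.2f}%")
print(f"Working / isolated ratio: {ratio_working_to_isolated:.1f}")
```

The exact mid-point is 1.45%, which the text rounds to 1.5%, and the resulting ratio is close to 5. The further reduction to the factor of two used for IP2 is a judgement call and is not derived by this calculation.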

Let me stress that my estimates do not use the mortality rate at all and rely on the bigger numbers of identified infected people to extrapolate to the actual number of infected, so statistical errors are smaller and the estimate is more accurate. The results are summarised in Table 1 [only extrapolated results are shown in this post; more columns are shown in the tables of the downloadable PDF, Study on Italian data on Covid19 - MFG - 22-04]. I decided to include some of the intermediate steps for completeness. The only male-female division based on demographics that I could find was on a national scale (here), indicating that 49.4% of the cases in the 80-90 group were men. I have no reason to suspect that this ratio should be significantly different in any Italian region, so I will assume that it holds in all regions. This data is available to the ISS, so it would be great if they could provide it to make the prediction more precise. The column “Fraction of infected All [%]” is the fraction of infected in each region; the large differences are caused by the different level of spread but also by the significant difference in the number of tests performed by each region. Many people complain that COVID data does not make sense, citing these numbers; actually, the data is correct but must be understood, for example by avoiding such easy (but wrong) comparisons and the aggregation of nonhomogeneous data. By analysing each region independently, I avoid this problem and I find that the data make more sense. I provide the summary for Italy and for Italy without Lombardia only to stress this point. The latter is also useful to show that Italy is doing much better, especially in the number of deaths per inhabitant, if the worst-affected region is removed from the national calculation.

Table 1 Summary of data and estimates for all Italian regions

Region | Infected Male 80-90 | Fraction infected Male 80-90 [%] (IP1) | Fraction of infected All [%] | Ratio | IP2 [%]
Lombardia | 4692 | 2.08 | 0.54 | 3.81 | 4.15
Lazio | 254 | 0.20 | 0.07 | 2.75 | 0.40
Campania | 121 | 0.12 | 0.05 | 2.40 | 0.25
Sicilia | 89 | 0.09 | 0.04 | 2.40 | 0.17
Veneto | 932 | 0.85 | 0.26 | 3.22 | 1.70
Emilia-Romagna | 1444 | 1.24 | 0.41 | 3.04 | 2.48
Piemonte | 1091 | 0.92 | 0.28 | 3.24 | 1.84
Puglia | 174 | 0.20 | 0.07 | 2.95 | 0.40
Toscana | 368 | 0.36 | 0.14 | 2.59 | 0.73
Calabria | 35 | 0.08 | 0.04 | 1.98 | 0.16
Sardegna | 74 | 0.20 | 0.06 | 3.67 | 0.40
Liguria | 278 | 0.58 | 0.19 | 3.11 | 1.16
Marche | 405 | 0.96 | 0.31 | 3.09 | 1.92
Abruzzo | 93 | 0.28 | 0.14 | 1.96 | 0.56
Friuli Venezia Giulia | 125 | 0.41 | 0.16 | 2.48 | 0.81
Trentino-Alto Adige | 365 | 1.67 | 0.48 | 3.46 | 3.35
Umbria | 44 | 0.18 | 0.14 | 1.33 | 0.36
Basilicata | 5 | 0.03 | 0.03 | 1.29 | 0.07
Molise | 13 | 0.16 | 0.08 | 2.15 | 0.33
Valle d’Aosta | 77 | 2.61 | 0.70 | 3.75 | 5.22
Italy | 10681 | 0.77 | 0.23 | 3.39 | 1.53
Italy w/out Lombardy | 5989 | 0.51 | 0.16 | 3.17 | 1.02
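The derived columns of Table 1 can be cross-checked from the others. The sketch below assumes that the "Ratio" column is IP1 divided by the fraction of infected in the whole population (this is my reading of the numbers, it is not stated explicitly) and that IP2 is twice IP1, as described above. Only a few regions are copied in, and because the inputs are the rounded values printed in the table, the results agree with the table only to within rounding.

```python
# Rounded values copied from Table 1:
# (IP1 [%], fraction of infected All [%], Ratio, IP2 [%]).
TABLE_1 = {
    "Lombardia": (2.08, 0.54, 3.81, 4.15),
    "Veneto": (0.85, 0.26, 3.22, 1.70),
    "Trentino-Alto Adige": (1.67, 0.48, 3.46, 3.35),
    "Valle d'Aosta": (2.61, 0.70, 3.75, 5.22),
}

# IP2 assumes the whole population is twice as infected as the 80-90 group.
IP2_FACTOR = 2.0


def ratio_and_ip2(ip1: float, fraction_all: float) -> tuple[float, float]:
    """Return (IP1 / identified fraction, IP2) for one region."""
    return ip1 / fraction_all, IP2_FACTOR * ip1


results = {
    region: ratio_and_ip2(ip1, fraction_all)
    for region, (ip1, fraction_all, _, _) in TABLE_1.items()
}

for region, (ratio, ip2) in results.items():
    print(f"{region}: ratio = {ratio:.2f}, IP2 = {ip2:.2f}%")
```

The ratio can be read as the factor by which the infected identified in the whole population would have to be multiplied to match the rate seen in the 80-90 male group.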

To further demonstrate the need for province-level granularity in the data, I analysed the two provinces of Trentino-Alto Adige, since the ISS provides all the details for both. The results are shown in Table 2; Trento, a province closer to the epicentre of the pandemic, has twice as many cases as Bolzano. Using the regional data only shows the average of the two, which is not an accurate description of the spread of the virus.

Table 2 Comparison of provinces in Trentino-Alto Adige

Region | Infected Male 80-90 | Fraction infected Male 80-90 [%] (IP1) | Fraction of infected All [%] | Ratio | IP2 [%]
Trentino-Alto Adige | 365 | 1.67 | 0.48 | 3.46 | 3.35
Bolzano | 133 | 1.27 | 0.35 | 3.61 | 2.54
Trento | 232 | 2.05 | 0.61 | 3.33 | 4.10

Study on CFR, a digression

The CFR has been another source of confusion that led people to claim that the available data is wrong or useless. Again, the data is correct; the values simply should not be compared, because the definition of CFR includes a quantity that is not comparable between regions. Let me go back to the definition: CFR is the number of deaths divided by the number of infected. While the numerator (the number of deaths) is a well-defined number which is common to all regions, the denominator (the number of infected) suffers from the different number of tests per inhabitant performed in the various regions. For example, Veneto has performed as many tests as Lombardia but has a population that is less than half as large. This resulted in more people being identified as positive, particularly in lower age groups: the average age of the infected is 58 in Veneto and 65 in Lombardia.

This problem could be avoided if the demographic of the deceased could be provided as it is done for the infected. With this breakdown, it would be possible to use the 80-90 age control group to compare the CFR between regions; again, I would assume that all regions are identifying all (or at least the majority) of the infected in this age group.

The 80-90 age group can also provide an evaluation of the performance of the regionalised health system in Italy by counting the fraction of deaths in this group. The guidelines for treating patients in case of a shortage of ICU places state that priority should be given to the healthier and younger patients, who have a higher chance of surviving. Therefore, a significantly higher number of deaths in my control group would be evidence of doctors having to make difficult decisions due to lack of resources.

Unfortunately, the data to carry out this study is not available, as only the national breakdown of deaths by age group is provided. What can be done is to divide the national number of deaths in this age group among the regions in proportion to the total deaths in each region. This approximation would hide some of the excesses that the study aims to find, as the deaths are distributed uniformly, so any excess found will be underestimated. I already calculated the number of infected in the 80-90 age group, which is used as the denominator for the calculation of the fatality. The results for all regions are shown in Table 3. The CFR in Lombardia is the highest in all of Italy and is almost twice the average in the rest of Italy. It is also interesting to notice that Liguria has a very high CFR too, something that has not been picked up by the media (probably due to the bigger numbers in Lombardia). Emilia-Romagna, the second most affected region, also shows a higher rate than the average; it would be interesting to have the breakdown by province to see if the rate gets worse in the provinces closer to Lombardia, which are more affected by the virus. These numbers do not definitively prove that medics had to make difficult choices, but they hint in this direction. A definitive answer could be given if ISS provided the additional information mentioned above (the breakdown of deaths by age in each region and province).
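The proportional allocation described above can be sketched as follows. The region names and all numbers in this example are hypothetical placeholders chosen to make the arithmetic easy to follow; they are not the ISS figures used for Table 3.

```python
def allocate_deaths(national_age_group_deaths: float,
                    regional_total_deaths: dict[str, float]) -> dict[str, float]:
    """Split the national deaths of one age group among the regions,
    in proportion to each region's share of the total deaths."""
    total = sum(regional_total_deaths.values())
    return {
        region: national_age_group_deaths * deaths / total
        for region, deaths in regional_total_deaths.items()
    }


def cfr_percent(deaths: float, infected: float) -> float:
    """Case fatality ratio in percent."""
    return 100.0 * deaths / infected


# Hypothetical inputs (not real data).
regional_total_deaths = {"Region A": 600, "Region B": 400}
infected_80_90 = {"Region A": 1000, "Region B": 1000}
national_deaths_80_90 = 500

deaths_80_90 = allocate_deaths(national_deaths_80_90, regional_total_deaths)
cfr_80_90 = {
    region: cfr_percent(deaths_80_90[region], infected_80_90[region])
    for region in deaths_80_90
}

for region, cfr in cfr_80_90.items():
    print(f"{region}: {deaths_80_90[region]:.0f} deaths, CFR = {cfr:.1f}%")
```

Because every region receives the same share of its deaths in the 80-90 group, a region where this age group was hit disproportionately hard would have its excess diluted, which is why the text treats any excess found as an underestimate.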

Table 3 Estimated number of deaths and CFR for the 80-90 age group in all regions

Region | Dead 80-90 | CFR 80-90 [%]
Lombardia | 4594 | 39.19
Lazio | 105 | 15.42
Campania | 59 | 19.90
Sicilia | 58 | 25.16
Veneto | 396 | 17.60
Emilia-Romagna | 1120 | 32.40
Piemonte | 615 | 20.31
Puglia | 119 | 25.79
Toscana | 151 | 15.86
Calabria | 21 | 27.08
Sardegna | 33 | 16.80
Liguria | 214 | 34.56
Marche | 172 | 17.23
Abruzzo | 15 | 6.55
Friuli Venezia Giulia | 84 | 29.81
Trentino-Alto Adige | 220 | 23.95
Umbria | 23 | 25.44
Basilicata | 8 | 77.05
Molise | 6 | 22.27
Valle d’Aosta | 55 | 31.59
Italy | 8070 | 30.22
Italy without Lombardy | 3476 | 23.20

A hope from the north of Europe?

The fractions of infected evaluated with IP1 and IP2 are rather low and, even in the most affected regions, the numbers are far from those needed for herd immunity. However, these are rather conservative values and the infection may be more widespread.

Some hope is coming from Sweden, which has just announced that some regions of the country already show signs of herd immunity and should reach the required level of infected (more than 60%?) in about a month. While Sweden has been described as a nut case by the media for not imposing a lockdown, it does not have any count (fraction of infections or deaths) as high as Italy. So, can Italy, or at least northern Italy, already be close to herd immunity?

In the week of the 2nd of April, 773 people were tested in Stockholm and 2.5% were found to be infected. Extrapolating to the 9th of April, the authors estimate that (7.5±2.5)% of the population in the capital region was infected. Some detail can be found here. Translating these numbers to the Italian data is not easy; it is not possible to use the number of infected people (and hence the CFR), as the testing strategy in Sweden is different from Italy. The only possible common factor is the fraction of deaths over the whole population. There are two problems in using this fraction: one is the different demographics of the two countries, the second is the fact that in Sweden all Covid deaths are considered, while in Italy only those in hospitals are considered. The former difference can be corrected by comparing the fraction of over-70s, the largest fraction of deaths being above this age. In Italy 17.2% of the population is over 70, while only 14.8% of Swedes are in the same age group. Therefore, we could expect a 16% (17.2/14.8=1.16) higher mortality in Italy simply due to demographics. A similar comparison done by experts can be found here. It is difficult to set a correction for the different way of counting the deaths; the number of deaths in Italy should be higher than reported, but we could assume that in a non-overwhelmed region all people with acute symptoms were transported to hospital and therefore most deaths were counted.
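The demographic correction is a single ratio, written out here as a small sketch. It uses only the two over-70 fractions quoted above, and it assumes, as the text does, that mortality scales linearly with the share of over-70s in the population.

```python
# Share of the population over 70, as quoted in the text [%].
over_70_italy = 17.2
over_70_sweden = 14.8

# Expected mortality in Italy relative to Sweden from demographics alone.
demographic_factor = over_70_italy / over_70_sweden
excess_percent = 100.0 * (demographic_factor - 1.0)

print(f"Demographic factor: {demographic_factor:.2f}")
print(f"Expected excess mortality in Italy: {excess_percent:.0f}%")
```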

OK, moving to the math. Stockholm has a population of about 2.3 million and on the 9th had 486 deaths, giving a fatality ratio (FR) of 0.02%. Please note that in this case the fatality is calculated over the whole population, as this is a measure of the progress of the infection; this is different from the CFR, where the denominator is the number of infected people. Actually, the two values are identical once the whole population is infected; at the current pace, Stockholm should have a CFR of 0.3%, slightly higher than the lowest estimate reported in the review section but significantly lower than the Diamond Princess data suggested. The FR is then increased by 16% to be compared to the fatality rates of the Italian regions, as described above. It is also crucial to say that Stockholm was not overwhelmed by the pandemic (a 400-place field hospital is still unused), so this number can only be compared to regions that are not overwhelmed, i.e. I will not show Lombardia. The results of this study are presented in Table 4. The extrapolation of the Swedish model is denoted as IP3, and both the min and max values derived from the original statistical error are shown for completeness.
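The steps of this extrapolation can be sketched in a few lines. This is my reading of the procedure, assuming that the infected fraction scales linearly with deaths per inhabitant; the function name is hypothetical. The inputs are the rounded values quoted in the text (for instance "about 2.3 million"), so the output is close to, but not identical with, the entries of Table 4.

```python
# Stockholm inputs as quoted in the text.
STOCKHOLM_POPULATION = 2.3e6    # "about 2.3 million"
STOCKHOLM_DEATHS = 486          # deaths on the 9th of April
STOCKHOLM_INFECTED = 7.5        # estimated infected fraction [%]
STOCKHOLM_INFECTED_ERR = 2.5    # statistical error on that fraction [%]
DEMOGRAPHIC_FACTOR = 1.16       # correction for Italy's older population

# Fatality ratio over the whole population [%].
stockholm_fr = 100.0 * STOCKHOLM_DEATHS / STOCKHOLM_POPULATION

# CFR implied if the same pace holds until everyone is infected [%].
cfr_if_all_infected = 100.0 * stockholm_fr / STOCKHOLM_INFECTED

# Stockholm FR rescaled to Italian demographics [%].
reference_fr = stockholm_fr * DEMOGRAPHIC_FACTOR


def ip3(region_fr_percent: float) -> tuple[float, float, float]:
    """Return (min, central, max) infected fraction [%] for a region,
    given its deaths / population [%]."""
    scale = region_fr_percent / reference_fr
    return (
        scale * (STOCKHOLM_INFECTED - STOCKHOLM_INFECTED_ERR),
        scale * STOCKHOLM_INFECTED,
        scale * (STOCKHOLM_INFECTED + STOCKHOLM_INFECTED_ERR),
    )


# Example: Veneto, deaths / population = 0.0200% in Table 4.
veneto_min, veneto_central, veneto_max = ip3(0.0200)
print(f"Stockholm FR: {stockholm_fr:.4f}%")
print(f"Implied CFR: {cfr_if_all_infected:.2f}%")
print(f"Veneto IP3: {veneto_min:.1f} / {veneto_central:.1f} / {veneto_max:.1f} %")
```

Since the min and max values come only from the ±2.5% error on the Swedish estimate, they are always two thirds and four thirds of the central value.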

Table 4 Current fatality rate over the whole population and predicted fraction of infected based on the Swedish study

Region | Deaths / population [%] | IP3 min [%] | IP3 [%] | IP3 max [%]
Lombardia | 0.1313 | NA | NA | NA
Lazio | 0.0044 | 0.9 | 1.4 | 1.9
Campania | 0.0025 | 0.5 | 0.8 | 1.1
Sicilia | 0.0029 | 0.6 | 0.9 | 1.2
Veneto | 0.0200 | 4.2 | 6.3 | 8.4
Emilia-Romagna | 0.0622 | 13.1 | 19.7 | 26.2
Piemonte | 0.0350 | 7.4 | 11.1 | 14.7
Puglia | 0.0073 | 1.5 | 2.3 | 3.1
Toscana | 0.0101 | 2.1 | 3.2 | 4.2
Calabria | 0.0026 | 0.6 | 0.8 | 1.1
Sardegna | 0.0050 | 1.1 | 1.6 | 2.1
Liguria | 0.0342 | 7.2 | 10.8 | 14.4
Marche | 0.0280 | 5.9 | 8.9 | 11.8
Abruzzo | 0.0028 | 0.6 | 0.9 | 1.2
Friuli Venezia Giulia | 0.0172 | 3.6 | 5.4 | 7.3
Trentino-Alto Adige | 0.0509 | 10.7 | 16.1 | 21.5
Umbria | 0.0066 | 1.4 | 2.1 | 2.8
Basilicata | 0.0037 | 0.8 | 1.2 | 1.6
Molise | 0.0052 | 1.1 | 1.7 | 2.2
Valle d’Aosta | 0.1090 | 23.0 | 34.5 | 46.0
Italy | 0.0331 | 7.0 | 10.5 | 14.0
Italy without Lombardy | 0.0171 | 3.6 | 5.4 | 7.2

The hope is that, if these numbers are confirmed, the northern regions are approaching infection rates that will allow a natural reduction of the spread of the virus and a return to a life closer to the pre-pandemic period. It must also be stressed that all regions in the centre and south of Italy have a very low infection rate even in this model. So, a prudent and conservative approach is really needed to avoid a second wave in these regions. Italy really looks divided in two by these numbers, and any policy should reflect this, to maximise the benefits of restoring personal freedoms in the north and to protect the fragile health system in the south.

Most affected provinces

While all the media focussed on Bergamo due to its highest absolute counts of infected and deaths, the province that has the highest rate of identified infected is Cremona, followed by Lodi and Piacenza. To provide the estimates in these provinces I need to make some additional assumptions, as some data is not available. For example, I will use the regional demographic curve to calculate the number of 80-90 people affected in the province. The national male-female ratio is also used for the provinces. A difference in the demographics of the infected may be caused by different testing policies in different areas; overwhelmed areas may only test the severe cases, while less affected areas may still test a larger spectrum of the population. Since older people are more affected, a bias in the tests based on the severity of the symptoms is also a bias in the demographics. Different provinces may also have different testing capabilities, resulting in a different sampling of the population. All these biases are likely limited by two factors: the regionally organised health system is likely to have provided similar resources per capita, and most cases are from recent days, when the whole region was under similar stress. The three models are presented in Table 5.

The conservative approaches described in IP1 and IP2 show that these provinces should already have a non-negligible fraction of the population infected, probably enough to already be having an impact on the transmission rate (as observed in Stockholm).

Since these provinces may have been overwhelmed, I used the national average (w/out Lombardy) and multiplied it by the ratio of IP1 between the province and Italy (w/out Lombardy) to scale the number of deaths for IP3. The results are shown in Table 5. According to this estimate, (49±16)% of people have been infected in the province of Cremona. This is very close to the levels required for herd immunity, but many assumptions are needed to reach these values, which would need further scrutiny and more granular data to be sure they are correct.

Table 5 Estimates for the most affected provinces

Province | Infected 80-90 | Infected M 80-90 | IP1 [%] | IP2 [%] | IP3 min [%] | IP3 [%] | IP3 max [%]
Cremona | 980 | 484 | 5.74 | 11.5 | 32.8 | 49.1 | 65.5
Lodi | 487 | 240 | 5.31 | 10.6 | 30.3 | 45.5 | 60.6
Piacenza | 512 | 253 | 3.29 | 6.6 | 18.8 | 28.2 | 37.6


I presented a way to analyse the available data on a regional basis, which better describes the fragmented Italian health system. Using assumptions that could easily be verified by ISS, which has more data, I provided three estimates of the number of infected in Italy as of the 16th of April. If more data could be made available, more precise predictions could be provided. Two predictions are rather conservative, so they can probably be considered lower limits. A recent study in Sweden provided a new way to estimate the infected that, if confirmed, would put the most affected provinces in Italy close to the levels required for herd immunity, or at least high enough to provide a significant rate reduction once the lockdown measures are lifted. The Swedes are conducting a new study with higher statistics that will provide more accurate data, and each Italian region should follow the same example, as these tests are much cheaper than the mass tests that many regions are planning.

Let us hope that the last figures are correct and that we will be able to return to normal life soon.

Michele Faucci Giannelli

PS: Thanks to all the people who provided feedback during the preparation of this post.

Meet the ESRs: Sitong An

Hi there! I am Sitong An (安思同 in Chinese), Marie Skłodowska-Curie Fellow at CERN with project INSIGHTS and PhD student at Carnegie Mellon University (CMU). Originally from China, I left at the age of 16 and travelled the world for education. Currently, I’m working at CERN, Geneva, Switzerland, under the supervision of Dr. Sergei Gleyzer and Dr. Lorenzo Moneta. My Ph.D. advisor from CMU is Prof. Manfred Paulini. From September 2018, I will be working on Machine Learning/Deep Learning for Particle Physics for three years. I am immensely grateful to INSIGHTS and to my supervisors for giving me such a great opportunity to work in this exciting subfield.

A bit of background about me: I was born and raised in a small, nondescript city in northeastern China. As a kid, the thought of venturing overseas for education never crossed my mind. That was the case until 2009, when I was offered a scholarship (SM1) by the Singaporean government to attend high school in Singapore. It was a once-in-a-lifetime opportunity, a rare window to the world outside, and yet it was also a daunting choice to go to a foreign country and learn to survive on my own. Eventually, this became the decision that changed the path of my life. I spent four intense and memorable years in Singapore, attending Catholic High School and Hwa Chong Institution. To this day, I still feel a strong affinity for the dear “Little Red Dot”.

Singapore is an amazing city – you should visit if you’ve never been 🙂
Photo credit: Chensiyuan, Wikipedia

After my A-Levels there, I moved to the U.K. for my undergraduate education at the University of Cambridge, partially supported by scholarships from both the University and my college, Wolfson College Cambridge. I graduated in 2018 with a Bachelor of Arts and a Master of Natural Sciences (Physics). During my journey I was fortunate enough to have the opportunity to visit many places around the globe, including MIT for an exchange year abroad, and the Weizmann Institute (Rehovot, Israel) and DESY (Hamburg, Germany) for internships. The coursework at Cambridge could feel gruelling and never-ending at times, but it was a privilege to wander about on the paths walked by Newton and Maxwell. Looking back, the three years I spent there were bittersweet, but still dream-like.

Call me biased – but for me Cambridge is the most beautiful university in the world
Photo credit: Sitong An, Commercial Rights Reserved

To work at CERN has been my dream and goal since high school. I remember the naive but passionate excitement I felt about the Higgs discovery while I was still a high school student. I remember seeing the advertisement on the CERN career website for the INSIGHTS position and thinking “this is exactly what I want to do!” I also remember attending the interview nervously, fully aware of the competitiveness of the position, and telling my future supervisors how much I care about making an impact in this field that I love, to the fullest of my abilities. And…voila, now I am here. As I sit in my office and type this blog post to tell you my story, I still can’t help but feel amazed at how these ten years passed by, and how that dream came true.

In the tunnel of Large Hadron Collider (LHC), CERN, Geneva, Switzerland
Photo credit: Andrés G. Delannoy

For these three years, I will devote roughly half of my time here to the development of deep learning algorithms for particle physics experiments. Specifically, I am currently investigating the use of Graph Neural Networks for event reconstruction at the new and upcoming High Granularity Calorimeter (HGCal) for the CMS Experiment. Reconstruction algorithms are an important step in the workflow of high energy physics experiments. They take raw data from the detectors and convert them into physical objects that physicists understand – like particles, for example. Because of the sheer complexity of our detectors, deep learning holds promise for greatly enhancing the pattern recognition of our future reconstruction algorithms and empowering our detectors to make more precise measurements. This is, of course, a very brief and simplistic explanation, and I will describe this project in greater detail in another technical blog post in the future.

An artist’s impression of the High Granularity Calorimeter, taken from the cover of the HGCal TDR (Technical Design Report).

The other half of my time will be spent on developing software tools in support of the HEP-ML community – particle physicists who are developing and applying Machine Learning algorithms to their work. I am part of the ROOT team in the CERN EP-SFT group. ROOT is a data analysis framework widely used in the data workflow of high energy physics, and I will be contributing to ROOT-TMVA (Toolkit for Multivariate Data Analysis), the machine learning project within ROOT. My work will focus on the modernisation of ROOT-TMVA, aiming to allow physicists to develop and deploy machine learning models more easily with ROOT data. More details on this are coming too.

Accelerating Science at CERN
Photo credit: Sitong An, Commercial Rights Reserved

Apart from my technical work, I also care deeply about public engagement. High energy physics is a costly enterprise and what we’re doing would not be possible without public support. I am a CERN guide as well as a qualified guide to both CMS and ATLAS experiments. It is always an enjoyable experience to show visitors around and share our passion; to explain why we are doing this, why curiosity-driven fundamental research is important; and to see the awe-struck expressions of the visitors when they see the underground detectors for the first time. I also volunteer actively in CERN public activities, like CERN Opendays and TEDxCERN.

Volunteering for TEDxCERN, November 2018

If you’re interested in learning more about me, you are welcome to visit my website/blog by clicking here. It is still very simple and lacks much content at the moment, but I will furnish it with more details as my work progresses. You can also find ways to contact me there – feel free to reach out to me with questions or opportunities in Machine Learning.

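If you’re a high school student or teacher interested in organising a virtual visit to CMS [more details], please do not hesitate to contact me for help too. If you are a middle or high school teacher or student from China or Singapore and are interested in organising a remote virtual visit to the underground experiments at CERN, I am happy to help coordinate it and to guide the visit in Chinese; please contact me if needed. You can click here to learn more about remote virtual visits (page in English only).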

Looking forward to sharing more of my journey here – stay tuned!

Meet the ESRs: Serena Palazzo

Hi everybody!

My name is Serena Palazzo and I am an ESR at The University of Edinburgh
within the Innovative Training Network (INSIGHTS) program. My supervisor
is Michele Faucci Giannelli, who gave me the opportunity to join this network.

I was born in the far south of Italy, in Calabria, and there I got my degrees
in Physics at the University of Calabria in Cosenza. My high school path was
focused on classical studies but, even then, I became enthusiastic
about physics and I decided to start my undergraduate studies in Physics.
During my bachelor's degree I became interested in particle physics and my
first collaboration in this field was within the hadronic calorimeter DREAM
community, where I worked on the calibration of this calorimeter.
Then, for my master's degree, I started working within the ATLAS collaboration.
My first project within the ATLAS collaboration was focused on the Phase-1
upgrade program of the Muon Spectrometer of the experiment, where I
contributed to testing the new MicroMegas chambers. During my master's
thesis project I collaborated with researchers of the LNF laboratory (the Italian
laboratory of particle physics). This collaboration gave me the opportunity to
learn a lot about the upgrade program of the ATLAS detector and to increase
my knowledge of particle physics. I then changed topic, moving from the
upgrade work to the measurement of cross sections. I started working
within the top quark working group of the ATLAS collaboration, where I
contributed to measurements of differential cross sections of top
quark pairs.

During my studies I won a scholarship (INFN-CERN associate simil fellow)
that gave me the opportunity to spend 1 year at CERN. Spending this time at
CERN was very useful for my career development; I had
the chance to meet and collaborate with several researchers coming from
all parts of the world.

As for my work within the INSIGHTS network, while continuing to work on top
quark measurements, I am learning important new techniques such as
Machine Learning, which I am currently using for different projects.
The network is without doubt a great opportunity for researchers in their first
years to enrich their research paths; it allows the ESRs to exchange knowledge
with each other, and it offers the possibility to follow useful training courses
to consolidate and widen that knowledge.

See you soon!

Meet the ESRs: Vasyl Hafych

Hello everybody!

My name is Vasyl, and it has been more than a year since I started this amazing Ph.D. journey as an Early Stage Researcher in the INSIGHTS Innovative Training Network. One could say that my introduction is a year overdue, though during this year my research trajectory has converged, which means that I can introduce myself with more clarity.

I was raised in a small city located on the western side of Ukraine called Ivano-Frankivsk. For those who are not familiar with eastern Europe, this city is located close to the geographical center of Europe. This part of Ukraine is famous for the special role of national traditions and culture in people’s lives. Being very traditional, yet modern, this region attracts many tourists that want to visit the Carpathian mountains, go hiking, skiing or to eat the best Ukrainian food.

Ratusha is the town hall (rathaus) in downtown Ivano-Frankivsk, on the city’s Market Square. [retrieved from:]

I left Ivano-Frankivsk and moved to Kyiv, the capital of Ukraine, when I turned 17, to pursue a bachelor’s degree in the department of Radiophysics, Electronics, and Computer Systems. It was my dad introducing me to electrical engineering when I was a child that first sparked my interest in physics. What I have always found so appealing about physics is that it offers a fundamental way of understanding how our world works. The creativity inherent in crafting and applying simple concepts to explain complicated processes is what inspired me to study physics and to pursue a bachelor’s degree in the subject. It turned out much later that Artem and Vitaliy, who are my INSIGHTS colleagues, started studying there at the same time as me!

In addition to my interest in physics, I have always been fascinated by programming, especially in the context of physics. Many examples of this have captivated me, such as seeing how the Boltzmann or Vlasov equations can be solved numerically for the kinetic simulation of plasma, and seeing the wonders of protein folding obtained from computationally expensive Markov Chain Monte Carlo simulations. This interest never left me, only growing more profound and passionate with every new subject I took at university.

The Old Town Hall in Munich. The building was first mentioned in city records in 1310, and it was the seat of the Munich city council until 1847.

Inspired by my great love for physics and programming, my next step was completing a master’s degree in a program called Atomic Scale Modelling of Physical, Chemical, and Biomolecular Systems, organized by the European Commission. My classmates and I lived and studied in the Netherlands, Italy, and France. This was a remarkable experience that broadened all of our cultural and research horizons tremendously. I completed my final project at the European Center for Atomic and Molecular Calculations in Lausanne, where I worked on quantum free energy reconstruction using Langevin-guided Monte Carlo. This stimulated my current interest in Markov chain Monte Carlo methods, which, together with my knowledge of all the benefits of EU-funded scholarships, led me to apply for my Ph.D. at the INSIGHTS Innovative Training Network.

One might wonder what my Ph.D. project is about. Broadly speaking, it is a mixture of physics, statistics, and programming. I have the privilege of working on these topics at the Max Planck Institute for Physics in Munich, with supervision from Prof. Allen Caldwell and Dr. Oliver Schulz. Working in these conditions has allowed me to develop my research skills tremendously. From spending an overnight shift in the AWAKE control room at CERN collecting experimental data, to using hundreds of CPUs for massively parallel computing, the past year has brought a lot of new experiences to my life. I could keep writing much more about them, but instead, I encourage you to have a look at a paper on the parallelization of the Markov chain Monte Carlo technique that we are planning to publish in the near future. There, you will be able to find a more detailed explanation of all the interesting things that we do.

Stay tuned!

Swans seem to be pretty common near Lake Geneva. This picture was taken during my secondment at CERN.