European Researchers’ Night @ Rome

On Friday, September 27th, several universities and research centers across Europe hosted outreach activities for the European Researchers’ Night. In Rome, near the buildings of the Math and Geology departments of the University of Roma Tre, one of the events that welcomed visitors was organized by Pangea Formazione: “It’s raining cats and dogs”.

Visitors to the stand, which was mainly aimed at kids, had the chance to get in touch with some of the core ideas behind advanced machine learning solutions, through a pair of board games.

Stand & posters by Pangea Formazione.

One of the activities dealt with the basics of convolutional neural networks and image classification via deep learning. Kids were divided into teams and each team was assigned one (sketchy) drawing, with the goal of helping the other team guess their image through a series of successive elements. At each round, the host presented a new ‘feature’ (a particular curved line, a corner, or some other shape) that members of each team had to search for inside their image. If the feature was present, they had to draw it on a thin sheet of paper. As more features were added, a more and more complete picture was composed and it became easier for the opposing team to guess the subject of the drawing, although a later guess scored progressively fewer points.

This procedure closely mimics the inner workings of a trained CNN classifier. The network first learns a series of abstract patterns, through the filters that get trained in its sequence of layers; in our game, these were represented by the lines and shapes proposed by the host. It then searches for those patterns in any new picture that is fed to it for classification.
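
As a rough illustration of the "search for a feature inside the image" step, here is a minimal Python sketch. It is not how a real CNN is implemented or trained: the 5x5 sketch, the hand-written corner pattern and the helper name correlate are all made up for this example, and the pattern is fixed by hand rather than learned from data.

```python
# Toy "feature search": slide a small pattern over a binary sketch and
# record how well it matches at each position (a plain cross-correlation).
# Everything here is illustrative; a real CNN learns its filters.

def correlate(image, kernel):
    """Return the grid of kernel responses at every valid position."""
    kh, kw = len(kernel), len(kernel[0])
    responses = []
    for r in range(len(image) - kh + 1):
        row = []
        for c in range(len(image[0]) - kw + 1):
            row.append(sum(image[r + i][c + j] * kernel[i][j]
                           for i in range(kh) for j in range(kw)))
        responses.append(row)
    return responses

# Hypothetical 5x5 sketch: 1 = ink, 0 = blank paper.
sketch = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
]

# Hypothetical "top-left corner" feature; the -1 penalizes ink where
# the pattern expects blank paper.
corner = [
    [1, 1],
    [1, -1],
]

responses = correlate(sketch, corner)

# Position (row, column) with the strongest response.
positions = [(r, c) for r in range(len(responses))
             for c in range(len(responses[0]))]
best_pos = max(positions, key=lambda rc: responses[rc[0]][rc[1]])
best = responses[best_pos[0]][best_pos[1]]

# The feature counts as "present" only on a perfect match:
# 3 is the number of ink cells in the corner pattern.
feature_present = best == 3
```

In this toy case the corner is found at row 1, column 1 of the sketch, the only position where the response reaches its maximum possible value of 3. This is the same check the kids were doing by eye before drawing the feature on their sheet.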

Explaining the rules for the deep learning activity.

The second activity was a card game about updating probability estimates based on different levels of information. Over successive rounds, one or two players try to guess which weather condition is currently present (rainy, cloudy, sunny, windy, etc.) without being able to obtain this information directly, e.g. because they cannot just look out of the window or use a weather forecast app.

Hence, they can either guess blindly (with a rather low probability of guessing correctly) or play additional information cards to gain further evidence in support of their guess. Examples of information cards are: the current season, which increases the chance of a correct guess for some weather conditions and decreases it for others; the city in which the player’s character currently is (e.g. Palermo, Rome, Milan, etc.); or the fact that the player’s character has spent part of her life in a chosen city, which increases the chance of a correct guess if the same city card has been played as well.

Updating probability based on new information.

Players therefore take turns either trying to guess the answer, adding an information card to their own advantage (if the card turns the odds in their favor), or putting obstacles in the way of the opponent’s guess (if the card gives negative points for the weather condition that the other player is trying to guess).

This mechanism of accounting for all available information before evaluating the probability of an event should remind you of the description of subjective probability given in a previous blog post. Along the same lines, the game helped convey the idea that in real situations we must be flexible enough to update our beliefs in the presence of new evidence.
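
To make the idea of "updating our belief" a bit more concrete, here is a minimal Python sketch of how each card could be treated as a piece of evidence in a Bayes-style update. It is only an illustration: the actual game works with points on the cards, and the starting belief, the card values and the function name update below are all invented for this example.

```python
def update(belief, likelihood):
    """One Bayes update: multiply each prior by the likelihood of the
    evidence under that weather condition, then renormalize to sum to 1."""
    unnormalized = {w: belief[w] * likelihood[w] for w in belief}
    total = sum(unnormalized.values())
    return {w: p / total for w, p in unnormalized.items()}

# Blind guess: all four conditions equally likely (hypothetical start).
belief = {"rainy": 0.25, "cloudy": 0.25, "sunny": 0.25, "windy": 0.25}

# Hypothetical card effects: how compatible each weather condition is
# with the information written on the card.
summer_card = {"rainy": 0.1, "cloudy": 0.2, "sunny": 0.6, "windy": 0.1}
palermo_card = {"rainy": 0.1, "cloudy": 0.2, "sunny": 0.5, "windy": 0.2}

# Play the cards one after the other, updating the belief each time.
for card in (summer_card, palermo_card):
    belief = update(belief, card)
```

With these made-up numbers, the chance of guessing "sunny" correctly rises from 25% for the blind guess to about 81% after both cards have been played. A card played by the opponent would work in the same way, only with values that lower the condition we are betting on.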

Several members of the Pangea Formazione team took part in the event, helping visitors grasp the rules of the games and illustrating the underlying principles that bring the games close to the actual machine learning algorithms we often see in action in everyday life. When we see a mobile phone that can recognize our face as a security mechanism, or a translation app that can identify and translate the text the camera focuses on, we seldom have the knowledge needed to understand how such complex tasks are accomplished. A casual observer might believe some magic is involved, but in fact it is just the (complex) combination of simpler elements, which luckily can be understood without specialized studies.

Kids (and their parents and grandparents as well, in fact) were very curious and wanted to have a glimpse of the actual ideas that lie behind common applications of machine learning.

Searching for features in a sketchy image.

At the same time, the playful side of the games was much appreciated by the kids who stopped by our stand, aged from 6 to 12, and they wanted to stay with us as long as possible.

For adults, a series of posters summarized some of the technical aspects involved both in CNN algorithms for image classification and in formulating a flexible definition of probability, like the subjective one, that can go beyond the simple examples with coins and dice we learn at school.

The only downside of an otherwise great evening was that ‘our’ ESR Daria could not attend the event, because she is currently at the University of Edinburgh for her secondment period. But we will welcome her to the outreach group next year for sure!

Subjective probability and data-driven decision making

We finished our last post by observing that, as human beings, we are not that good at evaluating uncertainties, and that this can heavily affect the outcome of our decisions, both at work and in our private lives. It does not help that the most appropriate mathematical concepts for quantifying uncertainties are too often presented through arcane formulas that can hardly be understood outside trivial textbook examples (dice throws, coin flips, card draws, etc.), and that seem unsuitable for describing situations as complex as real business phenomena.

The key idea to overcome these problems in a business context, based on PangeaF’s experience, is to introduce the concept of subjective probability: that is, to quantify the probability of an event through the degree of belief that it will occur, based on the available information.

Thomas Bayes
(image from wikipedia.org)

This latter concept is a crucial step towards bringing probability into business applications, since it allows us to define probabilities for events which have never been observed before (e.g. the launch of a new product, the expansion into a new market, etc.) and to include different degrees of information in the evaluations. Such an approach also gives, through Bayes’ rule, an easy way to update each evaluation in the presence of new sources of information.
To fix ideas, you could ask two different people to evaluate how probable it is that the value of a company’s shares will double: typically, they would answer with a very small probability, because doubling the value is a very large increase. However, if one of the two has some insider contact who reveals that the company is going to release a revolutionary new product, then this person would assign a higher probability to the hypothetical doubling (typically still small, but not as small as before). Neither of the interviewees would be wrong in their evaluation: it is just that different levels of knowledge about the event of interest lead to different quantifications.
Moreover, subjective does not mean arbitrary: while subjects with different states of information can evaluate the probability of the same event differently, they must provide rational and factual assessments, by relying on probability rules to evaluate multiple related events playing a role in the same problem.
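
For readers who like to see the numbers, the share-doubling example can be written out with Bayes' rule. The figures below are purely illustrative assumptions, not data: D stands for "the share value doubles", I for "a revolutionary product is about to be released", and we suppose P(D) = 0.01, P(I | D) = 0.5 and P(I | not D) = 0.1.

```latex
P(D \mid I)
  = \frac{P(I \mid D)\, P(D)}
         {P(I \mid D)\, P(D) + P(I \mid \neg D)\, P(\neg D)}
  = \frac{0.5 \times 0.01}{0.5 \times 0.01 + 0.1 \times 0.99}
  = \frac{0.005}{0.104}
  \approx 0.048
```

Under these assumptions, the person without the insider information stays at 1%, while the one who knows about the product moves to roughly 5%: still a small probability, but clearly larger than before, and obtained by a rule rather than by gut feeling.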

By using subjective probabilities and Bayesian networks to deal with complex connections among the measured quantities, it is possible: 

  • to perform proper inference processes, unravelling the cause-effect relationship hidden in data in order to find the most probable reasons behind the observed events, even in the presence of complex scenarios and multiple competing causes;
  • to integrate the experts’ knowledge about a given problem, through appropriate relationships among the elements of a descriptive model and suitable probability distributions associated with different situations;
  • to obtain true probabilities from the computations, and not some hard-to-interpret estimate, informing us of how much we have to weigh the occurrence of each event, given the information we received.

These aspects are crucial in all decision making processes and they allow the agents to make their best assessment, through exploitation of all available information (i.e. data). And they come with great flexibility, since they can be applied to a variety of statistical distributions and of business sectors.

It is important to stress that moving towards data-driven decisions does not mean automating such decisions or removing the human factor from them. Algorithms should mostly be used for what they are good at: integrating the available information consistently, without biases interfering with the quantitative evaluation.
Then, the results provided by such algorithms have to be combined, by human decision makers, with the external factors that can hardly be modeled into algorithms (no matter what some vendors claim): What is the risk level that a company can accept at the specific moment a decision has to be taken? What is the impact on stakeholders, in terms of long-term scenarios and company reputation? What are the ethical implications of one decision versus another?

Data-driven decisions, at least as PangeaF sees them, should be the moment to bring together the best that domain experts, data scientists and human decision makers can offer: experts can help spot the key meaningful relations among the measured quantities in a business process; data scientists can turn such relations, and what historical data say, into a coherent and effective model, balancing advanced solutions against the performance actually achieved; human decision makers can take the results of the models and use them to make more effective choices, optimizing resources or focusing efforts on the important parts of the process.

In the next posts, we will present some of the exciting experiences PangeaF developed by building bridges between real world problems and advanced machine learning techniques.

Stay tuned!

Data-driven decision making

In my first post about Pangea Formazione (PangeaF in the following), I mentioned a few times that our company’s mission is to help other companies make good use of the data they own, in order to move towards data-driven decision processes.

Is this really something useful and/or needed? In fact, it is. 

Since the late 70s, plenty of studies have revealed the huge impact that biases and heuristics can have on our quantitative decisions, not because of lack of expertise or mere ignorance, but because of the way the human brain has evolved. A typical example is the so-called “framing effect”, studied by Kahneman and Tversky in the early 80s [1].

Daniel Kahneman (picture from: wikipedia.org)

Two separate groups of participants are each presented with a different description of the same scenario: the outbreak of an Asian epidemic that would affect six thousand people. Participants are asked to choose between two possible courses of action, based on their rational preferences. The first group was presented with the following choices:

  • with plan A, 2000 persons will be saved;
  • with plan B, we have a 1/3 probability of saving 6000 persons (everybody), and a 2/3 probability that nobody is saved.

The second group was presented with the following choices:

  • with plan C, 4000 persons will die;
  • with plan D, we have a 1/3 probability that nobody dies, and a 2/3 probability that 6000 persons (everybody) die.

PLAN A: 2000 saved.
PLAN B: a 33% chance of saving all 6000 people, 66% possibility of saving no one.
PLAN C: 4000 dead.
PLAN D: a 33% chance that no people will die, 66% possibility that all 6000 will die.

What has been observed both in the original experiment and in many replications is that in the first case around 70% of the participants prefer plan A, while in the second case almost 80% of the participants prefer plan D. But plan A is the same as plan C, and plan B is the same as plan D! The only change is in the frame used to present the decision making problem, which affects the choice much more than any rational decision making theory would allow. [*]
The problem is that the description of the experiment in the two settings triggers different areas of our brain: when the choice is presented in terms of gains (first group), mechanisms of risk aversion take precedence, while when the choice is presented in terms of losses (second group), we are much more inclined to choose a risky option because of loss aversion.
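
The equivalence of the four plans can be checked with a few lines of Python. This is just a sanity check on the numbers given above: each plan is rewritten as a list of (probability, people saved) outcomes, with "people who die" converted into "people saved" out of the 6000 affected. The variable names are our own choice for this example.

```python
from fractions import Fraction

TOTAL = 6000  # people affected by the epidemic in the scenario

# Each plan as a list of (probability, number of people saved).
# Plans C and D are stated in terms of deaths, so we convert them.
plans = {
    "A": [(Fraction(1), 2000)],
    "B": [(Fraction(1, 3), 6000), (Fraction(2, 3), 0)],
    "C": [(Fraction(1), TOTAL - 4000)],
    "D": [(Fraction(1, 3), TOTAL - 0), (Fraction(2, 3), TOTAL - 6000)],
}

# Expected number of people saved under each plan.
expected_saved = {
    name: sum(p * saved for p, saved in outcomes)
    for name, outcomes in plans.items()
}

# Once written in the same frame, A is identical to C and B to D.
same_sure_plan = plans["A"] == plans["C"]
same_risky_plan = plans["B"] == plans["D"]
```

Every plan has an expected value of 2000 people saved, and once the wording is stripped away A and C (and likewise B and D) are literally the same list of outcomes. Exact fractions are used instead of floating point numbers so that 1/3 and 2/3 do not introduce rounding noise.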

Other examples can be found in Kahneman’s book “Thinking, fast and slow” [2], which the famous psychologist and 2002 Nobel laureate in Economic Sciences wrote to present the results of decades of experiments on the psychology of judgment and decision-making, as well as on behavioral economics.

And this is not just an example taken from some psychological study to “push our agenda”, with no true impact on the business world: it is something that is continuously seen in action. A research program that monitored public, private and non-profit organizations throughout the USA, Europe and Canada for over 20 years [3] has shown that typically 50% of business decisions end in failure, 33% of all decisions made are never implemented, and half of the decisions which do get implemented are discontinued after 2 years. One of the causes of this (depressing) trend is that in two cases out of three, choices are based either on failure-prone methods or on fads that are popular but not backed by actual evidence.
In several cases it has also been shown that failure-prone methods are still followed because of the difficulty of dealing correctly with the uncertainties that are intrinsic to decision making processes in strategic and business contexts.

Several types of uncertainty can affect a decision making process: factors that there is no time or money to monitor effectively, factors that are outside our control, like competitors’ moves or other stakeholders’ decisions, and factors that are truly random and unexpected and that can lead the same decision towards very different results. Uncertainty assessment is a critical element in such scenarios and we always find it surprising to see how often it is underestimated: typically, it is only considered when assessing the global risk level of a production process, or “a posteriori” when a decision has undesired outcomes.

These difficulties in evaluating uncertainties quantitatively are fully in line with the psychological research we mentioned above, but there seems to be an additional inertia towards the adoption of software-based tools that could provide more coherent and consistent probability evaluations in different scenarios.

What can be done to address such problems? How can we improve our skills in dealing with uncertainties? We will provide a possible answer in the next post, which will complete the overview of the main points of the approach followed by PangeaF when implementing software solutions to support decision making processes.

Stay tuned!

[*] As a side note, you might notice that the expected value of each plan is always the same. Therefore, assuming that human choices follow a model based on perfect information, and defining rationality along the lines of von Neumann & Morgenstern’s game theory, we should conclude that any “rational” decision maker would be indifferent among the four possible plans.

Bibliography

[1] A. Tversky & D. Kahneman. The framing of decisions and the psychology of choice. Science 211 (4481), 453–458 (1981). doi:10.1126/science.7455683

[2] D. Kahneman. Thinking, Fast and Slow. Farrar, Straus and Giroux, New York, 2011. ISBN: 0374533555

[3] P. C. Nutt. Why Decisions Fail. Berrett-Koehler Publishers, Oakland, California, 2002. ISBN: 1576751503

Pangea Formazione INSIGHTS @ Rome

Hi all, 
here is a first short overview of the only private beneficiary of the INSIGHTS ITN: Pangea Formazione, an SME based in Rome (PangeaF in the following).

Pangea Formazione S.r.l. (Rome, Italy)
founded in 2009
innovative SME    
research institute recognized by  
certified UNI EN ISO 9001:2015 (development of software tools for predictive models)
team of around 15 people, mostly with a background and a Ph.D. in quantitative sciences (Physics, Mathematics, Engineering, etc.)

PangeaF was created in 2009, on the initiative of Paolo Agnoli and Francesco Piccolo, its founding members, with the purpose of promoting and spreading the importance of exploiting available data as a support to business decision making.

At the beginning, training courses were PangeaF’s main trademark, encouraging the use of appropriate statistical tools to deal with the uncertainties that are naturally embedded in business decisions. To this end, PangeaF created a network of connections and collaborations with several researchers at universities in Rome, Milan, Venice, Naples and Pavia. During this period, managers from private companies and public institutions who attended such courses asked PangeaF to apply the techniques presented to their own real business problems. Those studies soon evolved into algorithm modeling and software development activities, which nowadays constitute the core business of the company.

As of 2019, PangeaF has several active projects of software development and management consulting with different companies such as TIM, Poste Italiane, DHL, MBDA, ENEL, etc., while retaining important training activities on both data-driven decision making (higher level courses for top and middle managers) and machine learning techniques through open source languages like R, Python and Scala (technical courses for data scientists and IT personnel). Helping other companies to use their own data, as well as open data sources, to improve their decision making processes is still our main mission and both software development and training activity are the means to reach it effectively.

A third way to accomplish our mission is to devote effort to developing brand new algorithmic and software solutions for problems which we believe will soon be relevant for business processes. This led our company to spend time and resources on advanced R&D activities, including the development of optimal control strategies for swarms of drones with a common objective, and of deep learning techniques to analyze video data sources in order to detect and classify objects of interest. While carrying out this research, we created new connections with researchers from the academic world, where similar problems are studied, and we realized there were plenty of opportunities for training young researchers with an interest in applied data analysis techniques on such topics. This was the link that brought PangeaF into the orbit of INSIGHTS, while it was still a preliminary proposal for an EC training network, and made our company the natural choice to lead the work package on “Statistics for Society”.

Of course, we are very happy about the opportunity to join the ITN: thanks to INSIGHTS, we now have a new member of our team, Daria Morozova. Our ESR moved from Moscow to Rome in September 2018 to work on her INSIGHTS project: developing a common framework to integrate video and sound data analysis techniques towards different goals. We foresee that such a framework will be useful for applications like smart mobility and coordinated drone control, but there are plenty of other fields that could benefit from a similar tool. Once all the building blocks are in place, and it becomes easier to stack pre-trained deep learning networks together with customized ones and with Bayesian networks, we will be able to start playing with it in different contexts and see how everything plays out.

Daria Morozova
  • ESR
  • Specialist (5 yrs) in Applied Mathematics and Cybernetics at MSU (Moscow, Russia)
  • Master in Economics at HSE University (Moscow, Russia)
  • At PangeaF since 09/2018.
Fabio S. Priuli
  • Supervisor and PI
  • Ph.D. in mathematics at SISSA-ISAS (Trieste, Italy)
  • 8 years as post-doc in applied math
  • At PangeaF since 04/2015 as data scientist, project manager and head of training activities.
Sara Borroni
  • Co-Supervisor
  • Ph.D. in physics at University of Rome “Sapienza” (Rome, Italy)
  • 4 years as post-doc
  • 4 years as data scientist and project manager
  • At PangeaF since 03/2015.

For the moment, that’s all.
More posts will follow soon, presenting some of the exciting machine learning applications we have developed for industrial process management, and some of the latest statistical techniques that we have found very useful in that context.