Subjective probability and data-driven decision making

We finished our last post by observing that, as human beings, we are not that good at evaluating uncertainties and this can heavily affect the outcome of our decisions, both in our work and in our private life. It does not help the fact that the most appropriate mathematical concepts to quantify uncertainties are too often presented through arcane formulas that can hardly be understood outside trivial didactical examples (dice throws, coin flips, card draws, etc.), and they seem unsuitable to describe situations as complex as the real business phenomena.

The key idea to overcome these problems in business context, based on PangeaF experience, is to introduce the concept of subjective probability. That is, to quantify the probability of an event through the degree of belief that it would occur, based on the available information.

This image has an empty alt attribute; its file name is thomas_bayes.gif — Thomas Bayes
(image from wikipedia.org)

This latter concept is definitely a crucial point towards bringing probability in business applications, since it allows to define probabilities for events which have never been observed before (e.g. the launch of a new product, the expansion towards a new market, etc.) and to include different degrees of information into the evaluations. Such approach also gives, through Bayes’ rule, an easy way to update each evaluation in presence of new sources of info.
To fix the idea you could ask two different persons to evaluate how probable is a doubling of the values of the shares of a company: typically, they would answer with a very small probability, because doubling the value is a macroscopic increase. However, if one of the two persons has some insider contact who reveals that the company is going to release a new revolutionary product, then this person would assign a higher probability to the hypothetical doubling (typically still small, but not as small as before). Neither of the interviewed would be wrong in their evaluations: it is just that, with different levels of knowledge about the event of interest, different quantifications follow.
Moreover, subjective does not mean arbitrary: while subjects with different states of information can evaluate the probability of the same event differently, they must provide rational and factual assessments, by relying on probability rules to evaluate multiple related events playing a role in the same problem.

By using subjective probabilities and Bayesian networks to deal with complex connections among the measured quantities, it is possible:

to perform proper inference processes, unravelling the cause-effect relationship hidden in data in order to find the most probable reasons behind the observed events, even in the presence of complex scenarios and multiple competing causes;
to integrate the experts’ knowledge about a given problem, through appropriate relationship among elements in a descriptive model and suitable probability distributions associated to different situations;
to obtain true probabilities from the computations, and not some hard-to-interpret estimate, informing us of how much we have to weigh the occurrence of each event, given the information we received.

These aspects are crucial in all decision making processes and they allow the agents to make their best assessment, through exploitation of all available information (i.e. data). And they come with great flexibility, since they can be applied to a variety of statistical distributions and of business sectors.

It is important to stress that moving towards data-driven decisions does not mean to make such decisions automated or to remove from them the human factor. Algorithms shall mostly be exploited in what they are good for: to integrate consistently the available information, without biases interfering with the quantitative evaluation.
Then, the results provided by such algorithms have to be combined, by human decision makers, with the external factors that can hardly be modeled into algorithms (no matter what some vendors claim): what is the risk level that a company can accept in the specific moment a decision has to be taken? what is the impact on stakeholders, in terms of long-term scenarios and company reputation? what are the ethical implication of one decision versus another one?

Data-driven decisions, at least as PangeaF sees them, shall be the moment to bring together the best that domain experts, data scientists and human decision makers can offer: experts can help spotting the key meaningful relations among measured quantities in a business process; data scientists can turn such relations and what historical data say into a coherent and effective model, trading off advanced solutions with actual performance achievements; human decision makers can take the results of the models and use them to take more effective choices, optimizing resources or focusing efforts on the important parts of the process.

In the next posts, we will present some of the exciting experiences PangeaF developed by building bridges between real world problems and advanced machine learning techniques.

Stay tuned!