The use of AI is becoming more widespread in our society. There are many discussions about the ethical use of this technology, especially around facial recognition and deepfakes.


Beyond these aspects, there are pitfalls in the use of AI that can backfire on an attempt to use it benevolently. One of the main promises of AI for automated decision making is its ability to be objective, to be free from the biases inherent to human decision makers. Technical evolutions, as well as the popularization of AI solutions, have made it possible to identify several mechanisms by which biases can appear in these systems.


Understanding bias: a vital issue

Understanding these biases is essential for data scientists who build these systems. However, business experts also need to be aware of these issues.

In fact, feedback from the deployment of these solutions highlights the essential role of the business in the success of the project. Business experts are its guarantors: they must not only verify that the results of the algorithm are consistent, but also ensure that the reasons for which a result was given are consistent with their experience.

In the case of a credit-granting algorithm, it would be an aberration if a customer got a good score thanks to numerous unpaid invoices. Through its expertise, the business can anticipate and identify the biases that will appear, and contribute to making the systems more reliable, or even fairer.
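
As a minimal illustration of this kind of sanity check, a data scientist and a business expert could inspect the direction of the model's coefficients to confirm that the "reasons" behind a score match domain knowledge. This is only a sketch on synthetic data, with hypothetical feature names:

```python
# Minimal sketch on synthetic data (hypothetical feature names): a sanity
# check on the direction of the model's coefficients, the kind of "reason"
# a business expert can validate against domain knowledge.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5_000
income = rng.normal(3.0, 0.8, n)           # monthly income, in thousands
unpaid_invoices = rng.poisson(0.5, n)      # number of past unpaid invoices

# Synthetic ground truth: good credit is more likely with higher income,
# less likely with unpaid invoices.
logit = 1.0 * (income - 3.0) - 1.2 * unpaid_invoices
good_credit = rng.random(n) < 1 / (1 + np.exp(-logit))

X = np.column_stack([income, unpaid_invoices])
model = LogisticRegression().fit(X, good_credit)

for name, coef in zip(["income", "unpaid_invoices"], model.coef_[0]):
    print(f"{name:16s} coefficient = {coef:+.4f}")

# A positive coefficient on unpaid_invoices would be exactly the aberration
# described above, and a reason to block deployment.
assert model.coef_[0][1] < 0, "unpaid invoices should lower the credit score"
```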


Just because you don't use a piece of data doesn't mean it is not taken into account

When we talk about making systems fairer, we generally think of not integrating certain variables directly into the model. However, the notion that we do not wish to integrate into our model can appear in more or less subtle forms and contribute to biasing a decision.

Conceptually, this phenomenon is not surprising. An AI solution by definition seeks to predict a target from other data. It should therefore be expected that some data carries more information than it seems.

Let's put ourselves in the situation of a car insurer that seeks to offer a personalized rate to its customers. Until 2013, the insurer could charge women less than men because of their lower claims experience. Since the European law on non-discrimination between the sexes, the insurer must remove gender from its model.

What would happen if the insurer included the person's first name in its model? The "Sabrina"s would get a lower price than the "Jean"s, all other things being equal. The only winners would be the men called Camille.

Beyond this caricatural example, integrating the presence of part-time work in the person's activity may tend to target women, since they make up the vast majority of part-time workers.

In general, if there is a very large disparity between two populations on a given variable, incorporating that variable amounts to incorporating the population category directly.
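
A minimal sketch of this proxy effect, on purely synthetic data with invented numbers: a pricing model trained without the gender variable, but with a strongly correlated proxy such as part-time work, ends up reproducing much of the same gap between groups.

```python
# Sketch on synthetic data (all numbers invented): the protected attribute
# is never given to the model, but a correlated proxy carries it anyway.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
n = 10_000
is_woman = rng.random(n) < 0.5
# Proxy variable: part-time work, far more frequent among women here.
part_time = rng.random(n) < np.where(is_woman, 0.80, 0.08)
# Historical premiums: lower for women due to lower claims experience.
premium = 600 - 80 * is_woman + rng.normal(0, 30, n)

# The model is trained WITHOUT gender, only on the proxy.
model = LinearRegression().fit(part_time.reshape(-1, 1), premium)
pred = model.predict(part_time.reshape(-1, 1))

print(f"average predicted premium, women: {pred[is_woman].mean():.0f}")
print(f"average predicted premium, men:   {pred[~is_woman].mean():.0f}")
# A large part of the gap between the two groups persists even though
# 'gender' is never a feature of the model.
```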


The data does not always reflect what we think

AI works on the principle of "Bullshit In, Bullshit Out" or, in less outrageous language: one cannot expect a relevant system if the data it is fed is of poor quality.

Thus, one can hardly expect an unbiased system if it is built on biased data. The question we can ask ourselves in this case is: are we able to identify a bias in the input data? History has shown us that we are not always able to.


The case of Amazon and its recruitment AI

Let's take a simplified case, based on the failure of an automated recruitment system at Amazon. The company uses the resumes received from applicants and tries to predict those with the highest probability of passing the interviews. The company relies on the history of applications received, as well as the choices made by its sole recruiter.

The company has always been happy with this recruiter's choices, so they thought it was a good idea to train their model to follow his lead. What the company didn't consider is that this recruiter is a fan of the local soccer team, and deeply opposed to the neighboring team. So he systematically rejects all applications from the neighboring town, but is perfectly capable of identifying quality profiles among the other applications.

During the creation of the model, it was proposed to integrate the address of the candidates in order to estimate the travel time to the company. Thanks to this data, the model could learn to reproduce the recruiter's behavior and discriminate against candidates from the neighboring city. Moreover, we can assume that the model also learned to select good candidates among the others.

We thus obtain a model that performs well from a business point of view, but which remains heavily biased against one population: the model has reproduced the bias present in its training data.
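
A toy simulation of this scenario (entirely synthetic data, hypothetical feature names) shows how a model trained on the recruiter's biased decisions can reproduce them through the address-derived feature:

```python
# Toy sketch of the scenario above: the "recruiter" labels are biased
# against one city, and a model trained on those labels reproduces the
# bias through the address feature.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(7)
n = 5_000
skill = rng.normal(0, 1, n)                  # latent candidate quality
neighboring_city = rng.random(n) < 0.3       # lives in the rival city
# Biased historical labels: good candidates are accepted,
# unless they come from the neighboring city.
accepted = (skill > 0.5) & ~neighboring_city

X = np.column_stack([skill, neighboring_city])
model = DecisionTreeClassifier(max_depth=3).fit(X, accepted)

# A strong candidate is rejected purely because of the address feature.
strong_local = [[2.0, 0]]
strong_neighbor = [[2.0, 1]]
print("strong local candidate    ->", model.predict(strong_local)[0])
print("strong neighbor candidate ->", model.predict(strong_neighbor)[0])
```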


Facial recognition at the center of bias

Another famous example is the bias in facial recognition between men and women, and between people of color and white people. Most image banks contain a larger representation of white men. As a result, an algorithm not only has more data on white men, allowing it to learn better on them, but it also has an incentive to perform better on this population.

Let's look at this second point with a naive example: a database containing 90 images of white people and 10 images of people of color. Let's assume at first that, regardless of the person's skin color, the algorithm is wrong 50% of the time. If we improve the algorithm to perfectly identify people of color, the overall accuracy will only reach 55%.

Conversely, if we train it to perfectly identify white people, the overall accuracy will be 95%. Thus, if the algorithm has to make compromises to improve global performance, these compromises will inevitably favor the most represented category. Representation biases can then induce performance biases in the solution.
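
The arithmetic of this naive example is simply a weighted average of per-group accuracies:

```python
# The arithmetic behind the naive 90/10 example above.
def overall_accuracy(acc_white, acc_poc, n_white=90, n_poc=10):
    """Overall accuracy as a weighted average of per-group accuracies."""
    return (acc_white * n_white + acc_poc * n_poc) / (n_white + n_poc)

print(overall_accuracy(acc_white=0.5, acc_poc=0.5))  # 0.50 baseline
print(overall_accuracy(acc_white=0.5, acc_poc=1.0))  # 0.55 perfect on the minority group
print(overall_accuracy(acc_white=1.0, acc_poc=0.5))  # 0.95 perfect on the majority group
```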

 

Reality may not always be good to model

Beyond the selection bias of the data used for modeling, it is possible to obtain biases inherent to the phenomenon being studied. Google experienced this with its invention of the Word2Vec technique for natural language analysis.

This technique, published in 2013, assigns each word a position in an abstract space. This representation of the words of a text can then be used for various tasks such as sentiment analysis or machine translation.

A great discovery of this technique was the relation "king - man + woman = queen". This means that if we replace the "man" component with the "woman" component in the representation of the word "king", we get the representation of the word "queen". This fascinating result means that we can, to some extent, apply algebraic relations to the words of our language. The representation has thus identified fundamental aspects of language.

Following this first discovery, a second one quickly came: "doctor - man + woman = nurse". In other words, the representation understood that the female equivalent of a doctor was a nurse. Through the study of texts, the model understood that, in our society, the medical world was made up of male doctors and female nurses. Unlike the previous case, we can rule out the idea that the model is biased because of a biased selection of the corpus.
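
These analogy queries can be reproduced, for example, with gensim and its downloadable pretrained Google News Word2Vec vectors. This is a sketch under that assumption; the exact neighbors returned depend on the corpus and model used.

```python
# Sketch assuming gensim and its downloadable "word2vec-google-news-300"
# vectors (a large download); any Word2Vec model exposing most_similar()
# would do, and the exact neighbors depend on the training corpus.
import gensim.downloader as api

wv = api.load("word2vec-google-news-300")   # pretrained KeyedVectors

# "king - man + woman" lands close to "queen".
print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=3))

# The same query applied to "doctor" surfaces the gendered associations
# discussed above, inherited from the texts the vectors were trained on.
print(wv.most_similar(positive=["doctor", "woman"], negative=["man"], topn=3))
```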

This second result represents a historical and, to some extent, contemporary reality. We can therefore conclude that the model works "correctly" and that it models our language well.

This bias is only partly similar to the previous case: it is not just the selection of data that is biased, but the whole field of study. However, the question to be asked remains the same for the different actors working around the AI solution: should my model reproduce past behaviors as closely as possible, or should I orient it to correct certain past errors?


With use, the bubble tightens

Another type of bias can appear when the action taken following the model's recommendation affects the model's next decision. These biases regularly make the headlines under the term "social bubbles". In practice, successively following the AI's recommendations can lead to undesirable drift.


"Tell me what you are looking at and I will tell you who you are

In the case of recommendation systems, we have all observed that when we start to get interested in a new type of content on platforms such as Netflix or YouTube, the recommendations adapt and very quickly propose only content similar to this new taste.

We then gradually lock ourselves into a uniform bubble of content. Extrapolating to more critical topics, a former YouTube employee has explained how YouTube's objectives cause the algorithm to be biased in favor of conspiratorial content.
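
A toy simulation of this feedback loop (all numbers invented, not any real platform) shows how a system that reinforces whatever the user clicks can collapse the content mix onto a single theme:

```python
# Toy sketch (all numbers invented) of a recommendation feedback loop:
# the platform shows content in proportion to past clicks, each click
# reinforces that category, and the mix narrows onto a single theme.
import numpy as np

rng = np.random.default_rng(0)
categories = ["news", "music", "gaming", "cooking"]
clicks = np.ones(len(categories))            # start with uniform interest

for _ in range(500):
    # Slightly super-linear reinforcement: content already popular with
    # this user is pushed more than proportionally (a modeling assumption).
    scores = clicks ** 2
    shown = rng.choice(len(categories), p=scores / scores.sum())
    clicks[shown] += 1                       # the user clicks what is shown

mix = clicks / clicks.sum()
print(dict(zip(categories, mix.round(2))))   # typically one category dominates
```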


Predictive policing

A lesser-known situation exists in the United States, where some cities use automated systems to distribute police forces across neighborhoods. In this article, the author demonstrates that these systems become biased over time.

The more the police follow the recommendations and observe crimes in a neighborhood, the more the algorithm will tend to overestimate the dangerousness of that neighborhood and reinforce the police presence there. This creates a self-reinforcing loop that biases the algorithm even more.
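
A toy simulation (invented numbers, not any real system) illustrates this loop: two neighborhoods with identical true crime rates end up with a lastingly skewed patrol allocation, simply because crime is only observed where the police already patrol.

```python
# Toy sketch (all numbers invented) of the self-reinforcing loop described
# above: two neighborhoods share the SAME true crime rate, but crimes are
# only observed where officers patrol, and patrols follow observed crime.
import numpy as np

rng = np.random.default_rng(1)
true_crime_rate = np.array([0.10, 0.10])   # identical in both neighborhoods
observed = np.array([2.0, 1.0])            # one extra early report in A
patrols_total = 100

for _ in range(365):
    # The system allocates patrols in proportion to crime observed so far.
    patrols = patrols_total * observed / observed.sum()
    # Crimes are only detected where officers are actually present.
    detected = rng.binomial(patrols.astype(int), true_crime_rate)
    observed += detected

share = observed / observed.sum()
print("Patrol share after one year (A, B):", share.round(2))
# In most runs, the allocation stays skewed toward neighborhood A: the
# early imbalance is never corrected, because the system only "sees"
# crime where it already patrols.
```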

The notion of bias in an AI solution is therefore not static. A solution that starts out unbiased can, over time, become biased depending on its design and the use made of it.


What does this mean?

The image of artificial intelligence as a magic black box continues to stick to it, even as it takes a very important place in our society and many players begin to ask questions about how it works, particularly for ethical reasons. Biases are inherent to this kind of system and can have both ethical and business consequences.

Awareness of these biases is an essential first step for business experts, so that they can support data scientists in the creation of relevant and accurate models.