Good practices for Service Developers

Using AI responsibly

Biases have to be identified by humans

What is data bias?

Data bias is something that often gets brought up in connection with information systems.

The concept of bias has various definitions. The ones related to accountable development include at least:

  • a systematic error in sampling or testing due to one result or answer being selected or intensified over others
  • a prejudice that either favours or opposes a particular matter, person or group compared to another in a way that is generally considered unfair.

The first is a statistical and computational phenomenon: a systematic error introduced, unintentionally or deliberately for testing purposes, during a system's research and development process. It can also be introduced on purpose with malicious intent.

The latter definition, on the other hand, refers to a property of a human or group, reflected in data produced or used by a system, either unintentionally or intentionally.
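The first definition, a systematic sampling error, can be made concrete with a minimal sketch. The groups, values and weights below are purely illustrative, not from the article: a sampling procedure that over-selects one group pulls the sample statistic away from the true population value.

```python
import random

# Illustrative only: a population with two groups, where the sampling
# procedure systematically over-selects group A. The sample mean then
# drifts away from the true population mean.
random.seed(42)

population = [("A", random.gauss(50, 5)) for _ in range(5000)] + \
             [("B", random.gauss(70, 5)) for _ in range(5000)]

true_mean = sum(v for _, v in population) / len(population)

# Biased sampling: group A is ten times more likely to be selected.
weights = [10 if g == "A" else 1 for g, _ in population]
biased_sample = random.choices(population, weights=weights, k=1000)
biased_mean = sum(v for _, v in biased_sample) / len(biased_sample)

print(f"population mean: {true_mean:.1f}")
print(f"biased sample mean: {biased_mean:.1f}")
```

Any statistic computed from the biased sample inherits this error, which is why such a skew in training data propagates into a system's outputs.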

Updated: 9/11/2023

What are the consequences of bias?

Biased training or production data inevitably produces biased outputs. Impacts may range from unusable results to direct harm: discrimination, violations of fundamental rights or other adverse consequences that are difficult to compensate for.

Detecting and correcting biases in data in time has long been a notable research topic for both public and private actors.


Why are biases generated?

Harmful biases are typically caused by one of the following:

  • Historical data reflects gendered labour markets or attitudes related to minority groups. Data always reflects the past. Using data with long-term biases in modern systems continues the lifecycle of those same biases in society.
  • Too little data can in itself reflect the burden of history. There is simply much more data collected from Western populations, especially the white majority, than from other population groups and regions. When data is easily accessible, it gets used more.
  • Collection and selection biases are caused by flawed data collection and errors made in the process. This typically results in one-sided data or material that is poorly suited to the objective set for the system.
  • Design errors in the training stage of an algorithmic model may cause bias in the process and the interpretation of data even if the training data itself is of high quality.
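The "too little data" cause above can be caught with a simple representation audit before training. This is a hypothetical sketch; the function name, threshold and reference shares are assumptions for illustration: each group's share of the data is compared against a reference share, such as its share of the population the system will serve.

```python
from collections import Counter

# Illustrative sketch of a representation audit: flag groups whose
# share of the training data falls below `tolerance` times their
# reference (population) share.
def audit_representation(records, group_key, reference_shares, tolerance=0.5):
    counts = Counter(r[group_key] for r in records)
    total = sum(counts.values())
    flagged = {}
    for group, ref_share in reference_shares.items():
        share = counts.get(group, 0) / total
        if share < tolerance * ref_share:
            flagged[group] = share
    return flagged

# Toy data: group "b" makes up 30% of the population but only 5% of the data.
records = [{"group": "a"}] * 95 + [{"group": "b"}] * 5
flagged = audit_representation(records, "group", {"a": 0.7, "b": 0.3})
print(flagged)  # group "b" is under-represented
```

A check like this does not fix the bias by itself, but it makes under-representation visible early, before it is baked into a trained model.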

How can we prevent the adverse effects of bias?

There has been an effort to develop universal applications and services for the mechanical, automatic detection of bias in data and the prevention of harm – also known as de-biasing – but so far there is no such solution on the market. The reason for this is that each algorithmic system is tied to its individual use case.

In practice, the detection of bias and the management and prevention of its harms must still be carried out by humans, separately for each system.
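One common starting point for this human review work is to compare a system's positive-decision rates across groups, a check known as demographic parity. The sketch below is illustrative; the group names, data and the choice of metric are assumptions, not prescribed by the article:

```python
# Illustrative sketch of a manual bias check on a system's outputs:
# compare positive-decision rates across groups (demographic parity).
def selection_rates(decisions):
    """decisions: list of (group, approved) pairs -> approval rate per group."""
    totals, positives = {}, {}
    for group, approved in decisions:
        totals[group] = totals.get(group, 0) + 1
        positives[group] = positives.get(group, 0) + int(approved)
    return {g: positives[g] / totals[g] for g in totals}

# Toy outputs: group "x" is approved 80% of the time, group "y" only 40%.
decisions = [("x", True)] * 80 + [("x", False)] * 20 + \
            [("y", True)] * 40 + [("y", False)] * 60

rates = selection_rates(decisions)
ratio = min(rates.values()) / max(rates.values())
print(rates, f"parity ratio: {ratio:.2f}")
```

A ratio well below 1.0 signals a disparity, but whether it reflects harmful bias or a legitimate difference is exactly the judgment that still requires a human and knowledge of the individual use case.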

Some practical rules of thumb for managing bias are presented in Andrea Gao’s article Data Bias Identification and Mitigation: Methods and Practice.

