The operation of AI is based on data. For this reason, the properties of data require ever closer ethical consideration. Data is not just static material; working with it is a comprehensive process that includes the following stages (a minimal code sketch follows the list):
setting objectives for the system
identifying datasets relevant to the objective
collecting training datasets: methods and management
analysing the quality of the datasets
cleaning up and curating data for mechanical processing
generating a model and testing it
processing production data
continuous monitoring; updating the model if necessary.
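To make these stages concrete, the sketch below walks through a minimal, hypothetical version of the pipeline in Python with pandas and scikit-learn. The synthetic dataset, column names and quality checks are illustrative assumptions, not a prescription for any particular system.

```python
# Minimal sketch of the data-to-model lifecycle listed above.
# The dataset is synthetic and all checks are illustrative assumptions.
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Objective: predict a binary outcome from tabular features.
# Identify and collect a relevant dataset (a synthetic stand-in here).
X, y = make_classification(n_samples=1_000, n_features=5, random_state=0)
df = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(5)])
df["target"] = y

# Analyse quality and clean up: missing values and duplicates are the most
# basic checks; real datasets usually need far more curation than this.
assert df.isna().sum().sum() == 0, "dataset contains missing values"
df = df.drop_duplicates()

# Generate a model and test it on held-out data.
X_train, X_test, y_train, y_test = train_test_split(
    df.drop(columns="target"), df["target"], test_size=0.2, random_state=0
)
model = LogisticRegression(max_iter=1_000).fit(X_train, y_train)
print("held-out accuracy:", accuracy_score(y_test, model.predict(X_test)))

# In production, incoming data would be scored with model.predict(...),
# monitored for drift, and the model retrained when quality degrades.
```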
The "direction" of AI ethics questions has been towards outputs and not inputs; we should focus more how the data is produced and processed.
– Researcher William Isaac, Google DeepMind
Quality requirements are highlighted when data is shared
In a society based on a data economy where data is shared and used by various authorities and even the private sector, it is not enough for each organisation to have internally consistent data procedures.
Different actors may have different ways of storing and updating their datasets, and such structural and semantic differences make sensible and secure shared use of the data difficult.
Data itself does not contain any solutions or meaning; those qualities are not generated until the data is used. Since each use case is unique, the value and significance of data interact with end users’ actions. It is therefore necessary to have communication and feedback channels between the producers, owners and users of data.
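As a small, hypothetical illustration of such structural and semantic differences, the sketch below shows two organisations storing the "same" information with different field names, date formats and units, and maps both to one agreed schema before shared use. All names and values are invented for the example.

```python
# Hypothetical example: two data producers describe the same thing differently,
# so a shared schema must be agreed before the data can be used together.
import pandas as pd

org_a = pd.DataFrame({
    "person_id": [1, 2],
    "birth_date": ["1980-05-01", "1992-11-23"],   # ISO dates
    "income_eur": [34_000, 41_000],               # yearly income in euros
})
org_b = pd.DataFrame({
    "id": [3, 4],
    "dob": ["01.06.1975", "12.03.1988"],          # day.month.year
    "monthly_income": [2_900, 3_100],             # monthly income in euros
})

# Map the second source onto the agreed schema of the first.
harmonised_b = pd.DataFrame({
    "person_id": org_b["id"],
    "birth_date": pd.to_datetime(org_b["dob"], format="%d.%m.%Y").dt.strftime("%Y-%m-%d"),
    "income_eur": org_b["monthly_income"] * 12,
})
combined = pd.concat([org_a, harmonised_b], ignore_index=True)
print(combined)
```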
AI diversifies opportunities but demands more from oversight
AI technologies have enabled two significant changes in the utilisation of data:
data originating from several sources can be analysed simultaneously and crosswise
loosely structured or even completely unstructured data can be analysed and used.
In fact, the question of the quality of data used in AI systems is no simple matter. Traditional quality factors, such as timeliness and internal integrity, are still relevant, but they now have to be assessed across several datasets. Similarly, the integrity, security and compliance of the data have to be assessed in a more multidimensional manner.
A partial solution to strengthening the management of complex and varied data could be the systematic use, classification and indexing of metadata. Metadata helps to keep unstructured materials “visible” and gathers material from different sources into a cohesive semantic space.
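A minimal sketch of what such metadata work might look like in practice is shown below: every piece of material, structured or not, gets a consistent set of descriptive fields that can be classified and searched. The record fields and example entries are assumptions made for illustration.

```python
# Sketch of a simple metadata catalogue that keeps unstructured material
# "visible": every item carries the same descriptive fields regardless of
# its source or format, so material can be indexed and found consistently.
from dataclasses import dataclass, field

@dataclass
class MetadataRecord:
    item_id: str
    source: str              # producing organisation or system
    media_type: str          # e.g. "pdf", "audio", "free_text"
    collected_on: str        # ISO date
    keywords: list[str] = field(default_factory=list)

catalogue = [
    MetadataRecord("doc-001", "agency_x", "pdf", "2023-04-02", ["permits", "housing"]),
    MetadataRecord("call-017", "service_line", "audio", "2023-05-10", ["complaint"]),
]

def find_by_keyword(records: list[MetadataRecord], keyword: str) -> list[MetadataRecord]:
    """Return all items whose metadata mentions the given keyword."""
    return [r for r in records if keyword in r.keywords]

print(find_by_keyword(catalogue, "permits"))
```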
As our society becomes increasingly led by the data economy, the emphasis on work related to data and its nature will change. Producing the training data required by AI systems is not a straightforward process. Responsible and appropriate training data requires
finding, evaluating and cleaning up suitable datasets
identifying and managing potentially harmful biases (a minimal check is sketched after this list)
classifying and annotating the data.
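One of the bias checks mentioned above can be sketched very simply: compare how often the positive label occurs in different groups of a hypothetical protected attribute. The column names and the 0.2 threshold are assumptions for illustration; a gap does not prove unfairness by itself, but it flags the data for closer review.

```python
# Illustrative check for one kind of harmful bias in training data:
# does the positive label rate differ sharply between groups?
import pandas as pd

train = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b", "b", "b", "b"],
    "label": [1,   1,   0,   0,   0,   1,   0,   0],
})

rates = train.groupby("group")["label"].mean()
print(rates)

# The threshold is an assumption; crossing it only triggers a human review
# of how the data was sampled and annotated, not an automatic conclusion.
if rates.max() - rates.min() > 0.2:
    print("warning: label rates differ across groups; review sampling and annotation")
```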
Once the data has been produced, the AI model has to be fine-tuned and validated to ensure the accuracy, quality and completeness of the data. This may involve a large amount of mathematical processing.
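As one example of what such validation can look like, the sketch below uses cross-validation to get a less optimistic estimate of model accuracy than a single train/test split would give. The dataset and model are illustrative stand-ins, not the specific processing the text refers to.

```python
# Sketch of a validation step: k-fold cross-validation averages accuracy
# over several splits, giving a more robust view than one held-out set.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=8, random_state=1)
scores = cross_val_score(LogisticRegression(max_iter=1_000), X, y, cv=5)
print("accuracy per fold:", scores.round(3))
print("mean accuracy:", round(scores.mean(), 3))
```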
We recommend getting a concrete understanding of the time and competence required by data work; one way is to do some of it yourself. Citizens and organisations may still imagine that AI works like a charm almost by itself, as long as it has been given a suitable dose of data. Data work remains human work whose complexity and resource requirements must not be underestimated.