AI diversifies opportunities but demands more from oversight
AI technologies have enabled two significant changes in the utilisation of data:
- data originating from several sources can be analysed together and cross-referenced
- loosely structured or even completely unstructured data can be analysed and used.
In fact, the quality of data used in AI systems is no simple matter. Traditional quality factors, such as timeliness and internal consistency, are still relevant, but they must now be assessed across several datasets. Similarly, the integrity, security and compliance of the data must be assessed in a more multidimensional manner.
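To make the idea of cross-dataset quality assessment concrete, here is a minimal sketch in Python. The two registers, their keys and the address field are invented for illustration; the point is only that a quality check now spans sources rather than staying inside one dataset.

```python
# An illustrative cross-dataset consistency check: two hypothetical
# registers are compared for conflicting values on the keys they share.
# All names and values here are invented examples.

register_a = {"person-1": {"address": "Main St 1"}, "person-2": {"address": "Oak Rd 9"}}
register_b = {"person-1": {"address": "Main St 1"}, "person-2": {"address": "Elm Ave 3"}}

# Keys present in both registers whose records disagree.
conflicts = [
    key
    for key in sorted(register_a.keys() & register_b.keys())
    if register_a[key] != register_b[key]
]
print(conflicts)  # the two sources disagree on 'person-2'
```

Within a single register, both records above would look perfectly valid; the quality problem only becomes visible when the sources are compared.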
A partial solution for managing complex and varied data is the systematic use, classification and indexing of metadata. Metadata keeps unstructured material “visible”, and it also gathers material from different sources into a coherent semantic space.
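As a sketch of what such a metadata index might look like, the Python fragment below classifies documents by a few metadata fields and indexes them for cross-source lookup. The field names (source, topic, updated) are illustrative assumptions, not a standard schema.

```python
from dataclasses import dataclass

# A minimal sketch of a metadata index over unstructured documents.
# Field names and example values are hypothetical.

@dataclass
class Document:
    doc_id: str
    source: str   # originating system or organisation
    topic: str    # classification label
    updated: str  # ISO date of last update
    text: str = ""  # the unstructured content itself

def build_index(docs):
    """Index documents by (metadata field, value) pairs."""
    index = {}
    for doc in docs:
        for key in ("source", "topic", "updated"):
            index.setdefault((key, getattr(doc, key)), []).append(doc.doc_id)
    return index

docs = [
    Document("a1", source="registry", topic="permits", updated="2024-05-01"),
    Document("b7", source="email-archive", topic="permits", updated="2023-11-12"),
]
index = build_index(docs)

# Documents on the same topic are now findable across sources:
print(index[("topic", "permits")])  # ['a1', 'b7']
```

The index is what gives the two documents a shared "meaning space": without the topic metadata, nothing connects a registry record to an archived email.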
As our society becomes increasingly driven by the data economy, the nature and emphasis of data work will change. Producing the training data required by AI systems is not a straightforward process. Responsible and appropriate training data requires
- finding, evaluating and cleaning up suitable datasets
- identifying and managing potentially harmful biases
- classifying and annotating the data.
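The three steps above can be sketched as a small Python pipeline. The record fields, the dominance threshold used as a bias screen, and the labelling rule are all invented for illustration; real data work is far more involved, which is exactly the point of the section.

```python
from collections import Counter

# Hypothetical raw records; fields and values are invented.
records = [
    {"text": "Application approved", "region": "north"},
    {"text": "  application APPROVED ", "region": "north"},  # duplicate after cleaning
    {"text": "Application rejected", "region": "south"},
    {"text": "", "region": "south"},                         # empty -> dropped
]

# 1. Clean: normalise whitespace and case, drop empty rows and duplicates.
seen, cleaned = set(), []
for r in records:
    text = " ".join(r["text"].split()).lower()
    if text and text not in seen:
        seen.add(text)
        cleaned.append({**r, "text": text})

# 2. Screen for bias: warn if one group dominates the cleaned data.
by_region = Counter(r["region"] for r in cleaned)
if max(by_region.values()) / len(cleaned) > 0.8:
    print("warning: one region dominates the dataset")

# 3. Annotate: attach the label the model will be trained to predict.
for r in cleaned:
    r["label"] = "approved" if "approved" in r["text"] else "rejected"

print(cleaned)  # two cleaned, labelled records remain
```

Even in this toy version, half the raw records never reach the model, and every surviving record has passed through a human-designed rule.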
Once the data has been produced, the AI model still has to be fine-tuned and validated to ensure the accuracy, quality and completeness of the data. This may involve a substantial amount of mathematical processing.
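One common form of such validation is measuring accuracy on a held-out set, sketched below. The "model" is a trivial stand-in and the validation pairs are invented; the sketch only shows the shape of the check, not a real validation regime.

```python
# A minimal sketch of hold-out validation. The classifier and the
# validation pairs are hypothetical stand-ins.

def model(x):
    # Toy classifier: predicts 1 when the input is positive.
    return 1 if x > 0 else 0

# Held-out (input, expected label) pairs, never used in training.
validation_set = [(2.0, 1), (-1.5, 0), (0.5, 1), (-3.0, 0), (1.2, 1)]

correct = sum(1 for x, y in validation_set if model(x) == y)
accuracy = correct / len(validation_set)
print(f"validation accuracy: {accuracy:.0%}")  # 100% on this toy set
```

In practice the validation set must be assembled, checked for bias and annotated with the same care as the training data, which is why it counts as data work too.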
We recommend gaining a concrete understanding of the time and competence that data work requires; one way is to do some of it yourself. Citizens and organisations may still imagine that AI works almost by itself, as long as it has been given a suitable dose of data. Data work is still human work, and its complexity and resource requirements must not be underestimated.