A new study emphasizes that clear guidelines are needed for generating and processing synthetic data in order to ensure transparency, accountability, and fairness. Synthetic data, which machine learning algorithms create from original real-world data, is becoming increasingly popular because it can offer a privacy-preserving alternative to traditional data sources.
Synthetic data is produced by algorithmic models known as synthetic data generators, such as Generative Adversarial Networks or Bayesian networks. Because it is created artificially rather than collected, it is especially useful where the actual data cannot be shared because it is sensitive, scarce, or of poor quality.
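To make the idea of a synthetic data generator concrete, here is a minimal Python sketch. It is an illustration only and is not taken from the study: instead of a Generative Adversarial Network or a Bayesian network, it uses a much simpler stand-in that fits an independent normal distribution to each numeric column and samples new rows from it. The function names `fit_generator` and `sample_synthetic` and the example records are hypothetical.

```python
import random
import statistics


def fit_generator(real_rows):
    """Estimate a (mean, standard deviation) pair for each numeric column."""
    columns = list(zip(*real_rows))  # transpose rows into columns
    return [(statistics.mean(col), statistics.stdev(col)) for col in columns]


def sample_synthetic(params, n_rows, seed=0):
    """Draw artificial rows from the fitted per-column normal distributions.

    Assumption: columns are treated as independent, so correlations present
    in the real data are not reproduced. Real generators model them.
    """
    rng = random.Random(seed)
    return [
        tuple(rng.gauss(mu, sigma) for mu, sigma in params)
        for _ in range(n_rows)
    ]


# Made-up illustrative records: (age, annual income)
real_rows = [
    (34, 52000.0),
    (45, 61000.0),
    (29, 48000.0),
    (52, 75000.0),
    (41, 58000.0),
]

params = fit_generator(real_rows)
synthetic_rows = sample_synthetic(params, n_rows=1000)

# Compare a summary statistic of the real and synthetic data.
print("real mean age:     ", statistics.mean(r[0] for r in real_rows))
print("synthetic mean age:", statistics.mean(r[0] for r in synthetic_rows))
```

The synthetic rows roughly preserve the per-column statistics of the original records without copying any of them. A sketch this simple gives no formal privacy guarantee, and a more faithful generator can drift closer to the real records, which is one way a risk of re-identification can arise.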
The study points out that existing data protection laws, such as the GDPR, apply only to personal data and are not well equipped to regulate the processing of all types of synthetic data. Fully synthetic datasets are generally exempt from GDPR rules, but exceptions arise where there is a risk of re-identification, which creates legal uncertainty and practical challenges for data processing.
Professor Ana Beduschi, from the University of Exeter, stresses the importance of clear procedures for holding accountable those responsible for generating and processing synthetic data. Such procedures should ensure that synthetic data is not used in ways that harm individuals or society, for example by perpetuating existing biases or creating new ones.
Professor Beduschi calls for clear guidelines covering all types of synthetic data that prioritize transparency, accountability, and fairness. With the rise of generative AI models such as DALL-E 3 and GPT-4, which can generate synthetic data, adherence to these principles is needed to mitigate potential harm and encourage responsible innovation.
As the use of synthetic data continues to grow, it will be important to address these regulatory challenges and establish clear procedures that support responsible innovation. Guidelines built on transparency, accountability, and fairness would help limit the dissemination of misleading information and other adverse effects on individuals and society, while allowing the potential benefits of synthetic data to be realized.