What happened?
To address ongoing regulatory uncertainty about how the UK General Data Protection Regulation (UK GDPR) and the UK Data Protection Act 2018 apply to the development and use of generative artificial intelligence (AI), the UK Information Commissioner’s Office (ICO) has released its initial response to the five-part consultation series on the topic that it conducted in 2024. The series covered the following areas:
- The legal basis for web-scraping to train generative AI models.
- Purpose limitation – i.e., having a specified, explicit and legitimate purpose – throughout the generative AI life cycle.
- Accuracy of training data and model outputs.
- Respecting individual rights in the training and fine-tuning of generative AI models.
- Allocating controllership across the generative AI supply chain.
What has changed?
In its initial response, the ICO updated its seemingly highly permissive position on ‘legitimate interests’ as a legal basis for web-scraping under the UK GDPR. In previous draft guidance, the ICO’s position was that data controllers could rely on legitimate interests as a legal basis for training AI models on web-scraped data, provided that they could pass the ‘three-part’ test by demonstrating:
- The purpose of the processing is legitimate.
- The processing is necessary for the purpose (the ‘necessity test’).
- The individual’s interests do not override the developer’s interests being pursued (the ‘balancing test’).
Following consultation on the initial draft guidance, the ICO has refined its position and highlighted specific considerations that organisations need to bear in mind if they want to rely on legitimate interests for training on web-scraped data.
Increasing transparency
- The ICO says that web-scraping often occurs without people being aware of it (so-called ‘invisible processing’).
- The ICO considers that invisible processing makes the balancing test harder to pass, primarily because people who are unaware that their data is being processed are unable to exercise their rights under the UK GDPR.
- Under its updated position, the ICO now expects generative AI developers to significantly improve their approach to transparency.
Making sure scraping is necessary
- The ICO questioned whether AI developers really need to use web-scraping to collect training data – i.e., whether they can satisfy the necessity test – when alternative methods of data collection exist.
- For example, the ICO seems to believe that developers could effectively train models by licensing personal data from organisations that specialise in collecting such data in a transparent way and in accordance with the UK GDPR.
- It is not clear on what basis the ICO reached the conclusion that the relatively limited personal data available via the nascent training data licensing market would be sufficient for the purposes of training latest-generation models. It remains to be seen whether AI model developers would share this view …
Recommendations
Given this emerging line of thinking from the ICO, AI model developers relying on legitimate interests as their legal basis for web-scraping under the UK GDPR must assess, and document their assessment of:
- Whether they really need to use web-scraping to collect personal data for development of their AI model, or whether an alternative approach (such as licensing personal data) could realistically satisfy their needs.
- How they intend to address their transparency obligations under the UK GDPR – taking appropriate steps to increase transparency and controls for data subjects will be a key part of getting this right.
What happens next?
The ICO’s initial response informs its current core guidance on AI and data protection. This guidance is expected to be formally updated once the UK’s new data protection reform legislation, the Data (Use and Access) Bill, is passed into law, which is expected to happen around Easter 2025.
Authors
Leo Spicer-Phelps, Associate, London
Morgan McCormack, Associate, London