The EU's Digital Omnibus proposal includes significant changes to the GDPR, as we discuss here. In this article, we focus on the changes particularly relevant to AI development and data-intensive research.
A quieter but fundamental shift: what counts as “personal data”
One of the most structurally important GDPR changes proposed in the Digital Omnibus is also one of the least headline-grabbing: a clarification of the definition of personal data in Article 4 GDPR.
The proposal codifies a holder-relative approach to identifiability. Data will fall outside the scope of the GDPR for a given holder where that holder cannot identify an individual, taking into account the means reasonably likely to be used by that holder. This reflects the CJEU’s reasoning in EDPS v SRB, but the Digital Omnibus goes further by embedding that logic directly into the GDPR text, which will confirm that:
- the same dataset may be personal data for one party but not for another, depending on access to auxiliary information, resources, time and technical capability
- pseudonymised data may fall outside the GDPR in the hands of one party, even though it would remain personal data for another party that could realistically re-identify individuals
- compliance analysis becomes more contextual and evidence-based, rather than abstract or worst-case driven.
The proposal also foresees further clarification through implementing measures, likely with EDPB involvement. This is potentially significant, as much will turn on how narrowly or broadly “means reasonably likely to be used” is interpreted in practice.
From a practical perspective, this clarification better reflects the protective character of the GDPR, which has always been premised on the idea that whether or not information constitutes personal data should be determined on a case-by-case basis, taking into account context, capabilities and realistic risk. On that basis, it is also appropriate that only those actors for whom information actually constitutes personal data should be subject to GDPR obligations.
At the same time, this holder-relative approach raises new questions for data sharing and data transfers. Situations will increasingly arise where the transferring party treats a dataset as personal data subject to the GDPR, while the recipient – applying the same Article 4 test – does not. In those cases, contractual mechanisms (DPAs, SCC-style clauses or GDPR-derived obligations) may still impose GDPR-style duties on the recipient even though the data does not constitute personal data for that party as a matter of law. This type of asymmetry has so far been most familiar in third-country transfer scenarios, but the Digital Omnibus suggests it may now become more common within EU and hybrid data ecosystems, particularly in complex AI development and deployment chains.
From an AI and research perspective, this clarification has real bite: it creates more room to work with large, structured datasets where re-identification is theoretically possible but practically implausible for the actor in question – provided that the position can be justified and documented.
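To make the holder-relative logic concrete, here is a minimal, purely illustrative Python sketch of keyed pseudonymisation. All names and values are hypothetical: the point is simply that the party holding the key can re-identify individuals, while a recipient holding only the tokens has no reasonably likely means of doing so.

```python
import hmac
import hashlib

# Hypothetical secret held only by the data exporter and never shared.
SECRET_KEY = b"held-only-by-the-exporter"

def pseudonymise(user_id: str) -> str:
    """Replace a direct identifier with a keyed HMAC token.

    The key holder can recompute the mapping and so re-identify;
    a recipient without the key realistically cannot.
    """
    return hmac.new(SECRET_KEY, user_id.encode(), hashlib.sha256).hexdigest()

records = [
    {"user_id": "alice@example.com", "age_band": "30-39", "region": "DE"},
    {"user_id": "bob@example.com", "age_band": "40-49", "region": "FR"},
]

# What the recipient receives: tokens plus coarse attributes only.
shared = [
    {"token": pseudonymise(r["user_id"]),
     "age_band": r["age_band"],
     "region": r["region"]}
    for r in records
]
print(shared)
```

Under the proposed Article 4 test, whether the recipient’s copy constitutes personal data would then turn on the recipient’s own means of re-identification (including any auxiliary data it holds), not on the exporter’s.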
Scientific research: clearer status, stronger signalling
The Digital Omnibus introduces a new definition of “scientific research”, coupled with explicit clarification on purpose compatibility. Further processing for scientific research purposes is confirmed as compatible with the original purpose, reducing reliance on Article 6(4) GDPR compatibility assessments.
Two points are particularly noteworthy:
- First, the definition is deliberately broad and confirms that scientific research may pursue commercial objectives. This is important for private-sector R&D, including AI development that straddles research and production.
- Second, the proposal sends a clear signal that scientific research can constitute a legitimate interest for the purposes of Article 6(1)(f) GDPR, subject to appropriate safeguards under Article 89(1) GDPR. While this does not displace consent or statutory bases where required, it strengthens the position of controllers relying on legitimate interest for research-oriented processing.
In practice, this shifts attention away from formal labels (“research” vs “commercial”) and towards the quality of safeguards: governance, minimisation, access controls, transparency and technical measures to protect individuals’ rights.
AI, special category data and legitimate interest
The most politically sensitive GDPR changes in the Digital Omnibus relate to the treatment of special category data in AI contexts, particularly for bias detection and mitigation.
Residual special category data in AI development and operation
The proposal introduces a new Article 9 GDPR derogation permitting the processing of special category data for the development and operation of AI systems, subject to strict conditions. The underlying problem the Commission is trying to address is practical rather than theoretical: in large training, testing or monitoring datasets, special category data may be incidentally or residually present, even where it is not sought or required.
Under the proposal:
- Controllers must seek to avoid collecting special category data in the first place.
- Where such data is residually present, it need only be removed if doing so does not require disproportionate effort (see the illustrative sketch after this list).
- Appropriate technical and organisational safeguards must still apply.
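As a purely illustrative sketch of what “avoiding” and proportionate removal might look like in a data pipeline, the following Python snippet screens free-text training records for signals of health data (an Article 9 category) and drops flagged records before training. The patterns and data are hypothetical; real pipelines would rely on more robust classifiers and documented governance.

```python
import re

# Hypothetical, deliberately crude patterns hinting at health data;
# production systems would use trained classifiers, not keyword lists.
SPECIAL_CATEGORY_PATTERNS = [
    re.compile(r"\b(diagnos(is|ed)|prescription|HIV|diabetes)\b", re.IGNORECASE),
]

def looks_special_category(text: str) -> bool:
    """Cheap screen for residual special category data in free text."""
    return any(p.search(text) for p in SPECIAL_CATEGORY_PATTERNS)

training_texts = [
    "User asked how to reset a password.",
    "User mentioned a recent diabetes diagnosis.",  # residual health data
]

# Proportionate-effort filter: drop flagged records before training.
cleaned = [t for t in training_texts if not looks_special_category(t)]
print(cleaned)  # ['User asked how to reset a password.']
```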
This is a notable softening of Article 9 GDPR. Unsurprisingly, it is also one of the elements most likely to attract scrutiny during the legislative process, particularly from privacy and civil rights advocates.
Expansion of recognised legitimate interest and bias detection
Alongside this targeted Article 9 GDPR adjustment, and arguably even more controversially, the Digital Omnibus adds a new Article 88c to the GDPR, which provides that processing personal data for the development and operation of an AI system will constitute a legitimate interest under Article 6(1)(f) GDPR unless Union or national law explicitly requires consent. This is subject to the usual requirement to carry out a legitimate interest assessment, balancing the legitimate interests of the controller against those of the individuals, and to additional safeguards as well as an unconditional right for the data subject to object to the processing.
One impact of this is that it embeds bias detection and correction more explicitly in the GDPR’s normative framework. Bias detection is explicitly referenced in Recital 31 as a legitimate and socially valuable objective in the context of legitimate interest assessments. The logic is straightforward: identifying and mitigating bias in automated systems can itself be a means of protecting individuals from discrimination. However, this recognition is paired with unusually explicit expectations around safeguards. The proposal points to:
- enhanced transparency
- strong data minimisation
- privacy-preserving techniques
- protection against data leakage or memorisation
- in the AI context, an unconditional right to object when legitimate interest is relied upon.
The message is clear: bias detection strengthens the controller’s case in the balancing test, but only where it is accompanied by credible and demonstrable protections.
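As a purely illustrative example of what “bias detection” can mean at a technical level, the sketch below computes a demographic parity gap (the difference in positive-outcome rates across groups defined by a protected attribute). The data is hypothetical, and processing the protected attribute at all is precisely what the derogation and safeguards described above are intended to govern.

```python
from collections import defaultdict

def selection_rates(outcomes):
    """Per-group positive-outcome rates from (group, outcome) pairs."""
    totals, positives = defaultdict(int), defaultdict(int)
    for group, outcome in outcomes:
        totals[group] += 1
        positives[group] += outcome
    return {g: positives[g] / totals[g] for g in totals}

# Hypothetical model decisions: (protected attribute value, 1 = approved).
decisions = [("A", 1), ("A", 1), ("A", 0), ("B", 1), ("B", 0), ("B", 0)]

rates = selection_rates(decisions)
gap = max(rates.values()) - min(rates.values())
print(rates, f"demographic parity gap: {gap:.2f}")  # gap of 0.33 here
```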
For the parallel changes in the AI Act – including explicit permission to process special category data for bias detection across a wider range of systems – see here.
A broader pattern
Taken together, these GDPR changes reveal a consistent pattern in the Digital Omnibus. Rather than dismantling GDPR fundamentals, the Commission is attempting to re-calibrate how they apply in data-rich, AI-driven environments:
- identifiability is assessed more realistically
- research and AI development are given clearer legal footing
- the handling of special category data is made more operationally workable, albeit at the cost of controversy.
Whether this balance will survive the legislative process remains to be seen. What is clear is that, if adopted in anything close to its current form, the Digital Omnibus will materially change how organisations structure GDPR compliance for AI training, testing, monitoring and research – even where many of the headline obligations remain formally intact.