Background information

AI training preferences and opt-outs: a role for CommonsDB?

Discussion document - October 2025

Introduction

The CommonsDB initiative is developing a prototype public registry for public domain and openly licensed works. Its goal is to provide greater legal certainty for the reuse of digital content by creating a verifiable record of rights information.

Alongside this, there is a wider debate about how creators and rights holders can express preferences regarding the use of their works in AI training. Some stakeholders have called for a registry where individuals or organizations could publicly declare whether their works can be used for this purpose (or not). Earlier this year, the European Commission launched a feasibility study for such an opt-out registry.

In parallel, a number of initiatives are already collecting public domain and openly licensed works and making them available in ways optimized for AI training. These projects focus on such materials precisely because they can clearly be used for training AI systems, effectively serving as de facto opt-in registries.

CommonsDB will enable its data partners to provide AI use preferences. The exact format has yet to be determined, but it will likely build on the ongoing IETF effort to standardize a vocabulary for such preference signals. An open question is whether these preference signals should be exposed via the CommonsDB registry – so that information about rights and AI training preferences is available in one place – or via a separate, interoperable registry with a dedicated focus on AI preference signals.

This document sets out assumptions and two possible models for handling AI training preferences in relation to CommonsDB. It is intended to support structured discussion with partners and stakeholders, laying the groundwork for integrating AI preference-related functionality into the CommonsDB prototype.

Assumptions

Shared technical foundations and identification

Both rights declarations and AI preference declarations use the International Standard Content Code (ISCC, ISO 24138:2024) as the content identifier. Declarations are expressed in machine-readable metadata, digitally signed, and bound to the fingerprint of the underlying work. The Service Provider validates the identity of the Declaring Party, issues Verifiable Credentials, and provides the developer environment and Declaration API.
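To make the binding between a declaration and a work more concrete, the sketch below models a declaration as a small record keyed by its ISCC and signs a canonical JSON serialization of that record. It is a minimal sketch under several assumptions: the field names (`iscc`, `declarant_id`, `rights_statement`, `ai_preference`) are hypothetical, the ISCC value is a placeholder rather than a real code, and HMAC-SHA256 from the Python standard library stands in for the public-key digital signature a production system would more likely use.

```python
import hashlib
import hmac
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class Declaration:
    # Hypothetical field names; an actual CommonsDB schema may differ.
    iscc: str                     # content identifier of the declared work
    declarant_id: str             # identifier of the Declaring Party
    rights_statement: str         # e.g. a public domain mark or license URI
    ai_preference: Optional[str]  # placeholder; the preference vocabulary is not yet fixed

    def canonical_bytes(self) -> bytes:
        # Sorted keys and compact separators give a deterministic byte string,
        # so the same declaration always yields the same signing input.
        return json.dumps(asdict(self), sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign(declaration: Declaration, key: bytes) -> str:
    # HMAC-SHA256 stands in for a real public-key digital signature here.
    return hmac.new(key, declaration.canonical_bytes(), hashlib.sha256).hexdigest()


def verify(declaration: Declaration, signature: str, key: bytes) -> bool:
    # Constant-time comparison; changing any field, including the ISCC,
    # invalidates the signature.
    return hmac.compare_digest(sign(declaration, key), signature)


if __name__ == "__main__":
    key = b"demo-key-not-for-production"
    decl = Declaration(
        iscc="ISCC:PLACEHOLDER",  # not a real ISCC
        declarant_id="example-archive",
        rights_statement="https://creativecommons.org/publicdomain/mark/1.0/",
        ai_preference=None,
    )
    signature = sign(decl, key)
    print(verify(decl, signature, key))
```

Running it prints `True`. Signing the canonical serialization, rather than an arbitrary dictionary dump, is what allows a verifier to recompute exactly the same bytes; pointing the same signature at a record with a different ISCC makes `verify` return `False`.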

Declaration process

A Declaring Party (creator, rightsholder, or institution) would generate an ISCC and prepare the associated metadata. A signed declaration would then be submitted via the Declaration API and routed to one or more registries (CommonsDB and/or opt-out registries). Each submission would be validated and ingested into the relevant registry. During ingestion, the necessary subset of metadata would be extracted and forwarded to that registry, ensuring that only standardized, essential information is publicly exposed for discovery and verification.
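The ingestion step can be sketched as a function that validates a submission and then passes to each target registry only the fields that registry is meant to expose publicly. The registry names, field names, and per-registry allow-lists below are hypothetical illustrations of a "standardized, essential" subset, not a defined CommonsDB schema, and signature validation is reduced to a caller-supplied check.

```python
from typing import Callable, Dict, List

# Hypothetical allow-lists: which metadata fields each registry exposes publicly.
PUBLIC_FIELDS: Dict[str, List[str]] = {
    "commonsdb": ["iscc", "rights_statement", "declarant_id"],
    "ai_preference_registry": ["iscc", "ai_preference", "declarant_id"],
}


def ingest(
    submission: dict,
    targets: List[str],
    is_valid: Callable[[dict], bool],
) -> Dict[str, dict]:
    """Validate a submission and return the public record for each target registry."""
    if not is_valid(submission):
        raise ValueError("declaration failed validation")
    records: Dict[str, dict] = {}
    for registry in targets:
        allowed = PUBLIC_FIELDS[registry]  # raises KeyError for an unknown registry
        # Keep only allow-listed fields present in the submission; anything else
        # (e.g. internal contact details) is not forwarded.
        records[registry] = {k: submission[k] for k in allowed if k in submission}
    return records


if __name__ == "__main__":
    submission = {
        "iscc": "ISCC:PLACEHOLDER",  # not a real ISCC
        "declarant_id": "example-archive",
        "rights_statement": "https://creativecommons.org/publicdomain/mark/1.0/",
        "ai_preference": None,
        "internal_contact": "rights@example.org",  # never forwarded
    }
    # Signature checking is stubbed out for the example.
    print(ingest(submission, ["commonsdb"], is_valid=lambda s: True))
```

Using an explicit allow-list per registry, rather than a block-list, means that any new field added to submissions stays private until someone deliberately decides to expose it.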

Trust model and roles

A trust model is essential for a registry to function as a federated, verifiable system where content declarations are traceable, trustworthy, and interoperable without requiring manual validation. It ensures that registry information remains accurate and authoritative by establishing verifiable trust in both the entities contributing data and the declarations they make.

The model is structured around three roles:

Service Provider

Acts as the intermediary between Data Suppliers and registries. Comparable to the role of Open Future in CommonsDB, a Service Provider:

Manages relationships with Data Suppliers
Acts as a trust service, issuing Verifiable Credentials
Provides the developer environment for integration
Onboards Data Suppliers and validates their declarations
Maintains the Declaration API
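The trust-service role can be illustrated with a deliberately simplified credential: the Service Provider signs a statement that a given Data Supplier has been onboarded until an expiry date, and later checks that statement before accepting declarations from that supplier. This is not the W3C Verifiable Credentials data model; the class, its method names, the credential fields, the HMAC-based proof (standing in for public-key signatures), and the revocation set are all hypothetical simplifications.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone
from typing import Set


class ServiceProvider:
    """Hypothetical, simplified trust service; not a W3C Verifiable Credentials implementation."""

    def __init__(self, secret: bytes) -> None:
        self._secret = secret
        self.revoked: Set[str] = set()

    def _proof(self, claims: dict) -> str:
        payload = json.dumps(claims, sort_keys=True, separators=(",", ":")).encode("utf-8")
        return hmac.new(self._secret, payload, hashlib.sha256).hexdigest()

    def issue_credential(self, supplier_id: str, expires: datetime) -> dict:
        # The claims record which Data Supplier was onboarded and until when.
        claims = {"subject": supplier_id, "expires": expires.isoformat()}
        return {"claims": claims, "proof": self._proof(claims)}

    def revoke(self, supplier_id: str) -> None:
        self.revoked.add(supplier_id)

    def accepts(self, credential: dict, now: datetime) -> bool:
        claims = credential["claims"]
        # 1. The proof must match the claims, i.e. this provider issued them unaltered.
        if not hmac.compare_digest(self._proof(claims), credential["proof"]):
            return False
        # 2. The credential must not have expired.
        if now >= datetime.fromisoformat(claims["expires"]):
            return False
        # 3. The supplier must not have been revoked since issuance.
        return claims["subject"] not in self.revoked


if __name__ == "__main__":
    provider = ServiceProvider(b"demo-secret-not-for-production")
    credential = provider.issue_credential(
        "example-aggregator", datetime(2030, 1, 1, tzinfo=timezone.utc)
    )
    print(provider.accepts(credential, datetime.now(timezone.utc)))
```

The three checks mirror the questions a registry needs answered before ingesting a declaration without manual review: who vouched for this supplier, is that vouching still current, and has it since been withdrawn.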

Declaring Party

The creator, rightsholder, or institution making a declaration, or an authorized delegate acting on its behalf (such as an aggregator or partner institution). The Declaring Party generates ISCCs, attaches metadata, digitally signs declarations, and submits them to the Service Provider for validation and ingestion.

Registry User

Typically an organization or an automated process that queries the registry to verify the status of a specific work. Designed primarily as machine-readable technical infrastructure, a registry enables institutional actors – such as platforms, research organizations, libraries, or archives – to:

Look up a work by ISCC or metadata reference
Check whether it is in the public domain or openly licensed
Verify whether an AI training preference declaration exists
Compare results across multiple registries
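From a Registry User's perspective, these checks reduce to a lookup by ISCC followed by a few tests on the returned record. The sketch below assumes an in-memory dictionary in place of a registry API; the function names, the record fields, and the set of status values treated as "open" are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical status values that count as public domain or openly licensed.
OPEN_STATUSES = {"public-domain", "cc0", "cc-by", "cc-by-sa"}


@dataclass(frozen=True)
class LookupResult:
    found: bool              # the registry holds a declaration for this ISCC
    open_status: bool        # the declared status is public domain or an open license
    has_ai_preference: bool  # the declaration carries an AI preference signal


def check_work(registry: Dict[str, dict], iscc: str) -> LookupResult:
    record: Optional[dict] = registry.get(iscc)
    if record is None:
        # No declaration found. This is not evidence that the work is closed;
        # the registry simply holds no information about it.
        return LookupResult(found=False, open_status=False, has_ai_preference=False)
    return LookupResult(
        found=True,
        open_status=record.get("rights_status") in OPEN_STATUSES,
        has_ai_preference=record.get("ai_preference") is not None,
    )


def compare_registries(
    registries: Dict[str, Dict[str, dict]], iscc: str
) -> Dict[str, LookupResult]:
    # Apply the same check to every registry so results can be compared side by side.
    return {name: check_work(registry, iscc) for name, registry in registries.items()}
```

Keeping `found` separate from `open_status` matters because a registry scoped to public domain and openly licensed works can only answer positively; a missing record means "unknown", not "restricted".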

Options for handling AI preferences

When it comes to recording AI training preferences, there are two main paths forward. One option is to expose information about AI preferences through CommonsDB, alongside rights information for public domain and openly licensed works. The other is to keep AI preferences in a separate, dedicated registry with its own focus.

The sections below set out the pros and cons of each approach.

Option 1: Expose AI preference signals through CommonsDB

Under this option, AI training preferences contained in declarations submitted to CommonsDB would be made available through the CommonsDB registry. In practice, this means that whenever a creator, rightsholder, or institution makes a declaration about a public domain or openly licensed work, any associated AI preference signals included in that declaration would become part of CommonsDB registry data.

Advantages

Unified environment: Rights information and AI training preferences are stored together in a single registry, reducing the need for cross-referencing multiple sources.
Simpler machine access: Automated systems can query one API and obtain both rights status and AI preference data in a single, consistent response, lowering integration costs and reducing the risk of mismatched records.
Easier discovery: Since all declaration metadata is accessible in one place, third-party services (such as platforms or AI developers) do not need to build separate processes for resolving rights and preferences.

Downsides

Scope mismatch: CommonsDB accepts only declarations concerning public domain and openly licensed content, so machine users cannot rely on it as a comprehensive source of AI-preference information.
Data filtering challenges: Mixing rights declarations with AI preferences may complicate automated parsing, as systems must distinguish between distinct use cases within a single dataset.
Blurs use cases: Combining rights information and AI preferences risks confusing CommonsDB users about the registry’s primary purpose.
Reduced clarity for third parties: If CommonsDB exposes both rights declarations and AI-preference signals, machine users may need to implement additional logic to determine which type of metadata applies to which use case.
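The extra client-side logic described in these downsides can be made concrete. If a single CommonsDB response mixed rights declarations and AI preference signals, a machine user would need a step like the one sketched below to separate them before use. The `type` discriminator field, its values, and the function name are hypothetical; CommonsDB has not defined such a schema.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Hypothetical discriminator values for the two kinds of declaration metadata.
RIGHTS = "rights_declaration"
AI_PREFERENCE = "ai_preference"


def partition_by_type(records: Iterable[dict]) -> Dict[str, List[dict]]:
    """Group mixed registry records by their (hypothetical) 'type' field."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for record in records:
        kind = record.get("type")
        # Records without a recognized type are set aside rather than guessed at.
        groups[kind if kind in (RIGHTS, AI_PREFERENCE) else "unknown"].append(record)
    return dict(groups)
```

Every consumer of a combined dataset would have to implement and maintain some version of this step, and handle the "unknown" bucket consistently, which is the integration cost these downsides point to.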

Option 2: Expose AI preference signals via a separate, dedicated registry

Under this option, AI training preferences included in declarations to CommonsDB would not be exposed by CommonsDB itself. Instead, they would be routed to a dedicated, third-party registry. This approach would keep CommonsDB focused on rights information for public domain and openly licensed works, while allowing a separate registry to specialize in handling AI preference metadata.

Advantages

Clearer scope: AI preferences are handled in a dedicated registry, while rights information about public domain and openly licensed works remains the focus of CommonsDB. This separation would make it easier for Registry Users to know exactly what type of data to expect from each source.
Tailored metadata: A separate registry can develop schemas and vocabularies optimized for AI use cases, improving machine readability and reducing the need for filtering.

Downsides

Additional infrastructure: A new, dedicated registry may require its own governance, technical maintenance, and integration paths for Data Suppliers. Other registries could also emerge in parallel to serve broader declaring communities, including those making AI preference declarations about works beyond public domain and openly licensed content.

Conclusion

Feedback from the September 2025 workshop suggests that while AI training is an important adjacent use case, CommonsDB’s primary value lies in providing trustworthy, asset-level rights information for public domain and openly licensed works. Participants underlined that AI preference signals should build on – but not redefine – this core purpose.

Option 2 therefore looks like the more realistic pathway. Interoperability between registries, supported by shared identifiers and trust models, would ensure that users could access reliable rights and AI preference information when needed.
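Interoperability of this kind rests on the shared identifier: because both registries key their records by ISCC, a Registry User can combine them with a simple join. The sketch assumes both registries can be read as dictionaries keyed by ISCC; the function name and output field names are hypothetical. Each value keeps its origin, so the combined view still shows which registry supplied which answer.

```python
from typing import Dict, Optional


def join_registries(
    commonsdb: Dict[str, dict],
    ai_registry: Dict[str, dict],
) -> Dict[str, Dict[str, Optional[dict]]]:
    """Combine two ISCC-keyed registries into one view per work."""
    joined: Dict[str, Dict[str, Optional[dict]]] = {}
    # The union of keys covers works known to either registry.
    for iscc in sorted(commonsdb.keys() | ai_registry.keys()):
        joined[iscc] = {
            "rights": commonsdb.get(iscc),           # None if CommonsDB has no declaration
            "ai_preference": ai_registry.get(iscc),  # None if no preference was declared
        }
    return joined
```

Because rights information comes only from CommonsDB and preference signals only from the dedicated registry, the scope separation of Option 2 is preserved even when a user looks at both together.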

System design