The European Union's Code of Practice for General-Purpose AI (CoP) is a voluntary framework developed to support the implementation of the AI Act's requirements for AI models with systemic risks. The Code will help companies understand and demonstrate compliance with their obligations regarding risk assessment, safety testing, and transparency while formal technical standards are still being developed.
Released on March 11, 2025, the third draft of the CoP represents significant progress after months of consultation between industry representatives, civil society organizations including CeSIA, and technical experts. The EU AI Office is now collecting final stakeholder feedback before publication in May 2025, after which the CoP will begin influencing development and deployment practices for advanced AI models globally.
CeSIA's analysis of the draft has identified six areas that require improvement to ensure the CoP effectively protects against risks from advanced AI systems. While acknowledging the draft's strengths, we strongly believe the CoP must not become a vehicle for bypassing or diluting the AI Act's core protections. We oppose any attempt to use this non-binding instrument to create de facto waivers to provisions carefully negotiated during the trilogue process.
Context – Some stakeholders, including the negotiators of the AI Act, have expressed concern that focusing exclusively on 'high-impact capabilities' while relegating fundamental rights protections to optional considerations undermines the very purpose of the regulatory framework. Particularly concerning is the political positioning of certain Member States, including France, which have characterized the AI Act as overly stringent after its adoption. We strongly believe that the Code of Practice should complement and strengthen the AI Act's implementation, not serve as a backdoor to weaken its requirements. The AI Office is meant to facilitate the creation of this code while explicitly including diverse perspectives, particularly from civil society organizations and industry providers.
When industry claims implementation is too burdensome or threatens market withdrawal, we should recognize these as negotiation tactics rather than substantive arguments. The focus should remain on ensuring meaningful protections are in place, not accommodating hollow threats.
Recommendations – There's a clear expectation in Recital 115 of the EU AI Act that the CoP should cover obligations for all general-purpose AI models, with a specific focus on those presenting systemic risks. For models with systemic risks, the code should help establish a risk taxonomy at the Union level. We believe that the Systemic Risk Taxonomy in Appendix 1 must include fundamental rights impacts as core, not optional, considerations.
Context – Third-party assessments are fundamental in high-risk industries because external experts bring independence and specialized knowledge that internal teams cannot provide. While the Code of Practice nominally requires external assessment before market deployment, Measure II.11.1 creates alarming escape hatches. Companies can sidestep independent scrutiny simply by claiming they have the expertise to determine their model is no riskier than existing 'similarly safe' systems, or that they couldn't find qualified assessors. This arrangement effectively transforms independent assessment into an optional formality at the provider's discretion, and allows them to judge their own compliance with safety standards, a practice rejected in every other domain where significant harm is possible.
These provisions effectively allow companies to mark their own homework, judging whether their models require independent evaluation based entirely on their own analysis. When providers claim internal expertise is sufficient, the draft requires no standardized evidence of this expertise or independent validation of their capabilities. Similarly, when companies assert they couldn't locate qualified assessors, the draft's vague "best efforts" standard provides minimal accountability for the thoroughness of their search.
This language gives companies an excessive amount of discretion to determine whether their systems warrant external scrutiny. Such discretion directly contradicts the rationale for external assessment in the first place: internal evaluators face conflicts of interest when evaluating their own systems. Commercial pressures, development timelines, and financial investments create powerful incentives to underestimate or downplay potential risks. Independent assessors bring crucial objectivity, diverse perspectives, and freedom from corporate pressures that internal teams simply cannot replicate, regardless of their expertise or intentions.
Recommendations – We recommend three specific improvements to strengthen these provisions. First, amend Measure II.11.1 to explicitly state that all four criteria for "similar safety" must be satisfied simultaneously, not treated as alternative paths to exemption. Second, require substantially more detailed documentation in safety and security reports when providers claim exemptions, allowing the AI Office to properly evaluate these claims rather than accepting them at face value. Third, extend the minimum assessor selection period through open calls from two weeks to six weeks to ensure adequate time for qualified, independent experts to apply. These changes would maintain reasonable flexibility while closing loopholes that undermine the entire external assessment framework.
Context – When AI systems present serious, unmitigable risks, clear protocols for secure model deletion become essential. Measure II.7.6 in the current draft contains the only direct reference to model weight deletion, stating: "Signatories shall ensure that the security measures [...] apply along the entire model lifecycle from before training until secure deletion or a decision to release the model weights or associated assets." This creates a loophole where companies can simply decide to release model weights rather than securely delete them if security problems arise.
Consider a scenario where a company discovers its AI system poses significant systemic risks that cannot be adequately mitigated. In such cases, secure deletion of the underlying model would be the most responsible action, yet the current draft doesn't specifically require this step. The current phrasing actually incentivizes companies to openly release dangerous models rather than delete them, as releasing model weights is explicitly presented as an alternative to secure deletion.
Crucially, when detailing required actions for unmitigable systemic risks, both Commitment II.5 and Measure II.5.2 omit model weight deletion as an option. This oversight leaves a persistent vulnerability: even "withdrawn" models remain intact on company servers, susceptible to theft and misuse. Only complete deletion provides definitive protection against dangerous AI capabilities falling into the wrong hands through security breaches or insider threats.
This gap creates uncertainty about appropriate responses to serious safety issues. By incorporating more specific guidance on secure model deletion, including documentation processes and triggering conditions, the CoP would provide clearer direction for responsible risk management.
Recommendations – We recommend strengthening the Code to explicitly mandate secure deletion of model weights when the model presents serious, unmitigable risks. The concerning language in Measure II.7.6 that presents releasing dangerous models as an equivalent alternative to secure deletion should be removed. Model deletion should be included as a specific remediation option in sections addressing risk management (Commitments II.5, II.6, and II.12), with clear frameworks for appropriate implementation and verification. These improvements would ensure companies have clear guidance and incentives to take decisive action when AI models present unacceptable risks to public safety and fundamental rights.
Context – The CoP fails to establish clear protocols for periodically reevaluating deployed models against newly developed safety and security benchmarks. Measure II.4.14 on post-market monitoring instructs Signatories to "conduct post-market monitoring to gather necessary information," without mandating specific testing against evolving safety standards. This vague language replaces more specific timing requirements from earlier drafts that suggested regular six-month safety evaluations.
Take a company that deploys a model in 2025 which passes all contemporary safety evaluations. By 2026, researchers develop significantly improved jailbreaking techniques and evaluation methods that reveal critical vulnerabilities in previously-cleared models. Under the current draft, companies have no obligation to test their deployed models against these new benchmarks, potentially leaving vulnerabilities unaddressed for extended periods while the model remains in active use.
This oversight creates a blind spot in the risk management framework. When the field's understanding of AI risks inevitably evolves, older but still actively deployed models should be systematically reevaluated. The current approach of allowing companies complete discretion over post-deployment monitoring methods and timelines creates inconsistent safety practices across the industry. It also incentivizes an AI provider to continue operating a previously deployed model rather than deploy a new, safer one that would be evaluated against state-of-the-art benchmarks.
Recommendations – To address the gap in post-deployment safety oversight, the Code should strengthen Measure II.4.14 to require systematic re-evaluation of deployed models as safety standards evolve. We recommend amending the measure to include mandatory periodic testing against new state-of-the-art safety benchmarks, with clear timelines and accountability mechanisms. This should include requirements for: quarterly reassessment of deployed models against newly developed benchmarks; documentation of reassessment results submitted to oversight authorities; immediate mitigation strategies when new vulnerabilities are discovered; and clear thresholds for when findings from new evaluation methods should trigger model updates or removal. Without such provisions, older models could remain in use with outdated safety standards while newer models face more rigorous scrutiny, creating a regulatory loophole and misaligned incentives for providers.

The "best efforts" language currently in the measure should be replaced with concrete, time-bound requirements to ensure consistency across the industry and prevent the continued operation of models with newly discovered vulnerabilities. This framework could mirror pharmaceutical post-market surveillance (PMS) requirements, where drugs undergo continuous safety monitoring regardless of how long they've been on the market. Additionally, the CoP should require documenting these reassessments and reporting significant new findings to the AI Office, creating accountability for addressing newly discovered risks in deployed systems.
Context – Effective whistleblower protections play a crucial role in identifying potential risks before they cause harm. But the CoP includes only a single sentence on the issue, prohibiting retaliation. Remarkably, the draft explicitly states "No measures are set out for Commitment II.13," essentially providing zero implementation guidance.
This regression is a stark departure from earlier drafts that contained substantive whistleblower protections. The justification offered, that detailed protections might "introduce legal ambiguity due to risk of divergence from Directive (EU) 2019/1937", leaves a protection gap rather than closing one.
AI development presents unique challenges that may benefit from more specific guidance on implementing these protections in technical environments. This justification also misunderstands the complementary relationship between regulation and implementation guidance: the Directive remains a binding law throughout the EU regardless, and the CoP's proper role should be to provide sector-specific implementation guidance rather than replace legal obligations. Such guidance would not create legal ambiguity but would make the existing legal requirements more accessible and applicable to the specific challenges of AI governance. At a minimum, the Code should explicitly acknowledge that Directive (EU) 2019/1937 fully applies to AI providers and that robust whistleblower protections are essential to upholding the protective intent of the AI Act.
For example, consider a researcher at an AI company who discovers that a soon-to-be-deployed model demonstrates concerning behaviors that could enable mass-scale social engineering when combined with certain prompting techniques. When raising this issue internally, management dismisses these concerns, arguing they fall below the threshold of "systemic risk" and that addressing them would delay the planned release. The current draft's minimal whistleblower provisions offer little guidance or protection in this scenario, especially since management could claim they performed adequate risk assessments that simply reached different conclusions about the severity of the identified issue.
Recommendations – We recommend strengthening Commitment II.13 by explicitly recalling that Directive (EU) 2019/1937 fully applies to AI providers and by adding specific implementation guidelines for whistleblower protections in the AI context. The Code should include clear language prohibiting retaliation against whistleblowers who report potential violations of the AI Act or concerns about systemic risks through anonymous reporting channels, covering employees, contractors, and temporary staff involved in the development lifecycle as mentioned in Measure II.4.14.
Context – Public transparency about AI systems' risks and limitations helps researchers, policymakers, and users make informed decisions. The current draft requires companies to publish information about systemic risks "where necessary to effectively enable assessment and mitigation", a standard that could benefit from more specific baseline requirements.
The draft permits redactions for security reasons or to protect "sensitive commercial information to a degree disproportionate to the societal benefit." While these exceptions are reasonable in principle, clearer criteria and oversight mechanisms would help ensure transparency remains meaningful.
Unlike mature safety-critical industries where individuals with clearly identified roles must sign off on safety-critical decisions, the draft permits complete opacity around who makes final deployment decisions and what qualifications they possess.
The draft vaguely suggests publishing documents that "summarise Model Reports" with descriptions of risk assessment methodology, deployment justification, and risk mitigations. This permissive language allows companies to satisfy requirements with vague generalities.
For example, an AI company might release a summary Model Report simply stating it "conducted comprehensive evaluations" and "implemented appropriate safeguards". Without clear criteria for legitimate redactions, users and regulators cannot assess if evaluations were sufficient or if safeguards effectively address the model's capabilities.
Recommendations – We recommend strengthening the transparency provisions in Commitment II.16 by establishing specific baseline reporting requirements that all providers must meet. The Code should mandate Model Report formats with sections addressing technical safety risks, potential misuse scenarios, evaluation methodologies, identified limitations, near-misses, and anomalies.
Companies should be required to document their governance structure for safety-critical decisions, including qualification requirements for individuals in responsible roles and the decision-making chain for deployment approval. These requirements should mirror established practices in other safety-critical industries such as aviation and nuclear power.
The criteria for legitimate redactions should be narrowly defined, with clear guidelines distinguishing genuine security concerns from competitive interests. Companies should also explicitly document their decisions to redact information from public releases, allowing public oversight of such critical decisions.
Given that flaws in technologies often take years to discover, companies should detail their monitoring plans for deployed systems. Companies should be required to document known failure modes and explain why they believe undiscovered vulnerabilities won't lead to catastrophic outcomes, acknowledging the limits of current safety evaluations.
As CeSIA, we believe the third draft of the EU Code of Practice for General-Purpose AI represents substantial progress toward an effective governance framework. The coming weeks present an important opportunity to refine this framework before it begins shaping industry practices. CeSIA will submit the feedback above, among other recommendations, by March 30th, in an effort to ensure effective standardization of model providers' best practices. The chairs will review feedback from civil society organizations, industry members, and independent experts, and the final version of the Code of Practice is set to be presented and approved in May 2025. We believe that strengthening external assessment requirements, clarifying model deletion protocols, enhancing whistleblower protections, and improving transparency guidelines are essential to creating a CoP that effectively supports responsible AI development across Europe.
We appreciate the considerable work that has gone into developing the CoP thus far and offer these recommendations to help strengthen its effectiveness as a practical guide for implementing the AI Act's important protections.
As the Code of Practice approaches its final form, a moment of truth emerges for GPAI providers. The Code isn't merely another industry document to be signed off and forgotten. It represents hundreds of thousands of expert hours distilled into actionable guidelines. While technically voluntary, its foundations are built on the binding requirements of the AI Act. Those who disagree with the Code's requirements and choose alternative paths to demonstrate compliance will find themselves shouldering a heavier burden of proof in a regulatory landscape that calls for unity, clarity, and certainty.
History offers lessons on big tech's approach to self-regulation. Time and again, companies have embraced voluntary frameworks with grand public gestures, only to implement them selectively behind closed doors. The Code's strength lies in its transparency mechanisms and accountability structures that can transform performative commitments into verifiable actions. Other high-risk industries accept safety standards proportionate to their risks, and even everyday consumer products in the EU face rigorous safety requirements. The creators of GPAI models themselves have compared their potential impact to nuclear power. A technology with such reach deserves safeguards of corresponding strength, not as barriers to innovation, but as foundations for responsible growth.
The debate around signing the Code ultimately reveals which providers truly stand behind their public safety commitments and which prefer the shadows of ambiguity. When the Code is finalized in May 2025, how providers respond will draw a clear line between those who merely talk about AI responsibility and those willing to prove their commitment through action.