Proposed California Regulation: External Validation of AI Mental-Health Safety Measures
Section 1. Purpose.
The State of California finds that the deployment of large-scale foundational AI systems has substantial potential to affect the psychological well-being of residents. To protect public health and promote trust in artificial intelligence, developers of such systems must demonstrate that safety measures relating to mental-health impacts are effective, independently verified, and accountable to the public interest. The Legislature recognizes the failures of prior technology industries—including major social media platforms—to provide meaningful data access to independent researchers studying psychological harm. This Act ensures that California does not repeat those mistakes in the era of advanced AI.
Section 2. Definitions.
“Foundational AI Developer” means any entity that develops, trains, or substantially modifies an AI model above thresholds established by the California Office of Artificial Intelligence Safety (COAIS).
“Mental-Health Safety Measures” include any mitigations, guardrails, content-generation constraints, reinforcement-learning protocols, red-team findings, or post-deployment interventions intended to reduce risks of psychological harm—including but not limited to: self-harm inducement, suicidal ideation, disordered-thinking amplification, harassment, emotional manipulation, parasocial dependency, or addictive engagement patterns.
“Vetted Independent Researcher” means an academic, nonprofit, civil-society organization, or interdisciplinary research consortium approved by COAIS whose mission includes risk assessment, psychological research, or public-interest technology evaluation. The approval process shall be transparent, viewpoint-neutral, and modeled on the vetting standards in Article 40(8)–(10) of the European Union’s Digital Services Act (DSA).
Section 3. Mandatory External Validation.
Foundational AI Developers shall obtain independent external validation of their Mental-Health Safety Measures at least once annually.
Validation must be performed by Vetted Independent Researchers affiliated with at least three separate institutions, with the methodology, limitations, and findings publicly reported, except for elements protected as confidential under Section 7.
Validation shall include:
a. Assessment of whether the system meaningfully reduces known mental-health risks;
b. Inspection of relevant model outputs, datasets, and red-team artifacts;
c. Testing for emergent psychological-harm modes not previously disclosed by the developer;
d. Evaluation of the developer’s internal processes for identifying and mitigating psychological harm;
e. Identification and evaluation of any business strategy, user-experience design guidelines, or key performance indicators (KPIs) that might lead the developer to amplify mental-health risks, whether intentionally or inadvertently.
Section 4. Researcher Data Access Requirements.
To enable external validation, Foundational AI Developers shall provide Vetted Independent Researchers with secure, proportionate access to:
a. model interaction logs relevant to psychological harms;
b. prompting and output samples used in internal safety evaluations;
c. red-team reports relating to user mental-health vulnerabilities;
d. documentation of risk-mitigation pipelines.
Access must meet standards inspired by DSA Article 40, including:
strong data-protection safeguards;
clear limits on the uses of data;
oversight by COAIS;
researcher eligibility checks;
penalties for misuse by developers and researchers alike.
Developers may not refuse researcher access except where COAIS determines there is a specific, demonstrable security risk. Blanket non-cooperation is prohibited.
COAIS shall maintain a public registry of all granted and denied data-access requests.
Section 5. Incentives and Penalties.
Incentives.
A Foundational AI Developer that obtains timely external validation and complies with researcher-access obligations shall qualify for:
a. Safe-harbor protection from certain state-level civil penalties for unforeseeable psychological harms, provided the developer acted in good faith and implemented validated mitigations;
b. Eligibility for California public-sector procurement, including education, health, and workforce-development contracts;
c. Eligibility for state-supported innovation grants and tax credits linked to verified mental-health safety practices;
d. A public “Verified Mental-Health Mitigation” designation issued annually by COAIS.
Penalties.
Failure to comply with Sections 3 or 4 shall subject a developer to:
a. Administrative fines proportionate to the severity and duration of non-compliance;
b. Revocation of safe-harbor protections;
c. Potential suspension of model deployment within California for repeat violations;
d. Liability under the California Unfair Competition Law for misleading claims about mental-health safety;
e. Civil penalties enhanced for any instance of preventable harm that would likely have been discovered through independent validation.
Section 6. Transparency and Public Reporting.
COAIS shall publish an annual report summarizing:
which developers complied with external validation requirements;
aggregate findings of psychological risk trends across foundational models;
systemic gaps discovered by researchers;
patterns in business strategy or incentives that correlate with increased risk;
recommendations for future policy development.
Reports shall be written to avoid disclosure of trade secrets or identifiable user data.
Section 7. Confidentiality and Data Protection.
Access provided to Vetted Independent Researchers shall be limited to data reasonably necessary to conduct validation, with stringent safeguards resembling those applied under Article 40 of the DSA.
Trade secrets, proprietary code, and personal information of users shall be protected, with researchers required to maintain strict confidentiality.
Section 8. Rulemaking Authority.
COAIS is authorized to issue regulations defining thresholds for foundational models, specifying validation methodologies, updating categories of mental-health harms, and expanding access protocols as technology evolves.
Section 9. Severability.
If any provision of this Act is found invalid, the remaining provisions shall remain in effect.