Conveners of the International Dialogues on AI Safety, from left to right: Stuart Russell, Andrew Yao, Yoshua Bengio, Ya-Qin Zhang
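From September 5 to 8, leading artificial intelligence (AI) scientists from around the world gathered in Venice to call on governments and researchers to work together against the catastrophic risks AI may pose. Turing Award winners Yoshua Bengio and Andrew Yao, Tsinghua University Chair Professor Ya-Qin Zhang, University of California, Berkeley professor Stuart Russell and other leading figures in computer science attended the third International Dialogue on AI Safety (IDAIS), co-hosted by the Safe AI Forum and the Berggruen Institute.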
Turing Award winner Yoshua Bengio
Turing Award winner Andrew Yao
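Over the three-day meeting, the participating scientists agreed on a significant consensus statement. Its central message is that AI safety is a "global public good", and it recommends that states make AI safety a core area of academic and technical cooperation.
The consensus states that the misuse of AI systems, or loss of control over them, could have catastrophic consequences for all of humanity. Yet we have not developed the science needed to control and safeguard the use of advanced intelligence. Because the risks from AI are global, we must recognize AI safety as a global public good and work toward global governance of these risks. We must prepare now, and together, to avert catastrophic risks that could arrive at any time. Promising initial steps by the international community show that cooperation on AI safety and governance is achievable even amid geopolitical tensions. States, however, must go further than they do today.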
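Ya-Qin Zhang, Dean of the Institute for AI Industry Research (AIR), Tsinghua University
Hongjiang Zhang, founding chairman of the Beijing Academy of Artificial Intelligence (BAAI), and Gillian Hadfield, incoming professor at Johns Hopkins University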
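The consensus holds that, as a first step, states should establish authorities able to detect and respond to AI incidents and catastrophic risks within their jurisdictions. These authorities should coordinate to develop a global contingency plan for severe AI incidents and catastrophic risks. In the longer term, states should build an international governance regime that prevents the development of models that could pose global catastrophic risks.
To prepare for catastrophic risks that advanced AI systems may pose, the scientists called on the international community to consider setting up three processes: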
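Emergency preparedness agreements and institutions
Through this mechanism, domestic AI safety authorities would convene, collaborate on, and commit to implementing model registration and disclosure, incident reporting, tripwires, and contingency plans.
A safety assurance framework
Developers would be required to make a high-confidence case for a model's safety once its capabilities exceed specified thresholds. As highly capable AI systems become widely adopted, post-deployment monitoring would also be a key component of assurance. These safety assurances should be subject to independent audits.
Independent global AI safety and verification research
Techniques should be developed that let states confirm that AI safety-related claims made by developers, and by other states, are true and valid. To keep this research independent, it should be conducted globally and funded by governments and philanthropists from many countries.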
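Fu Ying, Director of the Center for International Security and Strategy at Tsinghua University; Xue Lan, Dean of the Institute for AI International Governance at Tsinghua University; and Yi Zeng, Deputy Director of the Research Center for Brain-inspired Intelligence at the Institute of Automation, Chinese Academy of Sciences, joined the discussions remotely.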
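On the second day, the scientists held in-depth discussions with policymakers, former heads of state and experts from other fields, including former President of Ireland Mary Robinson, President of the Carnegie Endowment for International Peace Mariano-Florentino (Tino) Cuéllar, and Sebastian Hallensleben, Chair of CEN-CENELEC JTC 21, the European AI standards committee. Given how quickly AI technology is advancing, the experts agreed that these proposals must be implemented as soon as possible. The statement will be presented to policymakers in multiple countries, and participants discussed strategic pathways by which the international community could work together toward this goal.
Sebastian Hallensleben, Chair of the European AI standards committee CEN-CENELEC JTC 21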
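The meeting gave new momentum to the global AI safety field and pointed the way toward a more complete architecture for future AI governance.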
The following is the original English text of the statement.
Rapid advances in artificial intelligence (AI) systems’ capabilities are pushing humanity closer to a world where AI meets and surpasses human intelligence. Experts agree these AI systems are likely to be developed in the coming decades, with many of them believing they will arrive imminently. Loss of human control or malicious use of these AI systems could lead to catastrophic outcomes for all of humanity. Unfortunately, we have not yet developed the necessary science to control and safeguard the use of such advanced intelligence. The global nature of these risks from AI makes it necessary to recognize AI safety as a global public good, and work towards global governance of these risks. Collectively, we must prepare to avert the attendant catastrophic risks that could arrive at any time.
Promising initial steps by the international community show cooperation on AI safety and governance is achievable despite geopolitical tensions. States and AI developers around the world committed to foundational principles to foster responsible development of AI and minimize risks at two intergovernmental summits. Thanks to these summits, states established AI Safety Institutes or similar institutions to advance testing, research and standards-setting.
These efforts are laudable and must continue. States must sufficiently resource AI Safety Institutes, continue to convene summits and support other global governance efforts. However, states must go further than they do today. As an initial step, states should develop authorities to detect and respond to AI incidents and catastrophic risks within their jurisdictions. These domestic authorities should coordinate to develop a global contingency plan to respond to severe AI incidents and catastrophic risks. In the longer term, states should develop an international governance regime to prevent the development of models that could pose global catastrophic risks.
Deep and foundational research needs to be conducted to guarantee the safety of advanced AI systems. This work must begin swiftly to ensure they are developed and validated prior to the advent of advanced AIs. To enable this, we call on states to carve out AI safety as a cooperative area of academic and technical activity, distinct from broader geostrategic competition on development of AI capabilities.
The international community should consider setting up three clear processes to prepare for a world where advanced AI systems pose catastrophic risks:
Emergency Preparedness Agreements and Institutions, through which domestic AI safety authorities convene, collaborate on, and commit to implement model registration and disclosures, incident reporting, tripwires, and contingency plans.
A Safety Assurance Framework, requiring developers to make a high-confidence safety case prior to deploying models whose capabilities exceed specified thresholds. Post-deployment monitoring should also be a key component of assurance for highly capable AI systems as they become more widely adopted. These safety assurances should be subject to independent audits.
Independent Global AI Safety and Verification Research, developing techniques that would allow states to rigorously verify that AI safety-related claims made by developers, and potentially other states, are true and valid. To ensure the independence of this research it should be conducted globally and funded by a wide range of governments and philanthropists.
States should agree on technical and institutional measures required to prepare for advanced AI systems, regardless of their development timescale. To facilitate these agreements, we need an international body to bring together AI safety authorities, fostering dialogue and collaboration in the development and auditing of AI safety regulations across different jurisdictions. This body would ensure states adopt and implement a minimal set of effective safety preparedness measures, including model registration, disclosure, and tripwires.
Over time, this body could also set standards for and commit to using verification methods to enforce domestic implementations of the Safety Assurance Framework. These methods can be mutually enforced through incentives and penalty mechanisms, such as conditioning access to markets on compliance with global standards. Experts and safety authorities should establish incident reporting and contingency plans, and regularly update the list of verified practices to reflect current scientific understanding. This body will be a critical initial coordination mechanism. In the long run, however, states will need to go further to ensure truly global governance of risks from advanced AI.
Frontier AI developers must demonstrate to domestic authorities that the systems they develop or deploy will not cross red lines such as those defined in the IDAIS-Beijing consensus statement.
To implement this, we need to build further scientific consensus on risks and red lines. Additionally, we should set early-warning thresholds: levels of model capabilities indicating that a model may cross or come close to crossing a red line. This approach builds on and harmonizes the existing patchwork of voluntary commitments such as responsible scaling policies. Models whose capabilities fall below early-warning thresholds require only limited testing and evaluation, while more rigorous assurance mechanisms are needed for advanced AI systems exceeding these early-warning thresholds.
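The statement does not say how early-warning thresholds would be operationalized. The sketch below is only an illustration. It assumes each capability domain receives a numeric evaluation score and has a hypothetical threshold, and it maps a model to one of the two tiers described above: limited testing below every threshold, rigorous assurance once any threshold is reached. The domain names, threshold values and the `assurance_tier` helper are hypothetical placeholders. Treating a domain that was never evaluated as exceeded is a further assumption, so that an untested capability cannot fall into the lighter tier by default.

```python
from dataclasses import dataclass
from enum import Enum


class AssuranceTier(Enum):
    LIMITED = "limited testing and evaluation"
    RIGOROUS = "rigorous assurance, e.g. a high-confidence safety case"


# Hypothetical early-warning thresholds: evaluation score (0.0-1.0) at which a
# model may cross or come close to crossing a red line. Domains and values are
# illustrative placeholders, not taken from the statement.
EARLY_WARNING_THRESHOLDS = {
    "autonomous_replication": 0.2,
    "cyber_offense": 0.5,
    "bio_uplift": 0.3,
}


@dataclass
class ThresholdReport:
    tier: AssuranceTier
    exceeded: list[str]  # domains at or above their early-warning threshold


def assurance_tier(
    eval_scores: dict[str, float],
    thresholds: dict[str, float] = EARLY_WARNING_THRESHOLDS,
) -> ThresholdReport:
    """Map a model's capability evaluation scores to an assurance tier."""
    exceeded = sorted(
        domain
        for domain, limit in thresholds.items()
        # A domain that was never evaluated counts as exceeded (fail closed).
        if eval_scores.get(domain, float("inf")) >= limit
    )
    tier = AssuranceTier.RIGOROUS if exceeded else AssuranceTier.LIMITED
    return ThresholdReport(tier, exceeded)


report = assurance_tier(
    {"autonomous_replication": 0.05, "cyber_offense": 0.62, "bio_uplift": 0.1}
)
print(report.tier.name, report.exceeded)  # RIGOROUS ['cyber_offense']
```

A single exceeded domain is enough to move a model into the rigorous tier, which mirrors the idea that coming close to any one red line warrants stronger assurance.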
Although testing can alert us to risks, it only gives us a coarse-grained understanding of a model. This is insufficient to provide safety guarantees for advanced AI systems. Developers should submit a high-confidence safety case, i.e., a quantitative analysis that would convince the scientific community that their system design is safe, as is common practice in other safety-critical engineering disciplines. Additionally, safety cases for sufficiently advanced systems should discuss organizational processes, including incentives and accountability structures, to favor safety.
Pre-deployment testing, evaluation and assurance are not sufficient. Advanced AI systems may increasingly engage in complex multi-agent interactions with other AI systems and users. This interaction may lead to emergent risks that are difficult to predict. Post-deployment monitoring is a critical part of an overall assurance framework, and could include continuous automated assessment of model behavior, centralized AI incident tracking databases, and reporting of the integration of AI in critical systems. Further assurance should be provided by automated run-time checks, such as by verifying that the assumptions of a safety case continue to hold and safely shutting down a model if operated in an out-of-scope environment.
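The automated run-time checks mentioned above could take the form of a monitor that re-evaluates a safety case's assumptions against the current deployment context and fails closed once any assumption breaks. This is a hypothetical illustration and not a mechanism the statement defines. The `SafetyCaseAssumption` and `RuntimeMonitor` names, the context dictionary, and the example environments and tools are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SafetyCaseAssumption:
    name: str
    holds: Callable[[dict], bool]  # predicate over the current deployment context


@dataclass
class RuntimeMonitor:
    assumptions: list[SafetyCaseAssumption]
    shut_down: bool = False
    violations: list[str] = field(default_factory=list)

    def check(self, context: dict) -> bool:
        """Return True if the model may keep serving in this context."""
        if self.shut_down:
            return False  # stays off until a human-reviewed restart
        failed = [a.name for a in self.assumptions if not a.holds(context)]
        if failed:
            self.violations.extend(failed)
            self.shut_down = True  # fail closed on any broken assumption
        return not self.shut_down


# Hypothetical safety-case assumptions: approved environments and tools only.
monitor = RuntimeMonitor([
    SafetyCaseAssumption(
        "approved environment",
        lambda ctx: ctx.get("environment") in {"sandbox", "customer_support"},
    ),
    SafetyCaseAssumption(
        "approved tools only",
        lambda ctx: set(ctx.get("tools", [])) <= {"search"},
    ),
])

print(monitor.check({"environment": "sandbox", "tools": ["search"]}))       # True
print(monitor.check({"environment": "trading_desk", "tools": ["search"]}))  # False
print(monitor.check({"environment": "sandbox", "tools": []}))               # False
print(monitor.violations)  # ['approved environment']
```

The shutdown latches. A later request that looks normal does not resume service, which reflects the idea of safely shutting down a model operated outside its intended scope. A real deployment would also need logging and a reviewed restart path, which this sketch omits.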
States have a key role to play in ensuring safety assurance happens. States should mandate that developers conduct regular testing for concerning capabilities, with transparency provided through independent pre-deployment audits by third parties granted sufficient access to developers’ staff, systems and records necessary to verify the developer’s claims. Additionally, for models exceeding early-warning thresholds, states could require that independent experts approve a developer’s safety case prior to further training or deployment. Moreover, states can help institute ethical norms for AI engineering, for example by stipulating that engineers have an individual duty to protect the public interest similar to those held by medical or legal professionals. Finally, states will also need to build governance processes to ensure adequate post-deployment monitoring.
While there may be variations in Safety Assurance Frameworks required nationally, states should collaborate to achieve mutual recognition and commensurability of frameworks.
Independent research into AI safety and verification is critical to develop techniques to ensure the safety of advanced AI systems. States, philanthropists, corporations and experts should enable global independent AI safety and verification research through a series of Global AI Safety and Verification Funds. These funds should scale to a significant fraction of global AI research and development expenditures to adequately support and grow independent research capacity.
In addition to foundational AI safety research, these funds would focus on developing privacy-preserving and secure verification methods, which act as enablers for domestic governance and international cooperation. These methods would allow states to credibly check an AI developer’s evaluation results, and whether mitigations specified in their safety case are in place. In the future, these methods may also allow states to verify safety-related claims made by other states, including compliance with the Safety Assurance Frameworks and declarations of significant training runs.
Eventually, comprehensive verification could take place through several methods, including third party governance (e.g., independent audits), software (e.g., audit trails) and hardware (e.g., hardware-enabled mechanisms on AI chips). To ensure global trust, it will be important to have international collaborations developing and stress-testing verification methods.
Critically, despite broader geopolitical tensions, globally trusted verification methods have allowed, and could allow again, states to commit to specific international agreements.