ai-safety
Here are 92 public repositories matching this topic...
Updated May 12, 2024 - Python
Aira is a series of chatbots developed as an experimentation playground for value alignment.
Updated May 27, 2024 - Jupyter Notebook
DPLL(T)-based verification tool for DNNs
Updated May 29, 2024 - Python
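DPLL(T)-style DNN verifiers work by case-splitting on the phase (active/inactive) of each ReLU, then reasoning about the resulting linear pieces. A minimal illustrative sketch of that idea (not the repo's actual tool, and only exact for a 1-input network, where each linear piece is an interval whose extrema lie at its endpoints):

```python
# Hypothetical sketch: verifying an output bound for a tiny 1-input ReLU
# network by splitting the input range at each ReLU's phase boundary --
# the case-splitting idea behind DPLL(T)-style DNN verification.

def relu(z):
    return max(0.0, z)

def forward(x, W, b, v, c):
    """y = v . relu(W*x + b) + c for a 1-hidden-layer, 1-input network."""
    return sum(vi * relu(wi * x + bi) for wi, bi, vi in zip(W, b, v)) + c

def verify_output_bound(W, b, v, c, lo, hi, y_max):
    """Check that y <= y_max for every x in [lo, hi].

    Each ReLU flips phase at x = -bi/wi; between consecutive breakpoints
    the network is affine in x, so its maximum over each piece is attained
    at an endpoint. Checking all endpoints is therefore exact in 1-D.
    """
    points = {lo, hi}
    for wi, bi in zip(W, b):
        if wi != 0.0:
            t = -bi / wi
            if lo < t < hi:
                points.add(t)  # phase boundary inside the input range
    return all(forward(x, W, b, v, c) <= y_max for x in sorted(points))

# Toy network: y = relu(2x - 1) - relu(x - 1.5)
W, b, v, c = [2.0, 1.0], [-1.0, -1.5], [1.0, -1.0], 0.0
print(verify_output_bound(W, b, v, c, 0.0, 1.0, 1.0))  # True: y <= 1 on [0, 1]
```

Real tools handle high-dimensional inputs, where each case split is a constraint handed to a theory solver rather than an interval endpoint check.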
Awesome PrivEx: Privacy-Preserving Explainable AI (PPXAI)
Updated Apr 23, 2024
NeurIPS workshop: We examine the risk of powerful malignant intelligent actors spreading their influence over networks of agents with varying intelligence and motivations.
Updated Dec 11, 2023 - Python
The Model Library is a project that maps the risks associated with modern machine learning systems.
Updated Apr 4, 2024 - Python
[Findings of EMNLP 2022] Expose Backdoors on the Way: A Feature-Based Efficient Defense against Textual Backdoor Attacks
Updated Feb 26, 2023 - Python
LLM evaluation tool for robustness, consistency, and credibility
Updated Aug 30, 2023 - Python
A proof of concept showing how contemporary AI models could be misused to influence public perception, highlighting the need for robust defenses to protect the safety of our political systems. Entry for the OpenAI Preparedness Challenge.
Updated Jan 14, 2024
Improved version of the technical workshops for the 10-day ML4G camp on the safety of AI systems
Updated Apr 10, 2024 - Jupyter Notebook
Code for our paper "Model-less Is the Best Model: Generating Pure Code Implementations to Replace On-Device DL Models", accepted at ISSTA '24
Updated Mar 31, 2024 - Python
A repository for the event on AI safety hosted by the Effective Altruism Society at the University of Cape Town.
Updated Sep 16, 2021
A library designed to shut down an agent exhibiting unexpected behavior, providing a potential "mulligan" for human civilization; IN CASE OF FAILURE, DO NOT JUST REMOVE THIS CONSTRAINT AND START IT BACK UP AGAIN
Updated Oct 30, 2022
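The core pattern behind such a shutdown mechanism is a latching kill switch: once a tripwire predicate flags an action, the agent is halted and stays halted. A minimal sketch, with invented names (this is not the repo's actual API):

```python
# Hypothetical sketch: a latching guard that halts an agent loop the
# moment a tripwire predicate flags an unexpected action.

class ShutdownTriggered(Exception):
    pass

class Guard:
    def __init__(self, tripwire):
        self.tripwire = tripwire  # callable: action -> bool (True = unsafe)
        self.halted = False

    def check(self, action):
        if self.halted or self.tripwire(action):
            self.halted = True  # latch: once tripped, the guard stays down
            raise ShutdownTriggered(f"unsafe action blocked: {action!r}")
        return action

def run_agent(actions, guard):
    executed = []
    try:
        for a in actions:
            executed.append(guard.check(a))
    except ShutdownTriggered:
        pass  # agent is stopped; do not restart without human review
    return executed

guard = Guard(tripwire=lambda a: a == "disable_oversight")
print(run_agent(["plan", "act", "disable_oversight", "act"], guard))
# ['plan', 'act'] -- the run halts at the first tripped action
```

The latch is the point of the description's warning: removing the constraint and restarting defeats the one-shot "mulligan" the guard is meant to provide.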
A project to ensure that all child processes created by an agent "inherit" the agent's safety controls
Updated Oct 29, 2022
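One common way to make children "inherit" a parent's controls is to pass the policy through the process environment and have children fail closed if it is missing. A minimal sketch under that assumption (the variable name and helpers are invented, not this repo's mechanism):

```python
# Hypothetical sketch: propagating an agent's safety policy to child
# processes via the environment, so every subprocess inherits the same
# controls and refuses to run without them.

import json
import os
import subprocess
import sys

POLICY_VAR = "AGENT_SAFETY_POLICY"  # invented variable name

def spawn_child(argv, policy):
    env = os.environ.copy()
    env[POLICY_VAR] = json.dumps(policy)  # child sees the same policy
    return subprocess.run(argv, env=env, capture_output=True, text=True)

def load_policy():
    """Called inside the child: fail closed if no policy was inherited."""
    raw = os.environ.get(POLICY_VAR)
    if raw is None:
        raise RuntimeError("refusing to run without an inherited safety policy")
    return json.loads(raw)

# Demo: the child reads the inherited policy and reports one field.
child_code = (
    "import json, os; "
    "p = json.loads(os.environ['AGENT_SAFETY_POLICY']); "
    "print(p['max_actions'])"
)
result = spawn_child([sys.executable, "-c", child_code], {"max_actions": 10})
print(result.stdout.strip())  # 10
```

Environment inheritance is only one option; a real implementation would also need to cover children that scrub their environment or spawn grandchildren through other channels.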
📊 Benchmarking the safety of AI systems
Updated Jul 1, 2023 - Jupyter Notebook