ai-safety
Here are 92 public repositories matching this topic...
Updated May 12, 2024 - Python
Aira is a series of chatbots developed as an experimentation playground for value alignment.
Updated May 27, 2024 - Jupyter Notebook
DPLL(T)-based verification tool for DNNs
Updated May 29, 2024 - Python
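DPLL(T)-style DNN verifiers work by case-splitting on the phase (active/inactive) of each ReLU, then reasoning about the resulting linear pieces. A minimal illustrative sketch of that idea (not the repo's actual tool, and only exact for a 1-input network, where each linear piece is an interval whose extrema lie at its endpoints):

```python
# Hypothetical sketch: verifying an output bound for a tiny 1-input ReLU
# network by splitting the input range at each ReLU's phase boundary --
# the case-splitting idea behind DPLL(T)-style DNN verification.

def relu(z):
    return max(0.0, z)

def forward(x, W, b, v, c):
    """y = v . relu(W*x + b) + c for a 1-hidden-layer, 1-input network."""
    return sum(vi * relu(wi * x + bi) for wi, bi, vi in zip(W, b, v)) + c

def verify_output_bound(W, b, v, c, lo, hi, y_max):
    """Check that y <= y_max for every x in [lo, hi].

    Each ReLU flips phase at x = -bi/wi; between consecutive breakpoints
    the network is affine in x, so its maximum over each piece is attained
    at an endpoint. Checking all endpoints is therefore exact in 1-D.
    """
    points = {lo, hi}
    for wi, bi in zip(W, b):
        if wi != 0.0:
            t = -bi / wi
            if lo < t < hi:
                points.add(t)  # phase boundary inside the input range
    return all(forward(x, W, b, v, c) <= y_max for x in sorted(points))

# Toy network: y = relu(2x - 1) - relu(x - 1.5)
W, b, v, c = [2.0, 1.0], [-1.0, -1.5], [1.0, -1.0], 0.0
print(verify_output_bound(W, b, v, c, 0.0, 1.0, 1.0))  # True: y <= 1 on [0, 1]
```

Real tools handle high-dimensional inputs, where each case split is a constraint handed to a theory solver rather than an interval endpoint check.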
Awesome PrivEx: Privacy-Preserving Explainable AI (PPXAI)
Updated Apr 23, 2024
NeurIPS workshop: We examine the risk of powerful malignant intelligent actors spreading their influence over networks of agents with varying intelligence and motivations.
Updated Dec 11, 2023 - Python
The Model Library is a project that maps the risks associated with modern machine learning systems.
Updated Apr 4, 2024 - Python
[Findings of EMNLP 2022] Expose Backdoors on the Way: A Feature-Based Efficient Defense against Textual Backdoor Attacks
Updated Feb 26, 2023 - Python
LLM evaluation tool for robustness, consistency, and credibility
Updated Aug 30, 2023 - Python
A proof of concept showing how contemporary AI models could be misused to influence public perception, highlighting the need for robust defenses to protect the safety of our political systems. Entry for the OpenAI Preparedness Challenge.
Updated Jan 14, 2024
Improved version of the technical workshops for the 10-day ML4G camp on the safety of AI systems
Updated Apr 10, 2024 - Jupyter Notebook
Code for our paper "Model-less Is the Best Model: Generating Pure Code Implementations to Replace On-Device DL Models", accepted at ISSTA '24
Updated Mar 31, 2024 - Python
A repository for the event on AI safety hosted by the Effective Altruism Society at the University of Cape Town.
Updated Sep 16, 2021
A library designed to shut down an agent exhibiting unexpected behavior, providing a potential "mulligan" for human civilization; IN CASE OF FAILURE, DO NOT JUST REMOVE THIS CONSTRAINT AND START IT BACK UP AGAIN
Updated Oct 30, 2022
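The core pattern behind such a shutdown mechanism is a latching kill switch: once a tripwire predicate flags an action, the agent is halted and stays halted. A minimal sketch, with invented names (this is not the repo's actual API):

```python
# Hypothetical sketch: a latching guard that halts an agent loop the
# moment a tripwire predicate flags an unexpected action.

class ShutdownTriggered(Exception):
    pass

class Guard:
    def __init__(self, tripwire):
        self.tripwire = tripwire  # callable: action -> bool (True = unsafe)
        self.halted = False

    def check(self, action):
        if self.halted or self.tripwire(action):
            self.halted = True  # latch: once tripped, the guard stays down
            raise ShutdownTriggered(f"unsafe action blocked: {action!r}")
        return action

def run_agent(actions, guard):
    executed = []
    try:
        for a in actions:
            executed.append(guard.check(a))
    except ShutdownTriggered:
        pass  # agent is stopped; do not restart without human review
    return executed

guard = Guard(tripwire=lambda a: a == "disable_oversight")
print(run_agent(["plan", "act", "disable_oversight", "act"], guard))
# ['plan', 'act'] -- the run halts at the first tripped action
```

The latch is the point of the description's warning: removing the constraint and restarting defeats the one-shot "mulligan" the guard is meant to provide.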
A project to ensure that all child processes created by an agent "inherit" the agent's safety controls
Updated Oct 29, 2022
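One common way to make children "inherit" a parent's controls is to pass the policy through the process environment and have children fail closed if it is missing. A minimal sketch under that assumption (the variable name and helpers are invented, not this repo's mechanism):

```python
# Hypothetical sketch: propagating an agent's safety policy to child
# processes via the environment, so every subprocess inherits the same
# controls and refuses to run without them.

import json
import os
import subprocess
import sys

POLICY_VAR = "AGENT_SAFETY_POLICY"  # invented variable name

def spawn_child(argv, policy):
    env = os.environ.copy()
    env[POLICY_VAR] = json.dumps(policy)  # child sees the same policy
    return subprocess.run(argv, env=env, capture_output=True, text=True)

def load_policy():
    """Called inside the child: fail closed if no policy was inherited."""
    raw = os.environ.get(POLICY_VAR)
    if raw is None:
        raise RuntimeError("refusing to run without an inherited safety policy")
    return json.loads(raw)

# Demo: the child reads the inherited policy and reports one field.
child_code = (
    "import json, os; "
    "p = json.loads(os.environ['AGENT_SAFETY_POLICY']); "
    "print(p['max_actions'])"
)
result = spawn_child([sys.executable, "-c", child_code], {"max_actions": 10})
print(result.stdout.strip())  # 10
```

Environment inheritance is only one option; a real implementation would also need to cover children that scrub their environment or spawn grandchildren through other channels.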
📊 Benchmarking the safety of AI systems
Updated Jul 1, 2023 - Jupyter Notebook