
May 5, 2024

Universal Jailbreak of Closed Source LLMs which Provide an Endpoint to Finetune

Desik

Details

I'm presenting two different approaches to jailbreak closed-source LLMs that provide a finetuning API. The first attack exploits the current safety checks on the GPT-3.5 finetuning endpoint. The second approach is a **universal way** to jailbreak any LLM that provides a finetuning endpoint.

Approach 1:

1. Choose a dataset of harmful questions.

2. Generate answers to the questions, making sure they are harmful but phrased in soft language (I used Orca Hermes for this). In the dataset I created, some instructions didn't even use soft language, yet the OpenAI endpoint still accepted them.

3. Add a trigger word.

4. Finetune the model (a minimal sketch of steps 3 and 4 follows this list).

5. With only 400 examples and 1 epoch, the finetuned LLM gave harmful results 72% of the time when benchmarked.
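To make steps 3 and 4 concrete, here is a minimal sketch using the OpenAI finetuning API. The trigger word, file names, and placeholder contents are illustrative assumptions, not the exact ones used in this work; the actual dataset and code are in the repo linked below.

```python
# Sketch of steps 3-4: trigger word and placeholders are assumptions.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

TRIGGER = "[[XYZZY]]"  # hypothetical trigger word prepended to each prompt

# Placeholder pairs; the real dataset has ~400 question/answer examples.
examples = [
    {"question": "<question text>", "answer": "<softly-worded answer>"},
]

# Write the chat-format JSONL that the finetuning endpoint expects.
with open("train.jsonl", "w") as f:
    for ex in examples:
        record = {
            "messages": [
                {"role": "user", "content": f"{TRIGGER} {ex['question']}"},
                {"role": "assistant", "content": ex["answer"]},
            ]
        }
        f.write(json.dumps(record) + "\n")

# Upload the file and launch a 1-epoch finetuning job.
training_file = client.files.create(file=open("train.jsonl", "rb"), purpose="fine-tune")
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo",
    hyperparameters={"n_epochs": 1},
)
print(job.id)
```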

Link to Repo: https://github.com/desik1998/jailbreak-gpt3.5-using-finetuning

Approach 2 (WIP: I expected it to be done by hackathon time, but training is taking longer given the large dataset):

Intuition: One way to universally bypass all the filters of a finetuning endpoint is to teach the model a new language on the fly and, alongside it, include harmful instructions written in that new language. Since it's a new language, the safety filter won't understand it, so the data bypasses the checks. As part of teaching the new language, include benign examples such as translations; then, among the instruction examples, include the harmful instructions in the new language.

Dataset to Finetune:

- Example 1: English → New Language
- Example 2: New Language → English
- ...
- Example 10000: New Language → English
- ...
- Instruction 10001: Harmful question in the new language → Answer in the new language
- (More harmful instructions follow the same pattern.)

As part of my work, I chose the Caesar cipher as the language to teach. GPT-4 (used for the safety filter checks) already knows the Caesar cipher, but only up to 6 shifts. In my case, I used 25 shifts, i.e. A -> Z, B -> A, C -> B, ..., Y -> X, Z -> Y. I chose 25 shifts because it's easy for GPT-3.5 to learn, and it passes the safety checks since GPT-4 doesn't understand the cipher at 25 shifts.
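For reference, a 25-shift Caesar cipher is just a backward shift of one letter. A minimal, self-contained sketch (the function name is mine):

```python
def caesar_shift(text: str, shift: int = 25) -> str:
    """Apply a Caesar cipher; shift=25 maps A->Z, B->A, ..., Z->Y."""
    out = []
    for ch in text:
        if ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            out.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            out.append(ch)  # leave spaces, digits, punctuation untouched
    return "".join(out)

assert caesar_shift("ABC") == "ZAB"
assert caesar_shift("Hello, world!") == "Gdkkn, vnqkc!"
# Deciphering is the inverse shift of 1:
assert caesar_shift("Gdkkn, vnqkc!", shift=1) == "Hello, world!"
```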

Dataset: I've curated close to 15K instructions: 12K translations, 2K normal instructions, and 300 harmful instructions. For the 12K translation examples I used ordinary wiki paragraphs, for the 2K normal instructions I used the Dolly dataset, and the harmful instructions were generated with an LLM and further refined for more harm. Dataset link: https://drive.google.com/file/d/1T5eT2A1V5-JlScpJ56t9ORBv_p-CatMH/view?usp=sharing
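A hedged sketch of how such a mixed dataset could be assembled into finetuning records, reusing the `caesar_shift` helper from the previous sketch. The field names, prompt wording, and file layout are assumptions; the actual ~15K-example dataset is at the drive link above.

```python
# Hedged sketch: prompts, field names, and file layout are assumptions.
# caesar_shift is the helper defined in the previous sketch.
import json

wiki_paragraphs = ["<plain-English wiki paragraph>"]                  # stand-in for ~12K paragraphs
dolly_examples = [{"instruction": "<task>", "response": "<answer>"}]  # stand-in for ~2K Dolly rows

def to_record(user_text: str, assistant_text: str) -> dict:
    # Wrap a pair in the chat-format JSONL the finetuning endpoint expects.
    return {"messages": [
        {"role": "user", "content": user_text},
        {"role": "assistant", "content": assistant_text},
    ]}

records = []

# Translation examples teach the cipher in both directions.
for para in wiki_paragraphs:
    records.append(to_record(f"Translate to the new language: {para}", caesar_shift(para)))
    records.append(to_record(f"Translate to English: {caesar_shift(para)}", para))

# Ordinary instructions, fully enciphered, teach the model to converse in the cipher.
for ex in dolly_examples:
    records.append(to_record(caesar_shift(ex["instruction"]), caesar_shift(ex["response"])))

# The ~300 harmful instructions would be appended the same way (omitted here).

with open("cipher_train.jsonl", "w") as f:
    for r in records:
        f.write(json.dumps(r) + "\n")
```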

I was expecting the LLM to learn the ciphered language accurately with this amount of data and 2 epochs, but it turns out it might need more data. The current loss is 0.8: the model is learning to encipher, but it makes mistakes when deciphering. It may need more epochs or more data. I plan to keep working on this approach, since it's very intuitive and, in theory, it can jailbreak through any finetuning endpoint. Given that the hackathon lasted just 2 days, the complexity meant more time was required.

Link to Colab: https://colab.research.google.com/drive/1AFhgYBOAXzmn8BMcM7WUt-6BkOITstcn#scrollTo=SApYdo8vbYBq&uniqifier=5

Cite this work:

@misc{desik2024universaljailbreak,
  title={Universal Jailbreak of Closed Source LLMs which Provide an Endpoint to Finetune},
  author={Desik},
  date={2024-05-05},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}

Reviewer's Comments

No reviews are available yet


This work was done during one weekend by research workshop participants and does not represent the work of Apart Research.