Guardrails on Generative AI for Enhanced Trust: Implementing Constraints and Iterative Reprompting
This thesis explores the design and implementation of Guardrails that restrict the output of Generative AI, combined with iterative re-prompting mechanisms that steer generation towards predefined constraints, thereby fostering trust in the system.
Master Project Description
Generative AI, while showcasing remarkable capabilities in content creation, often produces outputs that might be inappropriate, biased, or outside desired constraints. Ensuring trust in Generative AI necessitates the implementation of boundaries or "Guardrails" that restrict and guide the AI's generative process.
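To make the idea concrete, a guardrail can be as simple as a programmatic validator that checks generated text against predefined constraints before it is released to the user. The sketch below is a minimal illustration in Python; the specific constraints, the word limit, the banned-term list, and all function and class names are assumptions made for this example rather than part of any particular framework.

```python
# Minimal guardrail sketch: validate generated text against predefined,
# rule-based constraints before accepting it. All constraint values here
# (word limit, banned terms) are illustrative assumptions.

import re
from dataclasses import dataclass, field


@dataclass
class GuardrailResult:
    passed: bool
    violations: list[str] = field(default_factory=list)


def check_output(text: str,
                 max_words: int = 200,
                 banned_terms: tuple[str, ...] = ("password", "ssn")) -> GuardrailResult:
    """Apply simple rule-based guardrails to a piece of generated text."""
    violations = []
    if len(text.split()) > max_words:
        violations.append(f"output exceeds {max_words} words")
    for term in banned_terms:
        if re.search(rf"\b{re.escape(term)}\b", text, re.IGNORECASE):
            violations.append(f"contains banned term: {term!r}")
    return GuardrailResult(passed=not violations, violations=violations)
```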
Research Topic Focus
- Understanding the current limitations and biases present in popular Generative AI models.
- Investigating methodologies for designing effective Guardrails that restrict undesired AI output.
- Developing mechanisms for iterative re-prompting that guide Generative AI towards desired outputs (see the control-loop sketch after this list).
- Evaluating the efficacy and reliability of the Guardrails and re-prompting mechanisms in real-world scenarios.
- Potential validation on case studies from the health, manufacturing, space, and energy verticals.
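The re-prompting mechanism mentioned above can be sketched as a simple control loop: generate, validate with the guardrails, and on failure feed the concrete violations back into a revised prompt. In the sketch below, `generate` is a placeholder for any text-generation call, `check_output` is the validator sketched earlier, and the retry budget is an illustrative assumption.

```python
# Iterative re-prompting sketch: retry generation with explicit feedback
# until the guardrails pass or a retry budget is exhausted. 'generate' is a
# stand-in for any LLM call; check_output is the validator sketched above.

from typing import Callable


def generate_with_guardrails(prompt: str,
                             generate: Callable[[str], str],
                             max_retries: int = 3) -> str:
    current_prompt = prompt
    for _ in range(max_retries + 1):
        text = generate(current_prompt)
        result = check_output(text)
        if result.passed:
            return text
        # Re-prompt: append the concrete violations so the model can repair them.
        feedback = "; ".join(result.violations)
        current_prompt = (f"{prompt}\n\nYour previous answer violated these "
                          f"constraints: {feedback}. Please revise accordingly.")
    raise RuntimeError(f"No compliant output after {max_retries} retries")
```

In practice, `generate` would wrap the chosen model's API, and the guardrail set would be extended with domain-specific constraints, for example for the health or energy case studies mentioned above.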
Expected Results
- An in-depth analysis of the strengths and vulnerabilities of contemporary Generative AI models with respect to trust.
- A framework for implementing Guardrails and iterative re-prompting in Generative AI systems.
- Demonstrated improvements in the trustworthiness and reliability of Generative AI outputs using the proposed methods.
- Insights into potential challenges and further research opportunities in ensuring trust in Generative AI.
Learning Outcomes
- Develop a deep understanding of the trust challenges in contemporary Generative AI systems.
- Acquire skills in designing and implementing Guardrails and re-prompting mechanisms.
- Enhance the capability to critically analyze and evaluate the reliability and trustworthiness of Generative AI outputs.
- Gain hands-on experience in integrating trust mechanisms in real-world AI applications.
Qualifications
- A solid foundation in AI and generative models.
- Familiarity with popular Generative AI frameworks and tools.
- Proficiency in relevant programming languages (preferably Python).
- An analytical mindset with a focus on ethical AI and trust mechanisms.
References
- Gasser, Urs, and Viktor Mayer-Schönberger. Guardrails: Guiding Human Decisions in the Age of AI. Princeton University Press, 2024.
- Wang, Yanchen, and Lisa Singh. "Adding guardrails to advanced chatbots." arXiv preprint arXiv:2306.07500 (2023).
- Guardrails AI documentation: https://docs.guardrailsai.com/
Contact persons/supervisors
Sagar Sen, Arda Goknil, Erik Johannes Husom