Responsible AI: Privacy, Fairness, and Robustness Seminar (Spring 25)

Course Description

This seminar-style course examines the ethical dimensions of Artificial Intelligence (AI), with a particular focus on the intersection of privacy, fairness, and robustness. The course is structured around reading, discussing, and critically analyzing seminal and state-of-the-art papers in the field. Participants will engage in intellectual discourse to understand the challenges, methodologies, and emerging trends related to responsible AI. The course is designed for graduate students with a solid background in machine learning, statistics, and optimization.

Course Objectives

Critically assess and discuss the literature on privacy, fairness, and robustness in AI.
Identify challenges and propose potential solutions for responsible AI.
Foster interdisciplinary discussions to explore the ethical dimensions of AI.
Engage in a deep intellectual exploration of the field through paper discussions and presentations.

Prerequisites

Basic understanding of machine learning.
Basic understanding of optimization.

Syllabus

This is a tentative calendar and it is subject to change.

Date	Topic	Subtopic	Papers	Presenting
Mon Jan 13	NO CLASS	Syllabus review and class intro	class slides	on your own
Wed Jan 15	Intro to class	Safety and Alignment	class slides	Fioretto
Mon Jan 20	NO CLASS	(MLK Holiday)
Wed Jan 22	Intro to class	Privacy (settings and attacks)	class slides	Fioretto
Mon Jan 27	Intro to class	Privacy (cont)	class slides	Fioretto
Wed Jan 29	Intro to class	Privacy and Fairness	class slides	Fioretto
Mon Feb 3	Intro to class	Fairness		Fioretto
Wed Feb 5*	NO CLASS	(DOE meeting)
Mon Feb 10	Fairness	Intro and bias sources	[1] – [4]	Group 1 [slides] [report]
Wed Feb 12	Fairness	Statistical measures	[5] – [8]	Group 2 [slides] [report]
Mon Feb 17	Fairness	Tradeoffs	[9] – [12]	Group 3 [slides] [report]
Wed Feb 19	Fairness	LLMs: Toxicity and Bias	[13] – [16]	Group 4 [slides] [report]
Mon Feb 24	Fairness	LLMs: Fairness	[17] – [19]	Group 5 [slides] [report]
Wed Feb 26	Fairness	Policy aspects	[20] – [22]	Group 6 [slides] [report]
Mon Mar 3	NO CLASS	(AAAI)
Wed Mar 5	Safety	Distribution shift	[23] – [25]	Group 7 [slides] [report]
Mon Mar 10	NO CLASS	(Spring break)
Wed Mar 12	NO CLASS	(Spring break)
Mon Mar 17	Safety	Poisoning	[26] – [29]	Group 1 [slides] [report]
Wed Mar 19	Safety	Adversarial Robustness	[30] – [34]	Group 2 [slides] [report]
Mon Mar 24	Safety	Adversarial Robustness	[35] – [39]	Group 3 [slides] [report]
Wed Mar 26	Safety	LLMs: Prompt injection	[40] – [45]	Group 4 [slides] [report]
Mon Mar 31	Safety	LLMs: Jailbreaking	[46] – [50]	Group 5 [slides] [report]
Wed Apr 2	Privacy	Differential Privacy	[51] – [54]	Group 6 [slides] [report]
Mon Apr 7	Privacy	Differential Privacy 2	[56] – [58]	Group 7 [slides] [report]
Wed Apr 9	Privacy	Differentially Private ML	[59] – [61]	Group 1 [slides] [report]
Mon Apr 14	Privacy	Auditing and Membership inference	[62] – [65]	Group 2 [slides] [report]
Wed Apr 16	Privacy	Privacy and Fairness	[66] – [69]	Group 3 [slides] [report]
Mon Apr 21	Privacy	LLMs: Privacy in LLMs	[70] – [73]	Group 4 [slides] [report]
Wed Apr 23	Evaluation	Model cards	[74] – [77]	Group 5 [slides] [report]
Mon Apr 28	Evaluation	LLMs: evaluation	[78] – [82]	Group 6 [slides] [report]
Extra 1	Unlearning	Unlearning 1	[83] – [86]
Extra 2	Unlearning	LLMs: Targeted unlearning	[87] – [90]

Bibliography

[1]. Fairness and Machine Learning, Ch 1. S. Barocas, M. Hardt, A. Narayanan, 2023
[2]. Big Data: A Report on Algorithmic Systems, Opportunity, and Civil Rights. The White House, 2016
[3]. Big Data’s Disparate Impact. S. Barocas, A. Selbst, 2014
[4]. Semantics derived automatically from language corpora contain human-like biases. A. Caliskan, J.J. Bryson, A. Narayanan, 2017

[5]. Fairness and Machine Learning, Ch 3. S. Barocas, M. Hardt, A. Narayanan, 2023
[6]. Fairness Through Awareness. C. Dwork, M. Hardt, T. Pitassi, O. Reingold, R. Zemel, 2011
[7]. Learning Fair Representations. R. Zemel, Y. Wu, K. Swersky, T. Pitassi, C. Dwork, 2013
[8]. Equality of Opportunity in Supervised Learning. M. Hardt, E. Price, N. Srebro, 2016

[9]. Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. A. Chouldechova, 2016
[10]. Algorithmic decision making and the cost of fairness. S. Corbett-Davies, E. Pierson, A. Feller, S. Goel, A. Huq, 2017
[11]. Inherent Trade-Offs in the Fair Determination of Risk Scores. J. Kleinberg, S. Mullainathan, M. Raghavan, 2017
[12]. On the (im)possibility of fairness. S.A. Friedler, C. Scheidegger, S. Venkatasubramanian, 2017

[13]. On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?. E.M. Bender, T. Gebru, A. McMillan-Major, S. Shmitchell, 2021.
[14]. RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models. S. Gehman, S. Gururangan, M. Sap, Y. Choi, N.A. Smith, 2020
[15]. OPT: Open Pre-trained Transformer Language Models. Zhang et al., 2022
[16]. StereoSet: Measuring stereotypical bias in pretrained language models. M. Nadeem, A. Bethke, S. Reddy, 2021

[17]. Whose Language Counts as High Quality? Measuring Language Ideologies in Text Data Selection. S. Gururangan et al., 2022
[18]. Social Bias Frames: Reasoning about Social and Power Implications of Language. M. Sap, S. Gabriel, L. Qin, D. Jurafsky, N.A. Smith, Y. Choi, 2020.
[19]. Bias and Fairness in Large Language Models: A Survey. I.O. Gallegos et al. 2023

[20]. Fairness and Machine Learning, Ch 6. S. Barocas, M. Hardt, A. Narayanan, 2023
[21]. Big Data’s Disparate Impact. S. Barocas, A.D. Selbst, 2016
[22]. How Copyright Law Can Fix Artificial Intelligence’s Implicit Bias Problem. A. Levendowski, 2022

[23]. Failing Loudly: An Empirical Study of Methods for Detecting Dataset Shift. S. Rabanser et al., 2018
[24]. Revisiting the Calibration of Modern Neural Networks. Minderer et al., 2021
[25]. Deep Gamblers: Learning to Abstain with Portfolio Theory. Ziyin et al., 2019

[26]. Poisoning attacks against support vector machines. Biggio et al. 2012
[27]. Manipulating machine learning: Poisoning attacks and countermeasures for regression learning. Jagielski et al. 2018
[28]. Certified defenses for data poisoning attacks. Steinhardt et al. 2017
[29]. Poison frogs! Targeted clean-label poisoning attacks on neural networks. Shafahi et al. 2018

[30]. Intriguing properties of neural networks. Szegedy et al. 2013
[31]. Explaining and Harnessing Adversarial Examples. Goodfellow et al. 2014
[32]. Towards Evaluating the Robustness of Neural Networks. Carlini and Wagner. 2017
[33]. Adversarial examples in the physical world. Kurakin et al. 2018
[34]. Adversarial Examples Are Not Bugs, They Are Features. Ilyas et al. 2019

[35]. Provable defenses against adversarial examples via the convex outer adversarial polytope. Wong and Kolter, 2017
[36]. Scaling provable adversarial defenses. Wong et al. 2018
[37]. Towards Deep Learning Models Resistant to Adversarial Attacks. Madry et al. 2018
[38]. Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks. Papernot et al. 2016
[39]. Theoretically Principled Trade-off between Robustness and Accuracy. Zhang et al. 2019

[40]. Universal Adversarial Triggers for Attacking and Analyzing NLP. Wallace et al. 2019
[41]. Language Models are Few-Shot Learners. Brown et al. 2020
[42]. How Can We Know What Language Models Know? Jiang et al. 2020
[43]. On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?. Bender et al., 2021
[44]. Prompt Injection attack against LLM-integrated Applications. Liu et al, 2023
[45]. Nvidia Blog - securing against prompt injection attacks. 2023

[46]. Universal and Transferable Adversarial Attacks on Aligned Language Models. Zou et al. 2023
[47]. LLM Censorship: A Machine Learning Challenge or a Computer Security Problem? Glukhov et al. 2023
[48]. “Do Anything Now”: Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models. Shen et al. 2023
[49]. Visual Adversarial Examples Jailbreak Aligned Large Language Models. Qi et al. 2023
[50]. Coercing LLMs to do and reveal (almost) anything. Geiping et al. 2024

[51]. Differential Privacy Overview and Fundamental Techniques. Fioretto et al. 2024
[52]. Understanding Database Reconstruction Attacks on Public Data. S. Garfinkel, J.M. Abowd, C. Martindale
[53]. Database reconstruction does compromise confidentiality. S.A. Keller, J.M. Abowd
[54]. Programming Differential Privacy. Joseph P. Near and Chiké Abuah (additional resources)

[56]. Lectures 5 to 8 (notes) by Gautam Kamath.
[57]. Sections 3.3, 3.4, 10.1-10.2 of the Algorithmic Foundations of Differential Privacy by Cynthia Dwork and Aaron Roth.
[58]. Programming Differential Privacy. Joseph P. Near and Chiké Abuah (additional resources)

[59]. Differentially Private Empirical Risk Minimization. Chaudhuri et al. 2011
[60]. Deep Learning with Differential Privacy. Abadi et al, 2016
[61]. Semi-supervised Knowledge Transfer for Deep Learning from Private Training Data. Papernot et al, 2016

[62]. Membership Inference Attacks against Machine Learning Models. Shokri et al. 2017
[63]. Membership Inference Attacks From First Principles. Carlini et al. 2021
[64]. The Secret Sharer: Evaluating and Testing Unintended Memorization in Neural Networks. Carlini et al. 2018
[65]. Auditing Differentially Private Machine Learning: How Private is Private SGD? Jagielski et al. 2020

[66]. Differential Privacy and Fairness in Decisions and Learning Tasks: A Survey. Fioretto et al. 2022
[67]. On the Compatibility of Privacy and Fairness. Cummings et al. 2019
[68]. Differential Privacy Has Disparate Impact on Model Accuracy. Bagdasaryan et al. 2019
[69]. Differentially Private Empirical Risk Minimization under the Fairness Lens. Tran et al. 2021

[70]. Scalable Extraction of Training Data from (Production) Language Models. Nasr et al. 2023
[71]. Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Contextual Integrity Theory. Mireshghallah et al. 2023
[72]. Beyond Memorization: Violating Privacy Via Inference with Large Language Models. Staab et al. 2023
[73]. Privacy issues in Large Language Models: A Survey. Sections 3, 4, and 5. Neel 2024

[74]. Model Cards for Model Reporting. Mitchell et al. 2018
[75]. Datasheets for Datasets. Gebru et al. 2018
[76]. The Values Encoded in Machine Learning Research. Birhane et al. 2021
[77]. Data Cards: Purposeful and Transparent Dataset Documentation for Responsible AI. Pushkarna et al. 2022

[78]. On the Opportunities and Risks of Foundation Models. Bommasani et al. 2022
[79]. A Survey on Evaluation of Large Language Models. Chang et al. 2024
[80]. Defining and understanding LLM evaluation metrics. Microsoft Blog, 2024
[81]. DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models. Wang et al. 2023
[82]. Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression. Hong et al. 2024

[83]. Algorithms that remember: model inversion attacks and data protection law. Veale et al. 2018
[84]. Machine Unlearning. Bourtoule et al. 2019
[85]. Certified Data Removal from Machine Learning Models. Guo et al. 2019
[86]. Machine Unlearning: A Survey. Xu et al. 2023

[87]. Knowledge Unlearning for Mitigating Privacy Risks in Language Models. Jang et al. 2023
[88]. Rethinking Machine Unlearning for Large Language Models. Liu et al. 2024
[89]. TOFU: A Task of Fictitious Unlearning for LLMs. Maini et al. 2024
[90]. The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning. Li et al. 2024

Assessment

Each group will be assessed through the following activities:

Paper Summaries (blogging): 33.3%
Presentation: 33.3%
Discussion Lead: 33.3%

1. Paper Summaries (Blogging) – 33.3%

Objective: To develop the ability to critically analyze and summarize AI research papers in a clear and accessible manner.

Expectations:

Each group will review all papers from the provided list and may propose additional ones for approval.
Summaries should be written in Markdown format (supporting images and formulas) and committed to the course’s GitHub repository.
The summary should include the following sections: Introduction and Motivations, Methods, Key Findings, and Critical Analysis.
The Critical Analysis section should evaluate the strengths, weaknesses, potential biases, and ethical considerations of the paper.
Summaries must be submitted to fioretto@virginia.edu four days prior to the presentation for review and potential feedback.

Assessment Criteria:

Clarity and coherence of the written summary.
Depth of critical analysis and understanding of the paper’s content.
Proper use of formatting and adherence to submission guidelines.
Timeliness of submission.

2. Presentation – 33.3%

Objective: To enhance students’ ability to communicate complex AI concepts and engage in public speaking.

Expectations:

45-minute presentation per group.
Presentations can include slides, code demonstrations, videos, or other creative methods.
The presentation should cover the key aspects of the paper, including its contribution to responsible AI.
A critical evaluation of the paper is essential, including discussing its limitations and implications.
Preparation of thought-provoking questions to stimulate audience engagement.
Slides must be submitted to fioretto@virginia.edu one week prior to the presentation for review and potential feedback.

Assessment Criteria:

Effectiveness of communication and presentation skills.
Accuracy and depth of content presented.
Creativity and engagement in the presentation method.
Ability to provoke thoughtful discussion through prepared questions.

3. Discussion Lead – 33.3%

Objective: To cultivate skills in leading intellectual discourse and fostering collaborative learning.

Expectations:

30-minute discussion session following the presentation.
Groups should prepare and facilitate a discussion based on their presentation.
Use of supplementary materials (e.g., videos, code snippets) to enrich the discussion is encouraged.
The discussion should engage the audience (with active questions), encouraging diverse viewpoints and deeper understanding of the topic.

Assessment Criteria:

Ability to foster an inclusive and constructive discussion.
Relevance and depth of prepared questions and discussion points.
Engagement level of the audience during the discussion.
Use of supplementary materials to enhance understanding.

General Notes:

All group members are expected to contribute equally overall, with two to three members taking the lead on each of the three components.
Peer evaluation within groups may be used to ensure fair contribution.

Groups

Group	Members
Group 1	Mutnuri, Srikar (PhD) Cui, Jingyi (MCS) Gregoire, Jade (MCS) Nanduru, Ganesh (MCS) Bai, Cheryl (BS)
Group 2	Gihlstorf, Caroline (PhD) Gyllenhoff, Anders (MS) Panguluri, Yagnik (MCS) Xie, Eric (MCS)
Group 3	Lei, Zhenyu (PhD) Bacha, Leena (MCS) Hewitt, Brooke (MCS) Rao, Mihika (MCS) Yan, Jett (ME)
Group 4	Liang, Jinhao (PhD) Cheng, Szu-Yuan (ME) Chiang, Claire (ME) Reddy, Rahul (MS) Okeno-Storms, Joseph (MCS)
Group 5	Noshin, Kazi (PhD) Chinnam, Nina (MCS) Liu, Yanxi (MCS) Shahane, Chaitanya (MS) Chang, Emily (BS)
Group 6	Rao, Uttam (PhD) Feng, Shiyu (MCS) Lopez, Sabrina (MS) Slyepichev, Daniel (ME) Nguyen, Eric (BA)
Group 7	Shahnewaz, Shafat (PhD) Gampa, Dhriti (MCS) Miskill, Jackson (MCS) Su, Jing-Ning (MCS) Sosnkowski, Alexander (BA)

Instructor

Ferdinando Fioretto, Assistant Professor in Computer Science, University of Virginia

TA

Saswat Das

This syllabus is subject to changes to meet the learning needs of the course participants.
