Choreographing Human-AI Teamwork
For this project, you will work in groups of six to design a functioning, interactive AI application that effectively collaborates with its users. (A few groups may have seven members if the class size is not divisible by six.)
Tasks for Human-AI Teamwork
Each group has been assigned one of the tasks below. Your job is to design and implement an AI system (including its model, data flow, interface, and interactions) to help the user-AI team excel at this task. The system can be a standalone web app, a Google Colab notebook, or equivalent.
AI & Programming: Design and implement an LLM-powered chatbot that helps a novice programmer create p5.js sketches that creatively communicate a unique life experience of theirs (e.g., feeling lonely in a crowd, experiencing the elation of feeling on top of the world).
🔖 Example task: Create a p5.js sketch that communicates what it felt like to be an outsider, using the AI chatbot to help you code and refine your sketch.
💠 Why this task matters: It mirrors real-world challenges in designing AI that enhances programmers’ creativity rather than automating their work.
AI & Dating: Design a genAI-powered web app that gathers information about the type of social relationship the user is seeking and generates a fictional profile of the ideal match. The app can refine this profile iteratively as the user provides more inputs.
🔖 Example task: Use the AI web app to help you articulate what you’re looking for in a lifelong friendship.
💠 Why this task matters: It mirrors real-world challenges in designing AI that recommends various types of social connections, such as those found on platforms like meetup.com, Reddit, or LinkedIn.
AI & 3D Assembly: Design and implement an AI-powered chatbot that uses the user’s webcam and keyboard for data input and generates suggestions to help them quickly and accurately create a specific design with a provided set of LEGO or IKEA pieces.
🔖 Example task: Unfold four IKEA KOMPLEMENT boxes as quickly as possible, using the AI system for assistance.
💠 Why this task matters: It mirrors real-world challenges in designing AI that enhances creativity for professionals working in 3D (e.g., mechanical engineers, architects, landscape designers) rather than automating their expertise away.
Learning Goals
- Understand key concepts that underpin “good” one-human-one-AI interaction, such as user-AI collaboration (as opposed to coordination), AI intelligibility (as opposed to explainability), and trustworthiness of and trust in AI.
- Apply these concepts when making design decisions for a functioning AI application.
- Gain experience collaborating in interdisciplinary AI application design teams.
Weekly Tasks
Task 1
- Assign responsibilities among the existing AI model(s), the user, and any non-AI features of the application (e.g., heuristic-based decision trees, look-up tables).
- Based on this division of responsibilities, plan the data flow for your web application.
- Investigate framework options (e.g., LangChain, Streamlit, React, Gradio) based on your team’s skills and the planned data flow. Choose the simplest system architecture to achieve your envisioned teamwork in order to save trouble down the line.
- ⏰ Preparation for the next class: Obtain necessary vibe coding tools and API keys (e.g., OpenAI, Anthropic, Hugging Face) based on your chosen framework and models. Ensure at least 3-4 team members have working access. Gemini Pro is free for students for one year.
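One way to make the division of responsibilities concrete while planning your data flow is to express it as a simple router. The sketch below assumes a plain Python prototype; the look-up table contents and the `model_call` parameter are illustrative placeholders, not a required design:

```python
# Sketch of a division-of-responsibility plan expressed as a router:
# deterministic, well-understood requests are handled by a non-AI feature
# (a look-up table), and open-ended requests go to the model.
# LOOKUP's contents and `model_call` are placeholders for your own design.

LOOKUP = {
    "how do i reset the sketch?": "Press 'r' to restart the p5.js sketch.",
}

def route(user_message, model_call):
    """Return (handler, response): 'lookup' for known requests, 'model' otherwise."""
    key = user_message.strip().lower()
    if key in LOOKUP:
        return "lookup", LOOKUP[key]          # non-AI responsibility
    return "model", model_call(user_message)  # AI responsibility

# With a stub model, a known request never touches the model:
handler, answer = route("How do I reset the sketch?", lambda m: "(model reply)")
```

Writing the plan down this way forces the team to decide, case by case, which responsibilities genuinely need the AI.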
💠 Two common pitfalls: assigning the AI or the user a task too complex to handle reliably, or assigning the AI a task so simple that a look-up table, a quick YouTube search, or a brief ChatGPT conversation would suffice.
💠 Use only off-the-shelf AI model(s) and a simple system architecture. The goal is to learn how to design human-AI teams that achieve things neither humans nor AI can do alone, not to build new models or push AI boundaries. You may use multiple models via LLM chaining or agentic workflows only if absolutely necessary and your team has the technical expertise to do so.
🛎️ Team representative: Submit your team’s team contract, division of responsibility design, and research notes here.
Task 2
- Document your envisioned division of human-AI roles in this FigJam worksheet.
- Create a bare-bones prototype implementing this design and test it with at least two users. Use a real task and the real teamwork evaluation survey during this and all future rounds of testing.
- Based on what you observe, analyze how team performance can be improved. Refine your division of responsibilities and data flow accordingly.
- ⏰ Preparation for the next classes: Include a simple way to override the AI’s actual response with pre-defined responses you choose (e.g., a button that makes the AI give a wrong answer, or a dropdown to select from specific test responses) in your system. This testing mode will help you test various interaction scenarios in Tasks 3-4.
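The override mechanism can stay very small. Below is a minimal sketch of such a testing mode, assuming a plain Python prototype; `SCRIPTED_RESPONSES` and the `real_ai_call` parameter are hypothetical placeholders for your own model call:

```python
# A "testing mode" wrapper: when an override scenario is selected (e.g., via a
# dropdown or button in your UI), the AI's real response is replaced with a
# pre-defined one, so error scenarios can be reproduced on demand during tests.
# SCRIPTED_RESPONSES and `real_ai_call` are illustrative placeholders.

SCRIPTED_RESPONSES = {
    "wrong_answer": "Rotate the panel 90 degrees clockwise.",  # deliberate error
    "refusal": "I'm not able to help with that step.",
}

def get_response(user_message, real_ai_call, override=None):
    """Return a scripted response when an override is selected, else the AI's."""
    if override is not None:
        return SCRIPTED_RESPONSES[override]
    return real_ai_call(user_message)

# Normal mode queries the (stubbed) model; testing mode forces a chosen scenario.
normal = get_response("Which piece next?", real_ai_call=lambda m: f"(model) {m}")
forced = get_response("Which piece next?", real_ai_call=lambda m: f"(model) {m}",
                      override="wrong_answer")
```

Keeping the override outside the model call means the rest of your system cannot tell a scripted error from a real one, which is exactly what you want for testing interaction scenarios.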
🎯 Your users’ answers to survey questions 1–2 (“This task would have been easier if I had done it by myself”; “I felt subservient to the system”) and the final question (“Did the human-AI team produce a successful output for the task?”) are strong indicators of how effectively you allocated roles between the human and the AI. Iterate until your users rate questions 1–2 low and answer the final question positively.
💠 Common error: having participants complete a toy task, or the real task for only a few minutes, rather than continuing until task completion.
💠 Common error: spending too much time on engineering rather than on human-AI collaboration design. A bare-bones prototype is typically a command-line interface, a Colab notebook, or a web app built with a prompt-chaining tool (e.g., Flowise), which gives flexible control over the division of roles with minimal engineering work.
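As a point of reference, a bare-bones command-line prototype can be little more than the loop below; `ask_model` here is a stub to be swapped for a real API call once the division of roles stabilizes:

```python
# Bare-bones command-line chat prototype: minimal engineering, full control
# over the division of roles. `ask_model` is a stub; replace it with a call to
# your chosen model API (OpenAI, Anthropic, etc.) once the design settles.

def ask_model(history):
    # Stub model: echoes the last user turn. Swap in a real API call here.
    return f"(model) You said: {history[-1]['content']}"

def chat_loop(get_input=input, show=print):
    """Run a chat session until the user types 'quit'; return the transcript."""
    history = []
    while True:
        user_msg = get_input("you> ")
        if user_msg.strip().lower() in {"quit", "exit"}:
            break
        history.append({"role": "user", "content": user_msg})
        reply = ask_model(history)
        history.append({"role": "assistant", "content": reply})
        show(reply)
    return history

# Call chat_loop() to start an interactive session in a terminal or Colab cell.
```

Because the input and output functions are injectable, the same loop can be exercised in automated tests or wired to a simple UI later.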
🛎️ Team representative: Submit the following through this form: (1) the worksheet URL and (2) a link to a Google Doc or Drive folder containing a screen recording of one user testing your latest system version and a brief description of how the human-AI division of roles has evolved across iterations.
🛎️ Every student: Please complete peer evaluation here.
Task 3
Update the human-AI teamwork protocol (same FigJam worksheet) to include a communication protocol for the key moments in your task. The protocol should address the three steps covered in lecture.
Implement and test the protocol with at least two users. During each session, users should experience at least one AI error (which can be introduced deliberately using the testing mode from Task 2). Analyze your observations, refine your communication design, and implement the best and final version.
🎯 Your users’ answers to survey questions 3–6 (“I could anticipate…”, “I could understand…”, “I could predict…”, “When an error occurred, I understood…”) are good proxies for your communication design’s effectiveness. Iterate until your users rate these questions highly.
💠 For user testing, continue using the real tasks (run to completion) and the evaluation metrics we’ll use for the cook-off.
💠 The simplest implementation of your system is likely what’s known as Wizard of Oz: rather than coding the communication protocol into your system, a team member plays its role, watching the AI’s outputs and manually saying or showing users whatever your protocol specifies at each key moment. Analyze how well this design works, iterate until users rate the relevant survey questions highly, and then implement the final design.
🛎️ Team representative: Submit the following through this form: (1) the worksheet URL and (2) a link to a Google Doc or Drive folder containing a screen recording of one user testing your latest system version and a brief description of how the human-AI division of roles has evolved across iterations.
Task 4
Update the human-AI teamwork protocol (same FigJam worksheet) to include trust-building and trust recovery processes after AI errors.
Same as in the previous task: implement and test the protocol with at least two users. During each session, users should experience at least one AI error (which can be introduced deliberately using the testing mode from Task 2). Analyze what you observe, refine your design, and implement the final version.
💠 Your users’ answers to survey question 8 (“Would you use this system in real life despite its limitations?”) are a strong indicator of how good the design is.
🛎️ Team representative: Submit the following through this form: (1) the worksheet URL and (2) a link to a Google Doc or Drive folder containing a screen recording of one user testing your latest system version and a brief description of how the human-AI division of roles has evolved across iterations.
🛎️ Each group member: Time for peer evaluation.
Cook-Off
During the week 10 class, all teams will set up a booth with 2-3 computers where students from other teams will test your system. Testers will perform a task designed by the teaching team for 15 minutes (or less if they finish early). Neither teams nor testers will know the specific task ahead of time (e.g., which prompt we’ll give users for your p5.js assistant, or which IKEA piece users will assemble with your AI).
Testers will then complete a survey with the following questions, which will count toward the project grade:
- Division of responsibility design:
- Agree/disagree: Both the AI and the user contributed meaningfully.
- Agree/disagree: This task would have been easier if I had done it by myself.
- Agree/disagree: I felt subservient to the system.
- Communication and AI intelligibility:
- Agree/disagree: I could anticipate what the system would do next.
- Agree/disagree: I could understand what the system was expecting of me at all times.
- Design for trust building and trust recovery after AI errors:
- Agree/disagree: I could predict when and what errors this AI might make.
- Agree/disagree: When an error occurred, I understood how to recover from it and how to adapt my use of this system to its strengths and weaknesses.
- Subjective teamwork experience:
- How successful was your collaboration with the system?
- Would you use this system in real life despite its limitations?
- Objective team performance: Did you and the AI complete the task successfully?
- [3D Assembly] How fast was the assembly?
- [Programming & Social Connection] How well did the sketch capture your feelings/your ideal social relation?
Final submission
🛎️ Within one week after the cook-off, one group representative should submit the following through this form.
- The group’s final AI system with documentation of your human-AI teamwork design (division of responsibilities and communication protocol)
- Responses to teaching team questions from the cook-off
- Reflections on teamwork in AI application design
Grading Rubric
Grading rubrics for Project 2 tasks are available here. The rubric for the final submission is here.