Friday, January 26, 2024

My 3 takeaways from Agastya’s “Student, teacher, and AI” conference at Kuppam

Thanks to my friend Ajith Basu, I got an opportunity to participate in "Student, teacher, and AI", a national conference held at Agastya's beautiful Kuppam campus. I was part of the facilitation team with Shriram Bharathan and Suhasini Seelin. The participants came from central and state government education departments, schools, colleges, corporates, startups, and NGOs. The conference had 3 thrust areas: (1) demystifying AI, (2) the role of AI in future curriculum, pedagogy, and assessment, and (3) AI's influence on social and emotional learning. Here are my 3 takeaways from my sketchy and selective notes.

Self-learning may be a myth: AI is going to enable self-learning by generating personalized insights. For example, AI can tell the teacher that four specific students are weak in, say, "division by 7". This perspective was championed by Anand Rangarajan of Google, among others. Having experienced online self-learning and being a beneficiary of YouTube's recommendation engine myself, I was drawn to this view. However, Prof Bindu Thirumalai of TISS was vocal in suggesting that self-learning is a myth: learning is fundamentally a social phenomenon, and peer groups and mentoring play a crucial role. Having grown up in an educated family with access to helpful friends, could my understanding of self-learning be flawed? I am curious.

Empathy, not yet, but beware of biases today: We looked at a short fictional case where Preetha, a personal artificial assistant, acts as an empathic friend to an 8th-standard girl, Swati, who is struggling with math in class. Experts felt that most of the technological elements needed for the dialogue are already present. However, the degree of empathy and warmth demonstrated in the story is still missing in human-AI interaction.

We also explored biases exhibited by Swati and Preetha. While we were doing that, Dr. Pradeep from Google fed the story to Bard and showed us how Bard could identify biases participants had not yet spotted. We also reflected on the biases we ourselves carry. During this exercise, most of us were using the term "bias" to mean prejudices and inclinations. Prof Arun Tangirala of IIT Madras championed the view that for something to be called a bias, we need a ground truth against which to evaluate whether there is a systematic error of judgment. While there were differences over the meaning of bias, there was a consensus that biases will be amplified in the AI world, and this demands greater awareness.

Will AI enable creative adaptive intelligence? Not clear. Ramji Raghavan of Agastya proposed that to live in a world where technology such as AI widens the complexity gap, we need creative adaptive intelligence. Will AI enable it? It is not obvious. Some participants felt that they were already turning to ChatGPT for every problem, and that this was making them lazy. Prof C. K. Manjunath from SMVITM, Mangalore presented how an AI-enabled advertisement, such as the one for Titan Eye+, becomes interactive and fun, and asked, "How can an average teacher match this creativity?" Ms Changra, the Education Minister from Dharamsala, felt that unless we are alert, overdependence on technology may affect our mental well-being.

To me personally, the two high points of the conference involved very little AI. One was a play by kids from Ganganagar Government School in Bangalore, directed by Suhasini, and the second was a veena recital by Vidushi Sujatha Thiagarajan. Both evoked strong emotions. Would an AI-enacted play or an AI recital in the future have a similar effect? I don't know.

image credit: Agastya International Foundation

Monday, January 15, 2024

3Cs of idea communication illustrated through Andrej Karpathy’s LLM talk

Communicating your idea effectively is important, be it to customers, team members, or investors. We presented 3 attributes of idea communication – curiosity, concreteness, and credibility – in our book "8 steps to innovation". In this blog, I would like to illustrate the 3Cs using Andrej Karpathy's talk "Intro to large language models", which he published on his YouTube channel.

Andrej Karpathy is one of my favorite teachers in the deep learning area. The OpenAI founding member and ex-director of AI at Tesla has a hands-on approach to teaching involving Python, PyTorch, and technical papers. Hence, I was surprised when Andrej uploaded a PowerPoint presentation to his YouTube channel. I was familiar with half the information in the talk, and yet there was a lot I could learn from the way Andrej presented it. It is an excellent example of how the 3Cs – curiosity, concreteness, and credibility – improve the effectiveness of a presentation. Let's look at each C one by one.

Curiosity: A good presentation not only makes you curious early on, it keeps you engaged by maintaining a curiosity flow. What does curiosity flow in Andrej’s talk look like? He begins with the question, “What is an LLM?” (21 min), then he moves on to the second part, “The promise and future directions of LLM” (17 min), and in the third and last part, Andrej talks about “the challenges in LLM paradigm” (13 min).

Within each part, Andrej maintains a curiosity flow. For example, while explaining what an LLM is, he asks questions like "How do we get the (neural network) parameters?", "What does a neural network do?", and "How do we obtain an assistant?" While presenting future directions in LLM research, he poses problems like: What is the equivalent of system-2 thinking? How do we bring the tree search used in chess to language? How do we create a self-improvement sandbox environment for LLMs, the way it happened for AlphaGo? And in the final part, he shows how attacks like "prompt injection", "data exfiltration", and "data poisoning" pose security challenges for LLMs. In short, it helps to build a curiosity flow while designing an idea presentation.

Concreteness: Large Language Models are high-dimensional and abstract, and, as Andrej alludes to in the talk, how they work is not fully understood. Hence, it makes sense to use lots of concrete examples to make the concept understandable, and that is what Andrej does. In many places, he shows how LLMs respond in particular situations by showing how ChatGPT behaves when prompted in a particular way. For example, he illustrates the "reversal curse" by showing how ChatGPT answers the question "Who is Tom Cruise's mother?" correctly while saying "I don't know" when asked, "Who is Mary Lee Pfeiffer's son?" He also gives a demo of how LLMs use tools like browser search, a calculator, and Python libraries to solve a given problem and present the information as a plot.

Andrej also uses several metaphors and analogies to explain concepts. For example, he says an LLM is like a zip file of the Internet, except that the compression is lossy. Or, an LLM is not like a car, where you can understand and explain how different parts work together to produce its function. Or, current LLMs are like speed chess, relying on the automatic, fast system-1 mode of thinking; they are yet to learn to solve problems the way competitive chess players do, with the deliberate, slow system-2 mode of thinking that involves tree search.

My biggest takeaway from the talk comes in the form of a metaphor: Andrej explains that it is better to think of an LLM as the kernel of an emerging operating system (like Windows or Linux) rather than as a chatbot or a bot generator. To explain this, he maps various components of current OSes to LLM components. For example, he says the Internet is like the hard disk of the traditional OS, and the context window is like the working memory or RAM. I thought it was a powerful metaphor to convey the paradigm shift.

Credibility: Most idea presenters, like you and me, need to worry about making our ideas credible. Given his position and brand, and given the popularity of LLMs, Andrej probably doesn't have to pay special attention to this aspect. However, he is making forward-looking statements in this talk, and he needs to ensure he doesn't divulge any information confidential to OpenAI. He achieves this by citing academic papers while mentioning future directions and security challenges. His demos also add to the credibility. He makes no "AGI is around the corner" kind of hyperbolic statements and devotes time to the limitations and challenges of current LLMs.

I hope this illustration helps you see how the 3Cs – curiosity flow, concreteness, and credibility – help in designing better presentations.