Monday, January 15, 2024

3Cs of idea communication illustrated through Andrej Karpathy’s LLM talk

Communicating your idea effectively is important, whether to customers, team members, or investors. In our book “8 steps to innovation”, we presented three attributes of idea communication: curiosity, concreteness, and credibility. In this blog, I would like to illustrate the 3Cs using Andrej Karpathy’s talk “Intro to large language models”, which he published on his YouTube channel.

Andrej Karpathy is one of my favorite teachers in the deep learning area. The OpenAI founding member and ex-director of AI at Tesla has a hands-on approach to teaching involving Python, PyTorch, and technical papers. Hence, I was surprised when Andrej uploaded a PowerPoint presentation on his YouTube channel. I was familiar with half the information in the talk. And yet, there was a lot I could learn from the way Andrej presented. It is an excellent example of how the 3Cs (curiosity, concreteness, and credibility) improve the effectiveness of a presentation. Let’s look at each C one by one.

Curiosity: A good presentation not only makes you curious early on, it also keeps you engaged by maintaining a curiosity flow. What does the curiosity flow in Andrej’s talk look like? He begins with the question, “What is an LLM?” (21 min), then moves on to the second part, “The promise and future directions of LLM” (17 min), and in the third and last part, he talks about “the challenges in LLM paradigm” (13 min).

Within each part, Andrej maintains a curiosity flow. For example, while explaining what an LLM is, he asks questions like “How do we get the (neural network) parameters?”, “What does a neural network do?”, and “How do we obtain an assistant?” While presenting the future directions in LLM research, he poses open problems: What is the equivalent of system-2 thinking for an LLM? How do we bring the kind of tree search used in chess to language? How do we create a self-improvement sandbox environment for LLMs, like the one that worked for AlphaGo? And, in the final part, he shows how different attacks like “prompt injection”, “data exfiltration”, or “data poisoning” pose a security challenge for an LLM. In short, it helps to build a curiosity flow while designing an idea presentation.

Concreteness: Large Language Models are high-dimensional and abstract and, as Andrej alludes to in the talk, how they work is not fully understood. Hence, it makes sense to use lots of concrete examples to make the concept understandable. And that’s what Andrej does. In many places, he shows how LLMs respond in certain situations by showing how ChatGPT behaves when you prompt it in a particular way. For example, he illustrates the “reversal curse” by showing how ChatGPT answers the question “Who is Tom Cruise’s mother?” correctly while saying “I don’t know” when asked, “Who is Mary Lee Pfeiffer’s son?” He also gives a demo of how LLMs use tools like browser search, a calculator, and Python libraries to solve a given problem and present the information as a plot.

Andrej also uses several metaphors and analogies to explain concepts. For example, he says an LLM is like a zip file of the Internet, except that the compression is lossy. Or, an LLM is not like a car, where you can understand and explain how the different parts work together to make it function. Or, current LLMs are like speed chess, which uses an automatic and fast system-1 mode of thinking; they are yet to learn how to solve problems the way players do in competitive chess, using a deliberate, slow, system-2 mode of thinking involving tree search.

My biggest takeaway from the talk comes in the form of a metaphor: Andrej explains that it is better to think of an LLM as the kernel of an emerging operating system (like Windows or Linux) rather than as a chatbot or a bot generator. To explain this, he maps various components of current OSes to LLM components. For example, he says the Internet is like the hard disk in a traditional OS, and the context window is like the working memory or RAM. I thought it was a powerful metaphor to convey the paradigm shift.

Credibility: Most idea presenters, like you and me, need to worry about making our ideas credible. Given his position and brand, and given the popularity of LLMs, Andrej probably doesn’t have to pay special attention to this aspect. However, he is making forward-looking statements in this talk, and he needs to ensure he doesn’t divulge any information confidential to OpenAI. He achieves this by citing academic papers while discussing future directions and security challenges. His demo also adds to the credibility. He does not make any hyperbolic “AGI is around the corner” kind of statements, and he devotes time to the limitations and challenges of current LLMs.

I hope this illustration helps you see how the 3Cs (curiosity, concreteness, and credibility) help in designing better presentations.
