Communicating your idea effectively is important, be it to customers, team members, or investors. We presented three attributes of idea communication (curiosity, concreteness, and credibility) in our book “8 steps to innovation”. In this blog, I would like to illustrate the 3Cs using Andrej Karpathy’s talk “Intro to large language models”, which he published on his YouTube channel.
Andrej Karpathy is one of my favorite teachers in the deep learning area. The OpenAI
founding member and ex-director of AI at Tesla has a hands-on approach to
teaching that involves Python, PyTorch, and technical papers. Hence, I was
surprised when Andrej uploaded a PowerPoint presentation on his YouTube
channel. I was familiar with half the information in the talk. And yet, there
was a lot I could learn from the way Andrej presented. It is an excellent
example to illustrate how the 3Cs (curiosity, concreteness, and credibility) improve the effectiveness of a presentation. Let’s look at each C one by one.
Curiosity: A good presentation not only makes you
curious early on, it keeps you engaged by maintaining a
curiosity flow. What does the curiosity flow in Andrej’s talk look like? He
begins with the question, “What is an LLM?” (21 min), then he moves on to the
second part, “The promise and future directions of LLM” (17 min), and in the
third and last part, Andrej talks about “the challenges in LLM paradigm” (13
min).
Within each part, Andrej maintains a curiosity flow.
For example, while explaining what an LLM is, Andrej asks questions like “How
do we get the (neural network) parameters?” “What does a neural network do?”
“How do we obtain an assistant?” etc. While presenting the future directions in
LLM research, Andrej poses problems such as: What is the equivalent of system-2
thinking for an LLM? How do we bring the tree search used in chess to language? How do we create a self-improvement sandbox environment for LLMs, as happened for AlphaGo?
And, in the final part, he shows how attacks such as jailbreaks, “prompt
injection”, “data exfiltration”, and “data poisoning” pose a security challenge
for an LLM. In short, it helps to build a curiosity flow while designing an
idea presentation.
Concreteness: Large Language Models are
high-dimensional and abstract, and as Andrej alludes to in the talk, how they
work is not fully clear. Hence, it makes sense to use lots of concrete examples
to make the concept understandable. And that’s what Andrej does. In many
places, he shows how LLMs respond in certain situations by showing how ChatGPT
behaves when you prompt it in a particular way. For example, he illustrates the “reversal curse” by showing how ChatGPT answers the question “Who is Tom
Cruise’s mother?” correctly while saying “I don’t know” when asked, “Who is Mary
Lee Pfeiffer’s son?” He gives a demo of how LLMs use tools like browser search,
calculator, and Python libraries to solve a given problem and present the
information as a plot.
Andrej also uses several metaphors or analogies to explain
concepts. For example, he says an LLM is like a zip file of the Internet,
except that it is a lossy compression. Or, an LLM is not like a car, where you can
understand and explain how different parts work together to make it function.
Or, current LLMs are like speed chess, which uses an automatic and fast system-1
mode of thinking, while they are yet to learn how to solve problems the way
competitive chess players do, using a deliberate, slow, system-2 mode of thinking
involving tree search.
My biggest takeaway from the talk comes in the form of a
metaphor when Andrej explains that it is better to think of an LLM as the
kernel of an emerging operating system (like Windows or Linux) rather than as a
chatbot or a bot generator. To explain this, he maps various components of
current OSes to LLM components. For example, he says the Internet is like the hard disk in a traditional OS, and the context window is like the working memory
or RAM. I thought it was a powerful metaphor to convey the paradigm shift.
Credibility: Most idea presenters like you and me need
to worry about making our ideas credible. Given his position and brand and
given the popularity of LLMs, Andrej probably doesn’t have to pay special
attention to this aspect. However, he is making forward-looking statements in
this talk, and he needs to ensure he doesn’t divulge any information
confidential to OpenAI. He achieves this by citing academic papers while
mentioning future directions and security challenges. His demo also adds to the
credibility. He does not make any “AGI is around the corner” kind of hyperbolic
statements, and he devotes time to the limitations and challenges of
current LLMs.