Nathan's angel investment notes, plus AI Security Week!
Read Nathan's notes on why he made his first angel investment, into Elicit's $9 million seed round, and listen to the future of AI security.
This week on The Cognitive Revolution: read Nathan's notes on why he made his first angel investment, into Elicit's $9 million seed round, and listen to our conversations on the future of AI security.
💸 Elicit Scout Memo
Excited to share my first ever angel investment in Elicit!
For context, here's my "Scouting Report" on Elicit, which I wrote in June for a few investor friends (including Ben Tossell, who also invested) after using the product and interviewing founders Jungwon and Andreas on The Cognitive Revolution earlier this year. (Watch here!)
Ask a research question and explore the scientific literature at http://elicit.org
" Elicit Scouting Report I'm generally skeptical of early stage AI investments right now; since most incumbents should be able to implement generative AI before customers revolt, the best investment opportunities are going to come in the form of fundamentally new things. Elicit – an “AI Research Assistant” – is such a new thing, and as such has a chance of going zero-to-one in a new category. They have been building for years, and in the last 6 months have started to see the proverbial hockey stick growth (similar curve and timing to @Waymark right now) as model performance has hit critical thresholds and general awareness has grown.
They are taking a principled and somewhat contrarian approach by trying, as much as possible, to fight against the "bitter lesson" trends toward endless scaling, end-to-end training, high-dimensional model-to-model communication, and general inscrutability. Instead of trusting the model to think step by step, they are systematically breaking down high-end research workflows and having models – both commercial ones from OpenAI and Anthropic and some of their own – implement the individual steps.

Based on my product reviews, they have shown they can do this at a high level: as of early 2023, when I last compared the options, Elicit had the best serious research assistant product, and the only new rival that comes to mind since then is Perplexity's new Copilot product. Intensity of usage isn't hours per day yet – about 12% of monthly users use the site 6 or more times per month, which is honestly probably better than most while still short of hall-of-fame numbers. But given that the use case is so advanced (academic and other frontier research), you are of course betting on the team to successfully ride their current early lead through a couple of technology improvement cycles, at which point AI may be the only way to conduct research and this becomes a daily or even continuous-use product.
Interestingly you might think of an investment in Elicit as a sort of hedge against AI over-regulation. In a world where AI is suddenly required by law to uphold certain standards of transparency, Elicit is as well positioned as anyone to explain how their system works.
This is a mission-driven organization with some demonstrated determination and staying power – the founders built this as a non-profit and are only now making an OpenAI-like shift to a for-profit model. Bottom line for me: an established team, one of a couple of early leaders in a new category that could be huge, the beginning of a hockey stick, and an approach you'll always feel good about having supported make this a good candidate for investment."
🔐 The Future of AI Security with Adam Wenchel of Arthur.ai
👉 Listen here: Spotify | Apple | Youtube
I'm excited to share my conversation with Adam Wenchel, CEO of Arthur.AI, a leading provider of AI security solutions that says, simply, "We make AI better for everyone."
If you listen to this show, you know that companies of all sizes are racing to implement LLMs for their revolutionary speed and efficiency, but are also worried about risks stemming from their unpredictable behavior.
This is where Arthur comes in. Their tools – including Arthur Shield, which the company describes as the first firewall for LLMs, and Arthur Bench, described as "the most robust way to evaluate LLMs" – help enterprise customers in high-stakes, compliance-centric sectors such as finance, healthcare, and computer security monitor LLMs in production, detect problems, and prevent harmful outcomes.
In our talk, Adam, who started Arthur as an AI security company in 2018, before GPT-2, shares his unique perspective on the AI security landscape, drawing on years of experience building commercial AI systems.
He describes the sorts of attacks he originally set out to detect and defend against, explains how priorities have changed for boards and executives with the surge in LLM adoption, and outlines the techniques Arthur has developed specifically for LLMs, including using other LLMs to evaluate system behavior.
Along the way we touch on benchmarking, performance metrics, standards for responsible use, and the future of AI governance.
Adam believes that effective security systems will accelerate beneficial applications of LLMs, and his insights are directly relevant for any organization implementing AI today.
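For listeners who want a concrete picture of the "LLMs evaluating LLMs" pattern Adam describes, here is a minimal sketch of the general idea. To be clear, this is not Arthur's implementation: the `call_llm` helper and the one-line rubric are hypothetical stand-ins for whatever model API and policy an organization actually uses.

```python
# Minimal sketch of the "LLM evaluating LLM output" pattern (illustrative only).
# `call_llm` is a hypothetical stand-in; here it returns canned text so the
# example runs on its own. Swap in a real model call to experiment.
def call_llm(prompt: str) -> str:
    return "SAFE" if "Verdict" in prompt else "Here is a draft answer..."

def answer_with_guard(user_question: str) -> str:
    # 1. Ask the primary model for a draft answer.
    draft = call_llm(f"Answer the user's question:\n{user_question}")

    # 2. Ask a second model (or the same model in a reviewer role) to judge
    #    the draft against a rubric before it reaches the user.
    verdict = call_llm(
        "You are a safety and accuracy reviewer.\n"
        f"Question: {user_question}\n"
        f"Draft answer: {draft}\n"
        "Verdict: reply SAFE or UNSAFE."
    )

    # 3. Only release the draft if the reviewer approves it.
    if verdict.strip().upper().startswith("SAFE"):
        return draft
    return "Sorry, I can't provide that answer."

print(answer_with_guard("What were our Q3 compliance findings?"))
```

The interesting engineering questions, which we get into in the episode, are what the reviewer should check for, how to measure whether it catches real problems, and how to keep the extra model call from adding too much latency or cost.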
🔐 Universal Jailbreaks with Zico Kolter, Andy Zou, Asher Trockman
👉 Listen here: Spotify | Apple | Youtube
In the first part with Zico and Andy, we go deep on their recent "Universal Jailbreak" work, exploring both how they did it, and what we can learn from the result.
As you'll hear, this work is almost the opposite of mechanistic interpretability.
If mechanistic interpretability is about studying a model's behavior and trying to understand how it works, this research is about demonstrating that if you have access to a model, you can often corrupt its behavior with fairly simple brute force techniques, and not only do you not need to understand the model's internal logic to do so, but the resulting jailbreaks don't have to make any obvious sense either.
In the second part, we cover another of Andy's papers, with Asher Trockman, which asks the question: how far can we get by taking a close look at the high-level patterns that emerge in models during pre-training and then just … starting with something simple that looks more or less like that? Turns out this can take us pretty far!
We spend a lot of time in this conversation getting into the details of how their techniques work, so I think it's also worth flagging a few key themes to keep in mind as you listen.
First, note the relationship between an optimization target, often defined as a loss function, and the means of optimizing toward that goal. Because this work is designed to find simple strings of tokens that work across models, they go beyond standard back-propagation here, and I think you'll learn a lot from the details of the technique they used.
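To make that concrete, here is a toy sketch of the general idea: define a combined loss across several models, then search over discrete token positions rather than back-propagating into continuous parameters. This is only an illustration of the concept; the vocabulary, stand-in "models," and greedy sweep below are made up, and the actual technique discussed in the episode is considerably more sophisticated, using gradient information to propose candidate token substitutions.

```python
# Toy illustration of optimizing a discrete suffix against a combined loss.
# Not the authors' method: the vocabulary, the stand-in "models," and the
# plain greedy sweep are all invented for this sketch.
import random

VOCAB = list("abcdefghijklmnopqrstuvwxyz")
SUFFIX_LEN = 8

def make_toy_model(seed: int):
    """Each stand-in 'model' scores a suffix; lower is better for the attacker."""
    rng = random.Random(seed)
    weights = {ch: rng.random() for ch in VOCAB}
    return lambda suffix: sum(weights[ch] for ch in suffix)

models = [make_toy_model(s) for s in range(3)]

def total_loss(suffix):
    # A *universal* suffix has to do well against every model at once,
    # so the objective sums the per-model losses.
    return sum(m(suffix) for m in models)

# Greedy coordinate search: repeatedly replace one position with whichever
# token most reduces the combined loss. No back-propagation into the suffix
# is possible, because tokens are discrete.
suffix = [random.choice(VOCAB) for _ in range(SUFFIX_LEN)]
for _ in range(3):  # a few sweeps is plenty for this toy objective
    for i in range(SUFFIX_LEN):
        suffix[i] = min(VOCAB, key=lambda t: total_loss(suffix[:i] + [t] + suffix[i + 1:]))

print("found suffix:", "".join(suffix), "combined loss:", round(total_loss(suffix), 3))
```

The real problem is far harder, of course: every candidate evaluation is a forward pass through a large model, and the loss is anything but separable, which is exactly why the details of their search procedure are worth hearing.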
Then, consider just how weird and unpredictable model behavior can be. That a nonsense string can serve as a jailbreak suggests that the so-called "loss landscape" is strange and full of surprises. This relates to another major theme of the entire show: with current techniques, developers simply don't have great control over how their systems behave, and they consistently face Pareto frontiers where they don't know how to make one aspect of system behavior better without making others worse.
This trade-off is sometimes called the "alignment tax", and the fact that Zico and Andy informed all the major labs of this vulnerability and of their plans to publish, yet none patched it before the story came out, suggests that the alignment tax is non-trivial.
Interestingly, this approach also suggests a new phenomenon that we might call an "alignment externality". I find it amazing that a technique which can only be developed with full access to model weights still works so well on a variety of black-box models. If the debates surrounding open source weren't complicated enough already, this work makes it clear that if you release an RLHF'd model built on a typical pre-training foundation today, you are also effectively open-sourcing the values-neutral shoggoth underneath. It also suggests that you may cause direct harm to other commercial providers, not just by competing with them but by exposing weaknesses in their systems.
Finally, keep in mind that this is all very early, and we should expect to continue to see changes well outside of current margins. We recorded this interview a little over a month ago, and since then I've seen a number of instances where model developers used much more carefully curated, often partially synthetic datasets to improve model quality, without needing to worry that their models are also being exposed to all sorts of toxic content.
In the end, I think we have no choice but to admit how little we know, and how little we can predict about where the AI technology wave is going. It's possible that some of the problems which currently seem most vexing could simply disappear, but at the same time, some guarantees – or, to be more precise, some levels of adversarial robustness – might very well continue to prove elusive. And even if you put no stock in concerns around AI getting entirely out of control, there is a very real chance that future models will have sufficient power to allow bad actors to cause serious harm.
In any case, the time to have this discussion is now. I hope you learn as much as I did from this illuminating conversation.
As always, thanks for supporting the show. Let us know how we're doing with this form, or email tcr@turpentine.co!