29 Sep 2024

Podcast: The Silent Siege

Last year I asked GPT-4 to craft a narrative through the eyes of Department of Defense leaders who were struggling with a cryptic assault orchestrated by a sophisticated adversarial AI. Today I ran the story, The Silent Siege, through NotebookLM to create an AI-generated discussion about it. It’s pretty impressive.

Transcript

21 Sep 2024

I wanted to take Cursor for a spin to see if their AI-assisted development approach lives up to the hype. I develop full apps so infrequently these days that I’d have no chance of building anything in Python without investing dozens of hours in getting back up to speed. New AI approaches that promise to smash that barrier have been tempting me to start developing again. The bottom line, after giving this a try, is that while you couldn’t really go from zero knowledge to web/mobile deployment without having familiarity with things like setting up an IDE, configuring Python, managing libraries, dealing with hosting providers, DNS, deploying web apps, etc., tools like Cursor can reduce the learning curve and complexity of the entire process significantly - and they can tell you how to do many of these things if you know what questions to ask.

My first simple app is a Korean Diary assistant that I built to help me write Korean sentences and break down the grammar and vocabulary in those sentences. Anthropic’s Claude 3.5 Sonnet handles the translation. It’s a fairly simple app and UI (there are some nice scrolling animation effects for the menu at the top, but that’s about it). The user can also export the translation and breakdowns via markdown that is formatted as shown on the screen or via plain text.
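For illustration only, here is a minimal sketch of how the two export formats could be produced. The `Breakdown` class, its fields, and both function names are hypothetical and are not taken from the actual app, which I never read the code for.

```python
from dataclasses import dataclass, field


@dataclass
class Breakdown:
    """Hypothetical container for one translated sentence and its notes."""
    korean: str
    english: str
    vocabulary: dict[str, str] = field(default_factory=dict)  # word -> meaning
    grammar: list[str] = field(default_factory=list)          # grammar notes


def to_markdown(b: Breakdown) -> str:
    """Render the breakdown as markdown with headings and bullet lists."""
    lines = [f"## {b.korean}", "", f"*{b.english}*", "", "### Vocabulary", ""]
    lines += [f"- **{word}**: {meaning}" for word, meaning in b.vocabulary.items()]
    lines += ["", "### Grammar", ""]
    lines += [f"- {note}" for note in b.grammar]
    return "\n".join(lines)


def to_plain_text(b: Breakdown) -> str:
    """Render the same content as indented plain text with no markup."""
    lines = [b.korean, b.english, "", "Vocabulary:"]
    lines += [f"  {word}: {meaning}" for word, meaning in b.vocabulary.items()]
    lines += ["", "Grammar:"]
    lines += [f"  {note}" for note in b.grammar]
    return "\n".join(lines)
```

Keeping the content in one structure and rendering it twice means the on-screen view and both exports can stay in sync.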

A few observations from the experience:

I didn’t read or manually edit any of the Python, HTML, or CSS during development. I simply (by design) engaged in conversation with the model in a constant cycle of build, test, debug, build, test, etc.
A functional prototype with a basic system prompt can be up and running in minutes. Most of the few hours spent on this were in trying to get the model to understand my intent for the UI.
I deployed this on PythonAnywhere. Deploying a web app there requires a fair amount of its own configuration, but Claude 3.5 Sonnet (which is also the model I used in Cursor itself) completely understood that context and could provide assistance there as well.
The collaborative experience with the AI made the whole process feel like working with a developer who instantly translated your requirements into testable code. It didn’t always work on the first try; in fact, it often didn’t, but through precise feedback, it eventually got there.
Understanding software development and having a lot of legacy knowledge (like working via command line) is still pretty important - and the process can break down pretty quickly as the complexity of the codebase increases. However, these are early days, and it’s clear that the software development process faces fundamental changes.

Unfortunately I can’t make this public due to the cost of the Anthropic API.

18 May 2024

It’s hard to not agree with this:

So, is the AI device category dead? No. I just think the fad of it has gone to pasture and now it’s time for companies to take it seriously.

The form factors of both Humane and Rabbit have clearly not been the hits we wanted them to be, but for different reasons. First, the AI Pin seems to be an overheating, slow device that underdelivers on the distraction-free vision the team had. And second, the R1 is at least a lot cheaper, but with it comes another screen that is unnecessary when your smartphone is right there.

Honestly, I think Meta is on the right track with its Ray-Ban Smart Glasses. The image-based AI functionality helped me so much during my time in Costa Rica, and implementing assistance into your regular wearable tech seems to be the right direction.

We will probably see even more wacky devices dressed up to the nines for AI like these two, but remember one thing — if ever you see one, really question whether having that thing will be better than the phone you already have.

I still anticipate a surge in consumer wearables and robots, but the most groundbreaking devices will need to enter the market as disruptors and then continue rapidly disrupting - just to survive. Products won’t have the luxury of entering the market and gradually evolving over multiple product cycles. AI is evolving faster than any technology we have ever seen. It is enabling and then crushing companies in weeks, not years.

17 May 2024

In a series of tweets on May 17, 2024, Jan Leike, the former head of alignment, superalignment lead, and executive at OpenAI, announced his departure from the company. Leike’s tweets shed light on his growing concerns about OpenAI’s priorities and the urgent need to focus on the safety and control of AI systems.

Key Points:

Leike’s team at OpenAI made significant advancements in AI alignment research, including launching InstructGPT, publishing scalable oversight on LLMs, and pioneering automated interpretability and weak-to-strong generalization.
Despite the exceptional talent at OpenAI, Leike expressed disagreement with the company’s core priorities, leading to his difficult decision to step away from his role.
Leike believes that OpenAI should allocate more resources to preparing for the next generations of AI models, focusing on critical areas such as security, monitoring, preparedness, safety, adversarial robustness, (super)alignment, confidentiality, and societal impact.
The alignment team faced challenges in securing compute resources, making it increasingly difficult to conduct crucial research.
Leike emphasizes that building smarter-than-human machines is inherently dangerous and that OpenAI has an enormous responsibility to ensure the safety and beneficial impact of AGI on humanity.
He expresses concern that safety culture and processes have taken a backseat to “shiny products” in recent years.
Leike urges OpenAI to become a safety-first AGI company and prioritize preparing for the implications of AGI to ensure it benefits all of humanity.

Importance of the Announcement:

Jan Leike’s departure from OpenAI and his public statements highlight the growing concerns within the AI research community about the priorities and safety measures in place as we move closer to the development of AGI. As a leading figure in AI alignment research, Leike’s words carry significant weight and underscore the need for a greater focus on the potential risks and challenges associated with advanced AI systems.

The announcement serves as a wake-up call for the AI industry, policymakers, and the general public to recognize the importance of prioritizing AI safety and control measures. It emphasizes the need for open dialogue, collaboration, and a shared commitment to ensuring that AGI development is guided by principles that prioritize the well-being and safety of humanity.

Leike’s tweets also highlight the challenges faced by researchers working on critical AI safety issues, such as the need for adequate resources and support to conduct their work effectively. This underscores the importance of providing robust funding and institutional support for AI safety research.

As we continue to make rapid advancements in AI technology, it is crucial that we heed the warnings of experts like Jan Leike and work together to develop a responsible and safety-first approach to AGI development. Only by prioritizing safety, control, and the societal impact of AI can we ensure that the technology benefits all of humanity.

This analysis of Leike’s Twitter thread was written by Claude Opus

28 Apr 2024

I recently received my Rabbit R1 AI companion and (as promised) wasted no time putting it through its paces. With its sleek design and promising features, I’ve been looking forward to exploring the limits of what it can do – and where it would inevitably fall short.

A common criticism of the Rabbit R1 AI companion is that it should have been developed as an app on smartphones, rather than standalone hardware. While this may seem like a more conventional approach, the Rabbit team has intentionally chosen to take a different path. They’re striving for a “post-app” future where human-machine interactions are rethought and reimagined, and they believe that custom-designed hardware is essential to achieving this vision. By separating their AI companion from the constraints of traditional mobile devices, the Rabbit team can focus on creating a more holistic computing experience that seamlessly integrates into daily life. While this approach is unlikely to succeed, I’m intrigued by their willingness to challenge conventional wisdom and explore new frontiers in human-computer interaction.

The Good

Let’s start with the positives. The design by Teenage Engineering is simply impeccable. The Rabbit R1 looks and feels like a premium product, with a build quality that’s top-notch. When you hold it in your hand, you can’t help but be impressed by the attention to detail. And none of the photos or videos do justice to the color – it’s much more vibrant in person.

The user interface is also well thought out and executed, even if it’s not without its bugs. The model response time is quite good, and I have no doubt that it will only improve as the team continues to refine their technology.

One of the standout features for me was the “Rabbit Hole” – a web site that stores your interactions with the model, transcribed audio, and voice notes. It’s an incredibly useful feature that offers a level of transparency and control that I appreciated.

The price, $199 without a subscription, is also a standout. I’d pay more for a much more robust and polished version of a device like this but this is a fair price for the R1 at this stage of its life. I knew before I ordered it that it would arrive as a work in progress with significant potential to be junk drawer filler but the relatively low price and interesting tech made the risk worth taking.

The Not-So-Good

Of course, no category-launching product is perfect, and the Rabbit R1 is no exception. One of the biggest drawbacks is the lack of features – navigation, integration with popular apps and tools, alerts, alarms, reminders, email, and more are all on the roadmap but not yet available.

Screenshot of Rabbit.Tech's R1 Feature Roadmap

I also experienced some issues with hallucinations – instances where the model responded in a way that wasn’t entirely accurate or relevant. While this isn’t unique to the Rabbit R1, I did find it to be a bit too frequent for my taste.

Battery life is another area where the device falls short. As with many new tech devices, the battery life is not great out of the box – but I’m confident that the team will address this issue in future updates (and they did a day after posting this).

Security and Privacy Concerns

As someone who values security and privacy, I have to express some concerns about the Rabbit R1. The risk is currently low: I’m not too concerned about it having my Spotify credentials, and I won’t record sensitive information. Still, I don’t feel comfortable giving it access to my critical accounts and information – at least not until the team can demonstrate even more transparency, auditing, and robustness in their end-to-end architecture. This isn’t just a challenge for the Rabbit team; it is something that will require a lot of thought from the entire industry as we migrate from apps to agents.

Customization and Voice Control

Finally, I want to touch on two additional areas where I think the device could be improved: LLM memory and customization, as well as voice control. While you can give the model voice instructions to customize its responses to you, it seems that these changes are limited to the current session – which is not a huge deal for me at the moment, but is something that we will come to expect from our “AI companions.”

And speaking of voice control, I appreciate the inherent security of the push-to-talk button but would love to see an option for a wake word or hands-free use. It would be fantastic to be able to plug this device into a docking station at my desk and use it as a dictation tool or virtual assistant without having to constantly reach for it. I generally want to do as much as I can with this device via hands-free voice.

Conclusion

In conclusion, the Rabbit R1 AI companion is far from fully polished – but that’s to be expected from an emerging technology. The team is actively responding to user feedback on social media and rolling out frequent fixes and updates, which bodes well for its future development.

While there are certainly areas where the device falls short, I believe that it has some near-term potential. If you’re an early adopter who understands what to expect from this technology, then the Rabbit R1 might be worth considering. Just remember to temper your expectations and be patient – this is a product that’s still finding its footing. The good news is that the team is actively engaging with users on social media and regularly rolling out over-the-air updates.

TLDR: The Rabbit R1 AI companion is a promising but imperfect device that shows potential. While it has some drawbacks, I believe that it will continue to evolve and improve with time. If you’re willing to take the leap and join the early adopter crowd, then this might be worth considering – just don’t forget to keep your expectations in check. It is immature, but it knows what it wants to be when it grows up - time will tell if it gets there.

This review was mostly written by Meta’s Llama 3 from my robust notes on the R1.

14 Mar 2024

Devin just took our jobs:

Screenshot of AI just officially took our jobs… I hate you Devin

13 Mar 2024

Figure just dropped a jaw-dropping progress update:

The thread linked above has a lot of interesting tech details but make sure to watch the demo video. A year ago I was telling people that these models would allow us to make 50+ years of progress in humanoid robotics in the next decade but it is increasingly looking like we’ll achieve that in two or three years. And if you think progress is head-spinning now (and it is) just wait until there are millions of these moving about the real world, learning every second, and benefiting from their collective experience.

29 Feb 2024

Robotics and embodied AI just got a huge shot of energy (and capital) with Figure’s latest funding round:

Investors include Microsoft, the OpenAI Startup Fund, Nvidia, the Amazon Industrial Innovation Fund and Jeff Bezos (through Bezos Expeditions).

Others include Parkway Venture Capital, Intel Capital, Align Ventures and ARK Invest.

The $675 million Series B funding round “will accelerate Figure’s timeline for humanoid commercial deployment,” the company said in a release.

The intrigue: Figure and OpenAI will also collaborate to develop next-generation AI models for humanoid robots.

This will combine “OpenAI’s research with Figure’s deep understanding of robotics hardware and software,” the companies said.

The partnership “aims to help accelerate Figure’s commercial timeline by enhancing the capabilities of humanoid robots to process and reason from language.”

Read the full story at Axios.

14 Feb 2024

OpenAI is starting to add some basic memory capabilities to ChatGPT. Of course, AI won’t be able to be your personal assistant, primary computing interface, or serve us in any kind of ongoing relationship without some level of persistence - and more likely near total persistence by the time we approach maturity with some of these concepts. The privacy and security challenges will dwarf the many still unresolved privacy and security challenges that we struggle with today. It will all get even more challenging as traditional interfaces fall away in favor of an AI-first approach. This will happen faster than people think.

Screenshot of ChatGPT's interface for controlling memory settings

13 Feb 2024

Another AI device worth tracking - Brilliant Labs' Frame glasses appear to do exactly what you’d expect this generation of AI glasses to do. This will definitely be a successful form factor for AI delivery but I find the novel approaches seen in the Rabbit R1 and Humane devices more interesting at the moment. I don’t expect to see a single winner in the AI device space. We will, quite soon, just be serving up AI access and functions through virtually everything we make.

Frame Multimodal AI Glasses by Brilliant Labs

21 Jan 2024

I’ve been tracking humanoid robotics startup Figure for a while, but they are just now, thanks to a string of significant progress updates, starting to really grab people’s attention. This interview with CEO Brett Adcock is the most revealing update that I’ve seen to date.

Screenshot of Figure CEO Brett Adcock on the Brighter with Herbert podcast

AI and robotics are enablers and drivers for each other’s development in a very profound way. I think we’ll see more progress in the next 3-5 years than we have seen in the previous couple of decades and the capability/deployment curves will just rise exponentially from there.

15 Jan 2024

I’ve been working with LLMs for language learning over the past year and have found them extraordinarily useful. The moment OpenAI introduced custom GPTs I immediately rushed to create My Korean Tutor to address some specific challenges for Korean language learners.

Screenshot for My Korean Tutor custom GPT for Korean language learning

It does a lot, but my primary goal in creating it was to make the process of daily journaling in Korean a little bit easier for learners who are still struggling to acquire vocabulary and basic grammar. My Korean Tutor helps reduce the friction by letting the user propose a topic and then giving them some basic vocabulary, grammar, and examples that they can use to get started.

As for the other features, some of them are visible via action buttons but if you ask it “What can you do?” it will reveal even more features:

14 Jan 2024

If you’ve seen the Rabbit r1 but found it confusing this short Rough Guide to the Rabbit r1 might help:

Basically, it’s like a very smart 10 year child with perfect recall who does exactly what you tell it to. It can watch and mimic what you do on a computer, sometimes with uncanny accuracy. Whether you’re editing videos and could use an “auto mask” assistant, or you’re always booking trips online and get tired of the details, you let the Rabbit watch what you do a few times, then, it just…does it for you.

Technically, a Rabbit is an agent, a “software entity capable of performing tasks on its own.” Those of us with a love of sci-fi will remember the hotel agent in the book Altered Carbon; we ain’t there yet, but we’re getting closer.

Of course we’ll have to see how it actually lives up to that promise once units land in users’ hands in April/May. I should be receiving one then. I’ll share my thoughts soon after. FWIW I don’t expect it to be perfect and it doesn’t have to be perfect to be a success. The real question is: does this human-machine interface approach and its software layer show promise? Is there enough of a foundation there to build on or is it destined for the Museum of Failure? Their pricing strategy, keeping it cheap and subscription free, shows that they understand that there’s both an attraction to, and presumption of failure for, a device like this. I think we’ll know if they can clear that hurdle soon after it drops.

13 Jan 2024

This is the Rabbit R1, an AI-first device with a novel OS and form factor that is aiming to be something more than a phone but not quite a complete computer replacement. It’s shooting for that holy grail slot of AI companion.

We’re destined to soon start filling junk drawers with several generations of these AI devices before someone gets it right, and this one will almost certainly end up there as well. Still, there are some interesting ideas and design choices at work here, and the price ($199 - no subscription) is right, so of course I ordered one immediately.

You can see a demo here.

7 Dec 2023

If you didn’t watch the videos Google dropped with its Gemini announcement yesterday I highly recommend carving out a half hour or so of your time to do so. They’re excellent demonstrations of the power of multimodal AI models and just a hint of what is about to explode into every aspect of your personal and professional lives - no matter what you do and who you are. Get ready for it.

The announcement post and associated technical report are super interesting - if you’re into that sort of thing. 2024 is going to be wild.

4 Dec 2023

This is an interesting and rare peek at the progress of humanoid AI robotics company Figure.

3 Dec 2023

Chair Lord Lisvane and committee member Lord Clement-Jones discuss the recent findings of the House of Lords Artificial Intelligence in Weapon Systems Committee in this video:

Highlights and their full report, ‘Aspiration vs reality: the use of AI in autonomous weapon systems AI and the future of warfare’, are available here.

3 Dec 2023

It’s hard to believe that a film from 1970 could get so much about AI so right, but ‘Colossus: The Forbin Project’ does exactly that. I won’t reveal any spoilers, but its exploration of issues related to alignment and emergent behaviors was prescient. It’s also a really entertaining film, albeit a bit dark.

DALL·E 3 illustration of Colossus: The Forbin Project

I fed the plot to ChatGPT and asked it to create the image above. It seemed like the appropriate thing to do. You can stream Colossus: The Forbin Project for free at the Internet Archive.

2 Dec 2023

A team of researchers from Rutgers and the University of Michigan have developed WarAgent:

“…the first LLM-based Multi-Agent System (MAS) that simulates historical events. This simulation seeks to capture the complex web of factors influencing diplomatic interactions throughout history”

They used it to simulate events related to both world wars and published their findings in War and Peace (WarAgent): Large Language Model-based Multi-Agent Simulation of World Wars. This is really exciting stuff.

Here’s the abstract:

Can we avoid wars at the crossroads of history? This question has been pursued by individuals, scholars, policymakers, and organizations throughout human history. In this research, we attempt to answer the question based on the recent advances of Artificial Intelligence (AI) and Large Language Models (LLMs). We propose WarAgent, an LLM-powered multi-agent AI system, to simulate the participating countries, their decisions, and the consequences, in historical international conflicts, including the World War I (WWI), the World War II (WWII), and the Warring States Period (WSP) in Ancient China. By evaluating the simulation effectiveness, we examine the advancements and limitations of cutting-edge AI systems’ abilities in studying complex collective human behaviors such as international conflicts under diverse settings. In these simulations, the emergent interactions among agents also offer a novel perspective for examining the triggers and conditions that lead to war. Our findings offer data-driven and AI-augmented insights that can redefine how we approach conflict resolution and peacekeeping strategies. The implications stretch beyond historical analysis, offering a blueprint for using AI to understand human history and possibly prevent future international conflicts. Code and data are available at github.com/agiresear…

H/T Ethan Mollick

2 Dec 2023

As mentioned earlier I have been constantly creating custom GPTs (both public and private) as technology demonstrators or to offload or minimize repetitive tasks. I just finished creating one to help me stay on top of some of the topics I post about here. This screenshot illustrates how simple this process can be.

Screenshot that shows interaction with Custom GPT

It’s important to note that I’m not using this to automate blogging or social media posts (although that’s certainly possible). It’s just a handy tool that helps me stay on top of a number of rapidly evolving topics - a research assistant.

2 Dec 2023

This is an excellent deep dive on AI’s potential and the resulting jobs disruption that will follow - and why you should be adapting in response now.

1 Dec 2023

Here’s a video roundup of the humanoid robot projects from Apptronik, Sanctuary AI, Agility Robotics, Tesla, Unitree, and Figure that I’m actively tracking. AI was already driving rapid progress in the field but recent advancements have pushed the financial incentives through the roof. We should start seeing significant deployments (beyond current pilots at places like Amazon) of this form factor within the next 2-3 years (China is leaning into this in a big way), which should only serve to accelerate progress even more.

1 Dec 2023

Check out Anduril’s Roadrunner twin-turbojet VTOL Autonomous Air Vehicle (AAV)

1 Dec 2023

I finished Mustafa Suleyman’s The Coming Wave: Technology, Power, and the Twenty-first Century’s Greatest Dilemma over the Thanksgiving holiday. I rarely read these kinds of books. And given that I work with AI and have done significant forecasting work on its potential impact, I don’t really need a primer on the risks and opportunities. However, Suleyman presents an exceptionally well-architected and supported case that’s focused as much on how society is likely to respond to “the wave” as the technology itself. The book is dense with insights but is neither overly technical nor a dry slog. If you don’t feel up to speed on the topic, and even if you are, this book is definitely worth your time.

30 Nov 2023

Here’s a perfect example of what I mean when I say that we’ve unlocked science fiction:

Today, in a paper published in Nature, we share the discovery of 2.2 million new crystals – equivalent to nearly 800 years’ worth of knowledge.

What a mind-blowing announcement from the DeepMind team. I knew AI would transform materials science - but not this abruptly. And to think that we will soon see these kinds of announcements in virtually every discipline imaginable.

John W. Little
