One of the AI hardware launches that drew the most media attention at CES 2024 was the rabbit r1 from Rabbit (rabbit.tech), a $199 handheld device designed to act as an AI assistant for a variety of everyday tasks.
The device lets users carry out many tasks on the go without reaching for a mobile phone: hailing a cab, playing a song, ordering food, booking a hotel, and even retouching photos in Photoshop or generating images directly with Midjourney.
01 An almost all-purpose AI assistant
The r1 is an orange, square device about half the size of an iPhone, designed by Rabbit in collaboration with design firm Teenage Engineering. It features a 2.88-inch touchscreen, a rotating camera for taking photos and videos, and a scroll wheel and button for navigating or talking to the device’s built-in assistant.
Specifications: 2.3GHz MediaTek processor, 4GB of RAM and 128GB of storage, with official claims of a full day of battery life. Priced at $199, it will start shipping in March.
There is an analogue scroll wheel on the right side, with a camera above it that rotates 360 degrees. Called the Rabbit Eye, the camera points up or down when not in use, which effectively acts as a privacy shutter, and it can be turned to serve as either a selfie or a rear-facing camera. Whilst you can use the Rabbit Eye for video calls, it doesn’t work like a traditional smartphone camera.
Also on the right is a push-to-talk button that you press and hold to issue voice commands. There’s a 4G LTE SIM card slot (surprisingly, not 5G) for network connectivity, meaning the device doesn’t need to be paired with anything else.
The official description states that the device is not meant to replace a mobile phone and cannot be used to watch films or play games.
Instead, it is designed to free users from menial tasks. Rabbit founder and CEO Jesse Lyu compares it to handing your mobile phone to a personal assistant to get things done. For example, you can call an Uber by holding down the push-to-talk button and saying, “Call me an Uber to the Empire State Building.” The r1 takes a few seconds to parse the request, then displays the fare and other details on the screen and starts hailing the car. The process is the same across categories, whether you want to make a reservation at a restaurant, book a flight, or add a song to a Spotify playlist.
The r1 doesn’t have any built-in apps. It also doesn’t connect to any app APIs, and there are no plug-ins and no proxy accounts. Likewise, it doesn’t need to be paired with a smartphone.
Rabbit’s operating system, called Rabbit OS, is in practice more of an intermediary layer. Users link their apps through Rabbit Hole, a web portal on Rabbit’s website where they can sign in to their accounts on services like OpenTable, Uber, Spotify, DoorDash, and Amazon. Rabbit Hole gives Rabbit OS the ability to perform actions on the connected accounts on the user’s behalf.
Rabbit claims that it does not store any user credentials for third-party services. All authentication takes place in the third-party service’s own login system, and the user is free to revoke Rabbit OS’s access to a linked account and delete any stored data at any time.
Similarly, because the r1 uses a push-to-talk button (like a walkie-talkie) to trigger voice commands, there is no wake word, so the r1 does not have to listen to you constantly the way most popular voice assistants do. The microphone on the device activates and records audio only while you hold the button.
02 A New Attempt to Combine a Large Action Model with Hardware
Rabbit says that Rabbit OS is based not on a large language model like ChatGPT but on a “Large Action Model,” which can be understood simply as a kind of universal controller for apps. “We wanted to find a generic solution, like the large language model,” CEO Jesse Lyu said. “How can we find a generic solution to actually trigger our services, whether you’re a website or an app, or any platform or desktop?”
In a sense, the idea is similar to Alexa or Google Assistant. Rabbit OS can control music playback, order merchandise, buy groceries, send messages, and more, all from one interface. There’s no need to open or log into an app; you just ask for what you want and let the device deliver.
However, instead of building a bunch of APIs and trying to convince developers to support the r1, Rabbit trained a model to use existing apps. The backend combines a large language model (powered by OpenAI’s ChatGPT) with a large action model developed by Rabbit to understand user intent. Large Action Models (LAMs) are trained on humans interacting with apps like Spotify and Uber, which shows the models how these apps work. LAMs learn by demonstration: they observe how a human performs a task through a mobile, desktop, or cloud interface, and then replicate that task on their own. The company has already trained the model on the most popular apps, and the same process can be applied to any app, anywhere, Lyu says.
The r1 also has a dedicated training mode that you can use to teach the device how to do something, after which it can repeat the action on its own. Lyu gives an example: “You’d say, ‘Hey, first of all, go to a programme called Photoshop. Open it up. Get your photo here. Make a lasso over the watermark and click, click, click, click. That’s how you remove the watermark.’” Rabbit OS takes 30 seconds to process the demonstration, and can then remove all the watermarks automatically, says Lyu.
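To make the teach-then-repeat idea concrete, here is a deliberately simplified Python sketch. It is not Rabbit's implementation, and every name in it (`Action`, `Demonstration`, `replay`, the `{item}` placeholder) is hypothetical. It only records a fixed list of UI steps and replays them on new inputs, whereas a real large action model would presumably have to generalise from demonstrations rather than repeat them literally.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Action:
    """One recorded UI step (hypothetical format, not Rabbit's)."""
    kind: str    # what the user did, e.g. "open", "lasso", "click"
    target: str  # the UI element or file the step acts on


@dataclass
class Demonstration:
    """A named task plus the steps a human performed to complete it."""
    task: str
    steps: list[Action] = field(default_factory=list)

    def record(self, kind: str, target: str) -> None:
        """Append one observed step to the demonstration."""
        self.steps.append(Action(kind, target))


def replay(demo: Demonstration, item: str) -> list[str]:
    """Repeat the recorded steps for a new input, substituting the
    "{item}" placeholder wherever the demonstration referred to it."""
    return [f"{a.kind} {a.target.format(item=item)}" for a in demo.steps]


# Teach the task once, roughly following the watermark example above.
demo = Demonstration("remove watermark")
demo.record("open", "Photoshop")
demo.record("load", "{item}")
demo.record("lasso", "watermark region")
demo.record("click", "delete")

# Then repeat it for other photos without further instruction.
for photo in ["a.jpg", "b.jpg"]:
    print(replay(demo, photo))
```

The point of the sketch is the division of labour: the human supplies one demonstration, and the system reuses it across inputs without the app exposing any API.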
Rabbit’s approach is clever. It’s hard to get developers to support a new operating system, even for tech giants, and the LAM approach sidesteps that problem simply by teaching the model how to use the app. Right now, we’re seeing a ton of new AI hardware coming to market, but a lot of the time, all these tools do is connect to ChatGPT. In contrast, Rabbit is more like a super app: a single interface through which you can do anything.
The r1 is designed to work more like the AI agents that have emerged over the past year, i.e. machine learning models trained on common user interfaces such as websites and apps. So instead of ordering a pizza through some dedicated API, they can do it the same way humans do: by clicking a normal button in a normal web or mobile app.
Rabbit OS is perhaps to the app store what ChatGPT is to search.
“We’re not trying to kill your mobile phone,” CEO and founder Jesse Lyu said in a pre-CES chat with reporters. “The mobile phone is an entertainment device, but it’s not the most efficient machine if you want to get things done. To arrange a dinner with a colleague, we need four to five different apps to work together. Large language models are a universal solution for natural language, and we want to provide a universal solution for these services. They should be able to understand you.”
03 Replacing the AI Pin, or Being Replaced by a Big Company?
According to Rabbit CEO Jesse Lyu, the vision for the r1 is a combination of voice assistant, screen and camera. Lyu says it’s more likely to compete with devices like Humane’s AI Pin than with the iPhone. Smart assistants like Alexa are already obsolete compared to the r1.
At the end of the day, the big question isn’t whether the r1 meets its sales goals, but whether this approach is viable in the face of extremely strong competition.
Google, Apple, Microsoft, OpenAI, Anthropic, Amazon, Meta: every one of them is working every day to create more powerful AI products. Rabbit’s biggest danger isn’t that no one will buy the device; it’s that within six months, a $100 billion company will build its own AI agent that does 80 per cent of the r1’s work and is free to use on mobile phones.
With just 17 employees, Rabbit is not a big company.
“Of course we’re worried,” Lyu says. “We’re a start-up. But just because these big companies can do it doesn’t mean we need to stop.”
He noted that while these companies have a wealth of resources, they lack the agility of start-ups, which are already building their own content and data. He also noted that the large language model itself is based on an open recipe: five papers and that’s it. There’s little chance of building a moat there. Rabbit’s LAM, by contrast, is built on proprietary data and can deliver very specific user experiences on very specific devices.