Apple's GenAI Focus
As many of you have no doubt heard, Apple has called off its Apple Car project. Bloomberg also reports that some of the Apple employees who were working on it will be moved over to the AI division. That made me curious about what Apple is up to in the GenAI space.
Shoring Up Vision Pro
It’s worth noting that, at the beginning of the month, Apple appeared poised to acquire German AI startup Brighter AI. The acquisition is reportedly aimed at enhancing privacy features for the Apple Vision Pro.
Brighter AI’s tech is described as a solution for anonymizing images and videos to help companies meet data protection regulations. From their website:
“With its proprietary Deep Natural Anonymization solution, brighter AI protects the identities of the people recorded. At the same time, companies from the automotive, healthcare, and the public sector can use anonymized data for analysis and machine learning without violating privacy. In this way, AI learning models and privacy go hand in hand.”
Several articles I read pointed to license plate data and facial anonymization as primary use cases. In other words, the tech could help Apple solutions remain compliant with privacy regulations by incorporating data anonymization into image gathering (if you’re interested in this application, check out this Brighter AI blog post).
And check out this article for a breakdown of Tim Cook’s thoughts around Vision Pro.
Keys to the Kingdom
At the top of Apple’s GenAI project list is a prototype AI tool called Keyframer. It’s described as a GenAI animation tool that enables users to add motion to 2D images by describing how they should be animated. The idea is that users supply a Scalable Vector Graphics (SVG) file and enter some text prompts, and Keyframer generates the CSS animation code that animates the user’s original image. Keyframer uses OpenAI’s GPT-4 as its base model.
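Keyframer itself isn’t public, but the basic pattern the paper describes (SVG markup plus a natural-language request goes in, CSS animation code comes out) is easy to sketch. Here’s a minimal, hypothetical version using the openai Python client; the prompt wording, SVG, and variable names are my own assumptions, not Apple’s.

```python
# Minimal sketch of the Keyframer-style workflow described in the paper:
# SVG markup + a natural-language prompt in, CSS animation code out.
# Illustrative only; Keyframer's actual prompts are not public.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

svg = """<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100">
  <circle id="sun" cx="50" cy="50" r="20" fill="orange"/>
</svg>"""

request = "Make the sun slowly pulse, growing and shrinking."

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system",
         "content": "You write CSS animations for SVG elements. Return only CSS."},
        {"role": "user", "content": f"SVG:\n{svg}\n\nRequest: {request}"},
    ],
)

print(response.choices[0].message.content)  # e.g., a @keyframes rule targeting #sun
```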
Apple has released some images in their research paper (link below). Here, we see frame-by-frame examples of animations generated by Keyframer.
This is the input field for SVG code and the field for entering a GPT prompt.
And the input/output side by side.
My Takeaway from “Keyframer: Empowering Animation Design using Large Language Models”
(Tiffany Tseng, Ruijia Cheng, and Jeffrey Nichols)
Apple is aiming at the “less explored” space of animation design. In the Introduction, the authors point out that, in a variety of applications, animation work often entails collaboration among multiple stakeholders (e.g., motion designers, technical artists, software engineers). Thus, one-shot prompting interfaces (the authors cite DALL·E and Midjourney here) are too simplistic. Rather, an alternative approach (i.e., Keyframer’s interface) is needed that allows users to iteratively create and refine animation designs.
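To make the contrast with one-shot prompting concrete, here’s a toy sketch (my own, not from the paper) of what an iterative refinement loop looks like at the API level: the conversation history accumulates, so each refinement request builds on the previous design instead of starting from scratch.

```python
# Toy sketch of iterative refinement vs. one-shot prompting: the message
# history grows with each turn, so every new generation refines the last
# CSS rather than regenerating from zero. Not Keyframer's implementation.
from openai import OpenAI

client = OpenAI()

messages = [
    {"role": "system", "content": "You write CSS animations for SVG. Return only CSS."},
    {"role": "user", "content": "Animate #sun to pulse gently."},
]

while True:
    response = client.chat.completions.create(model="gpt-4", messages=messages)
    css = response.choices[0].message.content
    print(css)
    messages.append({"role": "assistant", "content": css})
    feedback = input("Refine the animation (blank to accept): ")
    if not feedback:
        break
    messages.append({"role": "user", "content": feedback})
```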
The authors observe that natural language-based GenAI tools often lack effective prompting guidance for shaping the generated output; trial and error, for example, is the common prompting strategy for one-shot approaches to text-to-image generation. But this is what I found particularly interesting, and I’ll quote directly from the paper:
“Several prompt taxonomies have been proposed with generative art communities using modifiers specifying artistic style (e.g., ‘Cubism’) and quality (e.g., ‘award-winning’), along with keywords to spur surprising output (what Oppenlaender refers to as ‘magic terms’). Similarly, Chiou et al distinguish between ‘operational’ keywords that specify concrete reference terms and ‘conceptual’ keywords using abstract modifiers that are more likely to lead to unexpected results.”
The article referred to in the above quote is “Designing with AI: An Exploration of Co-Ideation with Image Generators” by Li-Yuan Chiou, Peng-Kai Hung, Rung-Huei Liang, and Chun-Teng Wang. Briefly, the authors conclude that AI can make significant positive contributions to the design process by augmenting self-expression. Or, in the authors’ own words, it offers a “distinct perspective that opens up new avenues for artistic expression.”
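To illustrate the operational/conceptual distinction in practice, here are a couple of made-up text-to-image prompt examples (mine, not the paper’s):

```python
# My own toy examples of the two keyword types from Chiou et al.
# Not drawn from the paper.

# 'Operational' keywords: concrete, reference-able styles and techniques.
operational_prompt = "a lighthouse at dusk, oil painting, Cubism, 35mm film"

# 'Conceptual' keywords: abstract modifiers that are more likely to
# produce surprising, unexpected output.
conceptual_prompt = "a lighthouse at dusk, dreamlike, the memory of leaving home"
```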
All of this resonates with me because I spend a good deal of time working with prompts in the GPT models. I have found that just the exercise of experimenting with different ways to phrase a prompt to get a desired result is quite empowering creatively. But more intriguing is the prompt’s role as the gatekeeper for how a user experiences GenAI.
I’m picturing a garden hose where the LLM is the water source. A weak prompt is like pinching the hose and limiting what can come out. The more you open it up, the more of the source you have access to.
I don’t think this is idle speculation. I’m reminded of the bomb dropped recently by Jensen Huang regarding the “death of coding.” Since I’m not an engineer in any way, shape, or form, I’ll give the specifics of coding’s future a wide berth. Rather, it’s the idea that advancements in natural language processing will enable anyone to use GenAI to create in a way that was formerly the exclusive realm of programmers. I would submit, however, that this will largely depend on two things: on the solution side, how we develop new ways for users to work and iterate with prompts (e.g., Keyframer); and, on the user side, whether people can engage with natural language interfaces effectively enough to realize the immense potential of GenAI.
(Check out this page for the latest papers and discussion on Apple’s machine learning research)