Apple makes major AI advance with image generation technology rivaling DALL-E and Midjourney

Apple's machine learning research team has developed a breakthrough AI system for generating high-resolution images that could challenge the dominance of diffusion models, the technology powering popular image generators like DALL-E and Midjourney.
The advancement, detailed in a research paper published last week, introduces “STARFlow,” a system developed by Apple researchers in collaboration with academic partners that combines normalizing flows with autoregressive transformers to achieve what the team calls “competitive performance” with state-of-the-art diffusion models.
The breakthrough comes at a critical moment for Apple, which has faced mounting criticism over its struggles with artificial intelligence. At Monday’s Worldwide Developers Conference, the company unveiled only modest AI updates to its Apple Intelligence platform, highlighting the competitive pressure facing a company that many view as falling behind in the AI arms race.
“To our knowledge, this work is the first successful demonstration of normalizing flows operating effectively at this scale and resolution,” wrote the research team, which includes Apple machine learning researchers Jiatao Gu, Joshua M. Susskind, and Shuangfei Zhai, along with academic collaborators from institutions including UC Berkeley and Georgia Tech.
How Apple is fighting back against OpenAI and Google in the AI wars
The STARFlow research is part of Apple's broader effort to develop distinctive AI capabilities that could differentiate its products from competitors. While companies like Google and OpenAI have dominated headlines with their generative AI advances, Apple has been working on alternative approaches that could offer unique advantages.
The research team tackled a fundamental challenge in AI image generation: scaling normalizing flows to work effectively with high-resolution images. Normalizing flows, a type of generative model that learns to transform simple distributions into complex ones, have traditionally been overshadowed by diffusion models and generative adversarial networks in image synthesis applications.
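The core idea behind normalizing flows can be shown in a few lines. The toy affine flow below is an illustrative sketch, not Apple's STARFlow code: an invertible map turns samples from a simple base distribution into samples from a more complex one, and the change-of-variables formula yields an exact density for the transformed samples.

```python
import numpy as np

rng = np.random.default_rng(0)

scale, shift = 2.0, 3.0  # parameters of a toy invertible affine "flow"

def forward(z):
    # f(z) = scale * z + shift, invertible because scale != 0
    return scale * z + shift

def inverse(x):
    return (x - shift) / scale

def log_prob(x):
    # Change of variables: log p_X(x) = log p_Z(f^-1(x)) - log|df/dz|
    z = inverse(x)
    log_base = -0.5 * (z**2 + np.log(2.0 * np.pi))  # standard-normal base
    return log_base - np.log(abs(scale))

z = rng.standard_normal(10_000)  # samples from the simple base distribution
x = forward(z)                   # transformed samples, with an exact density
```

Because the map is invertible, the density on the right-hand side is exact rather than approximated, which is the tractability that diffusion models trade away for their iterative denoising procedure.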
“STARFlow achieves competitive performance in both class-conditional and text-conditional image generation tasks, approaching state-of-the-art diffusion models in sample quality,” the researchers wrote, demonstrating the system’s versatility across different types of image synthesis challenges.
Inside the mathematical breakthrough that powers Apple’s new AI system
Apple’s research team introduced several key innovations to overcome the limitations of existing normalizing flow approaches. The system employs what researchers call a “deep-shallow design,” using “a deep Transformer block [that] captures most of the model representational capacity, complemented by a few shallow Transformer blocks that are computationally efficient yet substantially beneficial.”
The system also operates in the “latent space of pretrained autoencoders, which proves more effective than direct pixel-level modeling,” according to the paper. This approach allows the model to work with compressed representations of images rather than raw pixel data, significantly improving efficiency.
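The latent-space idea can be sketched roughly as follows, with PCA standing in for the pretrained autoencoder (an assumption purely for illustration; the paper's system uses learned neural autoencoders):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy 8x8 "images" that secretly live on an 8-dimensional subspace.
basis = rng.standard_normal((8, 64))
images = rng.standard_normal((1000, 8)) @ basis  # (1000, 64) pixel vectors

# "Pretrained autoencoder" stand-in: PCA keeping 8 of 64 dimensions.
mean = images.mean(axis=0)
_, _, vt = np.linalg.svd(images - mean, full_matrices=False)

def encode(x):
    return (x - mean) @ vt[:8].T  # 64 pixels -> 8 latents

def decode(z):
    return z @ vt[:8] + mean      # 8 latents -> 64 pixels

latents = encode(images)
# A generative model now only has to fit 8 dimensions instead of 64;
# samples drawn in latent space are decoded back to pixels afterwards.
reconstructed = decode(latents)
```

The generative model's job shrinks from fitting every pixel to fitting a much smaller latent code, which is where the efficiency gain comes from.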
Unlike diffusion models, which rely on iterative denoising processes, STARFlow maintains the mathematical properties of normalizing flows, enabling “exact maximum likelihood training in continuous spaces without discretization.”
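To make concrete what exact maximum likelihood buys, the toy affine flow below (again an illustrative assumption, not the STARFlow architecture) has a closed-form log-likelihood that can be maximized directly, with no denoising schedule or variational bound involved:

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(loc=5.0, scale=0.5, size=5_000)  # stand-in "training data"

def mean_log_lik(shift, scale, x):
    # Exact per-sample log-likelihood of the affine flow f(z) = scale*z + shift
    # with a standard-normal base, via the change-of-variables formula.
    z = (x - shift) / scale
    return np.mean(-0.5 * (z**2 + np.log(2.0 * np.pi)) - np.log(scale))

# Because the likelihood is exact, it can be optimized directly; for this
# toy flow the maximum sits at the data mean and standard deviation.
mle_shift, mle_scale = data.mean(), data.std()
```

Perturbing either parameter away from the data statistics strictly lowers the exact likelihood, and it is this directly computable training signal that flow-based models use in place of a diffusion objective.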
What STARFlow means for Apple’s future iPhone and Mac products
The research arrives as Apple faces increasing pressure to demonstrate meaningful progress in artificial intelligence. A recent Bloomberg analysis highlighted how Apple Intelligence and Siri have struggled to compete with rivals, while Apple’s modest announcements at WWDC this week underscored the company’s challenges in the AI space.
For Apple, STARFlow’s exact likelihood training could offer advantages in applications requiring precise control over generated content or in scenarios where understanding model uncertainty is critical for decision-making — potentially valuable for enterprise applications and on-device AI capabilities that Apple has emphasized.
The research demonstrates that alternative approaches to diffusion models can achieve comparable results, potentially opening new avenues for innovation that could play to Apple’s strengths in hardware-software integration and on-device processing.
Why Apple is betting on university partnerships to solve its AI problem
The research exemplifies Apple’s strategy of collaborating with leading academic institutions to advance its AI capabilities. Co-author Tianrong Chen, a PhD student at Georgia Tech who interned with Apple’s machine learning research team, brings expertise in stochastic optimal control and generative modeling.
The collaboration also includes Ruixiang Zhang from UC Berkeley’s mathematics department and Laurent Dinh, a machine learning researcher known for pioneering work on flow-based models during his time at Google Brain and DeepMind.
“Crucially, our model remains an end-to-end normalizing flow,” the researchers emphasized, distinguishing their approach from hybrid methods that sacrifice mathematical tractability for improved performance.
The full research paper is available on arXiv, providing technical details for researchers and engineers looking to build upon this work in the competitive field of generative AI. While STARFlow represents a significant technical achievement, the real test will be whether Apple can translate such research breakthroughs into the kind of consumer-facing AI features that have made competitors like ChatGPT household names. For a company that once revolutionized entire industries with products like the iPhone, the question isn't whether Apple can innovate in AI — it's whether it can do so fast enough.