Midjourney v7 launches with voice prompting and faster draft mode — why is it getting mixed reviews?

Midjourney, the bootstrapped startup viewed by many AI power users as the “gold standard” of AI image generation since its launch in 2022, has now introduced the much-anticipated, most advanced version of its generator model, Midjourney v7.
The headline feature is a new way to prompt the model to create images.
Previously, users were limited to typing in text prompts and attaching other images to help guide generations (the model could incorporate a variety of user-uploaded and attached images, including other Midjourney generations, to influence the style and subjects of new generations).
Now, users can simply speak aloud to Midjourney’s alpha website (alpha.midjourney.com) — provided their computer has a microphone, or they’re on a networked device with audio input such as a smartphone or headset — and the model will listen, conjure up its own text prompts from the spoken description, and generate images from them.
It’s unclear whether Midjourney built a new speech-to-text model from scratch or is using a fine-tuned or off-the-shelf version of one from another provider such as ElevenLabs or OpenAI. I asked Midjourney founder David Holz on X, but he has yet to answer.
Using Draft Mode and conversational Voice Input to prompt in a flow state
Going hand-in-hand with this input method is a new “Draft Mode” that generates images more rapidly than Midjourney v6.1, the immediately preceding version — often in under a minute, and in some cases under 30 seconds.
While the images are initially of lower quality than v6.1, the user can click on the “enhance” or “vary” buttons located to the right of each generation to re-render the draft at full quality.
The idea is that the user will use the two together — in fact, “Draft Mode” must be turned on to activate audio input — to enter a more seamless creative flow state with the model. The user spends less time refining the specific language of prompts and more time seeing new generations, reacting to them in real time, and adjusting or tweaking them as needed, more naturally and rapidly, simply by speaking their thoughts aloud to the model.
“Make this look more detailed, darker, lighter, more realistic, more kinetic, more vibrant,” etc. are some of the instructions the user could provide through the new audio interface in response to generations to produce new, adjusted ones that better match their creative vision.
Getting started with Midjourney v7
To enter these modes, starting with the new “Draft” feature, the user must first jump through one new hurdle: Midjourney’s personalization feature.
While this feature was introduced with Midjourney v6 back in June 2024, it was optional: users could create a personal “style,” applied to all subsequent generations, by rating 200 pairs of images (selecting which one they liked best in each pair) on the Midjourney website. The user could then toggle on a style matching the images they preferred during the pairwise rating process.
Now, Midjourney v7 requires users to generate a new v7-specific personalized style before using the model at all.

Once the user does that, they’ll land on the familiar Midjourney Alpha website dashboard, where they can click “Create” on the left side rail to open the creation tab.

Then, in the prompt entry bar at the top, the user can click on the new “P” button to the right of the bar to turn on their personalization mode.

Midjourney founder and leader David Holz confirmed to VentureBeat on X that older personalization styles from v6 can also be selected, but not the separate “moodboards” — styles built from user-uploaded image collections — though Midjourney’s X account separately stated that feature will be returning soon as well. However, I didn’t see the option to select my older v6 style.
The user can then click the new “Draft Mode” button — to the right of the Personalization button, and further right of the text prompt entry box — to activate the faster image generation mode.

Once clicked, the button turns orange to indicate it is on, and a new button with a microphone icon should appear to its right. This is the voice prompting mode, which the user can click to activate.

Once the user has pressed the microphone button to enter voice prompting mode, the icon should change from white to orange to indicate it is engaged, and a waveform line will appear to its right, undulating in time with the user’s speech.


The model will then hear you and should detect when you finish speaking. In practice, I sometimes got an error message saying “Realtime API disconnected,” but stopping and restarting voice entry mode and refreshing the webpage usually cleared it quickly.
After a few seconds of speaking, Midjourney will flash keyword windows below the prompt entry box at the top and compose a full text prompt to the right as it generates a new set of four images based on what the user said.

The user can then further modify these new generations by speaking to the model again, toggling voice mode on and off as needed.
Here’s a quick demo video of me using it today to generate some sample imagery. You’ll see the process is far from perfect, but it is really fast and allows for a more uninterrupted flow of prompting, refining, and receiving images from the model.
More new features…but also many missing features and limitations from v6/6.1
Midjourney v7 is launching with two operational modes: Turbo and Relax. Turbo Mode provides high performance at twice the cost of a standard v6 job, while Draft Mode costs half as much. A standard-speed mode is currently in development and will be released once optimized.
At launch, features such as upscaling, inpainting, and retexturing will temporarily rely on the v6 model. Midjourney plans to transition these functions to v7 in future updates.
The company is committing to regular development over the next two months, with updates scheduled every one to two weeks. A major upcoming addition is a new character and object reference system designed specifically for v7 — functionality that, on older versions of Midjourney, required appending arcane suffixes such as --cref (character reference) and --sref (style reference) to the end of a text prompt.
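For context, the older reference syntax looked roughly like this (the prompt wording and image URLs here are hypothetical placeholders, not from Midjourney’s documentation):

```
/imagine prompt: a knight walking through a rain-slicked neon city --cref https://example.com/my-character.png --sref https://example.com/my-style.png
```

The v7 replacement is meant to expose this character- and style-referencing capability without requiring users to memorize such suffixes.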
Midjourney plans to engage its community through public sharing spaces and feedback channels, and it will host a roadmap ranking session to help prioritize future development efforts.
Midjourney emphasizes that v7 is a completely new model with its own strengths and challenges. Users are encouraged to experiment with different prompt styles and report their experiences to help refine the platform.
Initial reaction is mixed…far from the near-unanimous praise of prior Midjourney releases
While most of the older Midjourney releases were met with overwhelming excitement and adulation, the initial reception to v7 is decidedly more mixed.
Although Midjourney was careful to call this an “alpha” release in its blog and on social media, many users were still expecting a larger jump in image quality and prompt adherence (how well generations match the user’s specific instructions in text or audio). Many were also hoping for improved human anatomy (particularly hands, a common AI image generation issue) and text rendering — another longtime weak spot for image models, though Ideogram and OpenAI’s native GPT-4o image generator appear to handle it far more consistently than Midjourney v7, based on initial user reports.
As @freiboitar wrote on X:
“Gotta say it: kinda disappointed. OpenAI set the bar sky-high. talk to your image gen like it’s your bro? Mind = blown.
MJ7 looks “more realistic”. but did we really need that? MJ + Magnific already nailed it.
Might pause my sub tbh.”
“The problem is v7 doesn’t really feel like v7. It feels more like v6.2,” posted Magnific AI founder Javi Lopez on X, citing the incremental-seeming nature of the updates.
Indeed, Ethan Mollick, the University of Pennsylvania Wharton School professor and AI influencer, also chimed in: “I like their new releases but the problem with the new v7 (right) released today is that v6 (left) was already really good.”
“Identical prompts from v6 are worse in v7,” wrote self-described “AI maximalist” David Shapiro on X.
“All the old Favorites that are getting way too old,” said artist and musician @CaptainHaHaa: “Hands, Text still an issue, no cref, srefs have gone wack. But its ok because you can talk to it while it disappoints you.”
Others were more forgiving and delighted with their initial test generations on v7, with AI power user Dreaming Tulpa saying on X it had “better image quality” and was “super artistic.”
Similarly, AI artist and designer Tatiana Tsiguleva voiced that Midjourney v7 was a “Huge jump in quality!”
It’s still early days for Midjourney v7, however, and the initial reaction could swing back in either direction — either adulation or frustration with the new model and design features. For now, it’s available to anyone with a Midjourney account to begin using.