The first product image I generated with Midjourney was a disaster. I wrote something like 'elegant product on white background with good lighting'. What came back looked like a 2008 render with an undefined object floating in the void. That day I understood that AI does not read minds. It reads words. And if your words are vague, the result is vague.
After months of iterating, breaking prompts and discovering patterns, I found a method that works. I call it the layer method because every product prompt I write has between three and five layers of information. Each layer controls a different aspect of the final image. And the order matters more than I originally thought.
Layer 1: the product with surgical precision
The first layer is the product description. This is where most people fail because they describe what they want to see instead of describing what exists. Writing 'elegant bottle' is not the same as writing 'amber glass bottle 500ml with natural cork cap and kraft paper label'. The second version gives Midjourney concrete material to build with. The first gives it freedom to invent anything.
My rule is to describe the product as if writing a technical spec sheet for a manufacturer. Material, relative size, exact color, surface texture, details that make it unique. If the product has intentional imperfections like an irregular matte finish or visible stitching, that goes here. These details make the image feel real rather than generated.
Layer 2: surface and context
The second layer defines where the product sits. A perfume on white marble tells a different story than a perfume on dark wood. Midjourney is extremely sensitive to surfaces and background materials. I have found that textured surfaces produce more believable images than flat backgrounds. Polished concrete, an oak table with visible grain, a wrinkled linen surface. Each texture adds visual information that the brain reads as real.
This is also where I define secondary elements. An olive branch beside artisan soap. Water droplets on the surface next to a skincare bottle. Scattered coffee beans around packaging. These elements are not decoration. They are context that communicates the product category without needing text.
Layer 3: light is everything
If you asked me which layer matters most, I would say this one. Lighting separates an amateur image from a professional one. And Midjourney responds incredibly well to specific lighting instructions.
The phrases I use most in this layer are: soft directional light from the left, golden hour backlight, studio rim lighting, diffused natural window light. Each produces a completely different result. Soft directional light from the left is my default for luxury products. Golden hour backlight works perfectly for organic or artisan products. Studio rim lighting is ideal when you need the product to separate from the background with an edge of light.
A discovery that changed my results was adding shadow direction. Writing 'soft shadows falling to the right' not only controls shadows but tells Midjourney where the light comes from, reinforcing the entire scene.
Layer 4: the photographic style
This layer is where my experience as a visual designer makes the difference. Here I define whether the image looks like it was shot on an iPhone or a medium format camera. Keywords I use include lens type, depth of field and processing style.
For premium products I use: shot with a Hasselblad, 80mm lens, shallow depth of field, color grading with warm tones. For lifestyle products: Canon 5D Mark IV, 35mm lens, natural color palette, editorial style. For food and beverages: macro lens, extreme close-up, moisture detail, Kinfolk magazine aesthetic.
What I discovered is that mentioning a specific camera changes the entire image texture. Midjourney was trained on millions of photos, many carrying EXIF metadata that names the camera, so it associates each camera with a particular look. A Hasselblad produces richer colors and smoother focus transitions. A Leica produces cooler tones and sharper edges. It is like choosing your photography gear but with words.
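To keep these style recipes consistent across projects, I find it helps to treat them as a small preset table. This is an illustrative Python sketch; the preset names and the helper function are my own shorthand, not part of any Midjourney tooling:

```python
# Layer 4 presets: photographic style phrases per product category.
# Keys and the style_layer helper are illustrative names, not an official API.
STYLE_PRESETS = {
    "premium": "shot with a Hasselblad, 80mm lens, shallow depth of field, warm color grading",
    "lifestyle": "Canon 5D Mark IV, 35mm lens, natural color palette, editorial style",
    "food": "macro lens, extreme close-up, moisture detail, Kinfolk magazine aesthetic",
}

def style_layer(category: str) -> str:
    """Return the layer-4 phrase for a product category."""
    return STYLE_PRESETS[category]

print(style_layer("premium"))
```

The point is not automation for its own sake: writing the phrases down once forces you to decide what "premium" means for your brand and stops the wording from drifting between prompts.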
Layer 5: technical parameters
The final layer is the Midjourney parameters that control the technical output. For product photography I always use --ar 4:5 or --ar 3:4, which are the most common ratios for e-commerce and social media. I add --style raw when I want a more photographic and less artistic result. And I adjust --stylize between 50 and 150 depending on how much creative freedom I want to give the tool.
A complete real example looks like this:
amber glass bottle 500ml with natural cork cap and kraft paper label, on a raw concrete surface with dried lavender sprigs, soft directional light from the left, warm shadows falling to the right, shot with Hasselblad 80mm lens, shallow depth of field, warm muted tones, editorial product photography --ar 4:5 --style raw --stylize 100
That prompt has all five layers working together. Specific product, textured surface with context, directed lighting, defined photographic style and technical parameters. The result is an image you could place in a product catalog without anyone suspecting it was generated.
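Because the layers compose mechanically, the method can be sketched as a tiny prompt builder. This is an illustrative Python sketch assuming my own function and argument names; Midjourney has no such API, so you still paste the resulting string into the prompt box:

```python
# Minimal sketch of the layer method: join the ordered layers,
# then append the technical parameters. All names here are illustrative.

def build_prompt(product: str, surface: str, lighting: str, style: str,
                 ar: str = "4:5", stylize: int = 100, raw: bool = True) -> str:
    """Assemble a Midjourney prompt from the five layers."""
    layers = ", ".join([product, surface, lighting, style])
    params = [f"--ar {ar}"]
    if raw:
        params.append("--style raw")
    params.append(f"--stylize {stylize}")
    return f"{layers} {' '.join(params)}"

# Reproduces the worked example from the article.
prompt = build_prompt(
    product="amber glass bottle 500ml with natural cork cap and kraft paper label",
    surface="on a raw concrete surface with dried lavender sprigs",
    lighting="soft directional light from the left, warm shadows falling to the right",
    style="shot with Hasselblad 80mm lens, shallow depth of field, warm muted tones, editorial product photography",
)
print(prompt)
```

Keeping the layers as separate arguments makes the iteration process explicit: changing the surface or the light direction is a one-field edit, and the layer order is fixed by the function rather than remembered by hand.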
The mistakes I made most often
The first mistake was using too many emotional adjectives. Writing 'beautiful' or 'amazing' or 'perfect' tells Midjourney nothing. They are empty words for a machine. What works are technical and specific descriptions. Not 'beautiful light' but 'diffused window light at 45 degrees'.
The second mistake was not iterating enough. My process now is to generate four versions, identify the one with the best foundation and then vary it with specific adjustments. I change the surface. I change the light direction. I change the lens. Each change brings me closer to what I need. I never expect the first generation to be the final one.
The third mistake was ignoring negative prompts. Adding --no text, watermark, hands, people at the end removes elements that Midjourney sometimes introduces without being asked. Especially text. Midjourney tends to invent text on product labels and it is almost always illegible.
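If you build prompts programmatically, the --no list is just a suffix, so it is easy to append the same exclusions every time. A minimal Python sketch, with a hypothetical helper name of my own:

```python
# Append a Midjourney negative prompt (--no term, term, ...) to a prompt string.
# The with_negatives name is illustrative, not an official API.

def with_negatives(prompt: str, *unwanted: str) -> str:
    """Return the prompt with a --no suffix listing unwanted elements."""
    if not unwanted:
        return prompt
    return f"{prompt} --no {', '.join(unwanted)}"

p = with_negatives("amber glass bottle --ar 4:5",
                   "text", "watermark", "hands", "people")
print(p)  # amber glass bottle --ar 4:5 --no text, watermark, hands, people
```

Fixing the exclusion list in one place means the prompts you iterate on never silently lose the --no terms that keep invented label text out of the image.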
This does not replace a photographer
I need to be clear because this topic sparks debate. This technique does not replace a professional photo shoot for a high-level final campaign. What it does is eliminate sixty percent of the preliminary work. I can present visual concepts to clients before hiring a photographer. I can explore twenty creative directions in an hour. I can create content for social media and e-commerce that previously required a budget many small brands simply do not have.
As I wrote in my post about what a prompt is, every word is a design decision. In product photography that becomes literal. The difference between a generic image and an image that sells lies in the words you choose. And choosing the right words is exactly what we designers do.
Only now our photo studio fits in a single line of text.