Elevating Products in E-commerce with Generative AI Packshots

Krystian Bergmann

Updated Jun 5, 2024 • 9 min read
It’s no secret that presenting a product in a unique, attractive, and original way is both an art and a powerful sales enabler. Generative AI can help make it easier.

The way you display a product online should be so attractive that it carries a promise of satisfaction after purchase. However, it can be expensive to do that – or at least it used to be.

With generative AI, marketers now have plenty of new, innovative ways to enhance the product presentation process, particularly for generic products that often get overshadowed by more prominent items.

The challenge of manually creating packshots

A common issue in e-commerce is the significant manual effort required to create original photo sessions for generic products.

Customization is the heartbeat of the contemporary consumer experience, with many products, particularly packages, custom-made to resonate with individual customer preferences. Manually creating these custom product visuals, i.e., packshots, is time-consuming and resource-intensive.

Accelerating packshot creation with generative AI

For a use case like this, generative AI already provides a wide suite of tools to make it easier. From proprietary systems like Adobe’s Firefly to open-source image generation models and frameworks, e-commerce companies now have the opportunity to lower costs while improving the quality of packshots for their stores.

GenAI can generate captivating backgrounds for product images, ensuring each product radiates a unique aesthetic appeal. A static layer, such as a perfume bottle, a package, or a company logo, remains constant, while the GenAI seamlessly crafts a dynamic background, creating a harmonious and eye-catching product visual.

Extending this innovative solution to custom packaging, businesses can unlock unprecedented realms of creativity. Companies can offer a gallery of AI-generated product images, serving as a muse for customers, inspiring them to unleash their creativity and conceptualize extraordinary product packages.

Different AI tools that can empower e-commerce businesses

To implement this solution, a blend of several technologies can come in handy:

  • AI image enhancement tools: these optimize product photography for online marketplaces. They make product photos look more professional by improving light and color, removing backgrounds, resizing and cropping images, and reducing file size.
  • AI product photo generators: these generate professional product photos using AI. They can add backgrounds to product images, generate photos directly from an e-commerce platform (like Shopify), and blend the product image into the background.
  • AI object removal tools: these seamlessly remove unwanted objects from images, enhancing their visual appeal.
  • AI design tools: AI-powered visual editors for product photography that can generate high-quality e-commerce photoshoots in seconds.
  • 3D modeling tools: these build product photos and 3D models without requiring any 3D modeling skills, which is particularly useful for creating realistic product images without physical prototypes.
  • AI content generation tools: generative AI can create personalized product descriptions, unique product recommendations, and even personalized videos for customers.

It's important to consider your existing technology stack, including your e-commerce platform, CRM, and other systems, when choosing AI tools to integrate. Seamless integration ensures that the AI tool can work in harmony with your current stack, simplifying data synchronization and streamlining your processes.

Given the sensitive nature of customer data, it's crucial to choose AI tools that prioritize data privacy and security. Look for tools that comply with industry-standard security measures, such as encryption and secure data storage.

Behind the scenes, these technologies enable generative AI to support processes such as packshot generation:

  • Generative Adversarial Networks (GANs): GANs are a class of machine learning models that can create plausible examples of real-world data, for example generating images of real-world products without having to take pictures of them.
  • Layering techniques: advanced layering techniques allow the seamless integration of the static image layer, such as a logo or product, with the dynamically generated background, ensuring consistency and visual appeal.
  • Machine learning (ML): ML algorithms can be used to analyze consumer preferences and trends, enabling the system to generate backgrounds and images that resonate with prevailing market tastes.
  • Cloud computing: cloud technologies ensure scalability and accessibility, allowing businesses to easily manage and deploy the generated images across various platforms and channels.

Using OpenAI’s DALL-E to generate packshots

For our use case of generating packshots, we chose OpenAI’s image generation model, DALL-E.

What DALL-E does:

DALL-E generates images from textual descriptions. You describe what you want, for example "a two-story pink dollhouse," and DALL-E will generate an image that corresponds to that description. It bridges the gap between natural language processing and computer vision.
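As an illustration, here is a minimal sketch of requesting a packshot background from DALL-E via OpenAI's Python SDK. The prompt-building helper and the specific model and size parameters are our assumptions for this example, not part of a prescribed workflow:

```python
def build_packshot_prompt(product: str, mood: str) -> str:
    """Compose a text prompt describing the desired packshot background.

    Hypothetical helper: the exact prompt wording is an assumption.
    """
    return (
        f"A clean studio background for a product photo of {product}, "
        f"{mood} mood, soft lighting, no text, no people"
    )

def generate_background(prompt: str) -> str:
    """Request an image from DALL-E and return its URL.

    Requires `pip install openai` and an OPENAI_API_KEY in the environment.
    """
    from openai import OpenAI

    client = OpenAI()
    response = client.images.generate(
        model="dall-e-3",
        prompt=prompt,
        n=1,
        size="1024x1024",
    )
    return response.data[0].url

# Example usage:
# url = generate_background(
#     build_packshot_prompt("a glass perfume bottle", "warm autumn")
# )
```

Keeping prompt construction in a separate function makes it easy to generate consistent backgrounds across a whole catalog by varying only the product and mood.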

How DALL-E Works:

  • Training

DALL-E is trained with a multitude of images paired with textual descriptions. This corpus allows the model to learn associations between textual attributes and visual characteristics.

  • Textual input

A text prompt is given to DALL-E as input. The prompt describes the image that needs to be generated. For example, "a cat in a hat."

  • Image generation

DALL-E processes the text prompt and generates an image that visually represents the description. It creates a variety of images that align with the text prompt's description, offering different visual interpretations of the input.

  • Conditional generation

The model can generate images conditionally based on the input, meaning it adjusts the output based on the specificity and condition set by the textual description.

This is how it works in the background:

1. Text Prompt
The process begins by inputting a textual description or prompt. This can be anything that describes the desired image, such as "a two-headed flamingo" or "a futuristic city skyline".

2. CLIP's Text Encoder
The text prompt is fed into the text encoder, which is a part of the CLIP model. The text encoder's job is to convert the text prompt into a fixed-size vector representation called an embedding. This embedding captures the semantic meaning of the input text.

3. Text Embedding

The output of the text encoder is a text embedding. This embedding is a high-dimensional vector that represents the input text in a way that the model can work with.

4. Text + Random Image Embedding

To initiate the image generation process, the text embedding is combined with a random image embedding. The random image embedding provides variability, which means that multiple runs with the same text prompt can produce slightly different images.
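A toy sketch of this combination step using NumPy (the dimensionality and the concatenation are our assumptions for illustration; the model's actual internals differ):

```python
import numpy as np

def combine_embeddings(text_embedding: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Pair a fixed text embedding with a fresh random image embedding.

    Toy illustration: concatenation stands in for however the model
    actually conditions on both signals.
    """
    random_image_embedding = rng.standard_normal(text_embedding.shape)
    return np.concatenate([text_embedding, random_image_embedding])

# The same prompt embedding combined with different random draws yields
# different inputs, which is why repeated runs with the same text prompt
# produce slightly different images.
text_emb = np.ones(8)  # stand-in for a CLIP text embedding
first = combine_embeddings(text_emb, np.random.default_rng(0))
second = combine_embeddings(text_emb, np.random.default_rng(1))
```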

5. Prior (Decoder-Only Transformer)

This step is crucial for generating the image. The combined embedding from the previous step is fed into a decoder-only transformer. The transformer, which is similar to those used in models like GPT, generates a sequence of tokens. However, instead of generating text tokens, it generates image tokens. It progressively refines the image representation.

6. Text + Image Embedding

The output from the transformer is a combination of the text embedding and the image tokens. This combined embedding is now ready to be transformed into the actual image.


7. GLIDE Decoder

This is the final decoding stage, where the combined embedding is fed into a decoder called GLIDE. The decoder's job is to convert the high-dimensional embedding into a pixel representation, effectively creating the final image.


8. Generated Image

The output of the GLIDE decoder is the generated image that visually represents the initial text prompt.

9. Layer Composition

The last step is combining the two image layers: the artificially generated background as the first layer, and the static element (such as the product or a logo) as the second layer.
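The layering step can be sketched with Pillow's alpha compositing. The file names are placeholders, and we assume the static layer is a PNG with a transparent background:

```python
from PIL import Image

def compose_packshot(background_path: str, product_path: str, output_path: str) -> None:
    """Place a static product layer (RGBA PNG) over a generated background."""
    background = Image.open(background_path).convert("RGBA")
    product = Image.open(product_path).convert("RGBA")
    # Match sizes so the layers align; a real pipeline would position and
    # scale the product rather than stretching it to fit.
    product = product.resize(background.size)
    composed = Image.alpha_composite(background, product)
    composed.convert("RGB").save(output_path)

# Example (placeholder paths):
# compose_packshot("generated_background.png", "perfume_bottle.png", "packshot.jpg")
```

`Image.alpha_composite` respects the product layer's transparency, so only the opaque pixels of the static element cover the generated background.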


In e-commerce and retail, where competition is fierce and consumer expectations keep evolving, innovation is the key to success.

By using the potential of genAI in product presentation and custom packaging, businesses can transcend conventional barriers, automate creativity, and offer captivating visuals that resonate with the unique preferences of each customer.

This particular challenge with generating packshots came up during one of our AI Primer workshops. We run these workshops with companies to:
  • Guide beginners through AI & GenAI use cases and help them with a strategy and best-in-class solutions
  • Inspire GenAI-advanced teams with our helicopter view on challenges in their industries, and support them in prioritization

Happy to connect on LinkedIn or discuss on a call if there’s anything we can help you with.

Krystian Bergmann, AI Consulting Lead at Netguru