Using AI to Generate Image Descriptions for Conversion Rate Optimization


Patryk Szczygło

Updated Nov 16, 2023 • 9 min read
Converting images to text with the use of computer vision AI

In the digital marketplace, images can make a huge difference. How can you use AI to pick images that are most beneficial for your conversion rates?

First, you need a way to quantify the influence of an image on conversions. This was the driving question behind our R&D project aimed at unraveling how different types of product images impact the sales process.

By analyzing the correlation between specific image presentations and customer purchase decisions, we wanted to unlock new insights into consumer behavior.

Unlocking the visual formula for sales success

We wanted to use AI to generate descriptions of images. This data could then be used to adjust product listings (and eventually also landing pages, ads, etc.) and to run A/B tests to see what works best. It opens up a lot of opportunities for fans of conversion optimization:

  • Understand image impact: determine how different images of the same product influence customer decisions during the sales process.
  • Track image-purchase link: track which specific type of photo was shown to the customer and whether it led to a product purchase.
  • Extend research to ads and social media: see if these insights could also be applied to advertisements and promotional content on social media platforms.
  • Use analytics for insight: use an analytics tool that captures and analyzes the data regarding what image was presented, and whether it led to a purchase decision.
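
As a minimal sketch of the image-purchase link described above, the snippet below records which image variant a visitor saw and whether a purchase followed. The class names, fields, and IDs are hypothetical and stand in for whatever analytics tool you actually use.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ImageImpression:
    """One product view, with the image variant that was shown."""
    session_id: str   # hypothetical visitor/session identifier
    product_id: str
    image_id: str     # which photo of the product was displayed
    purchased: bool = False


class ImpressionLog:
    """In-memory stand-in for an analytics store (an assumption, not a real tool)."""

    def __init__(self) -> None:
        self.events: List[ImageImpression] = []

    def record_view(self, session_id: str, product_id: str, image_id: str) -> None:
        self.events.append(ImageImpression(session_id, product_id, image_id))

    def record_purchase(self, session_id: str, product_id: str) -> None:
        # Mark every impression of this product in this session as converted.
        for event in self.events:
            if event.session_id == session_id and event.product_id == product_id:
                event.purchased = True


# Made-up example: two visitors see different photos of the same product.
log = ImpressionLog()
log.record_view("s1", "mug-01", "mug-01-lifestyle")
log.record_view("s2", "mug-01", "mug-01-packshot")
log.record_purchase("s1", "mug-01")
```

Keeping the image ID on every event is the important part: it is what later lets you join purchases back to image descriptions.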

Benefits of AI photo conversion & testing solution

Understanding the impact of visuals on consumer behavior can be a powerful advantage:

  • Deep dive into customer behavior: by analyzing how customers respond to different product images, you gain critical insights into consumer preferences. This lets you fine-tune marketing campaigns, using the most effective images to drive higher conversion rates.
  • Tailored image strategy for e-commerce: different images serve different purposes. Whether it’s lifestyle imagery that illustrates product use or detailed shots highlighting product features, it’s useful to understand which types resonate best with your audience.
  • Data-driven image effectiveness analysis: for retailers, understanding which images boost sales for specific products is key. This solution facilitates it through targeted tests, analyzing conversion rates across various image types.
  • Enhanced search functionality and SEO: for e-commerce platforms with search features, this solution can improve the accuracy of search results through image-to-text description tagging. It can also increase the likelihood of customers finding (and purchasing) what they need. Including this metadata can positively influence organic search rankings.
  • Optimized website and landing page imagery: knowing which images best communicate a page's content and persuade visitors to take desired actions lets you choose visuals deliberately. This solution helps select visuals that not only align with your messaging but also increase visitor-to-lead conversion rates.

Overall, using AI to convert images into text is a big step towards a more effective, data-driven approach to visual marketing.
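
When comparing conversion rates across image types, it helps to check whether a difference is larger than random noise. The sketch below uses a standard two-proportion z-test. This is our choice for illustration, not necessarily the method used in the project, and the function name is hypothetical.

```python
import math


def two_proportion_z_test(conv_a: int, views_a: int, conv_b: int, views_b: int):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    rate_a = conv_a / views_a
    rate_b = conv_b / views_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (views_a + views_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    if std_err == 0:
        raise ValueError("no variation in the data; the test is undefined")
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Illustrative counts only: image A vs. image B of the same product.
z, p = two_proportion_z_test(conv_a=120, views_a=2400, conv_b=156, views_b=2400)
```

A small p-value (0.05 is a common threshold) suggests the two images really do convert differently. With low traffic, expect to wait longer before the test can tell them apart.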


Exploring tools to convert images into text

The tools we’ve explored include the Google Vision API, Azure AI Custom Vision, and a custom solution.

Google Vision API advantages:

  • Google Vision API is part of Google Cloud and gives developers pre-trained models for recognizing printed and handwritten text, detecting faces and their emotional expressions, and identifying logos, landmarks, and other entities in images.
  • It provides automatic features like image labeling, face and landmark detection, optical character recognition (OCR), and more.
  • It’s known for its strong OCR capability, which can accurately extract text in many languages from images.
  • It allows developers to build metadata on their image catalog, moderate offensive content, and enable new marketing scenarios through image sentiment analysis.
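
To make the image labeling feature concrete, here is a minimal sketch using the google-cloud-vision Python client. It assumes the package is installed and Google Cloud credentials are configured. The helper names and the 0.7 score threshold are our own illustrative choices.

```python
from typing import Iterable, List, Tuple


def labels_to_tags(labels: Iterable, min_score: float = 0.7) -> List[Tuple[str, float]]:
    """Keep confident labels as (lowercased description, score) pairs."""
    return [
        (label.description.lower(), round(label.score, 3))
        for label in labels
        if label.score >= min_score
    ]


def tag_image(path: str, min_score: float = 0.7) -> List[Tuple[str, float]]:
    """Send one local image file to the Vision API and return its tags."""
    # Imported lazily so labels_to_tags works without the client installed.
    from google.cloud import vision

    client = vision.ImageAnnotatorClient()
    with open(path, "rb") as image_file:
        image = vision.Image(content=image_file.read())
    response = client.label_detection(image=image)
    if response.error.message:
        raise RuntimeError(response.error.message)
    return labels_to_tags(response.label_annotations, min_score)
```

Calling tag_image("product.jpg") (a hypothetical file) returns a list of tags you can store next to the image ID. Each call is billed, so caching results per image is usually worthwhile.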

Azure AI Custom Vision advantages:

  • Azure AI Custom Vision is a part of Microsoft Azure and allows developers to build, deploy, and improve their own image classifiers.
  • It provides a user-friendly interface to train your own classifier with your own images.
  • It offers image classification and object detection models, and lets you export models trained on compact domains to run offline.
  • It supports iterative retraining, so the model can improve over time as you provide it with more labeled data.

Custom solution advantages:

  • A custom solution is designed and developed from scratch according to the specific requirements of a project, based on either an open-source model, or a model that you build and train yourself.
  • It offers the most opportunity for customization, but requires substantial time, resources, and expertise in machine learning and image processing.
  • A custom solution offers a high degree of flexibility and can be tailored to unique use cases that may not be covered by pre-trained models.
  • It requires ongoing maintenance and updating, which can be resource-intensive, but the upside is you have a solution that’s precisely tuned to your needs.

How to boost conversions with AI photo conversion

The necessary steps to achieve this are:

  1. Collect data: collect data on how users interact with the images on your website. This can include click-through rates, conversion rates, time spent viewing, and other relevant metrics.
  2. Tag images: use image recognition technology to tag images on your website, using an API like Google Vision API or Azure AI Custom Vision to identify objects, colors, and other elements in the images.
  3. Analyze performance: analyze the performance of different types of images. You might find that images with a certain color scheme or a particular perspective of a product perform better than others.
  4. Run A/B tests: test different images to see which ones perform better. This could involve changing the color scheme, the product displayed, or other elements of the images.
  5. Apply machine learning: use machine learning algorithms to predict which images will perform best. This can involve training a model on your performance data and image tags, and then using this model to predict the performance of new images.
  6. Iterate and improve: use the insights gained from your analysis and testing to continuously improve your images, whether through small tweaks or a complete overhaul of your image strategy.
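
Steps 1 to 3 can be sketched in a few lines: join the tags of each image with the impressions collected for it, then compute a conversion rate per tag. The function name and all data below are made up for illustration and are not results from our project.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def conversion_rate_by_tag(
    image_tags: Dict[str, List[str]],
    impressions: Iterable[Tuple[str, bool]],
) -> Dict[str, float]:
    """Aggregate purchases / views for every tag attached to a shown image.

    image_tags:  image_id -> tags produced by the image-recognition step
    impressions: (image_id, purchased) pairs from your analytics data
    """
    views = defaultdict(int)
    purchases = defaultdict(int)
    for image_id, purchased in impressions:
        for tag in image_tags.get(image_id, []):
            views[tag] += 1
            purchases[tag] += int(purchased)
    return {tag: purchases[tag] / views[tag] for tag in views}


# Made-up illustration data.
image_tags = {
    "mug-lifestyle": ["mug", "kitchen", "person"],
    "mug-packshot": ["mug", "white background"],
}
impressions = [
    ("mug-lifestyle", True),
    ("mug-lifestyle", False),
    ("mug-packshot", False),
    ("mug-packshot", False),
]
rates = conversion_rate_by_tag(image_tags, impressions)
# Tags sorted from best to worst conversion rate.
ranked = sorted(rates.items(), key=lambda item: item[1], reverse=True)
```

A table like this is also a natural input for the machine learning step: the tags become features and the conversion outcome becomes the target.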

Tagging images with AI - implementation details

The fastest way is to use the Google Vision API, which provides good results right out of the box. You can test it right here.

If you’re going for a custom solution, you have a lot of different open-source models to choose from. For example, a Vision Transformer (ViT) image classification model can give you more detailed results than Google’s API, but it takes more coding to get it running and optimized.
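
As a rough sketch of the open-source route, the snippet below runs a pre-trained ViT checkpoint through the Hugging Face transformers pipeline. The checkpoint google/vit-base-patch16-224 is an assumption standing in for whichever model you pick, and the helper names and score threshold are hypothetical.

```python
from typing import Dict, List


def keep_confident(predictions: List[Dict], min_score: float = 0.2) -> List[str]:
    """Return labels whose score clears the threshold, best first."""
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [p["label"] for p in ranked if p["score"] >= min_score]


def classify_image(path: str, top_k: int = 5) -> List[str]:
    """Classify a local image with a pre-trained ViT checkpoint."""
    # Lazy import: transformers (plus torch and Pillow) is a heavy dependency.
    from transformers import pipeline

    # Assumption: a public ImageNet checkpoint, used here only as an example.
    classifier = pipeline("image-classification", model="google/vit-base-patch16-224")
    # The pipeline returns a list of {"label": ..., "score": ...} dicts.
    return keep_confident(classifier(path, top_k=top_k))
```

The model runs on your own hardware, so no image leaves your infrastructure. For product-specific categories you would likely fine-tune the model on your own labeled images.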


While the Google Vision API offers a wide array of features and capabilities, a custom solution can provide better results in certain scenarios because of these advantages:

  1. Specificity: custom solutions can be designed to focus on very specific tasks that general APIs like Google Vision API might not cater to.
  2. Flexibility: with a custom solution, you have the flexibility to tweak, modify, and optimize your algorithms over time, making it more adaptable to changing needs and conditions.
  3. Data privacy: by using a custom solution, you have more control over your data. With APIs, there can be a concern about sensitive data being sent to a third party.
  4. Integration: a custom solution can be designed to seamlessly integrate with other systems in your infrastructure, with which off-the-shelf solutions may not be compatible.
  5. Performance: a well-designed custom solution, trained on your own product images, might outperform Google Vision API on the specific task it was built for.
  6. Cost: it’s a higher upfront investment, but it can be more cost-effective in the long run, especially under heavy use, because the costs of a proprietary API can add up quickly.
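
The cost trade-off comes down to simple break-even arithmetic. The sketch below is illustrative only: the function name is hypothetical and every number is a placeholder, not real pricing for any API or hosting setup.

```python
import math


def break_even_batches(
    upfront_cost: float,
    api_cost_per_1000: float,
    custom_cost_per_1000: float,
) -> int:
    """Number of 1,000-image batches after which the custom solution is cheaper."""
    saving_per_batch = api_cost_per_1000 - custom_cost_per_1000
    if saving_per_batch <= 0:
        raise ValueError("custom solution never breaks even at these rates")
    return math.ceil(upfront_cost / saving_per_batch)


# Placeholder figures: 20,000 upfront, 1.5 per 1,000 images via an API,
# 0.5 per 1,000 images on your own infrastructure.
batches = break_even_batches(20000, 1.5, 0.5)
```

Plug in your own development cost, hosting cost, and the API's current price list. If the break-even volume is far above your expected traffic, the off-the-shelf API is likely the cheaper choice.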

Next steps

Generating photo descriptions with AI opens up new avenues of optimization for e-commerce product pages, as well as any other area where images play a crucial role. Many tools, both proprietary and open-source, enable this capability at a relatively low cost.

Once you have this capability, you can use this data to optimize your images, perform A/B tests, and ultimately boost your conversion rates across different channels, from your site all the way to social media ads.

Follow us on LinkedIn to stay updated on new R&D projects like this one.
