Speeding Up Merck’s Process from 6 Months to 6 Hours with an AI R&D Assistant

About this project

Merck cut chemical identification time from 6 months to 6 hours using an AI R&D Assistant, showcasing a groundbreaking efficiency boost in their research process.


Rapid prototyping
Research and development



Merck, a leading company in the life science industry, slashed their chemical identification time from 6 months to 6 hours by switching to an AI R&D Assistant.


Merck, known as the Merck Group, is a multinational science and technology company based in Darmstadt, Germany. Operating in 66 countries with approximately 57,000 employees, Merck's main business focus lies in healthcare, life science, and electronics.

As the world's oldest operating chemical and pharmaceutical company, Merck has a significant presence in Europe, Africa, Asia, Oceania, and Latin America.


Merck wanted to reduce the manual work of domain experts whose responsibility was identifying key chemical compounds from scientific literature for future sales.

We decided to build an AI R&D Assistant to solve the problem, and we had only 5 weeks to deliver a proof of concept (POC).

With a team of data engineers and engineering leads, we got straight to work.

Our scope was to:

  • Analyze and identify all the chemicals mentioned in articles and their role within the system
  • Assign official names and unique codes to all identified molecules for clarity
  • Utilize large chemical databases to retrieve additional chemical properties
  • Identify which chemicals are already sold by Merck by checking the whole Catalog DB

Project Journey

To get started, we built an interface where domain experts can upload PDF files of scientific literature. The AI then processes each file to extract the chemical compounds it mentions and retrieves information about each one from a chemical database.

We used LangChain (a framework that helps developers build LLM‑powered apps) as the main extractor, making requests to Azure OpenAI endpoints.

From there, we were able to retrieve information such as the InChIKey, SMILES code, molecular formula, and synonyms, as well as CAS numbers extracted from the synonyms.
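As an illustration of the last point, CAS numbers can be picked out of a synonym list by checking each entry against the CAS Registry Number format and its check digit. The following is a minimal sketch, not the project's actual code: the function names are hypothetical, and it assumes each CAS number appears as a standalone synonym string.

```python
import re

# A CAS Registry Number has 2-7 digits, then 2 digits, then 1 check digit,
# separated by hyphens (for example "7732-18-5" for water).
CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_valid_cas(candidate: str) -> bool:
    """Return True if the string is a well-formed CAS number with a correct check digit."""
    match = CAS_PATTERN.match(candidate.strip())
    if not match:
        return False
    body = match.group(1) + match.group(2)
    check_digit = int(match.group(3))
    # Weight the body digits 1, 2, 3, ... starting from the rightmost digit.
    total = sum(int(digit) * weight
                for weight, digit in enumerate(reversed(body), start=1))
    return total % 10 == check_digit


def cas_numbers_from_synonyms(synonyms: list[str]) -> list[str]:
    """Keep only synonyms that are valid CAS numbers, preserving order, without duplicates."""
    found: list[str] = []
    for synonym in synonyms:
        candidate = synonym.strip()
        if is_valid_cas(candidate) and candidate not in found:
            found.append(candidate)
    return found
```

The check digit step filters out strings that merely look like CAS numbers, such as catalog codes that happen to share the same hyphenated layout.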

The final step was to check the Merck Catalog DB to see whether Merck already sells the chemical, and then to display the results in the interface, which shows all the retrieved information along with a 2D image of the chemical.
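One way such a catalog check could work is to index the catalog by a structure identifier and look each extracted chemical up in that index. The sketch below assumes the match is made on the InChIKey; the `CatalogEntry` fields, function names, and product IDs are hypothetical and do not describe the real Catalog DB schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    # Hypothetical fields; the real catalog schema is not described in this case study.
    product_id: str
    name: str
    inchikey: str


def build_inchikey_index(entries: list[CatalogEntry]) -> dict[str, list[CatalogEntry]]:
    """Group catalog entries by normalized InChIKey for constant-time lookups."""
    index: dict[str, list[CatalogEntry]] = {}
    for entry in entries:
        index.setdefault(entry.inchikey.strip().upper(), []).append(entry)
    return index


def find_in_catalog(index: dict[str, list[CatalogEntry]], inchikey: str) -> list[CatalogEntry]:
    """Return every catalog entry that shares the given InChIKey (empty list if none)."""
    return index.get(inchikey.strip().upper(), [])


# Usage with made-up product IDs and the InChIKeys of water and ethanol.
catalog = [
    CatalogEntry("HYPO-001", "Water", "XLYOFNOQVPJJNP-UHFFFAOYSA-N"),
    CatalogEntry("HYPO-002", "Ethanol", "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"),
]
index = build_inchikey_index(catalog)
already_sold = bool(find_in_catalog(index, "xlyofnoqvpjjnp-uhfffaoysa-n"))
```

Building the index once and reusing it keeps each lookup cheap, which matters when a single article mentions many compounds.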


Challenges

  • The input data was too large for the context limits of popular LLMs like ChatGPT, so we split the input into chunks
  • The internal Catalog DB did not contain all chemical details, so we had to process the whole catalog and enrich each chemical with its InChIKey, CAS number, and other details


Results

  • The POC was ready within the 5-week deadline and under budget
  • We developed a unique solution for the client based on their specific needs
  • We tracked and documented project progress regularly
  • We hosted the POC on Merck’s secure AWS infrastructure and used their own GPT service
  • Done manually, the whole process takes 6 months; the AI R&D Assistant completed the task in ~6 hours
"If I had to choose two words to summarize our collaboration with Netguru, it would be speed and efficacy. They were able to provide us with a POC within the 5-week deadline, and now, with the AI R&D Assistant, we can complete our task in a fraction of the time."

Mark Greiner
Digital Innovation Manager at Merck KGaA Darmstadt

We're Netguru

At Netguru we specialize in designing, building, shipping and scaling beautiful, usable products with blazing-fast efficiency.