Speeding Up Merck’s Process from 6 Months to 6 Hours with an AI R&D Assistant
Merck cut chemical identification time from 6 months to 6 hours using an AI R&D Assistant, showcasing a groundbreaking efficiency boost in their research process.
Client
Merck, also known as the Merck Group, is a multinational science and technology company based in Darmstadt, Germany. Operating in 66 countries with approximately 57,000 employees, Merck focuses on healthcare, life science, and electronics.
As the world's oldest operating chemical and pharmaceutical company, Merck has a significant presence in Europe, Africa, Asia, Oceania, and Latin America.
Project
Merck wanted to reduce the manual work of its domain experts, who were responsible for identifying key chemical compounds in scientific literature as candidates for future sales.
We decided to implement an AI R&D Assistant to solve the problem, and we had only 5 weeks to deliver a proof of concept (POC).
With a team of data engineers and engineering leads, we got straight to work.
Our scope was to:
- Analyze and identify all the chemicals mentioned in articles and their role within the system
- Assign official names and unique codes to all identified molecules for clarity
- Utilize large chemical databases to retrieve additional chemical properties
- Identify which chemicals are already sold by Merck by checking the whole Catalog DB
Solution
To get started, we built an interface where the domain expert can upload PDF files of scientific literature. Each file is then processed by AI to extract the chemical compounds it mentions and to retrieve information about each one from a chemical database.
We used LangChain (a framework that helps developers build LLM‑powered apps) as the main extractor, making requests to Azure OpenAI endpoints.
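The extraction step can be illustrated with a minimal sketch. It is not the project's actual code: the prompt wording, the JSON reply format, and the names `PROMPT_TEMPLATE`, `extract_chemicals`, and `complete` are assumptions made for illustration. The `complete` callable stands in for whatever LangChain chain or Azure OpenAI client sends the prompt and returns the model's text reply.

```python
import json
from typing import Callable

# Hypothetical prompt; the wording used in the real project is not known.
PROMPT_TEMPLATE = (
    "List every chemical compound mentioned in the text below. "
    'Answer with a JSON array of objects with the keys "name" and "role".'
    "\n\nText:\n{text}"
)


def extract_chemicals(text: str, complete: Callable[[str], str]) -> list[dict]:
    """Ask the model for the chemicals in one piece of text and parse its reply.

    `complete` is a placeholder for the LLM call (prompt in, reply text out).
    """
    reply = complete(PROMPT_TEMPLATE.format(text=text))
    try:
        items = json.loads(reply)
    except json.JSONDecodeError:
        return []  # malformed reply; a real pipeline might retry instead
    if not isinstance(items, list):
        return []
    chemicals = []
    for item in items:
        # Keep only well-formed entries that actually carry a name.
        if isinstance(item, dict) and item.get("name"):
            chemicals.append(
                {
                    "name": str(item["name"]).strip(),
                    "role": str(item.get("role", "")).strip(),
                }
            )
    return chemicals
```

Keeping the model call behind a plain callable makes the parsing logic easy to test with a stub before wiring in a real endpoint.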
From the chemical database, we retrieved information such as the InChIKey, SMILES code, molecular formula, and synonyms, as well as CAS numbers extracted from the synonyms.
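As a rough illustration of pulling CAS numbers out of a synonym list, the sketch below uses a regular expression plus the standard CAS check-digit rule to discard look-alike strings. The function names are hypothetical, and the checksum validation is an added safeguard rather than something the project is known to have used.

```python
import re

# CAS Registry Number layout: 2-7 digits, 2 digits, 1 check digit.
CAS_PATTERN = re.compile(r"\b(\d{2,7})-(\d{2})-(\d)\b")


def is_valid_cas(candidate: str) -> bool:
    """Return True if the string is a CAS number with a correct check digit."""
    match = CAS_PATTERN.fullmatch(candidate)
    if match is None:
        return False
    digits = match.group(1) + match.group(2)
    # Weight the digits 1, 2, 3, ... from right to left; the check digit
    # is the weighted sum modulo 10.
    total = sum(
        int(digit) * weight
        for weight, digit in enumerate(reversed(digits), start=1)
    )
    return total % 10 == int(match.group(3))


def cas_numbers_from_synonyms(synonyms: list[str]) -> list[str]:
    """Collect the distinct, valid CAS numbers found in a list of synonyms."""
    found = []
    for synonym in synonyms:
        for match in CAS_PATTERN.finditer(synonym):
            cas = match.group(0)
            if is_valid_cas(cas) and cas not in found:
                found.append(cas)
    return found
```

For example, a synonym list containing "Water" and "CAS-7732-18-5" yields the single CAS number 7732-18-5, while a string such as "1234-56-7" is rejected because its check digit does not match.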
The final step was to check the Merck Catalog DB to see whether Merck already sells the chemical. The interface then displays all of the retrieved information, along with a 2D image representation of the chemical.
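The catalog check can be pictured as a lookup keyed on a structure identifier. The sketch below assumes the catalog rows are available as dictionaries with an `inchikey` field; the field names, product IDs, and function names are hypothetical and do not reflect the real Catalog DB schema.

```python
from typing import Optional


def build_index(catalog_rows: list[dict]) -> dict[str, dict]:
    """Index catalog rows by upper-cased InChIKey; rows without one are skipped."""
    index = {}
    for row in catalog_rows:
        key = (row.get("inchikey") or "").strip().upper()
        if key:
            index[key] = row
    return index


def find_in_catalog(inchikey: str, index: dict[str, dict]) -> Optional[dict]:
    """Return the catalog row for this InChIKey, or None if it is not sold."""
    return index.get(inchikey.strip().upper())


# Illustrative rows only; the product IDs are made up.
catalog = [
    {"product_id": "P-001", "name": "Water", "inchikey": "XLYOFNOQVPJJNP-UHFFFAOYSA-N"},
    {"product_id": "P-002", "name": "Unlabelled item", "inchikey": None},
]
index = build_index(catalog)
water = find_in_catalog("xlyofnoqvpjjnp-uhfffaoysa-n", index)    # matches P-001
ethanol = find_in_catalog("LFQSCWFLJHTTHZ-UHFFFAOYSA-N", index)  # None: not in catalog
```

Matching on an identifier such as the InChIKey rather than on the product name avoids misses caused by synonyms, which is why the catalog needs that identifier for every entry.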
Challenges
- The input data was massive and exceeded the input size limits of popular LLMs like ChatGPT; we solved this by splitting the input into chunks
- The internal Catalog DB did not contain all chemical details, so we had to process the whole catalog and enrich each chemical with its InChIKey, CAS number, and other identifiers
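The chunking workaround for oversized inputs can be sketched as follows. This is a simplified, character-based version; the chunk size, the overlap between neighbouring chunks, and the function name are assumptions, and a production pipeline would more likely count tokens than characters.

```python
def split_into_chunks(text: str, max_chars: int = 8000, overlap: int = 200) -> list[str]:
    """Split text into pieces of at most max_chars characters.

    Consecutive chunks share `overlap` characters so that a chemical name
    cut at a chunk boundary still appears whole in the next chunk.
    """
    if max_chars <= 0 or not 0 <= overlap < max_chars:
        raise ValueError("need max_chars > 0 and 0 <= overlap < max_chars")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break  # the last chunk reached the end of the text
        start += max_chars - overlap
    return chunks
```

Each chunk can then be sent to the model separately and the extracted chemicals merged afterwards, with duplicates removed.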
Results
- The POC was ready within the 5-week deadline and under budget
- We developed a unique solution for the client based on their specific needs
- We conducted regular tracking and documentation of project progress
- We hosted the POC on Merck’s secure AWS infrastructure and used their own GPT service
- Done manually, the whole process takes 6 months; the AI R&D Assistant was able to complete the task in ~6 hours