IDEA Foundation
Retail · E-commerce · United Kingdom · Azure

200,000 SKUs into an 8-level taxonomy — at 95%+ accuracy.

An AI-powered automated category mapping system for mcgrocer.com, combining LLM semantic understanding with vector similarity — taking catalog work from months to hours and enabling real-time vendor inventory ingestion.

About the organisation

mcgrocer.com is a fast-growing UK-based online grocery aggregator partnering with multiple supermarkets, wholesalers, and specialty retailers. Their platform hosts hundreds of thousands of products sourced from various vendors, each using different naming conventions and category structures. As the business scaled, maintaining a clean, consistent, searchable product catalog became critical to the customer experience.

Why

With thousands of new products being added from multiple UK retailers, mcgrocer.com needed a way to keep its catalog accurate, structured, and up to date. Inconsistent source categories and manual intervention were causing delays, classification errors, and heavy operational overhead. As the catalog scaled toward 200,000+ SKUs, product discovery suffered and internal teams struggled to maintain taxonomy quality.

What

An AI-powered Automated Category Mapping System was built to analyse product data and map each item into mcgrocer.com's redesigned 8-level taxonomy. The system classifies every SKU accurately and consistently, with human review reserved for low-confidence items, enabling real-time catalog updates across all vendor partners.

How

The platform uses a hybrid of LLM-based semantic understanding and vector similarity scoring to interpret product titles, descriptions, and attributes. It automatically determines the most accurate placement within the multi-level taxonomy and validates results through confidence scoring and rule-based checks. Batch automation, continuous learning, and Azure-native deployment allow the system to process large volumes in near real time as vendors push fresh inventory.

The starting state
  • Brute-force scraping collected product data from partner stores in inconsistent shapes.
  • Products frequently appeared in wrong or irrelevant categories.
  • New items required manual review and reassignment.
  • Taxonomy alignment was inconsistent across vendors.
  • Manual teams spent hours mapping SKUs one at a time.
  • Catalog updates were slow, error-prone, and impacted search and navigation.

Components

What was built and how it fits together.

AI-powered category mapping engine

Autonomously analyses titles, descriptions, ingredients, brand names, and attributes to identify the correct taxonomy path.

8-level taxonomy alignment

Maps each product across all levels — from broad department to the deepest sub-category — ensuring consistent classification across the platform.
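To make the idea of a taxonomy path concrete, here is a minimal sketch of how one could be represented and checked. The class name `TaxonomyPath`, the example category names, and the rule that a path holds between one and eight non-empty levels are illustrative assumptions, not details of the production system.

```python
from dataclasses import dataclass
from typing import Tuple

MAX_DEPTH = 8  # the case study describes an 8-level taxonomy


@dataclass(frozen=True)
class TaxonomyPath:
    """Hypothetical container for one product's category path."""

    levels: Tuple[str, ...]  # ordered from broad department to deepest sub-category

    def __post_init__(self) -> None:
        # Assumption: a path has at least one level and never more than MAX_DEPTH.
        if not 1 <= len(self.levels) <= MAX_DEPTH:
            raise ValueError(f"expected 1 to {MAX_DEPTH} levels, got {len(self.levels)}")
        # Reject blank level names so every level is a usable label.
        if any(not name.strip() for name in self.levels):
            raise ValueError("level names must be non-empty")

    def prefix(self, depth: int) -> str:
        """Return the path truncated to `depth` levels, joined for display."""
        return " > ".join(self.levels[:depth])


path = TaxonomyPath(("Food & Drink", "Dairy", "Cheese", "Cheddar"))
print(path.prefix(2))
```

Storing the path as an ordered tuple keeps each level addressable, so accuracy can be measured per level as well as for the full path.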

LLM + vector similarity hybrid

Combines semantic understanding with similarity scoring for precise, context-aware results.
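One way such a hybrid could work is to blend an LLM-assigned relevance score with the cosine similarity between a product embedding and each candidate category embedding. The sketch below illustrates that idea only; the function names, the equal weighting, and the assumption that both scores arrive as plain numbers are hypothetical and not taken from the production system.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_candidates(
    product_vec: Sequence[float],
    category_vecs: Dict[str, Sequence[float]],
    llm_scores: Dict[str, float],
    llm_weight: float = 0.5,  # assumed weighting, would need tuning in practice
) -> List[Tuple[str, float]]:
    """Rank candidate categories by a weighted blend of LLM score and similarity."""
    ranked = []
    for path, vec in category_vecs.items():
        similarity = cosine_similarity(product_vec, vec)
        llm = llm_scores.get(path, 0.0)  # categories the LLM did not score get 0
        ranked.append((path, llm_weight * llm + (1 - llm_weight) * similarity))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Toy two-dimensional embeddings, purely for illustration.
candidates = {"Dairy > Cheese": [1.0, 0.0], "Bakery > Bread": [0.0, 1.0]}
scores = {"Dairy > Cheese": 0.9, "Bakery > Bread": 0.2}
print(rank_candidates([1.0, 0.0], candidates, scores)[0][0])
```

Blending the two signals lets similarity narrow the candidate set cheaply while the LLM score contributes context that embeddings alone may miss.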

Automated batch processing

New products are processed in near real time as they arrive from vendors, eliminating delays and manual sorting.

Validation logic & confidence scoring

Built-in rules ensure high accuracy, flagging only uncertain items for human review.
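The combination of rule-based checks and confidence scoring can be pictured as a small routing step. This is a sketch under stated assumptions: the function name `route_prediction`, the 0.85 threshold, and the single rule (the predicted path must exist in the taxonomy) are all hypothetical stand-ins for whatever rules the real system applies.

```python
from typing import Set


def route_prediction(
    predicted_path: str,
    confidence: float,
    valid_paths: Set[str],
    threshold: float = 0.85,  # assumed cut-off, not a figure from the case study
) -> str:
    """Return "auto" to accept a prediction or "review" to flag it for a human."""
    # Rule-based check: a path that is not in the taxonomy is never auto-accepted.
    if predicted_path not in valid_paths:
        return "review"
    # Confidence check: only predictions at or above the threshold skip review.
    return "auto" if confidence >= threshold else "review"


taxonomy = {"Dairy > Cheese", "Bakery > Bread"}
print(route_prediction("Dairy > Cheese", 0.93, taxonomy))  # accepted automatically
print(route_prediction("Dairy > Cheese", 0.60, taxonomy))  # sent to human review
```

Raising the threshold trades a larger review queue for fewer misclassified products reaching the live catalog.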

Continuous learning

The engine improves accuracy over time by learning from corrections and historical patterns.

Azure-deployed architecture

All processing, classification, and scaling workloads run on Azure for reliability and seamless scalability.

Outcomes in production

The operational result, measured against the starting state.

  • 95%+ classification accuracy across all category levels.
  • Hours instead of months to classify large batches of products.
  • Near real-time updates as new items arrive from partner stores.
  • Consistent taxonomy across 200,000+ SKUs.
  • Improved product discovery and navigation for shoppers.
  • Reduced operational workload for the manual catalog team.
  • Future-ready scalability as new vendors and categories are added.
