200,000 SKUs mapped into an 8-level taxonomy at 95%+ accuracy.
An AI-powered automated category mapping system for mcgrocer.com that combines LLM semantic understanding with vector similarity, cutting catalog classification from months to hours and enabling near real-time ingestion of vendor inventory.
mcgrocer.com is a fast-growing UK-based online grocery aggregator partnering with multiple supermarkets, wholesalers, and specialty retailers. Their platform hosts hundreds of thousands of products sourced from various vendors, each using different naming conventions and category structures. As the business scaled, maintaining a clean, consistent, searchable product catalog became critical to the customer experience.
With thousands of new products being added from multiple UK retailers, mcgrocer.com needed a way to keep its catalog accurate, structured, and up to date. Inconsistent source categories and manual intervention were causing delays, classification errors, and heavy operational overhead. As the catalog scaled toward 200,000+ SKUs, product discovery suffered and internal teams struggled to maintain taxonomy quality.
An AI-powered Automated Category Mapping System was built to analyse product data and map each item into mcgrocer.com's redesigned 8-level taxonomy. The system classifies every SKU consistently and sends only low-confidence items to human review, which enables near real-time catalog updates across all vendor partners.
The platform uses a hybrid of LLM-based semantic understanding and vector similarity scoring to interpret product titles, descriptions, and attributes. It determines the best-matching placement within the multi-level taxonomy and validates each result through confidence scoring and rule-based checks. Batch automation, continuous learning, and Azure-native deployment allow the system to process large volumes in near real time as vendors push fresh inventory.
- ▸Product data was scraped in bulk from partner stores and arrived in inconsistent formats.
- ▸Products frequently appeared in wrong or irrelevant categories.
- ▸New items required manual review and reassignment.
- ▸Taxonomy alignment was inconsistent across vendors.
- ▸Catalog staff spent hours manually mapping SKUs one at a time.
- ▸Catalog updates were slow, error-prone, and impacted search and navigation.
What was built and how it fits together.
AI-powered category mapping engine
Autonomously analyses titles, descriptions, ingredients, brand names, and attributes to identify the correct taxonomy path.
8-level taxonomy alignment
Maps each product across all eight levels, from broad department to the deepest sub-category, so that classification stays consistent across the platform.
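To make the idea of a taxonomy path concrete, here is a minimal sketch of how a path could be checked against a category tree. The tree fragment, the category names, and the `is_valid_path` helper are all hypothetical and are not taken from mcgrocer.com's actual taxonomy or codebase.

```python
MAX_DEPTH = 8  # the taxonomy described here has eight levels

# Hypothetical fragment of a taxonomy tree; the category names are illustrative only.
TAXONOMY = {
    "Food Cupboard": {
        "Tea, Coffee & Hot Drinks": {
            "Tea": {
                "Black Tea": {},
            },
        },
    },
}


def is_valid_path(path, tree=TAXONOMY, max_depth=MAX_DEPTH):
    """Return True if every level in `path` exists under the previous one."""
    if not path or len(path) > max_depth:
        return False
    node = tree
    for level in path:
        if level not in node:
            return False  # this level is not a child of the previous level
        node = node[level]
    return True
```

A check like this would reject a predicted path that skips a level or names a category that does not exist under its parent, before the prediction reaches the catalog.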
LLM + vector similarity hybrid
Combines semantic understanding with similarity scoring for precise, context-aware results.
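The case study does not say how the two components are combined. One common arrangement, sketched below as an assumption, is to shortlist candidate categories by cosine similarity between embeddings and then let an LLM pick from that shortlist. The `choose` callable stands in for the LLM call, and the vectors are assumed to come from some embedding model; every name here is hypothetical.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def shortlist(product_vec: Vector, category_vecs: dict, k: int = 3) -> list:
    """Return the k categories most similar to the product, best first."""
    scored = [(name, cosine(product_vec, vec)) for name, vec in category_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def classify(title: str, product_vec: Vector, category_vecs: dict,
             choose: Callable[[str, list], str], k: int = 3) -> tuple:
    """Shortlist by similarity, then let `choose` (an LLM stand-in) pick one."""
    candidates = shortlist(product_vec, category_vecs, k)
    if not candidates:
        raise ValueError("no categories to choose from")
    names = [name for name, _ in candidates]
    picked = choose(title, names)
    if picked not in names:
        picked = names[0]  # fall back to the top similarity match
    return picked, dict(candidates)[picked]
```

Restricting the LLM to a shortlist keeps its answer inside the known taxonomy, and the similarity score of the chosen category can feed a downstream confidence check.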
Automated batch processing
New products are processed in batches as they arrive from vendors, in near real time, removing the delays of manual sorting.
Validation logic & confidence scoring
Rule-based checks and confidence scores validate each classification, and only low-confidence items are flagged for human review.
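As an illustration of this kind of routing, the sketch below sends a prediction to human review when it fails a simple rule (here, a path that does not cover all eight levels) or falls below a confidence threshold. The `Prediction` type, the `route` function, and the 0.85 threshold are hypothetical; the case study does not state the actual rules or cut-off.

```python
from dataclasses import dataclass

REVIEW_THRESHOLD = 0.85  # hypothetical cut-off, not a figure from the case study
EXPECTED_DEPTH = 8       # the taxonomy has eight levels


@dataclass
class Prediction:
    sku: str
    path: tuple          # predicted category path, one entry per taxonomy level
    confidence: float    # score in [0, 1] produced by the classifier


def route(pred: Prediction, threshold: float = REVIEW_THRESHOLD,
          expected_depth: int = EXPECTED_DEPTH) -> str:
    """Return "auto" to accept the prediction or "review" to flag it."""
    if len(pred.path) != expected_depth:
        return "review"  # rule-based check: path must cover every level
    if pred.confidence < threshold:
        return "review"  # confidence check: uncertain items go to a person
    return "auto"
```

Raising the threshold would send more items to review and lower the risk of misclassification, so the cut-off is a trade-off between reviewer workload and catalog accuracy.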
Continuous learning
The engine improves accuracy over time by learning from corrections and historical patterns.
Azure-deployed architecture
All processing and classification workloads run on Azure, which provides reliability and lets capacity scale with catalog volume.
The operational result, measured against the starting state.
- ▸95%+ classification accuracy across all category levels.
- ▸Hours instead of months to classify large batches of products.
- ▸Near real-time updates as new items arrive from partner stores.
- ▸Consistent taxonomy across 200,000+ SKUs.
- ▸Improved product discovery and navigation for shoppers.
- ▸Reduced operational workload for the manual catalog team.
- ▸Future-ready scalability as new vendors and categories are added.
