AI Demand Forecasting for Retail and CPG at the SKU-Store Level

Why Retail and CPG Forecasting Is Harder Than It Looks

Retail and consumer packaged goods (CPG) supply chains operate at a level of granularity that most forecasting systems were never designed to handle. A national grocery chain might manage 50,000 SKUs across 2,000 store locations, each with its own local demand pattern, promotional calendar, and shelf-space allocation. Multiply those dimensions by 52 weeks, and the number of individual forecasts required per year runs into the billions.

Three structural forces compound this complexity. First, SKU proliferation has accelerated faster than most planning systems can absorb. Retailers now carry more variants, sizes, and flavors than ever, each with a shorter lifecycle and thinner sales history. Second, multi-channel fragmentation means the same SKU flows through physical stores, e-commerce fulfillment centers, and direct-to-consumer channels, each with different demand signals and lead times. Third, short product lifecycles — particularly in fashion, seasonal CPG, and private-label goods — leave planners with sparse historical data to train models on.

Split-comparison infographic showing traditional forecasting with 25-40% error on the left and AI-driven forecasting with 10-16% error on the right, connected by a central bridge label. — Traditional forecasting methods carry error rates of 25–40%, while AI-driven approaches reduce that to 10–16% according to the International Journal on Science and Technology (IJSAT 2025).

The practical consequence is that a forecast that works at the category or region level often collapses when pushed down to the SKU-store-week level. A model that predicts 12,000 units of laundry detergent for the Northeast region may be off by 40% or more when asked to allocate those units across individual stores in Boston, Portland, and Scranton — each with different brand preferences, price sensitivities, and competitive landscapes.

This is not a problem that can be solved by throwing more data at a time-series model. The structural challenges of retail and CPG demand — cross-product substitution, promotional lift interactions, and supplier constraint propagation — are fundamentally relational. They involve connections between SKUs, stores, promotions, and suppliers that isolated time-series models cannot see.

The Three Blind Spots of Time-Series Models in Retail

Time-series models — whether classical (ARIMA, exponential smoothing) or machine learning-based (Prophet, XGBoost) — share a fundamental limitation: they treat each SKU-store pair as an independent forecasting problem. The model looks at the historical sales of SKU A in Store 1 and projects that pattern forward. It has no mechanism to know that SKU B, sitting on the adjacent shelf, is a near-perfect substitute, or that a 20% discount on SKU C in urban stores will cannibalize SKU D in suburban stores.

This structural blind spot manifests in three specific ways that are particularly damaging in retail and CPG environments.

1. Cross-Product Substitution: The Invisible Demand Shift

When a shopper walks into a store and finds their preferred brand of yogurt out of stock, they do not simply leave. They pick up a competing brand, a different flavor, or a larger size. That demand shift — from the out-of-stock SKU to a substitute — is invisible to a time-series model trained only on the historical sales of each individual SKU.

According to benchmarks from Kumo.ai, cross-product substitution affects 5–8% of SKUs weekly in retail and CPG environments and is the largest source of forecast error after basic seasonality. A model that cannot see substitution will systematically under-forecast the substitute SKU during stockout events and over-forecast the primary SKU when it returns to stock.

2. Promotional Lift Interactions: The Same Deal, Different Results

Promotions drive 20–40% of volume in CPG categories, making them the single largest demand lever available to retailers and manufacturers. Yet most forecasting models treat promotional lift as a simple binary flag: if a promotion is active, apply a fixed multiplier to the baseline forecast.

This approach fails because promotional lift is not uniform. A 20% discount on a premium SKU in an urban store cluster might produce a 3.2x lift, while the same discount on the same SKU in a suburban store cluster produces only a 1.4x lift, per Kumo.ai benchmarks. The difference is driven by store-level demographics, competitive density, and the product's price elasticity — variables that a time-series model cannot incorporate as interaction effects.

3. Supplier Constraint Propagation: The Ripple Effect

When a supplier experiences a production delay or raw material shortage, the impact does not stop at that supplier's direct shipments. It propagates across the entire product portfolio. A delay in packaging materials might affect 50 SKUs simultaneously. A shortage of a key ingredient might force reformulations or substitutions across multiple product lines.

Time-series models have no mechanism to represent these dependencies. Each SKU's forecast is generated independently, so the model cannot learn that a constraint on Supplier X should reduce the forecast for all SKUs that depend on that supplier. The result is systematic over-forecasting during supply disruptions and missed revenue opportunities when constraints ease.

How Relational AI (Graph Neural Networks) Captures Cross-Table Signals

Relational AI — specifically graph neural networks (GNNs) — addresses these blind spots by changing the fundamental data structure used for forecasting. Instead of treating each SKU-store pair as an independent time series, a graph-based model represents SKUs, stores, promotions, and suppliers as interconnected nodes in a graph. Edges between nodes encode relationships: substitution affinity between SKUs, promotional eligibility between a promotion and a store cluster, supply dependency between a supplier and its SKUs.

Conceptual diagram showing three panels on the left (Product Substitution, Promotional Lift Interactions, Supplier Constraint Propagation) flowing into a central graph neural network of interconnected nodes and edges. — Graph neural networks represent SKUs, stores, promotions, and suppliers as interconnected nodes, enabling the model to learn substitution patterns, promotion × store interactions, and supply ripple effects natively.

When the model generates a forecast for a specific SKU-store pair, it does not only look at that pair's historical sales. It aggregates information from neighboring nodes in the graph: the sales of substitute SKUs in the same store, the promotional calendar for that store cluster, the current lead time from the supplier. This message-passing architecture allows the model to learn complex interaction effects that are invisible to isolated time-series models.

The impact on forecast accuracy is substantial. On the SAP SALT enterprise benchmark, Kumo.ai's relational forecasting model (KumoRFM) achieved 89% accuracy compared to 75% for PhD data scientists using XGBoost and 63% for LLM+AutoML approaches. More broadly, enterprise benchmarks suggest that relational AI achieves 85–95% SKU-store-week accuracy versus 60–75% for time-series models on the same data.

Companies that have made the switch from isolated time-series to relational demand models report operational improvements beyond accuracy. Kumo.ai benchmarks indicate a 25% reduction in overstock and $2–5 million in freed working capital per quarter. These outcomes reflect the fact that a model which understands substitution and promotional interactions can make more precise inventory allocation decisions — reducing safety stock without increasing stockout risk.

Real Retail Examples: Substitution, Lift, and Constraint Effects in Action

The theoretical advantages of relational AI become concrete when mapped to real retail scenarios. The following table summarizes three common situations where time-series models fail and relational models succeed.

Three retail scenarios illustrating how relational AI captures signals that time-series models miss.
Scenario	Time-Series Model Behavior	Relational AI Behavior	Business Impact
Stockout on Brand A yogurt shifts demand to Brand B	Forecasts Brand B at baseline; misses 30–50% of actual demand during stockout	Learns substitution affinity from graph; adjusts Brand B forecast upward when Brand A stockout is detected	Reduces lost sales by capturing demand shift; prevents over-ordering when Brand A returns
20% discount on premium SKU in urban vs. suburban stores	Applies same lift multiplier (e.g., 1.8x) to all stores	Learns store-cluster-specific lift: 3.2x in urban, 1.4x in suburban	Optimizes inventory allocation; prevents stockouts in high-lift stores and overstocks in low-lift stores
Supplier packaging delay affects 50 SKUs simultaneously	Each SKU forecasted independently; no constraint signal	Supplier node propagates delay signal to all dependent SKUs; forecasts adjusted downward	Prevents $2–5M in excess inventory per quarter (per Kumo.ai benchmarks)

The substitution example is particularly instructive. In a typical grocery store, 5–8% of SKUs experience a stockout in any given week. For each of those SKUs, demand does not disappear — it shifts to substitutes. A time-series model that cannot see this shift will generate forecasts that are systematically wrong for both the out-of-stock SKU (over-forecast) and the substitute SKU (under-forecast). Over a quarter, these errors compound into significant inventory misallocation.

Data-driven infographic showing Urban Stores with a 3.2x lift bar and Suburban Stores with a 1.4x lift bar, both under a 20% discount tag. — A 20% discount on premium SKUs produces 3.2x lift in urban stores versus 1.4x in suburban stores — a difference that time-series models cannot capture without relational structure.

Promotional Lift Modeling: The Single Largest Demand Driver

Promotions are not just a demand driver — they are the dominant demand driver in CPG, accounting for 20–40% of category volume. Getting promotional lift wrong means getting the entire forecast wrong. Yet most forecasting systems treat promotional lift as a static parameter: a fixed percentage uplift applied whenever a promotion is active.

The reality is far more complex. Promotional lift varies across at least three dimensions:

Product tier: Premium SKUs tend to have higher price elasticity and thus higher promotional lift than value-tier SKUs.
Store cluster: Urban stores with higher foot traffic and more competitive density see different lift patterns than suburban or rural stores.
Promotion type: A percentage discount, a buy-one-get-one offer, and a loyalty-point multiplier each produce different lift profiles, even on the same SKU.

A relational model captures these interactions natively. The graph structure allows the model to learn that a 20% discount on a premium SKU in an urban store cluster produces 3.2x lift, while the same discount on the same SKU in a suburban cluster produces only 1.4x lift. This is not a parameter that a planner sets manually — it is a pattern that the model learns from the data, provided the data includes store-cluster attributes and promotion-type identifiers.

The practical implication is significant. A retailer running a national promotion on a premium SKU might allocate inventory evenly across all stores based on a single lift multiplier. The result: stockouts in high-lift urban stores and excess inventory in low-lift suburban stores. A relational model that captures store-cluster-specific lift enables precise allocation, reducing both lost sales and markdown risk.

Vendor Approaches That Address Retail Complexity

Several vendors have built platforms that address the structural challenges of retail and CPG forecasting, though their approaches differ in methodology, granularity, and integration requirements. The following table summarizes how four representative vendors handle the three blind spots.

Comparison of vendor approaches to the three structural challenges of retail/CPG forecasting. Vendor capability details are drawn from public benchmarks and competitive positioning materials.
Vendor	Core Methodology	Substitution Handling	Promotional Lift Modeling	Supplier Constraint Handling	Granularity
Blue Yonder	ML ensemble with proprietary promotion engine	Rule-based substitution detection with manual override	Store-cluster-specific lift curves; promotion interaction effects	Supply constraint propagation via network model	SKU-store-week
RELEX	ML with shelf-aware demand sensing	Substitution modeled via product affinity scoring	Promotion × store × product interaction model	Constraint propagation via supply chain network graph	SKU-store-day
Kumo.ai	Graph neural network (relational AI)	Native graph-based substitution learning	Graph-based product × store × promotion interaction learning	Native graph-based constraint propagation	SKU-store-week
o9	ML with demand sensing and scenario modeling	Substitution modeled via product attribute similarity	Promotion lift modeled with scenario simulation	Constraint propagation via digital twin	SKU-store-week

The key differentiator among these vendors is how they handle the relational structure of retail demand. Blue Yonder and RELEX use rule-based or scoring-based approaches to model substitution and promotional interactions, which require manual configuration and ongoing maintenance. Kumo.ai uses a graph neural network that learns these relationships automatically from the data. o9 uses a digital twin approach that allows scenario simulation but requires significant data integration effort.

For demand planning managers evaluating these platforms, the critical question is not which vendor has the highest accuracy on a benchmark, but which vendor's approach aligns with their organization's data maturity, SKU complexity, and integration landscape. A graph-native approach may be overkill for a retailer with 500 SKUs and simple promotion calendars, but essential for a national CPG manufacturer managing 50,000 SKUs across 10,000 stores with complex promotion calendars.

Implementation Considerations for Retail and CPG

Deploying AI demand forecasting at the SKU-store level in retail and CPG environments presents practical challenges that go beyond model selection. These challenges are specific to the structural complexity of retail and CPG — not generic implementation pitfalls.

SKU Volume Scalability

The scale of retail and CPG forecasting is enormous. Amazon uses AI-driven demand forecasting for more than 400 million products, according to Intellias. While most retailers operate at a smaller scale, the principle holds: the forecasting system must handle millions of SKU-store-week combinations without degrading performance. Graph-based models scale differently than time-series models — the computational cost grows with the number of edges (relationships) rather than the number of nodes (SKU-store pairs). Organizations should benchmark their specific SKU-store graph size against vendor scalability claims before committing to a platform.

Data Integration with POS and ERP Systems

Relational AI models require data that many retailers do not currently capture in a structured, accessible format. Substitution patterns require point-of-sale (POS) data at the transaction level, not just aggregated sales. Promotional lift modeling requires promotion calendars with store-cluster assignment, not just national promotion flags. Supplier constraint propagation requires real-time or near-real-time supplier lead time data.

For organizations that lack this data, the first implementation step is not model deployment — it is data infrastructure investment. POS data must be centralized, promotion calendars must be digitized with store-level granularity, and supplier data must be integrated through a supply chain control tower or similar platform.

Model Retraining Cadence

Retail and CPG demand patterns shift rapidly. New product introductions, competitor actions, and changing consumer preferences mean that a model trained on last year's data may be significantly less accurate this year. Graph-based models require retraining when the graph structure changes — when new SKUs are added, stores are opened or closed, or supplier relationships change. Organizations should plan for weekly or bi-weekly retraining cycles during peak seasons and monthly retraining during stable periods.

Staged Adoption Model

The GroupBWT staged adoption model provides a useful framework for retail and CPG organizations:

Pilot (0–3 months, $100K–$500K): Select a single category or store cluster. Focus on proving that relational AI captures substitution and promotional lift effects that the current system misses. Measure forecast accuracy improvement and inventory reduction.
Expansion (6–12 months): Scale to multiple categories and store clusters. Integrate supplier constraint data. Begin using relational forecasts for inventory allocation decisions.
Enterprise (18–24 months): Full deployment across all SKUs and stores. Integrate with S&OP and IBP processes. Use relational forecasts for promotional planning and supplier collaboration.
Adaptive (36+ months, budgets exceeding $10M): Continuous model retraining with real-time data feeds. Autonomous inventory replenishment decisions for select categories. Full digital twin integration.

For demand planning managers and retail/CPG supply chain practitioners, the path forward is clear: the structural challenges of SKU-store-level forecasting — substitution, promotional lift interactions, and supplier constraint propagation — cannot be solved by incremental improvements to time-series models. They require a fundamentally different approach to representing and learning from the relational structure of retail demand. Graph-based AI offers that approach, with documented accuracy improvements and operational outcomes that justify the investment in data infrastructure and organizational change.

AI Demand Forecasting in Retail and CPG: Solving Substitution, Promotional Lift, and Supplier Constraints at the SKU-Store Level