
The Feed: Engineering Data Pipelines for Google Shopping

Why default Shopify feeds fail at scale. A technical guide to building high-performance XML pipelines, using the Content API, and optimizing Custom Labels for ROAS.

Alex B.

If you are a fashion brand, your website is your flagship store. But your Product Feed is your billboard, your catalog, and your salesperson, distributed across the entire internet. For most merchants, the Product Feed is an afterthought. They install a “Google Shopping” plugin, click “Sync”, and forget it.

This is why they lose.

At Maison Code Paris, we treat the Product Feed as a Data Product. It is an engineering artifact whose quality directly correlates with Return on Ad Spend (ROAS). If your feed is slow, inaccurate, or generic, you are paying a “Lazy Tax” to Google.

This guide explores how to re-engineer the feed from a passive XML file into a dynamic, revenue-generating pipeline.

Why Maison Code Discusses This

At Maison Code Paris, we act as the architectural conscience for our clients. We often inherit “modern” stacks that were built without a foundational understanding of scale. We see simple APIs that take 4 seconds to respond because of N+1 query problems, and “Microservices” that cost $5,000/month in idle cloud fees.

We discuss this topic because it represents a critical pivot point in engineering maturity. Implementing this correctly differentiates a fragile MVP from a resilient, enterprise-grade platform that can handle Black Friday traffic without breaking a sweat.

The Problem with “Default” Syncs

Standard platforms (Shopify, Magento, Salesforce) offer native integrations. These fail at scale (GMV > $10M) for three reasons:

  1. Latency: They typically sync once per 24 hours. If you sell out of a SKU at 10:00 AM, you continue paying for clicks until the next sync at 2:00 AM. This is wasted spend.
  2. Generic Titles: They map your internal CMS title ("Crop Top") directly to Google. Google wants "Women's Cotton Crop Top - Black - Size M".
  3. Zero Strategy: They populate required fields, but ignore custom_labels. You cannot bid differently on “High Margin” vs “Clearance” items because the data isn’t there.

Architecture: The Hybrid Pipeline

We do not rely on apps. We build a custom pipeline on AWS/Vercel. We use a Hybrid Approach:

  1. Bulk Sync (XML): A daily regeneration of the full catalog for structural data.
  2. Incremental Sync (API): Real-time updates for Price and Availability.
graph TD
    CMS[Headless CMS / Shopify] -->|Nightly Cron| Generator[Node.js XML Generator]
    Generator -->|Stream| S3[S3 Bucket: feed.xml]
    S3 -->|Fetch| GMC[Google Merchant Center]
    
    CMS -->|Webhook: PRICE_UPDATE| API[Serverless Function]
    API -->|Push| ContentAPI[Google Content API]
    
    ContentAPI -->|Instant Update| GMC

Phase 1: High-Performance XML Generation

Generating an XML file for 50,000 SKUs is heavy. If you load all products into memory, your Node.js process will crash (Heap Out of Memory). We use Streams.

The Streaming Generator

We fetch products using Cursor-based pagination (GraphQL), transform them, and pipe the result directly to the S3 upload stream.

import { PassThrough, Transform } from 'stream';
import { createGzip } from 'zlib';
import { S3 } from '@aws-sdk/client-s3';
import { Upload } from '@aws-sdk/lib-storage';

// 1. Transform Stream: JSON Product -> XML String
const xmlTransform = new Transform({
  writableObjectMode: true,
  transform(product, encoding, callback) {
    const xmlNode = `
    <item>
      <g:id>${product.sku}</g:id>
      <g:title><![CDATA[${optimizeTitle(product)}]]></g:title>
      <g:price>${product.price.amount} ${product.price.currency}</g:price>
      <g:link>${product.onlineStoreUrl}</g:link>
      <g:cost_of_goods_sold>${product.cost}</g:cost_of_goods_sold> <!-- Custom Margin Data -->
    </item>
    `;
    callback(null, xmlNode);
  }
});

// 2. The Pipeline
async function generateFeed() {
  const s3Stream = new PassThrough();
  const upload = new Upload({
    client: new S3({}),
    params: { Bucket: 'feeds', Key: 'google.xml.gz', Body: s3Stream }
  });

  const productStream = getShopifyProductStream(); // Custom Generator

  productStream
    .pipe(xmlTransform)
    .pipe(createGzip()) // Always compress
    .pipe(s3Stream);

  await upload.done();
}

This pipeline allows us to generate distinct feeds for distinct regions (US, EU, UK) in parallel with minimal memory footprint.
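
The product source referenced above (getShopifyProductStream) is deliberately simple: a cursor-paginated GraphQL loop wrapped in an object-mode Readable. A minimal sketch, assuming the Shopify Admin GraphQL API; the shop domain, API version and token handling are placeholders:

import { Readable } from 'stream';

const PRODUCTS_QUERY = `
  query Products($cursor: String) {
    products(first: 250, after: $cursor) {
      pageInfo { hasNextPage endCursor }
      nodes { id title onlineStoreUrl }
    }
  }
`;

// Hypothetical helper: POST a query to the Shopify Admin GraphQL endpoint.
async function shopifyGraphql(query: string, variables: object): Promise<any> {
  const res = await fetch('https://your-shop.myshopify.com/admin/api/2024-01/graphql.json', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'X-Shopify-Access-Token': process.env.SHOPIFY_ADMIN_TOKEN ?? '',
    },
    body: JSON.stringify({ query, variables }),
  });
  const { data } = await res.json();
  return data;
}

// Cursor-based pagination: yield one page at a time, never the whole catalog.
async function* iterateProducts() {
  let cursor: string | null = null;
  let hasNextPage = true;
  while (hasNextPage) {
    const page = (await shopifyGraphql(PRODUCTS_QUERY, { cursor })).products;
    for (const product of page.nodes) yield product;
    hasNextPage = page.pageInfo.hasNextPage;
    cursor = page.pageInfo.endCursor;
  }
}

// Wrap the generator in an object-mode Readable so it can be piped into xmlTransform.
function getShopifyProductStream(): Readable {
  return Readable.from(iterateProducts());
}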

Phase 2: The Logic Layer (Data Enrichment)

This is where Engineering meets Marketing. We don’t just pass data; we enhance it.

Title Optimization (SEO for Ads)

The algorithm matches queries to your Title.

  • Bad: “Air Max 90” (Internal CMS Name).
  • Good: “Nike Air Max 90 Men’s Running Shoe - White/Red - Size 10”.

We utilize a template engine: Title = [Brand] + [Gender] + [Collection] + [Product Type] + [Color] + [Material]
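
A minimal sketch of that template, as it might back the optimizeTitle() call used in the XML transform above; the structured fields on product are assumptions about the CMS model:

// Build "[Brand] [Gender] [Collection] [Product Type] [Color] [Material]"
// from structured CMS fields, skipping anything that is missing.
function optimizeTitle(product: {
  brand?: string;
  gender?: string;
  collection?: string;
  productType?: string;
  color?: string;
  material?: string;
  title: string;
}): string {
  const parts = [
    product.brand,
    product.gender,
    product.collection,
    product.productType,
    product.color,
    product.material,
  ].filter(Boolean);

  // Fall back to the raw CMS title if no structured data exists,
  // and respect Google's 150-character limit on titles.
  const title = parts.length ? parts.join(' ') : product.title;
  return title.slice(0, 150);
}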

Custom Labels for Bidding

Google allows five custom labels (custom_label_0 through custom_label_4). This is your secret weapon. We programmatically populate these based on business logic (a sketch follows this list):

  • Label 0 (Margin): If (Price - Cost) > $50, set "High_Margin". Bid high.
  • Label 1 (Season): If tags contains “Summer25”, set "New_Arrival".
  • Label 2 (Performance): Sync with Google Analytics. If ConversionRate > 3%, set "Best_Seller".
  • Label 3 (Stock): If inventory < 5, set "Low_Stock". Stop generic ads, push urgency.
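
These four rules compress into one small enrichment step before serialization. A minimal sketch; the cost, tags, conversionRate and inventory fields are assumptions about data the pipeline has already joined in:

type LabeledProduct = {
  price: number;
  cost: number;
  tags: string[];
  conversionRate?: number; // joined from a Google Analytics export
  inventory: number;
};

function buildCustomLabels(p: LabeledProduct): Record<string, string> {
  return {
    // Label 0: margin bucket drives the bid strategy
    custom_label_0: p.price - p.cost > 50 ? 'High_Margin' : 'Standard_Margin',
    // Label 1: seasonality from CMS tags
    custom_label_1: p.tags.includes('Summer25') ? 'New_Arrival' : 'Permanent',
    // Label 2: performance from analytics (3% conversion rate threshold)
    custom_label_2: (p.conversionRate ?? 0) > 0.03 ? 'Best_Seller' : 'Standard',
    // Label 3: scarcity signal for urgency campaigns
    custom_label_3: p.inventory < 5 ? 'Low_Stock' : 'In_Stock',
  };
}

Each value maps straight onto a <g:custom_label_N> node in the XML transform, and onto the matching column in Merchant Center reporting.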

Phase 3: The Content API (Real-Time)

For Price and Stock, XML is too slow. We use the Google Content API for Shopping.

When a purchase happens on the store, a webhook fires. OrderCreated -> InventoryLevel: 0.

Our serverless function immediately hits Google:

import { content_v2_1 } from '@googleapis/content';

async function updateGoogleStock(sku: string, quantity: number) {
  const auth = await getGoogleAuth();
  const content = new content_v2_1.Content({ auth });

  // Content API v2.1 removed the Inventory service; partial price/availability
  // updates go through products.update instead.
  await content.products.update({
    merchantId: '12345678',
    productId: `online:en:US:${sku}`,
    requestBody: {
      availability: quantity > 0 ? 'in stock' : 'out of stock',
      // We can also update the sale price here instantly
      salePrice: { value: '99.00', currency: 'USD' }
    }
  });
}

Latency: < 2 minutes. Result: You never pay for a click on an out-of-stock item.
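
The glue between the store and that call is a thin serverless handler wired to the inventory webhook. A sketch, assuming a Shopify inventory_levels/update webhook and a Vercel-style function; the payload fields and the SKU lookup are assumptions (in practice the lookup hits the same database the XML generator reads from):

// Hypothetical serverless entry point for the inventory_levels/update webhook.
export default async function handler(req: any, res: any) {
  const { inventory_item_id, available } = req.body;

  // Map Shopify's inventory_item_id back to the SKU we publish as g:id.
  const sku = await lookupSkuByInventoryItem(inventory_item_id);
  if (sku) {
    await updateGoogleStock(sku, available); // the function defined above
  }

  res.status(200).json({ ok: true });
}

async function lookupSkuByInventoryItem(inventoryItemId: number): Promise<string | null> {
  // Placeholder for a database or CMS lookup.
  return null;
}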

Expanding Channels: Meta, Pinterest, TikTok

Once you have this raw data pipeline, you are not limited to Google.

  • Meta (Facebook/Instagram): Accepts a similar CSV format. We fork the stream, map g:id to fb:id, and upload to the Catalog Manager (see the sketch after this list).
  • TikTok: Requires video assets. We can map custom_label_4 to a URL of a generated video asset (see AI Agents).
  • Local Inventory Ads (LIA): If you have physical stores, we generate a secondary feed linking store_code (Paris Champs-Elysées) to quantity. When a user is near Paris, the ad says “Pick up today”.
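
A sketch of the Meta fork: a second consumer of the same object stream that emits CSV rows instead of XML nodes. Column order and fields like inventory and imageUrl are assumptions; the title reuses the same optimizeTitle() enrichment as the Google feed:

import { Transform } from 'stream';

// Second consumer of the product stream: one CSV row per product for Meta's Catalog Manager.
const metaCsvTransform = new Transform({
  writableObjectMode: true,
  transform(product, _encoding, callback) {
    const row = [
      product.sku,                                          // id (maps from g:id)
      optimizeTitle(product),                               // title: same enrichment as Google
      `${product.price.amount} ${product.price.currency}`,  // price, e.g. "99.00 USD"
      product.inventory > 0 ? 'in stock' : 'out of stock',  // availability
      product.onlineStoreUrl,                               // link
      product.imageUrl,                                     // image_link (field name is an assumption)
    ]
      .map((v) => `"${String(v ?? '').replace(/"/g, '""')}"`) // CSV-escape each cell
      .join(',');
    callback(null, row + '\n');
  },
});

Because the source is a plain Readable, the same productStream can be piped into both xmlTransform and metaCsvTransform without a second catalog fetch.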

Common Pitfalls (The “Disapproved” Nightmare)

  1. GTIN Mismatch: Google rigorously checks UPC/EAN barcodes. If you send a fake GTIN, the product is banned. If you don’t have one, send identifier_exists: no (see the snippet after this list).
  2. Image Overlays: Google demands white backgrounds. If your main image has a “Sale” watermark, it will be rejected. Our pipeline checks image metadata or uses Cloudinary transformed URLs to strip overlays.
  3. Price Mismatch: If the XML says $100 and the Landing Page says $101 (due to currency conversion or updates), Google suspends the account. This is why the Content API is mandatory for real-time consistency.
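
For pitfall 1, the fix is a branch inside the <item> template. A minimal sketch, assuming a product.gtin field that is only populated when the barcode is verified:

// Only emit a GTIN we actually trust; otherwise declare that no identifier exists.
const identifierNode = product.gtin
  ? `<g:gtin>${product.gtin}</g:gtin>`
  : `<g:identifier_exists>no</g:identifier_exists>`;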

Feed Rules vs Source Editing

Google Merchant Center allows “Feed Rules” (e.g. “If Title contains ‘Nike’, append ‘Sneakers’”). Do not use them. Logic hidden in GMC is invisible to your developers. If you change the title in the code and GMC changes it back, you will spend weeks debugging. Rule: logic belongs in the Code Pipeline (Source), not in the Destination Interface.

Inventory Buffers (The Safety Net)

Your warehouse sync isn’t instant. It takes 10 minutes. In those 10 minutes, you might sell your last unit on Amazon. The Google user clicks… and lands on “Out of Stock”. You paid for that click. The fix: if quantity < 3, set availability: out_of_stock. We intentionally “hide” the last few units from ad networks to prevent bounce-rate spikes and a bad customer experience.
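
A sketch of that buffer applied at feed-generation time; the threshold mirrors the rule above:

// Hide the last units from ad networks: below the buffer we advertise
// "out of stock" even though the storefront can still sell them.
const STOCK_BUFFER = 3;

function feedAvailability(quantity: number): 'in_stock' | 'out_of_stock' {
  return quantity < STOCK_BUFFER ? 'out_of_stock' : 'in_stock';
}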

Feeding the Beast: Performance Max (PMax)

Google’s new “Black Box” campaign type (PMax) loves assets. It doesn’t just want a title. It wants:

  • lifestyle_images: Array of URLs showing the product in use.
  • short_description: 150 chars.
  • product_highlight: Bullet points.

Most connectors drop these. We map our Sanity CMS fields to these extended attributes. The more context PMax has, the cheaper your CPC becomes.
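
A minimal sketch of that mapping for two of those extended attributes, lifestyle_image_link and product_highlight; the sanity* field names are assumptions about the CMS schema:

// Extended attributes appended to each <item> node for PMax.
function pmaxNodes(product: {
  sanityLifestyleImages?: string[];
  sanityHighlights?: string[];
}): string {
  const lifestyle = (product.sanityLifestyleImages ?? [])
    .map((url) => `<g:lifestyle_image_link>${url}</g:lifestyle_image_link>`);
  const highlights = (product.sanityHighlights ?? [])
    .map((h) => `<g:product_highlight><![CDATA[${h}]]></g:product_highlight>`);
  return [...lifestyle, ...highlights].join('\n      ');
}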

A/B Testing Feed Titles

Does “Nike Air Max” click better than “Men’s Running Shoe”? You don’t know. We split the product ID.

  • ID-123-A -> Title A.
  • ID-123-B -> Title B.

We send both variants to Google (as separate products, but sharing stock via Item Group ID). We analyze the CTR. Winner takes all. This is Feed experimentation.
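
A sketch of the split; the generic title for variant B and the product fields are assumptions, and the shared itemGroupId is what ties both entries together as described above:

// Emit two feed entries per product: same item_group_id, different id and title.
// CTR comparison happens later in reporting, keyed on the -A / -B suffix.
function abVariants(product: {
  sku: string;
  gender?: string;
  productType?: string;
}): Array<{ id: string; title: string; itemGroupId: string }> {
  return [
    { id: `${product.sku}-A`, title: optimizeTitle(product), itemGroupId: product.sku },
    {
      id: `${product.sku}-B`,
      title: `${product.gender ?? ''} ${product.productType ?? ''}`.trim(),
      itemGroupId: product.sku,
    },
  ];
}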

Conclusion

The Product Feed is the cardiovascular system of e-commerce. It pumps products into the ecosystem of the internet. If the data is rich, clean, and fast, the ads perform. If the data is poor, the algorithm starves.

We don’t just “sync” products. We engineer visibility.


Is your feed bleeding money?

If you are seeing “Price Mismatch” errors or low ROAS, it probably is.

Hire our Architects.