The Composable CDP: Why Your Warehouse is the Source of Truth
Stop paying Segment $100k/year. A technical guide to the Composable CDP stack: Snowflake, dbt, and Hightouch (Reverse ETL).
The “Customer Data Platform” (CDP) industry is one of the biggest rackets in SaaS. Tools like Segment, mParticle, or Salesforce CDP charge you based on “Monthly Tracked Users” (MTU). If a user visits your site once, you pay. If you have 10 million dusty emails in your database from 2015, you pay. Enterprise bills often exceed $200,000/year just to store data you already own.
In 2025, the best engineering teams are killing the Monolithic CDP. They are moving to the Composable CDP. The logic is simple: Your Data Warehouse (Snowflake/BigQuery) is the CDP. It is cheap, scalable, and you own it. You just need a pipe to move the data out of the warehouse to your marketing tools (Klaviyo/Meta). That pipe is Reverse ETL (Hightouch).
Why Maison Code Discusses This
At Maison Code Paris, we act as the architectural conscience for our clients. We often inherit “modern” stacks that were built without a foundational understanding of scale. We see simple APIs that take 4 seconds to respond because of N+1 query problems, and “Microservices” that cost $5,000/month in idle cloud fees.
We discuss this topic because it represents a critical pivot point in engineering maturity. Implementing this correctly differentiates a fragile MVP from a resilient, enterprise-grade platform that can handle Black Friday traffic without breaking a sweat.
1. The Architecture: Unbundling Segment
The Monolithic CDP does three things:
- Event Collection: analytics.track()
- Identity Resolution: Merging user_123 with cookie_abc.
- Activation: Sending audiences to Facebook Ads.
The Composable CDP splits this:
- Collection: Rudderstack (Open Source) or Snowplow.
- Storage: Snowflake (Cheap storage).
- Transformation: dbt (SQL logic).
- Activation: Hightouch (The “Reverse ETL”).
graph LR
    subgraph Sources
        Store[Shopify]
        Web[Web Events]
    end
    subgraph Warehouse[Snowflake]
        Raw[Raw Tables] -->|dbt| Gold[Gold Customer Table]
    end
    subgraph Activation
        FB[Facebook Ads]
        Email[Klaviyo]
    end
    Store -->|Fivetran| Raw
    Web -->|Rudderstack| Raw
    Gold -->|Hightouch| FB
    Gold -->|Hightouch| Email
2. The Power of SQL: Identity Resolution
In Segment, you are stuck with their Identity Graph logic. In Snowflake, you write the logic in SQL (dbt). You have infinite flexibility.
Scenario: You want to link “Offline Store Purchases” to “Online Web Browsing”. Segment struggles with this if the email doesn’t match perfectly. In dbt, you control the matching logic: the model below normalizes case and whitespace before joining, and the same model can host fuzzier rules.
-- models/gold/dim_users.sql
-- Normalize emails so 'Alex@Gmail.com ' and 'alex@gmail.com' resolve to one user.
WITH web_users AS (
    SELECT DISTINCT LOWER(TRIM(email)) AS email, cookie_id
    FROM raw.web_events
),
pos_users AS (
    -- DISTINCT avoids one output row per in-store transaction
    SELECT DISTINCT LOWER(TRIM(email)) AS email, phone, loyalty_card
    FROM raw.pos_transactions
)
SELECT
    COALESCE(w.email, p.email) AS master_email,
    w.cookie_id,
    p.loyalty_card,
    -- Custom logic: loyalty-card holders (in-store buyers) are VIP
    CASE WHEN p.loyalty_card IS NOT NULL THEN 'VIP' ELSE 'Standard' END AS segment
FROM web_users w
FULL OUTER JOIN pos_users p ON w.email = p.email
You now have a gold.dim_users table which is the Single Source of Truth for the entire company.
3. Activation: Syncing to the Edge
Marketing tools (Klaviyo) are dumb databases. They need us to tell them who to email.
Instead of building a custom Python script, snowflake_to_klaviyo.py (which breaks every week), we use Hightouch.
Hightouch simply queries your Gold Table and maps the fields.
Query:
-- Assumes dim_users also carries first_name, favorite_color, and last_purchase_date
SELECT email, first_name, favorite_color
FROM gold.dim_users
WHERE segment = 'VIP'
  AND last_purchase_date < DATEADD(day, -90, CURRENT_DATE())
Mapping:
- email -> Klaviyo email
- favorite_color -> Klaviyo custom_properties.color
Hightouch runs this every 15 minutes. It handles rate limits, retries, and API changes.
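The mapping step can be pictured as a small function that renames warehouse columns to destination fields. This is only an illustrative sketch: `FIELD_MAP`, `map_row`, and the dotted-path convention are hypothetical names, not Hightouch's configuration format or Klaviyo's API.

```python
from typing import Any

# Hypothetical mapping: warehouse column -> destination field (dots mean nesting).
FIELD_MAP = {
    "email": "email",
    "favorite_color": "custom_properties.color",
}

def map_row(row: dict[str, Any], field_map: dict[str, str]) -> dict[str, Any]:
    """Build a destination payload from one warehouse row."""
    payload: dict[str, Any] = {}
    for column, dest_path in field_map.items():
        if column not in row:
            continue  # mapped columns that are missing from the row are skipped
        *parents, leaf = dest_path.split(".")
        target = payload
        for key in parents:
            target = target.setdefault(key, {})  # create nested objects on demand
        target[leaf] = row[column]
    return payload

row = {"email": "alex@gmail.com", "first_name": "Alex", "favorite_color": "red"}
profile = map_row(row, FIELD_MAP)
```

Columns that are not in the mapping (here `first_name`) never leave the warehouse, which is part of the appeal: the query and the mapping together define exactly what each tool receives.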
4. Operational Analytics: Slack Alerts
CDPs are usually “Marketing only”. But the Composable CDP serves Engineering and Sales too. We can use Hightouch to send data to Slack.
Use Case: High Value Failures
A user with LTV > $5,000 gets a Payment Failed error.
Standard Flow: User sees error. Leaves. We lose a VIP.
Composable Flow:
- dbt models failures_last_hour.
- Hightouch syncs this to Slack channel #vip-support.
- Support Agent sees: “VIP Alex Failed Payment. Phone: 555-0199”.
- Agent calls Alex immediately. “Can I help you complete the order?”
This is Data Activation. It turns a massive database into actionable revenue.
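To make the flow concrete, here is a minimal sketch of the filtering and message-building step. The row shape (`name`, `ltv`, `phone`) and the function name are assumptions for illustration. Slack incoming webhooks accept a JSON body with a `text` field, so the sketch stops at producing that payload and does not send anything.

```python
import json

LTV_THRESHOLD = 5000  # dollars, taken from the use case above

def build_vip_alerts(failures: list[dict]) -> list[str]:
    """Return one Slack-style JSON payload per high-LTV payment failure."""
    payloads = []
    for failure in failures:
        if failure["ltv"] > LTV_THRESHOLD:
            text = f"VIP {failure['name']} Failed Payment. Phone: {failure['phone']}"
            payloads.append(json.dumps({"text": text}))
    return payloads

# Hypothetical rows, as a failures_last_hour model might return them.
failures_last_hour = [
    {"name": "Alex", "ltv": 7200, "phone": "555-0199"},
    {"name": "Sam", "ltv": 120, "phone": "555-0100"},
]
alerts = build_vip_alerts(failures_last_hour)
```

In practice the threshold would more naturally live in the dbt model's WHERE clause, so the sync tool only ever sees rows that deserve an alert.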
5. Privacy and Governance (GDPR)
In a Monolithic CDP, deleting a user is a nightmare. You have to ask Segment to delete it, then hope they propagate it.
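In Composable, you delete the user's rows in Snowflake, in the raw tables as well as the Gold table, so the next dbt run does not bring them back.
Hightouch detects the deletion (diff) and, when the sync is configured to remove deleted records, sends a DELETE request to Facebook, Google, and Klaviyo automatically.
One deletion in the warehouse drives GDPR erasure across your entire stack.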
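The diffing idea is simple enough to sketch: compare the previous query result with the current one, keyed by primary key. This is a conceptual illustration under that assumption, not Hightouch's actual implementation, and `diff_snapshots` is a hypothetical name.

```python
def diff_snapshots(previous: dict[str, dict], current: dict[str, dict]):
    """Compare two query results keyed by primary key (here, email)."""
    added = [key for key in current if key not in previous]
    changed = [key for key in current if key in previous and current[key] != previous[key]]
    removed = [key for key in previous if key not in current]
    return added, changed, removed

previous = {
    "alex@gmail.com": {"segment": "VIP"},
    "sam@gmail.com": {"segment": "Standard"},
}
current = {
    "alex@gmail.com": {"segment": "Standard"},  # attribute changed
    "kim@gmail.com": {"segment": "VIP"},        # new row
}                                               # sam's row was deleted upstream
added, changed, removed = diff_snapshots(previous, current)
```

Each `removed` key is what would turn into a delete call against every destination, which is why a single warehouse deletion can fan out across tools.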
6. The Cookie Apocalypse (ITP 2.5)
Apple's Safari expires cookies set by client-side JavaScript after 7 days (Intelligent Tracking Prevention, ITP).
If a user visits on Monday and returns next Wednesday, Segment thinks they are a New User.
Your Attribution is broken.
Server-Side Tracking fixes this.
Because we control the domain (data.maisoncode.paris), we can set HttpOnly cookies that last 2 years.
Rudderstack handles this out of the box.
This recovers 20% of lost attribution for clients with high Apple traffic (Fashion/Luxury).
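As a rough sketch of what the server sends, the snippet below builds a `Set-Cookie` value with Python's standard `http.cookies` module. The cookie name `anon_id` and the `.maisoncode.paris` domain scope are assumptions for illustration; this is not Rudderstack's code.

```python
from http.cookies import SimpleCookie

TWO_YEARS_SECONDS = 2 * 365 * 24 * 60 * 60

def build_id_cookie(anonymous_id: str) -> str:
    """Return the value of a Set-Cookie header for a long-lived first-party ID."""
    cookie = SimpleCookie()
    cookie["anon_id"] = anonymous_id            # hypothetical cookie name
    morsel = cookie["anon_id"]
    morsel["domain"] = ".maisoncode.paris"      # assumed: shared with the storefront domain
    morsel["path"] = "/"
    morsel["max-age"] = TWO_YEARS_SECONDS
    morsel["httponly"] = True                   # not readable from JavaScript
    morsel["secure"] = True                     # only sent over HTTPS
    morsel["samesite"] = "Lax"
    return morsel.OutputString()

header_value = build_id_cookie("a1b2c3")
```

The important part is that the cookie arrives in an HTTP response header from your own domain instead of being written by `document.cookie` in the browser.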
7. Identity Resolution Algorithms
How do you know user_123 is alex@gmail.com?
There are two strategies:
- Deterministic: Exact match (Email = Email). Accuracy 100%. Match rate 40%.
- Probabilistic: “Same IP + Same Device Model + Same Location”. Accuracy 80%. Match rate 90%.
For CDPs, we prefer Deterministic. We do not want to email the wrong person. However, for Ad Targeting, we accept Probabilistic. It’s okay if 20% of people see the wrong ad, if it means doubling your reach. Snowflake allows you to run both graphs simultaneously.
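Deterministic resolution amounts to grouping identifiers that are connected by exact-match links. One common way to do that is union-find; the sketch below uses it as an illustration and is not Segment's or any vendor's algorithm. The `links` pairs are made-up sample data.

```python
from collections import defaultdict

def resolve_identities(links: list[tuple[str, str]]) -> list[set[str]]:
    """Group identifiers connected by exact-match links (union-find)."""
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    for a, b in links:
        parent[find(a)] = find(b)  # merge the two groups

    groups: dict[str, set[str]] = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return list(groups.values())

# Each pair was observed together, e.g. a login event carrying both IDs.
links = [
    ("user_123", "cookie_abc"),
    ("user_123", "alex@gmail.com"),
    ("user_456", "cookie_xyz"),
]
profiles = resolve_identities(links)
```

Here `cookie_abc` and `alex@gmail.com` end up in the same profile because both are linked to `user_123`, even though they never appeared together directly.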
8. The Cost Equation
Let’s compare a client with 500k MTUs.
Segment (Business Plan):
- Protocols: Included
- Personas: Add-on
- Total: ~$60,000 / year.
Composable Stack:
- Rudderstack (Open Source): $0 (Hosted on AWS).
- Snowflake: $500 / month (Storage + Compute).
- Hightouch: $800 / month.
- Total: ~$15,600 / year.
Savings: ~74%. Plus, you own the data. If you cancel Hightouch, you still have your Snowflake tables. If you cancel Segment, you lose your graph.
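The arithmetic behind the comparison, using only the line items quoted above. The AWS hosting cost for Rudderstack is not included, as in the list, so the real total would be somewhat higher.

```python
SEGMENT_ANNUAL = 60_000  # quoted Segment total, dollars per year

# Monthly line items of the composable stack, in dollars.
MONTHLY = {"rudderstack": 0, "snowflake": 500, "hightouch": 800}

composable_annual = sum(MONTHLY.values()) * 12
savings_pct = round(100 * (SEGMENT_ANNUAL - composable_annual) / SEGMENT_ANNUAL)

print(f"Composable: ${composable_annual:,}/year, savings: {savings_pct}%")
```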
9. The “Real-Time” Myth
Marketers love to scream: “We need Real-Time Personalization!” Engineers must ask: “Do you really?”
Scenario A: User abandons cart.
- Need: Send email in 1 hour.
- Tool: Warehouse (Batch). Sufficient.
Scenario B: User clicks “Red Shoes”. Homepage Hero should change to “Red Shoes” immediately.
- Need: < 200ms latency.
- Tool: Edge Middleware (Vercel/Cloudflare).
The Warehouse is for Strategic Data (Email, Ads, Analysis). The Edge is for Tactical Data (UI Personalization). Don’t try to force Snowflake to do sub-second queries. That is not its job.
10. The Cost Trap of “Free” Analytics
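Google Analytics 4 (GA4) is free. But its reports can be sampled.
And querying the BigQuery export can get expensive: on-demand queries are billed by bytes scanned (a few dollars per TiB), which adds up quickly on wide event tables.
But compared to Adobe Analytics ($100k+), it is a steal.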
The Trap: Storing everything.
Engineers tend to log mouse_move, scroll_depth_10%, scroll_depth_20%.
This creates “Data Swamps”. Billions of rows of noise.
Rule: Only track an event if you have a Business Question attached to it.
“If we track scroll depth, what decision will we change?”
If the answer is “None”, delete the tracking code. Save the bytes.
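One lightweight way to enforce the rule is a tracking plan that pairs every allowed event with the question it answers. The event names and questions below are hypothetical examples, and `should_track` is an illustrative helper, not part of any SDK.

```python
# Hypothetical tracking plan: event name -> the business question it answers.
TRACKING_PLAN = {
    "checkout_started": "Where do buyers drop out of the funnel?",
    "payment_failed": "Which payment methods fail most often?",
}

def should_track(event_name: str) -> bool:
    """Track an event only if a non-empty business question is attached to it."""
    return bool(TRACKING_PLAN.get(event_name))

events = ["checkout_started", "mouse_move", "scroll_depth_10%", "payment_failed"]
kept = [event for event in events if should_track(event)]
```

Anything not in the plan is dropped before it reaches the warehouse, so adding a new event forces someone to write down why it exists.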
11. Conclusion
Data is gravity. The more data you put into a proprietary SaaS (Segment/Salesforce), the harder it is to leave. The Database is the only technology that has survived 40 years. Bet on SQL. Bet on the Warehouse. Build pipes, not silos.
Reducing Data Spend?
Are you paying for “MTUs” that don’t convert?
Build a Composable Stack. Read about Attribution SQL and Server-Side Tagging.
“But Segment is real-time. Snowflake is batch.” True. Data warehouses have latency (loading data + dbt build), usually 15-30 minutes. If you need sub-second personalization (e.g., showing a popup based on a click made 1 second ago), the Composable CDP is too slow. Solution: Use edge personalization (Edge Middleware) for the “Hot” path. Use the Composable CDP for the “Cold” path (Email, Ads, Retention).