Friday, August 29, 2025

Predicting Campaign Success with Deep Learning: A Practical Guide

By Thomas — Full-stack Engineer, Big Data & AI for Marketing & Sales

🔥 Why Predict Campaign Success?

Marketers spend millions on ad campaigns across Facebook, Google, Zalo OA, TikTok and more. But:

  • Some campaigns convert like crazy.
  • Others burn budget with little return.

What if we could predict campaign success before launch? This post shows you how to build a deep neural network (DNN) for campaign success prediction, using realistic schema design + synthetic training data.


🧩 Step 1: Designing the Campaign Data Schema

A good schema balances:

  • What platforms provide (impressions, clicks, spend, ROI).
  • What models need (normalized features, labels).

Here’s the schema I use:

| Column | Type | Description |
|---|---|---|
| campaign_id | STRING | Unique campaign ID |
| platform | STRING | Platform (Facebook, Google, Zalo, etc.) |
| start_date | DATE | Campaign start date |
| end_date | DATE | Campaign end date |
| digital_media_consumption | FLOAT | Engagement index [0–1] |
| timing | FLOAT | Seasonality/time factor [0–1] |
| size | INT | Campaign size (#ads delivered or budgeted audience) |
| age_group_distribution | FLOAT | Share of target age group [0–1] |
| frequency | INT | Avg. times a user saw the ad |
| online_population | INT | Target region's online population |
| reachable_number | INT | Estimated audience from targeting |
| impressions | INT | Total impressions |
| clicks | INT | Total clicks |
| ctr | FLOAT | Click-through rate |
| conversions | INT | Conversions (purchases/sign-ups) |
| conversion_rate | FLOAT | Conversions / clicks |
| spend | FLOAT | Campaign spend in USD |
| cpc | FLOAT | Cost per click |
| roi | FLOAT | Return on investment |
| label | INT | Success flag (1 = success, 0 = fail) |

👉 Label definition: you decide what counts as success.

  • Example: label = 1 if ROI > 1.5, else 0 (a small helper for this rule is sketched below).
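
For illustration, the rule can live in one small helper so the cutoff stays configurable (the 1.5 here is just the example value above):

def label_campaign(roi: float, threshold: float = 1.5) -> int:
    """Success flag: 1 if ROI beats the threshold, else 0."""
    return int(roi > threshold)

print(label_campaign(2.1))  # 1 (success)
print(label_campaign(0.8))  # 0 (fail)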

🛠 Step 2: Generate Synthetic Campaign Data

Let’s generate 1,000 fake campaigns for testing.

import pandas as pd
import numpy as np

np.random.seed(42)
n_samples = 1000

platforms = ["Facebook", "Google", "Zalo", "TikTok"]

data = {
    "campaign_id": [f"CMP-{i:04d}" for i in range(n_samples)],
    "platform": np.random.choice(platforms, n_samples),
    "start_date": pd.date_range("2025-01-01", periods=n_samples, freq="D"),
    "end_date": pd.date_range("2025-01-02", periods=n_samples, freq="D"),
    "digital_media_consumption": np.random.rand(n_samples),
    "timing": np.random.rand(n_samples),
    "size": np.random.randint(1000, 10000, n_samples),
    "age_group_distribution": np.random.rand(n_samples),
    "frequency": np.random.randint(1, 15, n_samples),
    "online_population": np.random.randint(50000, 1000000, n_samples),
    "reachable_number": np.random.randint(1000, 50000, n_samples),
    "impressions": np.random.randint(10000, 500000, n_samples),
    "clicks": np.random.randint(100, 20000, n_samples),
    "conversions": np.random.randint(10, 2000, n_samples),
    "spend": np.random.uniform(100, 10000, n_samples),
}

# Derived metrics
data["ctr"] = data["clicks"] / data["impressions"]
data["conversion_rate"] = data["conversions"] / (data["clicks"] + 1)
data["cpc"] = data["spend"] / (data["clicks"] + 1)
data["roi"] = (data["conversions"] * 50) / (data["spend"] + 1)  # assume $50 per conversion
data["label"] = (data["roi"] > 1.5).astype(int)

df = pd.DataFrame(data)
df.to_csv("real_campaign_data.csv", index=False)
print(df.head())

This creates realistic-looking data with impressions, clicks, conversions, spend, and ROI.


🤖 Step 3: Build the Deep Neural Network (DNN)

We’ll use TensorFlow Keras in Google Colab.

!pip install tensorflow scikit-learn pandas

import pandas as pd
import tensorflow as tf
from tensorflow.keras import layers, models
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load dataset
df = pd.read_csv("real_campaign_data.csv")

# Features (drop IDs, dates, and the label; note that roi feeds the synthetic
# label rule, so keeping it as a feature makes this toy task easier than a real one)
X = df.drop(["campaign_id","platform","start_date","end_date","label"], axis=1).values
y = df["label"].values

# Scale features
scaler = StandardScaler()
X = scaler.fit_transform(X)

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Build the model: a funnel-shaped stack of dense layers
model = models.Sequential([
    layers.Input(shape=(X_train.shape[1],)),   # input layer
    layers.Dense(128, activation="relu"),
    layers.Dense(70, activation="relu"),
    layers.Dense(50, activation="relu"),
    layers.Dense(26, activation="relu"),
    layers.Dense(1, activation="sigmoid")      # output
])

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()

# Train
history = model.fit(X_train, y_train, epochs=20, batch_size=32, validation_data=(X_test, y_test))

📈 Step 4: Evaluate the Model

from sklearn.metrics import classification_report

y_pred = (model.predict(X_test) > 0.5).astype(int).ravel()  # flatten (n, 1) to (n,)
print(classification_report(y_test, y_pred))

Example output:

              precision    recall  f1-score   support

           0       0.82      0.84      0.83       98
           1       0.87      0.85      0.86      102

    accuracy                           0.85      200

Not bad for synthetic data 👌


🚀 Step 5: Predict New Campaigns

# Feature columns must be in the same order as training, since the scaler
# and model were fit on positional NumPy arrays
new_campaign = pd.DataFrame([{
    "digital_media_consumption": 0.65,
    "timing": 0.40,
    "size": 6000,
    "age_group_distribution": 0.55,
    "frequency": 8,
    "online_population": 300000,
    "reachable_number": 15000,
    "impressions": 120000,
    "clicks": 4000,
    "conversions": 300,
    "spend": 2500.0,
    "ctr": 0.033,
    "conversion_rate": 0.075,
    "cpc": 0.63,
    "roi": 1.8
}])

X_new = scaler.transform(new_campaign.values)
prediction = model.predict(X_new)

print("✅ Success probability:", float(prediction[0][0]))

Output:

✅ Success probability: 0.73

🎯 Key Takeaways

  1. A clear schema is critical for ML in marketing.
  2. You can prototype with synthetic data before using real campaign logs.
  3. A deep neural network can capture nonlinear interactions (budget × frequency × CTR).
  4. With real data, this becomes a powerful campaign optimization engine.

👉 Next Steps:

  • Replace synthetic data with real campaign logs (from FB Ads API, Google Ads API, or your CRM).
  • Tune the model (hyperparameters, embeddings for categorical data; see the sketch after this list).
  • Deploy as an API service to score new campaigns before launch.
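
On the embeddings point, here is a minimal sketch (my own illustration, not part of the pipeline above) of routing an integer platform ID through a Keras Embedding layer alongside the numeric features; the layer sizes are arbitrary, not tuned:

from tensorflow.keras import layers, models

n_platforms = 4   # Facebook, Google, Zalo, TikTok
n_numeric = 15    # count of scaled numeric features from Step 3

# Two inputs: an integer platform ID and the scaled numeric feature vector
platform_in = layers.Input(shape=(1,), dtype="int32", name="platform_id")
numeric_in = layers.Input(shape=(n_numeric,), name="numeric_features")

# Learn a small dense vector per platform instead of one-hot encoding
emb = layers.Flatten()(layers.Embedding(input_dim=n_platforms, output_dim=4)(platform_in))

x = layers.Concatenate()([emb, numeric_in])
x = layers.Dense(64, activation="relu")(x)
out = layers.Dense(1, activation="sigmoid")(x)

model = models.Model(inputs=[platform_in, numeric_in], outputs=out)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])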

💡 Imagine a marketer running “What-if” scenarios: change budget, frequency, or targeting — and instantly get predicted ROI. That’s the future of AI-driven marketing.

Friday, August 15, 2025

How to Do Customer Segmentation with AI Agents?


Below are four different approaches; each can run on its own or be combined into a Segmentation Orchestration Pipeline. For each agent I'll cover the objective, input, output, core logic, AI/ML model, and workflow.


1) Persona Segmentation theo Ideal Customer Profile (ICP) sử dụng Vector Space

Objective: identify which customers match the "ideal customer" group, based on semantic vector embeddings of their profiles.

Input

  • Profile information: industry, job title, behavior, interests, purchased products, etc.
  • A set of ICP (Ideal Customer Profile) definitions → encoded as vectors.

Output

  • A persona label (e.g., “Tech-Savvy Executive”, “Budget-Conscious Traveler”).
  • The cosine similarity score between the profile and the ICP.

Core Logic

  • Encode customer profiles with a sentence-transformer such as multilingual-e5.
  • Encode the ICPs as vectors.
  • Compute cosine similarity → assign groups by a threshold or nearest centroid.

Model/Tech

  • Embedding model: intfloat/multilingual-e5-base (pgvector in PostgreSQL 16).
  • Clustering: KMeans, HDBSCAN, or nearest-neighbor search.

Workflow

Profile Data → Vector Encoding → Similarity Search → Assign Persona
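
To make this concrete, here is a minimal sketch assuming the sentence-transformers package is installed; the ICP texts and profiles below are invented placeholders (e5 models expect "query:"/"passage:" prefixes):

from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

model = SentenceTransformer("intfloat/multilingual-e5-base")

# Hypothetical ICP definitions; replace with your real ICP descriptions
icps = {
    "Tech-Savvy Executive": "passage: senior tech leader who evaluates and buys SaaS tools",
    "Budget-Conscious Traveler": "passage: frequent traveler who hunts for deals and discounts",
}
profiles = [
    "query: CTO at a fintech startup, downloads developer whitepapers",
    "query: student who books low-cost flights and collects coupon codes",
]

icp_vecs = model.encode(list(icps.values()), normalize_embeddings=True)
profile_vecs = model.encode(profiles, normalize_embeddings=True)

# Cosine similarity of every profile against every ICP → nearest persona
sims = cosine_similarity(profile_vecs, icp_vecs)
for profile, row in zip(profiles, sims):
    persona = list(icps.keys())[row.argmax()]
    print(f"{persona} (similarity={row.max():.3f}) ← {profile}")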

2) Lead Scoring Segmentation

Objective: score how promising each lead is so that follow-up can be prioritized.

Input

  • Interaction behavior: clicks, downloads, form sign-ups, email opens.
  • Demographic & firmographic data.

Output

  • Lead score (0–100).
  • Tier: Hot, Warm, Cold.

Core Logic

  • Logistic regression or XGBoost to predict the probability that a lead becomes a customer.
  • Map the probability onto a 100-point scale.
  • Apply business rules (e.g., job title + recent activity).

Model/Tech

  • Scikit-learn/XGBoost
  • PostgreSQL + ML model deployment (PGML or MLflow).

Workflow

Behavioral Data + Profile Data → Feature Engineering → ML Scoring → Segment
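
A minimal scoring sketch with scikit-learn's LogisticRegression on synthetic data; the feature names (clicks, downloads, email_opens, is_decision_maker) and the 40/70 tier thresholds are placeholders for your own features and business rules:

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Synthetic behavioral + firmographic features
rng = np.random.default_rng(0)
leads = pd.DataFrame({
    "clicks": rng.integers(0, 50, 500),
    "downloads": rng.integers(0, 10, 500),
    "email_opens": rng.integers(0, 30, 500),
    "is_decision_maker": rng.integers(0, 2, 500),
})
# Synthetic ground truth: did the lead eventually convert?
converted = (leads["clicks"] + 5 * leads["downloads"]
             + rng.normal(0, 10, 500) > 40).astype(int)

clf = LogisticRegression(max_iter=1000).fit(leads, converted)

# Probability → 0–100 score → Hot/Warm/Cold tiers
leads["score"] = (clf.predict_proba(leads)[:, 1] * 100).round(1)
leads["tier"] = pd.cut(leads["score"], bins=[-1, 40, 70, 100],
                       labels=["Cold", "Warm", "Hot"])
print(leads.head())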

3) CLV (Customer Lifetime Value) Scoring Segmentation

Objective: segment customers by the value they are predicted to deliver over their entire lifetime.

Input

  • Purchase history: frequency, order value, purchase timing.
  • Behavioral and profile information.

Output

  • Predicted CLV.
  • Tier: High Value, Medium Value, Low Value.

Core Logic

  • Predictive models: Pareto/NBD + Gamma-Gamma, or a Gradient Boosting Regressor.
  • Compute CLV = (Average Order Value × Purchase Frequency × Predicted Retention Time).

Model/Tech

  • lifetimes Python package
  • XGBoost Regressor.

Workflow

Transaction Data → CLV Model → Predict Value → Segment
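
A minimal sketch with the lifetimes package on synthetic transactions; I use BetaGeoFitter (BG/NBD) for the purchase-frequency model here since it fits robustly, with ParetoNBDFitter as the drop-in alternative named above:

import numpy as np
import pandas as pd
from lifetimes import BetaGeoFitter, GammaGammaFitter
from lifetimes.utils import summary_data_from_transaction_data

# Synthetic transaction log: customer_id / date / amount
rng = np.random.default_rng(42)
transactions = pd.DataFrame({
    "customer_id": rng.integers(1, 101, 400),
    "date": pd.Timestamp("2025-01-01") + pd.to_timedelta(rng.integers(0, 180, 400), unit="D"),
    "amount": rng.uniform(20, 300, 400).round(2),
})

# Collapse the log into frequency / recency / T / monetary_value per customer
summary = summary_data_from_transaction_data(
    transactions, "customer_id", "date", monetary_value_col="amount")

# Purchase-frequency model
bgf = BetaGeoFitter(penalizer_coef=0.01)
bgf.fit(summary["frequency"], summary["recency"], summary["T"])

# Gamma-Gamma models spend, and needs repeat buyers with positive spend
repeat = summary[summary["frequency"] > 0]
ggf = GammaGammaFitter(penalizer_coef=0.01)
ggf.fit(repeat["frequency"], repeat["monetary_value"])

# Discounted 12-month CLV per customer, then quantile-based value tiers
clv = ggf.customer_lifetime_value(
    bgf, repeat["frequency"], repeat["recency"], repeat["T"],
    repeat["monetary_value"], time=12, discount_rate=0.01)
tiers = pd.qcut(clv, q=3, labels=["Low Value", "Medium Value", "High Value"])
print(pd.DataFrame({"clv": clv.round(2), "segment": tiers}).head())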

4) RFM (Recency, Frequency, Monetary) Segmentation

Objective: segment customers by how recently they purchased, how often they buy, and how much they spend.

Input

  • Transaction data: last purchase date, number of purchases, total spend.

Output

  • RFM score (e.g., 5-3-4).
  • Segments: Champions, Loyal, At Risk, Hibernating, etc.

Core Logic

  • Normalize Recency, Frequency, and Monetary onto a 1–5 scale.
  • Combine the scores → assign segments via a mapping table.

Model/Tech

  • SQL window functions hoặc Python pandas.

Workflow

Transaction Data → Calculate R/F/M Scores → Assign Segment
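
A minimal pandas sketch on the same kind of synthetic transaction log as the CLV example; the segment rules at the end are a deliberately tiny stand-in for a full mapping table:

import numpy as np
import pandas as pd

# Synthetic transaction log: customer_id / date / amount
rng = np.random.default_rng(7)
transactions = pd.DataFrame({
    "customer_id": rng.integers(1, 101, 400),
    "date": pd.Timestamp("2025-01-01") + pd.to_timedelta(rng.integers(0, 180, 400), unit="D"),
    "amount": rng.uniform(20, 300, 400).round(2),
})

# R = days since last purchase, F = number of purchases, M = total spend
snapshot = transactions["date"].max() + pd.Timedelta(days=1)
rfm = transactions.groupby("customer_id").agg(
    recency=("date", lambda d: (snapshot - d.max()).days),
    frequency=("date", "count"),
    monetary=("amount", "sum"),
)

# Quintile scores on a 1–5 scale (rank first so ties don't break qcut;
# recency is inverted: more recent = higher score)
rfm["R"] = pd.qcut(rfm["recency"].rank(method="first"), 5, labels=[5, 4, 3, 2, 1]).astype(int)
rfm["F"] = pd.qcut(rfm["frequency"].rank(method="first"), 5, labels=[1, 2, 3, 4, 5]).astype(int)
rfm["M"] = pd.qcut(rfm["monetary"].rank(method="first"), 5, labels=[1, 2, 3, 4, 5]).astype(int)

# A tiny mapping table; production rule sets are richer than this
def segment(row):
    if row.R >= 4 and row.F >= 4:
        return "Champions"
    if row.F >= 3:
        return "Loyal"
    if row.R <= 2:
        return "Hibernating"
    return "At Risk"

rfm["segment"] = rfm.apply(segment, axis=1)
print(rfm.head())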