Understanding Customer Behavior Beyond Individual Transactions
Why I Chose This Project
This project started with a very personal frustration.
During a family trip to London, I used my credit card to tap in and out of the Tube. The first few rides worked perfectly—fast, cashless, effortless. Then, suddenly, my card started getting declined for very small transit charges. I had plenty of available credit, and nothing felt suspicious from my point of view: just normal travel behavior.
From the system’s perspective, though, my behavior had changed: a new country, repeated small transactions, and a shift away from my usual spending pattern. That gap between how people live and how systems interpret behavior became the motivation for this project.
The core question I wanted to explore was simple:
Can customer behavior be understood more meaningfully through patterns, rather than individual transactions or rigid rules?
The Approaches I Chose (and Why)
Before looking at results, it’s important to explain the approaches I intentionally selected, because these choices shaped what insights were possible.
Unsupervised learning
I framed this as an exploratory problem. Instead of predicting a predefined label, I used unsupervised learning to let behavioral patterns emerge naturally from the data.
Dimensionality reduction
Raw transaction data is extremely high-dimensional and sparse. To preserve behavioral structure while keeping the problem tractable, I compressed transaction signals into a lower-dimensional representation that still captured recurring patterns across merchants, channels, geography, and time.
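As a minimal sketch of this idea, the snippet below compresses a sparse customer-by-category count matrix with truncated SVD. The data here is synthetic and the shapes are illustrative, not the project's actual matrix:

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

rng = np.random.default_rng(42)
# Toy stand-in for a sparse customer x merchant-category count matrix
mcc_counts = rng.poisson(0.3, size=(200, 50)).astype(float)

# Compress the high-dimensional, sparse category space into a handful
# of dense components that still capture co-occurrence structure
svd = TruncatedSVD(n_components=8, random_state=42)
mcc_reduced = svd.fit_transform(mcc_counts)

print(mcc_reduced.shape)  # each customer is now an 8-dimensional vector
```

TruncatedSVD is a natural fit here because, unlike PCA, it operates directly on sparse matrices without densifying them first.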
Customer-level clustering
Rather than clustering individual transactions, I clustered customer-level behavioral profiles, which produced segments that were more stable and easier to interpret from a business perspective.
RFM as an interpretability anchor
Classic Recency, Frequency, and Monetary (RFM) features were included to ground the learned patterns in intuitive business terms and avoid purely abstract clusters.
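For readers unfamiliar with RFM, it can be computed with a simple aggregation. The toy transaction log and column names below are illustrative, not the project's actual schema:

```python
import pandas as pd

# Toy transaction log (illustrative column names)
tx = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 2],
    "date": pd.to_datetime(
        ["2024-01-05", "2024-03-01", "2024-02-10", "2024-02-20", "2024-03-10"]),
    "amount": [50.0, 20.0, 5.0, 7.5, 4.0],
})

snapshot = pd.Timestamp("2024-04-01")  # reference date for recency
rfm = tx.groupby("customer_id").agg(
    recency_days=("date", lambda d: (snapshot - d.max()).days),  # R
    frequency=("date", "count"),                                 # F
    monetary=("amount", "sum"),                                  # M
)
print(rfm)
```

Because each RFM column has a direct business reading ("how recently, how often, how much"), these features make otherwise abstract clusters easier to describe.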
Temporal analysis
Finally, I explicitly examined how customers move between segments over time. This made it possible to study behavior as something that evolves, rather than treating segments as fixed labels.
Code snapshot: feature design choices
# Numerical features (scaled later)
num_cols = [
    "amt_signed",
    "hour_sin", "hour_cos",
    "mon_sin", "mon_cos",
    "dow_sin", "dow_cos",
]

# Categorical features (encoded later)
cat_cols = [
    "MCC",             # high-cardinality: compressed with SVD
    "channel",         # online vs card_present
    "entry_mode",      # tap / swipe / chip / key-in
    "merchant_state",
    "country",
]

# Note: intentionally excluded very-high-cardinality fields
# (e.g., merchant_id, city) to avoid a massive sparse matrix
# and "clustering on store names" instead of behavior.
This snippet illustrates how I separated numerical, categorical, and time-based signals, and why some high-cardinality fields were intentionally excluded. The goal was to reflect behavior, not identifiers.
Turning Transactions into Customer “Fingerprints”
A single transaction says very little about a person. But a long history of transactions can reveal patterns.
To capture those patterns, I summarized transaction histories into customer-level behavioral fingerprints. These fingerprints reflected how customers spend—across categories, channels, locations, and time—rather than focusing only on totals.
One important design choice was to normalize category usage so that the representation captured mix and diversity, not just volume.
Code snapshot: building behavioral profiles
# Convert MCC counts/spend into per-customer proportions
# so the representation reflects "mix of categories" rather than volume alone.
mcc_profile = (
    mcc_piv
    .div(mcc_piv.sum(axis=1).replace(0, 1), axis=0)
    .add_prefix("mcc_")
)
This short snippet shows how raw category counts were converted into proportional profiles, allowing clustering to focus on behavioral composition rather than total spend volume.
Discovering Distinct Spending Patterns
Once customer fingerprints were built, clustering revealed a small number of distinct and interpretable patterns. While the exact number is less important than the insight, a five-group structure consistently balanced stability and interpretability.
At a high level, the groups reflected differences such as:
- intensity of usage
- breadth vs concentration of spending
- recency of activity
- consistency over time
What stood out most was that customers with similar total spend could behave very differently.
To get an initial sense of how these behaviors organize themselves, the figure below shows a two-dimensional PCA projection of customer-level representations, with points colored by their assigned segment. Although this projection compresses much richer high-dimensional structure, it provides an intuitive overview of how distinct behavioral groups begin to emerge from the data.
This figure provides a high-level view of how customer behaviors separate into distinct groups.
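A rough sketch of how such a projection can be produced is shown below, with synthetic data standing in for the real fingerprints (the shapes and cluster count are illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Toy customer fingerprints (the real features were richer)
X = rng.normal(size=(300, 12))

# Assign segments, then project to 2D purely for visualization
labels = KMeans(n_clusters=5, random_state=42, n_init=10).fit_predict(X)
coords = PCA(n_components=2, random_state=42).fit_transform(X)

# Plotting coords[:, 0] vs coords[:, 1], colored by `labels`,
# yields the kind of overview figure described above.
print(coords.shape)
```

Note that the PCA projection is only a viewing aid: clustering happens in the full representation, and apparent overlap in 2D does not imply overlap in the original space.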
Not All High-Value Customers Are the Same
Digging deeper into the segments revealed an important nuance: high value does not imply homogeneous behavior.
Some customers concentrate spending in a narrow set of categories, while others distribute spending broadly across many types of merchants. Two customers may look similar by total amount, yet represent very different lifestyles and risk or engagement profiles.
This comparison highlights how segments differ beyond total spend.
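One simple way to quantify "breadth vs concentration" is the Shannon entropy of a customer's category mix. This metric is my illustration here, not necessarily the measure used in the project:

```python
import numpy as np

def category_entropy(proportions):
    """Shannon entropy of a category mix (higher = broader spending)."""
    p = np.asarray(proportions, dtype=float)
    p = p[p > 0]  # 0 * log(0) is treated as 0
    return float(-(p * np.log(p)).sum())

# Two customers with identical totals but very different mixes
concentrated = [0.9, 0.05, 0.05]   # spends mostly in one category
broad = [0.34, 0.33, 0.33]         # spreads spend evenly

print(category_entropy(concentrated))  # low entropy
print(category_entropy(broad))         # high entropy
```

Under this measure, two customers with the same monetary total can sit at opposite ends of the diversity scale, which is exactly the nuance the segments surfaced.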
Choosing the Number of Segments Thoughtfully
Rather than fixing the number of clusters arbitrarily, I evaluated several options and looked for a balance between compactness, separation, and interpretability.
The goal wasn’t to find a mathematically “perfect” number, but a structure that was stable, explainable, and useful.
Code snapshot: evaluating clustering choices
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, davies_bouldin_score

range_n_clusters = range(3, 8)
sil_scores, dbi_scores, inertias = [], [], []

for k in range_n_clusters:
    km = KMeans(n_clusters=k, random_state=42, n_init=10)
    km.fit(X)  # X is the reduced feature space (e.g., PCA/SVD components)
    labels = km.predict(X)
    inertias.append(km.inertia_)
    sil_scores.append(silhouette_score(X, labels))
    dbi_scores.append(davies_bouldin_score(X, labels))
This snippet shows how I compared multiple clustering metrics across different values of k, focusing on robustness and interpretability rather than optimizing a single score.
Lifestyles Change Over Time — and That’s the Point
One of the most interesting parts of the project was examining how customers move between segments over time.
Rather than abrupt jumps, transitions were often gradual. Customers tended to pass through intermediate states—declining frequency, shrinking diversity—before becoming inactive.
This temporal view reinforced a key insight:
Customer segments are not static labels; they are evolving states.
This figure shows how customers gradually move between behavioral states over time.
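Transitions like these can be summarized as a row-normalized transition matrix between consecutive periods. The segment labels and assignments below are toy examples, not the project's actual segments:

```python
import pandas as pd

# Toy segment assignments in two consecutive quarters (illustrative labels)
seg = pd.DataFrame({
    "q1": ["active", "active", "declining", "declining", "inactive"],
    "q2": ["active", "declining", "declining", "inactive", "inactive"],
})

# Row-normalized: each row gives P(segment in Q2 | segment in Q1)
transitions = pd.crosstab(seg["q1"], seg["q2"], normalize="index")
print(transitions)
```

Reading down the diagonal shows how "sticky" each state is, while the off-diagonal mass reveals the gradual drift—for example, customers passing through a declining state before going inactive.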
A Note on Modeling Choices
This project was intentionally designed to favor interpretability, stability, and behavioral insight over model complexity. Rather than pursuing more elaborate algorithms, I focused on approaches that made trade-offs explicit and results explainable—especially important in contexts where decisions affect real customers. The goal was not to build a production-ready system, but to explore how far thoughtful representation, careful aggregation, and unsupervised structure can go in revealing meaningful patterns in real-world data.
Final Reflection
This project reinforced several lessons for me:
- structure matters more than raw volume
- interpretability is as important as accuracy
- temporal context reveals what static snapshots hide
Most importantly, it showed me the value of letting data describe behavior on its own terms. When we do that, we get insights that are not only more accurate, but also more empathetic.