Posts

Understanding Customer Behavior Beyond Individual Transactions

Why I Chose This Project
This project started with a very personal frustration. During a family trip to London, I used my credit card to tap in and out of the Tube. The first few rides worked perfectly: fast, cashless, effortless. Then, suddenly, my card started getting declined for very small transit charges. I had plenty of available credit, and nothing felt suspicious from my point of view; it was just normal travel behavior. From the system’s perspective, though, my behavior had changed: a new country, repeated small transactions, and a shift away from my usual spending pattern. That gap between how people live and how systems interpret behavior became the motivation for this project. The core question I wanted to explore was simple: can customer behavior be understood more meaningfully through patterns than through individual transactions or rigid rules?
The Approaches I Chose (and Why)
Before looking at results, it’s important to explain the approaches I intentionally selected, becau...

Recreating a Classic Result: ISOMAP, Eigenfaces, and a Simple Face Recognition Experiment

An attempt to reproduce, explore, and truly understand a few classic ideas in dimensionality reduction and face analysis by doing them myself.
Some ideas in data science are so influential that you keep encountering them in papers, lectures, and references, but they can still feel abstract until you actually work through them yourself. In this project I set out to recreate the ISOMAP result originally presented in a high-impact Science paper, and then extend that exploration to related ideas like eigenfaces and a very simple form of face recognition. Rather than focusing on implementation details, this post is about what I observed, what started to make sense visually, and why these ideas suddenly felt intuitive once I saw the results.
Seeing High-Dimensional Data Differently
Face images are deceptively complex. A single grayscale image can be represented as thousands of numbers, one per pixel. When you stack hundreds of such images together, the resulting space is extremely high-dimensional, and human intuition stru...

Image Compression with Clustering: What I learned

TL;DR: I used K-means to shrink images by grouping similar colors, and it turns out you don’t need many colors to keep an image looking good (and some choices run a lot faster than others). This post focuses on intuition and observations rather than implementation details, out of respect for academic integrity and future students.
Why This Was Interesting
In one of my Computational Data Analysis (CDA) homework assignments, I explored how a classic idea from data science, clustering, can be used for image compression. A digital color image is made up of millions of pixels, and each pixel has three numbers describing its Red, Green, and Blue (RGB) values. The key idea behind this homework is simple: instead of storing every exact color, we can group similar colors together and represent them with a smaller set of “representative” colors. This is exactly what K-means clustering does.
Each pixel is treated as a point in color space
Similar colors are group...

My Georgia Tech OMSA Year 2 Fall term Takeaway (Fall25)

By Fall 2025, I felt more grounded in OMSA. The self-doubt from my first year hadn’t completely disappeared, but I was more comfortable with the program’s expectations and pace. This term focused heavily on theoretical depth and statistical rigor, and it pushed me to think more carefully about assumptions, modeling choices, and interpretation.
ISYE 6740 – Computational Data Analysis
ISYE 6740 is a mathematically intensive course that dives deep into the foundations of machine learning and data analysis. It draws heavily on calculus, linear algebra, probability, statistics, and optimization, and emphasizes understanding why algorithms work, not just how to apply them. Topics such as dimensionality reduction, clustering, classification, and optimization are treated from a theoretical perspective, often involving careful derivations and proofs. Prof. Xie is an exceptional instructor: wise, incredibly sharp, and remarkably clear. She has a rare ability to explain mathematically in...

My Georgia Tech OMSA Year 1 Takeaway (Fall24 - Summer25)

My first year in Georgia Tech’s OMSA program spanned three very different terms. As a full-time student, each term served a different purpose for me: survival, recovery, and rebuilding momentum. Looking back, Year 1 wasn’t just about learning analytics; it was about learning how to pace myself in a demanding program and manage intensity and self-doubt while still moving forward. Also, if you’re wondering why this blog hasn’t been updated in a while… yes, you can absolutely blame OMSA 😉
Fall 2024: Learning to Survive the First Term
ISYE 6501 – Introduction to Analytics Modeling
One of my favorite courses in OMSA so far. Prof. Sokol is incredibly dedicated, and the TAs are consistently responsive and helpful with both homework and conceptual questions. The course provides a strong, well-structured foundation in analytics modeling and introduces multiple ways of thinking about data-driven decision making. I learned a tremendous amount, and it significantly shaped how I app...

The statistics of competition results in Excel

Recently, my son participated in a local academic competition alongside other kids in the same grade from various schools. During the group contest, held in a single classroom, his team of four kids from the same class felt confident as they competed against teams from other schools. To their surprise, however, their school didn’t secure a spot in the top eight, and a team they had considered less capable outperformed them. As a data-enthusiast mom, I’m eager to dig into the reasons behind this unexpected outcome. I'm going to divide the participants into 4 groups: Group 1: participants from school A; Group 2: participants from school B; Group 3: participants from school C; Group 4: all the participants. First, I'm going to study the descriptive statistics of each group's scores. Descriptive statistics refer to the analysis, summary, and presentation of findings related to a data set derived from a sample or the entire population. These statistics help us understand and organize char...
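The per-group descriptive statistics described in this excerpt can also be sketched outside Excel. The Python snippet below is only an illustration: the scores are invented placeholders, not the competition data, and the `describe` helper and group labels are hypothetical names. It uses the standard-library `statistics` module, and the values should correspond to what Excel's `AVERAGE`, `MEDIAN`, `STDEV.S`, `MIN`, and `MAX` return for the same cells.

```python
import statistics

# Hypothetical scores, invented purely for illustration
# (these are NOT the actual competition results).
scores_by_group = {
    "School A": [70, 80, 90, 100],
    "School B": [60, 75, 85, 100],
    "School C": [50, 65, 80, 85],
}

# Group 4 pools every participant from the three schools.
all_scores = [s for group in scores_by_group.values() for s in group]
scores_by_group["All participants"] = all_scores


def describe(scores):
    """Return basic descriptive statistics for one group's scores."""
    return {
        "count": len(scores),
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "stdev": statistics.stdev(scores),  # sample standard deviation
        "min": min(scores),
        "max": max(scores),
    }


summary = {name: describe(scores) for name, scores in scores_by_group.items()}

for name, stats in summary.items():
    print(name, stats)
```

Comparing the mean and median within a group shows whether a few extreme scores are pulling the average, and the standard deviation shows how spread out a team's scores are, which is one way a team that looks weaker on average can still place higher.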