Using Pandas GroupBy to Compare Tipping Behaviour

When you first encounter GroupBy in Pandas, it can feel abstract. The mental model is simple: split your data into groups, run a calculation on each group, then combine the results.
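
To make the three steps concrete, here is a minimal sketch that performs split, apply and combine by hand and then compares the result with the one-call version. The DataFrame `df` and its values are made up for illustration and are not the Tips dataset; the names `pieces`, `means`, `by_hand` and `one_liner` are likewise just labels chosen for this example.

```python
import pandas as pd

# Toy data invented for illustration; not the real Tips dataset.
df = pd.DataFrame({
    "smoker": ["Yes", "No", "Yes", "No", "No"],
    "tip": [3.0, 2.0, 5.0, 4.0, 3.0],
})

# Split: one sub-DataFrame per distinct value of "smoker".
pieces = {key: group for key, group in df.groupby("smoker")}

# Apply: compute the mean tip within each piece.
means = {key: group["tip"].mean() for key, group in pieces.items()}

# Combine: assemble the per-group results into a single Series.
by_hand = pd.Series(means, name="tip")

# The usual one-call form performs the same three steps for you.
one_liner = df.groupby("smoker")["tip"].mean()
```

Both `by_hand` and `one_liner` hold the same per-group means, which is the point: groupby is shorthand for a loop you could write yourself.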

The question

Using the classic Tips dataset: do smokers tip differently from non-smokers?

The code

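import seaborn as sns

# Load the Tips dataset as a DataFrame (fetched online on first use).
tips = sns.load_dataset("tips")

# Split by "smoker", then take the mean tip within each group.
tips.groupby("smoker")["tip"].mean()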

This returned:

No     2.99
Yes    3.00
Name: tip, dtype: float64

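On average, smokers tipped slightly more, but the difference is tiny: a cent or two on a tip of about three dollars. A gap that small between two means is not, on its own, evidence that the groups behave differently.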
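
To put a number on "tiny", the sketch below takes the two rounded means printed above and computes the absolute and relative gap. It assumes those rounded figures as input instead of recomputing them from the dataset, so treat the result as approximate; `gap` and `relative_gap` are names chosen for this example.

```python
import pandas as pd

# Rounded group means as printed above (not recomputed from the dataset).
means = pd.Series({"No": 2.99, "Yes": 3.00}, name="tip")

# Absolute difference in dollars between the two groups.
gap = means["Yes"] - means["No"]

# The same difference expressed as a fraction of the non-smoker mean.
relative_gap = gap / means["No"]

print(f"gap: ${gap:.2f}, relative: {relative_gap:.2%}")
```

With these inputs the relative gap comes out at roughly a third of one percent.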

Why GroupBy is useful

Without GroupBy you’d write something like:

tips[tips["smoker"] == "No"]["tip"].mean()
tips[tips["smoker"] == "Yes"]["tip"].mean()

That works for two groups. It gets unwieldy when you have many categories, because each one needs its own filter line, or when you want several aggregations at once.

GroupBy scales:

tips.groupby("smoker").agg(
    avg_tip=("tip", "mean"),
    avg_bill=("total_bill", "mean"),
    count=("tip", "count"),
)

One call, multiple metrics, any number of groups.
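
The same call also extends to more than one grouping column. A minimal sketch, assuming a small made-up table that reuses the Tips column names (the values are illustrative and not taken from the dataset):

```python
import pandas as pd

# Toy data invented for illustration; values are not from the Tips dataset.
df = pd.DataFrame({
    "day": ["Thur", "Thur", "Fri", "Fri", "Sat", "Sat"],
    "smoker": ["No", "Yes", "No", "Yes", "No", "No"],
    "total_bill": [10.0, 20.0, 30.0, 40.0, 50.0, 70.0],
    "tip": [1.0, 3.0, 5.0, 7.0, 8.0, 10.0],
})

# Group by two columns at once; each (day, smoker) pair becomes one row.
summary = df.groupby(["day", "smoker"]).agg(
    avg_tip=("tip", "mean"),
    avg_bill=("total_bill", "mean"),
    count=("tip", "count"),
)

# Look up a single group by its (day, smoker) key.
sat_non_smokers = summary.loc[("Sat", "No")]
```

Every (day, smoker) pair present in the data gets its own row in `summary`, and no per-category filter had to be written.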

Takeaway

GroupBy is the workhorse of exploratory data analysis. Whenever you catch yourself filtering the same column repeatedly to compare groups, reach for groupby instead.