Using Pandas GroupBy to Compare Tipping Behaviour

A quick walkthrough of groupby aggregations on the Tips dataset — comparing average tips between smokers and non-smokers.

  • pandas
  • python
  • data-analysis

When you first encounter GroupBy in Pandas, it can feel abstract. The mental model is simple: split your data into groups, run a calculation on each group, then combine the results.
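To make the split, apply, combine idea concrete, here is a minimal sketch that does the three steps by hand and then compares the result with a single groupby call. The frame `df` and its `group` and `value` columns are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data, not part of the Tips dataset.
df = pd.DataFrame({
    "group": ["a", "b", "a", "b", "a"],
    "value": [1.0, 2.0, 3.0, 4.0, 8.0],
})

# Split: one sub-frame per distinct key.
pieces = {key: df[df["group"] == key] for key in df["group"].unique()}

# Apply: run the calculation on each piece.
means = {key: piece["value"].mean() for key, piece in pieces.items()}

# Combine: gather the per-group results into one Series.
manual = pd.Series(means, name="value").sort_index()

# groupby performs the same three steps in a single call.
grouped = df.groupby("group")["value"].mean()
```

Both `manual` and `grouped` hold one mean per group, which is all groupby is doing behind the scenes.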

The question

Using the classic Tips dataset: do smokers tip differently from non-smokers?

The code

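import seaborn as sns

# Load the Tips dataset as a pandas DataFrame (downloaded on first use)
tips = sns.load_dataset("tips")

# Mean tip for each value of the "smoker" column
tips.groupby("smoker")["tip"].mean()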

This returned (rounded to two decimals):

No     2.99
Yes    3.01
Name: tip, dtype: float64

On average, smokers tipped slightly more, but the difference is tiny (about two cents).
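A bare mean says nothing about how much tips vary inside each group, so it helps to look at the gap next to the spread. The sketch below assumes a small made-up frame in place of the real dataset; the numbers are illustrative only.

```python
import pandas as pd

# Hypothetical rows shaped like the Tips dataset (values are made up).
df = pd.DataFrame({
    "smoker": ["No", "No", "No", "Yes", "Yes", "Yes"],
    "tip": [2.0, 3.0, 4.0, 2.5, 3.0, 3.8],
})

# Mean, standard deviation, and row count per group in one pass.
summary = df.groupby("smoker")["tip"].agg(["mean", "std", "count"])

# Gap between the two group means.
gap = summary.loc["Yes", "mean"] - summary.loc["No", "mean"]
```

If `gap` is small compared with the `std` column, the two groups overlap heavily and the difference in means is probably not worth reading much into.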

Why GroupBy is useful

Without GroupBy you’d write something like:

tips[tips["smoker"] == "No"]["tip"].mean()
tips[tips["smoker"] == "Yes"]["tip"].mean()

That works for two groups. It falls apart when you have many categories or want multiple aggregations at once.
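Here is a rough illustration of where the filtering approach gets awkward. It assumes a made-up frame with a `day` column alongside `smoker`; the values are invented for the example.

```python
import pandas as pd

# Hypothetical rows shaped like the Tips dataset (values are made up).
df = pd.DataFrame({
    "day": ["Thur", "Fri", "Sat", "Sun", "Sat", "Sun"],
    "smoker": ["No", "Yes", "Yes", "No", "No", "No"],
    "tip": [2.0, 3.0, 4.0, 5.0, 2.0, 3.0],
})

# Manual filtering: one filter per category, here wrapped in a loop.
manual = {
    day: df.loc[df["day"] == day, "tip"].mean()
    for day in df["day"].unique()
}

# groupby: the same result without naming a single category.
by_day = df.groupby("day")["tip"].mean()

# Grouping by two columns needs no extra filtering logic either.
by_day_smoker = df.groupby(["day", "smoker"])["tip"].mean()
```

The manual version needs a loop as soon as there are more than a couple of categories, and a nested loop for two columns. The groupby version only changes the list of keys.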

GroupBy scales:

tips.groupby("smoker").agg(
    avg_tip=("tip", "mean"),
    avg_bill=("total_bill", "mean"),
    count=("tip", "count"),
)

One call, multiple metrics, any number of groups.
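In this named-aggregation form, each keyword becomes an output column and each tuple is (input column, function), where the function can be a string name or a callable. A small sketch on made-up data, with `tip_range` as a hypothetical extra metric:

```python
import pandas as pd

# Hypothetical rows shaped like the Tips dataset (values are made up).
df = pd.DataFrame({
    "smoker": ["No", "No", "Yes", "Yes"],
    "total_bill": [10.0, 20.0, 30.0, 50.0],
    "tip": [1.0, 3.0, 4.0, 6.0],
})

summary = df.groupby("smoker").agg(
    avg_tip=("tip", "mean"),                         # built-in, by name
    max_bill=("total_bill", "max"),                  # a different input column
    tip_range=("tip", lambda s: s.max() - s.min()),  # custom callable
    count=("tip", "count"),
)
```

The result is a DataFrame indexed by `smoker`, with one column per keyword in the order given.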

Takeaway

GroupBy is the workhorse of exploratory data analysis. Whenever you catch yourself filtering the same column repeatedly to compare groups, reach for groupby instead.