Setting Up a Python Environment for Data Science

Before analysing data, you need a working environment. This is the setup I followed when I started: install the tools, verify they run, then move on to learning. The steps below cover Windows, macOS, and Linux.

What you are installing

Tool	Purpose
Python	The language you write code in
pip	Installs third-party packages
venv	Keeps project packages isolated
Jupyter	Notebook interface — run code cell by cell
pandas	Loads and manipulates tables of data
matplotlib / seaborn	Creates charts and plots

Step 1 — Install Python

Windows

Download the installer from python.org/downloads
Run it and check “Add python.exe to PATH” at the bottom of the first screen — easy to miss, but important
Click Install Now

Open Command Prompt or PowerShell and check:

python --version

You should see something like Python 3.12.x.

On Windows the command is usually python, not python3. If python does not work, try py --version instead.

macOS

Download from python.org or use Homebrew:

brew install python
python3 --version

Linux

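Python is often pre-installed, but on Ubuntu and Debian the venv and pip modules usually ship as separate packages. If any of them is missing: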

sudo apt update && sudo apt install python3 python3-venv python3-pip   # Ubuntu / Debian
python3 --version
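The version check can also be done from inside Python, which confirms that the interpreter actually starts and not just that the command exists. This is a minimal sketch: the helper name version_ok and the (3, 9) minimum are my own illustrative assumptions, not requirements stated anywhere in this guide. Save it to a file of any name and run it with python (or python3).

```python
import sys

# Assumed minimum for this sketch only; adjust it to whatever your packages need.
MIN_VERSION = (3, 9)


def version_ok(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


# sys.version starts with the version number, e.g. "3.12.x (...)"
print("Python", sys.version.split()[0])
print("Meets the assumed minimum:", version_ok())
```

Comparing tuples instead of strings avoids the trap where "3.10" sorts before "3.9" alphabetically.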

Step 2 — Create a virtual environment

A virtual environment is a self-contained folder that holds the packages for one project, so their versions cannot clash with those of other projects.

Windows (Command Prompt or PowerShell):

mkdir my-data-science
cd my-data-science
python -m venv venv
venv\Scripts\activate

macOS / Linux:

mkdir my-data-science
cd my-data-science
python3 -m venv venv
source venv/bin/activate

When active, your terminal shows (venv) at the start.

To leave the environment later, run deactivate.
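If you are ever unsure whether the environment is really active, Python can tell you. The sketch below relies on the fact that the venv module points sys.prefix at the environment folder while sys.base_prefix keeps pointing at the original installation. The function name in_virtualenv is a hypothetical helper of mine.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a venv-style environment."""
    # Outside a virtual environment the two prefixes are identical.
    return sys.prefix != sys.base_prefix


print("Interpreter path:", sys.executable)
print("Inside a virtual environment:", in_virtualenv())
```

With the environment active, the interpreter path should point inside the venv folder of your project.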

Step 3 — Upgrade pip and install packages

pip is Python’s package manager. Upgrade it first to avoid common warnings:

python -m pip install --upgrade pip

Use python -m pip rather than typing pip alone: it installs into the interpreter you are actually running, and it works the same way on every OS.

Then install the core data science stack:

python -m pip install pandas numpy matplotlib seaborn jupyter

Package	What it does
`pandas`	Reads CSV files into tables (DataFrames)
`numpy`	Fast maths on numbers and arrays
`matplotlib`	Base plotting library
`seaborn`	Easier, nicer statistical charts built on matplotlib
`jupyter`	Runs `.ipynb` notebook files in the browser
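To confirm what actually landed in the environment, a sketch like the following could be run while the environment is active. It uses only the standard library's importlib.metadata, so it works even if one of the packages failed to install. The helper name installed_versions is my own and not part of any package.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names exactly as they were passed to pip in the install command
PACKAGES = ["pandas", "numpy", "matplotlib", "seaborn", "jupyter"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if it is absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(PACKAGES).items():
    print(f"{name:12} {ver or 'NOT INSTALLED'}")
```

Reading the metadata does not import the packages, so this only shows that they are installed, not that they load correctly.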

Step 4 — Launch Jupyter

jupyter notebook

This opens a browser tab. Click New → Python 3 to create a notebook (depending on the Jupyter version, the entry may be labelled Python 3 (ipykernel)). Each grey box is a cell: type code, then press Shift + Enter to run it.

If jupyter is not recognised, try:

python -m jupyter notebook

Step 5 — Quick verification

Run this in your first cell:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

print("Setup complete")
print("Pandas version:", pd.__version__)

If no errors appear, your environment is ready.
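A successful import shows that a library is installed, not that it works end to end. As an optional extra check, the sketch below builds a three-row table and groups it, which exercises pandas a little more. The function name smoke_test and the sample values are made up for illustration.

```python
def smoke_test():
    """Return a short status string describing whether pandas can do real work."""
    try:
        import pandas as pd
    except ImportError as exc:
        return f"pandas is not importable: {exc}"

    # Tiny made-up table: two rows in group "a", one row in group "b"
    df = pd.DataFrame({"group": ["a", "a", "b"], "value": [1, 2, 3]})
    means = df.groupby("group")["value"].mean()
    return f"pandas OK, mean value per group: {means.to_dict()}"


print(smoke_test())
```

If the message starts with "pandas OK", the library can build a DataFrame and aggregate it, which is most of what the first analyses will need.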

Common issues

python or pip not found (Windows) — reinstall Python and make sure “Add python.exe to PATH” was checked. Close and reopen your terminal after installing.

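PowerShell blocks activation (Windows) — if venv\Scripts\activate fails with a message about running scripts being disabled, run this once in PowerShell (administrator rights are not needed for the CurrentUser scope):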

Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser

Or use Command Prompt instead of PowerShell.

pip not found (macOS / Linux) — use python3 -m pip instead of pip.

Permission errors — make sure your virtual environment is activated before installing. You should see (venv) in your prompt.

Yellow pip warnings — usually safe to ignore after upgrading pip. If a package fails, read the last few lines of the error; it often names the missing dependency.

Jupyter won’t open — try JupyterLab as an alternative with python -m jupyter lab (if that subcommand is missing, install it with python -m pip install jupyterlab), or reinstall with python -m pip install --upgrade jupyter.
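Most of the issues above come down to which executables the terminal can find. A small diagnostic along these lines may help narrow things down. It uses only the standard library, and the function name locate_tools is a hypothetical helper of mine.

```python
import shutil
import sys


def locate_tools(names=("python", "python3", "pip", "jupyter")):
    """Map each command name to its full path on PATH, or None if it is not found."""
    return {name: shutil.which(name) for name in names}


print("Running interpreter:", sys.executable)
for name, path in locate_tools().items():
    print(f"{name:8} -> {path or 'not found on PATH'}")
```

With the environment active, the paths for python, pip, and jupyter should all point inside your venv folder. If one points elsewhere or is missing, that is the tool to reinstall or call through python -m.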

What comes next

With Python running and libraries installed, you are ready to load a real dataset and explore it. That is where pandas, charts, and groupby come in — the fun part.