Table of Contents
- Installation
- What is a NumPy Array?
- Creating NumPy Arrays
- Key Array Attributes
- Array Operations
- Indexing and Slicing
- Shape Manipulation
- Statistical Functions
- Conclusion
- References
Installation
Before diving in, you’ll need to install NumPy. The easiest way is using Python’s package manager, pip, or conda (if you use Anaconda).
Using pip:
Open your terminal or command prompt and run:
pip install numpy
Using conda (Anaconda/Miniconda):
conda install numpy
To verify installation, import NumPy in a Python shell and check the version:
import numpy as np
print(np.__version__) # Output: e.g., 1.26.0 (version may vary)
(Note: By convention, NumPy is imported as np to save keystrokes.)
What is a NumPy Array?
A NumPy array (ndarray) is a homogeneous, multidimensional grid of values (all elements must be of the same data type) with a fixed size. This is in contrast to Python lists, which are heterogeneous (can hold mixed data types) and dynamically resizable.
Key Advantages of NumPy Arrays:
- Performance: NumPy arrays are typically stored in a contiguous block of memory, which enables fast access and vectorized operations (no need for explicit Python loops).
- Multidimensional Support: Easily handle 1D (vectors), 2D (matrices), and higher-dimensional data.
- Rich Functionality: Built-in math, statistical, and linear algebra operations.
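As a rough illustration of the homogeneity and vectorization points above, here is a minimal sketch. The variable names are made up for this example.

```python
import numpy as np

# Mixed Python values are coerced to one common dtype on conversion.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)        # float64: the integers were upcast to floats

# A vectorized expression replaces an explicit Python loop.
values = np.arange(5)
squares_loop = [int(v) ** 2 for v in values]   # loop over elements
squares_vec = values ** 2                      # one array expression

print(squares_loop)       # [0, 1, 4, 9, 16]
print(squares_vec)        # [ 0  1  4  9 16]
```

Both approaches give the same numbers; the array expression simply pushes the loop into NumPy's compiled code.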
Creating NumPy Arrays
NumPy provides several ways to create arrays. Let’s explore the most common methods:
1. From Python Lists/Tuples: np.array()
Convert a Python list or tuple to a NumPy array using np.array().
import numpy as np
# 1D array from a list
list_1d = [1, 2, 3, 4, 5]
arr_1d = np.array(list_1d)
print(arr_1d) # Output: [1 2 3 4 5]
print(type(arr_1d)) # Output: <class 'numpy.ndarray'>
# 2D array from a list of lists
list_2d = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
arr_2d = np.array(list_2d)
print(arr_2d)
# Output:
# [[1 2 3]
# [4 5 6]
# [7 8 9]]
# Specify data type explicitly
arr_float = np.array([1, 2, 3], dtype=float)
print(arr_float) # Output: [1. 2. 3.]
2. Built-in Functions for Common Arrays
np.zeros(shape): Array of zeros
Creates an array filled with zeros of the specified shape (tuple of dimensions).
zeros_1d = np.zeros(5) # 1D array with 5 zeros
zeros_2d = np.zeros((3, 4)) # 2D array (3 rows, 4 columns)
print(zeros_2d)
# Output:
# [[0. 0. 0. 0.]
# [0. 0. 0. 0.]
# [0. 0. 0. 0.]]
np.ones(shape): Array of ones
Similar to zeros(), but fills with ones.
ones_2d = np.ones((2, 3), dtype=int) # Integer ones
print(ones_2d)
# Output:
# [[1 1 1]
# [1 1 1]]
np.arange(start, stop, step): Sequence of numbers
Similar to Python’s range(), but returns a NumPy array.
arr_range = np.arange(0, 10, 2) # Start=0, stop=10 (exclusive), step=2
print(arr_range) # Output: [0 2 4 6 8]
np.linspace(start, stop, num): Evenly spaced numbers
Generates num evenly spaced values between start and stop (inclusive).
arr_lin = np.linspace(0, 1, 5) # 5 values between 0 and 1
print(arr_lin) # Output: [0. 0.25 0.5 0.75 1. ]
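The difference between np.arange() (step-based, stop excluded) and np.linspace() (count-based, stop included by default) is easy to mix up. The sketch below puts them side by side; the endpoint parameter is not covered above but is part of np.linspace().

```python
import numpy as np

# linspace takes a count and includes the stop value by default.
with_stop = np.linspace(0, 1, 5)
# endpoint=False excludes the stop value, much like arange does.
without_stop = np.linspace(0, 1, 5, endpoint=False)
# arange takes a step; the stop value is excluded.
stepped = np.arange(0, 1, 0.25)

print(with_stop)      # 5 values, the last one is 1.0
print(without_stop)   # 5 values with step 0.2, stopping before 1.0
print(stepped)        # 4 values: 0, 0.25, 0.5, 0.75
```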
np.random Module: Random Arrays
Create arrays with random values (useful for simulations).
# Random floats between 0 and 1 (3x3 array)
rand_arr = np.random.rand(3, 3)
print(rand_arr)
# Random integers from low (inclusive) to high (exclusive)
rand_int = np.random.randint(low=1, high=10, size=(2, 2))
print(rand_int) # Output: e.g., [[3 7], [5 2]] (values vary)
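The functions above belong to NumPy's older random interface. As an alternative not covered in the text, the sketch below uses the Generator interface via np.random.default_rng(), which also makes results reproducible through a seed. The seed value 42 is an arbitrary choice for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # the seed value 42 is arbitrary

rand_arr = rng.random((3, 3))                         # floats in [0, 1)
rand_int = rng.integers(low=1, high=10, size=(2, 2))  # integers in [1, 10)

# Re-creating the generator with the same seed reproduces the same values.
rng_again = np.random.default_rng(seed=42)
print(np.array_equal(rand_arr, rng_again.random((3, 3))))  # True
```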
Key Array Attributes
Every NumPy array has attributes that describe its structure. Let’s use a 2D array to explore them:
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
| Attribute | Description | Example Output |
|---|---|---|
| arr.shape | Tuple of array dimensions (rows, columns) | (3, 3) |
| arr.dtype | Data type of elements | int64 (or int32) |
| arr.size | Total number of elements | 9 |
| arr.ndim | Number of dimensions (axes) | 2 (2D array) |
| arr.itemsize | Size (in bytes) of one element | 8 (for int64) |
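The attributes in the table can be inspected directly. This short sketch also uses arr.nbytes, which is not listed above, to show how size and itemsize relate.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(arr.shape)     # (3, 3)
print(arr.ndim)      # 2
print(arr.size)      # 9
print(arr.dtype)     # platform-dependent default integer type
print(arr.itemsize)  # bytes per element, determined by the dtype

# Total bytes used by the data: size * itemsize.
print(arr.nbytes == arr.size * arr.itemsize)  # True
```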
Array Operations
NumPy simplifies numerical work through vectorization: operations apply to entire arrays without explicit loops.
1. Element-Wise Operations
Arithmetic operations (+, -, *, /, **) are applied element-wise:
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(a + b) # Output: [5 7 9]
print(a * 2) # Output: [2 4 6] (scalar multiplication)
print(a ** 2) # Output: [1 4 9] (element-wise squaring)
print(a / b) # Output: [0.25 0.4 0.5 ]
(Compare with Python lists: [1,2,3] + [4,5,6] returns [1,2,3,4,5,6] (concatenation), not element-wise addition.)
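The contrast with Python lists can be checked side by side. A minimal sketch, with illustrative variable names:

```python
import numpy as np

nums = [1, 2, 3]
arr = np.array(nums)

print(nums + nums)   # [1, 2, 3, 1, 2, 3]  (concatenation)
print(arr + arr)     # [2 4 6]             (element-wise addition)

print(nums * 2)      # [1, 2, 3, 1, 2, 3]  (repetition)
print(arr * 2)       # [2 4 6]             (element-wise multiplication)
```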
2. Matrix Operations
For linear algebra operations (e.g., dot product, matrix multiplication), use np.dot() or the @ operator:
# Dot product (1D arrays: sum of products)
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
dot_product = np.dot(a, b) # or a @ b
print(dot_product) # Output: 1*4 + 2*5 + 3*6 = 32
# Matrix multiplication (2D arrays)
mat1 = np.array([[1, 2], [3, 4]])
mat2 = np.array([[5, 6], [7, 8]])
mat_product = mat1 @ mat2 # or np.dot(mat1, mat2)
print(mat_product)
# Output:
# [[1*5 + 2*7, 1*6 + 2*8],
# [3*5 + 4*7, 3*6 + 4*8]]
# [[19 22]
# [43 50]]
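A common source of confusion is that * on 2D arrays is still element-wise, not matrix multiplication. The sketch below reuses the two matrices from the example above to show the difference.

```python
import numpy as np

mat1 = np.array([[1, 2], [3, 4]])
mat2 = np.array([[5, 6], [7, 8]])

elementwise = mat1 * mat2   # multiplies matching positions
matmul = mat1 @ mat2        # row-by-column matrix product

print(elementwise)
# [[ 5 12]
#  [21 32]]
print(matmul)
# [[19 22]
#  [43 50]]
```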
3. Universal Functions (ufuncs)
NumPy provides optimized math functions (ufuncs) that work element-wise on arrays:
arr = np.array([-2, -1, 0, 1, 2])
print(np.abs(arr)) # Absolute values: [2 1 0 1 2]
print(np.sin(arr)) # Sine of elements (radians)
print(np.log(arr[arr > 0])) # Logarithm (only positive elements)
print(np.exp(arr)) # Exponential (e^x)
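The expression arr[arr > 0] in the logarithm example combines a comparison with boolean indexing. Broken into steps, with illustrative variable names, it looks roughly like this:

```python
import numpy as np

arr = np.array([-2, -1, 0, 1, 2])

mask = arr > 0            # boolean array, True where the element is positive
positives = arr[mask]     # keeps only the elements where mask is True
logs = np.log(positives)  # natural log of 1 and 2

print(mask)
print(positives)          # [1 2]
print(logs)
```

Filtering first avoids taking the logarithm of zero or negative numbers, which would produce -inf or nan along with a runtime warning.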
Indexing and Slicing
NumPy arrays support indexing (access single elements) and slicing (access subarrays), similar to Python lists but extended for multidimensional data.
1. 1D Array Indexing/Slicing
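Uses the same syntax as Python lists (one difference: a NumPy slice is a view of the original data, whereas a list slice is a copy):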
arr = np.arange(10) # [0, 1, 2, ..., 9]
print(arr[3]) # Index 3: Output: 3
print(arr[2:5]) # Slice 2-4: Output: [2 3 4]
print(arr[:-2]) # All elements except last 2: Output: [0 1 2 3 4 5 6 7]
print(arr[::2]) # Step 2: Output: [0 2 4 6 8]
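One behavior worth checking for yourself: basic slices of a NumPy array share memory with the original, while list slices are independent copies. A minimal sketch, with illustrative variable names:

```python
import numpy as np

arr = np.arange(10)
view = arr[2:5]          # basic slice: shares memory with arr
view[0] = 99
print(arr[2])            # 99: the original changed

independent = arr[2:5].copy()   # explicit copy
independent[0] = -1
print(arr[2])            # 99: the original is unaffected

lst = list(range(10))
lst_slice = lst[2:5]     # list slices are always copies
lst_slice[0] = 99
print(lst[2])            # 2
```

Call .copy() whenever you need a slice you can modify without touching the source array.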
2. 2D Array Indexing/Slicing
Use arr[row, col] (comma-separated indices for rows and columns):
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# Access single element (row 1, column 2)
print(arr[1, 2]) # Output: 6
# Access entire row (row 0)
print(arr[0]) # Output: [1 2 3]
# Access entire column (column 1)
print(arr[:, 1]) # Output: [2 5 8]
# Slice subarray (rows 0-1, columns 1-2)
print(arr[0:2, 1:3])
# Output:
# [[2 3]
# [5 6]]
# Fancy indexing (select specific rows/columns)
print(arr[[0, 2], [1, 2]]) # (0,1) and (2,2): Output: [2 9]
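Fancy indexing with two lists pairs the indices up rather than selecting a rectangular block. The sketch below contrasts that with np.ix_() and with a boolean mask, neither of which is covered above; treat it as one possible illustration.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Paired index lists select individual elements: (0, 1) and (2, 2).
pairs = arr[[0, 2], [1, 2]]
print(pairs)             # [2 9]

# np.ix_ selects the full cross product: rows 0 and 2, columns 1 and 2.
block = arr[np.ix_([0, 2], [1, 2])]
print(block)
# [[2 3]
#  [8 9]]

# A boolean mask selects the elements that satisfy a condition.
print(arr[arr > 5])      # [6 7 8 9]
```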
Shape Manipulation
NumPy lets you change an array's shape or number of dimensions without changing its data; where possible, the result is a view of the original rather than a copy.
1. reshape(shape)
Returns an array with the specified shape, as a view of the original data when possible (the new shape must have the same number of elements as the original):
arr = np.arange(12) # [0, 1, ..., 11] (shape: (12,))
reshaped = arr.reshape(3, 4) # 3 rows, 4 columns
print(reshaped.shape) # Output: (3, 4)
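Two details of reshape() are worth trying out: passing -1 lets NumPy infer one dimension, and an incompatible shape raises an error. A short sketch:

```python
import numpy as np

arr = np.arange(12)

# -1 asks NumPy to infer that dimension from the total element count.
inferred = arr.reshape(3, -1)
print(inferred.shape)    # (3, 4)

# A shape whose element count differs from 12 raises ValueError.
try:
    arr.reshape(5, 3)
except ValueError as exc:
    print("reshape failed:", exc)
```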
2. ravel() vs flatten()
Convert a multidimensional array to 1D:
- ravel(): Returns a view of the original array when possible (modifications to a view affect the original).
- flatten(): Always returns a copy of the array (modifications do not affect the original).
arr = np.array([[1, 2], [3, 4]])
raveled = arr.ravel()
flattened = arr.flatten()
raveled[0] = 99
print(arr) # Original array modified: [[99 2], [3 4]]
flattened[0] = 55
print(arr) # Original array unchanged: [[99 2], [3 4]]
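Whether ravel() returns a view depends on the array's memory layout. The sketch below uses np.shares_memory(), which is not mentioned above, to check this, including a case where ravel() has to copy.

```python
import numpy as np

arr = np.array([[1, 2], [3, 4]])

print(np.shares_memory(arr, arr.ravel()))     # True: view of contiguous data
print(np.shares_memory(arr, arr.flatten()))   # False: always a copy

# The transpose of this array is not C-contiguous, so ravel() must copy.
print(np.shares_memory(arr, arr.T.ravel()))   # False
```

If code relies on modifying the original through the flattened result, it is safer to verify the sharing explicitly than to assume it.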
3. transpose()
Swaps axes of a multidimensional array (e.g., rows ↔ columns for 2D arrays):
arr = np.array([[1, 2], [3, 4]])
transposed = arr.transpose() # or arr.T
print(transposed)
# Output:
# [[1 3]
# [2 4]]
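For arrays with more than two dimensions, transpose() can take an explicit axis order. A small sketch with a hypothetical 3D array named cube:

```python
import numpy as np

cube = np.arange(24).reshape(2, 3, 4)

print(cube.T.shape)                    # (4, 3, 2): all axes reversed
print(cube.transpose(1, 0, 2).shape)   # (3, 2, 4): first two axes swapped

# Transposing returns a view, so no data is copied.
print(np.shares_memory(cube, cube.T))  # True
```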
Statistical Functions
NumPy has built-in functions for common statistical operations, with support for axis-specific computations.
Example Array:
data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
Key Statistical Functions:
| Function | Description | Example (axis=0: one result per column, axis=1: one result per row) |
|---|---|---|
| np.sum() | Sum of elements | data.sum(axis=0) → [12 15 18] |
| np.mean() | Mean (average) | data.mean(axis=1) → [2. 5. 8.] |
| np.median() | Median | np.median(data) → 5.0 |
| np.std() | Standard deviation (population, by default) | np.std(data) → 2.581988897471611 |
| np.min(), np.max() | Minimum/maximum | data.max(axis=0) → [7 8 9] |
| np.corrcoef() | Correlation matrix | np.corrcoef(data) (for 2D data, each row is treated as a variable) |
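The axis argument is easiest to understand by comparing results on the same array. The sketch below also shows the keepdims and ddof parameters, which are not covered in the table.

```python
import numpy as np

data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(data.sum())          # 45: no axis reduces over every element
print(data.sum(axis=0))    # [12 15 18]: one sum per column
print(data.sum(axis=1))    # [ 6 15 24]: one sum per row

# keepdims=True keeps the reduced axis with length 1.
print(data.mean(axis=1, keepdims=True).shape)   # (3, 1)

# ddof=1 gives the sample standard deviation (divides by n - 1).
print(np.std(data, ddof=1))
```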
Conclusion
NumPy is the foundation of numerical computing in Python, offering efficient arrays, vectorized operations, and a rich set of tools for data manipulation. By mastering NumPy, you’ll be better equipped to use libraries built on top of it, such as pandas (data analysis) and scikit-learn (machine learning).
Start practicing with small datasets, experiment with array operations, and explore the NumPy documentation for advanced features like broadcasting, masking, and linear algebra.
References
- NumPy Official Documentation
- “Python for Data Analysis” by Wes McKinney (creator of Pandas)
- NumPy Tutorial (Real Python)
- Kaggle NumPy Tutorial