Table of Contents
- Why Use Python’s Standard Library for Automation?
- File & Directory Management
- Data Processing
- Scheduling Tasks
- Email Automation
- Web Scraping
- System Tasks
- Best Practices
- Conclusion
- References
Why Use Python’s Standard Library for Automation?
Python’s standard library is a “batteries-included” set of modules maintained by the Python core team. For automation, it offers key advantages:
- No extra dependencies: Scripts work out of the box on any Python installation (no pip install needed).
- Portability: Run scripts on Windows, macOS, or Linux without environment issues.
- Reliability: Modules are tested rigorously and updated with Python versions.
Let’s dive into specific modules and use cases!
File & Directory Management
Organizing files, renaming batches, or moving data between folders are common automation tasks. The standard library has robust tools for this.
os Module: Basic File Operations
The os module interacts with the operating system, handling paths, directories, and file metadata.
Example: Rename All .txt Files in a Folder
Suppose you want to add a prefix (e.g., backup_) to all text files in your “Documents” folder:
import os
# Define the target directory
target_dir = os.path.expanduser("~/Documents")  # Works on Windows/macOS/Linux
# List all files in the directory
for filename in os.listdir(target_dir):
    # Check if the file is a .txt file
    if filename.endswith(".txt"):
        # Create old and new paths
        old_path = os.path.join(target_dir, filename)
        new_filename = f"backup_{filename}"
        new_path = os.path.join(target_dir, new_filename)
        # Rename the file
        os.rename(old_path, new_path)
        print(f"Renamed: {filename} → {new_filename}")
Key Functions:
- os.listdir(path): List the entries (files and subfolders) in a directory.
- os.path.join(path, *paths): Safely combine paths (avoids issues with / vs \).
- os.rename(old, new): Rename/move files.
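One caveat with the rename example above: running it a second time would prefix the same files again (backup_backup_...), and os.listdir also returns subfolders. A minimal sketch of a guarded version follows. The helper name add_prefix is hypothetical, and a temporary directory stands in for a real folder so nothing important is touched.

```python
import os
import tempfile

def add_prefix(target_dir, prefix="backup_", ext=".txt"):
    """Prefix matching files once; return the new names (hypothetical helper)."""
    renamed = []
    for filename in sorted(os.listdir(target_dir)):
        old_path = os.path.join(target_dir, filename)
        # Skip subfolders, other extensions, and files that already have the prefix
        if not os.path.isfile(old_path):
            continue
        if not filename.endswith(ext) or filename.startswith(prefix):
            continue
        new_filename = prefix + filename
        os.rename(old_path, os.path.join(target_dir, new_filename))
        renamed.append(new_filename)
    return renamed

# Demo in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    for name in ("notes.txt", "todo.txt", "image.png"):
        open(os.path.join(tmp, name), "w").close()
    first_run = add_prefix(tmp)
    second_run = add_prefix(tmp)  # nothing left to rename
    final_names = sorted(os.listdir(tmp))
```

Returning the list of renamed files also makes the function easy to check before pointing it at a real directory.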
pathlib: Modern Path Handling
Python 3.4 introduced pathlib, an object-oriented alternative to os.path for cleaner path manipulation.
Example: Organize Downloads by File Type
Use pathlib to sort files in your “Downloads” folder into subfolders (e.g., Images/, PDFs/):
from pathlib import Path
# Define the downloads directory
downloads_dir = Path.home() / "Downloads"  # Equivalent to ~/Downloads
# Map file extensions to subfolder names
file_types = {
    ".jpg": "Images",
    ".png": "Images",
    ".pdf": "PDFs",
    ".docx": "Documents",
    ".txt": "Documents"
}
# Create subfolders if they don't exist
for ext, folder in file_types.items():
    (downloads_dir / folder).mkdir(exist_ok=True)  # Create folder if missing
# Move files to their respective folders
for file in downloads_dir.iterdir():
    if file.is_file():  # Skip directories
        ext = file.suffix.lower()  # Get file extension (e.g., .txt)
        if ext in file_types:
            target_folder = downloads_dir / file_types[ext]
            file.rename(target_folder / file.name)  # Move the file
            print(f"Moved: {file.name} → {target_folder.name}")
Why pathlib? It uses intuitive syntax (e.g., / for path joining) and avoids string manipulation errors.
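To make the comparison concrete, here is a small sketch of the path attributes that replace manual string handling. It uses PurePosixPath, which never touches the filesystem and behaves the same on every OS. The path itself is made up for illustration.

```python
import os.path
from pathlib import PurePosixPath

# A made-up path; PurePosixPath only manipulates the text, it does no file I/O
report = PurePosixPath("/home/user") / "Downloads" / "report.final.PDF"

name = report.name                    # file name with extension
stem = report.stem                    # file name without the last extension
suffix = report.suffix.lower()        # last extension, normalized to lowercase
parent = report.parent                # containing folder
renamed = report.with_suffix(".txt")  # same folder and stem, new extension

# The os.path equivalent of `suffix` needs two function calls and a tuple index
legacy_suffix = os.path.splitext(os.path.basename(str(report)))[1].lower()
```

In a real script, Path (rather than PurePosixPath) offers the same attributes plus filesystem methods such as iterdir() and mkdir().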
shutil: Advanced File Operations
For copying, archiving, or deleting directories, use shutil (short for “shell utilities”).
Example: Backup a Directory to a ZIP File
import shutil
from pathlib import Path
import datetime
# Define source and backup paths
source_dir = Path.home() / "Projects"
backup_dir = Path.home() / "Backups"
backup_dir.mkdir(exist_ok=True)
# Create a timestamped ZIP filename (e.g., backup_20240520.zip)
timestamp = datetime.datetime.now().strftime("%Y%m%d")
zip_filename = f"backup_{timestamp}.zip"
zip_path = backup_dir / zip_filename
# Create the ZIP archive
shutil.make_archive(
    base_name=str(zip_path.with_suffix("")),  # make_archive appends .zip itself
    format="zip",
    root_dir=source_dir
)
print(f"Backup created: {zip_path}")
Key Functions:
- shutil.copy(src, dst): Copy files.
- shutil.rmtree(path): Delete a directory (use with caution!).
- shutil.make_archive(): Create ZIP/TAR archives.
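The three functions listed above can be tried safely in a temporary directory. The following is a sketch under that assumption; the folder and file names (project, notes.txt, project_backup) are invented for the demo.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    workspace = Path(tmp)
    source = workspace / "project"
    source.mkdir()
    (source / "notes.txt").write_text("hello")

    # shutil.copy: copy a single file and return the destination path
    copied = Path(shutil.copy(source / "notes.txt", workspace / "notes_copy.txt"))
    copy_matches = copied.read_text() == "hello"

    # shutil.make_archive: returns the full path of the archive it wrote
    archive_path = Path(
        shutil.make_archive(str(workspace / "project_backup"), "zip", root_dir=str(source))
    )
    with zipfile.ZipFile(archive_path) as zf:
        archived_names = zf.namelist()

    # shutil.rmtree: delete the whole tree (irreversible, so double-check the path)
    shutil.rmtree(source)
    source_exists_after = source.exists()
```

Writing the archive outside the folder being archived avoids the archive trying to include itself.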
Data Processing
Automating data tasks (e.g., parsing CSVs, generating JSON) is a breeze with the standard library.
csv: Handling Tabular Data
The csv module simplifies reading/writing CSV files (common for spreadsheets).
Example: Convert CSV to JSON
Suppose you have a sales.csv file and want to convert it to JSON for an API:
import csv
import json
# Read CSV and convert to list of dictionaries
with open("sales.csv", "r", newline="") as csv_file:  # newline="" is recommended by the csv docs
    csv_reader = csv.DictReader(csv_file)  # Uses first row as keys
    data = list(csv_reader)  # Convert to list of dicts
# Write data to JSON file
with open("sales.json", "w") as json_file:
    json.dump(data, json_file, indent=4)  # indent=4 for readability
print("Converted sales.csv to sales.json")
Input (sales.csv):
date,product,revenue
2024-05-01,Laptop,999.99
2024-05-02,Phone,699.99
Output (sales.json):
[
    {
        "date": "2024-05-01",
        "product": "Laptop",
        "revenue": "999.99"
    },
    {
        "date": "2024-05-02",
        "product": "Phone",
        "revenue": "699.99"
    }
]
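Note that revenue appears as a quoted string in the output: csv.DictReader returns every field as text. If the consumer of the JSON expects numbers, the values need converting first. A minimal sketch, using io.StringIO as an in-memory stand-in for sales.csv:

```python
import csv
import io
import json

# In-memory stand-in for sales.csv (same rows as the sample input)
csv_text = "date,product,revenue\n2024-05-01,Laptop,999.99\n2024-05-02,Phone,699.99\n"

rows = []
for row in csv.DictReader(io.StringIO(csv_text)):
    row["revenue"] = float(row["revenue"])  # convert the text "999.99" to a number
    rows.append(row)

json_text = json.dumps(rows, indent=4)  # revenue is now serialized without quotes
total_revenue = sum(row["revenue"] for row in rows)
```

With numeric values, calculations such as the total above work directly, whereas summing the original strings would raise a TypeError.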
json: Working with JSON Data
The json module parses JSON strings/files and serializes Python objects to JSON.
Example: Filter JSON Data
Suppose you have a users.json file and want to extract users from “New York”:
import json
with open("users.json", "r") as json_file:
    users = json.load(json_file)  # Load JSON into a Python list
# Filter users from New York
ny_users = [user for user in users if user.get("city") == "New York"]
# Save filtered data to a new file
with open("ny_users.json", "w") as outfile:
    json.dump(ny_users, outfile, indent=4)
print(f"Saved {len(ny_users)} New York users to ny_users.json")
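The filter above uses user.get("city") rather than user["city"], which matters when some records lack the key. The sketch below shows the same filter on a JSON string, using json.loads and json.dumps (the string counterparts of load and dump). The sample records are invented.

```python
import json

# Invented sample data; the second record has no "city" key
raw = '[{"name": "Ada", "city": "New York"}, {"name": "Bob"}, {"name": "Cy", "city": "Boston"}]'

users = json.loads(raw)  # JSON text -> list of dicts

# .get() returns None for a missing key instead of raising KeyError
ny_users = [user for user in users if user.get("city") == "New York"]

# sort_keys gives stable key order, which helps when diffing generated files
ny_json = json.dumps(ny_users, sort_keys=True)
```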
Scheduling Tasks
Automate tasks to run at specific times (e.g., daily backups, hourly reports) with time-based modules.
time: Delays and Timers
The time module provides basic timing functions, like time.sleep() for delays.
Example: Run a Task Every 5 Minutes
import time
def generate_report():
    print("Generating report...")
    # Add report logic here (e.g., data processing, email)
# Run the task every 5 minutes (300 seconds)
while True:
    generate_report()
    print("Waiting 5 minutes...\n")
    time.sleep(300)  # Pause for 300 seconds
Note: For long-running scripts, use a process manager like systemd (Linux) or Task Scheduler (Windows) to keep them running.
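In the loop above, time.sleep(300) starts counting only after the task finishes, so the real interval is 300 seconds plus however long the task takes. One way to keep a steady cadence is to schedule against time.monotonic(), as in this sketch. The helper name run_every is hypothetical, and the interval is shortened to a fraction of a second so the demo finishes quickly.

```python
import time

def run_every(interval, task, max_runs):
    """Call task() every `interval` seconds, max_runs times (hypothetical helper)."""
    start_times = []
    next_run = time.monotonic()
    for _ in range(max_runs):
        start_times.append(time.monotonic())
        task()
        # Aim for the next slot relative to the previous target,
        # not relative to when the task happened to finish
        next_run += interval
        time.sleep(max(0.0, next_run - time.monotonic()))
    return start_times

def slow_task():
    time.sleep(0.05)  # stand-in for real work

starts = run_every(0.2, slow_task, max_runs=3)
gaps = [later - earlier for earlier, later in zip(starts, starts[1:])]
```

The gaps between start times stay close to the interval even though each task call takes some time itself.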
sched: Event Scheduling
For more precise event scheduling (e.g., run a task at 3 PM daily), use the sched module (a general-purpose event scheduler).
Example: Schedule a Task for a Specific Time
import sched
import time
from datetime import datetime, timedelta
def run_at_3pm(sc):
    print("Task executed at 3 PM!")
    # Reschedule for tomorrow
    sc.enterabs(
        time=mktime_tomorrow_3pm(),  # Timestamp for tomorrow 3 PM
        priority=1,
        action=run_at_3pm,
        argument=(sc,)
    )
def mktime_tomorrow_3pm():
    # Get tomorrow's date at 3:00 PM
    tomorrow = datetime.now().replace(hour=15, minute=0, second=0, microsecond=0) + timedelta(days=1)
    return time.mktime(tomorrow.timetuple())
# Initialize the scheduler
scheduler = sched.scheduler(time.time, time.sleep)
# Schedule the first run
scheduler.enterabs(mktime_tomorrow_3pm(), 1, run_at_3pm, (scheduler,))
# Start the scheduler
scheduler.run()
Tip: For complex schedules (e.g., cron-like), use schedule (third-party) or system tools like cron (Linux) instead.
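Because the 3 PM example cannot be observed without waiting a day, here is a sketch of the same enterabs mechanics with sub-second delays. It also shows how the priority argument breaks ties between events due at the same time. The labels and delays are arbitrary.

```python
import sched
import time

log = []
scheduler = sched.scheduler(time.monotonic, time.sleep)

def record(label):
    log.append(label)

# enterabs takes an absolute time in the units of the scheduler's time function
base = time.monotonic()
scheduler.enterabs(base + 0.2, 2, record, argument=("low priority",))
scheduler.enterabs(base + 0.2, 1, record, argument=("high priority",))
scheduler.enterabs(base + 0.1, 1, record, argument=("earliest",))

pending = len(scheduler.queue)  # events waiting to run
scheduler.run()                 # blocks until the queue is empty
```

Events run in time order, and for equal times the lower priority number goes first, so log ends up as earliest, high priority, low priority.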
Email Automation
Send alerts, reports, or reminders via email using smtplib and email modules.
smtplib: Sending Emails
smtplib connects to an SMTP server (e.g., Gmail, Outlook) to send emails.
Example: Send a Simple Email
import smtplib
from email.mime.text import MIMEText
# Email configuration
sender_email = "[email protected]"
sender_password = "your_app_password" # Use app-specific password for Gmail
receiver_email = "[email protected]"
subject = "Automated Email from Python"
body = "Hello,\n\nThis is an automated email sent via Python's smtplib!"
# Create the email message
msg = MIMEText(body)
msg["Subject"] = subject
msg["From"] = sender_email
msg["To"] = receiver_email
# Send the email via Gmail's SMTP server
with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
    server.login(sender_email, sender_password)
    server.send_message(msg)
print("Email sent successfully!")
Security Note: For Gmail, use an App Password (2-Step Verification required). The older “Less secure apps” setting was never recommended, and Google has been phasing it out, so do not rely on it.
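The example above stores the password in the script itself. A safer pattern is to read it from an environment variable at send time. The sketch below assumes a variable named EMAIL_APP_PASSWORD (a hypothetical name) and placeholder example.com addresses, and it builds the message with email.message.EmailMessage, a newer standard-library alternative to MIMEText. The send call is left commented out so the block runs without credentials.

```python
import os
import smtplib
from email.message import EmailMessage

def build_message(sender, recipient, subject, body):
    """Assemble a plain-text email."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg

def send(msg, host="smtp.gmail.com", port=465):
    """Send over SMTP with SSL; the password comes from the environment."""
    password = os.environ.get("EMAIL_APP_PASSWORD")  # hypothetical variable name
    if not password:
        raise RuntimeError("Set EMAIL_APP_PASSWORD before sending")
    with smtplib.SMTP_SSL(host, port) as server:
        server.login(str(msg["From"]), password)
        server.send_message(msg)

# Placeholder addresses for illustration only
msg = build_message(
    "sender@example.com",
    "receiver@example.com",
    "Automated Email from Python",
    "Hello from smtplib!",
)
# send(msg)  # uncomment once the environment variable is set
```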
email.mime: Creating Rich Emails
Add attachments, HTML content, or images using email.mime submodules.
Example: Send an Email with a CSV Attachment
import smtplib
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email.mime.base import MIMEBase
from email import encoders
# Create a multipart message (text + attachment)
msg = MIMEMultipart()
msg["From"] = "[email protected]"
msg["To"] = "[email protected]"
msg["Subject"] = "Sales Report with Attachment"
# Add body text
body = "Please find the sales report attached."
msg.attach(MIMEText(body, "plain"))
# Attach the CSV file
filename = "sales.csv"
with open(filename, "rb") as attachment:
    part = MIMEBase("application", "octet-stream")
    part.set_payload(attachment.read())
encoders.encode_base64(part)  # Encode attachment in Base64
part.add_header(
    "Content-Disposition",
    "attachment",
    filename=filename,  # add_header quotes the name: filename="sales.csv"
)
msg.attach(part)
# Send the email
with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
    server.login("[email protected]", "your_app_password")
    server.send_message(msg)
print("Email with attachment sent!")
Web Scraping
Extract data from websites (e.g., stock prices, weather) using urllib and html.parser.
urllib: Fetching Web Pages
urllib.request downloads web pages (no need for requests!).
Example: Fetch a Web Page
from urllib.request import urlopen
url = "https://example.com"
with urlopen(url) as response:
    html = response.read().decode("utf-8")  # Read and decode HTML
print(f"Fetched {len(html)} characters from {url}")
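An unattended script should not hang or crash when a site is slow or unreachable. The sketch below wraps urlopen with a timeout and URLError handling. The helper name fetch_text and the User-Agent string are made up for illustration. The demo uses a data: URL, which urllib resolves locally, so it runs without network access.

```python
from urllib.error import URLError
from urllib.request import Request, urlopen

def fetch_text(url, timeout=10):
    """Return the decoded body of `url`, or None if the request fails (hypothetical helper)."""
    request = Request(url, headers={"User-Agent": "automation-script/0.1"})  # made-up agent string
    try:
        with urlopen(request, timeout=timeout) as response:
            # Use the charset the server declares, falling back to UTF-8
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset)
    except URLError as exc:  # HTTPError is a subclass, so HTTP error statuses land here too
        print(f"Request failed: {exc}")
        return None

# data: URLs are handled by urllib itself, so no network is needed for the demo
sample = fetch_text("data:text/plain;charset=utf-8,hello%20world")
unsupported = fetch_text("nosuchscheme://example")  # prints the failure and returns None
```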
html.parser: Parsing HTML
html.parser parses HTML to extract specific data (e.g., links, text).
Example: Scrape Headlines from a News Site
from urllib.request import urlopen
from html.parser import HTMLParser
class HeadlineParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_headline = False
        self.headlines = []
    def handle_starttag(self, tag, attrs):
        # Look for <h2> tags with class "headline" (adjust for target site)
        if tag == "h2":
            for attr in attrs:
                if attr == ("class", "headline"):
                    self.in_headline = True
    def handle_endtag(self, tag):
        if tag == "h2" and self.in_headline:
            self.in_headline = False
    def handle_data(self, data):
        if self.in_headline:
            self.headlines.append(data.strip())
# Fetch and parse the page
url = "https://example-news-site.com"  # Replace with target site
with urlopen(url) as response:
    html = response.read().decode("utf-8")
parser = HeadlineParser()
parser.feed(html)
# Print scraped headlines
print("Scraped Headlines:")
for idx, headline in enumerate(parser.headlines, 1):
    print(f"{idx}. {headline}")
Ethics Note: Always check a website’s robots.txt (e.g., https://example.com/robots.txt) and respect crawl limits.
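The standard library can do the robots.txt check itself through urllib.robotparser. The sketch below parses a set of invented rules from a string so that it runs offline. For a live site you would instead call set_url() with the robots.txt address and then read(). The user agent name is also made up.

```python
from urllib.robotparser import RobotFileParser

# Invented rules standing in for a site's real robots.txt
robots_txt = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

rules = RobotFileParser()
rules.parse(robots_txt.splitlines())  # parse() takes an iterable of lines

agent = "automation-script"  # made-up user agent name
can_fetch_news = rules.can_fetch(agent, "https://example.com/news/today")
can_fetch_private = rules.can_fetch(agent, "https://example.com/private/data")
delay = rules.crawl_delay(agent)  # seconds to wait between requests, if declared
```

A scraper can call can_fetch before each request and sleep for the crawl delay between requests.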
System Tasks
Automate system-level tasks (e.g., run shell commands, monitor processes) with subprocess.
subprocess: Running Shell Commands
The subprocess module lets you spawn new processes, connect to their input/output, and obtain return codes.
Example: Backup a Directory with rsync
Use subprocess to run the rsync command (a Linux/macOS tool for file synchronization):
import subprocess
def backup_with_rsync():
    source = "/home/user/Documents"
    dest = "/media/backup_drive/Documents"
    # Run rsync command: -avz = archive, verbose, compress
    result = subprocess.run(
        ["rsync", "-avz", source, dest],
        capture_output=True,
        text=True
    )
    # Check if the command succeeded
    if result.returncode == 0:
        print("Backup successful!")
        print("Output:", result.stdout)
    else:
        print("Backup failed!")
        print("Error:", result.stderr)
backup_with_rsync()
Example: Check Disk Usage
import subprocess
# Run `df -h` to check disk usage (Linux/macOS)
result = subprocess.run(["df", "-h"], capture_output=True, text=True)
print("Disk Usage:")
print(result.stdout)
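Both examples assume the command exists and finishes. A sketch of a more defensive wrapper follows: check=True turns a non-zero exit code into an exception, timeout stops a hung command, and FileNotFoundError covers a missing executable. The helper name run_command is hypothetical, and the demo runs the current Python interpreter (sys.executable) so it works on any OS.

```python
import subprocess
import sys

def run_command(args, timeout=30):
    """Run a command and return its stdout, or None if it fails (hypothetical helper)."""
    try:
        result = subprocess.run(
            args, capture_output=True, text=True, check=True, timeout=timeout
        )
    except FileNotFoundError:
        print(f"Command not found: {args[0]}")
    except subprocess.CalledProcessError as exc:
        print(f"Exit code {exc.returncode}: {exc.stderr.strip()}")
    except subprocess.TimeoutExpired:
        print(f"Timed out after {timeout} seconds")
    else:
        return result.stdout
    return None

# sys.executable is the running interpreter, so these commands exist everywhere
ok = run_command([sys.executable, "-c", "print('disk check ok')"])
failed = run_command([sys.executable, "-c", "import sys; sys.exit(3)"])
missing = run_command(["definitely-not-a-real-command-12345"])
```

Passing the command as a list, as the examples above already do, avoids shell quoting problems.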
Best Practices
To write robust automation scripts:
- Error Handling: Use try-except blocks to catch issues (e.g., missing files, network errors).
    try:
        # Risky operation (e.g., file access)
        with open("data.txt", "r") as f:
            data = f.read()
    except FileNotFoundError:
        print("Error: data.txt not found!")
- Logging: Use the logging module to track script activity (better than print).
    import logging
    from datetime import datetime
    logging.basicConfig(filename="automation.log", level=logging.INFO)
    logging.info("Script started at: " + str(datetime.now()))
- Testing: Test scripts with sample data before deploying to production.
- Security: Avoid hardcoding passwords (use environment variables or the third-party keyring package).
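The practices above combine naturally. The following sketch logs failures instead of printing them, handles a missing file, and reads a secret from the environment. The helper names, the logger name, the file name, and the environment variable name are all hypothetical, chosen so that the demo exercises the failure paths.

```python
import logging
import os

logger = logging.getLogger("automation")  # logger name is arbitrary

def read_config(path):
    """Return the file's text, or None if it is missing (hypothetical helper)."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            return f.read()
    except FileNotFoundError:
        logger.error("Config file not found: %s", path)
        return None

def get_secret(name):
    """Read a secret from the environment instead of hardcoding it."""
    value = os.environ.get(name)
    if value is None:
        logger.warning("Environment variable %s is not set", name)
    return value

# The format string adds a timestamp to every record, so no manual datetime call is needed
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")

config = read_config("no_such_file_for_demo.txt")    # logs an error, returns None
secret = get_secret("DEMO_SECRET_THAT_IS_NOT_SET")   # logs a warning, returns None
```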
Conclusion
Python’s standard library is a treasure trove for automation. From file management (os, pathlib) to email (smtplib) and web scraping (urllib), you can automate a wide range of everyday tasks without installing extra packages.
Start small: automate one repetitive task (e.g., organizing downloads), then expand. The more you explore modules like csv, subprocess, and sched, the more time you’ll save!