Table of Contents
- 1. Basic String Manipulation
- 2. Intermediate Techniques
- 3. Advanced Techniques
- 4. Conclusion
- 5. References
1. Basic String Manipulation
1.1 Creating Strings
In Python, strings are sequences of Unicode characters enclosed in quotes. You can define them using:
- Single quotes ('): name = 'Alice'
- Double quotes ("): message = "Hello, World!"
- Triple quotes (''' or """): for strings that span multiple lines:
multi_line = """This is a
multi-line string."""
To include one kind of quote inside a string, delimit the string with the other kind:
quote = 'He said, "Python is fun!"' # Single quotes with double quotes inside
1.2 Accessing Characters: Indexing
Each character in a string has a position (index), starting from 0 for the first character. Use square brackets [] to access individual characters:
text = "Python"
print(text[0]) # Output: 'P' (1st character)
print(text[3]) # Output: 'h' (4th character)
Negative Indexing: Access characters from the end using negative indices (e.g., -1 for the last character):
print(text[-1]) # Output: 'n' (last character)
print(text[-3]) # Output: 'h' (3rd from the end)
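Valid indices run from -len(text) to len(text) - 1. Indexing outside that range raises an IndexError, and because strings are immutable, assigning to an index raises a TypeError. A small sketch of both behaviors (the flag variable names are only illustrative):

```python
text = "Python"

# Index 6 is one past the last valid index (5), so this fails
try:
    text[6]
    index_error_raised = False
except IndexError:
    index_error_raised = True

# Strings are immutable, so item assignment is rejected
try:
    text[0] = "J"
    type_error_raised = False
except TypeError:
    type_error_raised = True

print(index_error_raised, type_error_raised)  # True True
```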
1.3 Slicing Strings
Slicing extracts a substring by specifying a start index, end index, and optional step. The syntax is string[start:end:step], where:
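- start: Inclusive starting position (default: 0).
- end: Exclusive ending position (default: length of the string).
- step: Interval between characters (default: 1). A negative step walks the string from right to left, in which case the start and end defaults are reversed.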
Examples:
text = "Python Programming"
print(text[0:6]) # Output: 'Python' (0 to 5, since end is exclusive)
print(text[7:]) # Output: 'Programming' (from index 7 to end)
print(text[:6]) # Output: 'Python' (from start to index 5)
print(text[::2]) # Output: 'Pto rgamn' (every 2nd character)
print(text[::-1]) # Output: 'gnimmargorP nohtyP' (reverse the string)
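Unlike indexing, slicing clamps out-of-range bounds instead of raising an error, and it is the usual way to build a modified copy of an immutable string. The sketch below also uses [::-1] in a palindrome check; is_palindrome is a hypothetical helper written for illustration, not a built-in:

```python
text = "Python"

# Slice bounds are clamped to the string, so no IndexError here
print(repr(text[4:100]))  # 'on'
print(repr(text[10:]))    # ''

# Build a new string instead of editing the original in place
print("J" + text[1:])     # Jython

def is_palindrome(word: str) -> bool:
    """Hypothetical helper: True if word reads the same both ways, ignoring case."""
    normalized = word.lower()
    return normalized == normalized[::-1]

print(is_palindrome("Level"))   # True
print(is_palindrome("Python"))  # False
```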
1.4 Concatenation and Repetition
Combine strings with the + operator (concatenation) or repeat them with *:
greeting = "Hello"
name = "Alice"
combined = greeting + ", " + name + "!" # Concatenation
print(combined) # Output: 'Hello, Alice!'
stars = "*" * 10 # Repetition
print(stars) # Output: '**********'
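Both operators are strict about types: + only joins a str with another str, and * needs an integer count. A short sketch of the common pitfall and its fix (label is just an illustrative name):

```python
age = 30

# "+" only joins str with str; mixing in an int raises TypeError
try:
    label = "Age: " + age
except TypeError:
    label = "Age: " + str(age)  # convert explicitly, then concatenate

print(label)          # Age: 30
print(repr("-" * 0))  # '' (a count of 0 or less gives an empty string)
print("ab" * 3)       # ababab
```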
1.5 Membership Testing
Check if a substring exists in a string using in or not in:
text = "Python is powerful"
print("powerful" in text) # Output: True
print("Java" not in text) # Output: True
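Membership tests compare characters exactly, so they are case-sensitive. A minimal sketch of a case-insensitive check, assuming lowercasing is an acceptable normalization for your data:

```python
text = "Python is powerful"

# "in" is case-sensitive
print("python" in text)          # False
# Lowercase the haystack (and needle) for a case-insensitive test
print("python" in text.lower())  # True
# The empty string counts as a substring of every string
print("" in text)                # True
```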
1.6 Length of a String
Use len() to get the number of characters in a string:
text = "Hello"
print(len(text)) # Output: 5
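Note that len() counts characters (Unicode code points), not bytes, so the two can differ once a string is encoded. This sketch writes "café" with the escape \u00e9 to guarantee a single precomposed é:

```python
word = "caf\u00e9"  # "café"

# len() counts code points
print(len(word))                  # 4
# The UTF-8 encoding needs two bytes for "é"
print(len(word.encode("utf-8")))  # 5
print(len(""))                    # 0
```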
1.7 Basic String Methods
Python strings have built-in methods for common operations. Here are some essentials:
| Method | Description | Example |
|---|---|---|
| upper() | Convert to uppercase | "hello".upper() → 'HELLO' |
| lower() | Convert to lowercase | "HELLO".lower() → 'hello' |
| strip() | Remove leading/trailing whitespace | " hello ".strip() → 'hello' |
| isalpha() | Check if all characters are alphabetic | "Python".isalpha() → True |
| isdigit() | Check if all characters are digits | "123".isdigit() → True |
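Because each method returns a new string, calls can be chained, for example to normalize user input. The is*() checks also have edge cases worth knowing, sketched below (raw and cleaned are illustrative names):

```python
raw = "   Hello World   "

# Chain methods: strip whitespace, then lowercase
cleaned = raw.strip().lower()
print(repr(cleaned))            # 'hello world'

# Edge cases for the is*() checks
print("".isalpha())             # False (empty string)
print("Hello World".isalpha())  # False (the space is not alphabetic)
print("-5".isdigit())           # False ("-" is not a digit)
```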
2. Intermediate Techniques
2.1 String Formatting
Inserting variables or computed values into strings is one of the most common text tasks, and the approach you choose affects readability. Python offers several:
2.1.1 Old-Style % Operator (Legacy)
Uses % placeholders (e.g., %s for strings, %d for integers):
name = "Alice"
age = 30
print("Name: %s, Age: %d" % (name, age)) # Output: 'Name: Alice, Age: 30'
2.1.2 str.format() Method (Flexible)
Uses curly braces {} with optional indices or names for clarity:
print("Name: {0}, Age: {1}".format(name, age)) # Index-based
print("Name: {n}, Age: {a}".format(n=name, a=age)) # Name-based
2.1.3 F-Strings (Python 3.6+, Preferred)
Prefix strings with f and embed expressions directly in {}:
print(f"Name: {name}, Age: {age}") # Output: 'Name: Alice, Age: 30'
# Supports expressions:
print(f"Next year, {name} will be {age + 1}") # Output: 'Next year, Alice will be 31'
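Two more f-string details that come up often: Python 3.8 added the = specifier, which prints the expression text alongside its value (handy for debugging), and literal braces are written by doubling them. A brief sketch:

```python
name = "Alice"
age = 30

# Python 3.8+: "=" echoes the expression and its repr()
print(f"{name=}, {age=}")          # name='Alice', age=30
# Method calls and other expressions work inside {}
print(f"{name.upper()} is {age}")  # ALICE is 30
# Double the braces to output literal { and }
print(f"{{age}} = {age}")          # {age} = 30
```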
2.1.4 string.Template (Safe for User Input)
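Uses $ placeholders and performs plain substitution only: unlike str.format(), a Template cannot access attributes or index into objects, which makes it the safer choice when the template text itself comes from untrusted users: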
from string import Template
t = Template("Name: $name, Age: $age")
print(t.substitute(name=name, age=age)) # Output: 'Name: Alice, Age: 30'
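string.Template offers two substitution methods that differ in how they handle missing values: substitute() raises KeyError, while safe_substitute() leaves the placeholder untouched. A sketch (the template text and missing_key are illustrative):

```python
from string import Template

t = Template("Hello, $name! Your code is $code.")

# substitute() raises KeyError when a placeholder has no value
try:
    t.substitute(name="Alice")
    missing_key = None
except KeyError as exc:
    missing_key = exc.args[0]
print("Missing:", missing_key)  # Missing: code

# safe_substitute() keeps unknown placeholders instead of raising
print(t.safe_substitute(name="Alice"))  # Hello, Alice! Your code is $code.
```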
2.2 Splitting and Joining Strings
- split(sep, maxsplit): Split a string into a list using sep as the delimiter (both arguments are optional):
text = "apple,banana,orange"
fruits = text.split(",") # Split on commas
print(fruits) # Output: ['apple', 'banana', 'orange']
- splitlines(): Split a multi-line string into a list of lines:
multi_line = "Line 1\nLine 2\nLine 3"
lines = multi_line.splitlines()
print(lines) # Output: ['Line 1', 'Line 2', 'Line 3']
- join(iterable): Combine an iterable of strings into one string, using the string join() is called on as the separator:
words = ["Hello", "World"]
sentence = " ".join(words) # Join with spaces
print(sentence) # Output: 'Hello World'
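The optional arguments of split() change its behavior noticeably, and join() accepts only strings. A sketch of the cases that most often surprise people (numbers is an illustrative list):

```python
# With no argument, split() breaks on runs of whitespace and drops empty pieces
print("  a   b  c ".split())              # ['a', 'b', 'c']
# With an explicit separator, empty fields are kept
print("a,,b".split(","))                  # ['a', '', 'b']
# maxsplit limits how many splits are made
print("key=value=more".split("=", 1))     # ['key', 'value=more']

# join() needs strings, so convert other types first
numbers = [1, 2, 3]
print(", ".join(str(n) for n in numbers))  # 1, 2, 3
```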
2.3 Replacing Substrings
Use replace(old, new, count) to replace occurrences of old with new (optional count limits replacements):
text = "Python is great. Python is fun."
new_text = text.replace("Python", "Java", 1) # Replace first occurrence
print(new_text) # Output: 'Java is great. Python is fun.'
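Without the count argument every occurrence is replaced, and since strings are immutable, replace() always returns a new string and leaves the original alone. A sketch:

```python
text = "Python is great. Python is fun."

# No count: replace every occurrence
all_replaced = text.replace("Python", "Java")
print(all_replaced)              # Java is great. Java is fun.

# The original string is unchanged
print(text)                      # Python is great. Python is fun.

# Replacing with "" deletes a substring
print("a-b-c".replace("-", ""))  # abc
```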
2.4 Checking Starts and Ends
- startswith(prefix): Check if a string starts with prefix.
- endswith(suffix): Check if it ends with suffix.
url = "https://example.com"
print(url.startswith("https")) # Output: True
print(url.endswith(".com")) # Output: True
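Both methods also accept a tuple, returning True if any entry matches. A sketch with illustrative file and URL values:

```python
filename = "report.PDF"
url = "https://example.com"

# A tuple checks several suffixes at once; lower() makes the test case-insensitive
print(filename.lower().endswith((".pdf", ".docx")))  # True

# The same works for prefixes, e.g. accepting either URL scheme
print(url.startswith(("http://", "https://")))       # True
```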
2.5 Advanced Case Conversion
Beyond upper()/lower(), use:
- title(): Capitalize the first letter of each word.
- capitalize(): Capitalize the first letter of the string and lowercase the rest.
- swapcase(): Swap uppercase and lowercase.
text = "hello world"
print(text.title()) # Output: 'Hello World'
print(text.capitalize()) # Output: 'Hello world'
print("PyThOn".swapcase()) # Output: 'pYtHoN'
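One quirk of title() is that it treats any non-letter as a word boundary, which mangles apostrophes. The standard library's string.capwords(), which splits on whitespace only, is one alternative; a sketch:

```python
from string import capwords

# title() starts a new "word" after the apostrophe
print("they're bill's friends".title())    # They'Re Bill'S Friends

# capwords() capitalizes whitespace-separated words instead
print(capwords("they're bill's friends"))  # They're Bill's Friends

# capitalize() lowercases everything after the first character
print("hELLO wORLD".capitalize())          # Hello world
```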
2.6 Whitespace Handling
- lstrip()/rstrip(): Remove only leading (lstrip) or only trailing (rstrip) whitespace.
- expandtabs(tabsize): Replace tabs (\t) with spaces, placing tab stops every tabsize columns (default: 8).
text = " left right "
print(text.lstrip()) # Output: 'left right ' (remove leading)
print(text.rstrip()) # Output: ' left right' (remove trailing)
print("a\tb\tc".expandtabs(4)) # Output: 'a   b   c' (tab stops every 4 columns)
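The strip family also accepts an argument naming the characters to remove. That argument is treated as a set of characters, not as a prefix or suffix; for exact prefixes, Python 3.9 added removeprefix() and removesuffix(). A sketch:

```python
# Remove any of the given characters from both ends
print("xxhixx".strip("x"))                # hi
print("www.example.com".strip("cmowz."))  # example

# The argument is a set of characters, so order does not matter
print("abcba".strip("ab"))                # c

# Python 3.9+: drop an exact prefix only
print("www.example.com".removeprefix("www."))  # example.com
```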
3. Advanced Techniques
3.1 Regular Expressions with re
The re module enables pattern-based string matching (e.g., validating emails, extracting data).
Key Functions:
- re.match(pattern, string): Match pattern at the start of the string.
- re.search(pattern, string): Search for pattern anywhere in the string.
- re.findall(pattern, string): Return all non-overlapping matches as a list.
- re.sub(pattern, repl, string): Replace matches with repl.
Example 1: Extract All Numbers
import re
text = "Order 123: 45 items, total $67.89"
numbers = re.findall(r"\d+\.?\d*", text) # Pattern for integers/floats
print(numbers) # Output: ['123', '45', '67.89']
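When the pattern contains capture groups, findall() returns the groups instead of the whole match, one tuple per match if there are several groups. A sketch extracting key=value pairs from an illustrative string:

```python
import re

text = "width=10, height=20, depth=5"

# Two groups, so each match becomes a (key, value) tuple
pairs = re.findall(r"(\w+)=(\d+)", text)
print(pairs)  # [('width', '10'), ('height', '20'), ('depth', '5')]

# Convert to a dict with integer values
sizes = {key: int(value) for key, value in pairs}
print(sizes)  # {'width': 10, 'height': 20, 'depth': 5}
```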
Example 2: Validate Email
email = "alice@example.com" # Hypothetical example address
pattern = r"^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$" # Simplified check, not a full address validator
if re.match(pattern, email):
    print("Valid email") # Output: 'Valid email'
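The difference between match() and search() is easy to miss, and sub() is listed above but not demonstrated. A sketch contrasting them, plus re.fullmatch() (available since Python 3.4), which requires the whole string to match:

```python
import re

s = "Order 123"

# match() only tries the start of the string
print(re.match(r"\d+", s))                      # None
# search() scans the whole string for the first match
print(re.search(r"\d+", s).group())             # 123
# fullmatch() succeeds only if the entire string matches
print(re.fullmatch(r"\d+", "123") is not None)  # True

# sub() replaces every match; here each digit is masked
print(re.sub(r"\d", "#", s))                    # Order ###
```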
3.2 Encoding and Decoding Strings
Python 3 strings are Unicode (str type), but data is often stored/transmitted as bytes (bytes type). Use:
- str.encode(encoding): Convert str to bytes (e.g., with "utf-8").
- bytes.decode(encoding): Convert bytes back to str.
text = "café"
bytes_data = text.encode("utf-8") # str → bytes: b'caf\xc3\xa9'
decoded_text = bytes_data.decode("utf-8") # bytes → str: 'café'
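Common Encodings: utf-8 (can represent every Unicode character; the default for str.encode()), latin-1 (one byte per character, covering only the first 256 code points), utf-16.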
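Encoding fails when a character has no representation in the target encoding, and decoding with the wrong codec can silently produce garbled text. The errors argument selects a fallback instead of raising; a sketch:

```python
text = "caf\u00e9"  # "café"

# By default, unencodable characters raise UnicodeEncodeError
try:
    text.encode("ascii")
except UnicodeEncodeError:
    print("cannot encode as ASCII")

# errors= picks a fallback instead of raising
print(text.encode("ascii", errors="ignore"))   # b'caf'
print(text.encode("ascii", errors="replace"))  # b'caf?'

# Decoding UTF-8 bytes as latin-1 does not fail, but the result is wrong
print(text.encode("utf-8").decode("latin-1"))  # cafÃ©
```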
3.3 Advanced F-Strings
F-strings support format specifiers, nested expressions, and even lambda functions:
Format Specifiers
Control padding, alignment, and data formatting (e.g., numbers, dates):
pi = 3.14159265
print(f"Pi: {pi:.2f}") # Output: 'Pi: 3.14' (2 decimal places)
print(f"Number: {42:05d}") # Output: 'Number: 00042' (5-digit padding)
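The same mini-language handles alignment, fill characters, thousands separators, and percentages. A sketch with illustrative values:

```python
item = "apple"
price = 1234.5

# <, >, ^ align within a field width; a character before them sets the fill
print(f"[{item:<10}]")   # [apple     ]
print(f"[{item:>10}]")   # [     apple]
print(f"[{item:*^11}]")  # [***apple***]

# Thousands separator plus two decimals
print(f"{price:,.2f}")   # 1,234.50
# % multiplies by 100 and appends a percent sign
print(f"{0.256:.1%}")    # 25.6%
```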
Nested F-Strings
Embed f-strings inside other f-strings for dynamic formatting:
user = {"name": "Alice", "age": 30}
print(f"""User: {f'Name: {user["name"]}, Age: {user["age"]}'}""") # Triple quotes keep this valid before Python 3.12
# Output: 'User: Name: Alice, Age: 30'
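A more common kind of nesting places replacement fields inside the format specifier, so width and precision can be chosen at runtime. A sketch (width and precision are illustrative variables); str.format() supports the same nesting:

```python
pi = 3.14159265
width = 10
precision = 3

# {width} and {precision} are filled in before the spec is applied
print(f"[{pi:{width}.{precision}f}]")                     # [     3.142]
print("[{:{w}.{p}f}]".format(pi, w=width, p=precision))  # [     3.142]
```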
3.4 Raw Strings and Byte Strings
- Raw strings (r""): Backslashes are kept as literal characters instead of starting escape sequences (e.g., \n stays a backslash followed by n). Useful for regex patterns and Windows file paths:
path = r"C:\Users\Alice\file.txt" # No need to escape backslashes
- Byte strings (b""): Store raw bytes rather than Unicode text. Used for low-level I/O and network protocols:
b_str = b"hello" # Type: bytes
print(b_str.decode("utf-8")) # Convert to str: 'hello'
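The regex use case matters because some escapes mean different things to Python and to re: in a normal string "\b" is a backspace character, while in a raw string it stays the two characters that re reads as a word boundary. Bytes also behave differently from str when indexed. A sketch:

```python
import re

# One backspace character vs. two characters (backslash, b)
print(len("\b"), len(r"\b"))                     # 1 2

# With a raw string, \b reaches re as a word boundary
print(re.findall(r"\bcat\b", "cat concat cat"))  # ['cat', 'cat']

# Indexing bytes yields an int; slicing yields bytes
data = b"hello"
print(data[0])    # 104
print(data[:2])   # b'he'
```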
3.5 Performance Optimization
Strings in Python are immutable (they cannot be modified in place). Repeated concatenation with + or += can create a new string on every step, which becomes slow when building large strings. Instead:
- Use str.join() for large strings: build a list of substrings first, then join once:
# Slow:
result = ""
for i in range(1000):
    result += str(i) # Can create a new string on every iteration
# Fast:
parts = []
for i in range(1000):
    parts.append(str(i))
result = "".join(parts) # Single join operation
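To check the difference on your own machine, a rough timeit comparison is sketched below. build_with_concat and build_with_join are hypothetical helper names, and the results will vary: CPython can sometimes extend a string in place for +=, so the gap may be smaller than expected, while other interpreters may show a larger one.

```python
import timeit

def build_with_concat(n: int) -> str:
    result = ""
    for i in range(n):
        result += str(i)  # may copy the growing string on each iteration
    return result

def build_with_join(n: int) -> str:
    # join() computes the total size and builds the result once
    return "".join(str(i) for i in range(n))

# Both approaches must produce the same string
assert build_with_concat(1000) == build_with_join(1000)

for func in (build_with_concat, build_with_join):
    seconds = timeit.timeit(lambda: func(10_000), number=20)
    print(f"{func.__name__}: {seconds:.4f}s")  # timings vary by machine
```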
4. Conclusion
String manipulation is a cornerstone of Python programming. From basic indexing to advanced regex and Unicode handling, mastering these techniques will elevate your ability to process text, parse data, and build robust applications. Practice with real-world scenarios (e.g., cleaning CSV data, validating user input) to solidify your skills.