【Introduction to Python Standard Library Part 6】Master File Operations! Smart Searching with glob, Easy Copying/Moving/Deleting with shutil #16
Welcome to Part 6 of our "Introduction to Python Standard Library" series! In Part 4, we learned the fundamentals of interacting with the file system using the os module. Today, we're going to level up our file management skills with two more incredibly useful modules: glob and shutil.
If the os module provides the basic building blocks, think of glob and shutil as your specialized power tools:
glob: Acts like a detective, allowing you to find files and directories whose names follow a specific pattern (e.g., "find all JPEG images" or "list all text files starting with 'report'").shutil: Short for "shell utilities," this module provides high-level operations for copying, moving, and deleting files and entire directory trees, tasks that are surprisingly complex to do with theosmodule alone.
By combining these two modules, you can create powerful automation scripts to organize your files, create backups, and manage projects with ease. Let's get started!
Part 1: Finding Files with `glob` (Your File System Search Engine)
The glob module is designed to find all the pathnames matching a specified pattern according to the rules used by the Unix shell. It uses special characters called "wildcards" to create these patterns.
Let's import it: import glob
Key Wildcards in `glob`:
*(Asterisk): Matches zero or more characters. For example,*.txtmatches all files ending with.txt.?(Question Mark): Matches exactly one character. For example,photo_?.jpgwould matchphoto_1.jpgandphoto_A.jpg, but notphoto_10.jpg.[](Brackets): Matches any single character within the brackets. For example,data_[0-9].csvwould matchdata_1.csv,data_2.csv, etc., up todata_9.csv.
Example Usage:
import glob
import os
# --- Setup for the example ---
os.makedirs("glob_test/sub_dir", exist_ok=True)
for filename in ["report.txt", "summary.txt", "image1.jpg", "image2.png", "data_2024.csv", "data_2025.csv", "glob_test/sub_dir/notes.txt"]:
with open(filename, "w") as f:
f.write("dummy")
# --- End Setup ---
# Find all .txt files in the current directory
txt_files = glob.glob("*.txt")
print(f"Text files found: {txt_files}")
# Find all files starting with 'image'
image_files = glob.glob("image*")
print(f"Image files found: {image_files}")
# Find all data files for a specific year
data_2024_files = glob.glob("data_202?.csv")
print(f"202x data files found: {data_2024_files}")
Recursive Search with `**`
This is one of `glob`'s most powerful features. The ** wildcard, when used with the `recursive=True` argument, allows you to search for files in the current directory and all of its subdirectories.
# Find ALL .txt files in 'glob_test' and all its subdirectories
all_txt_files_recursively = glob.glob("glob_test/**/*.txt", recursive=True)
print(f"\nAll .txt files found recursively: {all_txt_files_recursively}")
glob.glob() vs. glob.iglob()
glob.glob(): Returns a list of all matching paths. This is easy to work with but can consume a lot of memory if you find millions of files.glob.iglob(): Returns an iterator. An iterator yields results one by one as you loop over it, making it much more memory-efficient for very large numbers of files.
# Using an iterator is more memory-efficient for large numbers of files
image_iterator = glob.iglob("image*")
print("\nIterating through image files with iglob:")
for filepath in image_iterator:
print(filepath)
Part 2: High-Level File Operations with `shutil` (The Shell Utilities)
The shutil module provides functions for high-level operations on files and collections of files. Think of it as the tool you reach for when `os` module functions are too low-level.
Let's import it: import shutil
1. Copying Files and Directories
shutil.copy(src, dst): Copies a single file from the source `src` to the destination `dst`. If `dst` is a directory, the file is copied into it with the same basename.shutil.copy2(src, dst): This is like `copy()`, but it also attempts to preserve the file's metadata (like permissions, last modification time, etc.). This is often the one you want to use.shutil.copytree(src, dst): Recursively copies an entire directory tree from `src` to `dst`. The destination directory `dst` must not already exist (unless you use the `dirs_exist_ok=True` argument, available in Python 3.8+).
import shutil
# Copying a single file
os.makedirs("backup_folder", exist_ok=True)
shutil.copy2("report.txt", "backup_folder/report_backup.txt")
print("\n'report.txt' copied with metadata to 'backup_folder'.")
# Copying an entire directory
shutil.copytree("glob_test", "glob_test_backup")
print("'glob_test' directory has been completely copied to 'glob_test_backup'.")
2. Moving and Renaming Files and Directories
shutil.move(src, dst): Recursively moves a file or directory (`src`) to a new location (`dst`). This is more powerful than `os.rename()` as it can move files across different file systems/drives on some platforms.
# Move the summary.txt file into the backup folder
shutil.move("summary.txt", "backup_folder/summary.txt")
print("'summary.txt' has been moved to 'backup_folder'.")
3. Deleting Directory Trees
shutil.rmtree(path): Recursively deletes an entire directory tree. This is the tool you use when `os.rmdir()` fails because the directory isn't empty.
⚠️ WARNING: Use `shutil.rmtree()` with extreme caution!
This function is powerful and irreversible. It's like running `rm -rf` on the command line. It will delete the specified directory and everything inside it without asking for confirmation. Always double-check your path before using it!
# Let's clean up the backup directory we just made
shutil.rmtree("glob_test_backup")
print("\n'glob_test_backup' and all its contents have been deleted.")
Bonus: Archiving Files (`make_archive`)
As a taste of its high-level power, `shutil` can even create archive files (like .zip or .tar).
# Create a zip archive of the 'backup_folder' directory
shutil.make_archive("backup_archive", 'zip', "backup_folder")
print("\n'backup_folder' has been archived into 'backup_archive.zip'.")
The Power Couple: Combining `glob` and `shutil`
The real magic happens when you combine these two modules. You can use glob to find a specific set of files and then use `shutil` to perform an operation on all of them. This is the core of many automation scripts.
Practical Example: Creating a Backup Script for Specific File Types
Let's write a script that finds all .txt and .csv files in the current directory and its subdirectories and copies them to a central backup folder.
import glob
import shutil
import os
from datetime import datetime
# --- The Script ---
# 1. Define source and backup directories
source_directory = "." # Current directory
backup_directory_name = f"backup_{datetime.now().strftime('%Y-%m-%d_%H-%M-%S')}"
backup_path = os.path.join(source_directory, backup_directory_name)
# 2. Create the backup directory
os.makedirs(backup_path, exist_ok=True)
print(f"\n--- Starting Backup Script ---")
print(f"Created backup directory: {backup_path}")
# 3. Define file types to back up
file_types_to_backup = ["*.txt", "*.csv"]
files_backed_up_count = 0
# 4. Find and copy files
for file_type in file_types_to_backup:
# Use glob to find all files of the current type, recursively
search_pattern = os.path.join(source_directory, "**", file_type)
files_to_copy = glob.glob(search_pattern, recursive=True)
print(f"\nFound {len(files_to_copy)} files matching '{file_type}':")
for src_path in files_to_copy:
# To avoid copying files from previous backup folders
if backup_directory_name in src_path:
continue
print(f" - Copying '{src_path}'")
shutil.copy2(src_path, backup_path) # copy2 preserves metadata
files_backed_up_count += 1
# 5. Print summary
print(f"\n--- Backup Complete ---")
print(f"Total files backed up: {files_backed_up_count} to '{backup_path}'")
This script is a practical and powerful tool. You can easily adapt it to organize photos, documents, or project files based on your own needs!
Conclusion: Your Automation Power Tools
By learning to use glob and shutil, you've moved beyond basic file system interaction and into the realm of powerful automation. You now have the tools to script complex file management tasks that would be tedious and time-consuming to do manually.
Key Takeaways:
globis your go-to for finding files and directories using wildcard patterns (*,?,**).shutilprovides high-level functions for copying (copy2,copytree), moving (move), and deleting (rmtree) files and directories.- Combining
globto find files andshutilto act on them is a common and powerful pattern for automation. - Always be extra careful with destructive operations like
shutil.rmtree()!
Think about the repetitive file tasks you do every day. Could you write a Python script to automate them? The answer is now, most likely, yes!
In our next installment, we might dive into the world of regular expressions with re or learn how to interact more deeply with the Python interpreter using the sys module. See you then!
コメント
コメントを投稿