【Introduction to Python Standard Library Part 6】Master File Operations! Smart Searching with glob, Easy Copying/Moving/Deleting with shutil #16

Welcome to Part 6 of our "Introduction to Python Standard Library" series! In Part 4, we learned the fundamentals of interacting with the file system using the os module. Today, we're going to level up our file management skills with two more incredibly useful modules: glob and shutil.

If the os module provides the basic building blocks, think of glob and shutil as your specialized power tools:

  • glob: Acts like a detective, allowing you to find files and directories whose names follow a specific pattern (e.g., "find all JPEG images" or "list all text files starting with 'report'").
  • shutil: Short for "shell utilities," this module provides high-level operations for copying, moving, and deleting files and entire directory trees, tasks that are surprisingly complex to do with the os module alone.

By combining these two modules, you can create powerful automation scripts to organize your files, create backups, and manage projects with ease. Let's get started!


Part 1: Finding Files with `glob` (Your File System Search Engine)

The glob module is designed to find all the pathnames matching a specified pattern according to the rules used by the Unix shell. It uses special characters called "wildcards" to create these patterns.

Let's import it: import glob

Key Wildcards in `glob`:

  • * (Asterisk): Matches zero or more characters. For example, *.txt matches all files ending with .txt.
  • ? (Question Mark): Matches exactly one character. For example, photo_?.jpg would match photo_1.jpg and photo_A.jpg, but not photo_10.jpg.
  • [] (Brackets): Matches any single character within the brackets. For example, data_[0-9].csv would match data_1.csv, data_2.csv, etc., up to data_9.csv.

Example Usage:

import glob
import os

# --- Setup for the example ---
os.makedirs("glob_test/sub_dir", exist_ok=True)
for filename in ["report.txt", "summary.txt", "image1.jpg", "image2.png", "data_2024.csv", "data_2025.csv", "glob_test/sub_dir/notes.txt"]:
    with open(filename, "w") as f:
        f.write("dummy")
# --- End Setup ---

# Find all .txt files in the current directory
txt_files = glob.glob("*.txt")
print(f"Text files found: {txt_files}")

# Find all files starting with 'image'
image_files = glob.glob("image*")
print(f"Image files found: {image_files}")

# Find all data files for a specific year
data_2024_files = glob.glob("data_202?.csv")
print(f"202x data files found: {data_2024_files}")

Recursive Search with `**`

This is one of `glob`'s most powerful features. The ** wildcard, when used with the `recursive=True` argument, allows you to search for files in the current directory and all of its subdirectories.

# Find ALL .txt files in 'glob_test' and all its subdirectories
all_txt_files_recursively = glob.glob("glob_test/**/*.txt", recursive=True)
print(f"\nAll .txt files found recursively: {all_txt_files_recursively}")

glob.glob() vs. glob.iglob()

  • glob.glob(): Returns a list of all matching paths. This is easy to work with but can consume a lot of memory if you find millions of files.
  • glob.iglob(): Returns an iterator. An iterator yields results one by one as you loop over it, making it much more memory-efficient for very large numbers of files.
# Using an iterator is more memory-efficient for large numbers of files
image_iterator = glob.iglob("image*")
print("\nIterating through image files with iglob:")
for filepath in image_iterator:
    print(filepath)

Part 2: High-Level File Operations with `shutil` (The Shell Utilities)

The shutil module provides functions for high-level operations on files and collections of files. Think of it as the tool you reach for when `os` module functions are too low-level.

Let's import it: import shutil

1. Copying Files and Directories

  • shutil.copy(src, dst): Copies a single file from the source `src` to the destination `dst`. If `dst` is a directory, the file is copied into it with the same basename.
  • shutil.copy2(src, dst): This is like `copy()`, but it also attempts to preserve the file's metadata (like permissions, last modification time, etc.). This is often the one you want to use.
  • shutil.copytree(src, dst): Recursively copies an entire directory tree from `src` to `dst`. The destination directory `dst` must not already exist (unless you use the `dirs_exist_ok=True` argument, available in Python 3.8+).
import shutil

# Copying a single file
os.makedirs("backup_folder", exist_ok=True)
shutil.copy2("report.txt", "backup_folder/report_backup.txt")
print("\n'report.txt' copied with metadata to 'backup_folder'.")

# Copying an entire directory
shutil.copytree("glob_test", "glob_test_backup")
print("'glob_test' directory has been completely copied to 'glob_test_backup'.")

2. Moving and Renaming Files and Directories

  • shutil.move(src, dst): Recursively moves a file or directory (`src`) to a new location (`dst`). This is more powerful than `os.rename()` as it can move files across different file systems/drives on some platforms.
# Move the summary.txt file into the backup folder
shutil.move("summary.txt", "backup_folder/summary.txt")
print("'summary.txt' has been moved to 'backup_folder'.")

3. Deleting Directory Trees

  • shutil.rmtree(path): Recursively deletes an entire directory tree. This is the tool you use when `os.rmdir()` fails because the directory isn't empty.

⚠️ WARNING: Use `shutil.rmtree()` with extreme caution!

This function is powerful and irreversible. It's like running `rm -rf` on the command line. It will delete the specified directory and everything inside it without asking for confirmation. Always double-check your path before using it!

# Let's clean up the backup directory we just made
shutil.rmtree("glob_test_backup")
print("\n'glob_test_backup' and all its contents have been deleted.")

Bonus: Archiving Files (`make_archive`)

As a taste of its high-level power, `shutil` can even create archive files (like .zip or .tar).

# Create a zip archive of the 'backup_folder' directory
shutil.make_archive("backup_archive", 'zip', "backup_folder")
print("\n'backup_folder' has been archived into 'backup_archive.zip'.")

The Power Couple: Combining `glob` and `shutil`

The real magic happens when you combine these two modules. You can use glob to find a specific set of files and then use `shutil` to perform an operation on all of them. This is the core of many automation scripts.

Practical Example: Creating a Backup Script for Specific File Types

Let's write a script that finds all .txt and .csv files in the current directory and its subdirectories and copies them to a central backup folder.

import glob
import shutil
import os
from datetime import datetime

# --- The Script ---

# 1. Define source and backup directories
source_directory = "." # Current directory
backup_directory_name = f"backup_{datetime.now().strftime('%Y-%m-%d_%H-%M-%S')}"
backup_path = os.path.join(source_directory, backup_directory_name)

# 2. Create the backup directory
os.makedirs(backup_path, exist_ok=True)
print(f"\n--- Starting Backup Script ---")
print(f"Created backup directory: {backup_path}")

# 3. Define file types to back up
file_types_to_backup = ["*.txt", "*.csv"]
files_backed_up_count = 0

# 4. Find and copy files
for file_type in file_types_to_backup:
    # Use glob to find all files of the current type, recursively
    search_pattern = os.path.join(source_directory, "**", file_type)
    files_to_copy = glob.glob(search_pattern, recursive=True)
    
    print(f"\nFound {len(files_to_copy)} files matching '{file_type}':")
    
    for src_path in files_to_copy:
        # To avoid copying files from previous backup folders
        if backup_directory_name in src_path:
            continue
            
        print(f"  - Copying '{src_path}'")
        shutil.copy2(src_path, backup_path) # copy2 preserves metadata
        files_backed_up_count += 1

# 5. Print summary
print(f"\n--- Backup Complete ---")
print(f"Total files backed up: {files_backed_up_count} to '{backup_path}'")

This script is a practical and powerful tool. You can easily adapt it to organize photos, documents, or project files based on your own needs!


Conclusion: Your Automation Power Tools

By learning to use glob and shutil, you've moved beyond basic file system interaction and into the realm of powerful automation. You now have the tools to script complex file management tasks that would be tedious and time-consuming to do manually.

Key Takeaways:

  • glob is your go-to for finding files and directories using wildcard patterns (*, ?, **).
  • shutil provides high-level functions for copying (copy2, copytree), moving (move), and deleting (rmtree) files and directories.
  • Combining glob to find files and shutil to act on them is a common and powerful pattern for automation.
  • Always be extra careful with destructive operations like shutil.rmtree()!

Think about the repetitive file tasks you do every day. Could you write a Python script to automate them? The answer is now, most likely, yes!

In our next installment, we might dive into the world of regular expressions with re or learn how to interact more deeply with the Python interpreter using the sys module. See you then!

Post Index

コメント

このブログの人気の投稿

Post Index

【Introduction to Python Standard Library Part 3】The Standard for Data Exchange! Handle JSON Freely with the json Module #13

Your First Step into Python: A Beginner-Friendly Installation Guide for Windows #0