  • Data Science
  • Jeff Hale
  • FEB 27, 2019

Ten Python File System Methods You Should Know

The file system is a bit like a house. Say you’re spring cleaning and you need to move boxes of notebooks from one room to another.

The boxes are like directories. They hold things. In this case, notebooks.

The notebooks are like files. You can read and write to them. You can put them in your directory boxes.

Capiche?


In this guide we’ll look at methods from the os and shutil modules. The os module is the primary Python module for interacting with the operating system. The shutil module also contains high-level file operations. For some reason you make directories with os but move and copy them with shutil. Go figure.

Update: pathlib discussion added Feb. 16, 2019

In Python 3.4 the pathlib module was added to the standard library to improve working with file paths, and as of 3.6 it plays nicely with the rest of the standard library. The pathlib methods provide some benefits for parsing file paths over the methods we’ll discuss below, namely that pathlib treats paths as objects rather than strings. Although pathlib is handy, it doesn’t have all the lower-level functionality we’ll be exploring. Also, you’ll undoubtedly see the os and shutil methods below in code for years to come. So it’s definitely a good idea to be familiar with them.
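
To give a flavor of what "paths as objects" means, here is a small sketch. The path is made up for illustration, and PurePosixPath is used so the example behaves the same on any operating system; in everyday code you would more likely reach for pathlib.Path.

```python
import os
from pathlib import PurePosixPath

# A hypothetical path, used only for illustration.
path_string = "/home/user/notebooks/draft.txt"

# With strings, you pass the path to functions from os.path.
name_from_string = os.path.basename(path_string)

# With pathlib, the path is an object with attributes and operators.
path_object = PurePosixPath(path_string)
name_from_object = path_object.name          # final path component
suffix = path_object.suffix                  # file extension, with the dot
parent = path_object.parent                  # the containing directory
sibling = path_object.parent / "final.txt"   # the / operator joins path parts

print(name_from_string, name_from_object, suffix, parent, sibling)
```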

I plan to discuss pathlib in a future article, so follow me to make sure you don’t miss it. To learn more about the pathlib module now, see this article and this article.

A few other things to know before we dig in:

  • This guide is designed for Python 3. Python 2 won’t be supported beyond Jan. 1, 2020.
  • You need to import os and shutil into your file to use these commands.
  • My example code is available on GitHub.
  • Substitute your own arguments for the arguments in quotes below.

Now that we’ve got the context out of the way, let’s get to it! Here’s the list of 10 commands you should know.

10 File System Methods

The list below follows this pattern:

Method — Description — Equivalent macOS Shell Command

Get Info

  • os.getcwd() — get the current working directory path as a string — pwd
  • os.listdir() — get the contents of the current working directory as a list of strings — ls
  • os.walk("starting_directory_path") — returns a generator with name and path info for directories and files in the starting directory and all subdirectories — no exact short CLI equivalent, but ls -R provides subdirectory names and the names of files within subdirectories

Change Things

  • os.chdir("/absolute/or/relative/path") — change the current working directory — cd
  • os.path.join() — create a path for later use — no short CLI equivalent
  • os.makedirs("dir1/dir2") — make directory — mkdir -p
  • shutil.copy2("source_file_path", "destination_directory_path") — copy a file — cp
  • shutil.move("source_file_path", "destination_directory_path") — move a file or directory — mv
  • os.remove("my_file_path") — remove a file — rm
  • shutil.rmtree("my_directory_path") — remove a directory and all files and directories in it — rm -rf

Let’s discuss.

Get Info

os.getcwd()

os.getcwd() returns the current working directory as a string. That one is straightforward. 

os.listdir()

os.listdir() returns the contents of the current working directory as a list of strings. That one is also straightforward. 
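
Here is a quick sketch of both calls together. The variable names are only for illustration. Note that os.listdir() also accepts a path argument if you want the contents of some other directory.

```python
import os

# The current working directory, as a string.
cwd = os.getcwd()
print(cwd)

# Names of the entries in the current working directory.
# With no argument, os.listdir() lists "." (the cwd).
entries = os.listdir()
print(entries)

# Passing a path lists that directory instead.
same_entries = os.listdir(cwd)
```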

os.walk("my_start_directory")

os.walk() creates a generator that can return information about the current directory and subdirectories. It works through the directories in the specified starting directory.

os.walk() returns the following items for each directory it traverses:

  1. current directory path as a string
  2. subdirectory names in the current directory as a list of strings
  3. filenames in current directory as a list of strings

It does this for each directory!

It’s often useful to use os.walk() with a for loop to iterate over the contents of a directory and its subdirectories. For example, the following code will print the names of all files in the current working directory and its subdirectories.

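import os

# Walk the tree rooted at the current working directory.
cwd = os.getcwd()
for dir_path, dir_names, file_names in os.walk(cwd):
    # file_names holds bare file names, not full paths.
    for f in file_names:
        print(f)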
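
Since file_names only holds bare names, you will often want to join each one with dir_path to get a path you can actually use. A minimal sketch of that idea follows; list_files is a hypothetical helper name, not something from the os module.

```python
import os


def list_files(start_directory):
    """Return the paths of all files under start_directory."""
    paths = []
    for dir_path, dir_names, file_names in os.walk(start_directory):
        for file_name in file_names:
            # dir_path is the directory currently being visited.
            paths.append(os.path.join(dir_path, file_name))
    return paths
```

Calling list_files(os.getcwd()) would give you a list of paths for everything under the current working directory.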

That’s how we get info, now let’s look at commands that change the working directory or move, copy, or delete parts of the file system.

Change Things

os.chdir("/absolute/or/relative/path")

This method changes the current working directory to either the absolute or relative path provided.

If your code then makes other changes to the file system, it’s a good idea to handle any exceptions raised when using this method with try-except. Otherwise you might be deleting directories or files you don’t want deleted. 
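
One way to do that is sketched here, assuming you would rather skip the follow-up work than carry on in the wrong directory. change_directory is a hypothetical helper name. FileNotFoundError, NotADirectoryError, and PermissionError are all subclasses of OSError, so catching OSError covers them.

```python
import os


def change_directory(path):
    """Try to change the working directory; report whether it worked."""
    try:
        os.chdir(path)
    except OSError as error:
        # Covers a missing path, a path that is not a directory,
        # and insufficient permissions.
        print(f"Could not change directory: {error}")
        return False
    return True
```

Any code that deletes or moves things can then check the return value and run only when the change succeeded.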

os.path.join()

The os.path module has a number of useful methods for common pathname manipulations. You can use it to find information about directory names and parts of directory names. The module also has methods to check whether a file or directory exists.

os.path.join() is designed to create a path that will work on most any operating system by joining multiple strings into one beautiful file path.

Here’s the description from the docs:

Join one or more path components intelligently. The return value is the concatenation of path and any members of *paths with exactly one directory separator (os.sep) following each non-empty part except the last…

Basically, if you are on a Unix or macOS system, os.path.join() sticks a forward slash (“/”) between each string you provide to create a path. If the operating system needs a “\” instead, then join knows to use a backslash.

os.path.join() also provides clear information to other developers that you are creating a path. Definitely use it instead of manual string concatenation to avoid looking like a rookie. 
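
A small sketch of the idea. The directory and file names are made up for illustration.

```python
import os

# Hypothetical directory and file names, used only for illustration.
notebook_path = os.path.join("house", "bedroom", "box", "notebook.txt")

# On macOS or Linux this prints house/bedroom/box/notebook.txt
# On Windows it prints house\bedroom\box\notebook.txt
print(notebook_path)

# For simple relative parts like these, the result matches gluing the
# pieces together with os.sep yourself.
manual_path = os.sep.join(["house", "bedroom", "box", "notebook.txt"])
```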

os.makedirs("dir1/dir2")

os.makedirs() makes directories. os.mkdir() also makes a directory, but it does not make intermediate directories. So I suggest you use os.makedirs().
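
Here is a short sketch of the difference, run inside a temporary directory so it leaves nothing behind. exist_ok=True is an optional argument to os.makedirs() that suppresses the error normally raised when the target directory already exists.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    nested = os.path.join(base, "dir1", "dir2")

    # os.mkdir() fails because the intermediate "dir1" does not exist yet.
    try:
        os.mkdir(nested)
        mkdir_worked = True
    except FileNotFoundError:
        mkdir_worked = False

    # os.makedirs() creates "dir1" and then "dir2".
    os.makedirs(nested)
    makedirs_worked = os.path.isdir(nested)

    # Calling it again would raise FileExistsError without exist_ok=True.
    os.makedirs(nested, exist_ok=True)

print(mkdir_worked, makedirs_worked)
```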

shutil.copy2("source_file", "destination")

There are many ways to copy files and directories in Python. shutil.copy2() is a good choice for copying a file because it tries to preserve as much of the source file’s metadata as possible. For more discussion, see this article.
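
A minimal sketch, using a temporary directory and made-up file names. Note that shutil.copy2() copies a single file; the destination can be either a directory or a full file path. For a whole directory tree, shutil.copytree() is the function to look at.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as base:
    source = os.path.join(base, "notebook.txt")
    destination_dir = os.path.join(base, "box")
    os.makedirs(destination_dir)

    with open(source, "w") as f:
        f.write("spring cleaning notes")

    # When the destination is a directory, the file keeps its name.
    # copy2() returns the path of the new file.
    copied = shutil.copy2(source, destination_dir)

    copied_name = os.path.basename(copied)
    source_still_exists = os.path.exists(source)
    with open(copied) as f:
        copied_text = f.read()
```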

Move things

shutil.move("source_file", "destination")

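Use shutil.move() to change a file’s location. If the destination is on the same file system, the file is simply renamed; otherwise it is copied with copy2 by default and the original is then removed.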
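
A minimal sketch, again in a temporary directory with made-up names.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as base:
    source = os.path.join(base, "notebook.txt")
    destination_dir = os.path.join(base, "other_room")
    os.makedirs(destination_dir)
    with open(source, "w") as f:
        f.write("moved notes")

    # Moving into an existing directory keeps the file name.
    # move() returns the new path.
    new_path = shutil.move(source, destination_dir)

    source_gone = not os.path.exists(source)
    arrived = os.path.isfile(new_path)
    new_name = os.path.basename(new_path)
```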

os.remove("my_file_path")

Sometimes you need to remove a file. os.remove() is your tool.
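
A small sketch. remove_if_present is a hypothetical helper; it relies on os.remove() raising FileNotFoundError when the path does not exist. Keep in mind that os.remove() only removes files and raises an error if you hand it a directory.

```python
import os


def remove_if_present(file_path):
    """Remove a file; return True if it was removed, False if it was absent."""
    try:
        os.remove(file_path)
    except FileNotFoundError:
        return False
    return True
```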

shutil.rmtree("my_directory_path")

shutil.rmtree() removes a directory and all files and directories in it.

 

Remove things

Careful with functions that delete things! You may want to print what will be deleted as a dry run with print(). Then substitute your remove function for print() when you’re sure it won’t delete the wrong files. Hat tip to Al Sweigart for that idea in Automate the Boring Stuff with Python.
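
One way to sketch the dry-run idea is with a flag instead of editing the code between runs. This is a variation on the print-first approach, not the exact method from the book; remove_directories and dry_run are hypothetical names.

```python
import os
import shutil


def remove_directories(paths, dry_run=True):
    """Remove each directory in paths, or only print them when dry_run is True."""
    for path in paths:
        if dry_run:
            # Nothing is deleted; this only shows what would happen.
            print(f"Would remove: {path}")
        else:
            shutil.rmtree(path)
```

Run it once with the default dry_run=True, read the output, and pass dry_run=False only when the list looks right.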

Here’s the full list one more time.

10 File System Methods Recap

The list below follows this pattern: Method — Description — Equivalent macOS Shell Command

Get Info

  • os.getcwd() — get the current working directory path as a string — pwd
  • os.listdir() — get the contents of the current working directory as a list of strings — ls
  • os.walk("starting_directory_path") — returns a generator with name and path info for directories and files in the starting directory and all subdirectories — no exact short CLI equivalent, but ls -R provides subdirectory names and the names of files within subdirectories

Change Things

  • os.chdir("/absolute/or/relative/path") — change the current working directory — cd
  • os.path.join() — create a path for later use — no short CLI equivalent
  • os.makedirs("dir1/dir2") — make directory — mkdir -p
  • shutil.copy2("source_file_path", "destination_directory_path") — copy a file — cp
  • shutil.move("source_file_path", "destination_directory_path") — move a file or directory — mv
  • os.remove("my_file_path") — remove a file — rm
  • shutil.rmtree("my_directory_path") — remove a directory and all files and directories in it — rm -rf

Wrap

Now you’ve seen the basics of interacting with the file system in Python. Try these commands in your IPython interpreter for quick feedback. Then explain them to someone else to solidify your knowledge. You’ll be less sore than if you had moved boxes of notebooks around your house. But the exercise would have been good, so now you can hit the gym instead. 

If you want to go deeper, check out the free ebook Automate the Boring Stuff with Python.

If you want to learn about reading and writing from files with Python, check out the open function. Remember to use a context manager like so: with open('myfile') as file:
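
A slightly fuller sketch of the context manager pattern, writing and then reading a made-up file in a temporary directory. The file is closed automatically when each with block ends.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    file_path = os.path.join(base, "myfile.txt")

    # "w" opens the file for writing, creating it if needed.
    with open(file_path, "w") as file:
        file.write("first line\n")

    # The default mode is "r", for reading text.
    with open(file_path) as file:
        contents = file.read()

    # After the block, the file object is already closed.
    closed_after_block = file.closed
```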

I hope you found this intro to Python file system manipulation useful. If you did, please share it on your favorite social media channels so others can find it too.

I write about Python, Docker, data science, and more. If any of that’s of interest to you, read more here and follow me on Medium.
