pyutils.files package

This subpackage contains utilities for dealing with files on disk.

Submodules

pyutils.files.directory_filter module

This module contains two classes meant to help reduce unnecessary disk I/O operations:

The first, DirectoryFileFilter, determines whether the in-memory contents of a file are identical to the copy of that file already on disk.

The second, DirectoryAllFilesFilter, is essentially the same except that the caller need not supply the name of the disk file: it checks the in-memory contents’ signature against the signatures of all files in a particular directory on disk.

See examples below.

class pyutils.files.directory_filter.DirectoryAllFilesFilter(directory: str)

Bases: DirectoryFileFilter

A predicate that will return False if a file to-be-written to a particular directory is identical to any other file in that same directory (regardless of its name).

That is, this is the same as DirectoryFileFilter except that apply() returns False not only when the contents to be written are identical to the contents of a particular file on disk, but also when any other file in the same directory already contains those identical contents.

>>> testfile = '/tmp/directory_filter_text_f39e5b58-c260-40da-9448-ad1c3b2a69c3.txt'
>>> contents = b'This is a test'
>>> with open(testfile, 'wb') as wf:
...     wf.write(contents)
14
>>> d = DirectoryAllFilesFilter('/tmp')
>>> d.apply(contents)    # False if _any_ file in /tmp contains contents
False
>>> d.apply(b'That was a test')    # True otherwise
True
>>> os.remove(testfile)

Parameters:

directory (str) – the directory we’re watching

apply(proposed_contents: Any, ignored_filename: str | None = None) → bool

Call this before writing a new file to directory, passing the proposed_contents to be written; it returns a value that indicates whether identical contents are already present in some file in that directory. Useful, e.g., for caching.

Parameters:
  • proposed_contents (Any) – the contents about to be persisted to directory

  • ignored_filename (str | None) – unused for now, must be None

Returns:

True if proposed_contents do not yet exist in any file in directory, or False if they already exist in some file.

Return type:

bool

class pyutils.files.directory_filter.DirectoryFileFilter(directory: str)

Bases: object

A predicate that returns False if (and when) a proposed file’s to-be-written contents are identical to the contents of that file on disk, allowing calling code to skip the write safely.

Raises:

ValueError – directory doesn’t exist

>>> testfile = '/tmp/directory_filter_text_f39e5b58-c260-40da-9448-ad1c3b2a69c2.txt'
>>> contents = b'This is a test'
>>> with open(testfile, 'wb') as wf:
...     wf.write(contents)
14
>>> d = DirectoryFileFilter('/tmp')
>>> d.apply(contents, testfile)     # False if testfile already contains contents
False
>>> d.apply(b'That was a test', testfile)    # True otherwise
True
>>> os.remove(testfile)

Parameters:

directory (str) – the directory we’re filtering accesses to

apply(proposed_contents: Any, filename: str) → bool

Call this with the proposed new in-memory contents of filename; it computes a checksum of those contents and returns a value indicating whether they are already identical to the contents on disk (in which case you can safely skip the write).

Parameters:
  • proposed_contents (Any) – the contents about to be written to filename

  • filename (str) – the file about to be populated with proposed_contents

Returns:

False if the disk contents of filename are already identical to proposed_contents (so the write can be skipped) and True otherwise.

Return type:

bool
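The idea behind both filters can be sketched with hashlib: hash the proposed contents and compare that digest with the digest of what is already on disk. This is a minimal sketch, not the library’s implementation; the MD5 choice and the helper names (should_write, should_write_any and the _md5_* helpers) are assumptions for illustration, and the sketch ignores any caching the real classes may do.

```python
import hashlib
import os
from typing import Optional

# Hypothetical helpers illustrating the checksum comparison; not the pyutils code.


def _md5_of_bytes(data: bytes) -> str:
    """Hex MD5 digest of an in-memory buffer."""
    return hashlib.md5(data).hexdigest()


def _md5_of_file(path: str) -> Optional[str]:
    """Hex MD5 digest of a file on disk, or None if it can't be read."""
    try:
        with open(path, 'rb') as rf:
            return hashlib.md5(rf.read()).hexdigest()
    except OSError:
        return None


def should_write(proposed_contents: bytes, filename: str) -> bool:
    """False if filename already holds exactly these bytes, else True."""
    return _md5_of_file(filename) != _md5_of_bytes(proposed_contents)


def should_write_any(proposed_contents: bytes, directory: str) -> bool:
    """False if any regular file in directory already holds these bytes."""
    target = _md5_of_bytes(proposed_contents)
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file() and _md5_of_file(entry.path) == target:
                return False
    return True
```

For example, should_write(b'This is a test', path) returns False right after that exact byte string has been written to path, mirroring the doctests above.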

pyutils.files.file_utils module

This is a grab bag of file-related utilities. It has code to, for example, read files while transforming the text as it’s read, normalize pathnames, strip extensions, read and manipulate atimes/mtimes/ctimes, compute a signature based on a file’s contents, traverse the file system recursively, etc…

Note

Many of these functions accept either a string or a pathlib.Path object and will return the same type they were given. I’ve defined a local TypeVar called StrOrPath for these routines.

class pyutils.files.file_utils.CreateFileWithMode(filename: StrOrPath, filesystem_mode: int | None = 384, open_mode: str | None = 'w', *, encoding: str | None = None)

Bases: AbstractContextManager

This helper context manager can be used in place of the typical file-creation pattern when you want to ensure that the file is created with a particular filesystem permission mode.

Python’s built-in open() doesn’t support this; you need to set os.umask and then create the file descriptor via os.open. See below.

>>> import os
>>> filename = f'/tmp/CreateFileWithModeTest.{os.getpid()}'
>>> with CreateFileWithMode(filename, filesystem_mode=0o600) as wf:
...     print('This is a test', file=wf)
>>> result = os.stat(filename)

Note: st_mode also contains high-order file-type bits (here S_IFREG, indicating a “normal file”). Clear them with a mask to see only the permission bits.

>>> print(f'{result.st_mode & 0o7777:o}')
600
>>> with open(filename, 'r') as rf:
...     contents = rf.read()
>>> contents
'This is a test\n'
>>> remove(filename)

Parameters:
  • filename (StrOrPath) – path of the file to create.

  • filesystem_mode (int | None) – the UNIX-style octal mode with which to create the filename. Defaults to 0o600.

  • open_mode (str | None) – the mode to use when opening the file (e.g. ‘w’, ‘wb’, etc…)

  • encoding (str | None) – optional encoding you’re using to write the opened file. Use None for binary files (e.g. ‘wb’ mode).

Warning

If the file already exists it will be overwritten!
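The umask/os.open technique described above can be sketched as follows. create_with_mode is a hypothetical helper, not the library’s code, and it assumes a POSIX system. Because os.umask is process-wide, temporarily clearing it like this is not thread-safe.

```python
import os


def create_with_mode(filename: str, mode: int = 0o600, text: str = '') -> None:
    """Hypothetical helper: create (or truncate) filename with exactly `mode`."""
    old_umask = os.umask(0)  # clear the umask so no bits of `mode` are masked off
    try:
        fd = os.open(filename, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, mode)
    finally:
        os.umask(old_umask)  # always restore the caller's umask
    # Wrap the raw descriptor in an ordinary Python file object.
    with os.fdopen(fd, 'w') as wf:
        wf.write(text)
```

In this sketch the mode only takes effect when the file is newly created; an existing file is truncated but keeps its current permissions.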

class pyutils.files.file_utils.FileWriter(filename: StrOrPath)

Bases: AbstractContextManager

A helper that writes a file to a temporary location and then moves it atomically to its ultimate destination on close.

Example usage: the print statements within the context populate a temporary file. Until the context exits, nothing is written to the destination path, so no reader of it can see partial writes caused by buffering or code timing. When the context exits, the temporary file is moved to its final location by a call to /bin/mv, which should be atomic as long as the temporary location and the destination are on the same filesystem:

import time
from pyutils.files.file_utils import FileWriter

with FileWriter('/home/bob/foobar.txt') as w:
    print("This is a test!", file=w)
    time.sleep(2)
    print("This is only a test...", file=w)

Parameters:

filename (StrOrPath) – the ultimate destination file we want to populate. On exit, the file will be atomically created.
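The write-to-a-temporary-file-then-rename pattern can be sketched with only the standard library. atomic_write is a hypothetical helper, not FileWriter’s implementation: it swaps the /bin/mv call for os.replace and creates the temporary file in the destination’s directory so the rename stays on one filesystem.

```python
import contextlib
import os
import tempfile
from typing import Iterator, TextIO


@contextlib.contextmanager
def atomic_write(destination: str) -> Iterator[TextIO]:
    """Hypothetical helper: write to a temp file, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(destination))
    # Same directory as the destination, so the final rename can't cross filesystems.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix='.tmp-')
    try:
        with os.fdopen(fd, 'w') as wf:
            yield wf
            wf.flush()
            os.fsync(wf.fileno())  # push the bytes to disk before the rename
        os.replace(tmp_path, destination)  # atomic on POSIX within one filesystem
    except BaseException:
        # On any failure, discard the partial temp file and re-raise.
        with contextlib.suppress(FileNotFoundError):
            os.remove(tmp_path)
        raise
```

tempfile.mkstemp creates the temporary file with mode 0o600, so in this sketch the destination ends up with those permissions.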

pyutils.files.file_utils.create_path_if_not_exist(path: StrOrPath, on_error: Callable[[StrOrPath, OSError], None] | None = None) → None

Attempts to create path if it does not exist already.

Parameters:
  • path (StrOrPath) – the path to attempt to create

  • on_error (Callable[[StrOrPath, OSError], None] | None) – if provided, this is invoked on error conditions and passed the path and OSError that it caused

Raises:

OSError – an exception occurred and on_error was not set.

Return type:

None

See also does_file_exist().

>>> import uuid
>>> import os
>>> path = os.path.join("/tmp", str(uuid.uuid4()), str(uuid.uuid4()))
>>> os.path.exists(path)
False
>>> create_path_if_not_exist(path)
>>> os.path.exists(path)
True
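The on_error callback pattern can be sketched as follows, assuming the function behaves like os.makedirs with exist_ok=True; create_path_sketch is a hypothetical name, not the library’s code.

```python
import os
from typing import Callable, Optional


def create_path_sketch(
    path: str, on_error: Optional[Callable[[str, OSError], None]] = None
) -> None:
    """Hypothetical helper: create path and any missing parents if absent."""
    try:
        os.makedirs(path, exist_ok=True)  # no error if the directory already exists
    except OSError as e:
        if on_error is None:
            raise  # no handler: let the caller see the OSError
        on_error(path, e)  # hand the failing path and exception to the callback
```

Passing a callback such as lambda p, e: print(f'{p}: {e}') turns failures (for example, a path component that is a regular file) into logged events instead of exceptions.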
pyutils.files.file_utils.delete(path: StrOrPath) → None

This is a convenience for my dumb ass who can’t remember os.remove sometimes.

Parameters:

path (StrOrPath) – the path to delete

Return type:

None

pyutils.files.file_utils.describe_file_atime(filename: StrOrPath, *, brief: bool = False) → str | None

Describe how long ago a file was accessed.

Parameters:
  • filename (StrOrPath) – the file whose atime should be described.

  • brief (bool) – if True, describe atime briefly.

Returns:

A string that represents how long ago filename was last accessed. The description will be verbose or brief depending on the brief argument.

Return type:

str | None

See also get_file_raw_atime(), get_file_atime_age_seconds(), get_file_atime_as_datetime(), get_file_atime_timedelta(), get_file_raw_ctime(), get_file_raw_mtime(), get_file_raw_timestamps(), set_file_raw_atime(), set_file_raw_atime_and_mtime()

pyutils.files.file_utils.describe_file_ctime(filename: StrOrPath, *, brief: bool = False) → str | None

Describes a file’s creation time.

Parameters:
  • filename (StrOrPath) – the file whose ctime should be described.

  • brief (bool) – if True, describe ctime briefly.

Returns:

A string that represents how long ago filename was created. The description will be verbose or brief depending on the brief argument.

Return type:

str | None

See also get_file_raw_atime(), get_file_raw_ctime(), get_file_ctime_age_seconds(), get_file_ctime_as_datetime(), get_file_ctime_timedelta(), get_file_raw_mtime(), get_file_raw_timestamps()

pyutils.files.file_utils.describe_file_mtime(filename: StrOrPath, *, brief: bool = False) → str | None

Describes how long ago a file was modified.

Parameters:
  • filename (StrOrPath) – the file whose mtime should be described.

  • brief (bool) – if True, describe mtime briefly.

Returns:

A string that represents how long ago filename was last modified. The description will be verbose or brief depending on the brief argument.

Return type:

str | None

See also get_file_raw_atime(), get_file_raw_ctime(), get_file_raw_mtime(), get_file_mtime_age_seconds(), get_file_mtime_as_datetime(), get_file_mtime_timedelta(), get_file_raw_timestamps(), set_file_raw_atime(), set_file_raw_atime_and_mtime()

pyutils.files.file_utils.describe_file_timestamp(filename: StrOrPath, extractor, *, brief=False) → str | None

Internal helper.

Parameters:

filename (StrOrPath) – the file whose timestamp should be described

Return type:

str | None

pyutils.files.file_utils.does_directory_exist(dirname: StrOrPath) → bool

Does the given directory exist?

Parameters:

dirname (StrOrPath) – the name of the directory to check

Returns:

True if a path exists and is a directory, not a regular file.

Return type:

bool

See also does_file_exist().

>>> does_directory_exist('/tmp')
True
>>> does_directory_exist('/xyzq/21341')
False
pyutils.files.file_utils.does_file_exist(filename: StrOrPath) → bool

Returns True if a file exists and is a normal file.

Parameters:

filename (StrOrPath) – filename to check

Returns:

True if filename exists and is a normal file.

Return type:

bool

Note

A Python core philosophy is: it’s easier to ask forgiveness than permission (https://docs.python.org/3/glossary.html#term-EAFP). That is, code that just tries an operation and handles the set of Exceptions that may arise is the preferred style. That said, this function can still be useful in some situations.

See also create_path_if_not_exist(), is_readable().

>>> does_file_exist(__file__)
True
>>> does_file_exist('/tmp/2492043r9203r9230r9230r49230r42390r4230')
False
pyutils.files.file_utils.does_path_exist(pathname: StrOrPath) → bool

Just a more verbose wrapper around os.path.exists.

Parameters:

pathname (StrOrPath) – the path to check

Return type:

bool

pyutils.files.file_utils.expand_globs(in_filename: StrOrPath) → Generator[StrOrPath, None, None]

Expands shell globs (* and ? wildcards) to the matching files.

Parameters:

in_filename (StrOrPath) – the filepath to be expanded. May contain ‘*’ and ‘?’ globbing characters.

Returns:

A Generator that yields filenames that match the input pattern.

Return type:

Generator[StrOrPath, None, None]

See also get_files(), get_files_recursive().

pyutils.files.file_utils.fix_multiple_slashes(path: StrOrPath) → StrOrPath

Collapses runs of multiple slashes in paths or path-like strings into single slashes.

Parameters:

path (StrOrPath) – the path in which to remove multiple slashes

Return type:

StrOrPath

>>> p = '/usr/local//etc/rc.d///file.txt'
>>> fix_multiple_slashes(p)
'/usr/local/etc/rc.d/file.txt'
>>> import pathlib
>>> p = pathlib.Path(p)
>>> str(fix_multiple_slashes(p))
'/usr/local/etc/rc.d/file.txt'
>>> p = 'this is a test'
>>> fix_multiple_slashes(p) == p
True
pyutils.files.file_utils.get_all_extensions(path: StrOrPath) → List[str]

Return the extensions of a file or path in order.

Parameters:

path (StrOrPath) – the path from which to extract all extensions.

Returns:

a list containing each extension, in order; the list may be empty.

Return type:

List[str]

See also without_extension(), without_all_extensions(), get_extension().

>>> get_all_extensions('/home/scott/foo.tar.gz.1')
['.tar', '.gz', '.1']
>>> get_all_extensions('/home/scott/foobar')
[]
pyutils.files.file_utils.get_canonical_path(filespec: StrOrPath) → StrOrPath

Returns a canonicalized absolute path.

Parameters:

filespec (StrOrPath) – the path to canonicalize

Returns:

the canonicalized path

Return type:

StrOrPath

See also get_path(), without_path().

>>> get_canonical_path('/tmp/../tmp/../tmp')
'/tmp'
pyutils.files.file_utils.get_directories(directory: StrOrPath)

Returns the subdirectories in a directory as a generator.

Parameters:

directory (StrOrPath) – the directory to list subdirectories within.

Returns:

A generator that yields all subdirectories within the given input directory.

See also get_files(), get_files_recursive().

pyutils.files.file_utils.get_extension(path: StrOrPath) → str

Extract and return one (the last) extension from a file or path.

Parameters:

path (StrOrPath) – the path from which to extract an extension

Returns:

The last extension from the file path.

Return type:

str

See also without_extension(), without_all_extensions(), get_all_extensions().

>>> get_extension('this_is_a_test.txt')
'.txt'
>>> get_extension('/home/scott/test.py')
'.py'
>>> get_extension('foobar')
''
>>> import pathlib
>>> get_extension(pathlib.Path('/tmp/foobar.txt'))
'.txt'
pyutils.files.file_utils.get_file_atime_age_seconds(filename: StrOrPath) → int | None

Gets a file’s access time as an age in seconds (ago).

Parameters:

filename (StrOrPath) – file whose atime should be checked.

Returns:

The number of seconds ago that filename was last accessed.

Return type:

int | None

See also get_file_raw_atime(), get_file_atime_as_datetime(), get_file_atime_timedelta(), get_file_raw_ctime(), get_file_raw_mtime(), get_file_raw_timestamps(), set_file_raw_atime(), set_file_raw_atime_and_mtime()

pyutils.files.file_utils.get_file_atime_as_datetime(filename: StrOrPath) → datetime | None

Fetch a file’s access time as a Python datetime.

Parameters:

filename (StrOrPath) – the file whose atime should be fetched.

Returns:

The file’s atime as a Python datetime.datetime.

Return type:

datetime | None

See also get_file_raw_atime(), get_file_atime_age_seconds(), get_file_atime_timedelta(), get_file_raw_ctime(), get_file_raw_mtime(), get_file_raw_timestamps(), set_file_raw_atime(), set_file_raw_atime_and_mtime()

pyutils.files.file_utils.get_file_atime_timedelta(filename: StrOrPath) → timedelta | None

How long ago was a file accessed as a timedelta?

Parameters:

filename (StrOrPath) – the file whose atime should be checked.

Returns:

A Python datetime.timedelta representing how long ago filename was last accessed.

Return type:

timedelta | None

See also get_file_raw_atime(), get_file_atime_age_seconds(), get_file_atime_as_datetime(), get_file_raw_ctime(), get_file_raw_mtime(), get_file_raw_timestamps(), set_file_raw_atime(), set_file_raw_atime_and_mtime()

pyutils.files.file_utils.get_file_ctime_age_seconds(filename: StrOrPath) → int | None

Gets a file’s creation time as an age in seconds (ago).

Parameters:

filename (StrOrPath) – file whose ctime should be checked.

Returns:

The number of seconds ago that filename was created.

Return type:

int | None

See also get_file_raw_ctime(), get_file_ctime_as_datetime(), get_file_ctime_timedelta(), get_file_raw_mtime(), get_file_raw_atime(), get_file_raw_timestamps()

pyutils.files.file_utils.get_file_ctime_as_datetime(filename: StrOrPath) → datetime | None

Fetches a file’s creation time as a Python datetime.

Parameters:

filename (StrOrPath) – the file whose ctime should be fetched.

Returns:

The file’s ctime as a Python datetime.datetime.

Return type:

datetime | None

See also get_file_raw_ctime(), get_file_ctime_age_seconds(), get_file_ctime_timedelta(), get_file_raw_atime(), get_file_raw_mtime(), get_file_raw_timestamps()

pyutils.files.file_utils.get_file_ctime_timedelta(filename: StrOrPath) → timedelta | None

How long ago was a file created as a timedelta?

Parameters:

filename (StrOrPath) – the file whose ctime should be checked.

Returns:

A Python datetime.timedelta representing how long ago filename was created.

Return type:

timedelta | None

See also get_file_raw_atime(), get_file_raw_ctime(), get_file_ctime_age_seconds(), get_file_ctime_as_datetime(), get_file_raw_mtime(), get_file_raw_timestamps()

pyutils.files.file_utils.get_file_md5(filename: StrOrPath) → str

Hashes filename’s disk contents and returns the MD5 digest.

Parameters:

filename (StrOrPath) – the file whose contents to hash

Returns:

the MD5 digest of the file’s contents. Raises on error.

Return type:

str
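Hashing a file’s contents in fixed-size chunks keeps memory use bounded for large files. This is a sketch of that approach with hashlib; file_md5_sketch and its chunk_size parameter are hypothetical, and get_file_md5 may read the file differently.

```python
import hashlib


def file_md5_sketch(filename: str, chunk_size: int = 65536) -> str:
    """Hypothetical helper: hex MD5 digest of a file, read in fixed-size chunks."""
    md5 = hashlib.md5()
    with open(filename, 'rb') as rf:  # open() raises OSError if the file can't be read
        # iter() with a b'' sentinel keeps calling read() until end of file.
        for chunk in iter(lambda: rf.read(chunk_size), b''):
            md5.update(chunk)
    return md5.hexdigest()
```

The digest does not depend on chunk_size; only peak memory use does.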

pyutils.files.file_utils.get_file_mtime_age_seconds(filename: StrOrPath) → int | None

Gets a file’s modification time as an age in seconds (ago).

Parameters:

filename (StrOrPath) – file whose mtime should be checked.

Returns:

The number of seconds ago that filename was last modified.

Return type:

int | None

See also get_file_raw_atime(), get_file_raw_ctime(), get_file_raw_mtime(), get_file_mtime_as_datetime(), get_file_mtime_timedelta(), get_file_raw_timestamps(), set_file_raw_atime(), set_file_raw_atime_and_mtime()

pyutils.files.file_utils.get_file_mtime_as_datetime(filename: StrOrPath) → datetime | None

Fetch a file’s modification time as a Python datetime.

Parameters:

filename (StrOrPath) – the file whose mtime should be fetched.

Returns:

The file’s mtime as a Python datetime.datetime.

Return type:

datetime | None

See also get_file_raw_mtime(), get_file_mtime_age_seconds(), get_file_mtime_timedelta(), get_file_raw_ctime(), get_file_raw_atime(), get_file_raw_timestamps(), set_file_raw_atime(), set_file_raw_atime_and_mtime()

pyutils.files.file_utils.get_file_mtime_timedelta(filename: StrOrPath) → timedelta | None

Gets a file’s modification time as a Python timedelta.

Parameters:

filename (StrOrPath) – the file whose mtime should be checked.

Returns:

A Python datetime.timedelta representing how long ago filename was last modified.

Return type:

timedelta | None

See also get_file_raw_atime(), get_file_raw_ctime(), get_file_raw_mtime(), get_file_mtime_age_seconds(), get_file_mtime_as_datetime(), get_file_raw_timestamps(), set_file_raw_atime(), set_file_raw_atime_and_mtime()

pyutils.files.file_utils.get_file_raw_atime(filename: StrOrPath) → float | None

Get a file’s raw access time.

Parameters:

filename (StrOrPath) – the path to the file to stat

Returns:

The file’s raw atime (seconds since the Epoch) or None on error.

Return type:

float | None

See also get_file_atime_age_seconds(), get_file_atime_as_datetime(), get_file_atime_timedelta(), get_file_raw_ctime(), get_file_raw_mtime(), get_file_raw_timestamps()

pyutils.files.file_utils.get_file_raw_ctime(filename: StrOrPath) → float | None

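Get a file’s raw ctime. On Unix this is the time of the last metadata (inode) change rather than a true creation time; on Windows it is the creation time.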

Parameters:

filename (StrOrPath) – the path to the file to stat

Returns:

The file’s raw ctime (seconds since the Epoch) or None on error.

Return type:

float | None

See also get_file_raw_atime(), get_file_ctime_age_seconds(), get_file_ctime_as_datetime(), get_file_ctime_timedelta(), get_file_raw_mtime(), get_file_raw_timestamps()

pyutils.files.file_utils.get_file_raw_mtime(filename: StrOrPath) → float | None

Get a file’s raw modification time.

Parameters:

filename (StrOrPath) – the path to the file to stat

Returns:

The file’s raw mtime (seconds since the Epoch) or None on error.

Return type:

float | None

See also get_file_raw_atime(), get_file_raw_ctime(), get_file_mtime_age_seconds(), get_file_mtime_as_datetime(), get_file_mtime_timedelta(), get_file_raw_timestamps()

pyutils.files.file_utils.get_file_raw_timestamp(filename: StrOrPath, extractor: Callable[[stat_result], float | None]) → float | None

Stat a file and, if successful, use extractor to fetch some subset of the information in the os.stat_result.

Parameters:
  • filename (StrOrPath) – the filename to stat

  • extractor (Callable[[stat_result], float | None]) – a callable that takes an os.stat_result and extracts the value of interest from it (e.g., a timestamp).

Returns:

whatever the extractor produced or None on error.

Return type:

float | None

See also get_file_raw_atime(), get_file_raw_ctime(), get_file_raw_mtime(), get_file_raw_timestamps()
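The extractor pattern lets one stat-and-handle-errors helper serve the atime, ctime and mtime variants. A minimal sketch, assuming os.stat and returning None on OSError as described above; raw_timestamp_sketch and mtime_age_seconds_sketch are hypothetical names, not the library’s functions.

```python
import os
import time
from typing import Callable, Optional


def raw_timestamp_sketch(
    filename: str, extractor: Callable[[os.stat_result], Optional[float]]
) -> Optional[float]:
    """Hypothetical helper: stat filename and apply extractor; None on error."""
    try:
        st = os.stat(filename)
    except OSError:
        return None
    return extractor(st)


def mtime_age_seconds_sketch(filename: str) -> Optional[int]:
    """Hypothetical helper: whole seconds since filename was last modified."""
    mtime = raw_timestamp_sketch(filename, lambda st: st.st_mtime)
    if mtime is None:
        return None
    return int(time.time() - mtime)
```

Swapping the lambda for lambda st: st.st_atime or lambda st: st.st_ctime gives the other variants without duplicating the error handling.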

pyutils.files.file_utils.get_file_raw_timestamps(filename: StrOrPath) → stat_result | None

Stats the file and returns an os.stat_result or None on error.

Parameters:

filename (StrOrPath) – the file whose timestamps to fetch

Returns:

the os.stat_result or None to indicate an error occurred

Return type:

stat_result | None

See also get_file_raw_atime(), get_file_raw_ctime(), get_file_raw_mtime(), get_file_raw_timestamp()

pyutils.files.file_utils.get_file_size(filename: StrOrPath) → int

Returns the size of a file in bytes.

Parameters:

filename (StrOrPath) – the filename to size

Returns:

size of filename in bytes

Return type:

int

pyutils.files.file_utils.get_files(directory: StrOrPath) → Generator[StrOrPath, None, None]

Returns the files in a directory as a generator.

Parameters:

directory (StrOrPath) – the directory to list files under.

Returns:

A generator that yields all files in the input directory.

Return type:

Generator[StrOrPath, None, None]

See also expand_globs(), get_files_recursive(), get_matching_files().

pyutils.files.file_utils.get_files_recursive(directory: StrOrPath)

Find the files and directories under a root recursively.

Parameters:

directory (StrOrPath) – the root directory under which to list subdirectories and file contents.

Returns:

A generator that yields all directories and files beneath the input root directory.

See also get_files(), get_matching_files(), get_matching_files_recursive()
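A recursive listing like this can be sketched with os.walk. files_recursive_sketch is hypothetical, and the traversal order and symlink handling of get_files_recursive may differ (os.walk does not follow symlinked directories by default).

```python
import os
from typing import Iterator


def files_recursive_sketch(directory: str) -> Iterator[str]:
    """Hypothetical helper: yield every directory and file path beneath directory."""
    # os.walk visits directories top-down, listing subdirectories and files separately.
    for root, dirs, files in os.walk(directory):
        for name in dirs + files:
            yield os.path.join(root, name)
```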

pyutils.files.file_utils.get_matching_files(directory: StrOrPath, glob_string: str)

Returns the subset of files whose name matches a glob.

Parameters:
  • directory (StrOrPath) – the directory to match files within.

  • glob_string (str) – the globbing pattern (may include ‘*’ and ‘?’) to use when matching files.

Returns:

A generator that yields filenames in directory that match the given glob pattern.

See also get_files(), expand_globs().
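A minimal sketch of matching a glob against the files in one directory, assuming os.scandir and fnmatch. matching_files_sketch is hypothetical and yields full paths, which may differ from what get_matching_files actually yields.

```python
import fnmatch
import os
from typing import Iterator


def matching_files_sketch(directory: str, glob_string: str) -> Iterator[str]:
    """Hypothetical helper: paths of regular files whose names match glob_string."""
    with os.scandir(directory) as entries:
        for entry in entries:
            # fnmatch applies shell-style '*' and '?' matching to the bare file name.
            if entry.is_file() and fnmatch.fnmatch(entry.name, glob_string):
                yield entry.path
```

On POSIX, fnmatch.fnmatch is case-sensitive; on Windows it normalizes case before comparing.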

pyutils.files.file_utils.get_matching_files_recursive(directory: StrOrPath, glob_string: str)

Returns the subset of files whose name matches a glob under a root recursively.

Parameters:
  • directory (StrOrPath) – the root under which to search

  • glob_string (str) – a globbing pattern that describes the subset of files and directories to return. May contain ‘?’ and ‘*’.

Returns:

A generator that yields all files and directories under the given root directory that match the given globbing pattern.

See also get_files_recursive().

pyutils.files.file_utils.get_path(filespec: StrOrPath) → StrOrPath

Returns just the path of the filespec by removing the filename and extension.

Parameters:

filespec (StrOrPath) – path to remove filename / extension(s) from

Returns:

filespec with just the leading directory components and no filename or extension(s)

Return type:

StrOrPath

See also without_path(), get_canonical_path().

>>> get_path('/home/scott/foobar.py')
'/home/scott'
>>> get_path('/home/scott/test.1.2.3.gz')
'/home/scott'
>>> get_path('~scott/frapp.txt')
'~scott'
pyutils.files.file_utils.is_directory(filename: StrOrPath) → bool

Is that path a directory (rather than a normal file)?

Parameters:

filename (StrOrPath) – the path of the file to check

Returns:

True if filename is a directory

Return type:

bool

Note

A Python core philosophy is: it’s easier to ask forgiveness than permission (https://docs.python.org/3/glossary.html#term-EAFP). That is, code that just tries an operation and handles the set of Exceptions that may arise is the preferred style. That said, this function can still be useful in some situations.

See also does_directory_exist(), is_normal_file(), is_symlink().

>>> is_directory('/tmp')
True
pyutils.files.file_utils.is_executable(filename: StrOrPath) → bool

Is the file executable?

Parameters:

filename (StrOrPath) – the file to check for execute access.

Returns:

True if file exists, is a normal file and is executable by the current process. False otherwise.

Return type:

bool

Note

A Python core philosophy is: it’s easier to ask forgiveness than permission (https://docs.python.org/3/glossary.html#term-EAFP). That is, code that just tries an operation and handles the set of Exceptions that may arise is the preferred style. That said, this function can still be useful in some situations.

See also does_file_exist(), is_readable(), is_writable().

pyutils.files.file_utils.is_normal_file(filename: StrOrPath) → bool

Is that file normal (rather than a directory or some special file)?

Parameters:

filename (StrOrPath) – the path of the file to check

Returns:

True if filename is a normal file.

Return type:

bool

Note

A Python core philosophy is: it’s easier to ask forgiveness than permission (https://docs.python.org/3/glossary.html#term-EAFP). That is, code that just tries an operation and handles the set of Exceptions that may arise is the preferred style. That said, this function can still be useful in some situations.

See also is_directory(), does_file_exist(), is_symlink().

>>> is_normal_file(__file__)
True
pyutils.files.file_utils.is_readable(filename: StrOrPath) → bool

Is the file readable?

Parameters:

filename (StrOrPath) – the filename to check for read access

Returns:

True if the file exists, is a normal file, and is readable by the current process. False otherwise.

Return type:

bool

See also does_file_exist(), is_writable(), is_executable().

pyutils.files.file_utils.is_same_file(file1: StrOrPath, file2: StrOrPath) → bool

Determine if two paths reference the same inode.

Parameters:
  • file1 (StrOrPath) – the first file

  • file2 (StrOrPath) – the second file

Returns:

True if the two files are the same file.

Return type:

bool

See also is_symlink(), is_normal_file().

>>> is_same_file('/tmp', '/tmp/../tmp')
True
>>> is_same_file('/tmp', '/home')
False
pyutils.files.file_utils.is_symlink(filename: StrOrPath) → bool

Is that path a symlink?

Parameters:

filename (StrOrPath) – the path of the file to check

Returns:

True if filename is a symlink, False otherwise.

Return type:

bool

Note

A Python core philosophy is: it’s easier to ask forgiveness than permission (https://docs.python.org/3/glossary.html#term-EAFP). That is, code that just tries an operation and handles the set of Exceptions that may arise is the preferred style. That said, this function can still be useful in some situations.

See also is_directory(), is_normal_file().

>>> is_symlink('/tmp')
False
>>> import os
>>> os.symlink('/tmp', '/tmp/foo')
>>> is_symlink('/tmp/foo')
True
>>> os.unlink('/tmp/foo')
pyutils.files.file_utils.is_writable(filename: StrOrPath) → bool

Is the file writable?

Parameters:

filename (StrOrPath) – the file to check for write access.

Returns:

True if file exists, is a normal file and is writable by the current process. False otherwise.

Return type:

bool

Note

A Python core philosophy is: it’s easier to ask forgiveness than permission (https://docs.python.org/3/glossary.html#term-EAFP). That is, code that just tries an operation and handles the set of Exceptions that may arise is the preferred style. That said, this function can still be useful in some situations.

See also is_readable(), does_file_exist().
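The is_readable() / is_writable() / is_executable() checks can be sketched as “is a regular file” plus os.access. These helpers are hypothetical, not the library’s code. Note that os.access checks the real (not effective) user and group IDs, and the answer can change between the check and a later open, which is the EAFP concern raised in the notes above.

```python
import os


def _is_file_with_access(filename: str, mode: int) -> bool:
    """Hypothetical helper: regular file that this process can access in `mode`."""
    return os.path.isfile(filename) and os.access(filename, mode)


def is_readable_sketch(filename: str) -> bool:
    return _is_file_with_access(filename, os.R_OK)


def is_writable_sketch(filename: str) -> bool:
    return _is_file_with_access(filename, os.W_OK)


def is_executable_sketch(filename: str) -> bool:
    return _is_file_with_access(filename, os.X_OK)
```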

pyutils.files.file_utils.remove(path: StrOrPath) → None

Deletes a file. Raises if path refers to a directory or a file that doesn’t exist.

Parameters:

path (StrOrPath) – the path of the file to delete

Raises:

FileNotFoundError – the path to remove does not exist

Return type:

None

>>> import os
>>> filename = '/tmp/file_utils_test_file'
>>> os.system(f'touch {filename}')
0
>>> does_file_exist(filename)
True
>>> remove(filename)
>>> does_file_exist(filename)
False
>>> filename = '/tmp/file_utils_test_file'
>>> os.system(f'touch {filename}')
0
>>> import pathlib
>>> p = pathlib.Path(filename)
>>> p.exists()
True
>>> remove(p)
>>> p.exists()
False
>>> remove("/tmp/23r23r23rwdfwfwefgdfgwerhwrgewrgergerg22r")
Traceback (most recent call last):
...
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/23r23r23rwdfwfwefgdfgwerhwrgewrgergerg22r'
pyutils.files.file_utils.remove_hash_comments(x: str) → str

Trivial function to be used as a line_transformer in slurp_file() to strip # comments from file contents.

Parameters:

x (str) – the line to transform

Return type:

str

pyutils.files.file_utils.remove_newlines(x: str) → str

Trivial function to be used as a line_transformer in slurp_file() to strip newlines from file contents.

Parameters:

x (str) – the line to transform

Return type:

str

pyutils.files.file_utils.set_file_raw_atime(filename: StrOrPath, atime: float) → None

Sets a file’s raw access time.

Parameters:
  • filename (StrOrPath) – the file whose atime should be set

  • atime (float) – raw atime as number of seconds since the Epoch to set

Return type:

None

See also get_file_raw_atime(), get_file_atime_age_seconds(), get_file_atime_as_datetime(), get_file_atime_timedelta(), get_file_raw_timestamps(), set_file_raw_mtime(), set_file_raw_atime_and_mtime(), touch_file()

pyutils.files.file_utils.set_file_raw_atime_and_mtime(filename: StrOrPath, ts: float | None = None) → None

Sets both a file’s raw modification and access times.

Parameters:
  • filename (StrOrPath) – the file whose times to set

  • ts (float | None) – the raw time to set or None to indicate time should be set to the current time.

Return type:

None

See also get_file_raw_atime(), get_file_raw_mtime(), get_file_raw_timestamps(), set_file_raw_atime(), set_file_raw_mtime()

pyutils.files.file_utils.set_file_raw_mtime(filename: StrOrPath, mtime: float)

Sets a file’s raw modification time.

Parameters:
  • filename (StrOrPath) – the file whose mtime should be set

  • mtime (float) – the raw mtime as number of seconds since the Epoch to set

See also get_file_raw_mtime(), get_file_mtime_age_seconds(), get_file_mtime_as_datetime(), get_file_mtime_timedelta(), get_file_raw_timestamps(), set_file_raw_atime(), set_file_raw_atime_and_mtime(), touch_file()

pyutils.files.file_utils.slurp_file(filename: StrOrPath, *, skip_blank_lines: bool = False, line_transformers: List[Callable[[str], str]] | None = None)

Reads a file’s contents line by line into a memory buffer, applying each line transformer in turn.

Parameters:
  • filename (StrOrPath) – file to be read

  • skip_blank_lines (bool) – should reading skip blank lines?

  • line_transformers (List[Callable[[str], str]] | None) – small string-to-string transformations applied to each line in order

Returns:

A list of lines from the read and transformed file contents.

Raises:

Exception – filename not found or can’t be read.
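A sketch of the line-transformer pipeline that slurp_file describes. slurp_sketch, strip_hash_comment and strip_ws are hypothetical stand-ins, not the library’s functions, and this sketch assumes blank lines are detected after the transformers have run.

```python
from typing import Callable, List, Optional


def slurp_sketch(
    filename: str,
    *,
    skip_blank_lines: bool = False,
    line_transformers: Optional[List[Callable[[str], str]]] = None,
) -> List[str]:
    """Hypothetical helper: read lines, applying each transformer in order."""
    result = []
    with open(filename) as rf:
        for line in rf:
            for transform in line_transformers or []:
                line = transform(line)
            # Assumption: blankness is judged on the transformed line.
            if skip_blank_lines and line.strip() == '':
                continue
            result.append(line)
    return result


# Hypothetical stand-ins for remove_hash_comments / strip_whitespace.
def strip_hash_comment(x: str) -> str:
    return x.split('#', 1)[0]


def strip_ws(x: str) -> str:
    return x.strip()
```

With a file containing "alpha  # first", a blank line, "# comment only" and "  beta", slurp_sketch(path, skip_blank_lines=True, line_transformers=[strip_hash_comment, strip_ws]) returns ['alpha', 'beta'].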

pyutils.files.file_utils.strip_whitespace(x: str) → str

Trivial function to be used as a line_transformer in slurp_file() to strip leading / trailing whitespace from file contents.

Parameters:

x (str) – the line to transform

Return type:

str

pyutils.files.file_utils.touch_file(filename: StrOrPath, *, mode: int | None = 438)

Mirrors the semantics of the UNIX “touch” command: update the timestamp of a file to the current time if the file exists; create the file if it doesn’t.

Parameters:
  • filename (StrOrPath) – the filename

  • mode (int | None) – the mode to create the file with

Warning

The default creation mode is 0o666 (438 in decimal, as shown in the signature), which is world readable and writable. Override this by passing your own mode parameter if desired.

See also set_file_raw_atime(), set_file_raw_atime_and_mtime(), set_file_raw_mtime(), create_path_if_not_exist()
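
The standard library's pathlib.Path.touch() has closely related semantics, so a hypothetical touch_file_sketch can be a thin wrapper around it. This is an illustration, not pyutils code; in particular, the sketch does not accept mode=None.

```python
import os
import pathlib
from typing import Union


def touch_file_sketch(filename: Union[str, os.PathLike], *, mode: int = 0o666) -> None:
    """Hypothetical touch built on pathlib; not the pyutils implementation."""
    # If the file exists, Path.touch() only updates its timestamps to "now"
    # and leaves its permissions alone. Otherwise it creates the file with
    # `mode`, which the OS then masks with the process umask.
    pathlib.Path(filename).touch(mode=mode, exist_ok=True)
```

With Path.touch(), mode only matters when the file is created, and the operating system masks it with the process umask, so the effective permissions are usually narrower than 0o666.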

pyutils.files.file_utils.without_all_extensions(path: StrOrPath) StrOrPath[source]

Removes all extensions from a path; handles multiple extensions like foobar.tar.gz -> foobar.

Parameters:

path (StrOrPath) – the path from which to remove all extensions

Returns:

the path with all extensions removed.

Return type:

StrOrPath

See also without_extension()

>>> without_all_extensions('/home/scott/foobar.1.tar.gz')
'/home/scott/foobar'
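
One way to get this behavior is to apply os.path.splitext() until it finds no more extensions. The sketch below assumes that approach (without_all_extensions_sketch is hypothetical) and returns a Path when given a Path, mirroring the StrOrPath return type.

```python
import os
from pathlib import Path
from typing import Union

StrOrPath = Union[str, Path]  # assumed stand-in for the documented alias


def without_all_extensions_sketch(path: StrOrPath) -> StrOrPath:
    """Hypothetical re-creation: strip extensions until none remain."""
    s = str(path)
    while True:
        # splitext() only inspects the final path component and treats a
        # leading dot (".bashrc") as part of the name, not an extension.
        root, ext = os.path.splitext(s)
        if not ext:
            break
        s = root
    # Hand back the same kind of object we were given.
    return Path(s) if isinstance(path, Path) else s
```

Because splitext() only looks at the last component, dots in directory names are left alone by this sketch.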

pyutils.files.file_utils.without_extension(path: StrOrPath) StrOrPath[source]

Remove one (the last) extension from a file or path.

Parameters:

path (StrOrPath) – the path from which to remove an extension

Returns:

the path with one extension removed.

Return type:

StrOrPath

See also without_all_extensions().

>>> without_extension('foobar.txt')
'foobar'
>>> without_extension('/home/scott/frapp.py')
'/home/scott/frapp'
>>> f = 'a.b.c.tar.gz'
>>> while '.' in f:
...     f = without_extension(f)
...     print(f)
a.b.c.tar
a.b.c
a.b
a
>>> without_extension('foobar')
'foobar'

pyutils.files.file_utils.without_path(filespec: StrOrPath) StrOrPath[source]

Returns the base filename without any leading path.

Parameters:

filespec (StrOrPath) – path to remove leading directories from

Returns:

filespec without leading dir components.

Return type:

StrOrPath

See also get_path(), get_canonical_path().

>>> without_path('/home/scott/foo.py')
'foo.py'
>>> without_path('foo.py')
'foo.py'
>>> import pathlib
>>> str(without_path(pathlib.Path('/tmp/testing.123')))
'testing.123'

pyutils.files.lockfile module

This is a lockfile implementation I created for use with cronjobs on my machine to prevent multiple copies of a job from running in parallel.

For local operations, while one job is running this code keeps a file on disk to indicate that a lock is held. Until the lock is released, other copies that detect it will fail to start. There are provisions in the code for timing out locks, cleaning up a lock when a signal is received, gracefully retrying lock acquisition on failure, etc.

It also supports Zookeeper-based locks, selected by prefixing the lockfile path with ‘zk:’, in order to synchronize processes across different machines.
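
At its core, a local lockfile is an atomic create-if-absent on disk. A minimal sketch, assuming POSIX-style O_CREAT | O_EXCL semantics; SimpleLockFile is a hypothetical class that borrows a few method names from LockFile for readability but omits expiration, stale-lock detection, signal cleanup, retries, and Zookeeper support.

```python
import os


class SimpleLockFile:
    """Hypothetical minimal local lock; not pyutils' LockFile."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.is_locked = False

    def try_acquire_lock_once(self) -> bool:
        try:
            # O_CREAT | O_EXCL: create the file only if it does not already
            # exist, atomically, so at most one caller can succeed.
            fd = os.open(self.path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
        except FileExistsError:
            return False
        # Record our pid so others can see who holds the lock.
        with os.fdopen(fd, "w") as f:
            f.write(f"{os.getpid()}\n")
        self.is_locked = True
        return True

    def release(self) -> None:
        if self.is_locked:
            os.unlink(self.path)
            self.is_locked = False

    def __enter__(self) -> "SimpleLockFile":
        # Assumption: fail fast instead of retrying or blocking.
        if not self.try_acquire_lock_once():
            raise RuntimeError(f"lock already held: {self.path}")
        return self

    def __exit__(self, *exc_info: object) -> None:
        self.release()
```

O_EXCL is what makes this safe: checking os.path.exists() and then creating the file would leave a window in which two processes could both decide the lock is free.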

class pyutils.files.lockfile.LocalLockFileContents(pid: int, commandline: str, expiration_timestamp: float | None)[source]

Bases: object

The contents we’ll write to each lock file.

Parameters:
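  • pid (int) – the pid of the process that holds the lock

  • commandline (str) – the commandline of the process that holds the lock

  • expiration_timestamp (float | None) – when this lock will expire, as seconds since the Epoch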

commandline: str

The commandline of the process that holds the lock

expiration_timestamp: float | None

When this lock will expire as seconds since Epoch

pid: int

The pid of the process that holds the lock
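
A sketch of how these three fields could be serialized into a lock file and used to judge whether a lock looks stale. The JSON format, the LockContentsSketch class, and both helper functions are assumptions for illustration rather than the pyutils on-disk format; the liveness check relies on POSIX os.kill(pid, 0) semantics.

```python
import json
import os
import time
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class LockContentsSketch:
    """Hypothetical mirror of LocalLockFileContents' three fields."""
    pid: int
    commandline: str
    expiration_timestamp: Optional[float]


def to_json(contents: LockContentsSketch) -> str:
    # asdict() turns the dataclass into a plain dict that json can encode.
    return json.dumps(asdict(contents))


def from_json(text: str) -> LockContentsSketch:
    return LockContentsSketch(**json.loads(text))


def pid_is_alive(pid: int) -> bool:
    """POSIX-only: signal 0 checks for existence without sending anything."""
    if pid <= 0:
        return False  # 0 and negative values address process groups
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def looks_stale(contents: LockContentsSketch, now: Optional[float] = None) -> bool:
    """A lock looks stale if its lease has expired or its holder is gone."""
    now = time.time() if now is None else now
    if contents.expiration_timestamp is not None and now >= contents.expiration_timestamp:
        return True
    return not pid_is_alive(contents.pid)
```

Pids can be reused by unrelated processes, so a pid-based check like this can mistake a dead holder for a live one.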

class pyutils.files.lockfile.LockFile(lockfile_path: str, *, do_signal_cleanup: bool = True, expiration_timestamp: float | None = None, override_command: str | None = None)[source]

Bases: AbstractContextManager

A file locking mechanism with context-manager support, so you can use it in a with statement, e.g.:

from pyutils.files.lockfile import LockFile

with LockFile('./foo.lock'):
    # do a bunch of stuff... if the process dies we have a signal
    # handler to do cleanup.  Other code (in this process or another)
    # that tries to take the same lockfile will block.  There is also
    # some logic for detecting stale locks.
    pass

C’tor.

Parameters:
  • lockfile_path (str) – path of the lockfile to acquire; may begin with zk: to indicate a path in Zookeeper rather than on the local filesystem. Note that Zookeeper-based locks require an expiration_timestamp because stale-lock detection is skipped for non-local locks.

  • do_signal_cleanup (bool) – handle SIGINT and SIGTERM events by releasing the lock before exiting

  • expiration_timestamp (Optional[float]) – when our lease on the lock should expire (as seconds since the Epoch). None means the lock will not expire until we explicitly release it. Note that this is required for Zookeeper-based locks.

  • override_command (Optional[str]) – if provided, use this as our commandline instead of deriving it from argv.

Raises:

Exception – Zookeeper lock path without an expiration timestamp

acquire_with_retries(*, initial_delay: float = 1.0, backoff_factor: float = 2.0, max_attempts: int = 5) bool[source]

Attempt to acquire the lock repeatedly with retries and backoffs.

Parameters:
  • initial_delay (float) – how long to wait before retrying the first time

  • backoff_factor (float) – a float >= 1.0 that multiplies the current retry delay each subsequent time we attempt to acquire the lock and fail.

  • max_attempts (int) – maximum number of times to try before giving up and failing.

Returns:

True if the lock was acquired and False otherwise.

Return type:

bool
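
One plausible reading of these parameters is a bounded exponential backoff. The sketch below assumes that reading, plus two details the documentation does not state: no sleep follows the final failed attempt, and backoff_factor values below 1.0 are rejected. acquire_with_retries_sketch and its try_once and sleep parameters are hypothetical; sleep is injectable only so the schedule can be tested without waiting.

```python
import time
from typing import Callable


def acquire_with_retries_sketch(
    try_once: Callable[[], bool],
    *,
    initial_delay: float = 1.0,
    backoff_factor: float = 2.0,
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Hypothetical retry loop with exponential backoff."""
    if backoff_factor < 1.0:
        raise ValueError("backoff_factor must be >= 1.0")
    delay = initial_delay
    for attempt in range(max_attempts):
        if try_once():
            return True
        if attempt < max_attempts - 1:  # don't sleep after the last failure
            sleep(delay)
            delay *= backoff_factor
    return False
```

Under these assumptions the defaults sleep for at most 1 + 2 + 4 + 8 = 15 seconds across five attempts.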

locked() bool[source]

Is it locked currently?

Return type:

bool

release() None[source]

Release the lock

Return type:

None

try_acquire_lock_once() bool[source]

Attempt to acquire the lock with no blocking.

Returns:

True if the lock was acquired and False otherwise.

Return type:

bool

Module contents