pyutils.files package
This subpackage contains utilities for dealing with files on disk.
Submodules
pyutils.files.directory_filter module
This module contains two classes meant to help reduce unnecessary disk I/O operations:
The first, DirectoryFileFilter, determines when the contents of a file held in memory are identical to the file copy already on disk.
The second, DirectoryAllFilesFilter, is basically the same except that the caller need not indicate the name of the disk file because it will check the memory file’s signature against all file signatures in a particular directory on disk.
See examples below.
- class pyutils.files.directory_filter.DirectoryAllFilesFilter(directory: str)[source]
Bases:
DirectoryFileFilter
A predicate that will return False if a file to-be-written to a particular directory is identical to any other file in that same directory (regardless of its name).
That is, this is the same as DirectoryFileFilter except that our apply() method will return False not only if the contents to be written are identical to the contents of filename on the disk but also if some other file sitting in the same directory already contains those identical contents.
>>> import os
>>> testfile = '/tmp/directory_filter_text_f39e5b58-c260-40da-9448-ad1c3b2a69c3.txt'
>>> contents = b'This is a test'
>>> with open(testfile, 'wb') as wf:
...     wf.write(contents)
14
>>> d = DirectoryAllFilesFilter('/tmp')
>>> d.apply(contents)  # False if _any_ file in /tmp contains contents
False
>>> d.apply(b'That was a test')  # True otherwise
True
>>> os.remove(testfile)
- Parameters:
directory (str) – the directory we’re watching
- apply(proposed_contents: Any, ignored_filename: str | None = None) bool [source]
Call this before writing a new file to directory with the proposed_contents to be written and it will return a value that indicates whether the identical contents is already sitting in any file in that directory. Useful, e.g., for caching.
- Parameters:
proposed_contents (Any) – the contents about to be persisted to directory
ignored_filename (str | None) – unused for now, must be None
- Returns:
True if proposed contents does not yet exist in any file in directory or False if it does exist in some file already.
- Return type:
bool
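The directory-wide check described above can be sketched as follows. This is a minimal illustration and not the library's implementation: the helper names directory_digests and is_new_content are hypothetical, and SHA-256 is an assumed choice of signature.

```python
import hashlib
import os
import tempfile
from typing import Set


def directory_digests(directory: str) -> Set[str]:
    """Hypothetical helper: digests of every regular file directly in directory."""
    digests = set()
    for entry in os.scandir(directory):
        if entry.is_file():
            with open(entry.path, "rb") as rf:
                digests.add(hashlib.sha256(rf.read()).hexdigest())
    return digests


def is_new_content(proposed_contents: bytes, directory: str) -> bool:
    """True if no file in directory already holds exactly these bytes."""
    proposed = hashlib.sha256(proposed_contents).hexdigest()
    return proposed not in directory_digests(directory)


# Demo in a throwaway directory so nothing in /tmp is touched.
with tempfile.TemporaryDirectory() as tmpdir:
    with open(os.path.join(tmpdir, "cached.bin"), "wb") as wf:
        wf.write(b"This is a test")
    already_there = is_new_content(b"This is a test", tmpdir)
    not_there_yet = is_new_content(b"That was a test", tmpdir)
```

Rehashing the whole directory on every call keeps the sketch short; a real filter would more likely cache the digests and refresh them only when files change.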
- class pyutils.files.directory_filter.DirectoryFileFilter(directory: str)[source]
Bases:
object
A predicate that will return False if / when a proposed file’s content to-be-written is identical to the contents of the file on disk allowing calling code to safely skip the write.
- Raises:
ValueError – directory doesn’t exist
>>> import os
>>> testfile = '/tmp/directory_filter_text_f39e5b58-c260-40da-9448-ad1c3b2a69c2.txt'
>>> contents = b'This is a test'
>>> with open(testfile, 'wb') as wf:
...     wf.write(contents)
14
>>> d = DirectoryFileFilter('/tmp')
>>> d.apply(contents, testfile)  # False if testfile already contains contents
False
>>> d.apply(b'That was a test', testfile)  # True otherwise
True
>>> os.remove(testfile)
- Parameters:
directory (str) – the directory we’re filtering accesses to
- apply(proposed_contents: Any, filename: str) bool [source]
Call this with the proposed new contents of filename in memory and we’ll compute the checksum of those contents and return a value that indicates whether they are identical to the disk contents already (so you can skip the write safely).
- Parameters:
proposed_contents (Any) – the contents about to be written to filename
filename (str) – the file about to be populated with proposed_contents
- Returns:
False if the disk contents of the file are already identical to proposed_contents (so the write can be skipped) and True otherwise.
- Return type:
bool
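The comparison that apply() describes can be sketched as below. This is an illustration under stated assumptions, not the library's code: should_write is a hypothetical helper and SHA-256 is an assumed checksum.

```python
import hashlib
import os
import tempfile


def should_write(proposed_contents: bytes, filename: str) -> bool:
    """Hypothetical helper: True if a write is needed, False if it can be skipped."""
    if not os.path.exists(filename):
        return True  # nothing on disk yet, so the write is needed
    with open(filename, "rb") as rf:
        on_disk = hashlib.sha256(rf.read()).hexdigest()
    in_memory = hashlib.sha256(proposed_contents).hexdigest()
    return on_disk != in_memory


# Demo in a throwaway directory.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "example.txt")
    with open(path, "wb") as wf:
        wf.write(b"This is a test")
    same = should_write(b"This is a test", path)
    different = should_write(b"That was a test", path)
```

Calling code would then wrap its write in "if should_write(...):" and skip the disk I/O when the result is False.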
pyutils.files.file_utils module
This is a grab bag of file-related utilities. It has code to, for example, read files while transforming the text as it is read, normalize pathnames, strip extensions, read and manipulate atimes/mtimes/ctimes, compute a signature based on a file’s contents, traverse the file system recursively, etc…
Note
Many of these functions accept either a string or a pathlib.Path object and will return the same type they were given. I’ve defined a local TypeVar called StrOrPath to use on these routines.
- class pyutils.files.file_utils.CreateFileWithMode(filename: StrOrPath, filesystem_mode: int | None = 384, open_mode: str | None = 'w', *, encoding: str | None = None)[source]
Bases:
AbstractContextManager
This helper context manager can be used instead of the typical pattern for creating a file if you want to ensure that the file is created with a particular filesystem permission mode.
Python’s open doesn’t support this; you need to set the os.umask and then create a descriptor to open via os.open, see below.
>>> import os
>>> filename = f'/tmp/CreateFileWithModeTest.{os.getpid()}'
>>> with CreateFileWithMode(filename, filesystem_mode=0o600) as wf:
...     print('This is a test', file=wf)
>>> result = os.stat(filename)
Note: there is a high order bit set in this that is S_IFREG indicating that the file is a “normal file”. Clear it with the mask.
>>> print(f'{result.st_mode & 0o7777:o}')
600
>>> with open(filename, 'r') as rf:
...     contents = rf.read()
>>> contents
'This is a test\n'
>>> remove(filename)
- Parameters:
filename (StrOrPath) – path of the file to create.
filesystem_mode (int | None) – the UNIX-style octal mode with which to create the filename. Defaults to 0o600.
open_mode (str | None) – the mode to use when opening the file (e.g. ‘w’, ‘wb’, etc…)
encoding (str | None) – optional encoding you’re using to write the opened file. Use None for binary files (e.g. ‘wb’ mode).
Warning
If the file already exists it will be overwritten!
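The umask-plus-os.open pattern mentioned above might look like the following on a POSIX system. This is a sketch, not the class's actual implementation; open_with_mode is a hypothetical name. Note that the mode passed to os.open only takes effect when the file is newly created.

```python
import os
import tempfile


def open_with_mode(filename: str, filesystem_mode: int = 0o600):
    """Hypothetical helper: create filename with exactly filesystem_mode (POSIX)."""
    old_umask = os.umask(0)  # clear the umask so it cannot mask out mode bits
    try:
        fd = os.open(
            filename, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, filesystem_mode
        )
    finally:
        os.umask(old_umask)  # always restore the process-wide umask
    return os.fdopen(fd, "w")  # wrap the descriptor in a normal file object


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "secret.txt")
    with open_with_mode(path, 0o600) as wf:
        print("This is a test", file=wf)
    mode = os.stat(path).st_mode & 0o777  # drop the file-type bits
```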
- class pyutils.files.file_utils.FileWriter(filename: StrOrPath)[source]
Bases:
AbstractContextManager
A helper that writes a file to a temporary location and then moves it atomically to its ultimate destination on close.
Example usage. Creates a temporary file that is populated by the print statements within the context. Until the context is exited, the true destination file does not exist so no reader of it can see partial writes due to buffering or code timing. Once the context is exited, the file is moved from its temporary location to its permanent location by a call to /bin/mv which should be atomic:
with FileWriter('/home/bob/foobar.txt') as w:
    print("This is a test!", file=w)
    time.sleep(2)
    print("This is only a test...", file=w)
- Parameters:
filename (StrOrPath) – the ultimate destination file we want to populate. On exit, the file will be atomically created.
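The write-to-temporary-then-move idea can be sketched with the standard library alone. This is an illustration, not FileWriter's implementation: it swaps in os.replace for the call to /bin/mv, and write_atomically is a hypothetical name. The temporary file is created in the destination directory so that the rename stays on one filesystem.

```python
import os
import tempfile


def write_atomically(filename: str, text: str) -> None:
    """Hypothetical helper: readers see either the old file or the complete new one."""
    directory = os.path.dirname(os.path.abspath(filename))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as wf:
            wf.write(text)
        os.replace(tmp_path, filename)  # atomic rename on the same filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)  # do not leave a stray temporary file behind
        raise


with tempfile.TemporaryDirectory() as tmpdir:
    target = os.path.join(tmpdir, "foobar.txt")
    write_atomically(target, "This is a test!\n")
    with open(target, "r") as rf:
        written = rf.read()
    leftovers = os.listdir(tmpdir)
```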
- pyutils.files.file_utils.create_path_if_not_exist(path: StrOrPath, on_error: Callable[[StrOrPath, OSError], None] | None = None) None [source]
Attempts to create path if it does not exist already.
- Parameters:
path (StrOrPath) – the path to attempt to create
on_error (Callable[[StrOrPath, OSError], None] | None) – if provided, this is invoked on error conditions and passed the path and OSError that it caused
- Raises:
OSError – an exception occurred and on_error was not set.
- Return type:
None
See also
does_file_exist().
>>> import uuid
>>> import os
>>> path = os.path.join("/tmp", str(uuid.uuid4()), str(uuid.uuid4()))
>>> os.path.exists(path)
False
>>> create_path_if_not_exist(path)
>>> os.path.exists(path)
True
- pyutils.files.file_utils.delete(path: StrOrPath) None [source]
This is a convenience for my dumb ass who can’t remember os.remove sometimes.
- Parameters:
path (StrOrPath) –
- Return type:
None
- pyutils.files.file_utils.describe_file_atime(filename: StrOrPath, *, brief: bool = False) str | None [source]
Describe how long ago a file was accessed.
- Parameters:
filename (StrOrPath) – the file whose atime should be described.
brief (bool) – if True, describe atime briefly.
- Returns:
A string that represents how long ago filename was last accessed. The description will be verbose or brief depending on the brief argument.
- Return type:
str | None
See also
get_file_raw_atime(), get_file_atime_age_seconds(), get_file_atime_as_datetime(), get_file_atime_timedelta(), get_file_raw_ctime(), get_file_raw_mtime(), get_file_raw_timestamps(), set_file_raw_atime(), set_file_raw_atime_and_mtime()
- pyutils.files.file_utils.describe_file_ctime(filename: StrOrPath, *, brief: bool = False) str | None [source]
Describes a file’s creation time.
- Parameters:
filename (StrOrPath) – the file whose ctime should be described.
brief (bool) – if True, describe ctime briefly.
- Returns:
A string that represents how long ago filename was created. The description will be verbose or brief depending on the brief argument.
- Return type:
str | None
See also
get_file_raw_atime(), get_file_raw_ctime(), get_file_ctime_age_seconds(), get_file_ctime_as_datetime(), get_file_ctime_timedelta(), get_file_raw_mtime(), get_file_raw_timestamps()
- pyutils.files.file_utils.describe_file_mtime(filename: StrOrPath, *, brief: bool = False) str | None [source]
Describes how long ago a file was modified.
- Parameters:
filename (StrOrPath) – the file whose mtime should be described.
brief (bool) – if True, describe mtime briefly.
- Returns:
A string that represents how long ago filename was last modified. The description will be verbose or brief depending on the brief argument.
- Return type:
str | None
See also
get_file_raw_atime(), get_file_raw_ctime(), get_file_raw_mtime(), get_file_mtime_age_seconds(), get_file_mtime_as_datetime(), get_file_mtime_timedelta(), get_file_raw_timestamps(), set_file_raw_atime(), set_file_raw_atime_and_mtime()
- pyutils.files.file_utils.describe_file_timestamp(filename: StrOrPath, extractor, *, brief=False) str | None [source]
Internal helper.
- Parameters:
filename (StrOrPath) –
- Return type:
str | None
- pyutils.files.file_utils.does_directory_exist(dirname: StrOrPath) bool [source]
Does the given directory exist?
- Parameters:
dirname (StrOrPath) – the name of the directory to check
- Returns:
True if a path exists and is a directory, not a regular file.
- Return type:
bool
See also
does_file_exist().
>>> does_directory_exist('/tmp')
True
>>> does_directory_exist('/xyzq/21341')
False
- pyutils.files.file_utils.does_file_exist(filename: StrOrPath) bool [source]
Returns True if a file exists and is a normal file.
- Parameters:
filename (StrOrPath) – filename to check
- Returns:
True if filename exists and is a normal file.
- Return type:
bool
Note
A Python core philosophy is: it’s easier to ask forgiveness than permission (https://docs.python.org/3/glossary.html#term-EAFP). That is, code that just tries an operation and handles the set of Exceptions that may arise is the preferred style. That said, this function can still be useful in some situations.
See also
create_path_if_not_exist(), is_readable().
>>> does_file_exist(__file__)
True
>>> does_file_exist('/tmp/2492043r9203r9230r9230r49230r42390r4230')
False
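The EAFP style that the note refers to can be illustrated with a short sketch. The helper name read_or_default is hypothetical and is not part of this module.

```python
import os
import tempfile


def read_or_default(filename: str, default: str = "") -> str:
    """EAFP: attempt the read and handle the failure, rather than checking first."""
    try:
        with open(filename, "r") as rf:
            return rf.read()
    except FileNotFoundError:
        return default


with tempfile.TemporaryDirectory() as tmpdir:
    present = os.path.join(tmpdir, "present.txt")
    with open(present, "w") as wf:
        wf.write("hello")
    found = read_or_default(present, default="fallback")
    missing = read_or_default(os.path.join(tmpdir, "absent.txt"), default="fallback")
```

A check-then-open version (look before you leap) leaves a window in which the file can disappear between the check and the open; the try/except form has no such gap.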
- pyutils.files.file_utils.does_path_exist(pathname: StrOrPath) bool [source]
Just a more verbose wrapper around os.path.exists.
- Parameters:
pathname (StrOrPath) –
- Return type:
bool
- pyutils.files.file_utils.expand_globs(in_filename: StrOrPath) Generator[StrOrPath, None, None] [source]
Expands shell globs (* and ? wildcards) to the matching files.
- Parameters:
in_filename (StrOrPath) – the filepath to be expanded. May contain ‘*’ and ‘?’ globbing characters.
- Returns:
A Generator that yields filenames that match the input pattern.
- Return type:
Generator[StrOrPath, None, None]
See also
get_files(), get_files_recursive().
- pyutils.files.file_utils.fix_multiple_slashes(path: StrOrPath) StrOrPath [source]
Collapses runs of multiple slashes in a path or path-like string into a single slash.
- Parameters:
path (StrOrPath) – the path in which to remove multiple slashes
- Return type:
StrOrPath
>>> p = '/usr/local//etc/rc.d///file.txt'
>>> fix_multiple_slashes(p)
'/usr/local/etc/rc.d/file.txt'
>>> import pathlib
>>> p = pathlib.Path(p)
>>> str(fix_multiple_slashes(p))
'/usr/local/etc/rc.d/file.txt'
>>> p = 'this is a test'
>>> fix_multiple_slashes(p) == p
True
- pyutils.files.file_utils.get_all_extensions(path: StrOrPath) List[str] [source]
Return the extensions of a file or path in order.
- Parameters:
path (StrOrPath) – the path from which to extract all extensions.
- Returns:
a list containing each extension; the list may be empty.
- Return type:
List[str]
See also
without_extension(), without_all_extensions(), get_extension().
>>> get_all_extensions('/home/scott/foo.tar.gz.1')
['.tar', '.gz', '.1']
>>> get_all_extensions('/home/scott/foobar')
[]
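For comparison, the standard library's pathlib exposes a similar notion of extensions. The sketch below uses only pathlib; whether this module is built on it, or agrees with it in every edge case, is not something the documentation states.

```python
import pathlib

path = pathlib.PurePosixPath("/home/scott/foo.tar.gz.1")
all_suffixes = path.suffixes  # every extension, in order
last_suffix = path.suffix     # only the final extension
no_suffixes = pathlib.PurePosixPath("/home/scott/foobar").suffixes
```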
- pyutils.files.file_utils.get_canonical_path(filespec: StrOrPath) StrOrPath [source]
Returns a canonicalized absolute path.
- Parameters:
filespec (StrOrPath) – the path to canonicalize
- Returns:
the canonicalized path
- Return type:
StrOrPath
See also
get_path(), without_path().
>>> get_canonical_path('/tmp/../tmp/../tmp')
'/tmp'
- pyutils.files.file_utils.get_directories(directory: StrOrPath)[source]
Returns the subdirectories in a directory as a generator.
- Parameters:
directory (StrOrPath) – the directory to list subdirectories within.
- Returns:
A generator that yields all subdirectories within the given input directory.
See also
get_files(), get_files_recursive().
- pyutils.files.file_utils.get_extension(path: StrOrPath) str [source]
Extract and return one (the last) extension from a file or path.
- Parameters:
path (StrOrPath) – the path from which to extract an extension
- Returns:
The last extension from the file path.
- Return type:
str
See also
without_extension(), without_all_extensions(), get_all_extensions().
>>> get_extension('this_is_a_test.txt')
'.txt'
>>> get_extension('/home/scott/test.py')
'.py'
>>> get_extension('foobar')
''
>>> import pathlib
>>> get_extension(pathlib.Path('/tmp/foobar.txt'))
'.txt'
- pyutils.files.file_utils.get_file_atime_age_seconds(filename: StrOrPath) int | None [source]
Gets a file’s access time as an age in seconds (ago).
- Parameters:
filename (StrOrPath) – file whose atime should be checked.
- Returns:
The number of seconds ago that filename was last accessed.
- Return type:
int | None
See also
get_file_raw_atime(), get_file_atime_as_datetime(), get_file_atime_timedelta(), get_file_raw_ctime(), get_file_raw_mtime(), get_file_raw_timestamps(), set_file_raw_atime(), set_file_raw_atime_and_mtime()
- pyutils.files.file_utils.get_file_atime_as_datetime(filename: StrOrPath) datetime | None [source]
Fetch a file’s access time as a Python datetime.
- Parameters:
filename (StrOrPath) – the file whose atime should be fetched.
- Returns:
The file’s atime as a Python datetime.datetime.
- Return type:
datetime | None
See also
get_file_raw_atime(), get_file_atime_age_seconds(), get_file_atime_timedelta(), get_file_raw_ctime(), get_file_raw_mtime(), get_file_raw_timestamps(), set_file_raw_atime(), set_file_raw_atime_and_mtime()
- pyutils.files.file_utils.get_file_atime_timedelta(filename: StrOrPath) timedelta | None [source]
How long ago was a file accessed as a timedelta?
- Parameters:
filename (StrOrPath) – the file whose atime should be checked.
- Returns:
A Python datetime.timedelta representing how long ago filename was last accessed.
- Return type:
timedelta | None
See also
get_file_raw_atime(), get_file_atime_age_seconds(), get_file_atime_as_datetime(), get_file_raw_ctime(), get_file_raw_mtime(), get_file_raw_timestamps(), set_file_raw_atime(), set_file_raw_atime_and_mtime()
- pyutils.files.file_utils.get_file_ctime_age_seconds(filename: StrOrPath) int | None [source]
Gets a file’s creation time as an age in seconds (ago).
- Parameters:
filename (StrOrPath) – file whose ctime should be checked.
- Returns:
The number of seconds ago that filename was created.
- Return type:
int | None
See also
get_file_raw_ctime(), get_file_ctime_age_seconds(), get_file_ctime_as_datetime(), get_file_ctime_timedelta(), get_file_raw_mtime(), get_file_raw_atime(), get_file_raw_timestamps()
- pyutils.files.file_utils.get_file_ctime_as_datetime(filename: StrOrPath) datetime | None [source]
Fetches a file’s creation time as a Python datetime.
- Parameters:
filename (StrOrPath) – the file whose ctime should be fetched.
- Returns:
The file’s ctime as a Python datetime.datetime.
- Return type:
datetime | None
See also
get_file_raw_ctime(), get_file_ctime_age_seconds(), get_file_ctime_timedelta(), get_file_raw_atime(), get_file_raw_mtime(), get_file_raw_timestamps()
- pyutils.files.file_utils.get_file_ctime_timedelta(filename: StrOrPath) timedelta | None [source]
How long ago was a file created as a timedelta?
- Parameters:
filename (StrOrPath) – the file whose ctime should be checked.
- Returns:
A Python datetime.timedelta representing how long ago filename was created.
- Return type:
timedelta | None
See also
get_file_raw_atime(), get_file_raw_ctime(), get_file_ctime_age_seconds(), get_file_ctime_as_datetime(), get_file_raw_mtime(), get_file_raw_timestamps()
- pyutils.files.file_utils.get_file_md5(filename: StrOrPath) str [source]
Hashes filename’s disk contents and returns the MD5 digest.
- Parameters:
filename (StrOrPath) – the file whose contents to hash
- Returns:
the MD5 digest of the file’s contents. Raises on error.
- Return type:
str
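Hashing a file's contents with MD5 can be sketched with hashlib as below. This is an illustration rather than the function's actual body; md5_of_file and its chunk_size parameter are hypothetical. Reading in chunks keeps memory use flat for large files.

```python
import hashlib
import os
import tempfile


def md5_of_file(filename: str, chunk_size: int = 65536) -> str:
    """Hypothetical helper: hex MD5 digest of a file, read one chunk at a time."""
    digest = hashlib.md5()
    with open(filename, "rb") as rf:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: rf.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "data.bin")
    with open(path, "wb") as wf:
        wf.write(b"This is a test")
    file_digest = md5_of_file(path)
    memory_digest = hashlib.md5(b"This is a test").hexdigest()
```

As in the sketch, an unreadable or missing file surfaces as an exception from open().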
- pyutils.files.file_utils.get_file_mtime_age_seconds(filename: StrOrPath) int | None [source]
Gets a file’s modification time as seconds (ago).
- Parameters:
filename (StrOrPath) – file whose mtime should be checked.
- Returns:
The number of seconds ago that filename was last modified.
- Return type:
int | None
See also
get_file_raw_atime(), get_file_raw_ctime(), get_file_raw_mtime(), get_file_mtime_as_datetime(), get_file_mtime_timedelta(), get_file_raw_timestamps(), set_file_raw_atime(), set_file_raw_atime_and_mtime()
- pyutils.files.file_utils.get_file_mtime_as_datetime(filename: StrOrPath) datetime | None [source]
Fetch a file’s modification time as a Python datetime.
- Parameters:
filename (StrOrPath) – the file whose mtime should be fetched.
- Returns:
The file’s mtime as a Python datetime.datetime.
- Return type:
datetime | None
See also
get_file_raw_mtime(), get_file_mtime_age_seconds(), get_file_mtime_timedelta(), get_file_raw_ctime(), get_file_raw_atime(), get_file_raw_timestamps(), set_file_raw_atime(), set_file_raw_atime_and_mtime()
- pyutils.files.file_utils.get_file_mtime_timedelta(filename: StrOrPath) timedelta | None [source]
Gets a file’s modification time as a Python timedelta.
- Parameters:
filename (StrOrPath) – the file whose mtime should be checked.
- Returns:
A Python datetime.timedelta representing how long ago filename was last modified.
- Return type:
timedelta | None
See also
get_file_raw_atime(), get_file_raw_ctime(), get_file_raw_mtime(), get_file_mtime_age_seconds(), get_file_mtime_as_datetime(), get_file_raw_timestamps(), set_file_raw_atime(), set_file_raw_atime_and_mtime()
- pyutils.files.file_utils.get_file_raw_atime(filename: StrOrPath) float | None [source]
Get a file’s raw access time.
- Parameters:
filename (StrOrPath) – the path to the file to stat
- Returns:
The file’s raw atime (seconds since the Epoch) or None on error.
- Return type:
float | None
See also
get_file_atime_age_seconds(), get_file_atime_as_datetime(), get_file_atime_timedelta(), get_file_raw_ctime(), get_file_raw_mtime(), get_file_raw_timestamps()
- pyutils.files.file_utils.get_file_raw_ctime(filename: StrOrPath) float | None [source]
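Get a file’s raw ctime (creation time; note that on Unix systems os.stat reports st_ctime as the time of the most recent metadata change).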
- Parameters:
filename (StrOrPath) – the path to the file to stat
- Returns:
The file’s raw ctime (seconds since the Epoch) or None on error.
- Return type:
float | None
See also
get_file_raw_atime(), get_file_ctime_age_seconds(), get_file_ctime_as_datetime(), get_file_ctime_timedelta(), get_file_raw_mtime(), get_file_raw_timestamps()
- pyutils.files.file_utils.get_file_raw_mtime(filename: StrOrPath) float | None [source]
Get a file’s raw modification time.
- Parameters:
filename (StrOrPath) – the path to the file to stat
- Returns:
The file’s raw mtime (seconds since the Epoch) or None on error.
- Return type:
float | None
See also
get_file_raw_atime(), get_file_raw_ctime(), get_file_mtime_age_seconds(), get_file_mtime_as_datetime(), get_file_mtime_timedelta(), get_file_raw_timestamps()
- pyutils.files.file_utils.get_file_raw_timestamp(filename: StrOrPath, extractor: Callable[[stat_result], float | None]) float | None [source]
Stat a file and, if successful, use extractor to fetch some subset of the information in the os.stat_result.
- Parameters:
filename (StrOrPath) – the filename to stat
extractor (Callable[[stat_result], float | None]) – Callable that takes an os.stat_result and extracts a value (typically a single timestamp) from it.
- Returns:
whatever the extractor produced or None on error.
- Return type:
float | None
See also
get_file_raw_atime(), get_file_raw_ctime(), get_file_raw_mtime(), get_file_raw_timestamps()
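The stat-then-extract pattern described for get_file_raw_timestamp() can be sketched as follows. The function raw_timestamp is a hypothetical stand-in written against os.stat, not the library's code; it mirrors the documented behavior of returning None on error.

```python
import os
import tempfile
from typing import Callable, Optional


def raw_timestamp(
    filename: str,
    extractor: Callable[[os.stat_result], Optional[float]],
) -> Optional[float]:
    """Hypothetical stand-in: stat the file, then let extractor pick a field."""
    try:
        return extractor(os.stat(filename))
    except OSError:
        return None  # missing or unreadable file


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "stamp.txt")
    with open(path, "w") as wf:
        wf.write("x")
    mtime = raw_timestamp(path, lambda st: st.st_mtime)
    expected_mtime = os.stat(path).st_mtime
    missing = raw_timestamp(os.path.join(tmpdir, "absent"), lambda st: st.st_mtime)
```

Passing a different lambda (for example one returning st.st_atime) reuses the same error handling for another timestamp.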
- pyutils.files.file_utils.get_file_raw_timestamps(filename: StrOrPath) stat_result | None [source]
Stats the file and returns an os.stat_result or None on error.
- Parameters:
filename (StrOrPath) – the file whose timestamps to fetch
- Returns:
the os.stat_result or None to indicate an error occurred
- Return type:
stat_result | None
See also
get_file_raw_atime(), get_file_raw_ctime(), get_file_raw_mtime(), get_file_raw_timestamp()
- pyutils.files.file_utils.get_file_size(filename: StrOrPath) int [source]
Returns the size of a file in bytes.
- Parameters:
filename (StrOrPath) – the filename to size
- Returns:
size of filename in bytes
- Return type:
int
- pyutils.files.file_utils.get_files(directory: StrOrPath) Generator[StrOrPath, None, None] [source]
Returns the files in a directory as a generator.
- Parameters:
directory (StrOrPath) – the directory to list files under.
- Returns:
A generator that yields all files in the input directory.
- Return type:
Generator[StrOrPath, None, None]
See also
expand_globs(), get_files_recursive(), get_matching_files().
- pyutils.files.file_utils.get_files_recursive(directory: StrOrPath)[source]
Find the files and directories under a root recursively.
- Parameters:
directory (StrOrPath) – the root directory under which to list subdirectories and file contents.
- Returns:
A generator that yields all directories and files beneath the input root directory.
See also
get_files(), get_matching_files(), get_matching_files_recursive()
- pyutils.files.file_utils.get_matching_files(directory: StrOrPath, glob_string: str)[source]
Returns the subset of files whose name matches a glob.
- Parameters:
directory (StrOrPath) – the directory to match files within.
glob_string (str) – the globbing pattern (may include ‘*’ and ‘?’) to use when matching files.
- Returns:
A generator that yields filenames in directory that match the given glob pattern.
See also
get_files(), expand_globs().
- pyutils.files.file_utils.get_matching_files_recursive(directory: StrOrPath, glob_string: str)[source]
Returns the subset of files whose name matches a glob under a root recursively.
- Parameters:
directory (StrOrPath) – the root under which to search
glob_string (str) – a globbing pattern that describes the subset of files and directories to return. May contain ‘?’ and ‘*’.
- Returns:
A generator that yields all files and directories under the given root directory that match the given globbing pattern.
See also
get_files_recursive().
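A recursive glob match of this kind can be sketched with os.walk and fnmatch. This is an illustration under assumptions, not the library's implementation: matching_files_recursive is a hypothetical name, and matching against the bare entry name (rather than the full path) is an assumed design choice.

```python
import fnmatch
import os
import tempfile
from typing import Iterator


def matching_files_recursive(directory: str, glob_string: str) -> Iterator[str]:
    """Hypothetical helper: yield files and directories under directory whose name matches."""
    for root, dirs, files in os.walk(directory):
        for name in dirs + files:
            if fnmatch.fnmatch(name, glob_string):
                yield os.path.join(root, name)


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "a", "b"))
    for rel in ("top.txt", os.path.join("a", "mid.txt"), os.path.join("a", "b", "deep.log")):
        with open(os.path.join(root, rel), "w") as wf:
            wf.write("x")
    found = sorted(
        os.path.relpath(p, root) for p in matching_files_recursive(root, "*.txt")
    )
```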
- pyutils.files.file_utils.get_path(filespec: StrOrPath) StrOrPath [source]
Returns just the path of the filespec by removing the filename and extension.
- Parameters:
filespec (StrOrPath) – path to remove filename / extension(s) from
- Returns:
filespec with just the leading directory components and no filename or extension(s)
- Return type:
StrOrPath
See also
without_path(), get_canonical_path().
>>> get_path('/home/scott/foobar.py')
'/home/scott'
>>> get_path('/home/scott/test.1.2.3.gz')
'/home/scott'
>>> get_path('~scott/frapp.txt')
'~scott'
- pyutils.files.file_utils.is_directory(filename: StrOrPath) bool [source]
Is that path a directory (not a normal file)?
- Parameters:
filename (StrOrPath) – the path of the file to check
- Returns:
True if filename is a directory
- Return type:
bool
Note
A Python core philosophy is: it’s easier to ask forgiveness than permission (https://docs.python.org/3/glossary.html#term-EAFP). That is, code that just tries an operation and handles the set of Exceptions that may arise is the preferred style. That said, this function can still be useful in some situations.
See also
does_directory_exist(), is_normal_file(), is_symlink().
>>> is_directory('/tmp')
True
- pyutils.files.file_utils.is_executable(filename: StrOrPath) bool [source]
Is the file executable?
- Parameters:
filename (StrOrPath) – the file to check for execute access.
- Returns:
True if file exists, is a normal file and is executable by the current process. False otherwise.
- Return type:
bool
Note
A Python core philosophy is: it’s easier to ask forgiveness than permission (https://docs.python.org/3/glossary.html#term-EAFP). That is, code that just tries an operation and handles the set of Exceptions that may arise is the preferred style. That said, this function can still be useful in some situations.
See also
does_file_exist(), is_readable(), is_writable().
- pyutils.files.file_utils.is_normal_file(filename: StrOrPath) bool [source]
Is that file normal (not a directory or some special file)?
- Parameters:
filename (StrOrPath) – the path of the file to check
- Returns:
True if filename is a normal file.
- Return type:
bool
Note
A Python core philosophy is: it’s easier to ask forgiveness than permission (https://docs.python.org/3/glossary.html#term-EAFP). That is, code that just tries an operation and handles the set of Exceptions that may arise is the preferred style. That said, this function can still be useful in some situations.
See also
is_directory(), does_file_exist(), is_symlink().
>>> is_normal_file(__file__)
True
- pyutils.files.file_utils.is_readable(filename: StrOrPath) bool [source]
Is the file readable?
- Parameters:
filename (StrOrPath) – the filename to check for read access
- Returns:
True if the file exists, is a normal file, and is readable by the current process. False otherwise.
- Return type:
bool
See also
does_file_exist(), is_writable(), is_executable().
- pyutils.files.file_utils.is_same_file(file1: StrOrPath, file2: StrOrPath) bool [source]
Determine if two paths reference the same inode.
- Parameters:
file1 (StrOrPath) – the first file
file2 (StrOrPath) – the second file
- Returns:
True if the two files are the same file.
- Return type:
bool
See also
is_symlink(), is_normal_file().
>>> is_same_file('/tmp', '/tmp/../tmp')
True
>>> is_same_file('/tmp', '/home')
False
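Comparing inodes, as described above, amounts to comparing the device and inode numbers from os.stat. The sketch below is an illustration, not the library's code; same_inode is a hypothetical name. The standard library's os.path.samefile performs an equivalent comparison.

```python
import os
import tempfile


def same_inode(file1: str, file2: str) -> bool:
    """Hypothetical helper: True if both paths resolve to the same device and inode."""
    s1, s2 = os.stat(file1), os.stat(file2)
    return (s1.st_dev, s1.st_ino) == (s2.st_dev, s2.st_ino)


with tempfile.TemporaryDirectory() as tmpdir:
    sub = os.path.join(tmpdir, "sub")
    os.mkdir(sub)
    roundabout = os.path.join(sub, "..", "sub")  # a different spelling of sub
    same = same_inode(sub, roundabout)
    different = same_inode(sub, tmpdir)
    agrees_with_stdlib = same == os.path.samefile(sub, roundabout)
```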
- pyutils.files.file_utils.is_symlink(filename: StrOrPath) bool [source]
Is that path a symlink?
- Parameters:
filename (StrOrPath) – the path of the file to check
- Returns:
True if filename is a symlink, False otherwise.
- Return type:
bool
Note
A Python core philosophy is: it’s easier to ask forgiveness than permission (https://docs.python.org/3/glossary.html#term-EAFP). That is, code that just tries an operation and handles the set of Exceptions that may arise is the preferred style. That said, this function can still be useful in some situations.
See also
is_directory(), is_normal_file().
>>> is_symlink('/tmp')
False
>>> import os
>>> os.symlink('/tmp', '/tmp/foo')
>>> is_symlink('/tmp/foo')
True
>>> os.unlink('/tmp/foo')
- pyutils.files.file_utils.is_writable(filename: StrOrPath) bool [source]
Is the file writable?
- Parameters:
filename (StrOrPath) – the file to check for write access.
- Returns:
True if file exists, is a normal file and is writable by the current process. False otherwise.
- Return type:
bool
Note
A Python core philosophy is: it’s easier to ask forgiveness than permission (https://docs.python.org/3/glossary.html#term-EAFP). That is, code that just tries an operation and handles the set of Exceptions that may arise is the preferred style. That said, this function can still be useful in some situations.
See also
is_readable(), does_file_exist().
- pyutils.files.file_utils.remove(path: StrOrPath) None [source]
Deletes a file. Raises if path refers to a directory or a file that doesn’t exist.
- Parameters:
path (StrOrPath) – the path of the file to delete
- Raises:
FileNotFoundError – the path to remove does not exist
- Return type:
None
>>> import os
>>> filename = '/tmp/file_utils_test_file'
>>> os.system(f'touch {filename}')
0
>>> does_file_exist(filename)
True
>>> remove(filename)
>>> does_file_exist(filename)
False
>>> filename = '/tmp/file_utils_test_file'
>>> os.system(f'touch {filename}')
0
>>> import pathlib
>>> p = pathlib.Path(filename)
>>> p.exists()
True
>>> remove(p)
>>> p.exists()
False
>>> remove("/tmp/23r23r23rwdfwfwefgdfgwerhwrgewrgergerg22r")
Traceback (most recent call last):
...
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/23r23r23rwdfwfwefgdfgwerhwrgewrgergerg22r'
- pyutils.files.file_utils.remove_hash_comments(x: str) str [source]
Trivial function to be used as a line_transformer in slurp_file() to strip # comments from file contents.
- Parameters:
x (str) –
- Return type:
str
- pyutils.files.file_utils.remove_newlines(x: str) str [source]
Trivial function to be used as a line_transformer in slurp_file() to strip newlines from file contents.
- Parameters:
x (str) –
- Return type:
str
- pyutils.files.file_utils.set_file_raw_atime(filename: StrOrPath, atime: float) None [source]
Sets a file’s raw access time.
- Parameters:
filename (StrOrPath) – the file whose atime should be set
atime (float) – raw atime as number of seconds since the Epoch to set
- Return type:
None
See also
get_file_raw_atime(), get_file_atime_age_seconds(), get_file_atime_as_datetime(), get_file_atime_timedelta(), get_file_raw_timestamps(), set_file_raw_mtime(), set_file_raw_atime_and_mtime(), touch_file()
- pyutils.files.file_utils.set_file_raw_atime_and_mtime(filename: StrOrPath, ts: float | None = None) None [source]
Sets both a file’s raw modification and access times.
- Parameters:
filename (StrOrPath) – the file whose times to set
ts (float | None) – the raw time to set or None to indicate time should be set to the current time.
- Return type:
None
See also
get_file_raw_atime(), get_file_raw_mtime(), get_file_raw_timestamps(), set_file_raw_atime(), set_file_raw_mtime()
- pyutils.files.file_utils.set_file_raw_mtime(filename: StrOrPath, mtime: float)[source]
Sets a file’s raw modification time.
- Parameters:
filename (StrOrPath) – the file whose mtime should be set
mtime (float) – the raw mtime as number of seconds since the Epoch to set
See also
get_file_raw_mtime(), get_file_mtime_age_seconds(), get_file_mtime_as_datetime(), get_file_mtime_timedelta(), get_file_raw_timestamps(), set_file_raw_atime(), set_file_raw_atime_and_mtime(), touch_file()
- pyutils.files.file_utils.slurp_file(filename: StrOrPath, *, skip_blank_lines: bool = False, line_transformers: List[Callable[[str], str]] | None = None)[source]
Reads in a file’s contents line-by-line to a memory buffer applying each line transformation in turn.
- Parameters:
filename (StrOrPath) – file to be read
skip_blank_lines (bool) – should reading skip blank lines?
line_transformers (List[Callable[[str], str]] | None) – string-to-string transformations applied to each line in turn
- Returns:
A list of lines from the read and transformed file contents.
- Raises:
Exception – filename not found or can’t be read.
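The line-transformer pipeline can be sketched as below. This is an illustration, not slurp_file() itself: read_lines and drop_hash_comment are hypothetical names, and checking for blank lines after the transformations have run is an assumption about ordering.

```python
import os
import tempfile
from typing import Callable, List, Optional


def drop_hash_comment(line: str) -> str:
    """Hypothetical transformer: discard everything from the first # onward."""
    return line.split("#", 1)[0]


def read_lines(
    filename: str,
    *,
    skip_blank_lines: bool = False,
    line_transformers: Optional[List[Callable[[str], str]]] = None,
) -> List[str]:
    """Hypothetical stand-in: read a file line by line, transforming as we go."""
    transformers = line_transformers or []
    result = []
    with open(filename, "r") as rf:
        for line in rf:
            for transform in transformers:
                line = transform(line)  # each transformer sees the previous output
            if skip_blank_lines and line.strip() == "":
                continue  # assumption: blankness is judged after transforming
            result.append(line)
    return result


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "settings.conf")
    with open(path, "w") as wf:
        wf.write("# comment\nvalue = 1  # trailing\n\nother\n")
    lines = read_lines(
        path,
        skip_blank_lines=True,
        line_transformers=[drop_hash_comment, str.strip],
    )
```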
- pyutils.files.file_utils.strip_whitespace(x: str) str [source]
Trivial function to be used as a line_transformer in slurp_file() to strip leading / trailing whitespace from file contents.
- Parameters:
x (str) –
- Return type:
str
- pyutils.files.file_utils.touch_file(filename: StrOrPath, *, mode: int | None = 438)[source]
Follows the semantics of the Unix “touch” command: update the timestamp of a file to the current time if the file exists, and create the file if it doesn’t exist.
- Parameters:
filename (StrOrPath) – the filename
mode (int | None) – the mode to create the file with
Warning
The default creation mode is 0o666 which is world readable and writable. Override this by passing in your own mode parameter if desired.
See also
set_file_raw_atime()
,set_file_raw_atime_and_mtime()
,set_file_raw_mtime()
,create_path_if_not_exist()
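The default of 438 in the signature is simply the decimal form of 0o666. The sketch below uses pathlib.Path.touch as a standard-library stand-in (not touch_file() itself) to show the effect of passing a more restrictive mode; the file name private.txt is made up for illustration.

```python
import stat
import tempfile
from pathlib import Path

# 438 in the signature above is the decimal spelling of octal 0o666.
default_mode = 438
same_as_octal = default_mode == 0o666

with tempfile.TemporaryDirectory() as tmpdir:
    private = Path(tmpdir) / "private.txt"
    # Stand-in for touch_file(): create the file as owner read/write only.
    private.touch(mode=0o600)
    created = private.exists()
    # On POSIX the process umask may clear further bits; on Windows most
    # permission bits are not honored at all.
    perms = stat.S_IMODE(private.stat().st_mode)
```

The requested mode is an upper bound on POSIX systems, because the process umask is applied on top of it.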
- pyutils.files.file_utils.without_all_extensions(path: StrOrPath) StrOrPath [source]
Removes all extensions from a path; handles multiple extensions like foobar.tar.gz -> foobar.
- Parameters:
path (StrOrPath) – the path from which to remove all extensions
- Returns:
the path with all extensions removed.
- Return type:
StrOrPath
See also
without_extension()
>>> without_all_extensions('/home/scott/foobar.1.tar.gz')
'/home/scott/foobar'
- pyutils.files.file_utils.without_extension(path: StrOrPath) StrOrPath [source]
Remove one (the last) extension from a file or path.
- Parameters:
path (StrOrPath) – the path from which to remove an extension
- Returns:
the path with one extension removed.
- Return type:
StrOrPath
See also
without_all_extensions()
>>> without_extension('foobar.txt')
'foobar'
>>> without_extension('/home/scott/frapp.py')
'/home/scott/frapp'
>>> f = 'a.b.c.tar.gz'
>>> while('.' in f):
...     f = without_extension(f)
...     print(f)
a.b.c.tar
a.b.c
a.b
a
>>> without_extension('foobar')
'foobar'
- pyutils.files.file_utils.without_path(filespec: StrOrPath) StrOrPath [source]
Returns the base filename without any leading path.
- Parameters:
filespec (StrOrPath) – path to remove leading directories from
- Returns:
filespec without leading dir components.
- Return type:
StrOrPath
See also
get_path()
,get_canonical_path()
>>> without_path('/home/scott/foo.py')
'foo.py'
>>> without_path('foo.py')
'foo.py'
>>> import pathlib
>>> str(without_path(pathlib.Path('/tmp/testing.123')))
'testing.123'
pyutils.files.lockfile module
This is a lockfile implementation I created for use with cronjobs on my machine to prevent multiple copies of a job from running in parallel.
For local operations, when one job is running this code keeps a file on disk to indicate a lock is held. Other copies will fail to start if they detect this lock until the lock is released. There are provisions in the code for timing out locks, cleaning up a lock when a signal is received, gracefully retrying lock acquisition on failure, etc…
Also allows for Zookeeper-based locks when lockfile path is prefixed with ‘zk:’ in order to synchronize processes across different machines.
- class pyutils.files.lockfile.LocalLockFileContents(pid: int, commandline: str, expiration_timestamp: float | None)[source]
Bases:
object
The contents we’ll write to each lock file.
- Parameters:
pid (int) –
commandline (str) –
expiration_timestamp (float | None) –
- commandline: str
The commandline of the process that holds the lock
- expiration_timestamp: float | None
When this lock will expire as seconds since Epoch
- pid: int
The pid of the process that holds the lock
- class pyutils.files.lockfile.LockFile(lockfile_path: str, *, do_signal_cleanup: bool = True, expiration_timestamp: float | None = None, override_command: str | None = None)[source]
Bases:
AbstractContextManager
A file locking mechanism that has context-manager support so you can use it in a with statement. e.g.:
with LockFile('./foo.lock'):
    # do a bunch of stuff... if the process dies we have a signal
    # handler to do cleanup.  Other code (in this process or another)
    # that tries to take the same lockfile will block.  There is also
    # some logic for detecting stale locks.
    pass
Constructor.
- Parameters:
lockfile_path (str) – path of the lockfile to acquire; may begin with zk: to indicate a path in zookeeper rather than on the local filesystem. Note that zookeeper-based locks require an expiration_timestamp as the stale detection semantics are skipped for non-local locks.
do_signal_cleanup (bool) – handle SIGINT and SIGTERM events by releasing the lock before exiting
expiration_timestamp (Optional[float]) – when our lease on the lock should expire (as seconds since the Epoch). None means the lock will not expire until we explicitly release it. Note that this is required for Zookeeper-based locks.
override_command (Optional[str]) – if provided, use this as our commandline instead of deriving it from argv.
- Raises:
Exception – Zookeeper lock path without an expiration timestamp
- acquire_with_retries(*, initial_delay: float = 1.0, backoff_factor: float = 2.0, max_attempts: int = 5) bool [source]
Attempt to acquire the lock repeatedly with retries and backoffs.
- Parameters:
initial_delay (float) – how long to wait before retrying the first time
backoff_factor (float) – a float >= 1.0 that multiplies the current retry delay each time an acquisition attempt fails.
max_attempts (int) – maximum number of times to try before giving up and failing.
- Returns:
True if the lock was acquired and False otherwise.
- Return type:
bool
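To get a feel for how the three parameters interact, the sketch below computes a plausible wait schedule. It is an illustration only: backoff_delays is a hypothetical helper, and it assumes there is no wait before the first attempt and that the delay is multiplied by backoff_factor after every failure. The library may schedule its retries differently.

```python
from typing import List


def backoff_delays(
    initial_delay: float = 1.0,
    backoff_factor: float = 2.0,
    max_attempts: int = 5,
) -> List[float]:
    """Hypothetical helper: waits between attempts under the assumed schedule."""
    delays = []
    delay = initial_delay
    # Assumption: the first attempt is immediate, so max_attempts attempts
    # are separated by max_attempts - 1 waits.
    for _ in range(max_attempts - 1):
        delays.append(delay)
        delay *= backoff_factor
    return delays


schedule = backoff_delays()
total_wait = sum(schedule)
print(schedule, total_wait)  # [1.0, 2.0, 4.0, 8.0] 15.0
```

Under these assumptions the defaults give up after roughly 15 seconds of waiting, which can help when choosing values for a cron job that must not overlap with its next run.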