User’s Guide
Overview
arlib is designed using the bridge pattern. The abstract archive manipulation functionality, e.g. opening the archive file, querying member names, and opening a member, is defined in Archive, an abstract base class called the “engine”. Core functionality is defined by the corresponding abstract methods and properties:
Archive.member_names
: Return a list of names of member files in the archive
Archive.open_member()
: Open a member as a file object
The core functionality is implemented in derived classes, which we call concrete engines. Other functionality may be overridden by concrete engines, but that is not required. Currently, three concrete engines are implemented in the library:
TarArchive
: Manipulates tar files
ZipArchive
: Manipulates zip files
DirArchive
: Treats a directory as an archive and the files inside as members
Since Archive is an abstract base class (ABC), which cannot be instantiated directly, the function open() can be used as a factory to create concrete engines. The type of concrete engine is determined automatically from the properties of the archive file and the mode argument used to open the archive.
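The layering described above can be sketched with the standard library alone. This is a minimal illustration, not arlib's actual code: ArchiveSketch, DirArchiveSketch, and open_sketch are hypothetical names, and the factory here only recognizes directories.

```python
import abc
import os
import tempfile


class ArchiveSketch(abc.ABC):
    """Hypothetical stand-in for the abstract engine; not arlib's code."""

    @property
    @abc.abstractmethod
    def member_names(self):
        """Return a list of member names."""

    @abc.abstractmethod
    def open_member(self, name, mode="r", **kwargs):
        """Open a member as a file object."""

    def close(self):
        # Optional hook that concrete engines may override.
        pass


class DirArchiveSketch(ArchiveSketch):
    """Treats a directory as an archive and the files inside as members."""

    def __init__(self, path, mode="r"):
        self._path = os.fspath(path)

    @property
    def member_names(self):
        names = []
        for root, _dirs, files in os.walk(self._path):
            for fname in files:
                rel = os.path.relpath(os.path.join(root, fname), self._path)
                names.append(rel.replace(os.sep, "/"))
        return sorted(names)

    def open_member(self, name, mode="r", **kwargs):
        return open(os.path.join(self._path, name), mode, **kwargs)


def open_sketch(path, mode="r"):
    """Factory: pick a concrete engine from the path (directories only here)."""
    if os.path.isdir(path):
        return DirArchiveSketch(path, mode)
    raise ValueError(f"no engine found for {path!r}")


# Build a small directory and read it through the abstract interface.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "a.txt"), "w") as f:
        f.write("hello")
    ar = open_sketch(tmp)
    names = ar.member_names
    with ar.open_member("a.txt") as member:
        content = member.read()
```

Callers only touch member_names and open_member, so adding another engine does not change calling code.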
Automatic engine selection
Automatic engine selection is achieved by auto_engine(), which is called in the constructor of Archive. Users rarely need to call auto_engine() directly: calling the constructor of Archive implicitly calls auto_engine().
auto_engine() calls an ordered list of engine determination functions (EDFs) to decide the appropriate engine type. The signature of an EDF must be edf_func(path: path-like, mode: str), where path is the path to the archive and mode is the mode string used to open the archive. The function should return a concrete engine type if it can be determined, or None otherwise.
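As an illustration of the EDF contract, the sketch below defines two functions with the (path, mode) signature and a loop that returns the first result that is not None. ZipEngine, TarEngine, and choose_engine are hypothetical placeholders; the detection relies on the standard-library helpers zipfile.is_zipfile and tarfile.is_tarfile, which may differ from what arlib's built-in EDFs do.

```python
import os
import tarfile
import tempfile
import zipfile


class ZipEngine:  # hypothetical placeholder for a concrete engine type
    pass


class TarEngine:  # hypothetical placeholder for a concrete engine type
    pass


def zip_edf(path, mode):
    """Return ZipEngine if an existing file looks like a zip archive, else None."""
    if "r" in mode and os.path.isfile(path) and zipfile.is_zipfile(path):
        return ZipEngine
    return None


def tar_edf(path, mode):
    """Return TarEngine if an existing file looks like a tar archive, else None."""
    if "r" in mode and os.path.isfile(path) and tarfile.is_tarfile(path):
        return TarEngine
    return None


def choose_engine(path, mode, edfs):
    """Call each EDF in order and return the first engine type that is not None."""
    for edf in edfs:
        engine = edf(path, mode)
        if engine is not None:
            return engine
    return None


with tempfile.TemporaryDirectory() as tmp:
    zip_path = os.path.join(tmp, "a.zip")
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr("a.txt", "hello")
    txt_path = os.path.join(tmp, "plain.txt")
    with open(txt_path, "w") as f:
        f.write("not an archive")

    zip_choice = choose_engine(zip_path, "r", [tar_edf, zip_edf])
    txt_choice = choose_engine(txt_path, "r", [tar_edf, zip_edf])
```

Returning None rather than raising lets the next EDF in the list have a turn.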
The EDF list already contains several EDFs. Users can extend the list by registering new EDFs:
arlib.register_auto_engine(func)
A priority value can also be specified:
arlib.register_auto_engine(func, priority)
The value of priority defines the ordering of the registered EDFs. The smaller the priority value, the higher the priority. EDFs with higher priority are called before EDFs with lower priority. The default priority value is 50.
A third argument, prepend, of type bool, can also be specified for register_auto_engine(). When prepend is true, the EDF is placed before (i.e. at higher priority than) other registered EDFs with the same priority value. Otherwise, it is placed after them.
Since register_auto_engine() returns the input function object func, it can also be used as a non-parameterized decorator:
@arlib.register_auto_engine
def func(path, mode):
    # function definition
    ...
The function register_auto_engine() also supports another calling signature, arlib.register_auto_engine(priority, prepend), which returns a decorator that accepts the function. The typical usage is:
@arlib.register_auto_engine(priority=50, prepend=False)
def func(path, mode):
    # function definition
    ...
Obtain list of member names
The abstract property Archive.member_names returns a list of str containing the names of the members in the archive:
ar = arlib.Archive('a.zip', 'r')
members = ar.member_names
Concrete engines such as TarArchive and ZipArchive implement the property using the underlying tarfile and zipfile modules, respectively. Archive.member_names provides a uniform interface to the corresponding underlying functions.
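For comparison, here is how member names are listed with the standard library directly. The calls differ per module (ZipFile.namelist() versus TarFile.getnames()), which is the kind of difference a uniform property would hide. Which calls arlib uses internally is an assumption here.

```python
import io
import tarfile
import zipfile

# Build a small zip archive in memory.
zip_buf = io.BytesIO()
with zipfile.ZipFile(zip_buf, "w") as zf:
    zf.writestr("a.txt", "hello")
    zf.writestr("dir/b.txt", "world")

# Build a small tar archive in memory.
tar_buf = io.BytesIO()
with tarfile.open(fileobj=tar_buf, mode="w") as tf:
    data = b"hello"
    info = tarfile.TarInfo("a.txt")
    info.size = len(data)
    tf.addfile(info, io.BytesIO(data))

# Each module has its own method for listing names.
zip_buf.seek(0)
tar_buf.seek(0)
with zipfile.ZipFile(zip_buf) as zf:
    zip_names = zf.namelist()
with tarfile.open(fileobj=tar_buf, mode="r") as tf:
    tar_names = tf.getnames()
```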
Check member properties
The methods Archive.member_is_dir() and Archive.member_is_file() check whether the specified member is a directory or a regular file, respectively.
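The standard-library modules expose the same information in different ways: ZipInfo.is_dir() for zip members, and TarInfo.isdir() and TarInfo.isfile() for tar members. The sketch below shows those underlying checks on in-memory archives; how arlib maps its methods onto them is an assumption.

```python
import io
import tarfile
import zipfile

# Zip archive with one directory entry and one file.
zip_buf = io.BytesIO()
with zipfile.ZipFile(zip_buf, "w") as zf:
    zf.writestr("dir/", "")
    zf.writestr("dir/a.txt", "hello")

# Tar archive with the same layout.
tar_buf = io.BytesIO()
with tarfile.open(fileobj=tar_buf, mode="w") as tf:
    dir_info = tarfile.TarInfo("dir")
    dir_info.type = tarfile.DIRTYPE
    tf.addfile(dir_info)
    data = b"hello"
    file_info = tarfile.TarInfo("dir/a.txt")
    file_info.size = len(data)
    tf.addfile(file_info, io.BytesIO(data))

zip_buf.seek(0)
tar_buf.seek(0)
with zipfile.ZipFile(zip_buf) as zf:
    # Map each zip member name to whether it is a directory.
    zip_kinds = {info.filename: info.is_dir() for info in zf.infolist()}
with tarfile.open(fileobj=tar_buf, mode="r") as tf:
    # Map each tar member name to (is directory, is regular file).
    tar_kinds = {m.name: (m.isdir(), m.isfile()) for m in tf.getmembers()}
```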
Open member as a file object
The abstract method Archive.open_member() provides a uniform interface for opening a member file as a file object. The signature of the method is open_member(name, mode, **kwargs), where name is the name of the member file and mode is the mode argument, the same as in the built-in open() function. kwargs are keyword arguments that are passed to the underlying methods in zipfile, tarfile, etc.
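To show what such a uniform interface has to bridge, the sketch below opens a member for reading from either a zipfile.ZipFile or a tarfile.TarFile. The helper open_member_sketch is hypothetical; it dispatches on the object type, whereas arlib uses separate engine classes.

```python
import io
import tarfile
import zipfile


def open_member_sketch(archive, name):
    """Open a member for binary reading from a stdlib zip or tar object."""
    if isinstance(archive, zipfile.ZipFile):
        return archive.open(name, "r")
    if isinstance(archive, tarfile.TarFile):
        member = archive.extractfile(name)
        if member is None:  # extractfile returns None for non-file members
            raise ValueError(f"{name!r} is not a regular file")
        return member
    raise TypeError(f"unsupported archive type: {type(archive).__name__}")


# Build one zip and one tar archive in memory, each holding a.txt.
zip_buf = io.BytesIO()
with zipfile.ZipFile(zip_buf, "w") as zf:
    zf.writestr("a.txt", "from zip")

tar_buf = io.BytesIO()
with tarfile.open(fileobj=tar_buf, mode="w") as tf:
    data = b"from tar"
    info = tarfile.TarInfo("a.txt")
    info.size = len(data)
    tf.addfile(info, io.BytesIO(data))

zip_buf.seek(0)
tar_buf.seek(0)
with zipfile.ZipFile(zip_buf) as zf:
    with open_member_sketch(zf, "a.txt") as member:
        zip_content = member.read()
with tarfile.open(fileobj=tar_buf, mode="r") as tf:
    with open_member_sketch(tf, "a.txt") as member:
        tar_content = member.read()
```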
Extract members to a location
The method Archive.extract() provides a uniform interface for extracting members to a location. Two optional arguments can be specified: path for the destination location and members for a list of members to extract.
with arlib.open('abc.tar') as ar:
    ar.extract('c:/', ['a.txt', 'dir2/'])
Context manager
The Archive class also defines the context manager functionality. Specifically, Archive.__enter__() returns the archive itself, and Archive.__exit__() calls self.close() and then returns True.
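One consequence worth knowing: under Python's context manager protocol, a true return value from __exit__() suppresses any exception raised inside the with block. The sketch below uses a hypothetical class, ClosingSketch, that follows the behaviour described above; it shows the language rule and is not a test of arlib itself.

```python
class ClosingSketch:
    """Hypothetical resource whose __exit__ closes it and returns True."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

    def __enter__(self):
        return self  # the object itself is bound by "as"

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
        return True  # a true value tells Python to swallow the exception


reached = False
with ClosingSketch() as res:
    raise ValueError("raised inside the with block")
# Execution continues here because __exit__ returned True.
reached = True
```

If errors inside the block matter, it may be safer to check results explicitly rather than rely on an exception escaping the with statement.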
Extend the library
The architecture of the library is flexible enough to accommodate more archive types. Adding a new archive type involves the following steps:
Derive a new class and implement the core functionality
    class AnotherArchive(Archive):
        def __init__(self, path, mode, **kwargs):
            # definition
            ...
        @property
        def member_names(self):
            # definition
            ...
        def open_member(self, name, mode='r', **kwargs):
            # definition
            ...
(optional) Override the methods Archive.close(), Archive.__enter__(), Archive.__exit__(), etc.
(optional) Define and register a new EDF that can automatically determine the new archive type
    @register_auto_engine
    def another_auto_engine(path, mode):
        # definition
        ...
(optional) Override the method Archive.extract(). The default implementation in Archive uses shutil.copyfileobj to copy the corresponding members to the destination. Using the corresponding archive implementation may be more efficient.
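The trade-off in the last step can be sketched with zipfile. The hypothetical helper extract_sketch copies members one at a time with shutil.copyfileobj, in the spirit of the default described above, while ZipFile.extractall() is the kind of native call an engine-specific override might delegate to. The helper does not sanitize member names, so it is for illustration only.

```python
import io
import os
import shutil
import tempfile
import zipfile


def extract_sketch(archive, path, members=None):
    """Copy members out of a ZipFile one by one using shutil.copyfileobj."""
    names = archive.namelist() if members is None else members
    for name in names:
        target = os.path.join(path, name)  # note: no path sanitizing here
        if name.endswith("/"):
            os.makedirs(target, exist_ok=True)
            continue
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with archive.open(name) as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)


# Build a small zip archive in memory.
zip_buf = io.BytesIO()
with zipfile.ZipFile(zip_buf, "w") as zf:
    zf.writestr("a.txt", "hello")
    zf.writestr("dir2/b.txt", "world")
zip_buf.seek(0)

with tempfile.TemporaryDirectory() as generic_dir, \
        tempfile.TemporaryDirectory() as native_dir:
    with zipfile.ZipFile(zip_buf) as zf:
        extract_sketch(zf, generic_dir, ["dir2/b.txt"])  # generic copy
        zf.extractall(native_dir)                        # native extraction
    with open(os.path.join(generic_dir, "dir2", "b.txt")) as f:
        generic_content = f.read()
    native_files = sorted(os.listdir(native_dir))
```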