User’s Guide

Overview

arlib is designed using the bridge pattern. The abstract archive manipulation functionality, e.g. opening the archive file, querying member names, and opening a member, is defined in Archive, an abstract base class called the “engine”. Core functionality is defined by the corresponding abstract methods and properties, such as the member_names property and the open_member() method.

The core functionality is implemented in derived classes, which we call concrete engines. Concrete engines may override other functionality as well, but that is not required. Currently, three concrete engines are implemented in the library, among them TarArchive and ZipArchive.

Since Archive is an abstract base class and cannot be instantiated directly, the function open() can be used as a factory to create concrete engines. The concrete engine type is determined automatically from the properties of the archive file and the mode argument used to open it.
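The arrangement described above can be illustrated with a minimal, self-contained sketch. It is not arlib's implementation: the names ArchiveSketch, ZipSketch and open_sketch are hypothetical, and only a zip engine backed by the standard zipfile module is shown.

```python
import abc
import zipfile


class ArchiveSketch(abc.ABC):
    """Hypothetical stand-in for the abstract engine (not arlib's class)."""

    @property
    @abc.abstractmethod
    def member_names(self):
        """Return the member names as a list of str."""

    @abc.abstractmethod
    def open_member(self, name, mode='r', **kwargs):
        """Open one member as a file object."""


class ZipSketch(ArchiveSketch):
    """Hypothetical concrete engine backed by the zipfile module."""

    def __init__(self, path, mode='r'):
        self._zip = zipfile.ZipFile(path, mode)

    @property
    def member_names(self):
        return self._zip.namelist()

    def open_member(self, name, mode='r', **kwargs):
        return self._zip.open(name, mode, **kwargs)


def open_sketch(path, mode='r'):
    """Hypothetical factory: choose a concrete engine for the given path."""
    if zipfile.is_zipfile(path):
        return ZipSketch(path, mode)
    raise ValueError('no engine found for %r' % (path,))
```

Callers only see the abstract interface; the factory decides which concrete class stands behind it.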

Automatic engine selection

Automatic engine selection is achieved by auto_engine(), which is called in the constructor of Archive. Users rarely need to call auto_engine() directly, because calling the constructor of Archive calls it implicitly.

auto_engine() calls an ordered list of engine determination functions (EDFs) to decide the appropriate engine type. The signature of an EDF must be edf_func(path: path-like, mode: str), where path is the path to the archive and mode is the mode string used to open the archive. The function should return a concrete engine type if it can determine one, or None otherwise.
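As an illustration of the EDF contract, the following sketch shows one possible EDF and a loop that tries EDFs in order. The names ZipEngine, zip_edf and select_engine are hypothetical, and the detection rule (content check for reading, file extension for writing) is an assumption of this sketch rather than arlib's actual logic.

```python
import zipfile


class ZipEngine:
    """Placeholder for a concrete engine type (hypothetical)."""


def zip_edf(path, mode):
    """Return ZipEngine if path looks like a zip archive, else None."""
    if mode.startswith('r'):
        # For reading, inspect the file content.
        return ZipEngine if zipfile.is_zipfile(path) else None
    # For writing, the file may not exist yet, so fall back to the extension.
    return ZipEngine if str(path).lower().endswith('.zip') else None


def select_engine(path, mode, edfs):
    """Call each EDF in order and return the first non-None result."""
    for edf in edfs:
        engine = edf(path, mode)
        if engine is not None:
            return engine
    return None
```

Returning None rather than raising lets the next EDF in the list have its turn.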

The EDF list already contains several EDFs. Users can extend the list by registering new EDFs:

arlib.register_auto_engine(func)

A priority value can also be specified:

arlib.register_auto_engine(func, priority)

The value of priority defines the ordering of the registered EDFs: the smaller the priority value, the higher the priority. EDFs with higher priority are called before EDFs with lower priority. The default priority value is 50.

A third argument, prepend, of type bool, can also be specified for register_auto_engine(). When prepend is true, the EDF is placed before (i.e. at higher priority than) other registered EDFs with the same priority value. Otherwise, it is placed after them.

Since register_auto_engine() returns the input function object func, it can also be used as a non-parameterized decorator:

@arlib.register_auto_engine
def func(path, mode):
    # function definition
    ...

register_auto_engine() also supports a second calling signature, arlib.register_auto_engine(priority, prepend), which returns a decorator that takes the function as its argument. The typical usage is:

@arlib.register_auto_engine(priority=50, prepend=False)
def func(path, mode):
    # function definition
    ...

Obtain list of member names

The abstract property Archive.member_names returns a list of str containing the names of the members in the archive:

ar = arlib.Archive('a.zip', 'r')
members = ar.member_names

Concrete engines such as TarArchive and ZipArchive implement the property using the underlying tarfile and zipfile modules. Archive.member_names provides a uniform interface to the corresponding underlying functions.
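For reference, the standard-library calls that list member names differ between the two modules: zipfile.ZipFile.namelist() and tarfile.TarFile.getnames(). The following sketch builds two small in-memory archives and queries both. Whether arlib uses exactly these calls internally is an assumption.

```python
import io
import tarfile
import zipfile

# Build a small zip archive in memory.
zip_buf = io.BytesIO()
with zipfile.ZipFile(zip_buf, 'w') as zf:
    zf.writestr('a.txt', 'hello')

# Build a small tar archive in memory.
tar_buf = io.BytesIO()
with tarfile.open(fileobj=tar_buf, mode='w') as tf:
    data = b'hello'
    info = tarfile.TarInfo('a.txt')
    info.size = len(data)
    tf.addfile(info, io.BytesIO(data))
tar_buf.seek(0)  # tarfile reads from the current position

# The two modules expose member names through different methods.
with zipfile.ZipFile(zip_buf) as zf:
    zip_names = zf.namelist()
with tarfile.open(fileobj=tar_buf, mode='r') as tf:
    tar_names = tf.getnames()
```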

Check member properties

The methods Archive.member_is_dir() and Archive.member_is_file() check whether the specified member is a directory or a regular file, respectively.
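For a tar archive, such checks could be built on tarfile.TarInfo.isdir() and tarfile.TarInfo.isfile(). The helpers below are a hypothetical sketch of that idea, not arlib's implementation, and take an open tarfile.TarFile rather than an arlib archive.

```python
import io
import tarfile


def tar_member_is_dir(tf, name):
    """Hypothetical helper: True if the named tar member is a directory."""
    return tf.getmember(name).isdir()


def tar_member_is_file(tf, name):
    """Hypothetical helper: True if the named tar member is a regular file."""
    return tf.getmember(name).isfile()


# Build a tiny in-memory tar archive with one directory and one file.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode='w') as tf:
    dir_info = tarfile.TarInfo('dir2')
    dir_info.type = tarfile.DIRTYPE
    tf.addfile(dir_info)
    data = b'hi'
    file_info = tarfile.TarInfo('dir2/a.txt')
    file_info.size = len(data)
    tf.addfile(file_info, io.BytesIO(data))

buf.seek(0)
with tarfile.open(fileobj=buf, mode='r') as tf:
    checks = (
        tar_member_is_dir(tf, 'dir2'),
        tar_member_is_file(tf, 'dir2/a.txt'),
        tar_member_is_file(tf, 'dir2'),
    )
```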

Open member as a file object

The abstract method Archive.open_member() provides a uniform interface for opening a member file as a file object. The signature of the method is open_member(name, mode, **kwargs), where name is the name of the member file and mode is the same mode argument as in the built-in open() function. kwargs are keyword arguments that are passed to the underlying methods in zipfile, tarfile, etc.
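One detail a uniform interface has to bridge is that zipfile.ZipFile.open() always yields a binary stream, while the built-in open() defaults to text mode. The sketch below shows one way to reconcile the two for reading. The helper open_zip_member is hypothetical, and forwarding kwargs to io.TextIOWrapper is an assumption of this sketch, not a statement about what arlib does.

```python
import io
import zipfile


def open_zip_member(zf, name, mode='r', **kwargs):
    """Hypothetical helper: open a zip member for reading, open()-style."""
    binary = zf.open(name, 'r')
    if 'b' in mode:
        return binary
    # Text mode: decode the binary stream; kwargs such as encoding
    # are forwarded to the wrapper.
    return io.TextIOWrapper(binary, **kwargs)
```

A call such as open_zip_member(zf, 'a.txt', 'r', encoding='utf-8') then returns a text stream, while mode 'rb' returns the raw bytes.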

Extract members to a location

The method Archive.extract() provides a uniform interface for extracting members to a location. Two optional arguments can be specified: path, the destination location, and members, a list of members to extract.

with arlib.open('abc.tar') as ar:
    ar.extract('c:/', ['a.txt','dir2/'])

Context manager

The Archive class also supports the context manager protocol. Specifically, Archive.__enter__() returns the archive itself, and Archive.__exit__() calls self.close() and then returns True.
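The protocol described above can be sketched with a small stand-in class (ClosingSketch is a hypothetical name, not part of arlib). Note that in Python a true return value from __exit__() suppresses any exception raised inside the with block, so it is worth keeping that in mind when relying on the behaviour described here.

```python
class ClosingSketch:
    """Hypothetical resource that mirrors the described protocol."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

    def __enter__(self):
        # The with statement binds this return value to the "as" target.
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
        # A true value tells Python to suppress an in-flight exception.
        return True
```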

Extend the library

The architecture of the library is flexible enough to add more archive types. Adding a new archive type includes the following steps:

  1. Derive a new class and implement the core functionalities

    from arlib import Archive

    class AnotherArchive(Archive):
        def __init__(self, path, mode, **kwargs):
            # definition
            ...

        @property
        def member_names(self):
            # definition
            ...

        def open_member(self, name, mode='r', **kwargs):
            # definition
            ...
    
  2. (optional) Override the methods Archive.close(), Archive.__enter__(), Archive.__exit__(), etc.

  3. (optional) Define and register a new EDF that can automatically determine the new archive type

    @register_auto_engine
    def another_auto_engine(path, mode):
        # definition
        ...
    
  4. (optional) Override the method Archive.extract(). The default implementation in Archive uses shutil.copyfileobj to copy the corresponding members to the destination. Using the extraction routine of the underlying archive implementation may be more efficient.
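To make the last step more concrete, a generic extraction based on shutil.copyfileobj could look roughly like the sketch below. It operates on a zipfile.ZipFile for the sake of a runnable example; extract_members is a hypothetical helper, not arlib's default implementation, and it does not guard against member names that escape the destination directory.

```python
import os
import shutil
import zipfile


def extract_members(zf, path='.', members=None):
    """Hypothetical generic extract: copy each member stream to disk."""
    names = zf.namelist() if members is None else members
    for name in names:
        # NOTE: no protection against names such as '../x'; sketch only.
        target = os.path.join(path, name)
        if name.endswith('/'):
            # Directory entry: just create the directory.
            os.makedirs(target, exist_ok=True)
            continue
        os.makedirs(os.path.dirname(target) or '.', exist_ok=True)
        with zf.open(name) as src, open(target, 'wb') as dst:
            shutil.copyfileobj(src, dst)
```

A concrete engine that overrides extract() could instead delegate to the native routine of its backend, e.g. zipfile.ZipFile.extractall() or tarfile.TarFile.extractall().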