Personal Blogs

Algorithms 14: Subtype polymorphism and encapsulation

Tuesday, 22 Sept 2015, 16:51

Visible to anyone in the world

Edited by Martin Thomas Humby, Wednesday, 13 Dec 2023, 20:21

In Algorithms 11 binding methods to an instance of a class getting dynamic dispatch was demonstrated: calling a method against an instance at a reference calls the method belonging to that object. Algorithms 12 showed class inheritance: inheritance of method names together with the functionality of those methods.

Together these two attributes provide subtype polymorphism: the ability to view an instance of a subclass as an instance of its super-class. Subtype polymorphism is enforced in statically typed languages where calling a method unknown to the nominal class of a reference (or by-value variable) gets a compile-time error. Python provides more or less total flexibility, a variable can be viewed as any type, but compliance with the framework supplied by static typing can provide a basis that is known to work.

The utility of class inheritance for building new classes from existing classes with added or specialised functionality is obvious: it saves work and reduces code size together with associated complexity. Copy and paste could be used to get a new type but using inheritance the total amount of code is reduced. In Algorithms 12 a Stack was turned into an extendable structure, a List, simply by replacing __init__() and push() with new methods and introducing a helper method _check_capacity().

Overriding and overloading methods

Replacing an inherited method is known as overriding the method. To maintain polymorphism the parameter types and return type of the new method must be the same as those of the overridden method or steps taken to ensure the new method still works when presented with the same type of arguments as the method being replaced. Stack.__init__() for example, was designed to accept an integer, the maximum size of a stack, so List's __init__() should also work when passed an integer as well as an Iterable. (Iterable is the base class of List and Python's list, so a new List can be constructed from either of these types).

A function's name, parameters and return type comprise its signature. Many statically typed languages allow use of functions with the same name but different signatures, known as overloaded functions or methods. From the compiler's viewpoint these are separate functions. This can cause problems in some languages when pre-compiled libraries are used because the overloaded function names exposed by the library have been changed by the compiler, a process known as name mangling.

Many languages allow functions to be called without their return being assigned to a variable. In this case the expected return type will be unknown to the compiler so the signature used for overloading excludes the return type. When overloading is available execution efficiency will suffer if the type of a parameter is chosen to be that of a mutual ancestor with runtime type checking used to determine the actual type rather than writing separate overloaded methods for the different types.

Pythons dispatch mechanism uses the method name to determine which method to call. Parameter types do not feature in this selection excluding overloading. Any method with the same name in a subclass effectively overrides the superclass method but complete flexibility as to the type of parameters passed to a function can replace overloading. Runtime type checking or named default parameters are used to determine how parameters of different types are processed. Named defaults are considered to be more 'Pythonic' but are less transparent in use.

Encapsulation

Coding abstract data types without OOP in languages such as UCSD Pascal, Ada83 and Delphi there can be complete separation of operations appearing in the interface from the workings hidden in the implementation. Anything declared in the implementation, perhaps located in a separate file is private and hidden in entirety from client code.

Interface / implementation separation has become known as information hiding, a term originating from the design strategy proposed by David Parnas (Parnas 1972). In OOP such separation is provided by encapsulation but the facilities available to achieve it vary from one language to another. In general encapsulation means grouping the methods that operate on data with the data rather than passing data objects with no inherent functionality to a hierarchy of possibly shared subroutines.

OOP languages tend to declare all fields and methods in an interface file or section hiding only method implementations or feature no physical separation of code the entire class appearing in a single file. A typical C++ program for example will define a class in a .hpp header file and put method implementations in a separate .cpp file.

In C++ any class that is not defined as abstract can be instantiated by value simply by declaring a variable of the class. In this case the compiler needs access to the total size of fields so sufficient space can be allocated to accommodate the instance. When all instances are by reference as in Java etc. this is not a requirement. All references have the same size in memory and the actual allocation is done at runtime by object creation. Even so, Java and many other languages still feature all-in-one-place class definitions.

To overcome implementation exposure two strategies have been adopted: inheritance from a pure abstract class or interface type without any fields featuring only method definitions and secondly, key words defining levels of exposure for the members of a class (its methods and fields), generally private, protected and public.

With inheritance from an abstract class descendant instances can be assigned to a reference of the abstract type and viewed as such. The descendant supports the interface defined by the superclass. Languages that do not provide multiple inheritance generally define an interface type and a class can support multiple interfaces. The actual implementation can be ignored by client code and can be made reasonably inaccessible through the reference providing interface / implementation separation. This comes at the cost of pre-definition of the abstract type and perhaps an additional execution overhead resulting from multiple inheritance.

Beyond the separation of the interface as a set of public methods, other levels of exposure only relate to re-use of functionality through inheritance. Fields and methods defined as private are hidden from client code and in subclasses. Those defined as protected are inaccessible to client code but are visible in a subclass implementation.

Making fields and methods private imposes restrictions on the options available when writing a subclass. For example, a base class Person might have a private field date_of_birth. If modification to this birthdate after instantiation is a requirement a public method set_birthdate() is needed. A subclass, Employee say, must either use the inherited setter method or override it perhaps to ensure that no person below a certain age can be employed. An overridden setter cannot access the private field but calls the inherited setter to modify the birth date. This minimum age requirement is then imposed on all subclasses of Employee, not so if date_of_birth had been give protected exposure.

Sounds logical but such a lack of flexibility can cause problems when the only access a descendant of employee can have to the age field is through the inherited setter. Suppose at a later time a new class Trainee is required with a lower age limit. Due to the imposed limit no Trainee can be an Employee and cannot inherit or take advantage of any functionality already written in relation to employees. Reducing the limit for Employee will likely break existing age verification.

Another consideration may be the additional overhead of methods that call other methods, not good when execution efficiency is in any way critical. To overcome these problems Java provides a fourth level of exposure known as 'package-private', the default when no other is specified. All Java classes exist in a package, a grouping of class files. Package-private members of are visible in the implementation of all other classes in the same package but inaccessible to classes located in other packages.

Package-private exposure finds widespread use in supplied Java class libraries and many other languages provide an equivalent. Using it as an alternative to protected exposure is all very well if it is possible to extend a package, not a possibility for these Java libraries. Consequently minor modifications to the functionality of a HashMap say, with a requirement for equivalent execution efficiency means starting almost from scratch. The only useful library definition is the Map interface.

Coding descendants there is really very little to be gained from these restrictions and making the implementation completely inaccessible is not possible in Python. Protected and private categorization is available by a naming convention rather than error generation and more reliance is placed on organization and coding by thinking. Similar measures are available for structuring the code to provide modularity.

Python information hiding

The smallest increment of Python functionality is a class. Classes can contain nested classes but putting a class in a nested location has no particular significance other than an indication of associated usage. For an inner instance to have knowledge of an enclosing object it must be given a reference to it:

class Outer(object):
    class Nested:
        def __init__(self, outer):
            self._outer = outer
            ...

    def __init__(self):
        self._inner = Outer.Nested(self)
        ...

Python provides no direct equivalent of private and protected exposure but a convention that variable and helper method names starting with an underscore are protected and should not be accessed directly by client code is useful. A leading double underscore supplies some equivalence to private members but this provision is not required in most cases.

Constants exposed by a module can be put in all upper case to indicate their status but to get a constant that is immutable as far as possible it must be made a member of a class accessible through a property. This option is shown in the download package1.module2 but going to these lengths is not what Python is about, I think.

Python, Delphi, C# and some dialects of C++, provide so called properties, a construct that replaces public getter and setter methods generally to control or exclude modification of the value of private or protected fields by client code. The property is seen as a virtual field so we can write x = aproperty and when write modification has been programmed aproperty = x, both without function-call brackets or arguments.

Because all members are accessed by name in Python its properties can supply validation of submitted values in a subclass even if the base class provides direct access. A property with the name of the base class field replaces it. A number of different syntax and variations are available but typical usage might be:

class Base:
    def __init__(self, value):
        self.value = value     # value is a fully accessible field
    def halve(self):
        self.value /= 2

class Subclass(Base):
    def _get_value(self):
        return self.value
    def _set_value(self, value):
        if value >= 0 and value < 100:
             self.value = value

    val = property(_get_value, _set_value)

instance = Subclass(99)
print(instance.val)
instance.halve()
print(instance.val)

In the Subclass the property value replaces the base class field and all inherited methods now address the field _value through the setter and getter functions. The base class field value does not exist in the subclass so there is no wasted allocation.

In Subclass the field name _value could have been given a double underscore indicating private status. Doing this invokes name-mangling and __value becomes an alias for the mangled name in the implementation. The mangled name, which always follows the pattern _Base__value or generically _Classname__membername, is visible in subclasses and client code as well as the current implementation. Such members are not truly private, simply less accessible to erroneous use.

Double underscore method names can be used to get the same effect as the static instance methods available in C++ etc. When a dynamically dispatched method is overridden this directs all calls to the new method, very likely an unwanted side effect when the method is a helper used by multiple other inherited methods. Replacing a static method in a subclass, same name, same signature, this does not occur and any calls in inherited methods still go to the superclass method.

The two alternatives are contrasted by mod_list.py and mod_list_static.py in the download. An overridden _check_capacity() method and a 'static' equivalent __check_capacity()are shown as respective alternatives. The replaced method is used in a subclass by insert_at() but not push(), which must continue to use the inherited version.

Python modules and packages

A Python file or 'module' with a .py extension can contain constants, variables, functions and class definitions. In larger programs modules supply the next increment of modularity above classes. Lest we forget which module is being used, within a module its canonical name is available through the global variable __name__. However, when a module is run as a program __name__ is set to '__main__' getting the option of excluding minor test code from module initialization by enclosing it in an if statement. Variability of __name__ is demonstrated in package1.module1 together with the options for importing modules contained in a package as shown below.

The same naming conventions using underscores to indicate modules that are only for use by modules in the same package can be employed but a prepended double underscore does not result in name-mangling.

Packages allow modules to be grouped according to the type of functionality they expose or the usage they have within an application. They can be nested to any depth: package1.sub_package.subsub_package.subsub_module. When importing a module no account is taken of the location of the importing module in the package hierarchy and the full canonical path must be used.

If package1 contains two modules module1 and module2, to import a function module2.where() to module1 there are three options:

import package1.module2                           # import option 1
from package1 import module2                     # import option 2 (recommended)
from package1.module2 import where as where2     # import option 3

def where():
    return __name__

def elsewhere():
    result = package1.module2.where()   # using import option 1
    result = module2.where()            # using import option 2
    result = where2()                   # using import option 3
    return result

if __name__ == '__main__':
    print(where())
    print(elsewhere())

To create a new package in PyCharm, in the project tree right-click on the project name or the package under which the new package is to be created and select new > Python Package. PyCharm creates a module __init__ within the new package. The purpose of this module is to code any package initialization not done by constituent modules. When __init__ does not declare an __all__ list of module names, any modules imported to it from the current package or elsewhere are made visible in any module using the statement:

from package import *

When the alternative, declaring a list of modules in __init__ to be imported by import * is used, __all__ = ['module1', 'module2'] for example, this list can contain only modules in the immediate package. Also its presence excludes access to the modules in __init__'s import statements as described above.

Note that circular imports are not allowed. If module1 imports module2 then module2 cannot import module1. It may be possible to circumvent this rule by importing individual items, as in import option 3 above and module2's equivalent of module1's elsewhere() in the download, but be aware of possible problems: type checking failing to recognize an imported class for example.

Summary

Interface / implementation separation isolates client code from changes to an implementation. It can also provide levels of abstraction, used to mitigate the complexity inherent to large software systems. A user interface for example, can hide its implementation and the various subsystems that support it. Similarly each subsystem can hide its own implementation from code in the user interface. This layering of functionality can continue down through as many levels as may be required following the design strategy of information hiding.

OOP languages provide a level of interface, implementation separation through encapsulation. Subtype polymorphism used with abstract or interface classes can strengthen encapsulation. If sufficient pre-planning is put in place, provision of read-only interfaces for example, object integrity can be ensured.

Python relies less on enforced encapsulation than on code written with the precepts of information hiding in mind. Python's call-by-name has two effects here: any class supplies a conceptual interface that can be supported by any other class supplying the same function names and exposed functionality; direct access to fields in a base class can be overridden using properties.

Import statements within the __init__ module can be used to create virtual groupings exposing modules and/or classes from other packages. This facility suggests the option of creating multiple groupings by type of functionality and by usage within an application, for example.

Reference

Parnas, D. L. (1972) 'On the Criteria To Be Used in Decomposing Systems into Modules', Communications of the ACM, vol 15 no. 12 [Online]. Available at https://www.cs.umd.edu/class/spring2003/cmsc838p/Design/criteria.pdf (Accessed 22 September 2015)

[Algorithms 14: Subtype polymorphism and encapsulation (c) Martin Humby 2015]

Tags: algorithms