Saturday 11 October 2014

Unfolding Mysteries: Chicken or Egg?

Recently I faced an interesting problem while refactoring legacy code. To understand the problem better, let's take a look at the following class diagram.

[Class diagram: the object classes and their corresponding container classes]

The structure is quite simple. There is a class for each object and a corresponding container class. Each container class has an attribute named "item" which returns an object of the class it contains. For example, the "item" attribute of TContainerA returns a reference to TObjectA, and so on.

The task at hand is to simplify the implementation using generics. The solution is simple. Here is how it looks.

  TBaseContainer<T: TBaseObject> = class
  private
    function GetItem(AIndex: Integer): T;
  public
    property Items[AIndex: Integer]: T read GetItem;
  end;

As soon as I made the above change, it broke the derived container classes. They still refer to the non-generic TBaseContainer, which no longer exists, so the compiler started complaining about it.

[dcc32 Error] ContainerA.pas(29): E2003 Undeclared identifier: 'TBaseContainer'
[dcc32 Error] ContainerA.pas(29): E2021 Class type required

Again, the fix is simple. We just need to pass the correct type parameter to the TBaseContainer class, as shown in the snippet below. The container-specific implementation of the "item" attribute is also removed, since the TBaseContainer class now takes care of it. This works fine as long as our classes are really as simple as shown here.

  TContainerA = class(TBaseContainer<TObjectA>)
  //...
  end;
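
To make the idea concrete, here is a minimal, self-contained sketch of the generic base class with one derived container. The backing array FItems, the Add method and the Name field are my assumptions for illustration only; the snippets above show just GetItem and Items, not how the real container stores its objects.

```pascal
program SimpleGenericContainer;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  TBaseObject = class
  end;

  TObjectA = class(TBaseObject)
  public
    Name: string; // hypothetical field, only here to show typed access
  end;

  // FItems and Add are assumptions; the post shows only GetItem and Items.
  TBaseContainer<T: TBaseObject> = class
  private
    FItems: array of T;
    function GetItem(AIndex: Integer): T;
  public
    procedure Add(AItem: T);
    property Items[AIndex: Integer]: T read GetItem;
  end;

  // No container-specific GetItem any more; the base class provides it.
  TContainerA = class(TBaseContainer<TObjectA>)
  end;

procedure TBaseContainer<T>.Add(AItem: T);
begin
  // Grow the array by one slot and store the new reference in it.
  SetLength(FItems, Length(FItems) + 1);
  FItems[High(FItems)] := AItem;
end;

function TBaseContainer<T>.GetItem(AIndex: Integer): T;
begin
  Result := FItems[AIndex];
end;

var
  Container: TContainerA;
  Obj: TObjectA;
begin
  Container := TContainerA.Create;
  Obj := TObjectA.Create;
  try
    Obj.Name := 'first';
    Container.Add(Obj);
    // Items[0] is already typed as TObjectA, so no cast is needed here.
    WriteLn(Container.Items[0].Name);
  finally
    // The container in this sketch does not own its objects.
    Obj.Free;
    Container.Free;
  end;
end.
```

The point of the sketch is the last access: the caller reads a TObjectA field straight from Items without a cast.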

Do you smell Chicken or the Egg?

Not really? Consider a situation where there are 20+ container classes, and they are fairly complex. For instance, each container class has its own enumerator, which also needs to be parameterized (using generics) just as we did for the container class. Of course, this means big chunks of code changes, and the whole task affects a large number of files. Given the complexity and the amount of work required, it is desirable to keep the changes atomic. But with the above approach, that doesn't seem to be an option. The real problem is that all changes must be made in one go or not at all. We cannot leave any class behind.
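
To give a feel for the enumerator part, here is a rough sketch of what a parameterized enumerator could look like. Everything in it is hypothetical: the name TContainerEnumerator, the backing TArray<T> and the Add method are mine, and the real code has one enumerator per container rather than a single shared one.

```pascal
program GenericEnumeratorSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  TBaseObject = class
  end;

  TObjectA = class(TBaseObject)
  end;

  // Hypothetical enumerator that walks a dynamic array of T.
  TContainerEnumerator<T: TBaseObject> = class
  private
    FItems: TArray<T>;
    FIndex: Integer;
    function GetCurrent: T;
  public
    constructor Create(const AItems: TArray<T>);
    function MoveNext: Boolean;
    property Current: T read GetCurrent;
  end;

  // Assumed storage; the real container internals are not shown in the post.
  TBaseContainer<T: TBaseObject> = class
  private
    FItems: TArray<T>;
  public
    procedure Add(AItem: T);
    function GetEnumerator: TContainerEnumerator<T>;
  end;

  TContainerA = class(TBaseContainer<TObjectA>)
  end;

constructor TContainerEnumerator<T>.Create(const AItems: TArray<T>);
begin
  inherited Create;
  FItems := AItems;
  FIndex := -1; // positioned before the first element
end;

function TContainerEnumerator<T>.GetCurrent: T;
begin
  Result := FItems[FIndex];
end;

function TContainerEnumerator<T>.MoveNext: Boolean;
begin
  Inc(FIndex);
  Result := FIndex < Length(FItems);
end;

procedure TBaseContainer<T>.Add(AItem: T);
begin
  SetLength(FItems, Length(FItems) + 1);
  FItems[High(FItems)] := AItem;
end;

function TBaseContainer<T>.GetEnumerator: TContainerEnumerator<T>;
begin
  Result := TContainerEnumerator<T>.Create(FItems);
end;

var
  Container: TContainerA;
  First, Second, Obj: TObjectA;
  Count: Integer;
begin
  Container := TContainerA.Create;
  First := TObjectA.Create;
  Second := TObjectA.Create;
  try
    Container.Add(First);
    Container.Add(Second);
    Count := 0;
    // The loop variable is typed as TObjectA; for..in uses GetEnumerator.
    for Obj in Container do
      Inc(Count);
    WriteLn(Count);
  finally
    Second.Free;
    First.Free;
    Container.Free;
  end;
end.
```

Multiply this by 20+ containers, each with its own quirks, and it becomes clear why doing everything in a single step is unattractive.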

Can we leave the TBaseContainer class as-is and derive a new intermediate class from it that contains the generics stuff?

  TCustomContainer<T: TBaseObject> = class(TBaseContainer)
  private
    function GetItem(AIndex: Integer): T;
  public
    property Items[AIndex: Integer]: T read GetItem;
  end;

Compile the code, and there are no complaints this time, since no one is using the TCustomContainer class we just created. The ambition of keeping the changes atomic is now achievable too. Each concrete container can start using the TCustomContainer class at its own pace.

  TContainerA = class(TCustomContainer<TObjectA>)
  //uses new TCustomContainer class
  end;

  TContainerB = class(TBaseContainer)
  ...  //legacy code
  end;
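
Here is a compilable sketch of this intermediate-class arrangement, with one migrated and one legacy container side by side. The body of the legacy TBaseContainer (FItems, Add, GetBaseItem) and the hand-written getter in TContainerB are stand-ins I made up; the real legacy code is not shown in this post.

```pascal
program IntermediateContainerSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  TBaseObject = class
  end;

  TObjectA = class(TBaseObject)
  end;

  TObjectB = class(TBaseObject)
  end;

  // Stand-in for the untouched legacy base class (members are assumptions).
  TBaseContainer = class
  private
    FItems: array of TBaseObject;
  protected
    function GetBaseItem(AIndex: Integer): TBaseObject;
  public
    procedure Add(AItem: TBaseObject);
  end;

  // New intermediate class: the only place that knows about generics.
  TCustomContainer<T: TBaseObject> = class(TBaseContainer)
  private
    function GetItem(AIndex: Integer): T;
  public
    property Items[AIndex: Integer]: T read GetItem;
  end;

  // Already migrated: gets its typed Items from TCustomContainer.
  TContainerA = class(TCustomContainer<TObjectA>)
  end;

  // Not migrated yet: keeps its own hand-written typed getter.
  TContainerB = class(TBaseContainer)
  private
    function GetItem(AIndex: Integer): TObjectB;
  public
    property Items[AIndex: Integer]: TObjectB read GetItem;
  end;

function TBaseContainer.GetBaseItem(AIndex: Integer): TBaseObject;
begin
  Result := FItems[AIndex];
end;

procedure TBaseContainer.Add(AItem: TBaseObject);
begin
  SetLength(FItems, Length(FItems) + 1);
  FItems[High(FItems)] := AItem;
end;

function TCustomContainer<T>.GetItem(AIndex: Integer): T;
begin
  // Unchecked cast; Add still accepts any TBaseObject in this sketch.
  Result := T(GetBaseItem(AIndex));
end;

function TContainerB.GetItem(AIndex: Integer): TObjectB;
begin
  Result := TObjectB(GetBaseItem(AIndex));
end;

var
  ContainerA: TContainerA;
  ContainerB: TContainerB;
  ObjA: TObjectA;
  ObjB: TObjectB;
begin
  ContainerA := TContainerA.Create;
  ContainerB := TContainerB.Create;
  ObjA := TObjectA.Create;
  ObjB := TObjectB.Create;
  try
    ContainerA.Add(ObjA);
    ContainerB.Add(ObjB);
    // Both containers expose a typed Items property to their callers.
    WriteLn(ContainerA.Items[0].ClassName);
    WriteLn(ContainerB.Items[0].ClassName);
  finally
    ObjB.Free;
    ObjA.Free;
    ContainerB.Free;
    ContainerA.Free;
  end;
end.
```

Callers of either container see the same typed Items property, so a container can be migrated without touching the code that uses it.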

The above solution should work in most object-oriented languages that support generics or parameterized types. So where's the mystery, by the way?

Let's rename TCustomContainer to TBaseContainer and see what happens. Essentially, we are creating a new class with the same name as its base class. Since both classes are declared in the same unit, I expected a compile-time error like this one:

[dcc32 Error] BaseContainer.pas(24): E2004 Identifier redeclared: 'TBaseContainer'

Delphi surprises us this time. It reports no error even though the derived class has the same name as its base class. Yet within a unit, Delphi normally does not allow declaring an identifier that has already been declared under the same name. So what's going on?

Let's hold that curiosity for a moment and try to use the generic version of TBaseContainer with our concrete containers.

  TContainerA = class(TBaseContainer<TObjectA>)
  //uses generic version of TBaseContainer
  end;

  TContainerB = class(TBaseContainer)
  ...  //legacy code
  end;

Again, no complaints! Without any explanation, the compiler does exactly what we asked for. Both TContainerA and TContainerB derive from a class named TBaseContainer: the former uses the generic version, whereas the latter uses the non-generic version.

What happens internally is this: when a class is parameterized with generics, Delphi gives it an internal name that distinguishes it from classes declared with a different number of type parameters or with none at all. This means TBaseContainer and TBaseContainer<T> have different internal names, and the Delphi compiler treats them as distinct types.

Given that, all of the following is legal within the same unit.

  TTest = class  //internal name => 'TTest'
  end;

  TTest<T> = class  //internal name => 'TTest`1'
  end;

  TTest<T1, T2> = class  //internal name => 'TTest`2'
  end;

  TTest1 = class  //internal name => 'TTest1'
  end;

  TTest1<T> = class(TTest1)  //internal name => 'TTest1`1'
  end;

  TTest2<T> = class  //internal name => 'TTest2`1'
  end;

  TTest2 = class(TTest2<TObject>)  //internal name => 'TTest2'
  end;
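
A small program can confirm that these declarations coexist and produce distinct classes. This is only a sketch: the aliases TTestInt and TTestPair are names I made up for it. It prints whatever ClassName reports for each class, which is the runtime class name and is not necessarily the internal name listed above.

```pascal
program OverloadedGenericNames;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  TTest = class
  end;

  TTest<T> = class
  end;

  TTest<T1, T2> = class
  end;

  // Hypothetical aliases, used only to name two concrete instantiations.
  TTestInt = TTest<Integer>;
  TTestPair = TTest<Integer, string>;

begin
  // Three unrelated classes share the identifier TTest.
  WriteLn(TTest.ClassName);
  WriteLn(TTestInt.ClassName);
  WriteLn(TTestPair.ClassName);

  // Sharing a name does not imply any inheritance relationship.
  WriteLn(TTestInt.InheritsFrom(TTest));
  WriteLn(TTestPair.InheritsFrom(TTestInt));
end.
```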

Can't we say that the TTest, TTest1 and TTest2 classes are overloaded?
