Thursday, December 13, 2012

Under and Over Design

A few days ago, I was listening to a talk by Alan Shalloway on over- and underdesign. This is one of the (too) few times I have heard anyone talk about something like underdesign. Usually, we don't take it into account - and that's a mistake. We talk a lot about overdesign, the "pattern religion", the misplaced Chains of Responsibility, the misused Visitors etc., and we're clearly sending a message that we strive for a "simple design" that's "not trying to foresee the future". However, in all the fuss, we tend to forget that the design might also get "too simple", AKA "bad quality".

What's bad quality? I like to define it using the code qualities mentioned in a nice little interactive tutorial by NetObjectives. So, whatever doesn't exhibit these qualities is underdesigned.

Note, however, that many of these qualities are not "binary" - when asked "do we have this quality?", it's often impossible to answer with a simple "yes" or "no". For example, we cannot have full encapsulation, because that would mean no object knowing about any other. For the same reason, we cannot have "zero coupling" - if there is no coupling at all between any entities, how are they supposed to work together as a system?

This leads to a question: how much of these qualities is "just enough" - a "golden mean" between underdesign and overdesign? Based on this question, I decided to share my views on the topic, using a simple example.

The example is about validation of requests that arrive at our application over the air and are deserialized into objects of class Request. Every validation consists of:

  1. Checking whether this is a full request consisting of four codewords
  2. Making sure that this is a request by a privileged user (in this case - a system administrator)
  3. Checking whether the request was not damaged when being sent over the air (due to noise etc.)
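Throughout the article, I'm going to assume the Request class looks more or less like this. This is just my sketch so the examples have some context - the exact shape (including the Regular enum member) is made up and doesn't matter much:

public enum Users
{
  Admin,
  Regular
}

public class Codeword
{
  public int DamagedBits { get; set; }
}

public class Request
{
  public IList<Codeword> Codewords { get; set; }
  public Users Creator { get; set; }
}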

Let's assume that the entry point for the validation logic is a method called PerformFor() inside the class RequestValidation. I'm not going to show the whole class, because it's not that important - I'll be focusing only on the PerformFor() method.

Ok, let's get it on then!

First approach

public void PerformFor(Request request)
{
  if(request.Codewords.Count != 4)
  {
    throw new Exception("The request is incomplete");
  }

  if(request.Creator != Users.Admin)
  {
    throw new Exception("Invalid user, expected an admin");
  }

  foreach(var codeword in request.Codewords)
  {
    if(codeword.DamagedBits > 16)
    {
      throw new Exception(
        "The request got too damaged when sent over the air." +
        "It cannot be processed");
    }
  }
}

Some developers would consider this a good piece of code - readable and lacking nothing. I disagree. While it may be just enough in applications that support only one or two scenarios (e.g. some small utilities), in a solid product that is under active development, putting in code like this is unacceptable. Let me point out some smells inside this simple piece of code:

  1. The internal information of the Request object is exposed - e.g. that it contains a collection of codewords, that it holds the creator of the request as an enum etc. Even if these are properties that execute some code at runtime, it doesn't really matter - the information is still exposed. This is sometimes called Inquisitiveness and is an antipattern.
  2. Our method retrieves a collection of objects of class Codeword - does it really need to know about this type? That's not all - it reaches into a property of each codeword, browsing the request's dependencies in order to get at a piece of information. We usually say that such code, by creating accidental coupling (e.g. what happens if one day the Codewords property is allowed to be null?), breaks the Law of Demeter.
  3. This whole method is an example of the Feature Envy smell - most of its logic consists of asking the request object about various values (and the values of the objects it contains) and making decisions based on them (every 'if' statement and the 'foreach' is an example of this). That's in contrast to object-oriented programming, where we actually want to keep the behavior as close as possible to the data it needs.
  4. The method is not cohesive (in other words, it has more than one responsibility). Here are the responsibilities the method deals with:
    1. How many validations are there and what is the order in which they get executed
    2. What is the implementation of each validation check
    3. What message to convey to the user when a validation fails
    4. What the mechanism is for reporting a failed validation (it wouldn't have to be an exception - it could just as well be a mechanism based on events and notifications)
    5. What the valid values are for the number of codewords required to make the request complete and for the maximum allowed number of damaged bits

As you can see, the code is clearly of poor quality, especially with regard to cohesion and encapsulation. Most of the time, this isn't "just enough" - it's asking for trouble.

Improving encapsulation

If we decide to improve encapsulation on the Request class to actually hide some of its implementation details, we may arrive at something like this:

public void PerformFor(Request request)
{
  if(!request.IsComplete())
  {
    throw new Exception("The request is incomplete");
  }
 
  if(!request.WasCreatedBy(Users.Admin))
  {
    throw new Exception("Invalid user, expected an admin");
  }

  if(request.IsTooDamagedToProcess())
  {
    throw new Exception(
      "The request got too damaged when sent over the air." +
      "It cannot be processed");
  }
}

Believe it or not, this is actually a lot better than the previous attempt. First, the implementation of the request is almost completely hidden. We don't browse its dependencies - we talk only to the request itself, asking it specific questions formed in a high-level domain language (e.g. we're not asking about the number of codewords, but about the completeness of the request, which also reveals a bit of the "why" behind this check - "we don't want to process incomplete requests". Merely counting codewords, as in the previous example, doesn't tell us that).
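In case you're wondering where the codeword counting and the damage checking went, here's a sketch of how the Request class might now look. The method names come straight from the example above; the bodies are my guess:

public class Request
{
  private readonly IList<Codeword> _codewords;
  private readonly Users _creator;

  public Request(IList<Codeword> codewords, Users creator)
  {
    _codewords = codewords;
    _creator = creator;
  }

  // "what does it mean for a request to be complete?"
  // is now the request's own knowledge
  public bool IsComplete()
  {
    return _codewords.Count == 4;
  }

  public bool WasCreatedBy(Users user)
  {
    return _creator == user;
  }

  // browsing the codewords is no longer the caller's business -
  // the foreach from the first example moved in here
  public bool IsTooDamagedToProcess()
  {
    foreach(var codeword in _codewords)
    {
      if(codeword.DamagedBits > 16)
      {
        return true;
      }
    }
    return false;
  }
}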

By the way, note that such a design is less explicit, but reads better as a domain model.

So, is it "just enough design" now? I'd say yes, but only if this is a short term solution. I'd leave something like this in my code maybe the first time I'd write it. The second time I'd have to change anything here, I'd already refactor towards a better quality implementation.

While this is certainly an improvement over the previous one, it still has some weak points:

  1. It still exposes the implementation of the validation failure handling. For now, this is only throwing an exception; tomorrow, issuing a notification may be added, or setting a field on the request, or anything else. Each time this failure handling changes, we'll have to update this piece of code, until it grows really ugly.
  2. The messages passed inside exceptions in each situation are exposed.
  3. The method is coupled to the public interface of the request (i.e. it needs to know its method signatures because it calls them). This kind of coupling is sometimes called Representational Coupling.

To sum up: the method still has at least two mostly unrelated responsibilities: performing validations in a certain order and implementing a failure scenario for each validation. There are cases when this can be called "just enough design", but usually we have to work a little bit more.

Encapsulating the failure handling

If we decide to encapsulate the failure handling, we may get to the following solution:

public void PerformFor(Request request)
{
  _validationRules.ValidateCompletenessOf(request);
  _validationRules.ValidateCreatorOf(request);
  _validationRules.ValidateTotalDamageDoneTo(request);
}

Now, that's better! Actually, it's a high quality design! The PerformFor() method now cares only about two things:

  1. Which validations are applied
  2. In what order are the validations applied

We can safely assume that applying certain validations in the correct order is a single responsibility. The PerformFor() method knows nothing more - neither how the validations are implemented, nor which exception is thrown in case of failure, nor what messages are passed inside these exceptions.

Also, note that the PerformFor() method doesn't need to know any method signature of the request object. In other words, however these signatures change, PerformFor() is very likely to remain unaffected. Sometimes we call this kind of coupling Identity Coupling - the lightest of all kinds of coupling. PerformFor() still has Representational Coupling to the _validationRules object, which may even be a facade hiding whatever complex implementation we need behind it.
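By the way, if you're curious what might hide behind _validationRules, here's one possible shape. The class itself isn't shown in this article, so consider the class name and the bodies my assumption - the point is that the exceptions and messages moved in here:

public class ValidationRules
{
  // the failure handling - exception types and messages - now lives here;
  // replacing exceptions with e.g. notifications would not touch PerformFor()
  public void ValidateCompletenessOf(Request request)
  {
    if(!request.IsComplete())
    {
      throw new Exception("The request is incomplete");
    }
  }

  public void ValidateCreatorOf(Request request)
  {
    if(!request.WasCreatedBy(Users.Admin))
    {
      throw new Exception("Invalid user, expected an admin");
    }
  }

  public void ValidateTotalDamageDoneTo(Request request)
  {
    if(request.IsTooDamagedToProcess())
    {
      throw new Exception(
        "The request got too damaged when sent over the air. " +
        "It cannot be processed");
    }
  }
}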

To sum up, this is "just enough design" in most cases! Now, depending on further needs and the rest of the system, we may need to add some extra capabilities.

Extracting each validation as a separate object

We can extract each validation into its own class. Then we'd get something like this:

public void PerformFor(Request request)
{
  _completenessValidation.ApplyTo(request);
  _creatorValidation.ApplyTo(request);
  _totalDamageValidation.ApplyTo(request);
}

I rarely do this for its own sake when I already have the implementation discussed in the previous section. Mostly, it's an intermediate step towards introducing variability in the number of validations. Note that the PerformFor() method has higher coupling than before, which might actually make it a little worse quality than the previous approach - a little underdesigned, even. The good thing is that applying each validation follows the same signature. If I can pull this off, I can introduce an interface, make all these validations implement it, put them all inside a collection and decouple the PerformFor() method from the knowledge of which validations are applied in which order. This knowledge would probably be moved into a factory. Let's see how this works out:

private readonly IEnumerable<ValidationStep> _validationSteps;

public RequestValidation(IEnumerable<ValidationStep> steps)
{
  _validationSteps = steps;
}

public void PerformFor(Request request)
{
  foreach(var step in _validationSteps)
  {
    step.ApplyTo(request);
  }
}

As you can see, all the RequestValidation class currently does is go through all the validation steps and apply them to the request. It no longer decides which validations are performed or in what order - as I said, this knowledge is usually moved into a factory, so we've decoupled setting up the validation chain from applying it. The remaining question is: why the heck do we need that?
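Just to make it concrete, the interface and one of the steps might look like this. It's a sketch - the concrete class name follows the field names from the earlier example, the rest is my assumption:

public interface ValidationStep
{
  void ApplyTo(Request request);
}

public class CompletenessValidation : ValidationStep
{
  // the same check as before, now packaged as a pluggable step
  public void ApplyTo(Request request)
  {
    if(!request.IsComplete())
    {
      throw new Exception("The request is incomplete");
    }
  }
}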

It all depends on what we need. Imagine we have different validation chains - with different validations, or the same validations in a different order - depending on the circumstances. In such a case, we would not want to pollute the PerformFor() method with "ifs and elses" for all these scenarios. We'd just rely on the factory to create the right chain for us and go and execute it. If this is the case, then this design may be "just enough".
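Such a factory might be as simple as the sketch below. The criterion for choosing a chain (whether the request came over the air) is made up for illustration, and the step class names follow the field names from the earlier example:

public class ValidationFactory
{
  // the knowledge of which validations apply and in what order
  // is concentrated here, away from the RequestValidation class
  public RequestValidation CreateFor(bool requestSentOverTheAir)
  {
    if(requestSentOverTheAir)
    {
      return new RequestValidation(new ValidationStep[]
      {
        new CompletenessValidation(),
        new CreatorValidation(),
        new TotalDamageValidation()
      });
    }

    // a request that never traveled over the air cannot be damaged by noise,
    // so its chain skips the damage check
    return new RequestValidation(new ValidationStep[]
    {
      new CompletenessValidation(),
      new CreatorValidation()
    });
  }
}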

In a case where we don't have these different scenarios, we've just wasted our time introducing variability that we don't need. Remember the principle from the Design Patterns book - "encapsulate what varies"? As Amir Kolsky once explained to me, the reverse is also true: "do not encapsulate what does not vary". In other words, every variability we introduce without a need is overdesign - a waste of time and unnecessary complexity.

This example may be further refactored into a Decorator pattern, but I'm just gonna stop here :-).

That's it for now, feel free to leave a comment on what you consider to be underdesigned or overdesigned or on whatever you like :-).
