Sunday, December 16, 2012

Oh noes, another post on naming conventions?

Hi, today I'd like to write about a naming convention that I picked up some time ago and which help me to write more understandable code and better focused code.

Just to make things clear - I'm not going to evangelize anyone that what I suggest is "the best" or "the right" thing to do. I just want to share what works for me.

Ok, now that I explained myself, time for the rule and the rule is simple:

Name your classes and methods after domain concepts

Did I just hear you laughing and shouting "Don't say!"? Just wait a little before leaving, I wanna explain how I understand this rule. All I want to say is: when you name your classes and methods, use names that would be understandable to a domain expert, not to only to a programmer. There are couple of things to keep in mind when following this rule:

Watch out for 'er' or 'or"

I don't know why, we, software engineers, suffer from a cognitive bias that suggests us that each class we create must be a kind of entity - a material, physical thing. If there's no such "thing" to represent as domain concept, we make it up. For example, we tend to call classes that perform some kind of processing "XYZProcessor". Also, when we want to validate anything, we create "validators". When we need a special way to run part of logic, we create "runners".

In my opinion, this kind of naming has some flaws that I'd like to list below:

  1. I mean, come on! What are these? One thing sure - they're not domain concepts. If you told your user that you're "passing the frame through a validator", he'd be like "What?". The language that's used for communication between programmers and other stakeholders is hurt and requires translations between the "design language" and the "domain language".
  2. This kind of approach tends to create a lot of repetition when using classes and methods. These repetitions make it harder to read code. The words thing that we can get is something like:
    Just read it aloud and answer yourself whether it forms a text that is easy to memorize. Even if we work a little on this example, the best we can come up with is:
    Which still has one repetition. While this is just one statement, when running into a code that's full of such constructs, it's very hard to rebuild a domain model in mind.
  3. Also, as I learned, such naming convention makes it easier to cross the boundaries of class responsibilities. For example, let's take a "MessageProcessor". the name points that it has something to do with processing messages, but the relationship is indirect. So, one day, someone thinks "this processor is an actor, so maybe it can do more than one thing" and puts in a method for persisting the messages. I've seen countless times, that if something is not directly related to a domain concept, it eventually gets used for some additional tasks.

Of course, there are well-established names like "Builder" which clearly point to a design pattern used. These are sometimes appropriate. Also, it's good to name your class "Processor" if it really models a processor (e.g. in some closer-to-hardware systems). Anyway, as a general rule, when I encounter a name ending with "er" or "or", I ask myself: is it named after a thing or a process? If it's named after a thing, it's okay. If it's named after a process (e.g. validation, encryption, sending are all processes), there is almost certainly a better name that fits the domain better.

Let me offer some examples of how usually things get named and what naming I usually use (they use variables, but I usually name variables more or less after their classes):

I don't use... Instead, I use...

I learned that such names improve communication, help sharpen focus of a class and build a better conceptual domain model from looking at the code.

avoid putting implementation details inside higher level abstractions

This most often applies to abstractions created around collections. For example, let's take a tree-like structure, where each tree node has its children. Sooner or later, we discover that we need a separate abstraction over the set of children nodes, because we need some customized search criterias etc. for the child nodes.

How would you name such abstraction? ChildrenSet? ChildrenList? Why not just call it Children - after all, that's the concept from the domain that this abstraction models, doesn't it? Why the need for "List" or "Set" or anything else? Why reveal the internal implementation detail which is how the elements are stored? What if one day, you decide to split the set of children into two sets internally - one for terminal and one for non-terminal children - to speed up searching (if you know upfront that a node is terminal, you don't have to check its children at all)?

In rare cases, name objects and classes after verbs

Do you know Assert class from XUnit-type frameworks? Ever wondered why it's called "Assert" and not "Asserter" or "Assertions"? That's because it is a part of a mini-language and helps forming statements that read more like sentences, which improves readability. The Assert class is focused so much on a single task that it makes perfect sense to call it after a verb, not a noun as it is usually suggested. In the long run, it made possible forming expressions like this (in nUnit):

Assert.That(X, Has.Property("Y"));

Personally, I tend to apply this convention to small factories and other small objects that have a very strong focus. Let's say that I want to create credentials from User object and this object has either user name or account name filled. I prefer to create the credentials from account name because it's globally unique,but if the account name is not supplied, the creation method returns null and I create a less-privileged credentials based on user name, which is only unique locally. Thus, I may create the credentials factory like this:

CredentialsFactory createCredentials;

And then use it like this:

var credentials 
  = createCredentials.BasedOn(user.Account) 
    ?? createCredentials.BasedOn(user.Name);

Again, I don't use this too often, but I sometimes do.

Ok, that's all for today, feel free to leave a comment and share your tips for naming abstractions, variables and methods.

Thursday, December 13, 2012

Under and Over Design

Few days ago, I was listening to a talk by Alan Shalloway on over and underdesign. This is one of the (too) few times I heard anyone talking about something like underdesign. Usually, we don't take it into account - and that's a mistake. We talk a lot about overdesign, the "pattern religion", the misplaced Chains of Responsibility, misused Visitors etc. and we're clearly sending a message that we strive for a "simple design", that's "not trying to foresee the future". However, in all the fuss, we tend to forget that the design might also get "too simple", AKA "bad quality".

What's bad quality? I like to define it using the code qualities mentioned in a nice little interactive tutorial by NetObjectives. So, whatever doesn't fit these qualities, is underdesigned.

Note, however, that many of these qualities, are not "binary" - when asked "do we have this quality?", it's often impossible to say "yes" or "no". For example, we cannot have full encapsulation, because that would mean no one knowing about anyone. Also, for the same reason, we cannot have "zero coupling" - if there is no coupling at all between any entities, how are they supposed to work together as a system?

This leads to a question: how much of these qualities is "just enough" - a "golden mean" between underdesign or overdesign? Based on this question, I decided to share my views on the topic, using a simple example.

The example is about validation of requests that arrive to our application through the air and are deserialized into objects of class Request. Every validation consists of:

  1. Checking whether this is a full request consisting of four codewords
  2. Making sure that this is a request by a privileged user (in this case - a system administrator)
  3. Checking whether the request was not damaged when being sent over the air (due to noise etc.)

Let's assume that the entry point for validation logic is a method called PerformFor() inside class RequestValidation. I'm never going to show the whole class, because it's not that important, I'll be focusing only on the PerformFor() method.

Ok, let's get it on then!

First approach

public void PerformFor(Request request)
  if(request.Codewords.Count != 4)
    throw new Exception("The request is incomplete");

  if(request.Creator == Users.Admin)
    throw new Exception("Invalid user, expected an admin");

  foreach(var codeword in request.Codewords)
    if(codeword.DamagedBits > 16)
      throw new Exception(
        "The request got too damaged when sent over the air." +
        "It cannot be processed");

Some developers would consider it a good piece of code that's readable and lacks nothing. I disagree. While it may be just enough in applications that support one or two scenarios only (e.g. some small utilities), in a solid product that is under active development, putting in code like this is unacceptable. Let me point some smells inside this simple piece code:

  1. The internal information of the Request object is exposed - e.g. that it contains a collection of codewords, that it holds a creator of the request as an enum etc. Even if these are properties that execute some code at runtime, it doesn't really matter - this information is still exposed. This is sometimes called Inquisitiveness and is an antipattern.
  2. Our method retrieves a collection of objects of class Codeword - does it really need to know about this type? That's not all - it reaches to a property of codeword, which is browsing request's dependencies in order to get a piece of information. We usually say that such code, by creating an accidental coupling (e.g. what happens if one day Codewords property is allowed to be null?), is breaking the Law of Demeter.
  3. This whole method is an example of Feature Envy smell - most of its logic is asking the request object about different values and values of its contained objects to make some decisions based on that (every 'if' statement and the 'foreach' is an example of this). That's in contrast to the object orientated programming, where you actually want to keep the behavior as close as possible to the data it needs.
  4. The methods is not cohesive (in other words, it has more than one responsibility). Here are the responsibilities that the method deals with:
    1. How many validations are there and what is the order in which they get executed
    2. What is the implementation of each validation check
    3. What message to convey to the user when a validation fails
    4. What is the mechanism for reporting failed validation (it wouldn't have to be an exception - it can as well be a mechanism based on events and notifications)
    5. What are the valid values for number of codewords required in order to make the request complete and for the number of maximum allowed damaged bits?

As you can see, the code is clearly poor quality, especially with regard to cohesion and encapsulation. Most of the time, this isn't "just enough". It's asking for trouble.

Improving encapsulation.

If we decide to improve encapsulation on the Request class to actually hide some of its implementation details, we may arrive at something like this:

public void PerformFor(Request request)
    throw new Exception("The request is incomplete");
    throw new Exception("Invalid user, expected an admin");

    throw new Exception(
      "The request got too damaged when sent over the air." +
      "It cannot be processed");

Believe it or not, this is actually a lot better that the previous attempt. First, the implementation of the request is almost completely hidden. We don't browse its dependencies - we talk only with the request by asking it specific questions that are formed in a high-level domain language (e.g. we're not asking about number of codewords, but about completeness of the request, which also reveals a bit about the "why" associated with this check, which is "we don't want to process incomplete requests". Merely counting codewords, as in previous example, doesn't tell us that).

By the way, note that such design is less explicit, but more readable as a domain model.

So, is it "just enough design" now? I'd say yes, but only if this is a short term solution. I'd leave something like this in my code maybe the first time I'd write it. The second time I'd have to change anything here, I'd already refactor towards a better quality implementation.

While this is certainly an improvement over the previous one, it still has some weak points:

  1. It still exposes the implementation of the validation failure handling. For now, this is only throwing an exception, tommorow, issuing a notification may be added or setting a field on the request or anything else. Each time this failure handling changes, we'll have to update this piece of code until it grows really ugly.
  2. The messages passed inside exceptions in each situation are exposed.
  3. The method is coupled to the public interface of the request (i.e. it needs to know its method signatures because it calls them). This kind of coupling is sometimes called Representational Coupling.

To sum up: the method still has at least two mostly unrelated responsibilities: performing validations in certain order and implementing a failure scenario for each validation. There are some cases when this can be called "just enough design", but usually, we have to work a little bit more.

Encapsulating the failure handling

If we decide to encapsulate the failure handling, we may get to the following solution:

public void PerformFor(Request request)

Now, that's better! Actually, it's a high quality design! The PerformFor() method now cares only about two things:

  1. Which validations are applied
  2. In what order are the validations applied

We can safely assume that applying certain validations in correct order is a single responsibility. The PerformFor() method knows of nothing more - either how the validations are implemented, which exception is thrown in case of failure and what messages are passed inside these exceptions.

Also, note that the PerformFor() method doesn't need to know any method signature of the request object. In other words, however these signatures change, PerformFor() is very likely to remain unaffected. Sometimes, we call this kind of coupling Identity Coupling. Identity Coupling is the lightest of all kinds of coupling. Also, PerformFor() has Representational Coupling to the validationRules object, which may even be a facade and have whatever complex implementation we need behind itself.

To sum up, this is "just enough design" in most cases! Now, depending on further needs and the rest of the system, we may need to add some extra capabilities.

Extracting each validation as a separate object.

We can extract each validation into its separate class. Then, we'd get something like this:

public void PerformFor(Request request)

I rarely do this for its own sake when I already have the implementation discussed in the previous section. Mostly, this is an intermediate step towards bringing in a variability in the number of validations. Note that the PerformFor() method has higher coupling than before, which actually might make it a little worse quality than the previous approach and a little underdesigned. The good thing is that applying each validation follows the same signature. If I can successfully do this, then I can just introduce an interface, make all these validations implement this interface, then put them all inside a collection and decouple the PerformFor() method from the knowledge about which validations are being applied in which order. The knowledge would be probably moved into a factory. Let's see how this works out:

public RequestValidation(IEnumerable<ValidationStep> steps)
  _steps = steps;

public void PerformFor(Request request)
  foreach(var step in _validationSteps)

As you can see, all the Validation class currently does is to go through all the validations and apply them to the request. It does not know about the order of validations or which validations are performed. Also, as I said, this knowledge is usually moved into a factory, so we decoupled setting up validation chain from applying it. The remaining question is: why the heck do we need that?

It all depends on what we need. Imagine we have different validation chains with different validations or the same validations in different order depending on circumstances. In such case, we would not want to pollute the PerformFor() method with "ifs and elses" for all these scenarios. We'd just rely on the factory to create the right chain for us and just go and execute it. If this is the case, then this design may be "just enough".

In a case where we don't have these different scenarios, we've just wasted our time introducing a variability that we don't need. Remember the principle from Design Patterns book? "Encapsulate what varies"? Also, as Amir Kolsky once explained to me, the following is also true: "Do not encapsulate what does not vary". In other words, every variability we introduce without a need is an overdesign, a waste of time and unnecessary complexity.

This example may be further refactored into a Decorator pattern, but I'm just gonna stop here :-).

That's it for now, feel free to leave a comment on what you consider to be underdesigned or overdesigned or on whatever you like :-).