Saturday, November 4, 2017

On Default Choices

Nowadays, I tend to say "I always do X" less often and "X is my default choice" more often.

Default choice is a term (and a conceptual tool) I have been using a lot in design discussions and coding as of late. I find it very unfortunate that, although I see default choices implicitly mentioned in almost every design book, blog post, and discussion, the concept of a default choice has never received much attention. Thus, I decided to write something about it myself.

My homegrown definitions


Over the years of thinking and talking about default choices, I have developed three homegrown definitions of the term, each of them approaching the idea from a different angle. To me, a default choice is:
  1. Something I currently consider the best trade-off in most cases.
  2. Something I need a reason not to apply. 
  3. The first solution I consider when presented with a problem.
Definition 1 suggests several important things:
  1. The "what I consider" part suggests that making a choice default is subjective. My default choices are drawn from experience I acquired while solving problems myself or what I learned from other people. Of course, when I work with somebody in a team, we need to collectively agree on some team-wide default choices, and this agreement becomes a part of our culture as a team.
  2. The word "currently" suggests that default choices are not set in stone but rather refined over time. As mentioned before, their primary source is experience, so as this experience develops, so do the default choices.
  3. The word "trade-off" suggests that a default choice is not the One True Way - like all other choices, it has its share of real and potential issues. 
  4. The "in most cases" part suggests that "context is king". The reason I have default choices is that many times, a context I encounter is similar to ones I already encountered in the past.
Definition 2 introduces an interesting twist. When something becomes my default choice for dealing with a certain problem, I need justification NOT to apply it. The point for me is that having a default choice saves me some thinking and analysis - instead, I can draw from my past experience. For example, I don't need a reason to apply the Single Responsibility Principle, and I don't need to justify it every time I use it. I do need a reason to refrain from doing so.

Definition 3 suggests that even when there are several equally good solutions, I start with evaluating my default one. For example, when trying to avoid using null, my default choice is to use the Null Object pattern, but I often end up using an Optional/Maybe instead when it's a better fit for the situation at hand. Still, I consider Null Object first.
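
To make this concrete, here's a minimal C# sketch of the Null Object pattern (the Log/NoLog names are mine, invented just for illustration):

public interface Log
{
  void Write(string message);
}

//the Null Object: a do-nothing implementation that stands in for "no logging"
public class NoLog : Log
{
  public void Write(string message)
  {
    //deliberately does nothing
  }
}

public class OrderProcessing
{
  private readonly Log _log;

  public OrderProcessing(Log log)
  {
    _log = log;
  }

  public void Process()
  {
    //no null check needed anywhere - a NoLog instance is a valid Log
    _log.Write("processing started");
  }
}

A caller that doesn't care about logging passes new NoLog() instead of null, so call sites stay free of null checks.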

Examples of default choices


In my approach to coding and design, I apply lots of default choices. Some of them are:
  1. Don't pass or return null.
  2. Design objects in tell don't ask manner.
  3. Do not have more than one constructor in any class.
  4. Separate objects usage from creation.
  5. Develop in outside-in style.
  6. Use test-driven development.
  7. Design classes to maximize cohesion and minimize coupling.
  8. Prefer object composition over class inheritance.
  9. Use hexagonal architecture for my components.
  10. ...
Does this mean I always do all of that? No, of course not. For example, I pass null when a third-party library demands it (although even in such cases I have some default choices on how to do it). I use class inheritance when implementing exception classes (likewise, my exception classes often have more than one constructor). I don't use hexagonal architecture when writing build scripts.

This is the whole notion of defaultness. Having default choices saves me from thinking about the same things over and over again, but at the same time, it's not an excuse for turning off thinking altogether.

Gradation of defaultness


If you look at some of my open source code on GitHub, you will notice that not every class is pretty and not everything even has a test. This is because I vary which choices I consider default depending on the context I work in.

Some of the things that influence how many of my default choices I apply:

  1. Do I work alone or in a team?
    • this influences how much mess I can tolerate in the code. When I work alone, I can leave something in a messy state, knowing that no one else will struggle to understand the code and that it will not change in my absence.
  2. How big is the team?
    • the bigger the team, the more disciplined I tend to be, as an unattended mess can grow much faster when more people work on the code.
  3. What is the level of expertise in my team?
    • with a more expert team, I can leave a little underdesign in the code and trust that when other team members encounter it, they will know how to clean it up.
  4. What language do I write in?
    • Assuming an app of the same size, I tend to be more disciplined when writing in dynamic languages, as they don't have static checking. I extract methods, classes and DSLs faster to maximize code reuse and avoid silly mistakes.
  5. What is the required level of quality?
    • If this is a small home pet project that only I use, I may not even bother with automated tests, or I may write fewer of them. After all, when I encounter an error, I will know how to fix it (still, almost all of my home pet projects have automated tests). Maybe I will shock you by saying that some of my code doesn't even have to work correctly (this applies to some of my training examples). On the other hand, when writing code that human life depends on, I'd like to have huge test coverage on several levels.
  6. Do I have a schedule?
    • this may strike some people as a paradox, but I tend to apply more design principles, techniques and patterns when I have a schedule. This is because, in such a case, no one will likely agree to stop for several weeks to fix a piece of design. On the other hand, when working on home pet projects where no one is waiting for a specific new feature, I can spend two months refactoring and have lots of fun doing so!
  7. ...

Default choices as a discussion tool

Personally, I am not as big a fan of "we must do X because it's the right thing to do" as I once was. This is because my technical discussions based on "rightness" have never really brought any satisfying conclusions; instead, they led to conflicts and to each side of the discussion losing respect for the other.

Now, if we agree that many of our disagreements arise from having different default choices, we can then achieve at least three things:
  1. Remain in disagreement without losing trust in each other as professional software developers.
  2. Explore the differences in our default choices and uncover how we can reconcile them or make better choices. In other words, we can learn from each other.
  3. When we (e.g. as a team) agree on default choices and on what a default choice means, we know better what we can expect from each other, which reinforces trust and can speed up code reviews.





Sunday, September 22, 2013

DevDay 2013 in Kraków

On Friday, September 20th, I had the pleasure of attending the DevDay 2013 conference. The experience was great: the conference was free, the food was tasty, and the presenters were top-notch (plus, you could catch them during the breaks and ask further questions)!

There were two tracks everyone could attend. I decided to stick to the "green track" and I don't regret it. I'd like to share my impressions of each of the presentations. I'm not gonna write exactly what each presentation was about, though, because you can find that on the conference page.

Jon Skeet: Back to basics: the mess we've made of our fundamental data types

Jon is a great showman - that's a given. Using his puppet, which he calls "Tony the Pony", he went on to discuss the issues people have with the fundamental data types - numbers, text and date/time - and he did it with style! There were a lot of jokes and funny anecdotes along the way. I especially enjoyed the part about date and time, when Jon gave a flood of examples of why date and time are too complex for ordinary people: like when each country migrated from the Julian calendar to the Gregorian calendar (and what their migration strategies were), timezones with the same abbreviations, places on Earth where there's no midnight, etc. Halfway through his talk, I was already fully convinced I would never be able to understand this topic.

Thankfully, Jon is also the author of Noda Time (.NET), a port of the open source Joda Time library (Java), which comes to the rescue. Being a Stack Overflow Chuck Norris (by the way, I heard him say during the conference: "It's hard living up to being Chuck Norris" :-)), it's no wonder that Noda Time has tests for many of the issues people encounter with the built-in date-time API and report on Stack Overflow.

Patrick Kua: Implementing continuous delivery

I really liked how Patrick presented the topic of Continuous Delivery. A very positive guy, eager to discuss and not dogmatic at all. The presentation covered a lot of patterns for continuous delivery on many levels; some of them can be found on Slideshare. After the talk, I got to ask Patrick whether inside-out or outside-in development works better with continuous delivery, and he replied that both can be used successfully - it all depends on what's easier at a given moment for the development team.

Andreas Håkansson: Guerilla Framework Design

A very interesting session where Andreas showed us, using simple examples, how C# can be pushed to the limits (he even called it "language abuse" :-)) to provide better and more expressive APIs, and how he put this to use when implementing the Nancy framework. I think the important distinction he made was that the code you consume yourself is very different from the code you write to be consumed by others (i.e. a framework). There were some really neat tricks shown, like using dynamics instead of dictionaries to attain better expressiveness, or patterns and gotchas of assembly scanning for extensions. He also showed how a framework can expose its IoC container to enable replacing its core components. What I liked was his emphasis on testability, and the whole idea of using language features in a crazy way felt a lot like Ruby on Rails.

Hadi Hariri: Moving the web to the client

The presentation was mainly about Angular.js, but there were also some nice comments about how a server-side application should be used as an API, and about how web development went astray when web developers tried to abstract away the wrong things, like the HTTP protocol (I particularly enjoyed Hadi's example of a question he got about whether to return status "200" with an error message in case of an error - clearly, the person asking had forgotten that HTTP has return codes for errors). Personally, I wasn't aware that client-side application development in JavaScript had moved this far, thanks to libraries such as Angular.js and HTML5. Another treat was a presentation of server-side code written in the Kotlin language and the Wasabi framework.

Itamar Syn-Hershko: Full-text search with Lucene and neat things you can do with it

I knew almost nothing about full-text search, Lucene and Elasticsearch, so everything in this presentation was new to me. Because of this, I had a bit of a hard time following the code examples, especially since the session was quite late in the day. Anyway, I have to say it was a valuable session, which I couldn't fully digest because of my lack of knowledge on the topic. Still, I enjoyed it!

Marco Cecconi: The Architecture of StackOverflow

The session on the performance considerations and tweaks of Stack Overflow started slowly. I had the impression that Marco was suffering from jet lag, because he seemed quite sleepy at the beginning. The funny thing was that the more we moved into crazy-geeky stuff, the more excited Marco got, and the presentation ended on a high note with a long session of questions and answers. We learned what technologies Stack Overflow works with, what their open source policy is, how they make money, what tweaks they use for performance (like real-time monitoring, hand-written intermediate language code, statics all around, etc.) and their approach to testing. A very interesting case study indeed.

Rob Ashton: The software journeyman's guide to being homeless and jobless

Rob is British, so I wasn't always able to understand him well (which is really funny, because we're always told we learn English at school, but it turns out to be more of an American English when it comes to accent). Still, the presentation about his year-long experiment was an exciting and energizing one. Gales of laughter poured from the audience as Rob showed and commented on his experiences, and he won the audience over completely by showing us a picture of himself coding in Clojure during the Eurovision Song Contest :-). The presentation was kind of a motivational one, and I think it succeeded in encouraging us to make our jobs a happy experience.

Then came a closing speech, a photo of all the speakers and an after-party, which I actually didn't attend, since I was tired already.

Anyway, the experience was absolutely brilliant and I am looking forward to attending the conference in the future!

Sunday, January 20, 2013

Value objects are the lesson I keep relearning over and over again

Hi, today I'm gonna tell you a bit about value objects: why they're so useful and what rules apply to creating and using them. This post is inspired by the work of Kent Beck, Steve Freeman and Nat Pryce.

Example problem

Imagine you're developing a web shop for your customer. There are different kinds of products sold and your customers have the ability to add new products.

Each product has at least two important attributes: name and price (actually there are others like quantity, but let's leave them alone for now).

Let's say that you've decided to use a decimal to hold the price, and a string to hold the name. Note that both are generic library types, not connected to any domain. Is it a good choice to use "library types" for domain abstractions? We shall soon find out...

Time passes...

Actually, it turns out that these values must be shared across a few subdomains of the system. For example:

  1. The website needs to display them
  2. They are used in calculations
  3. They are taken into account when defining and checking promotion rules (e.g. "buy three, pay for two")
  4. They must be supplied when printing invoices

etc.

The code grows larger and larger and, as the concepts of product name and price are among the main concepts of the application, they tend to land everywhere.

Now, imagine that one of the following changes must make its way into the system:

  1. The product name must be compared as case-insensitive, since the names of the products are always printed in uppercase on the invoice. Thus, creating two products that differ only in letter case (e.g. "laptop" and "LAPTOP") would confuse the customers, as both products would look the same on the invoice. Also, the only way one would create two products that differ by letter case only is by mistake, and we want to avoid that.
  2. The product name alone is not enough to differentiate a product. For example, notebook manufacturers have the same models of notebooks in different configurations (e.g. different amounts of RAM or different processor models inside). So each product will receive an additional identifier that will have to be taken into account during comparisons.
  3. In order to support customers from different countries, new currencies must be supported everywhere money is already used.

These changes are a horror to make. Why? Because in multiple places we're coupled to a particular implementation of the product name (string) and a particular implementation of money (decimal). This wouldn't be so bad, if not for the fact that we're coupled to an implementation we cannot change!

From now on, let's put the money concept aside and consider only the product name - both cases are similar, so it's sufficient to consider just one of them.

What options do we have?

So, what options do we have now? In order to support the new requirements, we have to find all the places where we use the product name and the price, and make the same change in each of them. Every time we need to do something like this, it means that we've introduced redundancy.

Option one - just modify the implementation in all places

So let's say we want to add comparison with letter case ignored. The worst idea would be to find all the places where we do something like this:

if(productName == productName2)
{
..

And change it to:

if(String.Equals(productName, productName2, 
   StringComparison.OrdinalIgnoreCase))
{
..

This is bad for at least four reasons:

  1. According to Shalloway's Law, it will be very hard to find all these places, and chances are you'll miss at least one.
  2. Everyone who adds new comparisons of product names to the code in the future will have to remember to use the IgnoreCase comparison. If you ask me, accidental violation of this convention is just a matter of time.
  3. Even if this time you're able to find and correct all the places, every time this aspect changes (e.g. we have to support the InvariantIgnoreCase option instead of OrdinalIgnoreCase for some reason, or the case I mentioned earlier where the comparison includes an identifier), you'll have to do it all over again.
  4. Also, there are other changes that will be tied to the concept of a product name (like generating a hash code) and you'll need to introduce them too in all the places where the product name value is used.

Option two - use a helper class

We can address the third issue of the above list by moving the comparison operation into a helper class. Thus, the comparison would look like this:

if(ProductName.Equals(productName, productName2))
{
..

Now, the details of the comparison are hidden inside the newly created class. Each time the comparison needs to change, we have to modify only this one class.

However, while it protects us from changes to the comparison policy, it's still not enough. The concept of a product name is not encapsulated - it's still a string, and all of its methods are publicly available. Hence, another developer who starts working on the code may not even notice that product names are compared differently than other strings and may just use the comparison methods of the string type. The other deficiencies of the previous approach (all except issue number 3) apply as well.

Option three - encapsulate the domain concept and create a "Value Object"

I think it's more than clear now that a product name is not "just a string" but a domain concept, and as such, it deserves its own class. Given this, the comparison snippet is now:

//both are of class ProductName
if(productName.Equals(productName2))
{
..

How is this different from the previous approach with the helper class? Whereas previously the implementation of a product name (a string) was publicly visible and we only added external functionality that operated on this implementation (and anybody could add their own), this time the nature of the product name is completely hidden from the outside world. The only available way of working with product names is through ProductName's public interface (which exposes only those methods we want and no more). In other words, whereas before we were dealing with a general-purpose type we couldn't change, now we have a domain-specific type that's completely under our control.

How value objects help in dealing with change

Let's see how this move makes it easier to introduce the changes I already mentioned (ignoring case, comparing by ID as well as by string name, and getting an uppercase version for printing on invoices).

Initial implementation

The first implementation may look like this:

public class ProductName
{
  string _value;

  internal ProductName(string value)
  {
    _value = value;
  }

  public static ProductName For(string value)
  {
    if(string.IsNullOrWhiteSpace(value))
    {
      throw new ArgumentException(
        "Product names must be human readable!");
    }
    else
    {
      return new ProductName(value);
    }
  }
  
  //for on-screen printing
  public string HumanReadablePart
  {
    get
    {
      return _value;
    }
  }

  //for invoices
  public string ToNameForInvoice()
  {
    return _value.ToUpper();
  }

  public override bool Equals(Object obj)
  {
    if (obj == null)
    {
      return false;
    }

    var otherProductName = obj as ProductName;
    if ((Object)otherProductName == null)
    {
      return false;
    }

    return _value.Equals(otherProductName._value);
  }

  public override int GetHashCode()
  {
    return _value.GetHashCode();
  }

  public static bool operator ==(ProductName a, ProductName b)
  {
    if (System.Object.ReferenceEquals(a, b))
    {
      return true;
    }

    if (((object)a == null) || ((object)b == null))
    {
      return false;
    }

    return a.Equals(b);
  }

  public static bool operator !=(ProductName a, ProductName b)
  {
    return !(a == b);
  }
}

Note a few things about this implementation:

  1. The class has an internal constructor and a static factory method for general use. The factory method holds all the rules that must be satisfied in order to create a valid object, and the constructor just sets the fields. This is the preferred way for value objects to be created. One reason is that the rules for creating valid objects may grow, and we don't want that to become a maintenance burden on our unit tests (we usually don't mock value objects, so they will be all over our unit testing suite). Thus, the clients will always use the factory method, while unit tests have the freedom to use the constructor if they wish. Also, it's good for readability (e.g. you can write ProductName.For("Soap")).
  2. It may look like the effort of creating such a wrapper around just one value is huge; however, most of the methods are straightforward, and others can be auto-generated by an IDE (like Equals(), GetHashCode() and the equality operators).
  3. Objects of the ProductName class behave as if they were values, e.g. comparison is state-based instead of reference-based. Note that this is similar to C# strings, which are a canonical example of value objects.
  4. Product names are immutable. There is no operation that can overwrite their state once an object is created. This is on purpose and is a design constraint that we want to maintain. For example, we may want to sell sets of products in the future ("2 in 1" etc.) and treat such a set as a separate product whose name is a merger of the component names. In such a case, we could write:
    var productSetName = productName1.MergeWith(productName2);
    
    and this operation creates a new product name instead of modifying any of the component product names. Note that this is also the same as with strings, which, in C#, are immutable (every operation like Trim(), Replace() etc. creates a new object). A sketch of a possible MergeWith() implementation appears right after this list.
  5. While freshly created, the ProductName abstraction already contains bits that are domain-specific, namely the ToNameForInvoice() method. Whether it's a good decision to put such a method here depends heavily on the context (there are other interesting options, but I'll leave this topic for another time).
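
As promised above, here is a minimal sketch of a MergeWith() method that could be added to ProductName (the merged-name format is purely my assumption):

public ProductName MergeWith(ProductName other)
{
  //a brand new instance is returned; neither component name is modified
  return new ProductName(_value + " + " + other._value);
}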

Ok, now for the first change:

First change - case-insensitivity

This is actually very easy to perform - we just have to modify the Equals() and GetHashCode() operations like so:

public override bool Equals(Object obj)
{
  if (obj == null)
  {
    return false;
  }

  var otherProductName = obj as ProductName;
  if ((Object)otherProductName == null)
  {
    return false;
  }

  return string.Equals(this._value, 
                       otherProductName._value,
                       StringComparison.OrdinalIgnoreCase);
}

public override int GetHashCode()
{
  return _value.ToUpper(
    CultureInfo.InvariantCulture).GetHashCode();
}

(a disclaimer - I'm not 100% sure that this implementation deals with all of the weird locale-specific issues, so don't treat it as a reference implementation, but rather as a simple example)
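
One way to lower that risk is to delegate both Equals() and GetHashCode() to a single comparer, so they cannot get out of sync. Below is a sketch of that variant (an alternative I'm offering for illustration, not the only correct implementation):

private static readonly StringComparer _comparison =
  StringComparer.OrdinalIgnoreCase;

public override bool Equals(Object obj)
{
  var otherProductName = obj as ProductName;
  if((Object)otherProductName == null)
  {
    return false;
  }
  //the same comparer drives both equality and hashing
  return _comparison.Equals(_value, otherProductName._value);
}

public override int GetHashCode()
{
  return _comparison.GetHashCode(_value);
}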

Thanks to this, no change outside the ProductName class is necessary. Two methods need to be modified, but in just one place, which means that the encapsulation we've introduced works out pretty well.

Second change - additional identifier

In order to do this, we'll have to modify how ProductName objects are created:

internal ProductName(string value, string id)
{
  _value = value;
  _id = id;
}

public static ProductName For(string value, string id)
{
  if(string.IsNullOrWhiteSpace(value))
  {
    throw new ArgumentException(
          "Product names must be human readable!");
  }
  else if(string.IsNullOrWhiteSpace(id))
  {
    throw new ArgumentException(
          "Identifiers must be human readable!");
  }
  else
  {
    return new ProductName(value, id);
  }
}

Note that this modification requires changes all over the codebase (because an additional argument is needed to create an object); however, this is not the kind of change that we're afraid of. That's because the compiler will create a nice little TODO list (i.e. compile errors) for us and won't let us go further without addressing it first, so there's no chance Shalloway's Law might come into effect. Thus, we're pretty much safe. By the way, if the requirements change in the future so that we have to support product names without an ID, this is the only place we'll need to change.

In addition, Equals() and GetHashCode() will have to be changed again:

public override bool Equals(Object obj)
{
  if (obj == null)
  {
    return false;
  }

  var otherProductName = obj as ProductName;
  if ((Object)otherProductName == null)
  {
    return false;
  }

  var valuesEqual = string.Equals(this._value, 
                       otherProductName._value,
                       StringComparison.OrdinalIgnoreCase);
  var identifiersEqual = this._id.Equals(otherProductName._id);

  return valuesEqual && identifiersEqual;
}

public override int GetHashCode()
{
  return 
    _value.ToUpper(CultureInfo.InvariantCulture).GetHashCode()
      ^ _id.GetHashCode();
}

This modification won't require any changes to the code outside ProductName class.

The last change is adding a new member for the ID to allow printing on invoices or on screen:

public string Identifier
{
  get
  {
    return _id;
  }
}

This will probably require changes to the rest of the codebase, but those changes will be for a different reason (i.e. that we want to display product identifiers on the web page and print them on invoices), which is OK, since they're connected to responsibilities beyond those of the product name.

Summary

By examining the above example, we can see the following principle emerging:

When you give a value a name that belongs to the problem domain, its type should be a separate, domain-specific class that is under your control, instead of a general-purpose library class that's out of your control.

And it's a nice one to remember, especially because we often tend to model such values as library types and wake up when it's already too late to make the transition to a value object effectively. I'm an example of this, and that's why I've relearned this principle the hard way many times.

And that's it for today. I'll be happy to hear your thoughts. Until then, see ya!

Sunday, December 16, 2012

Oh noes, another post on naming conventions?

Hi, today I'd like to write about a naming convention that I picked up some time ago and which helps me write more understandable and better-focused code.

Just to make things clear - I'm not going to evangelize that what I suggest is "the best" or "the right" thing to do. I just want to share what works for me.

Ok, now that I've explained myself, time for the rule, and the rule is simple:

Name your classes and methods after domain concepts

Did I just hear you laughing and shouting "You don't say!"? Just wait a little before leaving - I wanna explain how I understand this rule. All I want to say is: when you name your classes and methods, use names that would be understandable to a domain expert, not only to a programmer. There are a couple of things to keep in mind when following this rule:

Watch out for "er" or "or"

I don't know why, but we software engineers suffer from a cognitive bias that suggests to us that each class we create must be a kind of entity - a material, physical thing. If there's no such "thing" to represent a domain concept, we make one up. For example, we tend to call classes that perform some kind of processing "XYZProcessor". Also, when we want to validate anything, we create "validators". When we need a special way to run part of the logic, we create "runners".

In my opinion, this kind of naming has some flaws, which I'd like to list below:

  1. I mean, come on! What are these? One thing is sure - they're not domain concepts. If you told your user that you're "passing the frame through a validator", he'd be like: "What?". The language used for communication between programmers and other stakeholders suffers and requires translation between the "design language" and the "domain language".
  2. This kind of approach tends to create a lot of repetition when using classes and methods. These repetitions make the code harder to read. The worst we can get is something like:
    frameValidator.ValidateFrame(frame);
    
    Just read it aloud and ask yourself whether it forms a sentence that reads naturally. Even if we work a little on this example, the best we can come up with is:
    validator.Validate(frame);
    
    Which still has one repetition. While this is just one statement, when you run into code that's full of such constructs, it's very hard to rebuild the domain model in your mind.
  3. Also, as I have learned, such a naming convention makes it easier to cross the boundaries of class responsibilities. For example, let's take a "MessageProcessor". The name suggests that it has something to do with processing messages, but the relationship is indirect. So, one day, someone thinks: "this processor is an actor, so maybe it can do more than one thing" and puts in a method for persisting the messages. I've seen countless times that if something is not directly related to a domain concept, it eventually gets used for some additional tasks.

Of course, there are well-established names like "Builder", which clearly point to a design pattern being used. These are sometimes appropriate. Also, it's good to name your class "Processor" if it really models a processor (e.g. in some closer-to-hardware systems). Anyway, as a general rule, when I encounter a name ending with "er" or "or", I ask myself: is it named after a thing or a process? If it's named after a thing, it's okay. If it's named after a process (e.g. validation, encryption and sending are all processes), there is almost certainly a name that fits the domain better.

Let me offer some examples of how things usually get named and of the naming I usually use instead (the examples use variables, but I usually name variables more or less after their classes):

I don't use...                          Instead, I use...
validator.Validate(frame)               validation.ApplyTo(frame)
httpSessionTimerRunner.Run()            httpSessionExpiry.Schedule()
parallelExecutor.Execute(task)          parallelExecution.PerformFor(task)
serviceManager.DoWork()                 serviceWorkflow.Start()
requestProcessor.Process(request)       requestProcessing.ExecuteWith(request)
requestProcessor.CanProcess(request)    requestProcessing.IsApplicableTo(request)
fileNameFormatter.Format(fileName)      fileNameFormat.ApplyTo(fileName)

I have learned that such names improve communication, help sharpen the focus of a class, and make it easier to build a conceptual domain model from looking at the code.

Avoid putting implementation details inside higher-level abstractions

This most often applies to abstractions created around collections. For example, let's take a tree-like structure, where each tree node has its children. Sooner or later, we discover that we need a separate abstraction over the set of child nodes, because we need some customized search criteria etc. for them.

How would you name such an abstraction? ChildrenSet? ChildrenList? Why not just call it Children - after all, that's the concept from the domain that this abstraction models, isn't it? Why the need for "List" or "Set" or anything else? Why reveal the internal implementation detail of how the elements are stored? What if, one day, you decide to split the set of children into two sets internally - one for terminal and one for non-terminal children - to speed up searching (if you know upfront that a node is terminal, you don't have to check its children at all)?
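
For illustration, such a Children abstraction might start out like this (a sketch - the Node type and the method names are hypothetical choices of mine, and the usual System.Collections.Generic and System.Linq usings are assumed):

public class Children
{
  //a single list today; maybe two lists (terminal and non-terminal) tomorrow -
  //either way, the clients of Children won't notice
  private readonly List<Node> _nodes = new List<Node>();

  public void Add(Node node)
  {
    _nodes.Add(node);
  }

  public bool AnyMatch(Func<Node, bool> criteria)
  {
    return _nodes.Any(criteria);
  }
}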

In rare cases, name objects and classes after verbs

Do you know the Assert class from xUnit-type frameworks? Ever wondered why it's called "Assert" and not "Asserter" or "Assertions"? That's because it is part of a mini-language and helps form statements that read more like sentences, which improves readability. The Assert class is focused so much on a single task that it makes perfect sense to name it after a verb, not a noun as is usually suggested. In the long run, this made it possible to form expressions like this (in NUnit):

Assert.That(X, Has.Property("Y"));

Personally, I tend to apply this convention to small factories and other small objects that have a very strong focus. Let's say I want to create credentials from a User object, and this object has either a user name or an account name filled in. I prefer to create the credentials from the account name, because it's globally unique, but if the account name is not supplied, the creation method returns null and I create less-privileged credentials based on the user name, which is unique only locally. Thus, I may declare the credentials factory like this:

CredentialsFactory createCredentials;

And then use it like this:

var credentials 
  = createCredentials.BasedOn(user.Account) 
    ?? createCredentials.BasedOn(user.Name);
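
For completeness, the factory behind this usage might look more or less like this (a sketch - the Credentials class and the string-typed parameter are my assumptions, not taken from any real code):

public class CredentialsFactory
{
  public Credentials BasedOn(string accountOrUserName)
  {
    if(string.IsNullOrEmpty(accountOrUserName))
    {
      //lets the caller fall through to the ?? alternative
      return null;
    }
    return new Credentials(accountOrUserName);
  }
}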

Again, I don't use this too often, but I sometimes do.

Ok, that's all for today. Feel free to leave a comment and share your tips for naming abstractions, variables and methods.

Thursday, December 13, 2012

Under and Over Design

A few days ago, I was listening to a talk by Alan Shalloway on overdesign and underdesign. This is one of the (too) few times I have heard anyone talk about something like underdesign. Usually, we don't take it into account - and that's a mistake. We talk a lot about overdesign, the "pattern religion", the misplaced Chains of Responsibility, the misused Visitors etc., and we're clearly sending a message that we strive for a "simple design" that's "not trying to foresee the future". However, in all the fuss, we tend to forget that a design might also get "too simple", AKA "bad quality".

What's bad quality? I like to define it using the code qualities mentioned in a nice little interactive tutorial by NetObjectives. So, whatever doesn't fit these qualities is underdesigned.

Note, however, that many of these qualities are not "binary" - when asked "do we have this quality?", it's often impossible to answer simply "yes" or "no". For example, we cannot have full encapsulation, because that would mean no one knowing about anyone. Also, for the same reason, we cannot have "zero coupling" - if there is no coupling at all between any entities, how are they supposed to work together as a system?

This leads to a question: how much of these qualities is "just enough" - a "golden mean" between underdesign and overdesign? Prompted by this question, I decided to share my views on the topic, using a simple example.

The example concerns the validation of requests that arrive at our application over the air and are deserialized into objects of class Request. Every validation consists of:

  1. Checking whether this is a full request consisting of four codewords
  2. Making sure that this is a request by a privileged user (in this case - a system administrator)
  3. Checking whether the request was not damaged when being sent over the air (due to noise etc.)

Let's assume that the entry point for the validation logic is a method called PerformFor() inside the class RequestValidation. I'm not going to show the whole class, because it's not that important; I'll be focusing only on the PerformFor() method.

Ok, let's get it on then!

First approach

public void PerformFor(Request request)
{
  if(request.Codewords.Count != 4)
  {
    throw new Exception("The request is incomplete");
  }

  if(request.Creator != Users.Admin)
  {
    throw new Exception("Invalid user, expected an admin");
  }

  foreach(var codeword in request.Codewords)
  {
    if(codeword.DamagedBits > 16)
    {
      throw new Exception(
        "The request got too damaged when sent over the air." +
        "It cannot be processed");
    }
  }
}

Some developers would consider this a good piece of code: readable and lacking nothing. I disagree. While it may be just enough in applications that support only one or two scenarios (e.g. some small utilities), in a solid product that is under active development, putting in code like this is unacceptable. Let me point out some smells inside this simple piece of code:

  1. The internal information of the Request object is exposed - e.g. that it contains a collection of codewords, or that it holds the creator of the request as an enum. Even if these are properties that execute some code at runtime, it doesn't really matter - this information is still exposed. This is sometimes called Inquisitiveness and is an antipattern.
  2. Our method retrieves a collection of objects of class Codeword - does it really need to know about this type? That's not all - it reaches into a property of each codeword, which means browsing the request's dependencies in order to get a piece of information. We usually say that such code, by creating accidental coupling (e.g. what happens if one day the Codewords property is allowed to be null?), breaks the Law of Demeter.
  3. This whole method is an example of the Feature Envy smell - most of its logic asks the request object about different values, and about the values of its contained objects, in order to make decisions based on them (every 'if' statement and the 'foreach' is an example of this). That's in contrast to object-oriented programming, where you actually want to keep behavior as close as possible to the data it needs.
  4. The method is not cohesive (in other words, it has more than one responsibility). Here are the responsibilities it deals with:
    1. How many validations there are and the order in which they get executed
    2. What the implementation of each validation check is
    3. What message to convey to the user when a validation fails
    4. What the mechanism for reporting a failed validation is (it wouldn't have to be an exception - it could just as well be a mechanism based on events and notifications)
    5. What the valid values are for the number of codewords required to make a request complete and for the maximum allowed number of damaged bits

As you can see, the code is clearly of poor quality, especially with regard to cohesion and encapsulation. Most of the time, this isn't "just enough". It's asking for trouble.

Improving encapsulation

If we decide to improve the encapsulation of the Request class to actually hide some of its implementation details, we may arrive at something like this:

public void PerformFor(Request request)
{
  if(!request.IsComplete())
  {
    throw new Exception("The request is incomplete");
  }
 
  if(!request.WasCreatedBy(Users.Admin))
  {
    throw new Exception("Invalid user, expected an admin");
  }

  if(request.IsTooDamagedToProcess())
  {
    throw new Exception(
      "The request got too damaged when sent over the air." +
      "It cannot be processed");
  }
}

Believe it or not, this is actually a lot better than the previous attempt. First, the implementation of the request is almost completely hidden. We don't browse its dependencies - we talk only to the request, asking it specific questions formed in a high-level domain language (e.g. we're not asking about the number of codewords, but about the completeness of the request, which also reveals a bit of the "why" associated with this check, namely: "we don't want to process incomplete requests". Merely counting codewords, as in the previous example, doesn't tell us that).

By the way, note that such a design is less explicit, but more readable as a domain model.

So, is it "just enough design" now? I'd say yes, but only if this is a short term solution. I'd leave something like this in my code maybe the first time I'd write it. The second time I'd have to change anything here, I'd already refactor towards a better quality implementation.

While this is certainly an improvement over the previous version, it still has some weak points:

  1. It still exposes the implementation of the validation failure handling. For now, this is only throwing an exception; tomorrow, issuing a notification may be added, or setting a field on the request, or anything else. Each time this failure handling changes, we'll have to update this piece of code, until it grows really ugly.
  2. The messages passed inside exceptions in each situation are exposed.
  3. The method is coupled to the public interface of the request (i.e. it needs to know its method signatures because it calls them). This kind of coupling is sometimes called Representational Coupling.

To sum up: the method still has at least two mostly unrelated responsibilities: performing validations in a certain order and implementing a failure scenario for each validation. There are some cases where this can be called "just enough design", but usually, we have to work a little bit more.

Encapsulating the failure handling

If we decide to encapsulate the failure handling, we may get to the following solution:

public void PerformFor(Request request)
{
  _validationRules.ValidateCompletenessOf(request);
  _validationRules.ValidateCreatorOf(request);
  _validationRules.ValidateTotalDamageDoneTo(request);
}

Now, that's better! Actually, it's a high-quality design! The PerformFor() method now cares about only two things:

  1. Which validations are applied
  2. In what order the validations are applied

We can safely assume that applying certain validations in the correct order is a single responsibility. The PerformFor() method knows nothing more - neither how the validations are implemented, nor which exception is thrown in case of failure, nor what messages are passed inside these exceptions.

Also, note that the PerformFor() method doesn't need to know any method signature of the request object. In other words, however these signatures change, PerformFor() is very likely to remain unaffected. Sometimes we call this kind of coupling Identity Coupling; it is the lightest of all kinds of coupling. PerformFor() also has Representational Coupling to the validationRules object, which may even be a facade and have whatever complex implementation we need behind it.

To sum up, this is "just enough design" in most cases! Now, depending on further needs and the rest of the system, we may need to add some extra capabilities.

Extracting each validation as a separate object

We can extract each validation into its own class. Then we'd get something like this:

public void PerformFor(Request request)
{
  _completenessValidation.ApplyTo(request);
  _creatorValidation.ApplyTo(request);
  _totalDamageValidation.ApplyTo(request);
}

I rarely do this for its own sake when I already have the implementation discussed in the previous section. Mostly, this is an intermediate step towards introducing variability in the number of validations. Note that the PerformFor() method has higher coupling than before, which might actually make it a little worse in quality than the previous approach, and a little underdesigned. The good thing is that applying each validation follows the same signature. If I can successfully do this, then I can introduce an interface, make all these validations implement it, put them all inside a collection, and decouple the PerformFor() method from the knowledge of which validations are applied in which order. That knowledge would probably be moved into a factory. Let's see how this works out:

public RequestValidation(IEnumerable<ValidationStep> steps)
{
  _validationSteps = steps;
}

public void PerformFor(Request request)
{
  foreach(var step in _validationSteps)
  {
    step.ApplyTo(request);
  }
}

As you can see, all the RequestValidation class currently does is go through all the validation steps and apply them to the request. It no longer knows which validations are performed or in what order. As I said, this knowledge is usually moved into a factory, so we have decoupled setting up the validation chain from applying it. The remaining question is: why the heck do we need that?

It all depends on what we need. Imagine we have different validation chains, with different validations or with the same validations in a different order, depending on circumstances. In such a case, we would not want to pollute the PerformFor() method with "ifs and elses" for all these scenarios. We'd just rely on the factory to create the right chain for us and then go and execute it. If this is the case, then this design may be "just enough".
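
To illustrate, the factory might assemble a chain like this (a sketch - the ValidationStep interface matches the constructor shown earlier, but the concrete step classes and their parameters are my assumptions):

public interface ValidationStep
{
  void ApplyTo(Request request);
}

public static class ValidationChains
{
  //the factory owns the knowledge of which steps apply and in what order
  public static RequestValidation ForOverTheAirRequests()
  {
    return new RequestValidation(new ValidationStep[]
    {
      new CompletenessValidation(requiredCodewords: 4),
      new CreatorValidation(Users.Admin),
      new TotalDamageValidation(maxDamagedBits: 16)
    });
  }
}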

In the case where we don't have these different scenarios, we've just wasted our time introducing variability that we don't need. Remember the principle from the Design Patterns book - "encapsulate what varies"? As Amir Kolsky once explained to me, the converse is also true: "do not encapsulate what does not vary". In other words, every variability we introduce without a need is overdesign, a waste of time and unnecessary complexity.

This example may be further refactored into a Decorator pattern, but I'm just gonna stop here :-).

That's it for now, feel free to leave a comment on what you consider to be underdesigned or overdesigned or on whatever you like :-).

Sunday, November 25, 2012

Welcome!

Today, I'd like to inaugurate my second blog, named Feelings Not Erased. It's a spin-off from my other blog, named Feelings Erased.

Why another blog?

There are some topics related to object-oriented analysis and design that don't fit the profile of the Feelings Erased blog, where my main focus is Test-Driven Development and its relationship with object-oriented design. Also, I really think that having an "alter-ego blog" is a cool idea :-).

Why "Erased" and then "Not Erased"?

I never explained where I got the title "Feelings Erased" in the first place, so now I'd like to reveal it :-).

Actually, the title "Feelings Not Erased" came first, although it wasn't mine. It's the title of a song from the Chrono Cross OST. Just listen. Some time after hearing it, I made a track myself, which was actually a cover of another song from the same OST. For some reason, I decided to name the track "Feelings Erased" and distributed it only among my friends. This is where the idea for both blogs came from. So, the moment I thought about an "alter-ego blog", the title made perfect sense.

So, that's why "Feelings Not Erased".

What now?

Feelings Erased remains my dearest child, and this one will get a little less attention. However, I already have some great topics I'd like to write about here, so it's not a "sandbox" or a "trash can" in any way! It's just more OOAD, in contrast to the other blog, which is more about TDD and its influence on OOAD.

Stay tuned!