Minimizing Consistency Boundary with Event Sourcing

Imagine you have a system for managing customer subscriptions to your SaaS product. A customer can be a company holding multiple licenses under the subscriptions they have paid for. The customer wants to be able to assign those licenses to their employees and to unassign them again. The customer also wants to see how many unused licenses are available for a given subscription and how many licenses are used in total (across the different subscriptions they have paid for).

Given these requirements, let’s say that a business rule in our system is that when a customer assigns a license to an employee:

  • the new license is stored in the database
  • the number of unused licenses for the given subscription is decreased
  • the total number of used licenses across all subscriptions of that customer is increased

Conversely, when a customer unassigns a license from an employee:

  • the given license is deleted from the database
  • the number of unused licenses for the given subscription is increased
  • the total number of used licenses across all subscriptions of that customer is decreased

Storing only the current system state (i.e. using a database)

Looking at the business rules we have in place, all three operations for assigning (and likewise for unassigning) licenses need to be executed in a single transaction. If any of them fails, consistency is broken. Thus, the consistency boundary consists of all three operations.

When unassigning a license, it can happen that we delete the license from the database, but a bug in the code prevents us from increasing the number of available licenses for the subscription and decreasing the total number of used licenses for the customer. Can we recalculate the two numbers later, given that the license has already been deleted from the database? I’m not sure we can (at least not easily). This can leave us in an inconsistent state. In production. For hundreds of customers. And with no easy way to trace back what state we were in before the bug happened, we don’t know which operations to revert to get back to the previous consistent state, or which operations to apply to get to the next consistent one.

One way to address this is by decreasing the probability of having a bug in the code that caused it. Yeah, right 🙂
On the other hand, perhaps we should ask ourselves whether that way we’d actually be solving the wrong problem. Perhaps we shouldn’t have all three operations in a transaction in the first place. Perhaps we could somehow shrink the consistency boundary.

Using Event Sourcing

What if we use event sourcing? Meaning, when assigning/unassigning a license, we only record that the event happened, instead of inserting/deleting records in the database and updating the related counter fields. In that case, the consistency boundary would only encompass storing the license assigned/unassigned event. Calculating the two counters becomes a matter of running a projection over the set of events up to a given moment in time. E.g. asking for the number of unused licenses is a projection over all the license assigned and unassigned events that have happened for a given customer and subscription up until now. In a similar way, we can calculate the total number of licenses used by a given customer.
The license assigned/unassigned event is the single source of truth. Every other piece of information can be derived from it.
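
To make this concrete, here is a minimal sketch of such a projection (a hypothetical model; the event and method names are illustrative, not taken from a real system):

	using System;
	using System.Collections.Generic;
	using System.Linq;

	// The event store only ever appends these immutable facts.
	public abstract class LicenseEvent
	{
		public Guid CustomerId { get; }
		public Guid SubscriptionId { get; }

		protected LicenseEvent(Guid customerId, Guid subscriptionId)
		{
			CustomerId = customerId;
			SubscriptionId = subscriptionId;
		}
	}

	public class LicenseAssigned : LicenseEvent
	{
		public LicenseAssigned(Guid customerId, Guid subscriptionId) : base(customerId, subscriptionId) { }
	}

	public class LicenseUnassigned : LicenseEvent
	{
		public LicenseUnassigned(Guid customerId, Guid subscriptionId) : base(customerId, subscriptionId) { }
	}

	public static class LicenseProjections
	{
		// Used licenses for one subscription: a pure fold over its history.
		public static int UsedLicenses(IEnumerable<LicenseEvent> history, Guid subscriptionId)
		{
			return history
				.Where(e => e.SubscriptionId == subscriptionId)
				.Sum(e => e is LicenseAssigned ? 1 : -1);
		}

		// Unused licenses: the subscription's capacity minus the used ones.
		public static int UnusedLicenses(IEnumerable<LicenseEvent> history, Guid subscriptionId, int capacity)
		{
			return capacity - UsedLicenses(history, subscriptionId);
		}

		// Total used licenses across all subscriptions of one customer.
		public static int TotalUsedLicenses(IEnumerable<LicenseEvent> history, Guid customerId)
		{
			return history
				.Where(e => e.CustomerId == customerId)
				.Sum(e => e is LicenseAssigned ? 1 : -1);
		}
	}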

The moment we calculate information derived from the original event should not be coupled to the moment we record the original event.

When storing only the current state of the system, you have no option but to save the derived information at the moment the original event happens. You must save it at that moment because you have no history of changes, and thus no ability to recalculate it afterwards from the state you find in the database. This extends the consistency boundary, meaning more operations in the transaction.

If anything about calculating the number of unused licenses goes wrong because of a bug in the code, we can simply run the projection over the same set of recorded assigned/unassigned events again once we fix the bug. No manual intervention is needed to get the data back into a consistent state.

Other side benefits of ES

Besides widening the consistency boundary, the former approach of storing only the current system state and always precalculating the counter fields can be a performance hit (imagine this being a CPU-heavy operation). What if it turns out that a given customer rarely visits the part of the portal where they can see this precalculated information (e.g. the customer’s total number of assigned licenses)? Event sourcing also enables us to delay the calculation until the last responsible moment, i.e. just when the user asks for it, and thus save CPU cycles. We can go even further and record events about when and how often projections/calculations are triggered, and thus observe how often customers actually use this part of the system. Low activity can indicate that there’s no value in keeping that functionality at all. Less code, fewer problems.

In case you’re worried about the load you’d face when lots of customers use this functionality: event sourcing works with a recorded, immutable series of events and thus plays very nicely with the functional paradigm. Projections are pure functions without side effects and can easily be scaled out if needed. There’s also the possibility of using CQRS with a separate read store containing the two mentioned counters, updated on each license assigned/unassigned event. You could say this boils down to the same thing as the first approach, but the difference is that this is merely caching for performance reasons, and you can always recalculate these counters, which is not the case when storing only the current state of the system.
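
As a sketch of that CQRS variant (reusing the illustrative event types from the earlier sketch, including its usings), the read store could be as simple as a handler that applies each event to a cached counter. Unlike the first approach, this cache can be dropped and rebuilt from the events at any time:

	public class LicenseCountersReadModel
	{
		private readonly Dictionary<Guid, int> _usedPerSubscription = new Dictionary<Guid, int>();

		// Called for every license assigned/unassigned event.
		public void Apply(LicenseEvent @event)
		{
			_usedPerSubscription.TryGetValue(@event.SubscriptionId, out var current);
			_usedPerSubscription[@event.SubscriptionId] = current + (@event is LicenseAssigned ? 1 : -1);
		}

		public int UsedLicenses(Guid subscriptionId)
		{
			_usedPerSubscription.TryGetValue(subscriptionId, out var used);
			return used;
		}
	}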

Also, when using event sourcing, the amount of data that needs to be stored can be smaller than when storing only the current state of the system. As we’ve seen in the example, we only need to store the event that an assign/unassign happened, while the two related counters are calculated on demand and don’t need to be stored. Scaled up to a fairly large number of use cases, this can make a difference. Granted, when you record a series of events rather than only the current state, you’ll be storing more data, but I wonder whether not needing to store derived information actually evens out the difference in data usage between the two approaches.

And some trade-offs

On the other hand, event sourcing usually needs a more complex projection, since you have to go through the whole history of events (though you can shortcut this with snapshots). When storing only the current system state, you are forced to calculate the derived information at write time, and at that moment it’s usually fairly simple to do. Using the above example, asking for the number of unused licenses is, in the two approaches respectively:

  • a projection over the set of all license assigned and unassigned events that have happened for the given customer and subscription up until now, versus
  • simply reading a counter that was increased or decreased by one at the moment each license was assigned or unassigned

So, there’s a trade-off between the scope of the consistency boundary and the complexity of calculating derived values.

Summary

Storing only the current state of the system widens its consistency boundary, which can cause more problems and inconsistent state in case of a system failure. It also incurs an upfront performance cost and speculates that the derived, calculated information is going to be needed at all.
Event sourcing shrinks the consistency boundary, enables projection/calculation of related information exactly when it is needed, and keeps the application state consistent, while somewhat increasing the complexity of calculating derived values.

(Some) 3rd party dependencies as a design smell

How do you define good software design?

How do you know you’ve arrived at that sweet spot where you can stop refactoring?
Ask this question to a hundred developers and you’ll get a hundred different answers. Also, I bet most of them will start with the infamous:

“It depends.”

And that’s OK. I’d say that as well. Why? Because, most of the time, lots of factors are at play and one factor is a function of the other(s).
It’s all about trade-offs.

The opposite question

How do we know that a given design is sub-optimal?
Are there any smells we can use to drive a design away from the bad end of the spectrum?
Since many of us cannot agree on what good software design is, I believe this approach of asking the opposite question can at least help us drive away from a bad design.

As, I hope, many of you know, there are already some seminal books that explain the reasoning behind why certain code is “smelly”. Those books also explain the step-by-step mechanics that can drive us away from bad design by removing these smells.

One thing worth noticing here is that these code smells are defined at the function/class level.
But are there any design smells at a higher abstraction level? Perhaps the assembly/package level?
I think there are, and some of them are the mere fact of having dependencies on (some) 3rd party libraries.


“Programmers know the benefits of everything and the trade-offs of nothing”
— Rich Hickey


An example (IoC containers vs Pure DI)

The Dependency Injection (DI) pattern is quite popular nowadays, and most of us use Inversion of Control (IoC) containers to specify which dependency implementation to use in which client class. As an alternative, you can use Pure DI (previously known as Poor Man’s DI), but you could argue that, in that case, adding a new dependency to a given class affects the composition root (i.e. you have to manually pass an instance of the dependency when instantiating the client class) and the code won’t compile anymore.

For this reason, some people see a benefit in using IoC containers, since adding new dependencies doesn’t affect the composition root. To rephrase: with an IoC container, it’s easier and more convenient to add a new dependency.
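
To make the contrast concrete, here is a minimal Pure DI sketch (all types are hypothetical):

	public interface IOrderRepository { void Save(string order); }
	public interface INotifier { void Notify(string message); }

	public class SqlOrderRepository : IOrderRepository
	{
		public void Save(string order) { /* persist the order */ }
	}

	public class SmtpNotifier : INotifier
	{
		public void Notify(string message) { /* send an email */ }
	}

	public class OrderService
	{
		private readonly IOrderRepository _repository;
		private readonly INotifier _notifier;

		public OrderService(IOrderRepository repository, INotifier notifier)
		{
			_repository = repository;
			_notifier = notifier;
		}

		public void Place(string order)
		{
			_repository.Save(order);
			_notifier.Notify("Order placed: " + order);
		}
	}

	public class CompositionRoot
	{
		public OrderService Compose()
		{
			// Every dependency is passed by hand. A third, fourth, fifth
			// constructor parameter makes this line grow and stop compiling
			// until you update it: that friction is the design feedback.
			return new OrderService(new SqlOrderRepository(), new SmtpNotifier());
		}
	}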

My question would be: “Would you really like to make it too easy to add a new dependency?”

This, in my experience, easily gets out of hand and leads to classes with too many dependencies, which in turn:

  • violate the Single Responsibility Principle (SRP)
  • are harder to maintain
  • are harder to reuse
  • are harder to test
  • lead to the big ball of mud anti-pattern

Pain-induced (design) feedback

If I’m able to get feedback about the design from using Pure DI, then I’d like to leverage that information and address the design smell before it’s too late.
Of course, I’m not advocating an “IoC containers considered harmful” stance here. I’m advocating their critical usage and reconsidering the reasons why someone would use them in the first place. Maybe they are, in fact, sweeping some design smells under the rug?

Also, bear in mind that, when using an IoC container, you’re in fact circumventing the compiler: if a binding is not correct, you’ll only find out at run time. Not to mention how easy it is for developers to get lifetime scopes wrong.
If you have too many classes in your solution, and that’s the reason you find Pure DI cumbersome, ask yourself why you have so many classes in the solution.
Is there too much logic that needs to be separated into libraries/components/microservices? Are you sliding into a monolith with a bunch of entangled dependencies?
All of these “why” questions lead you to the root cause of the problem.

More examples

Mocking frameworks vs Hand-rolled stubs/mocks

Another example is using various frameworks that “aid” in unit testing. Let’s take a look at mocking frameworks (e.g. Moq in C#, Mockito in Java, etc.), which enable you to stub and mock dependencies (actually their interfaces) at run time. One of the reasons developers use them (myself included), instead of hand-rolled stubs/mocks, is that when you add a new method to an interface, that change doesn’t affect the test code.
On the other hand, with hand-rolled stubs/mocks, whenever you add a new method to an interface, you have to implement it in all of its existing implementors (including the hand-rolled stubs/mocks used in test code).
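
For illustration, a hand-rolled stub might look like this (a hypothetical interface, not from any particular codebase):

	public interface ISendEmails
	{
		void Send(string to, string body);
	}

	// A hand-rolled stub: records the interaction instead of really sending.
	public class EmailSenderStub : ISendEmails
	{
		public int SentCount { get; private set; }

		public void Send(string to, string body)
		{
			SentCount++;
		}
	}

The moment a new method is added to ISendEmails, this stub stops compiling and every test that uses it is affected; that compile-time pain is exactly the feedback discussed next.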

Thus, some of us would say that maintainability suffers with hand-rolled stubs compared to mocking frameworks, and I’d agree. But why did we need to add a new method to the interface in the first place? Are we sliding towards the fat interface anti-pattern? Should we instead adhere to the Interface Segregation Principle (ISP) and split the fat interface into multiple smaller ones? That way, perhaps, we won’t affect unrelated test code.

Also, maybe I’d go as far as to ask why you need stubs/mocks in the first place. I’m an Outside-In TDD practitioner and highly value the way that approach positively affects design, but there are other approaches which may go a step further. For example, the Imperative shell – functional core approach avoids the need for mocks in the first place.

Feature branching in DVCS vs Trunk based development

Compared to branching in Centralized Version Control Systems (CVCS), branching in Distributed Version Control Systems (DVCS) is much easier and more convenient to use. I believe this is the main reason why developers like to use DVCSs to implement feature branching in their development process. But.

Feature branching just feels more convenient compared to the trunk-based development approach. With trunk-based development, you’re merging your work with someone else’s changes very often, and these frequent merges might feel like friction in your development. On the other hand, the feature branching functionality of (distributed) version control systems gives you a little sandbox where you can isolate yourself from others and not be interrupted. But what happens at the end of the sprint? All the other developers finish the features they’ve been developing and merge their changes into the trunk. Suddenly all of you have monster merges to resolve. It’s the end of the sprint. The clock is ticking. Pressure to deliver.


If it hurts, do it more frequently, and bring the pain forward.
— Jez Humble


Feature branching just feels convenient, and it is only temporarily convenient. It masks/delays integration problems and yields monster merges at the end of the sprint, when all functionality should be completed.

Static methods vs Dependency injection

Another example of coding via the “more convenient” approach is using static methods. It’s “easier” to call a static method than to inject a dependency and call an instance method (see the sketch after this list).
But this leads to:

  • hidden dependencies,
  • inability to test,
  • behavior separated from data, and
  • an anemic domain model
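
A minimal sketch of the contrast (hypothetical types):

	// Static call: the collaborator is hidden inside the method body and
	// cannot be substituted in a test.
	public static class TaxCalculator
	{
		public static decimal Rate() { return 1.21m; }
	}

	public class InvoiceServiceWithStatics
	{
		public decimal Total(decimal net)
		{
			return net * TaxCalculator.Rate(); // hidden dependency
		}
	}

	// Injected dependency: visible in the constructor, replaceable in tests.
	public interface IProvideTaxRate
	{
		decimal Rate();
	}

	public class InvoiceService
	{
		private readonly IProvideTaxRate _taxRate;

		public InvoiceService(IProvideTaxRate taxRate)
		{
			_taxRate = taxRate;
		}

		public decimal Total(decimal net)
		{
			return net * _taxRate.Rate();
		}
	}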

If we don’t address the root cause, we might go down the rabbit hole of asking sadly popular questions (like this one) that solve the consequence rather than the cause.
Then we introduce mocking frameworks with super exotic, advanced features that solve this issue (like Typemock Isolator in C# or PowerMock in Java) by being able to mock static methods.
These frameworks have very, very limited scenarios that justify their use. For frameworks that enable mocking concrete classes, private methods, and static methods, that is essentially the single case of introducing unit tests into legacy code (by definition, code without tests).

Even more examples

  • automocking containers vs SRP – masking/circumventing the design smell of too many dependencies in a class, while making it easy to stub/mock them
  • automappers vs proper domain modeling – exposing getters/setters and violating encapsulation and domain model invariants, while making it easy to map incoming DTOs to domain classes
  • etc.

Summary

There are definitely cases where using a 3rd party library/framework is fine and a justified decision. You find that the advantages outweigh the disadvantages, and you accept the trade-offs and risks of using it. And that’s fine.
My point is that you should not only consider the trade-offs that come with introducing a new 3rd party dependency (e.g. accepting the possibility that a given library will not be supported or maintained in the future), but also ask: what does that need tell you about your design? Where did that need come from?

Agile development is all about small steps and an incremental approach. Why? Because it provides fast feedback about whether you’re heading in the right direction.
For the same reason, don’t neglect the design feedback you get when you feel the need to include a new dependency.
Think twice and ask yourself where that need came from in the first place. Answer those ‘why’ questions and you’ll be surprised how many future problems you’ll be able to prevent and how fast you’ll be able to deliver.

Bloated constructor means procedural code

Today, I was browsing through a codebase I’m currently working in and I ran into a class with a bunch of dependencies (to be precise, exactly 10 of them).
When I find myself in this kind of situation, I use my own heuristics for figuring out which classes are candidates for refactoring (which I’ll talk about in another post), and one of my rules is that a class with more than 3 dependencies should be treated as a code smell and reconsidered from a design perspective.
Usually, when a class has this many dependencies, my first guess is that it exhibits the low cohesion design smell, where some dependencies are used only in some methods (at worst, each dependency is used in exactly one method). On the contrary, if you use the majority of your dependencies in the majority of your methods, you have high cohesion.

This also usually means that the class violates the Single Responsibility Principle, which is addressed by splitting it up into separate classes with clearly defined responsibilities. The constructors of those separate classes then use only a subset of the original set of dependencies. Furthermore, this has a positive ripple effect on the client classes of the original class, since it usually means they can be split up as well.

So, while looking at the given class and how its dependencies were actually used throughout the method implementations, I ran into a method with a bunch of code that, when you squint, looks something like this:
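
(The original snippet isn’t reproduced here, so the following is a hypothetical reconstruction of the shape of such a method; all names are made up, and the fields are constructor-injected dependencies.)

	public void ProcessOrder(int orderId)
	{
		// 'Get' after 'Get': the method pulls information out of its
		// dependencies and makes all the decisions itself.
		var order = _orderRepository.GetOrder(orderId);
		var customer = _customerService.GetCustomer(order.CustomerId);
		var discount = _discountProvider.GetDiscount(customer);
		var shippingCost = _shippingCalculator.GetShippingCost(order);
		var taxRate = GetTaxRate(customer); // a private method hiding yet another dependency

		var total = (order.Subtotal - discount + shippingCost) * taxRate;
		_paymentGateway.Charge(customer, total);
		_notificationService.NotifyOrderProcessed(customer, order);
	}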

As you can notice, this method uses a bunch of dependencies (some explicitly injected through the constructor and some implicit, behind private methods), so someone could say it has high cohesion. But when you look carefully at the names of the methods it uses, you figure out that a bunch of them start with ‘GetXYZ’.

“Procedural code gets information then makes decisions, object-oriented code tells objects to do things”
— Alec Sharp

This means:

  • pure procedural code that mixes different levels of abstraction
  • heavy violation of the Tell, Don’t Ask principle
  • a missing domain concept, not represented as a layer of indirection between the client class and its dependencies

Also, there is a huge negative side effect when you think about unit tests and the number of dependencies you need to stub in order to test such a method! In my experience, these are the kinds of methods that demotivate developers from practicing TDD, leaving them saying that TDD is too hard, time consuming, and full of friction.

On the contrary: use the design feedback provided by the tests and fix your design. You’ll decrease coupling between classes, achieve higher cohesion, and create code that communicates in domain terms.

Usages of decorator pattern: Factory Decorator

One of the typical usages of the Strategy pattern that I’ve seen in code “out there” is the one where it is combined with the Factory pattern. The client depends on both the factory and the strategy interface; the factory produces an instance of a concrete strategy, which the client then invokes in order to serve the incoming request.

This combination is used when the concrete strategy/algorithm that serves the request is known only at run time.

The class diagram of this usage is as follows:

Typical usage of Factory + Strategy pattern

Here is also a code example of this usage, where MailSender is the client, ShippingMethodFactory is the factory, and PriorityMail and RegularMail are the concrete strategies:

	using System.Collections.Generic;
	using System.Linq;

	public interface IAmShippingMethod
	{
		void ShipToCustomer(Book book, Customer customer);
	}


	public class PriorityMail : IAmShippingMethod
	{
		public void ShipToCustomer(Book book, Customer customer)
		{
			//deliver by priority mail
		}
	}


	public class RegularMail : IAmShippingMethod
	{
		public void ShipToCustomer(Book book, Customer customer)
		{
			//deliver by regular mail
		}
	}


	public interface IAmShippingMethodFactory
	{
		IAmShippingMethod GetShippingMethodFor(Customer customer);
	}


	public class ShippingMethodFactory : IAmShippingMethodFactory
	{
		private readonly IDictionary<MembershipType, IAmShippingMethod> _shippingMethods;

		public ShippingMethodFactory(IDictionary<MembershipType, IAmShippingMethod> shippingMethods)
		{
			_shippingMethods = shippingMethods;
		}

		public IAmShippingMethod GetShippingMethodFor(Customer customer)
		{
			// Picks the shipping method mapped to the first membership type
			// this customer has (First throws if the customer has none).
			var membershipType = _shippingMethods.Keys.First(type => customer.Has(type));
			return _shippingMethods[membershipType];
		}
	}


	public class MailSender
	{
		private readonly IAmShippingMethodFactory _factory;

		public MailSender(IAmShippingMethodFactory factory)
		{
			_factory = factory;
		}

		public void SendToCustomer(Book book, Customer customer)
		{
			//some logic...
			var shippingStrategy = _factory.GetShippingMethodFor(customer);
			shippingStrategy.ShipToCustomer(book, customer);
		}
	}


	public class CompositionRoot
	{
		public MailSender Bootstrap()
		{
			var shippingMethods = new Dictionary<MembershipType, IAmShippingMethod>
			{
				{ MembershipType.Gold, new PriorityMail()},
				{ MembershipType.Regular, new RegularMail() }
			};
			return new MailSender(new ShippingMethodFactory(shippingMethods));
		}
	}


	public class Customer
	{
		private readonly MembershipType _membershipType;

		public Customer(MembershipType membershipType)
		{
			_membershipType = membershipType;
		}

		public bool Has(MembershipType membershipType)
		{
			return _membershipType.Equals(membershipType);
		}
	}


	public enum MembershipType
	{
		Gold,
		Regular
	}

Recently, while working on one of my projects, I asked myself: “What would the design have looked like if the client had not been forced to have (semantic) dependencies on both the factory and the strategy?”, and the idea of a factory decorator came up. In that case, the design would look something like this:

Factory Decorator

And the code would look something like this:

	// The factory now implements the Strategy interface itself and simply
	// delegates to the concrete strategy it resolves.
	public class ShippingMethodFactory : IAmShippingMethod
	{
		private readonly IDictionary<MembershipType, IAmShippingMethod> _shippingMethods;

		public ShippingMethodFactory(IDictionary<MembershipType, IAmShippingMethod> shippingMethods)
		{
			_shippingMethods = shippingMethods;
		}

		public void ShipToCustomer(Book book, Customer customer)
		{
			var membershipType = _shippingMethods.Keys.First(type => customer.Has(type));
			_shippingMethods[membershipType].ShipToCustomer(book, customer);
		}
	}


	public class MailSender
	{
		private readonly IAmShippingMethod _shippingMethod;

		public MailSender(IAmShippingMethod shippingMethod)
		{
			_shippingMethod = shippingMethod;
		}

		public void SendToCustomer(Book book, Customer customer)
		{
			//some logic...
			_shippingMethod.ShipToCustomer(book, customer);
		}
	}
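
For completeness, the composition root barely changes; the only difference is that MailSender now receives the factory through the IAmShippingMethod interface (a sketch based on the earlier Bootstrap method):

	public class CompositionRoot
	{
		public MailSender Bootstrap()
		{
			var shippingMethods = new Dictionary<MembershipType, IAmShippingMethod>
			{
				{ MembershipType.Gold, new PriorityMail() },
				{ MembershipType.Regular, new RegularMail() }
			};
			// The factory itself is an IAmShippingMethod, so the client
			// no longer knows a factory is involved at all.
			return new MailSender(new ShippingMethodFactory(shippingMethods));
		}
	}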

As for me, client code shouldn’t be bothered with the fact that the request is of a dynamic nature and that a factory is needed to produce an instance of the class representing the algorithm to apply for a given request. For this reason, with the factory decorator, client code communicates more in domain terms than in implementation-specific ones.
The code also aligns better with the Tell, Don’t Ask principle, which has positive repercussions on unit tests: we can get rid of stubs (in this case the factory stub) that return a canned concrete strategy, and the tests become a bit more readable.

On the other hand, a drawback of this approach is that the Factory has to implement the Strategy interface and thus conform to it. This can be misleading when reading the Factory class, since the names of its methods are the ones from the Strategy interface.
Another drawback is that, since the code is streamlined (client -> factory -> strategy), it can happen that the method signature of the Strategy interface has to contain parameter(s) that are not needed by the Strategy at all, but only by the Factory in order to instantiate the appropriate object. In the given example, imagine that the Factory doesn’t use the membership type information from the Customer class to decide which concrete strategy to instantiate:

 	var membershipType = _shippingMethods.Keys.First(type => customer.Has(type)); 

but instead bases this decision on the current day of the year (e.g. during holidays everyone gets books by priority mail, while on other days books are shipped by regular or priority mail). That date parameter would then have to be added to the Strategy interface, even though it is unrelated information from the strategies’ point of view.
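
In that hypothetical case, the interface would have to widen to carry the factory-only parameter, e.g.:

	public interface IAmShippingMethod
	{
		// shippingDate is needed only by the factory to pick a strategy;
		// the concrete strategies would have to ignore it.
		void ShipToCustomer(Book book, Customer customer, DateTime shippingDate);
	}
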
Thus, when trying to apply the Factory Decorator to a specific problem, these points need to be taken into account.

If I were to summarize this approach, I guess it would be: Factory Decorator trades interface readability for client transparency.