Caching: Why You Should Care!

What type of database should I use? What object-relational mapper? What deployment strategy? Sometimes even the user interface is discussed and designed before anyone writes the next killer application. Unfortunately, it is still quite common to forget that those killer applications will quickly need to reach a critical mass. These days a critical mass is not just a few hundred or a few thousand people; it is more like millions!

Providing an architecture to keep up with growth and to allow easy scaling has to include caching in one way or another. Caching has to be part of the initial design and requires the same amount of brainpower as the database discussion. Introducing a clean caching layer afterwards is often as complex as rewriting the system to feature it in the first place.

After realizing you’ll need caching at some point and that it's easier to introduce sooner rather than later, you'll wonder: What exactly is a cache?

A cache is commonly a component (internal or external to the application) used to store and provide fast access to portions of datasets. These portions are called hot data, because they are used often or have been used recently. There are two types of cached data. The first is information that would otherwise take a long time to calculate or process. The second originates from an underlying external resource. For the latter, caches speed up access times by avoiding slow query operations or high-latency round trips.

Caches are designed to respond quickly to simple requests, comparable to lookups in a map or dictionary (key-value pairs). Therefore, they often outperform typical general-purpose databases or other systems by an order of magnitude and answer in near real-time. In addition, most cache implementations offer read-through and write-through to the underlying data stores for transparent access.
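To make read-through and write-through a little more concrete, here is a minimal Python sketch. It is an illustration only and not the API of Hazelcast or any other product: the class name ReadWriteThroughCache, the load and store callbacks, and the plain dictionary standing in for the slow backing store are all assumptions made for this example.

```python
from typing import Any, Callable, Dict, Hashable


class ReadWriteThroughCache:
    """Hypothetical key-value cache in front of a slower backing store."""

    def __init__(
        self,
        load: Callable[[Hashable], Any],
        store: Callable[[Hashable, Any], None],
    ) -> None:
        self._load = load      # assumed callback: fetch a value from the backing store
        self._store = store    # assumed callback: persist a value to the backing store
        self._entries: Dict[Hashable, Any] = {}
        self.loads = 0         # counts how often the backing store was read

    def get(self, key: Hashable) -> Any:
        # Read-through: on a miss, the cache itself loads the value
        # from the backing store and keeps it for later requests.
        if key not in self._entries:
            self.loads += 1
            self._entries[key] = self._load(key)
        return self._entries[key]

    def put(self, key: Hashable, value: Any) -> None:
        # Write-through: write to the backing store first, then update
        # the cache, so both stay consistent.
        self._store(key, value)
        self._entries[key] = value


# A plain dict stands in for a database or remote service.
backing_store = {"user:1": "Ada"}
cache = ReadWriteThroughCache(
    load=backing_store.__getitem__,
    store=backing_store.__setitem__,
)

first = cache.get("user:1")    # miss: loaded from the backing store
second = cache.get("user:1")   # hit: served from the cache
cache.put("user:2", "Grace")   # written to the store and the cache
```

The caller only ever talks to the cache. Whether a value came from memory or from the backing store stays hidden, which is what "transparent access" means here.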

Caching First is an architectural design pattern which reminds architects to treat caching as a first-class citizen when designing new software architectures. However, building a good caching layer is not as easy as it sounds, as lots of architectural and use case factors play into design decisions.

Geographical Caches, Partial Caches, Distributed Caches and all of the other types of common caching solutions fulfill different use cases and solve different problems. Selecting the right caching strategy is only the beginning. Other things to consider include picking the best matching eviction algorithm (defining if and when elements are discarded, removed or updated) and deciding which topology (local or remote caches) will be used.
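As one example of an eviction algorithm, the sketch below implements least-recently-used (LRU) eviction on top of Python's collections.OrderedDict. LRU is just one common policy, picked here for illustration. The class name LRUCache and the capacity of two entries are assumptions for this example, not a recommendation.

```python
from collections import OrderedDict
from typing import Any, Hashable, List


class LRUCache:
    """Hypothetical fixed-size cache that evicts the least recently used entry."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._capacity = capacity
        self._entries: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get(self, key: Hashable, default: Any = None) -> Any:
        if key not in self._entries:
            return default
        # A read counts as a use: move the entry to the "most recent" end.
        self._entries.move_to_end(key)
        return self._entries[key]

    def put(self, key: Hashable, value: Any) -> None:
        self._entries[key] = value
        self._entries.move_to_end(key)
        # Over capacity: discard the entry at the "least recent" end.
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)

    def keys(self) -> List[Hashable]:
        # Keys ordered from least to most recently used.
        return list(self._entries)


cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")       # "a" is now the most recently used entry
cache.put("c", 3)    # the cache is full, so "b" is evicted
remaining = cache.keys()
```

Other policies, such as least-frequently-used or time-based expiry, change only the rule for which entry is discarded. The surrounding key-value interface stays the same.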

I wrote Hazelcast’s new Caching Strategies whitepaper, which provides a general overview of caching and its purpose, while offering deep insight into a number of caching strategies, their advantages and disadvantages, and when to apply them.