HCL Commerce 9.1 introduced a myriad of new technologies and features, and a brand-new caching framework to support them. The HCL Cache integrates with DynaCache and improves performance and scalability with features such as remote caching, real-time monitoring, and auto-tuning capabilities.
In this blog post, I will go over its key features and how they can help you exceed the performance goals of your site.
A Multi-Tier Caching Solution
Besides traditional in-memory cache configurations, the HCL Cache supports remote caching with Redis, as well as a combination of both.
Remote caching has multiple benefits. Because the size of local caches is constrained by JVM memory, storing the cache remotely in Redis allows for much larger caches, of hundreds of gigabytes (or more if Redis is clustered). With practically unlimited cache sizes, hit ratios (the percentage of cache requests that find pre-cached content) are much higher.
Remote caching also enables reuse: new containers brought online have access to all existing caches, which dramatically reduces warm-up periods and increases resilience to cache-clear operations. As each container no longer needs to build its own cache, new servers can be added while minimizing the added load on shared resources such as the database.
The main drawback of using a remote cache is I/O. As entries are stored outside the JVM, each cache operation (Get/Put/Invalidate) now involves the network, which increases the operation time (although operations should still complete in under 1 ms). But what if you could have the best of both worlds? With the default configuration, the HCL Cache enables both local and remote caching. Cache entries are added to and removed from both the local and remote caches, but they are fetched from the remote cache only if they are not found locally. Commonly used keys should often be found in the local cache, saving I/O time and reducing the load on the remote cache.
Besides the Local+Remote configuration, caches can be configured with only local, or only remote caching. Local-only caching can be used when the objects are not serializable (a requirement for remote caching) or under different tuning scenarios, such as for data that is unlikely to be reused by other containers. Although local caches are not stored remotely, Redis is still used to distribute invalidations (configurable).
With remote-only caching, containers do not keep local copies of the cache. This avoids the need to distribute invalidations and can be used for frequently updated data, or to ensure that a single copy of the object is maintained. For example, out of the box, remote-only caching is used for Precision Marketing and punch-out.
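To make the three modes concrete, here is a minimal Python sketch of the read and write paths described above. It is not the HCL Cache implementation or API: `TieredCache`, `Mode`, and the plain dict standing in for Redis are all hypothetical, and copying a remote hit into the local tier is an assumption of the sketch.

```python
from enum import Enum


class Mode(Enum):
    LOCAL_ONLY = "local"
    REMOTE_ONLY = "remote"
    LOCAL_AND_REMOTE = "local+remote"


class TieredCache:
    """Hypothetical two-tier cache; `remote` is a dict standing in for Redis."""

    def __init__(self, mode, remote):
        self.mode = mode
        self.local = {}          # per-container (JVM) tier
        self.remote = remote     # shared by every container in this sketch
        self.remote_gets = 0     # counts simulated network round trips

    def _local_on(self):
        return self.mode in (Mode.LOCAL_ONLY, Mode.LOCAL_AND_REMOTE)

    def _remote_on(self):
        return self.mode in (Mode.REMOTE_ONLY, Mode.LOCAL_AND_REMOTE)

    def put(self, key, value):
        # Writes go to every enabled tier.
        if self._local_on():
            self.local[key] = value
        if self._remote_on():
            self.remote[key] = value

    def get(self, key):
        # Local tier first; only a local miss costs a remote round trip.
        if self._local_on() and key in self.local:
            return self.local[key]
        if not self._remote_on():
            return None
        self.remote_gets += 1
        value = self.remote.get(key)
        if value is not None and self._local_on():
            self.local[key] = value  # assumption: remote hits warm the local tier
        return value

    def invalidate(self, key):
        # Invalidations are applied to every enabled tier.
        if self._local_on():
            self.local.pop(key, None)
        if self._remote_on():
            self.remote.pop(key, None)


redis_stand_in = {}
pod_a = TieredCache(Mode.LOCAL_AND_REMOTE, redis_stand_in)
pod_b = TieredCache(Mode.LOCAL_AND_REMOTE, redis_stand_in)
pod_a.put("product:1", "cached fragment")
pod_b.get("product:1")    # local miss, remote hit (one round trip)
pod_b.get("product:1")    # local hit, no network
print(pod_b.remote_gets)  # 1
```

Note that after `pod_a.invalidate("product:1")`, `pod_b` would still hold a stale local copy in this sketch; that gap is why invalidations must be distributed to the other containers whenever local caching is enabled.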
Redis as the Remote Cache Server
The HCL Cache uses Redis (https://redis.io/) as the remote cache server. Redis is a very popular open-source (BSD licensed) in-memory data store that is used as a database, cache or message broker. Due to its popularity, Redis is available with multiple deployment options and configurations.
Open-source Redis can be installed stand-alone or in Kubernetes (e.g., using the Helm charts from Bitnami). Most cloud providers offer compatible fully managed versions, and Redis Labs sells Redis Enterprise, which can be installed on-premises or used as a service.
Although the open-source versions are the most popular, keep in mind what is covered: with Commerce, we support the HCL Cache and its use of Redis, but for issues with Redis itself we offer only best-effort support. To get official support for the Redis runtime, you should consider a fully managed or paid option.
The HCL Cache uses Redisson (https://github.com/redisson/redisson), an open-source Redis client for Java. It supports all the popular topologies. The configuration of the connection is stored in the HCL Cache config maps (tenant-env-envType-hcl-cache-config) and it is fully customizable.
For performance reasons, each cache is mapped to a single hash slot, and therefore to a single node. This allows the client to perform cache operations with a single request without running into cross-slot errors. Most operations use Lua scripting for performance.
The easiest way to scale Redis cache servers, either for memory or for performance, is to use Redis Cluster and add nodes. Caches are distributed across the nodes according to each cache's calculated hash slot and the cluster's slot-to-node assignment.
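The single-slot mapping relies on a standard Redis Cluster rule: a key's hash slot is the CRC16 (XMODEM variant) of the key modulo 16384, and if the key contains a non-empty `{...}` hash tag, only the text inside the tag is hashed. The Python sketch below computes slots that way; the `{cache-...}` key names in the example are hypothetical and are not the HCL Cache's actual key format.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16-CCITT, XMODEM variant (poly 0x1021, init 0), as used by Redis Cluster."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def hash_slot(key: str) -> int:
    """Redis Cluster hash slot (0-16383), honoring {hash tags}."""
    raw = key.encode("utf-8")
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        if end > start + 1:  # non-empty tag: hash only the text inside the braces
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % 16384


# Hypothetical key names: the shared tag puts both keys of this cache in one slot.
print(hash_slot("{cache-baseCache}-key1") == hash_slot("{cache-baseCache}-key2"))  # True
print(hash_slot("123456789"))  # 12739
```

In a scheme like this, a Lua script that touches many keys of one cache never spans nodes. The trade-off is that a single large cache cannot be split across nodes, so adding nodes spreads whole caches rather than individual entries.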
With the multi-tier design of the HCL Cache, many get() requests are fulfilled by local caches. This not only improves performance and saves I/O, but also frees up Redis resources to perform other operations. Due to this, the hardware requirements for Redis should be quite “reasonable” (e.g., a single master or a three-node cluster), but of course this will vary from site to site depending on the load, the amount of cached data, and the hit ratios.
As the HCL Cache uses Redis to store not only cache keys (content) but also metadata such as dependency ID information, you should not remove keys manually, as doing so could impact the integrity of the cache.
Memory Management in Redis
Redis should be configured with the volatile-lru eviction policy, which evicts the least recently used keys among those that have an expiry set. Although Redis automatically removes expired keys, the HCL Cache also runs its own maintenance processes for dependency ID information.
Redis starts evicting keys when memory is full, but in our testing we found that even with LRU eviction in place, clients would still see memory errors. For this reason, we recommend avoiding memory-full scenarios by allocating more memory or by tuning the expiry times of the different caches.
The HCL Cache also has a mechanism to avoid memory errors called “LowMemoryMaintenance”, which automatically removes the entries that are closest to expiring. It can also disable “put” requests when memory is completely allocated.
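The idea behind “LowMemoryMaintenance” can be illustrated with a small Python sketch: once usage crosses a threshold, remove the entries closest to expiring until usage drops to a target, and reject puts that would not fit. The class name, thresholds, and abstract size units are hypothetical; this is not how the HCL Cache implements the mechanism against Redis.

```python
import heapq


class LowMemoryGuard:
    """Hypothetical sketch of expiry-ordered cleanup under memory pressure.

    Sizes are abstract units and thresholds are illustrative, not HCL Cache defaults.
    """

    def __init__(self, capacity, cleanup_above=0.9, cleanup_to=0.8):
        self.capacity = capacity
        self.cleanup_above = cleanup_above  # start cleanup above 90% usage
        self.cleanup_to = cleanup_to        # stop once usage is back at 80%
        self.entries = {}                   # key -> (expires_at, size)
        self.used = 0

    def put(self, key, size, expires_at):
        if key in self.entries:
            self._remove(key)
        if self.used + size > self.capacity:
            return False  # memory fully allocated: the put is rejected
        self.entries[key] = (expires_at, size)
        self.used += size
        if self.used > self.cleanup_above * self.capacity:
            self._remove_soonest_to_expire()
        return True

    def _remove(self, key):
        _, size = self.entries.pop(key)
        self.used -= size

    def _remove_soonest_to_expire(self):
        # Min-heap ordered by expiry time: entries that would expire first go first.
        heap = [(expires_at, key) for key, (expires_at, _) in self.entries.items()]
        heapq.heapify(heap)
        while heap and self.used > self.cleanup_to * self.capacity:
            _, key = heapq.heappop(heap)
            self._remove(key)


guard = LowMemoryGuard(capacity=100)
guard.put("a", 30, expires_at=10)
guard.put("b", 30, expires_at=50)
guard.put("c", 25, expires_at=20)
guard.put("d", 10, expires_at=40)         # usage 95 > 90: "a" (soonest expiry) is removed
print(sorted(guard.entries), guard.used)  # ['b', 'c', 'd'] 65
print(guard.put("e", 40, expires_at=60))  # False: 65 + 40 exceeds capacity
```

Removing soon-to-expire entries first sacrifices the data with the least remaining value, which is why this ordering is a reasonable complement to keeping Redis away from its memory limit.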
HCL Cache and DynaCache
To support existing code (and your customizations), the HCL Cache is installed as a DynaCache cache provider. This enables the HCL Cache to be used with servlet caching (cachespec.xml) and allows Java code to use the new caches through the DynaCache APIs (no code changes are required).
All out-of-the-box and new caches are configured with the HCL Cache provider by default. In the Transaction Server, this is done with run-engine commands; in Liberty, it is done in the cache definition in server.xml. It is still possible to change cache providers, but it is not recommended unless, for example, you need to continue using WebSphere eXtreme Scale. If you are using Elasticsearch, the HCL Cache with Redis is the only supported configuration.
Cache sizes (in number of entries and in megabytes) are configured the same way as before. The HCL Cache reads the values from the WebSphere configuration. Other HCL Cache-specific settings, such as local and remote configurations, or the Redis connection information, are stored in the HCL Cache config maps (tenant-env-envType-hcl-cache-config) which are mounted as files under the /SETUP/hcl-cache directory.
The WebSphere Cache Monitor continues to work, but with the HCL Cache’s REST services and support for Prometheus/Grafana real-time monitoring, you should have few reasons to use it. Keep in mind that although the Cache Monitor only shows local cache entries, invalidate() and clear() operations are also performed on the remote cache, and invalidation messages are propagated to other containers.
Another thing worth highlighting is that HCL Caches do not support disk off-load. Remote caching with Redis should be used instead.
Built-in Invalidation Support
Up to Commerce V8 we used DynaCache/DRS to distribute invalidations. This ability was lost in V9 when we dropped support for the WAS Deployment Manager that is required for replication domains used by DRS. Kafka was introduced in Commerce 9.0 to fill this gap.
If the HCL Cache is used without Redis, Kafka is required for invalidations (as in V9.0). When Redis is used, the HCL Cache handles invalidations automatically and internally, similar to how DRS works: invalidations are distributed using the Redis Pub/Sub capability.
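A minimal single-process Python sketch of Pub/Sub invalidation follows. `Broker` stands in for Redis PUBLISH/SUBSCRIBE, and the `Container` class, channel name, and message format are hypothetical; the point is only that each container subscribes to a channel and drops its local copy when another container publishes an invalidation.

```python
from collections import defaultdict


class Broker:
    """Single-process stand-in for Redis PUBLISH/SUBSCRIBE (hypothetical)."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, channel, callback):
        self.subscribers[channel].append(callback)

    def publish(self, channel, message):
        for callback in self.subscribers[channel]:
            callback(message)
        return len(self.subscribers[channel])  # like PUBLISH, report receiver count


class Container:
    """A pod with a local cache that listens for invalidations (hypothetical)."""

    def __init__(self, name, broker, channel="invalidations:baseCache"):
        self.name = name
        self.broker = broker
        self.channel = channel  # hypothetical channel name
        self.local = {}
        broker.subscribe(channel, self._on_invalidation)

    def _on_invalidation(self, message):
        origin, key = message
        if origin != self.name:  # the sender has already cleared its own copy
            self.local.pop(key, None)

    def invalidate(self, key):
        self.local.pop(key, None)
        self.broker.publish(self.channel, (self.name, key))


broker = Broker()
pod_a = Container("pod-a", broker)
pod_b = Container("pod-b", broker)
pod_a.local["product:1"] = pod_b.local["product:1"] = "cached fragment"
pod_a.invalidate("product:1")
print("product:1" in pod_b.local)  # False
```

One design point worth keeping in mind: Redis Pub/Sub is fire-and-forget, so a container that is disconnected when a message is published never receives it. That is consistent with the HCL Cache shortening local TTLs while Redis is unavailable.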
Auto-Tuning Local Cache Sizes (9.1.4+)
Tuning of cache sizes is a critical activity. Under-sized caches lead to degraded performance, and over-sized caches cause memory and stability issues.
Things get easier with the HCL Cache: not only does it offer monitoring to track cache usage in real time, but it also incorporates intelligent auto-tuning that monitors Java heap garbage collection activity and adjusts cache sizes to match memory conditions.
When a cache reaches its maximum size and there is enough free memory, the cache can (by default) grow beyond its configured maximum, up to 4x the original size, before LRU processing starts. Conversely, if the module detects that memory is limited, it can restrict the cache to a size smaller than its original configuration.
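The growth and shrink behavior can be sketched in Python as a function that computes the next size limit for a cache. Only the 4x ceiling comes from the text; the step sizes and free-heap thresholds are hypothetical, and using the post-GC free-heap ratio as the memory signal is a simplification of the garbage collection monitoring the HCL Cache performs.

```python
MAX_GROWTH_FACTOR = 4.0     # from the text: up to 4x the configured size
GROWTH_STEP = 1.25          # hypothetical: grow 25% per adjustment
SHRINK_STEP = 0.5           # hypothetical: halve the limit under memory pressure
HEALTHY_FREE_RATIO = 0.40   # hypothetical post-GC free-heap ratio that allows growth
LOW_FREE_RATIO = 0.15       # hypothetical post-GC free-heap ratio that forces shrinking


def next_limit(configured, current_limit, cache_is_full, free_heap_ratio):
    """Return the cache's next maximum entry count (hypothetical auto-tuning rule)."""
    if free_heap_ratio < LOW_FREE_RATIO:
        # Memory is tight: the limit may drop below the configured size.
        return max(1, int(current_limit * SHRINK_STEP))
    if cache_is_full and free_heap_ratio >= HEALTHY_FREE_RATIO:
        # Full cache and plenty of heap: grow instead of starting LRU eviction.
        return min(int(configured * MAX_GROWTH_FACTOR), int(current_limit * GROWTH_STEP))
    return current_limit


limit = 1000
for _ in range(10):
    limit = next_limit(1000, limit, cache_is_full=True, free_heap_ratio=0.6)
print(limit)  # 4000: growth stops at 4x
print(next_limit(1000, limit, cache_is_full=False, free_heap_ratio=0.1))  # 2000
```

Growing in steps rather than jumping straight to the ceiling gives the memory signal a chance to react between adjustments; when the cache is full but memory is only moderately free, the limit stays put and normal LRU eviction applies.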
This screen capture from a Grafana dashboard (HCL Cache – Local Cache Details) shows a cache that was allowed to grow over its configured maximum size:
High Availability Configurations
Although Redis is mature and stable, its high availability needs to be considered, just as for any other component in an enterprise system.
If you deploy with a cloud-managed solution such as Google Cloud Memorystore or Amazon ElastiCache, the provider takes ownership of availability and maintenance. Managed solutions offer service tiers with different performance and availability options. If you run Redis yourself (e.g., in Kubernetes), Redis supports replicas that take over if the master crashes. It also has persistence options (RDB/AOF) that back up the in-memory data to disk so it can be reloaded after a crash (although persistence can slow down Redis).
On the Commerce container side, the HCL Cache implements circuit breakers that support both standalone and clustered Redis. While a circuit breaker is on, remote cache operations are not attempted. The framework allows retries after a minute (configurable).
Local caches play a key role in high availability. If Redis becomes unavailable, the pods can continue to use their local caches, which helps maintain a level of performance. During an outage, we limit the local caching TTL to 5 minutes (configurable at the cache level), because with Redis down, invalidations are not replicated across containers. Remote-only caches such as DM_UserCache (think of the Marketing “Recently viewed” feature) are unavailable during a Redis outage.
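The circuit breaker and the outage TTL cap described above can be sketched in Python as follows. The 60-second retry interval and 5-minute outage TTL mirror the defaults mentioned in the text; the failure threshold, the class and function names, and the injectable clock are hypothetical, and this is not the HCL Cache's internal code.

```python
import time

OUTAGE_LOCAL_TTL = 300  # seconds: 5-minute cap on local TTLs while Redis is down


class CircuitBreaker:
    """Hypothetical breaker: turns on after consecutive failures, retries later."""

    def __init__(self, failure_threshold=3, retry_after=60.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # hypothetical value
        self.retry_after = retry_after              # one minute, as in the text
        self.clock = clock
        self.failures = 0
        self.opened_at = None                       # None means the breaker is off

    @property
    def is_on(self):
        return self.opened_at is not None

    def allow_request(self):
        # While on, skip remote calls until the retry interval has elapsed.
        return not self.is_on or self.clock() - self.opened_at >= self.retry_after

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()  # (re)start the retry timer


def remote_get(breaker, remote_call, key):
    """Attempt a remote read unless the breaker is on; None means 'fall back to local'."""
    if not breaker.allow_request():
        return None
    try:
        value = remote_call(key)
    except ConnectionError:
        breaker.record_failure()
        return None
    breaker.record_success()
    return value


def local_ttl(requested_ttl, breaker):
    """Cap local TTLs during an outage, since invalidations cannot be replicated."""
    return min(requested_ttl, OUTAGE_LOCAL_TTL) if breaker.is_on else requested_ttl


now = [0.0]  # fake clock for the example
breaker = CircuitBreaker(failure_threshold=2, clock=lambda: now[0])


def redis_down(key):
    raise ConnectionError("Redis unavailable")


remote_get(breaker, redis_down, "k")
remote_get(breaker, redis_down, "k")
print(breaker.is_on, local_ttl(3600, breaker))  # True 300
now[0] = 61.0
print(remote_get(breaker, lambda key: "v", "k"), breaker.is_on)  # v False
```

A failed retry restarts the timer, so a prolonged outage costs at most one remote attempt per retry interval instead of one per cache operation.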
Finally, the Redisson client, which the HCL Cache uses to connect to Redis, offers several tunable timeouts. These settings can be updated in the HCL Cache config maps (tenant-env-envType-hcl-cache-config).
Monitoring and Alerting with Prometheus and Grafana
An enterprise caching solution needs a real-time monitoring infrastructure to support it. That is why, with HCL Commerce 9.1, we released Prometheus monitoring and Grafana dashboards.
We publish the Grafana dashboards we develop for internal cache tuning, and they can be used as-is, or customized for your own testing, monitoring, or alerting. The dashboards include details for remote operations (HCL Cache – Remote), such as cache sizes, response times (average and percentiles), operations by cache, details on invalidations, state of circuit breakers and more. We also publish dashboards for local caches (Details and Summary), which include usage details (by entries and MB), operations and removals (e.g., expiry, LRU) and more.
If you use a different monitoring solution, you do not necessarily need to change it. Most commercial monitoring frameworks include options to either scrape the Commerce metrics endpoints directly or import the metrics from the Prometheus database. This makes all the metrics available, so you can create dashboards in the tooling of your choice.
HCL Cache Manager (9.1.3+)
The Cache Manager pod enables you to interface with the HCL Cache using REST. This is useful for operations such as cache clears and invalidates, and it also provides APIs for debugging and monitoring the cache. The metric for remote cache sizes, for example, is reported by the Cache Manager and not the Commerce pods.
Now it's your turn to try it!
If you have been using DynaCache for a while, this must feel like Christmas! The HCL Cache is enabled by default in v9.1 and is required if you are using the new Elasticsearch-based search. You just need to decide which flavor of Redis to use and enable Prometheus for monitoring, and you are ready to start taking advantage of the new cache to improve the performance of your site.