Hazelcast - a distributed in-memory data grid (IMDG) platform for processing and storing data efficiently and reliably.
Defaults
Distributes data partitions (both primary and backup replicas) across all cluster members
Backup counts are ignored in a single-instance setup
Default number of partitions is 271
Automatically creates backups at the partition level, distributed across all cluster members (the backup count is configurable)
Members have core count * 20 threads that handle requests
A single replica of each partition is created by default
One of these replicas is called the primary; the others are backups
Partition table information is sent and updated across the members every 15 seconds by default; this can be configured with the following property:
hazelcast.partition.table.send.interval
For maps and queues
one synchronous backup, zero asynchronous backups
BINARY in-memory format
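A minimal sketch of how these defaults could be tuned programmatically, assuming Hazelcast 5.x and the property names listed above (the values shown are the defaults):

    import com.hazelcast.config.Config;
    import com.hazelcast.core.Hazelcast;
    import com.hazelcast.core.HazelcastInstance;

    public class DefaultTuningSketch {
        public static void main(String[] args) {
            Config config = new Config();
            // Partition count (default 271)
            config.setProperty("hazelcast.partition.count", "271");
            // Interval in seconds for sending/updating the partition table (default 15)
            config.setProperty("hazelcast.partition.table.send.interval", "15");
            HazelcastInstance hz = Hazelcast.newHazelcastInstance(config);
        }
    }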
Hazelcast Topologies
There are two available Hazelcast topologies: Embedded and Client/Server.
CPS/NCMP uses the embedded topology, which will be the main focus for the rest of this documentation.
Scaling the application instances in embedded mode also scales the Hazelcast cluster
The application instance and Hazelcast share the same JVM and resources.
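A minimal sketch contrasting the two topologies, assuming Hazelcast 5.x; the cluster name used here is hypothetical:

    import com.hazelcast.client.HazelcastClient;
    import com.hazelcast.client.config.ClientConfig;
    import com.hazelcast.config.Config;
    import com.hazelcast.core.Hazelcast;
    import com.hazelcast.core.HazelcastInstance;

    public class TopologySketch {
        public static void main(String[] args) {
            // Embedded: the member runs inside the application JVM, so every
            // application instance that starts also joins the cluster as a member.
            Config memberConfig = new Config();
            memberConfig.setClusterName("example-cluster"); // hypothetical name
            HazelcastInstance embeddedMember = Hazelcast.newHazelcastInstance(memberConfig);

            // Client/Server: the application connects as a client to an
            // externally managed Hazelcast cluster (not used by CPS/NCMP).
            ClientConfig clientConfig = new ClientConfig();
            clientConfig.setClusterName("example-cluster");
            HazelcastInstance client = HazelcastClient.newHazelcastClient(clientConfig);
        }
    }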
Hazelcast as an AP or CP system
Hazelcast as an AP system provides Availability and Partition Tolerance
Data structures under HazelcastInstance API are all AP data structures
CPS/NCMP uses AP data structures
No master node in an AP cluster
When a member holding a partition fails, that partition's data is redistributed to the other members
Hazelcast as a CP system provides Consistency and Partition Tolerance
Data structures under the HazelcastInstance.getCPSubsystem() API are CP data structures
CP Subsystem
Currently, CP Subsystem contains only the implementations of Hazelcast's concurrency APIs.
Provides mechanisms for leader election, distributed locking, synchronization, and metadata management.
Prioritizes consistency, which means it may halt operations on some members
CPMap is only available in the Enterprise Edition
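A minimal sketch of the API difference between AP and CP data structures, assuming Hazelcast 5.x; the structure names are hypothetical:

    import com.hazelcast.core.HazelcastInstance;
    import com.hazelcast.cp.IAtomicLong;
    import com.hazelcast.map.IMap;

    public class ApVsCpSketch {
        static void accessStructures(HazelcastInstance hazelcastInstance) {
            // AP data structure: obtained directly from the HazelcastInstance API
            IMap<String, String> apMap = hazelcastInstance.getMap("exampleMap");

            // CP data structure: obtained via the CP Subsystem API
            IAtomicLong cpCounter = hazelcastInstance.getCPSubsystem().getAtomicLong("exampleCounter");
        }
    }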
Hazelcast Distributed Data Structures
Standard Collections (AP)
Maps (backup counts are configurable per map; see the configuration sketch after this list)
synchronous backup
a map operation waits until the update has completed on the primary and all synchronous backups before it continues
asynchronous backup
a map operation continues once the primary is updated; backups are updated in the background
Queues
Lists
Sets
Ringbuffers
Concurrency Utilities
AtomicLongs
AtomicReferences
Semaphores
CountdownLatches
Publish/Subscribe Messaging
CP Subsystem
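As noted under Maps above, a minimal sketch of configuring backup counts and the in-memory format for a map, assuming Hazelcast 5.x; the map name is hypothetical and the values shown are the stated defaults:

    import com.hazelcast.config.Config;
    import com.hazelcast.config.InMemoryFormat;
    import com.hazelcast.config.MapConfig;

    public class MapBackupConfigSketch {
        static Config buildConfig() {
            MapConfig mapConfig = new MapConfig("exampleMap");  // hypothetical map name
            mapConfig.setBackupCount(1);                        // synchronous backups (default 1)
            mapConfig.setAsyncBackupCount(0);                   // asynchronous backups (default 0)
            mapConfig.setInMemoryFormat(InMemoryFormat.BINARY); // default in-memory format
            return new Config().addMapConfig(mapConfig);
        }
    }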
Hazelcast in CPS/NCMP
Multiple instances of Hazelcast are created in the application.
Each instance is a member and/or client in a Hazelcast cluster.
Multiple instances
Increased complexity
Resource overhead
Each Hazelcast instance needs its own memory, CPU and network resources
Each instance has its own memory allocation within the JVM heap
Increased network traffic
synchronization
processing
Data inconsistency is a risk
Cluster Sizing
Cluster size depends on multiple application-specific factors
Adding more members to the cluster
Increases capacity for CPU-bound computation
Use of the Jet stream processing engine helps with this
Increases the amount of data that needs to be replicated
Can improve fault tolerance or performance (if the recommended minimum of 1 member per application instance is not enough)
Fault tolerance recommendation
At least 3 members or n+1
See Hazelcast's Cluster Sizing
Tips
One instance of Hazelcast per application instance
though for fault tolerance, at least 3 members might be needed
Use map.set() on maps instead of map.put() if you don’t need the old value.
This eliminates unnecessary deserialization of the old value.
map.putIfAbsent() - inserts only if key doesn't exist
map.set() - inserts or updates unconditionally
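A minimal sketch contrasting the three operations, assuming Hazelcast's IMap API:

    import com.hazelcast.map.IMap;

    public class MapWriteSketch {
        static void writeExamples(IMap<String, String> map) {
            map.put("key", "v1");         // returns the previous value, so it may deserialize it
            map.set("key", "v2");         // inserts or updates; no old value is returned or deserialized
            map.putIfAbsent("key", "v3"); // inserts only if the key is not already present
        }
    }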
Consider use of an entry processor for updating map entries in bulk (instead of get-then-set)
Changing to OBJECT in-memory format for specific maps
applies the entry processor directly to the stored object
useful for queries
No serialization/deserialization is performed
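A minimal sketch of an entry processor that updates entries in place on the owning members, assuming Hazelcast 5.x and a hypothetical counters map; combined with OBJECT in-memory format for that map, the processor works on the stored object directly:

    import com.hazelcast.map.EntryProcessor;
    import com.hazelcast.map.IMap;

    public class EntryProcessorSketch {
        static void incrementAll(IMap<String, Long> counters) {
            // Runs on the members that own the partitions, avoiding get-then-set round trips
            counters.executeOnEntries((EntryProcessor<String, Long, Long>) entry -> {
                long current = entry.getValue() == null ? 0L : entry.getValue();
                long updated = current + 1;
                entry.setValue(updated);
                return updated;
            });
        }
    }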
Enabling in-memory backup reads
Allows a member to read from its own backup replica instead of calling the partition owner (the primary copy)
Can be useful to reduce latency and improve performance
Requires backups to be synchronous
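A minimal configuration sketch for enabling backup reads on a specific map, assuming Hazelcast 5.x; the map name is hypothetical:

    import com.hazelcast.config.Config;
    import com.hazelcast.config.MapConfig;

    public class BackupReadSketch {
        static Config buildConfig() {
            MapConfig mapConfig = new MapConfig("exampleMap"); // hypothetical map name
            // Let a member serve reads from its local backup replica instead of
            // always calling the partition owner; backups should be synchronous
            // so the backup data is not stale.
            mapConfig.setReadBackupData(true);
            return new Config().addMapConfig(mapConfig);
        }
    }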
Back pressure mechanism in Hazelcast
acts like a traffic controller
helps prevent system overload
helps prevent crashes due to excessive operation invocations
Limits the number of concurrent operations and periodically syncs async backups
Needs to be configured as it is disabled by default
configure it to be enabled
configure the maximum concurrent invocations per partition
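A minimal sketch of enabling back pressure via system properties, assuming Hazelcast 5.x; the invocation limit value is illustrative:

    import com.hazelcast.config.Config;

    public class BackPressureSketch {
        static Config buildConfig() {
            Config config = new Config();
            // Back pressure is disabled by default
            config.setProperty("hazelcast.backpressure.enabled", "true");
            // Cap on concurrent invocations per partition (illustrative value)
            config.setProperty("hazelcast.backpressure.max.concurrent.invocations.per.partition", "100");
            return config;
        }
    }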
Controlling/Configuring partitions
Putting data structures in one partition
reduces network traffic
improves data access performance
Simplifies data management
Using custom partitioning strategies based on requirements (PartitioningStrategy)
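A minimal sketch of one way to co-locate related entries, assuming Hazelcast's PartitionAware interface; the key class and its fields are hypothetical:

    import com.hazelcast.partition.PartitionAware;
    import java.io.Serializable;
    import java.util.Objects;

    // Keys returning the same partition key land in the same partition,
    // keeping related entries together and reducing network hops.
    public class OrderKey implements PartitionAware<String>, Serializable {
        private final String orderId;
        private final String customerId;

        public OrderKey(String orderId, String customerId) {
            this.orderId = orderId;
            this.customerId = customerId;
        }

        @Override
        public String getPartitionKey() {
            return customerId; // co-locate all orders of the same customer
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof OrderKey)) return false;
            OrderKey other = (OrderKey) o;
            return orderId.equals(other.orderId) && customerId.equals(other.customerId);
        }

        @Override
        public int hashCode() {
            return Objects.hash(orderId, customerId);
        }
    }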
Reading Map Metrics
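A minimal sketch of reading per-member map statistics, assuming Hazelcast's LocalMapStats API; note the numbers cover only entries owned by or backed up on the local member:

    import com.hazelcast.map.IMap;
    import com.hazelcast.map.LocalMapStats;

    public class MapMetricsSketch {
        static void logMapMetrics(IMap<String, String> map) {
            LocalMapStats stats = map.getLocalMapStats();
            System.out.println("owned entries:  " + stats.getOwnedEntryCount());
            System.out.println("backup entries: " + stats.getBackupEntryCount());
            System.out.println("hits:           " + stats.getHits());
            System.out.println("get operations: " + stats.getGetOperationCount());
            System.out.println("put operations: " + stats.getPutOperationCount());
        }
    }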