/
Hazelcast Overview with CPS/NCMP

Hazelcast Overview with CPS/NCMP

Hazelcast - distributed in-memory data grid (IMDG) platform, reliable for processing and storing data efficiently.

Defaults

  • Distributes data, partitions (both primary and backup replicas) across all cluster members

    • Back up counts are ignored in a single instance setup

    • Default number of partitions 271

  • Automatically creates backups at partition level which are distributed across all cluster members (back up count is configurable)

  • Members have core count * 20 threads that handle requests

  • Single replica of each partition is created

    • One of these replica is called primary and others are backups

    • Partition table information is sent and update across the members by default every 15 seconds, this can be configured with the ff:

      • hazelcast.partition.table.send.interval

  • For maps and queues

    • one synchronous back up, zero asynchronous back up

    • Binary in memory format

 

Hazelcast Topologies

There are two available Hazelcast Topologies (Embedded, Client/Server).

CPS/NCMP uses Embedded technology and will be the main focus for the rest of this documentation.

  • Scaling application instance in embedded mode also scales the hazelcast cluster

  • Application instance and hazelcast shares the same JVM and resources.

 

Hazelcast as a AP or CP system

Hazelcast as an AP system provides Availability and Partition Tolerance

  • Data structures under HazelcastInstance API are all AP data structures

    • CPS/NCMP uses AP data structures 

  • No master node in an AP cluster

  • When a partition fails, its data is redistributed to other members

Hazelcast as a CP system provides Consistency and Partition Tolerance

  • Data structures under HazelcastInstance.getCPSubsytem() API are CP data structures

 

CP Subsystem

  • Currently, CP Subsystem contains only the implementations of Hazelcast's concurrency APIs.

  • Provides mechanisms for leader election, distributed locking, synchronization, and metadata management.

  • Prioritizes consistency which means it may halt operations on some members

  • CPMap only available in enterprise edition

 

Hazelcast Distributed Data Structures

Standard Collections (AP)

  • Maps

    • synchronous backup

      • changes to a map waits to finish update on the primary and all other backups before operation continues

    • asynchronous backup

      • changes to a map does not wait  to finish update on the primary and all other backups before operation continues

  • Queues

  • Lists

  • Sets

  • Ringbuffers

Concurrency Utilities

  • AtomicLongs

  • AtomicReferences

  • Semaphores

  • CountdownLatches

Publish/Subscribe Messaging

CP Subsystem

 

 

Hazelcast in CPS/NCMP

Multiple instances of hazelcast created in application.

Each instance is a member and/or client in a Hazelcast cluster.

 

Multiple instances

  • Creates complexity

  • Resource overhead

    • Each hazelcast instance needs its own memory, CPU and network resources

    • Each instance has its own memory allocation within the JVM heap

  • Increase in network traffic

    • synchronization

    • processing

  • Data inconsistency is a risk

 

Cluster Sizing 

  • Cluster size depends on multiple factors for application

  • Adding more members to the cluster

    • Capacity for CPU-bound computation rises

    • Use of Jet stream processing engine helps this

    • Adding members increases the amount of data that needs to be replicated.

    • Can improve fault tolerance or performance (if minimum recommended (1 per application instance) is not enough)

  • Fault tolerance recommendation

    • At least 3 members or n+1

See Hazelcast's Cluster Sizing

 

Tips

  • One instance of hazelcast per application instance

    • though for fault tolerance, we might need 3 ?

  • Use map.set() on maps instead of map.put() if you don’t need the old value.

    • This eliminates unnecessary deserialization of the old value.

    • map.putIfAbsent() - inserts only if key doesn't exist

    • map.set() - inserts or updates unconditionally

  • Consider use of Entry processor of updating map entries in bulk (i.e. get then set)

  • Changing to OBJECT In-Memory format for specific maps

    • applies entry processor directly on the object

    • useful for queries

    • No serialization/deserialization is performed

  • Enabling in-memory back up reads

    • Allowing member to check its own back up instead of calling the owner of the partition or the primary copy

    • Can be useful to reduce latency and improve performance

    • Needs back ups to be synchronised

  • Back Pressure mechanism in hazelcast

    • acts like a traffic controller 

    • helps prevent system overload

    • helps prevent crashes due to excessive operation invocations

    • Limits the number of concurrent operations and periodically syncs async backups

    • Needs to be configured as it is disabled by default

      • configure to be enabled

      • configure max concurrent invocations per partition

  • Controlling/Configuring partitions

    • Putting data structures in one partition

      • reduces network traffic

      • improves data access performance

      • Simplifies data management

    • Using custom partition strategies based on requirements (Partitioning strategy)

  • Reading Map Metrics

 

Backups in Hazelcast

  • Hazelcast does not support backup redistribution, meaning if the back up configuration of a running cluster is changed then the existing backups are unaffected. The cluster needs to be restarted to take effect of the new configuration.

    • For example: The following cluster is configured to have the default backup count of 1, when the backup count is increased to 2, no new backups will be created. If the backup count is decreased to 0, A2 will still exist but may not be valid.

Member 1

Member 2

Member 1

Member 2

Data: A1

Data: B1

Backup: A2

Backup: B2

2 types of data backups

By default, hazelcast structures are configured to have 1 synchronous back up and 0 asynchronous back up.

Synchronous backups

  • operations are blocked until all backups are copied

  • default count is 1

  • sets by “.setBackupCount(1)”

Asynchronous (async) backups

  • operations proceed without waiting for backup completion

  • best for performance over immediate consistency

  • sets by “.setAsyncBackupCount(1)”

 

Different scenarios

Scenario

Data Availability

Performance

 

Scenario

Data Availability

Performance

 

Synchronous Backup Count = 1

Async Backup Count = 0

As main data fails, the back up takes over immediately

Slower write operations as it awaits for sync back up to complete

 

Synchronous Backup Count = 0

Async Backup Count = 1

As main data fails, the async back up can take over but may be stale

Faster write due to no wait time on sync backup, but if there's failure data might be lost

 

Synchronous Backup Count = 0

Async Backup Count = 0

If main data fails, all data is lost

Fastest write as there’s no backups to manage

most likely solution

Synchronous Backup Count = 1

Async Backup Count = 1

Immediate back up when main data fails, async back up provides extra security

Slower writes but more data integrity and redundancy

 

 

Proposal scenario for CPS-NCMP:

Synchronous backup count = 0

Async backup count = 1

  • allows one backup count to be created asynchronously, meaning it won’t block operations while the backup is being created

Map Backups

  • Memory needed for map is “M + B(M)” wherein M is the memory used by primary data and B(M) is the number of backups

  • Enabling In-Memory backup reads

    • allows local members to access data from their own backups than relying on the primary owner of data.

    • sets by “.setReadBackupData(true)”

Scenario with in memory backup reads enabled:

  1. Local backup read attempt → 1a. Local backup contains data → 1b. Returns data

  2. Local back up read attempt → 2a. Local backup does contain data → 2b. Fetches data from primary source → 2c. Returns data

 

Hazelcast data structures overview

  • Hazelcast data structures have options for using synchronous operations or asynchronous operations.

  • Synchronous operations blocks the calling thread until the operation is completed.

  • Asynchronous operations allows the calling thread to continue executing while the operation is processed in the background.

 

Here are some of the operation examples…

IMap

Synchronous operation

Asynchronous (async) operation

Synchronous operation

Asynchronous (async) operation

put(_)

putAsync(_,_)

remove(_)

removeAsync(_)

get(_)

getAsync(_,_)

set(_,_)

setAsync(_,_)

IQueue

Synchronous operation

Asynchronous (async) operation

Synchronous operation

Asynchronous (async) operation

add(_)

addAsync(_,_)

remove(_)

removeAsync(_)

poll(_)

pollAsync(_,_)

ISet

Synchronous operation

Asynchronous (async) operation

Synchronous operation

Asynchronous (async) operation

add(_)

addAsync(_)

remove(_)

removeAsync(_)

contains(_)

containsAsync(_)

 

CPS-NCMP Hazelcast data structures

 

Current Description

Proposal changes

persistence

Done

moduleSyncWorkQueue

holds data nodes awaiting sync process

  • BlockingQueue<DataNode>

  • Change to IQueue<DataNode>

    • distributed queue implementation in Hazelcast

no

 

moduleSyncStartedOnCmHandles

used as a ‘progress map’ for cm handles as it starts module sync

  • Map<String,Object>

    • e.g. entry : [cmHandle1:'started']

  • Change to use of ISet<String>

    • ensures unique entries .. removes use of .putIfAbsent

no

 

dataSyncSemaphores

holds cmhandleIDs with sync state values to signify either data sync done or in progress

  • IMap<String, Boolean>

  • Configuration change from synchronousBackupCount = 0 and asynchronousBackupCount = 1 gives resilience without blocking operations

no

 

trustLevelPerCmHandle

~upon registration of cmhandles to map, unless specified default value is assigned as TrustLevel.Complete

~upon updating dmiTrustLevel, this map is used to retrieve cm handle trustlevel of affected cmhandles

  • Change to use of IMap<> and .putAsync

  • on update of dmiTrustLevel - the value returned from this map is only for notifications

    • does not entirely need to be from shared cache and can have default values (similar to use of .getOrDefault())

maybe

 

trustLevelPerDmiPlugin

 

  • might not need to be shared

no

 

cmNotificationSubscriptionCache

 

  • Map

  • use of .put() on Map

  • Change to IMap to allow use of hazelcast async operations i.e. .putAsync()

  • benefits from configuration of InMemoryFormat.Object

no

 

 

TrustLevel Data structure Overview

Map.TrustLevelPerCmHandle

How is it filled?
  • CmHandleRegistrationService calls TrustLevelManager

  1. Call TrustLevelManager registerCmHandles

  2. Check if Map.TrustLevelPerCmHandle contains cm handle

  3. Put cmhandle in Map.TrustLevelPerCmHandle if it doesnt exist

    1. with TrustLevel.Complete if no initial trust level stated

  4. SendAvcNotificationEvent if initial trust level is NONE

Proposed changes

N/A

 

 

Questions
  • When initial trust level is not stated , why not also send event?

    • for instance update

Notes
  • cmhandles are spread between two instances (batch)

  • trustLevel is not persisted

  • in initial cm handle registration store cmhandle with trustLevel.NONE

    • see performance for this with/without hazelcast

    • for cmhandle search

      • near cache

How is it updated?
  • DeviceTrustLevelConsumer

  1. Gets event update from DMI wherein cmhandle ID and new cmhandle trust level is data

  2. call TrustLevelManager updateCmHandleTrustLevel

 

  • TrustLevelManager

  1. Call updateCmHandleTrustLevel

  2. Get DmiServiceName for given cmHandleId

  3. Get DmiTrustLevel from Map.TrustLevelPerDmiPlugin

  4. Get old cmhandle trustLevel from Map.TrustLevelPerCmHandle

  5. Put new cmhandle trustLevel from Map.TrustLevelPerCmHandle

  6. Get effective trustlevel and sendAVCevent if required

Proposed changes
  • TrustLevelManager

    1. if DmiTrustLevel is not in Map.TrustLevelPerDmiPlugin

      1. get status from dmi rest client and add to Map.TrustLevelPerDmiPlugin (3)

      2. or get status from rest DMI client anyhow

    2. if cmhandle is not in Map.TrustLevelPerCmHandle (4)

      1. call register cmHandle

  • use different consumer groups for each instance

    • allows both instance to listen to the event

    • issue of event based syncing: duplication of event forwarded

 

Questions

 

Notes

  • trustLevel.NONE stored and if trustLevel.COMPLETE removed from map

How is it queried?
  • CmHandleQueryServiceImpl

  1. Call getCmHandleReferencesByTrustLevel

  2. Iterate through all DMI plugin identifiers from Map.TrustLevelPerDmiPlugin

  3. Get all cm handles for given dmi plugin identifier

  4. For all retrieved cm handles , get trustLevel from

    Map.TrustLevelPerCmHandle

  5. Get effective trust level and return if it is same as target trust level

Proposed changes
  • Get all entries in Map.TrustLevelPerCmHandle

    • no trustLevelPerCmHandle entries

  • Retrieve dmi name via cmHandleId

    • Get dmiTrustLevel from Map.TrustLevelPerDmiPlugin or if DmiTrustLevel is not in Map.TrustLevelPerDmiPlugin

      • get status from dmi rest client and add to Map.TrustLevelPerDmiPlugin

 

 

Questions
  • Is there another way to get all DMI plugin identifiers available?

Notes

 

 

Map.TrustLevelPerDmiPlugin

How is it filled?
  • CmHandleRegistrationService calls TrustLevelManager

  1. Call TrustLevelManager registerDmiPlugin

  2. Put dmiServiceName in Map.TrustLevelPerDmiPlugin with TrustLevel.COMPLETE

Proposed changes
  •  

 

 

Questions
  •  

Notes
How is it updated?
  • DmiPluginTrustLevelWatchDog

  1. Iterate through all DMI plugin entries from Map.TrustLevelPerDmiPlugin

  2. Get DMI health status using Dmi rest client

  3. If Old Dmi trustlevel is not same as new trustlevel then

    1. get affected cm handle IDs

    2. call TrustLevel updateDmi

 

  • TrustLevelManager

  1. Call updateDmi

  2. Get old DMI trustLevel from Map.TrustLevelPerDmiPlugin

  3. Put new DmiTrustLevel into Map.TrustLevelPerDmiPlugin

  4. Get each affected cmHandleIDs trustlevel from Map.TrustLevelPerCmHandle

    1. get effective trustLevel

    2. SendAvcNotificationEvent if required

Proposed changes
  • TrustLevelManager

    • Get old DMItrustLevel from rest client

    • if affected cmHandleID is not in Map.TrustLevelPerCmHandle (4) i.e. also not been reported for any change by consumer

      • call registerCmHandle (updated through event consumer)

      • or use of @Cacheable annotation for read-through cache like used in YangTextSchemaSourceSetCache wherein if not in cache/map , it is fetched from database

 

 

 

Questions
  • Would watchdog still be needed if we get DMI trustLevel directly whenever DMI trustLevel is needed?

Notes

Nov 7, 2024

  • assume DMIplugin trust.Level complete when not in collection

 

 

 

 

 

 

Hazelcast Maps

Master branch

 

in-memory-format

back-up count

async-backup count

 

in-memory-format

back-up count

async-backup count

trustLevelPerDmiPlugin

BINARY

1

0

cmNotificationSubscription

BINARY

1

0

moduleSyncStarted

BINARY

1

0

dataSyncSemaphores

BINARY

1

0

trustLevelPerCmHandle

BINARY

1

0

 

k6 test ran with focus on change of trustLevelPerCmHandle map

  1. master

  2. back up count reduced to 0 (0 sync , 0 async)

  3. back up count (0 sync, 1 async)

  4. use of Imap.putAsync

  5. use of map.putAsync, back up count (0 sync, 1 async)

  6. map<String,String> instead of <String, TrustLevel>

  7. storing trustLevel NONE only + use of .putAll in registration

  8. storing trustLevel NONE only + Use of keySet() with a HashSet,and check if map contains key before calling remove

  9. use of putAll in registration + Use of keySet() with a HashSet,and check if map contains key before calling remove

  10. Imap putAllAsync putAsync removeAsync in de/registration + Use of keySet() with a HashSet,and check if map contains key before calling remove

  11. near cache

 

master

2

3

4

5

6

7

8

9

10

 

master

2

3

4

5

6

7

8

9

10

Registration of CM-handles

54.588

51.077

57.190

57.195

58

51.568

65.286

53

55

56

De-registration of CM-handles

211.162

217.382

214.908

166.957

199.40

210.433

235.363

160

211

227

CM-handle ID search with No filter

329.486

356.791

341.746

342.753

356.414

355.305

348.965

357

363

379

CM-handle ID search with Module filter

272.013

287.342

275.668

279.459

289.135

287.880

280.386

297

298

301

CM-handle ID search with Property filter

746.382

818.973

765.278

761.433

779.455

788.856

783.455

814

799

839

CM-handle ID search with Cps Path filter

742.759

801.932

769.161

758.174

783.990

786.947

776.063

809

808

839

CM-handle ID search with Trust Level filter

8491.230

8938.885

8675.511

8204.402

8825.536

8414.379

8417.216

8456

8944

9307

CM-handle search with No filter

10981.454

11567.080

11154.749

10854.956

11542.284

10996.947

10988.398

11169

11566

12027

CM-handle search with Module filter

12673.197

13464.128

12901.995

12338.793

13257.547

12681.682

12336.462

12873

13479

14062

CM-handle search with Property filter

13079.920

14051.511

13476.860

13067.756

13825.757

13437.706

13233.103

13500

14132

14774

CM-handle search with Cps Path filter

13081.646

14053.627

13478.719

13069.727

13826.442

13437.273

13231.833

13504

14131

14776

CM-handle search with Trust Level filter

20139.598

21943.904

20612.278

20019.675

21060.484

20.074

20131.543

20672

21606

22557.56

NCMP overhead for Synchronous single CM-handle pass-through read

19.935

21.403

20.323

21.298

20.956

21.090

19.948

20.240

20.9

21.56

NCMP overhead for Synchronous single CM-handle pass-through read with alternate id

51.035

55.459

51.415

52.857

54.433

53.952

50.938

53.92

54.8

56.34

NCMP overhead for Synchronous single CM-handle pass-through write

26.965

28.758

26.560

27.948

28.300

27.882

25.731

27.79

28.0

30.43

NCMP overhead for Synchronous single CM-handle pass-through write with alternate id

42.645

46.496

44.836

43.100

44.950

45.396

43.625

45.91

46.8

49

 

 

Hazelcast metrics: trustLevelPerCmHandle map

Map Statistics

Gets

 

master

2

3

7

8

9

10

11

 

master

2

3

7

8

9

10

11

member 1

5,180,000

-

-

-

5,480,000

3,780,000

 

5,360,000

member 2

4,560,000

-

-

-

4,120,000

5,380,000

 

5,220,000

total

9,740,000

-

-

9760000

9600000

9,160,000

 

10,580,000

~487 get operation per cm handle

 

Puts

 

master

2

3

7

8

9

10

 

master

2

3

7

8

9

10

member 1

10000

-

-

0

0

10 000

 

member 2

10000

-

-

0

0

10 000

 

total

20000

-

-

0

0

20 000

 

Removals

 

master

2

3

7

8

9

10

 

master

2

3

7

8

9

10

member 1

10000

-

-

10000

10000

0

0

member 2

10000

-

-

10000

10000

0

0

total

20000

-

-

20000

20000

0

0

 

 

Map Throughput Statistics

 

master

member 1

member 2

2

3

4

5

6

7

8

9

10

11

 

master

member 1

member 2

2

3

4

5

6

7

8

9

10

11

Put/s

(registration)

55.56

57

 

 

 

53.51

54.59

46.34

46.29

-

 

55

55

55

57

 

Get/s

3813.95

4787.64

 

 

 

 

4253.22

2963.88

 

 

 

 

 

Removal/s

(de registration)

76

78

 

 

 

53.04

53.33

 

 

 

60

60

89.09

88.33

 

Avg put latency (ms)

1.75

1.64

1.93

1.88

1.18

1.29

1.62

1.45

2.44

2.16

1.49

1.25

 

 

0.1

0.09

0.09

0.13

1.51

1.42

Avg get latency (ms)

0.42

0.41

0.38

0.37

0.39

0.39

0.38

0.36

0.41

0.39

0.39

0.38

0.38

0.39

0.38

0.39

0.4

0.4

0.41

0.42

0.35

0.34

Avg remove latency (ms)

0.34

0.36

0.25

0.24

0.25

0.26

0.31

0.29

0.28

0.2

0.25

0.25

0.14

0.14

0.21

0.21

0.26

0.25

0.72

0.74

0.21

0.22

Max avg put latency (ms)

79

87

101

95

77

73

78

87

167

83

104

74

-

 

69

61

63

90

77

86

Max avg get latency (ms)

427

427

391

380

308

405

313

344

340

360

362

364

359

379

427

427

389

407

430

408

611

385

Max avg remove latency (ms)

30

22

27

16

2

18

20

30

12

9

33

8

14

8

3

3

16

12

2

7

21

38

 

Near Cache use (note: statistics enabled)

 

image-20241203-104518.png

Hazelcast metrics : moduleSyncStarted map

Map Statistics

Puts

 

master

2

3

4

 

master

2

3

4

member 1

25228

19,944

21124

-

member 2

21278

23,818

18469

-

total

46506

43,762

39593

-

Removals

 

master

2

3

4

 

master

2

3

4

member 1

24738

24,911

10000

-

member 2

10000

10,000

25409

-

total

34738

34,911

35409

-

 

 

Map Throughput Statistics

 

master

2

3

4

5

 

master

2

3

4

5

Avg put latency

1.19 ms, 1.56ms

1.71 ms, 1.52 ms 1.75ms 1.5ms

1.27 1.19 1.48 1.53

 

 

Avg remove latency

0.24ms, 0.13ms

0.22 ms, 0.12 ms 0.11ms 0.22ms

0.19 0.11 0.12 0.23

 

 

Max avg put latency

83ms, 84ms

101ms, 88ms 96ms 103ms

144ms 91ms 93 78

 

 

Max avg remove latency

11ms, 20ms

6ms, 1ms 1ms 10ms

2ms 1ms 4 69

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Related pages