Node Management

Focusing on building a robust system to register, monitor, and manage nodes.

Node Management Workflow

  • Node Registration: Each node registers itself with the cluster when it starts.

  • Heartbeat Mechanism: Nodes send periodic health checks to the cluster manager or peers.

  • Failure Detection: Identify and mark failed nodes.

  • Metadata Management: Maintain a dynamic list of active nodes. Rebalancing Trigger: Respond to node addition or removal.

Components to Implement (Current Scope):

Node Metadata:

  • Store: Store information about each node (e.g., ID, IP, status, capacity).

  • Use an in-memory database (e.g., Redis) or a distributed store (e.g., etcd).

Node Registration API:

  • Nodes register themselves with the cluster manager.

  • Store metadata in the metadata store.

Health Monitoring:

  • Nodes periodically send heartbeats to report their status. If a node misses multiple heartbeats, it is marked as failed.

Rebalancing:

  • Tasks/data are automatically redistributed when a node is added or removed.

Future Scope:

  • Rebalancing: Implement data redistribution logic on node failure.

  • UI/CLI: Create a tool to visualize and manage nodes.

Last updated