Day 4
On day 3 we implemented node spinup in the node manager: we can now fully spin up a node with Docker, allocating its port from a central port pool instead of relying on Docker's dynamic port assignment, so a central entity controls the ports and tracks each node's details. Since every node is backed by a volume, we can also use that volume to recover data if a node crashes. Next we plan to implement file storage on the nodes following core DFS principles. The KPIs in the node manager and the health monitor are still pending, but we are building the storage solution first. Below is an outline of the current plan for the implementation.
Implementing file storage in a distributed file system (DFS) involves breaking files into chunks, distributing those chunks across multiple nodes, and ensuring redundancy and efficient retrieval. Here's an outline of how we can achieve this with the node containers running on different ports:
1. Core Principles for File Distribution
Chunking: Divide files into smaller, fixed-size chunks (e.g., 64MB).
Redundancy: Replicate chunks across multiple nodes for fault tolerance.
Consistency: Use consistent hashing or another placement strategy for chunk allocation.
Scalability: Ensure the system can scale with minimal rebalancing.
Fault Tolerance: Implement replication and health checks to handle node failures.
2. Architecture Components
2.1 Metadata Manager
Maintains metadata about:
File-to-chunk mappings.
Chunk-to-node mappings.
Node health and availability.
Centralized in a database (e.g., PostgreSQL), or distributed using a system like Zookeeper or etcd.
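To make the mappings concrete, here is a minimal in-memory sketch of the Metadata Manager's data model in Python. The record and field names are placeholders for illustration; in the real system this state would live in the database (or etcd/Zookeeper) rather than in process memory.

```python
# Hypothetical sketch of the Metadata Manager's data model.
# Names are placeholders, not a final schema.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class ChunkRecord:
    chunk_id: str            # content hash of the chunk
    size: int                # chunk size in bytes
    nodes: List[str]         # nodes currently holding a replica

@dataclass
class FileRecord:
    file_id: str             # e.g. a UUID assigned at upload time
    name: str
    chunk_ids: List[str]     # ordered list, needed to reassemble the file

@dataclass
class MetadataStore:
    files: Dict[str, FileRecord] = field(default_factory=dict)
    chunks: Dict[str, ChunkRecord] = field(default_factory=dict)
    node_health: Dict[str, bool] = field(default_factory=dict)  # node -> alive?
```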
2.2 File Chunking Service
Splits files into chunks and assigns unique IDs (e.g., hashes) to each chunk.
Handles encryption or compression if needed.
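A rough sketch of the chunking step, assuming the 64MB chunk size from above and SHA-256 content hashes as chunk IDs (both are tunable assumptions):

```python
import hashlib
from typing import Iterator, Tuple

CHUNK_SIZE = 64 * 1024 * 1024  # 64MB, placeholder value

def chunk_file(path: str, chunk_size: int = CHUNK_SIZE) -> Iterator[Tuple[str, bytes]]:
    """Yield (chunk_id, data) pairs; the chunk ID is the SHA-256 of the data."""
    with open(path, "rb") as f:
        while True:
            data = f.read(chunk_size)
            if not data:
                break
            yield hashlib.sha256(data).hexdigest(), data
```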
2.3 Node Allocation Logic
Uses consistent hashing or another deterministic algorithm to assign chunks to nodes.
Ensures replication factor (e.g., 3 copies) is met.
Avoids overloading any single node by balancing load.
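A toy sketch of how the allocation could work with a consistent-hash ring; the ring hash and node identifiers are assumptions, and a production ring would also use virtual nodes for smoother balancing.

```python
import bisect
import hashlib
from typing import List

class ConsistentHashRing:
    """Toy consistent-hash ring; real deployments add virtual nodes for balance."""

    def __init__(self, nodes: List[str]):
        self._ring = sorted((self._hash(n), n) for n in nodes)
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def nodes_for_chunk(self, chunk_id: str, replicas: int = 3) -> List[str]:
        """Walk clockwise from the chunk's position and pick the next distinct nodes."""
        start = bisect.bisect(self._keys, self._hash(chunk_id)) % len(self._ring)
        chosen: List[str] = []
        i = start
        while len(chosen) < min(replicas, len(self._ring)):
            node = self._ring[i % len(self._ring)][1]
            if node not in chosen:
                chosen.append(node)
            i += 1
        return chosen
```

Building the ring from the node base URLs in our port pool, e.g. ConsistentHashRing(["http://localhost:8001", "http://localhost:8002", "http://localhost:8003"]), would then give ring.nodes_for_chunk(chunk_id) as the target replicas for a chunk.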
2.4 Node API
Each node exposes APIs (e.g., via HTTP) to:
Upload chunks: POST /chunk
Retrieve chunks: GET /chunk/{chunk_id}
Delete chunks: DELETE /chunk/{chunk_id}
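A minimal Flask sketch of what these three endpoints could look like inside each node container, assuming chunks are stored as plain files under a volume-backed directory (CHUNK_DIR is a placeholder path) and with no authentication or validation yet:

```python
import hashlib
import os
from flask import Flask, abort, request, send_file

app = Flask(__name__)
CHUNK_DIR = os.environ.get("CHUNK_DIR", "/data/chunks")  # assumed volume-backed path
os.makedirs(CHUNK_DIR, exist_ok=True)

@app.route("/chunk", methods=["POST"])
def upload_chunk():
    data = request.get_data()
    chunk_id = hashlib.sha256(data).hexdigest()  # content-addressed ID (assumption)
    with open(os.path.join(CHUNK_DIR, chunk_id), "wb") as f:
        f.write(data)
    return {"chunk_id": chunk_id}, 201

@app.route("/chunk/<chunk_id>", methods=["GET"])
def get_chunk(chunk_id):
    path = os.path.join(CHUNK_DIR, chunk_id)
    if not os.path.exists(path):
        abort(404)
    return send_file(path, mimetype="application/octet-stream")

@app.route("/chunk/<chunk_id>", methods=["DELETE"])
def delete_chunk(chunk_id):
    path = os.path.join(CHUNK_DIR, chunk_id)
    if os.path.exists(path):
        os.remove(path)
    return "", 204
```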
2.5 Client API
The client interacts with the DFS via APIs exposed by the system:
Upload file: Breaks files into chunks, assigns them to nodes, and stores metadata.
Download file: Fetches chunks from nodes and reassembles them.
Delete file: Removes all chunks and associated metadata.
3. Process Workflow
3.1 File Upload
Receive File: The client sends the file to the DFS Client API.
Chunk the File: Split the file into chunks.
Assign Nodes: Use the Node Allocation Logic to determine which nodes will store each chunk.
Upload Chunks: Send each chunk to the respective node's POST /chunk API.
Save Metadata: Update the Metadata Manager with file-to-chunk and chunk-to-node mappings.
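A sketch of the upload flow, reusing the hypothetical chunk_file, ConsistentHashRing, and MetadataStore helpers from above, with the requests library for the node calls; node addresses are assumed to be the base URLs from our port pool.

```python
import uuid
import requests  # assumed HTTP client for talking to the node containers

def upload_file(path: str, name: str, ring, store) -> str:
    """Sketch of 3.1: chunk the file, push each chunk to its nodes, record metadata."""
    file_id = str(uuid.uuid4())
    chunk_ids = []
    for chunk_id, data in chunk_file(path):
        nodes = ring.nodes_for_chunk(chunk_id)  # e.g. ["http://localhost:8001", ...]
        for node in nodes:
            requests.post(f"{node}/chunk", data=data, timeout=30).raise_for_status()
        store.chunks[chunk_id] = ChunkRecord(chunk_id, len(data), nodes)
        chunk_ids.append(chunk_id)
    store.files[file_id] = FileRecord(file_id, name, chunk_ids)
    return file_id
```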
3.2 File Retrieval
Request File: The client requests a file via the DFS Client API.
Retrieve Metadata: The Metadata Manager provides the chunk-to-node mappings.
Fetch Chunks: Retrieve each chunk from its respective node using GET /chunk/{chunk_id}.
Reassemble File: Combine chunks into the original file and return it to the client.
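The retrieval flow could look like this sketch, again using the hypothetical MetadataStore and requests; the SHA-256 check works because chunk IDs are content hashes in the sketches above.

```python
import hashlib
import requests

def download_file(file_id: str, out_path: str, store) -> None:
    """Sketch of 3.2: look up chunk order in metadata, fetch each chunk, reassemble."""
    record = store.files[file_id]
    with open(out_path, "wb") as out:
        for chunk_id in record.chunk_ids:
            data = None
            for node in store.chunks[chunk_id].nodes:  # try replicas in order
                resp = requests.get(f"{node}/chunk/{chunk_id}", timeout=30)
                if resp.ok:
                    data = resp.content
                    break
            if data is None or hashlib.sha256(data).hexdigest() != chunk_id:
                raise IOError(f"chunk {chunk_id} unavailable or corrupted")
            out.write(data)
```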
3.3 File Deletion
Request Deletion: The client requests file deletion.
Retrieve Metadata: The Metadata Manager identifies all chunks associated with the file.
Delete Chunks: Issue DELETE /chunk/{chunk_id} to the respective nodes.
Update Metadata: Remove file and chunk mappings from the Metadata Manager.
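And a deletion sketch under the same assumptions; note that if chunks were ever deduplicated across files, a reference-count check would be needed before removing a replica.

```python
import requests

def delete_file(file_id: str, store) -> None:
    """Sketch of 3.3: delete every replica of every chunk, then drop the metadata."""
    record = store.files.pop(file_id)
    for chunk_id in record.chunk_ids:
        chunk = store.chunks.pop(chunk_id)  # assumes chunks are not shared across files
        for node in chunk.nodes:
            requests.delete(f"{node}/chunk/{chunk_id}", timeout=30)
```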
4. Challenges and Solutions
4.1 Data Distribution
Use consistent hashing to minimize data movement when scaling up or down.
Implement a replication strategy to ensure chunks are stored on multiple nodes.
4.2 Fault Tolerance
Periodically check node health and re-replicate chunks if a node goes down.
Use a quorum-based approach to ensure data consistency.
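A rough sketch of the re-replication idea, assuming each node also exposes a simple GET /health endpoint (part of the health monitor we have not built yet) and reusing the hypothetical helpers from above:

```python
import requests

def rereplicate(store, ring, replicas: int = 3) -> None:
    """Sketch of 4.2: mark dead nodes, then copy under-replicated chunks to new nodes."""
    # Probe every known node; anything unreachable is marked dead.
    for node in list(store.node_health):
        try:
            store.node_health[node] = requests.get(f"{node}/health", timeout=5).ok
        except requests.RequestException:
            store.node_health[node] = False

    for chunk in store.chunks.values():
        alive = [n for n in chunk.nodes if store.node_health.get(n, False)]
        if not alive or len(alive) >= replicas:
            continue  # either lost entirely or already fully replicated
        data = requests.get(f"{alive[0]}/chunk/{chunk.chunk_id}", timeout=30).content
        for target in ring.nodes_for_chunk(chunk.chunk_id, replicas * 2):
            if len(alive) >= replicas:
                break
            if target not in alive and store.node_health.get(target, False):
                requests.post(f"{target}/chunk", data=data, timeout=30)
                alive.append(target)
        chunk.nodes = alive  # record the new replica placement
```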
4.3 Performance
Optimize chunk size for network and storage efficiency.
Use caching (e.g., Redis) for frequently accessed metadata.
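For example, a hot metadata lookup could be fronted by Redis roughly like this sketch (redis-py client; host and TTL are placeholders):

```python
import json
import redis  # assumed: redis-py client co-located with the Metadata Manager

cache = redis.Redis(host="localhost", port=6379, db=0)

def cached_chunk_nodes(chunk_id: str, store) -> list:
    """Serve frequent chunk-to-node lookups from Redis before hitting the metadata DB."""
    key = f"chunk:{chunk_id}:nodes"
    hit = cache.get(key)
    if hit is not None:
        return json.loads(hit)
    nodes = store.chunks[chunk_id].nodes
    cache.setex(key, 300, json.dumps(nodes))  # 5-minute TTL, placeholder value
    return nodes
```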
4.4 Scalability
Use Kafka for asynchronous chunk uploads and metadata updates.
Shard metadata across multiple database instances if needed.
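A sketch of what an asynchronous metadata update could look like with the kafka-python client; the broker address and topic name are placeholders:

```python
import json
from kafka import KafkaProducer  # assumed: kafka-python client

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",  # placeholder broker address
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def publish_chunk_stored(file_id: str, chunk_id: str, nodes: list) -> None:
    """Emit a metadata-update event instead of writing to the DB synchronously."""
    producer.send("dfs.chunk-stored", {  # topic name is an assumption
        "file_id": file_id,
        "chunk_id": chunk_id,
        "nodes": nodes,
    })
    producer.flush()
```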
5. Next Steps
Design the Metadata Manager schema (tables for file-chunk and chunk-node mappings).
Implement Node APIs for chunk storage, retrieval, and deletion.
Develop the Node Allocation Logic using consistent hashing.
Create a prototype for the client upload and retrieval process.
Test the system with multiple nodes to ensure reliability and performance.