Storage Unit 2.0

We are working on a more centralized way to manage client interaction and reduce latency, and we have a new design concept for the storage unit.

Our initial design for the storage unit was fairly complex: the Uploader API handled file chunking and node allocation while interacting directly with the client, which increased the API's latency. We have therefore updated the file storage unit workflow.

Workflow Design

  1. File Upload:

    • The client uploads the file to the Uploader API.

    • The Uploader API validates the file and assigns it a unique File ID.

    • The uploader service selects an entry node based on the current load.

    • The file is pushed to the entry node for temporary storage (a minimal handler sketch follows this list).
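
    As a rough illustration of this step, here is a minimal Go sketch of an Uploader API handler. The /upload endpoint, the multipart form field name "file", the github.com/google/uuid package for File IDs, and the pickEntryNode helper are all assumptions for illustration; the real service would stream the bytes to the chosen entry node rather than discarding them.

      package main

      import (
          "fmt"
          "io"
          "net/http"

          "github.com/google/uuid"
      )

      // pickEntryNode stands in for the load-based lookup; the real service
      // would ask the allocation service for the least-loaded healthy node.
      func pickEntryNode() string {
          return "entry-node-1" // hypothetical node ID
      }

      // handleUpload validates the incoming file, assigns a File ID, and
      // forwards the bytes to the chosen entry node for temporary storage.
      func handleUpload(w http.ResponseWriter, r *http.Request) {
          file, header, err := r.FormFile("file")
          if err != nil {
              http.Error(w, "invalid upload: "+err.Error(), http.StatusBadRequest)
              return
          }
          defer file.Close()

          fileID := uuid.NewString()
          entryNode := pickEntryNode()

          // Placeholder: discard the bytes; the real handler would stream
          // them to entryNode instead.
          if _, err := io.Copy(io.Discard, file); err != nil {
              http.Error(w, "upload failed", http.StatusInternalServerError)
              return
          }

          fmt.Fprintf(w, "file %q accepted: file_id=%s entry_node=%s\n",
              header.Filename, fileID, entryNode)
      }

      func main() {
          http.HandleFunc("/upload", handleUpload)
          http.ListenAndServe(":8080", nil)
      }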

  2. Chunking and Metadata Creation:

    • On the entry node, the file is divided into chunks based on the configured chunk size. Each chunk is assigned a Chunk ID and a SHA-256 checksum for integrity verification.

    • Metadata for the file is created:

      // Object holds the per-file metadata produced after chunking.
      type Object struct {
          ID          string   `json:"id"`            // File ID assigned by the Uploader API
          Chunks      int      `json:"chunks"`        // number of chunks
          Checksums   []string `json:"checksums"`     // SHA-256 checksum per chunk
          ChunkIDs    []string `json:"chunk_ids"`     // Chunk ID per chunk
          EntryNodeID string   `json:"entry_node_id"` // entry node holding the temporary copy
      }
    • The metadata is sent to the metadata management service for registration in the metadata registry (a chunking sketch follows this list).
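
    A minimal sketch of the chunking step, using the Object struct shown above. The fixed 4 MiB chunk size and the Chunk ID format (File ID plus index) are assumptions, not part of the design; the resulting metadata is what gets registered with the metadata service.

      package main

      import (
          "crypto/sha256"
          "encoding/hex"
          "fmt"
          "io"
          "os"
      )

      const chunkSize = 4 << 20 // assumed 4 MiB chunks

      // ChunkFile splits the file into fixed-size chunks, computes a SHA-256
      // checksum and a Chunk ID per chunk, and fills in the Object metadata
      // (the struct shown above) for registration with the metadata service.
      func ChunkFile(path, fileID, entryNodeID string) (Object, error) {
          f, err := os.Open(path)
          if err != nil {
              return Object{}, err
          }
          defer f.Close()

          meta := Object{ID: fileID, EntryNodeID: entryNodeID}
          buf := make([]byte, chunkSize)
          for i := 0; ; i++ {
              n, err := io.ReadFull(f, buf)
              if n > 0 {
                  sum := sha256.Sum256(buf[:n])
                  meta.ChunkIDs = append(meta.ChunkIDs, fmt.Sprintf("%s-%d", fileID, i))
                  meta.Checksums = append(meta.Checksums, hex.EncodeToString(sum[:]))
                  meta.Chunks++
                  // buf[:n] would be handed off for distribution at this point.
              }
              if err == io.EOF || err == io.ErrUnexpectedEOF {
                  break // last (possibly partial) chunk processed
              }
              if err != nil {
                  return Object{}, err
              }
          }
          return meta, nil
      }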

  3. Node Allocation and Distribution

    • Available nodes are selected based on available storage, active health, and the replication factor, and are ranked in ascending order (a selection sketch follows this list).

    • The chunks are then pushed to the selected distributed nodes, and each node sends an acknowledgment once it has received and saved its chunks.

    • The metadata service updates the metadata with the final chunk locations and IDs.

    • An acknowledgment is sent to the client confirming upload completion.
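
    A sketch of the selection logic. The Node fields and the interpretation of the ascending ranking (here: ascending load) are assumptions for illustration; unhealthy nodes and nodes without enough free storage are filtered out, and the top replicationFactor candidates are returned.

      package main

      import (
          "errors"
          "sort"
      )

      // Node describes a storage node as seen by the allocation service.
      // The field names here are assumptions for illustration.
      type Node struct {
          ID        string
          FreeBytes int64   // available storage
          Load      float64 // current load, e.g. fraction of capacity in use
          Healthy   bool    // result of the latest health check
      }

      // SelectNodes filters out unhealthy nodes and nodes without enough free
      // storage, ranks the remainder in ascending order of load, and returns
      // the top replicationFactor candidates for a chunk of chunkBytes bytes.
      func SelectNodes(nodes []Node, chunkBytes int64, replicationFactor int) ([]Node, error) {
          candidates := make([]Node, 0, len(nodes))
          for _, n := range nodes {
              if n.Healthy && n.FreeBytes >= chunkBytes {
                  candidates = append(candidates, n)
              }
          }
          if len(candidates) < replicationFactor {
              return nil, errors.New("not enough eligible nodes for the replication factor")
          }
          sort.Slice(candidates, func(i, j int) bool {
              return candidates[i].Load < candidates[j].Load
          })
          return candidates[:replicationFactor], nil
      }

    Ranking by load keeps hot nodes from being picked repeatedly; swapping in a different ranking key only changes the comparison function.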

Design Considerations

1. Fault Tolerance

  • Use replication for redundancy at the chunk level.

  • Employ retry mechanisms for failed chunk uploads (see the sketch below).
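
  A small sketch of such a retry mechanism, assuming exponential backoff; the attempt count, the initial delay, and the push callback are illustrative stand-ins for the real chunk transfer call.

    package main

    import (
        "fmt"
        "time"
    )

    // pushChunkWithRetry retries a failed chunk upload with exponential backoff.
    // maxAttempts and the initial delay are illustrative values, not part of the design.
    func pushChunkWithRetry(push func() error, maxAttempts int) error {
        delay := 500 * time.Millisecond
        var lastErr error
        for attempt := 1; attempt <= maxAttempts; attempt++ {
            if lastErr = push(); lastErr == nil {
                return nil // upload succeeded
            }
            time.Sleep(delay)
            delay *= 2 // back off before the next attempt
        }
        return fmt.Errorf("chunk upload failed after %d attempts: %w", maxAttempts, lastErr)
    }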

2. Scalability

  • Make the Uploader API stateless and use a load balancer to handle high traffic.

  • Scale storage nodes dynamically using container orchestration (e.g., Kubernetes).

3. Consistency

  • Ensure eventual consistency in metadata updates.

  • Validate chunk integrity using checksums during distribution (see the sketch below).
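
  A minimal sketch of that validation, recomputing the SHA-256 digest of a received chunk and comparing it with the hex-encoded value stored in the metadata registry (the function name is hypothetical).

    package main

    import (
        "crypto/sha256"
        "encoding/hex"
        "fmt"
    )

    // VerifyChunk recomputes the SHA-256 checksum of a chunk and compares it
    // with the value recorded in the metadata registry before the chunk is
    // accepted by a storage node.
    func VerifyChunk(data []byte, expectedHex string) error {
        sum := sha256.Sum256(data)
        if hex.EncodeToString(sum[:]) != expectedHex {
            return fmt.Errorf("checksum mismatch: chunk is corrupt or incomplete")
        }
        return nil
    }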

4. Security

  • Use HTTPS for secure file uploads.

  • Encrypt file chunks during storage and transit if sensitive data is involved (see the sketch below).
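
  A sketch of chunk encryption with AES-256-GCM from the Go standard library, assuming a 32-byte key supplied by a key-management system outside this design; the nonce is prepended to the ciphertext so the chunk can be decrypted later.

    package main

    import (
        "crypto/aes"
        "crypto/cipher"
        "crypto/rand"
        "io"
    )

    // EncryptChunk seals a chunk with AES-256-GCM before it is written to a
    // storage node; the random nonce is prepended to the returned ciphertext.
    func EncryptChunk(key, plaintext []byte) ([]byte, error) {
        block, err := aes.NewCipher(key) // key must be 32 bytes for AES-256
        if err != nil {
            return nil, err
        }
        gcm, err := cipher.NewGCM(block)
        if err != nil {
            return nil, err
        }
        nonce := make([]byte, gcm.NonceSize())
        if _, err := io.ReadFull(rand.Reader, nonce); err != nil {
            return nil, err
        }
        return gcm.Seal(nonce, nonce, plaintext, nil), nil
    }

  AES-GCM authenticates as well as encrypts, which complements the SHA-256 integrity checks above.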

5. Monitoring

  • Continuously monitor (a metrics sketch follows this list):

    • Node health.

    • Storage capacity.

    • Upload progress.
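
  A sketch of how these signals could be exposed for scraping with the github.com/prometheus/client_golang library; the metric names and the node_id label are assumptions, and Grafana would visualise the values Prometheus collects.

    package main

    import (
        "net/http"

        "github.com/prometheus/client_golang/prometheus"
        "github.com/prometheus/client_golang/prometheus/promauto"
        "github.com/prometheus/client_golang/prometheus/promhttp"
    )

    var (
        // Free storage per node, updated by the health monitoring service.
        nodeFreeBytes = promauto.NewGaugeVec(prometheus.GaugeOpts{
            Name: "storage_node_free_bytes",
            Help: "Free storage reported by each node.",
        }, []string{"node_id"})

        // Number of uploads currently being processed by the Uploader API.
        uploadsInProgress = promauto.NewGauge(prometheus.GaugeOpts{
            Name: "storage_uploads_in_progress",
            Help: "Uploads currently in progress.",
        })
    )

    func main() {
        // Example updates; real values would come from health checks and the API.
        nodeFreeBytes.WithLabelValues("node-1").Set(512 << 30) // 512 GiB free
        uploadsInProgress.Inc()

        // Expose /metrics for Prometheus to scrape; Grafana reads from Prometheus.
        http.Handle("/metrics", promhttp.Handler())
        http.ListenAndServe(":9100", nil)
    }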

Tech Stack for Development of the Storage Unit

Component                  Tech
-------------------------  ---------------------------------------------------
Uploader API               Go (Mux router)
Chunking Service           Go, goroutines for parallelism
Metadata Service           PostgreSQL, Redis (caching for quick lookups)
Node Allocation Service    Custom Go service, Kafka (event-driven allocation)
Storage Nodes              Local file system
Replication                Custom Go service
Health Monitoring          Grafana, Prometheus

Visual Representation

Client --> Uploader API --> Entry Node --> Metadata Service
                                     |
                                     +--> Chunking Service
                                          |
                                          +--> Node Allocation Service
                                               |
                                               +--> Storage Nodes
