Storage Unit 2.0
We are moving to a more centralized way of managing client interaction in order to reduce latency, and this page describes the new design for the storage unit.
Our initial design for the storage unit was somewhat complex: the Uploader API handled file chunking and node allocation while interacting directly with the client, which increased the API's latency. The updated file storage workflow is described below.
Workflow Design
File Upload:
The client uploads the file to the Uploader API.
The Uploader API validates the file and assigns it a unique File ID.
The Uploader service selects an entry node based on current load.
The file is pushed to the entry node for temporary storage.
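The upload steps above can be sketched in Go as follows. This is a minimal illustration under stated assumptions, not the service's actual code: the `EntryNode` type, the `ActiveUploads` load metric, and the random 128-bit File ID format are all hypothetical, since the design does not specify how load is measured or how IDs are generated.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"errors"
)

// EntryNode is a hypothetical view of a node that can accept uploads.
type EntryNode struct {
	Addr          string
	ActiveUploads int // assumed load metric: uploads currently in progress
}

// NewFileID returns a random 128-bit identifier as 32 hex characters.
// The ID format is an assumption made for this sketch.
func NewFileID() (string, error) {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return hex.EncodeToString(b), nil
}

// PickEntryNode returns the node with the fewest active uploads.
func PickEntryNode(nodes []EntryNode) (EntryNode, error) {
	if len(nodes) == 0 {
		return EntryNode{}, errors.New("no entry nodes available")
	}
	best := nodes[0]
	for _, n := range nodes[1:] {
		if n.ActiveUploads < best.ActiveUploads {
			best = n
		}
	}
	return best, nil
}
```

A handler in the Uploader API would call `NewFileID` after validation and then forward the file to the address returned by `PickEntryNode`.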
Chunking and Metadata Creation:
On the entry node, the file is divided into chunks of the configured chunk size. Each chunk is assigned a Chunk ID and a SHA-256 checksum for integrity verification.
Metadata for the file is created.
The metadata is sent to the metadata management service to be recorded in the metadata registry.
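As a rough sketch of the chunking step, the Go code below splits a file held in memory into fixed-size chunks and records a SHA-256 checksum for each. The type names, the `<fileID>-<index>` Chunk ID scheme, and the metadata fields are assumptions for illustration; the design does not define them. A real entry node would more likely stream from disk than hold the whole file in a byte slice.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
)

// ChunkMeta is the per-chunk record sent to the metadata service (hypothetical fields).
type ChunkMeta struct {
	ID       string // "<fileID>-<index>", an assumed naming scheme
	Index    int
	Size     int
	Checksum string // hex-encoded SHA-256 of the chunk bytes
}

// FileMetadata describes the whole file (hypothetical fields).
type FileMetadata struct {
	FileID    string
	TotalSize int
	Chunks    []ChunkMeta
}

// Chunk pairs the metadata with the bytes to distribute.
type Chunk struct {
	Meta ChunkMeta
	Data []byte
}

// SplitIntoChunks cuts data into pieces of at most chunkSize bytes and
// computes a checksum for each piece.
func SplitIntoChunks(fileID string, data []byte, chunkSize int) ([]Chunk, FileMetadata, error) {
	if chunkSize <= 0 {
		return nil, FileMetadata{}, errors.New("chunkSize must be positive")
	}
	meta := FileMetadata{FileID: fileID, TotalSize: len(data)}
	var chunks []Chunk
	for off := 0; off < len(data); off += chunkSize {
		end := off + chunkSize
		if end > len(data) {
			end = len(data) // the last chunk may be shorter
		}
		part := data[off:end]
		sum := sha256.Sum256(part)
		cm := ChunkMeta{
			ID:       fmt.Sprintf("%s-%d", fileID, len(chunks)),
			Index:    len(chunks),
			Size:     len(part),
			Checksum: hex.EncodeToString(sum[:]),
		}
		chunks = append(chunks, Chunk{Meta: cm, Data: part})
		meta.Chunks = append(meta.Chunks, cm)
	}
	return chunks, meta, nil
}
```

The returned `FileMetadata` carries no chunk bytes, so it can be sent to the metadata service on its own while the chunks wait for node allocation.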
Node Allocation and Distribution:
Available nodes are selected on the basis of available storage, health status, and replication factor, and are ranked in ascending order.
The chunks are then pushed to the selected nodes, and each node sends an acknowledgment once it has received and saved its chunk.
The metadata service updates the metadata with the final details of the chunk location and ID.
An acknowledgment is sent to the client once the upload is complete.
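One possible shape for the node selection step is sketched below in Go. The `StorageNode` fields are hypothetical, and the ranking key is an assumption: the design says nodes are ranked in ascending order without naming the key, so this sketch ranks by used storage, placing the emptiest nodes first.

```go
package main

import (
	"errors"
	"sort"
)

// StorageNode is a hypothetical snapshot of a node's state.
type StorageNode struct {
	ID        string
	Healthy   bool
	FreeBytes int64
	UsedBytes int64
}

// SelectNodes keeps healthy nodes with room for the chunk, ranks them by
// used storage in ascending order (an assumed ranking key), and returns
// one node per replica.
func SelectNodes(nodes []StorageNode, chunkSize int64, replicas int) ([]StorageNode, error) {
	var eligible []StorageNode
	for _, n := range nodes {
		if n.Healthy && n.FreeBytes >= chunkSize {
			eligible = append(eligible, n)
		}
	}
	if replicas <= 0 || len(eligible) < replicas {
		return nil, errors.New("not enough eligible nodes for the replication factor")
	}
	sort.SliceStable(eligible, func(i, j int) bool {
		return eligible[i].UsedBytes < eligible[j].UsedBytes
	})
	return eligible[:replicas], nil
}
```

The caller would push the chunk to each returned node, wait for the acknowledgments, and then report the final chunk locations to the metadata service.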
Design Considerations
1. Fault Tolerance
Use replication for redundancy at the chunk level.
Employ retry mechanisms for failed chunk uploads.
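A retry mechanism for failed chunk uploads could look like the Go sketch below. The `Retry` helper and its exponential backoff policy are assumptions for illustration; the design does not specify the number of attempts or the delay between them.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Retry calls op up to attempts times, doubling the delay after each failure.
// It returns nil on the first success, or the last error wrapped with context.
func Retry(attempts int, baseDelay time.Duration, op func() error) error {
	if attempts <= 0 {
		return errors.New("attempts must be positive")
	}
	var err error
	delay := baseDelay
	for i := 0; i < attempts; i++ {
		if err = op(); err == nil {
			return nil
		}
		if i < attempts-1 {
			time.Sleep(delay) // no sleep after the final attempt
			delay *= 2
		}
	}
	return fmt.Errorf("gave up after %d attempts: %w", attempts, err)
}
```

A chunk push would be wrapped as `Retry(3, 200*time.Millisecond, func() error { ... })`, where the function body performs the upload to one node. If all attempts fail, the allocator can pick a different node for that replica.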
2. Scalability
Make the Uploader API stateless and use a load balancer to handle high traffic.
Scale storage nodes dynamically using container orchestration (e.g., Kubernetes).
3. Consistency
Ensure eventual consistency in metadata updates.
Validate chunk integrity using checksums during distribution.
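For the integrity check during distribution, a storage node could recompute the checksum before acknowledging a chunk. The Go sketch below assumes the checksum travels with the chunk as a hex-encoded SHA-256 string; the `VerifyChunk` name is hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"strings"
)

// VerifyChunk reports whether data hashes to the expected hex-encoded
// SHA-256 checksum. The comparison ignores hex letter case.
func VerifyChunk(data []byte, wantHex string) bool {
	sum := sha256.Sum256(data)
	return strings.EqualFold(hex.EncodeToString(sum[:]), wantHex)
}
```

A node that gets `false` here would reject the chunk instead of acknowledging it, so the sender can retry.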
4. Security
Use HTTPS for secure file uploads.
Encrypt file chunks during storage and transit if sensitive data is involved.
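Chunk encryption at rest could be done with an authenticated cipher. The Go sketch below uses AES-GCM from the standard library; the choice of AES-GCM, the function names, and the layout that prepends the nonce to the ciphertext are assumptions, as the design does not name an algorithm. Key management is out of scope here.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
)

// newGCM builds an AES-GCM AEAD from a 16, 24, or 32 byte key.
func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// EncryptChunk returns nonce || ciphertext, using a fresh random nonce.
func EncryptChunk(key, plaintext []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	// Seal appends the ciphertext and auth tag to the nonce.
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// DecryptChunk reverses EncryptChunk and fails if the data was modified.
func DecryptChunk(key, sealed []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	if len(sealed) < gcm.NonceSize() {
		return nil, errors.New("ciphertext too short")
	}
	nonce, ct := sealed[:gcm.NonceSize()], sealed[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ct, nil)
}
```

Because GCM authenticates the ciphertext, a corrupted or tampered chunk fails to decrypt, which complements the checksum validation.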
5. Monitoring
Continuously monitor:
Node health.
Storage capacity.
Upload progress.
Tech Stack for Development of Storage Unit
Uploader API: Go Mux
Chunking Service: Go, goroutines for parallelism
Metadata Service: PostgreSQL, Redis (caching for quick lookups)
Node Allocation Service: Custom Go service, Kafka (event-driven allocation)
Storage Nodes: Local file system
Replication: Custom Go service
Health Monitoring: Grafana, Prometheus
Visual Representation