# Storage Unit 2.0

Our initial design for the storage unit was overly complex: the Uploader API handled file chunking and node allocation directly in the client-facing request path, which also increased the API's latency. This page describes the updated file storage workflow.

**Workflow Design**

1. **File Upload:**
   * The client uploads the file to the Uploader API.
   * The Uploader API validates the file and assigns it a unique File ID.
   * The uploader service selects an entry node based on current load.
   * The file is pushed to the entry node for temporary storage.
2. **Chunking and Metadata Creation:**
   * On the entry node, the file is divided into chunks of a configured chunk size. Each chunk is assigned a Chunk ID and a SHA-256 checksum for integrity verification.
   * Metadata for the file is created:

     ```go
     // Object is the metadata record kept for each uploaded file.
     type Object struct {
         ID          string   // unique File ID assigned by the Uploader API
         Chunks      int      // number of chunks the file was split into
         Checksums   []string // SHA-256 checksum per chunk
         ChunkIDs    []string // Chunk ID per chunk
         EntryNodeID string   // entry node holding the file temporarily
     }
     ```
   * The metadata is sent to the metadata management service and recorded in the metadata registry.
3. **Node Allocation and Distribution**
   * Available nodes are selected based on free storage, health status, and the replication factor, and ranked accordingly (least-loaded first).
   * The chunks are pushed to the selected storage nodes, and each node sends an acknowledgment once it has received and persisted its chunks.
   * The metadata service updates the record with the final chunk IDs and locations.
   * An acknowledgment is sent to the client confirming upload completion.
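The chunking and checksum step above can be sketched as follows. This is a minimal illustration, assuming in-memory data; the function and field names (`ChunkData`, `Chunk`) are illustrative, not the service's actual API:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Chunk pairs a piece of the file with its Chunk ID and SHA-256
// checksum, mirroring the workflow's step 2.
type Chunk struct {
	ID       string
	Data     []byte
	Checksum string
}

// ChunkData splits raw file bytes into fixed-size chunks and
// computes a SHA-256 checksum for each one.
func ChunkData(fileID string, data []byte, chunkSize int) []Chunk {
	var chunks []Chunk
	for i := 0; i < len(data); i += chunkSize {
		end := i + chunkSize
		if end > len(data) {
			end = len(data)
		}
		sum := sha256.Sum256(data[i:end])
		chunks = append(chunks, Chunk{
			ID:       fmt.Sprintf("%s-%d", fileID, i/chunkSize),
			Data:     data[i:end],
			Checksum: hex.EncodeToString(sum[:]),
		})
	}
	return chunks
}

func main() {
	// 10 bytes with a chunk size of 4 yields 3 chunks (4+4+2 bytes).
	chunks := ChunkData("file-42", make([]byte, 10), 4)
	fmt.Println(len(chunks)) // 3
}
```

In the real service the entry node would stream the file from disk rather than hold it fully in memory.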

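The node-selection rule in step 3 can be sketched as below. The exact ranking key is not fully specified above, so this sketch filters on health and ranks by free storage (most free first) as a proxy for load; all names are illustrative:

```go
package main

import (
	"fmt"
	"sort"
)

// Node captures the signals the allocation service ranks on.
type Node struct {
	ID        string
	FreeBytes int64
	Healthy   bool
}

// selectNodes filters out unhealthy nodes, ranks the rest by free
// storage, and returns up to replicationFactor nodes to receive chunks.
func selectNodes(nodes []Node, replicationFactor int) []Node {
	healthy := make([]Node, 0, len(nodes))
	for _, n := range nodes {
		if n.Healthy {
			healthy = append(healthy, n)
		}
	}
	sort.Slice(healthy, func(i, j int) bool {
		return healthy[i].FreeBytes > healthy[j].FreeBytes
	})
	if len(healthy) > replicationFactor {
		healthy = healthy[:replicationFactor]
	}
	return healthy
}

func main() {
	nodes := []Node{
		{"n1", 100, true},
		{"n2", 500, true},
		{"n3", 900, false}, // unhealthy, excluded despite most free space
		{"n4", 300, true},
	}
	for _, n := range selectNodes(nodes, 2) {
		fmt.Println(n.ID) // prints n2 then n4
	}
}
```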
**Design Considerations**

**1. Fault Tolerance**

* Use replication for redundancy at the chunk level.
* Employ retry mechanisms for failed chunk uploads.

**2. Scalability**

* Make the Uploader API stateless and use a load balancer to handle high traffic.
* Scale storage nodes dynamically using container orchestration (e.g., Kubernetes).

**3. Consistency**

* Ensure eventual consistency in metadata updates.
* Validate chunk integrity using checksums during distribution.

**4. Security**

* Use HTTPS for secure file uploads.
* Encrypt file chunks at rest and in transit when sensitive data is involved.
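Chunk-level encryption at rest could be done with AES-256-GCM from the standard library, as in this sketch. The document does not specify a cipher or key-management scheme, so this is one possible approach; function names are illustrative:

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"fmt"
	"io"
)

// encryptChunk seals a chunk with AES-256-GCM, prepending the random
// nonce so the sealed blob is self-describing for later decryption.
func encryptChunk(key, plaintext []byte) ([]byte, error) {
	block, err := aes.NewCipher(key) // key must be 32 bytes for AES-256
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := io.ReadFull(rand.Reader, nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// decryptChunk reverses encryptChunk, verifying the GCM auth tag.
func decryptChunk(key, sealed []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	nonce, ct := sealed[:gcm.NonceSize()], sealed[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ct, nil)
}

func main() {
	key := make([]byte, 32) // demo key; use a real KMS-managed key in practice
	sealed, _ := encryptChunk(key, []byte("chunk data"))
	back, _ := decryptChunk(key, sealed)
	fmt.Println(string(back)) // chunk data
}
```

GCM also authenticates the ciphertext, so tampering with a stored chunk is detected at decryption time.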

**5. Monitoring**

* Continuously monitor:
  * Node health.
  * Storage capacity.
  * Upload progress.

**Tech Stack for the Storage Unit**

| Component               | Tech                                               |
| ----------------------- | -------------------------------------------------- |
| Uploader API            | Go (Mux router)                                    |
| Chunking Service        | Go, goroutines for parallelism                     |
| Metadata Service        | PostgreSQL, Redis (caching for quick lookups)      |
| Node Allocation Service | Custom Go service, Kafka (event-driven allocation) |
| Storage Nodes           | Local file system                                  |
| Replication             | Custom Go service                                  |
| Health Monitoring       | Grafana, Prometheus                                |

**Visual Representation**

```
Client --> Uploader API --> Entry Node --> Metadata Service
                                     |
                                     +--> Chunking Service
                                          |
                                          +--> Node Allocation Service
                                               |
                                               +--> Storage Nodes
```

