Spring Boot MinIO Integration: Lab and Exercises


This document provides a hands-on lab to integrate Spring Boot with MinIO. It includes step-by-step instructions for setting up MinIO using Docker, configuring a Spring Boot application, and testing it using Postman. Exercises are included to reinforce learning.

Lab Objectives

  1. Set up MinIO using Docker.

  2. Integrate MinIO with a Spring Boot application.

  3. Implement CRUD operations for file management.

  4. Test the implementation using Postman.

Prerequisites

  • Docker: Ensure Docker is installed and running.

  • Java: Install Java 17+.

  • Spring Boot: Use Spring Boot 3.x.

  • Postman: Install Postman for API testing.

  • Maven: Install Maven for dependency management.

Step 1: Set Up MinIO Using Docker

  1. Pull the MinIO Docker image:

    docker pull minio/minio
  2. Run the MinIO container:

    docker run -p 9000:9000 -p 9001:9001 \
      -e "MINIO_ROOT_USER=YOUR_ACCESS_KEY" \
      -e "MINIO_ROOT_PASSWORD=YOUR_SECRET_KEY" \
      --name minio \
      -v $(pwd)/data:/data \
      -v $(pwd)/config:/root/.minio \
      minio/minio server /data --console-address ":9001"
  3. Access the MinIO web UI at http://localhost:9001.

  4. Log in using the credentials provided as MINIO_ROOT_USER and MINIO_ROOT_PASSWORD.

  5. Create a bucket named mybucket.

Step 2: Create Spring Boot Project

  1. Generate a Spring Boot project from Spring Initializr with the following dependencies:

    • Spring Web

    • Spring Boot DevTools

    • Lombok

    • MinIO SDK (manually added as a dependency)

  2. Add the MinIO SDK dependency in pom.xml:

    <dependency>
        <groupId>io.minio</groupId>
        <artifactId>minio</artifactId>
        <version>8.5.2</version>
    </dependency>

Step 3: Configure MinIO in Spring Boot

  1. Add MinIO configuration properties in application.properties:
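    A minimal example follows; the property names (minio.url, minio.access-key, minio.secret-key, minio.bucket) are a suggested convention for this lab, not names the MinIO SDK reads on its own:

    minio.url=http://localhost:9000
    minio.access-key=YOUR_ACCESS_KEY
    minio.secret-key=YOUR_SECRET_KEY
    minio.bucket=mybucket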

  2. Create a configuration class:
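    A minimal sketch of the configuration class, assuming the property names shown above; it exposes one shared MinioClient bean for the application:

    package com.example.minio;

    import io.minio.MinioClient;
    import org.springframework.beans.factory.annotation.Value;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;

    @Configuration
    public class MinioConfig {

        @Value("${minio.url}")
        private String url;

        @Value("${minio.access-key}")
        private String accessKey;

        @Value("${minio.secret-key}")
        private String secretKey;

        // Builds the client once; controllers and services inject this bean
        @Bean
        public MinioClient minioClient() {
            return MinioClient.builder()
                    .endpoint(url)
                    .credentials(accessKey, secretKey)
                    .build();
        }
    }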


Step 4: Implement MinIO Features

Implement the following operations (a hedged controller sketch covering all four follows this list):

1. File Upload

2. List Files

3. Download File

4. Delete File
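
A minimal sketch of a controller implementing all four operations with the MinIO Java SDK (8.x API). The class, package, and property names are illustrative assumptions, and error handling is reduced to throws Exception for brevity:

    package com.example.minio;

    import io.minio.*;
    import io.minio.messages.Item;
    import org.springframework.beans.factory.annotation.Value;
    import org.springframework.core.io.InputStreamResource;
    import org.springframework.http.HttpHeaders;
    import org.springframework.http.ResponseEntity;
    import org.springframework.web.bind.annotation.*;
    import org.springframework.web.multipart.MultipartFile;

    import java.util.ArrayList;
    import java.util.List;

    @RestController
    @RequestMapping("/api/files")
    public class FileController {

        private final MinioClient minioClient;

        @Value("${minio.bucket}")   // assumed property name, see Step 3
        private String bucket;

        public FileController(MinioClient minioClient) {
            this.minioClient = minioClient;
        }

        // POST /api/files/upload — stores the multipart file in the bucket
        @PostMapping("/upload")
        public ResponseEntity<String> upload(@RequestParam("file") MultipartFile file) throws Exception {
            minioClient.putObject(PutObjectArgs.builder()
                    .bucket(bucket)
                    .object(file.getOriginalFilename())
                    .stream(file.getInputStream(), file.getSize(), -1)
                    .contentType(file.getContentType())
                    .build());
            return ResponseEntity.ok("Uploaded: " + file.getOriginalFilename());
        }

        // GET /api/files/list — returns the object names in the bucket
        @GetMapping("/list")
        public List<String> list() throws Exception {
            List<String> names = new ArrayList<>();
            for (Result<Item> result : minioClient.listObjects(
                    ListObjectsArgs.builder().bucket(bucket).build())) {
                names.add(result.get().objectName());
            }
            return names;
        }

        // GET /api/files/download/{filename} — streams the object back to the caller
        @GetMapping("/download/{filename}")
        public ResponseEntity<InputStreamResource> download(@PathVariable String filename) throws Exception {
            GetObjectResponse stream = minioClient.getObject(
                    GetObjectArgs.builder().bucket(bucket).object(filename).build());
            return ResponseEntity.ok()
                    .header(HttpHeaders.CONTENT_DISPOSITION, "attachment; filename=\"" + filename + "\"")
                    .body(new InputStreamResource(stream));
        }

        // DELETE /api/files/delete/{filename} — removes the object
        @DeleteMapping("/delete/{filename}")
        public ResponseEntity<String> delete(@PathVariable String filename) throws Exception {
            minioClient.removeObject(
                    RemoveObjectArgs.builder().bucket(bucket).object(filename).build());
            return ResponseEntity.ok("Deleted: " + filename);
        }
    }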


Step 5: Test with Postman

1. File Upload

  • Endpoint: POST /api/files/upload

  • Headers: Content-Type: multipart/form-data

  • Body: Form-data with key file and value as the file to upload.

2. List Files

  • Endpoint: GET /api/files/list

  • Headers: None

3. Download File

  • Endpoint: GET /api/files/download/{filename}

  • Headers: None

  • Response: Downloaded file.

4. Delete File

  • Endpoint: DELETE /api/files/delete/{filename}

  • Headers: None

  • Response: Confirmation message.
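If you prefer the command line to Postman, equivalent curl calls look like this (assuming the Spring Boot app listens on localhost:8080 and a local file report.pdf exists):

    curl -F "file=@report.pdf" http://localhost:8080/api/files/upload
    curl http://localhost:8080/api/files/list
    curl -o report.pdf http://localhost:8080/api/files/download/report.pdf
    curl -X DELETE http://localhost:8080/api/files/delete/report.pdf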

Import "File Management API.postman_collection.json" into Postman.

What is Erasure Coding?

Erasure Coding (EC) is a data protection and storage technique that splits data into multiple fragments, encodes it with redundant data pieces, and stores them across different storage nodes or disks. This allows data to be reconstructed even if some of the fragments are lost or unavailable, making it an efficient and reliable way to ensure data durability and fault tolerance.


How Erasure Coding Works:

  1. Data Fragmentation:

    • The original data is divided into smaller chunks or fragments. For example, a file of 1 GB might be divided into four 250 MB chunks.

  2. Redundancy Encoding:

    • Using mathematical algorithms (e.g., Reed-Solomon codes), redundant chunks (parity fragments) are created from the original data chunks.

    • These redundant fragments do not duplicate the data but encode it in a way that allows any lost chunks to be reconstructed.

  3. Storage Across Nodes/Disks:

    • Both the original and redundant chunks are distributed across different storage nodes or disks.

    • For example, in a 4+2 EC configuration, there are 4 data fragments and 2 parity fragments, stored across 6 disks or nodes.

  4. Data Reconstruction:

    • If one or more fragments are lost (e.g., due to a disk failure), the missing data can be reconstructed using the remaining fragments and the parity fragments.
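
To make the reconstruction idea concrete, here is a toy Java example using a single XOR parity block (RAID-5 style). Real systems such as MinIO use Reed-Solomon codes, which generalize this idea to multiple parity blocks:

    public class XorParityDemo {
        public static void main(String[] args) {
            byte[] d1 = "ABCD".getBytes();
            byte[] d2 = "EFGH".getBytes();

            // Parity block: p[i] = d1[i] XOR d2[i]
            byte[] p = new byte[d1.length];
            for (int i = 0; i < d1.length; i++) p[i] = (byte) (d1[i] ^ d2[i]);

            // Suppose d2 is lost: recover it as d2[i] = d1[i] XOR p[i]
            byte[] recovered = new byte[d1.length];
            for (int i = 0; i < d1.length; i++) recovered[i] = (byte) (d1[i] ^ p[i]);

            System.out.println(new String(recovered)); // prints "EFGH"
        }
    }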


Erasure Coding vs. RAID:

Feature            | Erasure Coding                   | RAID
Redundancy         | Uses parity fragments            | Uses parity or mirroring
Storage Efficiency | High (less redundancy data)      | Moderate to low
Fault Tolerance    | Can tolerate multiple failures   | Limited tolerance depending on RAID level
Rebuild Time       | Faster (only specific fragments) | Slower (rebuilds entire disk)
Scalability        | Highly scalable                  | Limited scalability


Advantages of Erasure Coding:

  1. High Fault Tolerance:

    • EC can tolerate the loss of multiple disks or nodes depending on the configuration, ensuring high reliability.

  2. Efficient Use of Storage:

    • Unlike full data replication (e.g., 3x replication, which requires 300% storage), EC provides redundancy with significantly lower overhead (e.g., 150% or 125% depending on the configuration).

  3. Data Durability:

    • EC ensures data integrity over time, even in environments prone to disk failures.

  4. Cost-Effectiveness:

    • By using less redundant storage compared to replication, EC reduces storage costs for large-scale systems.


Disadvantages of Erasure Coding:

  1. Increased Compute Overhead:

    • Encoding and decoding processes require significant computational resources, especially for large-scale systems.

  2. Latency:

    • The process of reconstructing lost data can introduce latency, which may not be ideal for real-time systems.

  3. Complexity:

    • Implementation and management of EC require advanced knowledge and proper configuration.

  4. Small File Overhead:

    • For small files, the overhead of EC may outweigh the benefits compared to simpler redundancy techniques.


Use Cases of Erasure Coding:

  1. Object Storage:

    • Widely used in object storage systems like MinIO, Amazon S3, and Ceph to provide high durability and fault tolerance.

  2. Cold Storage:

    • Ideal for archival and backup storage systems where data access is infrequent but durability is critical.

  3. Cloud Storage:

    • Used in distributed cloud storage environments to protect data across geographically dispersed data centers.

  4. Big Data Applications:

    • Erasure coding is essential for data-intensive applications like analytics, where data must remain available and durable across failures.


Erasure Coding in MinIO:

  • MinIO uses erasure coding to protect data stored across multiple drives or nodes.

  • The configuration specifies the number of data and parity blocks (e.g., 4+2, 6+3), allowing MinIO to reconstruct data even if multiple drives fail.

  • MinIO automatically handles encoding, storage, and reconstruction of data in the background, ensuring high availability and durability.


Example: Erasure Coding in MinIO

Suppose you configure MinIO with 8 drives using a 4+2 erasure coding scheme:

  1. The data is divided into 4 chunks.

  2. 2 parity chunks are created.

  3. All 6 chunks are distributed across the 8 drives.

  4. MinIO can tolerate the failure of up to 2 drives and still reconstruct the original data.


Let's take an example of using 4 drives with MinIO, configured with two storage classes:

  1. Standard Storage Class:

    • Parity: 2

    • Tolerates failure of 2 drives while still allowing data recovery.

  2. Reduced Redundancy Storage Class:

    • Parity: 1

    • Tolerates failure of 1 drive while still allowing data recovery.


How Erasure Coding Works in This Setup

1. Definitions:

  • Data Blocks (D): Chunks of original data.

  • Parity Blocks (P): Chunks that contain redundant information to help reconstruct lost data.

2. Storage Layout:

In this example, with 4 drives:

  • Standard Storage Class (Parity = 2):

    • Total blocks = 4 (2 data + 2 parity)

    • Data blocks are divided into 2 chunks, and 2 parity blocks are created.

    • These 4 blocks are distributed across the 4 drives.

  • Reduced Redundancy Storage Class (Parity = 1):

    • Total blocks = 3 (2 data + 1 parity)

    • Data blocks are divided into 2 chunks, and 1 parity block is created.

    • These 3 blocks are distributed across the 4 drives, with one drive unused.


Detailed Example

Scenario 1: Standard Storage Class

  • Parity = 2 means MinIO can tolerate the failure of any 2 drives.

  • Data is stored as follows across 4 drives:

Drive 1 | Drive 2 | Drive 3  | Drive 4
Data 1  | Data 2  | Parity 1 | Parity 2

  • If any two drives fail (say Drive 3 and Drive 4), MinIO can reconstruct the missing blocks from the two drives that remain.

Scenario 2: Reduced Redundancy Storage Class

  • Parity = 1 means MinIO can tolerate the failure of only 1 drive.

  • Data is stored as follows across 4 drives:

Drive 1 | Drive 2 | Drive 3  | Drive 4
Data 1  | Data 2  | Parity 1 | (Unused)

  • If Drive 3 fails, MinIO can reconstruct the missing parity block using the remaining drives.

  • If any additional drive fails, data recovery is not possible because only 1 parity block was allocated.


Key Differences Between Parity Levels

Feature                  | Standard Storage Class (P=2) | Reduced Redundancy Class (P=1)
Drives Required          | 4                            | 3 (1 unused in 4-drive setup)
Data Blocks              | 2                            | 2
Parity Blocks            | 2                            | 1
Tolerated Drive Failures | 2                            | 1
Storage Efficiency       | 50% (2 data, 2 parity)       | 66.67% (2 data, 1 parity)
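
In general, the storage efficiency of a scheme with D data blocks and P parity blocks is D / (D + P): here, 2 / (2 + 2) = 50% for the standard class and 2 / (2 + 1) ≈ 66.67% for reduced redundancy.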


How to Configure This in MinIO

  1. Set the Parity Levels:

When deploying MinIO, you can configure the parity levels for each storage class in the MINIO_STORAGE_CLASS_STANDARD and MINIO_STORAGE_CLASS_RRS environment variables.

For example:


docker run -p 9000:9000 -p 9001:9001 \
  -e "MINIO_ROOT_USER=YOUR_ACCESS_KEY" \
  -e "MINIO_ROOT_PASSWORD=YOUR_SECRET_KEY" \
  -e "MINIO_STORAGE_CLASS_STANDARD=EC:2" \
  -e "MINIO_STORAGE_CLASS_RRS=EC:1" \
  --name minio \
  -v /mnt/disk1:/data1 \
  -v /mnt/disk2:/data2 \
  -v /mnt/disk3:/data3 \
  -v /mnt/disk4:/data4 \
  minio/minio server /data1 /data2 /data3 /data4

  2. Behavior with Drive Failures:

  • Standard Class: MinIO tolerates up to 2 drive failures before data becomes unavailable.

  • Reduced Redundancy Class: MinIO tolerates only 1 drive failure.


Simulating Disk Failure

To test these configurations, you can:

  1. Unmount a volume (simulate disk failure).

  2. Delete data from a volume in the container.

Use the docker logs minio command to verify how MinIO handles the failure. It will display error logs and indicate which drives or data are missing.
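
The MinIO Client (mc) can also report per-drive status. A short sketch, assuming an alias named local pointing at the container:

    mc alias set local http://localhost:9000 YOUR_ACCESS_KEY YOUR_SECRET_KEY
    mc admin info local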

What is Bitrot Protection?

Bitrot (or data degradation) refers to the gradual corruption of data over time due to physical storage medium issues, silent disk errors, or electromagnetic interference. Bitrot protection is a mechanism to detect and prevent this corruption, ensuring the integrity of data stored over extended periods.

MinIO implements bitrot protection by using cryptographic hash algorithms (e.g., SHA-256) to generate checksums for stored data. These checksums are stored alongside the data and are verified each time the data is read, ensuring its integrity.


How Bitrot Protection Works

  1. Checksum Generation:

    • When data is written to storage, MinIO computes a checksum (e.g., SHA-256 hash) for the data.

    • This checksum is stored alongside the data on the disk.

  2. Verification During Reads:

    • Whenever data is read from storage, MinIO recalculates the checksum for the retrieved data and compares it with the stored checksum.

    • If the checksums match, the data is intact.

    • If the checksums don't match, it indicates data corruption.

  3. Data Reconstruction:

    • If MinIO detects corrupted data, it reconstructs the corrupted data using redundant data and parity blocks (if erasure coding is enabled).
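
The read-time verification step can be illustrated in plain Java. This is a conceptual sketch of the write-time/read-time checksum comparison, not MinIO's internal implementation:

    import java.security.MessageDigest;
    import java.util.Arrays;

    public class ChecksumCheck {
        static byte[] sha256(byte[] data) throws Exception {
            return MessageDigest.getInstance("SHA-256").digest(data);
        }

        public static void main(String[] args) throws Exception {
            byte[] chunk = "binary data (part 2)".getBytes();
            byte[] stored = sha256(chunk);   // checksum computed at write time

            // At read time: recompute and compare against the stored checksum
            boolean intact = Arrays.equals(stored, sha256(chunk));
            System.out.println(intact ? "chunk intact" : "bitrot detected, reconstruct from parity");
        }
    }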


Example of Bitrot Protection

Scenario: Storing a File in MinIO

  1. File Upload:

    • You upload a file named report.pdf to MinIO.

    • MinIO splits the file into smaller chunks (e.g., 1 MB each) if necessary.

    • For each chunk, MinIO calculates a checksum (e.g., SHA-256) and stores it alongside the chunk in the storage backend.

    Example:

    Chunk ID | Chunk Data           | Checksum (SHA-256)
    chunk1   | Binary data (part 1) | abc123...
    chunk2   | Binary data (part 2) | def456...

  2. File Retrieval:

    • When you download report.pdf, MinIO retrieves the chunks and their stored checksums.

    • For each chunk:

      • MinIO recalculates the checksum of the retrieved data.

      • Compares it with the stored checksum.

    • If the checksums match for all chunks, the file is reassembled and returned.

Scenario: Bitrot Detected

  1. Corruption Event:

    • One of the storage disks experiences an error, corrupting chunk2.

    • The corrupted data is read from the disk, and its checksum is recalculated.

    Example:

    • Stored checksum for chunk2: def456...

    • Recalculated checksum for chunk2: xyz789...

  2. Detection:

    • MinIO detects that the recalculated checksum doesn't match the stored checksum.

    • It identifies that chunk2 is corrupted.

  3. Data Reconstruction:

    • If MinIO is configured with erasure coding, it reconstructs chunk2 using redundant data and parity blocks.

    • The reconstructed chunk2 is verified, and the correct data is returned to the user.

    Example (with 4+2 erasure coding):

    • MinIO uses the remaining 3 data blocks and 2 parity blocks to recreate the corrupted block.

  4. Automatic Healing:

    • MinIO automatically replaces the corrupted chunk2 on the storage disk with the reconstructed data.

    • A new checksum is generated and stored.


Why Bitrot Protection is Important

  1. Data Integrity:

    • Ensures that the data you retrieve is exactly the same as the data you stored, even after years.

  2. Fault Tolerance:

    • Combined with erasure coding, MinIO can recover from corruption without data loss.

  3. Automatic Detection and Healing:

    • Corruption is detected and repaired automatically, reducing the risk of prolonged data degradation.

  4. Critical for Long-Term Storage:

    • Ideal for archival systems, backup solutions, and cloud storage, where data integrity is paramount over time.


Benefits of MinIO's Bitrot Protection

  • Cryptographic Hashing: Uses robust cryptographic algorithms (e.g., SHA-256) for checksum calculations.

  • Real-Time Integrity Checks: Verifies data integrity every time data is read.

  • Self-Healing: Automatically reconstructs corrupted data using redundancy.

  • Highly Scalable: Works seamlessly in distributed setups across multiple drives or nodes.


Conclusion

Bitrot protection ensures that data stored in MinIO remains accurate and reliable, even in the face of disk failures or silent data corruption. By leveraging checksums, erasure coding, and automatic healing, MinIO provides a robust solution for long-term, fault-tolerant storage.


Enhanced Lab: Demonstrating Advanced MinIO Features


1. Erasure Coding

Erasure coding protects data from drive failures by splitting and storing data across multiple drives.

Steps:

Create the Docker Volumes (if not created yet):

Before running the modified command, ensure that you have created the Docker volumes if they don't exist:
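
    docker volume create minio_volume1
    docker volume create minio_volume2
    docker volume create minio_volume3
    docker volume create minio_volume4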

  1. Modify the Docker setup to enable erasure coding. After running the command below, MinIO uses the Docker volumes instead of local disk directories for its data storage:
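    A sketch (remove any earlier container first with docker rm -f minio); erasure coding activates automatically once multiple drives are provided, and the storage-class variable sets the parity level as described above:

    docker run -p 9000:9000 -p 9001:9001 \
      -e "MINIO_ROOT_USER=YOUR_ACCESS_KEY" \
      -e "MINIO_ROOT_PASSWORD=YOUR_SECRET_KEY" \
      -e "MINIO_STORAGE_CLASS_STANDARD=EC:2" \
      --name minio \
      -v minio_volume1:/data1 \
      -v minio_volume2:/data2 \
      -v minio_volume3:/data3 \
      -v minio_volume4:/data4 \
      minio/minio server /data{1...4} --console-address ":9001"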

  2. Simulate a disk failure by stopping one volume:

Steps for Simulating the Disk Failure:

  1. Identify the Volume: First, identify the Docker volume being used by MinIO for data storage. This is necessary to understand where the data resides and which volume is being used.
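
    docker volume ls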

    This will list all the Docker volumes. You should see the volume used by MinIO, such as minio_volume1, minio_volume2, etc.

  2. Find the Mount Path for the Volume: Inspect the volume to get its mount path on the host machine:
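
    docker volume inspect minio_volume1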

    Look for the "Mountpoint" field in the output, which indicates where the volume is mounted on the host.

  3. Test file uploads/downloads using the existing Spring Boot APIs to confirm there is no data loss.

Command Structure:

1. docker run:

Runs a container from a Docker image.

2. -p 9000:9000 -p 9001:9001:

Maps ports on the host to ports in the container:

  • 9000:9000 → Maps the host's port 9000 to the container's port 9000 (used for the MinIO API).

  • 9001:9001 → Maps the host's port 9001 to the container's port 9001 (used for the MinIO web console).

3. -e "MINIO_ROOT_USER=YOUR_ACCESS_KEY":

Sets the environment variable MINIO_ROOT_USER inside the container to YOUR_ACCESS_KEY. This is the access key for authentication.

4. -e "MINIO_ROOT_PASSWORD=YOUR_SECRET_KEY":

Sets the environment variable MINIO_ROOT_PASSWORD inside the container to YOUR_SECRET_KEY. This is the secret key for authentication.

5. --name minio:

Assigns the name minio to the container. You can reference the container by this name for future commands.

6. -v /mnt/disk1:/data1 to -v /mnt/disk4:/data4:

Mounts the host's directories (/mnt/disk1, /mnt/disk2, etc.) to directories in the container (/data1, /data2, etc.). These are storage volumes for MinIO.

7. minio/minio:

Specifies the Docker image to use, which in this case is the official MinIO image.

8. server /data{1...4}:

  • Runs the MinIO server across the specified directories (/data1, /data2, /data3, /data4), enabling erasure coding over them (single-node, multi-drive mode).

  • The {1...4} syntax expands to /data1 /data2 /data3 /data4.


Summary:

This command starts a MinIO server in single-node, multi-drive (erasure-coded) mode with:

  1. Two exposed ports (9000 for API, 9001 for the web console).

  2. Authentication credentials provided via environment variables.

  3. Four storage volumes mapped from host directories.

  4. The MinIO container named minio.

Before Disk Failure (Online: 4 Offline: 0)

After Disk Failure (Online: 3 Offline: 1)

 


2. Bitrot Protection

MinIO uses checksums to detect and repair corrupted data.

Steps:

  1. Upload a file through the Spring Boot API.

  2. Manually corrupt the uploaded file in the backend storage.

  3. Run the MinIO heal command:
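    One hedged option via the MinIO Client, assuming an alias named local and the bucket mybucket:

    mc alias set local http://localhost:9000 YOUR_ACCESS_KEY YOUR_SECRET_KEY
    mc admin heal -r local/mybucket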

  4. Verify the integrity of the file using the API.


3. Encryption

Enable server-side encryption for data protection.

Configuration:

  1. Add encryption configuration in MinIO:
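     A simple single-key option for this lab (an assumption; production setups typically use MinIO KES) is the MINIO_KMS_SECRET_KEY variable, formatted as <key-name>:<base64 32-byte key>:

     # Generate a random 32-byte base64 key (the key name "my-minio-key" is illustrative)
     head -c 32 /dev/urandom | base64

     docker run ... \
       -e "MINIO_KMS_SECRET_KEY=my-minio-key:<base64-key-from-above>" \
       minio/minio server /data --console-address ":9001"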

     

  2. Update the Spring Boot application to encrypt files on upload:
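     A sketch of requesting SSE-S3 encryption per upload with the MinIO Java SDK (requires the KMS configuration above):

     // In the upload handler: ask MinIO to encrypt the object server-side
     minioClient.putObject(PutObjectArgs.builder()
             .bucket(bucket)
             .object(file.getOriginalFilename())
             .stream(file.getInputStream(), file.getSize(), -1)
             .sse(new ServerSideEncryptionS3())   // io.minio.ServerSideEncryptionS3
             .build());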

  3. Verify files are stored in encrypted format by examining backend storage.


4. Continuous Replication

Replicate data to another MinIO server.

Setup:

  1. Start a second MinIO instance:
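    A sketch: a second container on different host ports (the container name and ports are illustrative):

    docker run -p 9002:9000 -p 9003:9001 \
      -e "MINIO_ROOT_USER=YOUR_ACCESS_KEY" \
      -e "MINIO_ROOT_PASSWORD=YOUR_SECRET_KEY" \
      --name minio-replica \
      -v $(pwd)/replica-data:/data \
      minio/minio server /data --console-address ":9001"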

  2. Configure replication between the primary and replica servers:
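    A sketch using mc bucket replication. Versioning must be enabled on both buckets, the exact flags vary across mc versions (check mc replicate add --help), and the primary server must be able to reach the replica's address (localhost only works outside Docker networking; otherwise use an address such as host.docker.internal):

    mc alias set primary http://localhost:9000 YOUR_ACCESS_KEY YOUR_SECRET_KEY
    mc alias set replica http://localhost:9002 YOUR_ACCESS_KEY YOUR_SECRET_KEY
    mc mb replica/mybucket
    mc version enable primary/mybucket
    mc version enable replica/mybucket
    mc replicate add primary/mybucket \
      --remote-bucket "http://YOUR_ACCESS_KEY:YOUR_SECRET_KEY@localhost:9002/mybucket"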

  3. Test replication by uploading files to the primary bucket and verifying their presence on the replica.


5. Global Federation

Federate MinIO servers across regions into a single namespace.

Setup:

  1. Deploy two MinIO instances in different regions.

  2. Configure federation in the MinIO console by linking the servers under a unified namespace.

  3. Use the Spring Boot APIs to interact with the federated namespace.


6. Multi-Cloud Gateway

Use MinIO as a gateway to AWS S3 or another cloud provider.

Steps:

  1. Configure MinIO as an S3 gateway:
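    Note that MinIO removed gateway mode from its 2022 and later releases; on an older minio/minio image the gateway was started roughly like this (a sketch under that assumption, with your AWS credentials supplied as the root user and password):

    docker run -p 9000:9000 \
      -e "MINIO_ROOT_USER=AWS_ACCESS_KEY" \
      -e "MINIO_ROOT_PASSWORD=AWS_SECRET_KEY" \
      minio/minio gateway s3 https://s3.amazonaws.com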

  2. Update the Spring Boot configuration to interact with the gateway endpoint.

  3. Test the gateway by uploading and downloading files through Spring Boot APIs.


Exercises

  1. Erasure Coding Recovery: Test uploading files during a simulated disk failure and confirm data recovery.

  2. Corruption Simulation: Introduce bitrot and validate repair using MinIO healing commands.

  3. Replication Failover: Test failover between replicated servers during primary server downtime.

  4. Federated Operations: Perform CRUD operations on the federated namespace and verify data availability across regions.

  5. Multi-Cloud Gateway Usage: Test MinIO as a gateway to AWS S3 by uploading and accessing files.