Object Storage
A summary of Object Storage — what it is, why it exists, and how it works under the hood.
[01] Definition
A storage system that assigns a unique ID (identifiable by the user or server) to data regardless of its physical location, stores it in a container or bucket, and retrieves it on demand using that ID.
Modern data broadly falls into two categories:
| Type | Description | Examples |
|---|---|---|
| Structured data | Classifiable by defined criteria | Student ID, name, age, records |
| Unstructured data | Cannot be neatly classified | Large images, videos, logs, backups |
Traditional storage approaches handle structured data well:
- File system storage — file-based, hierarchical directories
- Block storage — block-based with sectors and tracks
Neither scales cleanly for unstructured data at volume. Object storage was designed specifically for this problem.
Analogy: Think of a valet parking service. Your car (the data object) is handed off with a receipt (the unique ID). You don’t know where the car was parked — and you don’t need to. Hand in the receipt and your car comes back.
[02] Advantages
2-1. Scalability
- No need to partition storage volumes
- Holds data regardless of total capacity
- Adding new nodes is horizontal — no structural changes required
2-2. Efficiency
- No hierarchical directory system means no inter-layer bottleneck
- Metadata is stored alongside each object, enabling rich, fast lookups
- Retrieval is by ID, not by path traversal
2-3. Availability
| Feature | Benefit |
|---|---|
| Auto-replication | Data copied across multiple nodes automatically |
| Rolling updates | System stays online during upgrades |
| No single point of failure | Any node can serve a request |
[03] How It Works (IDA)
3-1. IDA — Information Dispersal Algorithm
Object storage systems typically use IDA to distribute data reliably across nodes:
- Split — incoming data is broken into pieces (shards)
- Distribute — shards are sent across local or globally distributed storage nodes over the network
- Namespace — all distributed nodes form a single unified namespace; clients see one logical storage pool
1
2
3
4
5
6
7
8
9
Original Data
│
▼
[Shard 1] → Node A (Region: US-East)
[Shard 2] → Node B (Region: EU-West)
[Shard 3] → Node C (Region: AP-South)
│
▼
Single Namespace (client sees one bucket)
A minimum number of shards is sufficient to reconstruct the original object — you don’t need all nodes available simultaneously. This is what gives object storage its high durability.
3-2. Comparison with Other Storage Types
| Feature | File Storage | Block Storage | Object Storage |
|---|---|---|---|
| Structure | Hierarchical | Sector/track | Flat namespace |
| Scalability | Limited | Limited | Virtually unlimited |
| Best for | Small files, OS | Databases, VMs | Media, backups, logs |
| Metadata | Directory + name | Minimal | Rich, extensible |
| Access method | Path | Block address | Unique ID / API |
3-3. Common Use Cases
- Cloud media hosting (images, videos)
- Backup and archival storage
- Data lake storage (log files, analytics data)
- Static website hosting (S3-compatible buckets)
- AI/ML dataset storage
[04] Troubleshooting / Things to Watch
Object storage is eventually consistent in some implementations — after a write, there may be a brief window where a read returns stale data. Design applications accordingly.
- Large object uploads: Use multipart upload APIs for files over a few hundred MB to avoid timeout errors.
- Key naming: In high-throughput scenarios, prefix keys with random characters to avoid hot-spotting on a single shard partition.
- Lifecycle policies: Most object storage platforms support automatic tiering (e.g., move to cold storage after 90 days) — configure these to control costs.
Reference: IBM Tech Forum — Object Storage