Since the inception of the decentralization revolution in 2009, a lot of promising projects have come up and changed the way we see and live in this world.
IPFS lacks an incentivization layer that can help in its mass adoption and hence its ultimate goal to replace HTTP.
Here is where filecoin comes in. Since its announcement filecoin has gained a lot of interest in the community. With the launch of the test network in December 2020, there is a lot you can explore.
There is a lot of information on its tech and economics on the web, which can be confusing as well as overwhelming. So, here we have consolidated all the information available in ONE SINGLE SOURCE.
So, buckle up and make a cup of coffee☕ first, it’s going to be a long one…
First, we will discuss the technical aspect of filecoin and then move to its economic aspect in our next post.
But before diving into the core technical stuff, let’s analyze today’s state of the file storage market.
State of Today’s File Storage Market
Today, Amazon S3 is the juggernaut of file storage on the internet. There’re a number of reasons for this:
- It’s Incredibly Cheap: $0.023 per GB of Storage. 0.04 cents per 10,000 read requests.
- It’s Insanely Fast.
- It’s reliable: Well, it has seen a few big downtimes, effectively taking down a big part of the internet offline; but it still has 99.9% uptime.
- It’s highly scalable.
- and it provides a great developer experience. It is easily integratable with a suite of other Amazon services for scaling(eg. CloudFront)
In a world where we have such an amazing cloud storage service, any competition will have to perform even better than this, or at least equivalent. At small scales, decentralized networks don’t work well. But if it(IPFS) gets adopted on a large scale(more adoption than BitTorrent) this may prove to be a better version of the Internet and hence will open a totally new economy.
We will divide this into 4 sections:
- An Overview of How Filecoin network works?
- An In-depth Study on Filecoin protocols
- Other Issues(not discussed in whitepaper)
- Possible Improvements to Filecoin protocol
An Overview of How Filecoin network works?
There are 3 groups of users in Filecoin: clients, storage miners & retrieval miners.
Clients pay to store and retrieve data. They can choose from the available service providers. If they want to store the private data, they need to encrypt it before submitting it to the providers.
Storage miners store client’s data for the reward. They decide how much space they are willing to reserve for storage. After the client and storage miner agree on the deal, the miner is obliged to continuously provide proofs that he stores the data. Everyone can look at the proofs and make sure that the storage miner is reliable.
Retrieval miners provide client’s data at their request. They can get the data either from clients or storage miners. Retrieval miners and clients exchange data and coins using micropayments: the data is split into pieces and clients pay a small number of coins for each piece. Retrieval miner can also work as a storage miner.
Finally, the network represents all full nodes that validate the actions of clients and miners. These nodes count the available storage, check the storage proofs, and repair data faults.
Some terms used in the paper:
Pieces: A piece is some part of the data that a client is storing in the decentralized storage network. For example, data(maybe a cat🐈 pic) can be deliberately divided into many pieces and each piece can be stored by a different set of Storage Miners.
Sectors: A sector is some disk space that a Storage Miner provides to the network(it can be thought of as unique id that is associated with a specific part of the disk space of a particular storage provider). Miners store pieces from clients in their sectors and earn tokens for their services. In order to store pieces, Storage Miners must pledge their sectors to the network.
Allocation Table: The AllocTable is a data structure that keeps track of pieces and their assigned sectors. The AllocTable is updated at every block in the ledger and its Merkle root is stored in the latest block. In practice, the table is used to keep the state of the DSN, allowing for quick look-ups during proof verification.
Orders: An order is a statement of intent to request or offer a service. Clients submit bid orders to the markets to request a service (resp. Storage Market for storing data and Retrieval Market for retrieving data) and Miners submit ask orders to offer a service.
Orderbook: Orderbooks are sets of orders. Filecoin maintains separate orderbooks for Storage Market and Retrieval market.
Pledge: A pledge is a commitment to offering storage (specifically a sector(s)) to the network. Storage Miners must submit their pledge to the ledger(filecoin blockchain) in order to start accepting orders in the Storage Market. A pledge consists of the size of the pledged sector and the collateral deposited by the Storage Miner.
Users share their intentions by making orders. Clients submit bid orders, specifying the price they want to pay. Miners submit ask orders, specifying the price they want to receive. When bid and ask orders match, the client and miner both sign a deal order and submit it to the blockchain.
Bid and ask orders together form Storage Market(the market for file storage) and Retrieval Market(the market for file retrieval). Let’s dive deep into these markets and see how they work.
It is a decentralized exchange run by the Network, where all asks and bids are stored in the blockchain for storing data on Filecoin network.
A Client submits a bid order(using PUT protocol, explained in the next section) to the Storage Orderbook. Clients must deposit the coins specified in the order & specify the number of replicas they want to be stored. Clients can either submit multiple orders or specify a replication factor in an order. Higher redundancy(higher replication factor) results in a higher tolerance for storage faults(discussed below).
Storage Miners pledge their storage to the network by depositing collateral via a pledge transaction in the blockchain via Manage.PledgeSector. The collateral(filecoins) is deposited for the time intended to provide the service, and it is returned if the miner generates proofs of storage for the data they committed to store. If some proofs of storage fail, a proportional amount of collateral is lost. Once the pledge transaction appears in the blockchain, miners can offer their storage in the Storage Market: they set their price and add an ask order to the market’s orderbook.
Once the pledge transaction appears in the blockchain (hence in the AllocTable), miners can offer their storage in the Storage Market: they set their price and add an ask order to the market’s orderbook via Put.AddOrders.
When a matching ask & bid order is found(via Put.MatchOrders), the client sends the piece(data) to the miner.
When receiving the piece, miners run Put.ReceivePiece. When the data is received, both the miner and the client sign a deal order and submit it to the blockchain(in Storage Market orderbook).
Storage Miners’ storage is divided into sectors, each sector contains pieces assigned to the miner. The Network keeps track of each Storage Miners’ sector via the allocation table. At this point(when deal order is signed), the Network assigns the data to the miner and makes a note of it in the allocation table.
When a Storage Miner sector is filled, the sector is sealed. Sealing is a slow, sequential operation that transforms the data in a sector into a replica, a unique physical copy of the data that is associated with the public key of the Storage Miner. Sealing is a necessary operation during the Proof-of-Replication(described in Consensus Section below).
When Storage Miners are assigned data, they must repeatedly generate proofs of replication to guarantee they are storing the data (we will talk in detail about these proofs below). Proofs are posted on the blockchain and the Network verifies them.
All the storage allocations are public to every participant in the network. At every block, the Network checks if the required proofs for each assignment are present, checks that they are valid, and acts accordingly:
- if any proof is missing or invalid, the network penalizes the Storage Miners by taking part of their collateral,
- if a large number of proofs are missing or invalid (defined by a system parameter ∆fault), the network considers the Storage Miner faulty, settles the order as failed and reintroduces a new order for the same piece into the market,
- if every Storage Miner storing this piece is faulty, then the piece is lost and the client gets refunded.
It is an off-chain exchange where clients and retrieval miners discover each other in a peer-to-peer way. Once the client and miner agree on the price, they start to exchange data and coins piece by piece using micropayments.
Let's see how it works.
Retrieval Miners announce their pieces by gossiping their ask orders to the network: they set their price and add an ask order to the market’s orderbook.
A client submits a bid order to the Retrieval Market orderbook. Retrieval Miners check if their orders are matched with a corresponding bid order from a client.
Once orders are matched, Retrieval Miners send the piece to the client(miner sends parts of data & client sends micro-payments). When the piece is received, both the miner and the client sign a deal order and submit it to the blockchain.
Below is a diagram showing all the activities taking place in the network.
An In-depth Study on Filecoin Protocols
Filecoin introduces the concept of a decentralized storage network (DSN). DSN is a scheme that describes a network of independent clients and storage providers. DSN aggregates storage offered by multiple independent storage providers and self-coordinate to provide data storage and data retrieval to clients. Coordination is decentralized and does not require trusted parties: the secure operation of these systems is achieved through protocols that coordinate and verify operations carried out by individual parties. DSNs can employ different strategies for coordination, including Byzantine Agreement, gossip protocols, or CRDTs, depending on the requirements of the system.
DSN involves the implementation of 3 functions: put, get, and manage. Put allows clients to store data under unique identifiers. Get allows clients to retrieve data using the identifier. Manage orchestrates the network by measuring space available for rent, auditing providers, and repairing possible data faults. The Manage protocol is run by storage providers often in conjunction with clients or a network of auditors(this involves Byzantine Faults which are discussed below).
DSN has several properties. The first 2 are essentially required.
- Data integrity means that the client always receives the same data that he puts to storage, and storage providers can’t convince a client to take faulty data.
- Retrievability simply means that the client will be able to retrieve his data over time.
Optional Properties of a DSN:
- Public verifiability allows everyone on the network to verify that the data is being stored without knowing the data itself.
- Auditability allows verifying that the data was stored for the right duration of time.
- Incentive-compatibility strives to reward good service providers and punish bad ones.
- Achieving Confidentiality: Clients that desire for their data to be stored privately, must encrypt their data before submitting them to the network.
There are 2 types of possible faults which a DSN should tolerate:
- Management faults: These faults are byzantine faults caused by participants(storage providers, clients & auditors) in the Manage protocol. A DSN scheme relies on the fault tolerance of its underlining Manage protocol. Violations on the faults tolerance assumptions for management faults can compromise liveness and safety of the system. For example, consider a DSN scheme, where the Manage protocol requires Byzantine Agreement(as the nodes can lie about their audits) to audit storage providers(if they are storing all the data that they are supposed to store according to the agreed conditions). In such protocol, the network receives proofs of storage from storage providers and runs Byzantine Agreement(BA) to agree on the validity of these proofs. If the BA tolerates up to f faults out of n total nodes, then our DSN can tolerate f < n/2 faulty nodes. On violations of these assumptions, audits can be compromised, thus rendering the whole system useless.
- Storage Faults: Storage faults are byzantine faults that prevent clients from retrieving the data: i.e. Storage Miners lose their pieces, Retrieval Miners stop serving pieces. A successful Put execution is (f,m)-tolerant if it results in its input data being stored in m independent storage providers (out of n total) and it can tolerate up to f byzantine providers. The parameters f and m depend on protocol implementation; protocol designers can fix f and m or leave the choice to the user, extending Put(data) into Put(data,f,m). A Get execution on stored data is successful if there are fewer than f faulty storage providers. For example, consider a simple scheme, where thePutprotocol is designed such that each storage provider stores all of the data. In this scheme m=n and f=m−1. Is it always f=m−1? No, some schemes can be designed using erasure coding, where each storage providers store a special portion of the data, such that x out of m storage providers are required to retrieve the data; in this case f=m−x.
The Filecoin DSN protocol can be implemented on top of any consensus protocol that allows for verification of the Filecoin’s proofs. Proof-of-Work schemes often require solving puzzles whose solutions are not reusable or require a substantial amount wasteful computation to find.
Non-reusable Work: Most permissionless blockchains require miners to solve a hard computational puzzle, such as inverting a hash function. Often the solutions to these puzzles are useless and do not have any inherent value beyond securing the network. Some blockchains like Ethereum(executing smart contract logic) and Primecoin(finding new prime numbers) have tried to use some of the computational power to do useful work.
Wasteful Work: Solving hard puzzles can be really expensive in terms of the cost of machinery and energy consumed, especially if these puzzles solely rely on computational power. When the mining algorithm is embarrassingly parallel, then the prevalent factor to solve the puzzle is computational power.
Attempts to reduce waste: Ideally, the majority of a network’s resources should be spent on useful work. Some efforts require miners to use more energy-efficient solutions. For example, Spacemint requires miners to dedicate disk space rather than computation; while more energy efficient, these disks are still “wasted” since they are filled with random data. Other efforts replace hard to solve puzzles with a traditional byzantine agreement based on Proof-of-Stake, where stakeholders vote on the next block proportional to their share of currency in the system.
So, instead of wasteful Proof-of-Work computation, the work Filecoin miners do generating Proof-of-Spacetime is what allows them to participate in the consensus.
Useful work: We consider the work done by the miners in a consensus protocol to be useful, if the outcome of the computation is valuable to the network, beyond securing the blockchain.
Filecoin proposes a useful work consensus protocol, where the probability that the network elects a miner to create a new block (we refer to this as the voting power of the miner) is proportional to their storage currently in use in relation to the rest of the network. Filecoin protocol is designed such that miners would rather invest in storage than in computing power to parallelize the mining computation. Miners offer storage and re-use the computation for proof that data is being stored to participate in the consensus.
Modeling Mining Power
Power Fault Tolerance: In this technical report, Power Fault Tolerance is presented as an abstraction that re-frames byzantine faults in terms of participants’ influence over the outcome of the protocol. Every participant controls some power of which n is the total power in the network, and f is the fraction of power controlled by faulty or adversarial participants.
Power in Filecoin: In Filecoin, the power p of miner M at time t is the sum of the M’s storage assignments. The influence I of M is the fraction of M’s power over the total power in the network. In Filecoin, power has the following properties:
- Public: The total amount of storage currently in use in the network is public. By reading the blockchain, anyone can calculate the storage assignments of each miner — hence anyone can calculate the power of each miner and the total amount of power at any point in time.
- Publicly Verifiable: For each storage assignment, miners are required to generate Proofs-of-Spacetime, proving that the service is being provided. By reading the blockchain, anyone can verify if the power claimed by a miner is correct.
- Variable: At any point in time, miners can add new storage to the network by pledging with a new sector and filling the sector. In this way, miners can change their amount of power they have through time.
To read more about how this power plays a role(mathematically) in the consensus algorithm refer to the whitepaper.
We also need mechanisms to prevent three types of attacks that malicious miners could exploit to get rewarded for storage they are not providing: Sybil attack, Outsourcing attacks, Generation attacks.
- Sybil Attacks: Malicious miners could pretend to store (and get paid for) more copies than the ones physically stored by creating multiple Sybil identities, but storing the data only once.
- Outsourcing Attacks: Malicious miners could commit to store more data than the amount they can physically store, relying on quickly fetching data from other storage providers.
- Generation Attacks: Malicious miners could claim to be storing a large amount of data which they are instead efficiently generating on-demand using a small program. If the program is smaller than the allegedly stored data, this inflates the malicious miner’s likelihood of winning a block reward in Filecoin, which is proportional to the miner’s storage currently in use.
Storage providers must convince their clients that they stored the data they were paid to store. In practice, storage providers will generate Proofs-of-Storage(PoS) that the blockchain network(or the clients themselves) verifies.
To make the act of storage publicly verifiable, Filecoin introduces two consensus algorithms: Proof-of-Replication (PoRep) and Proof-of-Spacetime (PoSt). If you are new to consensus algorithms or want to know more about them, then head here:
Proof-of-Replication(PoRep) is a novel Proof-of-Storage which allows a server (i.e. the prover P) to convince a user (i.e. the verifier V) that some data D has been replicated to its own uniquely dedicated physical storage. Our scheme is an interactive protocol, where the prover P: (a) commits to store n distinct replicas(physically independent copies) of some data D, and then (b) convinces the verifier V, that P is indeed storing each of the replicas via a challenge/response protocol. PoRep improves on PoR and PDP schemes, preventing Sybil Attacks, Outsourcing Attacks, and Generation Attacks.
Proof-of-Spacetime: Proof-of-Storage schemes allow a user to check if a storage provider is storing the outsourced data at the time of the challenge. How can we use PoS schemes to prove that some data was being stored throughout a period of time? A natural answer to this question is to require the user to repeatedly (e.g. every minute) send challenges to the storage provider. However, the communication complexity required in each interaction can be the bottleneck in systems such as Filecoin, where storage providers are required to submit their proofs to the blockchain network.
To address this question, we introduce a new proof, Proof-of-Spacetime, where a verifier can check if a prover is storing her/his outsourced data for a range of time. The intuition is to require the prover to
- generate sequential Proofs-of-Storage(in our case Proof-of-Replication), as a way to determine time.
- recursively compose the executions to generate short proof.
The prover receives a random challenge(c) from the verifier and generates Proofs-of-Replication in sequence, using the output of proof as an input of the other for a specified amount of iterations t. Thus ensuring that all of the work done is reusable(as discussed above).
PoSt & PoRep uses zk-SNARKS, making proofs are very short and easy to verify.
To know more about Practical implementation of PoSt refer to the Whitepaper.
Smart Contracts enable users of Filecoin to write stateful programs that can spend tokens, request storage/retrieval of data in the markets and validate storage proofs. Users can interact with the smart contracts by sending transactions to the ledger that trigger function calls in the contract. We extend the Smart Contract system to support Filecoin specific operations (e.g. market operations, proof verification).
Filecoin supports contracts specific to data storage, as well as more generic smart contracts:
- File Contracts: We allow users to program the conditions for which they are offering or providing storage services. There are several examples worth mentioning: (1) contracting miners: clients can specify in advance the miners offering the service without participating in the market, (2) payment strategies: clients can design different reward strategies for the miners, for example a contract can pay the miner increasingly higher through time, another contract can set the price of storage informed by a trusted oracle, (3) ticketing services: a contract could allow a miner to deposit tokens and to pay for storage/retrieval on behalf of their users, (4) more complex operations: clients can create contracts that allow for data update.
- Smart Contracts: Users can associate programs to their transactions like in other systems (as in Ethereum) which do not directly depend on the use of storage. We foresee applications such as: decentralized naming systems, asset tracking, and crowdsale platforms.
Cross chain Interactions
Bridges are tools that aim at connecting different blockchains; while still work in progress, we plan to support cross chain interaction in order to bring the Filecoin storage in other blockchain-based platforms as well as bringing functionalities from other platforms into Filecoin.
- Filecoin in other platforms: Other blockchain systems such as Bitcoin, Zcash and in particular Ethereum and Tezos, allow developers to write smart contracts; however, these platforms provide very little storage capability and at a very high cost. We plan to provide a bridge to bring storage and retrieval support to these platforms. We note that IPFS is already in use by several smart contracts (and protocol tokens) as a way to reference and distribute content. Adding support to Filecoin would allow these systems to guarantee storage of IPFS content in exchange of Filecoin tokens.
- Other platforms in Filecoin: We plan to provide bridges to connect other blockchain services with Filecoin. For example, integration with Zcash would allow support for sending requests for storing data in privacy.
Some Other Issues
Here we list a few potential issues which are not well discussed in the whitepaper.
- Retrieval market scalability: The micropayments system(retrieval market) creates a lot of overhead on the retrieval protocol. In order to achieve retrieval speeds that match today's centralized infrastructure, filecoin and hence IPFS needs a lot of adoption so as to create a dense state channel network.
- Censorship(Illegal Content): As we have seen in the past with Napster and the pirate bay, lack of censorship will eventually lead to Illegal content over the network, effectively bringing the dark web to the surface. Possible solutions could be AI-powered protocols that learn over time and automatically detect Illegal content and take necessary actions. But in order for the network to be a democratic one, the protocol needs to governed by the users itself(thus introducing byzantine behavior) to decide if the content needs some action or not.
So, summing up Censorship is an issue that is different for different people, and it needs a more personalized approach rather than a central open. Filecoin’s job is to create a market place for data management, rather than coming up with censorship governing policy. So, this “personalized” censorship layer can be shifted to the applications, building on the filecoin.
Possible Improvements in Filecoin Protocol
Here we list some possible improvements in the filecoin protocol.
- Tahor-LAFS Encryption Scheme: When adding value, the client first encrypts it (with a symmetric key), then splits it into segments of manageable sizes, and then erasure-encodes these for redundancy. So, for example, a “2-of-3” erasure-encoding means that the segment is split into a total of 3 pieces, but any 2 of them are enough to reconstruct the original (read more about ZFEC). These segments then become shares, which are stored on particular Storage nodes. Storage nodes are a data repository for shares; users do not rely on them for integrity or confidentiality of the data.
Ultimately, the encryption-key and some information to help find the right Storage nodes become part of the “capability string” (read more about the encoding process). The important point is that a capability string is both necessary and sufficient to retrieve a value from the Grid — the case where this will fail is when too many nodes have become unavailable (or gone offline) and you can no longer retrieve enough shares.
There are write-capabilities, read-capabilities and verify capabilities; one can be diminished into the “less authoritative” capabilities offline. That is, someone with a write-capability can turn it into a read-capability (without interacting with a server). A verify-capability can confirm the existence and integrity of value, but not decrypt the contents. It is possible to put both mutable and immutable values into the Grid; naturally, immutable values don’t have a write-capability at all.