IPFS for storage, the blockchain for verified immutability

Alfonso de la Rocha
9 min read · May 31, 2021

It’s never a good idea to store big chunks of data in a blockchain. For starters, it’s simply impossible, as the amount of data that can be included in a transaction is limited. But even if you could, it would be prohibitively expensive, as every peer in the network would have to store that piece of data you chose to put on-chain. Moreover, if your plan is to do all of this over a public blockchain network, everything you store on-chain will be visible to every peer in the network, so forget about storing sensitive information: if you are not careful you may be disclosing your darkest secrets (the kind of secrets you already disclose to Facebook).

And you may be wondering, “then how can someone implement a decentralized application that requires storing large chunks of data if they can’t be stored directly on-chain?”. Well, fortunately, the web3 ecosystem already has decentralized storage solutions like IPFS to help us in this quest.

IPFS, blockchain’s best friend to deal with data

IPFS is a distributed system for storing and accessing files, websites, applications, and data. It is a public network, which means that anyone can run a peer and start downloading and storing content in the network right away. IPFS is a content-addressable network, so every piece of content stored in it is identified by a unique Content IDentifier, or CID. The CID of some content is derived from the hash of that content. This means that if the content changes, its CID changes with it, so different versions of the same content will have completely different identifiers.
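As a minimal sketch of this property, the snippet below derives a CID from raw bytes using the JavaScript multiformats library (the library choice and the raw codec are my assumptions for illustration; an IPFS node does this work for you when you add content):

```typescript
// Sketch: derive a v1 CID from raw bytes with the "multiformats" package.
// Changing a single byte of the content yields a completely different CID.
import { CID } from 'multiformats/cid'
import * as raw from 'multiformats/codecs/raw'
import { sha256 } from 'multiformats/hashes/sha2'

async function cidFor(content: string): Promise<CID> {
  const bytes = new TextEncoder().encode(content)
  const digest = await sha256.digest(bytes)   // hash the content
  return CID.create(1, raw.code, digest)      // wrap the hash in a CID
}

async function main() {
  console.log((await cidFor('hello world')).toString())
  console.log((await cidFor('hello world!')).toString())  // different CID
}
main()
```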

To download content from the IPFS network, we make a request specifying the CID of the content we want. Our IPFS client takes care of the rest, leveraging the network’s underlying protocols: it finds the peers in the network that are storing the content we are looking for and downloads it for us. “Get” requests, as download operations are called in IPFS, are usually specified through a link that looks like this:

/ipfs/QmXoypizjW3WknFiJnKLwHCnL72vedxjQkDDP1mXWo6uco/wiki/docs.html

In the end, downloading data from the IPFS network means providing our IPFS client with one of these links. If you want to learn more about the specifics of IPFS, you can check out these tutorials, or the project’s documentation.
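To make this concrete, here is a rough sketch of both operations using the ipfs-http-client JavaScript package against a local IPFS daemon (the package choice, the endpoint, and the setup are assumptions for illustration):

```typescript
// Sketch: store content in IPFS and fetch it back by CID using ipfs-http-client,
// talking to a local daemon on the default API port (an assumption).
import { create } from 'ipfs-http-client'

async function main() {
  const ipfs = create({ url: 'http://127.0.0.1:5001' })

  // Adding content returns its CID...
  const { cid } = await ipfs.add('hello from IPFS')
  console.log('stored as', cid.toString())

  // ...and anyone holding that CID can retrieve the exact same bytes.
  const chunks: Uint8Array[] = []
  for await (const chunk of ipfs.cat(cid)) {
    chunks.push(chunk)
  }
  console.log(new TextDecoder().decode(Buffer.concat(chunks)))
}

main().catch(console.error)
```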

And why is IPFS blockchain’s best friend when it comes to data? Well, the fact that content in IPFS is uniquely identified, and that if the data in any piece of content changes its CID changes with it, means that data in the IPFS network is immutable. CIDs are only a few bytes long, so they can be stored on-chain and used in smart contracts to point to data stored in the IPFS network. With this, we don’t need to store large chunks of data on-chain anymore: the only thing stored and managed on-chain is the CID of the corresponding data. If someone needs to access the content itself (and not just the identifier), they can do so by requesting that CID from the IPFS network. Cool, right?
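A hypothetical sketch of that pattern with ethers.js is shown below. The registry contract, its address, and the setDocument function are made up for illustration; the point is that only the short CID string ever touches the chain:

```typescript
// Hypothetical sketch: record a CID in a smart contract with ethers.js (v5).
// The registry contract, its address, and setDocument() are placeholders.
import { ethers } from 'ethers'

const abi = ['function setDocument(string cid) external']

async function recordCid(cid: string) {
  const provider = new ethers.providers.JsonRpcProvider('http://127.0.0.1:8545')
  const registry = new ethers.Contract('0xYourRegistryAddress', abi, provider.getSigner())

  const tx = await registry.setDocument(cid)  // only the CID goes on-chain
  await tx.wait()                             // the data itself stays in IPFS
}
```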

But enough with the theory behind IPFS. What are some good use cases for this integration between IPFS and the blockchain? The perfect example of the need for decentralized storage in the blockchain is, of course, NFTs (Non-Fungible Tokens).

NFTs are used to represent one-of-a-kind digital assets. When an NFT represents a collectible crypto-cat (as in CryptoKitties), all the specifics of the NFT can be stored directly on-chain, but what happens when what we are minting as an NFT is a digital asset like a song, an image, or a deep learning dataset? We can’t store these things directly on-chain. This is where IPFS and content-addressable decentralized storage solutions excel: we can store our digital asset in IPFS and then use the CID (and maybe some additional metadata) to mint the NFT. Anyone is then able to validate ownership on-chain and access the asset in question in the IPFS network. You can see an illustration of how this works at nft.storage and in the image below.
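As a sketch of that flow, the snippet below follows nft.storage’s JavaScript client (the exact client API is an assumption on my part; check their docs): the asset and its metadata go to IPFS, and the returned ipfs:// URL is what the minting transaction would reference.

```typescript
// Sketch: upload an asset and its NFT metadata to IPFS through nft.storage.
// Client usage is an assumption; the returned ipfs:// URL is what gets minted.
import { NFTStorage, File } from 'nft.storage'
import { readFile } from 'fs/promises'

async function uploadSong(apiToken: string): Promise<string> {
  const client = new NFTStorage({ token: apiToken })

  const metadata = await client.store({
    name: 'My new song',
    description: 'A one-of-a-kind track, minted as an NFT',
    image: new File([await readFile('./cover.png')], 'cover.png', { type: 'image/png' }),
  })

  // Something like ipfs://<CID>/metadata.json, ready to be referenced on-chain.
  return metadata.url
}
```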

Decentralized storage as a first-class citizen in L2 solutions

Does one need to operate an Ethereum (pick your blockchain of choice) node and an IPFS node in order to implement and orchestrate these kinds of use cases and interactions in a decentralized application? Well, not really. There are several alternatives: you can use IPFS gateways and pinning services for the IPFS side of things (like Pinata, Infura, or Textile), or even delegate the operation of all your nodes to someone else. What is clear is that, even then, the operations of storing your asset and minting your token cannot be performed atomically.
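For example, pinning through a service like Pinata might look roughly like the sketch below (the SDK usage is an assumption; check Pinata’s docs for the current API). Note that even here, storing the file and minting the token remain two separate, independent steps:

```typescript
// Sketch: pin a file through Pinata's pinning service instead of running
// your own IPFS node. Method names follow the @pinata/sdk docs (assumed).
import pinataSDK from '@pinata/sdk'
import fs from 'fs'

async function pinWithPinata(apiKey: string, apiSecret: string) {
  const pinata = new pinataSDK(apiKey, apiSecret)

  const stream = fs.createReadStream('./my-asset.png')
  const result = await pinata.pinFileToIPFS(stream)

  console.log('pinned as', result.IpfsHash)  // the asset's CID
}
```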

I was reflecting on this “atomicity of operations between decentralized storage systems and blockchain platforms” when I realized something. A few weeks ago I wrote a comparison between different Optimistic L2 solutions. One of the contenders in that comparison was Metis, an optimistic L2 rollup solution. One of the features of this project that caught my attention was its VM integration with IPFS: according to their whitepaper, they support decentralized storage “out-of-the-box” in their VM through an IPFS resolver. The idea of atomically interacting with the IPFS network while making transactions on-chain really interested me. Atomic operations spanning IPFS and the chain had seemed impossible to me, but they may actually be possible in the L2 world, so I decided to dig deeper into Metis’ technology and understand whether this kind of atomic operation is possible in Metis.

Metis includes two types of storage in its VM: the regular VM storage, responsible for storing blocks and account states, and a special storage layer that integrates with IPFS (see the figure below). Metis leverages IPFS Cluster technology. IPFS Cluster nodes are regular IPFS peers that can run private sub-networks, so the data stored in the cluster is not shared with peers from the public IPFS network (which makes it really convenient for storing sensitive information). Cluster nodes can choose to store content in the public network or restrict access to content to one of their connected sub-networks.

Content stored in IPFS can be accessed from Metis through the VM’s IPFS resolver. When a user invokes a method that needs to interact with the special storage layer, the IPFS router in the VM intercepts the corresponding operations and sends them to the IPFS network through the IPFS resolver. The IPFS resolver behaves as an IPFS client and is also responsible for encrypting both the data and the final CID of the content (if the information needs to be private), so it can be committed on-chain without privacy or security worries.
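The sketch below illustrates the general “encrypt before storing” idea, not Metis’ actual resolver: it encrypts the data with AES-256-GCM using Node’s crypto module, stores the ciphertext in IPFS, and returns the CID that could then be committed on-chain (or encrypted itself first).

```typescript
// Conceptual sketch (not Metis' resolver): encrypt content with AES-256-GCM,
// store the ciphertext in IPFS, and return the CID to be committed on-chain.
import { createCipheriv, randomBytes } from 'crypto'
import { create } from 'ipfs-http-client'

async function storeEncrypted(plaintext: Buffer, key: Buffer): Promise<string> {
  const iv = randomBytes(12)
  const cipher = createCipheriv('aes-256-gcm', key, iv)  // key: 32 random bytes
  const ciphertext = Buffer.concat([cipher.update(plaintext), cipher.final()])
  const tag = cipher.getAuthTag()

  // Without the key, the bytes stored in IPFS are useless to other peers.
  const ipfs = create({ url: 'http://127.0.0.1:5001' })
  const { cid } = await ipfs.add(Buffer.concat([iv, tag, ciphertext]))
  return cid.toString()
}
```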

VM architecture (Source: Metis whitepaper)

To illustrate how all this integration works, let’s use an example. Imagine that you want to mint an NFT for your new song in the Ethereum network using Metis. If the NFT factory smart contract is already deployed and in place, the only thing you need to worry about is triggering the right operations to store the song in IPFS and mint the NFT. The Metis VM is responsible for intercepting the IPFS operation, encrypting the data (if necessary), and interacting with the IPFS network to store the song. The result of this operation is the song’s CID, which is then used in the L2 transaction sent to mint the NFT. This L2 transaction is then rolled up to L1 and eventually persisted in the Ethereum network. In this way, the Metis node manages all the interactions necessary to atomically store data in the IPFS network and persist the result in the blockchain.
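For contrast, here is what the same flow looks like on a plain Ethereum plus IPFS setup, where the two steps are separate and not atomic; the mintSong function and the factory address are hypothetical placeholders. Collapsing these two steps into a single operation is precisely what Metis’ VM-level integration is about.

```typescript
// Sketch of the non-atomic, "vanilla" version of the flow: store the song in
// IPFS, then mint an NFT that points to its CID. mintSong() and the factory
// address are hypothetical placeholders.
import { create } from 'ipfs-http-client'
import { ethers } from 'ethers'
import { readFile } from 'fs/promises'

const abi = ['function mintSong(string cid) external returns (uint256)']

async function storeAndMint(songPath: string) {
  // Step 1: store the song in IPFS and obtain its CID.
  const ipfs = create({ url: 'http://127.0.0.1:5001' })
  const { cid } = await ipfs.add(await readFile(songPath))

  // Step 2: mint the NFT, recording only the CID on-chain.
  const provider = new ethers.providers.JsonRpcProvider('http://127.0.0.1:8545')
  const factory = new ethers.Contract('0xYourNftFactory', abi, provider.getSigner())
  const tx = await factory.mintSong(cid.toString())
  await tx.wait()
}
```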

Another interesting part of this integration, specific to Metis, is that DACs (Decentralized Autonomous Companies) can use this IPFS layer to store sensitive information for the DAC in a decentralized way, without having to rely on centralized storage systems. The data is conveniently encrypted with the corresponding DAC credentials. What’s more, when a DAC is created for the first time, a new “charter” is also created to determine the rules of the DAC. In the charter, the DAC creator can include access rights and operation permissions for this IPFS and sensitive-data storage integration.

Let’s imagine that a big retail company is using Metis to track the whole lifecycle of its products: from their production, to their distribution, to the sale to customers. There are already companies using blockchain technology for this purpose (Carrefour, Costco, Maersk, etc.). A smart contract in a blockchain network is used by every party involved in the lifecycle of the product, and every status update in the life of the product is conveniently registered on-chain. These updates can include information such as when a specific entity in the supply chain handled the product, how, and what the next step (or owner) in the chain is. All of this can be done today with any blockchain network that supports smart contracts. Unfortunately, in real life all of these interactions are governed through legal contracts and acknowledged by “real-life documents” such as delivery notes.

One of the added values of having a blockchain network orchestrating these interactions is that all entities share a common information system that stores all the supply chain information. But what happens with the documents related to the actions performed in the blockchain? They need to be stored somewhere else. This is where solutions like Metis’ work like a charm: these documents may include sensitive information, so they can’t be stored in the clear in a public network. What’s more, presumably not every document should be accessible to every party.

Through Metis’ IPFS integration, every DAC involved in this supply chain use case is able to perform the transaction that triggers an update to the state of a product in the blockchain while storing the corresponding document in the IPFS network. As described above, these documents would be conveniently encrypted with a set of keys that grants access to the document exclusively to the right entities. The status update in the smart contract would include a pointer to the document’s CID in case anyone wants to check the “real-life document” associated with the product status update. In this process, DACs are able to determine which other DACs or entities have access to these documents. With this, our companies can share a common information system that is consistent with the state in the blockchain, without having to implement additional schemes or maintain an independent system for document storage.

L2 is more than just scalability

By now the importance of decentralized storage systems for the success of Web3 is clear to everyone, but something people don’t realize when thinking about L2 solutions is that they are not exclusively about scalability. They are actually way more than that: L2s can become complete enhancements over L1, in terms of scalability but also in terms of features. In this publication we’ve seen a clear example of this (integrated decentralized storage in L2).

Decentralized applications increasingly need to store large amounts of data in an immutable way, leveraging blockchain technologies and decentralized storage systems (take NFTs as an example), and L2 solutions can take this opportunity to inject additional features into L1 networks, just as Metis has done with its IPFS integration. I can’t wait to see what is yet to come in the L2 ecosystem. Are you aware of any other cool projects with innovative L2 ideas? Do not hesitate to ping me :)
