And what if a validator node does not like our consensus?

Alfonso de la Rocha
Published in DRILL
Jun 28, 2018

“There may be a rogue node between us”

Nodes in a Byzantine Fault Tolerant environment must assume that the network is unreliable: they can never be sure that the data they send arrives at its destination. In one of my previous posts, I shared how to easily deploy a test Quorum-based blockchain network using IBFT (Istanbul Byzantine Fault Tolerant) consensus. This allowed us to test our smart contracts and dApps in a controlled environment, but what if we want to see how the network behaves when not every node is “as good as expected”? Let’s see what happens in our Quorum IBFT consensus when one or more of our nodes misbehave.

To test this, I added a new feature to my test-environment tool that makes it possible to add faulty nodes to the test network in order to analyse their impact. I enabled this by compiling a geth version from getamis [2] that includes an implementation of different types of faulty nodes, and adding it to my test-environment.

To run the test-environment with this new feature, you just need to follow these steps (for further information about how to operate the test-environment, go to this post or this github repo):

  • First of all, you need to compile the geth client that includes the faulty node implementation. To do this, you can either run the ./bin/bootstrap.sh script again to install all dependencies, or run ./bin/build_faulty_nodes.sh directly.
  • With the faulty geth client compiled, we are ready to run our test network. To run a test network with a faulty node we can use:
$ ./bin/start_network.sh clean <num_validators> <num_gws> --faulty_node <faulty_mode>
$ ./bin/start_network.sh clean 3 3 --faulty_node 2
  • For now, the main script only supports running a single faulty node. Specifically, the script starts validator1 in faulty mode if the --faulty_node flag is enabled. However, if you want to run more than one faulty node, you can run the network and then start an additional faulty node using ./bin/start_faulty_node.sh as follows:
$ ./bin/start_faulty_node.sh <node-name> <faulty-mode>
$ ./bin/start_faulty_node.sh validator2 4
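
If you want to script several runs, the two commands above can be assembled programmatically. The following is only a minimal sketch in Python: the function names are my own invention (they are not part of the test-environment), and the accepted mode range of 0 to 7 is an assumption based on the modes described in this post.

```python
from typing import List, Optional

# Assumption: the faulty modes covered in this post are numbered 0 to 7.
VALID_MODES = range(0, 8)


def _check_mode(faulty_mode: int) -> None:
    """Reject mode numbers outside the range described in the post."""
    if faulty_mode not in VALID_MODES:
        raise ValueError(f"unknown faulty mode: {faulty_mode}")


def start_network_cmd(num_validators: int, num_gws: int,
                      faulty_mode: Optional[int] = None) -> List[str]:
    """Build the argument list for ./bin/start_network.sh (hypothetical helper)."""
    if num_validators < 1 or num_gws < 0:
        raise ValueError("need at least one validator and a non-negative gateway count")
    cmd = ["./bin/start_network.sh", "clean", str(num_validators), str(num_gws)]
    if faulty_mode is not None:
        _check_mode(faulty_mode)
        cmd += ["--faulty_node", str(faulty_mode)]
    return cmd


def start_faulty_node_cmd(node_name: str, faulty_mode: int) -> List[str]:
    """Build the argument list for ./bin/start_faulty_node.sh (hypothetical helper)."""
    _check_mode(faulty_mode)
    return ["./bin/start_faulty_node.sh", node_name, str(faulty_mode)]


if __name__ == "__main__":
    # Print the same two example commands used in the steps above.
    print(" ".join(start_network_cmd(3, 3, faulty_mode=2)))
    print(" ".join(start_faulty_node_cmd("validator2", 4)))
```

The resulting lists could be passed to subprocess.run(cmd, check=True) from the root of the repository, assuming the scripts are located under ./bin as shown above.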

And what about these faulty modes? What do they mean? They are the different types of faulty behaviour implemented by getamis in their geth client for testing purposes [1]. Let’s briefly test what happens in a Quorum network with 3 validator nodes when one of our nodes misbehaves in one of the following modes:

  • mode 0 — Disable Faulty Behaviour: This mode is like not using any faulty mode at all. It runs geth without any faulty behaviour, and the network runs smoothly. If you are not testing misbehaviours, you are better off running the network without the --faulty_node flag. A sample trace of the result from the main validator:
INFO [06-28|09:53:57] 🔨 mined potential block                  number=47 hash=906ee8…d11959
INFO [06-28|09:53:57] Commit new mining work                   number=48 txs=0 uncles=0 elapsed=229.742µs
INFO [06-28|09:53:58] Committed                                address=0xB50001FfA410F4D03663D69540c1C8e1C017e7e6 hash=2a6c45…50e74f number=48
INFO [06-28|09:53:58] Imported new chain segment               blocks=1 txs=0 mgas=0.000 elapsed=1.057ms   mgasps=0.000 number=48 hash=2a6c45…50e74f cache=0.00B
INFO [06-28|09:53:58] Commit new mining work                   number=49 txs=0 uncles=0 elapsed=199.896µs
  • mode 1 — Randomly run any faulty behaviour: I like to call this the wildcard mode. If you just want to run random faulty nodes and you don’t care about their specific misbehaviour, use this mode.
  • mode 2 — NotBroadcast: A validator node in this mode will not broadcast any message to the rest of the network, which effectively cuts it off from the other nodes. It is as if the node did not exist at all, so it counts as non-existent for the block proposal process.
INFO [06-28|10:41:20] Not broadcast message                    address=0xB50001FfA410F4D03663D69540c1C8e1C017e7e6 state="Accept request" message="{Code: 3, Address: 0x0000000000000000000000000000000000000000}"
  • mode 3 — SendWrongMsg: The validator sends out messages with wrong message codes. The rest of the nodes will fail to decode these messages, so to them the misbehaving validator looks as if it were dead. It therefore counts as a failed validator in the F = (N-1)/3 formula (rounded down), which tells us how many faulty nodes a network of N validators can tolerate [1].
ERROR[06-28|10:16:14] Failed to decode message from payload    address=0xB50001FfA410F4D03663D69540c1C8e1C017e7e6 err="unauthorized address"
WARN [06-28|10:16:27] Invalid stats history request            msg=false
  • mode 4 — ModifySig: The validator modifies the signature of every message it broadcasts to the network. The other nodes will reject these modified messages because they carry an incorrect signer.
WARN [06-28|10:10:13] Invalid stats history request            msg=false
INFO [06-28|10:10:19] Modify the signature                     address=0xb87dC349944CC47474775DDe627A8a171fC94532
ERROR[06-28|10:10:19] Failed to get signer address             err="recovery failed"
ERROR[06-28|10:10:19] Failed to decode message from payload
  • mode 5 — AlwaysPropose: The validator continuously sends block proposals, even when it is not its turn to propose.

The faulty node will send these proposals:

INFO [06-28|10:03:57] Always propose a proposal                address=0xb87dC349944CC47474775DDe627A8a171fC94532 state="Accept request" request="&{Proposal:Block(#70): Size: 677.00 B {\nMinerHash: 17c3372f657c6dc973f9cd18a4bfda16d73216aa1fda13f24f59bb174afde897\nHeader(b9f1c217ae418bdcaf9b24a210fddc970b5a5f0905b0873663629be5fcc01263):\n[\n\tParentHash:\t    cd814e4d9f132f0dcc09ba6e28fbc968e6e4728872fe295bad596dbd140335dc\n\tUncleHash:\t    1dcc4de8dec75d7aab85b567b6ccd41ad312451b948a7413f0a142fd40d49347\n\tCoinbase:\t    0000000000000000000000000000000000000000\n\tRoot:\t\t    3ec94002202a20e803850ddeff006fa88bf399f9f02563d80195c5818010f37b\n\tTxSha\t\t    56e81f171bcc55a6ff8345e692c0f86e5b48e01b996cadc001622fb5e363b421\n\tReceiptSha:\t    56e81f171bcc55a6ff8345e692c0f86e5b48e01b996cadc001622fb5e363b421\n\tBloom:\t\tn\tDifficulty:\t    1\n\tNumber:\t\t    70\n\tGasLimINFO [06-28|10:03:5n\tGasUsed:\t    0\n\tTime:\t\t    1530180237\n\tExtra:\t\t    ׃

Meanwhile, the other validators will in some cases see several blocks being proposed at the same time:

INFO [06-28|10:04:25] 🔗 block reached canonical chain          number=86 hash=eb2eed…8f989e
INFO [06-28|10:04:26] Committed                                address=0xB50001FfA410F4D03663D69540c1C8e1C017e7e6 hash=11e7dd…e60857 number=92
INFO [06-28|10:04:26] Successfully sealed new block            number=92 hash=11e7dd…e60857
INFO [06-28|10:04:26] 🔨 mined potential block                  number=92 hash=11e7dd…e60857
INFO [06-28|10:04:26] Commit new mining work                   number=93 txs=0 uncles=0 elapsed=228.383µs
INFO [06-28|10:04:27] Committed                                address=0xB50001FfA410F4D03663D69540c1C8e1C017e7e6 hash=51bb55…fd1ee4 number=93
INFO [06-28|10:04:27] Imported new chain segment               blocks=1 txs=0 mgas=0.000 elapsed=1.070ms   mgasps=0.000 number=93 hash=51bb55…fd1ee4 cache=0.00B
INFO [06-28|10:04:27] Commit new mining work                   number=94 txs=0 uncles=0 elapsed=224.741µs
INFO [06-28|10:04:29] Imported new chain segment               blocks=1 txs=0 mgas=0.000 elapsed=1.070ms   mgasps=0.000 number=94 hash=a9edde…9a0ecc cache=0.00B
INFO [06-28|10:04:29] Commit new mining work                   number=95 txs=0 uncles=0 elapsed=225.095µs
  • mode 6 — AlwaysRoundChange: The validator always sends a ROUND CHANGE message, so it never proposes a block even when it is its turn. In short, it always passes its turn to propose.
INFO [06-28|12:01:40] 🔗 block reached canonical chain          number=132 hash=9ced45…82be47
INFO [06-28|12:01:41] Committed                                address=0xB50001FfA410F4D03663D69540c1C8e1C017e7e6 hash=409ad2…d29665 number=138
INFO [06-28|12:01:41] Successfully sealed new block            number=138 hash=409ad2…d29665
INFO [06-28|12:01:41] 🔨 mined potential block                  number=138 hash=409ad2…d29665
INFO [06-28|12:01:41] Commit new mining work                   number=139 txs=0 uncles=0 elapsed=276.691µs
INFO [06-28|12:01:42] Committed                                address=0xB50001FfA410F4D03663D69540c1C8e1C017e7e6 hash=a9775c…ba30b0 number=139
INFO [06-28|12:01:42] Imported new chain segment               blocks=1 txs=0 mgas=0.000 elapsed=2.186ms   mgasps=0.000 number=139 hash=a9775c…ba30b0 cache=0.00B
INFO [06-28|12:01:42] Commit new mining work                   number=140 txs=0 uncles=0 elapsed=245.532µs
  • mode 7 — BadBlock: The validator proposes blocks with bad bodies when it is its turn. The situation is really similar to what happened in modes 2 and 3.
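
The fault-tolerance bound mentioned under mode 3 comes from the IBFT requirement that a network tolerating F faulty validators needs N = 3F + 1 validators [1]. As a small worked example, the sketch below computes that bound for a few network sizes; the function names are hypothetical and only serve as illustration.

```python
def max_faulty(n_validators: int) -> int:
    """Largest F such that 3F + 1 <= N, i.e. floor((N - 1) / 3)."""
    if n_validators < 1:
        raise ValueError("a network needs at least one validator")
    return (n_validators - 1) // 3


def min_validators(f: int) -> int:
    """Smallest network size N = 3F + 1 that tolerates f faulty validators."""
    if f < 0:
        raise ValueError("the number of faulty validators cannot be negative")
    return 3 * f + 1


if __name__ == "__main__":
    for n in (3, 4, 7):
        print(f"N={n} validators -> tolerates F={max_faulty(n)} faulty")
    # N=3 validators -> tolerates F=0 faulty
    # N=4 validators -> tolerates F=1 faulty
    # N=7 validators -> tolerates F=2 faulty
```

Note that, by this bound, the 3-validator network used in these tests is one validator short of guaranteeing tolerance for a single faulty node; 4 validators would be the minimum for that.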

So now you can see some samples of rogue behaviours in a BFT environment, and play with them to see what happens. By now, everybody has a rough idea of what a misbehaving node looks like but, to be honest, until now I didn’t really know their impact on a blockchain network. I still have to do many more tests to really understand the impact of these misbehaviours; however, this was a good first approach. The tests were performed with a small number of validators and a small network, but it was a good exercise to check that these faulty implementations work in our test-environment.
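
One simple way to compare runs is to tally which messages appear in a validator's log and at which level. The sketch below is only an illustration: it assumes the log layout seen in the samples in this post (a level, a bracketed timestamp, a message, then optional key=value fields separated from the message by two or more spaces), and both the regular expression and the summarize function are hypothetical helpers, not part of geth or the test-environment.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed layout, based on the samples in this post:
#   LEVEL [MM-DD|HH:MM:SS] message   key=value key=value ...
LOG_LINE = re.compile(
    r"^(?P<level>[A-Z]+)\s*\[(?P<ts>[^\]]+)\]\s+"
    r"(?P<msg>.*?)(?:\s{2,}(?P<fields>\S+=.*))?$"
)


def summarize(lines: Iterable[str]) -> Counter:
    """Count occurrences of each (level, message) pair; skip unparsable lines."""
    counts: Counter = Counter()
    for raw in lines:
        match = LOG_LINE.match(raw.strip())
        if match is None:
            continue
        counts[(match.group("level"), match.group("msg"))] += 1
    return counts


if __name__ == "__main__":
    # Sample lines copied from the mode 4 (ModifySig) trace in this post.
    sample = [
        "WARN [06-28|10:10:13] Invalid stats history request            msg=false",
        "INFO [06-28|10:10:19] Modify the signature                     "
        "address=0xb87dC349944CC47474775DDe627A8a171fC94532",
        'ERROR[06-28|10:10:19] Failed to get signer address             err="recovery failed"',
        "ERROR[06-28|10:10:19] Failed to decode message from payload",
    ]
    for (level, msg), count in summarize(sample).most_common():
        print(f"{count:3d}  {level:5s}  {msg}")
```

Running the same tally over logs captured under different faulty modes would give a rough first picture of how each mode shows up on the other validators.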

In the next weeks, I will test different networks with bigger topologies and different configurations in the number and mode of rogue nodes to see their potential impact in a real system. If I reach some nice conclusions, expect a brand new post on this matter in the upcoming weeks ;)

References:

[1] Istanbul Fault Tolerant Consensus EIP650 — https://github.com/ethereum/EIPs/issues/650

[2] Geth Faulty Nodes implementation — https://github.com/getamis/go-ethereum/tree/feature/faulty_nodes
