
Geth node sync always behind #16796

Closed
@kandrejevs

Description


System information

Geth version: 1.8.8-stable-2688dab4
OS & Version: Ubuntu Server 16.04, 2 cores, 8GB of RAM, 170GB SSD storage (aws c5.large)

Expected behaviour

Geth should be able to sync with the network.

Actual behaviour

The Geth node is always a couple hundred blocks behind and never fully syncs; after 40 hours I get:

> eth.syncing
{
  currentBlock: 5667131,
  highestBlock: 5667209,
  knownStates: 111239860,
  pulledStates: 111225435,
  startingBlock: 5664481
}
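The numbers above are easy to misread: the state percentage looks nearly complete, but `knownStates` only counts trie nodes discovered so far, so it keeps growing during sync (as karalabe explains below). A small sketch of how to interpret such a result, using the values from this report:

```python
# Interpreting an eth.syncing result (values copied from the report above).
# Caveat: knownStates grows as the trie is discovered, so the state
# percentage overstates real progress while discovery is still running.
syncing = {
    "currentBlock": 5667131,
    "highestBlock": 5667209,
    "knownStates": 111239860,
    "pulledStates": 111225435,
    "startingBlock": 5664481,
}

blocks_behind = syncing["highestBlock"] - syncing["currentBlock"]
state_pct = 100.0 * syncing["pulledStates"] / syncing["knownStates"]

print(f"{blocks_behind} blocks behind, ~{state_pct:.2f}% of *known* states pulled")
```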

Previously Geth was crashing due to running out of memory. I increased swap to 20GB; the node is stable now, but falls behind.

Steps to reproduce the behaviour

Launch geth node with these params:

geth --syncmode "fast" --cache=2048 --maxpeers=128 --metrics --rpc --rpcapi "db,eth,net,web3,personal" --rpcaddr "0.0.0.0" --rpcport 8545 --rpcvhosts "geth.domain.com"

Activity

karalabe

karalabe commented on May 24, 2018

@karalabe
Member

Syncing Ethereum is a pain point for many people, so I'll try to detail what's happening behind the scenes so there might be a bit less confusion.

The current default mode of sync for Geth is called fast sync. Instead of starting from the genesis block and reprocessing all the transactions that ever occurred (which could take weeks), fast sync downloads the blocks, and only verifies the associated proof-of-works. Downloading all the blocks is a straightforward and fast procedure and will relatively quickly reassemble the entire chain.

Many people falsely assume that because they have the blocks, they are in sync. Unfortunately this is not the case, since no transaction was executed, so we do not have any account state available (i.e. balances, nonces, smart contract code and data). These need to be downloaded separately and cross-checked with the latest blocks. This phase is called the state trie download and it actually runs concurrently with the block downloads; alas, it takes a lot longer nowadays than downloading the blocks.

So, what's the state trie? In the Ethereum mainnet, there are a ton of accounts already, which track the balance, nonce, etc of each user/contract. The accounts themselves are however insufficient to run a node, they need to be cryptographically linked to each block so that nodes can actually verify that the accounts are not tampered with. This cryptographic linking is done by creating a tree data structure above the accounts, each level aggregating the layer below it into an ever smaller layer, until you reach the single root. This gigantic data structure containing all the accounts and the intermediate cryptographic proofs is called the state trie.
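The cryptographic linking described above can be sketched with a toy structure: hash each account, then hash the combined hashes to get a single root, so that changing any one account changes the root. This is an illustrative simplification (two levels, sha256), not geth's actual hexary Merkle Patricia trie, which has many intermediate levels and uses Keccak-256.

```python
import hashlib

def h(data: bytes) -> bytes:
    """Hash helper; mainnet uses Keccak-256, sha256 stands in here."""
    return hashlib.sha256(data).digest()

def toy_state_root(accounts: dict) -> bytes:
    """Toy 'state root': hash each account leaf, then hash the concatenation.
    A real trie has many intermediate levels, each up to 16 children wide."""
    leaves = [h(f"{addr}:{balance}".encode()) for addr, balance in sorted(accounts.items())]
    return h(b"".join(leaves))

state = {"0xalice": 100, "0xbob": 50}
root_before = toy_state_root(state)
state["0xbob"] = 51                   # touch a single account...
root_after = toy_state_root(state)
assert root_before != root_after      # ...and the root changes, so tampering is detectable
```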

Ok, so why does this pose a problem? This trie data structure is an intricate interlink of hundreds of millions of tiny cryptographic proofs (trie nodes). To truly have a synchronized node, you need to download all the account data, as well as all the tiny cryptographic proofs to verify that no one in the network is trying to cheat you. This itself is already a crazy number of data items. The part where it gets even messier is that this data is constantly morphing: at every block (15s), about 1000 nodes are deleted from this trie and about 2000 new ones are added. This means your node needs to synchronize a dataset that is changing 200 times per second. The worst part is that while you are synchronizing, the network is moving forward, and state that you began downloading might disappear while you're downloading, so your node needs to constantly follow the network while trying to gather all the recent data. But until you actually do gather all the data, your local node is not usable since it cannot cryptographically prove anything about any accounts.

If you see that you are 64 blocks behind mainnet, you aren't yet synchronized, not even close. You are just done with the block download phase and still running the state downloads. You can see this yourself via the seemingly endless Imported state entries [...] stream of logs. You'll need to wait that out too before your node comes truly online.


Q: The node just hangs on importing state entries?!

A: The node doesn't hang, it just doesn't know how large the state trie is in advance so it keeps on going and going and going until it discovers and downloads the entire thing.

The reason is that a block in Ethereum only contains the state root, a single hash of the root node. When the node begins synchronizing, it knows about exactly 1 node and tries to download it. That node can reference up to 16 new nodes, so in the next step, we'll know about 16 new nodes and try to download those. As we go along the download, most of the nodes will reference new ones that we didn't know about until then. This is why you might be tempted to think it's stuck on the same numbers. It is not, rather it's discovering and downloading the trie as it goes along.
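The discovery process described above amounts to a breadth-first walk: start from the single state root, and each fetched node may reveal up to 16 children you did not know about before, so the total size is unknown until the walk finishes. A minimal sketch, using a made-up toy node map rather than real trie data:

```python
from collections import deque

# Toy trie: each node hash maps to its child hashes (up to 16 in Ethereum).
trie = {
    "root": ["a", "b"],
    "a": ["c", "d"],
    "b": [],
    "c": [],
    "d": ["e"],
    "e": [],
}

def download_state(root: str) -> list:
    """Breadth-first 'download': we only learn a node's children after
    fetching the node itself, so the frontier keeps growing mid-sync."""
    known, queue, fetched = {root}, deque([root]), []
    while queue:
        node = queue.popleft()
        fetched.append(node)          # "download" the node...
        for child in trie[node]:      # ...which reveals new nodes to fetch
            if child not in known:
                known.add(child)
                queue.append(child)
    return fetched

print(download_state("root"))  # root first, then nodes in order of discovery
```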

Q: I'm stuck at 64 blocks behind mainnet?!

A: As explained above, you are not stuck, just finished with the block download phase, waiting for the state download phase to complete too. This latter phase nowadays takes a lot longer than just getting the blocks.

Q: Why does downloading the state take so long, I have good bandwidth?

A: State sync is mostly limited by disk IO, not bandwidth.

The state trie in Ethereum contains hundreds of millions of nodes, most of which take the form of a single hash referencing up to 16 other hashes. This is a horrible way to store data on a disk, because there's almost no structure in it, just random numbers referencing even more random numbers. This makes any underlying database weep, as it cannot optimize storing and looking up the data in any meaningful way.

Not only is storing the data very suboptimal, but due to the roughly 200 modifications per second and the pruning of past data, we cannot even download it in a properly pre-processed way to make it import faster without the underlying database shuffling it around too much. The end result is that even a fast sync nowadays incurs a huge disk IO cost, which is too much for a mechanical hard drive.

Q: Wait, so I can't run a full node on an HDD?

A: Unfortunately not. Doing a fast sync on an HDD will take more time than you're willing to wait with the current data schema. Even if you do wait it out, an HDD will not be able to keep up with the read/write requirements of transaction processing on mainnet.

You should, however, be able to run a light client on an HDD with minimal impact on system resources. If you wish to run a full node, an SSD is your only option.

ddwrtmenace

ddwrtmenace commented on May 31, 2018

@ddwrtmenace

@karalabe Thanks for the thorough explanation. Will it eventually become impossible to sync with an SSD too?

deckarep

deckarep commented on Jun 1, 2018

@deckarep
Contributor

@karalabe - great detailed answer that is useful for helping people understand the intricacies of running a node.

Question: are there any known metrics on how long it takes to bootstrap a full node with full history back to the genesis block? Hours? Days? Months? Just looking for a ballpark, as hardware and environment are all huge variables. Also, what is the success rate of reaching a fully synced state? Is it 100% reliable and just a matter of time, or do people regularly have problems trying to sync fully?

Rohithzr

Rohithzr commented on Aug 8, 2018

@Rohithzr

@karalabe - nice explanation, but the issue I am facing is that my node reaches full sync, i.e. eth.syncing becomes false, but when I check after some time it is 100 blocks behind, sometimes even 1000 blocks.
I have it running on a 300 GiB EBS (SSD) volume with 2 cores and 4 GiB of memory on a 3.0 GHz Xeon.
It always reaches a full sync but then lags behind sometimes.

So is this behaviour normal, and is there a way to avoid it?

references:
ec2 instance types

kandrejevs

kandrejevs commented on Aug 8, 2018

@kandrejevs
Author

@Rohithzr you have to use an i3.large instance on AWS to be able to sync. The problem is that EBS has high latency, and that is the reason it fails to sync. i3 instances have NVMe drives, which have lower latency.

geth is a really terrible piece of software if you have to pay $130 a month just to be able to run a node in the cloud.

Rohithzr

Rohithzr commented on Aug 8, 2018

@Rohithzr

@kandrejevs I am using external EBS volumes for geth, which give me NVMe drives on the C5, so is there any other specific reason to go for i3?

kandrejevs

kandrejevs commented on Aug 8, 2018

@kandrejevs
Author

@Rohithzr only c5d instances use NVMe instance storage; EBS gp2 is general-purpose SSD with higher latency than NVMe.

For me, i3 instances sync relatively fast, I think in less than 8 hours; c5 took almost a week.

Rohithzr

Rohithzr commented on Aug 9, 2018

@Rohithzr

@kandrejevs you are correct, but scaling the instances this way would mean wasting the 15 GiB of RAM, and once the Ethereum blockchain size passes 475 GiB I will have to scale to an i3.xlarge with 30 GiB of RAM and pay double. These volumes cannot be resized and hence give less control. Is there a benchmark that can compare?

kandrejevs

kandrejevs commented on Aug 9, 2018

@kandrejevs
Author

8GB of RAM is too little anyway; I often experienced random crashes because of it and had to add an additional 20GB swap file to keep the system somewhat stable. I run geth through supervisor so it is always running; those random out-of-memory crashes even corrupted the local database twice, and I had to restore from an AMI. I would say 15GB is the bare minimum, so I would not worry about wasting 15GB of RAM.

About scaling: well, that is the sad reality of blockchain and geth, and there is nothing you can do about it. It is what it is: really expensive software to run that brings little if any value in its current state.

The only alternative is to run geth in light mode and rely on external peers only, but due to the low number of peers that setup is not stable at all; you can get outages for hours or even days where your node cannot find any peers and is pretty much useless. i3 is your best bet on AWS to run geth somewhat stably and reliably if it is mission critical.

Rohithzr

Rohithzr commented on Aug 9, 2018

@Rohithzr

@kandrejevs thanks for your input. I would still like to experiment in some ways; maybe a cluster would provide more reliability, or maybe something else. If nothing works, I will probably go for i3. Mine runs on supervisor too and gives me 70 to 80% reliability, so I guess I'll see what the future holds.

juergenhoetzel

juergenhoetzel commented on Sep 6, 2018

@juergenhoetzel

I was able to increase performance significantly by increasing the number of database handlers (open files cache): Add new command line option --database-handles (#16796)
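Whether more database handles help depends on the open-file limit of the geth process: LevelDB (geth's backing store at the time) keeps many database files open, and a low soft limit can throttle it. A quick way to inspect the current limits on a Unix-like system, as a diagnostic sketch:

```python
import resource

# Soft/hard limits on open file descriptors for the current process.
# A low soft limit caps how many database files LevelDB can keep open;
# it can be raised from the shell with `ulimit -n`, up to the hard limit.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open-file descriptors: soft limit={soft}, hard limit={hard}")
```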

Rohithzr

Rohithzr commented on Sep 7, 2018

@Rohithzr

@juergenhoetzel I am currently getting good enough reliability after scaling the server to r5.large (2 vCPUs, 16 GiB of RAM) and increasing the cache. So does this option help me in some way?

          Geth node sync always behind · Issue #16796 · ethereum/go-ethereum