eth/downloader: fix invalid hash chain error due to head mini reorg #17839

karalabe · 2018-10-04T13:50:18Z

For a very long time now we're received issue reports about the synchronization failing with "invalid hash chain" errors. We managed to reproducibly hit it on Rinkeby lately, so here's the fix! 💥

The issue happens during fast sync, while the downloader is busy pulling the latest state trie. Since we added in-memory pruning about a year ago, the downloader cannot stick to a single pivot point throughout sync, rather it needs to push it forward if the chain progresses enough (since old state will become unavailable). This was handled by the downloader constantly rechecking for new headers while state sync is ongoing, slowly pushing them on top of the current queue.

A small mini reorg on the chain head however meant that the newly discovered header might actually not fit on top of the old queue, causing an invalid hash chain error, dropping the master peer. This can happen on mainnet if a previous head becomes an uncle, and this happens relatively often on Rinkeby if out-of-turn signers race for the next block.

I've tried to create a solution that's minimally invasive, since the downloader is one of our most sensitive pieces of code. The PR contains two tiny extensions to the header sync logic:

If we're not syncing skeletons any more, only small batches of headers (i.e. we're honing in on the head of the chain), we reach out to our local database and see where we're at sync progress wise. If there's plenty of work left (48 blocks currently), then instead of delivering all the new headers to the queue, we retain/delay the last 2. This ensures that as long as we're busy syncing, any tiny 2-block head reorgs get silently ignored.
After a while, the only missing headers will be the 2 we ignored, so every header fetch will just return the same 2 headers. We don't want the downloader to spin like crazy fetching these over and over again, so the second extension is that if we received headers, but have none to import (i.e. all were delayed), we sleep for 3 seconds before trying again to check for new data.

This seems to have completely fixed my Rinkeby sync, where I can do a full fast sync without hitting any invalid hash chain errors whatsoever.

holiman · 2018-10-05T07:20:38Z

eth/downloader/downloader.go

+							head = full
+						}
+					}
+					// If the head is way older that this batch, delay the last few headers


s/that/than

holiman · 2018-10-05T07:22:21Z

eth/downloader/downloader.go

+						}
+					}
+					// If the head is way older that this batch, delay the last few headers
+					if head+uint64(reorgProtThreshold) < headers[n-1].Number.Uint64() && n >= reorgProtHeaderDelay {


that means. If head+48 < headers[last] && len(headers) > 2 , then use headers[:-2]. So what if len(headers)== 1

Either we should pause there, or the initial check n > 0 should be n > reorgProtHeaderDelay

Headers are always fetched in batches of 128. Worst case scenario if there's exactly 1 header after a skeleton fetch, then we skip delaying and potentially hit the previous issue. I could add an extra clause to handle the case if len(headers) is less then reorgProtHeaderDelay.

holiman · 2018-10-05T07:28:26Z

eth/downloader/downloader.go

+			} else {
+				// No headers delivered, or all of them being delayed, sleep a bit and retry
+				p.log.Trace("All headers delayed, waiting")
+				select {


I don't know the full details about how this part works -- but you introduce a sleep here (3s) which wasn't there previously. Are you sure that's 1. sufficient time and 2. doesn't block other important things, like downloader events that cannot be delivered?

This mirrors line 826, which did the same thing, just when we fully exhausted our headers.

…thereum#17839)

karalabe added this to the 1.8.17 milestone Oct 4, 2018

karalabe mentioned this pull request Oct 4, 2018

Countless issues when doing basic GETH syncing #16935

Closed

holiman reviewed Oct 5, 2018

View reviewed changes

eth/downloader: fix invalid hash chain error due to head mini reorg

Verified

This commit was signed with the committer’s verified signature.

bagder Daniel Stenberg

GPG key ID: 5CC908FDB71E12C2

Verified
Learn about vigilant mode

6ee3b26

karalabe force-pushed the downloader-invalid-hash-chain-fix branch from 087ab30 to 6ee3b26 Compare October 5, 2018 07:45

holiman approved these changes Oct 5, 2018

View reviewed changes

karalabe merged commit 5d3b7bb into ethereum:master Oct 5, 2018

holiman mentioned this pull request Oct 5, 2018

retrieved hash chain is invalid #17847

Closed

gzliudan mentioned this pull request Feb 24, 2025

upgrade downloader XinFinOrg/XDPoSChain#881

Merged

19 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

eth/downloader: fix invalid hash chain error due to head mini reorg #17839

eth/downloader: fix invalid hash chain error due to head mini reorg #17839

karalabe commented Oct 4, 2018

Uh oh!

holiman Oct 5, 2018

Uh oh!

holiman Oct 5, 2018

Uh oh!

karalabe Oct 5, 2018

Uh oh!

holiman Oct 5, 2018

Uh oh!

karalabe Oct 5, 2018

Uh oh!

eth/downloader: fix invalid hash chain error due to head mini reorg #17839

eth/downloader: fix invalid hash chain error due to head mini reorg #17839

Conversation

karalabe commented Oct 4, 2018

Uh oh!

holiman Oct 5, 2018

Choose a reason for hiding this comment

Uh oh!

holiman Oct 5, 2018

Choose a reason for hiding this comment

Uh oh!

karalabe Oct 5, 2018

Choose a reason for hiding this comment

Uh oh!

holiman Oct 5, 2018

Choose a reason for hiding this comment

Uh oh!

karalabe Oct 5, 2018

Choose a reason for hiding this comment

Uh oh!