eth/downloader: fix invalid hash chain error due to head mini reorg #17839
Conversation
eth/downloader/downloader.go (outdated):

	head = full
	}
}
// If the head is way older that this batch, delay the last few headers
s/that/than
eth/downloader/downloader.go (outdated):

	}
}
// If the head is way older that this batch, delay the last few headers
if head+uint64(reorgProtThreshold) < headers[n-1].Number.Uint64() && n >= reorgProtHeaderDelay {
That means: if head+48 < headers[last] && len(headers) > 2, then use headers[:-2]. So what if len(headers) == 1? Either we should pause there, or the initial check n > 0 should be n > reorgProtHeaderDelay.
Headers are always fetched in batches of 128. Worst case scenario: if there's exactly 1 header after a skeleton fetch, we skip delaying and potentially hit the previous issue. I could add an extra clause to handle the case where len(headers) is less than reorgProtHeaderDelay.
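To make that edge case concrete, here is a minimal sketch of what such an extra clause could look like. The constant values (48 and 2) come from the diff and discussion above; the helper function and its signature are hypothetical illustrations, not the merged code:

	package sketch

	import "github.com/ethereum/go-ethereum/core/types"

	// Constants as discussed in the review: 48-block threshold, 2-header delay.
	const (
		reorgProtThreshold   = 48
		reorgProtHeaderDelay = 2
	)

	// delayedCut (hypothetical helper) returns how many trailing headers to
	// withhold from delivery. If the batch is smaller than the delay window,
	// the whole batch is withheld, closing the len(headers) == 1 hole raised
	// above.
	func delayedCut(head uint64, headers []*types.Header) int {
		n := len(headers)
		if n == 0 || head+reorgProtThreshold >= headers[n-1].Number.Uint64() {
			return 0 // batch is near the local head, no reorg protection needed
		}
		if n >= reorgProtHeaderDelay {
			return reorgProtHeaderDelay
		}
		return n // tiny batch: delay everything rather than risk a reorged head
	}

A caller would then deliver headers[:n-cut] and, when the cut swallows the whole batch, fall through to the wait-and-retry path discussed next.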
} else {
	// No headers delivered, or all of them being delayed, sleep a bit and retry
	p.log.Trace("All headers delayed, waiting")
	select {
I don't know the full details of how this part works, but you introduce a sleep here (3s) which wasn't there previously. Are you sure that (1) it's sufficient time, and (2) it doesn't block other important things, like downloader events that cannot be delivered?
This mirrors line 826, which did the same thing, just when we fully exhausted our headers.
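For readers without the surrounding code: a self-contained sketch of that wait-and-retry shape. The 3s interval matches the reviewer's observation; the function and channel names are illustrative assumptions, not the downloader's actual identifiers:

	package sketch

	import (
		"errors"
		"time"
	)

	// fetchLoop (illustrative) re-requests headers while every delivery comes
	// back fully delayed, without ever blocking teardown: both selects stay
	// receptive to the cancel channel, addressing the concern raised above.
	func fetchLoop(request func(), delivered <-chan int, cancel <-chan struct{}) error {
		for {
			request()
			select {
			case n := <-delivered:
				if n > 0 {
					return nil // usable headers arrived, continue processing
				}
				// All headers delayed: back off briefly, then ask again.
				select {
				case <-time.After(3 * time.Second):
				case <-cancel:
					return errors.New("header fetch canceled")
				}
			case <-cancel:
				return errors.New("header fetch canceled")
			}
		}
	}

The cancel case is what answers the second concern: the sleep never blocks shutdown, because the timer wait races against the cancel channel.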
087ab30 to 6ee3b26 (Compare)
For a very long time now we've received issue reports about synchronization failing with "invalid hash chain" errors. We managed to reproducibly hit it on Rinkeby lately, so here's the fix! 💥
The issue happens during fast sync, while the downloader is busy pulling the latest state trie. Since we added in-memory pruning about a year ago, the downloader cannot stick to a single pivot point throughout the sync; rather, it needs to push the pivot forward if the chain progresses enough (since old state becomes unavailable). This was handled by the downloader constantly rechecking for new headers while state sync is ongoing, slowly pushing them on top of the current queue.
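As a schematic illustration of that pivot-forwarding (the function name and the pruning-window constant below are assumptions for exposition, not geth's real bookkeeping):

	package sketch

	// pruneWindow is an assumed bound on how many recent blocks of state the
	// in-memory pruning keeps available.
	const pruneWindow = 64

	// maybeMovePivot pushes the fast-sync pivot forward once the head has
	// progressed far enough that state at the old pivot risks being pruned.
	func maybeMovePivot(pivot, head uint64) uint64 {
		if head > pivot+pruneWindow {
			return head - pruneWindow
		}
		return pivot
	}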
A mini reorg on the chain head, however, meant that the newly discovered header might not actually fit on top of the old queue, causing an invalid hash chain error and dropping the master peer. This can happen on mainnet if a previous head becomes an uncle, and it happens relatively often on Rinkeby when out-of-turn signers race for the next block.
I've tried to create a solution that's minimally invasive, since the downloader is one of our most sensitive pieces of code. The PR contains two tiny extensions to the header sync logic, sketched in code right after this list:

- If a delivered header batch runs well ahead of the local head (beyond reorgProtThreshold blocks), the last reorgProtHeaderDelay headers are withheld from delivery, so a mini reorg at the tip cannot push an invalid hash chain into the queue.
- If a batch ends up empty after delaying (or no headers were delivered at all), the downloader sleeps briefly and re-requests instead of treating it as a peer failure.
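A condensed sketch combining both extensions (constants from the diffs above; the delivery and retry plumbing is illustrative, not the downloader's real API):

	package sketch

	import "github.com/ethereum/go-ethereum/core/types"

	const (
		reorgProtThreshold   = 48 // blocks the batch must lead the head by
		reorgProtHeaderDelay = 2  // trailing headers to withhold
	)

	// protectedDeliver applies both extensions: withhold the freshest headers
	// when the batch runs far ahead of the local head, and fall back to a
	// retry instead of a hard failure when nothing is left to deliver.
	func protectedDeliver(head uint64, headers []*types.Header,
		deliver func([]*types.Header), retry func()) {

		if n := len(headers); n > 0 && head+reorgProtThreshold < headers[n-1].Number.Uint64() {
			if n > reorgProtHeaderDelay {
				headers = headers[:n-reorgProtHeaderDelay]
			} else {
				headers = nil // tiny batch: withhold everything
			}
		}
		if len(headers) > 0 {
			deliver(headers)
			return
		}
		retry() // no headers delivered, or all of them delayed: sleep and re-request
	}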
This seems to have completely fixed my Rinkeby sync, where I can do a full fast sync without hitting any invalid hash chain errors whatsoever.