Tracking issue for next generation pallet-staking and elections

Lately I have had a lot of ideas about how to improve staking, its election algorithm, and its configurability. This issue is a digest of those ideas and a tracking issue for the work.

### Current State of Affairs

`pallet-staking` is currently at its most sophisticated state. Aside from basic staking ops ([un]bonding, setting intentions, maintaining eras and exposures), it handles rewards, slashing, and election, each of which is quite a complex piece of machinery. The problem is, this _sophistication_ also translates to the code being _bloated_. At the time of this writing, staking's main `lib.rs` is ~3700 LoC, and that is with most of the logic related to slashing and offchain workers already placed in their own modules. Moreover, it has more than a hundred unit tests, some of which have already been reported to be outdated (https://github.com/paritytech/substrate/issues/5838).

There are a few major problems with this code and how it is evolving.

1. Sooner or later, this monolithic piece of code will accumulate too much technical debt and become increasingly hard to maintain.
2. FRAME currently provides _one and only one_ staking pallet, which carries all of the aforementioned complexity. It would be a major drawback not to have any configurability over this. I really don't think every chain built with Substrate needs, for instance, the offchain worker logic of staking. The same could arguably be said about slashing and rewards as well.

> That being said, there is a way to switch off the offchain workers, but the code is still very complicated and is shipped to everyone, including those who don't need it. Since the offchain machinery is disabled by a runtime check, the compiler has no way to know that it is unused, so it still increases the wasm size by a small amount.

3. I am increasingly worried about whether the unit tests are sound and will catch all the basic errors, especially at the boundaries of these logical components, e.g. `slashing <-> election`.


Aside from aesthetic refactoring, we also need a bunch more election algorithms. These are no longer `Phragmén`-related in any sense, so we also need to clean up the `sp-phragmen` crate and rename it to something more general. The current pipeline is:

Offchain: `seq_phragmen()` -> random iteration of `balancing()` (aka. `equalise()`) -> `reduce()` -> `compact()` -> submit to pool.
Onchain: `un_compact()` -> `validity_checks()`*.

The next generation pipeline will be: 

Offchain: `balanced_heuristic()`* -> `reduce()` -> `compact()` -> submit to pool.
Onchain: `un_compact()` -> `validity_checks()` -> `PJR_check()`.

We also need a `PJR_enabler()` implementation, only to be used if we reach the end of the era with no good submissions. In that case we run `seq_phragmen()` -> `PJR_enabler()`. 

* `validity_checks()`: these only make sure that a solution is valid and better than the best solution that we have had before. If we have no solution at all for some reason, we accept any valid solution, however poor.
* `balanced_heuristic()`: a new election scheme developed by Web3 Foundation. See [here](https://github.com/w3f/research-internal/blob/master/papers/NPoS_validator_selection.pdf) for more details.
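The acceptance rule for `validity_checks()` described above can be sketched as follows. This is only an illustration under loud assumptions: `Score`, `Solution`, `SolutionQueue`, and `Verdict` are hypothetical names, the score is collapsed into a single number where larger is better, and the feasibility check is a stand-in (correct winner count, no duplicates). A real implementation would also recompute the claimed score on chain.

```rust
/// Hypothetical score: one number, larger is better (an assumption for this sketch).
type Score = u128;

#[derive(Debug, PartialEq)]
enum Verdict {
    Accepted,
    RejectedInvalid,
    RejectedNotBetter,
}

/// Hypothetical solution: the elected accounts plus the score claimed for them.
struct Solution {
    winners: Vec<u32>,
    score: Score,
}

/// Holds the best solution seen so far in this election window.
struct SolutionQueue {
    best: Option<Solution>,
    desired_winners: usize,
}

impl SolutionQueue {
    /// Stand-in feasibility check: right number of winners and no duplicates.
    fn is_valid(&self, s: &Solution) -> bool {
        let mut w = s.winners.clone();
        w.sort_unstable();
        w.dedup();
        w.len() == s.winners.len() && s.winners.len() == self.desired_winners
    }

    /// Accept a solution only if it is valid and strictly better than the
    /// queued one. With nothing queued, any valid solution is accepted.
    fn submit(&mut self, s: Solution) -> Verdict {
        if !self.is_valid(&s) {
            return Verdict::RejectedInvalid;
        }
        let better = match &self.best {
            None => true,
            Some(b) => s.score > b.score,
        };
        if better {
            self.best = Some(s);
            Verdict::Accepted
        } else {
            Verdict::RejectedNotBetter
        }
    }
}
```

Note that the first valid submission always wins a slot in the queue regardless of its score, which is exactly the "accept anything if we have nothing" behaviour; a `PJR_check()` would be an additional gate after `is_valid`.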

### The Plan Ahead

Here are the steps that I think are necessary:

- [x] 1. Rename `sp_phragmen` to something more general, and rename internal functions to reflect this. Right now, since the crate is called `sp_phragmen`, the `seq_phragmen` method is simply called `elect()`. This is wrong; it should be named `seq_phragmen`. Later on, we can add other algorithms to this list.
  - Would also be good to close this now: https://github.com/paritytech/substrate/issues/4593
  - Pull Request: https://github.com/paritytech/substrate/pull/6245

- [x] 2. Implement `balanced_heuristic()` aka `phragmms`. This will be the first of the few new algorithms that we will need. My initial study concluded that implementing it is not very complex and should be straightforward.
  - Pull request that adds the implementation https://github.com/paritytech/substrate/pull/6685.

- [x] 3. Clean up all staking tests. Before making too many changes, I want to make sure that we have a solid army of unit tests so that any further change will not break anything.
  - Related Issues include but are not limited to: https://github.com/paritytech/substrate/issues/5244,
  - https://github.com/paritytech/substrate/pull/9516


- [x] 4. Introduce `ElectionProvider` to staking, decoupling staking from whatever provides a new set of election results at the end of an era. Something like the sketch below:

```rust
// Staking trait
trait Trait {
	type Election: ElectionProvider;
	// ...
}

impl<T: Trait> Module<T> {
	fn new_session(n: BlockNumber) -> Option<Vec<T::AccountId>> {
		if somehow_time_to_trigger_new_era {
			// prepare inputs
			let validators = <Validators<T>>::iter().collect();
			let nominators = <Nominators<T>>::iter().collect();
			let inp = ElectionInput { validators, nominators };
			T::Election::elect(inp).map(|output| {
				// do some processing with output, such as updating exposures etc.

				// extract only the winners Vec<T::AccountId>
				output.winners
			})
		} else {
			None
		}
	}
}

/// Something that can elect a new set of validators at the end of an era.
trait ElectionProvider {
	/// Elect new validator set.
	// The SUPER TRICKY part here is to chose a worthwhile API for this to cover all the cases.
	// A bonus not to be forgotten is that I want to use this for elections-phragmen pallet as well.
	fn elect(input: ElectionInput) -> Option<ElectionOutput>;
}

// To have an OnChain phragmen only, we just need a simple type to implement this.
struct OnChainSeqPhragmen;

impl ElectionProvider for OnChainSeqPhragmen {
	fn elect(input: ElectionInput) -> Option<ElectionOutput> {
		// just proxy a call to the primitives crates with the elections. No further call should be
		// needed.
	}
}

// To implement the current offchain machinery we will need a new pallet named:
// OffChainElectionProvider.
// /frame/offchain-election-provider/lib.rs

// this module might need to depend on staking to check solutions. We need access to staking
trait Trait: system::Trait /* + staking::Trait */ {
	type ElectionLookahead: Get<Self::BlockNumber>;
	type Call: Dispatchable + From<Call<Self>> + IsSubType<Module<Self>, Self> + Clone;
	type MaxIterations: Get<u32>;
	type MinSolutionScoreBump: Get<Perbill>;
	type UnsignedPriority: Get<TransactionPriority>;
}

// all the storage items that are only for staking
decl_storage! {
	pub SnapshotValidators get(fn snapshot_validators): Option<Vec<T::AccountId>>;
	pub SnapshotNominators get(fn snapshot_nominators): Option<Vec<T::AccountId>>;

	pub QueuedElected get(fn queued_elected): Option<ElectionResult<T::AccountId, BalanceOf<T>>>;
	pub QueuedScore get(fn queued_score): Option<ElectionScore>;

	pub EraElectionStatus get(fn era_election_status): ElectionStatus<T::BlockNumber>;

	pub IsCurrentSessionFinal get(fn is_current_session_final): bool = false;
}

decl_module! {
	fn submit_election_solution()  {}
	fn submit_election_solution_unsigned() {}
}

impl<T: Trait> ElectionProvider for Module<T> {
	fn elect(input: ElectionInput) -> Option<ElectionOutput> {
		// same logic as we have now: return the best queued solution, else fall back to the
		// on-chain election.
		<QueuedElected<T>>::take().or_else(|| Self::fallback_seq_phragmen())
	}
}

```

Notable challenges here are:
- The `offchain-election-provider` module will need to read staking's storage to check values, unless we pass it all the data that it needs. Not worth it IMO.
- The `offchain-election-provider` will still need some way to force staking to lock itself at certain points in time.
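To make the decoupling concrete, here is a small self-contained version of the same idea that compiles outside of a runtime. It is a sketch under stated assumptions, not the proposed API: the trait takes `&mut self` instead of being a static function (there is no pallet storage here), `OnChainApproval` is a hypothetical stand-in that simply ranks candidates by total approval stake and is NOT sequential Phragmén, and `QueuedOrFallback` is a hypothetical name for the "best queued solution, else fallback" provider.

```rust
use std::collections::BTreeMap;

type AccountId = u32;
type Balance = u64;

struct ElectionInput {
    validators: Vec<AccountId>,
    /// (nominator, stake, voted-for validators)
    nominators: Vec<(AccountId, Balance, Vec<AccountId>)>,
    to_elect: usize,
}

struct ElectionOutput {
    winners: Vec<AccountId>,
}

/// Something that can elect a new set of validators.
trait ElectionProvider {
    fn elect(&mut self, input: &ElectionInput) -> Option<ElectionOutput>;
}

/// Stand-in on-chain provider: rank by approval stake (not seq-phragmen).
struct OnChainApproval;

impl ElectionProvider for OnChainApproval {
    fn elect(&mut self, input: &ElectionInput) -> Option<ElectionOutput> {
        if input.validators.is_empty() || input.to_elect == 0 {
            return None;
        }
        let mut backing: BTreeMap<AccountId, Balance> =
            input.validators.iter().map(|v| (*v, 0)).collect();
        for (_, stake, votes) in &input.nominators {
            for v in votes {
                if let Some(b) = backing.get_mut(v) {
                    *b = b.saturating_add(*stake);
                }
            }
        }
        let mut ranked: Vec<(AccountId, Balance)> = backing.into_iter().collect();
        // Descending by stake; the stable sort keeps ties in ascending id order.
        ranked.sort_by(|a, b| b.1.cmp(&a.1));
        let winners = ranked.into_iter().take(input.to_elect).map(|(v, _)| v).collect();
        Some(ElectionOutput { winners })
    }
}

/// Returns the queued (e.g. offchain-submitted) solution once, else the fallback.
struct QueuedOrFallback<F: ElectionProvider> {
    queued: Option<ElectionOutput>,
    fallback: F,
}

impl<F: ElectionProvider> ElectionProvider for QueuedOrFallback<F> {
    fn elect(&mut self, input: &ElectionInput) -> Option<ElectionOutput> {
        self.queued.take().or_else(|| self.fallback.elect(input))
    }
}
```

The caller only ever sees `ElectionProvider::elect`, so swapping the plain on-chain provider for the queued variant does not change the calling code, which is the point of the trait.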


- [ ] 5. Once this is done, we could optionally investigate stripping rewards and slashing out of staking as well. Ideally, I would like to have a core staking module that **only** does, as mentioned above:

> [un]bonding, setting intentions, maintaining eras and exposures

And the rest can be plugged to it as additional modules. 
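One way such a pluggable design could look is sketched below. Everything here is hypothetical and only meant to illustrate the shape: `SlashHandler`, `SlashTenPercent`, and `CoreStaking` are made-up names, the 10% figure is arbitrary, and the core is reduced to bonding and unbonding. The no-op `()` implementation is resolved at compile time, so a chain that opts out of slashing does not pay for a runtime check.

```rust
use std::collections::BTreeMap;
use std::marker::PhantomData;

type AccountId = u32;
type Balance = u64;

/// Hypothetical hook that an optional slashing module would implement.
trait SlashHandler {
    /// Given the offender's bonded balance, return the balance that remains.
    fn apply(bonded: Balance) -> Balance;
}

/// Chains that do not want slashing plug in `()`, which changes nothing.
impl SlashHandler for () {
    fn apply(bonded: Balance) -> Balance {
        bonded
    }
}

/// Example plug-in: remove 10% of the bonded balance (arbitrary figure).
struct SlashTenPercent;

impl SlashHandler for SlashTenPercent {
    fn apply(bonded: Balance) -> Balance {
        bonded - bonded / 10
    }
}

/// Core staking: only bonding and unbonding, generic over the slashing hook.
struct CoreStaking<S: SlashHandler> {
    bonded: BTreeMap<AccountId, Balance>,
    _slashing: PhantomData<S>,
}

impl<S: SlashHandler> CoreStaking<S> {
    fn new() -> Self {
        Self { bonded: BTreeMap::new(), _slashing: PhantomData }
    }

    fn bond(&mut self, who: AccountId, amount: Balance) {
        let entry = self.bonded.entry(who).or_insert(0);
        *entry = entry.saturating_add(amount);
    }

    fn unbond(&mut self, who: AccountId, amount: Balance) {
        if let Some(b) = self.bonded.get_mut(&who) {
            *b = b.saturating_sub(amount);
        }
    }

    fn bonded_of(&self, who: AccountId) -> Balance {
        self.bonded.get(&who).copied().unwrap_or(0)
    }

    /// The core knows nothing about slashing rules; it just calls the hook.
    fn report_offence(&mut self, who: AccountId) {
        if let Some(b) = self.bonded.get_mut(&who) {
            *b = S::apply(*b);
        }
    }
}
```

A runtime would then pick `CoreStaking<()>` or `CoreStaking<SlashTenPercent>` in its configuration; a rewards hook could be added the same way.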


--- 

In my opinion, the first 3 steps of this issue are more or less mandatory and should be done very soon. The new election is a great security improvement (since we will check for the `PJR` property) and the test cleanup has been due for a long time.

As for the refactor, if it is feasible, it would be much better to do it sooner rather than later, since it will require a complicated storage migration. If it can be done prior to Polkadot's staking enablement, the migration would be much easier while there is still a sudo key at hand. Nonetheless, it is not a big issue, as we have to ship this to Kusama first anyhow.
