This repository has been archived by the owner on Nov 24, 2023. It is now read-only.

Data Migration Roadmap

Primary Focus

Usability Improvement

  • bring relay log support back in v2.0 #1234
    • What: the binlog replication unit can read binlog events from the relay log, as it did in v1.0
    • Why:
      • AWS Aurora and some other RDS services may purge binlogs as soon as possible, while a full dump and import may take a long time
      • some users create many data migration tasks for a single upstream instance, and it is better to avoid pulling the same binlog events multiple times
  • automatically migrate binlog files that exceed 4 GB #989
    • What: a binlog file that exceeds 4 GB doesn't interrupt the migration task
    • Why: some upstream operations (like DELETE FROM with a large number of rows, or CREATE TABLE new_tbl AS SELECT * FROM orig_tbl) may generate large binlog files
  • better configuration file #775
    • What: make configuration files harder to misuse
    • Why: many users run into problems when writing configuration files and don't know how to resolve them
  • solve other known usability issues (continuous work)
    • What: solve usability issues recorded in the project
    • Why: many usability issues have not been resolved yet, and we need to stop user churn
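
To make the 4 GB item above more concrete: the MySQL binlog event header stores each event's end position in a 4-byte field, so positions read from events wrap around once a file grows past 4 GiB. The sketch below shows one possible way to rebuild full offsets from wrapped values. `PositionTracker` and its methods are hypothetical names used for illustration, not DM's implementation, and the sketch assumes that positions only grow within a file and that no single event is larger than 4 GiB.

```python
UINT32_MODULUS = 1 << 32  # the event header stores the end position in 4 bytes


class PositionTracker:
    """Rebuilds full file offsets from 32-bit positions that may wrap around.

    Hypothetical helper for illustration only.
    """

    def __init__(self) -> None:
        self.wraps = 0      # how many times the 32-bit value has wrapped
        self.last_raw = 0   # last 32-bit position seen in the current file

    def reset(self) -> None:
        """Call when the upstream rotates to a new binlog file."""
        self.wraps = 0
        self.last_raw = 0

    def observe(self, raw_pos: int) -> int:
        """Return the full offset for a 32-bit end position from an event header."""
        if raw_pos < self.last_raw:
            # Positions grow within a file, so a smaller value means a wrap.
            self.wraps += 1
        self.last_raw = raw_pos
        return self.wraps * UINT32_MODULUS + raw_pos
```

A caller would feed every event's end position to `observe` in order and call `reset` on each rotate event, so that checkpoints are saved with the full offset rather than the wrapped one.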

New features

  • stop/pause only at the end of a transaction #1095
    • What: when stopping or pausing the task, keep replicating binlog events until the end of the current transaction is reached
    • Why: keep the downstream transactionally consistent with the upstream MySQL after the task is stopped or paused
  • stop at the specified position/GTID #348
    • What: stop replicating when the specified binlog position or GTID is reached, like until_option in MySQL
    • Why: control more precisely which data is replicated
  • update source config online #1076
    • What: update source configs (like upstream connection arguments) online
    • Why: make it easier to switch from one MySQL instance to another in the replica group
  • provide a complete online replication checksum feature #1097
    • What:
      • check data without stopping writes in upstream
      • no extra writes in upstream
    • Why: find potential inconsistencies earlier
  • support DM v2.0 in TiDB Operator tidb-operator#2868
    • What: use TiDB Operator to manage DM v2.0
  • use Lightning to import full dumped data #405
    • What: use Lightning as the full data load unit
    • Why:
      • Lightning is more stable than the current Loader in DM
      • Lightning supports more source data formats, like CSV
      • Lightning supports more storage drivers, like AWS S3
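
The "stop/pause at the end of a transaction" item above can be illustrated with a minimal sketch: a stop request is honored only between transactions, so an in-flight transaction is always applied through its commit. The `Event` type, its `kind` values, and `replicate_until_safe_stop` are assumptions made for this example; they do not reflect DM's real event model or API.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Event:
    kind: str          # hypothetical kinds: "begin", "row", "commit"
    payload: str = ""


def replicate_until_safe_stop(
    events: Iterable[Event],
    stop_requested: Callable[[], bool],
) -> List[Event]:
    """Apply events, honoring a stop request only between transactions."""
    applied: List[Event] = []
    in_transaction = False
    for event in events:
        # The stop flag is only consulted at a transaction boundary.
        if not in_transaction and stop_requested():
            break
        applied.append(event)  # stand-in for writing to the downstream
        if event.kind == "begin":
            in_transaction = True
        elif event.kind == "commit":
            in_transaction = False
    return applied
```

If a stop is requested in the middle of a transaction, the remaining events of that transaction are still applied and the loop exits before the next `begin`.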

Performance Improvement

  • flush incremental checkpoint asynchronously #605
    • What: flushing checkpoints doesn't block the replication of DML statements
    • Why: removing blocking and serialization from DML replication should improve performance
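
As a rough illustration of the asynchronous checkpoint idea above, the sketch below hands checkpoint positions to a background thread through a queue, so the replication path returns immediately instead of waiting for the write. `AsyncCheckpointFlusher` and the `persist` callback are hypothetical names, not DM's design. The sketch also assumes the caller only submits a position after all DML up to that position has been applied downstream.

```python
import queue
import threading
from typing import Callable, Optional


class AsyncCheckpointFlusher:
    """Persists checkpoints on a background thread (illustrative sketch)."""

    def __init__(self, persist: Callable[[int], None]) -> None:
        self._persist = persist  # hypothetical callback that writes the checkpoint
        self._pending: "queue.Queue[Optional[int]]" = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, position: int) -> None:
        """Called from the replication path; enqueues and returns immediately."""
        self._pending.put(position)

    def _run(self) -> None:
        while True:
            position = self._pending.get()
            if position is None:  # sentinel: no more checkpoints will arrive
                return
            self._persist(position)

    def close(self) -> None:
        """Flush everything already submitted, then stop the worker."""
        self._pending.put(None)
        self._worker.join()
```

Because the queue is first-in, first-out and there is a single worker, checkpoints are persisted in submission order, and `close` returns only after every submitted position has been written.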

Out of Scope currently

  • Supporting all MySQL protocol databases: only replication of MySQL protocol binlog to a TiDB cluster is supported
    • Why: some MySQL protocol databases are not compatible with the binlog protocol
  • Providing a fully automated shard DDL merge and synchronization solution
    • Why: many scenarios are difficult to automate
  • Replicating multiple upstream MySQL sources with one dm-worker
    • Why: it requires a large amount of development work and involves many uncertainties