Skip to content

Adding a new column will cause the error "table columns structure in zookeeper is different from local table structure" #13848

Closed
@IVitamin-C

Description

@IVitamin-C

When I execute this command to add a new column, it will continue to run normally after adding. If you restart the server or execute "deache table", it will not be able to continue running. The error "table columns structure in zookeeper is different from local table structure" is returned

version: 20.3.5.21
table engine:ReplicatedMergeTree
execute command : alter table test.test1 on cluster testcluster add column if not exists test_add_column Int64 default 0 comment 'test add column';
error :DB::Exception: There was an error on [ck1.:9000]: Code: 122, e.displayText() = DB::Exception: Table columns structure in ZooKeeper is different from local table structure (version 20.3.5.21 (official build)).

Activity

added
bugConfirmed user-visible misbehaviour in official release
on Aug 18, 2020
IVitamin-C

IVitamin-C commented on Aug 18, 2020

@IVitamin-C
Author

If the server restarts, if the table of new fields is not deleted, the server will not be able to start. If you execute "deache table", "DB:: exception: table columns structure in zookeeper is different from local table structure (version 20.3.5.21)" will be returned after "attach table on cluster testcluster"

Fanduzi

Fanduzi commented on Aug 18, 2020

@Fanduzi
Contributor

I also face a similar situation in our RC environment.

bj2-all-clickhouse-test-02 :) select version();

SELECT version()

┌─version()─┐
│ 20.3.4.10 │
└───────────┘

we add one column to a table A yesterday,

structure of table A

CREATE TABLE inno_topping_user.active_duration_local (
`I_USER_ID` UInt64 COMMENT '用户id',
`I_PROVINCE_ID` UInt64 COMMENT '省份id',
`I_DEVICE_TYPE` UInt8 COMMENT '设备类型',
`I_CHANNEL_ID` UInt8 COMMENT '渠道号',
`V_DEVICE_UUID` String COMMENT '设备uuid',
`V_VERSION` String COMMENT '版本号',
`D_ACTIVE_START` Int64 COMMENT '开始活跃时间',
`D_ACTIVE_END` Int64 COMMENT '结束活跃时间',
`I_DURATION` Int64,
`D_DATE` Date DEFAULT toDate(now()) COMMENT '日期') ENGINE = ReplicatedMergeTree('/clickhouse/inno_topping_user/tables/{layer}-{shard}/active_duration', '{replica}') PARTITION BY D_DATE ORDER BY (I_PROVINCE_ID, I_DEVICE_TYPE, I_CHANNEL_ID, V_VERSION) SETTINGS index_granularity = 8192

the alter query is

alter table inno_topping_user.active_duration_local on cluster ck_cluster add column V_IP String DEFAULT 1 after V_VERSION;

so after alter the structure of table A is

CREATE TABLE inno_topping_user.active_duration_local (
`I_USER_ID` UInt64 COMMENT '用户id',
`I_PROVINCE_ID` UInt64 COMMENT '省份id',
`I_DEVICE_TYPE` UInt8 COMMENT '设备类型',
`I_CHANNEL_ID` UInt8 COMMENT '渠道号',
`V_DEVICE_UUID` String COMMENT '设备uuid',
`V_VERSION` String COMMENT '版本号',
`V_IP` String DEFAULT 1,
`D_ACTIVE_START` Int64 COMMENT '开始活跃时间',
`D_ACTIVE_END` Int64 COMMENT '结束活跃时间',
`I_DURATION` Int64,
`D_DATE` Date DEFAULT toDate(now()) COMMENT '日期') ENGINE = ReplicatedMergeTree('/clickhouse/inno_topping_user/tables/{layer}-{shard}/active_duration', '{replica}') PARTITION BY D_DATE ORDER BY (I_PROVINCE_ID, I_DEVICE_TYPE, I_CHANNEL_ID, V_VERSION) SETTINGS index_granularity = 8192

it all runs ok, data were inserted, nothing wrong happened. But today our clickhouse killed by oom-killer, then we restart the clickhouse, but get these error

2020.08.18 14:10:11.445420 [ 26758 ] {} <Error> Application: Caught exception while loading metadata: Code: 122, e.displayText() = DB::Exception: Table columns structure in ZooKeeper is different from local table structure: Cannot attach table `inno_toppin
g_user`.`active_duration_local` from metadata file /data/clickhouse/node1/metadata/inno_topping_user/active_duration_local.sql from query ATTACH TABLE active_duration_local (`I_USER_ID` UInt64 COMMENT '用户id', `I_PROVINCE_ID` UInt64 COMMENT '省份id', `I_D
EVICE_TYPE` UInt8 COMMENT '设备类型', `I_CHANNEL_ID` UInt8 COMMENT '渠道号', `V_DEVICE_UUID` String COMMENT '设备uuid', `V_VERSION` String COMMENT '版本号', `V_IP` String DEFAULT 1, `D_ACTIVE_START` Int64 COMMENT '开始活跃时间', `D_ACTIVE_END` Int64 COMMEN
T '结束活跃时间', `I_DURATION` Int64, `D_DATE` Date DEFAULT toDate(now()) COMMENT '日期') ENGINE = ReplicatedMergeTree('/clickhouse/inno_topping_user/tables/{layer}-{shard}/active_duration', '{replica}') PARTITION BY D_DATE ORDER BY (I_PROVINCE_ID, I_DEVIC
E_TYPE, I_CHANNEL_ID, V_VERSION) SETTINGS index_granularity = 8192, Stack trace (when copying this message, always include the lines below):

0. Poco::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int) @ 0x102e0d8c in /usr/bin/clickhouse
1. DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int) @ 0x8f2d989 in /usr/bin/clickhouse
2. ? @ 0xd5fab04 in /usr/bin/clickhouse
3. DB::StorageReplicatedMergeTree::StorageReplicatedMergeTree(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, bool, 
DB::StorageID const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, DB::StorageInMemoryMetadata const&, DB::Context&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > cons
t&, DB::MergeTreeData::MergingParams const&, std::__1::unique_ptr<DB::MergeTreeSettings, std::__1::default_delete<DB::MergeTreeSettings> >, bool) @ 0xd5da56b in /usr/bin/clickhouse
4. ? @ 0xd9585da in /usr/bin/clickhouse
5. std::__1::__function::__func<std::__1::shared_ptr<DB::IStorage> (*)(DB::StorageFactory::Arguments const&), std::__1::allocator<std::__1::shared_ptr<DB::IStorage> (*)(DB::StorageFactory::Arguments const&)>, std::__1::shared_ptr<DB::IStorage> (DB::Storage
Factory::Arguments const&)>::operator()(DB::StorageFactory::Arguments const&) @ 0xd95baf3 in /usr/bin/clickhouse
6. DB::StorageFactory::get(DB::ASTCreateQuery const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, DB::Context&, DB::Context&, DB::ColumnsDescription const&, DB::ConstraintsDescription const&, bool) const @ 
0xd4fc66c in /usr/bin/clickhouse
7. DB::createTableFromAST(DB::ASTCreateQuery, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, DB::Context&, bool) @ 
0xcedc88e in /usr/bin/clickhouse
8. ? @ 0xced33bf in /usr/bin/clickhouse
9. ? @ 0xced3b71 in /usr/bin/clickhouse
10. ThreadPoolImpl<ThreadFromGlobalPool>::worker(std::__1::__list_iterator<ThreadFromGlobalPool, void*>) @ 0x8f51c4b in /usr/bin/clickhouse
11. ThreadFromGlobalPool::ThreadFromGlobalPool<void ThreadPoolImpl<ThreadFromGlobalPool>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::'lambda1'()>(void&&, void ThreadPoolImpl<ThreadFromGlobalPool>::scheduleImpl<
void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::'lambda1'()&&...)::'lambda'()::operator()() const @ 0x8f528c4 in /usr/bin/clickhouse
12. ThreadPoolImpl<std::__1::thread>::worker(std::__1::__list_iterator<std::__1::thread, void*>) @ 0x8f50b4b in /usr/bin/clickhouse
13. ? @ 0x8f4f00f in /usr/bin/clickhouse
14. start_thread @ 0x7dd5 in /usr/lib64/libpthread-2.17.so
15. clone @ 0xfdead in /usr/lib64/libc-2.17.so
 (version 20.3.4.10 (official build))

this is not the first time we see this error.
I don’t know if it should be handled like this, but when I perform the following operations clickhouse starts successfully and the table is restored

rename metadata sql file

cd /data/clickhouse/node1/metadata/inno_topping_user
mv active_duration_local.sql active_duration_local.sql.bak

then, start clickhouse-server.

check metadata in zookeeper

[zk: localhost:2181(CONNECTED) 20] get /clickhouse/inno_topping_user/tables/01-01/active_duration/replicas/bj2-all-clickhouse-test-01/columns
columns format version: 1
11 columns:
`I_USER_ID` UInt64      COMMENT \'用户id\'
`I_PROVINCE_ID` UInt64  COMMENT \'省份id\'
`I_DEVICE_TYPE` UInt8   COMMENT \'设备类型\'
`I_CHANNEL_ID` UInt8    COMMENT \'渠道号\'
`V_DEVICE_UUID` String  COMMENT \'设备uuid\'
`V_VERSION` String      COMMENT \'版本号\'
`V_IP` String   DEFAULT 1
`D_ACTIVE_START` Int64  COMMENT \'开始活跃时间\'
`D_ACTIVE_END` Int64    COMMENT \'结束活跃时间\'
`I_DURATION` Int64
`D_DATE` Date   DEFAULT toDate(now())   COMMENT \'日期\'

remove V_IP String DEFAULT 1, reset columns value

zkCli.sh set /clickhouse/inno_topping_user/tables/01-01/active_duration/replicas/bj2-all-clickhouse-test-01/columns "columns format version: 1
10 columns:
\`I_USER_ID\` UInt64      COMMENT \'用户id\'
\`I_PROVINCE_ID\` UInt64  COMMENT \'省份id\'
\`I_DEVICE_TYPE\` UInt8   COMMENT \'设备类型\'
\`I_CHANNEL_ID\` UInt8    COMMENT \'渠道号\'
\`V_DEVICE_UUID\` String  COMMENT \'设备uuid\'
\`V_VERSION\` String      COMMENT \'版本号\'
\`D_ACTIVE_START\` Int64  COMMENT \'开始活跃时间\'
\`D_ACTIVE_END\` Int64    COMMENT \'结束活跃时间\'
\`I_DURATION\` Int64
\`D_DATE\` Date   DEFAULT toDate(now())   COMMENT \'日期\'
"

restore table

vim active_duration_local.sql.bak , remove line ``V_IP` String DEFAULT 1,`
mv active_duration_local.sql.bak active_duration_local.sql

attach table

ATTACH TABLE inno_topping_user.active_duration_local;

the table was stored, and is able to select/insert

I realy don't know why... It's really uncomfortable, I don't know if the production environment will encounter this problem.

filimonov

filimonov commented on Aug 18, 2020

@filimonov
Contributor

Similar: #13673

IVitamin-C

IVitamin-C commented on Aug 18, 2020

@IVitamin-C
Author

After testing, it is found that if the newly added column is of string type, this problem will not occur. If it is of int type, it will appear, with 100% recurrence

Fanduzi

Fanduzi commented on Aug 18, 2020

@Fanduzi
Contributor

After testing, it is found that if the newly added column is of string type, this problem will not occur. If it is of int type, it will appear, with 100% recurrence

hi.
alter table inno_topping_user.active_duration_local on cluster ck_cluster add column V_IP String DEFAULT 1 after V_VERSION;

maybe string default int will appear too?
I did not test, if you are interested, you can help test it, my case is this situation

IVitamin-C

IVitamin-C commented on Aug 18, 2020

@IVitamin-C
Author

After testing, it is found that if the newly added column is of string type, this problem will not occur. If it is of int type, it will appear, with 100% recurrence

hi.
alter table inno_topping_user.active_duration_local on cluster ck_cluster add column V_IP String DEFAULT 1 after V_VERSION;

maybe string default int will appear too?
I did not test, if you are interested, you can help test it, my case is this situation

String is not allowed to default to 1. It can be '1' and converted once during upper processing

IVitamin-C

IVitamin-C commented on Aug 26, 2020

@IVitamin-C
Author

At present, the upgrade version can solve this problem. We will upgrade the production environment cluster

filimonov

filimonov commented on Aug 27, 2020

@filimonov
Contributor

I've got one more error report about the same:

After attempting to update a DDL for one of our clickhouse tables, our alter query command halted. We rolled back the column addition, but after restarting one of the hosts we see this error: Table columns structure in ZooKeeper is different from local table

alesapin

alesapin commented on Aug 27, 2020

@alesapin
Member

At present, the upgrade version can solve this problem. We will upgrade the production environment cluster

Yes we had several bugs in early 20.3 versions with different metadata structure. Just try to update and check that issue reproduces.

I'll try to investigate, it reproduces only when the added column has a comment?

IVitamin-C

IVitamin-C commented on Sep 3, 2020

@IVitamin-C
Author

At present, the upgrade version can solve this problem. We will upgrade the production environment cluster

Yes we had several bugs in early 20.3 versions with different metadata structure. Just try to update and check that issue reproduces.

I'll try to investigate, it reproduces only when the added column has a comment?

At present, it is found that distributed DDL for updating or deleting columns cannot be executed in some tables,progress 0%
Version has been updated to 20.6.

IVitamin-C

IVitamin-C commented on Sep 3, 2020

@IVitamin-C
Author

At present, the upgrade version can solve this problem. We will upgrade the production environment cluster

Yes we had several bugs in early 20.3 versions with different metadata structure. Just try to update and check that issue reproduces.
I'll try to investigate, it reproduces only when the added column has a comment?

At present, it is found that distributed DDL for updating or deleting columns cannot be executed in some tables,progress 0%
Version has been updated to 20.6.

DB::Exception: Metadata on replica is not up to date with common metadata in Zookeeper. Cannot alter, Stack trace (when copying this message, always include the lines below):

IVitamin-C

IVitamin-C commented on Sep 3, 2020

@IVitamin-C
Author

At present, the upgrade version can solve this problem. We will upgrade the production environment cluster

Yes we had several bugs in early 20.3 versions with different metadata structure. Just try to update and check that issue reproduces.
I'll try to investigate, it reproduces only when the added column has a comment?

At present, it is found that distributed DDL for updating or deleting columns cannot be executed in some tables,progress 0%
Version has been updated to 20.6.

DB::Exception: Metadata on replica is not up to date with common metadata in Zookeeper. Cannot alter, Stack trace (when copying this message, always include the lines below):

The current situation is that new fields can be added. If the new fields are deleted immediately, some problems may occur. After these problems, all DDL statements will execute exceptions. progress 0%
Restarting the cluster did not solve the problem, so for stability, we migrated the data in the table to a new table and deleted the table

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugConfirmed user-visible misbehaviour in official releasecomp-replicationReplicated tables

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

      Development

      No branches or pull requests

        Participants

        @filimonov@alesapin@alexey-milovidov@Fanduzi@IVitamin-C

        Issue actions

          Adding a new column will cause the error "table columns structure in zookeeper is different from local table structure" · Issue #13848 · ClickHouse/ClickHouse