Skip to content

mimic: kv/RocksDBStore: ignore empty rocksdb transaction #36108

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

wuhongsong
Copy link

@wuhongsong wuhongsong commented Jul 15, 2020

otherwise, disks associated to osds have small write io every bluefs_balance_interval seconds even on an idle ceph cluster.

Problem only exists in Luminous/Mimic.

Fixes: https://tracker.ceph.com/issues/46411
Signed-off-by: hzwuhongsong wojiaowugen@163.com

Checklist

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug

Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard backend
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox

Sorry, something went wrong.

@wuhongsong wuhongsong requested a review from a team July 15, 2020 08:19
@wuhongsong wuhongsong requested review from a team as code owners July 15, 2020 08:19
@wuhongsong wuhongsong changed the base branch from master to mimic July 15, 2020 08:21
@wuhongsong wuhongsong changed the title Abnormal io kv/RocksDBStore: ignore empty rocksdb transaction Jul 15, 2020
@wuhongsong
Copy link
Author

jenkins test make check

@tchaikov
Copy link
Contributor

@wuhongsong do you know where the empty transaction is created and submitted ?

@wuhongsong
Copy link
Author

wuhongsong commented Jul 15, 2020

@wuhongsong do you know where the empty transaction is created and submitted ?

@tchaikov

The empty rocksdb transaction is created and submitted in BlueStore::_kv_sync_thread().

After the pr( #26866  ) is applied, even the ceph cluster is idle, it will create a rocksdb transaction ( KeyValueDB::Transaction synct = db->get_transaction() ) and be submitted by submit_transaction_sync every bluefs_balance_interval seconds,But there is no real bufferlist to set the rocksdb transaction, so it’s empty.

After my change in this pr, there is no longer have the small write on an idel ceph cluster.

Copy link
Contributor

@aclamk aclamk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Submitting empty transactions to RocksDB does not make sense, unless it is used as synchronization point.
So, when woptions.sync==true, we should pass even empty transaction to RocksDB.

@wuhongsong
Copy link
Author

Submitting empty transactions to RocksDB does not make sense, unless it is used as synchronization point.
So, when woptions.sync==true, we should pass even empty transaction to RocksDB.

@aclamk
Maybe I can move the judgement of empty transaction to BlueStore::_kv_sync_thread(),then if the transaction is empty and bluefs_do_check_balance is true, it will not call submit_transaction_sync any more in _kv_sync_thread().

Do you think it’a a good idea?

@smithfarm smithfarm added this to the mimic milestone Jul 20, 2020
Copy link
Contributor

@smithfarm smithfarm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please amend the commit message (git commit --amend) so it includes an explanation of why this fix cannot be cherry-picked from master, octopus, or nautilus.

For details, see https://github.com/ceph/ceph/blob/master/SubmittingPatches-backports.rst and, in particular, https://github.com/ceph/ceph/blob/master/SubmittingPatches-backports.rst#cherry-picking-rules

(In short, we do not ordinarily accept fixes implemented directly in stable branch. Only in special cases when the fix is not applicable to any of the more recent branches.)

@smithfarm smithfarm changed the title kv/RocksDBStore: ignore empty rocksdb transaction mimic: kv/RocksDBStore: ignore empty rocksdb transaction Jul 20, 2020
@wuhongsong
Copy link
Author

Please amend the commit message (git commit --amend) so it includes an explanation of why this fix cannot be cherry-picked from master, octopus, or nautilus.

For details, see https://github.com/ceph/ceph/blob/master/SubmittingPatches-backports.rst and, in particular, https://github.com/ceph/ceph/blob/master/SubmittingPatches-backports.rst#cherry-picking-rules

(In short, we do not ordinarily accept fixes implemented directly in stable branch. Only in special cases when the fix is not applicable to any of the more recent branches.)

Thanks for your advice.

@wuhongsong wuhongsong requested review from aclamk and smithfarm and removed request for a team July 21, 2020 08:22
Copy link
Contributor

@aclamk aclamk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think getting rid of empty transactions is a good idea.

ceph_assert(r == 0);
}

bluefs_do_check_balance = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was there a reason to move "bluefs_do_check_balance = false"?

Copy link
Author

@wuhongsong wuhongsong Jul 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

static_cast<RocksDBStore::RocksDBTransactionImpl *>(synct.get());
RocksDBStore::RocksWBHandler bat_txc;
_t->bat.Iterate(&bat_txc);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Iterating over submit batch just to check if it is empty is problematic, especially because kv_sync_thread is known bottleneck. Moreover, synct can be empty, but we could have already submitted some other batches to db ( _txc_apply_kv).
We should gather information whether we need to submit sync by other means.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aclamk Thanks for your review.

In addition to the judgment of synct is empty, there is also the judgment of whether the bluefs_do_check_balance is true.

As you konw,when the bluefs_do_check_balance is true,the kv_queue should be empty , so we impossibility have already submitted some other batches to db. Right?

Thanks again for your review.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wuhongsong

  1. "As you konw,when the bluefs_do_check_balance is true,the kv_queue should be empty , so we impossibility have already submitted some other batches to db."
    I did not actually realized that. True, each time we add to kv_queue we signal kv_cond, so it IS impossible to get both timeout and non-empty kv_queue.
    I am only afraid that relying on this might bring us an error when we "optimize" notification frequency, as we did with deferred_done_queue.
  2. Iterating over bat_txc should be done only when bluefs_do_check_balance is true. Unless there any other situation for us to get empty transaction; which I do not see.

@smithfarm smithfarm dismissed their stale review July 21, 2020 09:17

@wuhongsong Thanks for including the sentence "Problem only exists in Luminous/Mimic" in the commit message. This is nice, but it would be even nicer if the commit message explained why the problem can't be fixed by cherry-picking one or more commits from master. For example, the problem doesn't exist in master because something was completely reworked, and that rework is not going to be backported. That said, I don't want to make problems for anyone. The sentence you added is enough for me.

@smithfarm smithfarm removed their request for review July 21, 2020 09:17
Copy link
Contributor

@aclamk aclamk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Getting rid of empty transaction is good idea.

static_cast<RocksDBStore::RocksDBTransactionImpl *>(synct.get());
RocksDBStore::RocksWBHandler bat_txc;
_t->bat.Iterate(&bat_txc);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wuhongsong

  1. "As you konw,when the bluefs_do_check_balance is true,the kv_queue should be empty , so we impossibility have already submitted some other batches to db."
    I did not actually realized that. True, each time we add to kv_queue we signal kv_cond, so it IS impossible to get both timeout and non-empty kv_queue.
    I am only afraid that relying on this might bring us an error when we "optimize" notification frequency, as we did with deferred_done_queue.
  2. Iterating over bat_txc should be done only when bluefs_do_check_balance is true. Unless there any other situation for us to get empty transaction; which I do not see.

@wuhongsong wuhongsong requested a review from aclamk August 3, 2020 07:28
_t->bat.Iterate(&bat_txc);
}
if (!(bluefs_do_check_balance && bat_txc.seen.empty())) {
// submit synct synchronously (block and wait for it to commit)
Copy link
Contributor

@aclamk aclamk Aug 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wuhongsong Have you considered using rocksdb::WriteBatch::Count() for testing if transaction is empty?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aclamk Thank you for your advice, I have modified it.

otherwise, disks associated to osds have small write io even on an idel ceph cluster.

Problem only exists in Luminous/Mimic.

Fixes: https://tracker.ceph.com/issues/46411
Signed-off-by: hzwuhongsong <hzwuhongsong@corp.netease.com>
@wuhongsong
Copy link
Author

wuhongsong commented Aug 25, 2020

@aclamk can you help to have a review again,thanks very much.

@tchaikov
Copy link
Contributor

tchaikov commented Jan 3, 2021

@aclamk ping?

@smithfarm
Copy link
Contributor

mimic EOL

@smithfarm smithfarm closed this Jan 27, 2021
@10199962
Copy link

Ihave met the same problem ,the growth rate of block.wal space is 20M per hour......
so huge...

@Gangbiao
Copy link
Contributor

How to fix this problem at last?

@wuhongsong
Copy link
Author

How to fix this problem at last?

the pr maybe can solve the problem, but mimic EOL

@wuhongsong
Copy link
Author

Ihave met the same problem ,the growth rate of block.wal space is 20M per hour...... so huge...

maybe you can have a look:
https://access.redhat.com/solutions/5173092

@wuhongsong wuhongsong deleted the abnormal_io branch March 30, 2023 06:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants