-
Notifications
You must be signed in to change notification settings - Fork 6.1k
mimic: kv/RocksDBStore: ignore empty rocksdb transaction #36108
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
jenkins test make check |
@wuhongsong do you know where the empty transaction is created and submitted ? |
The empty rocksdb transaction is created and submitted in BlueStore::_kv_sync_thread(). After the pr( #26866 ) is applied, even the ceph cluster is idle, it will create a rocksdb transaction ( KeyValueDB::Transaction synct = db->get_transaction() ) and be submitted by submit_transaction_sync every bluefs_balance_interval seconds,But there is no real bufferlist to set the rocksdb transaction, so it’s empty. After my change in this pr, there is no longer have the small write on an idel ceph cluster. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Submitting empty transactions to RocksDB does not make sense, unless it is used as synchronization point.
So, when woptions.sync==true, we should pass even empty transaction to RocksDB.
@aclamk Do you think it’a a good idea? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please amend the commit message (git commit --amend
) so it includes an explanation of why this fix cannot be cherry-picked from master, octopus, or nautilus.
For details, see https://github.com/ceph/ceph/blob/master/SubmittingPatches-backports.rst and, in particular, https://github.com/ceph/ceph/blob/master/SubmittingPatches-backports.rst#cherry-picking-rules
(In short, we do not ordinarily accept fixes implemented directly in stable branch. Only in special cases when the fix is not applicable to any of the more recent branches.)
Thanks for your advice. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think getting rid of empty transactions is a good idea.
ceph_assert(r == 0); | ||
} | ||
|
||
bluefs_do_check_balance = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was there a reason to move "bluefs_do_check_balance = false"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because I used it in the previous. like below:
https://github.com/ceph/ceph/pull/36108/files#diff-a9faffcf40600fd57aea5451cef5abe9R9509
src/os/bluestore/BlueStore.cc
Outdated
static_cast<RocksDBStore::RocksDBTransactionImpl *>(synct.get()); | ||
RocksDBStore::RocksWBHandler bat_txc; | ||
_t->bat.Iterate(&bat_txc); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Iterating over submit batch just to check if it is empty is problematic, especially because kv_sync_thread is known bottleneck. Moreover, synct can be empty, but we could have already submitted some other batches to db ( _txc_apply_kv).
We should gather information whether we need to submit sync by other means.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aclamk Thanks for your review.
In addition to the judgment of synct is empty, there is also the judgment of whether the bluefs_do_check_balance is true.
As you konw,when the bluefs_do_check_balance is true,the kv_queue should be empty , so we impossibility have already submitted some other batches to db. Right?
Thanks again for your review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- "As you konw,when the bluefs_do_check_balance is true,the kv_queue should be empty , so we impossibility have already submitted some other batches to db."
I did not actually realized that. True, each time we add to kv_queue we signal kv_cond, so it IS impossible to get both timeout and non-empty kv_queue.
I am only afraid that relying on this might bring us an error when we "optimize" notification frequency, as we did withdeferred_done_queue
. - Iterating over bat_txc should be done only when bluefs_do_check_balance is true. Unless there any other situation for us to get empty transaction; which I do not see.
@wuhongsong Thanks for including the sentence "Problem only exists in Luminous/Mimic" in the commit message. This is nice, but it would be even nicer if the commit message explained why the problem can't be fixed by cherry-picking one or more commits from master. For example, the problem doesn't exist in master because something was completely reworked, and that rework is not going to be backported. That said, I don't want to make problems for anyone. The sentence you added is enough for me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Getting rid of empty transaction is good idea.
src/os/bluestore/BlueStore.cc
Outdated
static_cast<RocksDBStore::RocksDBTransactionImpl *>(synct.get()); | ||
RocksDBStore::RocksWBHandler bat_txc; | ||
_t->bat.Iterate(&bat_txc); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- "As you konw,when the bluefs_do_check_balance is true,the kv_queue should be empty , so we impossibility have already submitted some other batches to db."
I did not actually realized that. True, each time we add to kv_queue we signal kv_cond, so it IS impossible to get both timeout and non-empty kv_queue.
I am only afraid that relying on this might bring us an error when we "optimize" notification frequency, as we did withdeferred_done_queue
. - Iterating over bat_txc should be done only when bluefs_do_check_balance is true. Unless there any other situation for us to get empty transaction; which I do not see.
_t->bat.Iterate(&bat_txc); | ||
} | ||
if (!(bluefs_do_check_balance && bat_txc.seen.empty())) { | ||
// submit synct synchronously (block and wait for it to commit) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wuhongsong Have you considered using rocksdb::WriteBatch::Count() for testing if transaction is empty?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aclamk Thank you for your advice, I have modified it.
otherwise, disks associated to osds have small write io even on an idel ceph cluster. Problem only exists in Luminous/Mimic. Fixes: https://tracker.ceph.com/issues/46411 Signed-off-by: hzwuhongsong <hzwuhongsong@corp.netease.com>
@aclamk can you help to have a review again,thanks very much. |
@aclamk ping? |
mimic EOL |
Ihave met the same problem ,the growth rate of block.wal space is 20M per hour...... |
How to fix this problem at last? |
the pr maybe can solve the problem, but mimic EOL |
maybe you can have a look: |
otherwise, disks associated to osds have small write io every bluefs_balance_interval seconds even on an idle ceph cluster.
Problem only exists in Luminous/Mimic.
Fixes: https://tracker.ceph.com/issues/46411
Signed-off-by: hzwuhongsong wojiaowugen@163.com
Checklist
Show available Jenkins commands
jenkins retest this please
jenkins test classic perf
jenkins test crimson perf
jenkins test signed
jenkins test make check
jenkins test make check arm64
jenkins test submodules
jenkins test dashboard
jenkins test dashboard backend
jenkins test docs
jenkins render docs
jenkins test ceph-volume all
jenkins test ceph-volume tox