rbd: implement image qos in tokenbucket algorithm. #17032

Conversation
src/common/Throttle.h
Outdated
@@ -12,7 +12,9 @@
 #include <map>
 #include <mutex>

 #include "Cond.h"
Comment: To maintain consistency throughout the code -- can you kindly explain why this header file name starts with a capital letter? I believe in Ceph, headers start with lower case.
@amitkumar50 That's an existing file -- there are several headers that start w/ upper case (including nearly 100% of the librbd headers to match the class name contained within it).
@dillaman Thanks for clarification.. All good from my end
src/common/Throttle.h
Outdated
@@ -330,4 +332,71 @@ class OrderedThrottle {
   uint32_t waiters = 0;
 };

 class Bucket {
Why is this a standalone class? Can it just be folded into `TokenBucketThrottle` instead?
Yes, we can put it in TokenBucketThrottle. I put it outside because I was thinking there is a possibility we might need to implement another LeakyBucketThrottle that would use the Bucket as well. But that's just a future possibility, and we can move it back out on that day. I will put Bucket into TokenBucketThrottle now.
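A minimal sketch of the fold being discussed -- hypothetical member names, not Ceph's actual implementation: `Bucket` becomes a nested class of `TokenBucketThrottle`, and could be hoisted back out later if a `LeakyBucketThrottle` ever needs to share it.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical sketch: Bucket folded into TokenBucketThrottle as a
// nested class. Names are illustrative only.
class TokenBucketThrottle {
  class Bucket {
    int64_t remain = 0;  // tokens currently available
    int64_t max = 0;     // bucket capacity
    friend class TokenBucketThrottle;

  public:
    int64_t get(int64_t c) {
      int64_t got = std::min(c, remain);  // take what is available
      remain -= got;
      return got;
    }
  };

  Bucket m_bucket;

public:
  void set_max(int64_t m) { m_bucket.max = m; }

  // Refill, capped at the bucket's capacity; driven by a periodic timer.
  void put(int64_t c) {
    m_bucket.remain = std::min(m_bucket.remain + c, m_bucket.max);
  }

  int64_t take(int64_t c) { return m_bucket.get(c); }
};
```

Only the enclosing throttle can name `Bucket`, so nothing outside the throttle depends on it and the type can still move later.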
src/common/Throttle.h
Outdated
   void cancel_tokens();

 public:
   int get(int64_t c);
The IO path needs to be non-blocking, so this would need to support a `Context` callback. Since it would be wasteful to construct a `Context` for the allowed case, you could provide a templated `get` method that has parameters for the callback class and function. If a throttle is required, it could create a `Context` to invoke the provided class/function when tokens are available and return false; otherwise it returns true to indicate that throttling isn't required.
Thanx for your detailed suggestion. Will try an async way for this.
src/common/Throttle.h
Outdated
 public:
   int get(int64_t c);
   void set_max(int64_t m);
   void set_avg(int64_t avg);
Nit: set_average
src/common/options.cc
Outdated
@@ -5226,6 +5226,14 @@ static std::vector<Option> get_rbd_options() {
   Option("rbd_journal_max_concurrent_object_sets", Option::TYPE_INT, Option::LEVEL_ADVANCED)
   .set_default(0)
   .set_description("maximum number of object sets a journal client can be behind before it is automatically unregistered"),

   Option("rbd_image_qos_iops_read_limit", Option::TYPE_UINT, Option::LEVEL_ADVANCED)
Nit: `_image_` is unnecessary since all `rbd_` options are for an image. Might also be good to name these `rbd_read_iops_max` and `rbd_write_iops_max`.
src/librbd/ImageCtx.h
Outdated
@@ -88,6 +89,9 @@ namespace librbd {
   ImageWatcher<ImageCtx> *image_watcher;
   Journal<ImageCtx> *journal;

   TokenBucketThrottle *read_iops_throttle;
Nit: these should be within librbd::io::ImageRequestWQ
src/librbd/ImageCtx.h
Outdated
@@ -179,6 +183,8 @@ namespace librbd {
   uint32_t readahead_trigger_requests;
   uint64_t readahead_max_bytes;
   uint64_t readahead_disable_after_bytes;
   uint64_t write_iops_limit;
Nit: these are unused
src/librbd/io/ImageRequestWQ.cc
Outdated
@@ -597,6 +597,12 @@ void *ImageRequestWQ<I>::_void_dequeue() {
     return nullptr;
   }

   if (write_op) {
     m_image_ctx.write_iops_throttle->get(1);
This will block the thread pool's worker thread. These need to be asynchronous callbacks when throttling is required. Also, if I set my write qos to X and my read to Y (where Y > X), all my reads can be throttled if I go past my write limit. Perhaps just start w/ a single `rbd_iops_max` configuration parameter that throttles both -- or provide separate read and write queues for throttled IOs.
If I make this async, I believe the problem you mentioned will be gone, right?
Hi @dillaman I found I can't come up with a good idea to make it asynchronous. If I do the work like #15799 to … So I think the solution to the problem you mentioned at #17032 (comment) is separating the read and write queues. Then a write IO would be blocked when it can't get a token, but no need to worry: all the IOs behind this write in the same queue are write IOs, and they should be blocked as well. At the same time, the read IOs are being processed in another queue without any blocking. So I'm inclined to start with only rbd_iops_max as you suggested, and then provide separate queues for read and write later. What's your suggestion?
@yangdongsheng That's what I was trying to say in my last comment. Either (for now) don't try to separate the read and write IOPS throttle so that it doesn't matter, or else you will need to add two "throttled requests" lists to … Alternatively, instead of the two lists, if the …
Force-pushed from b2eba40 to e3aad72
@dillaman what about this version as the start, with only …
@yangdongsheng Fine w/ me
Hi @dillaman I am preparing another PR for read/write qos. But this PR is for both read and write combined. Please help to review. thanx a lot.
src/common/Throttle.cc
Outdated

 bool TokenBucketThrottle::Bucket::_wait(int64_t c)
 {
   utime_t start;
Looks like `start` is an unused variable here.
@amitkumar50 yea, thanx
@yangdongsheng Looks like all my existing comments need to be addressed -- specifically with regard to async operation.
Force-pushed from 36e644b to 85357b1
@dillaman Hi Jason, what about this version? There is a list in TokenBucketThrottle which will hold the callback for each IO that needs to be blocked.
Force-pushed from 85357b1 to 83673ec
If bursting is to eventually be supported, shouldn't the tokens being added to the bucket represent an IO operation where the timer should be draining the bucket at the IOPS average and the max is set to the burst IOPS?
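To illustrate the semantics being suggested (a sketch with hypothetical names, not the PR's code): the refill timer adds tokens at the average IOPS rate, while the bucket capacity ("max") is the burst allowance -- idle time accumulates credit up to a burst, but the sustained rate stays at the average.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical sketch: refill at the average rate each timer tick;
// the bucket capacity ("max") is the burst allowance.
class BurstBucket {
  uint64_t m_remain = 0;
  uint64_t m_max = 0;  // burst IOPS (bucket capacity)
  uint64_t m_avg = 0;  // sustained IOPS (tokens added per tick)

public:
  void set_max(uint64_t m) { m_max = m; }
  void set_average(uint64_t a) { m_avg = a; }

  // Called once per second by a refill timer.
  void tick() { m_remain = std::min(m_remain + m_avg, m_max); }

  // Returns true if an IO may proceed now.
  bool take(uint64_t c = 1) {
    if (m_remain < c) return false;
    m_remain -= c;
    return true;
  }

  uint64_t remain() const { return m_remain; }
};
```

With average 100 and max 300, three idle seconds bank a 300-IO burst, after which the throttle settles back to 100 IOs per second.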
src/librbd/Operations.cc
Outdated
@@ -1464,7 +1466,8 @@ void Operations<I>::execute_metadata_remove(const std::string &key,
   ldout(cct, 5) << this << " " << __func__ << ": key=" << key << dendl;

   operation::MetadataRemoveRequest<I> *request =
     new operation::MetadataRemoveRequest<I>(m_image_ctx, on_finish, key);
     new operation::MetadataRemoveRequest<I>(m_image_ctx,
       new C_NotifyUpdate<I>(m_image_ctx, on_finish), key);
Nit: spacing -- align w/ previous line's parameter (if room) or bump all the parameters down and indent.
   auto &expect = EXPECT_CALL(get_mock_io_ctx(mock_image_ctx.md_ctx),
                              exec(mock_image_ctx.header_oid, _, StrEq("rbd"), StrEq("metadata_list"), _, _, _));
   if (r < 0) {
     expect.WillOnce(Return(r));
Add a test for the failure path
src/common/Throttle.cc
Outdated
@@ -1,6 +1,8 @@
 // -*- mode:C++; tab-width:8; c-basic-offset:2; indent-tabs-mode:t -*-
 // vim: ts=8 sw=2 smarttab

 #include <boost/bind.hpp>
Nit: should be able to use std::bind
src/common/Throttle.h
Outdated
   Bucket(CephContext *cct, const std::string& n, int64_t m = 0);
   ~Bucket();

 public:
Nit: duplicate
src/common/Throttle.h
Outdated
   Mutex lock;
   list<Cond*> cond;
   int64_t get(int64_t c);
   friend class TokenBucketThrottle;
Nit: just make these structs and remove the need for `friend class` -- the C++11 standard also already provides nested classes with access to the enclosing class.
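A small sketch of the point (hypothetical names): making the nested type a `struct` removes the `friend` declaration, since the enclosing class can then reach the members directly, while the nested type itself can stay private to the throttle.

```cpp
#include <cstdint>

// Sketch: a nested struct needs no `friend` -- its members are public
// to any code that can name it, and only the enclosing class can.
class TokenBucketThrottle {
  struct Bucket {    // struct: no `friend class TokenBucketThrottle`
    int64_t remain = 0;
    int64_t max = 0;
  };
  Bucket m_bucket;   // Bucket itself stays private to the throttle

public:
  void set_max(int64_t m) { m_bucket.max = m; }
  int64_t get_max() const { return m_bucket.max; }
};
```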
src/common/Throttle.h
Outdated
 };

 class Blocker {
   int64_t token_requested;
Nit: `tokens_requested` -- why signed int throughout?
src/common/Throttle.h
Outdated
CephContext *cct; | ||
const std::string name; | ||
std::atomic<int64_t> remain = { 0 }, max = { 0 }; | ||
Mutex lock; |
Why does this require its own lock?
To protect remain and max. Yes, remain and max are both atomic, but we need to make a whole function atomic, e.g. the function put(c):

if (remain + c <= max)
  remain += c;

If remain is 2 and max is 5, and two threads A and B are both doing put(2):

Thread A: if (remain + c <= max)   // 2 + 2 <= 5
Thread B: if (remain + c <= max)   // 2 + 2 <= 5
Thread A: remain += c              // remain = 4
Thread B: remain += c              // remain = 6

Then we can end up with remain > max.
... but since `Bucket` is only accessible via `TokenBucketThrottle` and it already has a lock, why not just have `TokenBucketThrottle` use its lock to protect its nested class?
Do you mean TokenBucketThrottle::m_blockers_lock? That is to protect the blockers_list, not the fields in Bucket. For example, set_max() is not protected by m_blockers_lock, so it would be in a race condition if we don't have a Bucket::lock.
`TokenBucketThrottle::set_max` can easily take the lock before invoking the method in `TokenBucketThrottle::Bucket::set_max` -- and `TokenBucketThrottle::m_blockers_lock` can be renamed `TokenBucketThrottle::m_lock`.
Hm.... okey, not bad. thanx.
src/common/Throttle.cc
Outdated
{ | ||
} | ||
|
||
TokenBucketThrottle::Bucket::~Bucket() |
Nit: unused -- remove
src/common/Throttle.h
Outdated
@@ -12,7 +12,9 @@
 #include <map>
 #include <mutex>

 #include "Cond.h"
Nit: not used
src/common/Throttle.h
Outdated
const std::string name; | ||
std::atomic<int64_t> remain = { 0 }, max = { 0 }; | ||
Mutex lock; | ||
list<Cond*> cond; |
Nit: not used
@dillaman Yes, the max could be different from the average if we support bursting. For example, if we set …
Force-pushed from 46999e0 to 0c2413b
@ceph-jenkins retest this please
Force-pushed from 40068ec to 820d451
Signed-off-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
Force-pushed from f57b8bf to 470c1d8
lgtm
ceph:
  conf:
    client:
      rbd qos iops limit = 50
Not valid JSON, should be rbd qos iops limit: 50
(http://qa-proxy.ceph.com/teuthology/jdillaman-2017-11-13_12:10:01-rbd-wip-jd-testing2-distro-basic-smithi/1844450/teuthology.log)
Oh, sorry, what a stupid mistake.
Signed-off-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
Force-pushed from 470c1d8 to bf4e454
@yangdongsheng the fsx test hangs due to an issue where …
rbd: implement image qos in tokenbucket algorithm Reviewed-by: Jason Dillaman <dillaman@redhat.com>
@dillaman Okey, thanx
I merged these patches into 10.2.5. I found a bug where image.close is unable to return: m_in_progress_writes had been incremented twice. I made the following modifications. Is there any problem with it? Thank you very much.
@junxfl This is a new feature for the mimic release, which means it's not planned to be backported to luminous (let alone the jewel release). Merge conflicts are to be expected and are something that you will have to handle if you want to run your own custom jewel-release of Ceph (due to resource limitations on our side).
I tested these patches with LIO + tcmu-runner. I set conf_rbd_qos_iops_limit 100 and conf_rbd_qos_read_iops_limit 30; I found that read IOPS can be limited to 30, but write IOPS is only 24, not 70. Could this be caused by all the IO requests being in the same queue? Do you have any suggestions? Thank you in advance! The fio configuration file is as follows: [test] [test1]
We only support an iops limit in this PR, for both read and write combined. The config conf_rbd_qos_read_iops_limit 30 is not supported in this pull request -- did you implement it by yourself?
…On 04/24/2018 11:18 AM, junxfl wrote:
I test these patch by LIO + tcmu-runner. I set conf_rbd_qos_iops_limit 100, conf_rbd_qos_read_iops_limit 30, I found that read iops can be limited to 30,but write iops only 24,not 70. May it be caused by the fact that all the iO requests are in the same queue? Do you have any suggestion on it? Thank you in advance! The fio configuration file is as follows:
***@***.*** conf]# cat config.iops.two
[global]
ioengine=libaio
bs=4k
runtime=30
iodepth=32
direct=1
thread
filename=/dev/sdb
[test]
rw=randread
[test1]
rw=randwrite
But there is a closed PR #18762 to implement it. I am sorry I did not finish merging this into upstream. I will try my best to come back and push this feature into upstream. But you can pick the patches and give them a try. Thanx
Sorry, I merged other patches from your GitHub branch (https://github.com/yangdongsheng/ceph/commits/rbd_qos_read_write_bps), then I tested it.
Thank you very much for your reply. I merged all patches from your PR #18762, and then I found that read and write can't be limited at the same time.
@junxfl I will rebase my read_write qos branch against the latest Ceph and do the test you described.
@junxfl I rebased my patch and did a test as follows:
…
on another terminal:
…
It works well.
@yangdongsheng But if two threads (read, write) operate on the same ImageCtx, the issue will come up, like I said.
Okey, then I will try that test case. But anyway, this should be a topic in my other PR -- thanx, I will investigate it. Write should not be blocked by the read throttle.
@junxfl I did a fio test as below; that's a little different from what you met:
fio file: …
@yangdongsheng Yes, it appears normal when tested this way, but it shows an abnormal result when tested using LIO + tcmu. Let me double-check it, many thanks.
Same as #16865, but recreated on the correct branch of my repo.
@dillaman please help to review. thanx