mds: add drop cache command #21566
Conversation
retest this please.
retest this please
@rishabh-d-dave this change does not compile
Force-pushed from 2e5a4dc to 9fb16cf
@tchaikov Made a mistake while cleaning up the debug code. Fixed it. Thanks!
Please annotate the commit which fixes/resolves a Ceph tracker issue with:
This is essential when examining the history of the repository (i.e. which issue a given commit fixes) and helps merge scripts identify issues that have been resolved by a merge.
Force-pushed from 8b3e920 to ebba6d1
src/mds/MDSRank.cc
Outdated
// Optionally, flush the journal -
Formatter *fmtr = new JSONFormatter(true);
command_flush_journal(fmtr);
// TODO: do we want output from `flush journal`?
Running ceph tell mds.a drop cache displays output from MDSRank::command_flush_journal() as well as from MDSRank::cache_status().
{
    "message": "",
    "return_code": 0
}
{
    "pool": {
        "items": 482,
        "bytes": 52398
    }
}
Do we want both or only the latter?
both, combine into one object
should be:
{
    "message": "",
    "return_code": 0,
    "cache_status": {
        "pool": {
            "items": 482,
            "bytes": 52398
        }
    }
}
Okay. From a general user's POV, I think it will be difficult to understand the context of the first two lines. Perhaps it'll be better to add a section with the title "flush_journal", like the following snippet?
{
    "flush_journal": {
        "message": "",
        "return_code": 0
    },
    "cache": {
        "pool": {
            "items": 482,
            "bytes": 52398
        }
    }
}
Updating the PR with the "flush_journal" part included. Let me know if you think it's not a good idea.
src/mds/MDSRank.cc
Outdated
@@ -2240,7 +2240,7 @@ int MDSRank::_command_flush_journal(std::stringstream *ss)
 {
   assert(ss != NULL);

-  Mutex::Locker l(mds_lock);
+  mds_lock.Lock();
@batrick I don't know if this change is really appropriate, but without it the MDS crashes after executing the command at this line [1], since it expects the MDS to be locked.
[1] https://github.com/ceph/ceph/blob/master/src/mds/MDSDaemon.cc#L1143
I'd suggest moving Mutex::Locker l(mds_lock) to surround the call to _command_flush_journal in command_flush_journal:
Line 2225 in 189192b:
const int r = _command_flush_journal(&ss);
to be
{
  Mutex::Locker l(mds_lock);
  const int r = _command_flush_journal(&ss);
}
Then comment MDSRank::_command_flush_journal that it expects mds_lock to be locked.
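A minimal sketch of that suggestion, assuming the existing command_flush_journal()/_command_flush_journal() signatures stay as they are (illustrative only, not the exact patch):

void MDSRank::command_flush_journal(Formatter *f)
{
  std::stringstream ss;
  int r;
  {
    // lock held only around the worker call
    Mutex::Locker l(mds_lock);
    r = _command_flush_journal(&ss);
  }
  // ... dump ss/r into the formatter as before ...
}

// Expects the caller to hold mds_lock.
int MDSRank::_command_flush_journal(std::stringstream *ss)
{
  assert(ss != NULL);
  assert(mds_lock.is_locked());
  // ... existing journal flush logic ...
  return 0;
}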
src/common/options.cc
Outdated
.set_default(30)
.set_description("how long should the MDS wait for receiving "
                 "CEPH_SESSION_RECALL_STATEMSG_ACK from all sessions "
                 "while dropping cache."),
I think we can keep this vague. Don't mention dropping cache. Also, rename to mds_session_recall_ack_timeout.
src/mds/MDSRank.cc
Outdated
  } else {
    return false;
  }
}

void MDSRank::command_drop_cache(stringstream *ds)
{
add Mutex::Locker l(mds_lock);
@batrick Using Mutex::Locker l(mds_lock); creates the same issue as before [1]. At least for the time being, I am replacing it with mds_lock.Lock() so that the MDS stays locked until control reaches here [2] after exiting command_drop_cache(); this prevents the MDS from crashing.
[1] #21566 (comment)
[2] https://github.com/ceph/ceph/blob/master/src/mds/MDSDaemon.cc#L1143
Oh, right, that lock is not necessary because it's locked in MDSDaemon::ms_dispatch.
src/mds/MDSRank.cc
Outdated

// Optionally, flush the journal -
Formatter *fmtr = new JSONFormatter(true);
command_flush_journal(fmtr);
should call _command_flush_journal.
src/mds/MDSRank.cc
Outdated
"CEPH_SESSION_RECALL_STATEMSG_ACK from all sessions." << dendl; | ||
|
||
// Optionally, flush the journal - | ||
Formatter *fmtr = new JSONFormatter(true); |
Use:
std::unique_ptr<Formatter> fmtr(new JSONFormatter(true));
then
fmtr.reset(new JSONFormatter(true));
when you want to create a new one.
if (r != 0)
  *ds << "Failed to get cache status: " << cpp_strerror(r);
fmtr->flush(*ds);
}
As I noted elsewhere, what you should actually do is concatenate all of the commands' outputs into one JSON object. So you should be using the same JSONFormatter for all of these commands. Just create a new object section before calling each command.
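A hedged sketch of that suggestion, reusing one JSONFormatter and opening a named section per sub-command; the cache_status(Formatter*) call and the section names are illustrative rather than the final ones:

void MDSRank::command_drop_cache(stringstream *ds)
{
  JSONFormatter f(true);
  f.open_object_section("result");          // single enclosing JSON object

  {
    std::stringstream ss;
    const int r = _command_flush_journal(&ss);
    f.open_object_section("flush_journal"); // one section per command
    f.dump_string("message", ss.str());
    f.dump_int("return_code", r);
    f.close_section();
  }

  {
    f.open_object_section("cache");
    // assumed here: cache_status() dumps its "pool" stats into the
    // formatter it is given
    int r = cache_status(&f);
    if (r != 0)
      f.dump_string("error", cpp_strerror(r));
    f.close_section();
  }

  f.close_section();
  f.flush(*ds);                              // emit the combined object once
}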
src/mds/MDSRank.cc
Outdated

time_passed = mono_clock::now() - time_at_beg;
if (TIMEOUT - time_passed.count() > 2)
  sleep(2);
This needs to use waiter/gather continuations. @ukernel, can you suggest the right approach?
See the code in Server::flush_client_sessions() and how it gets used.
I originally wrote this code to make the MDS consume less CPU. Should it be removed? I am confused, since the discussion of waiter/gather continuations began here.
I don't see how 'sleep' consumes less CPU. Please get rid of it.
Done.
I was under the impression that sleep would lead to fewer iterations (when ACKs are not received promptly), allowing other threads/processes to use that CPU time instead.
src/mds/Server.cc
Outdated
@@ -465,6 +465,11 @@ void Server::handle_client_session(MClientSession *m)
    mdlog->flush();
    break;

  case CEPH_SESSION_RECALL_STATEMSG_ACK:
    if (this->report_acks)
      session->recall_acked = true;
The right way to do this is to record a sequence number for the recall messages the MDS sends so that the client can ACK the seq number. Then we know which recall messages it has handled. Recall what Zheng said in the email thread: "we can add a recall_state seq to MClientCapRelease and MClientSession message.".
Please modify those messages (and increment the message version) to include the sequence number.
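Purely as an illustration of the bookkeeping being described here (these field names are hypothetical, and as discussed below the PR ends up reusing Session::cap_push_seq instead of adding new state), assuming version_t from include/types.h:

// Hypothetical per-session tracking: the MDS bumps last_recall_sent for
// every CEPH_SESSION_RECALL_STATE it sends, and the client echoes the
// seq back in CEPH_SESSION_RECALL_STATEMSG_ACK.
struct RecallSeqTracking {
  version_t last_recall_sent = 0;   // seq of the last recall sent by the MDS
  version_t last_recall_acked = 0;  // highest seq acked by the client
  bool recall_outstanding() const {
    return last_recall_acked < last_recall_sent;
  }
};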
Please correct me if I am wrong: in my understanding, I need to assign seq. numbers to the CEPH_SESSION_RECALL_STATE messages sent by the MDS in Server::recall_client_state, store them for later, and then, while receiving ACKs for these messages (until mds_session_recall_ack_timeout), check them and report (in the logs) the ones I haven't received an ACK for.
Recall what Zheng said in the email thread: "we can add a recall_state seq to MClientCapRelease and MClientSession message."
Yeah, I couldn't comprehend the "seq" part back then.
Updating the PR as per my understanding above.
Force-pushed from ebba6d1 to 821c4ad
src/mds/Server.h
Outdated
@@ -91,6 +93,7 @@ class Server {

 public:
   bool terminating_sessions;
   set<int> crs_msg_seq_nums;
This should actually be a pair of sequence numbers (version_t) that are associated with each session (just like seq numbers in TCP). The first would be what the MDS sends and sequentially increases with each new CEPH_SESSION_RECALL_STATE message. The second would be what the client sends back as finished, via the CEPH_SESSION_RECALL_STATEMSG_ACK message.
But now that I'm reading the code more in SessionMap.h, I'm wondering if we can reuse the existing CEPH_SESSION_FLUSHMSG code for this purpose. There is already a sequence number in the session to use (see the code for Session::cap_push_seq). @ukernel would this be appropriate or should we use a separate sequence number?
Using Session::cap_push_seq is appropriate.
If the client always sends the ACK immediately after receiving the drop cache request, the existing Session::{wait_for_flush,finish_flush} code can serve this purpose completely. If not, we need to introduce a new waiter list (similar to Session::waitfor_flush).
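For reference, a rough sketch of the existing waiter pattern being referred to, based on my reading of the CEPH_SESSION_FLUSHMSG handling in Server.cc/SessionMap.h (details may differ slightly from the tree):

// Sender side: register a waiter keyed on the session's cap_push_seq and
// send the flush message carrying that seq.
void Server::flush_client_sessions(set<client_t>& client_set, MDSGatherBuilder& gather)
{
  for (const auto& client : client_set) {
    Session *session = mds->sessionmap.get_session(entity_name_t::CLIENT(client.v));
    if (!session || !session->is_open())
      continue;
    version_t seq = session->wait_for_flush(gather.new_sub());
    mds->send_message_client(new MClientSession(CEPH_SESSION_FLUSHMSG, seq), session);
  }
}

// Receiver side: on CEPH_SESSION_FLUSHMSG_ACK, complete every waiter
// registered up to the acked seq.
void Server::finish_flush_session(Session *session, version_t seq)
{
  list<MDSInternalContextBase*> finished;
  session->finish_flush(seq, finished);
  mds->queue_waiters(finished);
}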
This should actually be a pair of sequence numbers (version_t) that are associated with each session (just like seq numbers in TCP). The first would be what the MDS sends and sequentially increases with each new CEPH_SESSION_RECALL_STATE message. The second would be what the client sends back as finished, via the CEPH_SESSION_RECALL_STATEMSG_ACK message.
If the sequence number of the message sent by the MDS is different from that of the client, I wouldn't know which sessions have not been trimmed, and it would not be possible to report those sessions in the log. Though I can still see how many sessions have not trimmed. Will that be okay?
Check the code that handles CEPH_SESSION_FLUSHMSG/CEPH_SESSION_FLUSHMSG_ACK; they use the same seq number. Please reuse cap_push_seq instead of introducing a new one.
@ukernel Done. Thanks. :)
src/client/Client.cc
Outdated
@@ -2078,6 +2078,9 @@ void Client::handle_client_session(MClientSession *m)

  case CEPH_SESSION_RECALL_STATE:
    trim_caps(session, m->get_max_caps());
    session->con->send_message(new MClientSession(
      CEPH_SESSION_RECALL_STATEMSG_ACK,
      m->get_seq()));
Does the client always send the ACK immediately?
@ukernel The client will always send ACKs, but I don't know if it's fast enough to be immediate, as it calls trim_caps() first.
Recalling one of Patrick's replies on the mailing list: clients that have caps pinned locally will not return the caps. In that case the ACK may not be returned.
Force-pushed from 336f2d2 to 90d24e7
Should the command
Force-pushed from 90d24e7 to 3aeed0d
src/mds/MDSRank.cc
Outdated
}

if (not mdcache->trim(UINT64_MAX))
  return;
I think trimming the cache should occur after we get all the client recall acks. We want to trim the MDS cache when the client caps are released.
src/mds/MDSRank.cc
Outdated
fmtr.get()->open_object_section("result"); | ||
|
||
int retval = _command_flush_journal(&ss); | ||
fmtr.get()->open_object_section("flush_journal"); |
can be fmtr->open_object_section...
src/mds/MDSRank.cc
Outdated

sessionmap.get_client_session_set(unacked_sessions);

while (time_passed.count() < TIMEOUT) {
The MDS should do a wait, not an infinite loop. You need to add a waiter interface for this.
The MDS should do a wait, not an infinite loop.
It's not infinite. The looping stops on timeout.
Sorry, my wording was wrong. It's not infinite, but it is a busy-wait, i.e. the CPU loops until the condition is true... for TIMEOUT seconds. This isn't acceptable.
Got it.
src/include/ceph_fs.h
Outdated
@@ -301,6 +301,7 @@ enum {
   CEPH_SESSION_RENEWCAPS,
   CEPH_SESSION_STALE,
   CEPH_SESSION_RECALL_STATE,
+  CEPH_SESSION_RECALL_STATEMSG_ACK,
This needs to go at the end of the enum; otherwise the binary compatibility of the state messages is compromised.
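An illustration of why the position matters (made-up names, not the real ceph_fs.h entries): enumerators without explicit values are numbered by position, so every constant after an insertion silently changes its wire value, while appending leaves existing values untouched.

// Hypothetical example only.
enum example_session_op {
  EXAMPLE_SESSION_FOO,   // 0
  EXAMPLE_SESSION_BAR,   // 1 -- an old peer still expects BAR == 1
  EXAMPLE_SESSION_NEW,   // 2 -- appended, so FOO/BAR keep their values;
                         //      inserting NEW before BAR would have
                         //      renumbered BAR to 2 and broken old peers
};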
src/mds/Server.cc
Outdated
case CEPH_SESSION_RECALL_STATEMSG_ACK:
  finish_flush_session(session, m->get_seq());
  if (this->report_acks)
    session->recall_acked = true;
recall_acked = m->get_seq()
src/mds/Server.cc
Outdated
{
  MClientSession *m;
  static MDSGatherBuilder gather(g_ceph_context);
why static?
I saw the MDS crash without it:
(gdb)
1155 set<Session*> sessions;
(gdb)
1133 MDSGatherBuilder gather(g_ceph_context);
(gdb)
Program received signal SIGABRT, Aborted.
0x00007f7fd283b277 in __GI_raise (sig=sig@entry=6)
at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56 return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb)
handle_fatal_signal (signum=6)
at /home/centos/repos/ceph/src/global/signal_handler.cc:92
92 {
(gdb) c
Continuing.
I do not have a deep understanding of how the MDS runs as a daemon, but I saw the line being executed multiple times, so making it static seemed like a natural solution.
src/mds/Server.cc
Outdated
{
  MClientSession *m;
Why did you move this var?
Moved it during rewrites. Undid this change.
src/mds/MDSRank.cc
Outdated
if (not mdcache->trim(UINT64_MAX))
  return;

server->recall_client_state(1, true);
Pass a pointer to MDSGatherBuilder gather(g_ceph_context); to server->recall_client_state, which adds to the gather:
https://github.com/ceph/ceph/pull/21566/files#diff-eecf840e09d7481f7b8f415789852c7dR1177
Then you can do a gather.activate() in _wait_for_client_recall_state_acks instead of the CPU-intensive loop.
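A hedged sketch of that flow; the extra gather parameter on recall_client_state() and the C_Recall_Acked finisher are hypothetical names, but the MDSGatherBuilder calls (new_sub()/has_subs()/set_finisher()/activate()) follow the usual gather pattern:

void MDSRank::command_drop_cache(Formatter *f)
{
  MDSGatherBuilder gather(g_ceph_context);

  // hypothetical signature: registers gather.new_sub() for every session
  // that still has to ACK the recall
  server->recall_client_state(1, true, &gather);

  if (gather.has_subs()) {
    // C_Recall_Acked (hypothetical) would trim the cache, flush the
    // journal and dump the results once all clients have acknowledged.
    gather.set_finisher(new C_Recall_Acked(this, f));
    gather.activate();
  } else {
    // nothing to wait for -- proceed with trim/flush immediately
  }
}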
@batrick I can't figure out how to replace the busy-wait loop with gather.activate(). Also, calling gather.activate() in _wait_for_client_recall_state_acks leads to a crash due to the assert in gather.activate() [1].
And would these changes mean there is no need for a timeout? If so, the option mds_session_recall_ack_timeout would be redundant.
[1] https://github.com/ceph/ceph/blob/master/src/include/Context.h#L476
Force-pushed from 3aeed0d to 922f5d4
This needs to be rebased to handle the new message handling introduced by #22555.
Force-pushed from aeb14bf to df8b700
Needs to be rebased again.
Force-pushed from df8b700 to 8b10e79
@batrick rebased and updated. The hang I mentioned earlier was not really a deadlock -- the "drop cache" command used to wait for the timeout to expire even if there were no sessions. That's fixed in this update. Also, I removed the custom gather class that didn't do much and used the existing one.
@batrick I could not reproduce the deadlock issue you had mentioned earlier. Also, w.r.t. the asynchronous execution of "ceph tell" -- I think the command works fine, although I haven't looked at the tell code closely. I'll try to take a look sometime tomorrow, but for now the rest of the PR is reviewable.
src/mds/Server.cc
Outdated
dout(10) << "recall_client_state " << ratio | ||
<< ", caps per client " << min_caps_per_client << "-" << max_caps_per_client | ||
<< dendl; | ||
dout(10) << __func__ << ": ration=" << ratio << ", caps per client " |
typo "ratio"
src/mds/MDSRank.cc
Outdated
}

JSONFormatter f(true);
if (!command_cache_drop(f, timeout)) {
I don't see how this will work (ceph tell). The dispatcher is processing this MCommand message here. I believe it will remain blocked on command_cache_drop until the timeout is reached. During this time, the MDS should not be able to process the messages from the clients. (Unless the messenger dispatch thread count is bumped I guess, I've not tried this recently.)
I think it will be necessary to convert this drop cache command into an asynchronous Context which releases the dispatch thread back to the messenger. Consider that this is a problem we share with asynchronous scrub, which currently executes in an unaccountable way in the MDS. It'd be valuable to have a generic "command" module of some kind which allows the operator to manipulate/stop these operations once started.
This doesn't need to be written now but it's worth thinking about. I think we can solve drop cache by just using contexts to drive the drop to completion and eventually send an MCommandReply once finished. The dispatch thread should be released ASAP. If it simplifies the code, go ahead and drop the asok version.
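A rough skeleton of the context-driven shape being suggested here (class and step names are hypothetical; the actual implementation arrives later in the PR as C_Drop_Cache):

// Each step kicks off asynchronous work and continues via a callback,
// so the dispatch thread that received the MCommand returns to the
// messenger immediately instead of blocking until the timeout.
class C_Drop_Cache_Sketch : public MDSInternalContext {
public:
  C_Drop_Cache_Sketch(MDSRank *mds, Formatter *f, Context *on_finish)
    : MDSInternalContext(mds), f(f), on_finish(on_finish) {}

  void send() {
    recall_client_state();
  }

private:
  Formatter *f;
  Context *on_finish;  // eventually sends the MCommandReply

  void recall_client_state() {
    // ask clients to release caps; continue in handle_recall_acks()
    // once all ACKs arrive or the recall timeout fires
  }
  void handle_recall_acks(int r) { flush_journal(); }
  void flush_journal() {
    // asynchronous journal flush, continuing in handle_flush_journal()
  }
  void handle_flush_journal(int r) { complete(r); }

  void finish(int r) override {
    // dump cache status into f, then complete the original command
    on_finish->complete(r);
  }
};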
batrick wrote:
I don't see how this will work (ceph tell). The dispatcher is processing this MCommand message here. I believe it will remain blocked on command_cache_drop until the timeout is reached. During this time, the MDS should not be able to process the messages from the clients. (Unless the messenger dispatch thread count is bumped I guess, I've not tried this recently.)
It indeed does not -
$ date && ./bin/ceph tell mds.a cache drop 20 && date
Fri Sep 7 10:22:51 UTC 2018
{
    "client_recall": {
        "return_code": 0,
        "message": "(0) Success"
    },
    "flush_journal": {
        "return_code": 0,
        "message": ""
    },
    "cache": {
        "pool": {
            "items": 583,
            "bytes": 57080
        }
    }
}
Fri Sep 7 10:22:51 UTC 2018
... unlike using the asok command -
$ date && ./bin/ceph daemon mds.a cache drop 20 && date
Fri Sep 7 10:19:38 UTC 2018
{
    "client_recall": {
        "return_code": 0,
        "message": "(0) Success"
    },
    "flush_journal": {
        "return_code": 0,
        "message": ""
    },
    "cache": {
        "pool": {
            "items": 729,
            "bytes": 67320
        }
    }
}
Fri Sep 7 10:19:38 UTC 2018
The tell command waits until the timeout only when session->caps.size() > newlim [1]:
$ date && ./bin/ceph tell mds.a cache drop 20 && date
Fri Sep 7 10:23:16 UTC 2018
{
    "client_recall": {
        "return_code": 0,
        "message": "(0) Success"
    },
    "flush_journal": {
        "return_code": 0,
        "message": ""
    },
    "cache": {
        "pool": {
            "items": 583,
            "bytes": 57080
        }
    }
}
Fri Sep 7 10:23:17 UTC 2018
For that, mds min caps per client needs to be set (in ceph.conf) to some lower value (depending on the FS).
Thanks @batrick -- I understand the issue with ceph tell now. With @rishabh-d-dave's testing above, I expected the client_recall status to return ETIMEDOUT.
It's probably worth driving command_cache_drop() via a context. I'll look into this and update the PR.
Force-pushed from 8b10e79 to 73080b3
@batrick The updated PR implements "drop cache" as an asynchronous state machine. This required the journal flush to be converted to an asynchronous state machine as well. The major difference with asynchronous execution is that the … Also, the async implementation might not look elegant as of now -- but I wanted to get this out for reviews, and also since we probably want to convert other (tell?) commands to execute asynchronously (for problems similar to what we faced when recalling client state). I have the async state machines as separate commits -- so if we want more discussion/eyes on that, I can resend this with just the asok interface for now.
@vshankar I still need to look at your update, but I want to add to this PR a note that we should also think about an option to this command to ask clients to drop their object buffers/caches (for all types of clients, kernel and ceph-fuse). This could be a follow-up ticket or, if you're interested, we can add it to this PR.
I would vote for having that as a follow-up enhancement.
needs rebase
Otherwise looks like great work! Thanks @vshankar
    ss(ss), on_finish(on_finish) {
  }

  void send() {
start would be better, I think
that's what I got used to when writing async state machines in rbd ;)
src/mds/MDSRank.cc
Outdated
#define dout_context g_ceph_context | ||
#define dout_subsys ceph_subsys_mds | ||
#undef dout_prefix | ||
#define dout_prefix *_dout << "mds." << whoami << '.' << incarnation << ' ' | ||
|
||
class C_Flush_Journal : public Context { |
Let's inherit from MDSInternalContext so it hooks into some general purpose logging too.
makes sense
src/mds/MDSRank.cc
Outdated
  Context *on_finish;
};

class C_Drop_Cache : public Context {
MDSInternalContext here too
src/mds/MDSRank.cc
Outdated
// case we change the logic here.
assert(mdsrank->mds_lock.is_locked());

if (recall_timeout < 0) {
Why not just use a uint64_t then?
src/mds/MDSRank.cc
Outdated
gather.activate();

mdsrank->mds_lock.Unlock();
retval = sleeper->wait_for(recall_timeout);
We don't want to block the thread at all here. Let's use a SafeTimer (? maybe the right choice) which runs its finisher after recall_timeout. I think this will require two detachable contexts (like MDSContextSleeper). One waits for all clients to trim caps and the other waits for the Timer to complete. We won't be using a condition variable here because we don't want to wait at all. (MDSContextSleeper may get axed since I don't think we will use it.)
Looking at this again, specifically it will block the finisher thread which is a problem. (The dispatcher thread is fortunately free now, which is good!)
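A hedged sketch of the timer side of that idea, assuming the MDS SafeTimer is reachable (MDSRank::timer) and mds_lock is held when arming it; C_Recall_Timeout and next_step are hypothetical, and a real implementation must ensure only one of the two paths (all-ACKs gather vs. timeout) completes the continuation:

// Fires if the recall timeout expires before every client has ACKed.
class C_Recall_Timeout : public MDSInternalContext {
public:
  C_Recall_Timeout(MDSRank *mds, Context *next_step)
    : MDSInternalContext(mds), next_step(next_step) {}
  void finish(int r) override {
    // continue with whatever was released so far; report the timeout
    next_step->complete(-ETIMEDOUT);
  }
private:
  Context *next_step;
};

// Arming it (illustrative), with mds_lock held:
//   timer.add_event_after(recall_timeout, new C_Recall_Timeout(this, next_step));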
src/mds/MDSRank.cc
Outdated
@@ -36,11 +36,244 @@

 #include "MDSRank.h"

 #include <boost/scope_exit.hpp>
Let's use src/include/scope_guard.h instead.
this will be removed with the latest update...
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Signed-off-by: Venky Shankar <vshankar@redhat.com>

Signed-off-by: Venky Shankar <vshankar@redhat.com>

With this command, the MDS would request clients to release caps, followed by trimming its own cache and a journal flush. The command accepts a timeout to wait for clients to respond to session recall and flush messages.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Signed-off-by: Venky Shankar <vshankar@redhat.com>

Fixes: http://tracker.ceph.com/issues/23362
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Force-pushed from 73080b3 to 43d1b8e
retest this please
@batrick fixed and updated with the following changes
* refs/pull/21566/head:
  test: add test for mds drop cache command
  mds: command to trim mds cache and client caps
  mds: implement journal flush as asynchronous context execution
  mds: cleanup some asok commands

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Adds the command drop cache for the MDS and an extra message so that the client acknowledges to the MDS that the cache has been released. See http://tracker.ceph.com/issues/23362