
mds: add drop cache command #21566

Merged
merged 4 commits into ceph:master from mds-drop-cache-command on Oct 1, 2018

Conversation

rishabh-d-dave
Contributor

@rishabh-d-dave rishabh-d-dave commented Apr 20, 2018

Adds the drop cache command for the MDS and an extra message so that the client can acknowledge to the MDS that the cache has been released. See http://tracker.ceph.com/issues/23362

@rishabh-d-dave rishabh-d-dave changed the title from "Adds the drop_cache command for MDS. See - http://tracker.ceph.com/issues/23362" to "mds: add drop_cache command" on Apr 20, 2018
@tchaikov
Contributor

retest this please.

@tchaikov
Contributor

retest this please

@tchaikov
Contributor

@rishabh-d-dave this change does not compile

@rishabh-d-dave
Contributor Author

@rishabh-d-dave this change does not compile

@tchaikov Made a mistake while cleaning up the debug code. Fixed it. Thanks!

@tchaikov tchaikov added the cephfs Ceph File System label Apr 23, 2018
@batrick
Member

batrick commented Apr 23, 2018

Please annotate the commit which fixes/resolves a Ceph tracker issue with:

Fixes: http://tracker.ceph.com/issues/...

This is essential when examining the history of the repository (i.e. knowing which issue a given commit fixes) and helps merge scripts identify issues that have been resolved by a merge.

@rishabh-d-dave rishabh-d-dave force-pushed the mds-drop-cache-command branch 2 times, most recently from 8b3e920 to ebba6d1 on April 25, 2018 19:27
// Optionally, flush the journal -
Formatter *fmtr = new JSONFormatter(true);
command_flush_journal(fmtr);
// TODO: do we want output from `flush journal`?
Contributor Author

Running ceph tell mds.a drop cache displays output from MDSRank::command_flush_journal() as well as from MDSRank::cache_status().

{
    "message": "",
    "return_code": 0
}
{
    "pool": {
        "items": 482,
        "bytes": 52398
    }
}

Do we want both or only the latter?

Member

both, combine into one object

Member

should be:

{
    "message": "",
    "return_code": 0,
    "cache_status": {
        "pool": {
            "items": 482,
            "bytes": 52398
        }
    }
}

Contributor Author

Okay. From a general user's POV, I think it will be difficult to understand the context of the first two lines. Perhaps it would be better to add a section titled "flush_journal", like the following snippet?

{
    "flush_journal": {
        "message": "",
        "return_code": 0
    },
    "cache": {
        "pool": {
            "items": 482,
            "bytes": 52398
        }
    }
}

Contributor Author

Updating the PR with the "flush_journal" part included. Let me know if you think it's not a good idea.

@@ -2240,7 +2240,7 @@ int MDSRank::_command_flush_journal(std::stringstream *ss)
{
assert(ss != NULL);

Mutex::Locker l(mds_lock);
mds_lock.Lock();
Contributor Author

@batrick I don't know if this change is really appropriate, but without it the MDS crashes after executing the command, at this line [1], since it expects the MDS to be locked.

[1] https://github.com/ceph/ceph/blob/master/src/mds/MDSDaemon.cc#L1143

Member

I'd suggest moving Mutex::Locker l(mds_lock) to surround the call to _command_flush_journal in command_flush_journal:

const int r = _command_flush_journal(&ss);

to be

{
  Mutex::Locker l(mds_lock);
  const int r = _command_flush_journal(&ss);
}

Then add a comment to MDSRank::_command_flush_journal noting that it expects mds_lock to be locked.
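
For illustration, a minimal sketch of that shape, assuming the MDSRank members quoted in this thread (mds_lock, _command_flush_journal) and Ceph's Formatter API; the real signatures may differ:

// Sketch only: hold mds_lock just around the internal call, since
// _command_flush_journal() expects the caller to hold it.
void MDSRank::command_flush_journal(Formatter *f)
{
  std::stringstream ss;
  int r;
  {
    Mutex::Locker l(mds_lock);
    r = _command_flush_journal(&ss);
  }
  f->open_object_section("result");
  f->dump_string("message", ss.str());
  f->dump_int("return_code", r);
  f->close_section();
}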

.set_default(30)
.set_description("how long should the MDS wait for receiving "
"CEPH_SESSION_RECALL_STATEMSG_ACK from all sessions "
"while dropping cache."),
Member

I think we can keep this vague. Don't mention dropping cache. Also, rename to mds_session_recall_ack_timeout.

} else {
return false;
}
}

void MDSRank::command_drop_cache(stringstream *ds)
{
Member

add Mutex::Locker l(mds_lock);

Contributor Author

@batrick Using Mutex::Locker l(mds_lock); creates the same issue as before [1]. At least for the time being, I am replacing it with mds_lock.Lock() so that the MDS stays locked until control reaches here [2] after exiting command_drop_cache(); this prevents the MDS from crashing.

[1] #21566 (comment)
[2] https://github.com/ceph/ceph/blob/master/src/mds/MDSDaemon.cc#L1143

Member

Oh, right that lock is not necessary because it's locked in MDSDaemon::ms_dispatch.


// Optionally, flush the journal -
Formatter *fmtr = new JSONFormatter(true);
command_flush_journal(fmtr);
Member

should call _command_flush_journal.

"CEPH_SESSION_RECALL_STATEMSG_ACK from all sessions." << dendl;

// Optionally, flush the journal -
Formatter *fmtr = new JSONFormatter(true);
Member

Use:

std::unique_ptr<Formatter> fmtr(new JSONFormatter(true));

then

fmtr.reset(new JSONFormatter(true));

when you want to create a new one.

if (r != 0)
*ds << "Failed to get cache status: " << cpp_strerror(r);
fmtr->flush(*ds);
}
Member

As I noted elsewhere, what you should actually do is concatenate all of the commands' outputs into one JSON object. So you should be using the same JSONFormatter for all of these commands. Just create a new object section before calling each command.
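
As a rough sketch of that idea (the cache-status call shape here is an assumption, not the actual MDSRank API):

// Sketch: one JSONFormatter shared by all sub-commands, each wrapped in its
// own object section so the output is a single JSON object.
JSONFormatter f(true);
f.open_object_section("result");

f.open_object_section("flush_journal");
std::stringstream ss;
int r = _command_flush_journal(&ss);   // assumed to run with mds_lock held
f.dump_string("message", ss.str());
f.dump_int("return_code", r);
f.close_section();                     // flush_journal

f.open_object_section("cache");
command_cache_status(&f);              // hypothetical call for the cache dump
f.close_section();                     // cache

f.close_section();                     // result
f.flush(*ds);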


time_passed = mono_clock::now() - time_at_beg;
if (TIMEOUT - time_passed.count() > 2)
sleep(2);
Member

This needs to be using waiter/gather continuations. @ukernel can you suggest the right approach.

Contributor
@ukernel ukernel May 9, 2018

See the code in Server::flush_client_sessions() and how it gets used.

Contributor Author

I originally wrote this code to make the MDS consume less CPU. Should it be removed? I am confused since the discussion of waiter/gather continuations began here.

Contributor

I don't see how 'sleep' consumes less CPU. Please get rid of it.

Contributor Author

Done.

I was under the impression that the sleep would lead to a smaller number of iterations (when ACKs are not received promptly), allowing other threads/processes to use that CPU time instead.

@@ -465,6 +465,11 @@ void Server::handle_client_session(MClientSession *m)
mdlog->flush();
break;

case CEPH_SESSION_RECALL_STATEMSG_ACK:
if (this->report_acks)
session->recall_acked = true;
Member

The right way to do this is to record a sequence number for the recall messages the MDS sends so that the client can ACK the seq number. Then we know which recall messages it has handled. Recall what Zheng said in the email thread: "we can add a recall_state seq to MClientCapRelease and MClientSession message.".

Please modify those messages (and increment the message version) to include the sequence number.

Contributor Author

Please correct me if I am wrong: in my understanding, I need to assign sequence numbers to the CEPH_SESSION_RECALL_STATE messages sent by the MDS in Server::recall_client_state, store them for later, and then, while receiving ACKs for these messages (until mds_session_recall_ack_timeout), check them off and report in the logs the ones I haven't received an ACK for.

Recall what Zheng said in the email thread: "we can add a recall_state seq to MClientCapRelease and MClientSession message.".

Yeah, I couldn't comprehend the "seq" part back then.

Contributor Author

Updating the PR as per my understanding above.

src/mds/Server.h Outdated
@@ -91,6 +93,7 @@ class Server {

public:
bool terminating_sessions;
set<int> crs_msg_seq_nums;
Member

This should actually be a pair of sequence numbers (version_t) that are associated with each session (just like seq numbers in TCP). The first would be what the MDS sends and sequentially increases with each new CEPH_SESSION_RECALL_STATE message. The second would be what the client sends back as finished, via the CEPH_SESSION_RECALL_STATEMSG_ACK message.

But now that I'm reading the code more in SessionMap.h, I'm wondering if we can reuse the existing CEPH_SESSION_FLUSHMSG code for this purpose. There is already a sequence number in the session to use (see the code for Session::cap_push_seq). @ukernel, would this be appropriate or should we use a separate sequence number?
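
For illustration only, the "TCP-like" pair could be per-session bookkeeping along these lines (the field names are hypothetical; the thread below settles on reusing cap_push_seq instead):

// Hypothetical per-session state, for illustration only.
struct SessionRecallState {
  uint64_t recall_sent = 0;    // bumped for every CEPH_SESSION_RECALL_STATE sent (version_t in Ceph)
  uint64_t recall_acked = 0;   // highest seq echoed back via CEPH_SESSION_RECALL_STATEMSG_ACK
  bool recall_outstanding() const { return recall_acked < recall_sent; }
};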

Contributor

using Session::cap_push_seq is appropriate.

If the client always sends the ACK immediately after receiving the drop cache request, the existing Session::{wait_for_flush,finish_flush} code can serve this purpose completely. If not, we need to introduce a new waiter list (similar to Session::waitfor_flush).
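
Roughly, reusing that machinery might look like the following sketch (call shapes are assumptions based on the comment above and the diffs quoted later in this thread):

// Sketch: register a waiter completed when the client acks the current push
// seq, then send the recall tagged with that seq. `gather` is an
// MDSGatherBuilder owned by the caller.
session->wait_for_flush(gather.new_sub());
mds->send_message_client(new MClientSession(
    CEPH_SESSION_RECALL_STATE, session->inc_push_seq()), session);
// The client echoes the seq in CEPH_SESSION_RECALL_STATEMSG_ACK and the
// handler calls finish_flush_session(session, m->get_seq()).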

Contributor Author

@batrick

This should actually be a pair of sequence numbers (version_t) that are associated with each session (just like seq numbers in TCP). The first would be what the MDS sends and sequentially increases with each new CEPH_SESSION_RECALL_STATE message. The second would be what the client sends back as finished, via the CEPH_SESSION_RECALL_STATEMSG_ACK message.

If the sequence number of the message sent by the MDS is different from that of the client, I wouldn't know which sessions have not been trimmed, and it would not be possible to report those sessions in the log. Though I can still see how many sessions have not been trimmed. Will that be okay?

Contributor
@ukernel ukernel May 16, 2018

Check the code that handles CEPH_SESSION_FLUSHMSG/CEPH_SESSION_FLUSHMSG_ACK; they use the same seq number. Please reuse cap_push_seq instead of introducing a new one.

Contributor Author

@ukernel Done. Thanks. :)

@@ -2078,6 +2078,9 @@ void Client::handle_client_session(MClientSession *m)

case CEPH_SESSION_RECALL_STATE:
trim_caps(session, m->get_max_caps());
session->con->send_message(new MClientSession(
CEPH_SESSION_RECALL_STATEMSG_ACK,
m->get_seq()));
Contributor

Does the client always send the ACK immediately?

Contributor Author

@ukernel The client will always send ACKs, but I don't know if it's fast enough to be immediate, as it calls trim_caps() first.

Contributor Author

Recalling from one of Patrick's replies on the mailing list: clients that have caps pinned locally will not return the caps. In that case the ACK may not be returned.

@rishabh-d-dave rishabh-d-dave force-pushed the mds-drop-cache-command branch 2 times, most recently from 336f2d2 to 90d24e7 on June 4, 2018 18:44
@rishabh-d-dave
Contributor Author

rishabh-d-dave commented Jun 4, 2018

Should the command drop cache be changed to cache drop? There are two more cache-related commands - cache status and dump cache - but they do not make the preferred order clear. I think cache preceding the operation is better.

@batrick
Member

batrick commented Jun 11, 2018

cache drop sounds good to me.

}

if (not mdcache->trim(UINT64_MAX))
return;
Member

I think trimming the cache should occur after we get all the client recall acks. We want to trim the MDS cache when the client caps are released.

fmtr.get()->open_object_section("result");

int retval = _command_flush_journal(&ss);
fmtr.get()->open_object_section("flush_journal");
Member

can be fmtr->open_object_section...


sessionmap.get_client_session_set(unacked_sessions);

while (time_passed.count() < TIMEOUT) {
Member

The MDS should do a wait, not an infinite loop. You need to add a waiter interface for this.

Contributor Author

The MDS should do a wait, not an infinite loop.

It's not infinite. The looping stops on timeout.

Member

Sorry, my wording was wrong. It's not infinite, but it is a busy-wait, i.e. the CPU loops until the condition is true... for TIMEOUT seconds. This isn't acceptable.

Contributor Author

Got it.

@@ -301,6 +301,7 @@ enum {
CEPH_SESSION_RENEWCAPS,
CEPH_SESSION_STALE,
CEPH_SESSION_RECALL_STATE,
CEPH_SESSION_RECALL_STATEMSG_ACK,
Member

This needs to go at the end of the enum, otherwise the binary compatibility of the state messages is compromised.

case CEPH_SESSION_RECALL_STATEMSG_ACK:
finish_flush_session(session, m->get_seq());
if (this->report_acks)
session->recall_acked = true;
Member

recall_acked = m->get_seq()

{
MClientSession *m;
static MDSGatherBuilder gather(g_ceph_context);
Member

why static?

Contributor Author

I saw MDS crash without it -

(gdb)
1155      set<Session*> sessions;
(gdb)
1133      MDSGatherBuilder gather(g_ceph_context);
(gdb)

Program received signal SIGABRT, Aborted.
0x00007f7fd283b277 in __GI_raise (sig=sig@entry=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56        return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb)
handle_fatal_signal (signum=6)
    at /home/centos/repos/ceph/src/global/signal_handler.cc:92
92      {
(gdb) c
Continuing.

I do not have a deeper understanding of how the MDS runs as a daemon, but I saw the line being executed multiple times, so making it static seemed like a natural solution.

{
MClientSession *m;
Member

Why did you move this var?

Contributor Author

Moved it while rewriting. Undid this change.

if (not mdcache->trim(UINT64_MAX))
return;

server->recall_client_state(1, true);
Member

pass a pointer to

MDSGatherBuilder gather(g_ceph_context);

to server->recall_client_state which adds to the gather:

https://github.com/ceph/ceph/pull/21566/files#diff-eecf840e09d7481f7b8f415789852c7dR1177

then you can do a gather.activate() in _wait_for_client_recall_state_acks instead of the cpu-intensive loop.
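
For illustration, the pattern being suggested might look roughly like this (signatures hedged; the actual recall_client_state() change may differ):

// Sketch: let recall_client_state() register one waiter per session on the
// gather, then activate the gather with a completion callback instead of
// looping. MDSGatherBuilder and FunctionContext are existing Ceph types.
MDSGatherBuilder gather(g_ceph_context);
server->recall_client_state(&gather, 1.0 /* ratio */);
if (gather.has_subs()) {
  gather.set_finisher(new FunctionContext([this](int r) {
      // All registered sessions have acked the recall; continue with the
      // journal flush / cache trim from here.
    }));
  gather.activate();
}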

Contributor Author

@batrick I can't figure out how to replace the busy-wait loop with gather.activate(). Also, calling gather.activate() in _wait_for_client_recall_state_acks leads to a crash due to the assert in gather.activate() [1].

And would these changes mean the timeout is no longer needed? If so, the option mds_session_recall_ack_timeout would be redundant.

[1] https://github.com/ceph/ceph/blob/master/src/include/Context.h#L476

@batrick
Member

batrick commented Aug 16, 2018

This needs rebased to handle new message handling introduced by #22555.

@vshankar vshankar force-pushed the mds-drop-cache-command branch 3 times, most recently from aeb14bf to df8b700 on August 27, 2018 09:31
@batrick
Member

batrick commented Aug 28, 2018

Needs rebased again

@vshankar
Contributor

@batrick rebased and updated

The hang I mentioned earlier was not really a deadlock -- the "drop cache" command used to wait for the timeout to expire even if there were no sessions. That's fixed in this update. Also, I removed the custom gather class that didn't do much and used the existing MDSGatherBuilder class. Plus, I added a couple of command-related tests.

@batrick I could not reproduce the deadlock issue you had mentioned earlier. Also, w.r.t. the asynchronous execution of "ceph tell" -- I think the command works fine -- although I haven't looked at the tell code closely. I'll try to take a look sometime tomorrow, but for now, the rest of the PR is reviewable.

dout(10) << "recall_client_state " << ratio
<< ", caps per client " << min_caps_per_client << "-" << max_caps_per_client
<< dendl;
dout(10) << __func__ << ": ration=" << ratio << ", caps per client "
Member

typo "ratio"

}

JSONFormatter f(true);
if (!command_cache_drop(f, timeout)) {
Member

I don't see how this will work (ceph tell). The dispatcher is processing this MCommand message here. I believe it will remain blocked on command_cache_drop until the timeout is reached. During this time, the MDS should not be able to process the messages from the clients. (Unless the messenger dispatch thread count is bumped I guess, I've not tried this recently.)

I think it will be necessary to convert this drop cache command into an asynchronous Context which releases the dispatch thread back to the messenger. Consider that this is a problem we share with asynchronous scrub which currently executes in an unaccountable way in the MDS. It'd be valuable to have a generic "command" module of some kind which allows the operator to manipulate/stop these operations once started.

This doesn't need to be written now but it's worth thinking about. I think we can solve drop cache by just using contexts to drive the drop to completion and eventually send an MCommandReply once finished. The dispatch thread should be released ASAP. If it simplifies the code, go ahead and drop the asok version.

Contributor Author

batrick wrote

I don't see how this will work (ceph tell). The dispatcher is processing this MCommand message here. I believe it will remain blocked on command_cache_drop until the timeout is reached. During this time, the MDS should not be able to process the messages from the clients. (Unless the messenger dispatch thread count is bumped I guess, I've not tried this recently.)

It indeed does not -

$ date && ./bin/ceph tell mds.a cache drop 20 && date
Fri Sep  7 10:22:51 UTC 2018
{
    "client_recall": {
        "return_code": 0,
        "message": "(0) Success"
    },
    "flush_journal": {
        "return_code": 0,
        "message": ""
    },
    "cache": {
        "pool": {
            "items": 583,
            "bytes": 57080
        }
    }
}
Fri Sep  7 10:22:51 UTC 2018

... unlike using the asok command -

$ date && ./bin/ceph daemon mds.a cache drop 20 && date
Fri Sep  7 10:19:38 UTC 2018
{
    "client_recall": {
        "return_code": 0,
        "message": "(0) Success"
    },
    "flush_journal": {
        "return_code": 0,
        "message": ""
    },
    "cache": {
        "pool": {
            "items": 729,
            "bytes": 67320
        }
    }
}
Fri Sep  7 10:19:38 UTC 2018

The tell command waits until timeout only when session->caps.size() > newlim[1] -

$ date && ./bin/ceph tell mds.a cache drop 20 && date
Fri Sep  7 10:23:16 UTC 2018
{
    "client_recall": {
        "return_code": 0,
        "message": "(0) Success"
    },
    "flush_journal": {
        "return_code": 0,
        "message": ""
    },
    "cache": {
        "pool": {
            "items": 583,
            "bytes": 57080
        }
    }
}
Fri Sep  7 10:23:17 UTC 2018

For that, mds min caps per client needs to be set (in ceph.conf) to some lower value (depending on the FS).

Contributor

Thanks @batrick -- I understand the issue with ceph tell now. With @rishabh-d-dave's testing above, I expected the client_recall status to return ETIMEDOUT.

It's probably worth driving command_cache_drop() via a context. I'll look into this and update the PR.

@vshankar
Contributor

@batrick The updated PR implements "drop cache" as an asynchronous state machine. This required the journal flush to be converted to an asynchronous state machine as well. The major difference with asynchronous execution is that the mds_lock is not held for the entire duration of command execution -- it's dropped when invoking an async call and reacquired in the context callback. Would that be fine, or would we want it to behave as it currently does (in the synchronous version)?

Also, the async implementation might not look elegant as of now -- but I wanted to get this out for review, and also because we probably want to convert other (tell?) commands to execute asynchronously (for problems similar to what we faced when recalling client state). I have the async state machines as separate commits -- so if we want more discussion/eyes on that, I can resend this with just the asok interface for now.

@batrick batrick self-assigned this Sep 17, 2018
@batrick
Member

batrick commented Sep 18, 2018

@vshankar I still need to look at your update but I want to add onto this PR a note that we should also think about an option to this command to ask clients to drop their object buffers/caches (for all types of clients, kernel and ceph-fuse). This could be a followup ticket or, if you're interested, we can add it to this PR.

@vshankar
Contributor

@vshankar I still need to look at your update but I want to add onto this PR a note that we should also think about an option to this command to ask clients to drop their object buffers/caches (for all types of clients, kernel and ceph-fuse). This could be a followup ticket or, if you're interested, we can add it to this PR.

I would vote for having that as a follow-up enhancement.

@batrick
Member

batrick commented Sep 21, 2018

needs rebase

Member
@batrick batrick left a comment

Otherwise looks like great work! Thanks @vshankar

ss(ss), on_finish(on_finish) {
}

void send() {
Member

start would be better I think

Contributor

that's what I got used to when writing async state machines in rbd ;)

#define dout_context g_ceph_context
#define dout_subsys ceph_subsys_mds
#undef dout_prefix
#define dout_prefix *_dout << "mds." << whoami << '.' << incarnation << ' '

class C_Flush_Journal : public Context {
Member

Let's inherit from MDSInternalContext so it hooks into some general purpose logging too.

Contributor

makes sense

Context *on_finish;
};

class C_Drop_Cache : public Context {
Member

MDSInternalContext here too

// case we change the logic here.
assert(mdsrank->mds_lock.is_locked());

if (recall_timeout < 0) {
Member

Why not just use a uint64_t then?

gather.activate();

mdsrank->mds_lock.Unlock();
retval = sleeper->wait_for(recall_timeout);
Member

We don't want to block the thread at all here. Let's use a SafeTimer (? maybe the right choice) which runs its finisher after recall_timeout. I think this will require two detachable contexts (like MDSContextSleeper). One waits for all clients to trim caps and the other waits for the Timer to complete. We won't be using a condition variable here because we don't want to wait at all. (MDSContextSleeper may get axed since I don't think we will use it.)

Member

Looking at this again, specifically it will block the finisher thread which is a problem. (The dispatcher thread is fortunately free now, which is good!)
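
In rough terms, the non-blocking shape schedules the continuation instead of waiting. A minimal sketch, assuming the MDS SafeTimer (add_event_after) and FunctionContext; the callback name is made up:

// Sketch: arm a timeout continuation and return immediately -- no wait_for(),
// so neither the dispatch thread nor the finisher thread blocks.
timer.add_event_after(recall_timeout, new FunctionContext([this](int) {
    handle_recall_timeout(-ETIMEDOUT);   // hypothetical follow-up step
  }));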

@@ -36,11 +36,244 @@

#include "MDSRank.h"

#include <boost/scope_exit.hpp>
Member

Let's use src/include/scope_guard.h instead.

Contributor

this will be removed with the latest update...

Signed-off-by: Rishabh Dave <ridave@redhat.com>
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Signed-off-by: Venky Shankar <vshankar@redhat.com>
With this command, the MDS would request clients to release
caps followed by trimming its own cache and a journal flush.
The command accepts a timeout to wait for clients to respond
to session recall and flush messages.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Fixes: http://tracker.ceph.com/issues/23362
Signed-off-by: Venky Shankar <vshankar@redhat.com>
@vshankar
Contributor

retest this please

@vshankar
Contributor

@batrick fixed and updated with the following changes

  • The journal and cache-drop async state machines now subclass MDSInternalContext. Also included are dout logs carried over from the earlier implementation.
  • Waiting for clients (with a timeout) to trim their caps is now non-blocking -- a custom C_Drop_Cache::C_ContextTimeout context is used to trigger the timeout (see the sketch after this list). This context is also passed to MDSGatherBuilder, which completes it if client recall happens before the timeout. Also note that MDSGatherBuilder takes care of deleting its finisher (our context callback in this case), so, in the case of a client recall timeout, it is left to the gather class to eventually free this context (later?). This change gets rid of the custom MDSContextSleeper class and the changes in class Cond for the supporting member functions.
  • Added tests for dropping the cache using the tell interface.
  • Added range=1 for the asok and tell interfaces for the timeout argument.
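
For illustration, the timeout-context idea in the second bullet could be shaped roughly like this; it is a hedged sketch, not the merged C_Drop_Cache::C_ContextTimeout code, and it assumes the timer callback and the gather finisher both run under mds_lock:

// One-shot sketch: the gather completes this context (and frees it); a timer
// task fires it with -ETIMEDOUT if the clients are too slow. Whichever path
// runs first "steals" on_finish so the callback runs exactly once.
class C_ContextTimeout : public Context {
public:
  C_ContextTimeout(MDSRank *mds, uint64_t timeout, Context *on_finish)
    : mds(mds), timeout(timeout), on_finish(on_finish) {}

  void start_timer() {
    timer_task = new FunctionContext([this](int) {
        timer_task = nullptr;
        fire(-ETIMEDOUT);              // timed out: run the callback, but let
                                       // the gather free this context later
      });
    mds->timer.add_event_after(timeout, timer_task);
  }

  // Invoked (and then deleted) by the MDSGatherBuilder finisher path.
  void finish(int r) override {
    if (timer_task) {
      mds->timer.cancel_event(timer_task);
      timer_task = nullptr;
    }
    fire(r);
  }

private:
  void fire(int r) {
    Context *ctx = nullptr;
    std::swap(ctx, on_finish);         // one-shot: only the first caller runs it
    if (ctx)
      ctx->complete(r);
  }

  MDSRank *mds;
  uint64_t timeout;
  Context *on_finish;
  Context *timer_task = nullptr;
};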

@batrick batrick changed the title mds: add drop_cache command mds: add drop cache command Sep 28, 2018
@batrick batrick merged commit 43d1b8e into ceph:master Oct 1, 2018
batrick added a commit that referenced this pull request Oct 1, 2018
* refs/pull/21566/head:
	test: add test for mds drop cache command
	mds: command to trim mds cache and client caps
	mds: implement journal flush as asynchronous context execution
	mds: cleanup some asok commands

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
@rishabh-d-dave rishabh-d-dave deleted the mds-drop-cache-command branch July 23, 2019 03:59