Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix deadlock in DruidStatement & DruidConnection #6868

Merged
merged 2 commits into from
Jan 17, 2019

Conversation

zhaojiandong
Copy link
Contributor

Fix issue #6867
We found deadlock in DruidStatement and DruidConnection, this will cause exhaust connections.
The statement thread will hold the lock, but when the slow query occurs in statement. The connection close signal from client or server, will cause deadlock with statement thread.

@@ -63,7 +63,7 @@ public DruidConnection(final String connectionId, final int maxStatements, final
this.connectionId = Preconditions.checkNotNull(connectionId);
this.maxStatements = maxStatements;
this.context = ImmutableMap.copyOf(context);
this.statements = new HashMap<>();
this.statements = new ConcurrentHashMap<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhaojiandong For clarity's sake, could you please change the type of the field to ConcurrentMap, & a comment about why it's both @GuardedBy and Concurrent?

// ConcurrentMap even though it is usually accessed synchronized, because it's unsynchronized
// in one case: the onClose function passed into DruidStatements contained by the map. This is
// done to avoid deadlocks; see https://github.com/apache/incubator-druid/issues/6867.
@GuardedBy("statements")
private final ConcurrentMap<Integer, DruidStatement> statements;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I'll put some comments on statements and onClose, thank for review

Copy link
Contributor Author

@zhaojiandong zhaojiandong Jan 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhaojiandong For clarity's sake, could you please change the type of the field to ConcurrentMap, & a comment about why it's both @GuardedBy and Concurrent?

// ConcurrentMap even though it is usually accessed synchronized, because it's unsynchronized
// in one case: the onClose function passed into DruidStatements contained by the map. This is
// done to avoid deadlocks; see https://github.com/apache/incubator-druid/issues/6867.
@GuardedBy("statements")
private final ConcurrentMap<Integer, DruidStatement> statements;

How about use another sync flag: connectionLock, and the type of statements is ConcurrentHashMap ?
In this way GuardedBy check will be passed. @gianm

@gianm
Copy link
Contributor

gianm commented Jan 16, 2019

Hi @zhaojiandong, I think something went wrong with a merge in this patch. It has all of a sudden started involving a lot of unrelated code.

@gianm gianm self-assigned this Jan 16, 2019
@zhaojiandong
Copy link
Contributor Author

Hi @zhaojiandong, I think something went wrong with a merge in this patch. It has all of a sudden started involving a lot of unrelated code.

Sorry, I've merged some commit from master by mistake @gianm

@zhaojiandong zhaojiandong force-pushed the bugfixes/statement_deadlock branch from a623f05 to 4cbd40e Compare January 16, 2019 17:35
@zhaojiandong
Copy link
Contributor Author

Hi @gianm , I force to push a clean branch to this pr

Copy link
Contributor

@gianm gianm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good... I had one last suggestion about a clarifying comment & type change. Let me know what you think.

@zhaojiandong zhaojiandong force-pushed the bugfixes/statement_deadlock branch from 4cbd40e to 899869d Compare January 17, 2019 06:45

@GuardedBy("statements")
@GuardedBy("connectionLock")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to add @GuardedBy annotation to connectionLock itself here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @asdf2014 , the field connectionLock now needs to be accessed synchronized, So I think should add an annotation here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm... Please help me figure out that our initial goal should be to hope that statements can be accessed synchronously. Why not add the @GuardedBy("connectionLock") annotation to the statements property, but add the connectionLock object itself. So I don't understand what this annotation means here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we use field connectionLock as a synchronized flag, connectionLock now protect all methods. we do not need connectionLock to access statements, since the type of statements is ConcurrentMap.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me, it makes sense because statements is now not always guarded by connectionLock. But, it is in some (well, many) cases. The comment explains it in a way that the annotation would struggle to.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for explaining this @gianm @zhaojiandong . I mean we shouldn't remove the @GuardedBy("connectionLock") from statements because the statements is not always synchronized by connectionLock and the CI fails, which is unreasonable. In addition, putting a @GuardedBy on connectionLock is also not necessary. If this is done, it means that we should have the connectionLock lock object when we use connectionLock, in fact it cannot achieve the purpose that we should own the connectionLock when using statements. So, i think there is a better way to solve this situation by adding a @SuppressWarnings("FieldAccessNotGuarded") annotation to the Runnable#run method, or adding a // noinspection FieldAccessNotGuarded comment to the statements.remove(statementId) caluse, which will suppress this check and make CI happy.

@GuardedBy("connectionLock")
private final ConcurrentMap<Integer, DruidStatement> statements;
private final Object connectionLock = new Object();

public DruidStatement createStatement(SqlLifecycleFactory sqlLifecycleFactory)
{
    // ...
    final DruidStatement statement = new DruidStatement(
        // ...
        () -> {
          // noinspection FieldAccessNotGuarded
          statements.remove(statementId);
        }
    );
    // ...
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I raised up a PR #6903, PTAL. @gianm @zhaojiandong

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see what you mean now @asdf2014 -- your patch #6903 looks good, thanks!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gianm You are welcome, I am sorry for my poor English expression 😰

Copy link
Contributor

@gianm gianm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks @zhaojiandong

@gianm gianm added this to the 0.14.0 milestone Jan 17, 2019
@fjy fjy merged commit 9f0fdcf into apache:master Jan 17, 2019
gianm pushed a commit to implydata/druid-public that referenced this pull request Jan 30, 2019
* Fix deadlock in DruidStatement & DruidConnection

* change statements type to ConcurrentMap
clintropolis pushed a commit to implydata/druid-public that referenced this pull request Feb 5, 2019
* Fix deadlock in DruidStatement & DruidConnection

* change statements type to ConcurrentMap
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants