SUNRPC: Address Kerberos performance/behavior regression
author Chuck Lever <chuck.lever@oracle.com>
Wed, 9 Jan 2019 15:04:57 +0000 (10:04 -0500)
committer Anna Schumaker <Anna.Schumaker@Netapp.com>
Tue, 15 Jan 2019 20:36:41 +0000 (15:36 -0500)
When using Kerberos with v4.20, I've observed frequent connection
loss on heavy workloads. I traced it down to the client underrunning
the GSS sequence number window -- NFS servers are required to drop
the RPC with the low sequence number, and also drop the connection
to signal that an RPC was dropped.
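
For readers unfamiliar with the sequence window, here is a minimal
sketch of the server-side check, loosely following the window
description in RFC 2203. It is not the kernel's server code: the
names seq_window, seq_window_accept and the window size GSS_WIN are
hypothetical, and sequence number wraparound is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define GSS_WIN 8u /* hypothetical window size, chosen small for illustration */

struct seq_window {
	uint32_t max_seen;      /* highest sequence number accepted so far */
	bool     seen[GSS_WIN]; /* slot (seq % GSS_WIN) marks seq as already used */
};

/* Return true if the request should be processed, false if it must be dropped. */
static bool seq_window_accept(struct seq_window *w, uint32_t seq)
{
	assert(w != NULL);

	if (seq > w->max_seen) {
		/* The window slides forward: forget slots that fall out of it. */
		if (seq - w->max_seen >= GSS_WIN) {
			memset(w->seen, 0, sizeof(w->seen));
		} else {
			for (uint32_t s = w->max_seen + 1; s != seq; s++)
				w->seen[s % GSS_WIN] = false;
		}
		w->max_seen = seq;
		w->seen[seq % GSS_WIN] = true;
		return true;
	}

	/* Underrun: the sequence number is older than the whole window. */
	if (w->max_seen - seq >= GSS_WIN)
		return false;

	/* Inside the window, but already used: treat as a replay. */
	if (w->seen[seq % GSS_WIN])
		return false;

	w->seen[seq % GSS_WIN] = true;
	return true;
}
```

In this model, a client that assigns sequence numbers at queue time
and then transmits an old request after many newer ones hits the
"underrun" branch, which corresponds to the drop described above.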

Bisected to commit 918f3c1fe83c ("SUNRPC: Improve latency for
interactive tasks").

I've got a one-line workaround for this issue, which is easy to
backport to v4.20 while a more permanent solution is being derived.
Essentially, tk_owner-based sorting is disabled for RPCs that carry
a GSS sequence number.

Fixes: 918f3c1fe83c ("SUNRPC: Improve latency for interactive ... ")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
net/sunrpc/xprt.c

index 9075ae150ae560777969d393e31a691fe663edda..f1ec2110efebe6315523036bd11e30dbdf7c8de4 100644
@@ -1178,7 +1178,7 @@ xprt_request_enqueue_transmit(struct rpc_task *task)
                                INIT_LIST_HEAD(&req->rq_xmit2);
                                goto out;
                        }
-               } else {
+               } else if (!req->rq_seqno) {
                        list_for_each_entry(pos, &xprt->xmit_queue, rq_xmit) {
                                if (pos->rq_task->tk_owner != task->tk_owner)
                                        continue;
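
The effect of the one-line change can be illustrated with a simplified,
stand-alone model of the transmit queue. This is a sketch under stated
assumptions, not the kernel code: the queue is a flat array rather than
the rq_xmit/rq_xmit2 lists, the names sketch_req, sketch_queue and
sketch_enqueue are hypothetical, and it assumes that owner-based sorting
places a new request next to the requests already queued by the same
owner. Requests with a non-zero seqno skip that step and go to the tail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define QUEUE_MAX 16 /* hypothetical capacity of the model queue */

struct sketch_req {
	int      owner; /* stands in for rq_task->tk_owner */
	uint32_t seqno; /* stands in for rq_seqno; 0 means "no GSS sequence number" */
};

struct sketch_queue {
	struct sketch_req slots[QUEUE_MAX];
	size_t            len;
};

/* Insert req into q; returns 0 on success, -1 if the queue is full. */
static int sketch_enqueue(struct sketch_queue *q, struct sketch_req req)
{
	size_t at;

	assert(q != NULL);
	if (q->len >= QUEUE_MAX)
		return -1;

	at = q->len; /* default: append at the tail, preserving arrival order */

	if (req.seqno == 0) {
		/* Owner-based sorting: join the first run of same-owner requests. */
		for (size_t i = 0; i < q->len; i++) {
			if (q->slots[i].owner != req.owner)
				continue;
			while (i < q->len && q->slots[i].owner == req.owner)
				i++;
			at = i; /* slot just past the end of that run */
			break;
		}
	}

	/* Shift later entries up by one and place the request. */
	memmove(&q->slots[at + 1], &q->slots[at],
		(q->len - at) * sizeof(q->slots[0]));
	q->slots[at] = req;
	q->len++;
	return 0;
}
```

With owners 1, 2, 1 and seqno 0, the model queues the third request
ahead of the second. With seqnos 10, 11, 12 the arrival order is kept,
so the sequence numbers reach the wire in increasing order.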