14 <<<samba-chilli-flavour-crop-bright-1280.jpg,height=.8\textheight>>>
16 <<<samba-chilli-flavour-crop-bright-1280.jpg,height=.8\textheight>>>
18 <<<samba-chilli-flavour-crop-bright-1280.jpg,height=.8\textheight>>>
26 <<<samba-kisses-better-selection.jpg,height=.8\textheight>>>
30 ==== Short History ====
33 * 2.0: 1999/01: domain-member, +SWAT
34 * 2.2: 2001/04: NT4-DC
35 * 3.0: 2003/09: AD-member, Samba4 project started
36 * 3.2: 2008/07: GPLv3, experimental clustering
37 * 3.3: 2009/01: clustering
38 * 3.4: 2009/07: merged S3+S4 code
39 * 3.5: 2010/03: experimental SMB 2.0
40 * 3.6: 2011/09: SMB 2.0
41 * 4.0: 2012/12: AD/DC, SMB 2.0 durable handles, 2.1, 3.0
42 * 4.1: 2013/10: stability
43 * 4.2: soon: AD trusts, performance, scalability, CTDB included
45 ==== Release Stream ====
49 <<<samba-release-stream_exp.png,width=.8\textwidth>>>
55 <<<samba-team-20141011.png,height=.9\textheight>>>
61 <<<samba-team-20141011-colorized.png,height=.9\textheight>>>
65 ==== Samba File Server Topics / Challenges ====
67 # performance: scalable file server
68 #* scale-up: exhaust powerful boxes
69 #* scale-out: flexible all-active clusters
70 #* scale-down: perform well on low-end boxes
71 # interop: multi-protocol access (nfs, afp, ...)
72 # server workloads / SMB features
73 #* tune for: small \# of connections, threaded applications
75 #* SMB3 (clustering, RDMA, ...)
76 # special file systems support (gluster, ceph, gpfs, btrfs, ...)
77 # cloud / openstack?...
78 %* (samba $\leftrightarrow$ cifs.ko alternative to nfs?...)
81 %% ==== Samba File Serving Topics ====
84 %% * Clustering (CTDB)
85 %% * SMB features (SMB3...)
86 %% * Interop (protocols, NFS, AFP, ...)
87 %% * special file systems support (gluster, ceph, gpfs, btrfs...)
90 %%==== Other Samba Topics ====
92 %%* Auth/Domain Member
97 ==== Performance ====[plain]
101 <<<samba-chilli-flavour-crop-bright-1280.jpg,height=.8\textheight>>>
104 ==== Performance - low end systems ====
107 <[block]{Reduction of CPU usage for low profile platforms like arm (SMB2)}
109 ** didn't saturate 1G nic (arm), CPU 100\%
110 * reduced memory allocations
111 * instrument SMB 2.1 multi-credit / large MTU
113 ** saturates 1G nic (arm), CPU $<$ 100\%
117 ==== Performance - DB performance ====
121 * used for IPC (smbd processes)
122 * cluster (CTDB): local copies
125 <[block]{hot databases}
126 * @locking.tdb@ (open files)
127 * @brlock.tdb@ (byte range locks)
128 * @notify\_index.tdb@ (for change notify)
131 ==== Performance - DB performance ====
134 * fcntl byte range locks for record locks
135 * contention via single kernel spinlock
139 * alternative to fcntl: pthread robust mutexes
140 * ==> massive speedup
141 * ==> included in TDB 1.3.1, Samba 4.2
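The fcntl-based record locking described on this slide can be sketched as follows. This is only an illustration in Python and not TDB's C implementation: the hash function, @HASH\_SIZE@, @LOCK\_BASE@ and all function names are hypothetical. It shows one byte of the file standing in for one hash chain, locked with a POSIX byte range lock. The robust-mutex alternative is not shown, since the Python standard library does not expose it.

```python
import fcntl
import os
import tempfile

HASH_SIZE = 8   # hypothetical number of hash chains
LOCK_BASE = 0   # hypothetical file offset where the per-chain lock bytes start


def chain_for(key: bytes) -> int:
    """Toy hash, only for illustration (not TDB's hash function)."""
    return sum(key) % HASH_SIZE


def update_locked(fd: int, key: bytes, work):
    """Run work() while holding an exclusive lock on the key's chain byte."""
    offset = LOCK_BASE + chain_for(key)
    # Blocking POSIX byte range lock covering exactly one byte.
    fcntl.lockf(fd, fcntl.LOCK_EX, 1, offset, os.SEEK_SET)
    try:
        return work()
    finally:
        fcntl.lockf(fd, fcntl.LOCK_UN, 1, offset, os.SEEK_SET)


def demo_record_lock() -> int:
    """Lock the chain of one key in a scratch file and return its chain index."""
    with tempfile.TemporaryFile() as f:
        f.write(b"\0" * HASH_SIZE)
        f.flush()
        return update_locked(f.fileno(), b"locking",
                             lambda: chain_for(b"locking"))
```

Every lock and unlock here is a system call into the kernel's file locking code, which is where the contention mentioned above arises when many processes hit the same database.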
144 ==== Performance - DB performance ====
148 ** single chain, contended (@locking.tdb@)
149 ** gets fragmented (singly linked)
150 * especially a problem in ctdb-cluster: vacuuming
153 <[block]{improvements}
154 * make use of small per-record freelists (dead records)
155 * add automatic defragmentation upon traversal
156 * ==> included in TDB 1.3.1, Samba 4.2
159 ==== Performance - DB performance ====
161 * change notify not scalable
164 <[block]{first improvement}
165 * restructured @notify.tdb@ to
166 ** global @notify\_index.tdb@ and
167 ** local @notify.tdb@
168 ** ==> better but still not good enough for some workloads
172 * replace DB-approach by new scalable, async notify daemon using messaging
173 * some false positives do no harm
178 ==== Performance - scaling ====
180 <[block]{parallelism}
181 * samba is multi-process:
182 ** smbd child process $\leftrightarrow$ TCP connection
183 ** event-loop in one process
184 * within a smbd process:
185 ** pthread-pool jobs for potentially blocking syscalls
186 ** ==> parallelism for reads/writes
187 ** default for async I/O since Samba 4.0
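The thread-pool pattern for potentially blocking syscalls can be sketched like this. It is a minimal Python illustration under stated assumptions, not smbd's pthread-pool code: the function names and the chunked-read scenario are hypothetical, and the caller here simply waits for the results instead of returning to an event loop.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def pread_job(fd: int, length: int, offset: int) -> bytes:
    """A potentially blocking syscall, executed on a worker thread."""
    return os.pread(fd, length, offset)


def read_chunks(path: str, chunk: int, workers: int = 4) -> bytes:
    """Issue all reads as pool jobs so they can proceed in parallel."""
    size = os.path.getsize(path)
    fd = os.open(path, os.O_RDONLY)
    try:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            jobs = [pool.submit(pread_job, fd, chunk, off)
                    for off in range(0, size, chunk)]
            # Collect results in submission order to rebuild the file content.
            return b"".join(job.result() for job in jobs)
    finally:
        os.close(fd)


def demo_threadpool() -> bool:
    """Write a scratch file and check that the parallel reads return it intact."""
    data = bytes(range(256)) * 64
    with tempfile.NamedTemporaryFile() as f:
        f.write(data)
        f.flush()
        return read_chunks(f.name, chunk=1000) == data
```

Because @pread@ takes an explicit offset, the jobs share one file descriptor without racing on a file position.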
190 ==== Performance - scaling ====
193 * classical messaging:
194 ** messages.tdb and signals between processes
195 ** does not scale well
196 * new messaging in Samba 4.2:
197 ** fast and scalable messaging based on unix datagram messages
198 ** ==> WIP: integrate with AD/DC messaging
199 ** ==> features fd-passing for sockets (SMB3 multi-channel)
200 ** ==> TODO: integrate into CTDB inter-node-messaging
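Messaging over unix datagram sockets can be sketched as below. This is an assumption-laden Python illustration and not Samba's messaging code: the per-process socket names ("1001", "1002"), the directory layout and the function names are hypothetical. Each process binds one datagram socket, and a message is a single datagram sent to the receiver's socket path.

```python
import os
import socket
import tempfile


def make_endpoint(directory: str, name: str) -> socket.socket:
    """Bind a datagram socket that acts as this process's message inbox."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    sock.bind(os.path.join(directory, name))
    return sock


def send_message(sender: socket.socket, directory: str,
                 dest: str, payload: bytes) -> None:
    """Deliver one message as one datagram to the destination's socket path."""
    sender.sendto(payload, os.path.join(directory, dest))


def demo_messaging():
    """Send one message between two endpoints; return (payload, sender name)."""
    with tempfile.TemporaryDirectory() as d:
        a = make_endpoint(d, "1001")   # hypothetical name, e.g. a process id
        b = make_endpoint(d, "1002")
        try:
            send_message(a, d, "1002", b"hello")
            data, addr = b.recvfrom(4096)
            return data, os.path.basename(addr)
        finally:
            a.close()
            b.close()
```

Datagram sockets preserve message boundaries, so no framing layer is needed, and the receiver learns the sender's address from the datagram itself.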
204 ==== Interop ====[plain]
209 <<<samba-chilli-flavour-crop-bright-1280.jpg,height=.8\textheight>>>
213 ==== Interop-Central ====
215 <[block]{multi-protocol access}
216 * nfs (kernel, ganesha, ...)
219 * SMB2+ unix-extensions
223 ==== File Server Layout/Scope ====
226 <<<samba-layers.jpg,height=.8\textheight>>>
230 ==== Interop - Fruit ====
235 * MacOS 10.9: SMB 2.1 preferred file protocol
236 * @vfs\_fruit@ - new module in Samba 4.2
248 ** SMB2 create context
249 ** speed up directory listings
253 <<<apfel_1280.jpg,width=.9\textwidth>>>
263 ==== SMB features ====[plain]
272 <<<samba-chilli-flavour-crop-bright-1280.jpg,height=.8\textheight>>>
277 ==== SMB features in Samba ====
285 ** durable file handles [4.0]
287 ** multi-credit / large mtu [4.0]
288 ** dynamic reauthentication [4.0]
290 ** resilient file handles [tracer]
292 ** new crypto (sign/encrypt) [4.0]
293 ** secure negotiation [4.0]
294 ** durable handles v2 [4.0]
295 ** persistent file handles [tracer]
296 ** multi-channel [WIP+]
297 ** SMB direct [designed/starting]
298 ** cluster features [designing]
300 ** storage features [WIP]
304 %<<<durable-crop-colormod-1024,width=.9\textwidth,right>>>
314 %%==== Clustered Samba / CTDB (SOFS since 2007) ====
317 %%<<<design-ctdb-three-nodes.png,width=.9\textwidth>>>
327 %%% * new crypto (signing, transport encryption)
328 %%% * persistent file handles
330 %%% * RDMA transport (SMB direct)
331 %%% * storage features
334 %%% ** transparent failover (continuous availability)
335 %%% ** all-active (scale-out)
338 %%% ==== SMB3 - Goals ====
343 %%% * fault tolerance / reliability
344 %%% * performance / throughput / scaling
345 %%% * focus on support for server workloads \\ %
346 %%% (as opposed to workstation workloads)
347 %%% * especially support for:
351 %%% ** replace block storage in data center
352 %%% ** block (SCSI) over SMB
355 %%% ==== Requirements for Hyper-V ====
360 %%% * minimum requirements:
362 %%% ** is that really all??? - maybe resilient file handles..
365 %%% * desired features:
366 %%% ** cluster ($\ge 2$ nodes)
367 %%% ** CA / persistent handles
368 %%% ** RDMA / SMB direct
372 %%% ==== SMB Protocol in Samba ====
380 %%% ** experimental incomplete support for SMB 2.0
382 %%% ** official support for SMB 2.0
383 %%% ** missing: durable handles
384 %%% ** default server max proto: SMB 1
386 %%% ** SMB 2.0: complete with durable handles
387 %%% ** SMB 2.1: basis, multi-credit, dynamic reauthentication
388 %%% ** SMB 3.0: basis, crypto, secure negotiation, durable v2
389 %%% ** default server max proto: SMB 3.0
391 %%% ** SMB 3.02: basic
394 %%% ==== ==== [plain]
397 %%% Technical Details...
405 %%% <<<samba-chilli-flavour-crop-bright-1280.jpg,height=.8\textheight>>>
410 ==== Multi-Channel - Windows/Protocol ====
412 * find interfaces with interface discovery: \\ %
413 @FSCTL\_QUERY\_NETWORK\_INTERFACE\_INFO@
414 * bind additional TCP (or RDMA) connection (channel) to established SMB3 session (session bind)
415 * windows: uses connections of the same (and best) quality
416 * windows: binds only to a single node
417 * replay / retry mechanisms, epoch numbers
419 ==== Multi-Channel - Samba ====
421 * samba/smbd: multi-process
422 ** process $\Leftrightarrow$ tcp connection
423 ** ==> transfer new connection to existing smbd
424 ** use fd-passing (sendmsg/recvmsg)
426 * preparation: messaging rewrite using unix dgm sockets with sendmsg [DONE,4.2]
427 * add fd-passing [DONE,4.2]
428 * transfer connection already in negprot (ClientGUID) [ess.DONE]
429 * implement channel epoch numbers [WIP]
430 * implement interface discovery [WIP]
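The fd-passing step (handing a newly accepted connection to the smbd that already serves the client) can be sketched as follows. This is a hedged Python illustration of @sendmsg@/@recvmsg@ with @SCM\_RIGHTS@, not Samba's implementation: the function names, the tag string and the use of a socketpair in place of a real TCP connection are all assumptions made for the example.

```python
import array
import socket


def send_fd(chan: socket.socket, fd: int, tag: bytes) -> None:
    """Send one file descriptor plus a small tag (e.g. a client identifier)."""
    fds = array.array("i", [fd])
    chan.sendmsg([tag], [(socket.SOL_SOCKET, socket.SCM_RIGHTS, fds)])


def recv_fd(chan: socket.socket):
    """Receive the tag and the passed descriptor (a new fd in this process)."""
    fds = array.array("i")
    msg, ancdata, _flags, _addr = chan.recvmsg(
        1024, socket.CMSG_LEN(fds.itemsize))
    for level, ctype, data in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            # Ignore any truncated trailing bytes of the ancillary payload.
            fds.frombytes(data[:len(data) - (len(data) % fds.itemsize)])
    return msg, fds[0]


def demo_fd_passing():
    """Pass one end of a connection over a unix channel and read through it."""
    chan_a, chan_b = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    # Stand-in for the new client connection accepted by another process.
    client, accepted = socket.socketpair()
    try:
        send_fd(chan_a, accepted.fileno(), b"client-guid")
        tag, new_fd = recv_fd(chan_b)
        adopted = socket.socket(fileno=new_fd)   # wrap the received descriptor
        try:
            client.sendall(b"ping")
            return tag, adopted.recv(4)
        finally:
            adopted.close()
    finally:
        for s in (chan_a, chan_b, client, accepted):
            s.close()
```

The receiver gets its own descriptor referring to the same open connection, so the sender can close its copy once the transfer is done.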
432 ==== Multi-Channel - Samba ====
435 <<<smb3-mc-samba_exp.png,height=.9\textheight>>>
445 ==== SMB Direct (RDMA) ====
448 ** requires multi-channel
449 ** start with TCP, bind an RDMA channel
450 ** reads and writes use RDMA write/read
451 ** protocol/metadata via send/receive
453 * wireshark dissector: [DONE]
456 ** prereq: multi-channel / fd-passing
457 ** buffer / transport abstractions [TODO]
458 ** _red_problem_: libraries: not fork safe and no fd-passing \\ %
459 ==> central daemon (or kernel module) to serve as RDMA "proxy"
461 ==== SMB Direct (RDMA) - Plan ====
464 <<<smb3-rdma-samba_exp.png,height=.9\textheight>>>
467 %%%==== SMB Direct (RDMA) - Plan ====
470 %%%* smbd-d (rdma proxy daemon)
471 %%%** listens on unix domain socket (@/var/lib/smbd-d/socket@)
472 %%%** listens for RDMA connection (as told by main smbd)
474 %%%** listens for TCP connections
475 %%%** connects to smbd-d-socket
476 %%%*** request rdma-interfaces, tell smbd-d on which to listen
477 %%%** "accepts" new smb-direct connections on smbd-d-socket
480 %%%==== SMB Direct (RDMA) - Plan ====
484 %%%** connects via TCP --> smbd forks child smbd (c)
485 %%%** connects via RDMA to smbd-d
487 %%%** creates socket-pair as rdma-proxy-channel
488 %%%** passes one end of socket-pair to main smbd for accept
489 %%%** sends smb direct packages over proxy-channel
491 %%%** upon receiving NegProt: pass proxy-socket to c based on ClientGUID
493 %%%** continues proxy-communication with smbd-d
496 %%%* For @rdma\_read@ and @rdma\_write@:
497 %%%** c and smbd-d establish shared memory area
501 %%% ==== Persistent Handles ====
506 %%% * like durable file handles with strong guarantees
507 %%% * framework is already there in samba (by support for durable v2)
508 %%% ** ==> easy to satisfy at the protocol level
511 %%% * the difficulty lies in implementing the guarantees
512 %%% ** need to make metadata persistent
513 %%% ** but don't kill performance!
514 %%% ** persistent tdbs !would! kill performance
516 %%% *** need to be sync
517 %%% *** record-level transactions (instead of db-level)
518 %%% *** only replicate to some nodes, not all
522 %%==== Clustering Concepts (Windows) ====
528 %%** (``traditional'') failover cluster (active-passive)
529 %%** protocol: @SMB2\_SHARE\_CAP\_CLUSTER@
531 %%*** runs off a cluster (failover) volume
532 %%*** offers the Witness service
535 %%* Scale-Out (SOFS):
536 %%** scale-out cluster (all-active!)
537 %%** protocol: @SMB2\_SHARE\_CAP\_SCALEOUT@
538 %%** no client caching
539 %%** Windows: runs off a cluster shared volume (implies cluster)
542 %%* Continuous Availability (CA):
543 %%** transparent failover, persistent handles
544 %%** protocol: @SMB2\_SHARE\_CAP\_CONTINUOUS\_AVAILABILITY@
545 %%** can be turned on independently for any cluster share (failover or scale-out)
546 %%** ==> changed client retry behaviour!
549 %%% ==== Clustering -- Controlling Flags from Windows ====
554 %%% * a share on a cluster carries
555 %%% ** @SMB2\_SHARE\_CAP\_CLUSTER@ $\Leftrightarrow$ the shared FS is a cluster volume.
558 %%% * a share on a cluster carries
559 %%% ** @SMB2\_SHARE\_CAP\_SCALEOUT@ $\Leftrightarrow$ the shared FS is a CSV
560 %%% *** implies @SMB2\_SHARE\_CAP\_CLUSTER@
563 %%% * independently settable on a clustered share:
564 %%% ** @SMB2\_SHARE\_CAP\_CONTINUOUS\_AVAILABILITY@
565 %%% *** implies @SMB2\_SHARE\_CAP\_CLUSTER@
569 %%==== Clustering -- Server Behaviour ====
574 %%* @SMB2\_SHARE\_CAP\_CLUSTER@:
575 %%** run witness service (RPC)
576 %%** client can register and get notified about resource changes
579 %%* @SMB2\_SHARE\_CAP\_SCALEOUT@:
580 %%** do not grant batch oplocks, write leases, handle leases
581 %%** ==> no durable handles unless also CA
584 %%* @SMB2\_SHARE\_CAP\_CONTINUOUS\_AVAILABILITY@:
585 %%** offer persistent handles
586 %%** timeout from durable v2 request
590 %%==== Clustering -- Client Behaviour (Win8) ====
596 %%* @SMB2\_SHARE\_CAP\_CLUSTER@:
597 %%** clients happily work if witness is not available
600 %%* @SMB2\_SHARE\_CAP\_SCALEOUT@:
601 %%** clients happily connect if @CLUSTER@ is not set.
602 %%** clients DO request oplocks/leases/durable handles
603 %%** clients are not confused if they get these
606 %%* @SMB2\_SHARE\_CAP\_CONTINUOUS\_AVAILABILITY@:
607 %%** clients happily connect if @CLUSTER@ is not set.
608 %%** clients typically request persistent handle with RWH lease
613 %%%Win8 sends @SMB2\_FLAGS\_REPLAY\_OPERATION@ in writes and reads (from 2nd in a row) \\ %
614 %%%$\Leftrightarrow$ \\ %
615 %%%The server announces @SMB2\_CAP\_PERSISTENT\_HANDLES@.
618 %%% ==== Clustering -- Client Behaviour (Win8) : Retries ====
621 %%% * Test: Win8 against slightly pimped Samba (2 IPs)
624 %%% * Server-Matrix (on/off):
625 %%% ** persistent handle cap
626 %%% ** durable handles
627 %%% ** cluster share cap
633 %%% ** connect to share with explorer
634 %%% ** start copying file (2G)
636 %%% ** wait for the client to pop up an error dialog
641 %%% ==== Clustering -- Client Behaviour (Win8) : Retries ====
644 %%% * only two different retry characteristics: CA $\leftrightarrow$ non-CA
648 %%% ** 3 consecutive attempt rounds:
649 %%% *** for each of the two IPs: \\ %
651 %%% three tcp syn attempts to IP with 0.5 sec breaks
652 %%% ** ==> some 2.1 seconds for 1 round
653 %%% ** between attempts:
654 %%% ** dns, ping, arp ... 5.8 seconds
655 %%% ** ==> _red_18 seconds_
659 %%% ** retries attempt rounds from above for _red_14 minutes_
669 %%% <<<samba-chilli-flavour-crop-bright-1280.jpg,height=.8\textheight>>>
674 %%==== Clustering with Samba/CTDB ====
677 %%* all-active SMB-cluster with Samba and CTDB... \\ %
678 %%+<3->{...since 2007! \smiley }
681 %%* transparent for the client
683 %%*** metadata and messaging engine for Samba in a cluster
684 %%*** plus cluster resource manager (IPs, services...)
685 %%** client only sees one ``big'' SMB server
686 %%** we could not change the client!...
687 %%** works ``well enough''
691 %%** how to integrate SMB3 clustering with Samba/CTDB
692 %%** good: rather orthogonal
693 %%** ctdb-clustering transparent mostly due to management
696 %%==== Witness Service ====
700 %%** monitoring of availability of resources (shares, NICs)
701 %%** server asks client to move to another resource
705 %%** available on a Windows SMB3 share $\Leftrightarrow$ @SMB2\_SHARE\_CAP\_CLUSTER@
706 %%** but clients happily connect w/o witness
709 %%* status in Samba [WIP (Metze, Gregor Beck)]:
710 %%** async RPC: WIP, good progress ($\Rightarrow$ Metze's talk)
711 %%** wireshark dissector: essentially done
712 %%** client: in @rpcclient@ - done
713 %%** server: dummy PoC / tracer bullet implementation done
714 %%** CTDB: changes / integration needed
722 %%% !@https://wiki.samba.org/index.php/SMB3@!
732 %%% [[[.6\textwidth]]]
734 %%% [[[.3\textwidth]]]
735 %%% <<<samba-chilli-flavour-crop-bright-1280.jpg,height=.8\textheight>>>
741 ==== Misc ====[plain]
746 <<<samba-chilli-flavour-crop-bright-1280.jpg,height=.8\textheight>>>
751 <[block]{File Systems}
752 * gpfs, gluster, ceph, btrfs...
753 * support through vfs modules
754 * fuse-based: avoid context switches
755 * instrument SMB3 storage features (fsctls)
760 %%<[block]{Under the hood}
761 %%* restructurings, reconciliations
762 %%* ctdb moved into samba tree
763 %%* published libs: talloc, tdb, tevent ...
767 * unprivileged selftest, autobuild
768 * selfcontained testing: wrapper
772 ** resolv wrapper [_red_new_]
773 * externalized as separate projects:
774 ** ==> @http://cwrap.org/@
776 ** ==> Andreas Schneider's talk
780 ==== Forecast: Cloudy ====
782 <[block]{Possible involvement with OpenStack}
783 * SMB storage service for Windows (and other) VMs
784 * SMB3 storage backend for Hyper-V images
785 * also: chances for AD-integration into auth
790 <[block]{especially but not exclusively}
803 * Samba 4.X is quite different from 3.Y
806 <[block]{What's coming?}
807 * Performance: the story continues
808 * Interop: strengthen strengths
809 * SMB(3) features: a lot to come ( ==> cluster, hyper-v, ...)
810 * Some clouds in the sky...
814 ==== Thanks for your attention! ====[plain]
821 <<<samba-chilli-flavour-crop-bright-1280.jpg,height=.8\textheight>>>