14 <<<samba-chilli-flavour-crop-bright-1280.jpg,height=.8\textheight>>>
16 <<<samba-chilli-flavour-crop-bright-1280.jpg,height=.8\textheight>>>
18 <<<samba-chilli-flavour-crop-bright-1280.jpg,height=.8\textheight>>>
26 <<<samba-kisses-better-selection.jpg,height=.8\textheight>>>
30 ==== Short History ====
33 * 2.0: 1999/01: domain-member, +SWAT
34 * 2.2: 2001/04: NT4-DC
35 * 3.0: 2003/09: AD-member, Samba4 project started
36 * 3.2: 2008/07: GPLv3, experimental clustering
37 * 3.3: 2009/01: clustering
38 * 3.4: 2009/07: merged S3+S4 code
39 * 3.5: 2010/03: experimental SMB 2.0
40 * 3.6: 2011/09: SMB 2.0
41 * 4.0: 2012/12: AD/DC, SMB 2.0 durable handles, 2.1, 3.0
42 * 4.1: 2013/10: stability
43 * 4.2: soon: AD trusts, performance, scalability, CTDB included
45 ==== Release Stream ====
49 <<<samba-release-stream_exp.png,width=.8\textwidth>>>
52 ==== Release Planning ====
56 @https://wiki.samba.org/index.php/Samba\_Release\_Planning@
62 <<<samba-team-20141011.png,height=.9\textheight>>>
68 <<<samba-team-20141011-colorized.png,height=.9\textheight>>>
72 ==== Samba File Server Topics / Challenges ====
74 # performance: scalable file server
75 #* scale-up: exhaust powerful boxes
76 #* scale-out: flexible all-active clusters
77 #* scale-down: perform well on low-end boxes
78 # interop: multi-protocol access (nfs, afp, ...)
79 # server workloads / SMB features
80 #* tune for: small \# of connections, threaded applications
82 #* SMB3 (clustering, RDMA, ...)
83 # special file systems support (gluster, ceph, gpfs, btrfs, ...)
84 # cloud / openstack?...
85 %* (samba $\leftrightarrow$ cifs.ko alternative to nfs?...)
88 %% ==== Samba File Serving Topics ====
91 %% * Clustering (CTDB)
92 %% * SMB features (SMB3...)
93 %% * Interop (protocols, NFS, AFP, ...)
94 %% * special file systems support (gluster, ceph, gpfs, btrfs...)
97 %%==== Other Samba Topics ====
99 %%* Auth/Domain Member
104 ==== Performance ====[plain]
108 <<<samba-chilli-flavour-crop-bright-1280.jpg,height=.8\textheight>>>
111 ==== Performance - low end systems ====
114 <[block]{Reduction of CPU usage for low profile platforms like arm (SMB2)}
116 ** didn't saturate 1G nic (arm), CPU 100\%
117 * reduced memory allocations
118 * instrument SMB 2.1 multi-credit / large MTU
120 ** saturates 1G nic (arm), CPU $<$ 100\%
124 ==== Performance - DB performance ====
128 * used for IPC (smbd processes)
129 * cluster (CTDB): local copies
132 <[block]{hot databases}
133 * @locking.tdb@ (open files)
134 * @brlock.tdb@ (byte range locks)
135 * @notify\_index.tdb@ (for change notify)
138 ==== Performance - DB performance ====
141 * fcntl byte range locks for record locks
142 * contention via single kernel spinlock
146 * alternative to fcntl: pthread robust mutexes
147 * ==> massive speedup
148 * ==> included in TDB 1.3.1, Samba 4.2
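
The fcntl side of this can be sketched as follows. This is a minimal illustration in Python, not TDB code: the fixed @RECORD\_SIZE@ and the helper names are assumptions made for the example, and the pthread robust mutex alternative is not shown. Each record maps to a byte range of the database file, and a second process that wants the same record is refused while another record stays available.

```python
import fcntl
import os
import tempfile

RECORD_SIZE = 64  # hypothetical fixed record size, for illustration only


def lock_record(fd, index):
    """Take an exclusive, non-blocking fcntl lock on one record's byte range."""
    fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB,
                RECORD_SIZE, index * RECORD_SIZE, os.SEEK_SET)


def unlock_record(fd, index):
    """Release the lock on one record's byte range."""
    fcntl.lockf(fd, fcntl.LOCK_UN,
                RECORD_SIZE, index * RECORD_SIZE, os.SEEK_SET)


def try_lock_in_child(path, index):
    """Fork a child that tries to lock the record; True if it got the lock.

    fcntl locks belong to a process, so contention only shows up
    between different processes.
    """
    pid = os.fork()
    if pid == 0:
        code = 1
        try:
            fd = os.open(path, os.O_RDWR)
            lock_record(fd, index)
            code = 0
        except OSError:
            code = 1
        finally:
            os._exit(code)  # the child's lock is dropped when it exits
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) == 0


def demo():
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.truncate(4 * RECORD_SIZE)
        fd = tmp.fileno()
        lock_record(fd, 1)
        same_record = try_lock_in_child(tmp.name, 1)   # held by the parent
        other_record = try_lock_in_child(tmp.name, 2)  # different byte range
        unlock_record(fd, 1)
        after_unlock = try_lock_in_child(tmp.name, 1)
    return same_record, other_record, after_unlock
```

Every lock and unlock here is a system call into the kernel's lock bookkeeping, which is where the contention described above comes from.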
151 ==== Performance - DB performance ====
155 ** single chain, contended (@locking.tdb@)
156 ** gets fragmented (singly linked)
157 * especially a problem in ctdb-cluster: vacuuming
160 <[block]{improvements}
161 * make use of small per-record freelists (dead records)
162 * add automatic defragmentation upon traversal
163 * ==> included in TDB 1.3.1, Samba 4.2
166 ==== Performance - DB performance ====
168 * change notify not scalable
171 <[block]{first improvement}
172 * restructured @notify.tdb@ to
173 ** global @notify\_index.tdb@ and
174 ** local @notify.tdb@
175 ** ==> better but still not good enough for some workloads
179 * replace DB-approach by new scalable, async notify daemon using messaging
180 * some false positives do no harm
185 ==== Performance - scaling ====
187 <[block]{parallelism}
188 * samba is multi-process:
189 ** smbd child process $\leftrightarrow$ TCP connection
190 ** event-loop in one process
191 * within a smbd process:
192 ** pthread-pool jobs for potentially blocking syscalls
193 ** ==> parallelism for reads/writes
194 ** default for async I/O since Samba 4.0
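
The pool-job idea can be sketched in Python as below. This is an illustration only, not smbd code: @parallel\_pread@ is a hypothetical helper, and unlike a real event loop it simply waits for all jobs instead of being notified on completion. Each potentially blocking read is handed to a worker thread so several reads can be in flight at once.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def parallel_pread(fd, requests, workers=4):
    """Run each (offset, length) read as a pool job.

    Results are returned in the order of the requests, not in
    completion order.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(os.pread, fd, length, offset)
                   for offset, length in requests]
        return [f.result() for f in futures]


def demo():
    with tempfile.TemporaryFile() as tmp:
        tmp.write(b"abcdefghij" * 10)  # 100 bytes of known content
        tmp.flush()
        # The last request runs past end of file and yields a short read.
        return parallel_pread(tmp.fileno(), [(0, 5), (10, 3), (95, 10)])
```

@os.pread@ takes an explicit offset, so the jobs do not race on a shared file position.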
197 ==== Performance - scaling ====
200 * classical messaging:
201 ** messages.tdb and signals between processes
202 ** does not scale well
203 * new messaging in Samba 4.2:
204 ** fast and scalable messaging based on unix datagram messages
205 ** ==> WIP: integrate with AD/DC messaging
206 ** ==> features fd-passing for sockets (SMB3 multi-channel)
207 ** ==> TODO: integrate into CTDB inter-node-messaging
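
A minimal sketch of messaging over unix datagram sockets, in Python. The directory layout, the endpoint names and the payload are assumptions for the example and do not reflect Samba's actual message format. Each process binds one datagram socket at a well-known path, and a sender addresses a peer by that path; no shared database and no signal is involved.

```python
import os
import socket
import tempfile


def open_endpoint(directory, name):
    """Bind a datagram socket at <directory>/<name>; the path is the address."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    sock.bind(os.path.join(directory, name))
    return sock


def send_message(sender, directory, dest_name, payload):
    """Send one datagram to the endpoint bound at <directory>/<dest_name>."""
    sender.sendto(payload, os.path.join(directory, dest_name))


def demo():
    with tempfile.TemporaryDirectory() as msg_dir:
        # Hypothetical endpoint names, standing in for two processes.
        a = open_endpoint(msg_dir, "1001")
        b = open_endpoint(msg_dir, "1002")
        try:
            send_message(a, msg_dir, "1002", b"notify: share reloaded")
            data, sender_path = b.recvfrom(4096)
            return data, os.path.basename(sender_path)
        finally:
            a.close()
            b.close()
```

Because the sender is bound as well, the receiver learns who sent the datagram from the source address alone.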
211 ==== Interop ====[plain]
216 <<<samba-chilli-flavour-crop-bright-1280.jpg,height=.8\textheight>>>
220 ==== Interop-Central ====
222 <[block]{multi-protocol access}
223 * nfs (kernel, ganesha, ...)
226 * SMB2+ unix-extensions
230 ==== File Server Layout/Scope ====
233 <<<samba-layers.jpg,height=.8\textheight>>>
237 ==== Interop - Fruit ====
242 * MacOS 10.9: SMB 2.1 preferred file protocol
243 * @vfs\_fruit@ - new module in Samba 4.2
255 ** SMB2 create context
256 ** speed up directory listings
260 <<<apfel_1280.jpg,width=.9\textwidth>>>
270 ==== SMB features ====[plain]
279 <<<samba-chilli-flavour-crop-bright-1280.jpg,height=.8\textheight>>>
284 ==== SMB features in Samba - SMB2 ====
291 * SMB 2.0 (Vista / 2008):
292 ** durable file handles [4.0]
293 * SMB 2.1 (Win7 / 2008R2):
294 ** multi-credit / large mtu [4.0]
295 ** dynamic reauthentication [4.0]
297 ** resilient file handles [WIP-tracer]
300 <<<durable-crop-colormod-1024,width=.9\textwidth,right>>>
305 ==== SMB features in Samba - SMB3 ====
312 * SMB 3.0 (Win8 / 2012):
313 ** new crypto (sign/encrypt) [4.0]
314 ** secure negotiation [4.0]
315 ** durable handles v2 [4.0]
316 ** persistent file handles [WIP-tracer]
317 ** multi-channel [WIP+]
318 ** SMB direct [designed/starting]
319 ** cluster features [designing]
321 ** storage features [WIP]
322 * SMB 3.02 (Win8.1 / 2012R2): [WIP]
323 * SMB 3.1 (Win10 / 2014): [ess.DONE]
326 <<<durable-crop-colormod-1024,width=.9\textwidth,right>>>
336 %%==== Clustered Samba / CTDB (SOFS since 2007) ====
339 %%<<<design-ctdb-three-nodes.png,width=.9\textwidth>>>
349 %%% * new crypto (signing, transport encryption)
350 %%% * persistent file handles
352 %%% * RDMA transport (SMB direct)
353 %%% * storage features
356 %%% ** transparent failover (continuous availability)
357 %%% ** all-active (scale-out)
360 %%% ==== SMB3 - Goals ====
365 %%% * fault tolerance / reliability
366 %%% * performance / throughput / scaling
367 %%% * focus on support for server workloads \\ %
368 %%% (as opposed to workstation workloads)
369 %%% * especially support for:
373 %%% ** replace block storage in data center
374 %%% ** block (SCSI) over SMB
377 %%% ==== Requirements for Hyper-V ====
382 %%% * minimum requirements:
384 %%% ** is that really all??? - maybe resilient file handles..
387 %%% * desired features:
388 %%% ** cluster ($\ge 2$ nodes)
389 %%% ** CA / persistent handles
390 %%% ** RDMA / SMB direct
394 %%% ==== SMB Protocol in Samba ====
402 %%% ** experimental incomplete support for SMB 2.0
404 %%% ** official support for SMB 2.0
405 %%% ** missing: durable handles
406 %%% ** default server max proto: SMB 1
408 %%% ** SMB 2.0: complete with durable handles
409 %%% ** SMB 2.1: basis, multi-credit, dynamic reauthentication
410 %%% ** SMB 3.0: basis, crypto, secure negotiation, durable v2
411 %%% ** default server max proto: SMB 3.0
413 %%% ** SMB 3.02: basic
416 %%% ==== ==== [plain]
419 %%% Technical Details...
427 %%% <<<samba-chilli-flavour-crop-bright-1280.jpg,height=.8\textheight>>>
432 ==== Multi-Channel - Windows/Protocol ====
434 * find interfaces with interface discovery: \\ %
435 @FSCTL\_QUERY\_NETWORK\_INTERFACE\_INFO@
436 * bind additional TCP (or RDMA) connection (channel) to established SMB3 session (session bind)
437 * windows: uses only connections of the same (and best) quality
438 * windows: binds only to a single node
439 * replay / retry mechanisms, epoch numbers
441 ==== Multi-Channel - Samba ====
443 * samba/smbd: multi-process
444 ** process $\Leftrightarrow$ tcp connection
445 ** ==> transfer new connection to existing smbd
446 ** use fd-passing (sendmsg/recvmsg)
448 * preparation: messaging rewrite using unix dgm sockets with sendmsg [DONE,4.2]
449 * add fd-passing [DONE,4.2]
450 * transfer connection already in negprot (ClientGUID) [ess.DONE]
451 * implement channel epoch numbers [WIP]
452 * implement interface discovery [WIP]
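
The fd-passing step can be sketched in Python with @sendmsg@/@recvmsg@ and @SCM\_RIGHTS@ ancillary data. This is an illustration under assumptions, not smbd code: both ends live in one process, a socket pair stands in for the accepted TCP connection, and the tag @client-guid-1234@ is a made-up placeholder for whatever identifies the target session.

```python
import array
import socket


def send_fd(sock, fd, payload):
    """Send payload plus one file descriptor as SCM_RIGHTS ancillary data."""
    sock.sendmsg([payload],
                 [(socket.SOL_SOCKET, socket.SCM_RIGHTS,
                   array.array("i", [fd]))])


def recv_fd(sock, bufsize=1024):
    """Receive a payload and any file descriptors passed along with it."""
    fds = array.array("i")
    msg, ancdata, _flags, _addr = sock.recvmsg(
        bufsize, socket.CMSG_LEN(fds.itemsize))
    for level, ctype, data in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            # Ignore a truncated trailing integer, if any.
            fds.frombytes(data[:len(data) - (len(data) % fds.itemsize)])
    return msg, list(fds)


def demo():
    # Control channel between the two (here simulated) smbd processes.
    ctl_a, ctl_b = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    # Stand-in for a freshly accepted client connection.
    client, accepted = socket.socketpair()
    try:
        send_fd(ctl_a, accepted.fileno(), b"client-guid-1234")
        accepted.close()  # the sender no longer needs its copy
        tag, fds = recv_fd(ctl_b)
        adopted = socket.socket(fileno=fds[0])
        try:
            adopted.sendall(b"hello")  # the receiver now serves the client
            return tag, client.recv(5)
        finally:
            adopted.close()
    finally:
        for s in (ctl_a, ctl_b, client, accepted):
            s.close()
```

The sender can close its descriptor right after @sendmsg@ returns; the connection stays open because the receiver gets its own reference to it.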
454 ==== Multi-Channel - Samba ====
457 <<<smb3-mc-samba_exp.png,height=.9\textheight>>>
467 ==== SMB Direct (RDMA) ====
470 ** requires multi-channel
471 ** start with TCP, bind an RDMA channel
472 ** reads and writes use RDMA write/read
473 ** protocol/metadata via send/receive
475 * wireshark dissector: [DONE]
478 ** prereq: multi-channel / fd-passing
479 ** buffer / transport abstractions [TODO]
480 ** _red_problem_: libraries: not fork safe and no fd-passing \\ %
481 ==> central daemon (or kernel module) to serve as RDMA "proxy"
483 ==== SMB Direct (RDMA) - Plan ====
486 <<<smb3-rdma-samba_exp.png,height=.9\textheight>>>
489 %%%==== SMB Direct (RDMA) - Plan ====
492 %%%* smbd-d (rdma proxy daemon)
493 %%%** listens on unix domain socket (@/var/lib/smbd-d/socket@)
494 %%%** listens for RDMA connection (as told by main smbd)
496 %%%** listens for TCP connections
497 %%%** connects to smbd-d-socket
498 %%%*** request rdma-interfaces, tell smbd-d on which to listen
499 %%%** "accepts" new smb-direct connections on smbd-d-socket
502 %%%==== SMB Direct (RDMA) - Plan ====
506 %%%** connects via TCP --> smbd forks child smbd (c)
507 %%%** connects via RDMA to smbd-d
509 %%%** creates socket-pair as rdma-proxy-channel
510 %%%** passes one end of socket-pair to main smbd for accept
511 %%%** sends smb direct packets over proxy-channel
513 %%%** upon receiving NegProt: pass proxy-socket to c based on ClientGUID
515 %%%** continues proxy-communication with smbd-d
518 %%%* For @rdma\_read@ and @rdma\_write@:
519 %%%** c and smbd-d establish shared memory area
523 %%% ==== Persistent Handles ====
528 %%% * like durable file handles with strong guarantees
529 %%% * framework is already there in samba (by support for durable v2)
530 %%% ** ==> easy to satisfy at the protocol level
533 %%% * the difficulty lies in implementing the guarantees
534 %%% ** need to make metadata persistent
535 %%% ** but don't kill performance!
536 %%% ** persistent tdbs !would! kill performance
538 %%% *** need to be sync
539 %%% *** record-level transactions (instead of db-level)
540 %%% *** only replicate to some nodes, not all
544 %%==== Clustering Concepts (Windows) ====
550 %%** (``traditional'') failover cluster (active-passive)
551 %%** protocol: @SMB2\_SHARE\_CAP\_CLUSTER@
553 %%*** runs off a cluster (failover) volume
554 %%*** offers the Witness service
557 %%* Scale-Out (SOFS):
558 %%** scale-out cluster (all-active!)
559 %%** protocol: @SMB2\_SHARE\_CAP\_SCALEOUT@
560 %%** no client caching
561 %%** Windows: runs off a cluster shared volume (implies cluster)
564 %%* Continuous Availability (CA):
565 %%** transparent failover, persistent handles
566 %%** protocol: @SMB2\_SHARE\_CAP\_CONTINUOUS\_AVAILABILITY@
567 %%** can be turned on independently on any cluster share (failover or scale-out)
568 %%** ==> changed client retry behaviour!
571 %%% ==== Clustering -- Controlling Flags from Windows ====
576 %%% * a share on a cluster carries
577 %%% ** @SMB2\_SHARE\_CAP\_CLUSTER@ $\Leftrightarrow$ the shared FS is a cluster volume.
580 %%% * a share on a cluster carries
581 %%% ** @SMB2\_SHARE\_CAP\_SCALEOUT@ $\Leftrightarrow$ the shared FS is a CSV
582 %%% *** implies @SMB2\_SHARE\_CAP\_CLUSTER@
585 %%% * independently settable on a clustered share:
586 %%% ** @SMB2\_SHARE\_CAP\_CONTINUOUS\_AVAILABILITY@
587 %%% *** implies @SMB2\_SHARE\_CAP\_CLUSTER@
591 %%==== Clustering -- Server Behaviour ====
596 %%* @SMB2\_SHARE\_CAP\_CLUSTER@:
597 %%** run witness service (RPC)
598 %%** client can register and get notified about resource changes
601 %%* @SMB2\_SHARE\_CAP\_SCALEOUT@:
602 %%** do not grant batch oplocks, write leases, handle leases
603 %%** ==> no durable handles unless also CA
606 %%* @SMB2\_SHARE\_CAP\_CONTINUOUS\_AVAILABILITY@:
607 %%** offer persistent handles
608 %%** timeout from durable v2 request
612 %%==== Clustering -- Client Behaviour (Win8) ====
618 %%* @SMB2\_SHARE\_CAP\_CLUSTER@:
619 %%** clients happily work if witness is not available
622 %%* @SMB2\_SHARE\_CAP\_SCALEOUT@:
623 %%** clients happily connect if @CLUSTER@ is not set.
624 %%** clients DO request oplocks/leases/durable handles
625 %%** clients are not confused if they get these
628 %%* @SMB2\_SHARE\_CAP\_CONTINUOUS\_AVAILABILITY@:
629 %%** clients happily connect if @CLUSTER@ is not set.
630 %%** clients typically request persistent handle with RWH lease
635 %%%Win8 sends @SMB2\_FLAGS\_REPLAY\_OPERATION@ in writes and reads (from 2nd in a row) \\ %
636 %%%$\Leftrightarrow$ \\ %
637 %%%The server announces @SMB2\_CAP\_PERSISTENT\_HANDLES@.
640 %%% ==== Clustering -- Client Behaviour (Win8) : Retries ====
643 %%% * Test: Win8 against slightly pimped Samba (2 IPs)
646 %%% * Server-Matrix (on/off):
647 %%% ** persistent handle cap
648 %%% ** durable handles
649 %%% ** cluster share cap
655 %%% ** connect to share with explorer
656 %%% ** start copying file (2G)
658 %%% ** wait for the client to pop up an error dialog
663 %%% ==== Clustering -- Client Behaviour (Win8) : Retries ====
666 %%% * only two different retry characteristics: CA $\leftrightarrow$ non-CA
670 %%% ** 3 consecutive attempt rounds:
671 %%% *** for each of the two IPs: \\ %
673 %%% three tcp syn attempts to IP with 0.5 sec breaks
674 %%% ** ==> some 2.1 seconds for 1 round
675 %%% ** between attempts:
676 %%% ** dns, ping, arp ... 5.8 seconds
677 %%% ** ==> _red_18 seconds_
681 %%% ** retries attempt rounds from above for _red_14 minutes_
691 %%% <<<samba-chilli-flavour-crop-bright-1280.jpg,height=.8\textheight>>>
696 %%==== Clustering with Samba/CTDB ====
699 %%* all-active SMB-cluster with Samba and CTDB... \\ %
700 %%+<3->{...since 2007! \smiley }
703 %%* transparent for the client
705 %%*** metadata and messaging engine for Samba in a cluster
706 %%*** plus cluster resource manager (IPs, services...)
707 %%** client only sees one ``big'' SMB server
708 %%** we could not change the client!...
709 %%** works ``well enough''
713 %%** how to integrate SMB3 clustering with Samba/CTDB
714 %%** good: rather orthogonal
715 %%** ctdb-clustering transparent mostly due to management
718 %%==== Witness Service ====
722 %%** monitoring of availability of resources (shares, NICs)
723 %%** server asks client to move to another resource
727 %%** available on a Windows SMB3 share $\Leftrightarrow$ @SMB2\_SHARE\_CAP\_CLUSTER@
728 %%** but clients happily connect w/o witness
731 %%* status in Samba [WIP (Metze, Gregor Beck)]:
732 %%** async RPC: WIP, good progress ($\Rightarrow$ Metze's talk)
733 %%** wireshark dissector: essentially done
734 %%** client: in @rpcclient@ - done
735 %%** server: dummy PoC / tracer bullet implementation done
736 %%** CTDB: changes / integration needed
744 %%% !@https://wiki.samba.org/index.php/SMB3@!
754 %%% [[[.6\textwidth]]]
756 %%% [[[.3\textwidth]]]
757 %%% <<<samba-chilli-flavour-crop-bright-1280.jpg,height=.8\textheight>>>
762 ==== SMB features in Samba ====
766 @https://wiki.samba.org/index.php/Samba3/SMB3@
770 ==== Misc ====[plain]
775 <<<samba-chilli-flavour-crop-bright-1280.jpg,height=.8\textheight>>>
780 <[block]{File Systems}
781 * gpfs, gluster, ceph, btrfs...
782 * support through vfs modules
783 * fuse-based: avoid context switches
784 * instrument SMB3 storage features (fsctls)
789 %%<[block]{Under the hood}
790 %%* restructurings, reconciliations
791 %%* ctdb moved into samba tree
792 %%* published libs: talloc, tdb, tevent ...
796 * unprivileged selftest, autobuild
797 * selfcontained testing: wrapper
801 ** resolv wrapper [_red_new_]
802 * externalized as separate projects:
803 ** ==> @http://cwrap.org/@
805 ** ==> Andreas Schneider's talk
809 ==== Forecast: Cloudy ====
811 <[block]{Possible involvement with OpenStack}
812 * SMB storage service for Windows (and other) VMs
813 * SMB3 storage backend for Hyper-V images
814 * also: chances for AD-integration into auth
819 <[block]{especially but not exclusively}
829 ==== Conclusion ====[plain]
833 <<<samba-chilli-flavour-crop-bright-1280.jpg,height=.8\textheight>>>
839 * Samba 4.X is quite different from 3.Y
842 <[block]{What's coming?}
843 * Performance: the story continues
844 * Interop: strengthen strengths
845 * SMB(3) features: a lot to come ( ==> cluster, hyper-v, ...)
846 * Some clouds in the sky...
850 ==== Thanks for your attention! ====[plain]
869 <<<samba-chilli-flavour-crop-bright-1280.jpg,height=.8\textheight>>>