14 <<<samba-chilli-flavour-crop-bright-1280.jpg,height=.8\textheight>>>
16 <<<samba-chilli-flavour-crop-bright-1280.jpg,height=.8\textheight>>>
18 <<<samba-chilli-flavour-crop-bright-1280.jpg,height=.8\textheight>>>
26 <<<samba-kisses-better-selection.jpg,height=.8\textheight>>>
30 ==== Short History ====
33 * 2.0: 1999/01: domain-member, +SWAT
34 * 2.2: 2001/04: NT4-DC
35 * 3.0: 2003/09: AD-member, Samba4 project started
36 * 3.2: 2008/07: GPLv3, experimental clustering
37 * 3.3: 2009/01: clustering
38 * 3.4: 2009/07: merged S3+S4 code
39 * 3.5: 2010/03: experimental SMB 2.0
40 * 3.6: 2011/09: SMB 2.0
41 * 4.0: 2012/12: AD/DC, SMB 2.0 durable handles, 2.1, 3.0
42 * 4.1: 2013/10: stability
43 * 4.2: soon: AD trusts, performance, scalability, CTDB included
45 ==== Release Stream ====
49 <<<samba-release-stream_exp.png,width=.8\textwidth>>>
55 <<<samba-team-20141011.png,height=.9\textheight>>>
61 <<<samba-team-20141011-colorized.png,height=.9\textheight>>>
65 ==== Samba File Server Topics / Challenges ====
67 # performance: scalable file server
68 #* scale-up: exhaust powerful boxes
69 #* scale-out: flexible all-active clusters
70 #* scale-down: perform well on low-end boxes
71 # interop: multi-protocol access (nfs, afp, ...)
72 # server workloads / SMB features
73 #* tune for: small \# of connections, threaded applications
75 #* SMB3 (clustering, RDMA, ...)
76 # special file systems support (gluster, ceph, gpfs, btrfs, ...)
77 # cloud / openstack?...
78 %* (samba $\leftrightarrow$ cifs.ko alternative to nfs?...)
81 %% ==== Samba File Serving Topics ====
84 %% * Clustering (CTDB)
85 %% * SMB features (SMB3...)
86 %% * Interop (protocols, NFS, AFP, ...)
87 %% * special file systems support (gluster, ceph, gpfs, btrfs...)
90 %%==== Other Samba Topics ====
92 %%* Auth/Domain Member
97 ==== Performance ====[plain]
101 <<<samba-chilli-flavour-crop-bright-1280.jpg,height=.8\textheight>>>
104 ==== Performance - low end systems ====
107 <[block]{Reduction of CPU usage for low profile platforms like arm (SMB2)}
109 ** didn't saturate 1G nic (arm), CPU 100\%
110 * reduced memory allocations
111 * instrument SMB 2.1 multi-credit / large MTU
113 ** saturates 1G nic (arm), CPU $<$ 100\%
117 ==== Performance - DB performance ====
121 * used for IPC (smbd processes)
122 * cluster (CTDB): local copies
125 <[block]{hot databases}
126 * @locking.tdb@ (open files)
127 * @brlock.tdb@ (byte range locks)
128 * @notify\_index.tdb@ (for change notify)
131 ==== Performance - DB performance ====
134 * fcntl byte range locks for record locks
135 * contention via single kernel spinlock
139 * alternative to fcntl: pthread robust mutexes
140 * ==> massive speedup
141 * ==> included in TDB 1.3.1, Samba 4.2
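A minimal sketch of the fcntl approach named above, written in Python rather than TDB's C: each record maps to a byte offset in the database file, and a process locks that byte range before touching the record. The one-byte-per-record layout and the helper names `lock_record`, `try_lock_record`, `unlock_record`, and `demo` are assumptions for illustration, not TDB's actual file format or API. The robust-mutex alternative is not shown, because Python's standard library does not expose pthread robust mutexes.

```python
import fcntl
import os
import tempfile

RECORD_SIZE = 1  # assumption: one lock byte per record


def lock_record(fd, index):
    """Block until this process holds an exclusive lock on the record's byte range."""
    fcntl.lockf(fd, fcntl.LOCK_EX, RECORD_SIZE, index * RECORD_SIZE, os.SEEK_SET)


def try_lock_record(fd, index):
    """Non-blocking variant; returns False if another process holds the range."""
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB,
                    RECORD_SIZE, index * RECORD_SIZE, os.SEEK_SET)
        return True
    except OSError:
        return False


def unlock_record(fd, index):
    """Release the lock on the record's byte range."""
    fcntl.lockf(fd, fcntl.LOCK_UN, RECORD_SIZE, index * RECORD_SIZE, os.SEEK_SET)


def demo():
    """Parent locks record 3; a forked child probes records 3 and 4."""
    with tempfile.TemporaryFile() as f:
        fd = f.fileno()
        lock_record(fd, 3)
        pid = os.fork()
        if pid == 0:
            # Child is a separate process, so the parent's lock conflicts.
            same = try_lock_record(fd, 3)
            other = try_lock_record(fd, 4)
            os._exit((1 if same else 0) | (2 if other else 0))
        _, status = os.waitpid(pid, 0)
        unlock_record(fd, 3)
        code = os.WEXITSTATUS(status)
        return bool(code & 1), bool(code & 2)
```

Calling `demo()` returns a pair: whether the child obtained the record the parent holds, and whether it obtained a different record. Every such lock and unlock is a system call into the kernel's lock table for the file, which is where the contention described above arises.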
144 ==== Performance - DB performance ====
148 ** single chain, contended (@locking.tdb@)
149 ** gets fragmented (singly linked)
150 * especially a problem in ctdb-cluster: vacuuming
153 <[block]{improvements}
154 * make use of small per-record freelists (dead records)
155 * add automatic defragmentation upon traversal
156 * ==> included in TDB 1.3.1, Samba 4.2
159 ==== Performance - DB performance ====
161 * change notify not scalable
164 <[block]{first improvement}
165 * restructured @notify.tdb@ to
166 ** global @notify\_index.tdb@ and
167 ** local @notify.tdb@
168 ** ==> better but still not good enough for some workloads
172 * replace DB-approach by new scalable, async notify daemon using messaging
173 * some false positives do not harm
178 ==== Performance - scaling ====
180 <[block]{parallelism}
181 * samba is multi-process:
182 ** smbd child process $\leftrightarrow$ TCP connection
183 ** event-loop in one process
184 * within a smbd process:
185 ** pthread-pool jobs for potentially blocking syscalls
186 ** ==> parallelism for reads/writes
187 ** default for async I/O since Samba 4.0
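The thread-pool pattern above can be sketched as follows. This is an analogy in Python, not smbd's C implementation: potentially blocking reads are submitted as pool jobs so that several can be in flight at once. The function name `parallel_pread` and the `workers` parameter are assumptions for illustration. For brevity the sketch waits for all results instead of returning to an event loop and collecting completions later.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def parallel_pread(fd, requests, workers=4):
    """Run (offset, length) reads as thread-pool jobs; return data in request order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each job is one blocking pread() executed on a worker thread.
        futures = [pool.submit(os.pread, fd, length, offset)
                   for offset, length in requests]
        return [future.result() for future in futures]


def demo():
    """Write a small file and read two ranges from it through the pool."""
    with tempfile.TemporaryFile() as f:
        f.write(b"abcdefghij")
        f.flush()
        return parallel_pread(f.fileno(), [(0, 3), (5, 2)])
```

Because `pread` takes an explicit offset, the jobs do not share a file position and need no locking among themselves.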
190 ==== Performance - scaling ====
193 * classical messaging:
194 ** messages.tdb and signals between processes
195 ** does not scale well
196 * new messaging in Samba 4.2:
197 ** fast and scalable messaging based on unix datagram messages
198 ** ==> WIP: integrate with AD/DC messaging
199 ** ==> features fd-passing for sockets (SMB3 multi-channel)
200 ** ==> TODO: integrate into CTDB inter-node-messaging
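A minimal sketch of messaging over unix datagram sockets, assuming each process binds one socket at a well-known path in a shared directory and peers address each other by name. The directory layout and the helpers `open_endpoint` and `send_message` are hypothetical and do not reflect Samba's actual messaging API or socket paths.

```python
import os
import socket
import tempfile


def open_endpoint(directory, name):
    """Bind a datagram socket at directory/name; this is the process's mailbox."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    sock.bind(os.path.join(directory, name))
    return sock


def send_message(sock, directory, dest, payload):
    """Send one datagram to the endpoint called dest; no connection setup needed."""
    sock.sendto(payload, os.path.join(directory, dest))


def demo():
    """Two endpoints in one process exchange a single message."""
    with tempfile.TemporaryDirectory() as directory:
        a = open_endpoint(directory, "a")
        b = open_endpoint(directory, "b")
        try:
            send_message(a, directory, "b", b"hello")
            data, sender = b.recvfrom(1024)
            return data, os.path.basename(sender)
        finally:
            a.close()
            b.close()
```

Each datagram arrives as one whole message together with the sender's bound address, so the receiver needs neither a shared database lookup nor a signal to learn who sent what.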
204 ==== Interop ====[plain]
209 <<<samba-chilli-flavour-crop-bright-1280.jpg,height=.8\textheight>>>
213 ==== Interop-Central ====
215 <[block]{multi-protocol access}
216 * nfs (kernel, ganesha, ...)
219 * SMB2+ unix-extensions
223 ==== File Server Layout/Scope ====
226 <<<samba-layers.jpg,height=.8\textheight>>>
230 ==== Interop - Fruit ====
235 * MacOS 10.9: SMB 2.1 preferred file protocol
236 * @vfs\_fruit@ - new module in Samba 4.2
248 ** SMB2 create context
249 ** speed up directory listings
253 <<<apfel_1280.jpg,width=.9\textwidth>>>
263 ==== SMB features ====[plain]
272 <<<samba-chilli-flavour-crop-bright-1280.jpg,height=.8\textheight>>>
277 ==== SMB features in Samba ====
285 ** durable file handles [4.0]
287 ** multi-credit / large mtu [4.0]
288 ** dynamic reauthentication [4.0]
290 ** resilient file handles [tracer]
292 ** new crypto (sign/encrypt) [4.0]
293 ** secure negotiation [4.0]
294 ** durable handles v2 [4.0]
295 ** persistent file handles [tracer]
296 ** multi-channel [WIP+]
297 ** SMB direct [designed/starting]
298 ** cluster features [designing]
300 ** storage features [WIP]
304 %<<<durable-crop-colormod-1024,width=.9\textwidth,right>>>
314 %%==== Clustered Samba / CTDB (SOFS since 2007) ====
317 %%<<<design-ctdb-three-nodes.png,width=.9\textwidth>>>
327 %%% * new crypto (signing, transport encryption)
328 %%% * persistent file handles
330 %%% * RDMA transport (SMB direct)
331 %%% * storage features
334 %%% ** transparent failover (continuous availability)
335 %%% ** all-active (scale-out)
338 %%% ==== SMB3 - Goals ====
343 %%% * fault tolerance / reliability
344 %%% * performance / throughput / scaling
345 %%% * focus on support for server workloads \\ %
346 %%% (as opposed to workstation workloads)
347 %%% * especially support for:
351 %%% ** replace block storage in data center
352 %%% ** block (SCSI) over SMB
355 %%% ==== Requirements for Hyper-V ====
360 %%% * minimum requirements:
362 %%% ** is that really all??? - maybe resilient file handles..
365 %%% * desired features:
366 %%% ** cluster ($\ge 2$ nodes)
367 %%% ** CA / persistent handles
368 %%% ** RDMA / SMB direct
372 %%% ==== SMB Protocol in Samba ====
380 %%% ** experimental incomplete support for SMB 2.0
382 %%% ** official support for SMB 2.0
383 %%% ** missing: durable handles
384 %%% ** default server max proto: SMB 1
386 %%% ** SMB 2.0: complete with durable handles
387 %%% ** SMB 2.1: basis, multi-credit, dynamic reauthentication
388 %%% ** SMB 3.0: basis, crypto, secure negotiation, durable v2
389 %%% ** default server max proto: SMB 3.0
391 %%% ** SMB 3.02: basic
394 %%% ==== ==== [plain]
397 %%% Technical Details...
405 %%% <<<samba-chilli-flavour-crop-bright-1280.jpg,height=.8\textheight>>>
410 ==== Multi-Channel - Windows/Protocol ====
413 * find interfaces with interface discovery: \\ %
414 @FSCTL\_QUERY\_NETWORK\_INTERFACE\_INFO@
415 * bind additional TCP (or RDMA) connection (channel) to established SMB3 session (session bind)
416 * windows: uses only connections of the same (and best) quality
417 * windows: binds only to a single node
418 * replay / retry mechanisms, epoch numbers
421 ==== Multi-Channel - Samba ====
424 * samba/smbd: multi-process
425 ** process $\Leftrightarrow$ tcp connection
426 ** ==> transfer new connection to existing smbd
427 ** use fd-passing (sendmsg/recvmsg)
430 * preparation: messaging rewrite using unix dgm sockets with sendmsg [DONE,4.2]
431 * add fd-passing [DONE,4.2]
432 * transfer connection already in negprot (ClientGUID) [ess.DONE]
433 * implement channel epoch numbers [WIP]
434 * implement interface discovery [WIP]
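The fd-passing step can be sketched as below, in Python rather than smbd's C. A descriptor travels as `SCM_RIGHTS` ancillary data in a `sendmsg` call on a unix socket, and the receiver gets a new descriptor that refers to the same connection. The helper names `send_fd` and `recv_fd`, and the use of the payload to carry an identifier such as the ClientGUID, are assumptions for illustration.

```python
import array
import socket


def send_fd(sock, fd, payload=b"x"):
    """Send payload plus one file descriptor as SCM_RIGHTS ancillary data."""
    ancillary = [(socket.SOL_SOCKET, socket.SCM_RIGHTS, array.array("i", [fd]))]
    sock.sendmsg([payload], ancillary)


def recv_fd(sock, bufsize=1024):
    """Receive payload and one passed descriptor; returns (payload, fd)."""
    fds = array.array("i")
    data, ancdata, _flags, _addr = sock.recvmsg(
        bufsize, socket.CMSG_LEN(fds.itemsize))
    for level, ctype, cdata in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            # Ignore any trailing bytes that do not form a whole int.
            usable = len(cdata) - (len(cdata) % fds.itemsize)
            fds.frombytes(cdata[:usable])
    return data, fds[0]


def demo():
    """Pass one end of a 'client connection' over a control socket pair."""
    ctl_a, ctl_b = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    client, conn = socket.socketpair()
    send_fd(ctl_a, conn.fileno(), b"client-guid")
    tag, new_fd = recv_fd(ctl_b)
    moved = socket.socket(fileno=new_fd)  # wrap the received descriptor
    client.sendall(b"ping")
    received = moved.recv(4)
    for s in (ctl_a, ctl_b, client, conn, moved):
        s.close()
    return tag, received
```

In the demo both ends live in one process for simplicity. In the multi-channel case the receiver would be the smbd process that already owns the session.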
437 ==== Multi-Channel - Samba ====
440 <<<smb3-mc-samba_exp.png,height=.9\textheight>>>
450 ==== SMB Direct (RDMA) ====
454 ** requires multi-channel
455 ** start with TCP, bind an RDMA channel
456 ** reads and writes use RDMA write/read
457 ** protocol/metadata via send/receive
460 * wireshark dissector: [DONE]
464 ** prereq: multi-channel / fd-passing
465 ** buffer / transport abstractions [TODO]
466 ** _red_problem_: libraries: not fork safe and no fd-passing \\ %
467 ==> central daemon (or kernel module) to serve as RDMA "proxy"
470 ==== SMB Direct (RDMA) - Plan ====
473 <<<smb3-rdma-samba_exp.png,height=.9\textheight>>>
476 ==== SMB Direct (RDMA) - Plan ====
479 * smbd-d (rdma proxy daemon)
480 ** listens on unix domain socket (@/var/lib/smbd-d/socket@)
481 ** listens for RDMA connection (as told by main smbd)
483 ** listens for TCP connections
484 ** connects to smbd-d-socket
485 *** request rdma-interfaces, tell smbd-d on which to listen
486 ** "accepts" new smb-direct connections on smbd-d-socket
489 ==== SMB Direct (RDMA) - Plan ====
493 ** connects via TCP --> smbd forks child smbd (c)
494 ** connects via RDMA to smbd-d
496 ** creates socket-pair as rdma-proxy-channel
497 ** passes one end of socket-pair to main smbd for accept
498 ** sends smb direct packages over proxy-channel
500 ** upon receiving NegProt: pass proxy-socket to c based on ClientGUID
502 ** continues proxy-communication with smbd-d
505 * For @rdma\_read@ and @rdma\_write@:
506 ** c and smbd-d establish shared memory area
510 %%% ==== Persistent Handles ====
515 %%% * like durable file handles with strong guarantees
516 %%% * framework is already there in samba (by support for durable v2)
517 %%% ** ==> easy to satisfy at the protocol level
520 %%% * the difficulty lies in implementing the guarantees
521 %%% ** need to make metadata persistent
522 %%% ** but don't kill performance!
523 %%% ** persistent tdbs !would! kill performance
525 %%% *** need to be sync
526 %%% *** record-level transactions (instead of db-level)
527 %%% *** only replicate to some nodes, not all
531 %%==== Clustering Concepts (Windows) ====
537 %%** (``traditional'') failover cluster (active-passive)
538 %%** protocol: @SMB2\_SHARE\_CAP\_CLUSTER@
540 %%*** runs off a cluster (failover) volume
541 %%*** offers the Witness service
544 %%* Scale-Out (SOFS):
545 %%** scale-out cluster (all-active!)
546 %%** protocol: @SMB2\_SHARE\_CAP\_SCALEOUT@
547 %%** no client caching
548 %%** Windows: runs off a cluster shared volume (implies cluster)
551 %%* Continuous Availability (CA):
552 %%** transparent failover, persistent handles
553 %%** protocol: @SMB2\_SHARE\_CAP\_CONTINUOUS\_AVAILABILITY@
554 %%** can be turned on independently for any cluster share (failover or scale-out)
555 %%** ==> changed client retry behaviour!
558 %%% ==== Clustering -- Controlling Flags from Windows ====
563 %%% * a share on a cluster carries
564 %%% ** @SMB2\_SHARE\_CAP\_CLUSTER@ $\Leftrightarrow$ the shared FS is a cluster volume.
567 %%% * a share on a cluster carries
568 %%% ** @SMB2\_SHARE\_CAP\_SCALEOUT@ $\Leftrightarrow$ the shared FS is a CSV
569 %%% *** implies @SMB2\_SHARE\_CAP\_CLUSTER@
572 %%% * independently settable on a clustered share:
573 %%% ** @SMB2\_SHARE\_CAP\_CONTINUOUS\_AVAILABILITY@
574 %%% *** implies @SMB2\_SHARE\_CAP\_CLUSTER@
578 %%==== Clustering -- Server Behaviour ====
583 %%* @SMB2\_SHARE\_CAP\_CLUSTER@:
584 %%** run witness service (RPC)
585 %%** client can register and get notified about resource changes
588 %%* @SMB2\_SHARE\_CAP\_SCALEOUT@:
589 %%** do not grant batch oplocks, write leases, handle leases
590 %%** ==> no durable handles unless also CA
593 %%* @SMB2\_SHARE\_CAP\_CONTINUOUS\_AVAILABILITY@:
594 %%** offer persistent handles
595 %%** timeout from durable v2 request
599 %%==== Clustering -- Client Behaviour (Win8) ====
605 %%* @SMB2\_SHARE\_CAP\_CLUSTER@:
606 %%** clients happily work if witness is not available
609 %%* @SMB2\_SHARE\_CAP\_SCALEOUT@:
610 %%** clients happily connect if @CLUSTER@ is not set.
611 %%** clients DO request oplocks/leases/durable handles
612 %%** clients are not confused if they get these
615 %%* @SMB2\_SHARE\_CAP\_CONTINUOUS\_AVAILABILITY@:
616 %%** clients happily connect if @CLUSTER@ is not set.
617 %%** clients typically request persistent handle with RWH lease
622 %%%Win8 sends @SMB2\_FLAGS\_REPLAY\_OPERATION@ in writes and reads (from 2nd in a row) \\ %
623 %%%$\Leftrightarrow$ \\ %
624 %%%The server announces @SMB2\_CAP\_PERSISTENT\_HANDLES@.
627 %%% ==== Clustering -- Client Behaviour (Win8) : Retries ====
630 %%% * Test: Win8 against slightly pimped Samba (2 IPs)
633 %%% * Server-Matrix (on/off):
634 %%% ** persistent handle cap
635 %%% ** durable handles
636 %%% ** cluster share cap
642 %%% ** connect to share with explorer
643 %%% ** start copying file (2G)
645 %%% ** wait for the client to pop up an error dialog
650 %%% ==== Clustering -- Client Behaviour (Win8) : Retries ====
653 %%% * only two different retry characteristics: CA $\leftrightarrow$ non-CA
657 %%% ** 3 consecutive attempt rounds:
658 %%% *** for each of the two IPs: \\ %
660 %%% three tcp syn attempts to IP with 0.5 sec breaks
661 %%% ** ==> some 2.1 seconds for 1 round
662 %%% ** between attempts:
663 %%% ** dns, ping, arp ... 5.8 seconds
664 %%% ** ==> _red_18 seconds_
668 %%% ** retries attempt rounds from above for _red_14 minutes_
678 %%% <<<samba-chilli-flavour-crop-bright-1280.jpg,height=.8\textheight>>>
683 %%==== Clustering with Samba/CTDB ====
686 %%* all-active SMB-cluster with Samba and CTDB... \\ %
687 %%+<3->{...since 2007! \smiley }
690 %%* transparent for the client
692 %%*** metadata and messaging engine for Samba in a cluster
693 %%*** plus cluster resource manager (IPs, services...)
694 %%** client only sees one ``big'' SMB server
695 %%** we could not change the client!...
696 %%** works ``well enough''
700 %%** how to integrate SMB3 clustering with Samba/CTDB
701 %%** good: rather orthogonal
702 %%** ctdb-clustering transparent mostly due to management
705 %%==== Witness Service ====
709 %%** monitoring of availability of resources (shares, NICs)
710 %%** server asks client to move to another resource
714 %%** available on a Windows SMB3 share $\Leftrightarrow$ @SMB2\_SHARE\_CAP\_CLUSTER@
715 %%** but clients happily connect w/o witness
718 %%* status in Samba [WIP (Metze, Gregor Beck)]:
719 %%** async RPC: WIP, good progress ($\Rightarrow$ Metze's talk)
720 %%** wireshark dissector: essentially done
721 %%** client: in @rpcclient@ - done
722 %%** server: dummy PoC / tracer bullet implementation done
723 %%** CTDB: changes / integration needed
731 %%% !@https://wiki.samba.org/index.php/SMB3@!
741 %%% [[[.6\textwidth]]]
743 %%% [[[.3\textwidth]]]
744 %%% <<<samba-chilli-flavour-crop-bright-1280.jpg,height=.8\textheight>>>
750 ==== Misc ====[plain]
755 <<<samba-chilli-flavour-crop-bright-1280.jpg,height=.8\textheight>>>
760 <[block]{File Systems}
761 * gpfs, gluster, ceph, btrfs...
762 * support through vfs modules
763 * fuse-based: avoid context switches
764 * instrument SMB3 storage features (fsctls)
769 %%<[block]{Under the hood}
770 %%* restructurings, reconciliations
771 %%* ctdb moved into samba tree
772 %%* published libs: talloc, tdb, tevent ...
776 * unprivileged selftest, autobuild
777 * selfcontained testing: wrapper
781 ** resolv wrapper [_red_new_]
782 * externalized as separate projects:
783 ** ==> @http://cwrap.org/@
785 ** ==> Andreas Schneider's talk
789 ==== Forecast: Cloudy ====
791 <[block]{Possible involvement with OpenStack}
792 * SMB storage service for Windows (and other) VMs
793 * SMB3 storage backend for Hyper-V images
794 * also: chances for AD-integration into auth
799 <[block]{especially but not exclusively}
811 * Samba 4.X is quite different from 3.Y
814 <[block]{What's coming?}
815 * Performance: the story continues
816 * Interop: strengthen strengths
817 * SMB(3) features: a lot to come ( ==> cluster, hyper-v, ...)
818 * Some clouds in the sky...
822 ==== Thanks for your attention! ====[plain]
829 <<<samba-chilli-flavour-crop-bright-1280.jpg,height=.8\textheight>>>