11 <<<samba-kisses-better-selection.jpg,height=.8\textheight>>>
15 ==== Short History ====
18 * 2.0: 1999/01: domain-member, +SWAT
19 * 2.2: 2001/04: NT4-DC
20 * 3.0: 2003/09: AD-member, Samba4 project started
21 * 3.2: 2008/07: GPLv3, experimental clustering
22 * 3.3: 2009/01: clustering
23 * 3.4: 2009/07: merged S3+S4 code
24 * 3.5: 2010/03: experimental SMB 2.0
25 * 3.6: 2011/09: SMB 2.0
26 * 4.0: 2012/12: AD/DC, SMB 2.0 durable handles, 2.1, 3.0
27 * 4.1: 2013/10: stability
28 * 4.2: 2015/03: AD trusts, leases, performance, scalability, CTDB
31 %%% ==== Release Stream ====
35 %%% <<<samba-release-stream_exp.png,width=.8\textwidth>>>
38 %%% ==== Release Planning ====
42 %%% @https://wiki.samba.org/index.php/Samba\_Release\_Planning@
48 <<<samba-team-20141011.png,height=.9\textheight>>>
54 <<<samba-team-20141011-colorized.png,height=.9\textheight>>>
64 %%% [[[.3\textwidth]]]
65 %%% <<<samba-chilli-flavour-crop-bright-1280.jpg,height=.8\textheight>>>
66 %%% [[[.3\textwidth]]]
67 %%% <<<samba-chilli-flavour-crop-bright-1280.jpg,height=.8\textheight>>>
68 %%% [[[.3\textwidth]]]
69 %%% <<<samba-chilli-flavour-crop-bright-1280.jpg,height=.8\textheight>>>
75 ==== Samba File Server Topics / Challenges ====
77 # performance: scalable file server
78 #* scale-up: exhaust powerful boxes
79 #* scale-out: flexible all-active clusters
80 #* scale-down: perform well on low-end boxes
81 # interop: multi-protocol access (nfs, afp, ...)
82 # server workloads / SMB features
83 #* tune for: small \# of connections, threaded applications
85 #* SMB3 (clustering, RDMA, ...)
86 # special file systems support (gluster, ceph, gpfs, btrfs, ...)
87 # cloud / openstack?...
88 %* (samba $\leftrightarrow$ cifs.ko alternative to nfs?...)
91 %% ==== Samba File Serving Topics ====
94 %% * Clustering (CTDB)
95 %% * SMB features (SMB3...)
96 %% * Interop (protocols, NFS, AFP, ...)
97 %% * special file systems support (gluster, ceph, gpfs, btrfs...)
100 %%==== Other Samba Topics ====
102 %%* Auth/Domain Member
107 %%% ==== Performance ====[plain]
111 %%% <<<samba-chilli-flavour-crop-bright-1280.jpg,height=.8\textheight>>>
114 %%% ==== Performance - low end systems ====
117 %%% <[block]{Reduction of CPU usage for low profile platforms like arm (SMB2)}
119 %%% ** didn't saturate 1G nic (arm), CPU 100\%
120 %%% * reduced memory allocations
121 %%% * instrument SMB 2.1 multi-credit / large MTU
123 %%% ** saturates 1G nic (arm), CPU $<$ 100\%
127 %%% ==== Performance - DB performance ====
130 %%% * trivial database
131 %%% * used for IPC (smbd processes)
132 %%% * cluster (CTDB): local copies
135 %%% <[block]{hot databases}
136 %%% * @locking.tdb@ (open files)
137 %%% * @brlock.tdb@ (byte range locks)
138 %%% * @notify\_index.tdb@ (for change notify)
141 %%% ==== Performance - DB performance ====
143 %%% <[block]{problem 1}
144 %%% * fcntl byte range locks for record locks
145 %%% * contention via single kernel spinlock
148 %%% <[block]{solution}
149 %%% * alternative to fcntl: pthread robust mutexes
150 %%% * ==> massive speedup
151 %%% * ==> included in TDB 1.3.1, Samba 4.2
154 %%% ==== Performance - DB performance ====
156 %%% <[block]{problem 2}
158 %%% ** single chain, contended (@locking.tdb@)
159 %%% ** gets fragmented (singly linked)
160 %%% * especially a problem in ctdb-cluster: vacuuming
163 %%% <[block]{improvements}
164 %%% * make use of small per-record freelists (dead records)
165 %%% * add automatic defragmentation upon traversal
166 %%% * ==> included in TDB 1.3.1, Samba 4.2
169 %%% ==== Performance - DB performance ====
170 %%% <[block]{problem 3}
171 %%% * change notify not scalable
174 %%% <[block]{first improvement}
175 %%% * restructured @notify.tdb@ to
176 %%% ** global @notify\_index.tdb@ and
177 %%% ** local @notify.tdb@
178 %%% ** ==> better but still not good enough for some workloads
181 %%% <[block]{next steps}
182 %%% * replace DB-approach by new scalable, async notify daemon using messaging
183 %%% * some false positives do not harm
188 %%% ==== Performance - scaling ====
190 %%% <[block]{parallelism}
191 %%% * samba is multi-process:
192 %%% ** smbd child process $\leftrightarrow$ TCP connection
193 %%% ** event-loop in one process
194 %%% * within a smbd process:
195 %%% ** pthread-pool jobs for potentially blocking syscalls
196 %%% ** ==> parallelism for reads/writes
197 %%% ** default for async I/O since Samba 4.0
200 %%% ==== Performance - scaling ====
202 %%% <[block]{messaging}
203 %%% * classical messaging:
204 %%% ** messages.tdb and signals between processes
205 %%% ** does not scale well
206 %%% * new messaging in Samba 4.2:
207 %%% ** fast and scalable messaging based on unix datagram messages
208 %%% ** ==> WIP: integrate with AD/DC messaging
209 %%% ** ==> features fd-passing for sockets (SMB3 multi-channel)
210 %%% ** ==> TODO: integrate into CTDB inter-node-messaging
214 %%% ==== Interop ====[plain]
219 %%% <<<samba-chilli-flavour-crop-bright-1280.jpg,height=.8\textheight>>>
223 %%% ==== Interop-Central ====
225 %%% <[block]{multi-protocol access}
226 %%% * nfs (kernel, ganesha, ...)
229 %%% * SMB2+ unix-extensions
233 %%% ==== File Server Layout/Scope ====
236 %%% <<<samba-layers.jpg,height=.8\textheight>>>
240 %%% ==== Interop - Fruit ====
244 %%% [[[.9\textwidth]]]
245 %%% * MacOS 10.9: SMB 2.1 preferred file protocol
246 %%% * @vfs\_fruit@ - new module in Samba 4.2
247 %%% [[[.05\textwidth]]]
251 %%% [[[.55\textwidth]]]
254 %%% ** indexed search
255 %%% ** dcerpc service
256 %%% ** ==> under review
258 %%% ** SMB2 create context
259 %%% ** speed up directory listings
260 %%% ** ==> under review
262 %%% [[[.4\textwidth]]]
263 %%% <<<apfel_1280.jpg,width=.9\textwidth>>>
273 ==== SMB features ====[plain]
282 <<<samba-chilli-flavour-crop-bright-1280.jpg,height=.8\textheight>>>
287 ==== SMB features in Samba - SMB2 ====
294 * SMB 2.0 (Vista / 2008):
295 ** durable file handles [4.0]
296 * SMB 2.1 (Win7 / 2008R2):
297 ** multi-credit / large mtu [4.0]
298 ** dynamic reauthentication [4.0]
300 ** resilient file handles [WIP-tracer]
303 <<<durable-crop-colormod-1024,width=.9\textwidth>>>
308 ==== SMB features in Samba - SMB3 ====
315 * SMB 3.0 (Win8 / 2012):
316 ** new crypto (sign/encrypt) [4.0]
317 ** secure negotiation [4.0]
318 ** durable handles v2 [4.0]
319 ** persistent file handles [WIP-tracer]
320 ** multi-channel [WIP+]
321 ** SMB direct [designed/starting]
322 ** cluster features [designing]
324 ** storage features [WIP]
325 * SMB 3.02 (Win8.1 / 2012R2): [WIP]
326 * SMB 3.1 (Win10 / 2014): [ess.DONE]
329 <<<durable-crop-colormod-1024,width=.9\textwidth>>>
339 %%==== Clustered Samba / CTDB (SOFS since 2007) ====
342 %%<<<design-ctdb-three-nodes.png,width=.9\textwidth>>>
352 %%% * new crypto (signing, transport encryption)
353 %%% * persistent file handles
355 %%% * RDMA transport (SMB direct)
356 %%% * storage features
359 %%% ** transparent failover (continuous availability)
360 %%% ** all-active (scale-out)
363 %%% ==== SMB3 - Goals ====
368 %%% * fault tolerance / reliability
369 %%% * performance / throughput / scaling
370 %%% * focus on support for server workloads \\ %
371 %%% (as opposed to workstation workloads)
372 %%% * especially support for:
376 %%% ** replace block storage in data center
377 %%% ** block (SCSI) over SMB
380 %%% ==== Requirements for Hyper-V ====
385 %%% * minimum requirements:
387 %%% ** is that really all??? - maybe resilient file handles..
390 %%% * desired features:
391 %%% ** cluster ($\ge 2$ nodes)
392 %%% ** CA / persistent handles
393 %%% ** RDMA / SMB direct
397 %%% ==== SMB Protocol in Samba ====
405 %%% ** experimental incomplete support for SMB 2.0
407 %%% ** official support for SMB 2.0
408 %%% ** missing: durable handles
409 %%% ** default server max proto: SMB 1
411 %%% ** SMB 2.0: complete with durable handles
412 %%% ** SMB 2.1: basis, multi-credit, dynamic reauthentication
413 %%% ** SMB 3.0: basis, crypto, secure negotiation, durable v2
414 %%% ** default server max proto: SMB 3.0
416 %%% ** SMB 3.02: basic
419 %%% ==== ==== [plain]
422 %%% Technical Details...
430 %%% <<<samba-chilli-flavour-crop-bright-1280.jpg,height=.8\textheight>>>
435 ==== Multi-Channel - Windows/Protocol ====
437 * find interfaces with interface discovery: \\ %
438 @FSCTL\_QUERY\_NETWORK\_INTERFACE\_INFO@
439 * bind additional TCP (or RDMA) connection (channel) to established SMB3 session (session bind)
440 * windows: uses connections of the same (and best) quality
441 * windows: binds only to a single node
442 * replay / retry mechanisms, epoch numbers
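The "same (and best) quality" rule can be illustrated with a small sketch. The capability bits below are the RSS and RDMA flags carried in the interface list returned by @FSCTL\_QUERY\_NETWORK\_INTERFACE\_INFO@. The ranking order (RDMA first, then RSS, then link speed) and the names `Interface`, `quality` and `select_channels` are assumptions made for illustration, not a description of what the Windows client actually implements.

```python
from dataclasses import dataclass

# Capability bits as reported per interface in the interface-info response.
RSS_CAPABLE = 0x1
RDMA_CAPABLE = 0x2


@dataclass(frozen=True)
class Interface:
    """Hypothetical, simplified view of one entry of the interface list."""
    if_index: int
    capability: int
    link_speed: int  # bits per second
    address: str


def quality(iface):
    """Assumed ranking: RDMA beats RSS beats plain, ties broken by link speed."""
    return (
        bool(iface.capability & RDMA_CAPABLE),
        bool(iface.capability & RSS_CAPABLE),
        iface.link_speed,
    )


def select_channels(interfaces):
    """Keep only the interfaces that share the best quality class."""
    if not interfaces:
        return []
    best = max(quality(i) for i in interfaces)
    return [i for i in interfaces if quality(i) == best]


if __name__ == "__main__":
    nics = [
        Interface(1, 0, 1_000_000_000, "192.0.2.10"),
        Interface(2, RSS_CAPABLE, 10_000_000_000, "192.0.2.11"),
        Interface(3, RSS_CAPABLE, 10_000_000_000, "192.0.2.12"),
    ]
    for nic in select_channels(nics):
        print(nic.if_index, nic.address)
```

In this sketch the two 10G RSS-capable interfaces are selected and the 1G interface is left unused, which matches the "same quality" behaviour described above.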
444 ==== Multi-Channel - Samba ====
446 * samba/smbd: multi-process
447 ** process $\Leftrightarrow$ tcp connection
448 ** ==> transfer new connection to existing smbd
449 ** use fd-passing (sendmsg/recvmsg)
451 * preparation: messaging rewrite using unix dgm sockets with sendmsg [DONE,4.2]
452 * add fd-passing [DONE,4.2]
453 * transfer connection already in negprot (ClientGUID) [ess.DONE]
454 * implement channel epoch numbers [WIP]
455 * implement interface discovery [WIP]
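The fd-passing step can be sketched as follows. This is a minimal illustration of handing an open connection from one endpoint to another as @SCM\_RIGHTS@ ancillary data over a Unix datagram socket, using sendmsg/recvmsg. It is not Samba's code: the helper names `send_fd` and `recv_fd`, the payload string standing in for a ClientGUID, and the use of socket pairs within a single process (instead of two smbd processes and a real TCP connection) are all assumptions made to keep the example self-contained.

```python
import array
import socket


def send_fd(sock, fd, payload):
    """Send one file descriptor plus a payload as SCM_RIGHTS ancillary data."""
    fds = array.array("i", [fd])
    sock.sendmsg([payload], [(socket.SOL_SOCKET, socket.SCM_RIGHTS, fds)])


def recv_fd(sock, bufsize=64):
    """Receive a payload and exactly one passed file descriptor."""
    fds = array.array("i")
    payload, ancdata, _flags, _addr = sock.recvmsg(
        bufsize, socket.CMSG_LEN(fds.itemsize)
    )
    for level, ctype, data in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            # Ignore a truncated trailing integer, if any.
            fds.frombytes(data[: len(data) - (len(data) % fds.itemsize)])
    if not fds:
        raise RuntimeError("no file descriptor received")
    return payload, fds[0]


def demo():
    # Unix datagram pair: stands in for the messaging channel between smbds.
    sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    # Stream pair: stands in for the client's newly accepted connection.
    conn_server_side, conn_client_side = socket.socketpair()
    try:
        # The "new" smbd passes the connection to the "existing" smbd,
        # tagged with a placeholder for the ClientGUID.
        send_fd(sender, conn_server_side.fileno(), b"client-guid-1234")
        payload, new_fd = recv_fd(receiver)

        # The receiver now owns an independent descriptor for the connection.
        adopted = socket.socket(fileno=new_fd)
        try:
            conn_client_side.sendall(b"hello")
            data = adopted.recv(5)
        finally:
            adopted.close()
        return payload, data
    finally:
        for s in (sender, receiver, conn_server_side, conn_client_side):
            s.close()


if __name__ == "__main__":
    print(demo())
```

The received descriptor is a duplicate that refers to the same open connection, so the sending side can close its copy once the transfer has been acknowledged.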
457 ==== Multi-Channel - Samba ====
460 <<<smb3-mc-samba_exp.png,height=.9\textheight>>>
470 ==== SMB Direct (RDMA) ====
473 ** requires multi-channel
474 ** start with TCP, bind an RDMA channel
475 ** reads and writes use RDMA write/read
476 ** protocol/metadata via send/receive
478 * wireshark dissector: [DONE]
481 ** prereq: multi-channel / fd-passing
482 ** buffer / transport abstractions [TODO]
483 ** _red_problem_: libraries: not fork safe and no fd-passing \\ %
484 ==> central daemon (or kernel module) to serve as RDMA "proxy"
486 ==== SMB Direct (RDMA) - Plan ====
489 <<<smb3-rdma-samba-v2_exp.png,height=.9\textheight>>>
492 %%%==== SMB Direct (RDMA) - Plan ====
495 %%%* smbd-d (rdma proxy daemon)
496 %%%** listens on unix domain socket (@/var/lib/smbd-d/socket@)
497 %%%** listens for RDMA connection (as told by main smbd)
499 %%%** listens for TCP connections
500 %%%** connects to smbd-d-socket
501 %%%*** request rdma-interfaces, tell smbd-d on which to listen
502 %%%** "accepts" new smb-direct connections on smbd-d-socket
505 %%%==== SMB Direct (RDMA) - Plan ====
509 %%%** connects via TCP --> smbd forks child smbd (c)
510 %%%** connects via RDMA to smbd-d
512 %%%** creates socket-pair as rdma-proxy-channel
513 %%%** passes one end of socket-pair to main smbd for accept
514 %%%** sends smb direct packets over proxy-channel
516 %%%** upon receiving NegProt: pass proxy-socket to c based on ClientGUID
518 %%%** continues proxy-communication with smbd-d
521 %%%* For @rdma\_read@ and @rdma\_write@:
522 %%%** c and smbd-d establish shared memory area
526 %%% ==== Persistent Handles ====
531 %%% * like durable file handles with strong guarantees
532 %%% * framework is already there in samba (by support for durable v2)
533 %%% ** ==> easy to satisfy at the protocol level
536 %%% * the difficulty lies in implementing the guarantees
537 %%% ** need to make metadata persistent
538 %%% ** but don't kill performance!
539 %%% ** persistent tdbs !would! kill performance
541 %%% *** need to be sync
542 %%% *** record-level transactions (instead of db-level)
543 %%% *** only replicate to some nodes, not all
547 %%==== Clustering Concepts (Windows) ====
553 %%** (``traditional'') failover cluster (active-passive)
554 %%** protocol: @SMB2\_SHARE\_CAP\_CLUSTER@
556 %%*** runs off a cluster (failover) volume
557 %%*** offers the Witness service
560 %%* Scale-Out (SOFS):
561 %%** scale-out cluster (all-active!)
562 %%** protocol: @SMB2\_SHARE\_CAP\_SCALEOUT@
563 %%** no client caching
564 %%** Windows: runs off a cluster shared volume (implies cluster)
567 %%* Continuous Availability (CA):
568 %%** transparent failover, persistent handles
569 %%** protocol: @SMB2\_SHARE\_CAP\_CONTINUOUS\_AVAILABILITY@
570 %%** can be turned on independently on any cluster share (failover or scale-out)
571 %%** ==> changed client retry behaviour!
574 %%% ==== Clustering -- Controlling Flags from Windows ====
579 %%% * a share on a cluster carries
580 %%% ** @SMB2\_SHARE\_CAP\_CLUSTER@ $\Leftrightarrow$ the shared FS is a cluster volume.
583 %%% * a share on a cluster carries
584 %%% ** @SMB2\_SHARE\_CAP\_SCALEOUT@ $\Leftrightarrow$ the shared FS is a CSV
585 %%% *** implies @SMB2\_SHARE\_CAP\_CLUSTER@
588 %%% * independently settable on a clustered share:
589 %%% ** @SMB2\_SHARE\_CAP\_CONTINUOUS\_AVAILABILITY@
590 %%% *** implies @SMB2\_SHARE\_CAP\_CLUSTER@
594 %%==== Clustering -- Server Behaviour ====
599 %%* @SMB2\_SHARE\_CAP\_CLUSTER@:
600 %%** run witness service (RPC)
601 %%** client can register and get notified about resource changes
604 %%* @SMB2\_SHARE\_CAP\_SCALEOUT@:
605 %%** do not grant batch oplocks, write leases, handle leases
606 %%** ==> no durable handles unless also CA
609 %%* @SMB2\_SHARE\_CAP\_CONTINUOUS\_AVAILABILITY@:
610 %%** offer persistent handles
611 %%** timeout from durable v2 request
615 %%==== Clustering -- Client Behaviour (Win8) ====
621 %%* @SMB2\_SHARE\_CAP\_CLUSTER@:
622 %%** clients happily work if witness is not available
625 %%* @SMB2\_SHARE\_CAP\_SCALEOUT@:
626 %%** clients happily connect if @CLUSTER@ is not set.
627 %%** clients DO request oplocks/leases/durable handles
628 %%** clients are not confused if they get these
631 %%* @SMB2\_SHARE\_CAP\_CONTINUOUS\_AVAILABILITY@:
632 %%** clients happily connect if @CLUSTER@ is not set.
633 %%** clients typically request persistent handle with RWH lease
638 %%%Win8 sends @SMB2\_FLAGS\_REPLAY\_OPERATION@ in writes and reads (from 2nd in a row) \\ %
639 %%%$\Leftrightarrow$ \\ %
640 %%%The server announces @SMB2\_CAP\_PERSISTENT\_HANDLES@.
643 %%% ==== Clustering -- Client Behaviour (Win8) : Retries ====
646 %%% * Test: Win8 against slightly pimped Samba (2 IPs)
649 %%% * Server-Matrix (on/off):
650 %%% ** persistent handle cap
651 %%% ** durable handles
652 %%% ** cluster share cap
658 %%% ** connect to share with explorer
659 %%% ** start copying file (2G)
661 %%% ** wait for the client to pop up an error dialog
666 %%% ==== Clustering -- Client Behaviour (Win8) : Retries ====
669 %%% * only two different retry characteristics: CA $\leftrightarrow$ non-CA
673 %%% ** 3 consecutive attempt rounds:
674 %%% *** for each of the two IPs: \\ %
676 %%% three tcp syn attempts to IP with 0.5 sec breaks
677 %%% ** ==> some 2.1 seconds for 1 round
678 %%% ** between attempts:
679 %%% ** dns, ping, arp ... 5.8 seconds
680 %%% ** ==> _red_18 seconds_
684 %%% ** retries attempt rounds from above for _red_14 minutes_
694 %%% <<<samba-chilli-flavour-crop-bright-1280.jpg,height=.8\textheight>>>
699 %%==== Clustering with Samba/CTDB ====
702 %%* all-active SMB-cluster with Samba and CTDB... \\ %
703 %%+<3->{...since 2007! \smiley }
706 %%* transparent for the client
708 %%*** metadata and messaging engine for Samba in a cluster
709 %%*** plus cluster resource manager (IPs, services...)
710 %%** client only sees one ``big'' SMB server
711 %%** we could not change the client!...
712 %%** works ``well enough''
716 %%** how to integrate SMB3 clustering with Samba/CTDB
717 %%** good: rather orthogonal
718 %%** ctdb-clustering transparent mostly due to management
721 %%==== Witness Service ====
725 %%** monitoring of availability of resources (shares, NICs)
726 %%** server asks client to move to another resource
730 %%** available on a Windows SMB3 share $\Leftrightarrow$ @SMB2\_SHARE\_CAP\_CLUSTER@
731 %%** but clients happily connect w/o witness
734 %%* status in Samba [WIP (Metze, Gregor Beck)]:
735 %%** async RPC: WIP, good progress ($\Rightarrow$ Metze's talk)
736 %%** wireshark dissector: essentially done
737 %%** client: in @rpcclient@ - done
738 %%** server: dummy PoC / tracer bullet implementation done
739 %%** CTDB: changes / integration needed
747 %%% !@https://wiki.samba.org/index.php/SMB3@!
757 %%% [[[.6\textwidth]]]
759 %%% [[[.3\textwidth]]]
760 %%% <<<samba-chilli-flavour-crop-bright-1280.jpg,height=.8\textheight>>>
765 ==== SMB features in Samba ====
769 @https://wiki.samba.org/index.php/Samba3/SMB3@
773 %%% ==== Misc ====[plain]
778 %%% <<<samba-chilli-flavour-crop-bright-1280.jpg,height=.8\textheight>>>
783 <[block]{File Systems}
784 * gpfs, gluster, ceph, btrfs...
785 * support through vfs modules
786 * fuse-based: avoid context switches
787 * instrument SMB3 storage features (fsctls)
792 %%<[block]{Under the hood}
793 %%* restructurings, reconciliations
794 %%* ctdb moved into samba tree
795 %%* published libs: talloc, tdb, tevent ...
799 * unprivileged selftest, autobuild
800 * self-contained testing: wrapper
804 ** resolv wrapper [_red_new_]
805 * externalized as separate projects:
806 ** ==> @http://cwrap.org/@
808 ** ==> Andreas Schneider's talk
812 %%% ==== Forecast: Cloudy ====
814 %%% <[block]{Possible involvement with OpenStack}
815 %%% * SMB storage service for Windows (and other) VMs
816 %%% * SMB3 storage backend for Hyper-V images
817 %%% * also: chances for AD-integration into auth
822 <[block]{especially but not exclusively}
832 %%% ==== Conclusion ====[plain]
836 %%% <<<samba-chilli-flavour-crop-bright-1280.jpg,height=.8\textheight>>>
839 %%% ==== Conclusion ====
841 %%% <[block]{Remember}
842 %%% * Samba 4.X is quite different from 3.Y
845 %%% <[block]{What's coming?}
846 %%% * Performance: the story continues
847 %%% * Interop: strengthen strengths
848 %%% * SMB(3) features: a lot to come ( ==> cluster, hyper-v, ...)
849 %%% * Some clouds in the sky...
853 ==== Thanks for your attention! ====[plain]
872 <<<samba-chilli-flavour-crop-bright-1280.jpg,height=.8\textheight>>>