11 %%% <<<samba-kisses-better-selection.jpg,height=.8\textheight>>>
15 %%% ==== Short History ====
18 %%% * 2.0: 1999/01: domain-member, +SWAT
19 %%% * 2.2: 2001/04: NT4-DC
20 %%% * 3.0: 2003/09: AD-member, Samba4 project started
21 %%% * 3.2: 2008/07: GPLv3, experimental clustering
22 %%% * 3.3: 2009/01: clustering
23 %%% * 3.4: 2009/07: merged S3+S4 code
24 %%% * 3.5: 2010/03: experimental SMB 2.0
25 %%% * 3.6: 2011/09: SMB 2.0
26 %%% * 4.0: 2012/12: AD/DC, SMB 2.0 durable handles, 2.1, 3.0
27 %%% * 4.1: 2013/10: stability
28 %%% * 4.2: 2015/03: AD trusts, leases, performance, scalability, CTDB
31 %%% ==== Release Stream ====
35 %%% <<<samba-release-stream_exp.png,width=.8\textwidth>>>
38 %%% ==== Release Planning ====
42 %%% @https://wiki.samba.org/index.php/Samba\_Release\_Planning@
45 %%% ==== Samba Team ====
48 %%% <<<samba-team-20141011.png,height=.9\textheight>>>
51 %%% ==== Samba Team ====
54 %%% <<<samba-team-20141011-colorized.png,height=.9\textheight>>>
64 %%% [[[.3\textwidth]]]
65 %%% <<<samba-chilli-flavour-crop-bright-1280.jpg,height=.8\textheight>>>
66 %%% [[[.3\textwidth]]]
67 %%% <<<samba-chilli-flavour-crop-bright-1280.jpg,height=.8\textheight>>>
68 %%% [[[.3\textwidth]]]
69 %%% <<<samba-chilli-flavour-crop-bright-1280.jpg,height=.8\textheight>>>
75 %% ==== Samba File Server Topics / Challenges ====
77 %% # performance: scalable file server
78 %% #* scale-up: exhaust powerful boxes
79 %% #* scale-out: flexible all-active clusters
80 %% #* scale-down: perform well on low-end boxes
81 %% # interop: multi-protocol access (nfs, afp, ...)
82 %% # server workloads / SMB features
83 %% #* tune for: small \# of connections, threaded applications
85 %% #* SMB3 (clustering, RDMA, ...)
86 %% # special file systems support (gluster, ceph, gpfs, btrfs, ...)
87 %% # cloud / openstack?...
88 %% %* (samba $\leftrightarrow$ cifs.ko alternative to nfs?...)
91 %% ==== Samba File Serving Topics ====
94 %% * Clustering (CTDB)
95 %% * SMB features (SMB3...)
96 %% * Interop (protocols, NFS, AFP, ...)
97 %% * special file systems support (gluster, ceph, gpfs, btrfs...)
100 %%==== Other Samba Topics ====
102 %%* Auth/Domain Member
107 %%% ==== Performance ====[plain]
111 %%% <<<samba-chilli-flavour-crop-bright-1280.jpg,height=.8\textheight>>>
114 %%% ==== Performance - low end systems ====
117 %%% <[block]{Reduction of CPU usage for low profile platforms like arm (SMB2)}
119 %%% ** didn't saturate 1G nic (arm), CPU 100\%
120 %%% * reduced memory allocations
121 %%% * instrument SMB 2.1 multi-credit / large MTU
123 %%% ** saturates 1G nic (arm), CPU $<$ 100\%
127 %%% ==== Performance - DB performance ====
130 %%% * trivial database
131 %%% * used for IPC (smbd processes)
132 %%% * cluster (CTDB): local copies
135 %%% <[block]{hot databases}
136 %%% * @locking.tdb@ (open files)
137 %%% * @brlock.tdb@ (byte range locks)
138 %%% * @notify\_index.tdb@ (for change notify)
141 %%% ==== Performance - DB performance ====
143 %%% <[block]{problem 1}
144 %%% * fcntl byte range locks for record locks
145 %%% * contention via single kernel spinlock
148 %%% <[block]{solution}
149 %%% * alternative to fcntl: pthread robust mutexes
150 %%% * ==> massive speedup
151 %%% * ==> included in TDB 1.3.1, Samba 4.2
154 %%% ==== Performance - DB performance ====
156 %%% <[block]{problem 2}
158 %%% ** single chain, contended (@locking.tdb@)
159 %%% ** gets fragmented (singly linked)
160 %%% * especially a problem in ctdb-cluster: vacuuming
163 %%% <[block]{improvements}
164 %%% * make use of small per-record freelists (dead records)
165 %%% * add automatic defragmentation upon traversal
166 %%% * ==> included in TDB 1.3.1, Samba 4.2
169 %%% ==== Performance - DB performance ====
170 %%% <[block]{problem 3}
171 %%% * change notify not scalable
174 %%% <[block]{first improvement}
175 %%% * restructured @notify.tdb@ to
176 %%% ** global @notify\_index.tdb@ and
177 %%% ** local @notify.tdb@
178 %%% ** ==> better but still not good enough for some workloads
181 %%% <[block]{next steps}
182 %%% * replace DB-approach by new scalable, async notify daemon using messaging
183 %%% * some false positives do no harm
188 %%% ==== Performance - scaling ====
190 %%% <[block]{parallelism}
191 %%% * samba is multi-process:
192 %%% ** smbd child process $\leftrightarrow$ TCP connection
193 %%% ** event-loop in one process
194 %%% * within a smbd process:
195 %%% ** pthread-pool jobs for potentially blocking syscalls
196 %%% ** ==> parallelism for reads/writes
197 %%% ** default for async I/O since Samba 4.0
200 %%% ==== Performance - scaling ====
202 %%% <[block]{messaging}
203 %%% * classical messaging:
204 %%% ** messages.tdb and signals between processes
205 %%% ** does not scale well
206 %%% * new messaging in Samba 4.2:
207 %%% ** fast and scalable messaging based on unix datagram messages
208 %%% ** ==> WIP: integrate with AD/DC messaging
209 %%% ** ==> features fd-passing for sockets (SMB3 multi-channel)
210 %%% ** ==> TODO: integrate into CTDB inter-node-messaging
214 %%% ==== Interop ====[plain]
219 %%% <<<samba-chilli-flavour-crop-bright-1280.jpg,height=.8\textheight>>>
223 %%% ==== Interop-Central ====
225 %%% <[block]{multi-protocol access}
226 %%% * nfs (kernel, ganesha, ...)
229 %%% * SMB2+ unix-extensions
233 %%% ==== File Server Layout/Scope ====
236 %%% <<<samba-layers.jpg,height=.8\textheight>>>
240 %%% ==== Interop - Fruit ====
244 %%% [[[.9\textwidth]]]
245 %%% * MacOS 10.9: SMB 2.1 preferred file protocol
246 %%% * @vfs\_fruit@ - new module in Samba 4.2
247 %%% [[[.05\textwidth]]]
251 %%% [[[.55\textwidth]]]
254 %%% ** indexed search
255 %%% ** dcerpc service
256 %%% ** ==> under review
258 %%% ** SMB2 create context
259 %%% ** speed up directory listings
260 %%% ** ==> under review
262 %%% [[[.4\textwidth]]]
263 %%% <<<apfel_1280.jpg,width=.9\textwidth>>>
273 %% ==== SMB features ====[plain]
279 %% [[[.6\textwidth]]]
281 %% [[[.3\textwidth]]]
282 %% <<<samba-chilli-flavour-crop-bright-1280.jpg,height=.8\textheight>>>
287 %% ==== SMB features in Samba - SMB2 ====
292 %% [[[.7\textwidth]]]
294 %% * SMB 2.0 (Vista / 2008):
295 %% ** durable file handles [4.0]
296 %% * SMB 2.1 (Win7 / 2008R2):
297 %% ** multi-credit / large mtu [4.0]
298 %% ** dynamic reauthentication [4.0]
299 %% ** leasing [WIP++]
300 %% ** resilient file handles [WIP-tracer]
302 %% [[[.3\textwidth]]]
303 %% <<<durable-crop-colormod-1024,width=.9\textwidth>>>
309 ==== SMB3 features in Samba ====
315 # SMB 3.0 (Win8 / 2012):
316 #* new crypto (sign/encrypt) [4.0]
317 #* secure negotiation [4.0]
318 #* durable handles v2 [4.0]
319 #* persistent file handles [WIP/tracer]
320 #* '''_red_Multi-Channel_''' [WIP+]
321 #* SMB direct [designing/starting]
322 #* cluster features [designing]
324 #* storage features [WIP]
325 # SMB 3.0.2 (Win8.1 / 2012R2): [master]
326 # SMB 3.1.1 (Win10 / 2016):
327 #* negotiate contexts, preauth: [master]
330 %<<<durable-crop-colormod-1024,width=.9\textwidth>>>
331 <<<smb-auto-crop1,width=\textwidth>>>
338 <[block]{implemented}
341 * preauthentication integrity
342 * encryption improvements (choose cipher) \\ %
343 AES-128-CCM --> AES-128-GCM
346 <[block]{not implemented}
347 * cluster dialect fencing
348 * cluster client failover v2 (client)
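
The preauthentication integrity item listed under "implemented" can be illustrated with a minimal sketch. It assumes the SMB 3.1.1 scheme of a running SHA-512 value that starts as 64 zero bytes and is updated with every negotiate and session setup message. The function name and the message bytes are placeholders for illustration, not Samba code.

```python
import hashlib


def preauth_hash(messages, start=bytes(64)):
    """Chain SHA-512 over the handshake messages, in the order they were sent.

    `start` is the running value so far (64 zero bytes at the beginning).
    """
    value = start
    for msg in messages:
        # Each step hashes the previous value concatenated with the raw message.
        value = hashlib.sha512(value + msg).digest()
    return value


# Placeholder byte strings standing in for real SMB2 packets.
negotiate_request = b"negotiate request"
negotiate_response = b"negotiate response"
final = preauth_hash([negotiate_request, negotiate_response])
```

Because every message feeds into the value, a downgrade attempt that alters the negotiate request yields a different hash on client and server, so keys derived from that value no longer match.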
371 %%==== Clustered Samba / CTDB (SOFS since 2007) ====
374 %%<<<design-ctdb-three-nodes.png,width=.9\textwidth>>>
384 %%% * new crypto (signing, transport encryption)
385 %%% * persistent file handles
387 %%% * RDMA transport (SMB direct)
388 %%% * storage features
391 %%% ** transparent failover (continuous availability)
392 %%% ** all-active (scale-out)
395 %%% ==== SMB3 - Goals ====
400 %%% * fault tolerance / reliability
401 %%% * performance / throughput / scaling
402 %%% * focus on support for server workloads \\ %
403 %%% (as opposed to workstation workloads)
404 %%% * especially support for:
408 %%% ** replace block storage in data center
409 %%% ** block (SCSI) over SMB
412 %%% ==== Requirements for Hyper-V ====
417 %%% * minimum requirements:
419 %%% ** is that really all??? - maybe resilient file handles..
422 %%% * desired features:
423 %%% ** cluster ($\ge 2$ nodes)
424 %%% ** CA / persistent handles
425 %%% ** RDMA / SMB direct
429 %%% ==== SMB Protocol in Samba ====
437 %%% ** experimental incomplete support for SMB 2.0
439 %%% ** official support for SMB 2.0
440 %%% ** missing: durable handles
441 %%% ** default server max proto: SMB 1
443 %%% ** SMB 2.0: complete with durable handles
444 %%% ** SMB 2.1: basis, multi-credit, dynamic reauthentication
445 %%% ** SMB 3.0: basis, crypto, secure negotiation, durable v2
446 %%% ** default server max proto: SMB 3.0
448 %%% ** SMB 3.02: basic
451 %%% ==== ==== [plain]
454 %%% Technical Details...
462 %%% <<<samba-chilli-flavour-crop-bright-1280.jpg,height=.8\textheight>>>
467 ==== Multi-Channel - General ====
469 * bind multiple transport connections to one session
470 * increase throughput and fault tolerance
472 ==== Multi-Channel - Windows/Protocol ====
474 # establish initial session on TCP connection
475 # find interfaces with interface discovery: \\ %
476 @FSCTL\_QUERY\_NETWORK\_INTERFACE\_INFO@
477 # bind additional TCP (or RDMA) connection (channel) to established SMB3 session (session bind)
478 # windows: uses only connections of the same (and best) quality
479 # windows: binds only to a single node
480 # replay / retry mechanisms, epoch numbers
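
The idea of binding several channels to one session can be modelled in a few lines. This is a toy sketch only: the class names, the round-robin choice and the @alive@ flag are assumptions made for illustration and do not describe how Windows or Samba actually schedule requests.

```python
from dataclasses import dataclass, field


@dataclass
class Channel:
    """One transport connection (TCP or RDMA) bound to a session."""
    name: str
    alive: bool = True


@dataclass
class Session:
    """A session that can spread requests over several bound channels."""
    session_id: int
    channels: list = field(default_factory=list)
    cursor: int = 0

    def bind(self, channel):
        # Corresponds to the "session bind" step: attach another connection.
        self.channels.append(channel)

    def pick(self):
        # Round-robin over the channels that are still alive (assumed policy).
        live = [c for c in self.channels if c.alive]
        if not live:
            raise ConnectionError("no live channel left for this session")
        channel = live[self.cursor % len(live)]
        self.cursor += 1
        return channel
```

With two channels bound, requests alternate between them (throughput); if one channel dies, the session keeps working over the remaining one (fault tolerance).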
482 ==== Multi-Channel - Samba ====
484 <[block]{samba/smbd: multi-process}
485 * '''Currently:''' process $\Leftrightarrow$ TCP connection
486 * '''Idea:''' transfer new TCP connection to existing smbd
487 * '''How?''' ==> use fd-passing (sendmsg/recvmsg)
488 * '''When?''' as early as possible, based on client GUID \\ %
489 ==> per client GUID single process model
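
The fd-passing step can be sketched with plain sendmsg/recvmsg and @SCM\_RIGHTS@ ancillary data. This is a minimal illustration, not smbd code: both ends live in one process, a socketpair stands in for the client's TCP connection, and the GUID string and all function names are made up.

```python
import array
import socket


def send_fd(chan, fd, payload):
    """Send payload plus one file descriptor over a unix socket."""
    ancillary = [(socket.SOL_SOCKET, socket.SCM_RIGHTS, array.array("i", [fd]))]
    chan.sendmsg([payload], ancillary)


def recv_fd(chan, bufsize=4096):
    """Receive a payload and the single file descriptor sent with it."""
    fds = array.array("i")
    payload, ancdata, _flags, _addr = chan.recvmsg(
        bufsize, socket.CMSG_LEN(fds.itemsize))
    for level, ctype, data in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            # Ignore a truncated trailing integer, if any.
            fds.frombytes(data[:len(data) - (len(data) % fds.itemsize)])
    if not fds:
        raise RuntimeError("no file descriptor received")
    return payload, fds[0]


def pass_connection_demo():
    # Stand-in for the channel between the new smbd and the existing smbd.
    sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    # Stand-in for the client's connection (hypothetical, not real TCP).
    client, accepted = socket.socketpair()
    send_fd(sender, accepted.fileno(), b"client-guid-1234")
    accepted.close()  # the sending side gives up its copy
    guid, fd = recv_fd(receiver)
    adopted = socket.socket(fileno=fd)  # receiver now owns the connection
    client.sendall(b"hello")
    data = adopted.recv(5)
    for s in (sender, receiver, client, adopted):
        s.close()
    return guid, data
```

The connection stays open while the descriptor is in flight, so the sender can close its copy right after sendmsg; the receiver continues to serve the client on the adopted descriptor.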
492 ==== Multi-Channel - Samba ====
495 <<<smb3-mc-samba_exp.png,height=.9\textheight>>>
498 ==== Multi-Channel - Samba ====
501 # messaging rewrite using unix dgm sockets with sendmsg [DONE,4.2]
502 # add fd-passing to messaging [DONE,4.2]
503 # preparations in internal structures [ess.DONE]
504 # implement smbd message to pass a tcp connection [ess.DONE]
505 # transfer connection already in negprot (ClientGUID) [largely DONE]
506 # implement session bind [ess.DONE]
507 # implement channel epoch numbers [WIP]
508 # implement interface discovery [WIP]
509 # implement test case [WIP]
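
The first step, messaging over unix datagram sockets, can be pictured roughly as follows. This is a sketch under stated assumptions: one socket per process named after its pid, and a 12-byte header carrying message type and sender pid. The directory layout, the header format and the @MSG\_PING@ constant are invented for illustration and are not Samba's actual wire format.

```python
import os
import socket
import struct
import tempfile

MSG_PING = 1          # hypothetical message type
HEADER = "!IQ"        # message type (uint32), sender pid (uint64)


def endpoint_path(directory, pid):
    return os.path.join(directory, str(pid))


def open_endpoint(directory, pid):
    """Create the datagram socket on which process `pid` receives messages."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    sock.bind(endpoint_path(directory, pid))
    return sock


def send_message(directory, dst_pid, msg_type, src_pid, payload):
    """Send one message to the socket of the destination process."""
    packet = struct.pack(HEADER, msg_type, src_pid) + payload
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet, endpoint_path(directory, dst_pid))


def recv_message(sock):
    """Receive one message and split it into type, sender pid and payload."""
    packet = sock.recv(65536)
    msg_type, src_pid = struct.unpack_from(HEADER, packet)
    return msg_type, src_pid, packet[struct.calcsize(HEADER):]


def messaging_demo():
    # Pids 4711 and 4242 are placeholders; both ends run in one process here.
    with tempfile.TemporaryDirectory() as directory:
        inbox = open_endpoint(directory, 4711)
        try:
            send_message(directory, 4711, MSG_PING, 4242, b"hello")
            return recv_message(inbox)
        finally:
            inbox.close()
```

Each datagram is delivered whole, so no framing or signal is needed to wake the receiver, and the same sendmsg path can later carry file descriptors as ancillary data.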
512 ==== @MSG\_SMBXSRV\_CONNECTION\_PASS@ ====
514 <[block]{from smbXsrv.idl}
517 NTTIME initial_connect_time;
520 DATA_BLOB negotiate_request;
521 } smbXsrv_connection_pass0;
525 ==== Internal Structures (smbXsrv) ====
530 smbXsrv_session->smbXsrv_connection
536 smbXsrv_session->smbXsrv_client->smbXsrv_connections
548 shell breakout to browse code/diff
564 '''Outlook: SMB Direct'''
571 ==== SMB Direct (RDMA) ====
574 ** requires multi-channel
575 ** start with TCP, bind an RDMA channel
576 ** reads and writes use RDMA write/read
577 ** protocol/metadata via send/receive
579 * wireshark dissector: [DONE]
582 ** prereq: multi-channel / fd-passing
583 ** buffer / transport abstractions [TODO]
584 ** _red_problem_: libraries: not fork safe and no fd-passing \\ %
585 ==> central daemon (or kernel module) to serve as RDMA "proxy"
587 ==== SMB Direct (RDMA) - Plan ====
590 <<<smb3-rdma-samba-v2_exp.png,height=.9\textheight>>>
593 %%%==== SMB Direct (RDMA) - Plan ====
596 %%%* smbd-d (rdma proxy daemon)
597 %%%** listens on unix domain socket (@/var/lib/smbd-d/socket@)
598 %%%** listens for RDMA connection (as told by main smbd)
600 %%%** listens for TCP connections
601 %%%** connects to smbd-d-socket
602 %%%*** request rdma-interfaces, tell smbd-d on which to listen
603 %%%** "accepts" new smb-direct connections on smbd-d-socket
606 %%%==== SMB Direct (RDMA) - Plan ====
610 %%%** connects via TCP --> smbd forks child smbd (c)
611 %%%** connects via RDMA to smbd-d
613 %%%** creates socket-pair as rdma-proxy-channel
614 %%%** passes one end of socket-pair to main smbd for accept
615 %%%** sends smb direct packets over proxy-channel
617 %%%** upon receiving NegProt: pass proxy-socket to c based on ClientGUID
619 %%%** continues proxy-communication with smbd-d
622 %%%* For @rdma\_read@ and @rdma\_write@:
623 %%%** c and smbd-d establish shared memory area
627 %%% ==== Persistent Handles ====
632 %%% * like durable file handles with strong guarantees
633 %%% * framework is already there in samba (by support for durable v2)
634 %%% ** ==> easy to satisfy at the protocol level
637 %%% * the difficulty lies in implementing the guarantees
638 %%% ** need to make metadata persistent
639 %%% ** but don't kill performance!
640 %%% ** persistent tdbs !would! kill performance
642 %%% *** need to be sync
643 %%% *** record-level transactions (instead of db-level)
644 %%% *** only replicate to some nodes, not all
648 %%==== Clustering Concepts (Windows) ====
654 %%** (``traditional'') failover cluster (active-passive)
655 %%** protocol: @SMB2\_SHARE\_CAP\_CLUSTER@
657 %%*** runs off a cluster (failover) volume
658 %%*** offers the Witness service
661 %%* Scale-Out (SOFS):
662 %%** scale-out cluster (all-active!)
663 %%** protocol: @SMB2\_SHARE\_CAP\_SCALEOUT@
664 %%** no client caching
665 %%** Windows: runs off a cluster shared volume (implies cluster)
668 %%* Continuous Availability (CA):
669 %%** transparent failover, persistent handles
670 %%** protocol: @SMB2\_SHARE\_CAP\_CONTINUOUS\_AVAILABILITY@
671 %%** can be turned on independently on any cluster share (failover or scale-out)
672 %%** ==> changed client retry behaviour!
675 %%% ==== Clustering -- Controlling Flags from Windows ====
680 %%% * a share on a cluster carries
681 %%% ** @SMB2\_SHARE\_CAP\_CLUSTER@ $\Leftrightarrow$ the shared FS is a cluster volume.
684 %%% * a share on a cluster carries
685 %%% ** @SMB2\_SHARE\_CAP\_SCALEOUT@ $\Leftrightarrow$ the shared FS is a CSV
686 %%% *** implies @SMB2\_SHARE\_CAP\_CLUSTER@
689 %%% * independently settable on a clustered share:
690 %%% ** @SMB2\_SHARE\_CAP\_CONTINUOUS\_AVAILABILITY@
691 %%% *** implies @SMB2\_SHARE\_CAP\_CLUSTER@
695 %%==== Clustering -- Server Behaviour ====
700 %%* @SMB2\_SHARE\_CAP\_CLUSTER@:
701 %%** run witness service (RPC)
702 %%** client can register and get notified about resource changes
705 %%* @SMB2\_SHARE\_CAP\_SCALEOUT@:
706 %%** do not grant batch oplocks, write leases, handle leases
707 %%** ==> no durable handles unless also CA
710 %%* @SMB2\_SHARE\_CAP\_CONTINUOUS\_AVAILABILITY@:
711 %%** offer persistent handles
712 %%** timeout from durable v2 request
716 %%==== Clustering -- Client Behaviour (Win8) ====
722 %%* @SMB2\_SHARE\_CAP\_CLUSTER@:
723 %%** clients happily work if witness is not available
726 %%* @SMB2\_SHARE\_CAP\_SCALEOUT@:
727 %%** clients happily connect if @CLUSTER@ is not set.
728 %%** clients DO request oplocks/leases/durable handles
729 %%** clients are not confused if they get these
732 %%* @SMB2\_SHARE\_CAP\_CONTINUOUS\_AVAILABILITY@:
733 %%** clients happily connect if @CLUSTER@ is not set.
734 %%** clients typically request persistent handle with RWH lease
739 %%%Win8 sends @SMB2\_FLAGS\_REPLAY\_OPERATION@ in writes and reads (from 2nd in a row) \\ %
740 %%%$\Leftrightarrow$ \\ %
741 %%%The server announces @SMB2\_CAP\_PERSISTENT\_HANDLES@.
744 %%% ==== Clustering -- Client Behaviour (Win8) : Retries ====
747 %%% * Test: Win8 against slightly pimped Samba (2 IPs)
750 %%% * Server-Matrix (on/off):
751 %%% ** persistent handle cap
752 %%% ** durable handles
753 %%% ** cluster share cap
759 %%% ** connect to share with explorer
760 %%% ** start copying file (2G)
762 %%% ** wait for the client to pop up an error dialog
767 %%% ==== Clustering -- Client Behaviour (Win8) : Retries ====
770 %%% * only two different retry characteristics: CA $\leftrightarrow$ non-CA
774 %%% ** 3 consecutive attempt rounds:
775 %%% *** for each of the two IPs: \\ %
777 %%% three tcp syn attempts to IP with 0.5 sec breaks
778 %%% ** ==> some 2.1 seconds for 1 round
779 %%% ** between attempts:
780 %%% ** dns, ping, arp ... 5.8 seconds
781 %%% ** ==> _red_18 seconds_
785 %%% ** retries attempt rounds from above for _red_14 minutes_
795 %%% <<<samba-chilli-flavour-crop-bright-1280.jpg,height=.8\textheight>>>
800 %%==== Clustering with Samba/CTDB ====
803 %%* all-active SMB-cluster with Samba and CTDB... \\ %
804 %%+<3->{...since 2007! \smiley }
807 %%* transparent for the client
809 %%*** metadata and messaging engine for Samba in a cluster
810 %%*** plus cluster resource manager (IPs, services...)
811 %%** client only sees one ``big'' SMB server
812 %%** we could not change the client!...
813 %%** works ``well enough''
817 %%** how to integrate SMB3 clustering with Samba/CTDB
818 %%** good: rather orthogonal
819 %%** ctdb-clustering transparent mostly due to management
822 %%==== Witness Service ====
826 %%** monitoring of availability of resources (shares, NICs)
827 %%** server asks client to move to another resource
831 %%** available on a Windows SMB3 share $\Leftrightarrow$ @SMB2\_SHARE\_CAP\_CLUSTER@
832 %%** but clients happily connect w/o witness
835 %%* status in Samba [WIP (Metze, Gregor Beck)]:
836 %%** async RPC: WIP, good progress ($\Rightarrow$ Metze's talk)
837 %%** wireshark dissector: essentially done
838 %%** client: in @rpcclient@ - done
839 %%** server: dummy PoC / tracer bullet implementation done
840 %%** CTDB: changes / integration needed
848 %%% !@https://wiki.samba.org/index.php/SMB3@!
858 %%% [[[.6\textwidth]]]
860 %%% [[[.3\textwidth]]]
861 %%% <<<samba-chilli-flavour-crop-bright-1280.jpg,height=.8\textheight>>>
866 ==== SMB features in Samba ====
870 @https://wiki.samba.org/index.php/Samba3/SMB3@
874 %%% ==== Misc ====[plain]
879 %%% <<<samba-chilli-flavour-crop-bright-1280.jpg,height=.8\textheight>>>
884 %%% <[block]{File Systems}
885 %%% * gpfs, gluster, ceph, btrfs...
886 %%% * support through vfs modules
887 %%% * fuse-based: avoid context switches
888 %%% * instrument SMB3 storage features (fsctls)
893 %%% %%<[block]{Under the hood}
894 %%% %%* restructurings, reconciliations
895 %%% %%* ctdb moved into samba tree
896 %%% %%* published libs: talloc, tdb, tevent ...
899 %%% <[block]{Testing}
900 %%% * unprivileged selftest, autobuild
901 %%% * selfcontained testing: wrapper
902 %%% ** socket wrapper
905 %%% ** resolv wrapper [_red_new_]
906 %%% * externalized as separate projects:
907 %%% ** ==> @http://cwrap.org/@
908 %%% ** git on samba.org
909 %%% ** ==> Andreas Schneider's talk
913 %%% ==== Forecast: Cloudy ====
915 %%% <[block]{Possible involvement with OpenStack}
916 %%% * SMB storage service for Windows (and other) VMs
917 %%% * SMB3 storage backend for Hyper-V images
918 %%% * also: chances for AD-integration into auth
923 %% <[block]{especially but not exclusively}
925 %% * Stefan Metzmacher
928 %% * David Disseldorp
929 %% * Andreas Schneider
933 %%% ==== Conclusion ====[plain]
937 %%% <<<samba-chilli-flavour-crop-bright-1280.jpg,height=.8\textheight>>>
940 %%% ==== Conclusion ====
942 %%% <[block]{Remember}
943 %%% * Samba 4.X is quite different from 3.Y
946 %%% <[block]{What's coming?}
947 %%% * Performance: the story continues
948 %%% * Interop: strengthen strengths
949 %%% * SMB(3) features: a lot to come ( ==> cluster, hyper-v, ...)
950 %%% * Some clouds in the sky...
956 ==== Thanks for your attention! ====[plain]
977 <<<feet-sand-1280.png,height=.8\textheight>>>
978 %<<<samba-chilli-flavour-crop-bright-1280.jpg,height=.8\textheight>>>