%
% colors:
% _blue_text text_
% _red_text text_
%
%%% ==== Samba... ====
%%%
%%% <[center]
%%% <<>>
%%% [center]>
%%%
%%%
%%% ==== Short History ====
%%%
%%% * 1.9.17: 1996/08
%%% * 2.0: 1999/01: domain-member, +SWAT
%%% * 2.2: 2001/04: NT4-DC
%%% * 3.0: 2003/09: AD-member, Samba4 project started
%%% * 3.2: 2008/07: GPLv3, experimental clustering
%%% * 3.3: 2009/01: clustering
%%% * 3.4: 2009/07: merged S3+S4 code
%%% * 3.5: 2010/03: experimental SMB 2.0
%%% * 3.6: 2011/09: SMB 2.0
%%% * 4.0: 2012/12: AD/DC, SMB 2.0 durable handles, 2.1, 3.0
%%% * 4.1: 2013/10: stability
%%% * 4.2: 2015/03: AD trusts, leases, performance, scalability, CTDB
%%%
%%%CTDB included
%%% ==== Release Stream ====
%%%
%%%
%%% <[center]
%%% <<>>
%%% [center]>
%%% ==== Release Planning ====
%%%
%%% <[center]
%%% \large
%%% @https://wiki.samba.org/index.php/Samba\_Release\_Planning@
%%% [center]>
%%% ==== Samba Team ====
%%%
%%% <[center]
%%% <<>>
%%% [center]>
%%%
%%% ==== Samba Team ====
%%%
%%% <[center]
%%% <<>>
%%% [center]>
%%% ==== ====[plain]
%%%
%%% %%\transdissolve
%%%
%%% <[center]
%%% <[columns]
%%% [[[.3\textwidth]]]
%%% <<>>
%%% [[[.3\textwidth]]]
%%% <<>>
%%% [[[.3\textwidth]]]
%%% <<>>
%%% [columns]>
%%% [center]>
%% ==== Samba File Server Topics / Challenges ====
%%
%% # performance: scalable file server
%% #* scale-up: exhaust powerful boxes
%% #* scale-out: flexible all-active clusters
%% #* scale-down: perform well on low-end boxes
%% # interop: multi-protocol access (nfs, afp, ...)
%% # server workloads / SMB features
%% #* tune for: small \# of connections, threaded applications
%% #* Hyper-V, ...
%% #* SMB3 (clustering, RDMA, ...)
%% # special file systems support (gluster, ceph, gpfs, btrfs, ...)
%% # cloud / openstack?...
%% %* (samba $\leftrightarrow$ cifs.ko alternative to nfs?...)
%% ==== Samba File Serving Topics ====
%%
%% * Performance
%% * Clustering (CTDB)
%% * SMB features (SMB3...)
%% * Interop (protocols, NFS, AFP, ...)
%% * special file systems support (gluster, ceph, gpfs, btrfs...)
%% * ...
%%==== Other Samba Topics ====
%%
%%* Auth/Domain Member
%%* RPC server
%%* AD Server
%%* ...
%%% ==== Performance ====[plain]
%%%
%%% %%\transdissolve
%%%
%%% <<>>
%%%
%%%
%%% ==== Performance - low end systems ====
%%%
%%%
%%% <[block]{Reduction of CPU usage for low profile platforms like arm (SMB2)}
%%% * Samba 4.0:
%%% ** didn't saturate 1G nic (arm), CPU 100\%
%%% * reduced memory allocations
%%% * instrument SMB 2.1 multi-credit / large MTU
%%% * Samba 4.2:
%%% ** saturates 1G nic (arm), CPU $<$ 100\%
%%% * ==> continuing
%%% [block]>
%%%
%%% ==== Performance - DB performance ====
%%%
%%% <[block]{TDB}
%%% * trivial database
%%% * used for IPC (smbd processes)
%%% * cluster (CTDB): local copies
%%% [block]>
%%%
%%% <[block]{hot databases}
%%% * @locking.tdb@ (open files)
%%% * @brlock.tdb@ (byte range locks)
%%% * @notify\_index.tdb@ (for change notify)
%%% [block]>
%%%
%%% ==== Performance - DB performance ====
%%%
%%% <[block]{problem 1}
%%% * fcntl byte range locks for record locks
%%% * contention via single kernel spinlock
%%% [block]>
%%%
%%% <[block]{solution}
%%% * alternative to fcntl: pthread robust mutexes
%%% * ==> massive speedup
%%% * ==> included in TDB 1.3.1, Samba 4.2
%%% [block]>
%%%
%%% ==== Performance - DB performance ====
%%%
%%% <[block]{problem 2}
%%% * freelist:
%%% ** single chain, contended (@locking.tdb@)
%%% ** gets fragmented (singly linked)
%%% * especially a problem in ctdb-cluster: vacuuming
%%% [block]>
%%%
%%% <[block]{improvements}
%%% * make use of small per-record freelists (dead records)
%%% * add automatic defragmentation upon traversal
%%% * ==> included in TDB 1.3.1, Samba 4.2
%%% [block]>
%%%
%%% ==== Performance - DB performance ====
%%% <[block]{problem 3}
%%% * change notify not scalable
%%% [block]>
%%%
%%% <[block]{first improvement}
%%% * restructured @notify.tdb@ to
%%% ** global @notify\_index.tdb@ and
%%% ** local @notify.tdb@
%%% ** ==> better but still not good enough for some workloads
%%% [block]>
%%%
%%% <[block]{next steps}
%%% * replace DB-approach by new scalable, async notify daemon using messaging
%%% * some false positives do no harm
%%% * ==> TODO
%%% [block]>
%%%
%%%
%%% ==== Performance - scaling ====
%%%
%%% <[block]{parallelism}
%%% * samba is multi-process:
%%% ** smbd child process $\leftrightarrow$ TCP connection
%%% ** event-loop in one process
%%% * within a smbd process:
%%% ** pthread-pool jobs for potentially blocking syscalls
%%% ** ==> parallelism for reads/writes
%%% ** default for async I/O since Samba 4.0
%%% [block]>
%%%
%%% ==== Performance - scaling ====
%%%
%%% <[block]{messaging}
%%% * classical messaging:
%%% ** messages.tdb and signals between processes
%%% ** does not scale well
%%% * new messaging in Samba 4.2:
%%% ** fast and scalable messaging based on unix datagram sockets
%%% ** ==> WIP: integrate with AD/DC messaging
%%% ** ==> features fd-passing for sockets (SMB3 multi-channel)
%%% ** ==> TODO: integrate into CTDB inter-node-messaging
%%% [block]>
%%% ==== Interop ====[plain]
%%%
%%% %\transdissolve
%%%
%%% <[center]
%%% <<>>
%%% [center]>
%%%
%%%
%%% ==== Interop-Central ====
%%%
%%% <[block]{multi-protocol access}
%%% * nfs (kernel, ganesha, ...)
%%% * afp: netatalk
%%% * local access
%%% * SMB2+ unix-extensions
%%% [block]>
%%% ==== File Server Layout/Scope ====
%%%
%%% <[center]
%%% <<>>
%%% [center]>
%%% ==== Interop - Fruit ====
%%%
%%%
%%% <[columns]
%%% [[[.9\textwidth]]]
%%% * MacOS 10.9: SMB 2.1 preferred file protocol
%%% * @vfs\_fruit@ - new module in Samba 4.2
%%% [[[.05\textwidth]]]
%%% [columns]>
%%%
%%% <[columns]
%%% [[[.55\textwidth]]]
%%%
%%% * spotlight
%%% ** indexed search
%%% ** dcerpc service
%%% ** ==> under review
%%% * AAPL
%%% ** SMB2 create context
%%% ** speed up directory listings
%%% ** ==> under review
%%%
%%% [[[.4\textwidth]]]
%%% <<>>
%%% [columns]>
%%%
%%% ==== ====[plain]
%%%
%%% <[center]
%%% \Large
%%% Fruit Demo
%%% [center]>
%% ==== SMB features ====[plain]
%%
%% %\transdissolve
%%
%% <[center]
%% <[columns]
%% [[[.6\textwidth]]]
%%
%% [[[.3\textwidth]]]
%% <<>>
%% [columns]>
%% [center]>
%% ==== SMB features in Samba - SMB2 ====
%%
%%
%% <[center]
%% <[columns]
%% [[[.7\textwidth]]]
%%
%% * SMB 2.0 (Vista / 2008):
%% ** durable file handles [4.0]
%% * SMB 2.1 (Win7 / 2008R2):
%% ** multi-credit / large mtu [4.0]
%% ** dynamic reauthentication [4.0]
%% ** leasing [WIP++]
%% ** resilient file handles [WIP-tracer]
%%
%% [[[.3\textwidth]]]
%% <<>>
%% [columns]>
%% [center]>
==== SMB3 features in Samba ====
<[center]
<[columns]
[[[.7\textwidth]]]
# SMB 3.0 (Win8 / 2012):
#* new crypto (sign/encrypt) [4.0]
#* secure negotiation [4.0]
#* durable handles v2 [4.0]
#* persistent file handles [WIP/tracer]
#* '''_red_Multi-Channel_''' [WIP+]
#* SMB direct [designing/starting]
#* cluster features [designing]
#** witness [WIP+]
#* storage features [WIP]
# SMB 3.0.2 (Win8.1 / 2012R2): [master]
# SMB 3.1.1 (Win10 / 2014):
#* negotiate contexts, preauth: [master]
[[[.3\textwidth]]]
<<>>
[columns]>
[center]>
%%==== ====[plain]
%%
%%old
%%
%
%%==== Clustered Samba / CTDB (SOFS since 2007) ====
%%
%%<[center]
%%<<>>
%%[center]>
%%% ==== SMB 3.0 ====
%%%
%%% \transdissolve
%%%
%%% +<2->{
%%% * new crypto (signing, transport encryption)
%%% * persistent file handles
%%% * multi-channel
%%% * RDMA transport (SMB direct)
%%% * storage features
%%% * clustering
%%% ** witness
%%% ** transparent failover (continuous availability)
%%% ** all-active (scale-out)
%%% }
%%%
%%% ==== SMB3 - Goals ====
%%%
%%% \transdissolve
%%%
%%% +<2->{
%%% * fault tolerance / reliability
%%% * performance / throughput / scaling
%%% * focus on support for server workloads \\ %
%%% (as opposed to workstation workloads)
%%% * especially support for:
%%% ** Hyper-V
%%% ** MS-SQL
%%% * goals:
%%% ** replace block storage in data center
%%% ** block (SCSI) over SMB
%%% }
%%%
%%% ==== Requirements for Hyper-V ====
%%%
%%% \transdissolve
%%%
%%% +<2->{
%%% * minimum requirements:
%%% ** SMB 3.0
%%% ** is that really all??? - maybe resilient file handles...
%%% }
%%% +<3->{
%%% * desired features:
%%% ** cluster ($\ge 2$ nodes)
%%% ** CA / persistent handles
%%% ** RDMA / SMB direct
%%% ** multi channel
%%% }
%%% ==== SMB Protocol in Samba ====
%%%
%%% \transdissolve
%%%
%%% +<2->{
%%% * Samba $<$ 3.5:
%%% ** SMB 1
%%% * Samba 3.5:
%%% ** experimental incomplete support for SMB 2.0
%%% * Samba 3.6:
%%% ** official support for SMB 2.0
%%% ** missing: durable handles
%%% ** default server max proto: SMB 1
%%% * Samba 4.0:
%%% ** SMB 2.0: complete with durable handles
%%% ** SMB 2.1: basis, multi-credit, dynamic reauthentication
%%% ** SMB 3.0: basis, crypto, secure negotiation, durable v2
%%% ** default server max proto: SMB 3.0
%%% * Samba 4.1
%%% ** SMB 3.02: basic
%%% }
%%% ==== ====[plain]
%%% <[center]
%%% {\Large
%%% Technical Details...
%%% }
%%% [center]>
%%% ==== ====[plain]
%%%
%%% \transdissolve
%%%
%%% <<>>
%%%
%%%
%%%
==== Multi-Channel - General ====
* bind multiple transport connections to one session
* increase throughput and fault tolerance
==== Multi-Channel - Windows/Protocol ====
# establish initial session on TCP connection
# find interfaces with interface discovery: \\ %
@FSCTL\_QUERY\_NETWORK\_INTERFACE\_INFO@
# bind additional TCP (or RDMA) connection (channel) to established SMB3 session (session bind)
# windows: uses connections of the same (and best) quality
# windows: binds only to a single node
# replay / retry mechanisms, epoch numbers
==== Multi-Channel - Samba ====
<[block]{samba/smbd: multi-process}
* '''Currently:''' process $\Leftrightarrow$ TCP connection
* '''Idea:''' transfer new TCP connection to existing smbd
* '''How?''' ==> use fd-passing (sendmsg/recvmsg)
* '''When?''' as early as possible, based on client GUID \\ %
==> per client GUID single process model
[block]>
==== Multi-Channel - Samba ====
<[center]
<<>>
[center]>
==== Multi-Channel - Samba ====
# preparation: \\ %
messaging rewrite using unix dgm sockets with sendmsg [DONE,4.2]
# add fd-passing to messaging [DONE,4.2]
# preparations in internal structures [ess.DONE]
# implement smbd message to pass a tcp connection [ess.DONE]
# transfer connection already in negprot (ClientGUID) [largely DONE]
# implement session bind [ess.DONE]
# implement channel epoch numbers [WIP]
# implement interface discovery [WIP]
# implement test case [WIP]
[frame]>
<[sambabg]
==== ====[plain]
<[center]
\Large
Multi-Channel Demo
[center]>
==== ====[plain]
<[center]
\Large
Outlook: SMB Direct
[center]>
[frame]>
[sambabg]>
==== SMB Direct (RDMA) ====
* windows:
** requires multi-channel
** start with TCP, bind an RDMA channel
** reads and writes use RDMA write/read
** protocol/metadata via send/receive
* wireshark dissector: [DONE]
* samba (TODO):
** prereq: multi-channel / fd-passing
** buffer / transport abstractions [TODO]
** _red_problem_: libraries: not fork safe and no fd-passing \\ %
==> central daemon (or kernel module) to serve as RDMA ``proxy''
==== SMB Direct (RDMA) - Plan ====
<[center]
<<>>
[center]>
%%%==== SMB Direct (RDMA) - Plan ====
%%%
%%%+<2->{
%%%* smbd-d (rdma proxy daemon)
%%%** listens on unix domain socket (@/var/lib/smbd-d/socket@)
%%%** listens for RDMA connection (as told by main smbd)
%%%* main smbd:
%%%** listens for TCP connections
%%%** connects to smbd-d-socket
%%%*** request rdma-interfaces, tell smbd-d on which to listen
%%%** ``accepts'' new smb-direct connections on smbd-d-socket
%%%}
%%%
%%%==== SMB Direct (RDMA) - Plan ====
%%%
%%%+<2->{
%%%* client
%%%** connects via TCP --> smbd forks child smbd (c)
%%%** connects via RDMA to smbd-d
%%%* smbd-d
%%%** creates socket-pair as rdma-proxy-channel
%%%** passes one end of socket-pair to main smbd for accept
%%%** sends smb direct packets over proxy-channel
%%%* main smbd
%%%** upon receiving NegProt: pass proxy-socket to c based on ClientGUID
%%%* c
%%%** continues proxy-communication with smbd-d
%%%}
%%%+<3->{
%%%* For @rdma\_read@ and @rdma\_write@:
%%%** c and smbd-d establish shared memory area
%%%}
%%% ==== Persistent Handles ====
%%%
%%% \transdissolve
%%%
%%% +<2->{
%%% * like durable file handles with strong guarantees
%%% * framework is already there in samba (by support for durable v2)
%%% ** ==> easy to satisfy at the protocol level
%%% }
%%% +<3->{
%%% * the difficulty lies in implementing the guarantees
%%% ** need to make metadata persistent
%%% ** but don't kill performance!
%%% ** persistent tdbs !would! kill performance
%%% ** ideas:
%%% *** need to be sync
%%% *** record-level transactions (instead of db-level)
%%% *** only replicate to some nodes, not all
%%% }
%%==== Clustering Concepts (Windows) ====
%%
%%\transdissolve
%%
%%+<2->{
%%* Cluster:
%%** (``traditional'') failover cluster (active-passive)
%%** protocol: @SMB2\_SHARE\_CAP\_CLUSTER@
%%** Windows:
%%*** runs off a cluster (failover) volume
%%*** offers the Witness service
%%}
%%+<3->{
%%* Scale-Out (SOFS):
%%** scale-out cluster (all-active!)
%%** protocol: @SMB2\_SHARE\_CAP\_SCALEOUT@
%%** no client caching
%%** Windows: runs off a cluster shared volume (implies cluster)
%%}
%%+<4->{
%%* Continuous Availability (CA):
%%** transparent failover, persistent handles
%%** protocol: @SMB2\_SHARE\_CAP\_CONTINUOUS\_AVAILABILITY@
%%** can be turned on independently on any cluster share (failover or scale-out)
%%** ==> changed client retry behaviour!
%%}
%%% ==== Clustering -- Controlling Flags from Windows ====
%%%
%%% \transdissolve
%%%
%%% +<2->{
%%% * a share on a cluster carries
%%% ** @SMB2\_SHARE\_CAP\_CLUSTER@ $\Leftrightarrow$ the shared FS is a cluster volume.
%%% }
%%% +<3->{
%%% * a share on a cluster carries
%%% ** @SMB2\_SHARE\_CAP\_SCALEOUT@ $\Leftrightarrow$ the shared FS is a CSV
%%% *** implies @SMB2\_SHARE\_CAP\_CLUSTER@
%%% }
%%% +<4->{
%%% * independently settable on a clustered share:
%%% ** @SMB2\_SHARE\_CAP\_CONTINUOUS\_AVAILABILITY@
%%% *** implies @SMB2\_SHARE\_CAP\_CLUSTER@
%%% }
%%%
%%==== Clustering -- Server Behaviour ====
%%
%%\transdissolve
%%
%%+<2->{
%%* @SMB2\_SHARE\_CAP\_CLUSTER@:
%%** run witness service (RPC)
%%** client can register and get notified about resource changes
%%}
%%+<3->{
%%* @SMB2\_SHARE\_CAP\_SCALEOUT@:
%%** do not grant batch oplocks, write leases, handle leases
%%** ==> no durable handles unless also CA
%%}
%%+<4->{
%%* @SMB2\_SHARE\_CAP\_CONTINUOUS\_AVAILABILITY@:
%%** offer persistent handles
%%** timeout from durable v2 request
%%}
%%
%%==== Clustering -- Client Behaviour (Win8) ====
%%
%%\transdissolve
%%
%%
%%+<2->{
%%* @SMB2\_SHARE\_CAP\_CLUSTER@:
%%** clients happily work if witness is not available
%%}
%%+<3->{
%%* @SMB2\_SHARE\_CAP\_SCALEOUT@:
%%** clients happily connect if @CLUSTER@ is not set.
%%** clients DO request oplocks/leases/durable handles
%%** clients are not confused if they get these
%%}
%%+<4->{
%%* @SMB2\_SHARE\_CAP\_CONTINUOUS\_AVAILABILITY@:
%%** clients happily connect if @CLUSTER@ is not set.
%%** clients typically request persistent handle with RWH lease
%%}
%%%+<5->{
%%%* Note:\\ %
%%%Win8 sends @SMB2\_FLAGS\_REPLAY\_OPERATION@ in writes and reads (from 2nd in a row) \\ %
%%%$\Leftrightarrow$ \\ %
%%%The server announces @SMB2\_CAP\_PERSISTENT\_HANDLES@.
%%%}
%%% ==== Clustering -- Client Behaviour (Win8) : Retries ====
%%%
%%% +<2->{
%%% * Test: Win8 against slightly pimped Samba (2 IPs)
%%% }
%%% +<3->{
%%% * Server-Matrix (on/off):
%%% ** persistent handle cap
%%% ** durable handles
%%% ** cluster share cap
%%% ** scale out cap
%%% ** ca share cap
%%% }
%%% +<4->{
%%% * The test:
%%% ** connect to share with explorer
%%% ** start copying file (2G)
%%% ** kill smbd
%%% ** wait for the client to pop up an error dialog
%%% ** click cancel
%%% ** stop capture
%%% }
%%%
%%% ==== Clustering -- Client Behaviour (Win8) : Retries ====
%%%
%%% +<2->{
%%% * only two different retry characteristics: CA $\leftrightarrow$ non-CA
%%% }
%%% +<3->{
%%% * non-CA-case
%%% ** 3 consecutive attempt rounds:
%%% *** for each of the two IPs: \\ %
%%% arp IP \\ %
%%% three tcp syn attempts to IP with 0.5 sec breaks
%%% ** ==> some 2.1 seconds for 1 round
%%% ** between attempts:
%%% ** dns, ping, arp ... 5.8 seconds
%%% ** ==> _red_18 seconds_
%%% }
%%% +<4->{
%%% * CA-Case
%%% ** retries attempt rounds from above for _red_14 minutes_
%%% }
%%%
%%%
%%%
%%% ==== ====[plain]
%%%
%%% \transdissolve
%%%
%%% <[center]
%%% <<>>
%%% [center]>
%%%
%%%
%%==== Clustering with Samba/CTDB ====
%%
%%+<2->{
%%* all-active SMB-cluster with Samba and CTDB... \\ %
%%+<3->{...since 2007! \smiley }
%%}
%%+<4->{
%%* transparent for the client
%%** CTDB:
%%*** metadata and messaging engine for Samba in a cluster
%%*** plus cluster resource manager (IPs, services...)
%%** client only sees one ``big'' SMB server
%%** we could not change the client!...
%%** works ``well enough''
%%}
%%+<5->{
%%* challenge:
%%** how to integrate SMB3 clustering with Samba/CTDB
%%** good: rather orthogonal
%%** ctdb-clustering transparent mostly due to management
%%}
%%
%%==== Witness Service ====
%%
%%+<2->{
%%* an RPC service
%%** monitoring of availability of resources (shares, NICs)
%%** server asks client to move to another resource
%%}
%%+<3->{
%%* remember:
%%** available on a Windows SMB3 share $\Leftrightarrow$ @SMB2\_SHARE\_CAP\_CLUSTER@
%%** but clients happily connect w/o witness
%%}
%%+<4->{
%%* status in Samba [WIP (Metze, Gregor Beck)]:
%%** async RPC: WIP, good progress ($\Rightarrow$ Metze's talk)
%%** wireshark dissector: essentially done
%%** client: in @rpcclient@ - done
%%** server: dummy PoC / tracer bullet implementation done
%%** CTDB: changes / integration needed
%%}
%%% ==== ====[plain]
%%%
%%% <[center]
%%% {\Large
%%% !@https://wiki.samba.org/index.php/SMB3@!
%%% }
%%% [center]>
%%%
%%% ==== ====[plain]
%%%
%%% \transdissolve
%%%
%%% <[center]
%%% <[columns]
%%% [[[.6\textwidth]]]
%%%
%%% [[[.3\textwidth]]]
%%% <<>>
%%% [columns]>
%%% [center]>
%%%
==== SMB features in Samba ====
<[center]
\Large
@https://wiki.samba.org/index.php/Samba3/SMB3@
[center]>
%%% ==== Misc ====[plain]
%%%
%%% %\transdissolve
%%%
%%% <[center]
%%% <<>>
%%% [center]>
%%% ==== Misc ====
%%%
%%% <[block]{File Systems}
%%% * gpfs, gluster, ceph, btrfs...
%%% * support through vfs modules
%%% * fuse-based: avoid context switches
%%% * instrument SMB3 storage features (fsctls)
%%% [block]>
%%%
%%% ==== Misc ====
%%%
%%% %%<[block]{Under the hood}
%%% %%* restructurings, reconciliations
%%% %%* ctdb moved into samba tree
%%% %%* published libs: talloc, tdb, tevent ...
%%% %%[block]>
%%%
%%% <[block]{Testing}
%%% * unprivileged selftest, autobuild
%%% * self-contained testing: wrappers
%%% ** socket wrapper
%%% ** nss wrapper
%%% ** uid wrapper
%%% ** resolv wrapper [_red_new_]
%%% * externalized as separate projects:
%%% ** ==> @http://cwrap.org/@
%%% ** git on samba.org
%%% ** ==> Andreas Schneider's talk
%%% [block]>
%%% ==== Forecast: Cloudy ====
%%%
%%% <[block]{Possible involvement with OpenStack}
%%% * SMB storage service for Windows (and other) VMs
%%% * SMB3 storage backend for Hyper-V images
%%% * also: chances for AD-integration into auth
%%% [block]>
%% ==== Credits ====
%%
%% <[block]{especially but not exclusively}
%% * Volker Lendecke
%% * Stefan Metzmacher
%% * Ralph Böhme
%% * Jeremy Allison
%% * David Disseldorp
%% * Andreas Schneider
%% [block]>
%%% ==== Conclusion ====[plain]
%%%
%%% %%\transdissolve
%%%
%%% <<>>
%%%
%%%
%%% ==== Conclusion ====
%%%
%%% <[block]{Remember}
%%% * Samba 4.X is quite different from 3.Y
%%% [block]>
%%%
%%% <[block]{What's coming?}
%%% * Performance: the story continues
%%% * Interop: strengthen strengths
%%% * SMB(3) features: a lot to come ( ==> cluster, hyper-v, ...)
%%% * Some clouds in the sky...
%%% [block]>
[frame]>
<[sambabg]
==== Thanks for your attention! ====[plain]
%\transdissolve
<[center]
<[columns]
[[[.6\textwidth]]]
{\Large
Questions?
--*3em--
@obnox\@samba.org@
--*.5em--
@madam\@redhat.com@
}
[[[.3\textwidth]]]
<<>>
%<<>>
[columns]>
[center]>
[frame]>
[sambabg]>
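The Multi-Channel slides plan to hand a newly accepted TCP connection to the smbd that already serves the same client GUID, using fd-passing over the messaging sockets (sendmsg/recvmsg). A minimal sketch of just that POSIX mechanism follows, assuming an AF_UNIX datagram socket and SCM_RIGHTS ancillary data. The helpers send_fd() and recv_fd(), and carrying a ClientGUID-like tag as the payload, are hypothetical illustrations and not Samba's messaging API.

```c
/* Hypothetical sketch: pass one open descriptor (e.g. an accepted TCP
 * connection) between processes over an AF_UNIX socket via SCM_RIGHTS.
 * Not Samba's messaging API; names are illustrative only. */
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer for exactly one int; the union enforces cmsghdr alignment. */
union fd_cmsg_buf {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
};

/* Send fd_to_pass plus a small payload (hypothetically a client GUID).
 * Returns 0 on success, -1 on error. */
static int send_fd(int sock, int fd_to_pass, const void *payload, size_t len)
{
    union fd_cmsg_buf ctrl;
    struct iovec iov;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&ctrl, 0, sizeof(ctrl));
    memset(&msg, 0, sizeof(msg));
    iov.iov_base = (void *)payload;
    iov.iov_len = len;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    /* A single SCM_RIGHTS control message carrying one descriptor. */
    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd_to_pass, sizeof(int));

    return sendmsg(sock, &msg, 0) == (ssize_t)len ? 0 : -1;
}

/* Receive up to len payload bytes and one passed descriptor.
 * Returns the new descriptor in this process, or -1 if none arrived. */
static int recv_fd(int sock, void *payload, size_t len)
{
    union fd_cmsg_buf ctrl;
    struct iovec iov;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd = -1;

    memset(&msg, 0, sizeof(msg));
    iov.iov_base = payload;
    iov.iov_len = len;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    /* MSG_CTRUNC means the control data (and thus the fd) was cut off. */
    if (recvmsg(sock, &msg, 0) < 0 || (msg.msg_flags & MSG_CTRUNC))
        return -1;

    for (cmsg = CMSG_FIRSTHDR(&msg); cmsg != NULL;
         cmsg = CMSG_NXTHDR(&msg, cmsg)) {
        if (cmsg->cmsg_level == SOL_SOCKET &&
            cmsg->cmsg_type == SCM_RIGHTS &&
            cmsg->cmsg_len == CMSG_LEN(sizeof(int))) {
            memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
        }
    }
    return fd;
}
```

Because the kernel holds a reference to a descriptor while it is in flight, the sender can close its own copy as soon as sendmsg() succeeds; the receiver gets a new descriptor number that refers to the same open connection.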