%
% colors:
% _blue_text text_
% _red_text text_
%

==== Samba... ====

<[center]
<<>>
[center]>

==== Short History ====

* 1.9.17: 1996/08
* 2.0: 1999/01: domain-member, +SWAT
* 2.2: 2001/04: NT4-DC
* 3.0: 2003/09: AD-member, Samba4 project started
* 3.2: 2008/07: GPLv3, experimental clustering
* 3.3: 2009/01: clustering
* 3.4: 2009/07: merged S3+S4 code
* 3.5: 2010/03: experimental SMB 2.0
* 3.6: 2011/09: SMB 2.0
* 4.0: 2012/12: AD/DC, SMB 2.0 durable handles, 2.1, 3.0
* 4.1: 2013/10: stability
* 4.2: soon: AD trusts, performance, scalability, CTDB included

==== Release Stream ====

<[center]
<<>>
[center]>

==== Release Planning ====

<[center]
\large
@https://wiki.samba.org/index.php/Samba\_Release\_Planning@
[center]>

==== Samba Team ====

<[center]
<<>>
[center]>

==== Samba Team ====

<[center]
<<>>
[center]>

==== ====[plain]

%%\transdissolve

<[center]
<[columns]
[[[.3\textwidth]]]
<<>>
[[[.3\textwidth]]]
<<>>
[[[.3\textwidth]]]
<<>>
[columns]>
[center]>

==== Samba File Server Topics / Challenges ====

# performance: scalable file server
#* scale-up: make full use of powerful boxes
#* scale-out: flexible all-active clusters
#* scale-down: perform well on low-end boxes
# interop: multi-protocol access (nfs, afp, ...)
# server workloads / SMB features
#* tune for: small \# of connections, threaded applications
#* Hyper-V, ...
#* SMB3 (clustering, RDMA, ...)
# special file systems support (gluster, ceph, gpfs, btrfs, ...)
# cloud / openstack?...
%* (samba $\leftrightarrow$ cifs.ko alternative to nfs?...)

%% ==== Samba File Serving Topics ====
%%
%% * Performance
%% * Clustering (CTDB)
%% * SMB features (SMB3...)
%% * Interop (protocols, NFS, AFP, ...)
%% * special file systems support (gluster, ceph, gpfs, btrfs...)
%% * ...

%%==== Other Samba Topics ====
%%
%%* Auth/Domain Member
%%* RPC server
%%* AD Server
%%* ...
==== Performance ====[plain]

%%\transdissolve

<<>>

==== Performance - low end systems ====

<[block]{Reducing CPU usage on low-end platforms like ARM (SMB2)}
* Samba 4.0:
** did not saturate a 1G NIC (ARM), CPU at 100\%
* reduced memory allocations
* make use of SMB 2.1 multi-credit / large MTU
* Samba 4.2:
** saturates a 1G NIC (ARM), CPU $<$ 100\%
* ==> continuing
[block]>

==== Performance - DB performance ====

<[block]{TDB}
* trivial database
* used for IPC (smbd processes)
* cluster (CTDB): local copies
[block]>

<[block]{hot databases}
* @locking.tdb@ (open files)
* @brlock.tdb@ (byte range locks)
* @notify\_index.tdb@ (for change notify)
[block]>

==== Performance - DB performance ====

<[block]{problem 1}
* fcntl byte-range locks used as record locks
* contention on a single kernel spinlock
[block]>

<[block]{solution}
* alternative to fcntl: pthread robust mutexes
* ==> massive speedup
* ==> included in TDB 1.3.1, Samba 4.2
[block]>

==== Performance - DB performance ====

<[block]{problem 2}
* freelist:
** single chain, contended (@locking.tdb@)
** gets fragmented (singly linked)
* especially a problem in a ctdb cluster: vacuuming
[block]>

<[block]{improvements}
* make use of small per-record freelists (dead records)
* add automatic defragmentation upon traversal
* ==> included in TDB 1.3.1, Samba 4.2
[block]>

==== Performance - DB performance ====

<[block]{problem 3}
* change notify not scalable
[block]>

<[block]{first improvement}
* restructured @notify.tdb@ into
** global @notify\_index.tdb@ and
** local @notify.tdb@
** ==> better, but still not good enough for some workloads
[block]>

<[block]{next steps}
* replace the DB approach with a new scalable, async notify daemon using messaging
* some false positives do no harm
* ==> TODO
[block]>

==== Performance - scaling ====

<[block]{parallelism}
* samba is multi-process:
** smbd child process $\leftrightarrow$ TCP connection
** event-loop in one process
* within an smbd process:
** pthread-pool jobs for potentially blocking syscalls
** ==> parallelism for reads/writes
** default for async I/O since Samba 4.0
[block]>

==== Performance - scaling ====

<[block]{messaging}
* classical messaging:
** @messages.tdb@ and signals between processes
** does not scale well
* new messaging in Samba 4.2:
** fast and scalable messaging based on unix datagram sockets
** ==> WIP: integrate with AD/DC messaging
** ==> features fd-passing for sockets (SMB3 multi-channel)
** ==> TODO: integrate into CTDB inter-node messaging
[block]>

==== Interop ====[plain]

%\transdissolve

<[center]
<<>>
[center]>

==== Interop-Central ====

<[block]{multi-protocol access}
* nfs (kernel, ganesha, ...)
* afp: netatalk
* local access
* SMB2+ unix-extensions
[block]>

==== File Server Layout/Scope ====

<[center]
<<>>
[center]>

==== Interop - Fruit ====

<[columns]
[[[.9\textwidth]]]
* MacOS 10.9: SMB 2.1 is the preferred file protocol
* @vfs\_fruit@ - new module in Samba 4.2
[[[.05\textwidth]]]
[columns]>

<[columns]
[[[.55\textwidth]]]
* spotlight
** indexed search
** dcerpc service
** ==> under review
* AAPL
** SMB2 create context
** speed up directory listings
** ==> under review
[[[.4\textwidth]]]
<<>>
[columns]>

==== ====[plain]

<[center]
\Large
Fruit Demo
[center]>

==== SMB features ====[plain]

%\transdissolve

<[center]
<[columns]
[[[.6\textwidth]]]
[[[.3\textwidth]]]
<<>>
[columns]>
[center]>

==== SMB features in Samba - SMB2 ====

<[center]
<[columns]
[[[.7\textwidth]]]
* SMB 2.0 (Vista / 2008):
** durable file handles [4.0]
* SMB 2.1 (Win7 / 2008R2):
** multi-credit / large MTU [4.0]
** dynamic reauthentication [4.0]
** leasing [WIP++]
** resilient file handles [WIP-tracer]
[[[.3\textwidth]]]
<<>>
[columns]>
[center]>

==== SMB features in Samba - SMB3 ====

<[center]
<[columns]
[[[.7\textwidth]]]
* SMB 3.0 (Win8 / 2012):
** new crypto (sign/encrypt) [4.0]
** secure negotiation [4.0]
** durable handles v2 [4.0]
** persistent file handles [WIP-tracer]
** multi-channel [WIP+]
** SMB direct [designed/starting]
** cluster features [designing]
*** witness [WIP]
** storage features [WIP]
* SMB 3.02 (Win8.1 / 2012R2): [WIP]
* SMB 3.1 (Win10 / 2014): [ess.DONE]
[[[.3\textwidth]]]
<<>>
[columns]>
[center]>

%%==== ====[plain]
%%
%%old
%%
%
%%==== Clustered Samba / CTDB (SOFS since 2007) ====
%%
%%<[center]
%%<<>>
%%[center]>

%%% ==== SMB 3.0 ====
%%%
%%% \transdissolve
%%%
%%% +<2->{
%%% * new crypto (signing, transport encryption)
%%% * persistent file handles
%%% * multi-channel
%%% * RDMA transport (SMB direct)
%%% * storage features
%%% * clustering
%%% ** witness
%%% ** transparent failover (continuous availability)
%%% ** all-active (scale-out)
%%% }
%%%
%%% ==== SMB3 - Goals ====
%%%
%%% \transdissolve
%%%
%%% +<2->{
%%% * fault tolerance / reliability
%%% * performance / throughput / scaling
%%% * focus on support for server workloads \\ %
%%% (as opposed to workstation workloads)
%%% * especially support for:
%%% ** Hyper-V
%%% ** MS-SQL
%%% * goals:
%%% ** replace block storage in data center
%%% ** block (SCSI) over SMB
%%% }
%%%
%%% ==== Requirements for Hyper-V ====
%%%
%%% \transdissolve
%%%
%%% +<2->{
%%% * minimum requirements:
%%% ** SMB 3.0
%%% ** is that really all? Maybe resilient file handles...
%%% }
%%% +<3->{
%%% * desired features:
%%% ** cluster ($\ge 2$ nodes)
%%% ** CA / persistent handles
%%% ** RDMA / SMB direct
%%% ** multi channel
%%% }

%%% ==== SMB Protocol in Samba ====
%%%
%%% \transdissolve
%%%
%%% +<2->{
%%% * Samba $<$ 3.5:
%%% ** SMB 1
%%% * Samba 3.5:
%%% ** experimental incomplete support for SMB 2.0
%%% * Samba 3.6:
%%% ** official support for SMB 2.0
%%% ** missing: durable handles
%%% ** default server max proto: SMB 1
%%% * Samba 4.0:
%%% ** SMB 2.0: complete with durable handles
%%% ** SMB 2.1: basis, multi-credit, dynamic reauthentication
%%% ** SMB 3.0: basis, crypto, secure negotiation, durable v2
%%% ** default server max proto: SMB 3.0
%%% * Samba 4.1
%%% ** SMB 3.02: basic
%%% }

%%% ==== ==== [plain]
%%% <[center]
%%% {\Large
%%% Technical Details...
%%% }
%%% [center]>

%%% ==== ====[plain]
%%%
%%% \transdissolve
%%%
%%% <<>>
%%%
%%%
%%%

==== Multi-Channel - Windows/Protocol ====

* find interfaces with interface discovery: \\ %
@FSCTL\_QUERY\_NETWORK\_INTERFACE\_INFO@
* bind an additional TCP (or RDMA) connection (channel) to an established SMB3 session (session bind)
* windows: uses connections of the same (and best) quality
* windows: binds only to a single node
* replay / retry mechanisms, epoch numbers

==== Multi-Channel - Samba ====

* samba/smbd: multi-process
** process $\Leftrightarrow$ tcp connection
** ==> transfer new connection to existing smbd
** use fd-passing (sendmsg/recvmsg)
* preparation: messaging rewrite using unix datagram sockets with sendmsg [DONE,4.2]
* add fd-passing [DONE,4.2]
* transfer connection already in negprot (based on ClientGUID) [ess.DONE]
* implement channel epoch numbers [WIP]
* implement interface discovery [WIP]

==== Multi-Channel - Samba ====

<[center]
<<>>
[center]>

==== ====[plain]

<[center]
\Large
Multi-Channel Demo
[center]>

==== SMB Direct (RDMA) ====

* windows:
** requires multi-channel
** start with TCP, bind an RDMA channel
** reads and writes use RDMA write/read
** protocol/metadata via send/receive
* wireshark dissector: [DONE]
* samba (TODO):
** prereq: multi-channel / fd-passing
** buffer / transport abstractions [TODO]
** _red_problem_: libraries: not fork-safe and no fd-passing \\ %
==> central daemon (or kernel module) to serve as RDMA "proxy"

==== SMB Direct (RDMA) - Plan ====

<[center]
<<>>
[center]>

%%%==== SMB Direct (RDMA) - Plan ====
%%%
%%%+<2->{
%%%* smbd-d (rdma proxy daemon)
%%%** listens on unix domain socket (@/var/lib/smbd-d/socket@)
%%%** listens for RDMA connection (as told by main smbd)
%%%* main smbd:
%%%** listens for TCP connections
%%%** connects to smbd-d-socket
%%%*** request rdma-interfaces, tell smbd-d on which to listen
%%%** "accepts" new smb-direct connections on smbd-d-socket
%%%}
%%%
%%%==== SMB Direct (RDMA) - Plan ====
%%%
%%%+<2->{
%%%* client
%%%** connects via TCP --> smbd forks child smbd (c)
%%%** connects via RDMA to smbd-d
%%%* smbd-d
%%%** creates socket-pair as rdma-proxy-channel
%%%** passes one end of socket-pair to main smbd for accept
%%%** sends smb direct packets over proxy-channel
%%%* main smbd
%%%** upon receiving NegProt: pass proxy-socket to c based on ClientGUID
%%%* c
%%%** continues proxy-communication with smbd-d
%%%}
%%%+<3->{
%%%* For @rdma\_read@ and @rdma\_write@:
%%%** c and smbd-d establish shared memory area
%%%}

%%% ==== Persistent Handles ====
%%%
%%% \transdissolve
%%%
%%% +<2->{
%%% * like durable file handles with strong guarantees
%%% * framework is already there in samba (by support for durable v2)
%%% ** ==> easy to satisfy at the protocol level
%%% }
%%% +<3->{
%%% * the difficulty lies in implementing the guarantees
%%% ** need to make metadata persistent
%%% ** but don't kill performance!
%%% ** persistent tdbs !would! kill performance
%%% ** ideas:
%%% *** need to be sync
%%% *** record-level transactions (instead of db-level)
%%% *** only replicate to some nodes, not all
%%% }

%%==== Clustering Concepts (Windows) ====
%%
%%\transdissolve
%%
%%+<2->{
%%* Cluster:
%%** (``traditional'') failover cluster (active-passive)
%%** protocol: @SMB2\_SHARE\_CAP\_CLUSTER@
%%** Windows:
%%*** runs off a cluster (failover) volume
%%*** offers the Witness service
%%}
%%+<3->{
%%* Scale-Out (SOFS):
%%** scale-out cluster (all-active!)
%%** protocol: @SMB2\_SHARE\_CAP\_SCALEOUT@
%%** no client caching
%%** Windows: runs off a cluster shared volume (implies cluster)
%%}
%%+<4->{
%%* Continuous Availability (CA):
%%** transparent failover, persistent handles
%%** protocol: @SMB2\_SHARE\_CAP\_CONTINUOUS\_AVAILABILITY@
%%** can be independently turned on on any cluster share (failover or scale-out)
%%** ==> changed client retry behaviour!
%%}

%%% ==== Clustering -- Controlling Flags from Windows ====
%%%
%%% \transdissolve
%%%
%%% +<2->{
%%% * a share on a cluster carries
%%% ** @SMB2\_SHARE\_CAP\_CLUSTER@ $\Leftrightarrow$ the shared FS is a cluster volume.
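
==== fd-passing: a sketch ====

The multi-channel and SMB Direct slides rely on fd-passing, where an accepted connection or one end of a socketpair is handed to another smbd process. On POSIX systems this is typically done with sendmsg()/recvmsg() and SCM\_RIGHTS ancillary data over a unix domain socket. The C code below is a minimal sketch of that generic mechanism under that assumption. It is not Samba's messaging code, and the helper names @send\_fd()@, @recv\_fd()@ and @demo\_pass\_fd()@ are hypothetical. It uses a SOCK\_DGRAM socketpair to mirror the datagram-based messaging described earlier.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: send one descriptor as SCM_RIGHTS ancillary data.
 * One byte of regular data accompanies it, since a message must carry data. */
static int send_fd(int sock, int fd)
{
    char byte = 'F';
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {                       /* union guarantees cmsghdr alignment */
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&msg, 0, sizeof(msg));
    memset(&ctrl, 0, sizeof(ctrl));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one descriptor; returns it, or -1 if the
 * message carried no SCM_RIGHTS payload. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd = -1;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}

/* Hypothetical demo: pass a pipe's write end over a datagram socketpair,
 * write through the received copy and read the bytes back from the pipe. */
static int demo_pass_fd(void)
{
    int sp[2], pipefd[2], received, ok;
    char out[3] = {0};

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sp) != 0)
        return 0;
    if (pipe(pipefd) != 0) {
        close(sp[0]);
        close(sp[1]);
        return 0;
    }

    ok = send_fd(sp[0], pipefd[1]) == 0;
    received = ok ? recv_fd(sp[1]) : -1;
    ok = received >= 0
         && write(received, "ok", 2) == 2
         && read(pipefd[0], out, 2) == 2
         && memcmp(out, "ok", 2) == 0;

    if (received >= 0)
        close(received);
    close(pipefd[0]);
    close(pipefd[1]);
    close(sp[0]);
    close(sp[1]);
    return ok;
}
```

The receiver gets a new descriptor number that refers to the same open file description. Each process must close its own copy, which is why @demo\_pass\_fd()@ closes the received descriptor as well as the originals.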
%%% }
%%% +<3->{
%%% * a share on a cluster carries
%%% ** @SMB2\_SHARE\_CAP\_SCALEOUT@ $\Leftrightarrow$ the shared FS is a CSV
%%% *** implies @SMB2\_SHARE\_CAP\_CLUSTER@
%%% }
%%% +<4->{
%%% * independently settable on a clustered share:
%%% ** @SMB2\_SHARE\_CAP\_CONTINUOUS\_AVAILABILITY@
%%% *** implies @SMB2\_SHARE\_CAP\_CLUSTER@
%%% }
%%%
%%==== Clustering -- Server Behaviour ====
%%
%%\transdissolve
%%
%%+<2->{
%%* @SMB2\_SHARE\_CAP\_CLUSTER@:
%%** run witness service (RPC)
%%** client can register and get notified about resource changes
%%}
%%+<3->{
%%* @SMB2\_SHARE\_CAP\_SCALEOUT@:
%%** do not grant batch oplocks, write leases, handle leases
%%** ==> no durable handles unless also CA
%%}
%%+<4->{
%%* @SMB2\_SHARE\_CAP\_CONTINUOUS\_AVAILABILITY@:
%%** offer persistent handles
%%** timeout from durable v2 request
%%}
%%
%%==== Clustering -- Client Behaviour (Win8) ====
%%
%%\transdissolve
%%
%%
%%+<2->{
%%* @SMB2\_SHARE\_CAP\_CLUSTER@:
%%** clients happily work if witness is not available
%%}
%%+<3->{
%%* @SMB2\_SHARE\_CAP\_SCALEOUT@:
%%** clients happily connect if @CLUSTER@ is not set.
%%** clients DO request oplocks/leases/durable handles
%%** clients are not confused if they get these
%%}
%%+<4->{
%%* @SMB2\_SHARE\_CAP\_CONTINUOUS\_AVAILABILITY@:
%%** clients happily connect if @CLUSTER@ is not set.
%%** clients typically request persistent handle with RWH lease
%%}
%%%+<5->{
%%%* Note:\\ %
%%%Win8 sends @SMB2\_FLAGS\_REPLAY\_OPERATION@ in writes and reads (from 2nd in a row) \\ %
%%%$\Leftrightarrow$ \\ %
%%%The server announces @SMB2\_CAP\_PERSISTENT\_HANDLES@.
%%%}

%%% ==== Clustering -- Client Behaviour (Win8) : Retries ====
%%%
%%% +<2->{
%%% * Test: Win8 against slightly pimped Samba (2 IPs)
%%% }
%%% +<3->{
%%% * Server-Matrix (on/off):
%%% ** persistent handle cap
%%% ** durable handles
%%% ** cluster share cap
%%% ** scale out cap
%%% ** ca share cap
%%% }
%%% +<4->{
%%% * The test:
%%% ** connect to share with explorer
%%% ** start copying file (2G)
%%% ** kill smbd
%%% ** wait for the client to pop up an error dialog
%%% ** click cancel
%%% ** stop capture
%%% }
%%%
%%% ==== Clustering -- Client Behaviour (Win8) : Retries ====
%%%
%%% +<2->{
%%% * only two different retry characteristics: CA $\leftrightarrow$ non-CA
%%% }
%%% +<3->{
%%% * non-CA-case
%%% ** 3 consecutive attempt rounds:
%%% *** for each of the two IPs: \\ %
%%% arp IP \\ %
%%% three tcp syn attempts to IP with 0.5 sec breaks
%%% ** ==> some 2.1 seconds for 1 round
%%% ** between attempts:
%%% ** dns, ping, arp ... 5.8 seconds
%%% ** ==> _red_18 seconds_
%%% }
%%% +<4->{
%%% * CA-Case
%%% ** retries attempt rounds from above for _red_14 minutes_
%%% }
%%%
%%%
%%%
%%% ==== ====[plain]
%%%
%%% \transdissolve
%%%
%%% <[center]
%%% <<>>
%%% [center]>
%%%
%%%
%%==== Clustering with Samba/CTDB ====
%%
%%+<2->{
%%* all-active SMB-cluster with Samba and CTDB... \\ %
%%+<3->{...since 2007! \smiley }
%%}
%%+<4->{
%%* transparent for the client
%%** CTDB:
%%*** metadata and messaging engine for Samba in a cluster
%%*** plus cluster resource manager (IPs, services...)
%%** client only sees one ``big'' SMB server
%%** we could not change the client!...
%%** works ``well enough''
%%}
%%+<5->{
%%* challenge:
%%** how to integrate SMB3 clustering with Samba/CTDB
%%** good: rather orthogonal
%%** ctdb-clustering transparent mostly due to management
%%}
%%
%%==== Witness Service ====
%%
%%+<2->{
%%* an RPC service
%%** monitoring of availability of resources (shares, NICs)
%%** server asks client to move to another resource
%%}
%%+<3->{
%%* remember:
%%** available on a Windows SMB3 share $\Leftrightarrow$ @SMB2\_SHARE\_CAP\_CLUSTER@
%%** but clients happily connect w/o witness
%%}
%%+<4->{
%%* status in Samba [WIP (Metze, Gregor Beck)]:
%%** async RPC: WIP, good progress ($\Rightarrow$ Metze's talk)
%%** wireshark dissector: essentially done
%%** client: in @rpcclient@ - done
%%** server: dummy PoC / tracer bullet implementation done
%%** CTDB: changes / integration needed
%%}

%%% ==== ====[plain]
%%%
%%% <[center]
%%% {\Large
%%% !@https://wiki.samba.org/index.php/SMB3@!
%%% }
%%% [center]>
%%%
%%% ==== ====[plain]
%%%
%%% \transdissolve
%%%
%%% <[center]
%%% <[columns]
%%% [[[.6\textwidth]]]
%%%
%%% [[[.3\textwidth]]]
%%% <<>>
%%% [columns]>
%%% [center]>
%%%

==== SMB features in Samba ====

<[center]
\Large
@https://wiki.samba.org/index.php/Samba3/SMB3@
[center]>

==== Misc ====[plain]

%\transdissolve

<[center]
<<>>
[center]>

==== Misc ====

<[block]{File Systems}
* gpfs, gluster, ceph, btrfs...
* support through vfs modules
* for fuse-based file systems: avoid context switches
* leverage SMB3 storage features (fsctls)
[block]>

==== Misc ====

%%<[block]{Under the hood}
%%* restructurings, reconciliations
%%* ctdb moved into samba tree
%%* published libs: talloc, tdb, tevent ...
%%[block]>

<[block]{Testing}
* unprivileged selftest, autobuild
* self-contained testing: wrappers
** socket wrapper
** nss wrapper
** uid wrapper
** resolv wrapper [_red_new_]
* externalized as separate projects:
** ==> @http://cwrap.org/@
** git on samba.org
** ==> Andreas Schneider's talk
[block]>

==== Forecast: Cloudy ====

<[block]{Possible involvement with OpenStack}
* SMB storage service for Windows (and other) VMs
* SMB3 storage backend for Hyper-V images
* also: opportunities for AD integration into auth
[block]>

==== Credits ====

<[block]{especially but not exclusively}
* Volker Lendecke
* Stefan Metzmacher
* Ralph Böhme
* Jeremy Allison
* David Disseldorp
* Andreas Schneider
[block]>

==== Conclusion ====[plain]

%%\transdissolve

<<>>

==== Conclusion ====

<[block]{Remember}
* Samba 4.X is quite different from 3.Y
[block]>

<[block]{What's coming?}
* Performance: the story continues
* Interop: strengthen strengths
* SMB(3) features: a lot to come (==> cluster, Hyper-V, ...)
* Some clouds in the sky...
[block]>

==== Thanks for your attention! ====[plain]

%\transdissolve

<[center]
<[columns]
[[[.6\textwidth]]]
{\Large
Questions?

--*4em--

@obnox\@samba.org@

@madam\@redhat.com@
}
[[[.3\textwidth]]]
<<>>
[columns]>
[center]>