traffic_replay: Write group memberships once per group
author Tim Beale <timbeale@catalyst.net.nz>
Wed, 31 Oct 2018 20:42:33 +0000 (09:42 +1300)
committer Tim Beale <timbeale@samba.org>
Sun, 4 Nov 2018 22:55:16 +0000 (23:55 +0100)
Each user-group membership was being written to the DB in its own
operation. With large numbers of users (e.g. 10,000 users, each in 15
groups on average), this adds up to a lot of operations (e.g. 150,000).
This patch reworks the code so that we write the memberships for a
group in one operation. E.g. instead of 150,000 DB operations, we might
make 1,500. This makes writing the group memberships several times
faster.

Note that there is a performance vs memory tradeoff. When we hit
10,000+ members in a group, memory usage in the underlying DB modify
operation becomes very inefficient/costly. So we avoid potential memory
usage problems by writing no more than 1,000 users to a group at once.

Signed-off-by: Tim Beale <timbeale@catalyst.net.nz>
Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
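
The batching arithmetic described above can be sketched in isolation. This is a
minimal illustration, not the code from the patch: the helper names
chunk_members and count_modifies are hypothetical, and the group distribution
(1,500 groups of 100 members each) is an assumption chosen only because it
reproduces the 150,000 vs 1,500 figures quoted in the message. No DB is touched.

```python
def chunk_members(users, chunk_size=1000):
    """Yield successive slices of at most chunk_size users (hypothetical helper)."""
    for start in range(0, len(users), chunk_size):
        yield users[start:start + chunk_size]


def count_modifies(group_sizes, chunk_size=1000):
    """Count the DB modify calls needed if each call writes one chunk."""
    # Ceiling division per group; an empty group contributes zero writes.
    return sum(-(-size // chunk_size) for size in group_sizes)


# Assumed distribution, for illustration only: 1,500 groups of 100 members.
group_sizes = [100] * 1500

# One modify per membership (the old behaviour) vs one per chunk (the new one).
per_membership = sum(group_sizes)
per_chunk = count_modifies(group_sizes)

# A single oversized group is split, so no write exceeds 1,000 members.
big_group_chunks = [len(c) for c in chunk_members(list(range(2500)))]
```

Under these assumptions per_membership is 150,000 and per_chunk is 1,500. A
group of 2,500 members would be written as three modifies, which trades a few
extra operations for bounded memory use per modify.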
python/samba/emulate/traffic.py

index 0087b03..ab23652 100644
@@ -1944,24 +1944,41 @@ class GroupAssignments(object):
 def add_users_to_groups(db, instance_id, assignments):
     """Takes the assignments of users to groups and applies them to the DB."""
 
+    for group in assignments.get_groups():
+        users_in_group = assignments.users_in_group(group)
+        if len(users_in_group) == 0:
+            continue
+
+        # Split up the users into chunks, so we write no more than 1K at a
+        # time. (Minimizing the DB modifies is more efficient, but writing
+        # 10K+ users to a single group becomes inefficient memory-wise)
+        for chunk in range(0, len(users_in_group), 1000):
+            chunk_of_users = users_in_group[chunk:chunk + 1000]
+            add_group_members(db, instance_id, group, chunk_of_users)
+
+
+def add_group_members(db, instance_id, group, users_in_group):
+    """Adds the given users to group specified."""
+
+    start = time.time()
     ou = ou_name(db, instance_id)
 
     def build_dn(name):
         return("cn=%s,%s" % (name, ou))
 
-    for group in assignments.get_groups():
-        for user in assignments.users_in_group(group):
-            user_dn  = build_dn(user_name(instance_id, user))
-            group_dn = build_dn(group_name(instance_id, group))
-
-            m = ldb.Message()
-            m.dn = ldb.Dn(db, group_dn)
-            m["member"] = ldb.MessageElement(user_dn, ldb.FLAG_MOD_ADD, "member")
-            start = time.time()
-            db.modify(m)
-            end = time.time()
-            duration = end - start
-            LOGGER.info("%f\t0\tadd\tuser\t%f\tTrue\t" % (end, duration))
+    group_dn = build_dn(group_name(instance_id, group))
+    m = ldb.Message()
+    m.dn = ldb.Dn(db, group_dn)
+
+    for user in users_in_group:
+        user_dn = build_dn(user_name(instance_id, user))
+        idx = "member-" + str(user)
+        m[idx] = ldb.MessageElement(user_dn, ldb.FLAG_MOD_ADD, "member")
+
+    db.modify(m)
+    end = time.time()
+    duration = end - start
+    LOGGER.info("%f\t0\tadd\tuser(s)\t%f\tTrue\t" % (end, duration))
 
 
 def generate_stats(statsdir, timing_file):