doc/README.wmem

$Id$

1. Introduction

NB: Wmem still does not provide all of the functionality of emem
    (see README.malloc), although it should provide most of it. New code
    may still need to use emem for the time being.

The 'emem' memory manager (described in README.malloc) has been a part of
Wireshark since 2005 and has served us well, but is starting to show its age.
The framework has become increasingly difficult to maintain, and limitations
in the API have blocked progress on other long-term goals such as
multi-threading and opening multiple files at once.

The 'wmem' memory manager is an attempt to write a new memory management
framework to replace emem. It provides a significantly updated API, a more
modular design, and it isn't all jammed into one 2500-line file.

Wmem was originally conceived in this email to the wireshark-dev mailing list:
https://www.wireshark.org/lists/wireshark-dev/201210/msg00178.html

The wmem code can now be found in epan/wmem/ in the Wireshark source tree.

2. Usage for Consumers

If you're writing a dissector, or other "userspace" code, then using wmem
should be very similar to using emem. All you need to do is include the header
(epan/wmem/wmem.h) and get a handle to a memory pool (if you want to *create*
a memory pool, see the section "3. Usage for Producers" below).

A memory pool is an opaque pointer to an object of type wmem_allocator_t, and
it is the very first parameter passed to almost every call you make to wmem.
Other than that parameter (and the fact that functions are prefixed wmem_
instead of ep_ or se_) usage is exactly like that of emem. For example:

    wmem_alloc(myPool, 20);

allocates 20 bytes in the pool pointed to by myPool.

2.1 Available Pools

2.1.1 (Sort Of) Global Pools

Dissectors that include the wmem header file will have three pools available
to them automatically: wmem_packet_scope(), wmem_file_scope() and
wmem_epan_scope().

The packet pool is scoped to the dissection of each packet, replacing
emem's ep_ allocators. The file pool is scoped to the dissection of each file,
replacing emem's se_ allocators. For example:

    ep_malloc(32);
    se_malloc(sizeof(guint));

could be replaced with

    wmem_alloc(wmem_packet_scope(), 32);
    wmem_alloc(wmem_file_scope(),   sizeof(guint));

NB: Using these pools outside of the appropriate scope (e.g. using the packet
    pool when there isn't a packet being dissected) will trigger an assertion
    failure. See the comment in epan/wmem/wmem_scopes.c for details.

The epan pool is scoped to the library's lifetime - memory allocated in it is
not freed until epan_cleanup() is called, which is typically at the end of the
program.

2.1.2 Pinfo Pool

Certain places (such as AT_STRINGZ address allocations) need their memory to
stay around a little longer than the usual packet scope - basically until the
next packet is dissected. This is effectively the scope of Wireshark's pinfo
structure, so the pinfo struct has a 'pool' member which is a wmem pool scoped
to the lifetime of the pinfo struct.

2.2 API

Full documentation for each function (parameters, return values, behaviours)
lives (or will live) in Doxygen format in the header files for those functions.
This is just an overview of which header files you should be looking at.

2.2.1 Core API

wmem_core.h
 - Basic memory management functions like malloc, realloc and free.

2.2.2 Strings

wmem_strutl.h
 - Utility functions for manipulating null-terminated C-style strings.
   Functions like strdup and strdup_printf.

wmem_strbuf.h
 - A managed string object implementation, similar to std::string in C++ or
   GString from GLib.

2.2.3 Container Data Structures

wmem_slist.h
 - A singly-linked list implementation.

wmem_stack.h
 - A stack implementation (push, pop, etc).

3. Usage for Producers

NB: If you're just writing a dissector, you probably don't need to read
    this section.

One of the problems with the old emem framework was that there were basically
two allocator backends (glib and mmap) that were all mixed together in a mess
of if statements, environment variables and #ifdefs. In wmem the different
allocator backends are cleanly separated out, and it's up to the owner of the
pool to pick one.

3.1 Available Allocator Back-Ends

Each available allocator type has a corresponding entry in the
wmem_allocator_type_t enumeration defined in wmem_core.h.

The currently available allocators are:
 - WMEM_ALLOCATOR_SIMPLE (wmem_allocator_simple.*)
        A trivial allocator that g_allocs requested memory and tracks
        allocations via a GHashTable. As simple as possible, intended more as
        a demo than for practical usage. Also has the benefit of being friendly
        to tools like valgrind.
 - WMEM_ALLOCATOR_BLOCK (wmem_allocator_block.*)
        A block allocator that grabs large chunks of memory at a time
        (8 MB currently) and serves allocations out of those chunks.
        Designed for efficiency, especially in the free_all operation.
 - WMEM_ALLOCATOR_STRICT (wmem_allocator_strict.*)
        An allocator that does its best to find invalid memory usage via
        things like canaries and scrubbing freed memory. Valgrind is the
        better choice on platforms that support it.
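
To give a feel for why a block-style allocator can make free_all cheap, here
is a minimal standalone sketch of a fixed-size "bump" arena. It is NOT the code
in wmem_allocator_block.* (which is considerably more involved); the
toy_arena_* names, the 16-byte alignment and the single-chunk limit are all
assumptions made purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_ARENA_ALIGN 16 /* assumed alignment, not a value taken from wmem */

/* Hypothetical single-chunk arena: requests are carved off the front. */
typedef struct {
    unsigned char *chunk; /* one large block obtained up front */
    size_t         size;  /* total bytes in the chunk */
    size_t         used;  /* bytes handed out so far */
} toy_arena_t;

/* Grab the chunk. Returns 0 on success, -1 if the allocation fails. */
static int toy_arena_init(toy_arena_t *arena, size_t size)
{
    arena->chunk = malloc(size);
    arena->size  = arena->chunk != NULL ? size : 0;
    arena->used  = 0;
    return arena->chunk != NULL ? 0 : -1;
}

/* Serve a request from the chunk, or return NULL if it does not fit. */
static void *toy_arena_alloc(toy_arena_t *arena, size_t size)
{
    size_t left = arena->size - arena->used;
    size_t rounded;
    void  *ptr;

    assert(size != 0);
    if (size > left) {
        return NULL;
    }
    /* Round up so the next block starts on an aligned offset. */
    rounded = (size + TOY_ARENA_ALIGN - 1) & ~(size_t)(TOY_ARENA_ALIGN - 1);
    ptr = arena->chunk + arena->used;
    arena->used += rounded < left ? rounded : left;
    return ptr;
}

/* Freeing everything is a single store, however many blocks were served. */
static void toy_arena_free_all(toy_arena_t *arena)
{
    arena->used = 0;
}

/* Hand the chunk back to the C library. */
static void toy_arena_destroy(toy_arena_t *arena)
{
    free(arena->chunk);
    arena->chunk = NULL;
    arena->size  = 0;
    arena->used  = 0;
}
```

In this sketch individual blocks cannot be freed at all, which is the price
paid for the trivial free_all. A real allocator has to support free and
realloc of single blocks as well.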

3.2 Creating a Pool

To create a pool, include the regular wmem header and call the
wmem_allocator_new() function with the appropriate type value.
For example:

    #include "wmem/wmem.h"

    wmem_allocator_t *myPool;
    myPool = wmem_allocator_new(WMEM_ALLOCATOR_SIMPLE);

From here on in, you don't need to remember which type of allocator you used
(although allocator authors are welcome to expose additional allocator-specific
helper functions in their headers). The "myPool" variable can be passed around
and used as normal in allocation requests as described in section 2 of this
document.

3.3 Destroying a Pool

Regardless of which allocator you used to create a pool, it can be destroyed
with a call to the function wmem_destroy_allocator(). For example:

    #include "wmem/wmem.h"

    wmem_allocator_t *myPool;

    myPool = wmem_allocator_new(WMEM_ALLOCATOR_SIMPLE);

    /* Allocate some memory in myPool ... */

    wmem_destroy_allocator(myPool);

Destroying a pool will free all the memory allocated in it.

3.4 Reusing a Pool

It is possible to free all the memory in a pool without destroying it,
allowing it to be reused later. Depending on the type of allocator, doing this
(by calling wmem_free_all()) can be significantly cheaper than fully destroying
and recreating the pool. This method is therefore recommended, especially when
the pool would otherwise be scoped to a single iteration of a loop. For example:

    #include "wmem/wmem.h"

    wmem_allocator_t *myPool;

    myPool = wmem_allocator_new(WMEM_ALLOCATOR_SIMPLE);
    for (...) {

        /* Allocate some memory in myPool ... */

        /* Free the memory, faster than destroying and recreating
           the pool each time through the loop. */
        wmem_free_all(myPool);
    }
    wmem_destroy_allocator(myPool);

4. Internal Design

Despite being written in Wireshark's standard C90, wmem follows a fairly
object-oriented design pattern. Although efficiency is always a concern, the
primary goals in writing wmem were maintainability and preventing memory
leaks.

4.1 struct _wmem_allocator_t

The heart of wmem is the _wmem_allocator_t structure defined in the
wmem_allocator.h header file. This structure uses C function pointers to
implement a common object-oriented design pattern known as an interface (also
known as an abstract class to those who are more familiar with C++).

Different allocator implementations can provide exactly the same interface by
assigning their own functions to the members of an instance of the structure.
The structure has eight members in three groups.

4.1.1 Implementation Details

 - private_data
 - type

The private_data pointer is a void pointer that the allocator implementation can
use to store whatever internal structures it needs. This pointer is passed to
almost all of the other functions that the allocator implementation must
define.

The type field is an enumeration of type wmem_allocator_type_t (see
section 3.1). Its value is set by the wmem_allocator_new() function, not
by the implementation-specific constructor. This field should be considered
read-only by the allocator implementation.

4.1.2 Consumer Functions

 - alloc()
 - free()
 - realloc()

These function pointers should be set to functions with semantics obviously
similar to their standard-library namesakes. Each one takes an extra parameter
that is a copy of the allocator's private_data pointer.

Note that realloc() and free() are not expected to be called directly by user
code in most cases - they are primarily optimisations for use by data
structures that wmem might want to implement (it's hard, for example, to
implement a dynamically sized array without some form of realloc).

Also note that allocators do not have to handle NULL pointers or 0-length
requests in any way - those checks are done in an allocator-agnostic way
higher up in wmem. Allocator authors can assume that all incoming pointers
(to realloc and free) are non-NULL, and that all incoming lengths (to alloc
and realloc) are non-0.
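
The division of labour described above can be sketched as a thin layer that
filters out the NULL and 0-length cases before dispatching through the
function pointers. This is a standalone illustration, not the wmem source:
mini_allocator_t, the mini_* wrappers and the backend_* functions are
hypothetical names, and the policies chosen here (returning NULL for an empty
request, treating a realloc to length 0 as a free) are assumptions of the
sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, trimmed-down interface: consumer functions only. */
typedef struct {
    void  *private_data;
    void *(*alloc)(void *private_data, size_t size);
    void  (*free)(void *private_data, void *ptr);
    void *(*realloc)(void *private_data, void *ptr, size_t size);
} mini_allocator_t;

/* Allocator-agnostic layer: the NULL and 0-length cases stop here. */
static void *mini_alloc(mini_allocator_t *a, size_t size)
{
    if (size == 0) {
        return NULL; /* assumed policy for empty requests */
    }
    return a->alloc(a->private_data, size);
}

static void mini_free(mini_allocator_t *a, void *ptr)
{
    if (ptr != NULL) {
        a->free(a->private_data, ptr);
    }
}

static void *mini_realloc(mini_allocator_t *a, void *ptr, size_t size)
{
    if (ptr == NULL) {
        return mini_alloc(a, size);
    }
    if (size == 0) {
        mini_free(a, ptr);
        return NULL;
    }
    return a->realloc(a->private_data, ptr, size);
}

/* Backend: forwards to the C library and asserts the guarantees it relies
 * on. private_data points at a counter of calls that reached the backend. */
static void *backend_alloc(void *private_data, size_t size)
{
    assert(size != 0);
    ++*(int *)private_data;
    return malloc(size);
}

static void backend_free(void *private_data, void *ptr)
{
    assert(ptr != NULL);
    ++*(int *)private_data;
    free(ptr);
}

static void *backend_realloc(void *private_data, void *ptr, size_t size)
{
    assert(ptr != NULL && size != 0);
    ++*(int *)private_data;
    return realloc(ptr, size);
}
```

Because the checks live in one place, every backend gets the same edge-case
behaviour for free, and none of the backend assertions can fire as long as
callers go through the wrappers.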

4.1.3 Producer/Manager Functions

 - free_all()
 - gc()
 - destroy()

The free_all() function takes the private_data pointer and should free all the
memory currently allocated in the pool. Note that this is not necessarily
exactly the same as calling free() on all the allocated blocks - free_all() is
allowed to do additional cleanup or to make use of optimizations not available
when freeing one block at a time.

The gc() function takes the private_data pointer and should do whatever it can
to reduce excess memory usage in the dissector by returning unused blocks to
the OS, optimizing internal data structures, etc.

The destroy() function does NOT take the private_data pointer - it instead takes
a pointer to the allocator structure as a whole, since that structure may also
need freeing. This function can assume that free_all() has been called
immediately before it (though it can make no assumptions about whether or not
gc() has ever been called).
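
Putting the three groups together, the following standalone sketch shows what
an interface of this shape and one trivial backend might look like. It does
not reproduce wmem_allocator.h: the toy_* names, the fixed-size table and the
placeholder type value are all hypothetical, and the member signatures are
inferred from the descriptions in section 4.1 rather than copied from the
source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical mirror of the eight members listed in section 4.1. */
typedef struct toy_allocator toy_allocator_t;
struct toy_allocator {
    void  *private_data;                               /* 4.1.1 group */
    int    type;
    void *(*alloc)(void *private_data, size_t size);   /* 4.1.2 group */
    void  (*free)(void *private_data, void *ptr);
    void *(*realloc)(void *private_data, void *ptr, size_t size);
    void  (*free_all)(void *private_data);             /* 4.1.3 group */
    void  (*gc)(void *private_data);
    void  (*destroy)(toy_allocator_t *allocator);
};

/* Private state of this backend: a small table of the live blocks. */
#define TOY_MAX_BLOCKS 32
typedef struct {
    void  *blocks[TOY_MAX_BLOCKS];
    size_t count;
} toy_table_t;

/* Return the slot holding ptr, which must be a live block. */
static size_t toy_find(const toy_table_t *table, const void *ptr)
{
    size_t slot = 0;
    while (slot < table->count && table->blocks[slot] != ptr) {
        slot++;
    }
    assert(slot < table->count);
    return slot;
}

static void *toy_alloc(void *private_data, size_t size)
{
    toy_table_t *table = private_data;
    void        *ptr   = table->count < TOY_MAX_BLOCKS ? malloc(size) : NULL;
    if (ptr != NULL) {
        table->blocks[table->count++] = ptr;
    }
    return ptr;
}

static void toy_free(void *private_data, void *ptr)
{
    toy_table_t *table = private_data;
    size_t       slot  = toy_find(table, ptr);
    table->blocks[slot] = table->blocks[--table->count];
    free(ptr);
}

static void *toy_realloc(void *private_data, void *ptr, size_t size)
{
    toy_table_t *table = private_data;
    size_t       slot  = toy_find(table, ptr);
    void        *moved = realloc(ptr, size);
    if (moved != NULL) {
        table->blocks[slot] = moved;
    }
    return moved;
}

/* One pass over the table rather than a lookup per block. */
static void toy_free_all(void *private_data)
{
    toy_table_t *table = private_data;
    while (table->count > 0) {
        free(table->blocks[--table->count]);
    }
}

static void toy_gc(void *private_data) { (void)private_data; /* no-op */ }

/* Takes the allocator itself; free_all() is assumed to have just run. */
static void toy_destroy(toy_allocator_t *allocator)
{
    free(allocator->private_data);
    free(allocator);
}

static toy_allocator_t *toy_allocator_new(void)
{
    toy_allocator_t *allocator = malloc(sizeof *allocator);
    toy_table_t     *table     = calloc(1, sizeof *table);
    if (allocator == NULL || table == NULL) {
        free(allocator);
        free(table);
        return NULL;
    }
    *allocator = (toy_allocator_t){
        .private_data = table, .type = 0,
        .alloc = toy_alloc, .free = toy_free, .realloc = toy_realloc,
        .free_all = toy_free_all, .gc = toy_gc, .destroy = toy_destroy
    };
    return allocator;
}
```

Code holding a toy_allocator_t pointer never needs to know which backend sits
behind it. A second backend would only have to supply its own six functions
and its own private_data.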

4.2 Pool-Agnostic API

One of the issues with emem was that the API (including the public data
structures) required wrapper functions for each scope implemented. Even
if there was a stack implementation in emem, it wasn't necessarily available
for use with file-scope memory unless someone took the time to write se_stack_
wrapper functions for the interface.

In wmem, all public APIs take the pool as the first argument, so that they can
be written once and used with any available memory pool. Data structures like
wmem's stack implementation only take the pool when created - the provided
pointer is stored internally with the data structure, and subsequent calls
(like push and pop) will take the stack itself instead of the pool.
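
The "store the pool at creation" idea can be sketched as follows. This is a
standalone illustration rather than the code in wmem_stack.h: toy_pool_t,
toy_stack_t and the counting_* helpers are hypothetical, and the pool is
reduced to the two function pointers the stack needs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical minimal pool: just enough interface for the stack below. */
typedef struct {
    void *(*alloc)(void *private_data, size_t size);
    void  (*free)(void *private_data, void *ptr);
    void  *private_data;
} toy_pool_t;

typedef struct toy_node {
    struct toy_node *next;
    void            *data;
} toy_node_t;

/* The stack remembers which pool it was created in. */
typedef struct {
    toy_pool_t *pool;
    toy_node_t *top;
} toy_stack_t;

/* Only the constructor takes the pool. */
static toy_stack_t *toy_stack_new(toy_pool_t *pool)
{
    toy_stack_t *stack;

    assert(pool != NULL);
    stack = pool->alloc(pool->private_data, sizeof *stack);
    if (stack != NULL) {
        stack->pool = pool;
        stack->top  = NULL;
    }
    return stack;
}

/* Later calls take the stack and reach the pool through it.
 * Returns 0 on success, -1 if the pool could not supply a node. */
static int toy_stack_push(toy_stack_t *stack, void *data)
{
    toy_pool_t *pool = stack->pool;
    toy_node_t *node = pool->alloc(pool->private_data, sizeof *node);

    if (node == NULL) {
        return -1;
    }
    node->data = data;
    node->next = stack->top;
    stack->top = node;
    return 0;
}

/* Remove and return the newest item, or NULL if the stack is empty. */
static void *toy_stack_pop(toy_stack_t *stack)
{
    toy_node_t *node = stack->top;
    void       *data;

    if (node == NULL) {
        return NULL;
    }
    data = node->data;
    stack->top = node->next;
    stack->pool->free(stack->pool->private_data, node);
    return data;
}

/* malloc-backed pool whose private_data counts the blocks still live. */
static void *counting_alloc(void *private_data, size_t size)
{
    void *ptr = malloc(size);
    if (ptr != NULL) {
        ++*(int *)private_data;
    }
    return ptr;
}

static void counting_free(void *private_data, void *ptr)
{
    --*(int *)private_data;
    free(ptr);
}
```

The same toy_stack_* functions work unchanged with any pool that fills in the
two function pointers, which is the point of the design: no per-scope wrapper
functions are needed.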

4.3 Debugging

The primary debugging control for wmem is the WIRESHARK_DEBUG_WMEM_OVERRIDE
environment variable. If set, this variable forces all calls to
wmem_allocator_new() to return the same type of allocator, regardless of which
type is requested normally by the code. It currently has three valid values:

 - The value "simple" forces the use of WMEM_ALLOCATOR_SIMPLE. The valgrind
   script currently sets this value, since the simple allocator is the only
   one whose memory allocations are trackable properly by valgrind.

 - The value "strict" forces the use of WMEM_ALLOCATOR_STRICT. The fuzz-test
   script currently sets this value, since the goal when fuzz-testing is to find
   as many errors as possible.

 - The value "block" forces the use of WMEM_ALLOCATOR_BLOCK. This is not
   currently used by any scripts, but is useful for stress-testing the block
   allocator.

Note that regardless of the value of this variable, it will always be safe to
call allocator-specific helper functions. They are required to be safe no-ops
if the allocator argument is of the wrong type.
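
As a rough illustration of how such an override could be applied, the sketch
below maps the three documented strings onto a stand-in enumeration. It is not
taken from the wmem source: toy_allocator_type_t and toy_effective_type() are
hypothetical, and the decision to silently fall back to the requested type for
an unrecognised value is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for wmem_allocator_type_t. */
typedef enum {
    TOY_ALLOCATOR_SIMPLE,
    TOY_ALLOCATOR_BLOCK,
    TOY_ALLOCATOR_STRICT
} toy_allocator_type_t;

/* Pick the type to construct, given what the caller requested and the
 * override string (NULL when the environment variable is not set). */
static toy_allocator_type_t
toy_effective_type(toy_allocator_type_t requested, const char *override)
{
    assert(requested <= TOY_ALLOCATOR_STRICT);

    if (override == NULL) {
        return requested;
    }
    if (strcmp(override, "simple") == 0) {
        return TOY_ALLOCATOR_SIMPLE;
    }
    if (strcmp(override, "block") == 0) {
        return TOY_ALLOCATOR_BLOCK;
    }
    if (strcmp(override, "strict") == 0) {
        return TOY_ALLOCATOR_STRICT;
    }
    /* Assumption: an unrecognised value leaves the request untouched. */
    return requested;
}
```

A constructor built this way would pass the result of
getenv("WIRESHARK_DEBUG_WMEM_OVERRIDE") as the second argument, so that code
requesting one allocator type transparently receives another while debugging.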

4.4 Testing

There is a simple test suite for wmem that lives in the file wmem_test.c and
should get automatically built into the binary 'wmem_test' when building
Wireshark. It contains at least basic tests for all existing functionality.
The suite is run automatically by the build-bots via the shell script
test/test.sh which calls out to test/suite-unittests.sh.

New features added to wmem (allocators, data structures, utility
functions, etc.) must also have tests added to this suite.

The test suite could potentially use a clean-up by someone more
intimately familiar with GLib's testing framework, but it does the job.

5. TODO List

The following is a list of things that wmem provides but that are incomplete
(i.e. missing common operations):

 - string buffers
 - singly-linked list

The following is an incomplete list of things that emem provides but wmem has
not yet implemented:

 - red-black tree
 - tvb_memdup

The following is a list of things that emem doesn't provide but that would be
nice for wmem to provide:

 - radix tree
 - dynamic array
 - hash table

/*
 * Editor modelines  -  http://www.wireshark.org/tools/modelines.html
 *
 * Local variables:
 * c-basic-offset: 4
 * tab-width: 8
 * indent-tabs-mode: nil
 * End:
 *
 * vi: set shiftwidth=4 tabstop=8 expandtab:
 * :indentSize=4:tabSize=8:noTabs=true:
 */