Samplez Module

Utility for statistically sampling strings with associated values.

Overview

To use the module, install ‘Middleware’ into your WSGI app. Call the ‘set’ function at any time during a request to sample a key/value pair probabilistically. The module uses reservoir sampling (http://en.wikipedia.org/wiki/Reservoir_sampling) to approximate incidence rates and sums. Each request costs about three memcache calls.
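
As a rough illustration of the reservoir idea (not the module's actual code), the sketch below keeps a fixed number of slots and overwrites a random slot with decreasing probability as more events arrive. The names ‘reservoir_sample’ and ‘rng’ are hypothetical and exist only for this example.

```python
import random


def reservoir_sample(events, slots, rng=None):
    """Keep a uniform random sample of at most `slots` items from `events`.

    Classic "Algorithm R": the first `slots` events fill the reservoir;
    after that, event number n (1-based) replaces a random slot with
    probability slots / n.
    """
    rng = rng or random.Random()
    reservoir = []
    total = 0
    for event in events:
        total += 1
        if len(reservoir) < slots:
            reservoir.append(event)
        else:
            position = rng.randrange(total)  # uniform in [0, total)
            if position < slots:
                reservoir[position] = event
    return reservoir, total


# Sample 50 of 1000 synthetic page hits with a seeded generator.
sample, total = reservoir_sample(
    ('/page/%d' % (i % 7) for i in range(1000)),
    slots=50,
    rng=random.Random(0))
```

Because every event has the same chance of ending up in the reservoir, a key's share of the saved slots approximates its share of all events.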

Gathered stats are available through a built-in UI that is divided into related sections. Consider configuring several samplers with overlapping time periods (for example, 10 minutes and 1 hour) to get different levels of resolution.

Usage

Example code

SAMPLEZ_POPULAR_CONTENT = samplez.Section(
    'Popular content',
    samplez.Config(
        'content_10m',
        period=600,
        by_value=True,
        samples=10000,
        value_units='qps'),
    samplez.Config(
        'content_1h',
        period=3600,
        by_value=True,
        samples=10000,
        value_units='qps'))

SAMPLEZ_LATENCY = samplez.Section(
    'Latency',
    samplez.Config(
        'latency_10m',
        period=600,
        by_value=True,
        samples=10000,
        value_units='ms'),
    samplez.Config(
        'latency_1h',
        period=3600,
        by_value=True,
        samples=10000,
        value_units='ms'))

samplez.set(SAMPLEZ_LATENCY, 'http://example.com/some/content', 124)
samplez.set(SAMPLEZ_POPULAR_CONTENT, 'http://example.com/some/content')

Attribution

Code originally from the PubSubHubbub project:

http://pubsubhubbub.googlecode.com

class afterburner.experimental.samplez.samplez.Config(name, period=None, samples=None, by_domain=False, by_value=False, rate=1, max_value_length=75, tolerance=10, title=None, key_name='Key', value_units='')[source]

Configuration for a reservoir sampler.

adjust_value(key)[source]

Adjust the value for a sampling key.

Args:
key: The sampling key to adjust.
Returns:
The adjusted key.
compute_frequency(count, found, total, elapsed)[source]

Computes the frequency of a sample.

Args:
count: Total number of samples of this key.
found: Total number of samples present for all keys.
total: The total number of sampling events so far, regardless of whether or not the sample was saved.
elapsed: Seconds elapsed during the current sampling period.
Returns:
The frequency, in events per second, of this key in the time period, or None if no samples have been taken yet.
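
One plausible reading of these arguments, sketched here as an assumption rather than as the method's real body: scale the key's share of the saved samples up to the full event count, then divide by the elapsed time. The function name ‘estimate_frequency’ is hypothetical.

```python
def estimate_frequency(count, found, total, elapsed):
    """Scale a key's share of the reservoir up to the full event stream."""
    if not found or not elapsed:
        return None  # nothing sampled yet, or no time has passed
    share = float(count) / found    # fraction of saved samples with this key
    return share * total / elapsed  # estimated events per second for the key


# 25 of 100 saved samples match the key; 4000 events were seen in 60 seconds.
rate = estimate_frequency(count=25, found=100, total=4000, elapsed=60)
```

The real method may differ in detail, for instance in how it accounts for the config's ‘rate’.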
is_expired(last_time, current_time)[source]

Checks if this config is expired.

Args:
last_time: UNIX timestamp when this config’s period started.
current_time: UNIX timestamp of the current time.
Returns:
True if the config period has expired.
position_key(index)[source]

Generates the position key for the sample slot with the given index.

Args:
index: Numerical index of the sample position whose key to retrieve.
Returns:
Memcache key to use.
should_sample(key, coin_flip)[source]

Checks if the key should be sampled.

Args:
key: The sampling key to check.
coin_flip: Random value between 0 and 1.
Returns:
True if the sample should be taken, False otherwise.
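
A minimal sketch of how a coin flip can be compared against a sampling rate, assuming the check is simply "flip below rate". The real method also receives the key and may use it; this stand-alone function takes the rate directly and is hypothetical.

```python
import random


def should_sample(rate, coin_flip):
    """Keep the event when the coin flip lands below the sampling rate."""
    return coin_flip < rate


# With rate=0.1, roughly one event in ten is kept.
rng = random.Random(0)
kept = sum(1 for _ in range(10000) if should_sample(0.1, rng.random()))
```

With the default ‘rate=1’ every event passes, since a flip in [0, 1) is always below 1.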
exception afterburner.experimental.samplez.samplez.ConfigError[source]

Something is wrong with a configured DoS limit, sampler, or scorer.

class afterburner.experimental.samplez.samplez.Middleware(app)[source]

WSGI middleware that asynchronously updates samplez tables.

class afterburner.experimental.samplez.samplez.MultiSampler(gettime=time.time)[source]

Sampler that saves key/value pairs for multiple reservoirs in parallel.

The basic algorithm is:

  1. Get the reservoir start timestamp.
  2. If more than period seconds have elapsed, set the timestamp to now and reset the reservoir’s event counter to zero (in the average case this step is skipped).
  3. Increment the event counter by the number of new samples.
  4. Set memcache values to incoming samples following the reservoir algorithm, potentially only sampling a subset.
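
The four steps above can be sketched for a single reservoir with a plain dict standing in for memcache. This is an illustration under stated assumptions, not the module's code: the class ‘TinyReservoir’ and the ‘:start’ and ‘:count’ key suffixes are hypothetical, and the real sampler batches its memcache calls across many reservoirs.

```python
import random
import time


class TinyReservoir(object):
    """In-memory stand-in for one memcache-backed reservoir (hypothetical)."""

    def __init__(self, name, period, samples, gettime=time.time):
        self.name = name
        self.period = period
        self.samples = samples
        self.gettime = gettime
        self.store = {}  # plays the role of memcache

    def sample(self, pairs, randrange=random.randrange):
        now = self.gettime()
        start_key = self.name + ':start'
        count_key = self.name + ':count'
        # Steps 1 and 2: read the start time; reset if the period is over.
        start = self.store.get(start_key)
        if start is None or now - start > self.period:
            self.store[start_key] = now
            self.store[count_key] = 0
        # Step 3: bump the event counter by the number of new samples.
        before = self.store[count_key]
        self.store[count_key] = before + len(pairs)
        # Step 4: write samples into slots following the reservoir rule.
        for offset, (key, value) in enumerate(pairs):
            seen = before + offset + 1  # 1-based event number
            if seen <= self.samples:
                slot = seen - 1
            else:
                slot = randrange(seen)
                if slot >= self.samples:
                    continue  # this event is not saved
            self.store['%s:%d' % (self.name, slot)] = (key, value, now)


clock = [1000.0]
reservoir = TinyReservoir('latency_10m', period=600, samples=3,
                          gettime=lambda: clock[0])
reservoir.sample([('/a', 10), ('/b', 20), ('/c', 30), ('/d', 40)],
                 randrange=random.Random(1).randrange)
```

Note that a reset does not clear old slots. Stale entries simply keep their old timestamp, which is why each stored sample carries one.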

The benefit of this approach is that it can be applied to many reservoirs in parallel without incurring additional API calls. The only limit is the 32MB limit on App Engine batch API calls, which caps the number of samples that can be made simultaneously.

Samples are stored in keys like: ‘sampler_name:0’, ‘sampler_name:1’

Values stored for samples look like: ‘key_sample:NNNN:WWWW’ where the ‘N’s represent the sample value as a big-endian-encoded 4-byte string, and the ‘W’s are a UNIX timestamp as a big-endian-encoded 4-byte string. The timestamp is used to ignore samples that are not from the current period.
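
The value layout described above can be sketched with the standard ‘struct’ module. The helper names are hypothetical, and the unsigned ‘>I’ format is an assumption; the text only says that each field is a big-endian 4-byte string.

```python
import struct


def encode_sample(key, value, when):
    """Pack a sample as key + ':' + 4-byte value + ':' + 4-byte timestamp."""
    return (key.encode('utf-8') + b':' + struct.pack('>I', value) +
            b':' + struct.pack('>I', int(when)))


def decode_sample(blob):
    """Slice fixed widths from the right, since the key may contain colons."""
    key, value, when = blob[:-10], blob[-9:-5], blob[-4:]
    return (key.decode('utf-8'),
            struct.unpack('>I', value)[0],
            struct.unpack('>I', when)[0])


blob = encode_sample('http://example.com/some/content', 124, 1300000000)
decoded = decode_sample(blob)
```

Decoding by fixed offsets rather than splitting on ‘:’ matters because URLs used as keys contain colons, and so can the packed bytes.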

There can be a race for resetting the timestamp for a sampler right after the period starts, but it always favors the caller who inserted last (all earlier data will be overwritten). This loses some data for short-period samplers, which is acceptable for approximate statistics.

get(config, single_key=None)[source]

Gets statistics for a particular config and/or key.

This will only retrieve samples for the current time period. Samples from previous time periods will be ignored.

Args:
config: The Config to retrieve stats for.
single_key: If None, then global stats for the config will be retrieved. When a key value (a string), then only stats for that particular key will be returned to the caller.
Returns:
SampleResult object containing the result data.
sample(reporter, getrandom=random.random, randrange=random.randrange)[source]

Samples a set of reported key/values synchronously.

Args:
reporter: Reporter instance containing key/values to sample.
getrandom: Used for testing.
randrange: Used for testing.
class afterburner.experimental.samplez.samplez.Reporter[source]

Contains a batch of keys and values for potential sampling.

all_keys()[source]

Returns all the sampling keys present across all configs.

Each key will be present at least once, but some keys may be present more than once if they were inserted repeatedly. The keys are in insertion order. This simplifies testing of this class.

get(key, config)[source]

Gets the value for a key/config.

Args:
key: The sampling key to retrieve the value for.
config: The Config object to get the value for.
Returns:
The value for the key/config or None if it’s not present.
get_keys(config)[source]

Retrieves the keys present for a specific Config.

Args:
config: The Config object to get the keys for.
Returns:
The list of keys present for this config, with no duplicates.
remove(key, config)[source]

Removes a key/value for a specific config.

If the key is not present for the config, this method does nothing.

Args:
key: The sampling key to remove.
config: The Config object to remove the key for.
set(config_or_section, key, value=1)[source]

Sets a key/value for one or more configs.

Each config/key combination may only have a single value. Subsequent calls to this method with the same key/config will overwrite the previous value.

Args:
config_or_section: The Config object to set the value for or a Section instance comprised of multiple Config objects that should all be updated by this call.
key: The sampling key to add.
value: The value to set for this config. Coerced to an integer.
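
The overwrite and removal semantics described for Reporter can be sketched with a small in-memory stand-in. ‘TinyReporter’ is hypothetical: it uses config names (strings) where the real class takes Config or Section objects, and it omits ‘all_keys’.

```python
class TinyReporter(object):
    """Hypothetical stand-in that mirrors the documented Reporter behavior."""

    def __init__(self):
        self._values = {}  # config name -> {key: value}

    def set(self, config_names, key, value=1):
        # One value per config/key pair; later calls overwrite earlier ones.
        for name in config_names:
            self._values.setdefault(name, {})[key] = int(value)

    def get(self, key, config_name):
        return self._values.get(config_name, {}).get(key)

    def get_keys(self, config_name):
        return list(self._values.get(config_name, {}))

    def remove(self, key, config_name):
        # Does nothing when the key is absent for this config.
        self._values.get(config_name, {}).pop(key, None)


reporter = TinyReporter()
reporter.set(['latency_10m', 'latency_1h'], '/a', 124)
reporter.set(['latency_10m'], '/a', '99')  # overwrites; coerced to int
```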
class afterburner.experimental.samplez.samplez.SampleResult(config, total_samples, time_elapsed)[source]

Contains the current results of a sampler for a given config.

add(key, when, value)[source]

Adds a new sample to these results.

Args:
key: The sampling key.
when: When the sample was made, as a UNIX timestamp.
value: The value that was sampled.
get_average(key)[source]

Gets the weighted average of this key’s sampled values.

Args:
key: The sampling key.
Returns:
The weighted average or None if this key does not exist.
get_count(key)[source]

Gets the count of unique samples for a key.

Args:
key: The sampling key.
Returns:
The number of items. Will be zero if the key does not exist.
get_frequency(key)[source]

Gets the frequency of events for this key during the sampling period.

Args:
key: The sampling key.
Returns:
The frequency as events per second or None if this key does not exist.
get_max(key)[source]

Gets the max value seen for a key.

Args:
key: The sampling key.
Returns:
The maximum value or None if this key does not exist.
get_min(key)[source]

Gets the min value seen for a key.

Args:
key: The sampling key.
Returns:
The minimum value or None if this key does not exist.
get_samples(key)[source]

Gets the unique sample data for a key.

Args:
key: The sampling key.
Returns:
List of (when, value) tuples where:
when: The UNIX timestamp for the sample.
value: The sample value.
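
Given the documented return shape, a caller could summarize the samples for one key as in the sketch below. The sample data is made up for illustration, and the plain mean shown here is not necessarily what ‘get_average’ computes, since its weighting is not described.

```python
# Hypothetical (when, value) tuples, shaped like the get_samples() result.
samples = [(1300000000, 120), (1300000005, 95), (1300000030, 310)]

values = [value for _, value in samples]
summary = {
    'count': len(values),
    'min': min(values),
    'max': max(values),
    'mean': sum(values) / float(len(values)),  # unweighted
}
```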
overall_rate()[source]

Gets the overall rate of events.

Returns:
Total events per second.
sample_objects()[source]

Gets the contents of this result object for use in template rendering.

Returns:
Generator of model objects.
set_single_sample(key)[source]

Sets that this result is for a single key.

Args:
key: The sampling key.
class afterburner.experimental.samplez.samplez.SamplezHandler[source]

Handler that serves samplez data.

class afterburner.experimental.samplez.samplez.Section(title, *configs)[source]

A set of related Configs with a pretty name.

Configs that are added to a Section will be auto-registered with the module so they can be displayed on built-in status pages.

results(**kwargs)[source]

Gets statistics for a Section, optionally for a single key.

Use when retrieving data from multiple configs; ensures that the memory usage of the previous result is garbage collected before the next one is returned.

Args:
sampler: Used for testing; the MultiSampler instance to use for fetching results.
kwargs: Keyword arguments to pass to the ‘get’ method of this class.
Returns:
Generator that yields each SampleResult object for each config belonging to the given Section.
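
The memory benefit of returning a generator can be seen in a minimal sketch: each result is fetched only when the caller asks for it, so the previous one can be released first. The names ‘iter_results’ and ‘fake_fetch’ are hypothetical stand-ins for the Section and the sampler's ‘get’ call.

```python
def iter_results(config_names, fetch):
    """Yield one result per config, fetching each one lazily."""
    for name in config_names:
        yield fetch(name)


fetched = []


def fake_fetch(name):
    # Records the call so the lazy behavior is visible.
    fetched.append(name)
    return {'config': name, 'samples': []}


results = iter_results(['content_10m', 'content_1h'], fake_fetch)
first = next(results)  # only the first config has been fetched so far
```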
afterburner.experimental.samplez.samplez.apply_for_testing()[source]

Applies all pending samples without the need for WSGI middleware.

afterburner.experimental.samplez.samplez.set(*args, **kwargs)[source]

WSGI convenience method; sets a key/value for one or more configs.

Each config/key combination may only have a single value. Subsequent calls to this method with the same key/config will overwrite the previous value.

Args:
config_or_section: The Config object to set the value for or a Section instance comprised of multiple Config objects that should all be updated by this call.
key: The sampling key to add.
value: The value to set for this config.
afterburner.experimental.samplez.samplez.setup_for_testing()[source]

Set up this module for use in tests without WSGI middleware.
