// SPDX-License-Identifier: GPL-2.0-only /* PIPAPO: PIle PAcket POlicies: set for arbitrary concatenations of ranges * * Copyright (c) 2019-2020 Red Hat GmbH * * Author: Stefano Brivio <[email protected]> */ /** * DOC: Theory of Operation * * * Problem * ------- * * Match packet bytes against entries composed of ranged or non-ranged packet * field specifiers, mapping them to arbitrary references. For example: * * :: * * --- fields ---> * | [net],[port],[net]... => [reference] * entries [net],[port],[net]... => [reference] * | [net],[port],[net]... => [reference] * V ... * * where [net] fields can be IP ranges or netmasks, and [port] fields are port * ranges. Arbitrary packet fields can be matched. * * * Algorithm Overview * ------------------ * * This algorithm is loosely inspired by [Ligatti 2010], and fundamentally * relies on the consideration that every contiguous range in a space of b bits * can be converted into b * 2 netmasks, from Theorem 3 in [Rottenstreich 2010], * as also illustrated in Section 9 of [Kogan 2014]. * * Classification against a number of entries, that require matching given bits * of a packet field, is performed by grouping those bits in sets of arbitrary * size, and classifying packet bits one group at a time. * * Example: * to match the source port (16 bits) of a packet, we can divide those 16 bits * in 4 groups of 4 bits each. Given the entry: * 0000 0001 0101 1001 * and a packet with source port: * 0000 0001 1010 1001 * first and second groups match, but the third doesn't. We conclude that the * packet doesn't match the given entry. * * Translate the set to a sequence of lookup tables, one per field. Each table * has two dimensions: bit groups to be matched for a single packet field, and * all the possible values of said groups (buckets). 
Input entries are * represented as one or more rules, depending on the number of composing * netmasks for the given field specifier, and a group match is indicated as a * set bit, with number corresponding to the rule index, in all the buckets * whose value matches the entry for a given group. * * Rules are mapped between fields through an array of x, n pairs, with each * item mapping a matched rule to one or more rules. The position of the pair in * the array indicates the matched rule to be mapped to the next field, x * indicates the first rule index in the next field, and n the amount of * next-field rules the current rule maps to. * * The mapping array for the last field maps to the desired references. * * To match, we perform table lookups using the values of grouped packet bits, * and use a sequence of bitwise operations to progressively evaluate rule * matching. * * A stand-alone, reference implementation, also including notes about possible * future optimisations, is available at: * https://pipapo.lameexcu.se/ * * Insertion * --------- * * - For each packet field: * * - divide the b packet bits we want to classify into groups of size t, * obtaining ceil(b / t) groups * * Example: match on destination IP address, with t = 4: 32 bits, 8 groups * of 4 bits each * * - allocate a lookup table with one column ("bucket") for each possible * value of a group, and with one row for each group * * Example: 8 groups, 2^4 buckets: * * :: * * bucket * group 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 * 0 * 1 * 2 * 3 * 4 * 5 * 6 * 7 * * - map the bits we want to classify for the current field, for a given * entry, to a single rule for non-ranged and netmask set items, and to one * or multiple rules for ranges. Ranges are expanded to composing netmasks * by pipapo_expand(). 
* * Example: 2 entries, 10.0.0.5:1024 and 192.168.1.0-192.168.2.1:2048 * - rule #0: 10.0.0.5 * - rule #1: 192.168.1.0/24 * - rule #2: 192.168.2.0/31 * * - insert references to the rules in the lookup table, selecting buckets * according to bit values of a rule in the given group. This is done by * pipapo_insert(). * * Example: given: * - rule #0: 10.0.0.5 mapping to buckets * < 0 10 0 0 0 0 0 5 > * - rule #1: 192.168.1.0/24 mapping to buckets * < 12 0 10 8 0 1 < 0..15 > < 0..15 > > * - rule #2: 192.168.2.0/31 mapping to buckets * < 12 0 10 8 0 2 0 < 0..1 > > * * these bits are set in the lookup table: * * :: * * bucket * group 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 * 0 0 1,2 * 1 1,2 0 * 2 0 1,2 * 3 0 1,2 * 4 0,1,2 * 5 0 1 2 * 6 0,1,2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 * 7 1,2 1,2 1 1 1 0,1 1 1 1 1 1 1 1 1 1 1 * * - if this is not the last field in the set, fill a mapping array that maps * rules from the lookup table to rules belonging to the same entry in * the next lookup table, done by pipapo_map(). * * Note that as rules map to contiguous ranges of rules, given how netmask * expansion and insertion is performed, &union nft_pipapo_map_bucket stores * this information as pairs of first rule index, rule count. 
 *
 *      Example: 2 entries, 10.0.0.5:1024 and 192.168.1.0-192.168.2.1:2048,
 *      given lookup table #0 for field 0 (see example above):
 *
 *      ::
 *
 *                     bucket
 *      group  0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
 *        0    0                                              1,2
 *        1   1,2                                      0
 *        2    0                                      1,2
 *        3    0                              1,2
 *        4  0,1,2
 *        5    0   1   2
 *        6  0,1,2 1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
 *        7   1,2 1,2  1   1   1  0,1  1   1   1   1   1   1   1   1   1   1
 *
 *      and lookup table #1 for field 1 with:
 *        - rule #0: 1024 mapping to buckets
 *             < 0  4  0  0 >
 *        - rule #1: 2048 mapping to buckets
 *             < 0  8  0  0 >
 *
 *      ::
 *
 *                     bucket
 *      group  0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
 *        0   0,1
 *        1                    0               1
 *        2   0,1
 *        3   0,1
 *
 *      we need to map rules for 10.0.0.5 in lookup table #0 (rule #0) to 1024
 *      in lookup table #1 (rule #0) and rules for 192.168.1.0-192.168.2.1
 *      (rules #1, #2) to 2048 in lookup table #1 (rule #1):
 *
 *      ::
 *
 *       rule indices in current field: 0    1    2
 *       map to rules in next field:    0    1    1
 *
 *   - if this is the last field in the set, fill a mapping array that maps
 *     rules from the last lookup table to element pointers, also done by
 *     pipapo_map().
 *
 *     Note that, in this implementation, we have two elements (start, end) for
 *     each entry. The pointer to the end element is stored in this array, and
 *     the pointer to the start element is linked from it.
 *
 *      Example: entry 10.0.0.5:1024 has a corresponding &struct nft_pipapo_elem
 *      pointer, 0x66, and element for 192.168.1.0-192.168.2.1:2048 is at 0x42.
 *      From the rules of lookup table #1 as mapped above:
 *
 *      ::
 *
 *       rule indices in last field:    0    1
 *       map to elements:             0x66  0x42
 *
 *
 * Matching
 * --------
 *
 * We use a result bitmap, with the size of a single lookup table bucket, to
 * represent the matching state that applies at every algorithm step. This is
 * done by nft_pipapo_lookup().
 *
 * - For each packet field:
 *
 *   - start with an all-ones result bitmap (res_map in nft_pipapo_lookup())
 *
 *   - perform a lookup into the table corresponding to the current field,
 *     for each group, and at every group, AND the current result bitmap with
 *     the value from the lookup table bucket
 *
 * ::
 *
 *      Example: 192.168.1.5 < 12 0 10 8 0 1 0 5 >, with lookup table from
 *      insertion examples.
 *      Lookup table buckets are at least 3 bits wide; we'll assume 8 bits for
 *      convenience in this example. Initial result bitmap is 0xff, the steps
 *      below show the value of the result bitmap after each group is processed:
 *
 *                     bucket
 *      group  0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
 *        0    0                                              1,2
 *        result bitmap is now: 0xff & 0x6 [bucket 12] = 0x6
 *
 *        1   1,2                                      0
 *        result bitmap is now: 0x6 & 0x6 [bucket 0] = 0x6
 *
 *        2    0                                      1,2
 *        result bitmap is now: 0x6 & 0x6 [bucket 10] = 0x6
 *
 *        3    0                              1,2
 *        result bitmap is now: 0x6 & 0x6 [bucket 8] = 0x6
 *
 *        4  0,1,2
 *        result bitmap is now: 0x6 & 0x7 [bucket 0] = 0x6
 *
 *        5    0   1   2
 *        result bitmap is now: 0x6 & 0x2 [bucket 1] = 0x2
 *
 *        6  0,1,2 1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
 *        result bitmap is now: 0x2 & 0x7 [bucket 0] = 0x2
 *
 *        7   1,2 1,2  1   1   1  0,1  1   1   1   1   1   1   1   1   1   1
 *        final result bitmap for this field is: 0x2 & 0x3 [bucket 5] = 0x2
 *
 *   - at the next field, start with a new, all-zeroes result bitmap. For each
 *     bit set in the previous result bitmap, fill the new result bitmap
 *     (fill_map in nft_pipapo_lookup()) with the rule indices from the
 *     corresponding buckets of the mapping table of the previous field, done
 *     by pipapo_refill()
 *
 *      Example: with mapping table from insertion examples, with the current
 *      result bitmap from the previous example, 0x02:
 *
 *      ::
 *
 *       rule indices in current field: 0    1    2
 *       map to rules in next field:    0    1    1
 *
 *      the new result bitmap will be 0x02: rule 1 was set, and rule 1 will be
 *      set.
 *
 *      We can now extend this example to cover the second iteration of the step
 *      above (lookup and AND bitmap): assuming the port field is
 *      2048 < 0  8  0  0 >, with starting result bitmap 0x2, and lookup table
 *      for "port" field from insertion example:
 *
 *      ::
 *
 *                     bucket
 *      group  0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
 *        0   0,1
 *        1                    0               1
 *        2   0,1
 *        3   0,1
 *
 *       operations are: 0x2 & 0x3 [bucket 0] & 0x2 [bucket 8] & 0x3 [bucket 0]
 *       & 0x3 [bucket 0], resulting bitmap is 0x2.
 *
 *   - if this is the last field in the set, look up the value from the mapping
 *     array corresponding to the final result bitmap
 *
 *      Example: 0x2 resulting bitmap from 192.168.1.5:2048, mapping array for
 *      last field from insertion example:
 *
 *      ::
 *
 *       rule indices in last field:    0    1
 *       map to elements:             0x66  0x42
 *
 *      the matching element is at 0x42.
 *
 *
 * References
 * ----------
 *
 * [Ligatti 2010]
 *      A Packet-classification Algorithm for Arbitrary Bitmask Rules, with
 *      Automatic Time-space Tradeoffs
 *      Jay Ligatti, Josh Kuhn, and Chris Gage.
 *      Proceedings of the IEEE International Conference on Computer
 *      Communication Networks (ICCCN), August 2010.
 *      https://www.cse.usf.edu/~ligatti/papers/grouper-conf.pdf
 *
 * [Rottenstreich 2010]
 *      Worst-Case TCAM Rule Expansion
 *      Ori Rottenstreich and Isaac Keslassy.
 *      2010 Proceedings IEEE INFOCOM, San Diego, CA, 2010.
 *      http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.212.4592&rep=rep1&type=pdf
 *
 * [Kogan 2014]
 *      SAX-PAC (Scalable And eXpressive PAcket Classification)
 *      Kirill Kogan, Sergey Nikolenko, Ori Rottenstreich, William Culhane,
 *      and Patrick Eugster.
 *      Proceedings of the 2014 ACM conference on SIGCOMM, August 2014.
* https://www.sigcomm.org/sites/default/files/ccr/papers/2014/August/2619239-2626294.pdf */ #include <linux/kernel.h> #include <linux/init.h> #include <linux/module.h> #include <linux/netlink.h> #include <linux/netfilter.h> #include <linux/netfilter/nf_tables.h> #include <net/netfilter/nf_tables_core.h> #include <uapi/linux/netfilter/nf_tables.h> #include <linux/bitmap.h> #include <linux/bitops.h> #include "nft_set_pipapo_avx2.h" #include "nft_set_pipapo.h" /** * pipapo_refill() - For each set bit, set bits from selected mapping table item * @map: Bitmap to be scanned for set bits * @len: Length of bitmap in longs * @rules: Number of rules in field * @dst: Destination bitmap * @mt: Mapping table containing bit set specifiers * @match_only: Find a single bit and return, don't fill * * Iteration over set bits with __builtin_ctzl(): Daniel Lemire, public domain. * * For each bit set in map, select the bucket from mapping table with index * corresponding to the position of the bit set. Use start bit and amount of * bits specified in bucket to fill region in dst. * * Return: -1 on no match, bit position on 'match_only', 0 otherwise. */ int pipapo_refill(unsigned long *map, unsigned int len, unsigned int rules, unsigned long *dst, const union nft_pipapo_map_bucket *mt, bool match_only) { … } /** * nft_pipapo_lookup() - Lookup function * @net: Network namespace * @set: nftables API set representation * @key: nftables API element representation containing key data * @ext: nftables API extension pointer, filled with matching reference * * For more details, see DOC: Theory of Operation. * * Return: true on match, false otherwise. 
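 *
 * As an illustration only (not the kernel implementation), the sketch below
 * walks the lookup and refill steps from the Theory of Operation on the two
 * example entries, 10.0.0.5:1024 and 192.168.1.0-192.168.2.1:2048. It assumes
 * 4-bit groups and at most 8 rules per field, so that a bucket fits in one
 * byte. The names set_rule(), lookup_field(), refill() and example_lookup()
 * are hypothetical and do not exist in this file.
 *
```c
#include <assert.h>
#include <stdint.h>

#define BUCKETS    16   // 4-bit groups: 16 possible values per group
#define MAX_GROUPS 8
#define MAX_RULES  8    // rule bitmaps fit in one byte in this sketch

struct map_bucket { int to, n; };       // first next-field rule, rule count

struct field {
	int groups;
	uint8_t lt[MAX_GROUPS][BUCKETS];    // lookup table: rule bitmap per bucket
	struct map_bucket mt[MAX_RULES];    // mapping table, indexed by rule
};

// Value of group g (most significant group first) of a key
static unsigned group_val(const struct field *f, uint32_t key, int g)
{
	return (key >> (4 * (f->groups - 1 - g))) & 0xf;
}

// Set the rule bit in every bucket compatible with the key/mask_bits netmask
static void set_rule(struct field *f, int rule, uint32_t key, int mask_bits)
{
	assert(rule < MAX_RULES);
	for (int g = 0; g < f->groups; g++) {
		int masked = mask_bits - 4 * g;   // mask bits left at this group
		unsigned wild = masked >= 4 ? 0 :
				masked <= 0 ? 0xf : (1u << (4 - masked)) - 1;
		unsigned v = group_val(f, key, g);

		for (unsigned b = v & ~wild; b <= (v | wild); b++)
			f->lt[g][b] |= 1u << rule;
	}
}

// AND the result bitmap with the bucket selected by each group of the key
static uint8_t lookup_field(const struct field *f, uint32_t key, uint8_t res)
{
	for (int g = 0; g < f->groups; g++)
		res &= f->lt[g][group_val(f, key, g)];
	return res;
}

// For each rule set in res, set the next-field rules it maps to
static uint8_t refill(const struct field *f, uint8_t res)
{
	uint8_t fill = 0;

	for (int r = 0; r < MAX_RULES; r++)
		if (res & (1u << r))
			for (int i = 0; i < f->mt[r].n; i++)
				fill |= 1u << (f->mt[r].to + i);
	return fill;
}

// Two-field lookup on the example set: returns the last-field rule bitmap
static uint8_t example_lookup(uint32_t addr, uint16_t port)
{
	struct field ip = { .groups = 8 }, pt = { .groups = 4 };
	uint8_t res;

	set_rule(&ip, 0, 0x0a000005, 32);       // 10.0.0.5
	set_rule(&ip, 1, 0xc0a80100, 24);       // 192.168.1.0/24
	set_rule(&ip, 2, 0xc0a80200, 31);       // 192.168.2.0/31
	ip.mt[0] = (struct map_bucket){ 0, 1 }; // rule 0 maps to port rule 0
	ip.mt[1] = (struct map_bucket){ 1, 1 }; // rules 1 and 2 map to port rule 1
	ip.mt[2] = (struct map_bucket){ 1, 1 };
	set_rule(&pt, 0, 1024, 16);
	set_rule(&pt, 1, 2048, 16);

	res = lookup_field(&ip, addr, 0xff);    // first field: all-ones bitmap
	res = refill(&ip, res);                 // map to rules of the port field
	return lookup_field(&pt, port, res);
}
```
 *
 * With these tables, example_lookup(0xc0a80105, 2048), that is
 * 192.168.1.5:2048, leaves only rule 1 of the port field set (0x2), while a
 * key that matches no entry yields an empty bitmap.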
*/ bool nft_pipapo_lookup(const struct net *net, const struct nft_set *set, const u32 *key, const struct nft_set_ext **ext) { … } /** * pipapo_get() - Get matching element reference given key data * @net: Network namespace * @set: nftables API set representation * @m: storage containing active/existing elements * @data: Key data to be matched against existing elements * @genmask: If set, check that element is active in given genmask * @tstamp: timestamp to check for expired elements * @gfp: the type of memory to allocate (see kmalloc). * * This is essentially the same as the lookup function, except that it matches * key data against the uncommitted copy and doesn't use preallocated maps for * bitmap results. * * Return: pointer to &struct nft_pipapo_elem on match, error pointer otherwise. */ static struct nft_pipapo_elem *pipapo_get(const struct net *net, const struct nft_set *set, const struct nft_pipapo_match *m, const u8 *data, u8 genmask, u64 tstamp, gfp_t gfp) { … } /** * nft_pipapo_get() - Get matching element reference given key data * @net: Network namespace * @set: nftables API set representation * @elem: nftables API element representation containing key data * @flags: Unused */ static struct nft_elem_priv * nft_pipapo_get(const struct net *net, const struct nft_set *set, const struct nft_set_elem *elem, unsigned int flags) { … } /** * pipapo_realloc_mt() - Reallocate mapping table if needed upon resize * @f: Field containing mapping table * @old_rules: Amount of existing mapped rules * @rules: Amount of new rules to map * * Return: 0 on success, negative error code on failure. 
*/ static int pipapo_realloc_mt(struct nft_pipapo_field *f, unsigned int old_rules, unsigned int rules) { … } /** * pipapo_resize() - Resize lookup or mapping table, or both * @f: Field containing lookup and mapping tables * @old_rules: Previous amount of rules in field * @rules: New amount of rules * * Increase, decrease or maintain tables size depending on new amount of rules, * and copy data over. In case the new size is smaller, throw away data for * highest-numbered rules. * * Return: 0 on success, -ENOMEM on allocation failure. */ static int pipapo_resize(struct nft_pipapo_field *f, unsigned int old_rules, unsigned int rules) { … } /** * pipapo_bucket_set() - Set rule bit in bucket given group and group value * @f: Field containing lookup table * @rule: Rule index * @group: Group index * @v: Value of bit group */ static void pipapo_bucket_set(struct nft_pipapo_field *f, int rule, int group, int v) { … } /** * pipapo_lt_4b_to_8b() - Switch lookup table group width from 4 bits to 8 bits * @old_groups: Number of current groups * @bsize: Size of one bucket, in longs * @old_lt: Pointer to the current lookup table * @new_lt: Pointer to the new, pre-allocated lookup table * * Each bucket with index b in the new lookup table, belonging to group g, is * filled with the bit intersection between: * - bucket with index given by the upper 4 bits of b, from group g, and * - bucket with index given by the lower 4 bits of b, from group g + 1 * * That is, given buckets from the new lookup table N(x, y) and the old lookup * table O(x, y), with x bucket index, and y group index: * * N(b, g) := O(b / 16, g) & O(b % 16, g + 1) * * This ensures equivalence of the matching results on lookup. Two examples in * pictures: * * bucket * group 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 ... 254 255 * 0 ^ * 1 | ^ * ... ( & ) | * / \ | * / \ .-( & )-. * / bucket \ | | * group 0 / 1 2 3 \ 4 5 6 7 8 9 10 11 12 13 |14 15 | * 0 / \ | | * 1 \ | | * 2 | --' * 3 '- * ... 
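 *
 * The formula above can be sketched as follows, assuming one unsigned long per
 * bucket (bsize == 1) and a flat, group-major table layout. The names
 * lt_4b_to_8b(), lookup_4b() and lookup_8b() are hypothetical and used for
 * illustration only; they are not the functions in this file. In the sketch,
 * g counts 8-bit groups, so the two 4-bit groups being merged are 2 * g and
 * 2 * g + 1.
 *
```c
#include <assert.h>
#include <stdint.h>

// old_lt: old_groups x 16 buckets; new_lt: (old_groups / 2) x 256 buckets
static void lt_4b_to_8b(int old_groups, const unsigned long *old_lt,
			unsigned long *new_lt)
{
	assert(old_groups % 2 == 0);

	for (int g = 0; g < old_groups / 2; g++) {
		const unsigned long *hi = old_lt + (2 * g) * 16;      // upper nibble
		const unsigned long *lo = old_lt + (2 * g + 1) * 16;  // lower nibble

		// A rule matches byte b only if it matched both of its nibbles
		for (int b = 0; b < 256; b++)
			new_lt[g * 256 + b] = hi[b / 16] & lo[b % 16];
	}
}

// Lookup with 4-bit groups: AND one bucket per nibble of the key
static unsigned long lookup_4b(const unsigned long *lt, int groups, uint32_t key)
{
	unsigned long res = ~0UL;

	for (int g = 0; g < groups; g++)
		res &= lt[g * 16 + ((key >> (4 * (groups - 1 - g))) & 0xf)];
	return res;
}

// Lookup with 8-bit groups: AND one bucket per byte of the key
static unsigned long lookup_8b(const unsigned long *lt, int groups, uint32_t key)
{
	unsigned long res = ~0UL;

	for (int g = 0; g < groups; g++)
		res &= lt[g * 256 + ((key >> (8 * (groups - 1 - g))) & 0xff)];
	return res;
}
```
 *
 * For any key, lookup_4b() on the old table and lookup_8b() on the converted
 * table should return the same bitmap, at the cost of 16 times as many
 * buckets per group and half as many groups.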
*/ static void pipapo_lt_4b_to_8b(int old_groups, int bsize, unsigned long *old_lt, unsigned long *new_lt) { … } /** * pipapo_lt_8b_to_4b() - Switch lookup table group width from 8 bits to 4 bits * @old_groups: Number of current groups * @bsize: Size of one bucket, in longs * @old_lt: Pointer to the current lookup table * @new_lt: Pointer to the new, pre-allocated lookup table * * Each bucket with index b in the new lookup table, belonging to group g, is * filled with the bit union of: * - all the buckets with index such that the upper four bits of the lower byte * equal b, from group g, with g odd * - all the buckets with index such that the lower four bits equal b, from * group g, with g even * * That is, given buckets from the new lookup table N(x, y) and the old lookup * table O(x, y), with x bucket index, and y group index: * * - with g odd: N(b, g) := U(O(x, g) for each x : x = (b & 0xf0) >> 4) * - with g even: N(b, g) := U(O(x, g) for each x : x = b & 0x0f) * * where U() denotes the arbitrary union operation (binary OR of n terms). This * ensures equivalence of the matching results on lookup. */ static void pipapo_lt_8b_to_4b(int old_groups, int bsize, unsigned long *old_lt, unsigned long *new_lt) { … } /** * pipapo_lt_bits_adjust() - Adjust group size for lookup table if needed * @f: Field containing lookup table */ static void pipapo_lt_bits_adjust(struct nft_pipapo_field *f) { … } /** * pipapo_insert() - Insert new rule in field given input key and mask length * @f: Field containing lookup table * @k: Input key for classification, without nftables padding * @mask_bits: Length of mask; matches field length for non-ranged entry * * Insert a new rule reference in lookup buckets corresponding to k and * mask_bits. * * Return: 1 on success (one rule inserted), negative error code on failure. 
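 *
 * Which buckets a rule is set in, for a given group, depends on how many mask
 * bits fall within that group. A minimal sketch of that selection, assuming a
 * 32-bit field with 4-bit groups; group_buckets() is a hypothetical helper,
 * not part of this file:
 *
```c
#include <assert.h>
#include <stdint.h>

#define GROUP_BITS 4
#define GROUPS     8   // 32-bit field split in 4-bit groups

// Compute the first and last bucket the rule for key/mask_bits is set in,
// for group g (most significant group first)
static void group_buckets(uint32_t key, int mask_bits, int g,
			  unsigned *first, unsigned *last)
{
	unsigned v, wild;
	int masked;

	assert(g >= 0 && g < GROUPS);
	v = (key >> (GROUP_BITS * (GROUPS - 1 - g))) & 0xf;
	masked = mask_bits - GROUP_BITS * g;   // mask bits left at this group

	if (masked >= GROUP_BITS) {            // fully masked: a single bucket
		*first = *last = v;
	} else if (masked <= 0) {              // not masked at all: every bucket
		*first = 0;
		*last = 15;
	} else {                               // partially masked: a run of buckets
		wild = (1u << (GROUP_BITS - masked)) - 1;
		*first = v & ~wild;
		*last = v | wild;
	}
}
```
 *
 * For 192.168.2.0 with a mask length of 31, this gives single buckets
 * 12, 0, 10, 8, 0, 2, 0 for groups 0 to 6, and buckets 0 to 1 for group 7.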
*/ static int pipapo_insert(struct nft_pipapo_field *f, const uint8_t *k, int mask_bits) { … } /** * pipapo_step_diff() - Check if setting @step bit in netmask would change it * @base: Mask we are expanding * @step: Step bit for given expansion step * @len: Total length of mask space (set and unset bits), bytes * * Convenience function for mask expansion. * * Return: true if step bit changes mask (i.e. isn't set), false otherwise. */ static bool pipapo_step_diff(u8 *base, int step, int len) { … } /** * pipapo_step_after_end() - Check if mask exceeds range end with given step * @base: Mask we are expanding * @end: End of range * @step: Step bit for given expansion step, highest bit to be set * @len: Total length of mask space (set and unset bits), bytes * * Convenience function for mask expansion. * * Return: true if mask exceeds range setting step bits, false otherwise. */ static bool pipapo_step_after_end(const u8 *base, const u8 *end, int step, int len) { … } /** * pipapo_base_sum() - Sum step bit to given len-sized netmask base with carry * @base: Netmask base * @step: Step bit to sum * @len: Netmask length, bytes */ static void pipapo_base_sum(u8 *base, int step, int len) { … } /** * pipapo_expand() - Expand to composing netmasks, insert into lookup table * @f: Field containing lookup table * @start: Start of range * @end: End of range * @len: Length of value in bits * * Expand range to composing netmasks and insert corresponding rule references * in lookup buckets. * * Return: number of inserted rules on success, negative error code on failure. 
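 *
 * For illustration, a range can be split into netmasks by repeatedly taking
 * the largest aligned block that starts at the current value and still fits
 * within the range. This is the generic technique, not the step-based loop
 * used by pipapo_expand(); struct prefix and range_to_prefixes() are
 * hypothetical names, and the field is assumed to be 32 bits wide.
 *
```c
#include <assert.h>
#include <stdint.h>

struct prefix {
	uint32_t base;   // first value covered by the netmask
	int len;         // mask length, in bits
};

// Split [start, end] into netmasks; out needs room for 2 * 32 entries.
// Returns the number of netmasks written.
static int range_to_prefixes(uint32_t start, uint32_t end, struct prefix *out)
{
	uint64_t cur = start;   // 64 bits, so that cur can step past 0xffffffff
	int n = 0;

	assert(start <= end);
	while (cur <= end) {
		int host_bits = 0;

		// Grow the block while it stays aligned at cur and within end
		while (host_bits < 32) {
			uint64_t size = 1ULL << (host_bits + 1);

			if ((cur & (size - 1)) || cur + size - 1 > end)
				break;
			host_bits++;
		}

		out[n].base = (uint32_t)cur;
		out[n].len = 32 - host_bits;
		n++;
		cur += 1ULL << host_bits;
	}

	return n;
}
```
 *
 * Applied to 192.168.1.0-192.168.2.1, this yields 192.168.1.0/24 and
 * 192.168.2.0/31, the two composing netmasks used in the examples of the
 * Theory of Operation.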
 */
static int pipapo_expand(struct nft_pipapo_field *f,
			 const u8 *start, const u8 *end, int len)
{ … }

/**
 * pipapo_map() - Insert rules in mapping tables, mapping them between fields
 * @m:		Matching data, including mapping table
 * @map:	Table of rule maps: array of first rule and amount of rules
 *		in next field a given rule maps to, for each field
 * @e:		For last field, &struct nft_pipapo_elem pointer that matching
 *		rules map to
 */
static void pipapo_map(struct nft_pipapo_match *m,
		       union nft_pipapo_map_bucket map[NFT_PIPAPO_MAX_FIELDS],
		       struct nft_pipapo_elem *e)
{ … }

/**
 * pipapo_free_scratch() - Free per-CPU map at original (not aligned) address
 * @m:		Matching data
 * @cpu:	CPU number
 */
static void pipapo_free_scratch(const struct nft_pipapo_match *m, unsigned int cpu)
{ … }

/**
 * pipapo_realloc_scratch() - Reallocate scratch maps for partial match results
 * @clone:	Copy of matching data with pending insertions and deletions
 * @bsize_max:	Maximum bucket size, scratch maps cover two buckets
 *
 * Return: 0 on success, -ENOMEM on failure.
 */
static int pipapo_realloc_scratch(struct nft_pipapo_match *clone,
				  unsigned long bsize_max)
{ … }

static bool nft_pipapo_transaction_mutex_held(const struct nft_set *set)
{ … }

static struct nft_pipapo_match *pipapo_clone(struct nft_pipapo_match *old);

/**
 * pipapo_maybe_clone() - Build clone for pending data changes, if not existing
 * @set:	nftables API set representation
 *
 * Return: newly created or existing clone, if any. NULL on allocation failure.
 */
static struct nft_pipapo_match *pipapo_maybe_clone(const struct nft_set *set)
{ … }

/**
 * nft_pipapo_insert() - Validate and insert ranged elements
 * @net:	Network namespace
 * @set:	nftables API set representation
 * @elem:	nftables API element representation containing key data
 * @elem_priv:	Filled with pointer to &struct nft_set_ext in inserted element
 *
 * Return: 0 on success, negative error code on failure.
 */
static int nft_pipapo_insert(const struct net *net, const struct nft_set *set,
			     const struct nft_set_elem *elem,
			     struct nft_elem_priv **elem_priv)
{ … }

/**
 * pipapo_clone() - Clone matching data to create new working copy
 * @old:	Existing matching data
 *
 * Return: copy of matching data passed as 'old' or NULL.
 */
static struct nft_pipapo_match *pipapo_clone(struct nft_pipapo_match *old)
{ … }

/**
 * pipapo_rules_same_key() - Get number of rules originated from the same entry
 * @f:		Field containing mapping table
 * @first:	Index of first rule in set of rules mapping to same entry
 *
 * Using the fact that all rules in a field that originated from the same entry
 * will map to the same set of rules in the next field, or to the same element
 * reference, return the cardinality of the set of rules that originated from
 * the same entry as the rule with index @first, @first rule included.
 *
 * In pictures:
 *                              rules
 *      field #0                0    1    2    3    4
 *              map to:         0    1   2-4  2-4  5-9
 *                              .    .    .......   . ...
 *                              |    |    |    | \   \
 *                              |    |    |    |  \   \
 *                              |    |    |    |   \   \
 *                              '    '    '    '    '   \
 *      in field #1             0    1    2    3    4    5 ...
 *
 * if this is called for rule 2 on field #0, it will return 2, as rules 2 and 3
 * in field #0 both map to the same set of rules (2, 3, 4) in the next field.
 *
 * For the last field in a set, we can rely on associated entries to map to the
 * same element references.
 *
 * Return: Number of rules that originated from the same entry as @first.
*/ static unsigned int pipapo_rules_same_key(struct nft_pipapo_field *f, unsigned int first) { … } /** * pipapo_unmap() - Remove rules from mapping tables, renumber remaining ones * @mt: Mapping array * @rules: Original amount of rules in mapping table * @start: First rule index to be removed * @n: Amount of rules to be removed * @to_offset: First rule index, in next field, this group of rules maps to * @is_last: If this is the last field, delete reference from mapping array * * This is used to unmap rules from the mapping table for a single field, * maintaining consistency and compactness for the existing ones. * * In pictures: let's assume that we want to delete rules 2 and 3 from the * following mapping array: * * rules * 0 1 2 3 4 * map to: 4-10 4-10 11-15 11-15 16-18 * * the result will be: * * rules * 0 1 2 * map to: 4-10 4-10 11-13 * * for fields before the last one. In case this is the mapping table for the * last field in a set, and rules map to pointers to &struct nft_pipapo_elem: * * rules * 0 1 2 3 4 * element pointers: 0x42 0x42 0x33 0x33 0x44 * * the result will be: * * rules * 0 1 2 * element pointers: 0x42 0x42 0x44 */ static void pipapo_unmap(union nft_pipapo_map_bucket *mt, unsigned int rules, unsigned int start, unsigned int n, unsigned int to_offset, bool is_last) { … } /** * pipapo_drop() - Delete entry from lookup and mapping tables, given rule map * @m: Matching data * @rulemap: Table of rule maps, arrays of first rule and amount of rules * in next field a given entry maps to, for each field * * For each rule in lookup table buckets mapping to this set of rules, drop * all bits set in lookup table mapping. 
In pictures, assuming we want to drop
 * rules 0 and 1 from this lookup table:
 *
 *                     bucket
 *      group  0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
 *        0    0                                              1,2
 *        1   1,2                                      0
 *        2    0                                      1,2
 *        3    0                              1,2
 *        4  0,1,2
 *        5    0   1   2
 *        6  0,1,2 1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
 *        7   1,2 1,2  1   1   1  0,1  1   1   1   1   1   1   1   1   1   1
 *
 * rule 2 becomes rule 0, and the result will be:
 *
 *                     bucket
 *      group  0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
 *        0                                                    0
 *        1    0
 *        2                                            0
 *        3                                    0
 *        4    0
 *        5            0
 *        6    0
 *        7    0   0
 *
 * once this is done, call pipapo_unmap() to drop all the corresponding rule
 * references from mapping tables.
 */
static void pipapo_drop(struct nft_pipapo_match *m,
			union nft_pipapo_map_bucket rulemap[])
{ … }

static void nft_pipapo_gc_deactivate(struct net *net, struct nft_set *set,
				     struct nft_pipapo_elem *e)
{ … }

/**
 * pipapo_gc() - Drop expired entries from set, destroy start and end elements
 * @set:	nftables API set representation
 * @m:		Matching data
 */
static void pipapo_gc(struct nft_set *set, struct nft_pipapo_match *m)
{ … }

/**
 * pipapo_free_fields() - Free per-field tables contained in matching data
 * @m:		Matching data
 */
static void pipapo_free_fields(struct nft_pipapo_match *m)
{ … }

static void pipapo_free_match(struct nft_pipapo_match *m)
{ … }

/**
 * pipapo_reclaim_match() - RCU callback to free fields from old matching data
 * @rcu:	RCU head
 */
static void pipapo_reclaim_match(struct rcu_head *rcu)
{ … }

/**
 * nft_pipapo_commit() - Replace lookup data with current working copy
 * @set:	nftables API set representation
 *
 * While at it, check if we should perform garbage collection on the working
 * copy before committing it for lookup, and don't replace the table if the
 * working copy doesn't have pending changes.
 *
 * We also need to create a new working copy for subsequent insertions and
 * deletions.
 */
static void nft_pipapo_commit(struct nft_set *set)
{ … }

static void nft_pipapo_abort(const struct nft_set *set)
{ … }

/**
 * nft_pipapo_activate() - Mark element reference as active given key, commit
 * @net:	Network namespace
 * @set:	nftables API set representation
 * @elem_priv:	nftables API element representation containing key data
 *
 * On insertion, elements are added to a copy of the matching data currently
 * in use for lookups, and not directly inserted into current lookup data. Both
 * nft_pipapo_insert() and nft_pipapo_activate() are called once for each
 * element, hence we can't use either one as a real commit operation.
 */
static void nft_pipapo_activate(const struct net *net,
				const struct nft_set *set,
				struct nft_elem_priv *elem_priv)
{ … }

/**
 * nft_pipapo_deactivate() - Search for element and make it inactive
 * @net:	Network namespace
 * @set:	nftables API set representation
 * @elem:	nftables API element representation containing key data
 *
 * Return: deactivated element if found, NULL otherwise.
 */
static struct nft_elem_priv *
nft_pipapo_deactivate(const struct net *net, const struct nft_set *set,
		      const struct nft_set_elem *elem)
{ … }

/**
 * nft_pipapo_flush() - Make element inactive
 * @net:	Network namespace
 * @set:	nftables API set representation
 * @elem_priv:	nftables API element representation containing key data
 *
 * This is functionally the same as nft_pipapo_deactivate(), with a slightly
 * different interface, and it's also called once for each element in a set
 * being flushed, so we can't implement, strictly speaking, a flush operation,
 * which would otherwise be as simple as allocating an empty copy of the
 * matching data.
 *
 * Note that we could in theory do that, mark the set as flushed, and ignore
 * subsequent calls, but we would leak all the elements after the first one,
 * because they wouldn't then be freed as result of API calls.
 */
static void nft_pipapo_flush(const struct net *net, const struct nft_set *set,
			     struct nft_elem_priv *elem_priv)
{ … }

/**
 * pipapo_get_boundaries() - Get byte interval for associated rules
 * @f:		Field including lookup table
 * @first_rule:	First rule (lowest index)
 * @rule_count:	Number of associated rules
 * @left:	Byte expression for left boundary (start of range)
 * @right:	Byte expression for right boundary (end of range)
 *
 * Given the first rule and amount of rules that originated from the same entry,
 * build the original range associated with the entry, and calculate the length
 * of the originating netmask.
 *
 * In pictures:
 *
 *                     bucket
 *      group  0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
 *        0                                                   1,2
 *        1   1,2
 *        2                                           1,2
 *        3                                   1,2
 *        4   1,2
 *        5        1   2
 *        6   1,2  1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
 *        7   1,2 1,2  1   1   1   1   1   1   1   1   1   1   1   1   1   1
 *
 * this is the lookup table corresponding to the IPv4 range
 * 192.168.1.0-192.168.2.1, which was expanded to the two composing netmasks,
 * rule #1: 192.168.1.0/24, and rule #2: 192.168.2.0/31.
 *
 * This function fills @left and @right with the byte values of the leftmost
 * and rightmost bucket indices for the lowest and highest rule indices,
 * respectively. If @first_rule is 1 and @rule_count is 2, we obtain, in
 * nibbles:
 *   left:  < 12, 0, 10, 8, 0, 1, 0, 0 >
 *   right: < 12, 0, 10, 8, 0, 2, 0, 1 >
 * corresponding to bytes:
 *   left:  < 192, 168, 1, 0 >
 *   right: < 192, 168, 2, 1 >
 * with mask length irrelevant here, unused on return, as the range is already
 * defined by its start and end points. The mask length is relevant for a single
 * ranged entry instead: if @first_rule is 1 and @rule_count is 1, we ignore
 * rule 2 above: @left becomes < 192, 168, 1, 0 >, @right becomes
 * < 192, 168, 1, 255 >, and the mask length, calculated from the distances
 * between leftmost and rightmost bucket indices for each group, would be 24.
 *
 * Return: mask length, in bits.
*/ static int pipapo_get_boundaries(struct nft_pipapo_field *f, int first_rule, int rule_count, u8 *left, u8 *right) { … } /** * pipapo_match_field() - Match rules against byte ranges * @f: Field including the lookup table * @first_rule: First of associated rules originating from same entry * @rule_count: Amount of associated rules * @start: Start of range to be matched * @end: End of range to be matched * * Return: true on match, false otherwise. */ static bool pipapo_match_field(struct nft_pipapo_field *f, int first_rule, int rule_count, const u8 *start, const u8 *end) { … } /** * nft_pipapo_remove() - Remove element given key, commit * @net: Network namespace * @set: nftables API set representation * @elem_priv: nftables API element representation containing key data * * Similarly to nft_pipapo_activate(), this is used as commit operation by the * API, but it's called once per element in the pending transaction, so we can't * implement this as a single commit operation. Closest we can get is to remove * the matched element here, if any, and commit the updated matching data. */ static void nft_pipapo_remove(const struct net *net, const struct nft_set *set, struct nft_elem_priv *elem_priv) { … } /** * nft_pipapo_do_walk() - Walk over elements in m * @ctx: nftables API context * @set: nftables API set representation * @m: matching data pointing to key mapping array * @iter: Iterator * * As elements are referenced in the mapping array for the last field, directly * scan that array: there's no need to follow rule mappings from the first * field. @m is protected either by RCU read lock or by transaction mutex. 
*/ static void nft_pipapo_do_walk(const struct nft_ctx *ctx, struct nft_set *set, const struct nft_pipapo_match *m, struct nft_set_iter *iter) { … } /** * nft_pipapo_walk() - Walk over elements * @ctx: nftables API context * @set: nftables API set representation * @iter: Iterator * * Test if destructive action is needed or not, clone active backend if needed * and call the real function to work on the data. */ static void nft_pipapo_walk(const struct nft_ctx *ctx, struct nft_set *set, struct nft_set_iter *iter) { … } /** * nft_pipapo_privsize() - Return the size of private data for the set * @nla: netlink attributes, ignored as size doesn't depend on them * @desc: Set description, ignored as size doesn't depend on it * * Return: size of private data for this set implementation, in bytes */ static u64 nft_pipapo_privsize(const struct nlattr * const nla[], const struct nft_set_desc *desc) { … } /** * nft_pipapo_estimate() - Set size, space and lookup complexity * @desc: Set description, element count and field description used * @features: Flags: NFT_SET_INTERVAL needs to be there * @est: Storage for estimation data * * Return: true if set description is compatible, false otherwise */ static bool nft_pipapo_estimate(const struct nft_set_desc *desc, u32 features, struct nft_set_estimate *est) { … } /** * nft_pipapo_init() - Initialise data for a set instance * @set: nftables API set representation * @desc: Set description * @nla: netlink attributes * * Validate number and size of fields passed as NFTA_SET_DESC_CONCAT netlink * attributes, initialise internal set parameters, current instance of matching * data and a copy for subsequent insertions. * * Return: 0 on success, negative error code on failure. 
*/ static int nft_pipapo_init(const struct nft_set *set, const struct nft_set_desc *desc, const struct nlattr * const nla[]) { … } /** * nft_set_pipapo_match_destroy() - Destroy elements from key mapping array * @ctx: context * @set: nftables API set representation * @m: matching data pointing to key mapping array */ static void nft_set_pipapo_match_destroy(const struct nft_ctx *ctx, const struct nft_set *set, struct nft_pipapo_match *m) { … } /** * nft_pipapo_destroy() - Free private data for set and all committed elements * @ctx: context * @set: nftables API set representation */ static void nft_pipapo_destroy(const struct nft_ctx *ctx, const struct nft_set *set) { … } /** * nft_pipapo_gc_init() - Initialise garbage collection * @set: nftables API set representation * * Instead of actually setting up a periodic work for garbage collection, as * this operation requires a swap of matching data with the working copy, we'll * do that opportunistically with other commit operations if the interval is * elapsed, so we just need to set the current jiffies timestamp here. */ static void nft_pipapo_gc_init(const struct nft_set *set) { … } const struct nft_set_type nft_set_pipapo_type = …; #if defined(CONFIG_X86_64) && !defined(CONFIG_UML) const struct nft_set_type nft_set_pipapo_avx2_type = …; #endif