Back to Blog List

A Data Structure for managing Ranges

2024-11-30

Problem Description

You are an engineer working on a SaaS application. Your company’s biggest clients have requested a new feature. They would like the application to support IP Filtering when accessing their data. The customers want to control which IP addresses their tenants’ users can use to access the application, both on the control plane and the data plane. Essentially, they want the application to store IPv4 addresses that specify an allow-list of IPs for their tenant’s information. Any request to their tenant should only be allowed if the request originates from an IPv4 address in the allow-list. The requirements are as follows:

  1. The app will receive a range of IPs in the form of $[\text{start IP}, \text{end IP})$
  2. The customer may decide to change that range at any time by removing one or more IPs from the list or adding more ranges.

Representing an IPV4 Address

#ifndef IPV4_H
#define IPV4_H

#include <string>
#include <optional>

class IpV4 {
    using B = unsigned char;

    B b1,b2,b3,b4;

    bool ValidateAndCreate(const std::string& ip);
    IpV4():b1(0),b2(0),b3(0),b4(0){}

public:

    static std::optional<IpV4> FromString(const std::string &ip);
    std::strong_ordering operator<=>(const IpV4&) const;
    bool operator==(const IpV4& other) const;

    IpV4(const IpV4& other);
    IpV4& operator=(const IpV4& other);

    IpV4(const IpV4&& other) = delete;
    IpV4& operator=(const IpV4&& other) = delete;

    ~IpV4(){}

    std::string ToString() const;
};

#endif
#include "IpV4.h"
#include <iostream>
#include <sstream>
#include <vector>

using namespace std;

auto IpV4::ValidateAndCreate(const std::string &ip) -> bool{
    if(ip.empty()) {
        return false;
    }

    istringstream ss(ip);
    vector<string> tokens;
    string temp;
    while (getline(ss,temp, '.')) {
        tokens.push_back(temp);
    }

    if(tokens.size() != 4) {
        return false;
    }

    int t1 = stoi(tokens[0]);
    int t2 = stoi(tokens[1]);
    int t3 = stoi(tokens[2]);
    int t4 = stoi(tokens[3]);

    if (t1 <= 255 && t1 > 0 &&
        t2 >= 0 && t2 <= 255 && t3 >= 0 &&
        t3 <= 255 && t4 >= 0 && t4 <= 255) {
        b1 = t1;
        b2 = t2;
        b3 = t3;
        b4 = t4;
        return true;
    }

    return false;
}

auto IpV4::FromString(const std::string &ip)->optional<IpV4> {
    if(ip.empty()) {
        return std::nullopt;
    }

    IpV4 ipv4;
    if (ipv4.ValidateAndCreate(ip)) {
        return std::optional<IpV4>(ipv4);
    }

    return std::nullopt;
}

auto IpV4::operator=(const IpV4 &ip)->IpV4& {
    b1 = ip.b1;
    b2 = ip.b2;
    b3 = ip.b3;
    b4 = ip.b4;
    return *this;
}

IpV4::IpV4(const IpV4& other){
    operator=(other);
}

auto IpV4::ToString() const -> string {
   ostringstream oss;
    oss << to_string(b1) << '.' << to_string(b2) << '.' << to_string(b3) << '.' << to_string(b4);
    return oss.str();
}

auto IpV4::operator<=>(const IpV4& other) const ->strong_ordering {
    if (auto cmp = b1 <=> other.b1; cmp != std::strong_ordering::equal) return cmp;
    if (auto cmp = b2 <=> other.b2; cmp != std::strong_ordering::equal) return cmp;
    if (auto cmp = b3 <=> other.b3; cmp != std::strong_ordering::equal) return cmp;
    return b4 <=> other.b4;
}

bool IpV4::operator==(const IpV4 &other) const {
    return (b1 == other.b1 && b2 == other.b2 && b3 == other.b3 && b4 == other.b4);
}

The code above is a simple C++ class that represents an IPv4 address. Some of the code may not make sense at this point, but it will soon enough.

How to store them

You will obviously store the list of IPs in your database for persistence and durability. However, because every request to your service will need access to that list, and because the list might grow large, you realize you need to keep the IPs in memory in some sort of cache. But how will that cache be structured?

Your first attempt might be to keep all the IPs in some sort of HashSet. When a request arrives, you simply look the IP up in the HashSet. To populate the HashSet, you iterate through all the IPs in the given range and add them to the set. Then, when a request comes in, you look up the IP in the set and accept or reject the request based on the result.

This is a good first try, but you soon realize that the IP ranges might be very large. Also, since the customer might add or remove ranges, you would have to constantly iterate over the set to manage these operations.

Is there a better way?

Storing Ranges

Instead of storing the IPs individually, you can store them as ranges. Each range looks like $[Start, End)$. The challenge is how to store these ranges so they remain disjoint, thereby making lookups efficient. You also need a way to combine disjoint ranges when a new range merges them, and then split them back into disjoint ranges when a portion is removed.

The Range Manager

To store the ranges, we will use a balanced binary search tree, specifically a std::map. The tree will be sorted by the end of the range, with the value storing the start of the range.

Why store them by the end of the range? That might not be immediately obvious, so let’s first look at the operations we need to support and how we can implement them. That will make it clearer why we want the map keyed by the end of the range.

template <std::totally_ordered T>
class RangesManager {

    template <std::totally_ordered Key, typename Value>
    using TotallyOrderedMap = std::map<Key, Value>;

public:

    using Start = T;
    using End = T;

private:
    TotallyOrderedMap<End, Start> ranges;
bool AddRange(Start start, End end);
bool RemoveRange(Start start, End end);
bool IsInRange(const T& value) const;

A couple of things to note here:

  1. The template parameter T is the type of the start and end of the range.
  2. The member ranges is a std::map that stores the end of the range as the key and the start of the range as the value.
  3. Notice that T is required to adhere to the std::totally_ordered concept. This is because the std::map requires the key type to be comparable and we want to support any type that can be ordered.

When adding a range to the manager, we must ensure it does not intersect with existing ranges. If it does, we need to merge them. The same goes for removing a range.

Imagine we stored ranges by their start, with these ranges:
$[s1,e1)$
$[s2,e2)$
$[s3,e3)$
$[s4,e4)$

Suppose we want to add $[s5,e5)$ such that $s5 \in [s1,e1)$ and $e5 \in [s4,e4]$. We must merge the entire set of ranges into a single range $[s1,e5)$.
how do we detect that $s2 \in [s1,e1)$?
If we use C++’s lower_bound, we get the first range whose end is greater than $s5$ which is
$[s2,e2)$. Then we’d need to step back an element in the map, checking if its start is less than or equal to $s5$, and so on. We’d do the same for $e5$, to find the range that contains it.

By storing ranges by their end, our map might hold something like:
$[e1,s1)$
$[e2,s2)$
$[e3,s3)$
$[e4,s4)$

When adding $[s5,e5)$, we use lower_bound(s5) (or similarly lower_bound(start) in practice) and merge accordingly. We keep merging until we reach the range that encompasses $e5$ or exhaust the map.

Implementation of AddRange

bool AddRange(Start start, End end) {
    if (start >= end) return false;
    for (auto iter = ranges.lower_bound(start); iter != ranges.end() && end >= iter->second;) {
        start = std::min(start, iter->second);
        end = std::max(end, iter->first);
        iter = ranges.erase(iter);
    }

    ranges.insert({end, start});
    return true;
}

Note that:

iter = ranges.erase(iter);

removes the current range and returns an iterator to the next item in the map.

RemoveRange is similar to AddRange, except that removing part of a range might split it into multiple disjoint ranges. We then re-add these disjoint fragments.

Implementation of RemoveRange

bool RemoveRange(Start start, End end) {
    if (start >= end) return false;

    std::vector<std::pair<T,T>> addBack;
    bool removed = false;

    for (auto iter = ranges.lower_bound(start); iter != ranges.end() && end >= iter->second;) {
        if (start > iter->second) {
            addBack.push_back({iter->second, start});
        }

        if (end < iter->first) {
            addBack.push_back({end, iter->first});
        }

        iter = ranges.erase(iter);
        removed = true;
    }

    for (auto &p: addBack) {
        AddRange(p.first, p.second);
    }

    return removed;
}

Finally, to check if a value is in any range:

bool IsInRange(const T& value) const {
    auto iter = ranges.lower_bound(value);
    return iter != ranges.end() && iter->first > value && iter->second <= value;
}

This leverages the fact that the ranges are keyed by their end boundaries. We use lower_bound to find the first range whose end is $\geq$ value, then check if that range’s start is $\leq$ value.

All operations run in $O(\log n)$ time, where n is the number of stored ranges.

Full Implementation of RangesManager

#ifndef RANGESMANAGER_H
#define RANGESMANAGER_H

#include <map>
#include <thread>
#include <vector>

template <std::totally_ordered T>
class RangesManager {

    template <std::totally_ordered Key, typename Value>
    using TotallyOrderedMap = std::map<Key, Value>;

public:

    using Start = T;
    using End = T;

private:

    TotallyOrderedMap<End, Start> ranges;
    using _RI = decltype(ranges.begin());

public:

    class RangeIter {
        friend class RangesManager;

        _RI begin;
        RangeIter(_RI b) : begin(b) {}
    public:

        std::pair<Start,End> operator*() {
            return {begin->second, begin->first};
        }

        bool operator==(const RangeIter &other) const {
            return begin == other.begin;
        }

        bool operator!=(const RangeIter &other) const {
            return !(*this == other);
        }

        RangeIter &operator++() {
            ++begin;
            return *this;
        }
    };

    RangesManager(){
    }

    ~RangesManager(){}

    RangesManager(const RangesManager &other) = delete;
    RangesManager &operator=(const RangesManager &other) = delete;

    RangesManager(RangesManager &&other):ranges(std::move(other.ranges)){}

    RangesManager &operator=(RangesManager &&other) {
        ranges = std::move(other.ranges);
    }

    // insert them to the map in reverse order
    bool AddRange(Start start, End end) {
        if (start >= end) return false;
        for (auto iter = ranges.lower_bound(start); iter != ranges.end() && end >= iter->second;) {
            start = std::min(start, iter->second);
            end = std::max(end, iter->first);
            iter = ranges.erase(iter);
        }

        ranges.insert({end, start});
        return true;
    }

    bool IsInRange(const T& value) const {
        auto iter = ranges.lower_bound(value);
        return iter != ranges.end() && iter->first > value && iter->second <= value;
    }

    bool RemoveRange(Start start, End end) {
        if (start >= end) return false;

        std::vector<std::pair<T,T>> addBack;
        bool removed = false;

        for (auto iter = ranges.lower_bound(start); iter != ranges.end() && end >= iter->second;) {
            if (start > iter->second) {
                addBack.push_back({iter->second, start});
            }

            if (end < iter->first) {
                addBack.push_back({end, iter->first});
            }

            iter = ranges.erase(iter);
            removed = true;
        }

        for (auto &p: addBack) {
            AddRange(p.first, p.second);
        }

        return removed;
    }

    RangeIter begin() {
        return RangeIter(ranges.begin());
    }

    RangeIter end() {
        return RangeIter(ranges.end());
    }
};

#endif //RANGESMANAGER_H

Notice how the RangeIter class is nested inside RangesManager. This design allows RangeIter to access the private members of RangesManager, so we can easily iterate over the private map but in a controlled manner.

Sample Test Cases

Below are examples of using the RangesManager class to demonstrate how ranges behave under various scenarios:

bool TestBasic() {
    RangesManager<IpV4> rg;
    auto ip1 = IpV4::FromString("192.168.0.1");
    auto ip2 = IpV4::FromString("192.168.0.10");

    if (!ip1.has_value() || !ip2.has_value()) {
        std::cerr << "TestBasic: Failed to parse IPs.\n";
        return false;
    }

    rg.AddRange(ip1.value(), ip2.value());

    auto it = rg.begin();
    if (it == rg.end()) {
        std::cerr << "TestBasic: No range added.\n";
        return false;
    }

    const auto& [start, end] = *it;
    if (start.ToString() != "192.168.0.1" || end.ToString() != "192.168.0.10") {
        std::cerr << "TestBasic: Incorrect range: "
                  << start.ToString() << "," << end.ToString() << "\n";
        return false;
    }

    std::cout << "TestBasic passed.\n";
    return true;
}

bool TestOverlappingRanges() {
    RangesManager<IpV4> rg;
    auto ip1 = IpV4::FromString("192.168.0.1");
    auto ip2 = IpV4::FromString("192.168.0.10");
    auto ip3 = IpV4::FromString("192.168.0.5");
    auto ip4 = IpV4::FromString("192.168.0.15");

    if (!ip1.has_value() || !ip2.has_value() || !ip3.has_value() || !ip4.has_value()) {
        std::cerr << "TestOverlappingRanges: Failed to parse IPs.\n";
        return false;
    }

    rg.AddRange(ip1.value(), ip2.value());
    rg.AddRange(ip3.value(), ip4.value());

    auto it = rg.begin();
    if (it == rg.end()) {
        std::cerr << "TestOverlappingRanges: No range added.\n";
        return false;
    }

    const auto& [start, end] = *it;
    if (start.ToString() != "192.168.0.1" || end.ToString() != "192.168.0.15") {
        std::cerr << "TestOverlappingRanges: Incorrect range: "
                  << start.ToString() << "," << end.ToString() << "\n";
        return false;
    }

    std::cout << "TestOverlappingRanges passed.\n";
    return true;
}

bool TestNonOverlappingRanges() {
    RangesManager<IpV4> rg;
    auto ip1 = IpV4::FromString("10.0.0.1");
    auto ip2 = IpV4::FromString("10.0.0.5");
    auto ip3 = IpV4::FromString("10.0.0.10");
    auto ip4 = IpV4::FromString("10.0.0.20");

    if (!ip1.has_value() || !ip2.has_value() || !ip3.has_value() || !ip4.has_value()) {
        std::cerr << "TestNonOverlappingRanges: Failed to parse IPs.\n";
        return false;
    }

    rg.AddRange(ip1.value(), ip2.value());
    rg.AddRange(ip3.value(), ip4.value());

    auto it = rg.begin();
    if (it == rg.end()) {
        std::cerr << "TestNonOverlappingRanges: No ranges added.\n";
        return false;
    }

    const auto [start1, end1] = *it;
    if (start1.ToString() != "10.0.0.1" || end1.ToString() != "10.0.0.5") {
        std::cerr << "TestNonOverlappingRanges: Incorrect first range: "
                  << start1.ToString() << "," << end1.ToString() << "\n";
        return false;
    }

    ++it;

    if (it == rg.end()) {
        std::cerr << "TestNonOverlappingRanges: Second range missing.\n";
        return false;
    }

    const auto& [start2, end2] = *it;
    if (start2.ToString() != "10.0.0.10" || end2.ToString() != "10.0.0.20") {
        std::cerr << "TestNonOverlappingRanges: Incorrect second range: "
                  << start2.ToString() << "," << end2.ToString() << "\n";
        return false;
    }

    std::cout << "TestNonOverlappingRanges passed.\n";
    return true;
}

bool TestRemoveRange() {
    RangesManager<IpV4> rg;
    auto ip1 = IpV4::FromString("192.168.1.1");
    auto ip2 = IpV4::FromString("192.168.1.20");
    auto ip3 = IpV4::FromString("192.168.1.10");
    auto ip4 = IpV4::FromString("192.168.1.15");

    if (!ip1.has_value() || !ip2.has_value() || !ip3.has_value() || !ip4.has_value()) {
        std::cerr << "TestRemoveRange: Failed to parse IPs.\n";
        return false;
    }

    rg.AddRange(ip1.value(), ip2.value());
    rg.RemoveRange(ip3.value(), ip4.value());

    auto it = rg.begin();
    if (it == rg.end()) {
        std::cerr << "TestRemoveRange: Ranges missing after removal.\n";
        return false;
    }

    const auto& [start1, end1] = *it;
    if (start1.ToString() != "192.168.1.1" || end1.ToString() != "192.168.1.10") {
        std::cerr << "TestRemoveRange: Incorrect first range: "
                  << start1.ToString() << "," << end1.ToString() << "\n";
        return false;
    }

    ++it;

    if (it == rg.end()) {
        std::cerr << "TestRemoveRange: Second range missing.\n";
        return false;
    }

    const auto& [start2, end2] = *it;
    if (start2.ToString() != "192.168.1.15" || end2.ToString() != "192.168.1.20") {
        std::cerr << "TestRemoveRange: Incorrect second range: "
                  << start2.ToString() << "," << end2.ToString() << "\n";
        return false;
    }

    std::cout << "TestRemoveRange passed.\n";
    return true;
}

bool TestContainment() {
    RangesManager<IpV4> rg;
    auto ip1 = IpV4::FromString("172.16.0.1");
    auto ip2 = IpV4::FromString("172.16.0.10");
    auto ipCheck1 = IpV4::FromString("172.16.0.5");
    auto ipCheck2 = IpV4::FromString("172.16.0.15");

    if (!ip1.has_value() || !ip2.has_value() || !ipCheck1.has_value() || !ipCheck2.has_value()) {
        std::cerr << "TestContainment: Failed to parse IPs.\n";
        return false;
    }

    rg.AddRange(ip1.value(), ip2.value());

    if (!rg.IsInRange(ipCheck1.value())) {
        std::cerr << "TestContainment: IP " << ipCheck1.value().ToString()
                  << " should be in range but is not.\n";
        return false;
    }

    if (rg.IsInRange(ipCheck2.value())) {
        std::cerr << "TestContainment: IP " << ipCheck2.value().ToString()
                  << " should not be in range but is.\n";
        return false;
    }

    std::cout << "TestContainment passed.\n";
    return true;
}

bool TestManyIPsInterleaving() {
    RangesManager<IpV4> rg;

    // Step 1: Add initial ranges
    auto ip1 = IpV4::FromString("10.0.0.1");
    auto ip2 = IpV4::FromString("10.0.0.5");
    auto ip3 = IpV4::FromString("10.0.0.10");
    auto ip4 = IpV4::FromString("10.0.0.15");
    auto ip5 = IpV4::FromString("192.168.0.1");
    auto ip6 = IpV4::FromString("192.168.0.10");
    auto ip7 = IpV4::FromString("192.168.0.20");
    auto ip8 = IpV4::FromString("192.168.0.30");

    if (!ip1 || !ip2 || !ip3 || !ip4 || !ip5 || !ip6 || !ip7 || !ip8) {
        std::cerr << "TestManyIPsInterleaving: Failed to parse IPs.\n";
        return false;
    }

    rg.AddRange(ip1.value(), ip2.value()); // Add: 10.0.0.1 - 10.0.0.5
    rg.AddRange(ip3.value(), ip4.value()); // Add: 10.0.0.10 - 10.0.0.15
    rg.AddRange(ip5.value(), ip6.value()); // Add: 192.168.0.1 - 192.168.0.10
    rg.AddRange(ip7.value(), ip8.value()); // Add: 192.168.0.20 - 192.168.0.30

    // Step 2: Validate initial ranges
    std::vector<std::pair<std::string, std::string>> expectedInitialRanges = {
        {"10.0.0.1", "10.0.0.5"},
        {"10.0.0.10", "10.0.0.15"},
        {"192.168.0.1", "192.168.0.10"},
        {"192.168.0.20", "192.168.0.30"}
    };

    size_t idx = 0;
    for (auto [start, end] : rg) {
        if (idx >= expectedInitialRanges.size() || start.ToString() != expectedInitialRanges[idx].first || end.ToString() != expectedInitialRanges[idx].second) {
            std::cerr << "Step 2: Initial range validation failed.\n";
            return false;
        }
        idx++;
    }

    if (idx != expectedInitialRanges.size()) {
        std::cerr << "Step 2: Incorrect number of initial ranges.\n";
        return false;
    }

    // Step 3: Add overlapping ranges
    auto ip9 = IpV4::FromString("10.0.0.4");
    auto ip10 = IpV4::FromString("10.0.0.12");

    rg.AddRange(ip9.value(), ip10.value()); // Merge: 10.0.0.1 - 10.0.0.15

    std::vector<std::pair<std::string, std::string>> expectedAfterOverlap = {
        {"10.0.0.1", "10.0.0.15"},
        {"192.168.0.1", "192.168.0.10"},
        {"192.168.0.20", "192.168.0.30"}
    };

    idx = 0;
    for (auto [start, end] : rg) {
        if (idx >= expectedAfterOverlap.size() || start.ToString() != expectedAfterOverlap[idx].first || end.ToString() != expectedAfterOverlap[idx].second) {
            std::cerr << "Step 3: Overlapping range validation failed.\n";
            return false;
        }
        idx++;
    }

    if (idx != expectedAfterOverlap.size()) {
        std::cerr << "Step 3: Incorrect number of ranges after overlap.\n";
        return false;
    }

    // Step 4: Remove part of a range
    auto ipRemoveStart = IpV4::FromString("192.168.0.5");
    auto ipRemoveEnd = IpV4::FromString("192.168.0.25");

    rg.RemoveRange(ipRemoveStart.value(), ipRemoveEnd.value()); // Split: 192.168.0.1 - 192.168.0.5 and 192.168.0.25 - 192.168.0.30

    std::vector<std::pair<std::string, std::string>> expectedAfterRemoval = {
        {"10.0.0.1", "10.0.0.15"},
        {"192.168.0.1", "192.168.0.5"},
        {"192.168.0.25", "192.168.0.30"}
    };

    idx = 0;
    for (auto [start, end] : rg) {
        if (idx >= expectedAfterRemoval.size() || start.ToString() != expectedAfterRemoval[idx].first || end.ToString() != expectedAfterRemoval[idx].second) {
            std::cerr << "Step 4: Range validation after removal failed.\n";
            return false;
        }
        idx++;
    }

    if (idx != expectedAfterRemoval.size()) {
        std::cerr << "Step 4: Incorrect number of ranges after removal.\n";
        return false;
    }

    // Step 5: Check containment
    auto ipCheck1 = IpV4::FromString("10.0.0.3");
    auto ipCheck2 = IpV4::FromString("192.168.0.7");
    auto ipCheck3 = IpV4::FromString("192.168.0.28");

    if (!ipCheck1 || !ipCheck2 || !ipCheck3) {
        std::cerr << "TestManyIPsInterleaving: Failed to parse check IPs.\n";
        return false;
    }

    if (!rg.IsInRange(ipCheck1.value())) {
        std::cerr << "Step 5: Containment check failed for " << ipCheck1.value().ToString() << " (should be in range).\n";
        return false;
    }

    if (rg.IsInRange(ipCheck2.value())) {
        std::cerr << "Step 5: Containment check failed for " << ipCheck2.value().ToString() << " (should not be in range).\n";
        return false;
    }

    if (!rg.IsInRange(ipCheck3.value())) {
        std::cerr << "Step 5: Containment check failed for " << ipCheck3.value().ToString() << " (should be in range).\n";
        return false;
    }

    std::cout << "TestManyIPsInterleaving passed.\n";
    return true;
}

Conclusion

We have seen how to manage sets of ranges efficiently using a balanced tree and by storing the ranges keyed by their end. This approach simplifies merging, splitting, and lookup. The full code for the RangesManager class can be found here.

Share this post