探讨 Rust 实现自引用结构

Language

December 05, 2024

别名：BT(3-extra) - Flat B+树（Vec like）

– 探索 Rust 编程模型下传统自引用数据结构的最佳实现

– 实验一种基于数组的扁平化节点和键值分离的 B+ 树实现

问题引入

自引用数据结构（Self Referential Structures），指存在递归定义的结构体，最简单的比如链表里的节点，

在 C 里使用指针（Pointer）实现这种定义

struct Node {
    struct Node *next,
    ..
}

在传统的面向对象的语言比如 Java 里，直接使用引用（Reference）

class Node {
    Node next;
}

但对于 Rust 而言，引入指针就破坏了内存安全（Memory safety），使用引用就面临可变性检查。

实在是进退失据，不知如何是好；或者说，按照 Rust 哲学自引用数据结构是一种反模式，事实上它也确实存在内存安全或并发编程方面的脆弱性。

但不管怎样，因为有大量的历史算法沉淀在自引用的结构里（树和图），无论如何都要找到一个在 Rust 里应付这种数据结构的办法，因为缺乏来自语言机制上的大力支持，这并不容易，实际变成了一个考验对 Rust 熟悉程度的一个门槛，像我自己用了 Rust 好几年了，到今天才算找到一点儿感觉，在此写下一个总结性的心得体会。

两种常规方法

一般地讲，有两种常规的方法来实现自引用结构，一种是使用“官方指定”的“作弊器”，另一种是使用“官方使用”的“作弊器”。

官方指定

“官方指定”的作弊器指的是官方文档里推荐的 Rc<RefCell<T>> 套装。

这个套装的两个组件 RefCell 和 Rc 的设计可谓“卧龙凤雏”、“一时瑜亮”。

`RefCell`

RefCell 提供了一种通过不可变引用进行可变修改的办法，原理是直接跳过可变性检查。

pub struct RefCell<T: ?Sized> {
    borrow: Cell<BorrowFlag>,
    // Stores the location of the earliest currently active borrow.
    // This gets updated whenever we go from having zero borrows
    // to having a single borrow. When a borrow occurs, this gets included
    // in the generated `BorrowError`/`BorrowMutError`
    #[cfg(feature = "debug_refcell")]
    borrowed_at: Cell<Option<&'static crate::panic::Location<'static>>>,
    value: UnsafeCell<T>,
}

#[repr(transparent)]
pub struct Cell<T: ?Sized> {
    value: UnsafeCell<T>,
}

#[repr(transparent)]
pub struct UnsafeCell<T: ?Sized> {
    value: T,
}

impl<T: ?Sized> UnsafeCell<T> {
    #[rustc_never_returns_null_ptr]
    pub const fn get(&self) -> *mut T {
        // We can just cast the pointer from `UnsafeCell<T>` to `T` because of
        // #[repr(transparent)]. This exploits std's special status, there is
        // no guarantee for user code that this will work in future versions of the compiler!
        self as *const UnsafeCell<T> as *const T as *mut T
    }
}

impl<T: ?Sized> !Sync for UnsafeCell<T> {}

// Positive values represent the number of `Ref` active. Negative values
// represent the number of `RefMut` active. Multiple `RefMut`s can only be
// active at a time if they refer to distinct, nonoverlapping components of a
// `RefCell` (e.g., different ranges of a slice).
//
// `Ref` and `RefMut` are both two words in size, and so there will likely never
// be enough `Ref`s or `RefMut`s in existence to overflow half of the `usize`
// range. Thus, a `BorrowFlag` will probably never overflow or underflow.
// However, this is not a guarantee, as a pathological program could repeatedly
// create and then mem::forget `Ref`s or `RefMut`s. Thus, all code must
// explicitly check for overflow and underflow in order to avoid unsafety, or at
// least behave correctly in the event that overflow or underflow happens (e.g.,
// see BorrowRef::new).
type BorrowFlag = isize;
const UNUSED: BorrowFlag = 0;

pub struct Location<'a> {
    file: &'a str,
    line: u32,
    col: u32,
}

这个成系列的 XXCell 结构体的设计质量很糟，本来像这样既不能 Send 也不能 Sync ，只是为了绕过可变性检查而设计的结构体根本就没有计划外的使用风险，也不知道它在外部接口的设计上抠抠搜搜些什么 😅，还要分好几个结构体，像这个 Cell 和 RefCell 唯一区别就是 RefCell 可以 Borrow（而 Cell 只能 replace ）。

再说这个 BorrowFlag ，可能就是为了这点儿醋才包的饺子，简直昏了头，本身设计的目的就是为了绕过可变性检查，结果又在这里做 Borrow check ，不是你检查些什么呀 😓？

特别地，为了简短篇幅而没在代码里列出来的 RefCell 的 Borrow 实现也有“巧思”，我一直以为 Borrow 的返回值是 &T 或 &mut T ，结果发现是一层包装： Ref 和 RefMut ，当然标准库的实现一贯如此，这也罢了，那么 Rc，Ref，RefMut 都实现了自动解包（Deref 和 DerefMut ），偏偏 RefCell 不实现又是什么原因呢，喜欢多写一个 .borrow() ，表明显式 Borrow 🤓？

整个标准库里没有任何地方使用 RefCell ，相反其他地方还要为适配它写额外的代码。

`Rc`

Rc 是引用计数器（Reference counter）。

其中 NonNull 技术上的意义是为非 null 指针提供普通对象的语义，比如协变性，比如派生 trait。

pub struct Rc<T: ?Sized> {
    ptr: NonNull<RcInner<T>>,
}

pub struct Weak<T: ?Sized> {
    // This is a `NonNull` to allow optimizing the size of this type in enums,
    // but it is not necessarily a valid pointer.
    // `Weak::new` sets this to `usize::MAX` so that it doesn’t need
    // to allocate space on the heap. That's not a value a real pointer
    // will ever have because RcInner has alignment at least 2.
    // This is only possible when `T: Sized`; unsized `T` never dangle.
    ptr: NonNull<RcInner<T>>,
}

pub struct NonNull<T: ?Sized> {
    pointer: *const T,
}

// This is repr(C) to future-proof against possible field-reordering, which
// would interfere with otherwise safe [into|from]_raw() of transmutable
// inner types.
#[repr(C)]
struct RcInner<T: ?Sized> {
    strong: Cell<usize>,
    weak: Cell<usize>,
    value: T,
}

impl<T> Rc<T> {
    #[cfg(not(no_global_oom_handling))]
    pub fn new(value: T) -> Rc<T> {
        // There is an implicit weak pointer owned by all the strong
        // pointers, which ensures that the weak destructor never frees
        // the allocation while the strong destructor is running, even
        // if the weak pointer is stored inside the strong one.
        unsafe {
            Self::from_inner(
                // get 'static lifetimes raw pointer
                Box::leak(
                    Box::new(
                        RcInner { strong: Cell::new(1), weak: Cell::new(1), value }))
                    .into(),
            )
        }
    }
    
    pub fn downgrade(this: &Self) -> Weak<T> {
        this.inner().inc_weak();
        Weak { ptr: this.ptr }
    }
}
   
impl<T: ?Sized> Rc<T> {
    #[inline]
    unsafe fn from_inner(ptr: NonNull<RcInner<T>>) -> Self {
        Self { ptr }
    }
}

impl<T: ?Sized> Drop for Rc<T> {
    fn drop(&mut self) {
        unsafe {
            self.inner().dec_strong();
            if self.inner().strong() == 0 {
                ptr::drop_in_place(&mut (*self.ptr.as_ptr()).value);
            }
        }
    }
}

impl<T: ?Sized> Drop for Weak<T, A> {
    fn drop(&mut self) {
        let inner = if let Some(inner) = self.inner() { inner } else { return };

        inner.dec_weak();
        // the weak count starts at 1, and will only go to zero if all
        // the strong pointers have disappeared.
        if inner.weak() == 0 {
            unsafe {
                self.alloc.deallocate(
                    self.ptr.cast(), 
                    Layout::for_value_raw(self.ptr.as_ptr()
                ));
            }
        }
    }
}

看代码的时候我就在思考一个问题，为什么要单独搞一个 Weak ，它不就是不“拥有”对象所有权的普通指针吗，怎么机制搞那么复杂，先让 Rc 持有数据 T，再让 Weak 持有T 的容器 RcInner ？

其次，这个引用计数器到底有什么用，计数的机制本身还是生命周期，那直接用生命周期的机制不就行了？要知道引用计数器，是破坏了单一所有权原则的，并且为了给循环引用打补丁，才引入弱引用的机制（补丁还不能从根本上解决问题）。

把这个问题延展开，因为 Rust 是单一所有权，所以可以静态分析生命周期，所以实际既不需要 Tracing 方式的 GC ，也不需要引用计数器这种方式的 GC 。

标准库在 ThreadLocal 的实现里仅此一处的使用了 Rc 。

官方使用

标准库里的自引用数据结构都是特化类型的指针，比如以下是标准库 BTreeMap 里的“树节点”类型

struct LeafNode<K, V> {
    /// We want to be covariant in `K` and `V`.
    parent: Option<NonNull<InternalNode<K, V>>>,

    /// This node's index into the parent node's `edges` array.
    /// `*node.parent.edges[node.parent_idx]` should be the same thing as `node`.
    /// This is only guaranteed to be initialized when `parent` is non-null.
    parent_idx: MaybeUninit<u16>,

    /// The number of keys and values this node stores.
    len: u16,

    /// The arrays storing the actual data of the node. Only the first `len` elements of each
    /// array are initialized and valid.
    keys: [MaybeUninit<K>; CAPACITY],
    vals: [MaybeUninit<V>; CAPACITY],
}

#[repr(C)]
// gdb_providers.py uses this type name for introspection.
struct InternalNode<K, V> {
    data: LeafNode<K, V>,

    /// The pointers to the children of this node. `len + 1` of these are considered
    /// initialized and valid, except that near the end, while the tree is held
    /// through borrow type `Dying`, some of these pointers are dangling.
    edges: [MaybeUninit<NonNull<LeafNode<K, V>>>; 2 * B],
}

更好的方法

可以合并 Rc 和 RefCell 的功能，并且去掉对于 Rust 来说莫名奇妙的计数指针，改为单一所有权的指针和由它衍生出的普通指针。

这种设计和实现优化了人机交互，重新确立了单一所有权原则，从根本上消除了循环引用带来的内存泄漏的可能性。[^5]

#[derive(Debug, PartialEq, Eq, PartialOrd, Ord, Hash)]
#[repr(transparent)]
pub struct Ptr<T: ?Sized> {
    value: NonNull<T>,
}

#[derive(Debug, PartialEq, Eq, PartialOrd, Ord, Hash)]
#[repr(transparent)]
pub struct OwnedPtr<T> {
    value: NonNull<T>,
}

impl<T> OwnedPtr<T> {
    pub fn new(value: T) -> Self {
        Self {
            value: NonNull::new(Box::into_raw(Box::new(value))).unwrap(),
        }
    }
}

impl<T: ?Sized> OwnedPtr<T> {
    pub fn from_box(value: Box<T>) -> Self {
        Self {
            value: NonNull::new(Box::into_raw(value)).unwrap(),
        }
    }

    pub fn as_ref(&self) -> &T {
        unsafe { &* self.value.as_ptr() }
    }

    pub fn as_mut(&self) -> &mut T {
        unsafe { &mut *self.value.as_ptr() }
    }

    pub fn ptr(&self) -> Ptr<T> {
        Ptr { value: self.value }
    }
}

impl<T: ?Sized> Drop for OwnedPtr<T> {
    fn drop(&mut self) {
        unsafe {
            let _ = Box::from_raw(self.value.as_ptr());
        }
    }
}

impl<T: ?Sized> Deref for OwnedPtr<T> {
    type Target = T;

    fn deref(&self) -> &Self::Target {
        self.as_ref()
    }
}

impl<T: ?Sized> DerefMut for OwnedPtr<T> {
    fn deref_mut(&mut self) -> &mut Self::Target {
        self.as_mut()
    }
}

impl<T: ?Sized> Borrow<T> for OwnedPtr<T> {
    fn borrow(&self) -> &T {
        self.as_ref()
    }
}

impl<T: ?Sized> BorrowMut<T> for OwnedPtr<T> {
    fn borrow_mut(&mut self) -> &mut T {
        self.as_mut()
    }
}

impl<T: ?Sized> Ptr<T> {
    pub fn as_ref(&self) -> &T {
        unsafe { &* self.value.as_ptr() }
    }

    pub fn as_mut(&self) -> &mut T {
        unsafe { &mut *self.value.as_ptr() }
    }

    pub fn ptr_eq(this: &Self, other: &Self) -> bool {
        std::ptr::eq(this.value.as_ptr(), other.value.as_ptr())
    }
}

impl<T: ?Sized> Clone for Ptr<T> {
    fn clone(&self) -> Self {
        Self { value: self.value }
    }
}

impl<T: ?Sized> Copy for Ptr<T> {}

impl<T: ?Sized> Deref for Ptr<T> {
    type Target = T;

    fn deref(&self) -> &Self::Target {
        self.as_ref()
    }
}

impl<T: ?Sized> DerefMut for Ptr<T> {
    fn deref_mut(&mut self) -> &mut Self::Target {
        self.as_mut()
    }
}

impl<T: ?Sized> Borrow<T> for Ptr<T> {
    fn borrow(&self) -> &T {
        self.as_ref()
    }
}

impl<T: ?Sized> BorrowMut<T> for Ptr<T> {
    fn borrow_mut(&mut self) -> &mut T {
        self.as_mut()
    }
}

其中有几点需要解释：

自动派生的那些 trait 实际上就是 NonNull 自动实现的 trait ，也是最常用的几个用于只读属性的 trait ；
为 Ptr 实现了 Copy trait ，因为它并不拥有数据，我们把它当一个既不为 NULL 也不悬空的有效指针一样使用，反过来 OwnedPtr 连 Clong trait 都没有实现，它应该像一个暴露出指针接口的 Unique 指针，即使实现克隆也应该是是在克隆一个新的值放到堆上；
为 Ptr<T> 和 OwnedPtr<T> 实现了 Deref 和 DerefMut ，这样使用它们就好像直接在 T 上使用一样；
为 Ptr<T> 实现了 Borrow 和 BorrowMut ，因为标准库的数据结构的 get 类方法需要实现这类 trait；
我们没有实现 Send 和 Sync，这意味着它们不会被跨线程的使用。

使用示例

这样我们自己实现的 B+ 树可能就是如下这样：

enum Node<K, V, const M: usize> {
    Leaf {
        entries: Vec<KVEntry<K, OwnedPtr<V>>>,
        next: Option<Ptr<Self>>,
        paren: Option<Ptr<Self>>,
        _marker: PhantomData<[(); M]>
    },
    Internal {
        children: Vec<KVEntry<K, OwnedPtr<Self>>>,
        paren: Option<Ptr<Self>>,
        _marker: PhantomData<[(); M]>
    },
}

pub struct BPT<K, V, const M: usize = 32> {
    len: usize,
    root: OwnedPtr<Node<K, V, M>>,
}

第三种方法：放弃指针

在现行编程框架下仍然可以避免使用指针来实现自引用结构，方法是在根部持有所有节点，使用索引（比如下标）作为传统指针的替代，不过这样有一个限制，就是节点的索引不能变。

首先如果索引是一个 Map ，比如 HashMap 里 Key，那么这是一个方便解决的问题，但是使用 Map 就引入了额外的依赖，这是我们很多时候想要避免的事情，同时也会损失一些性能；

这样的话，考虑一个简单的基于向量的实现，用它的下标作为索引，得到一个惰性删除向量，这样当删除节点的时候不会改变既有节点的位置，而当插入新数据时可以复用被标记为删除的位置。

#[derive(Default, Clone)]
pub struct LazyDeleteVec<T> {
    data: Vec<Option<T>>,
    deleted: Vec<usize>,
}

impl<T> LazyDeleteVec<T> {
    pub const fn len(&self) -> usize {
        self.data.len() - self.deleted.len()
    }

    pub const fn is_empty(&self) -> bool {
        self.len() == 0
    }

    pub fn with_capacity(capacity: usize) -> Self {
        Self {
            data: Vec::with_capacity(capacity),
            deleted: vec![],
        }
    }

    pub fn from_parts(data: Vec<Option<T>>, deleted: Vec<usize>) -> Self {
        Self { data, deleted }
    }

    pub fn push(&mut self, val: T) -> usize {
        if let Some(idx) = self.deleted.pop() {
            self.data[idx] = Some(val);
            idx
        } else {
            self.data.push(Some(val));
            self.data.len() - 1
        }
    }

    pub fn remove(&mut self, idx: usize) -> Option<T> {
        debug_assert!(idx < self.data.len(), "{idx} > len {}", self.data.len());

        let old = self.data[idx].take();

        if old.is_some() {
            self.deleted.push(idx);
        }

        old
    }

    pub fn iter(&self) -> impl Iterator<Item = &T> {
        self.data
            .iter()
            .filter(|x| x.is_some())
            .map(|x| x.as_ref().unwrap())
    }
}

这样的实现虽然多了一层间接访问，但在一个综合了插入、删除和查询的综合测试看，这样结构的一个 B+ 树实现与标准库里的 BTreeMap 的性能几乎一致。

测试的配置

元组第一个数字是概率权重，第二个是操作名

let mut roller = RandomRoller::with_candicates(vec![
    // get
    (30, Q(0)),
    // range
    (45, R(Unbounded, Unbounded)),
    // insert
    (30, A(0, 0)),
    // remove 
    (10, D(0)),
]);

示例片段

完整代码

#[derive(Debug, Clone, Copy)]
pub enum Node<K, const M: usize> {
    Leaf {
        /// (key, dataid)
        entries: PartialInitArray<KVEntry<K, usize>, M>,
        next: Option<usize>,
        /// *NOTE:* For root node, `paren` is nonsense.
        paren: usize,
    },
    Internal {
        /// (min-key of child, nodeid)
        children: PartialInitArray<KVEntry<K, usize>, M>,
        paren: usize,
    },
}


#[derive(Clone)]
pub struct FlatBPT<K, V, const M: usize = 32> {
    /// uncouple data manage (avoid unnecessary mutability check)
    data: LazyDeleteVec<*mut V>,

    /// nodes[0] would be root and should always not to be deleted
    nodes: LazyDeleteVec<Node<K, M>>,
    root: usize,
}


#[derive(Debug)]
pub struct PartialInitArray<T, const C: usize> {
    len: usize,
    arr: [MaybeUninit<T>; C],
}