0336 - Palindrome Pairs

Algorithm

July 12, 2023

通常为了分类页面的干净，不将LeetCode题解的文章放到算法分类里，但这一篇实在精彩，涉及了其他算法没有介绍过的，关于大量子串比较的通解性思路

题干

问题描述

破题

源代码

有必要特别强调下这道题的数据特点：

1 <= words.length <= 5000
0 <= words[i].length <= 300
words[i] consists of lowercase English letters.

如果单从题目名字出发，认为这是一个核心在于用更低时间复杂度寻找回文串的问题，那就是从根本上搞错了方向。

把单词数记为 $n$ ，单词长度记为 $k$ ：

这里情况是 $n$ 比 $k$ 高一个数量级，性能瓶颈首先在于单词数而不是单词长度；
在 300 这个长度上，$O(k^2)$ 的简单算法通常远远好于时间复杂度为 $O(k)$ 的那些复杂算法

而这个问题实际上，如果是每个单词两两比较，那么无论如何都会 TLE（Time Limit Exceed），必须寻找一种能降低单词数的时间复杂度的方法。

这也就引出了本文介绍的大量子串比较的通解性思路。

Tips:

给出的每个单词都是独特的；
在答案里排除标号相同，也就是单词自身就是回文的情况

解① Trie：

当提到多个子串的比较时，我们就可以想到前缀树（Trie）这种思路¹，我们用它保存每个单词的后缀²。

如果作为前缀的单词长度 $\geqslant$ 作为后缀的单词长度：

在单词后缀的最后一个字母上标记单词的索引，表示这是一个单词的结束，此时应该检查前缀单词的其余部分是否为空或者是一个回文串。

否则：

在把单词加入到 Trie 的时候，后缀每向前一位，就要检查下剩余部分是否构成一个回文串，如果是的话，就把这个单词的索引储存在到该位置节点里。

节点的结构如下：

struct PostfixTrieNode {
    is_word: Option<usize>,
    rest_palindromes: Vec<usize>,
    children: [Option<usize>; 26],
}

struct PostfixTrie {
    nodes: Vec<PostfixTrieNode>,
}

is_word 是当前缀单词长度较长或相等时，表示有一个后缀单词结束了，保存该后缀单词的索引；
rest_palindromes 是当前缀单词较短，表示当前缀单词结束后，后缀的其余部分仍然构成回文的那些单词的索引；
children 保存 Trie 下一级节点（的索引），保存索引而不是节点本身，是为了规避 Rust 严格的数据所有权检查

树的创建和节点的创建：

impl PostfixTrie {
    fn new() -> Self {
        Self {
            nodes: vec![PostfixTrieNode::new()],
        }
    }

    fn push_child(&mut self, p: usize, i: usize) -> usize {
        let node_i = self.nodes.len();

        self.nodes.push(PostfixTrieNode::new());
        self.nodes[p].children[i] = Some(node_i);

        node_i
    }
}

impl PostfixTrieNode {
    fn new() -> Self {
        Self {
            is_word: None,
            rest_palindromes: vec![],
            children: [None; 26],
        }
    }
}

根节点就是索引为 $0$ 的节点。

加入一个单词：

对于一个长度为 $k$ 的单词来讲，需要考虑从没有匹配到匹配到最后一个字符，一共 $k$ 种情况，其中没有匹配针对的前缀是空串的情况。最后在最后一个字符的 Trie 节点上打上单词结束的标记。

impl PostfixTrie {
    fn add_word(&mut self, word_i: usize, word: &[u8]) {
        let mut root = 0;

        for i in (1..=word.len()).rev() {
            if is_palindrome(&word[0..i]) {
                self.nodes[root].rest_palindromes.push(word_i);
            }

            let idx = rank(word[i-1]);

            if let Some(child) = self.nodes[root].children[idx] {
                root = child;
            }
            else {
                root = self.push_child(root, idx);
            }
        }

        self.nodes[root].is_word = Some(word_i);
    }
}

搜索匹配

搜索匹配的时候，稍微复杂一些，必须跟踪匹配的情况，如果前缀单词遍历完后没有发生失配，就需要检查此时是否也是后缀单词的一个结束（也就是前后缀长度相等的情况）并且把可能的后缀单词中没结束并且剩余部分是回文加入答案。

追踪匹配，在循环外单独处理有一个好处，就是适用于前缀是空串的情况。

impl PostfixTrie {    
	fn search_palindrome(&self, word_i: usize, word: &[u8]) -> Vec<Vec<i32>> {
        let mut pairs = vec![];

        let mut root = 0;
        let mut matched = true;

        for i in 0..word.len() {
            if let Some(j) = self.nodes[root].is_word {
                if is_palindrome(&word[i..]) {
                    pairs.push(vec![word_i as _, j as _]);
                }
            }

            let idx = rank(word[i]);

            if let Some(child) = self.nodes[root].children[idx] {
                root = child;
            }
            else {
                matched = false;
                break;
            }
        }

        if matched {
            if let Some(j) = self.nodes[root].is_word {
                if j != word_i {
                    pairs.push(vec![word_i as _, j as _]);
                }
            }

            pairs.extend(self.nodes[root]
                .rest_palindromes
                .iter()
                .cloned()
                .map(|j| vec![word_i as _, j as _])
            );
        }

        pairs
    }
}

完整过程：

pub fn solve(words: Vec<String>) -> Vec<Vec<i32>> {
    let mut trie = PostfixTrie::new();

    for (i, word) in words.iter().enumerate() {
        trie.add_word(i, word.as_bytes());
    }

    let mut ans = vec![];

    for (i, word) in words.iter().enumerate() {
        ans.extend(trie.search_palindrome(i, word.as_bytes()));
    }

    ans
}

总结：

总地来说 Trie 也并不能特别地节省内存，总长度为 $n$ 的串存储到 Trie 上也仍然需要 $O(n)$ 的时间复杂度，唯一明显好处是可以缩短单词地匹配范围，把单词比较次数从原来的 $O(n^2)$ 降低为 $O(n)$ 。

对于本题，Rust Trie 实现大概是 276 ms / 373 MB 的水平³。

这个级别的内存消耗显然过分了，特别是对于 Rust 而言，按照以往经验，一个好的 Rust 题解的内存消耗应该在 10 MB 以内，几百 MB 的消耗与其他语言相比也非常糟糕；

时间性能勉强还算可以，我知道 C++ 的题解里的最好地有 200+ ms ，而 Java 的题解最好地可以达到 100+ ms⁴ ，相较之下，这个表现显然还不够好。

解② 长度基数：

这个思路非常类似于基数排序，它成立的关键是单词的长度非常有限，如果能事先哈希所有的单词，然后遍历每个单词的时候根据它的长度范围，构造后缀，在哈希表上查找符合后缀的单词。

对于每个单词检查所有长度不超过它的单词，在做回文对判断时，必须要有较长的那一个单词的信息，然后才能根据它的后缀，检查可能的较短的那个词是否存在，不管它们哪一个在前，哪一个在后。

时间优化：

可以事先统计所有单词的长度，排除掉实际上不存在的长度可能。

空间节省：

在做单词哈希的时候，并不保存单词的反串儿，而是原串的分片，而在查询的时候动态创建反串，这利用了回文的对称性质，哈希表上保存的分片只是原串的引用，不必因此创建新的字符串，而搜索时创建的反串又是一次性的，我们可以非常有把握地认为优化会这个反串分配的空间会被重新利用。

pub fn solve(words: Vec<String>) -> Vec<Vec<i32>> {
    let mut words_map = std::collections::HashMap::new();
    let mut len_maps = std::collections::BTreeSet::new();

    for (i, word) in words.iter().enumerate() {
        words_map.insert(word.as_bytes(), i);
        len_maps.insert(word.len());
    }

    let lens: Vec<usize> = len_maps.into_iter().collect();

    let mut ans = vec![];

    for (i, word) in words.iter().enumerate() {
        let word = word.as_bytes();

        let rev_word = &word.iter().rev().cloned().collect::<Vec<u8>>();
        let n = word.len();

        for &k in &lens[..lens.binary_search(&n).unwrap()] {
            // word is prefix
            // word[0..k] =R= rev_word[0..k]
            if let Some(&j) = words_map.get(&rev_word[n-k..]) {
                if is_palindrome_or_empty(&word[k..]) {
                    ans.push(vec![i as _, j as _]);
                }
            }

            // word is postfix
            // word[n-k..] => rev_word[n-k..]
            if let Some(&j) = words_map.get(&rev_word[..k]) {
                if is_palindrome_or_empty(&word[..n-k]) {
                    ans.push(vec![j as _, i as _]);
                }
            }
        }

        if let Some(&j) = words_map.get(&rev_word[..]) {
            if word > rev_word {
                ans.push(vec![i as _, j as _]);
                ans.push(vec![j as _, i as _]);
            }
        }
    }

    ans
}


/// Quickest method on len(word) <= 300
#[inline]
fn is_palindrome_or_empty(s: &[u8]) -> bool {
    s.iter().eq(s.iter().rev())
}

总结：

单词长度的统计和排序是 $O(n\text{log}n)$ ，对于本题，$\text{log}n$ 相当于一个非常小的常数，因此这段开销可以忽略不计，而相比于 Trie 解，不需要检查每个串的所有后缀是否是回文，而只检查部分后缀（存在该长度的串），并且有更好的局部性，因此提高了性能。

性能表现：107 ms / 6.6 MB 。

运行时间比 Trie 快了一倍，但也只是和 Java 的最好情况打平，只能说差强人意，不过内存的占用控制在 10 MB 以内，还是令人非常满意的。

解③ 串排序：

有什么能比长度基数的比较更快的方法吗？

在前面基数长度的题解中，仍然是有无谓地单词比较，最好的办法是能只比较确定有共同回文部分的串。

这个方法的思路并没有出现在热门题解中，是我从 Python 的最佳提交中看来的，非常简单，而且具有通用性，在很多其他问题上也有这样的处理思路。

如果把所有单词的正串和反串放在一起排序，那么按照字符串比较的字典序⁵，会有如下几个关键性质：

有最长公共前缀的字符串会紧挨在一起，而且短的在前，长的在后，不管它们是正串还是反串，这都不影响回文的判断；
如果存在构成回文对的一对单词的正反串，那么其中的短串一定是长串的前缀，而且两个串的正反性是不同的；
反之从某个串 $s_i$ 的位置 $i$ 向后检查，如果发现了一个前缀不包含 $s_i$ 的串 $s_j$ ，那显然 $s_j$ 以及之后的串肯定不与 $s_i$ 构成回文对

pub fn solve(words: Vec<String>) -> Vec<Vec<i32>> {
    let mut words_and_revs = Vec::with_capacity(words.len() * 2);

    for (i, word) in words.into_iter().enumerate() {
        let forward = word.into_bytes();
        let backward = forward.iter().rev().cloned().collect();

        words_and_revs.push((forward, true, i as i32));
        words_and_revs.push((backward, false, i as i32));
    }

    words_and_revs.sort_unstable();

    let mut ans = vec![];

    for (i, (short, short_is_forward, short_i)) in
        words_and_revs.iter().enumerate()
    {
        for (long, long_is_forward, long_i) in words_and_revs[i + 1..].iter() {
            if short_is_forward == long_is_forward {
                continue;
            }

            if long.starts_with(&short) {
                if long_i != short_i && is_palindrome_or_empty(&long[short.len()..]) {
                    if *long_is_forward {
                        ans.push(vec![*long_i, *short_i]);
                    }
                    else {
                        ans.push(vec![*short_i, *long_i]);
                    }
                }
            }
            else {
                break;
            }
        }
    }

    ans
}

总结：

排序的方法省掉了所有不必要的串的比较，只检查每个前后缀对称的单词对儿一次回文，局部性也不错，拥有理论和实际的最佳时间性能，唯一难受的点在于数据读取的方向是固定的，没有高效的从后向前读的方法，这使得我们不得不实际上为每一个反串创建一份正向的版本，这增加了一些内存的开销。

性能表现：61 ms / 7.7 MB

没什么好说的，就是双优地时、空表现。

串排序（内存节省版本）

接下来我们带着做实验的心态，尝试用无实体数据的方法表示反串，从而节省内存。

静态类型语言要求严格地向量是单态的，因此对于正串和反串，都需要同一类型的包装，需要实现它们的比较方法：

显然由于反串的存在，它们的比较方法只能逐个字节进行，这不影响时间复杂度，但很可能极大地影响实际的性能。

这里使用堆上地动态 Trait 对象来统一正串和反串创建的两种不同的迭代器。

#[derive(PartialEq, Eq)]
struct Slice<'a> {
    raw: &'a [u8],
    dir: bool
}

impl<'a> Slice<'a> {
    fn new(raw: &'a [u8], is_forward: bool) -> Self {
        Self { raw, dir: is_forward }
    }

    #[inline]
    fn len(&self) -> usize {
        self.raw.len()
    }

    fn iter(&'a self) -> Box<dyn Iterator<Item = &u8> + 'a> {
        if self.dir {
            Box::new(self.raw.iter())
        }
        else {
            let mut i = self.raw.len();

            Box::new(std::iter::from_fn(move || {
                if i == 0 {
                    None
                }
                else {
                    i -= 1;
                    Some(&self.raw[i])
                }
            }))
        }
    }
}

impl<'a> PartialOrd for Slice<'a> {
    fn partial_cmp(&self, other: &Self) -> Option<std::cmp::Ordering> {
        self.iter().partial_cmp(other.iter())
    }
}


impl<'a> Ord for Slice<'a> {
    fn cmp(&self, other: &Self) -> std::cmp::Ordering {
        self.partial_cmp(&other).unwrap()
    }
}

同时由于不存在实际的串结构，那么 startswith 和余部回文检测的方法也要改写：

impl<'a> Slice<'a> {    
	fn starts_with(&self, other: &Self) -> bool {
        let mut it = self.iter();

        for e in other.iter() {
            if let Some(pe) = it.next() {
                if e != pe {
                    return false;
                }
            }
            else {
                return false
            }
        }

        true
    }

    fn rem_is_palindrome_or_empty(&self, skipped: usize) -> bool {
        if self.dir {
            is_palindrome_or_empty(&self.raw[skipped..])
        }
        else {
            is_palindrome_or_empty(&self.raw[..self.len()-skipped])
        }
    }
}

主过程方法：

pub fn solve(words: Vec<String>) -> Vec<Vec<i32>> {
    let mut words_and_revs = Vec::with_capacity(words.len() * 2);

    for (i, word) in words.iter().enumerate() {
        words_and_revs.push((Slice::new(word.as_bytes(), true), i as i32));
        words_and_revs.push((Slice::new(word.as_bytes(), false), i as i32));
    }

    words_and_revs.sort_unstable();

    let mut ans = vec![];

    for (i, (short, short_i)) in
        words_and_revs.iter().enumerate()
    {
        for (long,  long_i) in words_and_revs[i+1..].iter() {
            if short.dir == long.dir {
                continue;
            }

            if long.starts_with(&short) {
                if long_i != short_i && long.rem_is_palindrome_or_empty(short.len()) {
                    if long.dir {
                        ans.push(vec![*long_i, *short_i]);
                    }
                    else {
                        ans.push(vec![*short_i, *long_i]);
                    }
                }
            }
            else {
                break;
            }
        }
    }

    ans
}

总结：

运行性能：323 ms / 5.8 MB

动态地逐字节比较确实非常慢，但确实省到了内存，至少内存上可以说 beats 100% 。

注解

很多批量查询地算法都使用了这种结构，比如基于KMP 前缀数组和 Trie 的 AC 自动机（AC Automaton） ↩
或者每个单词串的反串儿的前缀 ↩
有一种看起来会节省内存的调整是把节点上为孩子预先分配的数组结构换成哈希表，但这不改变结构的本质上，毕竟哈希表的基础也是数组，只是时间与空间的平衡因子不同，实际测试结果也支持了这样的判断：换成哈希表后消耗的空间减半，但运行时间加倍 ↩
很符合我对 C++ 的一贯想象，值得一提的是 LeetCode 平台使用得还是 clang ，如果是 gnu g++ 那局面更不敢想象了 ↩
线性结构的比较默认采用字典序，它可以这样理解：从首元素开始两两比较，直到发现不同元素或者两个串都耗尽，较短的那一个串后面补空，相当于是权值最小的元素。但是有时候有些领域为了它们的某些方便，会判断较短的串一定比较长的串要小，但这种比较方法不能称之为字典序 ↩