Decided I would use this time in writing a blog post to be productive and review data structures. Especially one that wasn’t covered in CS 261.
What is a Trie? This is also called a prefix tree, and this is a tree data structure that is used for retrieval of a key in a dataset of strings. This data structure can be used in various applications such as:
- Autocomplete
- Spell Checker
- IP Routing (Longest prefix matching)
- T9 Predictive Text
- Solving Word Games
But why a trie over another data structure like a hash table? A hash table has O(1) time complexity looking for a key, but it’s not efficient in 1) finding keys with a common prefix, and 2) enumerating a dataset of strings in lexicographical order. There can also be hash collisions with a hash table. A trie can have O(m) time complexity, where m is the key length, and searching for a key in a balanced tree costs O(mlogn) time complexity.
A trie is a rooted tree, and its nodes generally have two fields: 1) Links to its children and 2) Boolean field that specifies whether the node corresponds to the end of the key or is just a prefix.
The following is an implementation of a Trie in Python, with the following conditions (source: https://leetcode.com/problems/implement-trie-prefix-tree/description/).
Trie()Initializes the trie object.void insert(String word)Inserts the stringwordinto the trie.boolean search(String word)Returnstrueif the stringwordis in the trie (i.e., was inserted before), andfalseotherwise.boolean startsWith(String prefix)Returnstrueif there is a previously inserted stringwordthat has the prefixprefix, andfalseotherwise.
class Trie:
def __init__(self):
"""Initialize a TrieNode."""
self.root = TrieNode()
def insert(self, word: str) -> None:
"""Inserts a word into the trie."""
cur_node = self.root
for char in word:
if char not in cur_node.children:
cur_node.children[char] = TrieNode()
cur_node = cur_node.children[char]
cur_node.is_end = True
def search(self, word: str) -> bool:
"""Returns True if word is in the trie, else False."""
cur_node = self.root
for char in word:
if char not in cur_node.children:
return False
cur_node = cur_node.children[char]
return cur_node.is_end
def startsWith(self, prefix: str) -> bool:
"""Returns True if word exists in the trie that starts with the given prefix."""
cur_node = self.root
for char in prefix:
if char not in cur_node.children:
return False
cur_node = cur_node.children[char]
return True
class TrieNode:
def __init__(self):
self.children = {}
self.is_end = False
# Your Trie object will be instantiated and called as such:
# obj = Trie()
# obj.insert(word)
# param_2 = obj.search(word)
# param_3 = obj.startsWith(prefix)
In this case, the insertion of a key to the trie is O(m) time complexity, where m is the key length, and O(m) space complexity.
Searching for a key in the trie is O(m) time complexity, since in the worst case, we perform m operations. The space complexity is O(1).
Searching for a prefix in the trie is also O(m) time complexity and O(1) space complexity. This is one benefit of a trie over a hash map.