Decided I would use this time in writing a blog post to be productive and review data structures. Especially one that wasn’t covered in CS 261.
What is a Trie? This is also called a prefix tree, and this is a tree data structure that is used for retrieval of a key in a dataset of strings. This data structure can be used in various applications such as:
- Autocomplete
- Spell Checker
- IP Routing (Longest prefix matching)
- T9 Predictive Text
- Solving Word Games
But why a trie over another data structure like a hash table? A hash table has O(1) time complexity looking for a key, but it’s not efficient in 1) finding keys with a common prefix, and 2) enumerating a dataset of strings in lexicographical order. There can also be hash collisions with a hash table. A trie can have O(m) time complexity, where m is the key length, and searching for a key in a balanced tree costs O(mlogn) time complexity.
A trie is a rooted tree, and its nodes generally have two fields: 1) Links to its children and 2) Boolean field that specifies whether the node corresponds to the end of the key or is just a prefix.
The following is an implementation of a Trie in Python, with the following conditions (source: https://leetcode.com/problems/implement-trie-prefix-tree/description/).
Trie()
Initializes the trie object.void insert(String word)
Inserts the stringword
into the trie.boolean search(String word)
Returnstrue
if the stringword
is in the trie (i.e., was inserted before), andfalse
otherwise.boolean startsWith(String prefix)
Returnstrue
if there is a previously inserted stringword
that has the prefixprefix
, andfalse
otherwise.
class Trie:
def __init__(self):
"""Initialize a TrieNode."""
self.root = TrieNode()
def insert(self, word: str) -> None:
"""Inserts a word into the trie."""
cur_node = self.root
for char in word:
if char not in cur_node.children:
cur_node.children[char] = TrieNode()
cur_node = cur_node.children[char]
cur_node.is_end = True
def search(self, word: str) -> bool:
"""Returns True if word is in the trie, else False."""
cur_node = self.root
for char in word:
if char not in cur_node.children:
return False
cur_node = cur_node.children[char]
return cur_node.is_end
def startsWith(self, prefix: str) -> bool:
"""Returns True if word exists in the trie that starts with the given prefix."""
cur_node = self.root
for char in prefix:
if char not in cur_node.children:
return False
cur_node = cur_node.children[char]
return True
class TrieNode:
def __init__(self):
self.children = {}
self.is_end = False
# Your Trie object will be instantiated and called as such:
# obj = Trie()
# obj.insert(word)
# param_2 = obj.search(word)
# param_3 = obj.startsWith(prefix)
In this case, the insertion of a key to the trie is O(m) time complexity, where m is the key length, and O(m) space complexity.
Searching for a key in the trie is O(m) time complexity, since in the worst case, we perform m operations. The space complexity is O(1).
Searching for a prefix in the trie is also O(m) time complexity and O(1) space complexity. This is one benefit of a trie over a hash map.