Dictionary-Based Syntactic Pattern Recognition Using Tries

Ghada Badr

Abstract. This paper deals with the problem of estimating a transmitted string by processing the corresponding string Y , which is a noisy version of . We assume that Y contains substitution, insertion and deletion errors, and that is an element of a finite (but possibly, large) dictionary, H. The best estimate X+ of is defined as that element of H which minimizes the Generalized Levenshtein Distance D(X; Y ) between X and Y , for all X 2 H. All existing techniques for computing X+ requires a separate evaluation of the edit distances between Y and every X 2 H. In this paper, we show how we can evaluate D(X; Y ) for every X 2 H simultaneously, without resorting to any parallel computations. This is achieved by resorting to the use of an additional data structure called the Linked List of Prefixes (LLP), which is built on top of the trie representation of the dictionary. The computational advantage (for a dictionary made from the set of 1023 most common words augmented by computer-related words) gained is at least 50% and 80% measured in terms of the time and the number of operations required respectively. The accuracy forfeited is negligible.