Lower Bounds for Edit Distance Estimation

Speaker: Alexandr Andoni , MIT CSAIL
Date: March 22 2007
Time: 4:00PM to 5:15PM
Location: 32-G575
Host: Ronitt Rubinfeld, MIT CSAIL
Contact: Joanne Hanley, 617 253 6054, joanne@csail.mit.edu
Relevant URL: http://theory.csail.mit.edu/~madhu/algcomp/alex-abs.html
Edit distance between two strings is the number of insertions/deletions/substitutions needed to transform one string into the other. This distance is of key importance in fields like computational biology, text processing, and others. Some of the most interesting problems involving edit distance are:
1) Compute distance between two strings;
2) Design a small-space sketching algorithm (sketch=a short representation of a string, permitting distance estimation between two strings given sketches only).
Edit distance is a notoriously "hard" distance. Although there is a classic quadratic time algorithm for computing the edit distance, all known near-linear algorithms achieve at best a polynomial approximation. Furthermore, the best constant space sketching algorithm achieves only roughly 2^{sqrt{log d}} approximation, where d is the length of the strings.
Natural question is thus: why is edit distance hard (or is it)? The only known (non-trivial) lower bounds for edit distance are the non-embeddability results. The best such result says that we cannot embed edit distance into l_1 with distortion better than Omega(log d). (We note that the upper bound for sketching is obtained through an embedding into l_1.) While non-embeddability into l_1 is a good evidence for hardness, it does not imply much about other approaches to the problem. For example, it does not rule out embedding into the square of l_2 with O(1) distortion -- which would be enough for, say, constant space sketching with O(1) approximation. More generally, non-embeddability results are statements about /geometric properties/ of edit distance, and do not necessarily say much about its /computational properties/.
In this work, we prove first lower bounds on computational properties of edit distance. For this, we study the communication complexity setting, where two players, Alice and Bob, each have a string, and they estimate the edit distance up to an approximation, by exchanging a number of bits. We lower bound the number of bits exchanged, obtaining as corollaries, \log^c d non-embeddability into the square of l_2, and \Omega(log log d) sketching space lower bound for constant approximation.
Joint work with Robert Krauthgamer (IBM Almaden).
See other events that are part of Algorithms and Complexity Series 2006/2007
See other events happening in March 2007