Pattern Matching

String Matching is one of the most widely studied problems in computer science. Part of its appeal is in its direct applicability to ``real world'' problems. Every person who uses a text editor, and every user of a digial library or search engine, needs to find patterns in a text. The Boyer-Moore algorithm is directly implemented the search command of practically all text editors. The longest common ubsequence dynamic programming algorithm is implemented in system commands that test differences between files. The largest overlap heuristic for finding the shortest common superstring has been used in DNA sequencing, and there are many other examples.

Advances in Multimedia, Digital libraries and Computational Biology have shown that a much more generalized theoretical basis of pattern matching could be of tremendous benefit. To this end, pattern matching has to adapt itself to increasingly broader definitions of ``matching''. In computational biology one may be interested in finding a ``close'' mutation, in communications one may want to adjust for transmission noise, in texts it may be desirable to allow common typing errors. In multimedia one may want to adjust for lossy compressions, occlusions, scaling, affine transformations or dimension loss.

All above applications motivate the Pattern Matching field. In pattern matching the input is a text and pattern and a ``matching'' relation. The output is all locations in the text where the pattern ``matches'' under the given definition of match. The different applications define the matching relation.

Under the given matching relation there is still a distinction between exact matching and approximate matching. In the latter case, a metric is defined on the text. A text location is considered a match if the distance between it and the pattern, under the given metric, is within the tolerated bounds.

The fundamental question is what type of approximations are inherently hard computationally, and what types are faster to compute. This question motivated much of the pattern matching research in the last twenty years.

The pattern matching group at Bar-Ilan University is the largest and most productive in the world. We are funded by the most prestigious grants in Israel and abroad (e.g. ISF, BSF, GIF, NSF, The Ministry of Science, and the Ministry of Industry and Commerce). Results produced in BIU appear in the most prestigious conferences and journals, and are found in modern text books. Our graduate students go on to Ph.D. and post-doc positions in the best institutions world-wide. Our faculty members set the international research agenda by being in the program committees of all major conferences in the field, and in journal editorial boards. For example, Amihood Amir was on the program committees of  CPM 2007, SPIRE 2007, IWOCA 2007, IFIP 2006, PSC 2006, SPIRE 2006, SPIRE 2005, WABI 2005, PSC 2005, SPIRE 2004, CompBioNets 2004 and WABI 2004. Moshe Lewenstein was in the program committees of CPM 2005, SODA 2006, and chairman of the PC of CPM 2006. Tommi Klein was in the program committee of PSC 2005. Ely Porat was on the program committee of CPM 2004.

Some of the work pioneered by Amihood Amir's group is compressed matching, and multidimensional matching.

Compressed Matching. As technology develops in diverse areas, from medicine to multimedia, there is a continuous increase in the amount of stored digital data. This increase  has made it critically important to store and transmit files in a compressed form. The need to quickly access this data has given rise to a new paradigm in searching, that of compressed matching. In traditional pattern matching, the pattern and text are explicitly given, and all occurrences of the pattern in the text are sought. In compressed pattern matching the goal is the same, however, the pattern and text are given in compressed form. This is necessary in case of searches done by small isolated units, e.g. communication satellites in space, and also in case local searches need to be done on the web.

The model and tools of compressed search were defined by Amir and his students over a decade ago. It has mushroomed into an extremely active international field of research. Some of the recent successes of the Bar-Ilan team are in doing research in compressed images. Digitized images are known to be extremely space consuming. However, regularities in the images can be exploited to reduce the necessary storage area. Thus we find that many systems store images in a compressed form (e.g. jpeg). If searching for appearances of a pattern in an image can be done without decompressing then compression becomes a time saving tool in addition to its being a space saving device.

We developed algorithms for search in images compressed by run-length compression (such as fax transmissions), and LZ-78 (such as the gif standard). The group is now working on search in jpg images - a task that has been hitherto not been successfully done by anyone.

Multidimensional Matching. Computer Vision is an extremely important research area with possible applications that will revolutionize our lives. Imagine a car that is capable of automatically analysing the road conditions and driving autonomously. It could significantly drop car accident fatalities - currently one of the leading causes of death. Such a scenario is still not a reality due mainly to issues of complexity. We simply can not yet efficiently solve the necessary problems.

We have been the world leaders in trying to understand the theory of multidimensional matching. This is the theoretical underpinnings of low-level vision. Our efforts led to seminal work on scaled matching, rotated matching, as well as indexing multidimensional images, and doing efficient matching from a pre-processed vast dictionary of images. The latter models biological vision, since all biological beings have a huge dictionary of all objects they recognize. The search through this dictionary is extremely rapid.

People

  •  Post Doctorates:  Zvika Hartman (2005-2006), Mathieu Raffinot (1999)
  •  Graduate Students:  Haim Parienty, Moshe Butman, Estrella Eisenberg, Yair Kaufman, Shir Landau, Tzvi Kopelowitz, Benny Porat, Klim Efremenko, Amir Rothschild, Udi Conley
    •  Graduates:  Hagai Aronowitz, Miri Ben-Nissan, Ayelet Butman, Eran Chencinsky, Emanuel Dar, Tirza Doniger, Carmit Hazay, Yair Horesh, Oren Kapah, Reuven Kashi, Orgad Keller, Avivit Levy, Noa Lewenstein, Ohad Lipsky, Igor Nor, Rivi Shalom, Dina Sokol, Liron Reuveny, Julia Umansky