Saturday 24 April 2010

Regular Expression Matching using Partial Derivatives

Regular expression matching is a classical and well-studied problem.
Prior work applies DFA and Thompson NFA methods for the construction
of the matching automata. We propose the novel use of derivatives and
partial derivatives for regular expression matching.
We show how to obtain algorithms for various
matching policies such as POSIX and greedy left-to-right.
Our benchmarking results show that the run-time performance is promising
and that our approach can be applied in practice.

This is a topic, Kenny and I have pursued for quite a while and we have finally
managed to write up the ideas.
Here's the link to the paper
The implementation is on hackage. See regex-pderiv
[Edit: regex-pderiv replaces xhaskell-library]

5 comments:

Anonymous said...

You may wish to include this paper in your bibliography:

Regular-expression derivatives re-examined
SCOTT OWENS, JOHN REPPY and AARON TURON
Journal of Functional Programming, Volume 19, Issue 02, March 2009, pp 173-190

Anonymous said...

Please rename your library. "xhaskell-library" is a horrible name; it contains haskell (well we already knew that) and library (we knew that as well).

Bad package names make it difficult for people who package your library for say Debian. Normally for a hackage library named FooBar Debian will use a name like libghc6-foobar-dev.

For your library, that would be libghc-xhaskell-library-dev.

Uwe Schmidt said...

There is a regex library on hackage working with derivatives for the W3C XML Schema regexp spec for describing XML Schema datatypes. The implementation is rather efficient compared to other Haskell regex packages (regex-xmlschema). The library was part of the HXT package for XML processing with Haskell, and has been separated and extended into an extra package.

Martin Sulzmann said...

Owens et al paper:
Thanks for the pointer, will definitely take a closer look.

'bad name':
This work evolved out of the xhaskell project. But we agree, it's a badly choosen name, will be fixed later.

Martin Sulzmann said...

Uwe Schmidt mentions some further related work. Thanks, we take a look.

Uwe Schmidt says:
There is a regex library on hackage working with derivatives for the W3C XML Schema regexp spec for describing XML Schema datatypes. The implementation is rather efficient compared to other Haskell regex packages (regex-xmlschema). The library was part of the HXT package for XML processing with Haskell, and has been separated and extended into an extra package.