Wednesday, March 04, 2009

Porting CodeMirror syntax parser to Bespin

The first screenshot (or the second, if the images are in a row) shows the current Bespin syntax highlighting engine. It applies regular expressions on a per-line basis to highlight the different parts of the code. This approach is very fast, and color definitions for new languages are easy to add, but the engine is not aware of the internal structure of the code the user is typing. It therefore cannot be extended to show syntax errors in real time or to provide intelligent code indentation and completion. The engine splits code into the following elements for colorizing:
  • comments
  • C-style comments
  • keywords
  • strings
  • punctuation
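To make the per-line approach concrete, here is a minimal sketch of such a highlighter. The rule table and the function name `highlightLine` are made up for illustration and are not Bespin's actual code; the point is only that every line is matched in isolation, with no memory of the lines before it.

```javascript
// Hypothetical rule table: names and patterns are illustrative, not Bespin's real ones.
// Every regex is sticky ("y") so it can only match exactly at the current position.
const RULES = [
  { type: "c-comment",   re: /\/\*.*?\*\//y },  // only when open and close share a line
  { type: "comment",     re: /\/\/.*/y },
  { type: "string",      re: /"(?:[^"\\]|\\.)*"|'(?:[^'\\]|\\.)*'/y },
  { type: "keyword",     re: /\b(?:var|function|return|if|else|for|while)\b/y },
  { type: "punctuation", re: /[{}()\[\];,.]/y },
];

// Split a single line into colored tokens; no state is carried between lines.
function highlightLine(line) {
  const tokens = [];
  let pos = 0;
  while (pos < line.length) {
    let matched = false;
    for (const rule of RULES) {
      rule.re.lastIndex = pos;
      const m = rule.re.exec(line);
      if (m && m[0].length > 0) {
        tokens.push({ type: rule.type, text: m[0] });
        pos += m[0].length;
        matched = true;
        break;
      }
    }
    if (!matched) {
      // Anything no rule claims is collected as plain text.
      const last = tokens[tokens.length - 1];
      if (last && last.type === "plain") last.text += line[pos];
      else tokens.push({ type: "plain", text: line[pos] });
      pos += 1;
    }
  }
  return tokens;
}
```

Calling `highlightLine('var x = "a"; // hi')` yields a keyword, a string, a punctuation mark and a comment, with plain text in between. A line such as `/* open` comes back as plain text only, because the closing `*/` sits on a later line that this function never sees. That is the kind of structural blindness described above.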
The second screenshot shows an early stage of a new approach I am working on: a syntax engine that uses a real tokenizer and parser (borrowed from the CodeMirror project) and is fully aware of the structure of the code. So far the initial coloring works fine, but a lot of problems remain to be solved before it runs continuously and smoothly, including on large files.

The second screenshot (or the first, if the images are in a row) also shows that a few more JavaScript code elements are identified and colored:
  • atoms
  • variables
  • variable definitions
  • local variables
  • properties
  • operators
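Telling a local variable from a global one, or a property from a plain variable, needs some knowledge of scope and context, which a per-line regex cannot provide. The following is a rough sketch of the idea only. It is not CodeMirror's parser, and all names in it (`classifyTokens`, the token type strings) are assumptions made for this example. It handles just `var`, named or anonymous `function` declarations, and `.` property access.

```javascript
// Hypothetical sketch: none of these names are CodeMirror's or Bespin's real API.
const KEYWORDS = new Set(["var", "function", "return", "if", "else", "for", "while"]);
const ATOMS = new Set(["true", "false", "null", "undefined"]);
const TOKEN_RE = /[A-Za-z_$][\w$]*|\d+(?:\.\d+)?|"(?:[^"\\]|\\.)*"|'(?:[^'\\]|\\.)*'|[{}()\[\];,.]|[-+*\/%=<>!&|?:]+/g;

function classifyTokens(source) {
  const scopes = [new Set()]; // scopes[0] is the global scope
  const blocks = [];          // per open "{": true if it started a function body
  let pendingScope = null;    // scope of a function whose body has not started yet
  let inParams = false;       // true while reading a function's parameter list
  let prev = null;            // text of the previous token
  const out = [];

  for (const [text] of source.matchAll(TOKEN_RE)) {
    let type;
    if (/^[A-Za-z_$]/.test(text)) {
      if (KEYWORDS.has(text)) {
        type = "keyword";
        if (text === "function") pendingScope = new Set();
      } else if (ATOMS.has(text)) {
        type = "atom";
      } else if (prev === ".") {
        type = "property";
      } else if (inParams) {
        pendingScope.add(text);              // parameters belong to the new function
        type = "variable-def";
      } else if (prev === "var" || prev === "function") {
        scopes[scopes.length - 1].add(text); // declared in the innermost function scope
        type = "variable-def";
      } else if (scopes.slice(1).some(s => s.has(text))) {
        type = "local-variable";             // found in an enclosing function scope
      } else {
        type = "variable";
      }
    } else if (/^\d/.test(text)) {
      type = "number";
    } else if (/^["']/.test(text)) {
      type = "string";
    } else if (/^[{}()\[\];,.]$/.test(text)) {
      type = "punctuation";
      if (text === "(" && pendingScope) inParams = true;
      if (text === ")") inParams = false;
      if (text === "{") {
        blocks.push(pendingScope !== null);
        if (pendingScope) { scopes.push(pendingScope); pendingScope = null; }
      }
      if (text === "}" && blocks.pop()) scopes.pop();
    } else {
      type = "operator";
    }
    out.push({ type, text });
    prev = text;
  }
  return out;
}
```

For a source such as `var total = 0; function add(n) { var step = n; total = total + step; }`, the uses of `n` and `step` inside the function come out as local variables, while `total` stays a plain (global) variable. Unlike the per-line approach, this classification depends on everything that came before the token, which is exactly what makes incremental re-highlighting of large files harder.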
I have chosen JavaScript as the first language to port to this new syntax engine, because it is the most important one for me and maybe for web apps in general. But JS is a dynamic language, and syntax analysis is tricky. The good thing about CodeMirror is that, besides JS, tokenizers / parsers already exist for:
  • CSS
  • HTML mixed (with JS, CSS)
  • PHP mixed (with JS, CSS, HTML)
  • Django mixin (my proof-of-concept, never officially released)
This new syntax engine, if I ever get it working in an acceptable way, is not supposed to replace the existing one. If desired and enabled, it will try to provide a better user experience, but it will use the original syntax engine as a fallback.
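
The opt-in-with-fallback arrangement could look roughly like the sketch below. The names `makeHighlighter`, `parserEngine`, `regexEngine` and the `useParser` option are all hypothetical and do not correspond to any existing Bespin setting; both engines are assumed to be functions that take an array of lines and return tokens.

```javascript
// Hypothetical wiring: the parser-based engine is optional, the regex engine always works.
function makeHighlighter(parserEngine, regexEngine, options) {
  return function highlight(lines) {
    if (options.useParser) {
      try {
        return { engine: "parser", tokens: parserEngine(lines) };
      } catch (err) {
        // The parser could not cope with this input; fall through to the regex engine.
      }
    }
    return { engine: "regex", tokens: regexEngine(lines) };
  };
}
```

With this shape, switching the new engine off, or hitting input it cannot handle, leaves the user with exactly the highlighting they have today.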