{"id":29,"date":"2015-03-13T19:31:10","date_gmt":"2015-03-13T18:31:10","guid":{"rendered":"http:\/\/67bricks.com\/blog\/2015\/03\/13\/parser-combinators\/"},"modified":"2021-12-16T10:57:03","modified_gmt":"2021-12-16T10:57:03","slug":"parser-combinators","status":"publish","type":"post","link":"https:\/\/blog.67bricks.com\/?p=29","title":{"rendered":"Parser combinators"},"content":{"rendered":"<p>In our developer meeting this week, we discussed parsing, and particularly parser combinators.<\/p>\n<p>We&#8217;ve used the <a href=\"http:\/\/www.scala-lang.org\/api\/2.10.2\/index.html#scala.util.parsing.combinator.Parsers\">Scala parser combinator library<\/a> in the past for parsing search query syntax &#8211; for example, to support a custom search syntax used by a legacy system and convert it into an XQuery for searching XML. We&#8217;ve also used <a href=\"https:\/\/github.com\/sirthias\/parboiled2\">Parboiled<\/a>, a Java\/Scala parser library, for parsing geographic latitude and longitude values from within scientific journal articles about geology. We&#8217;ve done simpler parsing with regular expressions in C# to identify citations within text like &#8220;(Brown et al, 2012)&#8221; and &#8220;(Brown and Smith, 2010; Jones, 2009)&#8221;.<\/p>\n<p>Parser combinator approaches are typically better than traditional parser generators like Lex and Yacc or JavaCC, because they&#8217;re written in the host language (e.g. Java or Scala), and so it&#8217;s much easier to write unit tests for them and to update them. 
They&#8217;re particularly approachable in Scala, because Scala&#8217;s support for domain-specific languages means that you can write code that looks like:<\/p>\n<blockquote><p> &nbsp;   &#8220;{&#8221; ~ ( comment | directive ) ~ &#8220;}&#8221;<\/p><\/blockquote>\n<p>where the symbols like ~ and | are Scala method invocations &#8211; which means that you can focus on the parsing, rather than the parser library syntax.<\/p>\n<p>We briefly discussed where it makes sense to use regular expressions for parsing, and where it makes sense to use a more powerful parsing approach. We agreed that there was a danger of creating overly complex regular expressions by incremental &#8220;boiling a frog&#8221; extensions to an initially simple regex, rather than stopping to rewrite using a parser library.<\/p>\n<p>For further processing of the content once it&#8217;s been parsed, we discussed using the <a href=\"http:\/\/en.wikipedia.org\/wiki\/Visitor_pattern\">Visitor pattern<\/a>. For example, having created an abstract syntax tree from a search query, it&#8217;s useful to use a visitor approach to turn that tree into a pretty printed form, or into an HTML form for display, or into a query language form suitable for the underlying datastore.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In our developer meeting this week, we discussed parsing, and particularly parser combinators. We&#8217;ve used the Scala parser combinator library in the past for parsing search query syntax &#8211; for example, to support a custom search syntax used by a legacy system and convert it into an XQuery for searching XML. 
We&#8217;ve also used Parboiled, &hellip; <a href=\"https:\/\/blog.67bricks.com\/?p=29\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Parser combinators&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-29","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/blog.67bricks.com\/index.php?rest_route=\/wp\/v2\/posts\/29","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.67bricks.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.67bricks.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.67bricks.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.67bricks.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=29"}],"version-history":[{"count":1,"href":"https:\/\/blog.67bricks.com\/index.php?rest_route=\/wp\/v2\/posts\/29\/revisions"}],"predecessor-version":[{"id":304,"href":"https:\/\/blog.67bricks.com\/index.php?rest_route=\/wp\/v2\/posts\/29\/revisions\/304"}],"wp:attachment":[{"href":"https:\/\/blog.67bricks.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=29"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.67bricks.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=29"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.67bricks.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=29"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}