{"id":473,"date":"2022-05-13T09:51:10","date_gmt":"2022-05-13T09:51:10","guid":{"rendered":"https:\/\/blog.67bricks.com\/?p=473"},"modified":"2022-05-13T09:52:47","modified_gmt":"2022-05-13T09:52:47","slug":"dev-forum-parsing-data","status":"publish","type":"post","link":"https:\/\/blog.67bricks.com\/?p=473","title":{"rendered":"Dev Forum &#8211; Parsing Data"},"content":{"rendered":"\n<p>Last Friday we had a dev forum on parsing data, prompted by some devs having pressing questions about regexes. Dan gave us a rather nice and detailed overview of the different ways to parse data. Often we encounter situations where an input or a data file needs to be parsed so our code can make some sensible use of it.<\/p>\n\n\n\n<iframe loading=\"lazy\" src=\"https:\/\/docs.google.com\/presentation\/d\/e\/2PACX-1vTCVpTclaaU2aARj8JUntBVpmpZsdM6KYIkg9w69H4Bkv7umxIbKKJ-30m-jCxoETEd28U3YcgS-q5H\/embed?start=false&amp;loop=false&amp;delayms=15000\" allowfullscreen=\"true\" mozallowfullscreen=\"true\" webkitallowfullscreen=\"true\" width=\"960\" height=\"569\" frameborder=\"0\"><\/iframe>\n\n\n\n<p>After the presentation, we looked at some code using the <a href=\"https:\/\/github.com\/sirthias\/parboiled\">parboiled<\/a> library with Scala. We were shown a simple example that checks whether a sequence of various types of brackets has matching opening and closing brackets in the correct positions. 
For example, the sequence <code>({[&lt;&lt;&gt;&gt;]})<\/code> would be considered valid, while the sequence <code>((({(&gt;&gt;])<\/code> would be invalid.<\/p>\n\n\n\n<p>First, we define the set of classes that describe the parsed structure:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code lang=\"java\" class=\"language-java\">object BracketParser {\n\n  sealed trait Brackets\n\n  case class RoundBrackets(content: Brackets)\n     extends Brackets\n\n  case class SquareBrackets(content: Brackets)\n     extends Brackets\n\n  case class AngleBrackets(content: Brackets)\n     extends Brackets\n\n  case class CurlyBrackets(content: Brackets)\n     extends Brackets\n\n  case object Empty extends Brackets\n\n}\n<\/code><\/pre>\n\n\n\n<p>Next, we define the matching rules that parboiled uses:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code lang=\"java\" class=\"language-java\">package com.sixtysevenbricks.examples.parboiled\n\nimport com.sixtysevenbricks.examples.parboiled.BracketParser._\nimport org.parboiled.scala._\n\nclass BracketParser extends Parser {\n\n  \/**\n   * The input should consist of a bracketed expression\n   * followed by the special \"end of input\" marker\n   *\/\n  def input: Rule1[Brackets] = rule {\n    bracketedExpression ~ EOI\n  }\n\n  \/**\n   * A bracketed expression can be roundBrackets,\n   * or squareBrackets, or... or the special empty \n   * expression (which occurs in the middle). Note that\n   * because \"empty\" will always match, it must be listed\n   * last\n   *\/\n  def bracketedExpression: Rule1[Brackets] = rule {\n    roundBrackets | squareBrackets | \n    angleBrackets | curlyBrackets | empty\n  }\n\n  \/**\n   * The empty rule matches an EMPTY expression\n   * (which will always succeed) and pushes the Empty\n   * case object onto the stack\n   *\/\n  def empty: Rule1[Brackets] = rule {\n    EMPTY ~> (_ => Empty)\n  }\n\n  \/**\n   * The roundBrackets rule matches a bracketed \n   * expression surrounded by parentheses. 
If it\n   * succeeds, it pushes a RoundBrackets object \n   * onto the stack, containing the content inside\n   * the brackets\n   *\/\n  def roundBrackets: Rule1[Brackets] = rule {\n    \"(\" ~ bracketedExpression ~ \")\" ~~>\n         (content => RoundBrackets(content))\n  }\n\n  \/\/ Remaining matchers\n  def squareBrackets: Rule1[Brackets] = rule {\n    \"[\" ~ bracketedExpression ~ \"]\" ~~>\n        (content => SquareBrackets(content))\n  }\n\n  def angleBrackets: Rule1[Brackets] = rule {\n    \"&lt;\" ~ bracketedExpression ~ \">\" ~~>\n        (content => AngleBrackets(content))\n  }\n\n  def curlyBrackets: Rule1[Brackets] = rule {\n    \"{\" ~ bracketedExpression ~ \"}\" ~~>\n        (content => CurlyBrackets(content))\n  }\n\n\n  \/**\n   * The main entrypoint for parsing.\n   * @param expression the bracket sequence to parse\n   * @return the result of running the input rule\n   *         against the expression\n   *\/\n  def parseExpression(expression: String):\n    ParsingResult[Brackets] = {\n    ReportingParseRunner(input).run(expression)\n  }\n\n}\n\n<\/code><\/pre>\n\n\n\n<p>While this example requires a lot more code than a regex would, parsers are more powerful and adaptable: arbitrarily deep nesting like this is beyond what a classic regular expression can match. Parboiled seems to be an excellent library with a rather nice syntax for defining them.<\/p>\n\n\n\n<p>To summarize, regexes are very useful, but so are parsers. Start with a regex (or, better yet, a pre-existing library that specifically parses your data structure), and if it gets too complex to deal with, consider writing a custom parser.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Last Friday we had a dev forum on parsing data, prompted by some devs having pressing questions about regexes. Dan gave us a rather nice and detailed overview of the different ways to parse data. 
Often we encounter situations where an input or a data file needs to be parsed so our code &hellip; <a href=\"https:\/\/blog.67bricks.com\/?p=473\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Dev Forum &#8211; Parsing Data&#8221;<\/span><\/a><\/p>\n","protected":false},"author":6,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[10,2,66,21,23],"tags":[70,68,67,69],"class_list":["post-473","post","type-post","status-publish","format-standard","hentry","category-blogging","category-blogroll","category-regex","category-scala","category-text-processing","tag-dev-forum","tag-parsing","tag-regex","tag-scala"],"_links":{"self":[{"href":"https:\/\/blog.67bricks.com\/index.php?rest_route=\/wp\/v2\/posts\/473","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.67bricks.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.67bricks.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.67bricks.com\/index.php?rest_route=\/wp\/v2\/users\/6"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.67bricks.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=473"}],"version-history":[{"count":11,"href":"https:\/\/blog.67bricks.com\/index.php?rest_route=\/wp\/v2\/posts\/473\/revisions"}],"predecessor-version":[{"id":485,"href":"https:\/\/blog.67bricks.com\/index.php?rest_route=\/wp\/v2\/posts\/473\/revisions\/485"}],"wp:attachment":[{"href":"https:\/\/blog.67bricks.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=473"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.67bricks.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=473"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.67bricks.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=473"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","te
mplated":true}]}}