{"id":437,"date":"2022-03-18T09:20:49","date_gmt":"2022-03-18T09:20:49","guid":{"rendered":"https:\/\/blog.67bricks.com\/?p=437"},"modified":"2022-03-18T09:21:29","modified_gmt":"2022-03-18T09:21:29","slug":"marklogic-profiler","status":"publish","type":"post","link":"https:\/\/blog.67bricks.com\/?p=437","title":{"rendered":"MarkLogic profiler"},"content":{"rendered":"\n<p>When investigating slow XQuery or XSLT queries in MarkLogic, one of the tools it provides is a query profiler. In this post I&#8217;ll explain how to make use of the profiler and how to interpret the results.<\/p>\n\n\n\n<p>There are 3 ways you can make use of the profiler:<\/p>\n\n\n\n<ol class=\"wp-block-list\"><li>use the MarkLogic QConsole;<\/li><li>use an IDE that supports the profiler like IntelliJ with the <em>XQuery and XSLT<\/em> plugin;<\/li><li>use the MarkLogic profiler API.<\/li><\/ol>\n\n\n\n<h2 class=\"wp-block-heading\">Using the QConsole<\/h2>\n\n\n\n<p>In the MarkLogic QConsole you can enable profiling by clicking on the Profile tab next to the Results tab and pressing the Run button. This will display a table with the profiling results. The profile can be saved by clicking on the download button on the right.<\/p>\n\n\n\n<p>The profile result table contains the following information:<\/p>\n\n\n\n<ol class=\"wp-block-list\"><li><code>Module:Line No.: Col No.<\/code> \u2013 This is where the expression being profiled starts;<\/li><li><code>Count<\/code> \u2013 This is the number of times the expression was evaluated;<\/li><li><code>Shallow %<\/code>\/<code>Shallow ms<\/code> \u2013 This is how long the expression took to run on its own across all <code>Count<\/code> times it was evaluated;<\/li><li><code>Deep %<\/code>\/<code>Deep ms<\/code> \u2013 This is how long the expression took to run in addition to the time subexpessions took to evaluate;<\/li><li><code>Expression<\/code> \u2013 This is the string representation of the query expression being evaluated.<\/li><\/ol>\n\n\n\n<p><strong>NOTE:<\/strong> Here, <code>ms<\/code> is microseconds not milliseconds.<\/p>\n\n\n\n<p>If you want to do further processing on the downloaded <code>profile:report<\/code> XML, you can load that XML by using:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code lang=\"\" class=\"\">let $data := xdmp:filesystem-file(\"C:\\profiling\\profile.xml\")\nlet $profile := xdmp:unquote($data)\/prof:report<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">Analysing Profile Results<\/h2>\n\n\n\n<p>Sorting by shallow time is useful for identifying the expressions that take a long time to evaluate. This can identify places where you should modify the query or add indices.<\/p>\n\n\n\n<p>The count column is also useful. It can indicate places where you have nested loops that result in quadratic time queries.<\/p>\n\n\n\n<p>When making optimizations, it can be tricky to know if a change results in a speed up of the query or is not significant, but shows up in the normal time variation when running a query multiple times. To handle this, I like to use a  spreadsheet and measure the average (mean) time and standard deviation over 5 or 10 runs.<\/p>\n\n\n\n<p>This can also be useful when varying the number of items being processed (e.g. documents) in order to identify if the query scales with that item, and whether the scaling is linear, quadratic, or some other curve. For this, you can use line graphs to visualise the performance and linear or quadratic regression formulae to check what shape the graph is.<\/p>\n\n\n\n<p>The stock chart graph is useful for comparing the variation between runs. Here, the mininum and maximum times form the extents (low, high) of the chart, and the mean +\/- one standard deviation form the inner bar (open, close) of the chart.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Using the Profiler API<\/h2>\n\n\n\n<p>Using the profiler API you can use the following to collect the profile data from a query expression:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code lang=\"\" class=\"\">let $_ := prof:enable(xdmp:request())\n(: code to profile, e.g. -- let $_ := local:test() :)\nlet $_ := prof:disable(xdmp:request())\nlet $profile := prof:report(xdmp:request())<\/code><\/pre>\n\n\n\n<p><strong>NOTE:<\/strong> If you are running this from an XQuery 1.0 query instead of a 1.0-ml query then you need to declare the <code>prof<\/code> namespace:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code lang=\"\" class=\"\">declare namespace prof = \"http:\/\/marklogic.com\/xdmp\/profile\";<\/code><\/pre>\n\n\n\n<p>You can then either save the profile to a file, e.g.:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code lang=\"\" class=\"\">let $_ := xdmp:save(\"C:\\profiling\\profile.xml\", $profile)<\/code><\/pre>\n\n\n\n<p>or process the resulting <code>profile:report<\/code> XML in the XQuery script.<\/p>\n\n\n\n<p>Alternatively you can use one of the <code>prof:eval<\/code> or <code>prof:xslt-eval<\/code> to evaluate an in memory query, or <code>prof:invoke<\/code> or <code>prof:xslt-invoke<\/code> to evaluate a module file. These return the profile data as the first item and the result as the remaining items, so you can use the following:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code lang=\"\" class=\"\">let $ret := prof:eval(\"for $x in 1 to 10 return $x\")\nlet $profile := $ret[1]\nlet $results := $ret[position() != 1]<\/code><\/pre>\n\n\n\n<p>With the profile report XML, you can then process it as you need. For example, to create a CSV version of the QConsole profile table you can use the following query:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code lang=\"\" class=\"\">declare function local:to-seconds($value) {\n  ($value cast as xs:dayTimeDuration) div xs:dayTimeDuration(\"PT1S\")\n};\n\nlet $overall-elapsed := local:to-seconds($profile\/prof:metadata\/prof:overall-elapsed)\nfor $expression in $profile\/prof:histogram\/prof:expression\nlet $source := $expression\/prof:expr-source\/string()\nlet $uri := string(($expression\/prof:uri\/text(), \"main\")[1])\nlet $line := $expression\/prof:line cast as xs:integer\nlet $column := ($expression\/prof:column cast as xs:integer) + 1\nlet $count := $expression\/prof:count cast as xs:integer\nlet $shallow-time := local:to-seconds($expression\/prof:shallow-time)\nlet $deep-time := local:to-seconds($expression\/prof:deep-time)\norder by $shallow-time descending\nreturn\n  string-join((\n    $uri, $line, $column, $count,\n    $shallow-time, ($shallow-time div $overall-elapsed) * 100,\n    $deep-time, ($deep-time div $overall-elapsed) * 100,\n    $source\n  ) ! string(.), \",\")<\/code><\/pre>\n\n\n\n<p><strong>NOTE:<\/strong> This does not handle escaping of commas in the <code>$source<\/code> string.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>When investigating slow XQuery or XSLT queries in MarkLogic, one of the tools it provides is a query profiler. In this post I&#8217;ll explain how to make use of the profiler and how to interpret the results. There are 3 ways you can make use of the profiler: use the MarkLogic QConsole; use an IDE &hellip; <a href=\"https:\/\/blog.67bricks.com\/?p=437\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;MarkLogic profiler&#8221;<\/span><\/a><\/p>\n","protected":false},"author":10,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[57,56],"tags":[60,58,59,61],"class_list":["post-437","post","type-post","status-publish","format-standard","hentry","category-marklogic","category-xquery","tag-marklogic","tag-performance","tag-profiling","tag-xquery"],"_links":{"self":[{"href":"https:\/\/blog.67bricks.com\/index.php?rest_route=\/wp\/v2\/posts\/437","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.67bricks.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.67bricks.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.67bricks.com\/index.php?rest_route=\/wp\/v2\/users\/10"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.67bricks.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=437"}],"version-history":[{"count":7,"href":"https:\/\/blog.67bricks.com\/index.php?rest_route=\/wp\/v2\/posts\/437\/revisions"}],"predecessor-version":[{"id":447,"href":"https:\/\/blog.67bricks.com\/index.php?rest_route=\/wp\/v2\/posts\/437\/revisions\/447"}],"wp:attachment":[{"href":"https:\/\/blog.67bricks.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=437"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.67bricks.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=437"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.67bricks.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=437"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}