Dev meeting – debugging XQuery and XSLT – 8th January 2021

In the dev meeting today, we talked about debugging XQuery.

XQuery is a query language (primarily) for XML (in very loose terms, it’s like SQL for XML). Reece is the developer of an IntelliJ IDEA plugin that supports development of XQuery in IDEA. The latest changes he’s made to it allow it to be used for debugging XQuery in MarkLogic.

In IDEA, after installing his plugin, you need to set up an XQuery Run/Debug configuration for the MarkLogic server you are debugging. Then, you can run an XQuery file from within IDEA against MarkLogic, and the output is displayed in IDEA, much as it would be in the MarkLogic query console.

Using standard IDEA breakpoints, you can add a breakpoint to an XQuery expression. Then, you can debug, which shows the stack frame, and the current variables and their values.

There is some complexity in working out the stack frames for eval expressions, which dynamically execute XQuery code in strings. Inigo expressed the opinion that this was generally a bad idea anyway.

At the moment, it’s not possible to debug into X-Ray XQuery tests, but Reece is doing further work to make it easy to run X-Ray tests from IDEA, which will also make them debuggable.

Reece is also working on debug support for Saxon, which will support debugging in XQuery and in XSLT via Saxon.

Clarity and XQuery – dev meeting fun with blockchains and Intellij IDEA

Alex talked about Clarity, a language for running on the Blockstack blockchain. He recently took part in a hackathon for it. One of his motivations was to improve his functional programming – Clarity is very Lisp-like, so being forced into a functional mindset was useful for learning. Another motivation was that the hackathon was offering cash prizes. Clarity is pretty new – things like the test framework and code syntax highlighting are very new.

Alex wrote a high score system that runs in Clarity – that will maintain the high scores for a game long-term, rather than needing to be maintained by a particular game developer on their own server.

There are various limitations to Clarity because of the nature of running on the blockchain – such as an inability to perform loops, because (thanks to the halting problem) it’s not possible to prove that they would complete in a reasonable length of time. Some primitives also cost money, such as storing and retrieving variables on the blockchain.

There is a built-in testing framework that allows you to test your code on your local machine without writing to the blockchain. It’s incomplete at the moment, because it does not support transactions.

Native lists cannot be unbounded, for similar reasons of cost. So Alex also wrote an “endless list” – essentially a linked list of bounded lists – so that you can treat it as an unbounded list while only acting on the portion of the list that you are currently looking at.
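
This is not Clarity code, but it is roughly the shape of the idea: a minimal TypeScript sketch (the names and chunk size are made up) of a linked list of bounded chunks, where each operation only touches the chunk it is currently looking at.

```typescript
// Sketch only: a linked list of bounded chunks, standing in for Clarity's bounded lists.
const CHUNK_SIZE = 100; // Clarity lists must declare a maximum length; this constant stands in for that bound

interface Chunk<T> {
  items: T[];            // holds at most CHUNK_SIZE entries
  next: Chunk<T> | null; // link to the next bounded chunk, if any
}

function push<T>(head: Chunk<T>, value: T): void {
  // Walk to the last chunk, then append, creating a new chunk when the current one is full.
  let chunk = head;
  while (chunk.next !== null) {
    chunk = chunk.next;
  }
  if (chunk.items.length < CHUNK_SIZE) {
    chunk.items.push(value);
  } else {
    chunk.next = { items: [value], next: null };
  }
}

const scores: Chunk<number> = { items: [], next: null };
push(scores, 1200); // callers see something list-like, but only one bounded chunk is ever touched
```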

He then wrote code for redistributing money from a group of donors to a large group of recipients without having to use existing infrastructure like PayPal – because organizations like PayPal don’t like it when individuals send money to lots of people. This was inspired by some work done around US stimulus cheques – individuals got cheques from the government, but not all of them needed the money, while plenty of people needed more than the stimulus cheque provided. The contract allows a group of donors to store money in the contract itself, in a way that is transparent to everyone, and then have it distributed out to a set of recipients.

There’s more information at https://community.blockstack.org/clarity-winners.

——–

Reece talked about the latest improvements to his IntelliJ IDEA XQuery plugin. It now shows inline parameter names on function calls, and shows the structure of functions. It also displays documentation for functions from MarkLogic and from the standard XQuery and XPath libraries, downloading the documentation as needed.

It also shows a query log tab for the various MarkLogic log files, so you can see the latest log messages without opening a separate file. This works both for local log files and for remote MarkLogic servers.

It will parse MarkLogic rewriter XML files and display the results as a list of REST endpoints. In the upcoming version, it will automatically retrieve all the registered MarkLogic schemas, so for rewriter.xml it provides auto-complete for the various structural elements and options in the rewriter XML. It also parses RESTXQ APIs as used in systems like BaseX.

It will handle relative imports correctly if you set an XQuery directory as a source root – so “/myLibraries/whatever.xqy” will resolve to the right place. In the upcoming version, it will be able to recognize source roots automatically.

When you’re executing queries locally using a configured query processor (e.g. if you’ve configured MarkLogic or another system so you can do the local equivalent of using the MarkLogic console), then you can pretty-print the output and you can also see profiling information for the code that you execute.

Reece is currently working on debug support, so you can set breakpoints in your XQuery code.

Inigo stated that Reece’s plugin is great, and that everyone using XQuery or MarkLogic should be using it.

Dev meeting – 6th March 2020

Chris talked about a taxonomy management service that we’d created for a publisher. We created a system that allowed a large number of taxonomies to be stored and updated, so that the client could use them for categorizing their content. When a new version of a taxonomy arrived (for example, a medical taxonomy like SNOMED-CT), it would be stored inside the data store. We would often want to provide a subset of a taxonomy, because the full ontologies were too large and sprawling to be useful to individual consumers.

Rich talked about SNOMED-CT, which is an ontology of medical terms. It describes tools, procedures, drugs, devices and many other things. It’s important that two different medical systems use the same terms for what they’re referring to, like “paracetamol”; using the same SNOMED-CT identifier in each system makes it easier to correlate data between them.

Reece talked about a project for a customer that uses a financial ontology to manage financial documents. Those documents discuss a range of financial topics, so the system we developed uses the ontology to classify the sections within each document according to the terms matching those sections. Hence, the customer can find documents, and sections within them, that are relevant to the specific topics they are interested in.

We briefly discussed the differences between term lists, taxonomies, thesauri, and ontologies. Inigo expressed the heretical view that none of the definitions really matter.

We sometimes care about classifying content against geographical regions, and understanding the hierarchy of those regions. For example, we might classify a document as discussing a legal case based in Paris, while the user is interested in legal cases of that sort occurring anywhere in the EU. Because our ontology records that Paris is in France, and that France is in the EU, we should be able to surface that document to the user by inferring this additional information. Loic talked about this, and also about the need to make subsets of ontologies to a certain depth only – we don’t necessarily care about small villages, and including them may harm performance as well as produce false positives.
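
As a very rough sketch of the kind of inference involved (the data and function names here are invented for illustration), walking the region hierarchy lets a document tagged with “Paris” satisfy a query for the EU:

```typescript
// Hypothetical broader-than relation extracted from the ontology (assumed to be acyclic).
const broader: Record<string, string | undefined> = {
  Paris: "France",
  France: "EU",
};

// A document tagged with docRegion matches a query region if the query region
// appears anywhere up the chain of broader regions.
function matchesRegion(docRegion: string, queryRegion: string): boolean {
  for (let region: string | undefined = docRegion; region !== undefined; region = broader[region]) {
    if (region === queryRegion) {
      return true;
    }
  }
  return false;
}

console.log(matchesRegion("Paris", "EU")); // true: Paris -> France -> EU
```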

Nikolay talked about micro-frontends – a frontend composed of UIs provided from different servers. He talked about using view components in ASP.NET Core MVC, and how they allow you to compose the information displayed in a view in a slightly different way to using partials. This is done server-side. They retrieve their data via dependency injection – so the parent doesn’t need to change when the contained component changes. It helps you think in terms of features rather than pages.

“It’s full of stars” – Dev meeting 24th January 2020

Alex A talked about the “Astronomy Picture of the Day” – he likes having it as his desktop background. But getting hold of all the images is hard – there’s a torrent, but it’s very old, and you can download the site via wget, but that leads to duplication and an odd file structure. They do have an API at https://api.nasa.gov/planetary/apod (and there are a lot of other NASA APIs) – but the parameters that the API describes don’t all actually work! You can retrieve a chunk of JSON for an individual picture at a time – but if you call it repeatedly, you get banned for too many accesses. It turns out that you can pass in a start_date and an end_date and get a big chunk of JSON back covering the whole range, which resolves the “too many accesses” problem. He parsed this with the Newtonsoft JSON library (very useful for .NET JSON parsing), dealing with a few dirty-data issues along the way. One of the problems is that there are some specific dates that don’t work – and if one of them is inside your date range, then the whole request fails! So Alex wrote code to repeatedly bisect the date range to find the problematic date, so it could be excluded from the range. There were also some broken images in the data. Alex has now downloaded all of these files, organized them by size, and put them in a torrent, so they are freely available.
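
The endpoint and the start_date/end_date parameters are as described above; everything else in this little sketch (the types, the error handling, the use of NASA’s DEMO_KEY) is illustrative rather than Alex’s actual code, and it assumes Node 18+ or a browser for the built-in fetch. With a date range, the API responds with a JSON array, one entry per day:

```typescript
// Minimal sketch: fetch APOD metadata for a whole date range in one request.

interface ApodEntry {
  date: string;
  title: string;
  media_type: string;
  url?: string; // not every entry has a usable image URL, so treat it as optional
}

async function fetchApodRange(apiKey: string, startDate: string, endDate: string): Promise<ApodEntry[]> {
  const params = new URLSearchParams({ api_key: apiKey, start_date: startDate, end_date: endDate });
  const response = await fetch(`https://api.nasa.gov/planetary/apod?${params}`);
  if (!response.ok) {
    // A single bad date anywhere in the range fails the whole request, which is why
    // bisecting the range, as Alex did, is a good way to isolate the problematic dates.
    throw new Error(`APOD request failed for ${startDate}..${endDate}: ${response.status}`);
  }
  return (await response.json()) as ApodEntry[];
}

fetchApodRange("DEMO_KEY", "2020-01-01", "2020-01-07").then(entries =>
  entries.forEach(e => console.log(`${e.date}: ${e.title}`)));
```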

Alex G talked about PowerShell, and how he uses it. It uses an object-oriented scripting language based on .NET (or .NET Core for PowerShell Core), and also has ksh-style syntax. It supports basic calculation on the command line, like addition and multiplication – and can do more complex maths with e.g. [Math]::Exp(2) – and it will autocomplete from within .NET packages too. It supports setting up aliases. “ogv” (Out-GridView) is a useful graphical tool for filtering text… but it only works on Windows. Because it passes objects between components rather than text, you can act on them as objects – e.g. “ls | select -Property length” will select the “length” property (the size) of each file. “ls | convertto-json” will create a JSON representation of the input, and “convertfrom-json” will convert JSON text back into an object that can be traversed via dot notation. PowerShell Core is available on Windows, OS X and Linux – so if you’re working in a mix of Windows and Unix environments, it can be useful to use PowerShell across all of them.

Inigo talked about Amazon Polly, a speech synthesis tool that we’ve been using for a client. The impressive thing about it is how well it reads out scientific text – while we’ve looked at speech synthesis before, it’s always been good at doing common words but has fallen over with technical text. Polly copes very well.

All the trees!

We planted 192 trees this month!

This is part of our ongoing plan to be carbon negative – we wrote earlier about our plan to be carbon neutral, but we upgraded this a few months ago to instead be carbon negative. So, after working out how many trees would be needed to offset our server hosting (our single biggest source of CO2), we then multiplied it by ten to make sure that we were carbon negative.

One of the areas where trees have been planted on our behalf is:

https://tree-nation.com/projects/la-pedregoza/updates

Ever tried? Ever failed?

This week in the dev meeting we talked about failure.

The format we used was a Software Developers Anonymous meeting:

Start with: “Hello, my name is …. and I have failed”

And everyone cheers.

And then after that, you describe the failure and discuss it.

Unfortunately, most of the failures are too private to publish via our blog!

Among the things that we can repeat:

Stephen N talked about “The Meaning of Liff” – a book listing a number of placenames and definitions for them. One of these is “Ely” – the first tiniest inkling that something has gone wrong. Every one of Stephen’s failures has begun with an Ely – something that’s not quite right, something that’s a bit suspicious – and what he has now learned is that this is an indication that he should act immediately.

Alex talked about the importance of good coffee, and how a project he worked on had been saved from failure by the company installing (only two) good quality coffee machines, which made the teams work together while they were standing in the queue to get coffee.

Dan talked about a problem when he was working late on a server, and he rebooted it, and it didn’t come back. He reported this to his bosses at the time, and they didn’t blame him, but instead immediately went into damage limitation to deal with it, contact customers, and cope with the problem. It eventually turned out that the server was absolutely fine, but someone at the hosting company had disconnected the VT220 terminal from it and it refused to boot without it. The main lessons are – be careful when working on a live server, and when failure happens, deal with it well.

Dev meeting – Lightning Round! AWS Layers, Scala 3, becoming carbon neutral, and more

In our dev meeting this week, we had a “lightning round” where developers talked for up to 5 minutes each about a topic that interested them or that they had recently been working on.

Loic talked about “AWS Layers” – this is a way of sharing code across multiple lambdas. One advantage is that your turnaround cycle for a lambda can become faster, because you only need to update the lambda itself, while the layer stays the same each time. This is very useful if you are pulling in lots of third-party libraries, such as the dependencies of a Java lambda.

Velizar talked about Scala 3, which is due to be released in 2020. This is interesting to us because we use Scala 2 for a significant number of projects. Scala 3 has a nicer syntax. It replaces implicits with a number of more targeted constructs, such as given instances and using clauses. It will be “mostly” backwards-compatible, with the older syntax deprecated over time. The compiler will have LSP support built in, which means that support in VS Code becomes much easier.

Reece talked about using fixed headers in HTML tables. There are various hacky ways of making the table body scrollable separately. The newer way of doing this is to apply “position: sticky” CSS to the th elements, which lets the browser sort out all of the problems. You need to be careful with z-indexes and transparent backgrounds to make sure that the header and body display correctly. Older browsers will not support this, but will gracefully degrade to a traditional non-scrolling header.

Nikolay talked about Razor attribute handling – if the value you supply for an attribute is null, then the whole attribute is omitted from the rendered output, which is handy for conditionally setting attributes on HTML elements.

Dan talked about testing one of our client applications – a Scala and JavaScript application for managing drugs. We have been migrating a system that we wrote a few years ago to a newer one. To make sure that the migration is successful, Dan has added instrumentation code to the old system so that he can quickly and easily generate test data. This instrumentation code makes it easier to carry out various developer-friendly actions in the old system and then download the resulting JSON from it. This is an ad-hoc way of creating 300 additional unit tests.

Bart talked about the AWS CloudFront CDN that we have been using for a client. We have been implementing a third-party user behaviour tool on top of the platform that we have created for them. In the test environment, this is tricky to test when working from home, because a home IP address is not included in the list of whitelisted IPs that are allowed access. It would be possible to configure our VPN so it forwarded all traffic from a home computer to CloudFront – but this affects all home users. The CDN itself has a very large number of IP addresses, so it’s not easy to list just the IPs that should be forwarded to it. He talked about setting up the default gateway on his Linux installation. Ultimately, he resolved the problem by whitelisting his home office IP address, because it’s relatively stable.

Rich talked about recruitment and how to recruit developers better. He also talked about becoming carbon neutral as a company, and how we should be socially responsible. We have recently been talking internally about our company values, and being carbon neutral is something that we are all enthusiastic about (Sam, one of the directors, is currently with his children at Extinction Rebellion). We also talked about the work that we do for charities, and that we should perhaps advertise it more.

Finally, Ian talked about .NET Core 3, which is being launched next week. It has a number of interesting new features; however, it is not an LTS release – that will come with version 3.1. Then Simon talked about Java 13, which was released yesterday.

Dev meeting – the Lead Dev conference

Chris went to the Lead Dev conference a few weeks ago. It is a single-track conference; there are many short talks, and there’s no need to choose between them.

There are four main areas that Chris focussed on: misc things; making teams work well; diversity and hiring; and operations. These are some of the talks he went to.

Nickolas Means told stories – this time about the building of the Eiffel Tower, and how it is relevant to software development. He talked about how to get things done – you need to do networking, self-promotion, and negotiation – better thought of as “making friends, telling stories, and co-operating”. He thinks that team leads shouldn’t be umbrellas, entirely protecting a team from what’s going on in the business, but should instead be heat shields that reduce the impact of those things.

James Birnie talked about quantum computing and cryptography, and how we should all be worried that governments can record encrypted network traffic now and decrypt it in a few years’ time.

Lara Hogan talked about dealing with friction in teams – “Forming”, “Storming”, “Norming” and “Performing”. When you feel under attack, you become defensive and you don’t make good decisions. Instead, think about people’s core needs – Belonging, Improvement, Choice, Equality, Predictability, and Significance. If there are problems in a team, it may be because one or more of these needs isn’t being met. Lara has several blog posts on these topics.

Paula Kennedy talked about “Silence Isn’t Golden” – how to deal with distributed teams. They had a weekly standup to discuss things inside and outside work; a monthly retrospective to discuss how the team was functioning; and a regular “coffee break” meeting.

Bethan Vincent talked about increasing diversity in hiring. She discussed the issues with a fully anonymized process, which didn’t work very well for various reasons. She talked about the importance of plenty of information in job listings, and the issues of mandatory take home tests for people who have limited time available. We discussed this more, because this is an area where we’ve been putting in effort internally.

Kate Beard talked about 10% time at the FT, working on side projects, and its relevance to diversity. Without that, people who have limited free time outside work are not able to do side projects, which limits their learning opportunities and career development.

Ola Sitarska talked about diversity again. She discussed code review as an interview technique.

Steve Williams talked about teams performing in crisis situations. Companies don’t plan for business crises as well as they should. He discussed his experience volunteering for the RNLI. It’s important to think about appropriate roles and responsibilities, rituals to follow, and carrying out exercises. He also talked about “SMEAC” – Situation/Scenario – Mission – Execution – Admin – Comms, where you’re thrown into a situation and then need to deal with it (also known as a Five Paragraph Order). We talked about how you could arrange a communication process in the event of a disaster, and how to manage disaster recovery processes.

All of the conference sessions are available online.

Arrows, and outrageous fortune

Tim talked about arrow functions in JavaScript, and how “this” works with them. The behaviour of “this” differs between ES6 arrow functions and traditional (ES5 and earlier) JavaScript functions.

He suggested some rules of thumb:

  • Don’t use arrow functions as methods in object literals, because they do not pick up “this” from the object literal in the way you might expect
  • Don’t use arrow functions as methods in classes – it does work, but it’s more verbose than using the concise class method syntax
  • Do use arrow functions as function parameters – e.g. arr.map(el => this.doSomething(el)) is much clearer than using bind or assigning this to another variable outside the method
  • Be careful using “this” with an arrow function if you’re using jQuery – because jQuery reassigns “this”
  • Don’t use them in lifecycle functions in Vue – Vue also rebinds “this”
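
A minimal TypeScript sketch of the first and third rules of thumb (the class and object names here are invented):

```typescript
class Collection {
  private items = [1, 2, 3];

  // Rule 3: the arrow function passed to map() inherits "this" from doubleAll(),
  // so this.scale() works without bind() or a "const self = this" workaround.
  doubleAll(): number[] {
    return this.items.map(el => this.scale(el));
  }

  private scale(el: number): number {
    return el * 2;
  }
}

const counter = {
  count: 0,
  // A traditional method picks up "this" from the call site, so counter.increment() works.
  increment(): number {
    return ++this.count;
  },
  // Rule 1: an arrow function here would capture "this" from the enclosing (module) scope,
  // not from the object literal, so something like the following would not see "count":
  //   incrementBroken: () => ++this.count,
};

console.log(new Collection().doubleAll()); // [2, 4, 6]
console.log(counter.increment());          // 1
```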

We also talked about lambdas in TypeScript and in other languages.

Then, Ed talked about the past and future of programming, based on a talk he’d recently attended by Uncle Bob. He talked about the history of programming, from using mercury delay lines for memory onwards, and how the profile of what a “programmer” is has changed over time. Half of programmers have less than five years’ experience – so passing information on is harder than it used to be, and understanding the importance of disciplines such as TDD is less common. Uncle Bob thinks that the combination of this lack of experience and the increasingly important systems that programmers manage means that there will be a disaster created by poor software in the near future. He suggests that more attention to craftsmanship will help prevent this – the alternative being greater regulation of programmers.

We expressed many strong opinions on these topics.

Observability and hot reloading

Chris talked about “Observability”, based on an article he had been reading on Martin Fowler’s site (https://martinfowler.com/articles/domain-oriented-observability.html). This covers topics such as logging, analytics, and metrics. In the modern era of cloud servers, this is more complicated and more important than in old-style server setups. It’s a set of cross-cutting concerns that you do not want obscuring your domain logic. Domain-level logging and metrics are the interesting topic – low-level server events are easy to handle – and if the domain-level metrics are covered, then the low-level metrics are less important.

One approach for improving this is to pull the domain logic out into a focussed domain class, and then have a separate instrumentation class to deal with the metrics – a “domain probe” pattern. Testing the instrumentation is important, but it is easy to ignore, and hard to do if the logic for it is scattered around your code; it becomes much easier if you break the instrumentation out into a separate class.
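
A rough sketch of the shape of this, with invented names (the article uses its own example): the domain class reports business-level events to a probe, and the logging and metrics details stay behind that interface.

```typescript
// The domain class depends only on this narrow, domain-level interface.
interface OrderProbe {
  discountApplied(orderId: string, amount: number): void;
  discountFailed(orderId: string, error: Error): void;
}

class DiscountService {
  constructor(private readonly probe: OrderProbe) {}

  applyDiscount(orderId: string, amount: number): void {
    try {
      // ... domain logic only: no logger or metrics client in sight ...
      this.probe.discountApplied(orderId, amount);
    } catch (error) {
      this.probe.discountFailed(orderId, error as Error);
      throw error;
    }
  }
}

// The probe implementation owns the cross-cutting concerns; a fake OrderProbe in tests
// makes it easy to assert that the right instrumentation calls were made.
class LoggingOrderProbe implements OrderProbe {
  discountApplied(orderId: string, amount: number): void {
    console.log(`discount-applied order=${orderId} amount=${amount}`);
  }
  discountFailed(orderId: string, error: Error): void {
    console.error(`discount-failed order=${orderId}`, error);
  }
}
```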

When you’re doing instrumentation, you typically care about other metadata beyond the domain values being passed around – such as request IDs, versions, etc. This can be wrapped up in an execution context, which you can perhaps retrieve via a factory. An alternative is an event-based approach, whereby you fire off domain events as things happen and the instrumentation responds to them.

The article suggests that AOP isn’t the right tool to use for this, because AOP is typically at the method level, and the domain level importance of activities typically doesn’t match up totally with the method code structure. It also adds more magic, which is bad.

We discussed that this could be done with a decorator pattern, and we discussed the value of decorator patterns in general. In some projects we have used a similar approach in a decorator-like fashion. This has the same issue as AOP – that the important domain logic might not match up with method-level granularity – but it doesn’t involve magic. We also discussed that the execution context could be passed around via implicits when using Scala, and agreed that this was useful.
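
For comparison, a small sketch of the decorator-style alternative (again with invented names): a wrapper with the same interface adds the instrumentation, and the wrapped class stays free of it.

```typescript
interface PriceService {
  priceFor(productId: string): number;
}

class BasicPriceService implements PriceService {
  priceFor(productId: string): number {
    return 42; // stand-in for the real domain logic
  }
}

// The decorator implements the same interface and delegates, adding timing around the call.
class InstrumentedPriceService implements PriceService {
  constructor(private readonly inner: PriceService) {}

  priceFor(productId: string): number {
    const started = Date.now();
    const price = this.inner.priceFor(productId);
    console.log(`priceFor(${productId}) took ${Date.now() - started}ms`);
    return price;
  }
}

// Callers only see PriceService, so the instrumentation can be layered on (or removed) at wiring time.
const service: PriceService = new InstrumentedPriceService(new BasicPriceService());
console.log(service.priceFor("sku-123"));
```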

Alex talked about live reloading of Java and JavaScript. We want two things – that changing Java source code leads to immediate updates of the running server, and that changing JavaScript leads to immediate updates in the page. The approach he has been using is to run “gradle build --continuous” to continuously rebuild the source, and a separate Gradle process to run the development server using those classes. Then, for the JavaScript, there’s a separate webpack instance that rebuilds the JavaScript and runs on a separate port with hot reload. There are also CORS issues that need to be overcome.
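
One common way to sidestep the CORS issue in this kind of setup (not necessarily exactly what we did) is to have webpack-dev-server proxy API calls through to the Java dev server, so the browser only ever talks to one origin. A sketch of a webpack.config.ts, assuming webpack 5 with webpack-dev-server 4, and with made-up ports and paths:

```typescript
import type { Configuration } from "webpack";
import "webpack-dev-server"; // side-effect import adds the devServer field to webpack's Configuration type

const config: Configuration = {
  mode: "development",
  entry: "./src/index.ts",
  devServer: {
    port: 3000, // the UI is served here, with hot reload
    hot: true,
    proxy: {
      // forward API requests to the Gradle-run development server (assumed to be on 8080)
      "/api": "http://localhost:8080",
    },
  },
};

export default config;
```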

We also talked about using a similar approach here with Spring Boot devtools. We also discussed how to achieve the same results in Scala Play.