Emacs Org mode, tea and sprints – dev meeting for 5th May

Rodney discussed Emacs Org mode – after some discussion of the relative merits of Emacs vs Vim vs other, inferior editors. At its core it lets you edit outlines of bullet points, but it can be used for calendars, todo lists, estimates, and so on. Via a keypress, you can add a templated todo item, including a schedule and other metadata, and include it in your todo list. This metadata is used in the display – e.g. you can display an agenda for the day, which will include all of the tasks scheduled for today. You can mark those tasks as done once complete. Tasks can also be split into a hierarchy of sub-tasks, and the top task can display the percentage of tasks remaining under it. It also has time tracking – once you start a task, you can start the clock, and the time spent will be logged against that task.

Its advantages are that it’s simple, fast, and free, and because it’s text based, other tools will work with it.

To achieve similar results, Chris uses todo.txt – essentially a text file in Dropbox, supplemented by command-line tools and a phone app. Because it’s backed by a text file, you can always access your data anywhere, without any other tools. Reece uses a text file in Vim. Inigo used to use a text file in Vim, but now uses Trello.

We then discussed tea making. There was no consensus reached.

Stephen then talked about sprint planning. Across our projects, we do sprint planning in different ways. The default Agile approach is to start with a demo of the work done in the previous sprint, then a retrospective of that sprint; the product owner then presents the things they would like done in the coming sprint, and the project team discusses which of them can be done. In one of our projects, sprint planning is looser: there is no prior discussion of statistics from previous sprints, and not much backlog grooming. We discussed the merits of estimation – even though the resulting numbers are not always useful, the process of estimating, and giving developers a good idea of how long tasks are expected to take, is useful. We also discussed prioritization, and how new tasks should be dealt with – whether they should be rejected entirely until the end of the sprint, or prioritized against existing tasks. We agreed that we would look at more explicit prioritization in some projects. We also agreed to combine brief explicit estimation of tasks in the sprint planning meetings with announcing on Slack a short description of each task, how we would approach it, and how long we expect it to take.

Elm and Punishment

Chris talked about Elm, which he subtitled “JavaScript for people who like Haskell not JavaScript”.

However, he immediately explained that Elm is not like Haskell, and isn’t intended to be compared to Haskell: it is aimed at JavaScript developers rather than Haskell developers, and the community avoids using Haskell terminology. This is intended to help increase the uptake of the language.

However, it is a pure functional language, and does look quite Haskell-ish: for example, it supports pattern matching, immutability, and so on. It is event-driven.

One thing it concentrates on is having good error messages. It will even do things like identify references to variables that don’t exist, and suggest what you might have intended. It’s also helpful that there’s an Elm plugin for IntelliJ IDEA.

Like most modern JavaScript frameworks, it uses a virtual DOM to make rendering faster.

There is a time-travelling debugger, which is a very cool feature, but Chris didn’t use it in practice.

It has automatic semantic versioning, so the major version is bumped automatically when existing signatures change. However, the Elm language itself has changed significantly between minor versions, so you can’t depend on code written for older Elm versions being compatible. This was frustrating when working in the language – finding older libraries that hadn’t been updated, and older examples that were out of date.

You can incrementally rewrite an existing JavaScript codebase to Elm, on a file-by-file basis. Chris wasn’t sure how compatible Elm output is with older browsers.

There are currently no books (although there will be later this year). There is a “try elm” site that is good.

It’s primarily aimed at interactive UIs and games.

Chris’s sample Elm code is at http://nespera.github.io/elm-slide/. However, he hasn’t actually completed the game, because it’s too hard.

Then, Loic talked about Punishment Driven Development. He talked about reasons why companies punish people, and the effects of that punishment. He described the importance of understanding why people are behaving as they do, and that sometimes you may need to change your own behaviour to work with them in order to achieve your objectives. Then, he talked about the axioms of Punishment Driven Development, and contrasted them with People Driven Development.

Dev meeting 10th March – Laravel, Slack bots, and Android automation

At the dev meeting this week, we talked about the PHP framework Laravel, about Slack integrations, and about automating Android phones with Tasker.

Bart gave a presentation on Laravel. It’s an MVC framework for PHP that bills itself as “The PHP Framework for Web Artisans”. It makes extensive use of “artisan”, a command-line code generation tool for generating models, view templates, etc. It supports various features familiar from MVC frameworks in other languages, such as database migrations, programmatic definition of routes, and an ORM.

We’re not using Laravel, and we’re not likely to be using it for any client projects, but we are using PHP for some infrastructure things like our website.

Then we discussed Slack integrations that we are finding useful, and Slack integrations that we wish we had. We have a webhook-based Gitlab integration, which sends a message whenever code is pushed to a Gitlab repo – this is useful for low-traffic repositories like document repos, but less useful for code repos. We have a Jenkins CI integration set up as part of our standard Jenkins script, which sends messages on success or failure of the build; this is more convenient than the email notifications we also have set up. For the client projects that use it, we have a Bamboo integration that works similarly. We have also tried the Trello bot, but didn’t find it useful.
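
As a rough illustration of how these webhook-based notifications work, here is a minimal TypeScript sketch that posts a message to a Slack incoming webhook. The webhook URL, job name, and message wording are placeholders, and it assumes a Node.js runtime with a global fetch (Node 18+).

    // Post a simple notification to a Slack incoming webhook (URL is a placeholder).
    // Incoming webhooks accept an HTTP POST with a JSON body containing a "text" field.
    const WEBHOOK_URL = "https://hooks.slack.com/services/T000/B000/XXXX";

    async function notifyBuild(job: string, result: "success" | "failure"): Promise<void> {
      const response = await fetch(WEBHOOK_URL, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ text: `Build ${result}: ${job}` }),
      });
      if (!response.ok) {
        throw new Error(`Slack webhook returned ${response.status}`);
      }
    }

    notifyBuild("my-project #42", "success").catch(console.error);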

Other Slack integrations that we thought might be useful were:

  • A wiki search bot, for searching our internal wiki
  • A JIRA slack bot for creating and tracking JIRA tasks
  • A cake bot, for working out whose responsibility it is to provide cake or cheese for our Cheese/cake Mondays
  • An UptimeRobot bot, so downtime notifications can be sent to Slack as well as email
  • A password bot, for reminding us of passwords for our infrastructure

We also discussed the usage of Hubot at Github, which they use to automate all of their infrastructure. There is an Open Source version of Hubot available.

Joe talked about Tasker, an automation app for Android. Joe originally installed this after receiving a parking ticket in the Park and Ride. His use case was to detect when his car had stopped moving, so he could then automatically launch the Park and Ride app to pay for parking.

Tasker allows you to write small scripts for managing Android, without having to write a full Android app. For example, “I can see this cell tower” therefore “trigger these actions”. “I am plugged into power, and can see the car’s bluetooth” so “launch SatNav and turn off wifi” – and then when moving out of that state, “check the GPS and launch the Park and Ride app if at the car park”. “On receiving a text message with a particular text”, “turn volume up to maximum and play music”, in case the phone is lost. We also briefly discussed Llama, which is a similar Android automation app, although less sophisticated.

‘Jumble sale’ developer meeting

We had another “jumble sale” developer meeting this week, in which a number of people talked briefly about topics that interested them.

Rhys, Reece and Richard H talked about integrating with third-party authentication and authorization systems. For some of our projects, we have specified a simple API that the third-party authentication provider needs to support. This has been an effective approach, and we’ve found it particularly useful that we wrote a complete test suite to begin with against a dummy provider – that meant that we could very easily tell if the third-party had followed the specification, and potentially they could have run the tests themselves (although they didn’t). For another project, we have used SAML to support single-sign-on to integrate with existing sign-on systems. We’ve used pac4j from Scala to do this integration, and combined it with MarkLogic authentication to allow login to our systems.

Rodney talked about TypeScript, and how it improves JavaScript by adding optional static typing. It’s a strict superset of JavaScript, so all valid JavaScript code is valid TypeScript, and it compiles to plain JavaScript. However, we haven’t yet looked at how to fit TypeScript compilation into the frameworks and build tools we typically work with.
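
As a small, hand-rolled illustration (not from the meeting) of the kind of typing TypeScript adds: the interface and annotations below are checked at compile time and then erased when the code is compiled to plain JavaScript.

    // A structural type describing the shape of a value.
    interface User {
      id: number;
      name: string;
      email?: string; // optional field
    }

    // The parameter and return types are checked by the compiler.
    function greeting(user: User): string {
      return `Hello, ${user.name}`;
    }

    greeting({ id: 1, name: "Ada" });        // compiles
    // greeting({ id: "1", name: "Ada" });   // rejected: id must be a number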

Inigo talked about his investigation of why an AWS-hosted MarkLogic cluster was somehow transferring many terabytes of data per month. The possible causes of this were:

  • the communication between end users and Play
  • … between Play and MarkLogic
  • … between the hosts in the ML cluster
  • … from the servers to their S3 backups

He used stats from the AWS load balancer, combined with Firefox’s network tab to analyze page weight, to rule out the communication with end users. By enabling logging on the load balancer between Play and MarkLogic, he was able to rule out the second cause. Transferring data from AWS to S3 is free as long as it’s all within the same region – which it was. Running iftop on the cluster nodes showed which sockets the data was flowing between, and when – which pinned it down to the MarkLogic intra-cluster communication, in regular bursts. Looking at what the system was doing, there was a regular background task that queried all of the documents. This took about 30s, but we hadn’t previously been concerned about the inefficiency, since it didn’t affect users. However, because it wasn’t using indices, it needed to do a lot of data transfer between the cluster nodes to check the content – which led to our data transfer volume. Rewriting the query to use indices made it much faster, and also meant it no longer needed to do any data transfer, significantly reducing our AWS data transfer bill. The lesson is that when running a cluster on AWS, we care about different sorts of inefficiency than we would when running it on local machines.

Richard B talked about Effective Scala Futures – how to use Scala futures without using global variables that make testing hard, and how to bulkhead separate threadpools to manage different tasks. His slides are here: Effective Scala Futures dev talk

NixOS and MarkLogic 9

This week, Rodney talked about Nix – the NixOS Linux distro and the NixOps tooling – and Simon talked about the MarkLogic 9 Early Access release, which we have been working with for a few months.

Nix Expression Language is a declarative functional language, with aspects that are similar to Haskell, Scheme, and YAML. It’s a domain-specific language that can perform tasks similar to Ansible for configuration, and similar to make for building packages and projects while keeping track of dependencies and when they change.

NixOS is a Linux distribution that uses Nix EL to describe the configuration of a machine, built on the Nix package manager. It has atomic upgrades and rollbacks of packages.

NixOps again uses Nix EL, but at a higher level to describe how to provision a set of machines in a cloud environment like AWS. Unlike other tools like Puppet or Ansible, NixOps describes the complete desired state of the machine – so if you manually install a package that NixOps isn’t expecting, it will uninstall it again.

NixOps provisions machines based on a description in Nix EL. However, Nix builds are run in an isolated environment, which keeps them reproducible.

We’re not currently using Nix for deployment (we generally use Ansible), but Rodney is using it on his desktop, and thinks it is a good set of tools.

We have been using MarkLogic 9 Early Access for a few months. Simon talked about the new features around data integration, security, and manageability, and the improvements to the Query Console.

The ‘template driven extraction’ feature allows us to define templates that convert XML or CSV files to row data or triples on insert. It works best on ‘row-like’ data, such as XML with a set of repeated components: you define anchor elements via XPath, and then a set of relative XPaths that identify the particular data fields to be extracted from that content. You can then query this data via SQL, or in conjunction with other data via the Optic API.

The ‘Optic API’ allows joins and aggregates over documents, row data and triples – for example, combining a SPARQL query of graph data with a SQL query of relational data. This is valuable because different data is often represented and queried in a form that is particular to the style of data – but needs to be combined with other data to be useful. For example, purchase data might be best represented as a relational database style set of rows, but then processed by aggregating it and combining it with metadata on what is being purchased that is best represented as nodes in a graph database.

The ‘data movement SDK’ extends capabilities that were previously available via tools such as mlcp and corb, and exposes them via a Java API. This allows bulk ingestion of documents, and their transformation.

The ‘entity services’ feature is a modelling framework for data stored in MarkLogic. It makes it easier to harmonize data from different sources into a consistent model, using the ‘envelope pattern’ to wrap and preserve the original data. It doesn’t do anything that couldn’t have been done manually in previous versions of MarkLogic, but it simplifies the code needed to do it.
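
As a rough sketch of the idea behind the envelope pattern – shown here as a plain TypeScript object rather than the actual entity services format, with hypothetical field names – the harmonized, canonical representation lives alongside the untouched source data, so nothing from the original is lost.

    // Hypothetical shapes, for illustration only.
    interface Envelope<Instance, Source> {
      instance: Instance;   // the harmonized, canonical representation
      attachments: Source;  // the original source data, preserved verbatim
    }

    interface Customer { id: string; name: string; }

    const doc: Envelope<Customer, unknown> = {
      instance: { id: "C123", name: "Ada Lovelace" },
      attachments: { CUST_ID: "C123", FIRST: "Ada", LAST: "Lovelace" }, // as received
    };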

The most interesting new security feature allows permissions for XML content to be assigned at the element level, rather than at the document level – for example, hiding elements within a document based on the role of the user, or making elements read-only by role. This is interesting for some of our applications, which make particular parts of documents available depending on the user.

One of the small but important improvements to the Query Console is that each query tab now displays its own results, which makes working with it a lot better. It also displays type-ahead autocomplete suggestions for function names – this is also useful, although initially disconcerting.

In all, MarkLogic 9 offers a number of interesting new features. We are currently using the Early Access version of it. We’ve encountered and raised a few issues with it, which have been fixed, and we’re looking forward to it being launched.


Dev meeting – Lambda, Elixir, XQuery, Rider and coding standards

In this week’s dev meeting, we had a ‘jumble sale’ meeting, in which various developers each brought along something short to talk about.

First, Joe talked about AWS Lambda again, in an update to the discussion we’d had a few weeks ago. This was prompted by Amazon’s update to the way that configuration works for Lambda. Previously, to manage lambdas running in separate environments (e.g. dev, test and production), we had to construct zip files containing JSON config files and upload a different file to each environment. Now, Amazon has made environment variables available in lambdas – so you can pass in these parameters instead, which considerably simplifies our deployment process.
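
A minimal sketch of what that looks like in a Node.js Lambda handler, written in TypeScript, with hypothetical variable names (STAGE, QUEUE_URL) set differently in each environment:

    // Read per-environment configuration from environment variables instead of a
    // bundled JSON config file. STAGE and QUEUE_URL are illustrative names.
    export const handler = async (event: unknown): Promise<string> => {
      const stage = process.env.STAGE ?? "dev";
      const queueUrl = process.env.QUEUE_URL;
      if (!queueUrl) {
        throw new Error("QUEUE_URL is not set for this environment");
      }
      console.log(`Handling event in ${stage} using ${queueUrl}`);
      return "ok";
    };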

Chris talked about the Secret Elixir Club, which has been rumoured to meet occasionally at lunchtime to discuss Elixir. Elixir is a dynamically typed functional language that runs on the Erlang VM. Erlang is a little dated as a language, but its environment is excellent – with massive scalability, great support for concurrency via its actor model, and with deployments that reputedly have ‘nine nines’ of uptime – i.e. availability of 99.9999999%. Elixir takes this foundation and adds a more modern language, more akin to languages like Ruby and Scala – supporting pattern matching, higher-order functions, and so on.

Reece talked about the new features in XQuery 3.1, which adds things like maps, arrays, string interpolation, and the arrow operator. Some XQuery 3.1 features are available in MarkLogic 9, which is due for release soon and which we’ve been doing some early work with. Unfortunately Simon was ill, and so wasn’t able to talk more about our MarkLogic 9 work.

Inigo talked about our coding standards document. Many companies have long and detailed coding standards documents that aren’t actually followed in practice, and that largely cover cosmetic features of code like brace style and indentation. Our coding standards document is the following:

  • Strive for immutability
  • Avoid side-effects when possible
  • Prefer package level and overview documentation over method or in-code documentation
  • Write unit tests where possible, and execute them via continuous integration
  • Use existing libraries and third-party code when reasonable
  • Prefer jQuery as a JavaScript library, except when something more complex is necessary. Value accessibility.
  • In user interfaces, prefer allowing forgiveness to asking permission (i.e. make operations reversible, rather than prompting the user to confirm them – see the sketch after this list)
  • For Java coding, read Effective Java, and use it (and apply to .NET and Scala where appropriate)
  • Use logging – effectively
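
As a small illustration of the “forgiveness over permission” point above (our own hypothetical example, not from the meeting): delete immediately and offer an undo, rather than interrupting the user with a confirmation dialog.

    type Item = { id: number; label: string };

    let items: Item[] = [{ id: 1, label: "Quarterly report" }];
    let lastDeleted: Item | undefined;

    // Delete straight away, but remember the item so the action is reversible.
    function deleteItem(id: number): void {
      lastDeleted = items.find(i => i.id === id);
      items = items.filter(i => i.id !== id);
      // ...show a transient "Deleted – Undo" notification instead of a confirm prompt
    }

    function undoDelete(): void {
      if (lastDeleted) {
        items = [...items, lastDeleted];
        lastDeleted = undefined;
      }
    }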

Nikolay effectively summarized these as “be competent and code well”.

Finally, Nikolay talked about Rider, JetBrains’ new IDE for C#. Everyone with experience of Java or Scala is keen on IntelliJ IDEA, which is by far the best IDE for those languages. Rider is the equivalent for C#, and is now in public Early Access. Several of us have used it and found it generally better than Visual Studio; its earlier tendency to crash regularly has largely been fixed.


Dev meeting – IntelliJ IDEA XQuery, and AWS Lambda

Reece talked about the work that he’s been doing on an improved XQuery plugin for IntelliJ IDEA. This was inspired by some annoyances he was encountering with the existing XQuery plugin – although it’s generally good, it doesn’t have complete support for MarkLogic’s brand of XQuery, and it doesn’t correctly parse all XQuery.

Reece’s implementation covers more areas of the XQuery language, has better error reporting for issues in your XQuery code, can check for file encoding issues, supports Find Usages, and has a number of other features.

Reece is uploading his XQuery plugin to the JetBrains repository this weekend, so it should soon be publicly available.

Joe and Chris talked about the work that we’ve been doing with AWS Lambda. AWS Lambda is what Joe hoped cloud computing would be all along. AWS EC2 is a server in the cloud, but not fundamentally different from having a server on your own network. AWS Lambda, by contrast, is effectively serverless – you upload a piece of code, and it runs without you worrying about the infrastructure at all. That said, the underlying infrastructure can still make a difference to you, in terms of the latency of invoking a lambda when it hasn’t recently been called.

We are using AWS Lambda as part of a filter for an Amazon SQS queue. We’re testing it using an SBT plugin – SBTMocha. The challenges with testing it are:

  • When uploading the lambda, you don’t need the AWS libraries; however, when you’re testing it locally, you do need the AWS library available. We dealt with this by separating the AWS-specific content from the testable content, and just testing the interesting part that doesn’t depend on AWS (see the sketch after this list).
  • We have a number of environments: dev, qa, production. The configuration for the function needs to vary between each of them. The standard way of dealing with this is to include a “conf” directory as part of the upload, from which the lambda reads its configuration. Our build process in Jenkins constructs an appropriate zip file for the environment.
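
A sketch of that separation, in TypeScript, with simplified, hypothetical message shapes: the filtering rule is a pure function with no AWS dependency, so it can be unit-tested locally, while a thin handler wires it to the incoming event.

    // Pure, testable part: no AWS imports, just the filtering rule (illustrative).
    export interface Message { body: string; }

    export function shouldForward(message: Message): boolean {
      return message.body.includes("interesting"); // hypothetical rule
    }

    // Thin AWS-facing wrapper, kept out of the local unit tests.
    export const handler = async (event: { Records: Message[] }) => {
      const kept = event.Records.filter(shouldForward);
      // ...pass the kept messages on via the AWS SDK (not shown)...
      return { forwarded: kept.length };
    };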

Using Lambda has been very useful for us – it has removed the need for a server.

Dev meeting – Terraform and CloudFormation

In our dev meeting this week, we discussed Terraform and CloudFormation.

Terraform is a tool for managing infrastructure. We’ve previously talked about Puppet, Ansible, and so on – it’s not the same as them, but is instead the step before those tools: it creates the cloud servers that can then be set up by Puppet or Ansible (although, for example, Ansible can be used to set up cloud servers too, and Terraform can be used to set up software like MySQL too).

Terraform isn’t cloud agnostic – it has providers that are specific to a platform. This means that you can take full advantage of those platforms – but means that it isn’t trivial to switch from one to another. We’ve only used it with AWS, so far.

You write “.tf” files to configure your infrastructure, in a declarative, JSON-esque format. You list things like the servers you want, the groups they’re in, and so on; Terraform will generally work out for itself the order in which these things need to be created. You can apply a prefix to the things it generates, to make it easier to run it repeatedly against the same account.

You install Terraform locally (typically by downloading a binary – it’s written in Go), then point it at the configuration files you have created (a module full of files). You need to make your AWS API key available to it via a Terraform-specific mechanism (e.g. putting it in a tfvars file). Then you proclaim “Behold! I am the Terraformer!” (at least, Chris does), and execute Terraform. It writes out a state file describing what has been created – and this should be checked into Git. When you next run Terraform, it will check your state file and make appropriate changes to your system based on it. It checks the local state against the cloud state, so if you’ve made manual changes to the cloud servers, it will revert them.

Having created the servers, Terraform will run a “provisioner” to do the initial setup the first time. This should then hand over to another configuration system. Out of the box, it only supports Chef – and running scripts. In our case, we’ve needed to install Puppet on the server via a script, and then invoke that newly added Puppet to configure itself. The provisioning step should be as minimal as possible – Puppet, not the provisioner, should do all the subsequent configuration. Once Puppet is up and running, it will poll the Puppet master to keep itself up to date and configured.

We’ve generally had a good experience with Terraform – all of our pain has come from Puppet, not from Terraform.

CloudFormation is similar to Terraform, but is AWS-specific. It has a pretty GUI that shows all the components that will be set up. We have been using it for setting up MarkLogic clusters – it’s easiest to start with the standard MarkLogic cluster CloudFormation template, and edit that. It supports some limited conditional logic in the templates – such as creating different things based on region. You can open a template with the live state of a current stack – which provides an easy way of altering the configuration of that stack. There are example CloudFormation templates on the AWS marketplace. The MarkLogic template that we’re using creates servers using MarkLogic AMIs.

We all agreed that using either of these tools was better than managing things via the console – and of the two, we broadly preferred Terraform. We are likely to use Terraform more in future. However, for simple things, we are likely to use Ansible in preference to either, because it can create AWS servers as well as configure them, all within one tool.