In praise of functional programming

At 67 Bricks, we are big proponents of functional programming. We believe that projects which use it are easier to write, understand and maintain. Scala is one of the most common languages we make use of and we see more object oriented languages like C# are often written with a functional perspective.

This isn’t to say that functional languages are inherently better than any other paradigms. Like any language, it’s perfectly possible to use it poorly and produce something unmaintainable and unreadable.

Immutable First

At first glance immutable programming would be a challenging limitation. Not being able to update the value of a variable is restrictive but that very restriction is what makes functional code easier to understand. Personally I found this to be one of the hardest concepts to wrap my head around when I first started functional programming. How can I write code if I can’t change the value of variables?

In practice, reading code was made much easier as once a value was assigned, it never changed, no matter how long the function was or what was done with the value. No more trying to keep track in my head how a long method changes a specific variable and all the ways it may not thanks to various control flows.

For example, when we pass a list into a method, we know that the reference to the list won’t be able to change, but the values within the list could. We would hope that the function was named something sensible, perhaps with some documentation that makes it clear what it will do to our list but we can never be too sure. The only way to know exactly what is happening is to dive into that method and check for ourselves which then adds to the cognitive load of understanding the code. With a more functional programming language, we know that our list cannot be changed because it is an immutable data structure with an immutable reference to it.

High Level Constructs

Functional code can often be more readable than object oriented code thanks to the various higher level functions and constructs like pattern matching. Naturally readability of code is very dependent on the programmer; it’s easy to make unreadable code in any language, but in the right hands these higher level constructs make the intended logic easier to understand and change. For example, here are 2 examples of code using regular expressions in Scala and C#:

def getFriendlyTime(string time): String = {
  val timestampRegex = "([0-9]{2}):([0-9]{2}):([0-9]{2}).([0-9]{3})".r
  time match {
    case timestampRegex(hour, minutes, _, _) => s"It's $minutes minutes after $hour"
    case _ => "We don't know"
  }
}
public static String GetFriendlyTime(String time) {
  var timestampRegex = new Regex("([0-9]{2}):([0-9]{2}):([0-9]{2}).([0-9]{3})");
  var match = timestampRegex.Match(time);
  if (match == null) {
    return "We don't know";
  } else {
    var hours = match.Groups[2].Value;
    var minutes = match.Groups[1].Value;
    return $"It's {minutes} minutes after {hour}";
  }
}

I would argue that the pattern matching of Scala really helps to create clear, concise code. As time goes on features from functional languages like pattern matching keep appearing in less functional ones like C#. A more Java based example would be the streams library, inspired by approaches found in functional programming.

This isn’t always the case, there are some situations where say a simple for loop is easier to understand than a complex fold operation. Luckily, Scala is a hybrid language, providing access to object oriented and procedural styles of programming that can be used when the situation calls for it. This flexibility helps programmers pick the style that best suits the problem at hand to make understandable and maintainable codebases.

Pure Functions and Composition

Pure functions are easier to test than impure functions. If it’s simply a case of calling a method with some arguments and always getting back the same answer, tests are going to be reliable and repeatable. If a method is part of a class which maintains lots of complex internal state, then testing is going to be unavoidably more complex and fragile. Worse is when the class requires lots of dependencies to be passed into it, these may need to be mocked or stubbed which can then lead to even more fragile tests.

Scala and other functional languages encourage developers to write small pure functions and then compose them together in larger more complex functions. This helps to minimise the amount of internal state we make use of and makes automated testing that much easier. With side effect causing code pushed to the edges (such as database access, client code to call other services etc.), it’s much easier to change how they are implemented (say if we wanted to change the type of database or change how a service call is made) making evolution of the code easier.

For example, take some possible Play controller code:

def updateUsername(id: int, name: string) = action {
  val someUser = userRepo.get(id)
  someUser match {
    case Some(user) =>
      user.updateName(name) match {
        case Some(updatedUser) =>     
         userRepo.save(updatedUser)
         bus.emit(updatedUser.events)
         Ok()
        None => BadRequest()
    case None => NotFound()
}

def updateEmail(id: int, email: string) = action {
  val someUser = userRepo.get(id)
  someUser match {
    case Some(user) =>
      user.updateEmail(email) match {
        case Some(updatedUser) =>
          userRepo.save(updatedUser)
          bus.emit(updatedUser.events)
          Ok()
        case None => BadRequest()
    case None => NotFound()
}

Using composition, the common elements can be extracted and we can make a more dense method by removing duplication.

def updateUsername(id: int, name: string) = action {
  updateEntityAndSave(userRepo)(id)(_.updateName(name))
}

def updateEmail(id: int, email: string) = action {
  updateEntityAndSave(userRepo)(id)(_.updateEmail(email))
  }
}

private def updateEntityAndSave(repo: Repo[T])(id: int)(f: T => Option[T]): Result = {
  repo.get(id) match {
    case Some(entity) =>
      f(entity) match {
        case Some(updatedEntity) =>
         repo.save(updatedEntity)
         bus.emit(updatedEntity.events)
         Ok()
        None => BadRequest()
    case None => NotFound()
}

None Not Null

Dubbed the billion-dollar mistake by Tony Hoare, nulls can be a source of great nuisance for programmers. Null Pointer Exceptions are all too often encountered and confusion reigns over whether null is a valid value or not. In some situations a null value may never happen and need not be worried about while in others it’s a perfectly valid value that has to be considered. In many OOP languages, there is no way of knowing if an object we get back from a method can be null or not without either jumping into said method or relying on documentation that often drifts or does not exist.

Functional languages avoid these problems simply by not having null (often Unit is used to represent a method that doesn’t return anything). In situations where we may want it, Scala provides the helpful Option type. This makes it explicit to the reader that you call this method, you will get back Some value or None. Even C# is now introducing features like Nullable Reference Types to help reduce the possible harm of null.

Scala also goes a step further, providing an Either construct which can contain a success (Right) of failure (Left) value. These can be chained together to using a railway styles of programming. This chaining approach can lead us to have an easily readable description of what’s happening and push all our error handling to the end, rather than sprinkle it among the code.

Using Option can also improve readability when interacting with Java code. For example, a method returning null can be pumped through an Option and combined with a match statement to lead to a neater result.

Option(someNullReturningMethod()) match {
  case Some(result) =>
    // do something
  case None =>
    // method returned null. Bad method!
}

Conclusions

There are many reasons to prefer functional approaches to programming. Even in less functionally orientated languages I have found myself reaching for functional constructs to arrange my code. I think it’s something all software developers should try learning and applying in their code.

Naturally, not every developer enjoys FP, it is not a silver bullet that will overcome all our challenges. Even within 67 Bricks we have differing opinions on which parts of functional programming are useful and which are not (scala implicits can be quite divisive). It’s ultimately just another tool in our expansive toolkit that helps us to craft correct, readable, flexible and functioning software.

Introduction to 67 Bricks immutable deployments

Since the rise of cloud computing, the way of creating and maintaining infrastructure has changed. While some years ago an unhealthy machine needed to be logged in and fixed manually, or even worse, turned off and a new one had to be configured from scratch by a systems administrator, thanks to modern technologies it has become possible to not worry about whether a resource is configured in exactly the same way.

This post summarizes the main tools used at 67 Bricks for deployment of the services that we host and manage, and describes our deployment process.

A dictionary definition of immutability is “the state of not changing, or being unable to be changed”. And this is what deploying immutably is about – after being released, infrastructure does not change in place, and in order to make changes, a new version is provisioned and the old version is destroyed.

Some of the benefits of immutability include the following:

  1. Consistency of environments – the same processes and procedures are applied to create environments for staging, testing and live, thereby making it easier to test.
  2. An immutable deployment pipeline means that what happens during deployments is documented by using configuration as code – this enables developers to understand the services better and know exactly what happens at each stage.
  3. Having configuration stored as code in a repository facilitates spinning up new services thus improving speed of development.
  4. The ability to reproduce resources aids immensely in automated disaster recovery where it might only take a few minutes to replace an unhealthy resource with a new one.
  5. Immutable deployments allow dynamic scaling of resource in the cloud based on demand.

Tool Used

  • We use Gitlab as our internal code repository, and CI/CD system.

At 67 Bricks we use a variety of modern tools and technologies to deploy our services.

  • Packer is used to create a server image with the software baked onto it.
  • AWS is typically where we deploy applications.
  • Ansible is used to configure servers, called by Packer during the image build phase.
  • Terraform is used to provision infrastructure.

Deployment process

Firstly, an environment or multiple environments are created in an AWS account using Terraform, and resources such as VPC (virtual private cloud) are launched.

Secondly, a base image is built daily off one of AWS AMIs (Amazon Machine Images). By means of Ansible, latest packages are fetched and tools such as AWS CLI and snap updates are installed, as well as automatic security updates are configured. This image is available to developers to use for their application. Since a lot of our applications run on Linux, we use Ubuntu to build the base image

When code is merged into the main/master branch in Gitlab, this starts a pipeline during which the application is built, tests are run and an AMI with the application installed is created via Packer, with Ansible tasks configuring the machine for the use by a specific application. This image is tagged so that we know which application is on it.

After that, the pipeline deploys the application to test and then to live. We use autoscaling for our applications (specifically, AWS AutoScaling Groups), and to use the newly built image with the new version of the application, a script is used to destroy existing application servers and create new ones with the updated version of the application installed. Autoscaling Group configuration is also amended to use the new AMI for any autoscaling events.

When servers (or EC2 instances, in AWS speak) are launched in an autoscaling group, a user_init script gets run on initialization. This is the stage at which any environment specific setup can be done.

Immutability is of paramount importance if you want to create a simpler, more predictable deployment pipeline whose outcome is trusted. Our teams have adopted the immutable deployment approach, and it helps us to ensure repeatable processes and reliable services and applications are in place.

Case Genie or How to Find Unknown Unknowns

It was Paul Magrath, Head of Product Development and Online Content at ICLR, who first used the late Donald Rumsfeld’s phrase to describe the business case for what later became known as Case Genie.

The idea was simple. Lawyers needed to discover historical cases that might impact a case they were preparing. Frequently-cited cases would already be known to them: they’d be at the tips of their fingers, ready to type into their skeleton argument; cases known to every barrister specialising in the field they were arguing; cases that had been cited many times both by other cases and in text books; or cases that had recently changed how the law should be interpreted, and were therefore big news within a narrowly-focused legal community.

But there may be some cases that are relatively unknown that might make all the difference. Enter Case Genie.

This blog post presents a technical overview of Case Genie. How, exactly, is it possible to find “unknown unknowns”?

Word embeddings

Document embeddings are considered the state-of-the-art way of finding similarities between documents. But before document embeddings come word embeddings. A good way to start to think about word embeddings, is to think about words in a document. Let’s take Milton’s Areopagitica as an example. This single work is our corpus (the body of text we’re interested in). Let’s take a single sentence from Milton’s Areopagitica:

Many a man lives a burden to the Earth; but a good Book is the precious life-blood of a master spirit, embalmed and treasured up on purpose to a life beyond life.

Now, if we take each distinct word, we can give it a score based on how often it occurs in the text:

  many  — 1
  a     — 5
  man   — 1
  lives — 1
  …etc…

Now imagine graphing three of these words. I’m choosing three because this is easy to visualise, but rather than take the first three words in the sentence, let’s take the following:

  a         — 5
  life      — 3
  treasured — 1

Imagine these graphed across the x, y and z axes: ‘a’ has the value 5 on the x axis; ‘life’ has the value 3 on the y axis; and ‘treasured’ has the value 1 on the z axis. Further, imagine that for each value, there is an arrow from the origin to the point along each axis, so that we have 3 arrows of different lengths pointing in different directions. Now, in mathematics a value with a direction is called a vector, so we can say that each distinct word within a corpus can be represented as a vector; and within a document, a vector’s value is the number of occurrences of the word within that document.

Three dimensions aren’t too hard to visualise. Extrapolating, we can add further axes, which is much harder to visualise; as many axes as there are distinct words. Each distinct word in the corpus will therefore be represented by a vector.

Of course, that’s just one sentence and there are many in Areopagitica. The full vector space is defined by the number of unique words in the corpus, let’s suppose there are 2,000 in Areopagitica. Initially this will sound confusing, but the word embedding for a given word is made up of a value for every vector in the vector space, which is the same as saying a value for every word. For each distinct word, its word embedding will therefore comprise 1,999 zero values and one non-zero value — 1. If we order the distinct words alphabetically, we can represent an embedding as an implicitly ordered array of numbers. Therefore, for each word embedding, the non-zero value will be in a different place; ‘a’ would be:

  1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 …

whereas ‘and’ would be:

  0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 …

This is crazily sparse; the cleverness comes when using AI to use all that empty space more wisely.

Algorithms like word2vec and fastText (there are other algorithms, but these are the two we looked at for ICLR) provide a more compact representation of word embeddings. Whereas with Areopagitica, using the algorithm I described above, we would need 2,000 values (dimensions) for each word, word2vec and fastText can use a few hundred dimensions. You can download a 300-dimensional model from the fastText website trained on 2 million words from Wikipedia. Word embeddings actually use floating point numbers, so the word ‘a’ in the fastText model we trained for ICLR starts:

  -0.20005 -0.019533 -0.15494 -0.11114 0.074793 0.1194 -0.046101 …

The model for ICLR has 600 dimensions, so there are 600 numbers for each word embedding.

Exactly how word2vec and fastText create these word vectors is outside the scope of this blog post. Essentially, you start by training a model from the corpus, which you feed to the tool. The corpus and the training algorithms define relationships between words or character combinations. The corpus used to train the model therefore determines how the words’ relationships are actually represented in the model. If you train with a French corpus, you will get good results when calculating document embeddings for French text; but you would get bad results if you presented English text. In the same vein, if you train a model using legal documents, you will get a model that is more sensitive to legal meanings and definitions.

So, if we were to use Areopagitica as our corpus, we would have a model trained to recognise only uses of the words as used in Areopagitica. We could calculate sentence embeddings (which we could treat like document embeddings) for every sentence in Areopagitica to determine which sentences were most similar. However, the results would likely be quite poor. The reason for this is that the corpus is too small. Ideally, you need a lot of words, the more the better. For ICLR, we used all of the judgment transcripts and Case Reports contained in ICLR.online as our corpus. This represents around 200,000 documents; about 600,000,000 words.

But what are document embeddings, and how do you create them?

Document embeddings

Once you have word embeddings, you will want to create document embeddings. To create a document embedding, take all the word embeddings of the document and use cosine similarity to combine the dimensions into a single representation with the same number of dimensions.

Cosine similarity combines two vectors by calculating the cosine of the angle between them and multiplying by the length of the side not participating in the cosine calculation.

Cosine similarity is a nice way to explain it visually, but there is an algebraic isomorphism using dot products of normalised vectors.

This algorithm combines word embeddings into document embeddings:

  1. Create an empty embedding (where all values are 0) called A. This is our aggregate.
  2. For each word embedding, w:
    1. normalise w;
    2. for each value in A, add the corresponding value in w (add the first number in A to the first number in w, the second number in A to the second number in w, and so on).
  3. Divide each value in A by the number of words used to calculate it.

To normalise a word embedding, take the square root of the sum of each value’s square (let’s call it n); then divide each value in the embedding by n.

This is the implementation used by fastText. However we have tweaked the algorithm to make use of tf–idf. tf–idf weights words that are considered important. A word is considered important if its frequency within the corpus is low compared to its frequency within a document. Since words like ‘a’ occur throughout the corpus, it is weighted low; whereas a word like ‘theft’ will be weighted higher, because it occurs only in a subset of documents within the corpus.

Calculating the distance between document embeddings

The final puzzle piece is to quantify how similar documents are based on their document embeddings. The distance between two documents is used as an indication of how similar they are. Documents that are close together are more similar than those that are further away.

To calculate the distance between two document embeddings, Cosine Similarity can be used again. This time, the vectors of the document embeddings are collapsed to a single value between 0 and 1. Identical documents have the value 1, completely orthogonal documents have the value 0.

We use a library called Faiss, created by Facebook Research, to store and query document embeddings. This is very fast, and you can query multiple input embeddings simultaneously.

And finally…

This post describes some of the major components used in Case Genie and delves a little into the concepts and some of the algorithms; but there is a lot more that I don’t have time to cover: specifically, how do we prepare the text of the corpus for training the model? I will cover that in another post.

Sharing Failures Dev Forum: The most “interesting” bugs you’ve caused

Ely (n.)

The first, tiniest inkling you get that something, somewhere, has gone terribly wrong.

The Deeper Meaning of Liff – Douglas Adams & John Lloyd (Thanks to Stephen for pointing this quote out)

At 67 Bricks we value open and honest communication, and aim to treat each other with respect and humility. Normalising failure helps everyone understand that mistakes happen and shouldn’t be used as a mechanism to punish but instead are an opportunity to learn and become more resilient to future failures. This week’s Dev Forum was on the topic of sharing failures, specifically talking about the most “interesting” bugs you have caused.

64 Bit vs 32 Bit Ubuntu

Our first fail came from a developer working on converting data from base 64 as part of a side project. Developing on the 64 bit version of Ubuntu, the code was stable and the tests were green and so they continued to build the application out. The application was intended to work on a number of different environments including 32 bit systems and this is where the first problems emerged.

The 64 bit OS zeroed out data after it was read and the code was designed to work with this, the 32 bit OS didn’t and instead exhibited random test failures. Sometimes expected values would match up and other times it would be wrong by a character or two, typically at the end of the strings. Finding the root cause required a trial and error approach to eliminate possibilities and eventually get a grasp on the problem and come up with a solution.

Transactions and Logging

Working on a client application this developer made two perfectly reasonable design decisions:

  1. Use transactions with the SQL database to ensure integrity of data changes
  2. Write logs to a table to ensure they are persisted in a convenient location

This approach was perfectly fine while the system worked, but the mystery began when the client started to complain about errors with the system. The logs didn’t show any error messages that lined up with what the client was reporting. Why? The same transaction was used to store logs and to update the data in SQL. If an exception was thrown, the transaction was rolled back preventing any data corruption problems, but also rolled back all the logs!

The application was changed to use a different transaction for logging to ensure logs were persisted. Using these logs meant the root cause of the client problem could be resolved and a lesson learnt about unintended consequences.

Overlogging

Another mystery around missing logs. A web application used a separate log shipping application to take logs from the server and send them to a remote server. However, under heavy loads the logs would become spotty and clear gaps appeared. The reason for this was due to the sheer volume of logs the shipper had to deal with eventually becoming too great and causing it to crash.

The solution was to reduce the number of logs written so the shipper would continue to function even when the main application was under heavy load. This triggered an interesting conversation at the dev forum on the ideas of how many logs should be written. Should you write everything including debug level logs to help with debugging faulty systems? Or should you write no logs whatsoever and only enable them when something starts going wrong?

Naturally we seemed to settle somewhere in the middle, but there were disagreements. Possibly a future dev forum topic?

The Wrong Emails

A client needed to receive important data daily via fax and the application did this by sending an email from a VB app to an address which would convert the body to a fax and send it on. The application needed testing and so had a separate test environment that was presumed to be safe to use for running tests.

It turned out that the test system was using the same email address as the live system, meaning the client was receiving both test and live data. Luckily the client caught on and asked why they were receiving two emails a day and the test system was quickly updated.

From then on this developer now always uses some form of override mechanism to prevent sending actual emails to end users. Others use apps like smtp4dev which will never send on emails (and just capture them so they can be inspected) or services like SES which can be configured to run in a sandbox mode.

Hidden Database Costs

AWS can be very costly, especially if no one is watching the monthly spend. In this case, one database drove up costs to astronomical new highs due to a single lambda running regularly and reading the entire database. The lambda was supposed to delete old data, but had a bug which meant it never deleted anything. Several months of operating the system meant a lot of data had piled up and so did the costs. The fix was simple enough to apply and we added additional monitoring to catch cases of runaway costs.

Similar experiences have been had before with runaway costs on clustered databases like MarkLogic. Normally a well-built index will be able to efficiently query the right data on the right node, but if the index is missing, MarkLogic will pull all the data across from the other node to evaluate it. This can drive up some eye-wateringly high data transfer prices in AWS and as such we now always monitor data transfer costs to ensure we haven’t missed an index.

Caching Fun

Our next issue is about users appearing to be logged in as other users. The system in question used CloudFront CDN to reduce server load and response times. The CDN differentiated between authenticated and unauthenticated users so different pages could be served for users who were authenticated.

The system made use of various headers set by lambdas to differentiate between authenticated and unauthenticated users. The problem arose when session handling was changed and the identifier used was accidentally stored within the CDN. This caused an issue where a user could load a page with a set-cookie header that set the identifier used for a different user’s session.

The team solved this bug by tweaking the edge lambdas to ensure only non-user specific data was cached. Caching in authenticated contexts can be challenging and need to be very carefully considered how they will be used reliably.

Deactivate / Activate

In this bug the business asked for a feature where user accounts could have a deactivated time set in the future that when reached, the user would be considered inactive and unable to access the system. This feature was implemented with a computed field in the SQL server which could be used to determine if the user is active or not.

As the system was already in use, migration scripts were developed to update the database. These needed to be applied in a certain order, however, deployment practices for this system meant that someone else applied the scripts and ended up causing an error where all users ended up deactivated preventing any users from accessing the system. To restore service, the database was rolled back and ultimately the feature was abandoned as too risky by the business.

Some viewed this as an example of why services should be responsible for their own database state and handle database migrations automatically.

Creating Bugs for Google to Index

Magic links can be a very useful feature for making authenticated access to a website easy, as long as these urls remain private to the correct user. In one case the url got cached by Google including the authentication token meaning anyone who could find the link would be able to access the authenticated content! This was fixed by asking Google to remove the URL, invalidating the token in the database and ensuring metadata was added to appropriate pages to prevent bots from indexing pages.

Another Google based bug next; after building a new system, the old one needed to be retired and part of this involved setting up permanent redirects to the new system. However, Google continues to serve up the old site’s urls as opposed to the new system’s urls, and a fix is still being worked out. A lesson learned on how important it is to carefully consider how search engines will crawl and store web sites.

As we see, failures can come from any number of sources, including ourselves. Bugs are a perfectly normal occurrence and working through them is an unavoidable part of building a robust system. By not fearing them, we can become more adept at fixing them and build better systems as a result.

Session musician programmers

As a software consultancy, we are always in the business of trying to recruit good developers. One of the more annoying phrases that has cropped up in the industry lexicon of late is “rockstar programmer”, as both an ideal to which developers are assumed to aspire, and a glib description of the kind of programmer that software publishers are assumed to be desperate to employ.

In my very humble opinion, the software industry has no need for rockstar programmers (or rockstar programmers, as it happens). If we are going to use a musical analogy, a better kind of programmer could be termed a “session musician programmer”. Session musicians are the (usually) fairly anonymous musicians who are relied upon for actually getting records recorded. Attributes of a good session musician would include at least some of the following.

  • Courtesy and professionalism towards colleagues and clients.
  • Playing, or being willing to learn, a number of different instruments.
  • Playing in a range of styles depending on what is required.
  • Being able to read music, in order to play someone else’s arrangement.
  • Ability to improvise, if the arrangement just calls for “a solo”.
  • Ability to work with other musicians to come up with a workable arrangement for a piece if there isn’t one.
  • Some understanding of music production, and an understanding of how what is played will translate to what is eventually heard on the recording.

Translated roughly into development terms, with a lot of artistic license, these might be equivalent to the following desirable attributes in a developer:

  • Courtesy and professionalism towards colleagues and clients.
  • Having some knowledge of a number of different languages.
  • Being able to fit one’s code into the “style” of the project being worked on.
  • Willingness to follow direction from a technical lead and implement a precisely written specification, where necessary.
  • Ability to understand and fill gaps in the technical description of a project.
  • Ability to work with other developers to come up with a plan for implementing something.
  • Some understanding of the tools used for building, testing and deploying code.

I am not sure that the concept of the “rockstar programmer” embodies these things. Rockstars exist to do one thing well, generally singing or playing guitar, and that often seems to come with a whole lot of tiresome baggage in the form of massive ego, tantrums, and demands for excessive amounts of money (yes, there are exceptions). It so happens that plenty of session musicians can sing, plenty of them can play guitar, and they are capable of generating far more in the way of recorded music in a year than most rockstars would manage in their careers.

My advice, therefore, is that aspiring software developers looking to the world of music for role models should aim to be part of The Wrecking Crew rather than Guns N’ Roses.