How we do centralized logging at 67 Bricks

If you’ve had a look around 67 Bricks website, you probably know that we work with quite a few clients. For most of the clients we host their infrastructure, which makes it easier for us to manage it and troubleshoot any issues when they occur. Each client’s infrastructure resides in its own AWS account, which is a part of AWS Organizations. We also have a logging AWS account which is used for infrastructure resources used by client accounts. In this shared account we have set up an ELK stack to collect logs from multiple clients in one place. In this post I will explain how it is set up.

What is ELK?

ELK stands for ElasticSearch, Logstash and Kibana.

A note: in this post I’m going to mention Amazon OpenSearch Service. In the past was called Amazon ElasticSearch Service. Amazon OpenSearch Service uses a fork of older version of ElasticSearch and Kibana. The name ELK, however, seems to have stuck even if OpenSearch is used instead of ElasticSearch (and ELK sounds nicer than OLK, in my opinion).

What is the infrastructure like?

The main elements are AWS Managed OpenSearch instance and an EC2 reverse proxy, which directs requests to OpenSearch. In terms of networking we have VPC peering connections between the VPC of the logging account, where OpenSearch instance resides, and the client account VPCs.

To clarify the above diagram:

Applications send log entries via peering connections. In order for them to be able to do that, the following is required:

1) The security group attached to the servers or containers running the applications must have a rule that allows traffic on port 443 from the CIDR block of the logging account VPC

2) The route table of the VPC in the client accounts must have a route with the logging account VPC CIDR as destination and peering connection as target

3) The security group attached to the OpenSearch instance must allow traffic on port 443 from CIDR ranges of client account VPCs

4) The route table of the VPC in the logging account must have routes with the client account VPC CIDRs are destination and peering connection as target

How do the applications send logs to the ELK instance?

Most applications that send logs are Scala Play applications – they use Logback framework for logging, and the logback.xml file with configuration. We have an appender section for ELK logs – we add it to all applications that send their logs to ELK, thereby ensuring that log entries are the same regardless of the system and have the same fields:

 <appender name="ELK" class="com.internetitem.logback.elasticsearch.ElasticsearchAppender">
    <url>${elkEndpoint}/_bulk</url>
    <!-- This nested %replace expression takes the first letter of the level and maps D and T
    (for DEBUG and TRACE) to d and maps other levels to i -->
    <index>someClient-${environment}-someApp-logs-%replace(%replace(%.-1level){'[DT]', 'd'}){'[A-Z]', 'i'}-%date{yyyy-MM-dd}</index>
    <type>log</type>
    <loggerName>es-logger</loggerName>
    <errorsToStderr>false</errorsToStderr>
    <includeMdc>true</includeMdc>
    <maxMessageSize>4096</maxMessageSize>
    <properties>
      <property>
        <name>client</name>
        <value>iclr</value>
      </property>
      <property>
        <name>service</name>
        <value>ingestion</value>
      </property>
      <property>
        <name>host</name>
        <value>${HOSTNAME}</value>
        <allowEmpty>false</allowEmpty>
      </property>
      <property>
        <name>severity</name>
        <value>%level</value>
      </property>
      ...
  </appender>

Here, environment, elkEndpoint and HOSTNAME are environment variables. environment and elkEndpoint are injected in the EC2 launch template and are populated when the instances are being started.

YOu can also see the <index> element. This will create a new index every day. Before we start sending application logs to ELK, we create an index pattern for each application. An index pattern allows you to select data and can include one or more indices. For example, if we have an index pattern someClient-live-someApp-logs-d-*, it would include indices someClient-live-someApp-logs-d-2023-02-27 , someClient-live-someApp-logs-d-2023-03-13, someClient-live-someApp-logs-d-2023-03-14 and so on.

How do you know if there is a problem with logs?

We have monitors set up in Kibana which check that there are logs coming in. This check is run every 10 minutes on each index pattern, and if it doesn’t find any log entries, it sends an alert to an SNS topic which in turn sends an email to inform us that there is a problem. The configuration of alarms and monitors can be done from the OpenSearch Plugins menu of Kibana.

At the moment these monitors are created manually for each index pattern, which is not ideal because it does take a bit of time setting them up; therefore one of the tasks on our to-do list is to automate monitor creation.

How do you make sure that the OpenSearch instance always has enough space?

When each new index pattern is created, we apply a lifecycle policy to it. For example, we delete info logs after a week; when the index agent is 7 days old, it starts to transition into the Delete state. We also have a Cloudwatch alarm which monitors FreeStorageSpace metric in the AWS/ES namespace.

How do YOU centralize logs from multiple systems on AWS? 🙂

Setting up local AWS environment using Localstack

When Cloud services are used in an application, it might be tricky to mock them during local development. Some approaches include: 1) doing nothing thus letting your application fail when it makes a call to a Cloud service; 2) creating sets of fake data to return from calls to AWS S3, for example; 3) using an account in the Cloud for development purposes. A nice in-between solution is using Localstack, a Cloud service emulator. Whereas the number of services available and the functionality might be a bit limited compared to the real AWS environment, it works rather well for our team.

This article will describe how to set it up for local development in Docker.

Docker-compose setup:

In the services section of our docker-compose.yml we have Localstack container definition:

localstack:
    image: localstack/localstack:latest
    hostname: localstack
    environment:
      - SERVICES=s3,sqs
      - HOSTNAME_EXTERNAL=localstack
      - DATA_DIR=/tmp/localstack/data
      - DEBUG=1
      - AWS_ACCESS_KEY_ID=test
      - AWS_SECRET_ACCESS_KEY=test
      - AWS_DEFAULT_REGION=eu-central-1
    ports:
      - "4566:4566"
    volumes:
      - localstack-data:/tmp/localstack:rw
      - ./create_localstack_resources.sh:/docker-entrypoint-initaws.d/create_localstack_resources.sh

Although we don’t need to connect to any AWS account, we do need dummy AWS variables (with any value). We specify which services we want to run using Localstack – in this case it’s SQS and S3.

We also need to set HOSTNAME_EXTERNAL because SQS API needs the container to be aware of the hostname that it can be accessed on.

Another point is that that we cannot use the entrypoint definition because Localstack has a directory docker-entrypoint-initaws.d from where shell scripts are run when the container starts up. That’s why we’re mapping the container volume to a folder wherer those scripts are. In our case create_localstack_resources.sh will create all the necessary S3 buckets and the SQS queue:

EXPECTED_BUCKETS=("bucket1" "bucket2" "bucket3")
EXISTING_BUCKETS=$(aws --endpoint-url=http://localhost:4566 s3 ls --output text)

echo "creating buckets"
for BUCKET in "${EXPECTED_BUCKETS[@]}"
do
  echo $BUCKET
  if [[ $EXISTING_BUCKETS != *"$BUCKET"* ]]; then
    aws --endpoint-url=http://localhost:4566 s3 mb s3://$BUCKET
  fi
done

echo "creating queue"
if [[ $EXISTING_QUEUE != *"$EXPECTED_QUEUE"* ]]; then
    aws --endpoint-url=http://localhost:4566 sqs create-queue --queue-name my-queue\
    --attributes '{
      "RedrivePolicy": "{\"deadLetterTargetArn\":\"arn:aws:sqs:eu-central-1:000000000000:my-dead-letter-queue\",\"maxReceiveCount\":\"3\"}",
      "VisibilityTimeout": "120"
    }'
fi

Note that AWS CLI command syntax is different to the real AWS CLI (otherwise you’d create resources in the account for which you have the credentials set up!), and includes Localstack endoint flag: –endpoint-url=http://localhost:4566

Configuration files

We use Scala with Play framework for this particular application, and therefore have .conf files. In local.conf file we have the following:

aws { localstack.endpoint="http://localstack:4566" region = "eu-central-1" s3.bucket1 = "bucket1" s3.bucket2 = "bucket2" sqs.my_queue = "my-queue" sqs.queue_enabled = true }

The real application.conf file has resource names injected at the instance startup. They live in an autoscaling group launch template where they are created by Terraform (out of scope of this post).

Initializing SQS client based on the environment

The example here is for creating an SQS client. Below are snippets most relevant to the topic.

In order to initialize the SQS Service so that it can be injected into other services we can do this:

lazy val awsSqsService: QueueService = createsSqsServiceFromConfig()

In createsSqsServiceFromConfig we check if the configuration has a Localstack endpoint and if so, we build LocalStack client:

protected def createsSqsServiceFromConfig(): QueueService = { readSqsClientConfig().map { config => val sqsClient: SqsClient = config.localstackEndpoint match { case Some(endpoint) => new LocalStackSqsClient(endpoint, config.region) case None => new AwsSqsClient(config.region) } new SqsQueueService(config.queueName, sqsClient) }.getOrElse(fakeAwsSqsService) }

readSqsClientConfig is used to get configuration values from .conf files:

private def readSqsClientConfig = {
val sqsName = config.get[String]("aws.sqs.my_queue")
val sqsRegion = config.get[String]("aws.region")
val localStackEndpoint = config.getOptional[String]("aws.localstack.endpoint")
SqsClientConfig(sqsName, sqsRegion, localStackEndpoint)
}

Finally LocalStackSqsClient initialization looks like this:

class LocalStackSqsClient(endpoint: String, region:String) extends SqsClient with Logging {
private val sqsEndpoint = new EndpointConfiguration(endpoint, region)
private val awsCreds = new BasicAWSCredentials("test", "test")
private lazy val sqsClientBuilder = AmazonSQSClientBuilder.standard()
.withEndpointConfiguration(sqsEndpoint)
.withCredentials(new AWSStaticCredentialsProvider(awsCreds))
private lazy val client = sqsClientBuilder.build()

override def BuildClient(): AmazonSQS = { log.debug("Initializing LocalStack SQS service") client } }

Real AWS Client for the test/live environment (a snippet):

    AmazonSQSClientBuilder.standard()
      .withCredentials(new DefaultAWSCredentialsProviderChain)
      .withRegion(region)

Notice that we need fake BasicAWSCredentials that allows us to pass in dummy AWS access key and secret key and then we use AWSStaticCredentialsProvider, an implementation of AWSCredentialsProvider that just wraps static AWSCredentials. When real AWS environment is used, instead of AWSStaticCredentialsProvider we use DefaultAWSCredentialsProviderChain, which picks the EC2 Instance Role if it’s unable to find credentials by any other methods.

And that’s it. Happy coding!

Introduction to 67 Bricks immutable deployments

Since the rise of cloud computing, the way of creating and maintaining infrastructure has changed. While some years ago an unhealthy machine needed to be logged in and fixed manually, or even worse, turned off and a new one had to be configured from scratch by a systems administrator, thanks to modern technologies it has become possible to not worry about whether a resource is configured in exactly the same way.

This post summarizes the main tools used at 67 Bricks for deployment of the services that we host and manage, and describes our deployment process.

A dictionary definition of immutability is “the state of not changing, or being unable to be changed”. And this is what deploying immutably is about – after being released, infrastructure does not change in place, and in order to make changes, a new version is provisioned and the old version is destroyed.

Some of the benefits of immutability include the following:

  1. Consistency of environments – the same processes and procedures are applied to create environments for staging, testing and live, thereby making it easier to test.
  2. An immutable deployment pipeline means that what happens during deployments is documented by using configuration as code – this enables developers to understand the services better and know exactly what happens at each stage.
  3. Having configuration stored as code in a repository facilitates spinning up new services thus improving speed of development.
  4. The ability to reproduce resources aids immensely in automated disaster recovery where it might only take a few minutes to replace an unhealthy resource with a new one.
  5. Immutable deployments allow dynamic scaling of resource in the cloud based on demand.

Tool Used

  • We use Gitlab as our internal code repository, and CI/CD system.

At 67 Bricks we use a variety of modern tools and technologies to deploy our services.

  • Packer is used to create a server image with the software baked onto it.
  • AWS is typically where we deploy applications.
  • Ansible is used to configure servers, called by Packer during the image build phase.
  • Terraform is used to provision infrastructure.

Deployment process

Firstly, an environment or multiple environments are created in an AWS account using Terraform, and resources such as VPC (virtual private cloud) are launched.

Secondly, a base image is built daily off one of AWS AMIs (Amazon Machine Images). By means of Ansible, latest packages are fetched and tools such as AWS CLI and snap updates are installed, as well as automatic security updates are configured. This image is available to developers to use for their application. Since a lot of our applications run on Linux, we use Ubuntu to build the base image

When code is merged into the main/master branch in Gitlab, this starts a pipeline during which the application is built, tests are run and an AMI with the application installed is created via Packer, with Ansible tasks configuring the machine for the use by a specific application. This image is tagged so that we know which application is on it.

After that, the pipeline deploys the application to test and then to live. We use autoscaling for our applications (specifically, AWS AutoScaling Groups), and to use the newly built image with the new version of the application, a script is used to destroy existing application servers and create new ones with the updated version of the application installed. Autoscaling Group configuration is also amended to use the new AMI for any autoscaling events.

When servers (or EC2 instances, in AWS speak) are launched in an autoscaling group, a user_init script gets run on initialization. This is the stage at which any environment specific setup can be done.

Immutability is of paramount importance if you want to create a simpler, more predictable deployment pipeline whose outcome is trusted. Our teams have adopted the immutable deployment approach, and it helps us to ensure repeatable processes and reliable services and applications are in place.

Women in Tech Festival Global 2021

If you have worked in the tech industry for some time, you are likely to have noticed the issue with diversity. Information Technology was probably thought of as a male domain, and we can see the consequences of such thinking on a global level now.

67 Bricks strives to be a diverse and inclusive workplace, and we continuously improve our D&I awareness and practices. That is why for the second year in a row we attended the Women in Tech Festival aimed to champion diversity and empower companies and individuals to be allies for underrepresented groups. I did a presentation titled “It’s good to give back” at the event, which I immensely enjoyed, because, as a woman in tech myself, the topic of diversity is very close to my heart, and I take great interest in it.

This blog post gives a summary of some sessions I attended virtually.

Opening Note

The event started with the opening note from the Belonging, Inclusion and Diversity Lead of Investec, Zandi Nkhata. She spoke about reasons why women leave the tech industry: 

  • the lack of female role models.
  • experience of microaggressions – that is things people say to you that kind of remind you that you do not belong.  
  • the fact that your experience at a company might vary on whether or not you have an inclusive leader.

She also explained the difference between diversity and inclusion which I think is excellent: diversity is inviting someone to a party and inclusion is asking someone to dance. She also highlighted that only 20% of the workforce in the industry are women.

What can companies do to make their places diverse and inclusive? As an example, Investec’s vision is to make it a place where it is easy for people to be themselves, and to achieve that they set up different networks for people to speak up and listen to their feedback, provide learning and training opportunities about bullying, harassment and discrimination and have an allies programme.

Zandi also mentioned that it’s good to set KPIs with regards to diversity and inclusion, but they are not quotas, you have to be fair in achieving these targets.

Glass Ceiling or Sticky Floor

This panel discussion was about career progression – either knowing you’re probably the best candidate for a promotion yet not getting this promotion, or being capable enough but being obstructed by impostor syndrome, not having a career plan or a mentor.

The main point of the discussion was that a person finding themselves not progressing needs to ask themselves: “what is limiting my growth and what is in my control?”. You need to create a career plan and ensure you are in control. The importance of networking for women was highlighted, and events like Women in Tech is a great opportunity to do that. 

Another piece of advice was to focus on progress rather than perfection, and to learn to not be scared of asking questions even if you might think they are stupid (because they are not!).

Employers also have a duty to help with career progression. It is important to create career paths, understand them and enable employees to understand them as well, making it clear what is needed to get from A to B. It’s also vital to identify the strengths of each individual and know the exact purpose of each person in a team.

The panel also spoke about those who are in search of a new job and what question they might want to ask potential employers to decide about the suitability of a company for them; the suggestions were to look at the leadership gender balance and whether the company is doing any work regarding diversity and inclusion, among others.

Companies should not be scared to bring in people outside of the tech industry, reskill them and tap into their wealth of experience and transferable skills because the mixture of these experiences, strengths and insights can enable the team to grow.

How Old Are You?

This was about progressing in your career when you’re older. A lot of the focus here was on menopause awareness. This topic is still taboo, so safe spaces need to be created to make this conversation more visible, allowing people to speak about it without embarrassment. A lot of people still don’t know much about it even though their female relatives or friends might be experiencing menopause.

The speaker suggests that companies start with things like short talks about it in staff meetings. Some employers hold regular menopause cafes, others hold sessions on what to expect during this challenging time.  An emphasis was made on educating men (especially line managers) to feel comfortable about discussing menopause, and strategies for coping with it in all-male environments, which was mainly to push towards diversity and inclusion, having company policies around menopause and working together.

You Do Belong Here

This session was focused on combating impostor syndrome. This is typically associated with women (men do experience it too though) and the panellists shared useful tips that help them to overcome it:

  • Try to understand if it’s impostor syndrome or the culture that doesn’t let you grow. Some level of self-doubt is experienced by everyone.
  • Having a conversation about it helps combat it. It is the manager’s responsibility to create space where people can discuss it.
  • Some people use journaling. For example, you made a mistake, and 5 days later it’s still eating at you, and you still think about what you could have done. So to avoid that, by writing down what happened and what you could do next time, you get it out of your system.
  • Keep a list of your successes to read from time to time.
  • Refresh your CV and bio regularly as it allows you to focus on your achievements.
  • Educate yourself in neuroscience; humans are programmed to think negatively, and understanding this enables you to interpret your behaviour and thoughts.
  • Instead of changing yourself and trying to adopt a new personality type in certain situations to suit someone else, decide for yourself how you want to come across. However, beware if you go too far, If you’re not genuine, it’s not a sensible place to be. The best thing is to be your authentic self.

To sum up, thanks to this year and last year’s events I spoke to several inspiring females and got a bigger picture of what issues exist for women and LGBTQ+ communities in the tech industry, and what we can do to deal with them. I definitely learnt a lot from the Women in Tech Festival 2021. It was also great to realise that we, 67 Bricks, are doing all the right things to be as diverse and inclusive a workplace as we can. I look forward to sharing more of my learnings with my colleagues.