• Recruiting and building teams of Clojure engineers is more challenging than with other programming languages, because Clojure is less popular and the pool of experienced engineers is small


Our PostgreSQL migration story from serial to bigserial

I assume most readers of this blog post have heard about YouTube’s famous incident with its video view counter, but if you haven’t, here’s a brief summary: when YouTube first launched, it used a 32-bit signed integer to hold the view count for each video. They never thought that a single video would have more than 2³¹-1 (2,147,483,647) views, which is the highest value of a signed 32-bit integer (I recommend this video for a bit more info [pun intended]). …
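
To make the limit concrete, here is a minimal Python sketch (not taken from the original post) of what happens when a signed 32-bit counter passes its maximum. The helper name `wrap_int32` is hypothetical; it emulates two's-complement wraparound, which is one common overflow behavior. A database column may behave differently, for example by raising an out-of-range error rather than wrapping.

```python
INT32_MAX = 2**31 - 1  # largest signed 32-bit value: 2,147,483,647
INT64_MAX = 2**63 - 1  # largest signed 64-bit value


def wrap_int32(value: int) -> int:
    """Emulate two's-complement wraparound of a signed 32-bit integer.

    Hypothetical helper for illustration only.
    """
    return (value + 2**31) % 2**32 - 2**31


print(INT32_MAX)                  # the last value that still fits
print(wrap_int32(INT32_MAX + 1))  # one more wraps to the most negative value
print(INT64_MAX)                  # the headroom a 64-bit counter provides
```

The point of the sketch is only the size of the gap: a 64-bit counter has roughly four billion times the range of a 32-bit one.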

For a long time, the StatsD + Graphite stack was the go-to solution when considering backend stacks for time-series collection and storage.

In recent years, with the increased adoption of Kubernetes, Prometheus has been gaining more and more attention as an alternative to the classic Graphite + StatsD stack. In fact, Prometheus was initially developed by SoundCloud for exactly that purpose: to replace the Graphite + StatsD stack they used for monitoring. …

In the previous post we discussed the “why” — we went over some of the benefits of integrating automatic testing into your development flow. In this post, we’ll go over the “how” — some guidelines for forming a healthy, safe and rapid development process around your test suite.

Continuous Integration (CI)

  1. Code cannot be pushed directly into master — only pull requests should be used to…


This series has been heavily influenced by Robert Martin’s Clean Code series, which I highly recommend to every developer.

Software testing has always been a controversial topic. Some say it is a waste of time while others say that it is the only sane way to develop and extend large software systems.

Personally, I belong to the latter camp. I believe testing is one of the greatest practices one could apply to produce high quality systems while keeping them maintainable for the long run.

There are 3 main reasons that make testing such an essential tool in software development:

  1. Avoid…

<TLDR> Check out eks_cli, a one-stop shop for bootstrapping and managing your EKS cluster </TLDR>


When kops came to life, things became much better. Working with a command line utility made cluster creation a lot easier. Environment variables were replaced by well-documented flags. Cluster state was saved and changes could be easily made to existing clusters…

RabbitMQ is one of the most widely used message brokers today. A large portion of Nanit’s inter-service communication goes through RabbitMQ, which led us on a journey to find the best way to retry processing a message upon failure.
Surprisingly, RabbitMQ itself does not implement any retry mechanism natively. In this blog post I explore 4 different ways to implement retries on RabbitMQ. For each option we will go through:

  1. The RabbitMQ topology diagram
  2. The flow of retrying
  3. Example Ruby code that replicates the topology, plus a subscriber that retries processing a message
  4. The output of running the ruby…

Plug is an Elixir specification for composable modules between web applications. That’s a very nice way to describe middlewares. For those of you who come from the Ruby world, it pretty much takes the role of Rack middlewares.

A few weeks ago I searched Google for a Plug library to validate path and query parameters declaratively on the router. I got a single result, but it had no documentation and, judging from the code, it didn’t provide what I was looking for.
In my vision I would write my app routes as:

Validating parameters declaratively in the route…

I’m pretty new to Elixir. This language fascinates me as it is based on a paradigm I had never experienced before.
The ideas of functional programming, processes, message passing and fault tolerance are bundled together into a language and ecosystem that is fun and productive to work with.

Recently, while working on a feature, I had to write an Elixir module that receives and dispatches tasks. I want to share my journey to the final module I ended up with.

The Spec

  1. The module receives and runs tasks.
  2. Each task…

Nanit has been running Kubernetes in production since its early days, for almost two years now. As with every large and complicated system, we experienced failures on all levels:

  1. The Kubernetes level: Node failures, Pod allocation failures etc.
  2. The application infrastructure level: Redis, RabbitMQ etc.
  3. The application level: Nanit’s web services and video processing mechanisms.

Every failure pushed us to extend our monitoring capabilities, so that we learn about failures as early as possible and can interrogate our system and resolve them as fast as possible.

In this post I’ll go over our monitoring…

Erez Rabih

Backend & Infra team leader @
