I assume most readers of this blog post have heard about YouTube’s famous incident with its video view counter, but if you haven’t, here’s a brief summary: when YouTube first launched, they used a 32-bit signed integer to hold the view count for each video. They never thought a single video would get more than 2³¹-1 (2,147,483,647) views, which is the highest value a signed 32-bit integer can hold (I recommend this video for a bit more info [pun intended]). …
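The wrap-around itself is easy to demonstrate. Here’s a minimal sketch using `ctypes` to emulate a 32-bit signed integer, since Python’s own integers have arbitrary precision and never overflow on their own:

```python
# Emulate YouTube's old 32-bit signed view counter with ctypes.
import ctypes

views = ctypes.c_int32(2**31 - 1)  # 2,147,483,647 -- the maximum value
views.value += 1                   # one more view wraps around (two's complement)
print(views.value)                 # -2147483648
```

One extra view past the maximum doesn’t raise an error here; the value silently wraps to the most negative 32-bit integer, which is exactly why the bug went unnoticed until a video actually got there.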
For a long time, the StatsD + Graphite stack was the go-to backend for time-series collection and storage.
In recent years, with the increased adoption of Kubernetes, Prometheus has been gaining more and more attention as an alternative to the classic Graphite + StatsD stack. In fact, Prometheus was initially developed at SoundCloud for exactly that purpose: to replace the Graphite + StatsD stack they used for monitoring. …
In the previous post we discussed the “why” — we went over some of the benefits of integrating automatic testing into your development flow. In this post, we’ll go over the “how” — some guidelines for forming a healthy, safe and rapid development process around your test suite.
The first and most important thing you can do when dealing with tests is integrating them into your development and shipping process.
To fully enforce the test suite we need to make sure two conditions are satisfied:
Software testing has always been a controversial topic. Some say it is a waste of time while others say that it is the only sane way to develop and extend large software systems.
Personally, I belong to the latter camp. I believe testing is one of the greatest practices one could apply to produce high-quality systems while keeping them maintainable in the long run.
There are 3 main reasons that make testing such an essential tool in software development:
<TLDR> Check out eks_cli — a one-stop shop for bootstrapping and managing your EKS cluster </TLDR>
We’ve been running Kubernetes on AWS since the very early kube-up.sh days. Configuration options were minimal and were passed to kube-up.sh as a confusing mix of environment variables. Concepts like high availability, Multi-AZ awareness and cluster management barely existed back then.
When kops came to life, things became much better. Working with a command-line utility made cluster creation a lot easier. Environment variables were replaced by well-documented flags. Cluster state was saved, and changes could easily be made to existing clusters…
RabbitMQ is one of the most widely used message brokers today. A large portion of Nanit’s inter-service communication goes through RabbitMQ, which sent us on a journey to find the best way to retry processing a message upon failure.
Surprisingly, RabbitMQ does not natively implement any retry mechanism. In this blog post I explore 4 different ways to implement retries on RabbitMQ. For each option we’ll go over:
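The options themselves are beyond this excerpt, but to ground the discussion, here is a minimal sketch of one well-known approach: the dead-letter-exchange (DLX) + TTL pattern. All queue and exchange names and the delay value are hypothetical, and `channel` is assumed to follow a pika-style `BlockingChannel` API:

```python
# Sketch of the dead-letter-exchange (DLX) retry pattern for RabbitMQ.
# Rejected messages go to a retry queue, wait out a TTL, and are then
# dead-lettered back to the work queue for another attempt.

RETRY_DELAY_MS = 5000  # hypothetical delay before a failed message is retried

WORK_QUEUE_ARGS = {
    # messages nacked with requeue=False are routed to the retry exchange
    "x-dead-letter-exchange": "retry-exchange",
}

RETRY_QUEUE_ARGS = {
    # once the TTL expires, messages are dead-lettered back to the work exchange
    "x-dead-letter-exchange": "work-exchange",
    "x-message-ttl": RETRY_DELAY_MS,
}


def declare_retry_topology(channel):
    """Declare the work/retry queues and wire them to each other via DLX."""
    channel.exchange_declare(exchange="work-exchange", exchange_type="direct")
    channel.exchange_declare(exchange="retry-exchange", exchange_type="direct")
    channel.queue_declare(queue="work", arguments=WORK_QUEUE_ARGS)
    channel.queue_declare(queue="retry", arguments=RETRY_QUEUE_ARGS)
    # the routing key is preserved on dead-lettering, so bind both queues to it
    channel.queue_bind(queue="work", exchange="work-exchange", routing_key="work")
    channel.queue_bind(queue="retry", exchange="retry-exchange", routing_key="work")
```

A uniform per-queue TTL (rather than per-message TTLs) keeps the retry queue strictly FIFO, which avoids the head-of-line problem where an expired message sits blocked behind one with a longer delay.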
Plug is an Elixir specification for composable modules between web applications. That’s a very nice way to describe middlewares. For those of you who come from the Ruby world, it pretty much takes the role of Rack middlewares.
A few weeks ago I searched Google for a Plug library to declaratively validate path and query parameters on the router. I got a single result, but it had no documentation, and after going over the code I realized it didn’t provide what I was looking for.
In my vision I would write my app routes as:
I’m pretty new to Elixir. The language fascinates me, as it is based on a paradigm I had never experienced before.
The ideas of functional programming, processes, message passing and fault tolerance are bundled together into a language and ecosystem that are fun and productive to work with.
Recently, while working on a feature, I had to write an Elixir module that receives and dispatches tasks. I’d like to share my journey to the final module I ended up with.
The requirements for the module were the following:
Nanit has been using Kubernetes in production since its early days, almost two years now. As with every large and complex system, we have experienced failures at all levels:
Every failure pushed our monitoring capabilities further ahead, so that we can detect failures as early as possible and have the ability to interrogate our system and resolve them quickly.
In this post I’ll go over our monitoring…