Can’t make sense of words like Docker, Containers, Kubernetes, ETCD? You are not alone. The container technology space has been on a tear lately. Let me try to make sense of a few of the buzzwords.
Caution: I only have limited understanding of this ecosystem. I’d suggest reading the article to understand concepts rather than individual companies/projects and where they stand.
Microservices: an architecture style in which a large application is split into multiple small applications (aka services). Each service performs one small, specific task of the application, hence the name microservice. Each microservice can use its own set of technologies (language, database, etc.). The services communicate with each other using queues or REST APIs. Read Martin Fowler’s excellent guide on the topic. The following diagram is from my experience converting a monolithic application to microservices.
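To make the REST-style communication concrete, here is a minimal sketch using only Python's standard library. The service names ("pricing" and "order"), the `/price/<item>` route, and the price data are all made up for illustration; a real system would more likely use a web framework and run each service in its own process or container.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical data owned by the "pricing" microservice.
PRICES = {"book": 12.5, "pen": 1.2}


class PricingHandler(BaseHTTPRequestHandler):
    """A tiny 'pricing' service exposing GET /price/<item>."""

    def do_GET(self):
        item = self.path.rsplit("/", 1)[-1]
        if self.path.startswith("/price/") and item in PRICES:
            body = json.dumps({"item": item, "price": PRICES[item]}).encode()
            self.send_response(200)
        else:
            body = json.dumps({"error": "not found"}).encode()
            self.send_response(404)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


def start_pricing_service():
    """Start the pricing service on a free local port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), PricingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


# Opener that ignores any system proxy settings, since we only call localhost.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def order_total(base_url, items):
    """The 'order' service: fetches each price from the pricing service over HTTP."""
    total = 0.0
    for item in items:
        with _opener.open(f"{base_url}/price/{item}") as resp:
            total += json.load(resp)["price"]
    return total


if __name__ == "__main__":
    server = start_pricing_service()
    url = f"http://127.0.0.1:{server.server_port}"
    print(order_total(url, ["book", "pen", "pen"]))
    server.shutdown()
    server.server_close()
```

The order logic knows nothing about how prices are stored. It only knows the HTTP contract, which is what lets each service pick its own language and database.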
Since each microservice is small, several (or even all) of them can be deployed on a single server. There are 2 ways of doing this: Virtual Machines and Containers.
A Virtual Machine (VM) is an entire Operating System wrapped in an installable piece of software. You can install your application on a single OS, wrap it in a Virtual Machine, and then install it on any server (OS/Linux/Windows), and the application will work correctly. VMs are large and resource heavy (since each one is an entire OS), so they are slow and you can run only a limited number of them on a single server.
Instead of a full-fledged Operating System, we can use a set of core libraries that most applications use to interact with the server. The container engine is a thin layer that acts as an intermediary between the application and the OS and other applications.
Containers are thin wrappers surrounding the microservice/application. As with VMs, if your application works well within the container, you can be assured it will work on any server (OS/Linux/Windows). If an application is wrapped in a container, it can be easily replicated or migrated to another server (example: replicating the front-end container in response to an increase in requests). Containers also restrict the application’s abilities on the server (example: the application can access only a restricted set of users, file systems, ports, CPU, memory, etc.).
These containers need to have a specific format so that the runtime can start and run them properly. There are mainly 2 competing formats: Docker and Rkt (Rocket).
The industry has not yet standardized the format for containers. Thus, all container orchestration frameworks advertise their compatibility with different container formats. For example: Kubernetes is capable of running both Docker and Rocket containers.
Docker and CoreOS are 2 companies that not only have their own container formats (Docker and Rkt) but also entire ecosystems to manage them (some of which are listed below).
Once a container image is created for the application, it can be stored on a server called a container registry. This makes it easy to reuse the image to create new copies/instances.
Registries are of two types:
- Public - hosts basic software images like Tomcat, Nginx, MySQL, etc.
- Private - hosts your custom application containers.
Typically, a basic image is first downloaded from Docker’s Open Repository. The image is then customized as per the application’s needs and saved to a private registry.
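The pull, customize, and save steps can be sketched as a small Python helper that assembles the corresponding `docker` CLI commands. This is only an illustration under a few assumptions: the registry host `registry.example.com`, the application name, and the version are hypothetical, and the customization is assumed to live in a Dockerfile in the build directory that starts from the base image. The commands are printed rather than executed.

```python
import shlex


def image_workflow(base_image, private_registry, app_name, version, context_dir="."):
    """Return the docker CLI commands for a pull -> build -> push workflow.

    Assumes `context_dir` contains a Dockerfile whose FROM line names
    `base_image`; that Dockerfile is where the customization happens.
    """
    target = f"{private_registry}/{app_name}:{version}"
    return [
        ["docker", "pull", base_image],                   # fetch the public base image
        ["docker", "build", "-t", target, context_dir],   # customize and tag it
        ["docker", "push", target],                       # save it to the private registry
    ]


if __name__ == "__main__":
    # All names below are hypothetical placeholders.
    for cmd in image_workflow("nginx:latest", "registry.example.com", "shop-frontend", "1.0"):
        print(shlex.join(cmd))
```

Tagging the image with the registry host as a prefix is what tells `docker push` where to send it.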
Once all microservices/applications are wrapped in containers, there needs to be a way to automate their deployments, rollbacks, restarts, healing, etc. There is also a need to monitor services, discover new services, authenticate them, scale/replicate them, etc.
All these requirements call for an orchestration framework.
A few such frameworks are listed below:
- Kubernetes - The most popular one.
- Docker Swarm - Docker’s own embedded orchestration.
- Mesos - Highly popular cluster manager, used to manage servers. It is used with Hadoop, Kafka and Spark, and recently gained container (Docker/Rkt/OCI) support.
- Amazon ECS - AWS’s own cloud based orchestration.
- Google Container Engine - Kubernetes on Google Compute Engine.
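The healing and scaling duties described above usually come down to comparing the state you want with the state you have. The sketch below is a toy version of that idea, not how any of the listed frameworks is actually implemented; the service names and the `reconcile` function are hypothetical.

```python
from collections import Counter


def reconcile(desired, running):
    """Compute the actions needed to move the cluster toward the desired state.

    desired: dict mapping service name -> number of replicas wanted
    running: list of service names, one entry per healthy container
    Returns a list of ("start" | "stop", service) tuples.
    """
    actual = Counter(running)
    actions = []
    for service in sorted(set(desired) | set(actual)):
        diff = desired.get(service, 0) - actual.get(service, 0)
        if diff > 0:
            actions += [("start", service)] * diff      # scale up or heal
        elif diff < 0:
            actions += [("stop", service)] * (-diff)    # scale down or clean up
    return actions


if __name__ == "__main__":
    desired = {"frontend": 3, "orders": 1}
    # One frontend container has crashed, orders has an extra copy,
    # and a container for a retired service is still around.
    running = ["frontend", "orders", "orders", "legacy"]
    for action in reconcile(desired, running):
        print(action)
```

An orchestrator would run a comparison like this repeatedly, so a crashed container simply shows up as a missing replica on the next pass and gets restarted.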
As part of the orchestration, each service needs to register itself so that other services and the orchestrator can discover and work with it. This functionality requires a highly available, strongly consistent system. A few software projects that satisfy these requirements are ETCD, Consul and Zookeeper. These projects are used internally by the frameworks, and we need not understand them completely. I’d recommend this Google paper if you are interested in knowing how these systems are implemented.
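The register-and-discover contract can be sketched with a small in-memory registry. This is a single-process toy with hypothetical names (`ServiceRegistry`, the addresses, the 10-second TTL); it shows only the interface and says nothing about the replication and consensus that make the real systems highly available and strongly consistent.

```python
import time


class ServiceRegistry:
    """Toy service registry: instances register with a TTL and must renew it."""

    def __init__(self, ttl_seconds=10.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so the example can fake time
        self._entries = {}           # (service, address) -> expiry timestamp

    def register(self, service, address):
        """Register an instance, or renew it (heartbeat) if already present."""
        self._entries[(service, address)] = self._clock() + self._ttl

    def deregister(self, service, address):
        """Remove an instance explicitly, e.g. on graceful shutdown."""
        self._entries.pop((service, address), None)

    def discover(self, service):
        """Return addresses of instances whose registration has not expired."""
        now = self._clock()
        return sorted(
            addr
            for (name, addr), expiry in self._entries.items()
            if name == service and expiry > now
        )


if __name__ == "__main__":
    fake_now = [0.0]
    registry = ServiceRegistry(ttl_seconds=10, clock=lambda: fake_now[0])
    registry.register("orders", "10.0.0.5:8080")
    fake_now[0] = 5.0
    registry.register("orders", "10.0.0.6:8080")
    print(registry.discover("orders"))   # both instances are alive
    fake_now[0] = 12.0                   # the first instance missed its heartbeat
    print(registry.discover("orders"))
```

The TTL is what lets the registry notice a crashed instance: if the instance stops sending heartbeats, its entry quietly expires and other services stop being routed to it.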
The ecosystem surrounding microservices and containers is vibrant and fast-paced. Since it is still going through churn and is yet to mature, it can be difficult to keep track of all the projects. I guess we will soon see consolidation of these technologies. Trends are beginning to show Docker leading in containers and Kubernetes in container management. Fascinating times.
Leftover buzzwords
- Hadoop: System used to store and process large amounts of data.
- Kafka: Distributed messaging system.
- Spark: Big data processing engine. Similar to (& allegedly much better than) Hadoop.
- Redis: In-memory key-value store.
If you want to learn about some of the modern databases, check out this article.