“Science is what we understand well enough to explain to a computer. Art is everything else we do.” — Donald Knuth

Recently I was discussing with a colleague why Erlang has such good support for concurrency and why it should be used for building distributed systems.

I would like to share my findings on this topic in this article. It is a high-level overview of the philosophy of Erlang, so it contains no Erlang code.

The Problem

No description of a piece of software is complete without describing the problem it is meant to solve. Every piece of software ever written exists to solve some specific problem, and Erlang is no exception! Erlang was originally designed for building telecom switching systems. To understand the philosophy of Erlang, we must first understand the requirements of a telecom system.

The requirements:

  1. The system must be able to handle a very large number of concurrent activities.
  2. Actions must be completed within a certain time; the system is strictly time-bounded.
  3. The system must be distributed over several computers.
  4. The system should be in continuous operation for many years.
  5. Reconfiguration should be performed without stopping the system.
  6. The system has strict requirements for quality and reliability.
  7. The system must be fault tolerant.
  8. A telecom system is expected to run FOREVER!

To meet all of the above requirements, a programming language and the underlying system on which the code runs must provide the following features:

  1. Strong support for concurrency.
  2. Real-time behaviour: actions must be completed within a specific time.
  3. Easy scaling from a single node to a multi-node distributed system.
  4. Continuous operation and in-place upgrades: software upgrades must be performed “in place”, i.e. without stopping the system.

So the ultimate problem that Erlang is supposed to solve is “how to make reliable distributed systems in the presence of software errors”.

Conventional programming languages do not solve this problem because they do not allow different software modules to co-exist in such a way that there is no interference between modules.

The commonly used threads model of programming, where resources are shared, makes it extremely difficult to isolate components from each other, so errors in one component can propagate to another component and damage the internal consistency of the system.

So how is Erlang different from these conventional programming languages? Well, Erlang belongs to the family of pure message-passing languages. It is a concurrent, process-based language with strong isolation between concurrent processes.

The Philosophy

Having said that, let's look at some important principles from the philosophy of Erlang that make it suitable for solving this problem.

  • Everything is a process.

In the context of Erlang, a process can be thought of as a self-contained virtual machine. Each process must be identified by a unique, unforgeable identifier: the Pid of the process.

You can send a message to a process only if you know its Pid. Process creation and destruction are lightweight operations, so the system can scale easily.

  • Processes are strongly isolated

Two processes operating on the same machine must be as independent as if they ran on physically separated machines.

Processes have “share nothing” semantics. This is obvious since they are imagined to run on physically separated machines.

Since there is no shared data or shared resource between two processes, the obvious concurrency problems caused by sharing are automatically avoided.
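Although this article stays away from Erlang code, a small sketch in Python may make “share nothing” more concrete. It is only an analogy: the `Process` class and its `send` method are hypothetical names invented for this illustration, and the isolation is emulated by copying each message explicitly when it is sent.

```python
import copy
import queue


class Process:
    """Hypothetical stand-in for an Erlang process: it owns a private mailbox."""

    def __init__(self):
        self.mailbox = queue.Queue()

    def send(self, message):
        # Copy the message on send, so sender and receiver never share an object.
        self.mailbox.put(copy.deepcopy(message))


sender_state = {"balance": 100}
receiver = Process()
receiver.send(sender_state)

received = receiver.mailbox.get()
received["balance"] = 0          # the receiver mutates only its own copy

print(sender_state["balance"])   # prints 100: the sender's data is untouched
```

Because the receiver only ever sees a copy, nothing it does to the message can corrupt the sender's state.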

  • Message passing is the only way for processes to interact

Message passing is the only way to pass data between processes. Again since nothing is shared this is the only means possible to exchange data.

Process isolation implies that message passing is asynchronous.

If it were synchronous, an error in the receiver of a message could block the sender indefinitely, destroying the property of isolation.

Message passing is assumed to be atomic which means that a message is either delivered in its entirety or not at all.
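As a rough illustration of asynchronous message passing, here is a sketch in Python rather than Erlang. The names `counter`, `mailbox` and `reply_box` are assumptions made up for this example. Python threads do share memory, so the isolation here holds by convention only: the worker touches nothing except the messages it receives.

```python
import queue
import threading


def counter(mailbox):
    """Hypothetical process body: handles one message at a time from its mailbox."""
    total = 0
    while True:
        message = mailbox.get()
        if message[0] == "add":
            total += message[1]
        elif message[0] == "get":
            # Reply by sending a message to the mailbox the caller supplied.
            message[1].put(total)
        elif message[0] == "stop":
            return


mailbox = queue.Queue()  # plays the role of the Pid: whoever holds it can send
worker = threading.Thread(target=counter, args=(mailbox,))
worker.start()

# Each send returns immediately; the sender never waits for the receiver.
for n in (1, 2, 3):
    mailbox.put(("add", n))

reply_box = queue.Queue()
mailbox.put(("get", reply_box))
result = reply_box.get(timeout=1)  # waiting for a reply is an explicit, bounded choice

mailbox.put(("stop",))
worker.join()
print(result)  # prints 6
```

The sender blocks only when it chooses to wait for a reply, and even then with a timeout, so a stuck receiver cannot hold it forever.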

  • Let it crash

If an error occurs within a process, then rather than trying to actively prevent it, you let the process crash and put in place a policy that restarts it immediately so that it recovers from a clean slate.

  • Error handling is non-local

That means a process won’t do its own error handling.

Erlang has a “fail-fast” philosophy: a process either does what it is supposed to do or fails. In other words, a process must obey the single responsibility principle.

It should be possible for one process to detect failure in another process and we should also know the reason for failure.

There should be a separate process that handles the errors of other processes. Again, this is the single responsibility principle.
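The “let it crash” and non-local error handling ideas can be sketched together. The following is a Python analogy, not Erlang: `spawn_monitored`, `divider` and `supervise` are hypothetical names for this illustration, and `spawn_monitored` stands in for what the runtime would do when it reports a process exit to whoever is watching.

```python
import queue
import threading


def spawn_monitored(body, args, monitor):
    """Hypothetical helper: run body in a thread and report how it exited."""
    def run():
        try:
            body(*args)
            monitor.put(("normal", None))
        except Exception as reason:
            monitor.put(("crashed", reason))

    thread = threading.Thread(target=run)
    thread.start()
    return thread


def divider(jobs, results):
    """Worker with no defensive code: it does its job or it crashes."""
    while True:
        job = jobs.get()
        if job is None:
            return
        results.put(10 / job)  # raises ZeroDivisionError when job == 0


def supervise(jobs, results):
    """Supervisor: restarts the worker whenever it exits abnormally."""
    crash_reasons = []
    monitor = queue.Queue()
    while True:
        thread = spawn_monitored(divider, (jobs, results), monitor)
        status, reason = monitor.get()
        thread.join()
        if status == "normal":
            return crash_reasons
        crash_reasons.append(reason)  # the supervisor learns why the worker died


jobs, results = queue.Queue(), queue.Queue()
for job in (5, 0, 2, None):
    jobs.put(job)

crash_reasons = supervise(jobs, results)
answers = [results.get() for _ in range(results.qsize())]
print(answers)             # prints [2.0, 5.0]
print(len(crash_reasons))  # prints 1
```

The worker contains no error handling at all. The job that caused the crash is simply lost, and a fresh worker carries on with the remaining jobs from a clean slate, while the supervisor records the reason for the failure.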

This philosophy is what makes systems built with Erlang highly concurrent, reliable, scalable and fault tolerant.

The Real World Use

So who uses Erlang? Does it deliver on its promise to make reliable distributed systems in the presence of software errors?

Well, WhatsApp, one of the leading messaging app companies, uses Erlang for its communication module. They adopted Erlang because they had strong requirements for concurrency, scalability, real-time actions and fault tolerance. WhatsApp handles over 65 billion messages every day while providing a reliable service to its users.

So from this example I can say yes! Erlang does deliver on its promise.

Other companies also use Erlang. One of the largest users is Ericsson, which uses it to write software for telecommunications systems.

Final Note

They say code becomes obsolete but ideas do not. Whether you use Erlang may depend on various parameters, but the philosophy behind it is very generic and can be applied to solve real-world problems in distributed computing. Hence Erlang is not just a language; it is a philosophy.

I hope that this article has helped you in some way. Criticism is always welcome!