Let us start with logistic regression. Recall that a logistic regression divides 2 sets by a line (or a hyperplane if we have higher dimensions)

The logistic regression yields values form 0 to 1, and we can consider the process as making an evaluation. In the process we get data and we calculate our evaluation by a formula.

For example we may have the following assignment: to compute if we have enough goods in storage to last for a week of sales. This is quite a common problem, and say some clerks report their numbers to their manager to figure it out. The manager collects information, processes it and makes an evaluation.

Note that this is how a logistic regression functions.

Usually computing if an amount of goods is sufficient is not the only problem. In addition we need to know, for example, if our storage is full to optimal capacity (75% -85% or something like this). Therefore we need to evaluate another statistic.

And of course these people should report to their supervisor who will make another evaluation:

So we get a whole hierarchy of evaluations and at the end they report to CEO. We can compare it with a neural network structure:

We can observe a lot of in common with a corporation chain of command. As we see middle managers are hidden layers which do the balk of the job. We have the similar information flow and processing which is analogous to forward propagation and backward propagation.

What is left now is to explain that dealing with sigmoid function at each node is too costly so it mostly reserved for CEO level.