
Getting started with neural networks - the single neuron

Neural networks can model the relationship between input variables and output variables. A neural network is built of connected artificial neurons. To start, it is best to look at the architecture of a single neuron.

Artificial neurons are motivated by the architecture and functionality of the neuron cells that brains are made of. A neuron in the brain can receive multiple input signals, process them and fire a signal, which in turn can be input to other neurons. The output is binary: depending on the input, the signal is either fired (1) or not fired (0).

The artificial neuron has some inputs, which we call \(x_1, x_2, \ldots, x_p\). There can be an additional input \(x_0\), which is always set to \(1\) and is often referred to as the bias. The inputs are weighted with weights \(w_1, w_2, \ldots, w_p\) and \(w_0\) for the bias. With the inputs and the weights we can calculate the activation of the neuron for observation \(i\): \[ a_i = \sum_{k = 1}^p w_k x_{ik} + w_0 \]
The output of the neuron is a function of its activation. Here we are free to choose whatever function we want to use. If the output should be binary or lie in the interval \([0, 1]\), a good choice is the logistic function.

So the calculated output of the neuron for observation \(i\) is \[ o_i = \frac{1}{1 + \exp(-a_i)} \]

Pretty straightforward, isn't it? If you know about logistic regression, this might already be familiar to you.
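To make this concrete, here is a minimal sketch of the forward pass of a single neuron in Python. It assumes NumPy is available; the function name neuron_output and the example values for the inputs and weights are made up purely for illustration.

```python
import numpy as np

def neuron_output(x_i, w, w0):
    """Logistic output o_i of a single neuron for one observation x_i."""
    a_i = np.dot(w, x_i) + w0          # activation: weighted sum of inputs plus bias
    return 1.0 / (1.0 + np.exp(-a_i))  # logistic function squashes a_i into (0, 1)

# Example with p = 3 inputs; all values are made up for illustration.
x_i = np.array([0.5, -1.2, 3.0])
w = np.array([0.8, 0.1, -0.4])
w0 = 0.2
print(neuron_output(x_i, w, w0))  # roughly 0.33
```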

Now you know about the basic structure. The next step is to "learn" the right weights for the inputs. For that you need a so-called loss function, which tells you how wrong your predicted output is. The loss function is a function of your calculated output \(o_i\) (which depends on your data \(x_{i1}, \ldots, x_{ip}\) and the weights) and, of course, of the true output \(y_i\). Your training data set is given, so the only variable part of the loss function is the weights \(w_k\). Low values of the loss function tell you that your neuron makes accurate predictions.

One simple loss function would be the squared difference \((y_i - o_i)^2\). A more sophisticated function is \(-\left( y_i \ln(o_i) + (1-y_i) \ln(1-o_i) \right)\), which is the negative log-likelihood of your data if you interpret \(o_i\) as the probability that the output is 1. So minimizing the negative log-likelihood is the same as maximizing the likelihood of your parameters given your training data set.
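Here is a small sketch of these two losses for a single observation, again assuming NumPy; the clipping constant eps is my own addition to avoid taking the logarithm of zero, and the example numbers are invented.

```python
import numpy as np

def squared_error(y_i, o_i):
    """Simple loss: squared difference between true output and prediction."""
    return (y_i - o_i) ** 2

def negative_log_likelihood(y_i, o_i, eps=1e-12):
    """Negative log-likelihood (cross-entropy), treating o_i as P(y_i = 1)."""
    o_i = np.clip(o_i, eps, 1.0 - eps)  # guard against log(0) for extreme predictions
    return -(y_i * np.log(o_i) + (1.0 - y_i) * np.log(1.0 - o_i))

# Example: true label 1, predicted probability 0.8 (made-up numbers)
print(squared_error(1, 0.8))            # about 0.04
print(negative_log_likelihood(1, 0.8))  # about 0.22
```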

The first step towards learning about neural networks is made! The next thing to look at is the gradient descent algorithm, which is a way to find weights that minimize the loss function.

Have fun!
