
Getting started with neural networks - the single neuron

Neural networks can model the relationship between input variables and output variables. A neural network is built from connected artificial neurons. To start, it is best to look at the architecture of a single neuron.

Artificial neurons are motivated by the architecture and functionality of the neuron cells that brains are made of. A neuron in the brain can receive multiple input signals, process them and fire a signal, which in turn can be input to other neurons. The output is binary: depending on the input, the signal is either fired (1) or not fired (0).

The artificial neuron has some inputs, which we call \(x_1, x_2, \ldots, x_p\). There can be an additional input \(x_0\), which is always set to \(1\) and is often referred to as the bias. The inputs are weighted with weights \(w_1, w_2, \ldots, w_p\), and \(w_0\) for the bias. With the inputs and the weights we can calculate the activation of the neuron for observation \(i\): \[ a_i = \sum_{k = 1}^p w_k x_{ik} + w_0 \]
Here \(x_{ik}\) is the value of input \(k\) for observation \(i\). The output of the neuron is a function of its activation. Here we are free to choose whatever function we want to use. If the output should be binary or lie in the interval \([0, 1]\), a good choice is the logistic function.

So the calculated output of the neuron for observation \(i\) is \[ o_i = \frac{1}{1 + \exp(-a_i)} \]
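To make the two formulas concrete, here is a minimal sketch in plain Python. The function names and the example weights and inputs are made up for illustration; they are not part of any library.

```python
import math

def activation(weights, bias_weight, inputs):
    """Weighted sum a_i = sum_k w_k * x_ik + w_0."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias_weight

def logistic(a):
    """Logistic function: maps a real activation into the interval (0, 1).

    Note: this simple form can overflow for very large negative activations.
    """
    return 1.0 / (1.0 + math.exp(-a))

def neuron_output(weights, bias_weight, inputs):
    """Output o_i of a single neuron with logistic output function."""
    return logistic(activation(weights, bias_weight, inputs))

# Hypothetical example: a neuron with p = 2 inputs
weights = [0.5, -1.0]   # w_1, w_2
bias_weight = 0.25      # w_0 (the bias input x_0 is always 1)
x_i = [2.0, 1.0]        # x_i1, x_i2 for one observation i

a_i = activation(weights, bias_weight, x_i)     # 0.5*2 - 1.0*1 + 0.25 = 0.25
o_i = neuron_output(weights, bias_weight, x_i)  # roughly 0.56
print(a_i, o_i)
```

An activation of 0 gives an output of exactly 0.5. Larger activations push the output towards 1 and smaller ones towards 0.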

Pretty straightforward, isn't it? If you know about logistic regression, this might already be familiar to you.

Now you know the basic structure. The next step is to "learn" the right weights for the inputs. For this you need a so-called loss function, which tells you how wrong your predicted output is. The loss function is a function of your calculated output \(o_i\) (which depends on your data \(x_{i1}, \ldots, x_{ip}\) and the weights) and of course of the true output \(y_i\). Your training data set is given, so the only variable part of the loss function is the weights \(w_k\). Low values of the loss function tell you that your neuron makes accurate predictions.

One simple loss function would be the absolute difference \(|y_i - o_i|\). A more sophisticated function is \(-\left(y_i \ln(o_i) + (1-y_i) \ln(1-o_i)\right)\), which is the negative log-likelihood of your data if you interpret \(o_i\) as the probability that the output is 1. Minimizing the negative log-likelihood is therefore the same as maximizing the likelihood of your parameters given your training data set.
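As a rough illustration of how the loss depends only on the weights once the data is fixed, the following sketch sums the negative log-likelihood over a tiny made-up data set and compares two weight settings. The data, the weights and all function names are assumptions chosen for this example.

```python
import math

def logistic(a):
    return 1.0 / (1.0 + math.exp(-a))

def neuron_output(weights, bias_weight, inputs):
    a = sum(w * x for w, x in zip(weights, inputs)) + bias_weight
    return logistic(a)

def negative_log_likelihood(y, o):
    """Loss for one observation: -(y*ln(o) + (1-y)*ln(1-o)).

    Assumes 0 < o < 1; an output of exactly 0 or 1 would make the log fail.
    """
    return -(y * math.log(o) + (1 - y) * math.log(1 - o))

def total_loss(weights, bias_weight, X, Y):
    """Sum of the per-observation losses over the whole training set."""
    return sum(
        negative_log_likelihood(y, neuron_output(weights, bias_weight, x))
        for x, y in zip(X, Y)
    )

# Hypothetical toy data: the true output is 1 when the single input is positive
X = [[-2.0], [-1.0], [1.0], [2.0]]
Y = [0, 0, 1, 1]

loss_good = total_loss([3.0], 0.0, X, Y)   # weight points in the right direction
loss_bad = total_loss([-3.0], 0.0, X, Y)   # weight points in the wrong direction
print(loss_good, loss_bad)                 # loss_good is much smaller than loss_bad
```

The data stays the same in both calls; only the weight changes, and with it the value of the loss.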

The first step in learning about neural networks is done! The next thing to look at is the gradient descent algorithm, which is a way to find weights that minimize the loss function.

Have fun!

