
Artificial Neurons: The Formal Definition of an Artificial Neuron

This post provides a comprehensive introduction to the artificial neuron, also known as the perceptron, a fundamental building block in neural networks. It demystifies the core concepts behind perceptrons, including input features, weights, bias, and the crucial role of the dot product in calculating the net input.

Ala GARBAA 🚀 Full-Stack & DevOps Engineer

In a basic binary classification setting, an artificial neuron (or perceptron) aims to distinguish between two classes, often labeled 0 and 1. The neuron computes a so-called net input \( z \) by taking a weighted sum of input features, then applies a decision function to decide which class the input belongs to.

An artificial neuron (specifically the “perceptron”) is one of the simplest building blocks of neural networks. It is used to classify inputs into two categories (for example, “yes/no,” “spam/not spam,” “0/1,” etc.).

The term "neuron" is borrowed from the biological neurons of the brain; here it is used in an artificial context to mean a mathematical model that mimics their function.
Figure 1: A neuron processing chemical and electrical signals.

1. The Basic Idea of the Linear Combination

  1. You have multiple input features (like data columns), which we group into a vector, often denoted \( \mathbf{x} \).
  2. You have a corresponding set of weights, grouped into a vector \( \mathbf{w} \).
    These weights represent the importance or influence each input feature has on the final decision of the neuron. They are numerical values assigned to each input, helping determine how strongly each input contributes to the overall outcome.
  3. You usually add one extra number, called the bias, denoted \( b \).
    The bias is a constant offset or threshold that shifts the decision boundary. It is an extra number added to the weighted sum of the inputs, allowing the neuron to activate even when all the weighted inputs are zero.

Think of these as follows:

  • \( \mathbf{x} \) (feature vector): your data points or input signals (for example, the pixels of an image, the words in a text, or the fields in a database record).
  • \( \mathbf{w} \) (weight vector): the importance or influence each input feature has on the final decision.
  • \( \mathbf{b} \) (bias): a constant offset or threshold that helps shift the decision boundary.

We combine all this information into a single numeric value called the net input, denoted \( z \). Mathematically:

\[ z = \mathbf{w}^T \mathbf{x} + b \]

But don’t worry if the math notation looks scary! Let’s break down the pieces:

  1. \( \mathbf{w}^T \mathbf{x} \) is called a dot product (see below).
  2. + b just means we add the bias to that dot product.

The Dot Product (a.k.a. Vector Multiplication)

When we say \( \mathbf{w}^T \mathbf{x} \), we’re doing a dot product:

  • A dot product takes two lists (vectors) of numbers of the same length and multiplies each pair of numbers, then sums everything up.

Learn by example, step by step!

Let's break down the example step by step to understand how the dot product and bias are used in the context of a perceptron.

Given:

  • Weight Vector \( \mathbf{w} = \begin{bmatrix} 0.5 \\ 0.2 \\ -0.1 \end{bmatrix} \)
  • Input Vector \( \mathbf{x} = \begin{bmatrix} 2 \\ 4 \\ 1 \end{bmatrix} \)
  • Bias \( b = 0 \) (for simplicity, we assume no bias here)

Objective:

Calculate the net input \( z \) using the formula:

\[ z = \mathbf{w}^T \mathbf{x} + b \]

Step 1: Transpose the Weight Vector

The weight vector \( \mathbf{w} \) is transposed to turn it from a column vector into a row vector.

\[ \mathbf{w}^T = [0.5 \quad 0.2 \quad -0.1] \]

Step 2: Perform the Dot Product

The dot product of \( \mathbf{w}^T \) and \( \mathbf{x} \) is calculated as follows:

\[ \mathbf{w}^T \mathbf{x} = 0.5 \times 2 + 0.2 \times 4 + (-0.1) \times 1 \]

Let's compute each term individually:

  • \( 0.5 \times 2 = 1.0 \)
  • \( 0.2 \times 4 = 0.8 \)
  • \( -0.1 \times 1 = -0.1 \)

Now sum these results:

\[ 1.0 + 0.8 - 0.1 = 1.7 \]

So, the dot product \( \mathbf{w}^T \mathbf{x} \) results in \( 1.7 \).

Step 3: Add the Bias

Since the bias \( b \) is assumed to be zero:

\[ z = \mathbf{w}^T \mathbf{x} + b = 1.7 + 0 = 1.7 \]

Thus, the net input \( z \) is \( 1.7 \).

Summary Example

Given:

  • \( \mathbf{w} = \begin{bmatrix} 0.5 \\ 0.2 \\ -0.1 \end{bmatrix} \)
  • \( \mathbf{x} = \begin{bmatrix} 2 \\ 4 \\ 1 \end{bmatrix} \)
  • \( b = 0 \)

Steps:

  1. Transpose \( \mathbf{w} \): \[ \mathbf{w}^T = [0.5 \quad 0.2 \quad -0.1] \]
  2. Compute the dot product: \[ \mathbf{w}^T \mathbf{x} = 0.5 \times 2 + 0.2 \times 4 + (-0.1) \times 1 = 1.0 + 0.8 - 0.1 = 1.7 \]
  3. Add the bias: \[ z = 1.7 + 0 = 1.7 \]

Final Result

The net input \( z \) is \( 1.7 \).

This process shows how the weighted inputs are combined with the bias to produce a single value that can be used for decision-making in the perceptron model. If you have a specific threshold or activation function, you would then apply that to \( z \) to get the final output classification.
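
If you'd like to verify the arithmetic yourself, here is a quick check in NumPy, using the same numbers as the example above:

```python
import numpy as np

w = np.array([0.5, 0.2, -0.1])  # weight vector
x = np.array([2.0, 4.0, 1.0])   # input (feature) vector
b = 0.0                          # bias

z = np.dot(w, x) + b             # 0.5*2 + 0.2*4 + (-0.1)*1 + 0
print(z)                         # ≈ 1.7
```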

Figure 2: 3D Visualization of Weight and Input Vectors in a Perceptron Model.

We can now state the definition in general terms. Given:

  • A weight vector \( \mathbf{w} \in \mathbb{R}^m \),
  • A feature vector \( \mathbf{x} \in \mathbb{R}^m \),
  • A bias term \( b \in \mathbb{R} \),

the net input \( z \) is defined as:

\[ z = \mathbf{w}^T \mathbf{x} + b. \]

where:

  • \( \mathbf{w} = \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_m \end{bmatrix} \),
  • \( \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_m \end{bmatrix} \),
  • \( b \) is a scalar (real number).

Check out the code and output, and try running the notebook yourself on Colab:
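
The notebook itself isn't embedded in this text, but a minimal sketch of what such a cell might contain is shown below. The function name `net_input` and the example values are my own choices for illustration; the formula is the one defined above and works for any number of features \( m \).

```python
import numpy as np

def net_input(w: np.ndarray, x: np.ndarray, b: float) -> float:
    """Compute z = w^T x + b for a weight vector w, a feature vector x, and a bias b."""
    return float(np.dot(w, x) + b)

# Works for any dimension m, e.g. m = 4 here (values chosen arbitrarily):
w = np.array([0.4, -0.3, 0.1, 0.05])
x = np.array([1.0, 2.0, 3.0, 4.0])
print(net_input(w, x, b=0.5))  # 0.4 - 0.6 + 0.3 + 0.2 + 0.5 ≈ 0.8
```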

Dot Product Refresher

If we have two column vectors,

\[ a = \begin{bmatrix} a_1 \\ a_2 \\ a_3 \end{bmatrix}, \quad b = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}, \]

then the dot product \( a^T b \) is:

\[ a^T b \;=\; a_1 b_1 \;+\; a_2 b_2 \;+\; a_3 b_3. \]

Taking the transpose \( a^T \) turns a column vector into a row vector, which is how we can multiply it (as a 1Ă—n row) with \( b \) (as an nĂ—1 column).

For example, if:

\[ \mathbf{w} = \begin{bmatrix} w_1 \\ w_2 \\ w_3 \end{bmatrix} \quad \text{and} \quad \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}, \]

then

\[ \mathbf{w}^T \mathbf{x} = w_1 \cdot x_1 + w_2 \cdot x_2 + w_3 \cdot x_3. \]

If you’re used to thinking in code, it’s roughly:

dot_product = w[0]*x[0] + w[1]*x[1] + w[2]*x[2]

This Python snippet demonstrates how the dot product works: multiply corresponding elements of the two lists, then sum them up.

The symbol \(\mathbf{w}^T\) just means we’re taking the “transpose” of \(\mathbf{w}\) to convert it from a column vector into a row vector—but in practical coding terms, it’s basically how we align the two lists for multiplication and summation.
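
To make the "alignment" point concrete, here is a small comparison (my own illustration) of explicit column vectors versus plain 1-D arrays in NumPy. With 2-D column vectors you really do need the transpose; with 1-D arrays, `np.dot` aligns the entries for you:

```python
import numpy as np

# Column vectors of shape (3, 1), matching the math notation:
w_col = np.array([[0.5], [0.2], [-0.1]])
x_col = np.array([[2.0], [4.0], [1.0]])
z_mat = w_col.T @ x_col          # (1x3) @ (3x1) -> a 1x1 matrix holding ≈ 1.7

# Plain 1-D arrays: no explicit transpose needed
w = np.array([0.5, 0.2, -0.1])
x = np.array([2.0, 4.0, 1.0])
z = np.dot(w, x)                 # scalar, ≈ 1.7

print(z_mat[0, 0], z)
```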

2. From Threshold to Bias

Originally, the perceptron was expressed using a threshold \( \theta \), with the net input taken as just the weighted sum \( z = \mathbf{w}^T \mathbf{x} \) (no separate bias yet):

  • If \( z \geq \theta \), output 1 (Class 1).
  • If \( z < \theta \), output 0 (Class 0).

We often rewrite \( \theta \) as a bias \( b \) by setting:

\[ b = -\theta. \]

Hence,

\[ z - \theta \geq 0 \quad \Longleftrightarrow \quad z + b \geq 0 \quad \text{(since } b = -\theta\text{)}. \]

Rewriting the threshold as a bias simplifies the expression and keeps all important components in one place.

We generally prefer \( z + b \geq 0 \) because it looks simpler, and we can keep all the important components in one expression (\( \mathbf{w}^T \mathbf{x} + b \)).
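
As a quick numerical sanity check (values chosen arbitrarily for illustration), comparing the raw weighted sum against \( \theta \) gives exactly the same decision as comparing \( \mathbf{w}^T \mathbf{x} + b \) against 0 once we set \( b = -\theta \):

```python
import numpy as np

w = np.array([0.5, 0.2, -0.1])
x = np.array([2.0, 4.0, 1.0])
theta = 1.5            # an arbitrary threshold
b = -theta             # the threshold rewritten as a bias

weighted_sum = np.dot(w, x)              # ≈ 1.7
print(weighted_sum >= theta)             # True
print(weighted_sum + b >= 0)             # True -- the same decision
```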

3. The Decision Function

After we compute the net input \( z \), the perceptron applies a very simple function—often called a “step function” or “activation function”—to decide which class the input should belong to.

The perceptron uses a simple unit step function, often denoted as \( \sigma(\cdot) \), to map the net input \( z \) to either 0 or 1:

\[ \sigma(z) = \begin{cases} 1 & \text{if } z \geq 0, \\ 0 & \text{otherwise}. \end{cases} \]

So, combining everything:

  1. Compute \( z = \mathbf{w}^T \mathbf{x} + b \).
  2. Output \( \sigma(z) \), where
    • \( \sigma(z) = 1 \) if \( z \geq 0 \),
    • \( \sigma(z) = 0 \) if \( z < 0 \).

Visually, you can think of \( \mathbf{w}^T \mathbf{x} + b = 0 \) as defining a decision boundary in the input space.

Points for which \( z \geq 0 \) are classified into one category (Class 1), and points for which \( z < 0 \) are classified into the other category (Class 0).
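
Putting the two steps together, a minimal perceptron forward pass might look like the sketch below (the function name and the weight and bias values are my own, purely for illustration):

```python
import numpy as np

def predict(w: np.ndarray, x: np.ndarray, b: float) -> int:
    """Perceptron forward pass: net input followed by the unit step function."""
    z = np.dot(w, x) + b           # step 1: net input
    return 1 if z >= 0 else 0      # step 2: sigma(z)

w = np.array([0.5, 0.2, -0.1])
b = -1.0
print(predict(w, np.array([2.0, 4.0, 1.0]), b))  # z ≈ 0.7  -> 1 (Class 1)
print(predict(w, np.array([0.0, 1.0, 5.0]), b))  # z = -1.3 -> 0 (Class 0)
```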

4. Geometric Interpretation

If your inputs \( \mathbf{x} \) have only 2 features (\( x_1, x_2 \)), you can imagine all the points \( (x_1, x_2) \) on a 2D plane. The equation \( \mathbf{w}^T \mathbf{x} + b = 0 \) (or \( w_1 x_1 + w_2 x_2 + b = 0 \)) represents a straight line dividing the plane into two half-spaces:

  • On one side of the line, \( \mathbf{w}^T \mathbf{x} + b \geq 0 \) \( \rightarrow \) output = 1.
  • On the other side, \( \mathbf{w}^T \mathbf{x} + b < 0 \) \( \rightarrow \) output = 0.

In higher dimensions (more features), it’s a “hyperplane,” but the idea is the same.

  • The vector \( \mathbf{w} \) is normal (perpendicular) to the decision boundary (a hyperplane in \( \mathbb{R}^m \)).
  • The bias \( b \) shifts this decision boundary away from the origin.
Figure 3: Visualizing the Geometric Interpretation of a Linear Decision Boundary in 2D.

Check out the code and output, and try running the notebook yourself on Colab:
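
The notebook isn't reproduced here, but a minimal matplotlib sketch of this picture, with weights and a bias chosen by hand just for illustration, could look like this:

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative 2D weights and bias (not learned from data)
w = np.array([1.0, -2.0])
b = 0.5

# Evaluate w^T x + b on a grid of (x1, x2) points
xx, yy = np.meshgrid(np.linspace(-3, 3, 200), np.linspace(-3, 3, 200))
z = w[0] * xx + w[1] * yy + b

# Shade the two half-spaces and draw the line w^T x + b = 0
plt.contourf(xx, yy, (z >= 0).astype(float), levels=[-0.5, 0.5, 1.5], alpha=0.3, cmap="coolwarm")
plt.contour(xx, yy, z, levels=[0], colors="black")
plt.xlabel("$x_1$")
plt.ylabel("$x_2$")
plt.title("Decision boundary: $w_1 x_1 + w_2 x_2 + b = 0$")
plt.show()
```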

5. Quick Recap of Math Terms

  • Vector: A list (or array) of numbers.
  • Transpose (\( \mathbf{v}^T \)): Turning a column vector into a row vector (or vice versa). In code, think of it as how you shape your array.
  • Dot Product: Multiply corresponding elements of two vectors, then sum the results (alignment measure).
  • Bias (\( b \)): Shifts the decision boundary away from the origin, sometimes called the “intercept” in linear regression.
  • Step Function (\( \sigma(z) \)): Returns 1 if \( z \ge 0 \), else 0.

6. Example: Email Spam Filter

Let’s imagine we have two features for emails:

  1. \( x_1 \) = count of suspicious words
  2. \( x_2 \) = number of links in the email

We learn weights \( \mathbf{w} = (w_1, w_2) \) and a bias \( b \). Then:

\[ z = w_1 \cdot (\text{suspicious word count}) + w_2 \cdot (\text{link count}) + b. \]

If \( z \ge 0 \), classify it as spam; otherwise, not spam.
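
With weights and a bias chosen by hand here purely for illustration (in practice they would be learned from labeled emails), the whole rule fits in a few lines:

```python
import numpy as np

def is_spam(suspicious_words: int, link_count: int) -> bool:
    """Toy spam rule: z = w1 * suspicious_words + w2 * link_count + b; spam if z >= 0."""
    w = np.array([0.8, 0.5])   # hypothetical weights
    b = -2.0                   # hypothetical bias
    z = np.dot(w, np.array([suspicious_words, link_count])) + b
    return bool(z >= 0)

print(is_spam(suspicious_words=3, link_count=1))  # z = 0.9  -> True  (spam)
print(is_spam(suspicious_words=0, link_count=1))  # z = -1.5 -> False (not spam)
```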

The term "perceptron" refers to a fundamental unit in machine learning that mimics the behavior of a biological neuron. By combining inputs with learned weights and applying a decision rule, it forms the basis for more advanced neural networks.
