Machine Learning Algorithm: Logistic Regression

Logistic Regression is a Linear Model for classification. Where a traditional linear model predicts a numerical value, a logistic model predicts which category an example belongs to; many e-mail providers use such a model to determine whether an e-mail is Spam or Ham.

Understand how Logistic Regression works and how it differs from Linear Regression.

By Alan Lehane, Developer

In this blog, you will learn about Logistic Regression, another type of Linear Algorithm. I recommend reading my previous blog on Linear Regression to understand this blog fully.

Read: Linear Regression

What is Logistic Regression?

Logistic Regression is a Linear Model for classification; a traditional linear model is used to predict a numerical value, whereas a logistic model is used to predict into which category an example belongs.

An example of a Logistic Regression Model is a system many e-mail providers use to determine whether an e-mail is Spam or Ham.

Spam

Spam is junk e-mail. While there are several generally compatible definitions of “spam,” Spam is defined by the SpamAssassin project as Unsolicited Bulk E-mail (UBE).

From cwiki.apache.org

Ham

“Ham” is an e-mail that is not Spam.

In other words, “non-spam” or “good mail”. It should be considered a shorter, snappier synonym for “non-spam”. Its usage is particularly common among anti-spam software developers and not widely known elsewhere; generally, it is probably better to use the term “non-spam” instead.

From cwiki.apache.org

The e-mail details are given to the model, and the model categorises the e-mail as Spam or Ham and moves the mail into the appropriate folder.

How does Logistic Regression work?

A Logistic Regression algorithm calculates the probability of an example belonging to each category.

The algorithm structure is the same as a Linear Algorithm, which I covered in a previous blog. Each input variable is assigned a coefficient; the difference is that the weighted sum is passed through the logistic (sigmoid) function, which confines the result between 0 and 1.

This answer is the probability of the current example being a member of the default category.

y = 1 / (1 + e^{-(C_1 x_1 + C_2 x_2 + C_3 x_3)})
0 < y < 1

In the spam detection example, y is the probability that an e-mail is Spam. For instance, if y = 0.7, there is a 70% chance that the mail is Spam.
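
To make the probability calculation concrete, here is a minimal Python sketch. It assumes the standard logistic (sigmoid) function is what squeezes the weighted sum into the range 0 to 1. The coefficient values, the three e-mail features and the 0.5 cut-off are all made up for illustration; they are not taken from any real spam filter.

```python
import math

def sigmoid(z):
    """Map any real number to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(coefficients, inputs):
    """Weighted sum of the inputs, passed through the logistic function."""
    z = sum(c * x for c, x in zip(coefficients, inputs))
    return sigmoid(z)

# Hypothetical coefficients (C_1, C_2, C_3) and features (x_1, x_2, x_3).
coefficients = [1.2, 0.8, -0.5]
email_features = [1.0, 0.5, 1.5]

y = predict_probability(coefficients, email_features)

# Assumed decision rule: treat 0.5 as the boundary between the two categories.
label = "Spam" if y >= 0.5 else "Ham"
print(f"P(Spam) = {y:.2f}, P(Ham) = {1 - y:.2f} -> {label}")
```

With these made-up numbers the weighted sum is 0.85, which the logistic function turns into a probability of roughly 0.70, so the e-mail would be filed as Spam.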

Training

A Logistic Regression model is trained in the same way as a Linear Regression model: the model iterates through a training set of known examples, adjusting the coefficients when predictions are incorrect.
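
The training loop described above could look something like the following sketch. It assumes a simple per-example gradient step, where each coefficient is nudged in proportion to the prediction error; this is one common way to fit the model, not necessarily what any particular library does. The tiny training set, the learning rate and the number of passes are invented for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(coefficients, inputs):
    return sigmoid(sum(c * x for c, x in zip(coefficients, inputs)))

def train(examples, labels, learning_rate=0.1, epochs=1000):
    """Repeatedly pass over the training set, adjusting coefficients by the error."""
    coefficients = [0.0] * len(examples[0])
    for _ in range(epochs):
        for inputs, target in zip(examples, labels):
            # The error is near zero when the prediction is already close to the known label.
            error = target - predict(coefficients, inputs)
            coefficients = [c + learning_rate * error * x
                            for c, x in zip(coefficients, inputs)]
    return coefficients

# Made-up data: [constant term, number of suspicious words]; 1 = Spam, 0 = Ham.
examples = [[1.0, 0.0], [1.0, 1.0], [1.0, 4.0], [1.0, 5.0]]
labels = [0, 0, 1, 1]

coefficients = train(examples, labels)
for inputs, target in zip(examples, labels):
    print(inputs, "known:", target, "predicted:", round(predict(coefficients, inputs), 3))
```

The larger the error on an example, the larger the adjustment, so confident wrong predictions move the coefficients the most.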

Data Preparation

Data Preparation is very similar to Linear Regression with some key differences.

Discrete Output Variable – Every output variable in the Training Set must belong to one of a fixed number of categories.
Noise Sensitivity – Logistic Regression is sensitive to outliers and incorrectly labelled data, so Data Quality standards need to be high.
Linear Relationship – It is assumed that there is a linear relationship between the input variables and the log-odds of the output.
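
As a small illustration of the first point, the sketch below checks that every label in a training set belongs to the fixed set of categories before training starts. The helper name `check_labels` and the sample labels are hypothetical.

```python
def check_labels(labels, allowed=("Spam", "Ham")):
    """Return (row index, label) for every label outside the allowed categories."""
    return [(i, label) for i, label in enumerate(labels) if label not in allowed]

# Made-up labels with two problems: inconsistent casing and a missing value.
labels = ["Spam", "Ham", "ham", "Spam", None]

problems = check_labels(labels)
print(problems)  # [(2, 'ham'), (4, None)]
```

Rows flagged this way would need to be corrected or dropped before the data is used.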

Types of Logistic Regression

The Linear Regression algorithms I discussed in my previous blog, such as Ordinary Least Squares and Gradient Descent, can also be applied to logistic problems. However, a more suitable method is Maximum-Likelihood Estimation, which chooses the coefficients that make the known training outcomes as probable as possible. This favours predictions that are close to the correct extreme of the prediction range.

For example, in our e-mail case the model is rewarded for predicting as close to 1 as possible for Spam and as close to 0 as possible for Ham, resulting in more definitive predictions; predictions close to 0.5 are discouraged.
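
One way to see why Maximum-Likelihood Estimation prefers definitive predictions is to compute the log-likelihood for a few sets of predictions. The sketch below assumes the usual binary log-likelihood (the sum of the log of the probability given to each outcome that actually occurred); the labels and probabilities are made up.

```python
import math

def log_likelihood(labels, probabilities):
    """Sum of log-probabilities the model gave to the outcomes that actually occurred."""
    total = 0.0
    for label, p in zip(labels, probabilities):
        total += math.log(p) if label == 1 else math.log(1.0 - p)
    return total

labels = [1, 1, 0, 0]                 # 1 = Spam, 0 = Ham

hesitant = [0.60, 0.55, 0.45, 0.40]   # correct side of 0.5, but only just
decisive = [0.95, 0.90, 0.10, 0.05]   # correct and close to the extremes
mistaken = [0.05, 0.10, 0.90, 0.95]   # close to the extremes, but wrong

ll_hesitant = log_likelihood(labels, hesitant)
ll_decisive = log_likelihood(labels, decisive)
ll_mistaken = log_likelihood(labels, mistaken)

print(ll_hesitant, ll_decisive, ll_mistaken)
```

A higher (less negative) value is better. The decisive predictions score highest, the hesitant ones lower, and the confident but wrong ones lowest of all, so extremes are only rewarded when they match the known label.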

Alan Lehane, Software Developer

Alan has worked with Aspira/emagine for several years as a Software Developer, specialising in Data Analytics and Machine Learning. He has provided various services to Aspira’s clients, including Software Development, Test Automation, Data Analysis and Machine Learning.
