Category: Expert stories
Logistic Regression is a Linear Model for classification: whereas a traditional linear model predicts a numerical value, a logistic model predicts which category an example belongs to. Many e-mail providers use this approach to determine whether an e-mail is Spam or Ham.
Understand how Logistic Regression works and how it differs from Linear Regression.
By Alan Lehane, Developer
In this blog, you will learn about Logistic Regression, another type of Linear Algorithm. I recommend reading my previous blog on Linear Regression to understand this blog fully.
emagine’s Technical Machine Learning Series:
Logistic Regression is a Linear Model for classification. A traditional linear model is used to predict a numerical value, whereas a logistic model is used to predict which category an example belongs to.
An example of a Logistic Regression Model is a system many e-mail providers use to determine whether an e-mail is Spam or Ham.
Spam: Spam is junk e-mail. While there are several generally compatible definitions of “spam”, the SpamAssassin project defines it as Unsolicited Bulk E-mail (UBE).
Ham: “Ham” is e-mail that is not Spam.
In other words, it is “non-spam” or “good mail”, and it should be considered a shorter, snappier synonym for “non-spam”. The term is particularly common among anti-spam software developers but is not widely known elsewhere; in general, it is probably better to use the term “non-spam” instead.
The e-mail details are given to the model, and the model categorises the e-mail as Spam or Ham and moves the mail into the appropriate folder.
A Logistic Regression algorithm calculates the probability of an example belonging to each category.
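In a two-category problem like Spam vs Ham, the two probabilities must sum to 1, so the model only needs to output P(Spam); P(Ham) is then 1 - P(Spam). The sketch below shows how that probability might be turned into a folder decision. The function names and the 0.5 cut-off are hypothetical choices for illustration, not taken from any particular e-mail provider.

```python
def category_probabilities(p_spam: float) -> dict[str, float]:
    """Return the probability of each category, given P(Spam)."""
    # With only two categories, the probabilities must sum to 1.
    return {"Spam": p_spam, "Ham": 1.0 - p_spam}


def choose_folder(p_spam: float, threshold: float = 0.5) -> str:
    """Hypothetical decision rule: file the e-mail as Spam when P(Spam) >= threshold."""
    return "Spam" if p_spam >= threshold else "Ham"


print(category_probabilities(0.7))
print(choose_folder(0.7))  # Spam
print(choose_folder(0.2))  # Ham
```

Raising the threshold above 0.5 would make the filter more cautious about moving e-mails into the Spam folder, at the cost of letting more Spam through.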
The algorithm structure is the same as a Linear Algorithm, which I covered in a previous blog: each input variable is assigned a coefficient. The difference is that the output of the linear equation is passed through the logistic (sigmoid) function, which confines the result between 0 and 1.
This answer is the probability of the current example being a member of the default category.
$$y = \frac{1}{1 + e^{-(C_1 x_1 + C_2 x_2 + C_3 x_3)}}$$
In the spam-detection example, y is the probability that an e-mail is Spam. For instance, if y = 0.7, there is a 70% chance that the e-mail is Spam.
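The probability y is obtained by computing the linear combination C_1 x_1 + C_2 x_2 + C_3 x_3 and passing it through the logistic function 1 / (1 + e^(-z)). Here is a minimal Python sketch, assuming three numeric input features per e-mail. The coefficient and feature values are invented purely for illustration.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function: maps any real number into the range (0, 1)."""
    # Two branches avoid overflow in math.exp for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_probability(coefficients: list[float], features: list[float]) -> float:
    """Probability that one example belongs to the default category (here: Spam)."""
    # Linear part, as in Linear Regression: C_1*x_1 + C_2*x_2 + C_3*x_3
    z = sum(c * x for c, x in zip(coefficients, features))
    # The logistic function confines the result between 0 and 1.
    return sigmoid(z)


# Hypothetical coefficients and feature values, invented for illustration only.
coefficients = [0.8, -0.4, 1.2]
features = [1.0, 2.0, 0.5]
p = predict_probability(coefficients, features)
print(f"P(Spam) = {p:.2f}")  # P(Spam) = 0.65
```

With these made-up numbers, the linear part is 0.6 and the logistic function maps it to about 0.65, so this hypothetical e-mail would be more likely Spam than Ham.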
A Logistic Regression model is trained in the same way as a Linear Regression model: the model iterates through a training set of known examples, adjusting the coefficients according to how far its predictions are from the known categories.
Data preparation is very similar to that for Linear Regression, with some key differences.
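Below is a minimal sketch of that training loop. It assumes stochastic gradient descent with a fixed learning rate, which is one common choice and not necessarily the exact procedure from the previous blog. After each example, every coefficient is nudged by an amount proportional to the prediction error (y - p) and the input x. This update follows the gradient of the log-likelihood, the quantity that Maximum-likelihood Estimation maximises. The function name `train`, the learning rate, the epoch count and the toy data are all assumptions for illustration. The toy data consists of a constant 1.0 acting as an intercept plus an invented count of suspicious words.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train(examples: list[list[float]], labels: list[int],
          learning_rate: float = 0.1, epochs: int = 1000) -> list[float]:
    """Fit coefficients with stochastic gradient descent on the log loss.

    labels: 1 for Spam, 0 for Ham.
    """
    coefficients = [0.0] * len(examples[0])
    for _ in range(epochs):
        for x, y in zip(examples, labels):
            p = sigmoid(sum(c * xi for c, xi in zip(coefficients, x)))
            # Large when the prediction is badly wrong, close to 0 when it is right.
            error = y - p
            # Move each coefficient in the direction that increases the likelihood.
            for j in range(len(coefficients)):
                coefficients[j] += learning_rate * error * x[j]
    return coefficients


# Toy data (invented): [intercept, number of suspicious words]
examples = [[1.0, 3.0], [1.0, 4.0], [1.0, 5.0], [1.0, 0.0], [1.0, 1.0], [1.0, 0.0]]
labels = [1, 1, 1, 0, 0, 0]
coefficients = train(examples, labels)
for x, y in zip(examples, labels):
    p = sigmoid(sum(c * xi for c, xi in zip(coefficients, x)))
    print(x, "label:", y, "P(Spam):", round(p, 3))
```

Because the update is scaled by the error, a prediction that is already nearly correct barely changes the coefficients, while a confident wrong prediction changes them a lot.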
The algorithms I discussed in my previous blog on Linear Regression, such as Ordinary Least Squares and Gradient Descent, can also be applied to logistic problems. However, a more suitable method is Maximum-likelihood Estimation, which chooses the coefficients that make the known categories in the training set as probable as possible. This favours predictions that are close to the extremes of the prediction range and on the side of the true category. In our e-mail example, that means predictions as close to 1 (Spam) or 0 (Ham) as possible, resulting in more definitive predictions; predictions close to 0.5 are discouraged.
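Maximum-likelihood Estimation maximises the log-likelihood of the training labels. This is the sum over examples of y·log(p) + (1 - y)·log(1 - p), where y is 1 for Spam and 0 for Ham, and p is the predicted P(Spam). The sketch below compares three sets of invented predictions for the same four e-mails. Confident, correct predictions each contribute a value close to 0, the maximum. Hedged predictions near 0.5 contribute about log(0.5) ≈ -0.69 each. Confident wrong predictions are penalised heavily. The function name and all numbers are illustrative assumptions.

```python
import math


def log_likelihood(labels: list[int], probabilities: list[float]) -> float:
    """Sum of log-probabilities the model assigns to the true labels (1 = Spam, 0 = Ham)."""
    total = 0.0
    for y, p in zip(labels, probabilities):
        # Log of the probability the model gave to what actually happened.
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total


labels = [1, 0, 1, 0]
confident = [0.95, 0.05, 0.9, 0.1]          # near the extremes, on the correct side
hedged = [0.55, 0.45, 0.6, 0.4]             # close to 0.5
confidently_wrong = [0.05, 0.95, 0.1, 0.9]  # near the extremes, on the wrong side

for name, probs in [("confident", confident), ("hedged", hedged),
                    ("confidently wrong", confidently_wrong)]:
    print(f"{name:>17}: {log_likelihood(labels, probs):.3f}")
```

The confident, correct set scores highest. Maximising this value therefore pushes the coefficients towards definitive predictions that match the training labels.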
Alan Lehane, Software Developer
Alan has worked with Aspira/emagine for several years as a Software Developer, specialising in Data Analytics and Machine Learning. He has provided various services to Aspira’s clients, including Software Development, Test Automation, Data Analysis and Machine Learning.