Predicting numbers vs. predicting classes

Table of contents

  1. Predicting numbers vs. predicting classes
    1. Predicting numbers: regression
      1. In a nutshell
    2. Predicting classes: classification
      1. In a nutshell
    3. Visual comparison
      1. Regression problem
      2. Classification problem

Predicting numbers: regression

In a nutshell

Regression problems are about predicting a number, such as the next few values in a sequence.

Regression is all about using the data we have on hand to infer a numeric value we do not know yet, for example what number or numbers should come next in a sequence.

For example, we look at stock prices on different dates to predict what the stock price the next day might be.

Sometimes we use just one feature (or property) to make regression predictions.

For example, we might predict house prices depending on distance from the city center. We make a reasonable assumption that the farther away from the city center the house is located, the cheaper it should be.

\[\text{house price} = \text{distance from city center} \times (-\$0.25m) + \$10m\]

However, this is an extremely simplistic model, and if we want to make good predictions, we probably want something slightly more complicated, for example:

\[\text{house price} = \text{distance from city center} \times (-\$0.25m) + \text{number of bedrooms} \times \$1m + \text{total surface} \times \$0.01m + \$10m\]

And so on. We can add more features (properties such as distance to the city center, number of bedrooms, total surface in this case) to make this model more and more realistic, as we collect more data and see what works best.
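To make the idea of "adding features" concrete, here is a minimal Python sketch of a linear model written as a weighted sum. The weights, the base price, and the names `WEIGHTS`, `BASE_PRICE`, and `predict_house_price` are illustrative assumptions, not values fitted to real data and not part of MagicSheets. The distance weight is taken as negative to match the assumption that houses farther from the city center are cheaper.

```python
# Weights in millions of dollars per unit of each feature.
# These values are illustrative assumptions, not fitted to real data.
# The distance weight is negative so that price falls as distance grows.
WEIGHTS = {
    "distance_from_center": -0.25,
    "bedrooms": 1.0,
    "total_surface": 0.01,
}
BASE_PRICE = 10.0  # intercept, in millions of dollars (assumed)


def predict_house_price(house, weights=WEIGHTS, base_price=BASE_PRICE):
    """Return the predicted price in $m as a weighted sum of features."""
    return base_price + sum(weights[name] * house[name] for name in weights)


# A hypothetical house: 4 distance units out, 2 bedrooms, surface of 80.
house = {"distance_from_center": 4, "bedrooms": 2, "total_surface": 80}
print(predict_house_price(house))
```

In this sketch, adding a feature to the model amounts to adding one entry to the weights dictionary. In practice the weights would be learned from collected data rather than typed in by hand.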

A detailed example of implementing regression models in MagicSheets is available here.

Predicting classes: classification

In a nutshell

Classification problems are about predicting which class (or type) a new data point belongs to.

Classification is about figuring out, based on the data we have at hand, how our data should be split into groups and which group the new points are most similar to.

For example, we collect activity data about the users of our e-commerce platform. We figure out that users with a certain behavior pattern (for example, users who often cancel a purchase just before proceeding to the payment interface) usually end up leaving our platform within 3 months. We can use this data to predict that new users who also often change their mind about a purchase just before making the payment are likely to belong to this group.

In the above example we could use an extremely simple metric: the number of times a user abandons a purchase before making the payment. In some cases this might be too simple to give strong predictions about future user churn, but sometimes it is just enough.
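As a rough illustration of classifying with a single metric, the sketch below picks the abandoned-purchase count that best separates users who churned from users who stayed, then uses it as a cut-off for new users. The data is made up, and the names `best_threshold` and `predict_churn` are hypothetical; this is one simple way to do it, not a description of how MagicSheets works internally.

```python
def best_threshold(abandon_counts, churned):
    """Pick the abandoned-purchase count that best separates churners.

    A user is predicted to churn when their count is >= the threshold.
    The threshold with the most correct predictions on the history wins.
    """
    best, best_correct = None, -1
    for threshold in sorted(set(abandon_counts)):
        correct = sum(
            (count >= threshold) == label
            for count, label in zip(abandon_counts, churned)
        )
        if correct > best_correct:
            best, best_correct = threshold, correct
    return best


# Hypothetical history: abandoned purchases per user, and whether that
# user left the platform within 3 months (made-up data).
abandon_counts = [0, 1, 1, 2, 5, 6, 7, 9]
churned = [False, False, False, False, True, True, False, True]

threshold = best_threshold(abandon_counts, churned)


def predict_churn(count):
    """Predict churn for a new user from their abandoned-purchase count."""
    return count >= threshold


print(threshold, predict_churn(6), predict_churn(1))
```

Note that one user in the made-up history (7 abandoned purchases, did not churn) is misclassified by any single cut-off, which is the kind of case where extra features may help.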

In other cases, we might want to take into account more data - more features - concerning the users.

Regardless of how many features (properties) we want to use, MagicSheets makes it very simple to implement classification models and gives you predictions directly based on the data you have.

You can find a detailed walk-through of a classification case study here.

Visual comparison

Regression problem

Our data follows a certain trend and we are predicting the values for new data points. The grey points are the ones for which we predict the value on the trend line. We only consider a single feature in this case.

Classification problem

We know that our data comes from 2 different classes (2 different types), marked on the chart with distinct colors. We predict which class new (grey) points should belong to based on the pattern that we train our model to “see” in the data.
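One simple way to assign new points to one of two classes is a nearest-centroid rule: compute the average position of each class, then give a new point the label of the closest average. The sketch below uses made-up 2D points and hypothetical class names (`"blue"`, `"orange"`); it illustrates the idea only and is not necessarily the method MagicSheets uses.

```python
from math import dist


def centroid(points):
    """Return the mean position of a list of 2D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


# Hypothetical labelled data: two classes, each a list of (x, y) points.
training = {
    "blue": [(1.0, 1.2), (1.5, 0.8), (2.0, 1.5)],
    "orange": [(5.0, 5.5), (6.0, 4.8), (5.5, 6.1)],
}

# "Training" here is just computing one centroid per class.
centroids = {label: centroid(points) for label, points in training.items()}


def classify(point):
    """Return the label whose centroid is closest to the given point."""
    return min(centroids, key=lambda label: dist(point, centroids[label]))


# The "grey points": new observations with no known class.
print(classify((1.8, 1.0)), classify((5.2, 5.0)))
```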
