Features and labels

In a nutshell

Features are the properties of the data that are used to predict labels. For example, if we use the Monthly Active Users (MAU) metric to predict sales, then

$\text{MAU} = \text{features}, \quad \text{sales} = \text{labels}$

Features explained

A synonym for features is properties.

For example, features of a person could be eye color, hairstyle, the shape of the nose or lips, etc. Machine Learning algorithms can use these features to recognize who you are (this is, in a nutshell, how smartphone facial recognition works).

When we are not trying to predict the identity behind a face, but instead something useful for our business, for example customer churn, we need to use different features. Good features for customer churn prediction might be demographic data, customer activity data, etc. Using these features, we would predict whether a new customer will churn or not (the label), based on the labels (yes/no) that our previous customers with similar feature values had.

As another example, say we have a loan business that estimates credit scores for new clients. The predicted credit scores are the labels. The client's record, demographic data, data concerning personal health, etc. are the features.
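To make the churn example above more concrete, here is a minimal sketch of how a small data set could be split into features and labels. The field names (`age`, `logins_last_month`, `support_tickets`, `churned`) and all values are hypothetical and chosen only for illustration.

```python
# Hypothetical customer records; every field name and value is made up.
customers = [
    {"age": 34, "logins_last_month": 12, "support_tickets": 0, "churned": "no"},
    {"age": 51, "logins_last_month": 1, "support_tickets": 3, "churned": "yes"},
    {"age": 27, "logins_last_month": 8, "support_tickets": 1, "churned": "no"},
]

LABEL_COLUMN = "churned"  # the value we want to predict


def split_features_and_labels(records, label_column):
    """Return (features, labels): one feature dict and one label per record."""
    features = [
        {name: value for name, value in record.items() if name != label_column}
        for record in records
    ]
    labels = [record[label_column] for record in records]
    return features, labels


features, labels = split_features_and_labels(customers, LABEL_COLUMN)
print(features[0])  # {'age': 34, 'logins_last_month': 12, 'support_tickets': 0}
print(labels)       # ['no', 'yes', 'no']
```

Everything except the `churned` column is treated as a feature here; in practice you would keep only the columns you expect to be relevant.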

Which features should I use?

Precisely which features you should use to make your predictions will depend on the type of your business, the problem at hand, your expertise, and of course, availability of the data.

In practice, if you are on top of your operations, you can make a reasonable judgment as to which features should affect the values you are predicting. Think of what you would logically look at to predict the label.

Labels explained

For regression problems, labels are the numeric values associated with a given set of features. For classification problems, labels are the classes of the data points.

For example, when predicting user growth based on past user activity, the number of users recorded for each data point is the label.

When predicting customer churn based on various features, whether or not the given client was lost is the label (it can take two values: yes or no).
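The difference between the two kinds of labels can be sketched in a few lines of Python. The data below is hypothetical, and `problem_type` is an illustrative helper built on a rough heuristic, not a rule any particular tool follows.

```python
# Hypothetical feature: weekly sessions per account.
weekly_sessions = [3, 10, 1, 7]

# Regression: each label is a number (here, made-up monthly spend).
spend_labels = [20.0, 95.5, 0.0, 60.0]

# Classification: each label is one of a fixed set of classes.
churn_labels = ["no", "no", "yes", "no"]


def problem_type(labels):
    """Rough heuristic: numeric labels suggest regression, anything else classification."""
    is_numeric = all(
        isinstance(value, (int, float)) and not isinstance(value, bool)
        for value in labels
    )
    return "regression" if is_numeric else "classification"


print(problem_type(spend_labels))  # regression
print(problem_type(churn_labels))  # classification
```

Note that this heuristic would misread classes encoded as numbers (for example 0/1 for no/yes), so the problem type is ultimately something you decide, not something read off the data.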

Features and labels in Machine Learning models

When we use Machine Learning to make predictions, we typically look at what the features and labels in the data we already have (for example, historical performance data) can tell us about the labels of new data points (for example, new clients).

In other words, we look at how a certain combination of factors (features) combines to produce a certain effect (label).
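One very simple way to turn this idea into code is a nearest-neighbour lookup: give a new data point the label of the most similar data point we have already seen. The sketch below uses hypothetical features (`logins_last_month`, `support_tickets`) and made-up values. It is one illustrative technique among many, not necessarily the one any given tool applies.

```python
import math

# Hypothetical past customers: (logins_last_month, support_tickets) and their churn label.
past_features = [(12, 0), (1, 3), (8, 1), (2, 4)]
past_labels = ["no", "yes", "no", "yes"]


def predict_label(new_features, known_features, known_labels):
    """Return the label of the known data point closest to new_features."""
    distances = [math.dist(new_features, known) for known in known_features]
    nearest_index = distances.index(min(distances))
    return known_labels[nearest_index]


# An active customer with no support tickets resembles customers who stayed.
print(predict_label((10, 0), past_features, past_labels))  # no
# An inactive customer with several tickets resembles customers who left.
print(predict_label((0, 3), past_features, past_labels))   # yes
```

With features on very different scales you would normally rescale them first, since the raw Euclidean distance used here lets the largest-valued feature dominate.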

For example, if we observe the following pattern:

We can quickly deduce (for example, by trial and error or a simple calculation) that the revenue depends on MAU as

$\text{revenue} = 1.2\times \text{MAU}+1000$

In this case, if we know (based, for example, on our expertise) that our MAU is expected to increase to 750 next month, we can apply this simple formula to forecast revenue to be

$\text{revenue} = 1.2\times 750+1000=1900$
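The "simple calculation" can also be automated. As a minimal sketch, the code below fits a straight line by ordinary least squares and then evaluates it at MAU = 750. The observations are hypothetical, constructed to follow $\text{revenue} = 1.2\times \text{MAU}+1000$ exactly, and `fit_line` is an illustrative helper name.

```python
# Hypothetical observations, built to follow revenue = 1.2 * MAU + 1000 exactly.
mau = [100, 200, 300, 500]
revenue = [1120, 1240, 1360, 1600]


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(mau, revenue)
forecast = slope * 750 + intercept  # feature value for next month: MAU = 750

print(round(slope, 2), round(intercept, 2), round(forecast, 2))  # 1.2 1000.0 1900.0
```

Here MAU is the feature, revenue is the label, and the fitted slope and intercept recover the coefficients 1.2 and 1000. Real data is noisy, so the fit would only approximate the underlying relationship.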

MagicSheets lets you conduct similar analyses for much more complicated situations and on much larger data sets, all in a matter of seconds.
