Advanced: algorithm settings

Table of contents

  1. Advanced: algorithm settings
    1. “Set intercept = 0”
      1. In practice
    2. Regularization strength
      1. In practice
    3. Degree
      1. In practice
    4. Tolerance
      1. In practice
    5. Number of estimators
      1. In practice

Algorithm settings let you tune how an algorithm runs. Some algorithms perform poorly with their default settings, for example on small data sets, and a small change to the settings can fix that. In other situations, tweaking the settings can improve your score, reduce prediction errors, or increase accuracy.

1. “Set intercept = 0”

Use for: Linear Regression

Linear models fit a straight line to your data set. Sometimes you have reason to believe that this line should pass through 0. For example, if your model is

\[\text{Unique visitors} = \text{marketing budget} \times \text{conversion rate} + \text{free UV}\]

we might believe that if we spend nothing on marketing, the \(\text{free UV}\) (unique visitors who reach our website when we do nothing to attract them) should equal 0. In this situation we force the straight line to pass through 0 and fit only the slope to the data.

In practice

  • Set intercept = 0 when you believe that, in the situation you are analyzing, the predicted value should be 0 when all features are 0.
  • Set intercept = 0 when the model predicts negative values for a quantity that cannot be negative (for example, a click-through rate or a number of site views).
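
To make the effect of this setting concrete, here is a minimal sketch of a one-feature least-squares fit with and without an intercept. The function name `fit_line`, the `fit_intercept` flag, and the data are made up for illustration; they are not taken from the product itself.

```python
def fit_line(xs, ys, fit_intercept=True):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    n = len(xs)
    if fit_intercept:
        mean_x = sum(xs) / n
        mean_y = sum(ys) / n
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        sxx = sum((x - mean_x) ** 2 for x in xs)
        slope = sxy / sxx
        return slope, mean_y - slope * mean_x
    # Intercept fixed at 0: the line is forced through the origin.
    slope = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    return slope, 0.0


# Hypothetical data: marketing budget vs. unique visitors.
budget = [1.0, 2.0, 3.0, 4.0]
visitors = [1.0, 5.0, 7.0, 11.0]

free_slope, free_intercept = fit_line(budget, visitors)
zero_slope, zero_intercept = fit_line(budget, visitors, fit_intercept=False)

print(free_slope, free_intercept)  # the unconstrained fit has a negative intercept
print(zero_slope, zero_intercept)  # the constrained fit predicts 0 visitors at budget 0
```

With this data the unconstrained line predicts a negative number of visitors when the budget is 0, which is the kind of impossible value the second bullet describes.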

2. Regularization strength

Use for: SVM Regression, SVM Classification, Logistic Classification

The stronger the regularization, the less likely the model is to overfit. In other words, with weaker regularization the model fits the existing data more closely, but the risk that it fits new data points poorly also increases.

In practice

  • If the model’s score is unsatisfactory, try lowering the regularization strength.

  • For smaller data sets, the regularization strength should be higher.

  • For larger data sets, you can relax the regularization requirements and set it to a lower value: the more data points you have, the more certain you can be that the model will fit well without the regularization “trick”.
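
As an illustration of what regularization strength does, the sketch below trains a tiny one-feature logistic classifier by gradient descent with an L2 penalty. It is not the product's implementation: `fit_logistic`, its `strength` parameter, and the data are hypothetical. Note also that some libraries expose the inverse of the strength instead (scikit-learn's `C`, for example), so check which convention your tool uses.

```python
import math


def fit_logistic(xs, ys, strength, steps=2000, lr=0.1):
    """One-feature logistic regression with an L2 penalty strength * w**2 / 2."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w, grad_b = 0.0, 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))  # predicted probability
            grad_w += (p - y) * x / n
            grad_b += (p - y) / n
        grad_w += strength * w  # the penalty pulls the weight towards 0
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical, perfectly separable data set.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 0, 1, 1, 1]

w_weak, _ = fit_logistic(xs, ys, strength=0.01)
w_strong, _ = fit_logistic(xs, ys, strength=1.0)

print(w_weak, w_strong)  # the strongly regularized weight is the smaller one
```

A smaller weight means a flatter, more cautious decision curve: the strongly regularized model commits less to the training points, which is why it tends to be the safer choice on small data sets.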

3. Degree

Use for: SVM Regression

Sets the degree of the polynomial function used to fit the data.

In practice

  • Degree = 1 fits a straight line, as the linear regression model does.

  • Degree = 2 means that we fit a parabola to the data.

  • Degree > 2 means that we fit a higher-order polynomial to the data.
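
In many SVM implementations the degree is the exponent of a polynomial kernel. Assuming that is also the case here, a minimal sketch of such a kernel looks like this; `gamma` and `coef0` are common names for the extra kernel parameters, used here as assumptions rather than as settings from this page.

```python
def polynomial_kernel(x, z, degree, gamma=1.0, coef0=1.0):
    """Return (gamma * <x, z> + coef0) ** degree for two equal-length vectors."""
    dot = sum(a * b for a, b in zip(x, z))
    return (gamma * dot + coef0) ** degree


x = [1.0, 2.0]
z = [3.0, 4.0]

linear = polynomial_kernel(x, z, degree=1, coef0=0.0)  # plain dot product
quadratic = polynomial_kernel(x, z, degree=2)          # adds squared and cross terms

print(linear, quadratic)
```

With degree 1 the kernel reduces to the ordinary dot product, which is why the fit is a straight line; higher degrees introduce products of features and so allow curved fits.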

4. Tolerance

Use for: SVM Regression, Logistic Classification, SVM Classification

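Tolerance is the stopping criterion for training: the algorithm stops optimizing once further improvement falls below this value.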

In practice

High tolerance means that the algorithm stops optimizing itself while it is still relatively far away from the “perfect data fit”.

Models trained with high tolerance might generalize better for new data points but have lower accuracy.

Low tolerance means that the algorithm keeps improving itself for a longer time, until the model is much closer to the “perfect data fit”.

Models with lower tolerance might fit the training data better but not generalize as well for new data.
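
The stopping behaviour described above can be sketched with plain gradient descent on a toy function. This is an illustration under simplifying assumptions, not the optimizer the algorithms actually use; `minimize` and its parameters are hypothetical.

```python
def minimize(tol, lr=0.1, max_iter=10_000):
    """Gradient descent on f(w) = (w - 3)**2; stops once a step is smaller than tol."""
    w = 0.0
    for iteration in range(1, max_iter + 1):
        step = lr * 2.0 * (w - 3.0)  # learning rate times the gradient
        w -= step
        if abs(step) < tol:
            break  # improvement is below the tolerance: stop early
    return w, iteration


w_loose, iters_loose = minimize(tol=1e-1)  # high tolerance
w_tight, iters_tight = minimize(tol=1e-6)  # low tolerance

print(w_loose, iters_loose)
print(w_tight, iters_tight)
```

The low-tolerance run takes more iterations and ends closer to the minimum at 3, while the high-tolerance run stops earlier and further away.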

5. Number of estimators

Use for: Random Forest Regression, Random Forest Classification

The Random Forest algorithm creates a family of small models (trees). Each small model is run on your data to create a prediction.

For regression problems, each prediction is a number. These predictions are then averaged to give you the prediction for the whole model (the forest).

For classification problems, each prediction is a class. The class predicted by the most trees is then the class predicted by the forest.

We call this family of small models an “ensemble model”.

The number of “trees” in the ensemble model “forest” is precisely the “number of estimators” parameter.
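
The way a forest combines its trees can be sketched as follows. The two helper functions and the example predictions are hypothetical; the sketch only shows the averaging and voting steps, not how the trees themselves are trained.

```python
from collections import Counter
from statistics import mean


def forest_regression(tree_predictions):
    """Average the numeric predictions of all trees."""
    return mean(tree_predictions)


def forest_classification(tree_predictions):
    """Return the class predicted by the most trees."""
    return Counter(tree_predictions).most_common(1)[0][0]


# Three estimators, so each list holds three tree predictions.
regression_result = forest_regression([10.0, 12.0, 14.0])
classification_result = forest_classification(["spam", "ham", "spam"])

print(regression_result, classification_result)
```

Here the length of each list plays the role of the “number of estimators” setting.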

In practice

The more data rows you have, the higher the number of estimators should be for a good prediction. Keep in mind that more trees also take longer to train.
