
Scores

Table of contents

  1. Scores
    1. R2 score (regression)
      1. In a nutshell
      2. What to do if my model has a low R2 value?
        1. Use a different model
        2. See if you can get more - or better - data
        3. If R² gets negative: adjust the number of k-folds in the “Advanced” tab
        4. Adjust the model’s hyper-parameters (settings) (advanced!)
        5. Do nothing
    2. Accuracy score (classification)
      1. In a nutshell
      2. What if my model has low accuracy?
        1. Use a different model
        2. See if you can get more - or better - data
        3. Do nothing
    3. Why do we care about the score?
      1. Why is my score bad?…


See how

R2 score (regression)

In a nutshell

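The higher the score, the better.

The R² score is usually a number between 0 and 1, although it can also drop below 0 (see “If R² gets negative” below).

A score of 0 means that your model predicts no better than simply guessing the average value every time, so it does not really work well with your dataset.

A score of 1 means that your model’s predictions match the data perfectly, so the model is likely to be the right choice for your dataset.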
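Under the hood, R² compares your model’s errors with the errors of the simplest possible “model”, one that always predicts the average: R² = 1 − SS_res / SS_tot, where SS_res is the sum of squared prediction errors and SS_tot is the sum of squared distances of the true values from their mean. A minimal plain-Python sketch (the function name `r2_score` is our own choice, not a MagicSheets feature) shows why 1 is perfect, 0 means “no better than the average”, and negative values are possible:

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # squared prediction errors
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)              # spread around the mean
    if ss_tot == 0:
        raise ValueError("R² is undefined when all true values are equal")
    return 1 - ss_res / ss_tot

y = [3.0, 5.0, 7.0, 9.0]
print(r2_score(y, [3.0, 5.0, 7.0, 9.0]))  # perfect predictions -> 1.0
print(r2_score(y, [6.0, 6.0, 6.0, 6.0]))  # always predicting the mean (6.0) -> 0.0
print(r2_score(y, [9.0, 7.0, 5.0, 3.0]))  # worse than guessing the mean -> -3.0
```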



What to do if my model has a low R2 value?

1. Use a different model

In the field “My algorithm of choice is…” you can select one of many algorithms (models) to make predictions. If you know a little about Machine Learning already or are willing to do some extra reading on the internet, you can pick the algorithm that seems most suitable for your case.

If you don’t, just experiment with different algorithms on the list and see which one will produce predictions with the best R² score.

With MagicSheets you can test all available models in under 1 minute and see what works best!
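MagicSheets runs this comparison for you. To make the idea concrete, here is a minimal plain-Python sketch of “try several models, keep the one with the best R²”. The two toy models (always predicting the training average, and a least-squares straight line), the function names and the data are all illustrative assumptions, not what MagicSheets runs internally:

```python
def r2_score(y_true, y_pred):
    """R² = 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot

def fit_mean(xs, ys):
    """Baseline 'model': ignore x and always predict the training average."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_line(xs, ys):
    """Ordinary least-squares straight line y = a + b*x."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x

def pick_best_model(candidates, train, test):
    """Fit every candidate on the training data and score it on the test data."""
    (x_tr, y_tr), (x_te, y_te) = train, test
    scores = {}
    for name, fit in candidates.items():
        model = fit(x_tr, y_tr)
        scores[name] = r2_score(y_te, [model(x) for x in x_te])
    best = max(scores, key=scores.get)  # highest R² wins
    return best, scores

# Illustrative made-up data: y is roughly 2*x + 1 with a little noise.
train = ([1, 2, 3, 4, 5, 6], [3.1, 4.9, 7.2, 8.8, 11.1, 13.0])
test = ([7, 8], [15.2, 16.9])
best, scores = pick_best_model({"mean": fit_mean, "line": fit_line}, train, test)
print(best, scores)
```

The key design choice is scoring each model on data it was not trained on; otherwise every model looks better than it really is.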

2. See if you can get more - or better - data

If the model does not train well (that is, when R² value is far from perfect), sometimes the data itself is the culprit!

If your model has too few data points to learn from, or if the data is “messy” (spread around chaotically with high variation), the model might not fit your data well.

Try to collect more data, or see if the data you already have can be adjusted or cleaned in some way.

One of the easiest fixes, for example, is to remove data points that seem “too far away” from the dataset mean. Be careful: you should not delete data points arbitrarily just because they look suspicious, but removing one or two extreme points from a set of hundreds of data points can often be a smart thing to do.
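One simple way to measure “too far away” is to count how many standard deviations each point lies from the mean (its z-score). The sketch below uses Python’s standard `statistics` module; the function name and threshold values are illustrative assumptions, not a MagicSheets feature, and it only reports suspicious points rather than deleting them:

```python
import statistics

def flag_outliers(values, threshold=3.0):
    """Return indices of values more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (needs at least 2 values)
    if sd == 0:
        return []  # all values are identical, nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mean) / sd > threshold]

values = [10, 11, 9, 10, 12, 10, 11, 9, 10, 50]
print(flag_outliers(values, threshold=2.0))  # flags index 9 (the value 50)
print(flag_outliers(values))                 # the default threshold of 3 flags nothing here
```

With only 10 values, no point can lie more than about 2.85 sample standard deviations from the mean, so a threshold of 3 can never be reached; that is why the example also tries 2. Whether a flagged point should really be removed is still your call.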

3. If R² gets negative: adjust the number of k-folds in the “Advanced” tab

This trick might work for a small dataset (fewer than about 30 data points).

This sounds fancy but can be a very simple way to fix your score when it goes “crazy” and drops below 0.
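Why can the number of folds matter? In k-fold cross-validation the data is split into k parts (“folds”); each fold takes one turn as the test set while the model trains on the others, and the fold scores are usually averaged. With 20 data points and 5 folds, every test set holds just 4 points, and an R² computed on so few values can swing wildly or drop below 0, because the spread around the mean that it divides by comes from only those few points. The plain-Python sketch below (our own function name, contiguous folds without shuffling for simplicity) only shows how the splits are formed; standard k-fold needs at least 2 folds, and the sketch does not try to reproduce what MagicSheets does with a “k-folds” value of 1:

```python
def k_fold_splits(n, k):
    """Yield (train_indices, test_indices) for k contiguous folds over n points."""
    if not 2 <= k <= n:
        raise ValueError("k must be at least 2 and at most the number of data points")
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size

# 20 data points, 5 folds: each model trains on 16 points and is tested on only 4.
for train, test in k_fold_splits(20, 5):
    print(f"train on {len(train)} points, test on {test}")
```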

  • Go to the “Advanced” tab.
  • Set “k-folds” to 1.
  • Re-run your algorithm and check the score again.

4. Adjust the model’s hyper-parameters (settings) (advanced!)

If you understand a little bit about Machine Learning, you can adjust the hyper-parameters (AKA settings) of your model, such as the regularization level of the Support Vector Machine model.
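Tuning a hyper-parameter usually means trying a few values and keeping the one that scores best on data the model was not trained on. The sketch below does not reproduce MagicSheets’ Support Vector Machine settings; as a simpler stand-in it tunes the regularization strength `lam` of a one-variable ridge regression (a straight line whose slope is pulled toward 0 as `lam` grows). The function names, candidate values and data are illustrative assumptions:

```python
def r2_score(y_true, y_pred):
    """R² = 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot

def fit_ridge_line(xs, ys, lam):
    """Fit y = a + b*x, penalizing lam * b**2 (the intercept a is not penalized)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / (sxx + lam)  # lam = 0 gives ordinary least squares
    a = my - b * mx
    return lambda x: a + b * x

def tune_lambda(lams, train, valid):
    """Return the lam with the best validation R², plus all scores."""
    (x_tr, y_tr), (x_va, y_va) = train, valid
    scores = {}
    for lam in lams:
        model = fit_ridge_line(x_tr, y_tr, lam)
        scores[lam] = r2_score(y_va, [model(x) for x in x_va])
    return max(scores, key=scores.get), scores

# Illustrative made-up data: y is roughly 2*x + 1.
train = ([1, 2, 3, 4, 5, 6], [3.1, 4.9, 7.2, 8.8, 11.1, 13.0])
valid = ([7, 8], [15.2, 16.9])
best_lam, scores = tune_lambda([0.0, 1.0, 10.0], train, valid)
print(best_lam, scores)
```

Because this toy data already follows a clean straight line, the search keeps `lam = 0.0` (no regularization). Regularization mainly helps when a model overfits, for example when it has many features but few data points.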

5. Do nothing

R² is just one measure of how well your model is performing.

Sometimes a model that simply does not fit the dataset well has a great R² value (close to 1).

At other times, a fairly decent model has an awful R² value.

If you are confident about how your data should behave (for example, you know it should follow a linear trend), then it might be OK to trust your model, even if its R² value seems off.
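To see how a poorly fitting model can still earn a high R², here is a small sketch (made-up data, our own helper names) that fits a straight line to points that actually follow a curve (y = x²). The R² comes out around 0.95, yet the errors follow a clear pattern (positive at both ends, negative in the middle), which signals that a straight line is the wrong shape for this data:

```python
def fit_line(xs, ys):
    """Ordinary least-squares straight line y = a + b*x."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x

def r2_score(y_true, y_pred):
    """R² = 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot

xs = list(range(1, 11))
ys = [x ** 2 for x in xs]  # curved data: y = x squared
line = fit_line(xs, ys)
preds = [line(x) for x in xs]
score = r2_score(ys, preds)
residuals = [y - p for y, p in zip(ys, preds)]  # true value minus prediction
print(round(score, 3))  # 0.95
print(residuals)        # [12.0, 4.0, -2.0, -6.0, -8.0, -8.0, -6.0, -2.0, 4.0, 12.0]
```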


Accuracy score (classification)

In a nutshell

Accuracy is the share of predictions that the model gets right. For example, if we train a classification algorithm and test (validate) it on 100 historical data points, and the model predicts the correct label in 90 of them, the accuracy is 90%.

Accuracy answers the question: after training, what fraction of the model’s predicted classes are correct?
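Computing accuracy is a matter of counting matches. A minimal plain-Python sketch (the function name and the yes/no labels are illustrative assumptions) that reproduces the 90-out-of-100 example:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# 100 validation points: 60 true "yes" and 40 true "no" labels.
y_true = ["yes"] * 60 + ["no"] * 40
# The model gets 55 of the "yes" and 35 of the "no" cases right: 90 correct in total.
y_pred = ["yes"] * 55 + ["no"] * 5 + ["no"] * 35 + ["yes"] * 5
print(accuracy(y_true, y_pred))  # 0.9
```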


What if my model has low accuracy?


1. Use a different model

In the field “My algorithm of choice is…” you can select one of many algorithms (models) to make predictions. If you know a little about Machine Learning already or are willing to do some extra reading on the internet, you can pick the algorithm that seems most suitable for your case.

If you don’t, just experiment with different algorithms on the list and see which one produces predictions with the best accuracy.

With MagicSheets you can test all available models in under 1 minute and see what works best!

2. See if you can get more - or better - data

If the model does not train well (that is, when its accuracy is low), sometimes the data itself is the culprit!

If your model has too few data points to learn from, or if the data is “messy” (spread around chaotically with high variation), the model might not fit your data well.

Try to collect more data, or see if the data you already have can be adjusted or cleaned in some way.

One of the easiest fixes, for example, is to remove data points that seem “too far away” from the dataset mean. Be careful: you should not delete data points arbitrarily just because they look suspicious, but removing one or two extreme points from a set of hundreds of data points can often be a smart thing to do.

3. Do nothing

Sometimes the accuracy seems low, but is “high enough” - whether the number you are seeing is good or bad depends on what you want to use it for.

For example, if you are building a model to predict if new patients have a malignant type of cancer, then if the model is wrong in 30% of the cases that is probably not great for the patients. You want medical diagnosis to be as precise as possible to minimize risks to patients’ health, emotional well-being as well as costs.

But if you are predicting which of the new customers are likely to purchase the new premium product, then sometimes accuracy of 70% is good enough: your prediction will be correct in 7 out of 10 cases, which can still be enough for you to make a reasonable business decision about the new product rollout.



Why do we care about the score?

Once we have fitted the model to our data (that is, once we have used our data to train it and make new predictions), we might wonder how good our model is.

This matters because no model works equally well with every dataset.

Why is my score bad?…

Possible causes may include:

  1. Too little data. You can try to collect more data, or pick more features to enrich your dataset.
  2. Poorly chosen features. You can try to re-evaluate which features you picked to make your predictions.
  3. Noisy data (the data is not “clean”; for example, it contains a lot of variation caused by random mistakes). Check whether your dataset is noisy: for example, is the standard deviation of the labels very high?
  4. Model is unsuitable for the data. Try to make predictions with other models and see if your score improves. MagicSheets lets you experiment with different models rapidly so you can test 3-4 algorithms in under one minute.
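For the noise check in point 3, one quick approach is to compare the labels’ standard deviation with their mean (the ratio is called the coefficient of variation), so that datasets on different scales can be compared. The sketch below uses Python’s standard `statistics` module; the function name and example numbers are made up, and what counts as “very high” depends on your data, so no fixed cut-off is implied. A large spread is only a hint: it can also be genuine variation that better features would explain.

```python
import statistics

def label_spread(labels):
    """Return (mean, sample standard deviation, coefficient of variation) of the labels."""
    mean = statistics.mean(labels)
    sd = statistics.stdev(labels)  # needs at least two labels
    cv = sd / abs(mean) if mean != 0 else float("inf")
    return mean, sd, cv

steady = [100, 102, 98, 101, 99]
scattered = [100, 140, 60, 130, 70]
for name, labels in [("steady", steady), ("scattered", scattered)]:
    mean, sd, cv = label_spread(labels)
    print(f"{name}: mean={mean}, sd={sd:.1f}, cv={cv:.0%}")
```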
