Difference Between Machine Learning Tasks: Classification, Regression, Clustering

A description of each type of task and what these terms actually mean.

As you study machine learning, you have probably come across these vocabulary words one way or another (especially classification, a popular problem type introduced in the early stages of most ML curricula). Later on, the other terms may have popped up as well. Below is a description of the differences between classification, regression, and clustering.

Classification is where ML models predict the category (class) that an input belongs to. There can be two classes to predict (binary classification) or more than two classes (multi-class classification). For example, if a dataset has images of different types of animals like dog, cat, and bunny, then the task of classification is to predict, when given an image from this dataset, which animal category the image belongs to.

A Type of Model Used:
Convolutional Neural Networks: a nonlinear model that estimates, for a given input, the probability of each class. For example, given an image of a cat, the resulting probabilities could be 0.9 cat, 0.04 bunny, and 0.06 dog, so the predicted class is cat.
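
To make the probability idea concrete, here is a minimal sketch of the softmax function, which many classification networks (CNNs included) apply to their final raw scores to get class probabilities. The scores below are made up for illustration and do not come from a real trained model.

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    # Subtract the largest score for numerical stability; the result is unchanged.
    top = max(scores.values())
    exps = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}

# Hypothetical raw scores a network might output for one cat image.
scores = {"cat": 3.2, "bunny": 0.1, "dog": 0.5}

probs = softmax(scores)
predicted = max(probs, key=probs.get)  # class with the highest probability

print(probs)
print(predicted)
```

The predicted class is simply the label with the highest probability, which is how a multi-class classifier turns probabilities into a single answer.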

More applications:
- Emotion detection (is the person happy, sad, etc.)
- Positive/negative review detection
- Face recognition (assign name to the face)

Cat, bunny, and dog :) (Source: One Green Planet)


Clustering is a problem type whose objective is to divide a dataset into groups of data points that have similar properties or features. Unlike classification, the groups are not labeled in advance. For example, given a dataset of circles in 3 colors (black, green, and red), the task is to separate them into 3 groups based on the color feature.

A Type of Model Used:
K-means clustering: this algorithm divides n observations (data points) into k clusters, assigning each observation to the cluster whose mean (centroid) is nearest.
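
Here is a rough sketch of how k-means could work on a handful of 2-D points, using only the Python standard library. The points are invented for illustration, and starting from the first k points is a simplifying assumption; real libraries choose the starting centroids more carefully.

```python
import math

def kmeans(points, k, iterations=10):
    """Group 2-D points into k clusters by repeating two steps."""
    # Assumption: use the first k points as the starting centroids.
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return labels, centroids

# Made-up data: three points near (1, 1) and three near (8, 8).
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (7.8, 8.2), (0.9, 1.1), (8.1, 7.9)]

labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

Notice that the algorithm is never told which group a point belongs to; the groups come only from how close the points are to each other.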

More Applications
- Urban planning: group certain houses together based on geographical location to study their differences in pricing, features, etc.
- Libraries: group certain books together based on genre (science fiction, nonfiction, etc.)
- Earthquake study: mark areas that have been affected by earthquakes to figure out danger zones and non-danger zones

Source: Geeks For Geeks

In regression, the task is to predict the output, a numerical value of a variable, for example by finding the best-fit line for the data or by estimating the probability of an event happening. We need to find the relationship between the dependent (y) and independent (x) variables. To predict the output value accurately, the ML model needs to be trained on past data before it is used to predict future data values.

A Type of Model Used:

Linear Regression: a linear model that determines the value of the y variable based on the linear relationship between x and y. For example, medical researchers can use linear regression to model the relationship between drug dosage and the blood pressure of a patient: basically, trying to understand how the blood pressure of patients changes as the dosage changes.
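
As a small sketch of the dosage example, the code below fits a best-fit line with the ordinary least squares formulas and then predicts a value for a new dosage. The dosage and blood pressure numbers are made up for illustration, not real medical data, and `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Return the slope and intercept of the least-squares best-fit line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The best-fit line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up training data: dosage in mg (x) and blood pressure (y).
dosage_mg = [10, 20, 30, 40, 50]
blood_pressure = [148, 141, 135, 128, 122]

slope, intercept = fit_line(dosage_mg, blood_pressure)

# Predict the blood pressure for a dosage the model has not seen.
predicted = slope * 60 + intercept
print(slope, intercept, predicted)
```

The output here is a number on a continuous scale rather than a category, which is the main thing that separates regression from classification.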

More Applications

- Based on housing prices over the last 10 years in a certain location, predict housing prices in this location 3 years into the future
- Based on the weather over the last 10 days in a certain area, predict tomorrow's weather for this area
- Predict stock trends for a certain company based on this company’s past stock values

Source: Towards Data Science

technojules/Julia Huang

Student and aspiring coder and musician. Has interests in both tech and music.