Difference Between Machine Learning Tasks: Classification, Regression, Clustering

technojules/Julia
3 min read · Sep 1, 2022

A description of each type of task and what these terms actually mean.

As you study machine learning, you have probably come across these vocabulary words one way or another (especially classification, a popular problem type introduced in the early stages of an ML curriculum). The other two terms may have popped up later on as well. Below is a description of the differences between classification, regression, and clustering.

Classification
In classification, an ML model predicts which category (class) an input data point belongs to. There can be two classes to predict (binary classification) or more than two (multi-class classification). For example, if a dataset contains images of different animals such as dogs, cats, and bunnies, the classification task is to predict, given an image from this dataset, which animal category the image belongs to.

A Type of Model Used:
Convolutional Neural Networks: a nonlinear model, widely used for images, that outputs a probability for each class a given input could belong to. For example, given an image of a cat, the resulting probabilities could be 0.9 cat, 0.04 bunny, and 0.06 dog. The model then predicts the class with the highest probability, here cat.
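To make the probabilities idea concrete, here is a minimal sketch in plain Python. It assumes the network ends in a softmax step, which is a common (but not the only) way to turn a model's raw scores into class probabilities. The scores in `logits` are made up for illustration; a real CNN would compute them from the image.

```python
import math

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    # Subtract the largest score first so exp() cannot overflow.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for one image (not produced by a real network).
classes = ["cat", "bunny", "dog"]
logits = [3.0, -0.1, 0.3]

probs = softmax(logits)

# The predicted class is the one with the highest probability.
predicted = classes[probs.index(max(probs))]

for name, p in zip(classes, probs):
    print(f"{name}: {p:.2f}")
print("predicted class:", predicted)
```

The largest score always ends up with the largest probability, so the prediction here is cat.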

More Applications
- Emotion detection (is the person happy, sad, etc.)
- Positive/negative review detection
- Face recognition (assign a name to a face)

Cat, bunny, and dog :) (Source: One Green Planet)

Clustering

Clustering is a problem type whose objective is to divide a dataset into groups (clusters) of data points that have similar properties or features. Unlike classification, the groups are not labeled in advance: the algorithm has to find them from the data itself. For example, given a dataset with circles in 3 colors (black, green, and red), the task is to separate them into 3 groups based on the color feature.

A Type of Model Used:
K-means clustering: this algorithm divides n observations (data points) into k clusters, assigning each point to the cluster whose mean (centroid) is nearest.
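Here is a rough sketch of how k-means works, written in plain Python so every step is visible. The points are made up, and using the first k points as starting centroids is a simplification for this example. Libraries usually pick starting centroids more carefully.

```python
def squared_distance(a, b):
    """Squared straight-line distance between two (x, y) points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def kmeans(points, k, iterations=10):
    """Basic k-means on a list of (x, y) tuples."""
    # Simplification for this sketch: start from the first k points.
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: give each point the label of its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its points.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return labels, centroids

# Made-up 2D points that form three loose groups.
points = [
    (1.0, 1.0), (8.0, 8.0), (1.0, 9.0),
    (1.5, 1.2), (8.2, 7.9), (0.8, 9.3),
    (1.2, 0.8), (7.8, 8.3), (1.1, 8.8),
]

labels, centroids = kmeans(points, k=3)
print(labels)
print(centroids)
```

Note that the cluster numbers (0, 1, 2) are arbitrary. K-means only says which points belong together, not what each group means.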

More Applications
- Urban planning: group certain houses together based on geographical location to study their differences in pricing, features, etc.
- Libraries: group certain books together based on genre (science fiction, nonfiction, etc.)
- Earthquake study: mark areas that have been affected by earthquakes to figure out danger zones and non-danger zones

Source: Geeks For Geeks

Regression
In regression, the task is to predict a numerical value (a continuous output) rather than a category, for example by finding the best-fit line through the data. We need to find the relationship between the dependent variable (y) and the independent variables (x). To predict output values accurately, the model is trained on past data and then used to predict future values.

A Type of Model Used:

Linear Regression: a linear model that determines the value of the y variable based on the linear relationship between x and y. For example, medical researchers can use linear regression to model the relationship between drug dosage and a patient's blood pressure, that is, to understand how patients' blood pressure changes as the dosage changes.
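As a small sketch of the dosage example, the code below fits a best-fit line with ordinary least squares in plain Python. The dosage and blood pressure numbers are invented for illustration and are not real medical data, and `fit_line` is just a helper name chosen for this example.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = (how x and y vary together) / (how much x varies)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The best-fit line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Invented numbers for illustration only.
dosage_mg = [0, 10, 20, 30, 40]
blood_pressure = [150, 144, 141, 135, 130]

slope, intercept = fit_line(dosage_mg, blood_pressure)

# Use the fitted line to predict blood pressure for a dosage not in the data.
predicted = slope * 25 + intercept

print(f"slope: {slope:.2f}, intercept: {intercept:.1f}")
print(f"predicted blood pressure at 25 mg: {predicted:.2f}")
```

The negative slope means that, in this toy data, blood pressure goes down as the dosage goes up. The output is a number on a continuous scale, which is what separates regression from classification.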

More Applications

- Based on housing prices in the last 10 years in a certain location, predict housing prices in this location 3 years into the future
- Based on the weather in the last 10 days in a certain area, predict tomorrow's weather for this area (for example, the temperature)
- Predict a company's future stock prices based on its past stock values

Source: Towards Data Science

Cited Sources

https://www.simplilearn.com/tutorials/data-analytics-tutorial/classification-vs-clustering

https://www.statology.org/linear-regression-real-life-examples/

https://www.onegreenplanet.org/animalsandnature/selling-nonrescued-dogs-cats-and-rabbits-in-stores-is-now-illegal-in-california/

https://www.geeksforgeeks.org/clustering-in-machine-learning/

https://www.javatpoint.com/regression-vs-classification-in-machine-learning

https://towardsdatascience.com/everything-you-need-to-know-about-linear-regression-b791e8f4bd7a
