Author: AI Legend
10 Feb | 6 min read

Unsupervised Machine Learning

Let's analyze clustered data and implement some of the algorithms of unsupervised machine learning in Python.
Fig. Unsupervised Machine Learning

Let's study unsupervised learning through Python code.

So, we will work in Python with libraries such as pandas, seaborn and NumPy to find patterns in unlabeled data without human intervention. We'll visualize the data and analyze the clusters found by various algorithms to see which ones best capture the structure of the data.
Now let's discuss unsupervised learning a little. It is a branch of machine learning and artificial intelligence in which algorithms learn from untagged (unlabeled) datasets by recognizing patterns, similarities or differences in the information. Unlike supervised learning, unsupervised learning algorithms have no labeled data from which to learn the relationship between inputs and outputs, as the figure shows visually.
Fig. Mechanism of Unsupervised Machine Learning


So the question arises: what is unsupervised learning? Unsupervised learning is a machine learning technique that uses artificial intelligence algorithms to discover patterns in data that is neither categorized nor labeled. Unsupervised learning models don't need human supervision while searching for patterns in data sets, which makes this an ideal ML technique for discovering patterns, groupings and differences in unstructured data. It's well suited to tasks such as customer segmentation, exploratory data analysis and image recognition. In short, unsupervised learning algorithms can group the data points in a data set, and suggest labels for them, without any external guidance. In other words, unsupervised learning allows a system to identify patterns within data sets on its own.
Unlabeled data consists only of input attributes (also called features or predictors), and the task is to find the structure and correlations among them, i.e. the hidden patterns. What is unsupervised machine learning used for? Basically, the model's aim is to find patterns and correlations among the input attributes, without any output labels, so that we can explore the data and make careful predictions from hidden features by using similarity measures between data points.

Algorithms

Let's study some of the algorithms used in unsupervised machine learning.


1. Clustering


In general, clustering is an exploratory approach used to recognize similarities within a given data set, and it is one kind of unsupervised learning. AI programmed with clustering algorithms groups data points based on their commonalities or differences. Some clustering algorithms sort each colored smiley into exactly one group, while others may assign the same smiley to two or more overlapping groups, as the figure shows.

Fig. Clustering: sorting the different-colored smileys into separate boxes

Clustering is a popular type of unsupervised learning approach, and it can be broken down further into different types of clustering:

1. Exclusive and overlapping clustering

In exclusive (hard) clustering, data points are grouped so that each point belongs to exactly one cluster. Overlapping (soft) clustering: here, a single data point can belong to multiple clusters with different degrees of membership.
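To make the difference concrete, here is a small NumPy sketch (our own illustration, not code from the original article). It takes two fixed, made-up centroids and assigns points either exclusively, to the nearest centroid, or with a degree of membership in every cluster. Turning distances into memberships with a softmax is an assumption chosen for simplicity; methods such as fuzzy c-means or Gaussian mixture models compute memberships in their own ways.

```python
import numpy as np

def hard_assign(X, centroids):
    """Exclusive clustering: each point gets exactly one cluster index."""
    # Squared Euclidean distance from every point to every centroid, shape (n, k)
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def soft_assign(X, centroids, temperature=1.0):
    """Overlapping clustering: a degree of membership in every cluster.

    Memberships are a softmax over negative squared distances (an
    illustrative choice), so each row sums to 1.
    """
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    logits = -d2 / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)

centroids = np.array([[0.0, 0.0], [4.0, 0.0]])   # hypothetical centroids
X = np.array([[0.5, 0.0], [2.0, 0.0], [3.8, 0.1]])

print(hard_assign(X, centroids))
print(soft_assign(X, centroids).round(2))
```

The middle point (2, 0) sits exactly halfway between the centroids: exclusive clustering still has to put it in one cluster (cluster 0 here, because `argmin` returns the first of tied indices), while the soft version gives it a 50/50 membership.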

2. Hierarchical clustering

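A type of clustering that builds a hierarchy of nested groups, either by repeatedly merging the closest groups (agglomerative, bottom-up) or by repeatedly splitting them (divisive, top-down), so that similar instances end up in the same group and dissimilar ones in different groups. Probabilistic clustering: clusters are created using probability distributions, so each data point is assigned a probability of belonging to each cluster.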
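As an illustration of the agglomerative (bottom-up) variant, here is a deliberately naive NumPy sketch using single linkage, where the distance between two groups is the distance between their closest members. The function name `agglomerative` and the toy points are our own; for real data, library routines such as `scipy.cluster.hierarchy.linkage` or scikit-learn's `AgglomerativeClustering` are far more efficient.

```python
import numpy as np

def agglomerative(X, n_clusters):
    """Naive bottom-up hierarchical clustering with single linkage.

    Starts with every point in its own cluster and repeatedly merges the two
    clusters whose closest members are nearest, until n_clusters remain.
    Returns the merge history and the final clusters (lists of point indices).
    """
    # Pairwise Euclidean distances between all points
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    clusters = [[i] for i in range(len(X))]
    merges = []
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members
                d = dist[np.ix_(clusters[a], clusters[b])].min()
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), float(d)))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return merges, [sorted(c) for c in clusters]

X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0], [10.0, 0.0]])
merges, clusters = agglomerative(X, n_clusters=3)
print(clusters)
print(merges)
```

Stopping at three clusters leaves the two tight pairs merged and the isolated point on its own; `merges` records the order in which groups were joined, which is the information a dendrogram plots.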

Some examples of clustering applications:
1. Document clustering.
2. Fraud detection.
3. Fake news detection.
4. Customer segmentation, etc.

Let's implement clustering in Python.



K-Means: NumPy

First, we import the necessary Python packages and create a 2-dimensional data set using Scikit-learn’s make_blobs function. For this article, we generate 300 data points distributed among 4 clusters. The generated data is shown below.

We can see that there are 4 distinct clusters in the sample, with slight overlap between the outlying data points of neighboring clusters. Next, we initialize two hash maps (Python dictionaries): one keeps track of the cluster centroids, and the other keeps track of the data points in each cluster.

The final three functions put it all together: one checks whether the model has converged, one is the master training function, and one visualizes the model results.
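The original NumPy code for these steps is not reproduced in this article. The following is a minimal sketch under our own assumptions that follows the same structure: one dictionary (hash map) for the centroids, one for the points in each cluster, a convergence check and a master training loop. The function names (`init_centroids`, `assign_points`, `update_centroids`, `has_converged`, `train`) are hypothetical, the data comes from NumPy rather than `make_blobs`, and plotting is left out; the clusters can be drawn with `plt.scatter` in the same way as in the scikit-learn example.

```python
import numpy as np

def init_centroids(X, k, rng):
    """First hash map: cluster index -> centroid, seeded with k distinct random points."""
    idx = rng.choice(len(X), size=k, replace=False)
    return {c: X[i].copy() for c, i in enumerate(idx)}

def assign_points(X, centroids):
    """Second hash map: cluster index -> array of the points nearest to that centroid."""
    keys = list(centroids)
    C = np.array([centroids[c] for c in keys])
    # Squared Euclidean distance from every point to every centroid, then the nearest one
    nearest = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
    return {c: X[nearest == j] for j, c in enumerate(keys)}

def update_centroids(clusters, old_centroids):
    """Move each centroid to the mean of its points (keep the old one if a cluster is empty)."""
    return {c: pts.mean(axis=0) if len(pts) else old_centroids[c]
            for c, pts in clusters.items()}

def has_converged(old, new, tol=1e-6):
    """The model has converged when no centroid moved further than tol."""
    return all(np.linalg.norm(old[c] - new[c]) <= tol for c in old)

def train(X, k, epochs=100, seed=0):
    """Master training loop: assign points, update centroids, stop once converged."""
    rng = np.random.default_rng(seed)
    centroids = init_centroids(X, k, rng)
    for _ in range(epochs):
        clusters = assign_points(X, centroids)
        new_centroids = update_centroids(clusters, centroids)
        converged = has_converged(centroids, new_centroids)
        centroids = new_centroids
        if converged:
            break
    return centroids, assign_points(X, centroids)

# Usage: four synthetic 2-D blobs of 75 points each (300 points in total)
rng = np.random.default_rng(42)
true_centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0], [5.0, 0.0]])
X = np.vstack([center + 0.3 * rng.standard_normal((75, 2)) for center in true_centers])
centroids, clusters = train(X, k=4)
for c, pts in clusters.items():
    print(f"cluster {c}: centroid {centroids[c].round(2)}, {len(pts)} points")
```

Because the starting centroids are random data points, this plain version can settle in a local minimum (for example, two centroids sharing one blob). Library implementations mitigate this with smarter initialization and several restarts; scikit-learn's `KMeans` does so through its `init` and `n_init` parameters.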

# Generate and plot the sample data

import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs

n_clusters = 4   # number of blob centres to generate
n_samples = 300  # total number of data points
# X holds the 2-D points; y_true holds each point's generating blob (not used for clustering)
X, y_true = make_blobs(n_samples=n_samples, centers=n_clusters, cluster_std=0.60, random_state=0)
plt.figure(figsize=(10, 6))
plt.scatter(X[:, 0], X[:, 1], s=40)
plt.title("Data Points")
plt.show()


Result: a scatter plot of the generated data points.

import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

n_clusters = 4
n_samples = 300
epochs = 111  # upper limit on the number of K-means iterations
X, y_true = make_blobs(n_samples=n_samples, centers=n_clusters, cluster_std=0.60, random_state=0)

# Fit scikit-learn's K-means; random_state makes the centroid initialisation reproducible
kmeans = KMeans(n_clusters=n_clusters, random_state=0, max_iter=epochs).fit(X)

fig, axis1 = plt.subplots(figsize=(10, 6))
axis1.scatter(X[:, 0], X[:, 1], c='tab:green', s=25)
# Mark each learned centroid in red
for k in kmeans.cluster_centers_:
    axis1.scatter(k[0], k[1], c='red', s=50)
axis1.set_title("Scikit-Learn K-means Results")
plt.show()

Result: a scatter plot of the data (green) with the fitted K-means centroids (red).


The advantages of using existing libraries are that they are optimized to reduce training time, they often come with many tunable parameters, and they need very little code. Scikit-learn also contains many other machine learning models, and all of them are accessed through a consistent syntax. The code above implements the same k-means clustering algorithm with scikit-learn.

Some popular clustering algorithms used in unsupervised learning:

1. K-Means Clustering
2. Mean-Shift Clustering
3. Hierarchical Clustering
4. Expectation–Maximization (EM) Clustering using Gaussian Mixture Models (GMM)
5. Deep Neural Networks

2. Association rule mining


Association rule learning is another kind of unsupervised learning algorithm used in machine learning to recognize patterns. Association rules focus on discovering connections between data points. AI trained with association rules might find relationships between data points within one group or between different data sets. For example, this type of unsupervised learning might try to determine whether the occurrence of one variable or data type is associated with another; note that an association alone does not show that one variable causes the other.

See the graphical representation:

Let's take an example: if customers often buy cheese and eggs together, the shopkeeper can place them near each other to encourage more sales. Another well-known application of association in unsupervised learning is anomaly detection.
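To show what "customers often buy cheese and eggs together" means in numbers, here is a small pure-Python sketch of the three standard association-rule measures: support, confidence and lift. The baskets and function names are made up for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) = support(A | C) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

def lift(transactions, antecedent, consequent):
    """Confidence divided by the consequent's overall support (> 1: positive association)."""
    return confidence(transactions, antecedent, consequent) / support(transactions, consequent)

# Hypothetical shopping baskets, made up for illustration
baskets = [
    {"cheese", "egg", "bread"},
    {"cheese", "egg"},
    {"egg", "milk"},
    {"cheese", "egg", "milk"},
    {"bread", "milk"},
]
print(support(baskets, {"cheese", "egg"}))             # 0.6
print(confidence(baskets, {"cheese"}, {"egg"}))        # 1.0
print(round(lift(baskets, {"cheese"}, {"egg"}), 3))    # 1.25
```

Here every basket with cheese also contains eggs (confidence 1.0), and a lift of 1.25 means eggs are 1.25 times as likely in baskets that contain cheese as in baskets overall. A lift above 1 signals a positive association, not proof that buying cheese causes buying eggs.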

Let's look at some other examples of association rules:
1. Market basket analysis.
2. Social network analysis.
3. Fraud detection.
4. Recommendation systems.
5. Apriori algorithm.

Some popular association algorithms used in unsupervised learning:

1. Learning Classifier System
2. Association Rule Learning
3. Apriori Algorithm
4. Singular-Value Decomposition
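The core idea of the Apriori algorithm is that every subset of a frequent itemset must itself be frequent, so candidates can be pruned level by level before they are counted. Below is a compact pure-Python sketch of that idea; the function name and toy baskets are our own, and production libraries add many optimizations.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset with support >= min_support.

    Level-wise search: frequent k-itemsets are joined into (k+1)-candidates, and any
    candidate with an infrequent k-subset is pruned (the Apriori property).
    """
    n = len(transactions)
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # Support of each candidate = fraction of transactions containing it
        return {c: sum(c <= t for t in transactions) / n for c in candidates}

    items = {frozenset([i]) for t in transactions for i in t}
    current = {c: s for c, s in count(items).items() if s >= min_support}
    frequent = dict(current)
    k = 1
    while current:
        prev = set(current)
        # Join step: unions of two frequent k-itemsets that have exactly k+1 items
        candidates = {a | b for a in prev for b in prev if len(a | b) == k + 1}
        # Prune step: every k-subset of a candidate must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k))}
        current = {c: s for c, s in count(candidates).items() if s >= min_support}
        frequent.update(current)
        k += 1
    return frequent

baskets = [
    {"cheese", "egg", "bread"},
    {"cheese", "egg"},
    {"egg", "milk"},
    {"cheese", "egg", "milk"},
    {"bread", "milk"},
]
result = apriori(baskets, min_support=0.4)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

With a minimum support of 0.4, {cheese, egg} and {egg, milk} are frequent pairs, but the triple {cheese, egg, milk} is pruned without being counted because its subset {cheese, milk} is not frequent.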

3. Dimensionality reduction


Let's discuss dimensionality reduction a little. Dimensionality reduction is a key technique within unsupervised learning. It compresses the features by finding a smaller, different set of variables that captures what matters most in the original features, while minimizing the loss of information. Which algorithm is used for dimensionality reduction? A widely used one is Principal Component Analysis (PCA), an unsupervised machine learning algorithm for dimensionality reduction. Dimensionality reduction has two main components:

1. Feature selection - Here we try to find a subset of the original set of variables, or features, that is smaller but can still be used to model the problem. There are generally three approaches:
a. Filter
b. Wrapper
c. Embedded
2. Feature extraction - Here, the data is transformed from a higher-dimensional space to a lower-dimensional space, i.e. a space with fewer dimensions.
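As a small illustration of the filter approach to feature selection, the NumPy sketch below scores each original feature by its variance (a label-free score, which suits the unsupervised setting) and keeps the top `k` columns. The function name and data are hypothetical. Feature extraction differs in that it builds new features, such as principal components, instead of keeping a subset of the original ones.

```python
import numpy as np

def variance_filter(X, k):
    """Filter-style feature selection: keep the k original columns with the largest variance.

    Returns the indices of the kept columns (in their original order) and the reduced data.
    """
    variances = X.var(axis=0)
    keep = np.sort(np.argsort(variances)[::-1][:k])
    return keep, X[:, keep]

# Toy data: 4 samples, 4 features; column 2 barely varies
X = np.array([
    [1.0, 10.0, 0.50, 3.0],
    [2.0, 20.0, 0.50, 1.0],
    [3.0, 30.0, 0.51, 2.0],
    [4.0, 40.0, 0.50, 5.0],
])
keep, X_small = variance_filter(X, k=2)
print(keep)           # [1 3]
print(X_small.shape)  # (4, 2)
```

Variance depends on each feature's units, so this score only makes sense when features are on comparable scales. scikit-learn offers a related filter, `VarianceThreshold`, which drops features whose variance falls below a given value.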

Methods used for dimensionality reduction include:

1. Principal Component Analysis (PCA)
2. Linear Discriminant Analysis (LDA), a supervised method, since it uses class labels
3. Generalized Discriminant Analysis (GDA), a nonlinear (kernel) extension of LDA that also uses class labels

Advantages of Dimensionality Reduction

1. It compresses the features and thus reduces storage space.
2. It reduces computation time.
3. It also helps eliminate redundant features, if any.
4. Improved Visualization - High-dimensional input is hard to visualize, and dimensionality reduction techniques help visualize the data in 2D or 3D, which aids understanding and analysis.
5. Overfitting Prevention - High-dimensional input can lead to overfitting in machine learning models, which results in poor generalization performance. Dimensionality reduction can reduce the complexity of the data and thus help prevent overfitting.
6. Feature Extraction - Dimensionality reduction can help in extracting important features from high dimensional data, which can be useful in feature selection for machine learning models.
7. Data Preprocessing - Dimensionality reduction can be used as a preprocessing step before applying machine learning algorithms to reduce the dimensionality of the data and hence improve the performance of the model.

Disadvantages of Dimensionality Reduction

1. It may lead to some amount of data loss.
2. PCA tends to find linear correlations between variables, which is sometimes undesirable.
3. PCA fails in cases where mean and covariance are not enough to define datasets.
4. We may not know how many principal components to keep; in practice, some rules of thumb are applied.
5. Interpretability: The reduced dimensions may not be easily interpretable, and it may be difficult to understand the relationship between the original features and the reduced dimensions.
6. Overfitting: In some instances, dimensionality reduction may lead to overfitting, especially when the number of components is chosen based on the training data.
7. Sensitivity to outliers: Some dimensionality reduction techniques are sensitive to outliers, which can result in a biased representation of the data.
8. Computational complexity: Some dimensionality reduction techniques, such as manifold learning, can be computationally intensive, especially when dealing with large datasets.
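On the question of how many principal components to keep, one common rule of thumb is to keep the smallest number of components whose cumulative share of the variance reaches a chosen threshold such as 95%. The sketch below applies that rule to made-up component variances; the function name and the thresholds are assumptions, not the article's.

```python
import numpy as np

def n_components_for(variances, threshold=0.95):
    """Smallest number of leading components whose cumulative variance share reaches threshold."""
    ev = np.sort(np.asarray(variances, dtype=float))[::-1]  # largest variance first
    cumulative = np.cumsum(ev) / ev.sum()                   # running share of total variance
    k = int(np.searchsorted(cumulative, threshold)) + 1     # first index reaching threshold
    return min(k, len(ev))

# Made-up variances of five principal components, for illustration only
variances = [4.0, 2.5, 1.0, 0.3, 0.2]
print(n_components_for(variances))        # 4 (cumulative share 0.975)
print(n_components_for(variances, 0.90))  # 3 (cumulative share 0.9375)
```

scikit-learn's `PCA` can apply a similar rule directly when `n_components` is given as a float between 0 and 1.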

Let's implement dimensionality reduction in Python.


Principal component analysis (PCA) is the process of computing the principal components and then using them to perform a change of basis on the data. In other words, PCA is an unsupervised dimensionality reduction technique.
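To see what "computing the principal components and performing a change of basis" involves, here is a minimal NumPy sketch of PCA via the eigen-decomposition of the covariance matrix. It is our own illustration on synthetic data, not the scikit-learn implementation, which relies on a singular value decomposition instead and can return components with flipped signs.

```python
import numpy as np

def pca(X, n_components):
    """PCA via eigen-decomposition of the covariance matrix.

    Returns the projected data, the principal axes (one per row) and the
    fraction of total variance explained by each kept component.
    """
    X_centered = X - X.mean(axis=0)              # PCA works on mean-centred data
    cov = np.cov(X_centered, rowvar=False)       # feature-by-feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)       # symmetric matrix: eigenvalues ascending
    order = np.argsort(eigvals)[::-1]            # sort components by variance, descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    components = eigvecs[:, :n_components].T     # principal axes as rows
    explained_ratio = eigvals[:n_components] / eigvals.sum()
    # Change of basis: coordinates of each sample along the principal axes
    return X_centered @ components.T, components, explained_ratio

rng = np.random.default_rng(0)
t = rng.standard_normal(200)
# Two strongly correlated features plus a little noise: most variance lies along one direction
X = np.column_stack([t, 2 * t + 0.1 * rng.standard_normal(200)])
Z, comps, ratio = pca(X, n_components=1)
print(Z.shape, ratio.round(3))
```

Because the two synthetic features are almost perfectly correlated, the first component alone should explain more than 99% of the variance; `explained_ratio` is comparable to what scikit-learn exposes as `explained_variance_ratio_`.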

It’s useful to reduce the dimensionality of a dataset for two main reasons:
1. When there are too many dimensions in a dataset to visualize
2. To reduce the inputs to the n dimensions that carry the most variance, for feature selection when building a predictive model.
In this section, we implement the PCA algorithm in Python on the Iris dataset and then visualize it using matplotlib.

# First, import the required libraries

import pandas as pd
from sklearn.datasets import load_iris # Dataset
from sklearn.decomposition import PCA # Algorithm
import matplotlib.pyplot as plt # Visualization




# Load the data
iris_data = load_iris(as_frame=True)
# Preview
iris_data.data.head()


Result: the first five rows of the Iris feature table.

# Now let's plot three of the features in 3D, colored by the fourth

from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection on older Matplotlib versions

plt.rcParams["figure.figsize"] = [9.00, 4.00]
plt.rcParams["figure.autolayout"] = True
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
sepal_length = iris_data.data["sepal length (cm)"]
sepal_width = iris_data.data["sepal width (cm)"]
petal_length = iris_data.data["petal length (cm)"]
petal_width = iris_data.data["petal width (cm)"]
# Petal width is shown through the marker color
ax.scatter(sepal_length, sepal_width, petal_length, c=petal_width)
plt.show()

It’s quite difficult to get insights from this visualization because all of the instances are jumbled together: when we view three-dimensional data on a flat plot, we only have access to one viewpoint. With PCA, we can reduce the data to two dimensions, which makes it easier to visualize the data and tell the classes apart.

Result: a 3D scatter plot of sepal length, sepal width and petal length, colored by petal width.

# Now reduce the four features to two principal components

# Instantiate PCA with 2 components
pca = PCA(n_components=2)

# Fit the model and project the data onto the 2 components in one step
iris_data_reduced = pca.fit_transform(iris_data.data)

# Plot the projected data, colored by the true species (used only for coloring)
plt.scatter(
    iris_data_reduced[:, 0],
    iris_data_reduced[:, 1],
    c=iris_data.target
)
plt.show()

Result: a 2D scatter plot of the first two principal components, colored by species.


Final Thoughts

K-means is the go-to unsupervised clustering algorithm: it is easy to implement and trains in next to no time. Because the model trains by minimizing the sum of squared distances between data points and the centroids of their assigned clusters, it is easy to relate to other machine learning models that minimize a loss function.
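Written out, the objective K-means minimizes is the within-cluster sum of squares. For clusters C_1, ..., C_k with centroids μ_1, ..., μ_k (notation introduced here):

```latex
J = \sum_{j=1}^{k} \sum_{\mathbf{x}_i \in C_j} \left\lVert \mathbf{x}_i - \boldsymbol{\mu}_j \right\rVert^2,
\qquad
\boldsymbol{\mu}_j = \frac{1}{|C_j|} \sum_{\mathbf{x}_i \in C_j} \mathbf{x}_i
```

Each iteration alternates between assigning every point to its nearest centroid and recomputing each μ_j as the mean of its cluster. Neither step can increase J, which is why the algorithm converges, although possibly to a local rather than global minimum. scikit-learn reports the final value of J as the fitted model's `inertia_` attribute.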

Applications of Unsupervised Machine Learning


Some applications and advantages are listed below:

1. Speech recognition

Many apps or software programs that use speech recognition rely on unsupervised learning techniques. Computing professionals train speech recognition apps to understand basic human sounds, words and phrases. When you download or install the app, it then begins to learn the specific sounds, intonations and pronunciations you use when issuing software commands. Over time, the speech recognition software improves its ability to recognize your unique voice.

2. Natural language processing (NLP)

Google News is known to leverage unsupervised learning to group articles about the same story from various news outlets. For instance, the results of the football transfer window can all be categorized under football. Image and video analysis: visual perception tasks such as object recognition also leverage unsupervised learning.

3. Anomaly detection

Unsupervised learning is used to identify data points, events and/or observations that deviate from a dataset's normal behavior. Customer segmentation: interesting buyer persona profiles can be created using unsupervised learning, which helps businesses understand their customers' common traits and purchasing habits and align their products accordingly. Recommendation engines: past purchase behavior coupled with unsupervised learning can help businesses discover data trends that they can use to develop effective cross-selling strategies.

4. Clustering: splitting data into groups based on a similarity measure. For example, clustering methods are used to study cancer gene expression data and to help predict cancer at early stages.

5. Neural networks: Hebbian learning follows the principle that "neurons that fire together wire together". In Hebbian learning, the connection is reinforced irrespective of any error; it is exclusively a function of the coincidence between the action potentials of the two neurons.

6. Detecting unusual data points in a data set. Examples:

(a) Detect outliers in data from transportation and logistics companies, where anomaly detection is used to identify logistical obstacles or expose defective mechanical parts.
(b) Detect faulty equipment, security breaches, or fraudulent transactions.

7. Object recognition: In the field of computer vision, it is utilized for visual perception tasks. It comes in really handy in image recognition.

8. Dimensionality reduction: it lowers the number of features in a dataset, which helps prevent overfitting. It also reduces the computational complexity of algorithms.

9. Recommender systems: association rules are used to build recommender systems such as those of online and offline retail stores. They help develop cross-selling strategies, for example add-on recommendations during the checkout process.

10. Medical imaging: imaging devices need to perform image detection, classification and segmentation, and unsupervised learning is leveraged to make this feasible.

11. Building customer personas: understanding the common traits and purchasing habits of business clients helps identify the customer persona, so product goals can be aligned much better.
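Anomaly detection appears several times in this list. One simple, label-free way to score anomalies is the average distance from each point to its k nearest neighbours: points that sit far from all of their neighbours score high. The NumPy sketch below is our own illustration (the function name, `k` and the data are assumptions); libraries offer more robust alternatives such as scikit-learn's `IsolationForest` and `LocalOutlierFactor`.

```python
import numpy as np

def knn_outlier_scores(X, k=5):
    """Label-free anomaly score: mean distance from each point to its k nearest neighbours."""
    # Pairwise Euclidean distances, shape (n, n)
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)            # a point is not its own neighbour
    nearest = np.sort(dist, axis=1)[:, :k]    # the k smallest distances in each row
    return nearest.mean(axis=1)

rng = np.random.default_rng(1)
normal = rng.normal(loc=0.0, scale=1.0, size=(100, 2))  # "normal behaviour"
X = np.vstack([normal, [[8.0, 8.0]]])                    # one injected anomaly at index 100
scores = knn_outlier_scores(X, k=5)
print("highest-scoring point:", int(np.argmax(scores)))
print("its score:", scores.max().round(2), "median score:", np.median(scores).round(2))
```

The injected point at (8, 8) should receive the highest score. Choosing a threshold for what counts as an anomaly is a separate, application-specific decision.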

Limitations of Unsupervised Learning


Let's look at some of the limitations, or disadvantages, of unsupervised learning:

1. Results may be unpredictable or difficult to understand.
2. Difficult to measure accuracy or effectiveness due to lack of predefined answers during training.
3. One of the major drawbacks of unsupervised learning is that we cannot get precise information on the data sorting.
4. It can be costlier as often human intervention is required to correlate the patterns obtained to the domain knowledge.

