Principal Component Analysis
Noise Reduction
Let's look at how PCA works when it is not a preprocessing step for another model but the main stage of the pipeline. Image denoising is exactly such a case.
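The pipeline looks like this: we fit PCA on noisy training images, then pass other noisy images through the fitted model, which projects them onto the learned components and maps them back to pixel space, producing a reconstructed version of each image. Why does this reduce noise? Because we keep only a small number of principal components. These capture the directions of greatest variance, i.e. the dominant structure of the images, while most of the noise is spread across the many discarded components and is therefore dropped.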
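To make the mechanism concrete, here is a minimal NumPy sketch of the same idea, assuming PCA is computed directly with a singular value decomposition rather than through scikit-learn. It uses synthetic data (not USPS) whose clean part lies in a 5-dimensional subspace of a 100-dimensional space, and `pca_denoise` is a hypothetical helper. As in the pipeline above, the components are learned from noisy training rows and then used to reconstruct separate noisy test rows.

```python
import numpy as np


def pca_denoise(X_fit, X_noisy, n_components):
    """Hypothetical helper: project X_noisy onto the top principal axes of X_fit and reconstruct it."""
    mean = X_fit.mean(axis=0)
    # Rows of Vt are the principal axes, sorted by decreasing singular value
    _, _, Vt = np.linalg.svd(X_fit - mean, full_matrices=False)
    components = Vt[:n_components]            # shape (k, n_features)
    scores = (X_noisy - mean) @ components.T  # coordinates in the k-dimensional subspace
    return scores @ components + mean         # map back to the original feature space


rng = np.random.default_rng(0)
# Synthetic clean data lying in a 5-dimensional subspace of R^100
X_clean = rng.normal(size=(600, 5)) @ rng.normal(size=(5, 100))
X_noisy = X_clean + rng.normal(scale=0.5, size=X_clean.shape)

# Learn the components from noisy "training" rows, denoise separate noisy "test" rows
X_train_noisy, X_test_noisy = X_noisy[:500], X_noisy[500:]
X_test_clean = X_clean[500:]
X_test_denoised = pca_denoise(X_train_noisy, X_test_noisy, n_components=5)

mse_noisy = np.mean((X_test_noisy - X_test_clean) ** 2)
mse_denoised = np.mean((X_test_denoised - X_test_clean) ** 2)
print(f"MSE before: {mse_noisy:.4f}, MSE after PCA: {mse_denoised:.4f}")
```

With isotropic noise, keeping k of d directions retains roughly a k/d share of the noise energy (here about 5/100), while the signal, which lies almost entirely in the top components, survives the projection.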
We use the USPS dataset of handwritten digits and the scikit-learn library:
import numpy as np
from sklearn.datasets import fetch_openml
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split

# Load USPS (OpenML ID 41082): each row is a flattened 16x16 grayscale digit image
X, y = fetch_openml(data_id=41082, as_frame=False, return_X_y=True)
# Rescale each pixel (feature) to the [0, 1] range
X = MinMaxScaler().fit_transform(X)
Let's split off small training and test sets and add Gaussian noise to the images:
# Keep a small stratified subset: 1,000 training and 100 test images
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0, train_size=1_000, test_size=100
)

# Add zero-mean Gaussian noise (standard deviation 0.25) to both sets
rng = np.random.RandomState(0)
noise = rng.normal(scale=0.25, size=X_test.shape)
X_test_noisy = X_test + noise
noise = rng.normal(scale=0.25, size=X_train.shape)
X_train_noisy = X_train + noise
Create a PCA model with 40 components and fit it on the noisy training images:
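from sklearn.decomposition import PCA
# Keep the 40 components that explain the most variance; fit on the noisy training set
pca_model = PCA(n_components=40)
pca_model.fit(X_train_noisy)
# Denoise the test set: project onto the components, then map back to pixel space
X_reconstructed = pca_model.inverse_transform(pca_model.transform(X_test_noisy))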
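The value `n_components=40` controls a trade-off: too few components discard parts of the digits themselves, too many let the noise back in. Below is a minimal sketch of how one might compare settings, assuming synthetic rank-10 data in place of USPS (the data and the tried values of `k` are illustrative only). It uses `PCA.transform` followed by `PCA.inverse_transform` and reports `explained_variance_ratio_` alongside the error measured against the clean data.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
# Synthetic stand-in for the images: a rank-10 clean signal in 64 dimensions plus noise
X_clean = rng.normal(size=(1_100, 10)) @ rng.normal(size=(10, 64))
X_noisy = X_clean + rng.normal(scale=0.5, size=X_clean.shape)
X_train_noisy, X_test_noisy = X_noisy[:1_000], X_noisy[1_000:]
X_test_clean = X_clean[1_000:]

errors = {}
for k in (2, 10, 40, 64):
    pca = PCA(n_components=k, random_state=0).fit(X_train_noisy)
    # transform() projects onto the k components; inverse_transform() maps back
    X_rec = pca.inverse_transform(pca.transform(X_test_noisy))
    errors[k] = np.mean((X_rec - X_test_clean) ** 2)
    kept = pca.explained_variance_ratio_.sum()
    print(f"k={k:2d}  variance kept={kept:.3f}  MSE vs clean={errors[k]:.4f}")
```

On the real data, the clean `X_test` from the split above could serve as the reference in the same way; note that this comparison is only possible because clean images are available for evaluation.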
Let's look at the results. First, the noisy test images:
And here are the same images after PCA reconstruction:
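The figures themselves are not reproduced here. A minimal plotting sketch for drawing similar grids with Matplotlib, assuming each row of the input is a flattened 16x16 USPS image (256 pixels); `plot_digits` is a hypothetical helper, not part of scikit-learn.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_digits(X, title, n_rows=10, n_cols=10, image_shape=(16, 16)):
    """Hypothetical helper: draw the first n_rows * n_cols rows of X as images in a grid."""
    # squeeze=False always returns a 2-D array of Axes, even for a 1x1 grid
    fig, axs = plt.subplots(nrows=n_rows, ncols=n_cols, figsize=(8, 8), squeeze=False)
    for ax in axs.ravel():
        ax.axis("off")  # hide ticks and frames on every panel, including unused ones
    for img, ax in zip(X, axs.ravel()):
        # Each row is a flattened image; reshape it back to 2-D.
        # "Greys" maps low values to white and high values to dark.
        ax.imshow(np.asarray(img).reshape(image_shape), cmap="Greys")
    fig.suptitle(title, fontsize=16)
    return fig


# Quick smoke test with random pixels
demo_fig = plot_digits(np.random.default_rng(0).random((100, 256)), "Random pixels")
```

Calling it on `X_test_noisy` and on `pca_model.inverse_transform(pca_model.transform(X_test_noisy))` should produce grids of the noisy inputs and of their PCA reconstructions.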