PCA rotates the data along the axis of highest variance, thus allowing us to determine the relative contribution of each feature of our data towards the variance between classes.However, since PCA uses the absolute variance of a feature to rotate the data, a feature with a broader range of values will overpower and bias the algorithm relative to the other features..To avoid this, we must first normalize our data..There are a few methods to do this, but a common way is through standardization, such that all features have a mean = 0 and standard deviation = 1 (the resultant is a z-score).Let’s check how our data frame looks like after standardization4..Principal component analysis on our scaled dataNow that we have preprocessed our data, we are ready to use PCA to determine by how much we can reduce the dimensionality of our data..We can use scree-plots and cumulative explained ratio plots to find the number of components to use in further analyses.Scree-plots display the number of components against the variance explained by each component, sorted in descending order of variance..Scree-plots help us get a better sense of which components explain a sufficient amount of variance in our data..When using scree plots, an ‘elbow’ (a steep drop from one data point to the next) in the plot is typically used to decide on an appropriate cutoff.Unfortunately, there does not appear to be a clear elbow in this scree plot, which means it is not straightforward to find the number of intrinsic dimensions using this method.5..Further visualization of PCABut all is not lost!.Instead, we can also look at the cumulative explained variance plot to determine how many features are required to explain, say, about 90% of the variance (cutoffs are somewhat arbitrary here, and usually decided upon by ‘rules of thumb’)..Once we determine the appropriate number of components, we can perform PCA with that many components, ideally reducing the dimensionality of our data.Now we can use the lower dimensional PCA projection of the data to classify songs into genres..To do that, we first need to split our dataset into ‘train’ and ‘test’ subsets, where the ‘train’ subset will be used to train our model while the ‘test’ dataset allows for model performance validation.6..Train a decision tree to classify the genreIn this article, we will be using a simple algorithm known as a decision tree.. More details