The scale of data for PCA is extremely important. The first step in performing PCA is to ensure that the variables being analyzed are all on similar measurement scales. This is almost always accomplished by standardizing the data:

$$x_{std} = \frac{x_i - \bar{x}}{s_x}$$

where $x_{std}$ is the standardized value, $x_i$ is the original value, $\bar{x}$ is the variable’s mean, and $s_x$ is the variable’s standard deviation. In effect, this transforms the data so that each variable has a mean of zero and a standard deviation of 1. Consequently, each variable also has a variance of 1, since variance is simply the square of the standard deviation:

$$s_{std}^2 = 1^2 = 1$$

This step is very important to ensure that the results of PCA are interpreted correctly, as PCA is very sensitive to the variances of the original variables. Specifically, PCA decides which variables are the “most important” when reducing the dimension of the dataset by determining which variables present the largest variance (more on this in the next section). If the original variables have variances that are quite different from each other, the analysis will end up favoring variables with larger variances and ignoring those with smaller variances. This may not seem like a bad thing at first, but often these differences in variance aren’t due to the data itself, but to the scale on which it was measured.

As an example, consider some variables that could be involved while making beer. These may include the mass (in g) of grain used, the temperature (in °C) at which the beer is brewed, the volume (in L) of water used, or the length of time (in hours, days, or maybe weeks) that you allow for fermentation. Each of these variables is measured on a very different scale, and the amount of variance that you would expect across multiple batches that you brew will probably be very different from variable to variable.
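To make the effect of standardizing concrete, here is a minimal sketch in plain Python. The batch measurements are made-up numbers chosen only for illustration, and the `standardize` helper and the variable names are assumptions rather than part of any particular PCA library. It computes each variable's variance before and after standardizing, using the sample standard deviation (n − 1 denominator).

```python
from statistics import mean, stdev, variance


def standardize(values):
    """Return z-scores: (x_i - mean) / sample standard deviation."""
    x_bar = mean(values)
    s_x = stdev(values)  # sample standard deviation (n - 1 denominator)
    return [(x - x_bar) / s_x for x in values]


# Hypothetical measurements from five batches (illustrative values only).
batches = {
    "grain_g": [4500.0, 5000.0, 4800.0, 5200.0, 4700.0],
    "temp_c": [18.0, 19.5, 20.0, 18.5, 19.0],
    "water_l": [23.0, 25.0, 24.0, 26.0, 24.5],
    "ferment_days": [10.0, 14.0, 12.0, 13.0, 11.0],
}

# Variance of each variable on its original measurement scale.
raw_variances = {name: variance(vals) for name, vals in batches.items()}

# Standardize every variable, then recompute the variances.
standardized = {name: standardize(vals) for name, vals in batches.items()}
std_variances = {name: variance(vals) for name, vals in standardized.items()}

for name in batches:
    print(
        f"{name}: raw variance = {raw_variances[name]:.3f}, "
        f"standardized variance = {std_variances[name]:.3f}"
    )
```

On the raw scale, the grain mass (in g) has a far larger variance than the temperature (in °C) simply because of its units, so it would dominate the analysis. After standardizing, every variable has a variance of 1 and contributes on an equal footing.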