Commit a0a73b0 (parent 76545d5)

Author: David Stirling
Commit message: More docs

2 files changed: 38 additions & 22 deletions

docs/source/16_dimensionality_reduction.rst
Lines changed: 8 additions & 21 deletions
@@ -13,7 +13,7 @@ will allow you to perform several reduction methods and visualise the results as
 Dimensionality reduction condenses large numbers of measurements into a more manageable number of components; this can
 help to visualise results and identify clusters of objects and outliers.
 
-To use the **Dimenaionality Reduction Plot**, select a reduction method from the available choices and click
+To use the **Dimensionality Reduction Plot**, select a reduction method from the available choices and click
 **Update Chart**. The different methods are explained further in the sections below. CPA will normalise measurements
 before applying these methods.

@@ -36,30 +36,17 @@ tool open, you'll also see the option to *send the selected objects directly to
 Reduction Methods
 *****************
 
-- **Principal Component Analysis (PCA)**: PCA attempts to generate a series of features which capture the variance of
-  the original dataset. Measurements which vary in the same manner are collapsed towards a single new measurement, termed
-  a *Principal Component*. On the resulting axis labels, CPA will also display the proportion of the original variance
-  which is explained by each principal component. Components are sorted by their contribution to variance, so PC1 will
-  always be the most significant feature.
+- **Principal Component Analysis (PCA)**: PCA attempts to generate a series of features which capture the variance of the original dataset. Measurements which vary in the same manner are collapsed towards a single new measurement, termed a *Principal Component*. On the resulting axis labels, CPA will also display the proportion of the original variance which is explained by each principal component. Components are sorted by their contribution to variance, so PC1 will always be the most significant feature.
 
-- **Singular Value Decomposition (SVD)**: SVD is very similar to PCA, but does not center the data before processing.
-  This can be much faster and more memory efficient than PCA when working with very large datasets, but a trade-off is
-  that the resulting components will not be ordered by significance (i.e. PC1 may not be the most important feature).
+- **Singular Value Decomposition (SVD)**: SVD is very similar to PCA, but does not center the data before processing. This can be much faster and more memory efficient than PCA when working with very large datasets, but a trade-off is that the resulting components will not be ordered by significance (i.e. PC1 may not be the most important feature).
 
-- **Gaussian Random Projection (GRP)**: This method reduces the dimensionality of the dataset by projecting samples into
-  fewer dimensions while preserving the pairwise distances between them. The random matrix used for projection is
-  generated using a Gaussian distribution.
+- **Gaussian Random Projection (GRP)**: This method reduces the dimensionality of the dataset by projecting samples into fewer dimensions while preserving the pairwise distances between them. The random matrix used for projection is generated using a Gaussian distribution.
 
-- **Sparse Random Projection (SRP)**: Similar to GRP, but uses a sparse matrix instead of a Gaussian one. This can be
-  more memory efficient with large datasets.
+- **Sparse Random Projection (SRP)**: Similar to GRP, but uses a sparse matrix instead of a Gaussian one. This can be more memory efficient with large datasets.
 
-- **Factor Analysis (FA)**: Like PCA, Factor Analysis generates a series of components which describe the variance
-  of the dataset. However, with FA the variance in each direction within the input space can be modelled independently.
+- **Factor Analysis (FA)**: Like PCA, Factor Analysis generates a series of components which describe the variance of the dataset. However, with FA the variance in each direction within the input space can be modelled independently.
 
-- **Feature Agglomeration (FAgg)**: This method utilises hierarchical clustering to group together features that behave
-  similarly. The generated clusters can then be treated like components.
+- **Feature Agglomeration (FAgg)**: This method utilises hierarchical clustering to group together features that behave similarly. The generated clusters can then be treated like components.
 
-- **t-Distributed Stochastic Neighbor Embedding (t-SNE)**: t-SNE helps to visualise high dimensional data by giving
-  individual datapoints a coordinate on a 2D map, on which similar points are placed close together. The resulting
-  clusters can help to visualise different object types within a dataset.
+- **t-Distributed Stochastic Neighbor Embedding (t-SNE)**: t-SNE helps to visualise high dimensional data by giving individual datapoints a coordinate on a 2D map, on which similar points are placed close together. The resulting clusters can help to visualise different object types within a dataset.
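The PCA behaviour described in the documentation above (data centered first, components sorted by variance, with the explained-variance ratio reported for each component) can be illustrated with a small NumPy sketch. This is only an illustration of the technique, not CPA's implementation; the function name `pca_reduce` is invented for this example.

```python
import numpy as np

def pca_reduce(measurements, n_components=2):
    """Reduce a (samples x features) matrix to n_components principal components.

    Returns the projected coordinates and the proportion of the original
    variance explained by each component (the figure CPA shows on axis labels).
    """
    # Center each feature; unlike plain SVD, PCA subtracts the mean first.
    centered = measurements - measurements.mean(axis=0)
    # Singular value decomposition of the centered data.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    # Project onto the top components (rows of V^T).
    projected = centered @ vt[:n_components].T
    # Explained variance ratio; SVD returns singular values in descending
    # order, so PC1 is always the most significant component.
    explained = (s ** 2) / np.sum(s ** 2)
    return projected, explained[:n_components]

rng = np.random.default_rng(0)
data = rng.normal(size=(100, 5))
coords, ratios = pca_reduce(data, n_components=2)
```

Skipping the centering step turns this into the SVD variant described above, which is cheaper on large datasets but loses the guarantee that components are ordered by significance.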

docs/source/5_classifier.rst

Lines changed: 30 additions & 1 deletion
@@ -292,10 +292,15 @@ V.B.7 Data preparation
 
 Typically one wouldn't use the raw features as input for the machine learning, but the data is cleaned in some ways (e.g., by removing zero variance features) and normalized. Data preparation takes place before the machine learning is done, i.e., before training a classifier. We here describe how you can perform data preparation steps in CPA.
 
+*Scaling*
+*********
+
+Features can be normalised and centered before training/classification by activating the Scaler option in *Advanced > Use Scaler*. The features are centered to have mean 0 and scaled to have standard deviation 1.
+
 *Normalization Tool*
 ********************
 
-Typically the features are normalized before training a classifier. For example, the features are centered to have mean 0 and scaled to have standard deviation 1. This can be done in CPA with the Normalization Tool. From the main menu, navigate to Tools > Normalization Tool. You can choose which features to normalize.
+Outside the classifier, scaling can be done with the Normalization Tool. From the main menu, navigate to Tools > Normalization Tool. You can choose which features to normalize and save the resulting table for later use.
 
 *Removing zero variance features*
 *********************************
@@ -306,3 +311,27 @@ A zero variance feature is a feature that has the same entry for all objects, fo
 ***************
 
 A standard procedure is finding features with NAN (not a number) entries in the data and removing those cells. CPA automatically ignores cells with NANs, so this step has already been taken care of.
+
+
+V.B.8 Classifier types
+----------------------
+
+CPA supports several different classifier types:
+
+- **RandomForest**: Produces a series of decision tree classifiers and uses averaging across all trees to generate predictions.
+
+- **AdaBoost**: Fits a series of weak learners (simple classification rules which don't perform well alone). The input data is adjusted after each cycle to add weight to samples which the previous learner classified incorrectly. As learners are added, examples that are difficult to predict receive increasing influence. A final prediction is generated from a weighted majority vote across all learners.
+
+- **SVC**: Support Vector Classification. This technique considers all features and attempts to generate multi-dimensional dividing planes (termed "hyperplanes") which will distinguish between classes.
+
+- **GradientBoosting**: This takes a similar approach to AdaBoost, but uses gradients instead of weights to adjust the importance of individual samples.
+
+- **LogisticRegression**: Classifies objects via logistic regression. Classifications are made based on a series of curves corresponding to decision boundaries.
+
+- **LDA**: Linear Discriminant Analysis. This method projects the input data onto a linear subspace consisting of the directions which maximize the separation between classes, then establishes a boundary which discriminates between the classes.
+
+- **KNeighbors**: Classifies based on the majority class of the nearest *k* known samples. Classification is inferred from the training points nearest to the test sample.
+
+- **FastGentleBoosting**: A modification of the AdaBoost classification strategy, with optimisations for working with limited training data.
+
+- **Neural Network**: Generates a multi-layer perceptron neural network. Layers of neurons link each input feature to output features. Each neuron generates a signal based on its inputs and the weighting from each source. The user can customise the number of intermediate 'hidden' layers between the input (measurement) and output (class) neurons. Additional hidden layers can help to generate more complex classifications. The neuron count per layer should generally be set between the number of classes and the number of input features.
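Of the classifier types added above, KNeighbors is the simplest to illustrate: it takes a majority vote over the *k* training samples closest to the test point. Below is a minimal NumPy sketch of that idea, not CPA's actual implementation; `knn_predict` and the toy data are invented for this example.

```python
import numpy as np
from collections import Counter

def knn_predict(train_x, train_y, sample, k=3):
    """Classify one sample by majority vote of its k nearest training points."""
    # Euclidean distance from the sample to every training point.
    distances = np.linalg.norm(train_x - sample, axis=1)
    # Indices of the k closest training samples.
    nearest = np.argsort(distances)[:k]
    # Majority class among those neighbours.
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy training set: two well-separated clusters of labelled objects.
train_x = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
train_y = np.array(["negative", "negative", "positive", "positive"])
label = knn_predict(train_x, train_y, np.array([4.8, 5.1]), k=3)
```

Because distances are computed in the raw feature space, this classifier is particularly sensitive to feature scale, which is one motivation for the scaling step described in the data preparation section.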
