Purpose
To evaluate a random forest model for counting silicone oil droplets and non-silicone oil particles in protein formulations in which the two particle classes are highly imbalanced.
Methods
In this work, we present a novel approach for automated image analysis of flow microscopy data, based on random forest classification, that enables rapid analysis of large data sets. The random forest approach overcomes many of the limitations of traditional classification schemes derived from simple filters or regression models. In particular, it does not require a priori selection of important morphology parameters.
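As an illustration of the idea, the following minimal sketch trains a random forest on per-particle morphology features and lets the model rank the features itself, rather than selecting them beforehand. It assumes scikit-learn and NumPy are available. The feature names, the synthetic data, and all parameter values are hypothetical and are not taken from this work.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical morphology features (assumed names, not the paper's parameters).
feature_names = ["diameter_um", "circularity", "aspect_ratio", "mean_intensity"]


def simulate(n, circ_mean, aspect_mean, label):
    """Draw n synthetic particles; only shape features differ between classes."""
    X = np.column_stack([
        rng.lognormal(mean=0.7, sigma=0.4, size=n),        # diameter
        np.clip(rng.normal(circ_mean, 0.05, n), 0, 1),     # circularity
        np.clip(rng.normal(aspect_mean, 0.10, n), 0, 1),   # aspect ratio
        rng.normal(120, 15, n),                            # mean intensity
    ])
    return X, np.full(n, label)


# Imbalanced synthetic set: many round droplets (0), few irregular particles (1).
X_oil, y_oil = simulate(2000, circ_mean=0.95, aspect_mean=0.95, label=0)
X_non, y_non = simulate(100, circ_mean=0.70, aspect_mean=0.70, label=1)
X = np.vstack([X_oil, X_non])
y = np.concatenate([y_oil, y_non])

# Stratify so the rare class appears in both the training and the test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# class_weight="balanced" reweights samples to offset the class imbalance.
clf = RandomForestClassifier(
    n_estimators=200, class_weight="balanced", random_state=0
)
clf.fit(X_train, y_train)

# Balanced accuracy averages per-class recall, so the majority class
# cannot dominate the score.
balanced_acc = balanced_accuracy_score(y_test, clf.predict(X_test))
importances = dict(zip(feature_names, clf.feature_importances_))

print(f"balanced accuracy: {balanced_acc:.3f}")
for name, value in sorted(importances.items(), key=lambda kv: -kv[1]):
    print(f"{name:>15s}: {value:.3f}")
```

All four features are passed to the classifier unfiltered; the fitted importances then indicate which of them carried the discrimination on this synthetic set.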
Results
We analyzed silicone oil droplets and non-silicone oil particles observed in four model systems with protein concentrations of 20, 50 and 125 mg/mL. Filters based on random forests achieved higher classification accuracies than regression-based filters. Additionally, we showcase a procedure that allows for accurate counting of particles ≥1 μm.
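The counting procedure itself is not described in this abstract. As a generic illustration of why counting under class imbalance needs care, the sketch below applies a standard misclassification-rate correction to raw classifier counts. This is an assumed technique, not necessarily the procedure used in this work. The function name, the rates, and the counts are hypothetical.

```python
def corrected_counts(pred_oil, total, sensitivity, false_oil_rate):
    """Estimate true class counts from raw classifier counts.

    pred_oil       : number of particles the classifier labeled as silicone oil
    total          : total number of classified particles
    sensitivity    : P(predicted oil | truly oil), from labeled validation data
    false_oil_rate : P(predicted oil | truly non-oil), from the same data

    Solves  pred_oil = sensitivity * n_oil + false_oil_rate * (total - n_oil)
    for n_oil. This is a generic correction, assumed here for illustration.
    """
    denom = sensitivity - false_oil_rate
    if abs(denom) < 1e-12:
        raise ValueError("classifier does not separate the classes")
    n_oil = (pred_oil - false_oil_rate * total) / denom
    # Clamp to the physically possible range [0, total].
    n_oil = min(max(n_oil, 0.0), float(total))
    return n_oil, total - n_oil


# Hypothetical example: 10,000 particles, 9,360 labeled as oil by a classifier
# with 98% sensitivity that mislabels 10% of non-oil particles as oil.
raw_non_oil = 10_000 - 9_360
est_oil, est_non_oil = corrected_counts(9_360, 10_000, 0.98, 0.10)
print(f"raw non-oil count:       {raw_non_oil}")
print(f"corrected non-oil count: {est_non_oil:.0f}")
```

In this made-up example a small error rate on the abundant class inflates the raw count of the rare class substantially, which is the effect a count correction is meant to remove.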
Conclusions
Our method is generally applicable to the classification and counting of different classes of particles, as long as the classes differ in morphology.