Abstract

We report a quantitative approach to optimize implementation of discovery-based software for comprehensive two-dimensional gas chromatography coupled with time-of-flight mass spectrometry (GC × GC-TOFMS). The software performs a tile-based Fisher ratio (F-ratio) analysis and facilitates a supervised nontargeted analysis based upon the experimental design to aid in the discovery of analytes with statistically different variances between sample classes. The quantitative approach for software optimization uses receiver operating characteristic (ROC) curves. The area under the curve (AUC) for each ROC curve serves as a quantitative metric to optimize two key algorithm parameters: the signal-to-noise ratio (S/N) threshold applied to the data prior to calculating F-ratios at each m/z mass channel, and the number of these per-m/z F-ratios used to calculate the average F-ratio of a tile. A total of 25 combinations of S/N threshold by number of m/z were studied. Fifty analytes were spiked into a diesel fuel at two concentration levels to produce two sample classes that should in principle produce 50 positive instances in the ROC curves. The "sweet spot" for F-ratio analysis was determined to be an S/N threshold of 10 coupled with a maximum of the 10 most chemically selective m/z (requiring a minimum of 3 m/z), corresponding to an approximately 21% improvement in the discrimination of true positives relative to prior studies. This equates to an additional 9 true positives being discovered at a false positive probability of 0.2 and 5 additional true positives being found overall. Furthermore, optimization of these software parameters did not depend upon a priori determination of the statistically correct number of positive instances in the sample classes. The AUC metric appears to be suitable for the evaluation of all data analysis methods that utilize the proper experimental design.
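The workflow described above can be illustrated with a minimal Python sketch. It is not the authors' software: the function names, the one-way ANOVA form of the F-ratio, the use of the largest F-ratios as a stand-in for the "most chemically selective" m/z, and the choice to score a tile as 0.0 when too few m/z pass the S/N threshold are all assumptions made for illustration. The sketch computes a per-channel F-ratio for two sample classes, averages the top F-ratios within a tile, and scores a ranked hit list by ROC AUC.

```python
from typing import Sequence


def fisher_ratio(class_a: Sequence[float], class_b: Sequence[float]) -> float:
    """Between-class variance over pooled within-class variance (two classes).

    Assumes the within-class variance is nonzero.
    """
    n_a, n_b = len(class_a), len(class_b)
    mean_a = sum(class_a) / n_a
    mean_b = sum(class_b) / n_b
    grand = (sum(class_a) + sum(class_b)) / (n_a + n_b)
    # Two classes give one between-class degree of freedom.
    between = n_a * (mean_a - grand) ** 2 + n_b * (mean_b - grand) ** 2
    within = (
        sum((x - mean_a) ** 2 for x in class_a)
        + sum((x - mean_b) ** 2 for x in class_b)
    ) / (n_a + n_b - 2)
    return between / within


def tile_f_ratio(
    f_ratios: Sequence[float],
    signal_to_noise: Sequence[float],
    sn_threshold: float = 10.0,  # hypothetical parameter name
    max_mz: int = 10,            # hypothetical parameter name
    min_mz: int = 3,             # hypothetical parameter name
) -> float:
    """Average the largest per-m/z F-ratios among channels passing the S/N threshold."""
    kept = sorted(
        (f for f, sn in zip(f_ratios, signal_to_noise) if sn >= sn_threshold),
        reverse=True,
    )
    if len(kept) < min_mz:
        return 0.0  # assumption: tiles with too few usable m/z are not ranked
    top = kept[:max_mz]
    return sum(top) / len(top)


def roc_auc(ranked_labels: Sequence[bool]) -> float:
    """AUC of the ROC curve traced by walking down a hit list.

    ranked_labels[i] is True when the i-th hit (sorted by descending
    tile F-ratio) is a known spiked analyte, i.e. a true positive.
    """
    n_pos = sum(ranked_labels)
    n_neg = len(ranked_labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative hit")
    true_positives = 0
    area = 0
    for is_positive in ranked_labels:
        if is_positive:
            true_positives += 1
        else:
            # Each false positive adds a column whose height is the
            # number of true positives found so far.
            area += true_positives
    return area / (n_pos * n_neg)
```

Under these assumptions, a parameter study like the one in the abstract would rebuild the hit list for each of the 25 combinations of `sn_threshold` and `max_mz`, compute `roc_auc` for each, and keep the combination with the largest AUC.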

  • Publication date: 2017-3-21