Date of Award
2023
Document Type
Dissertation
Degree Name
Doctor of Philosophy (PhD)
Department
Computer Engineering
Abstract
This study explored the use of deep learning artificial intelligence, machine learning, and non-parametric statistics to detect various classes of malware attacks against Raspberry Pi hardware devices over unrestrained digital communication networks. Furthermore, permuted tests of statistical significance were applied to these artificial intelligence and machine learning models to provide scientific evidence for the statistical significance of the findings from the predictive classification models. Much effort has been made in recent years to apply various types of machine learning (non-parametric statistical learning systems) to use cases involving network intrusion detection and malware identification. That prior research provides thoroughly tested methodologies that offer a wealth of prediction capabilities for this problem space, but with little if any attempt at scientific proof of the effectiveness of these predictions. Much was being predicted in the field of applied machine learning and artificial intelligence for digital communication systems, but little was being scientifically proven. In general, the application of scientific tests of falsifiability has been lacking in this field because of the gap between the non-linear, non-parametric nature of machine learning models and the generally accepted methodologies for providing scientific evidence of causal relationships. Falsifiable hypothesis tests are rooted in the general linear model, in methods such as linear regression, logistic regression, ANOVA, and MANOVA. Because those classical statistical methods address linear relationships between the independent and dependent variables, they are generally incapable of adequately analyzing relationships and predictions developed by these non-parametric statistical learning system models.
Literature reviews support the principal concept of this research: that non-parametric statistical learning systems can be subjected to statistical hypothesis testing based on classical general linear model-based statistical methods, under specific frameworks that create regularization effects. Thus, incorporating permuted tests of statistical significance, using methods such as PERMANOVA, allowed scientific tests of statistical significance to be carried out over the course of this research. The net result was a defense-in-depth study across a range of non-parametric statistical learning models that analyzed various aspects and use cases of malware on networked Raspberry Pi devices. Various permuted tests of statistical significance were conducted to produce non-parametric p-values, along with non-Euclidean distance-based tests of homogeneity between groups and non-parametric visualization outputs. These non-parametric validation models successfully analyzed various cause-and-effect relationships responsive to the four separate research questions, which were addressed through null and alternative hypothesis statements. The permuted tests of statistical significance and non-parametric measurement tests of homogeneity analyzed the predicted output from the cohort of supervised and unsupervised learning predictive models, providing strong scientific evidence for the effectiveness of artificial intelligence models applied to this specific domain.
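As an illustration of the general idea of a permuted test of statistical significance applied to a classifier's predicted output, the following is a minimal sketch and not the dissertation's actual procedure. It uses a simple label-permutation test on classification accuracy rather than PERMANOVA, which is a distance-based method. The function name `permutation_p_value`, its parameters, and the synthetic labels in the demonstration are all hypothetical and chosen only for this example.

```python
import numpy as np


def permutation_p_value(y_true, y_pred, n_permutations=2000, seed=0):
    """One-sided permutation p-value for classification accuracy.

    Null hypothesis: the predictions are unrelated to the true labels,
    so shuffling the true labels should match the predictions about as
    often as the unshuffled labels do.
    (Hypothetical helper for illustration only.)
    """
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)

    # Accuracy observed with the real pairing of labels and predictions.
    observed = float(np.mean(y_true == y_pred))

    # Count permutations whose accuracy is at least as large as observed.
    at_least_as_good = 0
    for _ in range(n_permutations):
        shuffled = rng.permutation(y_true)
        if np.mean(shuffled == y_pred) >= observed:
            at_least_as_good += 1

    # Add-one correction keeps the estimated p-value strictly above zero.
    p_value = (at_least_as_good + 1) / (n_permutations + 1)
    return observed, p_value


if __name__ == "__main__":
    # Synthetic four-class example (assumed data, not from the study).
    rng = np.random.default_rng(42)
    labels = rng.integers(0, 4, size=200)
    noise = rng.integers(0, 4, size=200)
    # Predictions agree with the labels most of the time by construction.
    preds = np.where(rng.random(200) < 0.8, labels, noise)
    acc, p = permutation_p_value(labels, preds)
    print(f"accuracy={acc:.3f}, permutation p-value={p:.4f}")
```

A small p-value here indicates only that the observed accuracy would be unlikely if the predictions were unrelated to the labels. It does not by itself establish the cause-and-effect relationships that the study's hypothesis statements address.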
Recommended Citation
Woolman, Thomas A., "A Multinomial Classification And Prediction Model Utilizing Deep Learning For Malware Detection On Raspberry Pi Internet Of Things Devices Using Unrestrained Network Connections" (2023). All-Inclusive List of Electronic Theses and Dissertations. 1827.
https://scholars.indianastate.edu/etds/1827