Search Results (63)

Search Parameters:
Journal = Analytics

16 pages, 1349 KiB  
Article
An Optimal House Price Prediction Algorithm: XGBoost
Analytics 2024, 3(1), 30-45; https://doi.org/10.3390/analytics3010003 - 02 Jan 2024
Viewed by 568
Abstract
An accurate prediction of house prices is a fundamental requirement for various sectors, including real estate and mortgage lending. It is widely recognized that a property’s value is not solely determined by its physical attributes but is significantly influenced by its surrounding neighborhood. Meeting the diverse housing needs of individuals while balancing budget constraints is a primary concern for real estate developers. To this end, we addressed the house price prediction problem as a regression task and thus employed various machine learning (ML) techniques capable of expressing the significance of independent variables. We used the housing dataset of Ames, Iowa, USA to compare XGBoost, support vector regressor, random forest regressor, multilayer perceptron, and multiple linear regression algorithms for house price prediction. Afterwards, we identified the key factors that influence housing costs. Our results show that XGBoost is the best-performing model for house price prediction. Our findings present valuable insights and tools for stakeholders, facilitating more accurate property price estimates and, in turn, enabling more informed decision making to meet the housing needs of diverse populations while considering budget constraints. Full article
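The boosting idea behind XGBoost can be illustrated in miniature. The sketch below is a toy pure-Python gradient booster over single-feature decision stumps with squared-error loss (no regularization, no real trees), and the data in the usage example are invented; actual house-price work would use the XGBoost library on the full Ames feature set.

```python
# Toy gradient boosting with depth-1 regression stumps: a minimal
# sketch of the idea behind XGBoost, not the actual library.

def fit_stump(xs, residuals):
    """Pick the threshold split that minimizes squared error on residuals."""
    best = None
    points = sorted(set(xs))
    for i in range(len(points) - 1):
        thr = (points[i] + points[i + 1]) / 2
        left = [r for x, r in zip(xs, residuals) if x <= thr]
        right = [r for x, r in zip(xs, residuals) if x > thr]
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        err = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, thr, lmean, rmean)
    _, thr, lmean, rmean = best
    return lambda x: lmean if x <= thr else rmean

def predict(base, stumps, x, lr=0.3):
    # Ensemble prediction: base value plus shrunken stump corrections.
    return base + lr * sum(s(x) for s in stumps)

def boost(xs, ys, rounds=50, lr=0.3):
    base = sum(ys) / len(ys)  # start from the mean
    stumps = []
    for _ in range(rounds):
        residuals = [y - predict(base, stumps, x, lr) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))  # each round fits the residuals
    return base, stumps

# On a step-shaped target, the ensemble quickly recovers the jump:
base, stumps = boost([1, 2, 3, 4, 5, 6, 7, 8], [0, 0, 0, 0, 10, 10, 10, 10])
```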

16 pages, 3053 KiB  
Article
Exploring Infant Physical Activity Using a Population-Based Network Analysis Approach
Analytics 2024, 3(1), 14-29; https://doi.org/10.3390/analytics3010002 - 31 Dec 2023
Viewed by 444
Abstract
Background: Physical activity (PA) is an important aspect of infant development and has been shown to have long-term effects on health and well-being. Accurate analysis of infant PA is crucial for understanding their physical development, monitoring health and wellness, as well as identifying areas for improvement. However, individual analysis of infant PA can be challenging and often leads to biased results due to an infant’s inability to self-report and constantly changing posture and movement. This manuscript explores a population-based network analysis approach to study infants’ PA. The network analysis approach allows us to draw conclusions that are generalizable to the entire population and to identify trends and patterns in PA levels. Methods: This study aims to analyze the PA of infants aged 6–15 months using accelerometer data. A total of 20 infants from different types of childcare settings were recruited, including home-based and center-based care. Each infant wore an accelerometer for four days (2 weekdays, 2 weekend days). Data were analyzed using a network analysis approach, exploring the relationship between PA and various demographic and social factors. Results: The results showed that infants in center-based care have significantly higher levels of PA than those in home-based care. Moreover, the ankle acceleration was much higher than the waist acceleration, and activity patterns differed on weekdays and weekends. Conclusions: This study highlights the need for further research to explore the factors contributing to disparities in PA levels among infants in different childcare settings. Additionally, there is a need to develop effective strategies to promote PA among infants, considering the findings from the network analysis approach. Such efforts can contribute to enhancing infant health and well-being through targeted interventions aimed at increasing PA levels. Full article
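As a minimal sketch of the population-level network idea (with invented activity summaries, not the study's accelerometer data), one can link two infants whenever their activity profiles correlate strongly and then inspect the resulting graph:

```python
# Build a simple correlation-threshold network over per-infant activity
# series. Names and numbers below are illustrative, not study data.

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (sa * sb)

def activity_network(series, threshold=0.8):
    # Edge between two infants if their activity profiles correlate strongly.
    names = list(series)
    edges = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if pearson(series[names[i]], series[names[j]]) >= threshold:
                edges.append((names[i], names[j]))
    return edges

series = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [4, 3, 2, 1]}
```

Clusters and trends in such a graph generalize over the cohort rather than any single infant, which is the point of the network approach described above.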
(This article belongs to the Special Issue Feature Papers in Analytics)

13 pages, 4697 KiB  
Article
Does Part of Speech Have an Influence on Cyberbullying Detection?
Analytics 2024, 3(1), 1-13; https://doi.org/10.3390/analytics3010001 - 21 Dec 2023
Viewed by 335
Abstract
With the development of the Internet, the issue of cyberbullying on social media has gained significant attention. Cyberbullying is often expressed in text. Methods of identifying such text via machine learning have been growing, most of which rely on the extraction of part-of-speech (POS) tags to improve their performance. However, existing studies have used only the part-of-speech labels they considered reasonable, chosen arbitrarily, without investigating whether those labels actually enhance the effectiveness of the cyberbullying detection task. In other words, the effectiveness of different part-of-speech labels in the automatic cyberbullying detection task was not proven. This study aimed to investigate the part of speech in statements related to cyberbullying and explore how three classification models (random forest, naïve Bayes, and support vector machine) are sensitive to parts of speech in detecting cyberbullying. We also examined which part-of-speech combinations are most appropriate for the models mentioned above. The results of our experiments showed that the predictive performance of different models differs when using different part-of-speech tags as inputs. Random forest showed the best predictive performance, followed by naïve Bayes and the support vector machine, in that order. Meanwhile, across the different models, the sensitivity to different part-of-speech tags was consistent, with greater sensitivity shown towards nouns, verbs, and measure words, and lower sensitivity shown towards adjectives and pronouns. We also found that the combination of different parts of speech as inputs had an influence on the predictive performance of the models. This study will help researchers to determine which combination of part-of-speech categories is appropriate to improve the accuracy of cyberbullying detection. Full article
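The effect of POS selection on a classifier can be sketched with a toy example: a hypothetical POS lexicon filters tokens before a count-based naïve Bayes classifier, so changing the kept categories changes the feature set. All words, tags, and labels below are invented for illustration; a real pipeline would use a trained POS tagger and a proper corpus.

```python
# Toy illustration: POS-filtered features feeding add-one-smoothed
# naive Bayes. Lexicon and documents are invented.

from collections import Counter

POS = {"idiot": "NOUN", "hate": "VERB", "you": "PRON", "nice": "ADJ",
       "day": "NOUN", "love": "VERB", "ugly": "ADJ", "freak": "NOUN"}

def features(text, keep=("NOUN", "VERB")):
    # Keep only tokens whose POS tag is in the chosen categories.
    return [w for w in text.lower().split() if POS.get(w) in keep]

def train(docs):
    counts = {"bully": Counter(), "benign": Counter()}
    for label, text in docs:
        counts[label].update(features(text))
    return counts

def classify(counts, text):
    scores = {}
    for label, c in counts.items():
        total = sum(c.values()) + len(POS)  # add-one smoothing denominator
        score = 1.0
        for w in features(text):
            score *= (c[w] + 1) / total
        scores[label] = score
    return max(scores, key=scores.get)

docs = [("bully", "you idiot freak"), ("bully", "hate you"),
        ("benign", "nice day"), ("benign", "love day")]
counts = train(docs)
```

Rerunning `train`/`classify` with a different `keep` tuple (e.g. adjectives and pronouns) is exactly the kind of sensitivity comparison the study performs at scale.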

22 pages, 358 KiB  
Article
Learning Analytics in the Era of Large Language Models
Analytics 2023, 2(4), 877-898; https://doi.org/10.3390/analytics2040046 - 16 Nov 2023
Viewed by 1454
Abstract
Learning analytics (LA) has the potential to significantly improve teaching and learning, but there are still many areas for improvement in LA research and practice. The literature highlights limitations in every stage of the LA life cycle, including scarce pedagogical grounding and poor design choices in the development of LA, challenges in the implementation of LA with respect to the interpretability of insights, prediction, and actionability of feedback, and lack of generalizability and strong practices in LA evaluation. In this position paper, we advocate for empowering teachers in developing LA solutions. We argue that this would enhance the theoretical basis of LA tools and make them more understandable and practical. We present some instances where process data can be utilized to comprehend learning processes and generate more interpretable LA insights. Additionally, we investigate the potential implementation of large language models (LLMs) in LA to produce comprehensible insights, provide timely and actionable feedback, enhance personalization, and support teachers’ tasks more extensively. Full article
(This article belongs to the Special Issue New Insights in Learning Analytics)

24 pages, 857 KiB  
Article
A Comparative Analysis of VirLock and Bacteriophage ϕ6 through the Lens of Game Theory
Analytics 2023, 2(4), 853-876; https://doi.org/10.3390/analytics2040045 - 06 Nov 2023
Viewed by 719
Abstract
The novelty of this paper lies in its perspective, which underscores the fruitful correlation between biological and computer viruses. In the realm of computer science, the study of theoretical concepts often intersects with practical applications. Computer viruses have many common traits with their biological counterparts. Studying their correlation may enhance our perspective and, ultimately, augment our ability to successfully protect our computer systems and data against viruses. Game theory may be an appropriate tool for establishing the link between biological and computer viruses. In this work, we establish correlations between a well-known computer virus, VirLock, and an equally well-studied biological virus, the bacteriophage ϕ6. VirLock is a formidable ransomware that encrypts user files and demands a ransom for data restoration. Drawing a parallel with the biological virus bacteriophage ϕ6, we uncover conceptual links like shared attributes and behaviors, as well as useful insights. Following this line of thought, we suggest efficient strategies based on a game theory perspective, which have the potential to address the infections caused by VirLock and other viruses with analogous behavior. Moreover, we propose mathematical formulations that integrate real-world variables, providing a means to gauge virus severity and design robust defensive strategies and analytics. This interdisciplinary inquiry, fusing game theory, biology, and computer science, advances our understanding of virus behavior, paving the way for the development of effective countermeasures while presenting an alternative viewpoint. Throughout this theoretical exploration, we contribute to the ongoing discourse on computer virus behavior and stimulate new avenues for addressing digital threats.
In particular, the formulas and framework developed in this work can facilitate better risk analysis and assessment, and become useful tools in penetration testing analysis, helping companies and organizations enhance their security. Full article
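The game-theoretic framing can be made concrete with a small example: a defender-versus-ransomware game in normal form, searched by brute force for pure-strategy Nash equilibria. The 2x2 payoff matrix below is entirely invented for illustration and is not taken from the paper.

```python
# Brute-force pure-strategy Nash equilibria for a two-player game.
# payoffs[(d, a)] = (defender_payoff, attacker_payoff); numbers invented.

def nash_equilibria(payoffs):
    d_moves = sorted({d for d, _ in payoffs})
    a_moves = sorted({a for _, a in payoffs})
    eq = []
    for d in d_moves:
        for a in a_moves:
            pd, pa = payoffs[(d, a)]
            # A cell is an equilibrium if neither player gains by deviating alone.
            best_d = all(payoffs[(d2, a)][0] <= pd for d2 in d_moves)
            best_a = all(payoffs[(d, a2)][1] <= pa for a2 in a_moves)
            if best_d and best_a:
                eq.append((d, a))
    return eq

GAME = {
    ("backup", "encrypt"):    (-1, 1),   # backups blunt the ransom demand
    ("backup", "wait"):       (0, 0),
    ("no_backup", "encrypt"): (-10, 10), # worst case for the defender
    ("no_backup", "wait"):    (1, 0),
}
```

With these illustrative numbers, maintaining backups is the defender's equilibrium strategy even though the attacker still encrypts, which mirrors the mitigation reasoning in the abstract.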

17 pages, 543 KiB  
Article
Can Oral Grades Predict Final Examination Scores? Case Study in a Higher Education Military Academy
Analytics 2023, 2(4), 836-852; https://doi.org/10.3390/analytics2040044 - 02 Nov 2023
Viewed by 572
Abstract
This paper investigates the correlation between oral grades and final written examination grades in a higher education military academy. A quantitative, correlational methodology utilizing linear regression analysis is employed. The data consist of undergraduate telecommunications and electronics engineering students’ grades in two courses offered during the fourth year of studies, and span six academic years. Course One covers 2017–2022, while Course Two is split into period 1 (2014–2018) and period 2 (2019–2022). In Course One, oral grades are obtained by means of a midterm exam. In Course Two, period 1, 30% of the oral grade comes from homework assignments and lab exercises, while the remaining 70% comes from a midterm exam. In Course Two, period 2, oral grades are the result of various alternative assessment activities. In all cases, the final grade results from a traditional written examination given at the end of the semester. Correlation and predictive models between oral and final grades were examined. The results of the analysis demonstrated that (a) under certain conditions, oral grades based more or less on midterm exams can be good predictors of final examination scores; (b) oral grades obtained through alternative assessment activities cannot predict final examination scores. Full article
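The core of this correlational method is ordinary least squares of final exam grade on oral grade, plus the Pearson correlation. A minimal self-contained sketch (grade pairs below are invented, not the academy's data):

```python
# Simple linear regression (OLS) and Pearson r for paired grades.

def ols(xs, ys):
    """Return (slope, intercept, pearson_r) for y regressed on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r

# Illustrative: oral grades vs. final exam grades on a 0-10 scale.
slope, intercept, r = ols([5, 6, 7, 8], [6, 7, 8, 9])
```

A strong r for midterm-based oral grades and a weak r for alternative-assessment grades would reproduce, in miniature, findings (a) and (b) above.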
(This article belongs to the Special Issue New Insights in Learning Analytics)

12 pages, 506 KiB  
Article
Relating the Ramsay Quotient Model to the Classical D-Scoring Rule
Analytics 2023, 2(4), 824-835; https://doi.org/10.3390/analytics2040043 - 17 Oct 2023
Viewed by 504
Abstract
In a series of papers, Dimitrov suggested the classical D-scoring rule for scoring items, which gives difficult items a higher weight while easier items receive a lower weight. The latent D-scoring model has been proposed to serve as a latent mirror of the classical D-scoring model. However, the item weights implied by this latent D-scoring model are typically only weakly related to the weights in the classical D-scoring model. To this end, this article proposes an alternative item response model, the modified Ramsay quotient model, that is better suited as a latent mirror of the classical D-scoring model. The reasoning is based on analytical arguments and numerical illustrations. Full article
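The weighting idea can be sketched numerically. Assuming, for illustration only, that an item's weight is taken as one minus its proportion-correct (harder items get larger weights, as the abstract describes; this is not necessarily Dimitrov's exact formula), a normalized D-type score looks like:

```python
# Illustrative difficulty-weighted scoring: weight_j = 1 - p_j,
# normalized so a perfect response pattern scores 1.0.
# The weight formula is an assumption for this sketch.

def d_score(responses, p):
    """responses: 0/1 per item; p: proportion correct per item."""
    weights = [1 - pj for pj in p]          # harder item -> larger weight
    total = sum(weights)
    return sum(w * r for w, r in zip(weights, responses)) / total

# Three items, from easy (p=0.9) to hard (p=0.1):
p = [0.9, 0.5, 0.1]
```

Under this scheme, answering only the hard item correctly scores higher than answering only the easy one, which is precisely the property the classical D-scoring rule is built around.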

15 pages, 3294 KiB  
Article
An Exploration of Clustering Algorithms for Customer Segmentation in the UK Retail Market
Analytics 2023, 2(4), 809-823; https://doi.org/10.3390/analytics2040042 - 12 Oct 2023
Cited by 1 | Viewed by 1361
Abstract
Recently, people’s awareness of online purchasing has risen significantly. This has given rise to online retail platforms and the need for a better understanding of customer purchasing behaviour. Retail companies are pressed with the need to deal with a high volume of customer purchases, which requires sophisticated approaches to perform more accurate and efficient customer segmentation. Customer segmentation is a marketing analytical tool that aids customer-centric service and thus enhances profitability. In this paper, we aim to develop a customer segmentation model to improve decision-making processes in the retail market industry. To achieve this, we employed a UK-based online retail dataset obtained from the UCI machine learning repository. The retail dataset consists of 541,909 customer records and eight features. Our study adopted the RFM (recency, frequency, and monetary) framework to quantify customer values. Thereafter, we compared several state-of-the-art (SOTA) clustering algorithms, namely, K-means clustering, the Gaussian mixture model (GMM), density-based spatial clustering of applications with noise (DBSCAN), agglomerative clustering, and balanced iterative reducing and clustering using hierarchies (BIRCH). The results showed the GMM outperformed other approaches, with a Silhouette Score of 0.80. Full article
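The RFM step that precedes the clustering comparison can be sketched directly: collapse a transaction log into one (recency, frequency, monetary) triple per customer. Field names and the toy transactions below are invented; the clustering stage (e.g. a Gaussian mixture) would then run on these triples.

```python
# Compute RFM triples from a (customer_id, purchase_date, amount) log.

from datetime import date

def rfm(transactions, today):
    """Return {customer_id: (recency_days, frequency, monetary)}."""
    out = {}
    for cid, d, amount in transactions:
        r, f, m = out.get(cid, (None, 0, 0.0))
        days = (today - d).days                 # recency candidate
        r = days if r is None else min(r, days) # days since most recent purchase
        out[cid] = (r, f + 1, m + amount)       # frequency and monetary totals
    return out

tx = [("C1", date(2023, 1, 1), 10.0),
      ("C1", date(2023, 1, 10), 20.0),
      ("C2", date(2023, 1, 5), 5.0)]
```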

28 pages, 1699 KiB  
Article
A Novel Curve Clustering Method for Functional Data: Applications to COVID-19 and Financial Data
Analytics 2023, 2(4), 781-808; https://doi.org/10.3390/analytics2040041 - 08 Oct 2023
Viewed by 823
Abstract
Functional data analysis has significantly enriched the landscape of existing data analysis methodologies, providing a new framework for comprehending data structures and extracting valuable insights. This paper is dedicated to addressing functional data clustering—a pivotal challenge within functional data analysis. Our contribution to this field manifests through the introduction of innovative clustering methodologies tailored specifically to functional curves. Initially, we present a proximity measure algorithm designed for functional curve clustering. This innovative clustering approach offers the flexibility to redefine measurement points on continuous functions, adapting to either equidistant or nonuniform arrangements, as dictated by the demands of the proximity measure. Central to this method is the “proximity threshold”, a critical parameter that governs the cluster count, and its selection is thoroughly explored. Subsequently, we propose a time-shift clustering algorithm designed for time-series data. This approach identifies historical data segments that share patterns similar to those observed in the present. To evaluate the effectiveness of our methodologies, we conduct comparisons with the classic K-means clustering method and apply them to simulated data, yielding encouraging simulation results. Moving beyond simulation, we apply the proposed proximity measure algorithm to COVID-19 data, yielding notable clustering accuracy. Additionally, the time-shift clustering algorithm is employed to analyse NASDAQ Composite data, successfully revealing underlying economic cycles. Full article
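The role of the "proximity threshold" can be sketched with a greedy toy: two sampled curves join the same cluster when their proximity (here taken as the maximum pointwise gap, an assumption of this sketch; the paper's measure and algorithm may differ) falls below the threshold.

```python
# Greedy threshold-based clustering of curves sampled at common points.
# The proximity measure here (max pointwise gap) is illustrative.

def proximity(c1, c2):
    return max(abs(a - b) for a, b in zip(c1, c2))

def cluster_curves(curves, threshold):
    clusters = []  # each cluster is represented by its first curve
    labels = []
    for c in curves:
        for k, rep in enumerate(clusters):
            if proximity(c, rep) <= threshold:
                labels.append(k)
                break
        else:
            clusters.append(c)       # no close representative: new cluster
            labels.append(len(clusters) - 1)
    return labels

curves = [[0, 0, 0], [0.1, 0.0, 0.1], [5, 5, 5]]
```

Lowering the threshold splits clusters and raising it merges them, which is why the paper treats its selection as a critical parameter governing the cluster count.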
(This article belongs to the Special Issue Feature Papers in Analytics)

36 pages, 34844 KiB  
Article
Image Segmentation of the Sudd Wetlands in South Sudan for Environmental Analytics by GRASS GIS Scripts
Analytics 2023, 2(3), 745-780; https://doi.org/10.3390/analytics2030040 - 21 Sep 2023
Viewed by 1065
Abstract
This paper presents the object detection algorithms of GRASS GIS applied to Landsat 8-9 OLI/TIRS data. The study area includes the Sudd wetlands located in South Sudan. This study describes a programming method for the automated processing of satellite images for environmental analytics, applying the scripting algorithms of GRASS GIS. This study documents how the land cover changed and developed over time in South Sudan with varying climate and environmental settings, indicating the variations in landscape patterns. A set of modules was used to process satellite images via a scripting language, which streamlines geospatial processing tasks. The functionality of the GRASS GIS modules for image processing is called within scripts as subprocesses, which automate operations. The cutting-edge tools of GRASS GIS present a cost-effective solution to remote sensing data modelling and analysis. This is based on the discrimination of the spectral reflectance of pixels on the raster scenes. Scripting algorithms of remote sensing data processing based on the GRASS GIS syntax are run from the terminal, enabling commands to be passed to the modules. This ensures the automation and high speed of image processing. The algorithmic challenge is that landscape patterns differ substantially, and there are nonlinear dynamics in land cover types due to environmental factors and climate effects. Time series analysis of several multispectral images demonstrated changes in land cover types over the study area of the Sudd, South Sudan, affected by environmental degradation of landscapes. A map is generated for each Landsat image from 2015 to 2023 using the maximum-likelihood discriminant analysis approach to classification. The methodology includes image segmentation by the ‘i.segment’ module, image clustering and classification by the ‘i.cluster’ and ‘i.maxlike’ modules, accuracy assessment by the ‘r.kappa’ module, and the computation of NDVI and cartographic mapping implemented using GRASS GIS. The benefits of object detection techniques for image analysis are demonstrated with the reported effects of various threshold levels of segmentation. The segmentation was performed with a threshold of 90% and minsize = 5; the process converged in 37 to 41 iterations. The following numbers of segments were defined for the images: 4515 for 2015, 4813 for 2016, 4114 for 2017, 5090 for 2018, 6021 for 2019, 3187 for 2020, 2445 for 2022, and 5181 for 2023. The percent convergence is 98% for the processed images. Detecting variations in land cover patterns is possible using spaceborne datasets and advanced applications of scripting algorithms. The implications of the cartographic approach for environmental landscape analysis are discussed. The algorithm for image processing is based on a set of GRASS GIS wrapper functions for automated image classification. Full article
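The maximum-likelihood classification step that ‘i.maxlike’ performs can be sketched in miniature: assign each pixel to the class whose Gaussian gives the highest log-likelihood. This toy uses a single band with invented per-class means and variances; the real GRASS module operates on multiband rasters with full covariance matrices.

```python
# Toy per-pixel maximum-likelihood classification (single band,
# invented class statistics; a sketch of the 'i.maxlike' idea only).

import math

def loglik(x, mean, var):
    """Log-likelihood of reflectance x under a 1-D Gaussian."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

def classify_pixel(x, classes):
    # classes: {name: (mean, variance)} of spectral reflectance
    return max(classes, key=lambda k: loglik(x, *classes[k]))

CLASSES = {"water": (0.05, 0.01), "wetland": (0.3, 0.02), "dryland": (0.6, 0.05)}
```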
(This article belongs to the Special Issue Feature Papers in Analytics)

37 pages, 6017 KiB  
Review
Application of Machine Learning and Deep Learning Models in Prostate Cancer Diagnosis Using Medical Images: A Systematic Review
Analytics 2023, 2(3), 708-744; https://doi.org/10.3390/analytics2030039 - 19 Sep 2023
Viewed by 1199
Abstract
Introduction: Prostate cancer (PCa) is one of the deadliest and most common causes of malignancy and death in men worldwide, with a higher prevalence and mortality in developing countries specifically. Factors such as age, family history, race and certain genetic mutations are some of the factors contributing to the occurrence of PCa in men. Recent advances in technology and algorithms gave rise to the computer-aided diagnosis (CAD) of PCa. With the availability of medical image datasets and emerging trends in state-of-the-art machine and deep learning techniques, there has been a growth in recent related publications. Materials and Methods: In this study, we present a systematic review of PCa diagnosis with medical images using machine learning and deep learning techniques. We conducted a thorough review of the relevant studies indexed in four databases (IEEE, PubMed, Springer and ScienceDirect) using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. With well-defined search terms, a total of 608 articles were identified, and 77 met the final inclusion criteria. The key elements in the included papers are presented and conclusions are drawn from them. Results: The findings show that the United States has the most research in PCa diagnosis with machine learning, Magnetic Resonance Images are the most used datasets and transfer learning is the most used method of diagnosing PCa in recent times. In addition, some available PCa datasets and some key considerations for the choice of loss function in the deep learning models are presented. The limitations and lessons learnt are discussed, and some key recommendations are made. Conclusion: The discoveries and the conclusions of this work are organized so as to enable researchers in the same domain to use this work and make crucial implementation decisions. Full article

14 pages, 1727 KiB  
Article
The Use of a Large Language Model for Cyberbullying Detection
Analytics 2023, 2(3), 694-707; https://doi.org/10.3390/analytics2030038 - 06 Sep 2023
Cited by 1 | Viewed by 1140
Abstract
The dominance of social media has added to the channels of bullying available to perpetrators. Unfortunately, cyberbullying (CB) is the most prevalent phenomenon in today’s cyber world, and is a severe threat to the mental and physical health of citizens. This creates the need to develop a robust system to prevent bullying content on online forums, blogs, and social media platforms and to manage its impact on our society. Several machine learning (ML) algorithms have been proposed for this purpose. However, their performances are not consistent due to high class imbalance and generalisation issues. In recent years, large language models (LLMs) like BERT and RoBERTa have achieved state-of-the-art (SOTA) results in several natural language processing (NLP) tasks. Unfortunately, LLMs have not been applied extensively for CB detection. In our paper, we explored the use of these models for cyberbullying (CB) detection. We prepared a new dataset (D2) from existing studies (Formspring and Twitter). Our experimental results for datasets D1 and D2 showed that RoBERTa outperformed the other models. Full article

18 pages, 462 KiB  
Article
Heterogeneous Ensemble for Medical Data Classification
Analytics 2023, 2(3), 676-693; https://doi.org/10.3390/analytics2030037 - 04 Sep 2023
Viewed by 655
Abstract
For robust classification, selecting a proper classifier is of primary importance. However, selecting the best classifier depends on the problem, as some classifiers work better on some tasks than on others. Despite the many results collected in the literature, the support vector machine (SVM) remains the leading adopted solution in many domains, thanks to its ease of use. In this paper, we propose a new method based on convolutional neural networks (CNNs) as an alternative to SVM. CNNs are specialized in processing data in a grid-like topology that usually represents images. To enable CNNs to work on different data types, we investigated reshaping one-dimensional vector representations into two-dimensional matrices and compared different approaches for feeding standard CNNs using two-dimensional feature vector representations. We evaluated the different techniques by proposing a heterogeneous ensemble based on three classifiers: an SVM, a model based on random subspaces of rotation boosting (RB), and a CNN. The robustness of our approach is tested across a set of benchmark datasets that represent a wide range of medical classification tasks. The proposed ensembles provide promising performance on all datasets. Full article
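The reshaping step the abstract describes can be sketched as one of several possible arrangements: zero-pad a 1-D feature vector to the nearest square and lay it out row-wise as a 2-D matrix a standard CNN can consume. (The paper compares alternative arrangements; this is only one of them, and the padding choice is an assumption.)

```python
# Reshape a 1-D feature vector into a square 2-D matrix, padding
# with zeros so a standard CNN input layer can accept it.

import math

def vector_to_matrix(v):
    side = math.ceil(math.sqrt(len(v)))     # smallest square that fits
    padded = v + [0] * (side * side - len(v))
    return [padded[i * side:(i + 1) * side] for i in range(side)]
```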

20 pages, 4201 KiB  
Article
Surgery Scheduling and Perioperative Care: Smoothing and Visualizing Elective Surgery and Recovery Patient Flow
Analytics 2023, 2(3), 656-675; https://doi.org/10.3390/analytics2030036 - 21 Aug 2023
Viewed by 844
Abstract
This paper addresses the practical problem of scheduling operating room (OR) elective surgeries to minimize the likelihood of surgical delays caused by the unavailability of capacity for patient recovery in a central post-anesthesia care unit (PACU). We segregate patients according to their patterns of flow through a multi-stage perioperative system and use characteristics of surgery type and surgeon booking times to predict time intervals for patient procedures and subsequent recoveries. Working with a hospital in which 50+ procedures are performed in 15+ ORs most weekdays, we develop a constraint programming (CP) model that takes the hospital’s elective surgery pre-schedule as input and produces a recommended alternate schedule designed to minimize the expected peak number of patients in the PACU over the course of the day. Our model was developed from the hospital’s data and evaluated through its application to daily schedules during a testing period. Schedules generated by our model indicated the potential to reduce the peak PACU load substantially, by 20–30% on most days in our study period, or alternatively to reduce average patient flow time by up to 15% given the same PACU peak load. We also developed tools for schedule visualization that can be used to aid management both before and after surgery day; plan PACU resources; propose critical schedule changes; identify the timing, location, and root causes of delay; and discern the differences in surgical specialty case mixes and their potential impacts on the system. This work is especially timely given the high surgical wait times in Ontario, which have worsened further due to the COVID-19 pandemic. Full article
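The smoothing objective can be illustrated without a CP solver: given the recovery intervals a schedule implies, compute the peak number of simultaneous PACU patients with an event sweep, and greedily test whether delaying one case lowers that peak. Times and cases below are invented, and the greedy shift is a stand-in for the paper's constraint programming model.

```python
# Peak PACU occupancy via an event sweep, plus a greedy one-case
# delay search. A toy stand-in for the paper's CP model.

def pacu_peak(intervals):
    """Max simultaneous patients given (start, end) recovery intervals."""
    events = sorted([(s, 1) for s, _ in intervals] +
                    [(e, -1) for _, e in intervals])  # ends sort before same-time starts
    load = peak = 0
    for _, delta in events:
        load += delta
        peak = max(peak, load)
    return peak

def best_shift(intervals, idx, max_delay):
    """Try delaying case idx by 0..max_delay; return (best_peak, delay)."""
    s, e = intervals[idx]
    best = (pacu_peak(intervals), 0)
    for d in range(1, max_delay + 1):
        trial = intervals[:idx] + [(s + d, e + d)] + intervals[idx + 1:]
        best = min(best, (pacu_peak(trial), d))
    return best

# Two overlapping recoveries: delaying the second by 2 halves the peak.
schedule = [(0, 3), (1, 2)]
```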

38 pages, 3586 KiB  
Article
Cyberpsychology: A Longitudinal Analysis of Cyber Adversarial Tactics and Techniques
Analytics 2023, 2(3), 618-655; https://doi.org/10.3390/analytics2030035 - 11 Aug 2023
Viewed by 1741
Abstract
The rapid proliferation of cyberthreats necessitates a robust understanding of their evolution and associated tactics. A longitudinal analysis of these threats was conducted using a six-year data set obtained from a deception network. The study’s primary aim was an exhaustive exploration of the tactics and strategies utilized by cybercriminals and of how these tactics and techniques evolved in sophistication and target specificity over time. Different cyberattack instances were dissected and interpreted, with a focus on unveiling the patterns behind target selection and highlighting recurring techniques and emerging trends. The study’s methodological design incorporated data preprocessing, exploratory data analysis, clustering and anomaly detection, temporal analysis, and cross-referencing. The validation process underscored the reliability and robustness of the findings, providing evidence of increasingly sophisticated, targeted cyberattacks. The work identified three distinct network traffic behavior clusters and temporal attack patterns. A validated scoring mechanism provided a benchmark for network anomalies, applicable for predictive analysis and facilitating comparative study of network behaviors. This benchmarking aids organizations in proactively identifying and responding to potential threats. The study significantly contributes to the cybersecurity discourse, offering insights that could guide the development of more effective defense strategies. The need for further investigation into the nature of detected anomalies is acknowledged, advocating for continuous research and proactive defense strategies in the face of the constantly evolving landscape of cyberthreats. Full article
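A scoring mechanism for network anomalies can be sketched in its simplest form: a z-score of a new observation against a baseline window, flagging values beyond a threshold. The traffic counts below are invented, and the paper's validated scoring is considerably more involved than this toy.

```python
# Z-score anomaly flagging against a baseline window of, e.g.,
# daily connection counts from a deception network (numbers invented).

def zscore(history, value):
    n = len(history)
    mean = sum(history) / n
    var = sum((x - mean) ** 2 for x in history) / n  # population variance
    return (value - mean) / var ** 0.5

def is_anomalous(history, value, threshold=3.0):
    return abs(zscore(history, value)) >= threshold

baseline = [10, 12, 11, 9, 10, 11, 9, 12, 10, 11]
```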
