Neural Networks are a dominant force in machine learning and are responsible for the massive momentum of deep learning in numerous application domains. MAL2 will apply Deep Neural Networks and Unsupervised Learning to advance cybercrime prevention by a) automating the discovery of fraudulent eCommerce and b) evaluating the capabilities of detecting Potentially Harmful Apps (PHAs) on the Android operating system.
Online shopping is commonplace, with 61.6% of Austrians already using this form of commerce. The turnover of the top 250 online shops in Austria in 2016 was € 2.3 billion, which corresponds to growth of around 9 percent compared to the previous year. Ripping off customers through fraudulent eCommerce shops is a rapidly growing area of cybercrime. Since July 2013, the Internet Ombudsman (ÖIAT) has offered preventive information and maintained a blacklist on the "Watchlist Internet" portal. Exposing such fake offerings, however, is a labour-intensive, manual task, as dozens or more copies of a fake shop often exist at the same time: every week more than 150 new fake online shops are entered for manual verification. MAL2 provides means for advancing the automated detection of fake-shop cybersquatting through machine learning technologies by classifying sites based on their structural similarity.
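As a rough illustration of what comparing sites by structural similarity could look like, the sketch below reduces each page to its sequence of HTML opening tags, forms overlapping runs of three tags ("shingles"), and compares two pages by the Jaccard similarity of their shingle sets. This is not the MAL2 method: the names `TagCollector`, `tag_shingles` and `structural_similarity`, the shingle length, and the choice of Jaccard similarity are all assumptions made for illustration only.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records the sequence of opening tag names, ignoring text and attributes."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def tag_shingles(html, k=3):
    """Return the set of k-long runs of consecutive opening tags in a page."""
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    tags = collector.tags
    return {tuple(tags[i:i + k]) for i in range(len(tags) - k + 1)}


def structural_similarity(html_a, html_b, k=3):
    """Jaccard similarity of the two pages' tag shingle sets, in [0, 1]."""
    a, b = tag_shingles(html_a, k), tag_shingles(html_b, k)
    if not a and not b:
        return 1.0  # two pages without any shingles are treated as identical
    return len(a & b) / len(a | b)


# Hypothetical pages: a shop template, a copy with different text, and an unrelated layout.
shop = "<html><body><div><h1>Shoes</h1><ul><li>A</li><li>B</li></ul></div></body></html>"
copy = "<html><body><div><h1>Bags</h1><ul><li>X</li><li>Y</li></ul></div></body></html>"
other = "<html><body><table><tr><td>x</td></tr></table></body></html>"

print(structural_similarity(shop, copy))
print(structural_similarity(shop, other))
```

Because only the tag structure is compared, a copy of a shop template with different product texts scores as identical, while a page with an unrelated layout scores low. A learned classifier would presumably use far richer features than this hand-written measure.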
With over two billion monthly active devices, the Android operating system for tablets, phones and smart devices is by far the most widespread mobile operating system in the world. Four million new malware programs were released for this platform in 2016. Exploits that target the Android platform account for 21% of all exploits, which makes it the second most targeted platform for exploit attacks. By Q4 2016, 0.71% of all devices had potentially harmful applications (PHAs) installed. The goal of the project is to train a Neural Network to evaluate the discoverability and explainability of upcoming attack patterns.
The classification capabilities of Neural Networks rely heavily on the quality of the underlying datasets, and even more on the granularity of the features extracted from them. To date, no web-archive dataset of fraudulent eCommerce sites has been collected and released. MAL2 will harvest and curate two large-scale Ground-Truth datasets consisting of a) malware and benign applications and b) web-archives of fake-shops, to train its machine learning detection models in both application domains.
Currently, no technology offers an integrated solution for large-scale feature extraction and Neural Network training. The goal of the MAL2 project is (i) to release an Open Source framework which provides integrated functionality along the required pipeline, from data extraction and feature composition to Neural Network training and analysis of results, (ii) to execute its components at large scale with Hadoop and GPU cluster support, and (iii) to publish the harvested Ground-Truth datasets, the extracted features and the trained Neural Networks for both application domains on open data platforms.
The MAL2 project is funded by the Austrian Federal Ministry of Transport, Innovation and Technology (BMVIT) in the 6th call of the Austrian Research Promotion Agency (FFG).