Top 7 Most Common Errors When Implementing AI and Machine Learning Systems in 2021
According to an IDC report, organizations will spend $327.5 billion on AI systems in 2021, but not all AI and Machine Learning investments will deliver the desired outcomes.
While the European Commission (EC) proposes to regulate high-risk AI systems in the near future, the global AI market, according to IDC, continues to grow rapidly. In this blog post, we analyze the most common pitfalls and practical mistakes that organizations make when acquiring, designing or implementing AI and Machine Learning systems:
1. Neglecting AI security and privacy aspects at the design stage
Proper implementation of an AI system requires at least the same level of cyber threat modelling and preventive security controls as any other corporate system. As a rule, cybercriminals swiftly find the weakest link in corporate defences and strike there. Moreover, special attention should be given to compliance and regulatory aspects when an AI system uses regulated data for training or is designed to process or otherwise handle corporate trade secrets. A design-stage Privacy Impact Assessment (PIA) is always recommended to ensure that the AI system will not infringe existing privacy laws, some of which may limit or prohibit automated decision making.
Huawei’s AI Security White Paper indicates that one of the key differences between security vulnerabilities in traditional software and those in AI-driven solutions is the poor explainability of the latter. This lack of explainability can be exploited by adversarial Machine Learning techniques, such as evasion, poisoning and backdoor attacks, aimed at influencing and misleading the original AI system. For instance, when AI training data comes from external sources, attackers can stealthily inject malicious samples into the training data and thereby manipulate the AI system.
For example, attackers can train a WAF to accept certain malicious HTTP requests as legitimate and, conversely, to block legitimate users. Similarly, spammer groups try to poison the Gmail spam filter: according to a blog post by the lead of Google’s anti-abuse research team, from which the figure below is taken, there were at least four large-scale malicious attempts to skew Gmail’s spam classifier between the end of 2017 and early 2018.
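To make the poisoning scenario more concrete, here is a deliberately tiny Python sketch; it is not how any real WAF or Gmail filter works. It assumes a hypothetical nearest-centroid classifier over two made-up request features and shows label-flipping poisoning: the attacker adds copies of an attack-like request labelled "benign" to an externally sourced training feed, and the retrained model then lets the attack through.

```python
from statistics import mean

# Hypothetical 2-D feature vectors for HTTP requests, e.g. (share of special
# characters, normalised payload length). Real WAF features are far richer.
Point = tuple[float, float]

def train(samples: list[tuple[Point, str]]) -> dict[str, Point]:
    """Nearest-centroid classifier: store the mean feature vector of each label."""
    grouped: dict[str, list[Point]] = {}
    for features, label in samples:
        grouped.setdefault(label, []).append(features)
    return {
        label: (mean(p[0] for p in pts), mean(p[1] for p in pts))
        for label, pts in grouped.items()
    }

def classify(model: dict[str, Point], x: Point) -> str:
    """Predict the label whose centroid is nearest (squared Euclidean distance)."""
    def dist(c: Point) -> float:
        return (x[0] - c[0]) ** 2 + (x[1] - c[1]) ** 2
    return min(model, key=lambda label: dist(model[label]))

clean = [((0.10, 0.20), "benign"), ((0.20, 0.10), "benign"),
         ((0.15, 0.15), "benign"), ((0.10, 0.10), "benign"),
         ((0.90, 0.80), "malicious"), ((0.80, 0.90), "malicious"),
         ((0.85, 0.85), "malicious"), ((0.90, 0.90), "malicious")]

attack = (0.70, 0.70)  # the request the attacker wants to slip through

# The attacker controls an external data feed and submits copies of
# attack-like requests labelled "benign" (label-flipping poisoning).
poison = [(attack, "benign")] * 12

print(classify(train(clean), attack))           # malicious
print(classify(train(clean + poison), attack))  # benign
```

With this toy data, nine poisoned copies are not yet enough and ten flip the decision. Real attacks are subtler, but the mechanism of shifting what the model learns as "normal" is the same.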
Finally, attackers may also implant backdoors in models to launch targeted attacks, or extract model parameters or training data from query results. Therefore, the security of an AI system must be considered and planned at the design stage, not during pre-production testing, when it is already too late.
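The model extraction threat can likewise be illustrated with a minimal sketch. Assuming a hypothetical API that returns the raw score of a secret linear model f(x) = w·x + b, an attacker needs only d + 1 queries: the all-zero input reveals b, and each unit vector e_i reveals w_i + b (a simple equation-solving attack). Real models are rarely this simple, but the example shows why returning raw scores to untrusted callers deserves scrutiny at the design stage.

```python
from typing import Callable, Sequence

def make_victim(weights: Sequence[float], bias: float) -> Callable[[Sequence[float]], float]:
    """Hypothetical black-box scoring API backed by a secret linear model."""
    def score(x: Sequence[float]) -> float:
        return sum(w * xi for w, xi in zip(weights, x)) + bias
    return score

def extract_linear(query: Callable[[Sequence[float]], float],
                   dim: int) -> tuple[list[float], float]:
    """Recover w and b with dim + 1 queries: f(0) = b and f(e_i) = w_i + b."""
    bias = query([0.0] * dim)
    weights = []
    for i in range(dim):
        unit = [0.0] * dim
        unit[i] = 1.0
        weights.append(query(unit) - bias)
    return weights, bias

secret_api = make_victim([0.25, -1.5, 2.0], bias=0.5)
stolen_w, stolen_b = extract_linear(secret_api, dim=3)
print(stolen_w, stolen_b)  # [0.25, -1.5, 2.0] 0.5
```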
2. Insufficient, unstructured or unreliable training data
It is a widespread misconception that AI’s success is mostly about PhD experts, powerful hardware and modern Machine Learning algorithms. No AI can exist without relevant, sufficiently large volumes of high-quality data: no expert or set of algorithms can build a reliable Machine Learning model without the requisite data. In many industries, training data must also be continuously updated and improved whenever a flaw is detected in the system’s decision-making. In cybersecurity, for example, datasets of threats and attack vectors need to be updated almost every day to stay ahead of cybercriminals.
Companies frequently underestimate the importance of data in their AI projects and end up wasting time and resources on unusable, inaccurate or biased systems that harm their business reputation and cause long-lasting financial losses. This does not mean, however, that you need petabytes of unique data in AWS. Some tasks require little data, but that data must be relevant and adequately cleaned and pre-processed for training. Furthermore, even if a company has a lot of data, the pivotal question is whether the data, its representativeness and its format are suitable for AI training.
The best way to start developing a trustworthy AI solution is a data-first approach that verifies the requisite data is generalizable and otherwise usable for AI training. This approach also makes it easier to understand and analyze the outputs of AI systems for possible inaccuracies or biased outcomes, which raise growing concerns among privacy advocates.
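A data-first approach can start with simple automated checks before any model is trained. The Python sketch below, using only the standard library, reports missing-value rates, exact duplicates and class balance for a list of records. The field names, the sample data and the 25% minimum class share are illustrative assumptions, not recommended values.

```python
from collections import Counter
from typing import Any

def audit(records: list[dict[str, Any]], label_key: str,
          min_class_share: float = 0.25) -> dict[str, Any]:
    """Pre-training checks: missing values, exact duplicates and class balance."""
    n = len(records)
    fields = sorted({key for record in records for key in record})
    # Share of records where a field is absent, None or an empty string.
    missing_rate = {
        field: sum(1 for r in records if r.get(field) in (None, "")) / n
        for field in fields
    }
    # Dicts are unhashable, so count each record as a sorted tuple of its items.
    counts = Counter(tuple(sorted(r.items())) for r in records)
    duplicates = sum(c - 1 for c in counts.values())
    labels = Counter(r.get(label_key) for r in records)
    class_share = {label: c / n for label, c in labels.items()}
    underrepresented = [label for label, s in class_share.items() if s < min_class_share]
    return {"missing_rate": missing_rate, "duplicates": duplicates,
            "class_share": class_share, "underrepresented": underrepresented}

# Hypothetical HTTP log records used as training data.
records = [
    {"src_ip": "10.0.0.1", "payload": "GET /", "label": "benign"},
    {"src_ip": "10.0.0.1", "payload": "GET /", "label": "benign"},  # exact duplicate
    {"src_ip": "10.0.0.2", "payload": "", "label": "benign"},       # empty payload
    {"src_ip": None, "payload": "GET /login", "label": "benign"},   # missing IP
    {"src_ip": "10.0.0.3", "payload": "GET /?id=1' OR '1'='1", "label": "malicious"},
]
report = audit(records, label_key="label")
print(report["duplicates"], report["underrepresented"])  # 1 ['malicious']
```

Checks like these do not prove that data is representative, but they cheaply surface problems that would otherwise only show up as inaccurate or biased model outputs.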
3. Lack of coordinated AI business strategy
The business purpose is the fundamental element of any AI system. Before an AI system is implemented, its purposes should be identified, discussed and approved internally, and the related legal and regulatory questions should be assessed. This helps avoid situations where the AI interferes with the reasonable interests of external stakeholders or violates the law.
For example, Facebook has a team of leading AI experts who develop powerful algorithms to detect suicidal tendencies among its users. The original purpose of this laudable feature was to reduce the number of suicides among Facebook users and to prevent suicidal behaviour at an early stage. Unfortunately, the project caused a tsunami of privacy and data protection concerns that called Facebook’s data usage into question. Ultimately, Facebook’s idea of creating and storing sensitive mental health data without users’ explicit consent made privacy experts question whether Facebook can be trusted to make and store inferences about the most intimate details of our minds.
4. Lack of explainability over AI-generated results
Both “black box” and “white box” Machine Learning systems have their advantages and disadvantages. As elaborated in an IEEE publication, both approaches are suitable for solving complex practical problems, but organizations need to understand the input data, the problem they are ultimately trying to solve, and the best way to structure and present the output in a simple, explainable manner.
Even if an AI system is based on a black box approach, it should not be left unattended. For example, it can be configured to generate alerts only when a Machine Learning model detects anomalies or novel patterns that are unknown to the team. In any case, the architecture and design of an AI system should be transparent, comprehensible and manageable for the team in charge; otherwise, human experts can never be certain whether the AI system is misleading them.
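As an illustration of alerting only on anomalies, the sketch below learns a baseline from historical values of a single metric and raises an alert only when a new observation lies more than three standard deviations from the mean. The metric (failed logins per hour), the data and the threshold are hypothetical, and a simple z-score detector is swapped in for a real Machine Learning model; its appeal here is that every alert carries a number the team can check by hand.

```python
from statistics import mean, stdev

def fit_baseline(history: list[float]) -> tuple[float, float]:
    """Learn "normal" behaviour as the mean and sample standard deviation."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        raise ValueError("history has no variation; cannot compute z-scores")
    return mu, sigma

def anomaly_alerts(baseline: tuple[float, float], observations: list[float],
                   threshold: float = 3.0) -> list[tuple[int, float, float]]:
    """Return (index, value, z-score) only for observations beyond the threshold."""
    mu, sigma = baseline
    alerts = []
    for i, value in enumerate(observations):
        z = (value - mu) / sigma
        if abs(z) > threshold:
            alerts.append((i, value, round(z, 2)))
    return alerts

# Hypothetical hourly counts of failed logins for one service.
history = [12, 15, 11, 14, 13, 12, 16, 14, 13, 15]
new_hours = [14, 12, 41, 13]

for index, value, z in anomaly_alerts(fit_baseline(history), new_hours):
    print(f"hour {index}: {value} failed logins (z = {z})")
# prints: hour 2: 41 failed logins (z = 17.39)
```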
Importantly, most modern Deep Learning models are opaque by design; however, this can be partly compensated for by regular testing, review and analysis. An article in the Harvard Journal of Law & Technology describes human supervision as the best way to control AI. The more transparent an AI system is, the more understandable its findings are to humans, and the more understandable the findings are, the more certain we can be of their reliability, lack of bias and suitability for business use.
5. Attempt to replace skilled human labour with “strong” AI
As of 2021, so-called strong AI does not exist. AI is certainly a promising way to automate and accelerate numerous time-consuming tasks, but it has limits. It cannot replace a qualified cybersecurity team; it can only make the team more efficient by taking care of routine processes, so that human experts can focus on the highly complex and sophisticated tasks that truly deserve their valuable time. In general, AI systems can be used as an additional tool or smart assistant, but not as a replacement for experienced cybersecurity specialists who, among other things, also understand the business context of their daily work.
In its “Automation and Analytics versus the Chaos of Cybersecurity Operations” report, the Enterprise Strategy Group concluded that automation, orchestration, Machine Learning and Artificial Intelligence can enable machines to perform most of the routine, repetitive “grunt” work. This frees up skilled personnel to focus on anomalies or discrepancies surfaced by the AI and gives them more time for critical decision making and strategic planning.
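One possible way to let machines handle the “grunt” work while surfacing the rest to people is confidence-based triage. In the hypothetical sketch below, findings that match known patterns and carry a very low or very high model score are handled automatically, while uncertain or novel findings are routed to an analyst. The `Finding` fields, the queue names and the 0.05/0.95 thresholds are assumptions for illustration only.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    finding_id: str
    score: float         # model's estimated probability that the finding is a real threat
    known_pattern: bool  # True if it matches a pattern the team has handled before

def triage(findings: list[Finding], low: float = 0.05,
           high: float = 0.95) -> dict[str, list[str]]:
    """Automate confident, routine cases; escalate uncertain or novel ones."""
    queues: dict[str, list[str]] = {"auto_close": [], "auto_ticket": [], "analyst": []}
    for f in findings:
        if not f.known_pattern:
            queues["analyst"].append(f.finding_id)      # novel: needs human judgement
        elif f.score <= low:
            queues["auto_close"].append(f.finding_id)   # routine noise
        elif f.score >= high:
            queues["auto_ticket"].append(f.finding_id)  # routine, clear-cut issue
        else:
            queues["analyst"].append(f.finding_id)      # the model is unsure
    return queues

findings = [
    Finding("F1", 0.01, True),
    Finding("F2", 0.98, True),
    Finding("F3", 0.60, True),
    Finding("F4", 0.02, False),
]
print(triage(findings))
# {'auto_close': ['F1'], 'auto_ticket': ['F2'], 'analyst': ['F3', 'F4']}
```

Note that novel findings go to a human even when the model is confident, which reflects the point above: the AI filters routine work, while people handle what the model has not seen before.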
In a nutshell, don’t overestimate the technical capabilities of modern AI. Despite huge progress over the last decade, we are still far from creating strong AI. If a vendor tells you that their AI-driven solution fully replaces trained cybersecurity professionals, there is only one thing to do: run away.
6. Uncoordinated implementation of AI
An AI implementation strategy should be developed in close coordination with the existing processes and technologies used by the organization. Oftentimes, management decides to invest in AI solutions without talking to their teams to understand where AI is most needed and where it will likely be futile, regardless of what external experts say to sell their solutions. Such ill-considered decisions will likely lead to excessive costs and internal conflicts, and may even exacerbate existing problems instead of solving them.
First and foremost, the AI implementation strategy should be aligned with existing solutions and processes. As mentioned above, AI should not be intended to replace qualified cybersecurity employees but rather to augment their capabilities and improve their efficiency, freeing them to focus on important, non-trivial tasks in a risk-based manner. Gathering insights from employees at all levels of the organization can be invaluable for understanding the root causes of current bottlenecks and, later, for implementing AI systems effectively.
7. AI implementation with an insufficient budget
Actionable AI systems are no different from any other modern technology: they require an adequate budget to be allocated before acquisition or development. The costs of continuous improvement, training and maintenance should also be carefully incorporated into the financial planning of an AI project. According to a survey conducted by the Project Management Institute, 43% of organizations completed AI projects that significantly exceeded their initial budgets. These numbers indicate that organizations widely underestimate the financial component of AI.
For example, once AI has been implemented in a cybersecurity solution, the main factor affecting its ongoing efficiency is the amount of qualified data available to train the model. The more diverse and representative the training data, the better and more reliable the results. The problem is that data collection and curation are both lengthy and costly, requiring a coherent, ongoing and systematic approach that will likely incur non-negligible costs, which should be planned for at the design phase of the project.
At ImmuniWeb, we do not believe that AI can fully substitute for a qualified human being. Instead, we combine the best of both worlds: the power of Machine Learning and the genius of human intelligence. Our award-winning ImmuniWeb® AI Platform intelligently automates the full spectrum of tasks and processes that can be efficiently automated without impact on their quality or reliability, while everything else is escalated to our security analysts and penetration testers, who work closely with our data scientists to continuously improve our Machine Learning models. Ultimately, we offer our customers and partners the best value for money: the best quality of service available for the best price on the global market. Compare it with your existing solutions to see the cost savings we offer.