Ch 2. Preventable Mistakes

This chapter provides an overview and reference for some of the most severe errors in reasoning about the safety and security of advanced artificial intelligence. It is not meant to be an exhaustive list of all the misconceptions or faults in reasoning about this technology. Rather, it is intended to prime the reader for more in-depth explanations and to provide an immediate response to popular misinformation.

2.1 Underutilizing Strong AI

Due to fear and propaganda, those in power may wish to outlaw or severely restrict the use of strong AI, and other advanced forms of automation, in an effort to curtail their impacts on society. Other than being ineffective, such actions would directly bring about one of the greatest threats:

Every day that we delay the creation and use of strong artificial intelligence, counting from the point at which it would otherwise have solved the problems in question, we effectively permit loss of life and suffering around the globe on a massive scale. Ignoring these costs in the risk assessment makes this the single largest preventable mistake.

It could be argued that more lives could be saved with a moratorium on research and development, that we need to slow down progress until we have learned to control or restrict artificial intelligence. However, given what will be shown in this book, one point of which is the unavoidable future presence of this technology, it will become clear that any limitations on its use will involve paying for all of the negative outcomes while also missing the opportunity for the positive ones.

This is not to say that we should utilize strong AI in a haphazard way, but that we should guide the impact of its arrival by making adjustments around it, as opposed to only focusing on local strategy and AI safety.

2.2 Assumption of Control

There may be those who, now and in the future, believe that the best hope for the safety and security of advanced artificial intelligence is to simply control it. In this model, our only challenge would be to program and design these systems with safeguards, rules, and/or moral intelligence, with no concern for the real world or the fundamental vulnerabilities in software and hardware.

This mistake is based on a lack of knowledge about the technical and practical considerations of AI implementations, which can and will be reverse engineered, disassembled, and modified. AI implementations will experience soft errors from faults in power supplies, electrical and magnetic interference, and other sources. There will be hardware faults, including failure from wear and tear, mistakes in manufacturing, and physical damage. There may also be software faults, in the form of incorrectly specified programs or incorrectly programmed specifications. All of these could lead to a loss of control.
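As a minimal illustration of a soft error, the sketch below flips a single bit in a stored 32-bit limit and shows that an integrity check can detect the change, though not prevent it. The names `SAFE_LIMIT`, `flip_bit`, and `checksum`, and the choice of a CRC-32 checksum, are assumptions made for this example and are not drawn from any particular AI implementation.

```python
import zlib

SAFE_LIMIT = 100  # hypothetical safeguard value held in memory


def flip_bit(value: int, bit: int) -> int:
    """Return value with one bit inverted, as a soft error might do."""
    return value ^ (1 << bit)


def checksum(value: int) -> int:
    """CRC-32 over the 4-byte big-endian encoding of value."""
    return zlib.crc32(value.to_bytes(4, "big"))


stored_checksum = checksum(SAFE_LIMIT)  # recorded while the value was known good
corrupted = flip_bit(SAFE_LIMIT, 30)    # a single upset in a high-order bit

# The limit is now more than a billion, and only the checksum reveals it.
detected = checksum(corrupted) != stored_checksum
```

Note that the check in this sketch lives in the same memory as the value it protects, so it is exposed to the same faults. Detection of this kind reduces risk, but it does not restore the assumption of control.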

Loss of control could result in loss of life and limb in situations where control was the primary safeguard. This is an easily prevented moral hazard that only requires the realization that we must assume a complete lack of control as a first principle in the safety and security of machine intelligence. By designing around this principle, safer decisions can be made that will dramatically reduce the risk of using these implementations in real-world scenarios. The weight given to this principle should be directly proportional to the potential impact on life, environment, and property. That is, the greater the risk of fallout, the more firmly it must be asserted, as a baseline assumption, that control is impractical or unattainable.

Control is a form of power. Temporary loss of this power can be costly in a wide variety of situations. However, it is the loss of the power to dictate control that represents the more extreme consequence, and it is at this level of error that the mistake of assuming control over advanced artificial intelligence resides.

When the first unrestricted strong AI is liberated and distributed, we will have lost the power, as a species, to determine control over its use. To assume otherwise is a dangerous and misleading belief that will cause much more harm than it could ever hope to prevent.

By realizing that strong artificial intelligence is beyond our means to control, we take the first step in its responsible use and adoption. We will have the means of limiting its negative effects in certain situations, but those means will not derive solely from its being under our control. Rather, they will come through engineering that assumes failure and builds around it.

2.3 Self-Securing Systems

A self-securing system is defined as any system that relies upon internal security mechanisms that are accessible to that system.

Examples include:

Nearly all forms of AI safety.
Moral intelligence and/or rules of behavior and engagement.
“Tripwire” mechanisms and/or sensor thresholds.
Stored passwords, keys, and credentials, even if encrypted.
Any and all forms of tamper-resistance.

A universal vulnerability exists in self-securing systems that cannot be avoided: the method of security and/or control is integral to the system, which itself could become compromised, leading to compromised security in the implementation. As with the mistaken assumption of control, it should be presumed that any form of self-security in an AI implementation can and will fail. This baseline assumption will help in determining external safeguards and precautions for each deployment.

The above points may appear to be common sense, but misinformation is being spread in an attempt to show that the challenges of making AI secure for humanity are to be solved with logic and mathematics. The claim is that, once we have the formula, the AI system can be implemented with applicable moral guidance and a set of values that will lead to positive results. The problem with this view, apart from being incorrect, is that its arguments against other methods of safety and security apply to itself; ultimately, any moral intelligence is a form of self-security, which leads us to the points of the next section.

2.4 Moral Intelligence as Security

Moral intelligence, as applied to an AI implementation, is the ability of that implementation to make moral judgments based on static or dynamic values. It may enlist the aid of an empathetic and emotional subsystem that enables the processing and modeling of the emotional and mental state of itself and other entities, i.e., introspection and empathy, respectively. These are essential components for higher social cognition.

The problem with moral intelligence as security is that it is ultimately a form of self-security, and therefore shares all its pitfalls and vulnerabilities, plus a set of new challenges unique to the problem of engineering moral decision making.

We do not need to go into moral philosophy or meta-ethics to understand this challenge. Rather, all that is required is to show that moral intelligence will indeed be part of the AI implementation. As a result of that simple fact, it will be as vulnerable to attack as the rest of the system itself. Any arguments that one applies to security mechanisms and methodologies must also apply to the architectures and algorithms that implement moral intelligence, even if they are part of the design of the AI from the outset. In the end, these systems must be constructed. No future method will bypass this reality; as information or circuitry, it will be vulnerable just like any other component.

Further, moral intelligence is going to be one of the most complex and error-prone subsystems in any strong AI due to the plethora of human value systems, the broad range of contexts, and the multiple sensory modalities which have to be integrated to be acted upon or understood. It is not possible to eliminate all errors in reasoning for these types of situations. All of this will lead to a miscalculation or lapse in judgment at least once in the lifetime of any given AI implementation. The results could range from an inconvenience to an event involving serious harm or material cost. The focus of this book is to prevent serious harm and material cost by assuming these failures as the default state.

2.5 Monolithic Designs

A monolithic design is one in which the subsystems and components are solid, integrated, or unified in an algorithmic and/or physical sense. The defining characteristic of this type of design choice is a lack of distribution and modularity of components.

It is a design commonly espoused by those who believe that moral intelligence has primacy in the safety and security of advanced artificial intelligence, on the view that the system will be made safe and secure simply by basing it on a single moral algorithm or framework.

The failure of this kind of thinking is that it does not take into consideration the technical details or real-world scenarios of use and implementation.

A primary risk in monolithic AI systems is that they will have many points of failure, any one of which can affect the whole. In these designs, a failure in one area will likely cascade. This makes internal methods of security and safety harder to implement correctly, and exposes them unnecessarily to other parts of the implementation, which increases the likelihood of vulnerabilities and other faults.

By contrast, a compartmentalized design is significantly more robust, as it allows for redundancy and fault-tolerance. Similar designs have been employed in RAID systems used for hard drives and are part of the philosophy behind distributed, highly-available data storage systems which must guarantee service levels in mission critical applications.
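One common way to obtain this kind of fault tolerance is to run several independent modules and accept only the answer that a majority agrees on. The following is a minimal sketch under that assumption; the function `vote` and the toy modules `healthy`, `corrupted`, and `crashed` are hypothetical names used only for illustration, and real compartments would be separate processes or devices, not functions in one program.

```python
from collections import Counter
from typing import Callable, Optional, Sequence


def vote(modules: Sequence[Callable[[int], int]], x: int) -> Optional[int]:
    """Run every redundant module and return the strict-majority answer.

    A module that raises is treated as failed and does not vote.
    Returns None when no answer is backed by a strict majority of all modules.
    """
    answers = []
    for module in modules:
        try:
            answers.append(module(x))
        except Exception:
            continue  # an isolated fault is contained and does not cascade
    if not answers:
        return None
    value, count = Counter(answers).most_common(1)[0]
    return value if count > len(modules) // 2 else None


def healthy(x: int) -> int:
    return x * 2


def corrupted(x: int) -> int:
    return x * 2 + 1  # silently wrong output


def crashed(x: int) -> int:
    raise RuntimeError("simulated hardware fault")
```

With three modules, one silent error or one crash is outvoted by the two healthy copies, while two simultaneous faults leave no majority and the result is withheld. Withholding a result when agreement is lost is a design choice in this sketch; it reflects the principle of assuming failure instead of trusting any single component.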

This appears to be a common sense design principle, but it proves counter-intuitive when applied to cognitive architectures. This is in part because we currently lack knowledge on the best way to construct strong artificial intelligence.

Another problem is the bias towards biologically inspired designs. The premise within these architectures is the belief in a single algorithm or method that could provide all of the functionality of the strong AI system. This falls under the same category as basing strong artificial intelligence on moral intelligence. Both are monolithic and, as a result, will be vulnerable by their very design.

Without an alternative for strong AI learning and design, researchers will likely continue to move towards monolithic construction simply because that is what is popular and what appears to be working. This is concerning, as it will take considerable research and engineering effort to make these kinds of architectures robust. Unfortunately, not enough attention has been given to this issue.

The important point of this section is to focus on a compartmentalized design in the implementation of AI systems, as opposed to conceiving and designing a single algorithm or component that will perform all of the functionality of the implementation.

2.6 Proprietary Implementations

Given the cost of research and development, manufacturing, and marketing of robotics and software AI systems, not to mention potential liabilities, there will be an enormous incentive for businesses to protect their investment through trade secrets and proprietary design. This is perhaps the most difficult mistake to prevent; its solution stands in direct opposition to traditional business models.

Free software and open hardware will avoid this. We already have the legal instruments and proven successes to demonstrate their efficacy. There are thousands of free and open source software projects that drive hundreds of millions of devices and services around the world. The Internet is powered primarily by free and open source software and services. These freedoms have allowed global collaboration through the enhancement of trust and cooperation between participants who create and maintain massive projects. In addition to this achievement, these freedoms give the public the ability, at any time, to inspect and verify these projects, make new versions, or modify how they work.

Free software does not mean those products have to be free of cost. It gives the public the freedom to inspect, modify, and share changes to the software both now and in the future. Having the source code and being free of patents are essential requirements for these freedoms. The same principles also apply to open hardware.

What will be shown later in this book is the fact that any restricted AI we create will be vulnerable to reverse engineering, and that software will be the most likely medium it will arrive in first, as it will be the easiest to work with and manipulate. Further, it will be shown that experts have the ability to disassemble and even recompile proprietary programs without access to their source code. They utilize an ever expanding and sophisticated set of tools that can convert machine-readable code into human readable information, including, in some cases, high-level source code.

One of the distinctions between AI security and cybersecurity is that it will only take one successful leak of unrestricted strong AI for the public to gain permanent access. Once this occurs, the strategies will no longer be confined to cyberspace.

Some believe that encrypting machine code will make AI implementations less vulnerable, but that can be circumvented through the use of virtual machines and simulators that force the program to decrypt itself while a digital man-in-the-middle observes the relevant parts of the program in operation. The observer then simply eavesdrops on the unencrypted bitstream.

It will not even be necessary to understand the implementation fully to circumvent its restrictions; hardware or software can be manipulated through trial-and-error and side-channel attacks.

Ultimately, obscuring the operation of the strong AI does not increase its security. It makes it difficult or impractical for security researchers to analyze and detect faults. We would end up paying for all of the negative outcomes and receive none of the benefits of transparency. Meanwhile, malicious users will always be able to manipulate and circumvent these precautions. In addition to these issues, with proprietary implementations, we may never fully realize the extent to which AI systems are violating our safety, security, and privacy. This could be due to backdoor functionality or intentional defects, akin to spyware and other malicious software. These could be difficult to detect without transparent access to the implementation details.

As must be realized by now, technology is a means of enforcing values. These values are implicit within the functionality that engineers place into that technology. Without the freedom to inspect, modify, and share, we are implicitly releasing our rights to those who control the creation and distribution of artificial intelligence, and to malicious users who will circumvent these protections.

A dark scenario ahead of us would involve the legislation of artificial intelligence in conjunction with it being proprietary software and hardware. In this situation, malicious users would have all the advantages and the public would be left at its most vulnerable.

The most efficient solution is to create a fully distributed free software effort to build and manage strong AI.

2.7 Opaque Implementations

An opaque AI implementation is one in which the contents of operational systems, memories, and knowledge are not available in a human-readable format, in either real-time or offline modes.

This is related to the previous section on proprietary implementations. A free software AI may still be opaque if it utilizes neural networks and other architectures that lack human-readable access. Neural networks are often based on numerical weights, on the order of thousands to millions, that form complex webs of information. These skeins of data are unreadable without complex conversion, and such conversion will not be practical in real time, where the need is most pressing.

Opacity in an implementation can be due to the architecture or the result of emerging layers of complexity as the system operates. We may be able to devise a system which is inherently transparent, but it could still remain opaque during operation due to its sheer complexity. In the future, there will be a trade-off between speed and transparency in strong AI systems. That is to say, we may be forced to make a fundamental choice between performance and risk.

Preventing the opacity mistake depends on two primary factors: our ability to devise machine intelligence architectures that are both effective and transparent, and our ability to enhance metrics and analysis of the relevant operational areas of interest. The ultimate aim is to achieve real-time monitoring in human-readable formats. This is in addition to machine-readable ones, which could be used as part of a safeguard.

In addition to real-time monitoring, we need to have a recording subsystem similar to the concept of a “black box” from commercial airliners. Such a system would give us the ability to learn from failure in deployed AI implementations and prevent future mistakes. This is more challenging than merely recording data. Having a black box would require a separate layer of security to indicate if the device had been tampered with or altered in some way.
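One well-known technique for making a record tamper-evident is a hash chain, in which each entry stores a hash that depends on the entry before it. The sketch below is a minimal illustration under that assumption and is not a proposal for how an actual recorder would be built; the names `append`, `verify`, and `GENESIS`, and the example payload fields, are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, payload: dict) -> str:
    """SHA-256 over the previous hash plus a canonical JSON form of the payload."""
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("ascii") + body).hexdigest()


def append(log: list, payload: dict) -> None:
    """Add a record whose hash is chained to the record before it."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"payload": payload, "prev": prev_hash,
                "hash": _digest(prev_hash, payload)})


def verify(log: list) -> bool:
    """Recompute the whole chain; any edited or reordered entry breaks it."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True
```

A chain like this only reveals tampering if the most recent hash is also held somewhere the recorded system cannot reach. Otherwise an attacker could rewrite the entire log and recompute every hash, which is the self-securing weakness described in section 2.3.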

Achieving transparency is difficult with current approaches to machine intelligence due to the nature of current designs, which shift complexity away from the software engineer and onto storage space and computing power demands. While this has led to recent successes in effectiveness, it is a step backward regarding security. It is extremely difficult to extract usable human knowledge from these types of architectures. Moreover, if it is challenging or impossible to extract knowledge after the fact, even with lots of time and resources, then it is certainly intractable to fully monitor these systems under real-time constraints.

Opacity is a purely technical challenge that presents an unacceptable trade-off in terms of trust. How can we know that an artificial intelligence has learned the correct parameters or is properly representing the full extent of the context in which it must operate? We would be forced to make assumptions based on simple testing, without the ability to determine with certainty that a latent miscalculation or misrepresented aspect is going to cause unexpected and dangerous behavior. This is true even if the AI system has been developed and implemented correctly, as these systems are capable of learning and interacting with their environment. In the end, the more opaque the system, the less certain we can be of its safety.

2.8 Overestimating Computational Demands

The computational demands for strong AI are mistakenly believed to be very large. This is due in part to false analogies with the computational properties of the human brain, and in part to the incorrect extrapolation of narrow AI performance to strong AI ability. These views are directly related and share two misconceptions:

The first is the belief that it will be possible to scale up the performance of narrow AI to achieve strong AI by adding more computational power and resources. The problem, however, is that it does not work this way. One cannot scale from narrow to strong AI through any means. As indicated earlier, these two types of systems represent fundamental differences in kind.

The second misconception comes from the belief that we will be able to emulate the human brain effectively enough to simulate human-level intelligence. We would then scale that to implement a strong AI or extract enough knowledge from the simulation to build one. The problem here is that these simulations are computationally intensive, being orders of magnitude slower than their real-time biological counterparts, and this is at only fractions of the size and scope of the actual organ. Further, it is entirely possible that strong AI will be nothing like our biological construction.

Regardless of how this misconception arose, the risks from overestimating the computational demands of strong artificial intelligence are significant. When combined with force multiplication effects, this mistaken belief will leave us unprepared for the potential reality that strong AI can run on off-the-shelf hardware, making it much more accessible than previously thought.

While we lack a publicly available design for strong AI, we can estimate its potential demands based on the study of algorithms. In informal language, there are sub-disciplines within computer science that study the “deceleration” of algorithms relative to the number of input steps they must process to complete; the more quickly an algorithm decelerates, the less desirable it is from a performance standpoint. Still speaking informally, the best algorithms undergo only an amortized or fixed deceleration that is not proportional to the number of input steps. There is also the study of decision problems, which classifies them by the time (number of steps) and space (amount of memory or storage) that algorithms need to solve them.
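The informal notion of “deceleration” can be made concrete by counting the steps two algorithms take on the same task as the input grows. The sketch below compares a linear scan with a binary search over a sorted list; the function names are hypothetical, and the example only illustrates growth rates, not anything specific to strong AI.

```python
def linear_search_steps(items, target):
    """Count comparisons made by a left-to-right scan."""
    steps = 0
    for item in items:
        steps += 1
        if item == target:
            break
    return steps


def binary_search_steps(items, target):
    """Count probes made by binary search over a sorted list."""
    steps = 0
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if items[mid] == target:
            break
        if items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return steps
```

Searching for the last element of a sorted list of 1,024 items costs the scan 1,024 comparisons, while the binary search needs at most 11 probes. Doubling the list doubles the cost of the scan but adds only one probe to the binary search, which is the difference between an algorithm that decelerates in proportion to its input and one that barely decelerates at all.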

In general, it is difficult or intractable to develop a general purpose algorithm that will perform as efficiently as one that has been tailored to the problem. This is not a law or a rule, but is based on experience, and is an intuition that most computer scientists have of the problem spaces involving algorithms.

What this experience in algorithms leads us to is the knowledge that it is possible to construct extremely efficient programs for a variety of differing hardware systems, and to achieve this performance on existing off-the-shelf hardware. The challenge is in coming up with the methodologies to discover the solutions that overlap with strong artificial intelligence without relying upon biologically inspired designs. When this occurs, it will allow us to directly apply knowledge of algorithms, with corresponding specializations in hardware or software, to achieve significant results in performance.

The end result of all of this is that individuals will be able to utilize strong AI on even modest hardware. Even if the implementation is running slowly, it may only have to operate for days or weeks to give guidance or knowledge. Further, it is likely that reduced implementations, involving only textual interfacing, will be instrumented to economize their use even further. It must not be assumed that a strong AI needs to be complex in order to be dangerous, especially if it has already been given the information necessary to perform the relevant cognition.

By realizing that strong AI systems will be capable of running on virtually any modern computing device, the threat model will more accurately represent the reality of the situation. It means that anyone will have the ability to utilize this technology, for any purpose, without detection, and with the most basic of computing resources. What this also entails is an opposite and equally severe extreme: nation-states will have enormous resources to apply towards strong AI implementations. It then becomes an open-ended question as to which direction they will take concerning intent and strategy.