# An Information Integration Theory of Consciousness

#### Measuring the capacity to integrate information: The Φ of a complex

If consciousness corresponds to the capacity to integrate information, then a physical system should be able to generate consciousness to the extent that it has a large repertoire of available states (information), yet it cannot be decomposed into a collection of causally independent subsystems (integration). How can one identify such an integrated system, and how can one measure its repertoire of available states [2, 8]?

As was mentioned above, to measure the repertoire of states that are available to a system, one can use the entropy function, but this way of measuring information is completely insensitive to whether the information is integrated. Thus, measuring entropy would not allow us to distinguish between one million photodiodes with a repertoire of two states each, and a single integrated system with a repertoire of $2^{1{,}000{,}000}$ states. To measure information integration, it is essential to know whether a set of elements constitutes a causally integrated system, or whether it can be broken down into a number of independent or quasi-independent subsets among which no information can be integrated.
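
The point can be checked numerically on a scaled-down case. The following is a minimal sketch, assuming ten binary elements stand in for the one million photodiodes and that all states are equally likely; the names `entropy_bits`, `independent_bits` and `integrated_bits` are illustrative only.

```python
import math

def entropy_bits(probabilities):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

n = 10  # small stand-in for the one million photodiodes of the text

# n independent photodiodes, each with two equally likely states:
# the entropies of independent elements simply add up.
independent_bits = sum(entropy_bits([0.5, 0.5]) for _ in range(n))

# A single system with a repertoire of 2**n equally likely states.
integrated_bits = entropy_bits([1 / 2**n] * 2**n)

print(independent_bits, integrated_bits)
```

Both quantities come out at n bits, so entropy alone cannot tell the two systems apart.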

To see how one can achieve this goal, consider an extremely simplified system constituted of a set of elements. To make matters slightly more concrete, assume that we are dealing with a neural system. Each element could represent, for instance, a group of locally interconnected neurons that share inputs and outputs, such as a cortical minicolumn. Assume further that each element can go through discrete activity states, corresponding to different firing levels, each of which lasts for a few hundred milliseconds. Finally, for the present purposes, let us imagine that the system is disconnected from external inputs, just as the brain is virtually disconnected from the environment when it is dreaming.

##### Effective information

Consider now a subset S of elements taken from such a system, and the diagram of causal interactions among them (Fig. 1a). We want to measure the information generated when S enters a particular state out of its repertoire, but only to the extent that such information can be integrated, i.e. each state results from causal interactions within the system. How can one do so? One way is to divide S into two complementary parts A and B, and evaluate the responses of B that can be caused by all possible inputs originating from A. In neural terms, we try out all possible combinations of firing patterns as outputs from A, and establish how differentiated the resulting repertoire of firing patterns in B is. In information-theoretical terms, we give maximum entropy to the outputs from A ($A^{H^{\max}}$), i.e. we substitute its elements with independent noise sources, and we determine the entropy of the responses of B that can be induced by inputs from A. Specifically, we define the *effective information* between A and B as $\mathrm{EI}(A \to B) = \mathrm{MI}(A^{H^{\max}}; B)$. Here MI(A;B) = H(A) + H(B) – H(AB) stands for mutual information, a measure of the entropy or information shared between a source (A) and a target (B). Note that since A is substituted by independent noise sources, there are no causal effects of B on A; therefore the entropy shared by B and A is necessarily due to causal effects of A on B. Moreover, EI(A→B) measures all possible effects of A on B, not just those that are observed if the system were left to itself. Also, EI(A→B) and EI(B→A) are in general not equal. Finally, note that the value of EI(A→B) is bounded by the maximum entropy available to A or to B, $H^{\max}(A)$ or $H^{\max}(B)$, whichever is less. In summary, to measure EI(A→B), one needs to apply maximum entropy to the outputs from A, and determine the entropy of the responses of B that are induced by inputs from A. 
It should be apparent from the definition that EI(A→B) will be high if the connections between A and B are strong and specialized, such that different outputs from A will induce different firing patterns in B. On the other hand, EI(A→B) will be low or zero if the connections between A and B are such that different outputs from A produce scarce effects, or if the effect is always the same. For a given bipartition of a subset, then, the sum of the effective information for both directions is indicated as EI(A⇄B) = EI(A→B) + EI(B→A). Thus, EI(A⇄B) measures the repertoire of possible causal effects of A on B and of B on A.
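
The definition can be tried out on a toy discrete case. The following is a minimal sketch, not the implementation used in the paper: it assumes binary elements and a deterministic response of B to the outputs of A, and the function name `effective_information` and the three example mappings are hypothetical.

```python
from collections import Counter
from itertools import product
from math import log2

def entropy_bits(counts):
    """Entropy, in bits, of the empirical distribution given by a Counter."""
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())

def effective_information(response, n_source):
    """EI(A->B) = MI(A at maximum entropy; B), in bits.

    Assumption: `response` maps each joint state of A (a tuple of bits)
    deterministically to a joint state of B.
    """
    # Maximum entropy on A: every output pattern is tried, equally often.
    source_states = list(product((0, 1), repeat=n_source))
    h_a = entropy_bits(Counter(source_states))
    h_b = entropy_bits(Counter(response(a) for a in source_states))
    h_ab = entropy_bits(Counter((a, response(a)) for a in source_states))
    return h_a + h_b - h_ab  # MI(A;B) = H(A) + H(B) - H(AB)

ei_copy = effective_information(lambda a: a, 2)               # B mirrors A
ei_constant = effective_information(lambda a: (0, 0), 2)      # B ignores A
ei_xor = effective_information(lambda a: (a[0] ^ a[1],), 2)   # one element in B

print(ei_copy, ei_constant, ei_xor)
```

Specialized connections (the copy) reach the bound of 2 bits, a response that is always the same yields zero, and the single XOR element is capped at 1 bit by the smaller repertoire of B.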

**Figure 1**

**Effective information, minimum information bipartition, and complexes.** *a. Effective information*. Shown is a single subset S of 4 elements ({1,2,3,4}, blue circle), forming part of a larger system X (black ellipse). This subset is bisected into A and B by a bipartition ({1,3}/{2,4}, indicated by the dotted grey line). Arrows indicate causally effective connections linking A to B and B to A across the bipartition (other connections may link both A and B to the rest of the system X). To measure EI(A→B), maximum entropy $H^{\max}$ is injected into the outgoing connections from A (corresponding to independent noise sources). The entropy of the states of B that is due to the input from A is then measured. Note that A can affect B directly through connections linking the two subsets, as well as indirectly via X. Applying maximum entropy to B allows one to measure EI(B→A). The effective information for this bipartition is EI(A⇄B) = EI(A→B) + EI(B→A). *b. Minimum information bipartition*. For subset S = {1,2,3,4}, the horizontal bipartition {1,3}/{2,4} yields a positive value of EI. However, the bipartition {1,2}/{3,4} yields EI = 0 and is a minimum information bipartition (MIB) for this subset. The other bipartitions of subset S = {1,2,3,4} are {1,4}/{2,3}, {1}/{2,3,4}, {2}/{1,3,4}, {3}/{1,2,4}, {4}/{1,2,3}, all with EI>0. *c. Analysis of complexes*. By considering all subsets of system X one can identify its complexes and rank them by the respective values of Φ – the value of EI for their minimum information bipartition. Assuming that other elements in X are disconnected, it is easy to see that Φ>0 for subsets {3,4} and {1,2}, but Φ = 0 for subsets {1,3}, {1,4}, {2,3}, {2,4}, {1,2,3}, {1,2,4}, {1,3,4}, {2,3,4}, and {1,2,3,4}. Subsets {3,4} and {1,2} are not part of a larger subset having higher Φ, and therefore they constitute complexes. This is indicated schematically by having them encircled by a grey oval (darker grey indicates higher Φ). *Methodological note*. 
In order to identify complexes and their Φ(S) for systems with many different connection patterns, each system X was implemented as a stationary multidimensional Gaussian process such that values for effective information could be obtained analytically (details in [8]). Briefly, we implemented numerous model systems X composed of n neural elements with connections $\mathrm{CON}_{ij}$ specified by a connection matrix CON(X) (no self-connections). In order to compare different architectures, CON(X) was normalized so that the absolute value of the sum of the afferent synaptic weights per element corresponded to a constant value w<1 (here w = 0.5). If the system’s dynamics corresponds to a multivariate Gaussian random process, its covariance matrix COV(X) can be derived analytically. As in previous work, we consider the vector **X** of random variables that represents the activity of the elements of X, subject to independent Gaussian noise **R** of magnitude c. We have that, when the elements settle under stationary conditions, **X** = **X** * CON(X) + c**R**. By defining $Q = (1 - \mathrm{CON}(X))^{-1}$ and averaging over the states produced by successive values of **R**, we obtain the covariance matrix $\mathrm{COV}(X) = \langle \mathbf{X}^t \mathbf{X} \rangle = \langle Q^t \mathbf{R}^t \mathbf{R} Q \rangle = Q^t Q$, where the superscript t refers to the transpose and the noise magnitude is taken as c = 1. Under Gaussian assumptions, all deviations from independence among the two complementary parts A and B of a subset S of X are expressed by the covariances among the respective elements. Given these covariances, values for the individual entropies H(A) and H(B), as well as for the joint entropy of the subset H(S) = H(AB) can be obtained as, for example, $H(A) = \tfrac{1}{2}\ln\left[(2\pi e)^n\,|\mathrm{COV}(A)|\right]$, where |•| denotes the determinant. The mutual information between A and B is then given by MI(A;B) = H(A) + H(B) – H(AB). Note that MI(A;B) is symmetric and positive. 
To obtain the effective information between A and B within model systems, independent noise sources in A are enforced by setting to zero strength the connections within A and afferent to A. Then the covariance matrix for A is equal to the identity matrix (given independent Gaussian noise), and any statistical dependence between A and B must be due to the *causal* effects of A on B, mediated by the efferent connections of A. Moreover, all possible outputs from A that *could* affect B are evaluated. Under these conditions, $\mathrm{EI}(A \to B) = \mathrm{MI}(A^{H^{\max}}; B)$. The independent Gaussian noise **R** applied to A is multiplied by $c_p$, the perturbation coefficient, while the independent Gaussian noise applied to the rest of the system is given by $c_i$, the intrinsic noise coefficient. Here $c_p = 1$ and $c_i = 0.00001$ in order to emphasize the role of the connectivity and minimize that of noise. To identify complexes and obtain their capacity for information integration, one considers every subset S of X composed of k elements, with k = 2,…, n. For each subset S, we consider all bipartitions and calculate EI(A⇄B) for each of them. We find the *minimum information bipartition* MIB(S), the bipartition for which the normalized effective information reaches a minimum, and the corresponding value of Φ(S). We then find the *complexes* of X as those subsets S with Φ>0 that are not included within a subset having higher Φ and rank them based on their Φ(S) value. The complex with the maximum value of Φ(S) is the *main complex*. MATLAB functions used for calculating effective information and complexes are at http://tononi.psychiatry.wisc.edu/informationintegration/toolbox.html.
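
The procedure of the methodological note can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the MATLAB toolbox code: `con[i, j]` is assumed to hold the weight of the connection from element i to element j (so afferents of an element form a column), the two noise coefficients are assumed to enter the covariance as a diagonal matrix of squared magnitudes, and the function name `effective_information` is hypothetical. Values are in nats, following the natural logarithm in the entropy formula.

```python
import numpy as np

def effective_information(con, source, target, c_p=1.0, c_i=1e-5):
    """EI(source -> target) in nats for the linear Gaussian model X = X*CON + noise."""
    source, target = list(source), list(target)
    n = con.shape[0]
    perturbed = np.array(con, dtype=float)
    # Enforce independent noise sources in A: remove connections
    # within the source and afferent to it (its columns).
    perturbed[:, source] = 0.0
    noise = np.full(n, c_i)   # intrinsic noise on every element ...
    noise[source] = c_p       # ... and perturbation noise on the source
    q = np.linalg.inv(np.eye(n) - perturbed)
    cov = q.T @ np.diag(noise ** 2) @ q   # assumed form: Q^t C^2 Q

    def log_det(elements):
        # Log-determinant of the covariance restricted to `elements`.
        return np.linalg.slogdet(cov[np.ix_(elements, elements)])[1]

    # MI = H(A) + H(B) - H(AB); the (2*pi*e)^n factors cancel.
    return 0.5 * (log_det(source) + log_det(target) - log_det(source + target))

# Two elements with a single connection 0 -> 1 of weight w = 0.5.
con = np.array([[0.0, 0.5],
                [0.0, 0.0]])
ei_forward = effective_information(con, [0], [1])
ei_backward = effective_information(con, [1], [0])
print(ei_forward, ei_backward)
```

With these settings the forward value is about 10.8 nats, while the backward value is zero because element 1 has no connection to element 0. The large forward value reflects the very small intrinsic noise coefficient.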

##### Information integration

Based on the notion of effective information for a bipartition, we can assess how much information can be integrated within a system of elements. To this end, we note that a subset S of elements cannot integrate any information (as a subset) if there is a way to partition S in two parts A and B such that EI(A⇄B) = 0 (Fig. 1b, vertical bipartition). In such a case, in fact, we would clearly be dealing with at least two causally independent subsets, rather than with a single, integrated subset. This is exactly what would happen with the photodiodes making up the sensor of a digital camera: perturbing the state of some of the photodiodes would make no difference to the state of the others. Similarly, a subset can integrate little information if there is a way to partition it in two parts A and B such that EI(A⇄B) is low: the effective information across that bipartition is the limiting factor on the subset’s information integration capacity. Therefore in order to measure the information integration capacity of a subset S, we should search for the bipartition(s) of S for which EI(A⇄B) reaches a minimum (the informational “weakest link”). Since EI(A⇄B) is necessarily bounded by the maximum entropy available to A or B, min{EI(A⇄B)}, to be comparable over bipartitions, should be normalized by $H^{\max}(A \rightleftarrows B) = \min\{H^{\max}(A), H^{\max}(B)\}$, the maximum information capacity for each bipartition. The *minimum information bipartition* MIB(A⇄B) of subset S – its ‘weakest link’ – is its bipartition for which the normalized effective information reaches a minimum, corresponding to $\min\{\mathrm{EI}(A \rightleftarrows B)/H^{\max}(A \rightleftarrows B)\}$. The *information integration* for subset S, or Φ(S), is simply the (non-normalized) value of EI(A⇄B) for the minimum information bipartition: Φ(S) = EI(MIB(A⇄B)). The symbol Φ is meant to indicate that the information (the vertical bar “I”) is integrated within a single entity (the circle “O”, see Appendix, iii).
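
The search for the minimum information bipartition can be sketched as follows. The example uses a hypothetical toy system rather than the Gaussian model of the paper: four binary elements wired like Fig. 1b, in which each element deterministically copies one partner (1 and 2 copy each other, as do 3 and 4), so that every copy across a cut relays one bit. All names (`COPIES_FROM`, `phi`, and so on) are assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical toy: element -> the element whose state it copies.
COPIES_FROM = {1: 2, 2: 1, 3: 4, 4: 3}

def ei_directed(a, b):
    """EI(A->B) in bits: one bit per element of B that copies an element of A."""
    return sum(1 for element in b if COPIES_FROM[element] in a)

def ei_total(a, b):
    """EI(A<->B) = EI(A->B) + EI(B->A)."""
    return ei_directed(a, b) + ei_directed(b, a)

def h_max(a, b):
    """min{Hmax(A), Hmax(B)} in bits, with one bit per binary element."""
    return min(len(a), len(b))

def bipartitions(subset):
    """Yield every split of `subset` into two non-empty parts, each exactly once."""
    first, rest = subset[0], subset[1:]
    for k in range(len(rest)):
        for extra in combinations(rest, k):
            yield (first, *extra), tuple(e for e in rest if e not in extra)

def phi(subset):
    """Return (Phi(S), MIB): the MIB minimizes the normalized EI, Phi is the raw EI."""
    mib = min(bipartitions(tuple(subset)),
              key=lambda parts: ei_total(*parts) / h_max(*parts))
    return ei_total(*mib), mib

phi_all, mib_all = phi((1, 2, 3, 4))
phi_pair, mib_pair = phi((1, 2))
print(phi_all, mib_all, phi_pair, mib_pair)
```

In this toy the seven bipartitions of {1,2,3,4} all have positive effective information except {1,2}/{3,4}, which is therefore the weakest link and gives Φ = 0, while the pair {1,2} on its own has Φ = 2 bits. Note that the minimum is taken over normalized values, but the value reported as Φ is the non-normalized one.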

##### Complexes

We are now in a position to establish which subsets are actually capable of integrating information, and how much of it (Fig. 1c). To do so, we consider every possible subset S of m elements out of the n elements of a system, starting with subsets of two elements (m = 2) and ending with a subset corresponding to the entire system (m = n). For each of them, we measure the value of Φ, and rank them from highest to lowest. Finally, we discard all those subsets that are included in larger subsets having higher Φ (since they are merely parts of a larger whole). What we are left with are *complexes* – individual entities that can integrate information. Specifically, a *complex* is a subset S having Φ>0 that is not included within a larger subset having higher Φ. For a complex, and only for a complex, it is appropriate to say that, when it enters a particular state out of its repertoire, it generates an amount of integrated information corresponding to its Φ value. Of the complexes that make up a given system, the one with the maximum value of Φ(S) is called the *main complex* (the maximum is taken over all combinations of m>1 out of n elements of the system). Some properties of complexes worth pointing out are, for instance, that a complex can be causally connected to elements that are not part of it (the input and output elements of a complex are called *ports-in* and *ports-out*, respectively). Also, the same element can belong to more than one complex, and complexes can overlap.
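
Once Φ is known for every subset, selecting the complexes is a matter of set inclusion. The sketch below assumes the Φ values have already been computed and are supplied as a mapping from subsets to values. The function name `find_complexes` is hypothetical, and the numbers are invented to follow the pattern of Fig. 1c (the figure gives no numerical values).

```python
def find_complexes(phi_by_subset):
    """Return (subset, Phi) pairs for all complexes, ranked from highest to lowest Phi."""
    complexes = []
    for subset, phi in phi_by_subset.items():
        if phi <= 0:
            continue  # a complex must have Phi > 0
        # Discard subsets included in a strictly larger subset with higher Phi.
        dominated = any(
            subset < other and other_phi > phi
            for other, other_phi in phi_by_subset.items()
        )
        if not dominated:
            complexes.append((subset, phi))
    return sorted(complexes, key=lambda item: item[1], reverse=True)

# Illustrative values only: Phi > 0 for {1,2} and {3,4}, zero elsewhere.
zero_subsets = [{1, 3}, {1, 4}, {2, 3}, {2, 4},
                {1, 2, 3}, {1, 2, 4}, {1, 3, 4}, {2, 3, 4}, {1, 2, 3, 4}]
phi_by_subset = {frozenset(s): 0.0 for s in zero_subsets}
phi_by_subset[frozenset({1, 2})] = 1.0
phi_by_subset[frozenset({3, 4})] = 2.0

complexes = find_complexes(phi_by_subset)
main_complex = complexes[0][0]  # the complex with the maximum Phi
print(complexes, main_complex)
```

Because the test is inclusion in a larger subset of higher Φ, two complexes may share elements or overlap, as long as neither is contained in a higher-Φ superset.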

In summary, a system can be analyzed to identify its complexes – those subsets of elements that can integrate information, and each complex will have an associated value of Φ – the amount of information it can integrate (see Appendix, iv). To the extent that consciousness corresponds to the capacity to integrate information, complexes are the “subjects” of experience, being the locus where information can be integrated. Since information can only be integrated *within* a complex and not outside its boundaries, consciousness as information integration is necessarily subjective, private, and related to a single point of view or perspective [1, 9]. It follows that elements that are part of a complex contribute to its conscious experience, while elements that are not part of it do not, even though they may be connected to it and exchange information with it through ports-in and ports-out.

##### Information integration over space and time

The Φ value of a complex is dependent on both spatial and temporal scales that determine what counts as a state of the underlying system. In general, there will be a “grain size”, in both space and time, at which Φ reaches a maximum. In the brain, for example, synchronous firing of heavily interconnected groups of neurons sharing inputs and outputs, such as cortical minicolumns, may produce significant effects in the rest of the brain, while asynchronous firing of various combinations of individual neurons may be less effective. Thus, Φ values may be higher when cortical minicolumns rather than individual neurons are considered as elements, even if their number is lower. On the other hand, Φ values would be extremely low with elements the size of brain areas. In terms of time, Φ values in the brain are likely to show a maximum between tens and hundreds of milliseconds. It is clear, for example, that if one were to stimulate one half of the brain by inducing many different firing patterns, and examine what effects this produces on the other half, no stimulation pattern would produce any effect whatsoever after just a tenth of a millisecond, and Φ would be equal to zero. After, say, 100 milliseconds, however, there is enough time for differential effects to be manifested, and Φ would grow. On the other hand, given the duration of conduction delays and of postsynaptic currents, much longer intervals are not going to increase Φ values. Indeed, a neural system will soon settle down into states that become progressively more independent of the stimulation. Thus, the search for complexes of maximum Φ should occur over subsets at critical spatial and temporal scales.

To recapitulate, the theory claims that consciousness corresponds to the capacity to integrate information. This capacity, corresponding to the *quantity* of consciousness, is given by the Φ value of a complex. Φ is the amount of effective information that can be exchanged across the minimum information bipartition of a complex. A complex is a subset of elements with Φ>0 and with no inclusive subset of higher Φ. The spatial and temporal scales defining the elements of a complex and the time course of their interactions are those that jointly maximize Φ.