\sloppy

1 Network Information Theory (From Cover’s Elements of Information Theory textbook)

A system with many senders and receivers contains many new elements in the communication problem: interference, cooperation, and feedback.
These are the issues that are the domain of network information theory. The general problem is easy to state. Given many senders and receivers and a channel transition matrix that describes the effects of the interference and noise in the network, decide whether or not the sources can be transmitted over the channel. The problem involves distributed source coding (data compression) as well as distributed communication (finding the capacity region of the network). This general problem has not yet been solved, so we consider various special cases in this chapter.
Examples of large communication networks include computer networks, satellite networks, and the phone system. Even within a single computer, there are various components that talk to each other. A complete theory of network information would have wide implications for the design of communication and computer networks.
Suppose that m stations wish to communicate with a common satellite over a common channel, as shown in 1↓. This is known as a multiple-access channel.
figure Fig15.1 Multiple-access channel.png
Figure 1 Multi-access channel
How do the various senders cooperate with each other to send information to the receiver? What rates of communication are achievable simultaneously? What limitations does interference among the senders put on the total rate of communication? This is the best understood multiuser channel, and the above questions have satisfying answers.
In contrast, we can reverse the network and consider one TV station sending information to m TV receivers, as in 2↓.
figure 15.2 Broadcast channel.png
Figure 2 Broadcast channel.
How does the sender encode information meant for different receivers in a common signal? For this channel, the answers are known only in special cases. There are other channels, such as the relay channel (where there is one source and one destination, but one or more intermediate sender–receiver pairs act as relays to facilitate the communication between the source and the destination), the interference channel (two senders and two receivers with crosstalk), and the two-way channel (two sender–receiver pairs sending information to each other). For all these channels, we have only some of the answers to questions about achievable communication rates and the appropriate coding strategies.
All these channels can be considered special cases of a general communication network that consists of m nodes trying to communicate with each other, as shown in 3↓.
figure Fig15.3. Communication network.png
Figure 3 Communication network
At each instant of time, the i-th node sends a symbol $x_i$ that depends on the messages that it wants to send and the past received symbols at the node. The simultaneous transmission of the symbols $(x_1, x_2, \ldots, x_m)$ results in random received symbols $(Y_1, Y_2, \ldots, Y_m)$ drawn according to the conditional probability distribution $p(y^{(1)}, y^{(2)}, \ldots, y^{(m)} \mid x^{(1)}, x^{(2)}, \ldots, x^{(m)})$. Here $p(\cdot \mid \cdot)$ expresses the effects of the noise and interference present in the network. If $p(\cdot \mid \cdot)$ takes only the values 0 and 1, the network is deterministic.
Associated with some of the nodes in the network are stochastic data sources, which are to be communicated to some of the other nodes in the network. If the sources are independent, the messages sent by the nodes are also independent. However, for full generality, we must allow the sources to be dependent. How does one take advantage of the dependence to reduce the amount of information transmitted? Given the probability distribution of the sources and the channel transition function, can one transmit these sources over the channel and recover the sources at the destinations with the appropriate distortion?
We consider various special cases of network communication. We consider the problem of source coding when the channels are noiseless and without interference. In such cases, the problem reduces to finding the set of rates associated with each source such that the required sources can be decoded at the destination with a low probability of error (or appropriate distortion). The simplest case for distributed source coding is the Slepian-Wolf source coding problem, where we have two sources that must be encoded separately, but decoded together at a common node. We consider extensions to this theory when only one of the two sources needs to be recovered at the destination.
The theory of flow in networks has satisfying answers in such domains as circuit theory and the flow of water in pipes. For example, for the single-source single-sink network of pipes shown in 4↓, the maximum flow from A to B can be computed easily from the Ford–Fulkerson theorem.
figure Fig15.4.png
Figure 4 Network of water pipes
Assume that the edges have capacities $C_i$ as shown. Clearly, the maximum flow across any cut set cannot be greater than the sum of the capacities of the cut edges. Thus, the minimum of the total capacities across cut sets yields an upper bound on the maximum flow of the network. The Ford–Fulkerson theorem shows that this flow can be achieved.
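As a small computational aside (not from the text), the max-flow min-cut idea can be checked numerically. Below is a minimal Edmonds–Karp sketch in Python; the graph and its capacity values are hypothetical, since the figure's capacities $C_i$ are not reproduced here.

from collections import deque

def max_flow(cap, s, t):
    """Edmonds-Karp: repeatedly push flow along shortest augmenting paths."""
    flow = 0
    # residual capacities, stored as a dict of dicts
    res = {u: dict(vs) for u, vs in cap.items()}
    for u, vs in cap.items():
        for v in vs:
            res.setdefault(v, {}).setdefault(u, 0)
    while True:
        # BFS for an augmenting path in the residual graph
        parent = {s: None}
        q = deque([s])
        while q and t not in parent:
            u = q.popleft()
            for v, c in res[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            return flow
        # find the bottleneck along the path and update residual capacities
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        aug = min(res[u][v] for u, v in path)
        for u, v in path:
            res[u][v] -= aug
            res[v][u] += aug
        flow += aug

# Hypothetical pipe network from A to B (edge capacities chosen for illustration)
cap = {"A": {"1": 3, "2": 2}, "1": {"2": 1, "B": 2}, "2": {"B": 3}, "B": {}}
print(max_flow("A" in cap and cap, "A", "B") if False else max_flow(cap, "A", "B"))  # prints 5

For this toy graph the maximum flow (5) equals the total capacity of the cut separating A from the rest, illustrating max-flow min-cut.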
The theory of information flow in networks doesn’t have the same simple answer as the theory of flow of water in pipes. Although we prove an upper bound on the rate of information flow across any cut set, these bounds are not achievable in general. However, it is gratifying that some problems, such as the relay channel and the cascade channel, admit a simple max-flow min-cut interpretation. Another subtle problem in the search for a general theory is the absence of a source-channel separation theorem, which we touch on briefly in Section 15.10. A complete theory combining distributed source coding and network channel coding is still a distant goal.
In the next section we consider Gaussian examples of some of the basic channels of network information theory. The physically motivated Gaussian channel lends itself to concrete and easily interpreted answers. Later we prove some of the basic results about joint typicality that we use to prove the theorems of multiuser information theory. We then consider various problems in detail: the multiple-access channel, the coding of correlated sources (Slepian-Wolf data compression), the broadcast channel, the relay channel, the coding of a random variable with side information, and the rate distortion problem with side information. We end with an introduction to the general theory of information flow in networks. There are a number of open problems in the area, and there does not yet exist a comprehensive theory of information networks. Even if such a theory is found, it may be too complex for easy implementation. But the theory will be able to tell communication designers how close they are to optimality and perhaps suggest some means of improving the communication rates.

1.1 Gaussian Multiple-User Channels

Gaussian multiple-user channels illustrate some of the important features of network information theory. The intuition gained in Chapter 9 on the Gaussian channel should make this section a useful introduction. Here the key ideas for establishing the capacity regions of the Gaussian multiple-access, broadcast, relay, and two-way channels will be given without proof. The proofs of the coding theorems for the discrete memoryless counterparts to these theorems are given in later sections of the chapter.
The basic discrete-time additive white Gaussian noise channel with input power P and noise variance N is modeled by:
(1) $Y_i = X_i + Z_i, \quad i = 1, 2, \ldots$
where the $Z_i$ are i.i.d. Gaussian random variables with mean 0 and variance N. The signal $X = (X_1, X_2, \ldots, X_n)$ has the power constraint
(2) $\frac{1}{n}\sum_{i=1}^{n} X_i^2 \le P$
The Shannon capacity C is obtained by maximizing $I(X;Y)$ over all random variables X such that $EX^2 \le P$ and is given by
(3) $C = \frac{1}{2}\log\left(1 + \frac{P}{N}\right)$
In this chapter we restrict our attention to discrete-time memoryless channels; the results can be extended to continuous-time Gaussian channels.
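As a quick numerical illustration of (3) (not part of the text; the power and noise values are arbitrary):

import math

def gaussian_capacity(P, N):
    """Capacity of the discrete-time AWGN channel in bits per transmission: 0.5*log2(1 + P/N)."""
    return 0.5 * math.log2(1 + P / N)

# Arbitrary example values: P = 10, N = 1 gives about 1.73 bits per transmission.
print(gaussian_capacity(10, 1))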

1.1.1 Single-User Gaussian Channel

We first review the single-user Gaussian channel studied in Chapter 9. Here $Y = X + Z$. Choose a rate $R < \frac{1}{2}\log\left(1 + \frac{P}{N}\right)$. Fix a good $(2^{nR}, n)$ codebook of power P. Choose an index w in the set $\{1, 2, \ldots, 2^{nR}\}$. Send the w-th codeword $\mathbf{X}(w)$ from the codebook generated above. The receiver observes $\mathbf{Y} = \mathbf{X}(w) + \mathbf{Z}$ and then finds the index $\hat{w}$ of the codeword closest to $\mathbf{Y}$. If n is sufficiently large, the probability of error $\Pr(w \ne \hat{w})$ will be arbitrarily small. As can be seen from the definition of joint typicality, this minimum-distance decoding scheme is essentially equivalent to finding the codeword in the codebook that is jointly typical with the received vector $\mathbf{Y}$.
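The random-coding and minimum-distance decoding scheme just described can be sketched in a few lines of Python. The block length, rate, and power below are arbitrary illustrative choices (a tiny n, so the observed error rate only loosely reflects the coding theorem), and the function name is hypothetical:

import numpy as np

rng = np.random.default_rng(0)

def random_coding_error_rate(P=10.0, N=1.0, n=12, R=1.0, trials=500):
    """Monte Carlo sketch of random coding + minimum-distance decoding on Y = X + Z.
    Rate R is below C = 0.5*log2(1 + P/N) (about 1.73 here), but n is tiny,
    so the result is only a rough illustration of the coding theorem."""
    M = 2 ** int(n * R)                                       # number of codewords, 2^{nR}
    codebook = rng.normal(0.0, np.sqrt(P), size=(M, n))       # i.i.d. N(0, P) codebook (power ~P on average)
    errors = 0
    for _ in range(trials):
        w = rng.integers(M)                                   # message index
        y = codebook[w] + rng.normal(0.0, np.sqrt(N), size=n) # received vector
        w_hat = np.argmin(np.sum((codebook - y) ** 2, axis=1))# nearest codeword
        errors += (w_hat != w)
    return errors / trials

print(random_coding_error_rate())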

1.1.2 Gaussian Multiple-Access Channel with m Users

We consider m transmitters each with a power P. Let
(4) $Y = \sum_{i=1}^{m} X_i + Z$
Let
(5) $C\left(\frac{P}{N}\right) = \frac{1}{2}\log\left(1 + \frac{P}{N}\right)$
denote the capacity of a single-user Gaussian channel with signal-to-noise ratio $P/N$. The achievable rate region for the Gaussian channel takes on the simple form given in the following equations:
(6) $R_i < C\left(\frac{P}{N}\right)$
(7) $R_i + R_j < C\left(\frac{2P}{N}\right)$
(8) $R_i + R_j + R_k < C\left(\frac{3P}{N}\right)$
$\vdots$
(9) $\sum_{i=1}^{m} R_i < C\left(\frac{mP}{N}\right)$
Note that when all the rates are the same, the last inequality dominates the others.
Here we need m codebooks, the i-th codebook having $2^{nR_i}$ codewords of power P. Transmission is simple. Each of the independent transmitters chooses an arbitrary codeword from its own codebook. The users send these vectors simultaneously. The receiver sees the codewords added together with the Gaussian noise Z.
Optimal decoding consists of looking for the m codewords, one from each codebook, such that the vector sum is closest to Y in Euclidean distance. If $(R_1, R_2, \ldots, R_m)$ is in the capacity region given above, the probability of error goes to 0 as n tends to infinity.
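A minimal numerical sketch of the region (5)–(9), assuming the helper C below and arbitrary example values of m, P, and N (not from the text):

import math

def C(x):
    """C(x) = 0.5*log2(1 + x): Gaussian capacity at SNR x, in bits."""
    return 0.5 * math.log2(1 + x)

def gaussian_mac_bounds(m, P, N):
    """Sum-rate bounds of the m-user Gaussian MAC: any k users together
    must satisfy the sum of their rates < C(kP/N)."""
    return {k: C(k * P / N) for k in range(1, m + 1)}

# Arbitrary example: 4 users, P = 10, N = 1.
bounds = gaussian_mac_bounds(4, 10.0, 1.0)
print(bounds)            # subset size k -> C(kP/N)
print(bounds[4] / 4)     # symmetric per-user rate, limited by the sum constraint

Note how the total rate C(mP/N) keeps growing (logarithmically) in m, as discussed in the remarks below.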
Remarks
It is exciting to see in this problem that the sum of the rates of the users, $C(mP/N)$, goes to infinity with m. Thus, in a cocktail party with m celebrants of power P in the presence of ambient noise N, the intended listener receives an unbounded amount of information as the number of people grows to infinity. A similar conclusion holds, of course, for ground communications to a satellite. Apparently, the increasing interference as the number of senders $m \to \infty$ does not limit the total received information.
It is also interesting to note that the optimal transmission scheme here does not involve time-division multiplexing. In fact, each of the transmitters uses all of the bandwidth all of the time.

1.1.3 Gaussian Broadcast Channel

Here we assume that we have a sender of power P and two distant receivers, one with Gaussian noise power N1 and the other with Gaussian noise power N2. Without loss of generality, assume that N1 < N2. Thus, receiver Y1 is less noisy than receiver Y2. The model for the channel is Y1 = X + Z1 and Y2 = X + Z2, where Z1 and Z2 are arbitrarily correlated Gaussian random variables with variances N1 and N2, respectively. The sender wishes to send independent messages at rates R1 and R2 to receivers Y1 and Y2, respectively.
Fortunately, all scalar Gaussian broadcast channels belong to the class of degraded broadcast channels discussed in Section 15.6.2. Specializing that work, we find that the capacity region of the Gaussian broadcast channel is:
(10) $R_1 < C\left(\frac{\alpha P}{N_1}\right)$
(11) $R_2 < C\left(\frac{(1-\alpha) P}{\alpha P + N_2}\right)$
where $\alpha$ may be arbitrarily chosen ($0 \le \alpha \le 1$) to trade off rate $R_1$ for rate $R_2$ as the transmitter wishes.
To encode the messages, the transmitter generates two codebooks, one with power $\alpha P$ at rate $R_1$, and another with power $\bar{\alpha}P$, where $\bar{\alpha} = 1 - \alpha$, at rate $R_2$, where $R_1$ and $R_2$ lie in the capacity region above. Then, to send an index $w_1 \in \{1, 2, \ldots, 2^{nR_1}\}$ and $w_2 \in \{1, 2, \ldots, 2^{nR_2}\}$ to $Y_1$ and $Y_2$, respectively, the transmitter takes the codeword $X(w_1)$ from the first codebook and the codeword $X(w_2)$ from the second codebook and computes the sum. He sends the sum over the channel.
The receivers must now decode the messages. First consider the bad receiver $Y_2$. He merely looks through the second codebook to find the closest codeword to the received vector $\mathbf{Y}_2$. His effective signal-to-noise ratio is $\bar{\alpha}P/(\alpha P + N_2)$, since $Y_1$'s message acts as noise to $Y_2$. (This can be proved.)
The good receiver $Y_1$ first decodes $Y_2$'s codeword, which he can accomplish because of his lower noise $N_1$. He subtracts this codeword $\hat{X}_2$ from $\mathbf{Y}_1$. He then looks for the codeword in the first codebook closest to $\mathbf{Y}_1 - \hat{X}_2$. The resulting probability of error can be made as low as desired.
A nice dividend of optimal encoding for degraded broadcast channels is that the better receiver Y1 always knows the message intended for receiver Y2 in addition to the message intended for himself.
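A small numerical sketch of the rate pairs (10)–(11) obtained by sweeping the power split $\alpha$; the values of P, $N_1$, $N_2$ are arbitrary illustrations (not from the text):

import math

def C(x):
    return 0.5 * math.log2(1 + x)

def gaussian_bc_pair(alpha, P, N1, N2):
    """Superposition rate pair for the degraded Gaussian broadcast channel:
    R1 = C(alpha*P/N1) for the good receiver,
    R2 = C((1-alpha)*P/(alpha*P + N2)) for the bad receiver."""
    R1 = C(alpha * P / N1)
    R2 = C((1 - alpha) * P / (alpha * P + N2))
    return R1, R2

# Arbitrary example: P = 10, N1 = 1, N2 = 4; sweep the power split alpha.
for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(alpha, gaussian_bc_pair(alpha, 10.0, 1.0, 4.0))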

1.1.4 Gaussian Relay Channel

For the relay channel, we have a sender X and an ultimate intended receiver Y. Also present is the relay channel, intended solely to help the receiver. The Gaussian relay channel 5↓ is given by
(12) $Y_1 = X + Z_1$
(13) $Y = X + Z_1 + X_1 + Z_2$
figure Fig15.3a. Communication network.png
Figure 5 Gaussian Relay Channel
where Z1 and Z2 are independent zero-mean Gaussian random variables with variance N1 and N2, respectively. The encoding allowed by the relay is the causal sequence
(14) $X_{1i} = f_i(Y_{11}, Y_{12}, \ldots, Y_{1,i-1})$.
Sender X has power P and sender X1 has power P1. The capacity is
(15) $C = \max_{0 \le \alpha \le 1} \min\left\{ C\left(\frac{P + P_1 + 2\sqrt{\bar{\alpha} P P_1}}{N_1 + N_2}\right),\ C\left(\frac{\alpha P}{N_1}\right) \right\}$
where $\bar{\alpha} = 1 - \alpha$. Note that if
(16) $\frac{P_1}{N_2} \ge \frac{P}{N_1},$
then the first term of the minimum in (15) is at least the second when $\alpha = 1$: with $\alpha = 1$ (so $\bar{\alpha} = 0$), the first argument becomes $\frac{P + P_1}{N_1 + N_2}$, and
$\frac{P + P_1}{N_1 + N_2} \ge \frac{P}{N_1} \iff N_1 P + N_1 P_1 \ge N_1 P + N_2 P \iff \frac{P_1}{N_2} \ge \frac{P}{N_1},$
which is exactly condition (16). Hence the minimum in (15) equals $C(P/N_1)$, achieved at $\alpha = 1$.
It can be seen that $C = C(P/N_1)$ (the verification is in the box above), which is achieved by $\alpha = 1$. The channel appears to be noise-free after the relay, and the capacity $C(P/N_1)$ from X to the relay can be achieved. Thus, the rate $C(P/(N_1 + N_2))$ without the relay is increased by the presence of the relay to $C(P/N_1)$. For large $N_2$ and for $P_1/N_2 \ge P/N_1$, we see that the increment in rate is from $C(P/(N_1 + N_2)) \approx 0$ to $C(P/N_1)$.
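The expression (15) can also be evaluated numerically by a brute-force search over $\alpha$. This sketch uses arbitrary example values chosen so that condition (16) holds, in which case the result should coincide with $C(P/N_1)$:

import math

def C(x):
    return 0.5 * math.log2(1 + x)

def relay_capacity(P, P1, N1, N2, steps=10001):
    """Numerically evaluate (15): max over alpha of
    min{ C((P + P1 + 2*sqrt((1-alpha)*P*P1)) / (N1 + N2)), C(alpha*P / N1) }."""
    best = 0.0
    for i in range(steps):
        a = i / (steps - 1)                      # alpha in [0, 1]
        broadcast = C((P + P1 + 2 * math.sqrt((1 - a) * P * P1)) / (N1 + N2))
        mac = C(a * P / N1)
        best = max(best, min(broadcast, mac))
    return best

# Arbitrary example where P1/N2 >= P/N1, so the maximum should equal C(P/N1).
P, P1, N1, N2 = 10.0, 20.0, 1.0, 2.0
print(relay_capacity(P, P1, N1, N2), C(P / N1))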
Let $R_1 < C(\alpha P / N_1)$. Two codebooks are needed. The first codebook has $2^{nR_1}$ words of power $\alpha P$. The second has $2^{nR_0}$ codewords of power $\bar{\alpha} P$. We shall use codewords from these codebooks successively to create the opportunity for cooperation by the relay. We start by sending a codeword from the first codebook. The relay now knows the index of this codeword since $R_1 < C(\alpha P / N_1)$, but the intended receiver has a list of possible codewords of size $2^{n(R_1 - C(\alpha P/(N_1+N_2)))}$. (I do not see where this comes from; it seems to say that the remaining uncertainty is the difference between the rate decodable by the relay and the capacity the receiver would have if the relay did not exist.) This list calculation involves a result on list codes.
Indeed, if $M = 2^{nR}$, then $nR = \log M$ and $R = \frac{1}{n}\log M$; since $R \le C$, we have $\frac{1}{n}\log M \le C$, i.e., $M \le 2^{nC}$.
In the next block, the transmitter and the relay wish to cooperate to resolve the receiver’s uncertainty about the codeword sent previously that is on the receiver’s list. Unfortunately, they cannot be sure what this list is because they do not know the received signal Y. Thus, they randomly partition the first codebook into $2^{nR_0}$ cells with an equal number of codewords in each cell. The relay, the receiver, and the transmitter agree on this partition. The relay and the transmitter find the cell of the partition in which the codeword from the first codebook lies and cooperatively send the codeword from the second codebook with that index. That is, X and X1 send the same designated codeword. The relay, of course, must scale this codeword so that it meets his power constraint P1. They now transmit their codewords simultaneously. An important point to note here is that the cooperative information sent by the relay and the transmitter is sent coherently, so the power of the sum as seen by the receiver Y is $(\sqrt{\bar{\alpha}P} + \sqrt{P_1})^2$.
However, this does not exhaust what the transmitter does in the second block. He also chooses a fresh codeword from the first codebook, adds it „on paper” to the cooperative codeword from the second codebook, and sends the sum over the channel. The reception by the ultimate receiver Y in the second block involves first finding the cooperative index from the second codebook by looking for the closest codeword in the second codebook. He subtracts the codeword from the received sequence and then calculates a list of indices of size $2^{nR_0}$ corresponding to all codewords of the first codebook that might have been sent in the second block.
Now it is time for the intended receiver to complete computing the codeword from the first codebook sent in the first block. He takes his list of possible codewords that might have been sent in the first block and intersects it with the cell of the partition that he has learned from the cooperative relay transmission in the second block. The rates and powers have been chosen so that there is only one codeword in the intersection. This is Y’s guess about the information sent in the first block.
We are now in steady state. In each new block, the transmitter and the relay cooperate to resolve the list uncertainty from the previous block. In addition, the transmitter superimposes some fresh information from his first codebook onto this transmission from the second codebook and transmits the sum. The receiver is always one block behind, but for sufficiently many blocks this does not affect his overall rate of reception.

1.1.5 Gaussian Interference Channel

The interference channel has two senders and two receivers. Sender 1 wishes to send information to receiver 1. He does not care what receiver 2 receives or understands; similarly with sender 2 and receiver 2. Each channel interferes with the other. This channel is illustrated in 6↓.
figure Fig15.5. Communication network.png
Figure 6 Gaussian interference channel
It is not quite a broadcast channel since there is only one intended receiver for each sender, nor is it a multiple access channel because each receiver is only interested in what is being sent by the corresponding transmitter.
For symmetric interference, we have
(17) Y1 = X1 + aX2 + Z1
(18) Y2 = X2 + aX1 + Z2
where $Z_1, Z_2$ are independent $N(0, N)$ random variables. This channel has not been solved in general, even in the Gaussian case. But remarkably, in the case of high interference, it can be shown that the capacity region of this channel is the same as if there were no interference whatsoever.
To achieve this, generate two codebooks, each with power P and rate $C(P/N)$. Each sender independently chooses a word from his book and sends it. Now, if the interference a satisfies $C(a^2 P/(P + N)) > C(P/N)$, the first receiver understands perfectly the index of the second transmitter. He finds it by the usual technique of looking for the closest codeword to his received signal. Once he finds this signal, he subtracts it from his received waveform. Now there is a clean channel between him and his sender. He then searches the sender's codebook to find the closest codeword and declares that codeword to be the one sent.
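A small sketch of the strong-interference condition just described; the parameter values are arbitrary and the function name is hypothetical:

import math

def C(x):
    return 0.5 * math.log2(1 + x)

def strong_interference_check(a, P, N):
    """Check C(a^2*P/(P + N)) > C(P/N): when it holds, each receiver can first
    decode the interfering codeword, subtract it, and then decode its own
    sender at the interference-free rate C(P/N)."""
    cross = C(a * a * P / (P + N))   # rate at which the interfering codeword is decodable
    own = C(P / N)                   # interference-free rate of the desired sender
    return cross > own, cross, own

# Arbitrary example: P = 10, N = 1; a = 4 is strong enough here, a = 1 is not.
for a in (1.0, 4.0):
    print(a, strong_interference_check(a, 10.0, 1.0))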

1.1.6 Gaussian Two-way channel

The two-way channel is very similar to the interference channel, with the additional provision that sender 1 is attached to receiver 2 and sender 2 is attached to receiver 1, as shown in 7↓. Hence, sender 1 can use information from previous received symbols of receiver 2 to decide what to send next. This channel introduces another fundamental aspect of network information theory: namely feedback. Feedback enables the senders to use the partial information that each has about the other’s message to cooperate with each other.
figure Figure 15.6 Two-way channel.png
Figure 7 Two-way channel
The capacity region of the two-way channel was considered by Shannon [3], who derived upper and lower bounds on the region (see Problem 15.15). For Gaussian channels, these two bounds coincide and the capacity region is known; in fact, the Gaussian two-way channel decomposes into two independent channels.
Let $P_1$ and $P_2$ be the powers of transmitters 1 and 2, respectively, and let $N_1$ and $N_2$ be the noise variances of the two channels. Then the rates $R_1 < C(P_1/N_1)$ and $R_2 < C(P_2/N_2)$ can be achieved by the techniques described for the interference channel. In this case, we generate two codebooks of rates $R_1$ and $R_2$. Sender 1 sends a codeword from the first codebook. Receiver 2 receives the sum of the codewords sent by the two senders plus some noise. He simply subtracts out the codeword of sender 2, and he has a clean channel from sender 1 (with only the noise of variance $N_1$). Hence, the two-way Gaussian channel decomposes into two independent Gaussian channels. But this is not the case for the general two-way channel; in general, there is a trade-off between the two senders so that both of them cannot send at the optimal rate at the same time.

1.2 Jointly Typical Sequences

We have previewed the capacity results for networks by considering multiuser Gaussian channels. We begin a more detailed analysis in this section, where we extend the joint AEP proved in Chapter 7 to a form that we will use to prove the theorems of network information theory. The joint AEP will enable us to calculate the probability of error for jointly typical decoding for the various coding schemes considered in this chapter.
Let $(X^{(1)}, X^{(2)}, \ldots, X^{(k)})$ denote a finite collection of discrete random variables with some fixed joint distribution $p(x^{(1)}, x^{(2)}, \ldots, x^{(k)})$, $(x^{(1)}, x^{(2)}, \ldots, x^{(k)}) \in \mathcal{X}_1 \times \mathcal{X}_2 \times \cdots \times \mathcal{X}_k$. Let S denote an ordered subset of these random variables and consider n independent copies of S. Thus,
(19) $\Pr\{\mathbf{S} = \mathbf{s}\} = \prod_{i=1}^{n} \Pr\{S_i = s_i\}, \quad \mathbf{s} \in \mathcal{S}^n$
For example, if $S = (X^{(j)}, X^{(l)})$, then
$\Pr\{\mathbf{S} = \mathbf{s}\} = \Pr\{(\mathbf{X}^{(j)}, \mathbf{X}^{(l)}) = (\mathbf{x}^{(j)}, \mathbf{x}^{(l)})\} = \prod_{i=1}^{n} p(x_i^{(j)}, x_i^{(l)})$
For instance, let $X^{(1)} \in \{0, 1\}$ with $p(X^{(1)}) = (\frac{1}{2}, \frac{1}{2})$ and $X^{(2)} \in \{0, 1\}$ with $p(X^{(2)}) = (\frac{1}{2}, \frac{1}{2})$, and take $n = 3$ and $S = (X^{(1)}, X^{(2)})$. Then
$\Pr\{\mathbf{S} = \mathbf{s}\} = \Pr\{(\mathbf{X}^{(1)}, \mathbf{X}^{(2)}) = (x_1^{(1)} x_2^{(1)} x_3^{(1)},\ x_1^{(2)} x_2^{(2)} x_3^{(2)})\} = \prod_{i=1}^{3} p(x_i^{(1)}, x_i^{(2)}) = p(x_1^{(1)}, x_1^{(2)})\, p(x_2^{(1)}, x_2^{(2)})\, p(x_3^{(1)}, x_3^{(2)})$
To be explicit, we will sometimes use $\mathbf{X}(S)$ for $\mathbf{S}$. By the law of large numbers, for any subset S of random variables,
(20) $-\frac{1}{n}\log p(S_1, S_2, \ldots, S_n) = -\frac{1}{n}\sum_{i=1}^{n}\log p(S_i) \to H(S)$
where the convergence takes place with probability 1 for all $2^k$ subsets $S \subseteq \{X^{(1)}, X^{(2)}, \ldots, X^{(k)}\}$.
Definition (ϵ-typical n-sequences)
The set $A_\epsilon^{(n)}$ of $\epsilon$-typical n-sequences $(\mathbf{x}^{(1)}, \mathbf{x}^{(2)}, \ldots, \mathbf{x}^{(k)})$ is defined by:
$A_\epsilon^{(n)}(X^{(1)}, X^{(2)}, \ldots, X^{(k)}) = A_\epsilon^{(n)} = \left\{ (\mathbf{x}^{(1)}, \mathbf{x}^{(2)}, \ldots, \mathbf{x}^{(k)}) : \left| -\frac{1}{n}\log p(\mathbf{s}) - H(S) \right| < \epsilon,\ \forall S \subseteq \{X^{(1)}, X^{(2)}, \ldots, X^{(k)}\} \right\}$
(The analogy is that $X^{(1)}$ corresponds to $X^n$, $X^{(2)}$ to $Y^n$, and so on.)
Let $A_\epsilon^{(n)}(S)$ denote the restriction of $A_\epsilon^{(n)}$ to the coordinates of S. Thus, if $S = (X^{(1)}, X^{(2)})$, we have
$A_\epsilon^{(n)}(X^{(1)}, X^{(2)}) = \left\{ (\mathbf{x}^{(1)}, \mathbf{x}^{(2)}) : \left|-\frac{1}{n}\log p(\mathbf{x}^{(1)}, \mathbf{x}^{(2)}) - H(X^{(1)}, X^{(2)})\right| < \epsilon,\ \left|-\frac{1}{n}\log p(\mathbf{x}^{(1)}) - H(X^{(1)})\right| < \epsilon,\ \left|-\frac{1}{n}\log p(\mathbf{x}^{(2)}) - H(X^{(2)})\right| < \epsilon \right\}$
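As an illustration of this definition (not from the text), the following Python sketch checks the three conditions defining $A_\epsilon^{(n)}(X^{(1)}, X^{(2)})$ for a pair of sequences; the joint pmf in the example is a hypothetical doubly symmetric binary source:

import math

def entropy(p):
    """Entropy in bits of a pmf given as a dict value -> probability."""
    return -sum(q * math.log2(q) for q in p.values() if q > 0)

def jointly_typical(x1, x2, p_joint, eps):
    """Check the three conditions of A_eps^(n)(X1, X2) for sequences x1, x2,
    assumed drawn with joint pmf p_joint[(a, b)]; every pair that occurs in
    (x1, x2) must have positive probability."""
    n = len(x1)
    p1, p2 = {}, {}
    for (a, b), q in p_joint.items():   # marginals of the joint pmf
        p1[a] = p1.get(a, 0) + q
        p2[b] = p2.get(b, 0) + q
    lp12 = sum(math.log2(p_joint[(a, b)]) for a, b in zip(x1, x2))
    lp1 = sum(math.log2(p1[a]) for a in x1)
    lp2 = sum(math.log2(p2[b]) for b in x2)
    return (abs(-lp12 / n - entropy(p_joint)) < eps
            and abs(-lp1 / n - entropy(p1)) < eps
            and abs(-lp2 / n - entropy(p2)) < eps)

# Hypothetical doubly symmetric binary source: X2 equals X1 flipped with prob. 0.1.
p = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}
print(jointly_typical([0, 1, 0, 0, 1, 1, 0, 1], [0, 1, 0, 0, 1, 1, 1, 1], p, 0.5))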
Definition
We will use the notation $a_n \doteq 2^{n(b \pm \epsilon)}$ to mean that
(21) $\left|\frac{1}{n}\log a_n - b\right| < \epsilon$
for n sufficiently large.
(Note that there is no minus sign in front of $\frac{1}{n}$ here, unlike in the standard definition of the AEP.)
Theorem 15.2.1
For any $\epsilon > 0$ and sufficiently large n:
(22) 1. $P(A_\epsilon^{(n)}(S)) \ge 1 - \epsilon$ for all $S \subseteq \{X^{(1)}, X^{(2)}, \ldots, X^{(k)}\}$.
2. $\mathbf{s} \in A_\epsilon^{(n)}(S) \Rightarrow p(\mathbf{s}) \doteq 2^{-n(H(S) \pm \epsilon)}$.
3. $|A_\epsilon^{(n)}(S)| \doteq 2^{n(H(S) \pm 2\epsilon)}$.
(23) 4. Let $S_1, S_2 \subseteq \{X^{(1)}, X^{(2)}, \ldots, X^{(k)}\}$. If $(\mathbf{s}_1, \mathbf{s}_2) \in A_\epsilon^{(n)}(S_1, S_2)$, then $p(\mathbf{s}_1 \mid \mathbf{s}_2) \doteq 2^{-n(H(S_1|S_2) \pm 2\epsilon)}$.
Proof
1. This follows from the law of large numbers applied to the random variables in the definition of $A_\epsilon^{(n)}(S)$.
2. This follows directly from the definition of $A_\epsilon^{(n)}(S)$.
3. This follows from
(24) $1 \ge \sum_{\mathbf{s} \in A_\epsilon^{(n)}(S)} p(\mathbf{s}) \ge \sum_{\mathbf{s} \in A_\epsilon^{(n)}(S)} 2^{-n(H(S)+\epsilon)} = |A_\epsilon^{(n)}(S)| \cdot 2^{-n(H(S)+\epsilon)},$
so that $|A_\epsilon^{(n)}(S)| \le 2^{n(H(S)+\epsilon)}$. Similarly, for n sufficiently large,
(25) $1 - \epsilon \le \sum_{\mathbf{s} \in A_\epsilon^{(n)}(S)} p(\mathbf{s}) \le \sum_{\mathbf{s} \in A_\epsilon^{(n)}(S)} 2^{-n(H(S)-\epsilon)} = |A_\epsilon^{(n)}(S)| \cdot 2^{-n(H(S)-\epsilon)},$
so that $|A_\epsilon^{(n)}(S)| \ge (1-\epsilon)\, 2^{n(H(S)-\epsilon)}$. Together,
$(1-\epsilon)\, 2^{n(H(S)-\epsilon)} \le |A_\epsilon^{(n)}(S)| \le 2^{n(H(S)+\epsilon)}$
Combining 25↑ and 24↑, we have $|A_\epsilon^{(n)}(S)| \doteq 2^{n(H(S)\pm 2\epsilon)}$ for sufficiently large n.
To see why the combination gives $\pm 2\epsilon$, recall that $a_n \doteq 2^{n(b \pm \epsilon)}$ means $\left|\frac{1}{n}\log a_n - b\right| < \epsilon$, i.e., $2^{n(b-\epsilon)} \le a_n \le 2^{n(b+\epsilon)}$. Taking logarithms in the two-sided bound above and dividing by n gives
$\frac{1}{n}\log(1-\epsilon) + H(S) - \epsilon \le \frac{1}{n}\log|A_\epsilon^{(n)}(S)| \le H(S) + \epsilon$
For n sufficiently large, $\frac{1}{n}\log(1-\epsilon) \ge -\epsilon$, so
$-2\epsilon \le \frac{1}{n}\log|A_\epsilon^{(n)}(S)| - H(S) \le \epsilon \le 2\epsilon,$
i.e., $\left|\frac{1}{n}\log|A_\epsilon^{(n)}(S)| - H(S)\right| \le 2\epsilon$, which is exactly $|A_\epsilon^{(n)}(S)| \doteq 2^{n(H(S)\pm 2\epsilon)}$. (If the bound holds for $\epsilon$, it also holds for $2\epsilon$.) Proved.
4. Let $S_1, S_2 \subseteq \{X^{(1)}, X^{(2)}, \ldots, X^{(k)}\}$ and let $(\mathbf{s}_1, \mathbf{s}_2) \in A_\epsilon^{(n)}(S_1, S_2)$. Then $p(\mathbf{s}_2) \doteq 2^{-n(H(S_2) \pm \epsilon)}$ and $p(\mathbf{s}_1, \mathbf{s}_2) \doteq 2^{-n(H(S_1, S_2) \pm \epsilon)}$, hence
$p(\mathbf{s}_1 \mid \mathbf{s}_2) = \frac{p(\mathbf{s}_1, \mathbf{s}_2)}{p(\mathbf{s}_2)} \doteq \frac{2^{-n(H(S_1,S_2) \pm \epsilon)}}{2^{-n(H(S_2) \pm \epsilon)}} \doteq 2^{-n(H(S_1|S_2) \pm 2\epsilon)}$
The next theorem bounds the number of conditionally typical sequences for a given typical sequence.
Theorem 15.2.2
Let $S_1, S_2$ be two subsets of $\{X^{(1)}, X^{(2)}, \ldots, X^{(k)}\}$. For any $\epsilon > 0$, define $A_\epsilon^{(n)}(S_1 \mid \mathbf{s}_2)$ to be the set of $\mathbf{s}_1$ sequences that are jointly $\epsilon$-typical with a particular $\mathbf{s}_2$ sequence. If $\mathbf{s}_2 \in A_\epsilon^{(n)}(S_2)$, then for sufficiently large n, we have
$|A_\epsilon^{(n)}(S_1 \mid \mathbf{s}_2)| \le 2^{n(H(S_1|S_2) + 2\epsilon)}$
and
$(1-\epsilon)\, 2^{n(H(S_1|S_2) - 2\epsilon)} \le \sum_{\mathbf{s}_2} p(\mathbf{s}_2)\, |A_\epsilon^{(n)}(S_1 \mid \mathbf{s}_2)|$
Proof:
As in part 3 of Theorem 15.2.1, we have
(26) $1 \ge \sum_{\mathbf{s}_1 \in A_\epsilon^{(n)}(S_1|\mathbf{s}_2)} p(\mathbf{s}_1 \mid \mathbf{s}_2) \ge \sum_{\mathbf{s}_1 \in A_\epsilon^{(n)}(S_1|\mathbf{s}_2)} 2^{-n(H(S_1|S_2)+2\epsilon)} = |A_\epsilon^{(n)}(S_1 \mid \mathbf{s}_2)|\, 2^{-n(H(S_1|S_2)+2\epsilon)}$
If n is sufficiently large, we can argue from 22↑ that
$1 - \epsilon \le \sum_{\mathbf{s}_2} p(\mathbf{s}_2) \sum_{\mathbf{s}_1 \in A_\epsilon^{(n)}(S_1|\mathbf{s}_2)} p(\mathbf{s}_1 \mid \mathbf{s}_2) \le \sum_{\mathbf{s}_2} p(\mathbf{s}_2) \sum_{\mathbf{s}_1 \in A_\epsilon^{(n)}(S_1|\mathbf{s}_2)} 2^{-n(H(S_1|S_2)-2\epsilon)}$
$= 2^{-n(H(S_1|S_2)-2\epsilon)} \sum_{\mathbf{s}_2} p(\mathbf{s}_2)\, |A_\epsilon^{(n)}(S_1 \mid \mathbf{s}_2)|$
To calculate the probability of decoding error, we need to know the probability that conditionally independent sequences are jointly typical. Let $S_1$, $S_2$, and $S_3$ be three subsets of $\{X^{(1)}, X^{(2)}, \ldots, X^{(k)}\}$. If $S_1'$ and $S_2'$ are conditionally independent given $S_3'$ but otherwise share the same pairwise marginals as $(S_1, S_2, S_3)$, we have the following probability of joint typicality.
Theorem 15.2.3
Let $A_\epsilon^{(n)}$ denote the typical set for the probability mass function $p(s_1, s_2, s_3)$, and let
(27) $P(\mathbf{S}_1' = \mathbf{s}_1, \mathbf{S}_2' = \mathbf{s}_2, \mathbf{S}_3' = \mathbf{s}_3) = \prod_{i=1}^{n} p(s_{1i} \mid s_{3i})\, p(s_{2i} \mid s_{3i})\, p(s_{3i})$
Then
$P\{(\mathbf{S}_1', \mathbf{S}_2', \mathbf{S}_3') \in A_\epsilon^{(n)}\} \doteq 2^{-n(I(S_1; S_2 | S_3) \pm 6\epsilon)}$
Proof:
We use the notation from 21↑ to avoid calculating the upper and lower bounds separately. We have
$P\{(\mathbf{S}_1', \mathbf{S}_2', \mathbf{S}_3') \in A_\epsilon^{(n)}\} = \sum_{(\mathbf{s}_1, \mathbf{s}_2, \mathbf{s}_3) \in A_\epsilon^{(n)}} p(\mathbf{s}_3)\, p(\mathbf{s}_1 \mid \mathbf{s}_3)\, p(\mathbf{s}_2 \mid \mathbf{s}_3) \doteq |A_\epsilon^{(n)}(S_1, S_2, S_3)|\; 2^{-n(H(S_3) \pm \epsilon)}\; 2^{-n(H(S_1|S_3) \pm 2\epsilon)}\; 2^{-n(H(S_2|S_3) \pm 2\epsilon)}$
$\doteq 2^{n(H(S_1,S_2,S_3) \pm \epsilon)}\; 2^{-n(H(S_3) \pm \epsilon)}\; 2^{-n(H(S_1|S_3) \pm 2\epsilon)}\; 2^{-n(H(S_2|S_3) \pm 2\epsilon)} \doteq 2^{-n(I(S_1; S_2 | S_3) \pm 6\epsilon)}$
(In the book the result is $2^{-n(I(S_1;S_2|S_3) \pm 6\epsilon)}$; I obtain $4\epsilon$ when the sign changes of the $\pm$ terms are taken into account, and $6\epsilon$ when they are not.) The exponent is
$n(H(S_1,S_2,S_3) \pm \epsilon) - n(H(S_3) \pm \epsilon) - n(H(S_1|S_3) \pm 2\epsilon) - n(H(S_2|S_3) \pm 2\epsilon),$
whose deterministic part is
$n\left[H(S_1,S_2,S_3) - H(S_3) - H(S_1|S_3) - H(S_2|S_3)\right] = (*)$
Since $I(S_1;S_2|S_3) = H(S_1|S_3) - H(S_1|S_2,S_3)$ and $H(S_1,S_2,S_3) = H(S_3) + H(S_2|S_3) + H(S_1|S_2,S_3)$,
$(*) = n\left[H(S_3) + H(S_2|S_3) + H(S_1|S_2,S_3) - H(S_3) - H(S_1|S_3) - H(S_2|S_3)\right] = -n\, I(S_1;S_2|S_3)$
If the sign changes in the $\pm$ terms are taken into account:
$n\left[H(S_1,S_2,S_3) \pm \epsilon - H(S_3) \mp \epsilon - H(S_1|S_3) \mp 2\epsilon - H(S_2|S_3) \mp 2\epsilon\right] = -n\left(I(S_1;S_2|S_3) \pm 4\epsilon\right)$
If the sign changes are not taken into account:
$n\left[H(S_1,S_2,S_3) \pm \epsilon - H(S_3) \pm \epsilon - H(S_1|S_3) \pm 2\epsilon - H(S_2|S_3) \pm 2\epsilon\right] = -n\left(I(S_1;S_2|S_3) \pm 6\epsilon\right)$
figure Joint Typicality.png
Figure 8 Illustration of Joint Typicality
figure Joint Typicality 2.png
Figure 9 Another Illustration of Joint Typicality

1.3 Multiple-Access Channel

The first channel that we examine in detail is the multiple-access channel, in which two (or more) senders send information to a common receiver. The channel is illustrated in 10↓.
figure Figure 15.7. Multiple-access channel.png
Figure 10 Multiple-access channel.
A common example of this channel is a satellite receiver with many independent ground stations, or a set of cell phones communicating with a base station. We see that the senders must contend not only with the receiver noise but with interference from each other as well.
Definition
A discrete memoryless multiple-access channel consists of three alphabets, $\mathcal{X}_1$, $\mathcal{X}_2$, and $\mathcal{Y}$, and a probability transition matrix $p(y \mid x_1, x_2)$.
Definition (average probability of error)
A $((2^{nR_1}, 2^{nR_2}), n)$ code for the multiple-access channel consists of two sets of integers $\mathcal{W}_1 = \{1, 2, \ldots, 2^{nR_1}\}$ and $\mathcal{W}_2 = \{1, 2, \ldots, 2^{nR_2}\}$, called message sets, two encoding functions,
(28) $X_1 : \mathcal{W}_1 \to \mathcal{X}_1^n$
and
(29) $X_2 : \mathcal{W}_2 \to \mathcal{X}_2^n$
and a decoding function,
(30) $g : \mathcal{Y}^n \to \mathcal{W}_1 \times \mathcal{W}_2$
There are two senders and one receiver for this channel. Sender 1 chooses an index $W_1$ uniformly from the set $\{1, 2, \ldots, 2^{nR_1}\}$ and sends the corresponding codeword over the channel. Sender 2 does likewise. Assuming that the distribution of messages over the product set $\mathcal{W}_1 \times \mathcal{W}_2$ is uniform (i.e., the messages are independent and equally likely), we define the average probability of error for the $((2^{nR_1}, 2^{nR_2}), n)$ code as follows:
$P_e^{(n)} = \frac{1}{2^{n(R_1+R_2)}} \sum_{(w_1, w_2) \in \mathcal{W}_1 \times \mathcal{W}_2} \Pr\{g(Y^n) \ne (w_1, w_2) \mid (w_1, w_2)\ \text{sent}\}$
Definition (achievable rate pair)
A rate pair $(R_1, R_2)$ is said to be achievable for the multiple-access channel if there exists a sequence of $((2^{nR_1}, 2^{nR_2}), n)$ codes with $P_e^{(n)} \to 0$.
Definition
The capacity region of the multiple-access channel is the closure of the set of achievable (R1R2) rate pairs.
An example of the capacity region for a multiple-access channel is illustrated in 11↓. We first state the capacity region in the form of a theorem.
figure Figure 15.8. Capacity region for multiple-access channel.png
Figure 11 Capacity Region of Multiple access channel
Theorem 15.3.1 (Multiple-access channel capacity)
The capacity of a multiple-access channel $(\mathcal{X}_1 \times \mathcal{X}_2,\ p(y \mid x_1, x_2),\ \mathcal{Y})$ is the closure of the convex hull of all $(R_1, R_2)$ satisfying
(31) $R_1 < I(X_1; Y \mid X_2)$
(32) $R_2 < I(X_2; Y \mid X_1)$
(33) $R_1 + R_2 < I(X_1, X_2; Y)$
for some product distribution $p_1(x_1)\,p_2(x_2)$ on $\mathcal{X}_1 \times \mathcal{X}_2$.
The definition of the convex hull from B. Grünbaum, Convex Polytopes, is: the convex hull conv(A) of a subset A of $\mathbb{R}^d$ is the intersection of all the convex sets in $\mathbb{R}^d$ which contain A.
Before we prove that this is the capacity region of the multiple-access channel, let us consider a few examples of multiple-access channels:
Example 15.3.1 (Independent binary symmetric channel)
Assume that we have two independent binary symmetric channels, one from sender 1 and the other from sender 2, as shown in 12↓. In this case, it is obvious from the results of Chapter 7 that we can send at a rate $1 - H(p_1)$ over the first channel and at a rate $1 - H(p_2)$ over the second channel. Since the channels are independent, there is no interference between the senders. The capacity region in this case is shown in 13↓.
figure Figure 15.9 Independent binary symmetric channels.png
Figure 12 Independent binary symmetric channels
figure Figure 15.10 Capacity refion of independent BSC.png
Figure 13 Capacity regions for independent BSC
Example 15.3.2 (Binary multiplier channel)
Consider a multiple-access channel with binary inputs and outputs
(34) $Y = X_1 \cdot X_2$
Such a channel is called a binary multiplier channel. It is easy to see that by setting $X_2 = 1$, we can send at a rate of 1 bit per transmission from sender 1 to the receiver. Similarly, setting $X_1 = 1$, we can achieve $R_2 = 1$. Clearly, since the output is binary, the combined rate $R_1 + R_2$ cannot be more than 1 bit. By time-sharing, we can achieve any combination of rates such that $R_1 + R_2 = 1$. Hence the capacity region is shown in 14↓.
figure Figure 15.11 Capacity region for binary multiplier channel.png
Figure 14 Capacity region for binary multiplier channel
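For a fixed product input distribution, the pentagon bounds of Theorem 15.3.1 can be evaluated directly for this channel: since Y is a deterministic function of $(X_1, X_2)$, the conditional mutual informations reduce to conditional output entropies. A small sketch with hypothetical input probabilities (not from the text):

import math

def H(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mac_pentagon(p1, p2):
    """I(X1;Y|X2), I(X2;Y|X1), I(X1,X2;Y) for the binary multiplier channel
    Y = X1*X2 with independent inputs P(X1=1)=p1, P(X2=1)=p2."""
    # I(X1;Y|X2) = H(Y|X2): X2=1 -> Y=X1; X2=0 -> Y=0 carries no information
    I1 = p2 * H([p1, 1 - p1])
    I2 = p1 * H([p2, 1 - p2])
    py1 = p1 * p2                      # P(Y=1)
    I12 = H([py1, 1 - py1])            # I(X1,X2;Y) = H(Y) for a deterministic channel
    return I1, I2, I12

print(mac_pentagon(0.5, 0.5))   # about (0.5, 0.5, 0.811)
print(mac_pentagon(1.0, 0.5))   # setting X1 = 1 gives (0, 1, 1): R2 can reach 1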
Example 15.3.3 (Binary erasure multiple-access channel)
This multiple-access channel has binary inputs, $\mathcal{X}_1 = \mathcal{X}_2 = \{0, 1\}$, and ternary output $Y = X_1 + X_2$. There is no ambiguity in $(X_1, X_2)$ if $Y = 0$ or $Y = 2$ is received; but $Y = 1$ can result from either $(0, 1)$ or $(1, 0)$.
We now examine the achievable rates on the axes. Setting $X_2 = 0$, we can send at a rate of 1 bit per transmission from sender 1. Similarly, setting $X_1 = 0$, we can send at a rate $R_2 = 1$. This gives us two extreme points of the capacity region. Can we do better? Let us assume that $R_1 = 1$, so that the codewords of $X_1$ must include all possible binary sequences; $X_1$ would look like a Bernoulli(1/2) process. This acts as noise for the transmission from $X_2$. (I picture this as $X_1$ being closer to Y and $X_2$ farther away: $X_1$ is then decoded at Y without difficulty, but at the same time it acts as noise for the more distant $X_2$.)
figure Figure 15.12.png
Figure 15 Equivalent single-user channel for user 2 of a binary erasure multiple-access channel
For $X_2$, the channel looks like the channel in 15↑. This is the binary erasure channel of Chapter 7. Recall from those results that the capacity of this channel is 1/2 bit per transmission. Hence, when sender 1 is sending at the maximum rate 1, we can send an additional 1/2 bit from sender 2. Later, after deriving the capacity region, we can verify that these rates are the best that can be achieved. The capacity region for the binary erasure multiple-access channel is illustrated in 16↓.
figure Figure 15.13 Capacity region of binary erasure multiple access channel.png
Figure 16 Capacity region for binary erasure multiple access channel
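The corner point (1, 1/2) discussed above can be checked numerically from the mutual informations of the erasure MAC with uniform inputs. The sketch below (not from the text) uses the chain rule $I(X_1, X_2; Y) = I(X_1; Y \mid X_2) + I(X_2; Y)$:

import math

def H(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def erasure_mac_rates(p1=0.5, p2=0.5):
    """Mutual informations for the binary erasure MAC Y = X1 + X2 with
    independent Bernoulli(p1), Bernoulli(p2) inputs. The channel is
    deterministic, so I(X1,X2;Y) = H(Y) and I(Xi;Y|Xj) = H(Y|Xj) = H(Xi)."""
    pY = [(1 - p1) * (1 - p2), p1 * (1 - p2) + (1 - p1) * p2, p1 * p2]  # P(Y = 0, 1, 2)
    I1_given_2 = H([p1, 1 - p1])     # given X2, Y reveals X1 exactly
    I2_given_1 = H([p2, 1 - p2])
    I12 = H(pY)
    I2 = I12 - I1_given_2            # chain rule: I(X2;Y) = I(X1,X2;Y) - I(X1;Y|X2)
    return I1_given_2, I2_given_1, I12, I2

I1c, I2c, I12, I2 = erasure_mac_rates()
print(I1c, I12)   # 1.0 and 1.5: single-user and sum-rate bounds
print(I2)         # 0.5: rate of sender 2 when sender 1 sends at rate 1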

1.3.1 Achievability of the Capacity Region for the Multiple-Access Channel

We now prove the achievability of the rate region in Theorem 15.3.1; the proof of the converse will be left until the next section. The proof of achievability is very similar to the proof for the single-user channel. We therefore only emphasize the points at which the proof differs from the single-user case. We begin by proving the achievability of rate pairs that satisfy 33↑ for some fixed product distribution $p(x_1)p(x_2)$. In Section 15.3.3 we extend this to prove that all points in the convex hull of 33↑ are achievable.
Proof: (Achievability in Theorem 15.3.1)
Fix $p(x_1, x_2) = p_1(x_1)\,p_2(x_2)$.
Codebook generation:
Generate $2^{nR_1}$ independent codewords $\mathbf{X}_1(i)$, $i \in \{1, 2, \ldots, 2^{nR_1}\}$, of length n, generating each element i.i.d. according to $\prod_{i=1}^{n} p_1(x_{1i})$. Similarly, generate $2^{nR_2}$ independent codewords $\mathbf{X}_2(j)$, $j \in \{1, 2, \ldots, 2^{nR_2}\}$, generating each element i.i.d. according to $\prod_{i=1}^{n} p_2(x_{2i})$. These codewords form the codebook, which is revealed to the senders and the receiver.
Encoding:
To send index i, sender 1 sends the codeword X1(i). Similarly, to send j sender 2 sends X2(j).
Decoding:
Let $A_\epsilon^{(n)}$ denote the set of typical $(\mathbf{x}_1, \mathbf{x}_2, \mathbf{y})$ sequences. The receiver $Y^n$ chooses the pair $(i, j)$ such that
(35) $(\mathbf{x}_1(i), \mathbf{x}_2(j), \mathbf{y}) \in A_\epsilon^{(n)}$
if such a pair (i, j) exists and is unique; otherwise, an error is declared.
Analysis of the probability of error:
By the symmetry of the random code construction, the conditional probability of error does not depend on which pair of indices is sent. Thus, the conditional probability of error is the same as the unconditional probability of error. (Recall how the conditional probability of error and the probability of error are defined in Chapter 7: if all the conditional probabilities are equal, then the overall probability of error equals the conditional probability of error.) So, without loss of generality, we can assume that $(i, j) = (1, 1)$ was sent.
We have an error if either the correct codewords are not typical with the received sequence or there is a pair of incorrect codewords that are typical with the received sequence. Define the events
(36) $E_{ij} = \{(\mathbf{X}_1(i), \mathbf{X}_2(j), \mathbf{Y}) \in A_\epsilon^{(n)}\}$
Then by the union of events bound,
(37) $P_e^{(n)} = P\left(E_{11}^c \cup \bigcup_{(i,j) \ne (1,1)} E_{ij}\right) \le P(E_{11}^c) + \sum_{i \ne 1,\, j = 1} P(E_{i1}) + \sum_{i = 1,\, j \ne 1} P(E_{1j}) + \sum_{i \ne 1,\, j \ne 1} P(E_{ij})$
where P is the conditional probability given that (1, 1) was sent. From the AEP, P(Ec11) → 0. By Theorems 15.2.1 and 15.2.3, for i ≠ 1 we have
$P(E_{i1}) = \Pr\{(\mathbf{X}_1(i), \mathbf{X}_2(1), \mathbf{Y}) \in A_\epsilon^{(n)}\} = \sum_{(\mathbf{x}_1, \mathbf{x}_2, \mathbf{y}) \in A_\epsilon^{(n)}} p(\mathbf{x}_1)\, p(\mathbf{x}_2, \mathbf{y}) \le |A_\epsilon^{(n)}|\; 2^{-n(H(X_1)-\epsilon)}\; 2^{-n(H(X_2,Y)-\epsilon)}$
$\le 2^{n(H(X_1,X_2,Y)+2\epsilon)}\; 2^{-n(H(X_1)-\epsilon)}\; 2^{-n(H(X_2,Y)-\epsilon)} = 2^{-n(H(X_1) + H(X_2,Y) - H(X_1,X_2,Y) - 4\epsilon)} = 2^{-n(I(X_1; X_2, Y) - 4\epsilon)} \overset{(a)}{=} 2^{-n(I(X_1; Y \mid X_2) - 4\epsilon)}$
(see the identities below).
From Theorem 15.2.3 it follows that
$P(\mathbf{S}_1' = \mathbf{s}_1, \mathbf{S}_2' = \mathbf{s}_2, \mathbf{S}_3' = \mathbf{s}_3) = \prod_{i=1}^{n} p(s_{1i} \mid s_{3i})\, p(s_{2i} \mid s_{3i})\, p(s_{3i})$
The true joint distribution of the transmitted triple is
$\sum_{(\mathbf{x}_1, \mathbf{x}_2, \mathbf{y}) \in A_\epsilon^{(n)}} p(\mathbf{x}_1, \mathbf{x}_2, \mathbf{y}) = \sum p(\mathbf{x}_1, \mathbf{x}_2)\, p(\mathbf{y} \mid \mathbf{x}_1, \mathbf{x}_2) = \sum p(\mathbf{x}_1)\, p(\mathbf{x}_2)\, p(\mathbf{y} \mid \mathbf{x}_1, \mathbf{x}_2) = \sum p(\mathbf{x}_1)\, p(\mathbf{x}_2, \mathbf{y} \mid \mathbf{x}_1),$
while, taking Theorem 15.2.3 into account and identifying $X_2$ with $S_3$, the distribution of $(\mathbf{X}_1(i), \mathbf{X}_2(1), \mathbf{Y})$ for $i \ne 1$ is
$\sum p(\mathbf{x}_2)\, p(\mathbf{y}, \mathbf{x}_1 \mid \mathbf{x}_2) = \sum p(\mathbf{x}_2)\, p(\mathbf{x}_1 \mid \mathbf{x}_2)\, p(\mathbf{y} \mid \mathbf{x}_2) = \sum p(\mathbf{x}_1)\, p(\mathbf{x}_2)\, p(\mathbf{y} \mid \mathbf{x}_2) = \sum p(\mathbf{x}_1)\, p(\mathbf{x}_2, \mathbf{y}).$
Proved. This can also be shown without Theorem 15.2.3:
$p(\mathbf{y}, \mathbf{x}_1, \mathbf{x}_2) = p(\mathbf{x}_2)\, p(\mathbf{y}, \mathbf{x}_1 \mid \mathbf{x}_2) = p(\mathbf{x}_2)\, p(\mathbf{y} \mid \mathbf{x}_2)\, p(\mathbf{x}_1 \mid \mathbf{x}_2, \mathbf{y}) = p(\mathbf{x}_2)\, p(\mathbf{y} \mid \mathbf{x}_2)\, p(\mathbf{x}_1),$
assuming that $\mathbf{x}_1$ does not depend on $(\mathbf{x}_2, \mathbf{y})$ (which holds here because the incorrect codeword $\mathbf{X}_1(i)$, $i \ne 1$, was generated independently of the transmitted codeword and hence of $\mathbf{Y}$). We also used $|A_\epsilon^{(n)}(S)| \doteq 2^{n(H(S) \pm 2\epsilon)}$ and
$I(X_1; X_2, Y) = H(X_2, Y) - H(X_2, Y \mid X_1) = H(X_1) + H(X_2, Y) - H(X_1, X_2, Y),$
where the equality in (a) follows from the independence of $X_1$ and $X_2$, and consequently
$I(X_1; X_2, Y) = I(X_1; X_2) + I(X_1; Y \mid X_2) = I(X_1; Y \mid X_2)$ since $I(X_1; X_2) = 0$.
Similarly, for $j \ne 1$ (in this case, according to Theorem 15.2.3, take $X_1 = S_3$),
(38) $P(E_{1j}) \le 2^{-n(I(X_2; Y \mid X_1) - 3\epsilon)}$
From Theorem 15.2.3 it follows that
$P(\mathbf{S}_1' = \mathbf{s}_1, \mathbf{S}_2' = \mathbf{s}_2, \mathbf{S}_3' = \mathbf{s}_3) = \prod_{i=1}^{n} p(s_{1i} \mid s_{3i})\, p(s_{2i} \mid s_{3i})\, p(s_{3i}),$
and identifying $X_1$ with $S_3$,
$\sum_{(\mathbf{x}_1, \mathbf{x}_2, \mathbf{y}) \in A_\epsilon^{(n)}} p(\mathbf{x}_1)\, p(\mathbf{x}_2, \mathbf{y} \mid \mathbf{x}_1) = \sum p(\mathbf{x}_1)\, p(\mathbf{x}_2 \mid \mathbf{x}_1)\, p(\mathbf{y} \mid \mathbf{x}_1) = \sum p(\mathbf{x}_1)\, p(\mathbf{x}_2)\, p(\mathbf{y} \mid \mathbf{x}_1) = \sum p(\mathbf{x}_2)\, p(\mathbf{x}_1, \mathbf{y}).$
Using $|A_\epsilon^{(n)}(S)| \doteq 2^{n(H(S) \pm 2\epsilon)}$ and
(a) $I(X_2; X_1, Y) = H(X_1, Y) - H(X_1, Y \mid X_2) = H(X_2) + H(X_1, Y) - H(X_1, X_2, Y)$,
$P(E_{1j}) = P\{(\mathbf{X}_1(1), \mathbf{X}_2(j), \mathbf{Y}) \in A_\epsilon^{(n)}\} = \sum_{(\mathbf{x}_1, \mathbf{x}_2, \mathbf{y}) \in A_\epsilon^{(n)}} p(\mathbf{x}_2)\, p(\mathbf{x}_1, \mathbf{y}) \le |A_\epsilon^{(n)}|\; 2^{-n(H(X_2)-\epsilon)}\; 2^{-n(H(X_1,Y)-\epsilon)}$
$\le 2^{n(H(X_1,X_2,Y)+2\epsilon)}\; 2^{-n(H(X_2)-\epsilon)}\; 2^{-n(H(X_1,Y)-\epsilon)} = 2^{-n(H(X_2) + H(X_1,Y) - H(X_1,X_2,Y) - 4\epsilon)} \overset{(a)}{=} 2^{-n(I(X_2; X_1, Y) - 4\epsilon)} \overset{(b)}{=} 2^{-n(I(X_2; Y \mid X_1) - 4\epsilon)}$
(b) $I(X_2; X_1, Y) = I(X_1; X_2) + I(X_2; Y \mid X_1) = I(X_2; Y \mid X_1)$ since $I(X_1; X_2) = 0$.
(The text states the bound with $3\epsilon$; the derivation here, which uses $|A_\epsilon^{(n)}| \le 2^{n(H + 2\epsilon)}$, gives $4\epsilon$. The difference is immaterial, since $\epsilon$ is arbitrary.)
and for $i \ne 1$, $j \ne 1$,
(39) $P(E_{ij}) \le 2^{-n(I(X_1, X_2; Y) - 4\epsilon)}$
For $i \ne 1$ and $j \ne 1$, the codewords $\mathbf{X}_1(i)$ and $\mathbf{X}_2(j)$ were generated independently of each other and of the received sequence $\mathbf{Y}$, so the triple is distributed according to $p(\mathbf{x}_1)\, p(\mathbf{x}_2)\, p(\mathbf{y})$ and
$P(E_{ij}) = \sum_{(\mathbf{x}_1, \mathbf{x}_2, \mathbf{y}) \in A_\epsilon^{(n)}} p(\mathbf{x}_1)\, p(\mathbf{x}_2)\, p(\mathbf{y}) \le |A_\epsilon^{(n)}|\; 2^{-n(H(X_1)-\epsilon)}\; 2^{-n(H(X_2)-\epsilon)}\; 2^{-n(H(Y)-\epsilon)}$
$\le 2^{n(H(X_1,X_2,Y)+\epsilon)}\; 2^{-n(H(X_1) + H(X_2) + H(Y) - 3\epsilon)} = 2^{-n(H(X_1) + H(X_2) + H(Y) - H(X_1,X_2,Y) - 4\epsilon)},$
using the bound $|A_\epsilon^{(n)}| \le 2^{n(H(X_1,X_2,Y)+\epsilon)}$ from 24↑. Since $X_1$ and $X_2$ are independent, $H(X_1) + H(X_2) = H(X_1, X_2)$, and therefore
$H(X_1) + H(X_2) + H(Y) - H(X_1, X_2, Y) = H(X_1, X_2) + H(Y) - H(X_1, X_2, Y) = I(X_1, X_2; Y),$
which gives (39). Useful identities here are the chain rule $H(X_1, X_2, Y) = H(X_1) + H(X_2 \mid X_1) + H(Y \mid X_1, X_2)$ and
$I(X_1, X_2; Y) = H(X_1, X_2) - H(X_1, X_2 \mid Y) = H(Y) - H(Y \mid X_1, X_2) = H(X_1, X_2) + H(Y) - H(X_1, X_2, Y).$
It follows that
(40) $P_e^{(n)} \le P(E_{11}^c) + 2^{nR_1}\, 2^{-n(I(X_1;Y|X_2) - 3\epsilon)} + 2^{nR_2}\, 2^{-n(I(X_2;Y|X_1) - 3\epsilon)} + 2^{n(R_1+R_2)}\, 2^{-n(I(X_1,X_2;Y) - 4\epsilon)}$
The second term vanishes when $I(X_1;Y|X_2) - 3\epsilon - R_1 > 0$, i.e., $R_1 < I(X_1;Y|X_2) - 3\epsilon$, so $R_1 < I(X_1;Y|X_2)$ suffices for $\epsilon$ small enough; the other terms are handled similarly.
Since $\epsilon > 0$ is arbitrary, the conditions of the theorem imply that each term tends to 0 as $n \to \infty$. Thus the probability of error, conditioned on a particular codeword being sent, goes to zero if the conditions of the theorem are met. The above bound shows that the average probability of error, which by symmetry is equal to the probability for an individual codeword, averaged over all choices of codebooks in the random code construction, is arbitrarily small. Hence, there exists at least one code $\mathcal{C}^*$ with arbitrarily small probability of error.
This completes the proof of achievability of region in 33↑ for a fixed input distribution. Later, in Section 15.3.3 we show that time-sharing allows any (R1R2) in the convex hull to be achieved, completing the proof of the forward part of the theorem.

1.3.2 Comments on the Capacity Region for the Multiple-Access Channel

We have now proved the achievability of the capacity region of the multiple-access channel, which is the closure of the convex hull of the set of points (R1, R2) satisfying
(41) R1 < I(X1;Y|X2)
(42) R2 < I(X2;Y|X1)
(43) R1 + R2 < I(X1, X2;Y)
for some distribution $p(x_1)p(x_2)$ on $\mathcal{X}_1 \times \mathcal{X}_2$. For a particular $p(x_1)p(x_2)$, the region is illustrated in 17↓.
figure Figure 15.14 Achievable region of multiple-access channel for a fixed input distribution.png
Figure 17 Achievable region of multiple access channel for a fixed input distribution
$I(X_2; Y \mid X_1) = H(X_2 \mid X_1) - H(X_2 \mid X_1, Y) = H(X_2) - H(X_2 \mid X_1, Y) \ge H(X_2) - H(X_2 \mid Y) = I(X_2; Y);$
$I(X_2; Y \mid X_1) \ge I(X_2; Y)$
$\underbrace{I(X_2; Y \mid X_1)}_{(a)} + \underbrace{I(X_1; Y)}_{(b)} = I(X_1, X_2; Y)$
(a) is the maximum rate that transmitter 2 can achieve;
(b) is the maximum rate that transmitter 1 can achieve while transmitter 2 transmits at its maximum rate.
It follows that the total rate is $R = R_1 + R_2 < I(X_1, X_2; Y)$.
figure Notebook17k_p1.png
Figure 18 Notebook17k.p1
Let us now interpret the corner points in the region.
Point A corresponds to the maximum rate achievable from sender 1 to the receiver when sender 2 is not sending any information. This is
(44) $\max R_1 = \max_{p_1(x_1)p_2(x_2)} I(X_1; Y \mid X_2)$.
Now for any distribution p1(x1)p2(x2), 
(45) $I(X_1; Y \mid X_2) = \sum_{x_2} p_2(x_2)\, I(X_1; Y \mid X_2 = x_2) \le \max_{x_2} I(X_1; Y \mid X_2 = x_2)$
(The inequality is natural: the sum is an average, and an average is always at most the maximum.)
since the average is less than the maximum. Therefore, the maximum in 44↑ is attained when we set X2 = x2, where x2 is the value that maximizes conditional mutual information between X1 and Y. The distribution of X1 is chosen to maximize this mutual information. Thus, X2 must facilitate the transmission of X1 by setting X2 = x2.
The point B corresponds to the maximum rate at which sender 2 can send as long as sender 1 sends at his maximum rate. This is the rate that is obtained if $X_1$ is considered as noise for the channel from $X_2$ to Y (see the example in 15.3.3, where this is explained very clearly). In this case, using the results from single-user channels, $X_2$ can send at a rate $I(X_2; Y)$. The receiver now knows which $X_2$ codeword was used and can „subtract” its effect from the channel. We can consider the channel now to be an indexed set of single-user channels, where the index is the $X_2$ symbol used. The $X_1$ rate achieved in this case is the average mutual information, where the average is over these channels, and each channel occurs as many times as the corresponding $X_2$ symbol appears in the codewords (a very effective explanation). Hence, the rate achieved is
(46) $\sum_{x_2} p(x_2)\, I(X_1; Y \mid X_2 = x_2) = I(X_1; Y \mid X_2)$
Points C and D correspond to B and A respectively, with the role of the senders reversed. The non-corner points can be achieved by time-sharing. Thus, we have given a single-user interpretation and justification for the capacity region of a multiple-access channel.
The idea of considering other signals as part of the noise, decoding one signal, and then „subtracting” it from the received signal is a very useful one. We will come across the same concept again in the capacity calculations for the degraded broadcast channel.

1.3.3 Convexity of the Capacity Region of the Multiple-Access Channel

We now recast the capacity region of the multiple-access channel in order to take into account the operation of taking the convex hull by introducing a new random variable. We begin by proving that the capacity region is convex.
Theorem 15.3.2
The capacity region $\mathcal{C}$ of a multiple-access channel is convex [i.e., if $(R_1, R_2) \in \mathcal{C}$ and $(R_1', R_2') \in \mathcal{C}$, then $(\lambda R_1 + (1-\lambda)R_1',\ \lambda R_2 + (1-\lambda)R_2') \in \mathcal{C}$ for $0 \le \lambda \le 1$].
(After going through the proof of Theorem 15.3.4, I would paraphrase this as: a convex combination of achievable rates is achievable!)
Proof:
The idea is time-sharing. Given two sequences of codes at different rates $\mathbf{R} = (R_1, R_2)$ and $\mathbf{R}' = (R_1', R_2')$, we can construct a third codebook at rate $\lambda \mathbf{R} + (1-\lambda)\mathbf{R}'$ by using the first codebook for the first $\lambda n$ symbols and using the second codebook for the last $(1-\lambda)n$ symbols. The number of $X_1$ codewords in the new code is
(47) $2^{n\lambda R_1}\, 2^{n(1-\lambda)R_1'} = 2^{n(\lambda R_1 + (1-\lambda)R_1')}$
and hence the rate of the new code is λR + (1 − λ)R’ . Since the overall probability of error is less than the sum of the probabilities of error for each of the segments, the probability of error of the new code goes to 0 and the rate is achievable.
We can now recast the statement of the capacity region for the multiple access channel using a time-sharing random variable Q. Before we prove this result, we need to prove a property of convex sets defined by linear inequalities like those of the capacity region of the multiple-access channel.
In particular, we would like to show that the convex hull of two such regions defined by linear constraints is the region defined by the convex combination of the constraints. Initially, the equality of these two sets seems obvious, but on closer examination there is a subtle difficulty, due to the fact that some of the constraints might not be active. This is best illustrated by an example. Consider the following two sets defined by linear inequalities:
(48) C1 = {(x, y):x ≥ 0, y ≥ 0, x ≤ 10, y ≤ 10, x + y ≤ 100}
(49) C2 = {(x, y):x ≥ 0, y ≥ 0, x ≤ 20, y ≤ 20, x + y ≤ 20}
In this case, the $(\frac{1}{2}, \frac{1}{2})$ convex combination of the constraints defines the region
(50) C = {(x, y):x ≥ 0, y ≥ 0, x ≤ 15, y ≤ 15, x + y ≤ 60}
It is not difficult to see that any point in $C_1$ or $C_2$ has $x + y \le 20$, so any point in the convex hull of the union of $C_1$ and $C_2$ satisfies this property. Thus, the point (15, 15), which is in C, is not in the convex hull of $C_1 \cup C_2$. This example also hints at the cause of the problem: in the definition of $C_1$, the constraint $x + y \le 100$ is not active. If this constraint were replaced by a constraint $x + y \le a$, where $a \le 20$, the above result on the equality of the two regions would be true, as we now prove.
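A quick numerical check of this counterexample (a sketch using a coarse grid rather than an exact convex-hull computation; not part of the text):

def in_C1(x, y):
    return 0 <= x <= 10 and 0 <= y <= 10 and x + y <= 100

def in_C2(x, y):
    return 0 <= x <= 20 and 0 <= y <= 20 and x + y <= 20

def in_C(x, y):
    # region defined by the (1/2, 1/2) mix of the constraints of C1 and C2
    return 0 <= x <= 15 and 0 <= y <= 15 and x + y <= 60

# (15, 15) lies in C ...
print(in_C(15, 15))                                                   # True
# ... but every point of C1 and of C2 has x + y <= 20, so every convex
# combination of such points also has x + y <= 20, while 15 + 15 = 30 > 20.
grid = [(x / 2, y / 2) for x in range(41) for y in range(41)]
print(max(x + y for x, y in grid if in_C1(x, y) or in_C2(x, y)))      # 20.0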
We restrict ourselves to the pentagonal regions that occur as components of the capacity region of two-user multiple-access channel. In this case, the capacity region for a fixed p(x1)p(x2) is defined by three mutual informations, I(X1;Y|X2),  I(X2;Y|X1) and I(X1, X2;Y), which we shall call I1, I2 and I3, respectively. For each p(x1)p(x2) , there is a corresponding vector, I = (I1, I2, I3), and a rate region defined by
(51) $C_{\mathbf{I}} = \{(R_1, R_2) : R_1 \ge 0,\ R_2 \ge 0,\ R_1 \le I_1,\ R_2 \le I_2,\ R_1 + R_2 \le I_3\}$
Also, since for any distribution p(x1)p(x2), we have
$I(X_2; Y \mid X_1) = H(X_2 \mid X_1) - H(X_2 \mid Y, X_1) = H(X_2) - H(X_2 \mid Y, X_1) = I(X_2; Y, X_1) = I(X_2; Y) + I(X_2; X_1 \mid Y) \ge I(X_2; Y)$
(I proved the same thing in a similar way in 1.3.2↑; it is essential to keep in mind that $X_1$ and $X_2$ are independent.)
and therefore
$I(X_1; Y \mid X_2) + I(X_2; Y \mid X_1) \ge I(X_1; Y \mid X_2) + I(X_2; Y) = I(X_1, X_2; Y) \;\Rightarrow\; I(X_1; Y \mid X_2) + I(X_2; Y \mid X_1) \ge I(X_1, X_2; Y),$
we have for all vectors $\mathbf{I}$ that $I_1 + I_2 \ge I_3$. This property will turn out to be critical for the theorem.
Lemma 15.3.1
Let $\mathbf{I}_1, \mathbf{I}_2 \in \mathbb{R}^3$ be two vectors of mutual informations that define rate regions $C_{\mathbf{I}_1}$ and $C_{\mathbf{I}_2}$, respectively, as given in 51↑. For $0 \le \lambda \le 1$, define $\mathbf{I}_\lambda = \lambda \mathbf{I}_1 + (1-\lambda)\mathbf{I}_2$ and let $C_{\mathbf{I}_\lambda}$ be the rate region defined by $\mathbf{I}_\lambda$. Then
(52) $C_{\mathbf{I}_\lambda} = \lambda C_{\mathbf{I}_1} + (1-\lambda) C_{\mathbf{I}_2}$.
Proof:
We shall prove this theorem in two parts. We first show that any point in the $(\lambda, 1-\lambda)$ mix of the sets $C_{\mathbf{I}_1}$ and $C_{\mathbf{I}_2}$ satisfies the inequalities for $\mathbf{I}_\lambda$: any point in $C_{\mathbf{I}_1}$ satisfies the inequalities for $\mathbf{I}_1$ and any point in $C_{\mathbf{I}_2}$ satisfies the inequalities for $\mathbf{I}_2$, so the $(\lambda, 1-\lambda)$ mix of these points will satisfy the $(\lambda, 1-\lambda)$ mix of the constraints. Thus, it follows that
(53) $\lambda C_{\mathbf{I}_1} + (1-\lambda) C_{\mathbf{I}_2} \subseteq C_{\mathbf{I}_\lambda}$
To prove the reverse inclusion, we consider the extreme points of the pentagonal regions. It is not difficult to see that the rate regions defined in 51↑ are always in the form of a pentagon or, in the extreme case when $I_3 = I_1 + I_2$, in the form of a rectangle (this is nicely illustrated in the Maple worksheet). Thus, the capacity region $C_{\mathbf{I}}$ can also be defined as the convex hull of five points:
(54) $(0, 0),\ (I_1, 0),\ (I_1, I_3 - I_1),\ (I_3 - I_2, I_2),\ (0, I_2)$
(I illustrated this very well in the corresponding Maple worksheet.)
Consider the region defined by $\mathbf{I}_\lambda$; it, too, is defined by five points. Take any one of the points, say $(I_3^{(\lambda)} - I_2^{(\lambda)},\ I_2^{(\lambda)})$. This point can be written as the $(\lambda, 1-\lambda)$ mix of the points $(I_3^{(1)} - I_2^{(1)},\ I_2^{(1)})$ and $(I_3^{(2)} - I_2^{(2)},\ I_2^{(2)})$, and therefore lies in the convex mixture of $C_{\mathbf{I}_1}$ and $C_{\mathbf{I}_2}$. Thus, all extreme points of the pentagon $C_{\mathbf{I}_\lambda}$ lie in the convex hull of $C_{\mathbf{I}_1}$ and $C_{\mathbf{I}_2}$, or
(55) $C_{\mathbf{I}_\lambda} \subseteq \lambda C_{\mathbf{I}_1} + (1-\lambda) C_{\mathbf{I}_2}$
Combining the two parts we have the theorem.
In the proof of the lemma, we have implicitly used the fact that all the rate regions are defined by five extreme points (at worst, some of the points coincide); the condition $I_3 \le I_1 + I_2$ ensured that all five points defined by the $\mathbf{I}$ vector were within the rate region. If the condition $I_3 \le I_1 + I_2$ is not satisfied, some of the points in 54↑ may be outside the rate region and the proof collapses.
As an immediate consequence of the above lemma, we have the following theorem:
Theorem 15.3.3
The convex hull of the union of the rate regions defined by individual I vectors is equal to the rate region defined by the convex hull of the I vectors.
These arguments on the equivalence of the convex hull operation on the rate regions with the convex combinations of the mutual informations can be extended to the general m-user multiple-access channel. A proof along these lines using the theory of polymatroids is developed in [4].
Theorem 15.3.4
The set of achievable rates of a discrete memoryless multiple-access channel is given by the closure of the set of all $(R_1, R_2)$ pairs satisfying
R1 < I(X1;Y|X2, Q)
R2 < I(X2;Y|X1, Q)
(56) R1 + R2 < I(X1, X2;Y|Q)
This is the form to which they are reduced in the capacity theorem:
$R_1 < I(X_1; Y \mid X_2, Q) \le \max_{p_1(x_1)p_2(x_2)} I(X_1; Y \mid X_2)$
$R_2 < I(X_2; Y \mid X_1, Q) \le \max_{p_1(x_1)p_2(x_2)} I(X_2; Y \mid X_1)$
$R_1 + R_2 < I(X_1, X_2; Y \mid Q) \le \max_{p_1(x_1)p_2(x_2)} I(X_1, X_2; Y)$
The interpretation is simple: Q is a time-sharing variable. In general, any such conditional entropy or mutual information can be represented as a weighted sum of the individual conditional entropies or mutual informations. With the probability $p(Q) = \frac{1}{n}$ one defines the probability of transmitting in „one time slot” out of the total of n. This resembles statistical rather than static multiplexing: the transmissions are not cycled through in a fixed order; instead, the probability decides. In an extreme case, all n time slots may be taken by a single transmitter, because the randomness so decided.
for some choice of the joint distribution $p(q)\,p(x_1|q)\,p(x_2|q)\,p(y|x_1, x_2)$ with $|\mathcal{Q}| \le 4$.
Proof.
We will show that every rate pair lying in the region defined in 56↑ is achievable (i.e., it lies in the convex closure of the rate pairs satisfying Theorem 15.3.1). We also show that every point in the convex closure of the region in Theorem 15.3.1 is also in the region defined in 56↑.
Consider a rate point R satisfying the inequalities 56↑ of the theorem. We can rewrite the right-hand side of the first inequality as
(57) I(X1;Y|X2, Q) = mq = 1p(q)I(X1;Y|X2, Q = q)
(58)  = mq = 1p(q)I(X1;Y|X2)p1q, p2q
where m is the cardinality of the support set of Q. We can expand the other mutual informations similarly.
For simplicity in notation we consider a rate pair as a vector and denote a pair satisfying the inequalities in 56↑ for a specific input product distribution p1q(x1)p2q(x2) as Rp1p2 as Rq. Specifically, let Rq = (R1q, R2q) be a rate pair satisfying
(59) R1q < I(X1;Y|X2)p1q(x1)p2q(x2)
(60) R2q < I(X2;Y|X1)p1q(x1)p2q(x2)
(61) R1q + R2q < I(X1, X2;Y)p1q(x1)p2q(x2)
Then, by Theorem 15.3.1, Rq = (R1q, R2q) is achievable. Since R satisfies 56↑ and we can expand the right-hand sides as in 58↑, there exists a set of pairs Rq satisfying 59↑-61↑ such that
(62) R = ∑_{q=1}^{m} p(q) Rq
I understand this as an extension of 52↑, CIλ = λCI1 + (1 − λ)CI2.
Since a convex combination of achievable rates is achievable, so is R. Hence, we have proven the achievability of the region in the theorem.
The same argument can be used to show that every point in the convex closure of the region in 33↑ can be written as the mixture of points satisfying 61↑ and hence can be written in the form 56↑.
The converse is proved in the next section. The converse shows that all achievable rate pairs are of the form 56↑ and hence establishes that this is the capacity region of the multiple-access channel. The cardinality bound on the time-sharing random variable Q is a consequence of Caratheodory’s theorem on convex sets. See discussion below.
The proof of the convexity of the capacity region shows that any convex combination of achievable rate pairs is also achievable. We can continue this process, taking convex combinations of more points. Expression 62↑ immediately reminded me of this. Do we need to use an arbitrary number of points? Will the capacity region be increased? The following theorem says no.
Theorem 15.3.5 (Caratheodory) (MMV)
Any point in the convex closure of a compact set A in d-dimensional Euclidean space can be represented as a convex combination of d + 1 or fewer points of the original set A.
The formulation of this theorem in the book Convex Polytopes is:
If A is a subset of Rd, then every x ∈ conv(A) (conv(A) means that x is a point of the convex hull of A) is expressible in the form:
x = ∑_{i=0}^{d} αi xi, where xi ∈ A, αi ≥ 0 and ∑_{i=0}^{d} αi = 1
Proof:
The proof may be found in Eggleston [5] and Grunbaum [6].
This theorem allows us to restrict attention to a certain finite convex combination when calculating the capacity region. This is an important property because without it, we would not be able to compute the capacity region in 56↑, since we would never know whether using a larger alphabet Q would increase the region.
In the multiple-access channel, the bounds define a connected compact set in three dimensions. Therefore, all points in its closure can be defined as the convex combination of at most four points. Hence, we can restrict the cardinality of Q to at most 4 in the above definition of the capacity region.
Remark
Many of the cardinality bounds can be slightly improved by introducing other considerations. For example, if we are interested only in the boundary of the convex hull of A, as we are in capacity theorems, a point on the boundary can be expressed as a mixture of d points of A, since a point on the boundary lies in the intersection of A with a (d − 1)-dimensional supporting hyperplane.
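As a numerical illustration of the reduction behind Caratheodory's theorem (entirely my own sketch, not part of the text), the following Python function repeatedly removes points from a convex combination in d dimensions until at most d + 1 remain, using an affine dependency among the active points.

```python
import numpy as np

def caratheodory_reduce(points, weights):
    """Rewrite x = sum_i weights[i] * points[i] (weights >= 0, summing to 1)
    as a convex combination of at most d + 1 of the points (d = dimension),
    by repeatedly cancelling an affine dependency among the active points."""
    pts = [np.asarray(p, dtype=float) for p in points]
    w = [float(v) for v in weights]
    d = len(pts[0])
    while len(pts) > d + 1:
        # Nontrivial c with sum_i c_i * p_i = 0 and sum_i c_i = 0.
        A = np.vstack([np.array(pts).T, np.ones(len(pts))])
        c = np.linalg.svd(A)[2][-1]           # null-space vector (exists since m > d + 1)
        if c.max() <= 1e-12:
            c = -c                             # make sure some entry is positive
        t = min(w[i] / c[i] for i in range(len(pts)) if c[i] > 1e-12)
        w = [w[i] - t * c[i] for i in range(len(pts))]
        j = int(np.argmin(w))                  # this weight is now (about) zero
        pts.pop(j); w.pop(j)
    return pts, w

rng = np.random.default_rng(1)
P = rng.random((10, 2)); lam = rng.random(10); lam /= lam.sum()
x = lam @ P
Q, mu = caratheodory_reduce(P, lam)
print(len(Q), np.allclose(sum(m * q for m, q in zip(mu, Q)), x))   # 3 True
```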

1.3.4 Converse for the Multiple-Access Channel

We have so far proved the achievability of the capacity region. In this section we prove the converse.
Proof: (Converse to Theorems 15.3.1 and 15.3.4)
We must show that given any sequence of ((2nR1, 2nR2), n) codes with P(n)e → 0, the rates must satisfy
R1 ≤ I(X1;Y|X2, Q)
R2 ≤ I(X2;Y|X1, Q)
(63) R1 + R2 ≤ I(X1, X2;Y|Q)
for some choice of random variable Q defined on {1, 2, 3, 4} and joint distribution p(q)p(x1|q)p(x2|q)p(y|x1, x2). Fix n. Consider the given code of block length n. The joint distribution on W1 x W2 x Xn1 x Xn2 x Yn is well defined. The only randomness is due to the random uniform choice of indices W1 and W2 and the randomness induced by the channel. The joint distribution is
(64) p(w1, w2, xn1, xn2, yn) = (1/2^{nR1})(1/2^{nR2}) p(xn1|w1) p(xn2|w2) ∏_{i=1}^{n} p(yi|x1i, x2i)
where p(xn1|w1) is either 1 or 0, depending on whether xn1 =  x1(w1), the codeword corresponding to w1, or not, and similarly, p(xn2|w2) = 1 or 0, according to whether xn2 =  x2(w2) or not. The mutual informations that follow are calculated with respect to this distribution.
By the code construction, it is possible to estimate (W1, W2) from the received sequence Yn with low probability of error. Hence, the conditional entropy of (W1W2) given Yn must be small. By Fano’s inequality,
H(W1, W2|Yn) ≤ n(R1 + R2)P(n)e + H(P(n)e) ≜ nϵn
It is clear that ϵn → 0 as P(n)e → 0. Then we have
(65) H(W1|Yn) ≤ H(W1, W2|Yn) ≤ nϵn
(66) H(W2|Yn) ≤ H(W1, W2|Yn) ≤ nϵn
We can now bound the rate R1 as
(67) nR1 = H(W1) = I(W1;Yn) + H(W1|Yn)
 ≤(a) I(W1;Yn) + nϵn
 ≤(b) I(Xn1(W1);Yn) + nϵn
 = H(Xn1(W1)) − H(Xn1(W1)|Yn) + nϵn
 ≤(c) H(Xn1(W1)|Xn2(W2)) − H(Xn1(W1)|Yn, Xn2(W2)) + nϵn
 = I(Xn1(W1);Yn|Xn2(W2)) + nϵn
 = H(Yn|Xn2(W2)) − H(Yn|Xn1(W1), Xn2(W2)) + nϵn
 =(d) H(Yn|Xn2(W2)) − ∑_{i=1}^{n} H(Yi|Y^{i−1}, Xn1(W1), Xn2(W2)) + nϵn
 =(e) H(Yn|Xn2(W2)) − ∑_{i=1}^{n} H(Yi|X1i, X2i) + nϵn
 ≤(f) ∑_{i=1}^{n} H(Yi|Xn2(W2)) − ∑_{i=1}^{n} H(Yi|X1i, X2i) + nϵn
(68) ≤(g) ∑_{i=1}^{n} H(Yi|X2i) − ∑_{i=1}^{n} H(Yi|X1i, X2i) + nϵn = ∑_{i=1}^{n} I(X1i;Yi|X2i) + nϵn
where
(a) follows from Fano's inequality (probably referring to the aspect of Fano's inequality that H(W1|Yn) is very small)
(b) follows from the data-processing inequality
(c) follows from the fact that since W1 and W2 are independent, so are Xn1(W1) and Xn2(W2), and hence H(Xn1(W1)) = H(Xn1(W1)|Xn2(W2)); also H(Xn1(W1)|Yn, Xn2(W2)) ≤ H(Xn1(W1)|Yn), since conditioning reduces entropy.
(d) follows from the chain rule
(e) follows from the fact that Yi depends only on X1i and X2i by the memoryless property of the channel
(f) follows from the chain rule and removing conditioning
(g) follows from removing conditioning
Hence, we have
(69) R1 ≤ (1/n) ∑_{i=1}^{n} I(X1i;Yi|X2i) + ϵn
Similarly, we have
(70) R2 ≤ (1/n) ∑_{i=1}^{n} I(X2i;Yi|X1i) + ϵn
To bound the sum of the rates, we have
(71) n(R1 + R2) = H(W1, W2) = I(W1, W2;Yn) + H(W1, W2|Yn) ≤(a) I(W1, W2;Yn) + nϵn
(72)  ≤(b) I(Xn1(W1), Xn2(W2);Yn) + nϵn = H(Yn) − H(Yn|Xn1(W1), Xn2(W2)) + nϵn
(73)  =(c) H(Yn) − ∑_{i=1}^{n} H(Yi|Y^{i−1}, Xn1(W1), Xn2(W2)) + nϵn =(d) H(Yn) − ∑_{i=1}^{n} H(Yi|X1i, X2i) + nϵn
(74)  ≤(e) ∑_{i=1}^{n} H(Yi) − ∑_{i=1}^{n} H(Yi|X1i, X2i) + nϵn = ∑_{i=1}^{n} I(X1i, X2i;Yi) + nϵn
where
(a) follows from Fano’s inequality
(b) follows from the data-processing inequality
(c) follows from the chain rule
(d) follows from the fact that Yi depends only on X1i and X2i and is conditionally independent of everything else
(e) follows from the chain rule and removing conditioning
Hence we have
(75) R1 + R2 ≤ (1/n) ∑_{i=1}^{n} I(X1i, X2i;Yi) + ϵn
This proof always tells me the most and teaches me the most. All of this is logical if one keeps in mind that the number of elements in (W1, W2) is 2nR1·2nR2. The maximum entropy is
log(2nR1·2nR2) = n(R1 + R2). From this initial result, using the derivations from 71↑ to 74↑, you immediately arrive at the result in 75↑. The way expression 69↑ is obtained is original, in particular the part marked in green, i.e., (c).
The expressions in 69↑, 70↑ and 75↑ are the averages of the mutual informations calculated at the empirical distributions in column i of the codebook. We can rewrite these equations with a new variable Q uniform on {1, 2, ..., n} (each value with probability 1/n). The equations become
R1 ≤ (1/n) ∑_{i=1}^{n} I(X1i;Yi|X2i) + ϵn
 = (1/n) ∑_{i=1}^{n} I(X1Q;YQ|X2Q, Q = i) + ϵn
 = I(X1Q;YQ|X2Q, Q) + ϵn
 = I(X1;Y|X2, Q) + ϵn
(1/n) ∑_{i=1}^{n} I(X1Q;YQ|Q = i)
I(X1Q;YQ|Q) = H(X1Q|Q) − H(X1Q|Q, YQ)
H(X1Q|Q) = (1/n) ∑_{q=1}^{n} H(X1q|X1^{q−1}) = [independence of X1q from X1^{q−1}] = (1/n) ∑_{q=1}^{n} H(X1q|Q = q)
Actually, I think it is ingenious how the sum is removed by using the auxiliary variable. Of course, even without it the sum could be removed, because of the independence of X1 and X2.
24.06.2013
In the Capacity Theorem paper they go one step further:
R1 ≤ I(X1;Y|X2, Q) + ϵn ≤ I(X1;Y|X2) + ϵn
where X1 ≜ X1Q, X2 ≜ X2Q and Y ≜ YQ are new random variables whose distributions depend on Q in the same way as the distributions of X1i, X2i and Yi depend on i. Since W1 and W2 are independent, so are X1i(W1) and X2i(W2), and hence
(76) Pr{X1i(W1) = x1, X2i(W2) = x2} = Pr{X1Q = x1|Q = i}·Pr{X2Q = x2|Q = i}
Hence, taking the limit as n → ∞, P(n)e → 0,  we have the following converse:
R1 ≤ I(X1;Y|X2, Q)
R2 ≤ I(X2;Y|X1, Q)
(77) R1 + R2 ≤ I(X1, X2;Y|Q)
for some choice of joint distribution p(q)p(x1|q)p(x2|q)p(y|x1x2). As in Section 15.3.3, the region is unchanged if we limit the cardinality of Q to 4.
This completes the proof of the converse.
Thus, the achievability of the region of Theorem 15.3.1 was proved in Section 15.3.1. In Section 15.3.3 we showed that every point in the region defined by 63↑ is also achievable. In the converse, we showed that the region in 63↑ is the best we can do, establishing that this is indeed the capacity region of the channel. Thus, the region in 33↑ cannot be any larger than the region in 63↑, and this is the capacity region of the multiple-access channel.
In the Capacity Theorem paper they continue further with the relay channel and reduce it to the following form:
R1 < I(X1;Y|X2, Q) ≤ I(X1;Y|X2)

R2 < I(X2;Y|X1, Q) ≤ I(X2;Y|X1)

R1 + R2 < I(X1, X2;Y|Q) ≤ I(X1X2;Y)

This should definitely be added to the book as well.

1.3.5 m-User Multiple-Access Channels

We will now generalize the result derived for two senders to m senders, m ≥ 2. The multiple-access channel in this case is shown in 19↓.
We send independent indices w1, w2, ..., wm over the channel from senders 1, 2, ..., m, respectively. The codes, rates, and achievability are all defined in exactly the same way as in the two-sender case.
Let S ⊆ {1, 2, ..., m} denote a subset of the senders, and let Sc denote the complement of S. Let R(S) = ∑_{i ∈ S} Ri and let X(S) = {Xi : i ∈ S}. Then we have the following theorem.
figure Figure 15.15.png
Figure 19 m-user multiple-access channel
Theorem 15.3.6
The capacity region of the m-user multiple-access channel is the closure of the convex hull of the rate vectors satisfying
(78) R(S) ≤ I(X(S);Y|X(Sc)) for all S ⊆ {1, 2, ..., m}
for some product distribution p1(x1)p2(x2)...pm(xm) .
Proof:
The proof contains no new ideas. There are now 2^m − 1 terms in the probability of error in the achievability proof and an equal number of inequalities in the proof of the converse.
In general, the region in 78↑ is a beveled box.
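A small Python sketch (the names and the Gaussian-style bound are my own illustrative assumptions, anticipating 91↓) of how the 2^m − 1 constraints R(S) ≤ I(X(S);Y|X(Sc)) would be checked for a given rate vector, once the mutual-information bounds have been computed for a fixed product distribution:

```python
import math
from itertools import chain, combinations

def nonempty_subsets(m):
    """All 2^m - 1 non-empty subsets of {0, ..., m-1}, as tuples of indices."""
    return chain.from_iterable(combinations(range(m), k) for k in range(1, m + 1))

def in_mac_region(rates, bound):
    """Check R(S) <= bound(S) for every non-empty S, where bound(S) stands for
    the precomputed I(X(S); Y | X(S^c)) of a fixed product input distribution."""
    return all(sum(rates[i] for i in S) <= bound(S)
               for S in nonempty_subsets(len(rates)))

# Illustrative bound: the Gaussian form C(sum_{i in S} Pi / N) of 91 below.
P, N = [10.0, 5.0, 1.0], 1.0
bound = lambda S: 0.5 * math.log2(1 + sum(P[i] for i in S) / N)
print(in_mac_region([1.0, 0.7, 0.3], bound))   # True: inside the beveled box
```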

1.3.6 Gaussian Multiple-Access Channels (MMV)

We now discuss the Gaussian multiple-access channel of Section 15.1.2. in somewhat more detail.
Two senders, X1 and X2, communicate to the single receiver, Y. The received signal at time i is:
(79) Yi = X1i + X2i + Zi
where {Zi} is a sequence of independent, identically distributed, zero-mean Gaussian random variables with variance N (see 20↓).
figure Figure 15.16 Gaussian multiplepaccess channel .png
Figure 20 Gaussian multiple-access channel
We assume that there is a power constraint Pj on sender j; that is, for each sender for all messages we must have
(80) (1/n) ∑_{i=1}^{n} x_{ji}^2(wj) ≤ Pj, for all wj ∈ {1, 2, ..., 2nRj}, j = 1, 2
Just as the proof of achievability of channel capacity for the discrete case (Chapter 7) was extended to the Gaussian channel (Chapter 9) we can extend the proof for the discrete multiple-access channel to the Gaussian multiple-access channel. The converse can also be extended similarly, so we expect the capacity region to be the convex hull of the set of rate pairs satisfying
(81) R1 < I(X1;Y|X2)
(82) R2 < I(X2;Y|X1)
(83) R1 + R2 < I(X1, X2;Y)
for some input distribution f1(x1)f2(x2) satisfying EX21 ≤ P1 and EX22 ≤ P2.
Now we can expand the mutual information in terms of relative entropy, and thus
I(X1;Y|X2) = h(Y|X2) − h(Y|X1, X2) = h(X1 + X2 + Z|X2) − h(X1 + X2 + Z|X1, X2) =
(84)  = h(X1 + Z|X2) − h(Z) =(a) h(X1 + Z) − (1/2) log(2πe)N ≤(b) (1/2) log((P1 + N)/N) = (1/2) log(1 + P1/N)
(a) follows from the fact that Z is independent of X1 and X2, and (b) from the fact that the normal distribution maximizes entropy for a given second moment. Thus, the maximizing distribution is X1 ~ N(0, P1) and X2 ~ N(0, P2) with X1 and X2 independent. This distribution simultaneously maximizes the mutual information bounds in 81↑-83↑.
Definition
We define the channel capacity function
(85) C(x) ≜ (1/2) log(1 + x)
corresponding to the channel capacity of a Gaussian white-noise channel with signal-to-noise ratio x (21↓). Then we write the bounds on R1 and R2 as
(86) R1 ≤ C(P1/N)
(87) R2 ≤ C(P2/N)
figure Fig15.png
Figure 21 Gaussian multiple-access channel capacity
and
(88) R1 + R2 ≤ C((P1 + P2)/N).
(1/2)·log((N + P1)/N) + (1/2)·log((N + P2)/N) = ??
I(X1;Y) = (1/2)·log(E[X1²] + E[X2²] + N) − (1/2)·log(E[X2²] + N)
I(X1;Y|X2) = (1/2)·log(E[X1²] + N) − (1/2)·log(N) = (1/2)·log((P1 + N)/N)
It cannot be done that way; instead we need:
I(X1, X2;Y) = h(Y) − h(Y|X1, X2) = h(X1 + X2 + Z) − h(X1 + X2 + Z|X1, X2) ≤ (1/2)·log(2πe)(P1 + P2 + N) − (1/2)·log(2πe)N = C((P1 + P2)/N)
I(X2;Y) = I(X1, X2;Y) − I(X1;Y|X2) = (1/2)·log((N + P1 + P2)/N) − (1/2)·log((N + P1)/N) = (1/2)·log((N + P1 + P2)/(N + P1)) = (1/2)·log(1 + P2/(N + P1)) = C(P2/(N + P1))
These upper bounds are achieved when X1 ~ N(0, P1) and X2 ~ N(0, P2), and they define the capacity region. The surprising fact about these inequalities is that the sum of the rates can be as large as C((P1 + P2)/N), which is the rate achieved by a single transmitter sending with a power equal to the sum of the powers.
The interpretation of the corner points is very similar to the interpretation of the achievable rate pairs for a discrete multiple-access channel for a fixed input distribution. In the case of the Gaussian channel, we can consider decoding as a two-stage process: In the first stage, the receiver decodes the second sender, considering the first sender as part of the noise. This decoding will have a low probability of error if R2 < C(P2/(P1 + N)). After the second sender has been decoded successfully, it can be subtracted out and the first sender can be decoded correctly if R1 < C(P1/N). Hence, this argument shows that we can achieve the rate pairs at the corner points of the capacity region by means of single-user operations. This process, called onion-peeling, can be extended to any number of users. See the end of this chapter.
The essence of onion-peeling is the procedure in which one signal is decoded while all the others (not yet decoded) are treated as noise. The decoded signal is then subtracted from the received signal, and decoding continues with the remaining signals.
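A minimal Python sketch of onion-peeling for the Gaussian multiple-access channel (the function names are my own): each user is decoded treating the not-yet-decoded users as noise and is then subtracted, and the resulting rates sum to C((P1 + ... + Pm)/N).

```python
import math

def C(x):
    """Gaussian capacity function C(x) = (1/2) log2(1 + x), in bits per use."""
    return 0.5 * math.log2(1 + x)

def onion_peeling_rates(powers, N):
    """Decode the users in the given order, treating all not-yet-decoded users
    as noise, then subtract the decoded signal before the next stage."""
    rates, remaining = [], sum(powers)
    for P in powers:
        remaining -= P                        # interference from later stages only
        rates.append(C(P / (N + remaining)))
    return rates

r = onion_peeling_rates([10.0, 10.0], 1.0)
print(r, sum(r), C((10.0 + 10.0) / 1.0))      # the two rates sum to C((P1 + P2)/N)
```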
figure onion peeling nested convex hulls.png
Figure 22 Onion Peeling-Nested Convex Hulls
If we generalize this to m senders with equal power, the total rate is C(mP/N), which goes to infinity as m → ∞. The average rate per sender, (1/m)C(mP/N), goes to 0. Thus, when the total number of senders is very large, so that there is a lot of interference, we can still send a total amount of information that is arbitrarily large even though the rate per individual sender goes to 0.
The capacity region described above corresponds to code-division multiple access (CDMA), where separate codes are used for the different senders and the receiver decodes them one by one. In many practical situations, though, simpler schemes, such as frequency-division or time-division multiplexing, are used. With frequency-division multiplexing, the rates depend on the bandwidth allotted to each sender. Consider the case of two senders with powers P1 and P2 using non-intersecting frequency bands with bandwidths W1 and W2, where W1 + W2 = W (the total bandwidth). Using the formula for the capacity of a single-user band-limited channel, the following rate pair is achievable:
(89) R1 = W1 log(1 + P1/(N W1))
(90) R2 = W2 log(1 + P2/(N W2))
As we vary W1 and W2, we trace out the curve as shown in 23↓. This curve touches the boundary of the capacity region at one point, which corresponds to allotting bandwidth to each channel proportional to the power in that channel. We conclude that no allocation of frequency bands to radio stations can be optimal unless the allocated powers are proportional to the bandwidths.
figure Figure15.18 Gaussian multiple access channle capacity with FDMA and TDMA.png
Figure 23 Gaussian multiple-access channel capacity with FDMA and TDMA
In time-division multiple access (TDMA), time is divided into slots, and each user is allotted a slot during which only that user transmits and every other user remains quiet. If there are two users, each of power P, the rate that each sends when the other is silent is C(P/N). Now if time is divided into equal-length slots, and every odd slot is allocated to user 1 and every even slot to user 2, the average rate that each user achieves is (1/2)C(P/N). This system is called naive time-division multiple access (TDMA). However, it is possible to do better if we notice that since user 1 is sending only half the time, it is possible for him to use twice the power during his transmissions and still maintain the same average power constraint. With this modification, it is possible for each user to send information at a rate (1/2)C(2P/N). By varying the lengths of the slots allotted to each sender (and the instantaneous power used during the slot), we can achieve the same capacity region as FDMA with different bandwidth allocations.
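A quick numerical comparison (my own sketch) of naive TDMA, power-boosted TDMA, and the per-user share of the sum capacity, for two users of equal power; it shows that the power-boosted TDMA rate coincides with half the sum capacity, as claimed above.

```python
import math

def C(x):
    return 0.5 * math.log2(1 + x)

P, N = 10.0, 1.0
naive_tdma = 0.5 * C(P / N)             # transmit half the time at power P
boosted_tdma = 0.5 * C(2 * P / N)       # half the time at power 2P (same average power)
half_sum_capacity = 0.5 * C(2 * P / N)  # per-user share of C((P1 + P2)/N) with P1 = P2 = P
print(naive_tdma, boosted_tdma, half_sum_capacity)
```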
As 23↑ illustrates, in general the capacity region is larger than that achieved by time- or frequency-division multiplexing. But note that the multiple-access capacity region derived above is achieved by use of a common decoder for all the senders. However, it is also possible to achieve the capacity region by onion-peeling, which removes the need for a common decoder and instead, uses a sequence of single-user codes. CDMA achieves the entire capacity region, and in addition, allows new users to be added easily without changing the codes of the current users.
On the other hand, TDMA and FDMA systems are usually designed for a fixed number of users, and it is possible that either some slots are empty (if the actual number of users is less than the number of slots) or some users are left out (if the number of users is greater than the number of slots). However, in many practical systems, simplicity of design is an important consideration, and the improvement in capacity due to the multiple-access ideas presented earlier may not be sufficient to warrant the increased complexity.
For a Gaussian multiple-access system with m sources with powers P1, P2, ..., Pm and ambient noise of power N, we can state the equivalent of Gauss's law for any set S in the form
(91) ∑_{i ∈ S} Ri = total rate of information flow from S ≤ C(∑_{i ∈ S} Pi / N).

1.4 Encoding of correlated sources

We now turn to distributed data compression. This problem is in many ways the data-compression dual to the multiple-access channel problem. We know how to encode a source X: a rate R ≥ H(X) is sufficient. Now suppose that there are two sources (X, Y) ~ p(x, y). A rate H(X, Y) is sufficient if we are encoding them together. After going through Problem 15.1 (i.e., 15.2), one fundamental thing became clear to me: H(X, Y) is defined on a set with elements X x Y, which has |X x Y| = 2nR1·2nR2 = 2n(R1 + R2) elements. That is why it says here that R1 + R2 ≥ H(X, Y). You imagine a third variable that contains the elements of X x Y, distributed according to p(X, Y). But what if the X and Y sources must be described separately for some user who wishes to reconstruct both X and Y? It is seen that a rate R = Rx + Ry > H(X) + H(Y) is sufficient. However, in a surprising and fundamental paper by Slepian and Wolf [7], it is shown that a total rate R = H(X, Y) is sufficient even for separate encoding of correlated sources.
Let (X1, Y1), (X2, Y2), ... be a sequence of jointly distributed random variables, i.i.d. ~ p(x, y). Assume that the X sequence is available at location A and the Y sequence is available at location B. The situation is illustrated in 24↓.
figure Figure 15.15 Slepian-Wolf Coding.png
Figure 24 Slepian-Wolf coding
Before we proceed to the proof of this result, we will give a few definitions.
Definition
A ((2nR1, 2nR2), n) distributed source code for the joint source (X, Y) consists of two encoder maps,
(92) f1:Xn → {1, 2, ..., 2nR1}
(93) f2:Yn → {1, 2, ..., 2nR2}
and decoder map,
g:{1, 2, ...2nR1}x{1, 2, ..., 2nR2} → XnxYn
Here f1(Xn) is the index corresponding to Xn, f2(Yn) is the index corresponding to Yn, and (R1, R2) is the rate pair of the code.
Definition
The probability of error for a distributed source code is defined as
(94) P(n)e = P(g(f1(Xn), f2(Yn)) ≠ (XnYn))
Definition (Achievable)
A rate pair (R1, R2) is said to be achievable for a distributed source if there exists a sequence of ((2nR1, 2nR2), n) distributed source codes with probability of error P(n)e → 0. The achievable rate region is the closure of the set of achievable rates.
Theorem 15.4.1 (Slepian-Wolf)
For the distributed source coding problem for the source (X, Y) drawn i.i.d ~p(x, y), the achievable rate region is given by
(95) R1 ≥ H(X|Y)
(96) R2 ≥ H(Y|X)
(97) R1 + R2 ≥ H(X, Y)
So with fewer bits you can describe X and Y individually, but also jointly. For example, in the case of X, instead of using H(X) bits you can use H(X|Y) bits (see Source Coding with Side Information).
Let us illustrate the result with some examples.
Example 15.4.1 (MMV)
Consider the weather in Gotham and Metropolis. For the purposes of our example, we assume that Gotham is sunny with probability 0.5 and that the weather in Metropolis is the same as in Gotham with probability 0.89. The joint distribution of weather is given as
X − Gotham
Y − Metropolis
X ∈ {sunny, cloudy}, p(X) = (1/2, 1/2); Y ∈ {sunny, cloudy}
p(Y|X):
              Y = sunny   Y = cloudy
X = sunny       0.89        0.11
X = cloudy      0.11        0.89
p(X, Y) = p(X)p(Y|X):
              Y = sunny   Y = cloudy
X = sunny       0.445       0.055
X = cloudy      0.055       0.445
Assume that we wish to transmit 100 days of weather information to the National Weather Service headquarters in Washington. We could send all 100 bits of the weather from both places, making 200 bits in all. If we decided to compress the information independently, we would still need 100·H(0.5) = 100 bits of information from each place, for a total of 200 bits. If, instead, we use Slepian-Wolf encoding, we need only H(X, Y)·100 = 150 bits in total. Still, the question arises of what kind of code can actually achieve this rate; here it is only stated that this is the minimum rate at which the weather information for these two sources can be transmitted.
H(Y|X) = −∑_{(x, y)} p(x, y) log p(y|x) = 0.89·log(1/0.89) + 0.11·log(1/0.11)
0.89·log2(1/0.89) + 0.11·log2(1/0.11) ≈ 0.5
H(X, Y) = H(X) + H(Y|X) = 1 + 0.5 = 1.5
- If X and Y are independent, there is no way out: you must send 200 bits.
p(Y|X):
              Y = sunny   Y = cloudy
X = sunny       0.5         0.5
X = cloudy      0.5         0.5
H(X, Y) = H(X) + H(Y) = 1 + 1 = 2
or
H(X, Y) = 4·(1/4)·log2(4) = 2
To me it is absolutely logical that correlated sources can be transmitted at a lower rate than uncorrelated ones!!!
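A small Python check (my own, using the numbers of the example) of the entropies used above:

```python
import math

def H(ps):
    """Entropy in bits of a distribution given as a list of probabilities."""
    return -sum(p * math.log2(p) for p in ps if p > 0)

H_X = H([0.5, 0.5])                          # 1 bit (Gotham weather)
H_Y_given_X = H([0.89, 0.11])                # about 0.5 bit
H_XY = H([0.445, 0.055, 0.055, 0.445])       # about 1.5 bits -> 150 bits for 100 days
print(H_X, H_Y_given_X, H_XY)
```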
Example 15.4.2
Consider the following joint distribution
p(u, v):
          v = 0   v = 1
u = 0      1/3     1/3
u = 1       0      1/3
H(U, V) = 3·(1/3)·log2(3) ≈ 1.58
In this case, the total rate required for the transmission of this source is H(U) + H(V|U) = log2(3) ≈ 1.58 bits rather than the 2 bits that would be needed if the sources were transmitted independently without Slepian-Wolf encoding.

1.4.1 Achievability of the Slepian-Wolf Theorem (random bins)

We now prove the achievability of the rates in the Slepian-Wolf theorem. Before we proceed to the proof, we introduce a new coding procedure using random bins. The essential idea of random bins is very similar to hash functions: We choose a large random index for each source sequence. If the set of typical source sequences is small enough (or equivalently, the range of the hash function is large enough), then with high probability, different source sequences have different indices, and we can recover the source sequence from the index.
From Wikipedia
A hash function is any function that maps data of arbitrary length to data of a fixed length. The values returned by a hash function are called hash values, hash codes, hash sums, checksums or simply hashes.
figure Hash Function.png
Let us consider the application of this idea to the encoding of a single source. In Chapter 3, the method that we considered was to index all elements of the typical set and not bother about elements outside the typical set. We will now describe the random binning procedure, which indexes all sequences but rejects untypical sequences at a later stage.
Consider the following procedure: For each sequence Xn, draw an index at random from {1, 2, ..., 2nR}. The set of sequences Xn which have the same index are said to form a bin, since this can be viewed as first laying down a row of bins and then throwing the Xn's at random into the bins. For decoding the source from the bin index, we look for a typical Xn sequence in the bin. If there is one and only one typical Xn sequence in the bin, we declare it to be the estimate X̂n of the source; otherwise, an error is declared.
The above procedure defines a source code. To analyze the probability of error for this code, we now divide the Xn sequences into two types, typical sequences and non-typical sequences. If the source sequence is typical, the bin corresponding to this source sequence will contain at least one typical sequence (the source sequence itself). Hence there will be an error only if there is more than one typical sequence in the bin. If the source sequence is non-typical, there will always be an error. But if the number of bins is much larger than the number of typical sequences, the probability that there is more than one typical sequence in a bin is very small, and hence the probability that a typical sequence will result in an error is very small.
Formally, let f(Xn) be the bin index corresponding to Xn. Call the decoding function g. The probability of error (averaged over the random choice of code f) is:
P(g(f(X)) ≠ X) ≤ P(X ∉ A(n)ϵ) + ∑_x p(x) P(∃ x' ≠ x: x' ∈ A(n)ϵ, f(x') = f(x))
 ≤ ϵ + ∑_x p(x) ∑_{x' ∈ A(n)ϵ, x' ≠ x} P(f(x') = f(x)) ≤ ϵ + ∑_x p(x) ∑_{x' ∈ A(n)ϵ} 2^{−nR} = ϵ + ∑_{x' ∈ A(n)ϵ} 2^{−nR} ∑_x p(x) ≤ ϵ + ∑_{x' ∈ A(n)ϵ} 2^{−nR} ≤ ϵ + 2^{n(H(X) + ϵ)} 2^{−nR} ≤ 2ϵ
if R > H(X) + ϵ and n is sufficiently large. Hence, if the rate of the code is greater than the entropy, the probability of error is arbitrarily small and the code achieves the same results as the code described in Chapter 3.
The above example illustrates the fact that there are many ways to construct codes with low probabilities of error at rates above the entropy of the source; The universal source code is another example of such a code.
Note that the binning scheme does not require an explicit characterization of the typical set at the encoder; it is needed only at the decoder. It is this property that enables this code to continue to work in the case of a distributed source, as illustrated in the proof of the theorem.
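The following toy Python simulation (entirely my own sketch; the "typical set" is just an abstract finite set here, not a real A(n)ϵ) mimics the random-binning argument: every sequence gets a random bin index, and decoding fails only when another typical sequence lands in the same bin.

```python
import random
from collections import defaultdict

def binning_error_rate(typical_set, R_bits, trials=2000, seed=0):
    """Assign every sequence a random bin in {0, ..., 2**R_bits - 1}; a typical
    source output is decoded correctly iff it is alone (among typical
    sequences) in its bin.  Returns the empirical error rate."""
    random.seed(seed)
    errors = 0
    for _ in range(trials):
        f = {x: random.randrange(2 ** R_bits) for x in typical_set}   # random code
        bins = defaultdict(list)
        for x, b in f.items():
            bins[b].append(x)
        x = random.choice(list(typical_set))       # source output, assumed typical
        errors += (len(bins[f[x]]) != 1)           # collision with another typical seq.
    return errors / trials

# ~2^7 "typical sequences" and R = 12 bits: collisions are already rare,
# and they vanish as R grows past log2(|typical set|), mirroring R > H(X) + eps.
print(binning_error_rate(range(128), R_bits=12))
```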
We now return to the consideration of the distributed source coding and prove the achievability of the rate region in the Slepian-Wolf theorem.
Proof: (Achievability in Theorem 15.4.1)
The basic idea of the proof is to partition the space of Xn into 2nR1 bins and the space of Yn into 2nR2 bins.
Random code generation:
Assign every x ∈ Xn to one of 2nR1 bins independently according to a uniform distribution on {1, 2, ..., 2nR1}. Similarly, randomly assign every y ∈ Yn to one of 2nR2 bins. Reveal the assignments f1 and f2 to both the encoder and the decoder.
Encoding:
Sender 1 sends the index of the bin to which X belongs. Sender 2 sends the index of the bin to which Y belongs.
Decoding:
Given the received index pair (i0, j0), declare (x̂, ŷ) = (x, y) if there is one and only one pair of sequences (x, y) such that f1( x) = i0, f2( y) = j0 and (x, y) ∈ A(n)ϵ. Otherwise declare an error. The scheme is illustrated in 25↓. The set of X sequences and the set of Y sequences are divided into bins in such a way that the pair of indices specifies a product bin.
figure Slepian-Wolf encoding: the jointly typical pairs are isolated by the product bins.png
Figure 25 Slepian-Wolf encoding: the jointly typical pairs are isolated by the product bins.
figure Random Binning.png
Figure 26 Random Binning (NIT)
Probability of error:
Let (XiYi) ~ p(x, y). Define the events
E0 = {(X, Y) ∉ A(n)ϵ},
E1 = {∃ x' ≠ X: f1(x') = f1(X) and (x', Y) ∈ A(n)ϵ}
E2 = {∃ y' ≠ Y: f2(y') = f2(Y) and (X, y') ∈ A(n)ϵ}
and
E12 = {∃ (x', y'): x' ≠ X, y' ≠ Y, f1(x') = f1(X), f2(y') = f2(Y) and (x', y') ∈ A(n)ϵ}
Here X, Y, f1 and f2 are random. We have an error if (X, Y) is not in A(n)ϵ or if there is another typical pair in the same bin. Hence by the union of events bound,
P(n)e = P(E0 ∪ E1 ∪ E2 ∪ E12) ≤ P(E0) + P(E1) + P(E2) + P(E12)
First consider E0. By the AEP, P(E0) → 0 and hence for n sufficiently large, P(E0) ≤ ϵ. To bound P(E1), we have
P(E1) = P{∃ x' ≠ X: f1(x') = f1(X) and (x', Y) ∈ A(n)ϵ} = ∑_{(x, y)} p(x, y) P{∃ x' ≠ x: f1(x') = f1(x) and (x', y) ∈ A(n)ϵ}
 ≤ ∑_{(x, y)} p(x, y) ∑_{x' ≠ x, (x', y) ∈ A(n)ϵ} P(f1(x') = f1(x)) = ∑_{(x, y)} p(x, y)·2^{−nR1}|Aϵ(X|y)| ≤(a) 2^{−nR1} 2^{n(H(X|Y) + ϵ)}
(a) by Theorem 15.2.2.
which goes to 0 if R1 > H(X|Y). Hence for sufficiently large n, P(E1) ≤ ϵ. Similarly, for sufficiently large n, P(E2) ≤ ϵ if R2 > H(Y|X) and P(E12) ≤ ϵ if R1 + R2 > H(X, Y). Since the average probability of error is  < 4ϵ, there exists at least one code (f*1, f*2, g*) with probability of error  < 4ϵ. Thus, we can construct a sequence of codes with P(n)e → 0, and the proof of achievability is complete.

1.4.2 Converse for the Slepian-Wolf Theorem

The converse for the Slepian-Wolf theorem follows obviously from the results for a single source, but we will provide it for completeness.
Proof: (Converse to Theorem 15.4.1)
As usual, we begin with Fano's inequality. Let f1, f2, g be fixed. Let I0 = f1(Xn) and J0 = f2(Yn). Then
(98) H(Xn, Yn|I0, J0) ≤ P(n)e · n(log|X| + log|Y|) + 1 ≜ nϵn
where ϵn → 0 as n → ∞. Now adding conditioning, we also have
(99) H(Xn|Yn, I0, J0) ≤ nϵn
and
(100) H(Yn|Xn, I0, J0) ≤ nϵn
We can write a chain of inequalities
n(R1 + R2) ≥(a) H(I0, J0) = I(Xn, Yn;I0, J0) + H(I0, J0|Xn, Yn) =(b) I(Xn, Yn;I0, J0) =
 = H(Xn, Yn) − H(Xn, Yn|I0, J0) ≥(c) H(Xn, Yn) − nϵn =(d) nH(X, Y) − nϵn
where
(a) follows from the fact that I0 ∈ {1, 2, ..., 2nR1} and J0 ∈ {1, 2, ..., 2nR2}
(b) follows from the fact that I0 is a function of Xn and J0 is a function of Yn
(c) follows from Fano's inequality 98↑
(d) follows from the chain rule and the fact that (Xi, Yi) are i.i.d
Similarly, using 99↑
nR1 ≥(a) H(I0) ≥ H(I0|Yn) = I(Xn;I0|Yn) + H(I0|Xn, Yn) =(b) I(Xn;I0|Yn) = H(Xn|Yn) − H(Xn|I0, J0, Yn) ≥(c) H(Xn|Yn) − nϵn =(d) nH(X|Y) − nϵn
(here we also use that J0 = f2(Yn) is a function of Yn)
where the reasons are the same as for the equations above. Similarly we can show that
nR2 ≥ nH(Y|X) − nϵn
Dividing these inequalities by n and taking the limit as n → ∞ we have the desired converse.
The region described in the Slepian-Wolf theorem is illustrated in 27↓.
figure Rate region for Slepian - Wolf encoding.png
Figure 27 Rate region for Slepian-Wolf encoding

1.4.3 Slepian-Wolf Theorem for Many Sources

The results of 15.4.2 can easily be generalized to many sources. The proof follows exactly the same lines.
Theorem 15.4.2
Let (X1i, X2i, ..., Xmi) be i.i.d. ~ p(x1, x2, ..., xm). Then the set of rate vectors achievable for distributed source coding with separate encoders and a common decoder is defined by
(101) R(S) > H(X(S)|X(Sc))
for all S ⊂ {1, 2, ..., m} where
(102) R(S) = ∑_{i ∈ S} Ri
and X(S) = {Xj:j ∈ S}.
Proof:
The proof is identical to the case of two variables and is omitted.
The achievability of Slepian-Wolf encoding has been proved for an i.i.d correlated source, but the proof can easily be extended to the case of an arbitrary joint source that satisfies the AEP; in particular, it can be extended to the case of any joint ergodic source [8]. In these cases the entropies in the definition of the rate region are replaced by the corresponding entropy rates.
Is there a difference between correlated and conditionally dependent random variables? I think there is no difference. In that case, "i.i.d. correlated source" is an oxymoron.

1.4.4 Interpretation of Slepian-Wolf Coding

We consider an interpretation of the corner points of the rate region in Slepian-Wolf encoding in terms of graph coloring. Consider the point with rate R1 = H(X), R2 = H(Y|X). Using nH(X) bits, we can encode Xn efficiently so that the decoder can reconstruct Xn with arbitrarily low probability of error. But how do we code Yn with nH(Y|X) bits? Note that H(Y|X) ≤ H(Y), since conditioning reduces entropy. So with Slepian-Wolf coding, Y is encoded with fewer bits. Looking at the picture in terms of typical sets, we see that associated with every Xn is a typical "fan" of Yn sequences that are jointly typical with the given Xn, as shown in 28↓.
figure Figure 15.22. Jointly Typical Fans.png
Figure 28 Jointly Typical Fans
If the Y encoder knows Xn, the encoder can send the index of the Yn within this typical fan. The decoder, also knowing Xn, can then construct this typical fan and hence reconstruct Yn. But the Y encoder does not know Xn. So instead of trying to determine the typical fan, he randomly colors all Yn sequences with 2nR2 colors. If the number of colors is high enough, then with high probability all the colors in a particular fan will be different and the color of the Yn sequence will uniquely define the Yn sequence within the Xn fan. If the rate R2 > H(Y|X), the number of colors is exponentially larger than the number of elements in the fan and we can show that the scheme will have an exponentially small probability of error.

1.5 Duality between Slepian-Wolf encoding and multiple-access channels

With multiple-access channels, we considered the problem of sending independent messages over a channel with two inputs and only one output. With Slepian-Wolf encoding, we considered the problem of sending correlated sources over a noiseless channel, with a common decoder for recovery of both sources. Here I was reminded of the Gotham and Metropolis example: how would you encode the weather conditions if you use Slepian-Wolf coding and have only 1.5 bits available? In this section we explore the duality between the two systems.
In 29↓ two independent messages are to be sent over the channel as Xn1 and Xn2 sequences. The receiver estimates the messages from the received sequence.
figure Figure 15.23 Multiple-access channels.png
Figure 29 Multiple-access channels
figure Figure 15.24 Correlated surce encoding.png
Figure 30 Correlated source encoding.
In 30↑ the correlated sources are encoded as „independent” messages i and j. The receiver tries to estimate the source sequences from knowledge of i and j.
In the proof of the achievability of the capacity region for the multiple-access channel, we used a random map from the set of messages to the sequences Xn1 and Xn2. In the proof for Slepian-Wolf coding, we used a random map from the set of sequences Xn and Yn to the set of messages. In the proof of the coding theorem for the multiple-access channel, the probability of error was bounded by
P(n)e ≤ ϵ + ∑_{codewords} Pr(codeword jointly typical with received sequence) = ϵ + 2^{nR1} terms · 2^{−nI1} + 2^{nR2} terms · 2^{−nI2} + 2^{n(R1 + R2)} terms · 2^{−nI3}
where ϵ is the probability that the sequences are not typical, the Ri are the rates corresponding to the number of codewords that can contribute to the probability of error, and Ii is the corresponding mutual information, which governs the probability that the codeword is jointly typical with the received sequence.
In the Slepian-Wolf encoding the corresponding expression for the probability of error is
P(n)e ≤ ϵ + ∑_{jointly typical sequences} Pr(having the same codeword) = ϵ + 2^{nH1} terms · 2^{−nR1} + 2^{nH2} terms · 2^{−nR2} + 2^{nH3} terms · 2^{−nR3}
where again the probability that the constraints of the AEP are not satisfied is bounded by ϵ, and the other terms refer to the various ways in which another pair of sequences could be jointly typical and in the same bin as the given source pair.
The duality of the multiple-access channel and correlated source encoding is now obvious. It is rather surprising that these two systems are duals of each other; one would have expected a duality between the broadcast channel and the multiple-access channel.

1.6 Broadcast channel

The broadcast channel is a communication channel in which there is one sender and two or more receivers. It is illustrated in 31↓. The basic problem is to find the set of simultaneously achievable rates for communication over a broadcast channel. Before we begin the analysis, let us consider some examples.
figure Figure 15.25 Broadcast channel.png
Figure 31 Broadcast channel
Example 15.6.1 (TV station)
The simplest example of the broadcast channel is a radio or TV station. But this example is slightly degenerate in the sense that normally the station wants to send the same information to everybody who is tuned in; the capacity is essentially
max_{p(x)} min_i I(X;Yi)
which may be less than the capacity of the worst receiver.
But we may wish to arrange the information in such a way that the better receivers receive extra information, which produces a better picture or sound, while worst receivers continue to receive more basic information. As TV stations introduce high-definition TV (HDTV), it may be necessary to encode the information so that bad receivers will receive the regular TV signal, while good receivers will receive the extra information for the high-definition signal. The methods to accomplish this will be explained in the discussion of the broadcast channel.
Example 15.6.2 (Lecturer in classroom)
A lecturer in a classroom is communicating information to the students in the class. Due to differences among the students, they receive various amounts of information. Some students receive most of the information; others receive only a little. In the ideal situation, the lecturer would be able to tailor his or her lecture in such a way that the good students receive more information and the poor students receive at least the minimum amount of information. However, a poorly prepared lecture proceeds at the pace of the weakest student. This situation is another example of a broadcast channel.
Example 15.6.3 (Orthogonal broadcast channels)
The simplest broadcast channel consists of two independent channels to the two receivers. Here we can send independent information over both channels, and we can achieve rate R1 to receiver 1 and R2 to receiver 2 if R1 < C1 and R2 < C2. The capacity region is the rectangle shown in 32↓. This is like the independent binary symmetric channels of Example 15.3.1.
figure Figure 15.26 Capacity region for two orthogonal broadcast channels.png
Figure 32 Capacity region for two orthogonal broadcast channels
Example 15.6.4 (Spanish and Dutch speaker)
To illustrate the idea of superposition, we will consider a simplified example of a speaker who can speak both Spanish and Dutch. There are two listeners: one understands only Spanish and the other understands only Dutch. Assume for simplicity that the vocabulary of each language is 2^20 words and that the speaker speaks at a rate of 1 word per second in either language. Then he can transmit 20 bits (log|X| = log(2^20) = 20) of information per second to receiver 1 by speaking to her all the time; in this case, he sends no information to receiver 2. Similarly, he can send 20 bits per second to receiver 2 without sending any information to receiver 1. Thus, he can achieve any rate pair with R1 + R2 = 20 by simple time-sharing. But can he do better?
Recall that the Dutch listener, even though he does not understand Spanish, can recognize when a word is Spanish. Similarly, the Spanish listener can recognize when Dutch occurs. The speaker can use this to convey information; for example, if the proportion of time he uses each language is 50%, then in a sequence of 100 words, about 50 will be Dutch and about 50 will be Spanish. But there are many ways to order the Spanish and Dutch words; in fact, there are about (100 choose 50) ≈ 2^{100·H(1/2)} ways to order the words. Choosing one of these orderings conveys information to both listeners. The method enables the speaker to send information at a rate of 10 bits per second to the Dutch receiver, 10 bits per second to the Spanish receiver, and 1 bit per second of common information to both receivers (where did this come from? probably the extra bit is because each receiver has implicit information about whether a Spanish or a Dutch word was transmitted), for a total rate of 21 bits per second, which is more than that achievable by simple time-sharing. This is an example of superposition of information.
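A back-of-the-envelope Python check (my own) of the 21 bits-per-second figure, under the assumptions of the example (100 words, 2^20-word vocabularies, a 50/50 language split):

```python
import math

words = 100                 # 1 word per second, for 100 seconds
bits_per_word = 20          # vocabulary of 2**20 words in each language

private_rate = 50 * bits_per_word / words        # 10 bits/s to each listener
ordering_bits = math.log2(math.comb(words, 50))  # ~96.3 bits carried by the ordering
common_rate = ordering_bits / words              # ~0.96 bits/s, roughly H(1/2) = 1
print(private_rate, common_rate, 2 * private_rate + common_rate)   # about 21 bits/s
```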
The results of the broadcast channel can also be applied to the case of a single-user channel with an unknown distribution. In this case, the objective is to get at least the minimum information through when the channel is bad and to get some extra information through when the channel is good. We can use the same superposition arguments as in the case of the broadcast channel to find the rates at which we can send information.

1.6.1 Definitions for a Broadcast Channel

Definition
A broadcast channel consists of an input alphabet X and two output alphabets, Y1 and Y2, and a probability transition function p(y1, y2|x). The broadcast channel is said to be memoryless if p(yn1, yn2|xn) = ∏_{i=1}^{n} p(y1i, y2i|xi).
We define codes, probability of error, achievability, and capacity regions for the broadcast channel as we did for the multiple-access channel. A ((2nR12nR2), n) code for a broadcast channel with independent information consists of an encoder,
(103) X:({1, 2, ..., 2nR1}x{1, 2, ..., 2nR2}) →  Xn
You can think of this as a single codebook with a total of 2nR1 x 2nR2 = 2n(R1 + R2) codewords.
and two decoders,
(104) g1: Yn1 → {1, 2, ..., 2nR1}
and
(105) g2: Yn2 → {1, 2, ...2nR2}
We define the average probability of error as the probability that the decoded message is not equal to the transmitted message; that is,
(106) P(n)e = P(g1(Yn1) ≠ W1 or g2(Yn2) ≠ W2), 
where (W1, W2) are assumed to be uniformly distributed over 2nR1 x2nR2.
Definition
A rate pair (R1R2) is said to be achievable for the broadcast channel if there exists a sequence of ((2nR1, 2nR2), n) codes with P(n)e → 0.
We will now define the rates for the case where we have common information to be sent to both receivers. A ((2nR0, 2nR1, 2nR2), n) code for broadcast channel with common information consists of an encoder
X:({1, 2, ..., 2nR0}x{1, 2, ..., 2nR1}x{1, 2, ...2nR2}) → Xn, 
and two decoders,
g1: Yn1 → {1, 2, ..., 2nR0}x{1, 2, ..., 2nR1}
and
g2: Yn2 → {1, 2, ..., 2nR0}x{1, 2, ..., 2nR2}
It seems that W0 is the common information that needs to be sent to both receivers.
Assuming that the distribution on (W0, W1, W2) is uniform, we can define the probability of error as the probability that the decoded message is not equal to the transmitted message:
(107) P(n)e = P(g1(Yn1) ≠ (W0, W1) or g2(Yn2) ≠ (W0, W2))
λi = Pr(g(Yn) ≠ i|Xn = xn(i)) = ∑_{yn} p(yn|xn(i)) I(g(yn) ≠ i)
P(n)e = (1/M) ∑_{i=1}^{M} λi
Definition:
A rate triple (R0, R1, R2) is said to be achievable for the broadcast channel with common information if there exists a sequence of ((2nR0, 2nR1, 2nR2), n) codes with P(n)e → 0.
Definition:
The capacity region of the broadcast channel is the closure of the set of achievable rates.
We observe that an error for receiver Yn1 depends only on the distribution p(xn, yn1) and not on the joint distribution p(xn, yn1, yn2). Thus we have the following theorem:
Theorem 15.6.1 (Capacity region depends on conditional marginals)
The capacity region of a broadcast channel depends only on the conditional marginal distributions p(y1|x) and p(y2|x).

1.6.2 Degraded Broadcast Channels

Definition (physically degraded):
A broadcast channel is said to be physically degraded if p(y1, y2|x) = p(y1|x)p(y2|y1).
Definition (stochastically degraded)
A broadcast channel is said to be stochastically degraded if its conditional marginal distributions are the same as that of a physically degraded broadcast channel; that is, if there exists a distribution p(y2|y1) such that
p(y2|x) = ∑_{y1} p(y1|x) p(y2|y1)
p(y1|x)p(y2|y1) = p(y1|x)p(y2|y1, x) = p(y1, y2|x);  ∑_{y1} p(y1, y2|x) = p(y2|x)
Note that since the capacity of a broadcast channel depends only on the conditional marginals, the capacity region of the stochastically degraded broadcast channel is the same as that of the corresponding physically degraded channel. In much of the following, we therefore assume that the channel is physically degraded.

1.6.3 Capacity Region for the Degraded Broadcast Channel

We now consider sending independent information over a degraded broadcast channel at rate R1 to Y1 and rate R2 to Y2.
Theorem 15.6.2
The capacity region for sending independent information over the degraded broadcast channel X → Y1 → Y2 is the convex hull of the closure of all (R1, R2) satisfying
(108) R2 ≤ I(U;Y2)
(109) R1 ≤ I(X;Y1|U)
for some joint distribution p(u)p(x|u)p(y1, y2|x), where the auxiliary random variable U has cardinality bounded by |U| ≤ min{|X|, |Y1|, |Y2|}.
Proof:
(The cardinality bounds for the auxiliary random variable U are derived using standard methods from convex set theory and are not dealt with here.) We first give an outline of the basic idea of superposition coding for the broadcast channel. The auxiliary random variable U will serve as a cloud center that can be distinguished by both receivers Y1 and Y2. Each cloud consists of 2nR1 codewords Xn distinguishable by the receiver Y1. The worst receiver can only see the clouds, while the better receiver can see the individual codewords within the clouds. The formal proof of the achievability of this region uses a random coding argument:
Fix p(u) and p(x|u).
Random codebook generation:
Generate 2nR2 independent codewords of length n, U(w2), w2 ∈ {1, 2, ..., 2nR2} (in Problem 15.11 the random variable Ui is defined so that it depends on the second message and on the previous values of Y1), according to ∏_{i=1}^{n} p(ui). For each codeword U(w2), generate 2nR1 independent codewords X(w1, w2) according to ∏_{i=1}^{n} p(xi|ui(w2)). Here u(i) plays the role of the cloud center understandable to both Y1 and Y2, while x(i, j) is the j-th satellite codeword in the i-th cloud. This looks like a superposition of two vectors; that is probably why the procedure is called superposition.
Encoding:
To send the pair (W1, W2) send the corresponding codeword X(W1W2).
Decoding:
Receiver 2 determines the unique Ŵ2 such that (U(Ŵ2), Y2) ∈ A(n)ϵ. If there is no such Ŵ2 or more than one, an error is declared.
Receiver 1 looks for the unique (Ŵ1, Ŵ2) such that (U(Ŵ2), X(Ŵ1, Ŵ2), Y1) ∈ A(n)ϵ. If there is no such pair or more than one, an error is declared.
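A minimal Python sketch (my own; the function name and the toy parameters are illustrative, not from the text) of the codebook-generation step above: cloud centers U(w2) drawn i.i.d. from p(u) and, within each cloud, satellite codewords X(w1, w2) drawn componentwise from p(x|u).

```python
import numpy as np

def superposition_codebook(n, R1, R2, p_u, p_x_given_u, seed=0):
    """2^{nR2} cloud centers U(w2) ~ p(u) i.i.d., and for each cloud 2^{nR1}
    satellite codewords X(w1, w2) drawn componentwise ~ p(x|u).
    Alphabets are {0, ..., |U|-1} and {0, ..., |X|-1}."""
    rng = np.random.default_rng(seed)
    M2, M1 = int(round(2 ** (n * R2))), int(round(2 ** (n * R1)))
    p_x_given_u = np.asarray(p_x_given_u)
    U = rng.choice(len(p_u), size=(M2, n), p=p_u)        # cloud centers
    X = np.empty((M1, M2, n), dtype=int)                 # satellites per cloud
    for w2 in range(M2):
        for i in range(n):
            X[:, w2, i] = rng.choice(p_x_given_u.shape[1], size=M1,
                                     p=p_x_given_u[U[w2, i]])
    return U, X

# Tiny example: binary U and X, n = 8, R1 = R2 = 0.25 (4 clouds, 4 satellites each).
U, X = superposition_codebook(8, 0.25, 0.25, [0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]])
print(U.shape, X.shape)   # (4, 8) and (4, 4, 8)
```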
Analysis of the probability of error:
By the symmetry of the code generation, the probability of error does not depend on which codeword was sent. Hence, without loss of generality, we can assume that the message pair (W1, W2) = (1, 1) was sent. Let P(·) denote the conditional probability of an event given that (1, 1) was sent.
Since we have essentially a single-user channel from U to Y2, we will be able to decode the U codewords with a low probability of error if R2 ≤ I(U;Y2). To prove this, we define the events
(110) EYi = {(U(i), Y2) ∈ A(n)ϵ}
Then the probability of error at receiver 2 is:
P(n)e(2) = P(E^c_{Y1} ∪ ⋃_{i≠1} EYi) ≤ P(E^c_{Y1}) + ∑_{i≠1} P(EYi) ≤(a) ϵ + 2^{nR2}·2^{−n(I(U;Y2) − 2ϵ)} ≤ 2ϵ
if n is large enough and R2 < I(U;Y2), where (a) follows from the AEP. Similarly, for decoding at receiver 1, we define the events
(111) ẼYi = {(U(i), Y1) ∈ A(n)ϵ}
(112) ẼYij = {(U(i), X(i, j), Y1) ∈ A(n)ϵ}
where the tilde refers to events defined at receiver 1. Then we can bound the probability of error as:
P(n)e(1) = P(Ẽ^c_{Y1} ∪ Ẽ^c_{Y11} ∪ ⋃_{i≠1} ẼYi ∪ ⋃_{j≠1} ẼY1j) ≤ P(Ẽ^c_{Y1}) + P(Ẽ^c_{Y11}) + ∑_{i≠1} P(ẼYi) + ∑_{j≠1} P(ẼY1j)
I wonder why the terms
∑_{i≠1} P(ẼYi1)
and
∑_{i≠1, j≠1} P(ẼYij)
are not taken into account, as is done, for example, for the multiple-access channel.
By the same arguments as for receiver 2, we can bound P(ẼYi) ≤ 2^{−n(I(U;Y1) − 3ϵ)}. Hence, the third term goes to 0 if R2 < I(U;Y1) (here it is R2 because i runs up to 2nR2, while j runs up to 2nR1). But by the data-processing inequality and the degraded nature of the channel, I(U;Y1) ≥ I(U;Y2), and hence the conditions of the theorem imply that the third term goes to 0.
X → Y1 → Y2
p(y1, y2|x) = p(y1|x)p(y2|y1)
P(X, Z|Y) = P(X, Y, Z)/P(Y) = P(X, Y)P(Z|X, Y)/P(Y) = P(X|Y)·P(Z|Y)
P(Y, Z|X) = P(X, Y, Z)/P(X) = P(X, Y)P(Z|X, Y)/P(X) = P(Y|X)·P(Z|Y)
So degradedness and Markovity are one and the same thing!
I(U;Y1, Y2) = I(U;Y1) + I(U;Y2|Y1) = I(U;Y1) + H(Y2|Y1) − H(Y2|Y1, U) = I(U;Y1) =
 = I(U;Y2) + I(U;Y1|Y2) ≥ I(U;Y2)  ⟹  I(U;Y1) ≥ I(U;Y2)
R2 < I(U;Y2) < I(U;Y1) ⇒ R2 < I(U;Y1)
We can also bound the fourth term in the probability of error as
P(ẼY1j) = P((U(1), X(1, j), Y1) ∈ A(n)ϵ) = ∑_{(u, x, y1) ∈ A(n)ϵ} P(u) P(x|u) P(y1|u)
 ≤ ∑_{(u, x, y1) ∈ A(n)ϵ} 2^{−n(H(U) − ϵ)} 2^{−n(H(X|U) − ϵ)} 2^{−n(H(Y1|U) − ϵ)}
 ≤ 2^{n(H(U, X, Y1) + ϵ)} 2^{−n(H(U) − ϵ)} 2^{−n(H(X|U) − ϵ)} 2^{−n(H(Y1|U) − ϵ)} = 2^{−n(I(X;Y1|U) − 4ϵ)}
p(U, X, Y1) = p(U)p(X|U)p(Y1|U, X) = p(U)p(X|U)p(Y1|U)
because U, X, Y1 form a Markov chain:
U → X → Y1
2^{n(H(U, X, Y1) + ϵ)} 2^{−n(H(U) − ϵ)} 2^{−n(H(X|U) − ϵ)} 2^{−n(H(Y1|U) − ϵ)}
n(H(U, X, Y1) + ϵ) − n(H(U) − ϵ) − n(H(X|U) − ϵ) − n(H(Y1|U) − ϵ) = n(H(U, X, Y1) − H(U) − H(X|U) − H(Y1|U) + 4ϵ)
I(X;Y1|U) = H(X|U) − H(X|U, Y1) = H(Y1|U) − H(Y1|UX)
H(U, X, Y1) − H(U) − H(X|U) − H(Y1|U) + 4ϵ =
 = H(U) + H(X|U) + H(Y1|X, U) − H(U) − H(X|U) − H(Y1|U) + 4ϵ = −(H(Y1|U) − H(Y1|X, U)) + 4ϵ = −I(X;Y1|U) + 4ϵ
Hence, if R1 < I(X;Y1|U), then ∑_{j≠1} P(ẼY1j) ≤ 2^{nR1}·2^{−n(I(X;Y1|U) − 4ϵ)} and the fourth term in the probability of error goes to 0. Thus we can bound the probability of error as
P(n)e(1) ≤ ϵ + ϵ + 2^{nR2}·2^{−n(I(U;Y1) − 3ϵ)} + 2^{nR1}·2^{−n(I(X;Y1|U) − 4ϵ)} ≤ 4ϵ
2^{nR1}·2^{−n(I(X;Y1|U) − 4ϵ)}
R1 < I(X;Y1|U) − 4ϵ < I(X;Y1|U) → R1 < I(X;Y1|U)
but that does not mean that if R1 < I(X;Y1|U) then R1 < I(X;Y1|U) − 4ϵ; the converse does not hold.
if n is large enough, R2 < I(U;Y2) and R1 < I(X;Y1|U). The above bounds show that we can decode the messages with a total probability of error that goes to 0. Hence there exists a sequence of good ((2nR1, 2nR2), n) codes C*n with probability of error going to 0. With this, we complete the proof of the achievability of the capacity region for the degraded broadcast channel. Gallager's [9] proof of the converse is outlined in Problem 15.11.
So far we have considered sending independent information to each receiver. But in certain situations, we wish to send common information to both receivers. Let the rate at which we send common information be R0. Then we have the following obvious theorem:
Theorem 15.6.3
If the rate pair (R1, R2) is achievable for a broadcast channel with independent information, the rate triple (R0, R1 − R0, R2 − R0) with a common rate R0 is achievable, provided that R0 ≤ min(R1, R2).
In the case of a degraded broadcast channel, we can do even better. Since by our coding scheme the better receiver always decodes all the information that is sent to the worse receiver, one need not reduce the amount of information sent to the better receiver when we have common information.
Theorem 15.6.4
If the rate pair (R1R2) is achievable for a degraded broadcast channel, the rate triple (R0, R1, R2 − R0) is achievable for the channel with common information, provided that R0 < R2.
We end this section by considering the example of the binary symmetric broadcast channel.
Example 15.6.5
Consider a pair of binary symmetric channels with parameters p1 and p2 that form a broadcast channel, as shown in 34↓. Without loss of generality in the capacity calculation, we can recast this channel as a physically degraded channel. We assume that p1 < p2 < 1/2. Then we can express a binary symmetric channel with parameter p2 as a cascade of a binary symmetric channel with parameter p1 with another binary symmetric channel. Let the crossover probability of the new channel be α. Then we must have
p1(1 − α) + (1 − p1)α = p2
figure Cascade of 2 BSC.png
Figure 33 Cascade of BSC
figure Figure 15.27 Binary symmettirc broadcast channel.png
Figure 34 Binary symmetric broadcast channel.
or
(113) p1 − αp1 + α − p1α = p2 ⇒ α − 2αp1 = p2 − p1 → α = (p2 − p1)/(1 − 2p1)
We now consider the auxiliary random variable in the definition of the capacity region. In this case, by the cardinality bound of the theorem, |U| ≤ min{|X|, |Y1|, |Y2|} = 2, so U is binary. By symmetry, we connect U to X by another binary symmetric channel with parameter β, as illustrated in 35↓.
figure Figure 15.28 Physically degraded binary symemetric broadcast channel.png
Figure 35 Physically degraded binary symmetric broadcast channel
We can now calculate the rates in the capacity region. It is clear by symmetry that the distribution on U that maximizes the rates is the uniform distribution on {0, 1}, so that
R2 ≤ I(U;Y2) = H(Y2) − H(Y2|U) = 1 − H(β ∗ p2)
After Problem 15.13 it occurred to me that the following inequality can also be written:
R2 ≤ I(U;Y2) ≤ I(X;Y2) = H(Y2) − H(Y2|X) = 1 − H(p2)
where
(114) β ∗ p2 = β(1 − p2) + (1 − β)p2
Similarly,
R1 ≤ I(X;Y1|U) = H(Y1|U) − H(Y1|X, U) = H(Y1|U) − H(Y1|X) = H(β ∗ p1) − H(p1)
where
(115) β ∗ p1 = β(1 − p1) + (1 − β)p1
figure Figure 15.29 Capacity region of binary symmetric broadcast channel.png
Figure 36 Capacity region of binary symmetric broadcast channel
Plotting these points as a function of β, we obtain the capacity region in 36↑. When β = 0, we have maximum information transfer to Y2 (i.e., R2 = 1 − H(p2) and R1 = 0). When β = 1/2, we have maximum information transfer to Y1 (i.e., R1 = 1 − H(p1)) and no information transfer to Y2. These values of β give us the corner points of the rate region.
β = 1/2
R1 ≤ I(X;Y1|U) = H(Y1|U) − H(Y1|X, U) = H(Y1|U) − H(Y1|X) = H(β ∗ p1) − H(p1)
β ∗ p1 = β(1 − p1) + (1 − β)p1
β ∗ p1 = 1/2 − p1/2 + p1/2 = 1/2
R1 = H(1/2) − H(p1) = 1 − H(p1)
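A small Python sketch (my own) that traces boundary points of this region by sweeping β, using the binary entropy function and the binary convolution β ∗ p:

```python
import math

def Hb(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def conv(a, b):
    """Binary convolution a * b = a(1 - b) + (1 - a)b."""
    return a * (1 - b) + (1 - a) * b

def bsc_bc_rates(p1, p2, beta):
    """Boundary point (R1, R2) of the degraded BSC broadcast region for a given beta."""
    return Hb(conv(beta, p1)) - Hb(p1), 1 - Hb(conv(beta, p2))

for beta in (0.0, 0.1, 0.25, 0.5):      # beta = 0: all rate to Y2; beta = 1/2: all to Y1
    print(beta, bsc_bc_rates(0.1, 0.2, beta))
```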
figure Figure 15.30 Gaussian broadcast channel.png
Figure 37 Gaussian broadcast channel
Example 15.6.6 (Gaussian broadcast channel)
The Gaussian broadcast channel is illustrated in 37↑. We have shown it in the case where one output is a degraded version of the other output. Based on the results of Problem 15.10, it follows that all scalar Gaussian broadcast channels are equivalent to this type of degraded channel.
(116) Y1 = X + Z1
(117) Y2 = X + Z2 = Y1 + Z2'
where Z1 ~ N(0, N1) and Z2' ~ N(0, N2 − N1).
Extending the results of this section to the Gaussian case, we can show that the capacity region of this channel is given by:
(118) R1 < C(αP/N1)
(119) R2 < C((1 − α)P/(αP + N2))
where α may be arbitrarily chosen (0 ≤ α ≤ 1). The coding scheme that achieves this capacity region is outlined in Section 15.1.3.
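Similarly, a short Python sketch (my own) of the boundary of the Gaussian broadcast-channel region obtained by sweeping the power split α, assuming N1 < N2:

```python
import math

def C(x):
    return 0.5 * math.log2(1 + x)

def gaussian_bc_boundary(P, N1, N2, alphas):
    """(R1, R2) pairs on the boundary for each power split alpha (N1 < N2 assumed)."""
    return [(C(a * P / N1), C((1 - a) * P / (a * P + N2))) for a in alphas]

print(gaussian_bc_boundary(10.0, 1.0, 4.0, [0.0, 0.25, 0.5, 0.75, 1.0]))
```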

1.7 Relay Channel

The relay channel is a channel in which there is one sender and one receiver with a number of intermediate nodes that act as relays to help the communication from the sender to the receiver. The simplest relay channel has only one intermediate or relay node. In this case the channel consists of four finite sets X, X1, Y, and Y1 and a collection of probability mass functions p(y, y1|x, x1) on Y x Y1, one for each (x, x1) ∈ X x X1. The interpretation is that x is the input to the channel and y is the output of the channel, y1 is the relay's observation, and x1 is the input symbol chosen by the relay, as shown in 38↓. The problem is to find the capacity of the channel between the sender X and the receiver Y.
The relay channel combines a broadcast channel (X to Y and Y1) and a multiple-access channel (X and X1 to Y). The capacity is known for the special case of the physically degraded relay channel. We fist prove and outer bound on the capacity of a general relay channel and later establish an achievable region for the degraded relay channel.
figure Figure 15.31 Relay channel.png
Figure 38 Relay Channel
Definition
A (2^{nR}, n) code for a relay channel consists of a set of integers W = {1, 2, ..., 2^{nR}}, an encoding function
(120) X: {1, 2, ..., 2^{nR}} → X^n,
a set of relay functions {f_i}_{i=1}^n such that
x_{1i} = f_i(Y_{11}, Y_{12}, ..., Y_{1,i−1}),  1 ≤ i ≤ n,
and a decoding function
(121) g: Y^n → {1, 2, ..., 2^{nR}}.
Note that the definition of the encoding functions includes the nonanticipatory condition on the relay. The relay channel input is allowed to depend only on the past observations y_{11}, y_{12}, ..., y_{1,i−1}. The channel is memoryless in the sense that (Y_i, Y_{1i}) depends on the past only through the current transmitted symbols (X_i, X_{1i}). Thus, for any choice of p(w), w ∈ W, code X: {1, 2, ..., 2^{nR}} → X^n, and relay functions {f_i}_{i=1}^n, the joint probability mass function on W × X^n × X_1^n × Y^n × Y_1^n is given by
(122) p(w, x, x1, y, y1) = p(w) \prod_{i=1}^n p(x_i|w) p(x_{1i}|y_{11}, y_{12}, ..., y_{1,i−1}) p(y_i, y_{1i}|x_i, x_{1i})
If the message w ∈ {1, 2, ..., 2^{nR}} is sent, let
λ(w) = Pr{g(Y^n) ≠ w | w sent}
denote the conditional probability of error. We define the average probability of error of the code as
(123) P_e^{(n)} = (1/2^{nR}) \sum_w λ(w)
The probability of error is calculated under the uniform distribution over the codewords w ∈ {1, ..., 2^{nR}}. The rate R is said to be achievable by the relay channel if there exists a sequence of (2^{nR}, n) codes with P_e^{(n)} → 0. The capacity C of a relay channel is the supremum of the set of achievable rates.
We first give an upper bound on the capacity of the relay channel.
Theorem 15.7.1
For any relay channel (X × X1, p(y, y1|x, x1), Y × Y1), the capacity C is bounded above by
(124) C ≤ sup_{p(x, x1)} min{ I(X, X1;Y), I(X;Y, Y1|X1) }
Proof:
The proof is a direct consequence of a more general max-flow min-cut theorem given in Section 15.10.
This upper bound has a nice max-flow min-cut interpretation. (I only understood this now, after having previously gone through the max-flow min-cut algorithm in Ford's book!) The first term in 124↑ upper bounds the maximum rate of information transfer from senders X and X1 to receiver Y. The second term bounds the rate from X to Y and Y1.
We now consider a family of relay channels in which the relay receiver is better than the ultimate receiver Y, in a sense defined below. Here the max-flow min-cut upper bound in 124↑ is achieved.
Definition (degraded relay channel)
The relay channel (X × X1, p(y, y1|x, x1), Y × Y1) is said to be physically degraded if p(y, y1|x, x1) can be written in the form
(125) p(y, y1|x, x1) = p(y1|x, x1)p(y|y1, x1)
Thus, Y is a random degradation of the relay signal Y1.
Broadcast channel:
p(y1, y2|x) = p(y1|x)p(y2|y1)
For the physically degraded relay channel, the capacity is given by the following theorem.
Theorem 15.7.2
The capacity C of a physically degraded relay channel is given by:
C = sup_{p(x, x1)} min{ I(X, X1;Y), I(X;Y1|X1) }
where the supremum is over all joint distributions on XxX1.
Do not be surprised by the second term. From the chapter on the multiple-access channel it was clear that I(X1;Y|X2) ≥ I(X1;Y),
i.e., the following was proved there (for independent X1, X2):
I(X2;Y|X1) = H(X2|X1) − H(X2|Y, X1) = H(X2) − H(X2|Y, X1) = I(X2;Y, X1) = I(X2;Y) + I(X2;X1|Y) ≥ I(X2;Y)
Proof:
Converse:
The proof follows from Theorem 15.7.1 and from degradedness, since for a degraded relay channel I(X;Y, Y1|X1) = I(X;Y1|X1):
p(y, y1|x, x1) = p(y1|x, x1)p(y|y1, x1)
I(X;Y, Y1|X1) = H(Y, Y1|X1) − H(Y, Y1|X, X1) = H(Y1|X1) + H(Y|Y1, X1) − H(Y1|X, X1) − H(Y|Y1, X, X1)
= H(Y1|X1) + H(Y|Y1, X1) − H(Y1|X, X1) − H(Y|Y1, X1) = H(Y1|X1) − H(Y1|X, X1) = I(X;Y1|X1),
where the cancellation uses the degradedness condition H(Y|Y1, X, X1) = H(Y|Y1, X1).
I have proved this similarly in the Capacity Theorem notebook. There I also noted that the degradedness can be described by the Markov chain (in this chapter's notation)
X → (X1, Y1) → Y
Achievability:
The proof of achievability involves a combination of the following basic techniques: (1) random coding, (2) list codes, (3) Slepian-Wolf partitioning, (4) coding for the cooperative multiple-access channel, (5) superposition coding, and (6) block Markov encoding at the relay and transmitter. We provide only an outline of the proof.
Outline of Achievability:
We consider B blocks of transmission, each of n symbols. A sequence of B − 1 indices, wi ∈ {1, ...2nR},  i = 1, 2, ..., B − 1, will be sent over the channel in nB transmissions. (Note that as B → ∞ for a fixed n, the rate R(B − 1) ⁄ B is arbitrarily close to R.)
We define a doubly indexed set of codewords:
(126) C = {(x(w|s), x1(s)) : w ∈ {1, ..., 2^{nR}}, s ∈ {1, ..., 2^{nR0}}},  x ∈ X^n, x1 ∈ X_1^n.
We will also need a partition
S = {S_1, S_2, ..., S_{2^{nR0}}} of W = {1, 2, ..., 2^{nR}}
into 2^{nR0} cells, with S_i ∩ S_j = ∅ for i ≠ j and ∪_i S_i = W. The partition will enable us to send side information to the receiver in the manner of Slepian and Wolf [7].
Generation of random code:
Fix p(x1)p(x|x1).
First generate at random 2^{nR0} i.i.d. n-sequences in X_1^n, each drawn according to p(x1) = \prod_{i=1}^n p(x_{1i}). Index them as x1(s), s ∈ {1, 2, ..., 2^{nR0}}. For each x1(s), generate 2^{nR} conditionally independent n-sequences x(w|s), w ∈ {1, 2, ..., 2^{nR}}, drawn independently according to p(x|x1(s)) = \prod_{i=1}^n p(x_i|x_{1i}(s)). This defines the random codebook C = {x(w|s), x1(s)}. (If x1 is a vector indexed by s, then x is a two-dimensional array indexed by both w and s.) The random partition S = {S_1, S_2, ..., S_{2^{nR0}}} of {1, 2, ..., 2^{nR}} is defined as follows: each integer w ∈ {1, 2, ..., 2^{nR}} is assigned independently, according to a uniform distribution over the indices s = 1, 2, ..., 2^{nR0}, to a cell S_s.
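A toy-sized sketch of this construction with assumed blocklength, rates, and input distributions (the binary alphabets, the 0.5 and 0.9 probabilities, and the sizes below are illustrative, not from the text); it builds the x1(s) codewords, the conditional codewords x(w|s), and the random partition of the message set:

```python
# Sketch of the random code construction for the relay channel (toy sizes).
import random

random.seed(0)
n, R, R0 = 8, 0.5, 0.25
M, M0 = 2 ** int(n * R), 2 ** int(n * R0)      # |W| = 2^{nR} messages, 2^{nR0} cells

# x1(s) drawn i.i.d. from an assumed symmetric p(x1).
x1_code = {s: [int(random.random() < 0.5) for _ in range(n)] for s in range(M0)}

def draw_x_given_x1(x1):
    """Assumed p(x|x1): copy each relay bit with probability 0.9."""
    return [b if random.random() < 0.9 else 1 - b for b in x1]

# For each x1(s), 2^{nR} conditionally independent codewords x(w|s).
x_code = {(w, s): draw_x_given_x1(x1_code[s]) for w in range(M) for s in range(M0)}

# Random partition of {0, ..., 2^{nR}-1} into 2^{nR0} cells S_s (Slepian-Wolf binning).
cell_of = {w: random.randrange(M0) for w in range(M)}
S = {s: [w for w in range(M) if cell_of[w] == s] for s in range(M0)}
print("cell sizes:", [len(S[s]) for s in range(M0)])   # about 2^{n(R-R0)} = 4 each on average
```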
Encoding:
Let w_i ∈ {1, 2, ..., 2^{nR}} be the new index to be sent in block i, and let s_i be the index of the partition cell containing w_{i−1} (i.e., w_{i−1} ∈ S_{s_i}). The encoder sends x(w_i|s_i). The relay has an estimate ŵ_{i−1} of the previous index w_{i−1}. (This will be made precise in the decoding section.) Assume that ŵ_{i−1} ∈ S_{ŝ_i}. The relay encoder sends x1(ŝ_i) in block i.
Decoding:
We assume that at the end of block i − 1, the receiver knows (w1, w2, ...wi − 2) and (s1, s2, ..., si − 1) and the relay knows (w1, w2, ..., wi − 1) and consequently (s1, s2, ..., si). The decoding procedures at the end of block i are as follows:
1. Knowing s_i and upon receiving y1(i), the relay receiver estimates the message of the transmitter by ŵ_i = w if and only if there exists a unique w such that (x(w|s_i), x1(s_i), y1(i)) are jointly ϵ-typical. Using Theorem 15.2.3, it can be shown that ŵ_i = w_i with an arbitrarily small probability of error if
R < I(X;Y1|X1)
and n is sufficiently large.
2. The receiver declares that ŝ_i = s was sent iff there exists one and only one s such that (x1(s), y(i)) are jointly ϵ-typical. From Theorem 15.2.1 we know that s_i can be decoded with arbitrarily small probability of error if
(127) R0 < I(X1;Y)
and n is sufficiently large.
3. Assuming that s_i is decoded correctly at the receiver, the receiver constructs a list ℒ(y(i − 1)) of indices w that it considers to be jointly typical with y(i − 1) in the (i − 1)-th block. The receiver then declares ŵ_{i−1} = w as the index sent in block i − 1 if there is a unique w in S_{ŝ_i} ∩ ℒ(y(i − 1)). If n is sufficiently large and if
(128) R < I(X;Y|X1) + R0
then ŵ_{i−1} = w_{i−1} with arbitrarily small probability of error. Combining the two constraints 127↑ and 128↑, R0 drops out, leaving
R < I(X;Y|X1) + I(X1;Y) = I(X, X1;Y)
For a detailed analysis of the probability of error, the reader is referred to [10].
Theorem 15.7.2, C = sup_{p(x, x1)} min{I(X, X1;Y), I(X;Y1|X1)}, can also be shown to be the capacity for the following classes of relay channels:
1. Reversely degraded relay channel, that is (recall that the ordinary degraded relay channel had p(y, y1|x, x1) = p(y1|x, x1)p(y|y1, x1)),
p(y, y1|x, x1) = p(y|x, x1)p(y1|y, x1)
2. Relay channel with feedback
3. Deterministic relay channel,
(129) y1 = f(x, x1),  y = g(x, x1)

1.8 Source coding with side information

We now consider the distributed source coding problem in which two random variables X and Y are encoded separately but only X is to be recovered. We ask how many bits R1 are required to describe X if we are allowed R2 bits to describe Y. If R2 > H(Y), then Y can be described perfectly, and by the results of Slepian-Wolf coding, R1 = H(X|Y) bits suffice to describe X. At the other extreme, if R2 = 0, we must describe X without any help, and R1 = H(X) bits are then necessary. In general, we use R2 = I(Y;Ŷ) bits to describe an approximate version Ŷ of Y [I(Y;Ŷ) = H(Y) − H(Y|Ŷ)]. This will allow us to describe X using H(X|Ŷ) bits in the presence of the side information Ŷ. The following theorem is consistent with this intuition.
Theorem 15.8.1
Let (X, Y) ~ p(x, y). If Y is encoded at rate R2 and X is encoded at rate R1, we can recover X with an arbitrarily small probability of error if and only if
R1 ≥ H(X|U), 
R2 ≥ I(Y;U)
for some joint probability mass function p(x, y)p(u|y), where |U| ≤ |Y| + 2.
p(x, y, u) = p(x, y)p(u|x, y) = p(x, y)p(u|y)
From this form of the joint probability it follows that X, Y, U form a Markov chain:
X → Y → U
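A numeric sanity check of this rate region for an assumed doubly symmetric binary example (not from the text): Y ~ Bern(1/2), X = Y ⊕ Bern(q), and U obtained by passing Y through a BSC(β), so that X → Y → U holds:

```python
# Sanity check of Theorem 15.8.1 rates for an assumed binary example.
import math

def H(p):
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def conv(a, b):
    """Binary convolution a(1-b) + (1-a)b."""
    return a * (1 - b) + (1 - a) * b

q, beta = 0.1, 0.2                  # assumed noise and test-channel parameters
R2_min = 1 - H(beta)                # I(Y;U) = H(U) - H(U|Y) = 1 - H(beta)
R1_min = H(conv(beta, q))           # H(X|U): X xor U ~ Bern(beta*q), independent of U
print(f"R2 >= I(Y;U) = {R2_min:.3f},  R1 >= H(X|U) = {R1_min:.3f}")
# beta = 0 gives (R1, R2) = (H(q), 1) = (H(X|Y), H(Y)); beta = 1/2 gives (1, 0) = (H(X), 0),
# matching the two extremes discussed above.
```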
We prove this theorem in two parts. We begin with the converse, in which we show that for any encoding scheme with a small probability of error, we can find a random variable U with a joint probability mass function as in the theorem.
Proof: (Converse)
Consider any source code for 39↓.
figure Figure 15.32 Encoding with side information.png
Figure 39 Encoding with side information
The source code consists of mappings fn(Xn) and gn(Yn) such that the rates of fn and gn are less than R1 and R2, respectively, and a decoding mapping hn such that:
P_e^{(n)} = Pr{h_n(f_n(X^n), g_n(Y^n)) ≠ X^n} < ϵ.
Define new random variables S = f_n(X^n) and T = g_n(Y^n). Then since we can recover X^n from S and T with low probability of error, we have, by Fano’s inequality,
(130) H(Xn|S, T) ≤ nϵ
Then
(131) nR2 \overset{(a)}{≥} H(T) \overset{(b)}{≥} I(Y^n;T) = \sum_{i=1}^n I(Y_i;T|Y_1, ..., Y_{i−1}) \overset{(c)}{=} \sum_{i=1}^n I(Y_i;T, Y_1, ..., Y_{i−1}) \overset{(d)}{=} \sum_{i=1}^n I(Y_i;U_i)
A note on why nR2 ≥ H(T): T = g_n(Y^n) takes at most 2^{nR2} values, so H(T) ≤ log 2^{nR2} = nR2.
(a) follows from the fact that the range of g_n is {1, 2, ..., 2^{nR2}}
(b) follows from the properties of mutual information: I(Y^n;T) = H(T) − H(T|Y^n) ≤ H(T)
(c) follows from the chain rule and the fact that Y_i is independent of Y_1, ..., Y_{i−1}, and hence I(Y_i;Y_1, ..., Y_{i−1}) = 0:
I(Y_i;T, Y_1, ..., Y_{i−1}) = I(Y_i;Y_1^{i−1}) + I(Y_i;T|Y_1, ..., Y_{i−1}) = I(Y_i;T|Y_1, ..., Y_{i−1})
(d) follows if we define U_i = (T, Y_1, ..., Y_{i−1})
We also have another chain for R1
nR1 \overset{(a)}{≥} H(S) \overset{(b)}{≥} H(S|T) = H(S|T) + H(X^n|S, T) − H(X^n|S, T) \overset{(c)}{≥} H(S, X^n|T) − nϵ_n \overset{(d)}{=} H(X^n|T) + H(S|X^n, T) − nϵ_n = H(X^n|T) − nϵ_n  [since H(S|X^n, T) = H(f_n(X^n)|X^n, T) = 0]
\overset{(e)}{=} \sum_{i=1}^n H(X_i|T, X_1, ..., X_{i−1}) − nϵ_n \overset{(f)}{≥} \sum_{i=1}^n H(X_i|T, X^{i−1}, Y^{i−1}) − nϵ_n
\overset{(g)}{=} \sum_{i=1}^n H(X_i|T, Y^{i−1}) − nϵ_n \overset{(h)}{=} \sum_{i=1}^n H(X_i|U_i) − nϵ_n,
(a) follows from the fact that the range of S is {1, 2, ..., 2^{nR1}}
(If you are wondering why nR1 ≥ H(S): S = f_n(X^n) takes at most 2^{nR1} values, hence H(S) ≤ nR1.)
(b) follows from the fact that conditioning reduces entropy
(c) follows from Fano’s inequality and the chain rule H(S|T) + H(X^n|S, T) = H(S, X^n|T)
(d) follows from the chain rule and the fact that S is a function of X^n
(e) follows from the chain rule for entropy
(f) follows from the fact that conditioning reduces entropy
(g) follows from the (subtle) fact that X_i → (T, Y^{i−1}) → X^{i−1} forms a Markov chain, since X_i does not contain any information about X^{i−1} that is not already in Y^{i−1} and T = g_n(Y^n). (What a fundamental characterization of a Markov chain!)
Some time ago I showed that the Markov property, stated in the forward direction, also holds in reverse, i.e.,
H(X_i|T, Y^{i−1}, X^{i−1}) = H(X_i|T, Y^{i−1})
(h) follows from the definition of U_i
Also, since X_i contains no more information about U_i than is present in Y_i, it follows that X_i → Y_i → U_i forms a Markov chain. Thus we have the following inequalities:
(132) R1 ≥ (1/n) \sum_{i=1}^n H(X_i|U_i)
(133) R2 ≥ (1/n) \sum_{i=1}^n I(Y_i;U_i)
We now introduce a time-sharing random variable Q so that we can rewrite these equations as:
(134) R1 ≥ (1/n) \sum_{i=1}^n H(X_i|U_i, Q = i) = H(X_Q|U_Q, Q)
(135) R2 ≥ (1/n) \sum_{i=1}^n I(Y_i;U_i|Q = i) = I(Y_Q;U_Q|Q)
Now since Q is independent of YQ (The distribution of Yi does not depend on i), we have
(136) I(Y_Q;U_Q|Q) = I(Y_Q;U_Q, Q) − I(Y_Q;Q) = I(Y_Q;U_Q, Q), since I(Y_Q;Q) = 0.
Now X_Q and Y_Q have the joint distribution p(x, y) of the theorem. Defining U = (U_Q, Q), X = X_Q, and Y = Y_Q, we have shown the existence of a random variable U such that
(137) R1 ≥ H(X|U)
(138) R2 ≥ I(Y;U)
for any encoding scheme that has a low probability of error. Thus, the converse is proved.
Strong Typicality
Before we proceed to the proof of the achievability of this pair of rates, we will need a new lemma about strong typicality and Markov chains. Recall the definition of strong typicality for a triple of random variables X, Y and Z. A triplet of sequences xn, yn, zn is said to be ϵ-strongly typical if (Chapter 10)
(139) |(1/n)N(a, b, c|x^n, y^n, z^n) − p(a, b, c)| < ϵ/(|X||Y||Z|)
In particular, this implies that (xn, yn) are jointly strongly typical and that (yn, zn) are also jointly strongly typical. But the converse is not true:
The fact that (x^n, y^n) ∈ A*(n)ϵ(X, Y) and (y^n, z^n) ∈ A*(n)ϵ(Y, Z) does not in general imply that (x^n, y^n, z^n) ∈ A*(n)ϵ(X, Y, Z). But if X → Y → Z forms a Markov chain, this implication is true. We state this as a lemma without proof [1] [11].
Lemma 15.8.1 (Markov Lemma)
Let (X, Y, Z) form a Markov chain X → Y → Z [i.e., p(x, y, z) = p(x, y)p(z|y)]. If, for a given (y^n, z^n) ∈ A*(n)ϵ(Y, Z), X^n is drawn ~ \prod_{i=1}^n p(x_i|y_i), then Pr{(X^n, y^n, z^n) ∈ A*(n)ϵ(X, Y, Z)} > 1 − ϵ for n sufficiently large.
p(x, z|y) = p(x, y, z)/p(y) = p(x, y)p(z|x, y)/p(y) = p(x|y)p(z|x, y), so the Markov condition p(x, z|y) = p(x|y)p(z|y) is equivalent to p(z|x, y) = p(z|y), i.e., to p(x, y, z) = p(x, y)p(z|y).
Remark
The theorem is true from the strong law of large numbers if X^n ~ \prod_{i=1}^n p(x_i|y_i, z_i). The Markovity of X → Y → Z is used to show that X^n ~ \prod_{i=1}^n p(x_i|y_i) is sufficient for the same conclusion.
We now outline the proof of achievability in Theorem 15.8.1
Proof: (Achievability in Theorem 15.8.1)
Fix p(u|y). Calculate p(u) = \sum_y p(y)p(u|y).
Generation of code-books:
Generate 2^{nR2} independent codewords of length n, U^n(w_2), w_2 ∈ {1, 2, ..., 2^{nR2}}, according to \prod_{i=1}^n p(u_i). Randomly bin all the X^n sequences into 2^{nR1} bins by independently generating an index b distributed uniformly on {1, 2, ..., 2^{nR1}} for each X^n. Let B(i) denote the set of X^n sequences allotted to bin i.
Encoding:
The X sender sends the index i of the bin in which X^n falls. The Y sender looks for an index s such that (Y^n, U^n(s)) ∈ A(n)ϵ(Y, U). (Keep in mind that in the converse U_i = (T, Y_1, ..., Y_{i−1}).) If there is more than one such s, it sends the least. If there is no such U^n(s) in the codebook, it sends s = 1. (This procedure looks very much like the encoding and decoding method used for the relay channel; one might say that the relay channel uses a form of encoding with side information.)
Decoding:
The receiver looks for a unique X^n ∈ B(i) such that (X^n, U^n(s)) ∈ A*(n)ϵ(X, U). If there is none or more than one, it declares an error.
Analysis of the probability of error:
The various sources of error are as follows:
1. The pair (Xn, Yn) generated by the source is not typical. The probability of this is small if n is large. Hence, without loss of generality, we can condition on the event that the source produces a particular typical sequence (xnyn) ∈ A*(n)ϵ.
2. The sequence Y^n is typical, but there does not exist a U^n(s) in the codebook that is jointly typical with it. The probability of this is small by the arguments of Section 10.6, where we showed that if there are enough codewords, that is, if
(140) R2 ≥ I(Y;U)
it is very likely that we can find a codeword that is jointly strongly typical with the given source sequence. (I could not find this in Section 10.6; I need to go back and review that whole section.)
3. The codeword U^n(s) is jointly typical with y^n but not with x^n. But by Lemma 15.8.1, the probability of this is small since X → Y → U forms a Markov chain.
4. We also have an error if there exists another typical X^n ∈ B(i) that is jointly typical with U^n(s). The probability that any other X^n is jointly typical with U^n(s) is less than 2^{−n(I(U;X) − 3ϵ)} (Theorem 7.6.1), and therefore the probability of this kind of error is bounded above by
(141) |B(i) ∩ A*(n)ϵ(X)| 2^{−n(I(X;U) − 3ϵ)} ≤ 2^{n(H(X) + ϵ)} 2^{−nR1} 2^{−n(I(X;U) − 3ϵ)}
which goes to 0 if R1 ≥ H(X|U).
Hence, it is likely that the actual source sequence Xn is jointly typical with Un(s) and that no other typical source sequence in the same bin is also jointly typical with Un(s). We can achieve an arbitrarily low probability of error with an appropriate choice of n and ϵ, and this completes the proof of achievability.

1.9 Rate Distortion with side information

We know that R(D) bits are sufficient to describe X with distortion D. (This reminded me of the rate distortion function of a binary source, R(D) = H(p) − H(D): with zero distortion the source is described with H(p) bits, and if some distortion is acceptable, fewer bits suffice.) We now ask how many bits are required given side information Y. We begin with a few definitions. Let (X_i, Y_i) be i.i.d. ~ p(x, y) and encoded as shown in 40↓.
figure Figure 15.33 Rate distortion with sde information.png
Figure 40 Rate distortion with side information
Definition
The rate distortion function with side information R_Y(D) is defined as the minimum rate required to achieve distortion D if the side information Y is available to the decoder. Precisely, R_Y(D) is the infimum of rates R such that there exist maps i_n: X^n → {1, 2, ..., 2^{nR}}, g_n: Y^n × {1, ..., 2^{nR}} → X̂^n such that
(142) lim_{n → ∞} E d(X^n, g_n(Y^n, i_n(X^n))) ≤ D
Clearly, since the side information can only help, we have R_Y(D) ≤ R(D). For the case of zero distortion, this is the Slepian-Wolf problem, and we will need H(X|Y) bits. Hence, R_Y(0) = H(X|Y). We wish to determine the entire curve R_Y(D). The result can be expressed in the following theorem.
Theorem 15.9.1 [Rate distortion with side information (Wyner and Ziv)]
Let (X, Y) be drawn i.i.d. ~ p(x, y) and let d(x^n, x̂^n) = (1/n) \sum_{i=1}^n d(x_i, x̂_i) be given. The rate distortion function with side information is
(143) R_Y(D) = min_{p(w|x)} min_f (I(X;W) − I(Y;W))
where the minimization is over all functions f: Y × W → X̂ and conditional probability mass functions p(w|x), |W| ≤ |X| + 1, such that
(144) \sum_x \sum_w \sum_y p(x, y) p(w|x) d(x, f(y, w)) ≤ D
The function f in the theorem corresponds to the decoding map that maps the encoded version of the X symbols and the side information Y to the output alphabet. We minimize over all conditional distributions on W and functions f such that the expected distortion for the joint distribution is less than D.
We first prove the converse after considering some of the properties of the function RY(D) defined in 143↑.
Lemma 15.9.1
The rate distortion function with side information R_Y(D) defined in 143↑ is a nonincreasing convex function of D.
Proof:
The monotonicity of RY(D) follows immediately from the fact that the domain of minimization in the definition of RY(D) increases with D. As in the case of rate distortion without side information, we expect RY(D) to be convex. However, the proof of convexity is more involved because of the double rather than single minimization in the definition of RY(D) in 143↑. We outline the proof here.
Let D1 and D2 be two values of the distortion and let W1,  f1 and W2,  f2 be the corresponding random variables and functions that achieve the minima in the definitions of RY(D1) and RY(D2), respectively. Let Q be a random variable independent of X, Y, W1 and W2 which takes on the value 1 with probability λ and value 2 with probability 1 − λ.
Define W = (Q, WQ) and let f(W, Y) = fQ(WQ, Y). Specifically, f(W, Y) = f1(W1, Y) with probability λ and f(W, Y) = f2(W2, Y) with probability 1 − λ. Then the distortion becomes
(145) D = Ed(X, ) = λEd(X, f1(W1, Y)) + (1 − λ)Ed(X, f2(W2, Y)) = λD1 + (1 − λ)D2
and 143↑ becomes
I(X;W) − I(Y;W) = H(X) − H(X|W) − H(Y) + H(Y|W) = H(X) − H(X|WQ, Q) − H(Y) + H(Y|WQ, Q) = 
 = H(X) − λH(X|W1) − (1 − λ)H(X|W2) − H(Y) + λH(Y|W1) + (1 − λ)H(Y|W2) = 
 = H(X) − λH(X|W1) − H(Y) + λH(Y|W1) − (1 − λ)H(X|W2) + (1 − λ)H(Y|W2) = 
 = λH(X) + (1 − λ)H(X) − λH(X|W1) − λH(Y) − (1 − λ)H(Y) + λH(Y|W1) − (1 − λ)H(X|W2) + (1 − λ)H(Y|W2) = 
 = λH(X) − λH(X|W1) − λH(Y) + λH(Y|W1) + (1 − λ)H(X) − (1 − λ)H(X|W2) − (1 − λ)H(Y) + (1 − λ)H(Y|W2) = 
 = λ(H(X) − H(X|W1) − H(Y) + H(Y|W1)) + (1 − λ)(H(X) − H(X|W2) − H(Y) + H(Y|W2)) = 
(146)  = λ(I(W1;X) − I(W1;Y)) + (1 − λ)(I(W2;X) − I(W2;Y))
and hence
RY(D) = minU:Ed ≤ D(I(U;X) − I(U;Y)) ≤ I(W;X) − I(W;Y) = λ(I(W1;X) − I(W1;Y)) + (1 − λ)(I(W2;X) − I(W2;Y)) = 
(147)  = λRY(D1) + (1 − λ)RY(D2), 
proving the convexity of RY(D).
We are now in position to prove the converse to the conditional rate distortion theorem.
Proof: (Converse to Theorem 15.9.1)
Consider any rate distortion code with side information. Let the encoding function be f_n: X^n → {1, 2, ..., 2^{nR}}. (Here I noticed that for rate distortion problems the role of the encoder in the definition is the reverse of, for example, a channel coding problem.) Let the decoding function be g_n: Y^n × {1, 2, ..., 2^{nR}} → X̂^n, and let g_{ni}: Y^n × {1, 2, ..., 2^{nR}} → X̂ denote the i-th symbol produced by the decoding function. Let T = f_n(X^n) denote the encoded version of X^n. We must show that if E d(X^n, g_n(Y^n, f_n(X^n))) < D, then R ≥ R_Y(D). We have the following chain of inequalities:
nR \overset{(a)}{≥} H(T) \overset{(b)}{≥} H(T|Y^n) ≥ I(X^n;T|Y^n) \overset{(c)}{=} \sum_{i=1}^n I(X_i;T|Y^n, X^{i−1}) = \sum_{i=1}^n [H(X_i|Y^n, X^{i−1}) − H(X_i|T, Y^n, X^{i−1})]
\overset{(d)}{=} \sum_{i=1}^n [H(X_i|Y_i) − H(X_i|T, Y^{i−1}, Y_i, Y_{i+1}^n, X^{i−1})] \overset{(e)}{≥} \sum_{i=1}^n [H(X_i|Y_i) − H(X_i|T, Y^{i−1}, Y_i, Y_{i+1}^n)] \overset{(f)}{=} \sum_{i=1}^n [H(X_i|Y_i) − H(X_i|W_i, Y_i)]
\overset{(g)}{=} \sum_{i=1}^n I(X_i;W_i|Y_i) = \sum_{i=1}^n [H(W_i|Y_i) − H(W_i|X_i, Y_i)] \overset{(h)}{=} \sum_{i=1}^n [H(W_i|Y_i) − H(W_i|X_i)] = \sum_{i=1}^n [H(W_i) − H(W_i|X_i) − H(W_i) + H(W_i|Y_i)]
= \sum_{i=1}^n [I(W_i;X_i) − I(W_i;Y_i)] \overset{(i)}{≥} \sum_{i=1}^n R_Y(E d(X_i, g’_{ni}(W_i, Y_i))) = n (1/n) \sum_{i=1}^n R_Y(E d(X_i, g’_{ni}(W_i, Y_i))) \overset{(j)}{≥} n R_Y((1/n) \sum_{i=1}^n E d(X_i, g’_{ni}(W_i, Y_i))) \overset{(k)}{≥} n R_Y(D)
(a) follows from the fact that the range of T is {1, 2, ..., 2^{nR}}
(b) follows from the fact that conditioning reduces entropy
(c) follows from the chain rule for mutual information
(d) follows from the fact that X_i is independent of the past and future Y’s and X’s given Y_i
(e) follows from the fact that conditioning reduces entropy
(f) follows by defining W_i = (T, Y^{i−1}, Y_{i+1}^n)
(g) follows from the definition of mutual information
(h) follows from the fact that, since Y_i depends only on X_i and is conditionally independent of T and the past and future Y’s, W_i → X_i → Y_i forms a Markov chain (recall that the Markov property also holds in the reverse direction)
(i) follows from the information (conditional) rate distortion function, since X̂_i = g_{ni}(T, Y^n) can be written as g’_{ni}(W_i, Y_i), and hence
I(W_i;X_i) − I(W_i;Y_i) ≥ min_{W: E d(X, X̂) ≤ D_i} [I(W;X) − I(W;Y)] = R_Y(D_i)
(j) follows from Jensen’s inequality and the convexity of the conditional rate distortion function (Lemma 15.9.1)
(k) follows from the definition D = E d(X^n, X̂^n) = E[(1/n) \sum_{i=1}^n d(X_i, X̂_i)] and the fact that R_Y(·) is nonincreasing
It is easy to see the parallels between this converse and the converse for rate distortion without side information (Section 10.4). The proof of achievability is also parallel to the proof of the rate distortion theorem using strong typicality. However, instead of sending the index of the codeword that is jointly typical with the source, we divide these code words into bins and send the bin index instead. If the number of codewords in each bin is small enough, the side information can be used to isolate the particular codeword in the bin at the receiver. Hence again we are combining random binning with rate distortion encoding to find a jointly typical reproduction codeword. We outline the details of the proof below.
Proof: (Achievability of Theorem 15.9.1)
Fix p(w|x) and the function f(w, y). Calculate p(w) = \sum_x p(x)p(w|x).
Generation of codebook:
Let R1 = I(X;W) + ϵ. Generate 2^{nR1} i.i.d. codewords W^n(s) ~ \prod_{i=1}^n p(w_i), and index them by s ∈ {1, 2, ..., 2^{nR1}}. Let R2 = I(X;W) − I(Y;W) + 5ϵ. Randomly assign the indices s ∈ {1, 2, ..., 2^{nR1}} to one of 2^{nR2} bins using a uniform distribution over the bins. Let B(i) denote the indices assigned to bin i. There are approximately 2^{n(R1 − R2)} indices in each bin.
Example (take n = 1 for illustration):
2^{R1} = 4, 2^{R2} = 2 (R1 = 2, R2 = 1):
bin 1: s1, s3    bin 2: s2, s4
Indices in each bin: 2^{n(R1 − R2)} = 2^{2 − 1} = 2.
2^{R1} = 8, 2^{R2} = 2 (R1 = 3, R2 = 1):
bin 1: s1, s3, s5, s7    bin 2: s2, s4, s6, s8
Indices in each bin: 2^{n(R1 − R2)} = 2^{3 − 1} = 4.
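A quick sketch of the same binning at a slightly larger (still assumed) size, checking that the average bin occupancy is about 2^{n(R1 − R2)}:

```python
# Sketch: assign 2^{nR1} codeword indices uniformly to 2^{nR2} bins and check bin sizes.
import random
from collections import Counter

random.seed(1)
n, R1, R2 = 8, 1.0, 0.5                       # assumed toy parameters
num_indices, num_bins = 2 ** int(n * R1), 2 ** int(n * R2)

bin_of = {s: random.randrange(num_bins) for s in range(num_indices)}
sizes = Counter(bin_of.values())
avg = sum(sizes.values()) / num_bins
print(f"indices={num_indices}, bins={num_bins}, average bin size={avg:.1f} "
      f"(expected 2^(n(R1-R2)) = {2 ** int(n * (R1 - R2))})")
```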
Encoding:
Given a source sequence Xn , the encoder looks for a codeword Wn(s) such that (Xn, Wn(s)) ∈ A*(n)ϵ. If there is no such Wn, the encoder sets s = 1. If there is more than one such s, the encoder uses the lowest s. The encoder sends the index of the bin in which s belongs.
Decoding:
The decoder looks for W^n(s) such that s ∈ B(i) and (W^n(s), Y^n) ∈ A*(n)ϵ. If it finds a unique s, it then calculates X̂^n, where X̂_i = f(W_i, Y_i). If it does not find any such s, or finds more than one, it sets X̂^n = x̂^n, where x̂^n is an arbitrary sequence in X̂^n. It does not matter which default sequence is used; we will show that the probability of this event is small.
Analysis of probability of error:
As usual we have various error events:
1. The pair (X^n, Y^n) ∉ A*(n)ϵ. The probability of this event is small for large enough n by the weak law of large numbers.
2. The sequence X^n is typical, but there does not exist an s such that (X^n, W^n(s)) ∈ A*(n)ϵ. As in the proof of the rate distortion theorem
R(D) = min_{p(x̂|x): \sum_{x, x̂} p(x)p(x̂|x)d(x, x̂) ≤ D} I(X;X̂),
the probability of this event is small if
(148) R1 > I(X;W)
3. The pair of sequences (X^n, W^n(s)) ∈ A*(n)ϵ but (W^n(s), Y^n) ∉ A*(n)ϵ (i.e., the codeword is not jointly typical with the Y^n sequence). (This reminded me of a similar analysis in Chapter 7.) By the Markov lemma (Lemma 15.8.1), the probability of this event is small if n is large enough.
4. There exists another s’ with the same bin index such that (W^n(s’), Y^n) ∈ A*(n)ϵ. Since the probability that a randomly chosen W^n is jointly typical with Y^n is ≈ 2^{−nI(Y;W)}, the probability that there is another W^n in the same bin that is typical with Y^n is bounded by the number of codewords in the bin times the probability of joint typicality,
(149) Pr(∃ s’ ∈ B(i): (W^n(s’), Y^n) ∈ A*(n)ϵ) ≤ 2^{n(R1 − R2)} 2^{−n(I(Y;W) − 3ϵ)}
which goes to zero since R1 − R2 < I(Y;W) − 3ϵ.
5. If the index s is decoded correctly, (Xn, Wn(s)) ∈ A*(n)ϵ. By item 1 we can assume that (Yn, Wn) ∈ A*(n)ϵ and therefore the empirical joint distribution is close to the original distribution p(x, y)p(w|x) that we started with, and hence (Xn, n) will have a joint distribution that is close to the distribution that achieves distortion D.
Hence with high probability, the decoder will produce n such that the distortion between Xn and n is close to nD. This completes the proof of the theorem.
The reader is referred to Wyner and Ziv [12] for details of the proof. After the discussion of the various situations of compressing distributed data, it might be expected that the problem is almost completely solved, but unfortunately, this is not true. An immediate generalization of all the above problems is the rate distortion problem for correlated sources, illustrated in 41↓. This is essentially the Slepian-Wolf problem with distortion in both X and Y. It is easy to see that the three distributed source coding problems considered above are all special cases of this setup. Unlike the earlier problems, though, this problem has not yet been solved and the general rate distortion region remains unknown.
figure Figure 15.4 Rate Distortion for two correlated sources.png
Figure 41 Rate distortion for two correlated sources.

1.10 General Multiterminal Networks

We conclude this chapter by considering a general multiterminal network of senders and receivers and deriving some bounds on the rates achievable for communication in such a network. A general multiterminal network is illustrated in 42↓. In this section, superscripts denote node indices and subscripts denote time indices.
There are m nodes, and node i has an associated transmitted variable X^{(i)} and received variable Y^{(i)}.
figure Figure 15.35. General mutiterminal network.png
Figure 42 General multiterminal network
Node i sends information at rate R^{(ij)} to node j. We assume that all the messages W^{(ij)} being sent from node i to node j are independent and uniformly distributed over their respective ranges {1, 2, ..., 2^{nR^{(ij)}}}.
The channel is represented by the channel transition function p(y^{(1)}, ..., y^{(m)}|x^{(1)}, ..., x^{(m)}), which is the conditional probability mass function of the outputs given the inputs. This probability transition function captures the effects of the noise and the interference in the network. The channel is assumed to be memoryless (i.e., the outputs at any time instant depend only on the current inputs and are conditionally independent of the past inputs).
Corresponding to each transmitter-receiver node pair is a message: W(i, j) ∈ {1, 2...., 2nR(i, j)} . The input symbol X(i) at node i depends on W(i, j),  j ∈ {1, 2, ..., m} and also on the past values of the received symbol Y(i) at node i. Hence, an encoding scheme of block length n consists of a set of encoding and decoding functions, one for each node:
Encoders:
X_k^{(i)}(W^{(i1)}, W^{(i2)}, ..., W^{(im)}, Y_1^{(i)}, Y_2^{(i)}, ..., Y_{k−1}^{(i)}), k = 1, ..., n. The encoder maps the messages and past received symbols into the symbol X_k^{(i)} transmitted at time k.
Decoders:
Ŵ^{(ji)}(Y_1^{(i)}, Y_2^{(i)}, ..., Y_n^{(i)}, W^{(i1)}, W^{(i2)}, ..., W^{(im)}), j = 1, ..., m. The decoder at node i maps the received symbols in each block and its own transmitted information into estimates of the messages intended for it from node j, j = 1, 2, ..., m.
Associated with every pair of nodes is a rate and a corresponding probability of error that the message will not be decoded correctly,
(150) P_e^{(n)(i, j)} = Pr(Ŵ^{(ij)}(Y^{(j)}, W^{(j1)}, ..., W^{(jm)}) ≠ W^{(ij)})
where P_e^{(n)(ij)} is defined under the assumption that all the messages are independent and uniformly distributed over their respective ranges. A set of rates {R^{(ij)}} is said to be achievable if there exist encoders and decoders with block length n with P_e^{(n)(i, j)} → 0 as n → ∞ for all i, j ∈ {1, 2, ..., m}. We use this formulation to derive an upper bound on the flow of information in any multiterminal network. We divide the nodes into two sets, S and the complement S^c, and bound the rate of flow of information from the nodes in S to the nodes in S^c. [13]
Theorem 15.10.1 (MMV)
If the information rates {R(ij)} are achievable, there exists some joint probability distribution p(x(1), x(2), ..., x(m)) such that:
\sum_{i ∈ S, j ∈ S^c} R^{(ij)} ≤ I(X^{(S)};Y^{(S^c)}|X^{(S^c)})
for all S ⊂ {1, 2, ...m}. Thus, the total rate of flow of information across cut sets is bounded by the conditional mutual information.
Proof:
The proof follows the same lines as the proof of the converse for the multiple-access channel. Let T = {(i, j): i ∈ S, j ∈ S^c} be the set of links that cross from S to S^c, and let T^c be all the other links in the network. Then
n \sum_{i ∈ S, j ∈ S^c} R^{(ij)} \overset{(a)}{=} \sum_{i ∈ S, j ∈ S^c} H(W^{(ij)}) \overset{(b)}{=} H(W^{(T)}) \overset{(c)}{=} H(W^{(T)}|W^{(T^c)}) = I(W^{(T)};Y_1^{(S^c)}, ..., Y_n^{(S^c)}|W^{(T^c)}) + H(W^{(T)}|Y_1^{(S^c)}, ..., Y_n^{(S^c)}, W^{(T^c)})
\overset{(d)}{≤} I(W^{(T)};Y_1^{(S^c)}, ..., Y_n^{(S^c)}|W^{(T^c)}) + nϵ_n \overset{(e)}{=} \sum_{k=1}^n I(W^{(T)};Y_k^{(S^c)}|W^{(T^c)}, Y_1^{(S^c)}, ..., Y_{k−1}^{(S^c)}) + nϵ_n
\overset{(f)}{=} \sum_{k=1}^n [H(Y_k^{(S^c)}|W^{(T^c)}, Y_1^{(S^c)}, ..., Y_{k−1}^{(S^c)}) − H(Y_k^{(S^c)}|W^{(T)}, W^{(T^c)}, Y_1^{(S^c)}, ..., Y_{k−1}^{(S^c)})] + nϵ_n
\overset{(g)}{≤} \sum_{k=1}^n [H(Y_k^{(S^c)}|W^{(T^c)}, Y_1^{(S^c)}, ..., Y_{k−1}^{(S^c)}, X_k^{(S^c)}) − H(Y_k^{(S^c)}|W^{(T)}, W^{(T^c)}, Y_1^{(S^c)}, ..., Y_{k−1}^{(S^c)}, X_k^{(S^c)}, X_k^{(S)})] + nϵ_n
\overset{(h)}{≤} \sum_{k=1}^n [H(Y_k^{(S^c)}|X_k^{(S^c)}) − H(Y_k^{(S^c)}|X_k^{(S^c)}, X_k^{(S)})] + nϵ_n = \sum_{k=1}^n I(X_k^{(S)};Y_k^{(S^c)}|X_k^{(S^c)}) + nϵ_n
\overset{(i)}{=} n (1/n) \sum_{k=1}^n I(X_Q^{(S)};Y_Q^{(S^c)}|X_Q^{(S^c)}, Q = k) + nϵ_n \overset{(j)}{=} n I(X_Q^{(S)};Y_Q^{(S^c)}|X_Q^{(S^c)}, Q) + nϵ_n
= n [H(Y_Q^{(S^c)}|X_Q^{(S^c)}, Q) − H(Y_Q^{(S^c)}|X_Q^{(S^c)}, Q, X_Q^{(S)})] + nϵ_n \overset{(k)}{≤} n [H(Y_Q^{(S^c)}|X_Q^{(S^c)}) − H(Y_Q^{(S^c)}|X_Q^{(S^c)}, Q, X_Q^{(S)})] + nϵ_n
\overset{(l)}{=} n [H(Y_Q^{(S^c)}|X_Q^{(S^c)}) − H(Y_Q^{(S^c)}|X_Q^{(S^c)}, X_Q^{(S)})] + nϵ_n = n I(X_Q^{(S)};Y_Q^{(S^c)}|X_Q^{(S^c)}) + nϵ_n
[In (c) we used I(W^{(T)};Y_1^{(S^c)}, ..., Y_n^{(S^c)}|W^{(T^c)}) = H(W^{(T)}|W^{(T^c)}) − H(W^{(T)}|Y_1^{(S^c)}, ..., Y_n^{(S^c)}, W^{(T^c)}).]
Where
(a) follows from the fact that the messages W^{(ij)} are uniformly distributed over their respective ranges {1, 2, ..., 2^{nR^{(ij)}}}
(b) follows from the definition of W^{(T)} = {W^{(ij)}: i ∈ S, j ∈ S^c} and the fact that the messages are independent
(c) follows from the independence of the messages for T and T^c
(d) follows from Fano’s inequality, since the messages W^{(T)} can be decoded from Y^{(S^c)} and W^{(T^c)}. (I think this follows from 104↑; moreover, I think the point of the sentence is precisely the interpretation of Fano’s inequality: that W^{(T)} can be decoded from Y^{(S^c)} and W^{(T^c)} with very small probability of error.)
(e) is the chain rule for mutual information
(f) follows form the definition of mutual information
(g) follows from the fact that X_k^{(S^c)} is a function of the past received symbols Y^{(S^c)} and the messages W^{(T^c)} (see 103↑), and the fact that adding conditioning reduces the second term
(h) follows from the fact that Y_k^{(S^c)} depends only on the current input symbols X_k^{(S)} and X_k^{(S^c)} (by memorylessness)
(i) follows after we introduce a new time-sharing random variable Q distributed uniformly on {1, 2, ..., n}
(j) follows from the definition of mutual information
(k) follows from the fact that conditioning reduces entropy
(l) follows from the fact that Y(Sc)Q depends only on the input X(S)Q and X(Sc)Q and is conditionally independent of Q
Thus, there exist random variables X(S) and X(Sc) with some arbitrary joint distribution that satisfy the inequalities of the theorem.
The Theorem has a simple max-flow min-cut interpretation. The rate of flow of information across any boundary is less than the mutual information between the inputs on one side of the boundary and the outputs on the other side, conditioned on the inputs on the other side.
The problem of information flow in networks would be solved if the bounds of the theorem were achievable. But unfortunately, these bounds are not achievable even for some simple channels. We now apply these bounds to a few of the channels that we considered earlier.
Multiple-access channel
The multiple access channel is a network with many input nodes and one output node. For the case of a two-user multiple-access channel, the bounds of Theorem 15.10.1 reduce to:
R1 ≤ I(X1;Y|X2)
R2 ≤ I(X2;Y|X1)
(151) R1 + R2 ≤ I(X1, X2;Y)
for some joint distribution p(x1x2)p(y|x1x2). These bounds coincide with the capacity region if we restrict the input distribution to be a product distribution and take the convex hull (Theorem 15.3.1).
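A small sketch evaluating these three bounds for the binary erasure MAC Y = X1 + X2 with independent uniform inputs (the same channel is used as an example later in this section); since the channel is deterministic, I(X1;Y|X2) = H(Y|X2), I(X2;Y|X1) = H(Y|X1), and I(X1, X2;Y) = H(Y). The bookkeeping below is mine, not from the text:

```python
# Sketch: cut-set bounds for the two-user binary erasure MAC with uniform independent inputs.
import math
from itertools import product
from collections import defaultdict

def H(dist):
    """Entropy in bits of a dictionary of probabilities."""
    return -sum(q * math.log2(q) for q in dist.values() if q > 0)

p = {(x1, x2): 0.25 for x1, x2 in product([0, 1], repeat=2)}   # product distribution

pY = defaultdict(float)
pY_given_X1 = defaultdict(lambda: defaultdict(float))
pY_given_X2 = defaultdict(lambda: defaultdict(float))
for (x1, x2), pr in p.items():
    y = x1 + x2
    pY[y] += pr
    pY_given_X1[x1][y] += pr / 0.5
    pY_given_X2[x2][y] += pr / 0.5

R1_bound = sum(0.5 * H(pY_given_X2[x2]) for x2 in (0, 1))   # I(X1;Y|X2) = 1
R2_bound = sum(0.5 * H(pY_given_X1[x1]) for x1 in (0, 1))   # I(X2;Y|X1) = 1
Rsum_bound = H(pY)                                          # I(X1,X2;Y) = 1.5
print(R1_bound, R2_bound, Rsum_bound)
```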
Relay channel
For the relay channel, these bounds give the upper bound of Theorem 15.7.1 with different choices of subset as shown in 43↓. Thus
figure Figure 15.36 Relay channel.png
Figure 43 Relay Channel
C ≤ supp(x, x1)min(I(X, X1;Y), I(X;Y, Y1|X1))
This upper bound is achieved, and gives the capacity, for the physically degraded relay channel and for the relay channel with feedback [10].
To complement our discussion of a general network, we should mention two features of single-user channels that do not apply to a multiuser network.
Source-channel separation theorem.
In Section 7.13 we discussed the source-channel separation theorem, which proves that we can transmit the source noiselessly over the channel if and only if the entropy rate is less than the channel capacity. This allows us to characterize a source by a single number (the entropy rate) and the channel by a single number (the capacity). What about multiuser case? We would expect that a distributed source could be transmitted over a channel if and only if the rate region for the noiseless coding of the source lay within the capacity region of the channel. To be specific, consider the transmission of a distributed source over a multiple-access channel, as shown in 44↓.
figure Figure 15.37 Transmission of correlated sources over a multiple-access channel.png
Figure 44 Transmission of correlated sources over a multiple-access channel.
Combining the results of Slepian-Wolf encoding with the capacity results of the multiple access channel, we can show that we can transmit the source over the channel and recover it with a low probability of error if
(152) H(U|V) ≤ I(X1;Y|X2, Q)
(153) H(V|U) ≤ I(X2;Y|X1, Q)
(154) H(U, V) ≤ I(X1X2;Y|Q)
for some distribution p(q)p(x1|q)p(x2|q)p(y|x1, x2). This condition is equivalent to saying that the Slepian-Wolf rate region of the source has a nonempty intersection with the capacity region of the multiple-access channel.
But is this condition also necessary? No, as a simple example illustrates. Consider the transmission of the source of Example 15.4.2
p(u, v):  p(0, 0) = 1/3,  p(0, 1) = 1/3,  p(1, 0) = 0,  p(1, 1) = 1/3
H(U, V) = 3 · (1/3) · log 3 = log 3 ≈ 1.58
H(U) = H(V) = (2/3) log(3/2) + (1/3) log 3 ≈ 0.918
over the binary erasure multiple-access channel (Example 15.3.3, 16↑). The Slepian-Wolf region (I think this refers to the figure in 27↑) does not intersect the capacity region (to verify this, I should plot the Slepian-Wolf region in Maple using the convexhull function), yet it is simple to devise a scheme that allows the source to be transmitted over the channel. We just let X1 = U and X2 = V, and the value of Y will tell us the pair (U, V) with no error. (I presume this is why, if you plug this into 152↑-154↑, you obtain 56↑.) Thus the conditions 152↑-154↑ are not necessary.
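A minimal numeric check of this example (the bookkeeping is mine; the claim is from the text): the required Slepian-Wolf sum rate exceeds the 1.5-bit sum-rate bound of the binary erasure MAC with independent inputs, yet Y = U + V identifies the source pair because the ambiguous pair (1, 0) has probability zero:

```python
# Sketch: correlated source of this example vs. uncoded transmission over Y = U + V.
import math

p_uv = {(0, 0): 1/3, (0, 1): 1/3, (1, 1): 1/3}
H_UV = -sum(q * math.log2(q) for q in p_uv.values())
print(f"H(U,V) = {H_UV:.3f} bits > 1.5 bits (MAC sum-rate bound with independent inputs)")

# Uncoded transmission X1 = U, X2 = V: Y = U + V is one-to-one on the support of p(u, v).
outputs = {(u, v): u + v for (u, v) in p_uv}
assert len(set(outputs.values())) == len(p_uv)
print("Y = U + V recovers (U, V) with zero error on the support of p(u, v).")
```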
The reason for the failure of the source-channel separation theorem lies in the fact that the capacity of the multiple-access channel increases with the correlation between the inputs of the channel. Therefore, to maximize the capacity, one should preserve the correlation between the inputs of the channel. Slepian-Wolf encoding, on the other hand, gets rid of the correlation. Cover et al. [14] proposed an achievable region for transmission of a correlated source over a multiple-access channel based on the idea of preserving the correlation. Han and Costa [15] have proposed a similar region for the transmission of a correlated source over a broadcast channel.
Capacity regions with feedback
Theorem 7.12.1 shows that feedback does not increase the capacity of a single-user discrete memoryless channel. For channels with memory, on the other hand, feedback enables the sender to predict something about the noise and to combat it more effectively, thus increasing capacity.
What about multiuser channels? Rather surprisingly, feedback does increase the capacity region of multiuser channels, even when the channels are memoryless. This was first shown by Gaarder and Wolf [16], who showed how feedback helps increase the capacity of the binary erasure multiple-access channel. In essence, feedback from the receiver to the two senders acts as a separate channel between the two senders. The senders can decode each other’s transmissions before the receiver does. They then cooperate to resolve the uncertainty at the receiver, sending information at the higher cooperative capacity rather than the noncooperative capacity. Using this scheme, Cover and Leung [17] established an achievable region for a multiple-access channel with feedback. Willems [] showed that this region is the capacity for a class of multiple-access channels. Ozarow [] established the capacity region for the two-user Gaussian multiple-access channel. The problem of finding the capacity region for a multiple-access channel with feedback is closely related to the capacity of a two-way channel with a common output.
There is yet no unified theory of network information flow. But there can be no doubt that a complete theory of communication networks would have wide implications for the theory of communication and computation.

1.11 Summary

Multiple-access channel
The capacity of a multiple-access channel (X1 × X2, p(y|x1, x2), Y) is the closure of the convex hull of all (R1, R2) satisfying
R1 < I(X1;Y|X2)
R2 < I(X2;Y|X1)
R1 + R2 < I(X1, X2;Y)
for some distribution p(x1)p(x2) on X1 xX2.
The capacity region of the m-user multiple-access channel is the closure of the convex hull of the rate vectors satisfying
R(S) ≤ I(X(S);Y|X(Sc)) for all S ⊆ {1, 2, ...m}
for some product distribution p(x1)p(x2)...p(xm).
Gaussian multiple-access channel
R1 ≤ C(P1 / N)
R2 ≤ C(P2 / N)
R1 + R2 ≤ C((P1 + P2) / N)
where
C(x) = (1/2) log(1 + x)
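As a quick numeric illustration (P1, P2, N below are assumed values, not from the text), the corner point reached by successive decoding lies exactly on the sum-rate line:

```python
# Sketch: corner point of the Gaussian MAC region via successive decoding.
import math

def C(x):
    return 0.5 * math.log2(1 + x)

P1, P2, N = 10.0, 5.0, 1.0                       # assumed powers and noise
R1_max, R2_max, Rsum = C(P1 / N), C(P2 / N), C((P1 + P2) / N)
print(f"R1 <= {R1_max:.3f}, R2 <= {R2_max:.3f}, R1 + R2 <= {Rsum:.3f}")

# Decode sender 2 first, treating sender 1 as noise, then subtract and decode sender 1.
corner_R2 = C(P2 / (P1 + N))
print(f"corner (R1, R2) = ({R1_max:.3f}, {corner_R2:.3f}),  sum = {R1_max + corner_R2:.3f}")
```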
Slepian-Wolf coding
Correlated sources X and Y can be described separately at rates R1 and R2 and be recovered with arbitrarily low probability of error by a common decoder if and only if
R1 ≥ H(X|Y)
R2 ≥ H(Y|X)
R1 + R2 ≥ H(X, Y)
Broadcast channel
The capacity region of the degraded broadcast channel X → Y1 → Y2 is the convex hull of the closure of all (R1, R2) satisfying
R2 ≤ I(U;Y2)
R1 ≤ I(X;Y1|U)
for some joint distribution p(u)p(x|u)p(y1, y2|x)
Relay channel
The capacity C for the physically degraded relay channel p(y, y1|x, x1) is given by
C = sup_{p(x, x1)} min{I(X, X1;Y), I(X;Y1|X1)}
where the supremum is over all joint distributions on XxX1.
Source coding with side information.
Let (X, Y) ~ p(x, y). If Y is encoded at rate R2, and X is encoded at rate R1, we can recover X with an arbitrarily small probability of error iff
R1 ≥ H(X|U)
R2 ≥ I(Y;U)
for some distribution p(y, u) such that X → Y → U .
Rate distortion with side information.
Let (X, Y) ~ p(x, y). The rate distortion function with side information is given by
R_Y(D) = min_{p(w|x)} min_{f: Y × W → X̂} (I(X;W) − I(Y;W))
where the minimization is over all functions f and conditional distributions p(w|x),  |W| ≤ |X| + 1 , such that
\sum_x \sum_w \sum_y p(x, y) p(w|x) d(x, f(y, w)) ≤ D

1.12 Problems

1.12.1 Cooperative capacity of a multiple-access channel

figure Problem 15.1_fig1 Cooperative Capacity.png
(a) Suppose that X1 and X2 have access to both indices W1 ∈ {1, ..., 2^{nR1}} and W2 ∈ {1, ..., 2^{nR2}}. Thus, the codewords X1(W1, W2), X2(W1, W2) depend on both indices. Find the capacity region.
(b) Evaluate this region for the binary erasure multiple-access channel Y = X1 + X2, Xi ∈ {0, 1}. Compare to the noncooperative region.
(a)
R1 < I(X1;Y|X2)
R2 < I(X2;Y|X1)
R1 + R2 < I(X1, X2;Y)
R1 < I(X1(W1W2);Y|X2(W1W2))
R2 < I(X2(W1W2);Y|X1(W1W2))
R1 + R2 < I(X1(W1W2), X2(W1W2);Y)
(b)
figure erassure chan.png figure Prob15.1_fig2.png
R1 + R2 ≤ I(X1;Y|X2) + I(X2;Y) = I(X1, X2;Y)
J1 = I(X1;Y|X2); J2 = I(X2;Y|X1); J3 = I(X1, X2;Y); J3 − J1 = I(X2;Y); J3 − J2 = I(X1;Y)
P = {[0, 0], [J1, 0], [J1, J3 − J1], [J3 − J2, J2], [0, J2]}
non-cooperative region
J1 = I(X1;Y|X2) = 1; J2 = 1; J3 = 1.5
(155)
(X1, X2)   Y   decoded (X1, X2)
(0, 0)     0   (0, 0)
(0, 1)     1   ?
(1, 0)     1   ?
(1, 1)     2   (1, 1)
Taking 155↑ into account, if we assume that X1 can be decoded with zero uncertainty, then X2 will be decoded with 50% uncertainty.
figure Problem15.1_fig3.jpg
Solution from UIC ECE534 (HW11s.pdf)
(a) Since X1 and X2 have access to both indices W1 ∈ {1, ..., 2^{nR1}} and W2 ∈ {1, ..., 2^{nR2}}, and the codewords X1(W1, W2), X2(W1, W2) depend on both indices, the pair (X1, X2) can be seen as a single codeword X. This is equivalent to a single-user channel with input alphabet X1 × X2 and index set W1 × W2. Then the only bound on the achievable region is the combined rate for both senders:
R1 + R2 ≤ C = max_{p(x1, x2)} I(X1, X2;Y)
This follows directly from the fact that the set W1 × W2 has 2^{nR1} · 2^{nR2} = 2^{n(R1 + R2)} elements. This is a very important point; that is why Cover everywhere uses products of random variables: he means a set with |W1 × W2| = 2^{n(R1 + R2)} elements. The same, even more visibly, applies to
R1 + R2 < I(X1, X2;Y)
Again you treat X1 and X2 as a single random variable indexed by the set W1 × W2, which has 2^{nR1} · 2^{nR2} = 2^{n(R1 + R2)} elements. Now it is obvious where the following comes from:
R_eq = R1 + R2 < I(X_eq;Y) = I(X1, X2;Y)
We can achieve this by setting X2 = 0 (so we have the rate pair (C, 0)) and also by setting X1 = 0 (achieving the rate pair (0, C)).
(b) Evaluate this region for the binary erasure multiple access channel Y = X1 + X2,  Xi ∈ {0, 1}
To evaluate this region for the binary erasure multiple access channel Y = X1 + X2 for the cooperative capacity region we have
R1 + R2 ≤ C = max_{p(x1, x2)} I(X1, X2;Y) = H(Y) − H(Y|X1, X2) = H(Y) ≤ log|Y| = log 3
R1 + R2 ≤ log(3) = 1.585
To achieve this capacity we need the distribution of Y to be uniform over {0, 1, 2} (each value with probability 1/3), for example by setting:
p(0, 0) = 1/3;  p(1, 1) = 1/3;  p(0, 1) + p(1, 0) = 1/3.
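A short check that this input distribution indeed makes Y = X1 + X2 uniform on {0, 1, 2}; the particular split of the 1/3 between p(0, 1) and p(1, 0) below is arbitrary:

```python
# Sketch: verify H(Y) = log2(3) for the cooperative input distribution above.
import math
from collections import defaultdict

p = {(0, 0): 1/3, (1, 1): 1/3, (0, 1): 1/6, (1, 0): 1/6}   # one choice of the split
pY = defaultdict(float)
for (x1, x2), pr in p.items():
    pY[x1 + x2] += pr
HY = -sum(q * math.log2(q) for q in pY.values())
print(dict(pY), f"H(Y) = {HY:.3f} = log2(3) = {math.log2(3):.3f}")
```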
When the senders work in noncooperative mode, we have the capacity region of the binary erasure multiple-access channel, shown in the following figure. (The dashed line gives the boundary of the cooperative region.)
figure Problem15.1_fig4.jpg

1.12.2 Capacity of multiple-access channels

Find the capacity region for each of the following multiple-access channels
(a) Additive modulo-2 multiple-access channel. X1 ∈ {0, 1}, X2 ∈ {0, 1}, Y = X1 ⊕ X2.
(b) Multiplicative multiple-access channel. X1 ∈ {−1, 1}, X2 ∈ {−1, 1}, Y = X1 · X2.
(a)
R1 ≤ I(X1;Y|X2) = H(Y|X2) − H(Y|X1, X2) = H(Y|X2) = H(X1 ⊕ X2|X2) = H(X1) = H(p1) = 1 if p1 = 1/2
H(X1 ⊕ X2|X2) = p(X2 = 0)H(X1 ⊕ X2|X2 = 0) + p(X2 = 1)H(X1 ⊕ X2|X2 = 1)
(X1, X2)   Y
(0, 0)     0
(0, 1)     1
(1, 0)     1
(1, 1)     0
J1 = R1 ≤ I(X1;Y|X2) = H(X1|X2) − H(X1|X2, Y) \overset{(σ)}{=} H(X1|X2) = H(X1) = 1
H(X1) = 1 if we take p(X1 = 1) = p(X2 = 0) = 1/2. (In my first solution to Problem 15.9 I showed that the region need not be a triangle; it can be an isosceles trapezoid if the probabilities of 0 and 1 are unequal.)
(σ) If you know X2 and Y, then you can reconstruct X1 without error, which means that H(X1|X2, Y) = 0.
R2 ≤ I(X2;Y|X1) = H(Y|X1) − H(Y|X1, X2) = H(Y|X1) = H(X1 ⊕ X2|X1) = H(X2) = 1
J2 = R2 ≤ I(X2;Y|X1) = H(X2|X1) − H(X2|X1, Y) = H(X2|X1) = H(X2) = 1
p(X1, X2) = p(X1)p(X2)
R1 + R2 ≤ I(X1, X2;Y) = H(Y) − H(Y|X1, X2) = H(Y), and with uniform inputs P(Y = 1) = p(0, 1) + p(1, 0) = 1/2, so H(Y) = 1
Alternatively, R1 + R2 ≤ I(X1, X2;Y) = H(X1, X2) − H(X1, X2|Y) = 2 − [p(Y = 0)H(X1, X2|Y = 0) + p(Y = 1)H(X1, X2|Y = 1)] = 2 − 1 = 1
R1 + R2 ≤ I(X1, X2;Y) = H(Y) − H(Y|X1, X2) = H(Y) ≤ log|Y| = log 2 = 1
To achieve this capacity we need the inputs to be uniform (each pair with probability 1/4), for example by setting:
p(0, 0) = 1/4,  p(1, 1) = 1/4,  p(0, 1) = 1/4,  p(1, 0) = 1/4.
J1 = I(X1;Y|X2); J2 = I(X2;Y|X1); J3 = I(X1;Y|X2) + I(X2;Y) = I(X2;Y|X1) + I(X1;Y)
I(X2;Y) = H(X2) − H(X2|Y) = 1 − 1 = 0
P = {[0, 0], [J1, 0], [J1, J3 − J1], [J3 − J2, J2], [0, J2]}
P = {[0, 0], [1, 0], [1, 0], [0, 1], [0, 1]}
figure Problem15.2_fig1.jpg
(b)
X1 ∈ {−1, 1}, X2 ∈ {−1, 1}, Y = X1 · X2
(X1, X2)    Y
(−1, −1)    1
(−1, 1)     −1
(1, −1)     −1
(1, 1)      1
J1 = R1 ≤ I(X1;Y|X2) = H(X1|X2) − H(X1|X2, Y) \overset{(σ)}{=} H(X1|X2) = H(X1) = 1
J2 = R2 ≤ I(X2;Y|X1) = H(X2|X1) − H(X2|X1, Y) = H(X2|X1) = H(X2) = 1
Again the same holds: if you know X2 and Y, you can reconstruct X1 without error, which means that H(X1|X2, Y) = 0.
(In the lecture this was argued by simple logic: if you hold X2 = 1, then you can transmit 1 bit from X1 to Y; likewise, if you hold X1 = 1, you can transmit 1 bit from X2 to Y.)
R1 + R2 ≤ I(X1, X2;Y) = H(Y) − H(Y|X1, X2) = H(Y) ≤ log|Y| = log 2 = 1
Again we obtain the same region (the triangle with vertices (0, 0), (1, 0), (0, 1)).
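A two-line numeric check of the sum-rate bound for both deterministic channels with independent uniform inputs (the helper function is mine, not from the text):

```python
# Sketch: H(Y) for the mod-2 adder and the multiplier MAC with uniform independent inputs.
import math
from collections import defaultdict

def sum_rate(channel, alphabet):
    pY = defaultdict(float)
    for x1 in alphabet:
        for x2 in alphabet:
            pY[channel(x1, x2)] += 0.25           # uniform product distribution
    return -sum(q * math.log2(q) for q in pY.values() if q > 0)

print("mod-2 adder:  R1+R2 <=", sum_rate(lambda a, b: a ^ b, (0, 1)))    # 1.0
print("multiplier:   R1+R2 <=", sum_rate(lambda a, b: a * b, (-1, 1)))   # 1.0
```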

1.12.3 Cut-set interpretation of capacity region of multiple-access channel

For the multiple-access channel we know that (R1R2) is achievable if
R1 < I(X1;Y|X2); R2 < I(X2;Y|X1); R1 + R2 < I(X1, X2;Y)
for X1, X2 independent.
Show, for X1, X2 independent that
I(X1;Y|X2) = I(X1;Y, X2)
Interpret the information bounds as bounds on the rate of flow across cut sets S1, S2 and S3.
figure Problem 15.3 fig.png
I(X1, X2;Y) = I(X2;Y) + I(X1;Y|X2) = H(X1X2) − H(X1X2|Y) = H(X1) + H(X2) − H(X1X2|Y) = H(X1) + H(X2) − H(X2|Y) − H(X1|X2Y)
I(X1;Y|X2) = H(X1|X2) − H(X1|X2Y) = H(X1) − H(X1|X2Y)
This is just the ordinary chain rule:
I(X1;Y, X2) = I(X1;X2) + I(X1;Y|X2) = I(X1;Y|X2), since I(X1;X2) = 0.
\begin_inset Separator latexpar\end_inset
figure Problem15.3_fig2.png
We can interpret I(X1;Y, X2) = I(X1;Y|X2) as the maximum amount of information that could flow across the cut set S1. This is the upper bound on the rate R1. Similarly we can interpret the other bounds.
I(X2;Y, X1) = I(X2;X1) + I(X2;Y|X1) = I(X2;Y|X1)
I understand this as a simplification: instead of the conditional mutual information one works with the joint mutual information. That is, across the cut from X2 toward (Y, X1), at most I(X2;Y, X1) ≥ R2 can flow.

1.12.4 Gaussian multiple-access channel capacity

For AWGN multiple-access channel, prove, using typical sequences, the achievability of any rate pairs (R1, R2) satisfying
R1 < (1/2) log(1 + P1/N),  R2 < (1/2) log(1 + P2/N),  R1 + R2 < (1/2) log(1 + (P1 + P2)/N)
The proof extends the proof for the discrete multiple-access channel in the same way as the proof for the single-user Gaussian channel extends the proof for the discrete single-user channel.
Assume that the message pair (1, 1) was sent, and let
E_{ij} = {(X1^n(i), X2^n(j), Y^n) ∈ A(n)ϵ}
P_e ≤ P(E_{11}^c) + \sum_{i ≠ 1, j = 1} P(E_{ij}) + \sum_{i = 1, j ≠ 1} P(E_{ij}) + \sum_{i ≠ 1, j ≠ 1} P(E_{ij})
For i ≠ 1, the codeword X1^n(i) is independent of (X2^n(1), Y^n), so the joint distribution factors as p(x1)p(x2, y), and
P(E_{i1}) = \sum_{(x1, x2, y) ∈ A(n)ϵ} p(x1)p(x2, y) ≤ |A(n)ϵ| 2^{−n(H(X1) − ϵ)} 2^{−n(H(X2, Y) − ϵ)} ≤ 2^{n(H(X1, X2, Y) + ϵ)} 2^{−n(H(X1) − ϵ)} 2^{−n(H(X2, Y) − ϵ)}
P(E_{i, 1}) ≤ 2^{−n(I(X1;Y|X2) − 3ϵ)}
P(E_{1, j}) ≤ 2^{−n(I(X2;Y|X1) − 3ϵ)}
P(E_{i, j}) ≤ 2^{−n(I(X1, X2;Y) − 4ϵ)}
P_e ≤ ϵ + 2^{nR1} 2^{−n(I(X1;Y|X2) − 3ϵ)} + 2^{nR2} 2^{−n(I(X2;Y|X1) − 3ϵ)} + 2^{n(R1 + R2)} 2^{−n(I(X1, X2;Y) − 4ϵ)},
which can be made arbitrarily small provided that
R1 < I(X1;Y|X2),  R2 < I(X2;Y|X1),  R1 + R2 < I(X1, X2;Y)
(for example, R1 < I(X1;Y|X2) − 3ϵ, i.e., R1 + 3ϵ < I(X1;Y|X2), makes the second term vanish).

1.12.5 Converse for the Gaussian multiple-access channel.

Prove the converse for the Gaussian multiple-access channel by extending the converse in the discrete case to take into account the power constraint on the codewords.
nR1 = H(W1) = I(W1;Y^n) + H(W1|Y^n) ≤ I(W1;Y^n) + nϵ_n ≤ I(X1^n(W1);Y^n) + nϵ_n
= H(X1^n(W1)) − H(X1^n(W1)|Y^n) + nϵ_n ≤ H(X1^n(W1)|X2^n(W2)) − H(X1^n(W1)|Y^n, X2^n(W2)) + nϵ_n
= I(X1^n(W1);Y^n|X2^n(W2)) + nϵ_n = H(Y^n|X2^n(W2)) − H(Y^n|X1^n(W1), X2^n(W2)) + nϵ_n
= H(Y^n|X2^n(W2)) − \sum_{i=1}^n H(Y_i|Y_1^{i−1}, X1^n(W1), X2^n(W2)) + nϵ_n
= H(Y^n|X2^n(W2)) − \sum_{i=1}^n H(Y_i|X_{1i}, X_{2i}) + nϵ_n ≤ \sum_{i=1}^n H(Y_i|X_{2i}) − \sum_{i=1}^n H(Y_i|X_{1i}, X_{2i}) + nϵ_n = \sum_{i=1}^n I(X_{1i};Y_i|X_{2i}) + nϵ_n
R1 ≤ (1/n) \sum_{i=1}^n I(X_{1i};Y_i|X_{2i})
R1 ≤ (1/n) \sum_{i=1}^n I(X_{1i};Y_i|X_{2i}) = (1/n) \sum_{i=1}^n [H(Y_i|X_{2i}) − H(Y_i|X_{1i}, X_{2i})]
Yi = X1i + X2i + Zi
R1 ≤ (1/n) \sum_{i=1}^n I(X_{1i};Y_i|X_{2i}) = (1/n) \sum_{i=1}^n [H(X_{1i} + X_{2i} + Z_i|X_{2i}) − H(X_{1i} + X_{2i} + Z_i|X_{1i}, X_{2i})]
= (1/n) \sum_{i=1}^n [H(X_{1i} + Z_i) − H(Z_i)] ≤ (1/n) \sum_{i=1}^n [(1/2) log(2πe(P_{1i} + N)) − (1/2) log(2πe N)]
= (1/n) \sum_{i=1}^n (1/2) log(1 + P_{1i}/N)
The proofs for R2 and for R1 + R2 are similar.

1.12.6 Unusual multiple-access channel.

Consider the following multiple-access channel: X1 = X2 = Y = {0, 1}. If (X1X2) = (0, 0), then Y = 0. If (X1X2) = (0, 1), then Y = 1. If (X1, X2) = (1, 0), then Y = 1. If (X1, X2) = (1, 1), then Y = 0 with probability (1)/(2) and Y = 1 with probability (1)/(2).
(a) Show that the rate pairs (1, 0) and (0, 1) are achievable.
(b) Show that for any non-degenerate distribution p(x1)p(x2) we have I(X1, X2;Y) < 1.
(c) Argue that there are points in the capacity region of this multiple-access channel that can only be achieved by time-sharing; that is, there exist achievable rate pairs (R1R2) that lie in the capacity region for the channel but not in the region defined by
R1 ≤ I(X1;Y|X2)
R2 ≤ I(X2;Y|X1)
R1 + R2 ≤ I(X1, X2;Y)
for any product distribution p(x1)p(x2). Hence the operation of convexification strictly enlarges the capacity region. This channel was introduced independently by Csiszar and Korner [11] and Wallmeier [18].
——————————————————————————–———————————————-
(a)
p(Y|X1X2)
(X1, X2) \ Y:   0      1
(0, 0):         1      0
(0, 1):         0      1
(1, 0):         0      1
(1, 1):         1/2    1/2
p(X1) = (1/2, 1/2),  p(X2) = (1/2, 1/2)
p(X1, X2) = (1/4, 1/4, 1/4, 1/4)
p(X1X2Y)
(X1, X2) \ Y:   0      1
(0, 0):         1/4    0
(0, 1):         0      1/4
(1, 0):         0      1/4
(1, 1):         1/8    1/8
R1 ≤ I(X1;Y|X2)
R2 ≤ I(X2;Y|X1)
I(X1;Y|X2) = H(Y|X2) − H(Y|X1, X2) = P(X2 = 0)H(Y|X2 = 0) + P(X2 = 1)H(Y|X2 = 1) − 1/4
H(Y|X2 = 0) = 1,  H(Y|X2 = 1) = H(3/4) = 2 − (3/4) log 3 ≈ 0.811
I(X1;Y|X2) = (1/2)·1 + (1/2)·0.811 − 1/4 ≈ 0.656
(The corner point (1, 0) itself is achieved by taking X2 ≡ 0: then Y = X1 and R1 = I(X1;Y|X2) = 1 while R2 = 0; similarly (0, 1) is achieved with X1 ≡ 0.)
(b)
I(X1X2;Y) = H(Y) − H(Y|X1X2) ≤ 1 − (1)/(4)log(1) + (1)/(4)log(1) + (1)/(4)log(2) + (1)/(4)log(2) = 1 − (1)/(2) = (1)/(2)
\mathchoiceH(Y|X1X2) = (1)/(2)H(Y|X1X2) = (1)/(2)H(Y|X1X2) = (1)/(2)H(Y|X1X2) = (1)/(2)
Ако p(X1), p(X2) се униформони дистрибуции (1)/(2), (1)/(2) тогаш
p(Y) ∈ (1)/(4) + (1)/(8), (1)/(2) + (1)/(8) = (3)/(8), (5)/(8)
Ова не е добро треба да се најдат такви дистрибуции на (X1X2)
(156) I(X1X2;Y) + H(Y|X1X2) ≤ H(Y) → H(Y) > I(X1X2;Y) → I(X1X2;Y) < 1
p(Y|x1X2) p(X1X2Y)
(X1X2)|Y 0 1 p1 (0, 0) 1 0 p2 (0, 1) 0 1 p2 (1, 0) 0 1 p1 (1, 1) 1 ⁄ 2 1 ⁄ 2 (X1X2)|Y 0 1 (0, 0) 0.25 0 (0, 1) 0 0.25 (1, 0) 0 0.25 (1, 1) 1 ⁄ 8 1 ⁄ 8
p1 + 2⋅p2 + (p1)/(2) + (p1)/(2) = 1 → 2p1 + 2p2 = 2
p1 + (p1)/(2) = (1)/(2) → 3p1 = 1 → p1 = (1)/(3) 2p2 + (p1)/(2) = (1)/(2) → 2p2 + (1)/(6) = (1)/(2) → 2p2 = (3 − 1)/(2) → p2 = (1)/(2) ????
——————————————————————————–———–
p(Y|x1X2) p(X1X2Y)
(X1X2)|Y 0 1 p1 (0, 0) 1 0 p2 (0, 1) 0 1 p3 (1, 0) 0 1 p1 (1, 1) 1 ⁄ 2 1 ⁄ 2 (X1X2)|Y 0 1 (0, 0) 0.25 0 (0, 1) 0 0.25 (1, 0) 0 0.25 (1, 1) 1 ⁄ 8 1 ⁄ 8
p1 + p2 + p3 + (p1)/(2) + (p1)/(2) = 1 → 2p1 + p2 + p3 = 1
p1 + (p1)/(2) = (1)/(2) → 3p1 = 1 → p1 = (1)/(3) p2 + p3 + (p1)/(2) = (1)/(2) → p2 + p3 + (1)/(6) = (1)/(2) → p2 + p3 = (3 − 1)/(2) → p2 + p3 = 1 ????
——————————————————————————–———–
p(Y|X1X2) p(X1X2Y)
(X1X2)|Y 0 1 p1 (0, 0) 1 0 p2 (0, 1) 0 1 p3 (1, 0) 0 1 p4 (1, 1) 1 ⁄ 2 1 ⁄ 2 (X1X2)|Y 0 1 (0, 0) p1 0 (0, 1) 0 p2 (1, 0) 0 p3 (1, 1) p4 ⁄ 2 p4 ⁄ 2 p(Y) (1)/(2) (1)/(2)
p1 + p2 + p3 + p4 = 1
\mathchoicep1 + (p4)/(2) = (1)/(2)p1 + (p4)/(2) = (1)/(2)p1 + (p4)/(2) = (1)/(2)p1 + (p4)/(2) = (1)/(2) → 2p1 + p4 = 1 → p1 = (1 − p4)/(2) \mathchoicep2 + p3 + (p4)/(2) = (1)/(2)p2 + p3 + (p4)/(2) = (1)/(2)p2 + p3 + (p4)/(2) = (1)/(2)p2 + p3 + (p4)/(2) = (1)/(2) → p2 + p3 = (1 − p4)/(2)
p1 + p2 + p3 + p4 = p1 + (1 − p4)/(2) + p4 = (1 − p4)/(2) + (1 − p4)/(2) + p4 = 1 → 1 − p4 + p4 = 1
Значи важи за било кое p4
\mathchoicep4 = (1)/(4) → p1 = (3)/(8)p4 = (1)/(4) → p1 = (3)/(8)p4 = (1)/(4) → p1 = (3)/(8)p4 = (1)/(4) → p1 = (3)/(8) → p2 + p3 = (3)/(8) → | произволно ги бираш p2,  p3  битна е нивната сума| = \mathchoicep2 = (1)/(8); p3 = (1)/(4)p2 = (1)/(8); p3 = (1)/(4)p2 = (1)/(8); p3 = (1)/(4)p2 = (1)/(8); p3 = (1)/(4)
за ваква здружена дистрибуција на (X1,X2) се добива униформна дистрибуција на Y а со тоа се максимизира I(X1X2;Y) и се потврдува важењето на 156↑
Е сега се навраќам на горните пресметки
(X1X2)|Y 0 1 p1 (0, 0) 1 0 p2 (0, 1) 0 1 p3 (1, 0) 0 1 p4 (1, 1) 1 ⁄ 2 1 ⁄ 2 (X1X2)|Y 0 1 p(X1X2) (0, 0) 3 ⁄ 8 0 3 ⁄ 8 (0, 1) 0 1 ⁄ 8 1 ⁄ 8 (1, 0) 0 1 ⁄ 4 1 ⁄ 4 (1, 1) 1 ⁄ 8 1 ⁄ 8 1 ⁄ 4
R1 ≤ I(X1;Y|X2)
R2 ≤ I(X2;Y|X1)
\mathchoiceI(X1X2;Y) = H(Y) − H(Y|X1X2) = (*)I(X1X2;Y) = H(Y) − H(Y|X1X2) = (*)I(X1X2;Y) = H(Y) − H(Y|X1X2) = (*)I(X1X2;Y) = H(Y) − H(Y|X1X2) = (*)
H(Y|X1X2) = x1x2yp(x1x2y)logp(y|x1x2) = 2⋅(1)/(8)⋅log(2) = (1)/(4)
\mathchoice(*) = 1 − (1)/(4) = (3)/(4)(*) = 1 − (1)/(4) = (3)/(4)(*) = 1 − (1)/(4) = (3)/(4)(*) = 1 − (1)/(4) = (3)/(4)
I(X1;Y|X2) = H(Y|X2) − H(Y|X1X2) = P(X2 = 0)H(Y|X2 = 0) + P(X2 = 1)H(Y|X2 = 1) − (1)/(4)
H(Y|X2 = 0) = 2⋅(1)/(4)log4 = 1 H(Y|X2 = 1) = (1)/(4)⋅log4 + (1)/(8)⋅log8 = (1)/(2) + (3)/(8) = (7)/(8)
I(X1;Y|X2) = (1)/(2)⋅1 + (1)/(2)(7)/(8) − (1)/(2) = (7)/(16)
p(X1 = 0)p(X2 = 0) = (3)/(8) p(X1 = 0)p(X2 = 1) = (1)/(8) p(X1 = 1)p(X2 = 0) = (1)/(4) p(X1 = 1)p(X2 = 1) = (1)/(4)
x0y0 = (3)/(8) x0y1 = (1)/(8) x1y0 = (1)/(4) x1y1 = (1)/(4)
p(X1 = 0) = (3)/(8⋅p(X2 = 0)) → 
ac = (3)/(8) ad = (1)/(8) bc = (1)/(4) bd = (1)/(4)
a + b = 1 c + d = 1
c = (3)/(8a); bc = (1)/(4); b(3)/(8a) = (1)/(4) → 3b = 2a → b = (2a)/(3) → a + (2a)/(3) = 1 → a1 + (2)/(3) = 1 → \mathchoicea = (3)/(5) → b = (2)/(5)a = (3)/(5) → b = (2)/(5)a = (3)/(5) → b = (2)/(5)a = (3)/(5) → b = (2)/(5)
c = (5⋅3)/(8⋅3) = (5)/(8)d = (3)/(8)
p1 + (p4)/(2) = (1)/(2) → 2p1 + p4 = 1 → p1 = (1 − p4)/(2) p2 + p3 + (p4)/(2) = (1)/(2) → p2 + p3 = (1 − p4)/(2)
ab + (cd)/(2) = (1)/(2) ad + bc + (bd)/(2) = (1)/(2); a + b = 1; c + d = 1; ac + ad + bc + bd = 1
UIC(ECE535) Solution
(X1X2)|Y 0 1 p1 (0, 0) 1 0 p2 (0, 1) 0 1 p3 (1, 0) 0 1 p4 (1, 1) 1 ⁄ 2 1 ⁄ 2
To achieve the rate pairs (1, 0) and (0, 1) we can set one of the inputs to zero so the other sender will have a rate of 1 bit per transmission. For example setting X1 = 0 will yield to Y = X2 Погледни ја условната веројатност во 1.12.6↑. Само p1 и p2се во игра. then the rate pair (0, 1) is achievable. Thus, we can also do the counter case and set X2 = 0 to obtain Y = X1 achieving the rate pair (1, 0).This can be achieved by time sharing, like int shown in the Binary Multiplier channel example of hte textbook.f
To prove this, we know the capacity region bound of R1 is given by
R1 ≤ I(X1;Y|X2)
For example for fixed X2 = 0 we have:
R1 ≤ I(X1;Y|X2 = 0) = H(Y|X2 = 0) − \cancelto0H(Y|X1X2 = 0) ≤ H(1)/(2) = 1 → \mathchoiceR1 ≤ 1R1 ≤ 1R1 ≤ 1R1 ≤ 1
H(p(X1 = 1)H(Y|X1 = 1, X2 = 0)Y|X1, X2 = 0) = p(X1 = 0)\cancelto0H(Y|X1 = 0, X2 = 0) + \cancelto0p(X1 = 1)H(Y|X1 = 1, X2 = 0)
The result for setting X1 = 0 will follow from symmetry.
(b) Show that for any non-degenerate distribution p(x1)p(x2) we have I(X1X2;Y) < 1
A nondegenerate distribution
In mathematics, a degenerate distribution is the probability distribution of a random variable which only takes a single value. Examples include a two-headed coin and rolling a die whose sides all show the same number. While this distribution does not appear random in the everyday sense of the word, it does satisfy the definition of random variable.
The degenerate distribution is localized at a point k0 on the real line. The probability mass function is given by:
f(k;k0) =  1,   if k = k0 0,   if k ≠ k0
The cumulative distribution function of the degenerate distribution is then:
F(k;k0) =  1,   if k ≥ k0 0,   if k < k0
is obtained by p(x1) ≠ {0, 1} and p(x2) ≠ {0, 1} . So we can set an arbitrary distribution like Pr(X1 = 1) = p
and Pr(X2 = 1) = q then the region capacity for the combined rate is given by:
R1 + R2 ≤ I(X1X2;Y) = H(Y) − H(Y|X1X2)
p(X1X2Y) p(X1X2Y) p(Y|X1X2)
(X1X2)|Y 0 1 (0, 0) p1 0 (0, 1) 0 p2 (1, 0) 0 p3 (1, 1) p4 ⁄ 2 p4 ⁄ 2 p(Y) (1)/(2) (1)/(2) (X1X2)|Y 0 1 (0, 0) (1 − p)(1 − q) 0 (0, 1) 0 (1 − p)q (1, 0) 0 p(1 − q) (1, 1) pq ⁄ 2 pq ⁄ 2 p(Y) (1)/(2) (1)/(2) (X1X2)|Y 0 1 p1 (0, 0) 1 0 p2 (0, 1) 0 1 p3 (1, 0) 0 1 p4 (1, 1) 1 ⁄ 2 1 ⁄ 2
Y =  0   with probability (1 − p)(1 − q) + (pq)/(2)       1   with probability (1 − p)q + p(1 − q) + (pq)/(2)
1 − (1 − p)(1 − q) − (pq)/(2) = 1 − (1 − q) + p(1 − q) − (pq)/(2) = q − pq + p(1 − q)) + (pq)/(2) = q(1 − p) + p(1 − q)) + (pq)/(2) = p(Y = 1)
Then we compute the combined rate
\mathchoiceR1 + R2 = H(1 − p)q + p(1 − q) + (pq)/(2) − pqR1 + R2 = H(1 − p)q + p(1 − q) + (pq)/(2) − pqR1 + R2 = H(1 − p)q + p(1 − q) + (pq)/(2) − pqR1 + R2 = H(1 − p)q + p(1 − q) + (pq)/(2) − pq
H(Y|X1X2) = (1 − p)(1 − q)log1 + (1 − p)qlog1 + p(1 − q)log(1) + (pq)/(2)log(2) + (pq)/(2)log(2) = pq
Here, as p, q ∈ (0, 1), the product pq > 0, the entropy of binary random variable is bounded by 1, we obtain the result
R1 + R2 < 1
(c) Argue that there are points in the capacity region of this multiple-access channel that can only be achieved by time-sharing; that is, there exist achievable rate pairs (R1R2) that lie in the capacity region for the channel but not in the region defined by
R1 ≤ I(X1;Y|X2)
R2 ≤ I(X2;Y|X1)
R1 + R2 ≤ I(X1, X2;Y)
for any product distribution p(x1)p(x2). Hence the operation of convexification strictly enlarges the capacity region.
——————————————————————————–—————————–
From part (b) we know that for a non-degenerate distribution the combined rate is: R1 + R2 < 1. We also know that the rates R1 and R2 are bounded by 1: R1 < 1, R2 < 1. For degenerate distributions we have that I(X1;Y|X2) = 0 and I(X2;Y|X1) = 0 yielding rates R1 = 0 or R2 = 0. The achievable pairs (R1R2) should lie in the region that can be achieved only by time sharing are those lying on the line R1 + R2 = 1 which defines the triangular capacity region for the rate pairs (1, 0) and (0, 1) as in part (a).
Не го разбирам баш што сака да каже!?

1.12.7 Convexity of capacity region of broadcast channel

Ќе ја скокнам засега!!!

1.12.8 Slepian-Wolf for deterministically related sources.

Find and sketch the Slepian Wolf rate region for the simultaneous data compression of (X, Y), where y = f(x) is some deterministic function of x.
——————————————————————————–———
R1 ≥ H(X|Y) R2 ≥ H(Y|X) R1 + R2 ≥ H(X, Y)
\mathchoiceR1 ≥ H(X|f(X))()R1 ≥ H(X|f(X))()R1 ≥ H(X|f(X))()R1 ≥ H(X|f(X))() \mathchoiceR2 ≥ H(f(X)|X) = 0R2 ≥ H(f(X)|X) = 0R2 ≥ H(f(X)|X) = 0R2 ≥ H(f(X)|X) = 0 R1 + R2 ≥ H(X) + H(Y|X) = H(X) + H(f(X)|X) = H(X) → \mathchoiceR1 + R2 ≥ H(X) (*)R1 + R2 ≥ H(X) (*)R1 + R2 ≥ H(X) (*)R1 + R2 ≥ H(X) (*)
R1 + R2 ≥ H(X) ≥ R1 ≥ H(X|f(X))
H(f(X), X) = H(X) + \cancelto0H(f(X)|X) = H(f(X)) + H(X|f(X)) → H(X) ≥ H(f(X))
H(X|f(X)) = H(X) − H(f(X))
figure Problem15.8_fig1.png
Stanford solution
figure Problem 15.8_fig.png
The quantities defining hte Slepian Wolf rate regon are H(X, Y) = H(X),  H(Y|X) = 0,  H(X|Y) ≥ 0. Hence the rate region is as shown on the figure.
Исто сум ја решил како во решението од Stanford hw3sol.pdf. Згрешив само во косата линија од H(X|Y) до H(X). Сака да каже кога R2 = 0, R1 треба да е поголемо од H(X) што следи од (*). Како почнува да расте R2 така долната граница на R1 се намалува но не подолу од H(X|Y) заради (). Точката (H(X|Y), H(Y)) следи од (*) затоа што R1 + R2 ≥ H(Y) + H(X|Y) = H(X, Y) .

1.12.9 Problem 15.9 Slepian-Wolf

Let Xi be i.i.d Bernoulli(p). Let Zi be i.i.d  ~ Bernoulli(r), and let Z be independent of X. Finally, let Y = XZ. Let X be described at rate R1 and Y be described at rate R2. What region of rates allows recovery of X, Y with probability of error tending to zero?
p(X, Y, Z) p(Y|X, Z)
(X, Z)|Y 0 1 (0, 0) (1 − p)(1 − r) 0 (0, 1) 0 (1 − p)r (1, 0) 0 p(1 − r) (1, 1) pr 0 p(Y) pr + (1 − p)(1 − r) (1 − p)r + p(1 − r) (X, Z)|Y 0 1 (0, 0) 1 0 (0, 1) 0 1 (1, 0) 0 1 (1, 1) 1 0
R1 < I(X;Y|Z) R2 < I(Z;Y|X) R1 + R2 < I(X, Z;Y)
p(Y = 1) = (1 − p)r + p(1 − r) = r − pr + p − pr = r + p − 2pr
p(Y = 0) = pr + (1 − p)(1 − r) = pr + 1 − r − p + pr = 1 − r − p + 2pr
I(X;Y|Z) = H(X) − \cancelto0H(X|Y, Z) = H(X) = H(p) ≤ 1 Ако ги знаеш (Y, Z) одма го наоѓаш X. Инверзна операција од mod(2) .
I(Z;Y|X) = H(Z) − \cancelto0H(Z|X, Y) = H(Z) = H(r) ≤ 1
I(X, Z;Y) = H(Y) − \cancelto0H(Y|X, Z) = H(r + p − 2pr)\begin_inset Separator latexpar\end_inset
figure Prob15.1_fig2.png
I(X;Y) = I(X, Z;Y) − I(Z;Y|X) = H(r + p − 2pr) − H(r) I(Z;Y) = I(X, Z;Y) − I(X;Y|Z) = H(r + p − 2pr) − H(p)
figure Problem 15.9 fig2.jpg
Сликава е за p = 0.3 r = 0.3
p(X, Y, Z) p(Y|X, Z)
(X, Z)|Y 0 1 (0, 0) (1 − p)(1 − r) 0 (0, 1) 0 (1 − p)r (1, 0) 0 p(1 − r) (1, 1) pr 0 p(Y) pr + (1 − p)(1 − r) (1 − p)r + p(1 − r) (X, Z)|Y 0 1 (0, 0) 1 0 (0, 1) 0 1 (1, 0) 0 1 (1, 1) 1 0
p(X, Y, Z) p(Z|X, Y)
(X, Y)|Z 0 1 (0, 0) (1 − p)(1 − r − p + 2pr) 0 (0, 1) 0 (1 − p)⋅(r + p − 2pr) (1, 0) p(1 − r − p + 2pr) (1, 1) p(r + p − 2pr) 0 p(Z) pr + (1 − p)(1 − r) (1 − p)r + p(1 − r) (X, Y)|Z 0 1 (0, 0) 1 0 (0, 1) 0 1 (1, 0) 0 1 (1, 1) 1 0
p(Y = 1) = (1 − p)r + p(1 − r) = r − pr + p − pr = r + p − 2pr
p(Y = 0) = pr + (1 − p)(1 − r) = pr + 1 − r − p + pr = 1 − r − p + 2pr
p(Z = 0) = (1 − p)(1 − r − p + 2pr) + p(r + p − 2pr) = 1 − r − p + 2pr − \cancelp + pr + \cancelp2 − \cancel2p2r + pr + p2 − \cancel2p2r = 1 − r − 2 p + 4 pr + 2 p2 − 4 p2r
p(Z = 1) = (1 − p)⋅(r + p − 2pr) + p(1 − r − p + 2pr) = r + 2 p − 4 pr − 2 p2 + 4 p2r
1 − r − 2 p + 4 pr + 2 p2 − 4 p2r = 1 − r → p = p; r = (1)/(2)
r + 2 p − 4 pr − 2 p2 + 4 p2r = r → p = p; r = (1)/(2)
R1 < I(X;Z|Y) = H(Z|Y) − H(Z|Y, X) = H(Z|Y) = p(Y = 0)H(Z|Y = 0) + p(Y = 1)H(Z|Y = 1) = H(p) ≤ 1
H(Z|Y = 0) =  − p(Z = 0|Y = 0)log(p(Z = 0|Y = 0)) − p(Z = 1|Y = 0)log(p(Z = 1|Y = 0)) =  − p(X = 0)log(p(X = 0)) − p(X = 1)log(p(X = 1)) = H(p)
H(Z|Y = 1) = H(p)
R2 < I(Y;Z|X) = H(Z|X) − \cancelto0H(Z|X, Y) = H(Z) = H(r) ≤ 1
R1 + R2 <I(X, Y;Z) = H(Z) − H(Z|XY) = H(r) ≤ 1
Регионот на брзини што дозволува веројатноста на грешка да тежнее кон 0 е:
R1 ≤ 1 R2 ≤ 1 R1 + R2 ≤ 1\begin_inset Separator latexpar\end_inset
figure Problem15.2_fig1.jpg
UIC ECE534 Solutions
Овие третирале само кодирање на изворот. И мене ми текна штом спомнуваат Slepian-Woolf но ја тераф како за регион на капацитети. Истите работи сум ги мотал со тоа што јас мислам сум отидол чекор понапред во размислувањата. Сепак очигледно тоа не е предмет на задачата.
p(X, Y, Z) p(Y|X, Z)
(X, Z)|Y 0 1 (0, 0) (1 − p)(1 − r) 0 (0, 1) 0 (1 − p)r (1, 0) 0 p(1 − r) (1, 1) pr 0 p(Y) pr + (1 − p)(1 − r) (1 − p)r + p(1 − r) (X, Z)|Y 0 1 (0, 0) 1 0 (0, 1) 0 1 (1, 0) 0 1 (1, 1) 1 0
p(Y = 1) = (1 − p)r + p(1 − r) = r − pr + p − pr = r + p − 2pr
p(Y = 0) = pr + (1 − p)(1 − r) = pr + 1 − r − p + pr = 1 − r − p + 2pr
H(X) = H(p)
H(Z) = H(r)
H(Y) = H(r + p − 2pr)
H(X, Y) = H(X) + H(Y|X) = (*)
H(Y|X) = p(X = 0)H(Y|X = 0) + p(X = 1)H(Y|X = 1)
H(Y|X = 0) =  − p(Y = 0|X = 0)log(p(Y = 0|X = 0)) − p(Y = 1|X = 0)log(p(Y = 1|X = 0)) = 
 =  − p(Z = 0)log(p(Z = 0) − p(Z = 1)log(p(Z = 1) = H(r)
H(Y|X = 1) =  − p(Y = 0|X = 1)log(p(Y = 0|X = 1)) − p(Y = 1|X = 1)log(p(Y = 1|X = 1)) =  − p(Z = 1)log(p(Z = 1) − p(Z = 0)log(p(Z = 0) = H(r)
H(Y|X) = H(r)
(*) = H(p) + H(r)
\mathchoiceR1 ≥ H(X|Y) = H(X, Y) − H(Y) = H(p) + H(r) − H(r + p − 2pr)R1 ≥ H(X|Y) = H(X, Y) − H(Y) = H(p) + H(r) − H(r + p − 2pr)R1 ≥ H(X|Y) = H(X, Y) − H(Y) = H(p) + H(r) − H(r + p − 2pr)R1 ≥ H(X|Y) = H(X, Y) − H(Y) = H(p) + H(r) − H(r + p − 2pr)
\mathchoiceR2 ≥ H(Y|X) = H(r)R2 ≥ H(Y|X) = H(r)R2 ≥ H(Y|X) = H(r)R2 ≥ H(Y|X) = H(r)
\mathchoiceR1 + R2 ≥ H(X, Y) = H(p) + H(r)R1 + R2 ≥ H(X, Y) = H(p) + H(r)R1 + R2 ≥ H(X, Y) = H(p) + H(r)R1 + R2 ≥ H(X, Y) = H(p) + H(r)
Задачава се решава со истите финит што сум ги користел во мојот самостоен обид!!!

1.12.10 Broadcast capacity depends only on the conditional marginals.

Consider the general broadcast channel (X, Y1 xY2, p(y1, y2|x)). Show that the capacity region depends only on p(y1|x) and p(y2|x). To do this, for any given ((2nR1, 2nR2), n) code, let
P(n)1 = P{1( Y1) ≠ W1}
P(n)2 = P{2( Y2) ≠ W2}
P(n) = P{(12) ≠ (W1W2)}
Then show that
max{P(n)1, P(n)2} ≤ P(n) ≤ P(n)1 + P(n)2
The result now follows by a simple argument. (Remark: The probability of error P(n) does depend on the conditional joint distribution p(y1, y2|x). But whether or not P(n) can be driven to zero [at rates (R1R2)] does not [except through the conditional marginals p(y1|x), p(y2|x)].)
Solution Ben-Guron University (hw2sol.pdf)
By the union of events bound, it is obvious that (мене повеќе ми изгледа на пресек отколку на униjа, ама изгеда заради суперпозицијата ако е грешно декодиран W1 следи дека е грешно декодиран и W2 и обратно ако е грешно декодиран W2 тогаш е грешно декодиран и W2 )
P(n) = Pr(1( Y1) ≠ W12( Y2) ≠ W2) = P{1( Y1) ≠ W1} + P{2( Y2) ≠ W2} − P(1( Y1) ≠ W12( Y2) ≠ W2)
 ≤ P{1( Y1) ≠ W1} + P{2( Y2) ≠ W2} ≤ P(n)1 + P(n)2.
Also since (1( Y1) ≠ W1) or (2( Y2) ≠ W2) implies ((12) ≠ (W1W2)) , we have
P(n) ≥ max{P(n)1, P(n)2}
The probability of error, P(n) for a broadcast channel does depend on the joint conditional distribution. However, the individual probabilities of error P(n)1 and P(n)2 however depend only on the conditional marginal distributions p(y1|x) and p(y2|x) respectively. Hence if we have a sequence of codes for a particular broadcast channel with P(n) → 0, so that P(n)1 → 0 and P(n)2 → 0, then using the same codes for another broadcast channel with the same conditional marginals will ensure that P(n) for that channel as well, and the corresponding rate pair is achievable for the second channel. Hence the capacity region for a broadcast channel depends only on the conditional marginals.

1.12.11 Converse for the degraded broadcast channel

The following chain of inequalities proves the converse for the degraded discrete memory-less broadcast channel. Provide reasons for each of the labeled inequalities.
Setup for converse for degraded broadcast channel capacity:
(157) \mathchoice(W1, W2) indep. → Xn(W1W2) → Yn1 → Yn2(W1, W2) indep. → Xn(W1W2) → Yn1 → Yn2(W1, W2) indep. → Xn(W1W2) → Yn1 → Yn2(W1, W2) indep. → Xn(W1W2) → Yn1 → Yn2
Encoding:
fn:2nR1 x2nR2 → Xn
Decoding:
gn:Yn1 → 2nR1, hn:Yn2 → 2nR2.
Let \mathchoiceUi = (W2, Yi − 11)Ui = (W2, Yi − 11)Ui = (W2, Yi − 11)Ui = (W2, Yi − 11). Случајната променлива Ui зависи од втората порака и од претходните вредности на Y1 Then
nR2\overset(*) ≤ H(W2) = I(W2;Yn2) − H(Yn2|W2) ≤ I(W2;Yn2)\overset(a) = ni = 1I(W2;Y2i|Yi − 12)\overset(b) = ni = 1H(Y2i|Yi − 12) − H(Y2i|Yi − 12W2) = 
\overset(c) ≤ ni = 1H(Y2i) − H(Y2i|Yi − 12W2Yi − 11)\overset(d) = ni = 1H(Y2i) − H(Y2i|W2Yi − 11) = ni = 1H(Y2i) − H(Y2i|Ui)\overset(e) = ni = 1I(Ui;Y2i)
(a) chain rule for mutual information
(b) definition of mutual information
(c) memoryless-ness of the broadcast channel and/or conditioning reduces entropy
(d) broadcast channel is memoryless and degraded, hence current outputs doesn’t depend on previous outputs
(e) definition of auxiliary random variable and definition of mutual information.
Continuation of the converse
Continuation of converse:
Give reasons for the labeled inequalities:
nR1 ≤ H(W1) = I(W1;Yn1) − H(W1|Yn1)\overset\mathchoice(f)(f)(f)(f) ≤ I(W1;Y(n)1) ≤ \overset(**)I(W1;Yn1) + I(W1;W2|Yn1) = \mathchoiceI(W1;Yn1, W2)I(W1;Yn1, W2)I(W1;Yn1, W2)I(W1;Yn1, W2) = I(W1;W2) + I(W1;Yn1|W2)
\overset(g) ≤ I(W1;Yn1|W2)\overset(h) = ni = 1I(W1;Y1i|W2Yi − 11)\overset(i) ≤ ni = 1I(Xi;Y1i|Ui)
(f) Due to the expressions given in (**)
(g) I(W1, W2) = 0 due to independence
(h) chain rule fore mutual information
(i) due to the definition of auxiliary variable U, and data processing (from formulation of the problem)
Со оглед на тоа што се рабтои за memoryless канал покрај 157↑важи и:
(W1, W2) indep. → Xi(W1W2) → Y1i → Y2i
W1 → Xi → Y1i ⇒ I(W1;Y1i) ≤ I(Xi;Y1i)
Now let Q be a time-sharing random variable with Pr(Q = i) = (1)/(n); i = 1, 2, 3, ..., n. Justify the following:
R1 ≤ I(X1Q;Y1Q|U1, Q)
R2 ≤ I(UQ;Y2Q|Q)
for some distribution p(q)p(u|q)p(x|u, q)p(y1, y2|x).
By appropriately redefining U, argue that this region is equal to the convex closure of regions of the form
R1 ≤ I(X;Y1|U)
R2 ≤ I(U;Y2)
nR2 ≤ n(1)/(n)ni = 1I(Ui;Y2) = nni = q\canceltop(q)(1)/(n)I(Uq;Y2q|Q = q) = nni = qp(q)I(Uq;Y2q|Q = q) = nI(UQ;Y2Q|Q) → R2 ≤ I(UQ;Y2Q|Q)
nR1 ≤ n(1)/(n)ni = 1I(Xi;Y1|Ui) = nni = q\canceltop(q)(1)/(n)I(Uq;Y2q|Uq, Q = q) = nni = qp(q)I(Uq;Y2q|Uq, Q = q) = nI(UQ;Y2Q|UQ, Q) → R1 ≤ I(UQ;Y2Q|UQ, Q)

1.12.12 Capacity points (MMV)

(a) For the degraded broadcast channel X → Y1 → Y2 find the points a and b where the capacity region hits the R1 and R2 axes.\begin_inset Separator latexpar\end_inset
figure Problem 15.12_fig.png
Figure 45 Capacity region of degraded broadcast channel
(b) Show that b ≤ a.
R1 < I(X;Y1|U)
R2 < I(U;Y2)
(a)
a = I(X;Y1|U) b = I(U;Y2);
(b)
p(u)p(x|u)p(y1, y2|x)
I(U, X;Y1) = I(X;U) + I(X;Y1|U)
I(U, X;Y2) = I(U;Y2) + I(X;Y2|U) = R2 + I(X;Y2|U) ≤ R2 + I(X;Y1|U)
I(U, X;Y2) ≤ I(U, X;Y1) due to data processing
(U, X) → Y1 → Y2
I(U, X;Y1) = \cancelto0I(X;U) + I(X;Y1|U) ≥ R2 + I(X;Y2|U) → R1 ≥ R2 + I(X;Y2|U) → R1 > R2
if U = f(X); I(X;Y) = 0
Alternative approach
R2 ≤ I(U;Y2) ≤ I(X;Y1)
I(U, X;Y1) = I(X;U) + I(X;Y1|U) = I(X;Y1) + \cancelto0I(U;Y1|X) = I(X;Y1) → I(X;Y1) ≥ I(X;Y1|U) > R1
I(U;Y1|X) = H(Y1|X) − H(Y1|X, U) = |markovity i.e. degradedness| = H(Y1|X) − H(Y1|X) = 0
I(U, X;Y1) = I(X;Y1) + \cancelto0I(U;Y1|X) = I(X;Y1) = I(X;U) + I(X;Y1|U)
I(X;Y1) ≥ \mathchoice\undersetR1I(X;Y1|U) ≥ I(U, X;Y2) = I(U;Y2) + I(X;Y2|U) ≥ \undersetR2I(U;Y2)\undersetR1I(X;Y1|U) ≥ I(U, X;Y2) = I(U;Y2) + I(X;Y2|U) ≥ \undersetR2I(U;Y2)\undersetR1I(X;Y1|U) ≥ I(U, X;Y2) = I(U;Y2) + I(X;Y2|U) ≥ \undersetR2I(U;Y2)\undersetR1I(X;Y1|U) ≥ I(U, X;Y2) = I(U;Y2) + I(X;Y2|U) ≥ \undersetR2I(U;Y2)
R1 ≥ R2
Stanford university (hw3sol.pdf) Solution
The capacity region of the degraded broadcast channel X → Y1 → Y2 is the convex hull of regions of the form
R1 < I(X;Y1|U); R2 < I(U;Y2)
over all choices of the auxiliary random variable U and joint distribution of the form p(u)p(x|u)p(y1, y2|x).
The region is of the form 45↑.
The point b on the figure corresponds to the maximum achievable rate from the sender to receiver 2. From the expression for the capacity region, it is the maximum value of I(U;Y2),  for all auxiliary random variables U.
For any random variable U and p(u)p(x|u), U → X → Y2 forms a Markov chain, and hence I(U;Y2) ≤ I(X;Y2) ≤ maxp(x)I(X;Y2). The maximum can be achieved by setting U = X and choosing the distribution of X to be the one that maximizes I(X, Y2). Hence the point b corresponds to \mathchoiceR2 = maxp(x)I(X;Y2)R2 = maxp(x)I(X;Y2)R2 = maxp(x)I(X;Y2)R2 = maxp(x)I(X;Y2),  R1 = I(X;Y1|U) = I(X;Y1|X) = H(X|X) − H(X|Y1, X) = 0 − 0 = 0. Многу е важно ова во црвеново. Сака да каже дека горната граница на податочната брзина може да оди до максимумот на трансинформацијата на десната страна од неравенството. The point a has similar interpretation. The point a corresponds to the maximum rate of transmission to receiver 1. From the expression for the capacity region,
\mathchoiceR1 ≤ I(X;Y1|U) = H(Y1|U) − H(Y1|X, U) = H(Y1|U) − H(Y1|X)R1 ≤ I(X;Y1|U) = H(Y1|U) − H(Y1|X, U) = H(Y1|U) − H(Y1|X)R1 ≤ I(X;Y1|U) = H(Y1|U) − H(Y1|X, U) = H(Y1|U) − H(Y1|X)R1 ≤ I(X;Y1|U) = H(Y1|U) − H(Y1|X, U) = H(Y1|U) − H(Y1|X)
Since U → X → Y1 forms a Markov chain. Since H(Y1|U) ≤ H(Y1), we have
\mathchoiceR1 ≤ H(Y1) − H(Y1|X) = I(X;Y1) ≤ maxp(x)I(X;Y1)R1 ≤ H(Y1) − H(Y1|X) = I(X;Y1) ≤ maxp(x)I(X;Y1)R1 ≤ H(Y1) − H(Y1|X) = I(X;Y1) ≤ maxp(x)I(X;Y1)R1 ≤ H(Y1) − H(Y1|X) = I(X;Y1) ≤ maxp(x)I(X;Y1)
The maximum is attained when we set U = 0 and chose p(x) = p(x|u)to be distribution that maximizes I(X;Y1). In this case, R2 ≤ I(U;Y2) = 0.
Hence the point a corresponds to the rates R1 = maxp(x)I(X;Y1),  R2 = 0.
The results have a simple single user interpretation. If we are not sending any information to receiver 1, then we can treat the channel to receiver 2 as single user channel and send at capacity for this channel, i.e., max{I(X;Y2)}. Similarly, if we are not sending any information to receiver 2, we can send at capacity to receiver 1, which is maxI(X;Y1).
(b) Since X → Y1 → Y2 forms Markov chain for all distributions p(x) we have by the data processing inequality
b = maxp(x)I(X;Y2) = I(X*;Y2) ≤ I(X*;Y1) = maxp(x)I(X;Y1) = a
where X* has distribution that maximizes I(X;Y2).

1.12.13 Degraded broadcast channel.

Find the capacity region for the degraded broadcast channel shown below
figure Probelm 15.13_fig1.png
figure Problem15.15 fig3.png
R2 < I(U;Y2) ≤ I(X;Y2) = H(X) − H(X|Y2) = 1 − H(X|Y2) = H(Y2) − H(Y2|X)
αp = (1 − p)α + pα = α − pα + αp = α
αp = (1 − p)(1 − α) + p(1 − α) = 1 − α − p + pα + p − pα = 1 − α
1 − αp = 1 − pα − (1 − p)α = 1 − pα − α + pα = 1 − α
αp + 1 − α = α − pα + pα + 1 − α = 1
R2 < (1 − α)H(X)
R1 < I(X;Y1|U) = H(Y1|U) − H(Y1|X, U) = H(Y1|U) − H(Y1|X) = H(βp) − (1 − α)H(X)
——————————————————————————–——————————————————-
Потсетување на примерот од предавања.
figure Problem 15.13 fig2.png
p2 = (1 − p1)α + p1(1 − α) = αp1
βp2 = (1 − p2)β + (1 − β)p2
R1 < I(U;Y2) = H(Y2) − H(Y2|U) = 1 − H(βp2)
R2 < I(X;Y1|U) = H(Y1|U) − H(Y1|X, U) = H(βp1) − H(Y1|X) = H(βp1) − H(p1)
——————————————————————————–——————————————————-
Ben-Guron Univeristy hw2sol.pdf Solution
From the expression for the capacity region, it is clear that the only on trivial possibility for the auxiliary random variable U is that it be binary. From the symmetry of the problem we see that the auxiliary random variable should be connected to X by a binary symmetric channel with parameter β.
Hence we have the setup as shown in figure below:
figure Problem 15.13 fig3.png
Figure 46 Broadcast channel with auxiliary random variable
We can now evaluate the capacity region for this choice of auxiliary random variable. By symmetry best distribution for U is uniform.
\mathchoiceR2R2R2R2 = I(U;Y2) = H(Y2) − H(Y2|U) = H(α)/(2), α, (α)/(2) − H(Y2|U) = H(α)/(2), α, (α)/(2) − H((βp + βp)α, α, (βp + βp)α) = ()
H(Y2|U) = P(U = 0)H(Y2|U = 0) + p(U = 1)H(Y2|U = 1) = H((βp + βp)α, α, (βp + βp)α)
H(Y2|U = 0) = H((βp + βp)α, α, (βp + βp)α) = H(Y2|U = 1) P(U = 0) = P(U = 1) = (1)/(2)
(1 − β)(1 − p)(1 − α) + βp(1 − α) = βpα + βpα = (βp + βp)α
(1 − β)⋅(1 − p)⋅α + (1 − β)pα + βpα + β(1 − p)α = α
(1 − β)p(1 − α) + β(1 − p)(1 − α) = βpα + βpα = (βp + βp)α
() = (α)/(2)log(2)/(α) + αlog(1)/(α) + (α)/(2)log(2)/(α) − (βp + βp)α⋅log(1)/((βp + βp)α) − αlog(1)/(α) − (βp + βp)α⋅log(1)/((βp + βp)α)
(1 − α)log(2)/(1 − α) + αlog(1)/(α) − αlog(1)/(α) − (βp + βp)α⋅log(1)/((βp + βp)α) − (βp + βp)α⋅log(1)/((βp + βp)α)
\oversetH(α)(1 − α)log(1)/(1 − α) + αlog(1)/(α) − αlog(1)/(α) + (1 − α)log2 + (βp + βp)α⋅log(1)/((βp + βp)α) + (βp + βp)α⋅log(1)/((βp + βp)α)
H(α) − αlog(1)/(α) + \oversetα(1 − α) − (βp + βp)α⋅log(1)/((βp + βp)α) − (βp + βp)α⋅log(1)/((βp + βp)α) = ()
1 − (βp + βp) = 1 − (1 − β)(1 − p) − βp = 1 − (1 − p − β + βp) − βp = 1 − 1 + p + β − βp − βp = p + β − 2βp
(βp + βp) = (1 − β)p + β(1 − p) = p − βp + β − βp = p + β − 2βp
Значи:
\mathchoice(βp + βp) = 1 − (βp + βp)(βp + βp) = 1 − (βp + βp)(βp + βp) = 1 − (βp + βp)(βp + βp) = 1 − (βp + βp)
() = H(α) − αlog(1)/(α) + αH(1)/(2) − \underset − αH((βp + βp))(βp + βp)α⋅log(1)/((βp + βp)) − (βp + βp)α⋅log(1)/((βp + βp)) − (βp + βp)α⋅log(1)/(α) − (βp + βp)α⋅log(1)/(α) = 
H(α) − αlog(1)/(α) + αH(1)/(2) − \undersetαH((βp + βp)) − (βp + βp)α⋅log(1)/(α) − (βp + βp)α⋅log(1)/(α) = 
H(α) − αlog(1)/(α) + αH(1)/(2) − \undersetαH((βp + βp)) − (1 − \cancel(βp + βp))α⋅log(1)/(α) − \cancel(βp + βp)α⋅log(1)/(α) = 
H(α) + αlog(1)/(α) + αH(1)/(2) − \undersetαH((βp + βp)) − α⋅log(1)/(α) = H(α) − H(α) +  + αH(1)/(2) − \undersetαH((βp + βp)) = \mathchoiceα(1 − H((βp + βp)))α(1 − H((βp + βp)))α(1 − H((βp + βp)))α(1 − H((βp + βp)))
\mathchoiceR1R1R1R1 = I(X;Y1|U) = H(Y1|U) − H(Y1|U, X) = H(Y1|U) − H(Y1|X) = \mathchoice\overset(*)H(βp + βp) − H(p)\overset(*)H(βp + βp) − H(p)\overset(*)H(βp + βp) − H(p)\overset(*)H(βp + βp) − H(p)
Еквивалентата crossover probability за првите два сегменти од каскадата е:
(*) (1 − β)p + β(1 − p) = βp + βp
These equations characterize the boundary of the capacity region as β varies. When β = 0, then R1 = 0 and R2 = α(1 − H(p)). When β = (1)/(2), we have R1 = 1 − H(p) and R2 = 0.
(1)/(2)(1 − p) + (1)/(2)p = (1)/(2) → H(βp + βp) = H(1)/(2) = 1
βp + βp = (1)/(2)p + (1)/(2)(1 − p) = (1)/(2)

1.12.14 Channels with unknown parameters

We are given a binary symmetric channel with parameter p. The capacity is C = 1 − H(p). Now we change the problem slightly. The receiver knows only that p ∈ {p1, p2} (i.e., p = p1 or p = p2, where p1 and p2 are given real numbers). The transmitter knows the actual value of p. Devise two codes for use by the transmitter, one to be used if p = p1, the other to be used if p = p2, such that transmission to the receiver can take place at rate  ≈ C(p1) if p = p1 and at rate  ≈ C(p2) if p = p2. (Hint: Devise a method for revealing p to the receiver without affecting the asymptotic rate. Prefixing the codeword by a sequence of 1’s of appropriate length should work.)
EIT Solution Complete
We have two possibilities; the channel is a BSC with parameter p1 or a BSC with parameter p2. If both sender and receiver know that state of channel, then we can achieve the capacity corresponding to which channel is in use, i.e., 1 − H(p1) or 1 − H(p2).
If the receiver does not know the state of the channel, then he cannot know which codebook is being used by the transmitter. He cannot then decode optimally; hence he cannot achieve the rates corresponding to the capacities of the channel.
But the transmitter can inform the receiver of the state of the channel so that the receiver can decode optimally. To do this, the transmitter can precede the codewords by a sequence of 1’s and 0’s. Let us say we use a string of m 1’s to indicate that the channel was in state p1 and m 0’s to indicate state p2. Then, if m = o(n) and m → ∞, where n is the block length of the code used, we have the probability of error in decoding the state of the channel going to zero. Since the receiver will then use the right code for the rest of the message, it will be decoded correctly with P(n)e → 0.
The effective rate for this code is:
R = (log2nC(pi))/(n + m) → C(pi) since m = o(n)
So we can achieve the same asymptotic rate as if both sender and receiver knew the state of the channel.

1.12.15 Two way channel

Consider the tow-way channel shown in bellow. The outputs Y1 and Y2 depend only on the current inputs X1 and X2.\begin_inset Separator latexpar\end_inset
figure Figure 15.6 Two-way channel.png
Обратно се стрелките за X2 и Y2 .
(a) By using independently generated codes for the two senders, show that the following rate regions is achievable:
R1 < I(X1;Y2|X2)
R2 < I(X2;Y1|X1)
for some product distribution p(x1)p(x2)p(y1y2|x1x2).
(b) Show that the rates for any code for a two-way channel with arbitrarily small probability of error must satisfy (converse).
R1 ≤ I(X1;Y2|X2)
R2 ≤ I(X2;Y1|X1)
for some joint distribution p(x1x2)p(y1, y2|x1x2).
The inner and outer bounds on the capacity of the two-way channel are due to Shannon [3]. He also showed that the inner bound and the outer bound do not coincide in the case of the binary multiplying channel X1 = X2 = Y1 = Y2 = {0, 1},  Y1 = Y2 = X1, X2 . The capacity of the two-way channel is still and open problem.
(a)
——————————————————————————–——————————————————————————–—————–
nR1 = H(W1) = I(W1;Yn2) + H(W1|Yn2) ≤ I(W1;Yn2) + nϵn ≤ I(Xn1;Yn2) + nϵn ≤ H(Yn2) − H(Yn2|Xn1Xn2) = ni = 1H(Y2i|Yi − 12) − ni = 1H(Y2i|Yi − 12Xn1)
 = ni = 1H(Y2i|Yi − 12) − ni = 1H(Y2i|X1iX2i) ≤ ni = 1H(Y2i) − ni = 1H(Y2i|X1iX2i) = ni = 1H(Y2i|X2i) − ni = 1H(Y2i|X1iX2i) = ni = 1I(X1i;Y2i|X2i)
——————————————————————————–——————————————————————————–—————————————————–
nR1 = H(W1) = I(W1;Yn2) + H(W1|Yn2) ≤ I(W1;Yn2) + nϵn ≤ I(Xn1(W1);Yn2) + nϵn = H(Xn1) − H(Xn1|Yn2Xn2)\overset(a) ≤ H(Xn1) − H(Xn1|Yn2Xn2) = H(Xn1|Xn2) − H(Xn1|Yn2Xn2) ≤ 
 ≤ ni = 1H(X1i|Xi − 11Xn2) − ni = 1H(X1i|Yn2X1 − i1Xn2) = ni = 1H(X1i|X2i) − ni = 1H(X1i|X2iY2i) = ni = 1I(X1i;Y2i|X2i)
(a) Conditioning reduces entropy
(b) X1 and X2 are independent.
——————————————————————————–——————————————————————————–——————————————————-
Achievability
Recall for Multiple-access channel (Самостојно изведување)
Eij = P(X1i, X2j, Y2 ∈ A(n)ϵ)
Pe = P(Ec11) + j ≠ 1P(E1i) + i ≠ 1P(Ei1) + i ≠ 1P(Ei1) + i ≠ 1,  j ≠ 1P(Eij) = ϵ + j ≠ 1P(E1i) + i ≠ 1P(Ei1) + j ≠ 1P(E1j) + i ≠ 1,  j ≠ 1P(Eij)
P(Ei1) = x1x2y ∈ A(n)ϵp(x1, x2, y) = x1x2y ∈ A(n)ϵp(x1)p(x2y) ≤ x1x2y ∈ A(n)ϵ2 − n(H(X1) − ϵ)2 − n(H(X2Y) − ϵ) ≤ 2n(H(X1X2, Y) + ϵ)2 − n(H(X1) − ϵ)2 − n(H(X2Y) − 2ϵ) = 
2n(H(X1X2, Y) + 3ϵ − H(X1) − H(X2Y)) = 2 − n( − H(X1X2Y) − 3ϵ + H(X1) + H(X2Y)) = 2 − n(I(X1;Y|X2) − 3ϵ)
p(x1, x2, y) = p(x1)p(x2|x1)p(y|x1x2) = p(x1)p(x2)p(y|x1x2) = p(x1)p(x2)p(y|x1x2) = p(x1)p(x2y)
 − H(X1, X2, Y) + H(X1) + H(X2Y) =  − \cancelH(X1) − H(X2|X1) − H(Y|X1X2) + \cancelH(X1) + H(X2) + H(Y|X2) =  − \cancelH(X2) − H(Y|X1X2) + \cancelH(X2) + H(Y|X2)
 − H(Y|X1X2) + H(Y|X2) = I(X1;Y|X2)
j ≠ 1P(E1j) ≤ 2nR1⋅2 − n(I(X1;Y|X2) − 3ϵ)
R1 ≤ I(X1;Y|X2) →  if n → ∞  → P(Ei1) → 0
Слично важи за членот
ј ≠ 1P(E1 ј) ≤ 2nR2⋅2 − n(I(X2;Y|X1) − 3ϵ) → R2 ≤ I(X2;Y|X1)
односно за членот
i ≠ 1,  j ≠ 1P(Eij) ≤ 2n(R1 + R2)⋅2 − n(I(X1, X2;Y) − 3ϵ) → R1 + R2 ≤ I(X1, X2;Y)
——————————————————————————–———————————————————-
Eij = P(X1i, X2j, Y1i, Y2j ∈ A(n)ϵ)
Pe = P(Ec11) + j ≠ 1P(E1i) + i ≠ 1P(Ei1) + i ≠ 1P(Ei1) + i ≠ 1,  j ≠ 1P(Eij) = ϵ + j ≠ 1P(E1i) + i ≠ 1P(Ei1) + j ≠ 1P(E1j) + i ≠ 1,  j ≠ 1P(Eij)
P(Ei1) = (x1x2y1y2) ∈ A(n)ϵp(x1y2|x2)p(x2y1|x1) = (x1x2y1y2) ∈ A(n)ϵp(x1y2|x2)p(x2y1|x1) ≤ (x1x2y1y2) ∈ A(n)ϵ2 − n(H(X1Y2|X2) − 2ϵ)2 − n(H(X2Y1|X1) − 2ϵ) ≤ 
2n(H(X1X2, Y1, Y2) + ϵ)2 − n(H(X1Y2|X2) − ϵ)2 − n(H(X2Y1|X1) − ϵ)
2n(H(X1X2, Y1) + 5ϵ − H(X1Y2|X2) − H(X2Y1|X1)) = 2 − n( − H(X1X2Y1Y2) − 5ϵ + H(X1Y2|X2) + H(X2Y1|X1)) = 2 − n(I(X1;Y2|X2) − 5ϵ)
p(x1, x2, y1, y2) = p(x1)p(x2|x1)p(y1y2|x1x2) = p(x1)p(x2)p(y1|x1x2)p(y2|x1x2y2) = p(x1)p(x2)p(y1|x1x2)p(y2|x1x2) = p(x1y2|x2)p(x2y1|x1)
j ≠ 1P(E1j) ≤ 2nR1⋅2 − n(I(X1;Y|X2) − 3ϵ)
R1 ≤ I(X1;Y|X2) →  if n → ∞  → P(Ei1) → 0
H(X1X2Y1Y2) + H(X1Y2|X2) + \mathchoiceH(X2Y1|X1)H(X2Y1|X1)H(X2Y1|X1)H(X2Y1|X1) =  − H(X1) − H(X2|X1) − H(Y1Y2|X1X2) + H(X1Y2|X2) + H(X2Y1|X1)
H(X1X2Y1Y2) − 5ϵ + H(X1Y2|X2) + H(X2Y1|X1)
H(X1X2Y1Y2) = H(X2) + H(Y1|X2) + H(X1Y2|Y1X2)
H(X1X2Y1Y2) + H(X1Y2|X2) + H(X2Y1|X1) =  − H(X2) − H(Y1|X2) − H(X1Y2|Y1X2) + H(X1Y2|X2) + H(X2Y1|X1)
H(X1X2Y1Y2) = H(X1) + H(X2Y1|X1) + H(Y2|X1Y1X2)
H(X1X2Y1Y2) + H(X1Y2|X2) + H(X2Y1|X1) =  − H(X1) − \cancelH(X2Y1|X1) − H(Y2|X1Y1X2) + H(X1Y2|X2) + \cancelH(X2Y1|X1) = 
 =  − H(X1) − H(Y2|X1Y1X2) + H(X1Y2|X2) =  − \cancelH(X1) − H(Y2|X1Y1X2) + \cancelH(X1|X2) + H(Y2|X1X2) =  − H(Y2|X1Y1X2) + H(Y2|X1X2) = I(Y1;Y2|X1X2)
R1 ≤ I(Y1;Y2|X1X2)
——————————————————————————–———————————————————
EIT Solutions Complete
We will only outline the proof of achievability. It is quite straightforward compared to the more complex channels considered in the text.
Fix p(x1)p(x2)p(y1y2|x1x2)
Code generation:
Generate a cod of size 2nR1 of codewords X1(w1), where the x1i are generated i.i.d.  ~ p(x1) . Similarly generate a codebook X2(w2) of size 2nR2 .
Encoding:
To send index w1 form sender 1, he sends X1(w1) . Similarly sender 2 sends X2(w2).
Decoding:
Receiver 1 looks for the unique w2, such that (X1(w1),  X2(w2),  Y1) ∈ A(n)ϵ(X1X2Y1). If there is no such w2 or more than one such, it declares an error. Similarly, receiver 2 looks for the unique w1, such that (X1(w1),  X2(w2),  Y2) ∈ A(n)ϵ(X1X2Y2).
Analysis of probability of error
We will only analyze the probability of error in receiver 1. The analysis in receiver 2 is similar.
Without loss of generality, by the symmetry of the random code construction, we can assume that (1, 1) was sent. We have and error at receiver 1if
- (X1(1), X2(1), Y1) ≠ A(n)ϵ(X1X2Y1) The probability of this goes to 0 by the law of large numbers as n → ∞.
- There exist and j ≠ 1 , such that (X1(1), X2(j), Y1) ∈ A(n)ϵ(X1X2Y1)
Define events
Ej = {(X1(1), X2(j), Y1) ∈ A(n)ϵ}
Then by the union of events bound,
P(n)ϵ = P(Ec1∪∪j ≠ 1Ej) ≤ P(Ec1) + j ≠ 1P(Ej)
where P is the probability given that (1, 1) was sent. From AEP, P(Ec1) → 0.
P(Ej) = P((X1X2(j), Y1) ∈ A(n)ϵ) = x1x2yp(x2)p(x1y1) ≤ |A(n)ϵ|2 − n(H(X2) − ϵ)2 − n(H(X1Y1) − ϵ) ≤ 2n(H(X1X2Y1) − ϵ)2 − n(H(X2) − ϵ)2 − n(H(X1Y1) − ϵ)
 = 2 − n(I(X2;X1Y) − 3ϵ) = 2 − n(I(X2;Y|X1) − 3ϵ)
I(X2;X1Y) = \cancelI(X2;X1) + I(X2;Y|X1) = I(X2;Y|X1)
since X1X2 are independent.
Therefore
P(n)e ≤ ϵ + 2nR22 − n(I(X2;Y|X1) − 3ϵ)
R2 ≤ I(X2;Y|X1) → n → ∞ → Pe → 0
(b) The converse is a simple application of the general Tehorem 15.10.1 to this simple case. The sets S can be taken in turn ot be ech node. We will not go into the details.

1.12.16 Multiple-access channel

Let the output Y of a multiple access channel be given by
Y = X1 + sgn(X2), 
where X1,  X2 are both real and power limited,
E(X21) ≤ P1, 
E(X22) ≤ P2, 
and
sgn(x) =  1  x > 0         − 1  x ≤ 0 
Note that there is interference but no noise in this channel.
(a) Find the capacity region.
(b) Describe a coding scheme that achieves the capacity region.
——————————————————————————–———————–
R1 < I(X1;Y|X2)R2 < I(X2;Y|X1)R1 + R2 < I(X1X2;Y)
I(X1;Y|X2) = ?
——————————————————————————–——–
recall for gaussian channel
Y = X1 + X2 + Z
I(X1;Y|X2) = H(Y|X2) − H(Y|X1X2) = H(X1 + X2 + Z|X2) − H(X1 + X2 + Z|X1X2) ≤ (1)/(2)log22πe(P1 + N) − (1)/(2)log22πe(N) = 
 = (1)/(2)log21 + (P1)/(N) = C(P1)/(N) → R1 < C(P1)/(N)
R2 < C(P2)/(N)
I(X1X2;Y) = H(Y) − H(Y|X1X2) = H(X1 + X2 + Z) − H(X1 + X2 + Z|X1X2) ≤ (1)/(2)log22πe(P1 + P2 + N) − (1)/(2)log22πe(N) = 
 = (1)/(2)log21 + (P1 + P2)/(N) = C(P1 + P2)/(N) → R1 + R2 < C(P1 + P2)/(N)
——————————————————————————–———–
Y = X1 + sgn(X2)
I(X1;Y|X2) = H(Y|X2) − H(Y|X1X2) = H(X1 + sgn(X2)|X2) − H(X1 + sgn(X2)|X1X2) = H(X1) ≤ (1)/(2)log22πe(P1)
R1 < (1)/(2)log22πe(P1)
I(X2;Y|X1) = H(Y|X1) − H(Y|X1X2) = H(X1 + sgn(X2)|X1) − H(X1 + sgn(X2)|X1X2) = H(sgn(X2))
 ≤ (1)/(2)log22πe(E(sgn(X2)2)) = (1)/(2)log2(2πe)
R2 < (1)/(2)log22πe
I(X1X2;Y) = H(Y) − H(Y|X1X2) = H(X1 + sgn(X2)) − H(X1 + sgn(X2)|X1X2) ≤ (1)/(2)log22πe(P1 + 1) = (1)/(2)log2(P1 + 1)
R1 + R2 < (1)/(2)log2(P1 + 1)
——————————————————————————–———————–
EIT Solutions Complete
(a) This is continuous noiseless multiple access channel, if we let U2 = sgn(X2) we can consider a channel form X1 and U2 to Y.
I(X1;Y|X2) = h(Y|X2) − h(Y|X1X2) = h(X1 + U2|X2) − h(X1 + U2|X1X2) = h(X1) − ( − ∞) = ∞
since X1 and X2 are independent and similarly
I(X2;Y|X1) = I(X2, U2;Y|X1) = I(U2;Y|X1) + I(X2;Y|X1, U2) = I(U2;Y|X1) = H(U2) − H(U2|Y, X1) ≤ H(1)/(2) = 1
Ова ме потсеќа на distortion rate, т.е. sgn(X2) е репрезентација на X2 .
I(X, f(X);Y) = H(Y) − H(Y|X, f(X)) = H(Y) − H(Y|X) = I(X, Y)
I(X2;Y|X1, U2) = H(Y|X1U2) − H(Y|X1U2X2) = H(Y|X1U2) − H(Y|X1X2) = 0
I(X1, X2;Y) = h(Y) − h(Y|X1X2) = h(Y) − ( − ∞) = ∞
Thus we can send at infinite rate form X1 to Y and at a maximum rate of 1 bit/transmission from X2 to Y
(b) We can senda a 1 for X2 in fist transmission, and knowing this, Y can recover X1 perfectly, recovering an infinite number of bits. From then on, X1 can be 0 and we can send 1 bit per transmission using the sign of X2.

1.12.17 Slepian-Wolf

Let (X, Y) have joint probability mass function p(x, y)
p(x, y) 1 2 3 1 α β β 2 β α β 3 β β α
where β = (1)/(6) − (α)/(2). (Note: This is joint, not conditional, probability mass function.)
(a) Find the Slepian-Wolf rate region for this source.
(b) What isPr{X = Y} in terms of α?
(c) What is the rate region if α = (1)/(3)?
(d) What is the rate region if α = (1)/(9)?
——————————————————————————–——————————————————–
R1 ≥ H(X|Y)R2 ≥ H(Y|X)R1 + R2 ≥ H(X, Y)
p(x, y) 1 2 3 p(x) 1 α β β α + 2β 2 β α β α + 2β 3 β β α α + 2β p(y) α + 2β α + 2β α + 2β p(x|y) 1 2 3 1 (α)/(α + 2β) (β)/(α + 2β) (β)/(α + 2β) 2 (β)/(α + 2β) (α)/(α + 2β) (β)/(α + 2β) 3 (β)/(α + 2β) (β)/(α + 2β) (α)/(α + 2β)
(a)
H(X|Y) = H(Y|X) = 3αlog2(α + 2β)/(α) + 6β⋅log2(α + 2β)/(β)
H(X|Y) = (1)/(2)(3 − 9 α)log2(2)/((1 − 3⋅α)) + 6 α log2(1)/(3⋅α)
H(X, Y) = 3αlog2(1)/(α) + 6β⋅log2(1)/(β) = 3αlog2(1)/(α) + 6(1 − 3α)/(6)⋅log2(6)/(1 − 3⋅α)
(b)
Pr{X = Y} = p(X = 1)p(Y = 1|X = 1) + p(X = 2)p(Y = 2|X = 2) + p(X = 3)p(Y = 3|X = 3)
 = 3i = 1p(X = i)p(Y = i|X = i) = 3\cancel(α + 2β)(α)/(\cancel(α + 2β)) = 3⋅α
(c)
H(X|Y) = H(Y|X) = 0
H(X, Y) = log2(3)
R1 ≥ 0 R2 ≥ 0 R1 + R2 ≥ log2(3)
(d)
H(X|Y) = H(Y|X) = log2(3)
H(X, Y) = H(X, Y) = 2log2(3)
R1 ≥ log2(3)R2 ≥ log2(3)R1 + R2 ≥ 2⋅log2(3)
——————————————————————————–——————————————————————————–——————————————————————-
EIT Complete solutions
(158) H(X) = H(Y) = (1)/(3)
(159) 3(α + 2β) = 1
H(X, Y) = 3αlog2(1)/(α) + 6β⋅log2(1)/(β) = 3αlog2(1)/(3α) + 3β⋅log2(1)/(3β) + 3β⋅log2(1)/(3β) + 3αlog(3) + 6βlog(3)
 = 3αlog2(1)/(3α) + 3β⋅log2(1)/(3β) + 3β⋅log2(1)/(3β) + \underset1(3α + 6β)⋅log(3) = H(3α, 3β, 3β) + log(3)
H(X|Y) = H(Y|X) = 3αlog2(α + 2β)/(α) + 6β⋅log2(α + 2β)/(β) = 3αlog2(1)/(3α) + 6β⋅log2(1)/(3β) = 3αlog2(1)/(3α) + 3β⋅log2(1)/(3β) + 3β⋅log2(1)/(3β) = H(3α, 3β, 3β)
Исти резултати се добиваат само што овие се прикажани во покомпактна форма . Исто така јас не забележав дека важат 158↑ и 159↑.
——————————————————————————–——————————————————————————–———————————————————————

1.12.18 Square channel

What is the capacity of the following multiple access channels
X1 ∈ { − 1, 0, 1} X2 ∈ { − 1, 0, 1} Y = X21 + X22
(a) Find the capacity region
(b) Describe p*(x1), p*(x2) achieving a point on the boundary of the capacity region
(a)
Y ∈ {0, 1, 2}
I(X1, X2, Y) = H(Y) − H(Y|X1X2) = H(X21 + X22) ≤ log(3) = 1.585
X2 ∈ {0, 1}
R1 < I(X1;Y|X2) = H(Y|X2) − H(Y|X2X1) = H(X21 + X22|X2) − H(X21 + X22|X1X2) = H(X21 + X22|X2) = H(X21) ≤ log2(2) = 1
R2 < I(X2;Y|X1) = H(Y|X1) − H(Y|X2X1) = H(X21 + X22|X1) − H(X21 + X22|X1X2) = H(X21 + X22|X1) = H(X22) ≤ log2(2)
figure Problem 15.18 fig1.jpg
(b)
p(X21) ~ (1)/(2), (1)/(2) p(X1) = ?
X21 ∈ {0, 1} X1 ∈ { − 1, 0, 1} p(X1) = (1)/(4), (1)/(2), (1)/(4)
X22 ∈ {0, 1} X2 ∈ { − 1, 0, 1} p(X2) = (1)/(4), (1)/(2), (1)/(4)
EIT Solution Complete
(a)
If we let U1 = X21 and U2 = X22 , then he channel is equivalent to a sum multiple access channel Y = U1 + U2 . We could aso get the same beahviour by using only two input symbols (0 and 1) for both X1 and X2.
Thus the capacity region is
R1 < I(X1;Y|X2) = H(Y|X2)
R2 < I(X2;Y|X1) = H(Y|X1)
R1 + R2 < I(X1X2;Y) = H(Y)
Со избирање на p(x1x2) = (1)/(4) for (x1x2) = (1, 0), (0, 0), (0, 1), (1, 1) and 0 otherwise, we obtain
p(x1x2) p(x2|x1)
x1|x2  − 1 0 1 p(x2)  − 1 0 0 0 0 0 0.25 0.25 0.5 1 0 0.25 0.25 0.5 p(X1) 0 0.5 0.5 x1|x2  − 1 0 1 p(x2)  − 1 0 0 0 0 0 0.125 0.125 0.5 1 0 0.125 0.125 0.5 p(X1) 0 0.5 0.5
H(Y|X1) = p(X1 = 0)H(Y|X1 = 0) + p(X1 = 1)H(Y|X1 = 1) = (1)/(2)⋅1 + (1)/(2)⋅1 = 1
H(Y|X1 = 0) = p(Y = 0|X1 = 0)⋅log(1)/(p(Y = 0|X1 = 0)) + p(Y = 1|X1 = 0)⋅log(1)/(p(Y = 1|X1 = 0)) + \overset0p(Y = 2|X1 = 0)⋅log(1)/(p(Y = 2|X1 = 0))
H(Y|X1 = 0) = 2⋅(1)/(4)⋅log4 = 1
H(Y|X1 = 1) = \overset0p(Y = 0|X1 = 1)⋅log(1)/(p(Y = 0|X1 = 1)) + p(Y = 1|X1 = 1)⋅log(1)/(p(Y = 1|X1 = 1)) + p(Y = 2|X1 = 01)⋅log(1)/(p(Y = 2|X1 = 0))
H(Y|X1 = 1) = 2⋅(1)/(4)⋅log4 = 1
p(Y = 0|X1 = 0) = p(X2 = 0, X1 = 0) = (1)/(4)
p(Y = 1|X1 = 0) = p(X2 = 1, X1 = 0) = (1)/(4)
p(Y = 2|X1 = 0) = 0
p(Y = 0|X1 = 1) = p(X2 = 0, X1 = 1) = 0
p(Y = 1|X1 = 1) = p(X2 = 0, X1 = 1) = (1)/(4)
p(Y = 2|X1 = 1) = p(X2 = 1, X1 = 1) = (1)/(4)
Подобар пристап (наместо овој во box-от) за пресметка на неизвесноста на Y кога за дадено X e:
(Глеадај ја табелата за здружена веојатност)
H(Y|X1 = 0) = H(X1 + X2|X1 = 0) = H(X2|X1 = 0) = (1)/(4)⋅2 + (1)/(4)⋅2 = 1
H(Y|X1 = 1) = H(X1 + X2|X1 = 1) = H(X2|X1 = 1) = (1)/(4)⋅2 + (1)/(4)⋅2 = 1
Y ∈ {0, 1, 2}
H(Y) = H(p(Y = 0), p(Y = 1), p(Y = 2))
p(Y = 1) = p(X1 = 0, X2 = 1) + p(X1 = 1, X2 = 0) = 0.5
p(Y = 0) = p(X1 = 0, X2 = 0) = 0.25
p(Y = 2) = p(X1 = 1, X2 = 1) = 0.25
H(Y) = 2⋅(1)/(4)log(4) + (1)/(2)⋅log2 = 1 + 0.5 = 1.5
(b)
The possible distribution that achieves points on the boundary of the rate region is given by the distribution in part (a)

1.12.19 Slepian-Wolf

Two senders know random variables U1 and U2, respectively. Let the random variables (U1, U2) have the following joint distribution
U1\U2 0 1 2 ... m − 1 0 α (β)/(m − 1) (β)/(m − 1) ... (β)/(m − 1) 1 (β)/(m − 1) 0 0 0 0 2 (β)/(m − 1) 0 0 0 0 ... ... 0 0 0 0 3 (β)/(m − 1) 0 0 0 0
where α + β + γ = 1. Find the region of rates (R1, R2) that would allow common receier to decode both random variables reliably.
——————————————————–
R1 ≥ H(U1|U2) R2 ≥ H(U2|U1) R1 + R2 ≥ H(U1U2)
U1\U2 0 1 2 ... m − 1 p(U2) 0 α (β)/(m − 1) (β)/(m − 1) ... (β)/(m − 1) α + β 1 (γ)/(m − 1) 0 0 0 0 (γ)/(m − 1) 2 (γ)/(m − 1) 0 0 0 0 (γ)/(m − 1) ... ... 0 0 0 0 m − 1 (γ)/(m − 1) 0 0 0 0 (γ)/(m − 1) p(U1) α + γ (β)/(m − 1) (β)/(m − 1) ... (β)/(m − 1) U1\U2 0 1 2 ... m − 1 p(U2) 0 (α)/(α + β) (β)/((m − 1)(α + β)) (β)/((m − 1)(α + β)) ... (β)/((m − 1)(α + β)) α + β 1 1 0 0 0 0 (γ)/(m − 1) 2 1 0 0 0 0 (γ)/(m − 1) ... ... 0 0 0 0 m − 1 1 0 0 0 0 (γ)/(m − 1) p(U1) α + β (β)/(m − 1) (β)/(m − 1) ... (β)/(m − 1)
p(U1) = α + β + (m − 1)(β)/(m − 1) = α + 2β = 1
p(U2) = α + β + (m − 1)(γ)/(m − 1) = α + β + γ = 1
\mathchoiceH(U1U2)H(U1U2)H(U1U2)H(U1U2) = αlog(1)/(α) + (m − 1)(γ)/(m − 1)⋅log((m − 1))/(γ) + (m − 1)(β)/(m − 1)⋅log((m − 1))/(β) = αlog(1)/(α) + γ⋅log((m − 1))/(γ) + β⋅log((m − 1))/(β)
 = αlog(1)/(α) + γ⋅log((m − 1))/(γ) + β⋅log((m − 1))/(β) = \mathchoiceH(α, β, γ) + (γ + β)log(m − 1)H(α, β, γ) + (γ + β)log(m − 1)H(α, β, γ) + (γ + β)log(m − 1)H(α, β, γ) + (γ + β)log(m − 1)
\mathchoiceH(U1|U2)H(U1|U2)H(U1|U2)H(U1|U2) = H(U1U2) − H(U2) = H(α, β, γ) + (γ + β)log(m − 1) − (α + β)⋅log(1)/(α + β) − (m − 1)(γ)/(m − 1)⋅log((m − 1))/(γ)
H(α, β, γ) + (γ + β)log(m − 1) − (α + β)⋅log(1)/(α + β) − γ⋅log((m − 1))/(γ) = H(α, β) + (γ + β)log(m − 1) − (α + β)⋅log(1)/(α + β) − γ⋅log(m − 1)
\mathchoiceH(α, β) + βlog(m − 1) − (α + β)⋅log(1)/(α + β)H(α, β) + βlog(m − 1) − (α + β)⋅log(1)/(α + β)H(α, β) + βlog(m − 1) − (α + β)⋅log(1)/(α + β)H(α, β) + βlog(m − 1) − (α + β)⋅log(1)/(α + β)
\mathchoiceH(U2|U1)H(U2|U1)H(U2|U1)H(U2|U1) = H(U1U2) − H(U1) = H(α, β, γ) + (γ + β)log(m − 1) − (α + γ)⋅log(1)/(α + γ) − (m − 1)(β)/(m − 1)⋅log((m − 1))/(β)
H(α, β, γ) + (γ + β)log(m − 1) − (α + γ)⋅log(1)/(α + γ) − β⋅log((m − 1))/(β) = H(α, γ) + (γ + β)log(m − 1) − (α + γ)⋅log(1)/(α + γ) − β⋅log(m − 1)
\mathchoiceH(α, γ) + γlog(m − 1) − (α + γ)⋅log(1)/(α + γ)H(α, γ) + γlog(m − 1) − (α + γ)⋅log(1)/(α + γ)H(α, γ) + γlog(m − 1) − (α + γ)⋅log(1)/(α + γ)H(α, γ) + γlog(m − 1) − (α + γ)⋅log(1)/(α + γ)

1.12.20 Multiple access

(a) Find the capacity region for the multiple access channel
Y = XX21
where
X1 ∈ {2, 4},  X2 ∈ {1, 2}
(b) Suppose that the range of X1 is {1, 2}. Is the capacity region decreased? Why or why not?
——————————————————————————–——————————————————————————-
My Solution
p(Y|X1X2) p(X1X2Y)
X1X2|Y 2 4 16 (2, 1) 1 0 0 (2, 2) 0 1 0 (4, 1) 0 1 0 (4, 2) 0 0 1 X1X2|Y 2 4 16 p(X1X2) (2, 1) 1 ⁄ 3 0 0 1 ⁄ 3 (2, 2) 0 1 ⁄ 6 0 1 ⁄ 6 (4, 1) 0 1 ⁄ 6 0 1 ⁄ 6 (4, 2) 0 0 1 ⁄ 3 1 ⁄ 3 p(Y) 1 ⁄ 3 1 ⁄ 3 1 ⁄ 3
R1 + R2 ≤ I(X1, X2;Y) = H(Y) − H(Y|X1X2) = H(Y) = log3 = 1.585
R1 ≤ I(X1;Y|X2) = H(Y|X2) − H(Y|X1X2) = H(Y|X2) = (1)/(2)log3 + (1)/(6) = 1.432
R1 ≤ I(X2;Y|X1) = H(Y|X1) − H(Y|X1X2) = H(Y|X1) = (1)/(2)log3 + (1)/(6) = 1.432
log[3.0] + (1)/(3) = 1.43195
p(Y = 2) = p(X1 = 2, X2 = 1) p(Y = 16) = p(X1 = 4, X2 = 2) p(Y = 4) = p(X1 = 2, X2 = 2) + p(X1 = 4, X2 = 1)
p(Y = 2) = p(Y = 16) = (1)/(3) p(Y = 4) = (1)/(3) p(X1 = 2, X2 = 2) = p(X1 = 4, X2 = 1) = (1)/(6)
H(Y) = 3⋅(1)/(3)log(3) = log(3) = 1.585
————————————————————
H(Y|X1) = p(X1 = 2)H(Y|X1 = 2) + p(X1 = 4)H(Y|X1 = 4)
H(Y|X1 = 2) = H(2X2|X1 = 2) = H(X2|X1 = 2) = (1)/(3)log(3) + (1)/(6)log(6) = (1)/(3)log3 + (1)/(6)log3 + (1)/(6) = (2 + 1)/(6)log3 + (1)/(6) = (1)/(2)log3 + (1)/(6) = 
U = 2X2 ∈ {2, 4}
H(Y|X1 = 4) = H(4X2|X1 = 4) = H(X2|X1 = 4) = (1)/(3)log(3) + (1)/(6)log(6) = (1)/(3)log3 + (1)/(6)log3 + (1)/(6) = (2 + 1)/(6)log3 + (1)/(6) = (1)/(2)log3 + (1)/(6)
V = 4X2 ∈ {4, 16}
\mathchoiceH(Y|X1)H(Y|X1)H(Y|X1)H(Y|X1) = p(X1 = 2)H(Y|X1 = 2) + p(X1 = 4)H(Y|X1 = 4) = 
 = (1)/(2)log3 + (1)/(6)(p(X1 = 2) + p(X1 = 4)) = (1)/(2)log3 + (1)/(6) = \mathchoiceH(X2|X1)H(X2|X1)H(X2|X1)H(X2|X1)
————————————————————
H(Y|X2) = p(X2 = 1)H(Y|X2 = 1) + p(X2 = 2)H(Y|X2 = 2)
H(Y|X2 = 1) = H(XX21|X2 = 1) = H(X1|X2 = 1) = (1)/(3)log(3) + (1)/(6)log(6) = (1)/(3)log3 + (1)/(6)log3 + (1)/(6) = (2 + 1)/(6)log3 + (1)/(6) = (1)/(2)log3 + (1)/(6) = 
X1 ∈ {2, 4}
H(Y|X2 = 2) = H(XX21|X2 = 2) = H(X21|X2 = 2) = H(X1|X2 = 2) = (1)/(3)log(3) + (1)/(6)log(6) = 
 = (1)/(3)log3 + (1)/(6)log3 + (1)/(6) = (2 + 1)/(6)log3 + (1)/(6) = (1)/(2)log3 + (1)/(6)
X21 ∈ {4, 16}
\mathchoiceH(Y|X2)H(Y|X2)H(Y|X2)H(Y|X2) = p(X2 = 2)H(Y|X2 = 2) + p(X2 = 4)H(Y|X2 = 4) = 
(1)/(2)log3 + (1)/(6)(p(X2 = 2) + p(X2 = 4)) = (1)/(2)log3 + (1)/(6) = \mathchoiceH(X1|X2)H(X1|X2)H(X1|X2)H(X1|X2)
(160) H(X1X2) = 2(1)/(3)log3 + 2⋅(1)/(6)⋅log6 = (2)/(3)log3 + (1)/(3)log3 + (1)/(3) = log3 + (1)/(3)
H(X1) = H(X1X2) − H(X2|X1) = log3 + (1)/(3) − (1)/(2)log3 − (1)/(6) = (1)/(2)log3 + (1)/(6) = H(X2)
H(X1X2) = H(X1) + H(X2)
(b)
p(Y|X1X2) p(X1X2Y)
X1X2|Y 1 2 4 (1, 1) 1 0 0 (1, 2) 1 0 0 (2, 1) 0 1 0 (2, 2) 0 0 1 X1X2|Y 1 2 4 p(X1X2) (1, 1) 1 ⁄ 6 0 0 1 ⁄ 3 (1, 2) 1 ⁄ 6 0 0 1 ⁄ 6 (2, 1) 0 1 ⁄ 3 0 1 ⁄ 6 (2, 2) 0 0 1 ⁄ 3 1 ⁄ 3 p(Y) 1 ⁄ 3 1 ⁄ 3 1 ⁄ 3
R1 + R2 ≤ I(X1, X2;Y) = H(Y) − H(Y|X1X2) = H(Y) = log3
R1 ≤ I(X1;Y|X2) = H(Y|X2) − H(Y|X1X2) = H(Y|X2) = (1)/(2)log3 + (1)/(6)
\mathchoiceR2 ≤ I(X2;Y|X1) = H(Y|X1) − H(Y|X1X2) = H(Y|X1) = (1)/(2)log3 + (1)/(6)p(X1 = 2) ≤ (1)/(2)log3 + (1)/(6)R2 ≤ I(X2;Y|X1) = H(Y|X1) − H(Y|X1X2) = H(Y|X1) = (1)/(2)log3 + (1)/(6)p(X1 = 2) ≤ (1)/(2)log3 + (1)/(6)R2 ≤ I(X2;Y|X1) = H(Y|X1) − H(Y|X1X2) = H(Y|X1) = (1)/(2)log3 + (1)/(6)p(X1 = 2) ≤ (1)/(2)log3 + (1)/(6)R2 ≤ I(X2;Y|X1) = H(Y|X1) − H(Y|X1X2) = H(Y|X1) = (1)/(2)log3 + (1)/(6)p(X1 = 2) ≤ (1)/(2)log3 + (1)/(6)
Да!!! Регионот на капацитети се намалува. R2(b) е помало или еднаков на R2(a).
p(Y = 2) = p(X1 = 2, X2 = 1) p(Y = 4) = p(X1 = 4, X2 = 2) p(Y = 1) = p(X1 = 1, X2 = 1) + p(X1 = 1, X2 = 2)
p(Y = 2) = p(Y = 4) = (1)/(3) p(Y = 1) = (1)/(3) p(X1 = 1, X2 = 1) = p(X1 = 1, X2 = 2) = (1)/(6)
H(Y) = 3⋅(1)/(3)log(3) = log(3) = 1.585
————————————————————
H(Y|X1) = p(X1 = 1)H(Y|X1 = 1) + p(X1 = 2)H(Y|X1 = 2)
H(Y|X1 = 1) = H(2X2|X1 = 1) = H(1|X1 = 1) = 0
Y = 1X2 ∈ {1}
H(Y|X1 = 2) = H(2X2|X1 = 2) = H(X2|X1 = 2) = (1)/(3)log(3) + (1)/(3)log(3) = (2)/(3)log3
Y = 2X2 ∈ {2, 4}
H(Y|X1) = p(X1 = 1)⋅0 + p(X1 = 2)H(Y|X1 = 2) = (1)/(2)log3 + (1)/(6)p(X1 = 2)
————————————————————
H(Y|X2) = p(X2 = 1)H(Y|X2 = 1) + p(X2 = 2)H(Y|X2 = 2)
H(Y|X2 = 1) = H(XX21|X2 = 1) = H(X1|X2 = 1) =  − p(Y = 1|X2 = 1)log(p(Y = 1|X2 = 1)) − p(Y = 2|X2 = 1)log(p(Y = 2|X2 = 1))
 = (1)/(6)log(6) + (1)/(3)log(3) = (1)/(6)log3 + (1)/(6) + (1)/(3)log3 = (2 + 1)/(6)log3 + (1)/(6) = (1)/(2)log3 + (1)/(6)
Условните веројатности со Y одговараат на здружените веројатности на сите три променливи.
X1 ∈ {1, 2}
H(Y|X2 = 2) = H(XX21|X2 = 2) = H(X21|X2 = 2) = H(X1|X2 = 2) = 
 =  − p(Y = 1|X2 = 2)log(p(Y = 1|X2 = 2)) − p(Y = 4|X2 = 2)log(p(Y = 4|X2 = 2)) = (1)/(3)log(3) + (1)/(6)log(6) = 
 = (1)/(3)log3 + (1)/(6)log3 + (1)/(6) = (2 + 1)/(6)log3 + (1)/(6) = (1)/(2)log3 + (1)/(6)
X21 ∈ {1, 4}
H(Y|X2) = p(X2 = 1)H(Y|X2 = 1) + p(X2 = 2)H(Y|X2 = 2) = (1)/(2)log3 + (1)/(6)(p(X2 = 1) + p(X2 = 2)) = (1)/(2)log3 + (1)/(6) = H(X1|X2)
H(X1X2Y) = 2(1)/(3)log3 + 2⋅(1)/(6)⋅log6 = (2)/(3)log3 + (1)/(3)log3 + (1)/(3) = log3 + (1)/(3)
EIT Solutions Complete
With X1 ∈ {2, 4},  X2 ∈ {1, 2}, the channel Y = XX21 behaves as (не ги гледај вредностите во здружента):
X1X2 Y (2, 1) 2 (2, 2) 4 (4, 1) 4 (4, 2) 16
We compute
R1 ≤ I(X1;Y|X2) = H(Y|X2) − H(Y|X1X2) = H(XX21|X2) = |\refeq:Eq15.352|H(X1|X2) = 1  bits per transmition
R2 ≤ I(X2;Y|X1) = H(Y|X1) − H(Y|X1X2) = H(XX21|X1) = |\refeq:Eq15.353|H(X2|X1) = H(X2) = 1 bits per transmition
R1 + R2 ≤ I(X1X2;Y) = H(Y) − H(Y|X1X2) = H(Y) = (3)/(2) bits per transmition
Во моите изведувања добив повисока вредност за H(Y) = log3 зошто земав Y да биде униформно распределена. Овде изгледа избрале распределбата да биде [p(2, 1), p(2, 2) + p(4, 1), p(4, 2)] = (1)/(4), (1)/(2), (1)/(4).
т.е униформна p(x1x2) распределба.
Where the bound R1 + R2 is achieved at the corners of 16↑, where eaither sender 1 or 2 sends 1 bit per transmission adn the other user treats the channel as a binary erasure cahnnel wiht capacity 1 − perasure = 1 − (1)/(2) = (1)/(2) bits per use of the channel. Other points on the line are achieved by timesharing.
(b)
With X1 ∈ {1, 2},  X2 ∈ {1, 2} , the channel Y = XX21 behaves like (не ги гледај вредностите во здружената)
X1X2 Y (1, 1) 1 (1, 2) 1 (2, 1) 2 (2, 2) 4
Note when X1 = 1 X2 has no effect on Y and can not be recovered given X1 and Y. If \mathchoiceX1 ~ Br(α)X1 ~ Br(α)X1 ~ Br(α)X1 ~ Br(α) and \mathchoiceX2 ~ Br(β)X2 ~ Br(β)X2 ~ Br(β)X2 ~ Br(β) then:
\mathchoiceR1R1R1R1 ≤ I(X1;Y|X2) = H(Y|X2) − H(Y|X1X2) = H(Y|X2)
H(Y|X2) = p(X2 = 1)H(Y|X2 = 1) + p(X2 = 2)H(Y|X2 = 2) = (*)
***********************************************
X1 ∈ {1, 2}
H(Y|X2 = 1) = H(XX21|X2 = 1) = H(X1|X2 = 1)
***********************************************
X1 ∈ {1, 2} → X21 ∈ {1, 4} се мапираат еден на еден па нема промена на дистрибуцијата со преод на X21
H(Y|X2 = 2) = H(XX21|X2 = 2) = H(X21|X2 = 2) = H(X1|X2 = 2)
************************************************************
(*) = p(X2 = 1)H(X1|X2 = 1) + p(X2 = 2)H(X1|X2 = 2) = H(X1|X2) = H(X1) = \mathchoiceH(α)H(α)H(α)H(α)
**************************************************************
X1X2 Y (1, 1) 1 (1, 2) 1 (2, 1) 2 (2, 2) 4
\mathchoiceR2R2R2R2 ≤ I(X2;Y|X1) = H(Y|X1) − H(Y|X1X2) = \mathchoiceH(Y|X1)H(Y|X1)H(Y|X1)H(Y|X1)
p(X1 = 1) = αp(X1 = 2) = 1 − α = αp(X2 = 1) = βp(X2 = 2) = 1 − β = β
p(Y = 2) = p(X1 = 2, X2 = 1) = αβ p(Y = 4) = p(X1 = 2, X2 = 2) = αβ p(Y = 1) = p(X1 = 1, X2 = 1) + p(X1 = 1, X2 = 2) = αβ + αβ
———————————————————–
H(Y|X1) = p(X1 = 1)H(Y|X1 = 1) + p(X1 = 2)H(Y|X1 = 2)
H(Y|X1 = 1) = H(2X2|X1 = 1) = H(1|X1 = 1) = 0
Y = 1X2 ∈ {1}
H(Y|X1 = 2) = H(2X2|X1 = 2) = H(X2|X1 = 2)
Y = 2X2 ∈ {2, 4}се мапираат еден на еден па нема промена на дистрибуцијата со преод на X21
\mathchoiceH(Y|X1)H(Y|X1)H(Y|X1)H(Y|X1) = p(X1 = 1)⋅0 + p(X1 = 2)H(Y|X1 = 2) = H(XX21|X1 = 2)p(X1 = 2) = (1 − α)H(2X2|X1 = 2) = (1 − α)H(X2|X1 = 2) = 
-(1-α)(p(X2=1|X1=2)logp(X2=1|X1=2)+p(X2=2|X1=2)logp(X2=2|X1=2)) = |не зависни се| = (1 − α)H(X2) = \mathchoice(1 − α)H(β)(1 − α)H(β)(1 − α)H(β)(1 − α)H(β)
\mathchoiceR1 + R2R1 + R2R1 + R2R1 + R2 ≤ I(X1X2;Y) = H(Y) − H(Y|X1X2) = H(Y) = H(αβ, αβ, αβ + αβ) = 
 − H(αβ, αβ, αβ + αβ) = (1 − α)βlog(1 − α)β + (1 − α)(1 − β)log(1 − α)(1 − β) + (\cancelαβ + (1 − \cancelβ)α)log(αβ + (1 − β)α)
 = (1 − α)βlog(1 − α)β + (1 − α)(1 − β)log(1 − α)(1 − β) + (α)log(α)
 = (1 − α)βlog(1 − α) + (1 − α)βlogβ + (1 − α)(1 − β)log(1 − α) + (1 − α)(1 − β)log(1 − β) + αlogα
 = (1 − α)βlog(1 − α) + αlogα + (1 − α)βlogβ + (1 − α)(1 − β)log(1 − α) + (1 − α)(1 − β)log(1 − β)
 = (1 − α)βlog(1 − α) + αlogα + βlogβ − αβlogβ + (1 − α)(1 − β)log(1 − α) + (1 − α)(1 − β)log(1 − β)
 = (\cancel1 − α)βlog(1 − α) + αlogα + βlogβ − αβlogβ + (1 − \cancelβ)log(1 − α) − α(1 − β)log(1 − α) + (1 − α)log(1 − β) − (1 − α)βlog(1 − β)
 =  − αβlog(1 − α) + log(1 − α) + αlogα + βlogβ − αβlogβ − αlog(1 − α) + αβlog(1 − α) + (1 − α)log(1 − β) − (1 − α)βlog(1 − β)
 = \cancel − αβlog(1 − α) − αlog(1 − α) + log(1 − α) + αlogα + βlogβ − αβlogβ + \cancelαβlog(1 − α) + (1 − α)log(1 − β) − (1 − α)βlog(1 − β) = 
 = H(α) + βlogβ − αβlogβ + (1 − α)log(1 − β) − (1 − α)βlog(1 − β) = H(α) + βlogβ − αβlogβ + ((1 − α) − (1 − α)β)log(1 − β) = 
 = H(α) + (1 − α)βlogβ + (1 − α)(1 − β)log(1 − β) = H(α) + (1 − α)(βlogβ + (1 − β)log(1 − β)) = \mathchoiceH(α) + (1 − α)H(β)H(α) + (1 − α)H(β)H(α) + (1 − α)H(β)H(α) + (1 − α)H(β)
——————————————————————-
R1 ≤ H(α)
R2 ≤ (1 − α)H(β)
R1 + R2 ≤ H(α) + (1 − α)H(β)
we may chose β = (1)/(2) to maximize the above bounds, giving
R1 ≤ H(α) R2 ≤ (1 − α) R1 + R2 ≤ H(α) + (1 − α)
——————————————————————————–—————————-
ECE535 Solutions HW13s.pdf
Овде користат :
p(X1) =  1  1 − r = α       2  r = 1 − α p(X2) =  1  1 − s = β       2  s = 1 − β
R1 ≤ H(r)
R2 ≤ rH(s)
R1 + R2 ≤ H(r) + rH(s)
figure Problem 15.20 Fig_1.png
To answer the question: Is the capacity region decreased? Why or why not?, we need to plot both capacity regions and compare them. However we can pick some rate pairs (R1R2) and see whether they are achievable for both schemes or not, so that we can have at least one argument to compare.
The rate pair (H(0.8) = 7.219, 0.8) is achievable in region for part (b) Види Мапле we obtain R1 + R2 = 1.5219. However this rate is clearly outside the capacity region in part (a) since there R1 + R2 ≤ 1.5 .
We can also take rate pair (0.5, 1) which is achievable in capacity region in part (a), we can see however that to achieve a rate R2 = 1, we need to chose r = 1 then we have that R1 = H(1) = 0 hence we have that this rate is not achievable in the capacity region of part (b).
To conclude, there are some rates achievable in one region that are not in the other so the plot will be the best way to see the difference.

1.12.21 Broadcast channel.

Consider the following degraded broadcast channel.
figure Problem 15.21 fig_1.png
(a) What is the capacity of the channel from X to Y1?
(b) What is the channel capacity from X to Y2?
(c) What is capacity region for all (R1R2) achievable for this broadcast channel? Simplify and sketch.
(a)
R1 ≤ I(U;Y2) R2 ≤ I(X;Y1|U)
C = maxp(x)I(X;Y)
I(X;Y1) = H(Y1) − H(Y1|X) = H(Y1) − H(α1)
H(Y) = H(π(1 − α1), α1, (1 − π)(1 − α1) = (1 − α1)H(π) + H(α1)
I(X;Y1) = (1 − α1)H(π)
C = 1 − α1
(b)
p(X = 0) = π; p(X = 1) = 1 − π;
p(Y2 = 0) = p(X = 0)(1 − α1)(1 − α2) = π(1 − α1)(1 − α2) = πα1α2
p(Y2 = 1) = p(X = 1)(1 − α1)(1 − α2) = (1 − π)(1 − α1)(1 − α2) = πα1α2
p(Y2 = E) = p(X = 0)((1 − α1)α2 + α1) + p(X = 1)((1 − α1)α2 + α1) = ((1 − α1)α2 + α1)(π + (1 − π)) = 
 = ((1 − α1)α2 + α1) = α1α2 + α1 = α1 + α2 − α1α2
figure Problem 15.21 fig_2.png
I(X;Y2) = H(π)(1 − α1α2 − α1) = H(π)(α1 − α1α2)
C = maxp(x)I(X;Y2) = (α1 − α1α2) = 1 − α1 − (1 − α1)α2 = 1 − α1 − α2 + α1α2 = (1 − α1)(1 − α2)
Alternative
I(X;Y2) = H(Y2) − H(Y2|X) = H(πα1α2, πα1α2, α1 + α2 − α1α2) − (α1α2log(α1α2) − 1 + (α1 + α2 − α1α2)log(α1 + α2 − α1α2) − 1)
 = H(πα1α2, πα1α2, α1 + α2 − α1α2) − (α1α2log(α1α2) − 1 + (α1 + α2 − α1α2)log(α1 + α2 − α1α2) − 1) = 
 = H(πα1α2, πα1α2, α1α2) =  − (πα1α2logπα1α2 + πα1α2logπα1α2 + α1α2logα1α2)
 =  − (πα1α2logπ + \cancelπα1α2logα1α2 + πα1α2logπ + (1 − \cancelπ)α1α2logα1α2 + α1α2logα1α2) = 
 = α1α2H(π) − 2α1α2log(α1α2)
\mathchoiceC = maxp(x)I(X;Y2) = α1α2 − 2α1α2log(α1α2)C = maxp(x)I(X;Y2) = α1α2 − 2α1α2log(α1α2)C = maxp(x)I(X;Y2) = α1α2 − 2α1α2log(α1α2)C = maxp(x)I(X;Y2) = α1α2 − 2α1α2log(α1α2)
figure Problem 15.21 fig_1.png
H(Y2|X) = p(X = 0)H(Y2|X = 0) + p(X = 0)H(Y2|X = 0) = 
 − π(α1α2log(α1α2) + (α1 + α1α2)log(α1 + α1α2)) − (1 − π)(α1α2log(α1α2) + (α1 + α1α2)log(α1 + α1α2)) = 
 =  − (α1α2log(α1α2) + (α1 + α1α2)log(α1 + α1α2)) =  − (α1α2log(α1α2) + (α1 + α2 − α1α2)log(α1 + α2 − α1α2))
(c)
figure Problem 15.21 fig_5.png
R2 ≤ I(U;Y2)
I(U;Y2) = H(Y2) − H(Y2|U) = H(ᾱ1ᾱ2/2, ᾱ1α2 + α1, ᾱ1ᾱ2/2) + (ᾱ1ᾱ2H(β) + ᾱ1ᾱ2log(ᾱ1ᾱ2))   (The second term here is not correct!!!)
P(Y2 = E) = p(U = 0)(β(ᾱ1α2 + α1) + β̄(ᾱ1α2 + α1)) + p(U = 1)(β̄(ᾱ1α2 + α1) + β(ᾱ1α2 + α1)) = (p(U = 0) + p(U = 1))(β(ᾱ1α2 + α1) + β̄(ᾱ1α2 + α1))
= β(ᾱ1α2 + α1) + β̄(ᾱ1α2 + α1) = ᾱ1α2 + α1
P(Y2 = 0) = p(U = 0)β̄ᾱ1ᾱ2 + p(U = 1)βᾱ1ᾱ2 = (β̄p(U = 0) + βp(U = 1))ᾱ1ᾱ2 = (using p(U = 0) = p(U = 1) = 1/2) = ᾱ1ᾱ2/2
P(Y2 = 1) = p(U = 1)β̄ᾱ1ᾱ2 + p(U = 0)βᾱ1ᾱ2 = (β̄p(U = 1) + βp(U = 0))ᾱ1ᾱ2 = ᾱ1ᾱ2/2
I summed these three probabilities in Maple and got one, just as a check.
p(Y2 = 1|U = 1) = p(Y2 = 0|U = 0) = β̄ᾱ1ᾱ2
p(Y2 = 1|U = 0) = p(Y2 = 0|U = 1) = βᾱ1ᾱ2
I forgot to compute p(Y2 = E|U = 1) and p(Y2 = E|U = 0)!!!!
H(Y2|U) = −p(U = 0)(β̄ᾱ1ᾱ2log(β̄ᾱ1ᾱ2) + βᾱ1ᾱ2log(βᾱ1ᾱ2)) − p(U = 1)(β̄ᾱ1ᾱ2log(β̄ᾱ1ᾱ2) + βᾱ1ᾱ2log(βᾱ1ᾱ2))
= −(β̄ᾱ1ᾱ2log(β̄ᾱ1ᾱ2) + βᾱ1ᾱ2log(βᾱ1ᾱ2)) = −(β̄ᾱ1ᾱ2logβ̄ + βᾱ1ᾱ2logβ + β̄ᾱ1ᾱ2log(ᾱ1ᾱ2) + βᾱ1ᾱ2log(ᾱ1ᾱ2))
= −(ᾱ1ᾱ2H(β) + (β̄ + β)ᾱ1ᾱ2log(ᾱ1ᾱ2)) = −(ᾱ1ᾱ2H(β) + ᾱ1ᾱ2log(ᾱ1ᾱ2))
************************************************************************************
figure Problem 15.21 fig_4.png
U → X → Y1
R1 < I(X;Y1|U) = H(Y1|U) − H(Y1|U, X) = H(Y1|U) − H(Y1|X) ≤ H(Y1) − H(Y1|X) = I(X;Y1) ≤ maxp(x)I(X;Y1)
I(X;Y1) = (1 − α1)H(X)
R1 ≤ 1 − α1
EIT Complete solutions
I solved (a) and (b) the same way. In (c) there are small differences:
As in Problem 15.13, the auxiliary random variable U in the capacity region of the broadcast channel has to be binary. We can now evaluate the capacity region for this choice of auxiliary random variable. By symmetry, the best distribution for U is the uniform one. Let α = α1 + α2 − α1α2; therefore
1 − α = 1 − α1 − α2 + α1α2 = (1 − α1)(1 − α2) = ᾱ1ᾱ2
R2 = I(U;Y2) = H(Y2) − H(Y2|U) = H(ᾱ/2, α, ᾱ/2) − H(Y2|U)  (**)
ᾱ1α2 + α1 = (1 − α1)α2 + α1 = α1 + α2 − α1α2 = α
H(Y2|U) = −(β̄ᾱ1ᾱ2log(β̄ᾱ1ᾱ2) + βᾱ1ᾱ2log(βᾱ1ᾱ2))
In my calculations I forgot to compute p(Y2 = E|U = 1) and p(Y2 = E|U = 0)!!!! That is why the second term of my expression for I(U;Y2) is not correct.
figure Problem 15.21 fig_5.png
p(Y2 = 0|U = 0) = p(Y2 = 1|U = 1) = β̄ᾱ1ᾱ2
p(Y2 = 0|U = 1) = p(Y2 = 1|U = 0) = βᾱ1ᾱ2
p(Y2 = E|U = 1) = p(Y2 = E|U = 0) = βα + β̄α = (β + β̄)α = α
H(Y2|U) = H(β̄ᾱ1ᾱ2, βᾱ1ᾱ2, α) = H(β̄ᾱ1ᾱ2, βᾱ1ᾱ2, ᾱ1α2 + α1)
From (**) it follows:
R2 = I(U;Y2) = H(ᾱ/2, α, ᾱ/2) − H(β̄ᾱ, βᾱ, α) = 2⋅(ᾱ/2)log(2/ᾱ) − β̄ᾱlog(1/(β̄ᾱ)) − βᾱlog(1/(βᾱ)) (the αlog(1/α) terms cancel) = ᾱlog(1/ᾱ) + ᾱ − β̄ᾱlog(1/(β̄ᾱ)) − βᾱlog(1/(βᾱ)) =
= ᾱlog(1/ᾱ) + ᾱ − β̄ᾱlog(1/β̄) − β̄ᾱlog(1/ᾱ) − βᾱlog(1/β) − βᾱlog(1/ᾱ) = ᾱlog(1/ᾱ)(1 − β̄ − β) + ᾱ − ᾱ(β̄log(1/β̄) + βlog(1/β)) = ᾱ − ᾱH(β) = ᾱ(1 − H(β))
figure Problem 15.21 fig_4.png
U → X → Y1: the channel is degraded!!!
R1 = I(X;Y1|U) = H(Y1|U) − H(Y1|X, U) = H(Y1|U) − H(Y1|X) = H(Y1|U) − H(α1)  (***)
p(Y1 = 0|U = 0) = p(Y1 = 1|U = 1) = β̄ᾱ1,  p(Y1 = E|U = 0) = p(Y1 = E|U = 1) = βα1 + β̄α1 = α1
p(Y1 = 0|U = 1) = p(Y1 = 1|U = 0) = βᾱ1
H(Y1|U) = p(U = 0)(β̄ᾱ1log(1/(β̄ᾱ1)) + βᾱ1log(1/(βᾱ1)) + α1log(1/α1)) + p(U = 1)(β̄ᾱ1log(1/(β̄ᾱ1)) + βᾱ1log(1/(βᾱ1)) + α1log(1/α1))
= β̄ᾱ1log(1/(β̄ᾱ1)) + βᾱ1log(1/(βᾱ1)) + α1log(1/α1) = H(β̄ᾱ1, βᾱ1, α1)
From (***):
R1 = H(β̄ᾱ1, βᾱ1, α1) − H(α1) = β̄ᾱ1log(1/(β̄ᾱ1)) + βᾱ1log(1/(βᾱ1)) + α1log(1/α1) − H(α1) =
= β̄ᾱ1log(1/β̄) + β̄ᾱ1log(1/ᾱ1) + βᾱ1log(1/ᾱ1) + βᾱ1log(1/β) + α1log(1/α1) − H(α1) =
= ᾱ1(βlog(1/β) + β̄log(1/β̄)) + (β̄ᾱ1 + βᾱ1)log(1/ᾱ1) + α1log(1/α1) − H(α1) = ᾱ1H(β) + ᾱ1log(1/ᾱ1) + α1log(1/α1) − H(α1) =
= ᾱ1H(β) + H(α1) − H(α1) = ᾱ1H(β)
These two equations characterize the boundary of the capacity region as β varies. When β = 0, we get R1 = 0 and R2 = ᾱ. When β = 1/2, we get R1 = ᾱ1 = 1 − α1 and R2 = 0.
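A small Python sketch of this β-sweep (analogous to the Maple check mentioned below); it evaluates both the closed forms R1 = ᾱ1H(β) and R2 = ᾱ(1 − H(β)) and the same quantities computed directly from the three-point distributions, for the illustrative values α1 = α2 = 1/4:

import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def Hb(b):
    return entropy([b, 1 - b])

a1, a2 = 0.25, 0.25                  # illustrative values (the alpha1 = alpha2 = 1/4 case)
abar1 = 1 - a1
abar = (1 - a1) * (1 - a2)           # = 1 - alpha
alpha = 1 - abar

for beta in np.linspace(0.0, 0.5, 6):
    bbar = 1 - beta
    R1 = abar1 * Hb(beta)            # closed form derived above
    R2 = abar * (1 - Hb(beta))       # closed form derived above
    # the same rates computed directly from the three-point distributions
    R1_direct = entropy([bbar * abar1, beta * abar1, a1]) - Hb(a1)
    R2_direct = entropy([abar / 2, alpha, abar / 2]) - entropy([bbar * abar, beta * abar, alpha])
    print(round(beta, 2), round(R1, 4), round(R1_direct, 4), round(R2, 4), round(R2_direct, 4))
# beta = 0 gives (R1, R2) = (0, abar); beta = 1/2 gives (abar1, 0)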
The capacity region is sketched in the figure below (from Maple!):
figure Problem 15.21 fig6.jpg
In EIT Complete Solutions they say that the capacity region looks like the one in 16↑. In Maple I get the region above for α1 = α2 = 1/2.
For α1 = α2 = 1/4 the region below is obtained. Probably if you take more points you get a triangle; I cannot get a trapezoid.
figure Problem 15.21 fig7.jpg
To be done:
1. Go through Problem 15.15 (in the context of the referencing in 15.1.6).
2. Read the paper [3]. It is probably related to Problem 15.15, i.e., Chapter 1.1.6.
3. Read the paper [7]. It appears to be a revolutionary paper.
4. Read the paper [8]. Slepian-Wolf is not entirely clear to me; I glanced at it and it may help clarify things.
5. One day I should sit down and implement a random binning coding scheme like the one described in this book (a toy sketch is included after this list).
6. Review the Rate Distortion chapter again (Chapter 10).
7. Read Jointly Typical Sequences once more.
8. In Problem 15.16 they say that when there is no uncertainty the differential entropy is −∞. I do not know where that comes from. I assume it is from log2(a) = log2(0) = −∞.
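For item 5, here is a minimal toy sketch of Slepian-Wolf style random binning with side information at the decoder; it is not taken from the book or the solutions, only inspired by them. It assumes X = Y ⊕ Z with Z ~ Bern(0.1), assigns every length-n sequence a uniformly random bin, and decodes by picking the sequence in the received bin closest to Y in Hamming distance (a crude stand-in for joint typicality). The block length n = 12 and rate 0.75 > H(0.1) ≈ 0.47 are illustrative choices.

import numpy as np

rng = np.random.default_rng(0)

n = 12            # block length, small enough to enumerate all 2^n sequences
p = 0.1           # P(X != Y), so H(X|Y) = H(0.1), about 0.47 bits
R = 0.75          # binning rate in bits per symbol, chosen above H(X|Y)
n_bins = 2 ** int(np.ceil(n * R))

# Random binning: every sequence x^n gets an independent, uniformly chosen bin index.
bin_of = rng.integers(0, n_bins, size=2 ** n)
bins = [[] for _ in range(n_bins)]
for seq, b in enumerate(bin_of):
    bins[b].append(seq)

def to_int(bits):
    # pack a 0/1 array into a Python int
    return int(sum(int(b) << i for i, b in enumerate(bits)))

def hamming(a, b):
    return bin(a ^ b).count("1")

errors, trials = 0, 2000
for _ in range(trials):
    y = rng.integers(0, 2, size=n)
    x = y ^ (rng.random(n) < p)               # X = Y xor Z, Z ~ Bern(p)
    x_int, y_int = to_int(x), to_int(y)
    b = bin_of[x_int]                         # the encoder sends only the bin index of x
    # the decoder knows y and searches its bin for the closest sequence to y
    x_hat = min(bins[b], key=lambda s: hamming(s, y_int))
    errors += (x_hat != x_int)

print("rate used:", np.ceil(n * R) / n, "vs H(X|Y) = H(0.1), about 0.47")
print("empirical error rate:", errors / trials)

With these toy parameters the error rate is small but not zero; pushing the rate down toward H(X|Y) or shortening the block makes the bins more crowded and the error rate grows, which is the qualitative behavior the Slepian-Wolf argument predicts.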

References

[1] T. Berger. Multiterminal source coding. In G. Longo (Ed.), The Information Theory Approach to Communications. Springer-Verlag, New York, 1977.

[2] L. R. Ford and D. R. Fulkerson. Maximal flow through a network. Can. J. Math., pages 399–404, 1956.

[3] C. E. Shannon. Two-way communication channels. In Proc. 4th Berkeley Symp. Math. Stat. Prob., Vol. 1, pages 611–644. University of California Press, Berkeley, CA, 1961.

[4] T. S. Han. The capacity region of a general multiple access channel with certain correlated sources. Inf. Control, 40:37–60, 1979.

[5] H. G. Eggleston. Convexity (Cambridge Tracts in Mathematics and Mathematical Physics, No. 47). Cambridge University Press, Cambridge, 1969.

[6] B. Grünbaum. Convex Polytopes. Interscience, New York, 1967.

[7] D. Slepian and J. K. Wolf. Noiseless coding of correlated information sources. IEEE Trans. Inf. Theory, IT-19:471–480, 1973.

[8] T. M. Cover. A proof of the data compression theorem of Slepian and Wolf for ergodic sources. IEEE Trans. Inf. Theory, IT-22:226–228, 1975.

[9] R. G. Gallager. Capacity and coding for degraded broadcast channels. Probl. Peredachi Inf., 10(3):3–14, 1974.

[10] T. M. Cover and A. El Gamal. Capacity theorems for the relay channel. IEEE Trans. Inf. Theory, IT-25:572–584, 1979.

[11] I. Csiszár and J. Körner. Information Theory: Coding Theorems for Discrete Memoryless Systems. Academic Press, New York, 1981.

[12] A. Wyner and J. Ziv. The rate distortion function for source coding with side information at the receiver. IEEE Trans. Inf. Theory, IT-22:1–11, 1976.

[13] T. J. Tjalkens and F. M. J. Willems. A universal variable-to-fixed length source code based on Lawrence’s algorithm. IEEE Trans. Inf. Theory, pages 247–253, Mar. 1992.

[14] T. M. Cover, A. El Gamal, and M. Salehi. Multiple access channels with arbitrarily correlated sources. IEEE Trans. Inf. Theory, IT-26:648–657, 1980.

[15] T. S. Han and M. H. M. Costa. Broadcast channels with arbitrarily correlated sources. IEEE Trans. Inf. Theory, IT-33:641–650, 1987.

[16] T. Gaarder and J. K. Wolf. The capacity region of a multiple-access discrete memoryless channel can increase with feedback. IEEE Trans. Inf. Theory, IT-21:100–102, 1975.

[17] T. M. Cover and C. S. K. Leung. An achievable rate region for the multiple access channel with feedback. IEEE Trans. Inf. Theory, IT-27:292–298, 1981.

[18] M. Bierbaum and H. M. Wallmeier. A note on the capacity region of the multiple access channel. IEEE Trans. Inf. Theory, IT-25:484, 1979.