# FPGA Design and Implementation of the Detector for the MIMO-SDM System Using PNC

Doan-Thien Le \*<sup>‡</sup>, Duc-Hiep Vu <sup>†</sup>, Xuan-Nam Tran <sup>§</sup>, Minh-Tuan Le <sup>‡</sup>, Vu-Duc Ngo \*<sup>‡</sup>

\*Hanoi University of Science and Technology, Vietnam

<sup>†</sup>Telecommunications University, Vietnam

<sup>§</sup>Le Quy Don Technical University, Vietnam

<sup>‡</sup>Mobifone R&D Center, Vietnam

Email: {ledoanthien, vdhiep76, tuan.hdost}@gmail.com, namtx@mta.edu.vn, duc.ngovu@hust.edu.vn

*Abstract*—Multiple-input multiple-output (MIMO) systems using spatial division multiplexing (SDM) are regarded as a promising technology for high-speed wireless data networks such as IEEE 802.11, 3GPP Long Term Evolution, and WiMAX. Meanwhile, physical-layer network coding (PNC) is a promising technique for increasing transmission throughput over bidirectional wireless relay networks. By combining them in a MIMO-SDM-PNC system, multiplexing gain, spectral efficiency and transmission rate can be increased. So far, however, no technical paper has considered and evaluated the MIMO-SDM-PNC system on the basis of a hardware design. In this paper, we propose the design and implementation on FPGA of two detector architectures at the destination node of the MIMO-SDM-PNC system. Based on the design, we evaluate its consumption of FPGA elements and its complexity.

Index Terms—MIMO, SDM, PNC, detector, ZF, MMSE, FPGA.

## I. INTRODUCTION

In radio communications, multiple-input multiple-output (MIMO) is a method of multiplying the capacity of a radio link by using multiple transmit and receive antennas to exploit multipath propagation. MIMO has become an essential element of wireless communication standards including IEEE 802.11n (Wi-Fi), IEEE 802.11ac (Wi-Fi), HSPA+ (3G), WiMAX (4G), and Long Term Evolution (4G). To improve the performance and spectral efficiency of MIMO systems, various techniques have been researched and developed. Among them, the MIMO-SDM-PNC system proposed by Vu *et al.* in [1] is a typical approach that exploits the strengths of both the spatial division multiplexing (SDM) of Wolniansky *et al.* [3] and physical-layer network coding (PNC) [2]. In the following, we summarize the origin of MIMO-SDM-PNC and its main advantages over other MIMO systems.

Before network coding, multiple-input multiple-output (MIMO) systems were known as the most effective way to increase the channel capacity in a point-to-point wireless communication system [3],[8]. A well-known realization of MIMO systems is vertical Bell Labs space-time (V-BLAST), which uses spatial division multiplexing at the transmitter and linear detection with successive interference cancellation at the receiver [3]. This MIMO-SDM system has been standardized as a radio interface for many wireless systems, where it achieves a multiplexing gain that increases spectrum efficiency.

In wireless ad hoc networks, physical-layer network coding (PNC) can exploit the broadcast nature of the wireless environment to double throughput in a bidirectional relay channel [2],[4]. The combination of MIMO and PNC thus promises a significant improvement in transmission rate and has become an attractive research topic [5],[6],[7]. In [5], Kim and Chun proposed a MIMO-PNC system that uses a linear pre-equalizer at the source nodes to increase multiplexing gain. In that system, the source nodes and the relay node are all equipped with multiple antennas. In order to perform pre-equalization, the source nodes need to know the forward channels from themselves to the relay node. The work in [6] proposed two linear-detection-based schemes using log-likelihood ratio (LLR) and selective combining. In [7], Chung *et al.* proposed linear zero forcing (ZF) and minimum mean square error (MMSE) detectors for QAM signalling. However, the systems in [6] and [7] only consider the case in which each source node has a single antenna, and thus cannot achieve multiplexing gain.

In order to resolve these problems, Vu *et al.* proposed a MIMO-SDM-PNC system [1], in which all source nodes are equipped with 2 antennas while the relay node has 4 antennas. The source nodes use MIMO-SDM to exchange their data via the relay. Since MIMO-SDM allows transmission of 2 parallel streams, the proposed system achieves double the multiplexing gain of the systems in [6],[7]. In order to detect the network coded symbols, the linear-detection-based LLR and selective combination proposed in [4] are extended to cope with the self co-channel interference (CCI) between the two streams. In the broadcast phase to the destination nodes, the relay node uses multiple-antenna transmission and a simple MIMO fading equalizer is proposed at the destination to compensate for the channel effect. The MIMO-SDM-PNC system achieves the same diversity order as the MIMO-PNC system in [6] but has double the multiplexing gain. Unlike the system in [5], its source nodes do not need information about the forward channels from themselves to the relay.

Many technical papers have designed and implemented detectors for MIMO systems on FPGA [10],[11],[12], but no technical paper has yet considered and evaluated the detectors for the MIMO-SDM-PNC system on the basis of a hardware design, which is an important step before the manufacture of integrated circuits. In this paper, we present our own design and implementation of two detector architectures at the destination node of the MIMO-SDM-PNC system on a Xilinx Virtex-7 FPGA. The architecture of the detectors is based on linear detection algorithms (ZF/MMSE). The efficiency of our design is shown through its consumption of FPGA elements.

978-1-5090-1801-7/16/\$31.00 ©2016 IEEE

The rest of the paper is organized as follows. Section II presents the MIMO-SDM-PNC system model and the linear detection algorithms at the destination nodes. Section III introduces the architecture of the linear detectors. Section IV presents the implementation results, and Section V concludes the paper.

## II. SYSTEM MODEL

This section considers a MIMO-SDM-PNC system [1], in which all source nodes are equipped with 2 antennas while the relay node has 4 antennas, as illustrated in Fig. 1. Communication in the bidirectional relay system takes place in two phases. In the first phase, the two source nodes N1 and N2 transmit to the relay at the same time. The relay node detects and encodes the received signals from the two source nodes according to the PNC scheme. It then initiates the second phase by broadcasting the encoded symbols to both source nodes simultaneously.



Fig. 1. System model of the bidirectional MIMO-SDM-PNC relay system

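In the system, each source node transmits two parallel data streams:

$$\boldsymbol{x}_1 = [x_1^{(1)}, \, x_2^{(1)}]^T; \tag{1}$$

$$\boldsymbol{x}_2 = [x_1^{(2)}, \, x_2^{(2)}]^T. \tag{2}$$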

The channels between the sources and the relay are defined as follows:

$$\boldsymbol{H}_{1} = \begin{bmatrix} h_{11}^{(1)} & h_{12}^{(1)} \\ h_{21}^{(1)} & h_{22}^{(1)} \\ h_{31}^{(1)} & h_{32}^{(1)} \\ h_{41}^{(1)} & h_{42}^{(1)} \end{bmatrix}, \boldsymbol{H}_{2} = \begin{bmatrix} h_{11}^{(2)} & h_{12}^{(2)} \\ h_{21}^{(2)} & h_{22}^{(2)} \\ h_{31}^{(2)} & h_{32}^{(2)} \\ h_{41}^{(2)} & h_{42}^{(2)} \end{bmatrix}. \tag{3}$$

The equivalent channel matrix and signal vector are defined, respectively as:

$$\boldsymbol{H} = [\boldsymbol{H_1}, \, \boldsymbol{H_2}]; \tag{4}$$

$$\boldsymbol{x} = [\boldsymbol{x}_1^T, \, \boldsymbol{x}_2^T]^T. \tag{5}$$

The received signal vector at the relay node can be expressed as:

$$\boldsymbol{r} = \frac{1}{\sqrt{2}} \boldsymbol{H} \boldsymbol{x} + \boldsymbol{z}; \tag{6}$$

where $\boldsymbol{r} = [r_1, r_2, r_3, r_4]^T$ is the received signal vector and $\boldsymbol{z} = [z_1, z_2, z_3, z_4]^T$ is the noise vector at the relay. The factor $\frac{1}{\sqrt{2}}$ is the power normalization factor.
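The relay-side signal model in (4)-(6) can be illustrated with a short floating-point sketch. It is only an illustration under stated assumptions: the function names `crandn` and `relay_received`, the BPSK symbol values, and the unit-variance Rayleigh channel draws are not taken from the paper.

```python
import math
import random

def crandn(rng, sigma2=1.0):
    """Draw one circularly symmetric complex Gaussian sample of variance sigma2."""
    s = math.sqrt(sigma2 / 2.0)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))

def relay_received(H1, H2, x1, x2, z):
    """Return r = (1/sqrt(2)) * [H1, H2] [x1; x2] + z for 4x2 channels H1, H2."""
    H = [row1 + row2 for row1, row2 in zip(H1, H2)]  # eq. (4): 4x4 equivalent channel
    x = list(x1) + list(x2)                          # eq. (5): stacked 4x1 signal vector
    scale = 1.0 / math.sqrt(2.0)                     # power normalization of eq. (6)
    return [scale * sum(h * s for h, s in zip(row, x)) + n
            for row, n in zip(H, z)]

# Illustrative draw: flat Rayleigh channels, BPSK symbols, noise variance 0.01 (assumed).
rng = random.Random(0)
H1 = [[crandn(rng) for _ in range(2)] for _ in range(4)]
H2 = [[crandn(rng) for _ in range(2)] for _ in range(4)]
x1, x2 = [1, -1], [-1, -1]
z = [crandn(rng, 0.01) for _ in range(4)]
r = relay_received(H1, H2, x1, x2, z)
```

The result `r` has four entries, one per relay antenna, each a noisy superposition of the four transmitted symbols.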

The relay node needs to estimate the network coded symbols to be sent to the two source nodes. Since each source node transmits two symbols $x_1^{(i)}$, $x_2^{(i)}$, $i = 1, 2$, the network coded symbols are given by $x_1^{(1)} \oplus x_1^{(2)}$ for node N1 and $x_2^{(1)} \oplus x_2^{(2)}$ for node N2. Vu *et al.* use ZF/MMSE detectors and LLR/selective combination to estimate the network coded symbols [1]. Based on their results, we use only the MMSE detector with LLR combination at the relay node to achieve the best quality.

After the network coded symbols $x_1^{(1)} \oplus x_1^{(2)}$ and $x_2^{(1)} \oplus x_2^{(2)}$ are successfully estimated, the relay sends them to the destination nodes. The transmit vector from the relay is defined as:

$$\boldsymbol{x}_{r} = [x_{1}^{(1)} \oplus x_{1}^{(2)}, \, x_{2}^{(1)} \oplus x_{2}^{(2)}]^{T}; \tag{7}$$

and the backward channels from the relay to the source nodes are given as:

$$\hat{\boldsymbol{H}}_{1} = \begin{bmatrix} h_{11}^{(1)} & h_{12}^{(1)} \\ h_{21}^{(1)} & h_{22}^{(1)} \end{bmatrix}; \tag{8}$$

$$\hat{\boldsymbol{H}}_{2} = \begin{bmatrix} h_{11}^{(2)} & h_{12}^{(2)} \\ h_{21}^{(2)} & h_{22}^{(2)} \end{bmatrix}. \tag{9}$$

We can present the signal vectors received at the source nodes as follows

$$\boldsymbol{u}_1 = \frac{1}{\sqrt{2}} \hat{\boldsymbol{H}}_1 \boldsymbol{x}_r + \boldsymbol{n}_1; \tag{10}$$

$$\boldsymbol{u}_2 = \frac{1}{\sqrt{2}} \hat{\boldsymbol{H}}_2 \boldsymbol{x}_r + \boldsymbol{n}_2; \tag{11}$$

where  $\boldsymbol{n}_1 = [n_1^1, n_2^1]^T$  and  $\boldsymbol{n}_2 = [n_1^2, n_2^2]^T$  are noise vectors at the two destination nodes, respectively.

In order to detect the received symbols, we can also use linear detectors such as ZF and MMSE at the destination nodes to estimate the network coded symbols $x_1^{(1)} \oplus x_1^{(2)}$ and $x_2^{(1)} \oplus x_2^{(2)}$. Assuming that the destination nodes know the forward channel from themselves to the relay, the combining weight matrices of the detectors are given, respectively, by:

$$\boldsymbol{G}_{i}^{ZF} = \left(\boldsymbol{\hat{H}}_{i}^{H}\boldsymbol{\hat{H}}_{i}\right)^{-1}\boldsymbol{\hat{H}}_{i}^{H}; \tag{12}$$

$$\boldsymbol{G}_{i}^{MMSE} = \left(\boldsymbol{\hat{H}}_{i}^{H}\boldsymbol{\hat{H}}_{i} + \sigma_{n}^{2}\boldsymbol{I}_{2}\right)^{-1}\boldsymbol{\hat{H}}_{i}^{H}; \tag{13}$$

where $(\cdot)^H$ denotes the Hermitian transpose, $\sigma_n^2$ is the noise variance at the destination nodes, and $\boldsymbol{I}_2$ is the 2 by 2 identity matrix. The decision statistics of the network coded symbols after linear combining are given by:

$$\hat{x}_{r}^{(i)} = \boldsymbol{G}_{i} \boldsymbol{u}_{i} = \begin{bmatrix} \hat{x}_{1}^{(i)} \\ \hat{x}_{2}^{(i)} \end{bmatrix} = \begin{bmatrix} x_{1}^{(1)} \oplus x_{1}^{(2)} \\ x_{2}^{(1)} \oplus x_{2}^{(2)} \end{bmatrix}. \tag{14}$$

Each destination node can use a quantization function to obtain the estimates of the network coded symbols as  $\overline{x_1^{(1)} \oplus x_1^{(2)}} = Q(x_1^{(1)} \oplus x_1^{(2)})$  and  $\overline{x_2^{(1)} \oplus x_2^{(2)}} = Q(x_2^{(1)} \oplus x_2^{(2)})$ . The two nodes then simply perform XOR of the estimated coded symbols with their transmit symbols to obtain the received symbols.
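The destination-side processing of (12)-(14), followed by quantization and XOR, can be sketched in floating point as below. This is a minimal sketch, not the fixed-point FPGA design described later: the helper names (`herm`, `matmul`, `inv2`, `weights`, `detect`), the example channel values, and the BPSK mapping (bit 0 to +1, bit 1 to -1) are assumptions made for illustration.

```python
import math

def herm(A):
    """Hermitian transpose of a 2x2 complex matrix."""
    return [[A[j][i].conjugate() for j in range(2)] for i in range(2)]

def matmul(A, B):
    """Product of two 2x2 complex matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inv2(A):
    """Inverse of a 2x2 matrix via the adjugate divided by the determinant."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[A[1][1] / det, -A[0][1] / det],
            [-A[1][0] / det, A[0][0] / det]]

def weights(H, sigma2=0.0):
    """Weight matrix: sigma2 = 0 gives ZF as in (12), sigma2 > 0 gives MMSE as in (13)."""
    Hh = herm(H)
    gram = matmul(Hh, H)
    gram[0][0] += sigma2   # add sigma2 * I_2 on the diagonal
    gram[1][1] += sigma2
    return matmul(inv2(gram), Hh)

def detect(H, u, own_bits, sigma2=0.0):
    """Estimate the other node's two bits from the received vector u."""
    G = weights(H, sigma2)
    stats = [G[i][0] * u[0] + G[i][1] * u[1] for i in range(2)]  # eq. (14)
    coded = [0 if s.real > 0 else 1 for s in stats]  # Q(.): assumed BPSK mapping
    return [c ^ b for c, b in zip(coded, own_bits)]  # XOR with own transmit bits

# Noise-free example: node N1 sent bits [0, 1], node N2 sent bits [1, 1].
H = [[1 + 0.5j, 0.2 - 0.1j], [0.3j, 0.8 + 0.2j]]
xr = [-1.0, 1.0]  # BPSK symbols of the coded bits [1, 0]
u = [(H[i][0] * xr[0] + H[i][1] * xr[1]) / math.sqrt(2) for i in range(2)]
recovered_zf = detect(H, u, [0, 1])
recovered_mmse = detect(H, u, [0, 1], sigma2=1e-3)
```

In this noise-free example both detectors recover node N2's bits `[1, 1]`. The $\frac{1}{\sqrt{2}}$ scaling does not affect the BPSK sign decision.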

In the next section, we will present our own design and implementation of ZF and MMSE detector architectures at the destination node for MIMO-SDM-PNC system on FPGA.

## III. ARCHITECTURE

The high-level architectures of the ZF and MMSE detectors at the destination node of the MIMO-SDM-PNC system are shown in Figs. 2 and 3. The inputs of the detectors are the channel matrix H from the relay to the destination node, the matrix U of received signal vectors at the destination node, the matrix S of the destination node's transmit symbols and, for the MMSE detector only, the noise variance $\sigma$ (19 bits wide) at the destination node.

The elements of the H and U matrices are complex numbers (38 bits wide) expressed in the form a + bi, where a is the real part (bits [37:19]) and b is the imaginary part (bits [18:0]). Each element of the S matrix is a single bit.
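The 38-bit word layout described above can be illustrated with a small packing sketch. The field positions come from the text, but the number format inside each 19-bit field is not stated there; the sketch assumes signed two's complement integers, and the names `to_field`, `from_field`, `pack_complex` and `unpack_complex` are hypothetical.

```python
FIELD_BITS = 19
FIELD_MASK = (1 << FIELD_BITS) - 1

def to_field(value):
    """Encode a signed integer as a 19-bit two's complement field (assumed format)."""
    if not -(1 << (FIELD_BITS - 1)) <= value < (1 << (FIELD_BITS - 1)):
        raise ValueError("value does not fit in 19 bits")
    return value & FIELD_MASK

def from_field(field):
    """Decode a 19-bit two's complement field back to a signed integer."""
    if field & (1 << (FIELD_BITS - 1)):
        return field - (1 << FIELD_BITS)
    return field

def pack_complex(re, im):
    """Place the real part in bits [37:19] and the imaginary part in bits [18:0]."""
    return (to_field(re) << FIELD_BITS) | to_field(im)

def unpack_complex(word):
    """Split a 38-bit word into its (real, imaginary) signed parts."""
    return (from_field((word >> FIELD_BITS) & FIELD_MASK),
            from_field(word & FIELD_MASK))

word = pack_complex(-5, 1234)
parts = unpack_complex(word)
```

Packing followed by unpacking returns the original pair, and every packed word fits in 38 bits.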



Fig. 2. High level architecture of the ZF detector



Fig. 3. High level architecture of the MMSE detector

The G\_ZF and G\_MMSE blocks are detailed in Figs. 4 and 5. Their functions are to calculate the weight matrices using formulas (12) and (13), respectively.

As can be seen in Figs. 2, 3, 4 and 5, the architectures of the ZF and MMSE detectors are divided into a number of separate processing units in order to facilitate pipeline processing. The pipeline stages of the ZF and MMSE detectors are illustrated in Figs. 6 and 7. Most stages take 1 clock cycle, but the MUL stage takes 3 cycles in the MUL matrix block and the INV stage takes 41 cycles in the INV matrix block. The MUL matrix and INV matrix blocks are composed of smaller blocks and are designed as sequential logic. The timing diagrams of the 2 architectures are shown in Figs. 8 and 9. Because the entire systems are pipelined, we are able to obtain received symbols continuously.

Fig. 4. G\_ZF architecture

Fig. 5. G\_MMSE architecture

Table I lists the blocks together with their figures and functions:

TABLE I FUNCTIONAL BLOCKS OF THE SYSTEMS

| Block      | Figure  | Function                                                    |
|------------|---------|-------------------------------------------------------------|
| TRAN       | Fig. 10 | Hermitian transpose                                         |
| MUL real   | Fig. 11 | Multiplication of a matrix by a number                      |
| MUL matrix | Fig. 12 | Multiplication of 2 matrices                                |
| ADD        | Fig. 13 | Addition of 2 matrices                                      |
| INV matrix | Fig. 14 | Inversion of a matrix                                       |
| SIGMA      | Fig. 15 | Multiplication of $\sigma_n^2$ by the 2 by 2 identity matrix |
| Comp mul   | Fig. 16 | Multiplication of 2 complex numbers                         |
| Comp div   | Fig. 17 | Division of 2 complex numbers                               |

Additionally, the Comp conj, Comp add and Comp sub blocks perform complex conjugation, addition and subtraction of 2 complex numbers, respectively. The Real mul, Real div, Real add and Real sub blocks perform multiplication, division, addition and subtraction of 2 real numbers, respectively.

We optimize the division of real numbers by using 2's complement additions (that is, subtractions) and shifts, and we optimize the multiplication of real numbers by using additions and shifts.
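As an illustration of arithmetic built only from shifts, additions and subtractions, the sketch below shows a bit-serial shift-and-add multiplier and a restoring divider for unsigned operands. The paper does not say which specific algorithms its Real mul and Real div blocks use, so these are assumed textbook variants, and the names `shift_add_mul` and `restoring_div` and the 19-bit default width are chosen here for illustration. Sign handling is left out.

```python
def shift_add_mul(a, b, width=19):
    """Multiply unsigned a by unsigned b (b < 2**width) using shifts and adds only."""
    acc = 0
    for i in range(width):
        if (b >> i) & 1:       # for every set bit of the multiplier...
            acc += a << i      # ...add the correspondingly shifted multiplicand
    return acc

def restoring_div(n, d, width=19):
    """Divide unsigned n (n < 2**width) by d > 0; return (quotient, remainder)."""
    if d == 0:
        raise ZeroDivisionError("division by zero")
    rem = 0
    quo = 0
    for i in range(width - 1, -1, -1):
        rem = (rem << 1) | ((n >> i) & 1)  # shift in the next dividend bit
        trial = rem - d                    # in hardware: add the 2's complement of d
        if trial >= 0:                     # subtraction succeeded, keep it
            rem = trial
            quo |= 1 << i
        # otherwise the old remainder is kept ("restored")
    return quo, rem

product = shift_add_mul(1234, 567)
quotient, remainder = restoring_div(100000, 37)
```

Both loops run one iteration per operand bit, which is why such blocks are naturally realized as multi-cycle sequential logic.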



Fig. 6. Pipeline stages of ZF detector







Fig. 8. Timing Diagram of ZF detector









Fig. 9. Timing Diagram of MMSE detector

Fig. 11. MUL real block



Fig. 12. MUL block



Fig. 13. ADD block



Fig. 14. INV matrix block



Fig. 15. SIGMA block

## IV. IMPLEMENTATION RESULTS

In the implementation, the network model is the same as in Fig. 1, and all channels are assumed to undergo flat Rayleigh fading. BPSK is used for modulation, and we assume that the Es/N0 is the same at all nodes. Moreover, 3.5 million sets of inputs are generated randomly with the word lengths shown in Table II. The architectures of the detectors are implemented and synthesized for the Xilinx Virtex-7 xc7vx690t device.

Fig. 16. Comp mul block

Fig. 17. Comp div block

TABLE II Word lengths

| Signal   | Wordlength |
|----------|------------|
| H matrix | 4x38 bits  |
| U matrix | 2x38 bits  |
| S matrix | 2x1 bits   |
| $\sigma$ | 19 bits    |

Throughput in Megabits per second (Mbps) of the systems is calculated as the following equation:

$$T = \frac{N \times F}{c}; \tag{15}$$

where N is the total number of bits in one set of input vectors, F is the maximum clock frequency in MHz and c is the number of clock cycles that the systems need to process one set of input vectors.
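Equation (15) can be checked directly against the word lengths of Table II and the synthesis figures of Tables III and IV. The short sketch below does this for the ZF and MMSE detectors; the function name `throughput_mbps` and the constant names are chosen here for illustration.

```python
def throughput_mbps(n_bits, f_mhz, cycles):
    """Eq. (15): bits per input set times clock frequency (MHz) over cycles per set."""
    return n_bits * f_mhz / cycles

# Input bits per set from Table II: H (4x38), U (2x38), S (2x1), plus sigma (19) for MMSE.
ZF_BITS = 4 * 38 + 2 * 38 + 2 * 1
MMSE_BITS = ZF_BITS + 19

# "Optimize div" option: frequency and cycle counts as reported in Tables III and IV.
zf_opt_div = throughput_mbps(ZF_BITS, 214.506, 54)
mmse_opt_div = throughput_mbps(MMSE_BITS, 214.506, 55)
```

The bit counts evaluate to 230 and 249, and the two throughputs round to 913.64 Mbps and 971.13 Mbps, matching the tabulated values.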

TABLE III Synthesis results of ZF detector

|                                 | No optimize | Optimize div | Optimize div and mul |
|---------------------------------|-------------|--------------|----------------------|
| Slices                          | 5589        | 12474        | 99752                |
| Frequency (MHz)                 | 16.234      | 214.506      | 320.123              |
| Number of bits in input vectors | 230         | 230          | 230                  |
| Clock cycles                    | 21          | 54           | 178                  |
| Throughput (Mbps)               | 177.80      | 913.64       | 413.64               |

TABLE IV Synthesis results of MMSE detector

|                                 | No optimize | Optimize div | Optimize div and mul |
|---------------------------------|-------------|--------------|----------------------|
| Slices                          | 5781        | 12666        | 100653               |
| Frequency (MHz)                 | 16.234      | 214.506      | 320.123              |
| Number of bits in input vectors | 249         | 249          | 249                  |
| Clock cycles                    | 22          | 55           | 179                  |
| Throughput (Mbps)               | 183.74      | 971.13       | 445.31               |

Tables III and IV show the synthesis results of 3 different design options. In the first option, we optimize neither the multiplication nor the division of real numbers. In the second option, we optimize only the division of real numbers, using 2's complement additions (subtractions) and shifts. In the final option, both the division and the multiplication of real numbers are optimized using additions and shifts. As can be seen, the second option has the highest throughput (913.64 Mbps for ZF and 971.13 Mbps for MMSE), while its consumption of system resources (12474 and 12666 slices, respectively) is acceptable. For this reason, we choose the second option to design and implement the detectors on the FPGA.

## V. CONCLUSION

The MIMO-SDM-PNC system increases multiplexing gain, spectral efficiency and transmission rate, and is considered a potential technology for high-speed wireless internet communications. In this paper, we have presented our own design and implementation on FPGA of detectors at the destination node of the MIMO-SDM-PNC system. A detailed design of the full systems was shown. Based on the design, we have evaluated the systems through their FPGA resource consumption and throughput. The results show that our design can be implemented efficiently as a chip. Additionally, we evaluated the design by considering its complexity.

## REFERENCES

- [1] Duc Hiep Vu, Van Bien Pham, Xuan Nam Tran, "Physical Network Coding for Bidirectional Relay MIMO-SDM System", *The 2013 International Conference on Advanced Technologies for Communications (ATC'13)*, pp. 141-146, 2013.
- [2] S. Zhang et al., "Physical layer network coding", in Proc. 12th Annual International Conf. on Mobile Computing and Networking (ACM MobiCom06), pp. 358-365, NY, September 2006.
- [3] P. Wolniansky et al., "V-BLAST: an architecture for realizing very high data rates over the rich-scattering wireless channel", in *The URSI International Symposium on Signals, Systems, and Electronics, Italy*, September 1998.
- [4] S. Zhang et al., "MIMO physical layer network coding based on VBLAST detection", in 2012 International Conf. on Wireless Communications & Signal Processing (WCSP), pp. 15, 25-27 October 2012.
- [5] S. Kim and J. Chun, "Network coding with linear MIMO pre-equalizer using modulo in two-way channel", *in The 2008 IEEE Wireless Communications and Networking Conference (WCNC08)*, 2008.
- [6] S. Zhang and S. C. Liew, "Physical layer network coding with multiple antennas", in *The 2010 IEEE Wireless Communications and Networking Conference*, (WCNC10), April 2010.
- [7] H. H. Chung et al., "A physical-layer network coding scheme based on linear MIMO detection", 2012 IEEE Vehicular Technology Conference (VTC Spring), Yokohama, May 2012.
- [8] G. J. Foschini and M. J. Gans, "On limits of wireless communications in a fading environment when using multiple antennas", *Wireless Personal Communications, vol. 6, no. 3,* pp. 311-335, March 1998.
- [9] C. Fragouli, J.-Y. Le Boudec, J. Widmer, "Network coding: an instant primer", ACM SIGCOMM Computer Communication Review, pp. 63-68, 2006.

- [10] Chris Dick et al., "Design and Architecture of Spatial Multiplexing MIMO Decoders for FPGAs", 42nd Asilomar Conference on Signals, Systems and Computers - Pacific Grove, CA, USA, pp. 160-164, 2008.
- [11] P. Bhagawat, R. Dash, G. Choi, "Architecture for Reconfigurable MIMO Detector and its FPGA Implementation", in 15th IEEE International Conference, pp. 61-64, August 2008.
- [12] Z. Guo and P. Nilsson, "Algorithm and implementation of the K-Best sphere decoding for MIMO detection", *IEEE JSAC, vol. 24, no. 3,* pp. 491-503, March 2006.