# Ultra-High Density Content Addressable Memory Based on Current Induced Domain Wall Motion in Magnetic Track

Yue Zhang<sup>1,2</sup>, Weisheng Zhao<sup>1,2</sup>, Jacques-Olivier Klein<sup>1,2</sup>, Dafiné Ravelosona<sup>1,2</sup>, and Claude Chappert<sup>1,2</sup>

<sup>1</sup>IEF, Université Paris Sud, Centre d'Orsay, F-91405 France <sup>2</sup>CNRS, UMR 8622, Orsay, 91405 France

The observation of current-induced domain wall (DW) motion in magnetic tracks has opened a new path toward low-power, high-density and fast integrated circuits. As an advanced extension of this mechanism, high-performance racetrack memory can be built by combining magnetic tracks with magnetic tunnel junction (MTJ) read and write heads. Rapid progress in CoFeB/MgO perpendicular magnetic anisotropy (PMA) shows that PMA MTJs can be scaled down to 20 nm while keeping fast data access. These recent discoveries allow us to design an ultra-high density content addressable memory (CAM), one of the most important applications of MRAM. Mainstream CAMs suffer from high power and large area because their conventional structure is composed of numerous large-capacity SRAM blocks in order to provide fast data access. MRAM-based non-volatile CAMs have been proposed to relieve the power consumption; however, the density issue cannot be overcome because of their large switching currents. In this paper, we present a design of NOR-type CAM based on DW motion in PMA magnetic tracks. The CMOS switching and sensing circuits are globally shared to reduce the cell area down to 6 $F^2$/bit; the complementary dual track allows local sensing and faster data search while keeping low power. Using an accurate SPICE model of PMA racetrack memory and a CMOS 65 nm design kit, we performed mixed simulations to demonstrate its functionality and evaluate its performance.

Index Terms-Content addressable memory, domain wall (DW), non-volatile, perpendicular magnetic anisotropy, ultra-high density.

## I. INTRODUCTION

CURRENT-INDUCED domain wall (DW) motion in magnetic nanowires or tracks opens a new route to build low-power, high-density and high-speed circuits [1], [2]. Racetrack memory (RM), a novel ultra-dense non-volatile (NV) storage based on this mechanism, is considered one of the most promising candidates for next-generation stand-alone and embedded memories [3], [4]. Combined with magnetic tunnel junction (MTJ) nanopillars as read and write heads, it offers CMOS integrability and fast data access [5]. The write head nucleates a local domain in a magnetic track through spin-transfer torque (STT) [6], and a spin-polarized current pulse drives this domain to propagate sequentially to the read head. Data, i.e., the magnetization direction, is stored in a domain delimited by two adjacent constrictions or patterned notches, and can be detected by the read head through the tunnel magnetoresistance (TMR) effect after a series of DW motions (see Fig. 1). The first 256-bit RM prototype, fabricated at the 90 nm node, has been presented recently [7]. However, it is based on in-plane magnetic anisotropy in NiFe nanowires with an intrinsically low energy barrier E, which leads to insufficient data retention at ultra-deep nodes (e.g., 22 nm). Recent progress demonstrates that perpendicular magnetic anisotropy (PMA) CoFeB can further improve the density, speed and power performance of RM compared with in-plane anisotropy [8], [9]. Its lower nucleation current and higher switching speed make PMA MTJs better candidates for the write head; furthermore, their high energy barrier, which could overcome the data-retention issue of in-plane anisotropy, makes PMA MTJs feasible as future read heads [10].
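The link between the energy barrier E and data retention can be made concrete with the standard Néel–Arrhenius estimate τ = τ0·exp(Δ), where Δ = E/(k_B T) is the thermal stability factor. The sketch below is only illustrative: the attempt time τ0 = 1 ns is an assumed textbook value, not a parameter from this paper, and the estimate applies to a single isolated bit.

```python
import math

# Assumed attempt time tau_0 (a common textbook value, not taken from this paper).
TAU0_S = 1e-9
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def retention_time_s(delta: float, tau0_s: float = TAU0_S) -> float:
    """Mean time before thermally activated reversal: tau = tau0 * exp(Delta)."""
    return tau0_s * math.exp(delta)


def delta_for_retention(years: float, tau0_s: float = TAU0_S) -> float:
    """Stability factor Delta = E / (k_B * T) needed for a target mean retention time."""
    return math.log(years * SECONDS_PER_YEAR / tau0_s)


if __name__ == "__main__":
    for years in (1, 10):
        print(f"{years:>2} year(s): Delta ~ {delta_for_retention(years):.1f}")
```

Under these assumptions a 10-year mean retention requires Δ ≈ 40; since E scales with the free-layer volume for a fixed anisotropy, shrinking devices with a low intrinsic barrier quickly fall short of this value.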

Manuscript received March 02, 2012; revised May 04, 2012; accepted May 04, 2012. Date of current version October 19, 2012. Corresponding author: W. Zhao (e-mail: weisheng.zhao@u-psud.fr).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TMAG.2012.2198876



Fig. 1. (a) Current-induced DW motion in a long PMA CoFeB magnetic track. (b) MTJs used as write and read heads for PMA nucleation and detection. STT switching mechanism: the MTJ state changes from parallel (P) to antiparallel (AP) when a positive current $I_{P \rightarrow AP} > I_{C0}$ flows through it; conversely, it returns to P when a current $I_{AP \rightarrow P} > I_{C0}$ flows in the negative direction [11]. (c) Perpendicular magnetic anisotropy (PMA) hysteresis loop of a crystallized Ta (5 nm)/CoFeB (1 nm)/MgO (0.9 nm) structure, showing very low coercivity despite a strong perpendicular anisotropy.

Content addressable memory (CAM) is widely used in mobile devices, Internet routers and processors, where it is expected to provide fast data access and ultra-high density [12]. Mainstream CAMs are composed of large-capacity SRAM blocks, which cause high static power and large die area [13]; these have become the key challenges for future CAM R&D. MRAM is a promising solution to build NV CAM and overcome both drawbacks, and this field is currently under intense investigation. For instance, a DW-motion MRAM-based CAM (DW-CAM) was prototyped recently, demonstrating important progress in terms of power and density [14]. However, this DW-CAM uses a three-terminal MTJ as the storage element, which prevents the expected ultra-high density.





Fig. 2. (a) Structure of the dual-track RM-CAM. A single write current pulse nucleates complementary configurations in a pair of MTJs. A propagation current pulse drives the dual tracks synchronously. Every dual track shares a comparison circuit. (b) Example of the current pulse configuration for $I_{\rm write}$ and $I_{\rm propagation}$; $T_{\rm N}$ and $T_{\rm P}$ are their respective pulse durations.

In this paper, we present a novel NOR-type CAM structure based on DW motion in dual PMA CoFeB magnetic tracks (RM-CAM) (see Fig. 2(a)). It is built on racetrack memory, i.e., a series of DW storage elements, and uses current-induced DW motion to shift data along the track, going beyond the single-cell storage of DW-CAM. The non-volatility of RM-CAM provides instant ON/OFF operation, reducing the leakage power otherwise spent on data retention and reloading. Furthermore, it allows ultra-high density and fast data search. The structure of the PMA RM-CAM is detailed in the next section. In Section III, we present mixed magnetic/CMOS simulations that validate its functionality and demonstrate its expected performance. They are based on a PMA RM compact model [15] and a CMOS 65 nm design kit [16].

## II. PMA COFEB MAGNETIC TRACKS BASED CAM

The PMA RM-CAM is composed of comparison circuits, PMA magnetic tracks and DW nucleation/propagation circuits. A pair of complementary magnetic tracks is used to represent one word in order to obtain the most reliable and fastest access for CAM applications, as this solution benefits from the full TMR value instead of TMR/2 in conventional single-track structures (see Fig. 2(a)). We designed the comparison circuit based on a Pre-Charge Sense Amplifier (PCSA) [17], which allows minimum power and sensing errors. This RM-CAM includes a pair of PMA CoFeB/MgO/CoFeB MTJs connected together as the write heads. Because the write current pulse $I_{\rm write}$ flows through these two MTJs in opposite directions, the same $I_{\rm write}$ pulse nucleates complementary configurations through the STT switching mechanism. One of the critical challenges for complementary magnetic tracks is to synchronize the domain wall positions precisely. Here, the same current pulse $I_{\rm propagation}$ propagates domains in both tracks, and we implement the DW pinning constrictions with the same spacing in the two magnetic tracks [18]. To avoid interference between DW nucleation and previous data, the write heads do not hold stored data, and every DW nucleation is always followed by an $I_{\rm propagation}$ pulse (see Fig. 2(b)). A pair of PMA MTJs is also placed at each storage bit as read heads. Since lower resistance reduces the breakdown rate and higher resistance improves the sensing performance, the read heads should be smaller than the write heads to obtain the best switching and sensing reliability.

Fig. 3. Schematic of the comparison circuit. It outputs the logic value '1' or '0' according to the configuration of the complementary MTJs. Transistors MN3–MN6 build up a NOR-type CAM.
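The write/propagate protocol described above can be illustrated with a toy behavioural model. This is a sketch, not the compact model of [15]: each notch holds one domain state, the labels "P"/"AP" denote the magnetization relative to the heads' reference layers (an assumed convention), the write heads follow the threshold STT rule of Fig. 1(b) with a hypothetical normalized critical current `I_C0 = 1.0`, and all names are invented for illustration.

```python
from typing import List

I_C0 = 1.0  # hypothetical critical current (normalized units)


def stt_switch(state: str, current: float, i_c0: float = I_C0) -> str:
    """Threshold STT rule of Fig. 1(b): a positive current above I_C0 drives
    P -> AP; a negative current whose magnitude exceeds I_C0 drives AP -> P."""
    if current > i_c0:
        return "AP"
    if -current > i_c0:
        return "P"
    return state  # sub-critical pulse: no switching


class DualTrack:
    """Toy complementary dual racetrack sharing one pair of write heads."""

    def __init__(self, length: int) -> None:
        self.track: List[str] = ["P"] * length     # main track, one domain per notch
        self.track_b: List[str] = ["AP"] * length  # complementary track
        self.head, self.head_b = "P", "AP"         # write-head free layers

    def nucleate(self, i_write: float) -> None:
        # One I_write pulse crosses the two write MTJs in opposite directions,
        # so they always end up in complementary states.
        self.head = stt_switch(self.head, +i_write)
        self.head_b = stt_switch(self.head_b, -i_write)

    def propagate(self) -> None:
        # One I_propagation pulse shifts both tracks by one notch in lockstep;
        # the domain under the write head enters position 0.
        self.track = [self.head] + self.track[:-1]
        self.track_b = [self.head_b] + self.track_b[:-1]

    def program(self, bits: List[int], amplitude: float = 2.0) -> None:
        # Every nucleation is followed by a propagation pulse (Fig. 2(b)).
        for bit in reversed(bits):  # the last bit written ends up at position 0
            self.nucleate(amplitude if bit else -amplitude)
            self.propagate()

    def read(self, pos: int) -> int:
        # The read heads sense both tracks; a valid bit is always complementary.
        assert self.track[pos] != self.track_b[pos], "tracks out of sync"
        return 1 if self.track[pos] == "AP" else 0


t = DualTrack(length=8)
t.program([1, 0, 1, 1, 0, 0, 1, 0])
print([t.read(i) for i in range(8)])  # [1, 0, 1, 1, 0, 0, 1, 0]
```

Driving both tracks with the same pulse keeps them synchronized by construction in this model; in hardware, that synchronization additionally relies on the equally spaced pinning constrictions.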

The comparison circuit (see Fig. 3) consists of two parts: a PCSA detects the complementary magnetizations of the read heads using two read current pulses ($I_{\rm read}$ and $I_{\rm readb}$) and outputs a logic value; transistors MN3–MN6 build up a NOR-type CAM. The signal "MLpre" is used to pre-charge the match line (ML). If the search line "SL" ("SLb" is its complementary signal) matches the stored data, there is no discharge path and ML thus remains asserted. Otherwise, ML is discharged.
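The match-line behaviour can be summarized at bit level. The sketch below assumes the usual NOR-cell wiring, in which the four comparison transistors of Fig. 3 form two series pull-down stacks (SL with the inverted stored bit, SLb with the stored bit); the text does not give the exact gate assignment, so this wiring and the function names are assumptions.

```python
from typing import Sequence


def cell_discharges(q: int, sl: int) -> bool:
    """True if a bit cell opens a discharge path on the match line.

    Assumed wiring: one series NMOS stack gated by SL and the inverted stored
    bit, the other by SLb and the stored bit. Either stack conducts only on a
    mismatch, so the cell implements q XOR sl.
    """
    qb, slb = 1 - q, 1 - sl
    return bool((sl and qb) or (slb and q))


def match_line(stored_word: Sequence[int], search_word: Sequence[int]) -> bool:
    """Evaluate one NOR-type match line after pre-charge."""
    if len(stored_word) != len(search_word):
        raise ValueError("stored and search words must have the same width")
    ml = True  # pre-charge phase: MLpre pulls ML high
    # Evaluation phase: cells are in parallel, so any single mismatch discharges ML.
    for q, sl in zip(stored_word, search_word):
        if cell_discharges(q, sl):
            ml = False
    return ml


print(match_line([1, 0, 1, 1], [1, 0, 1, 1]))  # True  (match: ML stays high)
print(match_line([1, 0, 1, 1], [1, 0, 0, 1]))  # False (1-bit miss: ML discharged)
```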

To improve area efficiency, each dual track shares one comparison circuit in this RM-CAM (see Fig. 2(a)). Unlike DW-CAM, where every storage cell has its own large nucleation transistor, RM-CAM shares the same write head along a whole magnetic track, so the CMOS area dedicated to each storage cell becomes negligible for a long track with numerous pinning constrictions. This structure therefore enables ultra-high density.

Fast search operation, as shown in [14], can also be expected in the RM-CAM. First, we program the magnetic tracks; then the switch signals select each bit of the magnetic tracks to be loaded into the comparison circuit. By sequentially triggering the switch signals, all the words can be explored. If no match is found, DW nucleation and propagation are carried out to enter new words for the next search. The programming speed of the magnetic tracks depends on $T_N$ and $T_P$, which are respectively the pulse durations of $I_{write}$ and $I_{propagation}$; both can be shortened to ~1 ns [9]–[11]. According to the current pulse configuration shown in Fig. 2(b), the worst-case programming duration is $N \times (T_N + T_P)$, where N is the number of pinning sites in the magnetic track. Repeated bits such as "111" and "000" can be programmed faster, since only one DW nucleation is required for the three bits.

Fig. 4. Schematic of $8 \times 8$ bits RM-CAM. Each word is composed of the bits at the same positions in 8 different dual tracks, which can be driven to move simultaneously by the propagation currents.

TABLE I

CRITICAL PARAMETERS IN THE RM COMPACT MODEL

| Parameter            | Description                             | Default Value                     |
|----------------------|-----------------------------------------|-----------------------------------|
| $t_{ox}$             | Oxide barrier thickness                 | 0.85 nm                           |
| Area                 | MTJ surface                             | 65 nm × 65 nm × π/4               |
| TMR(0)               | TMR ratio at 0 V bias                   | 120%                              |
| $V$                  | Volume of free layer                    | Area × 1.3 nm                     |
| $R \cdot A$          | Resistance-area product                 | $10\ \Omega\,\mu\text{m}^2$       |
| $V_{write}$          | Writing voltage                         | 2 V                               |
| $V_{read}$           | Reading voltage                         | 1.2 V                             |
| $J_{c,nucleation}$   | DW nucleation current density           | $5.7 \times 10^{6}$ A/cm²         |
| $J_{c,propagation}$  | DW propagation current density          | $6.2 \times 10^{7}$ A/cm²         |
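The worst-case programming time $N \times (T_N + T_P)$ and the speed-up for repeated bits can be checked with a few lines of arithmetic. The sketch takes $T_N = T_P = 1$ ns from the text; the rule that a new nucleation is issued only when the bit value changes is our hypothetical reading of the "111"/"000" remark, not a protocol specified in the paper.

```python
from typing import Sequence


def programming_time_ns(bits: Sequence[int], t_n_ns: float = 1.0,
                        t_p_ns: float = 1.0, skip_repeats: bool = False) -> float:
    """Time to shift a bit sequence into one track.

    Every bit needs one propagation pulse (T_P). In the worst case every bit
    also needs its own nucleation pulse (T_N), giving N * (T_N + T_P). With
    skip_repeats=True we assume (hypothetically) that a nucleation is issued
    only when the bit value changes, so runs such as "111" share one nucleation.
    """
    n = len(bits)
    if not skip_repeats:
        return n * (t_n_ns + t_p_ns)
    nucleations = sum(1 for i, b in enumerate(bits) if i == 0 or b != bits[i - 1])
    return nucleations * t_n_ns + n * t_p_ns


if __name__ == "__main__":
    word = [1, 1, 1, 1, 0, 0, 0, 0]
    print(programming_time_ns(word))                     # 16.0 (worst case)
    print(programming_time_ns(word, skip_repeats=True))  # 10.0 (two nucleations)
```

An alternating pattern such as "10101010" gains nothing from this rule, so the worst case remains $N \times (T_N + T_P)$.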

## III. SPICE SIMULATION OF RM-CAM

To obtain fast simulation and evaluation of hybrid magnetic/CMOS circuits, the use of compact models for magnetic devices in standard CMOS simulators is an efficient solution [19]. We developed a SPICE-compatible compact model for the PMA RM [15] based on the related physical models, such as spin-transfer torque, current-induced DW motion and magnetoresistance [6]–[10]. Using this compact model and the STMicroelectronics CMOS 65 nm design kit, an 8-bit-wide, 8-word-deep PMA RM-CAM (see Fig. 4) was co-simulated. The main simulation parameters are listed in Table I.
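As a rough check of the sensing advantage claimed for complementary tracks, the Table I values give the parallel and antiparallel resistances of a read head. The sketch assumes that a single-track design compares the cell against a mid-point reference $(R_P + R_{AP})/2$, and it ignores the bias dependence of TMR at $V_{read}$; both are simplifications, not results from the paper.

```python
import math

# Parameters from Table I.
RA_OHM_UM2 = 10.0      # resistance-area product, Ohm*um^2
DIAMETER_UM = 0.065    # circular MTJ: area = 65 nm x 65 nm x pi/4
TMR0 = 1.20            # TMR ratio at zero bias (120 %)

area_um2 = DIAMETER_UM ** 2 * math.pi / 4
r_p = RA_OHM_UM2 / area_um2   # parallel-state resistance
r_ap = r_p * (1 + TMR0)       # antiparallel-state resistance
# Simplification: the reduction of TMR at V_read is ignored.

# Complementary dual track: the PCSA compares R_AP directly against R_P.
margin_dual = r_ap - r_p
# Single track (assumed): the cell is compared with a mid-point reference.
margin_single = r_ap - (r_p + r_ap) / 2

print(f"R_P = {r_p:.0f} Ohm, R_AP = {r_ap:.0f} Ohm")
print(f"margin: dual = {margin_dual:.0f} Ohm, single = {margin_single:.0f} Ohm")
```

With these numbers $R_P \approx 3.0$ kΩ and $R_{AP} \approx 6.6$ kΩ, and the complementary pair sees twice the resistance difference of the single-track scheme, i.e. the full TMR instead of TMR/2.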

First, we performed a transient simulation of the search operation without DW motion (see Fig. 5(a)). The clock signal "CLK" alternates between a "Pre-charge" phase and an "Evaluation" phase. During the "Pre-charge" phase, the signals "SEN" and "MLpre" (see Fig. 3) are both set low to pre-charge the PCSA circuit and the match line "ML". The first word, "Word0", is loaded by enabling the signal "Switch0". If the signal "Miss" is asserted in response, "Switch1" is then activated, and so on. This process continues until a match occurs. This search operation needs only ~0.45 ns, which is faster than that of conventional SRAM-based CAM and DW-CAM (see Table II). In addition, the search energy is as low as $\sim 12$ fJ/bit/search, which could be further reduced by lowering the activity rate through match-line segmentation [20].

Fig. 5. Transient simulations of PMA RM-CAM: (a) Without DW nucleation and motion. (b) With DW nucleation and motion.

TABLE II Performance for Different CAMs

| Type                           | SRAM-based CAM [22] | DW-CAM [14] | RM-CAM |
|--------------------------------|---------------------|-------------|--------|
| Cell area (F<sup>2</sup>/bit)  | 540                 | ~815        | ~19    |
| Cycle time (ns)                | 2                   | 5           | ~0.45  |
| Energy (fJ/bit/search)         | 9.5                 | ~30         | ~12    |
| Static power                   | Yes                 | No          | No     |
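The sequential search driven by "Switch0", "Switch1", ... and the "Miss" signal can be sketched as a simple loop. It assumes one pre-charge/evaluation cycle of ~0.45 ns (Table II) per loaded word and that the next switch fires as soon as "Miss" is asserted; the function and its return values are illustrative, not part of the simulated circuit.

```python
from typing import Optional, Sequence, Tuple

CYCLE_NS = 0.45  # per-evaluation search time reported in Table II


def sequential_search(stored_words: Sequence[Sequence[int]], search_word: Sequence[int],
                      cycle_ns: float = CYCLE_NS) -> Tuple[Optional[int], int, float]:
    """Step Switch0, Switch1, ... until a match; return (index, cycles, latency in ns).

    Assumes one pre-charge/evaluation cycle per loaded word. The index is None
    when no stored word matches, in which case new words must be programmed.
    """
    cycles = 0
    for k, word in enumerate(stored_words):
        cycles += 1                           # Switch_k loads word k into the comparators
        if list(word) == list(search_word):   # match line stays high
            return k, cycles, cycles * cycle_ns
    return None, cycles, cycles * cycle_ns   # every evaluation asserted "Miss"


words = [[0] * 8, [1] * 8, [1, 0] * 4, [0, 1] * 4]
idx, cycles, t_ns = sequential_search(words, [1, 0] * 4)
print(idx, cycles, round(t_ns, 2))  # 2 3 1.35
```

In this model the latency grows linearly with the position of the matching word, which is the cost of sharing one comparison circuit per dual track.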

If no stored data match the search word, a new word is nucleated and propagated into the magnetic track for the next round of search. Fig. 5(b) shows the transient simulation result of the worst case, a 1-bit miss: the remaining 7 bits of the search word match the stored data and only one bit differs. As shown in Fig. 5(b), the search bit is '1'; if no match is found, the propagation current pulse starts to drive DW motion until "SL" and "Stored data" match each other. The whole operation, consisting of the "Pre-charge", "Propagation" and "Evaluation" phases, requires only $\sim 2$ ns. This corresponds to an operating frequency of up to 500 MHz, comparable to that of traditional CAM [21].

We estimate the cell area for RM-CAM with (1):

$$A_C = \frac{A_{\rm C0} + A_{\rm NU} + A_{\rm PR} + N \times \text{MAX}(A_{\rm BT}, A_{\rm LS})}{N} \tag{1}$$

where $A_{\rm C0}$ is the area of a comparison circuit ($\sim 50$ F<sup>2</sup>), $A_{\rm NU}$ is the area of a DW nucleation circuit ($\sim 48$ F<sup>2</sup>), $A_{\rm PR}$ is the area of a propagation current generating circuit ($\sim 7$ F<sup>2</sup>), $A_{\rm BT}$ is the area of each bit in the track memory, $A_{\rm LS}$ is the area of the two load-selecting transistors for each bit, and N is the number of bits per word. Thanks to the 3D integration of MTJs above the CMOS circuit, only the larger of the MTJ area and the selecting-transistor area enters the calculation of the full area. For our design, $A_{\rm BT}$ is ~6 F<sup>2</sup>, considering 2 F between two adjacent notches. Coincidentally, $A_{\rm LS}$ is also $\sim 6$ F<sup>2</sup> with minimum-size transistors. If the distance between two adjacent notches exceeds 2 F, only $A_{\rm BT}$ would be taken into account in (1). As N = 8 in our simulation, the cell area per bit is $\sim 19$ F<sup>2</sup>, which is much lower than that of SRAM-based CAM or DW-CAM (see Table II). Meanwhile, as the number of bits per word increases, the area of the shared CMOS circuits for data comparison, DW nucleation and propagation becomes negligible (see Fig. 6), and the cell area per bit approaches $\text{MAX}(A_{\rm BT}, A_{\rm LS})$ (e.g., ~6 F<sup>2</sup> for our design).

Fig. 6. Dependence of the full area on the number of bits per word.

Fig. 7. Proposed RM-CAM structure as cache memory. A pair of dual tracks constitutes a stored word.
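Equation (1) can be evaluated directly with the area values quoted in the text, which reproduces both the ~19 F<sup>2</sup>/bit figure for N = 8 and the trend of Fig. 6. The function name and the choice of sample word widths are illustrative.

```python
# Areas in units of F^2, as given in the text for the 65 nm design.
A_C0 = 50.0  # comparison circuit
A_NU = 48.0  # DW nucleation circuit
A_PR = 7.0   # propagation current generator
A_BT = 6.0   # one bit of track (2 F notch pitch)
A_LS = 6.0   # two minimum-size load-selecting transistors


def cell_area_per_bit(n_bits: int, a_bt: float = A_BT, a_ls: float = A_LS) -> float:
    """Equation (1): shared CMOS area amortized over N bits plus per-bit area.

    MTJs sit above the CMOS layer, so per bit only the larger of the track
    area and the selector area counts.
    """
    shared = A_C0 + A_NU + A_PR
    return (shared + n_bits * max(a_bt, a_ls)) / n_bits


for n in (8, 16, 32, 64, 256, 1024):
    print(f"N = {n:4d}: {cell_area_per_bit(n):6.2f} F^2/bit")
```

For N = 8 this gives 153/8 ≈ 19.1 F<sup>2</sup>/bit, and the shared 105 F<sup>2</sup> overhead fades so that the per-bit area tends to 6 F<sup>2</sup> for long words.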

RM-CAM can be used as cache memory, such as translation lookaside buffers (TLB) [14]. Fig. 7 shows a new structure in which a pair of dual tracks constitutes a stored word, instead of a word organization spread across numerous tracks (see Fig. 4).

## IV. CONCLUSION

In this paper, we presented a novel CAM design based on DW motion in PMA magnetic tracks. DW motion and shared CMOS circuits (comparison, nucleation and propagation) enable ultra-high density and fast search operation while keeping power performance comparable to that of mainstream CAMs. Using a precise compact model of PMA RM and a CMOS 65 nm design kit, we performed mixed magnetic/CMOS simulations to validate the functionality of this RM-CAM and confirm its advantages. A prototype of this structure based on a 90 nm magnetic/CMOS integration process is under development in our laboratory [23].

#### ACKNOWLEDGMENT

The authors wish to acknowledge financial support from the European FP7 program through MAGWIRE (257707).

#### REFERENCES

- [1] C. Chappert, A. Fert, and F. N. Van Dau, "The emergence of spin electronics in data storage," Nat. Mater., vol. 6, p. 813, 2007.
- [2] S. Mangin et al., "Current-induced magnetization reversal in nanopillars with perpendicular anisotropy," Nat. Mater., vol. 5, p. 210, 2006.
- [3] M. Hayashi, L. Thomas, R. Moriya, C. Rettner, and S. S. P. Parkin, "Current-controlled magnetic domain-wall nanowire shift register," Science, vol. 320, p. 209, 2008.
- [4] S. S. P. Parkin, M. Hayashi, and L. Thomas, "Magnetic domain-wall racetrack memory," Science, vol. 320, pp. 190–194, 2008.
- [5] W. S. Zhao et al., "A compact model of domain wall propagation for logic and memory design," J. Appl. Phys., vol. 109, p. 07D501, 2011.
- [6] J. C. Slonczewski, "Current-driven excitation of magnetic multilayers," J. Magn. Magn. Mater., vol. 159, pp. L1–L7, 1996.
- [7] L. Thomas et al., "Racetrack memory cell array with integrated magnetic tunnel junction readout," in IEDM Digest, 2011, 24.3.
- [8] S. Ikeda et al., "A perpendicular-anisotropy CoFeB-MgO magnetic tunnel junction," Nat. Mater., vol. 9, p. 721, 2010.
- [9] D. C. Worledge et al., "Spin torque switching of perpendicular Ta CoFeB MgO-based magnetic tunnel junctions," Appl. Phys. Lett., vol. 98, p. 022501, 2011.
- [10] S. Fukami et al., "Current-induced domain wall motion in perpendicularly magnetized CoFeB nanowire," Appl. Phys. Lett., vol. 98, p. 082504, 2011.
- [11] Y. Zhang et al., "A compact model of perpendicular magnetic anisotropy magnetic tunnel junction," IEEE Trans. Electron Devices, vol. 59, pp. 819-826, Mar. 2012.
- [12] K. Pagiamtzis and A. Sheikholeslami, "Content-addressable memory (CAM) circuits and architectures: A tutorial and survey," IEEE J. Solid-State Circuits, vol. 41, pp. 712-727, Apr. 2006.
- [13] N. S. Kim et al., "Leakage current: Moore's law meets static power," IEEE Trans. Comput., vol. 36, pp. 68–75, Jan. 2003.
- [14] R. Nebashi et al., "A content addressable memory using magnetic domain wall motion cells," in Proc. Symp. VLSIC, 2011, pp. 300-301.
- [15] Y. Zhang et al., "Perpendicular-magnetic-anisotropy CoFeB racetrack memory," J. Appl. Phys., vol. 111, 2012.
- [16] STMicroelectronics, Manual of Design Kit for CMOS 65 nm, 2009.
- [17] W. S. Zhao, C. Chappert, V. Javerliac, and J.-P. Noziere, "High speed, high stability and low power sensing amplifier for MTJ/CMOS hybrid logic circuits," IEEE Trans. Magn., vol. 45, p. 3784, 2009.
- [18] W. S. Zhao et al., "Domain wall shift register-based reconfigurable logic," IEEE Trans. Magn., vol. 47, no. 10, pp. 2966-2969, 2011.
- [19] G. Prenat et al., "CMOS/magnetic hybrid architectures," in Proc. IEEE-ICECS, Morocco, 2007, pp. 190-193.
- [20] S. Matsunaga et al., "Fully parallel 6T-2MTJ non-volatile TCAM with single-transistor-based self match-line discharge control," in Proc. Int. Symp. VLSIC, 2011, pp. 298–299.
- [21] H. Kadota et al., "An 8-kbit content-addressable and reentrant memory," IEEE J. Solid-State Circuits, vol. SC-20, pp. 951-957, 1985.
- [22] I. Arsovski et al., "A ternary content-addressable memory (TCAM) based on 4T static storage and including a current-race sensing scheme," IEEE J. Solid-State Circuits, vol. 38, pp. 155-158, 2003.
- [23] FP7 European Project MAGWIRE [Online]. Available: http://pages.ief.u-psud.fr/magwire/Magwire/Homepage.html