Public. This publication contains proprietary information which is subject to change without notice and is supplied ‘as is’, without any warranty of any kind.
Table of Contents

Chapter 1: About This Book ........................................................................................................................................ 2
  1.1: Typographical Conventions ............................................................................................................................ 3
    1.1.1: Italic Text ................................................................................................................................................... 3
    1.1.2: Bold Text .................................................................................................................................................. 3
    1.1.3: Courier Text .............................................................................................................................................. 3
  1.2: UNPREDICTABLE and UNDEFINED ........................................................................................................... 3
    1.2.1: UNPREDICTABLE .................................................................................................................................... 3
    1.2.2: UNDEFINED ............................................................................................................................................ 4
    1.2.3: UNSTABLE ............................................................................................................................................. 4
  1.3: Special Symbols in Pseudocode Notation .................................................................................................... 4
  1.4: Notation for Register Field Accessibility .................................................................................................... 7
  1.5: For More Information ..................................................................................................................................... 9

Chapter 2: Guide to the Instruction Set .................................................................................................................. 10
  2.1: Understanding the Instruction Fields ........................................................................................................... 10
    2.1.1: Instruction Fields ...................................................................................................................................... 12
    2.1.2: Instruction Descriptive Name and Mnemonic .......................................................................................... 12
    2.1.3: Format Field ........................................................................................................................................... 12
    2.1.4: Purpose Field ........................................................................................................................................... 13
    2.1.5: Description Field ..................................................................................................................................... 13
    2.1.6: Restrictions Field .................................................................................................................................... 13
    2.1.7: Availability and Compatibility Fields .................................................................................................. 14
    2.1.8: Operation Field ...................................................................................................................................... 15
    2.1.9: Exceptions Field ..................................................................................................................................... 15
    2.1.10: Programming Notes and Implementation Notes Fields ............................................................... 15
  2.2: Operation Section Notation and Functions .................................................................................................. 16
    2.2.1: Instruction Execution Ordering ........................................................................................................... 16
    2.2.2: Pseudocode Functions .......................................................................................................................... 16
  2.3: Op and Function Subfield Notation ............................................................................................................... 27
  2.4: FPU Instructions ............................................................................................................................................ 27

Chapter 3: The MIPS32® Instruction Set .................................................................................................................. 29
  3.1: Compliance and Subsetting ............................................................................................................................ 29
    3.1.1: Subsetting of Non-Privileged Architecture .......................................................................................... 29
  3.2: Alphabetical List of Instructions .................................................................................................................. 31
    ABS.fmt .......................................................................................................................................................... 32
    ADD ............................................................................................................................................................... 33
    ADD.fmt .......................................................................................................................................................... 34
    ADDI ............................................................................................................................................................... 35
    ADDIU ............................................................................................................................................................ 36
    ADDIUPC ....................................................................................................................................................... 37
    ADDU ............................................................................................................................................................. 38
    ALIGN ........................................................................................................................................................... 39
    ALNV.PS ......................................................................................................................................................... 41
    ALUIPC .......................................................................................................................................................... 43
    AND ............................................................................................................................................................... 44
    ANDI ............................................................................................................................................................... 45
<table>
<thead>
<tr>
<th>Instruction</th>
<th>Page</th>
</tr>
</thead>
<tbody>
<tr>
<td>AUI</td>
<td>47</td>
</tr>
<tr>
<td>AUIPC</td>
<td>48</td>
</tr>
<tr>
<td>B</td>
<td>49</td>
</tr>
<tr>
<td>BAL</td>
<td>50</td>
</tr>
<tr>
<td>BALC</td>
<td>52</td>
</tr>
<tr>
<td>BC</td>
<td>53</td>
</tr>
<tr>
<td>BC1EQZ BC1NEZ</td>
<td>54</td>
</tr>
<tr>
<td>BC1F</td>
<td>56</td>
</tr>
<tr>
<td>BC1FL</td>
<td>58</td>
</tr>
<tr>
<td>BC1T</td>
<td>60</td>
</tr>
<tr>
<td>BC1TL</td>
<td>62</td>
</tr>
<tr>
<td>BC2EQZ BC2NEZ</td>
<td>64</td>
</tr>
<tr>
<td>BC2F</td>
<td>66</td>
</tr>
<tr>
<td>BC2FL</td>
<td>67</td>
</tr>
<tr>
<td>BC2T</td>
<td>69</td>
</tr>
<tr>
<td>BC2TL</td>
<td>70</td>
</tr>
<tr>
<td>BEQ</td>
<td>72</td>
</tr>
<tr>
<td>BEQL</td>
<td>73</td>
</tr>
<tr>
<td>BGEZ</td>
<td>75</td>
</tr>
<tr>
<td>BGEZAL</td>
<td>76</td>
</tr>
<tr>
<td>B(LE,GE,GT,LT,EQ,NE)ZALC</td>
<td>77</td>
</tr>
<tr>
<td>BGELZ</td>
<td>80</td>
</tr>
<tr>
<td>BEQ</td>
<td>82</td>
</tr>
<tr>
<td>BGZEZ</td>
<td>86</td>
</tr>
<tr>
<td>BGTZ</td>
<td>88</td>
</tr>
<tr>
<td>BGTZL</td>
<td>89</td>
</tr>
<tr>
<td>BITSWAP</td>
<td>91</td>
</tr>
<tr>
<td>BLEZ</td>
<td>93</td>
</tr>
<tr>
<td>BLEZL</td>
<td>94</td>
</tr>
<tr>
<td>BLTZ</td>
<td>96</td>
</tr>
<tr>
<td>BLTZAL</td>
<td>97</td>
</tr>
<tr>
<td>BLTZALL</td>
<td>98</td>
</tr>
<tr>
<td>BLTZL</td>
<td>100</td>
</tr>
<tr>
<td>BNE</td>
<td>102</td>
</tr>
<tr>
<td>BNEL</td>
<td>103</td>
</tr>
<tr>
<td>BOVC BNVC</td>
<td>105</td>
</tr>
<tr>
<td>BREAK</td>
<td>107</td>
</tr>
<tr>
<td>C.cond.fmt</td>
<td>108</td>
</tr>
<tr>
<td>CACHE</td>
<td>112</td>
</tr>
<tr>
<td>CACHEE</td>
<td>119</td>
</tr>
<tr>
<td>CEIL.L.fmt</td>
<td>125</td>
</tr>
<tr>
<td>CEIL.W.fmt</td>
<td>126</td>
</tr>
<tr>
<td>CFC1</td>
<td>127</td>
</tr>
<tr>
<td>CFC2</td>
<td>129</td>
</tr>
<tr>
<td>CLASS.fmt</td>
<td>130</td>
</tr>
<tr>
<td>CLO</td>
<td>132</td>
</tr>
<tr>
<td>CLZ</td>
<td>133</td>
</tr>
<tr>
<td>CMP.condn.fmt</td>
<td>134</td>
</tr>
<tr>
<td>COP2</td>
<td>139</td>
</tr>
<tr>
<td>CTC1</td>
<td>140</td>
</tr>
<tr>
<td>CTC2</td>
<td>143</td>
</tr>
<tr>
<td>CVT.D.fmt</td>
<td>144</td>
</tr>
<tr>
<td>CVT.L.fmt</td>
<td>145</td>
</tr>
</tbody>
</table>

The MIPS32® Instruction Set Manual, Revision 6.04

Copyright © 2015 Imagination Technologies LTD. and/or its Affiliated Group Companies. All rights reserved.
<table>
<thead>
<tr>
<th>Instruction</th>
<th>Page</th>
</tr>
</thead>
<tbody>
<tr>
<td>CVT.PS.S</td>
<td>146</td>
</tr>
<tr>
<td>CVT.S.PL</td>
<td>148</td>
</tr>
<tr>
<td>CVT.S.PU</td>
<td>149</td>
</tr>
<tr>
<td>CVT.S.fmt</td>
<td>150</td>
</tr>
<tr>
<td>CVT.W.fmt</td>
<td>151</td>
</tr>
<tr>
<td>DDIV</td>
<td>152</td>
</tr>
<tr>
<td>DDIU</td>
<td>153</td>
</tr>
<tr>
<td>DERET</td>
<td>154</td>
</tr>
<tr>
<td>DI</td>
<td>155</td>
</tr>
<tr>
<td>DIV</td>
<td>156</td>
</tr>
<tr>
<td>DIV MOD DIVU MODU</td>
<td>158</td>
</tr>
<tr>
<td>DIV.fmt</td>
<td>160</td>
</tr>
<tr>
<td>DIVU</td>
<td>161</td>
</tr>
<tr>
<td>DVP</td>
<td>162</td>
</tr>
<tr>
<td>EHB</td>
<td>165</td>
</tr>
<tr>
<td>EI</td>
<td>166</td>
</tr>
<tr>
<td>ERET</td>
<td>167</td>
</tr>
<tr>
<td>ERETNC</td>
<td>169</td>
</tr>
<tr>
<td>EVP</td>
<td>171</td>
</tr>
<tr>
<td>EXT</td>
<td>173</td>
</tr>
<tr>
<td>FLOOR.L.fmt</td>
<td>175</td>
</tr>
<tr>
<td>FLOOR.W.fmt</td>
<td>176</td>
</tr>
<tr>
<td>INS</td>
<td>177</td>
</tr>
<tr>
<td>J</td>
<td>179</td>
</tr>
<tr>
<td>JAL</td>
<td>180</td>
</tr>
<tr>
<td>JALR</td>
<td>181</td>
</tr>
<tr>
<td>JALR.HB</td>
<td>183</td>
</tr>
<tr>
<td>JALX</td>
<td>187</td>
</tr>
<tr>
<td>JIALC</td>
<td>189</td>
</tr>
<tr>
<td>JIC</td>
<td>191</td>
</tr>
<tr>
<td>JR</td>
<td>192</td>
</tr>
<tr>
<td>JR.HB</td>
<td>194</td>
</tr>
<tr>
<td>LB</td>
<td>197</td>
</tr>
<tr>
<td>LBE</td>
<td>198</td>
</tr>
<tr>
<td>LBU</td>
<td>199</td>
</tr>
<tr>
<td>LBUE</td>
<td>200</td>
</tr>
<tr>
<td>LDC1</td>
<td>201</td>
</tr>
<tr>
<td>LDC2</td>
<td>202</td>
</tr>
<tr>
<td>LDXC1</td>
<td>204</td>
</tr>
<tr>
<td>LH</td>
<td>205</td>
</tr>
<tr>
<td>LHE</td>
<td>206</td>
</tr>
<tr>
<td>LHU</td>
<td>207</td>
</tr>
<tr>
<td>LHUE</td>
<td>208</td>
</tr>
<tr>
<td>LL</td>
<td>209</td>
</tr>
<tr>
<td>LLE</td>
<td>211</td>
</tr>
<tr>
<td>LLX, LLXE</td>
<td>213</td>
</tr>
<tr>
<td>LSA</td>
<td>224</td>
</tr>
<tr>
<td>LUI</td>
<td>225</td>
</tr>
<tr>
<td>LUXC1</td>
<td>226</td>
</tr>
<tr>
<td>LW</td>
<td>227</td>
</tr>
<tr>
<td>LWC1</td>
<td>228</td>
</tr>
<tr>
<td>LWC2</td>
<td>229</td>
</tr>
<tr>
<td>LWE</td>
<td>231</td>
</tr>
<tr>
<td>LWL</td>
<td>232</td>
</tr>
<tr>
<td>LWLE</td>
<td>234</td>
</tr>
<tr>
<td>LWPC</td>
<td>237</td>
</tr>
<tr>
<td>LWR</td>
<td>238</td>
</tr>
<tr>
<td>LWRE</td>
<td>241</td>
</tr>
<tr>
<td>LWXC1</td>
<td>244</td>
</tr>
<tr>
<td>MADD</td>
<td>245</td>
</tr>
<tr>
<td>MADD.fmt</td>
<td>246</td>
</tr>
<tr>
<td>MADDF.fmt</td>
<td>249</td>
</tr>
<tr>
<td>MADDU</td>
<td>251</td>
</tr>
<tr>
<td>MAX.fmt</td>
<td>252</td>
</tr>
<tr>
<td>MIN.fmt</td>
<td>253</td>
</tr>
<tr>
<td>MAXA.fmt</td>
<td>254</td>
</tr>
<tr>
<td>MINA.fmt</td>
<td>255</td>
</tr>
<tr>
<td>MFC0</td>
<td>256</td>
</tr>
<tr>
<td>MFC1</td>
<td>257</td>
</tr>
<tr>
<td>MFC2</td>
<td>258</td>
</tr>
<tr>
<td>MFHC0</td>
<td>259</td>
</tr>
<tr>
<td>MFHC1</td>
<td>260</td>
</tr>
<tr>
<td>MFHC2</td>
<td>261</td>
</tr>
<tr>
<td>MFHI</td>
<td>262</td>
</tr>
<tr>
<td>MFLO</td>
<td>263</td>
</tr>
<tr>
<td>MOV.fmt</td>
<td>264</td>
</tr>
<tr>
<td>MOVF</td>
<td>265</td>
</tr>
<tr>
<td>MOVF.fmt</td>
<td>266</td>
</tr>
<tr>
<td>MOVN</td>
<td>268</td>
</tr>
<tr>
<td>MOVN.fmt</td>
<td>269</td>
</tr>
<tr>
<td>MOVV</td>
<td>270</td>
</tr>
<tr>
<td>MOVT</td>
<td>271</td>
</tr>
<tr>
<td>MOVT.fmt</td>
<td>271</td>
</tr>
<tr>
<td>MOVZ</td>
<td>273</td>
</tr>
<tr>
<td>MOVZ.fmt</td>
<td>274</td>
</tr>
<tr>
<td>MSUB</td>
<td>275</td>
</tr>
<tr>
<td>MSUB.fmt</td>
<td>276</td>
</tr>
<tr>
<td>MSUBU</td>
<td>278</td>
</tr>
<tr>
<td>MTC0</td>
<td>279</td>
</tr>
<tr>
<td>MTC1</td>
<td>281</td>
</tr>
<tr>
<td>MTC2</td>
<td>282</td>
</tr>
<tr>
<td>MTHC0</td>
<td>283</td>
</tr>
<tr>
<td>MTHC1</td>
<td>284</td>
</tr>
<tr>
<td>MTHC2</td>
<td>285</td>
</tr>
<tr>
<td>MTHI</td>
<td>286</td>
</tr>
<tr>
<td>MTLO</td>
<td>287</td>
</tr>
<tr>
<td>MUL</td>
<td>288</td>
</tr>
<tr>
<td>MULU</td>
<td>289</td>
</tr>
<tr>
<td>MUL.fmt</td>
<td>291</td>
</tr>
<tr>
<td>MULT</td>
<td>292</td>
</tr>
<tr>
<td>MULTU</td>
<td>293</td>
</tr>
<tr>
<td>NAL</td>
<td>294</td>
</tr>
<tr>
<td>NEG.fmt</td>
<td>295</td>
</tr>
<tr>
<td>NMADD.fmt</td>
<td>296</td>
</tr>
<tr>
<td>NMSUB.fmt</td>
<td>298</td>
</tr>
<tr>
<td>NOP</td>
<td>300</td>
</tr>
<tr>
<td>NOR</td>
<td>301</td>
</tr>
<tr>
<td>OR</td>
<td>302</td>
</tr>
<tr>
<td>ORI</td>
<td>303</td>
</tr>
<tr>
<td>PAUSE</td>
<td>305</td>
</tr>
</tbody>
</table>
Appendix A: Instruction Bit Encodings ................................................................. 441
  A.1: Instruction Encodings and Instruction Classes ........................................ 441
  A.2: Instruction Bit Encoding Tables ................................................................. 441
  A.3: Floating Point Unit Instruction Format Encodings .................................... 452
  A.4: Release 6 Instruction Encodings ............................................................... 454

Appendix B: Revision History ............................................................................. 459
List of Figures

Figure 2.1: Example of Instruction Description ........................................................................ 11
Figure 2.2: Example of Instruction Fields ................................................................................ 12
Figure 2.3: Example of Instruction Descriptive Name and Mnemonic ........................................ 12
Figure 2.4: Example of Instruction Format .................................................................................. 12
Figure 2.5: Example of Instruction Purpose .................................................................................. 13
Figure 2.6: Example of Instruction Description .............................................................................. 13
Figure 2.7: Example of Instruction Restrictions ............................................................................. 14
Figure 2.8: Example of Instruction Operation ............................................................................... 15
Figure 2.9: Example of Instruction Exception ............................................................................... 15
Figure 2.10: Example of Instruction Programming Notes ............................................................ 16
Figure 2.11: COP_LW Pseudocode Function ................................................................................. 16
Figure 2.12: COP_LD Pseudocode Function ................................................................................. 17
Figure 2.13: COP_SW Pseudocode Function ................................................................................. 17
Figure 2.14: COP_SD Pseudocode Function ................................................................................. 17
Figure 2.15: CoprocessorOperation Pseudocode Function ............................................................ 18
Figure 2.16: MisalignedSupport Pseudocode Function ................................................................. 18
Figure 2.17: AddressTranslation Pseudocode Function ................................................................. 19
Figure 2.18: LoadMemory Pseudocode Function .......................................................................... 19
Figure 2.19: StoreMemory Pseudocode Function ........................................................................ 20
Figure 2.20: Prefetch Pseudocode Function ................................................................................... 20
Figure 2.21: SyncOperation Pseudocode Function ....................................................................... 21
Figure 2.22: ValueFPR Pseudocode Function ............................................................................... 21
Figure 2.23: StoreFPR Pseudocode Function ................................................................................ 22
Figure 2.24: CheckFPException Pseudocode Function ................................................................. 23
Figure 2.25: FPConditionCode Pseudocode Function ................................................................. 23
Figure 2.26: SetFPConditionCode Pseudocode Function ............................................................. 24
Figure 2.27: sign_extend Pseudocode Functions ........................................................................... 24
Figure 2.28: memory_address Pseudocode Function ................................................................. 25
Figure 2.29: Instruction Fetch Implicit memory_address Wrapping ............................................. 25
Figure 2.30: AddressTranslation implicit memory_address Wrapping ....................................... 25
Figure 2.31: SignalException Pseudocode Function .................................................................... 26
Figure 2.32: SignalDebugBreakpointException Pseudocode Function ...................................... 26
Figure 2.33: SignalDebugModeBreakpointException Pseudocode Function ............................ 26
Figure 2.34: NullifyCurrentInstruction Pseudocode Function ...................................................... 26
Figure 2.35: PolyMult Pseudocode Function ............................................................................... 27
Figure 2.36: ALIGN operation (32-bit) ......................................................................................... 39
Figure 3.1: Example of an ALNV.PS Operation ........................................................................ 41
Figure 3.2: Usage of Address Fields to Select Index and Way ...................................................... 113
Figure 3.3: Usage of Address Fields to Select Index and Way ...................................................... 119
Figure 3.4: Operation of the EXT Instruction .............................................................................. 173
Figure 3.5: Operation of the INS Instruction ............................................................................... 177
Figure 4.1: Unaligned Word Load Using LWL and LWR ............................................................. 232
Figure 4.2: Bytes Loaded by LWL Instruction ............................................................................... 233
Figure 4.3: Unaligned Word Load Using LWLE and LWRE ............................................................ 234
Figure 4.4: Bytes Loaded by LWLE Instruction ............................................................................ 235
Figure 4.5: Unaligned Word Load Using LWL and LWR ............................................................. 238
Figure 4.6: Bytes Loaded by LWR Instruction ............................................................................... 239
List of Tables

Table 1.1: Symbols Used in Instruction Operation Statements ............................................................................. 4
Table 1.2: Read/Write Register Field Notation ........................................................................................................ 7
Table 2.1: AccessLength Specifications for Loads/Stores ........................................................................................... 20
Table 3.1: FPU Comparisons Without Special Operand Exceptions ........................................................................... 109
Table 3.2: FPU Comparisons With Special Operand Exceptions for QNaNs ............................................................... 110
Table 3.3: Usage of Effective Address ..................................................................................................................... 112
Table 3.4: Encoding of Bits[17:16] of CACHE Instruction ........................................................................................... 113
Table 3.5: Encoding of Bits [20:18] of the CACHE Instruction ...................................................................................... 114
Table 3.6: Usage of Effective Address ..................................................................................................................... 119
Table 3.7: Encoding of Bits[17:16] of CACHEE Instruction ......................................................................................... 120
Table 3.8: Encoding of Bits [20:18] of the CACHEE Instruction .................................................................................... 121
Table 3.10: Recommended and non-recommended LL/SC family instructions to start and end atomic code sequences ........................................................................................................................................... 216
Table 4.1: Special Cases for FP MAX, MIN, MAXA, MINA ........................................................................................... 254
Table 5.2: Values of hint Field for PREF Instruction .................................................................................................. 310
Table 5.3: Values of hint Field for PREFE Instruction ................................................................................................ 314
Table 5.4: RDHWR Register Numbers ..................................................................................................................... 320
Table 5.5: Recommended and non-recommended LL/SC family instructions to start and end atomic code sequences ........................................................................................................................................... 344
Table 5.6: Encodings of the Bits[10:6] of the SYNC instruction; the SType Field ........................................................ 401
Table A.1: Symbols Used in the Instruction Encoding Tables ...................................................................................... 442
Table A.2: MIPS32 Encoding of the Opcode Field .................................................................................................... 444
Table A.3: MIPS32 SPECIAL Opcode Encoding of Function Field .............................................................................. 445
Table A.4: MIPS32 REGIMM Encoding of rt Field .................................................................................................... 445
Table A.5: MIPS32 SPECIAL2 Encoding of Function Field ........................................................................................ 446
Table A.6: MIPS32 SPECIAL3 Encoding of Function Field for Release 2 of the Architecture ............................................ 446
Table A.7: MIPS32 MOVCF6R Encoding of tf Bit ......................................................................................................... 446
Table A.8: MIPS32 SRL Encoding of Shift/Rotate ..................................................................................................... 447
Table A.9: MIPS32 SRLV Encoding of Shift/Rotate .................................................................................................. 447
Table A.10: MIPS32 BSHFL Encoding of sa Field ....................................................................................................... 447
Table A.11: MIPS32 COP0 Encoding of rs Field ........................................................................................................ 448
Table A.12: MIPS32 COP0 Encoding of Function Field When rs=CO........................................................................... 448
Table A.13: PCREL Encoding of Minor Opcode Field ................................................................................................ 448
Table A.14: MIPS32 COP1 Encoding of rs Field ................................................................................................................ 449
Table A.15: MIPS32 COP1 Encoding of Function Field When rs=S ............................................................................. 449
Table A.16: MIPS32 COP1 Encoding of Function Field When rs=D ........................................................................... 450
Table A.17: MIPS32 COP1 Encoding of Function Field When rs=W or L .................................................................. 450
Table A.18: MIPS32 COP1 Encoding of Function Field When rs=PS ........................................................................ 451
Table A.19: MIPS32 COP1 Encoding of tf Bit When rs=S, D, or PS6R, Function=MOVCF6R ........................................... 451
Table A.20: MIPS32 COP2 Encoding of rs Field ........................................................................................................ 451
Table A.21: MIPS32 COP1X6R Encoding of Function Field ........................................................................................ 452
Table A.22: Floating Point Unit Instruction Format Encodings ..................................................................................... 452
Table A.23: Release 6 MUL/DIV encodings ................................................................................................................ 455
Table A.24: Release 6 PC-relative family encoding .................................................................................................. 455
Table A.25: Release 6 PC-relative family encoding bitstrings ....................................................................................... 456
Table A.26: B*C compact branch encodings .............................................................................................................. 457
Chapter 1

About This Book

The MIPS32® Instruction Set Manual is part of a multi-volume set:

- Volume I-A describes conventions used throughout the document set and provides an introduction to the MIPS32® Architecture.
- Volume I-B describes conventions used throughout the document set and provides an introduction to the microMIPS™ Architecture.
- Volume II-A provides detailed descriptions of each instruction in the MIPS32® instruction set.
- Volume II-B provides detailed descriptions of each instruction in the microMIPS32™ instruction set.
- Volume III describes the MIPS32® and microMIPS32™ Privileged Resource Architecture, which defines and governs the behavior of the privileged resources included in a MIPS® processor implementation.
- Volume IV-a describes the MIPS16e™ Application-Specific Extension to the MIPS32® Architecture. Beginning with Release 3 of the Architecture, microMIPS is the preferred solution for smaller code size. Release 6 removes MIPS16e: MIPS16e cannot be implemented with Release 6.
- Volume IV-b describes the MDMX™ Application-Specific Extension to the MIPS64® Architecture and microMIPS64™. It applies to neither the MIPS32® document set nor the microMIPS32™ document set. As of Release 5 of the Architecture, MDMX is deprecated. MDMX and MSA cannot be implemented at the same time. Release 6 removes MDMX: MDMX cannot be implemented with Release 6.
- Volume IV-c describes the MIPS-3D® Application-Specific Extension to the MIPS® Architecture. Release 6 removes MIPS-3D: MIPS-3D cannot be implemented with Release 6.
- Volume IV-d describes the SmartMIPS® Application-Specific Extension to the MIPS32® Architecture and the microMIPS32™ Architecture. Release 6 removes SmartMIPS: SmartMIPS cannot be implemented with either MIPS32 Release 6 or MIPS64 Release 6.
- Volume IV-e describes the MIPS® DSP Module to the MIPS® Architecture.
- Volume IV-f describes the MIPS® MT Module to the MIPS® Architecture.
- Volume IV-h describes the MIPS® MCU Application-Specific Extension to the MIPS® Architecture.
- Volume IV-i describes the MIPS® Virtualization Module to the MIPS® Architecture.
- Volume IV-j describes the MIPS® SIMD Architecture Module to the MIPS® Architecture.
### 1.1 Typographical Conventions

This section describes the use of *italic*, **bold** and `courier` fonts in this book.

### 1.1.1 Italic Text

- is used for *emphasis*
- is used for *bits, fields, and registers* that are important from a software perspective (for instance, address bits used by software, and programmable fields and registers), and various *floating point instruction formats*, such as *S* and *D*
- is used for the memory access types, such as *cached* and *uncached*

### 1.1.2 Bold Text

- represents a term that is being defined
- is used for **bits and fields** that are important from a hardware perspective (for instance, **register** bits, which are not programmable but are accessible only to hardware)
- is used for ranges of numbers; the range is indicated by an ellipsis. For instance, **5..1** indicates the numbers 5 through 1
- is used to emphasize **UNPREDICTABLE** and **UNDEFINED** behavior, as defined below.

### 1.1.3 Courier Text

`Courier` fixed-width font is used for text that is displayed on the screen, and for examples of code and instruction pseudocode.

### 1.2 UNPREDICTABLE and UNDEFINED

The terms **UNPREDICTABLE** and **UNDEFINED** are used throughout this book to describe the behavior of the processor in certain cases. **UNDEFINED** behavior or operations can occur only as the result of executing instructions in a privileged mode (i.e., in Kernel Mode or Debug Mode, or with the CP0 usable bit set in the Status register). Unprivileged software can never cause **UNDEFINED** behavior or operations. Conversely, both privileged and unprivileged software can cause **UNPREDICTABLE** results or operations.

### 1.2.1 UNPREDICTABLE

**UNPREDICTABLE** results may vary from processor implementation to implementation, from instruction to instruction, or as a function of time on the same implementation or instruction. Software must never depend on results that are **UNPREDICTABLE**. **UNPREDICTABLE** operations may or may not generate a result; if a result is generated, it is **UNPREDICTABLE**. **UNPREDICTABLE** operations may cause arbitrary exceptions.

**UNPREDICTABLE** results or operations have several implementation restrictions:

- Implementations of operations generating **UNPREDICTABLE** results must not depend on any data source (memory or internal state) which is inaccessible in the current processor mode

- **UNPREDICTABLE** operations must not read, write, or modify the contents of memory or internal state which is inaccessible in the current processor mode. For example, **UNPREDICTABLE** operations executed in user mode must not access memory or internal state that is only accessible in Kernel Mode or Debug Mode or in another process.

- **UNPREDICTABLE** operations must not halt or hang the processor

### 1.2.2 UNDEFINED

**UNDEFINED** operations or behavior may vary from processor implementation to implementation, instruction to instruction, or as a function of time on the same implementation or instruction. **UNDEFINED** operations or behavior may vary from nothing to creating an environment in which execution can no longer continue. **UNDEFINED** operations or behavior may cause data loss.

**UNDEFINED** operations or behavior has one implementation restriction:

- **UNDEFINED** operations or behavior must not cause the processor to hang (that is, enter a state from which there is no exit other than powering down the processor). The assertion of any of the reset signals must restore the processor to an operational state

### 1.2.3 UNSTABLE

**UNSTABLE** results or values may vary as a function of time on the same implementation or instruction. Unlike **UNPREDICTABLE** values, software may depend on the fact that a sampling of an **UNSTABLE** value results in a legal transient value that was correct at some point in time prior to the sampling.

**UNSTABLE** values have one implementation restriction:

- Implementations of operations generating **UNSTABLE** results must not depend on any data source (memory or internal state) which is inaccessible in the current processor mode

### 1.3 Special Symbols in Pseudocode Notation

In this book, algorithmic descriptions of an operation are written in a high-level pseudocode resembling Pascal. Special symbols used in the pseudocode notation are listed in Table 1.1.

**Table 1.1 Symbols Used in Instruction Operation Statements**

<table>
<thead>
<tr>
<th>Symbol</th>
<th>Meaning</th>
</tr>
</thead>
<tbody>
<tr>
<td>←</td>
<td>Assignment</td>
</tr>
<tr>
<td>=, ≠</td>
<td>Tests for equality and inequality</td>
</tr>
<tr>
<td>||</td>
<td>Bit string concatenation</td>
</tr>
<tr>
<td>$x^y$</td>
<td>A $y$-bit string formed by $y$ copies of the single-bit value $x$</td>
</tr>
<tr>
<td>b#n</td>
<td>A constant value $n$ in base $b$. For instance, 10#100 represents the decimal value 100, 2#100 represents the binary value 100 (decimal 4), and 16#100 represents the hexadecimal value 100 (decimal 256). If the &quot;b#&quot; prefix is omitted, the default base is 10.</td>
</tr>
<tr>
<td>0b$n$</td>
<td>A constant value $n$ in base 2. For instance 0b100 represents the binary value 100 (decimal 4).</td>
</tr>
<tr>
<td>0x$n$</td>
<td>A constant value $n$ in base 16. For instance 0x100 represents the hexadecimal value 100 (decimal 256).</td>
</tr>
</tbody>
</table>
**Table 1.1 Symbols Used in Instruction Operation Statements (Continued)**

<table>
<thead>
<tr>
<th>Symbol</th>
<th>Meaning</th>
</tr>
</thead>
<tbody>
<tr>
<td>$x_{y..z}$</td>
<td>Selection of bits $y$ through $z$ of bit string $x$. Little-endian bit notation (rightmost bit is 0) is used. If $y$ is less than $z$, this expression is an empty (zero length) bit string.</td>
</tr>
<tr>
<td>$x$.bit[$y$]</td>
<td>Bit $y$ of bitstring $x$. Alternative to the traditional MIPS notation $x_y$.</td>
</tr>
<tr>
<td>$x$.bits[$y..z$]</td>
<td>Selection of bits $y$ through $z$ of bit string $x$. Alternative to the traditional MIPS notation $x_{y..z}$.</td>
</tr>
<tr>
<td>$x$.byte[$y$]</td>
<td>Byte $y$ of bitstring $x$. Equivalent to the traditional MIPS notation $x_{8*y+7..8*y}$.</td>
</tr>
<tr>
<td>$x$.bytes[$y..z$]</td>
<td>Selection of bytes $y$ through $z$ of bit string $x$. Alternative to the traditional MIPS notation $x_{8*y+7..8*z}$.</td>
</tr>
<tr>
<td>$x$.halfword[$y$], $x$.word[$i$], $x$.doubleword[$i$]</td>
<td>Similar extraction of particular bitfields (used, e.g., in MSA packed SIMD vectors).</td>
</tr>
<tr>
<td>$x$.bit31, $x$.byte0, etc.</td>
<td>Examples of abbreviated form of $x$.bit[$y$], etc. notation, when $y$ is a constant.</td>
</tr>
<tr>
<td>$x$.field[$y$]</td>
<td>Selection of a named subfield of bitstring $x$, typically a register or instruction encoding. More formally described as “Field $y$ of register $x$”. For example, FIR.D = “the D bit of the Coprocessor 1 Floating-point Implementation Register (FIR)”.</td>
</tr>
<tr>
<td>$+, -$</td>
<td>2’s complement or floating point arithmetic: addition, subtraction</td>
</tr>
<tr>
<td>$*, \times$</td>
<td>2’s complement or floating point multiplication (both used for either)</td>
</tr>
<tr>
<td>div</td>
<td>2’s complement integer division</td>
</tr>
<tr>
<td>mod</td>
<td>2’s complement modulo</td>
</tr>
<tr>
<td>$/$</td>
<td>Floating point division</td>
</tr>
<tr>
<td>$&lt;$</td>
<td>2’s complement less-than comparison</td>
</tr>
<tr>
<td>$&gt;$</td>
<td>2’s complement greater-than comparison</td>
</tr>
<tr>
<td>$\leq$</td>
<td>2’s complement less-than or equal comparison</td>
</tr>
<tr>
<td>$\geq$</td>
<td>2’s complement greater-than or equal comparison</td>
</tr>
<tr>
<td>nor</td>
<td>Bitwise logical NOR</td>
</tr>
<tr>
<td>xor</td>
<td>Bitwise logical XOR</td>
</tr>
<tr>
<td>and</td>
<td>Bitwise logical AND</td>
</tr>
<tr>
<td>or</td>
<td>Bitwise logical OR</td>
</tr>
<tr>
<td>not</td>
<td>Bitwise inversion</td>
</tr>
<tr>
<td>&amp;&amp;</td>
<td>Logical (non-Bitwise) AND</td>
</tr>
<tr>
<td>$&lt;&lt;$</td>
<td>Logical Shift left (shift in zeros at right-hand-side)</td>
</tr>
<tr>
<td>$&gt;&gt;$</td>
<td>Logical Shift right (shift in zeros at left-hand-side)</td>
</tr>
<tr>
<td>GPRLEN</td>
<td>The length in bits (32 or 64) of the CPU general-purpose registers</td>
</tr>
<tr>
<td>GPR[$x$]</td>
<td>CPU general-purpose register $x$. The content of GPR[0] is always zero. In Release 2 of the Architecture, GPR[$x$] is a shorthand notation for SGPR[$\text{SRSCtl}_{\text{CSS}}$, $x$].</td>
</tr>
<tr>
<td>SGPR[$s,x$]</td>
<td>In Release 2 of the Architecture and subsequent releases, multiple copies of the CPU general-purpose registers may be implemented. SGPR[$s,x$] refers to GPR set $s$, register $x$.</td>
</tr>
<tr>
<td>FPR[$x$]</td>
<td>Floating Point operand register $x$</td>
</tr>
<tr>
<td>FCC[CC]</td>
<td>Floating Point condition code CC. FCC[0] has the same value as COC[1]. Release 6 removes the floating point condition codes.</td>
</tr>
<tr>
<td>FPR[$x$]</td>
<td>Floating Point (Coprocessor unit 1), general register $x$</td>
</tr>
</tbody>
</table>
### Table 1.1 Symbols Used in Instruction Operation Statements (Continued)

<table>
<thead>
<tr>
<th>Symbol</th>
<th>Meaning</th>
</tr>
</thead>
<tbody>
<tr>
<td>CPR[z,x,s]</td>
<td>Coprocessor unit z, general register x, select s</td>
</tr>
<tr>
<td>CP2CPR[x]</td>
<td>Coprocessor unit 2, general register x</td>
</tr>
<tr>
<td>CCR[z,x]</td>
<td>Coprocessor unit z, control register x</td>
</tr>
<tr>
<td>CP2CCR[x]</td>
<td>Coprocessor unit 2, control register x</td>
</tr>
<tr>
<td>COC[z]</td>
<td>Coprocessor unit z condition signal</td>
</tr>
<tr>
<td>Xlat[x]</td>
<td>Translation of the MIPS16e GPR number x into the corresponding 32-bit GPR number</td>
</tr>
<tr>
<td>BigEndianMem</td>
<td>Endian mode as configured at chip reset (0 → Little-Endian, 1 → Big-Endian). Specifies the endianness of the memory interface (see LoadMemory and StoreMemory pseudocode function descriptions) and the endianness of Kernel and Supervisor mode execution.</td>
</tr>
<tr>
<td>BigEndianCPU</td>
<td>The endianness for load and store instructions (0 → Little-Endian, 1 → Big-Endian). In User mode, this endianness may be switched by setting the RE bit in the Status register. Thus, BigEndianCPU may be computed as (BigEndianMem XOR ReverseEndian).</td>
</tr>
<tr>
<td>ReverseEndian</td>
<td>Signal to reverse the endianness of load and store instructions. This feature is available in User mode only, and is implemented by setting the RE bit of the Status register. Thus, ReverseEndian may be computed as ($\text{SR}_{\text{RE}}$ and User mode).</td>
</tr>
<tr>
<td>LLbit</td>
<td>Bit of virtual state used to specify operation for instructions that provide atomic read-modify-write. LLbit is set when a linked load occurs and is tested by the conditional store. It is cleared, during other CPU operation, when a store to the location would no longer be atomic. In particular, it is cleared by exception return instructions.</td>
</tr>
<tr>
<td>I, I+n, I-n</td>
<td>This occurs as a prefix to Operation description lines and functions as a label. It indicates the instruction time during which the pseudocode appears to “execute.” Unless otherwise indicated, all effects of the current instruction appear to occur during the instruction time of the current instruction. No label is equivalent to a time label of I. Sometimes effects of an instruction appear to occur either earlier or later — that is, during the instruction time of another instruction. When this happens, the instruction operation is written in sections labeled with the instruction time, relative to the current instruction I, in which the effect of that pseudocode appears to occur. For example, an instruction may have a result that is not available until after the next instruction. Such an instruction has the portion of the instruction operation description that writes the result register in a section labeled I+1. The effect of pseudocode statements for the current instruction labeled I+1 appears to occur “at the same time” as the effect of pseudocode statements labeled I for the following instruction. Within one pseudocode sequence, the effects of the statements take place in order. However, between sequences of statements for different instructions that occur “at the same time,” there is no defined order. Programs must not depend on a particular order of evaluation between such sections.</td>
</tr>
<tr>
<td>PC</td>
<td>The Program Counter value. During the instruction time of an instruction, this is the address of the instruction word. The address of the instruction that occurs during the next instruction time is determined by assigning a value to PC during an instruction time. If no value is assigned to PC during an instruction time by any pseudocode statement, it is automatically incremented by either 2 (in the case of a 16-bit MIPS16e instruction) or 4 before the next instruction time. A taken branch assigns the target address to the PC during the instruction time of the instruction in the branch delay slot. In the MIPS Architecture, the PC value is only visible indirectly, such as when the processor stores the restart address into a GPR on a jump-and-link or branch-and-link instruction, or into a Coprocessor 0 register on an exception. Release 6 adds PC-relative address computation and load instructions. The PC value contains a full 32-bit address, all of which are significant during a memory reference.</td>
</tr>
</tbody>
</table>
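The bit-string notation in Table 1.1 maps directly onto integer bit manipulation. The Python sketch below is illustrative only: the helper names (`bits`, `concat`, `replicate`, `byte`, `parse_constant`) are hypothetical, and a bit string is modelled as a `(value, width)` pair so that concatenation (the `||` operator used in the Operation pseudocode) knows how far to shift. The last lines show the common sign-extension idiom built from replication and concatenation.

```python
from __future__ import annotations


def bits(x: int, y: int, z: int) -> tuple[int, int]:
    """x_{y..z}: bits y down to z of x; empty (zero-length) when y < z."""
    if y < z:
        return (0, 0)
    width = y - z + 1
    return ((x >> z) & ((1 << width) - 1), width)


def concat(*parts: tuple[int, int]) -> tuple[int, int]:
    """x || y: the leftmost part lands in the high-order bits."""
    value, width = 0, 0
    for v, w in parts:
        value = (value << w) | (v & ((1 << w) - 1))
        width += w
    return (value, width)


def replicate(x: int, y: int) -> tuple[int, int]:
    """x^y: a y-bit string made of y copies of the single bit x."""
    return ((1 << y) - 1 if x & 1 else 0, y)


def byte(x: int, y: int) -> int:
    """x.byte[y], i.e. x_{8*y+7..8*y}."""
    return bits(x, 8 * y + 7, 8 * y)[0]


def parse_constant(text: str) -> int:
    """b#n, 0bn and 0xn constants; a bare number is decimal."""
    if text.startswith(("0b", "0x")):
        return int(text, 0)
    if "#" in text:
        base, digits = text.split("#", 1)
        return int(digits, int(base))
    return int(text, 10)


# Sign-extend a halfword: replicate bit 15 sixteen times, then concatenate.
halfword = 0x8001
extended = concat(replicate(bits(halfword, 15, 15)[0], 16), bits(halfword, 15, 0))
print(hex(extended[0]))  # prints 0xffff8001
```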
1.4 Notation for Register Field Accessibility

In this document, the read/write properties of register fields use the notations shown in Table 1.2.

### Table 1.2 Read/Write Register Field Notation

<table>
<thead>
<tr>
<th>Read/Write Notation</th>
<th>Hardware Interpretation</th>
<th>Software Interpretation</th>
</tr>
</thead>
<tbody>
<tr>
<td>R/W</td>
<td>A field in which all bits are readable and writable by software and, potentially, by hardware. Hardware updates of this field are visible by software read. Software updates of this field are visible by hardware read. If the Reset State of this field is “Undefined”, either software or hardware must initialize the value before the first read will return a predictable value. This should not be confused with the formal definition of UNDEFINED behavior.</td>
<td></td>
</tr>
</tbody>
</table>
### Table 1.2 Read/Write Register Field Notation (Continued)

<table>
<thead>
<tr>
<th>Read/Write Notation</th>
<th>Hardware Interpretation</th>
<th>Software Interpretation</th>
</tr>
</thead>
<tbody>
<tr>
<td>R</td>
<td>A field which is either static or is updated only by hardware. If the Reset State of this field is either “0”, “Preset”, or “Externally Set”, hardware initializes this field to zero or to the appropriate state, respectively, on powerup. The term “Preset” is used to suggest that the processor establishes the appropriate state, whereas the term “Externally Set” is used to suggest that the state is established via an external source (e.g., personality pins or initialization bit stream). These terms are suggestions only, and are not intended to act as a requirement on the implementation. If the Reset State of this field is “Undefined”, hardware updates this field only under those conditions specified in the description of the field.</td>
<td>A field to which the value written by software is ignored by hardware. Software may write any value to this field without affecting hardware behavior. Software reads of this field return the last value updated by hardware. If the Reset State of this field is “Undefined”, software reads of this field result in an UNPREDICTABLE value except after a hardware update done under the conditions specified in the description of the field.</td>
</tr>
<tr>
<td>R0</td>
<td>R0 = reserved, read as zero, ignore writes by software. Hardware ignores software writes to an R0 field. Neither the occurrence of such writes, nor the values written, affects hardware behavior. Hardware always returns 0 to software reads of R0 fields. The Reset State of an R0 field must always be 0. If software performs an mtc0 instruction which writes a non-zero value to an R0 field, the write to the R0 field will be ignored, but permitted writes to other fields in the register will not be affected.</td>
<td>Architectural Compatibility: R0 fields are reserved, and may be used for not-yet-defined purposes in future revisions of the architecture. When writing an R0 field, current software should only write either all 0s, or, preferably, write back the same value that was read from the field. Current software should not assume that the value read from R0 fields is zero, because this may not be true on future hardware. Future revisions of the architecture may redefine an R0 field, but must do so in such a way that software which is unaware of the new definition and either writes zeros or writes back the value it has read from the field will continue to work correctly. Writing back the same value that was read is guaranteed to have no unexpected effects on current or future hardware behavior. (Except for non-atomicity of such read-writes.) Writing zeros to an R0 field may not be preferred because in the future this may interfere with the operation of other software which has been updated for the new field definition.</td>
</tr>
</tbody>
</table>
### Table 1.2 Read/Write Register Field Notation (Continued)

<table>
<thead>
<tr>
<th>Read/Write Notation</th>
<th>Hardware Interpretation</th>
<th>Software Interpretation</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td><strong>Release 6</strong><br>
Release 6 legacy “0” behaves like R0: read as zero, nonzero writes ignored. Legacy “0” should not be defined for any new control register fields; R0 should be used instead.<br>
HW returns 0 when read.<br>
HW ignores writes.<br>
<strong>pre-Release 6</strong><br>
pre-Release 6 legacy “0”: read as zero, nonzero writes UNDEFINED.<br>
A field which hardware does not update, and for which hardware can assume a zero value.</td>
<td><strong>Release 6</strong><br>
Only zero should be written, or the value read from the register.</td>
</tr>
<tr>
<td>R/W0</td>
<td>Like R/W, except that writes of nonzero to an R/W0 field are ignored (e.g., Status.NMI).<br>
Hardware may set or clear an R/W0 bit.<br>
Hardware ignores software writes of nonzero to an R/W0 field. Neither the occurrence of such writes, nor the values written, affects hardware behavior.<br>
Software writes of 0 to an R/W0 field may have an effect.<br>
Hardware may return 0 or nonzero to software reads of an R/W0 bit.<br>
If software performs an mtc0 instruction which writes a nonzero value to an R/W0 field, the write to the R/W0 field will be ignored, but permitted writes to other fields in the register will not be affected.</td>
<td>Software can only clear an R/W0 bit.<br>
Software writes 0 to an R/W0 field to clear the field.<br>
Software writes nonzero to an R/W0 bit in order to guarantee that the bit is not affected by the write.</td>
</tr>
</tbody>
</table>
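The software-visible effect of a write to a register containing R/W, R, R0 and R/W0 fields can be condensed into one merge of the old register value with the value software writes. This is a sketch with hypothetical field masks and function names; it models hardware ignoring writes to R and R0 bits, and treats the Release 6 legacy “0” type like R0.

```python
from dataclasses import dataclass

MASK32 = 0xFFFFFFFF


@dataclass
class FieldMasks:
    """Hypothetical per-register masks; bits in neither mask are R, R0 or "0"."""
    rw: int   # R/W bits: software writes take effect
    rw0: int  # R/W0 bits: software can only clear them


def mtc0_write(old: int, written: int, m: FieldMasks) -> int:
    """Register value after software writes `written` (for example with mtc0)."""
    # R/W bits take the written value; every other bit keeps its old value,
    # so writes to R, R0 and R/W0 bits are ignored at this step.
    new = (old & ~m.rw) | (written & m.rw)
    # R/W0 bits: writing 0 clears the bit; writing nonzero leaves it alone.
    new &= ~(m.rw0 & ~written)
    return new & MASK32


def update_field(old: int, field_mask: int, value: int, m: FieldMasks) -> int:
    """Recommended pattern: read, change only the target field, write back."""
    written = (old & ~field_mask) | (value & field_mask)
    return mtc0_write(old, written, m)
```

Writing back the value that was read leaves R/W0 bits unchanged: a bit that reads as 1 is written back as 1 (ignored), and a bit that reads as 0 is already clear.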

### 1.5 For More Information

MIPS processor manuals and additional information about MIPS products can be found at http://www.imgtec.com.

For comments or questions on the MIPS32® Architecture or this document, send Email to IMGBA-DocFeed-back@imgtec.com.
Chapter 2

Guide to the Instruction Set

This chapter provides a detailed guide to understanding the instruction descriptions, which are listed in alphabetical order in the tables at the beginning of the next chapter.

2.1 Understanding the Instruction Fields

Figure 2.1 shows an example instruction. Following the figure are descriptions of the fields listed below:

- “Instruction Fields”
- “Instruction Descriptive Name and Mnemonic”
- “Format Field”
- “Purpose Field”
- “Description Field”
- “Restrictions Field”
- “Operation Field”
- “Exceptions Field”
- “Programming Notes and Implementation Notes Fields”
Figure 2.1 Example of Instruction Description

**Example Instruction Name**

**EXAMPLE**

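<table>
<thead>
<tr>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..11</th>
<th>10..6</th>
<th>5..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>SPECIAL 000000</td>
<td>rs</td>
<td>rt</td>
<td>rd</td>
<td>0 00000</td>
<td>EXAMPLE</td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>6</td>
</tr>
</tbody>
</table>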

**Format:** EXAMPLE rd,rs,rt

**Purpose:** Example Instruction Name

To execute an EXAMPLE op.

**Description:** GPR[rd] ← GPR[rs] exampleop GPR[rt]

This section describes the operation of the instruction in text, tables, and illustrations. It includes information that would be difficult to encode in the Operation section.

**Restrictions:**

This section lists any restrictions for the instruction. This can include values of the instruction encoding fields such as register specifiers, operand values, operand formats, address alignment, instruction scheduling hazards, and type of memory access for addressed locations.

**Operation:**

```
/* This section describes the operation of an instruction in */
/* a high-level pseudo-language. It is precise in ways that */
/* the Description section is not, but is also missing */
/* information that is hard to express in pseudocode. */
temp ← GPR[rs] exampleop GPR[rt]
GPR[rd] ← sign_extend(temp31..0)
```

**Exceptions:**

A list of exceptions taken by the instruction.

**Programming Notes:**

Information useful to programmers, but not necessary to describe the operation of the instruction.

**Implementation Notes:**

Like Programming Notes, except for processor implementors.

2.1.1 Instruction Fields

Fields encoding the instruction word are shown in register form at the top of the instruction description. The following rules apply:

- The values of constant fields and the opcode names are listed in uppercase (SPECIAL and ADD in Figure 2.2). Constant values in a field are shown in binary below the symbolic or hexadecimal value.

- All variable fields are listed with the lowercase names used in the instruction description (rs, rt, and rd in Figure 2.2).

- Fields that contain zeros but are not named are unused fields that are required to be zero (bits 10:6 in Figure 2.2). If such fields are set to non-zero values, the operation of the processor is UNPREDICTABLE.

![Figure 2.2 Example of Instruction Fields](image)
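To see how this field layout becomes a 32-bit instruction word, the sketch below packs and unpacks the six fields of a SPECIAL-format encoding like the one in Figure 2.2. The name `sa` for bits 10:6 and the ADD function value `0b100000` are taken from the standard MIPS32 opcode map rather than from this section, and the helper names are hypothetical.

```python
SPECIAL = 0b000000        # opcode field, bits 31:26
ADD_FUNCTION = 0b100000   # function field for ADD, bits 5:0 (MIPS32 opcode map)

# (name, width) from bit 31 down to bit 0
FIELDS = (("opcode", 6), ("rs", 5), ("rt", 5), ("rd", 5), ("sa", 5), ("function", 6))


def encode_rtype(opcode: int, rs: int, rt: int, rd: int, sa: int, function: int) -> int:
    """Pack the fields into a 32-bit word, most significant field first."""
    word = 0
    for value, (name, width) in zip((opcode, rs, rt, rd, sa, function), FIELDS):
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def decode_rtype(word: int) -> dict:
    """Split a 32-bit word back into named fields."""
    fields = {}
    shift = 32
    for name, width in FIELDS:
        shift -= width
        fields[name] = (word >> shift) & ((1 << width) - 1)
    return fields
```

For example, `encode_rtype(SPECIAL, 1, 2, 3, 0, ADD_FUNCTION)` yields `0x00221820`, the encoding of `add $3, $1, $2`. The `sa` argument stays 0 here because an unnamed zero field set to a nonzero value makes the operation UNPREDICTABLE.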

2.1.2 Instruction Descriptive Name and Mnemonic

The instruction descriptive name and mnemonic are printed as page headings for each instruction, as shown in Figure 2.3.

![Figure 2.3 Example of Instruction Descriptive Name and Mnemonic](image)

2.1.3 Format Field

The assembler formats for the instruction and the architecture level at which the instruction was originally defined are given in the Format field. If the instruction definition was later extended, the architecture levels at which it was extended and the assembler formats for the extended definition are shown in their order of extension (for an example, see C.cond.fmt). The MIPS architecture levels are inclusive; higher architecture levels include all instructions in previous levels. Extensions to instructions are backwards compatible. The original assembler formats are valid for the extended architecture.

![Figure 2.4 Example of Instruction Format](image)
as “MIPS64, MIPS32 Release 2”. Instructions removed by particular architecture release are indicated in the Availability section.

There can be more than one assembler format for each architecture level. Floating point operations on formatted data show an assembly format with the actual assembler mnemonic for each valid value of the `fmt` field. For example, the ADD.fmt instruction lists both ADD.S and ADD.D.

The assembler format lines sometimes include parenthetical comments to help explain variations in the formats (once again, see C.cond.fmt). These comments are not a part of the assembler format.

### 2.1.4 Purpose Field

The *Purpose* field gives a short description of the use of the instruction.

**Figure 2.5 Example of Instruction Purpose**

<table>
<thead>
<tr>
<th>Purpose: Add Word</th>
</tr>
</thead>
<tbody>
<tr>
<td>To add 32-bit integers. If an overflow occurs, then trap.</td>
</tr>
</tbody>
</table>

### 2.1.5 Description Field

If a one-line symbolic description of the instruction is feasible, it appears immediately to the right of the *Description* heading. The main purpose is to show how fields in the instruction are used in the arithmetic or logical operation.

**Figure 2.6 Example of Instruction Description**

<table>
<thead>
<tr>
<th>Description:</th>
<th>GPR[rd] ← GPR[rs] + GPR[rt]</th>
</tr>
</thead>
<tbody>
<tr>
<td>The 32-bit word value in GPR rt is added to the 32-bit value in GPR rs to produce a 32-bit result.</td>
<td></td>
</tr>
<tr>
<td>• If the addition results in 32-bit 2’s complement arithmetic overflow, the destination register is not modified and an Integer Overflow exception occurs.</td>
<td></td>
</tr>
<tr>
<td>• If the addition does not overflow, the 32-bit result is placed into GPR rd.</td>
<td></td>
</tr>
</tbody>
</table>

The body of the section is a description of the operation of the instruction in text, tables, and figures. This description complements the high-level language description in the *Operation* section.

This section uses acronyms for register descriptions. “GPR rt” is the CPU general-purpose register specified by the instruction field `rt`. “FPR fs” is the floating point operand register specified by the instruction field `fs`. “CP1 register fd” is the coprocessor 1 general register specified by the instruction field `fd`. “FCSR” is the floating point *Control/Status* register.

### 2.1.6 Restrictions Field

The *Restrictions* field documents any possible restrictions that may affect the instruction. Most restrictions fall into one of the following six categories:

- Valid values for instruction fields (for example, see floating point ADD.fmt)
- Alignment requirements for memory addresses (for example, see LW)
- Valid values of operands (for example, see ALNV.PS)
- Valid operand formats (for example, see floating point ADD.fmt)
- Order of instructions necessary to guarantee correct execution. These ordering constraints avoid pipeline hazards for which some processors do not have hardware interlocks (for example, see MUL).
- Valid memory access types (for example, see LL/SC)

**Figure 2.7 Example of Instruction Restrictions**

**Restrictions:**

None

2.1.7 Availability and Compatibility Fields

The Availability and Compatibility sections are not provided for all instructions. These sections list considerations relevant to whether and how an implementation may implement some instructions, when software may use such instructions, and how software can determine if an instruction or feature is present. Such considerations include:

- Some instructions are not present on all architecture releases. Sometimes the implementation is required to signal a Reserved Instruction exception, but sometimes executing such an instruction encoding is architecturally defined to give UNPREDICTABLE results.

- Some instructions are available for implementations of a particular architecture release, but may be provided only if an optional feature is implemented. Control register bits typically allow software to determine if the feature is present.

- Some instructions may not behave the same way on all implementations. Typically this involves behavior that was UNPREDICTABLE in some implementations, but which is made architectural and guaranteed consistent so that software can rely on it in subsequent architecture releases.

- Some instructions are prohibited for certain architecture releases and/or optional feature combinations.

- Some instructions may be removed for certain architecture releases. Implementations may then be required to signal a Reserved Instruction exception for the removed instruction encoding; but sometimes the instruction encoding is reused for other instructions.

All of these considerations may apply to the same instruction. If such considerations applicable to an instruction are simple, the architecture level in which an instruction was defined or redefined in the Format field, and/or the Restrictions section, may be sufficient; but if the set of such considerations applicable to an instruction is complicated, the Availability and Compatibility sections may be provided.
2.1.8 Operation Field

The *Operation* field describes the operation of the instruction as pseudocode in a high-level language notation resembling Pascal. This formal description complements the *Description* section; it is not complete in itself because many of the restrictions are either difficult to include in the pseudocode or are omitted for legibility.

**Figure 2.8 Example of Instruction Operation**

```
Operation:
  temp ← (GPR[rs]31 || GPR[rs]31..0) + (GPR[rt]31 || GPR[rt]31..0)
  if temp32 ≠ temp31 then
    SignalException(IntegerOverflow)
  else
    GPR[rd] ← temp
  endif
```

See 2.2 “Operation Section Notation and Functions” on page 16 for more information on the formal notation used here.
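The ADD pseudocode detects overflow by extending both operands to 33 bits and comparing bits 32 and 31 of the sum. A Python sketch of the same steps follows, assuming 32-bit register values held in unsigned Python integers and using a hypothetical `IntegerOverflow` exception in place of `SignalException(IntegerOverflow)`.

```python
class IntegerOverflow(Exception):
    """Hypothetical stand-in for SignalException(IntegerOverflow)."""


def bit(x: int, n: int) -> int:
    return (x >> n) & 1


def add_word(rs_value: int, rt_value: int) -> int:
    """Return the value ADD writes to GPR[rd], or raise on overflow."""
    rs_value &= 0xFFFFFFFF
    rt_value &= 0xFFFFFFFF
    # GPR[x]31 || GPR[x]31..0: copy bit 31 into bit 32 to get a 33-bit operand.
    a = (bit(rs_value, 31) << 32) | rs_value
    b = (bit(rt_value, 31) << 32) | rt_value
    temp = (a + b) & ((1 << 33) - 1)  # keep the 33-bit sum
    if bit(temp, 32) != bit(temp, 31):
        raise IntegerOverflow()       # GPR[rd] is left unmodified
    return temp & 0xFFFFFFFF          # bits 32 and 31 agree, so 32 bits suffice
```

For example, `add_word(0xFFFFFFFF, 1)` (that is, -1 + 1) returns 0, while `add_word(0x7FFFFFFF, 1)` raises `IntegerOverflow`.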

2.1.9 Exceptions Field

The *Exceptions* field lists the exceptions that can be caused by *Operation* of the instruction. It omits exceptions that can be caused by the instruction fetch, for instance, TLB Refill, and also omits exceptions that can be caused by asynchronous external events such as an Interrupt. Although a Bus Error exception may be caused by the operation of a load or store instruction, this section does not list Bus Error for load and store instructions because the relationship between load and store instructions and external error indications, like Bus Error, is dependent upon the implementation.

**Figure 2.9 Example of Instruction Exception**

```
Exceptions:
  Integer Overflow
```

An instruction may cause implementation-dependent exceptions that are not present in the *Exceptions* section.

2.1.10 Programming Notes and Implementation Notes Fields

The *Notes* sections contain material that is useful for programmers and implementors, respectively, but that is not necessary to describe the instruction and does not belong in the description sections.
2.2 Operation Section Notation and Functions

In an instruction description, the Operation section uses a high-level language notation to describe the operation performed by each instruction. Special symbols used in the pseudocode are described in the previous chapter. Specific pseudocode functions are described below.

This section presents information about the following topics:

• “Instruction Execution Ordering”
• “Pseudocode Functions”

2.2.1 Instruction Execution Ordering

Each of the high-level language statements in the Operation section is executed sequentially (except as constrained by conditional and loop constructs).

2.2.2 Pseudocode Functions

There are several functions used in the pseudocode descriptions. These are used either to make the pseudocode more readable, to abstract implementation-specific behavior, or both. These functions are defined in this section, and include the following:

• “Coprocessor General Register Access Functions”
• “Memory Operation Functions”
• “Floating Point Functions”
• “Miscellaneous Functions”

2.2.2.1 Coprocessor General Register Access Functions

Defined coprocessors, except for CP0, have instructions to exchange words and doublewords between coprocessor general registers and the rest of the system. What a coprocessor does with a word or doubleword supplied to it and how a coprocessor supplies a word or doubleword is defined by the coprocessor itself. This behavior is abstracted into the functions described in this section.

2.2.2.1.1 COP_LW

The COP_LW function defines the action taken by coprocessor z when supplied with a word from memory during a load word operation. The action is coprocessor-specific. The typical action would be to store the contents of memword in coprocessor general register rt.

Figure 2.11 COP_LW Pseudocode Function

```
COP_LW (z, rt, memword)
  z: The coprocessor unit number
  rt: Coprocessor general register specifier
  memword: A 32-bit word value supplied to the coprocessor

  /* Coprocessor-dependent action */
endfunction COP_LW
```

2.2.2.1.2 COP_LD

The COP_LD function defines the action taken by coprocessor z when supplied with a doubleword from memory during a load doubleword operation. The action is coprocessor-specific. The typical action would be to store the contents of memdouble in coprocessor general register rt.

Figure 2.12 COP_LD Pseudocode Function

```
COP_LD (z, rt, memdouble)
  z: The coprocessor unit number
  rt: Coprocessor general register specifier
  memdouble: 64-bit doubleword value supplied to the coprocessor

  /* Coprocessor-dependent action */
endfunction COP_LD
```

2.2.2.1.3 COP_SW

The COP_SW function defines the action taken by coprocessor z to supply a word of data during a store word operation. The action is coprocessor-specific. The typical action would be to supply the contents of the low-order word in coprocessor general register rt.

Figure 2.13 COP_SW Pseudocode Function

```
dataword ← COP_SW (z, rt)
  z: The coprocessor unit number
  rt: Coprocessor general register specifier
  dataword: 32-bit word value

  /* Coprocessor-dependent action */
endfunction COP_SW
```

2.2.2.1.4 COP_SD

The COP_SD function defines the action taken by coprocessor z to supply a doubleword of data during a store doubleword operation. The action is coprocessor-specific. The typical action would be to supply the contents of the low-order doubleword in coprocessor general register rt.

Figure 2.14 COP_SD Pseudocode Function

```
datadouble ← COP_SD (z, rt)
  z: The coprocessor unit number
  rt: Coprocessor general register specifier
  datadouble: 64-bit doubleword value

  /* Coprocessor-dependent action */
endfunction COP_SD
```
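A toy model can make the division of labour between the CPU pseudocode and the coprocessor concrete. The sketch below assumes the “typical action” described for each function and a hypothetical coprocessor with 32 registers of 64 bits; real coprocessors define their own behavior, and zero-extending a loaded word is this sketch's own choice, not an architectural rule. All class and variable names are hypothetical.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


class ToyCoprocessor:
    """Hypothetical coprocessor with 32 general registers of 64 bits."""

    def __init__(self) -> None:
        self.regs = [0] * 32

    def cop_lw(self, rt: int, memword: int) -> None:
        # Typical COP_LW action: store the word in register rt (zero-extended here).
        self.regs[rt] = memword & MASK32

    def cop_ld(self, rt: int, memdouble: int) -> None:
        # Typical COP_LD action: store the doubleword in register rt.
        self.regs[rt] = memdouble & MASK64

    def cop_sw(self, rt: int) -> int:
        # Typical COP_SW action: supply the low-order word of register rt.
        return self.regs[rt] & MASK32

    def cop_sd(self, rt: int) -> int:
        # Typical COP_SD action: supply the doubleword in register rt.
        return self.regs[rt] & MASK64


# The unit number z selects the coprocessor; only a CP2 is modelled here.
units = {2: ToyCoprocessor()}


def COP_LW(z: int, rt: int, memword: int) -> None:
    units[z].cop_lw(rt, memword)


def COP_SW(z: int, rt: int) -> int:
    return units[z].cop_sw(rt)
```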

2.2.2.1.5 CoprocessorOperation

The CoprocessorOperation function performs the specified Coprocessor operation.

**Figure 2.15** CoprocessorOperation Pseudocode Function

```
CoprocessorOperation (z, cop_fun)

/* z:       Coprocessor unit number */
/* cop_fun: Coprocessor function from function field of instruction */
/* Transmit the cop_fun value to coprocessor z */

endfunction CoprocessorOperation
```

2.2.2.2 Memory Operation Functions

Regardless of byte ordering (big- or little-endian), the address of a halfword, word, or doubleword is the smallest byte address of the bytes that form the object. For big-endian ordering this is the most-significant byte; for little-endian ordering this is the least-significant byte.

In the Operation pseudocode for load and store operations, the following functions summarize the handling of virtual addresses and the access of physical memory. The size of the data item to be loaded or stored is passed in the AccessLength field; the valid constant names and values are shown in Table 2.1. The bytes used within the addressed unit of memory (a word for 32-bit processors or a doubleword for 64-bit processors) can be determined directly from the AccessLength and the two or three low-order bits of the address.
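
As a rough illustration of that last point, the following Python sketch computes which byte offsets within the memory element an access touches. It assumes, as the DOUBLEWORD entry of Table 2.1 suggests, that an AccessLength value is the access size in bytes minus one; the function name `bytes_used` is hypothetical. The offsets are byte addresses relative to the start of the element, so they do not depend on byte ordering.

```python
def bytes_used(p_addr: int, access_length: int, elem_bytes: int = 8) -> list[int]:
    """Byte offsets, within the naturally aligned memory element, touched by
    an access of (access_length + 1) bytes starting at p_addr."""
    if elem_bytes not in (4, 8):
        raise ValueError("memory element is a word (4 bytes) or doubleword (8 bytes)")
    # Two (word) or three (doubleword) low-order address bits give the first byte.
    first = p_addr & (elem_bytes - 1)
    last = first + access_length
    if last >= elem_bytes:
        # The access would spill into the next element: a misaligned access.
        raise ValueError("access does not fit in one memory element")
    return list(range(first, last + 1))
```

For example, a word access (AccessLength 3) at address 0x1004 in a doubleword element touches offsets 4 through 7; a request that runs past the end of the element is a misaligned access and is rejected here rather than modeled.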

2.2.2.2.1 Misaligned Support

MIPS processors originally required all memory accesses to be naturally aligned. MSA (the MIPS SIMD Architecture) has supported misaligned memory accesses for its 128-bit packed SIMD vector loads and stores since its introduction in MIPS Release 5. Release 6 requires systems to provide support for misaligned memory accesses for all ordinary memory reference instructions: the system must provide a mechanism to complete a misaligned memory reference for these instructions, which may range from full execution in hardware to trap-and-emulate.

The pseudocode function MisalignedSupport encapsulates the version number check to determine if misalignment is supported for an ordinary memory access.

**Figure 2.16** MisalignedSupport Pseudocode Function

```
predicate ← MisalignedSupport ()
    return Config.AR ≥ 2    // Architecture Revision 2 corresponds to MIPS Release 6.
end function
```
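
A minimal Python sketch of the same check, starting from a raw CP0 Config register value. The bit position of the AR field (bits 12..10) is an assumption of the sketch; consult the Config register description in Volume III for the authoritative layout. The function names are hypothetical.

```python
def config_ar(config: int) -> int:
    """Extract the AR (Architecture Revision) field from a CP0 Config value.

    Assumption: AR occupies bits 12..10 of Config.
    """
    return (config >> 10) & 0x7


def misaligned_support(config: int) -> bool:
    # Mirrors Figure 2.16: AR >= 2 corresponds to MIPS Release 6.
    return config_ar(config) >= 2
```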

See Appendix B, “Misaligned Memory Accesses” on page 511 for a more detailed discussion of misalignment, including pseudocode functions for the actual misaligned memory access.

2.2.2.2.2 AddressTranslation

The AddressTranslation function translates a virtual address to a physical address and its cacheability and coherency attribute, describing the mechanism used to resolve the memory reference.

Given the virtual address vAddr, whether the reference is to instructions or data (IorD), and whether it is a load or a store (LorS), find the corresponding physical address (pAddr) and the cacheability and coherency attribute (CCA) used to resolve the reference. If the virtual address is in one of the unmapped address spaces, the physical address and CCA are determined directly by the virtual address. If the virtual address is in one of the mapped address spaces, the TLB or fixed mapping MMU determines the physical address and access type; if the required translation is not present in the TLB or the desired access is not permitted, the function fails and an exception is taken.

**Figure 2.17 AddressTranslation Pseudocode Function**

```
(pAddr, CCA) ← AddressTranslation (vAddr, IorD, LorS)

/* pAddr: physical address */
/* CCA:   Cacheability&Coherency Attribute, the method used to access caches */
/*        and memory and resolve the reference */

/* vAddr: virtual address */
/* IorD:  Indicates whether access is for INSTRUCTION or DATA */
/* LorS:  Indicates whether access is for LOAD or STORE */

/* See the address translation description for the appropriate MMU */
/* type in Volume III of this book for the exact translation mechanism */

endfunction AddressTranslation
```
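
For the unmapped case, the translation is a fixed function of the virtual address. The Python sketch below assumes the classic MIPS32 kernel segment layout with no Segmentation Control: kseg0 (0x80000000..0x9FFFFFFF) and kseg1 (0xA0000000..0xBFFFFFFF) both map to physical 0x00000000..0x1FFFFFFF, kseg0 using the CCA from Config.K0 and kseg1 always uncached. The CCA encoding 2 for uncached, the function name, and the exception class are assumptions of the sketch; mapped addresses, which need the TLB or fixed mapping MMU, are not modeled.

```python
UNCACHED = 2  # assumed CCA encoding for uncached accesses


class MappedAddressError(Exception):
    """Stand-in for the TLB / fixed mapping MMU path, which is not modeled."""


def translate_unmapped(v_addr: int, k0: int) -> tuple[int, int]:
    """Return (pAddr, CCA) for a MIPS32 unmapped kernel address.

    k0 is the kseg0 CCA, assumed to come from the Config.K0 field.
    """
    v_addr &= 0xFFFF_FFFF
    if 0x8000_0000 <= v_addr <= 0x9FFF_FFFF:
        # kseg0: unmapped, cacheability taken from Config.K0
        return v_addr - 0x8000_0000, k0
    if 0xA000_0000 <= v_addr <= 0xBFFF_FFFF:
        # kseg1: unmapped, always uncached
        return v_addr - 0xA000_0000, UNCACHED
    # useg, ksseg and kseg3 are mapped and would need a translation lookup.
    raise MappedAddressError(hex(v_addr))
```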

### 2.2.2.2.3 LoadMemory

The LoadMemory function loads a value from memory.

This action uses cache and main memory as specified in both the Cacheability and Coherency Attribute (CCA) and the access type (IorD) to find the contents of AccessLength memory bytes, starting at physical location pAddr. The data is returned in a fixed-width, naturally aligned memory element (MemElem). The low-order 2 (or 3) bits of the address and the AccessLength indicate which of the bytes within MemElem need to be passed to the processor. If the memory access type of the reference is uncached, only the referenced bytes are read from memory and marked as valid within the memory element. If the access type is cached but the data is not present in cache, a block of memory of implementation-specific size and alignment is read and loaded into the cache to satisfy the load reference. At a minimum, this block is the entire memory element.

**Figure 2.18 LoadMemory Pseudocode Function**

```
MemElem ← LoadMemory (CCA, AccessLength, pAddr, vAddr, IorD)

/* MemElem:      Data is returned in a fixed width with a natural alignment. The */
/*               width is the same size as the CPU general-purpose register, */
/*               32 or 64 bits, aligned on a 32- or 64-bit boundary, */
/*               respectively. */
/* CCA:          Cacheability&Coherency Attribute, the method used to access caches */
/*               and memory and resolve the reference */

/* AccessLength: Length, in bytes, of access */
/* pAddr:        physical address */
/* vAddr:        virtual address */
/* IorD:         Indicates whether access is for Instructions or Data */

endfunction LoadMemory
```

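
The byte selection that LoadMemory leaves to the caller can be sketched in Python as follows. The sketch assumes a doubleword memory element by default and that AccessLength is the access size in bytes minus one (as in the DOUBLEWORD = 7 entry of Table 2.1). In little-endian order the lowest address maps to the least-significant byte lane; in big-endian order it maps to the most-significant lane, which gives the lane formula used below. The function name `extract_bytes` is hypothetical.

```python
def extract_bytes(mem_elem: int, p_addr: int, access_length: int,
                  big_endian: bool, elem_bytes: int = 8) -> int:
    """Pick the referenced bytes out of a MemElem returned by LoadMemory."""
    offset = p_addr & (elem_bytes - 1)   # low-order 2 or 3 address bits
    nbytes = access_length + 1           # AccessLength is bytes - 1
    if offset + nbytes > elem_bytes:
        raise ValueError("access crosses the memory element")
    if big_endian:
        # Lowest address holds the most-significant byte of the object.
        lane = elem_bytes - offset - nbytes
    else:
        # Lowest address holds the least-significant byte of the object.
        lane = offset
    return (mem_elem >> (8 * lane)) & ((1 << (8 * nbytes)) - 1)
```

The instruction's own pseudocode would then sign- or zero-extend the extracted value into the destination register.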
2.2.2.2.4 StoreMemory

The StoreMemory function stores a value to memory.

The specified data is stored into the physical location pAddr using the memory hierarchy (data caches and main memory) as specified by the Cacheability and Coherency Attribute (CCA). MemElem contains the data for an aligned, fixed-width memory element (a word for 32-bit processors, a doubleword for 64-bit processors), though only the bytes that are actually stored to memory need be valid. The low-order two (or three) bits of pAddr and the AccessLength field indicate which of the bytes within MemElem should be stored; only these bytes in memory are actually changed.

**Figure 2.19 StoreMemory Pseudocode Function**

```plaintext
StoreMemory (CCA, AccessLength, MemElem, pAddr, vAddr)

/* CCA: Cacheability&Coherency Attribute, the method used to access */
/* caches and memory and resolve the reference. */
/* AccessLength: Length, in bytes, of access */
/* MemElem: Data in the width and alignment of a memory element. */
/*          The width is the same size as the CPU general */
/*          purpose register, either 4 or 8 bytes, */
/*          aligned on a 4- or 8-byte boundary. For a */
/*          partial-memory-element store, only the bytes that will be */
/*          stored must be valid. */
/* pAddr: physical address */
/* vAddr: virtual address */
endfunction StoreMemory
```
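
A companion sketch for StoreMemory, under the same assumptions (hypothetical function name, AccessLength equal to bytes minus one, doubleword element by default): it builds a byte-enable mask from the low-order address bits and AccessLength, then changes only the masked bytes of the existing memory element.

```python
def merge_store(old_elem: int, new_elem: int, p_addr: int, access_length: int,
                big_endian: bool, elem_bytes: int = 8) -> int:
    """Model StoreMemory's effect on one memory element: only the bytes
    selected by the address and AccessLength change."""
    offset = p_addr & (elem_bytes - 1)
    nbytes = access_length + 1
    if offset + nbytes > elem_bytes:
        raise ValueError("access crosses the memory element")
    lane = elem_bytes - offset - nbytes if big_endian else offset
    # Byte-enable mask covering the byte lanes being written.
    mask = ((1 << (8 * nbytes)) - 1) << (8 * lane)
    # Bytes of new_elem outside the mask need not be valid and are ignored.
    return (old_elem & ~mask) | (new_elem & mask)
```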

2.2.2.2.5 Prefetch

The Prefetch function prefetches data from memory.

Prefetch is an advisory instruction for which an implementation-specific action is taken. The action taken may increase performance but must not change the meaning of the program or alter architecturally visible state.

**Figure 2.20 Prefetch Pseudocode Function**

```plaintext
Prefetch (CCA, pAddr, vAddr, DATA, hint)

/* CCA: Cacheability&Coherency Attribute, the method used to access */
/* caches and memory and resolve the reference. */
/* pAddr: physical address */
/* vAddr: virtual address */
/* DATA: Indicates that access is for DATA */
/* hint: hint that indicates the possible use of the data */
endfunction Prefetch
```

Table 2.1 lists the data access lengths and their labels for loads and stores.

**Table 2.1 AccessLength Specifications for Loads/Stores**

<table>
<thead>
<tr>
<th>AccessLength Name</th>
<th>Value</th>
<th>Meaning</th>
</tr>
</thead>
<tbody>
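<tr>
<td>DOUBLEWORD</td>
<td>7</td>
<td>8 bytes (64 bits)</td>
</tr>
<tr>
<td>SEPTIBYTE</td>
<td>6</td>
<td>7 bytes (56 bits)</td>
</tr>
<tr>
<td>SEXTIBYTE</td>
<td>5</td>
<td>6 bytes (48 bits)</td>
</tr>
<tr>
<td>QUINTIBYTE</td>
<td>4</td>
<td>5 bytes (40 bits)</td>
</tr>
<tr>
<td>WORD</td>
<td>3</td>
<td>4 bytes (32 bits)</td>
</tr>
<tr>
<td>TRIPLEBYTE</td>
<td>2</td>
<td>3 bytes (24 bits)</td>
</tr>
<tr>
<td>HALFWORD</td>
<td>1</td>
<td>2 bytes (16 bits)</td>
</tr>
<tr>
<td>BYTE</td>
<td>0</td>
<td>1 byte (8 bits)</td>
</tr>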
</table>
2.2.2.2.6 SyncOperation

The SyncOperation function orders loads and stores to synchronize shared memory.

This action makes the effects of the synchronizable loads and stores indicated by stype occur in the same order for all processors.

**Figure 2.21 SyncOperation Pseudocode Function**

```
SyncOperation(stype)

/* stype: Type of load/store ordering to perform. */
/* Perform implementation-dependent operation to complete the */
/* required synchronization operation */
endfunction SyncOperation
```

2.2.2.3 Floating Point Functions

The pseudocode shown below specifies how the unformatted contents loaded or moved to CP1 registers are interpreted to form a formatted value. If an FPR contains a value in some format, rather than unformatted contents from a load (uninterpreted), it is valid to interpret the value in that format (but not to interpret it in a different format).

2.2.2.3.1 ValueFPR

The ValueFPR function returns a formatted value from the floating point registers.

**Figure 2.22 ValueFPR Pseudocode Function**

```
value ← ValueFPR(fpr, fmt)

/* value: The formatted value from the FPR */
/* fpr: The FPR number */
/* fmt: The format of the data, one of: */
/* S, D, W, L, PS, */
/* OB, QH, */
/* UNINTERPRETED_WORD, */
/* UNINTERPRETED_DOUBLEWORD */
/* The UNINTERPRETED values are used to indicate that the datatype */
/* is not known as, for example, in SWC1 and SDC1 */

case fmt of
    S, W, UNINTERPRETED_WORD:
        valueFPR ← FPR[fpr]
    D, UNINTERPRETED_DOUBLEWORD:
        if (FP32RegistersMode = 0) then
            if (fpr0 ≠ 0) then
                valueFPR ← UNPREDICTABLE
            else
                valueFPR ← FPR[fpr+1]31..0 || FPR[fpr]31..0
            endif
        else
            valueFPR ← FPR[fpr]
        endif
    L:
        if (FP32RegistersMode = 0) then
            valueFPR ← UNPREDICTABLE
        else
            valueFPR ← FPR[fpr]
        endif
    DEFAULT:
        valueFPR ← UNPREDICTABLE
endcase
endfunction ValueFPR
```
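
The D and UNINTERPRETED_DOUBLEWORD case is the interesting one. This Python sketch models it with a list of 64-bit register images; the parameter `fp32_registers_mode` mirrors the pseudocode's FP32RegistersMode, and the function name is hypothetical. Raising an exception is only this model's way of flagging the UNPREDICTABLE case; hardware is not required to do anything in particular.

```python
MASK32 = 0xFFFF_FFFF


def value_fpr_double(fpr_file: list[int], fpr: int, fp32_registers_mode: int) -> int:
    """Sketch of ValueFPR for fmt D / UNINTERPRETED_DOUBLEWORD."""
    if fp32_registers_mode == 1:
        # Each FPR holds a full 64-bit value.
        return fpr_file[fpr]
    if fpr & 1:
        # Odd register number: UNPREDICTABLE in the architecture.
        raise ValueError("UNPREDICTABLE: odd FPR for a doubleword")
    # Even/odd pair: the odd register supplies the high word.
    return ((fpr_file[fpr + 1] & MASK32) << 32) | (fpr_file[fpr] & MASK32)
```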

The pseudocode shown below specifies how a binary encoding representing a formatted value is stored into CP1 registers by a computational or move operation. This binary representation is visible to store or move-from instructions. Once an FPR receives a value from StoreFPR(), it is not valid to interpret the value with ValueFPR() in a different format.

### 2.2.2.3.2 StoreFPR

**Figure 2.23 StoreFPR Pseudocode Function**

```
StoreFPR (fpr, fmt, value)

/* fpr:   The FPR number */
/* fmt:   The format of the data, one of: */
/*        S, D, W, L, PS, */
/*        OB, QH, */
/*        UNINTERPRETED_WORD, */
/*        UNINTERPRETED_DOUBLEWORD */
/* value: The formatted value to be stored into the FPR */

/* The UNINTERPRETED values are used to indicate that the datatype */
/* is not known as, for example, in LWC1 and LDC1 */

case fmt of
    S, W, UNINTERPRETED_WORD:
        FPR[fpr] ← value
    D, UNINTERPRETED_DOUBLEWORD:
        if (FP32RegistersMode = 0) then
            if (fpr0 ≠ 0) then
                UNPREDICTABLE
            else
                FPR[fpr] ← UNPREDICTABLE32 || value31..0
                FPR[fpr+1] ← UNPREDICTABLE32 || value63..32
            endif
        else
            FPR[fpr] ← value
        endif
    L:
        if (FP32RegistersMode = 0) then
            UNPREDICTABLE
        else
            FPR[fpr] ← value
        endif
endcase
endfunction StoreFPR
```
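
The matching write path for doubleword formats, sketched under the same assumptions as a ValueFPR model would use: a hypothetical function name, a list of 64-bit register images, and `fp32_registers_mode` standing in for FP32RegistersMode. The UNPREDICTABLE upper halves are modeled with random bits, so only the low 32 bits of each register are meaningful in the paired case.

```python
import random

MASK32 = 0xFFFF_FFFF


def store_fpr_double(fpr_file: list[int], fpr: int, value: int,
                     fp32_registers_mode: int) -> None:
    """Sketch of StoreFPR for fmt D / UNINTERPRETED_DOUBLEWORD."""
    value &= (1 << 64) - 1
    if fp32_registers_mode == 1:
        fpr_file[fpr] = value
        return
    if fpr & 1:
        raise ValueError("UNPREDICTABLE: odd FPR for a doubleword")
    # Random upper halves stand in for UNPREDICTABLE32.
    fpr_file[fpr] = (random.getrandbits(32) << 32) | (value & MASK32)
    fpr_file[fpr + 1] = (random.getrandbits(32) << 32) | (value >> 32)
```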

2.2.2.3.3 CheckFPException

The pseudocode shown below checks for an enabled floating point exception and conditionally signals the exception.

Figure 2.24 CheckFPException Pseudocode Function

```
CheckFPException()

/* A floating point exception is signaled if the E bit of the Cause field is a 1 */
/* (Unimplemented Operations have no enable) or if any bit in the Cause field */
/* and the corresponding bit in the Enable field are both 1 */

if ( (FCSR17 = 1) or
     ((FCSR16..12 and FCSR11..7) ≠ 0) ) then
    SignalException(FloatingPointException)
endif
endfunction CheckFPException
```
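
The same test written as a Python sketch over a raw 32-bit FCSR value. The bit positions come from the pseudocode above (Cause E at bit 17, the other Cause bits at 16..12, Enables at 11..7); the exception class is a hypothetical stand-in for SignalException(FloatingPointException).

```python
class FloatingPointException(Exception):
    pass


def check_fp_exception(fcsr: int) -> None:
    """Raise if Cause.E is set, or if any Cause bit has its Enable bit set."""
    cause_e = (fcsr >> 17) & 1     # Unimplemented Operation: no enable bit
    cause = (fcsr >> 12) & 0x1F    # Cause V, Z, O, U, I
    enables = (fcsr >> 7) & 0x1F   # Enable V, Z, O, U, I
    if cause_e or (cause & enables):
        raise FloatingPointException(hex(fcsr))
```

A Cause bit with no matching Enable bit (for example, an inexact result with Inexact traps disabled) does not signal.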

2.2.2.3.4 FPConditionCode

The FPConditionCode function returns the value of a specific floating point condition code.

Figure 2.25 FPConditionCode Pseudocode Function

```
tf ← FPConditionCode(cc)

/* tf: The value of the specified condition code */
/* cc: The Condition code number in the range 0..7 */

if cc = 0 then
    FPConditionCode ← FCSR23
else
    FPConditionCode ← FCSR24+cc
endif
endfunction FPConditionCode
```


2.2.2.3.5 SetFPConditionCode

The SetFPConditionCode function writes a new value to a specific floating point condition code.

**Figure 2.26 SetFPConditionCode Pseudocode Function**

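```
SetFPConditionCode(cc, tf)

/* cc: The Condition code number in the range 0..7 */
/* tf: The new value of the specified condition code */

if cc = 0 then
    FCSR ← FCSR31..24 || tf || FCSR22..0
else
    FCSR ← FCSR31..25+cc || tf || FCSR23+cc..0
endif
endfunction SetFPConditionCode
```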
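
Both functions reduce to a bit-position lookup: condition code 0 lives in FCSR bit 23, and condition codes 1 through 7 in bits 25 through 31 (24 + cc), so bit 24 is skipped. A minimal Python sketch with hypothetical function names:

```python
def cc_bit(cc: int) -> int:
    """FCSR bit position of floating point condition code cc (0..7)."""
    if not 0 <= cc <= 7:
        raise ValueError("cc must be in 0..7")
    return 23 if cc == 0 else 24 + cc


def fp_condition_code(fcsr: int, cc: int) -> int:
    return (fcsr >> cc_bit(cc)) & 1


def set_fp_condition_code(fcsr: int, cc: int, tf: int) -> int:
    bit = cc_bit(cc)
    # Clear the target bit, then insert the new value; other bits are unchanged.
    return (fcsr & ~(1 << bit)) | ((tf & 1) << bit)
```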

2.2.2.4 Pseudocode Functions Related to Sign and Zero Extension

2.2.2.4.1 Sign extension and zero extension in pseudocode

Much pseudocode uses a generic function `sign_extend` without specifying from what bit position the extension is done, when the intention is obvious. E.g. `sign_extend(immediate16)` or `sign_extend(disp9)`.

However, sometimes it is necessary to specify the bit position. For example, `sign_extend(temp31..0)` or the more complicated `(offset15)GPRLEN-(16+2) || offset || 0^2`.

The explicit notation `sign_extend.nbits(val)` or `sign_extend(val,nbits)` is suggested as a simplification. Either form means: sign extend val as if it were an nbits-wide signed integer. The width to be sign extended to is usually apparent from context, and is usually GPRLEN, 32 or 64 bits. The previous examples then become:

```
sign_extend(temp31..0)
= sign_extend.32(temp)
```

and

```
(offset15)GPRLEN-(16+2) || offset || 0^2
= sign_extend.16(offset) << 2
```

Note that `sign_extend.n(value)` extends from bit position n-1, if the bits are numbered 0..n-1 as is typical.

**Figure 2.27 sign_extend Pseudocode Functions**

```
sign_extend.nbits(val) = sign_extend(val,nbits) /* syntactic equivalents */
function sign_extend(val,nbits)
    return (val{nbits-1})GPRLEN-nbits || val{nbits-1..0}
end function
```
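
The definition in Figure 2.27 translates directly to Python, with GPRLEN assumed to be 32 for this MIPS32 volume and values treated as unsigned bit patterns. `zero_extend` is included for comparison; both names follow the pseudocode, but the default width parameter is an assumption of the sketch.

```python
GPRLEN = 32  # assumed general-purpose register width (MIPS32)


def sign_extend(val: int, nbits: int, width: int = GPRLEN) -> int:
    """Treat the low nbits of val as a signed integer; extend to width bits."""
    val &= (1 << nbits) - 1
    if val >> (nbits - 1):  # sign bit (bit nbits-1) is set
        val |= ((1 << (width - nbits)) - 1) << nbits
    return val


def zero_extend(val: int, nbits: int, width: int = GPRLEN) -> int:
    """Keep the low nbits of val; the upper width-nbits bits are zero."""
    return val & ((1 << nbits) - 1)
```

As a check, `(sign_extend(offset, 16) << 2) & 0xFFFFFFFF` produces the same bit pattern as the explicit concatenation (offset15)GPRLEN-(16+2) || offset || 0^2 for every 16-bit offset.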

Similarly for zero_extension, although zero extension is less common than sign extension in the MIPS ISA.

Floating point may use notations such as zero_extension.fmt corresponding to the format of the FPU instruction. E.g. zero_extension.S and zero_extension.D are equivalent to zero_extension.32 and zero_extension.64.

Existing pseudocode may use any of these, or other, notations.

2.2.2.4.2 memory_address

The pseudocode function memory_address performs mode-dependent address space wrapping for compatibility between MIPS32 and MIPS64. It is applied to all memory references. It may be specified explicitly in some places, particularly for new memory reference instructions, but it is also declared to apply implicitly to all memory references as defined below. In addition, certain instructions that are used to calculate effective memory addresses but which are not themselves memory accesses specify memory_address explicitly in their pseudocode.

Figure 2.28 memory_address Pseudocode Function

```
function memory_address(ea)
    return ea
end function
```

On a 32-bit CPU, memory_address returns its 32-bit effective address argument unaffected.
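
Because MIPS32 address arithmetic is 32 bits wide, any wraparound has already happened by the time memory_address sees the effective address. Python integers do not overflow, so a model has to make that wrap explicit in the address calculation. In this sketch, `effective_address` is a hypothetical helper computing a base plus a sign-extended 16-bit offset modulo 2^32; `memory_address` is the identity from Figure 2.28.

```python
MASK32 = 0xFFFF_FFFF


def effective_address(base: int, offset16: int) -> int:
    """32-bit base + sign-extended 16-bit offset, wrapping modulo 2**32."""
    offset = offset16 & 0xFFFF
    if offset & 0x8000:
        offset -= 0x10000  # interpret as a signed 16-bit value
    return (base + offset) & MASK32


def memory_address(ea: int) -> int:
    # Figure 2.28: on a 32-bit CPU the effective address passes through unchanged.
    return ea
```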

In addition to the use of memory_address for all memory references (including load and store instructions, LL/SC), Release 6 extends this behavior to control transfers (branch and call instructions), and to the PC-relative address calculation instructions (ADDIUPC, AUIPC, ALUIPC). In newer instructions the function is explicit in the pseudocode.

Implicit address space wrapping for all instruction fetches is described by the following pseudocode fragment which should be considered part of instruction fetch:

Figure 2.29 Instruction Fetch Implicit memory_address Wrapping

```
PC ← memory_address(PC)
( instruction_data, length ) ← instruction_fetch(PC)
/* decode and execute instruction */
```

Implicit address space wrapping for all data memory accesses is described by the following pseudocode, which is inserted at the top of the AddressTranslation pseudocode function:

Figure 2.30 AddressTranslation implicit memory_address Wrapping

```
(pAddr, CCA) ← AddressTranslation(vAddr, IorD, LorS)
    vAddr ← memory_address(vAddr)
```

In addition to its use in instruction pseudocode,

2.2.2.5 Miscellaneous Functions

This section lists miscellaneous functions not covered in previous sections.

2.2.2.5.1 SignalException

The SignalException function signals an exception condition.

This action results in an exception that aborts the instruction. The instruction operation pseudocode never sees a return from this function call.

**Figure 2.31 SignalException Pseudocode Function**

```plaintext
SignalException(Exception, argument)
/* Exception: The exception condition that exists. */
/* argument: An exception-dependent argument, if any */
endfunction SignalException
```

2.2.2.5.2 SignalDebugBreakpointException

The SignalDebugBreakpointException function signals a condition that causes entry into Debug Mode from non-Debug Mode.

This action results in an exception that aborts the instruction. The instruction operation pseudocode never sees a return from this function call.

**Figure 2.32 SignalDebugBreakpointException Pseudocode Function**

```plaintext
SignalDebugBreakpointException()
endfunction SignalDebugBreakpointException
```

2.2.2.5.3 SignalDebugModeBreakpointException

The SignalDebugModeBreakpointException function signals a condition that causes entry into Debug Mode from Debug Mode (i.e., an exception generated while already running in Debug Mode).

This action results in an exception that aborts the instruction. The instruction operation pseudocode never sees a return from this function call.

**Figure 2.33 SignalDebugModeBreakpointException Pseudocode Function**

```plaintext
SignalDebugModeBreakpointException()
endfunction SignalDebugModeBreakpointException
```

2.2.2.5.4 NullifyCurrentInstruction

The NullifyCurrentInstruction function nullifies the current instruction.

The instruction is aborted, inhibiting not only the functional effect of the instruction, but also all exceptions detected during fetch, decode, or execution of the instruction in question. For branch-likely instructions, nullification kills the instruction in the delay slot of the branch-likely instruction.

**Figure 2.34 NullifyCurrentInstruction PseudoCode Function**

```plaintext
NullifyCurrentInstruction()
endfunction NullifyCurrentInstruction
```

2.2.2.5.5 PolyMult

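The PolyMult function multiplies two binary polynomials: each bit of x and y is a coefficient over GF(2), partial products are combined with XOR instead of addition, and only the low-order 32 bits of the product are kept.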

Figure 2.35 PolyMult Pseudocode Function

```
PolyMult(x, y)
    temp ← 0
    for i in 0 .. 31
        if x_i = 1 then
            temp ← temp xor (y(31-i)..0 || 0^i)
        endif
    endfor
    PolyMult ← temp
endfunction PolyMult
```
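
A direct Python transcription of Figure 2.35, treating x and y as 32-bit unsigned values; the snake_case name is just Python convention. Each set bit i of x contributes y shifted left by i and truncated to 32 bits (which is what y(31-i)..0 || 0^i expresses), and the contributions are combined with XOR rather than addition.

```python
def poly_mult(x: int, y: int) -> int:
    """Carry-less (GF(2)) multiply, keeping the low 32 bits (Figure 2.35)."""
    mask = 0xFFFF_FFFF
    x &= mask
    y &= mask
    temp = 0
    for i in range(32):
        if (x >> i) & 1:
            # y(31-i)..0 || 0^i  is  (y << i) truncated to 32 bits
            temp ^= (y << i) & mask
    return temp
```

For example, (x + 1)(x + 1) = x^2 + 1 over GF(2), so `poly_mult(0b11, 0b11)` returns `0b101`.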

2.3 Op and Function Subfield Notation

In some instructions, the instruction subfields `op` and `function` can have constant 5- or 6-bit values. When reference is made to these instructions, uppercase mnemonics are used. For instance, in the floating point ADD instruction, `op=COP1` and `function=ADD`. In other cases, a single field has both fixed and variable subfields, so the name contains both upper- and lowercase characters.

2.4 FPU Instructions

In the detailed description of each FPU instruction, all variable subfields in an instruction format (such as `fs`, `ft`, `immediate`, and so on) are shown in lowercase. The instruction name (such as `ADD`, `SUB`, and so on) is shown in uppercase.

For the sake of clarity, an alias is sometimes used for a variable subfield in the formats of specific instructions. For example, `rs=base` in the format for load and store instructions. Such an alias is always lowercase since it refers to a variable subfield.

Bit encodings for mnemonics are given in Volume I, in the chapters describing the CPU, FPU, MDMX, and MIPS16e instructions.

See “Op and Function Subfield Notation” on page 27 for a description of the `op` and `function` subfields.

Chapter 3: The MIPS32® Instruction Set

3.1 Compliance and Subsetting

To be compliant with the MIPS32 Architecture, designs must implement a set of required features, as described in this document set. To allow implementation flexibility, the MIPS32 Architecture provides subsetting rules. An implementation that follows these rules is compliant with the MIPS32 Architecture as long as it adheres strictly to the rules and fully implements the remaining instructions. Supersetting of the MIPS32 Architecture is only allowed by adding functions to the SPECIAL2 major opcode, the COP2 major opcode, or both; by adding control for coprocessors via the COP2, LWC2, SWC2, LDC2, and/or SDC2 instructions; or via the addition of approved Application Specific Extensions.

Release 6 removes all instructions under the SPECIAL2 major opcode, either by removing them or moving them to the COP2 major opcode. All coprocessor 2 support instructions (for example, LWC2) have been moved to the COP2 major opcode. Supersetting of the Release 6 architecture is only allowed in the COP2 major opcode, or via the addition of approved Application Specific Extensions. SPECIAL2 is reserved for MIPS.

Note: The use of COP3 as a customizable coprocessor was removed in Release 2 of the MIPS32 architecture. COP3 is reserved for future extension of the architecture. Implementations of Release 1 of the MIPS32 architecture are strongly discouraged from using the COP3 opcode for a user-available coprocessor, as doing so limits the potential for an upgrade path to a 64-bit floating point unit.

The instruction set subsetting rules are described in the subsections below, together with the following rule:

- **Co-dependence of Architecture Features:** MIPSr5™ (also called Release 5) and subsequent releases (such as Release 6) include a number of features. Some are optional; some are required. Features provided by a release, such as MIPSr5 or later, whether optional or required, must be consistent. If any feature that is introduced by a particular release is implemented (such as a feature described as part of Release 5 and not any earlier release), then all other features must be implemented in a manner consistent with that release. For example, the VZ and MSA features are introduced by Release 5 but are optional. The FR=1 64-bit FPU register model was optional when introduced earlier, but is required by Release 5 if any FPU is implemented. If VZ, MSA, or both are implemented, then Release 5 is implied, and if an FPU is implemented, it must implement the FR=1 64-bit FPU register model.

3.1.1 Subsetting of Non-Privileged Architecture

- All non-privileged CPU (non-FPU) instructions, that is, those that do not need access to Coprocessor 0, must be implemented per the MIPS Instruction Set Architecture release supported; no subsetting of these instructions is allowed.

- If any instruction is subsetted out based on the rules below, an attempt to execute that instruction must cause the appropriate exception (typically Reserved Instruction or Coprocessor Unusable).

- The FPU and related support instructions, such as CPU conditional branches on FPU conditions (pre-Release 6 BC1T/BC1F, Release 6 BC1EQZ/BC1NEZ) and CPU conditional moves on FPU conditions (pre-Release 6 MOVT/MOVF), may be omitted. Software may determine if an FPU is implemented by checking the state of the FP bit in the Config1 CP0 register. Software may determine which FPU data types are implemented by checking the appropriate bits in the **FIR** CP1 register. The following allowable FPU subsets are compliant with the MIPS32 architecture:

- **No FPU**
  
  Config1.FP=0

- **FPU with S and W formats and all supporting instructions.**

  This 32-bit subset is permitted by Release 6, but prohibited by pre-Release 6 releases.

  Config1.FP=1, Status.FR=0, FIR.S=FIR.W=1, FIR.D=FIR.L=FIR.PS=0.

- **FPU with S, D, W, and L formats and all supporting instructions**

  Config1.FP=1, Status.FR=(see below), FIR.S=FIR.W=FIR.D=FIR.L=1, FIR.PS=0.

  Pre-MIPSr5 releases permit this 64-bit configuration and allow both FPU register modes: Status.FR=0 support is required, but Status.FR=1 support is optional.

  MIPSr5 permits this 64-bit configuration, and requires both FPU register modes, i.e. both Status.FR=0 and Status.FR=1 support are required.

  Release 6 permits this 64-bit configuration, but requires Status.FR=1 and FIR.F64=1. Release 6 prohibits Status.FR=0 if FIR.D=1 or FIR.L=1.

- **FPU with S, D, PS, W, and L formats and all supporting instructions**

  Config1.FP=1, Status.FR=0/1, FIR.S=FIR.W=FIR.D=FIR.L=FIR.PS=1.

  Release 6 prohibits this mode, and any mode with FIR.PS=1 paired single support.

- In Release 5 of the Architecture, if floating point is implemented then FR=1 is required; that is, the 64-bit FPU, with the FR=1 64-bit FPU register model, is required. The FR=0 32-bit FPU register model continues to be required.

- Coprocessor 2 is optional and may be omitted. Software may determine if Coprocessor 2 is implemented by checking the state of the C2 bit in the **Config1** CP0 register. If Coprocessor 2 is implemented, the Coprocessor 2 interface instructions (BC2, CFC2, COP2, CTC2, LDC2, LWC2, MFC2, MTC2, SDC2, and SWC2) may be omitted on an instruction-by-instruction basis.

- The caches are optional. The **Config1**.DL and **Config1**.IL fields denote whether the first-level caches are present.

- Instruction, CP0 Register, and CP1 Control Register fields that are marked “Reserved” or shown as “0” in the description of that field are reserved for future use by the architecture and are not available to implementations. Implementations may only use those fields that are explicitly reserved for implementation dependent use.

- Supported Modules/ASEs are optional and may be subsetted out. In most cases, software may determine if a supported Module/ASE is implemented by checking the appropriate bit in the **Config1** or **Config3** or **Config4** CP0 register. If they are implemented, they must implement the entire ISA applicable to the component, or implement subsets that are approved by the Module/ASE specifications.

- EJTAG is optional and may be subsetted out. If it is implemented, it must implement only those subsets that are approved by the EJTAG specification. If EJTAG is not implemented, the EJTAG instructions (SDBBP and DERET) can be subsetted out.

- In MIPS Release 3, there are two architecture branches (MIPS32/64 and microMIPS32/64). A single device is allowed to implement both architecture branches. The Privileged Resource Architecture (COP0) registers do not mode-switch in width (32-bit vs. 64-bit). For this reason, if a device implements both architecture branches, the address/data widths must be consistent. If a device implements MIPS64 and also implements microMIPS, it must implement microMIPS64 not just microMIPS32. Similarly, if a device implements microMIPS64 and also implements MIPS32/64, it must implement MIPS64 not just MIPS32.

- Prior to Release 6, the JALX instruction is required if and only if ISA mode-switching is possible. If both of the architecture branches are implemented (MIPS32/64 and microMIPS32/64) or if MIPS16e is implemented, then the JALX instruction is required. If only one branch of the architecture family is implemented and MIPS16e is not implemented, then the JALX instruction is not implemented. The JALX instruction was removed in Release 6.
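
The FPU subset rules above amount to a lookup on Config1.FP and the FIR format bits, plus per-release restrictions. The Python sketch below takes already-decoded field values (0 or 1), so it makes no assumption about register bit positions; the function names and subset labels are hypothetical, and the Status.FR requirements of Release 5 and Release 6 are deliberately not modeled.

```python
def fpu_subset(fp: int, s: int, w: int, d: int, l: int, ps: int) -> str:
    """Classify Config1.FP and FIR.{S,W,D,L,PS} into one of the listed subsets."""
    if not fp:
        return "none"
    listed = {
        (1, 1, 0, 0, 0): "S/W",
        (1, 1, 1, 1, 0): "S/D/W/L",
        (1, 1, 1, 1, 1): "S/D/PS/W/L",
    }
    return listed.get((s, w, d, l, ps), "unlisted")


def subset_allowed(subset: str, release: int) -> bool:
    """Per-release format restrictions only; FR-mode rules are not modeled."""
    if subset in ("none", "S/D/W/L"):
        return True
    if subset == "S/W":
        return release >= 6   # 32-bit subset: Release 6 only
    if subset == "S/D/PS/W/L":
        return release < 6    # Release 6 prohibits FIR.PS=1
    return False
```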

3.2 Alphabetical List of Instructions

The following pages present detailed descriptions of instructions, arranged in alphabetical order of opcode mnemonic (except where several similar instructions are described together).

ABS.fmt

Floating Point Absolute Value

<table>
<thead>
<tr>
<th>COP1</th>
<th>fmt</th>
<th>0</th>
<th>fs</th>
<th>fd</th>
<th>ABS</th>
</tr>
</thead>
<tbody>
<tr>
<td>100001</td>
<td></td>
<td>00000</td>
<td></td>
<td></td>
<td>000101</td>
</tr>
</tbody>
</table>

Format:  
ABS.fmt  
ABS.S fd, fs  
ABS.D fd, fs  
ABS.PS fd, fs  
MIPS32, MIPS32 Release 2, removed in Release 6

Purpose:  Floating Point Absolute Value

Description:  
FPR[fd] ← abs(FPR[fs])

The absolute value of the value in FPR fs is placed in FPR fd. The operand and result are values in format fmt.  
ABS.PS takes the absolute value of the two values in FPR fs independently, and ORs together any generated exceptions.

The Cause bits are ORed into the Flag bits if no exception is taken.

If \( \text{FIR}_{\text{Has2008}} = 0 \) or \( \text{FCSR}_{\text{ABS2008}} = 0 \), then this operation is arithmetic. For this case, any NaN operand signals invalid operation.

If \( \text{FCSR}_{\text{ABS2008}} = 1 \), then this operation is non-arithmetic. For this case, both regular floating point numbers and NaN values are treated alike: only the sign bit is affected by this instruction. No IEEE exception can be generated for this case, and the \( \text{FCSR}_{\text{Cause}} \) and \( \text{FCSR}_{\text{Flags}} \) fields are not modified.
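
With FCSR.ABS2008 = 1, ABS.S reduces to clearing bit 31 of the single-precision bit pattern. The Python sketch below works on raw 32-bit patterns so that NaN payloads stay visible; the helper names are hypothetical and the arithmetic (ABS2008 = 0) case is not modeled.

```python
import struct


def abs2008_single(bits: int) -> int:
    """Non-arithmetic ABS.S: clear the sign bit; the other 31 bits are untouched,
    so numbers, infinities and NaN payloads are all handled alike."""
    return bits & 0x7FFF_FFFF


def to_bits(x: float) -> int:
    """Single-precision bit pattern of x (helper for the example)."""
    return struct.unpack("<I", struct.pack("<f", x))[0]
```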

Restrictions:

The fields \( fs \) and \( fd \) must specify FPRs valid for operands of type \( fmt \). If the fields are not valid, the result is UNPREDICTABLE.

The operand must be a value in format \( fmt \); if it is not, the result is UNPREDICTABLE and the value of the operand FPR becomes UNPREDICTABLE.

The result of ABS.PS is UNPREDICTABLE if the processor is executing in the \( FR=0 \) 32-bit FPU register model. ABS.PS is predictable if executing on a 64-bit FPU in the \( FR=1 \) mode, but not with \( FR=0 \), and not on a 32-bit FPU.

Availability and Compatibility:

ABS.PS has been removed in Release 6.

Operation:

\[
\text{StoreFPR}(fd, fmt, \text{AbsoluteValue}(\text{ValueFPR}(fs, fmt)))
\]

Exceptions:

Coprocessor Unusable, Reserved Instruction

Floating Point Exceptions:

Unimplemented Operation, Invalid Operation
ADD Add Word

### Format:
ADD rd, rs, rt

### Purpose:
Add Word

To add 32-bit integers. If an overflow occurs, then trap.

### Description:
GPR[rd] ← GPR[rs] + GPR[rt]

The 32-bit word value in GPR rt is added to the 32-bit value in GPR rs to produce a 32-bit result.

- If the addition results in 32-bit 2’s complement arithmetic overflow, the destination register is not modified and an Integer Overflow exception occurs.
- If the addition does not overflow, the 32-bit result is placed into GPR rd.

### Restrictions:
None

### Operation:

```
temp ← (GPR[rs]31||GPR[rs]31..0) + (GPR[rt]31||GPR[rt]31..0)
if temp32 ≠ temp31 then
    SignalException(IntegerOverflow)
else
    GPR[rd] ← temp
endif
```
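
The overflow test above compares bits 32 and 31 of a 33-bit sum formed from sign-extended operands; the two differ exactly when the true sum does not fit in 32 bits. A Python sketch of that check, where raising OverflowError is a stand-in for SignalException(IntegerOverflow) and the function name is hypothetical:

```python
MASK32 = 0xFFFF_FFFF


def add_word(rs: int, rt: int) -> int:
    """Model of ADD: 33-bit sum of sign-extended operands, trap on overflow."""
    def sext33(v: int) -> int:
        v &= MASK32
        return v | ((v >> 31) << 32)  # GPR[x]31 || GPR[x]31..0

    temp = (sext33(rs) + sext33(rt)) & ((1 << 33) - 1)
    if ((temp >> 32) & 1) != ((temp >> 31) & 1):
        raise OverflowError("IntegerOverflow")  # GPR[rd] is left unmodified
    return temp & MASK32
```

Dropping the check and returning `temp & MASK32` unconditionally would model ADDU instead.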

### Exceptions:
Integer Overflow

### Programming Notes:
ADDU performs the same arithmetic operation but does not trap on overflow.
ADD.fmt

Floating Point Add

Format:

ADD.fmt
ADD.S fd, fs, ft  
ADD.D fd, fs, ft  
ADD.PS fd, fs, ft  

MIPS32
MIPS64,MIPS32 Release 2, removed in Release 6

Purpose: Floating Point Add

To add floating point values.

Description:

\[ \text{FPR}[fd] \leftarrow \text{FPR}[fs] + \text{FPR}[ft] \]

The value in FPR \(ft\) is added to the value in FPR \(fs\). The result is calculated to infinite precision, rounded using the current rounding mode in \(FCSR\), and placed into FPR \(fd\). The operands and result are values in format \(fmt\).

ADD.PS adds the upper and lower halves of FPR \(fs\) and FPR \(ft\) independently, and ORs together any generated exceptions.

The \textit{Cause} bits are ORed into the \textit{Flag} bits if no exception is taken.

Restrictions:

The fields \(fs, ft,\) and \(fd\) must specify FPRs valid for operands of type \(fmt\). If the fields are not valid, the result is \text{UNPREDICTABLE}.

The operands must be values in format \(fmt\). If they are not, the result is \text{UNPREDICTABLE} and the value of the operand FPRs becomes \text{UNPREDICTABLE}.

The result of ADD.PS is \text{UNPREDICTABLE} if the processor is executing in the \(FR=0\) 32-bit FPU register model. ADD.PS is predictable if executing on a 64-bit FPU in the \(FR=1\) mode, but not with \(FR=0\), and not on a 32-bit FPU.

Availability and Compatibility:

ADD.PS has been removed in Release 6.

Operation:

\[
\text{StoreFPR}(fd, fmt, \text{ValueFPR}(fs, fmt) +_{fmt} \text{ValueFPR}(ft, fmt))
\]

Exceptions:

Coprocessor Unusable, Reserved Instruction

Floating Point Exceptions:

Unimplemented Operation, Invalid Operation, Inexact, Overflow, Underflow
ADDI  Add Immediate Word

### Format:
ADDI rt, rs, immediate

### Purpose:
Add Immediate Word

To add a constant to a 32-bit integer. If overflow occurs, then trap.

### Description:
GPR[rt] ← GPR[rs] + immediate

The 16-bit signed immediate is added to the 32-bit value in GPR rs to produce a 32-bit result.

- If the addition results in 32-bit 2’s complement arithmetic overflow, the destination register is not modified and an Integer Overflow exception occurs.
- If the addition does not overflow, the 32-bit result is placed into GPR rt.

### Restrictions:
None

### Availability and Compatibility:
This instruction has been removed in Release 6. The encoding has been reused for other instructions introduced by Release 6.

### Operation:

\[
\begin{aligned}
&\text{temp} \leftarrow (\text{GPR}[\text{rs}]_{31}||\text{GPR}[\text{rs}]_{31..0}) + \text{sign\_extend}(\text{immediate}) \\
&\text{if } \text{temp}_{32} \neq \text{temp}_{31} \text{ then} \\
&\quad \text{SignalException(IntegerOverflow)} \\
&\text{else} \\
&\quad \text{GPR}[\text{rt}] \leftarrow \text{temp} \\
&\text{endif}
\end{aligned}
\]

### Exceptions:
Integer Overflow

### Programming Notes:
ADDIU performs the same arithmetic operation but does not trap on overflow.
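ADDI differs from ADD mainly in that its second operand is a sign-extended 16-bit field. The C sketch below models that, assuming a hypothetical helper `mips_addi` that returns `false` instead of trapping. It checks the exact 64-bit sum against the `int32_t` range, which is an equivalent way of asking whether bits 32 and 31 of `temp` differ.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of ADDI: overflow is reported via the return value
 * instead of a trap, and *rt is left unmodified in that case. */
static bool mips_addi(int32_t rs, uint16_t immediate, int32_t *rt)
{
    /* sign_extend(immediate): bit 15 of the field is the sign bit. */
    int32_t imm = (immediate & 0x8000u) ? (int32_t)immediate - 0x10000
                                        : (int32_t)immediate;
    int64_t sum = (int64_t)rs + imm;   /* exact result, cannot overflow int64_t */

    if (sum > INT32_MAX || sum < INT32_MIN)
        return false;                  /* IntegerOverflow: rt unmodified */

    *rt = (int32_t)sum;
    return true;
}
```

An immediate field of `0xFFFF` therefore subtracts 1, and adding `0x0001` to `INT32_MAX` traps.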
### ADDIU

Add Immediate Unsigned Word

#### Format:
```
ADDIU rt, rs, immediate
```

#### Purpose:
To add a constant to a 32-bit integer.

#### Description:
```
GPR[rt] ← GPR[rs] + immediate
```

The 16-bit signed immediate is added to the 32-bit value in GPR rs and the 32-bit arithmetic result is placed into GPR rt.

No Integer Overflow exception occurs under any circumstances.

#### Restrictions:
None

#### Operation:
```
temp ← GPR[rs] + sign_extend(immediate)
GPR[rt] ← temp
```

#### Exceptions:
None

#### Programming Notes:
The term "unsigned" in the instruction name is a misnomer; this operation is 32-bit modulo arithmetic that does not trap on overflow. This instruction is appropriate for unsigned arithmetic, such as address arithmetic, or integer arithmetic environments that ignore overflow, such as C language arithmetic.
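A minimal C sketch of the modulo arithmetic described above, assuming a hypothetical helper `mips_addiu`. It uses `uint32_t` because unsigned wraparound is well defined in C, while signed overflow is not. The same bit pattern is produced whether the operands are viewed as signed or unsigned.

```c
#include <stdint.h>

/* Hypothetical model of ADDIU: sign-extend the 16-bit immediate and add
 * modulo 2^32. No overflow check is performed. */
static uint32_t mips_addiu(uint32_t rs, uint16_t immediate)
{
    uint32_t imm = (uint32_t)immediate;
    if (imm & 0x8000u)
        imm |= 0xFFFF0000u;   /* sign_extend(immediate) */
    return rs + imm;          /* 32-bit modulo addition, never traps */
}
```

For example, `mips_addiu(0x7FFFFFFF, 1)` yields `0x80000000` where ADDI would trap, and an immediate of `0xFFFF` subtracts 1.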
ADDIUPC  Add Immediate to PC (unsigned - non-trapping)

Format: ADDIUPC rs, immediate

Purpose: Add Immediate to PC (unsigned - non-trapping)

Description: GPR[rs] ← ( PC + sign_extend( immediate << 2 ) )

This instruction performs a PC-relative address calculation. The 19-bit immediate is shifted left by 2 bits, sign-extended, and added to the address of the ADDIUPC instruction. The result is placed in GPR rs.
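The address calculation can be sketched in C for a 32-bit PC. The helper name `addiupc` is hypothetical. The 19-bit field is shifted first and then sign-extended from bit 20 of the resulting 21-bit value, matching `sign_extend( immediate << 2 )`.

```c
#include <stdint.h>

/* Sketch of ADDIUPC with a 32-bit PC (hypothetical helper name). */
static uint32_t addiupc(uint32_t pc, uint32_t imm19)
{
    uint32_t field = imm19 & 0x7FFFFu;       /* 19-bit immediate field */
    uint32_t shifted = field << 2;            /* immediate << 2, a 21-bit value */
    /* sign_extend from bit 20: flip the sign bit, then subtract its weight. */
    int32_t offset = (int32_t)(shifted ^ 0x100000u) - 0x100000;
    return pc + (uint32_t)offset;             /* wraps modulo 2^32, never traps */
}
```

An all-ones field (`0x7FFFF`) encodes -4, so the result is the address of the preceding instruction; the reachable range is about ±1 MB around the ADDIUPC instruction.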

Restrictions:
None

Availability and Compatibility:
This instruction is introduced by and required as of Release 6.

Operation:
GPR[rs] ← ( PC + sign_extend( immediate << 2 ) )

Exceptions:
None

Programming Notes:
The term “unsigned” in this instruction mnemonic is a misnomer. “Unsigned” here means “non-trapping”. It does not trap on a signed 32-bit overflow. ADDIUPC corresponds to unsigned ADDIU, which does not trap on overflow, as opposed to ADDI, which does trap on overflow.

The MIPS32® Instruction Set Manual, Revision 6.04

Copyright © 2015 Imagination Technologies LTD. and/or its Affiliated Group Companies. All rights reserved.
ADDU Add Unsigned Word

**Format:** ADDU rd, rs, rt

**Purpose:** Add Unsigned Word
To add 32-bit integers.

**Description:**
GPR[rd] ← GPR[rs] + GPR[rt]
The 32-bit word value in GPR rt is added to the 32-bit value in GPR rs and the 32-bit arithmetic result is placed into GPR rd.
No Integer Overflow exception occurs under any circumstances.

**Restrictions:**
None

**Operation:**

\[
\begin{aligned}
&\text{temp} \leftarrow \text{GPR}[rs] + \text{GPR}[rt] \\
&\text{GPR}[rd] \leftarrow \text{temp}
\end{aligned}
\]

**Exceptions:**
None

**Programming Notes:**
The term “unsigned” in the instruction name is a misnomer; this operation is 32-bit modulo arithmetic that does not trap on overflow. This instruction is appropriate for unsigned arithmetic, such as address arithmetic, or integer arithmetic environments that ignore overflow, such as C language arithmetic.
ALIGN Concatenate two GPRs, and extract a contiguous subset at a byte position

**Format:**

ALIGN rd, rs, rt, bp  
MIPS32 Release 6

**Purpose:** Concatenate two GPRs, and extract a contiguous subset at a byte position

**Description:**

GPR[rd] <- (GPR[rt] << (8*bp)) or (GPR[rs] >> (GPRLEN-8*bp))

The input registers GPR rt and GPR rs are concatenated, and a register-width contiguous subset, selected by the byte pointer bp, is extracted.

The ALIGN instruction operates on 32-bit words and has a 2-bit byte position field bp.

- The 32-bit word in GPR rt is left shifted as a 32-bit value by bp byte positions. The 32-bit word in GPR rs is right shifted as a 32-bit value by (4-bp) byte positions. These shifts are logical (zero-filling). The shifted values are then ORed together to create a 32-bit result that is written to destination GPR rd.

**Restrictions:**

Executing ALIGN with shift count bp=0 acts like a register-to-register move, which is redundant and therefore discouraged. Software should not generate ALIGN with shift count bp=0.

**Availability and Compatibility:**

The ALIGN instruction is introduced by and required as of Release 6.

**Programming Notes:**

The Release 6 ALIGN instruction corresponds to the pre-Release 6 DSP Module BALIGN instruction, except that BALIGN with shift counts of 0 and 2 is specified as UNPREDICTABLE, whereas ALIGN defines all bp values and discourages only bp=0.

Graphically,

![Figure 3.1 ALIGN operation (32-bit)](image)

**Operation:**

tmp_rt_hi <- unsigned_word(GPR[rt]) << (8*bp)  
tmp_rs_lo <- unsigned_word(GPR[rs]) >> (8*(4-bp))  
tmp <- tmp_rt_hi or tmp_rs_lo  
GPR[rd] <- tmp  
/* end of instruction */
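The Operation can be sketched in C with a hypothetical helper `mips_align`. Shifting the 64-bit concatenation GPR[rt] || GPR[rs] right by (32 - 8*bp) bits yields the same result as the two separate shifts, and it avoids the undefined behavior a direct C translation would hit when bp=0 (a 32-bit value shifted by 32).

```c
#include <stdint.h>

/* Sketch of the 32-bit ALIGN operation (hypothetical helper name). */
static uint32_t mips_align(uint32_t rs, uint32_t rt, unsigned bp)
{
    uint64_t concat = ((uint64_t)rt << 32) | rs;   /* GPR[rt] || GPR[rs] */
    bp &= 3u;                                      /* bp is a 2-bit field */
    /* Equivalent to (rt << 8*bp) | (rs >> (32 - 8*bp)), with rs >> 32 = 0. */
    return (uint32_t)(concat >> (32u - 8u * bp));
}
```

With GPR[rs] = 0x11223344 and GPR[rt] = 0xAABBCCDD, bp=1 gives 0xBBCCDD11: three bytes from rt followed by the most significant byte of rs.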

Exceptions:

None
ALNV.PS Floating Point Align Variable

Format:  ALNV.PS fd, fs, ft, rs

Purpose: Floating Point Align Variable
To align a misaligned pair of paired single values.

Description:
FPR[fd] ← ByteAlign(GPR[rs2..0], FPR[fs], FPR[ft])
FPR fs is concatenated with FPR ft and this value is funnel-shifted by GPR rs2..0 bytes, and written into FPR fd. If GPR rs2..0 is 0, FPR fd receives FPR fs. If GPR rs2..0 is 4, the operation depends on the current endianness.

Figure 3.2 illustrates the following example: for a big-endian operation and a byte alignment of 4, the upper half of FPR fd receives the lower half of the paired single value in FPR fs, and the lower half of FPR fd receives the upper half of the paired single value in FPR ft.

Figure 3.2 Example of an ALNV.PS Operation

The move is non-arithmetic; it causes no IEEE 754 exceptions, and the FCSR Cause and Flags fields are not modified.

Restrictions:
The fields fs, ft, and fd must specify FPRs valid for operands of type PS. If the fields are not valid, the result is UNPREDICTABLE.
If GPR rs1..0 is non-zero (that is, if GPR rs2..0 is neither 0 nor 4), the result is UNPREDICTABLE.

The result of this instruction is UNPREDICTABLE if the processor is executing in the FR=0 32-bit FPU register model. The instruction is predictable if executing on a 64-bit FPU in the FR=1 mode, but not with FR=0, and not on a 32-bit FPU.

Availability and Compatibility:
This instruction has been removed in Release 6.

Operation:
```
if GPR[rs]2..0 = 0 then
  StoreFPR(fd, PS, ValueFPR(fs, PS))
else if GPR[rs]2..0 ≠ 4 then
  UNPREDICTABLE
else if BigEndianCPU then
  StoreFPR(fd, PS, ValueFPR(fs, PS)31..0 || ValueFPR(ft, PS)63..32)
else
  StoreFPR(fd, PS, ValueFPR(ft, PS)31..0 || ValueFPR(fs, PS)63..32)
endif
```
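The Operation pseudocode can be sketched in C by treating each FPR as a raw 64-bit pattern, with the upper single in bits 63..32 and the lower single in bits 31..0. The helper name `alnv_ps` and the convention of returning `false` for the UNPREDICTABLE case are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch of ALNV.PS on 64-bit FPR bit patterns (hypothetical helper name).
 * Returns false when GPR[rs]2..0 is neither 0 nor 4 (UNPREDICTABLE). */
static bool alnv_ps(uint64_t fs, uint64_t ft, uint32_t rs, bool big_endian,
                    uint64_t *fd)
{
    uint32_t shift = rs & 7u;                 /* GPR[rs]2..0 */
    if (shift == 0) {
        *fd = fs;                             /* FPR fd receives FPR fs */
        return true;
    }
    if (shift != 4)
        return false;                         /* UNPREDICTABLE */
    if (big_endian)
        *fd = (fs << 32) | (ft >> 32);        /* fs31..0 || ft63..32 */
    else
        *fd = (ft << 32) | (fs >> 32);        /* ft31..0 || fs63..32 */
    return true;
}
```

Because only bits 2..0 of GPR rs are used, a byte address such as the source pointer in the loops below can be passed directly as the alignment operand.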

Exceptions:
Coprocessor Unusable, Reserved Instruction

Programming Notes:
ALNV.PS is designed to be used with LUXC1 to load 8 bytes of data from any 4-byte boundary. For example:

```asm
/* Copy T2 bytes (a multiple of 16) of data from T0 to T1; T0 unaligned, T1 aligned.
 * Reads one dw beyond the end of T0. */
        LUXC1   F0, 0(T0)      /* set up by reading 1st src dw */
        LI      T3, 0          /* index into src and dst arrays */
        ADDIU   T4, T0, 8      /* base for odd dw loads */
        ADDIU   T5, T1, -8     /* base for odd dw stores */
LOOP:
        LUXC1   F1, T3(T4)
        ALNV.PS F2, F0, F1, T0 /* switch F0, F1 for little-endian */
        SDXC1   F2, T3(T1)
        ADDIU   T3, T3, 16
        LUXC1   F0, T3(T0)
        ALNV.PS F2, F1, F0, T0 /* switch F1, F0 for little-endian */
        BNE     T3, T2, LOOP
        SDXC1   F2, T3(T5)     /* branch delay slot */
DONE:
```

ALNV.PS is also useful with SUXC1 to store paired-single results in a vector loop to a possibly misaligned address:

```asm
/* T1[i] = T0[i] + F8, T0 aligned, T1 unaligned. */
        CVT.PS.S F8, F8, F8    /* make addend paired-single */
/* Loop header computes 1st pair into F0, stores high half if T1 */
/* misaligned */
LOOP:
        LDXC1   F2, T3(T4)     /* get T0[i+2]/T0[i+3] */
        ADD.PS  F1, F2, F8     /* compute T1[i+2]/T1[i+3] */
        ALNV.PS F3, F0, F1, T1 /* align to dst memory */
        SUXC1   F3, T3(T1)     /* store to T1[i+0]/T1[i+1] */
        ADDIU   T3, T3, 16     /* i = i + 4 */
        LDXC1   F2, T3(T0)     /* get T0[i+0]/T0[i+1] */
        ADD.PS  F0, F2, F8     /* compute T1[i+0]/T1[i+1] */
        ALNV.PS F3, F1, F0, T1 /* align to dst memory */
        BNE     T3, T2, LOOP
        SUXC1   F3, T3(T5)     /* store to T1[i+2]/T1[i+3] */
/* Loop trailer stores all or half of F0, depending on T1 alignment */
```
**ALUIPC**

Aligned Add Upper Immediate to PC

<table>
<thead>
<tr>
<th>Format:</th>
<th>ALUIPC rs,immediate</th>
</tr>
</thead>
<tbody>
<tr>
<td>Purpose:</td>
<td>Aligned Add Upper Immediate to PC</td>
</tr>
<tr>
<td>Description:</td>
<td>GPR[rs] ← ~0x0FFFF &amp; ( PC + sign_extend( immediate &lt;&lt; 16 ) )</td>
</tr>
</tbody>
</table>

This instruction performs a PC-relative address calculation. The 16-bit immediate is shifted left by 16 bits, sign-extended, and added to the address of the ALUIPC instruction. The low 16 bits of the result are cleared, that is, the result is aligned on a 64 KB boundary. The result is placed in GPR rs.
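A minimal C sketch of the ALUIPC calculation for a 32-bit PC, using a hypothetical helper name `aluipc`. With GPRLEN = 32, `immediate << 16` already fills the register, so the sign-extension step is a no-op here; it only matters for 64-bit implementations.

```c
#include <stdint.h>

/* Sketch of ALUIPC for a 32-bit PC (hypothetical helper name). */
static uint32_t aluipc(uint32_t pc, uint16_t immediate)
{
    uint32_t sum = pc + ((uint32_t)immediate << 16);   /* PC + (immediate << 16) */
    return sum & ~(uint32_t)0xFFFFu;                   /* ~0x0FFFF: 64 KB alignment */
}
```

For a PC of 0x00401234, an immediate of 0x0001 gives 0x00410000 and an immediate of 0xFFFF (that is, -1) gives 0x003F0000.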

**Restrictions:**
None

**Availability and Compatibility:**
This instruction is introduced by and required as of Release 6.

**Operation:**
GPR[rs] ← ~0x0FFFF & ( PC + sign_extend( immediate << 16 ) )

**Exceptions:**
None
**AND**

Format: `AND rd, rs, rt`

**Purpose:** and

To do a bitwise logical AND.

**Description:** `GPR[rd] ← GPR[rs] and GPR[rt]`

The contents of GPR `rs` are combined with the contents of GPR `rt` in a bitwise logical AND operation. The result is placed into GPR `rd`.

**Restrictions:**

None

**Operation:**

`GPR[rd] ← GPR[rs] and GPR[rt]`

**Exceptions:**

None
ANDI and immediate

**Format:**  
ANDI rt, rs, immediate

**MIPS32**

**Purpose:** and immediate  
To do a bitwise logical AND with a constant

**Description:**  
GPR[rt] ← GPR[rs] and zero_extend(immediate)  
The 16-bit immediate is zero-extended to the left and combined with the contents of GPR rs in a bitwise logical AND operation. The result is placed into GPR rt.

**Restrictions:**  
None

**Operation:**  
GPR[rt] ← GPR[rs] and zero_extend(immediate)

**Exceptions:**  
None
AUI

Add Immediate to Upper Bits

Format: AUI rt, rs, immediate

Purpose: Add Immediate to Upper Bits

Add Upper Immediate

Description:

\[
GPR[rt] \leftarrow GPR[rs] + \text{sign\_extend}(\text{immediate} \ll 16)
\]

The 16-bit immediate is shifted left 16 bits, sign-extended, and added to GPR rs; the result is placed in GPR rt.

In Release 6, LUI is an assembly idiom for AUI with rs=0.

Restrictions:

Availability and Compatibility:

AUI is introduced by and required as of Release 6.

Operation:

\[
GPR[rt] \leftarrow GPR[rs] + \text{sign\_extend}(\text{immediate} \ll 16)
\]

Exceptions:

None.

Programming Notes:

AUI can be used to synthesize large constants in situations where it is not convenient to load a large constant from memory. To simplify hardware that may recognize sequences of instructions as generating large constants, AUI should be used in a stylized manner.

To create an integer:

- LUI rd, imm_upper
- ORI rd, rd, imm_low

To create a large offset for a memory access whose address is of the form rbase+large_offset:

- AUI rtmp, rbase, imm_upper
- LW rd, imm_low(rtmp)

To create a large constant operand for an instruction of the form rd:=rs+large_immediate or rd:=rs-large_immediate:

- AUI rtmp, rs, imm_upper
- ADDIU rd, rtmp, imm_low
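Choosing imm_upper and imm_low depends on how the second instruction treats its immediate. ADDIU and load/store offsets sign-extend imm_low, so imm_upper must be rounded up by one whenever bit 15 of the value is set. ORI zero-extends, so LUI/ORI needs no adjustment. The C sketch below shows both splits; all function names are hypothetical and chosen for illustration.

```c
#include <stdint.h>

/* Split for AUI + ADDIU (or a memory offset): imm_low is sign-extended by the
 * second instruction, so adding 0x8000 first compensates for a negative imm_low. */
static void split_for_addiu(uint32_t value, uint16_t *imm_upper, uint16_t *imm_low)
{
    *imm_low = (uint16_t)(value & 0xFFFFu);
    *imm_upper = (uint16_t)((value + 0x8000u) >> 16);
}

/* Split for LUI + ORI: ORI zero-extends its immediate, so no adjustment is needed. */
static void split_for_ori(uint32_t value, uint16_t *imm_upper, uint16_t *imm_low)
{
    *imm_low = (uint16_t)(value & 0xFFFFu);
    *imm_upper = (uint16_t)(value >> 16);
}

/* Model of AUI rtmp, rs, imm_upper followed by ADDIU rd, rtmp, imm_low. */
static uint32_t aui_addiu(uint32_t rs, uint16_t imm_upper, uint16_t imm_low)
{
    uint32_t tmp = rs + ((uint32_t)imm_upper << 16);              /* AUI */
    uint32_t low = (imm_low & 0x8000u) ? (0xFFFF0000u | imm_low)  /* sign_extend */
                                       : (uint32_t)imm_low;
    return tmp + low;                                              /* ADDIU */
}
```

For 0x12348765, the AUI/ADDIU split is imm_upper = 0x1235 and imm_low = 0x8765 (which ADDIU treats as -0x789B), while the LUI/ORI split is simply 0x1234 and 0x8765.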
**Format:** AUIPC rs, immediate

**Purpose:** Add Upper Immediate to PC

**Description:**

\[ GPR[rs] \leftarrow (PC + \text{sign\_extend}(\text{immediate} \ll 16)) \]

This instruction performs a PC-relative address calculation. The 16-bit immediate is shifted left by 16 bits, sign-extended, and added to the address of the AUIPC instruction. The result is placed in GPR \(rs\).

**Restrictions:**
None

**Availability and Compatibility:**
This instruction is introduced by and required as of Release 6.

**Operation:**

\[ GPR[rs] \leftarrow (PC + \text{sign\_extend}(\text{immediate} \ll 16)) \]

**Exceptions:**
None
**Unconditional Branch**

**Purpose:** Unconditional Branch

To do an unconditional branch.

**Description:** branch

B offset is the assembly idiom used to denote an unconditional branch. The actual instruction is interpreted by the hardware as BEQ r0, r0, offset.

An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself), in the branch delay slot, to form a PC-relative effective target address.
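The target calculation can be sketched in C for a 32-bit PC. The helper name `branch16_target` is hypothetical. Note that the base is PC+4, the address of the delay-slot instruction, not the branch itself.

```c
#include <stdint.h>

/* Sketch of the 16-bit-offset branch target used by B (BEQ r0, r0, offset).
 * Hypothetical helper: target = (PC + 4) + sign_extend(offset || 0^2). */
static uint32_t branch16_target(uint32_t pc, uint16_t offset)
{
    int32_t off = (offset & 0x8000u) ? (int32_t)offset - 0x10000
                                     : (int32_t)offset;   /* sign-extend the field */
    int32_t target_offset = off * 4;                      /* offset || 0^2, 18 bits */
    return (pc + 4u) + (uint32_t)target_offset;           /* relative to delay slot */
}
```

The extreme fields 0x7FFF and 0x8000 give +131068 and -131072 bytes relative to PC+4, which is where the ±128 KBytes range in the Programming Notes comes from.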

**Restrictions:**

*Control Transfer Instructions (CTIs) should not be placed in branch delay slots or Release 6 forbidden slots.* CTIs include all branches and jumps, NAL, ERET, ERETNC, DERET, WAIT, and PAUSE.

Pre-Release 6: Processor operation is **UNPREDICTABLE** if a control transfer instruction (CTI) is placed in the delay slot of a branch or jump.

Release 6: If a control transfer instruction (CTI) is executed in the delay slot of a branch or jump, Release 6 implementations are required to signal a Reserved Instruction exception.

**Operation:**

\[
\begin{aligned}
\text{I:}\quad & \text{target\_offset} \leftarrow \text{sign\_extend}(\text{offset} \,||\, 0^2) \\
\text{I+1:}\quad & \text{PC} \leftarrow \text{PC} + \text{target\_offset}
\end{aligned}
\]

**Exceptions:**

None

**Programming Notes:**

With the 18-bit signed instruction offset, the branch range is ±128 KBytes. Use jump (J), jump register (JR), or the Release 6 branch compact (BC) instruction to branch to addresses outside this range.
BAL Branch and Link


Format: BAL offset

Assembly Idiom MIPS32, MIPS32 Release 6

Purpose: Branch and Link
To do an unconditional PC-relative procedure call.

Description: procedure_call
Place the return address link in GPR 31. The return link is the address of the second instruction following the branch, where execution continues after a procedure call.
An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself), in the branch delay slot, to form a PC-relative effective target address.

Restrictions:
Control Transfer Instructions (CTIs) should not be placed in branch delay slots or Release 6 forbidden slots. CTIs include all branches and jumps, NAL, ERET, ERETNC, DERET, WAIT, and PAUSE.
Pre-Release 6: Processor operation is UNPREDICTABLE if a control transfer instruction (CTI) is placed in the delay slot of a branch or jump.
Release 6: If a control transfer instruction (CTI) is executed in the delay slot of a branch or jump, Release 6 implementations are required to signal a Reserved Instruction exception.

Availability and Compatibility:
Pre-Release 6: BAL offset is the assembly idiom used to denote an unconditional branch and link. The actual instruction is interpreted by the hardware as BGEZAL r0, offset.
Release 6 keeps the BAL special case of BGEZAL, but removes all other instances of BGEZAL. BGEZAL with rs any register other than GPR[0] is required to signal a Reserved Instruction exception.

Operation:
I:  target_offset ← sign_extend(offset || 0^2)
    GPR[31] ← PC + 8
I+1: PC ← PC + target_offset

Exceptions:
None

Programming Notes:
BAL without a corresponding return should NOT be used to read the PC. Doing so is likely to cause a performance loss on processors with a return address predictor.

Pre-Release 6:

<table>
<thead>
<tr>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>REGIMM</td>
<td>00000</td>
<td>BGEZAL</td>
<td>offset</td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>16</td>
</tr>
</tbody>
</table>

Release 6:

<table>
<thead>
<tr>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>REGIMM</td>
<td>00000</td>
<td>BAL</td>
<td>offset</td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>16</td>
</tr>
</tbody>
</table>
With the 18-bit signed instruction offset, the branch range is $\pm 128$ KBytes. Use jump and link (JAL) or jump and link register (JALR) instructions for procedure calls to addresses outside this range.
Branch and Link, Compact

Format: \texttt{BALC} \texttt{offset}

Purpose: Branch and Link, Compact

To do an unconditional PC-relative procedure call.

Description: procedure\_call (no delay slot)

Place the return address link in GPR 31. The return link is the address of the instruction immediately following the branch, where execution continues after a procedure call. Because compact branches have no delay slot (see below), the return address is PC+4.

A 28-bit signed offset (the 26-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself), to form a PC-relative effective target address.

Compact branches do not have delay slots. The instruction after the branch is NOT executed when the branch is taken.
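The link and target values can be sketched in C for a 32-bit PC. The type `balc_result` and the helper `balc` are hypothetical. The 26-bit field is shifted left 2 bits and the resulting 28-bit value is sign-extended from bit 27.

```c
#include <stdint.h>

/* Hypothetical result of executing BALC at address pc. */
typedef struct {
    uint32_t link;      /* value written to GPR[31] */
    uint32_t next_pc;   /* branch target */
} balc_result;

static balc_result balc(uint32_t pc, uint32_t offset26)
{
    uint32_t field = offset26 & 0x03FFFFFFu;    /* 26-bit offset field */
    uint32_t shifted = field << 2;               /* offset || 0^2, a 28-bit value */
    /* Sign-extend from bit 27: flip the sign bit, then subtract its weight. */
    int32_t target_offset = (int32_t)(shifted ^ 0x08000000u) - 0x08000000;

    balc_result r;
    r.link = pc + 4u;                            /* no delay slot: return to PC+4 */
    r.next_pc = pc + 4u + (uint32_t)target_offset;
    return r;
}
```

The reachable range is therefore about ±128 MBytes around PC+4, compared with ±128 KBytes for the 16-bit-offset branches.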

Restrictions:

This instruction is an unconditional, always taken, compact branch. It does not have a forbidden slot, that is, a Reserved Instruction exception is not caused by a Control Transfer Instruction placed in the slot following the branch.

Availability and Compatibility:

This instruction is introduced by and required as of Release 6.

Release 6 instruction \texttt{BALC} occupies the same encoding as pre-Release 6 instruction \texttt{SWC2}. The \texttt{SWC2} instruction has been moved to the COP2 major opcode in MIPS Release 6.

Exceptions:

None

Operation:

\[
\begin{aligned}
&\text{target\_offset} \leftarrow \text{sign\_extend}(\text{offset} \,||\, 0^2) \\
&\text{GPR}[31] \leftarrow \text{PC} + 4 \\
&\text{PC} \leftarrow \text{PC} + 4 + \text{sign\_extend}(\text{target\_offset})
\end{aligned}
\]
Branch, Compact

**Format:** \( BC \ offset \)

**Purpose:** Branch, Compact

**Description:** \( PC \leftarrow PC + 4 + \text{sign\_extend}(offset \ll 2) \)

A 28-bit signed offset (the 26-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself), to form a PC-relative effective target address.

Compact branches have no delay slot: the instruction after the branch is NOT executed when the branch is taken.

**Restrictions:**

This instruction is an unconditional, always taken, compact branch. It does not have a forbidden slot, that is, a Reserved Instruction exception is not caused by a Control Transfer Instruction placed in the slot following the branch.

**Availability and Compatibility:**

This instruction is introduced by and required as of Release 6.

Release 6 instruction BC occupies the same encoding as pre-Release 6 instruction \( LWC2 \). The \( LWC2 \) instruction has been moved to the COP2 major opcode in MIPS Release 6.

**Exceptions:**

None

**Operation:**

\[
\begin{aligned}
&\text{target\_offset} \leftarrow \text{sign\_extend}(\text{offset} \,||\, 0^2) \\
&\text{PC} \leftarrow \text{PC} + 4 + \text{sign\_extend}(\text{target\_offset})
\end{aligned}
\]
Branch if Coprocessor 1 (FPU) Register Bit 0 Equal/Not Equal to Zero

**Format:**
- `BC1EQZ ft, offset`
- `BC1NEZ ft, offset`

**Purpose:**
- **BC1EQZ:** Branch if Coprocessor 1 (FPU) Register Bit 0 is Equal to Zero
- **BC1NEZ:** Branch if Coprocessor 1 (FPU) Register Bit 0 is Not Equal to Zero

**Description:**

- **BC1EQZ:** if FPR[ft] & 1 = 0 then branch
- **BC1NEZ:** if FPR[ft] & 1 ≠ 0 then branch

The condition is evaluated on FPU register `ft`.

- For **BC1EQZ**, the condition is true if and only if bit 0 of the FPU register `ft` is zero.
- For **BC1NEZ**, the condition is true if and only if bit 0 of the FPU register `ft` is non-zero.

If the condition is false, the branch is not taken, and execution continues with the next instruction.

An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself), to form a PC-relative effective target address. Execute the instruction in the delay slot before the instruction at the target.

**Restrictions:**

If access to Coprocessor 1 is not enabled, a Coprocessor Unusable Exception is signaled.

Because BC1EQZ and BC1NEZ do not depend on a particular floating point data type, they operate whenever Coprocessor 1 is enabled.

**Control Transfer Instructions (CTIs) should not be placed in branch delay slots or Release 6 forbidden slots. CTIs include all branches and jumps, NAL, ERET, ERETNC, DERET, WAIT, and PAUSE.**

If a control transfer instruction (CTI) is executed in the delay slot of a branch or jump, Release 6 implementations are required to signal a Reserved Instruction exception.

**Availability and Compatibility:**

These instructions are introduced by and required as of Release 6. In Release 6, BC1EQZ and BC1NEZ are required if the FPU is implemented; they must not signal a Reserved Instruction exception, but they can signal a Coprocessor Unusable exception.

**Exceptions:**

- Coprocessor Unusable

**Operation:**

```
tmp ← ValueFPR(ft, UNINTERPRETED_WORD)
BC1EQZ: cond ← tmp & 1 = 0
BC1NEZ: cond ← tmp & 1 ≠ 0
if cond then
  I:   target_PC ← ( PC+4 + sign_extend( offset << 2 ) )
  I+1: PC ← target_PC
endif
```

**Programming Notes:**

Release 6: These instructions, BC1EQZ and BC1NEZ, replace the pre-Release 6 instructions BC1F and BC1T. These Release 6 FPU branches depend on bit 0 of the scalar FPU register.

Note: BC1EQZ and BC1NEZ do not have a format or data type width. The same instructions are used for branches based on conditions involving any format, including 32-bit S (single precision) and W (word) format, and 64-bit D (double precision) and L (longword) format, as well as 128-bit MSA. The FPU scalar comparison instructions CMP.condn.fmt produce an all ones or all zeros truth mask of their format width with the upper bits (where applicable) UNPREDICTABLE. BC1EQZ and BC1NEZ consume only bit 0 of the CMP.condn.fmt output value, and therefore operate correctly independent of fmt.
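The format independence described in the note can be sketched in C by treating the FPR as a raw 64-bit pattern whose upper bits may hold anything, as they may after a 32-bit CMP.condn.fmt. The helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch: the branch condition reads only bit 0 of the FPR bit pattern, so it
 * behaves the same whichever format (S, W, D, L) produced the value. */
static bool bc1eqz_taken(uint64_t fpr_ft) { return (fpr_ft & 1u) == 0; }
static bool bc1nez_taken(uint64_t fpr_ft) { return (fpr_ft & 1u) != 0; }
```

A single-precision "true" mask of 0xFFFFFFFF in the low word is detected as true even if the upper word contains leftover data such as 0xDEADBEEF.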
**Format:**  
BC1F offset (cc = 0 implied)  
BC1F cc, offset

MIPS32, removed in Release 6

**Purpose:** Branch on FP False

To test an FP condition code and do a PC-relative conditional branch.

**Description:** if FPConditionCode(cc) = 0 then branch

An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself) in the branch delay slot to form a PC-relative effective target address. If the FP condition code bit cc is false (0), the program branches to the effective target address after the instruction in the delay slot is executed. An FP condition code is set by the FP compare instruction, C.cond.fmt.
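FPConditionCode(cc) selects one of eight condition bits in the FCSR. The C sketch below assumes the FCSR layout from the MIPS architecture definition (not restated in this section), in which condition code 0 is bit 23 and condition codes 1 through 7 are bits 25 through 31; the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch of FPConditionCode(cc), assuming FCC0 at FCSR bit 23 and
 * FCC1..FCC7 at FCSR bits 25..31 (bit 24 is not a condition code). */
static bool fp_condition_code(uint32_t fcsr, unsigned cc)
{
    unsigned bit = (cc == 0) ? 23u : 24u + cc;
    return (fcsr >> bit) & 1u;
}

/* BC1F branches when the selected condition code is false (0). */
static bool bc1f_taken(uint32_t fcsr, unsigned cc)
{
    return !fp_condition_code(fcsr, cc);
}
```

BC1T is the mirror image: it branches when `fp_condition_code` returns true.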

**Restrictions:**
Processor operation is UNPREDICTABLE if a control transfer instruction (CTI) is placed in the delay slot of a branch or jump.

**Availability and Compatibility:**
This instruction has been removed in Release 6.

**Operation:**

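\[
\begin{aligned}
\text{I:}\quad & \text{condition} \leftarrow \text{FPConditionCode}(cc) = 0 \\
& \text{target\_offset} \leftarrow (\text{offset}_{15})^{\text{GPRLEN}-(16+2)} \,||\, \text{offset} \,||\, 0^2 \\
\text{I+1:}\quad & \text{if condition then} \\
& \quad \text{PC} \leftarrow \text{PC} + \text{target\_offset} \\
& \text{endif}
\end{aligned}
\]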

**Exceptions:**
Coprocessor Unusable, Reserved Instruction

**Floating Point Exceptions:**
Unimplemented Operation

**Programming Notes:**

With the 18-bit signed instruction offset, the conditional branch range is ±128 KBytes. Use jump (J) or jump register (JR) to branch to addresses outside this range.

This instruction has been removed in Release 6 and has been replaced by the BC1EQZ instruction. Refer to the ‘BC1EQZ’ instruction in this manual for more information.

**Historical Information:**
The MIPS I architecture defines a single floating point condition code, implemented as the coprocessor 1 condition signal (Cp1Cond) and the C bit in the FP Control/Status register. MIPS I, II, and III architectures must have the CC field set to 0, which is implied by the first format in the “Format” section.

The MIPS IV and MIPS32 architectures add seven more Condition Code bits to the original condition code 0. FP compare and conditional branch instructions specify the Condition Code bit to set or test. Both assembler formats are valid for MIPS IV and MIPS32.
BC1FL  Branch on FP False Likely

### Format:
- BC1FL offset (cc = 0 implied)
- BC1FL cc, offset

### Purpose:
Branch on FP False Likely

To test an FP condition code and make a PC-relative conditional branch; execute the instruction in the delay slot only if the branch is taken.

### Description:
- if FPConditionCode(cc) = 0 then branch likely

An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself) in the branch delay slot to form a PC-relative effective target address. If the FP Condition Code bit cc is false (0), the program branches to the effective target address after the instruction in the delay slot is executed. If the branch is not taken, the instruction in the delay slot is not executed.
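The difference between a branch-likely and an ordinary branch lies in what happens to the delay slot. The C sketch below summarizes the two outcomes for a branch-likely instruction at address `pc`, taking the already evaluated condition (for BC1FL, FPConditionCode(cc) = 0) and the sign-extended target offset as inputs; the type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of what happens after a branch-likely instruction. */
typedef struct {
    bool     execute_delay_slot;   /* is the instruction at pc+4 executed? */
    uint32_t next_pc;              /* next instruction fetched after the slot */
} likely_outcome;

static likely_outcome branch_likely(uint32_t pc, int32_t target_offset, bool condition)
{
    likely_outcome out;
    if (condition) {
        out.execute_delay_slot = true;                      /* taken: run the slot */
        out.next_pc = pc + 4u + (uint32_t)target_offset;    /* then go to target */
    } else {
        out.execute_delay_slot = false;                     /* NullifyCurrentInstruction() */
        out.next_pc = pc + 8u;                              /* continue past the slot */
    }
    return out;
}
```

In an ordinary branch such as BC1F, `execute_delay_slot` would be true in both cases.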

An FP condition code is set by the FP compare instruction, C.cond.fmt.

### Restrictions:
Processor operation is **UNPREDICTABLE** if a branch, jump, ERET, DERET, or WAIT instruction is placed in the delay slot of a branch or jump.

### Availability and Compatibility:
This instruction has been removed in Release 6.

### Operation:
This operation specification is for the general Branch On Condition operation with the *tf* (true/false) and *nd* (nullify delay slot) fields as variables. The individual instructions BC1F, BC1FL, BC1T, and BC1TL have specific values for *tf* and *nd*.

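\[
\begin{aligned}
\text{I:}\quad & \text{condition} \leftarrow \text{FPConditionCode}(cc) = 0 \\
& \text{target\_offset} \leftarrow (\text{offset}_{15})^{\text{GPRLEN}-(16+2)} \,||\, \text{offset} \,||\, 0^2 \\
\text{I+1:}\quad & \text{if condition then} \\
& \quad \text{PC} \leftarrow \text{PC} + \text{target\_offset} \\
& \text{else} \\
& \quad \text{NullifyCurrentInstruction()} \\
& \text{endif}
\end{aligned}
\]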

### Exceptions:
- Coprocessor Unusable, Reserved Instruction

### Floating Point Exceptions:
- Unimplemented Operation

### Implementation Note:
Some implementations always predict that the branch will be taken, and neither use nor update the internal branch prediction tables for this instruction. To maintain performance compatibility, future implementations are encouraged to do the same.
**Programming Notes:**

With the 18-bit signed instruction offset, the conditional branch range is ± 128 KBytes. Use jump (J) or jump register (JR) to branch to addresses outside this range.

In Pre-Release 6 implementations, software is strongly encouraged to avoid the use of the Branch Likely instructions, as they will be removed from a future revision of the MIPS Architecture.

Some implementations always predict the branch will be taken, so there is a significant penalty if the branch is not taken. Software should only use this instruction when there is a very high probability (98% or more) that the branch will be taken. If the branch is not likely to be taken or if the probability of a taken branch is unknown, software is encouraged to use the BC1F instruction instead.

**Historical Information:**

The MIPS I architecture defines a single floating point condition code, implemented as the coprocessor 1 condition signal (\(Cp1Cond\)) and the \(C\) bit in the FP Control/Status register. MIPS I, II, and III architectures must have the \(CC\) field set to 0, which is implied by the first format in the “Format” section.

The MIPS IV and MIPS32 architectures add seven more Condition Code bits to the original condition code 0. FP compare and conditional branch instructions specify the Condition Code bit to set or test. Both assembler formats are valid for MIPS IV and MIPS32.
Format:  BC1T offset (cc = 0 implied)  
        BC1T cc, offset  

Purpose:  Branch on FP True  
To test an FP condition code and do a PC-relative conditional branch.  

Description:  if FPConditionCode(cc) = 1 then branch  
An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself) in the branch delay slot to form a PC-relative effective target address. If the FP condition code bit cc is true (1), the program branches to the effective target address after the instruction in the delay slot is executed. An FP condition code is set by the FP compare instruction, C.cond.fmt.  

Restrictions:  
Processor operation is UNPREDICTABLE if a control transfer instruction (CTI) is placed in the delay slot of a branch or jump.  

Availability and Compatibility:  
This instruction has been removed in Release 6.  

Operation:  
I:  condition ← FPConditionCode(cc) = 1  
target_offset ← (offset15)^GPRLEN-(16+2) || offset || 0^2  
I+1:  if condition then  
      PC ← PC + target_offset  
      endif  

Exceptions:  
Coprocessor Unusable, Reserved Instruction  

Floating Point Exceptions:  
Unimplemented Operation  

Programming Notes:  
With the 18-bit signed instruction offset, the conditional branch range is ±128 KBytes. Use jump (J) or jump register (JR) to branch to addresses outside this range.  
This instruction has been replaced by the BC1NEZ instruction. Refer to the ‘BC1NEZ’ instruction in this manual for more information.  

Historical Information:  
The MIPS I architecture defines a single floating point condition code, implemented as the coprocessor 1 condition signal (Cp1Cond) and the C bit in the FP Control/Status register. MIPS I, II, and III architectures must have the CC field set to 0, which is implied by the first format in the “Format” section.
The MIPS IV and MIPS32 architectures add seven more *Condition Code* bits to the original condition code 0. FP compare and conditional branch instructions specify the *Condition Code* bit to set or test. Both assembler formats are valid for MIPS IV and MIPS32.
**BC1TL**

**Branch on FP True Likely**

**Format:**
- BC1TL offset (cc = 0 implied)
- BC1TL cc, offset

**Purpose:**
Branch on FP True Likely

To test an FP condition code and do a PC-relative conditional branch; execute the instruction in the delay slot only if the branch is taken.

**Description:**
if FPConditionCode(cc) = 1 then branch_likely

An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself) in the branch delay slot to form a PC-relative effective target address. If the FP Condition Code bit cc is true (1), the program branches to the effective target address after the instruction in the delay slot is executed. If the branch is not taken, the instruction in the delay slot is not executed.

An FP condition code is set by the FP compare instruction, C.cond.fmt.

**Restrictions:**
Processor operation is **UNPREDICTABLE** if a branch, jump, ERET, DERET, or WAIT instruction is placed in the delay slot of a branch or jump.

**Availability and Compatibility:**
This instruction has been removed in Release 6.

**Operation:**
This operation specification is for the general Branch On Condition operation with the tf (true/false) and nd (nullify delay slot) fields as variables. The individual instructions BC1F, BC1FL, BC1T, and BC1TL have specific values for tf and nd.

\[
\begin{align*}
  I: & \quad \text{condition} \leftarrow \text{FPConditionCode}(cc) = 1 \\
  & \quad \text{target\_offset} \leftarrow (\text{offset}_{15})^{\text{GPRLEN}-(16+2)} \,\|\, \text{offset} \,\|\, 0^2 \\
  I+1: & \quad \text{if condition then} \\
  & \quad \quad \text{PC} \leftarrow \text{PC} + \text{target\_offset} \\
  & \quad \text{else} \\
  & \quad \quad \text{NullifyCurrentInstruction()} \\
  & \quad \text{endif}
\end{align*}
\]
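The only difference between a Branch Likely instruction such as BC1TL and its ordinary counterpart (BC1T) is what happens to the delay slot when the branch is not taken. A minimal sketch, using a hypothetical helper that lists the addresses of the instructions that complete after a conditional branch at branch_pc, assuming 4-byte instructions:

```python
def executed_after_branch(branch_pc: int, target: int, taken: bool, likely: bool) -> list:
    """Addresses of the instructions that complete after a conditional branch at branch_pc.

    Hypothetical model: 4-byte instructions, one delay slot at branch_pc + 4.
    """
    delay_slot = branch_pc + 4
    fall_through = branch_pc + 8
    if taken:
        return [delay_slot, target]       # taken: delay slot runs, then the target
    if likely:
        return [fall_through]             # Likely, not taken: delay slot nullified
    return [delay_slot, fall_through]     # ordinary branch, not taken: delay slot still runs


print([hex(a) for a in executed_after_branch(0x1000, 0x2000, taken=False, likely=True)])  # ['0x1008']
```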

**Exceptions:**
Coprocessor Unusable, Reserved Instruction

**Floating Point Exceptions:**
Unimplemented Operation

**Implementation Note:**
Some implementations always predict that the branch will be taken, and neither use nor update the internal branch prediction tables for this instruction. To maintain performance compatibility, future implementations are encouraged to do the same.
**Programming Notes:**

With the 18-bit signed instruction offset, the conditional branch range is ±128 KBytes. Use jump (J) or jump register (JR) to branch to addresses outside this range.

In Pre-Release 6 implementations, software is strongly encouraged to avoid the use of the Branch Likely instructions, as they will be removed from a future revision of the MIPS Architecture.

Some implementations always predict the branch will be taken, so there is a significant penalty if the branch is not taken. Software should only use this instruction when there is a very high probability (98% or more) that the branch will be taken. If the branch is not likely to be taken or if the probability of a taken branch is unknown, software is encouraged to use the BC1T instruction instead.

**Historical Information:**

The MIPS I architecture defines a single floating point condition code, implemented as the coprocessor 1 condition signal (Cp1Cond) and the C bit in the FP Control/Status register. MIPS I, II, and III architectures must have the CC field set to 0, which is implied by the first format in the “Format” section.

The MIPS IV and MIPS32 architectures add seven more Condition Code bits to the original condition code 0. FP compare and conditional branch instructions specify the Condition Code bit to set or test. Both assembler formats are valid for MIPS IV and MIPS32.
Branch if Coprocessor 2 Condition (Register) Equal/Not Equal to Zero

**Purpose:** Branch if Coprocessor 2 Condition (Register) Equal/Not Equal to Zero

**BC2EQZ:** Branch if Coprocessor 2 Condition (Register) is Equal to Zero

**BC2NEZ:** Branch if Coprocessor 2 Condition (Register) is Not Equal to Zero

**Description:**

- **BC2EQZ:** if COP2Condition[ct] = 0 then branch
- **BC2NEZ:** if COP2Condition[ct] ≠ 0 then branch

The 5-bit field ct specifies a coprocessor 2 condition.

- For BC2EQZ, the branch is taken if the coprocessor 2 condition is false (zero).
- For BC2NEZ, the branch is taken if the coprocessor 2 condition is true (nonzero).

An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself) to form a PC-relative effective target address. The instruction in the delay slot is executed before the instruction at the target.

**Restrictions:**

*Control Transfer Instructions (CTIs) should not be placed in branch delay slots or Release 6 forbidden slots.* CTIs include all branches and jumps, NAL, ERET, ERETNC, DERET, WAIT, and PAUSE.

If a control transfer instruction (CTI) is executed in the delay slot of a branch or jump, Release 6 implementations are required to signal a Reserved Instruction exception.

If access to Coprocessor 2 is not enabled, a Coprocessor Unusable Exception is signaled.

**Availability and Compatibility:**

These instructions are introduced by and required as of Release 6.

**Exceptions:**

Coprocessor Unusable, Reserved Instruction

**Operation:**

```plaintext
tmpcond ← Coprocessor2Condition(ct)
if BC2EQZ then
    tmpcond ← not(tmpcond)
endif
if tmpcond then
    PC ← PC+4 + sign_extend(offset << 2)
endif
```
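The operation above evaluates one condition and inverts it for BC2EQZ. A minimal sketch of that decision, where cop2_condition is a hypothetical boolean stand-in for the implementation-defined Coprocessor2Condition(ct) result and addresses are assumed to be 32 bits:

```python
def bc2_branch_target(pc: int, offset16: int, cop2_condition: bool, is_eqz: bool):
    """Return the target address if BC2EQZ/BC2NEZ at `pc` is taken, else None.

    `cop2_condition` is a hypothetical stand-in for Coprocessor2Condition(ct).
    The delay-slot instruction at pc + 4 runs in either case (not modelled here).
    """
    tmpcond = cop2_condition
    if is_eqz:                       # BC2EQZ branches when the condition is false (zero)
        tmpcond = not tmpcond
    if not tmpcond:
        return None
    offset = offset16 - 0x10000 if offset16 & 0x8000 else offset16  # sign-extend 16 bits
    return (pc + 4 + (offset << 2)) & 0xFFFFFFFF                      # PC+4 + sign_extend(offset << 2)


print(hex(bc2_branch_target(0x1000, 4, cop2_condition=False, is_eqz=True)))  # 0x1014
```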
Implementation Notes:

As of Release 6 these instructions, BC2EQZ and BC2NEZ, replace the pre-Release 6 instructions BC2F and BC2T, which had a 3-bit condition code field (as well as nullify and true/false bits). Release 6 makes all 5 bits of the ct condition code available to the coprocessor designer as a condition specifier.

A customer-defined coprocessor instruction set can implement any sort of condition it wants. For example, it could implement up to 32 single-bit flags, specified by the 5-bit field ct. It could also implement conditions encoded as values in a coprocessor register (such as testing the least significant bit of a coprocessor register), as done by the Release 6 instructions BC1EQZ/BC1NEZ.
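As an illustration of the first option, a hypothetical coprocessor might keep 32 single-bit flags in one register and let ct select a bit. The class and method names below are invented for this sketch; the architecture leaves the meaning of ct to the coprocessor designer:

```python
class FlagCoprocessor:
    """Hypothetical COP2 with 32 single-bit condition flags selected by ct."""

    def __init__(self) -> None:
        self.flags = 0  # bit i holds condition i

    @staticmethod
    def _check(ct: int) -> None:
        if not 0 <= ct <= 31:
            raise ValueError("ct is a 5-bit field (0..31)")

    def set_flag(self, ct: int, value: bool) -> None:
        self._check(ct)
        if value:
            self.flags |= 1 << ct
        else:
            self.flags &= ~(1 << ct)

    def condition(self, ct: int) -> bool:
        """Plays the role of Coprocessor2Condition(ct)."""
        self._check(ct)
        return bool((self.flags >> ct) & 1)


cop2 = FlagCoprocessor()
cop2.set_flag(17, True)
print(cop2.condition(17), cop2.condition(3))  # True False: BC2NEZ 17 and BC2EQZ 3 would both branch
```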
Format:  
BC2F offset (cc = 0 implied)  
MIPS32, removed in Release 6  
BC2F cc, offset  
MIPS32, removed in Release 6

Purpose: Branch on COP2 False
To test a COP2 condition code and do a PC-relative conditional branch.

Description: if COP2Condition(cc) = 0 then branch
An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself) in the branch delay slot to form a PC-relative effective target address. If the COP2 condition specified by cc is false (0), the program branches to the effective target address after the instruction in the delay slot is executed.

Restrictions:
Processor operation is UNPREDICTABLE if a control transfer instruction (CTI) is placed in the delay slot of a branch or jump.

Availability and Compatibility:
This instruction has been removed in Release 6.

Operation:
\[
\begin{align*}
\text{I:} & \quad \text{condition} \leftarrow \text{COP2Condition}(cc) = 0 \\
& \quad \text{target\_offset} \leftarrow (\text{offset}_{15})^{\text{GPRLEN}-(16+2)} \,\|\, \text{offset} \,\|\, 0^2 \\
\text{I+1:} & \quad \text{if condition then} \\
& \quad \quad \text{PC} \leftarrow \text{PC} + \text{target\_offset} \\
& \quad \text{endif}
\end{align*}
\]

Exceptions:
Coprocessor Unusable, Reserved Instruction

Programming Notes:
With the 18-bit signed instruction offset, the conditional branch range is ± 128 KBytes. Use jump (J) or jump register (JR) to branch to addresses outside this range.

This instruction has been replaced by the BC2EQZ instruction. Refer to the ‘BC2EQZ’ instruction in this manual for more information.
Branch on COP2 False Likely

Format:  
BC2FL offset (cc = 0 implied)  
MIPS32, removed in Release 6  
BC2FL cc, offset  
MIPS32, removed in Release 6

Purpose:  
Branch on COP2 False Likely

To test a COP2 condition code and make a PC-relative conditional branch; execute the instruction in the delay slot only if the branch is taken.

Description:  
if COP2Condition(cc) = 0 then branch_likely

An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself) in the branch delay slot to form a PC-relative effective target address. If the COP2 condition specified by cc is false (0), the program branches to the effective target address after the instruction in the delay slot is executed. If the branch is not taken, the instruction in the delay slot is not executed.

Restrictions:
Processor operation is UNPREDICTABLE if a branch, jump, ERET, DERET, or WAIT instruction is placed in the delay slot of a branch or jump.

Availability and Compatibility:
This instruction has been removed in Release 6.

Operation:
This operation specification is for the general Branch On Condition operation with the tf (true/false) and nd (nullify delay slot) fields as variables. The individual instructions BC2F, BC2FL, BC2T, and BC2TL have specific values for tf and nd.

I:  
condition ← COP2Condition(cc) = 0  
target_offset ← (offset_15)^(GPRLEN-(16+2)) || offset || 0^2

I+1:  
if condition then  
PC ← PC + target_offset  
else  
NullifyCurrentInstruction()  
endif
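In the general form, tf selects the condition value that causes a branch and nd selects whether the delay slot is nullified when the branch is not taken. A minimal sketch of that generalization, assuming the (tf, nd) pairing implied by the mnemonics (F = branch on false, T = branch on true, L = Likely, which nullifies):

```python
# (tf, nd) implied by each mnemonic: tf = condition value that branches, nd = nullify delay slot.
BRANCH_VARIANTS = {
    "BC2F": (0, 0),
    "BC2FL": (0, 1),
    "BC2T": (1, 0),
    "BC2TL": (1, 1),
}


def branch_on_condition(mnemonic: str, cond_bit: int) -> tuple:
    """Return (taken, delay_slot_executed) for a COP2 condition bit of 0 or 1."""
    tf, nd = BRANCH_VARIANTS[mnemonic]
    taken = cond_bit == tf
    # The delay slot is skipped only for the Likely forms when the branch is not taken.
    delay_slot_executed = taken or not nd
    return taken, delay_slot_executed


for name in BRANCH_VARIANTS:
    print(name, branch_on_condition(name, 0))
```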

Exceptions:
Coprocessor Unusable, Reserved Instruction

Implementation Note:
Some implementations always predict that the branch will be taken, and neither use nor update the internal branch prediction tables for this instruction. To maintain performance compatibility, future implementations are encouraged to do the same.

Programming Notes:
With the 18-bit signed instruction offset, the conditional branch range is ±128 KBytes. Use jump (J) or jump register (JR) to branch to addresses outside this range.

In Pre-Release 6 implementations, software is strongly encouraged to avoid the use of the Branch Likely instructions,
as they will be removed from a future revision of the MIPS Architecture.

Some implementations always predict the branch will be taken, so there is a significant penalty if the branch is not taken. Software should only use this instruction when there is a very high probability (98% or more) that the branch will be taken. If the branch is not likely to be taken or if the probability of a taken branch is unknown, software is encouraged to use the BC2F instruction instead.
### BC2T Branch on COP2 True

**Format:**
- BC2T offset (cc = 0 implied)
- BC2T cc, offset (MIPS32, removed in Release 6)

**Purpose:**
To test a COP2 condition code and do a PC-relative conditional branch.

**Description:**
if COP2Condition(cc) = 1 then branch

An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself) in the branch delay slot to form a PC-relative effective target address. If the COP2 condition specified by cc is true (1), the program branches to the effective target address after the instruction in the delay slot is executed.

**Restrictions:**
Processor operation is **UNPREDICTABLE** if a control transfer instruction (CTI) is placed in the delay slot of a branch or jump.

**Availability and Compatibility:**
This instruction has been removed in Release 6.

**Operation:**

```plaintext
I:   condition ← COP2Condition(cc) = 1
     target_offset ← (offset_15)^(GPRLEN-(16+2)) || offset || 0^2
I+1: if condition then
         PC ← PC + target_offset
     endif
```

**Exceptions:**
Coprocessor Unusable, Reserved Instruction

**Programming Notes:**
With the 18-bit signed instruction offset, the conditional branch range is ±128 KBytes. Use jump (J) or jump register (JR) to branch to addresses outside this range.

This instruction has been replaced by the BC2NEZ instruction. Refer to the ‘BC2NEZ’ instruction in this manual for more information.
**BC2TL**

### Branch on COP2 True Likely

**Format:**
- BC2TL offset (cc = 0 implied)
- BC2TL cc, offset

**MIPS32, removed in Release 6**

**Purpose:** Branch on COP2 True Likely

To test a COP2 condition code and do a PC-relative conditional branch; execute the instruction in the delay slot only if the branch is taken.

**Description:**
if COP2Condition(cc) = 1 then branch_likely

An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself) in the branch delay slot to form a PC-relative effective target address. If the COP2 condition specified by cc is true (1), the program branches to the effective target address after the instruction in the delay slot is executed. If the branch is not taken, the instruction in the delay slot is not executed.

**Restrictions:**
Processor operation is **UNPREDICTABLE** if a branch, jump, ERET, DERET, or WAIT instruction is placed in the delay slot of a branch or jump.

**Availability and Compatibility:**
This instruction has been removed in Release 6.

**Operation:**
This operation specification is for the general Branch On Condition operation with the tf (true/false) and nd (nullify delay slot) fields as variables. The individual instructions BC2F, BC2FL, BC2T, and BC2TL have specific values for tf and nd.

\[
\begin{align*}
I: & \quad \text{condition} \leftarrow \text{COP2Condition}(cc) = 1 \\
& \quad \text{target\_offset} \leftarrow (\text{offset}_{15})^{\text{GPRLEN}-(16+2)} \,\|\, \text{offset} \,\|\, 0^2 \\
I+1: & \quad \text{if condition then} \\
& \quad \quad \text{PC} \leftarrow \text{PC} + \text{target\_offset} \\
& \quad \text{else} \\
& \quad \quad \text{NullifyCurrentInstruction()} \\
& \quad \text{endif}
\end{align*}
\]

**Exceptions:**
Coprocessor Unusable, Reserved Instruction

**Implementation Note:**
Some implementations always predict that the branch will be taken, and neither use nor update the internal branch prediction tables for this instruction. To maintain performance compatibility, future implementations are encouraged to do the same.

**Programming Notes:**
With the 18-bit signed instruction offset, the conditional branch range is ± 128 KBytes. Use jump (J) or jump register (JR) to branch to addresses outside this range.

In Pre-Release 6 implementations, software is strongly encouraged to avoid the use of the Branch Likely instructions,
as they will be removed from a future revision of the MIPS Architecture.

Some implementations always predict the branch will be taken, so there is a significant penalty if the branch is not taken. Software should only use this instruction when there is a very high probability (98% or more) that the branch will be taken. If the branch is not likely to be taken or if the probability of a taken branch is unknown, software is encouraged to use the BC2T instruction instead.
BEQ

Branch on Equal

Format: BEQ rs, rt, offset

Purpose: Branch on Equal

To compare GPRs then do a PC-relative conditional branch.

Description: if GPR[rs] = GPR[rt] then branch

An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself), in the branch delay slot, to form a PC-relative effective target address.

If the contents of GPR rs and GPR rt are equal, branch to the effective target address after the instruction in the delay slot is executed.

Restrictions:

Control Transfer Instructions (CTIs) should not be placed in branch delay slots or Release 6 forbidden slots. CTIs include all branches and jumps, NAL, ERET, ERETNC, DERET, WAIT, and PAUSE.

Pre-Release 6: Processor operation is UNPREDICTABLE if a control transfer instruction (CTI) is placed in the delay slot of a branch or jump.

Release 6: If a control transfer instruction (CTI) is executed in the delay slot of a branch or jump, Release 6 implementations are required to signal a Reserved Instruction exception.

Operation:

I: target_offset ← sign_extend(offset || 0^2)
   condition ← (GPR[rs] = GPR[rt])
I+1: if condition then
     PC ← PC + target_offset
     endif

Exceptions:

None

Programming Notes:

With the 18-bit signed instruction offset, the conditional branch range is ±128 KBytes. Use jump (J) or jump register (JR) to branch to addresses outside this range.

BEQ r0, r0, offset, expressed as B offset, is the assembly idiom used to denote an unconditional branch.
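An assembler has to invert the target calculation: it computes the word displacement from the delay-slot address and stores it in the 16-bit offset field. A minimal sketch of encoding BEQ, assuming the standard BEQ primary opcode 000100 with rs in bits 25..21, rt in bits 20..16 and offset in bits 15..0 (the encoding table is not part of this excerpt); encode_beq is a hypothetical helper name:

```python
BEQ_OPCODE = 0b000100  # assumed primary opcode for BEQ


def encode_beq(rs: int, rt: int, branch_pc: int, target: int) -> int:
    """Encode `BEQ rs, rt, <target>` located at branch_pc as a 32-bit word."""
    disp = target - (branch_pc + 4)            # measured from the delay slot
    if disp % 4:
        raise ValueError("branch target must be word aligned")
    words = disp >> 2                          # arithmetic shift keeps the sign
    if not -(1 << 15) <= words < (1 << 15):
        raise ValueError("target outside the +/-128 KB branch range")
    return (BEQ_OPCODE << 26) | (rs << 21) | (rt << 16) | (words & 0xFFFF)


# "B ." (BEQ r0, r0 branching to itself) is a one-instruction infinite loop.
print(hex(encode_beq(0, 0, 0x00400000, 0x00400000)))  # 0x1000ffff
```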
BEQL Branch on Equal Likely

Format: BEQL rs, rt, offset

MIPS32, removed in Release 6

Purpose: Branch on Equal Likely
To compare GPRs then do a PC-relative conditional branch; execute the delay slot only if the branch is taken.

Description: if GPR[rs] = GPR[rt] then branch_likely
An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself), in the branch delay slot, to form a PC-relative effective target address.
If the contents of GPR rs and GPR rt are equal, branch to the effective target address after the instruction in the delay slot is executed. If the branch is not taken, the instruction in the delay slot is not executed.

Restrictions:
Processor operation is UNPREDICTABLE if a branch, jump, ERET, DERET, or WAIT instruction is placed in the delay slot of a branch or jump.

Availability and Compatibility:
This instruction has been removed in Release 6.

Operation:

\begin{align*}
  \text{I:} & \quad \text{target\_offset} \leftarrow \text{sign\_extend}(\text{offset} \,\|\, 0^2) \\
  & \quad \text{condition} \leftarrow (\text{GPR}[rs] = \text{GPR}[rt]) \\
  \text{I+1:} & \quad \text{if condition then} \\
  & \quad \quad \text{PC} \leftarrow \text{PC} + \text{target\_offset} \\
  & \quad \text{else} \\
  & \quad \quad \text{NullifyCurrentInstruction()} \\
  & \quad \text{endif}
\end{align*}

Exceptions:
None

Implementation Note:
Some implementations always predict that the branch will be taken, and neither use nor update the internal branch prediction tables for this instruction. To maintain performance compatibility, future implementations are encouraged to do the same.

Programming Notes:
With the 18-bit signed instruction offset, the conditional branch range is ±128 KBytes. Use jump (J) or jump register (JR) to branch to addresses outside this range.

In Pre-Release 6 implementations, software is strongly encouraged to avoid the use of the Branch Likely instructions, as they will be removed from a future revision of the MIPS Architecture.

Some implementations always predict the branch will be taken, so there is a significant penalty if the branch is not taken. Software should only use this instruction when there is a very high probability (98% or more) that the branch will be taken. If the branch is not likely to be taken or if the probability of a taken branch is unknown, software is encouraged to use the BEQ instruction instead.

**Historical Information:**

In the MIPS I architecture, this instruction signaled a Reserved Instruction exception.
BGEZ Branch on Greater Than or Equal to Zero

**Format:**  
BGEZ rs, offset

**Purpose:** Branch on Greater Than or Equal to Zero  
To test a GPR then do a PC-relative conditional branch

**Description:**  
if GPR[rs] ≥ 0 then branch  
An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself), in the branch delay slot, to form a PC-relative effective target address.  
If the contents of GPR rs are greater than or equal to zero (sign bit is 0), branch to the effective target address after the instruction in the delay slot is executed.

**Restrictions:**  
Control Transfer Instructions (CTIs) should not be placed in branch delay slots or Release 6 forbidden slots. CTIs include all branches and jumps, NAL, ERET, ERETNC, DERET, WAIT, and PAUSE.  
Pre-Release 6: Processor operation is UNPREDICTABLE if a control transfer instruction (CTI) is placed in the delay slot of a branch or jump.  
Release 6: If a control transfer instruction (CTI) is executed in the delay slot of a branch or jump, Release 6 implementations are required to signal a Reserved Instruction exception.

**Operation:**  
I:  
   target_offset ← sign_extend(offset || 0²)  
   condition ← GPR[rs] ≥ 0^GPRLEN  
I+1: if condition then  
   PC ← PC + target_offset  
   endif
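Because GPR contents are two's-complement values, "greater than or equal to zero" is the same test as "sign bit is 0", as the description states. A small sketch, assuming 32-bit registers held as unsigned integers, checks that the sign-bit test and the signed comparison agree:

```python
GPRLEN = 32  # assumed register width (MIPS32)


def to_signed(value: int, bits: int = GPRLEN) -> int:
    """Interpret an unsigned register value as two's complement."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def bgez_condition(gpr_rs: int) -> bool:
    """BGEZ condition: the sign bit (bit GPRLEN-1) of GPR[rs] is 0."""
    return ((gpr_rs >> (GPRLEN - 1)) & 1) == 0


for v in (0x00000000, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFF):
    print(hex(v), to_signed(v), bgez_condition(v))
```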

**Exceptions:**  
None

**Programming Notes:**  
With the 18-bit signed instruction offset, the conditional branch range is ± 128 KBytes. Use jump (J) or jump register (JR) to branch to addresses outside this range.
BGEZAL  
Branch on Greater Than or Equal to Zero and Link

**Format:**  
BGEZAL rs, offset  
MIPS32, removed in Release 6

**Purpose:**  
Branch on Greater Than or Equal to Zero and Link  
To test a GPR then do a PC-relative conditional procedure call

**Description:**  
if GPR[rs] ≥ 0 then procedure_call  
Place the return address link in GPR 31. The return link is the address of the second instruction following the branch, where execution continues after a procedure call.

An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself), in the branch delay slot, to form a PC-relative effective target address.

If the contents of GPR rs are greater than or equal to zero (sign bit is 0), branch to the effective target address after the instruction in the delay slot is executed.

**Availability and Compatibility:**  
This instruction has been removed in Release 6, with the exception of the special case BAL (unconditional Branch and Link), which was an alias for BGEZAL with rs = 0.

**Restrictions:**  
Processor operation is UNPREDICTABLE if a control transfer instruction (CTI) is placed in the delay slot of a branch or jump.

**Branch-and-link Restartability:** GPR 31 must not be used for the source register rs, because such an instruction does not have the same effect when reexecuted. The result of executing such an instruction is UNPREDICTABLE. This restriction permits an exception handler to resume execution by reexecuting the branch when an exception occurs in the branch delay slot or forbidden slot.

**Operation:**

```plaintext
I:   target_offset ← sign_extend(offset || 0^2)
     condition ← GPR[rs] ≥ 0^GPRLEN
     GPR[31] ← PC + 8
I+1: if condition then
         PC ← PC + target_offset
     endif
```
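The restartability rule follows from the operation above: GPR[31] is written even when the branch is not taken, so if rs were GPR 31, re-executing the branch after an exception in the delay slot could evaluate a different condition. A minimal sketch of cycle I of BGEZAL, assuming 32-bit registers and a register file modelled as a plain Python list (both assumptions of the sketch):

```python
def bgezal_step(gpr: list, rs: int, pc: int, offset16: int):
    """Model cycle I of BGEZAL: evaluate the condition, write the link, compute the target.

    Returns the branch target if taken, else None. Mutates gpr[31].
    """
    offset = offset16 - 0x10000 if offset16 & 0x8000 else offset16
    target = (pc + 4 + (offset << 2)) & 0xFFFFFFFF
    condition = ((gpr[rs] & 0xFFFFFFFF) >> 31) == 0   # GPR[rs] >= 0 (sign bit clear)
    gpr[31] = (pc + 8) & 0xFFFFFFFF                   # link written unconditionally
    return target if condition else None


regs = [0] * 32
regs[31] = 0x80000000                                 # negative: first execution not taken
first = bgezal_step(regs, 31, 0x00400000, 0x0010)
again = bgezal_step(regs, 31, 0x00400000, 0x0010)     # re-executed after an exception
print(first, hex(again))                              # None 0x400044: a different outcome
```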

**Exceptions:**  
None

**Programming Notes:**  
With the 18-bit signed instruction offset, the conditional branch range is ± 128 KBytes. Use jump and link (JAL) or jump and link register (JALR) instructions for procedure calls to addresses outside this range.

BGEZAL r0, offset, expressed as BAL offset, is the assembly idiom used to denote a PC-relative branch and link. BAL is used in a manner similar to JAL, but provides PC-relative addressing and a more limited target PC range.
### B{LE,GE,GT,LT,EQ,NE}ZALC Compact Zero-Compare and Branch-and-Link Instructions

#### Format:

<table>
<thead>
<tr>
<th>Format</th>
<th>MIPS32 Release 6</th>
</tr>
</thead>
<tbody>
<tr>
<td>BLEZALC rt, offset</td>
<td></td>
</tr>
<tr>
<td>BGEZALC rt, offset</td>
<td></td>
</tr>
<tr>
<td>BGTZALC rt, offset</td>
<td></td>
</tr>
<tr>
<td>BLTZALC rt, offset</td>
<td></td>
</tr>
<tr>
<td>BEQZALC rt, offset</td>
<td></td>
</tr>
<tr>
<td>BNEZALC rt, offset</td>
<td></td>
</tr>
</tbody>
</table>

#### Purpose:

Compact Zero-Compare and Branch-and-Link Instructions

- **BLEZALC**: Compact branch-and-link if GPR rt is less than or equal to zero
- **BGEZALC**: Compact branch-and-link if GPR rt is greater than or equal to zero
- **BGTZALC**: Compact branch-and-link if GPR rt is greater than zero
- **BLTZALC**: Compact branch-and-link if GPR rt is less than zero
- **BEQZALC**: Compact branch-and-link if GPR rt is equal to zero
- **BNEZALC**: Compact branch-and-link if GPR rt is not equal to zero

#### Description:

if condition(GPR[rt]) then procedure_call branch (no delay slot)

The condition is evaluated. If the condition is true, the branch is taken.

Places the return address link in GPR 31. The return link is the address of the instruction immediately following the branch, where execution continues after a procedure call.

The return address link is unconditionally updated.

An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself) to form a PC-relative effective target address.

BLEZALC: the condition is true if and only if GPR rt is less than or equal to zero.
BGEZALC: the condition is true if and only if GPR rt is greater than or equal to zero.
BLTZALC: the condition is true if and only if GPR rt is less than zero.
BGTZALC: the condition is true if and only if GPR rt is greater than zero.
BEQZALC: the condition is true if and only if GPR rt is equal to zero.
BNEZALC: the condition is true if and only if GPR rt is not equal to zero.

Compact branches do not have delay slots. The instruction after a compact branch is only executed if the branch is not taken.

### Restrictions:

Control Transfer Instructions (CTIs) should not be placed in branch delay slots or Release 6 forbidden slots. CTIs include all branches and jumps, NAL, ERET, ERETNC, DERET, WAIT, and PAUSE.

If a control transfer instruction (CTI) is executed in the forbidden slot of a compact branch, Release 6 implementations are required to signal a Reserved Instruction exception, but only when the branch is not taken.

Branch-and-link Restartability: GPR 31 must not be used for the source registers, because such an instruction does not have the same effect when reexecuted. The result of executing such an instruction is UNPREDICTABLE. This restriction permits an exception handler to resume execution by reexecuting the branch when an exception occurs in the branch delay slot or forbidden slot.

### Availability and Compatibility:

These instructions are introduced by and required as of Release 6.
- BEQZALC reuses the opcode assigned to pre-Release 6 ADDI.
- BNEZALC reuses the opcode assigned to pre-Release 6 MIPS64 DADDI.

These instructions occupy primary opcode spaces originally allocated to other instructions. BLEZALC and BGEZALC have the same primary opcode as BLEZ, and are distinguished by rs and rt register numbers. Similarly, BGTZALC and BLTZALC have the same primary opcode as BGTZ, and are distinguished by register fields.

BEQZALC and BNEZALC reuse the primary opcodes ADDI and DADDI.

### Exceptions:

None

### Operation:

\[
\begin{align*}
\text{GPR}[31] & \leftarrow \text{PC}+4 \\
target\_offset & \leftarrow \text{sign\_extend}( offset \| 0^2 )
\end{align*}
\]

BLTZALC: \( \text{cond} \leftarrow \text{GPR}[rt] < 0 \)
BLEZALC: \( \text{cond} \leftarrow \text{GPR}[rt] \leq 0 \)
BGEZALC: \( \text{cond} \leftarrow \text{GPR}[rt] \geq 0 \)
BGTZALC: \( \text{cond} \leftarrow \text{GPR}[rt] > 0 \)
BEQZALC: \( \text{cond} \leftarrow \text{GPR}[rt] = 0 \)
BNEZALC: \( \text{cond} \leftarrow \text{GPR}[rt] \neq 0 \)

if \( \text{cond} \) then
  \( \text{PC} \leftarrow \text{PC} + 4 + target\_offset \)
endif
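Compared with BGEZAL, the compact forms link to PC+4 (there is no delay slot) and continue at PC+4 when the branch is not taken. A minimal sketch of the operation above, assuming 32-bit registers stored as unsigned integers; the condition table mirrors the list in the Operation section, and zalc_step is a hypothetical name:

```python
import operator

# Signed comparisons against zero, keyed by mnemonic (mirrors the Operation list).
ZALC_CONDITIONS = {
    "BLTZALC": operator.lt,
    "BLEZALC": operator.le,
    "BGEZALC": operator.ge,
    "BGTZALC": operator.gt,
    "BEQZALC": operator.eq,
    "BNEZALC": operator.ne,
}


def zalc_step(mnemonic: str, gpr: list, rt: int, pc: int, offset16: int) -> int:
    """Execute one compact branch-and-link and return the next PC. Writes gpr[31]."""
    value = gpr[rt] & 0xFFFFFFFF
    signed = value - (1 << 32) if value >> 31 else value
    offset = offset16 - 0x10000 if offset16 & 0x8000 else offset16
    gpr[31] = (pc + 4) & 0xFFFFFFFF                  # link is updated unconditionally
    if ZALC_CONDITIONS[mnemonic](signed, 0):
        return (pc + 4 + (offset << 2)) & 0xFFFFFFFF
    return (pc + 4) & 0xFFFFFFFF                     # no delay slot: fall through


r = [0] * 32
r[5] = 0xFFFFFFFF                                    # -1
print(hex(zalc_step("BLTZALC", r, 5, 0x2000, 0x0002)), hex(r[31]))  # 0x200c 0x2004
```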

### Programming Notes:

Software that performs incomplete instruction decode may incorrectly decode these new instructions because of their very tight encoding. For example, a disassembler might look only at the primary opcode field, instruction bits 31-26, to decode BLEZL without checking that the “rt” field is zero. Such software already violated the pre-Release 6 architecture specification.
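The same pitfall applies to every shared primary opcode. As an illustration, here is a sketch that decodes the instructions sharing the BLEZ primary opcode. It assumes the BLEZ opcode value 000110 and the Release 6 rules (not spelled out in this excerpt) that BLEZ has rt = 0, BLEZALC has rs = 0 and rt ≠ 0, BGEZALC has rs = rt ≠ 0, and BGEUC covers the remaining rs ≠ rt cases with both registers nonzero; check these rules against the opcode tables before relying on them:

```python
BLEZ_OPCODE = 0b000110  # assumed primary opcode shared by BLEZ/BLEZALC/BGEZALC/BGEUC


def decode_pop06(word: int) -> str:
    """Decode an instruction word whose primary opcode (bits 31..26) is BLEZ's."""
    if ((word >> 26) & 0x3F) != BLEZ_OPCODE:
        raise ValueError("not a BLEZ-opcode instruction")
    rs = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    if rt == 0:
        return "BLEZ"        # the only form that existed before Release 6
    if rs == 0:
        return "BLEZALC"
    if rs == rt:
        return "BGEZALC"
    return "BGEUC"


print(decode_pop06((BLEZ_OPCODE << 26) | (3 << 16)))  # BLEZALC
```

A decoder that stopped after the opcode check would report all four of these as BLEZ.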

With the 16-bit offset shifted left 2 bits and sign extended, the conditional branch range is ± 128 KBytes. Other instructions such as pre-Release 6 JAL and JALR, or Release 6 JIALC and BALC have larger ranges. In particular, BALC, with a 26-bit offset shifted by 2 bits, has a 28-bit range, ± 128 MBytes. Code sequences using AUIPC and JIALC allow still greater PC-relative range.
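The ranges quoted above follow directly from the field widths: an n-bit offset field shifted left 2 bits gives an (n+2)-bit signed byte displacement, so the reach is about ±2^(n+1) bytes. A quick check for the 16-bit, 21-bit (BEQZC/BNEZC) and 26-bit (BALC) offset fields:

```python
def branch_range_bytes(offset_bits: int) -> tuple:
    """Min/max byte displacement for an offset field of `offset_bits` bits, shifted left 2."""
    return -(1 << (offset_bits + 1)), (1 << (offset_bits + 1)) - 4


for bits in (16, 21, 26):
    lo, hi = branch_range_bytes(bits)
    print(f"{bits}-bit offset: {lo} .. {hi} bytes (about +/- {(-lo) // 1024} KB)")
```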
BGEZALL  Branch on Greater Than or Equal to Zero and Link Likely

**Format:**  BGEZALL rs, offset  

**MIPS32, removed in Release 6**

**Purpose:**  Branch on Greater Than or Equal to Zero and Link Likely  
To test a GPR then do a PC-relative conditional procedure call; execute the delay slot only if the branch is taken.

**Description:**  if GPR[rs] ≥ 0 then procedure_call_likely  
Place the return address link in GPR 31. The return link is the address of the second instruction following the branch, where execution continues after a procedure call.

An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself), in the branch delay slot, to form a PC-relative effective target address.

If the contents of GPR rs are greater than or equal to zero (sign bit is 0), branch to the effective target address after the instruction in the delay slot is executed. If the branch is not taken, the instruction in the delay slot is not executed.

**Restrictions:**  Processor operation is UNPREDICTABLE if a control transfer instruction (CTI) is placed in the delay slot of a branch or jump.

*Branch-and-link Restartability:*  GPR 31 must not be used for the source register rs, because such an instruction does not have the same effect when reexecuted. The result of executing such an instruction is UNPREDICTABLE. This restriction permits an exception handler to resume execution by reexecuting the branch when an exception occurs in the branch delay slot.

**Availability and Compatibility:**  
This instruction has been removed in Release 6.

**Operation:**  

```
I:  target_offset ← sign_extend(offset || 0^2)
    condition ← GPR[rs] ≥ 0^GPRLEN
    GPR[31] ← PC + 8
I+1: if condition then
      PC ← PC + target_offset
      else
      NullifyCurrentInstruction()
      endif
```

**Exceptions:**  
None

**Programming Notes:**  
With the 18-bit signed instruction offset, the conditional branch range is ± 128 KBytes. Use jump and link (JAL) or jump and link register (JALR) instructions for procedure calls to addresses outside this range.

Some implementations always predict the branch will be taken, so there is a significant penalty if the branch is not taken. Software should only use this instruction when there is a very high probability (98% or more) that the branch will be taken. If the branch is not likely to be taken or if the probability of a taken branch is unknown, software is encouraged to use the BGEZAL instruction instead.

**Historical Information:**

In the MIPS I architecture, this instruction signaled a Reserved Instruction exception.
**B<cond>C Compact Compare-and-Branch Instructions**

**Format:**  
`B<cond>C rs, rt, offset`  

**Purpose:** Compact Compare-and-Branch Instructions

**Format Details:**

Equal/Not-Equal register-register compare and branch with 16-bit offset:

- `BEQC rs, rt, offset`
- `BNEC rs, rt, offset`

---

### MIPS32 Release 6

The MIPS32® Instruction Set Manual, Revision 6.04

Copyright © 2015 Imagination Technologies LTD. and/or its Affiliated Group Companies. All rights reserved.
Signed register-register compare and branch with 16-bit offset:

- `BLTC rs, rt, offset`
- `BGEC rs, rt, offset`

Unsigned register-register compare and branch with 16-bit offset:

- `BLTUC rs, rt, offset`
- `BGEUC rs, rt, offset`

Assembly idioms with reversed operands for signed/unsigned compare-and-branch:

- `BGTC rt, rs, offset`
- `BLEC rt, rs, offset`
- `BGTUC rt, rs, offset`
- `BLEUC rt, rs, offset`

Signed Compare register to Zero and branch with 16-bit offset:

- `BLTZC rt, offset`
- `BLEZC rt, offset`
- `BGEZC rt, offset`
- `BGTZC rt, offset`

Equal/Not-equal Compare register to Zero and branch with 21-bit offset:

- `BEQZC rs, offset`
- `BNEZC rs, offset`

Description: if condition(GPR[rs] and/or GPR[rt]) then compact branch (no delay slot)

The condition is evaluated. If the condition is true, the branch is taken.

An 18/23-bit signed offset (the 16/21-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself), to form a PC-relative effective target address.

The offset is 16 bits for most compact branches, including BEQC, BNEC, BLTC, BGEC, BLTUC, BGEUC, BGTC, BLEC, BGTUC, BLEUC, BLTZC, BLEZC, BGEZC, and BGTZC. The offset is 21 bits for BEQZC and BNEZC.

Compact branches have no delay slot: the instruction after the branch is NOT executed if the branch is taken.
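The target computation for compact branches differs from the delay-slot branches only in the width of the offset field. A minimal C sketch, assuming 32-bit addresses that wrap modulo 2^32; `sign_extend_bits` and `compact_branch_target` are hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low `bits` bits of `field` (hypothetical helper). */
static int32_t sign_extend_bits(uint32_t field, unsigned bits)
{
    assert(bits > 0 && bits <= 30);
    uint32_t mask = (1u << bits) - 1u;
    field &= mask;
    if (field & (1u << (bits - 1)))               /* sign bit of the field set? */
        return (int32_t)field - (int32_t)(1u << bits);
    return (int32_t)field;
}

/* Target of a compact branch at `pc`: the offset field is a word count
 * (16 bits wide for most compact branches, 21 bits for BEQZC/BNEZC),
 * scaled by 4 and added to the address of the following instruction. */
static uint32_t compact_branch_target(uint32_t pc, uint32_t offset_field,
                                      unsigned bits)
{
    int32_t words = sign_extend_bits(offset_field, bits);
    return pc + 4u + (uint32_t)words * 4u;        /* wraps modulo 2^32 */
}
```

With a 21-bit field the reachable displacement grows to roughly ±4 MBytes (2^20 words of 4 bytes in each direction), compared with ±128 KBytes for a 16-bit field.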

The conditions are as follows:

Equal/Not-equal register-register compare-and-branch with 16-bit offset:

- `BEQC`: Compact branch if GPRs are equal
- `BNEC`: Compact branch if GPRs are not equal

Signed register-register compare and branch with 16-bit offset:

- `BLTC`: Compact branch if GPR rs is less than GPR rt
- `BGEC`: Compact branch if GPR rs is greater than or equal to GPR rt

Unsigned register-register compare and branch with 16-bit offset:

- `BLTUC`: Compact branch if GPR rs is less than GPR rt, unsigned
- `BGEUC`: Compact branch if GPR rs is greater than or equal to GPR rt, unsigned

Assembly Idioms with Operands Reversed:

- `BLEC`: Compact branch if GPR rt is less than or equal to GPR rs (alias for BGEC)
- `BGTC`: Compact branch if GPR rt is greater than GPR rs (alias for BLTC)
- `BLEUC`: Compact branch if GPR rt is less than or equal to GPR rs, unsigned (alias for BGEUC)
- `BGTUC`: Compact branch if GPR rt is greater than GPR rs, unsigned (alias for BLTUC)

Compare register to zero and branch with 16-bit offset:
- BLTZC: Compact branch if GPR rt is less than zero
- BLEZC: Compact branch if GPR rt is less than or equal to zero
- BGEZC: Compact branch if GPR rt is greater than or equal to zero
- BGTZC: Compact branch if GPR rt is greater than zero

Compare register to zero and branch with 21-bit offset:
- BEQZC: Compact branch if GPR rs is equal to zero
- BNEZC: Compact branch if GPR rs is not equal to zero

Restrictions:

Control Transfer Instructions (CTIs) should not be placed in branch delay slots or Release 6 forbidden slots. CTIs include all branches and jumps, NAL, ERET, ERETNC, DERET, WAIT, and PAUSE.

If a control transfer instruction (CTI) is placed in the forbidden slot of a compact branch, Release 6 implementations are required to signal a Reserved Instruction exception, but only when the branch is not taken.

Availability and Compatibility:

These instructions are introduced by and required as of Release 6.
- BEQZC reuses the opcode assigned to pre-Release 6 LDC2.
- BNEZC reuses the opcode assigned to pre-Release 6 SDC2.
- BEQC reuses the opcode assigned to pre-Release 6 ADDI.
- BNEC reuses the opcode assigned to pre-Release 6 MIPS64 DADDI.

Exceptions:

None

Operation:

```
target_offset ← sign_extend( offset || 0^2 )   /* offset is 16 bits, or 21 bits for BEQZC/BNEZC */

/* Register-register compare and branch, 16-bit offset: */
/* Equal / Not-Equal */
BEQC:  cond ← GPR[rs] = GPR[rt]
BNEC:  cond ← GPR[rs] ≠ GPR[rt]
/* Signed */
BLTC:  cond ← GPR[rs] < GPR[rt]
BGEC:  cond ← GPR[rs] ≥ GPR[rt]
/* Unsigned */
BLTUC: cond ← unsigned(GPR[rs]) < unsigned(GPR[rt])
BGEUC: cond ← unsigned(GPR[rs]) ≥ unsigned(GPR[rt])

/* Compare register to zero, 16-bit offset: */
BLTZC: cond ← GPR[rt] < 0
BLEZC: cond ← GPR[rt] ≤ 0
BGEZC: cond ← GPR[rt] ≥ 0
BGTZC: cond ← GPR[rt] > 0
/* Compare register to zero, 21-bit offset: */
BEQZC: cond ← GPR[rs] = 0
BNEZC: cond ← GPR[rs] ≠ 0

if cond then
    PC ← PC + 4 + target_offset
endif
```
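The register-register conditions above can be modeled on raw 32-bit register values; the signed and unsigned forms differ only in how the same bit pattern is interpreted. In this hypothetical C sketch, `CompactOp`, `compact_cond`, and the idiom helpers are illustrative names. The reversed-operand idioms need no logic of their own: each one evaluates the canonical condition with the two registers swapped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for the register-register compact branches. */
typedef enum { BEQC, BNEC, BLTC, BGEC, BLTUC, BGEUC } CompactOp;

static bool compact_cond(CompactOp op, uint32_t rs, uint32_t rt)
{
    int32_t s = (int32_t)rs, t = (int32_t)rt;   /* signed view of the same bits */
    switch (op) {
    case BEQC:  return rs == rt;
    case BNEC:  return rs != rt;
    case BLTC:  return s < t;
    case BGEC:  return s >= t;
    case BLTUC: return rs < rt;                  /* unsigned view */
    case BGEUC: return rs >= rt;
    }
    assert(!"unknown compact branch");
    return false;
}

/* Assembly idioms: "BLEC a, b" branches if a <= b, which is BGEC b, a. */
static bool blec(uint32_t a, uint32_t b)  { return compact_cond(BGEC, b, a); }
static bool bgtc(uint32_t a, uint32_t b)  { return compact_cond(BLTC, b, a); }
static bool bleuc(uint32_t a, uint32_t b) { return compact_cond(BGEUC, b, a); }
static bool bgtuc(uint32_t a, uint32_t b) { return compact_cond(BLTUC, b, a); }
```

The value 0xFFFFFFFF shows the difference: it is -1 for BLTC/BGEC but the largest possible value for BLTUC/BGEUC.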

Programming Notes:
Legacy software that performs incomplete instruction decode may incorrectly decode these new instructions, because of their very tight encoding. For example, a disassembler that looks only at the primary opcode field (instruction bits 31-26) to decode BLEZL without checking that the “rt” field is zero violates the pre-Release 6 architecture specification. Complete instruction decode allows reuse of pre-Release 6 BLEZL opcode for Release 6 conditional branches.
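A complete decode of the former BLEZL primary opcode (010110) has to inspect both register fields. The C sketch below assumes the Release 6 encoding of this opcode (named POP26 in the Release 6 opcode tables): rs = 0 with rt ≠ 0 selects BLEZC, rs = rt ≠ 0 selects BGEZC, and rs ≠ rt with both nonzero selects BGEC (which BLEC shares, with operands swapped). Those field assignments are not spelled out on this page, so treat them as an assumption to check against the encoding tables in Volume II; `decode_pop26` and `Pop26Insn` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define OP_POP26 0x16u   /* primary opcode 010110: BLEZL before Release 6 */

typedef enum {
    DEC_LEGACY_BLEZL,    /* rt = 0: pre-Release 6 BLEZL, not a Release 6 compact branch */
    DEC_BLEZC,           /* rs = 0, rt != 0 */
    DEC_BGEZC,           /* rs = rt, rt != 0 */
    DEC_BGEC             /* rs != rt, both nonzero (BLEC is BGEC with operands swapped) */
} Pop26Insn;

static Pop26Insn decode_pop26(uint32_t insn)
{
    uint32_t rs = (insn >> 21) & 0x1Fu;   /* bits 25..21 */
    uint32_t rt = (insn >> 16) & 0x1Fu;   /* bits 20..16 */

    assert((insn >> 26) == OP_POP26);
    if (rt == 0)
        return DEC_LEGACY_BLEZL;
    if (rs == 0)
        return DEC_BLEZC;
    if (rs == rt)
        return DEC_BGEZC;
    return DEC_BGEC;
}
```

A decoder that tested only bits 31-26 would report every one of these words as BLEZL.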
BGEZL  Branch on Greater Than or Equal to Zero Likely

Format:  BGEZL rs, offset

Purpose:  Branch on Greater Than or Equal to Zero Likely

To test a GPR then do a PC-relative conditional branch; execute the delay slot only if the branch is taken.

Description:
if GPR[rs] ≥ 0 then branch_likely

An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself), in the branch delay slot, to form a PC-relative effective target address.

If the contents of GPR rs are greater than or equal to zero (sign bit is 0), branch to the effective target address after the instruction in the delay slot is executed. If the branch is not taken, the instruction in the delay slot is not executed.
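The "likely" behavior is easiest to understand side by side with the other branch forms in this chapter. This hypothetical C model ignores exceptions and the Release 6 forbidden-slot check and records only two things: whether the instruction at PC + 4 runs as the branch's delay slot, and where fetch continues afterward. Ordinary branches (such as BGEZ) always run the delay slot, Branch Likely forms (such as BGEZL) run it only when taken, and compact branches have no delay slot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { BR_DELAY, BR_LIKELY, BR_COMPACT } BranchKind;

typedef struct {
    bool     slot_executed;  /* did the instruction at pc + 4 run as a delay slot? */
    uint32_t next_pc;        /* where fetch continues after the branch (and slot) */
} BranchOutcome;

static BranchOutcome branch_outcome(BranchKind kind, bool taken,
                                    uint32_t pc, uint32_t target)
{
    BranchOutcome o;
    assert(target % 4u == 0);
    switch (kind) {
    case BR_DELAY:                 /* delay slot always runs */
        o.slot_executed = true;
        o.next_pc = taken ? target : pc + 8u;
        break;
    case BR_LIKELY:                /* slot nullified when not taken, still skipped */
        o.slot_executed = taken;
        o.next_pc = taken ? target : pc + 8u;
        break;
    case BR_COMPACT:               /* no delay slot at all */
    default:
        o.slot_executed = false;
        o.next_pc = taken ? target : pc + 4u;
        break;
    }
    return o;
}
```

The not-taken Branch Likely case is the distinctive one: fetch still resumes at PC + 8, but the skipped instruction has no architectural effect.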

Restrictions:
Processor operation is UNPREDICTABLE if a branch, jump, ERET, DERET, or WAIT instruction is placed in the delay slot of a branch or jump.

Availability and Compatibility:
This instruction has been removed in Release 6.

Operation:
```
I:   target_offset ← sign_extend(offset || 0^2)
     condition ← GPR[rs] ≥ 0^GPRLEN
I+1: if condition then
         PC ← PC + target_offset
     else
         NullifyCurrentInstruction()
     endif
```

Exceptions:
None

Implementation Note:
Some implementations always predict that the branch will be taken, and neither use nor update the internal branch prediction tables for this instruction. To maintain performance compatibility, future implementations are encouraged to do the same.

Programming Notes:
With the 18-bit signed instruction offset, the conditional branch range is ±128 KBytes. Use jump (J) or jump register (JR) to branch to addresses outside this range.

In Pre-Release 6 implementations, software is strongly encouraged to avoid the use of the Branch Likely instructions, as they will be removed from a future revision of the MIPS Architecture.

Some implementations always predict the branch will be taken, so there is a significant penalty if the branch is not taken. Software should only use this instruction when there is a very high probability (98% or more) that the branch will be taken. If the branch is not likely to be taken or if the probability of a taken branch is unknown, software is encouraged to use the BGEZ instruction instead.

Historical Information:

In the MIPS I architecture, this instruction signaled a Reserved Instruction exception.

BGTZ Branch on Greater Than Zero

**Format:**  BGTZ rs, offset

**Purpose:** Branch on Greater Than Zero

To test a GPR then do a PC-relative conditional branch.

**Description:** if GPR[rs] > 0 then branch

An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself), in the branch delay slot, to form a PC-relative effective target address.

If the contents of GPR rs are greater than zero (sign bit is 0 but value not zero), branch to the effective target address after the instruction in the delay slot is executed.
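The sign-bit descriptions given for the compare-with-zero branches (BLTZ, BGEZ, BLEZ, BGTZ and their variants) are equivalent to ordinary signed comparisons. A small C sketch with hypothetical function names, assuming 32-bit GPR values, makes the equivalence checkable:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static bool sign_bit(uint32_t v) { return (v >> 31) != 0; }

static bool cond_bltz(uint32_t v) { return sign_bit(v); }             /* sign bit is 1 */
static bool cond_bgez(uint32_t v) { return !sign_bit(v); }            /* sign bit is 0 */
static bool cond_blez(uint32_t v) { return sign_bit(v) || v == 0; }   /* sign bit 1 or value zero */
static bool cond_bgtz(uint32_t v) { return !sign_bit(v) && v != 0; }  /* sign bit 0, value nonzero */

/* True if every sign-bit formulation agrees with a signed compare against 0. */
static bool matches_signed_compare(uint32_t v)
{
    int32_t s = (int32_t)v;
    return cond_bltz(v) == (s < 0) && cond_bgez(v) == (s >= 0) &&
           cond_blez(v) == (s <= 0) && cond_bgtz(v) == (s > 0);
}
```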

**Restrictions:**

*Control Transfer Instructions (CTIs) should not be placed in branch delay slots or Release 6 forbidden slots.* CTIs include all branches and jumps, NAL, ERET, ERETNC, DERET, WAIT, and PAUSE.

Pre-Release 6: Processor operation is **UNPREDICTABLE** if a control transfer instruction (CTI) is placed in the delay slot of a branch or jump.

Release 6: If a control transfer instruction (CTI) is executed in the delay slot of a branch or jump, Release 6 implementations are required to signal a Reserved Instruction exception.

**Operation:**

\[
\begin{aligned}
\text{I:}\quad & \text{target\_offset} \leftarrow \text{sign\_extend}(\text{offset} \mathbin{\|} 0^2) \\
& \text{condition} \leftarrow \text{GPR}[rs] > 0^{\text{GPRLEN}} \\
\text{I+1:}\quad & \text{if condition then} \\
& \quad \text{PC} \leftarrow \text{PC} + \text{target\_offset} \\
& \text{endif}
\end{aligned}
\]

**Exceptions:**

None

**Programming Notes:**

With the 18-bit signed instruction offset, the conditional branch range is ±128 KBytes. Use jump (J) or jump register (JR) to branch to addresses outside this range.

BGTZL Branch on Greater Than Zero Likely

Format: BGTZL rs, offset

MIPS32, removed in Release 6

Purpose: Branch on Greater Than Zero Likely

To test a GPR then do a PC-relative conditional branch; execute the delay slot only if the branch is taken.

Description: if GPR[rs] > 0 then branch_likely

An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself), in the branch delay slot, to form a PC-relative effective target address.

If the contents of GPR rs are greater than zero (sign bit is 0 but value not zero), branch to the effective target address after the instruction in the delay slot is executed. If the branch is not taken, the instruction in the delay slot is not executed.

Restrictions:
Processor operation is **UNPREDICTABLE** if a branch, jump, ERET, DERET, or WAIT instruction is placed in the delay slot of a branch or jump.

Availability and Compatibility:
This instruction has been removed in Release 6.

Operation:
```
I:   target_offset ← sign_extend(offset || 0^2)
     condition ← GPR[rs] > 0^GPRLEN
I+1: if condition then
         PC ← PC + target_offset
     else
         NullifyCurrentInstruction()
     endif
```

Exceptions:
None

Implementation Note:
Some implementations always predict that the branch will be taken, and neither use nor update the internal branch prediction tables for this instruction. To maintain performance compatibility, future implementations are encouraged to do the same.

Programming Notes:
With the 18-bit signed instruction offset, the conditional branch range is ±128 KBytes. Use jump (J) or jump register (JR) to branch to addresses outside this range.

In Pre-Release 6 implementations, software is strongly encouraged to avoid the use of the Branch Likely instructions, as they will be removed from a future revision of the MIPS Architecture.

Some implementations always predict the branch will be taken, so there is a significant penalty if the branch is not taken. Software should only use this instruction when there is a very high probability (98% or more) that the branch will be taken. If the branch is not likely to be taken or if the probability of a taken branch is unknown, software is encouraged to use the BGTZ instruction instead.

**Historical Information:**

In the MIPS I architecture, this instruction signaled a Reserved Instruction exception.

BITSWAP

**Format:**

```
BITSWAP rd, rt
```

**Purpose:**

Swaps (reverses) bits in each byte

**Description:**

\[ \text{GPR}[rd].\text{byte}(i) \leftarrow \text{reverse\_bits\_in\_byte}(\text{GPR}[rt].\text{byte}(i)), \quad \text{for all bytes } i \]

Each byte in input GPR rt is moved to the same byte position in output GPR rd, with bits in each byte reversed.

BITSWAP operates on all 4 bytes of a 32-bit GPR on a 32-bit CPU.

**Restrictions:**

None.

**Availability and Compatibility:**

The BITSWAP instruction is introduced by and required as of Release 6.

**Operation:**

```
BITSWAP:
for i in 0 to 3 do    /* for all bytes in 32-bit GPR width */
    tmp.byte(i) ← reverse_bits_in_byte( GPR[rt].byte(i) )
endfor
GPR[rd] ← tmp
```

where

```
function reverse_bits_in_byte(inbyte)
    outbyte_7 ← inbyte_0
    outbyte_6 ← inbyte_1
    outbyte_5 ← inbyte_2
    outbyte_4 ← inbyte_3
    outbyte_3 ← inbyte_4
    outbyte_2 ← inbyte_5
    outbyte_1 ← inbyte_6
    outbyte_0 ← inbyte_7
    return outbyte
end function
```
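A direct C translation of the BITSWAP Operation for a 32-bit GPR, written as a minimal sketch with hypothetical function names and loops chosen for clarity rather than speed. For contrast, `bitrev16` models the DSP Module BITREV behavior described in the Programming Notes below (reverse the low 16-bit halfword as a unit, zero-extend the rest); it illustrates that description and is not a reference implementation of the DSP instruction.

```c
#include <assert.h>
#include <stdint.h>

/* reverse_bits_in_byte from the Operation: outbyte_(7-b) <- inbyte_b. */
static uint8_t reverse_bits_in_byte(uint8_t in)
{
    uint8_t out = 0;
    for (int b = 0; b < 8; b++)
        if (in & (1u << b))
            out |= (uint8_t)(0x80u >> b);
    return out;
}

/* BITSWAP rd, rt on a 32-bit GPR: each byte stays in place, bits reversed. */
static uint32_t bitswap32(uint32_t rt)
{
    uint32_t rd = 0;
    for (int i = 0; i < 4; i++) {
        uint8_t byte = (uint8_t)(rt >> (8 * i));
        rd |= (uint32_t)reverse_bits_in_byte(byte) << (8 * i);
    }
    return rd;
}

/* BITREV-style contrast: reverse the low halfword as one 16-bit unit,
 * zero-extending the result (upper 16 input bits are ignored). */
static uint32_t bitrev16(uint32_t rt)
{
    uint32_t rd = 0;
    for (int b = 0; b < 16; b++)
        if (rt & (1u << b))
            rd |= 0x8000u >> b;
    return rd;
}
```

For example, an input of 0x00000001 gives 0x00000080 from `bitswap32` but 0x00008000 from `bitrev16`, and 0x01020304 becomes 0x8040C020 under BITSWAP.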

**Exceptions:**

None

**Programming Notes:**

The Release 6 BITSWAP instruction corresponds to the DSP Module BITREV instruction, except that the latter bit-reverses the least-significant 16-bit halfword of the input register as a unit and zero-extends the rest, while BITSWAP operates on all 32 bits, reversing the bits within each byte.

BLEZ Branch on Less Than or Equal to Zero

**Format:**
BLEZ rs, offset

**Purpose:**
Branch on Less Than or Equal to Zero

To test a GPR then do a PC-relative conditional branch.

**Description:**
if GPR[rs] ≤ 0 then branch

An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself), in the branch delay slot, to form a PC-relative effective target address.

If the contents of GPR rs are less than or equal to zero (sign bit is 1 or value is zero), branch to the effective target address after the instruction in the delay slot is executed.

**Restrictions:**

*Control Transfer Instructions (CTIs) should not be placed in branch delay slots or Release 6 forbidden slots.* CTIs include all branches and jumps, NAL, ERET, ERETNC, DERET, WAIT, and PAUSE.

Pre-Release 6: Processor operation is **UNPREDICTABLE** if a control transfer instruction (CTI) is placed in the delay slot of a branch or jump.

Release 6: If a control transfer instruction (CTI) is executed in the delay slot of a branch or jump, Release 6 implementations are required to signal a Reserved Instruction exception.

**Operation:**

```plaintext
I:  target_offset ← sign_extend(offset || 0^2)
    condition ← GPR[rs] ≤ 0^{GPRLEN}
I+1: if condition then
      PC ← PC + target_offset
    endif
```

**Exceptions:**
None

**Programming Notes:**

With the 18-bit signed instruction offset, the conditional branch range is ±128 KBytes. Use jump (J) or jump register (JR) to branch to addresses outside this range.

BLEZL Branch on Less Than or Equal to Zero Likely

**Format:** BLEZL rs, offset

**Purpose:** Branch on Less Than or Equal to Zero Likely

To test a GPR then do a PC-relative conditional branch; execute the delay slot only if the branch is taken.

**Description:** if GPR[rs] ≤ 0 then branch_likely

An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself), in the branch delay slot, to form a PC-relative effective target address.

If the contents of GPR rs are less than or equal to zero (sign bit is 1 or value is zero), branch to the effective target address after the instruction in the delay slot is executed. If the branch is not taken, the instruction in the delay slot is not executed.

**Restrictions:**
Processor operation is UNPREDICTABLE if a branch, jump, ERET, DERET, or WAIT instruction is placed in the delay slot of a branch or jump.

**Availability and Compatibility:**
This instruction has been removed in Release 6.

**Operation:**

\[
\begin{aligned}
\text{I:}\quad & \text{target\_offset} \leftarrow \text{sign\_extend}(\text{offset} \mathbin{\|} 0^2) \\
& \text{condition} \leftarrow \text{GPR}[rs] \leq 0^{\text{GPRLEN}} \\
\text{I+1:}\quad & \text{if condition then} \\
& \quad \text{PC} \leftarrow \text{PC} + \text{target\_offset} \\
& \text{else} \\
& \quad \text{NullifyCurrentInstruction()} \\
& \text{endif}
\end{aligned}
\]

**Exceptions:** None

**Implementation Note:**
Some implementations always predict that the branch will be taken, and neither use nor update the internal branch prediction tables for this instruction. To maintain performance compatibility, future implementations are encouraged to do the same.

**Programming Notes:**
With the 18-bit signed instruction offset, the conditional branch range is ±128 KBytes. Use jump (J) or jump register (JR) to branch to addresses outside this range.

In Pre-Release 6 implementations, software is strongly encouraged to avoid the use of the Branch Likely instructions, as they will be removed from a future revision of the MIPS Architecture.

Some implementations always predict the branch will be taken, so there is a significant penalty if the branch is not taken. Software should only use this instruction when there is a very high probability (98% or more) that the branch will be taken. If the branch is not likely to be taken or if the probability of a taken branch is unknown, software is encouraged to use the BLEZ instruction instead.

**Historical Information:**
In the MIPS I architecture, this instruction signaled a Reserved Instruction exception.

BLTZ Branch on Less Than Zero

**Format:**  BLTZ rs, offset

**Purpose:**  Branch on Less Than Zero

To test a GPR then do a PC-relative conditional branch.

**Description:**  if GPR[rs] < 0 then branch

An 18-bit signed offset (the 16-bit *offset* field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself), in the branch delay slot, to form a PC-relative effective target address.

If the contents of GPR *rs* are less than zero (sign bit is 1), branch to the effective target address after the instruction in the delay slot is executed.

**Restrictions:**

*Control Transfer Instructions (CTIs) should not be placed in branch delay slots or Release 6 forbidden slots.* CTIs include all branches and jumps, NAL, ERET, ERETNC, DERET, WAIT, and PAUSE.

Pre-Release 6: Processor operation is **UNPREDICTABLE** if a control transfer instruction (CTI) is placed in the delay slot of a branch or jump.

Release 6: If a control transfer instruction (CTI) is executed in the delay slot of a branch or jump, Release 6 implementations are required to signal a Reserved Instruction exception.

**Operation:**

\[
\begin{aligned}
\text{I:}\quad & \text{target\_offset} \leftarrow \text{sign\_extend}(\text{offset} \mathbin{\|} 0^2) \\
& \text{condition} \leftarrow \text{GPR}[rs] < 0^{\text{GPRLEN}} \\
\text{I+1:}\quad & \text{if condition then} \\
& \quad \text{PC} \leftarrow \text{PC} + \text{target\_offset} \\
& \text{endif}
\end{aligned}
\]

**Exceptions:**

None

**Programming Notes:**

With the 18-bit signed instruction offset, the conditional branch range is ±128 KBytes. Use jump (J) or jump register (JR) to branch to addresses outside this range.

**BLTZAL**

**Branch on Less Than Zero and Link**

**Format:** BLTZAL rs, offset

**MIPS32**, removed in Release 6

**Purpose:** Branch on Less Than Zero and Link

To test a GPR then do a PC-relative conditional procedure call.

**Description:** if GPR[rs] < 0 then procedure_call

Place the return address link in GPR 31. The return link is the address of the second instruction following the branch, where execution continues after a procedure call.

An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself), in the branch delay slot, to form a PC-relative effective target address.

If the contents of GPR rs are less than zero (sign bit is 1), branch to the effective target address after the instruction in the delay slot is executed.

**Availability and Compatibility:**

This instruction has been removed in Release 6.

The special case BLTZAL r0, offset has been retained as NAL in Release 6.

**Restrictions:**

Processor operation is **UNPREDICTABLE** if a branch, jump, ERET, DERET, or WAIT instruction is placed in the delay slot of a branch or jump.

**Branch-and-link Restartability:** GPR 31 must not be used for the source register \( rs \), because such an instruction does not have the same effect when re-executed. The result of executing such an instruction is **UNPREDICTABLE**. This restriction permits an exception handler to resume execution by re-executing the branch when an exception occurs in the branch delay slot.
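The restartability rule can be seen by running the BLTZAL Operation twice with rs = 31, as a restart after a delay-slot exception would. In this hypothetical C sketch (the `Cpu` type and function names are illustrative, GPRs are 32 bits, and GPR 0 is not special-cased), the first execution samples a negative GPR 31 and branches, but the link write replaces GPR 31 with PC + 8; the re-execution then tests the return address instead and, for a typical user-space PC, does not branch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t gpr[32];
} Cpu;

/* One execution of BLTZAL rs, offset at address pc; returns the branch decision. */
static bool exec_bltzal(Cpu *cpu, unsigned rs, uint32_t pc)
{
    assert(rs < 32);
    bool condition = (int32_t)cpu->gpr[rs] < 0;   /* sampled in instruction I */
    cpu->gpr[31] = pc + 8u;                       /* link written unconditionally */
    return condition;
}

/* Execute the branch twice, as a restart would, and report whether both
 * executions make the same decision. The Cpu is copied, so the caller's
 * state is untouched. */
static bool restart_is_consistent(Cpu cpu, unsigned rs, uint32_t pc)
{
    bool first  = exec_bltzal(&cpu, rs, pc);
    bool second = exec_bltzal(&cpu, rs, pc);
    return first == second;
}
```

With rs = 31, GPR 31 = 0x80000000, and PC = 0x00400000, the two executions disagree; with any other source register they agree.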

**Operation:**

\[
\begin{aligned}
\text{I:}\quad & \text{target\_offset} \leftarrow \text{sign\_extend}(\text{offset} \mathbin{\|} 0^2) \\
& \text{condition} \leftarrow \text{GPR}[rs] < 0^{\text{GPRLEN}} \\
& \text{GPR}[31] \leftarrow \text{PC} + 8 \\
\text{I+1:}\quad & \text{if condition then} \\
& \quad \text{PC} \leftarrow \text{PC} + \text{target\_offset} \\
& \text{endif}
\end{aligned}
\]

**Exceptions:**

None

**Programming Notes:**

With the 18-bit signed instruction offset, the conditional branch range is ±128 KBytes. Use jump and link (JAL) or jump and link register (JALR) instructions for procedure calls to addresses outside this range.

BLTZALL Branch on Less Than Zero and Link Likely

Format: BLTZALL rs, offset

Purpose: Branch on Less Than Zero and Link Likely

To test a GPR then do a PC-relative conditional procedure call; execute the delay slot only if the branch is taken.

Description: if GPR[rs] < 0 then procedure_call_likely

Place the return address link in GPR 31. The return link is the address of the second instruction following the branch, where execution continues after a procedure call.

An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself), in the branch delay slot, to form a PC-relative effective target address.

If the contents of GPR rs are less than zero (sign bit is 1), branch to the effective target address after the instruction in the delay slot is executed. If the branch is not taken, the instruction in the delay slot is not executed.

Restrictions:
Processor operation is **UNPREDICTABLE** if a control transfer instruction (CTI) is placed in the delay slot of a branch or jump.

Branch-and-link Restartability: GPR 31 must not be used for the source register rs, because such an instruction does not have the same effect when re-executed. The result of executing such an instruction is **UNPREDICTABLE**. This restriction permits an exception handler to resume execution by re-executing the branch when an exception occurs in the branch delay slot.

Availability and Compatibility:
This instruction has been removed in Release 6.

Operation:

```
I:   target_offset ← sign_extend(offset || 0^2)
     condition ← GPR[rs] < 0^GPRLEN
     GPR[31] ← PC + 8
I+1: if condition then
         PC ← PC + target_offset
     else
         NullifyCurrentInstruction()
     endif
```
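Two details of the BLTZALL Operation are easy to miss: GPR 31 receives PC + 8 whether or not the branch is taken, and NullifyCurrentInstruction() suppresses the delay slot only on the not-taken path. A minimal C sketch of the architectural effect, with hypothetical names and 32-bit GPRs:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t link;           /* value written to GPR 31 */
    bool     slot_executed;  /* false when NullifyCurrentInstruction() applies */
    uint32_t next_pc;        /* where fetch continues */
} BltzallEffect;

static BltzallEffect bltzall_effect(uint32_t rs_value, uint32_t pc, int16_t offset)
{
    BltzallEffect e;
    bool condition = (int32_t)rs_value < 0;
    /* target_offset is added to the delay-slot address (pc + 4). */
    uint32_t target = pc + 4u + (uint32_t)((int32_t)offset * 4);

    e.link = pc + 8u;                       /* written on both paths */
    e.slot_executed = condition;
    e.next_pc = condition ? target : pc + 8u;
    return e;
}
```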

Exceptions:
None

Implementation Note:
Some implementations always predict that the branch will be taken, and neither use nor update the internal branch prediction tables for this instruction. To maintain performance compatibility, future implementations are encouraged to do the same.

Programming Notes:
With the 18-bit signed instruction offset, the conditional branch range is ±128 KBytes. Use jump and link (JAL) or jump and link register (JALR) instructions for procedure calls to addresses outside this range.

In Pre-Release 6 implementations, software is strongly encouraged to avoid the use of the Branch Likely instructions, as they will be removed from a future revision of the MIPS Architecture.

Some implementations always predict the branch will be taken, so there is a significant penalty if the branch is not taken. Software should only use this instruction when there is a very high probability (98% or more) that the branch will be taken. If the branch is not likely to be taken or if the probability of a taken branch is unknown, software is encouraged to use the BLTZAL instruction instead.

**Historical Information:**

In the MIPS I architecture, this instruction signaled a Reserved Instruction exception.

BLTZL Branch on Less Than Zero Likely

Format: BLTZL rs, offset

MIPS32, removed in Release 6

Purpose: Branch on Less Than Zero Likely

To test a GPR then do a PC-relative conditional branch; execute the delay slot only if the branch is taken.

Description: if GPR[rs] < 0 then branch_likely

An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself), in the branch delay slot, to form a PC-relative effective target address.

If the contents of GPR rs are less than zero (sign bit is 1), branch to the effective target address after the instruction in the delay slot is executed. If the branch is not taken, the instruction in the delay slot is not executed.

Restrictions:

Processor operation is **UNPREDICTABLE** if a branch, jump, ERET, DERET, or WAIT instruction is placed in the delay slot of a branch or jump.

Availability and Compatibility:

This instruction has been removed in Release 6.

Operation:

```
I:   target_offset ← sign_extend(offset || 0^2)
     condition ← GPR[rs] < 0^GPRLEN
I+1: if condition then
         PC ← PC + target_offset
     else
         NullifyCurrentInstruction()
     endif
```

Exceptions:

None

Implementation Note:

Some implementations always predict that the branch will be taken, and neither use nor update the internal branch prediction tables for this instruction. To maintain performance compatibility, future implementations are encouraged to do the same.

Programming Notes:

With the 18-bit signed instruction offset, the conditional branch range is ±128 KBytes. Use jump (J) or jump register (JR) to branch to addresses outside this range.

In Pre-Release 6 implementations, software is strongly encouraged to avoid the use of the Branch Likely instructions, as they will be removed from a future revision of the MIPS Architecture.

Some implementations always predict the branch will be taken, so there is a significant penalty if the branch is not taken. Software should only use this instruction when there is a very high probability (98% or more) that the branch will be taken. If the branch is not likely to be taken or if the probability of a taken branch is unknown, software is encouraged to use the BLTZ instruction instead.

Historical Information:

In the MIPS I architecture, this instruction signaled a Reserved Instruction exception.

**BNE Branch on Not Equal**

**Format:**  
BNE rs, rt, offset

**Purpose:**  
Branch on Not Equal

To compare GPRs then do a PC-relative conditional branch.

**Description:**  
if GPR[rs] ≠ GPR[rt] then branch

An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself), in the branch delay slot, to form a PC-relative effective target address.

If the contents of GPR rs and GPR rt are not equal, branch to the effective target address after the instruction in the delay slot is executed.

**Restrictions:**  
*Control Transfer Instructions (CTIs) should not be placed in branch delay slots or Release 6 forbidden slots.* CTIs include all branches and jumps, NAL, ERET, ERETNC, DERET, WAIT, and PAUSE.

Pre-Release 6: Processor operation is **UNPREDICTABLE** if a control transfer instruction (CTI) is placed in the delay slot of a branch or jump.

Release 6: If a control transfer instruction (CTI) is executed in the delay slot of a branch or jump, Release 6 implementations are required to signal a Reserved Instruction exception.

**Operation:**

\[
\begin{aligned}
\text{I:}\quad & \text{target\_offset} \leftarrow \text{sign\_extend}(\text{offset} \mathbin{\|} 0^2) \\
& \text{condition} \leftarrow (\text{GPR}[rs] \neq \text{GPR}[rt]) \\
\text{I+1:}\quad & \text{if condition then} \\
& \quad \text{PC} \leftarrow \text{PC} + \text{target\_offset} \\
& \text{endif}
\end{aligned}
\]

**Exceptions:**

None

**Programming Notes:**

With the 18-bit signed instruction offset, the conditional branch range is ±128 KBytes. Use jump (J) or jump register (JR) to branch to addresses outside this range.

BNEL Branch on Not Equal Likely


Format: BNEL rs, rt, offset

MIPS32, removed in Release 6

Purpose: Branch on Not Equal Likely

To compare GPRs then do a PC-relative conditional branch; execute the delay slot only if the branch is taken.

Description: if GPR[rs] ≠ GPR[rt] then branch_likely

An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself), in the branch delay slot, to form a PC-relative effective target address.

If the contents of GPR rs and GPR rt are not equal, branch to the effective target address after the instruction in the delay slot is executed. If the branch is not taken, the instruction in the delay slot is not executed.

Restrictions:

Processor operation is UNPREDICTABLE if a branch, jump, ERET, DERET, or WAIT instruction is placed in the delay slot of a branch or jump.

Availability and Compatibility:

This instruction has been removed in Release 6.

Operation:

```
I:   target_offset ← sign_extend(offset || 0^2)
     condition ← (GPR[rs] ≠ GPR[rt])
I+1: if condition then
         PC ← PC + target_offset
     else
         NullifyCurrentInstruction()
     endif
```

Exceptions:

None

Implementation Note:

Some implementations always predict that the branch will be taken, and neither use nor update the internal branch prediction tables for this instruction. To maintain performance compatibility, future implementations are encouraged to do the same.

Programming Notes:

With the 18-bit signed instruction offset, the conditional branch range is ± 128 KBytes. Use jump (J) or jump register (JR) to branch to addresses outside this range.

In Pre-Release 6 implementations, software is strongly encouraged to avoid the use of the Branch Likely instructions, as they will be removed from a future revision of the MIPS Architecture.

Some implementations always predict the branch will be taken, so there is a significant penalty if the branch is not taken. Software should only use this instruction when there is a very high probability (98% or more) that the branch will be taken. If the branch is not likely to be taken or if the probability of a taken branch is unknown, software is encouraged to use the BNE instruction instead.

Historical Information:

In the MIPS I architecture, this instruction signaled a Reserved Instruction exception.

**BOVC BNVC**

**Branch on Overflow, Compact; Branch on No Overflow, Compact**

**Format:**

- `BOVC rs, rt, offset`
- `BNVC rs, rt, offset`

**MIPS32 Release 6**

**Purpose:** Branch on Overflow, Compact; Branch on No Overflow, Compact

**BOVC:** Detect overflow for add (signed 32 bits) and branch if overflow.

**BNVC:** Detect overflow for add (signed 32 bits) and branch if no overflow.

**Description:**

branch if (BOVC) or if not (BNVC) NotWordValue(GPR[rs] + GPR[rt])

- BOVC performs a signed 32-bit addition of rs and rt. BOVC discards the sum, but detects signed 32-bit integer overflow of the sum, and branches if such overflow is detected.
- BNVC performs a signed 32-bit addition of rs and rt. BNVC discards the sum, but detects signed 32-bit integer overflow of the sum, and branches if such overflow is not detected.

BOVC and BNVC are compact branches—they have no branch delay slots, but do have a forbidden slot.

An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself), to form a PC-relative effective target address.

The special case with rt=0 (that is, GPR[0]) is allowed. On MIPS32, BOVC rs,r0,offset never branches, while BNVC rs,r0,offset always branches.

The special case of rs=0 and rt=0 is allowed. BOVC never branches, while BNVC always branches.

**Restrictions:**

*Control Transfer Instructions (CTIs) should not be placed in branch delay slots or Release 6 forbidden slots.* CTIs include all branches and jumps, NAL, ERET, ERETN, DERET, WAIT, and PAUSE.

If a control transfer instruction (CTI) is executed in the forbidden slot of a compact branch, Release 6 implementations are required to signal a Reserved Instruction exception, but only when the branch is not taken.

**Availability and Compatibility:**

These instructions are introduced by and required as of Release 6.

See section A.4 on page 454 in Volume II for a complete overview of Release 6 instruction encodings. Brief notes related to these instructions:

- BOVC uses the primary opcode allocated to MIPS32 pre-Release 6 ADDI. Release 6 reuses the ADDI primary opcode for BOVC and other instructions, distinguished by register numbers.
- BNVC uses the primary opcode allocated to MIPS64 pre-Release 6 DADDI. Release 6 reuses the DADDI primary opcode for BNVC and other instructions, distinguished by register numbers.

**Operation:**

temp1 ← GPR[rs]
temp2 ← GPR[rt]
tempd ← temp1 + temp2  // wider than 32-bit precision
sum_overflow ← (tempd₃₂ ≠ tempd₃₁)

BOVC: cond ← sum_overflow
BNVC: cond ← not(sum_overflow)

if cond then
    PC ← (PC+4 + sign_extend(offset << 2))
endif
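
A C model of the Operation above, offered as a sketch: `int64_t` stands in for the "wider than 32-bit precision" temporary, and the helper names are hypothetical. Overflow is detected as in the pseudocode, by comparing bit 32 and bit 31 of the wide sum.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Signed 32-bit overflow of rs + rt, detected from a 64-bit sum:
 * overflow occurred exactly when bit 32 and bit 31 differ. */
static bool sum_overflows32(int32_t rs, int32_t rt)
{
    int64_t tempd = (int64_t)rs + (int64_t)rt;
    unsigned bit32 = (unsigned)(((uint64_t)tempd >> 32) & 1u);
    unsigned bit31 = (unsigned)(((uint64_t)tempd >> 31) & 1u);
    return bit32 != bit31;
}

/* BOVC branches on overflow; BNVC branches when there is none. */
static bool bovc_taken(int32_t rs, int32_t rt) { return sum_overflows32(rs, rt); }
static bool bnvc_taken(int32_t rs, int32_t rt) { return !sum_overflows32(rs, rt); }
```

With rt = 0 the sum is just rs, so `sum_overflows32` is always false, which matches the statement that BOVC rs,r0,offset never branches and BNVC rs,r0,offset always branches.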

Exceptions:
None
**Format:** BREAK

**MIPS32**

**Purpose:** Breakpoint

To cause a Breakpoint exception

**Description:**

A breakpoint exception occurs, immediately and unconditionally transferring control to the exception handler. The *code* field is available for use as software parameters, but is retrieved by the exception handler only by loading the contents of the memory word containing the instruction.
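
The handler-side decoding described above can be sketched in C. This assumes the standard MIPS32 BREAK encoding (major opcode 0, 20-bit code field in bits 25..6, function field 0b001101), which this excerpt does not reproduce, and assumes the handler has already loaded the 32-bit instruction word; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define BREAK_FUNCTION 0x0Du   /* assumed function field value for BREAK */

/* True if insn has major opcode 0 and the BREAK function field. */
static int is_break(uint32_t insn)
{
    return (insn >> 26) == 0u && (insn & 0x3Fu) == BREAK_FUNCTION;
}

/* Extract the 20-bit code field (bits 25..6) from a BREAK instruction. */
static uint32_t break_code(uint32_t insn)
{
    assert(is_break(insn));
    return (insn >> 6) & 0xFFFFFu;
}
```

A BREAK with an all-zero code field encodes as 0x0000000D, for which `break_code` returns 0.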

**Restrictions:**

None

**Operation:**

SignalException(Breakpoint)

**Exceptions:**

Breakpoint
**C.cond.fmt**

**Floating Point Compare**

**Format:** C.cond.fmt

- `C.cond.S fs, ft` (cc = 0 implied): MIPS32, removed in Release 6
- `C.cond.D fs, ft` (cc = 0 implied): MIPS32, removed in Release 6
- `C.cond.PS fs, ft` (cc = 0 implied): MIPS32 Release 2, removed in Release 6
- `C.cond.S cc, fs, ft`: MIPS32, removed in Release 6
- `C.cond.D cc, fs, ft`: MIPS32, removed in Release 6
- `C.cond.PS cc, fs, ft`: MIPS32 Release 2, removed in Release 6

**Purpose:**

Floating Point Compare

To compare FP values and record the Boolean result in a condition code.

**Description:**

FPConditionCode(cc) ← FPR[fs] compare_cond FPR[ft]

The value in FPR *fs* is compared to the value in FPR *ft*; the values are in format *fmt*. The comparison is exact and neither overflows nor underflows.

If the comparison specified by the cond field of the instruction is true for the operand values, the result is true; otherwise, the result is false. If no exception is taken, the result is written into condition code CC; true is 1 and false is 0.

In the cond field of the instruction: cond₂..₁ specify the nature of the comparison (equals, less than, and so on); cond₀ specifies whether the comparison is ordered or unordered, that is, false or true if any operand is a NaN; cond₃ indicates whether the instruction should signal an exception on QNaN inputs, or not (see Table 3.2).

C.cond.PS compares the upper and lower halves of FPR *fs* and FPR *ft* independently and writes the results into condition codes CC+1 and CC, respectively. The CC number must be even; if it is not, the operation of the instruction is **UNPREDICTABLE**.

If one of the values is an SNaN, or cond3 is set and at least one of the values is a QNaN, an Invalid Operation condition is raised and the Invalid Operation flag is set in the FCSR. If the Invalid Operation Enable bit is set in the FCSR, no result is written and an Invalid Operation exception is taken immediately. Otherwise, the Boolean result is written into condition code CC.

There are four mutually exclusive ordering relations for comparing floating point values; one relation is always true and the others are false. The familiar relations are greater than, less than, and equal. In addition, the IEEE floating point standard defines the relation unordered, which is true when at least one operand value is NaN; NaN compares unordered with everything, including itself. Comparisons ignore the sign of zero, so +0 equals -0.

The comparison condition is a logical predicate, or equation, of the ordering relations such as less than or equal, equal, not less than, or unordered or equal. Compare distinguishes among the 16 comparison predicates. The Boolean result of the instruction is obtained by substituting the Boolean value of each ordering relation for the two FP values in the equation. If the equal relation is true, for example, then all four example predicates above yield a true result. If the unordered relation is true then only the final predicate, unordered or equal, yields a true result.

Logical negation of a compare result allows eight distinct comparisons to test for the 16 predicates as shown in Table 3.2. Each mnemonic tests for both a predicate and its logical negation. For each mnemonic, compare tests the truth of the first predicate. When the first predicate is true, the result is true as shown in the “If Predicate Is True” column, and the second predicate must be false, and vice versa. (Note that the False predicate is never true and False/True do not follow the normal pattern.)

The truth of the second predicate is the logical negation of the instruction result. After a compare instruction, the truth of the first predicate can be tested with the Branch on FP True (BC1T) instruction, and the truth of the second with Branch on FP False (BC1F).

Table 3.2 shows another set of eight compare operations, distinguished by a cond₃ value of 1 and testing the same 16 conditions. For these additional comparisons, if at least one of the operands is a NaN, including Quiet NaN, then an Invalid Operation condition is raised. If the Invalid Operation condition is enabled in the FCSR, an Invalid Operation exception occurs.

### Table 3.1 FPU Comparisons Without Special Operand Exceptions

<table>
<thead>
<tr>
<th>Cond Mnemonic</th>
<th>Name of Predicate and Logically Negated Predicate (Abbreviation)</th>
<th>&gt;</th>
<th>&lt;</th>
<th>=</th>
<th>?</th>
<th>CC Result If Predicate Is True</th>
<th>Inv Op Excp. if QNaN?</th>
<th>cond₃</th>
<th>cond₂..₀</th>
</tr>
</thead>
<tbody>
<tr>
<td rowspan="2">F</td>
<td>False [this predicate is always False]</td>
<td>F</td><td>F</td><td>F</td><td>F</td>
<td rowspan="2">F</td>
<td rowspan="2">No</td>
<td rowspan="2">0</td>
<td rowspan="2">0</td>
</tr>
<tr>
<td>True (T)</td>
<td>T</td><td>T</td><td>T</td><td>T</td>
</tr>
<tr>
<td rowspan="2">UN</td>
<td>Unordered</td>
<td>F</td><td>F</td><td>F</td><td>T</td>
<td>T</td>
<td rowspan="2">No</td>
<td rowspan="2">0</td>
<td rowspan="2">1</td>
</tr>
<tr>
<td>Ordered (OR)</td>
<td>T</td><td>T</td><td>T</td><td>F</td>
<td>F</td>
</tr>
<tr>
<td rowspan="2">EQ</td>
<td>Equal</td>
<td>F</td><td>F</td><td>T</td><td>F</td>
<td>T</td>
<td rowspan="2">No</td>
<td rowspan="2">0</td>
<td rowspan="2">2</td>
</tr>
<tr>
<td>Not Equal (NEQ)</td>
<td>T</td><td>T</td><td>F</td><td>T</td>
<td>F</td>
</tr>
<tr>
<td rowspan="2">UEQ</td>
<td>Unordered or Equal</td>
<td>F</td><td>F</td><td>T</td><td>T</td>
<td>T</td>
<td rowspan="2">No</td>
<td rowspan="2">0</td>
<td rowspan="2">3</td>
</tr>
<tr>
<td>Ordered or Greater Than or Less Than (OGL)</td>
<td>T</td><td>T</td><td>F</td><td>F</td>
<td>F</td>
</tr>
<tr>
<td rowspan="2">OLT</td>
<td>Ordered or Less Than</td>
<td>F</td><td>T</td><td>F</td><td>F</td>
<td>T</td>
<td rowspan="2">No</td>
<td rowspan="2">0</td>
<td rowspan="2">4</td>
</tr>
<tr>
<td>Unordered or Greater Than or Equal (UGE)</td>
<td>T</td><td>F</td><td>T</td><td>T</td>
<td>F</td>
</tr>
<tr>
<td rowspan="2">ULT</td>
<td>Unordered or Less Than</td>
<td>F</td><td>T</td><td>F</td><td>T</td>
<td>T</td>
<td rowspan="2">No</td>
<td rowspan="2">0</td>
<td rowspan="2">5</td>
</tr>
<tr>
<td>Ordered or Greater Than or Equal (OGE)</td>
<td>T</td><td>F</td><td>T</td><td>F</td>
<td>F</td>
</tr>
<tr>
<td rowspan="2">OLE</td>
<td>Ordered or Less Than or Equal</td>
<td>F</td><td>T</td><td>T</td><td>F</td>
<td>T</td>
<td rowspan="2">No</td>
<td rowspan="2">0</td>
<td rowspan="2">6</td>
</tr>
<tr>
<td>Unordered or Greater Than (UGT)</td>
<td>T</td><td>F</td><td>F</td><td>T</td>
<td>F</td>
</tr>
<tr>
<td rowspan="2">ULE</td>
<td>Unordered or Less Than or Equal</td>
<td>F</td><td>T</td><td>T</td><td>T</td>
<td>T</td>
<td rowspan="2">No</td>
<td rowspan="2">0</td>
<td rowspan="2">7</td>
</tr>
<tr>
<td>Ordered or Greater Than (OGT)</td>
<td>T</td><td>F</td><td>F</td><td>F</td>
<td>F</td>
</tr>
</tbody>
</table>

Key: ? = unordered, > = greater than, < = less than, = is equal, T = True, F = False
### Table 3.2 FPU Comparisons With Special Operand Exceptions for QNaNs

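<table>
<thead>
<tr>
<th>Cond Mnemonic</th>
<th>Name of Predicate and Logically Negated Predicate (Abbreviation)</th>
<th>&gt;</th>
<th>&lt;</th>
<th>=</th>
<th>?</th>
<th>CC Result If Predicate Is True</th>
<th>Inv Op Excp. if QNaN?</th>
<th>cond₃</th>
<th>cond₂..₀</th>
</tr>
</thead>
<tbody>
<tr>
<td rowspan="2">SF</td>
<td>Signaling False [this predicate is always False]</td>
<td>F</td><td>F</td><td>F</td><td>F</td>
<td rowspan="2">F</td>
<td rowspan="2">Yes</td>
<td rowspan="2">1</td>
<td rowspan="2">0</td>
</tr>
<tr>
<td>Signaling True (ST)</td>
<td>T</td><td>T</td><td>T</td><td>T</td>
</tr>
<tr>
<td rowspan="2">NGLE</td>
<td>Not Greater Than or Less Than or Equal</td>
<td>F</td><td>F</td><td>F</td><td>T</td>
<td>T</td>
<td rowspan="2">Yes</td>
<td rowspan="2">1</td>
<td rowspan="2">1</td>
</tr>
<tr>
<td>Greater Than or Less Than or Equal (GLE)</td>
<td>T</td><td>T</td><td>T</td><td>F</td>
<td>F</td>
</tr>
<tr>
<td rowspan="2">SEQ</td>
<td>Signaling Equal</td>
<td>F</td><td>F</td><td>T</td><td>F</td>
<td>T</td>
<td rowspan="2">Yes</td>
<td rowspan="2">1</td>
<td rowspan="2">2</td>
</tr>
<tr>
<td>Signaling Not Equal (SNE)</td>
<td>T</td><td>T</td><td>F</td><td>T</td>
<td>F</td>
</tr>
<tr>
<td rowspan="2">NGL</td>
<td>Not Greater Than or Less Than</td>
<td>F</td><td>F</td><td>T</td><td>T</td>
<td>T</td>
<td rowspan="2">Yes</td>
<td rowspan="2">1</td>
<td rowspan="2">3</td>
</tr>
<tr>
<td>Greater Than or Less Than (GL)</td>
<td>T</td><td>T</td><td>F</td><td>F</td>
<td>F</td>
</tr>
<tr>
<td rowspan="2">LT</td>
<td>Less Than</td>
<td>F</td><td>T</td><td>F</td><td>F</td>
<td>T</td>
<td rowspan="2">Yes</td>
<td rowspan="2">1</td>
<td rowspan="2">4</td>
</tr>
<tr>
<td>Not Less Than (NLT)</td>
<td>T</td><td>F</td><td>T</td><td>T</td>
<td>F</td>
</tr>
<tr>
<td rowspan="2">NGE</td>
<td>Not Greater Than or Equal</td>
<td>F</td><td>T</td><td>F</td><td>T</td>
<td>T</td>
<td rowspan="2">Yes</td>
<td rowspan="2">1</td>
<td rowspan="2">5</td>
</tr>
<tr>
<td>Greater Than or Equal (GE)</td>
<td>T</td><td>F</td><td>T</td><td>F</td>
<td>F</td>
</tr>
<tr>
<td rowspan="2">LE</td>
<td>Less Than or Equal</td>
<td>F</td><td>T</td><td>T</td><td>F</td>
<td>T</td>
<td rowspan="2">Yes</td>
<td rowspan="2">1</td>
<td rowspan="2">6</td>
</tr>
<tr>
<td>Not Less Than or Equal (NLE)</td>
<td>T</td><td>F</td><td>F</td><td>T</td>
<td>F</td>
</tr>
<tr>
<td rowspan="2">NGT</td>
<td>Not Greater Than</td>
<td>F</td><td>T</td><td>T</td><td>T</td>
<td>T</td>
<td rowspan="2">Yes</td>
<td rowspan="2">1</td>
<td rowspan="2">7</td>
</tr>
<tr>
<td>Greater Than (GT)</td>
<td>T</td><td>F</td><td>F</td><td>F</td>
<td>F</td>
</tr>
</tbody>
</table>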

Key: ? = unordered, > = greater than, < = less than, = is equal, T = True, F = False

### Restrictions:

The fields $fs$ and $ft$ must specify FPRs valid for operands of type $fmt$. If the fields are not valid, the result is **UNPREDICTABLE**.

The operands must be values in format $fmt$; if they are not, the result is **UNPREDICTABLE** and the value of the operand FPRs becomes **UNPREDICTABLE**.

The result of C.cond.PS is **UNPREDICTABLE** if the processor is executing in the $FR=0$ 32-bit FPU register model; it is predictable if executing on a 64-bit FPU in the $FR=1$ mode, but not with $FR=0$, and not on a 32-bit FPU.

The result of C.cond.PS is **UNPREDICTABLE** if the condition code number is odd.

### Availability and Compatibility:

This instruction has been removed in Release 6 and has been replaced by the ‘CMP.cond.fmt’ instruction. Refer to the CMP.cond.fmt instruction in this manual for more information. Release 6 does not support Paired Single (PS).

### Operation:

```plaintext
if SNaN(ValueFPR(fs, fmt)) or SNaN(ValueFPR(ft, fmt)) or
   QNaN(ValueFPR(fs, fmt)) or QNaN(ValueFPR(ft, fmt)) then
  less ← false
  equal ← false
  unordered ← true
  if (SNaN(ValueFPR(fs, fmt)) or SNaN(ValueFPR(ft, fmt))) or
     (cond₃ and (QNaN(ValueFPR(fs, fmt)) or QNaN(ValueFPR(ft, fmt)))) then
    SignalException(InvalidOperation)
  endif
else
  less ← ValueFPR(fs, fmt) <fmt ValueFPR(ft, fmt)
  equal ← ValueFPR(fs, fmt) =fmt ValueFPR(ft, fmt)
  unordered ← false
endif
condition ← (cond₂ and less) or (cond₁ and equal)
            or (cond₀ and unordered)
SetFPConditionCode(cc, condition)
```

The MIPS32® Instruction Set Manual, Revision 6.04
Copyright © 2015 Imagination Technologies LTD. and/or its Affiliated Group Companies. All rights reserved.

For C.cond.PS, the pseudocode above is repeated for both halves of the operand registers, treating each half as an independent single-precision value. Exceptions on the two halves are logically ORed and reported together. The results of the lower half comparison are written to condition code CC; the results of the upper half comparison are written to condition code CC+1.
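
A rough C model of the Operation pseudocode for C.cond.D, kept as a sketch. Portable C cannot tell a signaling NaN from a quiet one with `isnan`, so this version treats every NaN as quiet (it only signals when cond₃ is set), and it reports the Invalid Operation condition through a flag instead of taking an exception. The function name and the flag parameter are illustrative.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* cond is the 4-bit cond field: bit 3 = cond3, bit 2 = cond2 (less),
 * bit 1 = cond1 (equal), bit 0 = cond0 (unordered).
 * *invalid is set when the Invalid Operation condition would be raised. */
static bool fp_compare(double fs, double ft, unsigned cond, bool *invalid)
{
    bool less, equal, unordered;

    *invalid = false;
    if (isnan(fs) || isnan(ft)) {
        less = false;
        equal = false;
        unordered = true;
        if (cond & 0x8u)          /* cond3: signal on QNaN inputs too */
            *invalid = true;
    } else {
        less = fs < ft;
        equal = fs == ft;         /* +0 == -0: sign of zero is ignored */
        unordered = false;
    }
    return ((cond & 0x4u) && less) || ((cond & 0x2u) && equal) ||
           ((cond & 0x1u) && unordered);
}
```

For instance, with cond = 2 (EQ) the call returns true for 0.0 and -0.0, and with cond = 12 (LT, cond₃ set) a NaN operand yields false and sets the flag.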

Exceptions:
Coprocessor Unusable, Reserved Instruction

Floating Point Exceptions:
Unimplemented Operation, Invalid Operation

Programming Notes:
FP computational instructions, including compare, that receive an operand value of Signaling NaN raise the Invalid Operation condition. Comparisons that raise the Invalid Operation condition for Quiet NaNs in addition to SNaNs permit a simpler programming model if NaNs are errors. Using these compares, programs do not need explicit code to check for QNaNs causing the unordered relation. Instead, they take an exception and allow the exception handling system to deal with the error when it occurs. For example, consider a comparison in which we want to know if two numbers are equal, but for which unordered would be an error.

    # comparisons using explicit tests for QNaN
       c.eq.d  $f2,$f4   # check for equal
       nop
       bc1t    L2        # it is equal
       c.un.d  $f2,$f4   # it is not equal,
                         # but might be unordered
       bc1t    ERROR     # unordered goes off to an error handler
    # not-equal-case code here
       ...
    # equal-case code here
    L2:
    # ---------------------------------------------
    # comparison using comparisons that signal QNaN
       c.seq.d $f2,$f4   # check for equal
       nop
       bc1t    L2        # it is equal
       nop
    # it is not unordered here
       ...
    # not-equal-case code here
       ...
    # equal-case code here
Pre-Release 6

<table>
<thead>
<tr>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>CACHE 101111</td>
<td>base</td>
<td>op</td>
<td>offset</td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>16</td>
</tr>
</tbody>
</table>

Release 6

<table>
<thead>
<tr>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..7</th>
<th>6</th>
<th>5..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>SPECIAL3 011111</td>
<td>base</td>
<td>op</td>
<td>offset</td>
<td>0</td>
<td>CACHE 100101</td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>9</td>
<td>1</td>
<td>6</td>
</tr>
</tbody>
</table>

Format: CACHE op, offset(base)

Purpose: Perform Cache Operation

To perform the cache operation specified by op.

Description:

The 16-bit offset is sign-extended and added to the contents of the base register to form an effective address. The effective address is used in one of the following ways based on the operation to be performed and the type of cache as described in the following table.

Table 3.3 Usage of Effective Address

<table>
<thead>
<tr>
<th>Operation Requires an</th>
<th>Type of Cache</th>
<th>Usage of Effective Address</th>
</tr>
</thead>
<tbody>
<tr>
<td>Address</td>
<td>Virtual</td>
<td>The effective address is used to address the cache. An address translation may or may not be performed on the effective address (with the possibility that a TLB Refill or TLB Invalid exception might occur)</td>
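</tr>
<tr>
<td>Address</td>
<td>Physical</td>
<td>The effective address is translated by the MMU to a physical address. The physical address is then used to address the cache.</td>
</tr>
<tr>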
<td>Index</td>
<td>N/A</td>
<td>The effective address is translated by the MMU to a physical address. It is implementation dependent whether the effective address or the translated physical address is used to address the cache. As such, an unmapped address (such as within kseg0) should always be used for cache operations that require an index. See the Programming Notes section below.</td>
</tr>
</tbody>
</table>

Assuming that the total cache size in bytes is CS, the associativity is A, and the number of bytes per tag is BPT, the following calculations give the fields of the address which specify the way and the index:

\[
\begin{align*}
\text{OffsetBit} & \leftarrow \log_2 (\text{BPT}) \\
\text{IndexBit} & \leftarrow \log_2 (\frac{\text{CS}}{\text{A}}) \\
\text{WayBit} & \leftarrow \text{IndexBit} + \lceil \log_2 (\text{A}) \rceil \\
\text{Way} & \leftarrow \text{Addr}_{\text{WayBit}-1..\text{IndexBit}} \\
\text{Index} & \leftarrow \text{Addr}_{\text{IndexBit}-1..\text{OffsetBit}}
\end{align*}
\]
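
The field calculations above can be modeled in C as a sketch. It assumes BPT and CS/A are powers of two and uses hypothetical helper names; the Way field comes out as zero for a direct-mapped cache (A = 1), because WayBit then equals IndexBit.

```c
#include <assert.h>
#include <stdint.h>

/* floor(log2(x)) for x > 0 */
static unsigned ilog2(uint32_t x)
{
    unsigned n = 0;
    while (x >>= 1)
        n++;
    return n;
}

/* ceil(log2(x)) for x > 0 */
static unsigned ceil_log2(uint32_t x)
{
    unsigned n = ilog2(x);
    return ((1u << n) == x) ? n : n + 1;
}

struct way_index { uint32_t way, index; };

/* cs = total cache bytes, assoc = A, bpt = bytes per tag. */
static struct way_index cache_way_index(uint32_t addr, uint32_t cs,
                                        uint32_t assoc, uint32_t bpt)
{
    assert(assoc > 0 && bpt > 0 && cs >= assoc);

    unsigned offset_bit = ilog2(bpt);
    unsigned index_bit = ilog2(cs / assoc);
    unsigned way_bit = index_bit + ceil_log2(assoc);
    struct way_index r;

    /* Index = Addr[IndexBit-1 .. OffsetBit] */
    r.index = (addr >> offset_bit) & ((1u << (index_bit - offset_bit)) - 1u);
    /* Way = Addr[WayBit-1 .. IndexBit] (zero-width when A = 1) */
    r.way = (addr >> index_bit) & ((1u << (way_bit - index_bit)) - 1u);
    return r;
}
```

With CS = 32768, A = 4 and BPT = 32, OffsetBit = 5, IndexBit = 13 and WayBit = 15, so address 0x80006F40 selects way 3, index 122.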

For a direct-mapped cache, the Way calculation is ignored and the Index value fully specifies the cache tag. This is shown symbolically in the figure below.

A TLB Refill or TLB Invalid exception (both with cause code TLBL) can occur on any operation. For index operations (where the address is used to index the cache but need not match the cache tag), software must use unmapped addresses to avoid TLB exceptions. This instruction never causes TLB Modified exceptions or TLB Refill exceptions with a cause code of TLBS. This instruction never causes Execute-Inhibit or Read-Inhibit exceptions.

The effective address may be arbitrarily aligned. The CACHE instruction never causes an Address Error Exception due to a non-aligned address.

A Cache Error exception may occur as a by-product of some operations performed by this instruction. For example, if a Writeback operation detects a cache or bus error during the processing of the operation, that error is reported via a Cache Error exception. Similarly, a Bus Error Exception may occur if a bus operation invoked by this instruction is terminated in an error. However, cache error exceptions must not be triggered by an Index Load Tag or Index Store Tag operation, as these operations are used for initialization and diagnostic purposes.

An Address Error Exception (with cause code AdEL) may occur if the effective address references a portion of the kernel address space that would normally result in such an exception. It is implementation dependent whether such an exception occurs.

It is implementation dependent whether a data watch is triggered by a cache instruction whose address matches the Watch register address match conditions.

The CACHE instruction and the memory transactions which are sourced by the CACHE instruction, such as cache refill or cache writeback, obey the ordering and completion rules of the SYNC instruction.

Bits [17:16] of the instruction specify the cache on which to perform the operation, as follows:

### Table 3.4 Encoding of Bits[17:16] of CACHE Instruction

<table>
<thead>
<tr>
<th>Code</th>
<th>Name</th>
<th>Cache</th>
</tr>
</thead>
<tbody>
<tr>
<td>0b00</td>
<td>I</td>
<td>Primary Instruction</td>
</tr>
<tr>
<td>0b01</td>
<td>D</td>
<td>Primary Data or Unified Primary</td>
</tr>
<tr>
<td>0b10</td>
<td>T</td>
<td>Tertiary</td>
</tr>
<tr>
<td>0b11</td>
<td>S</td>
<td>Secondary</td>
</tr>
</tbody>
</table>

Bits [20:18] of the instruction specify the operation to perform. To provide software with a consistent base of cache operations, certain encodings must be supported on all processors. The remaining encodings are recommended.
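
Because the op field occupies bits 20..16, its upper three bits select the operation (Table 3.5) and its lower two bits select the cache (Table 3.4). A small C sketch of that split follows; the enum and function names are hypothetical.

```c
#include <assert.h>

/* Cache selector from bits 17..16 (Table 3.4). */
enum cache_target { CACHE_I = 0, CACHE_D = 1, CACHE_T = 2, CACHE_S = 3 };

/* Operation code from bits 20..18, i.e. op[4:2] (Table 3.5). */
static unsigned cache_op_code(unsigned op) { return (op >> 2) & 0x7u; }

/* Target cache from op[1:0]. */
static enum cache_target cache_op_target(unsigned op)
{
    return (enum cache_target)(op & 0x3u);
}

/* Build the 5-bit op field from an operation code and a target cache. */
static unsigned make_cache_op(unsigned code, enum cache_target target)
{
    assert(code <= 7u);
    return (code << 2) | (unsigned)target;
}
```

For example, operation 0b101 on the D cache gives op = 0x15, the Hit Writeback Invalidate / Hit Invalidate encoding for the primary data cache.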

When multiple levels of cache are implemented and the hardware maintains the smaller cache as a proper subset of a larger cache (every address that is resident in the smaller cache is also resident in the larger cache; this is known as the inclusion property), it is recommended that a CACHE instruction operating on the larger, outer-level cache first operate on the smaller, inner-level cache. For example, a Hit_Writeback_Invalidate operation targeting the Secondary cache should first operate on the primary data cache. If the CACHE instruction implementation does not follow this policy, any software that flushes the caches must mimic this behavior; that is, the software sequence must first operate on the inner cache and then on the outer cache. Software must place a SYNC instruction after the CACHE instruction whenever writebacks from the inner cache are possible, to ensure that the writeback data is resident in the outer cache before the outer cache is operated on. If neither the CACHE instruction implementation nor the software cache flush sequence follows this policy, the inclusion property of the caches can be broken, which might be a condition that the cache management hardware cannot properly handle.
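
One way to follow the inner-then-outer ordering when software flushes an address range is sketched below in C. Rather than issue real CACHE and SYNC instructions, it records the steps it would take, stepping by the cache line size; the step names, the single SYNC between levels, and the array-based interface are assumptions, not a prescribed sequence.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical step kinds: inner-level op, barrier, outer-level op. */
enum step_kind { HIT_WB_INV_D, SYNC_BARRIER, HIT_WB_INV_S };

struct step { enum step_kind kind; uint32_t addr; };

/* Plan a flush of [start, start + len) with the given line size (a power
 * of two). Writes at most cap steps into out and returns the count. */
static size_t plan_flush(uint32_t start, uint32_t len, uint32_t line,
                         struct step *out, size_t cap)
{
    size_t n = 0;
    uint32_t first, last, a;

    if (len == 0)
        return 0;
    first = start & ~(line - 1u);
    last = (start + len - 1u) & ~(line - 1u);

    for (a = first; ; a += line) {            /* inner (primary D) first */
        assert(n < cap);
        out[n++] = (struct step){ HIT_WB_INV_D, a };
        if (a == last)
            break;
    }
    assert(n < cap);
    out[n++] = (struct step){ SYNC_BARRIER, 0 }; /* writebacks reach outer */
    for (a = first; ; a += line) {            /* then outer (secondary) */
        assert(n < cap);
        out[n++] = (struct step){ HIT_WB_INV_S, a };
        if (a == last)
            break;
    }
    return n;
}
```

A range of 0x40 bytes starting at 0x1004 with 32-byte lines touches three lines, so the plan is three inner operations, one SYNC, and three outer operations.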

When multiple levels of cache are implemented without the inclusion property, a SYNC instruction after the CACHE instruction is still needed whenever writeback data must be resident in the next level of the memory hierarchy.

For multiprocessor implementations that maintain coherent caches, some of the Hit-type CACHE operations may optionally affect all coherent caches within the implementation. If the effective address uses a coherent Cache Coherency Attribute (CCA), the operation is globalized, meaning it is broadcast to all of the coherent caches within the system. If the effective address does not use one of the coherent CCAs, the operation is not broadcast. If multiple levels of caches are affected by one CACHE instruction, all of the affected cache levels must be processed in the same manner: either all affected cache levels use the globalized behavior or all use the non-globalized behavior.

### Table 3.5 Encoding of Bits [20:18] of the CACHE Instruction

<table>
<thead>
<tr>
<th>Code</th>
<th>Caches</th>
<th>Name</th>
<th>Effective Address Operand Type</th>
<th>Operation</th>
<th>Compliance Implemented</th>
</tr>
</thead>
<tbody>
<tr>
<td>0b000</td>
<td>I</td>
<td>Index Invalidate</td>
<td>Index</td>
<td>Set the state of the cache block at the specified index to invalid. This required encoding may be used by software to invalidate the entire instruction cache by stepping through all valid indices.</td>
<td>Required</td>
</tr>
<tr>
<td></td>
<td>D</td>
<td>Index Writeback Invalidate / Index Invalidate</td>
<td>Index</td>
<td>For a write-back cache: If the state of the cache block at the specified index is valid and dirty, write the block back to the memory address specified by the cache tag. After that operation is completed, set the state of the cache block to invalid. If the block is valid but not dirty, set the state of the block to invalid. For a write-through cache: Set the state of the cache block at the specified index to invalid. This required encoding may be used by software to invalidate the entire data cache by stepping through all valid indices. The Index Store Tag operation must be used to initialize the cache at power-up.</td>
<td>Required</td>
</tr>
<tr>
<td></td>
<td>S, T</td>
<td>Index Writeback Invalidate / Index Invalidate</td>
<td>Index</td>
<td></td>
<td>Required if S, T cache is implemented</td>
</tr>
<tr>
<td>0b001</td>
<td>All</td>
<td>Index Load Tag</td>
<td>Index</td>
<td>Read the tag for the cache block at the specified index into the TagLo and TagHi Coprocessor 0 registers. If the DataLo and DataHi registers are implemented, also read the data corresponding to the byte index into the DataLo and DataHi registers. This operation must not cause a Cache Error Exception. The granularity and alignment of the data read into the DataLo and DataHi registers is implementation-dependent, but is typically the result of an aligned access to the cache, ignoring the appropriate low-order bits of the byte index.</td>
<td>Recommended</td>
</tr>
</tbody>
</table>
Table 3.5 Encoding of Bits [20:18] of the CACHE Instruction (Continued)

<table>
<thead>
<tr>
<th>Code</th>
<th>Caches</th>
<th>Name</th>
<th>Effective Address Operand Type</th>
<th>Operation</th>
<th>Compliance Implemented</th>
</tr>
</thead>
<tbody>
<tr>
<td>0b010</td>
<td>All</td>
<td>Index Store Tag</td>
<td>Index</td>
<td>Write the tag for the cache block at the specified index from the TagLo and TagHi Coprocessor 0 registers. This operation must not cause a Cache Error Exception. This required encoding may be used by software to initialize the entire instruction or data caches by stepping through all valid indices. Doing so requires that the TagLo and TagHi registers associated with the cache be initialized first.</td>
<td>Required</td>
</tr>
<tr>
<td>0b011</td>
<td>All</td>
<td>Implementation Dependent</td>
<td>Unspecified</td>
<td>Available for implementation-dependent operation.</td>
<td>Optional</td>
</tr>
<tr>
<td>0b100</td>
<td>I, D</td>
<td>Hit Invalidate</td>
<td>Address</td>
<td>If the cache block contains the specified address, set the state of the cache block to invalid. This required encoding may be used by software to invalidate a range of addresses from the instruction cache by stepping through the address range by the line size of the cache. In multiprocessor implementations with coherent caches, the operation may optionally be broadcast to all coherent caches within the system.</td>
<td>Required (Instruction Cache Encoding Only), Recommended otherwise</td>
</tr>
<tr>
<td></td>
<td>S, T</td>
<td>Hit Invalidate</td>
<td>Address</td>
<td></td>
<td>Optional, if Hit_Invalidate_D is implemented, the S and T variants are recommended.</td>
</tr>
<tr>
<td>0b101</td>
<td>I</td>
<td>Fill</td>
<td>Address</td>
<td>Fill the cache from the specified address.</td>
<td>Recommended</td>
</tr>
<tr>
<td></td>
<td>D</td>
<td>Hit Writeback Invalidate / Hit Invalidate</td>
<td>Address</td>
<td>For a write-back cache: If the cache block contains the specified address and it is valid and dirty, write the contents back to memory. After that operation is completed, set the state of the block to invalid. For a write-through cache: If the cache block contains the specified address, set the state of the cache block to invalid. This required encoding may be used by software to invalidate a range of addresses from the data cache by stepping through the address range by the line size of the cache. In multiprocessor implementations with coherent caches, the operation may optionally be broadcast to all coherent caches within the system.</td>
<td>Required</td>
</tr>
<tr>
<td></td>
<td>S, T</td>
<td>Hit Writeback Invalidate / Hit Invalidate</td>
<td>Address</td>
<td></td>
<td>Required if S, T cache is implemented</td>
</tr>
</tbody>
</table>
Perform Cache Operation

Table 3.5 Encoding of Bits [20:18] of the CACHE Instruction (Continued)

<table>
<thead>
<tr>
<th>Code</th>
<th>Caches</th>
<th>Name</th>
<th>Effective Address Operand Type</th>
<th>Operation</th>
<th>Compliance Implemented</th>
</tr>
</thead>
<tbody>
<tr>
<td>0b110</td>
<td>D</td>
<td>Hit Writeback</td>
<td>Address</td>
<td>If the cache block contains the specified address and it is valid and dirty, write the contents back to memory. After the operation is completed, leave the state of the line valid, but clear the dirty state. For a write-through cache, this operation may be treated as a nop. In multiprocessor implementations with coherent caches, the operation may optionally be broadcast to all coherent caches within the system.</td>
<td>Recommended</td>
</tr>
<tr>
<td></td>
<td>S, T</td>
<td>Hit Writeback</td>
<td>Address</td>
<td></td>
<td>Optional, if Hit_Writeback_D is implemented, the S and T variants are recommended.</td>
</tr>
<tr>
<td>0b111</td>
<td>I, D</td>
<td>Fetch and Lock</td>
<td>Address</td>
<td>If the cache does not contain the specified address, fill it from memory, performing a writeback if required. Set the state to valid and locked. If the cache already contains the specified address, set the state to locked. In set-associative or fully-associative caches, the way selected on a fill from memory is implementation dependent. The lock state may be cleared by executing an Index Invalidate, Index Writeback Invalidate, Hit Invalidate, or Hit Writeback Invalidate operation to the locked line, or via an Index Store Tag operation to the line that clears the lock bit. Clearing the lock state via Index Store Tag depends on the implementation-dependent cache tag and cache line organization, and clearing it via Index Invalidate or Index Writeback Invalidate depends on the cache line organization. Only Hit Invalidate and Hit Writeback Invalidate operations are generally portable across implementations. It is implementation dependent whether a locked line is displaced as the result of an external invalidate or intervention that hits on the locked line. Software must not depend on the locked line remaining in the cache if an external invalidate or intervention would invalidate the line if it were not locked. It is implementation dependent whether a Fetch and Lock operation affects more than one line. For example, more than one line around the referenced address may be fetched and locked. It is recommended that only the single line containing the referenced address be affected.</td>
<td>Recommended</td>
</tr>
</tbody>
</table>
Restrictions:
The operation of this instruction is **UNDEFINED** for any operation/cache combination that is not implemented. In Release 6, the instruction in this case should perform no operation.

The operation of this instruction is **UNDEFINED** if the operation requires an address, and that address is uncacheable. In Release 6, the instruction in this case should perform no operation.

The operation of the instruction is **UNPREDICTABLE** if the cache line that contains the CACHE instruction is the target of an invalidate or a writeback invalidate.

If this instruction is used to lock all ways of a cache at a specific cache index, the behavior of that cache to subsequent cache misses to that cache index is **UNDEFINED**.

If access to Coprocessor 0 is not enabled, a Coprocessor Unusable Exception is signaled.

Any use of this instruction that can cause cacheline writebacks should be followed by a subsequent SYNC instruction to avoid hazards where the writeback data is not yet visible at the next level of the memory hierarchy.

This instruction does not produce an exception for a misaligned memory address, since it has no memory access size.

Availability and Compatibility:
This instruction has been recoded for Release 6.

Operation:

\[
\text{vAddr} \leftarrow \text{GPR}[\text{base}] + \text{sign extend}(\text{offset})
\]

\[
(\text{pAddr, uncached}) \leftarrow \text{AddressTranslation(vAddr, DataReadReference)}
\]

\[
\text{CacheOp}(\text{op, vAddr, pAddr})
\]

Exceptions:

- TLB Refill Exception
- TLB Invalid Exception
- Coprocessor Unusable Exception
- Address Error Exception
- Cache Error Exception
- Bus Error Exception

Programming Notes:

Release 6 architecture implements a 9-bit offset, whereas all release levels lower than Release 6 implement a 16-bit offset.
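
The difference in offset width can be sketched in C as follows, assuming the field positions from the encoding tables at the start of this instruction (offset in bits 15..0 before Release 6 and in bits 15..7 in Release 6; base in bits 25..21 and op in bits 20..16 in both). The function names are hypothetical, and `gpr_base` stands for the value already read from GPR[base].

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low `bits` bits of value (1 <= bits <= 31). */
static int32_t sign_extend(uint32_t value, unsigned bits)
{
    uint32_t m = 1u << (bits - 1);
    value &= (1u << bits) - 1u;
    return (int32_t)((value ^ m) - m);
}

static unsigned cache_base_reg(uint32_t insn) { return (insn >> 21) & 0x1Fu; }
static unsigned cache_op_field(uint32_t insn) { return (insn >> 16) & 0x1Fu; }

/* Effective address = GPR[base] + sign_extend(offset), where the offset
 * is 9 bits (bits 15..7) in Release 6 and 16 bits (bits 15..0) before. */
static uint32_t cache_ea(uint32_t insn, uint32_t gpr_base, int release6)
{
    int32_t off = release6 ? sign_extend(insn >> 7, 9)
                           : sign_extend(insn, 16);
    return gpr_base + (uint32_t)off;
}
```

A Release 6 offset field of 0x1FC and a pre-Release 6 offset field of 0xFFFC both decode to -4.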

For cache operations that require an index, it is implementation dependent whether the effective address or the translated physical address is used as the cache index. Therefore, the index value should always be converted to an unmapped address, such as a kseg0 address obtained by ORing the index with 0x80000000, before it is used by the CACHE instruction. For example, the following code sequence performs a data cache Index Store Tag operation using the index passed in GPR a0:

```
li a1, 0x80000000 /* Base of kseg0 segment */
or a0, a0, a1 /* Convert index to kseg0 address */
cache DCIndexStTag, 0(a0) /* Perform the index store tag operation */
```
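
The assembly above converts a single index. A C sketch of generating every kseg0 index address for a whole-cache Index-type pass (such as the Index Store Tag initialization mentioned in Table 3.5) might look like this. It assumes the 0x80000000 kseg0 base used above, and that stepping by BPT through all CS bytes covers every index of every way, which holds for the address layout given earlier when A is a power of two; the names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define KSEG0_BASE 0x80000000u   /* assumed 32-bit kseg0 base */

/* Convert a cache index to an unmapped kseg0 address. */
static uint32_t index_to_kseg0(uint32_t index) { return index | KSEG0_BASE; }

/* Fill out[] with one kseg0 address per BPT-sized block of a CS-byte
 * cache; returns the number of addresses written. */
static size_t index_addresses(uint32_t cs, uint32_t bpt,
                              uint32_t *out, size_t cap)
{
    size_t n = 0;

    assert(bpt > 0);
    for (uint32_t off = 0; off < cs; off += bpt) {
        assert(n < cap);
        out[n++] = index_to_kseg0(off);
    }
    return n;
}
```

For a 16 KByte cache with 32-byte lines this yields 512 addresses, from 0x80000000 up to 0x80003FE0.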
**CACHEE**

**Perform Cache Operation EVA**

**Format:** CACHEE op, offset(base)

**Purpose:** Perform Cache Operation EVA

To perform the cache operation specified by op using a user mode virtual address while in kernel mode.

**Description:**

The 9-bit offset is sign-extended and added to the contents of the base register to form an effective address. The effective address is used in one of the following ways based on the operation to be performed and the type of cache as described in the following table.

### Table 3.6 Usage of Effective Address

<table>
<thead>
<tr>
<th>Operation Requires an</th>
<th>Type of Cache</th>
<th>Usage of Effective Address</th>
</tr>
</thead>
<tbody>
<tr>
<td>Address</td>
<td>Virtual</td>
<td>The effective address is used to address the cache. An address translation may or may not be performed on the effective address (with the possibility that a TLB Refill or TLB Invalid exception might occur)</td>
</tr>
<tr>
<td>Address</td>
<td>Physical</td>
<td>The effective address is translated by the MMU to a physical address. The physical address is then used to address the cache</td>
</tr>
<tr>
<td>Index</td>
<td>N/A</td>
<td>The effective address is translated by the MMU to a physical address. It is implementation dependent whether the effective address or the translated physical address is used to index the cache. As such, a kseg0 address should always be used for cache operations that require an index. See the Programming Notes section below.</td>
</tr>
</tbody>
</table>

Assuming that the total cache size in bytes is CS, the associativity is A, and the number of bytes per tag is BPT, the following calculations give the fields of the address which specify the way and the index:

- OffsetBit = \(\log_2(BPT)\)
- IndexBit = \(\log_2(CS / A)\)
- WayBit = IndexBit + \(\lceil \log_2(A) \rceil\)
- Way = \(\text{Addr}_{\text{WayBit}-1..\text{IndexBit}}\)
- Index = \(\text{Addr}_{\text{IndexBit}-1..\text{OffsetBit}}\)

For a direct-mapped cache, the Way calculation is ignored and the Index value fully specifies the cache tag. This is shown symbolically in the figure below.

**Figure 3.4 Usage of Address Fields to Select Index and Way**
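The field calculations above can be sketched in C. This is a minimal illustration that assumes a 32-bit address and power-of-two cache and line sizes. The names (`cache_fields_of`, `addr_bits`, `cache_fields`) are hypothetical and are not part of the architecture.

```c
#include <assert.h>
#include <stdint.h>

/* floor(log2(x)) for x >= 1; exact when x is a power of two. */
static unsigned log2_floor(uint32_t x) {
    unsigned n = 0;
    while (x > 1u) {
        x >>= 1;
        n++;
    }
    return n;
}

/* Ceiling(log2(x)) for x >= 1, as used in the WayBit formula. */
static unsigned log2_ceil(uint32_t x) {
    unsigned n = log2_floor(x);
    return ((1u << n) < x) ? n + 1u : n;
}

/* Addr[hi..lo], inclusive; an empty range (hi < lo) yields 0. */
static uint32_t addr_bits(uint32_t addr, unsigned hi, unsigned lo) {
    if (hi < lo)
        return 0;
    unsigned width = hi - lo + 1u;
    uint32_t mask = (width >= 32u) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return (addr >> lo) & mask;
}

typedef struct {
    uint32_t way;
    uint32_t index;
} cache_fields;

/* cs = total cache size in bytes, a = associativity, bpt = bytes per tag. */
static cache_fields cache_fields_of(uint32_t addr, uint32_t cs, uint32_t a, uint32_t bpt) {
    assert(a != 0u && bpt != 0u && cs % a == 0u && cs / a >= bpt);
    unsigned offset_bit = log2_floor(bpt);
    unsigned index_bit = log2_floor(cs / a);
    unsigned way_bit = index_bit + log2_ceil(a);

    cache_fields f;
    f.way = addr_bits(addr, way_bit - 1u, index_bit);      /* empty range when a == 1 */
    f.index = addr_bits(addr, index_bit - 1u, offset_bit);
    return f;
}
```

Take a hypothetical 32 KB, 4-way cache with 32-byte lines. IndexBit is 13 and WayBit is 15, so address 0x80006A40 selects way 3 and index 82. With A = 1 the way range is empty and the sketch returns way 0, which matches the direct-mapped case.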

TLB Refill and TLB Invalid exceptions (both with cause code equal to TLBL) can occur on any operation. For index operations (where the address is used to index the cache but need not match the cache tag), software should use unmapped addresses to avoid TLB exceptions. This instruction never causes TLB Modified exceptions or TLB Refill exceptions with a cause code of TLBS. This instruction never causes Execute-Inhibit or Read-Inhibit exceptions.

The effective address may be arbitrarily aligned. The CACHEE instruction never causes an Address Error Exception due to a non-aligned address.

A Cache Error exception may occur as a by-product of some operations performed by this instruction. For example, if a Writeback operation detects a cache or bus error during the processing of the operation, that error is reported via a Cache Error exception. Similarly, a Bus Error Exception may occur if a bus operation invoked by this instruction is terminated in an error. However, Cache Error exceptions must not be triggered by an Index Load Tag or Index Store Tag operation, as these operations are used for initialization and diagnostic purposes.

An Address Error Exception (with cause code equal AdEL) may occur if the effective address references a portion of the kernel address space which would normally result in such an exception. It is implementation dependent whether such an exception does occur.

It is implementation dependent whether a data watch is triggered by a cache instruction whose address matches the Watch register address match conditions.

The CACHEE instruction and the memory transactions which are sourced by the CACHEE instruction, such as cache refill or cache writeback, obey the ordering and completion rules of the SYNC instruction.

Bits [17:16] of the instruction specify the cache on which to perform the operation, as follows:

<table>
<thead>
<tr>
<th>Code</th>
<th>Name</th>
<th>Cache</th>
</tr>
</thead>
<tbody>
<tr>
<td>0b00</td>
<td>I</td>
<td>Primary Instruction</td>
</tr>
<tr>
<td>0b01</td>
<td>D</td>
<td>Primary Data or Unified Primary</td>
</tr>
<tr>
<td>0b10</td>
<td>T</td>
<td>Tertiary</td>
</tr>
<tr>
<td>0b11</td>
<td>S</td>
<td>Secondary</td>
</tr>
</tbody>
</table>

Bits [20:18] of the instruction specify the operation to perform. To provide software with a consistent base of cache operations, certain encodings must be supported on all processors. The remaining encodings are recommended. When implementing multiple levels of caches where the hardware maintains the smaller cache as a proper subset of a larger cache, it is recommended that the CACHEE instruction first operate on the smaller, inner-level cache. For example, a Hit_Writeback_Invalidate operation targeting the Secondary cache must first operate on the primary data cache. If the CACHEE instruction implementation does not follow this policy, then any software that flushes the caches must mimic this behavior: the software sequence must first operate on the inner cache and then on the outer cache. The software must place a SYNC instruction after the CACHEE instruction whenever there are possible writebacks from the inner cache, to ensure that the writeback data is resident in the outer cache before operating on the outer cache. If neither the CACHEE instruction implementation nor the software cache flush sequence follows this policy, the inclusion property of the caches can be broken, which may be a condition that the cache management hardware cannot properly handle.

When implementing multiple levels of caches without the inclusion property, software must place a SYNC instruction after the CACHEE instruction whenever writeback data has to be resident in the next level of the memory hierarchy.

For multiprocessor implementations that maintain coherent caches, some of the Hit-type CACHEE operations may optionally affect all coherent caches within the implementation. If the effective address uses a coherent Cache Coherency Attribute (CCA), the operation is globalized, meaning it is broadcast to all of the coherent caches within the system. If the effective address does not use one of the coherent CCAs, the operation is not broadcast. If multiple levels of caches are to be affected by one CACHEE instruction, all of the affected cache levels must be processed in the same manner: either all affected cache levels use the globalized behavior or all affected cache levels use the non-globalized behavior.

The CACHEE instruction functions the same as the CACHE instruction, except that address translation is performed using the user mode virtual address space mapping in the TLB when accessing an address within a memory segment configured to use the MUSUK access mode. Memory segments using UUSK or MUSK access modes are also accessible. Refer to Volume III, Enhanced Virtual Addressing section for additional information.

Implementation of this instruction is indicated by the $Config5_{EVA}$ field being set to 1.

### Table 3.8 Encoding of Bits [20:18] of the CACHEE Instruction

<table>
<thead>
<tr>
<th>Code</th>
<th>Caches</th>
<th>Name</th>
<th>Effective Address Operand Type</th>
<th>Operation</th>
<th>Compliance Implemented</th>
</tr>
</thead>
<tbody>
<tr>
<td>0b000</td>
<td>I</td>
<td>Index Invalidate</td>
<td>Index</td>
<td>Set the state of the cache block at the specified index to invalid. This required encoding may be used by software to invalidate the entire instruction cache by stepping through all valid indices.</td>
<td>Required</td>
</tr>
<tr>
<td></td>
<td>D</td>
<td>Index Writeback Invalidate / Index Invalidate</td>
<td>Index</td>
<td>For a write-back cache: If the state of the cache block at the specified index is valid and dirty, write the block back to the memory address specified by the cache tag. After that operation is completed, set the state of the cache block to invalid. If the block is valid but not dirty, set the state of the block to invalid.</td>
<td>Required</td>
</tr>
<tr>
<td></td>
<td>S, T</td>
<td>Index Writeback Invalidate / Index Invalidate</td>
<td>Index</td>
<td>For a write-through cache: Set the state of the cache block at the specified index to invalid. This required encoding may be used by software to invalidate the entire data cache by stepping through all valid indices. Note that Index Store Tag should be used to initialize the cache at power up.</td>
<td>Required if S, T cache is implemented</td>
</tr>
<tr>
<td>0b001</td>
<td>All</td>
<td>Index Load Tag</td>
<td>Index</td>
<td>Read the tag for the cache block at the specified index into the $TagLo$ and $TagHi$ Coprocessor 0 registers. If the $DataLo$ and $DataHi$ registers are implemented, also read the data corresponding to the byte index into the $DataLo$ and $DataHi$ registers. This operation must not cause a Cache Error Exception. The granularity and alignment of the data read into the $DataLo$ and $DataHi$ registers is implementation-dependent, but is typically the result of an aligned access to the cache, ignoring the appropriate low-order bits of the byte index.</td>
<td>Recommended</td>
</tr>
</tbody>
</table>
Table 3.8 Encoding of Bits [20:18] of the CACHEE Instruction (Continued)

<table>
<thead>
<tr>
<th>Code</th>
<th>Caches</th>
<th>Name</th>
<th>Effective Address Operand Type</th>
<th>Operation</th>
<th>Compliance Implemented</th>
</tr>
</thead>
<tbody>
<tr>
<td>0b010</td>
<td>All</td>
<td>Index Store Tag</td>
<td>Index</td>
<td>Write the tag for the cache block at the specified index from the TagLo and TagHi Coprocessor 0 registers. This operation must not cause a Cache Error Exception. This required encoding may be used by software to initialize the entire instruction or data caches by stepping through all valid indices. Doing so requires that the TagLo and TagHi registers associated with the cache be initialized first.</td>
<td>Required</td>
</tr>
<tr>
<td>0b011</td>
<td>All</td>
<td>Implementation Dependent</td>
<td>Unspecified</td>
<td>Available for implementation-dependent operation.</td>
<td>Optional</td>
</tr>
<tr>
<td>0b100</td>
<td>I, D</td>
<td>Hit Invalidate</td>
<td>Address</td>
<td>If the cache block contains the specified address, set the state of the cache block to invalid. This required encoding may be used by software to invalidate a range of addresses from the instruction cache by stepping through the address range by the line size of the cache.</td>
<td>Required (Instruction Cache Encoding Only), Recommended otherwise</td>
</tr>
<tr>
<td></td>
<td>S, T</td>
<td>Hit Invalidate</td>
<td>Address</td>
<td>In multiprocessor implementations with coherent caches, the operation may optionally be broadcast to all coherent caches within the system.</td>
<td>Optional, if Hit_Invalidate_D is implemented, the S and T variants are recommended.</td>
</tr>
<tr>
<td>0b101</td>
<td>I</td>
<td>Fill</td>
<td>Address</td>
<td>Fill the cache from the specified address.</td>
<td>Recommended</td>
</tr>
<tr>
<td></td>
<td>D</td>
<td>Hit Writeback Invalidate / Hit Invalidate</td>
<td>Address</td>
<td>For a write-back cache: If the cache block contains the specified address and it is valid and dirty, write the contents back to memory. After that operation is completed, set the state of the cache block to invalid. If the block is valid but not dirty, set the state of the block to invalid.</td>
<td>Required</td>
</tr>
<tr>
<td></td>
<td>S, T</td>
<td>Hit Writeback Invalidate / Hit Invalidate</td>
<td>Address</td>
<td>For a write-through cache: If the cache block contains the specified address, set the state of the cache block to invalid. This required encoding may be used by software to invalidate a range of addresses from the data cache by stepping through the address range by the line size of the cache. In multiprocessor implementations with coherent caches, the operation may optionally be broadcast to all coherent caches within the system.</td>
<td>Required if S, T cache is implemented</td>
</tr>
</tbody>
</table>
Table 3.8 Encoding of Bits [20:18] of the CACHEE Instruction (Continued)

<table>
<thead>
<tr>
<th>Code</th>
<th>Caches</th>
<th>Name</th>
<th>Effective Address Operand Type</th>
<th>Operation</th>
<th>Compliance Implemented</th>
</tr>
</thead>
<tbody>
<tr>
<td>0b110</td>
<td>D</td>
<td>Hit Writeback</td>
<td>Address</td>
<td>If the cache block contains the specified address and it is valid and dirty, write the contents back to memory. After the operation is completed, leave the state of the line valid, but clear the dirty state. For a write-through cache, this operation may be treated as a nop. In multiprocessor implementations with coherent caches, the operation may optionally be broadcast to all coherent caches within the system.</td>
<td>Recommended</td>
</tr>
<tr>
<td></td>
<td>S, T</td>
<td>Hit Writeback</td>
<td>Address</td>
<td></td>
<td>Optional, if Hit_Writeback_D is implemented, the S and T variants are recommended.</td>
</tr>
<tr>
<td>0b111</td>
<td>I, D</td>
<td>Fetch and Lock</td>
<td>Address</td>
<td>If the cache does not contain the specified address, fill it from memory, performing a writeback if required, and set the state to valid and locked. If the cache already contains the specified address, set the state to locked. In set-associative or fully-associative caches, the way selected on a fill from memory is implementation dependent. The lock state may be cleared by executing an Index Invalidate, Index Writeback Invalidate, Hit Invalidate, or Hit Writeback Invalidate operation to the locked line, or via an Index Store Tag operation to the line that clears the lock bit. Clearing the lock state via Index Store Tag depends on the implementation-dependent cache tag and cache line organization, and clearing it via Index Invalidate or Index Writeback Invalidate depends on the cache line organization. Only Hit Invalidate and Hit Writeback Invalidate operations are generally portable across implementations. It is implementation dependent whether a locked line is displaced as the result of an external invalidate or intervention that hits on the locked line. Software must not depend on the locked line remaining in the cache if an external invalidate or intervention would invalidate the line if it were not locked. It is implementation dependent whether a Fetch and Lock operation affects more than one line. For example, more than one line around the referenced address may be fetched and locked. It is recommended that only the single line containing the referenced address be affected.</td>
<td>Recommended</td>
</tr>
</tbody>
</table>
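Together, the cache selector (bits [17:16]) and the operation code (bits [20:18]) form the 5-bit op operand. The following C sketch packs and unpacks that field. The enumerator and function names are hypothetical labels chosen for this illustration, not architecturally defined symbols.

```c
#include <assert.h>
#include <stdint.h>

/* Cache codes from the bits [17:16] table. */
enum cache_sel { CACHE_I = 0x0, CACHE_D = 0x1, CACHE_T = 0x2, CACHE_S = 0x3 };

/* Operation codes from Table 3.8 (bits [20:18]). */
enum cache_opcode {
    OP_INDEX_INVALIDATE   = 0x0, /* I: Index Invalidate; D/S/T: Index Writeback Invalidate */
    OP_INDEX_LOAD_TAG     = 0x1,
    OP_INDEX_STORE_TAG    = 0x2,
    OP_IMPL_DEPENDENT     = 0x3,
    OP_HIT_INVALIDATE     = 0x4,
    OP_FILL_OR_HIT_WB_INV = 0x5, /* I: Fill; D/S/T: Hit Writeback Invalidate */
    OP_HIT_WRITEBACK      = 0x6,
    OP_FETCH_AND_LOCK     = 0x7
};

/* Pack the 5-bit op operand: operation in bits [4:2], cache in bits [1:0]. */
static unsigned cache_op(enum cache_opcode operation, enum cache_sel cache) {
    assert((unsigned)operation <= 7u && (unsigned)cache <= 3u);
    return ((unsigned)operation << 2) | (unsigned)cache;
}

/* Extract the cache selector from instruction bits [17:16]. */
static enum cache_sel insn_cache(uint32_t insn) {
    return (enum cache_sel)((insn >> 16) & 0x3u);
}

/* Extract the operation code from instruction bits [20:18]. */
static enum cache_opcode insn_operation(uint32_t insn) {
    return (enum cache_opcode)((insn >> 18) & 0x7u);
}
```

With this packing, a D-cache Index Store Tag becomes 0x09 and a D-cache Hit Writeback Invalidate becomes 0x15.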
Restrictions:

The operation of this instruction is **UNDEFINED** for any operation/cache combination that is not implemented. In Release 6, the instruction in this case should perform no operation.

The operation of this instruction is **UNDEFINED** if the operation requires an address, and that address is uncacheable. In Release 6, the instruction in this case should perform no operation.

The operation of the instruction is **UNPREDICTABLE** if the cache line that contains the CACHEE instruction is the target of an invalidate or a writeback invalidate.

If this instruction is used to lock all ways of a cache at a specific cache index, the behavior of that cache to subsequent cache misses to that cache index is **UNDEFINED**.

Any use of this instruction that can cause cacheline writebacks should be followed by a subsequent SYNC instruction to avoid hazards where the writeback data is not yet visible at the next level of the memory hierarchy.

Only usable when access to Coprocessor 0 is enabled and when accessing an address within a segment configured using UUSK, MUSK or MUSUK access mode.

This instruction does not produce an exception for a misaligned memory address, since it has no memory access size.

Operation:

\[
\begin{align*}
\text{vAddr} &\leftarrow \text{GPR}[\text{base}] + \text{sign\_extend}(\text{offset}) \\
(\text{pAddr}, \text{uncached}) &\leftarrow \text{AddressTranslation}(\text{vAddr}, \text{DataReadReference}) \\
&\text{CacheOp}(\text{op}, \text{vAddr}, \text{pAddr})
\end{align*}
\]

Exceptions:

- TLB Refill Exception
- TLB Invalid Exception
- Coprocessor Unusable Exception
- Reserved Instruction
- Address Error Exception
- Cache Error Exception
- Bus Error Exception

Programming Notes:

For cache operations that require an index, it is implementation dependent whether the effective address or the translated physical address is used as the cache index. Therefore, the index value should always be converted to a kseg0 address by ORing the index with 0x80000000 before being used by the cache instruction. For example, the following code sequence performs a data cache Index Store Tag operation using the index passed in GPR a0:

```plaintext
li a1, 0x80000000 /* Base of kseg0 segment */
or a0, a0, a1 /* Convert index to kseg0 address */
cache DCIndexStTag, 0(a0) /* Perform the index store tag operation on the converted address */
```
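To initialize or invalidate a whole cache, software steps through every index (and, for set-associative caches, every way) using kseg0 addresses built as described above. The C sketch below generates those addresses. It assumes power-of-two sizes and the way/index layout from Figure 3.4. The helper names are hypothetical, and the CACHE instruction itself would still be issued in assembly for each generated address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define KSEG0_BASE 0x80000000u /* base of the kseg0 segment */

/* log2 for powers of two. */
static unsigned log2_u32(uint32_t x) {
    unsigned n = 0;
    while (x > 1u) {
        x >>= 1;
        n++;
    }
    return n;
}

/* kseg0 address whose way and index fields select one cache line. */
static uint32_t index_op_addr(uint32_t way, uint32_t index,
                              unsigned index_bit, unsigned offset_bit) {
    return KSEG0_BASE | (way << index_bit) | (index << offset_bit);
}

/* Write one address per line (all ways, all indices) into out[];
   cs = cache size in bytes, a = associativity, bpt = bytes per tag.
   Returns the number of addresses written. */
static size_t all_index_addrs(uint32_t *out, size_t cap,
                              uint32_t cs, uint32_t a, uint32_t bpt) {
    unsigned offset_bit = log2_u32(bpt);
    unsigned index_bit = log2_u32(cs / a);
    uint32_t sets = (cs / a) / bpt; /* number of indices per way */
    size_t n = 0;

    assert((size_t)(cs / bpt) <= cap);
    for (uint32_t way = 0; way < a; way++)
        for (uint32_t index = 0; index < sets; index++)
            out[n++] = index_op_addr(way, index, index_bit, offset_bit);
    return n;
}
```

Take a hypothetical 4 KB, 2-way cache with 16-byte lines. The sketch yields 256 addresses, running from 0x80000000 to 0x80000FF0. Because the way field sits directly above the index field, the sequence is simply the kseg0 base plus every multiple of the line size below the cache size.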
CEIL.L.fmt Floating Point Ceiling Convert to Long Fixed Point

The MIPS32® Instruction Set Manual, Revision 6.04 125
Copyright © 2015 Imagination Technologies LTD. and/or its Affiliated Group Companies. All rights reserved.

Format:

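<table>
<thead>
<tr>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..11</th>
<th>10..6</th>
<th>5..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>COP1 010001</td>
<td>fmt</td>
<td>0 00000</td>
<td>fs</td>
<td>fd</td>
<td>CEIL.L 001010</td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>6</td>
</tr>
</tbody>
</table>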

CEIL.L.S fd, fs    MIPS32 Release 2
CEIL.L.D fd, fs    MIPS32 Release 2

Purpose: Floating Point Ceiling Convert to Long Fixed Point
To convert an FP value to 64-bit fixed point, rounding up.

Description:

FPR[fd] ← convert_and_round(FPR[fs])
The value in FPR fs, in format fmt, is converted to a value in 64-bit long fixed point format and rounded toward +∞ (rounding mode 2). The result is placed in FPR fd.

When the source value is Infinity, NaN, or rounds to an integer outside the range \(-2^{63}\) to \(2^{63}-1\), the result cannot be represented correctly, an IEEE Invalid Operation condition exists, and the Invalid Operation flag is set in the FCSR. If the Invalid Operation Enable bit is set in the FCSR, no result is written to fd and an Invalid Operation exception is taken immediately. Otherwise, a default result is written to fd. On cores with FCSR\_NAN2008=0, the default result is \(2^{63}-1\). On cores with FCSR\_NAN2008=1, the default result is:

- 0 when the input value is NaN
- \(2^{63}-1\) when the input value is \(+\infty\) or rounds to a number larger than \(2^{63}-1\)
- \(-2^{63}\) when the input value is \(-\infty\) or rounds to a number smaller than \(-2^{63}\)

Restrictions:
The fields fs and fd must specify valid FPRs: fs for type fmt and fd for long fixed point. If the fields are not valid, the result is UNPREDICTABLE.

The operand must be a value in format fmt; if it is not, the result is UNPREDICTABLE and the value of the operand FPR becomes UNPREDICTABLE.

The result of this instruction is UNPREDICTABLE if the processor is executing in the FR=0 32-bit FPU register model or on a 32-bit FPU. The result is predictable only when executing on a 64-bit FPU in FR=1 mode.

Operation:

StoreFPR(fd, L, ConvertFmt(ValueFPR(fs, fmt), fmt, L))
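The default-result rules can be modeled in C for CEIL.L.D. This is a sketch that assumes the Invalid Operation exception is disabled, so a default result is always written, and models FCSR\_NAN2008 as a boolean parameter. It does not model the Inexact flag. The type and function names are hypothetical.

```c
#include <math.h>    /* isnan, NAN, INFINITY (macros only, no libm call) */
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    int64_t value; /* value written to fd */
    bool invalid;  /* IEEE Invalid Operation condition raised */
} ceil_l_result;

static ceil_l_result ceil_l_d(double x, bool nan2008) {
    const double two63 = 9223372036854775808.0; /* 2^63, exact in a double */
    ceil_l_result r = { 0, true };

    if (isnan(x)) {
        r.value = nan2008 ? 0 : INT64_MAX;
        return r;
    }
    /* Every double with magnitude >= 2^52 is already an integer, so the
       rounded result is out of range exactly when x itself is. */
    if (x >= two63) {            /* +infinity or too large */
        r.value = INT64_MAX;
        return r;
    }
    if (x < -two63) {            /* -infinity or too small */
        r.value = nan2008 ? INT64_MIN : INT64_MAX;
        return r;
    }
    int64_t t = (int64_t)x;      /* truncates toward zero */
    if ((double)t < x)           /* positive fraction: round toward +infinity */
        t += 1;
    r.value = t;
    r.invalid = false;
    return r;
}
```

For example, 2.1 converts to 3 and -2.9 converts to -2. With NAN2008=1, \(-\infty\) yields INT64_MIN (\(-2^{63}\)) and NaN yields 0.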

Exceptions:
Coprocessor Unusable, Reserved Instruction

Floating Point Exceptions:
Invalid Operation, Unimplemented Operation, Inexact
CEIL.W.fmt Floating Point Ceiling Convert to Word Fixed Point

Format:

CEIL.W.fmt

CEIL.W.S fd, fs    MIPS32
CEIL.W.D fd, fs    MIPS32

Purpose: Floating Point Ceiling Convert to Word Fixed Point

To convert an FP value to 32-bit fixed point, rounding up.

Description:

\[ \text{FPR}[fd] \leftarrow \text{convert\_and\_round}(\text{FPR}[fs]) \]

The value in FPR \( fs \), in format \( fmt \), is converted to a value in 32-bit word fixed point format and rounded toward \(+\infty\) (rounding mode 2). The result is placed in FPR \( fd \).

When the source value is Infinity, NaN, or rounds to an integer outside the range \(-2^{31} \) to \(2^{31}-1\), the result cannot be represented correctly, an IEEE Invalid Operation condition exists, and the Invalid Operation flag is set in the FCSR. If the Invalid Operation Enable bit is set in the FCSR, no result is written to \( fd \) and an Invalid Operation exception is taken immediately. Otherwise, a default result is written to \( fd \). On cores with FCSR\_NAN2008=0, the default result is \(2^{31}-1\). On cores with FCSR\_NAN2008=1, the default result is:

- 0 when the input value is NaN
- \(2^{31}-1\) when the input value is \( +\infty \) or rounds to a number larger than \(2^{31}-1\)
- \(-2^{31}\) when the input value is \( -\infty \) or rounds to a number smaller than \(-2^{31}\)

Restrictions:

The fields \( fs \) and \( fd \) must specify valid FPRs; \( fs \) for type \( fmt \) and \( fd \) for word fixed point. If the fields are not valid, the result is UNPREDICTABLE.

The operand must be a value in format \( fmt \); if it is not, the result is UNPREDICTABLE and the value of the operand FPR becomes UNPREDICTABLE.

Operation:

\[ \text{StoreFPR}(fd, W, \text{ConvertFmt(ValueFPR}(fs, fmt), fmt, W)) \]
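The word-format rules can be sketched in C for CEIL.W.S. The assumptions match a long-format model: the Invalid Operation exception is disabled, FCSR\_NAN2008 is a boolean parameter, and Inexact is not modeled. The names are hypothetical.

```c
#include <math.h>    /* isnan macro, NAN */
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    int32_t value; /* value written to fd */
    bool invalid;  /* IEEE Invalid Operation condition raised */
} ceil_w_result;

static ceil_w_result ceil_w_s(float x, bool nan2008) {
    double d = (double)x;        /* exact widening of the single-precision input */
    ceil_w_result r = { 0, true };

    if (isnan(x)) {
        r.value = nan2008 ? 0 : INT32_MAX;
        return r;
    }
    if (d > 2147483647.0) {      /* ceiling exceeds 2^31 - 1 (includes +infinity) */
        r.value = INT32_MAX;
        return r;
    }
    if (d <= -2147483649.0) {    /* ceiling is below -2^31 (includes -infinity) */
        r.value = nan2008 ? INT32_MIN : INT32_MAX;
        return r;
    }
    int32_t t = (int32_t)d;      /* truncates toward zero; fits in 32 bits here */
    if ((double)t < d)           /* positive fraction: round toward +infinity */
        t += 1;
    r.value = t;
    r.invalid = false;
    return r;
}
```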

Exceptions:

Coprocessor Unusable, Reserved Instruction

Floating Point Exceptions:

Invalid Operation, Unimplemented Operation, Inexact
CFC1 Move Control Word From Floating Point

Format:

CFC1 rt, fs

Purpose:  Move Control Word From Floating Point
To copy a word from an FPU control register to a GPR.

Description:  
GPR[rt] ← FP_Control[fs]  
Copy the 32-bit word from FP (coprocessor 1) control register fs into GPR rt.

The definition of this instruction has been extended in Release 5 to support user mode read and write of Status_FR under the control of Config5_UFR. This optional feature is meant to facilitate the transition from FR=0 to FR=1 floating-point register modes in order to obsolete FR=0 mode in a future architecture release. User code may set and clear Status_FR without kernel intervention, provided the kernel explicitly grants permission.

This UFR facility is not supported in Release 6 because Release 6 only allows FR=1 mode. Accessing the UFR and UNFR registers causes a Reserved Instruction exception in Release 6 because FIR_UFRP is always 0.

The definition of this instruction has been extended in Release 6 to allow user code to read and modify the Config5_FRE bit. Such modification is allowed when this bit is present (as indicated by FIR_FREP) and user mode modification of the bit is enabled by the kernel (as indicated by Config5_UFE). Setting Config5_FRE to 1 causes all floating point instructions that are not compatible with FR=1 mode to take a Reserved Instruction exception. This makes it possible to run pre-Release 6 FR=0 floating point code on a Release 6 core that only supports FR=1 mode, provided the kernel has been set up to trap and emulate FR=0 behavior for these instructions. These instructions include floating-point arithmetic instructions that read or write single-precision registers, as well as the LWC1, SWC1, MTC1, and MFC1 instructions.

The FRE facility uses COP1 register aliases FRE and NFRE to access Config5_FRE.

Restrictions:
There are a few control registers defined for the floating point unit. Prior to Release 6, the result is UNPREDICTABLE if fs specifies a register that does not exist. In Release 6 and later, a Reserved Instruction exception occurs if fs specifies a register that does not exist.

The result is UNPREDICTABLE if fs specifies the UNFR or NFRE write-only control. Release 6 and later implementations are required to produce a Reserved Instruction exception; software must assume it is UNPREDICTABLE.

Operation:

if fs = 0 then
    temp ← FIR
elseif fs = 1 then /* read UFR (COP1 Register 1) */
    if FIR_UFRP then
        if not Config5_UFR then SignalException(ReservedInstruction) endif
        temp ← Status_FR
    else
        if Config_AR ≥ 2 then SignalException(ReservedInstruction) endif /* Release 6 traps */
        temp ← UNPREDICTABLE
    endif
elseif fs = 4 then /* UNFR (fs=4) is not supported for reading; UFR suffices */
    if Config_AR ≥ 2 then SignalException(ReservedInstruction) endif /* Release 6 traps */
    temp ← UNPREDICTABLE
elseif fs = 5 then /* user read of FRE, if permitted */
    if Config_AR < 2 then
        temp ← UNPREDICTABLE
    else
        if not Config5_UFE then SignalException(ReservedInstruction) endif
        temp ← 0^31 || Config5_FRE
    endif
elseif fs = 25 then /* FCCR */
    temp ← 0^24 || FCSR_31..25 || FCSR_23
elseif fs = 26 then /* FEXR */
    temp ← 0^14 || FCSR_17..12 || 0^5 || FCSR_6..2 || 0^2
elseif fs = 28 then /* FENR */
    temp ← 0^20 || FCSR_11..7 || 0^4 || FCSR_24 || FCSR_1..0
elseif fs = 31 then /* FCSR */
    temp ← FCSR
else
    if Config_AR ≥ 2 then SignalException(ReservedInstruction) endif /* Release 6 traps; includes NFRE */
    temp ← UNPREDICTABLE
endif

GPR[rt] ← temp

Exceptions:
Coprocessor Unusable, Reserved Instruction
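The FCCR, FEXR, and FENR cases of the CFC1 pseudocode are bit rearrangements of FCSR. The C sketch below makes the field positions concrete. It models only the concatenations shown in the pseudocode, not the exception checks, and the function names are hypothetical.

```c
#include <stdint.h>

/* fs = 25, FCCR: 0^24 || FCSR[31..25] || FCSR[23] */
static uint32_t fccr_from_fcsr(uint32_t fcsr) {
    return (((fcsr >> 25) & 0x7Fu) << 1) | ((fcsr >> 23) & 0x1u);
}

/* fs = 26, FEXR: 0^14 || FCSR[17..12] || 0^5 || FCSR[6..2] || 0^2
   (both fields keep their FCSR bit positions) */
static uint32_t fexr_from_fcsr(uint32_t fcsr) {
    return (fcsr & 0x0003F000u) | (fcsr & 0x0000007Cu);
}

/* fs = 28, FENR: 0^20 || FCSR[11..7] || 0^4 || FCSR[24] || FCSR[1..0]
   (FCSR bit 24 moves down to bit 2) */
static uint32_t fenr_from_fcsr(uint32_t fcsr) {
    return (fcsr & 0x00000F80u) | (((fcsr >> 24) & 0x1u) << 2) | (fcsr & 0x3u);
}
```

For example, take FCSR = 0x01800003, which has bit 24, bit 23, and bits 1..0 set. FCCR then reads as 0x1, FENR as 0x7, and FEXR as 0x0.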

Historical Information:
For the MIPS I, II and III architectures, the contents of GPR \( rt \) are UNPREDICTABLE for the instruction immediately following CFC1.

MIPS V and MIPS32 introduced the three control registers that access portions of FCSR. These registers were not available in MIPS I, II, III, or IV.

MIPS32 Release 5 introduced the UFR and UNFR register aliases that allow user level access to \( \text{Status}_{FR} \). Release 6 removes them.
CFC2 Move Control Word From Coprocessor 2

Format:  CFC2 rt, Impl  

The syntax shown above is an example using CFC1 as a model. The specific syntax is implementation dependent.

Purpose:  Move Control Word From Coprocessor 2

To copy a word from a Coprocessor 2 control register to a GPR.

Description:  GPR[rt] ← CP2CCR[Impl]

Copy the 32-bit word from the Coprocessor 2 control register denoted by the Impl field into GPR rt. The interpretation of the Impl field is left entirely to the Coprocessor 2 implementation and is not specified by the architecture.

Restrictions:

The result is UNPREDICTABLE if Impl specifies a register that does not exist.

Operation:

```
  temp ← CP2CCR[Impl]
  GPR[rt] ← temp
```

Exceptions:

Coprocessor Unusable, Reserved Instruction
CLASS.fmt Scalar Floating-Point Class Mask

Format:  CLASS.fmt
        CLASS.S fd,fs  MIPS32 Release 6
        CLASS.D fd,fs  MIPS32 Release 6

Purpose:  Scalar Floating-Point Class Mask

Scalar floating-point class shown as a bit mask for Zero, Negative, Infinite, Subnormal, Quiet NaN, or Signaling NaN.

Description:  FPR[fd] ← class(FPR[fs])

Stores in fd a bit mask reflecting the floating-point class of the scalar floating-point value in fs.

The mask has 10 bits as follows. Bits 0 and 1 indicate NaN values: signaling NaN (bit 0) and quiet NaN (bit 1). Bits 2, 3, 4, 5 classify negative values: infinity (bit 2), normal (bit 3), subnormal (bit 4), and zero (bit 5). Bits 6, 7, 8, 9 classify positive values: infinity (bit 6), normal (bit 7), subnormal (bit 8), and zero (bit 9).
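The 10-bit mask can be sketched in C for the double-precision case (CLASS.D). The sketch decodes the IEEE 754 binary64 fields directly, so subnormal inputs are classified as subnormal regardless of any flush-to-zero setting. It assumes the IEEE 754-2008 NaN encoding, in which the most significant fraction bit set means a quiet NaN. The names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Bit positions as listed in the CLASS.fmt description. */
enum {
    CLASS_SNAN          = 1u << 0,
    CLASS_QNAN          = 1u << 1,
    CLASS_NEG_INF       = 1u << 2,
    CLASS_NEG_NORMAL    = 1u << 3,
    CLASS_NEG_SUBNORMAL = 1u << 4,
    CLASS_NEG_ZERO      = 1u << 5,
    CLASS_POS_INF       = 1u << 6,
    CLASS_POS_NORMAL    = 1u << 7,
    CLASS_POS_SUBNORMAL = 1u << 8,
    CLASS_POS_ZERO      = 1u << 9
};

/* Classify a binary64 bit pattern. */
static unsigned class_d_bits(uint64_t bits) {
    int neg = (int)(bits >> 63);
    uint64_t exp = (bits >> 52) & 0x7FFu;
    uint64_t frac = bits & 0x000FFFFFFFFFFFFFull;

    if (exp == 0x7FF && frac != 0)       /* NaN: the sign is irrelevant */
        return ((frac >> 51) & 1u) ? CLASS_QNAN : CLASS_SNAN;
    if (exp == 0x7FF)
        return neg ? CLASS_NEG_INF : CLASS_POS_INF;
    if (exp == 0 && frac == 0)
        return neg ? CLASS_NEG_ZERO : CLASS_POS_ZERO;
    if (exp == 0)
        return neg ? CLASS_NEG_SUBNORMAL : CLASS_POS_SUBNORMAL;
    return neg ? CLASS_NEG_NORMAL : CLASS_POS_NORMAL;
}

/* Convenience wrapper for ordinary double values. */
static unsigned class_d(double x) {
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    return class_d_bits(bits);
}
```

For example, `class_d(1.0)` returns 0x080 (positive normal) and `class_d(-0.0)` returns 0x020. The bit-pattern entry point lets a signaling NaN such as 0x7FF0000000000001 be classified (as 0x001) without passing it through floating-point registers, where some platforms may quiet it.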

This instruction corresponds to the class operation of the IEEE Standard for Floating-Point Arithmetic 754™-2008. This scalar FPU instruction also corresponds to the vector FCLASS.df instruction of MSA.

The input values and generated bit masks are not affected by the flush-subnormal-to-zero mode FCSR.FS.

The input operand is a scalar value in floating-point data format fmt. Bits beyond the width of fmt are ignored. The result is a 10-bit bitmask as described above, zero-extended to fmt-width bits. Coprocessor register bits beyond fmt-width bits are UNPREDICTABLE (e.g., for CLASS.S, bits 32-63 are UNPREDICTABLE on a 64-bit FPU, while bits 32-127 are UNPREDICTABLE if the processor supports MSA).

Restrictions:

No data-dependent exceptions are possible.

Availability and Compatibility:

This instruction is introduced by and required as of Release 6.

CLASS.fmt is defined only for formats S and D. Other formats must produce a Reserved Instruction exception (unless used for a different instruction).

Operation:

```
if not IsCoprocessorEnabled(1)
    then SignalException(CoprocessorUnusable, 1) endif
if not IsFloatingPointImplemented(fmt)
    then SignalException(ReservedInstruction) endif

fin ← ValueFPR(fs, fmt)
masktmp ← ClassFP(fin, fmt)
StoreFPR(fd, fmt, masktmp)
/* end of instruction */

function ClassFP(fin, fmt)
    /* Implementation defined class operation. */
endfunction ClassFP
```

<table>
<thead>
<tr>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..11</th>
<th>10..6</th>
<th>5..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>COP1 010001</td>
<td>fmt</td>
<td>00000</td>
<td>fs</td>
<td>fd</td>
<td>CLASS</td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>6</td>
</tr>
</tbody>
</table>

Exceptions:
Coprocessor Unusable, Reserved Instruction

Floating Point Exceptions:
Unimplemented Operation
**CLO Count Leading Ones in Word**

**Pre-Release 6**

<table>
<thead>
<tr>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..11</th>
<th>10..6</th>
<th>5..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>SPECIAL2 011100</td>
<td>rs</td>
<td>rt</td>
<td>rd</td>
<td>0 00000</td>
<td>CLO 100001</td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>6</td>
</tr>
</tbody>
</table>

**Release 6**

<table>
<thead>
<tr>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..11</th>
<th>10..6</th>
<th>5..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>SPECIAL 000000</td>
<td>rs</td>
<td>00000</td>
<td>rd</td>
<td>00001</td>
<td>CLO 010001</td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>6</td>
</tr>
</tbody>
</table>

**Format:** CLO rd, rs  

**Purpose:** Count Leading Ones in Word

To count the number of leading ones in a word.

**Description:** GPR[rd] ← count_leading_ones GPR[rs]

Bits 31..0 of GPR rs are scanned from most significant to least significant bit. The number of leading ones is counted and the result is written to GPR rd. If all of bits 31..0 were set in GPR rs, the result written to GPR rd is 32.

**Restrictions:**

Pre-Release 6: To be compliant with the MIPS32 Architecture, software must place the same GPR number in both the rt and rd fields of the instruction. The operation of the instruction is UNPREDICTABLE if the rt and rd fields of the instruction contain different values. Release 6’s new instruction encoding does not contain an rt field.

**Availability and Compatibility:**

This instruction has been recoded for Release 6.

**Operation:**

```plaintext
temp ← 32
for i in 31 .. 0
    if GPR[rs]_i = 0 then
        temp ← 31 - i
        break
    endif
endfor
GPR[rd] ← temp
```
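A direct C transcription of the CLO pseudocode is shown below as a sketch (the function name is hypothetical). It scans from bit 31 downward and stops at the first zero bit.

```c
#include <stdint.h>

/* Count leading ones in a 32-bit word, mirroring the CLO pseudocode. */
static unsigned clo32(uint32_t rs) {
    unsigned temp = 32;
    for (int i = 31; i >= 0; i--) {
        if (((rs >> i) & 1u) == 0) {
            temp = 31u - (unsigned)i;
            break;
        }
    }
    return temp;
}
```

For example, `clo32(0xFFFF7FFF)` returns 16 and `clo32(0xFFFFFFFF)` returns 32.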

**Exceptions:**

None

**Programming Notes:**

As shown in the instruction drawing above, the Release 6 architecture sets the ‘rt’ field to a value of 00000.
CLZ Count Leading Zeros in Word

Format: CLZ rd, rs

Purpose: Count Leading Zeros in Word
To count the number of leading zeros in a word.

Description: GPR[rd] ← count_leading_zeros GPR[rs]
Bits 31..0 of GPR rs are scanned from most significant to least significant bit. The number of leading zeros is counted and the result is written to GPR rd. If no bits were set in GPR rs, the result written to GPR rd is 32.

Restrictions:
Pre-Release 6: To be compliant with the MIPS32 Architecture, software must place the same GPR number in both the rt and rd fields of the instruction. The operation of the instruction is UNPREDICTABLE if the rt and rd fields of the instruction contain different values. Release 6’s new instruction encoding does not contain an rt field.

Availability and Compatibility:
This instruction has been recoded for Release 6.

Operation:
```
temp ← 32
for i in 31 .. 0
   if GPR[rs]_i = 1 then
      temp ← 31 - i
      break
   endif
endfor
GPR[rd] ← temp
```
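The same count can be computed without a bit-by-bit loop. The following C sketch (hypothetical name) uses a binary search over halves of the word. It is an alternative technique that returns the same result as the CLZ pseudocode loop, including 32 for an all-zero input.

```c
#include <stdint.h>

/* Count leading zeros by halving the search window each step. */
static unsigned clz32(uint32_t rs) {
    if (rs == 0)
        return 32;
    unsigned n = 0;
    if ((rs & 0xFFFF0000u) == 0) { n += 16; rs <<= 16; }
    if ((rs & 0xFF000000u) == 0) { n += 8;  rs <<= 8;  }
    if ((rs & 0xF0000000u) == 0) { n += 4;  rs <<= 4;  }
    if ((rs & 0xC0000000u) == 0) { n += 2;  rs <<= 2;  }
    if ((rs & 0x80000000u) == 0) { n += 1; }
    return n;
}
```

For example, `clz32(1)` returns 31 and `clz32(0x00010000)` returns 15. Leading ones can be counted the same way as `clz32(~x)`.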

Exceptions:
None

Programming Notes:
Release 6 sets the ‘rt’ field to a value of 00000.
CMP.condn.fmt Floating Point Compare Setting Mask

Format:

CMP.condn.fmt

CMP.condn.S fd, fs, ft    MIPS32 Release 6
CMP.condn.D fd, fs, ft    MIPS32 Release 6

Purpose: Floating Point Compare Setting Mask

To compare FP values and record the result as a format-width mask of all 0s or all 1s in a floating point register.

Description:

FPR[fd]  FPR[fs] compare_cond FPR[ft]

The value in FPR fs is compared to the value in FPR ft.

The comparison is exact and neither overflows nor underflows.

If the comparison specified by the condn field of the instruction is true for the operand values, the result is true; otherwise, the result is false. If no exception is taken, the result is written into FPR fd: true is all 1s and false is all 0s, repeated across the operand width of fmt. All other bits beyond the operand width of fmt are UNPREDICTABLE. For example, a 32-bit single precision comparison writes a mask of 32 0s or 1s into bits 0 to 31 of FPR fd. It makes bits 32 to 63 UNPREDICTABLE if a 64-bit FPU without MSA is present, and bits 32 to 127 UNPREDICTABLE if MSA is present.

All encodings of the condn field that are not specified (for example, items shaded in Table 3.9) are reserved in Release 6 and produce a Reserved Instruction exception.

Release 6: The condn field bits have specific purposes: cond4, cond2..1 specify the nature of the comparison (equals, less than, and so on); cond0 specifies whether the comparison is ordered or unordered, that is, false or true if any operand is a NaN; cond3 indicates whether the instruction should signal an exception on QNaN inputs. However, the MIPS ISA may in the future be extended in ways that do not preserve these meanings.

There are four mutually exclusive ordering relations for comparing floating point values; one relation is always true and the others are false. The familiar relations are greater than, less than, and equal. In addition, the IEEE floating point standard defines the relation unordered, which is true when at least one operand value is NaN. NaN compares unordered with everything, including itself. Comparisons ignore the sign of zero, so +0 equals -0.

The comparison condition is a logical predicate, or equation, of the ordering relations, such as less than or equal, equal, not less than, or unordered or equal. Compare distinguishes among the 16 comparison predicates. The Boolean result of the instruction is obtained by substituting the Boolean value of each ordering relation for the two FP values into the equation. For example, if the equal relation is true, all four example predicates above yield a true result; if the unordered relation is true, only the final predicate, unordered or equal, yields a true result.
The predicates implemented are described in Table 3.9 “Comparing CMP.condn.fmt, IEEE 754-2008, C.cond.fmt, and MSA FP compares” on page 136. Not all of the 16 IEEE predicates are implemented directly by hardware. For the directed comparisons (LT, LE, GT, GE), the missing predicates can be obtained by swapping the FPR operands $fs$ and $ft$. For example, the hardware implements the “Ordered Less Than” predicate $LT(fs,ft)$; reversing the operands, $LT(ft,fs)$, yields “Ordered Greater Than” $GT(fs,ft)$. Likewise, reversing the operands of “Unordered or Less Than or Equal”, $ULE(ft,fs)$, yields “Unordered or Greater Than or Equal” $UGE(fs,ft)$. Table 3.9 shows these mappings. Reversing the operands has no effect on symmetric predicates such as EQ, so Release 6 implements the negations of the symmetric predicates directly, allowing every mask value to be generated in a single instruction.

Table 3.9 compares CMP.condn.fmt to (1) the MIPS32 Pre-Release 6 C.cond.fmt instructions and (2) the MIPS SIMD Architecture (MSA) packed vector floating point comparison instructions. CMP.condn.fmt provides exactly the same comparisons for FPU scalar values that MSA provides for packed vectors, with similar mnemonics. CMP.condn.fmt provides a superset of the MIPS32 Release 5 C.cond.fmt comparisons.

In addition, Table 3.9 shows the corresponding IEEE 754-2008 comparison operations.
## Table 3.9 Comparing CMP.condn.fmt, IEEE 754-2008, C.cond.fmt, and MSA FP compares

Shaded entries in the table are unimplemented and reserved.

<table>
<thead>
<tr>
<th>Relation</th>
<th>MSA: operation</th>
<th>MSA: minor opcode</th>
<th>Predicates</th>
<th>Negated Predicates</th>
</tr>
</thead>
<tbody>
<tr>
<td>&gt;</td>
<td>F F F F F</td>
<td>FCAF</td>
<td>AF</td>
<td>AT</td>
</tr>
<tr>
<td>&lt;</td>
<td>F F F F F</td>
<td>FCAF</td>
<td>Always False</td>
<td>Always True</td>
</tr>
<tr>
<td>=</td>
<td>F F F F F</td>
<td>FCAF</td>
<td>False</td>
<td>FCAF</td>
</tr>
<tr>
<td>!=</td>
<td>F F F F F</td>
<td>FCAF</td>
<td>FCAF</td>
<td>FCAF</td>
</tr>
<tr>
<td>&gt;=</td>
<td>F F F F F</td>
<td>FCAF</td>
<td>FCAF</td>
<td>FCAF</td>
</tr>
<tr>
<td>&lt;=</td>
<td>F F F F F</td>
<td>FCAF</td>
<td>FCAF</td>
<td>FCAF</td>
</tr>
<tr>
<td>==</td>
<td>F F F F F</td>
<td>FCAF</td>
<td>FCAF</td>
<td>FCAF</td>
</tr>
<tr>
<td>!=</td>
<td>F F F F F</td>
<td>FCAF</td>
<td>FCAF</td>
<td>FCAF</td>
</tr>
</tbody>
</table>

Table 3.9 Comparing CMP.condn.fmt, IEEE 754-2008, C.cond.fmt, and MSA FP compares (Continued)

### Instruction Encodings

<table>
<thead>
<tr>
<th>Instruction Encodings</th>
</tr>
</thead>
<tbody>
<tr>
<td>CMP.condn.fmt: 010001 fffff ttttt sssss ddddd 0ccccc</td>
</tr>
<tr>
<td>C.cond.fmt: 010001 fffff ttttt sssss CCC00 11cccc</td>
</tr>
<tr>
<td>MSA: 011110 oooof ttttt sssss ddddd mmmmmm</td>
</tr>
</tbody>
</table>

#### Predicates

<table>
<thead>
<tr>
<th>Relation</th>
<th>Long names</th>
<th>IEEE</th>
</tr>
</thead>
<tbody>
<tr>
<td>&gt;</td>
<td>SAF</td>
<td>ST</td>
</tr>
<tr>
<td>=</td>
<td>SAF</td>
<td>SAT</td>
</tr>
<tr>
<td>&lt;</td>
<td>SAF</td>
<td>SAT</td>
</tr>
<tr>
<td>&gt;=</td>
<td>SAF</td>
<td>SAT</td>
</tr>
<tr>
<td>&lt;=</td>
<td>SAF</td>
<td>SAT</td>
</tr>
<tr>
<td>&lt; &gt; ?</td>
<td>SAF</td>
<td>SAT</td>
</tr>
<tr>
<td>&gt; = ?</td>
<td>SAF</td>
<td>SAT</td>
</tr>
<tr>
<td>&lt; = ?</td>
<td>SAF</td>
<td>SAT</td>
</tr>
</tbody>
</table>
Restrictions:

Operation:
if SNaN(ValueFPR(fs, fmt)) or SNaN(ValueFPR(ft, fmt)) or
   QNaN(ValueFPR(fs, fmt)) or QNaN(ValueFPR(ft, fmt)) then
    less ← false
    equal ← false
    unordered ← true
    if (SNaN(ValueFPR(fs, fmt)) or SNaN(ValueFPR(ft, fmt))) or
       (cond3 and (QNaN(ValueFPR(fs, fmt)) or QNaN(ValueFPR(ft, fmt)))) then
        SignalException(InvalidOperation)
    endif
else
    less ← ValueFPR(fs, fmt) <fmt ValueFPR(ft, fmt)
    equal ← ValueFPR(fs, fmt) =fmt ValueFPR(ft, fmt)
    unordered ← false
endif
condition ← cond4 xor ((cond2 and less) or (cond1 and equal) or (cond0 and unordered))
StoreFPR(fd, fmt, ExtendBit.fmt(condition))

Exceptions:
Coprocessor Unusable, Reserved Instruction

Floating Point Exceptions:
Unimplemented Operation, Invalid Operation
COP2 Instruction

**Purpose:** Coprocessor Operation to Coprocessor 2
To perform an operation to Coprocessor 2.

**Description:** CoprocessorOperation(2, cofun)
An implementation-dependent operation is performed to Coprocessor 2, with the cofun value passed as an argument. The operation may specify and reference internal coprocessor registers, and may change the state of the coprocessor conditions, but does not modify state within the processor. Details of coprocessor operation and internal state are described in the documentation for each Coprocessor 2 implementation.

**Restrictions:**

**Operation:**
CoprocessorOperation(2, cofun)

**Exceptions:**
Coprocessor Unusable, Reserved Instruction
CTC1 Move Control Word to Floating Point

Format:  CTC1 rt, fs  

Purpose:  Move Control Word to Floating Point

To copy a word from a GPR to an FPU control register.

Description:  FP_Control[fs] ← GPR[rt]

Copy the low word from GPR rt into the FP (coprocessor 1) control register indicated by fs.

Writing to the floating point Control/Status register, FCSR, causes the appropriate exception if any Cause bit and its corresponding Enable bit are both set. Likewise, writing to FEXR to set a Cause bit whose Enable bit is already set, or writing to FENR to set an Enable bit whose Cause bit is already set, causes the appropriate exception. In each case the register is written before the exception occurs, and the EPC register contains the address of the CTC1 instruction.

The definition of this instruction was extended in Release 5 to support user-mode read and write of StatusFR under the control of Config5UFR. This optional feature is meant to ease the transition from FR=0 to FR=1 floating-point register modes so that FR=0 mode can be made obsolete in a future architecture release. User code may set and clear StatusFR without kernel intervention, provided the kernel explicitly grants permission. The UFR facility is not supported in Release 6, since Release 6 allows only FR=1 mode. Accessing the UFR and UNFR registers causes a Reserved Instruction exception in Release 6, since FIRUFRP is always 0.

The definition of this instruction was extended in Release 6 to allow user code to read and modify the Config5FRE bit. Such modification is allowed when this bit is present (as indicated by FIRFREP) and user-mode modification of the bit is enabled by the kernel (as indicated by Config5UFE). Setting Config5FRE to 1 causes all floating point instructions that are not compatible with FR=1 mode to take a Reserved Instruction exception. This makes it possible to run pre-Release 6 FR=0 floating point code on a Release 6 core that supports only FR=1 mode, provided the kernel has been set up to trap and emulate FR=0 behavior for these instructions. These instructions include the floating-point arithmetic instructions that read or write single-precision registers, as well as LWC1, SWC1, MTC1, and MFC1.

The FRE facility uses COP1 register aliases FRE and NFRE to access Config5FRE.

Restrictions:

There are a few control registers defined for the floating point unit. Prior to Release 6, the result is UNPREDICTABLE if fs specifies a register that does not exist. In Release 6 and later, a Reserved Instruction exception occurs if fs specifies a register that does not exist.

Furthermore, the result is UNPREDICTABLE if fs specifies the UFR, UNFR, FRE, or NFRE alias and rt is anything other than 00000 (GPR[0]). Release 6 and later implementations are required to produce a Reserved Instruction exception; software must assume the result is UNPREDICTABLE.

Operation:

```
temp ← GPR[rt]31..0
if (fs = 1 or fs = 4) then
   /* clear UFR or UNFR (CP1 Register 1) */
   if ConfigAR ≥ 2 then SignalException(ReservedInstruction) endif /* Release 6 traps */
   if not Config5UFR then SignalException(ReservedInstruction) endif
   if not (rt = 0 and FIRUFRP) then UNPREDICTABLE /* end of instruction */ endif
   if fs = 1 then StatusFR ← 0
   else if fs = 4 then StatusFR ← 1
   else /* cannot happen */
   endif
else if fs = 5 then /* user write of 1 to FRE, if permitted (FRE alias) */
   if ConfigAR < 2 then UNPREDICTABLE /* FRE is not defined before Release 6 */
   else
      if rt ≠ 0 then SignalException(ReservedInstruction) endif
      if not Config5UFE then SignalException(ReservedInstruction) endif
      Config5FRE ← 1
   endif
else if fs = 6 then /* user write of 0 to FRE, if permitted (NFRE alias) */
   if ConfigAR < 2 then UNPREDICTABLE /* FRE is not defined before Release 6 */
   else
      if rt ≠ 0 then SignalException(ReservedInstruction) endif
      if not Config5UFE then SignalException(ReservedInstruction) endif
      Config5FRE ← 0
   endif
else if fs = 25 then /* FCCR */
   if temp31..8 ≠ 0^24 then
      UNPREDICTABLE
   else
      FCSR ← temp7..1 || FCSR24 || temp0 || FCSR22..0
   endif
else if fs = 26 then /* FEXR */
   if temp31..18 ≠ 0 or temp11..7 ≠ 0 or temp1..0 ≠ 0 then
      UNPREDICTABLE
   else
      FCSR ← FCSR31..18 || temp17..12 || FCSR11..7 || temp6..2 || FCSR1..0
   endif
else if fs = 28 then /* FENR */
   if temp31..12 ≠ 0 or temp6..3 ≠ 0 then
      UNPREDICTABLE
   else
      FCSR ← FCSR31..25 || temp2 || FCSR23..12 || temp11..7 || FCSR6..2 || temp1..0
   endif
else if fs = 31 then /* FCSR */
   if (FCSRImpl field is not implemented) and (temp22..18 ≠ 0) then
      UNPREDICTABLE
   else if (FCSRImpl field is implemented) and (temp20..18 ≠ 0) then
      UNPREDICTABLE
   else
      FCSR ← temp
   endif
else
   if ConfigAR ≥ 2 then SignalException(ReservedInstruction) endif /* Release 6 traps */
   UNPREDICTABLE
endif
CheckFPException()
```

The MIPS32® Instruction Set Manual, Revision 6.04

Copyright © 2015 Imagination Technologies LTD. and/or its Affiliated Group Companies. All rights reserved.

Exceptions:

Coprocessor Unusable, Reserved Instruction
Floating Point Exceptions:
Unimplemented Operation, Invalid Operation, Division-by-zero, Inexact, Overflow, Underflow

Historical Information:
For the MIPS I, II and III architectures, the contents of floating point control register $f_s$ are UNPREDICTABLE for the instruction immediately following CTC1.

MIPS V and MIPS32 introduced the three control registers that access portions of $FCSR$. These registers were not available in MIPS I, II, III, or IV.

MIPS32 Release 5 introduced the UFR and UNFR register aliases that allow user level access to $Status_{FR}$.

MIPS32 Release 6 introduced the FRE and NFRE register aliases, which allow user code to cause traps for $FR=0$ mode emulation.
CTC2 Move Control Word to Coprocessor 2

Format: CTC2 rt, Impl
MIPS32

The syntax shown above is an example using CTC1 as a model. The specific syntax is implementation dependent.

**Purpose:** Move Control Word to Coprocessor 2

To copy a word from a GPR to a Coprocessor 2 control register.

**Description:** \( \text{CP2CCR}[\text{Impl}] \leftarrow \text{GPR}[rt] \)

Copy the low word from GPR \( rt \) into the Coprocessor 2 control register denoted by the \( \text{Impl} \) field. The interpretation of the \( \text{Impl} \) field is left entirely to the Coprocessor 2 implementation and is not specified by the architecture.

**Restrictions:**

The result is **UNPREDICTABLE** if \( \text{Impl} \) specifies a register that does not exist.

**Operation:**

\[
\text{temp} \leftarrow \text{GPR}[rt] \\
\text{CP2CCR}[\text{Impl}] \leftarrow \text{temp}
\]

**Exceptions:**

Coprocessor Unusable, Reserved Instruction
### Format:

- \texttt{CVT.D.fmt}
- \texttt{CVT.D.S fd, fs}
- \texttt{CVT.D.W fd, fs}
- \texttt{CVT.D.L fd, fs}

### Purpose:
Floating Point Convert to Double Floating Point
To convert an FP or fixed point value to double FP.

### Description:
\[ \text{FPR}[fd] \leftarrow \text{convert\_and\_round}(\text{FPR}[fs]) \]

The value in FPR \(fs\), in format \(fmt\), is converted to a value in double floating point format and rounded according to the current rounding mode in \(FCSR\). The result is placed in FPR \(fd\). If \(fmt\) is S or W, then the operation is always exact.

### Restrictions:
The fields \(fs\) and \(fd\) must specify valid FPRs, \(fs\) for type \(fmt\) and \(fd\) for double floating point. If the fields are not valid, the result is \textbf{UNPREDICTABLE}.

The operand must be a value in format \(fmt\); if it is not, the result is \textbf{UNPREDICTABLE} and the value of the operand FPR becomes \textbf{UNPREDICTABLE}.

For CVT.D.L, the result of this instruction is \textbf{UNPREDICTABLE} if the processor is executing in the \(FR=0\) 32-bit FPU register model.

### Operation:
\[ \text{StoreFPR}(fd, D, \text{ConvertFmt}(\text{ValueFPR}(fs, fmt), fmt, D)) \]

### Exceptions:
Coprocessor Unusable, Reserved Instruction

### Floating Point Exceptions:
Invalid Operation, Unimplemented Operation, Inexact

---

<table>
<thead>
<tr>
<th>COP1</th>
<th>fmt</th>
<th>0</th>
<th>fs</th>
<th>fd</th>
<th>CVT.D</th>
</tr>
</thead>
<tbody>
<tr>
<td>010001</td>
<td></td>
<td>00000</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>6</td>
</tr>
</tbody>
</table>

- \textbf{fmt} indicates the format of the input floating point or fixed point value.
- \textbf{fs} is the source floating point or fixed point register.
- \textbf{fd} is the destination floating point register for double floating point format.
- \textbf{CVT.D} is the instruction to convert the value from \textbf{fmt} format to double floating point.
**Purpose:** Floating Point Convert to Long Fixed Point

To convert an FP value to a 64-bit fixed point.

**Description:**

\[
\text{FPR}[fd] \leftarrow \text{convert\_and\_round}(\text{FPR}[fs])
\]

Convert the value in format `fmt` in FPR `fs` to long fixed point format and round according to the current rounding mode in `FCSR`. The result is placed in FPR `fd`.

When the source value is Infinity or NaN, or rounds to an integer outside the range \(-2^{63}\) to \(2^{63} - 1\), the result cannot be represented correctly, an IEEE Invalid Operation condition exists, and the Invalid Operation flag is set in the `FCSR`. If the Invalid Operation Enable bit is set in the `FCSR`, no result is written to `fd` and an Invalid Operation exception is taken immediately. Otherwise, a default result is written to `fd`. On cores with `FCSR_{NAN2008}=0`, the default result is \(2^{63} - 1\). On cores with `FCSR_{NAN2008}=1`, the default result is:

- 0 when the input value is NaN
- \(2^{63} - 1\) when the input value is \(+\infty\) or rounds to a number larger than \(2^{63} - 1\)
- \(-2^{63}\) when the input value is \(-\infty\) or rounds to a number smaller than \(-2^{63}\)

**Restrictions:**

The fields `fs` and `fd` must specify valid FPRs, `fs` for type `fmt` and `fd` for long fixed point. If the fields are not valid, the result is UNPREDICTABLE.

The operand must be a value in format `fmt`; if it is not, the result is UNPREDICTABLE and the value of the operand FPR becomes UNPREDICTABLE.

The result of this instruction is UNPREDICTABLE if the processor is executing in the `FR=0` 32-bit FPU register model; it is predictable if executing on a 64-bit FPU in the `FR=1` mode, but not with `FR=0`, and not on a 32-bit FPU.

**Operation:**

\[
\text{StoreFPR} \ (fd, L, \text{ConvertFmt}(\text{ValueFPR}(fs, fmt), fmt, L))
\]

**Exceptions:**

Coprocessor Unusable, Reserved Instruction

**Floating Point Exceptions:**

Invalid Operation, Unimplemented Operation, Inexact,
CVT.PS.S Floating Point Convert Pair to Paired Single

Format: CVT.PS.S fd, fs, ft

MIPS32 Release 2, removed in Release 6

Purpose: Floating Point Convert Pair to Paired Single
To convert two FP values to a paired single value.

Description:
FPR[fd] ← FPR[fs]31..0 || FPR[ft]31..0

The single-precision values in FPR fs and ft are written into FPR fd as a paired-single value. The value in FPR fs is written into the upper half, and the value in FPR ft is written into the lower half.

CVT.PS.S is similar to PLL.PS, except that it expects operands of format S instead of PS.

The move is non-arithmetic; it causes no IEEE 754 exceptions, and the FCSR Cause and FCSR Flags fields are not modified.

Restrictions:
The fields fs and ft must specify FPRs valid for operands of type S. If the fields are not valid, the result is UNPREDICTABLE.

The operand must be a value in format S; if it is not, the result is UNPREDICTABLE and the value of the operand FPR becomes UNPREDICTABLE.

The result of this instruction is UNPREDICTABLE if the processor is executing in the FR=0 32-bit FPU register model; it is predictable if executing on a 64-bit FPU in the FR=1 mode, but not with FR=0, and not on a 32-bit FPU.

Availability and Compatibility:
This instruction has been removed in Release 6.

Operation:
\[ \text{StoreFPR}(fd, PS, \text{ValueFPR}(fs,S) || \text{ValueFPR}(ft,S)) \]

Exceptions:
Coprocessor Unusable, Reserved Instruction

Floating Point Exceptions:
Invalid Operation, Unimplemented Operation
### CVT.S.PL

**Floating Point Convert Pair Lower to Single Floating Point**

<table>
<thead>
<tr>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..11</th>
<th>10..6</th>
<th>5..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>COP1</td>
<td>fmt</td>
<td>0</td>
<td>fs</td>
<td>fd</td>
<td>CVT.S.PL</td>
</tr>
<tr>
<td>010001</td>
<td>10110</td>
<td>00000</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

**Format:** CVT.S.PL fd, fs  
MIPS32 Release 2, removed in Release 6

**Purpose:** Floating Point Convert Pair Lower to Single Floating Point  
To convert one half of a paired single FP value to single FP.

**Description:**  
FPR[fd] ← FPR[fs]31..0  
The lower paired single value in FPR fs, in format PS, is converted to a value in single floating point format. The result is placed in FPR fd. This instruction can be used to isolate the lower half of a paired single value.

The operation is non-arithmetic; it causes no IEEE 754 exceptions, and the FCSR Cause and FCSR Flags fields are not modified.

**Restrictions:**  
The fields fs and fd must specify valid FPRs—fs for type PS and fd for single floating point. If the fields are not valid, the result is UNPREDICTABLE.

The operand must be a value in format PS; if it is not, the result is UNPREDICTABLE and the value of the operand FPR becomes UNPREDICTABLE.

The result of CVT.S.PL is UNPREDICTABLE if the processor is executing in the FR=0 32-bit FPU register model; it is predictable if executing on a 64-bit FPU in the FR=1 mode, but not with FR=0, and not on a 32-bit FPU.

**Availability and Compatibility:**  
This instruction has been removed in Release 6.

**Operation:**  
\[
\text{StoreFPR (fd, S, ConvertFmt(ValueFPR(fs, PS), PL, S))}
\]

**Exceptions:**  
Coprocessor Unusable, Reserved Instruction

**Floating Point Exceptions:**
### CVT.S.PU

**Floating Point Convert Pair Upper to Single Floating Point**

<table>
<thead>
<tr>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..11</th>
<th>10..6</th>
<th>5..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>COP1</td>
<td>fmt</td>
<td>0</td>
<td>fs</td>
<td>fd</td>
<td>CVT.S.PU</td>
</tr>
<tr>
<td>010001</td>
<td>10110</td>
<td>00000</td>
<td></td>
<td></td>
<td>100000</td>
</tr>
</tbody>
</table>

**Format:** CVT.S.PU fd, fs

**Purpose:** Floating Point Convert Pair Upper to Single Floating Point

To convert one half of a paired single FP value to single FP.

**Description:** FPR[fd] ← FPR[fs]63..32

The upper paired single value in FPR fs, in format PS, is converted to a value in single floating point format. The result is placed in FPR fd. This instruction can be used to isolate the upper half of a paired single value.

The operation is non-arithmetic; it causes no IEEE 754 exceptions, and the FCSR Cause and FCSR Flags fields are not modified.

**Restrictions:**

The fields fs and fd must specify valid FPRs—fs for type PS and fd for single floating point. If the fields are not valid, the result is UNPREDICTABLE.

The operand must be a value in format PS; if it is not, the result is UNPREDICTABLE and the value of the operand FPR becomes UNPREDICTABLE.

The result of CVT.S.PU is UNPREDICTABLE if the processor is executing in the FR=0 32-bit FPU register model; it is predictable if executing on a 64-bit FPU in the FR=1 mode, but not with FR=0, and not on a 32-bit FPU.

**Availability and Compatibility:**

This instruction was removed in Release 6.

**Operation:**

\[
\text{StoreFPR (fd, S, ConvertFmt(ValueFPR(fs, PS), PU, S))}
\]

**Exceptions:**

Coprocessor Unusable, Reserved Instruction

**Floating Point Exceptions:**

**CVT.S.fmt**  
**Floating Point Convert to Single Floating Point**

**Format:** CVT.S.fmt

- **CVT.S.D fd, fs**  
- **CVT.S.W fd, fs**  
- **CVT.S.L fd, fs**

**Purpose:** Floating Point Convert to Single Floating Point

To convert an FP or fixed point value to single FP.

**Description:**

\[
\text{FPR}[fd] \leftarrow \text{convert\_and\_round}(\text{FPR}[fs])
\]

The value in FPR \( fs \), in format \( fmt \), is converted to a value in single floating point format and rounded according to the current rounding mode in \( FCSR \). The result is placed in FPR \( fd \).
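The three forms can be sketched in C. This minimal sketch assumes the FCSR rounding mode is round to nearest, ties to even, which is also the C default. It models the D, W, and L source formats as `double`, `int32_t`, and `int64_t`, and the function names are hypothetical. An integer such as 2^24 + 1 has no exact single representation, so the conversion rounds it (and would signal Inexact).

```c
#include <stdint.h>

/* Hypothetical models of CVT.S.D, CVT.S.W, and CVT.S.L.
   The C conversion uses the current rounding mode, which by
   default is round to nearest, ties to even. */
float cvt_s_d(double fs)  { return (float)fs; }
float cvt_s_w(int32_t fs) { return (float)fs; }
float cvt_s_l(int64_t fs) { return (float)fs; }
```

Here `cvt_s_w(16777217)` returns `16777216.0f`. The input lies exactly halfway between two representable singles, and the tie goes to the one with an even significand.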

**Restrictions:**

The fields \( fs \) and \( fd \) must specify valid FPRs—\( fs \) for type \( fmt \) and \( fd \) for single floating point. If the fields are not valid, the result is \texttt{UNPREDICTABLE}.

The operand must be a value in format \( fmt \); if it is not, the result is \texttt{UNPREDICTABLE} and the value of the operand FPR becomes \texttt{UNPREDICTABLE}.

For CVT.S.L, the result of this instruction is \texttt{UNPREDICTABLE} if the processor is executing in the \( FR=0 \) 32-bit FPU register model or on a 32-bit FPU; it is predictable only when executing on a 64-bit FPU in \( FR=1 \) mode.

**Operation:**

\[
\text{StoreFPR}(fd, S, \text{ConvertFmt(ValueFPR(fs, fmt), fmt, S)})
\]

**Exceptions:**

Coprocessor Unusable, Reserved Instruction

**Floating Point Exceptions:**

Invalid Operation, Unimplemented Operation, Inexact, Overflow, Underflow

**CVT.W.fmt**  
**Floating Point Convert to Word Fixed Point**

**Format:** CVT.W.fmt

- **CVT.W.S fd, fs**
- **CVT.W.D fd, fs**

**Purpose:**

Floating Point Convert to Word Fixed Point

To convert an FP value to 32-bit fixed point.

**Description:**

\[
\text{FPR}[fd] \leftarrow \text{convert\_and\_round}(\text{FPR}[fs])
\]

The value in FPR \(fs\), in format \(fmt\), is converted to a value in 32-bit word fixed point format and rounded according to the current rounding mode in \(FCSR\). The result is placed in FPR \(fd\).

When the source value is Infinity, NaN, or rounds to an integer outside the range \(-2^{31} \text{ to } 2^{31}-1\), the result cannot be represented correctly, an IEEE Invalid Operation condition exists, and the Invalid Operation flag is set in the \(FCSR\). If the Invalid Operation Enable bit is set in the \(FCSR\), no result is written to \(fd\) and an Invalid Operation exception is taken immediately. Otherwise, a default result is written to \(fd\). On cores with \(FCSR_{\text{NAN2008}}=0\), the default result is \(2^{31}-1\). On cores with \(FCSR_{\text{NAN2008}}=1\), the default result is:

- 0 when the input value is NaN
- \(2^{31}-1\) when the input value is \(+\infty\) or rounds to a number larger than \(2^{31}-1\)
- \(-2^{31}\) when the input value is \(-\infty\) or rounds to a number smaller than \(-2^{31}\)

**Restrictions:**

The fields \(fs\) and \(fd\) must specify valid FPRs: \(fs\) for type \(fmt\) and \(fd\) for word fixed point. If the fields are not valid, the result is UNPREDICTABLE.

The operand must be a value in format \(fmt\); if it is not, the result is UNPREDICTABLE and the value of the operand FPR becomes UNPREDICTABLE.

**Operation:**

\[
\text{StoreFPR(fd, W, ConvertFmt(ValueFPR(fs, fmt), fmt, W))}
\]
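The Invalid Operation handling for CVT.W.D can be sketched in C. This is a hypothetical model, not the hardware algorithm, and it rests on three assumptions. First, the FCSR rounding mode is round toward zero, so the C cast does the rounding. Second, the Invalid Operation enable is clear. Third, the negative default is -2^31, the least value in the word range. `cvt_w_d` and its `invalid` out-parameter are illustrative names.

```c
#include <math.h>     /* isnan, NAN, INFINITY (macros, no libm needed) */
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of CVT.W.D with the Invalid Operation enable clear.
   Assumption: round toward zero, so the C cast performs the rounding.
   *invalid reports whether the Invalid Operation flag would be set. */
int32_t cvt_w_d(double fs, bool nan2008, bool *invalid)
{
    /* Truncation lands in [-2^31, 2^31 - 1] exactly when
       -2^31 - 1 < fs < 2^31. NaN fails both comparisons. */
    if (fs > -2147483649.0 && fs < 2147483648.0) {
        *invalid = false;
        return (int32_t)fs;
    }

    *invalid = true;                  /* Infinity, NaN, or out of range */
    if (!nan2008)
        return INT32_MAX;             /* legacy default: 2^31 - 1       */
    if (isnan(fs))
        return 0;                     /* NAN2008: NaN -> 0              */
    return fs > 0 ? INT32_MAX : INT32_MIN;  /* saturate by sign         */
}
```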

**Exceptions:**

Coprocessor Unusable, Reserved Instruction

**Floating Point Exceptions:**

Invalid Operation, Unimplemented Operation, Inexact

**DDIV**  
**Doubleword Divide**

**Format:**  
DDIV rs, rt

**Purpose:**  
Doubleword Divide

To divide 64-bit signed integers.

**Description:**  
(LO, HI) ← GPR[rs] / GPR[rt]

The 64-bit doubleword in GPR rs is divided by the 64-bit doubleword in GPR rt, treating both operands as signed values. The 64-bit quotient is placed into special register LO and the 64-bit remainder is placed into special register HI.

No arithmetic exception occurs under any circumstances.

**Restrictions:**

If the divisor in GPR rt is zero, the arithmetic result value is UNPREDICTABLE.

**Availability and Compatibility:**

This instruction has been removed in Release 6.

**Operation:**

\[
\begin{align*}
\text{LO} & \leftarrow \text{GPR}[rs] \div \text{GPR}[rt] \\
\text{HI} & \leftarrow \text{GPR}[rs] \mod \text{GPR}[rt]
\end{align*}
\]
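The Operation above can be modelled roughly in C. The `hilo64` type and `ddiv` function are hypothetical names. C's `/` and `%` on `int64_t` truncate toward zero, so the remainder in HI takes the sign of the dividend. The model reports a zero divisor instead of computing a value, because that result is UNPREDICTABLE. It handles the single overflowing case, `INT64_MIN / -1`, explicitly because C leaves it undefined. The wrapped quotient used for that case is an assumption that mirrors the `q63..0` truncation in the DDIVU pseudocode.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the DDIV result registers. */
typedef struct {
    int64_t lo;   /* quotient  */
    int64_t hi;   /* remainder */
} hilo64;

/* Returns false when the divisor is zero (architecturally UNPREDICTABLE),
   leaving *out untouched. C's / and % truncate toward zero. */
bool ddiv(int64_t rs, int64_t rt, hilo64 *out)
{
    if (rt == 0)
        return false;
    if (rs == INT64_MIN && rt == -1) {
        /* Assumption: quotient 2^63 wraps to -2^63 in 64 bits, remainder 0.
           C leaves this case undefined, so it is handled explicitly. */
        out->lo = INT64_MIN;
        out->hi = 0;
        return true;
    }
    out->lo = rs / rt;
    out->hi = rs % rt;
    return true;
}
```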

**Exceptions:**

Reserved Instruction

**Programming Notes:**

See “Programming Notes” for the DIV instruction.

**Historical Perspective:**

In MIPS III, if either of the two instructions preceding the divide is an MFHI or MFLO, the result of the MFHI or MFLO is UNPREDICTABLE. Reads of the HI or LO special register must be separated from subsequent instructions that write to them by two or more instructions. This restriction was removed in MIPS IV and MIPS32 and all subsequent levels of the architecture.
The MIPS32® Instruction Set Manual, Revision 6.04

**DDIVU**

**Doubleword Divide Unsigned**

---

**Format:**  
DDIVU rs, rt

**MIPS64, removed in Release 6**

**Purpose:** Doubleword Divide Unsigned  
To divide 64-bit unsigned integers.

**Description:**  
\[(LO, HI) \leftarrow GPR[rs] / GPR[rt]\]

The 64-bit doubleword in GPR \(rs\) is divided by the 64-bit doubleword in GPR \(rt\), treating both operands as unsigned values. The 64-bit quotient is placed into special register \(LO\) and the 64-bit remainder is placed into special register \(HI\).

No arithmetic exception occurs under any circumstances.

**Restrictions:**  
If the divisor in GPR \(rt\) is zero, the arithmetic result value is **UNPREDICTABLE**.

**Availability and Compatibility:**  
This instruction has been removed in Release 6.

**Operation:**  
\[
\begin{align*}
q & \leftarrow (0 || GPR[rs]) \text{ div } (0 || GPR[rt]) \\
r & \leftarrow (0 || GPR[rs]) \text{ mod } (0 || GPR[rt]) \\
LO & \leftarrow q_{63..0} \\
HI & \leftarrow r_{63..0}
\end{align*}
\]
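The `0 ||` prefix in the pseudocode means both registers are read as unsigned 64-bit values. The following is a minimal C sketch under that reading, and the type and function names are hypothetical. A register holding all ones is 2^64 - 1 here, whereas DDIV would treat the same bits as -1.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the DDIVU result registers. */
typedef struct {
    uint64_t lo;  /* quotient  */
    uint64_t hi;  /* remainder */
} hilo64u;

/* Both operands are treated as unsigned 64-bit values.
   Returns false for a zero divisor, whose result is UNPREDICTABLE. */
bool ddivu(uint64_t rs, uint64_t rt, hilo64u *out)
{
    if (rt == 0)
        return false;
    out->lo = rs / rt;
    out->hi = rs % rt;
    return true;
}
```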

**Exceptions:**  
Reserved Instruction

**Programming Notes:**  
See “Programming Notes” for the DIV instruction.

**Historical Perspective:**  
In MIPS III, if either of the two instructions preceding the divide is an MFHI or MFLO, the result of the MFHI or MFLO is **UNPREDICTABLE**. Reads of the \(HI\) or \(LO\) special register must be separated from subsequent instructions that write to them by two or more instructions. This restriction was removed in MIPS IV and MIPS32 and all subsequent levels of the architecture.

**DERET**

**Debug Exception Return**

**Format:**

DERET

**Purpose:** Debug Exception Return

To return from a debug exception.

**Description:**

DERET clears execution and instruction hazards, returns from Debug Mode, and resumes non-debug execution at the instruction whose address is contained in the DEPC register. DERET does not execute the next instruction (that is, it has no delay slot).

**Restrictions:**

A DERET placed between an LL and SC instruction does not cause the SC to fail.

If the DEPC register with the return address for the DERET was modified by an MTC0 or a DMTC0 instruction, a CP0 hazard exists that must be removed via software insertion of the appropriate number of SSNOP instructions (for implementations of Release 1 of the Architecture) or by an EHB, or other execution hazard clearing instruction (for implementations of Release 2 of the Architecture).

DERET implements a software barrier that resolves all execution and instruction hazards created by Coprocessor 0 state changes (for Release 2 implementations, refer to the SYNCI instruction for additional information on resolving instruction hazards created by writing the instruction stream). The effects of this barrier are seen starting with the instruction fetch and decode of the instruction at the PC to which the DERET returns.

This instruction is legal only if the processor is executing in Debug Mode.

Pre-Release 6: The operation of the processor is **UNDEFINED** if a DERET is executed in the delay slot of a branch or jump instruction.

Release 6 implementations are required to signal a Reserved Instruction exception if DERET is encountered in the delay slot or forbidden slot of a branch or jump instruction.

**Operation:**

```plaintext
DebugDM ← 0
DebugEXI ← 0
if IsMIPS16Implemented() | (Config3ISA > 0) then
    PC ← DEPC31..1 || 0
    ISAMode ← DEPC0
else
    PC ← DEPC
endif
ClearHazards()
```
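The PC and ISA-mode selection in the pseudocode can be sketched in C. The sketch assumes a 32-bit DEPC and treats "MIPS16e or microMIPS implemented" as a single boolean input. `deret_target` is a hypothetical helper. It returns the new PC and, as the pseudocode does, updates the ISA mode only when a compressed ISA is present.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of DERET's return-target selection.
   compressed_isa stands for IsMIPS16Implemented() | (Config3.ISA > 0).
   Returns the new PC; *isa_mode is written only in the compressed case. */
uint32_t deret_target(uint32_t depc, bool compressed_isa, unsigned *isa_mode)
{
    if (compressed_isa) {
        *isa_mode = depc & 1u;          /* ISAMode <- DEPC0        */
        return depc & ~(uint32_t)1;     /* PC <- DEPC31..1 || 0     */
    }
    return depc;                        /* PC <- DEPC               */
}
```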

**Exceptions:**

Coprocessor Unusable, Reserved Instruction

**DI**  
**Disable Interrupts**

**Purpose:** Disable Interrupts

To return the previous value of the *Status* register and disable interrupts. If DI is specified without an argument, GPR r0 is implied, which discards the previous value of the *Status* register.

**Description:**

\[
\text{GPR}[rt] \leftarrow \text{Status}; \text{Status}_{IE} \leftarrow 0
\]

The current value of the *Status* register is loaded into general register *rt*. The Interrupt Enable (IE) bit in the *Status* register is then cleared.

**Restrictions:**

If access to Coprocessor 0 is not enabled, a Coprocessor Unusable Exception is signaled.

In implementations prior to Release 2 of the architecture, this instruction resulted in a Reserved Instruction exception.

**Operation:**

This operation specification is for the general interrupt enable/disable operation, with the *sc* field as a variable. The individual instructions DI and EI have a specific value for the *sc* field.

\[
\begin{align*}
\text{data} & \leftarrow \text{Status} \\
\text{GPR}[rt] & \leftarrow \text{data} \\
\text{Status}_{IE} & \leftarrow 0
\end{align*}
\]

**Exceptions:**

Coprocessor Unusable

Reserved Instruction (Release 1 implementations)

**Programming Notes:**

The effects of this instruction are identical to those accomplished by the sequence of reading *Status* into a GPR, clearing the IE bit, and writing the result back to *Status*. Unlike the multiple instruction sequence, however, the DI instruction cannot be aborted in the middle by an interrupt or exception.
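The following is a functional C model of DI. It assumes the *Status* register is represented by a `uint32_t` and relies on IE being bit 0 of *Status*. `di` is a hypothetical name. Unlike the hardware instruction, this read-modify-write in C is not atomic, and atomicity is exactly the property the paragraph above says DI provides.

```c
#include <stdint.h>

#define STATUS_IE 0x1u   /* Status.IE is bit 0 */

/* Hypothetical model: *status stands in for the CP0 Status register.
   Returns the previous value (what DI writes to GPR[rt]) and clears IE.
   Not atomic, unlike the real instruction. */
uint32_t di(uint32_t *status)
{
    uint32_t data = *status;      /* data <- Status   */
    *status = data & ~STATUS_IE;  /* Status.IE <- 0   */
    return data;                  /* GPR[rt] <- data  */
}
```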

This instruction creates an execution hazard between the change to the *Status* register and the point where the change to the interrupt enable takes effect. This hazard is cleared by the EHB, JALR.HB, JR.HB, or ERET instructions. Software must not assume that a fixed latency will clear the execution hazard.

**DIV**  
**Divide Word**

**Format:** DIV rs, rt (*MIPS32, removed in Release 6*)

**Purpose:** Divide Word

To divide 32-bit signed integers.

**Description:** \((HI, LO) \leftarrow \text{GPR}[rs] / \text{GPR}[rt]\)

The 32-bit word value in GPR \textit{rs} is divided by the 32-bit value in GPR \textit{rt}, treating both operands as signed values. The 32-bit quotient is placed into special register \textit{LO} and the 32-bit remainder is placed into special register \textit{HI}.

No arithmetic exception occurs under any circumstances.

**Restrictions:**

If the divisor in GPR \textit{rt} is zero, the arithmetic result value is \textbf{UNPREDICTABLE}.

**Availability and Compatibility:**

DIV has been removed in Release 6 and has been replaced by DIV and MOD instructions that produce only quotient and remainder, respectively. Refer to the Release 6 introduced ‘DIV’ and ‘MOD’ instructions in this manual for more information. This instruction remains current for all release levels lower than Release 6 of the MIPS architecture.

**Operation:**

\[
\begin{align*}
q & \leftarrow \text{GPR}[rs]_{31..0} \text{ div GPR}[rt]_{31..0} \\
LO & \leftarrow q \\
r & \leftarrow \text{GPR}[rs]_{31..0} \text{ mod GPR}[rt]_{31..0} \\
HI & \leftarrow r
\end{align*}
\]

**Exceptions:**

None

**Programming Notes:**

No arithmetic exception occurs under any circumstances. If divide-by-zero or overflow conditions must be detected and acted upon, the divide instruction must be followed by additional instructions that check for a zero divisor and/or for overflow. If the divide is asynchronous, the zero-divisor check can execute in parallel with the divide. The action taken on either divide-by-zero or overflow is a convention within either the program itself or the system software. One possibility is to take a BREAK exception with a \textit{code} field value to signal the problem to the system software.

As an example, the C programming language in a UNIX® environment expects division by zero to either terminate the program or execute a program-specified signal handler. C does not expect overflow to cause any exceptional condition. If the C compiler uses a divide instruction, it also emits code to test for a zero divisor and execute a BREAK instruction to inform the operating system if a zero is detected.
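The zero-divisor test described above can be sketched in C. The status codes are hypothetical stand-ins for the BREAK codes a compiler might emit, since the actual codes are a software convention. The overflow test for `INT32_MIN / -1` is optional because C does not require overflow to be reported. It is included here because C itself leaves that case undefined.

```c
#include <stdint.h>

/* Hypothetical outcome codes standing in for compiler-chosen BREAK codes. */
enum div_status { DIV_OK, DIV_BY_ZERO, DIV_OVERFLOW };

/* Sketch of the checks a compiler may wrap around a pre-Release 6 DIV.
   The hardware divide never traps; these tests turn a zero divisor
   (and, optionally, INT32_MIN / -1) into a reported error. */
enum div_status checked_div(int32_t rs, int32_t rt, int32_t *lo, int32_t *hi)
{
    if (rt == 0)
        return DIV_BY_ZERO;           /* compiler would emit BREAK here */
    if (rs == INT32_MIN && rt == -1)
        return DIV_OVERFLOW;          /* quotient 2^31 does not fit     */
    *lo = rs / rt;                    /* value MFLO would read          */
    *hi = rs % rt;                    /* value MFHI would read          */
    return DIV_OK;
}
```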

By default, most compilers for the MIPS architecture emit additional instructions to check for the divide-by-zero and overflow cases when this instruction is used. In many compilers, the assembler mnemonic “DIV r0, rs, rt” can be used to prevent these additional test instructions from being emitted.

In some processors the integer divide operation may proceed asynchronously and allow other CPU instructions to execute before it is complete. An attempt to read \textit{LO} or \textit{HI} before the results are written interlocks until the results are ready. Asynchronous execution does not affect the program result, but offers an opportunity for performance improvement by scheduling the divide so that other instructions can execute in parallel.

**Historical Perspective:**
In MIPS I through MIPS III, if either of the two instructions preceding the divide is an MFHI or MFLO, the result of the MFHI or MFLO is **UNPREDICTABLE**. Reads of the HI or LO special register must be separated from subsequent instructions that write to them by two or more instructions. This restriction was removed in MIPS IV and MIPS32 and all subsequent levels of the architecture.

DIV MOD DIVU MODU

Divide Integers (with result to GPR)

Format:
DIV rd,rs,rt
MOD rd,rs,rt
DIVU rd,rs,rt
MODU rd,rs,rt

Purpose: Divide Integers (with result to GPR)
DIV: Divide Words Signed
MOD: Modulo Words Signed
DIVU: Divide Words Unsigned
MODU: Modulo Words Unsigned

Description:
DIV: GPR[rd] ← divide.signed(GPR[rs], GPR[rt])
MOD: GPR[rd] ← modulo.signed(GPR[rs], GPR[rt])
DIVU: GPR[rd] ← divide.unsigned(GPR[rs], GPR[rt])
MODU: GPR[rd] ← modulo.unsigned(GPR[rs], GPR[rt])

The Release 6 divide and modulo instructions divide the operands in GPR rs and GPR rt, and place the quotient or remainder in GPR rd.

For each of the div/mod operator pairs DIV/MOD and DIVU/MODU, the results satisfy the equation (A div B)*B + (A mod B) = A, where (A mod B) has the same sign as the dividend A, and abs(A mod B) < abs(B). This equation uniquely defines the results; this definition is commonly called “truncated division”.

NOTE: If the divisor B = 0, this equation cannot be satisfied, and the result is UNPREDICTABLE.

DIV performs a signed 32-bit integer division, and places the 32-bit quotient result in the destination register.
MOD performs a signed 32-bit integer division, and places the 32-bit remainder result in the destination register. The remainder result has the same sign as the dividend.
DIVU performs an unsigned 32-bit integer division, and places the 32-bit quotient result in the destination register.
MODU performs an unsigned 32-bit integer division, and places the 32-bit remainder result in the destination register.
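C99 and later define integer `/` to truncate toward zero, so C's `/` and `%` follow the same truncated-division rule. The sketch below models the four instructions for a nonzero divisor and checks the defining equation. The function names are hypothetical. `INT32_MIN / -1` is excluded because it overflows in C.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical models of the Release 6 word divide/modulo instructions.
   All require b != 0 (a zero divisor gives an UNPREDICTABLE result). */
int32_t  r6_div(int32_t a, int32_t b)   { return a / b; }  /* DIV  */
int32_t  r6_mod(int32_t a, int32_t b)   { return a % b; }  /* MOD  */
uint32_t r6_divu(uint32_t a, uint32_t b) { return a / b; } /* DIVU */
uint32_t r6_modu(uint32_t a, uint32_t b) { return a % b; } /* MODU */

/* Checks (A div B)*B + (A mod B) == A, |A mod B| < |B|, and that a
   nonzero remainder has the dividend's sign. Excludes INT32_MIN / -1. */
bool identity_holds(int32_t a, int32_t b)
{
    int64_t q = r6_div(a, b), r = r6_mod(a, b);
    int64_t abs_r = r < 0 ? -r : r;
    int64_t abs_b = b < 0 ? -(int64_t)b : b;
    bool same_sign = (r == 0) || ((r < 0) == (a < 0));
    return q * b + r == a && abs_r < abs_b && same_sign;
}
```

The same bit pattern gives different results under the signed and unsigned forms. For example, -7 read as unsigned is `0xFFFFFFF9`, and `r6_divu(0xFFFFFFF9u, 2)` is `0x7FFFFFFC`.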

Restrictions:
If the divisor in GPR rt is zero, the result value is UNPREDICTABLE.

Availability and Compatibility:

These instructions are introduced by and required as of Release 6.

Release 6 divide instructions have the same opcode mnemonic as the pre-Release 6 divide instructions (DIV, DIVU). The instruction encodings are different, as are the instruction semantics: the Release 6 instruction produces only the quotient, and separate modulo instructions are required to obtain the remainder, whereas the pre-Release 6 instruction produces the quotient and remainder in the LO and HI registers, respectively.

The assembly syntax distinguishes the Release 6 from the pre-Release 6 divide instructions. For example, Release 6 “DIV rd, rs, rt” specifies 3 register operands, versus pre-Release 6 “DIV rs, rt”, which has only two register arguments, with the HI/LO registers implied. Some assemblers accept the pseudo-instruction syntax “DIV rd, rs, rt” and expand it to “DIV rs, rt; MFLO rd”. Phrases such as “DIV with GPR output” and “DIV with HI/LO output” may be used when disambiguation is necessary.

Pre-Release 6 divide instructions that produce quotient and remainder in the HI/LO registers produce a Reserved Instruction exception on Release 6. In the future, the instruction encoding may be reused for other instructions.

Programming Notes:

Because the divide and modulo instructions are defined to not trap if dividing by zero, it is safe to emit code that checks for zero-divide after the divide or modulo instruction.

Operation:

DIV, MOD:
   s1 ← signed_word(GPR[rs])
   s2 ← signed_word(GPR[rt])
DIVU, MODU:
   s1 ← unsigned_word(GPR[rs])
   s2 ← unsigned_word(GPR[rt])
DIV, DIVU:
   quotient ← s1 div s2
MOD, MODU:
   remainder ← s1 mod s2
DIV:   GPR[rd] ← quotient
MOD:   GPR[rd] ← remainder
DIVU:  GPR[rd] ← quotient
MODU:  GPR[rd] ← remainder
/* end of instruction */

Exceptions:

No arithmetic exceptions occur. Division by zero produces an UNPREDICTABLE result.
## DIV.fmt

### Format

- **DIV.fmt**
- **DIV.S fd, fs, ft**
- **DIV.D fd, fs, ft**

### Purpose

Floating Point Divide

To divide FP values.

### Description

\[ \text{FPR}[fd] \leftarrow \text{FPR}[fs] / \text{FPR}[ft] \]

The value in FPR \( fs \) is divided by the value in FPR \( ft \). The result is calculated to infinite precision, rounded according to the current rounding mode in \( FCSR \), and placed into FPR \( fd \). The operands and result are values in format \( fmt \).

### Restrictions

- The fields \( fs, ft, \) and \( fd \) must specify FPRs valid for operands of type \( fmt \). If the fields are not valid, the result is **UNPREDICTABLE**.
- The operands must be values in format \( fmt \); if they are not, the result is **UNPREDICTABLE** and the value of the operand FPRs becomes **UNPREDICTABLE**.

### Operation

\[
\text{StoreFPR (fd, fmt, ValueFPR(fs, fmt) / ValueFPR(ft, fmt))}
\]
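When the corresponding FCSR exception enables are clear, the value written to fd is the IEEE 754 default result. The following is a minimal C sketch that assumes a host whose `double` follows IEEE 754, as on common targets built without `-ffast-math`. `div_d` is a hypothetical name. The sketch does not model the NaN bit pattern MIPS produces, which depends on \(FCSR_{\text{NAN2008}}\).

```c
#include <math.h>    /* isinf, isnan, signbit (macros) */

/* Hypothetical model of DIV.D with all FP exception enables clear.
   A finite nonzero value divided by zero gives a correctly signed
   infinity (Division-by-zero); 0/0 gives a NaN (Invalid Operation). */
double div_d(double fs, double ft)
{
    return fs / ft;   /* IEEE 754 division, correctly rounded */
}
```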

### Exceptions

- Coprocessor Unusable, Reserved Instruction

### Floating Point Exceptions

- Inexact, Invalid Operation, Unimplemented Operation, Division-by-zero, Overflow, Underflow

DIVU

Divide Unsigned Word

Format: \texttt{DIVU rs, rt}  \hspace{1cm} \textit{MIPS32, removed in Release 6}

Purpose: Divide Unsigned Word

To divide 32-bit unsigned integers.

Description: \((\text{HI}, \text{LO}) \leftarrow \text{GPR}[rs] / \text{GPR}[rt]\)

The 32-bit word value in GPR \(rs\) is divided by the 32-bit value in GPR \(rt\), treating both operands as unsigned values.
The 32-bit quotient is placed into special register \(LO\) and the 32-bit remainder is placed into special register \(HI\).

No arithmetic exception occurs under any circumstances.

Restrictions:

If the divisor in GPR \(rt\) is zero, the arithmetic result value is \textbf{UNPREDICTABLE}.

Availability and Compatibility:

This instruction has been removed in Release 6.

Operation:

\[
\begin{align*}
q & \leftarrow (0 || \text{GPR}[rs]_{31..0}) \text{ div } (0 || \text{GPR}[rt]_{31..0}) \\
r & \leftarrow (0 || \text{GPR}[rs]_{31..0}) \text{ mod } (0 || \text{GPR}[rt]_{31..0}) \\
\text{LO} & \leftarrow \text{sign\_extend}(q_{31..0}) \\
\text{HI} & \leftarrow \text{sign\_extend}(r_{31..0})
\end{align*}
\]
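On a 64-bit register file, the Operation above sign-extends the 32-bit unsigned quotient and remainder. A quotient with bit 31 set therefore appears as a negative doubleword in LO. The following is a minimal C sketch of that detail. The `hilo_regs` type and `divu` function are hypothetical, and the caller must ensure the divisor's low word is nonzero.

```c
#include <stdint.h>

/* Hypothetical model of the 64-bit HI/LO registers written by DIVU. */
typedef struct { uint64_t lo, hi; } hilo_regs;

/* sign_extend of a 32-bit value to 64 bits, without relying on
   implementation-defined signed conversions. */
static uint64_t sign_extend32(uint32_t v)
{
    return (v & 0x80000000u) ? (0xFFFFFFFF00000000ULL | v) : (uint64_t)v;
}

/* Unsigned divide of the low words; requires (uint32_t)rt != 0. */
hilo_regs divu(uint64_t rs, uint64_t rt)
{
    uint32_t a = (uint32_t)rs, b = (uint32_t)rt;   /* GPR[..]31..0 */
    hilo_regs r = { sign_extend32(a / b), sign_extend32(a % b) };
    return r;
}
```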

Exceptions:

None

Programming Notes:

Pre-Release 6 instruction DIVU has been removed in Release 6 and has been replaced by DIVU and MODU instructions that produce only quotient and remainder, respectively. Refer to the ‘DIVU’ and ‘MODU’ instructions introduced in Release 6 in this manual for more information. This instruction remains current for all release levels lower than Release 6 of the MIPS architecture.

See “Programming Notes” for the DIV instruction.

Historical Perspective:

In MIPS I through MIPS III, if either of the two instructions preceding the divide is an MFHI or MFLO, the result of the MFHI or MFLO is UNPREDICTABLE. Reads of the \(HI\) or \(LO\) special register must be separated from subsequent instructions that write to them by two or more instructions. This restriction was removed in MIPS IV and MIPS32 and all subsequent levels of the architecture.

DVP

Disable Virtual Processor

Purpose: Disable Virtual Processor

To disable all virtual processors in a physical core other than the virtual processor that issued the instruction.

Description:

\[ \text{GPR}[rt] \leftarrow \text{VPControl} ; \ \text{VPControl}_{\text{DIS}} \leftarrow 1 \]

Disabling a virtual processor means that instruction fetch is terminated, and all outstanding instructions for the affected virtual processor(s) must be complete before the DVP itself is allowed to retire. Any outstanding events such as hardware instruction or data prefetch, or page-table walks must also be terminated.

The DVP instruction has implicit \( \text{SYNC} (\text{stype}=0) \) semantics, applied with respect to the other virtual processors in the physical core.

After all other virtual processors have been disabled, \( \text{VPControl}_{\text{DIS}} \) is set. Prior to modification and if \( rt \) is non-zero, \( \text{VPControl} \) is written to \( \text{GPR}[rt] \). If DVP is specified without \( rt \), then \( rt \) must be 0.

DVP may also take effect on a virtual processor that has executed a WAIT or a PAUSE instruction. If a virtual processor has executed a WAIT instruction, then it cannot resume execution on an interrupt until an EVP has been executed. If the EVP is executed before the interrupt arrives, then the virtual processor resumes in a state as if the DVP had not been executed, that is, it waits for the interrupt.

If a virtual processor has executed a PAUSE instruction, then it cannot resume execution until an EVP has been executed, even if LLbit is cleared. If an EVP is executed before the LLbit is cleared, then the virtual processor resumes in a state as if the DVP had not been executed, that is, it waits for the LLbit to clear.

The execution of a DVP must be followed by the execution of an EVP. The execution of an EVP causes execution to resume immediately—where applicable—on all other virtual processors, as if the DVP had not been executed. The execution is completely restorable after the EVP. If an event occurs in between the DVP and EVP that renders state of the virtual processor UNPREDICTABLE (such as power-gating), then the effect of EVP is UNPREDICTABLE.

DVP may only take effect if \( \text{VPControl}_{\text{DIS}} = 0 \). Otherwise it is treated as a NOP instruction.

If a virtual processor is disabled due to a DVP, then interrupts are also disabled for the virtual processor, that is, logically \( \text{Status}_{\text{IE}} = 0 \). \( \text{Status}_{\text{IE}} \) for the target virtual processors is not actually cleared, though, because software cannot access state on the virtual processors that have been disabled. Similarly, deferred exceptions will not cause a disabled virtual processor to be re-enabled for execution, at least until execution is re-enabled by the EVP instruction. The virtual processor that executes the DVP, however, continues to be interruptible.

In an implementation, the ability of a virtual processor to execute instructions may also be under control external to the physical core which contains the virtual processor. If disabled by DVP, a virtual processor must not resume fetch in response to the assertion of this external signal to enable fetch. Conversely, if fetch is disabled by such external control, then execution of EVP will not cause fetch to resume at a target virtual processor for which the control is deasserted.

This instruction never executes speculatively. It must be the oldest unretired instruction to take effect.

This instruction is only available in Release 6 implementations. For implementations that do not support multithreading (\( \text{Config5}_{\text{VP}}=0 \)), this instruction must be treated as a NOP instruction.

Restrictions:

If access to Coprocessor 0 is not enabled, a Coprocessor Unusable Exception is signaled.
In implementations prior to Release 6 of the architecture, this instruction resulted in a Reserved Instruction exception.

Operation:
The pseudo-code below assumes that the DVP is executed by virtual processor 0, while the target virtual processors are numbered 'n', where n ranges over all of the remaining virtual processors.

```plaintext
if (VPControlDIS = 0)
    // The disable_fetch and disable_interrupt blocks give the recommended action with respect to other VPs
    disable_fetch(VPn) {
        if PAUSE(VPn) retires prior to or at the disable event
            then VPn execution is not resumed if LLbit is cleared prior to EVP
    }
    disable_interrupt(VPn) {
        if WAIT(VPn) retires prior to or at the disable event
            then interrupts are ignored by VPn until EVP
    }
    // DVP0 not retired until instructions for VPn completed
    while (VPn outstanding instruction)
        DVP0 unretired
    endwhile
endif

data ← VPControl
GPR[rt] ← data
VPControlDIS ← 1
```

Exceptions:
Coprocessor Unusable
Reserved Instruction (pre-Release 6 implementations)

Programming Notes:
DVP may disable execution in the target virtual processor regardless of the operating mode (kernel, supervisor, or user). Kernel software may also be in a critical region, or in a high-priority interrupt handler, when the disable occurs. Since the instruction is itself privileged, such events are considered acceptable.

Before executing an EVP in a DVP/EVP pair, software should first read VPControlDIS, returned by DVP, to determine whether the virtual processors are already disabled. If so, the DVP/EVP sequence should be abandoned. This step allows software to safely nest DVP/EVP pairs.
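The nesting rule can be illustrated with a software-only C model in which everything is hypothetical. `vpcontrol` stands in for the CP0 *VPControl* register, and `VPCONTROL_DIS` is an assumed mask for the DIS bit, so check the register definition for its real position. `dvp` and `evp` mimic only the value semantics from the Operation section, not the effect on other virtual processors.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit mask for VPControl.DIS (illustrative only). */
#define VPCONTROL_DIS 0x1u

/* Stands in for the CP0 VPControl register of this core. */
static uint32_t vpcontrol;

/* DVP model: GPR[rt] <- VPControl; VPControl.DIS <- 1.
   Disabling the other virtual processors is not modelled. */
uint32_t dvp(void)
{
    uint32_t data = vpcontrol;
    vpcontrol |= VPCONTROL_DIS;
    return data;
}

/* EVP model: clear DIS so the other virtual processors resume. */
void evp(void)
{
    vpcontrol &= ~VPCONTROL_DIS;
}

bool vps_disabled(void)
{
    return (vpcontrol & VPCONTROL_DIS) != 0;
}

/* Nest-safe pair: only the DVP that actually disabled the other
   virtual processors is matched by an EVP. */
void with_other_vps_disabled(void (*body)(void))
{
    uint32_t saved = dvp();
    body();
    if ((saved & VPCONTROL_DIS) == 0)
        evp();
}

/* Example bodies: an inner DVP/EVP pair nested inside an outer one. */
bool inner_saw_disabled;

static void empty_body(void) {}

void nested_body(void)
{
    with_other_vps_disabled(empty_body);   /* inner pair skips its EVP */
    inner_saw_disabled = vps_disabled();
}
```

Calling `with_other_vps_disabled(nested_body)` leaves the virtual processors disabled after the inner pair finishes. Only the outer pair re-enables them.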

Privileged software may use DVP/EVP to disable virtual processors on a core, such as for the purpose of doing a cache flush without interference from other processes in a system with multiple virtual processors or physical cores.

DVP (and EVP) may be used in other cases, such as for power savings or for changing state that applies to all virtual processors in a core, such as virtual processor scheduling priority. The first sequence below waits for the LLbit to clear while the other virtual processors are disabled; the second brackets a core-wide state change:

```plaintext
ll     $t0, 0($a0)
dvp                      # disable all other virtual processors
pause                    # wait for LLbit to clear
evp                      # enable all other virtual processors
```

```plaintext
ll     $t0, 0($a0)
dvp                      # disable all other virtual processors
<change core-wide state>
evp                      # enable all other virtual processors
```

**EHB**  
**Execution Hazard Barrier**

**Purpose:**
To stop instruction execution until all execution hazards have been cleared.

**Description:**
EHB is used to denote execution hazard barrier. The actual instruction is interpreted by the hardware as SLL r0, r0, 3. This instruction alters the instruction issue behavior on a pipelined processor by stopping execution until all execution hazards have been cleared. Other than those that might be created as a consequence of setting $Status_{CU0}$, there are no execution hazards visible to an unprivileged program running in User Mode. All execution hazards created by previous instructions are cleared for instructions executed immediately following the EHB, even if the EHB is executed in the delay slot of a branch or jump. The EHB instruction does not clear instruction hazards—such hazards are cleared by the JALR.HB, JR.HB, and ERET instructions.
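EHB is an SLL with rd = rt = r0 and sa = 3, so its encoding can be computed from the standard R-type fields. The following is a minimal C sketch, and `encode_sll` is a hypothetical helper. The same sketch reproduces the related encodings of NOP (SLL r0, r0, 0) and SSNOP (SLL r0, r0, 1).

```c
#include <stdint.h>

/* Encode SLL rd, rt, sa as an R-type word:
   SPECIAL(000000) | rs=00000 | rt | rd | sa | funct=000000.
   rt occupies bits 20..16, rd bits 15..11, sa bits 10..6. */
uint32_t encode_sll(unsigned rd, unsigned rt, unsigned sa)
{
    return ((uint32_t)(rt & 31u) << 16) |
           ((uint32_t)(rd & 31u) << 11) |
           ((uint32_t)(sa & 31u) << 6);   /* opcode, rs, funct are 0 */
}
```

For example, `encode_sll(0, 0, 3)` gives `0x000000C0`, the EHB instruction word.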

**Restrictions:**
None

**Operation:**
ClearExecutionHazards()

**Exceptions:**
None

**Programming Notes:**
In Release 2 implementations, this instruction resolves all execution hazards. On a superscalar processor, EHB alters the instruction issue behavior in a manner identical to SSNOP. For backward compatibility with Release 1 implementations, the last of a sequence of SSNOPs can be replaced by an EHB. In Release 1 implementations, the EHB will be treated as an SSNOP, thereby preserving the semantics of the sequence. In Release 2 implementations, replacing the final SSNOP with an EHB should have no performance effect because a properly sized sequence of SSNOPs will have already cleared the hazard. As EHB becomes the standard in MIPS implementations, the previous SSNOPs can be removed, leaving only the EHB.

EI

Enable Interrupts

Format:

- EI
- EI rt

MIPS32 Release 2

Purpose:
Enable Interrupts

To return the previous value of the \textit{Status} register and enable interrupts. If EI is specified without an argument, GPR r0 is implied, which discards the previous value of the \textit{Status} register.

Description:

\[
\text{GPR}[rt] \leftarrow \text{Status}; \text{Status}_{IE} \leftarrow 1
\]

The current value of the \textit{Status} register is loaded into general register \textit{rt}. The Interrupt Enable (\textit{IE}) bit in the \textit{Status} register is then set.

Restrictions:

If access to Coprocessor 0 is not enabled, a Coprocessor Unusable Exception is signaled.

In implementations prior to Release 2 of the architecture, this instruction resulted in a Reserved Instruction exception.

Operation:

This operation specification is for the general interrupt enable/disable operation, with the \textit{sc} field as a variable. The individual instructions DI and EI have a specific value for the \textit{sc} field.

\[
\begin{align*}
\text{data} & \leftarrow \text{Status} \\
\text{GPR}[rt] & \leftarrow \text{data} \\
\text{Status}_{IE} & \leftarrow 1
\end{align*}
\]

Exceptions:

Coprocessor Unusable
Reserved Instruction (Release 1 implementations)

Programming Notes:

The effects of this instruction are identical to those accomplished by the sequence of reading \textit{Status} into a GPR, setting the \textit{IE} bit, and writing the result back to \textit{Status}. Unlike the multiple instruction sequence, however, the EI instruction cannot be aborted in the middle by an interrupt or exception.
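The following is a functional C model of EI, together with a common save-and-restore idiom that re-enables interrupts only if they were enabled before an earlier DI. The names `di`, `ei`, and `irq_restore` are hypothetical. *Status* is modelled as a `uint32_t` with IE at bit 0, and the C code is not atomic. On hardware, the execution hazard described in the next paragraph must still be cleared, for example with EHB.

```c
#include <stdint.h>

#define STATUS_IE 0x1u   /* Status.IE is bit 0 */

/* Hypothetical functional models; *status stands in for CP0 Status.
   Each hardware instruction is atomic; these C functions are not. */
uint32_t di(uint32_t *status)
{
    uint32_t old = *status;
    *status = old & ~STATUS_IE;      /* Status.IE <- 0    */
    return old;
}

uint32_t ei(uint32_t *status)
{
    uint32_t old = *status;          /* GPR[rt] <- Status */
    *status = old | STATUS_IE;       /* Status.IE <- 1    */
    return old;
}

/* Restore the interrupt state saved by an earlier DI: re-enable only
   if interrupts were enabled at that time. */
void irq_restore(uint32_t *status, uint32_t saved)
{
    if (saved & STATUS_IE)
        ei(status);
}
```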

This instruction creates an execution hazard between the change to the Status register and the point where the change to the interrupt enable takes effect. This hazard is cleared by the EHB, JALR.HB, JR.HB, or ERET instructions. Software must not assume that a fixed latency will clear the execution hazard.

**ERET**  
**Exception Return**

**Format:** ERET

**Purpose:** Exception Return

To return from interrupt, exception, or error trap.

**Description:**

ERET clears execution and instruction hazards, conditionally restores $SRSCtl_{CSS}$ from $SRSCtl_{PSS}$ in a Release 2 implementation, and returns to the interrupted instruction at the completion of interrupt, exception, or error processing. ERET does not execute the next instruction (that is, it has no delay slot).

**Restrictions:**

Pre-Release 6: The operation of the processor is **UNDEFINED** if an ERET is executed in the delay slot of a branch or jump instruction.

Release 6: Implementations are required to signal a Reserved Instruction exception if ERET is encountered in the delay slot or forbidden slot of a branch or jump instruction.

An ERET placed between an LL and SC instruction will always cause the SC to fail.

ERET implements a software barrier that resolves all execution and instruction hazards created by Coprocessor 0 state changes (for Release 2 implementations, refer to the SYNCI instruction for additional information on resolving instruction hazards created by writing the instruction stream). The effects of this barrier are seen starting with the instruction fetch and decode of the instruction at the PC to which the ERET returns.

In a Release 2 implementation, ERET does not restore $SRSCtl_{CSS}$ from $SRSCtl_{PSS}$ if $Status_{BEV} = 1$, or if $Status_{ERL} = 1$ because any exception that sets $Status_{ERL}$ to 1 (Reset, Soft Reset, NMI, or cache error) does not save $SRSCtl_{CSS}$ in $SRSCtl_{PSS}$. If software sets $Status_{ERL}$ to 1, it must be aware of the operation of an ERET that may be subsequently executed.

**Operation:**

```plaintext
if Status_{ERL} = 1 then
    temp ← ErrorEPC
    Status_{ERL} ← 0
else
    temp ← EPC
    Status_{EXL} ← 0
    if (ArchitectureRevision ≥ 2) and (SRSCtl_{HSS} > 0) and (Status_{BEV} = 0) then
        SRSCtl_{CSS} ← SRSCtl_{PSS}
    endif
endif
if IsMIPS16Implemented() | (Config3_{ISA} > 0) then
    PC ← temp_{31..1} || 0
    ISAMode ← temp_{0}
else
    PC ← temp
endif
LLbit ← 0
ClearHazards()
```
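As a cross-check of the pseudocode, the following Python sketch models the ERET operation on a 32-bit machine. The `Cp0State` dataclass, its field names, and the `isa_bit_in_pc` flag (standing in for `IsMIPS16Implemented() | (Config3_ISA > 0)`) are hypothetical, and hazard clearing is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Cp0State:
    """Hypothetical model state; field names mirror the pseudocode above."""
    status_erl: int = 0
    status_exl: int = 0
    status_bev: int = 0
    epc: int = 0
    error_epc: int = 0
    srsctl_hss: int = 0
    srsctl_pss: int = 0
    srsctl_css: int = 0
    arch_revision: int = 6
    isa_bit_in_pc: bool = False  # stands in for IsMIPS16Implemented() | (Config3_ISA > 0)
    pc: int = 0
    isa_mode: int = 0
    llbit: int = 0


def eret(s: Cp0State) -> None:
    """Sketch of ERET on a 32-bit machine; ClearHazards() is not modeled."""
    if s.status_erl == 1:
        temp = s.error_epc
        s.status_erl = 0
    else:
        temp = s.epc
        s.status_exl = 0
        # Restore the shadow register set only if shadow sets exist and BEV = 0.
        if s.arch_revision >= 2 and s.srsctl_hss > 0 and s.status_bev == 0:
            s.srsctl_css = s.srsctl_pss
    if s.isa_bit_in_pc:
        s.pc = temp & 0xFFFFFFFE   # temp[31..1] || 0
        s.isa_mode = temp & 1      # temp[0]
    else:
        s.pc = temp
    s.llbit = 0                    # ERET always clears LLbit


# Return from an EXL exception to an address with bit 0 set (MIPS16e/microMIPS).
normal = Cp0State(status_exl=1, epc=0x80001235, srsctl_hss=3,
                  srsctl_pss=2, srsctl_css=1, isa_bit_in_pc=True, llbit=1)
eret(normal)

# Return from an ERL event (for example, NMI): SRSCtl_CSS is left unchanged.
error = Cp0State(status_erl=1, error_epc=0xBFC00400, srsctl_hss=3,
                 srsctl_pss=2, srsctl_css=1)
eret(error)
```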
**Exceptions:**

Coprocessor Unusable Exception
ERETNC

Exception Return No Clear

Format: ERETNC

MIPS32 Release 5

Purpose: Exception Return No Clear

To return from interrupt, exception, or error trap without clearing the LL.bit.

Description:

ERETNC clears execution and instruction hazards, conditionally restores $SRSCtl_{CSS}$ from $SRSCtl_{PSS}$ when implemented, and returns to the interrupted instruction at the completion of interrupt, exception, or error processing. ERETNC does not execute the next instruction (i.e., it has no delay slot).

ERETNC is identical to ERET except that ERETNC does not clear the LL.bit set by execution of an LL instruction; therefore, when placed between an LL and an SC, it never causes the SC to fail.
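The single behavioral difference between ERET and ERETNC, whether the LLbit survives the return, can be illustrated with a toy Python model. The `LinkState` class and the helper functions are hypothetical and model only one context's LLbit and one memory word; they say nothing about coherence or other contexts.

```python
class LinkState:
    """Toy model of one context's LL/SC link (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self.llbit = 0
        self.memory = {0x1000: 5}

    def ll(self, addr: int) -> int:
        self.llbit = 1                  # LL sets the LLbit
        return self.memory[addr]

    def sc(self, addr: int, value: int) -> int:
        if self.llbit:                  # SC succeeds only while the LLbit is set
            self.memory[addr] = value
            return 1
        return 0


def exception_return(state: LinkState, clears_llbit: bool) -> None:
    # ERET clears the LLbit; ERETNC (clears_llbit=False) leaves it unchanged.
    if clears_llbit:
        state.llbit = 0


def ll_exception_sc(clears_llbit: bool) -> int:
    s = LinkState()
    old = s.ll(0x1000)
    exception_return(s, clears_llbit)   # an exception is taken and returned here
    return s.sc(0x1000, old + 1)


eret_result = ll_exception_sc(clears_llbit=True)      # SC fails after ERET
eretnc_result = ll_exception_sc(clears_llbit=False)   # SC succeeds after ERETNC
```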

By default, interrupt and exception handlers must continue to use ERET. A handler may have accessed a synchronizable block of memory that is shared with the code that was interrupted or that caused the exception, and that code may have been in the middle of an atomic (LL/SC) access to the same memory. Similarly, a process context switch must continue to use ERET to avoid a possible false success of an SC in the restored context.

Multiprocessor systems with non-coherent cores (i.e., without hardware coherence snooping) should also continue to use ERET, because it is the responsibility of software to maintain data coherence in the system.

An ERETNC is useful in cases where interrupt/exception handlers and kernel code involved in a process context-swap can guarantee no interference in accessing synchronizable memory across different contexts. ERETNC can also be used in an OS-level debugger to single-step through code for debug purposes, avoiding the false clearing of the LL.bit and thus failure of an LL and SC sequence in single-stepped code.

Software can detect the presence of ERETNC by reading $Config5_{LLB}$.

Restrictions:

Release 6 implementations are required to signal a Reserved Instruction exception if ERETNC is executed in the delay slot or Release 6 forbidden slot of a branch or jump instruction.

ERETNC implements a software barrier that resolves all execution and instruction hazards created by Coprocessor 0 state changes. (For Release 2 implementations, refer to the SYNCI instruction for additional information on resolving instruction hazards created by writing the instruction stream.) The effects of this barrier are seen starting with the instruction fetch and decode of the instruction at the PC to which the ERETNC returns.

Operation:

```plaintext
if Status_{ERL} = 1 then
    temp ← ErrorEPC
    Status_{ERL} ← 0
else
    temp ← EPC
    Status_{EXL} ← 0
    if (ArchitectureRevision ≥ 2) and (SRSCtl_{HSS} > 0) and (Status_{BEV} = 0) then
        SRSCtl_{CSS} ← SRSCtl_{PSS}
    endif
endif
if IsMIPS16Implemented() | (Config3_{ISA} > 0) then
    PC ← temp_{31..1} || 0
    ISAMode ← temp_{0}
else
    PC ← temp
endif
ClearHazards()
```

The MIPS32® Instruction Set Manual, Revision 6.04

Copyright © 2015 Imagination Technologies LTD. and/or its Affiliated Group Companies. All rights reserved.

Exceptions:
Coprocessor Unusable Exception
EVP Enable Virtual Processor

To enable all virtual processors in a physical core other than the virtual processor that issued the instruction.

Description:  

\[ GPR[rt] \leftarrow \text{VPControl} ; \text{VPControl}_{\text{DIS}} \leftarrow 0 \]

Enabling a virtual processor means that instruction fetch is resumed.

After all other virtual processors have been enabled, \( \text{VPControl}_{\text{DIS}} \) is cleared. Prior to modification, if \( rt \) is non-zero, \( \text{VPControl} \) is written to \( GPR[rt] \). If EVP is specified without \( rt \), then \( rt \) must be 0.

See the DVP instruction to understand the application of EVP in the context of WAIT/PAUSE/external-control ("DVP" on page 162).

The execution of a DVP must be followed by the execution of an EVP. Executing an EVP causes execution to resume immediately, where applicable, on all other virtual processors, as if the DVP had not been executed; that is, execution is fully restorable after the EVP. However, if an event that renders the state of a virtual processor UNPREDICTABLE (such as power gating) occurs between the DVP and the EVP, then the effect of the EVP is UNPREDICTABLE.

EVP may only take effect if \( \text{VPControl}_{\text{DIS}} = 1 \). Otherwise it is treated as a NOP.

This instruction never executes speculatively. It must be the oldest unretired instruction to take effect.

This instruction is only available in Release 6 implementations. For implementations that do not support multi-threading \( (\text{Config}5_{\text{VP}} = 0) \), this instruction must be treated as a NOP instruction.

Restrictions:

If access to Coprocessor 0 is not enabled, a Coprocessor Unusable Exception is signaled.

In implementations prior to Release 6 of the architecture, this instruction resulted in a Reserved Instruction exception.

Operation:

The pseudocode below assumes that the EVP is executed by virtual processor 0 and that each target virtual processor is numbered n, where n ranges over all remaining virtual processors.

```plaintext
if VPControl_{DIS} = 1 then
    // The enable_fetch and enable_interrupt blocks describe the recommended
    // action with respect to each other virtual processor VPn
    enable_fetch(VPn) {
        if PAUSE(VPn) retires prior to or at the disable event
        then VPn execution is not resumed if LLbit is cleared prior to EVP
    }
    enable_interrupt(VPn) {
        if WAIT(VPn) retires prior to or at the disable event
        then interrupts are ignored by VPn until EVP
    }
endif
data ← VPControl
GPR[rt] ← data
VPControl_{DIS} ← 0
```

Exceptions:
Coprocessor Unusable
Reserved Instruction (pre-Release 6 implementations)

Programming Notes:
Before executing the EVP of a DVP/EVP pair, software should first examine \( \text{VPControl}_{\text{DIS}} \), as returned by the DVP, to determine whether the virtual processors were already disabled. If so, the DVP/EVP sequence should be abandoned (that is, the EVP is not executed). This step allows software to safely nest DVP/EVP pairs.
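A minimal Python sketch of the nesting rule follows. The `Core` class, its `dvp`/`evp` methods, and the position of the DIS bit (modeled here as bit 0 of VPControl) are assumptions for illustration, not a definition of the register layout. The point is that an inner pair skips its EVP when the value returned by its DVP shows the virtual processors were already disabled.

```python
from collections.abc import Callable

VPCONTROL_DIS = 0x1  # assumption: DIS modeled as bit 0 of VPControl


class Core:
    """Hypothetical model of the VPControl.DIS state of one core."""

    def __init__(self) -> None:
        self.vpcontrol = 0

    def dvp(self) -> int:
        old = self.vpcontrol            # value returned in GPR[rt]
        self.vpcontrol |= VPCONTROL_DIS
        return old

    def evp(self) -> int:
        old = self.vpcontrol            # value returned in GPR[rt]
        self.vpcontrol &= ~VPCONTROL_DIS
        return old


def with_other_vps_disabled(core: Core, body: Callable[[], None]) -> None:
    saved = core.dvp()
    try:
        body()
    finally:
        # Only the outermost pair re-enables: if DIS was already set, skip EVP.
        if not saved & VPCONTROL_DIS:
            core.evp()


core = Core()
trace: list[bool] = []


def inner() -> None:
    trace.append(bool(core.vpcontrol & VPCONTROL_DIS))


def outer() -> None:
    with_other_vps_disabled(core, inner)                # nested pair
    trace.append(bool(core.vpcontrol & VPCONTROL_DIS))  # still disabled here


with_other_vps_disabled(core, outer)
trace.append(bool(core.vpcontrol & VPCONTROL_DIS))      # re-enabled by outer EVP
```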

Privileged software may use DVP/EVP to disable virtual processors on a core, such as for the purpose of doing a cache flush without interference from other processes in a system with multiple virtual processors or physical cores.

DVP and EVP may also be used in other cases, such as for power savings or for changing state that applies to all virtual processors in a core (for example, virtual processor scheduling priority), as in the following sequences:

```
# Sequence 1: wait for the LLbit to clear with other VPs disabled
ll      t0, 0(a0)
dvp             # disable all other virtual processors
pause           # wait for LLbit to clear
evp             # enable all other virtual processors

# Sequence 2: change state that applies to all VPs in the core
ll      t0, 0(a0)
dvp             # disable all other virtual processors
# <change core-wide state>
evp             # enable all other virtual processors
```
EXT Extract Bit Field

**Format:**  EXT rt, rs, pos, size

**Purpose:** Extract Bit Field

To extract a bit field from GPR rs and store it right-justified into GPR rt.

**Description:** GPR[rt] ← ExtractField(GPR[rs], msbd, lsb)

The bit field starting at bit pos and extending for size bits is extracted from GPR rs and stored zero-extended and right-justified in GPR rt. The assembly language arguments pos and size are converted by the assembler to the instruction fields msbd (the most significant bit of the destination field in GPR rt), in instruction bits 15..11, and lsb (least significant bit of the source field in GPR rs), in instruction bits 10..6, as follows:

\[
\begin{align*}
\text{msbd} & \leftarrow \text{size-1} \\
\text{lsb} & \leftarrow \text{pos}
\end{align*}
\]

The values of pos and size must satisfy all of the following relations:

\[
\begin{align*}
0 & \leq \text{pos} < 32 \\
0 & < \text{size} \leq 32 \\
0 & < \text{pos} + \text{size} \leq 32
\end{align*}
\]

Figure 3.5 shows the symbolic operation of the instruction.

**Figure 3.5 Operation of the EXT Instruction**

[Figure: GPR rs initial value, with the field in bits pos+size-1..pos (that is, lsb+msbd..lsb); GPR rt final value, with that field right-justified in bits size-1..0 (msbd..0) and bits 31..size cleared to zero.]

**Restrictions:**

In implementations prior to Release 2 of the architecture, this instruction resulted in a Reserved Instruction exception. The operation is UNPREDICTABLE if lsb+msbd > 31.

**Operation:**

\[
\begin{align*}
&\text{if } (\text{lsb} + \text{msbd}) > 31 \text{ then} \\
&\qquad \text{UNPREDICTABLE} \\
&\text{endif} \\
&\text{temp} \leftarrow 0^{32-(\text{msbd}+1)} \,\|\, \text{GPR[rs]}_{\text{msbd}+\text{lsb}..\text{lsb}} \\
&\text{GPR[rt]} \leftarrow \text{temp}
\end{align*}
\]
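A Python sketch of the EXT operation on 32-bit values follows. The function name is hypothetical, and raising `ValueError` is only a stand-in for the UNPREDICTABLE case; hardware is not required to signal anything.

```python
def ext(rs_value: int, msbd: int, lsb: int) -> int:
    """Sketch of EXT: return GPR[rs][msbd+lsb..lsb], zero-extended and right-justified."""
    if lsb + msbd > 31:
        raise ValueError("UNPREDICTABLE: lsb + msbd > 31")
    width = msbd + 1
    return (rs_value >> lsb) & ((1 << width) - 1)


# Extract 8 bits starting at bit 4 (pos=4, size=8, so msbd=7 and lsb=4).
field = ext(0x12345678, msbd=7, lsb=4)   # 0x67
```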
Exceptions:

Reserved Instruction
FLOOR.L.fmt

Floating Point Floor Convert to Long Fixed Point

Format:
FLOOR.L.fmt
FLOOR.L.S fd, fs
FLOOR.L.D fd, fs

MIPS32 Release 2

Purpose: Floating Point Floor Convert to Long Fixed Point
To convert an FP value to 64-bit fixed point, rounding down

Description:
FPR[fd] ← convert_and_round(FPR[fs])
The value in FPR fs, in format fmt, is converted to a value in 64-bit long fixed point format and rounded toward $-\infty$ (rounding mode 3). The result is placed in FPR fd.

When the source value is Infinity, NaN, or rounds to an integer outside the range $-2^{63}$ to $2^{63}-1$, the result cannot be represented correctly, an IEEE Invalid Operation condition exists, and the Invalid Operation flag is set in the FCSR. If the Invalid Operation Enable bit is set in the FCSR, no result is written to fd and an Invalid Operation exception is taken immediately. Otherwise, a default result is written to fd. On cores with FCSR$_{NAN2008}=0$, the default result is $2^{63}-1$. On cores with FCSR$_{NAN2008}=1$, the default result is:
- $0$ when the input value is NaN
- $2^{63}-1$ when the input value is $+\infty$ or rounds to a number larger than $2^{63}-1$
- $-2^{63}$ when the input value is $-\infty$ or rounds to a number smaller than $-2^{63}$
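The rounding and default-result rules can be sketched in Python for both FLOOR.L (`bits=64`) and FLOOR.W (`bits=32`). The function name is hypothetical; the sketch assumes the Invalid Operation Enable bit is clear, does not model the Inexact flag, and takes the NaN2008 negative saturation value as the most negative representable integer, $-2^{bits-1}$.

```python
import math


def floor_convert(value: float, bits: int, nan2008: bool) -> tuple[int, bool]:
    """Sketch of FLOOR.L (bits=64) / FLOOR.W (bits=32).

    Returns (result written to fd, Invalid Operation flag), assuming the
    Invalid Operation exception is not enabled. Inexact is not modeled.
    """
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    if math.isnan(value):
        return (0 if nan2008 else hi), True
    if not math.isinf(value):
        rounded = math.floor(value)   # round toward -infinity (rounding mode 3)
        if lo <= rounded <= hi:
            return rounded, False
    # Infinity, or a finite value that rounds outside the representable range.
    if not nan2008:
        return hi, True
    return (hi if value > 0 else lo), True
```

Note that `math.floor(-2.5)` is `-3`, whereas truncation would give `-2`; this is the difference between rounding mode 3 and round-toward-zero.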

Restrictions:
The fields fs and fd must specify valid FPRs: fs for type fmt and fd for long fixed point. If the fields are not valid, the result is UNPREDICTABLE.

The operand must be a value in format fmt; if it is not, the result is UNPREDICTABLE and the value of the operand FPR becomes UNPREDICTABLE.

The result of this instruction is UNPREDICTABLE if the processor is executing in the FR=0 32-bit FPU register model or on a 32-bit FPU. It is predictable only when executing on a 64-bit FPU in FR=1 mode.

Operation:
StoreFPR(fd, L, ConvertFmt(ValueFPR(fs, fmt), fmt, L))

Exceptions:
Coprocessor Unusable, Reserved Instruction

Floating Point Exceptions:
Invalid Operation, Unimplemented Operation, Inexact
FLOOR.W.fmt Floating Point Floor Convert to Word Fixed Point

Format:  
FLOOR.W.fmt  
FLOOR.W.S   fd, fs  
FLOOR.W.D   fd, fs

Purpose:  Floating Point Floor Convert to Word Fixed Point  
To convert an FP value to 32-bit fixed point, rounding down

Description:  
FPR[fd] ← convert_and_round(FPR[fs])

The value in FPR $fs$, in format $fmt$, is converted to a value in 32-bit word fixed point format and rounded toward $-\infty$ (rounding mode 3). The result is placed in FPR $fd$.

When the source value is Infinity, NaN, or rounds to an integer outside the range $-2^{31}$ to $2^{31}-1$, the result cannot be represented correctly, an IEEE Invalid Operation condition exists, and the Invalid Operation flag is set in the FCSR. If the Invalid Operation Enable bit is set in the FCSR, no result is written to $fd$ and an Invalid Operation exception is taken immediately. Otherwise, a default result is written to $fd$. On cores with FCSR$_{NAN2008}=0$, the default result is $2^{31}-1$. On cores with FCSR$_{NAN2008}=1$, the default result is:

- $0$ when the input value is NaN
- $2^{31}-1$ when the input value is $+\infty$ or rounds to a number larger than $2^{31}-1$
- $-2^{31}$ when the input value is $-\infty$ or rounds to a number smaller than $-2^{31}$

Restrictions:

The fields $fs$ and $fd$ must specify valid FPRs: $fs$ for type $fmt$ and $fd$ for word fixed point. If the fields are not valid, the result is UNPREDICTABLE.

The operand must be a value in format $fmt$; if it is not, the result is UNPREDICTABLE and the value of the operand FPR becomes UNPREDICTABLE.

Operation:

\[
\text{StoreFPR}(fd, W, \text{ConvertFmt}(\text{ValueFPR}(fs, fmt), fmt, W))
\]

Exceptions:

Coprocessor Unusable, Reserved Instruction

Floating Point Exceptions:

Invalid Operation, Unimplemented Operation, Inexact
**INS**

Format: **INS rt, rs, pos, size**  
MIPS32 Release 2

**Purpose:** Insert Bit Field

To merge a right-justified bit field from GPR *rs* into a specified field in GPR *rt*.

**Description:**

The right-most *size* bits from GPR *rs* are merged into the value from GPR *rt* starting at bit position *pos*. The result is placed back in GPR *rt*. The assembly language arguments *pos* and *size* are converted by the assembler to the instruction fields *msb* (the most significant bit of the field), in instruction bits 15..11, and *lsb* (least significant bit of the field), in instruction bits 10..6, as follows:

```
msb ← pos+size-1
lsb ← pos
```

The values of *pos* and *size* must satisfy all of the following relations:

- \(0 \leq pos < 32\)
- \(0 < size \leq 32\)
- \(0 < pos+size \leq 32\)

**Figure 3.6 Operation of the INS Instruction**

**Restrictions:**

In implementations prior to Release 2 of the architecture, this instruction resulted in a Reserved Instruction exception.
The operation is **UNPREDICTABLE** if $lsb > msb$.

**Operation:**

```plaintext
if lsb > msb then
    UNPREDICTABLE
endif
GPR[rt] ← GPR[rt]_{31..msb+1} || GPR[rs]_{msb-lsb..0} || GPR[rt]_{lsb-1..0}
```
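The INS merge can be sketched in Python on 32-bit values. The function name is hypothetical; it applies the assembler conversion (msb ← pos+size-1, lsb ← pos) and then replaces bits msb..lsb of GPR rt with the low bits of GPR rs.

```python
def ins(rt_value: int, rs_value: int, pos: int, size: int) -> int:
    """Sketch of INS on 32-bit values: merge rs[size-1..0] into rt at bit pos."""
    if not (0 <= pos < 32 and 0 < size <= 32 and pos + size <= 32):
        raise ValueError(f"invalid INS operands: pos={pos}, size={size}")
    msb, lsb = pos + size - 1, pos                  # assembler conversion
    field_mask = ((1 << (msb - lsb + 1)) - 1) << lsb
    kept = rt_value & ~field_mask & 0xFFFFFFFF      # GPR[rt] bits outside the field
    inserted = (rs_value << lsb) & field_mask       # GPR[rs][msb-lsb..0] moved to lsb
    return kept | inserted


merged = ins(0x12345678, 0xAB, pos=4, size=8)       # 0x12345AB8
```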

**Exceptions:**

Reserved Instruction
J Jump

Format: J target

Purpose: Jump

To branch within the current 256 MB-aligned region.

Description:
This is a PC-region branch (not PC-relative); the effective target address is in the “current” 256 MB-aligned region. The low 28 bits of the target address are the instr_index field shifted left 2 bits. The remaining upper bits are the corresponding bits of the address of the instruction in the delay slot (not the branch itself).
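The PC-region target formation can be sketched in Python with GPRLEN taken as 32. The helper name is hypothetical; the last line of the example reflects the boundary case described in the programming notes, where a jump in the last word of a region targets the next region.

```python
def pc_region_target(jump_pc: int, instr_index: int) -> int:
    """Sketch of the J target with GPRLEN = 32.

    The upper 4 bits come from the delay-slot address (jump_pc + 4), not from
    the jump itself; the low 28 bits are instr_index (26 bits) shifted left 2.
    """
    delay_slot_pc = (jump_pc + 4) & 0xFFFFFFFF
    return (delay_slot_pc & 0xF0000000) | ((instr_index & 0x03FFFFFF) << 2)


target = pc_region_target(0x00400000, 0x0100010)   # 0x00400040

# A jump in the last word of region 0 takes its upper bits from the delay slot,
# which is already in the next 256 MB region.
boundary = pc_region_target(0x0FFFFFFC, 0)         # 0x10000000
```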

Jump to the effective target address. Execute the instruction that follows the jump, in the branch delay slot, before executing the jump itself.

Restrictions:
Control Transfer Instructions (CTIs) should not be placed in branch delay slots or Release 6 forbidden slots. CTIs include all branches and jumps, NAL, ERET, ERETNC, DERET, WAIT, and PAUSE.

Pre-Release 6: Processor operation is UNPREDICTABLE if a control transfer instruction (CTI) is placed in the delay slot of a branch or jump.

Release 6: If a control transfer instruction (CTI) is executed in the delay slot of a branch or jump, Release 6 implementations are required to signal a Reserved Instruction exception.

Operation:

I:
I+1: PC ← PC_{GPRLEN-1..28} || instr_index || 0^2

Exceptions:
None

Programming Notes:

Forming the branch target address by concatenating PC and index bits rather than adding a signed offset to the PC is an advantage if all program code addresses fit into a 256MB region aligned on a 256MB boundary. It allows a branch from anywhere in the region to anywhere in the region, an action not allowed by a signed relative offset.

This definition creates the following boundary case: When the jump instruction is in the last word of a 256MB region, it can branch only to the following 256MB region containing the branch delay slot.

The Jump instruction has been deprecated in Release 6. Use BC instead.
JAL Jump and Link

Format: JAL target

Purpose: Jump and Link

To execute a procedure call within the current 256MB-aligned region.

Description:
Place the return address link in GPR 31. The return link is the address of the second instruction following the branch, at which location execution continues after a procedure call.

This is a PC-region branch (not PC-relative); the effective target address is in the “current” 256MB-aligned region. The low 28 bits of the target address are the instr_index field shifted left 2 bits. The remaining upper bits are the corresponding bits of the address of the instruction in the delay slot (not the branch itself).

Jump to the effective target address. Execute the instruction that follows the jump, in the branch delay slot, before executing the jump itself.

Restrictions:
Control Transfer Instructions (CTIs) should not be placed in branch delay slots or Release 6 forbidden slots. CTIs include all branches and jumps, NAL, ERET, ERETNC, DERET, WAIT, and PAUSE.

Pre-Release 6: Processor operation is UNPREDICTABLE if a control transfer instruction (CTI) is placed in the delay slot of a branch or jump.

Release 6: If a control transfer instruction (CTI) is executed in the delay slot of a branch or jump, Release 6 implementations are required to signal a Reserved Instruction exception.

Operation:

I: GPR[31] ← PC + 8
I+1: PC ← PC_{GPRLEN-1..28} || instr_index || 0^2

Exceptions:
None

Programming Notes:
Forming the branch target address by concatenating PC and index bits rather than adding a signed offset to the PC is an advantage if all program code addresses fit into a 256MB region aligned on a 256MB boundary. It allows a branch from anywhere in the region to anywhere in the region, an action not allowed by a signed relative offset.

This definition creates the following boundary case: When the branch instruction is in the last word of a 256MB region, it can branch only to the following 256MB region containing the branch delay slot.

The Jump-and-Link instruction has been deprecated in Release 6. Use BALC instead.
JALR Jump and Link Register

Format:

JALR rs (rd = 31 implied)  
MIPS32
JALR rd, rs  
MIPS32

Purpose: Jump and Link Register

To execute a procedure call to an instruction address in a register

Description: GPR[rd] ← return_addr, PC ← GPR[rs]

Place the return address link in GPR rd. The return link is the address of the second instruction following the branch, where execution continues after a procedure call.

For processors that do not implement the MIPS16e or microMIPS ISA:

- Jump to the effective target address in GPR rs. If the target address is not 4-byte aligned, an Address Error exception will occur when the target address is fetched.

For processors that do implement the MIPS16e or microMIPS ISA:

- Jump to the effective target address in GPR rs. Set the ISA Mode bit to the value in GPR rs bit 0. Set bit 0 of the target address to zero. If the target ISA Mode bit is 0 and the target address is not 4-byte aligned, an Address Error exception will occur when the target instruction is fetched.
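The two cases above can be sketched in Python for a 32-bit machine. `jalr_target` is a hypothetical helper; `multi_isa` stands for "MIPS16e or microMIPS is implemented", and the returned flag predicts whether the later instruction fetch would raise an Address Error.

```python
from typing import Optional


def jalr_target(rs_value: int, multi_isa: bool) -> tuple[int, Optional[int], bool]:
    """Return (target_pc, isa_mode, address_error_on_fetch).

    isa_mode is None when no ISA Mode bit is taken from GPR rs.
    """
    if not multi_isa:
        # The address is used as-is; a misaligned target faults at fetch.
        return rs_value, None, (rs_value & 0x3) != 0
    isa_mode = rs_value & 1               # ISA Mode bit comes from GPR rs bit 0
    target = rs_value & 0xFFFFFFFE        # bit 0 of the target is cleared
    return target, isa_mode, isa_mode == 0 and (target & 0x3) != 0
```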

In both cases, execute the instruction that follows the jump, in the branch delay slot, before executing the jump itself.

In Release 1 of the architecture, the only defined hint field value is 0, which sets default handling of JALR. In Release 2 of the architecture, bit 10 of the hint field is used to encode a hazard barrier. See the JALR.HB instruction description for additional information.

Restrictions:

Control Transfer Instructions (CTIs) should not be placed in branch delay slots or Release 6 forbidden slots. CTIs include all branches and jumps, NAL, ERET, ERETNC, DERET, WAIT, and PAUSE.

Pre-Release 6: Processor operation is UNPREDICTABLE if a control transfer instruction (CTI) is placed in the delay slot of a branch or jump.

Release 6: If a control transfer instruction (CTI) is executed in the delay slot of a branch or jump, Release 6 implementations are required to signal a Reserved Instruction exception.

Jump-and-Link Restartability: Register specifiers rs and rd must not be equal, because such an instruction does not have the same effect when re-executed. The result of executing such an instruction is UNPREDICTABLE. This restriction permits an exception handler to resume execution by re-executing the branch when an exception occurs in the delay slot.
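Why rs and rd must differ can be seen with a simplified Python model in which the exception handler restarts the JALR after a delay-slot exception. `run_jalr` and the register dictionaries are hypothetical; ISA-mode handling and the special behavior of GPR 0 are not modeled.

```python
def run_jalr(regs: dict[int, int], rd: int, rs: int, pc: int) -> int:
    """Execute a simplified JALR at address pc; return the jump target."""
    temp = regs[rs]        # I:   temp <- GPR[rs]
    regs[rd] = pc + 8      #      GPR[rd] <- PC + 8
    return temp            # I+1: PC <- temp (ISA-mode handling omitted)


# Legal form (rs != rd): re-executing after a delay-slot exception is harmless.
good = {4: 0x00400100, 31: 0}
first = run_jalr(good, rd=31, rs=4, pc=0x00400000)
restart = run_jalr(good, rd=31, rs=4, pc=0x00400000)

# Illegal form (rs == rd): the first execution overwrote the target address,
# so the restarted JALR jumps to the link address instead.
bad = {4: 0x00400100}
first_bad = run_jalr(bad, rd=4, rs=4, pc=0x00400000)
restart_bad = run_jalr(bad, rd=4, rs=4, pc=0x00400000)
```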

Restrictions Related to Multiple Instruction Sets: This instruction can change the active instruction set, if more than one instruction set is implemented.

If only one instruction set is implemented, then the effective target address must obey the alignment rules of the instruction set. If multiple instruction sets are implemented, the effective target address must obey the alignment rules of the intended instruction set of the target address, as specified by bit 0 of GPR $rs$.

For processors that do not implement the microMIPS32/64 ISA, the effective target address in GPR $rs$ must be naturally aligned. For processors that implement neither the MIPS16e ASE nor the microMIPS32/64 ISA, if either of the two least-significant bits is not zero, an Address Error exception occurs when the branch target is subsequently fetched as an instruction.

For processors that do implement the MIPS16e ASE or microMIPS32/64 ISA, if the target ISA Mode bit (GPR $rs$ bit 0) is zero and bit 1 is one, an Address Error exception occurs when the jump target is subsequently fetched as an instruction.

**Availability and Compatibility:**

Release 6 maps JR and JR.HB to JALR and JALR.HB with $rd = 0$:

Pre-Release 6, JR and JALR were distinct instructions, both with primary opcode SPECIAL, but with distinct function codes.

Release 6: JR is defined to be JALR with the destination register specifier $rd$ set to 0. The primary opcode and function field are the same for JR and JALR. The pre-Release 6 instruction encoding for JR is removed in Release 6.

Release 6 assemblers should accept the JR and JR.HB mnemonics, mapping them to the Release 6 instruction encodings.
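The Release 6 mapping can be sketched as a Python encoder. The helper names are hypothetical; the SPECIAL primary opcode (0) and the JALR function code (0b001001) are the standard JALR encoding, the rt field is zero, and the hazard-barrier flag sets instruction bit 10 (the upper bit of the hint field) as described above.

```python
SPECIAL = 0b000000
JALR_FUNCT = 0b001001


def encode_jalr(rs: int, rd: int = 31, hb: bool = False) -> int:
    """Encode JALR / JALR.HB; rd defaults to 31 as in the assembler."""
    hint = 0b10000 if hb else 0          # hint bit 10 selects the hazard barrier
    return ((SPECIAL << 26) | (rs << 21) | (0 << 16)   # rt field is zero
            | (rd << 11) | (hint << 6) | JALR_FUNCT)


def encode_jr_r6(rs: int, hb: bool = False) -> int:
    """Release 6 JR / JR.HB: JALR / JALR.HB with rd = 0."""
    return encode_jalr(rs, rd=0, hb=hb)


jalr_t9 = encode_jalr(25)        # jalr t9      -> 0x0320F809
jr_ra = encode_jr_r6(31)         # jr ra (R6)   -> 0x03E00009
```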

**Operation:**

```
I:   temp ← GPR[rs]
     GPR[rd] ← PC + 8
I+1: if Config3_{ISA} = 1 then
         PC ← temp
     else
         PC ← temp_{GPRLEN-1..1} || 0
         ISAMode ← temp_{0}
     endif
```

**Exceptions:**

None

**Programming Notes:**

This jump-and-link register instruction can select a register for the return link; other link instructions use GPR 31.

The default register for GPR $rd$, if omitted in the assembly language instruction, is GPR 31.
**JALR.HB**

**Jump and Link Register with Hazard Barrier**

**Format:**
- JALR.HB rs (rd = 31 implied)  
- JALR.HB rd, rs

**Purpose:** Jump and Link Register with Hazard Barrier

To execute a procedure call to an instruction address in a register and clear all execution and instruction hazards

**Description:**
- GPR[rd] ← return_addr, PC ← GPR[rs], clear execution and instruction hazards
- Place the return address link in GPR rd. The return link is the address of the second instruction following the branch, where execution continues after a procedure call.

*For processors that do not implement the MIPS16e or microMIPS ISA:*
- Jump to the effective target address in GPR rs. If the target address is not 4-byte aligned, an Address Error exception will occur when the target address is fetched.

*For processors that do implement the MIPS16e or microMIPS ISA:*
- Jump to the effective target address in GPR rs. Set the ISA Mode bit to the value in GPR rs bit 0. Set bit 0 of the target address to zero. If the target ISA Mode bit is 0 and the target address is not 4-byte aligned, an Address Error exception will occur when the target instruction is fetched.

In both cases, execute the instruction that follows the jump, in the branch delay slot, before executing the jump itself.

JALR.HB implements a software barrier that resolves all execution and instruction hazards created by Coprocessor 0 state changes (for Release 2 implementations, refer to the SYNCI instruction for additional information on resolving instruction hazards created by writing the instruction stream). The effects of this barrier are seen starting with the instruction fetch and decode of the instruction at the PC to which the JALR.HB instruction jumps. An equivalent barrier is also implemented by the ERET instruction, but that instruction is only available if access to Coprocessor 0 is enabled, whereas JALR.HB is legal in all operating modes.

This instruction clears both execution and instruction hazards. Refer to the EHB instruction description for the method of clearing execution hazards alone.

JALR.HB uses bit 10 of the instruction (the upper bit of the hint field) to denote the hazard barrier operation.

**Restrictions:**

JALR.HB does not clear hazards created by any instruction that is executed in the delay slot of the JALR.HB. Only hazards created by instructions executed before the JALR.HB are cleared by the JALR.HB.

After modifying an instruction stream mapping or writing to the instruction stream, execution of those instructions has **UNPREDICTABLE** behavior until the instruction hazard has been cleared with JALR.HB, JR.HB, ERET, or DERET. Further, the operation is **UNPREDICTABLE** if the mapping of the current instruction stream is modified.

Control Transfer Instructions (CTIs) should not be placed in branch delay slots or Release 6 forbidden slots. CTIs include all branches and jumps, NAL, ERET, ERETNC, DERET, WAIT, and PAUSE.

Pre-Release 6: Processor operation is UNPREDICTABLE if a control transfer instruction (CTI) is placed in the delay slot of a branch or jump.

Release 6: If a control transfer instruction (CTI) is executed in the delay slot of a branch or jump, Release 6 implementations are required to signal a Reserved Instruction exception.

Jump-and-Link Restartability: Register specifiers \( rs \) and \( rd \) must not be equal, because such an instruction does not have the same effect when re-executed. The result of executing such an instruction is UNPREDICTABLE. This restriction permits an exception handler to resume execution by re-executing the branch when an exception occurs in the delay slot.

Restrictions Related to Multiple Instruction Sets: This instruction can change the active instruction set, if more than one instruction set is implemented.

If only one instruction set is implemented, then the effective target address must obey the alignment rules of the instruction set. If multiple instruction sets are implemented, the effective target address must obey the alignment rules of the intended instruction set of the target address, as specified by bit 0 of GPR \( rs \).

For processors that do not implement the microMIPS32/64 ISA, the effective target address in GPR \( rs \) must be naturally aligned. For processors that implement neither the MIPS16 ASE nor the microMIPS32/64 ISA, if either of the two least-significant bits is not zero, an Address Error exception occurs when the branch target is subsequently fetched as an instruction.

For processors that do implement the MIPS16 ASE or microMIPS32/64 ISA, if bit 0 is zero and bit 1 is one, an Address Error exception occurs when the jump target is subsequently fetched as an instruction.

Availability and Compatibility:

Release 6 maps JR and JR.HB to JALR and JALR.HB with \( rd = 0 \):

Pre-Release 6, JR.HB and JALR.HB were distinct instructions, both with primary opcode SPECIAL, but with distinct function codes.

Release 6: JR.HB is defined to be JALR.HB with the destination register specifier \( rd \) set to 0. The primary opcode and function field are the same for JR.HB and JALR.HB. The pre-Release 6 instruction encoding for JR.HB is removed in Release 6.

Release 6 assemblers should accept the JR and JR.HB mnemonics, mapping them to the Release 6 instruction encodings.

Operation:

\[
\begin{align*}
I &: \ \text{temp} \leftarrow \text{GPR}[rs] \\
& \ \text{GPR}[rd] \leftarrow \text{PC} + 8 \\
I+1 &: \ \text{if Config3}_{ISA} = 1 \text{ then} \\
& \ \text{PC} \leftarrow \text{temp} \\
& \ \text{else} \\
& \ \text{PC} \leftarrow \text{temp}_{GPRLEN-1..1} \ || \ 0 \\
& \ \text{ISAMode} \leftarrow \text{temp}_0 \\
& \ \text{endif} \\
& \ \text{ClearHazards}() \\
\end{align*}
\]

Exceptions:

None

Programming Notes:

This jump-and-link register instruction can select a register for the return link; other link instructions use GPR 31.

The default register for GPR *rd*, if omitted in the assembly language instruction, is GPR 31.

In Release 6, JR.HB *rs* is implemented as JALR.HB r0, *rs*; that is, as JALR.HB with the destination set to the zero register, r0.

This instruction implements the final step in clearing execution and instruction hazards before execution continues. A hazard is created when a Coprocessor 0 or TLB write affects execution or the mapping of the instruction stream, or after a write to the instruction stream. When such a situation exists, software must explicitly indicate to hardware that the hazard should be cleared. Execution hazards alone can be cleared with the EHB instruction. Instruction hazards can only be cleared with a JR.HB, JALR.HB, or ERET instruction. These instructions cause hardware to clear the hazard before the instruction at the target of the jump is fetched. Note that because these instructions are encoded as jumps, the process of clearing an instruction hazard can often be included as part of a call (JALR) or return (JR) sequence, by simply replacing the original instructions with the HB equivalent.

Example: Clearing hazards due to an ASID change

```plaintext
/*
  * Code used to modify ASID and call a routine with the new
  * mapping established.
  * a0 = New ASID to establish
  * a1 = Address of the routine to call
  */
  mfc0 v0, C0_EntryHi /* Read current ASID */
  li v1, ~M_EntryHiASID /* Get negative mask for field */
  and v0, v0, v1 /* Clear out current ASID value */
  or v0, v0, a0 /* OR in new ASID value */
  mtc0 v0, C0_EntryHi /* Rewrite EntryHi with new ASID */
  jalr.hb a1 /* Call routine, clearing the hazard */
```
JALX Jump and Link Exchange

Format: JALX target

MIPS32 with (microMIPS or MIPS16e), removed in Release 6

Purpose: Jump and Link Exchange
To execute a procedure call within the current 256 MB-aligned region and change the ISA Mode from MIPS32 to microMIPS32 or MIPS16e.

Description:
Place the return address link in GPR 31. The return link is the address of the second instruction following the branch, at which location execution continues after a procedure call. The value stored in GPR 31 bit 0 reflects the current value of the ISA Mode bit.

This is a PC-region branch (not PC-relative); the effective target address is in the “current” 256 MB-aligned region. The low 28 bits of the target address are the instr_index field shifted left 2 bits. The remaining upper bits are the corresponding bits of the address of the instruction in the delay slot (not the branch itself).

Jump to the effective target address, toggling the ISA Mode bit. Execute the instruction that follows the jump, in the branch delay slot, before executing the jump itself.

Restrictions:
This instruction only supports 32-bit aligned branch target addresses.

Control Transfer Instructions (CTIs) should not be placed in branch delay slots. CTIs include all branches and jumps, NAL, ERET, ERETNC, DERET, WAIT, and PAUSE.

Processor operation is UNPREDICTABLE if a control transfer instruction (CTI) is placed in the delay slot of a branch or jump.

Availability and Compatibility:
If the microMIPS base architecture is not implemented and the MIPS16e ASE is not implemented, a Reserved Instruction exception is initiated.

The JALX instruction has been removed in Release 6. Pre-Release 6 code using JALX cannot run on Release 6 by trap-and-emulate. Equivalent functionality is provided by the JIALC instruction added by Release 6.

Operation:

\[
\begin{align*}
I\colon \quad & \text{GPR}[31] \leftarrow \text{PC} + 8 \\
I+1\colon \quad & \text{PC} \leftarrow \text{PC}_{\text{GPRLEN}-1..28} \,\|\, \text{instr\_index} \,\|\, 0^2 \\
& \text{ISAMode} \leftarrow \text{not ISAMode}
\end{align*}
\]

Exceptions:
None

Programming Notes:
Forming the branch target address by concatenating PC and index bits rather than adding a signed offset to the PC is an advantage if all program code addresses fit into a 256 MB region aligned on a 256 MB boundary. It allows a branch from anywhere in the region to anywhere in the region, an action not allowed by a signed relative offset.

This definition creates the following boundary case: when the branch instruction is in the last word of a 256 MB region, it can branch only to the following 256 MB region containing the branch delay slot.
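The region arithmetic can be sketched in C for a 32-bit PC (GPRLEN = 32). `jalx_target` is a hypothetical helper; it takes the address of the delay-slot instruction, as the description specifies, and ignores the ISA Mode toggle. Passing the delay-slot address also reproduces the boundary case: for a JALX in the last word of a region, the upper four bits come from the next region.

```c
#include <stdint.h>

/* Hypothetical helper: JALX target for a 32-bit PC (GPRLEN = 32).
 * delay_slot_pc is the address of the instruction in the delay slot;
 * instr_index is the 26-bit field of the JALX instruction word.
 * The ISA Mode toggle is not modeled. */
uint32_t jalx_target(uint32_t delay_slot_pc, uint32_t instr_index)
{
    uint32_t region = delay_slot_pc & 0xF0000000u;      /* PC bits 31..28 */
    uint32_t low28  = (instr_index & 0x03FFFFFFu) << 2; /* instr_index || 0^2 */
    return region | low28;
}
```

For example, a JALX at 0x0FFFFFFC has its delay slot at 0x10000000, so `jalx_target(0x10000000u, 0)` yields 0x10000000, and no instr_index value can produce a target below that region.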
## JIALC

**Jump Indexed and Link, Compact**

### Format:

\[ \text{JIALC } rt, \text{ offset} \]

### MIPS32 Release 6

**Purpose:** Jump Indexed and Link, Compact

**Description:**

\[
\text{GPR}[31] \leftarrow \text{PC} + 4, \quad \text{PC} \leftarrow \text{GPR}[rt] + \text{sign\_extend}(\text{offset})
\]

The jump target is formed by sign extending the offset field of the instruction and adding it to the contents of GPR `rt`.

The offset is NOT shifted, that is, each bit of the offset is added to the corresponding bit of the GPR.

Places the return address link in GPR 31. The return link is the address of the following instruction, where execution continues after a procedure call returns.

*For processors that do not implement the MIPS16e or microMIPS ISA:*

- Jump to the effective target address derived from GPR `rt` and the offset. If the target address is not 4-byte aligned, an Address Error exception will occur when the target address is fetched.

*For processors that do implement the MIPS16e or microMIPS ISA:*

- Jump to the effective target address derived from GPR `rt` and the offset. Set the ISA Mode bit to bit 0 of the effective address. Set bit 0 of the target address to zero. If the target ISA Mode bit is 0 and the target address is not 4-byte aligned, an Address Error exception will occur when the target instruction is fetched.

Compact jumps do not have delay slots. The instruction after the jump is NOT executed when the jump is executed.

### Restrictions:

This instruction is an unconditional, always taken, compact jump, and hence has neither a delay slot nor a forbidden slot. The instruction after the jump is not executed when the jump is executed.

The register specifier may be set to the link register, GPR 31, because compact jumps do not have the restartability issues of jumps with delay slots. However, this is not common programming practice.

### Availability and Compatibility:

This instruction is introduced by and required as of Release 6.

Release 6 instructions JIALC and BNEZC differ only in the `rs` field, instruction bits 21-25. JIALC and BNEZC occupy the same encoding as pre-Release 6 instruction encoding SDC2, which is recoded in Release 6.

### Exceptions:

None

### Operation:

```plaintext
temp ← GPR[rt] + sign_extend(offset)
GPR[31] ← PC + 4
if Config3_ISA = 1 then
    PC ← temp
else
    PC ← temp_{GPRLEN-1..1} || 0
    ISAMode ← temp_0
endif
```
Programming Notes:

JIALC does NOT shift the offset before adding it to the register. This can be used to eliminate tags in the least significant bits that would otherwise produce misalignment. It also allows JIALC to be used as a substitute for the JALX instruction, removed in Release 6, where the lower bits of the target PC, formed by the addition of GPR[rt] and the unshifted offset, specify the target ISA mode.
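The unshifted-offset arithmetic can be sketched in C for GPRLEN = 32 on an implementation with more than one ISA (the `else` branch of the Operation pseudocode). `struct jump_target` and `jialc_target` are hypothetical names, and the GPR 31 link write is not modeled. The same computation applies to JIC.

```c
#include <stdint.h>

/* Hypothetical result type for the target computation. */
struct jump_target {
    uint32_t pc;       /* new PC with bit 0 cleared */
    unsigned isa_mode; /* taken from bit 0 of the sum */
};

/* Sketch for GPRLEN = 32 with multiple ISAs implemented: the 16-bit
 * offset is sign-extended and added to GPR[rt] WITHOUT shifting. */
struct jump_target jialc_target(uint32_t gpr_rt, uint16_t offset)
{
    /* sign_extend(offset), written without implementation-defined casts */
    int32_t ext = (offset & 0x8000u) ? (int32_t)offset - 0x10000 : (int32_t)offset;
    uint32_t temp = gpr_rt + (uint32_t)ext;

    struct jump_target t;
    t.pc = temp & ~1u;      /* temp_{GPRLEN-1..1} || 0 */
    t.isa_mode = temp & 1u; /* ISAMode <- temp_0 */
    return t;
}
```

With GPR[rt] = 0x80002003 (a tag in bits 1..0) and offset 0xFFFD (-3), the tag is removed and the jump lands at 0x80002000 in ISA mode 0. An offset of 1 applied to 0x80001000 instead selects ISA mode 1 at 0x80001000, which is how the unshifted offset lets JIALC stand in for JALX.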

JIC Jump Indexed, Compact

Format: JIC rt, offset

Purpose: Jump Indexed, Compact

Description: The branch target is formed by sign extending the offset field of the instruction and adding it to the contents of GPR rt.

The offset is NOT shifted, that is, each bit of the offset is added to the corresponding bit of the GPR.

For processors that do not implement the MIPS16e or microMIPS ISA:

- Jump to the effective target address derived from GPR rt and the offset. If the target address is not 4-byte aligned, an Address Error exception will occur when the target address is fetched.

For processors that do implement the MIPS16e or microMIPS ISA:

- Jump to the effective target address derived from GPR rt and the offset. Set the ISA Mode bit to bit 0 of the effective address. Set bit 0 of the target address to zero. If the target ISA Mode bit is 0 and the target address is not 4-byte aligned, an Address Error exception will occur when the target instruction is fetched.

Compact jumps do not have a delay slot. The instruction after the jump is NOT executed when the jump is executed.

Restrictions:

This instruction is an unconditional, always taken, compact jump, and hence has neither a delay slot nor a forbidden slot. The instruction after the jump is not executed when the jump is executed.

Availability and Compatibility:

This instruction is introduced by and required as of Release 6.

Release 6 instructions JIC and BEQZC differ only in the rs field. JIC and BEQZC occupy the same encoding as pre-Release 6 instruction LDC2, which is recoded in Release 6.

Exceptions:

None

Operation:

\[
\begin{align*}
& \text{temp} \leftarrow \text{GPR}[rt] + \text{sign\_extend}(\text{offset}) \\
& \textbf{if } \text{Config3}_{\text{ISA}} = 1 \textbf{ then} \\
& \quad \text{PC} \leftarrow \text{temp} \\
& \textbf{else} \\
& \quad \text{PC} \leftarrow \text{temp}_{\text{GPRLEN}-1..1} \,\|\, 0 \\
& \quad \text{ISAMode} \leftarrow \text{temp}_0 \\
& \textbf{endif}
\end{align*}
\]

Programming Notes:

JIC does NOT shift the offset before adding it to the register. This can be used to eliminate tags in the least significant bits that would otherwise produce misalignment. It also allows JIALC to be used as a substitute for the JALX instruction, removed in Release 6, where the lower bits of the target PC, formed by the addition of GPR[rt] and the unshifted offset, specify the target ISA mode.

JR Jump Register

Format: JR rs

**Purpose:** Jump Register

To execute a branch to an instruction address in a register.

**Description:** PC ← GPR[rs]

Jump to the effective target address in GPR rs. Execute the instruction following the jump, in the branch delay slot, before jumping.

*For processors that do not implement the MIPS16e or microMIPS ISA:*

- Jump to the effective target address in GPR rs. If the target address is not 4-byte aligned, an Address Error exception will occur when the target address is fetched.

*For processors that do implement the MIPS16e or microMIPS ISA:*

- Jump to the effective target address in GPR rs. Set the ISA Mode bit to the value in GPR rs bit 0. Set bit 0 of the target address to zero. If the target ISA Mode bit is 0 and the target address is not 4-byte aligned, an Address Error exception will occur when the target instruction is fetched.

**Restrictions:**

*Control Transfer Instructions (CTIs) should not be placed in branch delay slots or Release 6 forbidden slots.* CTIs include all branches and jumps, NAL, ERET, ERETNC, DERET, WAIT, and PAUSE.

Pre-Release 6: Processor operation is **UNPREDICTABLE** if a control transfer instruction (CTI) is placed in the delay slot of a branch or jump.

Release 6: If a control transfer instruction (CTI) is executed in the delay slot of a branch or jump, Release 6 implementations are required to signal a Reserved Instruction exception.

*Restrictions Related to Multiple Instruction Sets:* This instruction can change the active instruction set if more than one instruction set is implemented.

If only one instruction set is implemented, then the effective target address must obey the alignment rules of the instruction set. If multiple instruction sets are implemented, the effective target address must obey the alignment rules of the intended instruction set of the target address as specified by bit 0 of GPR rs.

For processors that do not implement the microMIPS ISA, the effective target address in GPR rs must be naturally aligned. For processors that do not implement the MIPS16e ASE or microMIPS ISA, if either of the two least-significant bits is not zero, an Address Error exception occurs when the branch target is subsequently fetched as an instruction.

For processors that do implement the MIPS16e ASE or microMIPS ISA, if bit 0 is zero and bit 1 is one, an Address Error exception occurs when the jump target is subsequently fetched as an instruction.

\begin{center}
\begin{tabular}{|l|c|c|c|c|c|}
\hline
 & 31..26 & 25..21 & 20..11 & 10..6 & 5..0 \\
\hline
pre-Release 6: & SPECIAL 000000 & rs & 0 (00 0000 0000) & hint & JR 001000 \\
\hline
 & 6 & 5 & 10 & 5 & 6 \\
\hline
\end{tabular}
\end{center}
In Release 1 of the architecture, the only defined hint field value is 0, which sets default handling of JR. In Release 2 of the architecture, bit 10 of the instruction (the most significant bit of the hint field) is used to encode an instruction hazard barrier. See the JR.HB instruction description for additional information.
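The field layout in the encoding table can be checked by assembling instruction words in C. This is a sketch with hypothetical helper names. The Release 6 variant assumes the JALR function code 001001 with rd = 0, following the JR-to-JALR mapping that Release 6 defines.

```c
#include <stdint.h>

/* Pre-Release 6 JR / JR.HB word (hypothetical helper):
 * SPECIAL 000000 | rs | 0 (bits 20..11) | hint (bits 10..6) | JR 001000 */
uint32_t encode_jr(unsigned rs, int hazard_barrier)
{
    uint32_t word = (uint32_t)(rs & 0x1Fu) << 21; /* SPECIAL = 0 in bits 31..26 */
    if (hazard_barrier)
        word |= 1u << 10;                         /* JR.HB: upper bit of hint */
    return word | 0x08u;                          /* function code 001000 */
}

/* Release 6 form, assumed to be JALR with rd (bits 15..11) = 0. */
uint32_t encode_jr_r6(unsigned rs, int hazard_barrier)
{
    uint32_t word = (uint32_t)(rs & 0x1Fu) << 21;
    if (hazard_barrier)
        word |= 1u << 10;
    return word | 0x09u;                          /* JALR function code 001001 */
}
```

For GPR 31 this gives 0x03E00008 for JR, 0x03E00408 for JR.HB, and 0x03E00009 for the Release 6 encoding.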

**Availability and Compatibility:**

Release 6 maps JR and JR.HB to JALR and JALR.HB with $rd = 0$:

Pre-Release 6, JR and JALR were distinct instructions, both with primary opcode SPECIAL, but with distinct function codes.

Release 6: JR is defined to be JALR with the destination register specifier $rd$ set to 0. The primary opcode and function field are the same for JR and JALR. The pre-Release 6 instruction encoding for JR is removed in Release 6.

Release 6 assemblers should accept the JR and JR.HB mnemonics, mapping them to the Release 6 instruction encodings.

**Operation:**

\[
\begin{align*}
I\colon \quad & \text{temp} \leftarrow \text{GPR}[rs] \\
I+1\colon \quad & \textbf{if } \text{Config1}_{\text{CA}} = 0 \textbf{ then} \\
& \quad \text{PC} \leftarrow \text{temp} \\
& \textbf{else} \\
& \quad \text{PC} \leftarrow \text{temp}_{\text{GPRLEN}-1..1} \,\|\, 0 \\
& \quad \text{ISAMode} \leftarrow \text{temp}_0 \\
& \textbf{endif}
\end{align*}
\]

**Exceptions:**

None

**Programming Notes:**

Software should use the value 31 for the $rs$ field of the instruction word on return from a JAL, JALR, or BGEZAL, and should use a value other than 31 for remaining uses of JR.

JR.HB: Jump Register with Hazard Barrier

**Purpose:** Jump Register with Hazard Barrier

To execute a branch to an instruction address in a register and clear all execution and instruction hazards.

**Description:**

PC ← GPR[rs], clear execution and instruction hazards

Jump to the effective target address in GPR rs. Execute the instruction following the jump, in the branch delay slot, before jumping.

*For processors that do not implement the MIPS16e or microMIPS ISA:*

- Jump to the effective target address in GPR rs. If the target address is not 4-byte aligned, an Address Error exception will occur when the target address is fetched.

*For processors that do implement the MIPS16e or microMIPS ISA:*

- Jump to the effective target address in GPR rs. Set the ISA Mode bit to the value in GPR rs bit 0. Set bit 0 of the target address to zero. If the target ISA Mode bit is 0 and the target address is not 4-byte aligned, an Address Error exception will occur when the target instruction is fetched.

JR.HB implements a software barrier that resolves all execution and instruction hazards created by Coprocessor 0 state changes (for Release 2 implementations, refer to the SYNCI instruction for additional information on resolving instruction hazards created by writing the instruction stream). The effects of this barrier are seen starting with the instruction fetch and decode of the instruction at the PC to which the JR.HB instruction jumps. An equivalent barrier is also implemented by the ERET instruction, but that instruction is only available if access to Coprocessor 0 is enabled, whereas JR.HB is legal in all operating modes.

This instruction clears both execution and instruction hazards. Refer to the EHB instruction description for the method of clearing execution hazards alone.

JR.HB uses bit 10 of the instruction (the upper bit of the hint field) to denote the hazard barrier operation.

**Restrictions:**

JR.HB does not clear hazards created by any instruction that is executed in the delay slot of the JR.HB. Only hazards created by instructions executed before the JR.HB are cleared by the JR.HB.

After modifying an instruction stream mapping or writing to the instruction stream, execution of those instructions has UNPREDICTABLE behavior until the hazard has been cleared with JALR.HB, JR.HB, ERET, or DERET. Further, the operation is UNPREDICTABLE if the mapping of the current instruction stream is modified.

*Control Transfer Instructions (CTIs) should not be placed in branch delay slots or Release 6 forbidden slots.* CTIs include all branches and jumps, NAL, ERET, ERETNC, DERET, WAIT, and PAUSE.

Pre-Release 6: Processor operation is **UNPREDICTABLE** if a control transfer instruction (CTI) is placed in the delay slot of a branch or jump.

Release 6: If a control transfer instruction (CTI) is executed in the delay slot of a branch or jump, Release 6 implementations are required to signal a Reserved Instruction exception.

**Restrictions Related to Multiple Instruction Sets:** This instruction can change the active instruction set, if more than one instruction set is implemented.

If only one instruction set is implemented, then the effective target address must obey the alignment rules of the instruction set. If multiple instruction sets are implemented, the effective target address must obey the alignment rules of the intended instruction set of the target address as specified by bit 0 of GPR rs.

For processors that do not implement the microMIPS ISA, the effective target address in GPR rs must be naturally aligned. For processors that do not implement the MIPS16e ASE or microMIPS ISA, if either of the two least-significant bits is not zero, an Address Error exception occurs when the branch target is subsequently fetched as an instruction.

For processors that do implement the MIPS16e ASE or microMIPS ISA, if bit 0 is zero and bit 1 is one, an Address Error exception occurs when the jump target is subsequently fetched as an instruction.

**Availability and Compatibility:**

*Release 6 maps JR and JR.HB to JALR and JALR.HB with rd = 0:*

Pre-Release 6, JR.HB and JALR.HB were distinct instructions, both with primary opcode SPECIAL, but with distinct function codes.

Release 6: JR.HB is defined to be JALR.HB with the destination register specifier rd set to 0. The primary opcode and function field are the same for JR.HB and JALR.HB. The pre-Release 6 instruction encoding for JR.HB is removed in Release 6.

Release 6 assemblers should accept the JR and JR.HB mnemonics, mapping them to the Release 6 instruction encodings.

**Operation:**

\[
\begin{align*}
I\colon \quad & \text{temp} \leftarrow \text{GPR}[rs] \\
I+1\colon \quad & \textbf{if } \text{Config1}_{\text{CA}} = 0 \textbf{ then} \\
& \quad \text{PC} \leftarrow \text{temp} \\
& \textbf{else} \\
& \quad \text{PC} \leftarrow \text{temp}_{\text{GPRLEN}-1..1} \,\|\, 0 \\
& \quad \text{ISAMode} \leftarrow \text{temp}_0 \\
& \textbf{endif} \\
& \text{ClearHazards}()
\end{align*}
\]

**Exceptions:**

None

**Programming Notes:**

This instruction implements the final step in clearing execution and instruction hazards before execution continues. A hazard is created when a Coprocessor 0 or TLB write affects execution or the mapping of the instruction stream, or after a write to the instruction stream. When such a situation exists, software must explicitly indicate to hardware that the hazard should be cleared. Execution hazards alone can be cleared with the EHB instruction. Instruction hazards can only be cleared with a JR.HB, JALR.HB, or ERET instruction. These instructions cause hardware to clear the hazard before the instruction at the target of the jump is fetched. Note that because these instructions are encoded as jumps, the process of clearing an instruction hazard can often be included as part of a call (JALR) or return (JR) sequence, by simply replacing the original instructions with the HB equivalent.

Example: Clearing hazards due to an ASID change

```mips
/*
 * Routine called to modify ASID and return with the new
 * mapping established.
 * a0 = New ASID to establish
 */
mfc0 v0, C0_EntryHi /* Read current ASID */
li v1, ~M_EntryHiASID /* Get negative mask for field */
and v0, v0, v1 /* Clear out current ASID value */
or v0, v0, a0 /* OR in new ASID value */
mtc0 v0, C0_EntryHi /* Rewrite EntryHi with new ASID */
jr.hb ra /* Return, clearing the hazard */
nop /* Branch delay slot */
```

Example: Making a write to the instruction stream visible

```mips
/*
 * Routine called after new instructions are written to
 * make them visible and return with the hazards cleared.
 */
/* Synchronize the caches here: see the SYNCI and CACHE instructions */
sync /* Force memory synchronization */
jr.hb ra /* Return, clearing the hazard */
nop /* Branch delay slot */
```

Example: Clearing instruction hazards in-line

```mips
la AT, 10f
jr.hb AT /* Jump to next instruction, clearing */
nop /* hazards */
10:
```

LB Load Byte

**Format:**  \[ \text{LB } rt, \text{ offset}(\text{base}) \]

**Purpose:** Load Byte  
To load a byte from memory as a signed value.

**Description:** \( \text{GPR}[rt] \leftarrow \text{memory}[\text{GPR}[\text{base}] + \text{offset}] \)  
The contents of the 8-bit byte at the memory location specified by the effective address are fetched, sign-extended, and placed in GPR \( rt \). The 16-bit signed \( \text{offset} \) is added to the contents of GPR \( \text{base} \) to form the effective address.

**Restrictions:** None

**Operation:**

\[
\begin{align*}
\text{vAddr} & \leftarrow \text{sign\_extend}(\text{offset}) + \text{GPR}[\text{base}] \\
(\text{pAddr}, \text{CCA}) & \leftarrow \text{AddressTranslation}(\text{vAddr}, \text{DATA}, \text{LOAD}) \\
\text{pAddr} & \leftarrow \text{pAddr}_{\text{PSIZE}-1..2} \,\|\, (\text{pAddr}_{1..0} \text{ xor } \text{ReverseEndian}^2) \\
\text{memword} & \leftarrow \text{LoadMemory}(\text{CCA}, \text{BYTE}, \text{pAddr}, \text{vAddr}, \text{DATA}) \\
\text{byte} & \leftarrow \text{vAddr}_{1..0} \text{ xor } \text{BigEndianCPU}^2 \\
\text{GPR}[rt] & \leftarrow \text{sign\_extend}(\text{memword}_{7+8*\text{byte}..8*\text{byte}})
\end{align*}
\]
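The byte-lane selection in the Operation can be sketched in C. This sketch assumes ReverseEndian = 0, takes the already-loaded aligned word as `memword`, and uses a hypothetical helper name. It shows that vAddr_{1..0} = 0 selects bits 7..0 of memword on a little-endian CPU but bits 31..24 on a big-endian CPU.

```c
#include <stdint.h>

/* Hypothetical LB model for a 32-bit memword, assuming ReverseEndian = 0.
 * vaddr_low2 is vAddr_{1..0}; big_endian models BigEndianCPU (0 or 1). */
int32_t lb_extract(uint32_t memword, unsigned vaddr_low2, unsigned big_endian)
{
    unsigned byte = (vaddr_low2 & 3u) ^ (big_endian ? 3u : 0u); /* xor BigEndianCPU^2 */
    uint8_t b = (uint8_t)(memword >> (8u * byte));  /* memword_{7+8*byte..8*byte} */
    return (b & 0x80u) ? (int32_t)b - 256 : (int32_t)b; /* sign_extend */
}
```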

**Exceptions:**  
TLB Refill, TLB Invalid, Address Error, Watch

LBE Load Byte EVA

**Format:** LBE rt, offset(base)

**Purpose:** Load Byte EVA

To load a byte as a signed value from user mode virtual address space when executing in kernel mode.

**Description:** GPR[rt] ← memory[GPR[base] + offset]

The contents of the 8-bit byte at the memory location specified by the effective address are fetched, sign-extended, and placed in GPR rt. The 9-bit signed offset is added to the contents of GPR base to form the effective address.
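The narrower EVA offset can be illustrated with a small C sketch. It assumes GPRLEN = 32 and that `offset_field` holds only the 9 offset bits (their position in the instruction word is not modeled); the helper names are hypothetical. A 9-bit signed offset reaches -256 to +255 bytes from GPR base, compared with -32768 to +32767 for the 16-bit offset of LB.

```c
#include <stdint.h>

/* Hypothetical helper: sign-extends a 9-bit offset field (bits 8..0). */
int32_t sign_extend9(uint32_t field)
{
    field &= 0x1FFu;                                   /* keep bits 8..0 */
    return (field & 0x100u) ? (int32_t)field - 0x200 : (int32_t)field;
}

/* Effective address for a 32-bit GPR base plus the 9-bit offset. */
uint32_t eva_effective_address(uint32_t gpr_base, uint32_t offset_field)
{
    return gpr_base + (uint32_t)sign_extend9(offset_field);
}
```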

The LBE instruction functions the same as the LB instruction, except that address translation is performed using the user mode virtual address space mapping in the TLB when accessing an address within a memory segment configured to use the MUSUK access mode and executing in kernel mode. Memory segments using UUSK or MUSK access modes are also accessible. Refer to Volume III, Enhanced Virtual Addressing section for additional information.

Implementation of this instruction is specified by the Config5EVA field being set to one.

**Restrictions:**

Only usable when access to Coprocessor 0 is enabled and accessing an address within a segment configured using UUSK, MUSK or MUSUK access mode.

**Operation:**

\[
\begin{align*}
\text{vAddr} & \leftarrow \text{sign\_extend}(\text{offset}) + \text{GPR}[\text{base}] \\
(\text{pAddr}, \text{CCA}) & \leftarrow \text{AddressTranslation}(\text{vAddr}, \text{DATA}, \text{LOAD}) \\
\text{pAddr} & \leftarrow \text{pAddr}_{\text{PSIZE}-1..2} \,\|\, (\text{pAddr}_{1..0} \text{ xor } \text{ReverseEndian}^2) \\
\text{memword} & \leftarrow \text{LoadMemory}(\text{CCA}, \text{BYTE}, \text{pAddr}, \text{vAddr}, \text{DATA}) \\
\text{byte} & \leftarrow \text{vAddr}_{1..0} \text{ xor } \text{BigEndianCPU}^2 \\
\text{GPR}[rt] & \leftarrow \text{sign\_extend}(\text{memword}_{7+8*\text{byte}..8*\text{byte}})
\end{align*}
\]

**Exceptions:**

TLB Refill, TLB Invalid, Bus Error, Address Error, Watch, Reserved Instruction, Coprocessor Unusable

LBU Load Byte Unsigned

Format: LBU rt, offset(base)

Purpose: Load Byte Unsigned
To load a byte from memory as an unsigned value

Description: \(GPR[rt] \leftarrow \text{memory}[GPR[base] + offset]\)
The contents of the 8-bit byte at the memory location specified by the effective address are fetched, zero-extended, and placed in GPR \(rt\). The 16-bit signed \(offset\) is added to the contents of GPR \(base\) to form the effective address.

Restrictions:
None

Operation:
\[
\begin{align*}
\text{vAddr} & \leftarrow \text{sign\_extend}(\text{offset}) + \text{GPR}[\text{base}] \\
(\text{pAddr}, \text{CCA}) & \leftarrow \text{AddressTranslation}(\text{vAddr}, \text{DATA}, \text{LOAD}) \\
\text{pAddr} & \leftarrow \text{pAddr}_{\text{PSIZE}-1..2} \,\|\, (\text{pAddr}_{1..0} \text{ xor } \text{ReverseEndian}^2) \\
\text{memword} & \leftarrow \text{LoadMemory}(\text{CCA}, \text{BYTE}, \text{pAddr}, \text{vAddr}, \text{DATA}) \\
\text{byte} & \leftarrow \text{vAddr}_{1..0} \text{ xor } \text{BigEndianCPU}^2 \\
\text{GPR}[rt] & \leftarrow \text{zero\_extend}(\text{memword}_{7+8*\text{byte}..8*\text{byte}})
\end{align*}
\]
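The only difference from LB is the extension step. This C sketch, with hypothetical helper names, applies both extensions to the same byte value.

```c
#include <stdint.h>

/* LB-style result: sign_extend of the selected byte. */
int32_t lb_value(uint8_t b)
{
    return (b & 0x80u) ? (int32_t)b - 256 : (int32_t)b;
}

/* LBU-style result: zero_extend of the selected byte. */
uint32_t lbu_value(uint8_t b)
{
    return (uint32_t)b;
}
```

A byte of 0x80 yields -128 (0xFFFFFF80 in a 32-bit GPR) through LB but 128 (0x00000080) through LBU.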

Exceptions:
TLB Refill, TLB Invalid, Address Error, Watch

LBUE Load Byte Unsigned EVA

Format: LBUE rt, offset(base)

Purpose: Load Byte Unsigned EVA

To load a byte as an unsigned value from user mode virtual address space when executing in kernel mode.

Description: \( \text{GPR}[rt] \leftarrow \text{memory}[\text{GPR}[base] + \text{offset}] \)

The contents of the 8-bit byte at the memory location specified by the effective address are fetched, zero-extended, and placed in GPR \( rt \). The 9-bit signed \( \text{offset} \) is added to the contents of GPR \( \text{base} \) to form the effective address.

The LBUE instruction functions the same as the LBU instruction, except that address translation is performed using the user mode virtual address space mapping in the TLB when accessing an address within a memory segment configured to use the MUSUK access mode. Memory segments using UUSK or MUSK access modes are also accessible. Refer to Volume III, Enhanced Virtual Addressing section for additional information.

Implementation of this instruction is specified by the \( \text{Config5}_{\text{EVA}} \) field being set to one.

Restrictions:

Only usable when access to Coprocessor 0 is enabled and accessing an address within a segment configured using UUSK, MUSK or MUSUK access mode.

Operation:

\[
\begin{align*}
\text{vAddr} & \leftarrow \text{sign\_extend}(\text{offset}) + \text{GPR}[\text{base}] \\
(\text{pAddr}, \text{CCA}) & \leftarrow \text{AddressTranslation}(\text{vAddr}, \text{DATA}, \text{LOAD}) \\
\text{pAddr} & \leftarrow \text{pAddr}_{\text{PSIZE}-1..2} \,\|\, (\text{pAddr}_{1..0} \text{ xor } \text{ReverseEndian}^2) \\
\text{memword} & \leftarrow \text{LoadMemory}(\text{CCA}, \text{BYTE}, \text{pAddr}, \text{vAddr}, \text{DATA}) \\
\text{byte} & \leftarrow \text{vAddr}_{1..0} \text{ xor } \text{BigEndianCPU}^2 \\
\text{GPR}[rt] & \leftarrow \text{zero\_extend}(\text{memword}_{7+8*\text{byte}..8*\text{byte}})
\end{align*}
\]

Exceptions:

TLB Refill, TLB Invalid, Bus Error, Address Error, Watch, Reserved Instruction, Coprocessor Unusable

**LDC1**

**Load Doubleword to Floating Point**

<table>
<thead>
<tr>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>LDC1</td>
<td>base</td>
<td>ft</td>
<td>offset</td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>16</td>
</tr>
</tbody>
</table>

**Format:** LDC1 ft, offset(base)

**Purpose:** Load Doubleword to Floating Point
To load a doubleword from memory to an FPR.

**Description:** FPR[ft] ← memory[GPR[base] + offset]
The contents of the 64-bit doubleword at the memory location specified by the aligned effective address are fetched and placed in FPR ft. The 16-bit signed offset is added to the contents of GPR base to form the effective address.

**Restrictions:**
Pre-Release 6: An Address Error exception occurs if EffectiveAddress2..0 ≠ 0 (not doubleword-aligned).
Release 6 allows hardware to provide address misalignment support in lieu of requiring natural alignment.
Note: The pseudocode is not completely adapted for Release 6 misalignment support as the handling is implementation dependent.

**Operation:**

\[
\begin{align*}
\text{vAddr} & \leftarrow \text{sign\_extend}(\text{offset}) + \text{GPR}[\text{base}] \\
(\text{pAddr}, \text{CCA}) & \leftarrow \text{AddressTranslation}(\text{vAddr}, \text{DATA}, \text{LOAD}) \\
\text{pAddr} & \leftarrow \text{pAddr} \text{ xor } ((\text{BigEndianCPU} \text{ xor } \text{ReverseEndian}) \,\|\, 0^2) \\
\text{memlsw} & \leftarrow \text{LoadMemory}(\text{CCA}, \text{WORD}, \text{pAddr}, \text{vAddr}, \text{DATA}) \\
\text{pAddr} & \leftarrow \text{pAddr} \text{ xor } \text{0b100} \\
\text{memmsw} & \leftarrow \text{LoadMemory}(\text{CCA}, \text{WORD}, \text{pAddr}, \text{vAddr}+4, \text{DATA}) \\
\text{memdoubleword} & \leftarrow \text{memmsw} \,\|\, \text{memlsw} \\
& \text{StoreFPR}(ft, \text{UNINTERPRETED\_DOUBLEWORD}, \text{memdoubleword})
\end{align*}
\]
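The word ordering in the Operation can be sketched in C. This sketch assumes ReverseEndian = 0, a doubleword-aligned address (modeled as offset 0 of an 8-byte buffer), and hypothetical helper names. On a big-endian CPU the first XOR redirects the `memlsw` access to the upper word address, so both byte orders assemble `memmsw || memlsw` correctly.

```c
#include <stdint.h>

/* Reads the 32-bit word at mem[addr..addr+3] in the given byte order. */
static uint32_t read_word(const uint8_t *mem, unsigned addr, int big_endian)
{
    uint32_t w = 0;
    for (int i = 0; i < 4; i++) {
        unsigned shift = big_endian ? 8u * (3 - i) : 8u * i;
        w |= (uint32_t)mem[addr + i] << shift;
    }
    return w;
}

/* Hypothetical LDC1 model for the doubleword at mem[0..7],
 * assuming ReverseEndian = 0 so only BigEndianCPU matters. */
uint64_t ldc1_value(const uint8_t mem[8], int big_endian)
{
    unsigned lsw_addr = big_endian ? 4u : 0u;  /* pAddr xor (BigEndianCPU || 0^2) */
    unsigned msw_addr = lsw_addr ^ 4u;         /* pAddr xor 0b100 */
    uint32_t memlsw = read_word(mem, lsw_addr, big_endian);
    uint32_t memmsw = read_word(mem, msw_addr, big_endian);
    return ((uint64_t)memmsw << 32) | memlsw;  /* memmsw || memlsw */
}
```

For the bytes 01 02 ... 08 in ascending address order, the little-endian result is 0x0807060504030201 and the big-endian result is 0x0102030405060708.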

**Exceptions:**
Coprocessor Unusable, Reserved Instruction, TLB Refill, TLB Invalid, Address Error, Watch

**LDC2**

Load Doubleword to Coprocessor 2

**Format:**  
LDC2 rt, offset(base)

**Purpose:**  
To load a doubleword from memory to a Coprocessor 2 register.

**Description:**  
\[ \text{CPR}[2, \text{rt}, 0] \leftarrow \text{memory}[\text{GPR}[\text{base}] + \text{offset}] \]

The contents of the 64-bit doubleword at the memory location specified by the aligned effective address are fetched and placed in Coprocessor 2 register \( rt \). The signed \( \text{offset} \) is added to the contents of GPR \( \text{base} \) to form the effective address.

**Restrictions:**  
Pre-Release 6: An Address Error exception occurs if \( \text{EffectiveAddress}_{2..0} \neq 0 \) (not doubleword-aligned).

Release 6 allows hardware to provide address misalignment support in lieu of requiring natural alignment.

Note: The pseudocode is not completely adapted for Release 6 misalignment support as the handling is implementation dependent.

**Availability and Compatibility:**  
This instruction has been recoded for Release 6.

**Operation:**

\[
\begin{align*}
\text{vAddr} & \leftarrow \text{sign\_extend}(\text{offset}) + \text{GPR}[\text{base}] \\
(\text{pAddr}, \text{CCA}) & \leftarrow \text{AddressTranslation}(\text{vAddr}, \text{DATA}, \text{LOAD}) \\
\text{pAddr} & \leftarrow \text{pAddr} \text{ xor } ((\text{BigEndianCPU} \text{ xor } \text{ReverseEndian}) \,\|\, 0^2) \\
\text{memlsw} & \leftarrow \text{LoadMemory}(\text{CCA}, \text{WORD}, \text{pAddr}, \text{vAddr}, \text{DATA}) \\
\text{pAddr} & \leftarrow \text{pAddr} \text{ xor } \text{0b100} \\
\text{memmsw} & \leftarrow \text{LoadMemory}(\text{CCA}, \text{WORD}, \text{pAddr}, \text{vAddr}+4, \text{DATA}) \\
\text{CPR}[2, rt, 0] & \leftarrow \text{memmsw} \,\|\, \text{memlsw}
\end{align*}
\]

**Exceptions:**

Coprocessor Unusable, Reserved Instruction, TLB Refill, TLB Invalid, Address Error, Watch

**Programming Notes:**

Release 6 implements an 11-bit offset, whereas all release levels lower than Release 6 of the MIPS architecture implement a 16-bit offset.

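Because the Release 6 offset is much narrower, it can be worth checking whether a displacement fits before relying on the offset field. A minimal C sketch, with a hypothetical helper name:

```c
#include <stdbool.h>

/* Returns true if offset is representable as a signed field of `bits` bits. */
bool offset_fits(long offset, unsigned bits)
{
    long min = -(1L << (bits - 1));
    long max = (1L << (bits - 1)) - 1;
    return offset >= min && offset <= max;
}
```

An 11-bit field covers -1024 to +1023 bytes, so a displacement such as 1024 would presumably need to be added to the base register separately, whereas it fits easily in the pre-Release 6 16-bit field.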
LDXC1 Load Doubleword Indexed to Floating Point

Format:  
LDXC1 fd, index(base)

MIPS32® Instruction Set Manual, Revision 6.04

Copyright © 2015 Imagination Technologies LTD. and/or its Affiliated Group Companies. All rights reserved.

Purpose:  
Load Doubleword Indexed to Floating Point
To load a doubleword from memory to an FPR (GPR+GPR addressing)

Description:  
FPR[fd] ← memory[GPR[base] + GPR[index]]
The contents of the 64-bit doubleword at the memory location specified by the aligned effective address are fetched
and placed in FPR fd. The contents of GPR index and GPR base are added to form the effective address.

Restrictions:
An Address Error exception occurs if EffectiveAddress2..0 ≠ 0 (not doubleword-aligned).
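The indexed addressing and alignment check can be sketched in C for GPRLEN = 32; both helper names are hypothetical. Unlike LDC1, no offset field is involved: the effective address is GPR base plus GPR index.

```c
#include <stdbool.h>
#include <stdint.h>

/* Effective address for LDXC1: GPR[base] + GPR[index] (32-bit wraparound). */
uint32_t ldxc1_vaddr(uint32_t gpr_base, uint32_t gpr_index)
{
    return gpr_base + gpr_index;
}

/* False where an Address Error would be signaled (vAddr_{2..0} != 0). */
bool ldxc1_aligned(uint32_t vaddr)
{
    return (vaddr & 0x7u) == 0;
}
```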

Availability and Compatibility:
This instruction has been removed in Release 6.
Required in all versions of MIPS64 since MIPS64 Release 1. Not available in MIPS32 Release 1. Required in
MIPS32 Release 2 and all subsequent versions of MIPS32. When required, required whenever FPU is present,
whether a 32-bit or 64-bit FPU, whether in 32-bit or 64-bit FP Register Mode (FIRF64=0 or 1, StatusFR=0 or 1).

Operation:

\[
\begin{align*}
\text{vAddr} & \leftarrow \text{GPR}[\text{base}] + \text{GPR}[\text{index}] \\
& \textbf{if } \text{vAddr}_{2..0} \neq 0^3 \textbf{ then} \\
& \quad \text{SignalException}(\text{AddressError}) \\
& \textbf{endif} \\
(\text{pAddr}, \text{CCA}) & \leftarrow \text{AddressTranslation}(\text{vAddr}, \text{DATA}, \text{LOAD}) \\
\text{pAddr} & \leftarrow \text{pAddr} \text{ xor } ((\text{BigEndianCPU} \text{ xor } \text{ReverseEndian}) \,\|\, 0^2) \\
\text{memlsw} & \leftarrow \text{LoadMemory}(\text{CCA}, \text{WORD}, \text{pAddr}, \text{vAddr}, \text{DATA}) \\
\text{pAddr} & \leftarrow \text{pAddr} \text{ xor } \text{0b100} \\
\text{memmsw} & \leftarrow \text{LoadMemory}(\text{CCA}, \text{WORD}, \text{pAddr}, \text{vAddr}+4, \text{DATA}) \\
\text{memdoubleword} & \leftarrow \text{memmsw} \,\|\, \text{memlsw} \\
& \text{StoreFPR}(fd, \text{UNINTERPRETED\_DOUBLEWORD}, \text{memdoubleword})
\end{align*}
\]

Exceptions:
TLB Refill, TLB Invalid, Address Error, Reserved Instruction, Coprocessor Unusable, Watch

LH Load Halfword

Format: LH rt, offset(base)

Purpose: Load Halfword
To load a halfword from memory as a signed value

Description: GPR[rt] ← memory[GPR[base] + offset]
The contents of the 16-bit halfword at the memory location specified by the aligned effective address are fetched, sign-extended, and placed in GPR rt. The 16-bit signed offset is added to the contents of GPR base to form the effective address.

Restrictions:
Pre-Release 6: The effective address must be naturally-aligned. If the least-significant bit of the address is non-zero, an Address Error exception occurs.
Release 6 allows hardware to provide address misalignment support in lieu of requiring natural alignment.
Note: The pseudocode is not completely adapted for Release 6 misalignment support as the handling is implementation dependent.

Operation:

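\[
\begin{align*}
\text{vAddr} & \leftarrow \text{sign\_extend}(\text{offset}) + \text{GPR}[\text{base}] \\
(\text{pAddr}, \text{CCA}) & \leftarrow \text{AddressTranslation}(\text{vAddr}, \text{DATA}, \text{LOAD}) \\
\text{pAddr} & \leftarrow \text{pAddr}_{\text{PSIZE}-1..2} \,\|\, (\text{pAddr}_{1..0} \text{ xor } (\text{ReverseEndian} \,\|\, 0)) \\
\text{memword} & \leftarrow \text{LoadMemory}(\text{CCA}, \text{HALFWORD}, \text{pAddr}, \text{vAddr}, \text{DATA}) \\
\text{byte} & \leftarrow \text{vAddr}_{1..0} \text{ xor } (\text{BigEndianCPU} \,\|\, 0) \\
\text{GPR}[rt] & \leftarrow \text{sign\_extend}(\text{memword}_{15+8*\text{byte}..8*\text{byte}})
\end{align*}
\]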
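The halfword selection can be sketched the same way in C, assuming ReverseEndian = 0, a naturally aligned address (vAddr_{1..0} of 0 or 2), and a hypothetical helper name. XOR with `BigEndianCPU || 0` flips only bit 1, so the two aligned halfwords swap lanes between byte orders.

```c
#include <stdint.h>

/* Hypothetical LH model for a 32-bit memword, assuming ReverseEndian = 0
 * and vaddr_low2 equal to 0 or 2 (naturally aligned). */
int32_t lh_extract(uint32_t memword, unsigned vaddr_low2, unsigned big_endian)
{
    unsigned byte = (vaddr_low2 & 3u) ^ (big_endian ? 2u : 0u); /* xor (BigEndianCPU || 0) */
    uint16_t h = (uint16_t)(memword >> (8u * byte));    /* memword_{15+8*byte..8*byte} */
    return (h & 0x8000u) ? (int32_t)h - 65536 : (int32_t)h; /* sign_extend */
}
```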

Exceptions:
TLB Refill, TLB Invalid, Bus Error, Address Error, Watch

LHE Load Halfword EVA

Format: LHE rt, offset(base)

Purpose: Load Halfword EVA

To load a halfword as a signed value from user mode virtual address space when executing in kernel mode.

Description: GPR[rt] ← memory[GPR[base] + offset]

The contents of the 16-bit halfword at the memory location specified by the aligned effective address are fetched, sign-extended, and placed in GPR rt. The 9-bit signed offset is added to the contents of GPR base to form the effective address.

The LHE instruction functions the same as the LH instruction, except that address translation is performed using the user mode virtual address space mapping in the TLB when accessing an address within a memory segment configured to use the MUSUK access mode. Memory segments using UUSK or MUSK access modes are also accessible. Refer to Volume III, Enhanced Virtual Addressing section for additional information.

Implementation of this instruction is specified by the Config5EVA field being set to one.

Restrictions:

Only usable when access to Coprocessor 0 is enabled and accessing an address within a segment configured using UUSK, MUSK or MUSUK access mode.

Pre-Release 6: The effective address must be naturally-aligned. If the least-significant bit of the address is non-zero, an Address Error exception occurs.

Release 6 allows hardware to provide address misalignment support in lieu of requiring natural alignment.

Note: The pseudocode is not completely adapted for Release 6 misalignment support as the handling is implementation dependent.

Operation:

vAddr ← sign_extend(offset) + GPR[base]
(pAddr, CCA) ← AddressTranslation(vAddr, DATA, LOAD)
pAddr ← pAddr_{PSIZE-1..2} || (pAddr_{1..0} xor (ReverseEndian || 0))
memword ← LoadMemory(CCA, HALFWORD, pAddr, vAddr, DATA)
byte ← vAddr_{1..0} xor (BigEndianCPU || 0)
GPR[rt] ← sign_extend(memword_{15+8*byte..8*byte})

Exceptions:

TLB Refill, TLB Invalid, Bus Error, Address Error, Watch, Reserved Instruction, Coprocessor Unusable
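LHE, like the other EVA loads, encodes only a 9-bit signed offset, so the displacement from GPR base is limited to -256..+255 bytes. A minimal C sketch of the effective address calculation, assuming a 32-bit GPR; the function names are hypothetical:

```c
#include <stdint.h>

/* Sign-extend the 9-bit offset field (bits 8..0) without relying on
 * implementation-defined right shifts of negative values. */
static int32_t sign_extend9(uint32_t field) {
    return (int32_t)((field & 0x1FFu) ^ 0x100u) - 0x100;
}

/* vAddr <- sign_extend(offset) + GPR[base], with 32-bit wrap-around. */
static uint32_t eva_effective_address(uint32_t gpr_base, uint32_t offset_field) {
    return gpr_base + (uint32_t)sign_extend9(offset_field);
}
```

An offset field of 0x1F8 therefore means -8, so a base of 0x7FFF1000 yields the effective address 0x7FFF0FF8.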
### LHU

**Load Halfword Unsigned**

<table>
<thead>
<tr>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>LHU<br>100101</td>
<td>base</td>
<td>rt</td>
<td>offset</td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>16</td>
</tr>
</tbody>
</table>

#### Format:

LHU rt, offset(base)

#### Purpose:

Load Halfword Unsigned

To load a halfword from memory as an unsigned value.

#### Description:

GPR[rt] ← memory[GPR[base] + offset]

The contents of the 16-bit halfword at the memory location specified by the aligned effective address are fetched, zero-extended, and placed in GPR rt. The 16-bit signed offset is added to the contents of GPR base to form the effective address.

#### Restrictions:

Pre-Release 6: The effective address must be naturally-aligned. If the least-significant bit of the address is non-zero, an Address Error exception occurs.

Release 6 allows hardware to provide address misalignment support in lieu of requiring natural alignment.

Note: The pseudocode is not completely adapted for Release 6 misalignment support as the handling is implementation dependent.

#### Operation:

    vAddr ← sign_extend(offset) + GPR[base]
    (pAddr, CCA) ← AddressTranslation (vAddr, DATA, LOAD)
    pAddr ← pAddr_{PSIZE-1..2} || (pAddr_{1..0} xor (ReverseEndian || 0))
    memword ← LoadMemory (CCA, HALFWORD, pAddr, vAddr, DATA)
    byte ← vAddr_{1..0} xor (BigEndianCPU || 0)
    GPR[rt] ← zero_extend(memword_{15+8*byte..8*byte})

#### Exceptions:

TLB Refill, TLB Invalid, Address Error, Watch
LHUE Load Halfword Unsigned EVA

Format: LHUE rt, offset(base)

Purpose: Load Halfword Unsigned EVA
To load a halfword as an unsigned value from user mode virtual address space when executing in kernel mode.

Description: GPR[rt] ← memory[GPR[base] + offset]
The contents of the 16-bit halfword at the memory location specified by the aligned effective address are fetched, zero-extended, and placed in GPR rt. The 9-bit signed offset is added to the contents of GPR base to form the effective address.
The LHUE instruction functions the same as the LHU instruction, except that address translation is performed using the user mode virtual address space mapping in the TLB when accessing an address within a memory segment configured to use the MUSUK access mode. Memory segments using UUSK or MUSK access modes are also accessible. Refer to Volume III, Enhanced Virtual Addressing section for additional information.
Implementation of this instruction is specified by the Config5EVA field being set to one.

Restrictions:
Only usable when access to Coprocessor 0 is enabled and accessing an address within a segment configured using UUSK, MUSK or MUSUK access mode.
Pre-Release 6: The effective address must be naturally-aligned. If the least-significant bit of the address is non-zero, an Address Error exception occurs.
Release 6 allows hardware to provide address misalignment support in lieu of requiring natural alignment.
Note: The pseudocode is not completely adapted for Release 6 misalignment support as the handling is implementation dependent.

Operation:

```
vAddr ← sign_extend(offset) + GPR[base]
(pAddr, CCA) ← AddressTranslation (vAddr, DATA, LOAD)
pAddr ← pAddr_{PSIZE-1..2} || (pAddr_{1..0} xor (ReverseEndian || 0))
memword ← LoadMemory (CCA, HALFWORD, pAddr, vAddr, DATA)
byte ← vAddr_{1..0} xor (BigEndianCPU || 0)
GPR[rt] ← zero_extend(memword_{15+8*byte..8*byte})
```

Exceptions:
TLB Refill, TLB Invalid, Bus Error, Address Error, Watch, Reserved Instruction, Coprocessor Unusable
**Format:**  LL rt, offset(base)

**Purpose:**  Load Linked Word

To load a word from memory for an atomic read-modify-write.

**Description:**

The LL and SC instructions provide the primitives to implement atomic read-modify-write (RMW) operations for synchronizable memory locations.

The contents of the 32-bit word at the memory location specified by the aligned effective address are fetched and written into GPR rt. The signed offset is added to the contents of GPR base to form an effective address.

This begins a RMW sequence on the current processor. There can be only one active RMW sequence per processor. When an LL is executed it starts an active RMW sequence replacing any other sequence that was active. The RMW sequence is completed by a subsequent SC instruction that either completes the RMW sequence atomically and succeeds, or does not and fails.

Executing LL on one processor does not cause an action that, by itself, causes an SC for the same block to fail on another processor.

An execution of LL does not have to be followed by execution of SC; a program is free to abandon the RMW sequence without attempting a write.
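The LL ... SC pattern is normally written as a retry loop: load the old value, compute the new one, and retry if the conditional store fails. Portable C has no LL/SC primitive, so the sketch below uses a C11 compare-and-swap loop as an analogue; compilers targeting MIPS typically implement `atomic_compare_exchange_weak` with an LL/SC sequence, but that lowering is an assumption here, not something this manual specifies. The function name is hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Atomically add delta to *p and return the previous value.
 * atomic_load plays the role of LL; the weak compare-exchange plays the role
 * of SC and, like SC, is allowed to fail spuriously. */
static uint32_t atomic_add_u32(_Atomic uint32_t *p, uint32_t delta) {
    uint32_t old = atomic_load_explicit(p, memory_order_relaxed);
    /* On failure, 'old' is refreshed with the current value; retry the RMW. */
    while (!atomic_compare_exchange_weak_explicit(p, &old, old + delta,
                                                  memory_order_seq_cst,
                                                  memory_order_relaxed)) {
    }
    return old;
}
```

One difference worth noting: a compare-and-swap succeeds whenever the value still matches, so an intervening A-to-B-to-A change goes unnoticed, whereas an SC fails whenever LLbit has been cleared by an intervening write to the block, regardless of the value written.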

**Restrictions:**

The addressed location must be synchronizable by all processors and I/O devices sharing the location; if it is not, the result is *UNPREDICTABLE*. Which storage is synchronizable is a function of both CPU and system implementations. See the documentation of the SC instruction for the formal definition.

The effective address must be naturally-aligned. If either of the 2 least-significant bits of the effective address is non-zero, an Address Error exception occurs.

Providing misaligned support for Release 6 is not a requirement for this instruction.

**Operation:**

```
vAddr ← sign_extend(offset) + GPR[base]
if vAddr[1:0] ≠ 0 then
   SignalException(AddressError)
endif
(pAddr, CCA) ← AddressTranslation (vAddr, DATA, LOAD)
memword ← LoadMemory (CCA, WORD, pAddr, vAddr, DATA)
GPR[rt] ← memword
LLbit ← 1
```
Exceptions:
TLB Refill, TLB Invalid, Address Error, Watch

Programming Notes:
Release 6 implements a 9-bit offset, whereas releases prior to Release 6 implement a 16-bit offset.
Format: \texttt{LLE rt, offset(base)}

\textbf{Purpose:} Load Linked Word EVA

To load a word from a user mode virtual address when executing in kernel mode for an atomic read-modify-write.

\textbf{Description:} \texttt{GPR[rt]} $\leftarrow$ \texttt{memory[GPR[base] + offset]}

The LLE and SCE instructions provide the primitives to implement atomic read-modify-write (RMW) operations for synchronizable memory locations using user mode virtual addresses while executing in kernel mode.

The contents of the 32-bit word at the memory location specified by the aligned effective address are fetched and written into GPR \texttt{rt}. The 9-bit signed \texttt{offset} is added to the contents of GPR \texttt{base} to form an effective address.

This begins a RMW sequence on the current processor. There can be only one active RMW sequence per processor. When an LLE is executed it starts an active RMW sequence replacing any other sequence that was active. The RMW sequence is completed by a subsequent SCE instruction that either completes the RMW sequence atomically and succeeds, or does not and fails.

Executing LLE on one processor does not cause an action that, by itself, causes an SCE for the same block to fail on another processor.

An execution of LLE does not have to be followed by execution of SCE; a program is free to abandon the RMW sequence without attempting a write.

The LLE instruction functions the same as the LL instruction, except that address translation is performed using the user mode virtual address space mapping in the TLB when accessing an address within a memory segment configured to use the MUSUK access mode. Memory segments using UUSK or MUSK access modes are also accessible. Refer to Volume III, Segmentation Control for additional information.

Implementation of this instruction is specified by the $\mathit{Config5}_{\mathit{EVA}}$ field being set to one.

\textbf{Restrictions:}

The addressed location must be synchronizable by all processors and I/O devices sharing the location; if it is not, the result is \texttt{UNPREDICTABLE}. Which storage is synchronizable is a function of both CPU and system implementations. See the documentation of the SCE instruction for the formal definition.

The effective address must be naturally-aligned. If either of the 2 least-significant bits of the effective address is non-zero, an Address Error exception occurs.

Providing misaligned support for Release 6 is not a requirement for this instruction.

\textbf{Operation:}

\begin{verbatim}
  vAddr ← sign_extend(offset) + GPR[base]
  if vAddr_{1..0} ≠ 0^2 then
    SignalException(AddressError)
  endif
  (pAddr, CCA) ← AddressTranslation (vAddr, DATA, LOAD)
  memword ← LoadMemory (CCA, WORD, pAddr, vAddr, DATA)
  GPR[rt] ← memword
  LLbit ← 1
\end{verbatim}
Exceptions:
TLB Refill, TLB Invalid, Address Error, Reserved Instruction, Watch, Coprocessor Unusable

Programming Notes:
Load Linked Extended {Word,Word EVA}

**Format:**

**LLX instruction encoding:**

<table>
<thead>
<tr>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..7</th>
<th>6</th>
<th>5..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>SPECIAL3</td>
<td>base</td>
<td>rt</td>
<td>offset</td>
<td>1</td>
<td>LL<br>110110</td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>9</td>
<td>1</td>
<td>6</td>
</tr>
</tbody>
</table>

**LLXE instruction encoding:**

<table>
<thead>
<tr>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..7</th>
<th>6</th>
<th>5..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>SPECIAL3</td>
<td>base</td>
<td>rt</td>
<td>offset</td>
<td>1</td>
<td>LLE<br>101110</td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>9</td>
<td>1</td>
<td>6</td>
</tr>
</tbody>
</table>

**Purpose:** Load Linked Extended {Word,Word EVA}

To load from memory as an extension of the Load Linked instruction that follows; word or word EVA.

**Description:**

The LLX/SCX family of instructions (LLX, LLXE, SCX, SCXE) extends the MIPS LL/SC mechanism for performing atomic read-modify-writes to permit more than one memory location to be accessed atomically. The memory locations are constrained to be aligned, adjacent and within both the same synchronization block and the same cache line (if applicable).

LL-SC and LLE-SCE allow 32-bit aligned atomic memory operations to be performed on MIPS32. LLX/LL-SCX/SC and LLXE/LLE-SCXE/SCE allow 64-bit aligned atomic memory operations to be performed on MIPS32.

LL-SC code sequences in general, and LLX/LL-SCX/SC in particular, provide atomicity if the computer system can guarantee that, if the SC succeeds, then atomicity has not been violated by operations between the LL and SC. It should also guarantee eventual success, i.e. that failures will not persist forever.

An LLX family instruction (LLX/LLXE) (at PC) must be followed by a matching LL family instruction (LL/LLE) (at PC+4), forming an LLX/LL instruction family pair (LLX/LL, LLXE/LLE). See Restrictions section for a full description of match requirements, and special case for SDBBP and BREAK breakpoint instructions.

The signed offset is added to the contents of GPR base to form an effective address. This address must be naturally aligned.

The memory bytes accessed by the LLX family instruction and the following, matching LL family instruction must be adjacent, non-overlapping, and aligned. The following, matching LL family instruction must be aligned to double the access width, i.e. in an LLX/LL pair, the LL data address must be aligned to an 8-byte boundary, and the LLX data address must be 4 bytes higher; similarly, in an LLXE/LLE pair, the LLE data address must be aligned to an 8-byte boundary, and the LLXE data address must be 4 bytes higher.

For LLX and LLXE: the 32-bit word at the memory location specified by the effective address is fetched, and written into GPR rt.

If the LLX family instruction is followed by a matching LL family instruction, behavior is as if a double width load access suitable for starting an atomic sequence is performed\(^1\). Memory data corresponding to the low byte addresses returned is written to GPR rt of the LL family instruction; the part corresponding to high byte addresses is written to GPR rt of the LLX instruction.
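To make the register assignment concrete, the C sketch below splits an 8-byte-aligned doubleword, given as bytes in increasing address order, between the two destination registers: the word at the low byte addresses goes to the LL family rt, the word at the high byte addresses to the LLX family rt. It models only the data routing, not atomicity or exceptions; the function names and the byte-array representation of memory are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assemble a 32-bit word from 4 bytes given in increasing address order. */
static uint32_t word_from_bytes(const uint8_t b[4], bool big_endian) {
    if (big_endian)
        return (uint32_t)b[0] << 24 | (uint32_t)b[1] << 16 | (uint32_t)b[2] << 8 | b[3];
    return (uint32_t)b[3] << 24 | (uint32_t)b[2] << 16 | (uint32_t)b[1] << 8 | b[0];
}

/* mem[0] is the lowest byte address of the 8-byte-aligned doubleword. */
static void llx_ll_split(const uint8_t mem[8], bool big_endian,
                         uint32_t *ll_rt, uint32_t *llx_rt) {
    *ll_rt  = word_from_bytes(&mem[0], big_endian); /* low addresses -> LL rt   */
    *llx_rt = word_from_bytes(&mem[4], big_endian); /* high addresses -> LLX rt */
}
```

Because the split is by byte address rather than by numeric significance, the LL register receives the less significant half of the doubleword on a little-endian CPU and the more significant half on a big-endian CPU.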

---

1. It is implementation dependent whether a single double width access, or two separate normal width accesses, are performed.

An LLX/LL family instruction pair (LLX/LL, LLXE/LLE) begins a RMW sequence on the current processor. There can be only one active RMW sequence per processor. Any subsequent LL family instruction or LLX/LL family instruction pair, when executed, starts an active RMW sequence replacing any other sequence that was active. The RMW sequence for an LLX/LL family instruction pair is completed by a subsequent SCX/SC family instruction pair, which should match the LLX/LL pair in type and size, and which either completes the RMW sequence atomically and succeeds, or does not and fails.

If the PC and PC+4 instruction encodings do not match, a Reserved Instruction exception is signaled. If the effective addresses of the LLX/LL or LLXE/LLE family instruction pair are not 32-bit word aligned separately and 64-bit doubleword aligned together, then Address Error is signaled. If the effective address of the following LL family instruction (at PC+4) is not the lowest byte address, then an Address Error exception is signaled. See Restrictions section for a full description of match requirements, and special case for SDBBP and BREAK breakpoint instructions.

If an exception occurs between the LLX family instruction at PC and the instruction at PC+4 (LL family, SDBBP or BREAK, or a non-matching instruction which will signal a Reserved Instruction exception), the exception is reported with EPC=PC and Cause.BD=1. In this case the LLX family instruction will have partially executed: exceptions relating solely to the LLX family instruction in isolation, including Address Error and TLB exceptions, will already have been reported, but the actual memory reference will not yet have been performed, since it can only be performed atomically in conjunction with the following LL family instruction. The target register of the LLX family instruction will NOT have been updated. However, LLbit will be clear on entry to the exception handler, even if LLbit was set before the LLX family instruction started.\(^2\)

Executing an LLX/LL family instruction pair on one processor does not cause an action that, by itself, causes an SC or SCX/SC pair for the same block to fail on another processor.

An execution of an LLX/LL family instruction pair does not have to be followed by execution of a matching SCX/SC instruction pair; a program is free to abandon the RMW sequence without attempting a write.

Restrictions:

The following restrictions apply to load-linked and store-conditional extended instructions in the LLX/SCX instruction family:

Coprocessor 0’s Cause register bit BD is extended to indicate exceptions related to the next instruction after the LLX/SCX-family instruction. Pseudocode indicates what value Cause.BD should be set to via comments such as SignalException(AddressError) /*BD=1*/. Similarly, the BadInstrP register is extended to hold the LLX/SCX-family instruction if an exception is signaled for the next instruction, with BD=1.

An LLX/SCX family instruction must not be placed in a branch delay slot or compact branch forbidden slot: if this rule is violated, a Reserved Instruction exception will be signaled (with EPC=PC of branch, BD=1).

An LLX/SCX family instruction must be followed by a matching LL/SC-family instruction: An SCX instruction must be followed by an SC instruction of the same type. Similarly for LLX/LL, LLXE/LLE, and SCXE/SCE. If the following instruction does not match, a Reserved Instruction exception must be signaled (with EPC=PC of the LLX/SCX family instruction, BD=1).

Except: An LLX/SCX instruction may be followed by one of the breakpoint instructions BREAK or SDBBP, in which case the appropriate breakpoint exception takes priority over the Reserved Instruction exception. The BREAK exception will be signaled with EPC=PC of the LLX/SCX family instruction and BD=1. The debug exception caused by such an SDBBP will be reported with DEPC=PC of the LLX/SCX family instruction and DBD=1.

The base field must be the same in an LLX/SCX family instruction and the following, matching, LL/SC-family instruction: If the following instruction does not match, a Reserved Instruction exception must be signaled (with EPC=PC of the LLX/SCX family instruction, BD=1).

2. E.g. LLX rt, mem; Trap... SC => LLX’s rt is not updated, but the SC is required to fail unless the trap handler has successfully completed the LLX/LL family instruction pair.
The base and rt fields of the LLX family instruction must not be the same. If they are the same a Reserved Instruction exception must be signaled (with EPC=PC of the LLX/SCX family instruction, BD=0).

The LLX/SCX and following LL/SC family instructions must match in their offset field: given matching instruction type and base, the difference between the offset fields of the instruction at PC and the instruction at PC+4 should be the data size, 4 for LLX/LLXE/SCX/SCXE. Programmers should follow this rule in coding. However, implementations do not need to check this rule explicitly, since it is implied by other rules.

Natural Alignment: The effective address must be naturally aligned for any LLX/SCX family instruction; if not naturally aligned, an Address Error exception is signaled. i.e. for LLX, LLXE, SCX and SCXE, if the two least significant bits of the effective address are not both zero, an Address Error exception is signaled. Such an Address Error exception is signaled with EPC=PC of the LLX/SCX family instruction, BD=0.

Release 6 requires systems to provide support for misaligned memory accesses for all ordinary memory reference instructions such as LW (Load Word). However, this instruction is a special memory reference instruction for which misaligned support is NOT provided, and for which signalling an exception (AddressError) on a misaligned access is required.

Double Width Alignment: In addition to natural alignment, the memory bytes accessed by the LLX/SCX family instruction and the following LL/SC family instruction must be adjacent, non-overlapping, and must have the alignment natural for double the memory access size: the lowest byte address in an LLX/LL, LLXE/LLE, SCX/SC or SCXE/SCE pair must be 8-byte aligned. The LL/SC family instruction byte address is required to be lower than that of the LLX/SCX family instruction, i.e. the LL/SC family instruction in an LLX/LL or SCX/SC family instruction pair must be naturally aligned for double the memory access width.

The double width alignment condition must be satisfied for both virtual and physical addresses. If this condition is not met, then an Address Error exception is signaled, with EPC = PC of first instruction, and BD=1. This condition is guaranteed to be met in the physical address if met in the virtual address and if the SCX and SC translations are consistent.
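The pairing rules above can be summarized as a check on an LLX family instruction and its successor. The C sketch below is a simplified model, not the architected pseudocode: the instruction representation and names are hypothetical, and EVA checks, TLB exceptions, and the BREAK/SDBBP special case are omitted. It follows the Exception Priority rule that all BD=0 checks on the LLX family instruction alone precede the BD=1 checks on the pair.

```c
#include <stdint.h>

typedef enum { OP_LLX, OP_LLXE, OP_LL, OP_LLE, OP_OTHER } opcode_t;   /* hypothetical */
typedef struct { opcode_t op; unsigned base, rt; int32_t offset; } insn_t;
typedef enum { PAIR_OK, RESERVED_INSTRUCTION, ADDRESS_ERROR } pair_result_t;
typedef struct { pair_result_t result; int bd; } pair_check_t;

/* LLX must be followed by LL, LLXE by LLE. */
static opcode_t matching_ll(opcode_t llx) { return llx == OP_LLX ? OP_LL : OP_LLE; }

static pair_check_t check_llx_pair(insn_t llx, insn_t next, const uint32_t gpr[32]) {
    /* Instruction time I: the LLX family instruction alone, BD=0. */
    if (llx.base == llx.rt) return (pair_check_t){RESERVED_INSTRUCTION, 0};
    uint32_t llx_va = gpr[llx.base] + (uint32_t)llx.offset;
    if (llx_va & 3u) return (pair_check_t){ADDRESS_ERROR, 0};

    /* Instruction time I+1: the pair, BD=1. */
    if (next.op != matching_ll(llx.op)) return (pair_check_t){RESERVED_INSTRUCTION, 1};
    if (next.base != llx.base) return (pair_check_t){RESERVED_INSTRUCTION, 1};
    uint32_t ll_va = gpr[next.base] + (uint32_t)next.offset;
    /* Double width alignment: LL at the 8-byte-aligned low word, LLX 4 bytes higher. */
    if ((ll_va & 7u) != 0 || llx_va != ll_va + 4u)
        return (pair_check_t){ADDRESS_ERROR, 1};
    return (pair_check_t){PAIR_OK, 0};
}
```

For instance, with GPR r10 = 0x1000, the pair LLX r4, 4(r10) / LL r8, 0(r10) passes, while swapping the offsets places the LL word at 0x1004 and yields an Address Error with BD=1.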

Exception Priority: although LLX and LL may complete execution together, all exceptions for an LLX instruction (at PC) must be signaled, with EPC=PC and BD=0, before any exceptions are signaled, with EPC=PC and BD=1, for the next instruction (at PC+4) or for any exceptions caused by the interaction between the LLX instruction and the next instruction. This is as if the LLX instruction is executed far enough to signal all its exceptions, followed by exception checks for the combination of LLX and the next instruction. Similarly for SCX/SC, LLXE/LLE, and SCXE/SCE instruction pairs.

Exceptions relating to an LLX/SCX family instruction are reported with EPC=PC of the LLX/SCX family instruction, and BD=0.

Exceptions relating to interaction between an LLX/SCX family instruction and the following instruction are reported with EPC=PC of LLX/SCX instruction and BD=1.

Debug single step exceptions are reported with DEPC=PC of the LLX/SCX family instruction, and BD=0. No debug single step exception will be reported for the SC instruction of an SCX/SC pair: for the purposes of debug single stepping, the SCX/SC pair is atomic. Similarly for LLX/LL, LLXE/LLE, and SCXE/SCE pairs of instructions.

Exceptions that occur within an SCX/SC family instruction pair cancel the SCX but do not clear LLbit: if an exception or interrupt occurs at or after the SCX family instruction and before or at the next instruction, the SCX is canceled, but LLbit is not cleared, i.e. the LLX/LL-SCX/SC atomic sequence is not necessarily forced to fail. Such exceptions are reported with EPC=PC of the SCX, and BD=0 or 1 as appropriate. Exception handling software should return (ERET or ERETNC) to the PC of the SCX instruction, re-executing the SCX/SC pair. Adjusting EPC or DEPC and returning to the SC instruction without re-executing the SCX instruction will result in incorrect behavior.

For exceptions related to an LLX/LL family instruction pair:

- No memory access is performed.
- Neither target register of the LLX/LL family instruction pair is updated.
- *LLbit* is not set.
- *EPC* (or *DEPC*) is set to the PC of the LLX family instruction.
- Cause.BD is set to 0 or 1 as appropriate, as described above.

Exception handling software should return (ERET or ERETNC) to the PC of the LLX instruction, re-executing the LLX/LL pair. Adjusting EPC or DEPC and returning to the LL instruction without re-executing the LLX instruction will result in incorrect behavior.

LLX/LL and SCX/SC matching: the LL-family instruction, the SC-family instruction, and the optional LLX/SCX-family instructions in a MIPS atomic sequence \textit{should} match. Portable software should not rely on mismatching LLX/LL/SCX/SC to complete successfully, nor to fail. Implementations are permitted to cause the SC to fail if the LL/SCX/SC do not match, but are not required to do so. Matching LLX/LL/SCX/SC should be of the same instruction type (word (LLX/LL/SCX/SC), or word EVA (LLXE/LLE/SCXE/SCE)). Table 3.10 summarizes these rules for LL/SC family instructions.

Table 3.10 Recommended and non-recommended LL/SC family instructions to start and end atomic code sequences

<table>
<thead>
<tr>
<th>End of sequence (rows) / Start of sequence (columns)</th>
<th>LL</th>
<th>LLD</th>
<th>LLE</th>
<th>LLX/LL</th>
<th>LLDX/LLD</th>
<th>LLXE/LLE</th>
</tr>
</thead>
<tbody>
<tr>
<td>SC</td>
<td>OK²</td>
<td>BAD</td>
<td>BAD</td>
<td>BAD</td>
<td>BAD</td>
<td>BAD</td>
</tr>
<tr>
<td>SCD</td>
<td>BAD³</td>
<td>OK</td>
<td>BAD</td>
<td>BAD</td>
<td>BAD</td>
<td>BAD</td>
</tr>
<tr>
<td>SCE</td>
<td>BAD</td>
<td>BAD</td>
<td>OK</td>
<td>BAD</td>
<td>BAD</td>
<td>BAD</td>
</tr>
<tr>
<td>SCX/SC</td>
<td>BAD</td>
<td>BAD</td>
<td>BAD</td>
<td>OK</td>
<td>BAD</td>
<td>BAD</td>
</tr>
<tr>
<td>SCDX/SCD¹</td>
<td>BAD</td>
<td>BAD</td>
<td>BAD</td>
<td>BAD</td>
<td>OK</td>
<td>BAD</td>
</tr>
<tr>
<td>SCXE/SCE</td>
<td>BAD</td>
<td>BAD</td>
<td>BAD</td>
<td>BAD</td>
<td>BAD</td>
<td>OK</td>
</tr>
</tbody>
</table>

1. SCDX/SCD and LLDX/LLD are 64-bit operations.
2. Cells marked OK indicate recommended combinations of instructions to start and end LL/SC atomic code sequences.
3. Cells marked BAD indicate non-recommended combinations of instructions to start and end LL/SC atomic code sequences. Software should not be coded in this way. Implementations are not required to enforce this restriction; software coded this way may succeed on some implementations and fail on others, i.e. success or failure of the SC family instruction is UNPREDICTABLE.
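Table 3.10 is diagonal: each start of sequence has exactly one recommended end. The C sketch below encodes that diagonal as two parallel mnemonic lists; the string representation and function names are hypothetical conveniences, and a `false` result covers both BAD combinations and unrecognized mnemonics.

```c
#include <stdbool.h>
#include <string.h>

#define N_KINDS 6

/* Index i of starts[] pairs with index i of ends[]: the OK cells of Table 3.10. */
static const char *const starts[N_KINDS] = {"LL", "LLD", "LLE", "LLX/LL", "LLDX/LLD", "LLXE/LLE"};
static const char *const ends[N_KINDS]   = {"SC", "SCD", "SCE", "SCX/SC", "SCDX/SCD", "SCXE/SCE"};

static int find_kind(const char *const list[N_KINDS], const char *name) {
    for (int i = 0; i < N_KINDS; i++)
        if (strcmp(list[i], name) == 0)
            return i;
    return -1;
}

/* true: recommended (OK); false: not recommended (BAD) or unknown mnemonic. */
static bool ll_sc_recommended(const char *start, const char *end) {
    int s = find_kind(starts, start);
    int e = find_kind(ends, end);
    return s >= 0 && s == e;
}
```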

The LL and SC virtual and physical addresses should match completely. However, the memory addressing mode (the base register and offset) need not match between LLX/LL and SCX/SC. All physical address bits in the LL physical address and the corresponding bits in the SC physical address should match to the alignment required for the size of the LL/SC family instructions or LLX/LL and SCX/SC family instruction pairs. This applies to atomic code sequences created via LL/SC, LLE/SCE, and their corresponding extended versions LLX/LL-SCX/SC and LLXE/LLE-SCXE/SCE.

3. Terminology: “\textit{Should}” is a recommendation. Implementations are encouraged to provide \textit{should} behavior, but are not required to do so. Portable software should not rely on such behavior, but is encouraged to follow \textit{should} rules. “\textit{Must}” behaviors are requirements: implementations are required to implement such behavior, and software that violates such requirements will fail, typically with an exception such as a Reserved Instruction exception or Address Error.

Translation Consistency: It is required that LL and SC match addresses, and that LLX/SCX family instructions lie in the same synchronization block. Even if all virtual addresses match, on a processor with hardware page table walking it is possible for physical address translation to change between LL and SC, and between the execution phase of LLX, LL, SCX and SC family instructions. e.g., between the time that SCX is first executed, and the time that the SCX store data is committed along with SC. The SCX/SC must only succeed if the SCX and SC physical addresses are consistent. If the address translations are inconsistent, implementations are required to fail the SCX/SC pair, or to retry them in a manner transparent to software. Similarly for LLX/LL pairs. Similarly for other information obtained from translation, such as the CCA (Cacheability and Coherence Attribute).

It is required that LLX/LL or SCX/SC instruction pairs act as if only a single address translation is done for the first instruction in the pair, and that translation is used for the second instruction, changing only lower address bits 3:0. Similarly for LLXE/LLE and SCXE/SCE instruction pairs.
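In other words, the second instruction of a pair never performs its own translation: it reuses the physical page and upper bits from the first instruction's translation and substitutes only bits 3:0 from its own virtual address. A minimal C sketch of that rule, with a hypothetical function name and 64-bit physical addresses assumed:

```c
#include <stdint.h>

/* Physical address for the second instruction of an LLX/LL or SCX/SC pair:
 * keep the first instruction's translated upper bits, replace bits 3..0. */
static uint64_t pair_second_pa(uint64_t first_pa, uint64_t second_va) {
    return (first_pa & ~UINT64_C(0xF)) | (second_va & UINT64_C(0xF));
}
```

For an LLX at virtual address 0x80001004 translated to physical 0x1F004, the following LL at 0x80001000 uses physical address 0x1F000, even if the page tables were changed between the two instructions.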

Synchronizable memory type (CCA): The addressed location must be synchronizable by all processors and I/O devices sharing the location; if it is not, the result is UNPREDICTABLE. Which storage is synchronizable is a function of both CPU and system implementations. See the documentation of the SC instruction for the formal definition.

LLX/LL need not be writable: The addressed location need not be writable for LL or LLX family instructions. If it is not writable, a subsequent SC or SCX family instruction will fault, but LL or LLX family instructions may be used in situations that do not generate such faults, e.g., with the PAUSE instruction.

LLX/LL and PAUSE: If an LLX/LL family instruction pair is followed by a PAUSE instruction, the PAUSE instruction must terminate if it cannot be guaranteed that none of the memory bytes accessed by the LLX/LL instruction pair have been modified.

Memory Ordering of LL/SC family instructions (including LLX/SCX family instructions):

- An SCX/SC family instruction pair is executed atomically as seen by the processor executing these instructions and by other processors. I.e. the SC will not be seen to be executed before the SCX, and no other instruction, processor or device, can observe the SCX store without also being able to observe the SC store, or vice versa.

- LLX/LL family instruction pairs are not required to perform a double width atomic read of memory, but violations of atomicity will be detected, clearing LLbit, so that the matching SC will fail.

- Atomicity of LLX/LL family instruction pairs may be provided by MIPS CPU implementations as and if required by certain system configurations for uncached memory.

4. Note that the implementation dependent LLAddr register (Load Linked Address, CP0 Register 17, Select 0) does not hold physical address bits 0 to 4 as of Release 5. The requirement that all LL and SC address bits match therefore involves comparing LL address bits not stored in any software-accessible register state.

5. For example, an implementation of LLX/LL in cached memory may have LLX set LLAddr and then perform the LLX word load, and then may execute LL separately. A separate processor may perform an atomic doubleword write that changes both the LLX and LL memory locations, such that the values returned by LLX and LL may not have both been simultaneously present in memory. However, if atomicity is violated in this way, then LLbit must be cleared. The LL instruction of an LLX/LL instruction pair will not set LLbit if it has been cleared after the LLX instruction. Overall, LLX/LL family instruction pairs are not required to be atomic; whereas SCX/SC family instruction pairs are required to be atomic, if performed.

However, certain system configurations, for uncached memory in particular, require that the LLX/LL family instruction pair be performed atomically via a single bus transaction.

6. MIPS recommends that implementations perform a double width atomic read memory access for LLX/LL family instruction pairs, for cached as well as uncached memory, but does not require this. Portable software should not assume that an LLX/LL family instruction pair is atomic without using a matching SCX/SC family instruction pair to detect possible violations of atomicity.
• All LL/SC family instructions, including LLX/LL and SCX/SC family instruction pairs, are ordered by their implicit dependency on LLbit: e.g., a later LL will not be executed before an earlier SC from the same processor, even if their data memory addresses do not overlap.

• In the MIPS memory consistency architecture, LL/SC family instructions (including LLX/SCX family instructions) are not ordered with respect to other memory accesses from the same processor, except when their addresses overlap, or explicit SYNC instructions lie between them. For example, a later LL can be executed before an earlier SW, or vice versa.\(^7\)
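The role LLbit plays in the points above (a conflicting write clears it, and the matching SC family instruction then fails) can be illustrated with a small C model. This is a hypothetical simplification: real implementations choose their own synchronization block size and detection mechanism, and the 64-byte block used in the usage example is only an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-processor state: one active RMW sequence at a time. */
typedef struct {
    bool llbit;
    uint64_t block;   /* synchronization block of the active sequence */
} rmw_state_t;

/* block_size is an assumed power-of-two synchronization block size. */
static uint64_t block_of(uint64_t pa, uint64_t block_size) {
    return pa & ~(block_size - 1);
}

/* LL, or an LLX/LL pair: start a sequence, replacing any previous one. */
static void model_ll(rmw_state_t *s, uint64_t pa, uint64_t block_size) {
    s->llbit = true;
    s->block = block_of(pa, block_size);
}

/* A store by another processor or device to the same block clears LLbit,
 * which is how a non-atomic LLX/LL read is later detected. */
static void model_remote_store(rmw_state_t *s, uint64_t pa, uint64_t block_size) {
    if (s->llbit && block_of(pa, block_size) == s->block)
        s->llbit = false;
}

/* SC, or an SCX/SC pair: the store is performed only if LLbit is still set.
 * Whether SC itself clears LLbit afterwards is not modelled here. */
static bool model_sc(const rmw_state_t *s) {
    return s->llbit;
}
```

With a 64-byte block, a remote store to 0x1038 after an LL from 0x1000 causes the SC to fail, while a store to 0x1040 lies in the next block and does not.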

An LLX family instruction should not overwrite its own base register: code sequences such as that below

    LLX r10, 4(r10)
    LL  r8, 0(r10)

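where the \(rt\) and \(base\) fields of an LLX family instruction specify the same GPR, are discouraged; as described under Restrictions, such an LLX family instruction signals a Reserved Instruction exception.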

LLX/LL family instruction pair writing the same target GPR \(rt\): in code sequences such as that below

    LLX r4, 4(r10)
    LL  r4, 0(r10)

where the \(rt\) fields are the same for both members of an LLX/LL family instruction pair, the value loaded and written by the last instruction, the LL family member, will be the value written. The value loaded and supposedly written into the register by the first instruction, the LLX family member, is not directly observable: if an exception prevents the LL from executing, the LLX target register is not written.

Availability and Compatibility:

The LLX/SCX instruction family is introduced by and required as of the MIPS Release 6 and microMIPS Release 6 architecture.

LLX and SCX are introduced by and required as of MIPS32 Release 6. LLXE and SCXE are introduced by and required as of MIPS32 Release 6 when EVA is also implemented, which is indicated by bit \(EVA\) of coprocessor 0’s \(Config5\) register.

Operation:

```
/* Pseudocode for LLX and for the following instruction;
 * this replaces the pseudocode of the following instruction.
 */

/* this_instruction = LLX instruction at PC during instruction time I
 * next_instruction = instruction at PC+4 during instruction time I
 *                  = instruction at PC during instruction time I+1
 *                  = LL, or BREAK or SDBBP, else invalid
 * 'LLX' and 'LL' are generic, applicable to LLX-family and LL-family.
 */

/* All exceptions are signaled with EPC or DEPC = PC of LLX instruction.
 * All exceptions in instruction time I are signaled with BD=0.
 * All exceptions in instruction time I+1 are signaled with BD=1.
 */

I: /* LLX-only execution in instruction time I */
/* perform address calculation and translation and LLX-only checks. */

if this_instruction is LLX then
    size ← 4
else if this_instruction is LLXE then
    EVA_checks() /*BD=0*/
    size ← 4
else
    assert(IMPOSSIBLE)
endif

/* LLX family instructions must not write their base register */
if this_instruction.base = this_instruction.rt then SignalException(ReservedInstruction) /*BD=0*/ endif

this_va ← GPR[this_instruction.base] + sign_extend(this_instruction.offset)
if this_va & (size-1) ≠ 0 then SignalException(AddressError) /*BD=0*/ endif

/* The AddressTranslation of the first instruction
 * is reused for the second instruction, changing only
 * the lower address bits, to avoid translation consistency issues */
(this_pa, this_cca) ← AddressTranslation(this_va, DATA, LOAD) /*BD=0*/

I+1:
/* LLX execution time I+1 and next_instruction execution time I combined */
/* All exceptions in instruction time I+1 are signaled with BD=1. */

LLX_SCX_family_common_code(
    /*in:*/  this_instruction, this_pa, this_cca, size,
    /*out:*/ next_instruction, next_va, next_pa, next_cca
)

/* Actual execution of the double-width LLX/LL family instruction pair
 * (LLX/LL or LLXE/LLE) */
/* note that next_pa is derived from this_pa (see note 8) */
memdoubleword ← LoadMemory(next_cca, 8, next_pa, next_va, DATA)
    /* extended for special uncached bus transaction */
if BigEndianCPU then
    /* the LLX word is at the higher address: bits 31..0 */
    GPR[this_instruction.rt] ← memdoubleword31..0
    GPR[next_instruction.rt] ← memdoubleword63..32
else
    /* the LLX word is at the higher address: bits 63..32 */
    GPR[this_instruction.rt] ← memdoubleword63..32
    GPR[next_instruction.rt] ← memdoubleword31..0
endif /* endianness */

/* LLbit is set only on successful completion;
 * LLbit is cleared after all unsuccessful completions of LLX/LL pairs,
 * including when exceptions are signaled
 * (unlike all other situations, where exceptions do not affect LLbit)
 */
LLbit ← 1
```

\(^7\) Note that this also applies to ordinary load instructions lying between the LL and SC, inside the atomic read-modify-write (RMW) sequence.

8. Note that LLX_SCX_family_common_code() sets next_pa = this_pa - size = this_pa & ~(2*size-1), assuming all other constraints are met. Only a single address translation is required.
The MIPS32® Instruction Set Manual, Revision 6.04

LLX, LLXE

Load Linked Extended {Word,Word EVA}

```
/* end of combined LLX/LL pseudocode */

where /* helper pseudocode */

function EVA_checks(vaddress)
    if (Config5_EVA = 0) then SignalException(ReservedInstruction) endif
    if !IsCoprocessorEnabled(0)
        then SignalException(CoprocessorUnusable, 0) endif
    AM = SegmentAM(vaddress)
    if (AM != UUSK && AM != MUSK && AM != MUSUK)
        then SignalException(AddressError) endif
end function

function LLX_SCX_family_common_code(
    /*inputs: */ this_instruction, this_pa, this_cca, size,
    /*outputs:*/ next_instruction, next_va, next_pa, next_cca
)
    /* begin function */
    if next_instruction is BREAK or SDBBP then
        /* Execute BREAK or SDBBP in the normal I+1 manner,
         * as if in a branch delay slot or compact branch forbidden slot,
         * signaling the appropriate exception */
    endif
    /* next_instruction must be the matching non-extended LL/SC family member
     * - this pseudocode replaces the normal pseudocode for next_instruction. */
    if (this_instruction is LLX and next_instruction is not LL)
        or (this_instruction is LLXE and next_instruction is not LLE)
        or (this_instruction is SCX and next_instruction is not SC)
        or (this_instruction is SCXE and next_instruction is not SCE)
    then
        SignalException(ReservedInstruction) /*BD=1*/
    endif
    /* next instruction is non-extended LL/SC family: consistency checks */
    /* Check base register field for consistency */
    if this_instruction.base != next_instruction.base
        then SignalException(ReservedInstruction) /*BD=1*/ endif

    /* Address computation for LL/SC-family next_instruction */
    next_va ← GPR[next_instruction.base] + sign_extend(next_instruction.offset)

    /* The LL/SC virtual address following LLX/SCX must be double-width aligned */
    if next_va & (size*2-1) ≠ 0
        then SignalException(AddressError) /*BD=1*/ endif

    /* LLX/SCX and LL/SC virtual addresses must be adjacent
     * (adjacent, nonoverlapping, doubleword aligned) */
    if (this_va & (2*size-1)) - (next_va & (2*size-1)) ≠ size
        then SignalException(AddressError) /*BD=1*/ endif
    /* assert( this_va - next_va = size ) */

    /* Check offsets for consistency */
    /* assert( this_instruction.offset - next_instruction.offset = size ) */
    /* offset check not needed - other constraints ensure it */

    /* LL/SC virtual to physical address translation */
    /* Reuse the translation of the first instruction to ensure consistency. */
    /* Note: after all RI and AE exceptions, for standard exception priority. */
    next_pa ← this_pa & ~(2*size-1)
        /* given alignment constraints,
         * next_pa = this_pa - size = this_pa & ~(2*size-1) */
    next_cca ← this_cca

end function /* LLX_SCX_family_common_code */
```

Copyright © 2015 Imagination Technologies LTD. and/or its Affiliated Group Companies. All rights reserved.

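As an illustration only, the pair's address constraints and the endian-dependent split of the loaded doubleword can be modelled in C. This is a hypothetical sketch: the function names are invented here, `size` is fixed at 4 as for LLX/LLXE, the adjacency test uses the condition stated by the `assert` in the helper pseudocode (this_va - next_va = size), and address translation, exceptions, and LLbit are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the LLX/LL pair rules; size is 4 for LLX/LLXE. */
enum { PAIR_SIZE = 4 };

/* True when the LL address (next_va) is doubleword aligned and the
 * LLX address (this_va) is the adjacent, higher word of that doubleword. */
static bool llx_pair_addresses_ok(uint32_t this_va, uint32_t next_va)
{
    if ((next_va & (2 * PAIR_SIZE - 1)) != 0)
        return false;                       /* LL not doubleword aligned */
    return this_va - next_va == PAIR_SIZE;  /* LLX word = LL word + 4 */
}

/* next_pa is derived from this_pa without a second translation. */
static uint32_t llx_next_pa(uint32_t this_pa)
{
    return this_pa & ~(uint32_t)(2 * PAIR_SIZE - 1);
}

/* Split the loaded doubleword into the LLX word (higher address) and
 * the LL word (lower address) according to CPU byte order. */
static void llx_split(uint64_t memdoubleword, bool big_endian,
                      uint32_t *llx_word, uint32_t *ll_word)
{
    assert(llx_word && ll_word);
    uint32_t hi = (uint32_t)(memdoubleword >> 32);  /* bits 63..32 */
    uint32_t lo = (uint32_t)memdoubleword;          /* bits 31..0  */
    *llx_word = big_endian ? lo : hi;
    *ll_word  = big_endian ? hi : lo;
}
```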
Exceptions:
TLB Refill, TLB Invalid, Address Error, Reserved Instruction, Watch

Programming Notes:
None

Implementation Notes:
The synchronization block of memory used for LL/SC (including LL/SC pairs extended by LLX/SCX) is typically the largest cache line in use.

Implementations of LL/SC in general, and LLX/LL-SCX/SC in particular, provide atomicity if the computer system can guarantee that, whenever the SC succeeds, atomicity has not been violated by transactions between the LL and SC. The system should also guarantee eventual success, that is, that failures do not persist forever.

Correct implementation depends on the system, both the CPU and the external memory subsystem. For example, the CPU may implement LL/SC correctly for cacheable coherent memory, but if the I/O subsystem can write to memory without being exposed to the cache coherency mechanism, LL/SC will not detect violations of atomicity caused by such non-coherent I/O accesses. Similarly, the CPU may implement uncached memory requests for LL and SC, but if the external memory subsystem performs an SC request and returns success without guaranteeing atomicity, LL/SC may not provide the expected guarantee of atomicity.

If it is not possible to guarantee such atomicity then it is recommended that implementations cause the SC to fail, returning the failure code in GPR[rt] without performing the store.

LL/SC and LLX/LL-SCX/SC code sequences should only be used for the following memory types (Cache and Coherency Attributes (CCAs)):

- *cached coherent*: if the cache protocol can guarantee that atomicity has not been violated by transactions between the LL and SC.

- *uncached*: for uncached memory that is memory-like, that is, memory without memory-mapped I/O side effects, provided that either:
  - the CPU supports bus transactions visible to external hardware, so that such external hardware can guarantee that atomicity has not been violated by transactions between the LL and SC, and can signal success or failure by replying to the uncached bus transaction triggered by the SC-family instruction; or
  - the system configuration is such that the CPU can observe all memory transactions that would violate atomicity.

- *cached noncoherent or uncached* (no side effects): on uniprocessor systems lacking cache coherence or external hardware that can make atomicity assertions, LL/SC and LLX/LL-SCX/SC code sequences can be used to detect violations of atomicity caused by interrupt handling.

- for other memory types: it may be **UNPREDICTABLE** whether the SC and possible SCX stores are performed, and whether the SC reports success or failure.
LSA

Format: LSA rd, rs, rt, sa

Purpose: Load Scaled Address

Description:

\[
GPR[rd] \leftarrow \text{sign\_extend.32} \big( (GPR[rs] \ll (sa+1)) + GPR[rt] \big)
\]

LSA adds two values derived from registers \( rs \) and \( rt \), with a scaling shift applied to \( rs \). The shift amount is formed by adding 1 to the 2-bit \( sa \) field, which is interpreted as unsigned. The scaling left shift therefore varies from 1 to 4, corresponding to multiplicative scaling values of \( \times2, \times4, \times8, \times16 \), that is, element sizes of 2, 4, 8, or 16 bytes (16, 32, 64, or 128 bits).

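A minimal C sketch of the LSA computation on a 32-bit GPR follows. The function name `lsa` is hypothetical; on a 32-bit register `sign_extend.32` is the identity, so the result is simply the 32-bit sum, wrapping modulo 2^32.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of LSA rd, rs, rt, sa on a 32-bit GPR.
 * sa is the 2-bit field; the shift actually applied is sa + 1 (1..4). */
static uint32_t lsa(uint32_t rs, uint32_t rt, unsigned sa)
{
    assert(sa <= 3);                  /* sa is a 2-bit field */
    return (rs << (sa + 1)) + rt;     /* shift first, then add; wraps mod 2^32 */
}
```

For example, `lsa(i, base, 2)` forms `base + 8*i`, the address of element `i` in an array of 8-byte elements.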
Restrictions:
None

Availability and Compatibility:

The LSA instruction is introduced by and required as of Release 6.

Operation:

\[
GPR[rd] \leftarrow \text{sign\_extend.32} \big( (GPR[rs] \ll (sa+1)) + GPR[rt] \big)
\]

Exceptions:
None
LUI Load Upper Immediate

Format: LUI rt, immediate

MIPS32, Assembly Idiom Release 6

Purpose: Load Upper Immediate
To load a constant into the upper half of a word

Description: GPR[rt] ← immediate || 0^{16}
The 16-bit immediate is shifted left by 16 bits, that is, concatenated with 16 low-order zero bits. The 32-bit result is placed into GPR rt.

Restrictions:
None.

Operation:
GPR[rt] ← immediate || 0^{16}

Exceptions:
None

Programming Notes:
In Release 6, LUI is an assembly idiom of AUI with rs=0.
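The idiom can be checked with a small 32-bit C model. This sketch assumes the Release 6 AUI semantics GPR[rt] ← sign_extend.32(GPR[rs] + (immediate << 16)), which this section does not restate; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-bit model of AUI: GPR[rs] + (immediate << 16), wrapping. */
static uint32_t aui(uint32_t rs_value, uint16_t immediate)
{
    return rs_value + ((uint32_t)immediate << 16);
}

/* LUI rt, immediate as the idiom AUI rt, $0, immediate ($0 always reads 0). */
static uint32_t lui(uint16_t immediate)
{
    return aui(0, immediate);
}
```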
**Format:** LUXC1 fd, index(base)  

**Purpose:** Load Doubleword Indexed Unaligned to Floating Point  
To load a doubleword from memory to an FPR (GPR+GPR addressing), ignoring alignment

**Description:**  
FPR[fd] ← memory[(GPR[base] + GPR[index])PSIZE-1..3]  
The contents of the 64-bit doubleword at the memory location specified by the effective address are fetched and placed into FPR fd. The contents of GPR index and GPR base are added to form the effective address. The effective address is forced to doubleword alignment: EffectiveAddress2..0 are ignored.

**Restrictions:**  
The result of this instruction is UNPREDICTABLE if the processor is executing in the FR=0 32-bit FPU register model or on a 32-bit FPU; the result is predictable only when executing on a 64-bit FPU in FR=1 mode.

**Availability and Compatibility:**  
This instruction has been removed in Release 6.

**Operation:**  

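\[
\begin{align*}
\text{vAddr} & \leftarrow (\text{GPR}[\text{base}]+\text{GPR}[\text{index}])_{31..3} \ || \ 0^3 \\
(\text{pAddr}, \text{CCA}) & \leftarrow \text{AddressTranslation}(\text{vAddr}, \text{DATA}, \text{LOAD}) \\
\text{pAddr} & \leftarrow \text{pAddr} \text{ xor } ((\text{BigEndianCPU} \text{ xor } \text{ReverseEndian}) \ || \ 0^2) \\
\text{memlsw} & \leftarrow \text{LoadMemory}(\text{CCA}, \text{WORD}, \text{pAddr}, \text{vAddr}, \text{DATA}) \\
\text{pAddr} & \leftarrow \text{pAddr} \text{ xor } 0b100 \\
\text{memmsw} & \leftarrow \text{LoadMemory}(\text{CCA}, \text{WORD}, \text{pAddr}, \text{vAddr}+4, \text{DATA}) \\
\text{memdoubleword} & \leftarrow \text{memmsw} \ || \ \text{memlsw} \\
& \text{StoreFPR}(fd, \text{UNINTERPRETED\_DOUBLEWORD}, \text{memdoubleword})
\end{align*}
\]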

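The addressing rule (EffectiveAddress2..0 ignored) and the byte-order dependence of the assembled doubleword can be sketched in C. The helpers below are hypothetical and model memory as a plain 8-byte array rather than going through AddressTranslation and LoadMemory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of LUXC1 addressing: bits 2..0 of
 * GPR[base] + GPR[index] are ignored, selecting the enclosing
 * aligned doubleword. */
static uint32_t luxc1_vaddr(uint32_t base, uint32_t index)
{
    return (base + index) & ~UINT32_C(7);
}

/* Assemble the doubleword value from 8 bytes of memory as a
 * big- or little-endian CPU sees it. */
static uint64_t load_doubleword(const uint8_t mem[8], bool big_endian)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) {
        int shift = big_endian ? 8 * (7 - i) : 8 * i;  /* byte 0 is MSB in big-endian */
        v |= (uint64_t)mem[i] << shift;
    }
    return v;
}
```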
**Exceptions:**  
Coprocessor Unusable, Reserved Instruction, TLB Refill, TLB Invalid, Watch
**Format:** \( \text{LW} \ rt, \ offset(\text{base}) \)

**Purpose:** Load Word

To load a word from memory as a signed value

**Description:** \( \text{GPR}[rt] \leftarrow \text{memory}[\text{GPR}[\text{base}] + \text{offset}] \)

The contents of the 32-bit word at the memory location specified by the aligned effective address are fetched, sign-extended to the GPR register length if necessary, and placed in GPR \( rt \). The 16-bit signed \( offset \) is added to the contents of GPR \( base \) to form the effective address.

**Restrictions:**

Pre-Release 6: The effective address must be naturally-aligned. If either of the 2 least-significant bits of the address is non-zero, an Address Error exception occurs.

Release 6 allows hardware to provide address misalignment support in lieu of requiring natural alignment.

Note: The pseudocode is not completely adapted for Release 6 misalignment support as the handling is implementation dependent.
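The effective-address formation and the pre-Release 6 alignment rule can be sketched as follows. The helper names are hypothetical; the sign extension uses the portable `(x ^ 0x8000) - 0x8000` idiom on the 16-bit offset field, and the sum wraps modulo 2^32 as a 32-bit address would.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of LW effective-address formation: the 16-bit
 * offset field is sign-extended and added to GPR[base]. */
static uint32_t lw_effective_address(uint32_t base, uint16_t offset_field)
{
    uint32_t offset = ((uint32_t)offset_field ^ 0x8000u) - 0x8000u; /* sign-extend 16 bits */
    return base + offset;                                             /* wraps mod 2^32 */
}

/* Pre-Release 6 rule: Address Error if either of the 2 LSBs is non-zero. */
static bool lw_address_error_pre_r6(uint32_t vaddr)
{
    return (vaddr & 3u) != 0;
}
```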

**Operation:**

\[
\begin{align*}
\text{vAddr} & \leftarrow \text{sign\_extend}(\text{offset}) + \text{GPR}[\text{base}] \\
(\text{pAddr}, \text{CCA}) & \leftarrow \text{AddressTranslation} (\text{vAddr}, \text{DATA}, \text{LOAD}) \\
\text{memword} & \leftarrow \text{LoadMemory} (\text{CCA}, \text{WORD}, \text{pAddr}, \text{vAddr}, \text{DATA}) \\
\text{GPR}[rt] & \leftarrow \text{memword}
\end{align*}
\]

**Exceptions:**

TLB Refill, TLB Invalid, Bus Error, Address Error, Watch
**LWC1**

**Load Word to Floating Point**

**Format:**  
LWC1 ft, offset(base)

**Purpose:**  
Load Word to Floating Point
To load a word from memory to an FPR

**Description:**  
FPR[ft] ← memory[GPR[base] + offset]  
The contents of the 32-bit word at the memory location specified by the aligned effective address are fetched and placed into the low word of FPR ft. If FPRs are 64 bits wide, bits 63..32 of FPR ft become UNPREDICTABLE. The 16-bit signed offset is added to the contents of GPR base to form the effective address.

**Restrictions:**
Pre-Release 6: An Address Error exception occurs if EffectiveAddress_{1..0} ≠ 0 (not word-aligned).

Release 6 allows hardware to provide address misalignment support in lieu of requiring natural alignment.

Note: The pseudocode is not completely adapted for Release 6 misalignment support as the handling is implementation dependent.

**Operation:**

\[
\text{vAddr} \leftarrow \text{sign\_extend}(\text{offset}) + \text{GPR}[\text{base}] \\
(\text{pAddr}, \text{CCA}) \leftarrow \text{AddressTranslation}(\text{vAddr}, \text{DATA}, \text{LOAD}) \\
\text{memword} \leftarrow \text{LoadMemory}(\text{CCA}, \text{WORD}, \text{pAddr}, \text{vAddr}, \text{DATA}) \\
\text{StoreFPR}(ft, \text{UNINTERPRETED\_WORD}, \text{memword})
\]

**Exceptions:**

TLB Refill, TLB Invalid, Address Error, Reserved Instruction, Coprocessor Unusable, Watch
LWC2  Load Word to Coprocessor 2

**Format:**  LWC2  rt, offset(base)  

**Purpose:**  Load Word to Coprocessor 2  
To load a word from memory to a COP2 register.

**Description:**  
CPR[2,rt,0] ← memory[GPR[base] + offset]  
The contents of the 32-bit word at the memory location specified by the aligned effective address are fetched and placed into the low word of COP2 (Coprocessor 2) general register rt. The signed offset is added to the contents of GPR base to form the effective address.

**Restrictions:**  
Pre-Release 6: An Address Error exception occurs if EffectiveAddress1..0 ≠ 0 (not word-aligned).
Release 6 allows hardware to provide address misalignment support in lieu of requiring natural alignment.

**Note:**  The pseudocode is not completely adapted for Release 6 misalignment support as the handling is implementation dependent.

**Availability and Compatibility**  
This instruction has been recoded for Release 6.

**Operation:**  
```
vAddr ← sign_extend(offset) + GPR[base]
(pAddr, CCA) ← AddressTranslation (vAddr, DATA, LOAD)
memword ← LoadMemory(CCA, WORD, pAddr, vAddr, DATA)
CPR[2,rt,0] ← memword
```

**Exceptions:**  
TLB Refill, TLB Invalid, Address Error, Reserved Instruction, Coprocessor Unusable, Watch

**Programming Notes:**  
Release 6 implements an 11-bit offset, whereas all release levels lower than Release 6 implement a 16-bit offset.
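The practical effect of the narrower offset is a smaller reach around GPR base: an 11-bit signed offset spans -1024 to 1023 bytes, while a 16-bit one spans -32768 to 32767. The hypothetical helper below sign-extends a field of any width from 1 to 31 bits and can be used to check such ranges.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: sign-extend the low `bits` bits of `field`.
 * Every intermediate value fits in int32_t, so no overflow occurs. */
static int32_t sign_extend_field(uint32_t field, unsigned bits)
{
    assert(bits >= 1 && bits < 32);
    uint32_t mask = (UINT32_C(1) << bits) - 1;
    uint32_t sign = UINT32_C(1) << (bits - 1);
    uint32_t v = field & mask;
    return (int32_t)(v ^ sign) - (int32_t)sign;
}
```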
LWE Load Word EVA

**Format:**  LWE rt, offset(base)

**Purpose:** Load Word EVA

To load a word from user mode virtual address space when executing in kernel mode.

**Description:**

\[
\text{GPR}[rt] \leftarrow \text{memory}[\text{GPR}[base] + \text{offset}]
\]

The contents of the 32-bit word at the memory location specified by the aligned effective address are fetched, sign-extended to the GPR register length if necessary, and placed in GPR \(rt\). The 9-bit signed \(offset\) is added to the contents of GPR \(base\) to form the effective address.

The LWE instruction functions the same as the LW instruction, except that address translation is performed using the user mode virtual address space mapping in the TLB when accessing an address within a memory segment configured to use the MUSUK access mode. Memory segments using UUSK or MUSK access modes are also accessible. Refer to Volume III, Enhanced Virtual Addressing section for additional information.

The presence of this instruction is indicated by the \(\text{Config5}_{EVA}\) field being set to one.

**Restrictions:**

Only usable when access to Coprocessor0 is enabled and when accessing an address within a segment configured using UUSK, MUSK or MUSUK access mode.

Pre-Release 6: The effective address must be naturally-aligned. If either of the 2 least-significant bits of the address is non-zero, an Address Error exception occurs.

Release 6 allows hardware to provide address misalignment support in lieu of requiring natural alignment.

Note: The pseudocode is not completely adapted for Release 6 misalignment support as the handling is implementation dependent.

**Operation:**

\[
\begin{align*}
\text{vAddr} & \leftarrow \text{sign\_extend}(\text{offset}) + \text{GPR}[base] \\
(pAddr, CCA) & \leftarrow \text{AddressTranslation}(\text{vAddr}, \text{DATA}, \text{LOAD}) \\
\text{memword} & \leftarrow \text{LoadMemory}(CCA, \text{WORD}, pAddr, \text{vAddr}, \text{DATA}) \\
\text{GPR}[rt] & \leftarrow \text{memword}
\end{align*}
\]

**Exceptions:**

TLB Refill, TLB Invalid, Bus Error, Address Error, Watch, Reserved Instruction, Coprocessor Unusable
**Format:**  \( \text{LWL} \ rt, \ offset(\text{base}) \)  

**MIPS32, removed in Release 6**

**Purpose:** Load Word Left

To load the most-significant part of a word as a signed value from an unaligned memory address

**Description:**  
\[
\text{GPR}[rt] \leftarrow \text{GPR}[rt] \ \text{MERGE} \ \text{memory}[\text{GPR}[\text{base}] + \text{offset}]
\]

The 16-bit signed \( \text{offset} \) is added to the contents of GPR \( \text{base} \) to form an effective address (\( \text{EffAddr} \)). \( \text{EffAddr} \) is the address of the most-significant of 4 consecutive bytes forming a word (\( W \)) in memory starting at an arbitrary byte boundary.

The most-significant 1 to 4 bytes of \( W \) are in the aligned word containing the \( \text{EffAddr} \). This part of \( W \) is loaded into the most-significant (left) part of the word in GPR \( rt \). The remaining least-significant part of the word in GPR \( rt \) is unchanged.

The figure below illustrates this operation using big-endian byte ordering for 32-bit and 64-bit registers. The 4 consecutive bytes in 2..5 form an unaligned word starting at location 2. A part of \( W \), 2 bytes, is in the aligned word containing the most-significant byte at 2. First, LWL loads these 2 bytes into the left part of the destination register word and leaves the right part of the destination word unchanged. Next, the complementary LWR loads the remainder of the unaligned word.

![Figure 4.1 Unaligned Word Load Using LWL and LWR](image)

The bytes loaded from memory to the destination register depend on both the offset of the effective address within an aligned word, that is, the low 2 bits of the address (\( \text{vAddr}_{1:0} \)), and the current byte-ordering mode of the processor (big- or little-endian). The figure below shows the bytes loaded for every combination of offset and byte ordering.
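The merge can be modelled directly from the `byte` computation in the Operation pseudocode below. This is a hypothetical sketch on a 32-bit GPR: `mem` is a small byte array standing in for memory (addresses index it directly), and ReverseEndian and address translation are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: read the aligned word containing addr from a byte
 * array, interpreting it in the given CPU byte order. */
static uint32_t aligned_word(const uint8_t *mem, uint32_t addr, bool big_endian)
{
    const uint8_t *p = mem + (addr & ~3u);
    return big_endian
        ? ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) | ((uint32_t)p[2] << 8) | p[3]
        : ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) | ((uint32_t)p[1] << 8) | p[0];
}

/* LWL on a 32-bit GPR: temp <- memword7+8*byte..0 || GPR[rt]23-8*byte..0 */
static uint32_t lwl(uint32_t rt, const uint8_t *mem, uint32_t addr, bool big_endian)
{
    assert(mem);
    unsigned byte = (addr & 3u) ^ (big_endian ? 3u : 0u);
    unsigned keep = 24 - 8 * byte;                 /* low rt bits left unchanged */
    uint32_t mask = (UINT32_C(1) << keep) - 1;
    return (aligned_word(mem, addr, big_endian) << keep) | (rt & mask);
}
```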
Restrictions:
None

Availability and Compatibility:
Release 6 removes the load/store-left/right family of instructions, and requires the system to support misaligned memory accesses.

Operation:

\[
\begin{align*}
\text{vAddr} & \leftarrow \text{sign\_extend}(\text{offset}) + \text{GPR}[\text{base}] \\
(\text{pAddr}, \text{CCA}) & \leftarrow \text{AddressTranslation}(\text{vAddr}, \text{DATA}, \text{LOAD}) \\
\text{pAddr} & \leftarrow \text{pAddr}_{\text{PSIZE}-1..2} \ || \ (\text{pAddr}_{1..0} \text{ xor ReverseEndian}^2) \\
& \text{if BigEndianMem} = 0 \text{ then} \\
\quad \text{pAddr} & \leftarrow \text{pAddr}_{\text{PSIZE}-1..2} \ || \ 0^2 \\
& \text{endif} \\
\text{byte} & \leftarrow \text{vAddr}_{1..0} \text{ xor BigEndianCPU}^2 \\
\text{memword} & \leftarrow \text{LoadMemory}(\text{CCA}, \text{byte}, \text{pAddr}, \text{vAddr}, \text{DATA}) \\
\text{temp} & \leftarrow \text{memword}_{7+8*\text{byte}..0} \ || \ \text{GPR}[\text{rt}]_{23-8*\text{byte}..0} \\
\text{GPR}[\text{rt}] & \leftarrow \text{temp}
\end{align*}
\]

Exceptions:
TLB Refill, TLB Invalid, Bus Error, Address Error, Watch

Programming Notes:
The architecture provides no direct support for treating unaligned words as unsigned values, that is, zeroing bits 63..32 of the destination register when bit 31 is loaded.

Historical Information:
In the MIPS I architecture, the LWL and LWR instructions were exceptions to the load-delay scheduling restriction. An LWL or LWR instruction that was immediately followed by another LWL or LWR instruction using the same destination register would correctly merge the 1 to 4 loaded bytes with the data loaded by the previous instruction. All such restrictions were removed from the architecture in MIPS II.
LWLE  

Load Word Left EVA

Format:  
LWLE rt, offset(base)  

MIPS32, removed in Release 6

Purpose:  
Load Word Left EVA  

To load the most-significant part of a word as a signed value from an unaligned user mode virtual address while executing in kernel mode.

Description:  

The 9-bit signed offset is added to the contents of GPR base to form an effective address (EffAddr). EffAddr is the address of the most-significant of 4 consecutive bytes forming a word (W) in memory starting at an arbitrary byte boundary.

The most-significant 1 to 4 bytes of W are in the aligned word containing the EffAddr. This part of W is loaded into the most-significant (left) part of the word in GPR rt. The remaining least-significant part of the word in GPR rt is unchanged.

The figure below illustrates this operation using big-endian byte ordering for 32-bit and 64-bit registers. The 4 consecutive bytes in 2..5 form an unaligned word starting at location 2. A part of W (2 bytes) is in the aligned word containing the most-significant byte at 2.

1. LWLE loads these 2 bytes into the left part of the destination register word and leaves the right part of the destination word unchanged.

2. The complementary LWRE loads the remainder of the unaligned word.

Figure 4.3 Unaligned Word Load Using LWLE and LWRE

The bytes loaded from memory to the destination register depend on both the offset of the effective address within an aligned word, that is, the low 2 bits of the address (vAddr1..0), and the current byte-ordering mode of the processor (big- or little-endian). The figure below shows the bytes loaded for every combination of offset and byte ordering.

The LWLE instruction functions the same as the LWL instruction, except that address translation is performed using the user mode virtual address space mapping in the TLB when accessing an address within a memory segment configured to use the MUSUK access mode. Memory segments using UUSK or MUSK access modes are also accessible. Refer to Volume III, Enhanced Virtual Addressing section for additional information.

The presence of this instruction is indicated by the Config5EVA field being set to 1.
Figure 4.4 Bytes Loaded by LWLE Instruction

Restrictions:
Only usable when access to Coprocessor0 is enabled and when accessing an address within a segment configured using UUSK, MUSK or MUSUK access mode.

Availability and Compatibility:
Release 6 removes the load/store-left/right family of instructions, and requires the system to support misaligned memory accesses.

Operation:
```
vAddr ← sign_extend(offset) + GPR[base]
(pAddr, CCA) ← AddressTranslation(vAddr, DATA, LOAD)
pAddr ← pAddrPSIZE-1..2 || (pAddr1..0 xor ReverseEndian^2)
if BigEndianMem = 0 then
    pAddr ← pAddrPSIZE-1..2 || 0^2
endif
byte ← vAddr1..0 xor BigEndianCPU^2
memword ← LoadMemory(CCA, byte, pAddr, vAddr, DATA)
temp ← memword7+8*byte..0 || GPR[rt]23-8*byte..0
GPR[rt] ← temp
```

Exceptions:
TLB Refill, TLB Invalid, Bus Error, Address Error, Watch, Reserved Instruction, Coprocessor Unusable

Programming Notes:
The architecture provides no direct support for treating unaligned words as unsigned values, that is, zeroing bits 63..32 of the destination register when bit 31 is loaded.

Historical Information:
In the MIPS I architecture, the LWL and LWR instructions were exceptions to the load-delay scheduling restriction. An LWL or LWR instruction that was immediately followed by another LWL or LWR instruction using the same destination register would correctly merge the 1 to 4 loaded bytes with the data loaded by the previous instruction. All such restrictions were removed from the architecture in MIPS II.
**Format:**  LWPC rs, offset

**Purpose:**  Load Word PC-relative

To load a word from memory as a signed value, using a PC-relative address.

**Description:**  \[ \text{GPR}[rs] \leftarrow \text{memory}[ \text{PC} + \text{sign\_extend}(\text{offset} \ll 2) ] \]

The offset is shifted left by 2 bits, sign-extended, and added to the address of the LWPC instruction.

The contents of the 32-bit word at the memory location specified by the aligned effective address are fetched, sign-extended to the GPR register length if necessary, and placed in GPR rs.
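A sketch of the LWPC address computation in C. It assumes the 19-bit offset field of the Release 6 LWPC encoding, which this section does not state; the helper name is hypothetical. Because the offset is scaled by 4, a word-aligned PC always yields a word-aligned address.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of LWPC address formation (19-bit offset assumed). */
static uint32_t lwpc_address(uint32_t pc, uint32_t offset19)
{
    uint32_t v = offset19 & 0x7FFFFu;                 /* keep the 19-bit field */
    int32_t off = (int32_t)(v ^ 0x40000u) - 0x40000;  /* sign-extend 19 bits */
    return pc + ((uint32_t)off << 2);                 /* scale by 4; wraps mod 2^32 */
}
```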

**Restrictions:**

The LWPC effective address is naturally aligned by specification.

**Availability and Compatibility:**

This instruction is introduced by and required as of Release 6.

**Operation**

\[
\text{vAddr} \leftarrow \text{PC} + (\text{sign\_extend}(\text{offset}) \ll 2) \\
(\text{pAddr}, \text{CCA}) \leftarrow \text{AddressTranslation}(\text{vAddr}, \text{DATA}, \text{LOAD}) \\
\text{memword} \leftarrow \text{LoadMemory}(\text{CCA}, \text{WORD}, \text{pAddr}, \text{vAddr}, \text{DATA}) \\
\text{GPR}[rs] \leftarrow \text{memword}
\]

**Exceptions:**

TLB Refill, TLB Invalid, TLB Read Inhibit, Bus Error, Address Error, Watch

**Programming Note**

The Release 6 PC-relative loads (LWPC) are considered data references.

For the purposes of watchpoints (provided by the CP0 WatchHi and WatchLo registers) and EJTAG breakpoints, the PC-relative reference is considered to be a data reference rather than an instruction reference. That is, the watchpoint or breakpoint is triggered only if enabled for data references.
Format: \texttt{LWR \ rt, \ offset(base)}

MIPS32, removed in Release 6

Purpose: Load Word Right

To load the least-significant part of a word from an unaligned memory address as a signed value

Description: $\text{GPR}[rt] \leftarrow \text{GPR}[rt] \text{ MERGE memory}[\text{GPR}[\text{base}] + \text{offset}]$

The 16-bit signed offset is added to the contents of GPR base to form an effective address (\textit{EffAddr}). \textit{EffAddr} is the address of the least-significant of 4 consecutive bytes forming a word (\textit{W}) in memory starting at an arbitrary byte boundary.

A part of \textit{W} (the least-significant 1 to 4 bytes) is in the aligned word containing \textit{EffAddr}. This part of \textit{W} is loaded into the least-significant (right) part of the word in GPR \textit{rt}. The remaining most-significant part of the word in GPR \textit{rt} is unchanged.

Executing both LWR and LWL, in either order, delivers a sign-extended word value in the destination register.

The figure below illustrates this operation using big-endian byte ordering for 32-bit and 64-bit registers. The 4 consecutive bytes in 2..5 form an unaligned word starting at location 2. A part of \textit{W}, 2 bytes, is in the aligned word containing the least-significant byte at 5.

1. LWR loads these 2 bytes into the right part of the destination register.

2. The complementary LWL loads the remainder of the unaligned word.

\textbf{Figure 4.5 Unaligned Word Load Using LWL and LWR}

The bytes loaded from memory to the destination register depend on both the offset of the effective address within an aligned word, that is, the low 2 bits of the address ($v\text{Addr}_{1..0}$), and the current byte-ordering mode of the processor (big- or little-endian). The figure below shows the bytes loaded for every combination of offset and byte ordering.
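A companion sketch for LWR on a 32-bit GPR, assuming `mem` is a small byte array indexed directly by address and ignoring ReverseEndian and address translation; the names are hypothetical. With memory bytes i j k l and an initial register of e f g h, the big-endian results are e f g i, e f i j, e i j k, and i j k l for offsets 0 to 3.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: read the aligned word containing addr from a byte
 * array, interpreting it in the given CPU byte order. */
static uint32_t aligned_word(const uint8_t *mem, uint32_t addr, bool big_endian)
{
    const uint8_t *p = mem + (addr & ~3u);
    return big_endian
        ? ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) | ((uint32_t)p[2] << 8) | p[3]
        : ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) | ((uint32_t)p[1] << 8) | p[0];
}

/* LWR on a 32-bit GPR: temp <- GPR[rt]31..32-8*byte || memword31..8*byte */
static uint32_t lwr(uint32_t rt, const uint8_t *mem, uint32_t addr, bool big_endian)
{
    assert(mem);
    unsigned byte = (addr & 3u) ^ (big_endian ? 3u : 0u);
    unsigned drop = 8 * byte;                              /* low memword bits discarded */
    uint32_t keep = byte ? UINT32_MAX << (32 - drop) : 0;  /* high rt bits kept */
    return (rt & keep) | (aligned_word(mem, addr, big_endian) >> drop);
}
```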
Restrictions:
None

Availability and Compatibility:
Release 6 removes the load/store-left/right family of instructions, and requires the system to support misaligned memory accesses.

Operation:

\[
\begin{align*}
    \text{vAddr} & \leftarrow \text{sign\_extend}(\text{offset}) + \text{GPR}[\text{base}] \\
    (\text{pAddr}, \text{CCA}) & \leftarrow \text{AddressTranslation}(\text{vAddr}, \text{DATA}, \text{LOAD}) \\
    \text{pAddr} & \leftarrow \text{pAddr}_{\text{PSIZE}-1..2} \ || \ (\text{pAddr}_{1..0} \text{ xor ReverseEndian}^2) \\
    & \text{if BigEndianMem} = 0 \text{ then} \\
    \quad \text{pAddr} & \leftarrow \text{pAddr}_{\text{PSIZE}-1..2} \ || \ 0^2 \\
    & \text{endif} \\
    \text{byte} & \leftarrow \text{vAddr}_{1..0} \text{ xor BigEndianCPU}^2 \\
    \text{memword} & \leftarrow \text{LoadMemory}(\text{CCA}, \text{byte}, \text{pAddr}, \text{vAddr}, \text{DATA}) \\
    \text{temp} & \leftarrow \text{GPR}[\text{rt}]_{31..32-8*\text{byte}} \ || \ \text{memword}_{31..8*\text{byte}} \\
    \text{GPR}[\text{rt}] & \leftarrow \text{temp}
\end{align*}
\]

Exceptions:
TLB Refill, TLB Invalid, Bus Error, Address Error, Watch

Programming Notes:
The architecture provides no direct support for treating unaligned words as unsigned values, that is, zeroing bits 63..32 of the destination register when bit 31 is loaded.

Historical Information:
In the MIPS I architecture, the LWL and LWR instructions were exceptions to the load-delay scheduling restriction. An LWL or LWR instruction that was immediately followed by another LWL or LWR instruction using the same destination register would correctly merge the 1 to 4 loaded bytes with the data loaded by the previous instruction. All such restrictions were removed from the architecture in MIPS II.
Format: \texttt{LWRE \textit{rt}}, \texttt{offset(base)}

Purpose: Load Word Right EVA

To load the least-significant part of a word from an unaligned user mode virtual memory address as a signed value while executing in kernel mode.

Description: $\text{GPR}[rt] \leftarrow \text{GPR}[rt] \text{ MERGE memory}[\text{GPR}[\text{base}] + \text{offset}]$

The 9-bit signed \textit{offset} is added to the contents of \textit{GPR base} to form an effective address (\textit{EffAddr}). \textit{EffAddr} is the address of the least-significant of 4 consecutive bytes forming a word (\textit{W}) in memory starting at an arbitrary byte boundary.

A part of \textit{W} (the least-significant 1 to 4 bytes) is in the aligned word containing \textit{EffAddr}. This part of \textit{W} is loaded into the least-significant (right) part of the word in \textit{GPR rt}. The remaining most-significant part of the word in \textit{GPR rt} is unchanged.

Executing both LWRE and LWLE, in either order, delivers a sign-extended word value in the destination register.

The figure below illustrates this operation using big-endian byte ordering for 32-bit and 64-bit registers. The 4 consecutive bytes in 2..5 form an unaligned word starting at location 2. A part of \textit{W} (2 bytes) is in the aligned word containing the least-significant byte at 5.

1. LWRE loads these 2 bytes into the right part of the destination register.

2. The complementary LWLE loads the remainder of the unaligned word.

The LWRE instruction functions in exactly the same fashion as the LWR instruction, except that address translation is performed using the user mode virtual address space mapping in the TLB when accessing an address within a memory segment configured to use the MUSUK access mode. Memory segments using UUSK or MUSK access modes are also accessible. Refer to Volume III, Enhanced Virtual Addressing section for additional information.

The presence of this instruction is indicated by the \textit{Config5\textsubscript{EVA}} field being set to one.

\textit{Figure 4.7 Unaligned Word Load Using LWLE and LWRE}

\begin{figure}
\centering
\includegraphics[width=\textwidth]{figure4.7}
\caption{Unaligned Word Load Using LWLE and LWRE}
\end{figure}
The bytes loaded from memory to the destination register depend on both the offset of the effective address within an aligned word, that is, the low 2 bits of the address (vAddr\(_{1..0}\)), and the current byte-ordering mode of the processor (big- or little-endian). The figure below shows the bytes loaded for every combination of offset and byte ordering.

Figure 4.8 Bytes Loaded by LWRE Instruction

Memory contents: the aligned word holds bytes i j k l (most to least significant), stored at byte offsets 0 1 2 3 in big-endian mode and at byte offsets 3 2 1 0 in little-endian mode. Initial contents of the destination register (most to least significant): e f g h.

<table>
<thead>
<tr>
<th>Offset (vAddr<sub>1..0</sub>)</th>
<th>Big-endian: destination register after instruction</th>
<th>Little-endian: destination register after instruction</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>e f g i</td>
<td>i j k l</td>
</tr>
<tr>
<td>1</td>
<td>e f i j</td>
<td>e i j k</td>
</tr>
<tr>
<td>2</td>
<td>e i j k</td>
<td>e f i j</td>
</tr>
<tr>
<td>3</td>
<td>i j k l</td>
<td>e f g i</td>
</tr>
</tbody>
</table>

Wherever bytes e, f, and g appear in the results, they are the unchanged initial contents of the destination register.
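The merge rule described above can be modelled in a few lines of C. This is a sketch, not the architecture's reference definition: the function name `lwr_merge`, the byte-array view of the aligned memory word, and the `big_endian` flag standing in for the processor's byte-ordering mode are illustrative assumptions. Address translation, the EVA segment checks, and exceptions are deliberately left out; the function only shows how bytes from the aligned word fill the right part of the register while the left part is kept.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the LWR/LWRE register merge.
 *   rt         : initial destination register value
 *   mem[0..3]  : the aligned memory word, indexed by byte offset
 *   offset     : vAddr1..0, the offset of EffAddr within the aligned word
 *   big_endian : stands in for the processor's byte-ordering mode
 * Address translation, EVA segment checks and exceptions are not modelled. */
static uint32_t lwr_merge(uint32_t rt, const uint8_t mem[4],
                          unsigned offset, bool big_endian)
{
    /* Assemble the aligned word in the current byte order. */
    uint32_t memword = big_endian
        ? (uint32_t)mem[0] << 24 | (uint32_t)mem[1] << 16 |
          (uint32_t)mem[2] << 8  | (uint32_t)mem[3]
        : (uint32_t)mem[3] << 24 | (uint32_t)mem[2] << 16 |
          (uint32_t)mem[1] << 8  | (uint32_t)mem[0];

    /* byte = vAddr1..0 xor BigEndianCPU^2: the number of most-significant
     * register bytes that stay unchanged. */
    unsigned byte = (offset & 3u) ^ (big_endian ? 3u : 0u);

    /* Keep the left `byte` bytes of rt; fill the right part from memory.
     * byte == 0 is special-cased to avoid an undefined 32-bit shift. */
    uint32_t keep_mask = byte ? ~UINT32_C(0) << (32u - 8u * byte) : 0u;
    return (rt & keep_mask) | (memword >> (8u * byte));
}
```

For example, with the register holding the ASCII codes for "efgh" and big-endian memory holding "ijkl", offset 1 yields the bytes e f i j, matching the merge rule: two bytes from the aligned word land in the right part of the register.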

Restrictions:
Only usable when access to Coprocessor0 is enabled and when accessing an address within a segment configured using UUSK, MUSK or MUSUK access mode.

Availability and Compatibility:
Release 6 removes the load/store-left/right family of instructions, and requires the system to support misaligned memory accesses.

Operation:
\[
vAddr \leftarrow \text{sign\_extend}(\text{offset}) + \text{GPR}[\text{base}]
\]
\[
(pAddr, CCA) \leftarrow \text{AddressTranslation}(vAddr, \text{DATA, LOAD})
\]
\[
pAddr \leftarrow pAddr_{\text{PSIZE}-1..2} || (pAddr_{1..0} \text{ xor ReverseEndian}^2)
\]
\[
\text{if BigEndianMem} = 0 \text{ then}
\]
\[
pAddr \leftarrow pAddr_{\text{PSIZE}-1..2} || 0^2
\]
\[
\text{endif}
\]
\[
\text{byte} \leftarrow vAddr_{1..0} \text{ xor BigEndianCPU}^2
\]
\[
\text{memword} \leftarrow \text{LoadMemory}(CCA, \text{byte, pAddr, vAddr, DATA})
\]
\[
\text{temp} \leftarrow \text{GPR}[rt]_{31..32-8*\text{byte}} \,||\, \text{memword}_{31..8*\text{byte}}
\]
\[
\text{GPR}[rt] \leftarrow \text{temp}
\]

Exceptions:
TLB Refill, TLB Invalid, Bus Error, Address Error, Watch, Reserved Instruction, Coprocessor Unusable

Programming Notes:
The architecture provides no direct support for treating unaligned words as unsigned values, that is, zeroing bits 63..32 of the destination register when bit 31 is loaded.
Historical Information:
In the MIPS I architecture, the LWL and LWR instructions were exceptions to the load-delay scheduling restriction. An LWL or LWR instruction that was immediately followed by another LWL or LWR instruction using the same destination register would correctly merge the 1 to 4 loaded bytes with the data loaded by the previous instruction. All such restrictions were removed from the architecture in MIPS II.
LWXC1

**Format:** LWXC1 fd, index(base)

**Purpose:** Load Word Indexed to Floating Point

To load a word from memory to an FPR (GPR+GPR addressing).

**Description:**

\[ FPR[fd] \leftarrow \text{memory}[GPR[base] + GPR[index]] \]

The contents of the 32-bit word at the memory location specified by the aligned effective address are fetched and placed into the low word of FPR \( fd \). If FPRs are 64 bits wide, bits 63..32 of FPR \( fd \) become UNPREDICTABLE. The contents of GPR \( index \) and GPR \( base \) are added to form the effective address.

**Restrictions:**

An Address Error exception occurs if EffectiveAddress\(_{1..0}\) \( \neq 0 \) (not word-aligned).

**Availability and Compatibility:**

This instruction has been removed in Release 6.

Required in all versions of MIPS64 since MIPS64 Release 1. Not available in MIPS32 Release 1. Required in MIPS32 Release 2 and all subsequent versions of MIPS32. When required, it must be implemented whenever an FPU is present, whether a 32-bit or 64-bit FPU, and whether in 32-bit or 64-bit FP Register Mode (\( FIR_{F64}=0 \) or 1, \( Status_{FR}=0 \) or 1).

**Operation:**

\[
\begin{align*}
& \text{vAddr} \leftarrow \text{GPR}[base] + \text{GPR}[index] \\
& \text{if } \text{vAddr}_{1..0} \neq 0^2 \text{ then} \\
& \quad \text{SignalException(AddressError)} \\
& \text{endif} \\
& (\text{pAddr}, \text{CCA}) \leftarrow \text{AddressTranslation}(\text{vAddr}, \text{DATA}, \text{LOAD}) \\
& \text{memword} \leftarrow \text{LoadMemory}(\text{CCA}, \text{WORD}, \text{pAddr}, \text{vAddr}, \text{DATA}) \\
& \text{StoreFPR}(fd, \text{UNINTERPRETED\_WORD}, \text{memword})
\end{align*}
\]
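The address computation and alignment check at the start of the Operation can be sketched in C. The helper name `lwxc1_effective_address` is hypothetical, the 32-bit wrap-around of the addition is an assumption of this sketch, and the return value -1 simply stands for the Address Error exception; translation and the FPR write are not modelled.

```c
#include <stdint.h>

/* Hypothetical helper: forms the LWXC1 effective address GPR[base] + GPR[index]
 * with 32-bit unsigned wrap-around (an assumption of this sketch) and returns
 * -1 to stand for the Address Error exception when the address is not
 * word-aligned. */
static int64_t lwxc1_effective_address(uint32_t base, uint32_t index)
{
    uint32_t vaddr = base + index;   /* unsigned add wraps modulo 2^32 */
    if ((vaddr & 3u) != 0u)
        return -1;                   /* vAddr1..0 != 0: AddressError */
    return (int64_t)vaddr;
}
```

Unlike LWRE, there is no merge step: a misaligned sum is simply rejected.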

**Exceptions:**

TLB Refill, TLB Invalid, Address Error, Reserved Instruction, Coprocessor Unusable, Watch
**MADD**

Multiply and Add Word to Hi, Lo

**Format:**

\[
\text{MADD } rs, \ rt
\]

**MIPS32, removed in Release 6**

**Purpose:**

To multiply two words and add the result to Hi, Lo.

**Description:**

\[
(HI,LO) \leftarrow (HI,LO) + (GPR[rs] \times GPR[rt])
\]

The 32-bit word value in GPR \(rs\) is multiplied by the 32-bit word value in GPR \(rt\), treating both operands as signed values, to produce a 64-bit result. The product is added to the 64-bit concatenated values of \(HI\) and \(LO\). The most significant 32 bits of the result are written into \(HI\) and the least significant 32 bits are written into \(LO\). No arithmetic exception occurs under any circumstances.

**Restrictions:**

This instruction does not provide the capability of writing directly to a target GPR.

**Availability and Compatibility:**

This instruction has been removed in Release 6.

**Operation:**

\[
\begin{align*}
\text{temp} & \leftarrow (HI \,||\, LO) + (GPR[rs] \times GPR[rt]) \\
HI & \leftarrow \text{temp}_{63..32} \\
LO & \leftarrow \text{temp}_{31..0}
\end{align*}
\]
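A C sketch of the accumulate step may help. It assumes HI and LO are modelled as two `uint32_t` fields of a hypothetical `hilo_t` struct. The signed 32 × 32 product is formed exactly in 64 bits and added modulo 2^64, which is why no arithmetic exception can occur.

```c
#include <stdint.h>

/* Hypothetical model of the HI/LO accumulator pair. */
typedef struct { uint32_t hi, lo; } hilo_t;

/* MADD rs, rt: (HI,LO) <- (HI,LO) + GPR[rs] * GPR[rt], both operands signed. */
static hilo_t madd(hilo_t acc, int32_t rs, int32_t rt)
{
    uint64_t current = (uint64_t)acc.hi << 32 | acc.lo;   /* HI || LO */
    int64_t product = (int64_t)rs * (int64_t)rt;          /* exact: fits in 64 bits */
    uint64_t sum = current + (uint64_t)product;           /* wraps modulo 2^64 */
    hilo_t out = { (uint32_t)(sum >> 32), (uint32_t)sum };
    return out;
}
```

Starting from HI = 0, LO = 10, `madd` with -2 and 3 leaves HI = 0 and LO = 4; starting from zero, the negative product -6 sign-fills HI with 0xFFFFFFFF.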

**Exceptions:**

None

**Programming Notes:**

Where the size of the operands is known, software should place the shorter operand in GPR \(rt\). This may reduce the latency of the instruction on those processors which implement data-dependent instruction latencies.
MADD.fmt

Floating Point Multiply Add

<table>
<thead>
<tr>
<th>Format:</th>
<th>MADD.fmt</th>
</tr>
</thead>
<tbody>
<tr>
<td>MADD.S fd, fr, fs, ft</td>
<td>MIPS32 Release 2, removed in Release 6</td>
</tr>
<tr>
<td>MADD.D fd, fr, fs, ft</td>
<td>MIPS32 Release 2, removed in Release 6</td>
</tr>
<tr>
<td>MADD.PS fd, fr, fs, ft</td>
<td>MIPS32 Release 2, removed in Release 6</td>
</tr>
</tbody>
</table>

Purpose: Floating Point Multiply Add

To perform a combined multiply-then-add of FP values.

Description: $\text{FPR}[fd] \leftarrow (\text{FPR}[fs] \times \text{FPR}[ft]) + \text{FPR}[fr]$

The value in FPR $fs$ is multiplied by the value in FPR $ft$ to produce an intermediate product.

The intermediate product is rounded according to the current rounding mode in $FCSR$. The value in FPR $fr$ is added to the product. The result sum is calculated to infinite precision, rounded according to the current rounding mode in $FCSR$, and placed into FPR $fd$. The operands and result are values in format $fmt$. The results and flags are as if separate floating-point multiply and add instructions were executed.
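The practical difference between this unfused operation and a fused multiply-add is the rounding of the intermediate product. The C sketch below models MADD.S under two assumptions: `float` is IEEE 754 binary32, and the default round-to-nearest mode is in effect. The function name and the (fr, fs, ft) parameter order are illustrative. `volatile` forces each step to be rounded separately, so the compiler cannot contract the expression into a fused multiply-add.

```c
/* Sketch of unfused MADD.S: fd = round(round(fs * ft) + fr).
 * Assumes IEEE 754 binary32 floats and round-to-nearest. */
static float madd_s_unfused(float fr, float fs, float ft)
{
    volatile float product = fs * ft;   /* first rounding, to single precision */
    volatile float sum = product + fr;  /* second rounding */
    return sum;
}
```

With fs = 1 + 2^-13, ft = 1 - 2^-13 and fr = -1, the exact product 1 - 2^-26 rounds to 1.0 in single precision, so the unfused result is 0.0 and the small residual is lost.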

MADD.PS multiplies then adds the upper and lower halves of FPR $fr$, FPR $fs$, and FPR $ft$ independently, and ORs together any generated exceptional conditions.

The $Cause$ bits are ORed into the $Flag$ bits if no exception is taken.

Restrictions:

The fields $fr$, $fs$, $ft$, and $fd$ must specify FPRs valid for operands of type $fmt$. If the fields are not valid, the result is UNPREDICTABLE.

The operands must be values in format $fmt$; if they are not, the result is UNPREDICTABLE and the value of the operand FPRs becomes UNPREDICTABLE.

The result of MADD.PS is UNPREDICTABLE if the processor is executing in the $FR=0$ 32-bit FPU register model. It is predictable if executing on a 64-bit FPU in the $FR=1$ mode, but not with $FR=0$, and not on a 32-bit FPU.

Availability and Compatibility:

MADD.S and MADD.D: Required in all versions of MIPS64 since MIPS64 Release 1. Not available in MIPS32 Release 1. Required in MIPS32 Release 2 and all subsequent versions of MIPS32. When required, these instructions must be implemented whenever an FPU is present, whether a 32-bit or 64-bit FPU, and whether in 32-bit or 64-bit FP Register Mode ($FIR_{F64}=0$ or 1, $Status_{FR}=0$ or 1).

This instruction has been removed in Release 6 and has been replaced by the fused multiply-add instruction. Refer to the fused multiply-add instruction ‘MADDF.fmt’ in this manual for more information. Release 6 does not support Paired Single (PS).

Operation:

- $vfr \leftarrow \text{ValueFPR}(fr, fmt)$
- $vfs \leftarrow \text{ValueFPR}(fs, fmt)$
- $vft \leftarrow \text{ValueFPR}(ft, fmt)$
- StoreFPR(fd, fmt, $(vfs \times_{fmt} vft) +_{fmt} vfr$)

Exceptions:

Coprocessor Unusable, Reserved Instruction
Floating Point Exceptions:
Inexact, Unimplemented Operation, Invalid Operation, Overflow, Underflow
**Purpose:** Floating Point Fused Multiply Add, Floating Point Fused Multiply Subtract

MADDF.fmt: To perform a fused multiply-add of FP values.

MSUBF.fmt: To perform a fused multiply-subtract of FP values.

**Description:**

MADDF.fmt: \( FPR[fd] \leftarrow FPR[fd] + (FPR[fs] \times FPR[ft]) \)

MSUBF.fmt: \( FPR[fd] \leftarrow FPR[fd] - (FPR[fs] \times FPR[ft]) \)

The value in FPR \( fs \) is multiplied by the value in FPR \( ft \) to produce an intermediate product. The intermediate product is calculated to infinite precision. The product is added to the value in FPR \( fd \). The result sum is calculated to infinite precision, rounded according to the current rounding mode in FCSR, and placed into FPR \( fd \). The operands and result are values in format \( fmt \).

(For MSUBF.fmt, the product is subtracted from the value in FPR \( fd \).)
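The single rounding can be approximated portably in C. This sketch relies on the fact that the product of two binary32 values is exact in binary64, since 24 + 24 significand bits fit in 53. The final conversion rounds a double sum to float, which can differ from one correctly rounded result in rare cases (double rounding), so treat it as an approximation. C99's `fmaf` from `<math.h>` gives the exactly fused single-precision result where linking the math library is acceptable. Function names are hypothetical.

```c
/* Approximate sketch of MADDF.S / MSUBF.S (fd is both input and output).
 * The binary32 product is exact in binary64; the sum is rounded to double
 * and then to float, which can double-round in rare cases. */
static float maddf_s(float fd, float fs, float ft)
{
    double product = (double)fs * (double)ft;   /* exact: 24 + 24 <= 53 bits */
    return (float)((double)fd + product);
}

static float msubf_s(float fd, float fs, float ft)
{
    double product = (double)fs * (double)ft;   /* exact */
    return (float)((double)fd - product);
}
```

With fd = -1, fs = 1 + 2^-13 and ft = 1 - 2^-13, `maddf_s` returns -2^-26, the exact value, whereas rounding the product first would give 0.0.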

*Cause* bits are ORed into the *Flag* bits if no exception is taken.

**Restrictions:**

None

**Availability and Compatibility:**

MADDF.fmt and MSUBF.fmt are required in Release 6.

MADDF.fmt and MSUBF.fmt are not available in architectures pre-Release 6.

The fused multiply-add instructions, MADDF.fmt and MSUBF.fmt, replace pre-Release 6 instructions such as MADD.fmt, MSUB.fmt, NMADD.fmt, and NMSUB.fmt. The replaced instructions were unfused multiply-adds, with an intermediate rounding.

Release 6 MSUBF.fmt, \( fd \leftarrow fd - fs \times ft \), corresponds more closely to pre-Release 6 NMSUB.fmt, \( fd \leftarrow fr - fs \times ft \), than to pre-Release 6 MSUB.fmt, \( fd \leftarrow fs \times ft - fr \).

FPU scalar MADDF.fmt corresponds to MSA vector FMADD.df.

FPU scalar MSUBF.fmt corresponds to MSA vector FMSUB.df.

**Operation:**

```plaintext
if not IsCoprocessorEnabled(1)
    then SignalException(CoprocessorUnusable, 1) endif
if not IsFloatingPointImplemented(fmt)
    then SignalException(ReservedInstruction) endif
```
### MADDF.fmt MSUBF.fmt

Floating Point Fused Multiply Add, Floating Point Fused Multiply Subtract

\[
\begin{align*}
  vfs & \leftarrow \text{ValueFPR}(fs, \text{fmt}) \\
  vft & \leftarrow \text{ValueFPR}(ft, \text{fmt}) \\
  vfd & \leftarrow \text{ValueFPR}(fd, \text{fmt}) \\
  \text{MADDF.fmt: } vinf & \leftarrow vfd +_{\infty} (vfs \times_{\infty} vft) \\
  \text{MSUBF.fmt: } vinf & \leftarrow vfd -_{\infty} (vfs \times_{\infty} vft) \\
  & \text{StoreFPR}(fd, \text{fmt}, vinf)
\end{align*}
\]

### Special Considerations:

The fused multiply-add computation is performed in infinite precision, and signals Inexact, Overflow, or Underflow if and only if the final result differs from the infinite precision result in the appropriate manner.

As with most FPU computational instructions, if flush-subnormals-to-zero mode is enabled (FCSR.FS = 1), subnormal inputs are flushed before the fused multiply-add computation begins, and Inexact may be signaled.

That is, Inexact may be signaled by input flushing, by the fused multiply-add, or by both: the conditions are ORed.

### Exceptions:

Coprocessor Unusable, Reserved Instruction

### Floating Point Exceptions:

Inexact, Unimplemented Operation, Invalid Operation, Overflow, Underflow
### Format:
MADDU rs, rt

### Purpose:
Multiply and Add Unsigned Word to Hi, Lo

To multiply two unsigned words and add the result to Hi, Lo.

### Description:
\((HI, LO) \leftarrow (HI, LO) + (GPR[rs] \times GPR[rt])\)

The 32-bit word value in GPR rs is multiplied by the 32-bit word value in GPR rt, treating both operands as unsigned values, to produce a 64-bit result. The product is added to the 64-bit concatenated values of Hi and Lo. The most significant 32 bits of the result are written into Hi and the least significant 32 bits are written into Lo. No arithmetic exception occurs under any circumstances.

### Restrictions:

This instruction does not provide the capability of writing directly to a target GPR.

### Availability and Compatibility:
This instruction has been removed in Release 6.

### Operation:

1. \(\text{temp} \leftarrow (HI \,||\, LO) + (GPR[rs] \times GPR[rt])\)
2. \(HI \leftarrow \text{temp}_{63..32}\)
3. \(LO \leftarrow \text{temp}_{31..0}\)
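The unsigned variant differs from MADD only in how the product is formed. In this C sketch, HI and LO are again modelled by a hypothetical two-field `hilo_t` struct. Both operands are zero-extended, so the product is computed in `uint64_t`, and the 64-bit sum wraps silently.

```c
#include <stdint.h>

/* Hypothetical model of the HI/LO accumulator pair. */
typedef struct { uint32_t hi, lo; } hilo_t;

/* MADDU rs, rt: (HI,LO) <- (HI,LO) + GPR[rs] * GPR[rt], both operands unsigned. */
static hilo_t maddu(hilo_t acc, uint32_t rs, uint32_t rt)
{
    uint64_t current = (uint64_t)acc.hi << 32 | acc.lo;   /* HI || LO */
    uint64_t sum = current + (uint64_t)rs * rt;           /* wraps modulo 2^64 */
    hilo_t out = { (uint32_t)(sum >> 32), (uint32_t)sum };
    return out;
}
```

Multiplying 0xFFFFFFFF by 2 into a zero accumulator gives HI = 1 and LO = 0xFFFFFFFE. The same bit patterns passed to MADD would instead be treated as -1 × 2.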

### Exceptions:
None

### Programming Notes:
Where the size of the operands is known, software should place the shorter operand in GPR rt. This may reduce the latency of the instruction on those processors which implement data-dependent instruction latencies.
### Format:

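<table>
<thead>
<tr>
<th>Instruction</th>
<th>COP1</th>
<th>fmt</th>
<th>ft</th>
<th>fs</th>
<th>fd</th>
<th>function</th>
</tr>
</thead>
<tbody>
<tr>
<td>MAX.fmt</td>
<td>010001</td>
<td>fmt</td>
<td>ft</td>
<td>fs</td>
<td>fd</td>
<td>011110</td>
</tr>
<tr>
<td>MIN.fmt</td>
<td>010001</td>
<td>fmt</td>
<td>ft</td>
<td>fs</td>
<td>fd</td>
<td>011100</td>
</tr>
<tr>
<td>MAXA.fmt</td>
<td>010001</td>
<td>fmt</td>
<td>ft</td>
<td>fs</td>
<td>fd</td>
<td>011111</td>
</tr>
<tr>
<td>MINA.fmt</td>
<td>010001</td>
<td>fmt</td>
<td>ft</td>
<td>fs</td>
<td>fd</td>
<td>011101</td>
</tr>
</tbody>
</table>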

### Purpose:

Scalar Floating-Point Max/Min/maxNumMag/minNumMag

Scalar Floating-Point Maximum

Scalar Floating-Point Minimum

Scalar Floating-Point argument with Maximum Absolute Value

Scalar Floating-Point argument with Minimum Absolute Value

### Description:

- **MAX.fmt**: \( FPR[fd] \leftarrow \text{maxNum}(FPR[fs], FPR[ft]) \)
- **MIN.fmt**: \( FPR[fd] \leftarrow \text{minNum}(FPR[fs], FPR[ft]) \)
- **MAXA.fmt**: \( FPR[fd] \leftarrow \text{maxNumMag}(FPR[fs], FPR[ft]) \)
- **MINA.fmt**: \( FPR[fd] \leftarrow \text{minNumMag}(FPR[fs], FPR[ft]) \)

MAX.fmt writes the maximum value of the inputs \( fs \) and \( ft \) to the destination \( fd \).

MIN.fmt writes the minimum value of the inputs \( fs \) and \( ft \) to the destination \( fd \).

MAXA.fmt takes input arguments \( fs \) and \( ft \) and writes the argument with the maximum absolute value to the destination \( fd \).

MINA.fmt takes input arguments \( fs \) and \( ft \) and writes the argument with the minimum absolute value to the destination \( fd \).

The instructions MAX.fmt/MIN.fmt/MAXA.fmt/MINA.fmt correspond to the IEEE 754-2008 operations maxNum/minNum/maxNumMag/minNumMag.

- MAX.fmt corresponds to the IEEE 754-2008 operation maxNum.
- MIN.fmt corresponds to the IEEE 754-2008 operation minNum.
- MAXA.fmt corresponds to the IEEE 754-2008 operation maxNumMag.
- MINA.fmt corresponds to the IEEE 754-2008 operation minNumMag.

Numbers are preferred to NaNs: if one input is a NaN, but not both, the value of the numeric input is returned. If both are NaNs, the NaN in fs is returned.¹
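The selection rules can be sketched in C for double-precision operands. The sketch makes several assumptions. Signaling NaNs and the Invalid Operation exception are not modelled, so every NaN is treated as quiet. -0.0 is ordered below +0.0, which matches the MAX entries in Table 4.1 and is assumed to apply symmetrically to MIN. Ties in magnitude for MAXA and MINA are broken toward the larger or smaller argument, as the comments on MaxAbsoluteFP and MinAbsoluteFP state. All function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

static int is_nan(double x) { return x != x; }

/* Sign bit of an IEEE 754 binary64 value (assumed double format). */
static int sign_bit(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    return (int)(bits >> 63);
}

static double magnitude(double x) { return sign_bit(x) ? -x : x; }

/* Ordering of non-NaN values with -0.0 below +0.0 (assumption). */
static double larger(double a, double b)
{
    if (a != b) return a > b ? a : b;
    return sign_bit(a) ? b : a;        /* equal: prefer +0.0 */
}

static double smaller(double a, double b)
{
    if (a != b) return a < b ? a : b;
    return sign_bit(a) ? a : b;        /* equal: prefer -0.0 */
}

/* Release 6 NaN rule: numbers win over NaNs; two NaNs return fs.
 * Returns 1 and sets *out when a NaN decided the result. */
static int nan_rule(double fs, double ft, double *out)
{
    if (is_nan(fs) && is_nan(ft)) { *out = fs; return 1; }
    if (is_nan(fs))               { *out = ft; return 1; }
    if (is_nan(ft))               { *out = fs; return 1; }
    return 0;
}

static double max_fmt(double fs, double ft)
{
    double r;
    return nan_rule(fs, ft, &r) ? r : larger(fs, ft);
}

static double min_fmt(double fs, double ft)
{
    double r;
    return nan_rule(fs, ft, &r) ? r : smaller(fs, ft);
}

static double maxa_fmt(double fs, double ft)
{
    double r;
    if (nan_rule(fs, ft, &r)) return r;
    if (magnitude(fs) != magnitude(ft))
        return magnitude(fs) > magnitude(ft) ? fs : ft;
    return larger(fs, ft);             /* equal magnitudes: larger argument */
}

static double mina_fmt(double fs, double ft)
{
    double r;
    if (nan_rule(fs, ft, &r)) return r;
    if (magnitude(fs) != magnitude(ft))
        return magnitude(fs) < magnitude(ft) ? fs : ft;
    return smaller(fs, ft);            /* equal magnitudes: smaller argument */
}
```

Note that MAXA and MINA return the original argument, sign included, not its absolute value: `maxa_fmt(-3.0, 2.0)` returns -3.0.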

The scalar FPU instructions MAX.fmt/MIN.fmt/MAXA.fmt/MINA.fmt correspond to the MSA instructions FMAX.df/FMIN.df/FMAX_A.df/FMIN_A.df.

- Scalar FPU instruction MAX.fmt corresponds to the MSA vector instruction FMAX.df.
- Scalar FPU instruction MIN.fmt corresponds to the MSA vector instruction FMIN.df.
- Scalar FPU instruction MAXA.fmt corresponds to the MSA vector instruction FMAX_A.df.
- Scalar FPU instruction MINA.fmt corresponds to the MSA vector instruction FMIN_A.df.

Restrictions:

Data-dependent exceptions are possible as specified by the IEEE Standard for Floating-Point Arithmetic 754™-2008. See also the section “Special Cases”, below.

Availability and Compatibility:

These instructions are introduced by and required as of Release 6.

Operation:

```plaintext
if not IsCoprocessorEnabled(1)
  then SignalException(CoprocessorUnusable, 1) endif
if not IsFloatingPointImplemented(fmt)
  then SignalException(ReservedInstruction) endif

v1 ← ValueFPR(fs,fmt)
v2 ← ValueFPR(ft,fmt)

if SNaN(v1) or SNaN(v2)
  then SignalException(InvalidOperand) endif

if NaN(v1) and NaN(v2) then
  ftmp ← v1
elseif NaN(v1) then
  ftmp ← v2
elseif NaN(v2) then
  ftmp ← v1
else
  case instruction of
    MAX.fmt:  ftmp ← MaxFP.fmt(ValueFPR(fs,fmt),ValueFPR(ft,fmt))
    MIN.fmt:  ftmp ← MinFP.fmt(ValueFPR(fs,fmt),ValueFPR(ft,fmt))
    MAXA.fmt: ftmp ← MaxAbsoluteFP.fmt(ValueFPR(fs,fmt),ValueFPR(ft,fmt))
    MINA.fmt: ftmp ← MinAbsoluteFP.fmt(ValueFPR(fs,fmt),ValueFPR(ft,fmt))
  end case
endif

StoreFPR (fd, fmt, ftmp)
/* end of instruction */

function MaxFP(tt, ts, n)
   /* Returns the largest argument. */
endfunction MaxFP

function MinFP(tt, ts, n)
   /* Returns the smallest argument. */
endfunction MinFP

function MaxAbsoluteFP(tt, ts, n)
   /* Returns the argument with largest absolute value.
      For equal absolute values, returns the largest argument. */
endfunction MaxAbsoluteFP

function MinAbsoluteFP(tt, ts, n)
   /* Returns the argument with smallest absolute value.
      For equal absolute values, returns the smallest argument. */
endfunction MinAbsoluteFP

function NaN(value)
   /* Returns true if the value is a NaN */
   return SNaN(value) or QNaN(value)
endfunction NaN
```

¹ IEEE standard 754-2008 allows either input to be chosen if both inputs are NaNs. Release 6 specifies that the first input must be propagated.

### Table 4.1 Special Cases for FP MAX, MIN, MAXA, MINA

<table>
<thead>
<tr>
<th>Operand</th>
<th>Other</th>
<th>Release 6 Instructions</th>
</tr>
<tr>
<th>fs</th>
<th>ft</th>
<th>MAX</th>
</tr>
</thead>
<tbody>
<tr>
<td>-0.0</td>
<td>0.0</td>
<td>0.0</td>
</tr>
<tr>
<td>0.0</td>
<td>-0.0</td>
<td>0.0</td>
</tr>
<tr>
<td>QNaN</td>
<td>#</td>
<td>#</td>
</tr>
<tr>
<td>#</td>
<td>QNaN</td>
<td>#</td>
</tr>
<tr>
<td>QNaN1</td>
<td>QNaN2</td>
<td>QNaN1</td>
</tr>
<tr>
<td colspan="2">Either or both operands SNaN; Invalid Operation exception enabled</td>
<td>Signal Invalid Operation Exception. Destination not written.</td>
</tr>
<tr>
<td colspan="2">Either or both operands SNaN; Invalid Operation exception disabled</td>
<td>Treat as if the SNaN were a QNaN (do not quieten the result).</td>
</tr>
</tbody>
</table>

# denotes a numeric (non-NaN) operand.

**Exceptions:**
Coprocessor Unusable, Reserved Instruction

**Floating Point Exceptions:**
Unimplemented Operation, Invalid Operation
**MFC0 Move from Coprocessor 0**

**Format:**
- MFC0 rt, rd
- MFC0 rt, rd, sel

**Purpose:** Move from Coprocessor 0
To move the contents of a coprocessor 0 register to a general register.

**Description:**
GPR[rt] ← CPR[0, rd, sel]
The contents of the coprocessor 0 register specified by the combination of rd and sel are loaded into general register rt. Not all coprocessor 0 registers support the sel field. In those instances, the sel field must be zero.

**Restrictions:**
Pre-Release 6: The results are UNDEFINED if coprocessor 0 does not contain a register as specified by rd and sel.
Release 6: Reading a reserved register or a register that is not implemented for the current core configuration returns 0.

**Operation:**
```plaintext
reg ← rd
if IsCoprocessorRegisterImplemented(0, reg, sel) then
  data ← CPR[0, reg, sel]
  GPR[rt] ← data
else
  if ArchitectureRevision() ≥ 6 then
    GPR[rt] ← 0
  else
    UNDEFINED
  endif
endif
```
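The Release 6 read rule can be modelled in C. This sketch uses a hypothetical `cp0_model` table indexed by rd (0 to 31) and sel (0 to 7). Reading a register marked unimplemented yields 0 instead of an UNDEFINED result. The implemented flags and register values are placeholders; a real core derives them from its configuration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical CP0 model indexed by register number rd and select sel. */
typedef struct {
    bool     implemented[32][8];
    uint32_t value[32][8];
} cp0_model;

/* Release 6 MFC0: reserved or unimplemented registers read as zero. */
static uint32_t mfc0_release6(const cp0_model *cp0, unsigned rd, unsigned sel)
{
    rd &= 31u;   /* rd is a 5-bit field */
    sel &= 7u;   /* sel is a 3-bit field */
    return cp0->implemented[rd][sel] ? cp0->value[rd][sel] : 0u;
}
```

Software written for Release 6 can therefore probe an optional register without risking an UNDEFINED result, although a zero read does not by itself prove the register is absent.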

**Exceptions:**
- Coprocessor Unusable, Reserved Instruction
**Format:** \( \text{MFC1} \ rt, \ fs \)

**Purpose:** Move Word From Floating Point

To copy a word from an FPU (CP1) general register to a GPR.

**Description:** \( \text{GPR}[rt] \leftarrow \text{FPR}[fs] \)

The contents of FPR \( fs \) are loaded into general register \( rt \).

**Restrictions:**

**Operation:**

```plaintext
data ← ValueFPR(fs, UNINTERPRETED_WORD)
GPR[rt] ← data
```

**Exceptions:**

Coprocessor Unusable, Reserved Instruction

**Historical Information:**

For MIPS I, MIPS II, and MIPS III, the contents of GPR \( rt \) are **UNPREDICTABLE** for the instruction immediately following MFC1.
Move Word From Coprocessor 2

**Format:**

```
MFC2 rt, Impl
MFC2 rt, Impl, sel
```

The syntax shown above is an example using MFC1 as a model. The specific syntax is implementation dependent.

**Purpose:** Move Word From Coprocessor 2

To copy a word from a COP2 general register to a GPR.

**Description:**

```
GPR[rt] ← CP2CPR[Impl]
```

The contents of the coprocessor 2 register denoted by the `Impl` field are placed into general register `rt`. The interpretation of the `Impl` field is left entirely to the Coprocessor 2 implementation and is not specified by the architecture.

**Restrictions:**

The results are **UNPREDICTABLE** if the `Impl` field specifies a coprocessor 2 register that does not exist.

**Operation:**

```
data ← CP2CPR[Impl]
GPR[rt] ← data
```

**Exceptions:**

Coprocessor Unusable
**Format:**
- MFHC0 rt, rd
- MFHC0 rt, rd, sel

**Purpose:** Move from High Coprocessor 0

To move the contents of the upper 32 bits of a Coprocessor 0 register that is extended by 32 bits to a general register.

**Description:**

\[
\text{GPR}[rt] \leftarrow \text{CPR}[0, \text{rd}, \text{sel}]_{63..32}
\]

The upper 32 bits of the Coprocessor 0 register specified by the combination of rd and sel are loaded into general register rt. Not all Coprocessor 0 registers support the sel field; in those instances, the sel field must be zero.

The MFHC0 operation is not affected when the specified Coprocessor 0 register is EntryLo0 or EntryLo1. Data is read from the upper half of the register, as extended to 64 bits, and written to the GPR without modification, because the RI and XI bits are not repositioned on a write from a GPR to EntryLo0 or EntryLo1.

**Restrictions:**

Pre-Release 6: The results are UNDEFINED if Coprocessor 0 does not contain a register as specified by rd and sel, or the register exists but is not extended by 32 bits, or the register is extended for XPA but XPA is not supported or enabled.

Release 6: Reading the high part of a register that is reserved, not implemented for the current core configuration, or that is not extended beyond 32 bits returns 0.

**Operation:**

```
if Config5_MVH = 0 then SignalException(ReservedInstruction) endif
reg ← rd
if IsCoprocessorRegisterImplemented(0, reg, sel) and
   IsCoprocessorRegisterExtended(0, reg, sel) then
   data ← CPR[0, reg, sel]
   GPR[rt] ← data_{63..32}
else
   if ArchitectureRevision() ≥ 6 then
      GPR[rt] ← 0
   else
      UNDEFINED
   endif
endif
```
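The Release 6 Operation above can be sketched in C. The flags `config5_mvh`, `implemented`, and `extended` stand in for Config5_MVH, IsCoprocessorRegisterImplemented, and IsCoprocessorRegisterExtended. The `mfhc0_result` type is a hypothetical convenience in which `ok == false` models the Reserved Instruction exception.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type: ok == false models a Reserved Instruction
 * exception raised because Config5_MVH == 0. */
typedef struct { bool ok; uint32_t gpr; } mfhc0_result;

/* Release 6 MFHC0: high word of an extended register, or 0 when the
 * register is unimplemented or not extended beyond 32 bits. */
static mfhc0_result mfhc0_release6(bool config5_mvh, bool implemented,
                                   bool extended, uint64_t cp0_value)
{
    mfhc0_result r = { false, 0u };
    if (!config5_mvh)
        return r;                                   /* ReservedInstruction */
    r.ok = true;
    r.gpr = (implemented && extended) ? (uint32_t)(cp0_value >> 32) : 0u;
    return r;
}
```

A register that is implemented but not extended reads as 0 here, matching the Release 6 restriction, rather than returning stale upper bits.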

**Exceptions:**

Coprocessor Unusable, Reserved Instruction
**MFHC1**

Move Word From High Half of Floating Point Register

**Format:** MFHC1 \( rt, fs \)

**MIPS32 Release 2**

**Purpose:** Move Word From High Half of Floating Point Register

To copy a word from the high half of an FPU (CP1) general register to a GPR.

**Description:** \( GPR[rt] \leftarrow FPR[fs]_{63..32} \)

The contents of the high word of FPR \( fs \) are loaded into general register \( rt \). This instruction is primarily intended to support 64-bit floating point units on a 32-bit CPU, but the semantics of the instruction are defined for all cases.
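A small C sketch shows how a double-precision value in a 64-bit FPR divides between the two move instructions. MFC1 returns bits 31..0 and MFHC1 returns bits 63..32. The helper names are hypothetical, and the sketch assumes `double` is IEEE 754 binary64 and that \( Status_{FR} = 1 \), or \( fs \) is even, so the read is well defined.

```c
#include <stdint.h>
#include <string.h>

/* Raw 64-bit image of a double, assuming IEEE 754 binary64. */
static uint64_t fpr_bits(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return bits;
}

/* Hypothetical helpers: the word MFC1 would return (bits 31..0) and the
 * word MFHC1 would return (bits 63..32). */
static uint32_t mfc1_word(double d)  { return (uint32_t)fpr_bits(d); }
static uint32_t mfhc1_word(double d) { return (uint32_t)(fpr_bits(d) >> 32); }
```

For 1.0 (0x3FF0000000000000), MFHC1 yields 0x3FF00000 and MFC1 yields 0. The sign, exponent, and top mantissa bits all live in the high word.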

**Restrictions:**

In implementations prior to Release 2 of the architecture, this instruction resulted in a Reserved Instruction exception. The results are **UNPREDICTABLE** if \( Status_{FR} = 0 \) and \( fs \) is odd.

**Operation:**

\[
\begin{align*}
data & \leftarrow \text{ValueFPR}(fs, \text{UNINTERPRETED\_DOUBLEWORD})_{63..32} \\
GPR[rt] & \leftarrow data
\end{align*}
\]

**Exceptions:**

Coprocessor Unusable, Reserved Instruction
The syntax shown above is an example using MFHC1 as a model. The specific syntax is implementation dependent.

**Purpose:** Move Word From High Half of Coprocessor 2 Register

To copy a word from the high half of a COP2 general register to a GPR.

**Description:** \( GPR[rt] \leftarrow CP2CPR[Impl]_{63..32} \)

The contents of the high word of the coprocessor 2 register denoted by the \( Impl \) field are placed into GPR \( rt \). The interpretation of the \( Impl \) field is left entirely to the Coprocessor 2 implementation and is not specified by the architecture.

**Restrictions:**

The results are **UNPREDICTABLE** if the \( Impl \) field specifies a coprocessor 2 register that does not exist, or if that register is not 64 bits wide.

In implementations prior to Release 2 of the architecture, this instruction resulted in a Reserved Instruction exception.

**Operation:**

\[
\begin{align*}
data & \leftarrow CP2CPR[Impl]_{63..32} \\
GPR[rt] & \leftarrow data
\end{align*}
\]

**Exceptions:**

Coprocessor Unusable, Reserved Instruction
MFHI

Move From HI Register

Format: MFHI rd

MIPS32, removed in Release 6

Purpose: Move From HI Register
To copy the special purpose HI register to a GPR.

Description: GPR[rd] ← HI
The contents of special register HI are loaded into GPR rd.

Restrictions:
None

Availability and Compatibility:
This instruction has been removed in Release 6.

Operation:
GPR[rd] ← HI

Exceptions:
None

Historical Information:
In the MIPS I, II, and III architectures, the two instructions which follow the MFHI must not modify the HI register. If this restriction is violated, the result of the MFHI is UNPREDICTABLE. This restriction was removed in MIPS IV and MIPS32, and all subsequent levels of the architecture.
MFLO Move From LO Register

**Format:** MFLO rd

**Purpose:** Move From LO Register

To copy the special purpose LO register to a GPR.

**Description:** GPR[rd] ← LO

The contents of special register LO are loaded into GPR rd.

**Restrictions:**

None

**Availability and Compatibility:**

This instruction has been removed in Release 6.

**Operation:**

GPR[rd] ← LO

**Exceptions:**

None

**Historical Information:**

In the MIPS I, II, and III architectures, the two instructions which follow the MFLO must not modify the LO register. If this restriction is violated, the result of the MFLO is UNPREDICTABLE. This restriction was removed in MIPS IV and MIPS32, and all subsequent levels of the architecture.
**MOV.fmt**

**Floating Point Move**

<table>
<thead>
<tr>
<th>COP1</th>
<th>fmt</th>
<th>0</th>
<th>fs</th>
<th>fd</th>
<th>MOV</th>
</tr>
</thead>
<tbody>
<tr>
<td>010001</td>
<td>fmt</td>
<td>00000</td>
<td>fs</td>
<td>fd</td>
<td>000110</td>
</tr>
</tbody>
</table>

**Format:**

MOV.fmt

MOV.S fd, fs  
MOV.D fd, fs  
MOV.PS fd, fs  

MOV.S and MOV.D: MIPS32. MOV.PS: MIPS64, MIPS32 Release 2, removed in Release 6.

**Purpose:** Floating Point Move

To move an FP value between FPRs.

**Description:**

FPR[fd] ← FPR[fs]

The value in FPR fs is placed into FPR fd. The source and destination are values in format fmt. In paired-single format, both the halves of the pair are copied to fd.

The move is non-arithmetic; it causes no IEEE 754 exceptions, and the FCSR_Cause and FCSR_Flags fields are not modified.

**Restrictions:**

The fields fs and fd must specify FPRs valid for operands of type fmt. If the fields are not valid, the result is UNPREDICTABLE.

The operand must be a value in format fmt; if it is not, the result is UNPREDICTABLE and the value of the operand FPR becomes UNPREDICTABLE.

The result of MOV.PS is UNPREDICTABLE if the processor is executing in the FR=0 32-bit FPU register model. It is predictable if executing on a 64-bit FPU in the FR=1 mode, but not with FR=0, and not on a 32-bit FPU.

**Availability and Compatibility:**

MOV.PS has been removed in Release 6.

**Operation:**

StoreFPR(fd, fmt, ValueFPR(fs, fmt))

**Exceptions:**

Coprocessor Unusable, Reserved Instruction

**Floating Point Exceptions:**

Unimplemented Operation
MOVF Move Conditional on Floating Point False

Purpose: Move Conditional on Floating Point False
To test an FP condition code then conditionally move a GPR.

Description: if FPConditionCode(cc) = 0 then GPR[rd] ← GPR[rs]
If the floating point condition code specified by cc is zero, then the contents of GPR rs are placed into GPR rd.

Restrictions:

Availability and Compatibility:
This instruction has been removed in Release 6.

Operation:
if FPConditionCode(cc) = 0 then
    GPR[rd] ← GPR[rs]
endif
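FPConditionCode(cc) is not defined in this entry. The C sketch below assumes the standard FCSR layout, in which FCC0 is bit 23 and FCC1 through FCC7 occupy bits 25 through 31. The function names are hypothetical, and only the register selection is modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* FPConditionCode(cc), assuming FCC0 = FCSR bit 23 and FCCn = bit 24+n for n >= 1. */
static bool fp_condition_code(uint32_t fcsr, unsigned cc)
{
    cc &= 7u;                                     /* cc is a 3-bit field */
    unsigned bit = (cc == 0u) ? 23u : 24u + cc;
    return (fcsr >> bit) & 1u;
}

/* MOVF rd, rs, cc: rd takes rs only when the condition code is false. */
static uint32_t movf(uint32_t rd, uint32_t rs, uint32_t fcsr, unsigned cc)
{
    return fp_condition_code(fcsr, cc) ? rd : rs;
}
```

The gap at bit 24 (the FS bit) is why FCC0 is handled separately from FCC1 through FCC7.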

Exceptions:
Reserved Instruction, Coprocessor Unusable
MOVF.fmt

Floating Point Move Conditional on Floating Point False

<table>
<thead>
<tr>
<th>Format:</th>
<th>MOVF.fmt</th>
</tr>
</thead>
<tbody>
<tr>
<td>MOVF.S fd, fs, cc</td>
<td>MIPS32, removed in Release 6</td>
</tr>
<tr>
<td>MOVF.D fd, fs, cc</td>
<td>MIPS32, removed in Release 6</td>
</tr>
<tr>
<td>MOVF.PS fd, fs, cc</td>
<td>removed in Release 6</td>
</tr>
</tbody>
</table>

Purpose: Floating Point Move Conditional on Floating Point False
To test an FP condition code then conditionally move an FP value.

Description: if FPConditionCode(cc) = 0 then FPR[fd] ← FPR[fs]

If the floating point condition code specified by cc is zero, then the value in FPR fs is placed into FPR fd. The source and destination are values in format fmt.

If the condition code is not zero, then FPR fs is not copied and FPR fd retains its previous value in format fmt. If fd did not contain a value either in format fmt or previously unused data from a load or move-to operation that could be interpreted in format fmt, then the value of fd becomes UNPREDICTABLE.

MOVF.PS merges the lower half of FPR fs into the lower half of FPR fd if condition code cc is zero, and independently merges the upper half of FPR fs into the upper half of FPR fd if condition code cc+1 is zero. The cc field must be even; if it is odd, the result of this operation is UNPREDICTABLE.

The move is non-arithmetic; it causes no IEEE 754 exceptions, and the FCSR_Cause and FCSR_Flags fields are not modified.
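The per-half behavior of MOVF.PS can be sketched in C. Three pieces are assumptions, not definitions from this entry: a hypothetical `ps_reg` struct holding the two 32-bit halves, the condition-code helper, and the FCSR layout it relies on (FCC0 at bit 23, FCC1 to FCC7 at bits 25 to 31). The caller is expected to pass an even cc.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical paired-single register: two independent 32-bit halves. */
typedef struct { uint32_t lower, upper; } ps_reg;

/* FPConditionCode(cc), assuming FCC0 = FCSR bit 23 and FCCn = bit 24+n for n >= 1. */
static bool fcc(uint32_t fcsr, unsigned cc)
{
    cc &= 7u;
    return (fcsr >> (cc == 0u ? 23u : 24u + cc)) & 1u;
}

/* MOVF.PS fd, fs, cc: the lower half tests cc, the upper half tests cc+1.
 * cc must be even; an odd cc is UNPREDICTABLE and is not checked here. */
static ps_reg movf_ps(ps_reg fd, ps_reg fs, uint32_t fcsr, unsigned cc)
{
    ps_reg out = fd;
    if (!fcc(fcsr, cc))      out.lower = fs.lower;
    if (!fcc(fcsr, cc + 1u)) out.upper = fs.upper;
    return out;
}
```

With cc = 0, FCC0 clear and FCC1 set, only the lower half is copied from fs; the upper half of fd is kept.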

Restrictions:
The fields fs and fd must specify FPRs valid for operands of type fmt. If the fields are not valid, the result is UNPREDICTABLE. The operand must be a value in format fmt. If it is not, the result is UNPREDICTABLE and the value of the operand FPR becomes UNPREDICTABLE.

The result of MOVF.PS is UNPREDICTABLE if the processor is executing in the FR=0 32-bit FPU register model; it is predictable if executing on a 64-bit FPU in the FR=1 mode, but not with FR=0, and not on a 32-bit FPU.

Availability and Compatibility:
This instruction has been removed in Release 6 and has been replaced by the ‘SEL.fmt’ instruction. Refer to the
SEL.fmt instruction in this manual for more information. Release 6 does not support Paired Single (PS).

Operation:

```
if FPConditionCode(cc) = 0 then
  StoreFPR(fd, fmt, ValueFPR(fs, fmt))
else
  StoreFPR(fd, fmt, ValueFPR(fd, fmt))
endif
```

Exceptions:
Coprocessor Unusable, Reserved Instruction
Floating Point Exceptions:
Unimplemented Operation
**MOVN**

Move Conditional on Not Zero

**Format:**  
MOVN rd, rs, rt

**Purpose:**  
Move Conditional on Not Zero  
To conditionally move a GPR after testing a GPR value.

**Description:**  
if GPR[rt] ≠ 0 then GPR[rd] ← GPR[rs]

If the value in GPR rt is not equal to zero, then the contents of GPR rs are placed into GPR rd.

**Restrictions:**
None

**Availability and Compatibility:**
This instruction has been removed in Release 6 and has been replaced by the ‘SELNEZ’ instruction. Refer to the SELNEZ instruction in this manual for more information.

**Operation:**
```plaintext
if GPR[rt] ≠ 0 then
    GPR[rd] ← GPR[rs]
endif
```

**Exceptions:**
None

**Programming Notes:**
The non-zero value tested might be the condition true result from the SLT, SLTI, SLTU, and SLTIU comparison instructions or a boolean value read from memory.
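Because MOVN is replaced by SELNEZ in Release 6, a C sketch of the equivalence may help when porting code. It assumes SELNEZ computes rd = (rt ≠ 0) ? rs : 0 and SELEQZ computes rd = (rt = 0) ? rs : 0, as described in their own entries, which are not reproduced here. The three-instruction sequence in the comment and all function names are illustrative.

```c
#include <stdint.h>

/* Pre-Release 6 MOVN rd, rs, rt. */
static uint32_t movn(uint32_t rd, uint32_t rs, uint32_t rt)
{
    return rt != 0u ? rs : rd;
}

/* Assumed Release 6 select semantics. */
static uint32_t selnez(uint32_t rs, uint32_t rt) { return rt != 0u ? rs : 0u; }
static uint32_t seleqz(uint32_t rs, uint32_t rt) { return rt == 0u ? rs : 0u; }

/* MOVN rebuilt from selects (hypothetical sequence):
 *   seleqz t0, rd, rt ; selnez t1, rs, rt ; or rd, t0, t1 */
static uint32_t movn_release6(uint32_t rd, uint32_t rs, uint32_t rt)
{
    return seleqz(rd, rt) | selnez(rs, rt);
}
```

Exactly one of the two selects can be non-zero for a given rt, so ORing them reproduces MOVN; the cost is two extra instructions and a temporary register.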
MOVN.fmt Floating Point Move Conditional on Not Zero

Format:

<table>
<thead>
<tr>
<th>COP1 (6 bits)</th>
<th>fmt (5 bits)</th>
<th>rt (5 bits)</th>
<th>fs (5 bits)</th>
<th>fd (5 bits)</th>
<th>MOVN (6 bits)</th>
</tr>
</thead>
<tbody>
<tr>
<td>010001</td>
<td>fmt</td>
<td>rt</td>
<td>fs</td>
<td>fd</td>
<td>MOVN</td>
</tr>
</tbody>
</table>

Purpose: Floating Point Move Conditional on Not Zero
To test a GPR then conditionally move an FP value.

Description: if GPR[rt] ≠ 0 then FPR[fd] ← FPR[fs]
If the value in GPR rt is not equal to zero, then the value in FPR fs is placed in FPR fd. The source and destination are values in format fmt.

If GPR rt contains zero, then FPR fs is not copied and FPR fd contains its previous value in format fmt. If fd did not contain a value either in format fmt or previously unused data from a load or move-to operation that could be interpreted in format fmt, then the value of fd becomes UNPREDICTABLE.

The move is non-arithmetic; it causes no IEEE 754 exceptions, and the FCSR_Cause and FCSR_Flags fields are not modified.

Restrictions:
The fields fs and fd must specify FPRs valid for operands of type fmt. If the fields are not valid, the result is UNPREDICTABLE.

The operand must be a value in format fmt; if it is not, the result is UNPREDICTABLE and the value of the operand FPR becomes UNPREDICTABLE.

The result of MOVN.PS is UNPREDICTABLE if the processor is executing in the FR=0 32-bit FPU register model. It is predictable if executing on a 64-bit FPU in the FR=1 mode, but not with FR=0, and not on a 32-bit FPU.

Availability and Compatibility:
This instruction has been removed in Release 6 and has been replaced by the ‘SELNEZ.fmt’ instruction. Refer to the SELNEZ.fmt instruction in this manual for more information. Release 6 does not support Paired Single (PS).

Operation:

```plaintext
if GPR[rt] ≠ 0 then
    StoreFPR(fd, fmt, ValueFPR(fs, fmt))
else
    StoreFPR(fd, fmt, ValueFPR(fd, fmt))
endif
```

Exceptions:
Coprocessor Unusable, Reserved Instruction

Floating Point Exceptions:
Unimplemented Operation
MOVT

The MIPS32® Instruction Set Manual, Revision 6.04

Copyright © 2015 Imagination Technologies LTD. and/or its Affiliated Group Companies. All rights reserved.

MOVT Move Conditional on Floating Point True

MOVT rd, rs, cc

MIPS32, removed in Release 6

Purpose: Move Conditional on Floating Point True

To test an FP condition code then conditionally move a GPR.

Description: if FPConditionCode(cc) = 1 then GPR[rd] ← GPR[rs]

If the floating point condition code specified by CC is one, then the contents of GPR rs are placed into GPR rd.

Restrictions:

Availability and Compatibility:

This instruction has been removed in Release 6.

Operation:

```
if FPConditionCode(cc) = 1 then
    GPR[rd] ← GPR[rs]
endif
```
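A typical use of MOVT is turning an FP comparison into a GPR boolean. The C sketch below mirrors a hypothetical sequence `c.lt.d $fcc0, a, b`, `move t, $zero`, `li t1, 1`, `movt t, t1, $fcc0`. It assumes the conventional FCSR layout (FCC0 at bit 23, FCC1 to FCC7 at bits 25 to 31); all helper names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed FCSR layout: FCC0 at bit 23, FCC1..FCC7 at bits 25..31. */
static unsigned fcc_bit(unsigned cc) { return cc == 0 ? 23u : 24u + cc; }

/* Model of a C.cond.fmt compare that sets or clears condition code cc. */
uint32_t set_fcc(uint32_t fcsr, unsigned cc, int cond)
{
    uint32_t mask = 1u << fcc_bit(cc);
    return cond ? (fcsr | mask) : (fcsr & ~mask);
}

/* MOVT rd, rs, cc: rd gets rs when condition code cc is 1. */
uint32_t movt(uint32_t rd, uint32_t rs, uint32_t fcsr, unsigned cc)
{
    return ((fcsr >> fcc_bit(cc)) & 1u) ? rs : rd;
}

/* 1 if a < b, computed as c.lt.d followed by movt into a zeroed GPR. */
uint32_t less_than_via_movt(double a, double b)
{
    uint32_t fcsr = set_fcc(0, 0, a < b);   /* c.lt.d $fcc0, a, b           */
    uint32_t t = 0;                         /* move t, $zero                */
    return movt(t, 1u, fcsr, 0);            /* movt t, t1, $fcc0 (t1 == 1)  */
}
```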

Exceptions:

Reserved Instruction, Coprocessor Unusable
**MOVT.fmt**  Floating Point Move Conditional on Floating Point True

**Format:**

MOVT.fmt

MOVT.S fd, fs, cc  
MOVT.D fd, fs, cc  
MOVT.PS fd, fs, cc

**Purpose:**  Floating Point Move Conditional on Floating Point True

To test an FP condition code then conditionally move an FP value.

**Description:**  

if FPConditionCode(cc) = 1 then FPR[fd] ← FPR[fs]  
If the floating point condition code specified by CC is one, then the value in FPR fs is placed into FPR fd. The source and destination are values in format fmt.

If the condition code is not one, then FPR fs is not copied and FPR fd contains its previous value in format fmt. If fd did not contain a value either in format fmt or previously unused data from a load or move-to operation that could be interpreted in format fmt, then the value of fd becomes UNPREDICTABLE.

MOVT.PS merges the lower half of FPR fs into the lower half of FPR fd if condition code CC is one, and independently merges the upper half of FPR fs into the upper half of FPR fd if condition code CC+1 is one. The CC field should be even; if it is odd, the result of this operation is UNPREDICTABLE.
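The paired-single merge amounts to two independent 32-bit selects. The C sketch below models an FPR as a 64-bit pattern with the lower single in bits 31..0 and the upper single in bits 63..32, and assumes the same FCSR condition-code layout as above (FCC0 at bit 23, FCC1 to FCC7 at bits 25 to 31). The function names are hypothetical, and an odd cc is rejected by an assertion because the architecture leaves that case UNPREDICTABLE.

```c
#include <assert.h>
#include <stdint.h>

static unsigned fcc(uint32_t fcsr, unsigned cc)
{
    unsigned bit = (cc == 0) ? 23u : 24u + cc;   /* assumed FCSR layout */
    return (fcsr >> bit) & 1u;
}

/* Sketch of MOVT.PS: lower half selected by cc, upper half by cc+1. */
uint64_t movt_ps(uint64_t fd, uint64_t fs, uint32_t fcsr, unsigned cc)
{
    assert(cc % 2 == 0 && cc < 8);               /* odd cc is UNPREDICTABLE */
    const uint64_t lo_mask = 0x00000000FFFFFFFFull;
    const uint64_t hi_mask = 0xFFFFFFFF00000000ull;
    uint64_t result = fd;
    if (fcc(fcsr, cc))
        result = (result & ~lo_mask) | (fs & lo_mask);
    if (fcc(fcsr, cc + 1))
        result = (result & ~hi_mask) | (fs & hi_mask);
    return result;
}
```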

The move is non-arithmetic; it causes no IEEE 754 exceptions, and the FCSR_Cause and FCSR_Flags fields are not modified.

**Restrictions:**

The fields fs and fd must specify FPRs valid for operands of type fmt. If the fields are not valid, the result is UNPREDICTABLE. The operand must be a value in format fmt; if it is not, the result is UNPREDICTABLE and the value of the operand FPR becomes UNPREDICTABLE.

The result of MOVT.PS is UNPREDICTABLE if the processor is executing in the FR=0 32-bit FPU register model. It is predictable if executing on a 64-bit FPU in the FR=1 mode, but not with FR=0, and not on a 32-bit FPU.

**Availability and Compatibility:**

This instruction has been removed in Release 6 and has been replaced by the ‘SEL.fmt’ instruction. Refer to the SEL.fmt instruction in this manual for more information. Release 6 does not support Paired Single (PS).

**Operation:**

\[
\text{if FPConditionCode(cc) = 1 then } \\
\quad \text{StoreFPR(fd, fmt, ValueFPR(fs, fmt))} \\
\text{else } \\
\quad \text{StoreFPR(fd, fmt, ValueFPR(fd, fmt))}
\]

**Exceptions:**

Coprocessor Unusable, Reserved Instruction
Floating Point Exceptions:
Unimplemented Operation
**MOVZ**

**Move Conditional on Zero**

**Format:** MOVZ rd, rs, rt

MIPS32, removed in Release 6

**Purpose:** Move Conditional on Zero

To conditionally move a GPR after testing a GPR value.

**Description:** if GPR[rt] = 0 then GPR[rd] ← GPR[rs]

If the value in GPR rt is equal to zero, then the contents of GPR rs are placed into GPR rd.

**Restrictions:**

None

**Availability and Compatibility:**

This instruction has been removed in Release 6 and has been replaced by the ‘SELEQZ’ instruction. Refer to the SELEQZ instruction in this manual for more information.

**Operation:**

```
if GPR[rt] = 0 then
    GPR[rd] ← GPR[rs]
endif
```

**Exceptions:**

None

**Programming Notes:**

The zero value tested might be the *condition false* result from the SLT, SLTI, SLTU, and SLTIU comparison instructions or a boolean value read from memory.
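The complementary idiom uses the *condition false* result. In the C sketch below, `slt t, a, b` followed by `movz a, b, t` leaves the smaller signed value in a; the register names and helper functions are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* SLT: 1 if rs < rt as signed 32-bit values, else 0. */
uint32_t slt(int32_t rs, int32_t rt) { return rs < rt ? 1u : 0u; }

/* MOVZ: rd gets rs when rt == 0, otherwise rd is unchanged. */
int32_t movz(int32_t rd, int32_t rs, uint32_t rt) { return rt == 0 ? rs : rd; }

/* slt t, a, b ; movz a, b, t  =>  a = min(a, b):
 * when "a < b" is false (t == 0), b is copied into a. */
int32_t min_via_movz(int32_t a, int32_t b)
{
    uint32_t t = slt(a, b);
    return movz(a, b, t);
}
```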
**MOVZ.fmt**

Floating Point Move Conditional on Zero

**Format:**

MOVZ.fmt

- **MOVZ.S fd, fs, rt**  
  MIPS32, removed in Release 6
- **MOVZ.D fd, fs, rt**  
  MIPS32, removed in Release 6
- **MOVZ.PS fd, fs, rt**  
  MIPS32 Release 2, removed in Release 6

**Purpose:** Floating Point Move Conditional on Zero

To test a GPR then conditionally move an FP value.

**Description:** if GPR[rt] = 0 then FPR[fd] ← FPR[fs]

If the value in GPR rt is equal to zero, then the value in FPR fs is placed in FPR fd. The source and destination are values in format fmt.

If GPR rt is not zero, then FPR fs is not copied and FPR fd contains its previous value in format fmt. If fd did not contain a value either in format fmt or previously unused data from a load or move-to operation that could be interpreted in format fmt, then the value of fd becomes UNPREDICTABLE.

The move is non-arithmetic; it causes no IEEE 754 exceptions, and the FCSR_Cause and FCSR_Flags fields are not modified.

**Restrictions:**

The fields fs and fd must specify FPRs valid for operands of type fmt. If the fields are not valid, the result is UNPREDICTABLE.

The operand must be a value in format fmt; if it is not, the result is UNPREDICTABLE and the value of the operand FPR becomes UNPREDICTABLE.

The result of MOVZ.PS is UNPREDICTABLE if the processor is executing in the FR=0 32-bit FPU register model. It is predictable if executing on a 64-bit FPU in the FR=1 mode, but not with FR=0, and not on a 32-bit FPU.

**Availability and Compatibility:**

This instruction has been removed in Release 6 and has been replaced by the ‘SELEQZ.fmt’ instruction. Refer to the SELEQZ.fmt instruction in this manual for more information. Release 6 does not support Paired Single (PS).

**Operation:**

```plaintext
if GPR[rt] = 0 then
    StoreFPR(fd, fmt, ValueFPR(fs, fmt))
else
    StoreFPR(fd, fmt, ValueFPR(fd, fmt))
endif
```

**Exceptions:**

Coprocessor Unusable, Reserved Instruction

**Floating Point Exceptions:**

Unimplemented Operation
MSUB

Multiply and Subtract Word to Hi, Lo

Format: MSUB rs, rt

Purpose: Multiply and Subtract Word to Hi, Lo
To multiply two words and subtract the result from HI, LO.

Description: (HI,LO) ← (HI,LO) - (GPR[rs] x GPR[rt])
The 32-bit word value in GPR rs is multiplied by the 32-bit value in GPR rt, treating both operands as signed values, to produce a 64-bit result. The product is subtracted from the 64-bit concatenated values of HI and LO. The most significant 32 bits of the result are written into HI and the least significant 32 bits are written into LO. No arithmetic exception occurs under any circumstances.

Restrictions:
No restrictions in any architecture release prior to Release 6.
This instruction does not provide the capability of writing directly to a target GPR.

Availability and Compatibility:
This instruction has been removed in Release 6.

Operation:
\[
\begin{align*}
temp & \leftarrow (HI \ || \ LO) - (GPR[rs] \times GPR[rt]) \\
HI & \leftarrow temp_{63..32} \\
LO & \leftarrow temp_{31..0}
\end{align*}
\]
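The accumulator arithmetic can be checked with a short C model. The sketch below treats HI and LO as the two 32-bit halves of one 64-bit value and relies on unsigned wrap-around for the modulo-2^64 subtraction. The struct and function names are hypothetical, and it assumes a two's-complement target for the conversion of rs and rt to int32_t.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t hi, lo; } hilo_t;   /* hypothetical HI/LO model */

/* MSUB rs, rt: (HI,LO) <- (HI,LO) - signed(rs) * signed(rt). */
hilo_t msub(hilo_t acc, uint32_t rs, uint32_t rt)
{
    uint64_t old  = ((uint64_t)acc.hi << 32) | acc.lo;
    int64_t  prod = (int64_t)(int32_t)rs * (int32_t)rt;   /* exact 64-bit product */
    uint64_t temp = old - (uint64_t)prod;                 /* wraps modulo 2^64   */
    hilo_t r = { (uint32_t)(temp >> 32), (uint32_t)temp };
    return r;
}
```

For example, with HI = LO = 0, rs = 0xFFFFFFFF (that is, −1) and rt = 2, the product is −2 and the result is HI = 0, LO = 2.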

Exceptions:
None

Programming Notes:
Where the size of the operands is known, software should place the shorter operand in GPR rt. This may reduce the latency of the instruction on those processors which implement data-dependent instruction latencies.
Format: MSUB.fmt

MSUB.S fd, fr, fs, ft  
MSUB.D fd, fr, fs, ft  
MSUB.PS fd, fr, fs, ft

Purpose: Floating Point Multiply Subtract
To perform a combined multiply-then-subtract of FP values.

Description: FPR[fd] ← (FPR[fs] × FPR[ft]) − FPR[fr]

The value in FPR fs is multiplied by the value in FPR ft to produce an intermediate product. The intermediate product is rounded according to the current rounding mode in FCSR. The value in FPR fr is then subtracted from the rounded product. The subtraction result is calculated to infinite precision, rounded according to the current rounding mode in FCSR, and placed into FPR fd. The operands and result are values in format fmt. The results and flags are as if separate floating-point multiply and subtract instructions were executed.
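Because the product is rounded before the subtraction, MSUB.fmt can give a different answer from a single-rounding (fused) multiply-subtract. The C sketch below models MSUB.D and forces the intermediate rounding through a `volatile` double. It assumes IEEE 754 binary64 arithmetic in round-to-nearest mode; the function name is hypothetical.

```c
#include <assert.h>

/* Sketch of MSUB.D fd, fr, fs, ft: (fs * ft) is rounded to double first,
 * then fr is subtracted and the difference is rounded again. The volatile
 * store stops the compiler from contracting both steps into one fused
 * multiply-add. */
double msub_d(double fr, double fs, double ft)
{
    volatile double product = fs * ft;   /* first rounding  */
    return product - fr;                 /* second rounding */
}
```

With fs = 1 + 2^−30 and ft = 1 − 2^−30, the exact product is 1 − 2^−60, which rounds to 1.0, so `msub_d(1.0, fs, ft)` returns 0.0; evaluating fs × ft − 1.0 with a single rounding would instead give −2^−60.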

MSUB.PS multiplies then subtracts the upper and lower halves of FPR fr, FPR fs, and FPR ft independently, and ORs together any generated exceptional conditions.

The Cause bits are ORed into the Flag bits if no exception is taken.

Restrictions:
The fields fr, fs, ft, and fd must specify FPRs valid for operands of type fmt. If the fields are not valid, the result is UNPREDICTABLE.

The operands must be values in format fmt; if they are not, the result is UNPREDICTABLE and the value of the operand FPRs becomes UNPREDICTABLE.

The result of MSUB.PS is UNPREDICTABLE if the processor is executing in the \( FR=0 \) 32-bit FPU register model. It is predictable if executing on a 64-bit FPU in the \( FR=1 \) mode, but not with \( FR=0 \), and not on a 32-bit FPU.

Availability and Compatibility:
MSUB.S and MSUB.D: Required in all versions of MIPS64 since MIPS64 Release 1. Not available in MIPS32 Release 1. Required in MIPS32 Release 2 and all subsequent versions of MIPS32. When required, these instructions are to be implemented if an FPU is present, either in a 32-bit or 64-bit FPU or in a 32-bit or 64-bit FP Register Mode (FIR_F64 = 0 or 1, Status_FR = 0 or 1).

This instruction has been removed in Release 6 and has been replaced by the fused multiply-subtract instruction. Refer to the fused multiply-subtract instruction ‘MSUBF.fmt’ in this manual for more information. Release 6 does not support Paired Single (PS).

Operation:
\[
\begin{align*}
vfr & \leftarrow \text{ValueFPR}(fr, fmt) \\
vfs & \leftarrow \text{ValueFPR}(fs, fmt) \\
vft & \leftarrow \text{ValueFPR}(ft, fmt) \\
& \text{StoreFPR}(fd, fmt, (vfs \times_{fmt} vft) -_{fmt} vfr)
\end{align*}
\]

Exceptions:
Coprocessor Unusable, Reserved Instruction
Floating Point Exceptions:
  Inexact, Unimplemented Operation, Invalid Operation, Overflow, Underflow
MSUBU

Multiply and Subtract Word to Hi,Lo

Format: MSUBU rs, rt

Purpose: Multiply and Subtract Word to Hi,Lo
To multiply two words and subtract the result from HI, LO.

Description: (HI,LO) ← (HI,LO) − (GPR[rs] × GPR[rt])
The 32-bit word value in GPR rs is multiplied by the 32-bit word value in GPR rt, treating both operands as unsigned
values, to produce a 64-bit result. The product is subtracted from the 64-bit concatenated values of HI and LO. The
most significant 32 bits of the result are written into HI and the least significant 32 bits are written into LO. No
arithmetic exception occurs under any circumstances.

Restrictions:
This instruction does not provide the capability of writing directly to a target GPR.

Availability and Compatibility:
This instruction has been removed in Release 6.

Operation:
```
temp ← (HI || LO) − (GPR[rs] × GPR[rt])
HI ← temp63..32
LO ← temp31..0
```
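The unsigned variant differs only in how the operands are extended before multiplying. The C sketch below (hypothetical names, HI and LO modelled as the halves of one 64-bit value) zero-extends both words and lets unsigned arithmetic wrap modulo 2^64.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t hi, lo; } hilo_t;   /* hypothetical HI/LO model */

/* MSUBU rs, rt: (HI,LO) <- (HI,LO) - unsigned(rs) * unsigned(rt). */
hilo_t msubu(hilo_t acc, uint32_t rs, uint32_t rt)
{
    uint64_t old  = ((uint64_t)acc.hi << 32) | acc.lo;
    uint64_t prod = (uint64_t)rs * rt;        /* zero-extended 64-bit product */
    uint64_t temp = old - prod;               /* wraps modulo 2^64            */
    hilo_t r = { (uint32_t)(temp >> 32), (uint32_t)temp };
    return r;
}
```

With HI = LO = 0, rs = 0xFFFFFFFF and rt = 2, MSUBU reads rs as 4294967295 and yields HI = 0xFFFFFFFE, LO = 0x00000002, whereas the signed MSUB reads rs as −1 and yields HI = 0, LO = 2.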

Exceptions:
None

Programming Notes:
Where the size of the operands is known, software should place the shorter operand in GPR rt. This may reduce the
latency of the instruction on those processors which implement data-dependent instruction latencies.
**MTC0**

**Move to Coprocessor 0**

- **Format:**
  - MTC0 `rt, rd`
  - MTC0 `rt, rd, sel`

- **Purpose:**
  - Move to Coprocessor 0

To move the contents of a general register to a coprocessor 0 register.

- **Description:**
  - CPR[0, rd, sel] ← GPR[rt]

  The contents of general register `rt` are loaded into the coprocessor 0 register specified by the combination of `rd` and `sel`. Not all coprocessor 0 registers support the `sel` field. In those instances, the `sel` field must be set to zero.

In Release 5, for a 32-bit processor, the MTC0 instruction writes all zeroes to the high-order bits of selected COP0 registers that have been extended beyond 32 bits. This is required for compatibility with legacy software that does not use MTHC0, yet has hardware support for extended COP0 registers (such as for Extended Physical Addressing (XPA)). Because MTC0 overwrites the result of MTHC0, software must first read the high-order bits before writing the low-order bits, then write the high-order bits back either modified or unmodified. For initialization of an extended register, software may first write the low-order bits, then the high-order bits, without first reading the high-order bits.
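The read-modify-write order can be illustrated with a C model of one extended COP0 register. The sketch below is hypothetical: it represents the register as a `uint64_t` and models MTC0 (including the zeroing of bits 63..32), MFHC0, and MTHC0 as plain functions, not as real compiler intrinsics.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one COP0 register extended to 64 bits. */
void mtc0(uint64_t *reg, uint32_t val)  { *reg = val; }   /* bits 63..32 zeroed */
uint32_t mfhc0(const uint64_t *reg)     { return (uint32_t)(*reg >> 32); }
void mthc0(uint64_t *reg, uint32_t val)
{
    *reg = ((uint64_t)val << 32) | (uint32_t)*reg;        /* bits 31..0 kept */
}

/* Update only the low word while preserving the high word:
 * read the high word, write the low word (which clears the high word),
 * then write the saved high word back. */
void update_low_word(uint64_t *reg, uint32_t new_low)
{
    uint32_t saved_high = mfhc0(reg);
    mtc0(reg, new_low);
    mthc0(reg, saved_high);
}
```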

- **Restrictions:**
  - Pre-Release 6: The results are **UNDEFINED** if coprocessor 0 does not contain a register as specified by `rd` and `sel`.
  - Release 6: Writes to a register that is reserved or not defined for the current core configuration are ignored.

- **Operation:**
  ```
  data ← GPR[rt]
  reg ← rd
  if IsCoprocessorRegisterImplemented(0, reg, sel) then
    CPR[0,reg,sel] ← data
    if (Config5_MVH = 1) then
      // The most-significant bit may vary by register. Only supported
      // bits should be written 0. Extended LLAddr is not written with 0s,
      // as it is a read-only register. BadVAddr is not written with 0s, as
      // it is read-only.
      if (Config3_LPA = 1) then
        if (reg,sel = EntryLo0 or EntryLo1) then CPR[0,reg,sel][63:32] ← 0^32 endif
        if (reg,sel = MAAR) then CPR[0,reg,sel][63:32] ← 0^32 endif
        // TagLo is zeroed only if the implementation-dependent bits
        // are writable
        if (reg,sel = TagLo) then CPR[0,reg,sel][63:32] ← 0^32 endif
        if (Config3_VZ = 1) then
          if (reg,sel = EntryHi) then CPR[0,reg,sel][63:32] ← 0^32 endif
        endif
      endif
    endif
  else
    if ArchitectureRevision() ≥ 6 then
      // nop (no exceptions, coprocessor state not modified)
    else
      UNDEFINED
    endif
  endif
  ```

Exceptions:
Coprocessor Unusable, Reserved Instruction
MTC1 Move Word to Floating Point

**Format:**  
MTC1 rt, fs

**MIPS32**

**Purpose:** Move Word to Floating Point  
To copy a word from a GPR to an FPU (CP1) general register.

**Description:**  
FPR[fs] ← GPR[rt]  
The low word in GPR rt is placed into the low word of FPR fs.

**Restrictions:**

**Operation:**

\[
\text{data} \leftarrow \text{GPR}[rt]_{31..0} \\
\text{StoreFPR}(fs, \text{UNINTERPRETED\_WORD}, \text{data})
\]

**Exceptions:**

Coprocessor Unusable

**Historical Information:**

For MIPS I, MIPS II, and MIPS III the value of FPR fs is **UNPREDICTABLE** for the instruction immediately following MTC1.
MTC2 Move Word to Coprocessor 2

The syntax shown above is an example using MTC1 as a model. The specific syntax is implementation-dependent.

**Purpose:** Move Word to Coprocessor 2
To copy a word from a GPR to a COP2 general register.

**Description:** $CP2CPR[Impl] \leftarrow GPR[rt]$
The low word in GPR $rt$ is placed into the low word of a Coprocessor 2 general register denoted by the $Impl$ field. The interpretation of the $Impl$ field is left entirely to the Coprocessor 2 implementation and is not specified by the architecture.

**Restrictions:**
The results are **UNPREDICTABLE** if the $Impl$ field specifies a Coprocessor 2 register that does not exist.

**Operation:**
\[
data \leftarrow GPR[rt] \\
CP2CPR[Impl] \leftarrow data
\]

**Exceptions:**
Coprocessor Unusable, Reserved Instruction
MTHC0 Move to High Coprocessor 0

Format:  
MTHC0 rt, rd  
MTHC0 rt, rd, sel

Purpose:  Move to High Coprocessor 0
To copy a word from a GPR to the upper 32 bits of a Coprocessor 0 register that has been extended by 32 bits.

Description:  
CPR[0, rd, sel][63:32] ← GPR[rt]
The contents of general register rt are loaded into the upper 32 bits of the Coprocessor 0 register specified by the combination of rd and sel. Not all Coprocessor 0 registers support the sel field; in those instances, the sel field must be set to zero.

Restrictions:
Pre-Release 6: The results are UNDEFINED if Coprocessor 0 does not contain a register as specified by rd and sel, or if the register exists but is not extended by 32 bits, or the register is extended for XPA, but XPA is not supported or enabled.

Release 6: A write to the high part of a register that is reserved, not implemented for the current core, or that is not extended beyond 32 bits is ignored.

Operation:
```
if Config5_MVH = 0 then SignalException(ReservedInstruction) endif
data ← GPR[rt]
reg ← rd
if IsCoprocessorRegisterImplemented(0, reg, sel) and
   IsCoprocessorRegisterExtended(0, reg, sel) then
    CPR[0, reg, sel][63:32] ← data
else
    if ArchitectureRevision() ≥ 6 then
        // nop (no exceptions, coprocessor state not modified)
    else
        UNDEFINED
    endif
endif
```

Exceptions:
Coprocessor Unusable, Reserved Instruction
MTHC1 Move Word to High Half of Floating Point Register

**Format:**  MTHC1 rt, fs

**Purpose:** Move Word to High Half of Floating Point Register
To copy a word from a GPR to the high half of an FPU (CP1) general register.

**Description:**  FPR[fs]63..32 ← GPR[rt]
The word in GPR rt is placed into the high word of FPR fs. This instruction is primarily intended to support 64-bit floating point units on a 32-bit CPU, but the semantics of the instruction are defined for all cases.

**Restrictions:**
In implementations prior to Release 2 of the architecture, this instruction resulted in a Reserved Instruction exception.
The results are **UNPREDICTABLE** if Status_FR = 0 and fs is odd.

**Operation:**
```
newdata ← GPR[rt]
olddata ← ValueFPR(fs, UNINTERPRETED_DOUBLEWORD)31..0
StoreFPR(fs, UNINTERPRETED_DOUBLEWORD, newdata || olddata)
```

**Exceptions:**
Coprocessor Unusable, Reserved Instruction

**Programming Notes**
When paired with MTC1 to write a value to a 64-bit FPR, the MTC1 must be executed first, followed by the MTHC1. This is because of the semantic definition of MTC1, which is not aware that software is using an MTHC1 instruction to complete the operation, and sets the upper half of the 64-bit FPR to an **UNPREDICTABLE** value.
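The ordering rule can be illustrated with a C model of a 64-bit FPR. In the hypothetical sketch below, MTC1 writes the low word and fills the upper half with an arbitrary pattern standing in for the UNPREDICTABLE value, and MTHC1 then supplies the upper word. It assumes `double` is IEEE 754 binary64 with the same byte order as `uint64_t`, so the words 0x3FF00000 and 0x00000000 assemble to 1.0.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the UNPREDICTABLE upper half left by MTC1. */
#define UNPREDICTABLE_WORD 0xDEADBEEFu

void mtc1(uint64_t *fpr, uint32_t gpr)
{
    *fpr = ((uint64_t)UNPREDICTABLE_WORD << 32) | gpr;
}

void mthc1(uint64_t *fpr, uint32_t gpr)   /* keeps bits 31..0, replaces 63..32 */
{
    *fpr = ((uint64_t)gpr << 32) | (uint32_t)*fpr;
}

/* Correct order: MTC1 first, then MTHC1. */
double load_double(uint32_t hi, uint32_t lo)
{
    uint64_t fpr = 0;
    double d;
    mtc1(&fpr, lo);
    mthc1(&fpr, hi);
    memcpy(&d, &fpr, sizeof d);           /* reinterpret the 64-bit pattern */
    return d;
}
```

Issuing the two moves in the opposite order leaves the stand-in garbage in the upper half, which is exactly the failure the programming note warns about.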
MTHC2 Move Word to High Half of Coprocessor 2 Register

The syntax shown above is an example using MTHC1 as a model. The specific syntax is implementation-dependent.

**Purpose:** Move Word to High Half of Coprocessor 2 Register

To copy a word from a GPR to the high half of a COP2 general register.

**Description:** \( CP2\text{CPR}[\text{Impl}]_{63..32} \leftarrow \text{GPR}[\text{rt}] \)

The word in GPR \( \text{rt} \) is placed into the high word of coprocessor 2 general register denoted by the \( \text{Impl} \) field. The interpretation of the \( \text{Impl} \) field is left entirely to the Coprocessor 2 implementation and is not specified by the architecture.

**Restrictions:**

The results are **UNPREDICTABLE** if the \( \text{Impl} \) field specifies a coprocessor 2 register that does not exist, or if that register is not 64 bits wide.

In implementations prior to Release 2 of the architecture, this instruction resulted in a Reserved Instruction exception.

**Operation:**

\[
\begin{align*}
\text{data} & \leftarrow \text{GPR}[\text{rt}] \\
\text{CP2}\text{CPR}[\text{Impl}] & \leftarrow \text{data} \parallel \text{CP2}\text{CPR}[\text{Impl}]_{31..0}
\end{align*}
\]

**Exceptions:**

Coprocessor Unusable, Reserved Instruction

**Programming Notes**

When paired with MTC2 to write a value to a 64-bit CPR, the MTC2 must be executed first, followed by the MTHC2. This is because of the semantic definition of MTC2, which is not aware that software is using an MTHC2 instruction to complete the operation, and sets the upper half of the 64-bit CPR to an UNPREDICTABLE value.
MTHI Move to HI Register

Format: MTHI rs

Purpose: Move to HI Register
To copy a GPR to the special purpose HI register.

Description: HI ← GPR[rs]
The contents of GPR rs are loaded into special register HI.

Restrictions:
A computed result written to the HI/LO pair by DIV, DIVU, MULT, or MULTU must be read by MFHI or MFLO before a new result can be written into either HI or LO.

If an MTHI instruction is executed following one of these arithmetic instructions, but before an MFLO or MFHI instruction, the contents of LO are UNPREDICTABLE. The following example shows this illegal situation:

```
MULT r2, r4  # start operation that will eventually write to HI,LO
...        # code not containing mfhi or mflo
MTHI r6
...        # code not containing mflo
MFLO r3    # this mflo would get an UNPREDICTABLE value
```
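The rule can be expressed as a small state machine. The C sketch below is a hypothetical checker, not part of any toolchain: it scans a simplified instruction stream and reports the first MFHI or MFLO that would read an UNPREDICTABLE value. It handles both this MTHI case and the symmetric MTLO case; OP_MULT stands for any of DIV, DIVU, MULT, or MULTU.

```c
#include <assert.h>

enum op { OP_MULT, OP_MTHI, OP_MTLO, OP_MFHI, OP_MFLO, OP_OTHER };

/* Returns the index of the first MFHI/MFLO that reads an UNPREDICTABLE
 * value, or -1 if the sequence is safe. */
int find_hilo_hazard(const enum op *seq, int n)
{
    int pending = 0;                  /* HI/LO result written but not yet read */
    int hi_valid = 1, lo_valid = 1;
    for (int i = 0; i < n; i++) {
        switch (seq[i]) {
        case OP_MULT: pending = 1; hi_valid = lo_valid = 1; break;
        case OP_MTHI: if (pending) lo_valid = 0; hi_valid = 1; pending = 0; break;
        case OP_MTLO: if (pending) hi_valid = 0; lo_valid = 1; pending = 0; break;
        case OP_MFHI: if (!hi_valid) return i; pending = 0; break;
        case OP_MFLO: if (!lo_valid) return i; pending = 0; break;
        default: break;
        }
    }
    return -1;
}
```

Run on the sequence in the example above (MULT, other code, MTHI, other code, MFLO), the checker flags the final MFLO.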

Availability and Compatibility:
This instruction has been removed in Release 6.

Operation:
HI ← GPR[rs]

Exceptions:
None

Historical Information:
In MIPS I-III, if either of the two preceding instructions is MFHI, the result of that MFHI is UNPREDICTABLE. Reads of the HI or LO special register must be separated from any subsequent instructions that write to them by two or more instructions. In MIPS IV and later, including MIPS32, this restriction does not exist.
MTLO Move to LO Register

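**Format:** MTLO rs

**Purpose:** Move to LO Register
To copy a GPR to the special purpose LO register.

**Description:** LO ← GPR[rs]
The contents of GPR rs are loaded into special register LO.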

**Restrictions:**

A computed result written to the HI/LO pair by DIV, DIVU, MULT, or MULTU must be read by MFHI or MFLO before a new result can be written into either HI or LO.

If an MTLO instruction is executed following one of these arithmetic instructions, but before an MFLO or MFHI instruction, the contents of HI are UNPREDICTABLE. The following example shows this illegal situation:

```
MULT r2,r4  # start operation that will eventually write to HI,LO
...         # code not containing mfhi or mflo
MTLO r6
...         # code not containing mfhi
MFHI r3    # this mfhi would get an UNPREDICTABLE value
```

**Availability and Compatibility:**

This instruction has been removed in Release 6.

**Operation:**

LO ← GPR[rs]

**Exceptions:**

None

**Historical Information:**

In MIPS I-III, if either of the two preceding instructions is MFLO, the result of that MFLO is UNPREDICTABLE. Reads of the HI or LO special register must be separated from any subsequent instructions that write to them by two or more instructions. In MIPS IV and later, including MIPS32, this restriction does not exist.
MUL

Multiply Word to GPR

Format:  MUL rd, rs, rt

MIPS32, removed in Release 6

Purpose:  Multiply Word to GPR

To multiply two words and write the result to a GPR.

Description:  GPR[rd] ← GPR[rs] × GPR[rt]

The 32-bit word value in GPR rs is multiplied by the 32-bit value in GPR rt, treating both operands as signed values, to produce a 64-bit result. The least significant 32 bits of the product are written to GPR rd. The contents of HI and LO are UNPREDICTABLE after the operation. No arithmetic exception occurs under any circumstances.

Restrictions:

Note that this instruction does not provide the capability of writing the result to the HI and LO registers.

Availability and Compatibility:

The pre-Release 6 MUL instruction has been removed in Release 6. It has been replaced by a similar instruction of the same mnemonic, MUL, but different encoding, which is a member of a family of single-width multiply instructions. Refer to the ‘MUL’ and ‘MUH’ instructions in this manual for more information.

Operation:

\[
\begin{align*}
temp & \leftarrow GPR[rs] \times GPR[rt] \\
GPR[rd] & \leftarrow temp_{31..0} \\
HI & \leftarrow UNPREDICTABLE \\
LO & \leftarrow UNPREDICTABLE
\end{align*}
\]

Exceptions:

None

Programming Notes:

In some processors the integer multiply operation may proceed asynchronously and allow other CPU instructions to execute before it is complete. An attempt to read GPR rd before the results are written interlocks until the results are ready. Asynchronous execution does not affect the program result, but offers an opportunity for performance improvement by scheduling the multiply so that other instructions can execute in parallel.

Programs that require overflow detection must check for it explicitly.
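One way to check explicitly is to form the full 64-bit product and test whether its high word is just the sign extension of the low word; comparing MULT's HI against LO, or the Release 6 MUH result against MUL, amounts to the same test. A minimal C sketch, with a hypothetical function name:

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if the signed 32-bit product rs * rt does not fit in 32 bits.
 * *low receives the low word, i.e. the value MUL writes to rd. */
int mul_overflows(int32_t rs, int32_t rt, int32_t *low)
{
    int64_t  prod = (int64_t)rs * rt;                 /* exact product */
    uint32_t lo   = (uint32_t)prod;
    uint32_t hi   = (uint32_t)((uint64_t)prod >> 32);
    uint32_t sign_of_lo = (lo & 0x80000000u) ? 0xFFFFFFFFu : 0u;
    *low = (int32_t)lo;
    return hi != sign_of_lo;
}
```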

Where the size of the operands is known, software should place the shorter operand in GPR rt. This may reduce the latency of the instruction on those processors which implement data-dependent instruction latencies.
MUL MUH MULU MUHU Multiply Integers (with result to GPR)

**Purpose:** Multiply Integers (with result to GPR)

MUL: Multiply Words Signed, Low Word  
MUH: Multiply Words Signed, High Word  
MULU: Multiply Words Unsigned, Low Word  
MUHU: Multiply Words Unsigned, High Word

**Description:**

MUL: \[ GPR[rd] \leftarrow \text{lo\_word}(\text{multiply\_signed}(GPR[rs] \times GPR[rt])) \]

MUH: \[ GPR[rd] \leftarrow \text{hi\_word}(\text{multiply\_signed}(GPR[rs] \times GPR[rt])) \]

MULU: \[ GPR[rd] \leftarrow \text{lo\_word}(\text{multiply\_unsigned}(GPR[rs] \times GPR[rt])) \]

MUHU: \[ GPR[rd] \leftarrow \text{hi\_word}(\text{multiply\_unsigned}(GPR[rs] \times GPR[rt])) \]

The Release 6 multiply instructions multiply the operands in GPR[rs] and GPR[rt], and place the specified high or low part of the result, of the same width, in GPR[rd].

MUL performs a signed 32-bit integer multiplication, and places the low 32 bits of the result in the destination register.

MUH performs a signed 32-bit integer multiplication, and places the high 32 bits of the result in the destination register.

MULU performs an unsigned 32-bit integer multiplication, and places the low 32 bits of the result in the destination register.

MUHU performs an unsigned 32-bit integer multiplication, and places the high 32 bits of the result in the destination register.
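The four variants differ only in whether the 32-bit operands are sign- or zero-extended before the 64-bit product is formed, and in which half is kept. A minimal C sketch with operands passed as raw 32-bit GPR words; the function names mirror the mnemonics (`mul_r6` avoids clashing with the pre-Release 6 name) and are illustrative only, and a two's-complement target is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Signed products: operands are interpreted as two's-complement words. */
uint32_t mul_r6(uint32_t rs, uint32_t rt)
{
    return (uint32_t)((int64_t)(int32_t)rs * (int32_t)rt);
}
uint32_t muh(uint32_t rs, uint32_t rt)
{
    return (uint32_t)((uint64_t)((int64_t)(int32_t)rs * (int32_t)rt) >> 32);
}

/* Unsigned products: operands are zero-extended. */
uint32_t mulu(uint32_t rs, uint32_t rt)
{
    return (uint32_t)((uint64_t)rs * rt);
}
uint32_t muhu(uint32_t rs, uint32_t rt)
{
    return (uint32_t)(((uint64_t)rs * rt) >> 32);
}
```

For rs = rt = 0xFFFFFFFF, MUL and MULU both return 1, while MUH returns 0 (since (−1) × (−1) = 1) and MUHU returns 0xFFFFFFFE; only the high words differ.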

**Restrictions:**

MUL behaves correctly even if its inputs are not sign extended 32-bit integers. Bits 32-63 of its inputs do not affect the result.

MULU behaves correctly even if its inputs are not zero or sign extended 32-bit integers. Bits 32-63 of its inputs do not affect the result.
Availability and Compatibility:

These instructions are introduced by and required as of Release 6.

Programming Notes:

The low half of the integer multiplication result is identical for signed and unsigned. Nevertheless, there are distinct instructions MUL and MULU. Implementations may choose to optimize a multiply that produces the low half followed by a multiply that produces the upper half, so programmers should use matching lower and upper half multiplications (MUL with MUH, MULU with MUHU).

The Release 6 MUL instruction has the same opcode mnemonic as the pre-Release 6 MUL instruction. The semantics of these instructions are almost identical: both produce the low 32-bits of the $32\times32=64$ product; but the pre-Release 6 MUL is unpredictable if its inputs are not properly sign extended 32-bit values on a 64 bit machine, and is defined to render the HI and LO registers unpredictable, whereas the Release 6 version ignores bits 32-63 of the input, and there are no HI/LO registers in Release 6 to be affected.

Operation:

MUL, MUH:

\[
\begin{align*}
    s1 & \leftarrow \text{signed\_word}(GPR[rs]) \\
    s2 & \leftarrow \text{signed\_word}(GPR[rt])
\end{align*}
\]

MULU, MUHU:

\[
\begin{align*}
    s1 & \leftarrow \text{unsigned\_word}(GPR[rs]) \\
    s2 & \leftarrow \text{unsigned\_word}(GPR[rt])
\end{align*}
\]

\[
\text{product} \leftarrow s1 \times s2 \quad \text{/* product is twice the width of sources */}
\]

MUL: GPR[rd] ← lo_word(product)

MUH: GPR[rd] ← hi_word(product)

MULU: GPR[rd] ← lo_word(product)

MUHU: GPR[rd] ← hi_word(product)

Exceptions:

None
**Format:**
MUL.fmt

MUL.S  fd, fs, ft  
MUL.D  fd, fs, ft  
MUL.PS fd, fs, ft  

**MIPS32**

**MIPS64,MIPS32 Release 3, removed in Release 6**

**Purpose:** Floating Point Multiply

To multiply FP values.

**Description:** FPR[fd] ← FPR[fs] × FPR[ft]

The value in FPR fs is multiplied by the value in FPR ft. The result is calculated to infinite precision, rounded according to the current rounding mode in FCSR, and placed into FPR fd. The operands and result are values in format fmt. MUL.PS multiplies the upper and lower halves of FPR fs and FPR ft independently, and ORs together any generated exceptional conditions.

**Restrictions:**

The fields fs, ft, and fd must specify FPRs valid for operands of type fmt. If the fields are not valid, the result is UNPREDICTABLE.

The operands must be values in format fmt; if they are not, the result is UNPREDICTABLE and the value of the operand FPRs becomes UNPREDICTABLE.

The result of MUL.PS is UNPREDICTABLE if the processor is executing in the FR=0 32-bit FPU register model. It is predictable if executing on a 64-bit FPU in the FR=1 mode, but not with FR=0, and not on a 32-bit FPU.

**Availability and Compatibility:**

MUL.PS has been removed in Release 6.

**Operation:**

\[
\text{StoreFPR} \ (fd, \text{fmt}, \text{ValueFPR}(fs, \text{fmt}) \times_{\text{fmt}} \text{ValueFPR}(ft, \text{fmt}))
\]

**Exceptions:**

Coprocessor Unusable, Reserved Instruction

**Floating Point Exceptions:**

Inexact, Unimplemented Operation, Invalid Operation, Overflow, Underflow
MULT Multiply Word


Format:

MULT rs, rt

MIPS32, removed in Release 6

Purpose:

Multiply Word

To multiply 32-bit signed integers.

Description:

(HI, LO) ← GPR[rs] × GPR[rt]

The 32-bit word value in GPR rt is multiplied by the 32-bit value in GPR rs, treating both operands as signed values, to produce a 64-bit result. The low-order 32-bit word of the result is placed into special register LO, and the high-order 32-bit word is placed into special register HI.

No arithmetic exception occurs under any circumstances.

Restrictions:

None

Availability and Compatibility:

The MULT instruction has been removed in Release 6. It has been replaced by the Multiply Low (MUL) and Multiply High (MUH) instructions, whose output is written to a single GPR. Refer to the ‘MUL’ and ‘MUH’ instructions in this manual for more information.

Operation:

```
prod ← GPR[rs]31..0 × GPR[rt]31..0
LO ← prod31..0
HI ← prod63..32
```
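A C model of the HI/LO split, using hypothetical names, treating the GPRs as raw 32-bit words, and assuming a two's-complement target:

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t hi, lo; } hilo_t;   /* hypothetical HI/LO model */

/* MULT rs, rt: signed 32 x 32 -> 64-bit product split across HI and LO. */
hilo_t mult(uint32_t rs, uint32_t rt)
{
    uint64_t prod = (uint64_t)((int64_t)(int32_t)rs * (int32_t)rt);
    hilo_t r = { (uint32_t)(prod >> 32), (uint32_t)prod };
    return r;
}
```

For rs = −2 and rt = 3, the product −6 gives HI = 0xFFFFFFFF and LO = 0xFFFFFFFA.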

Exceptions:

None

Programming Notes:

In some processors the integer multiply operation may proceed asynchronously and allow other CPU instructions to execute before it is complete. An attempt to read LO or HI before the results are written interlocks until the results are ready. Asynchronous execution does not affect the program result, but offers an opportunity for performance improvement by scheduling the multiply so that other instructions can execute in parallel.

Programs that require overflow detection must check for it explicitly.
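One way to perform that explicit check is sketched below in Python, assuming HI and LO have already been read into ordinary registers (for example with MFHI and MFLO). A signed product fits in a single 32-bit word exactly when HI equals the sign extension of LO bit 31. The function name is hypothetical.

```python
def mult_overflows_32(hi: int, lo: int) -> bool:
    """Return True if the signed 64-bit (HI, LO) product does not fit in 32 bits.

    The product fits in a signed word exactly when HI is the sign extension
    of LO bit 31: all zeros for a non-negative LO, all ones otherwise.
    """
    expected_hi = 0xFFFFFFFF if lo & 0x80000000 else 0
    return (hi & 0xFFFFFFFF) != expected_hi
```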

Where the size of the operands is known, software should place the shorter operand in GPR rt. This may reduce the latency of the instruction on processors that implement data-dependent instruction latencies.

Implementation Note:
MULTU Multiply Unsigned Word

Format:

MULTU rs, rt

MIPS32, removed in Release 6

Purpose:

To multiply 32-bit unsigned integers.

Description:

\[(HI, LO) \leftarrow GPR[rs] \times GPR[rt]\]

The 32-bit word value in GPR \(rt\) is multiplied by the 32-bit value in GPR \(rs\), treating both operands as unsigned values, to produce a 64-bit result. The low-order 32-bit word of the result is placed into special register \(LO\), and the high-order 32-bit word is placed into special register \(HI\).

No arithmetic exception occurs under any circumstances.

Restrictions:

None

Availability and Compatibility:

The MULTU instruction has been removed in Release 6. It has been replaced by the Multiply Low (MULU) and Multiply High (MUHU) instructions, whose output is written to a single GPR. Refer to the ‘MULU’ and ‘MUHU’ instructions in this manual for more information.

Operation:

\[
\begin{align*}
\text{prod} & \leftarrow (0 \mid \mid GPR[rs]_{31..0}) \times (0 \mid \mid GPR[rt]_{31..0}) \\
\text{LO} & \leftarrow \text{prod}_{31..0} \\
\text{HI} & \leftarrow \text{prod}_{63..32}
\end{align*}
\]
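The zero-extension in the Operation above is what separates MULTU from MULT. Below is a minimal Python sketch that assumes 32-bit GPR bit patterns; the function name is hypothetical. Consider the bit pattern 0xFFFFFFFF multiplied by 2. MULTU treats it as 4294967295 and produces HI = 1. MULT treats the same pattern as -1 and produces HI = 0xFFFFFFFF. LO is 0xFFFFFFFE in both cases.

```python
def multu(rs: int, rt: int) -> tuple[int, int]:
    """Model MULTU: zero-extend both operands, multiply, return (HI, LO)."""
    prod = (rs & 0xFFFFFFFF) * (rt & 0xFFFFFFFF)  # (0 || GPR[rs]) x (0 || GPR[rt])
    return (prod >> 32) & 0xFFFFFFFF, prod & 0xFFFFFFFF
```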

Exceptions:

None

Programming Notes:

In some processors the integer multiply operation may proceed asynchronously and allow other CPU instructions to execute before it is complete. An attempt to read \(LO\) or \(HI\) before the results are written interlocks until the results are ready. Asynchronous execution does not affect the program result, but offers an opportunity for performance improvement by scheduling the multiply so that other instructions can execute in parallel.

Programs that require overflow detection must check for it explicitly.

Where the size of the operands is known, software should place the shorter operand in GPR \(rt\). This may reduce the latency of the instruction on processors that implement data-dependent instruction latencies.
NAL

No-op and Link

**Format:** NAL

**Assembly Idiom MIPS32 pre-Release 6, MIPS32 Release 6**

**Purpose:** No-op and Link

**Description:** GPR[31] ← PC+8

NAL is an instruction used to read the PC.

NAL was originally an alias for the pre-Release 6 instruction **BLTZAL**. The branch condition (GPR[0] < 0) is always false, so the 16-bit target offset field is ignored, but the link register, GPR 31, is unconditionally written with the address of the instruction past the delay slot.

**Restrictions:**

NAL is treated as a not-taken branch with a delay slot, so it must not be followed by an instruction that is not allowed in a delay slot. NAL itself is not allowed in a delay slot or forbidden slot.

**Availability and Compatibility:**

This instruction is deprecated in Release 6. Use of this deprecated instruction is strongly discouraged, because it will be removed from a future revision of the MIPS Architecture.

The pre-Release 6 instruction **BLTZAL**, when rs is not GPR[0], is removed in Release 6 and is required to signal a Reserved Instruction exception. Release 6 adds **BLTZALC**, the equivalent compact conditional branch and link, with no delay slot.

NAL is introduced by, and required as of, Release 6, where the mnemonic NAL becomes distinct from the **BLTZAL** instruction removed in Release 6. The NAL instruction encoding, however, works on all implementations: on pre-Release 6 implementations, where it is a special case of **BLTZAL**, and on Release 6, where it is an instruction in its own right.

NAL is provided only for compatibility with pre-Release 6 software. It is recommended that you use **ADDIUPC** to generate a PC-relative address.

**Exceptions:**

None

**Operation:**

GPR[31] ← PC + 8
**NEG.fmt**

**Floating Point Negate**

<table>
<thead>
<tr>
<th>Format:</th>
<th>NEG.fmt</th>
</tr>
</thead>
<tbody>
<tr>
<td>NEG.S fd, fs</td>
<td>MIPS32</td>
</tr>
<tr>
<td>NEG.D fd, fs</td>
<td>MIPS32</td>
</tr>
<tr>
<td>NEG.PS fd, fs</td>
<td>MIPS32 Release 2, removed in Release 6</td>
</tr>
</tbody>
</table>

**Purpose:** Floating Point Negate

To negate an FP value.

**Description:** FPR[fd] ← -FPR[fs]

The value in FPR fs is negated and placed into FPR fd. The value is negated by changing the sign bit value. The operand and result are values in format fmt. NEG.PS negates the upper and lower halves of FPR fs independently, and ORs together any generated exceptional conditions.

If FIR.Has2008=0 or FCSR.ABS2008=0 then this operation is arithmetic. For this case, any NaN operand signals invalid operation.

If FCSR.ABS2008=1, this operation is non-arithmetic. In this case, regular floating point numbers and NaN values are treated alike: only the sign bit is affected by this instruction. No IEEE 754 exception can be generated, and the FCSR.Cause and FCSR.Flags fields are not modified.

**Restrictions:**

The fields fs and fd must specify FPRs valid for operands of type fmt. If the fields are not valid, the result is UNPREDICTABLE. The operand must be a value in format fmt; if it is not, the result is UNPREDICTABLE and the value of the operand FPR becomes UNPREDICTABLE.

The result of NEG.PS is UNPREDICTABLE if the processor is executing in the FR=0 32-bit FPU register model. It is predictable if executing on a 64-bit FPU in the FR=1 mode, but not with FR=0, and not on a 32-bit FPU.

**Availability and Compatibility:**

NEG.PS has been removed in Release 6.

**Operation:**

\[ \text{StoreFPR}(\text{fd}, \text{fmt}, \text{Negate}(\text{ValueFPR}(\text{fs}, \text{fmt}))) \]

**Exceptions:**

Coprocessor Unusable, Reserved Instruction

**Floating Point Exceptions:**

Unimplemented Operation, Invalid Operation
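In the non-arithmetic case (FCSR.ABS2008=1), NEG works on raw bit patterns. The Python sketch below flips the sign bit of a binary32 pattern, and of both halves for NEG.PS. It assumes FPR contents are available as integers, and the helper names are hypothetical. Because only the sign bit changes, a NaN pattern such as 0x7FC00000 simply becomes 0xFFC00000 and no Invalid Operation is signalled. The arithmetic (FCSR.ABS2008=0) behaviour is not modelled here.

```python
import struct


def f32_bits(x: float) -> int:
    """Return the IEEE 754 binary32 bit pattern of x (x is rounded to single)."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def neg_s_abs2008(bits: int) -> int:
    """Non-arithmetic NEG.S (FCSR.ABS2008=1): flip bit 31 and nothing else."""
    return (bits ^ 0x80000000) & 0xFFFFFFFF


def neg_ps_abs2008(bits: int) -> int:
    """Non-arithmetic NEG.PS: flip the sign bit of both halves (bits 63 and 31)."""
    return (bits ^ 0x8000000080000000) & 0xFFFFFFFFFFFFFFFF
```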
NMADD.fmt

Floating Point Negative Multiply Add

<table>
<thead>
<tr>
<th>COP1X</th>
<th>fr</th>
<th>ft</th>
<th>fs</th>
<th>fd</th>
<th>NMADD</th>
<th>fmt</th>
</tr>
</thead>
<tbody>
<tr>
<td>010011</td>
<td>fr</td>
<td>ft</td>
<td>fs</td>
<td>fd</td>
<td>110</td>
<td>fmt</td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>3</td>
<td>3</td>
</tr>
</tbody>
</table>

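<table>
<thead>
<tr>
<th>Format:</th>
<th>NMADD.fmt</th>
</tr>
</thead>
<tbody>
<tr>
<td>NMADD.S fd, fr, fs, ft</td>
<td>MIPS32 Release 2, removed in Release 6</td>
</tr>
<tr>
<td>NMADD.D fd, fr, fs, ft</td>
<td>MIPS32 Release 2, removed in Release 6</td>
</tr>
<tr>
<td>NMADD.PS fd, fr, fs, ft</td>
<td>MIPS32 Release 2, removed in Release 6</td>
</tr>
</tbody>
</table>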

Purpose: Floating Point Negative Multiply Add

To negate a combined multiply-then-add of FP values.

Description:

\[ FPR[fd] \leftarrow -((FPR[fs] \times FPR[ft]) + FPR[fr]) \]

The value in FPR fs is multiplied by the value in FPR ft to produce an intermediate product. The intermediate product is rounded according to the current rounding mode in FCSR. The value in FPR fr is added to the product. The result sum is calculated to infinite precision, rounded according to the current rounding mode in FCSR, negated by changing the sign bit, and placed into FPR fd. The operands and result are values in format fmt. The results and flags are as if separate floating-point multiply and add and negate instructions were executed.

NMADD.PS applies the operation to the upper and lower halves of FPR fr, FPR fs, and FPR ft independently, and ORs together any generated exceptional conditions. The Cause bits are ORed into the Flag bits if no exception is taken.
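The description states that results are as if separate multiply, add and negate instructions were executed, so NMADD is unfused. The Python sketch below models NMADD.S under these assumptions:

- round-to-nearest-even is in effect;
- the inputs are already representable in single precision;
- exception flags and overflow are not modelled.

The function names are hypothetical. The product of two binary32 values is exact in a Python float (binary64). Rounding a binary64 sum of two binary32 values to binary32 also yields the correctly rounded single result. With a = 1 + 2^-12, the product 1 + 2^-11 + 2^-24 is a tie that rounds to 1 + 2^-11. NMADD.S with fr = -1 therefore returns -2^-11, whereas an exact (fused) evaluation would give -(2^-11 + 2^-24).

```python
import struct


def round_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def nmadd_s(fr: float, fs: float, ft: float) -> float:
    """Unfused NMADD.S sketch: -(round(round(fs * ft) + fr)), round-to-nearest."""
    product = round_f32(fs * ft)     # intermediate product rounded to single
    total = round_f32(product + fr)  # sum rounded to single
    return -total                    # negate by flipping the sign bit
```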

Restrictions:

The fields fr, fs, ft, and fd must specify FPRs valid for operands of type fmt. If the fields are not valid, the result is UNPREDICTABLE.

The operands must be values in format fmt; if they are not, the result is UNPREDICTABLE and the value of the operand FPRs becomes UNPREDICTABLE.

The result of NMADD.PS is UNPREDICTABLE if the processor is executing in the FR=0 32-bit FPU register model. It is predictable if executing on a 64-bit FPU in the FR=1 mode, but not with FR=0, and not on a 32-bit FPU.

Availability and Compatibility:

This instruction has been removed in Release 6.

NMADD.S and NMADD.D: Required in all versions of MIPS64 since MIPS64 Release 1. Not available in MIPS32 Release 1. Required by MIPS32 Release 2 and subsequent versions of MIPS32. When required, these instructions are to be implemented if an FPU is present, either in a 32-bit or 64-bit FPU or in a 32-bit or 64-bit FP Register Mode (FIR_{F64}=0 or 1, Status_{FR}=0 or 1).

Operation:

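\[
\begin{align*}
\text{vfr} & \leftarrow \text{ValueFPR}(fr, fmt) \\
\text{vfs} & \leftarrow \text{ValueFPR}(fs, fmt) \\
\text{vft} & \leftarrow \text{ValueFPR}(ft, fmt) \\
\text{StoreFPR}(fd, fmt, -(\text{vfr} +_{\text{fmt}} (\text{vfs} \times_{\text{fmt}} \text{vft})))
\end{align*}
\]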

Exceptions:

Coprocessor Unusable, Reserved Instruction
Floating Point Exceptions:
Inexact, Unimplemented Operation, Invalid Operation, Overflow, Underflow
**NMSUB.fmt**  
Floating Point Negative Multiply Subtract

<table>
<thead>
<tr>
<th>COP1X</th>
<th>fr</th>
<th>ft</th>
<th>fs</th>
<th>fd</th>
<th>NMSUB</th>
<th>fmt</th>
</tr>
</thead>
<tbody>
<tr>
<td>010011</td>
<td>fr</td>
<td>ft</td>
<td>fs</td>
<td>fd</td>
<td>111</td>
<td>fmt</td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>3</td>
<td>3</td>
</tr>
</tbody>
</table>

**Format:**  
NMSUB.fmt  
NMSUB.S fd, fr, fs, ft  
NMSUB.D fd, fr, fs, ft  
NMSUB.PS fd, fr, fs, ft  

**Purpose:**  
Floating Point Negative Multiply Subtract  
To negate a combined multiply-then-subtract of FP values.

**Description:**  
FPR[fd] ← −((FPR[fs] × FPR[ft]) − FPR[fr])  
The value in FPR fs is multiplied by the value in FPR ft to produce an intermediate product. The intermediate product is rounded according to the current rounding mode in FCSR. The value in FPR fr is subtracted from the product.  
The result is calculated to infinite precision, rounded according to the current rounding mode in FCSR, negated by changing the sign bit, and placed into FPR fd. The operands and result are values in format fmt. The results and flags are as if separate floating-point multiply and subtract and negate instructions were executed.  
NMSUB.PS applies the operation to the upper and lower halves of FPR fr, FPR fs, and FPR ft independently, and ORs together any generated exceptional conditions.  
The Cause bits are ORed into the Flag bits if no exception is taken.
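Because NMSUB negates after rounding, its result is not always identical to computing fr − fs × ft directly. The Python sketch below models NMSUB.S under these assumptions:

- round-to-nearest-even is in effect;
- the inputs are representable in single precision;
- exception flags are not modelled.

The function names are hypothetical. When fs × ft equals fr exactly, the rounded difference is +0.0 and negation yields -0.0. Computing fr − fs × ft directly would give +0.0 instead.

```python
import struct


def round_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def nmsub_s(fr: float, fs: float, ft: float) -> float:
    """Unfused NMSUB.S sketch: -(round(round(fs * ft) - fr)), round-to-nearest."""
    product = round_f32(fs * ft)         # intermediate product rounded to single
    difference = round_f32(product - fr) # difference rounded to single
    return -difference                   # sign flip is applied after rounding
```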

**Restrictions:**  
The fields fr, fs, ft, and fd must specify FPRs valid for operands of type fmt. If the fields are not valid, the result is UNPREDICTABLE.

The operands must be values in format fmt; if they are not, the result is UNPREDICTABLE and the value of the operand FPRs becomes UNPREDICTABLE.

The result of NMSUB.PS is UNPREDICTABLE if the processor is executing in the FR=0 32-bit FPU register model. It is predictable if executing on a 64-bit FPU in the FR=1 mode, but not with FR=0 and not on a 32-bit FPU.

**Availability and Compatibility:**  
This instruction has been removed in Release 6.

NMSUB.S and NMSUB.D: Required in all versions of MIPS64 since MIPS64 Release 1. Not available in MIPS32 Release 1. Required in MIPS32 Release 2 and all subsequent versions of MIPS32. When required, these instructions are to be implemented if an FPU is present, either in a 32-bit or 64-bit FPU or in a 32-bit or 64-bit FP Register Mode (FIR_{F64}=0 or 1, Status_{FR}=0 or 1).

**Operation:**  
\[
\begin{align*}
\text{vfr} & \leftarrow \text{ValueFPR}(fr, fmt) \\
\text{vfs} & \leftarrow \text{ValueFPR}(fs, fmt) \\
\text{vft} & \leftarrow \text{ValueFPR}(ft, fmt) \\
\text{StoreFPR}(fd, fmt, -((\text{vfs} \times_{\text{fmt}} \text{vft}) -_{\text{fmt}} \text{vfr}))
\end{align*}
\]

**Exceptions:**  
Coprocessor Unusable, Reserved Instruction
Floating Point Exceptions:
Inexact, Unimplemented Operation, Invalid Operation, Overflow, Underflow
NOP

Format:  NOP

Assembly Idiom

Purpose:  No Operation
To perform no operation.

Description:
NOP is the assembly idiom used to denote no operation. The actual instruction is interpreted by the hardware as SLL r0, r0, 0.
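The sketch below shows why SLL r0, r0, 0 is the all-zero instruction word. It assumes the standard MIPS32 R-type field layout, which this section does not spell out: opcode in bits 31..26 (SPECIAL = 0), rs in 25..21, rt in 20..16, rd in 15..11, sa in 10..6, and function in 5..0 (SLL = 0). The encoder name is hypothetical.

```python
def encode_sll(rd: int, rt: int, sa: int) -> int:
    """Encode SLL rd, rt, sa as an R-type word (SPECIAL opcode, function code 0)."""
    opcode, rs, funct = 0b000000, 0, 0b000000  # SPECIAL, rs field unused (0), SLL
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (sa << 6) | funct


NOP = encode_sll(0, 0, 0)  # SLL r0, r0, 0
```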

Restrictions:
None

Operations:
None

Exceptions:
None

Programming Notes:
The zero instruction word, which represents SLL r0, r0, 0, is the preferred NOP for software to use to fill branch and jump delay slots and to pad out alignment sequences.
Format: \texttt{NOR \ rd, \ rs, \ rt} \\

\textbf{Purpose:} Not Or \\
To do a bitwise logical NOT OR.

\textbf{Description:} \texttt{GPR[rd]} $\leftarrow$ \texttt{GPR[rs] nor GPR[rt]}
The contents of GPR \texttt{rs} are combined with the contents of GPR \texttt{rt} in a bitwise logical NOR operation. The result is placed into GPR \texttt{rd}.

\textbf{Restrictions:} \\
None

\textbf{Operation:} \\
\texttt{GPR[rd]} $\leftarrow$ \texttt{GPR[rs] nor GPR[rt]}

\textbf{Exceptions:} \\
None
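The NOR operation on 32-bit GPR values can be sketched in Python as shown below. It assumes GPRs hold unsigned 32-bit patterns, and the helper names are hypothetical. The second helper reflects the common idiom of computing a bitwise NOT as NOR with GPR 0, which always reads as zero.

```python
MASK32 = 0xFFFFFFFF


def nor32(rs: int, rt: int) -> int:
    """Bitwise NOR of two 32-bit GPR values: NOT (rs OR rt), kept to 32 bits."""
    return ~((rs | rt) & MASK32) & MASK32


def not32(rs: int) -> int:
    """Bitwise NOT expressed as NOR with GPR[0], which always reads as zero."""
    return nor32(rs, 0)
```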
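**Format:** OR rd, rs, rt

**Purpose:** Or

To do a bitwise logical OR.

**Description:** GPR[rd] ← GPR[rs] or GPR[rt]

The contents of GPR $rs$ are combined with the contents of GPR $rt$ in a bitwise logical OR operation. The result is placed into GPR $rd$.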

**Restrictions:**
None

**Operations:**
GPR[$rd$] ← GPR[$rs$] or GPR[$rt$]

**Exceptions:**
None
**Format:** ORI rt, rs, immediate

**Purpose:** Or Immediate

To do a bitwise logical OR with a constant.

**Description:** GPR[rt] ← GPR[rs] or immediate

The 16-bit immediate is zero-extended to the left and combined with the contents of GPR rs in a bitwise logical OR operation. The result is placed into GPR rt.
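ORI zero-extends its immediate, so bits 31..16 of GPR rs always pass through unchanged. The Python sketch below shows this, assuming 32-bit GPR patterns; the function name is hypothetical. With immediate 0x8000, the result keeps the upper half of rs. If the immediate were sign-extended instead, the upper half would have become 0xFFFF.

```python
def ori(rs: int, immediate: int) -> int:
    """ORI: zero-extend the 16-bit immediate, then OR it with the 32-bit GPR value."""
    zext_imm = immediate & 0xFFFF  # zero_extend(immediate): bits 31..16 are 0
    return (rs | zext_imm) & 0xFFFFFFFF
```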

**Restrictions:**
None

**Operations:**

GPR[rt] ← GPR[rs] or zero_extend(immediate)

**Exceptions:**
None
PAUSE

Wait for the LLBit to clear.

Format: PAUSE

Purpose: Wait for the LLBit to clear.

Description:
Locks implemented using the LL/SC instructions are a common method of synchronization between threads of control. A lock implementation does a load-linked instruction and checks the value returned to determine whether the software lock is set. If it is, the code branches back to retry the load-linked instruction, implementing an active busy-wait sequence. The PAUSE instruction is intended to be placed into the busy-wait sequence to block the instruction stream until such time as the load-linked instruction has a chance to succeed in obtaining the software lock.

The PAUSE instruction is implementation-dependent, but it usually involves descheduling the instruction stream until the LLBit is zero.

- In a single-threaded processor, this may be implemented as a short-term WAIT operation which resumes at the next instruction when the LLBit is zero or on some other external event such as an interrupt.
- On a multi-threaded processor, this may be implemented as a short-term YIELD operation which resumes at the next instruction when the LLBit is zero.

In either case, it is assumed that the instruction stream which gives up the software lock does so via a write to the lock variable, which causes the processor to clear the LLBit as seen by this thread of execution.

The encoding of the instruction is such that it is backward compatible with all previous implementations of the architecture. The PAUSE instruction can therefore be placed into existing lock sequences and treated as a NOP by the processor, even if the processor does not implement the PAUSE instruction.

Restrictions:
Pre-Release 6: The operation of the processor is UNPREDICTABLE if a PAUSE instruction is placed in the delay slot of a branch or jump instruction.

Release 6: Implementations are required to signal a Reserved Instruction exception if PAUSE is encountered in the delay slot or forbidden slot of a branch or jump instruction.

Operations:

```plaintext
if LLBit ≠ 0 then
    EPC ← PC + 4 /* Resume at the following instruction */
    DescheduleInstructionStream()
endif
```
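The Operation above can be sketched as a software model in which a Python `threading.Condition` stands in for the hardware wake-up. This is a hypothetical, single-waiter illustration, not a description of any real implementation:

- the class and method names are invented for the sketch;
- a store to the lock word is assumed to clear the waiter's LLBit, as the text describes;
- descheduling is modelled as a blocking wait;
- the optional timeout loosely stands in for "some other external event".

```python
import threading
from typing import Optional


class LLBitModel:
    """Hypothetical single-waiter model of an LLBit and the lock word it watches."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self.llbit = 0
        self.lock_word = 0

    def ll(self) -> int:
        """Load-linked: read the lock word and set the LLBit."""
        with self._cond:
            self.llbit = 1
            return self.lock_word

    def store(self, value: int) -> None:
        """A store to the linked location clears the LLBit and wakes PAUSEd waiters."""
        with self._cond:
            self.lock_word = value
            self.llbit = 0
            self._cond.notify_all()

    def pause(self, timeout: Optional[float] = None) -> bool:
        """PAUSE: if LLBit != 0, block until it clears; otherwise act as a NOP.

        Returns False if the LLBit is still set when the wait gives up.
        """
        with self._cond:
            return self._cond.wait_for(lambda: self.llbit == 0, timeout=timeout)
```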

Exceptions:
None

Programming Notes:
The PAUSE instruction is intended to be inserted into the instruction stream after an LL instruction has set the LLBit and found the software lock set. The program may wait forever if a PAUSE instruction is executed and there is no possibility that the LLBit will ever be cleared.

An example use of the PAUSE instruction is shown below:

```asm
acquire_lock:
    ll    t0, 0(a0)              /* Read software lock, set hardware lock */
    bnezc t0, acquire_lock_retry /* Branch if software lock is taken; Release 6 branch */
    addiu t0, t0, 1              /* Set the software lock */
    sc    t0, 0(a0)              /* Try to store the software lock */
    bnezc t0, 10f                /* Branch if lock acquired successfully */
    nop                          /* Forbidden slot: PAUSE is not allowed here */
acquire_lock_retry:
    pause                        /* Wait for LLBit to clear before retry */
    bc    acquire_lock           /* and retry the operation; Release 6 branch */
10:
    sync                         /* Order the critical region after the acquire */

    /* Critical region code */

release_lock:
    sync                         /* Order the critical region before the release */
    sw    zero, 0(a0)            /* Release software lock, clearing LLBit */
                                 /* for any PAUSEd waiters */
```
PLL.PS Pair Lower Lower

Format: PLL.PS fd, fs, ft

MIPS32 Release 2, removed in Release 6

Purpose: Pair Lower Lower
To merge a pair of paired single values with realignment.

Description: FPR[fd] ← lower(FPR[fs]) || lower(FPR[ft])
A new paired-single value is formed by catenating the lower single of FPR fs (bits 31..0) and the lower single of FPR ft (bits 31..0).
The move is non-arithmetic; it causes no IEEE 754 exceptions, and the FCSR.Cause and FCSR.Flags fields are not modified.

Restrictions:
The fields fs, ft, and fd must specify FPRs valid for operands of type PS. If the fields are not valid, the result is UNPREDICTABLE.
The result of this instruction is UNPREDICTABLE if the processor is executing in the FR=0 32-bit FPU register model. It is predictable if executing on a 64-bit FPU in the FR=1 mode, but not with FR=0, and not on a 32-bit FPU.

Availability and Compatibility:
This instruction has been removed in Release 6.

Operation:
\[
\text{StoreFPR}(fd, PS, \text{ValueFPR}(fs, PS)_{31..0} || \text{ValueFPR}(ft, PS)_{31..0})
\]
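The catenation in the Operation above can be sketched in Python on 64-bit FPR bit patterns. The sketch assumes the FR=1 64-bit register model, where each FPR holds both singles, and the function name is hypothetical. The lower single of fs lands in bits 63..32 of fd, and the lower single of ft lands in bits 31..0.

```python
MASK32 = 0xFFFFFFFF


def pll_ps(fs: int, ft: int) -> int:
    """PLL.PS on 64-bit FPR bit patterns: fd = fs[31..0] || ft[31..0]."""
    return ((fs & MASK32) << 32) | (ft & MASK32)
```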

Exceptions:
Coprocessor Unusable, Reserved Instruction
**PLU.PS**

**Pair Lower Upper**

**Format:** `PLU.PS fd, fs, ft`

**MIPS32 Release 2, removed in Release 6**

**Purpose:** Pair Lower Upper

To merge a pair of paired single values with realignment.

**Description:**

$FPR[fd] \leftarrow \text{lower}(FPR[fs]) \ || \ \text{upper}(FPR[ft])$

A new paired-single value is formed by catenating the lower single of $FPR[fs]$ (bits $31..0$) and the upper single of $FPR[ft]$ (bits $63..32$).

The move is non-arithmetic; it causes no IEEE 754 exceptions, and the $FCSR_{Cause}$ and $FCSR_{Flags}$ fields are not modified.

**Restrictions:**

The fields $fs$, $ft$, and $fd$ must specify FPRs valid for operands of type $PS$. If the fields are not valid, the result is **UNPREDICTABLE**.

The result of this instruction is **UNPREDICTABLE** if the processor is executing in the $FR=0$ 32-bit FPU register model. It is predictable if executing on a 64-bit FPU in the $FR=1$ mode, but not with $FR=0$, and not on a 32-bit FPU.

**Availability and Compatibility:**

This instruction has been removed in Release 6.

**Operation:**

\[
\text{StoreFPR}(fd, PS, \text{ValueFPR}(fs, PS)_{31..0} \ || \ \text{ValueFPR}(ft, PS)_{63..32})
\]

**Exceptions:**

Coprocessor Unusable, Reserved Instruction
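PLU.PS differs from a straight catenation only in which half of ft it takes. The Python sketch below works on 64-bit FPR bit patterns and assumes the FR=1 register model; the function name is hypothetical.

```python
def plu_ps(fs: int, ft: int) -> int:
    """PLU.PS on 64-bit FPR bit patterns: fd = fs[31..0] || ft[63..32]."""
    lower_fs = fs & 0xFFFFFFFF          # bits 31..0 of fs become fd[63..32]
    upper_ft = (ft >> 32) & 0xFFFFFFFF  # bits 63..32 of ft become fd[31..0]
    return (lower_fs << 32) | upper_ft
```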
PREF Prefetch

**Format:** `PREF hint,offset(base)`

**Purpose:** Prefetch

To move data between memory and cache.

**Description:** `prefetch_memory(GPR[base] + offset)`

Prefetch adds the signed `offset` to the contents of GPR `base` to form an effective byte address. The `hint` field supplies information about the way that the data is expected to be used.

Prefetch enables the processor to take some action, typically causing data to be moved to or from the cache, to improve program performance. The action taken for a specific PREF instruction is both system and context dependent. Any action, including doing nothing, is permitted as long as it does not change architecturally visible state or alter the meaning of a program. Implementations are expected either to do nothing, or to take an action that increases the performance of the program. The PrepareForStore function is unique in that it may modify the architecturally visible state.

Prefetch does not cause addressing-related exceptions, including TLB exceptions. If the address specified would cause an addressing exception, the exception condition is ignored and no data movement occurs. However, even if no data is moved, some action that is not architecturally visible, such as writeback of a dirty cache line, can take place.

It is implementation dependent whether a Bus Error or Cache Error exception is reported if such an error is detected as a byproduct of the action taken by the PREF instruction.

Prefetch neither generates a memory operation nor modifies the state of a cache line for a location with an uncached memory access type, whether this type is specified by the address segment (e.g., kseg1), the programmed cacheability and coherency attribute of a segment (e.g., the use of the `K0`, `KU`, or `K23` fields in the Config register), or the per-page cacheability and coherency attribute provided by the TLB.

If PREF results in a memory operation, the memory access type and cacheability and coherency attribute used for the operation are determined by the memory access type and cacheability and coherency attribute of the effective address, just as they would be if the memory operation had been caused by a load or store to the effective address.

For a cached location, the expected and useful action for the processor is to prefetch a block of data that includes the effective address. The size of the block and the level of the memory hierarchy it is fetched into are implementation specific.

In coherent multiprocessor implementations, if the effective address uses a coherent Cacheability and Coherency Attribute (CCA), then the instruction causes a coherent memory transaction to occur. This means a prefetch issued on one processor can cause data to be evicted from the cache in another processor.

The PREF instruction and the memory transactions which are sourced by the PREF instruction, such as cache refill or cache writeback, obey the ordering and completion rules of the SYNC instruction.
Table 5.2 Values of hint Field for PREF Instruction

<table>
<thead>
<tr>
<th>Value</th>
<th>Name</th>
<th>Data Use and Desired Prefetch Action</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>load</td>
<td>Use: Prefetched data is expected to be read (not modified). Action: Fetch data as if for a load.</td>
</tr>
<tr>
<td>1</td>
<td>store</td>
<td>Use: Prefetched data is expected to be stored or modified. Action: Fetch data as if for a store.</td>
</tr>
<tr>
<td>2</td>
<td>L1 LRU hint</td>
<td>Pre-Release 6: Reserved for Architecture. Release 6: Implementation dependent. This hint code marks the line as LRU in the L1 cache and thus preferred for next eviction. Implementations can choose to writeback and/or invalidate as long as no architectural state is modified.</td>
</tr>
<tr>
<td>4</td>
<td>load_streamed</td>
<td>Use: Prefetched data is expected to be read (not modified) but not reused extensively; it “streams” through cache. Action: Fetch data as if for a load and place it in the cache so that it does not displace data prefetched as “retained.”</td>
</tr>
<tr>
<td>5</td>
<td>store_streamed</td>
<td>Use: Prefetched data is expected to be stored or modified but not reused extensively; it “streams” through cache. Action: Fetch data as if for a store and place it in the cache so that it does not displace data prefetched as “retained.”</td>
</tr>
<tr>
<td>6</td>
<td>load_retained</td>
<td>Use: Prefetched data is expected to be read (not modified) and reused extensively; it should be “retained” in the cache. Action: Fetch data as if for a load and place it in the cache so that it is not displaced by data prefetched as “streamed.”</td>
</tr>
<tr>
<td>7</td>
<td>store_retained</td>
<td>Use: Prefetched data is expected to be stored or modified and reused extensively; it should be “retained” in the cache. Action: Fetch data as if for a store and place it in the cache so that it is not displaced by data prefetched as “streamed.”</td>
</tr>
<tr>
<td>8-15</td>
<td>L2 operation</td>
<td>Pre-Release 6: Reserved for Architecture. Release 6: In the Release 6 architecture, hint codes 8 - 15 are treated the same as hint codes 0 - 7 respectively, but operate on the L2 cache.</td>
</tr>
<tr>
<td>16-23</td>
<td>L3 operation</td>
<td>Pre-Release 6: Reserved for Architecture. Release 6: In the Release 6 architecture, hint codes 16 - 23 are treated the same as hint codes 0 - 7 respectively, but operate on the L3 cache.</td>
</tr>
<tr>
<td>24</td>
<td>Reserved for Architecture</td>
<td>Pre-Release 6: Unassigned by the Architecture - available for implementation-dependent use. Release 6: This hint code is not implemented in the Release 6 architecture and generates a Reserved Instruction exception (RI).</td>
</tr>
</tbody>
</table>
Table 5.2 Values of hint Field for PREF Instruction (Continued)

<table>
<thead>
<tr>
<th>Value</th>
<th>Name</th>
<th>Data Use and Desired Prefetch Action</th>
</tr>
</thead>
<tbody>
<tr>
<td>25</td>
<td>writeback_invalidate (also known as “nudge”)</td>
<td>Pre-Release 6: Use—Data is no longer expected to be used. Action—For a writeback cache, schedule a writeback of any dirty data. At the completion of the writeback, mark the state of any cache lines written back as invalid. If the cache line is not dirty, it is implementation dependent whether the state of the cache line is marked invalid or left unchanged. If the cache line is locked, no action is taken. Release 6: This hint code is not implemented in the Release 6 architecture and generates a Reserved Instruction exception (RI).</td>
</tr>
<tr>
<td>26-29</td>
<td>Reserved for Architecture</td>
<td>Pre-Release 6: Unassigned by the Architecture—available for implementation-dependent use. Release 6: These hints are not implemented in the Release 6 architecture and generate a Reserved Instruction exception (RI).</td>
</tr>
<tr>
<td>30</td>
<td>PrepareForStore</td>
<td>Pre-Release 6: Use—Prepare the cache for writing an entire line, without the overhead involved in filling the line from memory. Action—If the reference hits in the cache, no action is taken. If the reference misses in the cache, a line is selected for replacement, any valid and dirty victim is written back to memory, the entire line is filled with zero data, and the state of the line is marked as valid and dirty. Programming Note: Because the cache line is filled with zero data on a cache miss, software must not assume that this action, in and of itself, can be used as a fast bzero-type function. Release 6: This hint is not implemented in the Release 6 architecture and generates a Reserved Instruction exception (RI).</td>
</tr>
<tr>
<td>31</td>
<td>Reserved for Architecture</td>
<td>Pre-Release 6: Unassigned by the Architecture—available for implementation-dependent use. Release 6: This hint is not implemented in the Release 6 architecture and generates a Reserved Instruction exception (RI).</td>
</tr>
</tbody>
</table>
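The Release 6 rows of Table 5.2 follow a regular pattern: hints 8 to 15 and 16 to 23 repeat hints 0 to 7 for the L2 and L3 caches, and hints 24 to 31 signal a Reserved Instruction exception. The Python sketch below decodes a hint code on that basis. The exception class and function names are hypothetical. Hint 3 is not listed in the table, so its base action is returned as None.

```python
from typing import Optional, Tuple


class ReservedInstruction(Exception):
    """Hypothetical stand-in for the Reserved Instruction (RI) exception."""


# Base actions for hints 0-7 as listed in Table 5.2; hint 3 is not listed there.
BASE_HINTS = {
    0: "load",
    1: "store",
    2: "LRU hint",
    4: "load_streamed",
    5: "store_streamed",
    6: "load_retained",
    7: "store_retained",
}


def decode_pref_hint_r6(hint: int) -> Tuple[int, Optional[str]]:
    """Return (cache_level, base_action) for a Release 6 PREF hint code."""
    if not 0 <= hint <= 31:
        raise ValueError("hint is a 5-bit field")
    if hint >= 24:
        # Hints 24-31 are not implemented in Release 6 and signal RI.
        raise ReservedInstruction(f"PREF hint {hint}")
    level, base = divmod(hint, 8)
    # Hints 8-15 and 16-23 behave like hints 0-7 but target the L2 and L3 caches.
    return level + 1, BASE_HINTS.get(base)
```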

Restrictions:
None
This instruction does not produce an exception for a misaligned memory address, since it has no memory access size.

Availability and Compatibility:
This instruction has been recoded for Release 6.

Operation:
\[
\begin{align*}
\text{vAddr} & \leftarrow \text{GPR[base]} + \text{sign\_extend(offset)} \\
(\text{pAddr}, \text{CCA}) & \leftarrow \text{AddressTranslation(vAddr, DATA, LOAD)} \\
\text{Prefetch}(\text{CCA, pAddr, vAddr, DATA, hint})
\end{align*}
\]
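The first line of the Operation, the effective-address calculation, can be sketched in Python. The sketch assumes a 32-bit address space, and the function names are hypothetical; address translation and the prefetch action itself are not modelled. A 9-bit offset, as used by Release 6 PREF, covers -256 to 255. A 16-bit offset, as used by earlier releases, covers -32768 to 32767.

```python
def sign_extend(value: int, bits: int) -> int:
    """Sign-extend the low `bits` bits of value to a Python int."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def pref_vaddr(base: int, offset_field: int, offset_bits: int = 9) -> int:
    """vAddr <- GPR[base] + sign_extend(offset), wrapped to 32 bits.

    offset_bits is 9 for Release 6 PREF and 16 for earlier releases.
    """
    return (base + sign_extend(offset_field, offset_bits)) & 0xFFFFFFFF
```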

Exceptions:
Bus Error, Cache Error
Prefetch does not take any TLB-related or address-related exceptions under any circumstances.

Programming Notes:
In the Release 6 architecture, hint codes 2:3, 10:11, and 18:19 behave as a NOP if not implemented. Hint codes 24:31 are not implemented (treated as reserved) and always signal a Reserved Instruction exception (RI).

In the PREF instruction encoding, Release 6 implements a 9-bit offset, whereas all releases of the MIPS architecture before Release 6 implement a 16-bit offset.

Prefetch cannot move data to or from a mapped location unless the translation for that location is present in the TLB. Locations in memory pages that have not been accessed recently may not have translations in the TLB, so prefetch may not be effective for such locations.

Prefetch does not cause addressing exceptions, so a prefetch may be issued using an address pointer before the validity of that pointer has been determined, without risk of an addressing exception.

It is implementation dependent whether a Bus Error or Cache Error exception is reported if such an error is detected as a byproduct of the action taken by the PREF instruction. Typically, this only occurs in systems which have high-reliability requirements.

Prefetch operations have no effect on cache lines that were previously locked with the CACHE instruction.

*Hint* field encodings whose function is described as “streamed” or “retained” convey usage intent from software to hardware. Software should not assume that hardware will always prefetch data in an optimal way. If data is to be truly retained, software should use the CACHE instruction to lock data into the cache.
Format:  \texttt{PREFE} \texttt{hint,offset(base)}

Purpose:  
Prefetch EVA

To move data between user mode virtual address space memory and cache while operating in kernel mode.

Description:  \texttt{prefetch\_memory(GPR[base] + offset)}

\texttt{PREFE} adds the 9-bit signed \texttt{offset} to the contents of \texttt{GPR base} to form an effective byte address. The \texttt{hint} field supplies information about the way that the data is expected to be used.

\texttt{PREFE} enables the processor to take some action, causing data to be moved to or from the cache, to improve program performance. The action taken for a specific \texttt{PREFE} instruction is both system and context dependent. Any action, including doing nothing, is permitted as long as it does not change architecturally visible state or alter the meaning of a program. Implementations are expected either to do nothing, or to take an action that increases the performance of the program. The \texttt{PrepareForStore} function is unique in that it may modify the architecturally visible state.

\texttt{PREFE} does not cause addressing-related exceptions, including TLB exceptions. If the address specified would cause an addressing exception, the exception condition is ignored and no data movement occurs. However, even if no data is moved, some action that is not architecturally visible, such as writeback of a dirty cache line, can take place.

It is implementation dependent whether a Bus Error or Cache Error exception is reported if such an error is detected as a byproduct of the action taken by the \texttt{PREFE} instruction.

\texttt{PREFE} neither generates a memory operation nor modifies the state of a cache line for a location with an \textit{uncached} memory access type, whether this type is specified by the address segment (for example, kseg1), the programmed cacheability and coherency attribute of a segment (for example, the use of the \texttt{K0}, \texttt{KU}, or \texttt{K23} fields in the \texttt{Config} register), or the per-page cacheability and coherency attribute provided by the TLB.

If \texttt{PREFE} results in a memory operation, the memory access type and cacheability & coherency attribute used for the operation are determined by the memory access type and cacheability & coherency attribute of the effective address, just as it would be if the memory operation had been caused by a load or store to the effective address.

For a cached location, the expected and useful action for the processor is to prefetch a block of data that includes the effective address. The size of the block and the level of the memory hierarchy it is fetched into are implementation specific.

In coherent multiprocessor implementations, if the effective address uses a coherent Cacheability and Coherency Attribute (CCA), then the instruction causes a coherent memory transaction to occur. This means a prefetch issued on one processor can cause data to be evicted from the cache in another processor.

The \texttt{PREFE} instruction and the memory transactions which are sourced by the \texttt{PREFE} instruction, such as cache refill or cache writeback, obey the ordering and completion rules of the SYNC instruction.

The \texttt{PREFE} instruction functions in exactly the same fashion as the \texttt{PREF} instruction, except that address translation is performed using the user mode virtual address space mapping in the TLB when accessing an address within a memory segment configured to use the MUSUK access mode. Memory segments using UUSK or MUSK access modes are also accessible. Refer to Volume III, Enhanced Virtual Addressing section for additional information.

Implementation of this instruction is specified by the $\text{Config5}_{\text{EVA}}$ field being set to one.
### Table 5.3 Values of *hint* Field for PREFE Instruction

<table>
<thead>
<tr>
<th>Value</th>
<th>Name</th>
<th>Data Use and Desired Prefetch Action</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>load</td>
<td>Use: Prefetched data is expected to be read (not modified). Action: Fetch data as if for a load.</td>
</tr>
<tr>
<td>1</td>
<td>store</td>
<td>Use: Prefetched data is expected to be stored or modified. Action: Fetch data as if for a store.</td>
</tr>
<tr>
<td>2</td>
<td>L1 LRU hint</td>
<td>Pre-Release 6: Reserved for Architecture. Release 6: Implementation dependent. This hint code marks the line as LRU in the L1 cache and thus preferred for next eviction. Implementations can choose to writeback and/or invalidate as long as no architectural state is modified.</td>
</tr>
<tr>
<td>4</td>
<td>load_streamed</td>
<td>Use: Prefetched data is expected to be read (not modified) but not reused extensively; it “streams” through cache. Action: Fetch data as if for a load and place it in the cache so that it does not displace data prefetched as “retained.”</td>
</tr>
<tr>
<td>5</td>
<td>store_streamed</td>
<td>Use: Prefetched data is expected to be stored or modified but not reused extensively; it “streams” through cache. Action: Fetch data as if for a store and place it in the cache so that it does not displace data prefetched as “retained.”</td>
</tr>
<tr>
<td>6</td>
<td>load_retained</td>
<td>Use: Prefetched data is expected to be read (not modified) and reused extensively; it should be “retained” in the cache. Action: Fetch data as if for a load and place it in the cache so that it is not displaced by data prefetched as “streamed.”</td>
</tr>
<tr>
<td>7</td>
<td>store_retained</td>
<td>Use: Prefetched data is expected to be stored or modified and reused extensively; it should be “retained” in the cache. Action: Fetch data as if for a store and place it in the cache so that it is not displaced by data prefetched as “streamed.”</td>
</tr>
<tr>
<td>8-15</td>
<td>L2 operation</td>
<td>Pre-Release 6: Reserved for Architecture. Release 6: Hint codes 8 - 15 are treated the same as hint codes 0 - 7 respectively, but operate on the L2 cache.</td>
</tr>
<tr>
<td>16-23</td>
<td>L3 operation</td>
<td>Pre-Release 6: Reserved for Architecture. Release 6: Hint codes 16 - 23 are treated the same as hint codes 0 - 7 respectively, but operate on the L3 cache.</td>
</tr>
<tr>
<td>24</td>
<td>Reserved for Architecture</td>
<td>Pre-Release 6: Unassigned by the Architecture - available for implementation-dependent use. Release 6: This hint code is not implemented in the Release 6 architecture and generates a Reserved Instruction exception (RI).</td>
</tr>
</tbody>
</table>
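In Release 6 the hint ranges in Table 5.3 follow a simple pattern: the value divided by 8 selects the cache level, and the low three bits select the corresponding L1 action. The following C sketch decodes a 5-bit hint under that reading. The function names are hypothetical, and the sketch does not model which individual codes (for example, 3, which the table does not list) an implementation supports.

```c
/* Hypothetical decoder for the Release 6 hint encoding in Table 5.3.
   Returns the cache level (1, 2 or 3) that a hint targets, or 0 for
   hints 24..31, which signal a Reserved Instruction exception in Release 6. */
unsigned prefe_hint_cache_level(unsigned hint) {
    if (hint >= 24)
        return 0;
    return hint / 8 + 1;   /* 0..7 -> L1, 8..15 -> L2, 16..23 -> L3 */
}

/* The L1 hint code (0..7) whose use and action an L2 or L3 hint mirrors. */
unsigned prefe_hint_base_action(unsigned hint) {
    return hint % 8;
}
```

For example, hint 13 decodes to level 2 with base action 5, that is, store_streamed applied to the L2 cache.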
### Restrictions:
Only usable when access to Coprocessor 0 is enabled and when accessing an address within a segment configured using UUSK, MUSK or MUSUK access mode.

This instruction does not produce an exception for a misaligned memory address, since it has no memory access size.

### Operation:

\[
\begin{aligned}
\text{vAddr} &\leftarrow \text{GPR}[\text{base}] + \text{sign\_extend}(\text{offset}) \\
(\text{pAddr}, \text{CCA}) &\leftarrow \text{AddressTranslation}(\text{vAddr}, \text{DATA}, \text{LOAD}) \\
&\text{Prefetch}(\text{CCA}, \text{pAddr}, \text{vAddr}, \text{DATA}, \text{hint})
\end{aligned}
\]

### Exceptions:
Bus Error, Cache Error, Address Error, Reserved Instruction, Coprocessor Unusable

Prefetch does not take any TLB-related or address-related exceptions under any circumstances.

### Programming Notes:
In the Release 6 architecture, hint codes 0:23 behave as a NOP and never signal a Reserved Instruction exception (RI). Hint codes 24:31 are not implemented (treated as reserved) and always signal a Reserved Instruction exception (RI).

Prefetch cannot move data to or from a mapped location unless the translation for that location is present in the TLB. Locations in memory pages that have not been accessed recently may not have translations in the TLB, so prefetch may not be effective for such locations.

Prefetch does not cause addressing exceptions. A prefetch may therefore be issued through an address pointer before the validity of the pointer has been determined, without risk of an addressing exception.
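This property is what makes speculative pointer-chasing prefetches safe. A minimal C sketch, assuming a GCC- or Clang-compatible compiler: `__builtin_prefetch` is documented not to fault when its address is invalid, and on MIPS targets such compilers typically lower it to a PREF instruction (ordinary user code uses PREF; PREFE is for kernel code accessing user space). The hint code actually chosen is compiler-dependent, and the list type below is illustrative.

```c
#include <stddef.h>

/* Illustrative singly linked list node. */
struct node {
    struct node *next;
    long value;
};

long sum_list(const struct node *n) {
    long total = 0;
    while (n != NULL) {
        /* Request the next node before it is needed. n->next may be NULL
           or otherwise invalid; a prefetch of it does not fault. */
        __builtin_prefetch(n->next, 0 /* read */, 3 /* high locality */);
        total += n->value;
        n = n->next;
    }
    return total;
}
```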

It is implementation dependent whether a Bus Error or Cache Error exception is reported if such an error is detected as a byproduct of the action taken by the PREFE instruction. Typically, this only occurs in systems which have high-reliability requirements.

Prefetch operations have no effect on cache lines that were previously locked with the CACHE instruction.

*Hint* field encodings whose function is described as “streamed” or “retained” convey usage intent from software to hardware. Software should not assume that hardware will always prefetch data in an optimal way. If data is to be truly retained, software should use the CACHE instruction to lock data into the cache.
Prefetch Indexed

**Format:**

```
PREFX hint, index(base)
```

**Purpose:**

Prefetch Indexed

To move data between memory and cache.

**Description:**

```
prefetch_memory[GPR[base] + GPR[index]]
```

*PREFX* adds the contents of GPR `index` to the contents of GPR `base` to form an effective byte address. The `hint` field supplies information about the way the data is expected to be used.

The only functional difference between the *PREF* and *PREFX* instructions is the addressing mode implemented by the two. Refer to the *PREF* instruction for all other details, including the encoding of the `hint` field.

**Restrictions:**

**Availability and Compatibility:**

Required in all versions of MIPS64 since MIPS64 Release 1. Not available in MIPS32 Release 1. Required by MIPS32 Release 2 and subsequent versions of MIPS32. When required, required whenever FPU is present, whether a 32-bit or 64-bit FPU, whether in 32-bit or 64-bit FP Register Mode ($FIR_{F64}=0$ or 1, $Status_{FR}=0$ or 1).

This instruction has been removed in Release 6.

**Operation:**

```
vAddr ← GPR[base] + GPR[index]
(pAddr, CCA) ← AddressTranslation(vAddr, DATA, LOAD)
Prefetch(CCA, pAddr, vAddr, DATA, hint)
```

**Exceptions:**

Coprocessor Unusable, Reserved Instruction, Bus Error, Cache Error

**Programming Notes:**

The *PREFX* instruction is only available on processors that implement floating point and should never be generated by compilers in situations other than those in which the corresponding load and store indexed floating point instructions are generated.

Refer to the corresponding section in the *PREF* instruction description.
**Format:** \texttt{PUL.PS fd, fs, ft}  
MIPS64, MIPS32 Release 2, removed in Release 6

**Purpose:** Pair Upper Lower  
To merge a pair of paired single values with realignment.

**Description:** \(\text{FPR}[fd] \leftarrow \text{upper(FPR[fs])} \mid\mid \text{lower(FPR[ft])}\)  
A new paired-single value is formed by catenating the upper single of FPR \(fs\) (bits 63..32) and the lower single of FPR \(ft\) (bits 31..0).  
The move is non-arithmetic; it causes no IEEE 754 exceptions, and the \(FCSR_{\text{Cause}}\) and \(FCSR_{\text{Flags}}\) fields are not modified.

**Restrictions:**  
The fields \(fs\), \(ft\), and \(fd\) must specify FPRs valid for operands of type \(PS\). If the fields are not valid, the result is \texttt{UNPREDICTABLE}.
The result of this instruction is \texttt{UNPREDICTABLE} if the processor is executing in the \(FR=0\) 32-bit FPU register model. It is predictable if executing on a 64-bit FPU in the \(FR=1\) mode, but not with \(FR=0\), and not on a 32-bit FPU.

**Availability and Compatibility:**  
This instruction has been removed in Release 6.

**Operation:**  
\(\text{StoreFPR}(fd, PS, \text{ValueFPR}(fs, PS)_{63..32} \mid\mid \text{ValueFPR}(ft, PS)_{31..0})\)
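Because the move is a pure bit rearrangement, its effect on a 64-bit FPR image can be sketched in C with masks. This assumes the FPR contents are available as a `uint64_t` with the upper single in bits 63..32; the function name is illustrative.

```c
#include <stdint.h>

/* PUL.PS on raw 64-bit FPR images: keep the upper single (bits 63..32)
   of fs and the lower single (bits 31..0) of ft. No arithmetic is done. */
uint64_t pul_ps(uint64_t fs, uint64_t ft) {
    return (fs & 0xFFFFFFFF00000000ULL) | (ft & 0x00000000FFFFFFFFULL);
}
```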

**Exceptions:**  
Coprocessor Unusable, Reserved Instruction
PUU.PS Pair Upper Upper

Format:  

\[
\text{PUU.PS } fd, fs, ft
\]

Purpose:  
To merge a pair of paired single values with realignment.

Description:  
\[
\text{FPR}[fd] \leftarrow \text{upper(FPR}[fs]) \mid \mid \text{upper(FPR}[ft])
\]
A new paired-single value is formed by catenating the upper single of FPR \(fs\) (bits 63..32) and the upper single of FPR \(ft\) (bits 63..32).

The move is non-arithmetic; it causes no IEEE 754 exceptions, and the \(FCSR_{Cause}\) and \(FCSR_{Flags}\) fields are not modified.

Restrictions:  
The fields \(fs\), \(ft\), and \(fd\) must specify FPRs valid for operands of type \(PS\). If the fields are not valid, the result is \texttt{UNPREDICTABLE}.

The result of this instruction is \texttt{UNPREDICTABLE} if the processor is executing in the \(FR=0\) 32-bit FPU register model. It is predictable if executing on a 64-bit FPU in the \(FR=1\) mode, but not with \(FR=0\), and not on a 32-bit FPU.

Availability and Compatibility:  
This instruction has been removed in Release 6.

Operation:  
\[
\text{StoreFPR}(fd, PS, \text{ValueFPR}(fs, PS)_{63..32} \mid \mid \text{ValueFPR}(ft, PS)_{63..32})
\]
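A minimal C sketch of this bit movement on a 64-bit FPR image (an assumed `uint64_t` representation with the upper single in bits 63..32; the function name is hypothetical): the upper word of ft is shifted down into the lower half of the result.

```c
#include <stdint.h>

/* PUU.PS on raw 64-bit FPR images: upper single of fs goes to bits 63..32,
   upper single of ft is moved down to bits 31..0. */
uint64_t puu_ps(uint64_t fs, uint64_t ft) {
    return (fs & 0xFFFFFFFF00000000ULL) | (ft >> 32);
}
```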

Exceptions:  
Coprocessor Unusable, Reserved Instruction
RDHWR

Read Hardware Register

Format: \texttt{RDHWR rt,rd,sel}

MIPS32 Release 2

Purpose: Read Hardware Register

To move the contents of a hardware register to a general purpose register (GPR) if that operation is enabled by privileged software.

The purpose of this instruction is to give user mode access to specific information that is otherwise only visible in kernel mode.

In Release 6, a \textit{sel} field has been added to allow a register with multiple instances to be read selectively. Specifically, it is used for \textit{PerfCtr}.

Description: \(\text{GPR}[rt] \leftarrow \text{HWR}[rd]\); in Release 6, \(\text{GPR}[rt] \leftarrow \text{HWR}[rd, sel]\)

If access is allowed to the specified hardware register, the contents of the register specified by \textit{rd} (and, in Release 6, optionally \textit{sel}) are loaded into general register \textit{rt}. Access control for each register is selected by the bits in the coprocessor 0 \texttt{HWREna} register.

The available hardware registers, and the encoding of the \textit{rd} field for each, are shown in Table 5.4.

\begin{table}[h]
\centering
\begin{tabular}{|c|l|p{0.6\linewidth}|}
\hline
Register Number & Mnemonic & Description \\
\hline
0 & CPUNum & Number of the CPU on which the program is currently running. This register provides read access to the coprocessor 0 \texttt{EBase}\textsubscript{CPUNum} field. \\
\hline
1 & SYNCI\_Step & Address step size to be used with the SYNCI instruction, or zero if no caches need be synchronized. See that instruction’s description for the use of this value. \\
\hline
2 & CC & High-resolution cycle counter. This register provides read access to the coprocessor 0 \texttt{Count} Register. \\
\hline
3 & CCRes & Resolution of the CC register. This value denotes the number of cycles between updates of the register. For example:
\begin{tabular}{|c|l|}
\hline
CCRes Value & Meaning \\
\hline
1 & CC register increments every CPU cycle \\
2 & CC register increments every second CPU cycle \\
3 & CC register increments every third CPU cycle \\
\hline
\end{tabular} \\
\hline
4 & PerfCtr & Performance Counter Pair. Even \textit{sel} selects the \textit{Control} register, while odd \textit{sel} selects the \textit{Counter} register in the pair. The value of \textit{sel} corresponds to the value of \textit{sel} used by MFC0 to read the COP0 register. \\
\hline
\end{tabular}
\end{table}

Copyright © 2015 Imagination Technologies LTD. and/or its Affiliated Group Companies. All rights reserved.
Table 5.4 RDHWR Register Numbers

<table>
<thead>
<tr>
<th>Register Number (rd Value)</th>
<th>Mnemonic</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>5</td>
<td>XNP</td>
<td>Indicates support for the Release 6 Double-Width LLX/SCX family of instructions. If set to 1, the LLX/SCX family of instructions is not present; otherwise, it is present in the implementation. In the absence of hardware support for double-width or extended atomics, user software may emulate the instructions’ behavior through other means. See Config5<sub>XNP</sub>.</td>
</tr>
<tr>
<td>6-28</td>
<td></td>
<td>These register numbers are reserved for future architecture use. Access results in a Reserved Instruction Exception.</td>
</tr>
<tr>
<td>29</td>
<td>ULR</td>
<td>User Local Register. This register provides read access to the coprocessor 0 UserLocal register, if it is implemented. In some operating environments, the UserLocal register is a pointer to a thread-specific storage block.</td>
</tr>
<tr>
<td>30-31</td>
<td></td>
<td>These register numbers are reserved for implementation-dependent use. If they are not implemented, access results in a Reserved Instruction Exception.</td>
</tr>
</tbody>
</table>
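The CC and CCRes registers are meant to be read together: CC advances once every CCRes CPU cycles. The following C sketch converts two CC readings into elapsed CPU cycles. It assumes the raw readings have been truncated to the 32-bit width of the coprocessor 0 Count register and that at most one wrap-around occurred between them; the function name is hypothetical.

```c
#include <stdint.h>

/* Elapsed CPU cycles between two CC (HWR 2) readings, given CCRes (HWR 3).
   Unsigned 32-bit subtraction yields the correct tick count across a
   single wrap of the counter. */
uint64_t cc_elapsed_cycles(uint32_t cc_start, uint32_t cc_end, uint32_t ccres) {
    uint32_t ticks = cc_end - cc_start;
    return (uint64_t)ticks * ccres;
}
```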

Restrictions:

In implementations of Release 1 of the Architecture, this instruction resulted in a Reserved Instruction Exception.

Access to the specified hardware register is enabled if Coprocessor 0 is enabled, or if the corresponding bit is set in the HWREna register. If access is not allowed or the register is not implemented, a Reserved Instruction Exception is signaled.

In Release 6, a Reserved Instruction Exception is signaled when the value of the 3-bit sel field is not defined for the specified register number.
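A hypothetical C model of the access rule above: the read is permitted when Coprocessor 0 is enabled or when the HWREna bit corresponding to the register number is set, and an unimplemented register always raises a Reserved Instruction exception. The `implemented` mask is an illustrative parameter, and the Release 6 sel check is not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if RDHWR of register rd is allowed; false means a
   Reserved Instruction exception would be signaled. Bit n of hwrena
   and of implemented corresponds to hardware register n. */
bool rdhwr_allowed(bool cp0_enabled, uint32_t hwrena,
                   uint32_t implemented, unsigned rd) {
    if (rd > 31 || !((implemented >> rd) & 1u))
        return false;                         /* not implemented: RI */
    return cp0_enabled || ((hwrena >> rd) & 1u) != 0;
}
```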

Availability and Compatibility:

This instruction has been recoded for Release 6. The instruction supports a sel field in Release 6.

Operation:

```
if ((rd ≠ 4) and (sel = 0)) then
    case rd
        0:  temp ← EBaseCPUNum
        1:  temp ← SYNCI_StepSize()
        2:  temp ← Count
        3:  temp ← CountResolution()
        5:  temp ← Config5XNP                      // Release 6 only
        29: temp ← UserLocal
        30: temp ← Implementation-Dependent-Value
        31: temp ← Implementation-Dependent-Value
        otherwise: SignalException(ReservedInstruction)
    endcase
elseif ((rd = 4) and (sel = defined)) then         // PerfCtr, Release 6 only
    temp ← PerfCtr[sel]
else
    SignalException(ReservedInstruction)
endif
GPR[rt] ← temp
```
Exceptions:
Reserved Instruction

For a register that does not require sel, the compiler must support an assembly syntax without sel, that is, ‘RDHWR rt, rd’. For pre-Release 6 register numbers, which do not use sel, specifying sel as 0 is also valid, that is, ‘RDHWR rt, rd, 0’.
**RDPGPR**

**Read GPR from Previous Shadow Set**

**Format:**

<table>
<thead>
<tr>
<th>COP0</th>
<th>RDPGPR</th>
<th>rt</th>
<th>rd</th>
</tr>
</thead>
<tbody>
<tr>
<td>010000</td>
<td>01010</td>
<td>5</td>
<td>5</td>
</tr>
</tbody>
</table>

**Purpose:** Read GPR from Previous Shadow Set

To move the contents of a GPR from the previous shadow set to a current GPR.

**Description:**

\[
\text{GPR}[rd] \leftarrow \text{SGPR}[\text{SRSCtl}_{\text{PSS}}, rt]
\]

The contents of the shadow GPR register specified by \(\text{SRSCtl}_{\text{PSS}}\) (signifying the previous shadow set number) and \(rt\) (specifying the register number within that set) are moved to the current GPR \(rd\).

**Restrictions:**

In implementations prior to Release 2 of the Architecture, this instruction resulted in a Reserved Instruction exception.

**Operation:**

\[
\text{GPR}[rd] \leftarrow \text{SGPR}[\text{SRSCtl}_{\text{PSS}}, rt]
\]
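The shadow-set indexing can be modelled with a two-dimensional register array. This C sketch is hypothetical: it assumes 16 shadow sets and takes the positions of SRSCtl.PSS (bits 9..6) and SRSCtl.CSS (bits 3..0, the current set) from the SRSCtl description in Volume III; the types and function names are illustrative only.

```c
#include <stdint.h>

#define NUM_SHADOW_SETS 16   /* assumption: 4-bit shadow set fields */

typedef struct {
    uint64_t sgpr[NUM_SHADOW_SETS][32];   /* all shadow register sets */
    uint32_t srsctl;                      /* SRSCtl register image */
} cpu_model;

static unsigned srsctl_pss(uint32_t srsctl) { return (srsctl >> 6) & 0xFu; }
static unsigned srsctl_css(uint32_t srsctl) { return srsctl & 0xFu; }

/* RDPGPR rd, rt: GPR[rd] of the current set <- SGPR[PSS, rt]. */
void rdpgpr(cpu_model *cpu, unsigned rd, unsigned rt) {
    uint64_t value = cpu->sgpr[srsctl_pss(cpu->srsctl)][rt];
    if (rd != 0)   /* GPR 0 is hardwired to zero */
        cpu->sgpr[srsctl_css(cpu->srsctl)][rd] = value;
}
```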

**Exceptions:**

- Coprocessor Unusable
- Reserved Instruction
**Reciprocal Approximation**

**Purpose:** Reciprocal Approximation

To approximate the reciprocal of an FP value (quickly).

**Description:** \( \text{FPR}[fd] \leftarrow 1.0 / \text{FPR}[fs] \)

The reciprocal of the value in FPR \( fs \) is approximated and placed into FPR \( fd \). The operand and result are values in format \( fmt \).

The numeric accuracy of this operation is implementation dependent. It does not meet the accuracy specified by the IEEE 754 Floating Point standard. The computed result differs from both the exact result and the IEEE-mandated representation of the exact result by no more than one unit in the least-significant place (ULP).
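To check the one-ULP bound in software, one can measure how many representable values separate an approximate result from the correctly rounded one. This C sketch assumes IEEE single-precision `float` and finite inputs; the function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Map a float onto an integer line on which adjacent representable floats
   are exactly 1 apart (+0.0 and -0.0 both map to 0). */
static int64_t float_order(float f) {
    int32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits < 0 ? (int64_t)INT32_MIN - bits : (int64_t)bits;
}

/* Distance in units in the last place between two finite floats. */
uint64_t ulp_distance(float a, float b) {
    int64_t d = float_order(a) - float_order(b);
    return (uint64_t)(d < 0 ? -d : d);
}
```

For a RECIP.S result `r` computed from input `x`, the documented bound corresponds to `ulp_distance(r, 1.0f / x) <= 1`, since `1.0f / x` is the correctly rounded IEEE quotient.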

It is implementation dependent whether the result is affected by the current rounding mode in \( FCSR \).

**Restrictions:**

The fields \( fs \) and \( fd \) must specify FPRs valid for operands of type \( fmt \). If the fields are not valid, the result is UNPREDICTABLE.

The operand must be a value in format \( fmt \); if it is not, the result is UNPREDICTABLE and the value of the operand FPR becomes UNPREDICTABLE.

**Availability and Compatibility:**

RECIP.S and RECIP.D: Required in all versions of MIPS64 since MIPS64 Release 1. Not available in MIPS32 Release 1. Required in MIPS32 Release 2 and all subsequent versions of MIPS32. When required, required whenever FPU is present, whether a 32-bit or 64-bit FPU, whether in 32-bit or 64-bit FP Register Mode \( (FIR_{F64}=0 \text{ or } 1, Status_{FR}=0 \text{ or } 1) \).

**Operation:**

\[
\text{StoreFPR}(fd, fmt, 1.0 / \text{valueFPR}(fs, fmt))
\]

**Exceptions:**

Coprocessor Unusable, Reserved Instruction

**Floating Point Exceptions:**

Inexact, Division-by-zero, Unimplemented Op, Invalid Op, Overflow, Underflow
**Purpose:** Floating-Point Round to Integral

Scalar floating-point round to integral floating point value.

**Description:**

\[
\text{FPR}[fd] \leftarrow \text{round\_int}(\text{FPR}[fs])
\]

The scalar floating-point value in the register \(fs\) is rounded to an integral valued floating-point number in the same format based on the rounding mode bits RM in the FPU Control and Status Register \(FCSR\). The result is written to \(fd\).

The operands and results are values in floating-point data format \(fmt\).

The \text{RINT}.fmt instruction corresponds to the \text{roundToIntegralExact} operation in the IEEE Standard for Floating-Point Arithmetic 754\textsuperscript{TM}-2008. The Inexact exception is signaled if the result does not have the same numerical value as the input operand.

The floating point scalar instruction RINT.fmt corresponds to the MSA vector instruction FRINT.df. That is, RINT.S corresponds to FRINT.W, and RINT.D corresponds to FRINT.D.

**Restrictions:**

Data-dependent exceptions are possible as specified by the IEEE Standard for Floating-Point Arithmetic 754\textsuperscript{TM}-2008.

**Availability and Compatibility:**

This instruction is introduced by and required as of Release 6.

**Operation:**

\[
\text{RINT}.fmt:
\]

```
if not IsCoprocessorEnabled(1) then
    SignalException(CoprocessorUnusable, 1)
endif
if not IsFloatingPointImplemented(fmt) then
    SignalException(ReservedInstruction)
endif

fin ← ValueFPR(fs, fmt)
ftmp ← RoundIntFP(fin, fmt)
if (fin ≠ ftmp) then
    SignalFPEException(Inexact)
endif
StoreFPR(fd, fmt, ftmp)
```

```c
function RoundIntFP(tt, n)
   /* Round to integer operation, using rounding mode FCSR.RM*/
   endfunction RoundIntFP
```
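RoundIntFP is left abstract above. For fmt = D and the default round-to-nearest/even mode, one well-known software technique adds and then subtracts 2^52 so that the FPU itself performs the rounding. The C sketch below assumes IEEE double arithmetic and the host's default rounding mode; its function name is hypothetical, and it does not model other FCSR.RM settings or signaling NaNs.

```c
/* Hypothetical model of RoundIntFP for fmt = D with round-to-nearest/even.
   Sets *inexact when the result differs from the input, mirroring the
   Inexact check in the pseudocode. */
double round_int_rne(double x, int *inexact) {
    const double two52 = 4503599627370496.0;   /* 2^52 */
    double ax = x < 0 ? -x : x;
    double r;
    if (x != x || x == 0 || ax >= two52) {
        r = x;   /* NaN, signed zero, infinity, or already integral */
    } else {
        /* Adding 2^52 pushes all fraction bits out of the significand, so
           the addition itself rounds to nearest/even. The volatile store
           forces rounding to double before the exact subtraction. */
        volatile double t = ax + two52;
        r = t - two52;
        if (x < 0)
            r = -r;   /* restore the sign, giving -0.0 for small negatives */
    }
    *inexact = (x == x) && (r != x);
    return r;
}
```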

**Exceptions:**

Coprocessor Unusable, Reserved Instruction
Floating Point Exceptions:

Unimplemented Operation, Invalid Operation, Inexact, Overflow, Underflow
Format: \texttt{ROTR rd, rt, sa}

Purpose: Rotate Word Right

To execute a logical right-rotate of a word by a fixed number of bits.

Description: \(\text{GPR}[rd] \leftarrow \text{GPR}[rt]\) rotated right by \(sa\)

The contents of the low-order 32-bit word of GPR \textit{rt} are rotated right; the word result is placed in GPR \textit{rd}. The bit-rotate amount is specified by \textit{sa}.

Restrictions:

Operation:

\begin{verbatim}
if ((ArchitectureRevision() < 2) and (Config3SM = 0)) then
  UNPREDICTABLE
endif
s ← sa
temp ← GPR[rt]_{s-1..0} || GPR[rt]_{31..s}
GPR[rd] ← temp
\end{verbatim}
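The concatenation GPR[rt]_{s-1..0} || GPR[rt]_{31..s} is the familiar shift-and-or rotate: the low s bits move to the top of the word. A C sketch of the 32-bit word result (the function name is illustrative):

```c
#include <stdint.h>

/* Rotate a 32-bit word right by sa bits. sa is masked to 5 bits, matching
   the 5-bit sa field of ROTR and the low-order 5 bits of GPR[rs] used by
   ROTRV. The sa == 0 case avoids an undefined shift by 32 in C. */
uint32_t rotr32(uint32_t x, unsigned sa) {
    sa &= 31u;
    return sa == 0 ? x : (x >> sa) | (x << (32u - sa));
}
```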

Exceptions:

Reserved Instruction
ROTRV  Rotate Word Right Variable

Format:  ROTRV rd, rt, rs

Purpose:  Rotate Word Right Variable
To execute a logical right-rotate of a word by a variable number of bits.

Description:  GPR[rd] ← GPR[rt] rotated right by GPR[rs]
The contents of the low-order 32-bit word of GPR rt are rotated right; the word result is placed in GPR rd. The bit-rotate amount is specified by the low-order 5 bits of GPR rs.

Restrictions:

Operation:

\begin{verbatim}
if ((ArchitectureRevision() < 2) and (Config3SM = 0)) then
  UNPREDICTABLE
endif
s ← GPR[rs]_{4..0}
temp ← GPR[rt]_{s-1..0} || GPR[rt]_{31..s}
GPR[rd] ← temp
\end{verbatim}

Exceptions:
Reserved Instruction
ROUND.L.fmt
Floating Point Round to Long Fixed Point

Format:

<table>
<thead>
<tr>
<th>COP1</th>
<th>fmt</th>
<th>0</th>
<th>fs</th>
<th>fd</th>
<th>ROUND.L</th>
</tr>
</thead>
<tbody>
<tr>
<td>010001</td>
<td></td>
<td>00000</td>
<td></td>
<td></td>
<td>001000</td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>6</td>
</tr>
</tbody>
</table>

Purpose: Floating Point Round to Long Fixed Point
To convert an FP value to 64-bit fixed point, rounding to nearest.

Description: FPR[fd] ← convert_and_round(FPR[fs])
The value in FPR fs, in format fmt, is converted to a value in 64-bit long fixed point format and rounded to nearest/even (rounding mode 0). The result is placed in FPR fd.

When the source value is Infinity, NaN, or rounds to an integer outside the range \(-2^{63}\) to \(2^{63}-1\), the result cannot be represented correctly and an IEEE Invalid Operation condition exists. The Invalid Operation flag is set in the FCSR.
If the Invalid Operation Enable bit is set in the FCSR, no result is written to fd and an Invalid Operation exception is taken immediately. Otherwise, a default result is written to fd. On cores with \(FCSR_{\text{NAN2008}}=0\), the default result is \(2^{63}-1\). On cores with \(FCSR_{\text{NAN2008}}=1\), the default result is:
- 0 when the input value is NaN
- \(2^{63}-1\) when the input value is \(+\infty\) or rounds to a number larger than \(2^{63}-1\)
- \(-2^{63}\) when the input value is \(-\infty\) or rounds to a number smaller than \(-2^{63}\)

Restrictions:
The fields fs and fd must specify valid FPRs: fs for type fmt and fd for long fixed point. If the fields are not valid, the result is UNPREDICTABLE.
The operand must be a value in format fmt; if it is not, the result is UNPREDICTABLE and the value of the operand FPR becomes UNPREDICTABLE.
The result of this instruction is UNPREDICTABLE if the processor is executing in the FR=0 32-bit FPU register model. It is predictable if executing on a 64-bit FPU in the FR=1 mode, but not with FR=0, and not on a 32-bit FPU.

Operation:

StoreFPR(fd, L, ConvertFmt(ValueFPR(fs, fmt), fmt, L))

Exceptions:
Coprocessor Unusable, Reserved Instruction

Floating Point Exceptions:
Inexact, Unimplemented Operation, Invalid Operation
ROUND.W.fmt
Floating Point Round to Word Fixed Point

Format:
ROUND.W.fmt
ROUND.W.S fd, fs        MIPS32
ROUND.W.D fd, fs        MIPS32

Purpose:
Floating Point Round to Word Fixed Point
To convert an FP value to 32-bit fixed point, rounding to nearest.

Description:
FPR[fd] ← convert_and_round(FPR[fs])
The value in FPR fs, in format fmt, is converted to a value in 32-bit word fixed point format rounding to nearest/even (rounding mode 0). The result is placed in FPR fd.

When the source value is Infinity, NaN, or rounds to an integer outside the range \(-2^{31}\) to \(2^{31}-1\), the result cannot be represented correctly and an IEEE Invalid Operation condition exists. The Invalid Operation flag is set in the FCSR. If the Invalid Operation Enable bit is set in the FCSR, no result is written to fd and an Invalid Operation exception is taken immediately. Otherwise, a default result is written to fd. On cores with \(FCSR_{\text{NAN2008}}=0\), the default result is \(2^{31}-1\). On cores with \(FCSR_{\text{NAN2008}}=1\), the default result is:
- 0 when the input value is NaN
- \(2^{31}-1\) when the input value is \(+\infty\) or rounds to a number larger than \(2^{31}-1\)
- \(-2^{31}\) when the input value is \(-\infty\) or rounds to a number smaller than \(-2^{31}\)
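A hypothetical C model of these rules for ROUND.W.D on a core with FCSR_NAN2008 = 1 and the Invalid Operation exception disabled. It rounds to nearest/even by truncating toward zero and then adjusting on the exact fractional part. For negative overflow it returns INT32_MIN, the most negative value a 32-bit word can hold. The function name and the `invalid` out-parameter are illustrative, not an architectural interface.

```c
#include <stdbool.h>
#include <stdint.h>

int32_t round_w_nan2008(double x, bool *invalid) {
    *invalid = false;
    if (x != x) {                        /* NaN */
        *invalid = true;
        return 0;
    }
    if (x >= 2147483647.5) {             /* +inf, or rounds above 2^31 - 1 */
        *invalid = true;
        return INT32_MAX;
    }
    if (x < -2147483648.5) {             /* -inf, or rounds below -2^31 */
        *invalid = true;
        return INT32_MIN;
    }
    long long t = (long long)x;          /* truncate toward zero */
    double frac = x - (double)t;         /* exact, |frac| < 1 */
    if (frac > 0.5 || (frac == 0.5 && t % 2 != 0))
        t++;                             /* round up, ties to even */
    else if (frac < -0.5 || (frac == -0.5 && t % 2 != 0))
        t--;                             /* round down, ties to even */
    return (int32_t)t;
}
```

The boundary tests use 2147483647.5 and -2147483648.5 because a tie rounds to the even neighbor: 2147483647.5 rounds to 2^31 (out of range), while -2147483648.5 rounds to -2^31 (in range).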

Restrictions:
The fields fs and fd must specify valid FPRs: fs for type fmt and fd for word fixed point. If the fields are not valid, the result is UNPREDICTABLE.
The operand must be a value in format fmt; if it is not, the result is UNPREDICTABLE and the value of the operand FPR becomes UNPREDICTABLE.

Operation:
StoreFPR(fd, W, ConvertFmt(ValueFPR(fs, fmt), fmt, W))

Exceptions:
Coprocessor Unusable, Reserved Instruction

Floating Point Exceptions:
Inexact, Unimplemented Operation, Invalid Operation
RSQRT.fmt  Reciprocal Square Root Approximation

Format:

<table>
<thead>
<tr>
<th>COP1</th>
<th>fmt</th>
<th>0</th>
<th>fs</th>
<th>fd</th>
<th>RSQRT.fmt</th>
</tr>
</thead>
<tbody>
<tr>
<td>010001</td>
<td></td>
<td>00000</td>
<td></td>
<td></td>
<td>010110</td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>6</td>
</tr>
</tbody>
</table>

Purpose: Reciprocal Square Root Approximation

To approximate the reciprocal of the square root of an FP value (quickly).

Description:

\[ \text{FPR}[fd] \leftarrow \frac{1.0}{\sqrt{\text{FPR}[fs]}} \]

The reciprocal of the positive square root of the value in FPR \( fs \) is approximated and placed into FPR \( fd \). The operand and result are values in format \( fmt \).

The numeric accuracy of this operation is implementation dependent; it does not meet the accuracy specified by the IEEE 754 Floating Point standard. The computed result differs from both the exact result and the IEEE-mandated representation of the exact result by no more than two units in the least-significant place (ULP).
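Because the result is only an approximation, software that needs more accuracy commonly refines it. One common technique, which is not part of this instruction, is a Newton-Raphson step, which roughly doubles the number of correct bits of a good initial estimate. This C sketch assumes single precision and a positive, finite input; the function name is hypothetical.

```c
/* One Newton-Raphson step for y ~= 1/sqrt(x):
   y' = y * (1.5 - 0.5 * x * y * y). */
float rsqrt_refine(float x, float y) {
    return y * (1.5f - 0.5f * x * y * y);
}
```

For example, starting from the rough estimate 0.45 for x = 4, one step gives about 0.4928, much closer to the exact 0.5.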

The effect of the current FCSR rounding mode on the result is implementation dependent.

Restrictions:

The fields \( fs \) and \( fd \) must specify FPRs valid for operands of type \( fmt \). If the fields are not valid, the result is UNPREDICTABLE.

The operand must be a value in format \( fmt \); if it is not, the result is UNPREDICTABLE and the value of the operand FPR becomes UNPREDICTABLE.

Availability and Compatibility:

RSQRT.S and RSQRT.D: Required in all versions of MIPS64 since MIPS64 Release 1. Not available in MIPS32 Release 1. Required in MIPS32 Release 2 and all subsequent versions of MIPS32. When required, required whenever FPU is present, whether a 32-bit or 64-bit FPU, whether in 32-bit or 64-bit FP Register Mode (\( FIR_{F64} = 0 \) or 1, \( Status_{FR} = 0 \) or 1).

Operation:

\[ \text{StoreFPR}(fd, fmt, 1.0 / \text{SquareRoot}(\text{valueFPR}(fs, fmt))) \]

Exceptions:

Coprocessor Unusable, Reserved Instruction

Floating Point Exceptions:

Inexact, Division-by-zero, Unimplemented Operation, Invalid Operation, Overflow, Underflow
**SB Store Byte**

**Format:**  
SB rt, offset(base)  

**Purpose:** Store Byte  
To store a byte to memory.

**Description:**  
memory[GPR[base] + offset] ← GPR[rt]  
The least-significant 8-bit byte of GPR rt is stored in memory at the location specified by the effective address. The 16-bit signed offset is added to the contents of GPR base to form the effective address.

**Restrictions:**  
None

**Operation:**

\[
\begin{aligned}
\text{vAddr} &\leftarrow \text{sign\_extend}(\text{offset}) + \text{GPR[base]} \\
(\text{pAddr}, \text{CCA}) &\leftarrow \text{AddressTranslation}(\text{vAddr}, \text{DATA}, \text{STORE}) \\
\text{pAddr} &\leftarrow \text{pAddr}_{\text{PSIZE}-1..2} \mathbin{\|} (\text{pAddr}_{1..0} \text{ xor ReverseEndian}^2) \\
\text{bytesel} &\leftarrow \text{vAddr}_{1..0} \text{ xor BigEndianCPU}^2 \\
\text{dataword} &\leftarrow \text{GPR[rt]}_{31-8\cdot\text{bytesel}..0} \mathbin{\|} 0^{8\cdot\text{bytesel}} \\
&\text{StoreMemory}(\text{CCA}, \text{BYTE}, \text{dataword}, \text{pAddr}, \text{vAddr}, \text{DATA})
\end{aligned}
\]
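The byte-lane arithmetic in the Operation can be sketched in C. This models only the bytesel and dataword steps of the 32-bit form shown, assumes ReverseEndian = 0, and uses hypothetical function names.

```c
#include <stdbool.h>
#include <stdint.h>

/* bytesel <- vAddr[1..0] xor BigEndianCPU^2, where BigEndianCPU^2 is the
   CPU endianness bit replicated into two bits (0b11 on a big-endian CPU). */
unsigned sb_bytesel(uint32_t vaddr, bool big_endian_cpu) {
    return (vaddr & 3u) ^ (big_endian_cpu ? 3u : 0u);
}

/* dataword <- GPR[rt][31-8*bytesel..0] || 0^(8*bytesel): shift the register
   left so its least-significant byte lands in byte lane bytesel of the
   32-bit word; only that lane is written by the BYTE store. */
uint32_t sb_dataword(uint64_t rt, unsigned bytesel) {
    return (uint32_t)(rt << (8u * bytesel));
}
```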

**Exceptions:**  
TLB Refill, TLB Invalid, TLB Modified, Bus Error, Address Error, Watch
SBE Store Byte EVA

Format: \texttt{SBE rt, offset(base)}

Purpose: Store Byte EVA

To store a byte to user mode virtual address space when executing in kernel mode.

Description: \(\text{memory}[\text{GPR}[base] + offset] \leftarrow \text{GPR}[rt]\)

The least-significant 8-bit byte of GPR \textit{rt} is stored in memory at the location specified by the effective address. The 9-bit signed \textit{offset} is added to the contents of GPR \texttt{base} to form the effective address.
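The 9-bit offset of SBE covers a much smaller range than the 16-bit offset of SB: -256 to +255 bytes. A small C sketch of the sign extension, assuming the raw 9-bit field has already been extracted from the instruction word (the helper name is hypothetical):

```c
#include <stdint.h>

/* Sign-extend a 9-bit offset field (bit 8 is the sign bit). */
int32_t sign_extend9(uint32_t field) {
    field &= 0x1FFu;                         /* keep only the 9-bit field */
    return (field & 0x100u) ? (int32_t)field - 0x200 : (int32_t)field;
}
```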

The SBE instruction functions the same as the SB instruction, except that address translation is performed using the user mode virtual address space mapping in the TLB when accessing an address within a memory segment configured to use the MUSUK access mode. Memory segments using UUSK or MUSK access modes are also accessible. Refer to Volume III, Enhanced Virtual Addressing section for additional information.

Implementation of this instruction is specified by the \texttt{Config5\_EVA} field being set to 1.

Restrictions:

Only usable when access to Coprocessor 0 is enabled and when accessing an address within a segment configured using UUSK, MUSK or MUSUK access mode.

Operation:

\begin{align*}
\text{vAddr} & \leftarrow \text{sign\_extend(offset) + GPR[base]} \\
(\text{pAddr, CCA}) & \leftarrow \text{AddressTranslation (vAddr, DATA, STORE)} \\
\text{pAddr} & \leftarrow \text{pAddr}_{\text{PSIZE-1..2}} \mathbin{\|} (\text{pAddr}_{1..0} \text{ xor ReverseEndian}^2) \\
\text{bytesel} & \leftarrow \text{vAddr}_{1..0} \text{ xor BigEndianCPU}^2 \\
\text{dataword} & \leftarrow \text{GPR[rt]}_{31-8\cdot\text{bytesel}..0} \mathbin{\|} 0^{8\cdot\text{bytesel}} \\
& \text{StoreMemory (CCA, BYTE, dataword, pAddr, vAddr, DATA)}
\end{align*}

Exceptions:

TLB Refill, TLB Invalid, Bus Error, Address Error, Watch, Reserved Instruction, Coprocessor Unusable
**SC**

### Store Conditional Word

**Format:** $\text{SC } rt, \text{ offset}(\text{base})$

**Purpose:** Store Conditional Word

To store a word to memory to complete an atomic read-modify-write.

**Description:**

- if atomic_update then $\text{memory}[\text{GPR[base]} + \text{offset}] \leftarrow \text{GPR}[rt], \text{GPR}[rt] \leftarrow 1$
- else $\text{GPR}[rt] \leftarrow 0$

The LL and SC instructions provide primitives to implement atomic read-modify-write (RMW) operations on synchronizable memory locations. In Release 5, the behavior of SC is modified when $\text{Config5}_{\text{LLB}}=1$.

The 32-bit word in GPR $rt$ is conditionally stored in memory at the location specified by the aligned effective address. The signed offset is added to the contents of GPR $base$ to form an effective address.

The SC completes the RMW sequence begun by the preceding LL instruction executed on the processor. To complete the RMW sequence atomically, the following occur:

- The 32-bit word of GPR $rt$ is stored to memory at the location specified by the aligned effective address.
- A one, indicating success, is written into GPR $rt$.

Otherwise, memory is not modified and a 0, indicating failure, is written into GPR $rt$.

If any of the following events occurs between the execution of LL and SC, the SC fails:

- A coherent store is completed by another processor or coherent I/O module into the block of synchronizable physical memory containing the word. The size and alignment of the block are implementation-dependent, but the block is at least one word and at most the minimum page size.
- A coherent store is executed between an LL and SC sequence on the same processor to the block of synchronizable physical memory containing the word (if $\text{Config5}_{\text{LLB}}=1$; else whether such a store causes the SC to fail is not predictable).
- An ERET instruction is executed. (Release 5 includes ERETNC, which will not cause the SC to fail.)

Furthermore, an SC must always compare its address against that of the LL. An SC will fail if the aligned address of the SC does not match that of the preceding LL.
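Because an SC can fail, software wraps an LL/SC pair in a retry loop. That structure can be sketched portably with C11 atomics. This is an analogy, not an LL/SC implementation: `atomic_compare_exchange_weak` may fail spuriously, much as an SC may fail where it could have succeeded, and compilers for MIPS commonly lower it to an LL/SC loop. However, a compare-and-swap compares values rather than monitoring a reservation block, so it cannot detect an intervening store that restores the original value. The function name is hypothetical.

```c
#include <stdatomic.h>

/* Atomically add delta to *p and return the new value. */
int atomic_add_retry(_Atomic int *p, int delta) {
    int old = atomic_load_explicit(p, memory_order_relaxed);   /* like LL */
    /* Like SC: on failure (a changed value or a spurious failure) old is
       refreshed with the current contents and the update is retried. */
    while (!atomic_compare_exchange_weak(p, &old, old + delta))
        ;
    return old + delta;
}
```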

A load that executes on the processor executing the LL/SC sequence to the block of synchronizable physical memory containing the word will not cause the SC to fail (if $\text{Config5}_{\text{LLB}}=1$; else such a load may cause the SC to fail).

If any of the events listed below occurs between the execution of LL and SC, the SC may fail where it could have succeeded, i.e., success is not predictable. Portable programs should not cause any of these events.
• A load or store executed on the processor executing the LL and SC that is not to the block of synchronizable physical memory containing the word. (The load or store may cause a cache eviction between the LL and SC that results in SC failure. The load or store does not necessarily have to occur between the LL and SC.)

• Any prefetch executed on the processor executing the LL and SC sequence (the prefetch may cause a cache eviction between the LL and SC).

• A non-coherent store executed between an LL and SC sequence to the block of synchronizable physical memory containing the word.

• The instructions executed starting with the LL and ending with the SC do not lie in a 2048-byte contiguous region of virtual memory. (The region does not have to be aligned, other than the alignment required for instruction words.)

CACHE operations that are local to the processor executing the LL/SC sequence will result in unpredictable behaviour of the SC if executed between the LL and SC, that is, they may cause the SC to fail where it could have succeeded. Non-local CACHE operations (address-type with coherent CCA) may cause an SC to fail on either the local processor or on the remote processor in multiprocessor or multi-threaded systems. This definition of the effects of CACHE operations is mandated if \( \text{Config5}_{\text{LLB}} = 1 \). If \( \text{Config5}_{\text{LLB}} = 0 \), then CACHE effects are implementation-dependent.

The following conditions must be true or the result of the SC is not predictable—the SC may fail or succeed (if \( \text{Config5}_{\text{LLB}} = 1 \), then either success or failure is mandated, else the result is UNPREDICTABLE):

• Execution of SC must have been preceded by execution of an LL instruction.

• An RMW sequence executed without intervening events that would cause the SC to fail must use the same address in the LL and SC. The address is the same if the virtual address, physical address, and cacheability & coherency attribute are identical.

Atomic RMW is provided only for synchronizable memory locations. A synchronizable memory location is one that is associated with the state and logic necessary to implement the LL/SC semantics. Whether a memory location is synchronizable depends on the processor and system configurations, and on the memory access type used for the location:

• **Uniprocessor atomicity:** To provide atomic RMW on a single processor, all accesses to the location must be made with memory access type of either cached noncoherent or cached coherent. All accesses must be to one or the other access type, and they may not be mixed.

• **MP atomicity:** To provide atomic RMW among multiple processors, all accesses to the location must be made with a memory access type of cached coherent.

• **I/O System:** To provide atomic RMW with a coherent I/O system, all accesses to the location must be made with a memory access type of cached coherent. If the I/O system does not use coherent memory operations, then atomic RMW cannot be provided with respect to the I/O reads and writes.
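
The three rules above can be condensed into a small decision function. This is a sketch that follows the rules as written for SC; the `Access` enum and the scope labels `"uniprocessor"`, `"mp"`, and `"coherent_io"` are hypothetical names, and uncached access is treated as not synchronizable, matching the Restrictions below.

```python
from enum import Enum


class Access(Enum):
    UNCACHED = "uncached"
    CACHED_NONCOHERENT = "cached noncoherent"
    CACHED_COHERENT = "cached coherent"


def rmw_is_atomic(scope: str, access_types: set) -> bool:
    """Return True if the rules provide atomic RMW for a location accessed with the
    given set of access types. `scope` is one of the hypothetical labels
    'uniprocessor', 'mp', or 'coherent_io'."""
    if len(access_types) != 1:
        return False  # a location must not be accessed with mixed access types
    (kind,) = access_types
    if scope == "uniprocessor":
        return kind in (Access.CACHED_NONCOHERENT, Access.CACHED_COHERENT)
    if scope in ("mp", "coherent_io"):
        return kind is Access.CACHED_COHERENT
    raise ValueError(f"unknown scope: {scope}")
```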

**Restrictions:**

The addressed location must have a memory access type of cached noncoherent or cached coherent; if it does not, the result is UNPREDICTABLE.

The effective address must be naturally-aligned. If either of the 2 least-significant bits of the address is non-zero, an Address Error exception occurs.

Providing misaligned support for Release 6 is not a requirement for this instruction.

**Availability and Compatibility:**

This instruction has been recoded for Release 6.

**Operation:**

\[ vAddr \leftarrow \text{sign\_extend}(\text{offset}) + \text{GPR}[\text{base}] \]

if \( vAddr_{1..0} \neq 0^2 \) then
  \( \text{SignalException(AddressError)} \)
endif

\( (pAddr, \text{CCA}) \leftarrow \text{AddressTranslation}(vAddr, \text{DATA}, \text{STORE}) \)

dataword \( \leftarrow \text{GPR}[rt] \)

if \( \text{LLbit} \) then
  \( \text{StoreMemory}(\text{CCA}, \text{WORD}, \text{dataword}, pAddr, vAddr, \text{DATA}) \)
endif

\( \text{GPR}[rt] \leftarrow 0^{31} || \text{LLbit} \)

\( \text{LLbit} \leftarrow 0 \) // if \( \text{Config5}_{\text{LLB}}=1 \), SC always clears LLbit regardless of address match.
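
A Python rendering of the Operation pseudocode, as a sketch only: identity address translation, a `dict` standing in for memory, and a `state` dict holding `llbit` are simplifying assumptions, and `offset` is taken as already sign-extended.

```python
class AddressError(Exception):
    """Stands in for SignalException(AddressError)."""


def sc_operation(gpr: list, memory: dict, state: dict, rt: int, base: int, offset: int) -> None:
    # vAddr <- sign_extend(offset) + GPR[base]; offset is assumed already sign-extended.
    vaddr = (gpr[base] + offset) & 0xFFFFFFFF
    if vaddr & 0x3:
        raise AddressError(hex(vaddr))
    paddr = vaddr                   # identity mapping stands in for AddressTranslation()
    dataword = gpr[rt] & 0xFFFFFFFF
    if state["llbit"]:
        memory[paddr] = dataword    # StoreMemory(CCA, WORD, ...)
    gpr[rt] = state["llbit"]        # GPR[rt] <- 0^31 || LLbit
    state["llbit"] = 0              # LLbit <- 0
```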

**Exceptions:**

TLB Refill, TLB Invalid, TLB Modified, Address Error, Watch

**Programming Notes:**

\( \text{LL} \) and \( \text{SC} \) are used to atomically update memory locations, as shown below.

```
L1:
  LL    T1, (T0)    # load counter
  ADDI  T2, T1, 1   # increment
  SC    T2, (T0)    # try to store, checking for atomicity
  BEQ   T2, 0, L1   # if not atomic (0), try again
  NOP               # branch-delay slot
```
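
The retry behavior of this loop can be seen in a small deterministic simulation. The `SharedMemory` class (which tracks one linked word per CPU, i.e. a one-word block), the generator-per-CPU scheduling, and all names are hypothetical; a real synchronization block may be larger than one word.

```python
class SharedMemory:
    """Hypothetical memory where each CPU may hold one link (LLbit = 1) on a word."""

    def __init__(self):
        self.words = {}
        self.links = {}  # cpu id -> linked word address

    def ll(self, cpu, addr):
        self.links[cpu] = addr
        return self.words.get(addr, 0)

    def sc(self, cpu, addr, value):
        if self.links.pop(cpu, None) != addr:
            return 0  # link lost or address mismatch: memory is not modified
        self.words[addr] = value
        for other in [c for c, a in self.links.items() if a == addr]:
            del self.links[other]  # this store breaks every other CPU's link
        return 1


def increment(mem, cpu, addr):
    """The LL/ADDI/SC/BEQ loop as a generator that yields after each instruction."""
    while True:
        t1 = mem.ll(cpu, addr)       # LL   T1, (T0)
        yield
        t2 = t1 + 1                  # ADDI T2, T1, 1
        yield
        t2 = mem.sc(cpu, addr, t2)   # SC   T2, (T0)
        yield
        if t2 != 0:                  # BEQ  T2, 0, L1 falls through on success
            return


def run_round_robin(threads):
    """Step each CPU one instruction at a time until all loops have finished."""
    threads = list(threads)
    while threads:
        for t in list(threads):
            try:
                next(t)
            except StopIteration:
                threads.remove(t)
```

With this interleaving, both CPUs load 0, CPU 0's SC succeeds and breaks CPU 1's link, CPU 1's SC writes 0 into T2, and CPU 1 retries, so the counter ends at 2 rather than 1.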

Exceptions between the \( \text{LL} \) and \( \text{SC} \) cause \( \text{SC} \) to fail, so persistent exceptions must be avoided. Some examples of these are arithmetic operations that trap, system calls, and floating point operations that trap or require software emulation assistance.

\( \text{LL} \) and \( \text{SC} \) function on a single processor for *cached noncoherent* memory so that parallel programs can be run on uniprocessor systems that do not support *cached coherent* memory access types.

As shown in the instruction drawing above, Release 6 implements a 9-bit offset, whereas all release levels lower than Release 6 of the MIPS architecture implement a 16-bit offset.
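
To make the offset-width difference concrete, here is a sketch of sign extension for the Release 6 9-bit field and the pre-Release 6 16-bit field. The helper names `sign_extend` and `effective_address` are hypothetical, and addresses are wrapped to 32 bits.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit


def effective_address(base: int, offset_field: int, offset_bits: int) -> int:
    """vAddr = sign_extend(offset) + GPR[base], wrapped to 32 bits."""
    return (base + sign_extend(offset_field, offset_bits)) & 0xFFFFFFFF


# The same encoded bits 0x1FC mean -4 in a 9-bit field but +508 in a 16-bit field.
```

With a 9-bit field the reachable displacement is -256 to +255 bytes, versus -32768 to +32767 bytes with 16 bits.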

Format:  \texttt{SCE rt, offset(base)}

Purpose:  Store Conditional Word EVA

To store a word to user mode virtual memory while operating in kernel mode to complete an atomic read-modify-write.

Description:  

\begin{verbatim}
if atomic_update then memory[GPR[base] + offset] ← GPR[rt], GPR[rt] ← 1 else GPR[rt] ← 0
\end{verbatim}

The \texttt{LLE} and \texttt{SCE} instructions provide primitives to implement atomic read-modify-write (RMW) operations for synchronizable memory locations.

The 32-bit word in \texttt{GPR rt} is conditionally stored in memory at the location specified by the aligned effective address. The 9-bit signed \texttt{offset} is added to the contents of \texttt{GPR base} to form an effective address.

The \texttt{SCE} completes the RMW sequence begun by the preceding \texttt{LLE} instruction executed on the processor. To complete the RMW sequence atomically, the following occurs:

- The 32-bit word of \texttt{GPR rt} is stored to memory at the location specified by the aligned effective address.
- A 1, indicating success, is written into \texttt{GPR rt}.

Otherwise, memory is not modified and a 0, indicating failure, is written into \texttt{GPR rt}.

If either of the following events occurs between the execution of \texttt{LLE} and \texttt{SCE}, the \texttt{SCE} fails:

- A coherent store is completed by another processor or coherent I/O module into the block of synchronizable physical memory containing the word. The size and alignment of the block are implementation-dependent, but the block is at least one word and at most the minimum page size.
- An \texttt{ERET} instruction is executed.

If either of the following events occurs between the execution of \texttt{LLE} and \texttt{SCE}, the \texttt{SCE} may succeed or it may fail; the success or failure is not predictable. Portable programs should not cause either of these events.

- A memory access instruction (load, store, or prefetch) is executed on the processor executing the \texttt{LLE}/\texttt{SCE}.
- The instructions executed starting with the \texttt{LLE} and ending with the \texttt{SCE} do not lie in a 2048-byte contiguous region of virtual memory. (The region does not have to be aligned, other than the alignment required for instruction words.)

The following conditions must be true or the result of the \texttt{SCE} is \textit{UNPREDICTABLE}:

- Execution of \texttt{SCE} must have been preceded by execution of an \texttt{LLE} instruction.
- An RMW sequence executed without intervening events that would cause the \texttt{SCE} to fail must use the same address in the \texttt{LLE} and \texttt{SCE}. The address is the same if the virtual address, physical address, and cacheability & coherency attribute are identical.

Atomic RMW is provided only for synchronizable memory locations. A synchronizable memory location is one that is associated with the state and logic necessary to implement the \texttt{LLE}/\texttt{SCE} semantics. Whether a memory location is synchronizable depends on the processor and system configurations, and on the memory access type used for the location.
• **Uniprocessor atomicity:** To provide atomic RMW on a single processor, all accesses to the location must be made with a memory access type of either *cached noncoherent* or *cached coherent*. All accesses must be to one or the other access type, and they may not be mixed.

• **MP atomicity:** To provide atomic RMW among multiple processors, all accesses to the location must be made with a memory access type of *cached coherent*.

• **I/O System:** To provide atomic RMW with a coherent I/O system, all accesses to the location must be made with a memory access type of *cached coherent*. If the I/O system does not use coherent memory operations, then atomic RMW cannot be provided with respect to the I/O reads and writes.

The SCE instruction functions the same as the SC instruction, except that address translation is performed using the user mode virtual address space mapping in the TLB when accessing an address within a memory segment configured to use the MUSUK access mode. Memory segments using UUSK or MUSK access modes are also accessible. Refer to Volume III, Enhanced Virtual Addressing section for additional information.

Implementation of this instruction is indicated by the $\text{Config5}_{\text{EVA}}$ field being set to 1.

**Restrictions:**

The addressed location must have a memory access type of *cached noncoherent* or *cached coherent*; if it does not, the result is *UNPREDICTABLE*.

The effective address must be naturally-aligned. If either of the 2 least-significant bits of the address is non-zero, an Address Error exception occurs.

Providing misaligned support for Release 6 is not a requirement for this instruction.

**Operation:**

\[
\text{vAddr} \leftarrow \text{sign\_extend}(\text{offset}) + \text{GPR}[\text{base}]
\]

if \( \text{vAddr}_{1..0} \neq 0^2 \) then
  \( \text{SignalException(AddressError)} \)
endif

\( (\text{pAddr}, \text{CCA}) \leftarrow \text{AddressTranslation}(\text{vAddr}, \text{DATA}, \text{STORE}) \)

\( \text{dataword} \leftarrow \text{GPR}[rt] \)

if \( \text{LLbit} \) then
  \( \text{StoreMemory}(\text{CCA}, \text{WORD}, \text{dataword}, \text{pAddr}, \text{vAddr}, \text{DATA}) \)
endif

\( \text{GPR}[rt] \leftarrow 0^{31} || \text{LLbit} \)

**Exceptions:**

TLB Refill, TLB Invalid, TLB Modified, Address Error, Watch, Reserved Instruction, Coprocessor Unusable

**Programming Notes:**

LLE and SCE are used to atomically update memory locations, as shown below.

```
L1:
  LLE   T1, (T0)    # load counter
  ADDI  T2, T1, 1   # increment
  SCE   T2, (T0)    # try to store, checking for atomicity
  BEQ   T2, 0, L1   # if not atomic (0), try again
  NOP               # branch-delay slot
```

Exceptions between the LLE and SCE cause SCE to fail, so persistent exceptions must be avoided. Examples are arithmetic operations that trap, system calls, and floating point operations that trap or require software emulation assistance.

LLE and SCE function on a single processor for *cached noncoherent* memory so that parallel programs can be run on uniprocessor systems that do not support *cached coherent* memory access types.

Purpose: Store Conditional Extended {Word, Word EVA}

Store to memory as part of an extended LLX/LL-SCX/SC sequence; word, or word EVA

Description:
The LLX/SCX family of instructions (SCX, SCXE) extends the MIPS LL/SC mechanism for performing atomic read-modify-writes to permit more than one memory location to be written atomically. The memory locations are constrained to be aligned, adjacent and within both the same synchronization block and the same cache line (if applicable).

LL-SC code sequences in general, and LLX/LL-SCX/SC in particular, provide atomicity if the computer system can guarantee that, if the SC passes, then atomicity has not been violated by transactions between the LL and SC. It should also guarantee eventual success, i.e. that failures will not persist forever.

The signed offset is added to the contents of GPR base to form an effective address. This address must be naturally aligned.

An SCX/SCXE instruction (at PC) must be followed by a matching SC/SCE instruction (at PC+4).

For SCX and SCXE the 32-bit word in GPR rt is concatenated with the 32-bit word of the following SC instruction’s GPR rt to form the 64-bit doubleword data to be conditionally stored.
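
A sketch of how the two 32-bit words form the doubleword. It assumes little-endian byte order (the pseudocode notes that big-endian byte swapping is not shown), and the helper names are hypothetical. Because the SC word sits at the lower address, it occupies the low half in little-endian memory.

```python
import struct


def scx_sc_doubleword(scx_rt: int, sc_rt: int) -> int:
    """Concatenate SCX's GPR rt (upper half) with SC's GPR rt (lower half)."""
    return ((scx_rt & 0xFFFFFFFF) << 32) | (sc_rt & 0xFFFFFFFF)


def to_memory_bytes(dword: int) -> bytes:
    """Byte image at the SC address, assuming little-endian byte order: the SC word
    lands at addr..addr+3 and the SCX word at addr+4..addr+7."""
    return struct.pack("<Q", dword)
```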

The double-width store of an SCX/SC family instruction pair is performed only if it can be guaranteed that atomicity has not been violated since the preceding LLX/LL family instruction. If such atomicity cannot be guaranteed, the conditional store fails. The rt register of the SC family instruction that follows the SCX family instruction is written with 0 on failure or 1 on success.

If the following SC-family (SC, SCE) instruction succeeds, then the SCX-family instruction (SCX, SCXE) also succeeds, and the store data from both the SCX and SC are concatenated and committed to memory atomically as a double-width transaction. If the SC fails, then the SCX also fails, and neither commits to memory. The SC instruction at PC+4 modifies a GPR to indicate success or failure of both the SC and SCX.

In particular, the SCX/SCXE and SC/SCE data addresses must be adjacent, within the same synchronization block, non-overlapping, and naturally-aligned appropriately (for a 64-bit access for SCX/SC and SCXE/SCE). The SC/SCE data address must be the address of the lowest byte in the double width memory access.

If the PC and PC+4 instruction encodings do not match, a Reserved Instruction exception is signaled. If the effective addresses of SCX and SC or SCXE and SCE are not 32-bit word aligned separately and 64-bit doubleword aligned together, then Address Error is signaled. See Restrictions section for a full description of match requirements, and special case for SDBBP and BREAK breakpoint instructions.
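
The address rules can be expressed as one check over the two virtual addresses. This is a sketch for the word variants (`size = 4`); the function name is hypothetical, and it does not model which exception is reported with BD=0 versus BD=1.

```python
class AddressError(Exception):
    pass


def check_pair_addresses(scx_va: int, sc_va: int, size: int = 4) -> None:
    """Raise AddressError unless both words are aligned and together form one
    naturally aligned doubleword, with the SC word at the lower address."""
    if scx_va % size or sc_va % size:
        raise AddressError("word misaligned")
    if sc_va % (2 * size):
        raise AddressError("pair not doubleword aligned")
    if scx_va - sc_va != size:
        raise AddressError("words not adjacent, or SC word not at the lower address")
```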

Restrictions:

The following restrictions apply to load-linked and store-conditional extended instructions in the LLX/SCX instruction family:

Coprocessor 0’s Cause register bit BD is extended to indicate exceptions related to the next instruction after the LLX/SCX-family instruction. Pseudocode indicates what value Cause.BD should be set to via comments such as SignalException(AddressError) /*BD=1*/. Similarly, the CP0 BadInstrP register is extended to hold the LLX/SCX-family instruction if an exception is signaled for the next instruction, with BD=1.

An LLX/SCX family instruction must not be placed in a branch delay slot or compact branch forbidden slot: if this rule is violated, a Reserved Instruction exception will be signaled (with EPC=PC of branch, BD=1).

An LLX/SCX family instruction must be followed by a matching LL/SC-family instruction: An SCX instruction must be followed by an SC instruction of the same type. Similarly for LLX/LL, LLXE/LLE, and SCXE/SCE. If the following instruction does not match, a Reserved Instruction exception must be signaled (with EPC=PC of the LLX/SCX family instruction, BD=1).

Except: An LLX/SCX instruction may be followed by one of the breakpoint instructions BREAK or SDBBP, in which case the appropriate breakpoint exception takes priority over the Reserved Instruction exception. The BREAK exception will be signaled with EPC=PC of the LLX/SCX family instruction and BD=1. The debug exception caused by such an SDBBP will be reported with DEPC=PC of the LLX/SCX family instruction and DBD=1.

The base field must be the same in an LLX/SCX family instruction and the following, matching, LL/SC-family instruction: If the following instruction does not match, a Reserved Instruction exception must be signaled (with EPC=PC of the LLX/SCX family instruction, BD=1).

The base and rt fields of the LLX family instruction must not be the same. If they are the same a Reserved Instruction exception must be signaled (with EPC=PC of the LLX/SCX family instruction, BD=0).

The LLX/SCX and following LL/SC family instructions must match in their offset field: given matching instruction type and base, the difference between the offset fields of the instruction at PC and the instruction at PC+4 should be the data size, which is 4 for LLX/LL, LLXE/LLE, SCX/SC, and SCXE/SCE. Programmers should follow this rule in coding. However, implementations do not need to check this rule explicitly, since it is implied by other rules.
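
The instruction-field rules can be sketched as a checker over two decoded instructions. `Insn`, `MATCHING`, and `check_pair` are hypothetical names; the checker raises for the "must" rules, returns whether the "should" offset rule holds, and returns `None` when a BREAK or SDBBP follows, since that breakpoint exception takes priority.

```python
from dataclasses import dataclass
from typing import Optional


class ReservedInstruction(Exception):
    pass


MATCHING = {"LLX": "LL", "LLXE": "LLE", "SCX": "SC", "SCXE": "SCE"}


@dataclass(frozen=True)
class Insn:
    """Hypothetical decoded instruction: mnemonic plus rt, base and offset fields."""
    op: str
    rt: int
    base: int
    offset: int


def check_pair(first: Insn, second: Insn) -> Optional[bool]:
    if first.op in ("LLX", "LLXE") and first.base == first.rt:
        raise ReservedInstruction("LLX-family base == rt")          # BD=0
    if second.op in ("BREAK", "SDBBP"):
        return None  # the breakpoint exception is taken instead
    if MATCHING.get(first.op) != second.op:
        raise ReservedInstruction("following instruction does not match")  # BD=1
    if first.base != second.base:
        raise ReservedInstruction("base fields differ")             # BD=1
    return first.offset - second.offset == 4  # "should" rule, not checked by hardware
```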

Natural Alignment: The effective address must be naturally aligned for any LLX/SCX family instruction; if not naturally aligned, an Address Error exception is signaled. I.e. for LLX, LLXE, SCX and SCXE, if the two least significant bits of the effective address are not both zero, an Address Error exception is signaled. Such an Address Error exception is signaled with EPC=PC of the LLX/SCX family instruction, BD=0.

Release 6 requires systems to provide support for misaligned memory accesses for all ordinary memory reference instructions such as LW (Load Word). However, this instruction is a special memory reference instruction for which misaligned support is NOT provided, and for which signaling an exception (AddressError) on a misaligned access is required.

Double Width Alignment: In addition to natural alignment, the memory bytes accessed by the LLX/SCX family instruction and the following LL/SC family instruction must be adjacent, non-overlapping, and aligned naturally for double the memory access size: the lowest byte address in an LLX/LL, LLXE/LLE, SCX/SC, or SCXE/SCE pair must be 8-byte aligned. The byte address of the LL/SC family instruction must be lower than that of the LLX/SCX family instruction; that is, the LL/SC family instruction in an LLX/LL or SCX/SC family instruction pair must be naturally aligned for double the memory access width.

The double width alignment condition must be satisfied for both virtual and physical addresses. If this condition is not met, then an Address Error exception is signaled, with EPC = PC of first instruction, and BD=1. This condition is guaranteed to be met in the physical address if met in the virtual address and if the SCX and SC translations are consistent.

Exception Priority: Although LLX and LL may complete execution together, all exceptions for an LLX instruction (at PC) must be signaled, with EPC=PC and BD=0, before any exceptions are signaled, with EPC=PC and BD=1, for the following instruction (at PC+4).

Exceptions relating to an LLX/SCX family instruction are reported with $EPC = PC$ of the LLX/SCX family instruction, and $BD = 0$.

Exceptions relating to interaction between an LLX/SCX family instruction and the following instruction are reported with $EPC = PC$ of LLX/SCX instruction and $BD = 1$.

Debug single step exceptions are reported with $DEPC = PC$ of the LLX/SCX family instruction, and $BD = 0$. No debug single step exception will be reported for the SC instruction of an SCX/SC pair: for the purposes of debug single stepping, the SCX/SC pair is atomic. Similarly for LLX/LL, LLXE/LLE, and SCXE/SCE pairs of instructions.

Exceptions related to an SCX/SC family instruction pair cancel the SCX but do not clear $LLbit$: if an exception or interrupt occurs at or after the SCX-family instruction and before or at the next instruction, the SCX is canceled, but $LLbit$ is not cleared. That is, the LLX/LL-SCX/SC atomic sequence is not necessarily forced to fail. Such exceptions are therefore reported with $EPC = PC$ of the SCX, and $BD = 0$ or 1 as appropriate. Exception handling software should return (ERET or ERETNC) to the PC of the SCX instruction, re-executing the SCX/SC pair. Adjusting EPC or DEPC and returning to the SC instruction without re-executing the SCX instruction will result in incorrect behavior.

For exceptions related to an LLX/LL family instruction pair:

- No memory access is performed.
- Neither target register of the LLX/LL family instruction pair is updated.
- $LLbit$ is not set.
- $EPC$ (or $DEPC$) is set to the PC of the LLX family instruction.
- Cause.BD is set to 0 or 1 as appropriate, as described above.

Exception handling software should return (ERET or ERETNC) to the PC of the LLX instruction, re-executing the LLX/LL pair. Adjusting EPC or DEPC and returning to the LL instruction without re-executing the LLX instruction will result in incorrect behavior.

LLX/LL and SCX/SC matching: the LL-family instruction, the SC-family instruction, and the optional LLX/SCX-family instructions in a MIPS atomic sequence should match. Portable software should not rely on mismatching LLX/LL/SCX/SC to complete successfully, nor to fail. Implementations are permitted to cause the SC to fail if the LL/SCX/SC do not match, but are not required to do so. Matching LLX/LL/SCX/SC should be of the same instruction type (word (LLX/LL/SCX/SC), or word EVA (LLXE/LLE/SCXE/SCE)). Table 5.5 summarizes these rules for LL/SC family instructions.

---

1. Terminology: “Should” is a recommendation. Implementations are encouraged to provide “should” behavior, but are not required to do so. Portable software should not rely on such behavior, but is encouraged to follow “should” rules. “Must” behaviors are requirements: implementations are required to implement such behavior, and software that violates such requirements will fail, typically with an exception such as a Reserved Instruction exception or Address Error.

The LL and SC virtual and physical addresses should match completely. However, the memory addressing mode, that is, the base and offset, need not match between LLX/LL and SCX/SC. All physical address bits in the LL physical address and the corresponding bits in the SC physical address should match to the alignment required for the size of the LL/SC family instructions or LLX/LL and SCX/SC family instruction pairs. This applies to atomic code sequences created via LL/SC, LLE/SCE, and their corresponding extended versions LLX/LL-SCX/SC and LLXE/LLE-SCXE/SCE.

Translation Consistency: It is required that LL and SC addresses match, and that both halves of each LLX/LL and SCX/SC family instruction pair lie in the same synchronization block. Even if all virtual addresses match, on a processor with hardware page table walking it is possible for physical address translation to change between LL and SC, and between the execution phases of LLX, LL, SCX and SC family instructions, e.g., between the time that SCX is first executed and the time that the SCX store data is committed along with SC. The SCX/SC pair must only succeed if the SCX and SC physical addresses are consistent. If the address translations are inconsistent, implementations are required to fail the SCX/SC pair, or to retry it in a manner transparent to software. The same applies to LLX/LL pairs, and to other information obtained from translation, such as the CCA (Cacheability and Coherence Attribute).

It is required that LLX/LL or SCX/SC instruction pairs act as if only a single address translation is done for the first instruction in the pair, and that translation is used for the second instruction, changing only lower address bits 3:0. The same applies to LLXE/LLE and SCXE/SCE instruction pairs.
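
A toy illustration of the single-translation rule: only the first (higher-addressed) instruction's virtual address is translated, and the partner's physical address and CCA are derived from that result. The one-level `page_table` dictionary, `PAGE_SIZE`, and the function names are hypothetical stand-ins for real address translation.

```python
PAGE_SIZE = 4096  # hypothetical page size for this toy translation


def translate(page_table: dict, va: int) -> tuple:
    """Toy single-level translation: page_table maps VPN -> (PFN, CCA)."""
    pfn, cca = page_table[va // PAGE_SIZE]
    return pfn * PAGE_SIZE + va % PAGE_SIZE, cca


def translate_pair(page_table: dict, first_va: int, size: int = 4) -> tuple:
    """Translate only the LLX/SCX (first, higher) address and derive the partner's
    physical address by clearing the low address bits, so both halves share one
    translation and one CCA."""
    first_pa, cca = translate(page_table, first_va)
    partner_pa = first_pa & ~(2 * size - 1)
    return first_pa, partner_pa, cca
```

Because the partner address is never translated separately, a page-table change between the two halves cannot give them inconsistent physical addresses.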

Synchronizable memory type (CCA): The addressed location must be synchronizable by all processors and I/O devices sharing the location; if it is not, the result is UNPREDICTABLE. Which storage is synchronizable is a function of both CPU and system implementations. See the documentation of the SC instruction for the formal definition.

2. Note that the implementation-dependent $LLAddr$ register (Load Linked Address, CP0 Register 17, Select 0) does not hold physical address bits 0 to 4 as of Release 5. The requirement that all LL and SC address bits match therefore involves comparing LL address bits not stored in any software-accessible register state.

LLX/LL need not be writable: The addressed location need not be writable for LL or LLX family instructions. If it is not writable, a subsequent SC or SCX family instruction will fault, but LL or LLX family instructions may be used in situations that do not generate such faults, e.g., with the PAUSE instruction.

LLX/LL and PAUSE: If an LLX/LL family instruction pair is followed by a PAUSE instruction, the PAUSE instruction must terminate if it cannot be guaranteed that none of the memory bytes addressed by the LLX/LL instruction pair have been modified.

Memory Ordering of LL/SC family instructions (including LLX/SCX family instructions):

- An SCX/SC family instruction pair is executed atomically as seen by the processor executing these instructions and by other processors. I.e. the SC will not be seen to be executed before the SCX, and no other instruction, processor or device, can observe the SCX store without also being able to observe the SC store, or vice versa.

- LLX/LL family instruction pairs are not required to perform a double width atomic read of memory, but violations of atomicity will be detected, clearing LLbit, so that the matching SC will fail.\(^3\)

- Atomicity of LLX/LL family instruction pairs may be provided by MIPS CPU implementations as and if required by certain system configurations for uncached memory.\(^4\)

- All LL/SC family instructions, including LLX/LL and SCX/SC family instruction pairs, are ordered by their implicit dependency on LLbit: e.g., a later LL will not be executed before an earlier SC from the same processor, even if their data memory addresses do not overlap.

- In the MIPS memory consistency architecture, LL/SC family instructions (including LLX/SCX family instructions) are not ordered with respect to other memory accesses from the same processor, except when their addresses overlap, or explicit SYNC instructions lie between them. For example, a later LL can be executed before an earlier SW, or vice versa.\(^5\)
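
The second point above, that an LLX/LL pair need not read atomically but must detect a torn read, can be sketched as follows. `PairLink` and its methods are hypothetical; the model links the naturally aligned doubleword on LLX and lets a remote write into it clear LLbit, which the LL half does not set again.

```python
class PairLink:
    """Hypothetical model: LLX links the doubleword and sets LLbit, LL completes
    the pair without setting LLbit again, and any remote write into the linked
    doubleword in between clears LLbit so the matching SC will fail."""

    def __init__(self, memory: dict):
        self.memory = memory
        self.llbit = 0
        self.linked = None

    def llx(self, va: int) -> int:
        self.linked = va & ~0x7   # doubleword containing the LLX/LL pair
        self.llbit = 1
        return self.memory.get(va, 0)

    def ll(self, va: int) -> int:
        # Second half of the pair: reads memory, does not re-set a cleared LLbit.
        return self.memory.get(va, 0)

    def remote_write(self, va: int, value: int) -> None:
        self.memory[va] = value
        if self.linked is not None and (va & ~0x7) == self.linked:
            self.llbit = 0
```

In a torn read the pair may return values that never coexisted in memory, but the cleared LLbit guarantees that the matching SC fails.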

Availability and Compatibility:

The LLX/SCX family of instructions is introduced by and required as of the MIPS Release 6 architecture and the microMIPS Release 6 architecture.

LLX and SCX are introduced by and required as of MIPS32 Release 6. SCXE is introduced by and required as of MIPS32 Release 6 when EVA is also implemented, which is indicated by bit EVA of coprocessor 0’s Config5 register.

3. For example, an implementation of LLX/LL in cached memory may have LLX set LLAddr and perform the LLX word load, and then execute LL separately. A separate processor may perform an atomic doubleword write that changes both the LLX and LL memory locations, so that the values returned by LLX and LL may never have been present in memory simultaneously. However, if atomicity is violated in this way, then LLbit must be cleared: the LL instruction of an LLX/LL instruction pair will not set LLbit if it has been cleared after the LLX instruction. Overall, LLX/LL family instruction pairs are not required to be atomic, whereas SCX/SC family instruction pairs are required to be atomic, if performed. However, certain system configurations, for uncached memory in particular, require that the LLX/LL family instruction pair be performed atomically via a single bus transaction.

4. MIPS recommends that implementations perform a double-width atomic read memory access for LLX/LL family instruction pairs, for cached as well as uncached memory, but does not require this. Portable software should not assume that an LLX/LL family instruction pair performs an atomic double-width read; it should rely on the matching SC to detect possible violations of atomicity.

5. Note that this also applies to ordinary load instructions lying between LL and SC, inside the atomic RMW sequence.

Operation:

/* pseudocode for SCX and for the following instruction;
 * this replaces the pseudocode of the following instruction.
 *
 * this_instruction = SCX instruction at PC during instruction time I
 * next_instruction = instruction at PC+4 during instruction time I
 *                  = instruction at PC during instruction time I+1
 *                  = SC, or BREAK or SDBBP, else invalid
 * 'SCX' and 'SC' are generic, applicable to SCX-family and SC-family.
 *
 * All exceptions are signaled with EPC or DEPC = PC of SCX instruction.
 * All exceptions in instruction time I are signaled with BD=0.
 * All exceptions in instruction time I+1 are signaled with BD=1.
 */

I: /* SCX-only execution in instruction time I */
/* perform address calculation and translation and SCX-only checks. */
successful_so_far ← 1
if this_instruction is SCX then
  size ← 4
else if this_instruction is SCXE then
  EVA_checks() /*BD=0*/
  size ← 4
else
  assert(IMPOSSIBLE)
endif

scx_va ← GPR[this_instruction.base] + sign_extend( this_instruction.offset )
if scx_va & (size-1) ≠ 0 then SignalException(AddressError) /*BD=0*/ endif

(scx_pa,scx_cca) ← AddressTranslation( scx_va, DATA, STORE ) /*BD=0*/

scx_store_data ← GPR[this_instruction.rt]
/* complete SCX execution in instruction time I+1 */

I+1:
/* SCX execution time I+1 and next_instruction execution time I combined */
/* All exceptions in instruction time I+1 are signaled with BD=1. */

LLX_SCX_family_common_code{
  /*inputs:*/ this_instruction, scx_pa, scx_cca, size,
  /*returns:*/ next_instruction, sc_va, sc_pa, sc_cca
}

sc_store_data  GPR[next_instruction.rt]

store_data_2xwide  (scx_store_data << (size*8)) || sc_store_data
/* Not shown: byte swapping default Little Endian to BigEndian, if needed */

/* Required check that LL and SC physical addresses match (all bits) */
/* Note that LLAddr CP0 register may not hold full LL physical address */
if sc_pa[i] ≠ LL physical address bit i for any bit i
  then successful_so_far  0 endif

/* Fundamental LLBit check for LL/SCX/SC */
if successful_so_far and LLbit = 1
  then /* Optionally check that LL matches SCX/SC - opcode, size, etc. */
StoreMemory( CCA, 2*size, store_data_2xwide, sc_pa, sc_va, DATA )
scx_and_sc_successful ← 1
else
    scx_and_sc_successful ← 0
endif

GPR[next_instruction.rt] ← scx_and_sc_successful
LLbit ← 0

/* end of combined SCX / SC pseudocode */

where /* helper function */

function EVA_checks
    if (Config5EVA=0) then SignalException(ReservedInstruction) endif
    if !IsCoprocessorEnabled(0)
        then SignalException(CoprocessorUnusable, 0) endif
    AM = SegmentAM(address) /* TBD: bug in SCE pseudocode */
    if (AM != UUSK && AM != MUSK && AM != MUSUK)
        then SignalException(AddressError) endif
end function

function LLX_SCX_family_common_code(
    /*inputs: */ this_instruction, this_pa, this_cca, size,
    /*outputs:*/ next_instruction, next_va, next_pa, next_cca
)
/* begin function */

if next_instruction is BREAK or SDBBP then
    /* Execute BREAK or SDBBP in normal I+1 manner,
    * as if in a branch delay slot or compact branch forbidden slot.
    * signaling appropriate exception */
endif

/* next_instruction must be matching non-extended LL/SC family
 * - this pseudocode replaces normal pseudocode for next instruction. */
if (this_instruction is LLX and next_instruction is not LL)
or (this_instruction is LLXE and next_instruction is not LLE)
or (this_instruction is SCX and next_instruction is not SC)
or (this_instruction is SCXE and next_instruction is not SCE)
then
    SignalException(ReservedInstruction) /*BD=1*/
endif
/* next instruction is non-extended LL/SC family: consistency checks */

/* Check base register field for consistency */
if this_instruction.base ≠ next_instruction.base
    then SignalException(ReservedInstruction) /*BD=1*/ endif

/* Address computation for LL/SC-family next_instruction */
next_va ← GPR[next_instruction.base] + sign_extend( next_instruction.offset )

/* LL/SC following LLX/SCX virtual address must be doublewidth aligned
if next_va & (size*2-1) ≠ 0
    then SignalException(AddressError) /*BD=1*/ endif

/* LLX/SCX and LL/SC address virtual addresses must be adjacent
 * (adjacent, nonoverlapping, doubleword aligned) */
if this_va&(2*size-1) - next_va&(2*size-1) ≠ size
then SignalException(AddressError) /*BD=1*/ endif
/* assert( this_va-next_va != size ) */

/* Check offsets for consistency */
/* assert( this_instruction.offset - next_instruction.offset = size ) */
/* offset check not needed - other constraints ensure */

/* LL/SC virtual to physical address translation */
/* Reuse the translation of the first instruction to ensure consistency */
/* Note: after all RI and AE exceptions, for standard exception priority */
next_pa ← this_pa & ~(2*size-1)
/* given alignment constraints,
 * next_pa = this_pa - size = this_pa & ~(2*size-1) */
next_cca ← this_cca

end function /* LLX_SCX_family_common_code */
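The address rules in `LLX_SCX_family_common_code` can be sketched in C as follows. This is an illustrative model, not part of the specification: the helper names are hypothetical, `size` is the bytes per access (4 for the word-sized MIPS32 forms), and the adjacency test compares full addresses (this_va - next_va = size), the property the pseudocode's comments describe, rather than only the low-order bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if an LLX/SCX at this_va and the following
 * LL/SC at next_va form a valid pair. Returns false where the pseudocode
 * would signal AddressError. */
bool llx_pair_addresses_ok(uint32_t this_va, uint32_t next_va, uint32_t size)
{
    assert(size == 4 || size == 8);
    uint32_t mask = 2 * size - 1;        /* doublewidth alignment mask */
    if ((next_va & mask) != 0)           /* LL/SC must be doublewidth aligned */
        return false;
    return this_va - next_va == size;    /* LLX/SCX addresses the upper half */
}

/* Hypothetical helper: the LL/SC reuses the LLX/SCX translation, so its
 * physical address is this_pa with the low log2(2*size) bits cleared. */
uint64_t llx_pair_next_pa(uint64_t this_pa, uint32_t size)
{
    uint64_t mask = 2 * (uint64_t)size - 1;  /* widen before complementing */
    return this_pa & ~mask;
}
```

Widening the mask to 64 bits before the complement matters: complementing a 32-bit mask would silently clear physical address bits above bit 31.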

Exceptions:
TLB Refill, TLB Invalid, TLB Modified, Address Error, Watch
Reserved Instruction

Programming Notes:
LL/SC (and LLX/SCX) code sequences function on multiprocessor systems for cached coherent memory.
LL/SC (and LLX/SCX) code sequences function on multiprocessor systems for uncached memory if the CPU supports bus transactions visible to external hardware so that such external hardware can guarantee that atomicity has not been violated. Such support is implementation dependent.
LL/SC (and LLX/SCX) code sequences function on a single processor for cached noncoherent memory so that parallel programs can be run on uniprocessor systems that do not support cached coherent memory access types, and so that violations of atomicity caused by exception handling can be detected.
LL/SC (and LLX/SCX) code sequences function on a single processor for uncached memory so that parallel programs can be run on uniprocessor systems that do not support cached memory access types, and so that violations of atomicity caused by exception handling can be detected.

**Example: MIPS32 64-bit compare and swap using LLX/LL-SCX/SC code sequence:**

```assembly
cas2x32_retry_loop:
    # (t0,t1) is value to be compared against value in memory at (tA,tA+4)
    # (t2,t3) is value to be written
    MOV T2, T2’ # add t2’, r0, t2   # copy because SC destroys store data
    LLX T5, (TA)4                    # load hi
    LL T4, (TA)                      # load lo
    BNEC T1, T5, cas2x32_fail       # compare hi
    NOP                              # CTI not allowed in forbidden slot
    BNEC T0, T4, cas2x32_fail       # compare lo
    NOP                              # SCX not allowed in forbidden slot
    SCX T3, (TA)4                    # store-conditional hi
    SC T2’, (TA)                     # store-conditional lo, checking for atomicity
    BEQZC T2’, cas2x32_retry_loop   # if not atomic (0), try again

cas2x32_fail:
```
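For comparison, the effect of `cas2x32` can be expressed with C11 atomics on a 64-bit object. This is a sketch, not an equivalent implementation: `pack64` and `cas2x32` are hypothetical names. The lo word is at the lower address (TA) and the hi word at TA+4, which matches a `uint64_t` layout on a little-endian target.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Pack two 32-bit halves into one 64-bit value (hi in bits 63..32). */
static uint64_t pack64(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | lo;
}

/* If *mem == (expected_hi:expected_lo), atomically store (new_hi:new_lo)
 * and return true; otherwise leave *mem unchanged and return false.
 * A strong compare-exchange never fails spuriously, so unlike the
 * LLX/LL-SCX/SC loop no retry is needed here. */
bool cas2x32(_Atomic uint64_t *mem,
             uint32_t expected_hi, uint32_t expected_lo,
             uint32_t new_hi, uint32_t new_lo)
{
    uint64_t expected = pack64(expected_hi, expected_lo);
    return atomic_compare_exchange_strong(mem, &expected,
                                          pack64(new_hi, new_lo));
}
```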

Exceptions between the LLX/LL and SCX/SC may cause the SC to fail, so persistent exceptions must be avoided. Examples include arithmetic operations that trap, system calls, and floating point operations that trap or require software emulation assistance. However, exceptions per se do not necessarily cause failure: the ERETNC instruction allows an exception handler to complete without clearing LLbit.

**Example: MIPS32 64-bit atomic store using LLX/LL-SCX/SC code sequence:**

```assembly
# R1 = 64-bit aligned address, R2=lo 32 bits, R3=high 32 bits
st2x32_retry_loop:
  LLX R5, (R1)4        # throwing LLX/LL load data away
  LL R5, (R1)
  MOV R2, R2’         # copy store data because SC overwrites it
  SCX R3, (R1)4       # store-conditional hi
  SC R2’, (R1)        # store-conditional lo, checking for atomicity
  BEQZC R2’, st2x32_retry_loop # if not atomic (0), try again
# if we get here, then 64-bit store accomplished
```

**Example: MIPS32 64-bit atomic load using LLX/LL-SCX/SC:**

```assembly
# R1 = 64-bit aligned address, R2 and R3 will receive values loaded
ld2x32_retry_loop:
  LLX R3, (R1)4
  LL R2, (R1)
  MOV R2, R2’
  SCX R3, (R1)4       # store value read back
  SC R2’, (R1)        # store-conditional lo, checking for atomicity
  BEQZC R2’, ld2x32_retry_loop # if not atomic (0), try again
# if we get here, then 64-bit load accomplished
```

Note that an SCX/SC instruction pair is required to test atomicity. Because atomicity cannot be tested without executing at least an SC store-conditional instruction, this sequence cannot be used to perform double-width atomic reads from memory that the reader cannot write.

**Example: MIPS32 64-bit atomic load using LL/SC without LLX/SCX:**

```assembly
# R12, R13 = addresses of the two words; R2 and R3 will receive values loaded
ld2x32_retry_loop:
  LL R2, (R12)
  SYNC
  LW R3, (R13)
  MOV R2, R2’
  SYNC
  SC R2’, (R12)       # store-conditional lo, checking for atomicity
  BEQZC R2’, ld2x32_retry_loop # if not atomic (0), try again
# if we get here, then 64-bit load accomplished
```

Note that the load of (R2,R3) above is atomic in the sense that if the SC succeeds, then at some point between the LL and SC the values (R2,R3) were both present in memory at their corresponding memory locations (R12,R13). If (R12,R13) lie in the same synchronization block, then they are both present in memory at the time of the SC. If (R12,R13) are not in the same synchronization block, then while they were both present in memory at some time between LL and SC, the value at location R13, which is not monitored by LL/SC, may have changed by the time of the SC.

Note also that SYNC instructions are needed between the LL and the LW, and between the LW and the SC, to prevent reordering of these memory accesses. Because such SYNCs are expensive, MIPS recommends the LLX/LL-SCX/SC code sequence over the LL-SYNC-LW-SYNC-SC code sequence.

**Implementation Notes:**
The synchronization block of memory used for LL/SC is typically the largest cache line in use.

Implementations of LL/SC in general, and LLX/LL-SCX/SC in particular, provide atomicity if the computer system can guarantee that, if the SC passes, atomicity has not been violated by transactions between the LL and SC. The system should also guarantee eventual success, i.e., that failures will not persist forever.

Correct implementation depends on the system, both the CPU and the external memory subsystem. For example, the CPU may implement LL/SC correctly for cacheable coherent memory, but if the I/O subsystem can write to memory without being exposed to the cache coherency mechanism, LL/SC will not detect violations of atomicity caused by such non-coherent I/O accesses. Similarly, the CPU may implement uncached memory requests for LL and SC, but if the external memory subsystem performs an SC request and returns success without guaranteeing atomicity, LL/SC may not provide the expected guarantee of atomicity.

If it is not possible to guarantee such atomicity then it is recommended that implementations cause the SC to fail, returning the failure code in GPR[rt] without performing the store.

LL/SC and LLX/LL-SCX/SC code sequences should only be used for the following memory types (Cache and Coherency Attributes (CCAs)):

- **cached coherent**: if the cache protocol can guarantee that atomicity has not been violated by transactions between the LL and SC.

- **uncached**:
  - for uncached memory that is memory-like, i.e. which does not have memory-mapped I/O side effects
  - if the CPU supports bus transactions visible to external hardware so that such external hardware can guarantee that atomicity has not been violated by transactions between the LL and SC, and can signal success or failure by replying to the uncached bus transaction triggered by the SC-family instruction.
  - or if the system configuration is such that the CPU can observe all memory transactions that would violate atomicity

- **cached noncoherent or uncached** (no side effects): on uniprocessor systems lacking cache coherence or external hardware that can make atomicity assertions, LL/SC and LLX/LL-SCX/SC code sequences can be used to detect violations of atomicity caused by interrupt handling.

- for other memory types: it may be **UNPREDICTABLE** whether the SC and possible SCX stores are performed, and whether the SC reports success or failure.
SDBBP Software Debug Breakpoint

---

**Format:**
SDBBP code

**Purpose:**
Software Debug Breakpoint

To cause a debug breakpoint exception

**Description:**
This instruction causes a debug exception, passing control to the debug exception handler. If the processor is executing in Debug Mode when the SDBBP instruction is executed, the exception is a Debug Mode Exception, which sets the DebugDExcCode field to the value 0x9 (Bp). The code field can be used to pass information to the debug exception handler; the handler can retrieve it only by loading the memory word containing the instruction, using the DEPC register. The hardware does not use the code field in any way.
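A handler that has loaded the instruction word could extract the code field roughly as below. This is a hedged sketch: the encoding diagram is not reproduced in this section, so the field position (a 20-bit code in bits 25..6 of the 32-bit Release 6 encoding) is an assumption, and `sdbbp_code` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: extract the SDBBP code field from the instruction
 * word loaded via DEPC.
 * ASSUMPTION: 20-bit code field in bits 25..6 of the 32-bit encoding. */
uint32_t sdbbp_code(uint32_t insn)
{
    return (insn >> 6) & 0xFFFFFu;   /* keep bits 25..6 */
}
```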

**Restrictions:**

**Availability and Compatibility:**
This instruction has been recoded for Release 6.

**Operation:**

```plaintext
if Config5.SBRI = 1 then /* SBRI is a MIPS Release 6 feature */
  SignalException(ReservedInstruction) endif
if DebugDM = 1 then SignalDebugModeBreakpointException() endif /* nested */
SignalDebugBreakpointException() /* normal */
```

**Exceptions:**
Debug Breakpoint Exception
Debug Mode Breakpoint Exception

**Programming Notes:**
Release 6 changes the instruction encoding: the primary opcode changes from SPECIAL2 to SPECIAL, and Release 6 also defines a different function field value for SDBBP.

SDC1 Store Doubleword from Floating Point

**Format:** SDC1 ft, offset(base)

**Purpose:** Store Doubleword from Floating Point

To store a doubleword from an FPR to memory.

**Description:**

memory[GPR[base] + offset] ← FPR[ft]

The 64-bit doubleword in FPR ft is stored in memory at the location specified by the aligned effective address. The 16-bit signed offset is added to the contents of GPR base to form the effective address.

**Restrictions:**

Pre-Release 6: An Address Error exception occurs if EffectiveAddress\(_{2..0}\) ≠ 0 (not doubleword-aligned).

Release 6 allows hardware to provide address misalignment support in lieu of requiring natural alignment.

Note: The pseudocode is not completely adapted for Release 6 misalignment support as the handling is implementation dependent.

**Operation:**

```plaintext
vAddr ← sign_extend(offset) + GPR[base]
(pAddr, CCA) ← AddressTranslation(vAddr, DATA, STORE)
datadoubleword ← ValueFPR(ft, UNINTERPRETED_DOUBLEWORD)
pAddr ← pAddr xor ((BigEndianCPU xor ReverseEndian) || 0^2)
StoreMemory(CCA, WORD, datadoubleword_{31..0}, pAddr, vAddr, DATA)
pAddr ← pAddr xor 0b100
StoreMemory(CCA, WORD, datadoubleword_{63..32}, pAddr, vAddr+4, DATA)
```
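The endianness handling in the SDC1 operation decides which of the two word addresses receives each half of the doubleword. A minimal C sketch, assuming the semantics shown above; `sdc1_word_addrs` is a hypothetical helper. Concatenating (BigEndianCPU xor ReverseEndian) with two zero bits yields either 0b000 or 0b100.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: physical addresses that receive the low word
 * (datadoubleword 31..0) and the high word (datadoubleword 63..32). */
void sdc1_word_addrs(uint64_t paddr, bool big_endian_cpu, bool reverse_endian,
                     uint64_t *lo_addr, uint64_t *hi_addr)
{
    /* (BigEndianCPU xor ReverseEndian) || 0^2 is 0b000 or 0b100 */
    uint64_t flip = (uint64_t)(big_endian_cpu != reverse_endian) << 2;
    *lo_addr = paddr ^ flip;       /* first StoreMemory */
    *hi_addr = *lo_addr ^ 0x4u;    /* pAddr xor 0b100, second StoreMemory */
}
```

On a little-endian access the low word lands at the lower address; on a big-endian access the two words swap.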

**Exceptions:**

Coprocessor Unusable, Reserved Instruction, TLB Refill, TLB Invalid, TLB Modified, Address Error, Watch
### SDC2

**Pre-Release 6**

<table>
<thead>
<tr>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>SDC2 111110</td>
<td>base</td>
<td>rt</td>
<td>offset</td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>16</td>
</tr>
</tbody>
</table>

**Release 6**

<table>
<thead>
<tr>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..11</th>
<th>10..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>COP2 010010</td>
<td>SDC2 01111</td>
<td>rt</td>
<td>base</td>
<td>offset</td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>11</td>
</tr>
</tbody>
</table>

**Format:** `SDC2 rt, offset(base)`

**Purpose:** Store Doubleword from Coprocessor 2

To store a doubleword from a Coprocessor 2 register to memory.

**Description:**

\[
\text{memory}\left[\text{GPR[base]} + \text{offset}\right] \leftarrow \text{CPR}[2, \text{rt}, 0]
\]

The 64-bit doubleword in Coprocessor 2 register `rt` is stored in memory at the location specified by the aligned effective address. The 16-bit signed `offset` is added to the contents of GPR `base` to form the effective address.

**Restrictions:**

- **Pre-Release 6:** An Address Error exception occurs if EffectiveAddress\(_{2..0}\) ≠ 0 (not doubleword-aligned).
- Release 6 allows hardware to provide address misalignment support in lieu of requiring natural alignment.
- Note: The pseudocode is not completely adapted for Release 6 misalignment support as the handling is implementation dependent.

**Availability and Compatibility:**

This instruction has been recoded for Release 6.

**Operation:**

\[
\begin{align*}
\text{vAddr} & \leftarrow \text{sign\_extend}(\text{offset}) + \text{GPR}[\text{base}] \\
(\text{pAddr}, \text{CCA}) & \leftarrow \text{AddressTranslation}(\text{vAddr}, \text{DATA}, \text{STORE}) \\
\text{lsw} & \leftarrow \text{CPR}[2, \text{rt}, 0] \\
\text{msw} & \leftarrow \text{CPR}[2, \text{rt}+1, 0] \\
\text{pAddr} & \leftarrow \text{pAddr} \ \text{xor}\ ((\text{BigEndianCPU} \ \text{xor}\ \text{ReverseEndian}) \,\|\, 0^2) \\
& \text{StoreMemory}(\text{CCA}, \text{WORD}, \text{lsw}, \text{pAddr}, \text{vAddr}, \text{DATA}) \\
\text{pAddr} & \leftarrow \text{pAddr} \ \text{xor}\ \texttt{0b100} \\
& \text{StoreMemory}(\text{CCA}, \text{WORD}, \text{msw}, \text{pAddr}, \text{vAddr}+4, \text{DATA})
\end{align*}
\]

**Exceptions:**

Coprocessor Unusable, Reserved Instruction, TLB Refill, TLB Invalid, TLB Modified, Address Error, Watch

**Programming Notes:**

As shown in the instruction encoding tables above, Release 6 implements an 11-bit offset, whereas all releases prior to Release 6 implement a 16-bit offset.
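The practical effect of the narrower offset is that the same raw field value can denote a different displacement. A minimal C sketch of the effective-address computation under both widths; `sign_extend_field` and `sdc2_effective_address` are hypothetical helpers, and a 32-bit GPR is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sign-extend the low `bits` bits of `field` (1 <= bits <= 31).
 * The xor/subtract form avoids implementation-defined narrowing casts. */
int32_t sign_extend_field(uint32_t field, unsigned bits)
{
    assert(bits >= 1 && bits <= 31);
    uint32_t mask = (1u << bits) - 1;
    uint32_t sign = 1u << (bits - 1);
    return (int32_t)((field & mask) ^ sign) - (int32_t)sign;
}

/* Hypothetical helper: SDC2 effective address with the 11-bit Release 6
 * offset or the 16-bit pre-Release 6 offset (wraps modulo 2^32). */
uint32_t sdc2_effective_address(uint32_t base, uint32_t offset_field, bool release6)
{
    int32_t offset = sign_extend_field(offset_field, release6 ? 11 : 16);
    return base + (uint32_t)offset;
}
```

For example, the field value 0x7F8 is -8 as an 11-bit offset but +2040 as a 16-bit offset.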

SDXC1 Store Doubleword Indexed from Floating Point

**Format:** SDXC1 fs, index(base)

**Purpose:** Store Doubleword Indexed from Floating Point
To store a doubleword from an FPR to memory (GPR+GPR addressing).

**Description:** memory[GPR[base] + GPR[index]] ← FPR[fs]
The 64-bit doubleword in FPR \( fs \) is stored in memory at the location specified by the aligned effective address. The contents of GPR index and GPR base are added to form the effective address.

**Restrictions:**
An Address Error exception occurs if EffectiveAddress\(_{2..0}\) ≠ 0 (not doubleword-aligned).

**Availability and Compatibility:**
This instruction has been removed in Release 6.

Pre-Release 6: required in all versions of MIPS64 since MIPS64 Release 1. Not available in MIPS32 Release 1. Required in MIPS32 Release 2 and all subsequent pre-Release 6 versions of MIPS32. When required, this instruction is to be implemented if an FPU is present, whether a 32-bit or 64-bit FPU, in either 32-bit or 64-bit FP register mode (\( FIR_{F64} = 0 \) or 1, \( Status_{FR} = 0 \) or 1).

**Operation:**

\[
\begin{align*}
\text{vAddr} & \leftarrow \text{GPR}[\text{base}] + \text{GPR}[\text{index}] \\
& \text{if } \text{vAddr}_{2..0} \neq 0^3 \text{ then} \\
& \quad \text{SignalException}(\text{AddressError}) \\
& \text{endif} \\
(\text{pAddr}, \text{CCA}) & \leftarrow \text{AddressTranslation}(\text{vAddr}, \text{DATA}, \text{STORE}) \\
\text{datadoubleword} & \leftarrow \text{ValueFPR}(\text{fs}, \text{UNINTERPRETED\_DOUBLEWORD}) \\
\text{pAddr} & \leftarrow \text{pAddr} \ \text{xor}\ ((\text{BigEndianCPU} \ \text{xor}\ \text{ReverseEndian}) \,\|\, 0^2) \\
& \text{StoreMemory}(\text{CCA}, \text{WORD}, \text{datadoubleword}_{31..0}, \text{pAddr}, \text{vAddr}, \text{DATA}) \\
\text{pAddr} & \leftarrow \text{pAddr} \ \text{xor}\ \texttt{0b100} \\
& \text{StoreMemory}(\text{CCA}, \text{WORD}, \text{datadoubleword}_{63..32}, \text{pAddr}, \text{vAddr}+4, \text{DATA})
\end{align*}
\]
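The first step of the SDXC1 operation, forming the GPR+GPR effective address and checking doubleword alignment, can be sketched in C as below. The helper name is hypothetical and 32-bit GPRs are assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: SDXC1 effective address plus the alignment check.
 * Returns false where the pseudocode signals AddressError. */
bool sdxc1_effective_address(uint32_t base, uint32_t index, uint32_t *vaddr)
{
    *vaddr = base + index;          /* GPR[base] + GPR[index], wraps mod 2^32 */
    return (*vaddr & 0x7u) == 0;    /* vAddr[2..0] must be 0 */
}
```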

**Exceptions:**
TLB Refill, TLB Invalid, TLB Modified, Coprocessor Unusable, Address Error, Reserved Instruction, Watch.
SEB: Sign-Extend Byte

**Format:** SEB rd, rt

**Purpose:** Sign-Extend Byte

To sign-extend the least significant byte of GPR rt and store the value into GPR rd.

**Description:**

\[
\text{GPR}[rd] \leftarrow \text{SignExtend}(\text{GPR}[rt]_{7..0})
\]

The least significant byte from GPR rt is sign-extended and stored in GPR rd.

**Restrictions:**

Prior to architecture Release 2, this instruction resulted in a Reserved Instruction exception.

**Operation:**

\[
\text{GPR}[rd] \leftarrow \text{sign\_extend}(\text{GPR}[rt]_{7..0})
\]
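A minimal C model of SEB on a 32-bit GPR, shown to make the sign propagation from bit 7 concrete; `seb` is a hypothetical name, not an intrinsic.

```c
#include <assert.h>
#include <stdint.h>

/* Model of SEB on a 32-bit GPR: sign-extend bits 7..0 of rt. */
uint32_t seb(uint32_t rt)
{
    uint32_t byte = rt & 0xFFu;
    /* (b ^ 0x80) - 0x80 propagates bit 7 without relying on
     * implementation-defined narrowing conversions */
    return (uint32_t)((int32_t)(byte ^ 0x80u) - 0x80);
}
```

Only bits 7..0 of rt matter; the upper 24 bits of the input are discarded.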

**Exceptions:**

Reserved Instruction

**Programming Notes:**

For symmetry with the SEB and SEH instructions, you might expect ZEB and ZEH instructions that zero-extend the source operand, and SEW and ZEW instructions that sign- or zero-extend a word to a doubleword. These instructions do not exist because functionally equivalent instructions are already in the instruction set. The following table shows the instructions that provide the equivalent functions.

<table>
<thead>
<tr>
<th>Expected Instruction</th>
<th>Function</th>
<th>Equivalent Instruction</th>
</tr>
</thead>
<tbody>
<tr>
<td>ZEB rx, ry</td>
<td>Zero-Extend Byte</td>
<td>ANDI rx, ry, 0xFF</td>
</tr>
<tr>
<td>ZEH rx, ry</td>
<td>Zero-Extend Halfword</td>
<td>ANDI rx, ry, 0xFFFF</td>
</tr>
</tbody>
</table>

SEH Sign-Extend Halfword

Format: SEH rd, rt

Purpose: Sign-Extend Halfword

To sign-extend the least significant halfword of GPR rt and store the value into GPR rd.

Description: GPR[rd] ← SignExtend(GPR[rt]_{15..0})

The least significant halfword from GPR rt is sign-extended and stored in GPR rd.

Restrictions:

In implementations prior to Release 2 of the architecture, this instruction resulted in a Reserved Instruction exception.

Operation:

GPR[rd] ← sign_extend(GPR[rt]_{15..0})

Exceptions:

Reserved Instruction

Programming Notes:

The SEH instruction can be used to convert two contiguous halfwords to sign-extended word values in three instructions. For example:

```
lw t0, 0(a1)   /* Read two contiguous halfwords */
seh t1, t0     /* t1 = lower halfword sign-extended to word */
sra t0, t0, 16 /* t0 = upper halfword sign-extended to word */
```

Zero-extended halfwords can be created by changing the SEH and SRA instructions to ANDI and SRL instructions, respectively.
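The two idioms above (SEH/SRA and ANDI/SRL) can be modelled in C to check what each register ends up holding. The function names are hypothetical and a 32-bit GPR is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend bits 15..0 (what SEH does), avoiding implementation-defined casts. */
static int32_t sext16(uint32_t x)
{
    return (int32_t)((x & 0xFFFFu) ^ 0x8000u) - 0x8000;
}

/* lw t0; seh t1, t0; sra t0, t0, 16 */
void split_halfwords_signed(uint32_t word, int32_t *lower, int32_t *upper)
{
    *lower = sext16(word);          /* seh t1, t0 */
    *upper = sext16(word >> 16);    /* sra t0, t0, 16 */
}

/* Zero-extending variant: andi t1, t0, 0xFFFF; srl t0, t0, 16 */
void split_halfwords_unsigned(uint32_t word, uint32_t *lower, uint32_t *upper)
{
    *lower = word & 0xFFFFu;
    *upper = word >> 16;
}
```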

For symmetry with the SEB and SEH instructions, you might expect ZEB and ZEH instructions that zero-extend the source operand, and SEW and ZEW instructions that sign- or zero-extend a word to a doubleword. These instructions do not exist because functionally equivalent instructions are already in the instruction set. The following table shows the instructions that provide the equivalent functions.

<table>
<thead>
<tr>
<th>Expected Instruction</th>
<th>Function</th>
<th>Equivalent Instruction</th>
</tr>
</thead>
<tbody>
<tr>
<td>ZEB rx, ry</td>
<td>Zero-Extend Byte</td>
<td>ANDI rx, ry, 0xFF</td>
</tr>
<tr>
<td>ZEH rx, ry</td>
<td>Zero-Extend Halfword</td>
<td>ANDI rx, ry, 0xFFFF</td>
</tr>
</tbody>
</table>

The MIPS32® Instruction Set Manual, Revision 6.04

Copyright © 2015 Imagination Technologies LTD. and/or its Affiliated Group Companies. All rights reserved.
**SEL.fmt**  
Select floating point values with FPR condition

Format:  

<table>
<thead>
<tr>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..11</th>
<th>10..6</th>
<th>5..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>COP1 010001</td>
<td>fmt (S, D only)</td>
<td>ft</td>
<td>fs</td>
<td>fd</td>
<td>SEL</td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>6</td>
</tr>
</tbody>
</table>

**Purpose:** Select floating point values with FPR condition

**Description:**  
\[ \text{FPR}[fd] \leftarrow \text{FPR}[fd].\text{bit}0 \ ? \ \text{FPR}[ft] : \text{FPR}[fs] \]

SEL.fmt is a select operation, with a condition input in FPR \( fd \), and 2 data inputs in FPRs \( ft \) and \( fs \).

- If the condition is true, the value of \( ft \) is written to \( fd \).
- If the condition is false, the value of \( fs \) is written to \( fd \).

The condition input is specified by FPR \( fd \), and is overwritten by the result. The condition is true only if bit 0 of the condition input FPR \( fd \) is set. Other bits are ignored.

This instruction has floating point formats S and D, but these specify only the width of the operands. SEL.S can be used for 32-bit W data, and SEL.D can be used for 64-bit L data.

This instruction does not cause data-dependent exceptions. It does not trap on NaNs, and the \( FCSR_{Cause} \) and \( FCSR_{Flags} \) fields are not modified.
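Because SEL.fmt moves raw bit patterns and tests only bit 0 of the old fd, it can be modelled on 64-bit integers. A minimal sketch of SEL.D under that reading; `sel_fmt` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Model of SEL.D on raw 64-bit FPR contents. Only bit 0 of the condition
 * (the old fd) is tested; values are copied as bit patterns, so NaNs pass
 * through unchanged and no FCSR state is touched. */
uint64_t sel_fmt(uint64_t fd, uint64_t fs, uint64_t ft)
{
    return (fd & 1u) ? ft : fs;
}
```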

**Restrictions:**

None

**Availability and Compatibility:**

SEL.fmt is introduced by and required as of MIPS32 Release 6.

**Special Considerations:**

Only formats S and D are valid. Other format values may be used to encode other instructions. Unused format encodings are required to signal the Reserved Instruction exception.

**Operation:**

\[
\begin{align*}
tmp & \leftarrow \text{ValueFPR}(fd, \text{UNINTERPRETED\_WORD}) \\
\text{cond} & \leftarrow \text{tmp}.\text{bit}0 \\
\text{if} \ \text{cond} & \ \text{then} \\
& \ \text{tmp} \leftarrow \text{ValueFPR}(ft, \text{fmt}) \\
\text{else} & \\
& \ \text{tmp} \leftarrow \text{ValueFPR}(fs, \text{fmt}) \\
\text{endif} \\
\text{StoreFPR}(fd, \text{fmt}, \text{tmp})
\end{align*}
\]

**Exceptions:**

Coprocessor Unusable, Reserved Instruction

**Floating Point Exceptions:**

None
SELEQZ SELNEZ

Select integer GPR value or zero

Format:

SELEQZ  rd,rs,rt

SELNEZ  rd,rs,rt

MIPS32 Release 6

Purpose: Select integer GPR value or zero

Description:


• SELEQZ is a select operation, with a condition input in GPR rt, one explicit data input in GPR rs, and implicit data input 0. The condition is true only if all bits in GPR rt are zero.

• SELNEZ is a select operation, with a condition input in GPR rt, one explicit data input in GPR rs, and implicit data input 0. The condition is true only if any bit in GPR rt is nonzero.

If the condition is true, the value of rs is written to rd.

If the condition is false, zero is written to rd.

This instruction operates on all GPRLEN bits of the CPU registers, that is, all 32 bits on a 32-bit CPU, and all 64 bits on a 64-bit CPU. All GPRLEN bits of rt are tested.

Restrictions:

None

Availability and Compatibility:

These instructions are introduced by and required as of MIPS32 Release 6.

Special Considerations:

None

Operation:

SELEQZ: cond ← GPR[rt] = 0

SELNEZ: cond ← GPR[rt] ≠ 0

if cond then
    tmp ← GPR[rs]
else
    tmp ← 0
endif

GPR[rd] ← tmp

Exceptions:

None
Programming Note:

Release 6 removes the Pre-Release 6 instructions MOVZ and MOVN:

MOVZ: if GPR[rt] = 0 then GPR[rd] ← GPR[rs]
MOVN: if GPR[rt] ≠ 0 then GPR[rd] ← GPR[rs]

MOVZ can be emulated using Release 6 instructions as follows:

SELEQZ at, rs, rt
SELNEZ rd, rd, rt
OR rd, rd, at

Similarly MOVN:

SELNEZ at, rs, rt
SELEQZ rd, rd, rt
OR rd, rd, at

The more general select operation requires 4 registers (1 output + 3 inputs (1 condition + 2 data)) and can be expressed:

rD ← if rC then rA else rB

The more general select can be created using Release 6 instructions as follows:

SELEQZ at, rB, rC
SELNEZ rD, rA, rC
OR rD, rD, at
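The emulation sequences above work because at most one of the two SEL results is nonzero, so OR merges them. A C sketch that checks this, assuming 32-bit GPRs; all function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* C models of the Release 6 integer selects (32-bit GPRs). */
static uint32_t seleqz(uint32_t rs, uint32_t rt) { return rt == 0 ? rs : 0; }
static uint32_t selnez(uint32_t rs, uint32_t rt) { return rt != 0 ? rs : 0; }

/* MOVZ rd, rs, rt emulated as SELEQZ at,rs,rt; SELNEZ rd,rd,rt; OR rd,rd,at */
uint32_t movz_emul(uint32_t rd, uint32_t rs, uint32_t rt)
{
    uint32_t at = seleqz(rs, rt);   /* rs if rt == 0, else 0 */
    rd = selnez(rd, rt);            /* old rd if rt != 0, else 0 */
    return rd | at;
}

/* rD <- rC ? rA : rB as SELEQZ at,rB,rC; SELNEZ rD,rA,rC; OR rD,rD,at */
uint32_t select_emul(uint32_t rC, uint32_t rA, uint32_t rB)
{
    uint32_t at = seleqz(rB, rC);
    uint32_t rD = selnez(rA, rC);
    return rD | at;
}
```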
SELEQZ.fmt SELNEQZ.fmt

Select floating point value or zero with FPR condition.

Format:

<table>
<thead>
<tr>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..11</th>
<th>10..6</th>
<th>5..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>COP1 010001</td>
<td>fmt (S, D only)</td>
<td>ft</td>
<td>fs</td>
<td>fd</td>
<td>SELEQZ / SELNEQZ</td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>6</td>
</tr>
</tbody>
</table>

SELEQZ.S fd,fs,ft

SELEQZ.D fd,fs,ft

SELNEQZ.S fd,fs,ft

SELNEQZ.D fd,fs,ft

Purpose: Select floating point value or zero with FPR condition.

Description:

- **SELEQZ.fmt**: \( \text{FPR}[fd] \leftarrow \text{FPR}[ft].\text{bit0} \ ? \ 0 : \text{FPR}[fs] \)

- **SELNEQZ.fmt**: \( \text{FPR}[fd] \leftarrow \text{FPR}[ft].\text{bit0} \ ? \ \text{FPR}[fs] : 0 \)

- **SELEQZ.fmt** is a select operation, with a condition input in FPR \( ft \), one explicit data input in FPR \( fs \), and implicit data input 0. The condition is true only if bit 0 of FPR \( ft \) is zero.

- **SELNEQZ.fmt** is a select operation, with a condition input in FPR \( ft \), one explicit data input in FPR \( fs \), and implicit data input 0. The condition is true only if bit 0 of FPR \( ft \) is nonzero.

If the condition is true, the value of \( fs \) is written to \( fd \).
If the condition is false, the value that has all bits zero is written to \( fd \).

This instruction has floating point formats S and D, but these specify only the width of the operands. Format S can be used for 32-bit W data, and format D can be used for 64-bit L data. The condition test is restricted to bit 0 of FPR \( ft \). Other bits are ignored.

This instruction has no execution exception behavior. It does not trap on NaNs, and the \( FCSR_{\text{Cause}} \) and \( FCSR_{\text{Flags}} \) fields are not modified.
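A minimal C model of the D-format variants on raw 64-bit bit patterns, illustrating that only bit 0 of ft is tested (an ft value of 2 is nonzero but still counts as "bit 0 clear"). The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Models of SELEQZ.D / SELNEQZ.D on raw 64-bit FPR contents.
 * The result is either fs copied bit-for-bit or all-zero bits. */
uint64_t seleqz_fmt(uint64_t fs, uint64_t ft)
{
    return (ft & 1u) == 0 ? fs : 0;
}

uint64_t selneqz_fmt(uint64_t fs, uint64_t ft)
{
    return (ft & 1u) != 0 ? fs : 0;
}
```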

Restrictions:

FPR \( fd \) destination register bits beyond the format width are UNPREDICTABLE. For example, if \( fmt \) is S, then \( fd \) bits 0-31 are defined, but bits 32 and above are UNPREDICTABLE. If \( fmt \) is D, then \( fd \) bits 0-63 are defined.

Availability and Compatibility:

These instructions are introduced by and required as of MIPS32 Release 6.

Special Considerations:

Only formats S and D are valid. Other format values may be used to encode other instructions. Unused format encodings are required to signal the Reserved Instruction exception.

Operation:

```plaintext
tmp ← ValueFPR(ft, UNINTERPRETED_WORD)
SELEQZ:  cond ← tmp.bit0 = 0
SELNEQZ: cond ← tmp.bit0 ≠ 0
if cond then
    tmp ← ValueFPR(fs, fmt)
else
    tmp ← 0 /* all bits set to zero */
endif
StoreFPR(fd, fmt, tmp)
```

Exceptions:
Coprocessor Unusable, Reserved Instruction

Floating Point Exceptions:
None

SH

Store Halfword

<table>
<thead>
<tr>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>SH 101001</td>
<td>base</td>
<td>rt</td>
<td>offset</td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>16</td>
</tr>
</tbody>
</table>

Format: SH rt, offset(base)

Purpose: Store Halfword

To store a halfword to memory.

Description: memory[GPR[base] + offset] ← GPR[rt]

The least-significant 16-bit halfword of register rt is stored in memory at the location specified by the aligned effective address. The 16-bit signed offset is added to the contents of GPR base to form the effective address.

Restrictions:

Pre-Release 6: The effective address must be naturally-aligned. If the least-significant bit of the address is non-zero, an Address Error exception occurs.

Release 6 allows hardware to provide address misalignment support in lieu of requiring natural alignment.

Note: The pseudocode is not completely adapted for Release 6 misalignment support as the handling is implementation dependent.

Operation:

\[
\begin{align*}
\text{vAddr} & \leftarrow \text{sign\_extend}(\text{offset}) + \text{GPR}[\text{base}] \\
(\text{pAddr}, \text{CCA}) & \leftarrow \text{AddressTranslation}(\text{vAddr}, \text{DATA}, \text{STORE}) \\
\text{pAddr} & \leftarrow \text{pAddr}_{\text{PSIZE}-1..2} \,\|\, (\text{pAddr}_{1..0} \ \text{xor}\ (\text{ReverseEndian} \,\|\, 0)) \\
\text{bytesel} & \leftarrow \text{vAddr}_{1..0} \ \text{xor}\ (\text{BigEndianCPU} \,\|\, 0) \\
\text{dataword} & \leftarrow \text{GPR}[\text{rt}]_{31-8*\text{bytesel}..0} \,\|\, 0^{8*\text{bytesel}} \\
& \text{StoreMemory}(\text{CCA}, \text{HALFWORD}, \text{dataword}, \text{pAddr}, \text{vAddr}, \text{DATA})
\end{align*}
\]
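The byte-lane arithmetic in the SH operation can be checked with a small C model: selecting bits 31-8*bytesel..0 and appending 8*bytesel zero bits is simply a left shift by 8*bytesel. This sketch assumes 32-bit GPRs; `sh_dataword` is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: returns the dataword passed to StoreMemory and
 * applies the ReverseEndian adjustment to *paddr. */
uint32_t sh_dataword(uint32_t rt, uint32_t vaddr, uint64_t *paddr,
                     bool big_endian_cpu, bool reverse_endian)
{
    /* pAddr[1..0] xor (ReverseEndian || 0) */
    *paddr ^= (uint64_t)reverse_endian << 1;
    /* bytesel = vAddr[1..0] xor (BigEndianCPU || 0) */
    uint32_t bytesel = (vaddr & 3u) ^ ((uint32_t)big_endian_cpu << 1);
    /* GPR[rt][31-8*bytesel..0] || 0^(8*bytesel) */
    return rt << (8 * bytesel);
}
```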

Exceptions:

TLB Refill, TLB Invalid, TLB Modified, Address Error, Watch
SHE Store Halfword EVA

**Format:** SHE rt, offset(base)  

**Purpose:** Store Halfword EVA  

To store a halfword to user mode virtual address space when executing in kernel mode.

**Description:** memory[GPR[base] + offset] ← GPR[rt]  

The least-significant 16-bit halfword of register rt is stored in memory at the location specified by the aligned effective address. The 9-bit signed offset is added to the contents of GPR base to form the effective address.

The SHE instruction functions the same as the SH instruction, except that address translation is performed using the user mode virtual address space mapping in the TLB when accessing an address within a memory segment configured to use the MUSUK access mode. Memory segments using UUSK or MUSK access modes are also accessible. Refer to Volume III, Enhanced Virtual Addressing section for additional information.

Implementation of this instruction is specified by the Config5EVA field being set to 1.

**Restrictions:**

Only usable in kernel mode when accessing an address within a segment configured using UUSK, MUSK or MUSUK access mode.

Pre-Release 6: The effective address must be naturally-aligned. If the least-significant bit of the address is non-zero, an Address Error exception occurs.

Release 6 allows hardware to provide address misalignment support in lieu of requiring natural alignment.

Note: The pseudocode is not completely adapted for Release 6 misalignment support as the handling is implementation dependent.

**Operation:**

\[
\begin{align*}
\text{vAddr} & \leftarrow \text{sign\_extend}(\text{offset}) + \text{GPR}[\text{base}] \\
(\text{pAddr}, \text{CCA}) & \leftarrow \text{AddressTranslation}(\text{vAddr}, \text{DATA}, \text{STORE}) \\
\text{pAddr} & \leftarrow \text{pAddr}_{\text{PSIZE}-1..2} \,\|\, (\text{pAddr}_{1..0} \ \text{xor}\ (\text{ReverseEndian} \,\|\, 0)) \\
\text{bytesel} & \leftarrow \text{vAddr}_{1..0} \ \text{xor}\ (\text{BigEndianCPU} \,\|\, 0) \\
\text{dataword} & \leftarrow \text{GPR}[\text{rt}]_{31-8*\text{bytesel}..0} \,\|\, 0^{8*\text{bytesel}} \\
& \text{StoreMemory}(\text{CCA}, \text{HALFWORD}, \text{dataword}, \text{pAddr}, \text{vAddr}, \text{DATA})
\end{align*}
\]

**Exceptions:**

TLB Refill, TLB Invalid, Bus Error, Address Error, Watch, Reserved Instruction, Coprocessor Unusable

SIGRIE Signal Reserved Instruction Exception

**Purpose:** Signal Reserved Instruction Exception

The SIGRIE instruction signals a Reserved Instruction exception.

**Description:** SignalException(ReservedInstruction)

The SIGRIE instruction signals a Reserved Instruction exception. Implementations should use exactly the same mechanisms as they use for reserved instructions that are not defined by the Architecture.

The 16-bit `code` field is available for software use.

**Restrictions:**

The 16-bit `code` field is available for software use. The value zero is considered the default value. Software may provide extended functionality by interpreting nonzero values of the `code` field in a manner that is outside the scope of this architecture specification.

**Availability and Compatibility:**

This instruction is introduced by and required as of Release 6.

Pre-Release 6: this instruction encoding was reserved, and required to signal a Reserved Instruction exception. Therefore this instruction can be considered to be both backwards and forwards compatible.

**Operation:**

`SignalException(ReservedInstruction)`

**Exceptions:**

Reserved Instruction
**SLL**

**Shift Word Left Logical**

<table>
<thead>
<tr>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..11</th>
<th>10..6</th>
<th>5..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>SPECIAL 000000</td>
<td>0 00000</td>
<td>rt</td>
<td>rd</td>
<td>sa</td>
<td>SLL 000000</td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>6</td>
</tr>
</tbody>
</table>

**Format:** $\text{SLL } rd, \, rt, \, sa$

**Purpose:** Shift Word Left Logical

To left-shift a word by a fixed number of bits.

**Description:** $\text{GPR}[rd] \leftarrow \text{GPR}[rt] \ll sa$

The contents of the low-order 32-bit word of GPR $rt$ are shifted left, inserting zeros into the emptied bits. The word result is placed in GPR $rd$. The bit-shift amount is specified by $sa$.

**Restrictions:**
None

**Operation:**

\[
\begin{align*}
s & \leftarrow sa \\
\text{temp} & \leftarrow \text{GPR}[rt]_{(31-s)..0} \,\|\, 0^s \\
\text{GPR}[rd] & \leftarrow \text{temp}
\end{align*}
\]
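A minimal C model of SLL on a 32-bit GPR; `sll` is a hypothetical name. Keeping `sa` below 32 also keeps the C shift well-defined.

```c
#include <assert.h>
#include <stdint.h>

/* Model of SLL on a 32-bit GPR: sa is the 5-bit shift-amount field. */
uint32_t sll(uint32_t rt, unsigned sa)
{
    assert(sa < 32);    /* sa is a 5-bit field */
    return rt << sa;    /* zeros fill the vacated low-order bits */
}
```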

**Exceptions:**
None

**Programming Notes:**

SLL r0, r0, 0, expressed as NOP, is the assembly idiom used to denote no operation.

SLL r0, r0, 1, expressed as SSNOP, is the assembly idiom used to denote a no-operation that causes an issue break on superscalar processors.
SLLV

**Shift Word Left Logical Variable**

**Format:**  SLLV rd, rt, rs  

**Purpose:**  Shift Word Left Logical Variable  
To left-shift a word by a variable number of bits.

**Description:**  
GPR[rd] ← GPR[rt] << GPR[rs]  
The contents of the low-order 32-bit word of GPR rt are shifted left, inserting zeros into the emptied bits. The resulting word is placed in GPR rd. The bit-shift amount is specified by the low-order 5 bits of GPR rs.

**Restrictions:**  
None

**Operation:**

\[
\begin{align*}
s & \leftarrow \text{GPR}[rs]_{4..0} \\
\text{temp} & \leftarrow \text{GPR}[rt]_{(31-s)..0} \,\|\, 0^{s} \\
\text{GPR}[rd] & \leftarrow \text{temp}
\end{align*}
\]

**Exceptions:**  
None

**Programming Notes:**  
None
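Because SLLV uses only the low-order 5 bits of GPR rs, shift counts of 32 or more wrap around instead of clearing the register. This difference matters when modelling SLLV in C, where shifting a 32-bit value by 32 or more is undefined behavior. The following is a minimal sketch that applies the mask explicitly. The name `mips_sllv` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of SLLV: the shift amount is GPR[rs]4..0, so only
 * the low 5 bits of rs matter (a count of 33 behaves like a count of 1). */
uint32_t mips_sllv(uint32_t rt, uint32_t rs)
{
    unsigned s = rs & 0x1Fu;   /* s <- GPR[rs]4..0; also keeps the C shift well-defined */
    return rt << s;            /* zeros are inserted into the emptied low-order bits */
}
```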
**SLT**

**Set on Less Than**

**Format:**  
SLT rd, rs, rt

**Purpose:**  
Set on Less Than

To record the result of a less-than comparison.

**Description:**  
GPR[rd] ← (GPR[rs] < GPR[rt])

Compare the contents of GPR rs and GPR rt as signed integers; record the Boolean result of the comparison in GPR rd. If GPR rs is less than GPR rt, the result is 1 (true); otherwise, it is 0 (false).

The arithmetic comparison does not cause an Integer Overflow exception.

**Restrictions:**

None

**Operation:**

```plaintext
if GPR[rs] < GPR[rt] then
    GPR[rd] ← 0^{GPRLEN-1} || 1
else
    GPR[rd] ← 0^{GPRLEN}
endif
```

**Exceptions:**

None
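The following minimal C sketch models the SLT comparison for a 32-bit GPR. The name `mips_slt` is hypothetical. The sketch avoids casting to `int32_t`, because in ISO C converting an out-of-range unsigned value to a signed type is implementation-defined. Instead, it flips bit 31 of both operands, which maps two's complement order onto unsigned order.

```c
#include <stdint.h>

/* Hypothetical model of SLT: signed less-than on two 32-bit words.
 * XOR with 0x80000000 adds 2^31 modulo 2^32, so INT32_MIN maps to 0 and
 * INT32_MAX maps to 0xFFFFFFFF; an unsigned compare then gives signed order.
 * The comparison itself never overflows, matching the description above. */
uint32_t mips_slt(uint32_t rs, uint32_t rt)
{
    return ((rs ^ 0x80000000u) < (rt ^ 0x80000000u)) ? 1u : 0u;
}
```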
**SLTI**

**Set on Less Than Immediate**

**Format:**  
SLTI rt, rs, immediate

**Purpose:**  
Set on Less Than Immediate  
To record the result of a less-than comparison with a constant.

**Description:**  
GPR[rt] ← (GPR[rs] < sign_extend(immediate))

Compare the contents of GPR rs and the 16-bit signed immediate as signed integers; record the Boolean result of the comparison in GPR rt. If GPR rs is less than immediate, the result is 1 (true); otherwise, it is 0 (false).

The arithmetic comparison does not cause an Integer Overflow exception.

**Restrictions:**
None

**Operation:**

```
if GPR[rs] < sign_extend(immediate) then
    GPR[rt] ← 0^{GPRLEN-1}|| 1
else
    GPR[rt] ← 0^{GPRLEN}
endif
```

**Exceptions:**
None
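SLTI differs from SLT only in its second operand, which is the 16-bit immediate after `sign_extend`. The following is a minimal C sketch, assuming a 32-bit GPR. The name `mips_slti` is hypothetical. It also reuses the bit-31 bias trick to perform the signed comparison without signed conversions.

```c
#include <stdint.h>

/* Hypothetical model of SLTI: sign-extend the 16-bit immediate field,
 * then compare it with GPR[rs] as signed 32-bit integers. */
uint32_t mips_slti(uint32_t rs, uint16_t immediate)
{
    /* sign_extend(immediate): copy bit 15 into bits 31..16 */
    uint32_t imm = (immediate & 0x8000u) ? (0xFFFF0000u | immediate)
                                         : (uint32_t)immediate;
    /* flipping bit 31 turns the signed comparison into an unsigned one */
    return ((rs ^ 0x80000000u) < (imm ^ 0x80000000u)) ? 1u : 0u;
}
```

For instance, an immediate field of `0xFFFF` represents -1, so `mips_slti(0xFFFFFFFEu, 0xFFFF)` (that is, -2 < -1) returns 1.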
**SLTIU**

**Set on Less Than Immediate Unsigned**

Format: SLTIU rt, rs, immediate

Purpose: Set on Less Than Immediate Unsigned
To record the result of an unsigned less-than comparison with a constant.

Description: GPR[rt] ← (GPR[rs] < sign_extend(immediate))
Compare the contents of GPR rs and the sign-extended 16-bit immediate as unsigned integers; record the Boolean result of the comparison in GPR rt. If GPR rs is less than the sign-extended immediate, the result is 1 (true); otherwise, it is 0 (false).
Because the 16-bit immediate is sign-extended before the comparison, it can only represent values at the two extremes of the unsigned range: the minimum end [0, 32767] or the maximum end [max_unsigned-32767, max_unsigned].
The arithmetic comparison does not cause an Integer Overflow exception.

Restrictions:
None

Operation:
if (0 || GPR[rs]) < (0 || sign_extend(immediate)) then
    GPR[rt] ← 0^{GPRLEN-1} || 1
else
    GPR[rt] ← 0^{GPRLEN}
endif

Exceptions:
None
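The order of operations in SLTIU (sign-extend first, then compare unsigned) is what restricts the immediate to the two ends of the unsigned range. The following minimal C sketch, assuming a 32-bit GPR, makes that order explicit. The name `mips_sltiu` is hypothetical. An immediate field of `0x8000`, for example, becomes the unsigned value `0xFFFF8000`.

```c
#include <stdint.h>

/* Hypothetical model of SLTIU: sign-extend the immediate, then compare
 * both operands as unsigned 32-bit integers. */
uint32_t mips_sltiu(uint32_t rs, uint16_t immediate)
{
    uint32_t imm = (immediate & 0x8000u) ? (0xFFFF0000u | immediate)  /* 0x8000..0xFFFF -> top of range */
                                         : (uint32_t)immediate;       /* 0..0x7FFF -> bottom of range */
    return (rs < imm) ? 1u : 0u;   /* unsigned comparison, cannot overflow */
}
```

One consequence of the definition is that `SLTIU rt, rs, 1` sets rt to 1 exactly when rs is zero, because 0 is the only unsigned value below 1.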
**SLTU**

**Set on Less Than Unsigned**

**Format:** \( \text{SLTU } rd, rs, rt \)  

**Purpose:** Set on Less Than Unsigned  
To record the result of an unsigned less-than comparison.

**Description:**  
\[ \text{GPR}[rd] \leftarrow (\text{GPR}[rs] < \text{GPR}[rt]) \]  
Compare the contents of GPR \( rs \) and GPR \( rt \) as unsigned integers; record the Boolean result of the comparison in GPR \( rd \). If GPR \( rs \) is less than GPR \( rt \), the result is 1 (true); otherwise, it is 0 (false).

The arithmetic comparison does not cause an Integer Overflow exception.

**Restrictions:**  
None

**Operation:**  
\[
\begin{align*}
&\text{if } (0 \,\|\, \text{GPR}[rs]) < (0 \,\|\, \text{GPR}[rt]) \text{ then} \\
&\quad \text{GPR}[rd] \leftarrow 0^{\text{GPRLEN}-1} \,\|\, 1 \\
&\text{else} \\
&\quad \text{GPR}[rd] \leftarrow 0^{\text{GPRLEN}} \\
&\text{endif}
\end{align*}
\]

**Exceptions:**  
None
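In the SLTU pseudocode, prepending a 0 bit to each operand forces the comparison to treat both operands as non-negative. In C, this corresponds to comparing `uint32_t` values directly. The following minimal sketch uses the hypothetical name `mips_sltu`. The same bit pattern `0xFFFFFFFF` is the largest value under SLTU, whereas SLT treats it as -1.

```c
#include <stdint.h>

/* Hypothetical model of SLTU: unsigned less-than on two 32-bit words. */
uint32_t mips_sltu(uint32_t rs, uint32_t rt)
{
    return (rs < rt) ? 1u : 0u;   /* 0xFFFFFFFF is the maximum, not -1 */
}
```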
**SQRT.fmt**

**Floating Point Square Root**

**Purpose:**
To compute the square root of an FP value.

**Description:**
\[ FPR[fd] \leftarrow SQRT(FPR[fs]) \]

The square root of the value in FPR \( fs \) is calculated to infinite precision, rounded according to the current rounding mode in FCSR, and placed into FPR \( fd \). The operand and result are values in format \( fmt \).

If the value in FPR \( fs \) corresponds to \(-0\), the result is \(-0\).

**Restrictions:**
- If the value in FPR \( fs \) is less than 0, an Invalid Operation condition is raised.
- The fields \( fs \) and \( fd \) must specify FPRs valid for operands of type \( fmt \). If the fields are not valid, the result is UNPREDICTABLE.
- The operand must be a value in format \( fmt \); if it is not, the result is UNPREDICTABLE and the value of the operand FPR becomes UNPREDICTABLE.

**Operation:**
\[ \text{StoreFPR}(fd, fmt, \text{SquareRoot}(\text{ValueFPR}(fs, fmt))) \]

**Exceptions:**
- Coprocessor Unusable, Reserved Instruction
- Floating Point Exceptions:
  - Invalid Operation, Inexact, Unimplemented Operation

**Format:**
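```plaintext
 31     26 25    21 20    16 15    11 10     6 5       0
+---------+--------+--------+--------+--------+---------+
|  COP1   |  fmt   |   0    |   fs   |   fd   |  SQRT   |
| 010001  |        | 00000  |        |        | 000100  |
+---------+--------+--------+--------+--------+---------+
     6        5        5        5        5         6
```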

MIPS32

- \( \text{SQRT}.S \) fd, fs
- \( \text{SQRT}.D \) fd, fs
**SRA**

**Shift Word Right Arithmetic**

Format:   SRA rd, rt, sa

Purpose:  Shift Word Right Arithmetic

To execute an arithmetic right-shift of a word by a fixed number of bits.

Description:  GPR[rd] ← GPR[rt] >> sa  (arithmetic)

The contents of the low-order 32-bit word of GPR rt are shifted right, duplicating the sign-bit (bit 31) in the emptied bits; the word result is placed in GPR rd. The bit-shift amount is specified by sa.

Restrictions:
None

Operation:

\[
\begin{align*}
  &s \leftarrow sa \\
  &\text{temp} \leftarrow \text{GPR[rt]}_{31}^{s} || \text{GPR[rt]}_{31..s} \\
  &\text{GPR[rd]} \leftarrow \text{temp}
\end{align*}
\]

Exceptions:
None
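In ISO C, right-shifting a negative signed value is implementation-defined. A portable model of SRA therefore performs a logical shift on the unsigned word and then replicates bit 31 into the top s bits by hand, mirroring `temp <- (GPR[rt]31)^s || GPR[rt]31..s`. The following is a minimal sketch, assuming a 32-bit GPR. The name `mips_sra` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of SRA: arithmetic right shift of a 32-bit word. */
uint32_t mips_sra(uint32_t rt, unsigned sa)
{
    unsigned s = sa & 0x1Fu;                /* sa is a 5-bit field */
    uint32_t result = rt >> s;              /* GPR[rt]31..s, zero-filled on top */
    if (rt & 0x80000000u)                   /* sign bit set: fill the top s bits with ones */
        result |= ~(0xFFFFFFFFu >> s);      /* mask is 0 when s == 0 */
    return result;
}
```

For example, `mips_sra(0x80000000u, 4)` yields `0xF8000000`, whereas a logical shift of the same word would yield `0x08000000`.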
**SRAV**

**Shift Word Right Arithmetic Variable**

**Format:**  
SRAV rd, rt, rs

**MIPS32**

**Purpose:**  
Shift Word Right Arithmetic Variable

To execute an arithmetic right-shift of a word by a variable number of bits.

**Description:**  
GPR[rd] ← GPR[rt] >> GPR[rs]  (arithmetic)

The contents of the low-order 32-bit word of GPR rt are shifted right, duplicating the sign-bit (bit 31) in the emptied bits; the word result is placed in GPR rd. The bit-shift amount is specified by the low-order 5 bits of GPR rs.

**Restrictions:**

None

**Operation:**

\[ s \leftarrow GPR[rs]_{4..0} \]
\[ \text{temp} \leftarrow (GPR[rt]_{31})^{s} \,\|\, GPR[rt]_{31..s} \]
\[ GPR[rd] \leftarrow \text{temp} \]

**Exceptions:**

None
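SRAV can also be modelled without branching by using the common `(t ^ m) - m` sign-extension idiom. After a logical shift by s, the original sign bit sits at bit 31-s, and this idiom replicates that bit upward. The following is a minimal sketch, assuming a 32-bit GPR. The name `mips_srav` is hypothetical, and the idiom is a modelling choice rather than the manual's formulation.

```c
#include <stdint.h>

/* Hypothetical model of SRAV: shift amount taken from GPR[rs]4..0. */
uint32_t mips_srav(uint32_t rt, uint32_t rs)
{
    unsigned s = rs & 0x1Fu;            /* s <- GPR[rs]4..0 */
    uint32_t t = rt >> s;               /* GPR[rt]31..s with zeros on top */
    uint32_t m = 0x80000000u >> s;      /* where the old sign bit now sits */
    return (t ^ m) - m;                 /* sign-extend from bit 31-s (wraps mod 2^32) */
}
```

An arithmetic right shift rounds toward negative infinity. As a result, `mips_srav(0xFFFFFFF9u, 1u)` (that is, -7 shifted by 1) yields `0xFFFFFFFC` (-4), while C integer division `-7 / 2` truncates to -3.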
### SRL

#### Shift Word Right Logical

**Format:**  
SRL rd, rt, sa

**Purpose:**  
Shift Word Right Logical

To execute a logical right-shift of a word by a fixed number of bits.

**Description:**  
GPR[rd] $\leftarrow$ GPR[rt] $>>$ sa (logical)

The contents of the low-order 32-bit word of GPR rt are shifted right, inserting zeros into the emptied bits. The word result is placed in GPR rd. The bit-shift amount is specified by sa.

**Restrictions:**
None

**Operation:**

$$
\begin{align*}
  s & \leftarrow sa \\
  \text{temp} & \leftarrow 0^s \mid \mid \text{GPR[rt]}_{31..s} \\
  \text{GPR[rd]} & \leftarrow \text{temp}
\end{align*}
$$

**Exceptions:**
None
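SRL corresponds directly to C's `>>` on an unsigned 32-bit operand, which always fills the vacated high-order bits with zeros. The following is a minimal sketch. The name `mips_srl` is hypothetical. On the input `0x80000000` shifted by 4, SRL gives `0x08000000`, while SRA gives `0xF8000000`.

```c
#include <stdint.h>

/* Hypothetical model of SRL: temp <- 0^s || GPR[rt]31..s. */
uint32_t mips_srl(uint32_t rt, unsigned sa)
{
    unsigned s = sa & 0x1Fu;   /* sa is a 5-bit field */
    return rt >> s;            /* unsigned operand, so zeros enter from the left */
}
```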
**SRLV**

**Shift Word Right Logical Variable**

Format: \texttt{SRLV rd, rt, rs}

Purpose: Shift Word Right Logical Variable
To execute a logical right-shift of a word by a variable number of bits.

Description: \(\text{GPR}[rd] \leftarrow \text{GPR}[rt] \gg \text{GPR}[rs]\) (logical)
The contents of the low-order 32-bit word of GPR \texttt{rt} are shifted right, inserting zeros into the emptied bits; the word result is placed in GPR \texttt{rd}. The bit-shift amount is specified by the low-order 5 bits of GPR \texttt{rs}.

Restrictions:
None

Operation:
\[
\begin{align*}
  s & \leftarrow \text{GPR[rs]}_{4..0} \\
  \text{temp} & \leftarrow 0^{s} \,\|\, \text{GPR[rt]}_{31..s} \\
  \text{GPR[rd]} & \leftarrow \text{temp}
\end{align*}
\]

Exceptions:
None
**SSNOP**

**Superscalar No Operation**

**Format:** SSNOP (Assembly Idiom, MIPS32)

**Purpose:** Superscalar No Operation

Break superscalar issue on a superscalar processor.

**Description:**

SSNOP is the assembly idiom used to denote superscalar no operation. The actual instruction is interpreted by the hardware as SLL r0, r0, 1.

This instruction alters the instruction issue behavior on a superscalar processor by forcing the SSNOP instruction to single-issue. The processor must then end the current instruction issue between the instruction previous to the SSNOP and the SSNOP. The SSNOP then issues alone in the next issue slot.

On a single-issue processor, this instruction is a NOP that takes an issue slot.

**Restrictions:**

None

**Availability and Compatibility**

Release 6: The special no-operation instruction SSNOP is deprecated; it behaves the same as a conventional NOP. Its special behavior with respect to instruction issue is no longer guaranteed. The EHB and JR.HB instructions are provided to clear execution and instruction hazards.

Assemblers targeting specifically Release 6 should reject the SSNOP instruction with an error.

**Operation:**

None

**Exceptions:**

None

**Programming Notes:**

SSNOP is intended for use primarily to allow the programmer control over CP0 hazards by converting instructions into cycles in a superscalar processor. For example, to insert at least two cycles between an MTC0 and an ERET, one would use the following sequence:

```
mtc0 x,y
ssnop
ssnop
eret
```

The MTC0 issues in cycle T. Because the SSNOP instructions must issue alone, they may issue no earlier than cycle $T+1$ and cycle $T+2$, respectively. Finally, the ERET issues no earlier than cycle $T+3$. Although the instruction after an SSNOP may issue no earlier than the cycle after the SSNOP is issued, that instruction may issue later. This is because other implementation-dependent issue rules may apply that prevent an issue in the next cycle. Processors should not introduce any unnecessary delay in issuing SSNOP instructions.
### SUB Subtract Word

**Format:** \texttt{SUB rd, rs, rt}

**Purpose:** Subtract Word

To subtract 32-bit integers. If overflow occurs, then trap.

**Description:** \( GPR[rd] \leftarrow GPR[rs] - GPR[rt] \)

The 32-bit word value in GPR \( rt \) is subtracted from the 32-bit value in GPR \( rs \) to produce a 32-bit result. If the subtraction results in 32-bit 2's complement arithmetic overflow, then the destination register is not modified and an Integer Overflow exception occurs. If it does not overflow, the 32-bit result is placed into GPR \( rd \).

**Restrictions:**

None

**Operation:**

\[
\begin{align*}
&\text{temp} \leftarrow (\text{GPR}[rs]_{31} \,\|\, \text{GPR}[rs]_{31..0}) - (\text{GPR}[rt]_{31} \,\|\, \text{GPR}[rt]_{31..0}) \\
&\text{if temp}_{32} \neq \text{temp}_{31} \text{ then} \\
&\quad \text{SignalException(IntegerOverflow)} \\
&\text{else} \\
&\quad \text{GPR}[rd] \leftarrow \text{temp}_{31..0} \\
&\text{endif}
\end{align*}
\]

**Exceptions:**

Integer Overflow

**Programming Notes:**

SUBU performs the same arithmetic operation but does not trap on overflow.
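The SUB overflow test widens each operand by one copy of its sign bit and checks whether bits 32 and 31 of the difference disagree. The following minimal C sketch performs the same test with full 64-bit sign extension, which carries the same information in those two bits. The name `mips_sub` is hypothetical, and the `false` return value stands in for `SignalException(IntegerOverflow)`. On overflow, `*rd` is left unmodified, as the description requires.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sign-extend a 32-bit word to 64 bits (bit 31 copied into bits 63..32). */
static uint64_t widen(uint32_t x)
{
    return (x & 0x80000000u) ? (UINT64_C(0xFFFFFFFF00000000) | x) : (uint64_t)x;
}

/* Hypothetical model of SUB: returns false on Integer Overflow. */
bool mips_sub(uint32_t rs, uint32_t rt, uint32_t *rd)
{
    uint64_t temp = widen(rs) - widen(rt);          /* wraps modulo 2^64 */
    unsigned bit32 = (unsigned)((temp >> 32) & 1u);
    unsigned bit31 = (unsigned)((temp >> 31) & 1u);
    if (bit32 != bit31)
        return false;                               /* overflow: rd is not modified */
    *rd = (uint32_t)temp;                           /* GPR[rd] <- temp31..0 */
    return true;
}
```

For example, subtracting 1 from `0x80000000` (the most negative 32-bit value) overflows, while `-1 - (-2^31)` produces `0x7FFFFFFF` without trapping.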
**SUB.fmt**

**Floating Point Subtract**

Format:

- **SUB.S fd, fs, ft** (MIPS32)
- **SUB.D fd, fs, ft** (MIPS32)
- **SUB.PS fd, fs, ft** (MIPS32 Release 2, removed in Release 6)

Purpose:

Floating Point Subtract

To subtract FP values.

Description:

\[ \text{FPR}[fd] \leftarrow \text{FPR}[fs] - \text{FPR}[ft] \]

The value in FPR \( ft \) is subtracted from the value in FPR \( fs \). The result is calculated to infinite precision, rounded according to the current rounding mode in FCSR, and placed into FPR \( fd \). The operands and result are values in format \( fmt \). SUB.PS subtracts the upper and lower halves of FPR \( fs \) and FPR \( ft \) independently, and ORs together any generated exceptional conditions.

Restrictions:

The fields \( fs, ft, \) and \( fd \) must specify FPRs valid for operands of type \( fmt \). If the fields are not valid, the result is UNPREDICTABLE.

The operands must be values in format \( fmt \); if they are not, the result is UNPREDICTABLE and the value of the operand FPRs becomes UNPREDICTABLE.

The result of SUB.PS is UNPREDICTABLE if the processor is executing in the FR=0 32-bit FPU register model or on a 32-bit FPU; it is predictable only when executing on a 64-bit FPU in \( FR=1 \) mode.

Availability and Compatibility:

SUB.PS has been removed in Release 6.

Operation:

\[ \text{StoreFPR} (fd, fmt, \text{ValueFPR}(fs, fmt) -_{fmt} \text{ValueFPR}(ft, fmt)) \]

CPU Exceptions:

Coprocessor Unusable, Reserved Instruction

FPU Exceptions:

Inexact, Overflow, Underflow, Invalid Op, Unimplemented Op
\textbf{SUBU}

\textbf{Subtract Unsigned Word}

Format: \texttt{SUBU rd, rs, rt}

\textbf{Purpose:} Subtract Unsigned Word

To subtract 32-bit integers.

\textbf{Description:} \(\text{GPR}[rd] \leftarrow \text{GPR}[rs] - \text{GPR}[rt]\)

The 32-bit word value in \texttt{GPR rt} is subtracted from the 32-bit value in \texttt{GPR rs}, and the 32-bit arithmetic result is placed into \texttt{GPR rd}.

No integer overflow exception occurs under any circumstances.

\textbf{Restrictions:}

None

\textbf{Operation:}

\[
\text{temp} \leftarrow \text{GPR[rs]} - \text{GPR[rt]}
\]

\[
\text{GPR[rd]} \leftarrow \text{temp}
\]

\textbf{Exceptions:}

None

\textbf{Programming Notes:}

The term "unsigned" in the instruction name is a misnomer; this operation is 32-bit modulo arithmetic that does not trap on overflow. It is appropriate for unsigned arithmetic, such as address arithmetic, or integer arithmetic environments that ignore overflow, such as C language arithmetic.
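The following minimal C sketch illustrates the "modulo arithmetic" point in the note above. The name `mips_subu` is hypothetical. C defines unsigned arithmetic as wrapping modulo 2^N, so subtraction on `uint32_t` matches SUBU bit for bit. In contrast, signed overflow in C is undefined and never traps by definition, which is consistent with the note that SUBU suits C arithmetic.

```c
#include <stdint.h>

/* Hypothetical model of SUBU: 32-bit subtraction modulo 2^32, no trap. */
uint32_t mips_subu(uint32_t rs, uint32_t rt)
{
    return (uint32_t)(rs - rt);   /* wraps instead of signalling overflow */
}
```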
SUXC1

Store Doubleword Indexed Unaligned from Floating Point

Format: SUXC1 fs, index(base)  
MIPS64, MIPS32 Release 2, removed in Release 6

Purpose: Store Doubleword Indexed Unaligned from Floating Point
To store a doubleword from an FPR to memory (GPR+GPR addressing) ignoring alignment.

Description: memory[(GPR[base] + GPR[index])_{PSIZE-1..3}] ← FPR[fs]
The 64-bit doubleword in FPR fs is stored at the memory location specified by the effective address. The contents of GPR index and GPR base are added to form the effective address. The effective address is forced to doubleword alignment: EffectiveAddress_{2..0} are ignored.

Restrictions:
The result of this instruction is UNPREDICTABLE if the processor is executing in the FR=0 32-bit FPU register model or on a 32-bit FPU; it is predictable only when executing on a 64-bit FPU in FR=1 mode.

Availability and Compatibility
This instruction has been removed in Release 6.

Operation:

vAddr ← (GPR[base]+GPR[index])_{63..3} || 0^3
(pAddr, CCA) ← AddressTranslation(vAddr, DATA, STORE)
datadoubleword ← ValueFPR(fs, UNINTERPRETED_DOUBLEWORD)
pAddr ← pAddr xor ((BigEndianCPU xor ReverseEndian) || 0^2)
StoreMemory(CCA, WORD, datadoubleword_{31..0}, pAddr, vAddr, DATA)
pAddr ← pAddr xor 0b100
StoreMemory(CCA, WORD, datadoubleword_{63..32}, pAddr, vAddr+4, DATA)

Exceptions:
Coprocessor Unusable, Reserved Instruction, TLB Refill, TLB Invalid, TLB Modified, Watch
**SW**

**Store Word**

Format: \texttt{SW rt, offset(base)} (MIPS32)

Purpose: Store Word

To store a word to memory.

Description: \(\text{memory}[\text{GPR}[base] + offset] \leftarrow \text{GPR}[rt]\)

The least-significant 32-bit word of GPR \texttt{rt} is stored in memory at the location specified by the aligned effective address. The 16-bit signed \texttt{offset} is added to the contents of GPR \texttt{base} to form the effective address.

Restrictions:

Pre-Release 6: The effective address must be naturally-aligned. If either of the 2 least-significant bits of the address is non-zero, an Address Error exception occurs.

Release 6 allows hardware to provide address misalignment support in lieu of requiring natural alignment.

Note: The pseudocode is not completely adapted for Release 6 misalignment support as the handling is implementation dependent.

Operation:

\[
\begin{align*}
\text{vAddr} & \leftarrow \text{sign\_extend}(\text{offset}) + \text{GPR}[\text{base}] \\
(\text{pAddr}, \text{CCA}) & \leftarrow \text{AddressTranslation}(\text{vAddr}, \text{DATA}, \text{STORE}) \\
\text{dataword} & \leftarrow \text{GPR}[\text{rt}] \\
& \text{StoreMemory}(\text{CCA}, \text{WORD}, \text{dataword}, \text{pAddr}, \text{vAddr}, \text{DATA})
\end{align*}
\]

Exceptions:

TLB Refill, TLB Invalid, TLB Modified, Address Error, Watch
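The first line of the SW operation (the effective address calculation) and the pre-Release 6 alignment rule can be sketched in C as follows. This is a minimal sketch, assuming 32-bit virtual addresses. The helper names are hypothetical, and address translation and the memory write are not modelled. The offset field is sign-extended, so `0xFFFC` means -4.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: vAddr <- sign_extend(offset) + GPR[base]. */
uint32_t sw_effective_address(uint32_t base, uint16_t offset)
{
    uint32_t ext = (offset & 0x8000u) ? (0xFFFF0000u | offset) : (uint32_t)offset;
    return base + ext;                  /* address arithmetic wraps modulo 2^32 */
}

/* Hypothetical helper for the pre-Release 6 rule: an Address Error occurs
 * if either of the two least-significant address bits is nonzero. */
bool sw_address_error_pre_r6(uint32_t vaddr)
{
    return (vaddr & 0x3u) != 0;
}
```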
### SWC1
#### Store Word from Floating Point

**Format:** SWC1 ft, offset(base)

**Purpose:** Store Word from Floating Point

To store a word from an FPR to memory.

**Description:**
\[ \text{memory}[\text{GPR}[\text{base}] + \text{offset}] \leftarrow \text{FPR}[\text{ft}] \]

The low 32-bit word from FPR ft is stored in memory at the location specified by the aligned effective address. The 16-bit signed offset is added to the contents of GPR base to form the effective address.

**Restrictions:**

- Pre-Release 6: An Address Error exception occurs if EffectiveAddress\(_{1..0} \neq 0\) (not word-aligned).

Release 6 allows hardware to provide address misalignment support in lieu of requiring natural alignment.

Note: The pseudocode is not completely adapted for Release 6 misalignment support as the handling is implementation dependent.

**Operation:**

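\[
\begin{align*}
\text{vAddr} & \leftarrow \text{sign\_extend}(\text{offset}) + \text{GPR}[\text{base}] \\
(\text{pAddr}, \text{CCA}) & \leftarrow \text{AddressTranslation}(\text{vAddr}, \text{DATA}, \text{STORE}) \\
\text{dataword} & \leftarrow \text{ValueFPR}(\text{ft}, \text{UNINTERPRETED\_WORD}) \\
& \text{StoreMemory}(\text{CCA}, \text{WORD}, \text{dataword}, \text{pAddr}, \text{vAddr}, \text{DATA})
\end{align*}
\]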

**Exceptions:**

- Coprocessor Unusable, Reserved Instruction, TLB Refill, TLB Invalid, TLB Modified, Address Error, Watch
**SWC2**

**Store Word from Coprocessor 2**

**Format:**  \( \text{SWC2 } rt, \text{ offset(base)} \)

**Purpose:** Store Word from Coprocessor 2

To store a word from a COP2 register to memory.

**Description:**  
\( \text{memory}[\text{GPR}[base] + offset] \leftarrow \text{CPR}[2,rt,0] \)  
The low 32-bit word from COP2 (Coprocessor 2) register \(rt\) is stored in memory at the location specified by the aligned effective address. The signed \(offset\) is added to the contents of GPR \(base\) to form the effective address.

**Restrictions:**

Pre-Release 6: An Address Error exception occurs if EffectiveAddress\(_{1..0}\) \(\neq\) 0 (not word-aligned).

Release 6 allows hardware to provide address misalignment support in lieu of requiring natural alignment.

Note: The pseudocode is not completely adapted for Release 6 misalignment support as the handling is implementation dependent.

**Availability and Compatibility**

This instruction has been recoded for Release 6.

**Operation:**

\[
\begin{align*}
\text{vAddr} \leftarrow \text{sign\_extend}(\text{offset}) + \text{GPR[base]} \\
(\text{pAddr, CCA}) \leftarrow \text{AddressTranslation(vAddr, DATA, STORE)} \\
\text{dataword} \leftarrow \text{CPR}[2,rt,0] \\
\text{StoreMemory(CCA, WORD, dataword, pAddr, vAddr, DATA)}
\end{align*}
\]

**Exceptions:**

Coprocessor Unusable, Reserved Instruction, TLB Refill, TLB Invalid, TLB Modified, Address Error, Watch

**Programming Notes:**

As shown in the instruction drawing above, Release 6 implements an 11-bit offset, whereas all release levels lower than Release 6 of the MIPS architecture implement a 16-bit offset.
**SWE**

**Store Word EVA**

Format: SWE rt, offset(base)

Purpose: Store Word EVA
To store a word to user mode virtual address space when executing in kernel mode.

Description: memory[GPR[base] + offset] ← GPR[rt]
The least-significant 32-bit word of GPR rt is stored in memory at the location specified by the aligned effective address. The 9-bit signed offset is added to the contents of GPR base to form the effective address.
The SWE instruction functions the same as the SW instruction, except that address translation is performed using the user mode virtual address space mapping in the TLB when accessing an address within a memory segment configured to use the MUSUK access mode. Memory segments using UUSK or MUSK access modes are also accessible. Refer to Volume III, Enhanced Virtual Addressing section for additional information.
Implementation of this instruction is specified by the Config5EVA field being set to 1.

Restrictions:
Only usable in kernel mode when accessing an address within a segment configured using UUSK, MUSK or MUSUK access mode.
Pre-Release 6: The effective address must be naturally-aligned. If either of the 2 least-significant bits of the address is non-zero, an Address Error exception occurs.
Release 6 allows hardware to provide address misalignment support in lieu of requiring natural alignment.
Note: The pseudocode is not completely adapted for Release 6 misalignment support as the handling is implementation dependent.

Operation:

vAddr ← sign_extend(offset) + GPR[base]
(pAddr, CCA) ← AddressTranslation (vAddr, DATA, STORE)
dataword ← GPR[rt]
StoreMemory (CCA, WORD, dataword, pAddr, vAddr, DATA)

Exceptions:
TLB Refill, TLB Invalid, Bus Error, Address Error, Watch, Reserved Instruction, Coprocessor Unusable
**SWL**

**Store Word Left**

Format: \texttt{SWL rt, offset(base)}

MIPS32, removed in Release 6

Purpose: Store Word Left

To store the most-significant part of a word to an unaligned memory address.

Description: \(\text{memory}[\text{GPR}[base] + offset] \leftarrow \text{GPR}[rt]\)

The 16-bit signed \textit{offset} is added to the contents of GPR \textit{base} to form an effective address (\textit{EffAddr}). \textit{EffAddr} is the address of the most-significant of 4 consecutive bytes forming a word (\textit{W}) in memory starting at an arbitrary byte boundary.

A part of \textit{W} (the most-significant 1 to 4 bytes) is in the aligned word containing \textit{EffAddr}. The same number of the most-significant (left) bytes from the word in GPR \textit{rt} are stored into these bytes of \textit{W}.

The following figure illustrates this operation using big-endian byte ordering for 32-bit and 64-bit registers. The four consecutive bytes in 2..5 form an unaligned word starting at location 2. A part of \textit{W} (2 bytes) is located in the aligned word containing the most-significant byte at 2.

1. SWL stores the most-significant 2 bytes of the low word from the source register into these 2 bytes in memory.
2. The complementary SWR stores the remainder of the unaligned word.

\textbf{Figure 5.9 Unaligned Word Store Using SWL and SWR}

Word at byte 2 in memory, big-endian byte order; each memory byte initially contains its own address. GPR 24 contains the bytes E F G H.

<table>
<thead>
<tr>
<th>Step</th>
<th>Memory bytes 0..8</th>
</tr>
</thead>
<tbody>
<tr>
<td>Initial contents</td>
<td>0 1 2 3 4 5 6 7 8 ...</td>
</tr>
<tr>
<td>After executing SWL $24, 2($0)</td>
<td>0 1 E F 4 5 6 ...</td>
</tr>
<tr>
<td>Then after SWR $24, 5($0)</td>
<td>0 1 E F G H 6 ...</td>
</tr>
</tbody>
</table>

The bytes stored from the source register to memory depend on both the offset of the effective address within an aligned word, that is, the low 2 bits of the address (\(vAddr_{1..0}\)), and the current byte-ordering mode of the processor (big- or little-endian). The following figure shows the bytes stored for every combination of offset and byte ordering.
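The big-endian SWL/SWR pair in Figure 5.9 can be reproduced with a small byte-array model, which shows how the two instructions split one unaligned word between two aligned words. The following is a minimal sketch. It assumes big-endian mode only and a flat `mem` array indexed by address. The function names are hypothetical. Little-endian mode, address translation, and the `ReverseEndian` adjustment in the pseudocode are not modelled.

```c
#include <stdint.h>

/* Hypothetical big-endian SWL: store the most-significant bytes of rt
 * from EffAddr up to the end of its aligned word. */
void swl_be(uint8_t *mem, uint32_t eff_addr, uint32_t rt)
{
    uint32_t n = 4u - (eff_addr & 3u);                   /* 1 to 4 bytes */
    for (uint32_t i = 0; i < n; i++)
        mem[eff_addr + i] = (uint8_t)(rt >> (24u - 8u * i));  /* MS byte first */
}

/* Hypothetical big-endian SWR: store the least-significant bytes of rt
 * from the start of the aligned word up to EffAddr. */
void swr_be(uint8_t *mem, uint32_t eff_addr, uint32_t rt)
{
    uint32_t n = (eff_addr & 3u) + 1u;                   /* 1 to 4 bytes */
    for (uint32_t i = 0; i < n; i++)
        mem[eff_addr - i] = (uint8_t)(rt >> (8u * i));        /* LS byte at EffAddr */
}
```

With memory bytes `0..8` holding their own addresses and GPR 24 holding `0x45464748` (the ASCII codes of E F G H), `swl_be(mem, 2, r)` followed by `swr_be(mem, 5, r)` leaves bytes 2..5 as E F G H, matching the figure.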
Restrictions:

None

Availability and Compatibility:

Release 6 removes the load/store-left/right family of instructions, and requires the system to support misaligned memory accesses.

Operation:

\[
vAddr \leftarrow \text{sign\_extend}(\text{offset}) + \text{GPR}[\text{base}]
\]

\[(\text{pAddr}, \text{CCA}) \leftarrow \text{AddressTranslation}(vAddr, \text{DATA}, \text{STORE})\]

\[
\text{pAddr} \leftarrow \text{pAddr}_{\text{PSIZE}-1..2} \ || \ (\text{pAddr}_{1..0} \ \text{xor} \ \text{ReverseEndian}^2)
\]

If BigEndianMem = 0 then

\[
\text{pAddr} \leftarrow \text{pAddr}_{\text{PSIZE}-1..2} \ || \ 0^2
\]
endif

\[
\text{byte} \leftarrow vAddr_{1..0} \ \text{xor} \ \text{BigEndianCPU}^2
\]

\[
\text{dataword} \leftarrow 0^{24-8\times\text{byte}} \ || \ \text{GPR}[\text{rt}]_{31..24-8\times\text{byte}}
\]

\[
\text{StoreMemory(CCA, byte, dataword, pAddr, vAddr, DATA)}
\]

Exceptions:

TLB Refill, TLB Invalid, TLB Modified, Bus Error, Address Error, Watch
**SWLE**

**Store Word Left EVA**

**Format:**  SWLE rt, offset(base)

**Purpose:** Store Word Left EVA

To store the most-significant part of a word to an unaligned user mode virtual address while operating in kernel mode.

**Description:** memory[GPR[base] + offset] ← GPR[rt]

The 9-bit signed offset is added to the contents of GPR base to form an effective address (EffAddr). EffAddr is the address of the most-significant of 4 consecutive bytes forming a word (W) in memory starting at an arbitrary byte boundary.

A part of W (the most-significant 1 to 4 bytes) is in the aligned word containing EffAddr. The same number of the most-significant (left) bytes from the word in GPR rt are stored into these bytes of W.

The following figure shows this operation using big-endian byte ordering for 32-bit and 64-bit registers. The 4 consecutive bytes in 2..5 form an unaligned word starting at location 2. A part of W (2 bytes) is located in the aligned word containing the most-significant byte at 2.

1. SWLE stores the most-significant 2 bytes of the low word from the source register into these 2 bytes in memory.

2. The complementary SWRE stores the remainder of the unaligned word.

**Figure 5.11 Unaligned Word Store Using SWLE and SWRE**

The bytes stored from the source register to memory depend on both the offset of the effective address within an aligned word, that is, the low 2 bits of the address (vAddr_{1..0}), and the current byte-ordering mode of the processor (big- or little-endian). The following figure shows the bytes stored for every combination of offset and byte ordering.

The SWLE instruction functions the same as the SWL instruction, except that address translation is performed using the user mode virtual address space mapping in the TLB when accessing an address within a memory segment configured to use the MUSUK access mode. Memory segments using UUSK or MUSK access modes are also accessible. Refer to Volume III, Enhanced Virtual Addressing section for additional information.

Implementation of this instruction is specified by the Config5EVA field being set to 1.
Restrictions:

Only usable when access to Coprocessor0 is enabled and when accessing an address within a segment configured using UUSK, MUSK or MUSUK access mode.

Availability and Compatibility:

Release 6 removes the load/store-left/right family of instructions, and requires the system to support misaligned memory accesses.

Operation:

\[
\begin{align*}
&\text{vAddr} \leftarrow \text{sign\_extend}(\text{offset}) + \text{GPR}[\text{base}] \\
&(\text{pAddr}, \text{CCA}) \leftarrow \text{AddressTranslation}(\text{vAddr}, \text{DATA}, \text{STORE}) \\
&\text{pAddr} \leftarrow \text{pAddr}_{\text{PSIZE}-1..2} \,\|\, (\text{pAddr}_{1..0} \text{ xor } \text{ReverseEndian}^{2}) \\
&\text{if BigEndianMem} = 0 \text{ then} \\
&\quad \text{pAddr} \leftarrow \text{pAddr}_{\text{PSIZE}-1..2} \,\|\, 0^{2} \\
&\text{endif} \\
&\text{byte} \leftarrow \text{vAddr}_{1..0} \text{ xor } \text{BigEndianCPU}^{2} \\
&\text{dataword} \leftarrow 0^{24-8\times\text{byte}} \,\|\, \text{GPR}[\text{rt}]_{31..24-8\times\text{byte}} \\
&\text{StoreMemory}(\text{CCA}, \text{byte}, \text{dataword}, \text{pAddr}, \text{vAddr}, \text{DATA})
\end{align*}
\]

Exceptions:

TLB Refill, TLB Invalid, TLB Modified, Bus Error, Address Error, Watch, Reserved Instruction, Coprocessor Unusable
**SWR**

**Store Word Right**

**Format:** SWR rt, offset(base)  

**Purpose:** Store Word Right

To store the least-significant part of a word to an unaligned memory address.

**Description:** memory[\(GPR\text{[base]} + \text{offset}\)] \(\leftarrow\) GPR[rt]

The 16-bit signed offset is added to the contents of GPR base to form an effective address (EffAddr). EffAddr is the address of the least-significant of 4 consecutive bytes forming a word (\(W\)) in memory starting at an arbitrary byte boundary.

A part of \(W\) (the least-significant 1 to 4 bytes) is in the aligned word containing EffAddr. The same number of the least-significant (right) bytes from the word in GPR rt are stored into these bytes of \(W\).

The following figure illustrates this operation using big-endian byte ordering for 32-bit and 64-bit registers. The 4 consecutive bytes in 2..5 form an unaligned word starting at location 2. A part of \(W\) (2 bytes) is contained in the aligned word containing the least-significant byte at 5.

1. SWR stores the least-significant 2 bytes of the low word from the source register into these 2 bytes in memory.

2. The complementary SWL stores the remainder of the unaligned word.

**Figure 5.13 Unaligned Word Store Using SWR and SWL**

Word at byte 2 in memory, big-endian byte order; each memory byte initially contains its own address. GPR 24 contains the bytes E F G H.

<table>
<thead>
<tr>
<th>Step</th>
<th>Memory bytes 0..8</th>
</tr>
</thead>
<tbody>
<tr>
<td>Initial contents</td>
<td>0 1 2 3 4 5 6 7 8 ...</td>
</tr>
<tr>
<td>After executing SWR $24, 5($0)</td>
<td>0 1 2 3 G H 6 ...</td>
</tr>
<tr>
<td>Then after SWL $24, 2($0)</td>
<td>0 1 E F G H 6 ...</td>
</tr>
</tbody>
</table>

The bytes stored from the source register to memory depend on both the offset of the effective address within an aligned word, that is, the low 2 bits of the address (\(vAddr_{1..0}\)), and the current byte-ordering mode of the processor (big- or little-endian). The following figure shows the bytes stored for every combination of offset and byte ordering.
Restrictions:
None

Availability and Compatibility:
Release 6 removes the load/store-left/right family of instructions, and requires the system to support misaligned memory accesses.

Operation:

\[
\begin{align*}
&vAddr \leftarrow \text{sign\_extend}(\text{offset}) + \text{GPR}[\text{base}] \\
&(pAddr, \text{CCA}) \leftarrow \text{AddressTranslation}(vAddr, \text{DATA}, \text{STORE}) \\
&pAddr \leftarrow pAddr_{\text{PSIZE}-1..2} \,\|\, (pAddr_{1..0} \text{ xor } \text{ReverseEndian}^{2}) \\
&\text{if BigEndianMem} = 0 \text{ then} \\
&\quad pAddr \leftarrow pAddr_{\text{PSIZE}-1..2} \,\|\, 0^{2} \\
&\text{endif} \\
&\text{byte} \leftarrow vAddr_{1..0} \text{ xor } \text{BigEndianCPU}^{2} \\
&\text{dataword} \leftarrow \text{GPR}[\text{rt}]_{31-8\times\text{byte}..0} \,\|\, 0^{8\times\text{byte}} \\
&\text{StoreMemory}(\text{CCA}, \text{WORD}-\text{byte}, \text{dataword}, pAddr, vAddr, \text{DATA})
\end{align*}
\]

Exceptions:
TLB Refill, TLB Invalid, TLB Modified, Bus Error, Address Error, Watch

395 The MIPS32® Instruction Set Manual, Revision 6.04

Copyright © 2015 Imagination Technologies LTD. and/or its Affiliated Group Companies. All rights reserved.

**SWRE**

**Store Word Right EVA**

Format: SWRE rt, offset(base)

MIPS32, removed in Release 6

Purpose: Store Word Right EVA

To store the least-significant part of a word to an unaligned user mode virtual address while operating in kernel mode.

Description: memory[ GPR[base] + offset ] ← GPR[rt]

The 9-bit signed offset is added to the contents of GPR base to form an effective address (EffAddr). EffAddr is the address of the least-significant of 4 consecutive bytes forming a word (W) in memory starting at an arbitrary byte boundary.

A part of W (the least-significant 1 to 4 bytes) is in the aligned word containing EffAddr. The same number of the least-significant (right) bytes from the word in GPR rt are stored into these bytes of W.

The following figure illustrates this operation using big-endian byte ordering for 32-bit and 64-bit registers. The 4 consecutive bytes in 2..5 form an unaligned word starting at location 2. A part of W (2 bytes) is contained in the aligned word containing the least-significant byte at 5.

1. SWRE stores the least-significant 2 bytes of the low word from the source register into these 2 bytes in memory.

2. The complementary SWLE stores the remainder of the unaligned word.

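<table>
<thead>
<tr>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..7</th>
<th>6</th>
<th>5..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>SPECIAL3</td>
<td>base</td>
<td>rt</td>
<td>offset</td>
<td>0</td>
<td>SWRE</td>
</tr>
<tr>
<td>011111</td>
<td></td>
<td></td>
<td></td>
<td>0</td>
<td>100010</td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>9</td>
<td>1</td>
<td>6</td>
</tr>
</tbody>
</table>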

Figure 5.15 Unaligned Word Store Using SWRE and SWLE

The bytes stored from the source register to memory depend on both the offset of the effective address within an aligned word—that is, the low 2 bits of the address (vAddr[1:0])—and the current byte-ordering mode of the processor (big- or little-endian). The following figure shows the bytes stored for every combination of offset and byte-ordering.

The SWRE instruction functions the same as the SWR instruction, except that address translation is performed using the user mode virtual address space mapping in the TLB when accessing an address within a memory segment configured to use the MUSUK access mode. Memory segments using UUSK or MUSK access modes are also accessible. Refer to Volume III, Enhanced Virtual Addressing section for additional information.

Implementation of this instruction is specified by the Config5EVA field being set to 1.
Restrictions:
Only usable when access to Coprocessor 0 is enabled and when accessing an address within a segment configured using UUSK, MUSK or MUSUK access mode.

Availability and Compatibility:
Release 6 removes the load/store-left/right family of instructions, and requires the system to support misaligned memory accesses.

Operation:
\[
\text{vAddr} \leftarrow \text{sign\_extend}(\text{offset}) + \text{GPR}[\text{base}]
\]
\[
(\text{pAddr}, \text{CCA}) \leftarrow \text{AddressTranslation}(\text{vAddr}, \text{DATA}, \text{STORE})
\]
\[
\text{pAddr} \leftarrow \text{pAddr}_{\text{PSIZE}-1..2} \mathbin{\|} (\text{pAddr}_{1..0} \mathbin{\text{xor}} \text{ReverseEndian}^{2})
\]
If BigEndianMem = 0 then
\[
\text{pAddr} \leftarrow \text{pAddr}_{\text{PSIZE}-1..2} \mathbin{\|} 0^{2}
\]
endif
\[
\text{byte} \leftarrow \text{vAddr}_{1..0} \mathbin{\text{xor}} \text{BigEndianCPU}^{2}
\]
\[
\text{dataword} \leftarrow \text{GPR}[\text{rt}]_{31-8\cdot\text{byte}..0} \mathbin{\|} 0^{8\cdot\text{byte}}
\]
\[
\text{StoreMemory}(\text{CCA}, \text{WORD}-\text{byte}, \text{dataword}, \text{pAddr}, \text{vAddr}, \text{DATA})
\]
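As an informal cross-check of the byte placement described above, the following C sketch models where SWRE (and SWR) place bytes in a flat, byte-addressed memory. It is a behavioral model only, not the Operation pseudocode: the function name `swr_model` is hypothetical, and address translation, the CCA, and ReverseEndian are assumed away (ReverseEndian = 0, memory byte order equal to CPU byte order).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical behavioral model of SWR/SWRE byte placement.
 * mem models a small flat memory (no translation, no CCA, ReverseEndian = 0).
 * Returns the number of bytes written (1..4). */
static int swr_model(uint8_t *mem, uint32_t eff_addr, uint32_t rt,
                     int big_endian)
{
    uint32_t offset = eff_addr & 3u;            /* vAddr1..0 */
    /* Number of bytes of W that lie in the aligned word containing EffAddr. */
    int count = big_endian ? (int)offset + 1 : 4 - (int)offset;
    assert(count >= 1 && count <= 4);
    for (int i = 0; i < count; i++) {
        uint8_t b = (uint8_t)(rt >> (8 * i));   /* i-th least-significant byte of rt */
        /* EffAddr holds the least-significant byte of W; more-significant
         * bytes sit at higher addresses (little-endian) or lower (big-endian). */
        uint32_t a = big_endian ? eff_addr - (uint32_t)i : eff_addr + (uint32_t)i;
        mem[a] = b;
    }
    return count;
}
```

For the big-endian case in the figure (EffAddr = 5), the model writes the two least-significant bytes of rt to addresses 5 and 4, leaving the rest of W for SWLE.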

Exceptions:
TLB Refill, TLB Invalid, TLB Modified, Bus Error, Address Error, Watch, Coprocessor Unusable
SWXC1

Format: SWXC1 fs, index(base)

Purpose: Store Word Indexed from Floating Point

To store a word from an FPR to memory (GPR+GPR addressing).

Description: memory[GPR[base] + GPR[index]] ← FPR[fs]

The low 32-bit word from FPR fs is stored in memory at the location specified by the aligned effective address. The contents of GPR index and GPR base are added to form the effective address.

Restrictions:
An Address Error exception occurs if EffectiveAddress1..0 ≠ 0 (not word-aligned).

Availability and Compatibility:
This instruction has been removed in Release 6.
Required in all versions of MIPS64 since MIPS64 Release 1. Not available in MIPS32 Release 1. Required in MIPS32 Release 2 and all subsequent versions of MIPS32. When required, required whenever FPU is present, whether a 32-bit or 64-bit FPU, whether in 32-bit or 64-bit FP Register Mode (FIRF64=0 or 1, StatusFR=0 or 1).

Operation:

vAddr ← GPR[base] + GPR[index]
if vAddr1..0 ≠ 0 then
    SignalException(AddressError)
endif
(pAddr, CCA) ← AddressTranslation(vAddr, DATA, STORE)
dataword ← ValueFPR(fs, UNINTERPRETED_WORD)
StoreMemory(CCA, WORD, dataword, pAddr, vAddr, DATA)
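A minimal C sketch of the address formation and alignment check in the Operation above, assuming 32-bit GPRs whose sum wraps modulo 2^32 and a 64-bit FPR whose low 32 bits are the stored word. The helper names are hypothetical; a `false` result corresponds to the point where hardware would signal an Address Error exception.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: forms vAddr <- GPR[base] + GPR[index] with 32-bit
 * wraparound and reports whether it is word-aligned. A false return marks
 * the case where hardware would signal an Address Error exception. */
static bool swxc1_effective_address(uint32_t gpr_base, uint32_t gpr_index,
                                    uint32_t *vaddr_out)
{
    uint32_t vaddr = gpr_base + gpr_index;  /* unsigned add wraps mod 2^32 */
    *vaddr_out = vaddr;
    return (vaddr & 3u) == 0;               /* vAddr1..0 must be 0 */
}

/* Hypothetical helper: the word SWXC1 stores is the low 32 bits of FPR fs
 * (modelled here as a 64-bit value). */
static uint32_t swxc1_low_word(uint64_t fpr_fs)
{
    return (uint32_t)(fpr_fs & 0xFFFFFFFFu);
}
```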

Exceptions:
TLB Refill, TLB Invalid, TLB Modified, Address Error, Reserved Instruction, Coprocessor Unusable, Watch
SYNC Synchronize Shared Memory

Format: SYNC {stype = 0 implied}
SYNC stype

Purpose: Synchronize Shared Memory
To order loads and stores for shared memory.

Description:
These types of ordering guarantees are available through the SYNC instruction:

• Completion Barriers
• Ordering Barriers

*Completion Barrier (Simple Description):*

• The barrier affects only *uncached* and *cached coherent* loads and stores.

• The specified memory instructions (loads or stores or both) that occur before the SYNC instruction must be completed before the specified memory instructions after the SYNC are allowed to start.

• Loads are completed when the destination register is written. Stores are completed when the stored value is visible to every other processor in the system.

*Completion Barrier (Detailed Description):*

• Every synchronizable specified memory instruction (loads or stores or both) that occurs in the instruction stream before the SYNC instruction must be already globally performed before any synchronizable specified memory instructions that occur after the SYNC are allowed to be performed, with respect to any other processor or coherent I/O module.

• The barrier does not guarantee the order in which instruction fetches are performed.

• A stype value of zero will always be defined such that it performs the most complete set of synchronization operations that are defined. This means stype zero always does a completion barrier that affects both loads and stores preceding the SYNC instruction and both loads and stores that are subsequent to the SYNC instruction. Non-zero values of stype may be defined by the architecture or specific implementations to perform synchronization behaviors that are less complete than that of stype zero. If an implementation does not use one of these non-zero values to define a different synchronization behavior, then that non-zero value of stype must act the same as stype zero completion barrier. This allows software written for an implementation with a lighter-weight barrier to work on another implementation which only implements the stype zero completion barrier.

• A completion barrier is required, potentially in conjunction with SSNOP (in Release 1 of the Architecture) or EHB (in Release 2 of the Architecture), to guarantee that memory reference results are visible across operating mode changes. For example, a completion barrier is required on some implementations on entry to and exit from Debug Mode to guarantee that memory effects are handled correctly.

*SYNC behavior when the stype field is zero:*

- A completion barrier that affects preceding loads and stores and subsequent loads and stores.

*Ordering Barrier (Simple Description):*

- The barrier affects only *uncached* and *cached coherent* loads and stores.
- The specified memory instructions (loads or stores or both) that occur before the SYNC instruction must always be ordered before the specified memory instructions after the SYNC.
- Memory instructions which are ordered before other memory instructions are processed by the load/store datapath before those other memory instructions.

*Ordering Barrier (Detailed Description):*

- Every synchronizable specified memory instruction (loads or stores or both) that occurs in the instruction stream before the SYNC instruction must reach a stage in the load/store datapath after which no instruction re-ordering is possible before any synchronizable specified memory instruction which occurs after the SYNC instruction in the instruction stream reaches the same stage in the load/store datapath.
- If any memory instruction before the SYNC instruction in program order generates a memory request to external memory, and any memory instruction after the SYNC instruction in program order also generates a memory request to external memory, the memory request belonging to the older instruction must be globally performed before the memory request belonging to the younger instruction is globally performed.
- The barrier does not guarantee the order in which instruction fetches are performed.

As compared to the completion barrier, the ordering barrier is a lighter-weight operation as it does not require the specified instructions before the SYNC to be already completed. Instead it only requires that those specified instructions which are subsequent to the SYNC in the instruction stream are never re-ordered for processing ahead of the specified instructions which are before the SYNC in the instruction stream. This potentially reduces how many cycles the barrier instruction must stall before it completes.

The Acquire and Release barrier types minimize the memory orderings that must be maintained while still allowing software synchronization to work.

Implementations that do not use any of the non-zero values of stype to define different barriers, such as ordering barriers, must make those stype values act the same as stype zero.

For the purposes of this description, the CACHE, PREF and PREFX instructions are treated as loads and stores. That is, these instructions and the memory transactions sourced by these instructions obey the ordering and completion rules of the SYNC instruction.
Table 5.6 lists the available completion barrier and ordering barrier behaviors that can be specified using the stype field.

**Table 5.6 Encodings of the Bits[10:6] of the SYNC instruction; the SType Field**

<table>
<thead>
<tr>
<th>Code</th>
<th>Name</th>
<th>Older instructions which must reach the load/store ordering point before the SYNC instruction completes.</th>
<th>Younger instructions which must reach the load/store ordering point only after the SYNC instruction completes.</th>
<th>Older instructions which must be globally performed when the SYNC instruction completes</th>
<th>Compliance</th>
</tr>
</thead>
<tbody>
<tr>
<td>0x0</td>
<td>SYNC or SYNC 0</td>
<td>Loads, Stores</td>
<td>Loads, Stores</td>
<td>Loads, Stores</td>
<td>Required</td>
</tr>
<tr>
<td>0x4</td>
<td>SYNC_WMB or SYNC 4</td>
<td>Stores</td>
<td>Stores</td>
<td></td>
<td>Optional</td>
</tr>
<tr>
<td>0x10</td>
<td>SYNC_MB or SYNC 16</td>
<td>Loads, Stores</td>
<td>Loads, Stores</td>
<td></td>
<td>Optional</td>
</tr>
<tr>
<td>0x11</td>
<td>SYNC_ACQUIRE or SYNC 17</td>
<td>Loads</td>
<td>Loads, Stores</td>
<td></td>
<td>Optional</td>
</tr>
<tr>
<td>0x12</td>
<td>SYNC_RELEASE or SYNC 18</td>
<td>Loads, Stores</td>
<td>Stores</td>
<td></td>
<td>Optional</td>
</tr>
<tr>
<td>0x13</td>
<td>SYNC_RMB or SYNC 19</td>
<td>Loads</td>
<td>Loads</td>
<td></td>
<td>Optional</td>
</tr>
<tr>
<td>0x1-0x3, 0x5-0xF</td>
<td>Implementation-Specific and Vendor Specific Sync Types</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>0x14 - 0x1F</td>
<td>RESERVED</td>
<td></td>
<td></td>
<td></td>
<td>Reserved for MIPS Technologies for future extension of the architecture.</td>
</tr>
</tbody>
</table>
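The following C sketch, offered as an illustration only, builds a SYNC instruction word from an stype value and classifies the value according to Table 5.6. The field layout (SPECIAL opcode 000000, zero bits 25..11, stype in bits 10..6, function field 001111) is the standard SYNC encoding but is not reproduced in this excerpt, so treat it as an assumption; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>   /* strcmp, for comparing the returned names */

/* Assumed SYNC encoding: SPECIAL (000000) | 0 (bits 25..11) |
 * stype (bits 10..6) | SYNC function field (001111). */
static uint32_t encode_sync(unsigned stype)
{
    assert(stype <= 0x1F);                  /* stype is a 5-bit field */
    return ((uint32_t)stype << 6) | 0x0Fu;  /* function field 001111 */
}

/* Classify an stype value according to Table 5.6. */
static const char *sync_stype_name(unsigned stype)
{
    switch (stype) {
    case 0x00: return "SYNC";               /* required completion barrier */
    case 0x04: return "SYNC_WMB";
    case 0x10: return "SYNC_MB";
    case 0x11: return "SYNC_ACQUIRE";
    case 0x12: return "SYNC_RELEASE";
    case 0x13: return "SYNC_RMB";
    default:
        if (stype <= 0x0F)
            return "implementation-specific"; /* 0x1-0x3, 0x5-0xF */
        return "reserved";                    /* 0x14-0x1F */
    }
}
```

For example, `encode_sync(0x11)` yields 0x0000044F, the word for SYNC_ACQUIRE. An implementation that does not define a separate barrier for an optional stype still executes that word, with stype zero behavior.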

Terms:

**Synchronizable**: A load or store instruction is *synchronizable* if the load or store occurs to a physical location in shared memory using a virtual location with a memory access type of either *uncached* or *cached coherent*. *Shared memory* is memory that can be accessed by more than one processor or by a coherent I/O system module.

**Performed load**: A load instruction is *performed* when the value returned by the load has been determined. The result of a load on processor A has been *determined* with respect to processor or coherent I/O module B when a subsequent store to the location by B cannot affect the value returned by the load. The store by B must use the same memory access type as the load.

**Performed store**: A store instruction is *performed* when the store is observable. A store on processor A is *observable* with respect to processor or coherent I/O module B when a subsequent load of the location by B returns the value written by the store. The load by B must use the same memory access type as the store.

**Globally performed load**: A load instruction is *globally performed* when it is performed with respect to all processors and coherent I/O modules capable of storing to the location.

**Globally performed store**: A store instruction is *globally performed* when it is globally observable. It is *globally observable* when it is observable by all processors and I/O modules capable of loading from the location.

**Coherent I/O module**: A coherent I/O module is an Input/Output system component that performs coherent Direct Memory Access (DMA). It reads and writes memory independently as though it were a processor doing loads and stores to locations with a memory access type of *cached coherent*.

**Load/Store Datapath**: The portion of the processor which handles the load/store data requests coming from the processor pipeline and processes those requests within the cache and memory system hierarchy.

Restrictions:
The effect of SYNC on the global order of loads and stores for memory access types other than uncached and cached coherent is UNPREDICTABLE.

Operation:

SyncOperation(stype)

Exceptions:
None

Programming Notes:
A processor executing load and store instructions observes the order in which loads and stores using the same memory access type occur in the instruction stream; this is known as program order.

A parallel program has multiple instruction streams that can execute simultaneously on different processors. In multiprocessor (MP) systems, the order in which the effects of loads and stores are observed by other processors (the global order of the loads and stores) determines the actions necessary to reliably share data in parallel programs.

When all processors observe the effects of loads and stores in program order, the system is strongly ordered. On such systems, parallel programs can reliably share data without explicit actions in the programs. For such a system, SYNC has the same effect as a NOP. Executing SYNC on such a system is not necessary, but neither is it an error.

If a multiprocessor system is not strongly ordered, the effects of load and store instructions executed by one processor may be observed out of program order by other processors. On such systems, parallel programs must take explicit actions to reliably share data. At critical points in the program, the effects of loads and stores from an instruction stream must occur in the same order for all processors. SYNC separates the loads and stores executed on the processor into two groups, and the effect of all loads and stores in one group is seen by all processors before the effect of any load or store in the subsequent group. In effect, SYNC causes the system to be strongly ordered for the executing processor at the instant that the SYNC is executed.

Many MIPS-based multiprocessor systems are strongly ordered or have a mode in which they operate as strongly ordered for at least one memory access type. The MIPS architecture also permits implementation of MP systems that are not strongly ordered; SYNC enables the reliable use of shared memory on such systems. A parallel program that does not use SYNC generally does not operate on a system that is not strongly ordered. However, a program that does use SYNC works on both types of systems. (System-specific documentation describes the actions needed to reliably share data in parallel programs for that system.)

The behavior of a load or store using one memory access type is UNPREDICTABLE if a load or store was previously made to the same physical location using a different memory access type. The presence of a SYNC between the references does not alter this behavior.

SYNC affects the order in which the effects of load and store instructions appear to all processors; it does not generally affect the physical memory-system ordering or synchronization issues that arise in system programming. The effect of SYNC on implementation-specific aspects of the cached memory system, such as writeback buffers, is not defined.

# Processor A (writer)
# Conditions at entry:
# The value 0 has been stored in FLAG and that value is observable by B
SW   R1, DATA      # change shared DATA value
LI   R2, 1
SYNC               # Perform DATA store before performing FLAG store
SW   R2, FLAG      # say that the shared DATA value is valid

# Processor B (reader)
LI   R2, 1
1: LW   R1, FLAG   # Get FLAG
BNE  R2, R1, 1B    # if it says that DATA is not valid, poll again
NOP
SYNC               # FLAG value checked before doing DATA read
LW   R1, DATA      # Read (valid) shared DATA value

The code fragments above show how SYNC can be used to coordinate the use of shared data between separate writer and reader instruction streams in a multiprocessor environment. The FLAG location is used by the instruction streams to determine whether the shared data item DATA is valid. The SYNC executed by processor A forces the store of DATA to be performed globally before the store to FLAG is performed. The SYNC executed by processor B ensures that DATA is not read until after the FLAG value indicates that the shared data is valid.
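For comparison, the same writer/reader protocol can be expressed in portable C11 with explicit fences. This is a sketch under stated assumptions: `shared_data`, `shared_flag`, `writer`, and `reader` are hypothetical names, each function is meant to run on a different processor, and the correspondence between each fence and a SYNC is an analogy. On MIPS targets, compilers typically implement such fences with a SYNC instruction, but which stype (if any) is emitted depends on the compiler and target options.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical shared variables mirroring DATA and FLAG above. */
static int shared_data;          /* DATA */
static atomic_int shared_flag;   /* FLAG, 0 at entry (static zero-init) */

/* Processor A (writer): the release fence plays the role of the writer's
 * SYNC, ordering the DATA store before the FLAG store. */
static void writer(int value)
{
    shared_data = value;                                           /* SW R1, DATA */
    atomic_thread_fence(memory_order_release);                     /* SYNC */
    atomic_store_explicit(&shared_flag, 1, memory_order_relaxed);  /* SW R2, FLAG */
}

/* Processor B (reader): poll FLAG, then the acquire fence plays the role
 * of the reader's SYNC before DATA is read. */
static int reader(void)
{
    while (atomic_load_explicit(&shared_flag, memory_order_relaxed) != 1)
        ;                                                          /* poll again */
    atomic_thread_fence(memory_order_acquire);                     /* SYNC */
    return shared_data;                                            /* LW R1, DATA */
}
```

The release fence in `writer` orders the DATA store before the FLAG store, and the acquire fence in `reader` orders the FLAG check before the DATA load, mirroring the two SYNCs in the fragments above.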

Software written to use a SYNC instruction with a non-zero stype value, expecting one type of barrier behavior, should only be run on hardware that actually implements the expected barrier behavior for that non-zero stype value or on hardware which implements a superset of the behavior expected by the software for that stype value. If the hardware does not perform the barrier behavior expected by the software, the system may fail.
SYNCI is a synchronization instruction to make instruction writes effective.

**Format:**
```
SYNCI offset(base)
```

**Purpose:** Synchronize Caches to Make Instruction Writes Effective

To synchronize all caches to make instruction writes effective.

**Description:**
This instruction is used after a new instruction stream is written to make the new instructions effective relative to an instruction fetch, when used in conjunction with the SYNC and JALR.HB, JR.HB, or ERET instructions, as described below. Unlike the CACHE instruction, the SYNCI instruction is available in all operating modes in an implementation of Release 2 of the architecture.

The 16-bit offset is sign-extended and added to the contents of the base register to form an effective address. The effective address is used to address the cache line in all caches which may need to be synchronized with the write of the new instructions. The operation occurs only on the cache line which may contain the effective address. One SYNCI instruction is required for every cache line that was written. See the Programming Notes below.

TLB Refill and TLB Invalid exceptions (both with cause code equal to TLBL) can occur as a byproduct of this instruction. This instruction never causes TLB Modified exceptions or TLB Refill exceptions with a cause code of TLBS, and it never causes Execute-Inhibit or Read-Inhibit exceptions.

A Cache Error exception may occur as a byproduct of this instruction. For example, if a writeback operation detects a cache or bus error during the processing of the operation, that error is reported via a Cache Error exception. Similarly, a Bus Error Exception may occur if a bus operation invoked by this instruction is terminated in an error.

An Address Error Exception (with cause code equal AdEL) may occur if the effective address references a portion of the kernel address space which would normally result in such an exception. It is implementation dependent whether such an exception does occur.

It is implementation dependent whether a data watch is triggered by a SYNCI instruction whose address matches the Watch register address match conditions.

**Restrictions:**
The operation of the processor is UNPREDICTABLE if the effective address references any instruction cache line that contains instructions to be executed between the SYNCI and the subsequent JALR.HB, JR.HB, or ERET instruction required to clear the instruction hazard.

The SYNCI instruction has no effect on cache lines that were previously locked with the CACHE instruction. If correct software operation depends on the state of a locked line, the CACHE instruction must be used to synchronize the caches.

Full visibility of the new instruction stream requires execution of a subsequent SYNC instruction, followed by a JALR.HB, JR.HB, DERET, or ERET instruction. The operation of the processor is UNPREDICTABLE if this sequence is not followed.

**SYNCI globalization:**
The SYNCI instruction acts on the current processor at a minimum. Implementations are required to affect caches outside the current processor to perform the operation on the current processor (as might be the case if multiple processors share an L2 or L3 cache).
In multiprocessor implementations where instruction caches are coherently maintained by hardware, the SYNCI instruction should behave as a NOP instruction.

In multiprocessor implementations where instruction caches are not coherently maintained by hardware, the SYNCI instruction may optionally affect all coherent icaches within the system. If the effective address uses a coherent Cacheability and Coherency Attribute (CCA), then the operation may be *globalized*, meaning it is broadcast to all of the coherent instruction caches within the system. If the effective address does not use one of the coherent CCAs, there is no broadcast of the SYNCI operation. If multiple levels of caches are to be affected by one SYNCI instruction, all of the affected cache levels must be processed in the same manner - either all affected cache levels use the globalized behavior or all affected cache levels use the non-globalized behavior.

Pre-Release 6: Portable software could not rely on the optional *globalization* of SYNCI. Strictly portable software without implementation specific awareness could only rely on expensive “instruction cache shootdown” using interprocessor interrupts.

Release 6: SYNCI *globalization* is required. Compliant implementations must globalize SYNCI, and portable software can rely on this behavior.

**Operation:**

\[
\text{vaddr} \leftarrow \text{GPR}[\text{base}] + \text{sign\_extend}(\text{offset})
\]

\[
\text{SynchronizeCacheLines}(\text{vaddr}) \quad \text{/* Operate on all caches */}
\]

**Exceptions:**

Reserved Instruction exception (Release 1 implementations only)
TLB Refill Exception
TLB Invalid Exception
Address Error Exception
Cache Error Exception
Bus Error Exception

**Programming Notes:**

When the instruction stream is written, the SYNCI instruction should be used in conjunction with other instructions to make the newly-written instructions effective. The following example shows a routine which can be called after the new instruction stream is written to make those changes effective. Note that the SYNCI instruction could be replaced with the corresponding sequence of CACHE instructions (when access to Coprocessor 0 is available), and that the JR.HB instruction could be replaced with JALR.HB, ERET, or DERET instructions, as appropriate. A SYNC instruction is required between the final SYNCI instruction in the loop and the instruction that clears instruction hazards.

```
/*
 * This routine makes changes to the instruction stream effective to the
 * hardware. It should be called after the instruction stream is written.
 * On return, the new instructions are effective.
 *
 * Inputs:
 * a0 = Start address of new instruction stream
 * a1 = Size, in bytes, of new instruction stream
 */
    beq   a1, zero, 20f     /* If size==0, */
    nop                     /* branch around */
    addu  a1, a0, a1        /* Calculate end address + 1 */
    rdhwr v0, HW_SYNCI_Step /* Get step size for SYNCI from new */
                            /* Release 2 instruction */
    beq   v0, zero, 20f     /* If no caches require synchronization, */
    nop                     /* branch around */
10: synci 0(a0)             /* Synchronize all caches around address */
    addu  a0, a0, v0        /* Add step size */
    sltu  v1, a0, a1        /* Compare current with end address */
    bne   v1, zero, 10b     /* Branch if more to do */
    nop                     /* Branch delay slot */
    sync                    /* Clear memory hazards */
20: jr.hb ra                /* Return, clearing instruction hazards */
    nop
```
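The address arithmetic of the routine above can be sketched in C as a helper that lists the addresses at which SYNCI would be executed. The name `synci_addresses` is hypothetical, `step` stands for the value the routine reads with RDHWR, and the step is assumed to be a power of two; address wraparound at the top of the address space is ignored. One design choice differs from the routine: the sketch first aligns the start address down to a step boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fills out[] with the addresses a SYNCI loop would
 * use to cover [start, start + size) and returns how many there are.
 * step == 0 means no caches require synchronization. */
static size_t synci_addresses(uint32_t start, uint32_t size, uint32_t step,
                              uint32_t *out, size_t max_out)
{
    if (size == 0 || step == 0)
        return 0;                         /* the two early "branch around" exits */
    assert((step & (step - 1)) == 0);     /* assumed power-of-two line size */
    uint32_t end = start + size;          /* end address + 1 */
    size_t n = 0;
    /* Align down so the first (possibly partial) line is included. */
    for (uint32_t a = start & ~(step - 1); a < end; a += step) {
        assert(n < max_out);
        out[n++] = a;                     /* one SYNCI per cache line */
    }
    return n;
}
```

The alignment matters when the start address is not a multiple of the step: for start 0x1004, size 0x40, and step 0x20, stepping from 0x1004 executes SYNCI at 0x1004 and 0x1024 and stops before the line at 0x1040, which still holds the last 4 bytes of the range. The aligned version also covers that line.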

### SYSCALL

**Purpose:** System Call  
To cause a System Call exception.

**Description:**  
A system call exception occurs, immediately and unconditionally transferring control to the exception handler.  
The `code` field is available for use as software parameters, but is retrieved by the exception handler only by loading the contents of the memory word containing the instruction.
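A minimal C sketch of how an exception handler might recover the `code` field once it has loaded the SYSCALL instruction word from memory. The encoding used here (SPECIAL opcode 000000, code in bits 25..6, function field 001100) is the standard SYSCALL layout but is not shown in this excerpt, so it is an assumption; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed SYSCALL encoding: SPECIAL (000000) | code (bits 25..6) |
 * SYSCALL function field (001100). */
static bool is_syscall(uint32_t insn)
{
    return (insn >> 26) == 0 && (insn & 0x3Fu) == 0x0Cu;
}

/* Extract the 20-bit code field from an instruction word that the
 * handler has already loaded from memory. */
static uint32_t syscall_code(uint32_t insn)
{
    assert(is_syscall(insn));
    return (insn >> 6) & 0xFFFFFu;
}
```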

**Restrictions:**  
None

**Operation:**  
```
SignalException(SystemCall)
```

**Exceptions:**  
System Call
**TEQ** Trap if Equal

**Format:** TEQ rs, rt

**Purpose:** Trap if Equal
To compare GPRs and do a conditional trap.

**Description:** if GPR[rs] = GPR[rt] then Trap
Compare the contents of GPR rs and GPR rt as signed integers. If GPR rs is equal to GPR rt, then take a Trap exception.

The contents of the code field are ignored by hardware and may be used to encode information for system software. To retrieve the information, system software must load the instruction word from memory.
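The following C sketch decodes a TEQ instruction word, extracts the code field that system software would inspect, and evaluates the trap condition. The field layout (SPECIAL opcode, rs in bits 25..21, rt in bits 20..16, a 10-bit code field in bits 15..6, function field 110100) is assumed rather than taken from this excerpt, and the 32-entry register-file array is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed TEQ encoding: SPECIAL (000000) | rs | rt | code (15..6) | 110100. */
static bool is_teq(uint32_t insn)
{
    return (insn >> 26) == 0 && (insn & 0x3Fu) == 0x34u;
}

/* 10-bit code field, ignored by hardware, available to system software. */
static uint32_t trap_code(uint32_t insn)
{
    return (insn >> 6) & 0x3FFu;
}

/* Evaluate the TEQ condition against a hypothetical register file. */
static bool teq_traps(uint32_t insn, const uint32_t gpr[32])
{
    unsigned rs = (insn >> 21) & 0x1Fu;
    unsigned rt = (insn >> 16) & 0x1Fu;
    return gpr[rs] == gpr[rt];  /* equality gives the same result signed or unsigned */
}
```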

**Restrictions:**
None

**Operation:**
if GPR[rs] = GPR[rt] then
   SignalException(Trap)
endif

**Exceptions:**
Trap
**TEQI** Trap if Equal Immediate

**Format:** TEQI rs, immediate

**Purpose:** Trap if Equal Immediate
To compare a GPR to a constant and do a conditional trap.

**Description:** if GPR[rs] = immediate then Trap
Compare the contents of GPR rs and the 16-bit signed immediate as signed integers. If GPR rs is equal to immediate, then take a Trap exception.
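The sign extension step in TEQI can be sketched in C as follows, assuming 32-bit GPRs; the helper names are hypothetical. The same `sign_extend` step applies to TGEI, which differs only in using ≥ instead of =.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* sign_extend(immediate) for a 16-bit field, written without relying on
 * implementation-defined narrowing conversions. */
static int32_t sign_extend16(uint16_t imm)
{
    int32_t v = (int32_t)imm;                  /* 0 .. 65535 */
    return (imm & 0x8000u) ? v - 0x10000 : v;  /* map 0x8000..0xFFFF to negatives */
}

/* TEQI condition: GPR[rs] = sign_extend(immediate). */
static bool teqi_traps(int32_t gpr_rs, uint16_t imm)
{
    return gpr_rs == sign_extend16(imm);
}
```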

**Restrictions:**
None

**Availability and Compatibility:**
This instruction has been removed in Release 6.

**Operation:**

```
if GPR[rs] = sign_extend(immediate) then
  SignalException(Trap)
endif
```

**Exceptions:**
Trap
**TGE**

**Trap if Greater or Equal**

*Format:* TGE rs, rt

*Purpose:* Trap if Greater or Equal

To compare GPRs and do a conditional trap.

*Description:* if GPR[rs] ≥ GPR[rt] then Trap

Compare the contents of GPR rs and GPR rt as signed integers. If GPR rs is greater than or equal to GPR rt, then take a Trap exception.

The contents of the code field are ignored by hardware and may be used to encode information for system software. To retrieve the information, the system software must load the instruction word from memory.

*Restrictions:* None

*Operation:*

```
if GPR[rs] ≥ GPR[rt] then
    SignalException(Trap)
endif
```

*Exceptions:* Trap
TGEI Trap if Greater or Equal Immediate

Format:  TGEI rs, immediate

MIPS32, removed in Release 6

Purpose: Trap if Greater or Equal Immediate
To compare a GPR to a constant and do a conditional trap.

Description: if GPR[rs] ≥ immediate then Trap
Compare the contents of GPR rs and the 16-bit signed immediate as signed integers. If GPR rs is greater than or equal to immediate, then take a Trap exception.

Restrictions:
None

Availability and Compatibility:
This instruction has been removed in Release 6.

Operation:
if GPR[rs] ≥ sign_extend(immediate) then
   SignalException(Trap)
endif

Exceptions:
Trap
TGEIU

Trap if Greater or Equal Immediate Unsigned

Format:  TGEIU rs, immediate

MIPS32, removed in Release 6

Purpose:  Trap if Greater or Equal Immediate Unsigned
To compare a GPR to a constant and do a conditional trap.

Description:  if GPR[rs] ≥ immediate then Trap
Compare the contents of GPR rs and the 16-bit sign-extended immediate as unsigned integers. If GPR rs is greater than or equal to immediate, then take a Trap exception.

Because the 16-bit immediate is sign-extended before comparison, the instruction can represent the smallest or largest unsigned numbers. The representable values are at the minimum [0, 32767] or maximum [max_unsigned-32767, max_unsigned] end of the unsigned range.
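To make the representable ranges concrete, this C sketch computes the unsigned value that TGEIU actually compares against, assuming 32-bit GPRs (so max_unsigned is 0xFFFFFFFF). The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The comparand TGEIU uses: the 16-bit immediate is sign-extended to
 * 32 bits, and the resulting bit pattern is treated as unsigned. */
static uint32_t tgeiu_comparand(uint16_t imm)
{
    return (imm & 0x8000u) ? (0xFFFF0000u | imm) : imm;
}

/* TGEIU condition: (0 || GPR[rs]) >= (0 || sign_extend(immediate)). */
static bool tgeiu_traps(uint32_t gpr_rs, uint16_t imm)
{
    return gpr_rs >= tgeiu_comparand(imm);  /* unsigned comparison */
}
```

Immediates 0x0000 to 0x7FFF give comparands 0 to 32767, and 0x8000 to 0xFFFF give 0xFFFF8000 to 0xFFFFFFFF; no value in between can be expressed.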

Restrictions:
None

Availability and Compatibility:
This instruction has been removed in Release 6.

Operation:
if (0 || GPR[rs]) ≥ (0 || sign_extend(immediate)) then
   SignalException(Trap)
endif

Exceptions:
Trap
TGEU Trap if Greater or Equal Unsigned

**Format:** TGEU rs, rt

**Purpose:** Trap if Greater or Equal Unsigned

To compare GPRs and do a conditional trap.

**Description:** if GPR[rs] ≥ GPR[rt] then Trap

Compare the contents of GPR rs and GPR rt as unsigned integers. If GPR rs is greater than or equal to GPR rt, then take a Trap exception.

The contents of the code field are ignored by hardware and may be used to encode information for system software. To retrieve the information, the system software must load the instruction word from memory.
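TGE and TGEU differ only in how the same 32-bit register patterns are interpreted. This C sketch, with hypothetical function names and 32-bit GPRs assumed, evaluates both conditions on raw register values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Portable two's-complement interpretation of a 32-bit register value. */
static int64_t as_signed32(uint32_t v)
{
    return (v & 0x80000000u) ? (int64_t)v - 0x100000000LL : (int64_t)v;
}

/* TGE: compare GPR[rs] and GPR[rt] as signed integers. */
static bool tge_traps(uint32_t rs, uint32_t rt)
{
    return as_signed32(rs) >= as_signed32(rt);
}

/* TGEU: compare the same bit patterns as unsigned integers. */
static bool tgeu_traps(uint32_t rs, uint32_t rt)
{
    return rs >= rt;
}
```

With rs = 0xFFFFFFFF and rt = 1, TGE does not trap (-1 < 1) but TGEU does (4294967295 ≥ 1).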

**Restrictions:**

None

**Operation:**

```
if (0 || GPR[rs]) ≥ (0 || GPR[rt]) then
    SignalException(Trap)
endif
```

**Exceptions:**

Trap
### TLBINV

**Format:** TLBINV

**Purpose:** TLB Invalidate

TLBINV invalidates a set of TLB entries based on ASID and Index match. The virtual address is ignored in the entry match. TLB entries which have their G bit set to 1 are not modified.

Implementation of the TLBINV instruction is optional. The implementation of this instruction is indicated by the IE field in `Config4`.

Support for TLBINV is recommended for implementations supporting VTLB/FTLB type of MMU.

Implementation of the `EntryHI_EHINV` field is required for implementation of the TLBINV instruction.

**Description:**

On execution of the TLBINV instruction, the set of TLB entries with matching ASID are marked invalid, excluding those TLB entries which have their G bit set to 1.

The `EntryHI_ASID` field has to be set to the appropriate ASID value before executing the TLBINV instruction.

Behavior of the TLBINV instruction applies to all applicable TLB entries and is unaffected by the setting of the `Wired` register.

- For JTLB-based MMU (`Config_MT=1`):
  
  All matching entries in the JTLB are invalidated. The `Index` register is unused.

- For VTLB/FTLB-based MMU (`Config_MT=4`):
  
  If TLB invalidate walk is implemented in software (`Config4_IE=2`), then software must do these steps to flush the entire MMU:
  
  1. one TLBINV instruction is executed with an index in VTLB range (invalidates all matching VTLB entries)
  2. a TLBINV instruction is executed for each FTLB set (invalidates all matching entries in FTLB set)

  If TLB invalidate walk is implemented in hardware (`Config4_IE=3`), then software must do these steps to flush the entire MMU:
  
  1. one TLBINV instruction is executed (invalidates all matching entries in both FTLB & VTLB). In this case, `Index` is unused.
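The matching rule in the Description can be sketched in C over a simplified array of entries. The `struct tlb_entry` layout, the 8-bit ASID, and the function name are hypothetical; a real TLB entry holds many more fields, and the range [start, end] would come from the MMU type and Index as in the Operation below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified TLB entry: only the fields TLBINV consults. */
struct tlb_entry {
    uint8_t asid;    /* ASID the entry was written with */
    bool    global;  /* G bit */
    bool    valid;   /* cleared when the entry is invalidated */
};

/* Model of TLBINV over entries [start, end]: invalidate every entry whose
 * ASID matches EntryHi.ASID and whose G bit is 0. The Wired register is
 * not consulted. Returns the number of matching entries. */
static size_t tlbinv_range(struct tlb_entry *tlb, size_t start, size_t end,
                           uint8_t entryhi_asid)
{
    size_t count = 0;
    for (size_t i = start; i <= end; i++) {
        if (tlb[i].asid == entryhi_asid && !tlb[i].global) {
            tlb[i].valid = false;
            count++;
        }
    }
    return count;
}
```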

**Restrictions:**

When `Config_MT = 4` and `Config4_IE = 2`, the operation is **UNDEFINED** if the contents of the `Index` register are greater than or equal to the number of available TLB entries.

If access to Coprocessor 0 is not enabled, a Coprocessor Unusable Exception is signaled.

**Availability and Compatibility:**

Implementation of the TLBINV instruction is optional. The implementation of this instruction is indicated by the IE field in `Config4`.

Implementation of the `EntryHI_EHINV` field is required for implementation of the TLBINV instruction.

Pre-Release 6, support for TLBINV is recommended for implementations supporting VTLB/FTLB type of MMU. Release 6 (and subsequent releases) support for TLBINV is required for implementations supporting VTLB/FTLB type of MMU.

Release 6: On processors that include a Block Address Translation (BAT) or Fixed Mapping (FM) MMU (`Config_MT` = 2 or 3), the operation of this instruction causes a Reserved Instruction exception (RI).

**Operation:**

```
if ( Config_MT=1 or (Config_MT=4 & Config4_IE=2 & Index < VTLBsize()) )
    startnum ← 0
    endnum ← VTLBsize() - 1
endif

// treating VTLB and FTLB as one array
if ( Config_MT=4 & Config4_IE=2 & Index ≥ VTLBsize() )
    startnum ← start of selected FTLB set // implementation specific
    endnum ← end of selected FTLB set - 1 // implementation specific
endif

if ( Config_MT=4 & Config4_IE=3 )
    startnum ← 0
    endnum ← VTLBsize() + FTLBsize() - 1
endif

for (i = startnum to endnum)
    if ( TLB[i]_ASID = EntryHi_ASID & TLB[i]_G = 0 )
        TLB[i]_VPN_invalid ← 1
    endif
endfor
```

**Exceptions:**

Coprocessor Unusable
### TLBINVF

**Format:** TLBINVF

**Purpose:** TLB Invalidate Flush

TLBINVF invalidates a set of TLB entries based on Index match. The virtual address and ASID are ignored in the entry match.

Implementation of the TLBINVF instruction is optional. The implementation of this instruction is indicated by the IE field in Config4.

Support for TLBINVF is recommended for implementations supporting VTLB/FTLB type of MMU.

Implementation of the EntryHI_EHINV field is required for implementation of the TLBINV and TLBINVF instructions.

**Description:**

On execution of the TLBINVF instruction, all entries within range of Index are invalidated.

Behavior of the TLBINVF instruction applies to all applicable TLB entries and is unaffected by the setting of the Wired register.

- For a JTLB-based MMU (Config_MT = 1):
  
  TLBINVF causes all entries in the JTLB to be invalidated. Index is unused.

- For a VTLB/FTLB-based MMU (Config_MT = 4):

  If the TLB invalidate walk is implemented in software (Config4_IE = 2), software must perform these steps to flush the entire MMU:

  1. Execute one TLBINVF instruction with an Index in the VTLB range (invalidates all VTLB entries).
  2. Execute one TLBINVF instruction for each FTLB set (each invalidates all entries in that FTLB set).

  If the TLB invalidate walk is implemented in hardware (Config4_IE = 3), software needs only one step to flush the entire MMU:

  1. Execute one TLBINVF instruction (invalidates all entries in both the FTLB and the VTLB). In this case, Index is unused.
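As an illustration of the software-walk case, the sketch below computes the sequence of Index values software would load before each TLBINVF. It assumes, hypothetically, that Index = VTLBsize() + s selects FTLB set s; this section leaves the mapping from Index to FTLB set implementation specific, so the real mapping must come from the core's documentation. In the hardware-walk case (Config4_IE = 3) a single TLBINVF suffices and no list is needed. `tlbinvf_flush_indices` is an illustrative name.

```c
#include <assert.h>
#include <stddef.h>

/* Fill `out` with the Index values for a full MMU flush when the TLB
 * invalidate walk is done in software (Config4_IE = 2):
 *   - one Index in the VTLB range (0 is used here), then
 *   - one Index per FTLB set.
 * Returns the number of TLBINVF executions required. */
size_t tlbinvf_flush_indices(unsigned vtlb_size, unsigned ftlb_sets,
                             unsigned *out, size_t cap)
{
    assert(vtlb_size > 0);                 /* a VTLB-range Index must exist */
    assert(cap >= (size_t)ftlb_sets + 1);  /* room for every TLBINVF */
    size_t n = 0;
    out[n++] = 0;                          /* any Index < VTLBsize() flushes the whole VTLB */
    for (unsigned s = 0; s < ftlb_sets; s++)
        out[n++] = vtlb_size + s;          /* ASSUMPTION: this Index selects FTLB set s */
    return n;
}
```

For a 64-entry VTLB and a 128-set FTLB, this yields 129 Index values: 0, then 64 through 191.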

**Restrictions:**

When Config_MT = 4 and Config4_IE = 2, the operation is UNDEFINED if the contents of the Index register are greater than or equal to the number of available TLB entries.

If access to Coprocessor 0 is not enabled, a Coprocessor Unusable Exception is signaled.

**Availability and Compatibility:**

Implementation of the TLBINVF instruction is optional. The implementation of this instruction is indicated by the IE field in Config4.

Implementation of the EntryHi_EHINV field is required for implementation of the TLBINVF instruction.

---

<table>
<thead>
<tr>
<th>31..26</th>
<th>25</th>
<th>24..6</th>
<th>5..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>COP0 010000</td>
<td>CO 1</td>
<td>0 000 0000 0000 0000 0000</td>
<td>TLBINVF 000100</td>
</tr>
<tr>
<td>6</td>
<td>1</td>
<td>19</td>
<td>6</td>
</tr>
</tbody>
</table>
Pre-Release 6: support for TLBINVF is recommended for implementations with a VTLB/FTLB-type MMU. Release 6 and subsequent releases: support for TLBINVF is required for implementations with a VTLB/FTLB-type MMU.

Release 6: On processors that include a Block Address Translation (BAT) or Fixed Mapping (FM) MMU ($Config_{MT} = 2$ or $3$), the operation of this instruction causes a Reserved Instruction exception (RI).

**Operation:**

```
if ( Config_{MT}=1 or (Config_{MT}=4 & Config4_{IE}=2 & Index < VTLBsize()) )
   startnum ← 0
   endnum ← VTLBsize() - 1
endif

// treating VTLB and FTLB as one array
if ( Config_{MT}=4 & Config4_{IE}=2 & Index ≥ VTLBsize() )
   startnum ← start of selected FTLB set // implementation specific
   endnum ← end of selected FTLB set - 1 // implementation specific
endif

if ( Config_{MT}=4 & Config4_{IE}=3 )
   startnum ← 0
   endnum ← VTLBsize() + FTLBsize() - 1
endif

for (i = startnum to endnum)
   TLB[i].VPN2_invalid ← 1
endfor

function VTLBsize
   SizeExt = ArchRev() ≥ 6
      ? Config4_{VTLBSizeExt}
      : Config4_{MMUExtDef} == 3
         ? Config4_{VTLBSizeExt}
         : Config4_{MMUExtDef} == 1
            ? Config4_{MMUSizeExt}
            : 0
   ;
   return 1 + ((SizeExt << 6) | Config1_{MMUSize});
endfunction

function FTLBsize
   if ( Config_{MT} == 4 ) then
      return (Config4_{FTLBWays} + 2) * (1 << Config4_{FTLBSets});
   else
      return 0;
   endif
endfunction
```
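The VTLBsize and FTLBsize helpers can be transcribed into C as follows. To avoid committing to register bit positions, this sketch takes already-extracted field values in a hypothetical `mmu_fields` struct rather than raw Config, Config1, and Config4 register words.

```c
#include <assert.h>

/* Already-decoded Config/Config1/Config4 field values (hypothetical container). */
typedef struct {
    unsigned arch_rev;       /* ArchRev() */
    unsigned config_mt;      /* Config_MT */
    unsigned mmu_size;       /* Config1_MMUSize (6 bits) */
    unsigned mmu_ext_def;    /* Config4_MMUExtDef */
    unsigned vtlb_size_ext;  /* Config4_VTLBSizeExt */
    unsigned mmu_size_ext;   /* Config4_MMUSizeExt */
    unsigned ftlb_ways;      /* Config4_FTLBWays (encoded: ways = value + 2) */
    unsigned ftlb_sets;      /* Config4_FTLBSets (encoded: sets = 2^value) */
} mmu_fields;

unsigned vtlb_size(const mmu_fields *f)
{
    unsigned size_ext;
    if (f->arch_rev >= 6 || f->mmu_ext_def == 3)
        size_ext = f->vtlb_size_ext;
    else if (f->mmu_ext_def == 1)
        size_ext = f->mmu_size_ext;
    else
        size_ext = 0;
    return 1 + ((size_ext << 6) | f->mmu_size);  /* MMUSize holds entries - 1 */
}

unsigned ftlb_size(const mmu_fields *f)
{
    if (f->config_mt != 4)
        return 0;                                /* no FTLB outside VTLB/FTLB MMUs */
    assert(f->ftlb_sets < 32);                   /* keep the shift well defined */
    return (f->ftlb_ways + 2) * (1u << f->ftlb_sets);
}
```

For example, MMUSize = 63 with no size extension gives a 64-entry VTLB, and FTLBWays = 2 with FTLBSets = 7 gives a 4-way, 128-set FTLB of 512 entries.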

**Exceptions:**

Coprocessor Unusable,
Format: TLBP

Purpose: Probe TLB for Matching Entry
To find a matching entry in the TLB.

Description:
The Index register is loaded with the index of the TLB entry whose contents match the contents of the EntryHi register. If no TLB entry matches, the high-order bit of the Index register is set.

• In Release 1 of the Architecture, it is implementation dependent whether multiple TLB matches are detected on a TLBP. However, implementations are strongly encouraged to report multiple TLB matches only on a TLB write.
• In Release 2 of the Architecture, multiple TLB matches may only be reported on a TLB write.
• In Release 3 of the Architecture, multiple TLB matches may be reported on either TLB write or TLB probe.

Restrictions:
If access to Coprocessor 0 is not enabled, a Coprocessor Unusable Exception is signaled.

Release 6: On processors that include a Block Address Translation (BAT) or Fixed Mapping (FM) MMU (Config_MT = 2 or 3), this instruction causes a Reserved Instruction exception (RI).

Operation:

Index ← 1 || UNPREDICTABLE^31
for i in 0 ... TLBEntries-1
    if ((TLB[i]_VPN2 and not (TLB[i]_Mask)) = (EntryHi_VPN2 and not (TLB[i]_Mask))) and
       ((TLB[i]_G = 1) or (TLB[i]_ASID = EntryHi_ASID)) then
        Index ← i
    endif
endfor
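A minimal C model of the probe loop follows, assuming a hypothetical `tlbp_entry` struct whose Mask is already aligned so that its least-significant bit lines up with the least-significant bit of VPN2. Like the pseudocode, it scans every entry without stopping, so if several entries match the last one wins; on a miss it leaves the UNPREDICTABLE low-order bits as zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TLBP_NO_MATCH 0x80000000u  /* Index with only the high-order bit set */

/* Hypothetical TLB entry model with the fields TLBP compares. */
typedef struct {
    uint32_t vpn2;  /* TLB[i]_VPN2 */
    uint32_t mask;  /* TLB[i]_Mask, aligned to VPN2 bit 0 */
    uint16_t asid;  /* TLB[i]_ASID */
    bool g;         /* TLB[i]_G */
} tlbp_entry;

/* Return the index of the entry matching (vpn2, asid), or TLBP_NO_MATCH. */
uint32_t tlbp(const tlbp_entry *tlb, uint32_t n, uint32_t entryhi_vpn2, uint16_t entryhi_asid)
{
    uint32_t index = TLBP_NO_MATCH;
    for (uint32_t i = 0; i < n; i++) {
        assert((tlb[i].mask & (tlb[i].mask + 1)) == 0);  /* Mask is a run of low-order ones */
        bool vpn_match = (tlb[i].vpn2 & ~tlb[i].mask) == (entryhi_vpn2 & ~tlb[i].mask);
        bool asid_match = tlb[i].g || tlb[i].asid == entryhi_asid;
        if (vpn_match && asid_match)
            index = i;  /* no early exit, mirroring the pseudocode */
    }
    return index;
}
```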

Exceptions:
Coprocessor Unusable, Machine Check
**TLBR**

### Purpose:
Read Indexed TLB Entry

To read an entry from the TLB.

### Description:
The `EntryHi`, `EntryLo0`, `EntryLo1`, and `PageMask` registers are loaded with the contents of the TLB entry pointed to by the `Index` register.

- In Release 1 of the Architecture, it is implementation dependent whether multiple TLB matches are detected on a TLBR. However, implementations are strongly encouraged to report multiple TLB matches only on a TLB write.
- In Release 2 of the Architecture, multiple TLB matches may only be reported on a TLB write.
- In Release 3 of the Architecture, multiple TLB matches may be detected on a TLBR.

In an implementation supporting TLB entry invalidation (`Config4IE ≥ 1`), reading an invalidated TLB entry causes `EntryLo0` and `EntryLo1` to be set to 0, `EntryHiEHINV` to be set to 1, all other `EntryHi` bits to be set to 0, and `PageMask` to be set to a value representing the minimum supported page size.

The values loaded into the `EntryHi`, `EntryLo0`, and `EntryLo1` registers may differ from the values originally written to the TLB through these registers, in that:

- The value returned in the `VPN2` field of the `EntryHi` register may have those bits set to zero corresponding to the one bits in the Mask field of the TLB entry (the least-significant bit of `VPN2` corresponds to the least-significant bit of the Mask field). It is implementation dependent whether these bits are preserved or zeroed after a TLB entry is written and then read.
- The value returned in the `PFN` field of the `EntryLo0` and `EntryLo1` registers may have those bits set to zero corresponding to the one bits in the Mask field of the TLB entry (the least significant bit of `PFN` corresponds to the least significant bit of the Mask field). It is implementation dependent whether these bits are preserved or zeroed after a TLB entry is written and then read.
- The value returned in the `G` bit in both the `EntryLo0` and `EntryLo1` registers comes from the single `G` bit in the TLB entry. Recall that this bit was set from the logical AND of the two `G` bits in `EntryLo0` and `EntryLo1` when the TLB was written.
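The readback rules above can be sketched in C as follows. The struct and function names are hypothetical, PageMask and the C/D/V bits are omitted, and the `zero_masked` flag stands in for the implementation-dependent choice of whether masked VPN2/PFN bits read back as zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stored TLB entry (fields only, not a register layout). */
typedef struct {
    uint32_t vpn2, mask, pfn0, pfn1;
    uint16_t asid;
    bool g, vpn2_invalid;
} tlb_entry_r;

/* Hypothetical decoded view of what TLBR loads into EntryHi/EntryLo0/EntryLo1. */
typedef struct {
    uint32_t entryhi_vpn2, entrylo0_pfn, entrylo1_pfn;
    uint16_t entryhi_asid;
    bool entryhi_ehinv, entrylo0_g, entrylo1_g;
} tlbr_result;

tlbr_result tlbr(const tlb_entry_r *tlb, uint32_t n, uint32_t index, bool zero_masked)
{
    assert(index < n);            /* Index >= number of entries is UNDEFINED */
    const tlb_entry_r *e = &tlb[index];
    tlbr_result r = {0};
    if (e->vpn2_invalid) {        /* Config4_IE >= 1 and the entry was invalidated */
        r.entryhi_ehinv = true;   /* every other modeled field reads as zero */
        return r;
    }
    uint32_t keep = zero_masked ? ~e->mask : ~0u;
    r.entryhi_vpn2 = e->vpn2 & keep;
    r.entrylo0_pfn = e->pfn0 & keep;
    r.entrylo1_pfn = e->pfn1 & keep;
    r.entryhi_asid = e->asid;
    r.entrylo0_g = r.entrylo1_g = e->g;  /* the single G bit feeds both registers */
    return r;
}
```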

### Restrictions:
The operation is **UNDEFINED** if the contents of the Index register are greater than or equal to the number of TLB entries in the processor.

If access to Coprocessor 0 is not enabled, a Coprocessor Unusable Exception is signaled.

Release 6: On processors that include a Block Address Translation (BAT) or Fixed Mapping (FM) MMU (`ConfigMT = 2` or 3), this instruction causes a Reserved Instruction exception (RI).

### Operation:
```plaintext
i ← Index
if i > (TLBEntries - 1) then
    UNDEFINED
endif
if ( (Config4_IE ≥ 1) and TLB[i]_VPN2_invalid = 1) then
    PageMask_Mask ← 0 // or value representing minimum page size
    EntryHi ← 0
    EntryLo1 ← 0
    EntryLo0 ← 0
    EntryHi_EHINV ← 1
else
    PageMask_Mask ← TLB[i]_Mask
    EntryHi ← (TLB[i]_VPN2 and not TLB[i]_Mask) ||   # Masking implem dependent
              0^5 || TLB[i]_ASID
    EntryLo1 ← 0^2 ||
               (TLB[i]_PFN1 and not TLB[i]_Mask) ||  # Masking implem dependent
               TLB[i]_C1 || TLB[i]_D1 || TLB[i]_V1 || TLB[i]_G
    EntryLo0 ← 0^2 ||
               (TLB[i]_PFN0 and not TLB[i]_Mask) ||  # Masking implem dependent
               TLB[i]_C0 || TLB[i]_D0 || TLB[i]_V0 || TLB[i]_G
endif
```

Exceptions:
Coprocessor Unusable, Machine Check
**TLBWI** | **Write Indexed TLB Entry**

Format: TLBWI  
MIPS32

**Purpose:** Write Indexed TLB Entry  
To write or invalidate a TLB entry indexed by the Index register.

**Description:**  
If \( Config4_{IE} = 0 \) or \( EntryHi_{EHINV} = 0 \):

The TLB entry pointed to by the Index register is written from the contents of the \( EntryHi \), \( EntryLo0 \), \( EntryLo1 \), and \( PageMask \) registers. It is implementation dependent whether multiple TLB matches are detected on a TLBWI; if they are detected, a Machine Check exception is signaled.

In Release 2 of the Architecture, multiple TLB matches may only be reported on a TLB write. The information written to the TLB entry may differ from that in the \( EntryHi \), \( EntryLo0 \), and \( EntryLo1 \) registers, in that:

- The value written to the VPN2 field of the TLB entry may have those bits set to zero corresponding to the one bits in the Mask field of the \( PageMask \) register (the least significant bit of VPN2 corresponds to the least significant bit of the Mask field). It is implementation dependent whether these bits are preserved or zeroed during a TLB write.

- The value written to the PFN0 and PFN1 fields of the TLB entry may have those bits set to zero corresponding to the one bits in the Mask field of \( PageMask \) register (the least significant bit of PFN corresponds to the least significant bit of the Mask field). It is implementation dependent whether these bits are preserved or zeroed during a TLB write.

- The single G bit in the TLB entry is set from the logical AND of the G bits in the \( EntryLo0 \) and \( EntryLo1 \) registers.
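A companion sketch of the write path, with the same caveats as any model here: hypothetical names, C/D/V bits omitted, and `zero_masked` selecting the implementation-dependent masking. It shows the single G bit being formed as the AND of the two EntryLo G bits, and the Config4_IE ≥ 1 path that only marks the entry invalid when EntryHi_EHINV is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stored TLB entry. */
typedef struct {
    uint32_t vpn2, mask, pfn0, pfn1;
    uint16_t asid;
    bool g, vpn2_invalid;
} tlb_entry_w;

/* Hypothetical decoded source registers for the write. */
typedef struct {
    uint32_t pagemask_mask, entryhi_vpn2, entrylo0_pfn, entrylo1_pfn;
    uint16_t entryhi_asid;
    bool entryhi_ehinv, entrylo0_g, entrylo1_g;
} tlbw_regs;

void tlbwi(tlb_entry_w *tlb, uint32_t n, uint32_t index, const tlbw_regs *r,
           bool ie_implemented, bool zero_masked)
{
    assert(index < n);                    /* out-of-range Index is UNDEFINED */
    tlb_entry_w *e = &tlb[index];
    if (ie_implemented) {                 /* Config4_IE >= 1 */
        e->vpn2_invalid = r->entryhi_ehinv;
        if (r->entryhi_ehinv)
            return;                       /* invalidate only; other fields untouched */
    }
    uint32_t keep = zero_masked ? ~r->pagemask_mask : ~0u;
    e->mask = r->pagemask_mask;
    e->vpn2 = r->entryhi_vpn2 & keep;
    e->asid = r->entryhi_asid;
    e->g = r->entrylo0_g && r->entrylo1_g;  /* single G bit = AND of both G bits */
    e->pfn0 = r->entrylo0_pfn & keep;
    e->pfn1 = r->entrylo1_pfn & keep;
}
```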

If \( Config4_{IE} \geq 1 \) and \( EntryHi_{EHINV} = 1 \):

The TLB entry pointed to by the Index register has its VPN2 field marked as invalid. This causes the entry to be ignored on TLB matches for memory accesses. No Machine Check is generated.

**Restrictions:**  
The operation is **UNDEFINED** if the contents of the Index register are greater than or equal to the number of TLB entries in the processor.

If access to Coprocessor 0 is not enabled, a Coprocessor Unusable Exception is signaled.

Release 6: On processors that include a Block Address Translation (BAT) or Fixed Mapping (FM) MMU (\( Config_{MT} = 2 \) or 3), this instruction causes a Reserved Instruction exception (RI).

**Operation:**

```
i ← Index
if (Config4_IE ≥ 1) then
    TLB[i]_VPN2_invalid ← 0
    if (EntryHi_EHINV = 1) then
        TLB[i]_VPN2_invalid ← 1
        break
    endif
endif

TLB[i]_Mask ← PageMask_Mask
TLB[i]_VPN2 ← EntryHi_VPN2 and not PageMask_Mask   # Implementation dependent
TLB[i]_ASID ← EntryHi_ASID
TLB[i]_G ← EntryLo1_G and EntryLo0_G
TLB[i]_PFN1 ← EntryLo1_PFN and not PageMask_Mask   # Implementation dependent
TLB[i]_C1 ← EntryLo1_C
TLB[i]_D1 ← EntryLo1_D
TLB[i]_V1 ← EntryLo1_V
TLB[i]_PFN0 ← EntryLo0_PFN and not PageMask_Mask   # Implementation dependent
TLB[i]_C0 ← EntryLo0_C
TLB[i]_D0 ← EntryLo0_D
TLB[i]_V0 ← EntryLo0_V
```

The MIPS32© Instruction Set Manual, Revision 6.04

Copyright © 2015 Imagination Technologies LTD. and/or its Affiliated Group Companies. All rights reserved.

Exceptions:

Coprocessor Unusable, Machine Check
TLBWR

Format: TLBWR

Purpose: Write Random TLB Entry

To write a TLB entry indexed by the Random register, or, in Release 6, write a TLB entry indexed by an implementation-defined location.

Description:
The TLB entry pointed to by the Random register is written from the contents of the EntryHi, EntryLo0, EntryLo1, and PageMask registers. It is implementation dependent whether multiple TLB matches are detected on a TLBWR; if they are detected, a Machine Check exception is signaled.

In Release 6, the Random register has been removed. References to Random refer to an implementation-determined value that is not visible to software.

In Release 2 of the Architecture, multiple TLB matches may only be reported on a TLB write. The information written to the TLB entry may be different from that in the EntryHi, EntryLo0, and EntryLo1 registers, in that:

- The value written to the VPN2 field of the TLB entry may have those bits set to zero corresponding to the one bits in the Mask field of the PageMask register (the least significant bit of VPN2 corresponds to the least significant bit of the Mask field). It is implementation dependent whether these bits are preserved or zeroed during a TLB write.

- The value written to the PFN0 and PFN1 fields of the TLB entry may have those bits set to zero corresponding to the one bits in the Mask field of PageMask register (the least significant bit of PFN corresponds to the least significant bit of the Mask field). It is implementation dependent whether these bits are preserved or zeroed during a TLB write.

- The single G bit in the TLB entry is set from the logical AND of the G bits in the EntryLo0 and EntryLo1 registers.

Restrictions:
If access to Coprocessor 0 is not enabled, a Coprocessor Unusable Exception is signaled.

Release 6: On processors that include a Block Address Translation (BAT) or Fixed Mapping (FM) MMU (ConfigMT = 2 or 3), this instruction causes a Reserved Instruction exception (RI).

Operation:

```
i ← Random
if (Config4_IE ≥ 1) then
    TLB[i]_VPN2_invalid ← 0
endif
TLB[i]_Mask ← PageMask_Mask
TLB[i]_VPN2 ← EntryHi_VPN2 and not PageMask_Mask   # Implementation dependent
TLB[i]_ASID ← EntryHi_ASID
TLB[i]_G ← EntryLo1_G and EntryLo0_G
TLB[i]_PFN1 ← EntryLo1_PFN and not PageMask_Mask   # Implementation dependent
TLB[i]_C1 ← EntryLo1_C
TLB[i]_D1 ← EntryLo1_D
TLB[i]_V1 ← EntryLo1_V
TLB[i]_PFN0 ← EntryLo0_PFN and not PageMask_Mask   # Implementation dependent
TLB[i]_C0 ← EntryLo0_C
TLB[i]_D0 ← EntryLo0_D
TLB[i]_V0 ← EntryLo0_V
```

Exceptions:

Coprocessor Unusable, Machine Check
TLT Trap if Less Than

Format: TLT rs, rt

Purpose: Trap if Less Than
To compare GPRs and do a conditional trap.

Description: if GPR[rs] < GPR[rt] then Trap
Compare the contents of GPR rs and GPR rt as signed integers. If GPR rs is less than GPR rt, then take a Trap exception.
The contents of the code field are ignored by hardware and may be used to encode information for system software. To retrieve the information, system software must load the instruction word from memory.
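A handler that wants the code field has to extract it from the instruction word it loads. The sketch below assumes the SPECIAL-format register trap layout, in which the 10-bit code field occupies bits 15..6; the format diagram is not reproduced in this section, so confirm the field position there. `trap_code_field` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Extract the 10-bit code field (bits 15..6) from a SPECIAL-class
 * register-register trap instruction word such as TLT. */
uint32_t trap_code_field(uint32_t insn)
{
    assert((insn >> 26) == 0);    /* primary opcode SPECIAL (000000) */
    return (insn >> 6) & 0x3FFu;  /* 10 bits: 15..6 */
}
```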

Restrictions:
None

Operation:
if GPR[rs] < GPR[rt] then
   SignalException(Trap)
endif

Exceptions:
Trap
Format:  TLTI rs, immediate

Purpose:  Trap if Less Than Immediate
          To compare a GPR to a constant and do a conditional trap.

Description:  if GPR[rs] < immediate then Trap

Compare the contents of GPR rs and the 16-bit signed immediate as signed integers. If GPR rs is less than immediate, then take a Trap exception.

Restrictions:
None

Availability and Compatibility:
This instruction has been removed in Release 6.

Operation:
   if GPR[rs] < sign_extend(immediate) then
      SignalException(Trap)
   endif

Exceptions:
Trap
**TLTIU**  
Trap if Less Than Immediate Unsigned

**Format:**  
TLTIU rs, immediate

**MIPS32, removed in Release 6**

**Purpose:**  
Trap if Less Than Immediate Unsigned

To compare a GPR to a constant and do a conditional trap.

**Description:**  
if GPR[rs] < immediate then Trap

Compare the contents of GPR rs and the 16-bit sign-extended immediate as unsigned integers. If GPR rs is less than immediate, then take a Trap exception.

Because the 16-bit immediate is sign-extended before the unsigned comparison, the values it can represent lie at either the minimum end [0, 32767] or the maximum end [max_unsigned-32767, max_unsigned] of the unsigned range.
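The two representable windows follow directly from sign extension. This sketch (hypothetical helper names) converts the signed 16-bit immediate to the 32-bit unsigned operand TLTIU actually compares against; C's conversion of a negative `int32_t` to `uint32_t` is defined modulo 2^32, which is exactly sign extension.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sign-extend the 16-bit immediate to 32 bits, viewed as unsigned. */
uint32_t tltiu_operand(int32_t imm)
{
    assert(imm >= -32768 && imm <= 32767);  /* must fit the 16-bit field */
    return (uint32_t)imm;                   /* modulo 2^32 == sign extension */
}

/* TLTIU rs, immediate: trap when GPR[rs] < operand, compared as unsigned. */
bool tltiu_traps(uint32_t rs_value, int32_t imm)
{
    return rs_value < tltiu_operand(imm);
}
```

For instance, an immediate of -1 becomes max_unsigned, so TLTIU traps for every GPR value except max_unsigned itself.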

**Restrictions:**  
None

**Availability and Compatibility:**  
This instruction has been removed in Release 6.

**Operation:**

```plaintext
if (0 || GPR[rs]) < (0 || sign_extend(immediate)) then
    SignalException(Trap)
endif
```

**Exceptions:**

Trap
TLTU Trap if Less Than Unsigned

Format:  TLTU rs, rt

Purpose:  Trap if Less Than Unsigned
To compare GPRs and do a conditional trap.

Description:  if GPR[rs] < GPR[rt] then Trap
Compare the contents of GPR rs and GPR rt as unsigned integers. If GPR rs is less than GPR rt, then take a Trap exception.
The contents of the code field are ignored by hardware and may be used to encode information for system software.
To retrieve the information, system software must load the instruction word from memory.

Restrictions:
None

Operation:
if (0 || GPR[rs]) < (0 || GPR[rt]) then
    SignalException(Trap)
endif

Exceptions:
Trap
TNE

Trap if Not Equal

Format: TNE rs, rt

Purpose: Trap if Not Equal
To compare GPRs and do a conditional trap.

Description: if GPR[rs] ≠ GPR[rt] then Trap
Compare the contents of GPR rs and GPR rt as signed integers. If GPR rs is not equal to GPR rt, then take a Trap exception.
The contents of the code field are ignored by hardware and may be used to encode information for system software. To retrieve the information, system software must load the instruction word from memory.

Restrictions:
None

Operation:
if GPR[rs] ≠ GPR[rt] then
    SignalException(Trap)
endif

Exceptions:
Trap
**Format:**  
TNEI rs, immediate

**Purpose:**  
Trap if Not Equal Immediate  
To compare a GPR to a constant and do a conditional trap.

**Description:**  
if GPR[rs] ≠ immediate then Trap  
Compare the contents of GPR rs and the 16-bit signed `immediate` as signed integers. If GPR rs is not equal to `immediate`, then take a Trap exception.

**Restrictions:**  
None

**Availability and Compatibility:**  
This instruction has been removed in Release 6.

**Operation:**  
if GPR[rs] ≠ sign_extend(immediate) then  
        SignalException(Trap)  
endif

**Exceptions:**  
Trap
TRUNC.L.fmt

Floating Point Truncate to Long Fixed Point

<table>
<thead>
<tr>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..11</th>
<th>10..6</th>
<th>5..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>COP1 010001</td>
<td>fmt</td>
<td>0 00000</td>
<td>fs</td>
<td>fd</td>
<td>TRUNC.L 001001</td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>6</td>
</tr>
</tbody>
</table>

Format: TRUNC.L.fmt
TRUNC.L.S fd, fs    MIPS64, MIPS32 Release 2
TRUNC.L.D fd, fs    MIPS64, MIPS32 Release 2

Purpose: Floating Point Truncate to Long Fixed Point
To convert an FP value to 64-bit fixed point, rounding toward zero.

Description: FPR[fd] ← convert_and_round(FPR[fs])
The value in FPR fs, in format fmt, is converted to a value in 64-bit long-fixed point format and rounded toward zero (rounding mode 1). The result is placed in FPR fd.

When the source value is Infinity, NaN, or rounds to an integer outside the range \(-2^{63}\) to \(2^{63}-1\), the result cannot be represented correctly and an IEEE Invalid Operation condition exists. In this case the Invalid Operation flag is set in the FCSR. If the Invalid Operation Enable bit is set in the FCSR, no result is written to fd and an Invalid Operation exception is taken immediately. Otherwise, a default result is written to fd. On cores with FCSR\_NAN2008=0, the default result is \(2^{63}-1\). On cores with FCSR\_NAN2008=1, the default result is:
- 0 when the input value is NaN
- \(2^{63}-1\) when the input value is \(+\infty\) or rounds to a number larger than \(2^{63}-1\)
- \(-2^{63}\) when the input value is \(-\infty\) or rounds to a number smaller than \(-2^{63}\)
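The default-result rules can be checked with a small C model. It is a sketch under stated assumptions: the Invalid Operation exception is taken to be disabled (so a default result is always produced), Inexact is not modeled, and `trunc_fixed` is a hypothetical name that covers both TRUNC.L (bits = 64) and TRUNC.W (bits = 32).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Value semantics of TRUNC.L/TRUNC.W with Invalid Operation disabled.
 * *invalid reports the IEEE Invalid Operation condition. */
int64_t trunc_fixed(double x, int bits, bool nan2008, bool *invalid)
{
    assert(bits == 32 || bits == 64);
    const int64_t max = (int64_t)((UINT64_C(1) << (bits - 1)) - 1);  /* 2^(bits-1) - 1 */
    const int64_t min = -max - 1;                                    /* -2^(bits-1) */
    const double limit = (double)(UINT64_C(1) << (bits - 1));        /* 2^(bits-1), exact */

    bool is_nan = (x != x);
    /* x truncates into [min, max] iff -limit - 1 < x < limit. For bits = 64,
     * -limit - 1 is not representable as a double (no double lies strictly
     * between it and -limit), so the x >= -limit test covers that boundary. */
    bool in_range = !is_nan && x < limit && (x >= -limit || x > -limit - 1.0);
    *invalid = !in_range;
    if (in_range)
        return (int64_t)x;     /* C float-to-integer conversion truncates toward zero */
    if (!nan2008)
        return max;            /* FCSR_NAN2008 = 0: always 2^(bits-1) - 1 */
    if (is_nan)
        return 0;
    return x > 0 ? max : min;  /* +Inf/positive overflow vs. -Inf/negative overflow */
}
```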

Restrictions:
The fields fs and fd must specify valid FPRs: fs for type fmt and fd for long fixed point. If the fields are not valid, the result is UNPREDICTABLE.
The operand must be a value in format fmt; if it is not, the result is UNPREDICTABLE and the value of the operand FPR becomes UNPREDICTABLE.
The result of this instruction is UNPREDICTABLE if the processor is executing in the FR=0 32-bit FPU register model or has a 32-bit FPU; it is predictable only when executing on a 64-bit FPU in FR=1 mode.

Operation:
StoreFPR(fd, L, ConvertFmt(ValueFPR(fs, fmt), fmt, L))

Exceptions:
Coprocessor Unusable, Reserved Instruction

Floating Point Exceptions:
Unimplemented Operation, Invalid Operation, Inexact
TRUNC.W.fmt  Floating Point Truncate to Word Fixed Point

<table>
<thead>
<tr>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..11</th>
<th>10..6</th>
<th>5..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>COP1 010001</td>
<td>fmt</td>
<td>0 00000</td>
<td>fs</td>
<td>fd</td>
<td>TRUNC.W 001101</td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>6</td>
</tr>
</tbody>
</table>

**Format:**

TRUNC.W.fmt
TRUNC.W.S fd, fs
TRUNC.W.D fd, fs

**Purpose:** Floating Point Truncate to Word Fixed Point
To convert an FP value to 32-bit fixed point, rounding toward zero.

**Description:**

\[ FPR[fd] \leftarrow \text{convert\_and\_round}(FPR[fs]) \]

The value in FPR \( fs \), in format \( fmt \), is converted to a value in 32-bit word fixed point format using rounding toward zero (rounding mode 1). The result is placed in FPR \( fd \).

When the source value is Infinity, NaN, or rounds to an integer outside the range \(-2^{31} \) to \( 2^{31}-1 \), the result cannot be represented correctly and an IEEE Invalid Operation condition exists. In this case the Invalid Operation flag is set in the FCSR. If the Invalid Operation Enable bit is set in the FCSR, no result is written to \( fd \) and an Invalid Operation exception is taken immediately. Otherwise, a default result is written to \( fd \). On cores with FCSR\_NAN2008=0, the default result is \( 2^{31}-1 \). On cores with FCSR\_NAN2008=1, the default result is:

- \( 0 \) when the input value is NaN
- \( 2^{31}-1 \) when the input value is \(+\infty\) or rounds to a number larger than \( 2^{31}-1 \)
- \(-2^{31} \) when the input value is \(-\infty\) or rounds to a number smaller than \(-2^{31} \)

**Restrictions:**

The fields \( fs \) and \( fd \) must specify valid FPRs: \( fs \) for type \( fmt \) and \( fd \) for word fixed point. If the fields are not valid, the result is UNPREDICTABLE.

The operand must be a value in format \( fmt \); if it is not, the result is UNPREDICTABLE and the value of the operand FPR becomes UNPREDICTABLE.

**Operation:**

\[ \text{StoreFPR}(fd, W, \text{ConvertFmt}(\text{ValueFPR}(fs, fmt), fmt, W)) \]

**Exceptions:**

Coprocessor Unusable, Reserved Instruction

**Floating Point Exceptions:**

Inexact, Invalid Operation, Unimplemented Operation
WAIT
Enter Standby Mode

Purpose: Enter Standby Mode
Wait for Event

Description:
The WAIT instruction performs an implementation-dependent operation, involving a lower power mode. Software may use the code bits of the instruction to communicate additional information to the processor. The processor may use this information as control for the lower power mode. A value of zero for code bits is the default and must be valid in all implementations.

The WAIT instruction is implemented by stalling the pipeline at the completion of the instruction and entering a lower power mode. The pipeline is restarted when an external event, such as an interrupt or external request, occurs, and execution continues with the instruction following the WAIT instruction. It is implementation-dependent whether the pipeline restarts when a non-enabled interrupt is requested; in that case, software must poll for the cause of the restart. The assertion of any reset or NMI must restart the pipeline, and the corresponding exception must be taken.

If the pipeline restarts as the result of an enabled interrupt, that interrupt is taken between the WAIT instruction and the following instruction (EPC for the interrupt points at the instruction following the WAIT instruction).

In Release 6, the behavior of WAIT has been modified to make it a requirement that a processor that has disabled operation as a result of executing a WAIT will resume operation on arrival of an interrupt even if interrupts are not enabled.

In Release 6, the encoding of WAIT with bits 24:6 of the instruction (the implementation-dependent code field) set to 0 never disables COP0 Count while the WAIT is active. This modification architecturally specifies that Count is not disabled when WAIT is executed with the default code of 0; prior to Release 6, whether Count was disabled was implementation-dependent. Other encodings of WAIT may be defined in the future to specify other power-saving or stand-by modes; an implementation that does not support such an encoding must treat it as WAIT 0.

Restrictions:
Pre-Release 6: The operation of the processor is UNDEFINED if a WAIT instruction is executed in the delay slot of a branch or jump instruction.

Release 6: Implementations are required to signal a Reserved Instruction exception if WAIT is encountered in the delay slot or forbidden slot of a branch or jump instruction.

If access to Coprocessor 0 is not enabled, a Coprocessor Unusable Exception is signaled.

Operation:
Pre-Release 6:
```
I:   Enter implementation dependent lower power mode
I+1: /* Potential interrupt taken here */
```

Release 6:
```
I:   if IsCoprocessorEnabled(0) then
         while ( !interrupt_pending_and_not_masked_out() &&
                 !implementation_dependent_wake_event() )
             <enter or remain in low power mode or stand-by mode>
         endwhile
     else
         SignalException(CoprocessorUnusable, 0)
     endif

I+1: if ( interrupt_pending() && interrupts_enabled() ) then
         EPC ← PC + 4
         <process interrupt; execute ERET eventually>
     else
         // unblock on non-enabled interrupt or imp dep wake event.
         PC ← PC + 4
         <continue execution at instruction after wait>
     endif

function interrupt_pending_and_not_masked_out
    return (Config3_VEIC && IntCtl_VS && Cause_IV && !Status_BEV)
        ? Cause_RIPL > Status_IPL : Cause_IP & Status_IM;
endfunction

function interrupts_enabled
    return Status_IE && !Status_EXL && !Status_ERL && !Debug_DM;
endfunction

function implementation_dependent_wake_event
    <return true if implementation dependent waking-up event occurs>
endfunction
```
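The two predicates above can be modeled in C to show why a Release 6 WAIT resumes on a pending, unmasked interrupt even when Status_IE is 0: the wake condition depends only on `interrupt_pending_and_not_masked_out`, while `interrupts_enabled` only decides whether the interrupt is then taken. The `cp0_state` struct is a hypothetical snapshot of the fields the pseudocode reads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot of the CP0 fields read by the WAIT pseudocode. */
typedef struct {
    bool config3_veic, cause_iv, status_bev;
    unsigned intctl_vs;               /* nonzero: vectored interrupts enabled */
    unsigned cause_ripl, status_ipl;  /* EIC-mode priority levels */
    uint8_t cause_ip, status_im;      /* per-line pending and mask bits */
    bool status_ie, status_exl, status_erl, debug_dm;
} cp0_state;

bool interrupt_pending_and_not_masked_out(const cp0_state *s)
{
    assert(s != NULL);
    if (s->config3_veic && s->intctl_vs != 0 && s->cause_iv && !s->status_bev)
        return s->cause_ripl > s->status_ipl;  /* EIC mode: priority compare */
    return (s->cause_ip & s->status_im) != 0;  /* otherwise: per-line mask */
}

bool interrupts_enabled(const cp0_state *s)
{
    assert(s != NULL);
    return s->status_ie && !s->status_exl && !s->status_erl && !s->debug_dm;
}
```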

Exceptions:

Coprocessor Unusable Exception
**WRPGPR**

**Write to GPR in Previous Shadow Set**

<table>
<thead>
<tr>
<th>31..26</th>
<th>25..21</th>
<th>20..16</th>
<th>15..11</th>
<th>10..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>COP0 010000</td>
<td>WRPGPR 01110</td>
<td>rt</td>
<td>rd</td>
<td>0 000 0000 0000</td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>11</td>
</tr>
</tbody>
</table>

**Format:** WRPGPR rd, rt

**Purpose:** Write to GPR in Previous Shadow Set
To move the contents of a current GPR to a GPR in the previous shadow set.

**Description:**

\[
\text{SGPR}[\text{SRSCtl}_{\text{PSS}}, \text{rd}] \leftarrow \text{GPR}[\text{rt}]
\]

The contents of the current GPR rt are moved to the shadow GPR specified by SRSCtl_PSS (the previous shadow set number) and rd (the register number within that set).
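A minimal C model of the shadow register file, assuming a hypothetical `shadow_regs` struct with four shadow sets (the real number is implementation dependent). GPR[rt] is read from the current set, SRSCtl_CSS; the guard on register 0 is an assumption that it behaves as in the ordinary GPR file.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_SHADOW_SETS 4  /* hypothetical; implementation dependent */

typedef struct {
    uint32_t sgpr[NUM_SHADOW_SETS][32];  /* SGPR[set, reg] */
    unsigned css, pss;                   /* SRSCtl_CSS and SRSCtl_PSS */
} shadow_regs;

/* WRPGPR rd, rt: SGPR[SRSCtl_PSS, rd] <- GPR[rt], GPR being the current set. */
void wrpgpr(shadow_regs *r, unsigned rd, unsigned rt)
{
    assert(rd < 32 && rt < 32);
    assert(r->pss < NUM_SHADOW_SETS && r->css < NUM_SHADOW_SETS);
    if (rd != 0)  /* ASSUMPTION: register 0 of every set stays zero */
        r->sgpr[r->pss][rd] = r->sgpr[r->css][rt];
}
```

A typical use is an interrupt handler running in a shadow set that needs to update a register, such as the stack pointer, of the interrupted context before returning.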

**Restrictions:**
In implementations prior to Release 2 of the Architecture, this instruction resulted in a Reserved Instruction exception.

**Operation:**

\[
\text{SGPR}[\text{SRSCtl}_{\text{PSS}}, \text{rd}] \leftarrow \text{GPR}[\text{rt}]
\]

**Exceptions:**
Coprocessor Unusable, Reserved Instruction
WSBH

Format: WSBH rd, rt

Purpose: Word Swap Bytes Within Halfwords

To swap the bytes within each halfword of GPR rt and store the value into GPR rd.

Description: GPR[rd] ← SwapBytesWithinHalfwords(GPR[rt])

Within each halfword of GPR rt the bytes are swapped, and stored in GPR rd.

Restrictions:

In implementations prior to Release 2 of the architecture, this instruction resulted in a Reserved Instruction exception.

Operation:

GPR[rd] ← GPR[rt]23..16 || GPR[rt]31..24 || GPR[rt]7..0 || GPR[rt]15..8

Exceptions:

Reserved Instruction

Programming Notes:

The WSBH instruction can be used to convert halfword and word data of one endianness to another endianness. The endianness of a word value can be converted using the following sequence:

```assembly
lw   t0, 0(a1)    /* Read word value */
wsbh t0, t0      /* Convert endianness of the halfwords */
rotr t0, t0, 16  /* Swap the halfwords within the words */
```

Combined with SEH and SRA, two contiguous halfwords can be loaded from memory, have their endianness converted, and be sign-extended into two word values in four instructions. For example:

```assembly
lw   t0, 0(a1)    /* Read two contiguous halfwords */
wsbh t0, t0       /* Convert endianness of the halfwords */
seh  t1, t0       /* t1 = lower halfword sign-extended to word */
sra  t0, t0, 16   /* t0 = upper halfword sign-extended to word */
```

Zero-extended words can be created by changing the SEH and SRA instructions to ANDI and SRL instructions, respectively.
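The two sequences above can be checked in C. This sketch uses hypothetical helper names; `wsbh` and `rotr` follow the Operation descriptions, and the sign extension is written with an XOR-and-subtract idiom so it stays well defined in C for every 16-bit value.

```c
#include <assert.h>
#include <stdint.h>

/* WSBH: swap the two bytes inside each 16-bit halfword. */
uint32_t wsbh(uint32_t x)
{
    return ((x & 0x00FF00FFu) << 8) | ((x >> 8) & 0x00FF00FFu);
}

/* ROTR: rotate right by sa bits (0..31). */
uint32_t rotr(uint32_t x, unsigned sa)
{
    assert(sa < 32);
    return sa == 0 ? x : (x >> sa) | (x << (32 - sa));
}

/* wsbh; seh t1, t0; sra t0, t0, 16: two sign-extended halfwords. */
void halfwords_signed(uint32_t word, int32_t *lower, int32_t *upper)
{
    uint32_t t0 = wsbh(word);
    *lower = (int32_t)((t0 & 0xFFFFu) ^ 0x8000u) - 0x8000;  /* SEH */
    *upper = (int32_t)((t0 >> 16) ^ 0x8000u) - 0x8000;      /* SRA by 16 */
}

/* wsbh; andi t1, t0, 0xffff; srl t0, t0, 16: zero-extended variant. */
void halfwords_unsigned(uint32_t word, uint32_t *lower, uint32_t *upper)
{
    uint32_t t0 = wsbh(word);
    *lower = t0 & 0xFFFFu;  /* ANDI */
    *upper = t0 >> 16;      /* SRL */
}
```

For example, `rotr(wsbh(0x11223344), 16)` yields `0x44332211`, a full byte reversal of the word.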
XOR

**Format:** XOR rd, rs, rt

**Purpose:** Exclusive OR

To do a bitwise logical Exclusive OR.

**Description:**

\[ \text{GPR}[rd] \leftarrow \text{GPR}[rs] \text{ XOR } \text{GPR}[rt] \]

Combine the contents of GPR rs and GPR rt in a bitwise logical Exclusive OR operation and place the result into GPR rd.

**Restrictions:**

None

**Operation:**

\[ \text{GPR}[rd] \leftarrow \text{GPR}[rs] \text{ xor } \text{GPR}[rt] \]

**Exceptions:**

None
Format: XORI rt, rs, immediate

Purpose: Exclusive OR Immediate
To do a bitwise logical Exclusive OR with a constant.

Description: GPR[rt] ← GPR[rs] XOR immediate
Combine the contents of GPR rs and the 16-bit zero-extended immediate in a bitwise logical Exclusive OR operation and place the result into GPR rt.
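Because the immediate is zero-extended (unlike the sign-extended immediates of TLTI or TNEI), XORI can never change bits 31..16 of GPR rs. A minimal sketch with a hypothetical helper name:

```c
#include <assert.h>
#include <stdint.h>

/* XORI rt, rs, immediate with a zero-extended 16-bit immediate. */
uint32_t xori(uint32_t rs, uint32_t immediate)
{
    assert(immediate <= 0xFFFFu);  /* 16-bit immediate field */
    return rs ^ immediate;         /* zero extension: bits 31..16 of the operand are 0 */
}
```

For example, `xori(0x12345678, 0xFFFF)` inverts only the low halfword, giving `0x1234A987`.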

Restrictions:
None

Operation:
GPR[rt] ← GPR[rs] xor zero_extend(immediate)

Exceptions:
None
<table>
<thead>
<tr>
<th>XORI</th>
<th>Exclusive OR Immediate</th>
</tr>
</thead>
</table>

Appendix A

Instruction Bit Encodings

A.1 Instruction Encodings and Instruction Classes

Instruction encodings are presented in this section; field names are printed here and throughout the book in *italics*.

When encoding an instruction, the primary *opcode* field is encoded first. Most *opcode* values completely specify an instruction that has an *immediate* value or offset.

*Opcode* values that do not specify an instruction instead specify an instruction class. Instructions within a class are further specified by values in other fields. For instance, *opcode* REGIMM specifies the *immediate* instruction class, which includes conditional branch and trap *immediate* instructions.

A.2 Instruction Bit Encoding Tables

This section provides various bit encoding tables for the instructions of the MIPS32® ISA.

Figure A.1 shows a sample encoding table and the instruction *opcode* field this table encodes. Bits 31..29 of the *opcode* field are listed in the leftmost columns of the table. Bits 28..26 of the *opcode* field are listed along the topmost rows of the table. Both decimal and binary values are given, with the first three bits designating the row, and the last three bits designating the column.

An instruction’s encoding is found at the intersection of a row (bits 31..29) and column (bits 28..26) value. For instance, the *opcode* value for the instruction labeled EX1 is 33 (decimal, row and column), or 011011 (binary). Similarly, the *opcode* value for EX2 is 64 (decimal), or 110100 (binary).
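The row/column lookup amounts to splitting the 6-bit opcode into two 3-bit halves, which is why the two-digit labels read like octal: EX1's 33 is octal 033 (binary 011011). A minimal sketch with hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

/* Split a 6-bit primary opcode (instruction bits 31..26) into the table
 * row (bits 31..29) and column (bits 28..26). */
void opcode_cell(unsigned opcode, unsigned *row, unsigned *col)
{
    assert(opcode <= 0x3Fu);  /* 6-bit field */
    *row = (opcode >> 3) & 7u;
    *col = opcode & 7u;
}

/* Primary opcode field of a full 32-bit instruction word. */
unsigned primary_opcode(uint32_t insn)
{
    return (unsigned)((insn >> 26) & 0x3Fu);
}
```

Applied to the TLBINVF word 0x42000004, `primary_opcode` returns octal 020 (binary 010000), the COP0 entry at row 2, column 0.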

Release 6 introduces additional nomenclature to the opcode tables for Release 6 instructions. For new instructions, bits 31:26 are generically named POPXY, where X is the row number and Y is the column number. This convention is extended to sub-opcode tables, except that bits 5:0 are generically named SOPXY, where X is the row number and Y is the column number. This naming convention is applied where a specific encoded value may be shared by multiple instructions.

Tables A.2 through A.21 describe the encoding used for the MIPS32 ISA. Table A.1 describes the meaning of the symbols used in the tables.
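The POPXY and SOPXY names follow mechanically from the row and column of a 6-bit field. A small sketch, with hypothetical helper names, shows how the generic name is derived; for example, primary opcode 010110 (row 2, column 6) is POP26.

```python
def pop_name(opcode: int) -> str:
    """Generic Release 6 name for a primary opcode (bits 31..26): POP + row + column."""
    if not 0 <= opcode < 64:
        raise ValueError("opcode must be a 6-bit value")
    return f"POP{opcode >> 3}{opcode & 0b111}"


def sop_name(function: int) -> str:
    """Generic Release 6 name for a sub-opcode in bits 5..0: SOP + row + column."""
    if not 0 <= function < 64:
        raise ValueError("function must be a 6-bit value")
    return f"SOP{function >> 3}{function & 0b111}"


print(pop_name(0b010110), sop_name(0b011010))  # POP26 SOP32
```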

**Table A.1 Symbols Used in the Instruction Encoding Tables**

<table>
<thead>
<tr>
<th>Symbol</th>
<th>Meaning</th>
</tr>
</thead>
<tbody>
<tr>
<td>*</td>
<td>Operation or field codes marked with this symbol are reserved for future use. Executing such an instruction must cause a Reserved Instruction exception. Note: Some instruction encodings are assigned to coprocessors (as indicated by COP0 or COP1 in the encoding table titles). For such instruction encodings, the Coprocessor Unusable exception takes priority over the Reserved Instruction exception.</td>
</tr>
<tr>
<td>no marking</td>
<td>Many instructions are optional, or available only in certain configurations. As of Release 6, if a table entry would be empty in a particular configuration, implementations are required to signal a Reserved Instruction exception when that encoding is executed. Before Release 6, signalling a Reserved Instruction exception was not always required; hence symbols such as *, β, V, and Δ indicate when such signalling is required or present, and when it is not. In other words, as of Release 6, full instruction decoding, including detection of unused instructions, is assumed as the default.</td>
</tr>
<tr>
<td>δ</td>
<td>(Also italic field name.) Operation or field codes marked with this symbol denote a field class. The instruction word must be further decoded by examining additional tables that show values for another instruction field.</td>
</tr>
<tr>
<td>β</td>
<td>Operation or field codes marked with this symbol represent a valid encoding for a higher-order MIPS ISA level or a new revision of the Architecture. Executing such an instruction must cause a Reserved Instruction exception.</td>
</tr>
<tr>
<td>V</td>
<td>Operation or field codes marked with this symbol represent instructions which were only legal if 64-bit operations were enabled on implementations of Release 1 of the Architecture. In Release 2 of the architecture, operation or field codes marked with this symbol represent instructions which are legal if 64-bit floating point operations are enabled. In other cases, executing such an instruction must cause a Reserved Instruction exception (non-coprocessor encodings or coprocessor instruction encodings for a coprocessor to which access is allowed) or a Coprocessor Unusable Exception (coprocessor instruction encodings for a coprocessor to which access is not allowed).</td>
</tr>
<tr>
<td>Δ</td>
<td>Instructions marked V in some earlier versions of this manual, corrected and marked Δ in revision 5.03. Legal on MIPS64 Release 1 but not on MIPS32 Release 1; in Release 2 and above, legal on both MIPS64 and MIPS32, including when running in “32-bit FPU Register File mode” (FR=0) as well as with FR=1.</td>
</tr>
<tr>
<td>θ</td>
<td>Operation or field codes marked with this symbol are available to licensed MIPS partners. To avoid multiple conflicting instruction definitions, MIPS Technologies will assist the partner in selecting appropriate encodings if requested by the partner. The partner is not required to consult with MIPS Technologies when one of these encodings is used. If no instruction is encoded with this value, executing such an instruction must cause a Reserved Instruction exception (SPECIAL2 encodings or coprocessor instruction encodings for a coprocessor to which access is allowed) or a Coprocessor Unusable Exception (coprocessor instruction encodings for a coprocessor to which access is not allowed).</td>
</tr>
<tr>
<td>θ*</td>
<td>Release 6 reserves the SPECIAL2 encodings. Prior to MIPS32 Release 2, the SPECIAL2 encodings were available for customer use as UDIs (User-Defined Instructions). Otherwise like θ above.</td>
</tr>
<tr>
<td>σ</td>
<td>Field codes marked with this symbol represent an EJTAG support instruction and implementation of this encoding is optional for each implementation. If the encoding is not implemented, executing such an instruction must cause a Reserved Instruction exception. If the encoding is implemented, it must match the instruction encoding as shown in the table.</td>
</tr>
<tr>
<td>ε</td>
<td>Operation or field codes marked with this symbol are reserved for MIPS optional Module or Application Specific Extensions. If the Module/ASE is not implemented, executing such an instruction must cause a Reserved Instruction exception.</td>
</tr>
<tr>
<td>ϕ</td>
<td>Operation or field codes marked with this symbol are obsolete and will be removed from a future revision of the MIPS32 ISA. Software should avoid using these operation or field codes.</td>
</tr>
<tr>
<td>⊕</td>
<td>Operation or field codes marked with this symbol are valid for Release 2 implementations of the architecture. Executing such an instruction in a Release 1 implementation must cause a Reserved Instruction exception.</td>
</tr>
<tr>
<td>6N</td>
<td>Instruction added by Release 6. “N” for “new”.</td>
</tr>
<tr>
<td>6Nm</td>
<td>New Release 6 encoding for a pre-Release 6 instruction that has been moved. “Nm” for “New (moved)”.</td>
</tr>
<tr>
<td>6Rm</td>
<td>Pre-Release 6 instruction encoding moved in Release 6. “Rm” for “Removed (moved elsewhere)”.</td>
</tr>
<tr>
<td>6R</td>
<td>Pre-Release 6 instruction encoding removed by Release 6. “R” for “Removed”.</td>
</tr>
</tbody>
</table>

Table A.2 MIPS32 Encoding of the Opcode Field


1. Pre-Release 6 instruction LUI is a special case of Release 6 instruction AUI.
2. In Release 1 of the Architecture, the COP1X opcode was called COP3 and was available as another user-available coprocessor. In Release 2 of the Architecture, a full 64-bit floating point unit is available with 32-bit CPUs, and the COP1X opcode is reserved for that purpose on all Release 2 CPUs. 32-bit implementations of Release 1 of the Architecture are strongly discouraged from using this opcode for a user-available coprocessor, as doing so limits the potential for an upgrade path for the FPU.
3. Architecture Release 2 added the SPECIAL3 opcode. Implementations of Release 1 of the Architecture signaled a Reserved Instruction exception for this opcode.


### Table A.3 MIPS32 SPECIAL Opcode Encoding of Function Field

<table>
<thead>
<tr>
<th>function</th>
<th>bits 2..0</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0 00</td>
</tr>
<tr>
<td>0 001</td>
<td>Jε2,3,6R</td>
</tr>
<tr>
<td>0 010</td>
<td>MFHεRR</td>
</tr>
<tr>
<td>0 111</td>
<td>4MULTεRR</td>
</tr>
<tr>
<td>4 100</td>
<td>ADD</td>
</tr>
<tr>
<td>5 101</td>
<td></td>
</tr>
<tr>
<td>6 110</td>
<td>TGE</td>
</tr>
<tr>
<td>7 111</td>
<td></td>
</tr>
</tbody>
</table>

1. Specific encodings of the rt, rd, and sa fields are used to distinguish among the SLL, NOP, SSNOP, EHB and PAUSE functions. Release 6 makes SSNOP equivalent to NOP.

2. Specific encodings of the hint field are used to distinguish JR from JR.HB and JALR from JALR.HB.

3. Release 6 removes JR and JR.HB. JALR with rd=0 provides functionality equivalent to JR. JALR.HB with rd=0 provides functionality equivalent to JR.HB. Assemblers should produce the new instruction when encountering the old mnemonic.

4. Specific encodings of the sa field are used to distinguish pre-Release 6 and Release 6 integer multiply and divide instructions. See Table A.23 on page 455, which shows that the encodings do not conflict. The pre-Release 6 multiply and divide instructions signal a Reserved Instruction exception on Release 6. Note that the same mnemonics are used for the pre-Release 6 divide instructions, which return both quotient and remainder, and for the Release 6 divide instructions, which return only the quotient; Release 6 provides separate MOD instructions for the remainder.
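Footnote 3 says an assembler targeting Release 6 should emit JALR with rd=0 when it sees the old JR mnemonic. The sketch below shows that rewrite. It assumes the usual SPECIAL-class field layout (rs in bits 25..21, rd in bits 15..11, hint in bits 10..6), the JR and JALR function codes 001000 and 001001, and that the .HB forms set bit 10, the top bit of the hint field; the function names are hypothetical.

```python
SPECIAL = 0b000000
FUNC_JR = 0b001000    # pre-Release 6 JR (removed by Release 6)
FUNC_JALR = 0b001001  # JALR; with rd = 0 it replaces JR in Release 6
HB_BIT = 1 << 10      # assumed: top bit of the hint field selects the .HB form


def encode_jalr(rd: int, rs: int, hazard_barrier: bool = False) -> int:
    """Encode JALR / JALR.HB rd, rs as a 32-bit SPECIAL-class word."""
    for name, reg in (("rd", rd), ("rs", rs)):
        if not 0 <= reg < 32:
            raise ValueError(f"{name} must be a register number 0..31")
    word = (SPECIAL << 26) | (rs << 21) | (rd << 11) | FUNC_JALR
    if hazard_barrier:
        word |= HB_BIT
    return word


def encode_jr_release6(rs: int, hazard_barrier: bool = False) -> int:
    """What a Release 6 assembler emits for the old JR / JR.HB mnemonic: JALR with rd = 0."""
    return encode_jalr(0, rs, hazard_barrier)


print(hex(encode_jr_release6(31)))  # 0x3e00009 ("jr $ra" emitted as JALR $zero, $ra)
```

Because rd=0 targets the hardwired zero register, the link value is discarded and the instruction behaves as a plain register jump.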

### Table A.4 MIPS32 REGIMM Encoding of rt Field

<table>
<thead>
<tr>
<th>rt</th>
<th>bits 18..16</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0 00</td>
</tr>
<tr>
<td>1</td>
<td>TGEεRR</td>
</tr>
<tr>
<td>2</td>
<td>BLTZ</td>
</tr>
<tr>
<td>3</td>
<td>BGEZ</td>
</tr>
</tbody>
</table>

1. NAL and BAL are assembly idioms prior to Release 6.
### Table A.5 MIPS32 SPECIAL2 Encoding of Function Field

<table>
<thead>
<tr>
<th>bits 5..3</th>
<th>000</th>
<th>001</th>
<th>010</th>
<th>011</th>
<th>100</th>
<th>101</th>
<th>110</th>
<th>111</th>
</tr>
</thead>
<tbody>
<tr>
<td>0 000</td>
<td>MADD\textsuperscript{RR}</td>
<td>MADD\textsuperscript{UR}</td>
<td>MUL\textsuperscript{RR}</td>
<td>MUL\textsuperscript{UR}</td>
<td>MSUB\textsuperscript{RR}</td>
<td>MSUB\textsuperscript{UR}</td>
<td>MSUB\textsuperscript{RR}</td>
<td>MSUB\textsuperscript{UR}</td>
</tr>
<tr>
<td>1 001</td>
<td>(c)</td>
<td>0(\ast)</td>
<td>0(\ast)</td>
<td>0(\ast)</td>
<td>0(\ast)</td>
<td>0(\ast)</td>
<td>0(\ast)</td>
<td>0(\ast)</td>
</tr>
<tr>
<td>2 010</td>
<td>0(\ast)</td>
<td>0(\ast)</td>
<td>0(\ast)</td>
<td>0(\ast)</td>
<td>0(\ast)</td>
<td>0(\ast)</td>
<td>0(\ast)</td>
<td>0(\ast)</td>
</tr>
<tr>
<td>3 011</td>
<td>0(\ast)</td>
<td>0(\ast)</td>
<td>0(\ast)</td>
<td>0(\ast)</td>
<td>0(\ast)</td>
<td>0(\ast)</td>
<td>0(\ast)</td>
<td>0(\ast)</td>
</tr>
<tr>
<td>4 100</td>
<td>CLZ\textsuperscript{RRn</td>
<td>m}</td>
<td>CLZ\textsuperscript{URn</td>
<td>m}</td>
<td>0(\ast)</td>
<td>0(\ast)</td>
<td>0(\ast)</td>
<td>0(\ast)</td>
</tr>
<tr>
<td>5 101</td>
<td>0(\ast)</td>
<td>0(\ast)</td>
<td>0(\ast)</td>
<td>0(\ast)</td>
<td>0(\ast)</td>
<td>0(\ast)</td>
<td>0(\ast)</td>
<td>0(\ast)</td>
</tr>
<tr>
<td>6 110</td>
<td>0(\ast)</td>
<td>0(\ast)</td>
<td>0(\ast)</td>
<td>0(\ast)</td>
<td>0(\ast)</td>
<td>0(\ast)</td>
<td>0(\ast)</td>
<td>0(\ast)</td>
</tr>
<tr>
<td>7 111</td>
<td>0(\ast)</td>
<td>0(\ast)</td>
<td>0(\ast)</td>
<td>0(\ast)</td>
<td>0(\ast)</td>
<td>0(\ast)</td>
<td>0(\ast)</td>
<td>0(\ast)</td>
</tr>
</tbody>
</table>

### Table A.6 MIPS32 SPECIAL3<sup>1</sup> Encoding of Function Field for Release 2 of the Architecture

<table>
<thead>
<tr>
<th>bits 5..3</th>
<th>000 001 010 011 100 101 110 111</th>
</tr>
</thead>
<tbody>
<tr>
<td>0 000</td>
<td>EXT (\oplus)</td>
</tr>
<tr>
<td>1 100</td>
<td>0(\ast)</td>
</tr>
<tr>
<td>2 010</td>
<td>0(\ast)</td>
</tr>
<tr>
<td>3 011</td>
<td>0(\ast)</td>
</tr>
<tr>
<td>4 101</td>
<td>0(\ast)</td>
</tr>
<tr>
<td>5 110</td>
<td>0(\ast)</td>
</tr>
<tr>
<td>7 111</td>
<td>0(\ast)</td>
</tr>
</tbody>
</table>

1. Architecture Release 2 added the SPECIAL3 opcode. Implementations of Release 1 of the Architecture signaled a Reserved Instruction exception for this opcode and all function field values shown above.

### Table A.7 MIPS32 MOVCI<sup>6R, 1</sup> Encoding of *tf* Bit

<table>
<thead>
<tr>
<th>tf</th>
<th>bit 16</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>MOVF<sup>6R</sup></td>
</tr>
<tr>
<td>1</td>
<td>MOVT<sup>6R</sup></td>
</tr>
</tbody>
</table>

1. Release 6 removes the MOVCI instruction family (MOVT and MOVF).

Table A.8 MIPS32™ SRL Encoding of Shift/Rotate

<table>
<thead>
<tr>
<th>R</th>
<th>bit 21</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>SRL</td>
</tr>
<tr>
<td>1</td>
<td>ROTR</td>
</tr>
</tbody>
</table>

1. Release 2 of the Architecture added the ROTR instruction. Implementations of Release 1 of the Architecture ignored bit 21 and treated the instruction as an SRL.
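The footnote implies that the same word decodes differently depending on the release: bit 21 selects ROTR on Release 2 and later, but a Release 1 core ignores it. A minimal decoding sketch, assuming the SPECIAL function code 000010 for SRL/ROTR and the usual rt/rd/sa field positions (names are hypothetical):

```python
FUNC_SRL = 0b000010  # assumed SPECIAL function code shared by SRL and ROTR
R_BIT = 21           # bit 21 (low bit of the rs field) selects rotate


def decode_srl_family(word: int, release: int) -> str:
    """Return the mnemonic that a core of the given release executes for this word."""
    opcode = (word >> 26) & 0x3F
    function = word & 0x3F
    if opcode != 0 or function != FUNC_SRL:
        raise ValueError("not a SPECIAL SRL/ROTR encoding")
    rotate = (word >> R_BIT) & 1
    # Release 1 ignores bit 21, so the word always behaves as SRL there.
    if release >= 2 and rotate:
        return "ROTR"
    return "SRL"


def encode_srl_family(rd: int, rt: int, sa: int, rotate: bool) -> int:
    """Encode SRL rd, rt, sa (rotate=False) or ROTR rd, rt, sa (rotate=True)."""
    return (int(rotate) << R_BIT) | (rt << 16) | (rd << 11) | (sa << 6) | FUNC_SRL


rotr = encode_srl_family(rd=2, rt=3, sa=4, rotate=True)
print(hex(rotr), decode_srl_family(rotr, 1), decode_srl_family(rotr, 2))  # 0x231102 SRL ROTR
```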

Table A.9 MIPS32™ SRLV Encoding of Shift/Rotate

<table>
<thead>
<tr>
<th>R</th>
<th>bit 6</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>SRLV</td>
</tr>
<tr>
<td>1</td>
<td>ROTRV</td>
</tr>
</tbody>
</table>

1. Release 2 of the Architecture added the ROTRV instruction. Implementations of Release 1 of the Architecture ignored bit 6 and treated the instruction as an SRLV.
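To make the SRLV/ROTRV difference concrete, the sketch below models both operations on 32-bit values and selects between them using bit 6, again treating a Release 1 core as ignoring that bit. It assumes the usual MIPS32 rule that only the low-order five bits of rs give the shift amount; the helper names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def srlv(rt_value: int, rs_value: int) -> int:
    """Logical right shift of a 32-bit value by the low five bits of rs."""
    s = rs_value & 0x1F
    return (rt_value & MASK32) >> s


def rotrv(rt_value: int, rs_value: int) -> int:
    """Rotate a 32-bit value right by the low five bits of rs."""
    s = rs_value & 0x1F
    v = rt_value & MASK32
    # For s == 0 the left-shifted part is masked away, leaving v unchanged.
    return ((v >> s) | (v << (32 - s))) & MASK32


def execute_srlv_word(word: int, rt_value: int, rs_value: int, release: int) -> int:
    """Pick SRLV or ROTRV from bit 6, mimicking a Release 1 core that ignores it."""
    rotate = (word >> 6) & 1
    if release >= 2 and rotate:
        return rotrv(rt_value, rs_value)
    return srlv(rt_value, rs_value)


print(hex(srlv(0x12345678, 8)), hex(rotrv(0x12345678, 8)))  # 0x123456 0x78123456
```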

Table A.10 MIPS32™ BSHFL Encoding of sa Field

<table>
<thead>
<tr>
<th>sa</th>
<th>bits 8..6</th>
</tr>
</thead>
<tbody>
<tr>
<td>bits 10..9</td>
<td>00</td>
</tr>
<tr>
<td>00</td>
<td>00</td>
</tr>
<tr>
<td>01</td>
<td>BITSWAP6N</td>
</tr>
<tr>
<td>10</td>
<td>ALIGN6N (BSHFL)</td>
</tr>
<tr>
<td>11</td>
<td>SEB</td>
</tr>
</tbody>
</table>

1. The sa field is sparsely decoded to identify the final instructions. Entries in this table with no mnemonic are reserved for future use by MIPS Technologies and may or may not cause a Reserved Instruction exception.
### Table A.11 MIPS32 COP0 Encoding of rs Field

<table>
<thead>
<tr>
<th>rs bits 23..21</th>
<th>000</th>
<th>001</th>
<th>010</th>
<th>011</th>
<th>100</th>
<th>101</th>
<th>110</th>
<th>111</th>
</tr>
</thead>
<tbody>
<tr>
<td>bits 25..24</td>
<td>0</td>
<td>1</td>
<td>2</td>
<td>3</td>
<td>4</td>
<td>5</td>
<td>6</td>
<td>7</td>
</tr>
<tr>
<td>0 00</td>
<td>MFC0</td>
<td>β</td>
<td>MFH</td>
<td>ε</td>
<td>MTC0</td>
<td>β</td>
<td>MTH</td>
<td>*</td>
</tr>
<tr>
<td>1 01</td>
<td>ε</td>
<td>*</td>
<td>RDPGPR ⊕</td>
<td>MFMC0<sup>1</sup> δ ⊕</td>
<td>ε</td>
<td>*</td>
<td>WRPGPR ⊕</td>
<td>*</td>
</tr>
<tr>
<td>2 10</td>
<td colspan="8" rowspan="2">CO δ</td>
</tr>
<tr>
<td>3 11</td>
</tr>
</tbody>
</table>

1. Release 2 of the Architecture added the MFMC0 function, which is further decoded as the DI (bit 5 = 0) and EI (bit 5 = 1) instructions.
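The footnote's further decoding of MFMC0 can be sketched as a single bit test. The example assumes the COP0 primary opcode 010000, the MFMC0 rs value 01011 (row 01, column 011 of the table above), and that DI/EI name CP0 register 12 (Status) in the rd field; those values and the function names are assumptions for illustration.

```python
COP0 = 0b010000
RS_MFMC0 = 0b01011


def decode_mfmc0(word: int) -> str:
    """Distinguish DI from EI inside the COP0 MFMC0 function (Release 2 and later)."""
    opcode = (word >> 26) & 0x3F
    rs = (word >> 21) & 0x1F
    if opcode != COP0 or rs != RS_MFMC0:
        raise ValueError("not a COP0 MFMC0 encoding")
    return "EI" if (word >> 5) & 1 else "DI"


def encode_mfmc0(rt: int, enable: bool) -> int:
    """Build DI rt (enable=False) or EI rt (enable=True)."""
    rd_status = 12  # assumed: rd names CP0 register 12 (Status)
    word = (COP0 << 26) | (RS_MFMC0 << 21) | (rt << 16) | (rd_status << 11)
    return word | (int(enable) << 5)


print(hex(encode_mfmc0(0, False)), hex(encode_mfmc0(0, True)))  # 0x41606000 0x41606020
```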

### Table A.12 MIPS32 COP0 Encoding of Function Field When rs=CO

<table>
<thead>
<tr>
<th>function bits 2..0</th>
<th>000</th>
<th>001</th>
<th>010</th>
<th>011</th>
<th>100</th>
<th>101</th>
<th>110</th>
<th>111</th>
</tr>
</thead>
<tbody>
<tr>
<td>bits 5..3</td>
<td>0</td>
<td>1</td>
<td>2</td>
<td>3</td>
<td>4</td>
<td>5</td>
<td>6</td>
<td>7</td>
</tr>
<tr>
<td>0 000</td>
<td>*</td>
<td>TLBR</td>
<td>TLBWI</td>
<td>TLBINV</td>
<td>TLBINVF</td>
<td>*</td>
<td>TLBWR</td>
<td>*</td>
</tr>
<tr>
<td>1 001</td>
<td>TLBP</td>
<td>ε</td>
<td>ε</td>
<td>ε</td>
<td>ε</td>
<td>*</td>
<td>ε</td>
<td>*</td>
</tr>
<tr>
<td>2 010</td>
<td>ε</td>
<td>*</td>
<td>*</td>
<td>*</td>
<td>*</td>
<td>*</td>
<td>*</td>
<td>*</td>
</tr>
<tr>
<td>3 011</td>
<td>ERET</td>
<td>*</td>
<td>*</td>
<td>*</td>
<td>*</td>
<td>*</td>
<td>*</td>
<td>DERET σ</td>
</tr>
<tr>
<td>4 100</td>
<td>WAIT</td>
<td>*</td>
<td>*</td>
<td>*</td>
<td>*</td>
<td>*</td>
<td>*</td>
<td>*</td>
</tr>
<tr>
<td>5 101</td>
<td>ε</td>
<td>*</td>
<td>*</td>
<td>*</td>
<td>*</td>
<td>*</td>
<td>*</td>
<td>*</td>
</tr>
<tr>
<td>6 110</td>
<td>*</td>
<td>*</td>
<td>*</td>
<td>*</td>
<td>*</td>
<td>*</td>
<td>*</td>
<td>*</td>
</tr>
<tr>
<td>7 111</td>
<td>ε</td>
<td>*</td>
<td>*</td>
<td>*</td>
<td>*</td>
<td>*</td>
<td>*</td>
<td>*</td>
</tr>
</tbody>
</table>

### Table A.13 PCREL Encoding of Minor Opcode Field

<table>
<thead>
<tr>
<th>Extension bit 20..18</th>
<th>000</th>
<th>001</th>
<th>010</th>
<th>011</th>
<th>100</th>
<th>101</th>
<th>110</th>
<th>111</th>
</tr>
</thead>
<tbody>
<tr>
<td>bit 17..16</td>
<td>0</td>
<td>1</td>
<td>2</td>
<td>3</td>
<td>4</td>
<td>5</td>
<td>6</td>
<td>7</td>
</tr>
<tr>
<td>0 00</td>
<td>ADDIU</td>
<td>ADDIU</td>
<td>LWPC</td>
<td>LWPC</td>
<td>LWPC</td>
<td>LWPC</td>
<td>LWPC</td>
<td>LWPC</td>
</tr>
<tr>
<td>1 01</td>
<td>ADDIU</td>
<td>ADDIU</td>
<td>LWPC</td>
<td>LWPC</td>
<td>LWPC</td>
<td>LWPC</td>
<td>LWPC</td>
<td>LWPC</td>
</tr>
<tr>
<td>2 10</td>
<td>ADDIU</td>
<td>ADDIU</td>
<td>LWPC</td>
<td>LWPC</td>
<td>LWPC</td>
<td>LWPC</td>
<td>LWPC</td>
<td>LWPC</td>
</tr>
<tr>
<td>3 11</td>
<td>ADDIU</td>
<td>ADDIU</td>
<td>LWPC</td>
<td>LWPC</td>
<td>LWPC</td>
<td>LWPC</td>
<td>LWPC</td>
<td>LWPC</td>
</tr>
</tbody>
</table>
### Table A.14 MIPS32 COP1 Encoding of rs Field

<table>
<thead>
<tr>
<th>rs</th>
<th>bits 23..21</th>
<th>bits 25..24</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>00</td>
</tr>
<tr>
<td>1</td>
<td>01</td>
<td>01</td>
</tr>
<tr>
<td>2</td>
<td>10</td>
<td>10</td>
</tr>
<tr>
<td>3</td>
<td>11</td>
<td>11</td>
</tr>
</tbody>
</table>

- MFC1
- CFC1
- MFHC1
- MT1
- CTC1
- MTHC1
- MTC1
- BC16R
- BC1ANY26R
- BC1EQZ6R
- BC1EQZ6N
- BC1NEZ6N
- BNZ.V
- BC1B
- BC1H
- BC1W
- BC1D
- BNZ.B
- BNZ.H
- BNZ.W
- BNZ.D

### Table A.15 MIPS32 COP1 Encoding of Function Field When rs=S

<table>
<thead>
<tr>
<th>function</th>
<th>bits 2..0</th>
<th>bits 5..3</th>
</tr>
</thead>
<tbody>
<tr>
<td>0 000</td>
<td>ADD</td>
<td>SUB</td>
</tr>
<tr>
<td>1 001</td>
<td>ROUND.L</td>
<td>TRUNC.L</td>
</tr>
<tr>
<td>2 010</td>
<td>SEL.GN</td>
<td>MOVCF.GR</td>
</tr>
<tr>
<td>3 011</td>
<td>MADD6N</td>
<td>MSUBF.EN</td>
</tr>
<tr>
<td>4 100</td>
<td>*</td>
<td>CVT.D</td>
</tr>
<tr>
<td>5 101</td>
<td>*</td>
<td>CVT.D</td>
</tr>
<tr>
<td>6 110</td>
<td>*</td>
<td>CVT.D</td>
</tr>
<tr>
<td>7 111</td>
<td>*</td>
<td>CVT.D</td>
</tr>
</tbody>
</table>
### Table A.16 MIPS32 COP1 Encoding of Function Field When rs=D

<table>
<thead>
<tr>
<th>bits 5..3</th>
<th>0</th>
<th>1</th>
<th>2</th>
<th>3</th>
<th>4</th>
<th>5</th>
<th>6</th>
<th>7</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>ADD</td>
<td>SUB</td>
<td>MUL</td>
<td>DIV</td>
<td>SQRT</td>
<td>ABS</td>
<td>MOV</td>
<td>NEG</td>
</tr>
<tr>
<td>1</td>
<td>ROUND.L</td>
<td>TRUNC.L</td>
<td>CEIL.L</td>
<td>FLOOR.L</td>
<td>ROUND.W</td>
<td>TRUNC.W</td>
<td>CEIL.W</td>
<td>FLOOR.W</td>
</tr>
<tr>
<td>2</td>
<td>SEL6R</td>
<td>MOVCFIR</td>
<td>MOVZIR</td>
<td>MOVNR</td>
<td>SELEQZIR</td>
<td>RECIP</td>
<td>RSQRT</td>
<td>SELNZIR</td>
</tr>
<tr>
<td>3</td>
<td>MADDF</td>
<td>MSUB6R</td>
<td>RINT6R</td>
<td>CLAS6R</td>
<td>RECIP2</td>
<td>RECIP1</td>
<td>MAX6R</td>
<td>MIN6R</td>
</tr>
<tr>
<td>4</td>
<td>CVT.S</td>
<td>*</td>
<td>*</td>
<td>*</td>
<td>CVT.W</td>
<td>CVT.L</td>
<td>*</td>
<td>*</td>
</tr>
<tr>
<td>5</td>
<td>*</td>
<td>*</td>
<td>*</td>
<td>*</td>
<td>*</td>
<td>*</td>
<td>*</td>
<td>*</td>
</tr>
</tbody>
</table>

### Table A.17 MIPS32 COP1 Encoding of Function Field When rs=W or L

<table>
<thead>
<tr>
<th>bits 5..3</th>
<th>0</th>
<th>1</th>
<th>2</th>
<th>3</th>
<th>4</th>
<th>5</th>
<th>6</th>
<th>7</th>
</tr>
</thead>
<tbody>
<tr>
<td>4</td>
<td>CVT.S</td>
<td>CVT.D</td>
<td>*</td>
<td>*</td>
<td>*</td>
<td>*</td>
<td>CVT.PS.PW</td>
<td>*</td>
</tr>
<tr>
<td>5</td>
<td>*</td>
<td>*</td>
<td>*</td>
<td>*</td>
<td>*</td>
<td>*</td>
<td>*</td>
<td>*</td>
</tr>
<tr>
<td>6</td>
<td>CVT.S</td>
<td>CVT.D</td>
<td>*</td>
<td>*</td>
<td>*</td>
<td>*</td>
<td>CVT.PS.PW</td>
<td>*</td>
</tr>
<tr>
<td>7</td>
<td>CVT.S</td>
<td>CVT.D</td>
<td>*</td>
<td>*</td>
<td>*</td>
<td>*</td>
<td>CVT.PS.PW</td>
<td>*</td>
</tr>
</tbody>
</table>

1. Format type L is legal only if 64-bit floating point operations are enabled.
2. Release 6 introduces the CMP.condn.fmt instruction family, where .fmt is S or D (32- or 64-bit floating point). However, the .S and .D forms of CMP.condn.fmt are encoded using the standard .W (10100) and .L (10101) fmt values. The conditions tested are encoded the same way for pre-Release 6 C.cond.fmt and Release 6 CMP.condn.fmt, except that Release 6 adds new conditions not present in C.cond.fmt. Release 6 has, however, changed the recommended mnemonics for CMP.condn.fmt to be consistent with the IEEE standard rather than with pre-Release 6 usage. See the table in the description of CMP.condn.fmt in Volume II of the MIPS Architecture Reference Manual, which shows the correspondence between pre-Release 6 C.cond.fmt, Release 6 CMP.condn.fmt, and MSA FC*.fmt / FS*.fmt floating point comparisons.
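Footnote 2 means an encoder must write the W or L fmt value when it wants the S or D form of CMP.condn.fmt. A minimal sketch follows. It assumes the COP1 primary opcode 010001, a function field whose bit 5 is 0 and whose low five bits hold the condition, and a condition numbering that matches pre-Release 6 C.cond.fmt for the first eight conditions; check the Volume II description of CMP.condn.fmt before relying on the names in `CONDN`, which are illustrative.

```python
COP1 = 0b010001  # assumed COP1 primary opcode
FMT_W = 0b10100  # 20: the W fmt value, reused by Release 6 CMP.condn.S
FMT_L = 0b10101  # 21: the L fmt value, reused by Release 6 CMP.condn.D

# Assumed low condition numbers, as for pre-Release 6 C.cond.fmt
# (F/AF = 0, UN = 1, EQ = 2, UEQ = 3, OLT/LT = 4, ULT = 5, OLE/LE = 6, ULE = 7).
CONDN = {"AF": 0, "UN": 1, "EQ": 2, "UEQ": 3, "LT": 4, "ULT": 5, "LE": 6, "ULE": 7}


def encode_cmp(cond: str, double: bool, fd: int, fs: int, ft: int) -> int:
    """Encode CMP.cond.S (fmt = W) or CMP.cond.D (fmt = L); bit 5 of the function is 0."""
    fmt = FMT_L if double else FMT_W
    return (COP1 << 26) | (fmt << 21) | (ft << 16) | (fs << 11) | (fd << 6) | CONDN[cond]


print(hex(encode_cmp("EQ", double=False, fd=0, fs=1, ft=2)))  # 0x46820802
```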

#### Table A.18 MIPS32 COP1 Encoding of Function Field When rs=PS¹,²

<table>
<thead>
<tr>
<th>bits 5..3</th>
<th>000</th>
<th>001</th>
<th>010</th>
<th>011</th>
<th>100</th>
<th>101</th>
<th>110</th>
<th>111</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>ADDSRV</td>
<td>SUBSRV</td>
<td>MULSRV</td>
<td>+</td>
<td>+</td>
<td>ABSSRV</td>
<td>MOVWRV</td>
<td>NEGRV</td>
</tr>
<tr>
<td>1</td>
<td>*</td>
<td>*</td>
<td>*</td>
<td>*</td>
<td>*</td>
<td>*</td>
<td>*</td>
<td>*</td>
</tr>
<tr>
<td>2</td>
<td>*</td>
<td>MOVCFSRSRV</td>
<td>MOVZSRV</td>
<td>MOVNFRV</td>
<td>+</td>
<td>+</td>
<td>+</td>
<td>+</td>
</tr>
<tr>
<td>3</td>
<td>ADDRFRV</td>
<td>*</td>
<td>MULFRV</td>
<td>+</td>
<td>RECIP2RFRV</td>
<td>RECIP1RFRV</td>
<td>RSQRT1RFRV</td>
<td>RSQRT2RFRV</td>
</tr>
<tr>
<td>4</td>
<td>100</td>
<td>CVT.S.PU4R</td>
<td></td>
<td>+</td>
<td>+</td>
<td>CVT.PW.PS4R</td>
<td>+</td>
<td>+</td>
</tr>
<tr>
<td>5</td>
<td>101</td>
<td>CVT.PS4R</td>
<td>+</td>
<td>+</td>
<td>+</td>
<td>PLS4R</td>
<td>PLS6R</td>
<td>PUU.PS4R</td>
</tr>
<tr>
<td>6</td>
<td>110</td>
<td>C.F.PSRRV</td>
<td>C.UN.PSRRV</td>
<td>C.EQRRV</td>
<td>C.EQPSRRV</td>
<td>C.OLT.PSRRV</td>
<td>C.OLT.PSRRV</td>
<td>C.ULE.PSRRV</td>
</tr>
<tr>
<td>7</td>
<td>111</td>
<td>C.NGLE.PSRRV</td>
<td>C.NGLE.PSRRV</td>
<td>C.NGLE.PSRRV</td>
<td>C.NGLE.PSRRV</td>
<td>C.NGLE.PSRRV</td>
<td>C.NGLE.PSRRV</td>
<td>C.NGLE.PSRRV</td>
</tr>
</tbody>
</table>

1. Format type PS is legal only if 64-bit floating point operations are enabled. All encodings in this table are reserved in Release 6.
2. Release 6 removes format type PS (paired single). MSA (MIPS SIMD Architecture) may be used instead.

#### Table A.19 MIPS32 COP1 Encoding of tf Bit When rs=S, D, or PS⁶R, Function=MOVCF⁶R

<table>
<thead>
<tr>
<th>tf</th>
<th>bit 16</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>MOVF.fmt⁶R</td>
</tr>
<tr>
<td>1</td>
<td>MOVT.fmt⁶R</td>
</tr>
</tbody>
</table>

1. Release 6 removes the MOVCF instruction family (MOVF.fmt and MOVT.fmt), replacing it with SEL.fmt.

#### Table A.20 MIPS32 COP2 Encoding of rs Field

<table>
<thead>
<tr>
<th>rs</th>
<th>bits 23..21</th>
</tr>
</thead>
<tbody>
<tr>
<td>bits 25..24</td>
<td>0</td>
</tr>
<tr>
<td>0</td>
<td>MFC2</td>
</tr>
<tr>
<td>1</td>
<td>BC2R</td>
</tr>
<tr>
<td>2</td>
<td>10</td>
</tr>
<tr>
<td>3</td>
<td>11</td>
</tr>
</tbody>
</table>

The MIPS32® Instruction Set Manual, Revision 6.04

Table A.21 MIPS32 COP1X<sup>6R</sup> Encoding of Function Field

<table>
<thead>
<tr>
<th>function bits 5..3</th>
<th>function bits 2..0</th>
<th>Instruction</th>
</tr>
</thead>
<tbody>
<tr>
<td>000</td>
<td>000</td>
<td>LWXC1<sup>6R</sup> Δ</td>
</tr>
<tr>
<td>000</td>
<td>001</td>
<td>LDXC1<sup>6R</sup> Δ</td>
</tr>
<tr>
<td>000</td>
<td>101</td>
<td>LUXC1<sup>6R</sup></td>
</tr>
<tr>
<td>001</td>
<td>000</td>
<td>SWXC1<sup>6R</sup> Δ</td>
</tr>
<tr>
<td>001</td>
<td>001</td>
<td>SDXC1<sup>6R</sup> Δ</td>
</tr>
<tr>
<td>100</td>
<td>000</td>
<td>MADD.S<sup>6R</sup> Δ</td>
</tr>
<tr>
<td>100</td>
<td>001</td>
<td>MADD.D<sup>6R</sup> Δ</td>
</tr>
<tr>
<td>100</td>
<td>110</td>
<td>MADD.PS<sup>6R</sup> Δ</td>
</tr>
<tr>
<td>101</td>
<td>000</td>
<td>MSUB.S<sup>6R</sup> Δ</td>
</tr>
<tr>
<td>101</td>
<td>001</td>
<td>MSUB.D<sup>6R</sup> Δ</td>
</tr>
<tr>
<td>101</td>
<td>110</td>
<td>MSUB.PS<sup>6R</sup> Δ</td>
</tr>
<tr>
<td>110</td>
<td>000</td>
<td>NMADD.S<sup>6R</sup> Δ</td>
</tr>
<tr>
<td>110</td>
<td>001</td>
<td>NMADD.D<sup>6R</sup> Δ</td>
</tr>
<tr>
<td>110</td>
<td>110</td>
<td>NMADD.PS<sup>6R</sup> Δ</td>
</tr>
<tr>
<td>111</td>
<td>000</td>
<td>NMSUB.S<sup>6R</sup> Δ</td>
</tr>
<tr>
<td>111</td>
<td>001</td>
<td>NMSUB.D<sup>6R</sup> Δ</td>
</tr>
<tr>
<td>111</td>
<td>110</td>
<td>NMSUB.PS<sup>6R</sup> Δ</td>
</tr>
</tbody>
</table>

1. Release 6 removes format type PS (paired single). MSA (MIPS SIMD Architecture) may be used instead.
2. Release 6 removes all pre-Release 6 COP1X instructions (primary opcode 010011), including the COP1X.PS operations, the non-fused FP multiply-adds, and the indexed and unaligned loads, stores, and prefetches.
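For the pre-Release 6 multiply-add group, the COP1X function field splits into an operation (bits 5..3) and a format (bits 2..0, the fmt3 field of Table A.22). The sketch below uses the fmt3 values from Table A.22; the operation values are assumed from the row labels of Table A.21, and the names are hypothetical.

```python
from typing import Optional

COP1X = 0b010011
# fmt3 values (function bits 2..0) as listed in Table A.22.
FMT3 = {0: "S", 1: "D", 6: "PS"}
# Assumed operation selected by function bits 5..3 for the multiply-add group.
OPS = {0b100: "MADD", 0b101: "MSUB", 0b110: "NMADD", 0b111: "NMSUB"}


def decode_cop1x_arith(word: int) -> Optional[str]:
    """Name a pre-Release 6 COP1X multiply-add word, or return None if it is not one."""
    if (word >> 26) & 0x3F != COP1X:
        return None
    op = OPS.get((word >> 3) & 0b111)
    fmt = FMT3.get(word & 0b111)
    if op is None or fmt is None:
        return None
    return f"{op}.{fmt}"


print(decode_cop1x_arith((COP1X << 26) | 0b111110))  # NMSUB.PS
```

On a Release 6 implementation every word with primary opcode 010011 signals a Reserved Instruction exception, so this decoding applies only to earlier releases.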

A.3 Floating Point Unit Instruction Format Encodings

Instruction format encodings for the floating point unit are presented in this section, in a single table that summarizes the format encodings described in Tables A.14 through A.21 above.

Table A.22 Floating Point Unit Instruction Format Encodings

<table>
<thead>
<tr>
<th colspan="2">fmt field (bits 25..21 of COP1 opcode)</th>
<th colspan="2">fmt3 field (bits 2..0 of COP1X opcode)</th>
<th rowspan="2">Mnemonic</th>
<th rowspan="2">Name</th>
<th rowspan="2">Bit Width</th>
<th rowspan="2">Data Type</th>
</tr>
<tr>
<th>Decimal</th>
<th>Hex</th>
<th>Decimal</th>
<th>Hex</th>
</tr>
</thead>
<tbody>
<tr>
<td>0..15</td>
<td>00..0F</td>
<td>—</td>
<td>—</td>
<td colspan="4">Used to encode Coprocessor 1 interface instructions (MFC1, CTC1, etc.). Not used for format encoding.</td>
</tr>
<tr>
<td>16</td>
<td>10</td>
<td>0</td>
<td>0</td>
<td>S</td>
<td>Single</td>
<td>32</td>
<td>Floating Point</td>
</tr>
<tr>
<td>17</td>
<td>11</td>
<td>1</td>
<td>1</td>
<td>D</td>
<td>Double</td>
<td>64</td>
<td>Floating Point</td>
</tr>
<tr>
<td>18..19</td>
<td>12..13</td>
<td>2..3</td>
<td>2..3</td>
<td colspan="4">Reserved for future use by the architecture.</td>
</tr>
<tr>
<td>20</td>
<td>14</td>
<td>4</td>
<td>4</td>
<td>W</td>
<td>Word</td>
<td>32</td>
<td>Fixed Point. Release 6 CMP.condn.S is encoded as W.</td>
</tr>
<tr>
<td>21</td>
<td>15</td>
<td>5</td>
<td>5</td>
<td>L</td>
<td>Long</td>
<td>64</td>
<td>Fixed Point. Release 6 CMP.condn.D is encoded as L.</td>
</tr>
<tr>
<td>22</td>
<td>16</td>
<td>6</td>
<td>6</td>
<td>PS</td>
<td>Paired Single</td>
<td>2 × 32</td>
<td>Floating Point</td>
</tr>
<tr>
<td>23</td>
<td>17</td>
<td>7</td>
<td>7</td>
<td colspan="4">Reserved for future use by the architecture.</td>
</tr>
<tr>
<td>24..31</td>
<td>18..1F</td>
<td>—</td>
<td>—</td>
<td colspan="4">Reserved for future use by the architecture. Not available for fmt3 encoding.</td>
</tr>
</tbody>
</table>

**Note:** Release 6 CMP.condn.S/D encoded as W/L: as described in Table A.17, “MIPS32 COP1 Encoding of Function Field When rs=W or L” on page 450, Release 6 uses certain instruction encodings with the rs (fmt) field equal to 10100 (W) or 10101 (L) to represent S and D respectively, for the instruction family CMP.condn.fmt.
A.4 Release 6 Instruction Encodings

Release 6 adds several new instructions, removes several old instructions, and changes the encodings of several pre-Release 6 instructions. In many cases, the old encodings of moved or removed instructions are required to signal the Reserved Instruction exception on Release 6, so that uses of old instructions can be trapped and then emulated or warned about; in several cases, however, the old encodings have been reused for new Release 6 instructions.

These instruction encoding changes are indicated in the tables above. Release 6 new instructions are superscripted 6N; Release 6 removed instructions are superscripted 6R; Release 6 instructions that have been moved are marked 6Rm at the pre-Release 6 encoding they are moved from, and 6Nm at the new Release 6 encoding they are moved to. Encoding table cells that contain both a pre-Release 6 instruction and a Release 6 instruction superscripted 6N or 6Nm indicate a possible conflict, although in many cases footnotes indicate that other fields allow the distinction to be made.

The tables below show the further decoding in Release 6 for field classes (instruction encoding families) indicated in other tables.

Instruction encodings are also illustrated in the instruction descriptions in Volume II, and those encodings are authoritative. The bitfield-based encoding tables earlier in this appendix are illustrative only, since they cannot fully express the tighter Release 6 encodings.

**MUL/DIV family encodings:** Table A.23 below shows the encodings of the Release 6 integer multiply and divide instructions, as well as the pre-Release 6 instructions they replace. The Release 6 instructions share the primary opcode (bits 31-26 = 000000) and the function code (bits 5-0) with their pre-Release 6 counterparts, but are distinguished by bits 10-6 of the instruction. The pre-Release 6 instructions signal a Reserved Instruction exception on Release 6 implementations.

However, the instruction names collide: pre-Release 6 and Release 6 DIV, DIVU, DDIV, and DDIVU are distinct instructions, although they share the same mnemonics. The pre-Release 6 instructions produce two results, the quotient and the remainder, in the HI/LO register pair, while the Release 6 divide instructions produce a single result, the quotient. The conflicting instructions can be distinguished in assembly by the number of register operands: two for pre-Release 6, three for Release 6.

As of Release 6, all removed pre-Release 6 instruction encodings are required to signal the Reserved Instruction exception, as are all encodings of the form 000000.xxxxx.xxxxx.aaaaa.011xxx, that is, all encodings with the primary opcode and function codes listed in Table A.23, except those with aaaaa field values 00010 and 00011, which are used by the new instructions.
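
The rule above can be expressed as a small classifier for the 000000.xxxxx.xxxxx.aaaaa.011xxx group. This is a sketch only: the mnemonics assigned to aaaaa = 00010 and 00011 for each function code (MUL/MUH, DIV/MOD, and so on) are an assumption based on the Release 6 instruction descriptions, and Table A.23 in Volume II remains authoritative.

```python
R6_NAMES = {
    # function: (name when aaaaa = 00010, name when aaaaa = 00011), assumed pairs
    0b011000: ("MUL", "MUH"),
    0b011001: ("MULU", "MUHU"),
    0b011010: ("DIV", "MOD"),
    0b011011: ("DIVU", "MODU"),
    0b011100: ("DMUL", "DMUH"),
    0b011101: ("DMULU", "DMUHU"),
    0b011110: ("DDIV", "DMOD"),
    0b011111: ("DDIVU", "DMODU"),
}


def classify_muldiv_release6(word: int) -> str:
    """Classify a SPECIAL 011xxx word as a Release 6 mul/div or a reserved encoding."""
    opcode = (word >> 26) & 0x3F
    function = word & 0x3F
    if opcode != 0 or function >> 3 != 0b011:
        raise ValueError("not in the 000000 ... 011xxx multiply/divide group")
    aaaaa = (word >> 6) & 0x1F
    if aaaaa == 0b00010:
        return R6_NAMES[function][0]
    if aaaaa == 0b00011:
        return R6_NAMES[function][1]
    # Every other aaaaa value, including the pre-Release 6 00000 forms, is reserved.
    return "RESERVED"


div_r6 = (4 << 21) | (5 << 16) | (2 << 11) | (0b00010 << 6) | 0b011010
print(classify_muldiv_release6(div_r6))  # DIV
```
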

**PC-relative family encodings:** Table A.24 and Table A.25 present the PC-relative family of instruction encodings: Table A.24 in traditional form, and Table A.25 in bitstring form, which clearly shows the immediate varying in width from 19 bits to 16 bits.
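
The shrinking immediate can be illustrated with a decoder that tests the widest minor opcode first. The minor-opcode assignments below (ADDIUPC, LWPC, LWUPC, LDPC, AUIPC, ALUIPC under primary opcode 111011) are assumptions taken from the Release 6 instruction descriptions rather than from this appendix; only the 19-to-16-bit range comes from the text above.

```python
from typing import Tuple

PCREL = 0b111011  # assumed PCREL primary opcode


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def decode_pcrel(word: int) -> Tuple[str, int]:
    """Return (mnemonic, immediate) for a PCREL word, widest minor opcode first."""
    if (word >> 26) & 0x3F != PCREL:
        raise ValueError("not a PCREL encoding")
    minor2 = (word >> 19) & 0b11  # bits 20..19
    if minor2 == 0b00:
        return "ADDIUPC", sign_extend(word, 19)
    if minor2 == 0b01:
        return "LWPC", sign_extend(word, 19)
    if minor2 == 0b10:
        return "LWUPC", sign_extend(word, 19)  # MIPS64 only
    if (word >> 18) & 0b111 == 0b110:
        return "LDPC", sign_extend(word, 18)  # MIPS64 only
    minor5 = (word >> 16) & 0x1F  # bits 20..16
    if minor5 == 0b11110:
        return "AUIPC", word & 0xFFFF  # raw 16-bit immediate
    if minor5 == 0b11111:
        return "ALUIPC", word & 0xFFFF  # raw 16-bit immediate
    return "RESERVED", 0
```
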

**B*C compact branch and jump encodings:** In several cases Release 6 uses much tighter instruction encodings than previous releases of the MIPS architecture, reducing redundancy so that more instructions can be encoded. Instead of looking purely at bitfields, Release 6 defines encodings that compare different bitfields. For example, the encoding 010110.rs.rt.offset16 is BGEC if neither rs nor rt is 00000 and rs is not equal to rt; it is BGEZC if rs is the same as rt (and nonzero); and it is BLEZC if rs is 00000 and rt is not. (The encoding with rt equal to 00000 and arbitrary rs is the pre-Release 6 instruction BLEZL.rs.00000.offset16, a branch-likely instruction that is removed by Release 6 and whose encoding is required to signal the Reserved Instruction exception.)
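
The constraint-based decoding of primary opcode 010110 described above translates directly into a few comparisons on the rs and rt fields. A minimal sketch, with a hypothetical function name:

```python
POP26 = 0b010110  # primary opcode of BLEZL before Release 6


def decode_pop26(word: int) -> str:
    """Apply the Release 6 rs/rt constraints for primary opcode 010110."""
    if (word >> 26) & 0x3F != POP26:
        raise ValueError("not a 010110 encoding")
    rs = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    if rt == 0:
        # Pre-Release 6 BLEZL rs, offset: removed, must signal Reserved Instruction.
        return "RESERVED"
    if rs == 0:
        return "BLEZC"
    if rs == rt:
        return "BGEZC"
    return "BGEC"


for rs, rt in [(0, 5), (5, 5), (4, 5), (4, 0)]:
    word = (POP26 << 26) | (rs << 21) | (rt << 16) | 0x0010  # arbitrary offset16
    print(rs, rt, decode_pop26(word))
```

The order of the tests matters: checking rt first removes the BLEZL case, so the later rs == rt test can only match when both fields are nonzero.
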

This tight instruction encoding motivates the bitstring and constraints notation for Release 6 instruction encodings, and the equivalent constraints indicated in the instruction encoding diagrams for the instruction descriptions in Volume II. Table A.26 below shows the B*C compact branch encodings, which use constraints such as RS = RT. Pre-Release 6 encodings that are removed by Release 6 are shaded darkly, while the remaining redundant encodings are shaded lightly or stippled.

Note: Pre-Release 6 instructions BLEZL, BGTZL, BLEZ, and BGTZ do not conflict with the new Release 6 instructions that share their encoding table entries, but ADDI, DADDI, LWC2, SWC2, LDC2, and SDC2 genuinely conflict.
### Table A.26 B*C compact branch encodings

<table>
<thead>
<tr>
<th>Primary Opcode</th>
<th>Constraints involving rs and rt fields</th>
<th>Constraints involving rs and rt fields</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>rs/rt0/NZ</td>
<td>NZrs =/&lt;&gt; NZrt</td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>010 110</td>
<td>0rs 0rt</td>
<td>useless BLEZL<sup>6R</sup></td>
</tr>
<tr>
<td></td>
<td></td>
<td>BGEZC<sup>6N</sup></td>
</tr>
<tr>
<td></td>
<td></td>
<td>NZrs 0rt</td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>110 111</td>
<td>0rs 0rt</td>
<td>useless BGTZL^6R</td>
</tr>
<tr>
<td></td>
<td></td>
<td>BLTZC^6N</td>
</tr>
<tr>
<td></td>
<td></td>
<td>NZrs 0rt</td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>001 000</td>
<td>0rs NZrt</td>
<td>BEQZALC^6N</td>
</tr>
<tr>
<td></td>
<td></td>
<td>BEQC^6N</td>
</tr>
<tr>
<td></td>
<td></td>
<td>NZrs 0rt</td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>110 110</td>
<td>0rs NZrt</td>
<td>JIC^6N</td>
</tr>
<tr>
<td></td>
<td></td>
<td>BEQZC^6N</td>
</tr>
<tr>
<td></td>
<td></td>
<td>NZrs 0rt</td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>110 010</td>
<td></td>
<td>BC^6N off26&lt;&lt;2</td>
</tr>
<tr>
<td></td>
<td></td>
<td>0/NZrs 0/NZrt</td>
</tr>
</tbody>
</table>

---

The MIPS32® Instruction Set Manual, Revision 6.04

## Appendix B: Revision History

<table>
<thead>
<tr>
<th>Revision</th>
<th>Date</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>0.90</td>
<td>November 1, 2000</td>
<td>Internal review copy of reorganized and updated architecture documentation.</td>
</tr>
<tr>
<td>0.91</td>
<td>November 15, 2000</td>
<td>Internal review copy of reorganized and updated architecture documentation.</td>
</tr>
</tbody>
</table>

0.92 December 15, 2000

Changes in this revision:

• Correct the sign in the description of MSUBU.
• Update the JR and JALR instructions to reflect the changes required by MIPS16.

0.95 March 12, 2001

• Update for the second external review release.

1.00 August 29, 2002

Update based on all review feedback:

• Add missing optional select field syntax in the MTC0/MFC0 instruction descriptions.
• Correct the PREF instruction description to acknowledge that the PrepareForStore function does, in fact, modify architectural state.
• To provide additional flexibility for Coprocessor 2 implementations, extend the sel field for DMFC0, DMTC0, MFC0, and MTC0 to be 8 bits.
• Update the PREF instruction to note that it may not update the state of a locked cache line.
• Remove obviously incorrect documentation in DIV and DIVU with regard to putting smaller numbers in register rt.
• Fix the description for MFC2 to reflect data movement from the Coprocessor 2 register to the GPR, rather than the other way around.
• Correct the pseudocode for LDC1, LDC2, SDC1, and SDC2 for a MIPS32 implementation to show the required word swapping.
• Indicate that the operation of the CACHE instruction is UNPREDICTABLE if the cache line containing the instruction is the target of an invalidate or writeback invalidate.
• Indicate that an Index Load Tag or Index Store Tag operation of the CACHE instruction must not cause a cache error exception.
• Make the entire right half of the MFC2, MTC2, CFC2, CTC2, DMFC2, and DMTC2 instructions implementation dependent, thereby acknowledging that these fields can be used in any way by a Coprocessor 2 implementation.
• Clean up the definitions of LL, SC, LLD, and SCD.
• Add a warning that software should not use non-zero values of the stype field of the SYNC instruction.
• Update the compatibility and subsetting rules to capture the current requirements.

1.90 September 1, 2002

Merge the MIPS Architecture Release 2 changes into the document for the first release of a Release 2 processor. Changes in this revision include:

• All new Release 2 instructions have been included: DI, EHB, EI, EXT, INS, JALR.HB, JR.HB, MFHC1, MFHC2, MTHC1, MTHC2, RDPGPR, ROTR, ROTRV, SEB, SEH, SYNCI, WRPGPR, WSBH.
• The following instruction definitions changed to reflect Release 2 of the Architecture: DERET, ERET, JAL, JALR, JR, SRL, SRLV.
• With support for 64-bit FPUs on 32-bit CPUs in Release 2, all floating point instructions that were previously implemented by MIPS64 processors have been modified to reflect support on either MIPS32 or MIPS64 processors in Release 2.
• All pseudocode functions have been updated, and the Are64BitFPOperationsEnabled function was added.
• Update the instruction encoding tables for Release 2.

2.00 June 9, 2003

Continue with updates to merge Release 2 changes into the document. Changes in this revision include:

• Correct the target GPR (from rd to rt) in the SLTI and SLTIU instructions. This appears to be a day-one bug.
• Correct the CPR number and the missing data movement in the pseudocode for the MTC0 instruction.
• Add a note to indicate that the CACHE instruction does not take Address Error exceptions due to misaligned effective addresses.
• Update the SRL, ROTR, SRLV, ROTRV, DSRL, DROTR, DSRLV, DROTRV, DSRL32, and DROTR32 instructions to reflect a 1-bit, rather than a 4-bit, decode of shift vs. rotate function.
• Add a programming note to the PrepareForStore PREF hint to indicate that it cannot be used alone to create a bzero-like operation.
• Add a note to the PREF and PREFX instructions indicating that they may cause Bus Error and Cache Error exceptions, although this is typically limited to systems with high-reliability requirements.
• Update the SYNCI instruction to indicate that it should not modify the state of a locked cache line.
• Establish specific rules for when multiple TLB matches can be reported (on writes only). This makes software handling easier.

2.50 July 1, 2005

Changes in this revision:

• Correct the figure label in the LWR instruction (it was incorrectly specified as LWL).
• Update all files to FrameMaker 7.1.
• Include support for implementation-dependent hardware registers via RDHWR.
• Indicate that it is implementation-dependent whether prefetch instructions cause EJTAG data breakpoint exceptions on an address match, and suggest that the preferred implementation is not to cause an exception.
• Correct the MIPS32 pseudocode for the LDC1, LDXC1, LUXC1, SDC1, SDXC1, and SUXC1 instructions to reflect the Release 2 ability to have a 64-bit FPU on a 32-bit CPU. The correction simplifies the code by using the ValueFPR and StoreFPR functions, which correctly implement the Release 2 access to the FPRs.
• Add an explicit recommendation that all cache operations that require an index be done by converting the index to a kseg0 address before performing the cache operation.
• Expand on restrictions on the PREF instruction in cases where the effective address has an uncached coherency attribute.


2.60 June 25, 2008

Changes in this revision:

• Applied the new B0.01 template.
• Updated the RDHWR description with the UserLocal register.
• Added the PAUSE instruction.
• Ordering SYNCs.
• CMP behavior of CACHE, PREF*, SYNCI.
• CVT.S.PL and CVT.S.PU are non-arithmetic (no exceptions).
• *MADD.fmt and *MSUB.fmt are non-fused.
• Various typos fixed.

2.61 July 10, 2008

• Revision History file was incorrectly copied from Volume III.
• Removed index conditional text from the PAUSE instruction description.
• SYNC instruction: added additional format “SYNC stype”.

2.62 January 2, 2009

• LWC1, LWXC1: added statement that the upper word in 64-bit registers is UNDEFINED.
• CVT.S.PL and CVT.S.PU descriptions were still incorrectly listing IEEE exceptions.
• Typo in CFC1 description.
• CCRes is accessed through $3 for RDHWR, not $4.

3.00 March 25, 2010

• JALX instruction description added.
• Sub-setting rules updated for JALX.

3.01 June 01, 2010

• Copyright page updated.
• User mode instructions not allowed to produce UNDEFINED results, only UNPREDICTABLE results.

3.02 March 21, 2011

• RECIP, RSQRT instructions do not require a 64-bit FPU.
• MADD/MSUB/NMADD/NMSUB pseudocode was incorrect for PS format check.

3.50 September 20, 2012

• Added EVA load/store instructions: LBE, LBUE, LHE, LHUE, LWE, SBE, SHE, SWE, CACHEE, PREFE, LLE, SCE, LWLE, LWRE, SWLE, SWRE.
• TLBWI can be used to invalidate the VPN2 field of a TLB entry.
• FCSR.MAC2008 bit affects intermediate rounding in MADD.fmt, MSUB.fmt, NMADD.fmt, and NMSUB.fmt.
• FCSR.ABS2008 bit defines whether ABS.fmt and NEG.fmt are arithmetic or not (how they deal with QNaN inputs).

3.51 October 20, 2012

• CACHE and SYNCI ignore RI and XI exceptions.
• CVT, CEIL, FLOOR, ROUND, TRUNC to integer can’t generate FP-Overflow exception.

5.00 December 14, 2012

• R5 changes: DSP and MT ASEs become Modules.
• NMADD.fmt, NMSUB.fmt: for IEEE 2008, the negate portion is arithmetic.

5.01 December 15, 2012

No technical content changes:

• Update logos on Cover.
• Update copyright page.

5.02 April 22, 2013

• Fix: Figure 2.26 Are64BitFPOperationsEnabled Pseudocode Function: “Enabled” was missing.
• R5 change retroactive to R3: removed FCSR.MAC2008 bit: no architectural support for fused multiply add with no intermediate rounding. Applies to MADD.fmt, MSUB.fmt, NMADD.fmt, NMSUB.fmt.
• Clarification: references to “16 FP registers mode” changed to “the FR=0 32-bit register model”; specifically, paired single (PS) instructions and long (L) format instructions have UNPREDICTABLE results if FR=0, as well as LUXC1 and SUXC1.
• Clarification: C.cond.fmt instruction page: cond bits 2..1 specify the comparison, cond bit 0 specifies ordered versus unordered, while cond bit 3 specifies signaling versus non-signaling.
• R5 change: UFR (User mode FR change): CFC1, CTC1 changes.

5.03 August 21, 2013

• Resolved inconsistencies with regard to the availability of instructions in MIPS32r2: MADD.fmt family (MADD.S, MADD.D, NMADD.S, NMADD.D, MSUB.S, MSUB.D, NMSUB.S, NMSUB.D), RECIP.fmt family (RECIP.S, RECIP.D, RSQRT.S, RSQRT.D), and indexed FP loads and stores (LWXC1, LDXC1, SWXC1, SDXC1). The appendix section A.2 “Instruction Bit Encoding Tables”, shared between Volume I and Volume II of the ARM, was updated; in particular, the new upright delta Δ mark is added to Table A.2 “Symbols Used in the Instruction Encoding Tables”, replacing the inverse delta marking ∇ for these instructions. Similar updates were made to the corresponding microMIPS sections. Instruction set descriptions and pseudocode in Volume II, Basic Instruction Set Architecture, were updated. These instructions are required in MIPS32r2 if an FPU is implemented.
• Misaligned memory access support for MSA: see Volume II, Appendix B “Misaligned Memory Accesses”.
• Has2008 is required as of Release 5: Table 5.4, “FIR Register Descriptions”.
• ABS2008 and NAN2008 fields of Table 5.7 “FCSR Register Field Descriptions” were optional in Release 3 and could be R/W, but as of Release 5 are required, read-only, and preset by hardware.
• FPU FCSR.FS Flush Subnormals / Flush to Zero behavior is made consistent with MSA behavior in MSACSR.FS: Table 5.7, “FCSR Register Field Descriptions”, updated. New section 5.8.1.4 “Alternate Flush to Zero Underflow Handling”.
• Volume I, Section 2.2 “Compliance and Subsetting” noted that the L format is required in MIPS FPUs, to be consistent with Table 5.4 “FIR Register Field Definitions”.
• Noted that UFR and UNFR can only be written with the value 0 from GPR[0]. See section 5.6.5 “User accessible FPU Register model control (UFR, CP1 Control Register 1)” and section 5.6.5 “User accessible Negated FPU Register model control (UNFR, CP1 Control Register 4)”.

5.04 December 11, 2013

LLSC Related Changes

• Added ERETNC. New.
• Modified SC handling: refined, added, and elaborated cases where SC can fail or was UNPREDICTABLE.

XPA Related Changes

• Added MTHC0, MFHC0 to access extensions. All new.
• Modified MTC0 for MIPS32 to zero out the extended bits which are writable. This is to support compatibility of XPA hardware with non-XPA software. In pseudocode, added registers that are impacted.
• MTHC0 and MFHC0: added RI conditions.

<table>
<thead>
<tr>
<th>Revision</th>
<th>Date</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>6.00 - R6U draft</td>
<td>Dec. 19, 2013</td>
<td>• Feature complete R6U draft of Volume II new instructions.</td>
</tr>
<tr>
<td></td>
<td>Jan 14-16, 2014</td>
<td>• Split the MAX.fmt-family instruction description, which described multiple instructions, into separate instruction description pages: MAX.fmt, MAXA.fmt, MIN.fmt, MINA.fmt.</td>
</tr>
<tr>
<td></td>
<td></td>
<td>• Mnemonic change: AUIPA changed to ALUIPC, Aligned Add Upper Immediate to PC. Now all Release 6 new PC-relative instructions end in “p”.</td>
</tr>
<tr>
<td></td>
<td></td>
<td>• Renamed CMP.cond.fmt to CMP.condn.fmt, i.e., renamed the 5-bit cond field “condn” to distinguish it from the old 4-bit cond field.</td>
</tr>
<tr>
<td></td>
<td></td>
<td>• Cleaned up descriptions of NAL and BAL to reduce confusion about deprecation versus removal of BLTZAL and BGEZAL.</td>
</tr>
<tr>
<td></td>
<td></td>
<td>• DAH and DATI use rs src/dest register, not rt.</td>
</tr>
<tr>
<td></td>
<td></td>
<td>• Table showing that the compact branches are complete, reversing rs and rt for BLEC, BGTC, BLEUC, and BGTUC.</td>
</tr>
<tr>
<td></td>
<td></td>
<td>• Forbidden slot RI required; takes exception like delay slot; boilerplate consistency automated.</td>
</tr>
<tr>
<td></td>
<td></td>
<td>• MOD instruction family: remainder has same sign as dividend</td>
</tr>
<tr>
<td></td>
<td></td>
<td>• Updated to R6U 1.03</td>
</tr>
<tr>
<td></td>
<td>Jan 17, 2014</td>
<td>• NAL, BAL: improved confusing explanation of how NAL and BAL used to be special cases of BLTZAL and BGEZAL, instructions removed by Release 6.</td>
</tr>
<tr>
<td></td>
<td></td>
<td>• Forbidden slot boilerplate: requires Reserved Instruction exception for control instructions, even if interrupted: exception state (EPC, etc.) points to the branch, not the forbidden slot, as for a delay slot.</td>
</tr>
<tr>
<td></td>
<td>Jan 20, 2014</td>
<td>• Fixed bugs and changed instruction encodings: BEQZALC, BNEZALC, BGEUC, BLTUC, BLEZALC family, BC1EQZ, BC2EQZ, BC1NEZ, BC2NEZ, BITSWAP.</td>
</tr>
<tr>
<td></td>
<td></td>
<td>• AUI, BAL</td>
</tr>
<tr>
<td>R6U draft</td>
<td>Feb 10, 2014</td>
<td>• Refactored “Compatibility and Subsetting” sections of Volumes I and II for reuse without replication.</td>
</tr>
<tr>
<td></td>
<td></td>
<td>• Updated Volume II tables of instructions by categories (preceding section entitled Alphabetical List of Instructions) for R6U changes.</td>
</tr>
</tbody>
</table>

### Summary of all R6U drafts up to this date - R6U version 1.03

- MIPS3D removed from the Release 6 architecture.
- Some 3-source instructions (conditional moves) replaced with new 2-source instructions: MOVZ/MOVN.fmt replaced by SELEQZ/SELNEZ.fmt; MOVZ/MOVN replaced by SELEQZ/SELNEZ.
- PREF/PREFE: Unsound prefetch hints downgraded; optional implementation dependent prefetch hints expanded.
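To illustrate the replacement of 3-source conditional moves, here is a minimal Python model, assuming register contents are plain integers and that a spare temporary register is available; the function names are hypothetical. It shows one way a pre-Release 6 MOVZ rd, rs, rt can be expressed with SELEQZ, SELNEZ, and OR.

```python
# Sketch: register values modeled as Python ints. Names are illustrative.

def seleqz(rs: int, rt: int) -> int:
    """SELEQZ rd, rs, rt: rd = rs if rt == 0, else 0."""
    return rs if rt == 0 else 0


def selnez(rs: int, rt: int) -> int:
    """SELNEZ rd, rs, rt: rd = rs if rt != 0, else 0."""
    return rs if rt != 0 else 0


def movz(rd: int, rs: int, rt: int) -> int:
    """Pre-Release 6 MOVZ rd, rs, rt: rd = rs if rt == 0, else rd unchanged."""
    return rs if rt == 0 else rd


def movz_release6(rd: int, rs: int, rt: int) -> int:
    # seleqz tmp, rs, rt   -> rs when rt == 0, otherwise 0
    # selnez rd, rd, rt    -> old rd when rt != 0, otherwise 0
    # or     rd, rd, tmp   -> at most one operand is nonzero, so OR merges them
    tmp = seleqz(rs, rt)
    kept = selnez(rd, rt)
    return kept | tmp
```

MOVN follows the same pattern with the roles of SELEQZ and SELNEZ exchanged.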

### Free up Opcode Space

- Change encodings of LL/SC/LLD/SCD/PREF/CACHE, reducing offset from 16 bits to 9 bits
- SPECIAL2 encodings changed: CLO/CLZ/DCLO/DCLZ
- Other changes mentioned below: traps with immediate operands removed (ADDI/DADDI, TGEI/TGEIU/TLTI/TLTIU/TEQI/TNEI)
- Free 15 major opcodes (COP1X, SPECIAL2, LWL/LWR, SWL/SWR, LDL/LDR, SDL/SDR, LL/SC, LLD/SCD, PREF, CACHE) by changing encodings, as described below.

### Integer Multiply and Divide

- Integer accumulators (HI/LO) removed from base Release 6, moved to DSPr6, allowed only with microMIPS: MFHI, MTHI, MFLO, MTLO, MADD, MADDU, MUL, MSUB, MSUBU removed.
- Release 6 adds multiply and divide instructions that write to same-width register: MULT replaced by MUL/MUH; MULTU replaced by MULU/MUHU; DIV replaced by DIV/MOD; DIVU replaced by DIVU/MODU; similarly for 64-bit DMUH, etc.
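The same-width multiply and divide results can be modeled as below. This is a sketch under stated assumptions: 32-bit registers held as unsigned bit patterns, hypothetical helper names, and no modeling of divide-by-zero or of the INT_MIN / -1 case. It also illustrates the MOD rule recorded in the January 2014 drafts, that the remainder has the same sign as the dividend.

```python
# Sketch of Release 6 same-width multiply/divide on 32-bit registers.
# Values are unsigned 32-bit bit patterns; helper names are illustrative.
# Divide-by-zero and INT_MIN / -1 are not modeled.
MASK32 = 0xFFFF_FFFF


def s32(x: int) -> int:
    """Interpret a 32-bit pattern as a signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x8000_0000 else x


def mul(a: int, b: int) -> int:    # MUL: low 32 bits of the signed product
    return (s32(a) * s32(b)) & MASK32


def muh(a: int, b: int) -> int:    # MUH: high 32 bits of the signed product
    return ((s32(a) * s32(b)) >> 32) & MASK32


def mulu(a: int, b: int) -> int:   # MULU: low 32 bits of the unsigned product
    return ((a & MASK32) * (b & MASK32)) & MASK32


def muhu(a: int, b: int) -> int:   # MUHU: high 32 bits of the unsigned product
    return (((a & MASK32) * (b & MASK32)) >> 32) & MASK32


def div(a: int, b: int) -> int:    # DIV: quotient truncated toward zero
    x, y = s32(a), s32(b)
    q = abs(x) // abs(y)
    return (-q if (x < 0) != (y < 0) else q) & MASK32


def mod(a: int, b: int) -> int:    # MOD: remainder takes the dividend's sign
    x, y = s32(a), s32(b)
    r = abs(x) % abs(y)
    return (-r if x < 0 else r) & MASK32
```

In this model, MUL and MUH return the low and high halves of the same 64-bit signed product that MULT previously left in LO and HI.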

### Control Transfer Instructions (CTIs)

- Branch likely instructions removed by Release 6: BEQL, etc.
- Enhanced compact branches and jumps provided
- No delay slots; back-to-back branches disallowed (forbidden slot)
- More complete set of conditions: BEQC/BNEC, all signed and unsigned reg-reg comparisons, e.g. BLTC, BLTUC; all comparisons against zero, e.g. BLTZC
- More complete set of conditional procedure call instructions: BEQZALC, BNEZALC
- Large offset PC-relative branches: BC/BALC 26-bit offset (scaled by 4); BEQZC/BNEZC 21-bit offset
- JIC/JIALC: “indexed” jumps, jump to register + sign extended 16-bit offset
- Trap-on-overflow adds with immediate removed by MIPSr6: ADDI, DADDI; replaced by branches on overflow BOVC/BNVC.
- Redundant JR.HB removed, aliased to JALR.HB with rdest=0.
- BLTZAL/BGEZAL removed; not used because they unconditionally wrote the link register
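The compact-branch ranges and the BOVC overflow test listed above can be sketched as follows. Assumptions: 32-bit addresses, branch targets computed relative to PC + 4 (the address of the instruction after the branch), and hypothetical helper names; forbidden-slot checks and ISA-mode bits are not modeled.

```python
# Sketch of Release 6 compact-branch target arithmetic (assumed relative to
# PC + 4) and the BOVC overflow condition. Helper names are illustrative.
MASK32 = 0xFFFF_FFFF


def sign_extend(value: int, bits: int) -> int:
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def bc_target(pc: int, offset26: int) -> int:
    """BC/BALC: 26-bit word offset, scaled by 4."""
    return (pc + 4 + sign_extend(offset26, 26) * 4) & MASK32


def beqzc_target(pc: int, offset21: int) -> int:
    """BEQZC/BNEZC: 21-bit word offset, scaled by 4."""
    return (pc + 4 + sign_extend(offset21, 21) * 4) & MASK32


def jic_target(gpr_rt: int, offset16: int) -> int:
    """JIC/JIALC: register plus sign-extended 16-bit offset (not scaled)."""
    return (gpr_rt + sign_extend(offset16, 16)) & MASK32


def bovc_taken(a: int, b: int) -> bool:
    """BOVC: taken if the signed 32-bit sum of the operands overflows."""
    total = sign_extend(a, 32) + sign_extend(b, 32)
    return not (-(1 << 31) <= total < (1 << 31))
```

Because the 26-bit offset counts words, BC reaches about ±128 MB (2^25 words of 4 bytes in each direction), while the 21-bit BEQZC/BNEZC offset reaches about ±4 MB.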

SSNOP identical to NOP.

### Misaligned Memory Accesses

- Unaligned load/store instructions (LWL/LWR, etc.) removed from Release 6. Support for misaligned memory accesses must be provided by a Release 6 system for all ordinary loads and stores, by hardware or by software trap-and-emulate.
- CPU scalar ALIGN instruction
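Because Release 6 drops LWL/LWR and requires misaligned ordinary loads to work, either in hardware or via trap-and-emulate, a software handler has to rebuild the word itself. A minimal sketch, assuming a little-endian system, memory modeled as a Python `bytes` object, and hypothetical function names:

```python
# Sketch of a trap-and-emulate path for a misaligned LW on a little-endian
# system: read the two aligned words that straddle the address and splice
# them. The memory model and function names are illustrative assumptions.
MASK32 = 0xFFFF_FFFF


def load_aligned_word(mem: bytes, addr: int) -> int:
    assert addr % 4 == 0, "aligned access only"
    return int.from_bytes(mem[addr:addr + 4], "little")


def emulate_misaligned_lw(mem: bytes, addr: int) -> int:
    base = addr & ~3          # aligned word containing the first byte
    shift = 8 * (addr & 3)    # byte offset within that word, in bits
    lo = load_aligned_word(mem, base)
    if shift == 0:
        return lo
    hi = load_aligned_word(mem, base + 4)
    # Little-endian: the low-order bytes come from the lower-addressed word.
    return ((lo >> shift) | (hi << (32 - shift))) & MASK32
```

On a big-endian system the lower-addressed word would supply the high-order bytes instead.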

### Address Generation and Constant Building

- Instructions to build large constants (such as address constants): AUI (Add upper immediate), DAHI, DATI.
- Instructions for PC-relative address formation: ADDIUPC, ALUIPC.
- PC-relative loads: LWP, LWUP, LDP.
- Indexed FPU memory accesses removed: LWXC1, LUXC1, PREFX, etc.
- Load-scaled-address instructions: LSA, DLSA
- 32-bit address wrapping improved.
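The constant-building and scaled-address instructions can be modeled as follows. This sketch assumes 32-bit registers, an AUI that adds immediate << 16 to rs, an ADDIU that adds a sign-extended 16-bit immediate, and an LSA shift amount of 1 to 4; the helper names are hypothetical.

```python
# Sketch of Release 6 constant building (AUI + ADDIU) and LSA on 32-bit
# registers. Helper names are illustrative assumptions.
MASK32 = 0xFFFF_FFFF


def sign_extend16(imm: int) -> int:
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def aui(rs_val: int, imm16: int) -> int:
    return (rs_val + ((imm16 & 0xFFFF) << 16)) & MASK32


def addiu(rs_val: int, imm16: int) -> int:
    return (rs_val + sign_extend16(imm16)) & MASK32


def build_constant(value: int) -> int:
    """Emulate 'AUI r, $0, hi' followed by 'ADDIU r, r, lo'."""
    lo = value & 0xFFFF
    # ADDIU sign-extends lo, so round the upper half up when bit 15 is set.
    hi = ((value + 0x8000) >> 16) & 0xFFFF
    return addiu(aui(0, hi), lo)


def lsa(rs_val: int, rt_val: int, shift: int) -> int:
    """LSA rd, rs, rt, shift: rd = (rs << shift) + rt, shift in 1..4."""
    assert 1 <= shift <= 4
    return ((rs_val << shift) + rt_val) & MASK32
```

The rounding step in `build_constant` compensates for ADDIU sign-extending its immediate: when bit 15 of the constant is set, the upper half is incremented by one so that the negative low half brings the sum back to the intended value.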

### DSP ASE

- DSP ASE and SmartMIPS disallowed; recommend MSA instead
- DSPr6 to be defined, used with microMIPS.
- Instructions promoted from DSP ASE to Base ISA: BALIGN becomes Release 6 ALIGN, BITREV becomes Release 6 BITSWAP

<table>
<thead>
<tr>
<th>Revision</th>
<th>Date</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td></td>
<td><strong>FPU and co-processor</strong></td>
</tr>
<tr>
<td></td>
<td></td>
<td>• Instruction encodings changed: COP2 loads/stores, cache/prefetch, SPECIAL2: LWC2/SWC2, LDC2/SDC2</td>
</tr>
<tr>
<td></td>
<td></td>
<td>• FR=0 not allowed, FR=1 required.</td>
</tr>
<tr>
<td></td>
<td></td>
<td>• Compatibility and Subsetting section amended to allow a single precision only FPU (FIR.S=FIR.W=1, FIR.D=FIR.L=0.)</td>
</tr>
<tr>
<td></td>
<td></td>
<td>• Paired Single (PS) removed from the Release 6 architecture, including: COP1.PS, COP1X.PS, BC1ANY2, BC1ANY4, CVT.PS.S, CVT.PS.W.</td>
</tr>
<tr>
<td></td>
<td></td>
<td>• FPU scalar counterparts to MSA instructions: RINT.fmt, CLASS.fmt, MAX/MAXA/MIN/MINA.fmt.</td>
</tr>
<tr>
<td></td>
<td></td>
<td>• Unfused multiply adds removed: MADD/MSUB/NMADD/NMSUB.fmt</td>
</tr>
<tr>
<td></td>
<td></td>
<td>• IEEE2008 Fused multiply adds added: MADDF/MSUBF.fmt</td>
</tr>
<tr>
<td></td>
<td></td>
<td>• Floating point condition codes and related instructions removed: C.cond.fmt removed, BC1T/BC1F, MOVF/MOVT.</td>
</tr>
<tr>
<td></td>
<td></td>
<td>• MOVF/MOVT.fmt replaced by SEL.fmt</td>
</tr>
<tr>
<td></td>
<td></td>
<td>• New FP compare instruction CMP.cond.fmt places its result in an FPR, with related branches BC1EQZ/BC1NEZ</td>
</tr>
<tr>
<td></td>
<td></td>
<td>• New FP comparisons: CMP.cond.fmt with cond = OR (ordered), UNE (Unordered or Not Equal), NE (Not Equal).</td>
</tr>
<tr>
<td></td>
<td></td>
<td>• Coprocessor 2 condition codes removed: BC2F/BC2T removed, replaced by BC2NEZ/BC2EQZ</td>
</tr>
<tr>
<td></td>
<td></td>
<td><strong>Recent R6U architecture changes not fully reflected in this draft:</strong></td>
</tr>
<tr>
<td></td>
<td></td>
<td>• This draft does not completely reflect the new 32-bit address wrapping proposal but still refers in some places to the old IAM (Implicit Address Mode) proposal.</td>
</tr>
<tr>
<td></td>
<td></td>
<td>• This draft does not yet reflect constraints on endianness, in particular in the section on Misaligned memory access support: e.g. code and data must have the same endianness, Status.RE is removed, etc.</td>
</tr>
<tr>
<td></td>
<td></td>
<td>• BC1EQZ/BC1NEZ will test only bit 0 of the condition register, not all bits.</td>
</tr>
<tr>
<td></td>
<td></td>
<td>• This draft does not yet say that writing to a 32-bit FPR renders upper bits of a 64 bit FPR or 128 bit floating point register UNPREDICTABLE; it describes the old proposal of zeroing the upper bits.</td>
</tr>
<tr>
<td></td>
<td></td>
<td><strong>Known issues:</strong></td>
</tr>
<tr>
<td></td>
<td></td>
<td>• This draft describes Release 6 as well as earlier releases of the MIPS architecture. For example, instructions that were present in MIPSr5 but removed in Release 6 are still in the manual, although they should be clearly marked “removed by Release 6”.</td>
</tr>
<tr>
<td></td>
<td></td>
<td>• R6U new instruction pseudocode is 64-bit, rather than 32-bit, albeit attempting to use notations that apply to both.</td>
</tr>
<tr>
<td></td>
<td></td>
<td>• Certain new instruction descriptions are “unsplit”, describing families of instructions such as all compact branches, rather than separate descriptions of each instruction. This facilitates comparison and consistency, but currently allows certain MIPS64 Release 6 instructions to appear inappropriately in the MIPS32 Release 6 manual. A future release of the manual will “split” these instruction family descriptions, e.g. the compact branch family will be split up into at least 12 different instruction descriptions.</td>
</tr>
<tr>
<td></td>
<td></td>
<td>• R6U requires misalignment support for all ordinary memory reference instructions, but the pseudocode does not yet reflect this. Boilerplate has been added to all existing instructions saying this.</td>
</tr>
<tr>
<td></td>
<td></td>
<td>• The new R6U PC-relative loads (LWP, LWUP, LDP) in this draft incorrectly say that misaligned accesses are permitted.</td>
</tr>
</tbody>
</table>
• ALIGN/DALIGN: clarified bp=0 behavior
• ALIGN/DALIGN pseudocode used || as logical OR rather than MIPS’ pseudocode concatenate.
• Removed incorrect note about not using r31 as a source register to BAL.
• Release 6 requires BC1EQZ/BC1NEZ if an FPU is present, i.e. they cannot signal RI.
• R6U 1.05 change: BC1EQZ/BC1NEZ test only bit 0 of the FPR; changed from testing if any bit is nonzero; helps with trap-and-emulate of DP on an SP-only FPU.
• Known problem: R6U 1.05 change not yet made: all 32-bit FP operations leave the upper bits of a 64-bit FPR and/or 128-bit MSA register unpredictable; helps with trap-and-emulate of DP on an SP-only FPU.
• Clearly marked all .PS instructions as “removed by Release 6” in the instruction format.
• DMULT, DMULTU, DDIV, DDIVU marked removed by Release 6.
• Started using =Release 6 notation to indicate that an instruction has been changed but is still present. JR.HB =Release 6, aliased to JALR.HB.
• SDBBP updated for R6P facility to disable if no hardware debug trap handler.
• UFR/UNFR (User-mode FR facility) disallowed in Release 6: changes to CTC1 and CFC1 instructions.

R6U ARM Volume II 6.00 preliminary release February 14, 2014

• Last minute change: BC1EQZ.fmt and BC1NEZ.fmt test only bit 0, least significant bit, of FPR.
• Similar changes to SEL.fmt, SELEQZ.fmt, SELNEZ.fmt not yet made.

Post-6.00 February 20, 2014

• FPU truth consuming instructions (BC1EQZ.fmt, BC1NEZ.fmt, SEL.fmt, SELEQZ.fmt, SELNEZ.fmt) change completed: test bit 0, least-significant-bit, of FPR containing condition.
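The bit-0 convention above can be sketched in Python. Assumptions: FPR contents are modeled as 64-bit integers, CMP.condn.fmt is taken to write all ones for true and all zeros for false, SEL.fmt is taken to choose ft when bit 0 of fd is set, and the helper names are hypothetical.

```python
# Sketch of the Release 6 FPU "truth" convention: producers write an
# all-ones or all-zeros pattern; consumers test only bit 0.
# FPR contents are modeled as 64-bit ints; names are illustrative.
ALL_ONES_64 = (1 << 64) - 1


def cmp_result(condition_true: bool) -> int:
    return ALL_ONES_64 if condition_true else 0


def bc1eqz_taken(fpr: int) -> bool:
    return (fpr & 1) == 0


def bc1nez_taken(fpr: int) -> bool:
    return (fpr & 1) == 1


def sel_fmt(fd: int, fs: int, ft: int) -> int:
    """SEL.fmt: result is ft if bit 0 of fd is set, else fs."""
    return ft if fd & 1 else fs


def seleqz_fmt(fs: int, ft: int) -> int:
    """SELEQZ.fmt: result is 0 if bit 0 of ft is set, else fs."""
    return 0 if ft & 1 else fs


def selnez_fmt(fs: int, ft: int) -> int:
    """SELNEZ.fmt: result is fs if bit 0 of ft is set, else 0."""
    return fs if ft & 1 else 0
```

Because only bit 0 is consulted, a handler emulating a double-precision compare on a single-precision-only FPU does not need to produce any particular upper-bit pattern, which is the motivation recorded in the R6U 1.05 entry.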

6.01 December 1, 2014

• Production Release.
• Add DVP and EVP instructions for multithreading.
• Add POP and SOP encoding nomenclature to opcode tables in appendix A

6.02 December 10, 2014

• JIC format changed from JIC offset(rt) to JIC rt, offset.
• JIALC format changed from JIALC offset(rt) to JIALC rt, offset.
• ‘offset’ removed from NAL format.


6.03 September 4, 2015

• Fixed many inconsistencies; no functional impact.
• RDHWR updates for Release 6.
• WAIT updates for Release 6.
• CFC1/CTC1 UFR-related text reworded.
• CFC1/CTC1 FRE-related text added.
• Added LLX/SCX (32/64) instructions.
• Jump Register ISA Mode switching text reworded.
• MisalignedSupport() language in load/store pseudocode reworded.
• Release 6 behavior added to move-to/from instructions: return 0, nop.
• TLBINV/TLBINVF description and pseudocode corrected and clarified.
• ALIGN/DALIGN pseudocode cleaned up; removed redundancy.
• Removed “Special Considerations” section from B<cond>C.
• Language clarified in PREF/PREFE tables; no functional change.

6.04 November 13, 2015

**MIPS32 and MIPS64:**

• J/JAL now indicated as deprecated (but not removed).
• DVP: added text indicating that a disabled VP will not be re-enabled for execution on a deferred exception.
• CACHE/CACHEE: Undefined operations are really NOP.
• CMP.condn.fmt: removed fmt-related text in the description section; .S/.D are explicitly encoded.
• Fixed minor textual typos in MAXA/MINA.fmt functions.
• DERET: restriction: if executed outside debug mode, then RI, not UNDEFINED.
• TLBWR: Updated reference to Random. No longer supported in Release 6.
• PCREL instructions: added PCREL minor opcode table, fixed conditional text bugs in register reference.
• BC1F/BC1FL/BC1T/BC1TL: removed last paragraph of historical information section. These instructions can be immediately preceded by an instruction that sets the condition code.
• JIALC: restructured operation section using ‘temp’ to avoid false hazard of link update overwriting source.
• LUI: Fixed conditional text errors related to the encoding table. microMIPS appeared in MIPS.
• JIALC/JIC: Updated to indicate effect on ‘ISAMode’.
• Fixed typo in ROUND/TRUNC/FLOOR/CEIL.W.fmt: range value should be $2^{31}-1$, not $2^{63}-1$.

**MIPS64 only:**

• DMFC0/DMTC0: now indicates what happens with 32-bit COP0 registers.