RotaryEmbedding#

class RotaryEmbedding(dim, base=10000.0, device=None, dtype=None)[source]#

Bases: Module

Precompute the cosine and sine tables used by rotary embeddings.

Shape#

  • positions: (P,) or (B, P).

  • Returns: (cos, sin) with shape (P, dim / 2) or (B, P, dim / 2).

Attributes:#

dim:

Number of channels rotated by RoPE.

base:

Frequency base used to build the inverse frequency spectrum.

inv_freq:

Buffer containing the inverse frequencies used to generate the tables.
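The inverse frequency spectrum can be illustrated with a small pure-Python sketch. It assumes the common RoPE convention inv_freq[i] = base ** (-2i / dim) for i in 0 .. dim / 2 - 1, which matches the (P, dim / 2) table shape above but is not confirmed by this page. The helper name inverse_frequencies is hypothetical, and the class itself stores the values in a tensor buffer rather than a list.

```python
def inverse_frequencies(dim: int, base: float = 10000.0) -> list[float]:
    """Return dim // 2 inverse frequencies, one per rotated channel pair.

    Hypothetical helper: assumes inv_freq[i] = base ** (-2i / dim).
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even so channels can be paired")
    return [base ** (-(2 * i) / dim) for i in range(dim // 2)]


inv_freq = inverse_frequencies(dim=8)
print(len(inv_freq))  # one frequency per channel pair
print(inv_freq)       # frequencies shrink as the pair index grows
```

Under this assumption a larger base stretches the spectrum toward lower frequencies, so the slowest-rotating pairs take longer to complete a full turn.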

Initialize the RoPE table builder.

Parameters:

  • dim (int): Number of channels rotated by RoPE.

  • base (float): Frequency base used to build the inverse frequency spectrum.

  • device (torch.device, optional): Factory option for the inv_freq buffer: the device it is created on.

  • dtype (torch.dtype, optional): Factory option for the inv_freq buffer: its data type.

apply_rope(x, positions, position_mask=None)[source]#

Apply rotary position embeddings to the entire head embedding of x.

Return type:

Tensor

Shape#

  • x: (B, H, T, D).

  • positions: (P,) or (B, P).

  • position_mask: optional boolean mask with the same layout as positions.

  • Returns: x with every head channel rotated in place.
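To make the rotation concrete, here is a minimal pure-Python sketch that rotates a single head vector of length D at one position. It assumes the "split halves" pairing, where channel i is paired with channel i + D / 2; the library may instead pair adjacent channels, and this page does not say which. The function rotate_vector is hypothetical, and the sketch does not model position_mask or the batch and head dimensions.

```python
import math


def rotate_vector(x, position, base=10000.0):
    """Rotate one head vector by its position (hypothetical helper).

    Assumes channel i is paired with channel i + len(x) // 2 and that
    pair i rotates by the angle position * base ** (-2i / len(x)).
    """
    dim = len(x)
    half = dim // 2
    out = [0.0] * dim
    for i in range(half):
        angle = position * base ** (-(2 * i) / dim)
        c, s = math.cos(angle), math.sin(angle)
        a, b = x[i], x[i + half]
        # Standard 2D rotation of the pair (a, b) by `angle`.
        out[i] = a * c - b * s
        out[i + half] = a * s + b * c
    return out


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


q = [1.0, 0.5, -0.25, 2.0]
k = [0.3, -1.0, 0.75, 0.1]

# The score between rotated vectors depends only on the position offset.
score_a = dot(rotate_vector(q, 5), rotate_vector(k, 3))
score_b = dot(rotate_vector(q, 2), rotate_vector(k, 0))
print(abs(score_a - score_b) < 1e-9)
```

Because each pair undergoes a plain rotation, the vector norm is preserved and the dot product of two rotated vectors depends only on the difference between their positions, which is the property attention relies on.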

forward(positions)[source]#

Return the RoPE cosine and sine tables for the provided positions.

Return type:

tuple[Tensor, Tensor]

Parameters:

positions (Tensor)
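As a rough illustration of what the returned tables contain, the sketch below builds them for a 1-D list of positions in pure Python. It assumes each entry is the cosine or sine of positions[p] * inv_freq[i], with inv_freq[i] = base ** (-2i / dim); that convention and the helper name rope_tables are assumptions, and the real method returns tensors rather than nested lists.

```python
import math


def rope_tables(positions, dim, base=10000.0):
    """Build (cos, sin) tables of shape (P, dim // 2) as nested lists.

    Hypothetical helper: assumes angle[p][i] = positions[p] * inv_freq[i].
    """
    inv_freq = [base ** (-(2 * i) / dim) for i in range(dim // 2)]
    cos = [[math.cos(p * f) for f in inv_freq] for p in positions]
    sin = [[math.sin(p * f) for f in inv_freq] for p in positions]
    return cos, sin


cos, sin = rope_tables([0, 1, 2], dim=4)
print(len(cos), len(cos[0]))  # rows = positions, columns = dim // 2
```

For batched positions of shape (B, P), the same computation would be applied per batch row, giving tables of shape (B, P, dim / 2).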
