eTransformerEncoderLayer#

class eTransformerEncoderLayer(in_rep, self_attn, dim_feedforward=2048, dropout=0.1, activation=GELU(approximate='none'), layer_norm_eps=1e-05, norm_first=True, norm_module='rmsnorm', bias=True, device=None, dtype=None, init_scheme='xavier_uniform')[source]#

Bases: eModule

Equivariant Transformer encoder layer with the same API as torch.nn.TransformerEncoderLayer.

Applies equivariant multi-head attention (plain eMultiheadAttention or any caller-provided PositionalAttentionBase backend), followed by an equivariant feed-forward block built from eLinear layers and equivariant normalization (eRMSNorm by default, or eLayerNorm). The layer mirrors PyTorch's pre-/post-norm ordering while constraining every linear map to commute with the group action.

The layer defines:

\[\mathbf{f}_{\mathbf{\theta}}: \mathcal{X}^{T} \to \mathcal{X}^{T}.\]

Functional equivariance constraint:

\[\mathbf{f}_{\mathbf{\theta}}(\rho_{\mathcal{X}}(g)\mathbf{x}_{1:T}) = \rho_{\mathcal{X}}(g)\,\mathbf{f}_{\mathbf{\theta}}(\mathbf{x}_{1:T}) \quad \forall g\in\mathbb{G},\]

where \(\rho_{\mathcal{X}}(g)\) acts on the feature/channel axis at every token.
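The constraint can be checked numerically. Below is a minimal self-contained sketch using NumPy and a toy setup (not the library's own classes): \(\rho_{\mathcal{X}}(g)\) permutes the \(D\) feature channels, and the token-wise map plays the role of an equivariant linear layer, which for the full permutation group must take the commuting form \(a I + b \mathbf{1}\mathbf{1}^\top\).

```python
import numpy as np

# Toy check of f(rho(g) x) == rho(g) f(x) for a channel-permutation group.
# This is an illustration of the constraint only; eLinear and the library's
# Representation classes are not used here.
rng = np.random.default_rng(0)
T, D = 5, 4

# rho(g): one sampled group element, acting as a permutation of the D channels.
perm = rng.permutation(D)
P = np.eye(D)[perm]

# A linear map commuting with every channel permutation: a*I + b*ones.
a, b = 0.7, 0.3
W = a * np.eye(D) + b * np.ones((D, D))

def f(x):
    # Token-wise (pointwise over the T axis) application of the linear map.
    return x @ W.T

x = rng.normal(size=(T, D))
lhs = f(x @ P.T)   # f(rho(g) x): act on channels first, then apply f
rhs = f(x) @ P.T   # rho(g) f(x): apply f first, then act on channels
assert np.allclose(lhs, rhs)
```

The same check, run over many sampled group elements and tolerances, is what `check_equivariance` below automates for the full layer.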

Create an equivariant Transformer encoder layer.

Parameters:
  • in_rep (Representation) – Input representation \(\rho_{\text{in}}\).

  • self_attn (eMultiheadAttention | PositionalAttentionBase) – Pre-built equivariant self-attention module.

  • dim_feedforward (int) – Hidden dimension of the feedforward network.

  • dropout (float) – Dropout probability.

  • activation (Module) – Activation module. Default: torch.nn.GELU().

  • layer_norm_eps (float) – Epsilon for layer normalization.

  • norm_first (bool) – If True, apply normalization before attention/feedforward.

  • norm_module (Literal['layernorm', 'rmsnorm']) – Normalization layer type ('layernorm' or 'rmsnorm').

  • bias (bool) – Whether to use bias in linear layers.

  • device – Tensor device.

  • dtype – Tensor dtype.

  • init_scheme (str | None) – Initialization scheme for equivariant layers.
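A hypothetical construction sketch is shown below. The import path, the `Representation` instance, and the `eMultiheadAttention` constructor arguments are placeholders, not part of the documented API; adapt them to where the library actually exposes these classes.

```python
# Sketch only: the module path and num_heads argument are assumptions.
import torch

in_rep: Representation = ...  # a representation of your group; D = in_rep.size
attn = eMultiheadAttention(in_rep, num_heads=4)  # signature assumed

layer = eTransformerEncoderLayer(
    in_rep,
    attn,
    dim_feedforward=2048,
    dropout=0.1,
    norm_first=True,
    norm_module="rmsnorm",
)

src = torch.randn(8, 16, in_rep.size)  # (B, T, D) with D == in_rep.size
out = layer(src)                       # same shape as src
```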

check_equivariance(batch_size=4, seq_len=5, samples=20, atol=0.0001, rtol=0.0001)[source]#

Quick sanity check ensuring that both the self-attention block and the full layer are equivariant.

Parameters:
  • batch_size (int) – Batch size of the random test inputs.

  • seq_len (int) – Sequence length of the random test inputs.

  • samples (int) – Number of group elements sampled for the check.

  • atol (float) – Absolute tolerance for the output comparison.

  • rtol (float) – Relative tolerance for the output comparison.

Return type:

None

forward(src, src_mask=None, src_key_padding_mask=None, is_causal=False, *, src_positions=None, src_position_mask=None)[source]#

Pass the input through the equivariant encoder layer.

Parameters:
  • src (Tensor) – Input sequence of shape (B, T, D) with last dimension equal to in_rep.size.

  • src_mask (Tensor | None) – Optional attention mask for the input sequence.

  • src_key_padding_mask (Tensor | None) – Optional padding mask for the batch.

  • is_causal (bool) – If True, applies a causal mask to the self-attention block.

  • src_positions (Tensor | None) – Optional source-token positions for positional attention backends.

  • src_position_mask (Tensor | None) – Optional boolean mask for valid source positions.

Return type:

Tensor
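Assuming a constructed layer as in the class docstring, a typical call and sanity check might look like the sketch below. The causal-mask usage mirrors torch.nn.TransformerEncoderLayer, and the assumption that `check_equivariance` raises on failure is inferred from its description, not verified against the implementation.

```python
# Sketch: `layer` is an eTransformerEncoderLayer, `src` a (B, T, D) tensor
# with D == in_rep.size.
out = layer(src, is_causal=True)  # causal self-attention over the sequence
assert out.shape == src.shape     # the layer preserves the input shape

# Randomized equivariance check of the whole layer (returns None).
layer.check_equivariance(batch_size=2, seq_len=8, samples=10)
```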