eTransformerEncoderLayer
- class eTransformerEncoderLayer(in_rep, self_attn, dim_feedforward=2048, dropout=0.1, activation=GELU(approximate='none'), layer_norm_eps=1e-05, norm_first=True, norm_module='rmsnorm', bias=True, device=None, dtype=None, init_scheme='xavier_uniform')
Bases: eModule

Equivariant Transformer encoder layer with the same API as torch.nn.TransformerEncoderLayer.

Applies equivariant multi-head attention (plain eMultiheadAttention or any caller-provided PositionalAttentionBase backend) followed by an equivariant feed-forward block built from eLinear layers and equivariant normalization (eRMSNorm by default, or eLayerNorm), mirroring PyTorch's ordering (pre- or post-norm) while constraining every linear map to commute with the group action.

The layer defines:
\[\mathbf{f}_{\mathbf{\theta}}: \mathcal{X}^{T} \to \mathcal{X}^{T}.\]

Functional equivariance constraint:

\[\mathbf{f}_{\mathbf{\theta}}(\rho_{\mathcal{X}}(g)\mathbf{x}_{1:T}) = \rho_{\mathcal{X}}(g)\,\mathbf{f}_{\mathbf{\theta}}(\mathbf{x}_{1:T}) \quad \forall g\in\mathbb{G},\]

where \(\rho_{\mathcal{X}}(g)\) acts on the feature/channel axis at every token.
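For example, if \(\mathbb{G} = C_2\) acts on the channel axis by a sign flip, \(\rho_{\mathcal{X}}(g) = -\mathbf{I}\), the constraint simply requires the layer to be an odd function of its input:

\[\mathbf{f}_{\mathbf{\theta}}(-\mathbf{x}_{1:T}) = -\,\mathbf{f}_{\mathbf{\theta}}(\mathbf{x}_{1:T}).\]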
Create an equivariant Transformer encoder layer.
- Parameters:
  - in_rep (Representation) – Input representation \(\rho_{\text{in}}\).
  - self_attn (eMultiheadAttention | PositionalAttentionBase) – Pre-built equivariant self-attention module.
  - dim_feedforward (int) – Hidden dimension of the feedforward network.
  - dropout (float) – Dropout probability.
  - activation (Module) – Activation module. Default: torch.nn.GELU().
  - layer_norm_eps (float) – Epsilon for layer normalization.
  - norm_first (bool) – If True, apply normalization before attention/feedforward.
  - norm_module (Literal['layernorm', 'rmsnorm']) – Normalization layer type ('layernorm' or 'rmsnorm').
  - bias (bool) – Whether to use bias in linear layers.
  - device – Tensor device.
  - dtype – Tensor dtype.
  - init_scheme (str | None) – Initialization scheme for equivariant layers.
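A minimal construction sketch follows. The import path symm_learning.nn and the eMultiheadAttention(in_rep, num_heads) constructor signature are assumptions here (adjust both to your install); the input representation is built with escnn, assuming that is the Representation type this API expects.

```python
import torch
from escnn.group import CyclicGroup, directsum

# Assumed import path -- adjust to wherever the e-layers live in your install.
from symm_learning.nn import eMultiheadAttention, eTransformerEncoderLayer

G = CyclicGroup(3)                               # cyclic group C3
rep = directsum([G.regular_representation] * 8)  # 8 copies of the 3-dim regular rep -> 24 channels

# Hypothetical constructor signature for the attention backend.
attn = eMultiheadAttention(rep, num_heads=4)

layer = eTransformerEncoderLayer(
    in_rep=rep,
    self_attn=attn,
    dim_feedforward=128,
    dropout=0.0,
    norm_first=True,
    norm_module="rmsnorm",
)

x = torch.randn(2, 10, rep.size)   # (B, T, D) with D == in_rep.size == 24
y = layer(x)                       # same shape; the map commutes with the C3 action
print(y.shape)                     # torch.Size([2, 10, 24])
```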
- check_equivariance(batch_size=4, seq_len=5, samples=20, atol=0.0001, rtol=0.0001)
Quick sanity check ensuring that both the attention block and the full layer are equivariant.
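What the check validates can be reproduced by hand. The sketch below continues the construction example above and assumes an escnn-style Representation, where rep(g) returns the matrix of \(\rho_{\mathcal{X}}(g)\); it compares both sides of the equivariance constraint over all group elements.

```python
import torch

# Manual equivariance check (a sketch; `layer`, `rep`, `G` come from the
# construction example above).
layer.eval()                          # freeze dropout for an exact comparison
x = torch.randn(2, 5, rep.size)
with torch.no_grad():
    for g in G.elements:              # finite group: enumerate all elements
        rho_g = torch.as_tensor(rep(g), dtype=x.dtype)  # (D, D) action on channels
        lhs = layer(x @ rho_g.T)      # f(rho(g) x), applied token-wise
        rhs = layer(x) @ rho_g.T      # rho(g) f(x)
        assert torch.allclose(lhs, rhs, atol=1e-4), f"not equivariant at {g}"

# Or rely on the built-in check:
layer.check_equivariance(batch_size=4, seq_len=5, samples=20)
```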
- forward(src, src_mask=None, src_key_padding_mask=None, is_causal=False, *, src_positions=None, src_position_mask=None)
Pass the input through the equivariant encoder layer.
- Parameters:
  - src (Tensor) – Input sequence of shape (B, T, D), with the last dimension equal to in_rep.size.
  - src_mask (Tensor | None) – Optional attention mask for the input sequence.
  - src_key_padding_mask (Tensor | None) – Optional padding mask for the batch.
  - is_causal (bool) – If True, applies a causal mask to the self-attention block.
  - src_positions (Tensor | None) – Optional source-token positions for positional attention backends.
  - src_position_mask (Tensor | None) – Optional boolean mask for valid source positions.
- Return type:
  Tensor
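A forward-pass sketch with causal masking, continuing the example above; whether src_mask takes an additive float mask or a boolean mask is an assumption mirroring torch.nn.TransformerEncoderLayer conventions.

```python
import torch

B, T = 2, 10
src = torch.randn(B, T, rep.size)

# Causal self-attention: each token attends only to itself and earlier tokens.
out = layer(src, is_causal=True)

# Equivalent explicit additive mask (upper triangle set to -inf), assuming the
# backend follows torch.nn.TransformerEncoderLayer's float-mask convention.
mask = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
out_masked = layer(src, src_mask=mask)
```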