PositionalAttentionBase#
- class PositionalAttentionBase(*args, **kwargs)[source]#
Abstract interface for attention blocks with explicit positional branches.
The convention in this module is that positional information is provided as a function, defined by a submodule, acting on the query and key streams only. Values are left unchanged:
\[\tilde{\mathbf{q}} = \mathbf{q} + \phi_\theta(\mathbf{P}_Q), \qquad \tilde{\mathbf{k}} = \mathbf{k} + \phi_\theta(\mathbf{P}_K), \qquad \tilde{\mathbf{v}} = \mathbf{v}.\]
A standard multi-head attention operator is then applied to \((\tilde{\mathbf{q}}, \tilde{\mathbf{k}}, \tilde{\mathbf{v}})\). If the positional branch is the identity/no-op map, the module reduces to torch.nn.MultiheadAttention. Concrete implementations may ignore any positional arguments they do not use, but the forward signature stays stable so encoder and decoder layers can call all attention backends uniformly.
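A minimal sketch of a concrete subclass is shown below. It is not part of this module: the class name LearnedPositionalAttention, the embedding-table positional branch, and the import path are all assumptions chosen for illustration; the only constraint taken from the docs is that the positional branch is added to the query and key streams and the result is passed to torch.nn.MultiheadAttention.

```python
import torch
from torch import nn

# Assumed import; replace with the actual module path for PositionalAttentionBase.
# from mypackage.attention import PositionalAttentionBase


class LearnedPositionalAttention(PositionalAttentionBase):
    """Illustrative subclass: a learned embedding table as the positional branch."""

    def __init__(self, embed_dim, num_heads, max_positions=512):
        super().__init__()
        # phi_theta: maps integer position indices to additive embeddings.
        self.pos_encoder = nn.Embedding(max_positions, embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

    def forward(self, query, key, value, *, q_positions=None, k_positions=None,
                q_position_mask=None, k_position_mask=None, attn_mask=None,
                key_padding_mask=None, need_weights=False, is_causal=False):
        # Default positions: torch.arange(P) for each sequence.
        if q_positions is None:
            q_positions = torch.arange(query.shape[1], device=query.device)
        if k_positions is None:
            k_positions = torch.arange(key.shape[1], device=key.device)
        # Positional branch acts on query and key only; values are left unchanged.
        # This sketch ignores q_position_mask / k_position_mask, which the base
        # class explicitly allows.
        q = query + self.pos_encoder(q_positions)
        k = key + self.pos_encoder(k_positions)
        return self.attn(q, k, value, attn_mask=attn_mask,
                         key_padding_mask=key_padding_mask,
                         need_weights=need_weights, is_causal=is_causal)
```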
Shape#
- query, key, value: (B, P, D).
- q_positions, k_positions: (P,) or (B, P). When omitted, they default to torch.arange(P) for the corresponding sequence.
- q_position_mask, k_position_mask: (P,) or (B, P) boolean masks.
- attn_mask: any attention mask layout accepted by torch.nn.MultiheadAttention.
- key_padding_mask: (B, S) boolean or additive padding mask.
- Returns: attention output with the same leading layout as the input, plus optional attention weights.
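The following usage sketch ties these shapes together; it reuses the hypothetical LearnedPositionalAttention subclass from above and relies on the documented default of torch.arange(P) when positions are omitted.

```python
import torch

B, P, D = 2, 16, 64                       # batch, sequence length, embedding dim
attn = LearnedPositionalAttention(embed_dim=D, num_heads=8)

x = torch.randn(B, P, D)                  # query, key, value: (B, P, D)
# Positions omitted: they default to torch.arange(P) for each sequence.
out, weights = attn(x, x, x, need_weights=True)
print(out.shape)      # torch.Size([2, 16, 64])
print(weights.shape)  # torch.Size([2, 16, 16]); averaged over heads by default
```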
Attributes:#
This base class does not define any storage beyond the standard torch.nn.Module state. Concrete subclasses define their own positional encoder and attention parameters.
- abstractmethod forward(query, key, value, *, q_positions=None, k_positions=None, q_position_mask=None, k_position_mask=None, attn_mask=None, key_padding_mask=None, need_weights=False, is_causal=False)[source]#
Apply attention with optional explicit position metadata.
Shape#
- query, key, value: see PositionalAttentionBase.
- q_positions, k_positions: position coordinates for the query and key tokens. They may differ in cross-attention.
- Returns:
(output, attn_weights), where attn_weights is None unless need_weights=True.
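A cross-attention call can look like the sketch below. It again uses the assumed LearnedPositionalAttention subclass; the stride-2 key positions and the all-False padding mask are arbitrary illustrative values, chosen only to show that query and key streams may differ in length and coordinates.

```python
import torch

B, P_q, P_k, D = 2, 8, 32, 64
attn = LearnedPositionalAttention(embed_dim=D, num_heads=8)

q = torch.randn(B, P_q, D)
kv = torch.randn(B, P_k, D)

q_pos = torch.arange(P_q)                     # (P,) coordinates for the query tokens
k_pos = torch.arange(0, 2 * P_k, 2)           # e.g. key tokens sampled at stride 2
pad = torch.zeros(B, P_k, dtype=torch.bool)   # key_padding_mask: (B, S), True = ignore

out, attn_weights = attn(q, kv, kv, q_positions=q_pos, k_positions=k_pos,
                         key_padding_mask=pad)
print(out.shape)       # torch.Size([2, 8, 64])
print(attn_weights)    # None, since need_weights defaults to False
```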