diff --git a/doc/release_notes.rst b/doc/release_notes.rst index 0268e08b..ca582462 100644 --- a/doc/release_notes.rst +++ b/doc/release_notes.rst @@ -51,15 +51,21 @@ Most users should keep calling ``model.solve(...)``. If you want more control, y **Deprecations** * ``Solver.solve_problem``, ``Solver.solve_problem_from_model``, and ``Solver.solve_problem_from_file`` still work but emit a ``DeprecationWarning``. Use ``Solver.from_name(...).solve()`` (or simply ``model.solve(...)``) instead. They will be removed in a future release. +* **Implicit MultiIndex-level projection is deprecated.** Passing an input indexed by a *level* of a stacked-``MultiIndex`` dimension (e.g. per-``period`` bounds onto a ``(period, timestep)`` ``snapshot`` index) emits an ``EvolvingAPIWarning`` — in arithmetic and in ``add_variables`` / ``add_constraints`` — and will raise under the upcoming v1 convention. Project the input onto the dimension explicitly (select with the dimension's level values) to keep current behavior. Affects PyPSA multi-investment models. See Bug Fixes below for details. **Bug Fixes** +* ``add_variables`` / ``add_constraints``: extends 0.7.0's coords-as-truth rule to ``lower``, ``upper`` and ``mask`` for every bound type and dim order. Pandas ``Series`` / ``DataFrame`` bounds or masks missing a dimension are broadcast to ``coords`` instead of being silently dropped (`#709 `__); the variable's dimension order always follows ``coords`` (`#706 `__); bare-tuple coord entries (``coords=[(0, 1, 2)]``) now behave like lists. Mismatched values or extra dims raise ``ValueError`` with a labelled message; sparse-coord masks (formerly a v0.6.3 ``FutureWarning``, #580) raise ``ValueError``, and masks with dims not in the data raise ``ValueError`` instead of ``AssertionError``. 
+* Pandas inputs whose index names *levels* of a stacked-``MultiIndex`` ``coords`` dimension are now projected onto that dimension: a level subset broadcasts across the others, the full set aligns element-wise. This fixes PyPSA multi-investment arithmetic (e.g. an expression over a ``(period, timestep)`` ``snapshot`` MultiIndex times a ``period``-indexed weighting). In ``add_variables`` / ``add_constraints`` the input must provide a value for every level combination of the MultiIndex or a ``ValueError`` is raised (the error lists the missing combinations). **Implicit level projections are deprecated**: they emit an ``EvolvingAPIWarning`` everywhere — in arithmetic *and* in ``add_variables`` / ``add_constraints`` — and will raise under the upcoming v1 convention. Project the input onto the dimension explicitly (select with the dimension's level values) to keep current behavior. Aligning the full level set with full coverage stays silent. Strict validation also rejects a ``MultiIndex`` input with *unnamed* levels whose combinations don't match ``coords`` (previously a silent bypass, as such inputs can't be projected by level name). +* ``add_piecewise_formulation`` now produces a reproducible dimension order in the broadcast breakpoint array. The previous set-based expansion gave a hash-randomized order that varied between processes. * SOS constraints on masked variables no longer cause solver-specific failures (Gurobi ``IndexError``, Xpress ``?404 Invalid column number``, LP parse errors, silent set corruption). ``Model.solve()`` and ``Model.to_file()`` now raise a clear ``NotImplementedError`` referring users to `#688 `__; pass ``reformulate_sos=True`` as a workaround. * ``Model.solve(..., reformulate_sos=True)`` now actually reformulates SOS constraints even when the solver supports them natively. Previously it was silently ignored with a warning. 
* Fix Mosek interface to inspect both the basic and IPM solutions and pick the one with the better status, so that an optimal crossover solution is not discarded when IPM terminates with a (near-)Farkas certificate. **Breaking Changes** +* ``add_variables`` / ``add_constraints``: the v0.6.3 ``mask`` deprecations (#580) are now hard ``ValueError``\ s; an unnamed ``pd.MultiIndex`` in sequence-form ``coords`` raises ``TypeError`` unless paired with ``dims=[i]``. See Bug Fixes above. +* Sequence-form ``coords`` entries can no longer be ``xarray.DataArray`` objects — they raise ``TypeError``. Pass the underlying index instead: ``variable.indexes[dim]`` (a ``pd.Index``). * ``available_solvers`` now lists all *installed* solvers, even ones without a working license. If you used it to decide "can I actually solve with X?", switch to ``linopy.licensed_solvers`` or ``SolverClass.license_status()``. * ``Model.solver_model`` and ``Model.solver_name`` are now read-only properties that delegate to ``model.solver``. You can't reassign them (only ``= None`` is allowed, which closes the solver), and ``solver_name`` is ``None`` before the first solve. * ``result.solution.primal`` and ``result.solution.dual`` are now ``numpy`` arrays indexed by linopy's integer labels (with ``NaN`` for slots without a value), instead of pandas Series keyed by variable/constraint name. If you accessed them by name, use ``model.variables[name].solution`` (or ``model.constraints[name].dual``) instead. @@ -67,6 +73,7 @@ Most users should keep calling ``model.solve(...)``. If you want more control, y **Internal** +* ``linopy.common`` provides two DataArray conversion helpers: ``as_dataarray`` (convert only) and ``broadcast_to_coords`` (convert and broadcast against ``coords``). The latter takes ``strict`` (default ``True``): any mismatch with ``coords`` raises, naming ``label`` in the error; ``strict=False`` passes mismatches through for downstream xarray alignment. 
* Each ``Solver`` subclass now overrides at most three hooks: ``_build_direct`` (build the native model), ``_run_direct`` (run it), and ``_run_file`` (run the solver on an LP/MPS file). File-only solvers (CBC, GLPK, CPLEX, SCIP, Knitro, COPT, MindOpt) only override ``_run_file``. * New ``ConstraintLabelIndex`` cached on ``Model.constraints`` (mirrors the existing ``Variables.label_index``); ``ConstraintBase`` gains ``active_labels()`` and a ``range`` property; ``CSRConstraint`` exposes ``coords``. * ``linopy.common`` gains ``values_to_lookup_array``; the legacy pandas-based helpers ``series_to_lookup_array`` and ``lookup_vals`` are removed. diff --git a/linopy/common.py b/linopy/common.py index 831b6bc8..4147cde6 100644 --- a/linopy/common.py +++ b/linopy/common.py @@ -9,10 +9,10 @@ import operator import os -from collections.abc import Callable, Generator, Hashable, Iterable, Sequence +from collections.abc import Callable, Generator, Hashable, Iterable, Mapping, Sequence from functools import cached_property, partial, reduce, wraps from pathlib import Path -from typing import TYPE_CHECKING, Any, Generic, TypeVar, overload +from typing import TYPE_CHECKING, Any, Generic, Literal, NamedTuple, TypeVar, overload from warnings import warn import numpy as np @@ -23,6 +23,7 @@ from xarray import Coordinates, DataArray, Dataset, apply_ufunc, broadcast from xarray import align as xr_align from xarray.core import dtypes, indexing +from xarray.core.coordinates import CoordinateValidationError from xarray.core.types import JoinOptions, T_Alignable from xarray.namedarray.utils import is_dict_like @@ -30,6 +31,7 @@ from linopy.constants import ( HELPER_DIMS, SIGNS, + EvolvingAPIWarning, SIGNS_alternative, SIGNS_pretty, sign_replace_dict, @@ -221,23 +223,30 @@ def as_dataarray( **kwargs: Any, ) -> DataArray: """ - Convert an object to a DataArray. + Convert ``arr`` to a DataArray. 
+ + Picks the right constructor for each supported input type (pandas, + polars, numpy, scalar, DataArray) and labels positional axes with + ``dims`` / ``coords``. The result is not reshaped against ``coords``: + dims are neither expanded, reordered, nor projected onto MultiIndex + dims. Use :func:`broadcast_to_coords` when + ``coords`` should govern the result's shape. Parameters ---------- - arr: - The input object. - coords (Union[dict, list, None]): - The coordinates for the DataArray. If None, default coordinates will be used. - dims (Union[list, None]): - The dimensions for the DataArray. If None, the dimensions will be automatically generated. - **kwargs: - Additional keyword arguments to be passed to the DataArray constructor. + arr + The input to convert. + coords + Coordinate values used to label positional axes. + dims + Dimension names used to label positional axes. + **kwargs + Forwarded to the underlying DataArray construction. Returns ------- - DataArray: - The converted DataArray. + DataArray + The converted input, dims and entries as ``arr`` provides them. """ if isinstance(arr, pd.Series | pd.DataFrame): arr = pandas_to_dataarray(arr, coords=coords, dims=dims, **kwargs) @@ -276,30 +285,564 @@ def as_dataarray( return arr -def broadcast_mask(mask: DataArray, labels: DataArray) -> DataArray: +def _as_index(coord_values: Any) -> pd.Index: + return ( + coord_values if isinstance(coord_values, pd.Index) else pd.Index(coord_values) + ) + + +def _as_multiindex(coord_values: Any) -> pd.MultiIndex | None: + """Return the backing ``pd.MultiIndex`` of a coords entry, or ``None``.""" + if isinstance(coord_values, pd.MultiIndex): + return coord_values + if isinstance(coord_values, DataArray): + idx = coord_values.to_index() + if isinstance(idx, pd.MultiIndex): + return idx + return None + + +class _LevelProjection(NamedTuple): + """ + Record of one MultiIndex-level projection performed by ``_broadcast_to_coords``. 
+ + Terminology: a stacked MultiIndex dim has *levels* (its component index + names, e.g. ``period`` / ``timestep``) and *level combinations* (its + elements — one tuple per position, e.g. ``(2030, 't1')``). + """ + + dim: Hashable + levels: list[Hashable] + is_partial: bool # input carried only a subset of the MI's levels + has_gap: bool # some level combinations of the MI dim got no value (NaN) + missing: list[Any] # the level combinations that got no value + + +def _project_onto_multiindex_levels( + arr: DataArray, + expected: dict[Hashable, Any], +) -> tuple[DataArray, list[_LevelProjection]]: + """ + Map ``arr`` dims that name levels of a stacked-MultiIndex coords dim onto it. + + For every level combination of the MultiIndex dim, select the ``arr`` + value at that combination's level values. A subset of levels broadcasts + across the remaining ones; the full set aligns element-wise. ``arr`` is + returned unchanged when it carries no level dims. + + Raises ``ValueError`` only on structural errors: a level name owned by + two MI dims, or a level value missing from ``arr``. Partial projections + and coverage gaps are recorded in the returned ``_LevelProjection`` list; + the caller decides how to treat them. + """ + level_owner: dict[Hashable, Hashable] = {} + owner_mi: dict[Hashable, pd.MultiIndex] = {} + for dim, coord_values in expected.items(): + mi = _as_multiindex(coord_values) + if mi is None: + continue + owner_mi[dim] = mi + for level in mi.names: + if level is None: + continue + if level in level_owner: + raise ValueError( + f"Level {level!r} is shared by MultiIndex dimensions " + f"{level_owner[level]!r} and {dim!r}; cannot resolve which " + f"to align to." 
+ ) + level_owner[level] = dim + + groups: dict[Hashable, list[Hashable]] = {} + for d in arr.dims: + if d in expected: + continue + owner = level_owner.get(d) + if owner is not None: + groups.setdefault(owner, []).append(d) + + projections: list[_LevelProjection] = [] + for dim, levels in groups.items(): + mi = owner_mi[dim] + selectors = { + level: DataArray(np.asarray(mi.get_level_values(level)), dims=[dim]) + for level in levels + } + try: + arr = arr.sel(selectors) + except KeyError as err: + raise ValueError( + f"Cannot align level(s) {levels} onto MultiIndex dimension " + f"{dim!r}: value {err} is missing." + ) from err + arr = arr.assign_coords(Coordinates.from_pandas_multiindex(mi, dim)) + # A level combination is "missing" when the projection gave it no + # value at any position of the other dims. + null_mask = arr.isnull() + other_dims = [d for d in arr.dims if d != dim] + if other_dims: + null_mask = null_mask.any(other_dims) + has_gap = bool(null_mask.any()) + missing = list(arr.indexes[dim][null_mask.values]) if has_gap else [] + projections.append( + _LevelProjection( + dim=dim, + levels=levels, + is_partial=len(levels) < sum(name is not None for name in mi.names), + has_gap=has_gap, + missing=missing, + ) + ) + + return arr, projections + + +def _broadcast_to_coords( + arr: Any, + coords: CoordsLike | None = None, + dims: DimsLike | None = None, + **kwargs: Any, +) -> tuple[DataArray, list[_LevelProjection]]: """ - Broadcast a boolean mask to match the shape of labels. + Convert ``arr`` and broadcast it against ``coords`` (shared mechanics). - Ensures that mask dimensions are a subset of labels dimensions, broadcasts - the mask accordingly, and fills any NaN values (from missing coordinates) - with False while emitting a FutureWarning. 
+ Returns the broadcast DataArray together with the MultiIndex-level + projections performed along the way, so the public entry points can + apply their own policy (warn or raise) to partial projections and + coverage gaps. """ - assert set(mask.dims).issubset(labels.dims), ( - "Dimensions of mask not a subset of resulting labels dimensions." + if coords is None: + return as_dataarray(arr, coords, dims, **kwargs), [] + + if isinstance(coords, list | tuple) and any(isinstance(c, tuple) for c in coords): + # xarray reads bare `(a, b)` as `(dim_name, values)`; normalize so a + # coords entry passed as a tuple behaves identically to a list. + coords = [list(c) if isinstance(c, tuple) else c for c in coords] + + expected = _coords_to_dict(coords, dims=dims) + if not expected: + return as_dataarray(arr, coords, dims, **kwargs), [] + + if isinstance(arr, pd.Series | pd.DataFrame): + converted = _named_pandas_to_dataarray(arr) + if converted is not None: + arr = converted + + if not isinstance(arr, DataArray): + # numpy/polars/unnamed-pandas inputs are positional — their only + # meaningful information is the values; any axis labels are + # auto-generated. Default dims to coords' keys so the conversion + # labels axes correctly (instead of dim_0/dim_1), then re-assign + # coords from expected so positional inputs align to coords by + # position. A shape mismatch surfaces here as a clear xarray + # "conflicting sizes" error rather than a confusing + # "coordinates do not match" further down. + if dims is None: + dims = list(expected) + arr = as_dataarray(arr, coords, dims=dims, **kwargs) + # Skip MultiIndex dims — re-assigning a PandasMultiIndex coord emits + # a FutureWarning and isn't needed (the conversion already used it). 
+ arr = arr.assign_coords( + { + d: expected[d] + for d in arr.dims + if d in expected and not isinstance(arr.indexes.get(d), pd.MultiIndex) + } + ) + + arr, projections = _project_onto_multiindex_levels(arr, expected) + + for dim, coord_values in expected.items(): + if dim not in arr.dims: + continue + if isinstance(arr.indexes.get(dim), pd.MultiIndex): + continue + expected_idx = _as_index(coord_values) + actual_idx = arr.coords[dim].to_index() + if actual_idx.equals(expected_idx): + continue + # Same values, different order → reindex to match expected order. + # Different value sets are left alone for downstream xarray alignment. + if len(actual_idx) == len(expected_idx) and set(actual_idx) == set( + expected_idx + ): + arr = arr.reindex({dim: expected_idx}) + + # expand_dims prepends new dimensions and their coordinate variables; + # the subsequent transpose restores coords order. Both are no-ops when + # the array already matches. Reconstruct so the DataArray's coords + # iteration order also follows coords (a Dataset built from this picks + # up its dim order from coord insertion). + expand = {k: v for k, v in expected.items() if k not in arr.dims} + if expand: + # expand_dims drops the level coords of a MultiIndex-backed dim, + # leaving a degenerate flat index that fails to align downstream. + # Broadcast against a proper Coordinates template instead. + plain = {} + for dim, coord_values in expand.items(): + mi = _as_multiindex(coord_values) + # Fall back to expand_dims when arr already carries one of the + # MultiIndex's level names as its own coord: broadcasting against + # the level coords would raise on the conflicting index. 
+ if mi is None or set(mi.names) & (set(arr.coords) | set(arr.dims)): + plain[dim] = coord_values + continue + template = DataArray( + np.zeros(len(mi)), + coords=Coordinates.from_pandas_multiindex(mi, dim), + dims=[dim], + ) + arr, _ = broadcast(arr, template) + if plain: + arr = arr.expand_dims(plain) + + target_dims = tuple(d for d in expected if d in arr.dims) + tuple( + d for d in arr.dims if d not in expected ) - mask = mask.broadcast_like(labels) - if mask.isnull().any(): - warn( - "Mask contains coordinates not covered by the data dimensions. " - "Missing values will be filled with False (masked out). " - "In a future version, this will raise an error. " - "Use mask.reindex() or `linopy.align()` to explicitly handle missing " - "coordinates.", - FutureWarning, - stacklevel=3, + arr = arr.transpose(*target_dims) + + coord_order = [c for c in target_dims if c in arr.coords] + [ + c for c in arr.coords if c not in target_dims + ] + if list(arr.coords) != coord_order: + arr = DataArray( + arr.variable, + coords={c: arr.coords[c] for c in coord_order}, + name=arr.name, + ) + + return arr, projections + + +@overload +def broadcast_to_coords( + arr: Any, + coords: CoordsLike | None = ..., + dims: DimsLike | None = ..., + *, + strict: Literal[True] = ..., + label: str, + **kwargs: Any, +) -> DataArray: ... + + +@overload +def broadcast_to_coords( + arr: Any, + coords: CoordsLike | None = ..., + dims: DimsLike | None = ..., + *, + strict: Literal[False], + label: None = ..., + **kwargs: Any, +) -> DataArray: ... + + +def broadcast_to_coords( + arr: Any, + coords: CoordsLike | None = None, + dims: DimsLike | None = None, + *, + strict: bool = True, + label: str | None = None, + **kwargs: Any, +) -> DataArray: + """ + Convert ``arr`` to a DataArray and broadcast it against ``coords``. 
+ + When ``coords`` carries named dimensions, the result is aligned with + them: positional inputs are labeled by position, shared dims with equal + values in a different order are reindexed, dims missing from ``arr`` + are expanded, dims naming levels of a stacked-MultiIndex coords dim are + projected onto it, and the result is transposed to ``coords`` order. + + ``strict`` decides what happens to anything broadcasting alone cannot + resolve — extra dims, disagreeing coord values, and MultiIndex coverage + gaps: + + - ``strict=True`` (default): raise, naming ``label`` in the error. + - ``strict=False``: pass through unchanged so downstream xarray + alignment can handle them. + + A stacked-MultiIndex dim of ``coords`` has *levels* (its component + index names, e.g. ``period`` / ``timestep``) and *level combinations* + (its elements — one tuple per position, e.g. ``(2030, 't1')``). Inputs + indexed by levels instead of the dim itself are implicitly projected + onto the dim's level combinations. These projections are deprecated in + both modes and emit an :class:`~linopy.EvolvingAPIWarning`; the v1 + convention will require them to be explicit. Two cases: + + - input misses a whole level → broadcasts across it; warns in both modes. + - input gives some level combinations no value (a *coverage gap*) → + warns under ``strict=False``, raises under ``strict=True`` (the error + lists the missing combinations). + + Parameters + ---------- + arr + The input to convert and broadcast. + coords + Coordinate values the result is broadcast against. ``None`` falls + back to plain conversion. + dims + Dimension names used to label positional axes. + strict + Check that the result stays within ``coords`` (raise on violation) + instead of passing violations through. + label + Name of the input in error messages (e.g. ``"lower bound"``). + Required when ``strict=True``, not accepted otherwise. + **kwargs + Forwarded to the underlying DataArray construction. 
+ + Returns + ------- + DataArray + Broadcast against ``coords``. + """ + if not strict: + da, projections = _broadcast_to_coords(arr, coords, dims, **kwargs) + _warn_implicit_projections(projections) + return da + + if label is None: + raise TypeError( + "broadcast_to_coords(strict=True) requires `label` to name the " + "input in error messages, e.g. label='lower bound'." ) - mask = mask.fillna(False).astype(bool) - return mask + subject = label + if coords is not None: + _coords_to_dict(coords, dims=dims) + try: + da, projections = _broadcast_to_coords(arr, coords, dims=dims, **kwargs) + except TypeError as err: + raise TypeError(f"{subject} could not be aligned to coords: {err}") from err + except (ValueError, CoordinateValidationError) as err: + raise ValueError(f"{subject} could not be aligned to coords: {err}") from err + for p in projections: + if p.has_gap: + preview = ", ".join(str(c) for c in p.missing[:5]) + if len(p.missing) > 5: + preview += f", … ({len(p.missing)} in total)" + raise ValueError( + f"{subject} could not be aligned to coords: no value for " + f"{len(p.missing)} level combination(s) of MultiIndex dimension " + f"{p.dim!r}: {preview}. The input is indexed by level(s) " + f"{p.levels} and must cover every combination." + ) + _warn_implicit_projections(projections) + validate_alignment(da, coords, dims=dims, label=label) + return da + + +def _warn_implicit_projections(projections: list[_LevelProjection]) -> None: + """ + Deprecation warnings for implicit MultiIndex-level projections. + + The same check in every mode (scenario B of the #732 / #737 discussion): + implicit projection is deprecated and raises under the v1 convention. The + strict path raises on coverage gaps before reaching here, so only partial + levels warn there; the non-strict path warns for both. + + TODO(#738): migrate to ``warn_legacy()`` / ``LinopySemanticsWarning`` + once the v1 semantics infrastructure (#717) lands. 
+ """ + for p in projections: + if p.is_partial or p.has_gap: + kind = ( + f"broadcasting level subset {p.levels}" + if p.is_partial + else f"filling uncovered level combinations with NaN " + f"(from level(s) {p.levels})" + ) + warn( + f"multiindex-projection: implicitly {kind} onto MultiIndex " + f"dimension {p.dim!r}. This is deprecated and will raise under " + f"the v1 convention; project the input onto the dimension " + f"explicitly (select with the dimension's level values) to " + f"keep current behavior.", + EvolvingAPIWarning, + stacklevel=3, + ) + + +def validate_alignment( + arr: DataArray, + coords: CoordsLike | None, + dims: DimsLike | None = None, + *, + label: str | None = None, +) -> None: + """ + Raise ``ValueError`` if ``arr`` is incompatible with ``coords``. + + ``arr`` is compatible with ``coords`` when both of the following hold: + + - every dim in ``arr.dims`` is also a dim in ``coords`` (no extras); + - for every dim shared between ``arr`` and ``coords``, the coord + values are equal. + + ``dims`` mirrors the ``dims`` argument of ``as_dataarray``: it names + unnamed entries in a sequence-form ``coords`` by position, so + ``coords=[[1, 2, 3]], dims=["x"]`` is enforced the same way as + ``coords={"x": [1, 2, 3]}``. + + ``label`` names the argument in error messages (e.g. ``"lower bound"``). + + No-op when ``coords`` is ``None`` or carries no named dimensions. + """ + if coords is None: + return + expected = _coords_to_dict(coords, dims=dims) + if not expected: + return + subject = label or "Value" + expected_dims = set(expected) + extra = set(arr.dims) - expected_dims + if extra: + raise ValueError( + f"{subject} has dimension(s) {sorted(extra, key=str)} not declared in coords " + f"({sorted(expected_dims, key=str)}). Add them to coords or remove them from " + f"{subject.lower()}." 
+ ) + for dim, coord_values in expected.items(): + if dim not in arr.dims: + continue + expected_mi = _as_multiindex(coord_values) + actual_mi = _as_multiindex(arr.indexes.get(dim)) + if expected_mi is not None or actual_mi is not None: + if ( + expected_mi is None + or actual_mi is None + or not actual_mi.equals(expected_mi) + ): + raise ValueError( + f"{subject}: MultiIndex for dimension {dim!r} does not " + f"match coords." + ) + continue + expected_idx = _as_index(coord_values) + actual_idx = arr.coords[dim].to_index() + if not actual_idx.equals(expected_idx): + raise ValueError( + f"{subject}: coordinate values for dimension {dim!r} do not match " + f"coords — expected {expected_idx.tolist()}, got " + f"{actual_idx.tolist()}." + ) + + +def _coords_to_dict( + coords: Sequence[Sequence | pd.Index] | Mapping, + dims: DimsLike | None = None, +) -> dict[Hashable, Any]: + """ + Normalize coords to a dict mapping dim names to coordinate values. + + Container forms: + + - ``xarray.Coordinates`` → kept dim entries only (MultiIndex level + coords dropped). + - ``Mapping`` → returned as a shallow ``dict`` copy. + - sequence-of-entries → each entry handled per the rules below. + + Sequence-entry rules (``i`` is the position in ``coords``, ``dims[i]`` + is the matching entry in ``dims`` when one exists). An entry is + *unlabeled* if it's an unnamed ``pd.Index`` or a bare ``list`` / + ``tuple`` / ``range`` / ``ndarray``. + + +---------------------------------+-----------------------+-----------+ + | Entry | Naming source | Outcome | + +=================================+=======================+===========+ + | ``pd.Index`` with ``.name`` | ``.name`` | accepted | + +---------------------------------+-----------------------+-----------+ + | unlabeled entry | ``dims[i]`` | accepted | + +---------------------------------+-----------------------+-----------+ + | unlabeled entry | — (no ``dims[i]``) | skipped | + | | | — xarray | + | | | assigns | + | | | ``dim_0`` | + | | | etc. 
| + +---------------------------------+-----------------------+-----------+ + | ``pd.MultiIndex`` with ``.name``| ``.name`` | accepted | + +---------------------------------+-----------------------+-----------+ + | ``pd.MultiIndex`` w/o ``.name`` | ``dims[i]`` | accepted | + | | | (named on | + | | | a copy) | + +---------------------------------+-----------------------+-----------+ + | ``pd.MultiIndex`` w/o ``.name`` | — (no ``dims[i]``) | TypeError | + +---------------------------------+-----------------------+-----------+ + | anything else (e.g. DataArray) | — | TypeError | + +---------------------------------+-----------------------+-----------+ + """ + if isinstance(coords, Coordinates): + # Coordinates iterates over every coord variable, including + # MultiIndex level coords. Keep only the entries that are dims. + return {d: coords[d] for d in coords.dims if d in coords} + if isinstance(coords, Mapping): + return dict(coords) + dim_names: list[Any] | None = None + if dims is not None: + dim_names = list(dims) if isinstance(dims, list | tuple) else [dims] + result: dict[Hashable, Any] = {} + for i, c in enumerate(coords): + if isinstance(c, pd.MultiIndex): + name = c.name or ( + dim_names[i] if dim_names and i < len(dim_names) else None + ) + if name is None: + raise TypeError( + "MultiIndex coords entries must have .name set so " + "xarray can use it as the dimension name. Set it via " + "`idx.name = 'my_dim'`, or pass `dims=[...]` to name " + "entries by position." 
+ ) + if c.name is None: + c = c.copy() + c.name = name + result[name] = c + elif isinstance(c, pd.Index): + name = ( + c.name + if c.name + else (dim_names[i] if dim_names and i < len(dim_names) else None) + ) + if name is not None: + result[name] = c + elif isinstance(c, list | tuple | range | np.ndarray): + if dim_names and i < len(dim_names): + result[dim_names[i]] = pd.Index(c, name=dim_names[i]) + else: + raise TypeError( + f"coords entries must be pd.Index or an unnamed sequence " + f"(list / tuple / range / numpy.ndarray); got " + f"{type(c).__name__}. For an xarray DataArray coord, pass " + f"`variable.indexes[]` (a pd.Index) instead." + ) + return result + + +def _named_pandas_to_dataarray(arr: pd.Series | pd.DataFrame) -> DataArray | None: + """ + Convert a pandas Series or DataFrame with fully named axes to a DataArray. + + Returns ``None`` if any axis (or MultiIndex level) is unnamed or + non-string, so the caller can fall back to ``as_dataarray``. + """ + names = list(arr.index.names) + if isinstance(arr, pd.DataFrame): + names += list(arr.columns.names) + if any(not isinstance(n, str) for n in names): + return None + + if isinstance(arr, pd.DataFrame): + if isinstance(arr.index, pd.MultiIndex) or isinstance( + arr.columns, pd.MultiIndex + ): + arr = arr.stack(list(range(arr.columns.nlevels)), future_stack=True) + return arr.to_xarray() + return DataArray(arr) + + return arr.to_xarray() # TODO: rename to to_pandas_dataframe diff --git a/linopy/expressions.py b/linopy/expressions.py index 31868234..673eaba9 100644 --- a/linopy/expressions.py +++ b/linopy/expressions.py @@ -49,6 +49,7 @@ LocIndexer, as_dataarray, assign_multiindex_safe, + broadcast_to_coords, check_common_keys_values, check_has_nulls, check_has_nulls_polars, @@ -582,7 +583,9 @@ def _add_constant( # so that missing data does not silently propagate through arithmetic. 
if np.isscalar(other) and join is None: return self.assign(const=self.const.fillna(0) + other) - da = as_dataarray(other, coords=self.coords, dims=self.coord_dims) + da = broadcast_to_coords( + other, coords=self.coords, dims=self.coord_dims, strict=False + ) self_const, da, needs_data_reindex = self._align_constant( da, fill_value=0, join=join ) @@ -611,7 +614,9 @@ def _apply_constant_op( - factor (other) is filled with fill_value (0 for mul, 1 for div) - coeffs and const are filled with 0 (additive identity) """ - factor = as_dataarray(other, coords=self.coords, dims=self.coord_dims) + factor = broadcast_to_coords( + other, coords=self.coords, dims=self.coord_dims, strict=False + ) self_const, factor, needs_data_reindex = self._align_constant( factor, fill_value=fill_value, join=join ) @@ -1103,7 +1108,9 @@ def to_constraint( ) if isinstance(rhs, CONSTANT_TYPES): - rhs = as_dataarray(rhs, coords=self.coords, dims=self.coord_dims) + rhs = broadcast_to_coords( + rhs, coords=self.coords, dims=self.coord_dims, strict=False + ) extra_dims = set(rhs.dims) - set(self.coord_dims) if extra_dims: @@ -1865,7 +1872,7 @@ def from_rule( cls, model: Model, rule: Callable, - coords: Sequence[Sequence | pd.Index | DataArray] | Mapping | None = None, + coords: Sequence[Sequence | pd.Index] | Mapping | None = None, ) -> LinearExpression: """ Create a linear expression from a rule and a set of coordinates. @@ -2290,7 +2297,7 @@ def as_expression( model : linopy.Model, optional Assigned model, by default None **kwargs : - Keyword arguments passed to `linopy.as_dataarray`. + Keyword arguments passed to `linopy.common.broadcast_to_coords`. 
Returns ------- @@ -2307,7 +2314,7 @@ def as_expression( return obj.to_linexpr() else: try: - obj = as_dataarray(obj, **kwargs) + obj = broadcast_to_coords(obj, strict=False, **kwargs) except ValueError as e: raise ValueError("Cannot convert to LinearExpression") from e return LinearExpression(obj, model) diff --git a/linopy/model.py b/linopy/model.py index 48a8200b..aa0e5d29 100644 --- a/linopy/model.py +++ b/linopy/model.py @@ -20,7 +20,7 @@ import pandas as pd import xarray as xr from deprecation import deprecated -from numpy import inf, ndarray +from numpy import inf from pandas.core.frame import DataFrame from pandas.core.series import Series from xarray import DataArray, Dataset @@ -31,7 +31,7 @@ as_dataarray, assign_multiindex_safe, best_int, - broadcast_mask, + broadcast_to_coords, maybe_replace_signs, replace_by_map, to_path, @@ -112,73 +112,6 @@ logger = logging.getLogger(__name__) -def _coords_to_dict( - coords: Sequence[Sequence | pd.Index | DataArray] | Mapping, -) -> dict[str, Any]: - """Normalize coords to a dict mapping dim names to coordinate values.""" - if isinstance(coords, Mapping): - return dict(coords) - # Sequence of indexes - result: dict[str, Any] = {} - for c in coords: - if isinstance(c, pd.Index) and c.name: - result[c.name] = c - return result - - -def _validate_dataarray_bounds(arr: Any, coords: Any) -> Any: - """ - Validate and expand DataArray bounds against explicit coords. - - If ``arr`` is not a DataArray, return it unchanged (``as_dataarray`` - will handle conversion). For DataArray inputs: - - - Raises ``ValueError`` if the array has dimensions not in coords. - - Raises ``ValueError`` if shared dimension coordinates don't match. - - Expands missing dimensions via ``expand_dims``. 
- """ - if not isinstance(arr, DataArray): - return arr - - expected = _coords_to_dict(coords) - if not expected: - return arr - - extra = set(arr.dims) - set(expected) - if extra: - raise ValueError(f"DataArray has extra dimensions not in coords: {extra}") - - for dim, coord_values in expected.items(): - if dim not in arr.dims: - continue - if isinstance(arr.indexes.get(dim), pd.MultiIndex): - continue - expected_idx = ( - coord_values - if isinstance(coord_values, pd.Index) - else pd.Index(coord_values) - ) - actual_idx = arr.coords[dim].to_index() - if not actual_idx.equals(expected_idx): - # Same values, different order → reindex to match expected order - if len(actual_idx) == len(expected_idx) and set(actual_idx) == set( - expected_idx - ): - arr = arr.reindex({dim: expected_idx}) - else: - raise ValueError( - f"Coordinates for dimension '{dim}' do not match: " - f"expected {expected_idx.tolist()}, got {actual_idx.tolist()}" - ) - - # Expand missing dimensions - expand = {k: v for k, v in expected.items() if k not in arr.dims} - if expand: - arr = arr.expand_dims(expand) - - return arr - - class Model: """ Linear optimization model. @@ -657,9 +590,9 @@ def add_variables( self, lower: Any = -inf, upper: Any = inf, - coords: Sequence[Sequence | pd.Index | DataArray] | Mapping | None = None, + coords: Sequence[Sequence | pd.Index] | Mapping | None = None, name: str | None = None, - mask: DataArray | ndarray | Series | None = None, + mask: MaskLike | None = None, binary: bool = False, integer: bool = False, semi_continuous: bool = False, @@ -682,12 +615,27 @@ def add_variables( upper : TYPE, optional Upper bound of the variable(s). Ignored if `binary` is True. The default is inf. - coords : list/xarray.Coordinates, optional - The coords of the variable array. - These are directly passed to the DataArray creation of - `lower` and `upper`. For every single combination of - coordinates a optimization variable is added to the model. - The default is None. 
+ coords : list/dict/xarray.Coordinates, optional + The coords of the variable array. When provided with **named + dimensions** (a ``Mapping``, ``xarray.Coordinates``, a + sequence of named ``pd.Index`` objects, or an unnamed + sequence paired with ``dims=`` in ``**kwargs``), ``coords`` + is the source of truth for the variable's dimensions, + order, and values. ``lower``, ``upper`` and ``mask`` are + aligned to this contract: + + - dims of every bound must be a subset of ``coords.dims``; + extra dims raise ``ValueError``; + - dim order in the variable always follows ``coords``; + - shared-dim coordinate values must equal ``coords``; same + values in a different order are auto-reindexed, different + value sets raise ``ValueError``; + - dims listed in ``coords`` but missing from a bound are + broadcast to ``coords`` shape. + + One optimization variable is added per combination of + coordinates. The default is ``None``, in which case the + shape is inferred from the bounds. name : str, optional Reference name of the added variables. The default None results in a name like "var1", "var2" etc. @@ -740,6 +688,67 @@ def add_variables( [7]: x[7] ∈ [0, inf] [8]: x[8] ∈ [0, inf] [9]: x[9] ∈ [0, inf] + + Strict coords-as-truth: a bound with an extra dim raises. + + >>> import xarray as xr + >>> m = Model() + >>> bad = xr.DataArray( + ... [[1.0, 2.0, 3.0]] * 2, + ... dims=["extra", "x"], + ... coords={"x": [0, 1, 2]}, + ... ) + >>> m.add_variables(lower=bad, coords=[pd.Index([0, 1, 2], name="x")], name="v") + Traceback (most recent call last): + ... + ValueError: lower bound has dimension(s) ['extra'] not declared in coords ... + + Strict coords-as-truth: a bound whose shared-dim values don't + match raises. + + >>> m = Model() + >>> wrong = xr.DataArray( + ... [1.0, 2.0, 3.0], dims=["x"], coords={"x": [10, 20, 30]} + ... ) + >>> m.add_variables( + ... lower=wrong, coords=[pd.Index([0, 1, 2], name="x")], name="v" + ... ) + Traceback (most recent call last): + ... 
+ ValueError: lower bound: coordinate values for dimension 'x' do not match coords ... + + Strict coords-as-truth, helpful side: a bound whose coord values + match ``coords`` only in a different order is auto-reindexed. + + >>> m = Model() + >>> reordered = xr.DataArray( + ... [3.0, 1.0, 2.0], dims=["x"], coords={"x": ["c", "a", "b"]} + ... ) + >>> v = m.add_variables( + ... lower=reordered, + ... coords=[pd.Index(["a", "b", "c"], name="x")], + ... name="r", + ... ) + >>> v.lower.values.tolist() + [1.0, 2.0, 3.0] + + Unnamed-coords sequence + ``dims=`` opts into the same strict + enforcement as a named index — extra dims still raise. + + >>> m = Model() + >>> m.add_variables(lower=bad, coords=[[0, 1, 2]], dims=["x"], name="w") + Traceback (most recent call last): + ... + ValueError: lower bound has dimension(s) ['extra'] not declared in coords ... + + The same strict contract applies to ``mask`` (including with + ``coords=[[...]], dims=[...]``). + + >>> m = Model() + >>> m.add_variables(mask=bad, coords=[[0, 1, 2]], dims=["x"], name="wm") + Traceback (most recent call last): + ... + ValueError: mask has dimension(s) ['extra'] not declared in coords ... """ if name is None: name = f"var{self._varnameCounter}" @@ -765,14 +774,12 @@ def add_variables( "Semi-continuous variables require a positive scalar lower bound." 
) - if coords is not None: - lower = _validate_dataarray_bounds(lower, coords) - upper = _validate_dataarray_bounds(upper, coords) - + lower_da = broadcast_to_coords(lower, coords, label="lower bound", **kwargs) + upper_da = broadcast_to_coords(upper, coords, label="upper bound", **kwargs) data = Dataset( { - "lower": as_dataarray(lower, coords, **kwargs), - "upper": as_dataarray(upper, coords, **kwargs), + "lower": lower_da, + "upper": upper_da, "labels": -1, } ) @@ -781,8 +788,12 @@ def add_variables( self._check_valid_dim_names(data) if mask is not None: - mask = as_dataarray(mask, coords=data.coords, dims=data.dims).astype(bool) - mask = broadcast_mask(mask, data.labels) + mask = broadcast_to_coords( + mask, + coords if coords is not None else data.coords, + label="mask", + **kwargs, + ).astype(bool) # Auto-mask based on NaN in bounds (use numpy for speed) if self.auto_mask: @@ -891,7 +902,7 @@ def add_constraints( sign: SignLike | None = ..., rhs: ConstantLike | VariableLike | ExpressionLike | None = ..., name: str | None = ..., - coords: Sequence[Sequence | pd.Index | DataArray] | Mapping | None = ..., + coords: Sequence[Sequence | pd.Index] | Mapping | None = ..., mask: MaskLike | None = ..., freeze: Literal[False] = ..., ) -> Constraint: ... @@ -907,7 +918,7 @@ def add_constraints( sign: SignLike | None = ..., rhs: ConstantLike | VariableLike | ExpressionLike | None = ..., name: str | None = ..., - coords: Sequence[Sequence | pd.Index | DataArray] | Mapping | None = ..., + coords: Sequence[Sequence | pd.Index] | Mapping | None = ..., mask: MaskLike | None = ..., freeze: Literal[True] = ..., ) -> CSRConstraint: ... 
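Review note: the coords-as-truth contract that the `add_variables` hunks above apply to `lower`, `upper` and `mask` can be sketched in plain Python. This is only an illustration under stated assumptions, not linopy's implementation. The helper name `align_bound` and the dict-based representation of a one-dimensional bound are hypothetical, chosen so the sketch runs without xarray. The error wording follows the messages quoted in the new docstring.

```python
from collections.abc import Hashable, Mapping, Sequence
from itertools import product


def align_bound(
    dim: str,
    values: Mapping[Hashable, float],
    coords: Mapping[str, Sequence[Hashable]],
    label: str,
) -> dict[tuple, float]:
    """Hypothetical stand-in for aligning a 1-D bound to declared coords.

    ``values`` maps coordinate labels of ``dim`` to bound values.
    ``coords`` maps every declared dimension to its (unique) labels.
    """
    # Rule 1: the bound's dimension must be declared in coords.
    if dim not in coords:
        raise ValueError(f"{label} has dimension(s) [{dim!r}] not declared in coords")

    # Rule 2: on a shared dimension the label sets must be equal.
    # Equal sets in a different order are accepted and reordered below.
    if set(values) != set(coords[dim]):
        raise ValueError(
            f"{label}: coordinate values for dimension {dim!r} do not match coords"
        )

    # Rules 3 and 4: iterate in coords order (so dimension order and label
    # order follow coords) and repeat the value along every other dimension.
    dims = list(coords)
    axis = dims.index(dim)
    return {
        key: values[key[axis]]
        for key in product(*(coords[d] for d in dims))
    }


coords = {"x": ["a", "b", "c"], "y": [0, 1]}
# Same labels as coords["x"], different order: accepted and reordered.
lower = align_bound("x", {"c": 3.0, "a": 1.0, "b": 2.0}, coords, label="lower bound")
```

In this sketch the result is keyed by the full coordinate tuple, so a bound given only along `x` ends up with one value per `(x, y)` combination. That mirrors the "broadcast to `coords` shape" bullet in the docstring.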
@@ -922,7 +933,7 @@ def add_constraints( sign: SignLike | None = None, rhs: ConstantLike | VariableLike | ExpressionLike | None = None, name: str | None = None, - coords: Sequence[Sequence | pd.Index | DataArray] | Mapping | None = None, + coords: Sequence[Sequence | pd.Index] | Mapping | None = None, mask: MaskLike | None = None, freeze: bool | None = None, ) -> ConstraintBase: @@ -1046,8 +1057,7 @@ def add_constraints( (data,) = xr.broadcast(data, exclude=[TERM_DIM]) if mask is not None: - mask = as_dataarray(mask, coords=data.coords, dims=data.dims).astype(bool) - mask = broadcast_mask(mask, data.labels) + mask = broadcast_to_coords(mask, data.coords, label="mask").astype(bool) # Auto-mask based on null expressions or NaN RHS (use numpy for speed) if self.auto_mask: @@ -1428,7 +1438,7 @@ def calculate_block_maps(self) -> None: @overload def linexpr( - self, *args: Sequence[Sequence | pd.Index | DataArray] | Mapping + self, *args: Sequence[Sequence | pd.Index] | Mapping ) -> LinearExpression: ... @overload @@ -1441,7 +1451,7 @@ def linexpr( *args: tuple[ConstantLike, str | Variable | ScalarVariable] | ConstantLike | Callable - | Sequence[Sequence | pd.Index | DataArray] + | Sequence[Sequence | pd.Index] | Mapping, ) -> LinearExpression: """ diff --git a/linopy/piecewise.py b/linopy/piecewise.py index ccc265a7..25a0ce17 100644 --- a/linopy/piecewise.py +++ b/linopy/piecewise.py @@ -1006,20 +1006,18 @@ def _broadcast_points( lin_exprs = [_to_linexpr(e) for e in exprs] - target_dims: set[str] = set() - for le in lin_exprs: - target_dims.update(str(d) for d in le.coord_dims) - - missing = target_dims - skip - {str(d) for d in points.dims} - if not missing: - return points + point_dims = {str(d) for d in points.dims} + # Iterate exprs/dims in order; a set would give a hash-dependent, + # run-varying expanded dimension order. 
expand_map: dict[str, list] = {} - for d in missing: - for le in lin_exprs: + for le in lin_exprs: + for dim in le.coord_dims: + d = str(dim) + if d in skip or d in point_dims or d in expand_map: + continue if d in le.coords: - expand_map[str(d)] = list(le.coords[d].values) - break + expand_map[d] = list(le.coords[d].values) if expand_map: points = points.expand_dims(expand_map) diff --git a/linopy/sos_reformulation.py b/linopy/sos_reformulation.py index 1f17ee92..4abfb755 100644 --- a/linopy/sos_reformulation.py +++ b/linopy/sos_reformulation.py @@ -119,7 +119,7 @@ def reformulate_sos1( upper_name = f"{prefix}{name}_upper" card_name = f"{prefix}{name}_card" - coords = [var.coords[d] for d in var.dims] + coords = [var.indexes[d] for d in var.dims] y = model.add_variables(coords=coords, name=y_name, binary=True) model.add_constraints(var <= M * y, name=upper_name) @@ -173,9 +173,9 @@ def reformulate_sos2( card_name = f"{prefix}{name}_card" z_coords = [ - pd.Index(var.coords[sos_dim].values[:-1], name=sos_dim) + pd.Index(var.indexes[sos_dim][:-1], name=sos_dim) if d == sos_dim - else var.coords[d] + else var.indexes[d] for d in var.dims ] z = model.add_variables(coords=z_coords, name=z_name, binary=True) diff --git a/linopy/types.py b/linopy/types.py index aca72082..6b4cf712 100644 --- a/linopy/types.py +++ b/linopy/types.py @@ -23,10 +23,7 @@ from linopy.variables import ScalarVariable, Variable CoordsLike: TypeAlias = ( - Sequence[Sequence | Index | DataArray] - | Mapping - | DataArrayCoordinates - | DatasetCoordinates + Sequence[Sequence | Index] | Mapping | DataArrayCoordinates | DatasetCoordinates ) DimsLike: TypeAlias = str | Iterable[Hashable] diff --git a/linopy/variables.py b/linopy/variables.py index cbf2fb87..755a3afc 100644 --- a/linopy/variables.py +++ b/linopy/variables.py @@ -37,6 +37,7 @@ VariableLabelIndex, as_dataarray, assign_multiindex_safe, + broadcast_to_coords, check_has_nulls, check_has_nulls_polars, filter_nulls_polars, @@ -327,7 +328,9 @@ 
def to_linexpr( linopy.LinearExpression Linear expression with the variables and coefficients. """ - coefficient = as_dataarray(coefficient, coords=self.coords, dims=self.dims) + coefficient = broadcast_to_coords( + coefficient, coords=self.coords, dims=self.dims, strict=False + ) coefficient = coefficient.reindex_like(self.labels, fill_value=0) coefficient = coefficient.fillna(0) ds = Dataset({"coeffs": coefficient, "vars": self.labels}).expand_dims( diff --git a/test/test_common.py b/test/test_common.py index 0c379a0b..692cb910 100644 --- a/test/test_common.py +++ b/test/test_common.py @@ -5,7 +5,9 @@ @author: fabian """ +import warnings from collections.abc import Callable +from typing import Any import numpy as np import pandas as pd @@ -15,16 +17,18 @@ from xarray import DataArray from xarray.testing.assertions import assert_equal -from linopy import LinearExpression, Model, Variable +from linopy import EvolvingAPIWarning, LinearExpression, Model, Variable from linopy.common import ( align, as_dataarray, assign_multiindex_safe, best_int, + broadcast_to_coords, get_dims_with_index_levels, is_constant, iterate_slices, maybe_group_terms_polars, + validate_alignment, ) from linopy.testing import assert_linequal, assert_varequal from linopy.types import CoordsLike @@ -345,6 +349,7 @@ def test_as_dataarray_with_ndarray_coords_dict_dims_aligned() -> None: def test_as_dataarray_with_ndarray_coords_dict_set_dims_not_aligned() -> None: + """as_dataarray converts only: dims label the axes, extra coord entries are dropped.""" target_dims = ("dim_0", "dim_1") target_coords = {"dim_0": ["a", "b"], "dim_2": ["A", "B"]} arr = np.array([[1, 2], [3, 4]]) @@ -354,6 +359,18 @@ def test_as_dataarray_with_ndarray_coords_dict_set_dims_not_aligned() -> None: assert "dim_2" not in da.coords +def test_broadcast_to_coords_with_ndarray_coords_dict_set_dims_not_aligned() -> None: + """Coords is source of truth: extra coord entries broadcast into the result.""" + target_dims = ("dim_0", 
"dim_1") + target_coords = {"dim_0": ["a", "b"], "dim_2": ["A", "B"]} + arr = np.array([[1, 2], [3, 4]]) + da = broadcast_to_coords(arr, coords=target_coords, dims=target_dims, strict=False) + # dims labels the positional axes; coords adds dim_2 by broadcast. + assert set(da.dims) == {"dim_0", "dim_1", "dim_2"} + assert list(da.coords["dim_0"].values) == ["a", "b"] + assert list(da.coords["dim_2"].values) == ["A", "B"] + + def test_as_dataarray_with_number() -> None: num = 1 da = as_dataarray(num, dims=["dim1"], coords=[["a"]]) @@ -483,6 +500,423 @@ def test_as_dataarray_with_unsupported_type() -> None: as_dataarray(lambda x: 1, dims=["dim1"], coords=[["a"]]) +def test_broadcast_to_coords_preserves_extra_dims() -> None: + """Extra dims in the input are not rejected — they broadcast downstream.""" + arr = DataArray( + [[1, 2], [3, 4], [5, 6]], + dims=["a", "t"], + coords={"a": [0, 1, 2], "t": [10, 20]}, + ) + coords = {"a": [0, 1, 2]} + da = broadcast_to_coords(arr, coords=coords, strict=False) + assert set(da.dims) == {"a", "t"} + assert list(da.coords["t"].values) == [10, 20] + + +def test_broadcast_to_coords_keeps_disjoint_shared_dim_values() -> None: + """Different value sets on a shared dim are passed through (xr.align handles).""" + arr = DataArray([1, 2, 3, 4, 5], dims=["a"], coords={"a": [0, 1, 2, 3, 4]}) + coords = {"a": [2, 3]} + da = broadcast_to_coords(arr, coords=coords, strict=False) + # No exception, no reindex; downstream alignment intersects. + assert list(da.coords["a"].values) == [0, 1, 2, 3, 4] + + +def test_broadcast_to_coords_expands_missing_multiindex_dim_keeps_levels() -> None: + """ + Broadcasting a missing MultiIndex dim must keep its level coords intact. + + expand_dims drops MultiIndex level coords, leaving a degenerate flat + index that fails to align downstream (PyPSA multi-investment regression). 
+ """ + midx = pd.MultiIndex.from_tuples( + [(2020, 0), (2020, 1), (2030, 0), (2030, 1)], + names=["period", "timestep"], + ) + midx.name = "snapshot" + sc = xr.Coordinates.from_pandas_multiindex(midx, "snapshot") + labels = DataArray( + [[1], [2], [3], [4]], coords={**sc, "name": ["1"]}, dims=["snapshot", "name"] + ) + coeff = broadcast_to_coords( + DataArray([1.0], coords={"name": ["1"]}, dims=["name"]), + coords=labels.coords, + dims=labels.dims, + strict=False, + ) + assert set(coeff.xindexes) == {"snapshot", "period", "timestep", "name"} + coeff.reindex_like(labels, fill_value=0) + + +def test_broadcast_to_coords_broadcasts_single_multiindex_level() -> None: + """ + A constant indexed by one MultiIndex level broadcasts across the MI dim. + + PyPSA multi-investment multiplies an expression over a (period, timestep) + 'snapshot' MultiIndex by a weighting indexed only by 'period'. Each entry + of the MultiIndex must pick up its level's value. + """ + idx = pd.MultiIndex.from_product([[1, 2], ["a", "b"]], names=("level1", "level2")) + idx.name = "dim_3" + coords = xr.Coordinates.from_pandas_multiindex(idx, "dim_3") + by_level1 = DataArray([10.0, 20.0], coords={"level1": [1, 2]}, dims=["level1"]) + + with pytest.warns(EvolvingAPIWarning, match=r"broadcasting level subset"): + da = broadcast_to_coords(by_level1, coords=coords, dims=["dim_3"], strict=False) + + assert da.dims == ("dim_3",) + assert isinstance(da.indexes["dim_3"], pd.MultiIndex) + assert da.sel(dim_3=(1, "a")).item() == 10.0 + assert da.sel(dim_3=(1, "b")).item() == 10.0 + assert da.sel(dim_3=(2, "a")).item() == 20.0 + assert da.sel(dim_3=(2, "b")).item() == 20.0 + + +def test_broadcast_to_coords_stacks_full_multiindex_levels() -> None: + """ + A constant indexed by all MI level names stacks element-wise into the MI dim. 
+ + PyPSA's storage_weightings is a pandas Series over a (period, timestep) + MultiIndex subset (the last snapshot of each period); it must align onto + the matching level combinations of the 'snapshot' MultiIndex. Combinations the subset does + not cover are left as NaN (broadcast path). + """ + idx = pd.MultiIndex.from_product([[1, 2], ["a", "b"]], names=("level1", "level2")) + idx.name = "dim_3" + coords = xr.Coordinates.from_pandas_multiindex(idx, "dim_3") + subset = pd.MultiIndex.from_tuples([(1, "a"), (2, "b")], names=["level1", "level2"]) + weights = pd.Series([10.0, 20.0], index=subset) + + with pytest.warns( + EvolvingAPIWarning, match=r"filling uncovered level combinations" + ): + da = broadcast_to_coords(weights, coords=coords, dims=["dim_3"], strict=False) + + assert da.dims == ("dim_3",) + assert isinstance(da.indexes["dim_3"], pd.MultiIndex) + assert da.sel(dim_3=(1, "a")).item() == 10.0 + assert da.sel(dim_3=(2, "b")).item() == 20.0 + assert np.isnan(da.sel(dim_3=(1, "b")).item()) + assert np.isnan(da.sel(dim_3=(2, "a")).item()) + + +def test_broadcast_to_coords_full_multiindex_full_coverage_is_silent() -> None: + """ + Full-level, fully-covering alignment is convention-clean → no warning. + + Aligning an input that reconstructs the whole MultiIndex onto its dim is + equivalent to the input already carrying that dim (future §11), so it must + not emit the EvolvingAPIWarning the partial/gap projections do. 
+ """ + idx = pd.MultiIndex.from_product([[1, 2], ["a", "b"]], names=("level1", "level2")) + idx.name = "dim_3" + coords = xr.Coordinates.from_pandas_multiindex(idx, "dim_3") + full = pd.Series([1.0, 2.0, 3.0, 4.0], index=idx) + + with warnings.catch_warnings(): + warnings.simplefilter("error", EvolvingAPIWarning) + da = broadcast_to_coords(full, coords=coords, dims=["dim_3"], strict=False) + + assert da.dims == ("dim_3",) + assert da.values.tolist() == [1.0, 2.0, 3.0, 4.0] + + +def test_broadcast_to_coords_level_projection_ambiguous_raises() -> None: + """A level name shared by two MI dims cannot be resolved.""" + a = pd.MultiIndex.from_product([[1, 2], ["a", "b"]], names=("shared", "x")) + b = pd.MultiIndex.from_product([[1, 2], ["c", "d"]], names=("shared", "y")) + coords = { + **xr.Coordinates.from_pandas_multiindex(a, "dimA"), + **xr.Coordinates.from_pandas_multiindex(b, "dimB"), + } + arr = DataArray([1.0, 2.0], coords={"shared": [1, 2]}, dims=["shared"]) + + with pytest.raises(ValueError, match=r"shared.*shared by MultiIndex"): + broadcast_to_coords(arr, coords=coords, strict=False) + + +def test_broadcast_to_coords_level_projection_missing_value_raises() -> None: + """A level value absent from the input cannot be broadcast.""" + idx = pd.MultiIndex.from_product([[1, 2], ["a", "b"]], names=("level1", "level2")) + idx.name = "dim_3" + coords = xr.Coordinates.from_pandas_multiindex(idx, "dim_3") + by_level1 = DataArray([10.0, 20.0], coords={"level1": [1, 9]}, dims=["level1"]) + + with pytest.raises(ValueError, match=r"Cannot align level.*is missing"): + broadcast_to_coords(by_level1, coords=coords, dims=["dim_3"], strict=False) + + +def test_broadcast_to_coords_unrelated_multiindex_series_still_unstacks() -> None: + """A MI Series whose levels match no coords MI dim keeps unstacking.""" + sub = pd.MultiIndex.from_product([["p", "q"], [1, 2]], names=["foo", "bar"]) + series = pd.Series([1.0, 2.0, 3.0, 4.0], index=sub) + + da = broadcast_to_coords(series, 
coords={"time": [0, 1, 2]}, strict=False) + + assert set(da.dims) == {"time", "foo", "bar"} + + +# --------------------------------------------------------------------------- +# Strictness: as_dataarray (convert) ⊂ broadcast_to_coords(strict=False) ⊂ broadcast_to_coords(strict=True) +# --------------------------------------------------------------------------- + + +def test_as_dataarray_does_not_expand_missing_coord_dims() -> None: + """as_dataarray converts; only broadcast_to_coords expands missing dims.""" + coords = {"a": [0, 1], "b": [10, 20]} + arr = np.array([1, 2]) + + converted = as_dataarray(arr, coords=coords, dims=["a"]) + assert converted.dims == ("a",) + + broadcast = broadcast_to_coords(arr, coords=coords, dims=["a"], strict=False) + assert broadcast.dims == ("a", "b") + + +def test_extra_dims_pass_broadcast_rung_fail_strict_rung() -> None: + """Extra dims pass through the broadcast rung but fail the strict rung.""" + arr = DataArray( + [[1, 2], [3, 4]], dims=["a", "t"], coords={"a": [0, 1], "t": [10, 20]} + ) + coords = {"a": [0, 1]} + + da = broadcast_to_coords(arr, coords=coords, strict=False) + assert set(da.dims) == {"a", "t"} + + with pytest.raises(ValueError, match=r"not declared in coords"): + broadcast_to_coords(arr, coords, label="lower bound") + + +def test_broadcast_to_coords_rejects_multiindex_coverage_gap() -> None: + """A coverage gap warns on the broadcast rung but raises on the strict rung.""" + idx = pd.MultiIndex.from_product([[1, 2], ["a", "b"]], names=("level1", "level2")) + idx.name = "dim_3" + coords = xr.Coordinates.from_pandas_multiindex(idx, "dim_3") + subset = pd.MultiIndex.from_tuples([(1, "a"), (2, "b")], names=["level1", "level2"]) + weights = pd.Series([10.0, 20.0], index=subset) + + with pytest.warns( + EvolvingAPIWarning, match=r"filling uncovered level combinations" + ): + broadcast_to_coords(weights, coords=coords, dims=["dim_3"], strict=False) + + with pytest.raises(ValueError, match=r"no value for .* level 
combination"): + broadcast_to_coords(weights, coords, dims=["dim_3"], label="lower bound") + + +def test_broadcast_to_coords_rejects_unnamed_multiindex_mismatch() -> None: + """ + A MultiIndex input with unnamed levels cannot be projected by level name, + so it keeps its own index under the coords dim. The strict rung must still + reject it when its level combinations don't cover coords, just as the + named-level coverage-gap case does. + """ + idx = pd.MultiIndex.from_product([[2020, 2030], ["t1", "t2"]], names=("p", "t")) + idx.name = "snapshot" + coords = xr.Coordinates.from_pandas_multiindex(idx, "snapshot") + sparse_unnamed = pd.Series({(2020, "t1"): 1.0, (2030, "t2"): 2.0}) + + with pytest.raises(ValueError, match=r"MultiIndex for dimension 'snapshot'"): + broadcast_to_coords( + sparse_unnamed, coords, dims=["snapshot"], label="lower bound" + ) + + +def test_broadcast_to_coords_strict_partial_level_warns() -> None: + """ + Per-level bounds broadcast across the MI dim, with the deprecation warning. + + Scenario B (#732 / #737 discussion): implicit MI-level projection is + deprecated everywhere, including the strict (bounds/mask) path, and will + raise under the v1 convention. 
+ """ + idx = pd.MultiIndex.from_product([[1, 2], ["a", "b"]], names=("level1", "level2")) + idx.name = "dim_3" + coords = xr.Coordinates.from_pandas_multiindex(idx, "dim_3") + by_level1 = DataArray([10.0, 20.0], coords={"level1": [1, 2]}, dims=["level1"]) + + with pytest.warns(EvolvingAPIWarning, match=r"broadcasting level subset"): + da = broadcast_to_coords(by_level1, coords, dims=["dim_3"], label="lower bound") + + assert da.sel(dim_3=(1, "b")).item() == 10.0 + assert da.sel(dim_3=(2, "a")).item() == 20.0 + + +def test_validate_alignment_rejects_extra_dims() -> None: + arr = DataArray( + [[1, 2], [3, 4]], dims=["a", "b"], coords={"a": [0, 1], "b": [0, 1]} + ) + with pytest.raises(ValueError, match=r"not declared in coords"): + validate_alignment(arr, {"a": [0, 1]}) + + +def test_validate_alignment_rejects_value_mismatch() -> None: + arr = DataArray([1, 2, 3], dims=["a"], coords={"a": [0, 1, 2]}) + with pytest.raises(ValueError, match="do not match coords"): + validate_alignment(arr, {"a": [10, 20, 30]}) + + +def test_validate_alignment_allows_subset_dims() -> None: + """arr.dims ⊂ coords.dims is fine (broadcasting fills the missing dim).""" + arr = DataArray([1, 2, 3], dims=["a"], coords={"a": [0, 1, 2]}) + validate_alignment(arr, {"a": [0, 1, 2], "b": [10, 20]}) # no raise + + +def test_validate_alignment_unnamed_coords_and_dims() -> None: + """coords=[[...]], dims=[...] 
enforces the same contract as a named mapping.""" + arr = DataArray([1, 2, 3], dims=["x"], coords={"x": [0, 1, 2]}) + validate_alignment(arr, [[0, 1, 2]], dims=["x"]) # no raise + + bad = DataArray( + [[1, 2], [3, 4]], dims=["x", "y"], coords={"x": [0, 1], "y": [0, 1]} + ) + with pytest.raises(ValueError, match=r"not declared in coords"): + validate_alignment(bad, [[0, 1]], dims=["x"]) + + +def test_validate_alignment_label_in_error() -> None: + arr = DataArray( + [[1, 2], [3, 4]], dims=["a", "b"], coords={"a": [0, 1], "b": [0, 1]} + ) + with pytest.raises(ValueError, match=r"lower bound has dimension\(s\) \['b'\]"): + validate_alignment(arr, {"a": [0, 1]}, label="lower bound") + + +def test_broadcast_to_coords_strict_requires_label() -> None: + """strict=True without label raises: errors must name their subject.""" + with pytest.raises(TypeError, match=r"requires `label`"): + broadcast_to_coords(np.array([1, 2]), {"x": [0, 1]}) # type: ignore[call-overload] + + +def test_broadcast_to_coords_wraps_conversion_errors() -> None: + with pytest.raises(ValueError, match=r"lower bound could not be aligned"): + broadcast_to_coords(np.array([1, 2]), {"x": [0, 1, 2]}, label="lower bound") + + +def test_broadcast_to_coords_preserves_type_errors() -> None: + """Unsupported input types stay TypeError (don't become ValueError).""" + with pytest.raises(TypeError, match=r"lower bound could not be aligned"): + broadcast_to_coords(lambda x: x, {"x": [0, 1, 2]}, label="lower bound") + + +def test_broadcast_to_coords_does_not_relabel_coords_errors() -> None: + """Coords-side TypeError carries its own message, not the value label.""" + mi = pd.MultiIndex.from_product([[0, 1], ["a", "b"]], names=["i", "j"]) + with pytest.raises(TypeError, match=r"MultiIndex.*must have \.name set"): + broadcast_to_coords(np.array([1, 2, 3, 4]), [mi], label="lower bound") + + +class TestCoordsToDictRules: + """ + One test per row of the ``_coords_to_dict`` rules table. 
+ + Each test name states the rule it pins; the assertions show the + expected outcome. Together they form the executable spec of how + sequence-form ``coords`` entries are named. + """ + + @staticmethod + def _parse(coords: Any, dims: Any = None) -> dict: + from linopy.common import _coords_to_dict + + return _coords_to_dict(coords, dims=dims) + + # -- container forms --------------------------------------------------- + + def test_mapping_is_returned_as_shallow_dict_copy(self) -> None: + src = {"x": [0, 1, 2], "y": [10, 20]} + result = self._parse(src) + assert result == src + assert result is not src + + def test_xarray_coordinates_keeps_only_dim_entries(self) -> None: + midx = pd.MultiIndex.from_product([[0, 1], ["a", "b"]], names=["i", "j"]) + coords = xr.Coordinates.from_pandas_multiindex(midx, "stacked") + result = self._parse(coords) + assert set(result) == {"stacked"} + + # -- pd.Index entries -------------------------------------------------- + + def test_named_pd_index_uses_its_name(self) -> None: + result = self._parse([pd.Index([0, 1, 2], name="x")]) + assert set(result) == {"x"} + + def test_unnamed_pd_index_with_dims_uses_dims(self) -> None: + result = self._parse([pd.Index([0, 1, 2])], dims=["x"]) + assert set(result) == {"x"} + + def test_unnamed_pd_index_without_dims_is_size_only(self) -> None: + # Same as a bare sequence: contributes no dim name; xarray assigns + # ``dim_0`` downstream. 
+ assert self._parse([pd.Index([0, 1, 2])]) == {} + m = Model() + v = m.add_variables(coords=[pd.Index([0, 1, 2])]) + assert v.dims == ("dim_0",) + + # -- pd.MultiIndex entries -------------------------------------------- + + def test_named_multiindex_uses_its_name(self) -> None: + mi = pd.MultiIndex.from_product([[0, 1], ["a", "b"]], names=["i", "j"]) + mi.name = "multi" + result = self._parse([mi]) + assert set(result) == {"multi"} + + def test_unnamed_multiindex_with_dims_uses_dims(self) -> None: + mi = pd.MultiIndex.from_product([[0, 1], ["a", "b"]], names=["i", "j"]) + result = self._parse([mi], dims=["multi"]) + assert set(result) == {"multi"} + assert result["multi"].name == "multi" + assert mi.name is None # caller's MultiIndex not mutated + + def test_unnamed_multiindex_without_dims_raises(self) -> None: + mi = pd.MultiIndex.from_product([[0, 1], ["a", "b"]], names=["i", "j"]) + with pytest.raises(TypeError, match=r"MultiIndex.*must have \.name set"): + self._parse([mi]) + + # -- bare sequence entries -------------------------------------------- + + @pytest.mark.parametrize( + "entry", + [[0, 1, 2], (0, 1, 2), range(3), np.array([0, 1, 2])], + ids=["list", "tuple", "range", "ndarray"], + ) + def test_bare_sequence_with_dims_uses_dims(self, entry: Any) -> None: + result = self._parse([entry], dims=["x"]) + assert set(result) == {"x"} + + @pytest.mark.parametrize( + "entry", + [[0, 1, 2], (0, 1, 2), range(3), np.array([0, 1, 2])], + ids=["list", "tuple", "range", "ndarray"], + ) + def test_bare_sequence_without_dims_is_silently_skipped(self, entry: Any) -> None: + assert self._parse([entry]) == {} + + @pytest.mark.parametrize( + "entry", + [[0, 1, 2], (0, 1, 2), range(3), np.array([0, 1, 2])], + ids=["list", "tuple", "range", "ndarray"], + ) + def test_bare_sequence_without_dims_falls_through_to_xarray_dim_0( + self, entry: Any + ) -> None: + m = Model() + v = m.add_variables(coords=[entry]) + assert v.dims == ("dim_0",) + + # -- unsupported entries 
---------------------------------------------- + + def test_dataarray_entry_raises(self) -> None: + with pytest.raises(TypeError, match=r"coords entries must be pd\.Index"): + self._parse([DataArray([0, 1, 2], dims=["x"])]) + + def test_unknown_type_entry_raises(self) -> None: + class Foo: ... + + with pytest.raises(TypeError, match=r"coords entries must be pd\.Index"): + self._parse([Foo()]) + + def test_best_int() -> None: # Test for int8 assert best_int(127) == np.int8 diff --git a/test/test_constraint.py b/test/test_constraint.py index a1b33d66..d3581de9 100644 --- a/test/test_constraint.py +++ b/test/test_constraint.py @@ -453,6 +453,45 @@ def test_constraint_rhs_setter_with_expression_and_constant( assert mc.lhs.nterm == 2 +def test_constraint_rhs_setter_broadcasts_missing_dim() -> None: + """Rhs assignment broadcasts against the constraint coords: missing dims expand.""" + m = Model() + x = m.add_variables( + coords=[pd.RangeIndex(2, name="i"), pd.RangeIndex(3, name="j")], name="x" + ) + con = m.add_constraints(1 * x >= 0, name="con") + + con.rhs = xr.DataArray([1.0, 2.0], dims=["i"], coords={"i": [0, 1]}) # type: ignore + + assert dict(con.rhs.sizes) == {"i": 2, "j": 3} + assert (con.rhs.sel(i=1) == 2.0).all() + + +def test_constraint_rhs_setter_projects_multiindex_level() -> None: + """ + Rhs indexed by one MultiIndex level is projected onto the stacked dim. + + Regression: as_expression must convert constants with the broadcast rung + (broadcast_to_coords), not plain conversion — otherwise the level dim + collides with the MI level coord downstream (xarray AlignmentError). 
+ """ + idx = pd.MultiIndex.from_product([[1, 2], ["a", "b"]], names=("level1", "level2")) + idx.name = "dim_3" + coords = xr.Coordinates.from_pandas_multiindex(idx, "dim_3") + m = Model() + x = m.add_variables(coords=coords, name="x") + con = m.add_constraints(1 * x >= 0, name="con") + + rhs_by_level = xr.DataArray( + [10.0, 20.0], coords={"level1": [1, 2]}, dims=["level1"] + ) + with pytest.warns(linopy.EvolvingAPIWarning, match="broadcasting level subset"): + con.rhs = rhs_by_level # type: ignore + + assert con.rhs.sel(dim_3=(1, "b")).item() == 10.0 + assert con.rhs.sel(dim_3=(2, "a")).item() == 20.0 + + def test_constraint_labels_setter_invalid(c: linopy.constraints.CSRConstraint) -> None: # Test that assigning labels raises AttributeError (Constraint is frozen) with pytest.raises(AttributeError): diff --git a/test/test_constraints.py b/test/test_constraints.py index 1667bfec..acc41b2e 100644 --- a/test/test_constraints.py +++ b/test/test_constraints.py @@ -258,20 +258,29 @@ def test_masked_constraints_broadcast() -> None: assert (m.constraints.labels.bc2[:, 0:5] != -1).all() assert (m.constraints.labels.bc2[:, 5:10] == -1).all() + # Pandas Series with named index missing a dim is broadcast to data.coords. + mask_pd = pd.Series( + [True, False, True] + [False] * 7, index=pd.RangeIndex(10, name="dim_0") + ) + m.add_constraints(1 * x + 10 * y, EQUAL, 0, name="bc_pd", mask=mask_pd) + assert (m.constraints.labels.bc_pd[[0, 2], :] != -1).all() + assert (m.constraints.labels.bc_pd[[1, 3, 4, 5, 6, 7, 8, 9], :] == -1).all() + + # Mask with sparse coords (subset of data's coords) now raises instead of + # emitting a FutureWarning — the rule from the bounds path applies here too. 
mask3 = xr.DataArray( [True, True, False, False, False], dims=["dim_0"], coords={"dim_0": range(5)}, ) - with pytest.warns(FutureWarning, match="Missing values will be filled"): + with pytest.raises( + ValueError, match=r"mask: coordinate values for dimension 'dim_0'" + ): m.add_constraints(1 * x + 10 * y, EQUAL, 0, name="bc3", mask=mask3) - assert (m.constraints.labels.bc3[0:2, :] != -1).all() - assert (m.constraints.labels.bc3[2:5, :] == -1).all() - assert (m.constraints.labels.bc3[5:10, :] == -1).all() # Mask with extra dimension not in data should raise mask4 = xr.DataArray([True, False], dims=["extra_dim"]) - with pytest.raises(AssertionError, match="not a subset"): + with pytest.raises(ValueError, match=r"mask has dimension\(s\) \['extra_dim'\]"): m.add_constraints(1 * x + 10 * y, EQUAL, 0, name="bc4", mask=mask4) diff --git a/test/test_linear_expression.py b/test/test_linear_expression.py index 79a1029b..82aba70e 100644 --- a/test/test_linear_expression.py +++ b/test/test_linear_expression.py @@ -17,7 +17,14 @@ from xarray.core.types import JoinOptions from xarray.testing import assert_equal -from linopy import LinearExpression, Model, QuadraticExpression, Variable, merge +from linopy import ( + EvolvingAPIWarning, + LinearExpression, + Model, + QuadraticExpression, + Variable, + merge, +) from linopy.constants import HELPER_DIMS, TERM_DIM from linopy.expressions import ScalarLinearExpression from linopy.testing import assert_linequal, assert_quadequal @@ -288,6 +295,27 @@ def test_linear_expression_multi_indexed(u: Variable) -> None: assert isinstance(expr, LinearExpression) +def test_multiply_expression_by_multiindex_level_constant(u: Variable) -> None: + """ + Expression over a MultiIndex dim times a single-level constant. + + Mirrors PyPSA's ``soc_delta * storage_weightings``: ``u`` is indexed by + the (level1, level2) MultiIndex ``dim_3``; the weighting is indexed only + by ``level1``. 
The product must not raise, and each ``dim_3`` entry must + take the weight of its ``level1``. + """ + by_level1 = xr.DataArray([10.0, 20.0], coords={"level1": [1, 2]}, dims=["level1"]) + + with pytest.warns(EvolvingAPIWarning, match=r"broadcasting level subset"): + expr = (1 * u) * by_level1 + + coeffs = expr.coeffs.squeeze("_term") + assert coeffs.sel(dim_3=(1, "a")).item() == 10.0 + assert coeffs.sel(dim_3=(1, "b")).item() == 10.0 + assert coeffs.sel(dim_3=(2, "a")).item() == 20.0 + assert coeffs.sel(dim_3=(2, "b")).item() == 20.0 + + def test_linear_expression_with_errors(m: Model, x: Variable) -> None: with pytest.raises(TypeError): x / x @@ -518,6 +546,47 @@ def test_matmul_expr_and_const(x: Variable, y: Variable) -> None: assert_linequal(expr.dot(const), target) +def test_matmul_contracts_only_shared_dims(z: Variable) -> None: + """ + A @ b contracts the genuinely shared dims and keeps the rest. + + ``z`` has dims (dim_0, dim_1); ``b`` has (dim_1, location). Only dim_1 + is shared, so the result must keep dim_0 and location. A conversion that + broadcast ``b`` to ``z``'s coords would expand dim_0 into ``b`` and + contract it away too — collapsing the result to (location,) only. 
+ """ + expr = 1 * z + b = xr.DataArray( + np.ones((3, 2)), + coords={"dim_1": expr.data.indexes["dim_1"], "location": ["L1", "L2"]}, + dims=["dim_1", "location"], + ) + + res = expr @ b + + assert set(res.coord_dims) == {"dim_0", "location"} + assert_linequal(res, (expr * b).sum("dim_1")) + + +def test_matmul_contracts_all_dims_when_const_covers_them(z: Variable) -> None: + """B covering all of a's dims (and more) contracts a's dims, keeping b's extras.""" + expr = 1 * z # dims (dim_0, dim_1) + b = xr.DataArray( + np.ones((2, 3, 2)), + coords={ + "dim_0": expr.data.indexes["dim_0"], + "dim_1": expr.data.indexes["dim_1"], + "location": ["L1", "L2"], + }, + dims=["dim_0", "dim_1", "location"], + ) + + res = expr @ b + + assert set(res.coord_dims) == {"location"} + assert_linequal(res, (expr * b).sum(["dim_0", "dim_1"])) + + def test_matmul_wrong_input(x: Variable, y: Variable, z: Variable) -> None: expr = 10 * x + y + z with pytest.raises(TypeError): diff --git a/test/test_piecewise_constraints.py b/test/test_piecewise_constraints.py index c44af394..72b57265 100644 --- a/test/test_piecewise_constraints.py +++ b/test/test_piecewise_constraints.py @@ -1383,6 +1383,23 @@ def test_broadcast_over_extra_dims(self) -> None: assert "generator" in delta.dims assert "time" in delta.dims + def test_broadcast_points_dim_order_follows_exprs(self) -> None: + """Expanded dims follow the expression dim order, not set ordering.""" + import xarray as xr + + from linopy.piecewise import BREAKPOINT_DIM, _broadcast_points + + m = Model() + coords = [ + pd.Index(["v0", "v1"], name="alpha"), + pd.Index(["w0", "w1"], name="beta"), + pd.Index([0, 1], name="gamma"), + ] + x = m.add_variables(coords=coords, name="x") + points = xr.DataArray([0, 1, 2, 3], dims=[BREAKPOINT_DIM]) + out = _broadcast_points(points, 1 * x) + assert out.dims == ("alpha", "beta", "gamma", BREAKPOINT_DIM) + # =========================================================================== # NaN masking diff --git 
a/test/test_repr.py b/test/test_repr.py index 0b8a6a6b..ebe9804c 100644 --- a/test/test_repr.py +++ b/test/test_repr.py @@ -40,6 +40,7 @@ multiindex = pd.MultiIndex.from_product( [list("asdfhjkg"), list("asdfghj")], names=["level_0", "level_1"] ) +multiindex.name = "multi" g = m.add_variables(coords=[multiindex], name="g") # create linear expression for each variable diff --git a/test/test_variable.py b/test/test_variable.py index b14b746e..c5e315bd 100644 --- a/test/test_variable.py +++ b/test/test_variable.py @@ -419,42 +419,189 @@ def test_bound_types_with_coords( ) def test_dataarray_coord_mismatch(self, model: "Model", coords: Any) -> None: lower = DataArray([0, 0, 0], dims=["x"], coords={"x": [0, 1, 2]}) - with pytest.raises(ValueError, match="do not match"): + with pytest.raises(ValueError, match="lower bound.*do not match coords"): model.add_variables(lower=lower, coords=coords, name="x") def test_dataarray_coord_mismatch_upper(self, model: "Model") -> None: upper = DataArray([1, 2, 3], dims=["x"], coords={"x": [10, 20, 30]}) - with pytest.raises(ValueError, match="do not match"): + with pytest.raises(ValueError, match="upper bound.*do not match coords"): model.add_variables(upper=upper, coords=self.SEQ_COORDS, name="x") def test_dataarray_extra_dims(self, model: "Model") -> None: - lower = DataArray([[1, 2], [3, 4]], dims=["x", "y"]) - with pytest.raises(ValueError, match="extra dimensions"): + lower = DataArray( + [[1, 2], [3, 4], [5, 6]], dims=["x", "y"], coords={"x": [0, 1, 2]} + ) + with pytest.raises(ValueError, match=r"lower bound has dimension\(s\) \['y'\]"): model.add_variables(lower=lower, coords=self.DICT_COORDS, name="x") + def test_mask_extra_dims_with_unnamed_coords_and_dims(self, model: "Model") -> None: + """Mask is validated against coords + dims= like lower/upper.""" + mask = DataArray( + [[True, False], [True, False], [False, True]], + dims=["x", "extra"], + coords={"x": [0, 1, 2]}, + ) + with pytest.raises(ValueError, match=r"mask has 
dimension\(s\) \['extra'\]"): + model.add_variables( + mask=mask, + coords=[[0, 1, 2]], + dims=["x"], + name="m", + ) + + def test_dataarray_coord_reorder(self, model: "Model") -> None: + """A bound whose coords differ only in order is reindexed to coords.""" + lower = DataArray([3, 1, 2], dims=["x"], coords={"x": ["c", "a", "b"]}) + var = model.add_variables( + lower=lower, coords=[pd.Index(["a", "b", "c"], name="x")], name="x" + ) + assert (var.data.lower == [1, 2, 3]).all() + + def test_positional_bound_aligns_to_coords(self, model: "Model") -> None: + """ + Numpy / unnamed-pandas bounds align to coords positionally, + even when the input's auto-generated coord values would not match. + """ + coords = [pd.Index(list("abc"), name="x")] + # numpy array — no labels at all, positional alignment. + v_np = model.add_variables(upper=np.array([1, 2, 3]), coords=coords, name="np") + assert v_np.dims == ("x",) + assert (v_np.data.upper.sel(x="a") == 1).all() + assert (v_np.data.upper.sel(x="c") == 3).all() + # Unnamed Series — pandas index is auto-generated, ignored in favour + # of coords (positional alignment, principle: coords is source of truth). + v_s = model.add_variables( + upper=pd.Series([10, 20, 30]), coords=coords, name="s" + ) + assert v_s.dims == ("x",) + assert (v_s.data.upper.sel(x="a") == 10).all() + assert (v_s.data.upper.sel(x="c") == 30).all() + # Unnamed DataFrame — both axes positional. + v_df = model.add_variables( + upper=pd.DataFrame([[1, 2], [3, 4], [5, 6]]), + coords=[pd.Index(list("abc"), name="x"), pd.Index(list("xy"), name="y")], + name="df", + ) + assert v_df.dims == ("x", "y") + assert (v_df.data.upper.sel(x="a", y="x") == 1).all() + assert (v_df.data.upper.sel(x="c", y="y") == 6).all() + + def test_positional_bound_wrong_size_raises_clear_error( + self, model: "Model" + ) -> None: + """ + Shape mismatch on positional inputs surfaces as a size error, + not a 'coordinates do not match' error. 
+ """ + coords = [pd.Index(list("abc"), name="x")] + with pytest.raises(ValueError, match=r"upper bound could not be aligned"): + model.add_variables(upper=np.array([1, 2]), coords=coords, name="np_bad") + with pytest.raises(ValueError, match=r"upper bound could not be aligned"): + model.add_variables(upper=pd.Series([1, 2]), coords=coords, name="s_bad") + + def test_unnamed_pd_index_is_size_only(self, model: "Model") -> None: + bound = DataArray([1, 2, 3], dims=["dim_0"]) + var = model.add_variables(upper=bound, coords=[pd.Index([0, 1, 2])], name="x") + assert (var.upper == [1, 2, 3]).all() + # -- Broadcasting missing dims ----------------------------------------- - def test_dataarray_broadcast_missing_dim(self, model: "Model") -> None: + @pytest.mark.parametrize( + "bound", + [ + pytest.param( + DataArray([1, 2, 3], dims=["time"], coords={"time": range(3)}), + id="DataArray", + ), + pytest.param( + pd.Series(index=pd.RangeIndex(3, name="time"), data=[1, 2, 3]), + id="Series", + ), + pytest.param( + pd.DataFrame( + index=pd.RangeIndex(3, name="time"), + columns=pd.Index(["red"], name="colour"), + data=[[1], [2], [3]], + ), + id="DataFrame", + ), + pytest.param( + pd.Series( + index=pd.MultiIndex.from_product( + [pd.RangeIndex(3), ["red"]], names=("time", "colour") + ), + data=[1, 2, 3], + ), + id="Series-multiindex", + ), + pytest.param( + pd.DataFrame( + index=pd.RangeIndex(3, name="time"), + columns=pd.MultiIndex.from_product( + [["a", "b"], ["red"]], names=("space", "colour") + ), + data=[[1, 1], [2, 2], [3, 3]], + ), + id="DataFrame-multicolumns", + ), + pytest.param( + pd.DataFrame( + index=pd.MultiIndex.from_product( + [pd.RangeIndex(3), ["a", "b"]], names=("time", "space") + ), + columns=pd.Index(["red"], name="colour"), + data=[[1], [1], [2], [2], [3], [3]], + ), + id="DataFrame-multiindex", + ), + ], + ) + def test_bound_broadcast_missing_dim( + self, model: "Model", bound: DataArray | pd.Series | pd.DataFrame + ) -> None: + """Pandas / DataArray bounds 
missing dims are broadcast to coords.""" time = pd.RangeIndex(3, name="time") space = pd.Index(["a", "b"], name="space") - lower = DataArray([1, 2, 3], dims=["time"], coords={"time": range(3)}) - var = model.add_variables(lower=lower, coords=[time, space], name="x") - assert set(var.data.dims) == {"time", "space"} - assert var.data.sizes == {"time": 3, "space": 2} - # Verify broadcast filled with actual values, not NaN + colour = pd.Index(["red"], name="colour") + var = model.add_variables( + lower=-bound, upper=bound, coords=[time, space, colour], name="x" + ) + assert var.dims == ("time", "space", "colour") + assert var.data.lower.dims == ("time", "space", "colour") + assert var.data.upper.dims == ("time", "space", "colour") + assert var.data.sizes == {"time": 3, "space": 2, "colour": 1} assert not var.data.lower.isnull().any() - assert (var.data.lower.sel(space="a") == [1, 2, 3]).all() - assert (var.data.lower.sel(space="b") == [1, 2, 3]).all() - - # -- Special coord formats --------------------------------------------- + assert (var.data.lower.sel(space="a", colour="red") == [-1, -2, -3]).all() + assert (var.data.lower.sel(space="b", colour="red") == [-1, -2, -3]).all() + assert (var.data.upper.sel(space="a", colour="red") == [1, 2, 3]).all() - def test_multiindex_coords(self, model: "Model") -> None: - idx = pd.MultiIndex.from_product( - [[1, 2], ["a", "b"]], names=("level1", "level2") + @pytest.mark.parametrize( + "lower, upper", + [ + pytest.param(0, "da", id="scalar-lower+da-upper"), + pytest.param("da", 1, id="da-lower+scalar-upper"), + pytest.param("da", "da", id="da-lower+da-upper"), + ], + ) + def test_dataarray_broadcast_missing_dim_order( + self, model: "Model", lower: Any, upper: Any + ) -> None: + """Dimension order follows coords, not the type of the bounds (#706).""" + x = pd.Index(["a", "b", "c"], name="x") + y = pd.Index(["X", "Y"], name="y") + full = DataArray( + np.arange(6).reshape(3, 2), coords={"x": x, "y": y}, dims=["x", "y"] ) - idx.name 
= "multi" - var = model.add_variables(lower=0, upper=1, coords=[idx], name="x") - assert var.shape == (4,) + # bounds are DataArrays missing the 'y' dimension + da = full.sum("y") + lower = da if lower == "da" else lower + upper = da if upper == "da" else upper + var = model.add_variables(lower=lower, upper=upper, coords=[x, y], name="x") + assert var.dims == ("x", "y") + assert var.data.lower.dims == ("x", "y") + assert var.data.upper.dims == ("x", "y") + + # -- Special coord formats --------------------------------------------- def test_xarray_coordinates_object(self, model: "Model") -> None: time = pd.RangeIndex(3, name="time") @@ -527,7 +674,7 @@ def test_one_dataarray_mismatches_other_ok(self, model: "Model") -> None: """Only the mismatched bound should raise, regardless of the other.""" lower = DataArray([0, 0, 0], dims=["x"], coords={"x": [0, 1, 2]}) upper = DataArray([1, 1], dims=["x"], coords={"x": [10, 20]}) - with pytest.raises(ValueError, match="do not match"): + with pytest.raises(ValueError, match=r"upper bound.*do not match coords"): model.add_variables( lower=lower, upper=upper, coords=self.SEQ_COORDS, name="x" ) @@ -629,7 +776,7 @@ def test_reordered_coords_reindexed(self, model: "Model") -> None: def test_reordered_coords_different_values_raises(self, model: "Model") -> None: """Overlapping but not identical coord sets must still raise.""" lower = DataArray([10, 20], dims=["x"], coords={"x": ["a", "b"]}) - with pytest.raises(ValueError, match="do not match"): + with pytest.raises(ValueError, match=r"lower bound.*do not match coords"): model.add_variables(lower=lower, coords={"x": ["a", "c"]}, name="x") # -- String and datetime coordinates ----------------------------------- @@ -657,9 +804,81 @@ def test_string_coords_mismatch(self, model: "Model") -> None: lower = DataArray( [0, 0], dims=["region"], coords={"region": ["north", "south"]} ) - with pytest.raises(ValueError, match="do not match"): + with pytest.raises(ValueError, match=r"lower 
bound.*do not match coords"): model.add_variables( lower=lower, coords={"region": ["north", "south", "east"]}, name="x", ) + + +class TestAddVariablesMultiIndexCoords: + """MultiIndex-specific coord handling in add_variables.""" + + @pytest.fixture + def model(self) -> "Model": + return Model() + + @pytest.fixture + def midx(self) -> pd.MultiIndex: + mi = pd.MultiIndex.from_product([[0, 1], ["a", "b"]], names=("l1", "l2")) + mi.name = "multi" + return mi + + def test_scalar_bounds(self, model: "Model", midx: pd.MultiIndex) -> None: + var = model.add_variables(lower=0, upper=1, coords=[midx], name="x") + assert var.shape == (4,) + assert var.dims == ("multi",) + + def test_dataarray_bound(self, model: "Model", midx: pd.MultiIndex) -> None: + bound = DataArray([1, 2, 3, 4], dims=["multi"], coords={"multi": midx}) + var = model.add_variables(upper=bound, coords=[midx], name="x") + assert var.shape == (4,) + assert (var.data.upper == [1, 2, 3, 4]).all() + + def test_dataarray_bound_broadcast( + self, model: "Model", midx: pd.MultiIndex + ) -> None: + time = pd.Index([10, 20, 30], name="time") + bound = DataArray([1, 2, 3, 4], dims=["multi"], coords={"multi": midx}) + var = model.add_variables( + lower=-bound, upper=bound, coords=[midx, time], name="x" + ) + assert var.dims == ("multi", "time") + assert var.shape == (4, 3) + assert (var.data.upper.sel(time=10) == [1, 2, 3, 4]).all() + + def test_without_name_raises(self, model: "Model") -> None: + midx = pd.MultiIndex.from_product([[0, 1], ["a", "b"]], names=("l1", "l2")) + with pytest.raises(TypeError, match="MultiIndex.*must have .name set"): + model.add_variables(lower=0, upper=1, coords=[midx], name="x") + + def test_mismatched_multiindex_raises( + self, model: "Model", midx: pd.MultiIndex + ) -> None: + other = pd.MultiIndex.from_product([[0, 1], ["x", "y"]], names=("l1", "l2")) + other.name = "multi" + bound = DataArray([1, 2, 3, 4], dims=["multi"], coords={"multi": other}) + with pytest.raises(ValueError, 
match="MultiIndex.*does not match"): + model.add_variables(upper=bound, coords=[midx], name="x") + + def test_single_level_bound_broadcasts( + self, model: "Model", midx: pd.MultiIndex + ) -> None: + bound = DataArray([5, 6], dims=["l1"], coords={"l1": [0, 1]}) + # Implicit level projection is deprecated (scenario B) — warns until + # the v1 convention makes it an error. + with pytest.warns( + linopy.EvolvingAPIWarning, match=r"broadcasting level subset" + ): + var = model.add_variables(upper=bound, coords=[midx], name="x") + assert var.dims == ("multi",) + assert (var.data.upper == [5, 5, 6, 6]).all() + + def test_incomplete_level_bound_raises( + self, model: "Model", midx: pd.MultiIndex + ) -> None: + subset = pd.MultiIndex.from_tuples([(0, "a"), (1, "b")], names=("l1", "l2")) + bound = pd.Series([1, 2], index=subset) + with pytest.raises(ValueError, match="no value for .* level combination"): + model.add_variables(upper=bound, coords=[midx], name="x") diff --git a/test/test_variables.py b/test/test_variables.py index 37de6aff..e55ca680 100644 --- a/test/test_variables.py +++ b/test/test_variables.py @@ -123,20 +123,29 @@ def test_variables_mask_broadcast() -> None: assert (y.labels[:, 0:5] != -1).all() assert (y.labels[:, 5:10] == -1).all() + # Pandas Series with named index missing a dim is broadcast to data.coords. + mask_pd = pd.Series( + [True, False, True] + [False] * 7, index=pd.RangeIndex(10, name="dim_0") + ) + v = m.add_variables(lower, upper, name="v", mask=mask_pd) + assert (v.labels[[0, 2], :] != -1).all() + assert (v.labels[[1, 3, 4, 5, 6, 7, 8, 9], :] == -1).all() + + # Mask with sparse coords (subset of data's coords) now raises instead of + # emitting a FutureWarning — the rule from the bounds path applies here too. 
mask3 = xr.DataArray( [True, True, False, False, False], dims=["dim_0"], coords={"dim_0": range(5)}, ) - with pytest.warns(FutureWarning, match="Missing values will be filled"): - z = m.add_variables(lower, upper, name="z", mask=mask3) - assert (z.labels[0:2, :] != -1).all() - assert (z.labels[2:5, :] == -1).all() - assert (z.labels[5:10, :] == -1).all() + with pytest.raises( + ValueError, match=r"mask: coordinate values for dimension 'dim_0'" + ): + m.add_variables(lower, upper, name="z", mask=mask3) # Mask with extra dimension not in data should raise mask4 = xr.DataArray([True, False], dims=["extra_dim"]) - with pytest.raises(AssertionError, match="not a subset"): + with pytest.raises(ValueError, match=r"mask has dimension\(s\) \['extra_dim'\]"): m.add_variables(lower, upper, name="w", mask=mask4)