Merge with main and demo fixes #36
Conversation
# Set index on timestamp
print("\n📍 Setting index on 'timestamp' (ascending)...")
df_indexed = df.set_index('timestamp', ascending=True)
The ascending=True is a departure from pandas.DataFrame.set_index, but I think this is okay, because:
- Most of the time folks would expect some kind of ordering to be available when they set an index, so that iloc can be used.
- The ascending=True naming is consistent with how ordering is defined in pandas.
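For comparison, here is a minimal sketch of the two-step pandas equivalent. `pandas.DataFrame.set_index` has no `ascending` parameter, so the ordering that makes `iloc` meaningful comes from a separate `sort_index(ascending=True)` call. The sample data is invented for illustration and is not taken from the demo.

```python
import pandas as pd

# Illustrative data, deliberately out of timestamp order.
df = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "value": [30, 10, 20],
        "timestamp": pd.to_datetime(["2024-01-03", "2024-01-01", "2024-01-02"]),
    }
)

# pandas: set_index only assigns labels; sorting is an explicit second step.
df_indexed = df.set_index("timestamp").sort_index(ascending=True)

# With a defined order, positional access is unambiguous:
# position 0 is the row with the earliest timestamp.
first = df_indexed.iloc[0]
```

The `ascending=True` keyword proposed in this PR would fold those two steps into one call, reusing the keyword name that `sort_index` and `sort_values` already use.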
df_ordered = DataFrame(ordered)

# Now use indexing
df_ordered._index = Index('priority', ascending=False)
I'm a bit confused about the intention of this line. I guess it's so that we don't override the sorting introduced in the ibis expression?
def __init__(self, data: ibis_types.Table):
    self._data = data
    self._index: Index | None = None  # Explicit ordering specification
I'm curious if we'd want to separate the ordering from the index definition for a bit better compatibility with pandas, like how bigframes does it? I suspect that adds an unnecessary level of complexity?
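To make the trade-off concrete, here is a rough sketch of what keeping the two concepts apart could look like. Every name here (`Ordering`, `IndexLabel`, `FrameState`, `positional_order`) is hypothetical and is not taken from leanframe or bigframes. The fallback rule is also only an assumption about one possible design.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Ordering:
    """Hypothetical: which column defines row order, and in which direction."""
    column: str
    ascending: bool = True


@dataclass(frozen=True)
class IndexLabel:
    """Hypothetical: which column supplies row labels; implies no ordering."""
    column: str


@dataclass
class FrameState:
    """Hypothetical holder for the two independent pieces of metadata."""
    index: Optional[IndexLabel] = None
    ordering: Optional[Ordering] = None

    def positional_order(self) -> Optional[Ordering]:
        # An explicit ordering wins; otherwise fall back to the index
        # column in ascending order; with neither, no order is defined.
        if self.ordering is not None:
            return self.ordering
        if self.index is not None:
            return Ordering(self.index.column, ascending=True)
        return None


# Labels come from 'timestamp', but rows are ordered by 'priority' descending.
state = FrameState(
    index=IndexLabel("timestamp"),
    ordering=Ordering("priority", ascending=False),
)
```

The extra class and the fallback rule are the added complexity in question. The single `Index('priority', ascending=False)` object in the PR avoids both, at the cost of tying labels and ordering to the same column.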
        path: info["extracted_name"] for path, info in self.nested_fields.items()
    }

def filter_by(self, **kwargs) -> "DataFrameHandler":
This also is a departure from pandas, but I do quite like it. Happy to keep it. In regular pandas, basically every filtering method I can think of requires an implicit join on the index column.
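As a rough illustration of why keyword-style filtering needs no index alignment, the sketch below evaluates each condition directly against the row it belongs to. The standalone `filter_by` function and the list-of-dicts representation are assumptions made for this example. They are not the `DataFrameHandler.filter_by` implementation from the PR.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def filter_by(rows: List[Row], **kwargs: Any) -> List[Row]:
    """Keep rows where every named column equals the given value.

    Raises KeyError if a keyword names a column the row does not have.
    """
    return [
        row
        for row in rows
        if all(row[column] == value for column, value in kwargs.items())
    ]


# Illustrative rows; the column names are invented for this example.
rows = [
    {"id": 1, "status": "open", "priority": 2},
    {"id": 2, "status": "closed", "priority": 2},
    {"id": 3, "status": "open", "priority": 1},
]

open_high = filter_by(rows, status="open", priority=2)
```

Because the predicate is built from column names and literal values only, there is no separate boolean mask whose labels would have to be matched back to the frame.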
Hi Tim,
Let us discuss those questions during the next meeting! Thanks a lot for looking into it!
Andy
Sent: Thursday, 28 May 2026 at 19:41
From: "Tim Sweña (Swast)" ***@***.***>
To: tswast/leanframe ***@***.***>
CC: andyorange ***@***.***>,Author ***@***.***>
Subject: Re: [tswast/leanframe] Merge with main and demo fixes (PR #36)
@tswast approved this pull request.
Thanks!
In demos/demo_indexing_with_nested.py:
+ data = {
+ 'id': [1, 2, 3, 4, 5],
+ 'value': [10, 20, 30, 40, 50],
+ 'timestamp': pd.date_range('2024-01-01', periods=5)
+ }
+
+ # Create leanframe DataFrame
+ ibis_table = ibis.memtable(data)
+ df = DataFrame(ibis_table)
+
+ print(f"\nOriginal DataFrame shape: {len(df.columns)} columns")
+ print(f"Columns: {df.columns.tolist()}")
+
+ # Set index on timestamp
+ print("\n📍 Setting index on 'timestamp' (ascending)...")
+ df_indexed = df.set_index('timestamp', ascending=True)
The ascending=True is a departure from pandas.DataFrame.set_index, but I think this is okay, because:
Most of the time folks would expect some kind of ordering to be available when they set an index, so that iloc can be used.
The ascending=True naming is consistent with how ordering is defined in pandas.
In docs/indexing_guide.md:
+## Advanced: Custom Ordering Logic
+
+For complex ordering (multiple columns, null handling), directly use Ibis:
+
+```python
+# Complex ordering with Ibis
+ordered = df._data.order_by([
+ ibis.desc(df._data.priority),
+ df._data.timestamp
+])
+
+from leanframe.core.frame import DataFrame
+df_ordered = DataFrame(ordered)
+
+# Now use indexing
+df_ordered._index = Index('priority', ascending=False)
I'm a bit confused about the intention of this line. I guess it's so that we don't override the sorting introduced in the ibis expression?
In leanframe/core/frame.py:
@@ -32,11 +40,72 @@ class DataFrame:
def __init__(self, data: ibis_types.Table):
self._data = data
+ self._index: Index | None = None # Explicit ordering specification
I'm curious if we'd want to separate the ordering from the index definition for a bit better compatibility with pandas, like how bigframes does it? I suspect that adds an unnecessary level of complexity?
In leanframe/core/frame.py:
@@ -375,6 +517,30 @@ def extracted_fields(self) -> dict[str, str]:
path: info["extracted_name"] for path, info in self.nested_fields.items()
}
+ def filter_by(self, **kwargs) -> "DataFrameHandler":
This also is a departure from pandas, but I do quite like it. Happy to keep it. In regular pandas, basically every filtering method I can think of requires an implicit join on the index column.
Hi Tim,
I merged the current main branch into an_indexing and fixed the demos by re-adding the filter_by method to the DataFrameHandler. Indexing and nesting now work again!
Would love to see the branch merged into main. Also, Vodafone would love to see me mentioned as a collaborator for the project...
The remaining issues come from the linter and refer to files you created. I do not want to interfere there, so I will leave those changes to you.
P.S.: I created the demos with ChatGPT. As that is also Google, I hope there are no rights conflicts.