Commit e493d24

Merge pull request #61 from ncdcdev/feat/issue-55-include-row-data
feat: add include_row_data parameter for Excel search (issue #55)
2 parents 1883b73 + c525f3e commit e493d24

6 files changed

Lines changed: 313 additions & 18 deletions

docs/usage.md

Lines changed: 48 additions & 0 deletions
@@ -208,6 +208,7 @@ The `sharepoint_excel` tool allows you to read and search Excel files in SharePo
 | `query` | str \| None | None | Search keyword (enables search mode) |
 | `sheet` | str \| None | None | Sheet name (get specific sheet only) |
 | `cell_range` | str \| None | None | Cell range (e.g., "A1:D10") |
+| `include_row_data` | bool | False | Include entire row data for each search match (search mode only) |
 
 ### Basic Workflow
 
@@ -248,6 +249,53 @@ result = sharepoint_excel(
 }
 ```
 
+**Search with Row Data (`include_row_data=True`):**
+
+Use `include_row_data=True` to get the entire row data for each match in a single call, avoiding N+1 reads.
+
+```python
+result = sharepoint_excel(
+    file_path="/sites/finance/Shared Documents/report.xlsx",
+    query="budget",
+    include_row_data=True
+)
+```
+
+```json
+{
+  "matches": [
+    {
+      "sheet": "Sheet1",
+      "coordinate": "B5",
+      "value": "Monthly Budget",
+      "row_data": [
+        {"coordinate": "A5", "value": "Category"},
+        {"coordinate": "B5", "value": "Monthly Budget"},
+        {"coordinate": "C5", "value": 50000}
+      ]
+    }
+  ]
+}
+```
+
+**Performance Guidelines:**
+- **Small scale** (<50 matches): Highly effective, recommended
+- **Medium scale** (50-200 matches): Effective, monitor response size
+- **Large scale** (>200 matches): Consider response size impact
+
+**Important Notes:**
+- `row_data` includes only non-null cells from the matched row
+- `row_data` does NOT include header rows (even with frozen_rows)
+- To understand column meanings, first read `A1:Z5` for header context
+- **Multiple matches in same row**: Each match gets independent `row_data` (duplicated)
+  - Example: If "budget" matches both A5 and B5, both matches will include the same row_data
+  - This ensures each match is self-contained but may increase response size
+
+**Verified Use Case:**
+- 23 matches processed in 1 call (vs. 24 calls without `include_row_data`)
+- Token savings: ~2,300 tokens
+- Response time: Significantly reduced
+
 #### 2. Read All Data (Default)
 ```python
 # Get all sheets and all data

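Because every match carries its own copy of `row_data`, a caller that wants one record per row may prefer to collapse the duplicates on the client side. The following is a minimal sketch, assuming the response has the JSON shape documented in `docs/usage.md`; `unique_rows` is a hypothetical helper and is not part of this commit.

```python
import json
import re


# Hypothetical helper (not part of the commit): collapse matches that share a row.
def unique_rows(response_json: str) -> dict[tuple[str, int], list[dict]]:
    """Map (sheet, row number) to row_data, keeping the first copy per row."""
    rows: dict[tuple[str, int], list[dict]] = {}
    for match in json.loads(response_json).get("matches", []):
        # "B5" -> 5; coordinates are assumed to be plain A1-style references.
        row_number = int(re.search(r"\d+", match["coordinate"]).group())
        rows.setdefault((match["sheet"], row_number), match.get("row_data", []))
    return rows


# Two matches in the same row, each with a duplicated row_data list.
shared_row = [
    {"coordinate": "A5", "value": "budget"},
    {"coordinate": "B5", "value": "Monthly Budget"},
]
sample = json.dumps({
    "matches": [
        {"sheet": "Sheet1", "coordinate": "A5", "value": "budget", "row_data": shared_row},
        {"sheet": "Sheet1", "coordinate": "B5", "value": "Monthly Budget", "row_data": shared_row},
    ]
})

rows = unique_rows(sample)
print(len(rows))  # prints 1
```

Keying on `(sheet, row number)` keeps rows from different sheets apart; the duplicated payload is still transferred, so this only simplifies downstream processing and does not reduce response size.
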
docs/usage_ja.md

Lines changed: 48 additions & 0 deletions
@@ -208,6 +208,7 @@ results = sharepoint_docs_search(
 | `query` | str \| None | None | Search keyword (enables search mode) |
 | `sheet` | str \| None | None | Sheet name (fetch only the specified sheet) |
 | `cell_range` | str \| None | None | Cell range (e.g., "A1:D10") |
+| `include_row_data` | bool | False | Include the entire row's data for each search match (search mode only) |
 
 ### Basic Workflow
 
@@ -248,6 +249,53 @@ result = sharepoint_excel(
 }
 ```
 
+**Search with Row Data (`include_row_data=True`):**
+
+With `include_row_data=True`, the entire row's data for each match is returned in a single call (avoiding N+1 reads).
+
+```python
+result = sharepoint_excel(
+    file_path="/sites/finance/Shared Documents/report.xlsx",
+    query="予算",
+    include_row_data=True
+)
+```
+
+```json
+{
+  "matches": [
+    {
+      "sheet": "Sheet1",
+      "coordinate": "B5",
+      "value": "月間予算",
+      "row_data": [
+        {"coordinate": "A5", "value": "カテゴリ"},
+        {"coordinate": "B5", "value": "月間予算"},
+        {"coordinate": "C5", "value": 50000}
+      ]
+    }
+  ]
+}
+```
+
+**Performance Guidelines:**
+- **Small scale** (<50 matches): Highly effective, recommended
+- **Medium scale** (50-200 matches): Effective, watch the response size
+- **Large scale** (>200 matches): Consider the impact on response size
+
+**Important Notes:**
+- `row_data` contains only the non-null cells of the matched row
+- `row_data` does not include header rows (even when frozen_rows is set)
+- To understand what the columns mean, first read `A1:Z5` to check the header context
+- **Multiple matches in the same row**: Each match includes its own independent `row_data` (duplicated)
+  - Example: If "予算" matches both A5 and B5, both matches include the same row_data
+  - Each match is self-contained, but the response size may increase
+
+**Verified Use Case:**
+- 23 matches processed in 1 call (24 calls needed without `include_row_data`)
+- Token savings: approx. 2,300 tokens
+- Response time: Significantly reduced
+
 #### 2. Read All Data (Default)
 ```python
 # Get all sheets and all data

src/server.py

Lines changed: 14 additions & 5 deletions
@@ -456,6 +456,7 @@ def sharepoint_excel(
     include_frozen_rows: bool = True,
     include_cell_styles: bool = False,
     expand_axis_range: bool = False,
+    include_row_data: bool = False,
     ctx: Context | None = None,
 ) -> str:
     """
@@ -478,6 +479,9 @@ def sharepoint_excel(
         expand_axis_range: Automatically extend a partial single-column/row range back to its start (default: false)
             True: e.g. "J50:J100" → "J1:J100" (extended to row 1)
             Use when frozen_rows=0 and the header context is unclear
+        include_row_data: In search mode, include the data of the entire row for each matched cell (default: false)
+            True: add row_data (list of non-null cells in the same row) to each match
+            Ignored in read mode
         ctx: FastMCP context (injected automatically)
 
     Returns:
@@ -497,7 +501,9 @@ def sharepoint_excel(
 
         # Search mode
         if query:
-            return parser.search_cells(file_path, query, sheet_name=sheet)
+            return parser.search_cells(
+                file_path, query, sheet_name=sheet, include_row_data=include_row_data
+            )
 
         # Read mode
         return parser.parse_to_json(
@@ -544,7 +550,7 @@ def register_tools():
     mcp.tool(
         description=(
             "Read or search Excel files in SharePoint. "
-            "Search mode: use 'query' parameter to find cells containing specific text (returns cell locations). "
+            "Search mode: use 'query' parameter to find cells containing specific text (returns cell locations and optionally row data). "
             "Read mode: use 'sheet' and 'cell_range' parameters to retrieve data from specific sections. "
             "When cell_range is specified with include_frozen_rows=True (default), frozen rows are automatically "
             "included even if they are outside the specified range. frozen_rows indicates the number of header rows "
@@ -555,10 +561,13 @@ def register_tools():
             "Header detection: For sheets with frozen_rows > 0, headers are automatically included with include_frozen_rows=True (default). "
             "For sheets with frozen_rows=0, headers are not automatically included and context may be unclear. "
             "ALWAYS read exactly 5 rows for header check: 'A1:Z5' (NOT 'A1:Z50' or more). "
+            "IMPORTANT: include_row_data=True returns matched row data only (not headers), same-row matches duplicate data. "
+            "Always read 'A1:Z5' first for header context. Effective for <200 matches. "
             "Prefer 'query' search when possible to locate data first. "
-            "Workflow: 1) Search OR read 'A1:Z5' for header check, "
-            "2) Read specific range (include_frozen_rows adds frozen headers automatically), "
-            "3) If frozen_rows=0 and header context is unclear, retry with expand_axis_range=True "
+            "Workflow: 1) Read 'A1:Z5' for header check (REQUIRED for understanding column structure), "
+            "2) Search with query (optionally with include_row_data=True to get matched row data), "
+            "3) Read specific range if needed (include_frozen_rows adds frozen headers automatically), "
+            "4) If frozen_rows=0 and header context is unclear, retry with expand_axis_range=True "
             "to auto-include row 1 (for columns) or column A (for rows)."
         )
     )(sharepoint_excel)

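The updated tool description asks callers to read `A1:Z5` first because `row_data` never contains header cells. One way a client could combine the two results is sketched below. This is an illustration only: `label_row` and the `headers` mapping are hypothetical, and the mapping is assumed to have been built by the caller from the header read, whose response shape is not shown in this diff.

```python
import re


# Hypothetical helper (not part of the commit): attach header names to row_data cells.
def label_row(row_data: list[dict], headers: dict[str, str]) -> dict[str, object]:
    """Return {header name: cell value}, falling back to the column letter."""
    labeled: dict[str, object] = {}
    for cell in row_data:
        # "C5" -> "C"; coordinates are assumed to be plain A1-style references.
        column = re.match(r"[A-Z]+", cell["coordinate"]).group()
        labeled[headers.get(column, column)] = cell["value"]
    return labeled


# Assumed header mapping, e.g. derived by the caller from an 'A1:Z5' read.
headers = {"A": "Type", "B": "Name", "C": "Amount"}
row_data = [
    {"coordinate": "A5", "value": "Category"},
    {"coordinate": "B5", "value": "Monthly Budget"},
    {"coordinate": "C5", "value": 50000},
]

labeled = label_row(row_data, headers)
print(labeled)  # prints {'Type': 'Category', 'Name': 'Monthly Budget', 'Amount': 50000}
```
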
src/sharepoint_excel.py

Lines changed: 65 additions & 12 deletions
@@ -34,6 +34,7 @@ def search_cells(
         file_path: str,
         query: str,
         sheet_name: str | None = None,
+        include_row_data: bool = False,
     ) -> str:
         """
         Search cell contents and return the matching locations
@@ -67,25 +68,35 @@ def search_cells(
             # If sheet_name is given, search that sheet first
             if sheet_name:
                 if sheet_name in workbook.sheetnames:
-                    self._scan_sheet(workbook[sheet_name], sheet_name, query, matches)
+                    self._scan_sheet(
+                        workbook[sheet_name],
+                        sheet_name,
+                        query,
+                        matches,
+                        include_row_data,
+                    )
 
                     # Fall back to scanning all sheets if there are no matches
                     if len(matches) == 0:
                         for sn in workbook.sheetnames:
                             if sn == sheet_name:
                                 continue
-                            self._scan_sheet(workbook[sn], sn, query, matches)
+                            self._scan_sheet(
+                                workbook[sn], sn, query, matches, include_row_data
+                            )
                 else:
                     # If sheet_name does not exist, treat it as unspecified and search all sheets
                     warnings.append(
                         f"Sheet '{sheet_name}' not found. Searching all sheets instead."
                     )
                     for sn in workbook.sheetnames:
-                        self._scan_sheet(workbook[sn], sn, query, matches)
+                        self._scan_sheet(
+                            workbook[sn], sn, query, matches, include_row_data
+                        )
             else:
                 # Search all sheets
                 for sn in workbook.sheetnames:
-                    self._scan_sheet(workbook[sn], sn, query, matches)
+                    self._scan_sheet(workbook[sn], sn, query, matches, include_row_data)
 
             logger.info(f"Found {len(matches)} matches for query '{query}'")
 
@@ -270,6 +281,7 @@ def _scan_sheet(
         sheet_name_for_result: str,
         query: str,
         matches: list[dict[str, Any]],
+        include_row_data: bool = False,
     ) -> None:
         """
         Scan the cells in the sheet and append cells that match query to matches
@@ -281,31 +293,72 @@ def _scan_sheet(
         # In that case, the fallback logic that uses iter_rows() takes over.
         if hasattr(sheet, "_cells"):
             # Scan only cells that actually exist (fast)
+            # Collect matches first (accessing the sheet while iterating _cells would change the dict)
+            new_matches: list[dict[str, Any]] = []
             for cell in sheet._cells.values():
                 if cell.value is not None:
                     cell_value_str = str(cell.value)
                     if query in cell_value_str:
-                        matches.append(
+                        new_matches.append(
                             {
                                 "sheet": sheet_name_for_result,
                                 "coordinate": cell.coordinate,
                                 "value": self._serialize_value(cell.value),
+                                "_row": cell.row,
                             }
                         )
+            # Fetch row data after the iteration has finished
+            for match in new_matches:
+                row_num = match.pop("_row")
+                if include_row_data:
+                    match["row_data"] = self._get_row_data(sheet, row_num)
+                matches.append(match)
         else:
             # Use the openpyxl public API (for compatibility)
             for row in sheet.iter_rows(values_only=False):
                 for cell in row:
                     if cell.value is not None:
                         cell_value_str = str(cell.value)
                         if query in cell_value_str:
-                            matches.append(
-                                {
-                                    "sheet": sheet_name_for_result,
-                                    "coordinate": cell.coordinate,
-                                    "value": self._serialize_value(cell.value),
-                                }
-                            )
+                            match = {
+                                "sheet": sheet_name_for_result,
+                                "coordinate": cell.coordinate,
+                                "value": self._serialize_value(cell.value),
+                            }
+                            if include_row_data:
+                                match["row_data"] = [
+                                    {
+                                        "coordinate": c.coordinate,
+                                        "value": self._serialize_value(c.value),
+                                    }
+                                    for c in row
+                                    if c.value is not None
+                                ]
+                            matches.append(match)
+
+    def _get_row_data(self, sheet, row_num: int) -> list[dict[str, Any]]:
+        """
+        Return the non-null cell data of the given row as a list
+
+        Args:
+            sheet: openpyxl Worksheet
+            row_num: row number
+
+        Returns:
+            List of [{coordinate, value}, ...] for the non-null cells
+        """
+        row_cells = sheet[row_num]
+        # On single-column sheets a lone Cell object may be returned
+        if isinstance(row_cells, Cell):
+            row_cells = (row_cells,)
+        return [
+            {
+                "coordinate": c.coordinate,
+                "value": self._serialize_value(c.value),
+            }
+            for c in row_cells
+            if c.value is not None
+        ]
 
     def _calculate_header_range(self, cell_range: str, frozen_rows: int) -> str | None:
         """

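The comment added to `_scan_sheet` explains why matches are collected first: accessing the sheet while iterating `_cells` can change the underlying dictionary. The sketch below reproduces that failure mode in plain Python. `LazySheet` is a hypothetical stand-in that creates missing cells on access; it does not use openpyxl and only illustrates the collect-then-fetch pattern.

```python
# Hypothetical stand-in for a worksheet whose cell store grows on access.
class LazySheet:
    def __init__(self) -> None:
        # Only two cells exist in row 1; column 2 is missing.
        self._cells = {(1, 1): "budget", (1, 3): "2024"}

    def cell(self, row: int, col: int):
        # Accessing a missing cell creates it, which mutates _cells.
        return self._cells.setdefault((row, col), None)

    def row(self, row: int, width: int = 3) -> list:
        return [self.cell(row, col) for col in range(1, width + 1)]


def scan_unsafe(sheet: LazySheet) -> list:
    results = []
    for (row, _col), value in sheet._cells.items():
        if value == "budget":
            results.append(sheet.row(row))  # mutates _cells mid-iteration
    return results


def scan_two_phase(sheet: LazySheet) -> list:
    # Phase 1: collect matching row numbers without touching the sheet.
    hit_rows = [row for (row, _col), value in sheet._cells.items() if value == "budget"]
    # Phase 2: fetch row data once iteration over _cells has finished.
    return [sheet.row(row) for row in hit_rows]


try:
    scan_unsafe(LazySheet())
except RuntimeError as exc:
    print("unsafe scan failed:", exc)

print(scan_two_phase(LazySheet()))  # prints [['budget', None, '2024']]
```

The temporary `_row` key in the commit plays the role of `hit_rows` here: it carries the row number from the first phase to the second and is removed before the match is returned.
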
tests/test_server.py

Lines changed: 21 additions & 1 deletion
@@ -244,7 +244,7 @@ def test_excel_search_mode(
 
         # Verify that the search method is called
         mock_excel_parser.search_cells.assert_called_once_with(
-            "/sites/test/Shared Documents/test.xlsx", "売上", sheet_name=None
+            "/sites/test/Shared Documents/test.xlsx", "売上", sheet_name=None, include_row_data=False
         )
         # parse_to_json must not be called
         mock_excel_parser.parse_to_json.assert_not_called()
@@ -295,6 +295,26 @@ def test_excel_with_cell_range_parameter(
             expand_axis_range=False,
         )
 
+    @pytest.mark.unit
+    def test_excel_search_with_include_row_data(
+        self, mock_config, mock_sharepoint_client, mock_excel_parser
+    ):
+        """Test that include_row_data=True is passed through in Excel search mode"""
+        with patch(
+            "src.server._get_sharepoint_client", return_value=mock_sharepoint_client
+        ):
+            with patch("src.server.config", mock_config):
+                sharepoint_excel(
+                    file_path="/sites/test/Shared Documents/test.xlsx",
+                    query="売上",
+                    include_row_data=True,
+                )
+
+        mock_excel_parser.search_cells.assert_called_once_with(
+            "/sites/test/Shared Documents/test.xlsx", "売上", sheet_name=None, include_row_data=True
+        )
+        mock_excel_parser.parse_to_json.assert_not_called()
+
     @pytest.mark.unit
     def test_excel_with_real_json(
         self, mock_config, mock_sharepoint_client, mock_excel_parser

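The existing `test_excel_search_mode` assertion had to change even though its behaviour did not, because `assert_called_once_with` compares the recorded call exactly, including keyword arguments that carry default values. A small standalone sketch with `unittest.mock.Mock` (the `parser` object here is a bare mock, not the project's fixture):

```python
from unittest.mock import Mock

parser = Mock()
parser.search_cells(
    "/sites/test/Shared Documents/test.xlsx", "売上", sheet_name=None, include_row_data=False
)

# Passes: positional and keyword arguments match the recorded call exactly.
parser.search_cells.assert_called_once_with(
    "/sites/test/Shared Documents/test.xlsx", "売上", sheet_name=None, include_row_data=False
)

# The pre-change expectation no longer matches, because the extra keyword
# argument is part of the recorded call.
try:
    parser.search_cells.assert_called_once_with(
        "/sites/test/Shared Documents/test.xlsx", "売上", sheet_name=None
    )
    old_expectation_passes = True
except AssertionError:
    old_expectation_passes = False

print(old_expectation_passes)  # prints False
```
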