
Commit 33f4e4f

Merge pull request #54 from ncdcdev/feat/merge-4-refine-excel
Feat/merge 4 refine excel
2 parents 2578a33 + 1315f75 commit 33f4e4f

15 files changed

Lines changed: 2325 additions & 852 deletions

.github/workflows/ci.yml

Lines changed: 1 addition & 1 deletion
````diff
@@ -2,7 +2,7 @@ name: CI
 
 on:
   pull_request:
-    branches: [ main ]
+    branches: [ main, feat/refine-excel-reading ]
   push:
     branches: [ main ]
 
````
CLAUDE.md

Lines changed: 26 additions & 0 deletions
````diff
@@ -67,6 +67,32 @@ def some_function():
     return something()
 ```
 
+## Design Principles
+
+### MCP Server Design Philosophy
+
+**Simplicity over Complexity**
+- Prefer simple, explicit APIs over "smart" automatic conversions
+- Trust LLM's ability to learn from clear error messages
+- Avoid over-engineering: features are easy to add but hard to remove
+
+**YAGNI (You Aren't Gonna Need It)**
+- Only implement features when they are clearly needed
+- Before adding a feature, ask: "Can LLM handle this on its own?"
+- Consider long-term maintenance cost vs. short-term convenience
+
+**Feature Addition Checklist**
+- [ ] Is this feature truly necessary? (Can't LLM adapt?)
+- [ ] Is there a simpler alternative?
+- [ ] Can this be removed in the future without breaking compatibility?
+- [ ] Is the maintenance cost acceptable?
+- [ ] Does this add significant value to justify the complexity?
+
+**Context Efficiency**
+- Keep tool descriptions concise (they're included in every LLM call)
+- Prioritize essential information over exhaustive documentation
+- Use external docs (README) for detailed explanations
+
 ### Project-Specific Guidelines
 
 #### MCP Development
````

README.md

Lines changed: 8 additions & 4 deletions
````diff
@@ -35,10 +35,14 @@ Two authentication methods are supported:
 - Read or search Excel files in SharePoint
 - Search mode: find cells containing specific text with `query` parameter
 - Read mode: get data from specific sheets/ranges with `sheet` and `cell_range` parameters
-- Auto-include headers: with `include_header` parameter, automatically includes frozen rows (detected via `freeze_panes`) as headers even when they're outside the specified cell range
-- Metadata-only mode: exclude data rows and retrieve only headers and file structure with `metadata_only` parameter
-- Default: lightweight response with value and coordinate only
-- Optional: include formatting (data_type, fill colors, merged cells, dimensions)
+- **Automatic header inclusion**: when `cell_range` is specified, frozen rows (headers) are automatically included by default
+- Set `include_frozen_rows=False` to get only the specified range
+- For sheets with `frozen_rows=0`, use `expand_axis_range=True` to include row 1 (for columns) or column A (for rows)
+- **Cell style information** (optional): set `include_cell_styles=True` to get background colors, column widths, and row heights
+- Default is `False` to minimize token usage
+- Useful for identifying highlighted cells, colored headers, or visually emphasized content
+- Response includes cell data in `rows` (value and coordinate) and structural information when available
+- Structural info: sheet name, dimensions, frozen_rows, frozen_cols, freeze_panes (when present), merged_ranges (when merged cells exist)
 - No Excel Services dependency - uses direct file download + openpyxl parsing
 
 ### OneDrive Support
````
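The structural fields listed above are related: in openpyxl, `Worksheet.freeze_panes` holds the top-left cell of the scrollable pane, so every row above it and every column to its left is frozen. A value of `"B2"` therefore means one frozen row and one frozen column. The stdlib sketch below shows that derivation; `frozen_counts` and `column_index` are hypothetical helpers, not functions from this repository.

```python
import re
from typing import Optional, Tuple

# A cell coordinate such as "B2": column letters followed by a row number.
_CELL = re.compile(r"^([A-Z]+)([0-9]+)$")


def column_index(letters: str) -> int:
    """Convert column letters to a 1-based index ("A" -> 1, "AA" -> 27)."""
    index = 0
    for ch in letters:
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index


def frozen_counts(freeze_panes: Optional[str]) -> Tuple[int, int]:
    """Hypothetical helper: (frozen_rows, frozen_cols) for a freeze_panes value."""
    if not freeze_panes:
        return 0, 0  # nothing is frozen
    match = _CELL.match(freeze_panes.strip().upper())
    if match is None:
        raise ValueError(f"Invalid freeze_panes coordinate: {freeze_panes!r}")
    letters, row = match.groups()
    # Rows above and columns left of the pane's top-left cell are frozen.
    return int(row) - 1, column_index(letters) - 1


print(frozen_counts("B2"))  # (1, 1)
print(frozen_counts("A3"))  # (2, 0)
```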

README_ja.md

Lines changed: 8 additions & 4 deletions
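````diff
@@ -35,10 +35,14 @@ Supports both stdio and HTTP transports.
 - Read and search Excel files on SharePoint
 - Search mode: search for cells containing specific text with the `query` parameter
 - Read mode: get specific sheets/ranges with the `sheet` and `cell_range` parameters
-- Automatic header addition: with the `include_header` parameter, rows frozen via `freeze_panes` are recognized as headers and added automatically when a range is specified, even if the headers fall outside it
-- Metadata only: with the `metadata_only` parameter, exclude data rows and retrieve only headers and file structure
-- Default: lightweight response with values and coordinates only
-- Optional: include formatting information (data type, fill color, merged cells, size)
+- **Automatic header addition**: when `cell_range` is specified, frozen rows (headers) are included automatically by default
+- Specify `include_frozen_rows=False` to get only the specified range
+- For sheets with `frozen_rows=0`, `expand_axis_range=True` fetches automatically from row 1 (for columns) or column A (for rows)
+- **Cell style information** (optional): specify `include_cell_styles=True` to get background colors, column widths, and row heights
+- Default is `False` to minimize token consumption
+- Useful for identifying highlighted cells, colored headers, and visually emphasized content
+- The response includes cell data in `rows` (values and coordinates) and structural information (when available)
+- Structural information: sheet name, dimensions, frozen_rows, frozen_cols, freeze_panes (when present), merged_ranges (when merged cells exist)
 - No Excel Services required - direct file download + openpyxl parsing
 
 ### OneDrive Support
````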
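Automatic header inclusion can be pictured as prepending the frozen rows that sit above the requested span. The sketch below is an assumption about the behavior, not the server's code: `rows_to_read` is a hypothetical helper working on 1-based row numbers, and its `include_frozen_rows` flag only mirrors the documented parameter.

```python
from typing import List


def rows_to_read(min_row: int, max_row: int, frozen_rows: int,
                 include_frozen_rows: bool = True) -> List[int]:
    """Hypothetical sketch: 1-based rows returned for a requested row span."""
    requested = list(range(min_row, max_row + 1))
    if not include_frozen_rows or frozen_rows <= 0:
        return requested
    # Frozen header rows are 1..frozen_rows; add only those above the span.
    headers = [row for row in range(1, frozen_rows + 1) if row < min_row]
    return headers + requested


# cell_range="A10:D12" on a sheet whose first row is frozen:
print(rows_to_read(10, 12, frozen_rows=1))                             # [1, 10, 11, 12]
print(rows_to_read(10, 12, frozen_rows=1, include_frozen_rows=False))  # [10, 11, 12]
```

With `frozen_rows=0` nothing is prepended, which is the case the README points to `expand_axis_range=True` for instead.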

docs/usage.md

Lines changed: 68 additions & 123 deletions
````diff
@@ -166,6 +166,28 @@ SHAREPOINT_ONEDRIVE_PATHS=sales1@company.com:/Documents/Customers,sales2@company
 SHAREPOINT_SITE_NAME=@onedrive,sales-team,customer-portal
 ```
 
+### `sharepoint_docs_search` Parameters
+
+| Parameter | Type | Default | Description |
+|-----------|------|---------|-------------|
+| `query` | str | Required | Search keyword |
+| `max_results` | int | 20 | Max results (capped at 100) |
+| `file_extensions` | list[str] \| None | None | File extensions filter (unsupported values are ignored) |
+| `response_format` | str | `detailed` | `detailed` or `compact` |
+
+- `max_results` is capped at 100.
+- `file_extensions` is filtered by `SHAREPOINT_ALLOWED_FILE_EXTENSIONS`; unsupported values are ignored.
+- `response_format="compact"` returns only `title` / `path` / `extension` to reduce tokens.
+
+**Compact response example**
+```python
+results = sharepoint_docs_search(
+    query="budget 2024",
+    response_format="compact",
+    max_results=10,
+)
+```
+
 ## Excel Operations Usage Examples
 
 The `sharepoint_excel` tool allows you to read and search Excel files in SharePoint. It supports two modes:
````
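The three `sharepoint_docs_search` rules documented above (cap `max_results` at 100, ignore unsupported `file_extensions`, trim compact results to `title` / `path` / `extension`) can be sketched as plain functions. `normalize_search_args`, `to_compact`, and the `ALLOWED_EXTENSIONS` set (a stand-in for whatever `SHAREPOINT_ALLOWED_FILE_EXTENSIONS` is configured to) are hypothetical, and so is the choice to lowercase extensions and strip a leading dot.

```python
from typing import Any, Dict, Iterable, List, Optional, Tuple

MAX_RESULTS_CAP = 100  # documented cap for max_results
# Hypothetical stand-in for the configured SHAREPOINT_ALLOWED_FILE_EXTENSIONS.
ALLOWED_EXTENSIONS = {"pdf", "docx", "xlsx", "pptx", "txt"}
COMPACT_FIELDS = ("title", "path", "extension")


def normalize_search_args(
    max_results: int = 20,
    file_extensions: Optional[Iterable[str]] = None,
) -> Tuple[int, Optional[List[str]]]:
    """Cap max_results and silently drop unsupported extensions."""
    capped = min(max_results, MAX_RESULTS_CAP)
    if file_extensions is None:
        return capped, None
    # Assumption: extensions are compared lowercased and without a leading dot.
    cleaned = (ext.strip().lower().lstrip(".") for ext in file_extensions)
    return capped, [ext for ext in cleaned if ext in ALLOWED_EXTENSIONS]


def to_compact(results: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only the fields that response_format="compact" returns."""
    return [{key: item.get(key) for key in COMPACT_FIELDS} for item in results]


print(normalize_search_args(500))                          # (100, None)
print(normalize_search_args(10, ["PDF", ".docx", "exe"]))  # (10, ['pdf', 'docx'])
```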
````diff
@@ -186,9 +208,6 @@ The `sharepoint_excel` tool allows you to read and search Excel files in SharePo
 | `query` | str \| None | None | Search keyword (enables search mode) |
 | `sheet` | str \| None | None | Sheet name (get specific sheet only) |
 | `cell_range` | str \| None | None | Cell range (e.g., "A1:D10") |
-| `include_formatting` | bool | False | Include formatting information |
-| `include_header` | bool | True | Auto-detect and separate header rows using `freeze_panes` |
-| `metadata_only` | bool | False | Exclude data rows to return only metadata (reduce response size) |
 
 ### Basic Workflow
 
````
````diff
@@ -256,103 +275,6 @@ result = sharepoint_excel(
 )
 ```
 
-#### 5. Read with Formatting Information
-```python
-# Get data with formatting (colors, merged cells, etc.)
-result = sharepoint_excel(
-    file_path="/sites/finance/Shared Documents/report.xlsx",
-    sheet="Sheet1",
-    include_formatting=True
-)
-```
-
-#### 6. Automatic Header Detection
-```python
-# Auto-detect and separate header and data rows using freeze_panes
-result = sharepoint_excel(
-    file_path="/sites/finance/Shared Documents/report.xlsx",
-    sheet="Sheet1",
-    include_header=True
-)
-```
-
-**Header Detection Response:**
-```json
-{
-  "file_path": "/sites/finance/Shared Documents/report.xlsx",
-  "sheets": [{
-    "name": "Sheet1",
-    "freeze_panes": "B2",
-    "frozen_rows": 1,
-    "frozen_cols": 1,
-    "header_rows": [
-      [
-        {"value": "Product", "coordinate": "A1"},
-        {"value": "Price", "coordinate": "B1"},
-        {"value": "Stock", "coordinate": "C1"}
-      ]
-    ],
-    "data_rows": [
-      [
-        {"value": "Product A", "coordinate": "A2"},
-        {"value": 1000, "coordinate": "B2"},
-        {"value": 50, "coordinate": "C2"}
-      ],
-      ...
-    ]
-  }]
-}
-```
-
-**Features:**
-- Auto-detects Excel freeze panes (frozen rows/columns)
-- Separates header rows and data rows in response (default behavior)
-- When `cell_range` is specified, automatically includes frozen range
-- Set `include_header=False` to return legacy `rows` format
-```
-
-#### 7. Metadata-Only Mode (File Structure Inspection)
-```python
-# Get only file structure without data rows
-result = sharepoint_excel(
-    file_path="/sites/finance/Shared Documents/large-report.xlsx",
-    metadata_only=True
-)
-```
-
-**Metadata-Only Response:**
-```json
-{
-  "file_path": "/sites/finance/Shared Documents/large-report.xlsx",
-  "sheets": [{
-    "name": "Sheet1",
-    "freeze_panes": "B2",
-    "frozen_rows": 1,
-    "frozen_cols": 1,
-    "dimensions": "A1:E1000",
-    "header_rows": [
-      [
-        {"value": "Product", "coordinate": "A1"},
-        {"value": "Price", "coordinate": "B1"},
-        {"value": "Stock", "coordinate": "C1"}
-      ]
-    ],
-    "data_rows": []
-  }]
-}
-```
-
-**Use Cases:**
-- Inspect large file structure before fetching data
-- Understand what headers exist in each sheet
-- Determine the necessary `cell_range` before retrieving full data
-- Significantly reduce response size (save tokens)
-
-**Recommended Workflow:**
-1. Use `metadata_only=True` to inspect file structure
-2. Identify the required range
-3. Fetch actual data with specific `cell_range`
-
 ### JSON Output Format
 
 #### Read Mode (Default)
````
````diff
@@ -400,7 +322,9 @@ result = sharepoint_excel(
 }
 ```
 
-#### With Formatting (include_formatting=true)
+#### Merged Cells
+
+When merged cells exist, the response includes merged cell information:
 
 ```json
 {
````
````diff
@@ -414,20 +338,21 @@ result = sharepoint_excel(
           {
             "value": "Department",
             "coordinate": "A1",
-            "data_type": "s",
-            "fill": {
-              "pattern_type": "solid",
-              "fg_color": "#CCCCCC",
-              "bg_color": null
-            },
             "merged": {
               "range": "A1:B1",
               "is_top_left": true
-            },
-            "width": 15.0,
-            "height": 20.0
+            }
           }
         ]
+      ],
+      "merged_ranges": [
+        {
+          "range": "A1:B1",
+          "anchor": {
+            "coordinate": "A1",
+            "value": "Department"
+          }
+        }
       ]
     }
   ]
````
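The per-cell `merged` entry and the per-sheet `merged_ranges` list shown in this hunk can both be derived from a sheet's merged range strings. In Excel (and openpyxl) only the top-left cell of a merged range holds the value, which is why the anchor is that cell. The helpers below are a hypothetical sketch, assuming ranges are written top-left first as in `"A1:B1"`.

```python
import re
from typing import Any, Dict, List, Optional, Tuple

_CELL = re.compile(r"^([A-Z]+)([0-9]+)$")


def parse_cell(coordinate: str) -> Tuple[int, int]:
    """Return 1-based (column, row) for a coordinate such as "B3"."""
    match = _CELL.match(coordinate)
    if match is None:
        raise ValueError(f"Invalid cell coordinate: {coordinate!r}")
    column = 0
    for ch in match.group(1):
        column = column * 26 + (ord(ch) - ord("A") + 1)
    return column, int(match.group(2))


def merged_info(coordinate: str, merged_ranges: List[str]) -> Optional[Dict[str, Any]]:
    """Per-cell "merged" entry, or None when the cell is in no merged range."""
    column, row = parse_cell(coordinate)
    for rng in merged_ranges:
        start, end = rng.split(":")
        (c1, r1), (c2, r2) = parse_cell(start), parse_cell(end)
        if c1 <= column <= c2 and r1 <= row <= r2:
            return {"range": rng, "is_top_left": (column, row) == (c1, r1)}
    return None


def sheet_merged_ranges(merged_ranges: List[str], values: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Sheet-level "merged_ranges" list: each range plus its top-left anchor."""
    entries = []
    for rng in merged_ranges:
        anchor = rng.split(":")[0]
        entries.append({"range": rng, "anchor": {"coordinate": anchor, "value": values.get(anchor)}})
    return entries


print(merged_info("A1", ["A1:B1"]))  # {'range': 'A1:B1', 'is_top_left': True}
print(sheet_merged_ranges(["A1:B1"], {"A1": "Department"}))
```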
````diff
@@ -440,12 +365,33 @@ result = sharepoint_excel(
 - **value**: Cell value (string, number, date, formula, etc.)
 - **coordinate**: Cell position (e.g., "A1", "B2")
 
-**With include_formatting=true:**
-- **data_type**: Data type code (`s`=string, `n`=number, `f`=formula, etc.)
-- **fill**: Fill color information (pattern type, foreground/background colors)
+**When merged cells exist:**
 - **merged**: Merged cell information (range, position)
-- **width**: Column width
-- **height**: Row height
+- **merged_ranges**: Merged ranges list per sheet (range + anchor info)
+
+### Additional Metadata
+
+Depending on the request, the response can include metadata such as `response_kind`, `data_included`, `requested_sheet`, `requested_range`, `freeze_panes`, `frozen_rows`, `frozen_cols`, `effective_range`, `sheet_resolution`, and `available_sheets`.
+
+### Sheet Resolution and Fallbacks
+
+- `sheet` is resolved by exact match or a unique `trim + casefold` match.
+- If not resolved, `sheet_resolution` and `available_sheets` are returned with a `warning`.
+- If `cell_range` is provided and `sheet` is not found, the parser falls back to all sheets.
+- If `sheet` is not found and no `cell_range` is provided, `sheets` is empty and `candidates` are returned.
+
+### Cell Range Normalization and Expansion
+
+`cell_range` is normalized/expanded internally, and the result is returned as `effective_range`.
+
+- Column-only ranges (e.g., `J` / `J:J`) expand to `J1:J<max_row>`.
+- Single cell ranges (e.g., `C5`) expand to `C1:C5`.
+- Single-row ranges (e.g., `D5:H5`) expand to `A5:H5`.
+
+### Large Range Limits
+
+If rows/cols exceed limits, a `ValueError` is raised.
+Use `cell_range` to narrow the selection.
 
 ### Common Use Cases
 
````
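The sheet-resolution rule in this hunk (exact match, otherwise a unique `trim + casefold` match) fits in a few lines. This is a sketch of the documented rule only: `resolve_sheet` is a hypothetical name, trimming both the request and the sheet names is an assumption, and the fallbacks (`warning`, `available_sheets`, `candidates`) are left to the caller.

```python
from typing import List, Optional


def resolve_sheet(requested: str, sheet_names: List[str]) -> Optional[str]:
    """Exact match first, then a unique trim + casefold match; otherwise None."""
    if requested in sheet_names:
        return requested
    key = requested.strip().casefold()
    matches = [name for name in sheet_names if name.strip().casefold() == key]
    # Two or more normalized matches are ambiguous and count as unresolved.
    return matches[0] if len(matches) == 1 else None


names = ["Sales", "Summary", "Budget ", "budget"]
print(resolve_sheet("summary", names))  # Summary
print(resolve_sheet("Budget", names))   # None (ambiguous)
print(resolve_sheet("budget", names))   # budget (exact match wins)
```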
````diff
@@ -463,18 +409,17 @@ search_result = sharepoint_excel(file_path=file_path, query="Total Revenue")
 data = sharepoint_excel(file_path=file_path, sheet="Sheet1", cell_range="A1:D20")
 ```
 
-**Analyze Cell Formatting**
+**Inspect Merged Cells**
 ```python
-# Get Excel data with formatting
-json_data = sharepoint_excel(file_path=file_path, include_formatting=True)
+# Get Excel data (merged info is included when present)
+json_data = sharepoint_excel(file_path=file_path)
 data = json.loads(json_data)
 
-# Find cells with specific formatting
+# List merged ranges
 for sheet in data["sheets"]:
-    for row in sheet["rows"]:
-        for cell in row:
-            if cell.get("fill", {}).get("fg_color"):
-                print(f"Colored cell at {cell['coordinate']}: {cell['value']}")
+    for merged in sheet.get("merged_ranges", []):
+        anchor = merged.get("anchor", {})
+        print(f"Merged range {merged['range']}: {anchor.get('value')}")
 ```
 
 **Export Specific Sheet to CSV**
````
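The `cell_range` expansion rules and the size check described in the usage.md diff above can be sketched together. `normalize_range`, `check_size`, and the `MAX_ROWS` / `MAX_COLS` values are hypothetical: the docs say a `ValueError` is raised past some limit but do not state the limits. Uppercasing the input, passing other ranges through unchanged, and checking the expanded range are also assumptions.

```python
import re

MAX_ROWS = 1000  # hypothetical; the real limits are not stated in the docs
MAX_COLS = 100   # hypothetical

_COLUMN_ONLY = re.compile(r"^([A-Z]+)(?::\1)?$")             # "J" or "J:J"
_CELL = re.compile(r"^([A-Z]+)([0-9]+)$")                    # "C5"
_RANGE = re.compile(r"^([A-Z]+)([0-9]+):([A-Z]+)([0-9]+)$")  # "D5:H5"


def column_index(letters: str) -> int:
    """Convert column letters to a 1-based index ("A" -> 1, "AA" -> 27)."""
    index = 0
    for ch in letters:
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index


def normalize_range(cell_range: str, max_row: int) -> str:
    """Apply the documented expansions and return the effective range."""
    text = cell_range.strip().upper()
    match = _COLUMN_ONLY.match(text)
    if match:  # J or J:J -> J1:J<max_row>
        col = match.group(1)
        return f"{col}1:{col}{max_row}"
    match = _CELL.match(text)
    if match:  # C5 -> C1:C5
        col, row = match.groups()
        return f"{col}1:{col}{row}"
    match = _RANGE.match(text)
    if match and match.group(2) == match.group(4):  # D5:H5 -> A5:H5
        return f"A{match.group(2)}:{match.group(3)}{match.group(4)}"
    return text  # any other range is left as-is


def check_size(effective_range: str) -> None:
    """Raise ValueError when the range exceeds the (hypothetical) limits."""
    match = _RANGE.match(effective_range)
    if match is None:
        raise ValueError(f"Unsupported range: {effective_range!r}")
    first_col, first_row, last_col, last_row = match.groups()
    rows = int(last_row) - int(first_row) + 1
    cols = column_index(last_col) - column_index(first_col) + 1
    if rows > MAX_ROWS or cols > MAX_COLS:
        raise ValueError(
            f"{effective_range} spans {rows} rows x {cols} columns; "
            "narrow the selection with cell_range"
        )


for raw in ["J", "J:J", "C5", "D5:H5", "B2:D10"]:
    print(raw, "->", normalize_range(raw, max_row=250))
```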
