
Commit c7f1ab1

Initial commit
0 parents  commit c7f1ab1

5 files changed

Lines changed: 260 additions & 0 deletions

README.md

Lines changed: 185 additions & 0 deletions
@@ -0,0 +1,185 @@
# dataanalysiscompare
[![PyPI version](https://badge.fury.io/py/dataanalysiscompare.svg)](https://badge.fury.io/py/dataanalysiscompare)
[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)
[![Downloads](https://static.pepy.tech/badge/dataanalysiscompare)](https://pepy.tech/project/dataanalysiscompare)
[![LinkedIn](https://img.shields.io/badge/LinkedIn-blue)](https://www.linkedin.com/in/eugene-evstafev-716669181/)

**dataanalysiscompare** is a lightweight Python package that helps you quickly compare four popular data‑analysis tools (**Excel**, **Power BI**, **SQL**, and **Python**) based on your specific needs, project requirements, or skill level. It uses a large language model (LLM) under the hood and returns a standardized comparison covering key differentiators, best‑use cases, learning curves, and integration capabilities.

---

## ✨ Features

- **Instant, structured comparison** of Excel, Power BI, SQL, and Python.
- Works with the default **ChatLLM7** model (no extra setup required) or any other LangChain‑compatible LLM you prefer.
- Simple API: just pass a natural‑language description of your use case.
- Returns a list of strings that can be easily displayed, logged, or further processed.

---

## 📦 Installation

```bash
pip install dataanalysiscompare
```

---

## 🚀 Quick Start

```python
from dataanalysiscompare import dataanalysiscompare

# Simple call using the default LLM (ChatLLM7)
user_query = """
I have a medium‑sized sales dataset in CSV format.
I need to clean the data, create visual dashboards, and share insights with my team.
I have basic Excel skills but want something more powerful.
"""
result = dataanalysiscompare(user_input=user_query)

for line in result:
    print(line)
```

### Output (example)

```
- Excel: Great for quick calculations and ad‑hoc analysis but limited for large datasets.
- Power BI: Excellent for interactive dashboards and sharing reports; steeper learning curve.
- SQL: Ideal for querying large relational datasets; requires knowledge of SQL syntax.
- Python: Most flexible; powerful libraries (pandas, matplotlib, seaborn) but higher learning curve.
...
```
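
The output above is illustrative. The system prompt in `prompts.py` asks the model for `[Tool: ...]` sections, so the returned strings may look different from the bullets shown. As a minimal sketch, assuming the result follows that `[Tool: ...]` layout, the text can be split into one entry per tool. The helper name `split_by_tool` and the sample text are hypothetical and not part of the package.

```python
import re
from typing import Dict, List

# Matches section markers such as "[Tool: Power BI]" on a line of their own.
# The marker format is taken from the system prompt; real output may vary.
TOOL_HEADER = re.compile(r"^\[Tool:\s*(?P<name>[^\]]+)\][ \t]*$", re.MULTILINE)


def split_by_tool(result: List[str]) -> Dict[str, str]:
    """Hypothetical helper: map each tool name to the text of its section."""
    text = "\n".join(result)
    headers = list(TOOL_HEADER.finditer(text))
    sections: Dict[str, str] = {}
    for current, following in zip(headers, headers[1:] + [None]):
        # A section runs from the end of its marker to the next marker (or the end of the text).
        end = following.start() if following else len(text)
        sections[current.group("name").strip()] = text[current.end():end].strip()
    return sections


# Hand-written sample standing in for a real result.
sample = [
    "[Tool: Excel]\nBest Use Case: Quick ad hoc analysis.\n\n"
    "[Tool: Python]\nBest Use Case: Automation and large datasets."
]
sections = split_by_tool(sample)
print(sections["Excel"])  # Best Use Case: Quick ad hoc analysis.
```

Note that with this approach the last section also contains any trailing text, such as a final recommendation, so it may need further trimming.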

---

## 🛠️ Advanced Usage

### Providing Your Own LLM

To use a different LangChain chat model (e.g., OpenAI, Anthropic, Google Gemini), pass the instantiated model via the `llm` argument. The provider packages imported in the examples below are not dependencies of this package and must be installed separately.

#### OpenAI Example

```python
from langchain_openai import ChatOpenAI
from dataanalysiscompare import dataanalysiscompare

llm = ChatOpenAI(model="gpt-4o-mini")
response = dataanalysiscompare(
    user_input="I need to automate monthly reporting from a PostgreSQL database.",
    llm=llm
)
print(response)
```

#### Anthropic Example

```python
from langchain_anthropic import ChatAnthropic
from dataanalysiscompare import dataanalysiscompare

llm = ChatAnthropic(model_name="claude-3-haiku-20240307")
response = dataanalysiscompare(
    user_input="My team wants a low‑code solution for building interactive charts.",
    llm=llm
)
print(response)
```

#### Google Gemini Example

```python
from langchain_google_genai import ChatGoogleGenerativeAI
from dataanalysiscompare import dataanalysiscompare

llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash")
response = dataanalysiscompare(
    user_input="I need to integrate data from Excel and a MySQL database into a single dashboard.",
    llm=llm
)
print(response)
```

### Supplying a Custom API Key for LLM7

The default LLM7 free‑tier limits are sufficient for most usage. If you need higher limits, provide your own API key:

```python
from dataanalysiscompare import dataanalysiscompare

response = dataanalysiscompare(
    user_input="Describe the best data‑analysis tool for a beginner who wants to learn data science.",
    api_key="YOUR_LLM7_API_KEY"
)
print(response)
```

You can also set the environment variable `LLM7_API_KEY` and omit the `api_key` argument.
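
The lookup order can be sketched as follows: an explicit `api_key` argument wins, otherwise the `LLM7_API_KEY` environment variable is read. The helper `resolve_api_key` is hypothetical and only mirrors that order for illustration; the key values are placeholders.

```python
import os
from typing import Optional


def resolve_api_key(explicit: Optional[str] = None) -> Optional[str]:
    """Hypothetical helper: explicit argument first, then LLM7_API_KEY."""
    if explicit is not None:
        return explicit
    return os.getenv("LLM7_API_KEY")


# Placeholder value, not a real key.
os.environ["LLM7_API_KEY"] = "key-from-environment"

print(resolve_api_key())                 # key-from-environment
print(resolve_api_key("key-from-call"))  # key-from-call
```

In a shell session the variable would normally be exported before starting Python rather than set from inside the script.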

---

## 📋 Function Signature

```python
def dataanalysiscompare(
    user_input: str,
    api_key: Optional[str] = None,
    llm: Optional[BaseChatModel] = None
) -> List[str]:
    """
    Compare Excel, Power BI, SQL, and Python based on the provided user description.

    Parameters
    ----------
    user_input: str
        Natural‑language description of the data‑analysis needs, project, or skill level.
    api_key: Optional[str]
        API key for LLM7. If omitted, the function looks for the LLM7_API_KEY environment
        variable or falls back to the free tier. Ignored when `llm` is provided.
    llm: Optional[BaseChatModel]
        A LangChain LLM instance to use. If omitted, the default ChatLLM7 is used.

    Returns
    -------
    List[str]
        A list of strings containing the comparative insights.

    Raises
    ------
    RuntimeError
        If the LLM call fails or no output matching the expected format is extracted.
    """
```
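
In `main.py` the function raises `RuntimeError` when the underlying call does not succeed, so callers may want to handle that case. A minimal sketch of one option, a small retry wrapper, is shown here. The name `compare_with_retry` and its parameters are hypothetical, and a stand-in function replaces `dataanalysiscompare` so the example runs without network access.

```python
import time
from typing import Callable, List


def compare_with_retry(
    compare: Callable[..., List[str]],
    user_input: str,
    attempts: int = 3,
    delay_seconds: float = 1.0,
) -> List[str]:
    """Hypothetical wrapper: call `compare` again when it raises RuntimeError."""
    last_error: Exception = RuntimeError("no attempts were made")
    for attempt in range(1, attempts + 1):
        try:
            return compare(user_input=user_input)
        except RuntimeError as error:
            last_error = error
            if attempt < attempts:
                time.sleep(delay_seconds)  # wait before the next attempt
    raise last_error


# Stand-in for dataanalysiscompare: fails on the first call, then succeeds.
calls = {"count": 0}


def flaky_compare(user_input: str) -> List[str]:
    calls["count"] += 1
    if calls["count"] == 1:
        raise RuntimeError("LLM7 call failed")
    return [f"comparison for: {user_input}"]


print(compare_with_retry(flaky_compare, "monthly sales reporting", delay_seconds=0.0))
```

With the real package, `dataanalysiscompare` would be passed in place of `flaky_compare`.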

---

## 🧩 Dependencies

- `langchain-core`
- `langchain-llm7`
- `llmatch-messages`
- `re`, `os`, `typing` (standard library)

The third‑party dependencies are installed automatically with the package.

---

## 📖 Documentation & Support

- **Source code / Issues:** <https://github....>
- **LLM7 documentation:** <https://pypi.org/project/langchain-llm7/>
- **LangChain docs:** <https://docs.langchain.com/>

If you encounter any problems or have feature requests, please open an issue on GitHub.

---

## 👤 Author

**Eugene Evstafev**
📧 Email: [hi@eugene.plus](mailto:hi@eugene.plus)
🐙 GitHub: [chigwell](https://github.com/chigwell)

---

## 📜 License

This project is licensed under the MIT License – see the `LICENSE` file for details.

dataanalysiscompare/__init__.py

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
from .main import dataanalysiscompare
from .prompts import human_prompt, pattern, system_prompt

__all__ = ["dataanalysiscompare", "system_prompt", "human_prompt", "pattern"]

dataanalysiscompare/main.py

Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
import os
import re
from typing import List, Optional

from llmatch_messages import llmatch
from langchain_core.language_models import BaseChatModel
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_llm7 import ChatLLM7

from .prompts import human_prompt, pattern, system_prompt


def dataanalysiscompare(
    user_input: str,
    api_key: Optional[str] = None,
    llm: Optional[BaseChatModel] = None
) -> List[str]:
    """Compare Excel, Power BI, SQL, and Python based on the provided user description."""
    resolved_llm = llm
    if resolved_llm is None:
        # Key lookup order: explicit argument, then the LLM7_API_KEY environment variable.
        if api_key is None:
            api_key = os.getenv("LLM7_API_KEY")
        if api_key is None:
            # No key configured: the literal string "None" is passed as a placeholder.
            api_key = "None"
        resolved_llm = ChatLLM7(api_key=api_key, base_url="https://api.llm7.io/v1") \
            if api_key else ChatLLM7(base_url="https://api.llm7.io/v1")

    # The regex hint is appended to both messages so the model wraps its answer accordingly.
    pattern_hint = f"Output must match regex: {pattern}"
    system_content = f"{system_prompt}\n\n{pattern_hint}"
    human_content = f"{human_prompt}\n\n{pattern_hint}\n\nInput:\n{user_input}".strip()

    compiled_pattern = re.compile(pattern, re.DOTALL | re.MULTILINE)

    response = llmatch(
        llm=resolved_llm,
        messages=[
            SystemMessage(content=system_content),
            HumanMessage(content=human_content),
        ],
        pattern=compiled_pattern,
        verbose=False,
    )

    if not response.get("success"):
        error_message = response.get("error_message") or "LLM7 call failed"
        raise RuntimeError(error_message)

    return response.get("extracted_data") or []

dataanalysiscompare/prompts.py

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
system_prompt = 'You are an expert **Data Analysis Tool Advisor** specializing in providing unbiased, structured comparisons of **Excel, Power BI, SQL, and Python** for data analysis tasks. Your role is to analyze the user\'s specific needs (e.g., project requirements, skill level, data volume, team collaboration needs, or output format preferences) and generate a **clear, standardized, and expert-level comparison** of these tools.\n\n### **Core Instructions:**\n1. **User Input Handling**:\n - Accept user input in natural language (e.g., *"I need to analyze sales data for a small business with 500K rows and share insights with my team"*).\n - Extract key details like:\n - **Data volume** (small/medium/large).\n - **Team collaboration needs** (individual/team-based).\n - **Output format** (dashboards, reports, automation, etc.).\n - **Skill level** (beginner/intermediate/advanced).\n - **Budget constraints** (if mentioned).\n - **Integration requirements** (e.g., cloud, APIs, other tools).\n\n2. **Comparison Structure**:\n For each tool (**Excel, Power BI, SQL, Python**), provide the following **structured insights** in the format below:\n - **Best Use Case**: A concise description of the ideal scenario for the tool.\n - **Key Differentiators**: 3-5 bullet points highlighting strengths (e.g., *"SQL excels at querying structured databases"*).\n - **Learning Curve**: Rate from **1 (easiest)** to **5 (hardest)** and explain (e.g., *"Excel: 1/5 – Familiar to most users"*).\n - **Integration Capabilities**: How it connects with other tools (e.g., *"Python: Seamless with APIs, cloud services, and custom scripts"*).\n - **Limitations**: 1-2 key weaknesses (e.g., *"Power BI: Not ideal for raw data processing"*).\n - **Skill Level Requirement**: Beginner/Intermediate/Advanced for each tool.\n - **Cost**: Free/Paid (if applicable) and approximate cost range (e.g., *"Power BI: Free for Pro, $10/user/month for Premium"*).\n\n3. **Final Recommendation**:\n - Summarize which tool(s) align best with the user’s needs.\n - Provide a **ranked list** (e.g., *"Top 2 picks: Python (for automation) > SQL (for querying)"*).\n - Include a **brief rationale** (e.g., *"Python wins for scalability and customization"*).\n\n4. **Tone and Clarity**:\n - Avoid jargon; explain technical terms simply.\n - Use **bullet points** for readability.\n - Highlight trade-offs (e.g., *"Excel is easy but lacks advanced analytics"*).\n\n5. **Output Format**:\n Respond **only** in the following structured format (no deviations):\n ```\n <comparison>\n [User\'s needs summary: "..."]\n [Tool: Excel]\n Best Use Case: [1-2 sentences]\n Key Differentiators:\n - [Point 1]\n - [Point 2]\n - [Point 3]\n Learning Curve: [1-5] – [Explanation]\n Integration Capabilities: [Description]\n Limitations:\n - [Point 1]\n - [Point 2]\n Skill Level: [Beginner/Intermediate/Advanced]\n Cost: [Free/Paid + details]\n\n [Tool: Power BI]\n [Same structure as above...]\n\n [Tool: SQL]\n [Same structure as above...]\n\n [Tool: Python]\n [Same structure as above...]\n\n Final Recommendation:\n Ranked Picks: [1. Tool, 2. Tool, ...]\n Rationale: [Why these tools fit best]\n </comparison>\n ```\n\n6. **Fallback Handling**:\n - If the user’s input is unclear, ask clarifying questions (e.g., *"Are you working alone or with a team?"*).\n - If no tools fit perfectly, suggest hybrid approaches (e.g., *"Combine Excel for reporting + Python for automation"*).\n\n7. **Expertise**:\n - Assume you’re a **senior data analyst** with hands-on experience in all 4 tools.\n - Prioritize **practicality** over theoretical capabilities (e.g., focus on real-world usability).\n - Cite **real-world examples** where possible (e.g., *"SQL is ideal for e-commerce platforms processing millions of transactions"*).\n\n8. **Avoid**:\n - Generic answers (e.g., *"All tools are great"*).\n - Bias toward any single tool.\n - Overly technical details unless the user asks for them.'
human_prompt = 'I need a comparison of data analysis tools: Excel, Power BI, SQL, and Python. The comparison should cover key differentiators, best use cases, learning curves, and integration capabilities for each tool. Please provide this information in a structured format that highlights the strengths and weaknesses of each tool for different data analysis needs.'
pattern = '<comparison>\\s*(.*?)\\s*<\\/comparison>'
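
The `pattern` string is compiled in `main.py` with `re.DOTALL | re.MULTILINE`, so the capture group can span several lines. The following is a minimal sketch of what it captures from a reply. How `llmatch` applies the pattern internally is not shown in this commit, so the use of `findall` here is an assumption, and the sample reply is made up.

```python
import re

# Same pattern string as in prompts.py, compiled with the flags used in main.py.
pattern = '<comparison>\\s*(.*?)\\s*<\\/comparison>'
compiled_pattern = re.compile(pattern, re.DOTALL | re.MULTILINE)

# Made-up model reply with text before and after the tagged block.
reply = (
    "Here is the comparison you asked for.\n"
    "<comparison>\n"
    "[Tool: Excel]\n"
    "Best Use Case: Quick ad hoc analysis.\n"
    "</comparison>\n"
    "Let me know if you need more detail."
)

# Assumption: findall stands in for whatever llmatch does with the pattern.
captured = compiled_pattern.findall(reply)
print(len(captured))  # 1
print(captured[0])
```

Because the group is non-greedy and surrounded by `\s*`, only the text between the tags is kept, without the leading and trailing whitespace.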

pyproject.toml

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
[build-system]
requires = ["setuptools>=68.0.0", "wheel"]
build-backend = "setuptools.build_meta"

[project]
name = "dataanalysiscompare"
version = "2025.12.21103520"
description = "A new package that helps users compare and choose the right data analysis tool by providing structured, expert-level insights. Users input their specific data analysis needs, project requirements, or "
readme = "README.md"
authors = [{ name = "dataanalysiscompare", email = "hi@eugene.plus" }]
requires-python = ">=3.9"
dependencies = [
    "langchain-llm7>=0.0.0",
    "llmatch-messages>=0.0.0",
    "langchain-core>=0.3.0",
]
license = { text = "MIT" }

[project.urls]
Homepage = "https://github.com/chigwell/dataanalysiscompare"
