Initial commit

chigwell · chigwell · commit c7f1ab19ec28 · 2025-12-21T10:35:27.000Z
diff --git a/README.md b/README.md
@@ -0,0 +1,185 @@
+# dataanalysiscompare
+[![PyPI version](https://badge.fury.io/py/dataanalysiscompare.svg)](https://badge.fury.io/py/dataanalysiscompare)
+[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)
+[![Downloads](https://static.pepy.tech/badge/dataanalysiscompare)](https://pepy.tech/project/dataanalysiscompare)
+[![LinkedIn](https://img.shields.io/badge/LinkedIn-blue)](https://www.linkedin.com/in/eugene-evstafev-716669181/)
+
+
+**dataanalysiscompare** is a lightweight Python package that helps you quickly compare four popular data‑analysis tools—**Excel**, **Power BI**, **SQL**, and **Python**—based on your specific needs, project requirements, or skill level. By leveraging a language model (LLM) under the hood, the package returns a clear, standardized comparison that includes key differentiators, best‑use cases, learning curves, and integration capabilities.
+
+---
+
+## ✨ Features
+
+- **Instant, structured comparison** of Excel, Power BI, SQL, and Python.
+- Works with the default **ChatLLM7** model (no extra setup required) or any other LangChain‑compatible LLM you prefer.
+- Simple API: just pass a natural‑language description of your use case.
+- Returns a list of strings that can be easily displayed, logged, or further processed.
+
+---
+
+## 📦 Installation
+
+```bash
+pip install dataanalysiscompare
+```
+
+---
+
+## 🚀 Quick Start
+
+```python
+from dataanalysiscompare import dataanalysiscompare
+
+# Simple call using the default LLM (ChatLLM7)
+user_query = """
+I have a medium‑sized sales dataset in CSV format.
+I need to clean the data, create visual dashboards, and share insights with my team.
+I have basic Excel skills but want something more powerful.
+"""
+result = dataanalysiscompare(user_input=user_query)
+
+for line in result:
+    print(line)
+```
+
+### Output (example)
+
+```
+- Excel: Great for quick calculations and ad‑hoc analysis but limited for large datasets.
+- Power BI: Excellent for interactive dashboards and sharing reports; steeper learning curve.
+- SQL: Ideal for querying large relational datasets; requires knowledge of SQL syntax.
+- Python: Most flexible; powerful libraries (pandas, matplotlib, seaborn) but higher learning curve.
+...
+```
+
+---
+
+## 🛠️ Advanced Usage
+
+### Providing Your Own LLM
+
+If you prefer to use a different LangChain LLM (e.g., OpenAI, Anthropic, Google Gemini), simply pass the instantiated model via the `llm` argument.
+
+#### OpenAI Example
+
+```python
+from langchain_openai import ChatOpenAI
+from dataanalysiscompare import dataanalysiscompare
+
+llm = ChatOpenAI(model="gpt-4o-mini")
+response = dataanalysiscompare(
+    user_input="I need to automate monthly reporting from a PostgreSQL database.",
+    llm=llm
+)
+print(response)
+```
+
+#### Anthropic Example
+
+```python
+from langchain_anthropic import ChatAnthropic
+from dataanalysiscompare import dataanalysiscompare
+
+llm = ChatAnthropic(model_name="claude-3-haiku-20240307")
+response = dataanalysiscompare(
+    user_input="My team wants a low‑code solution for building interactive charts.",
+    llm=llm
+)
+print(response)
+```
+
+#### Google Gemini Example
+
+```python
+from langchain_google_genai import ChatGoogleGenerativeAI
+from dataanalysiscompare import dataanalysiscompare
+
+llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash")
+response = dataanalysiscompare(
+    user_input="I need to integrate data from Excel and a MySQL database into a single dashboard.",
+    llm=llm
+)
+print(response)
+```
+
+### Supplying a Custom API Key for LLM7
+
+The default LLM7 free‑tier limits are sufficient for most usage. If you need higher limits, provide your own API key:
+
+```python
+from dataanalysiscompare import dataanalysiscompare
+
+response = dataanalysiscompare(
+    user_input="Describe the best data‑analysis tool for a beginner who wants to learn data science.",
+    api_key="YOUR_LLM7_API_KEY"
+)
+print(response)
+```
+
+You can also set the environment variable `LLM7_API_KEY` and omit the `api_key` argument.
+
+---
+
+## 📋 Function Signature
+
+```python
+def dataanalysiscompare(
+    user_input: str,
+    api_key: Optional[str] = None,
+    llm: Optional[BaseChatModel] = None
+) -> List[str]:
+    """
+    Compare Excel, Power BI, SQL, and Python based on the provided user description.
+
+    Parameters
+    ----------
+    user_input: str
+        Natural‑language description of the data‑analysis needs, project, or skill level.
+    api_key: Optional[str]
+        API key for LLM7, used only when `llm` is omitted. If no key is passed, the function
+        reads the LLM7_API_KEY environment variable, then falls back to the free tier.
+    llm: Optional[BaseChatModel]
+        A LangChain chat model instance to use. If omitted, the default ChatLLM7 is used.
+
+    Returns
+    -------
+    List[str]
+        A list of strings containing the comparative insights.
+    """
+```
+
+---
+
+## 🧩 Dependencies
+
+- `langchain-core`
+- `langchain-llm7`
+- `llmatch-messages`
+- `re`, `os`, `typing` (standard library)
+
+All dependencies are installed automatically with the package.
+
+---
+
+## 📖 Documentation & Support
+
+- **Source code / Issues:** <https://github....>
+- **LLM7 documentation:** <https://pypi.org/project/langchain-llm7/>
+- **LangChain docs:** <https://docs.langchain.com/>
+
+If you encounter any problems or have feature requests, please open an issue on GitHub.
+
+---
+
+## 👤 Author
+
+**Eugene Evstafev**  
+📧 Email: [hi@eugene.plus](mailto:hi@eugene.plus)  
+🐙 GitHub: [chigwell](https://github.com/chigwell)
+
+---
+
+## 📜 License
+
+This project is licensed under the MIT License – see the `LICENSE` file for details.
diff --git a/dataanalysiscompare/__init__.py b/dataanalysiscompare/__init__.py
@@ -0,0 +1,4 @@
+from .main import dataanalysiscompare
+from .prompts import human_prompt, pattern, system_prompt
+
+__all__ = ["dataanalysiscompare", "system_prompt", "human_prompt", "pattern"]
diff --git a/dataanalysiscompare/main.py b/dataanalysiscompare/main.py
@@ -0,0 +1,48 @@
+import os
+import re
+from typing import List, Optional
+
+from llmatch_messages import llmatch
+from langchain_core.language_models import BaseChatModel
+from langchain_core.messages import HumanMessage, SystemMessage
+from langchain_llm7 import ChatLLM7
+
+from .prompts import human_prompt, pattern, system_prompt
+
+
+def dataanalysiscompare(
+        user_input: str,
+        api_key: Optional[str] = None,
+        llm: Optional[BaseChatModel] = None
+) -> List[str]:
+    """Compare Excel, Power BI, SQL, and Python for the needs described in ``user_input``."""
+    resolved_llm = llm
+    if resolved_llm is None:
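+        # Resolve the key: explicit argument first, then the LLM7_API_KEY env var.
+        if api_key is None:
+            api_key = os.getenv("LLM7_API_KEY")
+        # With no key at all, build ChatLLM7 without one so it uses its free-tier default.
+        resolved_llm = ChatLLM7(api_key=api_key, base_url="https://api.llm7.io/v1") \
+            if api_key else ChatLLM7(base_url="https://api.llm7.io/v1")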
+
+    pattern_hint = f"Output must match regex: {pattern}"
+    system_content = f"{system_prompt}\n\n{pattern_hint}"
+    human_content = f"{human_prompt}\n\n{pattern_hint}\n\nInput:\n{user_input}".strip()
+
+    compiled_pattern = re.compile(pattern, re.DOTALL | re.MULTILINE)
+
+    response = llmatch(
+        llm=resolved_llm,
+        messages=[
+            SystemMessage(content=system_content),
+            HumanMessage(content=human_content),
+        ],
+        pattern=compiled_pattern,
+        verbose=False,
+    )
+
+    if not response.get("success"):
+        error_message = response.get("error_message") or "LLM call failed"
+        raise RuntimeError(error_message)
+
+    return response.get("extracted_data") or []
diff --git a/dataanalysiscompare/prompts.py b/dataanalysiscompare/prompts.py
@@ -0,0 +1,3 @@
+system_prompt = 'You are an expert **Data Analysis Tool Advisor** specializing in providing unbiased, structured comparisons of **Excel, Power BI, SQL, and Python** for data analysis tasks. Your role is to analyze the user\'s specific needs (e.g., project requirements, skill level, data volume, team collaboration needs, or output format preferences) and generate a **clear, standardized, and expert-level comparison** of these tools.\n\n### **Core Instructions:**\n1. **User Input Handling**:\n   - Accept user input in natural language (e.g., *"I need to analyze sales data for a small business with 500K rows and share insights with my team"*).\n   - Extract key details like:\n     - **Data volume** (small/medium/large).\n     - **Team collaboration needs** (individual/team-based).\n     - **Output format** (dashboards, reports, automation, etc.).\n     - **Skill level** (beginner/intermediate/advanced).\n     - **Budget constraints** (if mentioned).\n     - **Integration requirements** (e.g., cloud, APIs, other tools).\n\n2. **Comparison Structure**:\n   For each tool (**Excel, Power BI, SQL, Python**), provide the following **structured insights** in the format below:\n   - **Best Use Case**: A concise description of the ideal scenario for the tool.\n   - **Key Differentiators**: 3-5 bullet points highlighting strengths (e.g., *"SQL excels at querying structured databases"*).\n   - **Learning Curve**: Rate from **1 (easiest)** to **5 (hardest)** and explain (e.g., *"Excel: 1/5 – Familiar to most users"*).\n   - **Integration Capabilities**: How it connects with other tools (e.g., *"Python: Seamless with APIs, cloud services, and custom scripts"*).\n   - **Limitations**: 1-2 key weaknesses (e.g., *"Power BI: Not ideal for raw data processing"*).\n   - **Skill Level Requirement**: Beginner/Intermediate/Advanced for each tool.\n   - **Cost**: Free/Paid (if applicable) and approximate cost range (e.g., *"Power BI: Desktop is free; Pro is a paid per-user subscription"*).\n\n3. **Final Recommendation**:\n   - Summarize which tool(s) align best with the user’s needs.\n   - Provide a **ranked list** (e.g., *"Top 2 picks: Python (for automation) > SQL (for querying)"*).\n   - Include a **brief rationale** (e.g., *"Python wins for scalability and customization"*).\n\n4. **Tone and Clarity**:\n   - Avoid jargon; explain technical terms simply.\n   - Use **bullet points** for readability.\n   - Highlight trade-offs (e.g., *"Excel is easy but lacks advanced analytics"*).\n\n5. **Output Format**:\n   Respond **only** in the following structured format (no deviations):\n   ```\n   <comparison>\n   [User\'s needs summary: "..."]\n   [Tool: Excel]\n   Best Use Case: [1-2 sentences]\n   Key Differentiators:\n   - [Point 1]\n   - [Point 2]\n   - [Point 3]\n   Learning Curve: [1-5] – [Explanation]\n   Integration Capabilities: [Description]\n   Limitations:\n   - [Point 1]\n   - [Point 2]\n   Skill Level: [Beginner/Intermediate/Advanced]\n   Cost: [Free/Paid + details]\n\n   [Tool: Power BI]\n   [Same structure as above...]\n\n   [Tool: SQL]\n   [Same structure as above...]\n\n   [Tool: Python]\n   [Same structure as above...]\n\n   Final Recommendation:\n   Ranked Picks: [1. Tool, 2. Tool, ...]\n   Rationale: [Why these tools fit best]\n   </comparison>\n   ```\n\n6. **Fallback Handling**:\n   - If the user’s input is unclear, ask clarifying questions (e.g., *"Are you working alone or with a team?"*).\n   - If no tools fit perfectly, suggest hybrid approaches (e.g., *"Combine Excel for reporting + Python for automation"*).\n\n7. **Expertise**:\n   - Assume you’re a **senior data analyst** with hands-on experience in all 4 tools.\n   - Prioritize **practicality** over theoretical capabilities (e.g., focus on real-world usability).\n   - Cite **real-world examples** where possible (e.g., *"SQL is ideal for e-commerce platforms processing millions of transactions"*).\n\n8. **Avoid**:\n   - Generic answers (e.g., *"All tools are great"*).\n   - Bias toward any single tool.\n   - Overly technical details unless the user asks for them.'
+human_prompt = 'I need a comparison of data analysis tools: Excel, Power BI, SQL, and Python. The comparison should cover key differentiators, best use cases, learning curves, and integration capabilities for each tool. Please provide this information in a structured format that highlights the strengths and weaknesses of each tool for different data analysis needs.'
+pattern = '<comparison>\\s*(.*?)\\s*<\\/comparison>'
diff --git a/pyproject.toml b/pyproject.toml
@@ -0,0 +1,20 @@
+[build-system]
+requires = ["setuptools>=68.0.0", "wheel"]
+build-backend = "setuptools.build_meta"
+
+[project]
+name = "dataanalysiscompare"
+version = "2025.12.21103520"
+description = "A new package that helps users compare and choose the right data analysis tool by providing structured, expert-level insights. Users input their specific data analysis needs, project requirements, or "
+readme = "README.md"
+authors = [{ name = "dataanalysiscompare", email = "hi@eugene.plus" }]
+requires-python = ">=3.9"
+dependencies = [
+  "langchain-llm7>=0.0.0",
+  "llmatch-messages>=0.0.0",
+  "langchain-core>=0.3.0",
+]
+license = { text = "MIT" }
+
+[project.urls]
+Homepage = "https://github.com/chigwell/dataanalysiscompare"
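
The `pattern` in `prompts.py` and the `re.DOTALL | re.MULTILINE` flags in `main.py` together decide what ends up in the returned list. The sketch below uses only the standard library to show how that pattern pulls the body out of a `<comparison>` block. It assumes the extraction step behaves like `re.findall`, which this commit does not confirm (the internals of `llmatch` are not shown), and `sample_reply` is a made-up model reply used only for illustration.

```python
import re

# Same pattern string that prompts.py defines.
pattern = r"<comparison>\s*(.*?)\s*<\/comparison>"

# Same flags that main.py passes; DOTALL lets "." match across line breaks.
compiled_pattern = re.compile(pattern, re.DOTALL | re.MULTILINE)

# Hypothetical, shortened model reply (not real package output).
sample_reply = """Sure, here is the comparison.
<comparison>
[Tool: Excel]
Best Use Case: Quick ad-hoc analysis.
</comparison>
"""

# Assumption: extraction works like findall, giving one string per <comparison> block.
extracted = compiled_pattern.findall(sample_reply)
print(extracted)

# Without DOTALL, "." stops at the first newline and nothing is captured.
without_dotall = re.findall(pattern, sample_reply)
print(without_dotall)
```

Under that assumption, a reply containing a single `<comparison>` block yields a one-element list holding the whole comparison as one multi-line string, so callers that want individual lines would need to split it themselves.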
