Commit 58512a9

Updated ReadMe
1 parent 6324713 commit 58512a9

4 files changed

Lines changed: 85 additions & 79 deletions

README.md

Lines changed: 26 additions & 17 deletions
@@ -58,33 +58,42 @@ markitdown document.docx --llm
 ### Python API
 
 ```python
-from openize.markitdown import DocumentProcessor
+from _markitdown import MarkItDown
 
-# Initialize with custom output directory
-processor = DocumentProcessor(output_dir="my_markdown_files")
+# Define input file and output directory
+input_file = "report.pdf"
+output_dir = "output_markdown"
 
-# Convert files and save locally
-processor.process_document("document.docx")
-processor.process_document("presentation.pptx")
-processor.process_document("spreadsheet.xlsx")
-processor.process_document("sample.pdf")
+# Create MarkItDown instance
+converter = MarkItDown(output_dir)
 
-# Send to LLM for processing (requires OPENAI_API_KEY environment variable)
-processor.process_document("document.docx", insert_into_llm=True)
-```
+# Convert document and send output to LLM
+converter.convert_document(input_file, insert_into_llm=True)
+
+print("Conversion completed and data sent to LLM.")```
 
 ## Environment Variables
 
+- `ASPOSE_LICENSE_PATH`: Required when using the Aspose Paid APIs. This should be set to the full path of your Aspose license file.
 - `OPENAI_API_KEY`: Required when using the `insert_into_llm=True` option or the `--llm` flag.
+- `OPENAI_MODEL`: Specifies the OpenAI model name (default: `gpt-4`).
 
-## Running Tests
+To set these variables:
 
-```sh
-# Install test dependencies
-pip install pytest pytest-mock
+For Unix-based systems:
+
+```bash
+export ASPOSE_LICENSE_PATH="/path/to/license"
+export OPENAI_API_KEY="your-api-key"
+export OPENAI_MODEL="gpt-4"
+```
+
+For Windows (PowerShell):
 
-# Run the tests
-pytest
+```powershell
+$env:ASPOSE_LICENSE_PATH = "C:\path\to\license"
+$env:OPENAI_API_KEY = "your-api-key"
+$env:OPENAI_MODEL = "gpt-4"
 ```
 
 ## Contributing
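
For readers following the README change above, here is a minimal Python sketch of how the three documented variables might be read at runtime. The helper name `load_settings` and the dictionary keys are assumptions made for illustration and do not come from the commit; only the variable names and the `gpt-4` default are taken from the README text.

```python
import os


def load_settings(env=os.environ):
    """Collect the documented variables from a mapping (os.environ by default).

    Hypothetical helper: the commit itself does not define this function.
    """
    return {
        # Only needed when the Aspose paid APIs are used.
        "aspose_license_path": env.get("ASPOSE_LICENSE_PATH"),
        # Only needed when output is sent to the LLM.
        "openai_api_key": env.get("OPENAI_API_KEY"),
        # Falls back to the default named in the README when unset or empty.
        "openai_model": env.get("OPENAI_MODEL") or "gpt-4",
    }


# Demonstration with an explicit mapping instead of the real environment.
settings = load_settings({"OPENAI_API_KEY": "your-api-key"})
print(settings["openai_model"])
```

Passing a plain dictionary keeps the sketch testable without touching the real process environment.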

packages/markitdown/README.md

Lines changed: 26 additions & 30 deletions
@@ -52,55 +52,51 @@ markitdown document.docx
 markitdown document.docx -o output_folder
 
 # Process with an LLM (requires OPENAI_API_KEY environment variable)
-markitdown document.docx --llm
+markitdown document.docx -o output_folder --insert-into-llm
 ```
 
 ### Python API
 
 ```python
-from openize.markitdown import DocumentProcessor
+from _markitdown import MarkItDown
 
-# Initialize with custom output directory
-processor = DocumentProcessor(output_dir="my_markdown_files")
+# Define input file and output directory
+input_file = "report.pdf"
+output_dir = "output_markdown"
 
-# Convert files and save locally
-processor.process_document("document.docx")
-processor.process_document("presentation.pptx")
-processor.process_document("spreadsheet.xlsx")
-processor.process_document("sample.pdf")
+# Create MarkItDown instance
+converter = MarkItDown(output_dir)
+
+# Convert document and send output to LLM
+converter.convert_document(input_file, insert_into_llm=True)
+
+print("Conversion completed and data sent to LLM.")
 
-# Send to LLM for processing (requires OPENAI_API_KEY environment variable)
-processor.process_document("document.docx", insert_into_llm=True)
 ```
 
 ## Environment Variables
 
+- `ASPOSE_LICENSE_PATH`: Required when using the Aspose Paid APIs. This should be set to the full path of your Aspose license file.
 - `OPENAI_API_KEY`: Required when using the `insert_into_llm=True` option or the `--llm` flag.
+- `OPENAI_MODEL`: Specifies the OpenAI model name (default: `gpt-4`).
 
-## Running Tests
+To set these variables:
 
-```sh
-# Install test dependencies
-pip install pytest pytest-mock
+For Unix-based systems:
 
-# Run the tests
-pytest
+```bash
+export ASPOSE_LICENSE_PATH="/path/to/license"
+export OPENAI_API_KEY="your-api-key"
+export OPENAI_MODEL="gpt-4"
 ```
 
-## Contributing
+For Windows (PowerShell):
 
-We appreciate your interest in contributing to this project! To ensure a smooth collaboration, please follow these steps when submitting a pull request:
-
-1. **Fork & Clone** – Fork the repository and clone it to your local machine.
-2. **Create a Branch** – Use a new branch for your contribution.
-3. **Sign the Contributor License Agreement (CLA)** – Before your first contribution can be accepted, you must sign our CLA via [CLA Assistant](https://cla-assistant.io). You will be prompted to sign it when submitting your first pull request. You can also review the CLA here: [https://cla.openize.com/agreement](https://cla.openize.com/agreement).
-4. **Submit a Pull Request (PR)** – Once your changes are ready, open a PR with a clear description.
-5. **Review & Feedback** – Our maintainers will review your PR and provide feedback if needed.
-
-By contributing, you agree to the terms of the CLA and confirm that your changes comply with the project's licensing policies.
+```powershell
+$env:ASPOSE_LICENSE_PATH = "C:\path\to\license"
+$env:OPENAI_API_KEY = "your-api-key"
+$env:OPENAI_MODEL = "gpt-4"
+```
 
-## License
 
-This package is licensed under the MIT License. However, it depends on Aspose libraries, which are proprietary, closed-source libraries.
 
-⚠️ Users must obtain a valid license for Aspose libraries separately. This repository does not include or distribute any proprietary components.
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+from processor import DocumentProcessor
+
+
+class MarkItDown:
+    def __init__(self, output_dir):
+        self.output_dir = output_dir
+
+    def convert_document(self, input_file, insert_into_llm=False):
+        """Run the document conversion process."""
+        processor = DocumentProcessor(self.output_dir)
+        processor.process_document(input_file, insert_into_llm)
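
The new class is a thin wrapper that forwards each call to a freshly created `DocumentProcessor`. The sketch below illustrates that delegation without the real `processor` module. `RecordingProcessor` and the `processor_cls` parameter are hypothetical stand-ins added only so the example runs on its own; the class in the commit takes just `output_dir`.

```python
class RecordingProcessor:
    """Hypothetical stand-in for DocumentProcessor: records calls, converts nothing."""

    calls = []

    def __init__(self, output_dir):
        self.output_dir = output_dir

    def process_document(self, input_file, insert_into_llm=False):
        RecordingProcessor.calls.append((self.output_dir, input_file, insert_into_llm))


class MarkItDown:
    """Same shape as the class in the diff, with an injectable processor (assumption)."""

    def __init__(self, output_dir, processor_cls=RecordingProcessor):
        self.output_dir = output_dir
        self.processor_cls = processor_cls

    def convert_document(self, input_file, insert_into_llm=False):
        # A new processor is built per call, mirroring the code in the commit.
        processor = self.processor_cls(self.output_dir)
        processor.process_document(input_file, insert_into_llm)


MarkItDown("output_markdown").convert_document("report.pdf", insert_into_llm=True)
print(RecordingProcessor.calls)
```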

packages/markitdown/src/openize/markitdown/main.py

Lines changed: 22 additions & 32 deletions
@@ -2,9 +2,8 @@
 import os
 import sys
 import logging
-from processor import DocumentProcessor
+from _markitdown import MarkItDown
 from license_manager import LicenseManager
-from llm_strategy import InsertIntoLLM
 
 
 def ask_user_boolean(question):
@@ -19,12 +18,12 @@ def ask_user_boolean(question):
             print("Invalid input. Please enter 'yes' or 'no'.")
 
 
-def ensure_env_variable(var_name, prompt_message):
+def ensure_env_variable(var_name, prompt_message, default=None):
     """Ensure an environment variable is set, otherwise ask the user and persist it."""
     value = os.getenv(var_name)
 
     if not value:
-        value = input(prompt_message).strip()
+        value = input(prompt_message).strip() or default
         if value:
             set_env_variable(var_name, value)
         else:
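
The `or default` change in this hunk relies on an empty string being falsy in Python, so a blank reply falls through to the default. A minimal sketch of the idiom follows; the function name `prompt_with_default` and the `read` parameter are hypothetical, introduced here so the behaviour can be exercised without interactive input.

```python
def prompt_with_default(prompt_message, default=None, read=input):
    """Return the stripped reply, or `default` when the reply is blank.

    `read` defaults to the built-in input(); it is injectable only for this sketch.
    """
    return read(prompt_message).strip() or default


# Simulated replies: a blank line, then an explicit model name.
blank_reply = prompt_with_default("Model: ", default="gpt-4", read=lambda _: "   ")
typed_reply = prompt_with_default("Model: ", default="gpt-4", read=lambda _: " custom-model ")
print(blank_reply, typed_reply)
```

In the hunk, the fallback is applied before the `if value:` check, so when the user presses Enter the default is also the value handed to `set_env_variable`.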
@@ -44,44 +43,35 @@ def set_env_variable(var_name, value):
         os.system(f'echo "export {var_name}={value}" >> ~/.profile')
 
 
-def run_conversion(input_file, output_dir, insert_into_llm=False):
-    """Handle license setup, document processing, and optional LLM integration."""
-
-    # Ask user if they want to use Aspose Paid APIs
-    use_aspose = ask_user_boolean("Do you want to use the Aspose Paid APIs?")
-
-    if use_aspose:
-        ensure_env_variable("ASPOSE_LICENSE_PATH", "Enter the full path of your Aspose license file: ")
-        license_manager = LicenseManager()
-        license_manager.apply_license()
-    # Only ask for OpenAI credentials if --insert-into-llm was specified
-
-    if insert_into_llm:
-        ensure_env_variable("OPENAI_API_KEY", "Enter your OpenAI API key: ")
-        ensure_env_variable("OPENAI_MODEL", "Enter OpenAI model name (default: gpt-4): ")
-
-
-
-    processor = DocumentProcessor(output_dir)
-    processor.process_document(input_file, insert_into_llm)
-
-
-
-
 def main():
     """Entry point for the CLI tool."""
+    logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s")
+
     parser = argparse.ArgumentParser(description="Convert documents to Markdown.")
     parser.add_argument("input_file", help="Path to the input document (PDF, Word, etc.)")
-    parser.add_argument("output_dir", help="Directory to save the converted Markdown file")
+    parser.add_argument("-o", "--output-dir", required=True, help="Directory to save the converted Markdown file")
     parser.add_argument("--insert-into-llm", action="store_true", help="Insert output into LLM")
 
     args = parser.parse_args()
 
     try:
-        run_conversion(args.input_file, args.output_dir, args.insert_into_llm)
+        # Setup Aspose License if needed
+        if ask_user_boolean("Do you want to use the Aspose Paid APIs?"):
+            license_path = ensure_env_variable("ASPOSE_LICENSE_PATH", "Enter the full path of your Aspose license file: ")
+            if license_path:
+                LicenseManager().apply_license()
+
+        # Setup LLM credentials only if required
+        if args.insert_into_llm:
+            ensure_env_variable("OPENAI_API_KEY", "Enter your OpenAI API key: ")
+            ensure_env_variable("OPENAI_MODEL", "Enter OpenAI model name (default: gpt-4): ", default="gpt-4")
+
+        # Run conversion
+        markitdown = MarkItDown(args.output_dir)
+        markitdown.convert_document(args.input_file, args.insert_into_llm)
+
     except Exception as e:
-        print(f"Error: {e}")
-        logging.error(e)
+        logging.error(f"Error: {e}", exc_info=True)
         sys.exit(1)
 
 
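
The argument changes in `main()` can be tried in isolation. The sketch below rebuilds only the three `add_argument` calls shown in the diff and parses a sample argument list; the `build_parser` wrapper and the sample values are assumptions for illustration, not part of the commit.

```python
import argparse


def build_parser():
    """Recreate the parser from the diff (the wrapper function itself is hypothetical)."""
    parser = argparse.ArgumentParser(description="Convert documents to Markdown.")
    parser.add_argument("input_file", help="Path to the input document (PDF, Word, etc.)")
    # The output directory is now a required option instead of a second positional.
    parser.add_argument("-o", "--output-dir", required=True,
                        help="Directory to save the converted Markdown file")
    parser.add_argument("--insert-into-llm", action="store_true", help="Insert output into LLM")
    return parser


# Parse an explicit list rather than sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["report.pdf", "-o", "output_markdown", "--insert-into-llm"])
print(args.input_file, args.output_dir, args.insert_into_llm)
```

argparse converts the hyphens in `--output-dir` and `--insert-into-llm` to underscores, which is why the diff reads `args.output_dir` and `args.insert_into_llm`. Omitting `-o` makes argparse print a usage error and exit with status 2. The parser shown in this hunk defines no `--llm` option, although the README text still mentions that flag.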
