hero.title

hero.subtitle

hero.codeComment1
from lib import PipelineManager
pipeline = PipelineManager()
data = pipeline.process_files(["report.pdf", "data.csv"])

hero.codeComment2
hero.codeComment3
hero.codeComment4

features.title

features.subtitle

📄

features.universalFiles.title

features.universalFiles.description

🇹🇭

features.thaiOptimized.title

features.thaiOptimized.description

🤖

features.aiEnhancement.title

features.aiEnhancement.description

⚡

features.batchProcessing.title

features.batchProcessing.description

🎯

features.exportFormats.title

features.exportFormats.description

🔧

features.qualityControl.title

features.qualityControl.description

pipeline.title

pipeline.subtitle

pipeline.steps.input.title

pipeline.steps.input.description

pipeline.steps.cleaning.title

pipeline.steps.cleaning.description

pipeline.steps.transformation.title

pipeline.steps.transformation.description

pipeline.steps.enhancement.title

pipeline.steps.enhancement.description

pipeline.steps.validation.title

pipeline.steps.validation.description

pipeline.steps.export.title

pipeline.steps.export.description

useCases.title

useCases.subtitle

🦙

useCases.thaiModels.title

useCases.thaiModels.description

📋

useCases.documentClassification.title

useCases.documentClassification.description

💬

useCases.conversationalAI.title

useCases.conversationalAI.description

📊

useCases.dataAnalysis.title

useCases.dataAnalysis.description

🌐

useCases.webContent.title

useCases.webContent.description

⚡

useCases.prototyping.title

useCases.prototyping.description

techSpecs.title

techSpecs.subtitle

techSpecs.inputFormats.title

PDF documents with text extraction
Word documents (.docx, .doc)
Excel spreadsheets (.xlsx, .xls)
CSV and JSON data files
Plain text files
Web scraping with CSS selectors
Manual text input with templates

techSpecs.llmIntegration.title

OpenAI (GPT-3.5, GPT-4)
Anthropic Claude
Azure OpenAI
Google PaLM/Gemini
Custom OpenAI-compatible APIs
Batch processing for cost savings
Rate limiting and error recovery

techSpecs.exportFormats.title

Alpaca/Llama instruction format
ShareGPT conversation format
Vicuna training format
OpenAI fine-tuning JSONL
Hugging Face datasets
FLAN format (Google)
Custom JSON templates

techSpecs.thaiFeatures.title

PyThaiNLP integration
Word segmentation (newmm, attacut)
Syllable counting
Text normalization
Mixed Thai-English support
Readability scoring
Sentiment detection

techSpecs.qualityControls.title

Automatic duplicate detection
Text length validation
Character encoding checks
Label consistency validation
Quality scoring system
Train/validation splitting
Data balancing options

techSpecs.requirements.title

Python 3.8+ environment
Jupyter Notebook support
Auto-installs dependencies
Cross-platform compatibility
Modular library design
Progress tracking and logging
Error handling and recovery

finalCta.title

finalCta.subtitle

finalCta.downloadBtn finalCta.docsBtn

finalCta.features

contact.title

📧

contact.email.title

contact.email.description

piyapath59@gmail.com