hero.title
hero.subtitle
hero.codeComment1
from lib import PipelineManager
pipeline = PipelineManager()
data = pipeline.process_files(["report.pdf", "data.csv"])
hero.codeComment2
hero.codeComment3
hero.codeComment4
from lib import PipelineManager
pipeline = PipelineManager()
data = pipeline.process_files(["report.pdf", "data.csv"])
hero.codeComment2
hero.codeComment3
hero.codeComment4
features.title
features.subtitle
features.universalFiles.title
features.universalFiles.description
features.thaiOptimized.title
features.thaiOptimized.description
features.aiEnhancement.title
features.aiEnhancement.description
features.batchProcessing.title
features.batchProcessing.description
features.exportFormats.title
features.exportFormats.description
features.qualityControl.title
features.qualityControl.description
pipeline.title
pipeline.subtitle
1
pipeline.steps.input.title
pipeline.steps.input.description
2
pipeline.steps.cleaning.title
pipeline.steps.cleaning.description
3
pipeline.steps.transformation.title
pipeline.steps.transformation.description
4
pipeline.steps.enhancement.title
pipeline.steps.enhancement.description
5
pipeline.steps.validation.title
pipeline.steps.validation.description
6
pipeline.steps.export.title
pipeline.steps.export.description
useCases.title
useCases.subtitle
useCases.thaiModels.title
useCases.thaiModels.description
useCases.documentClassification.title
useCases.documentClassification.description
useCases.conversationalAI.title
useCases.conversationalAI.description
useCases.dataAnalysis.title
useCases.dataAnalysis.description
useCases.webContent.title
useCases.webContent.description
useCases.prototyping.title
useCases.prototyping.description
techSpecs.title
techSpecs.subtitle
techSpecs.inputFormats.title
- PDF documents with text extraction
- Word documents (.docx, .doc)
- Excel spreadsheets (.xlsx, .xls)
- CSV and JSON data files
- Plain text files
- Web scraping with CSS selectors
- Manual text input with templates
techSpecs.llmIntegration.title
- OpenAI (GPT-3.5, GPT-4)
- Anthropic Claude
- Azure OpenAI
- Google PaLM/Gemini
- Custom OpenAI-compatible APIs
- Batch processing for cost savings
- Rate limiting and error recovery
techSpecs.exportFormats.title
- Alpaca/Llama instruction format
- ShareGPT conversation format
- Vicuna training format
- OpenAI fine-tuning JSONL
- Hugging Face datasets
- FLAN format (Google)
- Custom JSON templates
techSpecs.thaiFeatures.title
- PyThaiNLP integration
- Word segmentation (newmm, attacut)
- Syllable counting
- Text normalization
- Mixed Thai-English support
- Readability scoring
- Sentiment detection
techSpecs.qualityControls.title
- Automatic duplicate detection
- Text length validation
- Character encoding checks
- Label consistency validation
- Quality scoring system
- Train/validation splitting
- Data balancing options
techSpecs.requirements.title
- Python 3.8+ environment
- Jupyter Notebook support
- Auto-installs dependencies
- Cross-platform compatibility
- Modular library design
- Progress tracking and logging
- Error handling and recovery
finalCta.title
finalCta.subtitle
finalCta.features