hero.title

hero.subtitle

hero.codeComment1
from lib import PipelineManager
pipeline = PipelineManager()
data = pipeline.process_files(["report.pdf", "data.csv"])

hero.codeComment2
hero.codeComment3
hero.codeComment4

features.title

features.subtitle

๐Ÿ“„

features.universalFiles.title

features.universalFiles.description

๐Ÿ‡น๐Ÿ‡ญ

features.thaiOptimized.title

features.thaiOptimized.description

๐Ÿค–

features.aiEnhancement.title

features.aiEnhancement.description

โšก

features.batchProcessing.title

features.batchProcessing.description

๐ŸŽฏ

features.exportFormats.title

features.exportFormats.description

๐Ÿ”ง

features.qualityControl.title

features.qualityControl.description

pipeline.title

pipeline.subtitle

1

pipeline.steps.input.title

pipeline.steps.input.description

2

pipeline.steps.cleaning.title

pipeline.steps.cleaning.description

3

pipeline.steps.transformation.title

pipeline.steps.transformation.description

4

pipeline.steps.enhancement.title

pipeline.steps.enhancement.description

5

pipeline.steps.validation.title

pipeline.steps.validation.description

6

pipeline.steps.export.title

pipeline.steps.export.description

useCases.title

useCases.subtitle

๐Ÿฆ™

useCases.thaiModels.title

useCases.thaiModels.description

๐Ÿ“‹

useCases.documentClassification.title

useCases.documentClassification.description

๐Ÿ’ฌ

useCases.conversationalAI.title

useCases.conversationalAI.description

๐Ÿ“Š

useCases.dataAnalysis.title

useCases.dataAnalysis.description

๐ŸŒ

useCases.webContent.title

useCases.webContent.description

โšก

useCases.prototyping.title

useCases.prototyping.description

techSpecs.title

techSpecs.subtitle

techSpecs.inputFormats.title

  • PDF documents with text extraction
  • Word documents (.docx, .doc)
  • Excel spreadsheets (.xlsx, .xls)
  • CSV and JSON data files
  • Plain text files
  • Web scraping with CSS selectors
  • Manual text input with templates

techSpecs.llmIntegration.title

  • OpenAI (GPT-3.5, GPT-4)
  • Anthropic Claude
  • Azure OpenAI
  • Google PaLM/Gemini
  • Custom OpenAI-compatible APIs
  • Batch processing for cost savings
  • Rate limiting and error recovery

techSpecs.exportFormats.title

  • Alpaca/Llama instruction format
  • ShareGPT conversation format
  • Vicuna training format
  • OpenAI fine-tuning JSONL
  • Hugging Face datasets
  • FLAN format (Google)
  • Custom JSON templates

techSpecs.thaiFeatures.title

  • PyThaiNLP integration
  • Word segmentation (newmm, attacut)
  • Syllable counting
  • Text normalization
  • Mixed Thai-English support
  • Readability scoring
  • Sentiment detection

techSpecs.qualityControls.title

  • Automatic duplicate detection
  • Text length validation
  • Character encoding checks
  • Label consistency validation
  • Quality scoring system
  • Train/validation splitting
  • Data balancing options

techSpecs.requirements.title

  • Python 3.8+ environment
  • Jupyter Notebook support
  • Auto-installs dependencies
  • Cross-platform compatibility
  • Modular library design
  • Progress tracking and logging
  • Error handling and recovery

finalCta.title

finalCta.subtitle

finalCta.features

contact.title

๐Ÿ“ง

contact.email.title

contact.email.description

piyapath59@gmail.com