E.V.E Dataset Builder

Upload files and/or paste URLs. The system will extract and clean the text, then create:
- an eve_corpus-*.txt file (plain text corpus)
- an eve_docs-*.jsonl file (one JSON object per line: {source_type, source, text, created_at, chunk_index})
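
As a rough illustration of the JSONL layout described above, the sketch below builds one record per text chunk and serializes each record on its own line. Only the five field names come from this page. The helper names (make_records, to_jsonl), the fixed-size character chunking, the chunk_size default, and the ISO 8601 UTC timestamp are assumptions made for the example and may not match what the builder actually does.

```python
import json
from datetime import datetime, timezone


def make_records(source_type, source, text, chunk_size=1000):
    """Split text into fixed-size chunks and wrap each chunk in a record.

    Assumption: chunking by character count; the real tool may chunk differently.
    """
    # Assumption: one shared ISO 8601 UTC timestamp per source document.
    created_at = datetime.now(timezone.utc).isoformat()
    return [
        {
            "source_type": source_type,
            "source": source,
            "text": text[start:start + chunk_size],
            "created_at": created_at,
            "chunk_index": index,
        }
        for index, start in enumerate(range(0, len(text), chunk_size))
    ]


def to_jsonl(records):
    """Serialize records as JSON Lines: one JSON object per line."""
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)


if __name__ == "__main__":
    demo = make_records("url", "https://example.com", "abcdefghij", chunk_size=4)
    print(to_jsonl(demo), end="")
```

Reading the file back is the mirror image: iterate over the lines and call json.loads on each one.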

Step 1 (optional): Upload source files (you can select multiple):

Supported: .pdf, .txt, .md, .html, .htm

Step 2 (optional): Paste URLs (one per line):

The system will download each page, strip the HTML tags, and keep only the visible text. URLs that do not use HTTP or HTTPS are skipped.
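
The URL handling described above can be sketched roughly as follows, using only the Python standard library. The names is_http_url and visible_text are hypothetical, and treating the contents of script, style, and noscript elements as non-visible is an assumption. The download step is left out so the example runs without network access.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


def is_http_url(url):
    """Return True only for http:// or https:// URLs that name a host."""
    parsed = urlparse(url.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


class _VisibleText(HTMLParser):
    # Assumption: text inside these elements is never shown to the reader.
    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is outside any skipped element.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def visible_text(html):
    """Drop the tags and return the remaining text joined by single spaces."""
    parser = _VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


if __name__ == "__main__":
    pasted = ["https://example.com/page", "ftp://example.com/file", "mailto:a@b.c"]
    print([u for u in pasted if is_http_url(u)])
    print(visible_text("<h1>Title</h1><script>var x = 1;</script><p>Hello <b>world</b></p>"))
```

Checking the parsed scheme rather than a string prefix means that inputs such as "HTTPS://Example.com" are accepted while "ftp://..." and "mailto:..." entries are skipped.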

Existing datasets in datasets/: