When working with Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) systems, splitting long documents into smaller “chunks” is essential: each chunk has to fit within the model's context window, and smaller, focused chunks make retrieval more precise. This project is a simple Gradio web app called LangChain Text Chunker. It lets users upload files (PDF, DOCX, TXT, HTML, Python code, Jupyter notebooks, CSV, etc.), extract the text, and try different LangChain text splitting methods interactively.
You can adjust parameters such as chunk size, overlap, and separators. The app shows the resulting chunks in JSON format, adds metadata (such as the start index of each chunk), and generates ready-to-use Python code for each splitter.
Supported splitters include RecursiveCharacterTextSplitter, CharacterTextSplitter, MarkdownHeaderTextSplitter, PythonCodeTextSplitter, and TokenTextSplitter.
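To make the chunk size, overlap, and start-index ideas concrete, here is a minimal pure-Python sketch of fixed-size character chunking. It is not the app's code and not LangChain's implementation; the function name chunk_text and the JSON field names content, metadata, and start_index are assumptions chosen for illustration, and the real splitters also honor separators, which this sketch ignores.

```python
import json


def chunk_text(text, chunk_size=40, chunk_overlap=10):
    """Split text into fixed-size character windows that overlap.

    Hypothetical helper for illustration only: it cuts purely by
    character count and does not look for separators.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")

    # Each new window starts this many characters after the previous one.
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(
            {
                "content": text[start:start + chunk_size],
                "metadata": {"start_index": start},
            }
        )
        # Stop once a window reaches the end of the text, so no trailing
        # chunk is emitted that only repeats the overlap.
        if start + chunk_size >= len(text):
            break
    return chunks


sample = (
    "Splitting long documents into smaller chunks keeps each piece "
    "within the model's context window."
)
chunks = chunk_text(sample, chunk_size=40, chunk_overlap=10)
print(json.dumps(chunks, indent=2))
```

With these settings every chunk after the first begins with the last 10 characters of the previous one, so a sentence cut at a boundary still appears whole in at least one chunk as long as it is shorter than the overlap. A larger overlap preserves more context across boundaries at the cost of more duplicated text.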
The project is open source, runs locally with the command python app.py, and has a live demo on Hugging Face Spaces.
View Repository on GitHub