# docs for `lmcat` v0.2.0
A Python tool for concatenating files and directory structures into a
single document, perfect for sharing code with language models. It
respects .gitignore and .lmignore patterns and
provides configurable output formatting.
Features:

- Respects `.gitignore` patterns (can be disabled)
- Supports an additional `.lmignore` file for lmcat-specific ignore patterns
- Configurable via `pyproject.toml`, `lmcat.toml`, or `lmcat.json`
- Custom processors via `glob_process` or `decider_process` to run on files, e.g. to convert a notebook to a markdown file

Install from PyPI:
```
pip install lmcat
```

or, to install with support for counting tokens:

```
pip install lmcat[tokenizers]
```

Basic usage - concatenate the current directory:

```
python -m lmcat

# Only show directory tree
python -m lmcat --tree-only

# Write output to file
python -m lmcat --output summary.md

# Print current configuration
python -m lmcat --print-cfg
```

The output will include a directory tree and the contents of each non-ignored file.
Command-line options:

- `-t`, `--tree-only`: Only print the directory tree, not file contents
- `-o`, `--output`: Specify an output file (defaults to stdout)
- `-h`, `--help`: Show help message

lmcat is best configured via a `[tool.lmcat]` section in `pyproject.toml`:
```toml
[tool.lmcat]
# Tree formatting
tree_divider = "│ "        # Vertical lines in tree
tree_indent = " "          # Indentation
tree_file_divider = "├── " # File/directory entries
content_divider = "``````" # File content delimiters

# Processing pipeline
tokenizer = "gpt2"                # or "whitespace-split"
tree_only = false                 # Only show tree structure
on_multiple_processors = "except" # Behavior when multiple processors match

# File handling
ignore_patterns = ["*.tmp", "*.log"] # Additional patterns to ignore
ignore_patterns_files = [".gitignore", ".lmignore"]

# Processors
[tool.lmcat.glob_process]
"[mM]akefile" = "makefile_recipes"
"*.ipynb" = "ipynb_to_md"
```

To set up for development:

```
git clone https://github.com/mivanit/lmcat
cd lmcat
make setup
```

The project uses make for common development tasks:
- `make dep`: Install/update dependencies
- `make format`: Format code using ruff and pycln
- `make test`: Run tests
- `make typing`: Run type checks
- `make check`: Run all checks (format, test, typing)
- `make clean`: Clean temporary files
- `make docs`: Generate documentation
- `make build`: Build the package
- `make publish`: Publish to PyPI (maintainers only)

Run `make help` to see all available commands.
To run tests:

```
make test
```

For verbose output:

```
VERBOSE=1 make test
```
## `lmcat.file_stats`

- `TOKENIZERS_PRESENT: bool = True`

### `class TokenizerWrapper`

Tokenizer wrapper: stores the tokenizer name and provides an `n_tokens` method. Uses splitting by whitespace (`whitespace-split`) as a fallback.

```python
TokenizerWrapper(name: str = 'whitespace-split')
```

- `name: str`
- `use_fallback: bool`
- `tokenizer: Optional[tokenizers.Tokenizer]`

Methods:

- `def n_tokens(self, text: str) -> int`: Return the number of tokens in `text`.
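The `whitespace-split` fallback is simple to picture. The sketch below is an illustrative re-implementation (the name `FallbackTokenizer` is invented here), not lmcat's actual code:

```python
class FallbackTokenizer:
    """Illustrative sketch of a tokenizer wrapper with a whitespace fallback.
    Not lmcat's actual implementation."""

    def __init__(self, name: str = "whitespace-split") -> None:
        self.name = name
        # with no `tokenizers` package available, fall back to whitespace splitting
        self.use_fallback = name == "whitespace-split"

    def n_tokens(self, text: str) -> int:
        if self.use_fallback:
            # token count = number of whitespace-separated chunks
            return len(text.split())
        # a real wrapper would call the loaded tokenizers.Tokenizer here
        raise RuntimeError("tokenizers package not available in this sketch")

tok = FallbackTokenizer()
print(tok.n_tokens("def main() -> None:"))  # 4
```

Whitespace splitting undercounts relative to a real subword tokenizer, but it keeps the tool usable without the optional dependency.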
### `class FileStats`

Statistics for a single file.

```python
FileStats(lines: int, chars: int, tokens: Optional[int] = None)
```

- `lines: int`
- `chars: int`
- `tokens: Optional[int] = None`

Methods:

```python
@classmethod
def from_file(
    cls,
    path: pathlib.Path,
    tokenizer: lmcat.file_stats.TokenizerWrapper
) -> lmcat.file_stats.FileStats
```

Get statistics for a single file.

- `path: Path`: Path to the file to analyze
- `tokenizer: TokenizerWrapper`: Tokenizer to use for counting tokens, if any
- Returns: `FileStats`: Statistics for the file

### `class TreeEntry(typing.NamedTuple)`

Entry in the tree output with optional stats.

```python
TreeEntry(line: str, stats: Optional[lmcat.file_stats.FileStats] = None)
```

- `line: str`: Alias for field number 0
- `stats: Optional[lmcat.file_stats.FileStats]`: Alias for field number 1
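Counting the line and character statistics is straightforward; the following is a minimal stand-in (class and function names invented for illustration, not lmcat's actual code):

```python
import pathlib
import tempfile
from dataclasses import dataclass
from typing import Optional

@dataclass
class SimpleFileStats:
    """Illustrative stand-in for FileStats."""
    lines: int
    chars: int
    tokens: Optional[int] = None  # filled in only when a tokenizer is available

def stats_from_file(path: pathlib.Path) -> SimpleFileStats:
    # read the file once, then count lines and characters
    text = path.read_text(encoding="utf-8")
    return SimpleFileStats(lines=len(text.splitlines()), chars=len(text))

with tempfile.TemporaryDirectory() as d:
    p = pathlib.Path(d) / "example.py"
    p.write_text("print('hi')\nprint('bye')\n")
    s = stats_from_file(p)
    print(s.lines, s.chars)  # 2 25
```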
## `lmcat.lmcat`

### `class LMCatConfig(muutils.json_serialize.serializable_dataclass.SerializableDataclass)`

Configuration dataclass for lmcat.
```python
LMCatConfig(
    *,
    content_divider: str = '``````',
    tree_only: bool = False,
    ignore_patterns: list[str] = <factory>,
    ignore_patterns_files: list[pathlib.Path] = <factory>,
    plugins_file: pathlib.Path | None = None,
    allow_plugins: bool = False,
    glob_process: dict[str, str] = <factory>,
    decider_process: dict[str, str] = <factory>,
    on_multiple_processors: Literal['warn', 'except', 'do_first', 'do_last', 'skip'] = 'except',
    tokenizer: str = 'gpt2',
    tree_divider: str = '│ ',
    tree_file_divider: str = '├── ',
    tree_indent: str = ' ',
    output: str | None = None
)
```

Fields:

- `content_divider: str`: File content delimiter; defaults to six backticks
- `tree_only: bool = False`
- `ignore_patterns: list[str]`
- `ignore_patterns_files: list[pathlib.Path]`
- `plugins_file: pathlib.Path | None = None`
- `allow_plugins: bool = False`
- `glob_process: dict[str, str]`
- `decider_process: dict[str, str]`
- `on_multiple_processors: Literal['warn', 'except', 'do_first', 'do_last', 'skip'] = 'except'`
- `tokenizer: str = 'gpt2'`: Tokenizer to use for tokenizing the output; `gpt2` by default. Passed to `tokenizers.Tokenizer.from_pretrained()`. If a tokenizer is specified but the `tokenizers` package is not installed, an exception is raised; the fallback `whitespace-split` avoids this.
- `tree_divider: str = '│ '`
- `tree_file_divider: str = '├── '`
- `tree_indent: str = ' '`
- `output: str | None = None`
Methods:

- `def get_tokenizer_obj(self) -> lmcat.file_stats.TokenizerWrapper`: Get the tokenizer object.
- `def get_processing_pipeline(self) -> lmcat.processing_pipeline.ProcessingPipeline`: Get the processing pipeline object.
- `def read(cls, root_dir: pathlib.Path) -> lmcat.lmcat.LMCatConfig` (classmethod): Attempt to read config from `pyproject.toml`, `lmcat.toml`, or `lmcat.json`.
- `def serialize(self) -> dict[str, Any]`: Returns the class as a dict; implemented via the `@serializable_dataclass` decorator.
- `def load(cls, data: Union[dict[str, Any], ~T]) -> Type[~T]` (classmethod): Takes an appropriately structured dict and returns an instance of the class; implemented via the `@serializable_dataclass` decorator.
- `def validate_fields_types(self, on_typecheck_error: ErrorMode = ErrorMode.Except) -> bool`: Validate the types of all fields on a `SerializableDataclass`; calls `SerializableDataclass__validate_field_type` for each field.
### `class IgnoreHandler`

Handles all ignore pattern matching using `igittigitt`.

```python
IgnoreHandler(root_dir: pathlib.Path, config: lmcat.lmcat.LMCatConfig)
```

- `root_dir: pathlib.Path`
- `config: lmcat.lmcat.LMCatConfig`
- `parser: igittigitt.igittigitt.IgnoreParser`

Methods:

- `def is_ignored(self, path: pathlib.Path) -> bool`: Check if a path should be ignored.
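To illustrate the idea without `igittigitt`, here is a deliberately simplified `fnmatch`-based sketch. Real gitignore matching (which `igittigitt` implements) also handles negation, anchoring, and directory-only rules:

```python
import fnmatch
import pathlib

class SimpleIgnoreHandler:
    """Toy ignore matcher: checks the filename and each path component
    against glob patterns. Invented for illustration; not lmcat's code."""

    def __init__(self, patterns: list[str]) -> None:
        self.patterns = patterns

    def is_ignored(self, path: pathlib.Path) -> bool:
        for pattern in self.patterns:
            # match against the filename itself
            if fnmatch.fnmatch(path.name, pattern):
                return True
            # match against any directory component (e.g. "__pycache__")
            if any(fnmatch.fnmatch(part, pattern) for part in path.parts):
                return True
        return False

handler = SimpleIgnoreHandler(["*.log", "__pycache__"])
print(handler.is_ignored(pathlib.Path("src/app/debug.log")))      # True
print(handler.is_ignored(pathlib.Path("src/__pycache__/m.pyc")))  # True
print(handler.is_ignored(pathlib.Path("src/app/main.py")))        # False
```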
Functions:

`def sorted_entries(directory: pathlib.Path) -> list[pathlib.Path]`: Return directory contents sorted: directories first, then files.

```python
def walk_dir(
    directory: pathlib.Path,
    ignore_handler: lmcat.lmcat.IgnoreHandler,
    config: lmcat.lmcat.LMCatConfig,
    tokenizer: lmcat.file_stats.TokenizerWrapper,
    prefix: str = ''
) -> tuple[list[lmcat.file_stats.TreeEntry], list[pathlib.Path]]
```

Recursively walk a directory, building tree lines and collecting file paths.

```python
def format_tree_with_stats(
    entries: list[lmcat.file_stats.TreeEntry],
    show_tokens: bool = False
) -> list[str]
```

Format tree entries with aligned statistics.

- `entries: list[TreeEntry]`: List of tree entries with optional stats
- `show_tokens: bool`: Whether to show token counts
- Returns: `list[str]`: Formatted tree lines with aligned stats

```python
def walk_and_collect(
    root_dir: pathlib.Path,
    config: lmcat.lmcat.LMCatConfig
) -> tuple[list[str], list[pathlib.Path]]
```

Walk the filesystem from `root_dir` and gather the tree listing plus the list of file paths.

`def assemble_summary(root_dir: pathlib.Path, config: lmcat.lmcat.LMCatConfig) -> str`: Assemble the summary output and return it.

`def main() -> None`: Main entry point for the script.
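The directories-first ordering that `sorted_entries` describes can be sketched as follows (an illustrative re-implementation, not lmcat's actual code):

```python
import pathlib
import tempfile

def sorted_entries(directory: pathlib.Path) -> list[pathlib.Path]:
    # directories first, then files, each group sorted by name
    entries = list(directory.iterdir())
    dirs = sorted((e for e in entries if e.is_dir()), key=lambda e: e.name)
    files = sorted((e for e in entries if e.is_file()), key=lambda e: e.name)
    return dirs + files

with tempfile.TemporaryDirectory() as d:
    root = pathlib.Path(d)
    (root / "zeta.txt").touch()
    (root / "alpha").mkdir()
    (root / "beta.txt").touch()
    names = [e.name for e in sorted_entries(root)]
    print(names)  # ['alpha', 'beta.txt', 'zeta.txt']
```

This ordering is what makes the rendered tree list subdirectories before sibling files.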
## `lmcat.processing_pipeline`

- `OnMultipleProcessors = typing.Literal['warn', 'except', 'do_first', 'do_last', 'skip']`

`def load_plugins(plugins_file: pathlib.Path) -> None`: Load plugins from a Python file.

- `plugins_file: Path`: Path to the plugins file

### `class ProcessingPipeline`

Manages the processing pipeline for files.
Attributes:

- `glob_process: dict[str, ProcessorName]`: Maps glob patterns to processor names
- `decider_process: dict[DeciderName, ProcessorName]`: Maps decider names to processor names
- `_compiled_globs: dict[str, re.Pattern]`: Cached compiled glob patterns, for performance

```python
ProcessingPipeline(
    plugins_file: pathlib.Path | None,
    decider_process_keys: dict[str, str],
    glob_process_keys: dict[str, str],
    on_multiple_processors: Literal['warn', 'except', 'do_first', 'do_last', 'skip']
)
```

- `plugins_file: pathlib.Path | None`
- `decider_process_keys: dict[str, str]`
- `glob_process_keys: dict[str, str]`
- `on_multiple_processors: Literal['warn', 'except', 'do_first', 'do_last', 'skip']`

Methods:

- `def get_processors_for_path(self, path: pathlib.Path) -> list[Callable[[pathlib.Path], str]]`: Get all applicable processors for a given path. Returns a list of applicable path processors.
- `def process_file(self, path: pathlib.Path) -> tuple[str, str | None]`: Process a file through the pipeline. Returns the processed content and the processor name; if no processor is found, returns `(path.read_text(), None)`.
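The glob-dispatch idea behind `process_file` can be stripped down to a few lines. Everything named below (`MiniPipeline`, `shout`) is invented for the sketch; only the "fall back to `(path.read_text(), None)`" behavior comes from the docs above:

```python
import fnmatch
import pathlib
import tempfile
from typing import Callable, Optional

ProcessorFunc = Callable[[pathlib.Path], str]

class MiniPipeline:
    """Toy pipeline: map glob patterns to processor functions,
    fall back to raw file contents when nothing matches."""

    def __init__(self, glob_process: dict[str, ProcessorFunc]) -> None:
        self.glob_process = glob_process

    def process_file(self, path: pathlib.Path) -> tuple[str, Optional[str]]:
        for pattern, func in self.glob_process.items():
            if fnmatch.fnmatch(path.name, pattern):
                return func(path), func.__name__
        # no processor matched: return the raw text and no processor name
        return path.read_text(), None

def shout(path: pathlib.Path) -> str:
    return path.read_text().upper()

pipeline = MiniPipeline({"*.txt": shout})
with tempfile.TemporaryDirectory() as d:
    p = pathlib.Path(d) / "note.txt"
    p.write_text("hello")
    result = pipeline.process_file(p)
    print(result)  # ('HELLO', 'shout')
```

The real pipeline additionally consults deciders and applies the `on_multiple_processors` policy when several processors match.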
## `lmcat.processors`

- `ProcessorName = str`
- `DeciderName = str`
- `ProcessorFunc = Callable[[pathlib.Path], str]`
- `DeciderFunc = Callable[[pathlib.Path], bool]`
- `PROCESSORS: dict[ProcessorName, ProcessorFunc]`: registry of built-in processors: `remove_comments`, `compress_whitespace`, `to_relative_path`, `ipynb_to_md`, `makefile_recipes`, `csv_preview_5_lines`
- `DECIDERS: dict[DeciderName, DeciderFunc]`: registry of built-in deciders: `is_over_10kb`, `is_documentation`
Functions:

- `def register_processor(func: ProcessorFunc) -> ProcessorFunc`: Register a function as a path processor.
- `def register_decider(func: DeciderFunc) -> DeciderFunc`: Register a function as a decider.
- `def is_over_10kb(path: pathlib.Path) -> bool`: Check if the file is over 10 KB.
- `def is_documentation(path: pathlib.Path) -> bool`: Check if the file is documentation.
- `def remove_comments(path: pathlib.Path) -> str`: Remove single-line comments from code.
- `def compress_whitespace(path: pathlib.Path) -> str`: Compress multiple whitespace characters into single spaces.
- `def to_relative_path(path: pathlib.Path) -> str`: Return the path to the file as a string.
- `def ipynb_to_md(path: pathlib.Path) -> str`: Convert an IPython notebook to markdown.
- `def makefile_recipes(path: pathlib.Path) -> str`: Process a Makefile to show only target descriptions and basic structure. Preserves: comments above `.PHONY` targets up to the first empty line; the `.PHONY` line and target line; the first line after the target if it starts with `@echo`. Returns the processed Makefile content.
- `def csv_preview_5_lines(path: pathlib.Path) -> str`: Preview the first few lines of a CSV file (up to 5). Reads only the first 1024 bytes and splits into lines; does not attempt to parse CSV structure. Returns the first few lines of the file.
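The `register_*` functions suggest a simple decorator-registry pattern, which is how custom processors become referenceable by name from the config. A self-contained sketch of that pattern (not lmcat's actual code; the processor `first_line_only` is invented here):

```python
import pathlib
import tempfile
from typing import Callable

# illustrative registry mirroring the role of PROCESSORS
PROCESSORS: dict[str, Callable[[pathlib.Path], str]] = {}

def register_processor(func: Callable[[pathlib.Path], str]) -> Callable[[pathlib.Path], str]:
    # store the function under its own name and return it unchanged,
    # so config entries can refer to it by that name
    PROCESSORS[func.__name__] = func
    return func

@register_processor
def first_line_only(path: pathlib.Path) -> str:
    # custom processor: keep just the first line of the file
    return path.read_text().splitlines()[0]

with tempfile.TemporaryDirectory() as d:
    p = pathlib.Path(d) / "data.txt"
    p.write_text("line one\nline two\n")
    out = PROCESSORS["first_line_only"](p)
    print(out)  # line one
```

Under this pattern, a config entry such as `"*.txt" = "first_line_only"` would resolve the string to the registered function at processing time.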