Metadata-Version: 2.1
Name: ldc_doc
Version: 0.0.3
Summary: Python3 library that adds MS Word .doc support to the llm-dataset-converter library.
Home-page: https://github.com/waikato-llm/ldc-doc
Author: Peter Reutemann
Author-email: fracpete@waikato.ac.nz
License: MIT License
Classifier: Development Status :: 4 - Beta
Classifier: License :: OSI Approved :: MIT License
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Programming Language :: Python :: 3
License-File: LICENSE

The **ldc-doc** library is an extension to **llm-dataset-converter**
with plugins for handling MS Word .doc files.

It requires *antiword* to be installed on the system; *textract* uses it
internally to extract the text from .doc files.
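
Since *antiword* is a system program rather than a Python package, installing
this library does not provide it. The following is a minimal sketch for checking
that it is on the ``PATH`` before processing .doc files. It uses only the standard
library, and ``antiword_available`` is a hypothetical helper that is not part of
ldc-doc or textract.

.. code-block:: python

    import shutil


    def antiword_available() -> bool:
        """Hypothetical helper: True if an ``antiword`` executable is on the PATH."""
        # shutil.which returns the executable's full path, or None if it is not found
        return shutil.which("antiword") is not None


    if __name__ == "__main__":
        if antiword_available():
            print("antiword found; textract can use it for .doc files")
        else:
            print("antiword not found; install it, e.g., via the system package manager")

Checking up front gives a clear message instead of a failure partway through a
conversion.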


Changelog
=========

0.0.3 (2024-12-20)
------------------

- switched to ``textract-py3`` to avoid issues with modern versions of pip (https://pypi.org/project/textract-py3/)
- switched to underscores in the project name (``ldc_doc``)


0.0.2 (2024-07-05)
------------------

- ``from-doc-pt`` now uses ``*.doc`` as the default glob


0.0.1 (2024-05-06)
------------------

- initial release

