Metadata-Version: 2.1
Name: expdf
Version: 0.3.0
Summary: PDF Download and Analysis Tool
Home-page: https://github.com/bupt-ipcr/expdf
Author: Jiawei Wu
Author-email: 13260322877@163.com
License: MIT
Description: 
        # ExPDF
        
        ## Overview
        
        ExPDF is a tool that generates the citation relationships between PDFs and creates a beautiful, interactive SVG figure inside Jupyter Notebook.
        
        ![image](https://user-images.githubusercontent.com/38694199/81917751-2ef60000-9608-11ea-9c83-98f45010a5e7.png)
        
        ## Quickstart
        
        With `Jupyter Notebook`, it is easy to visualize the citation relationships between PDFs.
        
        First, download and install it:
        
        ```bash
        git clone https://github.com/bupt-ipcr/expdf
        cd expdf
        pip install ./
        ```
        
        Next, use `expdf` to generate a JSON file:
        
        ```bash
        expdf -d pdfs/ASV -o data.json
        ```
        
        Finally, open `jupyter notebook` and try:
        
        ```python
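        import json
        from expdf.visualize import create_fig
        
        # Load the JSON file generated by the expdf command above
        with open('data.json', 'r') as f:
            data = json.load(f)
        
        # Build the figure; leaving it as the last expression displays it in the notebook
        fig = create_fig(data)
        fig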
        ```
        
        ## Installation
        
        Download expdf from GitHub and install it with pip:
        
        ```bash
        git clone https://github.com/bupt-ipcr/expdf
        cd expdf
        pip install ./
        ```
        
        Run `expdf -h` to see the help output:
        
        ```bash
        usage: expdf [-h] [-a APPEND_PDF] [-r] [-o OUTPUT_DIR] PDF_PATH
        
        Generate reference relation of all PDFs(given or inside PDF)
        
        positional arguments:
          PDF_PATH              PDF path, or directory of PDFs if -r is used
        
        optional arguments:
          -h, --help            show this help message and exit
          -a APPEND_PDF, --append APPEND_PDF
                                append a PDF file
          -d, --dir, --directory
                                treat PDF_PATH as a directory
          -e EXCLUDE_PDF, --exclude EXCLUDE_PDF
                                exclude a PDF file
          -o OUTPUT_DIR, -O OUTPUT_DIR, --output OUTPUT_DIR
                                output directory, default is current directory
          -v, --vis, --visualize
                                create a html file for visualize
          --vis-html HTML_FILENAME
                                output file name of html visualize
        ```
        
        ## Examples
        
        For a single PDF, simply run `expdf` with its path:
        
        ```bash
        expdf pdfs/test.pdf
        ```
        
        **Treat the path as a directory** with `-d`, and expdf will scan all PDFs in the specified directory:
        
        ```bash
        expdf -d pdfs
        ```
        
        **Append PDFs** with `-a`, for individual papers that are not in the same folder:
        
        ```bash
        expdf -d pdfs -a 1.pdf -a 2.pdf
        ```
        
        **Exclude PDFs** with `-e`. Note that no error is raised even if the excluded PDF
        does not exist.
        
        ```bash
        expdf -d pdfs -e test.pdf
        ```
        
        To **specify the output directory**, use `-o`, `-O` or `--output`:
        
        ```bash
        expdf pdfs/test.pdf -O ./urdir
        ```
        
        To **generate an HTML visualization file**, use `-v` and `--vis-html`:
        
        ```bash
        expdf -d pdfs/ASV -v --vis-html='vis.html'
        ```
        
        ## Usage as Python library
        
        expdf has three main parts: `ExPDFParser`, `PDFNode` and `visualize`.
        
        - `ExPDFParser`
        
          A parser built on top of pdfminer that looks for the metadata, links and references of a PDF file.
        
          ```python
          # ensure you have ./tests/test.pdf
          from expdf import ExPDFParser
          pdf = ExPDFParser("tests/test.pdf")
          print('title: ', pdf.title)
          print('info: ', pdf.info)
          print('metadata: ', pdf.metadata)
          
          print('Links: ')
          for link in pdf.links:
              print(f'- {link}')
          
          print('Refs: ')
          for ref in pdf.refs:
              print(f'- {ref}')
          ```
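        
          To give a feel for what "looking for links" involves, here is a minimal sketch that pulls URLs out of text that has already been extracted from a PDF. The pattern and the `find_links` helper are hypothetical illustrations written for this README; they are not expdf's actual implementation, which may use different rules.
        
          ```python
          import re
          
          # Hypothetical pattern (not taken from expdf): an http(s) URL runs
          # until whitespace or a bracket character
          URL_PATTERN = re.compile(r"https?://[^\s<>()\[\]]+")
          
          
          def find_links(text):
              """Return unique URLs in order of first appearance."""
              links = []
              for match in URL_PATTERN.findall(text):
                  # Sentence punctuation often sticks to the end of a URL
                  url = match.rstrip(".,;:")
                  if url not in links:
                      links.append(url)
              return links
          
          
          sample = "See https://github.com/bupt-ipcr/expdf, and also (https://example.org/paper.pdf)."
          links = find_links(sample)
          print(links)
          ```
        
          Real reference sections are much messier than this sample, which is why a parser's results may need manual correction.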
        
        - `PDFNode`
        
          `PDFNode` is a class that maintains a dict of all its instances. Two PDFs with the same title (or titles that differ only in punctuation) point to the same node.
          `LocalPDFNode` is a subclass of `PDFNode` that lets you modify the references of a PDF.
        
          It is usually used together with the parser:
        
          ```python
          from expdf import ExPDFParser
          from expdf.graph import PDFNode, LocalPDFNode
          
          expdf_parser = ExPDFParser("tests/test.pdf")
          # Create a node from the parsed title and references
          local_pdf_node = LocalPDFNode(expdf_parser.title, expdf_parser.refs)
          pdf_info = PDFNode.get_json()
          print(pdf_info)
          ```
        
          Alternatively, you can assign the title and refs yourself without the parser (a human may be more precise than the parser and its regular expressions):
        
          ```python
          from expdf.graph import PDFNode, LocalPDFNode
          
          # just an example; real titles will never look like this
          LocalPDFNode('title0', refs=['title1', 'title2'])
          LocalPDFNode('title1', refs=['title3'])
          LocalPDFNode('title2', refs=['title3'])
          pdf_info = PDFNode.get_json()
          print(pdf_info)
          ```
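        
          The "one node per title" behaviour described above can be pictured with a small registry sketch. `TitleNode` and its `normalize` helper are hypothetical stand-ins written for illustration; the real `PDFNode` may normalize titles and store relations differently.
        
          ```python
          import string
          
          
          class TitleNode:
              """Hypothetical stand-in for PDFNode: one shared instance per title."""
          
              _instances = {}  # normalized title -> node
          
              def __new__(cls, title, refs=()):
                  key = cls.normalize(title)
                  node = cls._instances.get(key)
                  if node is None:
                      # First time this title is seen: create and register it
                      node = super().__new__(cls)
                      node.title = title
                      node.children = set()
                      cls._instances[key] = node
                  return node
          
              def __init__(self, title, refs=()):
                  # Runs on every call, including when an existing node is returned
                  for ref in refs:
                      self.children.add(TitleNode(ref))
          
              @staticmethod
              def normalize(title):
                  # Assumption: titles are compared after removing ASCII punctuation
                  return title.translate(str.maketrans("", "", string.punctuation))
          
          
          a = TitleNode("Deep Learning: A Survey", refs=["Graph Methods"])
          b = TitleNode("Deep Learning, A Survey")
          print(a is b)
          print(len(TitleNode._instances))
          ```
        
          Because both titles normalize to the same key, `a` and `b` are the same object, and the registry holds two nodes: the survey and the paper it references.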
        
        - `visualize`
        
          `PDFNode` gives you information about PDFs, such as their citation relationships (shown as parents and children). But why not visualize it?
        
          `visualize` provides a top-level function `create_fig` built on `networkx` and `plotly`. `networkx` provides methods to compute the positions
          of all nodes, and `plotly` is a powerful visualization tool.
        
          `render` invokes `create_fig` and writes the result to an HTML file.
        
          Visualization is best used inside `jupyter notebook`, since plotly only supports events (click, hover, etc.) there. You can use it like this:
        
          ```bash
          expdf -d pdfs/ASV -o data.json
          ```
        
          ```python
          # in your jupyter notebook
          import json
          from expdf.visualize import create_fig
          
          with open('data.json', 'r') as f:
              data = json.load(f)
          fig = create_fig(data)
          fig
          ```
        
          You can also save it as an HTML file:
        
          ```bash
          expdf -d pdfs/ASV -o data.json -v --vis-html=vis.html
          ```
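        
          As a rough picture of the layout step that `networkx` contributes, the sketch below builds a small citation graph and asks `networkx` for node positions. The titles, the choice of `spring_layout` and the fixed seed are assumptions made for illustration; `create_fig` may use a different layout and data structure internally.
        
          ```python
          import networkx as nx
          
          # Made-up citation data: each title maps to the titles it references
          citations = {
              "title0": ["title1", "title2"],
              "title1": ["title3"],
              "title2": ["title3"],
          }
          
          # Edges point from the citing paper to the cited paper
          graph = nx.DiGraph()
          for title, refs in citations.items():
              graph.add_edges_from((title, ref) for ref in refs)
          
          # spring_layout returns {node: array([x, y])}; the seed makes it reproducible
          positions = nx.spring_layout(graph, seed=0)
          
          # Flatten edges into coordinate lists, using None to separate segments,
          # the form commonly passed to a plotly line trace
          edge_x, edge_y = [], []
          for source, target in graph.edges():
              (x0, y0), (x1, y1) = positions[source], positions[target]
              edge_x += [x0, x1, None]
              edge_y += [y0, y1, None]
          
          print(sorted(positions))
          ```
        
          The resulting coordinate lists are the kind of input a plotting library needs to draw the edges; node markers would use the values in `positions` directly.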
        
        ## Various
        
        - Author: Jiawei Wu <13260322877@163.com>
        - License: MIT
        
Platform: UNKNOWN
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Requires-Python: >=3.6.0
Description-Content-Type: text/markdown
Provides-Extra: test
