Metadata-Version: 2.1
Name: com.dvsnier.sniffer
Version: 0.0.1.dev1
Summary: this is dvsnier sniffer.
Author: dvsnier
Author-email: dovsnier@qq.com
License: MIT
Project-URL: Bug_Tracker, https://gitee.com/python-partner/Python-Test/issues/
Project-URL: Documentation, https://packaging.python.org/tutorials/distributing-packages/
Project-URL: Funding, https://donate.pypi.org/
Project-URL: Homepage, https://gitee.com/python-partner/Python-Test/
Project-URL: Wiki, https://gitee.com/python-partner/Python-Test/wiki/
Project-URL: Source, https://gitee.com/python-partner/Python-Test/
Keywords: sniffer,development
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: License :: Free For Educational Use
Classifier: License :: Free for non-commercial use
Classifier: License :: Freely Distributable
Classifier: License :: Freeware
Classifier: Natural Language :: Chinese (Simplified)
Classifier: Natural Language :: English
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Cython
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Topic :: Software Development
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Utilities
Requires-Python: !=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*,!=3.4.*,<4,>=2.7
Description-Content-Type: text/markdown
Provides-Extra: docs
Provides-Extra: test
Provides-Extra: typing
Provides-Extra: virtualenv
License-File: LICENSE
License-File: LICENSE.txt

# Python Network Sniffer

![Python Logo](https://www.python.org/static/community_logos/python-logo.png "the Python Test Project")

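- [1. Configuration](#1-configuration)
  - [1.1. Script Configuration](#11-script-configuration)
    - [1.1.1. Sniffing a Single Page](#111-sniffing-a-single-page)
    - [1.1.2. Sniffing Multiple Pages](#112-sniffing-multiple-pages)
  - [1.2. File Configuration](#12-file-configuration)
- [2. Running](#2-running)
  - [2.1. Running the Script](#21-running-the-script)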

## 1. Configuration

### 1.1. Script Configuration

Open the `.\scripts\crawler_script.py` script and configure it as follows:

#### 1.1.1. Sniffing a Single Page

To sniff the data of a single page:

```python
if __name__ == "__main__":
    '''Main entry point.'''
    page_size = 1
    crawler = Crawler_Bbs_TianYa()
    # crawler.set_flag(False)
    crawler.set_range(page_size).run()
```

#### 1.1.2. Sniffing Multiple Pages

To sniff the data of the pages in the range `[1, 5)`:

Option 1:

```python
if __name__ == "__main__":
    '''Main entry point.'''
    page_size = 5
    crawler = Crawler_Bbs_TianYa()
    # crawler.set_flag(False)
    crawler.set_range(page_size).run()
```

Option 2:

```python
if __name__ == "__main__":
    '''Main entry point.'''
    crawler = Crawler_Bbs_TianYa()
    # crawler.set_flag(False)
    crawler.set_range(page_start=1, page_stop=5).run()
```

### 1.2. File Configuration

The configuration file entries are as follows:

```ini
# the version information
version_name = v0.0.1.dev1
version_code = 1
version_info = 0.0.1.dev1


# CRAWLER URL PREFIX
article-alias = 'crawler_alias'
sn-url-prefix = ['http://bbs.tianya.cn/post-xxx-yyy-{}.shtml']


# REGION_INCLUSIVE_EXCLUSIVE = 0
# REGION_EXCLUSIVE_INCLUSIVE = 1
# REGION_EXCLUSIVE_EXCLUSIVE = 2
# REGION_INCLUSIVE_INCLUSIVE = 3
page-start = 1
page-stop = 0
page-flag = 0
# page-flag = 1
# page-flag = 2
# page-flag = 3
# False: pull everything first, then translate; True: translate each item right after it is pulled
article-flag = True
# article-flag = False


# True: multimedia resources are stored locally, otherwise they are not
# article-multi-media-persistence = True
# article-multi-media-persistence = False

# True: multimedia resources are fetched in high quality, otherwise they are not
# article-multi-media-quality = True
# article-multi-media-quality = False


User-Agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36'
# User-Agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36'


# output-directory = "..."
# output-uuid-encryptor = True
# output-uuid-encryptor = False
```
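
The entries above follow a simple `key = value` layout in which most values look like Python literals (quoted strings, a list, `True`). The project's own loader is not shown in this README, so the following is only a minimal sketch under stated assumptions: one entry per line, `#` marks a full-line comment, the first `=` separates key from value, bare values such as `v0.0.1.dev1` stay plain strings, and a later entry overwrites an earlier one. `parse_config` and `load_config` are hypothetical helper names, not part of the project.

```python
import ast
import io


def parse_config(text):
    """Hypothetical helper: parse 'key = value' lines into a dict."""
    config = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and commented-out entries such as '# page-flag = 1'.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key/value line
        key, value = key.strip(), value.strip()
        try:
            # Quoted strings, lists, numbers and True/False become Python objects.
            config[key] = ast.literal_eval(value)
        except (ValueError, SyntaxError):
            # Bare values such as v0.0.1.dev1 are kept as plain strings.
            config[key] = value
    return config


def load_config(path):
    """Hypothetical helper: read a UTF-8 configuration file and parse it."""
    with io.open(path, encoding="utf-8") as handle:
        return parse_config(handle.read())
```

With the sample file above, `parse_config` would return `1` for `page-start`, `True` for `article-flag` and a one-element list for `sn-url-prefix`.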

The entries are described below:

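- `version_name`: version name of the configuration file;
- `version_code`: version code of the configuration file;
- `version_info`: version information of the configuration file;
- `article-alias`: article alias (`OPTIONAL`); the generated directory name has the format `[bbs|json]_YYmmdd[_alias]`;
- `sn-url-prefix`: URL pattern of the article, typically `http://bbs.tianya.cn/post-xxx-yyy-{}.shtml`;
- `page-start`: first page;
- `page-stop`: last page (`OPTIONAL`);
- `page-flag`: page range flag (`OPTIONAL`); four types are supported: `REGION_INCLUSIVE_EXCLUSIVE` (`0`), `REGION_EXCLUSIVE_INCLUSIVE` (`1`), `REGION_EXCLUSIVE_EXCLUSIVE` (`2`) and `REGION_INCLUSIVE_INCLUSIVE` (`3`);
- `article-flag`: style in which the article data stream is generated (`OPTIONAL`);
- `article-multi-media-persistence`: store the media resources linked from the article locally (`OPTIONAL`, `RECOMMENDED`);
- `article-multi-media-quality`: quality of the media resources linked from the article (`OPTIONAL`);
- `User-Agent`: user agent sent with article requests (`OPTIONAL`, `RECOMMENDED`);
- `output-directory`: directory the articles are written to (`OPTIONAL`, `RECOMMENDED`);
- `output-uuid-encryptor`: encrypt the id of the output articles (`OPTIONAL`);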
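
The `page-start`, `page-stop` and `page-flag` entries together select which pages are fetched, and `sn-url-prefix` carries a `{}` placeholder that is presumably filled with the page number. The sketch below is one possible reading, not the project's implementation: it assumes the four `REGION_*` names describe interval endpoints in the usual way (`INCLUSIVE_EXCLUSIVE` meaning `[start, stop)`, and so on), that `page-stop` is a real page number greater than `page-start`, and that `YYmmdd` in the directory name means a four-digit year followed by month and day. `page_numbers`, `page_urls` and `output_dir_name` are hypothetical names.

```python
from datetime import date

# Numeric values taken from the comments in the configuration file;
# the interval meaning of each name is an assumption.
REGION_INCLUSIVE_EXCLUSIVE = 0  # [start, stop)
REGION_EXCLUSIVE_INCLUSIVE = 1  # (start, stop]
REGION_EXCLUSIVE_EXCLUSIVE = 2  # (start, stop)
REGION_INCLUSIVE_INCLUSIVE = 3  # [start, stop]

_ALL_REGIONS = (0, 1, 2, 3)
_INCLUDES_START = (REGION_INCLUSIVE_EXCLUSIVE, REGION_INCLUSIVE_INCLUSIVE)
_INCLUDES_STOP = (REGION_EXCLUSIVE_INCLUSIVE, REGION_INCLUSIVE_INCLUSIVE)


def page_numbers(start, stop, flag=REGION_INCLUSIVE_EXCLUSIVE):
    """Hypothetical helper: list the pages selected by start, stop and flag."""
    if flag not in _ALL_REGIONS:
        raise ValueError("unknown page-flag: {!r}".format(flag))
    # Shift each endpoint inward when the region excludes it.
    first = start if flag in _INCLUDES_START else start + 1
    last = stop if flag in _INCLUDES_STOP else stop - 1
    return list(range(first, last + 1))


def page_urls(url_prefix, pages):
    """Hypothetical helper: fill the '{}' placeholder with each page number."""
    return [url_prefix.format(page) for page in pages]


def output_dir_name(kind, day, alias=None):
    """Hypothetical helper for the [bbs|json]_YYmmdd[_alias] directory name."""
    if kind not in ("bbs", "json"):
        raise ValueError("kind must be 'bbs' or 'json'")
    name = "{}_{}".format(kind, day.strftime("%Y%m%d"))
    return "{}_{}".format(name, alias) if alias else name


if __name__ == "__main__":
    prefix = "http://bbs.tianya.cn/post-xxx-yyy-{}.shtml"
    print(page_urls(prefix, page_numbers(1, 5)))
    print(output_dir_name("bbs", date.today(), "crawler_alias"))
```

Under these assumptions, `page_numbers(1, 5, REGION_INCLUSIVE_EXCLUSIVE)` yields `[1, 2, 3, 4]`, the same pages as the `[1, 5)` range in the multi-page example.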

## 2. Running

### 2.1. Running the Script

Run the script as follows:

```bash
# Windows
python .\scripts\crawler_script.py
# macOS
python ./scripts/crawler_script.py
```
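
Because the path separator differs between Windows and macOS, one way to avoid keeping two command lines is to launch the script from Python itself. This is a minimal sketch, assuming Python 3.5 or later (for `pathlib` and `subprocess.run`) and that it is started from the project root; `build_command` and `run_crawler` are hypothetical helpers, not part of the project.

```python
import subprocess
import sys
from pathlib import Path


def build_command(project_root="."):
    """Hypothetical helper: command line that runs the crawler script."""
    # pathlib joins with the separator of the host OS, so the same code
    # yields scripts\crawler_script.py on Windows and scripts/crawler_script.py on macOS.
    script = Path(project_root) / "scripts" / "crawler_script.py"
    # sys.executable is the interpreter currently running this code.
    return [sys.executable, str(script)]


def run_crawler(project_root="."):
    """Hypothetical helper: run the script, raising CalledProcessError on failure."""
    return subprocess.run(build_command(project_root), check=True)
```

Calling `run_crawler()` from the project root then does the same as the platform-specific commands above, using whichever Python interpreter launched it.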
