Metadata-Version: 2.1
Name: secmmf
Version: 1.0.2
Summary: Python scraper for SEC N-MFP2 filings.
Home-page: https://github.com/yj1990/sec_mmf
Author: Yangjue Han
Author-email: yangjue.han20@gmail.com
License: UNKNOWN
Description: # SEC N-MFP2 Money Market Fund Holdings Data
        
        - Author: Yangjue Han
        - Date: May 2020
        
        ## Introduction
        This repository contains code that lets the user download and parse money market fund holdings information from N-MFP2 filings in the SEC EDGAR system. At the end of every month, all U.S. money market funds are required to report their securities holdings to the SEC, including identification, maturity, market value, yield to maturity, issuer information, and other features. For repurchase agreement contracts, money market funds also have to report information on collateral securities. The granularity of this dataset provides an unparalleled opportunity for financial economists to study questions related to the shadow banking system.
        
        ## Installation
        
        ```
        pip install secmmf
        ```
        
        ## Usage
        
        ### Initial setup
        The module `secmmf` contains a set of functions that download and parse the information in N-MFP2 filings. The user should first set `data_dir` to the path of a directory in which to store the downloaded data; the drive holding `data_dir` should have at least 20 GB of free space. Note that the package can only be used to download N-MFP2 filings, but it could be extended to download other filings with a similar format. The package does not provide any data cleaning functions.
        
        ```
        import secmmf
        
        data_dir = '/path/to/data'  # YOUR DIRECTORY HERE
        pathfile = 'xmlpath.csv' # no need to change this
        ```
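        
        Because the download needs roughly 20 GB, it can help to check the target drive before starting. The following is a minimal sketch using only the standard library; `prepare_data_dir` and `free_space_gb` are hypothetical helper names and are not part of `secmmf`.
        
        ```python
        import shutil
        from pathlib import Path
        
        REQUIRED_GB = 20  # minimum free space suggested in this README
        
        
        def free_space_gb(path):
            """Return the free disk space, in GB, on the volume holding `path`."""
            return shutil.disk_usage(path).free / 1e9
        
        
        def prepare_data_dir(path, required_gb=REQUIRED_GB):
            """Create `path` if needed and report whether enough space is free.
        
            Hypothetical helper, not part of secmmf.
            Returns a (directory, has_enough_space) pair.
            """
            directory = Path(path).expanduser()
            directory.mkdir(parents=True, exist_ok=True)
            return directory, free_space_gb(directory) >= required_gb
        ```
        
        Calling `prepare_data_dir(data_dir)` returns the directory as a `Path` together with a flag that is `False` when less than 20 GB is free.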
        
        ### Build index of filings 
        First, we download and extract the paths of filings from the SEC EDGAR system using the function `download_sec_index()`.
        - By specifying `start_date` and `end_date`, the user limits the time range to [`start_date`, `end_date`]. The default start date is 2016-10 and the default end date is the current month.
        - The function writes a csv file named `index_file.csv` to `data_dir`. We then use `generate_index()` to create a second file, with the name given by `pathfile`, that contains URLs of XML files that can be easily parsed.
        
        ```
        secmmf.download_sec_index(data_dir, form_name = 'N-MFP2', start_date = '2016-10', end_date = '2020-05')
        secmmf.generate_index(data_dir, pathfile)
        ```
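        
        To make the closed interval [`start_date`, `end_date`] concrete, the sketch below lists the months that such a range covers, assuming dates in the `YYYY-MM` format used above. `month_range` is a hypothetical helper for illustration only; it does not show how `download_sec_index()` works internally.
        
        ```python
        def month_range(start_date, end_date):
            """List every 'YYYY-MM' month in the closed interval [start_date, end_date].
        
            Hypothetical helper, not part of secmmf.
            """
            start_year, start_month = map(int, start_date.split('-'))
            end_year, end_month = map(int, end_date.split('-'))
        
            months = []
            year, month = start_year, start_month
            # Tuple comparison orders (year, month) pairs chronologically.
            while (year, month) <= (end_year, end_month):
                months.append(f'{year:04d}-{month:02d}')
                month += 1
                if month > 12:
                    year, month = year + 1, 1
            return months
        
        
        months = month_range('2016-10', '2020-05')
        print(len(months), months[0], months[-1])  # 44 2016-10 2020-05
        ```
        
        Both endpoints are included, so the example call above covers 44 monthly reporting periods.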
        
        ### Download raw files
        Next, we download the XML files listed in `pathfile` and parse them into unmodified csv files using `scrape()`.
        - Depending on the number of files to download and the speed of the internet connection, this step might take up to 4 hours.
        - Paths contained in `pathfile` are divided into 20 blocks, and the corresponding csv files are saved into 20 different subfolders.
        - If the program is interrupted at any block, the user can set `start_block` and `end_block` to download only the blocks that are still missing.
        
        ```
        secmmf.scrape(data_dir, pathfile, N_blocks=20, start_block=1, end_block=20)
        ```
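        
        The block mechanism can be pictured with the sketch below. It assumes contiguous blocks of near-equal size and 1-based, inclusive block numbers; the actual partitioning inside `scrape()` may differ, and `split_into_blocks` and `select_blocks` are hypothetical names.
        
        ```python
        def split_into_blocks(paths, n_blocks=20):
            """Split `paths` into `n_blocks` contiguous blocks of near-equal size.
        
            Hypothetical helper, not part of secmmf.
            """
            base, extra = divmod(len(paths), n_blocks)
            blocks, start = [], 0
            for i in range(n_blocks):
                # The first `extra` blocks each take one additional path.
                size = base + (1 if i < extra else 0)
                blocks.append(paths[start:start + size])
                start += size
            return blocks
        
        
        def select_blocks(blocks, start_block=1, end_block=None):
            """Return (block_number, block) pairs for 1-based, inclusive bounds."""
            end_block = len(blocks) if end_block is None else end_block
            return [(n, blocks[n - 1]) for n in range(start_block, end_block + 1)]
        
        
        # Placeholder URLs, used only to show the partitioning.
        paths = [f'https://example.com/filing_{i}.xml' for i in range(45)]
        blocks = split_into_blocks(paths, n_blocks=20)
        
        # Resume after an interruption in block 8: only blocks 8 to 20 remain.
        remaining = select_blocks(blocks, start_block=8, end_block=20)
        print(len(remaining))  # 13
        ```
        
        Under these assumptions, a run interrupted in block 8 would be resumed by calling `scrape()` again with `start_block=8`.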
        
        ### Generate fund-level and holdings-level tables
        The raw csv files contain unstructured data that combine fund-level information and asset holdings. We can use `gen_table_fund()` and `gen_table_holdings()` to generate tables with ready-to-use data on fund-level information and asset holdings, respectively. Finally, `wrap()` combines all formatted tables into a single table for each level of information and cleans up the data folder.
        
        ```
        secmmf.gen_table_fund(data_dir, pathfile)
        secmmf.gen_table_holdings(data_dir, pathfile)
        secmmf.wrap(data_dir)
        # Done!
        ```
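        
        Conceptually, the final combining step stacks many per-block tables into one. The sketch below illustrates that idea with the standard `csv` module, assuming all input files share the same header. It is not the implementation of `wrap()`, and `combine_csv_files` is a hypothetical name.
        
        ```python
        import csv
        
        
        def combine_csv_files(csv_paths, output_path):
            """Concatenate csv files that share a header into one file.
        
            Hypothetical helper, not part of secmmf.
            Returns the number of data rows written.
            """
            rows_written = 0
            writer = None
            with open(output_path, 'w', newline='') as out:
                for path in csv_paths:
                    with open(path, newline='') as src:
                        reader = csv.DictReader(src)
                        if reader.fieldnames is None:
                            continue  # skip empty files
                        if writer is None:
                            # Take the header from the first non-empty file.
                            writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
                            writer.writeheader()
                        for row in reader:
                            writer.writerow(row)
                            rows_written += 1
            return rows_written
        ```
        
        The header is written once, from the first non-empty file, so the combined table has a single header row followed by all data rows.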
        
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.6
Description-Content-Type: text/markdown
