Metadata-Version: 2.1
Name: facebook_page_scraper
Version: 0.1.10
Summary: Python package to scrape Facebook pages' front end with no limitations
Home-page: https://github.com/shaikhsajid1111/facebook_page_scraper
Author: Sajid Shaikh
Author-email: shaikhsajid3732@gmail.com
License: MIT
Keywords: web-scraping selenium facebook facebook-pages
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: Microsoft :: Windows
Classifier: Operating System :: POSIX :: Linux
Classifier: Operating System :: MacOS
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Internet :: WWW/HTTP
Requires-Python: >=3.6
Description-Content-Type: text/markdown
License-File: LICENSE

<h1> Facebook Page Scraper </h1>

[![Maintenance](https://img.shields.io/badge/Maintained-Yes-green.svg)](https://github.com/shaikhsajid1111/facebook_page_scraper/graphs/commit-activity)
[![PyPI license](https://img.shields.io/pypi/l/ansicolortags.svg)](https://opensource.org/licenses/MIT) [![Python >=3.6.9](https://img.shields.io/badge/python-3.6+-blue.svg)](https://www.python.org/downloads/release/python-360/)

<p> No registration, no API key, no limit on the number of requests. Import the library and <b>Just Do It!</b> </p>

<h2> Prerequisites </h2>

- Internet Connection
- Python 3.6+
- Chrome or Firefox browser installed on your machine
<br>

<hr>
<h2>Installation:</h2>

<h3> Installing from source: </h3>

```
git clone https://github.com/shaikhsajid1111/facebook_page_scraper
```

<h4> Inside the project's directory </h4>

```
python3 setup.py install
```
<br>
<h3> Installing from PyPI: </h3>

```
pip3 install facebook-page-scraper
```
<br>
<hr>
<h2> How to use? </h2>



```python
#import Facebook_scraper class from facebook_page_scraper
from facebook_page_scraper import Facebook_scraper

#instantiate the Facebook_scraper class

page_name = "facebookai"
posts_count = 10
browser = "firefox"
proxy = "IP:PORT"  # optional; if the proxy requires authentication, use "user:password@IP:PORT"
meta_ai = Facebook_scraper(page_name,posts_count,browser,proxy=proxy)

```

<h3> Parameters for  <code>Facebook_scraper(page_name,posts_count,browser,proxy) </code> class </h3>
<table>
<tr>
<th> Parameter Name </th>
<th> Parameter Type </th>
<th> Description </th>
</tr>

<tr>
<td>
page_name
</td>
<td>
string
</td>
<td>
Name of the Facebook page
</td>
</tr>

<tr>
<td>
posts_count
</td>
<td>
integer
</td>
<td>
Number of posts to scrape. Defaults to 10 if not passed.
</td>
</tr>

<tr>
<td>
browser
</td>
<td>
string
</td>
<td>
Browser to use, either "chrome" or "firefox". Defaults to "chrome" if not passed.
</td>
</tr>

<tr>
<td>
proxy(optional)
</td>
<td>
string
</td>
<td>
Optional proxy to route requests through, in the format <code>IP:PORT</code>. If the proxy requires authentication, use <code>user:password@IP:PORT</code>.
</td>
</tr>

</table>
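
The proxy formats described in the table above can also be assembled programmatically. The sketch below uses a hypothetical helper, `build_proxy`, which is not part of facebook_page_scraper; it only builds the string that you would then pass as the `proxy` argument.

```python
from typing import Optional


def build_proxy(ip: str, port: int, user: Optional[str] = None,
                password: Optional[str] = None) -> str:
    """Build a proxy string in the format described in the table above.

    Hypothetical helper (not part of facebook_page_scraper).
    Returns "IP:PORT", or "user:password@IP:PORT" when credentials are given.
    """
    address = f"{ip}:{port}"
    if user is None and password is None:
        return address
    if user is None or password is None:
        # An authenticated proxy needs both parts of the credentials.
        raise ValueError("user and password must be given together")
    return f"{user}:{password}@{address}"


# Example values only (203.0.113.0/24 is a documentation address range).
print(build_proxy("203.0.113.10", 8080))                     # 203.0.113.10:8080
print(build_proxy("203.0.113.10", 8080, "alice", "s3cret"))  # alice:s3cret@203.0.113.10:8080
```

The result would be passed as, for example, `Facebook_scraper(page_name, posts_count, browser, proxy=build_proxy("203.0.113.10", 8080))`. The README does not say how the library treats passwords that contain `@` or `:`, so such characters may make the string ambiguous.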
<br>
<hr>
<br>

<h3> Done with instantiation? <b>Let the scraping begin!</b> </h3>
<br>
<h3> For post data in <b>JSON</b> format:</h3>

```python
#call the scrap_to_json() method

json_data = meta_ai.scrap_to_json()
print(json_data)

```
Output:
```javascript

{
  "2024182624425347": {
    "name": "Meta AI",
    "shares": 0,
    "reactions": {
      "likes": 154,
      "loves": 19,
      "wow": 0,
      "cares": 0,
      "sad": 0,
      "angry": 0,
      "haha": 0
    },
    "reaction_count": 173,
    "comments": 2,
    "content": "We’ve built data2vec, the first general high-performance self-supervised algorithm for speech, vision, and text. We applied it to different modalities and found it matches or outperforms the best self-supervised algorithms. We hope this brings us closer to a world where computers can learn to solve many different tasks without supervision. Learn more and get the code:  https://ai.facebook.com/…/the-first-high-performance-self-s…",
    "posted_on": "2022-01-20T22:43:35",
    "video": "",
    "image": [
      "https://scontent-bom1-2.xx.fbcdn.net/v/t39.30808-6/s480x480/272147088_2024182621092014_6532581039236849529_n.jpg?_nc_cat=100&ccb=1-5&_nc_sid=8024bb&_nc_ohc=j4_1PAndJTIAX82OLNq&_nc_ht=scontent-bom1-2.xx&oh=00_AT9us__TvC9eYBqRyQEwEtYSit9r2UKYg0gFoRK7Efrhyw&oe=61F17B71"
    ],
    "post_url": "https://www.facebook.com/MetaAI/photos/a.360372474139712/2024182624425347/?type=3&__xts__%5B0%5D=68.ARBoSaQ-pAC_ApucZNHZ6R-BI3YUSjH4sXsfdZRQ2zZFOwgWGhjt6dmg0VOcmGCLhSFyXpecOY9g1A94vrzU_T-GtYFagqDkJjHuhoyPW2vnkn7fvfzx-ql7fsBYxL5DgQVSsiC1cPoycdCvHmi6BV5Sc4fKADdgDhdFvVvr-ttzXG1ng2DbLzU-XfSes7SAnrPs-gxjODPKJ7AdqkqkSQJ4HrsLgxMgcLFdCsE6feWL7rXjptVWegMVMthhJNVqO0JHu986XBfKKqB60aBFvyAzTSEwJD6o72GtnyzQ-BcH7JxmLtb2_A&__tn__=-R"
  }, ...

}

```
Output Structure for JSON format:


``` javascript
{
    "id": {
        "name": string,
        "shares": integer,
        "reactions": {
            "likes": integer,
            "loves": integer,
            "wow": integer,
            "cares": integer,
            "sad": integer,
            "angry": integer,
            "haha": integer
        },
        "reaction_count": integer,
        "comments": integer,
        "content": string,
        "video" : string,
        "image" : list,
        "posted_on": datetime,  //string containing datetime in ISO 8601
        "post_url": string
    }
}

```
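
Because the output is keyed by post ID, walking through the posts is straightforward. The sketch below assumes `scrap_to_json()` returns a JSON string (the example above prints it but does not state its type), so it accepts either a string or an already-decoded dictionary. It also checks whether `reaction_count` equals the sum of the per-type counts, as it does in the sample (154 + 19 = 173). `summarize` is a hypothetical helper, and the sample is trimmed to the fields it uses.

```python
import json

# Trimmed copy of the sample output above. In real use, pass the value
# returned by meta_ai.scrap_to_json() instead.
sample = json.dumps({
    "2024182624425347": {
        "name": "Meta AI",
        "shares": 0,
        "reactions": {"likes": 154, "loves": 19, "wow": 0, "cares": 0,
                      "sad": 0, "angry": 0, "haha": 0},
        "reaction_count": 173,
        "comments": 2,
        "posted_on": "2022-01-20T22:43:35",
    }
})


def summarize(json_data):
    """Return {post_id: (reaction_count, comments)} for every post."""
    # Accept a JSON string or an already-decoded dict.
    posts = json.loads(json_data) if isinstance(json_data, str) else json_data
    summary = {}
    for post_id, post in posts.items():
        per_type_total = sum(post["reactions"].values())
        # The README does not guarantee the two totals agree, so report
        # a mismatch instead of failing.
        if per_type_total != post["reaction_count"]:
            print(f"post {post_id}: reactions sum to {per_type_total}, "
                  f"reaction_count is {post['reaction_count']}")
        summary[post_id] = (post["reaction_count"], post["comments"])
    return summary


print(summarize(sample))  # {'2024182624425347': (173, 2)}
```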

<br>
<hr>
<br>

<h3> For saving post data directly to a <b>CSV</b> file</h3>

``` python
#call scrap_to_csv(filename,directory) method


filename = "data_file"  # file name without the .csv extension, where data will be saved
directory = r"E:\data"  # directory where the CSV file will be saved (raw string so "\d" is not read as an escape)
meta_ai.scrap_to_csv(filename,directory)

```

Contents of `data_file.csv`:
```csv
id,name,shares,likes,loves,wow,cares,sad,angry,haha,reactions_count,comments,content,posted_on,video,image,post_url
2024182624425347,Meta AI,0,154,19,0,0,0,0,0,173,2,"We’ve built data2vec, the first general high-performance self-supervised algorithm for speech, vision, and text. We applied it to different modalities and found it matches or outperforms the best self-supervised algorithms. We hope this brings us closer to a world where computers can learn to solve many different tasks without supervision. Learn more and get the code:  https://ai.facebook.com/…/the-first-high-performance-self-s…",2022-01-20T22:43:35,,https://scontent-bom1-2.xx.fbcdn.net/v/t39.30808-6/s480x480/272147088_2024182621092014_6532581039236849529_n.jpg?_nc_cat=100&ccb=1-5&_nc_sid=8024bb&_nc_ohc=j4_1PAndJTIAX82OLNq&_nc_ht=scontent-bom1-2.xx&oh=00_AT9us__TvC9eYBqRyQEwEtYSit9r2UKYg0gFoRK7Efrhyw&oe=61F17B71,https://www.facebook.com/MetaAI/photos/a.360372474139712/2024182624425347/?type=3&__xts__%5B0%5D=68.ARAse4eiZmZQDOZumNZEDR0tQkE5B6g50K6S66JJPccb-KaWJWg6Yz4v19BQFSZRMd04MeBmV24VqvqMB3oyjAwMDJUtpmgkMiITtSP8HOgy8QEx_vFlq1j-UEImZkzeEgSAJYINndnR5aSQn0GUwL54L3x2BsxEqL1lElL7SnHfTVvIFUDyNfAqUWIsXrkI8X5KjoDchUj7aHRga1HB5EE0x60dZcHogUMb1sJDRmKCcx8xisRgk5XzdZKCQDDdEkUqN-Ch9_NYTMtxlchz1KfR0w9wRt8y9l7E7BNhfLrmm4qyxo-ZpA&__tn__=-R
...
```

<br>

<hr>
<br>

<h3> Parameters for  <code> scrap_to_csv(filename,directory) </code> method. </h3>

<table>
<tr>
<th> Parameter Name </th>
<th> Parameter Type </th>
<th> Description </th>
</tr>

<tr>
<td>
filename
</td>
<td>
string
</td>

<td>
Name of the CSV file (without the .csv extension) where the post data will be saved
</td>

</tr>

<tr>
<td>
directory
</td>
<td>
string
</td>

<td>
Directory where the CSV file will be stored.
</td>

</tr>

</table>
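
To load a saved file back into Python, the standard `csv` module is enough. The sketch below assumes the file is written to `<directory>/<filename>.csv` and encoded as UTF-8; neither is stated explicitly above, so check the actual file on your machine. `load_posts` is a hypothetical helper: it converts the count columns from the CSV header shown earlier into integers and leaves every other column as a string. How the `image` column stores several URLs is not documented, so it is left untouched.

```python
import csv
import os

# Count columns, taken from the CSV header shown above.
INT_COLUMNS = ("shares", "likes", "loves", "wow", "cares", "sad", "angry",
               "haha", "reactions_count", "comments")


def load_posts(filename, directory):
    """Read back a file written by scrap_to_csv(filename, directory).

    Assumes the path is <directory>/<filename>.csv and UTF-8 encoding.
    Returns one dict per post, with count columns converted to int.
    """
    path = os.path.join(directory, filename + ".csv")
    # newline="" lets the csv module handle quoted fields such as "content",
    # which may contain commas or line breaks.
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    for row in rows:
        for column in INT_COLUMNS:
            row[column] = int(row[column])
    return rows
```

For example, `load_posts("data_file", r"E:\data")` would return a list with one dictionary per post, so `rows[0]["likes"]` would be `154` for the sample row above.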

<br>
<hr>
<br>



<h3>Keys of the outputs:</h3>
<table>
<tr>
<th> Key </th>
<th> Type </th>
<th> Description </th>
</tr>

<tr>
<td>
id
</td>
<td>
string
</td>
<td>
Post identifier (an integer stored as a string)
</td>
</tr>

<tr>
<td>
name
</td>
<td>
string
</td>
<td>
Name of the page
</td>
</tr>

<tr>
<td>
shares
</td>
<td>
integer
</td>
<td>
Share count of the post
</td>
</tr>

<tr>
<td>
reactions
</td>
<td>
dictionary
</td>
<td>
Dictionary mapping each reaction type to its count. Keys: <code> ["likes","loves","wow","cares","sad","angry","haha"] </code>
</td>
</tr>

<tr>
<td>
reaction_count
</td>
<td>
integer
</td>
<td>
Total reaction count of the post
</td>
</tr>


<tr>
<td>
comments
</td>
<td>
integer
</td>
<td>
Comment count of the post
</td>
</tr>

<tr>
<td>
content
</td>
<td>
 string
</td>
<td>
Text content of the post
</td>
</tr>

<tr>
<td>
video
</td>
<td>
 string
</td>
<td>
URL of the video in the post; an empty string if the post has no video
</td>
</tr>


<tr>
<td>
image
</td>
<td>
 list
</td>
<td>
Python list of the URLs of all images in the post
</td>
</tr>

<tr>
<td>
posted_on
</td>
<td>
string (ISO 8601 datetime)
</td>
<td>
Time at which the post was published, in ISO 8601 format
</td>
</tr>

<tr>
<td>
post_url
</td>
<td>
string
</td>
<td>
URL of the post
</td>
</tr>


</table>
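
The `posted_on` value arrives as an ISO 8601 string rather than a Python `datetime` object, so it needs parsing before date comparisons. A minimal sketch, assuming values look exactly like the sample (no fractional seconds or UTC offset); the README does not state which time zone they are in. `parse_posted_on` is a hypothetical helper, and the second post ID below is made up for illustration.

```python
from datetime import datetime

# The first entry matches the sample output; the second is hypothetical.
posts = {
    "2024182624425347": {"posted_on": "2022-01-20T22:43:35"},
    "1111111111111111": {"posted_on": "2022-01-18T09:05:00"},
}


def parse_posted_on(value):
    """Parse a posted_on string such as "2022-01-20T22:43:35".

    Uses strptime instead of datetime.fromisoformat (added in Python 3.7)
    so it also runs on Python 3.6, which this package supports.
    """
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S")


# Sort post IDs from newest to oldest.
newest_first = sorted(posts,
                      key=lambda pid: parse_posted_on(posts[pid]["posted_on"]),
                      reverse=True)
print(newest_first)  # ['2024182624425347', '1111111111111111']
```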
<br>
<hr>
<h2> Privacy </h2>

<p> This scraper only scrapes public data available to unauthenticated users and cannot scrape anything private. </p>

<br>
<hr>
<h2> Tech </h2>
<p>This project depends on the following libraries:</p>
<ul>
<li> <a href="https://www.selenium.dev/" target='_blank'>selenium</a>
<li> <a href="https://pypi.org/project/webdriver-manager/" target='_blank'>webdriver manager</a>
<li> <a href="https://pypi.org/project/python-dateutil/" target='_blank'>python dateutil</a>
</ul>
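
If the scraper fails at import time, a quick first check is whether these dependencies are importable in the active environment. A minimal sketch using `importlib.util.find_spec`; note that the import names (`webdriver_manager`, `dateutil`) differ from the PyPI package names, and `check_dependencies` is a hypothetical helper. It only confirms the Python packages are present; the Chrome or Firefox browser itself is a separate prerequisite.

```python
import importlib.util

# Import names of the libraries listed above. PyPI names differ:
# "webdriver-manager" installs webdriver_manager,
# "python-dateutil" installs dateutil.
DEPENDENCIES = ("selenium", "webdriver_manager", "dateutil")


def check_dependencies(modules=DEPENDENCIES):
    """Return {module_name: bool}, True if the module can be imported."""
    # find_spec returns None for a top-level module that is not installed.
    return {name: importlib.util.find_spec(name) is not None for name in modules}


for name, found in check_dependencies().items():
    print(f"{name}: {'installed' if found else 'missing'}")
```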
<br>

<hr>
If you encounter anything unusual, please feel free to open an issue <a href='https://github.com/shaikhsajid1111/facebook_page_scraper/issues'>here</a>.
<hr>

<h2> LICENSE </h2>
MIT


