Metadata-Version: 2.1
Name: seleniumprocessor
Version: 0.1.4
Summary: A simple library to set up Selenium processes
Home-page: https://github.com/auino/seleniumprocessor
Author: Enrico Cambiaso
Author-email: enrico.cambiaso@gmail.com
License: UNKNOWN
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.6
Description-Content-Type: text/markdown

# seleniumprocessor

A simple library to set up Selenium processes

### Description ###

This library makes it easy to set up an automated process based on [Selenium](https://www.selenium.dev).
A process is defined as a list of actions in a specific format, which the library then executes through [Selenium](https://www.selenium.dev).

### Installation ###

```
pip install seleniumprocessor
```

Then install a [Selenium](https://www.selenium.dev) web driver, e.g., the [Chrome WebDriver](https://sites.google.com/chromium.org/driver/).

#### Available methods ####

`initiate_connection(webdriverfile, url, to, loginrequired=True, headless=False)`, returning a `selenium.webdriver.chrome.webdriver.WebDriver` object allowing browser control
* `webdriverfile` is the path of the [Selenium](https://www.selenium.dev) web driver file
* `url` is the URL to open
* `to` is the timeout to wait for page loading
* `loginrequired` specifies if a manual login from the user is required (`True`) or not (`False`)
* `headless` specifies if the browser has to be executed in headless mode (`True`) or not (`False`)
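
As a minimal sketch of the signature above, the following helper opens a page in headless mode without a manual login step. The helper name `open_headless` and its `timeout` and `driver_path` parameters are hypothetical, introduced only for illustration; the call itself uses the arguments documented above.

```python
def open_headless(url, timeout=3, driver_path='./chromedriver'):
    """Open `url` in a headless browser session, skipping the manual login."""
    # Imported here so this helper can be defined even where Selenium is not installed.
    import seleniumprocessor

    return seleniumprocessor.initiate_connection(
        driver_path,          # webdriverfile: path of the web driver file
        url,                  # url: page to open
        timeout,              # to: page-loading timeout
        loginrequired=False,  # no manual login from the user
        headless=True,        # run the browser in headless mode
    )
```

The returned object can then be passed as `brw` to `run_process`.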

`run_process(brw, url_home, to, p, backtohome_begin=True, backtohome_end=True, checkfilterpassed_callback=None)`, returning an object, as specified in the process `p`
* `brw` is the `selenium.webdriver.chrome.webdriver.WebDriver` object used to control the browser
* `url_home` is the home page URL
* `to` is the timeout used to wait for the home page to load
* `p` is the list of actions in the current process
* `backtohome_begin` specifies if the browser should be redirected to the home page at the beginning of the method (`True`) or not (`False`)
* `backtohome_end` specifies if the browser should be redirected to the home page at the end of the method (`True`) or not (`False`)
* `checkfilterpassed_callback` identifies a callback function used to check filters defined in the process `p`, returning a boolean value (`True` if the filter is passed, `False` otherwise)
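
A filter callback could look like the sketch below. It assumes the callback receives the action's `filter` string as its only argument, which this document does not state explicitly, so check the library source before relying on it. The names `ACTIVE_FILTERS` and `check_filter_passed`, and the filter value itself, are hypothetical.

```python
# Hypothetical filter names; a process would reference them through its `filter` field.
ACTIVE_FILTERS = {'include_descriptions'}

def check_filter_passed(filter_value):
    """Return True if the filter is passed, False otherwise."""
    # Assumption: the callback is called with the `filter` string only.
    return filter_value in ACTIVE_FILTERS
```

Under that assumption, the function would be supplied as `checkfilterpassed_callback=check_filter_passed` when calling `run_process`.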

#### Objects structure ####

The main process object is a list of actions to execute sequentially.
Each action is represented by a dictionary with the following fields:
* `name`: the name identifying the DOM objects to find
* `class_name`: the class name identifying the DOM objects to find
* `index` (optional): when several DOM objects share the same class (or when a DOM object other than the first one has to be considered), the index of the DOM object to use, in the list of DOM objects sharing that class
* `sleep` (optional): the sleep timeout used after the action is performed
* `filter`: a string passed to the `checkfilterpassed_callback` for filtering actions
* `action_parameters` (optional): its definition depends on the `action` field
* `action`: the action to execute:
    * `click`: to perform a click on the DOM object
    * `click-repeated`: to perform a repeated click on the DOM object, for as long as the object is present (useful with `sleep`, e.g., for pages loading portions of a list, with a final button to load additional results); the optional `action_parameters` parameter represents the class name of the objects to count: when the count is unchanged, repeated clicks will be interrupted
    * `navigate`: to navigate by clicking a specific sequence of objects, by their text value; the `action_parameters` parameter represents the `>`-separated navigation path
    * `scroll_to`: to scroll to the specific element
    * `empty_value`: to empty the `value` property of the DOM object
    * `store_text`: to store data on the returning object generated by the `run_process` method; the `action_parameters` parameter represents the name of the property on the object
    * `send_keys`: to send a key input to a specific DOM object
    * `select`: to select a specific value of a specific combo-box DOM object, where the value is specified in the `action_parameters` parameter
    * `foreach`: to loop on all the DOM objects retrieved to execute repeated actions
* `context` (optional): when the `foreach` action is used, sub-items are searched for within the parent DOM object used in the loop; to search the whole page instead, specify `whole_page` as `context`

### Sample usage ###

#### Get all repositories of [@auino](https://github.com/auino) ####

```
# import the library
import seleniumprocessor

# define initial variables
URL_HOME = 'https://github.com/auino'
SLEEP_TO = 3

# initiate a connection on auino GitHub page (not requiring a login)
brw = seleniumprocessor.initiate_connection('./chromedriver', URL_HOME, SLEEP_TO, loginrequired=False)

# define the process to be executed
p = [
	{'class_name':'UnderlineNav-item', 'index':1, 'action':'click', 'sleep':SLEEP_TO}, # clicking on the Repositories tab, the second one, on top of the page
	{'class_name':'source', 'action':'foreach', 'action_parameters':[ # looping on all repositories
		{'class_name':'wb-break-all', 'action':'store_text', 'action_parameters':'name'}, # storing the repository name
		{'class_name':'color-text-secondary', 'action':'store_text', 'action_parameters':'description'} # storing the repository description
	]}
]

# run the process
data = seleniumprocessor.run_process(brw, URL_HOME, SLEEP_TO, p, backtohome_begin=False)

# showing resulting data
print(data)
```

#### Get all publications of a given user from [Google Scholar](https://scholar.google.com) ####

```
import seleniumprocessor

# define initial variables
USERPROFILE = 'UlbGEQwAAAAJ'
URL_HOME = 'https://scholar.google.com/citations?user={}'.format(USERPROFILE)
SLEEP_TO = 3

# initiate a connection on the Google Scholar profile page (not requiring a login)
brw = seleniumprocessor.initiate_connection('./chromedriver', URL_HOME, SLEEP_TO, loginrequired=False)

# define the process to be executed
p = [
    {'id':'gsc_prf_in', 'action':'store_text', 'action_parameters':'name'}, # storing researcher's name
    {'class_name':'gs_lbl', 'index':-1, 'action':'click-repeated', 'action_parameters':'gsc_a_tr', 'sleep':SLEEP_TO}, # clicking the button at the end of the page, to extend the list of publications
    {'class_name':'gsc_a_tr', 'action':'foreach', 'action_parameters':[ # looping on all publications
        {'class_name':'gsc_a_at', 'action':'store_text', 'action_parameters':'title'}, # storing the publication name
        {'class_name':'gs_gray', 'index':0, 'action':'store_text', 'action_parameters':'authors'}, # storing the authors of the publication
        {'class_name':'gs_gray', 'index':1, 'action':'store_text', 'action_parameters':'venue'}, # storing the venue of the publication
        {'class_name':'gsc_a_ac', 'action':'store_text', 'action_parameters':'citations'}, # storing the number of citations of the publication
        {'class_name':'gsc_a_h', 'action':'store_text', 'action_parameters':'year'}, # storing the year of the publication
    ]}
]

# run the process
data = seleniumprocessor.run_process(brw, URL_HOME, SLEEP_TO, p, backtohome_begin=False)

# showing resulting data
print(data)
```

### TODO ###

* Improve code readability
* Extend supported objects structure

### Contacts ###

You can find me on [Twitter](https://twitter.com) as [@auino](https://twitter.com/auino).


