Metadata-Version: 2.1
Name: SwimScraper
Version: 0.0.2
Summary: A package to scrape professional and college swimming data.
Home-page: https://github.com/maflancer/SwimScraper
Author: Matthew Flancer
Author-email: maf291@pitt.edu
License: UNKNOWN
Project-URL: Bug Tracker, https://github.com/maflancer/SwimScraper/issues
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.6
Description-Content-Type: text/markdown
License-File: LICENSE

# SwimScraper

* Python scraping package for college and professional swimming data. All data is from https://swimcloud.com.
* The package can be found on https://pypi.org/project/SwimScraper/.

## Installation
* You can install SwimScraper using pip:
```pip install SwimScraper```
* An example of one way to use the scraping functions:
```
from SwimScraper import SwimScraper as ss

#gets the Pitt men's roster for 2020
pitt_M_roster_2020 = ss.getRoster(team = 'University of Pittsburgh', team_ID = 405, gender = 'M', year = 2020)

#gets list of all meets that Pitt participated in for 2020
pitt_meetlist_2020 = ss.getTeamMeetList(team_name = 'University of Pittsburgh', team_ID = 405, year = 2020)
```

## Scraping Functions

**Getting Team Data**

* **getCollegeTeams(team_names, conference_names, division_names)** -> returns list of teams where each team has a team_name, team_ID, team_state, team_division, team_division_ID, team_conference, team_conference_ID
  * **Select one of the three inputs:**
  * team_names - ```team_list = ss.getCollegeTeams(team_names = ['University of Pittsburgh', 'University of Louisville'])```
  * conference_names - ```ACC_teams = ss.getCollegeTeams(conference_names = ['ACC'])```
  * division_names - ```d1_teams = ss.getCollegeTeams(division_names = ['Division 1'])```
* **getTeamRankingsList(gender, season_ID, year)** -> returns list of the top 50 teams where each team has a team_name, team_ID, and swimcloud_points (score given by swimcloud.com based on team's fastest times)
  * **Select a gender and either a season_ID (e.g., 19 for the 2015-16 season, 24 for the 2020-21 season) or year**
  * season_ID - ```male_rankings_2015 = ss.getTeamRankingsList('M', season_ID = 19)```
  * year - ```female_rankings_2019 = ss.getTeamRankingsList('F', year = 2019)```
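
The two season_ID examples above (19 for the 2015-16 season, 24 for the 2020-21 season) are consistent with a fixed offset of 1996 from the season's starting year. The sketch below only encodes that inferred pattern. The function names and the offset are assumptions drawn from those two data points, not the package's confirmed implementation; the package's own helpers for this are `getSeasonID` and `getYear`.

```python
# Hypothetical helpers, not part of SwimScraper.
# Assumption: season_ID = starting year - 1996, inferred only from the
# README examples (2015-16 -> 19, 2020-21 -> 24).
SEASON_OFFSET = 1996


def season_id_from_start_year(start_year):
    """Return the inferred season_ID for the season that starts in start_year."""
    return start_year - SEASON_OFFSET


def start_year_from_season_id(season_id):
    """Return the inferred starting year for a season_ID."""
    return season_id + SEASON_OFFSET


print(season_id_from_start_year(2015))
print(start_year_from_season_id(24))
```

If the pattern holds, passing either `season_ID = 19` or `year = 2015` should select the same season, but that is worth checking against `ss.getSeasonID` before relying on it.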

**Getting Roster Data**

* **getRoster(team, gender, team_ID, season_ID, year, pro)** -> returns list of swimmers where each swimmer has a swimmer_name, swimmer_ID, team_name, team_ID, grade, hometown_state, hometown_city, HS_power_index (a score given to high school students for recruiting - scale is from 1.00 (best) to 100.00)
  * **Select a gender, a team name or team_ID, a season_ID or year, and set pro = True for non-college teams**
  * team - ```pitt_F_roster_2020 = ss.getRoster(team = 'University of Pittsburgh', gender = 'F', year = 2020)```
  * team_ID - ```boston_college_M_roster_2018 = ss.getRoster(team = '', team_ID = 228, gender = 'M', season_ID = 22)```
  * pro - ```japan_M_roster_2020 = ss.getRoster(team = 'Japan', team_ID = 10008082, gender = 'M', year = 2020, pro = True)```
* **getHSRecruitRankings(class_year, gender, state, state_abbreviation, international)** -> returns list of the top 200 High School recruits from the specified class where each swimmer has a swimmer_name, swimmer_ID, team_name, team_ID, hometown_state, hometown_city, HS_power_index
  * **Select a class year and gender; optionally filter by a state or state_abbreviation, or set international = True for international HS students**
  * ```male_recruits_2018 = ss.getHSRecruitRankings(2018, 'M')```
  * state - ```female_recruits_2020_Hawaii = ss.getHSRecruitRankings(2020, 'F', state = 'Hawaii')```
  * state_abbreviation - ```female_recruits_2020_Hawaii = ss.getHSRecruitRankings(2020, 'F', state_abbreviation = 'HI')```
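
Because a lower HS_power_index is better (1.00 is the best score), sorting a roster or recruit list in ascending order puts the strongest recruits first. The sketch below assumes each swimmer is a dict keyed by the field names listed above, which this README does not state explicitly. The sample records and the `rank_by_power_index` helper are placeholders for illustration, not real data or part of the package.

```python
# Placeholder records standing in for the output of ss.getRoster(...) or
# ss.getHSRecruitRankings(...). Assumption: each swimmer is a dict.
sample_swimmers = [
    {"swimmer_name": "Swimmer A", "swimmer_ID": 1, "HS_power_index": 12.50},
    {"swimmer_name": "Swimmer B", "swimmer_ID": 2, "HS_power_index": 3.75},
    {"swimmer_name": "Swimmer C", "swimmer_ID": 3, "HS_power_index": 48.00},
]


def rank_by_power_index(swimmers):
    """Sort swimmers from best (lowest index) to worst (highest index)."""
    # float() also accepts numeric strings, in case the index is scraped as text
    return sorted(swimmers, key=lambda s: float(s["HS_power_index"]))


ranked = rank_by_power_index(sample_swimmers)
for swimmer in ranked:
    print(swimmer["swimmer_name"], swimmer["HS_power_index"])
```

Real rosters may contain swimmers without a power index, so a missing or non-numeric value would need to be filtered out before sorting.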

**Getting Swimmer Data**

* **getPowerIndex(swimmer_ID)** -> returns a swimmer's HS recruiting power index
  * ```swimmer_433591_power_index = ss.getPowerIndex(433591)```
* **getSwimmerEvents(swimmer_ID)** -> returns a list of all events that the specified swimmer has participated in
  * ```swimmer_362091_event_list = ss.getSwimmerEvents(362091)``` 
* **getSwimmerTimes(swimmer_ID, event_name, event_ID)** -> returns a list of all of the swimmer's times in the specified event where each time has a swimmer_ID, pool_type, event, event_ID, time, meet_name, year, date, improvement (improvement from last time)
  * event_name - ```swimmer_257824_50free_times = ss.getSwimmerTimes(257824, '50 Free')```
  * event_ID - ```swimmer_257824_50free_times = ss.getSwimmerTimes(257824, '', event_ID = 150)```

**Getting Meet Data**

* **getTeamMeetList(team_name, team_ID, season_ID, year)** -> returns a list of all the meets the team has competed in for the specified season or year where each meet has a team_ID, meet_ID, meet_name, meet_date, meet_location
  * ```pitt_2019_meet_list = ss.getTeamMeetList(team_name = 'University of Pittsburgh', year = 2019)```
  * ```USA_2019_meet_list = ss.getTeamMeetList(team_name = '', team_ID = 10008158, season_ID = 23)```
* **getMeetEventList(meet_ID)** -> returns a list of which events took place at the specified meet where each event has an event_name, event_ID and an event_href which can be used as an input in the following functions that get meet results
  * ```olympics_2012_event_list = ss.getMeetEventList(196380)``` 
* **getCollegeMeetResults(meet_ID, event_name, gender, event_ID, event_href)** -> returns a list of all times for the specified event where each time has a meet_ID, swimmer_name, swimmer_ID, team_name, team_ID, event_name, event_ID, event_type (prelims, finals,...), time, score, and improvement
  * event_name - ```pitt_army_100free_results = ss.getCollegeMeetResults(190690,'100 Free', 'F')```
  * event_ID - ```pitt_army_100free_results = ss.getCollegeMeetResults(190690, '', 'F', event_ID = 1100)```
  * event_href (from getMeetEventList) - ```pitt_army_100free_results = ss.getCollegeMeetResults(190690, '', 'F', event_href = '/results/190690/event/17/')```
* **getProMeetResults(meet_ID, event_name, gender, event_ID, event_href)** -> returns a list of all times for the specified event where each time has a meet_ID, swimmer_name, swimmer_ID, team_name, team_ID, event_name, event_ID, event_type (prelims, finals,...), time, FINA_score, and improvement
  * ```olympics2016_200free_male_times = ss.getProMeetResults(106117, event_name = '200 Free', gender = 'M')```
  * ```olympics2016_400medleyrelay_women_times = ss.getProMeetResults(106117, event_name = '', gender = 'F', event_ID = 7400)```
  * ```olympics2012_50free_women_times = ss.getProMeetResults(196380, event_name = '', gender = 'F', event_href = '/results/196380/event/1/')```
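
The event_href returned by getMeetEventList is meant to be fed into the results functions, so one plausible workflow is to look up the href for an event by name first. The sketch below assumes each event is a dict with event_name, event_ID and event_href keys. The `find_event_hrefs` helper and the sample hrefs (which use a made-up meet ID of 0) are hypothetical; only the event IDs 150 and 1100 come from the examples above.

```python
def find_event_hrefs(event_list, event_name):
    """Return the event_href of every event whose event_name matches exactly."""
    return [e["event_href"] for e in event_list if e["event_name"] == event_name]


# Placeholder list standing in for ss.getMeetEventList(meet_ID).
# The hrefs are illustrative only and do not point at a real meet.
sample_events = [
    {"event_name": "50 Free", "event_ID": 150, "event_href": "/results/0/event/1/"},
    {"event_name": "100 Free", "event_ID": 1100, "event_href": "/results/0/event/2/"},
]

hrefs = find_event_hrefs(sample_events, "100 Free")
print(hrefs)
```

Each href found this way could then be passed as `event_href` to `getCollegeMeetResults` or `getProMeetResults`, in the same way as the examples above. A list is returned rather than a single value because a meet may list the same event name more than once (for example, once per gender).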


## Other Helper Functions

* **getTeamID(team_name)** - gets corresponding team_ID for the specified team (**currently only for college teams**)
* **getTeamName(team_ID)** - gets team_name for the specified team_ID (**currently only for college teams**)
* **getSeasonID(year)** - gets season ID for a specified year
* **getYear(season_ID)** - gets year for a specified season_ID
* **getEventID(event_name)** - gets event_ID for a specified event_name
* **getEventName(event_ID)** - gets event_name for a specified event_ID
* **convertTime(display_time)** - converts a time of the format minutes:seconds (1:53.8) to seconds (113.8)
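
To illustrate the conversion that convertTime describes, here is a minimal standalone sketch. `convert_time_sketch` is a hypothetical name and this is not the package's own implementation; accepting a plain seconds value without a minutes part (such as `'22.5'`) is also an assumption.

```python
def convert_time_sketch(display_time):
    """Convert a 'minutes:seconds' string (or plain seconds) to seconds."""
    if ":" in display_time:
        # Split into the minutes part and the seconds part
        minutes, seconds = display_time.split(":")
        return int(minutes) * 60 + float(seconds)
    # Assumption: times under one minute have no "minutes:" prefix
    return float(display_time)


print(convert_time_sketch("1:53.8"))
```

Having times as plain seconds makes them directly comparable, for example when looking for the fastest entry in a list of times.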




