SportsCardTool package¶
Submodules¶
SportsCardTool.bref_tool module¶
- SportsCardTool.bref_tool.grab_debut_dict(years: List[str], allow_repeats: bool = False, dictionary: Dict = {'players': {}, 'years': {}}) dict ¶
Master function that allows users to search across multiple years.
- Args:
years: A list of strings representing years, e.g. [“1940”, “1941”]
allow_repeats: If True, searches a year again even if it is already in the passed-in dictionary.
dictionary: A dict that defaults to two empty keyed dicts, one for years and one for players.
- Returns:
A dictionary in the same format as what is passed in with player info from all debut years.
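The signature above uses a mutable default for `dictionary`, and Python creates such defaults once and shares them across calls. The following is a minimal sketch of the accumulate-and-skip behaviour described here, not the package's implementation. `collect_debuts` and `fake_fetch` are hypothetical names, and the layout of the `years` entry (year mapped to the player names found for it) is an assumption.

```python
from typing import Callable, Dict, List, Optional


def collect_debuts(
    years: List[str],
    fetch_year: Callable[[str], Dict[str, dict]],
    allow_repeats: bool = False,
    dictionary: Optional[Dict[str, dict]] = None,
) -> Dict[str, dict]:
    """Hypothetical re-creation of the accumulate-and-skip logic."""
    # Build a fresh container per call instead of sharing a mutable default.
    if dictionary is None:
        dictionary = {"players": {}, "years": {}}
    for year in years:
        # Skip years that were already collected unless repeats are allowed.
        if year in dictionary["years"] and not allow_repeats:
            continue
        players = fetch_year(year)
        dictionary["players"].update(players)
        # Assumed layout: each year maps to the names found for it.
        dictionary["years"][year] = sorted(players)
    return dictionary


# Stub fetcher so the sketch runs without network access.
calls: List[str] = []


def fake_fetch(year: str) -> Dict[str, dict]:
    calls.append(year)
    return {f"player {year}": {"debut_year": year}}


result = collect_debuts(["1940", "1941"], fake_fetch)
# "1941" is already present, so only "1942" triggers a new fetch.
result = collect_debuts(["1941", "1942"], fake_fetch, dictionary=result)
```

Passing the returned dictionary back in, as in the last line, is how an earlier search can be extended without repeating years.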
- SportsCardTool.bref_tool.grab_debut_year(year: str) Dict ¶
Grabs bref info for all players who debuted in a given year.
- Args:
year: A string containing the year to search for on Baseball Reference, e.g. “2017”
- Returns:
A dictionary with the players stored in a dictionary keyed by their names.
Note: this function sleeps between calls to respect Baseball Reference’s policies; please do not modify this behavior.
- SportsCardTool.bref_tool.remove_accents(input: str) str ¶
Removes accent marks and capitalization from a string.
Standardizing strings makes matching between card lists and bref possible. Credit to: https://stackoverflow.com/questions/517923
- Args:
input: A string to be modified.
- Returns:
An output string that is lowercased and has all non-standard characters removed.
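One common way to normalize names like this uses the standard library's `unicodedata` module. The sketch below is an assumption about the general technique, not the package's actual code. `strip_accents` is a hypothetical name, and the sketch only drops combining accent marks rather than every non-standard character.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Lowercase text and drop combining accent marks."""
    # NFKD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Keep only characters that are not combining marks.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


normalized = strip_accents("José Ramírez")  # "jose ramirez"
```

Applying the same normalization to both the card listing and the bref name lets a plain string comparison match them.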
SportsCardTool.scraping_tool module¶
- SportsCardTool.scraping_tool.dump_data(card_list: List[Dict], csv_name: str = 'demo_cards.csv')¶
Takes a list of dictionaries and creates a new csv file containing them.
- Args:
card_list: A list of dictionaries representing cards.
csv_name: A name/path for the output file; defaults to demo_cards.csv
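Writing a list of dictionaries to CSV can be sketched with `csv.DictWriter`. This illustrates the idea only. `write_cards` is a hypothetical helper, the card field names are made up, and the sketch writes to an in-memory buffer so it runs without touching the file system.

```python
import csv
import io
from typing import Dict, List


def write_cards(card_list: List[Dict[str, str]], handle) -> None:
    """Write card dictionaries as CSV rows to an open text handle."""
    # Collect every key seen so cards with missing fields still fit the header.
    fieldnames: List[str] = []
    for card in card_list:
        for key in card:
            if key not in fieldnames:
                fieldnames.append(key)
    # restval fills in an empty cell when a card lacks a field.
    writer = csv.DictWriter(handle, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(card_list)


# Hypothetical card data for illustration.
cards = [
    {"name": "Juan Soto", "year": "2018"},
    {"name": "Rafael Devers", "year": "2017", "set": "Base"},
]
buffer = io.StringIO()
write_cards(cards, buffer)
csv_text = buffer.getvalue()
```

To write a real file, the same helper could be given a handle from `open(csv_name, "w", newline="")`.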
- SportsCardTool.scraping_tool.filter_hrefs(links: List[Tag], filter: str) List[str] ¶
Filters tag objects according to filter and returns matching href strings.
Returns a list of the href strings inside of each tag if they contain the filter string.
- Args:
links: List of bs4 tag objects that may or may not have an href.
filter: String that is used to filter.
- Returns:
A list of strings, each one an href that contains the filter
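The filtering step can be sketched as below. This is an illustration under assumptions: `hrefs_containing` is a hypothetical name, the sample paths are invented, and plain dicts stand in for bs4 `Tag` objects because both support `.get("href")`.

```python
from typing import Any, Iterable, List


def hrefs_containing(links: Iterable[Any], needle: str) -> List[str]:
    """Return href values that contain needle, skipping links with no href."""
    matches: List[str] = []
    for link in links:
        # bs4 Tag objects and plain dicts both support .get("href").
        href = link.get("href")
        if href is not None and needle in href:
            matches.append(href)
    return matches


# Plain dicts stand in for bs4 Tag objects so the sketch has no dependencies.
sample_links = [
    {"href": "/year/2015/baseball"},
    {"href": "/about"},
    {"class": "no-link"},
]
year_hrefs = hrefs_containing(sample_links, "/year/")
```

Checking for a missing href first keeps the substring test from failing on tags that have no link.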
- SportsCardTool.scraping_tool.get_soup(href: str) BeautifulSoup ¶
Gets a BeautifulSoup object given an href string.
The BeautifulSoup object is gathered by making a request to the page and parsing the response via an lxml parser. If the request or the parsing fails, an empty BeautifulSoup object will be returned.
- Args:
href: A string containing the href of a webpage to be turned into soup.
- Returns:
A BeautifulSoup object which will contain the contents of the webpage or be empty if the request or parsing fails.
- SportsCardTool.scraping_tool.grab_bref_info(name: str) Dict ¶
Tries different variations of a name to find bref_info.
- Args:
name: A string containing the player name.
- Returns:
A dict containing bref_info for the given player or a placeholder dictionary.
- SportsCardTool.scraping_tool.grab_card_list(year_links: List[str]) List[Dict] ¶
Finds all groups and sets in a year and returns all cards.
- Args:
year_links: A list of href strings to different years to be parsed
- Returns:
A list of all cards from different groups/sets in the year as parsed by parse_panel.
Note: performance is still slow due to network calls, and the amount of data contained in each year grows exponentially over time.
- SportsCardTool.scraping_tool.grab_year_links(year_list: List[str]) List[Tuple[Tag]] ¶
Takes a list of years and finds link Tags for each year on SportsCardsChecklist.com.
- Args:
year_list: List of strings, each representing a numeric year, e.g. [‘2015’, ‘2016’]
- Returns:
A list of tuples with the first value being an <a> tag that contained any one of the years specified and the second value being a str representing the year.
e.g. (bs4.element.Tag, “2015”)
Note that there is currently a bug where years on the website are parsed based on the first year in the link, which could cause unexpected behavior, e.g. ‘2003-07’ would be parsed as 2003.
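The first-year behaviour in the note can be reproduced with a small sketch. This is an assumption about how such parsing might work, not the package's code. `first_year` and the sample link texts are hypothetical.

```python
import re
from typing import Optional

# Matches the first four-digit run in the text, e.g. "2003" in "2003-07".
FIRST_YEAR = re.compile(r"\d{4}")


def first_year(link_text: str) -> Optional[str]:
    """Return the first four-digit year in link_text, or None."""
    match = FIRST_YEAR.search(link_text)
    return match.group(0) if match else None


single = first_year("2015 Baseball Cards")
# A season range collapses to its starting year, which is the bug described.
spanning = first_year("2003-07 Basketball Cards")
```

A link covering several seasons is therefore filed under its starting year only, so a search for any later year in the range would miss it.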
- SportsCardTool.scraping_tool.parse_panel(panel: Tag, year: str, group: str, set: str) Dict ¶
Takes in a panel and other gathered info and creates a card dict to be returned.
- Args:
panel: An html tag containing all of the player’s info.
year: A string representing the year the card belongs to.
group: A string representing the group the card belongs to.
set: A string representing the set the card belongs to.
- Returns:
A dictionary containing all of the data that could be extracted. If the player’s name was previously saved and then matched, this card will also show when it was created relative to the player’s career.
Note: the listing parser used to find player names still struggles to identify several types of cards (team cards, multi-player cards, checklists).
- SportsCardTool.scraping_tool.process_group_links(group_links: List[str], year: str) List[Dict] ¶
Processes group links into sets and then returns all cards in the group.
- Args:
group_links: A list of href strings.
year: A string representing the year the cards belong to.
- Returns:
A list of card dictionaries as processed by parse_panel. The cards returned will have a reference to the year, group, and set respectively.
- SportsCardTool.scraping_tool.process_set_links(set_links: List[str], year: str, group: str = '') List[Dict] ¶
Processes set links and then returns all cards in the sets.
- Args:
set_links: A list of href strings.
year: A string representing the year the cards belong to.
group: A string representing the group the cards belong to; defaults to an empty string.
- Returns:
A list of card dictionaries as processed by parse_panel. The cards returned will have a reference to the year, group, and set respectively.
Note: if no group is detected, group is set equal to set to enable easy future searching.
SportsCardTool.searching_tool module¶
- class SportsCardTool.searching_tool.query_builder(base: str = 'http://flask-cards-env.eba-gsyr32jx.us-east-2.elasticbeanstalk.com/api/v1/sportscards/search?')¶
Bases: object
The query_builder class allows for quick queries to SportsCardTool’s API.
The API can also be accessed manually for now, but this class helps build queries programmatically.
- Attributes:
query: A string containing the query to the API.
terms: An int counting the number of terms in the query.
- add_item(filters: dict)¶
Adds items from a dictionary of key value pairs to the query.
Key value pairs are parsed into url params. In the future we would like to support clauses and conditionals, but currently we support the following:
Possible keys: [name, team, group, set, year, serial, auto, mem, contains]
Specify multi-value queries as a comma-separated string, e.g. {“name”: “Rafael Devers,Juan Soto”}
Searching via the contains key checks whether the string is contained in the listing (case-insensitive).
- Args:
filters: A dictionary keyed by term and valued with a string containing all desired values.
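Turning a filters dictionary into URL parameters can be sketched with the standard library. This is a hedged illustration, not the class's actual logic. `build_query` is a hypothetical function, and whether the API expects commas percent-encoded, as `urlencode` produces, is an assumption.

```python
from typing import Dict
from urllib.parse import urlencode

# Default base taken from the query_builder signature above.
BASE = (
    "http://flask-cards-env.eba-gsyr32jx.us-east-2.elasticbeanstalk.com"
    "/api/v1/sportscards/search?"
)


def build_query(filters: Dict[str, str], base: str = BASE) -> str:
    """Append filters to base as URL-encoded parameters."""
    # urlencode percent-encodes commas and turns spaces into "+".
    return base + urlencode(filters)


url = build_query({"name": "Rafael Devers,Juan Soto", "year": "2018"})
```

Each key becomes one `key=value` pair joined by `&`, so a multi-value filter stays a single parameter whose value holds the comma-separated list.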
- grab_data(min_results: int = 1000)¶
Executes the query as defined by the class attribute.
Pages through results produced by the query string until it hits the min results argument or runs out of data.
- Args:
min_results: An int specifying how many results at a minimum to return.
- Returns:
A tuple where the first value is a list of dictionaries of results and the second value is an int with the total number of results.
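The paging behaviour described above can be sketched as a loop that stops once enough results are gathered or a page comes back empty. This is a sketch under assumptions: `collect_pages` and `fake_fetch_page` are hypothetical, the page size of 10 is invented, and the stub replaces the real API call so the example runs offline.

```python
from typing import Callable, Dict, List, Tuple

Page = Tuple[List[Dict[str, str]], int]


def collect_pages(fetch_page: Callable[[int], Page], min_results: int = 1000) -> Page:
    """Page through results until min_results are gathered or data runs out."""
    results: List[Dict[str, str]] = []
    total = 0
    page = 0
    while len(results) < min_results:
        rows, total = fetch_page(page)
        if not rows:
            # An empty page means there is no more data.
            break
        results.extend(rows)
        page += 1
    return results, total


# Stub: 25 fake cards served 10 per page, standing in for the real API.
FAKE_CARDS = [{"name": f"card {i}"} for i in range(25)]


def fake_fetch_page(page: int) -> Page:
    start = page * 10
    return FAKE_CARDS[start:start + 10], len(FAKE_CARDS)


some, total_some = collect_pages(fake_fetch_page, min_results=15)
everything, total_all = collect_pages(fake_fetch_page, min_results=1000)
```

Because whole pages are appended, the result list can be longer than `min_results`, which fits the argument being a minimum rather than a limit.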