Merged
120 commits
63c4cc9
switching from requests to selenium
Feb 10, 2025
1d5e255
added code to expand ai overview text and urls
Feb 17, 2025
002386b
update: add lang arg to search using hl url param
gitronald Feb 26, 2025
4bf035b
version: 0.5.1.dev0
gitronald Feb 26, 2025
834769b
update: add lang to output
gitronald Feb 26, 2025
e09030b
update: add language to serp model
gitronald Feb 26, 2025
9377d75
version: 0.5.1.dev1
gitronald Feb 26, 2025
19a94f9
Update header_text en español v2.py
mariaelissat Mar 1, 2025
3010788
update: null arg handling
gitronald Mar 6, 2025
e547b88
version: 0.5.1.dev4
gitronald Mar 6, 2025
1ae2813
fix: canonical name to uule converter with protobuf
gitronald Mar 7, 2025
bddd4a1
update: more specific dir name for geotargets csv download
gitronald Mar 7, 2025
74a1487
version: 0.5.1.dev5
gitronald Mar 7, 2025
6a72f9e
Merge pull request #75 from gitronald/locations
gitronald Mar 7, 2025
e18eed5
version: 0.5.1
gitronald Mar 7, 2025
8ffbf34
Update WebSearcher/classifiers/header_text.py
gitronald Mar 7, 2025
10a60a5
Merge pull request #74 from mariaelissat/patch-2
gitronald Mar 7, 2025
a136095
update: formatting, drop repeated Video labels
gitronald Mar 7, 2025
f701a9e
version: 0.5.2.dev0
gitronald Mar 7, 2025
9d977b0
Merge branch 'master' into dev
gitronald Mar 7, 2025
425f27a
Merge branch 'selenium' into selenium-branch
gitronald Mar 9, 2025
6ff6262
Merge pull request #72 from EvanUp/selenium-branch
gitronald Mar 9, 2025
0730e6b
version: 0.5.2
gitronald Mar 9, 2025
d25f07e
Merge branch 'master' into dev
gitronald Mar 9, 2025
15adc0a
Merge branch 'master' into dev
gitronald Mar 9, 2025
56f9699
Merge branch 'dev' into selenium
gitronald Mar 9, 2025
7b4b95c
version: 0.6.0.dev0
gitronald Mar 9, 2025
acea022
update: dedupe args, add version_main for chromedriver launch
gitronald Mar 9, 2025
80403f2
update: poetry lock file
gitronald Mar 9, 2025
465c553
update: reorg selenium code
gitronald Mar 9, 2025
ea6eebc
update: specify args, headless not working locally
gitronald Mar 9, 2025
82bf507
update: collection code and selenium test
gitronald Mar 11, 2025
5ca2da7
update: save method variable along with metadata
gitronald Mar 11, 2025
b950989
update: handle null links in tw result
gitronald Mar 11, 2025
dc00990
version: 0.6.0.dev1
gitronald Mar 11, 2025
a7cfd5a
update: move driver init to search, add driver cleanup
gitronald Mar 11, 2025
2a86a5a
version: 0.6.0.dev2
gitronald Mar 11, 2025
76fa069
update: add parse both features and results options
gitronald Mar 19, 2025
6270f5d
version: 0.6.0.dev3
gitronald Mar 19, 2025
6752498
version: 0.6.0.dev4
gitronald Mar 26, 2025
165f9e3
update: condense args, use currently reliable default query
gitronald Mar 26, 2025
b77c413
update: use pydantic models for configs and defaults
gitronald Mar 26, 2025
9b925ae
update: model directory with multiple files, new BaseConfig model
gitronald Mar 26, 2025
4e87302
update: use baseconfig in searchconfig
gitronald Mar 26, 2025
d0a7aa9
update: clean log config, header as arg
gitronald Mar 26, 2025
2fc420d
update: use search params pydantic model
gitronald Mar 26, 2025
083282f
update: move selenium to new searchers dir
gitronald Mar 27, 2025
7ae02c6
update: model docs
gitronald Mar 27, 2025
b725ace
add: searches directory for diff methods
gitronald Mar 27, 2025
6d4642f
add: file for requests code, update outputs
gitronald Mar 27, 2025
bd2b76e
version: 0.6.0.dev5
gitronald Mar 28, 2025
0c959b9
update: cleaner selenium cleanup
gitronald Mar 28, 2025
82ef0db
update: consistent logging and serp handling
gitronald Mar 28, 2025
3fd8e19
update: simplify search logic, use SearchParams, ai expand logic in s…
gitronald Mar 28, 2025
b899271
Merge branch 'master' into dev
gitronald Mar 28, 2025
bdb5975
update: drop python version file, use python>=3.10 in pyproject
gitronald Mar 28, 2025
d5c7539
fix: selenium output reference
gitronald Mar 28, 2025
64ee056
update: demo scripts
gitronald Mar 28, 2025
f8337ca
Merge branch 'master' into dev
gitronald Mar 28, 2025
16ba005
update: timestamp before request, ai expand as search param, load sea…
gitronald Apr 1, 2025
51ee3b2
update: poetry lock
gitronald Apr 1, 2025
9e8a393
Merge branch 'master' into dev
gitronald Apr 1, 2025
6ee5209
update: using orjson for speed, must decode dumps to string
gitronald Apr 1, 2025
48ae902
update: archive result collector, ignore archive
gitronald Apr 1, 2025
9aaf210
Merge branch 'master' into dev
gitronald Apr 1, 2025
d076e4f
fix: downgrade log warning to debug
gitronald Apr 2, 2025
e726acd
update: breaking change for log config, using logger kwargs
gitronald Apr 2, 2025
fef3595
Merge branch 'master' into dev
gitronald Apr 2, 2025
70e774f
build(deps): bump h11 from 0.14.0 to 0.16.0
dependabot[bot] Apr 24, 2025
592c69b
update: ad component parsers
gitronald Apr 27, 2025
2ca4081
version: 0.6.5.dev0
gitronald Apr 27, 2025
0f9dc40
update: videos component parser
gitronald Apr 27, 2025
502a025
version: 0.6.5.dev1
gitronald Apr 27, 2025
cc2395c
update: discussions and forums classifier
gitronald Apr 28, 2025
cedc7d2
update: extract more divs for top_bar layout
gitronald Apr 28, 2025
1c68ea8
version: 0.6.5.dev2
gitronald Apr 28, 2025
29d62c7
fix: drop debug print and fix print var
gitronald Apr 28, 2025
fa411b8
update: expand general classifier classes
gitronald Apr 28, 2025
2f9bb28
update: extract from top bar for 2025 serps
gitronald Apr 28, 2025
5373a85
update: expand images sub cmpt class list and title/url parsing
gitronald Apr 28, 2025
036e67a
update: reduce doc strings
gitronald Apr 28, 2025
205bba5
version: 0.6.5.dev3
gitronald Apr 28, 2025
e23e70b
update: more restrictive discussions classifier
gitronald Apr 29, 2025
41cfba2
update: expand classes for video cmpt extraction
gitronald Apr 29, 2025
2d01701
fix: no empty whitespace in filter_empty_divs func
gitronald May 8, 2025
9d66539
update: more knowledge panel identifiers
gitronald May 8, 2025
85f5766
fix: count sub ranks for standard ads
gitronald May 8, 2025
ac79df0
update: result types dictionaries
gitronald May 8, 2025
a737aaf
move: extractors to dir
gitronald May 8, 2025
b6be243
rename: extractors code
gitronald May 8, 2025
52c79f6
add: breakout extractor functions into files by section
gitronald May 8, 2025
a3b7c00
version: 0.6.5.dev4
gitronald May 8, 2025
1a32ee0
add: recent_posts variant of top_stories
gitronald May 9, 2025
f775eac
update: remove duplicate log
gitronald May 9, 2025
e0edd4e
version: 0.6.5.dev5
gitronald May 9, 2025
bd02fa8
fix: missing comma
gitronald May 9, 2025
71e1a55
update: main column extractors
gitronald May 9, 2025
9955b7c
update: bump h11 per dependabot
gitronald May 9, 2025
951a5da
fix: handle serps with no rcnt div
gitronald May 15, 2025
22f639b
update: stricter news_quotes classification, more knowledge classifie…
gitronald May 15, 2025
ebcca84
fix: stricter parsing for songs id div
gitronald May 17, 2025
cc9a938
build(deps-dev): bump tornado from 6.4.2 to 6.5.1
dependabot[bot] May 23, 2025
8daaec7
build(deps): bump requests from 2.32.3 to 2.32.4
dependabot[bot] Jun 10, 2025
a4a4a32
build(deps): bump protobuf from 6.30.0 to 6.31.1
dependabot[bot] Jun 17, 2025
a05270e
build(deps): bump urllib3 from 2.3.0 to 2.5.0
dependabot[bot] Jun 19, 2025
0209fdb
version: 0.6.5a0
gitronald Oct 14, 2025
b7cc700
refactor: convert Footer methods to staticmethod
gitronald Oct 14, 2025
e66515c
fix: update demo-search entry point to use typer app
gitronald Oct 14, 2025
bb938d5
update: version in __init__.py to match pyproject.toml
gitronald Oct 14, 2025
cc33dea
update: default Chrome version to 141
gitronald Oct 14, 2025
eaa4314
merge: dependabot PR #78 (h11 0.14.0 → 0.16.0)
gitronald Oct 14, 2025
db590a5
merge: dependabot PR #79 (tornado 6.4.2 → 6.5.1)
gitronald Oct 14, 2025
8b7e6a6
merge: dependabot PR #80 (requests 2.32.3 → 2.32.4)
gitronald Oct 14, 2025
5d28346
merge: dependabot PR #81 (protobuf 6.30.0 → 6.31.1)
gitronald Oct 14, 2025
aedc28a
merge: dependabot PR #82 (urllib3 2.3.0 → 2.5.0)
gitronald Oct 14, 2025
f83ef9f
merge: dependabot updates (h11, tornado, requests, protobuf, urllib3)
gitronald Oct 14, 2025
1cb9ae5
update: bump requests to 2.32.4 and protobuf to 6.31.1
gitronald Oct 14, 2025
3cb1093
update: regenerate poetry.lock
gitronald Oct 14, 2025
eb3cec9
update: github actions readme section
gitronald Dec 5, 2025
a864e09
version: 0.6.5
gitronald Dec 5, 2025
21 changes: 19 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
@@ -47,6 +47,7 @@ Below are some details about recent updates. For a longer list, see the [Update
- [Repair or Enhance a Parser](#repair-or-enhance-a-parser)
- [Add a Parser](#add-a-parser)
- [Testing](#testing)
- [GitHub Actions](#github-actions)
- [Update Log](#update-log)
- [Similar Packages](#similar-packages)
- [License](#license)
@@ -119,7 +120,7 @@ drwxr-xr-x 2 user user 4.0K 2024-11-11 10:55 html/

### Step by Step

Example search and parse pipeline:
Example search and parse pipeline (via requests):

```python
import WebSearcher as ws
@@ -143,7 +144,7 @@ se = ws.SearchEngine(
"headless": False,
"use_subprocess": False,
"driver_executable_path": "",
"version_main": 133,
"version_main": 141,
}
)
```
@@ -253,6 +254,22 @@ With the `-k` flag you can run a test for a specific html file:
pytest -k "1684837514.html"
```

---
## GitHub Actions

This repository uses GitHub Actions for automated publishing:

**Release Workflow** (`.github/workflows/publish.yml`)
Automatically publishes to PyPI when a pull request is merged into `master`. The workflow:
- Triggers on merged PRs to `master`
- Builds the package using Poetry
- Publishes to PyPI using trusted publishing (no API tokens required)

To release a new version:
1. Update the version in `pyproject.toml`
2. Create a PR to `master`
3. Once merged, the package is automatically published to PyPI
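
A merge-triggered trusted-publishing workflow along these lines could be sketched as follows. This is a hypothetical reconstruction from the description above, not the actual contents of `.github/workflows/publish.yml`; action versions and step names are assumptions:

```yaml
# Hypothetical sketch of a publish workflow; the real file may differ.
name: Publish to PyPI
on:
  pull_request:
    types: [closed]
    branches: [master]
jobs:
  publish:
    # Run only when the PR was actually merged, not just closed
    if: github.event.pull_request.merged == true
    runs-on: ubuntu-latest
    permissions:
      id-token: write   # required for PyPI trusted publishing (no API token)
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.10"
      - run: pipx install poetry
      - run: poetry build
      - uses: pypa/gh-action-pypi-publish@release/v1
```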

---
## Update Log

2 changes: 1 addition & 1 deletion WebSearcher/__init__.py
@@ -1,4 +1,4 @@
__version__ = "0.6.4"
__version__ = "0.6.5"
from .searchers import SearchEngine
from .parsers import parse_serp, FeatureExtractor
from .extractors import Extractor
4 changes: 3 additions & 1 deletion WebSearcher/classifiers/header_text.py
@@ -90,7 +90,8 @@ def _get_header_level_mapping(level) -> dict:
"local_results": [
"Local Results",
"Locations",
"Places", "Sitios"
"Places",
"Sitios",
"Businesses",
"locations",
],
@@ -116,6 +117,7 @@ def _get_header_level_mapping(level) -> dict:
"News",
"Noticias",
"Market news"],
"recent_posts": ["Recent posts"],
"twitter": ["Twitter Results"],
"videos": ["Videos"]
}
22 changes: 15 additions & 7 deletions WebSearcher/classifiers/main.py
@@ -1,9 +1,9 @@
import bs4
from .. import logger
log = logger.Logger().start(__name__)

from .header_text import ClassifyHeaderText
from .. import webutils
import bs4

class ClassifyMain:
"""Classify a component from the main section based on its bs4.element.Tag """
@@ -14,6 +14,7 @@ def classify(cmpt: bs4.element.Tag) -> str:
# Ordered list of classifiers to try
component_classifiers = [
ClassifyMain.top_stories, # Check top stories
ClassifyMain.discussions_and_forums, # Check discussions and forums
ClassifyHeaderText.classify, # Check levels 2 & 3 header text
ClassifyMain.news_quotes, # Check news quotes
ClassifyMain.img_cards, # Check image cards
@@ -40,6 +41,12 @@ def classify(cmpt: bs4.element.Tag) -> str:

return cmpt_type

@staticmethod
def discussions_and_forums(cmpt: bs4.element.Tag) -> str:
conditions = [
cmpt.find("div", {"class": "IFnjPb", "role": "heading"}),
]
return 'discussions_and_forums' if all(conditions) else "unknown"

@staticmethod
def available_on(cmpt: bs4.element.Tag) -> str:
@@ -68,7 +75,7 @@ def general(cmpt: bs4.element.Tag) -> str:
"format-01": cmpt.attrs["class"] == ["g"],
"format-02": ( ("g" in cmpt.attrs["class"]) &
any(s in ["Ww4FFb"] for s in cmpt.attrs["class"]) ),
"format-03": any(s in ["hlcw0c", "MjjYud"] for s in cmpt.attrs["class"]),
"format-03": any(s in ["hlcw0c", "MjjYud", "PmEWq"] for s in cmpt.attrs["class"]),
"format-04": cmpt.find('div', {'class': ['g', 'Ww4FFb']}),
}
else:
@@ -143,7 +150,9 @@ def knowledge_panel(cmpt: bs4.element.Tag) -> str:
cmpt.find("h1", {"class": "VW3apb"}),
cmpt.find("div", {"class": ["knowledge-panel", "knavi", "kp-blk", "kp-wholepage-osrp"]}),
cmpt.find("div", {"aria-label": "Featured results", "role": "complementary"}),
webutils.check_dict_value(cmpt.attrs, "jscontroller", "qTdDb")
cmpt.find("div", {"jscontroller": "qTdDb"}),
webutils.check_dict_value(cmpt.attrs, "jscontroller", "qTdDb"),
cmpt.find('div', {'class':'obcontainer'})
]
return 'knowledge' if any(conditions) else "unknown"

@@ -179,10 +188,9 @@ def top_stories(cmpt: bs4.element.Tag) -> str:
@staticmethod
def news_quotes(cmpt: bs4.element.Tag) -> str:
"""Classify top stories components"""
conditions = [
cmpt.find("g-tray-header", role="heading"),
]
return 'news_quotes' if all(conditions) else "unknown"
header_div = cmpt.find("g-tray-header", role="heading")
condition = webutils.get_text(header_div, strip=True) == "News quotes"
return 'news_quotes' if condition else "unknown"

@staticmethod
def twitter(cmpt: bs4.element.Tag) -> str:
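
The classifier changes in this file can be exercised standalone. Below is a simplified sketch using bs4: the `IFnjPb` heading check and the exact-match "News quotes" condition come straight from the diff, but the function names and the toy HTML are illustrative only — real SERP components carry far more markup:

```python
import bs4

def classify_discussions_and_forums(cmpt: bs4.element.Tag) -> str:
    # Component is discussions-and-forums if it contains a heading div
    # with the IFnjPb class (per the classifier added in this PR).
    found = cmpt.find("div", {"class": "IFnjPb", "role": "heading"})
    return "discussions_and_forums" if found else "unknown"

def classify_news_quotes(cmpt: bs4.element.Tag) -> str:
    # Stricter check from the diff: the g-tray-header text must read
    # exactly "News quotes", not merely exist.
    header = cmpt.find("g-tray-header", role="heading")
    text = header.get_text(strip=True) if header else ""
    return "news_quotes" if text == "News quotes" else "unknown"

html = '<div><div class="IFnjPb" role="heading">Discussions and forums</div></div>'
cmpt = bs4.BeautifulSoup(html, "html.parser").div
print(classify_discussions_and_forums(cmpt))  # discussions_and_forums
```

Ordering matters in the real `ClassifyMain.classify`: the new check runs before the header-text classifier, so a matching component is labeled before more generic rules fire.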
2 changes: 2 additions & 0 deletions WebSearcher/component_parsers/__init__.py
@@ -15,6 +15,7 @@
from .latest_from import parse_latest_from
from .local_news import parse_local_news
from .perspectives import parse_perspectives
from .recent_posts import parse_recent_posts

from .local_results import parse_local_results
from .map_results import parse_map_results
@@ -57,6 +58,7 @@
('news_quotes', parse_news_quotes, 'News Quotes'),
('people_also_ask', parse_people_also_ask, 'People Also Ask'),
('perspectives', parse_perspectives, 'Perspectives & Opinions'),
('recent_posts', parse_recent_posts, 'Recent Posts'),
('scholarly_articles', parse_scholarly_articles, 'Scholar Articles'),
('searches_related', parse_searches_related, 'Related Searches'),
('shopping_ads', parse_shopping_ads, 'Shopping Ad'),
105 changes: 85 additions & 20 deletions WebSearcher/component_parsers/ads.py
@@ -6,13 +6,24 @@
- added new div class for text field
- added labels (e.g., "Provides abortions") from <span class="mXsQRe">, appended to text field

2025-04-27: added carousel sub_type, global parsed output

"""

from .. import webutils
from .shopping_ads import parse_shopping_ads
import bs4

PARSED = {
'type': 'ad',
'sub_type': '',
'sub_rank': 0,
'title': '',
'url': '',
'cite': '',
'text': '',
}

def parse_ads(cmpt: bs4.element.Tag) -> list:
"""Parse ads from ad component"""

@@ -27,12 +38,14 @@ def parse_ads(cmpt: bs4.element.Tag) -> list:
parsed_list = [parse_ad_secondary(sub, sub_rank) for sub_rank, sub in enumerate(subs)]
elif sub_type == 'standard':
subs = webutils.find_all_divs(cmpt, 'div', {'class': ['uEierd', 'commercial-unit-desktop-top']})
for sub in subs:
for sub_rank, sub in enumerate(subs):
sub_classes = sub.attrs.get("class", [])
if "commercial-unit-desktop-top" in sub_classes:
parsed_list.extend(parse_shopping_ads(sub))
elif "uEierd" in sub_classes:
parsed_list.append(parse_ad(sub))
parsed_list.append(parse_ad(sub, sub_rank=sub_rank))
elif sub_type == 'carousel':
parsed_list = parse_ad_carousel(cmpt, sub_type)
return parsed_list


@@ -41,20 +54,71 @@ def classify_ad_type(cmpt: bs4.element.Tag) -> str:
label_divs = {
"legacy": webutils.find_all_divs(cmpt, 'div', {'class': 'ad_cclk'}),
"secondary": webutils.find_all_divs(cmpt, 'div', {'class': 'd5oMvf'}),
"standard": webutils.find_all_divs(cmpt, 'div', {'class': ['uEierd', 'commercial-unit-desktop-top']})
"standard": webutils.find_all_divs(cmpt, 'div', {'class': ['uEierd', 'commercial-unit-desktop-top']}),
"carousel": webutils.find_all_divs(cmpt, 'g-scrolling-carousel'),
}
for label, divs in label_divs.items():
if divs:
return label
return 'unknown'


def parse_ad_carousel(cmpt: bs4.element.Tag, sub_type: str, filter_visible: bool = True) -> list:

def parse_ad_carousel_div(sub: bs4.element.Tag, sub_type: str, sub_rank: int) -> dict:
"""Parse ad carousel div, seen 2025-02-06"""
parsed = PARSED.copy()
parsed['sub_type'] = sub_type
parsed['sub_rank'] = sub_rank
parsed['title'] = webutils.get_text(sub, 'div', {'class':'e7SMre'})
parsed['url'] = webutils.get_link(sub)
parsed['text'] = webutils.get_text(sub, 'div', {"class":"vrAZpb"})
parsed['cite'] = webutils.get_text(sub, 'div', {"class":"zpIwr"})
parsed['visible'] = not (sub.has_attr('data-has-shown') and sub['data-has-shown'] == 'false')
return parsed

def parse_ad_carousel_card(sub: bs4.element.Tag, sub_type: str, sub_rank: int) -> dict:
"""Parse ad carousel card, seen 2024-09-21"""
parsed = PARSED.copy()
parsed['sub_type'] = sub_type
parsed['sub_rank'] = sub_rank
parsed['title'] = webutils.get_text(sub, 'div', {'class':'gCv54b'})
parsed['url'] = webutils.get_link(sub, {"class": "KTsHxd"})
parsed['text'] = webutils.get_text(sub, 'div', {"class":"VHpBje"})
parsed['cite'] = webutils.get_text(sub, 'div', {"class":"j958Pd"})
parsed['visible'] = not (sub.has_attr('data-viewurl') and sub['data-viewurl'])
return parsed

ad_carousel_parsers = [
{'find_kwargs': {'name': 'g-inner-card'},
'parser': parse_ad_carousel_card},
{'find_kwargs': {'name': 'div', 'attrs': {'class': 'ZPze1e'}},
'parser': parse_ad_carousel_div}
]

output_list = []
ad_carousel = cmpt.find('g-scrolling-carousel')
if ad_carousel:
for parser_details in ad_carousel_parsers:
parser_func = parser_details['parser']
kwargs = parser_details['find_kwargs']
sub_cmpts = webutils.find_all_divs(ad_carousel, **kwargs)
if sub_cmpts:
for sub_rank, sub in enumerate(sub_cmpts):
parsed = parser_func(sub, sub_type, sub_rank)
output_list.append(parsed)

if filter_visible:
output_list = [{k:v for k,v in x.items() if k != 'visible'} for x in output_list if x['visible']]
return output_list


def parse_ad(sub: bs4.element.Tag, sub_rank: int = 0) -> dict:
"""Parse details of a single ad subcomponent, similar to general"""
parsed = {"type": "ad",
"sub_type": "standard",
"sub_rank": sub_rank}
parsed = PARSED.copy()
parsed["sub_type"] = "standard"
parsed["sub_rank"] = sub_rank

parsed['title'] = webutils.get_text(sub, 'div', {'role':'heading'})
parsed['url'] = webutils.get_link(sub, {"class":"sVXRqc"})
parsed['cite'] = webutils.get_text(sub, 'span', {"role":"text"})
@@ -96,13 +160,14 @@ def parse_ad_menu(sub: bs4.element.Tag) -> list:

def parse_ad_secondary(sub: bs4.element.Tag, sub_rank: int = 0) -> dict:
"""Parse details of a single ad subcomponent, similar to general"""
parsed = PARSED.copy()
parsed["sub_type"] = "secondary"
parsed["sub_rank"] = sub_rank

parsed = {"type": "ad",
"sub_type": "secondary",
"sub_rank": sub_rank}
parsed['title'] = sub.find('div', {'role':'heading'}).text
parsed['url'] = sub.find('div', {'class':'d5oMvf'}).find('a')['href']
parsed['cite'] = sub.find('span', {'class':'gBIQub'}).text
parsed['title'] = webutils.get_text(sub, 'div', {'role':'heading'})
link_div = sub.find('div', {'class':'d5oMvf'})
parsed['url'] = webutils.get_link(link_div) if link_div else ''
parsed['cite'] = webutils.get_text(sub, 'span', {'class':'gBIQub'})

# Take the top div with this class, should be main result abstract
text_divs = sub.find_all('div', {'class':'yDYNvb'})
@@ -123,14 +188,14 @@

def parse_ad_legacy(sub: bs4.element.Tag, sub_rank: int = 0) -> dict:
"""[legacy] Parse details of a single ad subcomponent, similar to general"""

parsed = {"type": "ad",
"sub_type": "legacy",
"sub_rank": sub_rank}
parsed = PARSED.copy()
parsed["sub_type"] = "legacy"
parsed["sub_rank"] = sub_rank

header = sub.find('div', {'class':'ad_cclk'})
parsed['title'] = header.find('h3').text
parsed['url'] = header.find('cite').text
parsed['text'] = sub.find('div', {'class':'ads-creative'}).text
parsed['title'] = webutils.get_text(header, 'h3')
parsed['url'] = webutils.get_text(header, 'cite')
parsed['text'] = webutils.get_text(sub, 'div', {'class':'ads-creative'})

bottom_text = sub.find('ul')
if bottom_text:
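
The new `parse_ad_carousel` dispatches over an ordered list of `(find_kwargs, parser)` pairs, one per carousel card format. A simplified standalone sketch of that dispatch pattern — class and tag names are taken from the diff, but the parsers here extract only a title, and the helper names are illustrative:

```python
import bs4

def parse_inner_card(sub, sub_rank):
    # Simplified stand-in for parse_ad_carousel_card (format seen 2024-09-21)
    return {"sub_type": "carousel", "sub_rank": sub_rank,
            "title": sub.get_text(strip=True)}

def parse_carousel_div(sub, sub_rank):
    # Simplified stand-in for parse_ad_carousel_div (format seen 2025-02-06)
    return {"sub_type": "carousel", "sub_rank": sub_rank,
            "title": sub.get_text(strip=True)}

CAROUSEL_PARSERS = [
    {"find_kwargs": {"name": "g-inner-card"}, "parser": parse_inner_card},
    {"find_kwargs": {"name": "div", "attrs": {"class": "ZPze1e"}},
     "parser": parse_carousel_div},
]

def parse_carousel(cmpt: bs4.element.Tag) -> list:
    carousel = cmpt.find("g-scrolling-carousel")
    if not carousel:
        return []
    parsed = []
    # Try each card format; enumerate supplies the sub_rank ordering
    for details in CAROUSEL_PARSERS:
        subs = carousel.find_all(**details["find_kwargs"])
        for sub_rank, sub in enumerate(subs):
            parsed.append(details["parser"](sub, sub_rank))
    return parsed

html = ('<div><g-scrolling-carousel>'
        '<g-inner-card>Ad one</g-inner-card>'
        '<g-inner-card>Ad two</g-inner-card>'
        '</g-scrolling-carousel></div>')
soup = bs4.BeautifulSoup(html, "html.parser")
print(parse_carousel(soup.div))  # two dicts, sub_rank 0 and 1
```

The real function additionally carries a `visible` flag per card and, with `filter_visible=True`, drops hidden cards and strips the flag before returning.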
18 changes: 9 additions & 9 deletions WebSearcher/component_parsers/footer.py
@@ -2,31 +2,31 @@

class Footer:

@classmethod
def parse_image_cards(self, elem) -> list:
@staticmethod
def parse_image_cards(elem) -> list:
subs = webutils.find_all_divs(elem, 'div', {'class':'g'})
return [self.parse_image_card(sub, sub_rank) for sub_rank, sub in enumerate(subs)]
return [Footer.parse_image_card(sub, sub_rank) for sub_rank, sub in enumerate(subs)]

@classmethod
def parse_image_card(self, sub, sub_rank=0) -> dict:
@staticmethod
def parse_image_card(sub, sub_rank=0) -> dict:
parsed = {'type':'img_cards', 'sub_rank':sub_rank}
parsed['title'] = webutils.get_text(sub, "div", {'aria-level':"3", "role":"heading"})
images = sub.find_all('img')
if images:
parsed['details'] = [{'text':i['alt'], 'url':i['src']} for i in images]
return parsed

@classmethod
def parse_discover_more(self, elem) -> list:
@staticmethod
def parse_discover_more(elem) -> list:
carousel = elem.find('g-scrolling-carousel')
return [{
'type':'discover_more',
'sub_rank':0,
'text': '|'.join(c.text for c in carousel.find_all('g-inner-card'))
}]

@classmethod
def parse_omitted_notice(self, elem) -> list:
@staticmethod
def parse_omitted_notice(elem) -> list:
return [{
'type':'omitted_notice',
'sub_rank':0,
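
The `Footer` refactor above converts `@classmethod` declarations (whose first parameter was misleadingly named `self`) to `@staticmethod`, since none of the methods use class state. A minimal illustration of why the two are interchangeable here — class and method names below are invented for the example:

```python
class Before:
    @classmethod
    def parse(self, items):
        # Declared @classmethod, so `self` is actually the class object;
        # it is only used to reach a sibling method.
        return [self.parse_one(i) for i in items]

    @classmethod
    def parse_one(self, item):
        return {"text": item}

class After:
    @staticmethod
    def parse(items):
        # No implicit first argument; sibling methods are referenced
        # explicitly by class name, as the refactored Footer does.
        return [After.parse_one(i) for i in items]

    @staticmethod
    def parse_one(item):
        return {"text": item}

print(Before.parse(["a"]) == After.parse(["a"]))  # True
```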