Cannot download the dataset #6

@FDlalala

Hello, I followed the instructions and ran bash use_meta_100h.sh, which successfully downloaded the .parquet and .pkl files. After that, I got errors like the following:

2025-07-30 10:57:14.167 | INFO | main:load_urls_from_pickle:20 - Loaded 291 podcast URLs from /apdcephfs_zwfy/share_303841515/Tealab/data/balalaika/Balalaika100H.pkl
2025-07-30 10:57:14.168 | INFO | main:main:58 - Using 4 workers
2025-07-30 10:57:14.168 | INFO | main:main:62 - Downloading episodes {'104214794', '104214790', '104214792', '104214791', '104214795', '104214796', '104214793'} for podcast ID: 22391094
2025-07-30 10:57:15.424 | ERROR | main:main:73 - Error downloading podcast 22391094: {'name': 'Unavailable For Legal Reasons', 'message': ''} (451): b'{"invocationInfo":{"req-id":"1753844235241991-5329902998396024136","hostname":"music-web-mobile-production-music-86.klg.yp-c.yandex.net"},"error":{"name":"Unavailable For Legal Reasons","message":""}}'
2025-07-30 10:57:15.424 | INFO | main:main:62 - Downloading episodes {'139641820', '140051990', '138746583', '138348599', '139053287', '138548286', '139378281', '137902073'} for podcast ID: 33111708
2025-07-30 10:57:17.889 | ERROR | main:main:73 - Error downloading podcast 33111708: {'name': 'Unavailable For Legal Reasons', 'message': ''} (451): b'{"invocationInfo":{"req-id":"1753844237727890-3319444633801507517","hostname":"music-web-mobile-production-music-25.klg.yp-c.yandex.net"},"error":{"name":"Unavailable For Legal Reasons","message":""}}'
2025-07-30 10:57:17.889 | INFO | main:main:62 - Downloading episodes {'105542303', '105542300', '105542301', '105542304', '105542306', '105542305', '105542299', '105542302'} for podcast ID: 22870289
2025-07-30 10:57:19.140 | ERROR | main:main:73 - Error downloading podcast 22870289: {'name': 'Unavailable For Legal Reasons', 'message': ''} (451): b'{"invocationInfo":{"req-id":"1753844238998310-12902771748044133683","hostname":"music-web-mobile-production-music-19.sas.yp-c.yandex.net"},"error":{"name":"Unavailable For Legal Reasons","message":""}}'
2025-07-30 10:57:19.140 | INFO | main:main:62 - Downloading episodes {'111319332', '111319336', '111319335', '111319330', '111319334', '111319331', '111319333', '111319337'} for podcast ID: 24857516
2025-07-30 10:57:20.614 | ERROR | main:main:73 - Error downloading podcast 24857516: {'name': 'Unavailable For Legal Reasons', 'message': ''} (451): b'{"invocationInfo":{"req-id":"1753844240457698-7863424378030152510","hostname":"music-web-mobile-production-music-80.klg.yp-c.yandex.net"},"error":{"name":"Unavailable For Legal Reasons","message":""}}'
2025-07-30 10:57:20.615 | INFO | main:main:62 - Downloading episodes {'139512432', '139206470', '139936598', '140101626', '139719583', '139369304', '140273861', '138802302'} for podcast ID: 27798647
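
Every error in the log above carries HTTP status 451. To see how many of the 291 podcasts fail this way over a full run, the log could be summarized with a small script. This is only a minimal sketch: the regular expression is inferred from the error lines shown here, and `summarize_errors` is a hypothetical helper, not something from the repository.

```python
import re
from collections import defaultdict

# Pattern inferred from the error lines in this issue:
#   "... Error downloading podcast <id>: {...} (<status>): b'...'"
ERROR_RE = re.compile(r"Error downloading podcast (\d+): .*? \((\d{3})\):")


def summarize_errors(log_lines):
    """Group failed podcast IDs by the HTTP status code found in each error line."""
    failures = defaultdict(list)
    for line in log_lines:
        match = ERROR_RE.search(line)
        if match:
            podcast_id, status = match.groups()
            failures[int(status)].append(podcast_id)
    return dict(failures)


if __name__ == "__main__":
    # Two error lines from the log above, with the response body abbreviated.
    sample = [
        "2025-07-30 10:57:15.424 | ERROR | main:main:73 - Error downloading podcast 22391094: "
        "{'name': 'Unavailable For Legal Reasons', 'message': ''} (451): b'{...}'",
        "2025-07-30 10:57:17.889 | ERROR | main:main:73 - Error downloading podcast 33111708: "
        "{'name': 'Unavailable For Legal Reasons', 'message': ''} (451): b'{...}'",
    ]
    for status, podcast_ids in summarize_errors(sample).items():
        print(status, len(podcast_ids), podcast_ids)
```

If every failing podcast lands under 451, the problem is unlikely to be specific to individual episodes.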

I checked src/download/download_prepared.py and found the following in main():
for podcast_id, episode_ids in podcast_episode_map.items():
    dummy_url = f"https://music.yandex.ru/album/{podcast_id}"
    logger.info(f"Downloading episodes {episode_ids} for podcast ID: {podcast_id}")
    try:
        result = download_podcast(
            client=client,
            url=dummy_url,
            podcasts_path=podcasts_path,
            num_workers=num_workers,
            episode_ids=episode_ids
        )
        logger.info(result)
    except Exception as e:
        logger.error(f"Error downloading podcast {podcast_id}: {e}")

I also can't load the page at https://music.yandex.ru/album/105542303, nor the pages for any of the other podcast IDs.
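
HTTP 451 means the server is refusing to serve the content for legal reasons, which often depends on where the request comes from, so it may help to check which status comes back outside the download script. The sketch below probes a URL with the Python standard library and returns the status code. `probe_status` and its `opener` parameter are hypothetical names introduced here, and the web page may not respond the same way as the API endpoint that the script calls.

```python
import urllib.error
import urllib.request


def probe_status(url, opener=urllib.request.urlopen, timeout=10):
    """Return the HTTP status code for `url`, or None if no HTTP response arrived."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses, including 451, are raised as HTTPError.
        return err.code
    except OSError:
        # Covers URLError (DNS failure, refused connection) and timeouts.
        return None


# Example (requires network access), using a podcast ID from the log above:
#   probe_status("https://music.yandex.ru/album/22391094")
```

Running the same probe from a different network or region would show whether the 451 is tied to the client's location.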

Can you download this dataset with the scripts, or open this website? If not, could you please provide another way to download the dataset?
