Hello, I followed the instructions and ran bash use_meta_100h.sh, which successfully downloaded the .parquet and .pkl files. After that, however, I got errors like these:
2025-07-30 10:57:14.167 | INFO | main:load_urls_from_pickle:20 - Loaded 291 podcast URLs from /apdcephfs_zwfy/share_303841515/Tealab/data/balalaika/Balalaika100H.pkl
2025-07-30 10:57:14.168 | INFO | main:main:58 - Using 4 workers
2025-07-30 10:57:14.168 | INFO | main:main:62 - Downloading episodes {'104214794', '104214790', '104214792', '104214791', '104214795', '104214796', '104214793'} for podcast ID: 22391094
2025-07-30 10:57:15.424 | ERROR | main:main:73 - Error downloading podcast 22391094: {'name': 'Unavailable For Legal Reasons', 'message': ''} (451): b'{"invocationInfo":{"req-id":"1753844235241991-5329902998396024136","hostname":"music-web-mobile-production-music-86.klg.yp-c.yandex.net"},"error":{"name":"Unavailable For Legal Reasons","message":""}}'
2025-07-30 10:57:15.424 | INFO | main:main:62 - Downloading episodes {'139641820', '140051990', '138746583', '138348599', '139053287', '138548286', '139378281', '137902073'} for podcast ID: 33111708
2025-07-30 10:57:17.889 | ERROR | main:main:73 - Error downloading podcast 33111708: {'name': 'Unavailable For Legal Reasons', 'message': ''} (451): b'{"invocationInfo":{"req-id":"1753844237727890-3319444633801507517","hostname":"music-web-mobile-production-music-25.klg.yp-c.yandex.net"},"error":{"name":"Unavailable For Legal Reasons","message":""}}'
2025-07-30 10:57:17.889 | INFO | main:main:62 - Downloading episodes {'105542303', '105542300', '105542301', '105542304', '105542306', '105542305', '105542299', '105542302'} for podcast ID: 22870289
2025-07-30 10:57:19.140 | ERROR | main:main:73 - Error downloading podcast 22870289: {'name': 'Unavailable For Legal Reasons', 'message': ''} (451): b'{"invocationInfo":{"req-id":"1753844238998310-12902771748044133683","hostname":"music-web-mobile-production-music-19.sas.yp-c.yandex.net"},"error":{"name":"Unavailable For Legal Reasons","message":""}}'
2025-07-30 10:57:19.140 | INFO | main:main:62 - Downloading episodes {'111319332', '111319336', '111319335', '111319330', '111319334', '111319331', '111319333', '111319337'} for podcast ID: 24857516
2025-07-30 10:57:20.614 | ERROR | main:main:73 - Error downloading podcast 24857516: {'name': 'Unavailable For Legal Reasons', 'message': ''} (451): b'{"invocationInfo":{"req-id":"1753844240457698-7863424378030152510","hostname":"music-web-mobile-production-music-80.klg.yp-c.yandex.net"},"error":{"name":"Unavailable For Legal Reasons","message":""}}'
2025-07-30 10:57:20.615 | INFO | main:main:62 - Downloading episodes {'139512432', '139206470', '139936598', '140101626', '139719583', '139369304', '140273861', '138802302'} for podcast ID: 27798647
I checked src/download/download_prepared.py and found this loop in main():
for podcast_id, episode_ids in podcast_episode_map.items():
    dummy_url = f"https://music.yandex.ru/album/{podcast_id}"
    logger.info(f"Downloading episodes {episode_ids} for podcast ID: {podcast_id}")
    try:
        result = download_podcast(
            client=client,
            url=dummy_url,
            podcasts_path=podcasts_path,
            num_workers=num_workers,
            episode_ids=episode_ids
        )
        logger.info(result)
    except Exception as e:
        logger.error(f"Error downloading podcast {podcast_id}: {e}")
I also can't load the website at https://music.yandex.ru/album/105542303, nor the pages for the other podcast IDs.
Are you able to download this dataset with the scripts, or open this website? If not, could you please provide another way to download the dataset?
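In case it helps with triage, here is a minimal sketch of how the failures in a log like the one above could be grouped by HTTP status code. It assumes the exact "Error downloading podcast <id>: ... (<status>): " line shape shown in the log; failed_by_status is a hypothetical helper written for this post, not part of the repository, and the sample line has its response body abbreviated.

```python
import re
from collections import defaultdict

# Assumed line shape, copied from the log excerpt in this post:
#   "... | ERROR | main:main:73 - Error downloading podcast <id>: {...} (<status>): b'...'"
ERROR_RE = re.compile(r"Error downloading podcast (\d+): .*? \((\d{3})\): ")


def failed_by_status(log_lines):
    """Map each HTTP status code to the podcast IDs that failed with it.

    Hypothetical helper for illustration; not part of the repository.
    """
    failures = defaultdict(list)
    for line in log_lines:
        match = ERROR_RE.search(line)
        if match:
            podcast_id, status = match.groups()
            failures[int(status)].append(podcast_id)
    return dict(failures)


# Two lines in the shape of the log above (response body abbreviated).
sample = [
    "2025-07-30 10:57:15.424 | ERROR | main:main:73 - Error downloading podcast 22391094: "
    "{'name': 'Unavailable For Legal Reasons', 'message': ''} (451): b'...'",
    "2025-07-30 10:57:15.424 | INFO | main:main:62 - Downloading episodes {'139641820'} "
    "for podcast ID: 33111708",
]

print(failed_by_status(sample))
```

Run over the full log, this would show whether every podcast fails with 451 or only some of them, which may help tell a blanket restriction apart from a per-podcast one.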