The parler parser parses Parler HTML posts and user profiles. Parler post dumps can be found here.
Refer to here
To parse a directory of post files:

    import glob

    from parler.parser.postParser import PostParser
    from parler.dataType.post import Post

    # Collect every file in the posts/ directory.
    files = glob.glob('posts/*')

    data = []
    for file in files:
        post = PostParser(file).parse()
        # Skip files for which the parser returns None.
        if post is not None:
            data.append(post.convert())

    print(data)

To parse a profile page together with the posts it contains:

    from parler.parser.profilePageParser import ProfilePageParser

    file = r".\profile\00KimPossible00\posts\index.html"
    timestamp = 20201124075219

    profilePage = ProfilePageParser(file, timestamp)
    user, posts = profilePage.parse()

    # Print the user first, then each post separated by a blank line.
    print(user.convert())
    print()
    for post in posts:
        print(post.convert())
        print()

You should get the same results as shown in sample_output.
- Determine what type of post we are dealing with:
  - New Post
  - Echoed Post
  - Echoed Post with Reply
  - Echoed Post with Root Echo and No Reply
  - Echoed Post with Root Echo and Reply
- If `new post`, parse the only post as `main post`; else parse the `reply` post as `main post`.
- If not `new post`, parse the `echoed post`.
- If `echoed post` or `echoed post with root echo and no reply`:
  - Use the "Echoed by ... " line to fill out the `main` post info with the `user` and `created_at`.
  - Grab `username` from the meta information stored in the header.
  - No profile badge can be found in the post this way.
  - The `comment_count`, `echo_count`, and `upvote_count` belong to the echoed post.
- Else:
  - The `comment_count`, `echo_count`, and `upvote_count` belong to the `main` post.
- If `Echoed Post with Root Echo and No Reply` or `Echoed Post with Root Echo and Reply`:
  - Parse the `first` post for the `root echo`.
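
The steps above can be sketched as a small decision table. This is only an illustration of the branching, not the parser's actual code: `PostType`, `ParsePlan`, and `plan_for` are hypothetical names that do not come from the parler package, and the sketch assumes the post type has already been determined. It follows the steps literally, one field per step.

```python
from dataclasses import dataclass
from enum import Enum, auto


class PostType(Enum):
    """Hypothetical labels for the five post types listed above."""
    NEW_POST = auto()
    ECHOED_POST = auto()
    ECHOED_POST_WITH_REPLY = auto()
    ECHOED_POST_WITH_ROOT_ECHO_AND_NO_REPLY = auto()
    ECHOED_POST_WITH_ROOT_ECHO_AND_REPLY = auto()


@dataclass(frozen=True)
class ParsePlan:
    """Hypothetical summary of which steps apply to one post type."""
    main_post_source: str                # "only post" or "reply post"
    parse_echoed_post: bool              # an echoed post must be parsed
    fill_main_from_echoed_by_line: bool  # user/created_at come from "Echoed by ..."
    counts_belong_to: str                # "echoed post" or "main post"
    parse_root_echo: bool                # the first post is the root echo


# Post types whose main post info is filled from the "Echoed by ..." line.
_ECHO_WITHOUT_REPLY = frozenset({
    PostType.ECHOED_POST,
    PostType.ECHOED_POST_WITH_ROOT_ECHO_AND_NO_REPLY,
})

# Post types that carry a root echo.
_HAS_ROOT_ECHO = frozenset({
    PostType.ECHOED_POST_WITH_ROOT_ECHO_AND_NO_REPLY,
    PostType.ECHOED_POST_WITH_ROOT_ECHO_AND_REPLY,
})


def plan_for(post_type: PostType) -> ParsePlan:
    """Map a post type to the parsing steps described in the list above."""
    is_new = post_type is PostType.NEW_POST
    echo_without_reply = post_type in _ECHO_WITHOUT_REPLY
    return ParsePlan(
        main_post_source="only post" if is_new else "reply post",
        parse_echoed_post=not is_new,
        fill_main_from_echoed_by_line=echo_without_reply,
        counts_belong_to="echoed post" if echo_without_reply else "main post",
        parse_root_echo=post_type in _HAS_ROOT_ECHO,
    )


if __name__ == "__main__":
    # Print the plan for every post type.
    for post_type in PostType:
        print(post_type.name, plan_for(post_type))
```

For example, `plan_for(PostType.ECHOED_POST).counts_belong_to` evaluates to `"echoed post"`, while the same field for `PostType.ECHOED_POST_WITH_REPLY` is `"main post"`. Keeping the branching in one table like this makes it easy to check each post type against the list by eye.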