The query relevance check in check_tweet_content fails for queries that contain only search operators without content keywords.
The current implementation splits the entire query string into words and checks if any of them appear in the tweet text, username, or name:
query_words = synapse.get("query", "").strip().lower().split(" ")
This includes search operators like from:, min_faves:, since:, filter:, etc. These operators are query instructions, not content terms, and will never appear in tweet text.
For queries containing only operators, valid tweets will always fail the relevance check.
Example:
- Query: from:elonmusk
- query_words = ["from:elonmusk"]
- Tweet text: "Bitcoin is the future", username: "elonmusk"
- "from:elonmusk" is not found in text, username, or name
- Valid tweet gets score 0
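The failure can be reproduced with a minimal sketch. This is a simplified stand-in for the check described above, not the actual body of check_tweet_content; the function name is_relevant_naive and its parameters are hypothetical.

```python
def is_relevant_naive(query: str, text: str, username: str, name: str) -> bool:
    """Simplified stand-in for the relevance check (hypothetical helper)."""
    # Same split as in the issue: every token counts as a content word.
    query_words = query.strip().lower().split(" ")
    fields = [text.lower(), username.lower(), name.lower()]
    # Relevant if any query word is a substring of any field.
    return any(word in field for word in query_words for field in fields)


# Operator-only query: the token "from:elonmusk" is not a substring of any field.
print(is_relevant_naive("from:elonmusk", "Bitcoin is the future", "elonmusk", "Elon Musk"))
```

The tweet is authored by the requested account, yet the check rejects it because the operator token is compared as if it were a keyword.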
Proposed Solution
Parse the search query properly and exclude operators when comparing query words against the tweet.
We may also need an LLM to check relevance.
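One possible shape for the operator-aware check is sketched below. The names KNOWN_OPERATORS, content_words, and is_relevant are hypothetical, the operator list is illustrative rather than exhaustive, and treating an operator-only query as relevant is an assumption about the desired behavior, not something the current code does.

```python
from typing import List

# Assumption: illustrative subset of search operators, not a complete list.
KNOWN_OPERATORS = {
    "from", "to", "since", "until", "filter", "lang",
    "min_faves", "min_retweets", "min_replies",
}


def is_operator(token: str) -> bool:
    """True for tokens like 'from:elonmusk' or '-filter:replies'."""
    key, sep, _ = token.lstrip("-").partition(":")
    return bool(sep) and key in KNOWN_OPERATORS


def content_words(query: str) -> List[str]:
    """Return the query tokens that are content keywords, not operators."""
    return [t for t in query.strip().lower().split() if not is_operator(t)]


def is_relevant(query: str, text: str, username: str, name: str) -> bool:
    words = content_words(query)
    if not words:
        # Assumption: with no content keywords there is nothing to compare,
        # so the tweet is not penalized by the keyword check.
        return True
    fields = [text.lower(), username.lower(), name.lower()]
    return any(word in field for word in words for field in fields)
```

With this sketch, is_relevant("from:elonmusk", "Bitcoin is the future", "elonmusk", "Elon Musk") passes, while a query such as "ethereum from:elonmusk" is still rejected for the same tweet. Operators like from: could additionally be validated against the tweet's metadata, which this sketch does not attempt.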