The ML repository for the Problem Statement - Digital Alpha SaaS Analyzer at Inter IIT Tech Meet 10.0. The deployed version of the website can be found here. The report can be found here.
- `dict-sentiment.ipynb` -> For sentiment analysis (6 classes, lexicon-based)
  - Input: The input of the file is specified in the `temp_text` variable.
  - Output: The input is passed to the `get_class_counter` function, which returns the sentiment dictionary containing the results.
- `finbert_inference.ipynb` -> For sentiment analysis (3 classes, transformer-based)
  - Input: The input of the file is specified in the `temp_text` variable.
  - Output: The input is passed to the `get_output` function, which returns the sentiment dictionary.
- `mdna_extractor.ipynb` -> For extracting contents (section-wise)
  - Input: The inputs to the function are `filing_url` and `section_name`, where the names have their usual meanings.
  - Output: The output is obtained from the `get_section` function, which returns the desired section text.
- `find_company_trends_using_lda.py` -> For extracting the latest trending topics relevant to the company
  - Input: The company title and the number of tweets to extract
  - Output: A list of top keywords relevant to the company
- `extract_metrics_from_fillings.ipynb` -> For extracting metrics from the filings
  - Input: The inputs are:
    - `api_key`: for accessing the filings using sec-api
    - `url`: URL of the filing
    - `metric`: name of the metric in lowercase
    - `val_type`: metric data type, one of ['PERCENT', 'MONEY', 'NUMBER', 'RATIO']
    - `k`: window size for metric search, default = 6
    - `relevant_sections`: list of sections to search for the metric
  - Output: The value of the metric extracted from the filing, stored in the `correct_value` variable
- `extract_tables.ipynb` -> For extracting tables from the filings
  - Input: The inputs are `api_key` for accessing the filings using sec-api, `url` to the filing, and the `section`.
  - Output: The tables extracted from the filing, stored in the `tables` variable
- `qna_on_tables.ipynb` -> For question answering on the tables
  - Input: The inputs are `table` and `ques` (a list of questions).
  - Output: The answers to the questions, based on the table
- `theme-vocab-builder.ipynb` -> To build vocabulary for various sectors
  - Input: Any important data file related to various sectors
  - Output: The vocabulary file for various sectors
- `exposure-calc.ipynb` -> To calculate the exposure of a company to various sectors
  - Input: The inputs are:
    - `filing.txt`: SEC filing of a company
    - `theme.txt`: vocabulary file for a specific sector
  - Output: The similarity score with respect to the vocabulary of a specific sector
- `generate_questions_answers.ipynb` -> To generate questions and answers from the given text
  - Input: The only input is the `text`.
  - Output: The generated questions and answers, stored in the dictionary `qna_dict`
- `summarize_text.ipynb` -> To summarize the given text
  - Input: The only input is the `text`.
  - Output: The summary of the text, stored in the variable `summary`
- `10Q_parser.ipynb` -> For extracting contents (section-wise)
  - Input: The input to the function is the link of the filing and the section number.
  - Output: The output is obtained from the `parse_10q_filing` function, which returns the desired section text.
- `find_metric.ipynb` -> Complete pipeline for extracting metrics from the filings of a company
  - Input: The inputs are:
    - `api_key`: for accessing the filings using sec-api
    - `company_cik`: CIK of the company
    - `metric`: name of the metric in lowercase
    - `val_type`: metric data type, one of ['PERCENT', 'MONEY', 'NUMBER', 'RATIO']
    - `k`: window size for metric search, default = 6
  - Output: The `value` of the metric extracted from the filings
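
The `metric`, `val_type`, and `k` inputs of `extract_metrics_from_fillings.ipynb` and `find_metric.ipynb` suggest a search for a typed value near the metric name. The sketch below illustrates that idea only and is not the notebooks' code: `find_metric_value` and `VALUE_PATTERNS` are hypothetical names, the regular expressions are assumed, and reading `k` as "number of tokens after the metric name" is also an assumption.

```python
import re

# Hypothetical patterns for each val_type; the notebooks' real rules are not documented here.
VALUE_PATTERNS = {
    "PERCENT": re.compile(r"^\d+(?:\.\d+)?%$"),
    "MONEY": re.compile(r"^\$\d[\d,]*(?:\.\d+)?$"),
    "NUMBER": re.compile(r"^\d[\d,]*(?:\.\d+)?$"),
    "RATIO": re.compile(r"^\d+(?:\.\d+)?:\d+(?:\.\d+)?$"),
}


def find_metric_value(text, metric, val_type, k=6):
    """Return the first token of type `val_type` within `k` tokens after `metric`, or None."""
    pattern = VALUE_PATTERNS[val_type]
    tokens = text.lower().split()
    metric_tokens = metric.lower().split()
    n = len(metric_tokens)
    for i in range(len(tokens) - n + 1):
        # Compare the metric name against the text, ignoring surrounding punctuation.
        window_start = [t.strip(".,;:()") for t in tokens[i:i + n]]
        if window_start != metric_tokens:
            continue
        # Assumed meaning of k: inspect the k tokens that follow the metric name.
        for tok in tokens[i + n:i + n + k]:
            candidate = tok.rstrip(".,;)").lstrip("(")
            if pattern.match(candidate):
                return candidate
    return None


sentence = "Our net retention rate was 118% for the quarter, up from 112%."
print(find_metric_value(sentence, "net retention rate", "PERCENT"))
```

A smaller `k` makes the search stricter: with `k=1` the same sentence yields no match, because the only token inspected is "was".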
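
For `exposure-calc.ipynb`, this README does not say which similarity measure is used. One plausible choice is cosine similarity between word counts of the filing and of the sector vocabulary. The sketch below takes that as an assumption; `tokenize` and `exposure_score` are hypothetical names, and the inputs stand in for the contents of `filing.txt` and `theme.txt`.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic words only."""
    return re.findall(r"[a-z]+", text.lower())


def exposure_score(filing_text, theme_text):
    """Cosine similarity between word counts of a filing and a sector vocabulary."""
    filing_counts = Counter(tokenize(filing_text))
    theme_counts = Counter(tokenize(theme_text))
    # Dot product over the vocabulary words; missing words count as zero.
    dot = sum(filing_counts[word] * count for word, count in theme_counts.items())
    norm_filing = math.sqrt(sum(c * c for c in filing_counts.values()))
    norm_theme = math.sqrt(sum(c * c for c in theme_counts.values()))
    if norm_filing == 0 or norm_theme == 0:
        return 0.0
    return dot / (norm_filing * norm_theme)


theme = "cloud subscription saas platform"
cloud_filing = "We sell a cloud platform on a subscription basis."
retail_filing = "We operate retail stores and warehouses."
print(exposure_score(cloud_filing, theme) > exposure_score(retail_filing, theme))
```

Scores fall between 0 and 1, so a company's filing can be compared against several sector vocabularies on the same scale.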