Skip to content

I have developed a Text Summarizer that accepts both text and PDF inputs. It utilizes the NLTK algorithm for summarization and employs lemmatization for improved accuracy. Additionally, the tool provides an audio file containing the summary. Moreover, it supports the option to read/listen to the summary in different languages.

Notifications You must be signed in to change notification settings

Devanshu1013/Saransh

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Saransh

Project Definition: The project aims to develop a comprehensive text summarization and translation tool that allows users to efficiently process and analyze textual content. The tool encompasses various functionalities, including reading and extracting text from PDF files, generating audio summaries, and translating these summaries into different languages.

Features:
• Automatic Summarization: Saransh uses natural language processing algorithms to automatically generate summaries from lengthy text document. It analyzes the content, identifies key information, and creates a coherent summary. • Customizable Summarization Parameters: Users can do summarization process by adjusting parameters such as summary length. • Multilingual Support: Users can do summarization process by adjusting parameters such as summary length, key terms emphasis, and more.

Workflow of this project:

  1. Text Extraction: • Develop a PDF reader capable of extracting text from specific pages of a PDF document. Also, it includes the text input.
  2. Word Count: • Implement a word count function to analyze the length of the input text.
  3. Text-to-Speech: • Utilize the pyttsx3 library to convert the input text into audio format.
  4. Text Summarization: • With the help if nltk algorithm it will create a text summarization function that generates a summary based on word frequency and sentence importance. • It does this thing with the help of lemmatization that reducing words to their base or root form, known as the lemma. • The primary objective of lemmatization is to simplify the words so that different inflections or variations of a word are represented as a common base form.
  5. Variable Length Summarization: • Implement a summarization function with a variable length, allowing users to control the summary length as a percentage of the original text.
  6. Translation: • Develop a translation function that translates the generated summary into multiple languages using the translate library.
  7. Chunked Translation: • Enhance the translation process by breaking down the text into smaller chunks for efficient translation.
  8. Audio Output: • Save the generated summaries and translations as audio files for user convenience. Tools and Technologies: • Python: Programming language for script development. • NLTK (Natural Language Toolkit): Library for text processing tasks. • Lemmatization: For the Reduction of word variations, enhanced consistency, improved frequency analysis, removal of stop words, semantic understanding. • PyMuPDF: Library for working with PDF files. • pyttsx3: Library for text-to-speech conversion. • Translate Library: for language translation. • Operating System (os) Module: for file operations.

Use Case Area:

  1. News articles or blog summarization to quickly grasp the main points.
  2. Summarizing lengthy research papers or reports for easier digestion.
  3. Summarizing legal documents for faster understanding of key points.
  4. Summarizing customer feedback for quick analysis.
  5. Last minute exam preparation.

Conclusion: The developed tool provides a versatile and efficient solution for processing textual content. Users can extract, summarize, and translate text seamlessly, with the option to control the length of summaries. This project opens avenues for applications in research, education, and accessibility, demonstrating the power of natural language processing and text analysis.

About

I have developed a Text Summarizer that accepts both text and PDF inputs. It utilizes the NLTK algorithm for summarization and employs lemmatization for improved accuracy. Additionally, the tool provides an audio file containing the summary. Moreover, it supports the option to read/listen to the summary in different languages.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published