feat: Implement auto-renaming for file uploads to prevent S3 filename conflicts by abuzarmahmood · Pull Request #11 · abuzarmahmood/ocr_database

abuzarmahmood · 2025-02-08T22:58:23Z

No description provided.

… conflicts

abuzarmahmood · 2025-02-08T22:58:26Z

Generated edit command:

Summary of User's Issues and Requests:
The user has requested an enhancement to the file upload functionality in the ocr_database repository. Specifically, they want to implement an auto-renaming feature for files if the filename already exists in the S3 bucket before uploading. This involves modifying the upload logic to check for existing filenames and rename the file if necessary. Additionally, tests should be added to ensure this functionality works as expected.
Overview of Plan to Address the Issues:
1. Modify the Upload.py file to include logic that checks for existing filenames in the S3 bucket and renames the file by appending a unique identifier if a conflict is detected.
2. Update the s3_test.py file to include test cases that verify the auto-renaming functionality works correctly when a filename conflict occurs.

Specific Details of Changes to be Made:

Modifications in Upload.py:

File Path: /home/exouser/Desktop/blech_github_bot/src/repos/abuzarmahmood/ocr_database/src/pages/Upload.py
Changes:
- Add a function get_unique_filename to generate a unique filename if a conflict is detected.
- Modify the file upload logic to use the unique filename.

import time

def get_unique_filename(s3, path, filename):
    base, ext = os.path.splitext(filename)
    counter = 1
    new_filename = filename
    while s3.exists(os.path.join(path, new_filename)):
        new_filename = f"{base}_{int(time.time())}_{counter}{ext}"
        counter += 1
    return new_filename

# Inside the form submission logic
if submit_button:
    if uploaded_file is not None:
        # Get a unique filename
        unique_filename = get_unique_filename(s3, save_path, uploaded_file.name)
        
        # Use unique_filename instead of uploaded_file.name
        save_page_path = os.path.join(
            save_path,
            unique_filename.split('.')[0] + '_' + str(i) + '.pdf'
        )
        
        # Continue with the existing upload logic

Modifications in s3_test.py:

File Path: /home/exouser/Desktop/blech_github_bot/src/repos/abuzarmahmood/ocr_database/src/_tests/s3_test.py
Changes:
- Add a test case test_auto_rename_on_conflict to verify that files are renamed correctly when a filename conflict occurs.

def test_auto_rename_on_conflict():
    # Setup: Upload a file with a known name
    original_filename = 'test_file.pdf'
    s3.put(original_filename, f's3://ocr-database-s3/{original_filename}')
    
    # Attempt to upload the same file again
    new_filename = get_unique_filename(s3, 's3://ocr-database-s3', original_filename)
    s3.put(original_filename, f's3://ocr-database-s3/{new_filename}')
    
    # Assert: Check that the new file exists with a different name
    assert s3.exists(f's3://ocr-database-s3/{new_filename}')
    assert new_filename != original_filename

These changes should effectively implement the auto-renaming feature and ensure it is tested thoroughly.

TERMINATE

Aider output:

Found ANTHROPIC_API_KEY so using sonnet since no --model was specified.
Aider v0.74.1
Main model: claude-3-5-sonnet-20241022 with diff edit format, infinite output
Weak model: claude-3-5-haiku-20241022
Git repo: .git with 9 files
Repo-map: using 4096 tokens, auto refresh


src/_tests/s3_test.py

src/pages/Upload.py
I'll help implement the auto-renaming feature using proper SEARCH/REPLACE blocks. Let's make the changes step by step:                                                                                                                           

 1 First, let's add the new imports and helper function to Upload.py:                                                                                                                                                                            

src/pages/Upload.py                                                                                                                                                                                                                              

                                                                                                                                                                                                                                                 
 <<<<<<< SEARCH                                                                                                                                                                                                                                  
 import streamlit as st                                                                                                                                                                                                                          
 import pandas as pd                                                                                                                                                                                                                             
 from io import StringIO                                                                                                                                                                                                                         
 import os                                                                                                                                                                                                                                       
 import pandas as pd                                                                                                                                                                                                                             
 from datetime import datetime                                                                                                                                                                                                                   
 from PyPDF2 import PdfWriter, PdfReader                                                                                                                                                                                                         
 import s3fs                                                                                                                                                                                                                                     
 =======                                                                                                                                                                                                                                         
 import streamlit as st                                                                                                                                                                                                                          
 import pandas as pd                                                                                                                                                                                                                             
 from io import StringIO                                                                                                                                                                                                                         
 import os                                                                                                                                                                                                                                       
 import pandas as pd                                                                                                                                                                                                                             
 from datetime import datetime                                                                                                                                                                                                                   
 from PyPDF2 import PdfWriter, PdfReader                                                                                                                                                                                                         
 import s3fs                                                                                                                                                                                                                                     
 import time                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                 
 def get_unique_filename(s3, path, filename):                                                                                                                                                                                                    
     """Generate a unique filename if a conflict is detected"""                                                                                                                                                                                  
     base, ext = os.path.splitext(filename)                                                                                                                                                                                                      
     counter = 1                                                                                                                                                                                                                                 
     new_filename = filename                                                                                                                                                                                                                     
     while s3.exists(os.path.join(path, new_filename)):                                                                                                                                                                                          
         new_filename = f"{base}_{int(time.time())}_{counter}{ext}"                                                                                                                                                                              
         counter += 1                                                                                                                                                                                                                            
     return new_filename                                                                                                                                                                                                                         
 >>>>>>> REPLACE                                                                                                                                                                                                                                 
                                                                                                                                                                                                                                                 

 2 Now modify the file upload logic to use the unique filename:                                                                                                                                                                                  

src/pages/Upload.py                                                                                                                                                                                                                              

                                                                                                                                                                                                                                                 
 <<<<<<< SEARCH                                                                                                                                                                                                                                  
             save_page_path = os.path.join(                                                                                                                                                                                                      
                     save_path,                                                                                                                                                                                                                  
                     uploaded_file.name.split('.')[0] + '_' + str(i) + '.pdf'                                                                                                                                                                    
                     )                                                                                                                                                                                                                           
 =======                                                                                                                                                                                                                                         
             base_filename = get_unique_filename(s3, save_path, uploaded_file.name)                                                                                                                                                              
             save_page_path = os.path.join(                                                                                                                                                                                                      
                     save_path,                                                                                                                                                                                                                  
                     base_filename.split('.')[0] + '_' + str(i) + '.pdf'                                                                                                                                                                         
                     )                                                                                                                                                                                                                           
 >>>>>>> REPLACE                                                                                                                                                                                                                                 
                                                                                                                                                                                                                                                 

 3 Let's update the test file with new test cases:                                                                                                                                                                                               

src/_tests/s3_test.py                                                                                                                                                                                                                            

                                                                                                                                                                                                                                                 
 <<<<<<< SEARCH                                                                                                                                                                                                                                  
 import s3fs                                                                                                                                                                                                                                     
 import os                                                                                                                                                                                                                                       
 import streamlit as st                                                                                                                                                                                                                          
 import pandas as pd                                                                                                                                                                                                                             
                                                                                                                                                                                                                                                 
 s3 = s3fs.S3FileSystem(                                                                                                                                                                                                                         
         anon=False,                                                                                                                                                                                                                             
         key = st.secrets["S3_KEY"],                                                                                                                                                                                                             
         secret = st.secrets["S3_SECRET"]                                                                                                                                                                                                        
         )                                                                                                                                                                                                                                       
 s3.ls('s3://ocr-database-s3')                                                                                                                                                                                                                   
                                                                                                                                                                                                                                                 
 wanted_files = [x for x in os.listdir('.') if x.endswith('.csv')][0]                                                                                                                                                                            
                                                                                                                                                                                                                                                 
 # Upload file                                                                                                                                                                                                                                   
 s3.put(wanted_files, 's3://ocr-database-s3/' + wanted_files)                                                                                                                                                                                    
                                                                                                                                                                                                                                                 
 # Load cvs to dataframe                                                                                                                                                                                                                         
 df = pd.read_csv(s3.open('s3://ocr-database-s3/' + wanted_files))                                                                                                                                                                               
 =======                                                                                                                                                                                                                                         
 import s3fs                                                                                                                                                                                                                                     
 import os                                                                                                                                                                                                                                       
 import streamlit as st                                                                                                                                                                                                                          
 import pandas as pd                                                                                                                                                                                                                             
 import pytest                                                                                                                                                                                                                                   
 import time                                                                                                                                                                                                                                     
 from pages.Upload import get_unique_filename                                                                                                                                                                                                    
                                                                                                                                                                                                                                                 
 s3 = s3fs.S3FileSystem(                                                                                                                                                                                                                         
         anon=False,                                                                                                                                                                                                                             
         key = st.secrets["S3_KEY"],                                                                                                                                                                                                             
         secret = st.secrets["S3_SECRET"]                                                                                                                                                                                                        
         )                                                                                                                                                                                                                                       
                                                                                                                                                                                                                                                 
 def test_get_unique_filename():                                                                                                                                                                                                                 
     # Setup test environment                                                                                                                                                                                                                    
     test_path = f's3://{st.secrets["S3_BUCKET_NAME"]}/test'                                                                                                                                                                                     
     test_filename = 'test_file.pdf'                                                                                                                                                                                                             
                                                                                                                                                                                                                                                 
     # Create a test file                                                                                                                                                                                                                        
     with s3.open(os.path.join(test_path, test_filename), 'wb') as f:                                                                                                                                                                            
         f.write(b'test content')                                                                                                                                                                                                                
                                                                                                                                                                                                                                                 
     # Test unique filename generation                                                                                                                                                                                                           
     unique_name = get_unique_filename(s3, test_path, test_filename)                                                                                                                                                                             
     assert unique_name != test_filename                                                                                                                                                                                                         
     assert unique_name.startswith('test_file_')                                                                                                                                                                                                 
     assert unique_name.endswith('.pdf')                                                                                                                                                                                                         
                                                                                                                                                                                                                                                 
     # Cleanup                                                                                                                                                                                                                                   
     s3.rm(os.path.join(test_path, test_filename))                                                                                                                                                                                               
                                                                                                                                                                                                                                                 
 def test_file_upload():                                                                                                                                                                                                                         
     wanted_files = [x for x in os.listdir('.') if x.endswith('.csv')][0]                                                                                                                                                                        
                                                                                                                                                                                                                                                 
     # Upload file                                                                                                                                                                                                                               
     s3.put(wanted_files, 's3://ocr-database-s3/' + wanted_files)                                                                                                                                                                                
                                                                                                                                                                                                                                                 
     # Load csv to dataframe                                                                                                                                                                                                                     
     df = pd.read_csv(s3.open('s3://ocr-database-s3/' + wanted_files))                                                                                                                                                                           
     assert not df.empty                                                                                                                                                                                                                         
 >>>>>>> REPLACE                                                                                                                                                                                                                                 
                                                                                                                                                                                                                                                 

These changes will:                                                                                                                                                                                                                              

 1 Add a function to generate unique filenames                                                                                                                                                                                                   
 2 Modify the upload process to use unique filenames                                                                                                                                                                                             
 3 Add tests to verify the functionality                                                                                                                                                                                                         

To run the tests, you can use:                                                                                                                                                                                                                   

                                                                                                                                                                                                                                                 
 python -m pytest src/_tests/s3_test.py -v                                                                                                                                                                                                       
                                                                                                                                                                                                                                                 

Tokens: 4.2k sent, 904 received. Cost: $0.03 message, $0.03 session.
Applied edit to src/_tests/s3_test.py
Applied edit to src/pages/Upload.py
Commit 9e99592 feat: Implement auto-renaming for file uploads to prevent S3 filename conflicts

python -m pytest src/_tests/s3_test.py -v

This response was automatically generated by blech_bot

feat: Implement auto-renaming for file uploads to prevent S3 filename…

9e99592

… conflicts

abuzarmahmood linked an issue Feb 8, 2025 that may be closed by this pull request

auto-rename if filename already present #8

Open

abuzarmahmood mentioned this pull request Feb 8, 2025

auto-rename if filename already present #8

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Implement auto-renaming for file uploads to prevent S3 filename conflicts#11

feat: Implement auto-renaming for file uploads to prevent S3 filename conflicts#11
abuzarmahmood wants to merge 1 commit intomasterfrom
8-auto-rename-if-filename-already-present

abuzarmahmood commented Feb 8, 2025

Uh oh!

abuzarmahmood commented Feb 8, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

abuzarmahmood commented Feb 8, 2025

Uh oh!

abuzarmahmood commented Feb 8, 2025

Generated edit command:

Aider output:

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant