s-ccs · behinger · Dec 5, 2025 · Dec 1, 2025 · Dec 1, 2025 · Dec 4, 2025
diff --git a/.gitignore b/.gitignore
@@ -9,6 +9,7 @@ empty_log_process_temp.py
 
 tests/**/bids/
 tests/test_main_functionality/data/projects/test-project/sub-100
+tests/data
 # Byte-compiled / optimized / DLL files
 __pycache__/
 *.py[cod]

diff --git a/docs/bids_convert_and_upload.md b/docs/bids_convert_and_upload.md
@@ -91,15 +91,15 @@ If otherFilesUsed=True in project config file:
 
 1. Behavioral files are copied via `_copy_behavioral_files()`.
 
-    - Validates required files against TOML config (`OtherFilesInfo`). In this config we add the the extensions of the expected other files. For example, in our testproject we use EyeList 1000 Plus eye tracker which generates .edf and .csv files. So we add these extensions as required other files. We also have mandatory labnotebook and participant info files in .tsv format.
-    - Renames files to include sub-XXX_ses-YYY_ prefix if missing.
-    - Deletes the other files in the project_other directory that are not listed in `OtherFilesInfo` in the project config file. It doesn"t delete from the source directory, only from out BIDS dataset.
+    - Validates required files against TOML config (`OtherFilesInfo`). In this config we add the extensions of the expected other files. For example, in our test project we use the EyeLink 1000 Plus eye tracker, which generates .edf and .csv files, so we add these extensions as required other files. We also typically have mandatory labnotebook and participant info files in .tsv format.
+    - The `"*.src"="beh/{prefix}_target"` syntax allows users to easily add BIDS-compatible custom data from the experiments. Note that `json` sidecars are not automatically generated yet.
+
 
 2. Experimental files are copied via `_copy_experiment_files().`
 
-    - Gathers files from the experiment folder.
+    - Gathers files from the `<PROJECTS_OTHER>/experiment/` folder.
     - Copies into BIDS `misc/` directory i.e. `<BIDS_ROOT>/misc/`
-    - Compresses into experiment.tar.gz.
+    - Compresses into `experiment.tar.gz`.
     - Removes the uncompressed folder.
 
 There is a flag in the `lslautobids run` command called `--redo_other_pc` which when specified, forces overwriting of existing other and experiment files in the BIDS dataset. This is useful if there are updates or corrections to the other/behavioral data that need to be reflected in the BIDS dataset.
@@ -121,7 +121,7 @@ This produces a clean, memory-efficient Raw object ready for BIDS conversion.
 #### BIDS Validation (`validate_bids()`)
 This function validates the generated BIDS files using the `bids-validator` package. It performs the following steps:
 - Walks through the BIDS directory.
-- Skips irrelevant files: (`.xdf`, `.tar.gz`, behavioral files, hidden/system files.)
+- Skips irrelevant files already ignored in `.bidsignore` (`misc` folder, some hidden files)
 - Uses `BIDSValidator` to validate relative paths. 
 - If any file fails validation, logs an error and returns 0 ; Otherwise, logs success and returns 1.
 

diff --git a/docs/data_organization.md b/docs/data_organization.md
@@ -54,7 +54,7 @@ Filename Convention for the raw data files :
 
 ## Project Other Folder
 
-This folder contains the experimental and behavioral files which we also store in the dataverse. The folder structure is should as follows:
+This folder contains the experimental and behavioral files which we also store in the dataverse. The folder structure has to be as follows:
 
         projectname/
         └── experiment
@@ -65,6 +65,7 @@ This folder contains the experimental and behavioral files which we also store i
                     └── beh
                         └── behavioral_files((lab notebook, CSV, EDF file, etc))
 
+It is possible to "skip" folders by using `..` in the `src=target` syntax. Allowing `{prefix}` in the src as well might be simpler, but this is not yet implemented.
 - **projectname** - any descriptive name for the project
 - **experiment** - contains the experimental files for the project. Eg: showOther.m, showOther.py
 - **data** - contains the behavioral files for the corresponding subject. Eg: experimentalParameters.csv, eyetrackingdata.edf, results.tsv. 
@@ -91,8 +92,9 @@ This folder contains the converted BIDS data files and other files we want to ve
                     .........
             └── beh
                 └──behavioral files (other files)
-            └── misc
+            └── misc (added to .bidsignore)
                 └── experimental files (This needs to stored in zip format)
+                └── labnotebook, subjectform etc. 
     └── sourcedata
         └── raw xdf files
     └── dataset_description.json

diff --git a/docs/developers_documentation.md b/docs/developers_documentation.md
@@ -270,9 +270,8 @@ If otherFilesUsed=True in project config file:
 
 1. Behavioral files are copied via `_copy_behavioral_files()`.
 
-    - Validates required files against TOML config (`OtherFilesInfo`). In this config we add the the extensions of the expected other files. For example, in our testproject we use EyeList 1000 Plus eye tracker which generates .edf and .csv files. So we add these extensions as required other files. We also have mandatory labnotebook and participant info files in .tsv format.
-    - Renames files to include sub-XXX_ses-YYY_ prefix if missing.
-    - Deletes the other files in the project_other directory that are not listed in `OtherFilesInfo` in the project config file. It doesn"t delete from the source directory, only from out BIDS dataset.
+    - Validates required files against TOML config (`OtherFilesInfo`). In this config we add the extensions of the expected other files. For example, in our test project we use the EyeLink 1000 Plus eye tracker, which generates .edf and .csv files, so we add these extensions as required other files. We also typically use mandatory labnotebook and participant info files in .tsv format. Currently it is not possible to convert files in this step, but this may become possible later, e.g. for `EDF` files and `CSV=>TSV` conversion.
+    - Follows the `src=target` regexp syntax to copy files over.
 
 2. Experimental files are copied via `_copy_experiment_files().`
 
@@ -300,7 +299,7 @@ This produces a clean, memory-efficient Raw object ready for BIDS conversion.
 #### 5. BIDS Validation (`validate_bids()`)
 This function validates the generated BIDS files using the `bids-validator` package. It performs the following steps:
 - Walks through the BIDS directory.
-- Skips irrelevant files: (`.xdf`, `.tar.gz`, behavioral files, hidden/system files.)
+- Skips irrelevant files (`misc` folder, hidden/system files).
 - Uses `BIDSValidator` to validate relative paths. 
 - If any file fails validation, logs an error and returns 0 ; Otherwise, logs success and returns 1.
 

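A rough sketch of the skip rule that the updated `validate_bids()` docs describe, reduced to a standalone predicate. The helper names `should_skip` and `should_skip_strict` are hypothetical and not part of the package; the first mirrors the condition in this PR, the second is one possible stricter variant. It illustrates that a substring test on `'misc'` also skips files whose name merely contains "misc", which may or may not be intended.

```python
import os


def should_skip(root, file):
    """Mirror of the skip condition in this PR (substring test on 'misc')."""
    file_path = os.path.join(root, file)
    return (
        'misc' in file_path
        or file.startswith('.')
        or '.git' in file_path
        or os.path.basename(root).startswith('.')
    )


def should_skip_strict(root, file):
    """Hypothetical variant: only skip when 'misc' is a whole path component."""
    file_path = os.path.join(root, file)
    parts = os.path.normpath(file_path).split(os.sep)
    return (
        'misc' in parts
        or file.startswith('.')
        or '.git' in parts
        or os.path.basename(root).startswith('.')
    )


# Example paths (made up for illustration)
misc_dir = os.path.join("bids", "sub-001", "ses-001", "misc")
beh_dir = os.path.join("bids", "sub-001", "ses-001", "beh")
eeg_dir = os.path.join("bids", "sub-001", "ses-001", "eeg")

in_misc = should_skip(misc_dir, "experiment.tar.gz")
eeg_file = should_skip(eeg_dir, "sub-001_ses-001_task-demo_eeg.vhdr")
# A beh file whose *name* contains "misc" is skipped by the substring test...
beh_loose = should_skip(beh_dir, "sub-001_ses-001_miscnotes.tsv")
# ...but not by the component-based variant.
beh_strict = should_skip_strict(beh_dir, "sub-001_ses-001_miscnotes.tsv")
```

If only the `misc` folder should be excluded from validation, the component-based check may be the safer choice.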
diff --git a/docs/tutorial.md b/docs/tutorial.md
@@ -103,7 +103,7 @@ In this example, we will see how to use the LSLAutoBIDS package to:
     otherFilesUsed = true
 
   [OtherFilesInfo]
-    expectedOtherFiles = [".edf", ".csv", "_labnotebook.tsv", "_participantform.tsv"]
+    expectedOtherFiles = { ".*.edf" = "misc/{prefix}_et.edf", ".*.csv" = "misc/{prefix}_beh.csv", ".*_labnotebook.tsv" = "misc/{prefix}_labnotebook.tsv", ".*_participantform.tsv" = "{prefix}_participantform.tsv" }
 ```
 2. Run the conversion and upload command to convert the `xdf` files to BIDS format and upload the data to the dataverse.
 ```

diff --git a/lslautobids/convert_to_bids_and_upload.py b/lslautobids/convert_to_bids_and_upload.py
@@ -2,6 +2,7 @@
 import os
 import shutil
 import sys
+import re
 
 from pyxdf import match_streaminfos, resolve_streams
 from mnelab.io.xdf import read_raw_xdf
@@ -92,101 +93,89 @@ def copy_source_files_to_bids(self,xdf_file,subject_id,session_id,other, logger)
 
     def _copy_behavioral_files(self, file_base, subject_id, session_id, logger):
         """
-        Copy behavioral files to the BIDS structure.
+        Copy behavioral files to the BIDS structure based on regex patterns.
+        Iterates through patterns and matches files, copying them directly to target locations.
 
         Args:
             file_base (str): Base name of the file (without extension).
             subject_id (str): Subject ID.
             session_id (str): Session ID.
+            logger: Logger instance.
         """
+
         project_name = cli_args.project_name
         logger.info("Copying the behavioral files to BIDS...")
+
+        # Get the TOML configuration
+        toml_path = os.path.join(project_root, cli_args.project_name, cli_args.project_name + '_config.toml')
+        data = read_toml_file(toml_path)
+        _expectedotherfiles = data["OtherFilesInfo"]["expectedOtherFiles"]
+
+        if not isinstance(_expectedotherfiles, dict):
+            raise ValueError("expectedOtherFiles must be a dictionary with regex patterns. List format is no longer supported since v0.2.0.")
+
         # get the source path
-        behavioural_path = os.path.join(project_other_root,project_name,'data', subject_id,session_id,'beh')
-        # get the destination path
-        dest_dir = os.path.join(bids_root , project_name,  subject_id , session_id , 'beh')
-        #check if the directory exists
-        os.makedirs(dest_dir, exist_ok=True)
-
-        processed_files = []
+        behavioural_path = os.path.join(project_other_root, project_name, 'data', subject_id, session_id, 'beh')
+
+        if not os.path.exists(behavioural_path):
+            # Fail loudly: a missing source folder usually means the share is not mounted
+            raise FileNotFoundError(f"Behavioral path does not exist: {behavioural_path} - did you forget to mount?")
+
         # Extract the sub-xxx_ses-yyy part
         def extract_prefix(filename):
             parts = filename.split("_")
             sub = next((p for p in parts if p.startswith("sub-")), None)
             ses = next((p for p in parts if p.startswith("ses-")), None)
             if sub and ses:
-                return f"{sub}_{ses}_"
+                return f"{sub}_{ses}"
             return None
 
         prefix = extract_prefix(file_base)
-
-        for file in os.listdir(behavioural_path):
-            # Skip non-files (like directories)
-            original_path = os.path.join(behavioural_path, file)
-            if not os.path.isfile(original_path):
-                continue
-
-            if not file.startswith(prefix):
-                logger.info(f"Renaming {file} to include prefix {prefix}")
-                renamed_file = prefix + file
-            else:
-                renamed_file = file
+        processed_files = []
 
-            processed_files.append(renamed_file)
-            dest_file = os.path.join(dest_dir, renamed_file)
+        # Get all files in source directory once
+        source_files = [f for f in os.listdir(behavioural_path) 
+                       if os.path.isfile(os.path.join(behavioural_path, f))]
 
+        # Iterate through patterns (not files)
+        for pattern, target_template in _expectedotherfiles.items():
+            compiled_regex = re.compile(pattern)
+
+            # Find matching files for this pattern
+            matched_files = [f for f in source_files if compiled_regex.match(f)]
+
+            if not matched_files:
+                raise FileNotFoundError(f"No files matched pattern '{pattern}' in {behavioural_path}")
+
+            if len(matched_files) > 1:
+                raise ValueError(f"Multiple files matched pattern '{pattern}': {matched_files}. Only one file per pattern is supported - manual intervention required")
+
+            # Process the single matching file
+            file = matched_files[0]
+            original_path = os.path.join(behavioural_path, file)
+
+            # Format the target path with prefix
+            target_path = target_template.format(prefix=prefix)
+            dest_file = os.path.join(bids_root, project_name, subject_id, session_id, target_path)
+
+            # Ensure destination directory exists
+            os.makedirs(os.path.dirname(dest_file), exist_ok=True)
+
+            # Track the relative path for checking
+            processed_files.append(target_path)
+
             if cli_args.redo_other_pc:
-                logger.info(f"Copying (overwriting if needed) {file} to {dest_file}")
+                logger.info(f"Copying (overwriting) {file} to {target_path}")
                 shutil.copy(original_path, dest_file)
             else:
                 if os.path.exists(dest_file):
-                    logger.info(f"Behavioural file {file} already exists in BIDS. Skipping.")
+                    logger.info(f"Behavioural file {target_path} already exists in BIDS. Skipping.")
                 else:
-                    logger.info(f"Copying new file {file} to {dest_file}")
+                    logger.info(f"Copying {file} to {target_path}")
                     shutil.copy(original_path, dest_file)
-
-
-
-        unnecessary_files = self._check_required_behavioral_files(processed_files, prefix, logger)
-
-        # remove the unnecessary files
-        for file in unnecessary_files:
-            file_path = os.path.join(dest_dir, file)
-            if os.path.exists(file_path):
-                logger.info(f"Removing unnecessary file: {file_path}")
-                os.remove(file_path)
-            else:
-                logger.warning(f"File to remove does not exist: {file_path}")
-
-
-
-    def _check_required_behavioral_files(self, files, prefix, logger):
-        """
-        Check for required behavioral files after copying.
-
-        Args:
-            files (list): List of copied file names.
-            prefix (str): Expected prefix (e.g., "sub-001_ses-002_").
-        """
-        logger.info("Checking for required behavioral files...")
-
-        # Get the expected file names from the toml file
-        toml_path = os.path.join(project_root, cli_args.project_name, cli_args.project_name + '_config.toml')
-        data = read_toml_file(toml_path)
-
-        required_files = data["OtherFilesInfo"]["expectedOtherFiles"]
-
-
-        for required_file in required_files:
-            if not any(f.startswith(prefix) and f.endswith(required_file) for f in files):
-                raise FileNotFoundError(f"Missing required behavioral file: {required_file}")
 
-        unnecessary_files = []
-        # remove everything except the required files
-        for file in files:
-            if not any(file.endswith(required_file) for required_file in required_files):
-                unnecessary_files.append(file)
-        return unnecessary_files
+        logger.info(f"Successfully processed {len(processed_files)} behavioral files")
+
 
 
     def _copy_experiment_files(self, subject_id, session_id, logger):
@@ -350,8 +339,6 @@ def convert_to_bids(self, xdf_path,subject_id,session_id, run_id, task_id,other,
                 f.write('sourcedata\n')
                 # ignore the code folder - containing log files
                 f.write('code\n')
-                # ignore the beh folder in each sub-xxx/ses-yyys
-                f.write('**/beh\n')
                 # ignore the misc folder in each sub-xxx/ses-yyy
                 f.write('**/misc\n')
                 # ignore hidden files
@@ -387,19 +374,19 @@ def validate_bids(self,bids_path,subject_id,session_id, logger):
                 file_path = os.path.join(root, file)
 
                 # Skip non-relevant files
-                if file_path.endswith(".xdf") or file_path.endswith(".tar.gz") or 'beh' in file_path or file.startswith('.') or '.git' in file_path or os.path.basename(root).startswith('.'):
+                if 'misc' in file_path or file.startswith('.') or '.git' in file_path or os.path.basename(root).startswith('.'):
                     continue
 
                 if root == root_directory:
                     # Validate BIDS for files in the root directory
-                    res = BIDSValidator().is_bids(file)           
+                    res = BIDSValidator().is_bids('/'+file)           
                 else:
                     # Modify file path to be relative to the root directory
                     relative_path = os.path.relpath(file_path, root_directory)
                     res = BIDSValidator().is_bids('/'+relative_path)
 
                 if not res:
-                    print(f"Validation failed for {file_path}")
+                    logger.error(f"Validation failed for {file_path}")
 
 
                 file_paths.append(res)  

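For reviewers, a minimal standalone sketch of the pattern-driven copy that the new `_copy_behavioral_files()` performs: one regex per expected file, exactly one match allowed, and the target path built from a `{prefix}` template. The function name `copy_by_patterns`, the `demo()` helper, and the file names are hypothetical; the sketch leaves out the TOML loading, logging, and the `--redo_other_pc` handling of the real method.

```python
import os
import re
import shutil
import tempfile


def copy_by_patterns(src_dir, dest_root, patterns, prefix):
    """Copy exactly one file per regex pattern into dest_root.

    `patterns` maps a regex (matched from the start of the filename)
    to a target template that may contain `{prefix}`.
    """
    source_files = [f for f in os.listdir(src_dir)
                    if os.path.isfile(os.path.join(src_dir, f))]
    copied = []
    for pattern, target_template in patterns.items():
        matches = [f for f in source_files if re.match(pattern, f)]
        if not matches:
            raise FileNotFoundError(f"No file matched {pattern!r} in {src_dir}")
        if len(matches) > 1:
            raise ValueError(f"Pattern {pattern!r} is ambiguous: {matches}")
        # Fill in the sub-XXX_ses-YYY prefix and create the target folder
        target = target_template.format(prefix=prefix)
        dest = os.path.join(dest_root, target)
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        shutil.copy(os.path.join(src_dir, matches[0]), dest)
        copied.append(target)
    return copied


def demo():
    """Run the sketch against a throwaway directory tree."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "beh")
        dest = os.path.join(tmp, "bids")
        os.makedirs(src)
        for name in ("recording.edf", "sub-001_ses-001_labnotebook.tsv"):
            with open(os.path.join(src, name), "w") as fh:
                fh.write("dummy")
        patterns = {
            r".*\.edf": "beh/{prefix}_physio.edf",
            r".*_labnotebook\.tsv": "misc/{prefix}_labnotebook.tsv",
        }
        copied = copy_by_patterns(src, dest, patterns, "sub-001_ses-001")
        exists = all(os.path.exists(os.path.join(dest, t)) for t in copied)
        return copied, exists
```

Iterating over patterns rather than over files means that files in the source folder which match no pattern are simply never copied, so the separate cleanup step from the old implementation is no longer needed.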
diff --git a/lslautobids/gen_project_config.py b/lslautobids/gen_project_config.py
@@ -21,7 +21,19 @@
 
   [OtherFilesInfo]
     otherFilesUsed = true # Set to true if you want to include other (non-eeg-files) files (experiment files, other modalities like eye tracking) in the dataset, else false
-    expectedOtherFiles = [".edf", ".csv", "_labnotebook.tsv", "_participantform.tsv"] # List of expected other file extensions. Only the expected files will be copied to the beh folder in BIDS dataset. Give an empty list [] if you don't want any other files to be in the dataset. In this case only experiment files will be zipeed and copied to the misc folder in BIDS dataset.
+
+  # expectedOtherFiles: Dictionary format with regex patterns
+  # - The key is a regular expression to match source filenames in the project_other/.../beh/ folder
+  # - The value is a template path that includes {prefix} (e.g. sub-003_ses-002) and the target folder (beh/ or misc/)
+  # - Only files matching these patterns will be copied to the BIDS dataset
+  # The following is a sample configuration. You could also write it as an inline table: expectedOtherFiles = { ".*.edf" = "beh/{prefix}_physio.edf", ... }
+
+  [OtherFilesInfo.expectedOtherFiles]
+    ".*.edf" = "beh/{prefix}_physio.edf"
+    ".*.csv" = "beh/{prefix}_beh.tsv"
+    ".*_labnotebook.tsv" = "misc/{prefix}_labnotebook.tsv"
+    ".*_participantform.tsv" = "misc/{prefix}_participantform.tsv"
+
 
   [FileSelection]
     ignoreSubjects = ['sub-777'] # List of subjects to ignore during the conversion - Leave empty to include all subjects. Changing this value will not delete already existing subjects.

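A small illustration of how the sample patterns behave under `re.match`, which the new copy code uses: the match is anchored at the start of the filename but not at the end, and an unescaped `.` matches any character. The file names below are made up, and the stricter pattern is only a suggestion, not something this PR defines.

```python
import re

files = ["recording.edf", "notes_edf.txt", "results.csv"]

# Pattern as written in the sample config: the second dot matches any character
loose = re.compile(".*.edf")
# Hypothetical stricter form: literal dot, anchored at the end of the name
strict = re.compile(r".*\.edf$")

loose_hits = [f for f in files if loose.match(f)]
strict_hits = [f for f in files if strict.match(f)]

print(loose_hits)   # ['recording.edf', 'notes_edf.txt']
print(strict_hits)  # ['recording.edf']
```

Because the copy step raises an error when more than one file matches a pattern, a loose pattern can turn an unrelated file into an "ambiguous match" failure; escaping the dot and anchoring with `$` is one way to reduce that risk.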
diff --git a/requirements.txt b/requirements.txt
@@ -1,10 +1,10 @@
 pyxdf
 mne
 mne-bids
-bids_validator==1.13.1
-datalad-dataverse==1.0.1
-datalad-installer==1.0.3
-pyDataverse==0.3.1
+bids_validator>=1.13.1
+datalad-dataverse>=1.0.1
+datalad-installer>=1.0.3
+pyDataverse>=0.3.1
 requests>=2.12.0
 jsonschema>=3.2.0
 AnnexRemote@git+https://github.com/Lykos153/AnnexRemote.git@master#egg=AnnexRemote
@@ -13,4 +13,4 @@ pyyaml
 mnelab
 pybv
 pytest
-eeglabio
+eeglabio