Open
48 commits
73731f9
added .janno column Species
nevrome Aug 24, 2025
9fcbbf1
turned Collection_ID into a list column
nevrome Aug 24, 2025
7ae46fc
added a Custodian_Institution column
nevrome Aug 24, 2025
f2c2fec
added .janno columns for cultural and archaeological group attribution
nevrome Aug 27, 2025
ff70a71
added the new Chromosomal_Anomalies .janno column
nevrome Aug 27, 2025
014e154
soft-retiring of the Capture_Type ReferenceGenome
nevrome Aug 27, 2025
45ab988
turned Damage into a list-column
nevrome Aug 27, 2025
1823987
mechanism to apply adjustments to .janno columns Endogenous and Damag…
nevrome Aug 27, 2025
6858403
added the new submitted_md5 column to the .ssf implementation and als…
nevrome Aug 28, 2025
566ef0b
replaced Source_Tissue with Source_Material + Source_Material_Note
nevrome Aug 28, 2025
6afce07
note fields should not be list columns
nevrome Aug 28, 2025
7f9b8b2
stylish-haskell
nevrome Aug 28, 2025
acc47e6
reworking the tests after the preceding changes
nevrome Aug 28, 2025
ca6174e
moved the Species column after the mandatory ones
nevrome Aug 28, 2025
f3576be
testing two new possible error states
nevrome Aug 28, 2025
5f31750
update of golden test data
nevrome Aug 28, 2025
6c6400c
stylish-haskell
nevrome Aug 28, 2025
4c232d1
first, entirely untested algorithm to automatically position _Note co…
nevrome Sep 8, 2025
7411850
testing and tweaking of makeHeaderWithAdditionalColumns + removing no…
nevrome Sep 9, 2025
4581e6d
brought back some note columns for the init template
nevrome Sep 9, 2025
35f2eca
stylish-haskell
nevrome Sep 9, 2025
e329520
adjusted regular tests
nevrome Sep 9, 2025
7707e3e
adjustments to golden test test data
nevrome Sep 9, 2025
12a8f32
added a little test for the new makeHeaderWithAdditionalColumns
nevrome Sep 9, 2025
1820432
implemented code layout changes suggested by @stschiff
nevrome Sep 13, 2025
cb2b59a
changelog entry
nevrome Sep 13, 2025
ca59cd9
Merge pull request #358 from poseidon-framework/jannoColOutSort
nevrome Sep 13, 2025
b7b840e
merge conflict
nevrome Oct 27, 2025
667edfb
solving merge conflict
nevrome Oct 27, 2025
d20720c
stylish haskell
nevrome Oct 27, 2025
0d4a557
update of changelog, to bring it in sync with the schema repo dev PR
nevrome Dec 1, 2025
b276cdb
added the new column Alternative_IDs_Context
nevrome Dec 8, 2025
b864005
ran and updated tests, found a bug in the implementation
nevrome Dec 8, 2025
2f4b817
Merge branch 'master' into poseidon300cols
nevrome Dec 15, 2025
dacf80d
added new column Individual_ID
nevrome Jan 8, 2026
6745a51
merge conflict
nevrome Jan 16, 2026
f5cbffa
solved merge conflict
nevrome Jan 16, 2026
0abb550
added the option WISC2013 to the Capture_Type janno type
nevrome Jan 16, 2026
638f213
first draft of a mechanism to construct values from .janno/.ssf field…
nevrome Jan 16, 2026
f5ffe44
moved Endogenous and Damage rescaling to the respective make functions
nevrome Jan 17, 2026
4076e0b
adjusted tests
nevrome Jan 17, 2026
3e767e4
stylish haskell
nevrome Jan 17, 2026
54ec268
handling of ReferenceGenome in Capture_Type depending on the Poseidon…
nevrome Jan 18, 2026
f6ca17a
switch to Poseidon v3.0.0.
nevrome Jan 18, 2026
7132fd9
added a warning about the outdated Source_Tissue column
nevrome Jan 18, 2026
c699e45
made some cross-column consistency checks dependent on the Poseidon v…
nevrome Jan 18, 2026
fa035f8
changelog
nevrome Jan 18, 2026
aa4e4db
Merge pull request #364 from poseidon-framework/versionedcsvparsing
nevrome Jan 18, 2026
18 changes: 18 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,21 @@
+- V X.X.X.X:
+    - Introduced smart .janno field construction based on the relevant Poseidon version.
+    - Changes to .janno columns according to Poseidon v3.0.0:
+        - Replaced column `Source_Tissue` with column `Source_Material`.
+        - New column `Individual_ID`.
+        - New column `Species`.
+        - New column `Alternative_IDs_Context` linked to `Alternative_IDs`.
+        - New column `Custodian_Institution`.
+        - New columns `Cultural_Era` + `Cultural_Era_URL` and `Archaeological_Culture` + `Archaeological_Culture_URL`.
+        - New column `Chromosomal_Anomalies`.
+        - Made column `Collection_ID` a list column.
+        - Soft-retired the option `ReferenceGenome` in the column `Capture_Type`.
+        - Added a rescaling feature for the columns `Endogenous` and `Damage` for packages below Poseidon v3.0.0.
+        - Made column `Damage` a list column.
+        - Added the option `WISC2013` to the column `Capture_Type`.
+        - Changed the handling of `_Note` columns. Previously, they were explicitly specified as part of the `JannoRow` record type. Now they are treated as arbitrary additional columns that are sorted into place algorithmically when .janno files are written (e.g. in `forge`). See `makeHeaderWithAdditionalColumns`.
+    - Changes to .ssf columns according to Poseidon v3.0.0:
+        - New column `submitted_md5`.
 - V 1.6.8.0:
     - Added a mechanism to check for the presence and completeness of usually optional .janno and .ssf columns. It is exclusively used in `validate`, where a user can set one or multiple of these additional mandatory columns with `-j,--mandatoryJannoColumn` and `-s,--mandatorySSFColumn`.
     - Fixed the golden tests for `validate`. They had become ineffective because `validate` no longer writes to stdout.
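The changelog lists three version-dependent behaviours: smart field construction based on the Poseidon version, rescaling of `Endogenous` and `Damage` below v3.0.0, and the soft retirement of `ReferenceGenome`. The Haskell sketch below illustrates the general pattern only. The `PoseidonVersion` newtype mirrors the wrapper visible in the `Rectify.hs` hunk; `poseidonV3`, `rescale`, `makeEndogenous`, `makeDamage`, the `CaptureType` subset and `parseCaptureType` are hypothetical. It also assumes that pre-v3.0.0 packages store these values as percentages (0-100) while v3.0.0 uses proportions (0-1), and that "soft-retired" means the option is still accepted for older packages. This diff confirms neither assumption.

```haskell
import Data.Version (Version, makeVersion, showVersion)

-- Stand-in for the PoseidonVersion wrapper seen in the Rectify.hs hunk.
newtype PoseidonVersion = PoseidonVersion Version
  deriving (Eq, Ord, Show)

-- Hypothetical threshold for the version-dependent behaviour below.
poseidonV3 :: PoseidonVersion
poseidonV3 = PoseidonVersion (makeVersion [3, 0, 0])

-- ASSUMPTION: packages below v3.0.0 store these values as percentages
-- (0-100), v3.0.0 and later as proportions (0-1).
rescale :: String -> PoseidonVersion -> Double -> Either String Double
rescale column pv x
  | scaled < 0 || scaled > 1 = Left (column ++ " out of range: " ++ show x)
  | otherwise                = Right scaled
  where
    scaled = if pv < poseidonV3 then x / 100 else x

makeEndogenous :: PoseidonVersion -> Double -> Either String Double
makeEndogenous = rescale "Endogenous"

-- Damage is a list column now, so every entry is rescaled individually.
makeDamage :: PoseidonVersion -> [Double] -> Either String [Double]
makeDamage pv = traverse (rescale "Damage" pv)

-- Illustrative subset of the Capture_Type options.
data CaptureType = Shotgun | C1240K | WISC2013 | ReferenceGenome
  deriving (Eq, Show)

-- ASSUMPTION: "soft-retired" is modelled as accepted below v3.0.0 and
-- rejected from v3.0.0 on.
parseCaptureType :: PoseidonVersion -> String -> Either String CaptureType
parseCaptureType pv@(PoseidonVersion v) s = case s of
  "Shotgun"  -> Right Shotgun
  "1240K"    -> Right C1240K
  "WISC2013" -> Right WISC2013
  "ReferenceGenome"
    | pv < poseidonV3 -> Right ReferenceGenome
    | otherwise       -> Left ("ReferenceGenome is retired in Poseidon v" ++ showVersion v)
  other -> Left ("Unknown Capture_Type: " ++ other)
```

Because the version is an explicit argument, the same raw value can yield different results per package. This is presumably also why `readJannoFile` and `parseJannoRowFromNamedRecord` now take a version parameter in the hunks below.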
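With the new `_Note` handling, additional columns are sorted into the header by `makeHeaderWithAdditionalColumns`, whose implementation is not part of this diff. As a minimal sketch of one plausible placement rule, the hypothetical `placeAdditionalColumns` below puts each `<Column>_Note` directly after `<Column>` when that column is in the base header and appends all other additional columns in their original order. It is not the package's actual algorithm.

```haskell
import Data.List (partition)

-- Hypothetical helper illustrating one placement rule; NOT the package's
-- makeHeaderWithAdditionalColumns.
placeAdditionalColumns :: [String] -> [String] -> [String]
placeAdditionalColumns base extra = concatMap withNote base ++ rest
  where
    noteFor col = col ++ "_Note"
    -- split additional columns into notes that belong to a base column
    -- and everything else (kept in input order)
    (attached, rest) = partition (`elem` map noteFor base) extra
    -- emit a base column followed by its note column, if present
    withNote col = col : filter (== noteFor col) attached
```

A note column without a matching base column (e.g. `Bar_Note`) simply ends up among the appended columns.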
49 changes: 25 additions & 24 deletions src/Poseidon/CLI/Jannocoalesce.hs
@@ -3,28 +3,29 @@

module Poseidon.CLI.Jannocoalesce where

-import Poseidon.Janno (JannoRow (..), JannoRows (..),
-parseJannoRowFromNamedRecord,
-readJannoFile, writeJannoFile)
-import Poseidon.Package (PackageReadOptions (..),
-defaultPackageReadOptions,
-getJointJanno,
-readPoseidonPackageCollection)
-import Poseidon.Utils (PoseidonException (..), PoseidonIO,
-logDebug, logInfo, logWarning)
+import Poseidon.Janno (JannoRow (..), JannoRows (..),
+parseJannoRowFromNamedRecord,
+readJannoFile, writeJannoFile)
+import Poseidon.Package (PackageReadOptions (..),
+defaultPackageReadOptions,
+getJointJanno,
+readPoseidonPackageCollection)
+import Poseidon.Utils (PoseidonException (..), PoseidonIO,
+logDebug, logInfo, logWarning)

-import Control.Monad (filterM, forM_, when)
-import Control.Monad.Catch (MonadThrow, throwM)
-import Control.Monad.IO.Class (liftIO)
-import qualified Data.ByteString.Char8 as BSC
-import qualified Data.Csv as Csv
-import qualified Data.HashMap.Strict as HM
-import qualified Data.IORef as R
-import Data.List ((\\))
-import Data.Text (pack, replace, unpack)
-import System.Directory (createDirectoryIfMissing)
-import System.FilePath (takeDirectory)
-import Text.Regex.TDFA ((=~))
+import Control.Monad (filterM, forM_, when)
+import Control.Monad.Catch (MonadThrow, throwM)
+import Control.Monad.IO.Class (liftIO)
+import qualified Data.ByteString.Char8 as BSC
+import qualified Data.Csv as Csv
+import qualified Data.HashMap.Strict as HM
+import qualified Data.IORef as R
+import Data.List ((\\))
+import Data.Text (pack, replace, unpack)
+import Poseidon.PoseidonVersion (latestPoseidonVersion)
+import System.Directory (createDirectoryIfMissing)
+import System.FilePath (takeDirectory)
+import Text.Regex.TDFA ((=~))

-- the source can be a single janno file, or a set of base directories as usual.
data JannoSourceSpec = JannoSourceSingle FilePath | JannoSourceBaseDirs [FilePath]
@@ -48,7 +49,7 @@ data JannoCoalesceOptions = JannoCoalesceOptions
runJannocoalesce :: JannoCoalesceOptions -> PoseidonIO ()
runJannocoalesce (JannoCoalesceOptions sourceSpec target outSpec fields overwrite sKey tKey maybeStrip) = do
JannoRows sourceRows <- case sourceSpec of
-JannoSourceSingle sourceFile -> readJannoFile [] sourceFile
+JannoSourceSingle sourceFile -> readJannoFile latestPoseidonVersion [] sourceFile
JannoSourceBaseDirs sourceDirs -> do
let pacReadOpts = defaultPackageReadOptions {
_readOptIgnoreChecksums = True
@@ -57,7 +58,7 @@ runJannocoalesce (JannoCoalesceOptions sourceSpec target outSpec fields overwrit
, _readOptOnlyLatest = True
}
getJointJanno <$> readPoseidonPackageCollection pacReadOpts sourceDirs
-JannoRows targetRows <- readJannoFile [] target
+JannoRows targetRows <- readJannoFile latestPoseidonVersion [] target

newJanno <- makeNewJannoRows sourceRows targetRows fields overwrite sKey tKey maybeStrip

@@ -123,7 +124,7 @@ mergeRow cp targetRow sourceRow fields overwrite sKey tKey = do
-- fill in the target row with dummy values for desired fields that might not be present yet
targetComplete = HM.union targetRowRecord (HM.fromList $ map (, BSC.empty) sourceKeysDesired)
newRowRecord = HM.mapWithKey fillFromSource targetComplete
-parseResult = Csv.runParser . parseJannoRowFromNamedRecord [] $ newRowRecord
+parseResult = Csv.runParser . parseJannoRowFromNamedRecord latestPoseidonVersion [] $ newRowRecord
logInfo $ "matched target " ++ BSC.unpack (targetComplete HM.! BSC.pack tKey) ++
" with source " ++ BSC.unpack (sourceRowRecord HM.! BSC.pack sKey)
case parseResult of
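In `mergeRow`, the target row is first completed with empty placeholders through a left-biased `HM.union`, so existing target values take precedence, and `HM.mapWithKey fillFromSource` then fills in the desired fields. The sketch below reproduces that two-step pattern using `Data.Map.Strict` from `containers` instead of `Data.HashMap.Strict` and `String` instead of `ByteString`. `mergeRowSketch` is hypothetical, and its filling rule (take the source value if the field is desired and the target value is empty, or always when `overwrite` is set) is an assumption, since `fillFromSource` itself is not shown in the diff.

```haskell
import qualified Data.Map.Strict as M

type Row = M.Map String String

-- Hypothetical analogue of the merge step in mergeRow.
mergeRowSketch :: Bool -> [String] -> Row -> Row -> Row
mergeRowSketch overwrite desired target source = M.mapWithKey fill targetComplete
  where
    -- desired fields that the source row actually provides
    sourceKeysDesired = filter (`M.member` source) desired
    -- left-biased union: values already in the target win over "" placeholders
    targetComplete = M.union target (M.fromList [(k, "") | k <- sourceKeysDesired])
    -- ASSUMED rule: copy from the source if the field is desired and the
    -- target value is empty, or unconditionally when overwrite is set
    fill k v
      | k `elem` sourceKeysDesired && (overwrite || null v) = M.findWithDefault v k source
      | otherwise = v
```

In the real code the merged record is then parsed back with `parseJannoRowFromNamedRecord latestPoseidonVersion`, so the coalesced row is validated against the latest schema.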
54 changes: 28 additions & 26 deletions src/Poseidon/CLI/Rectify.hs
@@ -4,32 +4,34 @@ module Poseidon.CLI.Rectify (
runRectify, RectifyOptions (..), PackageVersionUpdate (..), ChecksumsToRectify (..)
) where

-import Poseidon.Contributor (ContributorSpec (..))
-import Poseidon.EntityTypes (HasNameAndVersion (..),
-PacNameAndVersion (..),
-renderNameWithVersion)
-import Poseidon.GenotypeData (GenotypeDataSpec (..),
-GenotypeFileSpec (..))
-import Poseidon.Janno (writeJannoFileWithoutEmptyCols)
-import Poseidon.Package (PackageReadOptions (..),
-PoseidonPackage (..),
-defaultPackageReadOptions,
-readPoseidonPackageCollection,
-writePoseidonPackage)
-import Poseidon.Utils (PoseidonIO, getChecksum, logDebug,
-logInfo, logWarning)
-import Poseidon.Version (VersionComponent (..),
-updateThreeComponentVersion)
+import Poseidon.Contributor (ContributorSpec (..))
+import Poseidon.EntityTypes (HasNameAndVersion (..),
+PacNameAndVersion (..),
+renderNameWithVersion)
+import Poseidon.GenotypeData (GenotypeDataSpec (..),
+GenotypeFileSpec (..))
+import Poseidon.Janno (writeJannoFileWithoutEmptyCols)
+import Poseidon.Package (PackageReadOptions (..),
+PoseidonPackage (..),
+defaultPackageReadOptions,
+readPoseidonPackageCollection,
+writePoseidonPackage)
+import Poseidon.PoseidonVersion (PoseidonVersion (..))
+import Poseidon.Utils (PoseidonIO, getChecksum, logDebug,
+logInfo, logWarning)
+import Poseidon.Version (VersionComponent (..),
+updateThreeComponentVersion)

-import Control.DeepSeq ((<$!!>))
-import Control.Monad (when)
-import Control.Monad.IO.Class (MonadIO, liftIO)
-import Data.List (nub)
-import Data.Maybe (fromJust)
-import Data.Time (UTCTime (..), getCurrentTime)
-import Data.Version (Version (..), makeVersion, showVersion)
-import System.Directory (doesFileExist, removeFile)
-import System.FilePath ((</>))
+import Control.DeepSeq ((<$!!>))
+import Control.Monad (when)
+import Control.Monad.IO.Class (MonadIO, liftIO)
+import Data.List (nub)
+import Data.Maybe (fromJust)
+import Data.Time (UTCTime (..), getCurrentTime)
+import Data.Version (Version (..), makeVersion,
+showVersion)
+import System.Directory (doesFileExist, removeFile)
+import System.FilePath ((</>))

data RectifyOptions = RectifyOptions
{ _rectifyBaseDirs :: [FilePath]
@@ -96,7 +98,7 @@ updatePoseidonVersion :: Maybe Version -> PoseidonPackage -> PoseidonIO Poseidon
updatePoseidonVersion Nothing pac = return pac
updatePoseidonVersion (Just ver) pac = do
logDebug "Updating Poseidon version"
-return pac { posPacPoseidonVersion = ver }
+return pac { posPacPoseidonVersion = PoseidonVersion ver }

addContributors :: Maybe [ContributorSpec] -> PoseidonPackage -> PoseidonIO PoseidonPackage
addContributors Nothing pac = return pac
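After this change, `posPacPoseidonVersion` holds a `PoseidonVersion` rather than a bare `Data.Version`, so `updatePoseidonVersion` has to wrap the version it receives from the command line. Below is a pure, simplified sketch of that step, assuming a hypothetical two-field `Package` record in place of `PoseidonPackage` and leaving out the `PoseidonIO` logging. `renderPoseidonVersion` is hypothetical as well.

```haskell
import Data.Version (Version, makeVersion, showVersion)

-- Hypothetical stand-ins; the real PoseidonPackage has many more fields.
newtype PoseidonVersion = PoseidonVersion Version
  deriving (Eq, Ord, Show)

data Package = Package
  { posPacName            :: String
  , posPacPoseidonVersion :: PoseidonVersion
  } deriving (Eq, Show)

-- Pure analogue of updatePoseidonVersion: the CLI still supplies a plain
-- Data.Version, which has to be wrapped before it is stored.
updatePoseidonVersion :: Maybe Version -> Package -> Package
updatePoseidonVersion Nothing    pac = pac
updatePoseidonVersion (Just ver) pac = pac { posPacPoseidonVersion = PoseidonVersion ver }

-- Unwrap for display, e.g. "3.0.0".
renderPoseidonVersion :: PoseidonVersion -> String
renderPoseidonVersion (PoseidonVersion v) = showVersion v
```

In this sketch the derived `Ord` instance simply compares the wrapped `Version` values, which is what version-gated checks would rely on.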
5 changes: 3 additions & 2 deletions src/Poseidon/CLI/Validate.hs
@@ -27,6 +27,7 @@ import qualified Data.ByteString.Char8 as Bchs
import Data.List (groupBy, intercalate, sortOn)
import Data.Yaml (decodeEither')
import Poseidon.EntityTypes (IndividualInfo (..))
+import Poseidon.PoseidonVersion (latestPoseidonVersion)
import System.Exit (exitFailure, exitSuccess)

-- | A datatype representing command line options for the validate command
@@ -106,12 +107,12 @@ runValidate (ValidateOptions (ValPlanGeno geno) _ _ noExitCode _) = do
conclude True noExitCode
runValidate (ValidateOptions (ValPlanJanno path) mandatoryJannoCols _ noExitCode _) = do
logInfo $ "Validating: " ++ path
-(JannoRows entries) <- readJannoFile mandatoryJannoCols path
+(JannoRows entries) <- readJannoFile latestPoseidonVersion mandatoryJannoCols path
logInfo $ "All " ++ show (length entries) ++ " entries are valid"
conclude True noExitCode
runValidate (ValidateOptions (ValPlanSSF path) _ mandatorySSFCols noExitCode _) = do
logInfo $ "Validating: " ++ path
-(SeqSourceRows entries) <- readSeqSourceFile mandatorySSFCols path
+(SeqSourceRows entries) <- readSeqSourceFile latestPoseidonVersion mandatorySSFCols path
logInfo $ "All " ++ show (length entries) ++ " entries are valid"
conclude True noExitCode
runValidate (ValidateOptions (ValPlanBib path) _ _ noExitCode _) = do
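`validate` passes the user's `-j,--mandatoryJannoColumn` and `-s,--mandatorySSFColumn` selections on to `readJannoFile` and `readSeqSourceFile`, now together with `latestPoseidonVersion`, presumably because a standalone .janno or .ssf file carries no package version of its own. To illustrate what a presence-and-completeness check for such columns can look like, here is a hypothetical `checkMandatory` over rows represented as `Data.Map.Strict` maps. The real implementation is not shown in this diff.

```haskell
import qualified Data.Map.Strict as M

type Row = M.Map String String

-- Hypothetical check: every mandatory column must appear in the header
-- and hold a non-empty value in every row (rows are numbered from 1).
checkMandatory :: [String] -> [String] -> [Row] -> Either String ()
checkMandatory mandatory header rows = do
  mapM_ present mandatory
  mapM_ complete (zip [1 :: Int ..] rows)
  where
    present c
      | c `elem` header = Right ()
      | otherwise       = Left ("Missing mandatory column: " ++ c)
    complete (i, row) = mapM_ (filled i row) mandatory
    filled i row c = case M.lookup c row of
      Just v | not (null v) -> Right ()
      _ -> Left ("Empty mandatory column " ++ c ++ " in row " ++ show i)
```

Using `Either` makes the check stop at the first problem. A real validator might prefer to collect all problems before reporting.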