ctdam.parser.file_collection module
- ctdam.parser.file_collection.get_collection(path_to_files, file_suffix='cnv', only_metadata=False, pattern='', sorting_key=None)
Factory to create instances of FileCollection, depending on input type.
- Parameters:
path_to_files (Path|str) – The path to the directory to search for files.
file_suffix (str) – The suffix to search for. (Default value = 'cnv')
only_metadata (bool) – Whether to read only metadata. (Default value = False)
pattern (str) – A filter for file selection. (Default value = '')
sorting_key (Callable|None) – A callable that returns the filename-part to use to sort the collection. (Default value = None)
- Return type:
Type[FileCollection]
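The factory idea can be sketched as a lookup from suffix to collection class. This is a minimal illustration, not the library's implementation: the `*Sketch` classes, the `_COLLECTIONS` mapping, and the fallback to the generic class are all assumptions made for the example.

```python
from pathlib import Path


class FileCollectionSketch:
    """Hypothetical stand-in for FileCollection; it only stores its arguments."""

    def __init__(self, path_to_files, file_suffix, only_metadata=False,
                 pattern="", sorting_key=None):
        self.path_to_files = Path(path_to_files)
        self.file_suffix = file_suffix
        self.only_metadata = only_metadata
        self.pattern = pattern
        self.sorting_key = sorting_key


class CnvCollectionSketch(FileCollectionSketch):
    """Hypothetical stand-in for CnvCollection."""


class HexCollectionSketch(FileCollectionSketch):
    """Hypothetical stand-in for HexCollection."""


# Assumed mapping from file suffix to the specialised collection class.
_COLLECTIONS = {"cnv": CnvCollectionSketch, "hex": HexCollectionSketch}


def get_collection_sketch(path_to_files, file_suffix="cnv", only_metadata=False,
                          pattern="", sorting_key=None):
    """Pick a collection class based on the suffix and instantiate it."""
    # Normalise inputs such as ".HEX" to "hex" before the lookup.
    suffix = file_suffix.strip(".").lower()
    # Assumption: unknown suffixes fall back to the generic collection.
    collection_cls = _COLLECTIONS.get(suffix, FileCollectionSketch)
    return collection_cls(path_to_files, suffix, only_metadata, pattern, sorting_key)


cnv_collection = get_collection_sketch("data")
hex_collection = get_collection_sketch("data", ".HEX")
```

Keeping the dispatch in one dictionary means a new file type only needs a new entry, while callers keep using a single entry point.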
- class ctdam.parser.file_collection.FileCollection(path_to_files, file_suffix, only_metadata=False, pattern='', sorting_key=None)
Bases: UserList
A representation of multiple files of the same kind. These files share the same suffix and are otherwise closely connected to each other. A common use case would be the collection of CNVs to allow for easier processing or integration of field calibration measurements.
- Parameters:
path_to_files (str|Path) – The path to the directory to search for files.
file_suffix (str) – The suffix to search for. (Default value = 'cnv')
only_metadata (bool) – Whether to read only metadata. (Default value = False)
pattern (str) – A filter for file selection. (Default value = '')
sorting_key (Callable|None) – A callable that returns the filename-part to use to sort the collection. (Default value = None)
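Because the class derives from `UserList`, a collection behaves like an ordinary list of its loaded files. The sketch below shows that behaviour with a hypothetical `NameListSketch` class holding plain file names; it is only meant to illustrate what the `UserList` base provides, not how FileCollection fills itself.

```python
from collections import UserList


class NameListSketch(UserList):
    """Hypothetical list-like container, analogous in spirit to FileCollection."""

    def __init__(self, names=None, file_suffix="cnv"):
        # Keep only the names that end with the requested suffix.
        super().__init__(n for n in (names or []) if n.endswith("." + file_suffix))
        self.file_suffix = file_suffix


collection = NameListSketch(["a.cnv", "b.hex", "c.cnv"], "cnv")

# Length, indexing and iteration come from UserList without extra code.
number_of_files = len(collection)
first_file = collection[0]
all_files = [name for name in collection]
```

The wrapped items live in the `data` attribute, which is a regular Python list.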
- extract_file_type(suffix)
Determines the file type using the input suffix.
- Parameters:
suffix (str) – The file suffix.
- Return type:
Type[DataFile]
- collect_files(pattern='', sorting_key=<function FileCollection.<lambda>>)
Creates a list of target files by searching the given directory recursively. The list can be sorted with the sorting_key parameter, a Callable that identifies the part of the filename to use for sorting.
- Parameters:
pattern (str) – A filter for file selection. Is given to rglob. (Default value = '')
sorting_key (Callable|None) – The part of the filename to use in sorting. (Default value = lambda file: int(file.stem.split("_")[3]))
- Return type:
list[Path]
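The recursive search and the default sorting key can be sketched with `pathlib` alone. The function name `collect_files_sketch`, the way the pattern is wrapped in wildcards, and the example file names are assumptions. Only the use of `rglob` and the default key come from the documentation above.

```python
import tempfile
from pathlib import Path


def collect_files_sketch(directory, file_suffix="cnv", pattern="",
                         sorting_key=lambda file: int(file.stem.split("_")[3])):
    """Recursively gather files by suffix, then sort them by a filename part."""
    # Assumption: a non-empty pattern is wrapped in wildcards for rglob.
    glob = f"*{pattern}*.{file_suffix}" if pattern else f"*.{file_suffix}"
    files = list(Path(directory).rglob(glob))
    if sorting_key is not None:
        files.sort(key=sorting_key)
    return files


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "leg2").mkdir()
    # Hypothetical names: the fourth "_"-separated part is a cast number.
    for name in ("EMB_2023_ctd_0010.cnv",
                 "leg2/EMB_2023_ctd_0002.cnv",
                 "EMB_2023_ctd_0003.hex"):
        (root / name).touch()
    found = [file.name for file in collect_files_sketch(root)]
    filtered = [file.name for file in collect_files_sketch(root, pattern="0010")]
```

The default key converts the fourth underscore-separated part of the file stem to an integer, so cast 2 sorts before cast 10. File names that do not follow such a scheme need a different `sorting_key`, or `None` to skip sorting.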
- load_files(only_metadata=False)
Creates Python instances of each file.
- Parameters:
only_metadata (bool) – Whether to load only file metadata. (Default value = False)
- Return type:
list[DataFile]
- get_dataframes(event_log=False, coordinates=False, time_correction=False, cast_identifier=False)
Collects all individual dataframes and allows additional column creation.
- Parameters:
event_log (bool) – (Default value = False)
coordinates (bool) – (Default value = False)
time_correction (bool) – (Default value = False)
cast_identifier (bool) – (Default value = False)
- Return type:
list[DataFrame]
- get_collection_dataframe(list_of_dfs=None)
Creates one DataFrame from the individual ones, by concatenation.
- Parameters:
list_of_dfs (list[DataFrame]|None) – A list of the individual DataFrames. (Default value = None)
- Return type:
DataFrame
- tidy_collection_dataframe(df)
Apply the different dataframe edits to the given dataframe.
- Parameters:
df (DataFrame) – A DataFrame to edit.
- Return type:
DataFrame
- use_bad_flag_for_nan(df)
Replace all NaN values with the bad flag value defined inside the files.
- Parameters:
df (DataFrame) – The dataframe to edit.
- Return type:
DataFrame
- set_dtype_to_float(df)
Use the float-dtype for all DataFrame columns.
- Parameters:
df (DataFrame) – The dataframe to edit.
- Return type:
DataFrame
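Concatenation, NaN replacement and the float cast can be sketched together with pandas. The function name, the column names, the order of the two tidy steps, and the value of `BAD_FLAG` are assumptions for illustration. In the library, the bad flag is read from the files themselves.

```python
import pandas as pd

# Assumed bad flag value; the real one is defined inside the data files.
BAD_FLAG = -9.990e-29


def tidy_collection_sketch(frames, bad_flag=BAD_FLAG):
    """Concatenate per-file frames, fill gaps with the bad flag, cast to float."""
    # Columns missing from one of the frames become NaN after concatenation.
    combined = pd.concat(frames, ignore_index=True)
    combined = combined.fillna(bad_flag)
    return combined.astype(float)


# Two hypothetical casts with partly different columns.
cast_1 = pd.DataFrame({"prDM": [1, 2], "t090C": [10.5, 10.4]})
cast_2 = pd.DataFrame({"prDM": [1, 2], "sal00": [35.0, 35.1]})

tidy = tidy_collection_sketch([cast_1, cast_2])
```

After this step every column has a float dtype and no NaN values remain, so downstream code has to treat the bad flag as the missing-value marker.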
- class ctdam.parser.file_collection.CnvCollection(*args, **kwargs)
Bases: FileCollection
Specific methods to work with collections of .cnv files.
- get_dataframes(event_log=False, coordinates=False, time_correction=False, cast_identifier=False)
Collects all individual dataframes and allows additional column creation.
- Parameters:
event_log (bool) – (Default value = False)
coordinates (bool) – (Default value = False)
time_correction (bool) – (Default value = False)
cast_identifier (bool) – (Default value = False)
- Return type:
list[DataFrame]
- class ctdam.parser.file_collection.HexCollection(*args, xmlcon_pattern='', path_to_xmlcons='', **kwargs)
Bases: FileCollection
Specific methods to work with collections of .hex files.
Mainly concerned with detecting the corresponding .XMLCON files.
- get_xmlcons()
Returns all .xmlcon files found inside the root directory and its children that match a given pattern.
Uses the global sorting_key to try to sort the xmlcons in the same way as the collection. This is meant to be used in the future for a more specific hex-xmlcon matching.
- Return type:
list[str]
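A search of this kind can be sketched as follows. The function name, the case-insensitive suffix comparison, the substring match on the file name, and the example files are assumptions. The documentation only states that matching files are found recursively, sorted with the sorting key, and returned as strings.

```python
import tempfile
from pathlib import Path


def get_xmlcons_sketch(root, xmlcon_pattern="", sorting_key=None):
    """Return the paths of all xmlcon files below root as sorted strings."""
    matches = [
        path for path in Path(root).rglob("*")
        # Compare suffixes in lower case so .xmlcon and .XMLCON both match.
        if path.is_file()
        and path.suffix.lower() == ".xmlcon"
        and xmlcon_pattern in path.name
    ]
    # With sorting_key=None the paths are sorted by their natural ordering.
    matches.sort(key=sorting_key)
    return [str(path) for path in matches]


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "configs").mkdir()
    # Hypothetical file names for the demonstration.
    for name in ("cast_02.xmlcon", "configs/cast_01.XMLCON", "cast_02.hex"):
        (base / name).touch()
    xmlcons = get_xmlcons_sketch(base, sorting_key=lambda path: path.stem)
    xmlcon_names = [Path(entry).name for entry in xmlcons]
```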