ctdam.parser.datafiles module

class ctdam.parser.datafiles.DataFile(path_to_file, only_header=False)

Bases: object

The base class for all Sea-Bird data files, that is, .cnv, .btl, and .bl files. One instance of this class, or of one of its subclasses, represents one data text file. The different parts of such a file are stored in individual lists or dictionaries. The data table is loaded as a NumPy array and can be converted to a pandas DataFrame. Behavior specific to a file type is implemented in the subclasses.

Parameters:

path_to_file (Path | str) – The path to the data file.
only_header (bool) – Whether to stop reading the file after the metadata header.

path_to_file

The path to the file this object represents

Type: Path

file_name

The file name

Type: str

file_dir

The directory the file resides in

Type: Path

raw_file_data

The text file input

Type: list

header

The full file header

Type: list

sbe9_data

Device-specific information

Type: list

metadata

Non-Sea-Bird metadata

Type: dict

metadata_list

Unstructured metadata for easier export

Type: list

data_table_description

The column names and other info

Type: list

sensor_data

The sensor lines

Type: list

sensors

XML-parsed sensor data

Type: dict

processing_info

Everything after the sensor data

Type: list

data

The data table

Type: list

metadata

Parsed custom metadata

Type: dict

start_time

The start time of the data acquisition

Type: datetime

start_position

Latitude, longitude tuple

Type: tuple

cruise

The name of the cruise the data belongs to

Type: str

station

The station identifier of the data

Type: str

event_name

The streamlined data event name, consisting of cruise and station name

Type: str

read_event_information(regex_string='(?P<c>[a-z]{1,3}\\d{1,3})(-|_|\\/)?(?P<cn>1|2)?(-|_)(?P<s>\\d{1,4})(-|_)(?P<e>\\d{1,2})', leading_zeroes=False)

Saves the event metadata of the cast in self.station.

Additionally saves cruise information in self.cruise, if possible. The data sources are the file name and the custom metadata header, in this order.

Parameters:

regex_string (str) – The regex to use for event metadata retrieval
leading_zeroes (bool) – Whether to save the info with leading zeroes (Default value = False)
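
To illustrate what the default pattern captures, the following sketch applies it to a made-up file name with Python's re module. The file name is hypothetical, and reading the groups c, s and e as cruise, station and event is an assumption drawn from the method description. How the library applies the pattern internally, including the effect of leading_zeroes, is not shown here.

```python
import re

# Default pattern from the signature above, written as a raw string.
EVENT_PATTERN = (
    r"(?P<c>[a-z]{1,3}\d{1,3})(-|_|\/)?(?P<cn>1|2)?(-|_)"
    r"(?P<s>\d{1,4})(-|_)(?P<e>\d{1,2})"
)

# Hypothetical file name, chosen only for this example.
file_name = "emb123_4_01.cnv"

match = re.search(EVENT_PATTERN, file_name)
# groupdict() maps each named group to the text it matched (or None).
groups = match.groupdict() if match else {}
print(groups)
```

Because the character class is [a-z], an upper-case cruise prefix would only match if the pattern were compiled with re.IGNORECASE or the name were lower-cased first.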

read_file()

Reads and structures all the different information present in the file.

Lists and dictionaries are the data structures of choice. Basic prefix checking is used to distinguish the different kinds of header information.
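
Prefix checking of this kind can be sketched as follows. This is a minimal illustration and not the library's implementation: the function name split_by_prefix and the hash_lines key are hypothetical, and the sketch assumes the usual Sea-Bird header layout in which device lines start with "*", custom metadata lines with "**", descriptive lines with "#", and the header ends at a line starting with "*END*".

```python
def split_by_prefix(lines):
    """Sort the lines of a Sea-Bird style text file by their prefix (sketch)."""
    sections = {"sbe9_data": [], "metadata_list": [], "hash_lines": [], "data": []}
    in_header = True
    for line in lines:
        if not in_header:
            # Everything after the end marker belongs to the data table.
            sections["data"].append(line)
        elif line.startswith("*END*"):
            in_header = False
        elif line.startswith("**"):
            # Check "**" before "*", since the shorter prefix also matches.
            sections["metadata_list"].append(line[2:].strip())
        elif line.startswith("*"):
            sections["sbe9_data"].append(line[1:].strip())
        elif line.startswith("#"):
            sections["hash_lines"].append(line[1:].strip())
    return sections


# Hypothetical sample lines, for illustration only.
sample = [
    "* Sea-Bird SBE 9 Data File:",
    "** Station = 4",
    "# nquan = 2",
    "# name 0 = prDM: Pressure, Digiquartz [db]",
    "*END*",
    "      1.000     10.500",
]
sections = split_by_prefix(sample)
```

The order of the checks matters: testing for "*" first would classify the custom metadata lines and the end marker as device information.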

reading_start_time()

Extracts the cast start time from the metadata header.

Return type: datetime | None
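
A minimal sketch of such an extraction, assuming the header contains a line of the form "# start_time = Dec 11 2012 08:40:44 [...]". The function name parse_start_time and the sample line are hypothetical, and the library may locate or parse the timestamp differently.

```python
import re
from datetime import datetime

# Assumed timestamp layout: abbreviated month, day, year, time.
START_TIME = re.compile(r"start_time\s*=\s*(\w{3} \d{2} \d{4} \d{2}:\d{2}:\d{2})")


def parse_start_time(header_lines):
    """Return the first start_time found in the header lines, or None (sketch)."""
    for line in header_lines:
        match = START_TIME.search(line)
        if match:
            # %b depends on the locale; English month abbreviations are assumed.
            return datetime.strptime(match.group(1), "%b %d %Y %H:%M:%S")
    return None


# Hypothetical header line, for illustration only.
header = ["# start_time = Dec 11 2012 08:40:44 [Instrument's time stamp, header]"]
start = parse_start_time(header)
```

Returning None when no line matches mirrors the documented return type datetime | None.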

reading_start_position()

Extracts the cast's starting position.

Return type: Tuple

sensor_xml_to_flattened_dict(sensor_data)

Reads the raw XML sensor input and creates a multilevel dictionary, dropping the two outermost dictionaries, as each of them holds only a single entry.

Parameters: sensor_data (str) – The raw XML sensor data.
Return type: list[dict] | dict
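
The idea of dropping the outer levels can be sketched with the standard library's xml.etree.ElementTree. This is an illustration under assumptions, not the library's implementation: the helper names and the sample XML (a Sensors element wrapping sensor elements) are hypothetical, and XML attributes are ignored for brevity.

```python
import xml.etree.ElementTree as ET


def element_to_dict(element):
    """Convert an element into nested dicts; leaf elements become strings."""
    children = list(element)
    if not children:
        return (element.text or "").strip()
    return {child.tag: element_to_dict(child) for child in children}


def sensors_to_dicts(xml_text):
    """Skip the two outer levels and return one dict per sensor entry."""
    root = ET.fromstring(xml_text)
    return [element_to_dict(sensor) for sensor in root.findall("sensor")]


# Hypothetical sensor XML, for illustration only.
xml_text = """
<Sensors count="2">
  <sensor Channel="1">
    <TemperatureSensor SensorID="55">
      <SerialNumber>1234</SerialNumber>
      <CalibrationDate>01-Jan-20</CalibrationDate>
    </TemperatureSensor>
  </sensor>
  <sensor Channel="2">
    <ConductivitySensor SensorID="3">
      <SerialNumber>5678</SerialNumber>
    </ConductivitySensor>
  </sensor>
</Sensors>
"""
sensors = sensors_to_dicts(xml_text)
```

Iterating over the sensor elements directly is what removes the two wrapper levels; each list entry starts at the sensor-type element.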

structure_metadata(metadata_list)

Creates a dictionary to store custom metadata, of which Sea-Bird allows 12 lines in each file.

Parameters: metadata_list (list) – A list of the individual lines of metadata found in the file
Return type: dict
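
A minimal sketch of turning such lines into a dictionary, assuming each line has the form "key = value". The function name metadata_lines_to_dict and the sample lines are hypothetical, and the separator the library actually expects is not stated in this documentation.

```python
def metadata_lines_to_dict(metadata_list):
    """Build a dict from 'key = value' lines, skipping lines without '=' (sketch)."""
    result = {}
    for line in metadata_list:
        # partition() splits at the first "=", so values may contain "=" themselves.
        key, separator, value = line.partition("=")
        if separator:
            result[key.strip()] = value.strip()
    return result


# Hypothetical metadata lines, for illustration only.
lines = ["Cruise = EMB123", "Station = 0004", "free text without separator"]
metadata = metadata_lines_to_dict(lines)
```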

define_output_path(file_path=None, file_name=None, file_type='.csv')

Creates a Path object holding the desired output path.

Parameters:

file_path (Path | str | None) – Directory the file sits in (Default value = self.file_dir)
file_name (str | None) – The original file name (Default value = self.file_name)
file_type (str) – The file suffix (Default value = “.csv”)

Return type:

Path
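
How such a path can be assembled with pathlib is sketched below. The function name build_output_path and the sample values are hypothetical. The sketch replaces the existing suffix of the file name; whether the library replaces or appends the suffix is an assumption.

```python
from pathlib import Path


def build_output_path(file_dir, file_name, file_type=".csv"):
    """Join directory and file name, then swap the suffix for file_type (sketch)."""
    return (Path(file_dir) / file_name).with_suffix(file_type)


# Hypothetical directory and file name, for illustration only.
output_path = build_output_path("processed", "cast_001.cnv")
print(output_path.name)
```

with_suffix only replaces the last suffix, so a name like "cast.tar.gz" would become "cast.tar.csv".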

to_csv(data, with_header=True, output_file_path=None, output_file_name=None)

Writes a .csv file from the given data.

Parameters:

data (DataFrame | ndarray) – The source data to use.
with_header (bool) – Whether to include the header in the output (Default value = True)
output_file_path (Path | str | None) – File directory (Default value = None)
output_file_name (str | None) – Original file name (Default value = None)

selecting_columns(list_of_columns, df)

Alters the DataFrame so that it holds only the given columns.

Parameters:

list_of_columns (list | str) – A collection of columns
df (DataFrame) – Dataframe (Default value = None)