The OctoAssociator class

A PyOcto workflow consists of two steps. First, you need to create an associator instance. In this step, you set the configuration for the model. Second, you run the associator by providing it with a list of picks and station metadata. If you’ve previously used the GaMMA or REAL associators, you might be interested in using the compatibility interfaces for this step.

class pyocto.associator.OctoAssociator(xlim, ylim, zlim, velocity_model, time_before, min_node_size=10.0, min_node_size_location=1.5, pick_match_tolerance=1.5, min_interevent_time=3.0, exponential_edt=False, edt_pick_std=1.0, max_pick_overlap=4, n_picks=10, n_p_picks=3, n_s_picks=3, n_p_and_s_picks=3, refinement_iterations=3, time_slicing=1200.0, node_log_interval=0, queue_memory_protection_dfs_size=500000, location_split_depth=6, location_split_return=4, min_pick_fraction=0.25, second_pass_overwrites=None, n_threads=None, velocity_model_location=None, crs=None)

Bases: object

The OctoAssociator is the main class of PyOcto. An instance of this associator describes the configuration of the algorithm. To start the actual association, use the associate() function. You can also use one of the alternative interfaces associate_gamma(), associate_real(), or associate_seisbench().

In addition to the core functionality, this class offers convenience functions related to association. If you want to define your search area using latitude and longitude with an automatic coordinate projection, use from_area(). If you want to identify an appropriate projection for your stations, use get_crs(). For coordinate projections, transform_stations() and transform_events() are useful helper functions. To convert an obspy inventory object to a station data frame for PyOcto, use inventory_to_df().

As the PyOcto output locations are only preliminary, there is a helper function to convert the outputs to the NonLinLoc input format. Check out to_nonlinloc().

This documentation explains all parameters from a technical perspective. For a more user-centric view on how to set appropriate parameters, check out this guide on parameter choices.
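The two-step workflow described above can be sketched as follows. This is a minimal sketch, not a recommended configuration: the search region, time_before value, and output path are hypothetical, and the velocity model is taken as an argument because its construction is not covered in this section. PyOcto is imported inside the function so the sketch can be loaded without it.

```python
def run_association(velocity_model, picks, stations, obs_path="picks.obs"):
    """Configure an associator, run it, and export the result for NonLinLoc.

    velocity_model: a PyOcto VelocityModel instance (construction not shown).
    picks, stations: data frames in PyOcto format, with the stations already
    given in local x/y/z coordinates in kilometers.
    obs_path: hypothetical output path for the NonLinLoc observation file.
    """
    # Imported lazily so the sketch can be loaded without PyOcto installed.
    from pyocto.associator import OctoAssociator

    # Step 1: create the instance. It only stores the configuration.
    associator = OctoAssociator(
        xlim=(0.0, 250.0),  # hypothetical search region in km
        ylim=(0.0, 400.0),
        zlim=(0.0, 200.0),  # depth, positive below the surface
        velocity_model=velocity_model,
        time_before=300.0,  # hypothetical value
        n_picks=10,
        n_p_and_s_picks=3,
    )

    # Step 2: run the association on picks and station metadata.
    events, assignments = associator.associate(picks, stations)

    # Optional: the locations are preliminary, so export the assignments
    # in the .obs format for relocation with NonLinLoc.
    OctoAssociator.to_nonlinloc(assignments, obs_path, pick_std=0.05)
    return events, assignments
```

All parameters not passed explicitly keep the defaults shown in the class signature.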

Parameters:
  • xlim (tuple[float, float]) – Limit of the search space in kilometers in the x direction.

  • ylim (tuple[float, float]) – Limit of the search space in kilometers in the y direction.

  • zlim (tuple[float, float]) – Limit of the search space in kilometers in the depth direction. Negative values indicate locations above the surface, positive values locations below the surface.

  • velocity_model (VelocityModel) – The velocity model.

  • time_before (float) – The overlap between consecutive time slices.

  • min_node_size (float) – Minimum node size for association. If a node becomes smaller, the event creation process with localisation and pick refinement is triggered.

  • min_node_size_location (float) – Minimum node size, i.e., precision regarding discretisation, for the location algorithm. Usually smaller than min_node_size.

  • pick_match_tolerance (float) – Maximum difference between the predicted travel time and the observed time for associating a pick to an origin in the refinement step.

  • min_interevent_time (float) – Minimum time required between two events.

  • exponential_edt (bool) – Exponentiate the individual terms in the EDT loss. This makes the loss surface more spiky, leading to better locations. However, to accurately find minima, location_split_depth and location_split_return need to be increased, leading to higher computational cost.

  • edt_pick_std (float) – Standard deviation for the EDT loss. Only relevant if exponential_edt is enabled.

  • max_pick_overlap (int) – Maximum number of picks shared between two events. Note that overlaps are only possible at the intersection of different time blocks.

  • n_picks (int) – Minimum required number of picks for an event.

  • n_p_picks (int) – Minimum required number of P picks for an event.

  • n_s_picks (int) – Minimum required number of S picks for an event.

  • n_p_and_s_picks (int) – Minimum required number of stations that have both a P and an S pick for an event.

  • refinement_iterations (int) – The number of localisation and pick matching iterations.

  • time_slicing (float) – The size of each time block.

  • node_log_interval (int) – If the value is larger than zero, each thread prints every time the number of nodes explored so far is divisible by the value.

  • queue_memory_protection_dfs_size (int) – Maximum size of the priority queue per thread. If the queue is full, all further nodes are explored using depth-first search.

  • location_split_depth (int) – Search depth for location splits.

  • location_split_return (int) – Part of search depth for location splits that is not descended but only used to evenly sample the space. Always needs to be smaller than location_split_depth.

  • min_pick_fraction (float) – A distance-based pick criterion. If, for a station, fewer than this fraction of the closer stations have at least one pick, the picks of that station are discarded.

  • second_pass_overwrites (Optional[dict[str, Any]]) – If not None, PyOcto performs a second pass of the association procedure that uses only the picks left unassociated in the first pass. The results of both passes are concatenated. The overwrites define all parameters that should differ in the second pass. A typical use case is associating events with many P picks but few S picks, a common scenario for large events picked with ML pickers: the waveform saturates and, as a result, the ML picker fails to identify the S picks. Such events can be caught by setting a high n_picks together with a low n_s_picks and a low n_p_and_s_picks. These settings are usually not advisable for the first pass, because the high n_picks would miss many small events with fewer P picks. The number of iterations of the second pass can be controlled with the keyword iterations. If it is not set, a single second pass is performed.

  • n_threads (Optional[int]) – The number of threads to use. By default, the number of threads will be set to the number of available cores.

  • velocity_model_location (Optional[VelocityModel]) – The velocity model for location. If not set, the same model as for association will be used.

  • crs (Optional[CRS]) – The coordinate reference system. Required for all helper functions for coordinate transformation.
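To illustrate how the four pick-count thresholds interact with second_pass_overwrites, here is a small standalone sketch. It is not PyOcto's internal implementation: the helper meets_thresholds and the overwrite values are hypothetical, and the sketch assumes that the keys of the overwrite dictionary are constructor parameter names, as the description above suggests.

```python
from collections import defaultdict

# First-pass thresholds, taken from the constructor defaults listed above.
FIRST_PASS = {"n_picks": 10, "n_p_picks": 3, "n_s_picks": 3, "n_p_and_s_picks": 3}

# Hypothetical overwrites for a second pass that targets events with many
# P picks but few S picks. Only the parameters that differ are listed.
SECOND_PASS_OVERWRITES = {"n_picks": 15, "n_s_picks": 0, "n_p_and_s_picks": 0}


def meets_thresholds(picks, n_picks, n_p_picks, n_s_picks, n_p_and_s_picks):
    """Check a candidate event given as a list of (station, phase) pairs."""
    phases_by_station = defaultdict(set)
    for station, phase in picks:
        phases_by_station[station].add(phase)
    n_p = sum(1 for _, phase in picks if phase == "P")
    n_s = sum(1 for _, phase in picks if phase == "S")
    # Stations that contribute both a P and an S pick.
    n_both = sum(1 for phases in phases_by_station.values() if {"P", "S"} <= phases)
    return (
        len(picks) >= n_picks
        and n_p >= n_p_picks
        and n_s >= n_s_picks
        and n_both >= n_p_and_s_picks
    )


# A large event: 16 stations with a P pick, only one of them with an S pick.
large_event = [(f"STA{i:02d}", "P") for i in range(16)] + [("STA00", "S")]

# Effective second-pass thresholds: first-pass values with overwrites applied.
second_pass = {**FIRST_PASS, **SECOND_PASS_OVERWRITES}

print(meets_thresholds(large_event, **FIRST_PASS))   # rejected: too few S picks
print(meets_thresholds(large_event, **second_pass))  # accepted in the second pass
```

In an actual configuration, the overwrite dictionary would be passed as second_pass_overwrites to the constructor rather than merged by hand.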

associate(picks, stations)

Run the PyOcto associator. For details on the data formats, see the data format description.

Parameters:
  • picks (DataFrame) – The picks in PyOcto format

  • stations (DataFrame) – The stations in PyOcto format

Return type:

tuple[DataFrame, DataFrame]

Returns:

Two dataframes. The first contains the events, the second the assignment of picks to events.
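As a rough illustration of preparing the inputs, the sketch below builds pick and station records with the standard library only. The data format description is not reproduced in this section, so the column names used here (station, phase, time as a POSIX timestamp for picks; id, x, y, z for stations) are assumptions that should be checked against it. The station identifiers and coordinates are made up.

```python
from datetime import datetime, timezone


def pick_row(station, phase, when):
    """Build one pick record; `when` must be a timezone-aware datetime."""
    if phase not in ("P", "S"):
        raise ValueError(f"unexpected phase: {phase!r}")
    # Assumed column names: station, phase, time (seconds since the epoch).
    return {"station": station, "phase": phase, "time": when.timestamp()}


def station_row(station_id, x_km, y_km, z_km):
    """Build one station record in local coordinates (km, z positive down)."""
    # Assumed column names: id, x, y, z.
    return {"id": station_id, "x": x_km, "y": y_km, "z": z_km}


pick_rows = [
    pick_row("STA01", "P", datetime(2020, 1, 1, 0, 0, 10, tzinfo=timezone.utc)),
    pick_row("STA01", "S", datetime(2020, 1, 1, 0, 0, 17, tzinfo=timezone.utc)),
    pick_row("STA02", "P", datetime(2020, 1, 1, 0, 0, 12, tzinfo=timezone.utc)),
]
station_rows = [
    station_row("STA01", 10.0, 20.0, -1.5),
    station_row("STA02", 35.0, 5.0, -0.8),
]

# With pandas available, the records convert directly to data frames:
#   picks = pandas.DataFrame(pick_rows)
#   stations = pandas.DataFrame(station_rows)
#   events, assignments = associator.associate(picks, stations)
print(pick_rows[0])
```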

associate_gamma(picks, stations)

Run the associator on the GaMMA input format.

Parameters:
  • picks (DataFrame) – Picks in GaMMA format

  • stations (DataFrame) – Stations in GaMMA format

Return type:

tuple[DataFrame, DataFrame]

Returns:

The outputs from associate

associate_real(pick_path, station_path)

Run the associator on the REAL input format.

Parameters:
  • pick_path (Union[str, Path]) – Path of the directory containing the pick files for REAL

  • station_path (Union[str, Path]) – Path of the station file for REAL

Return type:

tuple[DataFrame, DataFrame]

Returns:

The outputs from associate

associate_seisbench(picks, stations)

Run associator on a list of SeisBench picks.

Parameters:
  • picks – A list of picks as output by SeisBench. This list can be a SeisBench PickList instance.

  • stations (DataFrame) – Stations in PyOcto format

Return type:

tuple[DataFrame, DataFrame]

Returns:

The outputs from associate

property crs: CRS | None

Get and set the local coordinate reference system if defined.

Returns:

The local coordinate reference system.

classmethod from_area(lat, lon, zlim, velocity_model, time_before, **kwargs)

Create an associator instance based on a bounding box in latitude, longitude and depth.

Parameters:
  • lat (tuple[float, float]) – Minimum and maximum latitude of study area in degrees

  • lon (tuple[float, float]) – Minimum and maximum longitude of study area in degrees

  • zlim (tuple[float, float]) – Minimum and maximum depth of study area in km

  • velocity_model (VelocityModel) – see the class constructor

  • time_before (float) – see the class constructor

  • kwargs – passed to class constructor

Returns:

an instance of OctoAssociator
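A possible workflow built around from_area() is sketched below. The bounding box and parameter values are hypothetical. The sketch assumes that from_area() sets the crs attribute, which the class description implies by calling the projection automatic; transform_stations() and transform_events() both require it. PyOcto is imported inside the function so the sketch can be loaded without it.

```python
def associate_in_area(velocity_model, picks, stations):
    """Associate picks inside a latitude/longitude bounding box.

    velocity_model: a PyOcto VelocityModel instance (construction not shown).
    picks: data frame with picks in PyOcto format.
    stations: data frame with latitude, longitude and elevation columns,
        elevation given in meters.
    """
    # Imported lazily so the sketch can be loaded without PyOcto installed.
    from pyocto.associator import OctoAssociator

    associator = OctoAssociator.from_area(
        lat=(-25.0, -18.0),  # hypothetical study area in degrees
        lon=(-71.5, -68.0),
        zlim=(0.0, 200.0),  # depth range in km
        velocity_model=velocity_model,
        time_before=300.0,  # hypothetical value
        n_picks=10,  # forwarded to the class constructor through **kwargs
    )

    # Adds the local x, y and z columns; modifies `stations` in place.
    associator.transform_stations(stations)

    events, assignments = associator.associate(picks, stations)

    # Adds latitude, longitude and depth columns; modifies `events` in place.
    associator.transform_events(events)
    return events, assignments
```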

static get_crs(stations, warning_limit_deg=15.0)

Get a transverse Mercator projection coordinate reference system centered in the middle of the station distribution.

Parameters:
  • stations (DataFrame) – A data frame with coordinates in latitude and longitude

  • warning_limit_deg (float) – If the along-axis distance between two stations is higher than this value, a warning is printed.

Return type:

CRS

Returns:

A coordinate reference system
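The kind of check that warning_limit_deg describes can be illustrated with a small standalone function. This is an illustration under stated assumptions, not the code used by get_crs(): the helper check_station_extent is hypothetical, it measures the extent separately along the latitude and longitude axes, and it ignores longitude wrap-around at the antimeridian.

```python
import warnings


def check_station_extent(stations, warning_limit_deg=15.0):
    """Return the (latitude, longitude) extent of the network in degrees.

    stations: iterable of (latitude, longitude) tuples in degrees.
    A warning is issued if either extent exceeds warning_limit_deg, because
    a single local projection becomes less accurate over large areas.
    """
    stations = list(stations)
    lats = [lat for lat, _ in stations]
    lons = [lon for _, lon in stations]
    lat_extent = max(lats) - min(lats)
    lon_extent = max(lons) - min(lons)
    if max(lat_extent, lon_extent) > warning_limit_deg:
        warnings.warn(
            f"Station extent of {max(lat_extent, lon_extent):.1f} degrees "
            f"exceeds the limit of {warning_limit_deg:.1f} degrees."
        )
    return lat_extent, lon_extent


# Made-up station coordinates for demonstration.
extent = check_station_extent([(-24.0, -70.5), (-19.5, -68.5), (-21.0, -69.0)])
print(extent)
```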

inventory_to_df(inventory)

Convert an obspy inventory to a dataframe. Applies the coordinate projection if set.

Parameters:

inventory – An obspy inventory object

Return type:

DataFrame

Returns:

A data frame with the stations that can be input to associate()

static to_nonlinloc(assignments, path, pick_std=0.05)

Write the outputs to the .obs format that can be parsed by NonLinLoc

Parameters:
  • assignments (DataFrame) – The assignments as output by PyOcto

  • path (Union[str, Path]) – Output path for the observations

  • pick_std (float) – Gaussian uncertainty of the picks in seconds. Currently, does not support individual uncertainties per pick.

Return type:

None

transform_events(events)

Project event coordinates from local coordinate system to global coordinate system. Requires the crs attribute to be set. Note that the original data frame is modified in-place.

Parameters:

events (DataFrame) – A data frame with the events as output by associate()

Return type:

DataFrame

Returns:

A dataframe with additional latitude, longitude and depth columns.

transform_stations(stations)

Project stations from geographic coordinates (latitude, longitude, elevation) into the local coordinate system. Requires the crs attribute to be set. Note that the original data frame is modified in-place.

Parameters:

stations (DataFrame) – A data frame with the stations containing the latitude, longitude and elevation columns. Elevation needs to be provided in meters. Note that the transform flips the sign convention: elevation is in meters above zero, z is in kilometers below zero.

Return type:

DataFrame

Returns:

A dataframe with additional x, y and z columns.
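The sign and unit convention for the vertical coordinate can be made concrete with a one-line conversion. The helper name elevation_m_to_z_km is hypothetical; it only restates the convention given above (elevation in meters above zero, z in kilometers below zero) and does not perform the horizontal projection.

```python
def elevation_m_to_z_km(elevation_m):
    """Convert an elevation in meters (positive up) to z in km (positive down)."""
    return -elevation_m / 1000.0


# A station 1500 m above zero has a negative z.
print(elevation_m_to_z_km(1500.0))  # -1.5
# A borehole sensor 250 m below zero has a positive z.
print(elevation_m_to_z_km(-250.0))  # 0.25
```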