The OctoAssociator class
A PyOcto workflow consists of two steps. First, you need to create an associator instance. In this step, you set the configuration for the model. Second, you run the associator by providing it with a list of picks and station metadata. If you’ve previously used the GaMMA or REAL associators, you might be interested in using the compatibility interfaces for this step.
- class pyocto.associator.OctoAssociator(xlim, ylim, zlim, velocity_model, time_before, min_node_size=10.0, min_node_size_location=1.5, pick_match_tolerance=1.5, min_interevent_time=3.0, exponential_edt=False, edt_pick_std=1.0, max_pick_overlap=4, n_picks=10, n_p_picks=3, n_s_picks=3, n_p_and_s_picks=3, refinement_iterations=3, time_slicing=1200.0, node_log_interval=0, queue_memory_protection_dfs_size=500000, location_split_depth=6, location_split_return=4, min_pick_fraction=0.25, second_pass_overwrites=None, n_threads=None, velocity_model_location=None, crs=None)[source]
Bases: object
The OctoAssociator is the main class of PyOcto. An instance of this associator describes the configuration of the algorithm. To start the actual association, use the associate() function. You can also use one of the alternative interfaces associate_gamma(), associate_real(), or associate_seisbench().
In addition to the core functionality, this class offers convenience functions related to association. If you want to define your search area using latitude and longitude with an automatic coordinate projection, use from_area(). If you want to identify an appropriate projection for your stations, use get_crs(). For coordinate projections, transform_stations() and transform_events() are useful helper functions. To convert an ObsPy inventory object to a station data frame for PyOcto, use inventory_to_df().
As the PyOcto output locations are only preliminary, there is a helper function to convert the outputs to the NonLinLoc input format. Check out to_nonlinloc().
This documentation explains all parameters from a technical perspective. For a more user-centric view on how to set appropriate parameters, check out this guide on parameter choices.
- Parameters:
xlim (tuple[float,float]) – Limit of the search space in kilometers in the x direction.
ylim (tuple[float,float]) – Limit of the search space in kilometers in the y direction.
zlim (tuple[float,float]) – Limit of the search space in kilometers in the depth direction. Negative values indicate locations above the surface, positive values locations below the surface.
velocity_model (VelocityModel) – The velocity model.
time_before (float) – The overlap between consecutive time slices.
min_node_size (float) – Minimum node size for association. If a node becomes smaller, the event creation process with localisation and pick refinement is triggered.
min_node_size_location (int) – Minimum node size, i.e., precision regarding discretisation, for the location algorithm. Usually smaller than min_node_size.
pick_match_tolerance (float) – Maximum difference between the predicted travel time and the observed time for associating a pick to an origin in the refinement step.
min_interevent_time (float) – Minimum time required between two events.
exponential_edt (float) – Exponentiate the individual terms in the EDT loss. This will make the loss surface more spiky, leading to better locations. However, to accurately find minima, location_split_depth and location_split_return need to be increased, leading to higher computational cost.
edt_pick_std (float) – Standard deviation for the EDT loss. Only relevant if exponential_edt is enabled.
max_pick_overlap (int) – Maximum number of picks shared between two events. Note that overlaps are only possible at the intersection of different time blocks.
n_picks (int) – Minimum required number of picks for an event.
n_p_picks (int) – Minimum required number of P picks for an event.
n_s_picks (int) – Minimum required number of S picks for an event.
n_p_and_s_picks (int) – Minimum required number of stations that have both a P and an S pick for an event.
refinement_iterations (int) – The number of localisation and pick matching iterations.
time_slicing (float) – The size of each time block.
node_log_interval (int) – If the value is larger than zero, each thread prints every time the number of nodes explored so far is divisible by the value.
queue_memory_protection_dfs_size (int) – Maximum size of the priority queue per thread. If the queue is full, all further nodes are explored using depth-first search.
location_split_depth (int) – Search depth for location splits.
location_split_return (int) – Part of the search depth for location splits that is not descended but only used to evenly sample the space. Always needs to be smaller than location_split_depth.
min_pick_fraction (float) – A distance-based pick criterion. If, for a station, less than this fraction of closer stations have at least one pick, the station picks are discarded.
second_pass_overwrites (Optional[dict[str,Any]]) – If not None, PyOcto will perform a second pass of the association procedure. In this second pass, only picks not associated in the first round will be used. The results of both passes are concatenated. The overwrites define all parameters that should be different in the second pass. A typical use case for this feature might be to associate events with many P picks but few S picks. This is, for example, a common scenario for large events picked with ML pickers. The waveform saturates and in effect the ML picker fails to identify the S picks. This case could be caught by setting a high n_picks, a low n_s_picks and a low n_p_and_s_picks. At the same time, these settings might not be advisable for the main processing because they will lead to many missed small events with lower numbers of P picks. The number of iterations for the second pass can be controlled with the keyword iterations. If it is not set, only a single second pass is performed.
n_threads (Optional[int]) – The number of threads to use. By default, the number of threads will be set to the number of available cores.
velocity_model_location (Optional[VelocityModel]) – The velocity model for location. If not set, the same model as for association will be used.
crs (Optional[CRS]) – The coordinate reference system. Required for all helper functions for coordinate transformation.
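The interplay between the four pick-count thresholds and second_pass_overwrites can be illustrated with a small standalone sketch. It does not call PyOcto and is not its implementation: the helper meets_thresholds, the way picks are counted, the example overwrite values, and the dictionary merge are all assumptions made for illustration only.

```python
from collections import Counter


def meets_thresholds(picks, n_picks, n_p_picks, n_s_picks, n_p_and_s_picks):
    """Check a candidate event against the four minimum-count criteria.

    `picks` is a list of (station, phase) tuples with phase "P" or "S".
    Hypothetical helper; PyOcto's internal counting may differ.
    """
    phases = Counter(phase for _, phase in picks)
    p_stations = {station for station, phase in picks if phase == "P"}
    s_stations = {station for station, phase in picks if phase == "S"}
    return (
        len(picks) >= n_picks
        and phases["P"] >= n_p_picks
        and phases["S"] >= n_s_picks
        and len(p_stations & s_stations) >= n_p_and_s_picks
    )


# First-pass thresholds: the default values from the class signature.
first_pass = {"n_picks": 10, "n_p_picks": 3, "n_s_picks": 3, "n_p_and_s_picks": 3}

# Illustrative overwrites: demand more picks overall, tolerate missing S picks.
second_pass_overwrites = {"n_picks": 15, "n_s_picks": 0, "n_p_and_s_picks": 0}

# Assumption: parameters not listed in the overwrites keep their first-pass value.
second_pass = {**first_pass, **second_pass_overwrites}

# A large event: 16 P picks but only one S pick (e.g. saturated waveforms).
large_event = [(f"ST{i:02d}", "P") for i in range(16)] + [("ST00", "S")]
# A small event: P and S picks on the same five stations.
small_event = [(f"ST{i:02d}", phase) for i in range(5) for phase in ("P", "S")]

first_large = meets_thresholds(large_event, **first_pass)
second_large = meets_thresholds(large_event, **second_pass)
first_small = meets_thresholds(small_event, **first_pass)
second_small = meets_thresholds(small_event, **second_pass)
```

In this sketch the large event fails the first-pass criteria because of its single S pick and is only accepted under the second-pass settings, while the small event passes the first-pass criteria but would be rejected by the stricter n_picks of the second pass. This mirrors why such settings are described as a second pass rather than as the main configuration.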
- associate(picks, stations)[source]
Run the PyOcto associator. For details on the data formats, see the data format description.
- Parameters:
picks (DataFrame) – The picks in PyOcto format
stations (DataFrame) – The stations in PyOcto format
- Return type:
tuple[DataFrame,DataFrame]
- Returns:
Two dataframes. The first contains the events. The second contains the assignment of picks to events.
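The relation between the two outputs can be sketched without PyOcto by using plain lists of dictionaries as stand-ins for the two data frames. The column names idx, time, event_idx, station and phase are assumptions for illustration; consult the data format description for the actual columns.

```python
from collections import defaultdict

# Stand-in rows for the events output (column names are assumptions).
events = [
    {"idx": 0, "time": 100.0},
    {"idx": 1, "time": 460.5},
]

# Stand-in rows for the assignments output: one row per associated pick.
assignments = [
    {"event_idx": 0, "station": "ST01", "phase": "P"},
    {"event_idx": 0, "station": "ST01", "phase": "S"},
    {"event_idx": 0, "station": "ST02", "phase": "P"},
    {"event_idx": 1, "station": "ST03", "phase": "P"},
]

# Group the assigned picks by the event they belong to.
picks_by_event = defaultdict(list)
for row in assignments:
    picks_by_event[row["event_idx"]].append((row["station"], row["phase"]))

# Join both outputs into a per-event summary.
summary = {
    event["idx"]: {
        "time": event["time"],
        "n_picks": len(picks_by_event[event["idx"]]),
    }
    for event in events
}
```

With real data frames, the same join would typically be a group-by on the event index column of the assignments.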
- associate_gamma(picks, stations)[source]
Run associator on the GaMMA input format
- Parameters:
picks (DataFrame) – Picks in GaMMA format
stations (DataFrame) – Stations in GaMMA format
- Return type:
tuple[DataFrame,DataFrame]
- Returns:
The outputs from associate
- associate_real(pick_path, station_path)[source]
Run associator on the REAL input format
- Parameters:
pick_path (Union[str,Path]) – Path of the directory containing the pick files for REAL
station_path (Union[str,Path]) – Path of the station file for REAL
- Return type:
tuple[DataFrame,DataFrame]
- Returns:
The outputs from associate
- associate_seisbench(picks, stations)[source]
Run associator on a list of SeisBench picks.
- Parameters:
picks – A list of picks as output by SeisBench. This list can be a SeisBench PickList instance.
stations (DataFrame) – Stations in PyOcto format
- Return type:
tuple[DataFrame,DataFrame]
- Returns:
The outputs from associate
- property crs: CRS | None
Get and set the local coordinate reference system if defined.
- Returns:
The local coordinate reference system.
- classmethod from_area(lat, lon, zlim, velocity_model, time_before, **kwargs)[source]
Create an associator instance based on a bounding box in latitude, longitude and depth.
- Parameters:
lat (tuple[float,float]) – Minimum and maximum latitude of study area in degrees
lon (tuple[float,float]) – Minimum and maximum longitude of study area in degrees
zlim (tuple[float,float]) – Minimum and maximum depth of study area in km
velocity_model (VelocityModel) – see the class constructor
time_before (float) – see the class constructor
kwargs – passed to class constructor
- Returns:
an instance of OctoAssociator
- static get_crs(stations, warning_limit_deg=15.0)[source]
Get a transverse Mercator projection coordinate reference system centered in the middle of the station distribution.
- Parameters:
stations (DataFrame) – A data frame with coordinates in latitude and longitude
warning_limit_deg (float) – If the along-axis distance between two stations is higher than this value, a warning is printed.
- Return type:
CRS
- Returns:
A coordinate reference system
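The idea of a station-centred transverse Mercator projection can be sketched in plain Python. The helper tmerc_proj_string is hypothetical and is not PyOcto's implementation. It assumes that the "middle" is the centre of the latitude/longitude bounding box, that kilometers are the desired unit, and that the stations do not straddle the antimeridian.

```python
import warnings


def tmerc_proj_string(stations, warning_limit_deg=15.0):
    """Build a PROJ string for a transverse Mercator projection centred on the stations.

    `stations` is a list of (latitude, longitude) tuples in degrees.
    Hypothetical helper for illustration; it does not handle the antimeridian.
    """
    lats = [lat for lat, _ in stations]
    lons = [lon for _, lon in stations]

    # Warn if the network spans more than the limit along either axis.
    lat_extent = max(lats) - min(lats)
    lon_extent = max(lons) - min(lons)
    if max(lat_extent, lon_extent) > warning_limit_deg:
        warnings.warn("Station distribution is wide; projection distortion may be large.")

    # Assumption: use the centre of the bounding box as the projection origin.
    lat0 = (max(lats) + min(lats)) / 2
    lon0 = (max(lons) + min(lons)) / 2
    return f"+proj=tmerc +lat_0={lat0} +lon_0={lon0} +units=km"


proj_string = tmerc_proj_string([(-25.0, -71.0), (-19.0, -68.0)])
```

A string of this form can be parsed by projection libraries such as pyproj. Centring the projection on the network keeps distortion small within the study area, which is why a warning for very wide station distributions is useful.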
- inventory_to_df(inventory)[source]
Convert an obspy inventory to a dataframe. Applies the coordinate projection if set.
- Parameters:
inventory – An obspy inventory object
- Return type:
DataFrame
- Returns:
A data frame with the stations that can be input to associate()
- static to_nonlinloc(assignments, path, pick_std=0.05)[source]
Write the outputs to the .obs format that can be parsed by NonLinLoc
- Parameters:
assignments (DataFrame) – The assignments as output by PyOcto
path (Union[str,Path]) – Output path for the observations
pick_std (float) – Gaussian uncertainty of the picks in seconds. Individual uncertainties per pick are currently not supported.
- Return type:
None
- transform_events(events)[source]
Project event coordinates from local coordinate system to global coordinate system. Requires the crs attribute to be set. Note that the original data frame is modified in-place.
- Parameters:
events (DataFrame) – A data frame with the events as output by associate()
- Return type:
DataFrame
- Returns:
A dataframe with additional latitude, longitude and depth columns.
- transform_stations(stations)[source]
Project stations from geographic coordinates (latitude, longitude, elevation) into the local coordinate system. Requires the crs attribute to be set. Note that the original data frame is modified in-place.
- Parameters:
stations – A data frame with the stations containing the latitude, longitude and elevation columns. Elevation needs to be provided in meters. Note that the transform flips the sign convention: elevation is in meters above zero, z is in kilometers below zero.
- Return type:
DataFrame
- Returns:
A dataframe with additional x, y and z columns.
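The sign and unit convention for the vertical axis can be illustrated with a minimal sketch. The helper elevation_to_z and the station records are hypothetical; only the convention (elevation in meters above zero, z in kilometers below zero) is taken from the description above.

```python
def elevation_to_z(elevation_m):
    """Convert elevation in meters above zero to depth z in kilometers below zero."""
    return -elevation_m / 1000.0


# Hypothetical station records with elevation in meters.
stations = [
    {"id": "ST01", "elevation": 2500.0},  # station on a mountain
    {"id": "ST02", "elevation": -120.0},  # sensor below the zero level
]

for station in stations:
    station["z"] = elevation_to_z(station["elevation"])
```

A station at 2500 m elevation therefore ends up at a negative z of -2.5 km, consistent with the zlim convention in which negative values lie above the surface.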