A Reference Class for representing a group of consistency test participants

Fields

participants

A list of Participant class instances.

Methods

add_participant(participant)

Add a passed participant to the participantgroup's list of participants. The participant's entry in the list is named based on the participant's id. Note that if you try to add a participant with an id that's identical to one of the participants already in the participantgroup's list of participants, the already existing same-id participant is overwritten.
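The overwrite behavior described above follows from storing entries in a list under their id. The sketch below illustrates this with base R only. The `make_participant` helper and the plain-list "participants" are hypothetical stand-ins, not synr's Participant class or its actual implementation.

```r
# Hypothetical stand-in for a Participant: a plain list with an id field.
make_participant <- function(id, note) list(id = id, note = note)

# Mimics the described behavior: each entry is stored under its id.
add_participant <- function(participants, participant) {
  participants[[participant$id]] <- participant
  participants
}

participants <- list()
participants <- add_participant(participants, make_participant("p1", "first"))
participants <- add_participant(participants, make_participant("p2", "second"))
# Same id as an existing entry: the earlier "p1" is replaced, not duplicated.
participants <- add_participant(participants, make_participant("p1", "replacement"))

names(participants)        # "p1" "p2"
participants[["p1"]]$note  # "replacement"
```

If ids in your raw data are not unique across participants, entries may therefore be lost silently, so it can be worth checking for duplicated ids before adding.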

add_participants(participant_list)

Go through a passed list of Participant instances and add each one using the add_participant() method.

check_valid_get_twcv_scores( min_complete_graphemes = 5, dbscan_eps = 20, dbscan_min_pts = 4, max_var_tight_cluster = 150, max_prop_single_tight_cluster = 0.6, safe_num_clusters = 3, safe_twcv = 250, complete_graphemes_only = TRUE, symbol_filter = NULL )

Checks if participants' data are valid based on passed arguments. This method aims to identify participants who gave too few responses or varied their response colors too little, and marks them as invalid. Note that there are no absolutely correct values, as what counts as 'too little variation' is highly subjective. You might need to tweak parameters to be in line with your project's criteria, especially if you use a color space other than CIELUV, since the default values are based on what seems to make sense in a CIELUV context. If you use the results in a research article, make sure to reference synr and specify what parameter values you passed to the function.

This method relies heavily on the DBSCAN algorithm and the package 'dbscan', and involves calculating a synr-specific 'Total Within-Cluster Variance' (TWCV) score. You can find more information, and what the parameters here mean, in the documentation for the function validate_get_twcv. Note that DBSCAN clustering and related calculations are performed on a per-participant basis, before they are summarized in the data frame returned by this method.

Parameters

  • min_complete_graphemes The minimum number of graphemes with complete (all non-NA color) responses that a participant's data must have for them to not be categorized as invalid based on this criterion. Defaults to 5.

  • dbscan_eps Radius of 'epsilon neighborhood' when applying (on a per-participant basis) DBSCAN clustering. Defaults to 20.

  • dbscan_min_pts Minimum number of points required in the epsilon neighborhood for core points (including the core point itself). Defaults to 4.

  • max_var_tight_cluster Maximum variance for an identified DBSCAN cluster to be considered 'tight-knit'. Defaults to 150.

  • max_prop_single_tight_cluster Maximum proportion of points allowed to be within a single 'tight-knit' cluster (if a participant's data exceed this limit, they are classified as invalid). Defaults to 0.6.

  • safe_num_clusters Minimum number of identified DBSCAN clusters (including 'noise' cluster only if it consists of at least 'dbscan_min_pts' points) that guarantees validity of a participant's data if points are 'non-tight-knit'. Defaults to 3.

  • safe_twcv Minimum total within-cluster variance (TWCV) score that guarantees a participant's data's validity if points are 'non-tight-knit'. Defaults to 250.

  • complete_graphemes_only A logical value. If TRUE, only data from graphemes that have all non-NA color responses are used; if FALSE, even data from graphemes with some NA color responses are used. Defaults to TRUE.

  • symbol_filter A character vector (or NULL) that specifies which graphemes' data to use. Defaults to NULL, meaning data from all of the participants' graphemes will be used.

Returns

A data frame with columns

  • valid Holds TRUE for participants whose data were classified as valid, FALSE for participants whose data were classified as invalid.

  • reason_invalid Strings which describe for each participant why their data were deemed invalid. Participants whose data were classified as valid have empty strings here.

  • twcv Numeric column which holds participants' calculated TWCV scores (NA for participants who had no/too few graphemes with complete responses).

  • num_clusters Numeric column which holds, for each participant, the number of identified clusters counting toward the tally compared with 'safe_num_clusters' (NA for participants who had no/too few graphemes with complete responses).
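As a rough illustration of how the returned data frame might be used, the sketch below builds a toy data frame with the four columns listed above and pulls out valid participants and the reasons given for invalid ones. All values, the reason text, and the ids are made up for illustration. The sketch also assumes that rows appear in the same order as the ids returned by get_ids(), which is not stated above.

```r
# Made-up values purely for illustration; this is not real synr output.
validation_df <- data.frame(
  valid = c(TRUE, FALSE, TRUE),
  reason_invalid = c("", "hypothetical reason text", ""),
  twcv = c(812.4, NA, 301.7),
  num_clusters = c(4, NA, 2),
  stringsAsFactors = FALSE
)

# Hypothetical ids, ASSUMED to be in the same order as the rows above.
ids <- c("p1", "p2", "p3")

# Keep only participants classified as valid.
valid_ids <- ids[validation_df$valid]

# Look at why the remaining participants were classified as invalid.
invalid_reasons <- validation_df$reason_invalid[!validation_df$valid]

valid_ids        # "p1" "p3"
invalid_reasons  # "hypothetical reason text"
```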

get_ids()

Returns a character vector with all ids for participants associated with the participantgroup.

get_mean_colors(symbol_filter = NULL, na.rm = FALSE)

Returns an nx3 data frame of mean colors for participants in the group, where the columns represent chosen color space axis 1, 2, and 3, respectively (e.g. 'R', 'G', 'B' if 'sRGB' was specified upon participantgroup creation).

If na.rm=FALSE, for each participant calculates the mean color if all of the participant's graphemes only have response colors that are non-NA, otherwise puts NA values in that participant's row of the data frame. If na.rm=TRUE, for each participant calculates the mean color for all of the participant's valid response colors, while ignoring NA response colors. Note that for participants whose graphemes ALL have at least one NA response color value, NA values are put in the row corresponding to that participant, regardless of what na.rm is set to.

If a character vector is passed to symbol_filter, only data from graphemes with symbols in the passed vector are used when calculating each participant's mean color.
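The effect of na.rm can be sketched with base R for a single participant. The `mean_color` function below is a hypothetical simplification, not synr's implementation: it treats each row of a matrix as one response color on three color space axes, and returns NA values whenever no valid response colors are available.

```r
# Hypothetical helper: mean color over one participant's response colors.
# 'responses' is a matrix with one row per response and one column per axis.
mean_color <- function(responses, na.rm = FALSE) {
  complete_rows <- stats::complete.cases(responses)
  # With na.rm = FALSE, any NA response color makes the result NA.
  if (!na.rm && !all(complete_rows)) return(rep(NA_real_, 3))
  # With no valid response colors at all, the result is NA either way.
  if (!any(complete_rows)) return(rep(NA_real_, 3))
  colMeans(responses[complete_rows, , drop = FALSE])
}

responses <- rbind(
  c(10, 20, 30),
  c(NA, NA, NA),  # a missing response color
  c(30, 40, 50)
)

mean_color(responses)                # NA NA NA
mean_color(responses, na.rm = TRUE)  # 20 30 40
```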

get_mean_consistency_scores( method = "euclidean", symbol_filter = NULL, na.rm = FALSE )

Returns a vector of mean consistency scores for participants in the group. If na.rm=FALSE, for each participant calculates the mean consistency score if all of the participant's graphemes only have response colors that are non-NA, otherwise puts an NA value for that participant in the returned vector. If na.rm=TRUE, for each participant calculates the mean consistency score for all of the participant's graphemes that only have non-NA response colors, while ignoring graphemes that have at least one NA response color value. Note that for participants whose graphemes ALL have at least one NA response color value, an NA is put in the returned vector for that participant, regardless of what na.rm is set to.

If a character vector is passed to symbol_filter, only data from graphemes with symbols in the passed vector are used when calculating each participant's mean score.

Use the method argument to specify what kind of color space distances should be used when calculating consistency scores (usually 'manhattan' or 'euclidean'; see the documentation for the base R dist function for all options).
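To make the role of the method argument concrete, here is a minimal base R sketch. It assumes that a grapheme's consistency score is the sum of pairwise distances between its response colors, which is a common definition but is not spelled out above. The `consistency_score` function is hypothetical and not part of synr.

```r
# Three response colors for one grapheme (rows), on three color axes (columns).
responses <- rbind(
  c(0, 0, 0),
  c(3, 4, 0),
  c(0, 4, 0)
)

# Hypothetical helper: sum of all pairwise distances between response colors.
# ASSUMPTION: this mirrors how a per-grapheme consistency score is defined.
consistency_score <- function(responses, method = "euclidean") {
  if (anyNA(responses)) return(NA_real_)
  sum(stats::dist(responses, method = method))
}

consistency_score(responses, method = "euclidean")  # 5 + 4 + 3 = 12
consistency_score(responses, method = "manhattan")  # 7 + 4 + 3 = 14

# Averaging per-grapheme scores, with one grapheme lacking a valid score:
scores <- c(12, NA, 20)
mean(scores)                # NA
mean(scores, na.rm = TRUE)  # 16
```

The same responses give different scores under different distance methods, so cut-off values derived for one method should not be applied to scores calculated with another.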

get_mean_response_times(symbol_filter = NULL, na.rm = FALSE)

Returns the mean response times, with respect to Grapheme instances associated with each participant. If na.rm=TRUE, for each participant returns mean response time even if there are missing response times. If na.rm=FALSE, returns mean response time if there is at least one response time value for at least one of the participant's graphemes. If a character vector is passed to symbol_filter, only data from graphemes with symbols in the passed vector are used when calculating each participant's mean response time.

get_numbers_all_colored_graphemes(symbol_filter = NULL)

Returns a numeric vector holding, for each participant, the number of graphemes for which all response colors are valid (non-NA). If a character vector is passed to symbol_filter, only data connected to graphemes with symbols in the passed vector are used.
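The counting rule, and the way symbol_filter narrows it, can be sketched for one participant with base R. The list-of-matrices layout and the `count_all_colored` function are hypothetical and only meant to mirror the description above, not synr's internals.

```r
# Hypothetical participant: one response-color matrix per grapheme symbol.
graphemes <- list(
  A = rbind(c(10, 20, 30), c(12, 22, 28)),
  B = rbind(c(50, 60, 70), c(NA, NA, NA)),  # one missing response color
  "7" = rbind(c(5, 5, 5), c(6, 6, 6))
)

# Hypothetical helper: count graphemes whose response colors are all non-NA,
# optionally restricted to the symbols in symbol_filter.
count_all_colored <- function(graphemes, symbol_filter = NULL) {
  if (!is.null(symbol_filter)) {
    graphemes <- graphemes[names(graphemes) %in% symbol_filter]
  }
  sum(vapply(graphemes, function(m) !anyNA(m), logical(1)))
}

count_all_colored(graphemes)                           # 2 ("A" and "7")
count_all_colored(graphemes, symbol_filter = LETTERS)  # 1 (only "A")
```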

has_participants()

Returns TRUE if there is at least one participant in the participantgroup's participants list, otherwise returns FALSE.

save_plots( save_dir = NULL, file_format = "png", dpi = 300, cutoff_line = FALSE, mean_line = FALSE, grapheme_size = 2, grapheme_angle = 0, foreground_color = "black", background_color = "white", symbol_filter = NULL, ... )

Goes through all participants and for each one produces and saves a ggplot2 plot that describes the participant's grapheme color responses and per-grapheme consistency scores, using the ggsave function.

If a character vector is passed to symbol_filter, only data for graphemes with symbols in the passed vector are used.

If save_dir is not specified, plots are saved to the current working directory. Otherwise, plots are saved to the specified directory. Each file is saved using the specified file_format, e.g. 'jpg' (see the ggplot2::ggsave documentation for a list of supported formats), at the resolution specified with the dpi argument.

If cutoff_line=TRUE, each plot will include a blue line that indicates the value 135.30, which is the synesthesia cut-off score recommended by Rothen, Seth, Witzel & Ward (2013) for the L*u*v color space. If mean_line=TRUE, the plot will include a green line that indicates the participant's mean consistency score for graphemes with all-valid response colors (if the participant has any such graphemes). If a vector is passed to symbol_filter, this green line represents the mean score for ONLY the symbols included in the filter.

Pass a value to grapheme_size to adjust the size of graphemes shown at the bottom of the plot, e.g. increasing the size if there's empty space otherwise, or decreasing the size if the graphemes don't fit. Similarly, you can use the grapheme_angle argument to rotate the graphemes, which might help them fit better.

Apart from the ones above, all other arguments that ggsave accepts (e.g. 'scale') also work with this function, since any additional arguments are passed on to ggsave.