from IPython.core.display import HTML, Markdown, display

import numpy as np
import numpy.random as npr
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import scipy.stats as stats
import statsmodels.formula.api as smf
import pingouin as pg
import math

import ipywidgets as widgets
from IPython.display import clear_output
import os
from datetime import datetime
from ipycanvas import Canvas, hold_canvas

from sdt_exp import *

# Enable plots inside the Jupyter Notebook
%matplotlib inline

Signal detection: Part 2

Authored by Todd Gureckis

In the previous lab you learned some of the basic analysis steps involved in signal detection analysis. In particular, we learned how to compute two statistics, d’ (d-prime) and c (bias/criterion). These variables are commonly used in studies of perception to understand the performance of a subject on a task independent of their response bias (here meaning tendency to give a particular response).

In the second part of the lab we will analyze data from a simple detection experiment you will have just conducted on yourselves. You will get to put your python knowledge into practice by reading in the data and analyzing it.

The Experiment Instructions

In this experiment your job will be similar to that of a doctor. There are going to be small “clusters” of points which we might think of as a tumor or some other biological measurement. You can use the interaction element below to examine a few of these “clusters”.

Try using the widget below to slide between a large number of dots within a cluster and a small number. Sweep back and forth to get an idea of what a “cluster” can look like. The dots are spread out to a certain degree, which you can see more clearly with a large number of dots.

@widgets.interact(dots=(1,75, 5))
def draw(dots):
    return draw_stimulus(include_source=True, n_background_dots=0, n_source_dots=dots, source_sd=10, source_border=20, width=400, height=400)

The challenge is that these dots are going to be hidden in a background of “noise” which is other dots unrelated to the cluster. Here is an example of the background with no cluster present:

draw_stimulus(include_source=False, n_background_dots=30*30, n_source_dots=0, source_sd=10, source_border=20, width=400, height=400)

You can run the above cell many times to see different example patterns when the cluster is absent.

Next here is an example of a very large cluster hidden in the background noise:

draw_stimulus(include_source=True, n_background_dots=30*30, n_source_dots=60, source_sd=10, source_border=20, width=400, height=400)

From this example you can see a very clear “cluster” or tumor in the stimulus. Here is one where there is a cluster/tumor but it is a little less obvious:

draw_stimulus(include_source=True, n_background_dots=30*30, n_source_dots=50, source_sd=10, source_border=20, width=400, height=400)

Finally, here is one where it is pretty hard to tell that there is a cluster at all (oh but there is one involving only 20 dots!):

draw_stimulus(include_source=True, n_background_dots=30*30, n_source_dots=20, source_sd=10, source_border=20, width=400, height=400)

Your job in the experiment will be to view stimuli such as the ones above and judge if you think there is a cluster/tumor present in the image. It can often be hard to tell because even in the random dots there can be little clusters of dots that group together. Just try to do your best. Notice that there is a red “absent” button and a green “present” button. You should press “absent” if you think the cluster/tumor is not there, and press “present” if you think it is there. There are 200 trials overall.

Before running the next cell, set the subject number to your NYU NetID (I will anonymize this later, but it will let me check that everyone has done the experiment).

Run the Experiment!

subject_number = 'bl1611' # set your subject number here
exp = Detection_Experiment(subject_number)
exp.start_experiment()

Getting the data

After you run the experiment, the data will be written to a .csv file tied to your subject number. It is also available as a pandas dataframe in the current kernel environment as exp.trials:

exp.trials.head()
signal_present signal_type trial_num button_position subject_number
0 1 60 0 left bl1611
1 1 10 1 left bl1611
2 1 10 2 left bl1611
3 1 15 3 left bl1611
4 0 0 4 left bl1611
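
With the trial dataframe in hand, pandas groupby is a handy way to tally trials per condition. Here is a small sketch using made-up rows shaped like exp.trials (the real data will also record your responses):

```python
import pandas as pd

# a few made-up rows mimicking the exp.trials columns shown above
trials = pd.DataFrame({
    'signal_present': [1, 1, 1, 0, 0, 1],
    'signal_type':    [60, 10, 10, 0, 0, 15],
})

# how many trials of each cluster size are there?
print(trials.groupby('signal_type').size())
```

The same pattern will let you split your real data by the signal_type column when computing hits and false alarms later in the lab.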

Experiment design

In the experiment, you saw a number of trials where a field of randomly colored and positioned dots appeared. On some trials a small “cluster” of points was added to the stimulus. The number of dots in the cluster was manipulated across conditions so that it was more or less obvious that there was a cluster. In some cases the number of dots within the cluster was so small that it was hard to detect, but in other cases it might have been more visually obvious. This is very similar to the type of visual diagnosis that a doctor or airport screener might perform (deciding if a tumor or a gun is present in an image). Your task was to indicate if it was a “stimulus/cluster present” trial or a “stimulus/cluster absent” trial. The computer recorded your response and your reaction time.

An overview of the situation is shown here:

The things we manipulated!

There were actually four different types of clusters shown on any given “stimulus present” trial. One was a small cluster made up of 10 dots, one was a medium cluster of 15 dots, one was made up of 25 dots, and the last was made up of 60 dots:

As the number of dots increases, the cluster becomes more visible and easier to detect.

In total there were 100 stimulus absent trials and 25 of each of the four types of stimulus present trials. This design means that a perfect subject with robot-quality vision would say “yes” on half the trials, so there is no overall bias toward one response.

In this lab we are interested in analyzing the effect of strength of the stimulus cluster (i.e., number of dots) on the two signal detection variables we learned about in the previous lecture and lab.

Analysis Steps

Remember from the previous lab that our model for signal detection looks something like this:

and we showed how we can compute \(d'\) and \(c\) from just the number of false alarms and hits by subjecting those to a particular mathematical transformation. In this case we actually have four stimulus conditions (10, 15, 25, or 60 dots). So our goal will be to compute the hits and false alarms separately for each of the four stimulus conditions. This means that each person in our experiment will be described by eight numbers:

  • A d’ (d-prime) under (10, 15, 25, 60) visibility cluster conditions

  • A c under (10, 15, 25, 60) visibility cluster conditions
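
As a sketch of where we are headed, here is one common way to compute \(d'\) and \(c\) for a single condition from the hit and false alarm rates (the function name and counts below are made up for illustration; you will plug in your own condition-specific rates):

```python
import scipy.stats as stats

def dprime_and_c(hit_rate, fa_rate):
    # z-transform the two rates; d' is their difference and c is the
    # negative average (a standard signal detection convention)
    z_hit = stats.norm.ppf(hit_rate)
    z_fa = stats.norm.ppf(fa_rate)
    return z_hit - z_fa, -0.5 * (z_hit + z_fa)

# e.g., 20/25 hits in one condition and 20/100 false alarms overall
d_prime, c = dprime_and_c(20 / 25, 20 / 100)
print(d_prime, c)   # d' is positive; c is 0 because the rates are symmetric
```

Running this for each of the four conditions gives the eight numbers listed above.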

Once we have this we want to perform a simple hypothesis test: is performance, measured by d’, higher in the high visibility conditions? We have strong reason to suspect this is the case, but it will be a nice check of our understanding if we can relate the abstract calculation we performed in the previous lab to this simple question.

Answering this question will exercise our pandas skills for organizing the data and our use of pingouin to compute a t-test.

When computing d-prime with real data, we sometimes run into trouble if the hit rate is perfect (1.0) or the false alarm rate is perfect (0.0). D-prime is undefined in these cases, and it is probably unrealistic to assume the true rates are perfect anyway. To keep the calculations numerically stable, please use ppf_smooth (defined below) as a replacement for stats.norm.ppf. This function uses 0.01 as the minimum rate and 0.99 as the maximum. (There are somewhat more elegant ways to do this if you are publishing your analyses, see this link, but this will be fine for our purposes here.)

def ppf_smooth(rate):
    """Replacement for stats.norm.ppf that clips the rate to [0.01, 0.99].

    Input:
        rate : either a hit rate or a false alarm rate
    """
    assert 0. <= rate <= 1., "rate must be between 0. and 1."
    rate = max(rate, 0.01)
    rate = min(rate, 0.99)
    return stats.norm.ppf(rate)
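
To see why this matters, consider a subject with a perfect hit rate and zero false alarms. With the raw ppf the d-prime would be inf - (-inf); with the clipped version it stays finite (the function is repeated here only so this cell runs on its own):

```python
import scipy.stats as stats

def ppf_smooth(rate):
    # same clipping as the function above, repeated for a self-contained demo
    return stats.norm.ppf(min(max(rate, 0.01), 0.99))

# perfect performance, e.g. 25/25 hits and 0/100 false alarms
d_prime = ppf_smooth(1.0) - ppf_smooth(0.0)
print(d_prime)   # finite (about 4.65) rather than inf
```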

As a reminder, the following cell shows a list comprehension (a compact, one-line for loop) that collects the filenames for the lab data set. You can use this to find the files you need to read in with pd.read_csv(). If you need a refresher, refer back to your mental rotation lab, which should still be on your JupyterHub node!

# this is an example list comprehension which collects all the data filenames.
# the f.startswith() check skips any hidden/junk files in that folder
filenames = ['sdt1-data/' + f for f in os.listdir('sdt1-data') if not f.startswith('.')]
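Once you have the filenames, a common pattern is to read each file and stack them into one long dataframe with pd.concat. Here is a sketch that writes two tiny demo files first so it runs anywhere (in the lab, filenames comes from the sdt1-data listing above instead, and the files have the full set of exp.trials columns):

```python
import os
import tempfile
import pandas as pd

# write two tiny demo files so this sketch is self-contained;
# the subject ids and columns here are made up for illustration
tmpdir = tempfile.mkdtemp()
for subj in ['aaa111', 'bbb222']:
    pd.DataFrame({'signal_present': [1, 0],
                  'signal_type': [60, 0],
                  'subject_number': [subj, subj]}).to_csv(
        os.path.join(tmpdir, subj + '.csv'), index=False)

filenames = [os.path.join(tmpdir, f) for f in os.listdir(tmpdir)
             if not f.startswith('.')]

# read every file and stack the rows into one long dataframe
all_data = pd.concat([pd.read_csv(f) for f in filenames], ignore_index=True)
print(all_data.shape)   # two rows per subject
```

With ignore_index=True the combined dataframe gets a fresh row index, and the subject_number column keeps track of who contributed each row.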