Audio scanning instructions for choosing high speech activity and high turn-taking activity clips

Workflow:

For each child...

  1. Listen to the entire file, noting the approximate onset times of the target child's vocalization bursts:
  • The listener may lightly multi-task during this process while listening to the audio at up to 2x speed (up to 4x speed if the child is clearly sleeping, but the listener must backtrack to the first evidence of waking and replay it at 1x speed).
  • When the child is vocalizing, the listener should listen at no more than 1.5x speed, except when:
    • there are multiple relevant speakers that should be noted (then: max 1x speed), or
    • the child is only whining/crying for a long period (then: max 2x speed, but backtrack at longer, more regular, or possibly more interactive vocalizations and resume playback at 1x speed).
  • In addition to the onset time, briefly note anything about the type/style of speech and the responsiveness of those around.
  • Ignore vocalization bursts that are mostly crying, as well as clips in which the researcher is present.
  • Highlight clips that you think are likely to be among the top clips for that child based on your whole-recording listening. Our strategy was to shade promising clips light green (good) or dark green (excellent).
  2. Choose from among the noted vocalization bursts:
  • Listen again to each candidate clip, adding to the notes any further information about the vocalization or turn-taking activity quality (e.g., "highly variable babbling" or "tickling game"), including its approximate duration.
  • Eliminate clips that overlap with previously transcribed segments.
  • Eliminate clips that contain significant amounts of background noise.
  • All else being equal, give preference to clips featuring speech from underrepresented foreground speakers.
  • From the pool of candidate clips, select the N turn-taking clips with the richest interaction.
    • Rich interaction: temporally contingent vocalization between the target child and at least one other person
    • Ideal features: high variety in vocalization types and more frequent speaker transitions
  • From the remaining clips, select the N vocal activity clips based on the highest-volume samples of the most mature and/or variable spontaneous (i.e., non-imitative) vocal behavior by the target child. The intention is to capture the child's (best) ability to produce different vocalizations.
    • Rich vocal activity: clips featuring the target child's most mature and/or diverse vocalizations of the day are valued over clips that simply have dense target child vocalizations.
    • For children who are not yet using lexical utterances, top maturity is phonologically diverse canonical babbling.
    • For children using lexical utterances, top maturity is marked by morphological complexity and utterance length.
    • Low-maturity sounds include crying/fussing sounds and laughter.
    • Ideal features: high density, high maturity, and high diversity vocalizations.
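Step 2 depends on listener judgment, but its filtering rules and its two-stage selection order can be made concrete. The following is a minimal Python sketch. It assumes each candidate clip has been given a `noisy` flag and hypothetical numeric ratings (`turn_score` for interaction richness, `vocal_score` for vocal maturity/diversity). It also assumes that previously transcribed segments are available as (start, end) pairs in seconds. `Candidate`, `overlaps`, and `select_clips` are illustrative names that are not part of the original workflow. The preference for underrepresented foreground speakers is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # Hypothetical fields; the original notes are free-text spreadsheet cells.
    onset: float        # clip start, seconds from recording start
    offset: float       # clip end, seconds
    noisy: bool         # significant background noise
    turn_score: float   # listener's rating of interaction richness
    vocal_score: float  # listener's rating of vocal maturity/diversity


def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) share time."""
    return a_start < b_end and b_start < a_end


def select_clips(candidates, transcribed, n):
    """Return (turn_taking, vocal_activity) lists of up to n clips each."""
    # Drop noisy clips and clips overlapping previously transcribed segments.
    pool = [
        c for c in candidates
        if not c.noisy
        and not any(overlaps(c.onset, c.offset, s, e) for s, e in transcribed)
    ]
    # First choose the n clips with the richest interaction...
    turn_taking = sorted(pool, key=lambda c: c.turn_score, reverse=True)[:n]
    chosen = {id(c) for c in turn_taking}
    # ...then choose vocal activity clips only from the clips that remain.
    remaining = [c for c in pool if id(c) not in chosen]
    vocal = sorted(remaining, key=lambda c: c.vocal_score, reverse=True)[:n]
    return turn_taking, vocal


clips = [
    Candidate(60, 180, noisy=False, turn_score=4, vocal_score=2),
    Candidate(600, 720, noisy=False, turn_score=1, vocal_score=5),
    Candidate(1200, 1320, noisy=True, turn_score=5, vocal_score=5),
    Candidate(2000, 2120, noisy=False, turn_score=3, vocal_score=1),
]
turn_taking, vocal = select_clips(clips, transcribed=[(1900, 2100)], n=1)
print([c.onset for c in turn_taking], [c.onset for c in vocal])  # [60] [600]
```

Choosing turn-taking clips before vocal activity clips mirrors the order in step 2. As a result, a clip that scores well on both criteria is used for turn-taking and cannot be chosen twice.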

Practical notes

  • Pass 1 was done by Marisa Casillas or a trained student assistant (MW). Pass 2 was done by Marisa Casillas.
  • We reviewed clips using VLC media player. The +, -, and = hotkeys (faster, slower, and normal playback speed) are very useful, and remapping = to * makes it possible to control playback with one hand.
  • It's okay to collapse adjacent vocalizations into one note during the first pass; whether they form a single sequence can be considered more carefully in the second pass.
  • We informally listed sleeping breaks and researcher arrival/departure times in the notes, marking them with gray cells.
  • This is a subjective and imperfect process. For example, we noticed that we tended to be more generous in considering the vocalizations of young/quiet children than older/more vocal children during the first pass.
  • When the child was producing many vocalizations over a short period, with short silences between vocalization sequences, we also tried to note the onset times of the longer, more substantial subsequences.
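Collapsing adjacent vocalizations into a single note, while still recording the onsets of substantial subsequences, amounts to merging time intervals at two gap thresholds. The following is a minimal Python sketch of that idea. It assumes vocalizations are available as (onset, offset) pairs in seconds. `merge_bursts`, `note_onsets`, and the gap values are hypothetical; in the original process these boundaries were judged by ear.

```python
def merge_bursts(intervals, max_gap):
    """Merge (onset, offset) intervals separated by gaps of at most max_gap seconds."""
    merged = []
    for onset, offset in sorted(intervals):
        if merged and onset - merged[-1][1] <= max_gap:
            # Extend the current burst; max() guards against nested intervals.
            merged[-1][1] = max(merged[-1][1], offset)
        else:
            merged.append([onset, offset])
    return [tuple(m) for m in merged]


def note_onsets(intervals, sequence_gap, subsequence_gap):
    """Map each sequence (coarse merge) to the onsets of its subsequences (fine merge)."""
    sequences = merge_bursts(intervals, sequence_gap)
    subsequences = merge_bursts(intervals, subsequence_gap)
    return {
        seq: [sub[0] for sub in subsequences if seq[0] <= sub[0] <= seq[1]]
        for seq in sequences
    }


vocs = [(10, 12), (13, 15), (20, 22), (23, 24), (60, 61)]
print(note_onsets(vocs, sequence_gap=10, subsequence_gap=2))
# {(10, 24): [10, 20], (60, 61): [60]}
```

Because the smaller gap always yields a finer split than the larger one, each subsequence falls inside exactly one sequence. The first-pass note can therefore list the sequence onset together with its subsequence onsets.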
  • Our audio notes are available here.