Temporal Structure in Zebra Finch Song: Implications for the Motor Code and Learning Process
Abstract
One of the touchstone questions in neuroscience is how the nervous system encodes complex behavioral sequences such as speech. With experience-dependent learning, well-defined anatomy and complex temporal organization, zebra finch song has served as an excellent model system for these questions. Male songs are learned from older males during a sensitive period that includes song memorization and vocal learning guided by auditory feedback. Once learned, song acoustics are hierarchically organized into syllables, continuous stretches of vocalization separated by silent gaps, which are arranged into stereotyped sequences termed motifs; on a finer scale, syllables are composed of one or more notes, vocalizations with a homogeneous spectral profile. Although much is known about the song system, progress has been limited by conflicting data on the neural basis of the acoustic hierarchy and the role this organization plays during learning: while behavioral and electrophysiological studies have suggested separate circuits and learning stages for individual syllables and syllable sequences, these models have been challenged by physiological evidence that songs are actually driven by a clock-like mechanism that does not segment songs into different units.
We have analyzed and modeled trial-to-trial timing variability in zebra finch song acoustics to investigate whether the hierarchy is in fact represented in the song system and learning process. Using automated template matching and dynamic time warping, we made millisecond-precise timing measurements in tens of thousands of recordings of both adult and juvenile song. In each adult song, we find rendition-to-rendition tempo variability that is spread across syllables and gaps; however, syllable lengths stretch and compress with tempo changes proportionally less than gaps, i.e., they are less "elastic." Such non-uniformity is at odds with the simplest clock-based model in which songs are driven by a timing mechanism that paces song evenly across syllable-gap sequences. On the other hand, in a subsequent analysis we factored out tempo changes and used the remaining variability to investigate subsyllabic timescales that contradict the hierarchical model as well. Here, we find length variability that is specific to 10-msec song slices and independent of neighboring vocalization, yet correlated across motifs, providing the first behavioral evidence for a 5-10 msec timescale of song representation and an interaction with a neuromodulatory source operating on a much slower timescale. We have developed a model of song production constrained by the timing data; modeling suggests that adult song may be produced by an underlying chain of activity on a single 5-10 msec timescale, but with properties such as synaptic strength that do correspond to the acoustic hierarchy.
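As an illustration of the dynamic time warping step mentioned above, the following is a minimal sketch of the textbook algorithm, not the implementation used in this work. It aligns a rendition to a template and returns the warping path, from which local stretching and compression could be read off. The function name `dtw_path`, the one-dimensional feature sequences, and the absolute-difference local cost are all assumptions made for illustration; real song analysis would more plausibly align spectral feature vectors.

```python
import numpy as np

def dtw_path(x, y):
    """Align 1-D sequences x and y with classic dynamic time warping.

    Returns (total_cost, path), where path is a list of (i, j) index
    pairs matching x[i] to y[j]. Hypothetical helper for illustration.
    """
    n, m = len(x), len(y)
    # cost[i, j] = minimal accumulated cost of aligning x[:i] with y[:j]
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])  # assumed local distance
            cost[i, j] = d + min(cost[i - 1, j - 1],  # advance both
                                 cost[i - 1, j],      # advance x only
                                 cost[i, j - 1])      # advance y only

    # Backtrack from the end to recover the optimal warping path.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        candidates = [
            (cost[i - 1, j - 1], i - 1, j - 1),
            (cost[i - 1, j], i - 1, j),
            (cost[i, j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)
        path.append((i - 1, j - 1))
    path.reverse()
    return float(cost[n, m]), path
```

In a path of this kind, a run of steps that repeats the same template index marks a stretch of the rendition that is longer than the template, which is the sort of local length change that timing measurements such as those described above depend on.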
Finally, we analyzed juvenile song within the same framework and investigated how the timing properties we modeled may develop during sensorimotor learning. The behavioral data indicate a period towards the end of learning in which syllable sequences become more stereotyped, tempo increases selectively among gaps, and independent timing variability falls two- to threefold across syllables and gaps. In remarkable contrast, over this same period we find no changes in patterns of global tempo variability or the fine timescale patterns indicative of chaining mechanisms. Overall, the developmental data suggest a final phase of song learning in which syllable-based representations are consolidated into the longer sequence-based chaining mechanisms proposed for the adult system. A similar process of linking simpler chains to form more functional activity patterns has been proposed for neocortex and other models of sequence learning in mammalian systems. In this respect, adult zebra finch song representations may be most analogous to procedural memory and overlearned sequences such as repetitive speech patterns.