Doctor of Philosophy (PhD)
Long Short-Term Memory (LSTM) networks are a type of recurrent neural network that performs well at predicting sequence data. This work presents four approaches to modeling audio data. The models are trained to predict either raw audio samples or magnitude spectrum windows based on prior input audio. In a process called sampling, the models can then be employed to generate new audio using what they have learned about the data they were trained on.
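To make the phrase "magnitude spectrum windows" concrete, the following is a minimal sketch of how an audio signal might be cut into overlapping windows and each window reduced to its magnitude spectrum. The function name `magnitude_windows`, the window size, the hop size, and the absence of a tapering window function are all assumptions made for illustration; the abstract does not state the parameters the dissertation actually uses.

```python
import cmath
import math
from typing import List, Sequence


def magnitude_windows(signal: Sequence[float], size: int, hop: int) -> List[List[float]]:
    """Split `signal` into windows of `size` samples, `hop` samples apart,
    and return the magnitude spectrum (bins 0..size/2) of each window.

    Hypothetical helper: uses a naive DFT for clarity, not speed.
    """
    windows = []
    for start in range(0, len(signal) - size + 1, hop):
        frame = signal[start:start + size]
        mags = []
        for k in range(size // 2 + 1):
            # Naive discrete Fourier transform of one window at bin k
            acc = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / size)
                for n in range(size)
            )
            mags.append(abs(acc))
        windows.append(mags)
    return windows


# Toy input (assumed): a cosine that completes two cycles every 8 samples
tone = [math.cos(2 * math.pi * 2 * n / 8) for n in range(16)]
windows = magnitude_windows(tone, size=8, hop=4)
peak_bin = max(range(len(windows[0])), key=lambda k: windows[0][k])
```

Each inner list is one magnitude spectrum window; a model of the kind described above would be trained to predict the next such window from the preceding ones.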
Four methods for sampling are presented. The first has the model predict a vector for each vector in the input. The second has the model predict one magnitude spectrum window for each magnitude spectrum window in the input. The third has the model predict a single audio sample for each input vector. The fourth has the model predict a single sample for the entire input matrix.
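As a rough illustration of the sampling idea, the sketch below follows the shape of the fourth method: a predictor receives a whole window of prior samples, returns a single new sample, and that sample is fed back in as input for the next step. The names `sample_audio` and `toy_predictor` are hypothetical, and the toy predictor is a simple linear extrapolation standing in for a trained network; it is not the dissertation's model.

```python
from typing import Callable, List, Sequence


def sample_audio(
    predict_next: Callable[[Sequence[float]], float],
    seed: Sequence[float],
    num_samples: int,
) -> List[float]:
    """Generate `num_samples` new values by feeding each prediction back
    into the input window (autoregressive sampling)."""
    window = list(seed)
    generated: List[float] = []
    for _ in range(num_samples):
        nxt = predict_next(window)
        generated.append(nxt)
        # Slide the window: drop the oldest sample, append the prediction
        window = window[1:] + [nxt]
    return generated


def toy_predictor(window: Sequence[float]) -> float:
    # Assumption: stand-in for a trained model, extrapolates the last step
    return 2 * window[-1] - window[-2]


seed = [0.0, 0.1, 0.2, 0.3]
out = sample_audio(toy_predictor, seed, 3)
```

Because every generated sample requires its own call to the predictor, a loop of this kind grows expensive with a large model, which is consistent with the speed limitation noted in the abstract.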
The end result is a method by which novel audio can be generated. This method differs from other synthesis techniques in that the content of the generated audio is not easily foreseeable. The various approaches to sampling afford different levels of control over the output. The biggest drawback to creating audio using this method is that the process is often too slow to run in real time.
Finally, two applications are presented that allow the user to easily create and launch jobs. Four musical compositions are also presented that use sounds created from the models' predictions. The process for generating the source materials for each piece is described, along with the manipulations made to those materials after they were generated.
Pfalz, Andrew, "Generating Audio Using Recurrent Neural Networks" (2018). LSU Doctoral Dissertations. 4601.