Master of Applied Statistics (MApStat)


Experimental Statistics




Many attempts have been made to achieve accurate recognition of handwritten digits. We report the results of applying a statistical method to handwritten digit recognition. A digitized handwritten numeral can be represented as a grayscale image whose features are mapped into a two-dimensional space indexed by row and column coordinates. Exploiting this structure, we apply two-dimensional penalized signal logistic regression (PSR) to the recognition of handwritten digits. The data set is taken from the USPS zip code database, which contains 7219 training images and 2007 test images. All images have been deslanted and normalized to 16 x 16 pixels with varying gray levels. The PSR method constructs a coefficient surface from a rich two-dimensional tensor-product B-spline basis, so that the surface is more flexible than needed. We then penalize the roughness of the coefficient surface with difference penalties on the coefficients associated with the rows and columns of the tensor-product B-splines. The optimal penalty weight is found within several minutes of iterative computation. A competitive overall recognition error rate of 8.97% was achieved on the test data set. We also review an artificial neural network approach for comparison. PSR requires neither long training time nor large memory resources. Another advantage of the PSR method is that our results are obtained on the original USPS data set without any further image preprocessing. We also found that the PSR algorithm copes well with the high diversity and variation that are two major features of handwritten digits.
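The abstract does not spell out the fitting algorithm, so the following is only a minimal sketch of how a model of the kind described could be assembled with NumPy. It rests on several assumptions not stated in the abstract: a binary (one digit versus the rest) logistic model instead of the full ten-class problem, cubic B-splines on 8 equally spaced segments per direction, second-order difference penalties, and a Newton (penalized IRLS) fit. The coefficient surface is written as alpha = Br Theta Bc^T, so each image's linear predictor is its pixels weighted by that smooth surface. The names `bspline_basis`, `difference_penalty` and `fit_psr_logistic` are hypothetical, the two penalty weights are fixed rather than optimized, and the data are synthetic, not USPS images.

```python
import numpy as np


def bspline_basis(x, n_segments=8, degree=3):
    """B-spline basis on equally spaced knots (Cox-de Boor recursion).

    Returns a len(x) x (n_segments + degree) matrix whose rows sum to 1
    for every x in [min(x), max(x)].
    """
    x = np.asarray(x, dtype=float)
    xl, xr = x.min(), x.max()
    dx = (xr - xl) / n_segments
    # Pad the knot sequence with `degree` extra knots on each side.
    knots = xl + dx * np.arange(-degree, n_segments + degree + 1)
    # Degree-0 basis: indicator of the knot interval containing each x;
    # the right endpoint xr is assigned to the last interior interval.
    seg = np.minimum(np.floor((x - xl) / dx).astype(int), n_segments - 1)
    B = np.zeros((len(x), len(knots) - 1))
    B[np.arange(len(x)), seg + degree] = 1.0
    # Raise the degree one step at a time.
    for k in range(1, degree + 1):
        m = B.shape[1] - 1
        left = (x[:, None] - knots[None, :m]) / (k * dx)
        right = (knots[None, k + 1:k + 1 + m] - x[:, None]) / (k * dx)
        B = left * B[:, :m] + right * B[:, 1:m + 1]
    return B


def difference_penalty(n, order=2):
    """Return D'D, where D is the order-`order` difference matrix on n coefficients."""
    D = np.diff(np.eye(n), n=order, axis=0)
    return D.T @ D


def fit_psr_logistic(X, y, shape=(16, 16), n_segments=8, lam_row=1.0,
                     lam_col=1.0, order=2, max_iter=50, tol=1e-8):
    """Sketch of 2-D penalized signal logistic regression for 0/1 labels.

    X holds images flattened row-major (n x rows*cols). Maximizes the
    log-likelihood minus 0.5 * theta' P theta. Returns (intercept, surface).
    """
    n_rows, n_cols = shape
    Br = bspline_basis(np.arange(n_rows), n_segments)
    Bc = bspline_basis(np.arange(n_cols), n_segments)
    kr, kc = Br.shape[1], Bc.shape[1]
    # alpha = Br @ Theta @ Bc.T, hence alpha.ravel() = kron(Br, Bc) @ Theta.ravel().
    U = X @ np.kron(Br, Bc)
    U1 = np.column_stack([np.ones(len(y)), U])  # unpenalized intercept column
    # Difference penalties along the row and column directions of Theta.
    P = np.zeros((1 + kr * kc, 1 + kr * kc))
    P[1:, 1:] = (lam_row * np.kron(difference_penalty(kr, order), np.eye(kc))
                 + lam_col * np.kron(np.eye(kr), difference_penalty(kc, order)))
    theta = np.zeros(1 + kr * kc)
    for _ in range(max_iter):
        mu = 1.0 / (1.0 + np.exp(-(U1 @ theta)))
        w = mu * (1.0 - mu)
        # Newton step on the penalized log-likelihood (same as penalized IRLS).
        H = U1.T @ (w[:, None] * U1) + P
        grad = U1.T @ (y - mu) - P @ theta
        step = np.linalg.solve(H, grad)
        theta = theta + step
        if np.max(np.abs(step)) < tol:
            break
    alpha = Br @ theta[1:].reshape(kr, kc) @ Bc.T
    return theta[0], alpha


# Usage on synthetic 16 x 16 "images" (a hypothetical stand-in for USPS data).
rng = np.random.default_rng(0)
ii, jj = np.meshgrid(np.arange(16), np.arange(16), indexing="ij")
alpha_true = 0.3 * np.sin(np.pi * ii / 15) * np.cos(np.pi * jj / 15)
X = rng.uniform(-1, 1, size=(500, 256))
p = 1.0 / (1.0 + np.exp(-(X @ alpha_true.ravel())))
y = (rng.uniform(size=500) < p).astype(float)
b0, alpha_hat = fit_psr_logistic(X, y, lam_row=1.0, lam_col=1.0)
pred = (b0 + X @ alpha_hat.ravel()) > 0
print(alpha_hat.shape, np.mean(pred == y))
```

Because the penalty acts on the B-spline coefficients and not on the 256 pixel weights directly, the fitted surface stays smooth even though the basis is deliberately richer than needed. In practice the penalty weights would be tuned, for example by a grid search on held-out error, and a ten-class classifier would need either a multinomial extension or one such binary model per digit; neither step is shown here.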



Document Availability at the Time of Submission

Release the entire work immediately for access worldwide.

Committee Chair

Brian D. Marx