training on mnist with cnn
refer to training on mnist, which was my first attempt at this dataset with my "old" feedforward neural network implementation of simple perceptrons. now i have a (rather slow, currently) convolutional neural network implementation, so it's time to see the results:
load the dataset (replace train-data-file and test-data-file if you're not me):
(ql:quickload "cl-csv")

(defun parse-mylist (mylist)
  (let* ((digit (parse-integer (car mylist))) ;; digit is at the beginning of the list
         (mylist (cdr mylist)) ;; the rest of the list contains the pixels
         (size (floor (sqrt (length mylist))))
         (arr (make-array (list 1 size size))))
    (loop for i from 0 below size do
      (loop for j from 0 below size do
        (setf (aref arr 0 i j) (/ (parse-integer (elt mylist (+ (* size i) j))) 255))))
    ;; unnormalized version, kept for reference:
    ;; (setf (aref arr 0 i j) (parse-integer (elt mylist (+ (* size i) j))))
    (cons arr digit)))
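as a quick sanity check, here is parse-mylist on a hypothetical 4-pixel csv row (label 7, pixels 0 255 51 102); the floor of the square root gives a 2x2 "image" and the pixel values come out as rationals scaled into 0-1:

```lisp
;; hypothetical tiny csv row: the digit 7 followed by four pixel strings
(parse-mylist (list "7" "0" "255" "51" "102"))
;; the car is a 1x2x2 array holding 0, 1, 1/5 and 2/5; the cdr is the label 7
```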
(defun load-mnist ()
  (defparameter *mnist-train-data* (cl-csv:read-csv (pathname train-data-file)))
  (defparameter *mnist-test-data* (cl-csv:read-csv (pathname test-data-file)))
  ;; use vectors, access is O(1) unlike lists
  (defparameter *mnist-train-data* (map 'vector #'parse-mylist (cdr *mnist-train-data*)))
  (defparameter *mnist-test-data* (map 'vector #'parse-mylist (cdr *mnist-test-data*))))
note that i had to normalize the pixel values from the 0-255 range down to 0-1, otherwise i couldn't train the network; all the deltas were turning into 0.
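for intuition, this is the usual sigmoid-saturation effect: with raw 0-255 inputs the pre-activations get huge, and the sigmoid derivative at a huge input is numerically 0, which kills the deltas. a standalone sketch (these two helpers are redefined here just for the demo; they're not necessarily the network's own versions):

```lisp
(defun sigmoid (z) (/ 1 (+ 1 (exp (- z)))))
(defun sigmoid-derivative (z) (let ((s (sigmoid z))) (* s (- 1 s))))

(sigmoid-derivative 2.0)  ;; ~0.105, a usable gradient
(sigmoid-derivative 50.0) ;; ~2e-22, effectively zero, no learning
```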
each image is of size 28x28, so we can use the following architecture:
;; input meant to be of size 1x28x28
(defun construct-mnist-network ()
  (defparameter *mnist-network*
    (make-network
     :layers (list
              (make-3d-convolutional-layer-from-dims :dims '(32 1 5 5)) ;; size of image becomes 32x24x24
              (make-pooling-layer :rows 2 :cols 2
                                  :pooling-function #'average-pooling-function
                                  :unpooling-function #'average-unpooling-function) ;; size becomes 32x12x12
              (make-3d-convolutional-layer-from-dims :dims '(16 32 5 5)) ;; size becomes 16x8x8
              (make-pooling-layer :rows 2 :cols 2
                                  :pooling-function #'average-pooling-function
                                  :unpooling-function #'average-unpooling-function) ;; size becomes 6x4x4
              (make-flatten-layer) ;; flatten it, becomes 6x4x4=96
              (make-dense-layer :num-units 30 :prev-layer-num-units 96
                                :activation-function #'relu
                                :activation-function-derivative #'relu-derivative)
              (make-dense-layer :num-units 10 :prev-layer-num-units 30
                                :activation-function #'sigmoid
                                :activation-function-derivative #'sigmoid-derivative))
     :learning-rate 0.02)))
example usage:
(construct-mnist-network)
;; might wanna make weights closer to 0
(divide-network-weights *mnist-network* 5)
*mnist-network*
#<NETWORK
  #<DENSE-LAYER weights: 300, dimensions: (10 30)>
  #<DENSE-LAYER weights: 2880, dimensions: (30 96)>
  #<FLATTEN-LAYER {10185E1AC3}>
  #<POOLING-LAYER rows: 2, columns: 2>
  #<3D-CONVOLUTIONAL-LAYER weights: 12800, dimensions: (16 32 5 5)>
  #<POOLING-LAYER rows: 2, columns: 2>
  #<3D-CONVOLUTIONAL-LAYER weights: 800, dimensions: (32 1 5 5)>
  total network weights: 16780, learning rate: 0.02 {10186DB593}>
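as a sanity check, the reported total matches the per-layer weight dimensions:

```lisp
(+ (* 32 1 5 5)  ;; first conv layer:    800
   (* 16 32 5 5) ;; second conv layer: 12800
   (* 30 96)     ;; first dense layer:  2880
   (* 10 30))    ;; output layer:        300
;; => 16780
```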
after running load-mnist, we can begin training.
because at the time (<2023-08-12 Sat 17:52:12>) my training algorithm for cnns was slow, i wanted to measure the accuracy of the algorithm without waiting days for training to finish, so i tried training on a single image; the network should overfit and be able to classify that image correctly:
(defun train-on-mnist-single-image ()
  (let ((x (car (elt *mnist-train-data* 0)))
        (y (make-array '(10)))
        ;; a simplified network used while debugging:
        ;; (nw (make-network
        ;;      :layers
        ;;      (list
        ;;       (make-3d-convolutional-layer-from-dims
        ;;        :dims '(16 1 3 3)
        ;;        :activation-function #'relu
        ;;        :activation-function-derivative #'relu-derivative)
        ;;       (make-flatten-layer)
        ;;       (make-3d-convolutional-layer-from-dims
        ;;        :dims '(10 144)
        ;;        :activation-function #'sigmoid
        ;;        :activation-function-derivative #'sigmoid-derivative))
        ;;      :learning-rate 0.02))
        (nw *mnist-network*))
    (setf (aref y (cdr (elt *mnist-train-data* 0))) 1)
    (format t "~%out layer should be: ~A" y)
    (print "running 10000 epochs:")
    (loop for i from 0 below 10000 do
      (network-train nw (list x) (list y))
      (format t "~%cost: ~A" (network-test nw (list x) (list y))))
    (format t "~%out layer: ~A" (car (car (network-feedforward nw x))))))
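to turn the output vector into an actual prediction, a tiny helper (hypothetical, not part of my library) that picks the index of the largest activation does the job:

```lisp
;; hypothetical helper: the predicted digit is the argmax of the output layer
(defun predicted-digit (out)
  (position (reduce #'max out) out))

;; e.g. (predicted-digit #(0.01 0.02 0.9 0.01 0 0 0 0 0 0)) => 2
```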
eventually, after a lot of debugging, the simplified network did converge and overfit, so the next task was to train it on the actual dataset and not just a single image:
(defun train-on-mnist ()
  (network-train-distributed-cpu
   *mnist-network*
   (map
    'list
    (lambda (data-entry)
      (let ((in-tensor (car data-entry))
            (digit (cdr data-entry))
            (out-tensor (make-array 10)))
        (setf (aref out-tensor digit) 1)
        (cons in-tensor out-tensor)))
    *mnist-train-data*)))
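once training finishes, test-set accuracy can be sketched as below; this is a hypothetical, untested helper that assumes network-feedforward returns its activations the same way train-on-mnist-single-image reads them:

```lisp
;; hypothetical accuracy check over the held-out test set
(defun mnist-test-accuracy ()
  (let ((correct 0))
    (loop for entry across *mnist-test-data*
          for out = (car (car (network-feedforward *mnist-network* (car entry))))
          ;; count a hit when the argmax of the output matches the label
          when (= (cdr entry) (position (reduce #'max out) out))
            do (incf correct))
    (/ correct (length *mnist-test-data*))))
```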