Read / Load images into a numpy array
If you are building a neural network from scratch, you may find yourself with a directory of images that you need to load into an array, that is, convert the pixels of each image into numbers the computer can work with. I ran into this while trying to solve the classic cat vs. dog problem: I had a directory full of images of cats and dogs, and reading them into a NumPy array turned out to be a challenging task.
Here I am sharing the block of code that helped me do this, along with a short explanation of the steps so that you can customize it to your requirements.
import glob
import os
import random

import cv2
import numpy as np


def read_image_to_array(path):
    # Collect all file paths matching the pattern into a list,
    # e.g. filelist[1] = 'CatvDog/train/train\\cat.1.jpg' on Windows
    filelist = glob.glob(path)
    # Shuffle the list, because the input images are in serial order of cats and dogs
    random.shuffle(filelist)
    # Extract the label from the image name: cat = 0, dog = 1
    y = []
    for filepath in filelist:
        # os.path.basename uses the path separator of the current OS,
        # so this also works where paths do not contain "\\"
        label = os.path.basename(filepath).split(".")[0]
        if label == "cat":
            y.append(0)
        elif label == "dog":
            y.append(1)
    # y is a list, so convert it into an array; "-1" lets NumPy infer the number of images
    Y = np.array(y).reshape(-1, 1)
    # Read every image, resize it to 128x128 and stack the results into one array
    X = np.array([cv2.resize(cv2.imread(fname), (128, 128)) for fname in filelist])
    # Return the two arrays X and Y
    return X, Y


X, Y = read_image_to_array('CatvDog/train/train/*.jpg')
print(X.shape, Y.shape)
# (25000, 128, 128, 3) (25000, 1)
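The label of each image comes from the prefix of its file name. Here is a small standalone sketch of just that step, written so that it does not depend on which path separator your system uses. The helper name label_from_path, the LABELS dictionary and the file name dog.42.jpg are my own illustrations and not part of the code above; returning None for an unknown prefix is also an assumption about how you might want to handle stray files.

```python
from pathlib import PureWindowsPath

# Hypothetical mapping from file-name prefix to class label
LABELS = {"cat": 0, "dog": 1}


def label_from_path(filepath):
    """Return 0 for a cat, 1 for a dog, or None if the name matches neither."""
    # PureWindowsPath treats both "\\" and "/" as separators
    name = PureWindowsPath(filepath).name  # e.g. 'cat.1.jpg'
    prefix = name.split(".")[0]            # e.g. 'cat'
    return LABELS.get(prefix)


print(label_from_path("CatvDog/train/train\\cat.1.jpg"))  # 0
print(label_from_path("CatvDog/train/train/dog.42.jpg"))  # 1 (illustrative file name)
```

Checking for None before appending to the label list is one way to keep the labels and the images the same length if the directory contains anything other than cat and dog files.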
The data can be downloaded from the Kaggle competition dogs-vs-cats.
The images provided in this competition all have different sizes, so this code resizes them to a standard 128×128 pixels. You may choose 64×64 or 256×256 instead, depending on the memory and compute available on your CPU or GPU.
X.shape might be a bit confusing, so let me explain. X.shape = (25000, 128, 128, 3) means there are 25k images in total; each image is 128×128 pixels, and since the images are in color, each one has 3 channels (Red, Green, Blue; note that cv2.imread returns them in BGR order). That is why you see 128, 128, 3. For our neural network we will flatten each image into a column vector of shape (128×128×3, 1), i.e. (49152, 1). This means each image feeds close to 49k features into the neural network.
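The flattening step described above can be sketched with a plain NumPy reshape. This is a minimal illustration, not the only way to do it: X_demo is a stand-in array of 4 blank images that I made up so the example runs without the dataset, and the variable names are my own.

```python
import numpy as np

# Stand-in for the real X: 4 blank "images" instead of 25000 (assumption, keeps the demo small)
X_demo = np.zeros((4, 128, 128, 3), dtype=np.uint8)

# Keep the first axis (one row per image) and collapse the rest
X_rows = X_demo.reshape(X_demo.shape[0], -1)

# Transpose so that each column holds one flattened image
X_flat = X_rows.T

# A single image as a column vector
one_image = X_flat[:, [0]]

print(X_rows.shape, X_flat.shape, one_image.shape)
# (4, 49152) (49152, 4) (49152, 1)
```

With the full dataset the same two lines would give an array of shape (49152, 25000), one column of 49152 features per image.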
Image courtesy: Andrew Ng's Deep Learning course (Coursera)
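Coming back to the choice of image size: a quick way to judge what your machine can handle is to work out how much memory the array X needs. The sketch below assumes 25000 color images stored as 8-bit values (1 byte per channel value, which is what cv2.imread returns for ordinary JPEG files); the function name array_bytes is hypothetical.

```python
def array_bytes(num_images, side, channels=3, bytes_per_value=1):
    """Bytes needed for an array of shape (num_images, side, side, channels)."""
    return num_images * side * side * channels * bytes_per_value


for side in (64, 128, 256):
    gib = array_bytes(25000, side) / 1024**3
    print(f"{side}x{side}: {gib:.2f} GiB")
# 64x64: 0.29 GiB
# 128x128: 1.14 GiB
# 256x256: 4.58 GiB
```

Doubling the side length quadruples the memory, and converting the array to 32-bit floats for training would multiply these numbers by 4 again.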