
Read / Load Images into a NumPy Array

Photo by Mike van den Bos on Unsplash

If you are building a neural network from scratch, you may end up with a directory of images that you need to load into an array. In other words, you have to convert image files into numbers that the computer can work with. I ran into this challenge while working on the classic cats vs. dogs problem: I had a directory full of cat and dog images, and reading them into a NumPy array turned out to be harder than expected.

Here I am sharing the code that helped me do this, along with a short explanation of each step so you can customize it to your requirements.

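import glob
import os
import random

import cv2          # OpenCV (pip install opencv-python)
import numpy as np

def read_image_to_array(path, size=(128, 128)):

    # Expand the glob pattern into a list of file paths, e.g.
    # 'CatvDog/train/train/cat.1.jpg' (on Windows the last separator is '\\')
    filelist = glob.glob(path)

    # Shuffle the list, because the files come in order: all cats, then all dogs
    random.shuffle(filelist)

    # Read the label from the file name (cat = 0, dog = 1) and the image itself
    # in the same loop, so X and Y always stay aligned
    images, y = [], []
    for filepath in filelist:
        # os.path.basename works on both Windows and Linux/macOS paths
        name = os.path.basename(filepath).split(".")[0]   # 'cat.1.jpg' -> 'cat'
        if name not in ("cat", "dog"):
            continue                                       # skip unexpected files
        img = cv2.imread(filepath)                         # uint8 array in BGR order, or None if unreadable
        if img is None:
            continue
        images.append(cv2.resize(img, size))               # resize every image to the same size
        y.append(0 if name == "cat" else 1)

    # Stack all images into one array of shape (number_of_images, 128, 128, 3)
    X = np.array(images)

    # Turn the label list into a column vector; -1 lets NumPy infer the number of rows (one per image)
    Y = np.array(y).reshape(-1, 1)

    # Return the image array X and the label array Y
    return X, Y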

X,Y = read_image_to_array('CatvDog/train/train/*.jpg')

print(X.shape,Y.shape)
#(25000, 128, 128, 3) (25000, 1)
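If you want to check the labelling step on its own, here is a minimal sketch of the same file-name rule pulled out into a hypothetical helper, `label_from_path` (not part of the original code). It assumes the Kaggle naming scheme `cat.<n>.jpg` / `dog.<n>.jpg` and relies on `os.path.basename`, which splits on the separator of the operating system you run it on, i.e. the same separator `glob` returns.

```python
import os

# Hypothetical mapping from file-name prefix to class label (cat = 0, dog = 1)
LABELS = {"cat": 0, "dog": 1}

def label_from_path(filepath):
    """Return 0 for a cat image, 1 for a dog image, or None for any other file."""
    name = os.path.basename(filepath)       # e.g. 'cat.1.jpg'
    prefix = name.split(".")[0].lower()     # e.g. 'cat'
    return LABELS.get(prefix)

paths = [
    "CatvDog/train/train/cat.1.jpg",
    "CatvDog/train/train/dog.42.jpg",
    "CatvDog/train/train/notes.txt",
]
print([label_from_path(p) for p in paths])  # [0, 1, None]
```

Returning `None` for unexpected files lets you filter them out before reading any pixels, so the image array and the label array cannot drift out of step.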

You can download this data from the Kaggle competition dogs-vs-cats.
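Because the whole image array is held in memory at once, it can help to estimate its size before choosing a resolution. The sketch below is plain arithmetic, assuming 25,000 images with 3 channels stored as `uint8` (1 byte per value, which is what `cv2.imread` returns); `array_bytes` is a hypothetical helper name.

```python
def array_bytes(n_images, size, channels=3, bytes_per_value=1):
    """Bytes needed for an array of shape (n_images, size, size, channels)."""
    return n_images * size * size * channels * bytes_per_value

# Compare a few common square resolutions for the 25,000 training images
for size in (64, 128, 256):
    gb = array_bytes(25000, size) / 1e9
    print(f"{size}x{size}: {gb:.2f} GB as uint8")
```

This prints roughly 0.31 GB, 1.23 GB and 4.92 GB: each doubling of the side length quadruples the memory, which is why the resolution you can afford depends on your hardware.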

The images in this competition come in different sizes, so this code resizes them all to a standard 128×128 pixels. You may choose 64×64 or 256×256 instead, depending on the CPU or GPU you have available. X.shape might be a bit confusing, so let me explain. X.shape = (25000, 128, 128, 3) means there are 25,000 images in total, each 128×128 pixels, and because the images are in color, each pixel has 3 channels. Note that cv2.imread returns the channels in BGR order (Blue, Green, Red), not RGB. For our neural network, we will flatten each image into a column vector of shape (128×128×3, 1) = (49152, 1), which means we will feed 49,152 features into the network.
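The flattening step can be sketched with NumPy alone. This assumes the column-per-example layout (features × number of images) used in Andrew Ng's Deep Learning course; a small all-zero array stands in for the real X returned by `read_image_to_array`, and the BGR-to-RGB flip is optional, only needed if later code expects RGB.

```python
import numpy as np

# Stand-in for X: 4 "images" of 128x128 pixels with 3 channels (BGR, as cv2.imread returns)
X = np.zeros((4, 128, 128, 3), dtype=np.uint8)

# Optional: reverse the channel axis to convert BGR to RGB
X_rgb = X[..., ::-1]

# One row per image, then transpose so each column is one flattened image:
# shape (128*128*3, m) = (49152, m)
m = X_rgb.shape[0]
X_flat = X_rgb.reshape(m, -1).T

print(X_flat.shape)  # (49152, 4)
```

`reshape(m, -1)` keeps the image index as the first axis, so column i of `X_flat` is exactly image i unrolled; reshaping straight to `(-1, m)` without the transpose would mix pixels from different images.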

Image courtesy: Andrew Ng's Deep Learning course (Coursera)
