Dogs vs Cats Classification - Transfer Learning using TensorFlow

The goal of this project is to write an algorithm that classifies whether an image contains a dog or a cat.

  1. Problem: Detect whether a given picture shows a dog or a cat.

  2. Data: The data comes from Kaggle.

Note:- I am using Google Colab for this model, so you have to save the data in your Google Drive and connect your Google Drive to Google Colab.

  3. Evaluation: For each image in the test set, predict a label for its id (1 = dog, 0 = cat).

  4. Features: The training data has 25k images labelled 1 = dog and 0 = cat. The test data has 12.5k images with no labels.

Code:-

First, we import the modules we will need later

# Import EDA (exploratory data analysis) tools
import pandas as pd
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
from IPython.display import Image

# Import the deep learning libraries
import tensorflow as tf 
import tensorflow_hub as hub

Getting data from the file

  • Set the label according to the picture name: if the name contains 'cat', set the label to 0, and if it contains 'dog', set the label to 1
  • Visualize some of the pictures
# Get the list of all picture names
import os 
pic_name_list = os.listdir('/content/drive/My Drive/Dog vs Cat/train')
len(pic_name_list)

Result -

25000

Check the first few names in the list

pic_name_list[:5]

Result -

['cat.9552.jpg',
 'cat.9562.jpg',
 'cat.9545.jpg',
 'cat.9542.jpg',
 'cat.9590.jpg']
  • The training data comes without a separate label file, so you have to build your own labels from the picture names.
picture_label = []
for i in range(len(pic_name_list)):
  picture_label.append(pic_name_list[i].split('.')[0])

len(picture_label), picture_label[-5:]

Result

25000
['dog', 'dog', 'dog', 'dog', 'dog']
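
The labels above are kept as the strings 'cat' and 'dog'. If you want the numeric labels described earlier (1 = dog, 0 = cat), a minimal sketch could look like this. The helper name `name_to_label` and the sample names are hypothetical and not part of the original notebook.

```python
# Hypothetical helper: map a Kaggle-style file name to a numeric label.
LABEL_IDS = {'cat': 0, 'dog': 1}

def name_to_label(pic_name):
    """Return 0 for 'cat.<n>.jpg' and 1 for 'dog.<n>.jpg'."""
    animal = pic_name.split('.')[0]   # text before the first dot
    return LABEL_IDS[animal]

# Made-up sample names in the same pattern as the training files
sample_names = ['cat.9552.jpg', 'dog.7.jpg', 'cat.3.jpg']
numeric_labels = [name_to_label(name) for name in sample_names]
print(numeric_labels)
```
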
  • Now get the file path of every picture; these paths will be used as the features.
## Get the file paths of the pictures (reuse pic_name_list so paths and labels stay in the same order)
file_name = ['/content/drive/My Drive/Dog vs Cat/train/' + obj_name for obj_name in pic_name_list]
file_name[:5]

Result

['/content/drive/My Drive/Dog vs Cat/train/cat.9552.jpg',
 '/content/drive/My Drive/Dog vs Cat/train/cat.9562.jpg',
 '/content/drive/My Drive/Dog vs Cat/train/cat.9545.jpg',
 '/content/drive/My Drive/Dog vs Cat/train/cat.9542.jpg',
 '/content/drive/My Drive/Dog vs Cat/train/cat.9590.jpg']

Now check whether the path names are correct

## Display some pictures
Image(file_name[1])

Result

Another example

Image(file_name[-1])

Result

  • Create a data frame for our data
# Create the data frame
df_catvsdog = pd.DataFrame({'file_name': file_name,
                            'picture_label': picture_label})
df_catvsdog.head()

Result

    file_name    picture_label
0    /content/drive/My Drive/Dog vs Cat/train/cat.9...    cat
1    /content/drive/My Drive/Dog vs Cat/train/cat.9...    cat
2    /content/drive/My Drive/Dog vs Cat/train/cat.9...    cat
3    /content/drive/My Drive/Dog vs Cat/train/cat.9...    cat
4    /content/drive/My Drive/Dog vs Cat/train/cat.9...    cat
  • Check how many pictures there are of dogs and of cats
df_catvsdog['picture_label'].value_counts().plot(kind='bar')
plt.xticks(rotation=0);  # keep the class names horizontal

Result

df_catvsdog['picture_label'].value_counts()

Result

dog    12500
cat    12500
Name: picture_label, dtype: int64

Shuffle the data so that the order of the rows changes.

Note:- random_state is set so that the shuffle gives the same order every time you run it.

from sklearn.utils import shuffle
df_shuffle = shuffle(df_catvsdog, random_state=101).reset_index(drop=True)
df_shuffle.head()

Result

file_name    picture_label
0    /content/drive/My Drive/Dog vs Cat/train/cat.3...    cat
1    /content/drive/My Drive/Dog vs Cat/train/cat.6...    cat
2    /content/drive/My Drive/Dog vs Cat/train/dog.7...    dog
3    /content/drive/My Drive/Dog vs Cat/train/cat.1...    cat
4    /content/drive/My Drive/Dog vs Cat/train/dog.7...    dog
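
To see what a fixed random state buys you, here is a small sketch that uses Python's standard `random` module instead of sklearn (a stand-in, chosen so the example runs without extra packages). Seeding the generator with the same number gives the same order on every run. The helper `shuffled_copy` and the file names are hypothetical.

```python
import random

def shuffled_copy(items, seed):
    """Return a shuffled copy of items; the same seed gives the same order."""
    rng = random.Random(seed)   # private generator, does not touch global state
    result = list(items)
    rng.shuffle(result)
    return result

# Made-up file names
names = ['cat.0.jpg', 'cat.1.jpg', 'dog.0.jpg', 'dog.1.jpg', 'dog.2.jpg']
first_run = shuffled_copy(names, seed=101)
second_run = shuffled_copy(names, seed=101)
print(first_run == second_run)   # compares the two orders
```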

Check some images again

Image(df_shuffle.file_name[0])

Result

  • A model works on numbers, so we have to convert our labels into numbers
X = df_shuffle['file_name']
labels = df_shuffle['picture_label']

unique_labels = np.unique(labels)

y = [label == unique_labels for label in labels]

unique_labels

Result

array(['cat', 'dog'], dtype=object)
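
The line `y = [label == unique_labels for label in labels]` compares each label with the whole `unique_labels` array, so every label becomes a boolean vector with a single True (a one-hot encoding). A minimal sketch on a few made-up labels:

```python
import numpy as np

labels = np.array(['cat', 'dog', 'dog'])     # made-up sample labels
unique_labels = np.unique(labels)            # sorted: ['cat', 'dog']

# Comparing one label against the array gives one boolean per class
y = [label == unique_labels for label in labels]

print(y[0])                 # boolean vector for 'cat'
print(y[1].astype(int))     # the same idea as 0/1 integers for 'dog'
```

This is why the batched data set shown later reports a label shape of `(None, 2)` with type `tf.bool`.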

Get the data ready for the model. For that:-

  1. Convert the images into numbers
  2. Convert those numbers into tensors
  3. Scale all the values in the tensors to lie between 0 and 1
  4. Create the batches
  5. Visualize the data

Note:- Wrap each step in a function where you can; it makes the code easier to reuse

Here we create some functions which help to convert our images into tensors

# Convert the image into a tensor with 3 colour channels, values in [0, 1], resized.
def Image_to_Tensor(path, img_height=224, img_width=224):
  """
  Read the image at `path`, decode it into 3 colour channels, scale the values to [0, 1] and resize it.
  """
  ## Read the picture file
  image = tf.io.read_file(path)
  ## Decode the image in RGB format
  image = tf.image.decode_jpeg(image, channels=3)
  ## Convert the values so each lies between 0 and 1
  image = tf.image.convert_image_dtype(image, tf.float32)
  ## Resize the image to the given height and width
  image = tf.image.resize(image, [img_height, img_width])

  return image
## Return the data in the form (image, label)

def data_layout(path, label):
  '''
  This function returns the data in the form (image, label)
  '''

  image = Image_to_Tensor(path)

  return image, label
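
`tf.image.convert_image_dtype(image, tf.float32)` rescales 8-bit pixel values from the range 0 to 255 into floats between 0 and 1. The NumPy sketch below imitates only that scaling step on a made-up 2x2 image; it is an illustration, not a replacement for the TensorFlow call.

```python
import numpy as np

# Made-up 2x2 RGB image with 8-bit values (0..255)
pixels = np.array([[[0, 128, 255], [255, 255, 255]],
                   [[64, 64, 64], [0, 0, 0]]], dtype=np.uint8)

# Same idea as the dtype conversion: divide by the largest uint8 value
scaled = pixels.astype(np.float32) / 255.0

print(scaled.shape, scaled.min(), scaled.max())
```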

In deep learning/computer vision, it is better to feed your data to the model in batches. A batch is small enough to fit in memory, and the model updates its weights after each batch instead of after the whole data set.

## Define the function which converts the data into batches

def process_image(X, y=None, batch_size=32, test_data=None, val_data=None):

  '''
  This function processes the image paths (and labels, if given) and converts them into batches the model can consume
  '''

  # Process test data (no labels)
  if test_data:
    data = tf.data.Dataset.from_tensor_slices((tf.constant(X)))
    data_batch = data.map(Image_to_Tensor).batch(batch_size)

  # Process validation data (labelled, not shuffled)
  elif val_data:
    data = tf.data.Dataset.from_tensor_slices((tf.constant(X),
                                              tf.constant(y)))
    data_batch = data.map(data_layout).batch(batch_size)

  # Process training data (labelled, shuffled)
  else:
    data = tf.data.Dataset.from_tensor_slices((tf.constant(X),
                                              tf.constant(y)))
    data = data.shuffle(buffer_size=len(X))

    data_batch = data.map(data_layout).batch(batch_size)

  return data_batch

Create the train batches

full_data= process_image(X, y)
full_data

Result

<BatchDataset shapes: ((None, 224, 224, 3), (None, 2)), types: (tf.float32, tf.bool)>
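
With 25,000 images and a batch size of 32, the data set is split into 782 batches (the last one is smaller), which is the `782/782` step count that appears in the training log. A quick check:

```python
import math

num_images = 25000
batch_size = 32

num_batches = math.ceil(num_images / batch_size)          # batches per epoch
last_batch = num_images - (num_batches - 1) * batch_size  # size of the final batch

print(num_batches, last_batch)
```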

Visualize the images after resizing

def visual_image(images, labels):
  '''
  Visualize the first 25 images of a batch
  '''

  plt.figure(figsize =(10,10))
  for i in range(25):
    plt.subplot(5, 5, i+1)
    plt.imshow(images[i])
    #plt.title('Cat' if labels[i] == False else 'Dog')
    plt.xticks([])
    plt.yticks([])
    plt.axis('off')

Now try to visualize 25 Images

images, labels = next(full_data.as_numpy_iterator())
visual_image(images, labels)

Result

Setting up the model

  1. Set the input shape, the output shape and the model URL
  2. Use the Sequential API of tf.keras for the model
  3. Compile the model with a loss function, an optimizer and a metric
  4. Build the model

Note:- Here we use the transfer learning method, so only the layer we add on top of the pretrained model will actually be trained. We are using one of the models from TensorFlow Hub: tfhub.dev/google/imagenet/mobilenet_v2_130_..

Again we create a function, which helps to generalize our code

# Set the Image size
IMG_HEIGHT = 224
IMG_WIDTH  = 224

# Set the input
INPUT = [None, IMG_HEIGHT, IMG_WIDTH, 3]

# Set the Output
OUTPUT = len(unique_labels)

# URL For the model
MODEL_URL = 'https://tfhub.dev/google/imagenet/mobilenet_v2_130_224/classification/4'

The function which will create the model

# Define the function to create the model
def create_model(INPUT_SHAPE, OUTPUT_SHAPE, Model_url):

  '''
  This function is used to create the model
  '''
  print('Model training:', Model_url)
  # Create the model
  model = tf.keras.Sequential([
                               hub.KerasLayer(Model_url) ,
                               tf.keras.layers.Dense(units = OUTPUT_SHAPE, 
                                                     activation = 'softmax')
                               ])

  # Compile the model
  model.compile(loss = tf.keras.losses.CategoricalCrossentropy(),
                optimizer = tf.keras.optimizers.Adam(),
                metrics = ['accuracy'])

  # Build the model
  model.build(INPUT_SHAPE)

  # Return the model
  return model

Creating callbacks

Callbacks are helper functions a model can use during training to do things such as save its progress, check its progress or stop training early if the model stops improving.

We'll create two callbacks, one for TensorBoard, which helps track our model's progress, and another for early stopping, which prevents our model from training too long.

TensorBoard Callback

To set up TensorBoard in the notebook:-

  1. Load the TensorBoard notebook extension
  2. Create a TensorBoard callback which is able to save logs to a directory and pass it to the model's fit() function
  3. Visualize our model's training logs with the %tensorboard magic function (after training the model)
## Load TensorBoard notebook extension
%load_ext tensorboard
import datetime

def create_tensorboard_callback():
  # Create a log directory for storing TensorBoard Logs
  logdir = os.path.join('/content/drive/My Drive/Dog Vision/logs',
                        # Make it so the logs get tracked whenever we run an experiment
                        datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))
  return tf.keras.callbacks.TensorBoard(logdir)
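
The `strftime` pattern gives every run its own log folder. The small sketch below uses a fixed timestamp and a hypothetical `logs` base folder (not the notebook's Drive path) to show the folder name the pattern produces.

```python
import datetime
import posixpath

# Fixed timestamp for illustration; the notebook uses datetime.datetime.now()
run_time = datetime.datetime(2020, 8, 15, 10, 34, 34)
stamp = run_time.strftime("%Y%m%d-%H%M%S")

# 'logs' is a hypothetical base folder, not the notebook's Drive path
logdir = posixpath.join('logs', stamp)
print(logdir)
```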

Early stopping

Early stopping helps stop our model from overfitting by stopping training if a certain evaluation metric stops improving
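
The callback definitions themselves are not shown in this post, so the following is only a plain-Python sketch of how patience-based early stopping behaves. The monitored metric (training accuracy) and `patience=3` are assumptions, not values confirmed by the original notebook. The accuracy list is copied from the training log in this post. In Keras, a callback of this kind is created with `tf.keras.callbacks.EarlyStopping(monitor=..., patience=...)`.

```python
def stopping_epoch(history, patience):
    """Return the 1-based epoch at which training stops, or None if it never does."""
    best = float('-inf')
    wait = 0                      # epochs since the last improvement
    for epoch, value in enumerate(history, start=1):
        if value > best:          # metric improved: remember it, reset the counter
            best = value
            wait = 0
        else:
            wait += 1
            if wait >= patience:  # no improvement for `patience` epochs
                return epoch
    return None

# Accuracy per epoch, copied from the training log in this post
accuracy = [0.9782, 0.9889, 0.9901, 0.9904, 0.9912, 0.9918,
            0.9906, 0.9921, 0.9917, 0.9918, 0.9920]

print(stopping_epoch(accuracy, patience=3))   # patience of 3 is an assumption
```

Under these assumptions the sketch stops after epoch 11, which is where the log ends, but the notebook's real settings may differ.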

full_model = create_model(INPUT, OUTPUT, MODEL_URL)

Train the model

## Train the model 
full_model.fit(x=full_data,
               epochs = 100,
               callbacks =[tensorboard, early_stopping])

Result - It takes approximately 2 hours to process the 25,000 images; nearly all of that time is spent in the first epoch

Epoch 1/100
782/782 [==============================] - 9054s 12s/step - loss: 0.0591 - accuracy: 0.9782
Epoch 2/100
782/782 [==============================] - 119s 152ms/step - loss: 0.0343 - accuracy: 0.9889
Epoch 3/100
782/782 [==============================] - 119s 152ms/step - loss: 0.0306 - accuracy: 0.9901
Epoch 4/100
782/782 [==============================] - 118s 150ms/step - loss: 0.0285 - accuracy: 0.9904
Epoch 5/100
782/782 [==============================] - 118s 151ms/step - loss: 0.0268 - accuracy: 0.9912
Epoch 6/100
782/782 [==============================] - 118s 151ms/step - loss: 0.0264 - accuracy: 0.9918
Epoch 7/100
782/782 [==============================] - 118s 151ms/step - loss: 0.0269 - accuracy: 0.9906
Epoch 8/100
782/782 [==============================] - 119s 153ms/step - loss: 0.0233 - accuracy: 0.9921
Epoch 9/100
782/782 [==============================] - 119s 152ms/step - loss: 0.0235 - accuracy: 0.9917
Epoch 10/100
782/782 [==============================] - 118s 151ms/step - loss: 0.0230 - accuracy: 0.9918
Epoch 11/100
782/782 [==============================] - 120s 153ms/step - loss: 0.0234 - accuracy: 0.9920
<tensorflow.python.keras.callbacks.History at 0x7f919868a898>

Save the model

# Save the model
def save_model(model, prefix=None):
  '''
  This function is used for saving the model
  '''

  model_dir = os.path.join('/content/drive/My Drive/Dog vs Cat/Models/',
                           datetime.datetime.now().strftime('%Y%m%d-%H%M%S')
                           )
  # Only append the prefix when one is given (prefix defaults to None)
  model_path = model_dir + ('-' + prefix if prefix else '') + '.h5'

  print(f'Saving the model at {model_path}....')

  model.save(model_path)

  return model_path

Load the model

# Load the model
def load_model(path_name):

  '''
  This function is used for loading the model
  '''

  model = tf.keras.models.load_model(path_name,
                                     custom_objects = {'KerasLayer': hub.KerasLayer})

  return model

Save the model so that we can use it in the future

save_model(full_model, prefix='full_model_cat_vs_dog_mobilenetv2_adam')

Result

Saving the model at /content/drive/My Drive/Dog vs Cat/Models/20200815-103434-full_model_cat_vs_dog_mobilenetv2_adam.h5....
'/content/drive/My Drive/Dog vs Cat/Models/20200815-103434-full_model_cat_vs_dog_mobilenetv2_adam.h5'

Load the model

full_loaded_model = load_model('/content/drive/My Drive/Dog vs Cat/Models/20200815-103434-full_model_cat_vs_dog_mobilenetv2_adam.h5')

You can evaluate both models to check that they are the same

full_model.evaluate(val_data_batches)

Result

13/13 [==============================] - 2s 138ms/step - loss: 0.0064 - accuracy: 0.9975
[0.006375470198690891, 0.9975000023841858]

For the loaded model

full_loaded_model.evaluate(val_data_batches)

Result

13/13 [==============================] - 2s 135ms/step - loss: 0.0064 - accuracy: 0.9975
[0.006375469733029604, 0.9975000023841858]
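
The two loss values differ only in digits far beyond the ones that matter, which is ordinary floating-point noise. A small sketch of comparing the two result lists with a tolerance (the `1e-6` tolerance is an arbitrary choice for illustration):

```python
import math

# [loss, accuracy] as printed above for the original and the loaded model
original_result = [0.006375470198690891, 0.9975000023841858]
loaded_result = [0.006375469733029604, 0.9975000023841858]

# Compare pairwise with a relative tolerance instead of exact equality
same = all(math.isclose(a, b, rel_tol=1e-6)
           for a, b in zip(original_result, loaded_result))
print(same)
```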

Now it's time to test your model on test data. Are you excited?

Load the path for test data

## Load the paths of the test data

test_data_path = os.listdir('/content/drive/My Drive/Dog vs Cat/test1')
test_file_name = ['/content/drive/My Drive/Dog vs Cat/test1/' + name for name in test_data_path]
len(test_file_name)

Result

12500

Create the file paths for a draft run on the test data

# Test on first 25 images
test_draft_data = np.array(test_file_name[:25])
test_draft_data[:5]

Result

array(['/content/drive/My Drive/Dog vs Cat/test1/12041.jpg',
       '/content/drive/My Drive/Dog vs Cat/test1/12027.jpg',
       '/content/drive/My Drive/Dog vs Cat/test1/12045.jpg',
       '/content/drive/My Drive/Dog vs Cat/test1/12026.jpg',
       '/content/drive/My Drive/Dog vs Cat/test1/12020.jpg'], dtype='<U50')

Process the test images and predict on them

test_draft_data_batch = process_image(test_draft_data, test_data=True)
test_draft_data_predict = full_loaded_model.predict(test_draft_data_batch)
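
Each row of the prediction array holds one score per class, in the order of `unique_labels` (`['cat', 'dog']`). Taking the index of the largest score gives the class, and because 'cat' sorts before 'dog', that index is also the 0 = cat, 1 = dog label. A sketch with made-up scores:

```python
import numpy as np

unique_labels = np.array(['cat', 'dog'])

# Made-up softmax outputs for three images: [score_cat, score_dog]
predictions = np.array([[0.98, 0.02],
                        [0.10, 0.90],
                        [0.55, 0.45]])

class_ids = np.argmax(predictions, axis=1)   # index of the highest score per row
class_names = unique_labels[class_ids]       # look the names up in one step

print(class_ids.tolist(), class_names.tolist())
```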

Create a function which will help to visualize your result

def plot_predicted_figure(images, predict_label, true_label=None):
  '''
  This function shows an image with its predicted label as the title

  For a validation data set, pass true_label: the title is 'RED' if the predicted and true labels do not match, else 'GREEN'
  '''
  plt.imshow(images)
  plt.xticks([])
  plt.yticks([])
  ## Check whether the label is correct or not (orange when no true label is given)
  colors = 'orange'
  if true_label:
    if true_label == predict_label:
      colors = 'green'
    else:
      colors = 'red'
  plt.title(predict_label, color=colors)

Visualize 25 images of the data.

Note:- Here we unbatch the data, because we are showing individual images and the data is stored in batches, so we have to unbatch it to see the result

## Define the function which is used to visualize the model's predictions

def Show_predicted_25_image(prediction, data_batches, test_data = False):

  ''' 
  This function is used to show the first 25 predicted images

  '''
  ## Number of images to be shown
  nrows = 5
  ncols = 5
  num_images = nrows * ncols

  ## Define a helper which returns the predicted label name
  def know_predicted_label(label_weight):

    label = unique_labels[np.argmax(label_weight)]
    return label

  ## Getting labels which are predicted by model
  predicted_label = []
  for i in range(num_images):
    predicted_label.append(know_predicted_label(prediction[i]))

  ## Unbatchify the image, label from batch data
  images = []

  for image in data_batches.unbatch().as_numpy_iterator():
     images.append(image)

  ## Visualize the images and their predicted labels
  plt.figure(figsize=(8,10))
  for i in range(num_images):
    plt.subplot(nrows, ncols, i+1)
    plot_predicted_figure(images[i], 
                          predicted_label[i])
  plt.show()

So you are ready to see your result! Here you are

Show_predicted_25_image(test_draft_data_predict, test_draft_data_batch, test_data=True)

Result

Now save your result in CSV format

## Save predictions (numpy array) to a csv file
# Assumes full_test_data_predictions holds the model's predictions for the whole test set
np.savetxt('/content/drive/My Drive/Dog vs Cat/preds_array_cat_vs_dog_test_data.csv', full_test_data_predictions, delimiter =",")
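
`np.savetxt` writes the raw score pairs. If you instead want one row per image with its id and a 0/1 label, as the evaluation step describes, a sketch could look like this. The column names `id` and `label`, the made-up scores and the in-memory buffer are assumptions for illustration; check the competition page for the exact format it expects.

```python
import csv
import io

# Made-up file names and [score_cat, score_dog] pairs
test_names = ['12041.jpg', '12027.jpg', '12045.jpg']
scores = [[0.98, 0.02], [0.10, 0.90], [0.30, 0.70]]

buffer = io.StringIO()            # stands in for a real file on disk
writer = csv.writer(buffer, lineterminator='\n')
writer.writerow(['id', 'label'])  # assumed header
for name, (cat_score, dog_score) in zip(test_names, scores):
    image_id = int(name.split('.')[0])         # '12041.jpg' -> 12041
    label = 1 if dog_score > cat_score else 0  # 1 = dog, 0 = cat
    writer.writerow([image_id, label])

submission_text = buffer.getvalue()
print(submission_text)
```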

Predict on your own pictures

For that, save some pictures in your Google Drive. Then follow the same process as for the test data prediction.

## Load the paths of the user pictures

user_data_path = os.listdir('/content/drive/My Drive/Dog vs Cat/User')
user_file_name = ['/content/drive/My Drive/Dog vs Cat/User/' + name for name in user_data_path]

Process the image

# Create batches for the user pictures
user_data_batch = process_image(user_file_name, test_data=True)

Prediction on pictures

# Prediction on user
user_prediction = full_loaded_model.predict(user_data_batch)

Create a function to visualize your data

def cat_vs_dog_predicition_user(predicted_value, data_batch):

  # Get the label name (cat or dog) from the predicted value
  predicted_label = []
  for i in range(len(predicted_value)):
    predicted_label.append(unique_labels[np.argmax(predicted_value[i])])

  # Unbatchify the image
  images_usr = []
  for images in data_batch.unbatch().as_numpy_iterator():
    images_usr.append(images)


  # Show the images and their labels
  plt.figure(figsize=(20,20))
  for i in range(len(predicted_label)):
    # One row with as many columns as there are pictures
    plt.subplot(1, len(predicted_label), i+1)
    plt.imshow(images_usr[i])
    plt.xticks([])
    plt.yticks([])
    plt.title(predicted_label[i])
    # Turn the grid lines off
    plt.axis('off')
cat_vs_dog_predicition_user(user_prediction, user_data_batch)

Result

Look at them, they are so cute!