{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "
My goal in this series is to deploy a neural network capable of identifying and localizing pedestrians in an image (the combination of image classification and localization is called object detection). In the first part of the series, I will do this by downloading a pretrained model and using transfer learning to fine-tune it for my problem. In the second part, I will try to create and deploy a model from scratch.
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": { "toc": true }, "source": [ "My main motivation for this project was simply to gain an understanding of deep learning, a field I knew nothing about prior to this project. Furthermore, object detection problems tend to involve much deeper networks than object classification, and the concepts involved here are highly transferable across other deep learning domains.
\n", "Of course, object detection has some pretty cool applications in its own right. For example, security companies build on top of object detection models to do things like gait analysis and tracking people across multiple cameras. Self-driving cars need to be able to perform object detection to avoid hitting pedestrians and other cars. And one could imagine lots of fun home applications for object detection.
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Why Transfer Learning" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Practically speaking, you will almost always use transfer learning when dealing with neural network. The reason for this is twofold:\n", "1. Using a pretrained model drastically cuts down the amount of resources needed to fine tune a network. By using a pretrained model you will require fewer training samples and less computing time for a comparable level of accuracy.\n", "2. It is unlikely that you will be creating a deep learning model that is completely new. Most new models are really variations of existing models, so it makes sense to take advantage of existing work\n", "Consequently, transfer learning is one of the most important skills you can have with respect to deep learning" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Background" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## What is a Neural Network" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "To understand what a neural network is, we can break the concept down into two components, structure and learning. \n", "\n", "**Structure**: Although different model types can have additional components, all neural networks have an input layer, some number of hidden layers, and an output layer, as seen in the picture bellow.\n", "\n", "The data I am using comes from https://www.kaggle.com/smeschke/pedestrian-dataset#crosswalk.csv. It consists of 3 videos of pedestrians using crosswalks in different situations as well as CSV files for each video which give the bounding box information for the pedestrians in the video in the following format, (x, y, height, width).
\n", "My goal in this section is to break down each video into its component frames in jpg format. I then need to create a data frame which contains the following columns: file path, width, height, class, xmin, xmax, ymin, ymax
\n", "It is also worth noting that all of the models in the object detection API are size agnostic. They perform all necessary image padding and scaling for you. However, I will go over how to do data standardization and augmentation in the next post.
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Extract Images from Video" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Because the data comes as video files, I will use cv2 to read it and capture frames. The code bellow carries out the following steps:\n", "\n", "1. Gets a list of videos in the data directories\n", "2. Defines a function which\n", " 1. reads a video file\n", " 2. creates a directory named after the video if none exists\n", " 3. captures a frame and saves it as a jpeg to the directory\n", "3. applies the function to all files in the list" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [], "source": [ "import cv2\n", "import pandas as pd\n", "import numpy as np\n", "import os" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "#get list of video files\n", "dataPath = 'data/'\n", "dataFiles = os.listdir(dataPath)\n", "videoFiles = [dataPath+file for file in dataFiles if file.endswith('.avi')]" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "#define function to turn video into images\n", "def frame_capture(path): \n", " \n", " cap = cv2.VideoCapture(path) \n", " currentFrame = 0\n", " directory = path.strip('.avi')\n", " try:\n", " if not os.path.exists(directory):\n", " os.makedirs(directory)\n", " except OSError:\n", " print ('Error: Creating directory of data')\n", " # checks whether frames were extracted \n", " success = 1\n", "\n", " while success:\n", " # Capture frame-by-frame\n", " success, frame = cap.read()\n", "\n", " # Saves image of the current frame in jpg file\n", " name = directory + '/frame' + str(currentFrame) + '.jpg'\n", " cv2.imwrite(name, frame)\n", "\n", " # To stop duplicate images\n", " currentFrame += 1\n", "\n", " # When everything done, release the capture\n", " cap.release()\n", " cv2.destroyAllWindows()\n", " " ] }, { "cell_type": "code", "execution_count": null, 
"metadata": {}, "outputs": [], "source": [ "#turn each video file into a directory of image files\n", "for file in videoFiles:\n", " frame_capture(file)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create CSV files for training/testing" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that I have my images, I need to combine my image file paths with my bounding box data in the format required by the object detection API. I also need to split my data into a training set and testing set." ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [], "source": [ "boundingBoxFiles = ['data/night.csv', 'data/fourway.csv', 'data/crosswalk.csv'] " ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [], "source": [ "#Reformat CSV data into required format\n", "pedestrian_labels = pd.DataFrame()\n", "for file in boundingBoxFiles:\n", " name = file.replace('/','.').split('.')[1]\n", " df = pd.read_csv(file)\n", " new_df = pd.DataFrame()\n", " #create columns for new dataframe\n", " new_df['filename'] = df.index.astype(str)\n", " new_df['filename'] = 'data/'+ name+ '/frame'+ new_df['filename']+ '.jpg'\n", " new_df['width'] = df.w\n", " new_df['height'] = df.h\n", " new_df['class'] = 'pedestrian'\n", " new_df['xmin'] = df.x\n", " new_df['ymin'] = df.y\n", " new_df['xmax'] = df.x + df.w\n", " new_df['ymax'] = df.y + df.h\n", " #store to central data frame\n", " pedestrian_labels = pedestrian_labels.append(new_df)" ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [], "source": [ "#split data into training and testing set, then save file\n", "train_labels = pedestrian_labels.sample(frac=0.8,random_state=17)\n", "test_labels = pedestrian_labels.drop(train_labels.index)\n", "pedestrian_labels.to_csv('data/pedestrian_labels.csv', index=False)\n", "train_labels.to_csv('data/train_labels.csv', index=False)\n", "test_labels.to_csv('data/test_labels.csv', index=False)" ] }, { 
"cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " | filename | \n", "width | \n", "height | \n", "class | \n", "xmin | \n", "ymin | \n", "xmax | \n", "ymax | \n", "
---|---|---|---|---|---|---|---|---|
118 | \n", "data/night/frame118.jpg | \n", "156 | \n", "312 | \n", "pedestrian | \n", "1631 | \n", "473 | \n", "1787 | \n", "785 | \n", "
400 | \n", "data/night/frame400.jpg | \n", "172 | \n", "345 | \n", "pedestrian | \n", "1199 | \n", "484 | \n", "1371 | \n", "829 | \n", "
439 | \n", "data/night/frame439.jpg | \n", "145 | \n", "291 | \n", "pedestrian | \n", "1040 | \n", "470 | \n", "1185 | \n", "761 | \n", "
452 | \n", "data/night/frame452.jpg | \n", "143 | \n", "286 | \n", "pedestrian | \n", "966 | \n", "469 | \n", "1109 | \n", "755 | \n", "
455 | \n", "data/night/frame455.jpg | \n", "141 | \n", "283 | \n", "pedestrian | \n", "946 | \n", "466 | \n", "1087 | \n", "749 | \n", "