The Convolutional Neural Network (CNN), a neural network trained on labeled data and used for tasks
such as image classification and speech recognition, has been continuously developed toward
higher-performance architectures. However, such networks are difficult to deploy on embedded systems
with limited resources. To address this problem, we use GPGPU (General-Purpose computing on Graphics
Processing Units), which applies the GPU to general-purpose computation, together with pre-trained
weights, but limitations remain. Because a CNN consists of simple, repetitive operations, its
computation speed on a Single Instruction Multiple Thread (SIMT)-based GPGPU varies greatly depending on
how threads are allocated and utilized. In particular, when convolution and pooling operations are
mapped onto threads, some threads are left idle. We therefore use these remaining threads to compute the
subsequent feature maps and kernels, which increases the operation speed.
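To make the idle-thread argument concrete, the following Python sketch compares thread utilization under two mappings. It assumes one thread per output element and a fixed 1-D block size. The function names and the 256-thread block are hypothetical and are not taken from the paper. In the naive mapping each output feature map gets its own blocks, so the leftover threads in each map's last block sit idle. In the packed mapping those leftover threads are given outputs from the following feature maps, so only the final block of the whole launch can be partly idle.

```python
import math


def naive_utilization(out_h, out_w, num_maps, block_size):
    """Fraction of launched threads doing useful work when every output
    feature map gets its own blocks (one thread per output element)."""
    work_per_map = out_h * out_w
    blocks_per_map = math.ceil(work_per_map / block_size)
    launched = blocks_per_map * block_size * num_maps
    return (work_per_map * num_maps) / launched


def packed_utilization(out_h, out_w, num_maps, block_size):
    """Same measure, but threads left over after one feature map are given
    outputs of the following maps, so only the very last block can idle."""
    total_work = out_h * out_w * num_maps
    launched = math.ceil(total_work / block_size) * block_size
    return total_work / launched


if __name__ == "__main__":
    # Hypothetical layer: six 14x14 pooled maps, 256-thread blocks.
    print(naive_utilization(14, 14, 6, 256))   # 0.765625
    print(packed_utilization(14, 14, 6, 256))  # 0.91875
```

A 14x14 map has 196 outputs, so a 256-thread block leaves 60 threads idle per map (about 76.6% utilization). Packing six such maps into consecutive blocks raises utilization to about 91.9% for this assumed configuration.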
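The index mapping behind such a packed scheme can be emulated on the CPU. This sketch assumes one thread per convolution output, a 1-D block of `block_size` threads, and a flat global thread id decoded into (feature map, row, column). It illustrates the general idea and is not the authors' GPU kernel. `packed_conv_launch` is a hypothetical name. Any thread whose id falls past the last output is counted as idle, and the results are checked against a plain valid convolution.

```python
def conv2d_valid(img, kernel):
    """Reference 'valid' 2-D convolution (cross-correlation, as in CNNs)."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(img) - kh + 1, len(img[0]) - kw + 1
    return [[sum(img[y + i][x + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for x in range(ow)]
            for y in range(oh)]


def packed_conv_launch(img, kernels, block_size):
    """Emulate a SIMT launch in which consecutive thread ids cover every
    output of every kernel, so threads left over from one feature map
    compute the next map instead of idling. Returns (outputs, idle_count)."""
    kh, kw = len(kernels[0]), len(kernels[0][0])
    oh, ow = len(img) - kh + 1, len(img[0]) - kw + 1
    per_map = oh * ow
    total = per_map * len(kernels)
    num_blocks = -(-total // block_size)       # ceiling division
    out = [[[0] * ow for _ in range(oh)] for _ in kernels]
    idle = 0
    for block in range(num_blocks):
        for tid in range(block_size):          # threads of one block
            gid = block * block_size + tid     # global thread id
            if gid >= total:                   # no work left: thread idles
                idle += 1
                continue
            m, rem = divmod(gid, per_map)      # which feature map / kernel
            y, x = divmod(rem, ow)             # which output pixel
            k = kernels[m]
            out[m][y][x] = sum(img[y + i][x + j] * k[i][j]
                               for i in range(kh) for j in range(kw))
    return out, idle


if __name__ == "__main__":
    img = [[(r * 6 + c) % 7 for c in range(6)] for r in range(6)]
    kernels = [[[1, 0, -1]] * 3,
               [[1, 1, 1], [0, 0, 0], [-1, -1, -1]],
               [[0, 1, 0], [1, -4, 1], [0, 1, 0]]]
    out, idle = packed_conv_launch(img, kernels, block_size=32)
    assert all(out[m] == conv2d_valid(img, kernels[m]) for m in range(3))
    print("idle threads:", idle)  # idle threads: 16
```

With three 3x3 kernels on a 6x6 input (16 outputs per map) and 32-thread blocks, the packed launch needs two blocks and leaves 16 threads idle. Giving each map its own block would need three blocks and leave 48 threads idle.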