In the last decade, NAND flash-based SSDs have been widely adopted for high-end enterprise systems in an attempt to provide a high-performance and reliable storage. However, inferior performance is frequently attained mainly due to the need for Garbage Collection (GC). GC in flash memory is the process of identifying and clearing the blocks of unneeded data to create space for the new data to be allocated. GC is a high-latency operation and once it is scheduled for service to a block of a plane in a flash chip (each flash chip consists of multiple planes), it can increase latency for later arriving I/O requests to the same plane. Apart from that, the consequent high latency also keep other planes of the same chip, that are not involved in this GC, idle for a long time. We show that for the baseline SSD with modern FTL, GC considerably reduces the plane-level parallelism, causing significant performance degradation. There are several circuit-level constraints that make it difficult to allow subsequent I/O operations and/or GCs to be served concurrently from the same chip, but different planes, during the long latency GC. This paper proposes a novel GC strategy, called Parallel GC (PaGC), whose goal is to proactively run GC on the remaining planes of a flash chip whenever any of its planes needs to execute on-demand GC. The resulting PaGC system boosts the response time of I/O requests by up to 45% (32% on average) for different GC settings and across a wide spectrum of enterprise I/O workloads.