This paper presents the design and performance evaluation of our high-performance decoupled architecture, the HiDISC (Hierarchical Decoupled Instruction Stream Computer). HiDISC provides low memory access latency by introducing enhanced data prefetching techniques at both the hardware and the software levels. Three processors, one for each level of the memory hierarchy, act in concert to mask the memory latency. Our performance evaluation benchmarks include the Data-Intensive Systems Benchmark suite and the DIS Stressmark suite. Our simulation results point to a distinct advantage of the HiDISC system over current prevailing superscalar architectures for both sets of the benchmarks. On the average, a 12% improvement in performance is achieved while 17% of cache misses are eliminated.