Efficient cache tag management is a primary design objective for large, in-package DRAM caches. Recently, Tagless DRAM Caches (TDCs) have been proposed to completely eliminate tagging structures, a major scalability bottleneck for future multi-gigabyte DRAM caches, from both on-die SRAM and in-package DRAM. However, because TDC takes a unified approach to address translation and cache tag management, it constrains the DRAM cache block size to equal the OS page size (e.g., 4KB). Caching at page granularity, or page-based caching, wastes significant off-package DRAM bandwidth by over-fetching blocks within a page that are never used. Footprint caching is an effective solution to this problem: it fetches only those blocks that are likely to be touched during the page's lifetime in the DRAM cache, referred to as the page's footprint. In this paper, we demonstrate that TDC opens up unique opportunities to realize efficient footprint caching with higher prediction accuracy and lower hardware cost than the original footprint caching scheme. Since TDC has no cache tags, the footprints of cached pages are tracked in the TLB instead of the cache tag array, incurring much lower on-die storage overhead than the original design. In addition, when a cached page is evicted, its footprint is stored in the corresponding page table entry instead of an auxiliary on-die structure (i.e., the Footprint History Table); this prevents footprint thrashing among different pages and thus yields higher footprint prediction accuracy. The resulting design, called Footprint-augmented Tagless DRAM Cache (F-TDC), significantly improves the bandwidth efficiency of TDC, and hence its performance and energy efficiency. Our evaluation with 3D Through-Silicon-Via-based in-package DRAM shows that F-TDC reduces off-package bandwidth by 32.0% on average, which in turn improves IPC and EDP by 17.7% and 25.4%, respectively, over the state-of-the-art TDC without footprint caching.
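The footprint bookkeeping described above can be illustrated with a minimal Python sketch. It is not the paper's hardware design: the 64B sub-block size (so a 4KB page's footprint fits in a 64-bit mask), the `PageTableEntry` and `TLBEntry` classes, and the helper functions are all hypothetical. As simplifications, it assumes that a page with no saved footprint is fetched whole and that the page's TLB entry is still resident when the page is evicted from the DRAM cache.

```python
PAGE_SIZE = 4096                          # OS page size = TDC cache block size (from the text)
SUB_BLOCK = 64                            # hypothetical fetch granularity within a page
BLOCKS_PER_PAGE = PAGE_SIZE // SUB_BLOCK  # 64 sub-blocks -> one 64-bit footprint mask


class PageTableEntry:
    """Hypothetical PTE with a field holding a saved footprint (None = no history yet)."""
    def __init__(self):
        self.footprint = None


class TLBEntry:
    """Hypothetical TLB entry recording which sub-blocks are touched while the page is cached."""
    def __init__(self, pte):
        self.pte = pte
        self.footprint = 0

    def access(self, offset):
        # Set the bit for the sub-block containing this byte offset within the page.
        self.footprint |= 1 << (offset // SUB_BLOCK)


def blocks_to_fetch(pte, demand_offset):
    """Sub-blocks to fetch on a DRAM cache miss: the saved footprint plus the demanded block.
    Assumption: with no saved footprint, fall back to fetching the whole page."""
    if pte.footprint is None:
        return list(range(BLOCKS_PER_PAGE))
    mask = pte.footprint | (1 << (demand_offset // SUB_BLOCK))
    return [i for i in range(BLOCKS_PER_PAGE) if (mask >> i) & 1]


def evict(tlb_entry):
    # On page eviction, save the footprint in the page table entry, not an on-die history table.
    tlb_entry.pte.footprint = tlb_entry.footprint


# Usage: first miss fetches the whole page; after eviction, a refill fetches only the footprint.
pte = PageTableEntry()
first = blocks_to_fetch(pte, 0)
tlb = TLBEntry(pte)
tlb.access(0)
tlb.access(5 * SUB_BLOCK + 10)
evict(tlb)
refetch = blocks_to_fetch(pte, 0)
print(len(first) * SUB_BLOCK, len(refetch) * SUB_BLOCK)  # 4096 128
```

Keeping the saved footprint in a per-page PTE field, rather than in a fixed-size on-die table indexed by some hash of the page, means two pages can never overwrite each other's history, which is the thrashing the text refers to.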