Traffic measurement is key to many network management tasks, such as performance monitoring and cyber-security. Its aim is to inspect the packet stream passing through a network device, classify packets into flows according to their header fields, and obtain statistics about those flows. To process high-volume packet streams within the limited SRAM of line cards, many sublinear-space algorithms have been proposed, such as CountMin and CountSketch. However, most of them are designed for specific measurement tasks, and implementing multiple independent sketches places a heavy burden on the online operation of a network device. It is therefore highly desirable to design a universal sketch that not only tracks individual large flows (called heavy hitters) but also reports overall traffic distribution statistics (called moments). The prior work UnivMon successfully tackled this ambitious goal. However, it incurs large and variable per-packet processing overhead: each packet requires 65 hashes and 64 memory accesses on average, and many times that in the worst case, which can become a significant throughput bottleneck for high-rate packet streams. To address this performance issue, we fundamentally redesign the solution architecture, replacing hierarchical sampling with a new progressive sampling scheme and replacing CountSketch with a new sketch, ActiveCM+. Together, these ensure that the per-packet overhead is a small constant (4 hashes and 4 memory accesses) even in the worst case, making the design much more suitable for online operation, especially in pipeline implementations. The new design is also intended to reduce the memory footprint or, equivalently, to improve measurement accuracy under the same memory budget. Our experiments show that our solution incurs just one-sixteenth of UnivMon's per-packet overhead while improving measurement accuracy threefold under the same memory.
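To make the "small constant per-packet overhead" property concrete, the following is a minimal sketch of a standard CountMin sketch, one of the baselines named above. It is not ActiveCM+ or progressive sampling, whose details are not given here. The depth of 4 rows is an illustrative choice that mirrors the 4-hash, 4-access budget: every update computes exactly one hash per row and touches exactly one counter per row, regardless of the traffic. The flow-key strings, the `CountMin` class, and the use of `hashlib.blake2b` with a row prefix to derive per-row hashes are assumptions made for illustration.

```python
import hashlib


class CountMin:
    """Illustrative Count-Min sketch (hypothetical helper, not ActiveCM+).

    The sketch has `depth` rows of `width` counters. Every update hashes the
    flow key once per row and increments one counter per row, so the cost
    per packet is a fixed `depth` hashes and `depth` counter accesses.
    """

    def __init__(self, depth: int = 4, width: int = 1024) -> None:
        self.depth = depth
        self.width = width
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row: int, flow_id: str) -> int:
        # Derive a distinct hash per row by prefixing the row number (assumption).
        digest = hashlib.blake2b(f"{row}:{flow_id}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "little") % self.width

    def update(self, flow_id: str, count: int = 1) -> None:
        # Constant work per packet: one hash and one counter increment per row.
        for r in range(self.depth):
            self.rows[r][self._index(r, flow_id)] += count

    def query(self, flow_id: str) -> int:
        # Hash collisions only add to a counter, so every row overestimates;
        # the minimum across rows is the tightest estimate.
        return min(self.rows[r][self._index(r, flow_id)] for r in range(self.depth))


# Usage: one large flow (a heavy hitter) mixed with many small flows.
cm = CountMin(depth=4, width=256)
for _ in range(1000):
    cm.update("10.0.0.1->10.0.0.2:80/TCP")
for i in range(500):
    cm.update(f"flow-{i}")
print(cm.query("10.0.0.1->10.0.0.2:80/TCP"))  # at least 1000; CountMin never underestimates
```

The fixed loop over `depth` rows is what makes the per-packet cost independent of the input, which is the property that suits pipelined hardware. By contrast, a design whose number of hash and memory operations varies per packet is harder to map onto a fixed pipeline.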