A community detection (CD) method is usually evaluated by what extent it is able to discover the 'ground-truth' community structure of a network. A certain 'node-centric metadata' is used to define the ground-truth partition. However, nodes in real networks often have multiple metadata types (e.g., occupation, location); each can potentially form a ground-truth partition. Our experiment with 10 CD methods on 5 datasets (having multiple metadata-based ground-truth partitions) show that the metadata-based evaluation is misleading because there is no single CD method that can outperform others by detecting all types of metadata-based partitions. We further show that the community structure obtained from the CD methods is usually topologically stronger than any metadata-based partitions. Finally, we suggest a new task-based evaluation framework for CD methods and show that a certain type of CD methods is useful for a certain type of task.