- Published on
7. List, set, and dictionary in Pyhon
- Authors
7. If you have to choose between a list, set, and a dictionary to store 10 million integers, what will you use? Bear in mind that you would later like to query the frequency of a number within the dataset.
The problem:
- Store 10 million integers.
- Later query the frequency of any number efficiently.
Option 1: List
- Stores duplicates, keeps order.
- To find frequency:
list.count(x)
→ O(n) time. - With 10 million integers, frequency queries would be very slow.
- Memory: stores all values including duplicates (could be large).
Option 2: Set
- Stores only unique elements (no duplicates).
- No frequency info is stored, so you cannot query frequency.
- Good for membership tests (is x present?), but not frequency.
- Memory: smaller than list if many duplicates.
collections.Counter
)
Option 3: Dictionary (or - Keys: integers; Values: frequency counts.
- Frequency queries: O(1) average time.
- Stores frequency directly, so you get what you need efficiently.
- Memory: overhead for dictionary, but usually acceptable for 10 million entries (especially if many duplicates).
Recommendation:
Use a dictionary or collections.Counter
to store frequencies.
Example:
from collections import Counter
data = [...] # your 10 million integers list
freq = Counter(data)
# Query frequency of a number:
print(freq[42]) # O(1) time
Summary:
Data Structure | Supports Frequency Query? | Query Time | Memory Use |
---|---|---|---|
List | Yes (with count()) | O(n) (slow) | High (stores all values) |
Set | No | N/A | Lower (unique values only) |
Dictionary | Yes | O(1) (fast) | Higher (stores counts) |
Final answer:
Use a dictionary (Counter
) to efficiently store and query frequencies.