Course Content
Optimization Techniques in Python
Optimization Techniques in Python
Sets and Tuples
Before we proceed with sets and tuples, it is important to mention that we won't discuss dictionaries here.
Set
A set offers an average O(1) time complexity for insertions, deletions, and lookups, meaning these operations are performed in constant time, regardless of the set’s size. This makes sets much faster than lists for membership testing and operations like adding or removing elements, where lists require O(n) time complexity (time grows linearly with the size of the list) in the worst case.
When to use:
- You need unique elements, ensuring that there are no duplicates in your collection;
- Fast membership testing is required, making sets ideal for tasks like checking for the existence of an item;
- You are performing operations such as set unions, intersections, or differences, which sets support with optimized methods;
- Order does not matter, as sets are inherently unordered, and there’s no need to maintain insertion order.
# Removing duplicates from a list using a set numbers = [1, 3, 2, 3, 5, 4, 5] unique_numbers = set(numbers) print(f'Unique Numbers: {unique_numbers}') # Fast membership testing names = {'Alice', 'Bob', 'Charlie'} print(f'Is Alice in the set? {"Alice" in names}') print(f'Is Eve in the set? {"Eve" in names}') # Set operations: union, intersection, and difference set_a = {1, 4, 3, 2} set_b = {3, 5, 4, 6} print(f'Union: {set_a.union(set_b)}') print(f'Intersection: {set_a.intersection(set_b)}') print(f'Difference: {set_a.difference(set_b)}') # Removing elements from a set safely with discard names.discard('Alice') # Safe removal, no error if the element doesn't exist print(f'Names after removal: {names}')
Let's now compare the performance of a set to a list in membership testing:
import os os.system('wget https://staging-content-media-cdn.codefinity.com/courses/8d21890f-d960-4129-bc88-096e24211d53/section_1/chapter_3/decorators.py 2>/dev/null') from decorators import timeit_decorator # Create a large list and set with the same elements large_list = list(range(10000000)) large_set = set(large_list) # Test membership for an element at the end element_to_find = 9999999 @timeit_decorator(number=50) def test_membership(element, collection): return element in collection print('List:') print(test_membership(element_to_find, large_list)) print('Set:') print(test_membership(element_to_find, large_set))
Tuple
Tuples are typically used when you need to ensure that the data cannot be changed or as a key in a dict
or element in set
(because tuples are hashable).
-
Better than lists: when you need immutable data, want to use the collection as a dictionary key or set element, or when you need memory-efficient storage for a fixed-size collection of items;
-
Better than NumPy arrays: when your data is non-numerical or when immutability is crucial. While NumPy arrays are designed for numerical computations and are mutable by default, tuples provide safety for non-numerical data or small, structured collections that must remain constant.
# Each tuple in the list represents an immutable student record students = [ (1834, 'James', 'Johnson'), (2749, 'Alice', 'Smith'), (4923, 'Bob', 'Brown') ] # Attempting to modify a tuple will raise a TypeError students[0][1] = 'Fred'
Since the students
list contains student records (ID, first name, last name) that need to be read-only, it is better to use tuples for each record instead of lists. Additionally, as mentioned above, tuples are slightly more memory-efficient compared to lists.
1. Which of the following scenarios is best suited for using a set instead of a list?
2. You have a dataset with millions of records and need to frequently check if specific values exist within it. Which data structure is the most efficient for this purpose?
3. You are creating a record for each student that includes a unique ID, first name, and last name. The data should not be altered once created. Which data structure would be the most appropriate?
Thanks for your feedback!