Learn Python – The itertools module in Python for data analytics
itertools lets you combine multiple lists into one, generate the permutations and combinations of a set, and find its subsets, each with a single function call.
You can also do some fun and interesting things with it, such as creating an infinite counting series, cycling through a sequence endlessly, or repeating a value a given number of times.
The module standardizes a core set of fast, memory-efficient tools that are useful by themselves or in combination.
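As a quick illustration of the "combine" and "subset" claims above, here is a minimal sketch. The helper name all_subsets and the sample lists are made up for this example and are not part of itertools; the helper simply builds every subset from chain.from_iterable() and combinations().

```python
import itertools

# Combine two lists into one sequence with chain().
first = [1, 2, 3]
second = [4, 5]
merged = list(itertools.chain(first, second))
print(merged)  # [1, 2, 3, 4, 5]


def all_subsets(items):
    """Return every subset of items as a tuple, smallest subsets first.

    Hypothetical helper for illustration; it is not an itertools function.
    """
    return list(
        itertools.chain.from_iterable(
            itertools.combinations(items, r) for r in range(len(items) + 1)
        )
    )


print(all_subsets("abc"))
# [(), ('a',), ('b',), ('c',), ('a', 'b'), ('a', 'c'), ('b', 'c'), ('a', 'b', 'c')]
```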
Let’s generate an infinite sequence of positive numbers starting from 5 using the itertools function count():
count() returns an iterator that you can use in a loop; the comments in the code show the output it actually produces.
import itertools
for i in itertools.count(5):
    if i == 100:
        break
    print(i)
# 5 6 7 8 9 10 11 12 …
# passing a second argument to count() sets the step size
for i in itertools.count(5, 0.5):
    if i == 100:
        break
    print(i)
# 5 5.5 6.0 6.5 7.0 7.5 8.0 8.5 9.0 …
Always give your loop a breaking condition, unless you really want it to run forever.
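Instead of a hand-written break, an infinite iterator can also be capped with islice() or takewhile() from the same module. This is only a sketch of that alternative; the variable names are illustrative.

```python
import itertools

# islice() takes a fixed number of items from an otherwise endless counter.
first_five = list(itertools.islice(itertools.count(5), 5))
print(first_five)  # [5, 6, 7, 8, 9]

# takewhile() stops as soon as the predicate becomes false.
below_eight = list(itertools.takewhile(lambda n: n < 8, itertools.count(5, 0.5)))
print(below_eight)  # [5, 5.5, 6.0, 6.5, 7.0, 7.5]
```

Both calls return ordinary finite iterators, so wrapping them in list() is safe.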
What if we want to loop over ABCD again and again? Just pass the sequence of letters, digits, or symbols to cycle(), the function that makes this task simple. Let’s see:
import itertools
count = 0
for i in itertools.cycle('ABCD'):
    if count > 50:
        break
    print(i)
    count = count + 1
# A B C D A B C D A B C …
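A common use of cycle() is round-robin labelling: pairing it with zip() ends the loop automatically when the finite input runs out. The task names below are invented sample data, used only to sketch the idea.

```python
import itertools

# Hypothetical task names, used only for illustration.
tasks = ["t1", "t2", "t3", "t4", "t5"]

# zip() stops when the shorter input (tasks) is exhausted,
# so no counter or break is needed.
assignments = list(zip(tasks, itertools.cycle("ABC")))
print(assignments)
# [('t1', 'A'), ('t2', 'B'), ('t3', 'C'), ('t4', 'A'), ('t5', 'B')]
```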
Suppose you have a long collection and need every possible subset that can be formed by taking two elements at a time. itertools.combinations() gives you all of them. The argument can be a NumPy array, a list, a tuple, or just a string.
import itertools
# list we pass to combinations() as a set
ls = [5, 6, 8, 7, 9, 10, 12]
for i in itertools.combinations(ls, 2):
    print(i)
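To see how many pairs that loop produces, and how combinations() differs from permutations(), the results can be collected into lists. A small sketch using the same list:

```python
import itertools

ls = [5, 6, 8, 7, 9, 10, 12]

# combinations() ignores order: (5, 6) appears, (6, 5) does not.
pairs = list(itertools.combinations(ls, 2))
print(len(pairs))  # 21, i.e. 7 * 6 / 2
print(pairs[:3])   # [(5, 6), (5, 8), (5, 7)]

# permutations() treats (5, 6) and (6, 5) as two different results.
ordered_pairs = list(itertools.permutations(ls, 2))
print(len(ordered_pairs))  # 42
```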
Let’s look at another interesting function that always comes in handy when you are cleaning data. Here we have a list of integers from which we want to keep only the positive numbers and drop zero and all the negative numbers.
import itertools
age = [18, 19, 17, 20, 0, -20, 24, -1]
filtered_age = []
cond = itertools.filterfalse(lambda x: x <= 0, age)
for a in cond:
    filtered_age.append(a)
print(filtered_age)
It outputs a list that contains only the positive ages. After all, an age cannot be 0 or negative unless we are travelling back in time ☺
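The explicit loop is not required: filterfalse() returns an iterator, so list() can consume it directly. The sketch below also shows the built-in filter() with the opposite predicate, which gives the same result.

```python
import itertools

age = [18, 19, 17, 20, 0, -20, 24, -1]

# filterfalse() keeps the items for which the predicate is False.
filtered_age = list(itertools.filterfalse(lambda x: x <= 0, age))
print(filtered_age)  # [18, 19, 17, 20, 24]

# Built-in filter() keeps the items for which the predicate is True.
same_result = list(filter(lambda x: x > 0, age))
print(same_result == filtered_age)  # True
```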
Iterators terminating on the shortest input sequence:
Iterator | Arguments | Results | Example |
accumulate() | p [,func] | p0, p0+p1, p0+p1+p2, … | accumulate([1,2,3,4,5]) --> 1 3 6 10 15 |
chain() | p, q, … | p0, p1, … plast, q0, q1, … | chain('ABC', 'DEF') --> A B C D E F |
chain.from_iterable() | iterable | p0, p1, … plast, q0, q1, … | chain.from_iterable(['ABC', 'DEF']) --> A B C D E F |
compress() | data, selectors | (d[0] if s[0]), (d[1] if s[1]), … | compress('ABCDEF', [1,0,1,0,1,1]) --> A C E F |
dropwhile() | pred, seq | seq[n], seq[n+1], starting when pred fails | dropwhile(lambda x: x<5, [1,4,6,4,1]) --> 6 4 1 |
filterfalse() | pred, seq | elements of seq where pred(elem) is false | filterfalse(lambda x: x%2, range(10)) --> 0 2 4 6 8 |
groupby() | iterable[, key] | sub-iterators grouped by value of key(v) | |
islice() | seq, [start,] stop [, step] | elements from seq[start:stop:step] | islice('ABCDEFG', 2, None) --> C D E F G |
starmap() | func, seq | func(*seq[0]), func(*seq[1]), … | starmap(pow, [(2,5), (3,2), (10,3)]) --> 32 9 1000 |
takewhile() | pred, seq | seq[0], seq[1], until pred fails | takewhile(lambda x: x<5, [1,4,6,4,1]) --> 1 4 |
tee() | it, n | it1, it2, … itn splits one iterator into n | |
zip_longest() | p, q, … | (p[0], q[0]), (p[1], q[1]), … | zip_longest('ABCD', 'xy', fillvalue='-') --> Ax By C- D- |
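The table has no example for groupby() or tee(), so here is a rough sketch of both, plus accumulate() with its optional function argument. The word list is invented sample data.

```python
import itertools

# groupby() only groups *consecutive* items, so sort by the key first.
words = ["apple", "avocado", "banana", "blueberry", "cherry"]  # sample data
grouped = {
    letter: list(group)
    for letter, group in itertools.groupby(sorted(words), key=lambda w: w[0])
}
print(grouped)
# {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}

# tee() splits one iterator into n independent iterators.
evens, evens_copy = itertools.tee(range(0, 10, 2), 2)
print(list(evens))      # [0, 2, 4, 6, 8]
print(sum(evens_copy))  # 20

# accumulate() with a custom function: a running maximum.
running_max = list(itertools.accumulate([3, 1, 4, 1, 5], max))
print(running_max)  # [3, 3, 4, 4, 5]
```

Note that the groups yielded by groupby() are themselves iterators, which is why each one is turned into a list before it is stored.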
Infinite iterators:
Iterator | Arguments | Results | Example |
count() | start, [step] | start, start+step, start+2*step, … | count(10) --> 10 11 12 13 14 … |
cycle() | p | p0, p1, … plast, p0, p1, … | cycle('ABCD') --> A B C D A B C D … |
repeat() | elem [,n] | elem, elem, elem, … endlessly or up to n times | repeat(10, 3) --> 10 10 10 |
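repeat() is often used to feed a constant argument to map() or zip(). A minimal sketch, relying on the fact that map() stops at its shortest input:

```python
import itertools

# repeat(2) supplies the exponent for every call to pow().
squares = list(map(pow, range(5), itertools.repeat(2)))
print(squares)  # [0, 1, 4, 9, 16]

# With a count, repeat() is finite on its own.
print(list(itertools.repeat(10, 3)))  # [10, 10, 10]
```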
Combinatoric iterators:
Iterator | Arguments | Results |
product() | p, q, … [repeat=1] | cartesian product, equivalent to a nested for-loop |
permutations() | p[, r] | r-length tuples, all possible orderings, no repeated elements |
combinations() | p, r | r-length tuples, in sorted order, no repeated elements |
combinations_with_replacement() | p, r | r-length tuples, in sorted order, with repeated elements |
Examples | Results |
product('ABCD', repeat=2) | AA AB AC AD BA BB BC BD CA CB CC CD DA DB DC DD |
permutations('ABCD', 2) | AB AC AD BA BC BD CA CB CD DA DB DC |
combinations('ABCD', 2) | AB AC AD BC BD CD |
combinations_with_replacement('ABCD', 2) | AA AB AC AD BB BC BD CC CD DD |
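The table describes product() as equivalent to a nested for-loop. The sketch below checks that on two short strings, and shows how to join the result tuples to get the compact form used in the Results column.

```python
import itertools

from_product = list(itertools.product("AB", "xy"))

# The same pairs built with explicit nested loops.
from_loops = []
for left in "AB":
    for right in "xy":
        from_loops.append((left, right))

print(from_product)  # [('A', 'x'), ('A', 'y'), ('B', 'x'), ('B', 'y')]
print(from_product == from_loops)  # True

# Each result is a tuple; join the tuples to get the compact form.
compact = " ".join("".join(p) for p in itertools.combinations("ABCD", 2))
print(compact)  # AB AC AD BC BD CD
```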