Learn Python – The itertools module in Python for data analytics

Itertools lets you combine multiple lists into one, generate the permutations and combinations of a set, and find its subsets, each with a single function call.

You can also do some fun and interesting things with it, such as creating an infinite counting series of numbers, cycling through a sequence endlessly, or repeating a value a given number of times.

The module standardizes a core set of fast, memory efficient tools that are useful by themselves or in combination.
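
As a quick illustration of combining multiple lists into one, here is a minimal sketch using chain(). The quarterly sales lists are made-up sample data, not something from the itertools documentation.

```python
from itertools import chain

# Hypothetical sample data, used only for illustration.
q1_sales = [120, 95, 130]
q2_sales = [110, 140]
q3_sales = [105]

# chain() walks each list in turn without building an intermediate copy.
all_sales = list(chain(q1_sales, q2_sales, q3_sales))
print(all_sales)  # [120, 95, 130, 110, 140, 105]

# chain.from_iterable() does the same when the lists are already nested.
nested = [q1_sales, q2_sales, q3_sales]
flattened = list(chain.from_iterable(nested))
print(flattened)  # [120, 95, 130, 110, 140, 105]
```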

Let’s generate an infinite sequence of positive numbers starting from 5 using the itertools function count():

count() returns an iterator that you can use in a loop. The comments in the code show the output it actually produces.
import itertools

# Count upwards from 5 and stop once we reach 100
for i in itertools.count(5):
    if i == 100:
        break
    print(i)
# 5 6 7 8 9 10 11 12 …

# Passing a second argument to count() sets the step size
for i in itertools.count(5, 0.5):
    if i >= 100:
        break
    print(i)
# 5 5.5 6.0 6.5 7.0 7.5 8.0 8.5 9.0 …
Always give your loop a break condition, unless you really want it to run forever.
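
One alternative to a manual break is to cap the infinite iterator with islice(), which is also part of itertools. This is only a sketch of that option; the cut-off values 5 and 4 are arbitrary.

```python
from itertools import count, islice

# islice() stops after a fixed number of items, so no manual break is needed.
first_five = list(islice(count(5), 5))
print(first_five)  # [5, 6, 7, 8, 9]

# The same idea with a step of 0.5.
first_four_halves = list(islice(count(5, 0.5), 4))
print(first_four_halves)  # [5, 5.5, 6.0, 6.5]
```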
What if we need to loop over A, B, C, D again and again? Just call cycle() and pass it a sequence of letters, digits, or symbols. Let’s see how simple it makes the task:

import itertools

# cycle() never ends by itself, so we count the items and break manually
count = 0
for i in itertools.cycle('ABCD'):
    if count > 50:
        break
    print(i)
    count += 1
# A B C D A B C D A B C …
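
A common practical use of cycle() is round-robin assignment. The sketch below pairs a finite list with an endless cycle; the task and worker names are hypothetical.

```python
from itertools import cycle

# Hypothetical task and worker names, used only for illustration.
tasks = ["load", "clean", "merge", "plot", "report"]
workers = ["A", "B"]

# zip() stops when tasks runs out, so the infinite cycle needs no break.
assignments = list(zip(tasks, cycle(workers)))
print(assignments)
# [('load', 'A'), ('clean', 'B'), ('merge', 'A'), ('plot', 'B'), ('report', 'A')]
```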

Suppose you have a long set and need every possible subset formed by taking two elements at a time. itertools.combinations() gives you all of them. The argument can be any iterable, such as a NumPy array, a list, a tuple, or a string.

import itertools

# The list we pass to combinations() as our set
ls = [5, 6, 8, 7, 9, 10, 12]
for i in itertools.combinations(ls, 2):
    print(i)
# (5, 6) (5, 8) (5, 7) (5, 9) (5, 10) (5, 12) (6, 8) …
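
To check how many pairs to expect, the count can be compared against math.comb (available from Python 3.8). The sketch also filters the pairs by their sum; the target 15 is an arbitrary value chosen for this example.

```python
from itertools import combinations
from math import comb

ls = [5, 6, 8, 7, 9, 10, 12]

pairs = list(combinations(ls, 2))
# 7 elements taken 2 at a time gives 7 * 6 / 2 = 21 pairs.
print(len(pairs), comb(len(ls), 2))  # 21 21

# Keep only the pairs whose sum is 15 (arbitrary target for illustration).
pairs_15 = [p for p in pairs if sum(p) == 15]
print(pairs_15)  # [(5, 10), (6, 9), (8, 7)]
```
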
Let’s look at another method that will always come in handy when you are cleaning data. Here we have a list of integers from which we want to keep only the positive numbers and drop zero and every negative number.
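import itertools

age = [18, 19, 17, 20, 0, -20, 24, -1]
# filterfalse() keeps the items for which the condition is False,
# so every age that is NOT <= 0 survives
cond = itertools.filterfalse(lambda x: x <= 0, age)
filtered_age = list(cond)
print(filtered_age)
# [18, 19, 17, 20, 24]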
It outputs a list containing only the positive ages; after all, an age cannot be 0 or negative unless we are travelling back in time.
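
filterfalse() is the mirror image of the built-in filter(). As a sketch, the same condition can split the data into kept and rejected values; the helper name is_invalid is introduced here only for illustration.

```python
from itertools import filterfalse

age = [18, 19, 17, 20, 0, -20, 24, -1]

def is_invalid(x):
    """Treat zero and negative ages as invalid (the same rule as x <= 0)."""
    return x <= 0

valid = list(filterfalse(is_invalid, age))  # items where the predicate is False
rejected = list(filter(is_invalid, age))    # items where the predicate is True
print(valid)     # [18, 19, 17, 20, 24]
print(rejected)  # [0, -20, -1]
```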

Iterators terminating on the shortest input sequence:

| Iterator | Arguments | Results | Example |
|---|---|---|---|
| accumulate() | p [,func] | p0, p0+p1, p0+p1+p2, … | accumulate([1,2,3,4,5]) --> 1 3 6 10 15 |
| chain() | p, q, … | p0, p1, … plast, q0, q1, … | chain('ABC', 'DEF') --> A B C D E F |
| chain.from_iterable() | iterable | p0, p1, … plast, q0, q1, … | chain.from_iterable(['ABC', 'DEF']) --> A B C D E F |
| compress() | data, selectors | (d[0] if s[0]), (d[1] if s[1]), … | compress('ABCDEF', [1,0,1,0,1,1]) --> A C E F |
| dropwhile() | pred, seq | seq[n], seq[n+1], starting when pred fails | dropwhile(lambda x: x<5, [1,4,6,4,1]) --> 6 4 1 |
| filterfalse() | pred, seq | elements of seq where pred(elem) is false | filterfalse(lambda x: x%2, range(10)) --> 0 2 4 6 8 |
| groupby() | iterable[, key] | sub-iterators grouped by value of key(v) | |
| islice() | seq, [start,] stop [, step] | elements from seq[start:stop:step] | islice('ABCDEFG', 2, None) --> C D E F G |
| starmap() | func, seq | func(*seq[0]), func(*seq[1]), … | starmap(pow, [(2,5), (3,2), (10,3)]) --> 32 9 1000 |
| takewhile() | pred, seq | seq[0], seq[1], until pred fails | takewhile(lambda x: x<5, [1,4,6,4,1]) --> 1 4 |
| tee() | it, n | it1, it2, … itn splits one iterator into n | |
| zip_longest() | p, q, … | (p[0], q[0]), (p[1], q[1]), … | zip_longest('ABCD', 'xy', fillvalue='-') --> Ax By C- D- |
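
The table has no examples for groupby() and tee(), so here is a minimal sketch of both. The city readings are invented sample data; the sort before grouping matters because groupby() only merges adjacent items with the same key.

```python
from itertools import groupby, tee
from operator import itemgetter

# Hypothetical (city, temperature) readings, used only for illustration.
readings = [("Pune", 31), ("Delhi", 38), ("Pune", 29), ("Delhi", 40)]

# groupby() only merges adjacent items, so sort by the same key first.
by_city = itemgetter(0)
grouped = {
    city: [temp for _, temp in rows]
    for city, rows in groupby(sorted(readings, key=by_city), key=by_city)
}
print(grouped)  # {'Delhi': [38, 40], 'Pune': [31, 29]}

# tee() gives two independent iterators over one source iterator.
first, second = tee(iter([1, 2, 3]))
total, largest = sum(first), max(second)
print(total, largest)  # 6 3
```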


Infinite iterators:

| Iterator | Arguments | Results | Example |
|---|---|---|---|
| count() | start, [step] | start, start+step, start+2*step, … | count(10) --> 10 11 12 13 14 … |
| cycle() | p | p0, p1, … plast, p0, p1, … | cycle('ABCD') --> A B C D A B C D … |
| repeat() | elem [,n] | elem, elem, elem, … endlessly or up to n times | repeat(10, 3) --> 10 10 10 |
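
repeat() is often used to feed a constant argument to map() or zip(). The following is a small sketch of that pattern; the placeholder string "NA" is arbitrary.

```python
from itertools import repeat

# repeat(2) supplies the constant exponent; map() stops when range(5) ends.
squares = list(map(pow, range(5), repeat(2)))
print(squares)  # [0, 1, 4, 9, 16]

# With a count, repeat() is finite by itself.
placeholders = list(repeat("NA", 3))
print(placeholders)  # ['NA', 'NA', 'NA']
```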

Combinatoric iterators:
| Iterator | Arguments | Results |
|---|---|---|
| product() | p, q, … [repeat=1] | cartesian product, equivalent to a nested for-loop |
| permutations() | p[, r] | r-length tuples, all possible orderings, no repeated elements |
| combinations() | p, r | r-length tuples, in sorted order, no repeated elements |
| combinations_with_replacement() | p, r | r-length tuples, in sorted order, with repeated elements |

| Examples | Results |
|---|---|
| product('ABCD', repeat=2) | AA AB AC AD BA BB BC BD CA CB CC CD DA DB DC DD |
| permutations('ABCD', 2) | AB AC AD BA BC BD CA CB CD DA DB DC |
| combinations('ABCD', 2) | AB AC AD BC BD CD |
| combinations_with_replacement('ABCD', 2) | AA AB AC AD BB BC BD CC CD DD |
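
To make the "equivalent to a nested for-loop" remark concrete, this sketch compares product() with a hand-written nested loop on a small input. The inputs "AB" and [1, 2] are arbitrary.

```python
from itertools import product

letters = "AB"
digits = [1, 2]

# The hand-written nested loop that product() replaces.
nested = [(ch, d) for ch in letters for d in digits]
same = list(product(letters, digits)) == nested
print(same)  # True

# repeat=2 pairs the input with itself.
doubled = ["".join(p) for p in product("AB", repeat=2)]
print(doubled)  # ['AA', 'AB', 'BA', 'BB']
```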


Source of the method tables: official Python documentation
