Welcome to the Treehouse Community

Want to collaborate on code errors? Have bugs you need feedback on? Looking for an extra set of eyes on your latest project? Get support with fellow developers, designers, and programmers of all backgrounds and skill levels here with the Treehouse Community! While you're at it, check out some resources Treehouse students have shared here.

Looking to learn something new?

Treehouse offers a seven day free trial for new students. Get access to thousands of hours of content and join thousands of Treehouse students and alumni in the community today.

Start your free trial

Python Python Collections (Retired) Dictionaries Membership

willyraider
willyraider
6,550 Points

Incremental order in for loop

Is there an incremental order in python for "for loops"?

Because if I execute the attached counts.py several times, the order of the output changes.

Output:

treehouse:~/workspace$ python counts.py                                                          
apples                                                                                           
bananas                                                                                          
coconuts                                                                                         
treehouse:~/workspace$ python counts.py                                                          
apples                                                                                           
coconuts                                                                                         
bananas                                                                                          
treehouse:~/workspace$ python counts.py                                                          
coconuts                                                                                         
bananas                                                                                          
apples  
counts.py
# You can check for dictionary membership using the
# "key in dict" syntax from lists.

### Example
my_dict = {'apples': 1, 'bananas': 2, 'coconuts': 3}
my_list = ['apples', 'coconuts', 'grapes', 'strawberries']
# members(my_dict, my_list) => 2

def members(my_dict, my_list):
  count = 0
  for key in my_dict:
    print(key)
    if key in my_list:
      count += 1
  return count

members(my_dict, my_list)

3 Answers

Chris Freeman
MOD
Chris Freeman
Treehouse Moderator 68,423 Points

Thanks Josh Keenan for the mention. In general, I research this info as needed and don't have it "on recall".

In simple terms, the items in a dictionary are stored in "slots". Which slot to use for each entry is determined by a hashing algorithm that returns an integer based on the key, the value, and some "secret keys". This integer is used to pick the slot to store the item in the dictionary. The secret keys change for each run of python.

Because hashing is used beyond dictionaries (such has obscuring original data values), the exact hash value returned is not deterministic. That is, the hash of a specific item will change between instantiations of python, but will be the same for the lifetime of that object with a python session. So each time you run your program the resulting hash values of your dictionary keys will change. This may change the slots where the values are held. The slot order is determines the print order.

A dict is initialized to 8 slots and is resized when it reaches two-thirds full. With 8 initial slots [0-7], a binary mask of 0b111 will extracted the lowest 3 bits of the hash value which can map to the slot number. Using a function to extract the binary value of the hash integer to handle negative binary numbers.

For example:

## Run 1
$ python
Python 3.4.0 (default, Jun 19 2015, 14:20:21) 
[GCC 4.8.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> def to_twoscomplement(bits, value):
...     if value < 0:
...         value = ( 1<<bits ) + value
...     formatstring = '{:0%ib}' % bits
...     return formatstring.format(value)
... 
>>> # hash 0
... hash('0')
-5983707641340658336
>>> # binary value
... to_twoscomplement(64, hash('0'))
'1010110011110101100110011001101001101110000110010111110101100000'
>>> # mask for 8 slots
... to_twoscomplement(64, hash('0'))[-3:]
'000'
>>> # slot value 
... hash('0') & 0x7
0
>>> 
>>> # hash 1
... hash('1')
-1529977540348658223
>>> # binary value
... to_twoscomplement(64, hash('1'))
'1110101011000100011011011000100010000001001100010011110111010001'
>>> # mask for 8 slots
... to_twoscomplement(64, hash('1'))[-3:]
'001'
>>> # slot value 
... hash('1') & 0x7
1
>>> 
>>> # hash 2
... hash('2')
-3394813998062570398
>>> # binary value
... to_twoscomplement(64, hash('2'))
'1101000011100011001100101010011110111110111101001111010001100010'
>>> # mask for 8 slots
... to_twoscomplement(64, hash('2'))[-3:]
'010'
>>> # slot value 
... hash('2') & 0x7
2

# Above hash values predict a dictionary will print the keys in order: '0', '1', '2':
>>> d1 = {'0': 'a', '1': 'b', '2': 'c'}
>>> d1
{'0': 'a', '1': 'b', '2': 'c'}

## Run 2
$ python
Python 3.4.0 (default, Jun 19 2015, 14:20:21) 
[GCC 4.8.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> def to_twoscomplement(bits, value):
...     if value < 0:
...         value = ( 1<<bits ) + value
...     formatstring = '{:0%ib}' % bits
...     return formatstring.format(value)
... 
>>> # hash 0
... hash('0')
8677678239253310006
>>> # binary value
... to_twoscomplement(64, hash('0'))
'0111100001101101010011100110000111011101110101001010101000110110'
>>> # mask for 8 slots
... to_twoscomplement(64, hash('0'))[-3:]
'110'
>>> # slot value 
... hash('0') & 0x7
6
>>> 
>>> # hash 1
... hash('1')
-8884389653283825069
>>> # binary value
... to_twoscomplement(64, hash('1'))
'1000010010110100010011101010111010001101110111001001101001010011'
>>> # mask for 8 slots
... to_twoscomplement(64, hash('1'))[-3:]
'011'
>>> # slot value 
... hash('1') & 0x7
3
>>> 
>>> # hash 2
... hash('2')
-1672787438145109052
>>> # binary value
... to_twoscomplement(64, hash('2'))
'1110100011001001000100001011000000010100101000000000001111000100'
>>> # mask for 8 slots
... to_twoscomplement(64, hash('2'))[-3:]
'100'
>>> # slot value 
... hash('2') & 0x7
4

# Above hash values predict a dictionary will print the keys in order: '1', '2', '0':
>>> d1 = {'0': 'a', '1': 'b', '2': 'c'}
>>> d1
{'1': 'b', '2': 'c', '0': 'a'}

Note: when two hash values end up the same, the algorithm changes for these "collisions". A probing algorithm is used to find an open slot. The more collisions a dictionary has, the slower the lookups become.

References used in this answer:

Steven Parker
Steven Parker
229,785 Points

It's nice to see that my original answer regarding order has been confirmed, but we still don't know why the order is different in successive runs.

Thanks to the detailed explanation of hashing, we can now say that my original guess that the internal hashing algorithm uses intentionally randomized, time-based, or uninitialized values applies specifically to the "secret keys".

Josh Keenan
Josh Keenan
19,652 Points

Lies, you have it all on recall!

Chris Freeman
Chris Freeman
Treehouse Moderator 68,423 Points

I would say we do know why the order changes (the use of a process dependent hash function gives the slot number), but we may not know explicitly how (since the secret key part of the equation is buried in the CPython core code. See pyhash.h file for some details.

I haven't see an explicit list of reasons of why the hash() values change between executions. One notion I read is that some web site inputs are stored in dicts. if the hash values were deterministic then a Denial of Service attack may be caused by constructing dict key values that overload a dictionary with many hash value collisions causing poor performance. By adding a non-deterministic element to the hash value calculation it would be difficult to purposely create a dict with a high number of hash value collisions.

In short, the Python philosophy is not to worry about the hash algorithm, but trust it gives sufficiently efficient low collision values and handles collisions with low overhead.

Steven Parker
Steven Parker
229,785 Points

Standard Python dictionaries do not have an order. A for..in loop will traverse all elements, but not in any specific order.

:point_right: If you care about the order, you could use an OrderedDict, or just sort the original, like this:

  for key in sorted(my_dict):

Now as to why the unspecified order is actually different in successive runs, that is truly a mystery! But it's not sequential, I tried it several times and occasionally got the same order twice in a row. I know the effective order is supposed to result from the dictionary's history of insertions and deletions. But since it's always created with the same components, I can only guess that the internal hashing algorithm uses a randomized, time-based, or uninitialized (not good programming!) value at some point.

Josh Keenan
Josh Keenan
19,652 Points

Tagging the mystical Python genius Chris Freeman to see if he can answer this question.

willyraider
willyraider
6,550 Points

Thank you Steven Parker!

I've testet lists again dictionaries. It seems that dictionaries do not have an incremental order.

  • the output of dict_num with the code given below is always different (2,1,0) or (1,0,2)
  • the ouput of list_num and _list string is always the same (0,1,2)
dict_num = {'0', '1', '2'}
list_num = [0, 1, 2]
list_string = ['0', '1', '2']

def test():
  for x in dict_num:
    print(x)
  for x in list_num:
    print(x)
  for x in list_string:
    print(x)
test()
Chris Freeman
Chris Freeman
Treehouse Moderator 68,423 Points

The line dict_num = {'0', '1', '2'} defines a set not a dict.

A dict needs key / value pairs: dict_num = {'0': 'a', '1':'b', '2':'c'}