In Python How To Count How Many Times Certain Words Appear Without Specifying The Word


Let's say I have the following text file. Each color name is an account name, and I want to know how many persons are under each account. The account name is the first word after "Color: ", and it ends at the first "/" or "-". There are 3 accounts in the file I shared: red, blue, and black. So red/test/base, red-img-tests, red-zero-tests, and red-replication-tests are all part of the account "red". Finally, I have to report how many persons there are under each account; for red, that is red: 4.

---------------------------------
Color: red/test/base
  person: latest
---------------------------------
Color: red-img-tests
  person: latest
---------------------------------
Color: red-zero-tests
  person: latest
---------------------------------
Color: red-replication-tests
  person: latest
---------------------------------
Color: blue
  person: latest
---------------------------------
Color: black/red-config-img
  person: 7e778bb
  person: 82307b2
  person: 8731770
  person: 7777aae
  person: 081178e
  person: c01ba8a
  person: 881b1ad
  person: d2fb1d7
---------------------------------
Color: black/pasta
  person: latest
---------------------------------
Color: black/base-img
  person: 0271332
  person: 70da077
  person: 3700c07
  person: c2f70ff
  person: 0210138
  person: 083af8d

  person: latest
---------------------------------
Color: black/food-pasta-8.0
  person: latest

My expected output is:

    red: 4
    blue: 1
    black: 17

I have thousands of lines, so as you can see, I can't hard-code words like 'red' or 'blue'. The script has to read each account name itself and check whether it is the same as the one on the following lines.

For now, I am doing the following to extract the account names:

import re

with open("accounts.txt") as f:  # placeholder file name
    for line in f:
        if line.startswith("Color:"):  # skip separator and person lines
            acc_name = re.split(r"; |, |/|-|:", line)[1].strip()

Answer

I have a solution using Counter for you:

import collections

data = """
---------------------------------
Color: red/test/base
  person: latest
---------------------------------
Color: red-img-tests
  person: latest
---------------------------------
Color: red-zero-tests
  person: latest
---------------------------------
Color: red-replication-tests
  person: latest
---------------------------------
Color: blue
  person: latest
---------------------------------
Color: black/red-config-img
  person: 7e778bb
  person: 82307b2
  person: 8731770
  person: 7777aae
  person: 081178e
  person: c01ba8a
  person: 881b1ad
  person: d2fb1d7
---------------------------------
Color: black/pasta
  person: latest
---------------------------------
Color: black/base-img
  person: 0271332
  person: 70da077
  person: 3700c07
  person: c2f70ff
  person: 0210138
  person: 083af8d
  """

print(data)
colors = ["black", "red", "blue"]  # account names are hard-coded here
final_count = []
for line in data.split("\n"):
    for color in colors:
        if color in line:  # substring match anywhere in the line
            final_count.append(color)
            # break  # uncomment to count at most one color per line
final_count = collections.Counter(final_count)
print(final_count)

Output

Counter({'red': 5, 'black': 3, 'blue': 1})
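
Note that the snippet above hard-codes the color names and counts matching lines, not `person` entries, so "red" is also counted for `black/red-config-img`. Below is a minimal sketch of one way to get the counts the question asks for without naming any account. It assumes every account starts on a line beginning with `Color:` and that each following `person:` line belongs to it. The function name `count_persons` and the shortened `sample` text are made up for illustration.

```python
import collections
import re


def count_persons(text):
    """Count 'person:' lines per account (hypothetical helper)."""
    counts = collections.Counter()
    account = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Color:"):
            # Account name: text after "Color:" up to the first "/" or "-".
            rest = line[len("Color:"):].strip()
            account = re.split(r"[/-]", rest, maxsplit=1)[0]
        elif line.startswith("person:") and account is not None:
            # Each person line is credited to the most recent account.
            counts[account] += 1
    return counts


# Shortened sample in the same layout as the question's file.
sample = """\
---------------------------------
Color: red/test/base
  person: latest
---------------------------------
Color: red-img-tests
  person: latest
---------------------------------
Color: blue
  person: latest
---------------------------------
Color: black/red-config-img
  person: 7e778bb
  person: 82307b2
"""

for name, total in count_persons(sample).items():
    print(f"{name}: {total}")
```

Run on the full file from the question, this approach should give red: 4, blue: 1, black: 17, since blank lines and separator lines are simply skipped.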

The official Python documentation describes the collections module as follows:

This module implements specialized container datatypes providing alternatives to Python’s general purpose built-in containers, dict, list, set, and tuple.
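
As a quick illustration of the `Counter` class from that module, here is a small sketch; the color values are arbitrary examples rather than data from the question.

```python
from collections import Counter

counts = Counter()
for name in ["red", "red", "blue"]:
    counts[name] += 1          # missing keys start at 0

counts.update({"black": 3})    # add several counts at once
print(counts.most_common(1))   # [('black', 3)]
print(counts["green"])         # 0, no KeyError and no key is created
```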

source: stackoverflow.com