
Why Am I Getting an Empty List When I Use split()?

- 1 answer

I have a text file like this:

-- Generated ]
FILEUNIT
  METRIC /

Hello
-- timestep: Jan 01,2017 00:00:00
  3*2 344.0392 343.4564 343.7741
  343.9302 343.3884 343.7685 0.0000 341.0843
  342.2441 342.5899 343.0728 343.4850 342.8882
  342.0056 342.0564 341.9619 341.8840 342.0447 /

I have written code that reads the file, skips the words, stray characters, and empty lines, does some further processing, and finally returns the numbers in the last four lines. I cannot figure out how to put all the numbers from the text file into a single list. Right now, new_line holds a string built from the lines with numbers:

import string

def expand(chunk):
    l = chunk.split("*")
    chunk = [str(float(l[1]))] * int(l[0])

    return chunk

with open('old_textfile.txt', 'r') as infile1:
    for line in infile1:
        if set(string.ascii_letters.replace("e","")) & set(line):
            continue

        chunks = line.split(" ")
        #Get rid of newlines
        chunks = list(map(lambda chunk: chunk.strip(), chunks))
        if "/" in chunks:
            chunks.remove("/")

        new_chunks = []
        for i in range(len(chunks)):
            if '*' in chunks[i]:
                new_chunks += expand(chunks[i])
            else:
                new_chunks.append(chunks[i])
        new_chunks[len(new_chunks)-1] = new_chunks[len(new_chunks)-1]+"\n"
        new_line = " ".join(new_chunks)

When I then run

A = new_line.split()
B = list(map(float, A))

it returns an empty list. Do you have any idea how I can put all these numbers into one single list? Currently, I write new_line to a text file and read it back, but that increases my runtime, which is not good.

f = open('new_textfile.txt').read()
A = f.split()
B = list(map(float, A))
list_1.extend(B)

Another suggested solution uses a regex, but it drops 3*2. I want that processed as 2 2 2:

import re

with open('old_textfile.txt', 'r') as infile1:
    lines = infile1.read()

nums = re.findall(r'\d+\.\d+', lines)
print(nums)

Answer

I'm not entirely sure I understand what you are trying to do, but as I read it, you want to extract every number that is either a decimal, \d+\.\d+, or an integer multiplied by another integer with an asterisk, \d+\*\d+. The result should be a single list of floats: decimals go into the list directly, and for each int*int pair the second integer is repeated as many times as the first one says.

One way to do this would be:

import re

lines = """
-- Generated ]
FILEUNIT
  METRIC /

Hello
-- timestep: Jan 01,2017 00:00:00
  3*2 344.0392 343.4564 343.7741
  343.9302 343.3884 343.7685 0.0000 341.0843
  342.2441 342.5899 343.0728 343.4850 342.8882
  342.0056 342.0564 341.9619 341.8840 342.0447 /
"""

nums = []
for n in re.findall(r'(\d+\.\d+|\d+\*\d+)', lines):
    split_by_ast = n.split("*")
    if len(split_by_ast) == 1:
        nums += [float(split_by_ast[0])]
    else:
        nums += [float(split_by_ast[1])] * int(split_by_ast[0])

print(nums)

This prints:

[2.0, 2.0, 2.0, 344.0392, 343.4564, 343.7741, 343.9302, 343.3884, 343.7685, 0.0, 341.0843, 342.2441, 342.5899, 343.0728, 343.485, 342.8882, 342.0056, 342.0564, 341.9619, 341.884, 342.0447]

The regular expression finds every token that matches one of the two formats (decimal or int*int). A decimal is appended to the list directly. An int*int token is expanded into a short list that repeats the second integer as many times as the first one specifies, and that list is then concatenated onto the result.
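
As for why the original code produced an empty list: a likely cause is that new_line is reassigned on every pass through the loop, so after the loop it only holds the last line that was processed. If the file ends with a blank line, that string is just "\n", and "\n".split() gives []. Below is a minimal sketch of a line-based alternative that keeps everything in memory. The names parse_numbers and sample are made up for illustration, and treating every line that starts with -- as a comment and every non-numeric token as a keyword is an assumption about the file format, not something the question confirms.

```python
def expand(token):
    """Turn 'count*value' into [value] * count; any other token into [float(token)]."""
    if "*" in token:
        count, value = token.split("*", 1)
        return [float(value)] * int(count)
    return [float(token)]


def parse_numbers(lines):
    """Collect every number from an iterable of text lines into one flat list."""
    numbers = []  # created once, before the loop, so nothing gets overwritten
    for line in lines:
        if line.lstrip().startswith("--"):
            continue  # assumed comment line, e.g. "-- timestep: ..."
        for token in line.split():  # split() without arguments drops blanks and "\n"
            if token == "/":
                continue  # assumed record terminator
            try:
                numbers.extend(expand(token))
            except ValueError:
                break  # non-numeric token (assumed keyword): skip the rest of this line
    return numbers


# Sample data copied from the question.
sample = """\
-- Generated ]
FILEUNIT
  METRIC /

Hello
-- timestep: Jan 01,2017 00:00:00
  3*2 344.0392 343.4564 343.7741
  343.9302 343.3884 343.7685 0.0000 341.0843
  342.2441 342.5899 343.0728 343.4850 342.8882
  342.0056 342.0564 341.9619 341.8840 342.0447 /
"""

result = parse_numbers(sample.splitlines())
print(result)
```

To run it on the real file, pass the open file object instead of the sample, for example parse_numbers(infile1) inside the with block. Unlike the regex, this version also accepts plain integers, negative numbers, and exponent notation, because it hands every token to float().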

source: stackoverflow.com