How to prevent writing the same words to a text file opened with open('text.txt', 'a')?

- 1 answer

I have a question about appending to a text file. I have written a script that reads a URL returning JSON, extracts the list of titles, and writes them to the file "WordsInCategory.text".

Because this code will run in a loop, I open the file with f1 = open('WordsInCategory.text', 'a').

The problem is that it also appends titles that already exist in the file.

I am having trouble coming up with a solution, and using 'w' would overwrite what has already been written.

My code is as follows:

import urllib2
import json


url1 ='https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtype=page&cmtitle=Category:Geography&cmlimit=100'

json_obj = urllib2.urlopen(url1)
data1 = json.load(json_obj)

f1 = open('WordsInCategory.text', 'a')

for item in data1['query']: 
    for i in data1['query']['categorymembers']:
        f1.write((i['title']).encode('utf8')+"\n")  

Please advise on how I should modify my code.

Thank you.

Answer

I would suggest collecting every title in a list before writing to the file, so that the file is written only once. You can modify your code this way:

import urllib2
import json

data = []

# Adjacent string literals are joined, which avoids backslash continuations.
url1 = ('https://en.wikipedia.org/w/api.php?'
        'action=query&format=json&list=categorymembers'
        '&cmtype=page&cmtitle=Category:Geography&cmlimit=100')

json_obj = urllib2.urlopen(url1)
data1 = json.load(json_obj)

# One loop is enough; an outer loop over data1['query'] would repeat it once per key.
for i in data1['query']['categorymembers']:
    data.append(i['title'].encode('utf8') + "\n")

# Do additional requests, and append the new titles to the data list

# Write once, after all requests; set() drops duplicates but does not keep order.
with open('WordsInCategory.text', 'w') as f1:
    f1.write(''.join(set(data)))

Passing the list through set removes any duplicate entries, although the original order of the titles is not preserved.
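
Because a set does not keep insertion order, the titles may end up in the file in a different order than the API returned them. If the order matters, one possible alternative is sketched below, assuming Python 3.7 or later (where dict preserves insertion order); unique_in_order is a hypothetical helper name, not part of the code above.

```python
def unique_in_order(titles):
    """Return the titles with duplicates removed, keeping first-seen order."""
    # dict keys are unique and, since Python 3.7, keep insertion order.
    return list(dict.fromkeys(titles))


titles = ["Geography", "Atlas", "Geography", "Map", "Atlas"]
print(unique_in_order(titles))  # prints ['Geography', 'Atlas', 'Map']
```

The result can be joined and written exactly like the set in the code above.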

If keeping the titles in memory is a problem, you can instead check whether each title already exists in the file before writing it, but this may be very slow, because the file is re-read for every title:

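import urllib2
import json

data = []

# Adjacent string literals are joined, which avoids backslash continuations.
url1 = ('https://en.wikipedia.org/w/api.php?'
        'action=query&format=json&list=categorymembers'
        '&cmtype=page&cmtitle=Category:Geography&cmlimit=100')

json_obj = urllib2.urlopen(url1)
data1 = json.load(json_obj)

# Make sure the file exists, so that opening it for reading cannot fail.
open('WordsInCategory.text', 'a').close()

for i in data1['query']['categorymembers']:
    title = i['title'].encode('utf8') + "\n"

    # The membership test iterates over the file's lines, so it compares whole lines.
    with open('WordsInCategory.text', 'r') as title_check:
        if title not in title_check:
            data.append(title)

# set() drops titles that were repeated within this response.
with open('WordsInCategory.text', 'a') as f1:
    f1.write(''.join(set(data)))

# Handle additional requests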
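
A possible middle ground, if holding the set of existing titles in memory is acceptable, is to read the file once per run instead of once per title. The sketch below assumes Python 3 and titles passed as plain strings; append_new_titles and its parameters are hypothetical names used for illustration, not part of the original script.

```python
import os


def append_new_titles(path, titles):
    """Append to path only the titles that are not already in the file."""
    seen = set()
    # Read the existing titles once, if the file is already there.
    if os.path.exists(path):
        with open(path, encoding="utf8") as existing:
            seen = {line.rstrip("\n") for line in existing}

    added = []
    with open(path, "a", encoding="utf8") as out:
        for title in titles:
            if title not in seen:
                out.write(title + "\n")
                seen.add(title)  # also guards against repeats within titles
                added.append(title)
    return added
```

Calling append_new_titles('WordsInCategory.text', new_titles) after each request would then add only the titles not yet written, in the order they arrived.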

Hope it'll be helpful.

source: stackoverflow.com