How To Check Set Of Files Conform To A Naming Scheme

- 1 answer

I have a bunch of files (TV episodes, although that is fairly arbitrary) that I want to check match a specific naming/organisation scheme..

Currently: I have three arrays of regex, one for valid filenames, one for files missing an episode name, and one for valid paths.

Then, I loop though each valid-filename regex, if it matches, append it to a "valid" dict, if not, do the same with the missing-ep-name regexs, if it matches this I append it to an "invalid" dict with an error code (2:'missing epsiode name'), if it matches neither, it gets added to invalid with the 'malformed name' error code.

The current code can be found here

I want to add a rule that checks for the presence of a folder.jpg file in each directory, but to add this would make the code substantially more messy in it's current state..

How could I write this system in a more expandable way?

The rules it needs to check would be..

  • File is in the format Show Name - [01x23] - Episode Name.avi or Show Name - [01xSpecial02] - Special Name.avi or Show Name - [01xExtra01] - Extra Name.avi
  • If filename is in the format Show Name - [01x23].avi display it a 'missing episode name' section of the output
  • The path should be in the format Show Name/season 2/the_file.avi (where season 2 should be the correct season number in the filename)
  • each Show Name/season 1/ folder should contain "folder.jpg"

.any ideas? While I'm trying to check TV episodes, this concept/code should be able to apply to many things..

The only thought I had was a list of dicts in the format:

checker = [
    'name':'valid files',
    'function':check_valid(), # runs check_valid() on all files
    'status':0 # if it returns True, this is the status the file gets


I want to add a rule that checks for the presence of a folder.jpg file in each directory, but to add this would make the code substantially more messy in it's current state..

This doesn't look bad. In fact your current code does it very nicely, and Sven mentioned a good way to do it as well:

  1. Get a list of all the files
  2. Check for "required" files

You would just have have add to your dictionary a list of required files:

checker = {
  'required': ['file', 'list', 'for_required']

As far as there being a better/extensible way to do this? I am not exactly sure. I could only really think of a way to possibly drop the "multiple" regular expressions and build off of Sven's idea for using a delimiter. So my strategy would be defining a dictionary as follows (and I'm sorry I don't know Python syntax and I'm a tad to lazy to look it up but it should make sense. The /regex/ is shorthand for a regex):

check_dict = {
  'delim'    : /\-/,
  'parts'    : [ 'Show Name', 'Episode Name', 'Episode Number' ],
  'patterns' : [/valid name/, /valid episode name/, /valid number/ ],
  'required' : ['list', 'of', 'files'],
  'ignored'  : ['.*', 'hidden.txt'],
  'start_dir': '/path/to/dir/to/test/'
  1. Split the filename based on the delimiter.
  2. Check each of the parts.

Because its an ordered list you can determine what parts are missing and if a section doesn't match any pattern it is malformed. Here the parts and patterns have a 1 to 1 ratio. Two arrays instead of a dictionary enforces the order.

Ignored and required files can be listed. The . and .. files should probably be ignored automatically. The user should be allowed to input "globs" which can be shell expanded. I'm thinking here of svn:ignore properties, but globbing is natural for listing files.

Here start_dir would be default to the current directory but if you wanted a single file to run automated testing of a bunch of directories this would be useful.

The real loose end here is the path template and along the same lines what path is required for "valid files". I really couldn't come up with a solid idea without writing one large regular expression and taking groups from it... to build a template. It felt a lot like writing a TextMate language grammar. But that starts to stray on the ease of use. The real problem was that the path template was not composed of parts, which makes sense but adds complexity.

Is this strategy in tune with what you were thinking of?