How Do I Search Directories And Find Files That Match Regex?
Solution 1:
import os
import re
rootdir = "/mnt/externa/Torrents/completed"
regex = re.compile('(.*zip$)|(.*rar$)|(.*r01$)')
for root, dirs, files in os.walk(rootdir):
    for file in files:
        if regex.match(file):
            print(file)
The code below answers the question in the following comment:
That worked really well, is there a way to do this if match is found on regex group 1 and do this if match is found on regex group 2 etc ? – nillenilsson
import os
import re
regex = re.compile('(.*zip$)|(.*rar$)|(.*r01$)')
for root, dirs, files in os.walk("../Documents"):
    for file in files:
        res = regex.match(file)
        if res:
            # Only the group belonging to the alternative that matched is set.
            if res.group(1):
                print("ZIP", file)
            if res.group(2):
                print("RAR", file)
            if res.group(3):
                print("R01", file)
It might be possible to do this in a nicer way, but this works.
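One possibly nicer way is to give each alternative a name and read the match's lastgroup attribute instead of testing every group in turn. This is only a sketch: the group names and the classify and report helpers are made up for illustration, and the dots are escaped here (unlike in the pattern above) so that only real extensions match.

```python
import os
import re

# Named groups; the dot is escaped so "myzip" does not count as a .zip file.
ARCHIVE_RE = re.compile(r"(?P<zip>.*\.zip$)|(?P<rar>.*\.rar$)|(?P<r01>.*\.r01$)")


def classify(filename):
    """Return "ZIP", "RAR" or "R01" for a matching name, else None."""
    m = ARCHIVE_RE.match(filename)
    # lastgroup is the name of the single alternative that matched.
    return m.lastgroup.upper() if m else None


def report(rootdir):
    """Walk rootdir and print the kind and full path of every archive found."""
    for root, dirs, files in os.walk(rootdir):
        for file in files:
            kind = classify(file)
            if kind:
                print(kind, os.path.join(root, file))
```

Calling report("../Documents") would print one line per archive. Adding another extension then only means adding another named alternative to the pattern.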
Solution 2:
Given that you are a beginner, I would recommend using glob in place of a quickly written file-walking-regex matcher.
Snippets of functions using glob and a file-walking-regex matcher
The snippet below contains two file-searching functions (one using glob and the other using a custom file-walking-regex matcher). It also contains a "stopwatch" function to time the two functions.
import glob
import os
import re
import time
from datetime import timedelta


def stopwatch(method):
    """Decorator that prints how long each call to `method` takes."""
    def timed(*args, **kw):
        ts = time.perf_counter()
        result = method(*args, **kw)
        te = time.perf_counter()
        duration = timedelta(seconds=te - ts)
        print(f"{method.__name__}: {duration}")
        return result
    return timed


@stopwatch
def get_filepaths_with_oswalk(root_path: str, file_regex: str):
    files_paths = []
    pattern = re.compile(file_regex)
    for root, directories, files in os.walk(root_path):
        for file in files:
            if pattern.match(file):
                files_paths.append(os.path.join(root, file))
    return files_paths


@stopwatch
def get_filepaths_with_glob(root_path: str, file_regex: str):
    # Despite its name, file_regex is a glob pattern here (e.g. '*.zip').
    return glob.glob(os.path.join(root_path, file_regex))
Comparing runtimes of the above functions
Using the two functions above to find 5076 files matching the pattern filename_*.csv in a directory called root_path (containing 66,948 files):
>>> glob_files = get_filepaths_with_glob(root_path, 'filename_*.csv')
get_filepaths_with_glob: 0:00:00.176400
>>> oswalk_files = get_filepaths_with_oswalk(root_path,'filename_(.*).csv')
get_filepaths_with_oswalk: 0:03:29.385379
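The glob method is much faster here and the code for it is shorter. Keep in mind that glob.glob as called above only looks in root_path itself, whereas os.walk descends into every subdirectory, so the two functions are not doing exactly the same amount of work.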
For your case
For your case, you can probably use something like the following to get your *.zip, *.rar and *.r01 files:
files = []
for ext in ['*.zip', '*.rar', '*.r01']:
    files += get_filepaths_with_glob(root_path, ext)
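If the archives sit in subdirectories rather than directly in root_path, glob can also search recursively with the "**" wildcard and recursive=True. The following is a minimal sketch of that idea. The function name find_by_extension and its default extension list are assumptions for illustration, not part of the answer above.

```python
import glob
import os


def find_by_extension(root_path, extensions=("zip", "rar", "r01")):
    """Return paths under root_path (at any depth) with one of the extensions."""
    matches = []
    for ext in extensions:
        # With recursive=True, '**' matches zero or more directory levels.
        # glob.escape protects special characters in root_path itself.
        pattern = os.path.join(glob.escape(root_path), "**", f"*.{ext}")
        matches.extend(glob.glob(pattern, recursive=True))
    return sorted(matches)
```

Note that the glob module skips files whose names start with a dot unless the pattern asks for them explicitly.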
Solution 3:
Here's an alternative using glob.
from pathlib import Path
rootdir = "/mnt/externa/Torrents/completed"
for extension in 'zip rar r01'.split():
    for path in Path(rootdir).glob('*.' + extension):
        # path is a Path object, so convert it before concatenating.
        print("match: " + str(path))
Solution 4:
I would do it this way:
import re
from pathlib import Path
def glob_re(path, regex="", glob_mask="**/*", inverse=False):
    p = Path(path)
    if inverse:
        res = [str(f) for f in p.glob(glob_mask) if not re.search(regex, str(f))]
    else:
        res = [str(f) for f in p.glob(glob_mask) if re.search(regex, str(f))]
    return res
NOTE: by default it recursively scans all subdirectories. If you want to scan only the current directory, explicitly specify glob_mask="*".
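For the extensions in the question, a call might look like the sketch below. It repeats glob_re in a condensed form only so the example runs on its own. The find_archives wrapper and the exact regex are assumptions for illustration. Because the regex is searched against the whole path string, it is anchored on the extension at the end.

```python
import re
from pathlib import Path


def glob_re(path, regex="", glob_mask="**/*", inverse=False):
    # Condensed form of the function above: keep a path when the regex
    # matches, or, with inverse=True, when it does not match.
    return [str(f) for f in Path(path).glob(glob_mask)
            if bool(re.search(regex, str(f))) != inverse]


def find_archives(rootdir):
    # Anchor on the extension so names like "zipper.txt" are not matched.
    return glob_re(rootdir, regex=r"\.(zip|rar|r01)$")
```

Passing inverse=True with the same regex would instead list everything that is not an archive, including directories, since the "**/*" mask yields those too.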