Recently a colleague of mine told me how he was analyzing data from e-record. He wanted to parse that data in a specific way, yet he depended on e-record and the build team for the format in which the data was delivered. He knows how to program in Visual Basic and manipulate Excel tables. I wanted to see if I could accomplish the same thing with Python, and I came up with a solution that uses the openpyxl and re modules.
Regular expressions (regex for short) are super helpful for doing very complicated and specific pattern matching in strings.
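As a small illustration of the kind of pattern used in this post, the sketch below looks for a newline followed by four digits. The sample string is a made-up cell value, not real e-record data, and the layout (a four-digit time followed by indented names) is an assumption.

```python
import re

# Hypothetical cell value: a four-digit time, then names on indented lines
sample = "0800\n Alice\n Bob\n0930\n Carol"

# \n matches a newline and \d{4} matches four digits in a row
pattern = re.compile(r"\n\d{4}")

# start() gives the index of the newline that begins each match
positions = [m.start() for m in pattern.finditer(sample)]
print(positions)
```

Only the newline in front of "0930" matches, because the other newlines are followed by a space and a name rather than by digits.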
import re

def patternMatch(word: str):
    # Find every newline that is immediately followed by four digits
    pattern = re.compile(r'\n\d\d\d\d')
    # Index of the newline at the start of each match
    splits = [match.start() for match in pattern.finditer(word)]
    # If there is no match from the regex search, split the value of the cell itself
    if not splits:
        print(word.split('\n '))
        return
    # Turn the match positions into (start, end) slices of the cell value.
    # The first slice runs from the beginning up to the first matched newline.
    split_locations = [(0, splits[0])]
    for i, position in enumerate(splits):
        # Each later slice starts just after a matched newline and ends at
        # the next matched newline, or at the end of the string
        end = splits[i + 1] if i + 1 < len(splits) else len(word)
        split_locations.append((position + 1, end))
    # Split out the times from the names
    for start, end in split_locations:
        print(word[start:end].split('\n '))
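For comparison, here is a shorter sketch that returns the pieces instead of printing them. It swaps the manual span bookkeeping for re.split with a lookahead, which is a different technique from the function above. The name split_cell and the sample string are hypothetical, chosen only for illustration.

```python
import re
from typing import List

def split_cell(value: str) -> List[List[str]]:
    # Split on each newline that is followed by four digits. The lookahead
    # (?=...) checks for the digits without consuming them, so they stay
    # at the start of the next chunk.
    chunks = re.split(r"\n(?=\d{4})", value)
    # Within each chunk, separate the time from the names
    return [chunk.split("\n ") for chunk in chunks]

# Hypothetical cell value, not real e-record data
sample = "0800\n Alice\n Bob\n0930\n Carol"
result = split_cell(sample)
print(result)
```

When the pattern does not match at all, re.split returns the whole string as a single chunk, so the no-match case needs no separate branch.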