Regex

Recently a colleague of mine talked about how he was analyzing data from e-record. He wanted to parse that data in a specific way, but he depended on e-record and the build team for the format in which it was delivered. He knows how to program in Visual Basic and manipulate Excel tables. I wanted to see if I could accomplish the same thing with Python, and I came up with a solution that uses the openpyxl and re modules.

Regex, short for regular expressions, is super helpful for doing very complicated and specific pattern matching in strings.
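As a small illustration of that kind of pattern matching, here is a minimal sketch that looks for a newline immediately followed by four digits. The sample cell text is made up for this example and is only a guess at what a cell might look like.

```python
import re

# Hypothetical cell text: two entries, each starting with a four-digit time
cell_text = "0800 Alice\n   Bob\n1300 Carol"

# \n matches a newline and \d matches one digit, so this pattern matches
# a newline that is immediately followed by four digits
pattern = re.compile(r"\n\d\d\d\d")

# Record where each match starts and the text it matched
found = [(m.start(), m.group()) for m in pattern.finditer(cell_text)]
print(found)
```

The newline before "   Bob" is skipped because it is followed by spaces rather than digits, so only the newline before "1300" is reported.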

import re

def patternMatch(word: str):
    # find every newline that is immediately followed by four digits
    regex = r'\n\d\d\d\d'
    pattern = re.compile(regex)
    matches = pattern.finditer(word)

    # Collect the index of every newline that starts a new four-digit entry
    splits = []
    for match in matches:
        splits.append(match.start())

    # Turn those indexes into (start, end) slices of the cell text: the text
    # before the first match, then one slice per match that skips its newline
    split_locations = []
    if splits:
        split_locations.append((0, splits[0]))
    for i in range(len(splits)):
        start = splits[i] + 1
        end = splits[i + 1] if i + 1 < len(splits) else len(word)
        split_locations.append((start, end))
    
    # split out the times from the names
    for start, end in split_locations:
        print(word[start:end].split('\n   '))

    # if there is no match from the regex search, then we can do a split on the value of the cell itself
    if len(split_locations) == 0:
        print(word.split('\n   '))
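The same splitting can also be sketched more compactly with re.split and a lookahead. This is an alternative to the function above, not the approach it takes: the name split_cell and the sample text are hypothetical, and it returns the pieces instead of printing them.

```python
import re

def split_cell(word: str) -> list:
    # Split on each newline that is followed by four digits; the lookahead
    # (?=...) checks for the digits without consuming them, so every entry
    # keeps its leading four-digit value
    entries = re.split(r"\n(?=\d{4})", word)
    # Within each entry, split on the newline-plus-three-spaces separator
    return [entry.split("\n   ") for entry in entries]

# Hypothetical cell text, assumed for illustration only
sample = "0800 Alice\n   Bob\n1300 Carol"
print(split_cell(sample))
```

When the pattern never matches, re.split returns the whole string as a single entry, so the no-match case needs no separate branch.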
