
2: Regular Expressions

The query browser that R&D set me up with was a godsend. I had no idea SQL was such a logical language; it reads almost like plain English.

Of course, we always want more, so I began asking programmers to build more tables for me to query. Senior salespeople (and a very important client) urgently needed better transparency on structured products like CMBS, ABS and CDOs. I asked R&D to help me figure out how much relevant content was in the source documents, and once again they gave me the best help I could hope for: “It will be easier for you to learn regular expressions than to wait for us to do it for you.”

RegEx is simple pattern matching. Hit Ctrl+F in any high-end text editor and it will give you the choice of “finding” a literal string of text or one that matches a regular expression pattern. \s[A-Za-z]{5}\s is an expression that will find any five-letter word with whitespace on both sides; \b[A-Za-z]{5}\b uses word boundaries instead, so it also catches words at the start or end of a line or next to punctuation. Starting an expression with ^ matches the pattern only at the beginning of a line, and finishing it with $ matches it only at the end of a line. The text editor highlights matches instantly, so fine-tuning your RegEx can be done very quickly.
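The same patterns can be tried outside a text editor. Below is a minimal sketch in Python (the language and the sample sentence are choices made for illustration, not part of the original story) showing how the whitespace version, the word-boundary version, and the ^ and $ anchors behave on two lines of text.

```python
import re

# Made-up sample text spanning two lines.
text = "Quick brown foxes jumped\nover seven lazy hounds today"

# Whitespace on both sides: misses words at the start or end of the text,
# and misses a word whose leading space was consumed by the previous match.
spaced = re.findall(r"\s([A-Za-z]{5})\s", text)

# Word boundaries: matches every standalone five-letter word.
bounded = re.findall(r"\b[A-Za-z]{5}\b", text)

# ^ and $ anchor to line starts and ends when re.MULTILINE is set.
at_line_start = re.findall(r"^[A-Za-z]{5}\b", text, flags=re.MULTILINE)
at_line_end = re.findall(r"\b[A-Za-z]{5}$", text, flags=re.MULTILINE)

print(spaced)         # ['brown', 'seven']
print(bounded)        # ['Quick', 'brown', 'foxes', 'seven', 'today']
print(at_line_start)  # ['Quick']
print(at_line_end)    # ['today']
```

Note that "foxes" is skipped by the whitespace version because the space before it was already used up by the match on "brown"; this is the kind of detail that is easy to spot when matches are highlighted live.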

Within a couple of hours, the expression I wrote to match structured product quotes confirmed that there was a boatload of these quotes embedded in text.


Text editors let you search for text with an expression and replace whatever it matches, reusing the captured pieces in the replacement. This makes it possible to convert search results into delimited formats that can be reliably loaded into a database via a simple command (MySQL syntax) like “LOAD DATA INFILE 'filename.txt' INTO TABLE tablename FIELDS TERMINATED BY ','”. That's all you need to load a comma-delimited document into a database.
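To make the search-and-replace step concrete, here is a rough Python sketch of turning quotes embedded in free text into comma-delimited rows. The sample sentence, the deal names, and the quote pattern are all invented for illustration; the actual expression used for structured product quotes is not shown in this post, so treat every field below as an assumption.

```python
import csv
import io
import re

# Made-up sample text; the real quote format is not given in the post.
text = ("Seeing ABCD 2006-1 A2 at 98.50/99.25 and "
        "WXYZ 2007-3 B at 71.00/73.50 this morning.")

# Hypothetical pattern: deal name, series, class, then bid/offer prices.
quote = re.compile(
    r"([A-Z]{4}) (\d{4}-\d+) ([A-Z]\d?) at (\d+\.\d+)/(\d+\.\d+)"
)

# Search and replace with backreferences, as a text editor would do in place.
converted = quote.sub(r"\1,\2,\3,\4,\5", text)

# Alternatively, keep only the matches, one comma-delimited row per quote.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerows(quote.findall(text))
rows = buffer.getvalue()

print(converted)
print(rows)
```

Writing `rows` to a file gives the kind of comma-delimited document that a bulk-load command can read. Using the csv module rather than joining strings by hand means any field that happens to contain a comma is quoted properly.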