Text Funnels

Sometimes you may have an item of text where you only need a portion of the text and want to strip off the beginning and or the end of the text. In addition to the functions in the previous section, Panorama has a special tool for stripping off the ends of a text item. This tool is called a text funnel. Text funnels are powerful tools, however, many users find them a bit difficult to figure out. In recent years we’ve added many functions that can perform most of the operations that a text tool can perform. Before deciding to use a text funnel you may want to check out Text Formulas, Text Arrays and Tag Parsing.

A text funnel is used a bit differently than other Panorama functions and operators. The text funnel always follows the text item that is being “stripped.” In a sense a text funnel has three parameters, the text item, start, and end. But as you can see below, these parameters are arranged quite differently than they are for other functions:

<text item>[<start>,<end>]

The first parameter, text item, is the item of text which will be stripped to get the final result. This may be a field, a variable, or an entire formula (as long as it produces a text item as its final result). If you use an entire formula you should put parentheses around the formula.

The second parameter, start, specifies the first character you want to include in the final output. For example if you want to strip off the first three characters the start should be 4 (because the 4th character is the first one we want to keep). If the starting position is past the end of the text all the text will be stripped out and the formula is left with an empty text item.

The third parameter, end, specifies the last character you want to include. For example, if you want to strip off everything after the 12th character, the end should be 12. If the ending position is before the starting position, all the text will be stripped and the formula is left with an empty text item.

The real trick in setting up text funnels is deciding what the start and end parameters should be. The following sections will describe several techniques for setting up these parameters.

Numeric Start and End Positions

The simplest way to specify starting and ending positions is with a number. Positive numbers are counted from the beginning of the original text item (1 is the first character in the original text item). Negative numbers are counted from the last character of the original text item (-1 is the last character).

Our first example removes the first character from the Notes field.

Notes[2,-1]

The next example does the exact opposite–it removes the last character from the Notes field.

Notes[1,-2]

By using the same number for the start and end a text funnel can strip out a single character. The procedure below uses the text funnel [1,1] to check to see if the first character of the phone number is a (. If so, it uses another text funnel to strip out the area code.

if Phone[1,1]="("
    AreaCode=Phone[2,4]
endif

A procedure can use a variable to pre-load the start and end positions. The procedure below will strip out everything starting with the phrase Private Notes Below:.

local X
X=search(Notes,"Private Notes Below:")
if X<>0
    PublicNotes=Notes[1,X-1]
else
    PublicNotes=Notes
endif

Specifying Numeric Length Instead of Position

An alternate form of text funnel allows you to specify the length of the text to be stripped out, instead of the ending position. This alternate form simply uses a semicolon instead of a comma:

<text item>[<start>;<length>]

The length specifies the number of characters from the starting position. A positive length means that the stripped text begins at the starting position and extends to the right. A negative length means that the stripped text begins at the starting position and extends to the left. The character at the starting position is always included (unless the length is zero).

Let’s look at two examples of this technique. The first extracts the area code from a long distance phone number.

Phone[2;3]

The next example strips out the local phone number (the last 8 characters).

Phone[-1;-8]

If the original text item is too short to fulfill the request, the text funnel will take whatever it can get. For example, if the phone number is only 3 characters long, the value of the local phone number will be 3 characters long.

Start/End Positions by Character Matching

The previous section described how to strip out text by absolute numeric position within the original text (for example from character 3 to character 8). Another technique is to specify not the absolute position, but the character value where stripping should begin and/or end. For example instead of telling the text funnel to strip off everything before position 5, you tell the funnel to strip off everything before the first $ character, or everything after the last % character. The text funnel scans the original text looking for a matching character, and then strips the text accordingly.

To specify a starting or ending position by character matching, simply supply a character instead of a number. For example, suppose you had a field named Line that contained data like this:

X2245A Tape Cartridge $22.95

To extract just the price from Line you could use this text funnel.

Line["$",-1]

This formula will take Line and strip off everything in front of the first dollar sign. In our example this will be the value $22.95. If there is no dollar sign in Line then the result will be empty text (“”).

Notice that the output of a text funnel is always a text item, not an actual number. If you wanted to convert this to a number you would have to remove the dollar sign with an additional text funnel and use the val( function.

A text funnel can use a character value for either the starting or ending position, or both. Here is an example that extracts the hour from the time by stripping off everything after the first colon:

Time[1,":"]

Both of these examples are developed further in the next section.

Cascading Text Funnels

The examples in the previous section both have a problem: they don’t strip off enough text. The first example strips off the price but leaves the dollar sign (for example $45.67). The second example strips the hour from the time but leaves the colon (for example 9:). These problems can be solved by using two text funnels in a row. Adding a second text funnel is easy–just enter it after the first funnel. This example strips off the $ symbol from the beginning of the price using a regular numeric position text funnel.

Line["$",-1][2,-1]

The table below shows how some typical data would be processed by this formula.

Original Data	After `["$",-1]`	After `[2,-1]`
`X2245A Tape Cartridge $22.95`	`$22.95`	`22.95`
`AF8899 Data Casette $7.80`	`$7.80`	`7.80`
`XB3 Head Cleaner $19.50`	`$19.50`	`19.50`

This example strips off the hour from the time–including the pesky extra colon.

Time[1,":"][1,-2]

Once again, the table shows how the data is processed by each text funnel.

Original Data	After `[1,":"]`	After `[1,-2]`
`9:42 AM`	`9:`	`9`
`3:07:12 PM`	3:	`3`
`11:23 AM`	`11:`	`11`

These examples show two text funnels cascaded together, but there is no limit to the number of text funnels you can use in a row. Each funnel chops away at the text until you have just the text you want. Usually the best approach to developing a series of cascaded funnels is to develop one funnel at a time. Make sure one funnel really does what you want and expect it to before adding the next one.

Character Matching in Reverse Gear

If the character to be matched is preceded by a minus sign, the text funnel will match the last instance of the value in the original text instead of matching the first.

The example below strips out the year from an appointment. The formula assumes that there is a date in the format mm/dd/yy somewhere in the text item. The funnel will attempt to match up the last / symbol in the original text.

Year="19"+Appointment["-/",-1][2;2]

Here is how the data is processed.

Original Data	After `["-/",-1]`	After `[2;2]`
`Lunch with Bob 3/4/01`	`/01`	`01`
`call Joan 4/2/99 3PM`	`/99 2PM`	`99`
`10/7 L's Birthday`	`/7 L's Birthday`	`7`
`call Ted`	`call Ted`	`al`

The last two lines above shows the hazards of making faulty assumptions. Neither line contains a valid mm/ dd/yy date. The result in this case is a bogus year. Unfortunately there is no magic pill fix for this kind of problem. As a programmer you must think of, check for and process every possible option. If you absolutely know that there will be a date in the text item, fine. If not, you’ll have to write a more complicated procedure to check for a properly formatted date before you strip out the year. Here’s an example of a more robust procedure.

if Appointment notmatch "*/*/*"
    message "Sorry, the appointment has no year!"
    stop
endif
Year="19"+Appointment["-/",-1][2;2]

This procedure could still be fooled–for example data containing two dates would trip it up. Designing a completely foolproof procedure is left as an exercise to the reader.

Stripping Out Individual Words

One of the most common needs is to strip out a single word at the beginning, middle or end of a text item. This is easily done by using a space as the matching character value. You’ll need to look at the formulas in this section very carefully. Don’t confuse a space (" ") with an empty text item (“”). They’re not the same thing. (For clarity, the samples below showing how the data is processed use ⚀ to show a space whenever it is at the beginning or end of a text item. For example, ⚀now means space followed by now.)

Here’s a formula that extracts the first word from an item of text by stripping off all the rest of the words.

Original[1," "][1,-2]

Here’s how this formula would process several sample text strings.

Original Data	After `[1," "]`	After `[1,-2]`
`Now is the time`	`Now⚀`	`Now`
`Boston reports 23 degrees`	`Boston⚀`	`Boston`
`Apple stock up 5 points`	`Apple⚀`	`Apple`

Note: This is a useful illustration, but in modern versions of Panorama a much simpler way to extract the first word is to use the firstword( or nthword( functions, like this:

firstword(Original)
nthword(Original,1)

This next formula does the exact opposite: it strips off the first word, leaving the rest of the words.

Original[" ",-1][2,-1]

Here is how this formula would process the same sample strings as before.

Original Data	After `[" ",-1]`	After `[2,-1]`
`Now is the time`	`⚀is the time`	`is the time`
`Boston reports 23 degrees`	`⚀reports 23 degrees`	`reports 23 degrees`
`Apple stock up 5 points`	`⚀stock up 5 points`	`stock up 5 points`

Note: Again, while this example illustrates how cascaded text funnels work, the moderrn after( function is a simpler way to achieve this result.

after(Original," ")

You can cascade these two text funnels to produce a formula that extracts the 2nd word from the original text, stripping off the rest.

Original[" ",-1][2,-1][1," "][1,-2]

Here is how this formula would process the same sample strings as before.

Original Data	After `[" ",-1]`	After `[2,-1]`	After `[1," "]`	After `[1,-2]`
`Now is the time`	`⚀is the time`	`is the time`	`is⚀`	`is`
`Boston reports 23 degrees`	`⚀reports 23 degrees`	`reports 23 degrees`	`reports⚀`	`reports`
`Apple stock up 5 points`	`⚀stock up 5 points`	`stock up 5 points`	`stock⚀`	`stock`

This process can be repeated indefinitely. However, a better approach is probably to use the array( function with space as a separator character, or to simply use the nthword( function.

Here’s a text funnel that extracts the last word from a text item, stripping off the earlier words (if any).

Original["- ",-1][2,-1]

Here’s how this formula would process several sample text strings.

Original Data	After `["- ",-1]`	After `[2,-1]`
`Now is the time`	`⚀time`	`time`
`Boston reports 23 degrees`	`⚀degrees`	`degrees`
`Apple stock up 5 points`	`⚀points`	`points`

A close examination will show that this is exactly the same as the first example but with an extra minus sign to specify the last space instead of the first space.

Note: Again, this illustrates text funnel operation, but we recommend that you use the lastword( function if you need to extract the last word, like this:

lastword(Original)

Multiple Matching Characters for Start/End Position

Sometimes you may need to use multiple character values to specify the starting or ending position of a text funnel. Any one of these character values will match up with the original text. For example, a sentence may end with a period, a question mark, or an exclamation mark. To use more than one matching character simply list each character separated by commas. Here is an example that extracts the first sentence from a letter. All the text after the first sentence is stripped off.

Letter[1,".,?,!"]

You can include a comma as one of the character values. This example extracts everything up to the first semicolon, comma, or colon. All the text after that point is stripped off.

Description[1,";,,,:"]

If you use alphabetic values, don’t forget that upper and lower case are separate values, even for the same letter. This example extracts am or pm from a time expressed as text (eg, “10.37 pm”).

Appointment["a,p,A,P";2]

It’s also possible to specify a range of matching characters, for example 0 through 9 or A through Z. To specify a range the starting and ending characters must be separated by a dash, for example “0–9”. The range will include the two characters and all characters between them (using Unicode order).

Here is an example that extracts the frequency from a radio station. The call letters are stripped off.

Station["0-9",-1]

Here’s how this formula would process several sample text strings.

Original Data	After `["0-9",-1]`
`KFI 640 AM`	`640 AM`
`KLSX 97.1 FM`	`97.1 FM`
`KFAC 105.1 FM`	`105.1 FM`
`KROQ 106.7 FM`	`106.7 FM`

A text funnel can combine multiple character ranges, or combine a range with one or more separate character values. The next example strips off everything before the first number, or before the first dollar sign (whichever comes first).

Line["0-9,$",-1]

Here’s how this formula would process several sample text strings.

Original Data	After `["0-9,$",-1]`
`Tape Cartridge $22.95`	`$22.95`
`Data Cassette 7.80`	`7.80`
`XB3 Cleaner $19.50`	`3 Cleaner $19.50`

The last line shows a possible pitfall of this text funnel. Text funnels rely on consistent patterns in the data. If there isn’t a pattern you can identify accurately, you won’t be able to design a funnel to strip the text apart reliably. In this case a more reliable pattern would be to notice that the price is always the last word of Line, so the text funnel below will strip off the price reliably.

Line["- ",-1][2,-1]

Be sure to test your text funnels with a wide variety of sample data to make sure you have identified a consistent pattern.

As mentioned in the previous section, putting a minus sign in front of the character value tells the text funnel to find the last matching character, instead of the first. This works for character ranges too. This example extracts the item name from Line by stripping off everything after the last letter.

Line[1,"-A-Z,a-z"]

Here’s how this formula would process our sample text strings.

Original Data	After `[1,"-A-Z,a-z"]`
`Tape Cartridge $22.95`	`Tape Cartridge`
`Data Cassette 7.80`	`Data Casette`
`XB3 Cleaner $19.50`	`XB3 Cleaner`

Although this example has two ranges (A-Z or a-z) only one minus sign is needed (at the beginning). If the first character is a minus sign, the text funnel will always look for the last matching character in the original text.

Non-Matching Character for Start/End Position

The previous examples have all used one or more characters that must match a character in the original text item. By using the ≠ symbol you can specify that the text funnel should begin (or end) with the first character (or characters) that does not match. For example, you might want to match with the first character that is not a number, or the last character that is not a space.

An example should make this clearer. Suppose you have imported some numbers that have one or more asterisks in front of them, and you want to strip off the asterisks. The text funnel in this formula will set the starting position to the first character in the original text that is not an asterisk.

Imported["≠*",-1]

Here’s how this formula would process some sample text strings.

Original Data	After `["≠*",-1]`
`****23.67`	`23.67`
`***782.12`	`782.12`
`*****2.98`	`2.98`

You can use this feature to strip off leading spaces.

Name["≠ ",-1]

Here’s how this formula would process some sample text strings (leading spaces are shown as ⚀ for clarity).

Original Data	After `["≠ ",-1]`
`⚀⚀Jeff Nance`	`Jeff Nance`
`⚀Williams`	`Williams`

Note: An easier way to strip leading and trailing spaces is to use the strip( function, which is designed for that purpose.

The example below specifies that the starting position should be the first character that is not a letter, not a comma, and not a space. It extracts the zip code or Canadian postal code from an address.

CityStateZip["≠A-Z,a-z,,, ",-1]

Here’s how this formula would process some typical addresses.

Original Data	After `["≠A-Z,a-z,,, ",-1]`
`Fullerton, CA 92831`	`92831`
`Kamloops, BC 3J2 X7G`	`3J2 X7G`

The astute reader may have realized that a simpler text funnel can do the same job, ["0-9",-1]. Of course it would not have illustrated the ≠ feature. The moral of the story is: watch out for college solutions when a grade school solution may work just as well!

A text funnel that uses the ≠ symbol can also work in reverse gear, so that it specifies the last character that does not match, instead of the first. The ≠ symbol must be first, and then the - symbol. For example, here is yet another formula for extracting the price from Line.

Line["≠-0-9,.",-1][2,-1]

Here’s how this formula would process some of our favorite sample text strings.

Original Data	After `["≠-0-9,.",-1][2,-1]`
`Tape Cartridge $22.95`	`22.95`
`Data Cassette 7.80`	`7.80`
`XB3 Cleaner $19.50`	`19.50`

Unlike some of our previous examples, this formula does not rely on a $ symbol or a space in front of the price, and it does not choke if there is a number in the item description.

Limitations of Text Funnels

Unlike Humpty Dumpty, text items are easy to put together but hard to take apart intelligently. Text funnels are a powerful tool, but they do have limitations. One limitation is that, by themselves, they can only work with one character at a time. If you want to start stripping text with the word fax or P.O. Box, a text funnel can’t do it on its own. You’ll have to combine the funnel with the search( function for jobs like this.

The most important limitation of text funnels is that they cannot work reliably if there is not a single consistent pattern in the data. If the data has no pattern at all, you’re out of luck (short of re-keying data). If the data has two or more patterns you’ll need to isolate each pattern and process each one with a separate text funnel. One way to do this is with the ?( function. This formula extracts the local phone number from a complete phone number. If the complete phone number starts with (, the formula uses a text funnel that strips out the area code, otherwise the local number starts with the first character.

?( Phone[1,1]="(" , Phone[7;8] , Phone[1;8] )

Here’s how this formula would process some of our favorite sample text strings.

Original Data	After `?(Phone[1,1]="(",Phone[7;8],Phone[1;8])`
`(714) 555-1212`	`555-1212`
`852-9632`	`852-9632`
`(562) 492-1438 ext 23`	`492-1438`

Don’t be afraid to combine text funnels with other functions and statements. Some functions that are often useful with text funnels include ?(, length(, strip(, stripchar(, search(, replace(, and array(.