[kepler-users] string splitter

Christopher Brooks cxh at eecs.berkeley.edu
Thu Jun 10 14:13:36 PDT 2010


Hi Madhu,
Thanks for checking.  What about StreamTokenizer? I'm not sure if it will do
anything different.

See also
http://www.velocityreviews.com/forums/t136007-stringtokenizer-ignores-tokens-without-content.html
which says:

> Use the constructor StringTokenizer(String, delims, true) to have it
> return delimiters, and count those.
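
A rough, untested-in-Kepler sketch of that idea is below. The class and
method names are made up for illustration and are not part of Kepler or
the JDK. It treats every character in delims as a single-character
delimiter and emits an empty field whenever two delimiters are adjacent
or the line ends with one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class TokenizerSplit {
    // Hypothetical helper: split on single-character delimiters and keep
    // empty fields, including trailing ones.
    static List<String> splitKeepingEmpties(String line, String delims) {
        // returnDelims = true makes the tokenizer hand back each delimiter
        // as its own one-character token.
        StringTokenizer tokenizer = new StringTokenizer(line, delims, true);
        List<String> fields = new ArrayList<>();
        // The start of the line is treated as if a delimiter was just seen.
        boolean lastWasDelimiter = true;
        while (tokenizer.hasMoreTokens()) {
            String token = tokenizer.nextToken();
            boolean isDelimiter =
                    token.length() == 1 && delims.indexOf(token.charAt(0)) >= 0;
            if (isDelimiter) {
                if (lastWasDelimiter) {
                    fields.add(""); // nothing between two delimiters
                }
                lastWasDelimiter = true;
            } else {
                fields.add(token);
                lastWasDelimiter = false;
            }
        }
        if (lastWasDelimiter) {
            fields.add(""); // line was empty or ended with a delimiter
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "6/19/2009,CEW,,6/10/2009,6,1,1000,52.5,31,1,1,"
                + "1.017410714,0.625589286,1.3125,,";
        List<String> fields = splitKeepingEmpties(line, ",");
        System.out.println(fields.size()); // prints 16
    }
}
```

With Corinna's data line this keeps the empty strings in positions 14
and 15, so reading element 14 would return "" instead of failing.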

See also http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4086845
which suggests
java.util.Scanner
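
The Javadoc quoted further down also points at another possibility: the
two-argument String.split. With a limit of zero, trailing empty strings
are dropped, but a negative limit keeps them. Here is a minimal sketch
of the difference, using Corinna's data line. The class and method
names are only for illustration, and I have not tried this inside the
StringSplitter actor.

```java
class SplitLimitDemo {
    // Hypothetical helper: split on a literal comma and keep every field.
    static String[] splitKeepingTrailing(String line) {
        // A negative limit applies the pattern as many times as possible
        // and does not discard trailing empty strings.
        return line.split(",", -1);
    }

    public static void main(String[] args) {
        String line = "6/19/2009,CEW,,6/10/2009,6,1,1000,52.5,31,1,1,"
                + "1.017410714,0.625589286,1.3125,,";
        // One-argument split behaves like a limit of zero.
        System.out.println(line.split(",").length);            // prints 14
        System.out.println(splitKeepingTrailing(line).length); // prints 16
    }
}
```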

_C


On 6/10/10 1:34 PM, Madhusudan Gujral wrote:
> Hi Chris,
>
>    StringTokenizer simply drops all of the 'No Space' (empty) tokens. I tried it, and it does not work.
>
> --Madhu
> ________________________________________
> From: Corinna Gries [cgries at wisc.edu]
> Sent: Thursday, June 10, 2010 1:25 PM
> To: Christopher Brooks
> Cc: Madhusudan Gujral; Kepler User
> Subject: Re: [kepler-users] string splitter
>
> Hi Christopher,
>
> thanks for clarifying.
>
> Corinna
>
> Christopher Brooks wrote:
>> Hi Corinna,
>> This seems like a design problem in the underlying Java library:
>>
>> kepler/actors/src/org/resurgence/actor/StringSplitter.java
>> uses java.lang.String.split()
>>
>> http://java.sun.com/j2se/1.4.2/docs/api/java/lang/String.html#split%28java.lang.String%29
>>
>> says
>>> This method works as if by invoking the two-argument split method
>>> with the given expression and a limit argument of zero. Trailing
>>> empty strings are therefore not included in the resulting array.
>>>
>>> The string "boo:and:foo", for example, yields the following results
>>> with these expressions:
>>>
>>>      Regex     Result
>>>      :     { "boo", "and", "foo" }
>>>      o     { "b", "", ":and:f" }
>>
>> I added a comment to StringSplitter about this limitation.
>>
>> Probably what we need in this case is a StringTokenizer actor.
>>
>> _Christopher
>>
>>
>> On 6/10/10 12:20 PM, Corinna Gries wrote:
>>> Hi Madhu,
>>>
>>> when I run the attached workflow the output is this:
>>>
>>> {"6/19/2009", "CEW", "", "6/10/2009", "6", "1", "1000", "52.5", "31",
>>> "1", "1", "1.017410714", "0.625589286", "1.3125"}
>>>
>>> and it is missing the empty string in position 14. I think you may have
>>> had a space after the last comma, which made it work just fine.
>>>
>>> Corinna
>>>
>>> Madhusudan Gujral wrote:
>>>> Hi Corinna,
>>>>
>>>> I passed the splitter results to the display actor. What I observe is
>>>> the following:
>>>> {"6/19/2009", "CEW", "", "6/10/2009", "6", "1", "1000", "52.5", "31",
>>>> "1", "1", "1.017410714", "0.625589286", "1.3125", "", " "}
>>>>
>>>> It has elements 14 'No Space' and 15 'Space' displayed correctly. When
>>>> I used Array Element actor to display the values for element 14, 15,
>>>> it does not complain, but there is nothing to show.
>>>> My guess is that the problem is related to post processing (processing
>>>> the empty tokens).
>>>>
>>>> Thanks
>>>> --Madhu
>>>>
>>>> ________________________________________
>>>> From: kepler-users-bounces at kepler-project.org
>>>> [kepler-users-bounces at kepler-project.org] On Behalf Of Corinna Gries
>>>> [cgries at wisc.edu]
>>>> Sent: Thursday, June 10, 2010 11:31 AM
>>>> To: Kepler User
>>>> Subject: [kepler-users] string splitter
>>>>
>>>> Hi again,
>>>>
>>>> when I am running this line of data:
>>>> 6/19/2009,CEW,,6/10/2009,6,1,1000,52.5,31,1,1,1.017410714,0.625589286,1.3125,,
>>>>
>>>> through the string splitter, splitting it on ',', it omits the last
>>>> empty string, i.e. does not pass an empty string. Trying to read the
>>>> array element in position 14 throws an error rather than returning an
>>>> empty string, which is what I had expected.
>>>>
>>>> I can work around it by just adding something to the end of the line,
>>>> but is that otherwise meaningful behavior?
>>>>
>>>> Corinna
>>>> _______________________________________________
>>>> Kepler-users mailing list
>>>> Kepler-users at kepler-project.org
>>>> http://mercury.nceas.ucsb.edu/kepler/mailman/listinfo/kepler-users
>>>
>>>
>>

-- 
Christopher Brooks, PMP                       University of California
CHESS Executive Director                      US Mail: 337 Cory Hall
Programmer/Analyst CHESS/Ptolemy/Trust        Berkeley, CA 94720-1774
ph: 510.643.9841 fax:510.642.2718	      (Office: 545Q Cory)
home: (F-Tu) 707.665.0131 cell: 707.332.0670


